Bayesian Adversarial Learning for Speaker Recognition

Jen Tzung Chien, Chun Lin Kuo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper presents a new generative adversarial network (GAN) that generates artificial i-vectors to compensate for imbalanced or insufficient data in speaker recognition based on probabilistic linear discriminant analysis. In theory, a GAN is able to generate artificial data that are misclassified as real data. In practice, however, a GAN suffers from mode collapse in the two-player optimization over the generator and the discriminator. This study addresses that challenge by improving model regularization through characterizing the weight uncertainty in the GAN. A new Bayesian GAN is implemented to learn a regularized model from diverse data, in which the strong modes are flattened via marginalization. In particular, we present a variational GAN (VGAN) in which the encoder, generator and discriminator are jointly estimated by variational inference. The computation cost is significantly reduced. To preserve gradient values, a learning objective based on the Wasserstein distance is further introduced. The issues of mode collapse and vanishing gradients are thereby alleviated. Experiments on the NIST i-vector Speaker Recognition Challenge demonstrate the superiority of the proposed VGAN over the variational autoencoder, the standard GAN and the sampling-based Bayesian GAN. Both learning efficiency and generation performance are evaluated.

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 381-388
Number of pages: 8
ISBN (Electronic): 9781728103068
DOIs: https://doi.org/10.1109/ASRU46091.2019.9004033
State: Published - Dec 2019
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15 Dec 2019 to 18 Dec 2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country: Singapore
City: Singapore
Period: 15/12/19 to 18/12/19

Keywords

  • Bayesian learning
  • generative adversarial networks
  • speaker recognition
  • variational autoencoder


Cite this

Chien, J. T., & Kuo, C. L. (2019). Bayesian Adversarial Learning for Speaker Recognition. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings (pp. 381-388). [9004033] (2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU46091.2019.9004033