Neural adversarial learning for speaker recognition

Jen-Tzung Chien*, Kang Ting Peng

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

This paper presents adversarial learning approaches to several tasks in speaker recognition based on probabilistic linear discriminant analysis (PLDA), which is viewed as a latent variable model for reconstructing i-vectors. The first task is to reduce the dimension of i-vectors through adversarial manifold learning, in which the generator and discriminator networks are combined to preserve the neighbor embedding of i-vectors in a low-dimensional space. The generator is trained to fool the discriminator with samples generated in the latent space. A PLDA subspace model is constructed by jointly minimizing a PLDA reconstruction error, a manifold loss for neighbor embedding, and an adversarial loss defined by the generator and discriminator. The second task addresses the imbalanced data problem. A PLDA-based generative adversarial network is trained to generate new i-vectors that balance the number of training utterances across speakers. Adversarial augmentation learning is proposed for robust speaker recognition. In particular, minimax optimization is performed to estimate a generator and a discriminator such that the class-conditional i-vectors produced by the generator cannot be distinguished from real i-vectors by the discriminator. Multiobjective learning is realized for a specialized neural model using the cosine similarity between real and fake i-vectors together with a regularization for Gaussianity. Experiments demonstrate the merit of adversarial learning in subspace construction and data augmentation for PLDA-based speaker recognition.
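
The minimax and multiobjective ideas in the abstract can be illustrated with a minimal per-example sketch. This is not the authors' formulation: the abstract does not give the exact losses, so the non-saturating adversarial term, the use of 1 minus cosine similarity, the moment-matching Gaussianity penalty, and the weights w_cos and w_gauss are all assumptions, and every function name here is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gaussianity_penalty(z):
    """Assumed regularizer: push the sample mean of z toward 0 and
    its (population) variance toward 1, as for a standard Gaussian."""
    n = len(z)
    mean = sum(z) / n
    var = sum((x - mean) ** 2 for x in z) / n
    return mean ** 2 + (var - 1.0) ** 2


def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss for one real/fake pair.
    d_real, d_fake: discriminator outputs in (0, 1), read as the
    probability that the corresponding i-vector is real."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake, real_ivec, fake_ivec, latent,
                   w_cos=1.0, w_gauss=1.0):
    """Assumed multiobjective generator loss for one example:
    adversarial term + cosine dissimilarity + Gaussianity penalty."""
    adversarial = -math.log(d_fake)  # non-saturating generator loss
    cosine = 1.0 - cosine_similarity(real_ivec, fake_ivec)
    gauss = gaussianity_penalty(latent)
    return adversarial + w_cos * cosine + w_gauss * gauss
```

In this sketch the discriminator lowers its loss by scoring real i-vectors high and generated ones low, while the generator lowers its loss by raising d_fake, by aligning the generated i-vector with a real one from the same speaker, and by keeping the latent sample close to zero mean and unit variance. In practice these terms would be averaged over mini-batches and minimized alternately with a gradient-based optimizer.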

Original language: English
Pages (from-to): 422-440
Number of pages: 19
Journal: Computer Speech and Language
Volume: 58
State: Published - 1 Nov 2019

Keywords

  • Adversarial learning
  • Data augmentation
  • Manifold learning
  • Probabilistic linear discriminant analysis
  • Speaker recognition
