Mixture of PLDA for noise robust i-vector speaker verification

Man Wai Mak, Xiaomin Pang, Jen-Tzung Chien

Research output: Contribution to journal › Article › peer-review

45 Scopus citations

Abstract

In real-world environments, noisy utterances with variable noise levels are recorded and then converted to i-vectors for cosine distance or PLDA scoring. This paper investigates the effect of noise-level variability on i-vectors. It demonstrates that noise-level variability causes the i-vectors to shift, so that noise-contaminated i-vectors form clusters in the i-vector space. It also demonstrates that the optimal subspaces for discriminating speakers are noise-level dependent. Based on these observations, this paper proposes using the signal-to-noise ratio (SNR) of utterances as guidance for training a mixture of PLDA models. To maximize the coordination among the PLDA models, the mixture components are trained simultaneously via an EM algorithm using utterances contaminated with noise at various levels. For scoring, given a test i-vector, the marginal likelihoods from the individual PLDA models are linearly combined using the posterior probabilities of the test utterance's SNR. Verification scores are the ratio of the marginal likelihoods. Results based on NIST 2012 SRE suggest that the SNR-dependent mixture of PLDA is not only suitable for situations where the test utterances exhibit a wide range of SNR, but also beneficial for test utterances with an unknown SNR distribution. Supplementary materials containing full derivations of the EM algorithms and scoring functions can be found at http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf.
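
The scoring rule described in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation; the exact scoring function is derived in the supplementary materials. The sketch assumes that each mixture component k is a Gaussian PLDA model with mean m_k, speaker loading matrix V_k and residual covariance Sigma_k, that SNR is modelled by one 1-D Gaussian per component, and that the speaker factor is shared across components. All function names, dictionary keys and toy parameter values are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian N(x | mean, cov)."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (d.size * np.log(2.0 * np.pi) + logdet + maha)

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    values = np.asarray(values, dtype=float)
    top = values.max()
    return top + np.log(np.exp(values - top).sum())

def snr_log_posteriors(snr, model):
    """Log posterior of each component given a scalar SNR (assumed 1-D Gaussians)."""
    log_p = (np.log(model["weights"])
             - 0.5 * np.log(2.0 * np.pi * model["snr_vars"])
             - 0.5 * (snr - model["snr_means"]) ** 2 / model["snr_vars"])
    return log_p - logsumexp(log_p)

def log_marginal(x, log_post, model):
    """Posterior-weighted combination of per-component marginal likelihoods."""
    terms = []
    for k, lp in enumerate(log_post):
        V = model["V"][k]
        cov = V @ V.T + model["Sigma"][k]
        terms.append(lp + log_gauss(x, model["means"][k], cov))
    return logsumexp(terms)

def log_joint_same_speaker(x_s, lp_s, x_t, lp_t, model):
    """Joint likelihood of both i-vectors under a shared speaker factor."""
    x = np.concatenate([x_s, x_t])
    terms = []
    for i, lpi in enumerate(lp_s):
        for j, lpj in enumerate(lp_t):
            Vi, Vj = model["V"][i], model["V"][j]
            mean = np.concatenate([model["means"][i], model["means"][j]])
            cov = np.block([
                [Vi @ Vi.T + model["Sigma"][i], Vi @ Vj.T],
                [Vj @ Vi.T, Vj @ Vj.T + model["Sigma"][j]],
            ])
            terms.append(lpi + lpj + log_gauss(x, mean, cov))
    return logsumexp(terms)

def mplda_llr(x_s, snr_s, x_t, snr_t, model):
    """Log ratio of same-speaker to different-speaker marginal likelihoods."""
    lp_s = snr_log_posteriors(snr_s, model)
    lp_t = snr_log_posteriors(snr_t, model)
    num = log_joint_same_speaker(x_s, lp_s, x_t, lp_t, model)
    den = log_marginal(x_s, lp_s, model) + log_marginal(x_t, lp_t, model)
    return num - den

# Toy two-component model with made-up parameters (2-D i-vectors, 1-D speaker factor).
toy_model = {
    "weights": np.array([0.5, 0.5]),
    "snr_means": np.array([5.0, 15.0]),   # SNR in dB, hypothetical
    "snr_vars": np.array([9.0, 9.0]),
    "means": np.array([[0.0, 0.0], [0.5, 0.0]]),
    "V": np.array([[[2.0], [0.0]], [[1.5], [0.0]]]),
    "Sigma": np.array([0.1 * np.eye(2), 0.3 * np.eye(2)]),
}

x_a = np.array([2.0, 0.1])    # target i-vector, recorded at about 6 dB
x_b = np.array([2.1, -0.1])   # test i-vector close to x_a, at about 14 dB
x_c = np.array([-2.0, 0.0])   # test i-vector far from x_a, at about 14 dB

score_same = mplda_llr(x_a, 6.0, x_b, 14.0, toy_model)
score_diff = mplda_llr(x_a, 6.0, x_c, 14.0, toy_model)
print(score_same, score_diff)
```

In this toy setting the pair of nearby i-vectors receives the higher score, and the score is unchanged when the target and test utterances are swapped. Working in the log domain keeps the posterior-weighted sums stable when one SNR component dominates.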

Original language: English
Pages (from-to): 130-142
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 24
Issue number: 1
State: Published - 1 Jan 2016

Keywords

  • I-vectors
  • Mixture of PLDA
  • Noise robustness
  • Probabilistic LDA
  • Speaker verification
