A Bayesian prediction approach to robust speech recognition and online environmental learning

Jen-Tzung Chien*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

A robust speech recognizer is developed to tackle the inevitable mismatch between training and testing environments. Because real environments are uncertain and nonstationary, it is necessary to characterize the uncertainty of the speech hidden Markov models (HMMs) used for recognition and to track that uncertainty incrementally so that it reflects the newest environmental statistics. In this paper, we develop a new Bayesian predictive classification (BPC) approach for robust decision and online environmental learning. The BPC decision is established by modeling the uncertainties of both the HMM mean vector and the precision matrix with a conjugate prior density. Frame-based predictive distributions, in the form of multivariate t distributions and their approximate Gaussian distributions, are exploited. After recognition, the prior density is pooled with the likelihood of the current test sentence to generate the reproducible prior density. The hyperparameters of the prior density are adjusted accordingly to match the newest environment and are then applied to the recognition of upcoming data. As a result, an efficient online unsupervised learning strategy is obtained for HMM-based speech recognition without the need for adaptation data. In experiments on the recognition of connected Chinese digits in hands-free car environments, the proposed approach performs significantly better than the conventional plug-in maximum a posteriori (MAP) decision. The approach is also computationally economical.
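As a rough illustration of the two steps the abstract describes (scoring with a predictive t distribution, then refreshing the prior hyperparameters from the data just recognized), the following is a minimal sketch for a single one-dimensional Gaussian with a normal-gamma conjugate prior. It is not the paper's method: there is no HMM, no state alignment, and no multivariate precision matrix, and the names `NormalGamma`, `log_predictive`, and `update` are hypothetical choices made for this example only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalGamma:
    """Hypothetical container for normal-gamma hyperparameters (1-D case)."""
    m: float      # prior location of the Gaussian mean
    kappa: float  # pseudo-count expressing confidence in m
    a: float      # gamma shape parameter for the precision
    b: float      # gamma rate parameter for the precision


def log_predictive(prior: NormalGamma, x: float) -> float:
    """Log density of the Student-t predictive obtained by integrating
    the Gaussian likelihood over the normal-gamma prior."""
    nu = 2.0 * prior.a
    scale2 = prior.b * (prior.kappa + 1.0) / (prior.a * prior.kappa)
    z2 = (x - prior.m) ** 2 / scale2
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * scale2)
            - (nu + 1.0) / 2.0 * math.log1p(z2 / nu))


def update(prior: NormalGamma, frames: list[float]) -> NormalGamma:
    """Pool the prior with the likelihood of the observed frames.
    The result is again normal-gamma, so it can serve as the next prior."""
    n = len(frames)
    if n == 0:
        return prior
    xbar = sum(frames) / n
    ss = sum((x - xbar) ** 2 for x in frames)
    kappa_n = prior.kappa + n
    return NormalGamma(
        m=(prior.kappa * prior.m + n * xbar) / kappa_n,
        kappa=kappa_n,
        a=prior.a + n / 2.0,
        b=prior.b + 0.5 * ss
          + prior.kappa * n * (xbar - prior.m) ** 2 / (2.0 * kappa_n),
    )


# Usage: score a frame, then adapt the prior to frames from a shifted environment.
prior = NormalGamma(m=0.0, kappa=1.0, a=1.0, b=1.0)
before = log_predictive(prior, 2.0)
posterior = update(prior, [1.0, 3.0])
after = log_predictive(posterior, 2.0)
```

Because the updated hyperparameters have the same form as the prior, the update can be repeated sentence by sentence. In this toy setting the predictive score of a frame near the new data rises after the update, which mirrors the online adaptation idea in a much simplified form.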

Original language: English
Pages (from-to): 321-334
Number of pages: 14
Journal: Speech Communication
Volume: 37
Issue number: 3-4
State: Published - 1 Jul 2002

Keywords

  • Bayesian predictive classification (BPC)
  • Hidden Markov model
  • Online unsupervised learning
  • Speaker adaptation
  • Speech recognition
