Acoustic factor analysis for streamed hidden Markov modeling

Jen-Tzung Chien*, Chuan Wei Ting

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

4 Scopus citations


This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to extract the common factors from acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is captured by the common factors. Features corresponding to the same factor are generated by the same HMM state. Accordingly, multiple Markov chains are adopted to characterize the variation trends in different dimensions of the cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of the standard HMM topology, namely, that all features of a speech frame are emitted by the same state. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM), in which the streaming is determined empirically. To reduce the number of factor loading matrices in FA, the similarity between individual matrices is evaluated to find the optimal parameter clustering of the FA models. A new decoding algorithm is presented to perform FASHMM speech recognition. FASHMM runs the streamed Markov chains for a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method significantly reduced the recognition error rates compared with the standard HMM and SFHMM methods.
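The idea that features sharing a common factor belong to the same stream can be illustrated with a minimal sketch. The abstract does not state how the partition is derived from the FA model, so the rule used here (assign each feature dimension to the factor with the largest absolute loading) is an assumption made for illustration, and the function name `partition_features_by_factor` and the loading values are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def partition_features_by_factor(loadings):
    """Group feature indices by their dominant factor.

    loadings[d][k] is the loading of feature dimension d on factor k.
    Assumption (not from the paper): a feature belongs to the factor
    on which it has the largest absolute loading.
    """
    streams = defaultdict(list)
    for dim, row in enumerate(loadings):
        # Index of the factor with the largest absolute loading.
        dominant = max(range(len(row)), key=lambda k: abs(row[k]))
        streams[dominant].append(dim)
    return dict(streams)


# Hypothetical loading matrix: 5 cepstral dimensions, 2 common factors.
example_loadings = [
    [0.90, 0.10],
    [0.80, -0.20],
    [0.10, 0.70],
    [-0.30, 0.60],
    [0.75, 0.05],
]

streams = partition_features_by_factor(example_loadings)
print(streams)  # {0: [0, 1, 4], 1: [2, 3]}
```

In this toy case, dimensions 0, 1, and 4 would form one stream and dimensions 2 and 3 another, so each group could be driven by its own Markov chain instead of a single state shared by the whole frame.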

Original language: English
Article number: 5165112
Pages (from-to): 1279-1291
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 7
State: Published - 1 Sep 2009


  • Factor analysis (FA)
  • Markov chain
  • Speech recognition
  • Streamed hidden Markov model
