Continuous topic language modeling for speech recognition

Chuang Hua Chueh*, Jen-Tzung Chien

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

The continuous representation of word sequences can effectively alleviate the data sparseness problem in n-gram language models, where words are treated as discrete variables and unseen events frequently occur. This problem becomes increasingly severe when extracting long-distance regularities for high-order n-gram models. Rather than working in the discrete word space, we construct a continuous space of word sequences from which latent topic information is extracted. The continuous vector is formed by topic posterior probabilities, and a least-squares projection matrix from the discrete word space to the continuous topic space is estimated accordingly. Unseen words can then be predicted through the new continuous latent topic language model. In experiments on continuous speech recognition, we obtain significant performance improvement over the conventional topic-based language model.
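The core operation the abstract describes, mapping a discrete word-space representation into a continuous topic space via a least-squares projection matrix, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the array names, dimensions, and the simulated topic posteriors are assumptions for demonstration.

```python
import numpy as np

# Hypothetical toy setup (illustrative sizes, not from the paper):
# V = vocabulary size, K = number of latent topics, N = training histories.
rng = np.random.default_rng(0)
V, K, N = 50, 5, 200

# X: discrete word-space representations, e.g. normalized bag-of-words
# vectors for each training history, shape (N, V).
X = rng.random((N, V))
X /= X.sum(axis=1, keepdims=True)

# T: continuous topic-space targets, i.e. topic posterior probabilities
# inferred for each history (simulated here), shape (N, K).
T = rng.random((N, K))
T /= T.sum(axis=1, keepdims=True)

# Least-squares projection matrix P (V x K) minimizing ||X @ P - T||_F.
P, *_ = np.linalg.lstsq(X, T, rcond=None)

# Project a new word history from the discrete word space into the
# continuous topic space using the estimated projection matrix.
x_new = rng.random(V)
x_new /= x_new.sum()
topic_vec = x_new @ P  # continuous topic representation, shape (K,)
```

In this sketch the projection is fit once on training histories and then applied to any new history, which is what lets the model assign topic-based probabilities to word events unseen in the discrete n-gram statistics.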

Original language: English
Title of host publication: 2008 IEEE Workshop on Spoken Language Technology, SLT 2008 - Proceedings
Pages: 193-196
Number of pages: 4
DOIs
State: Published - 1 Dec 2008
Event: 2008 IEEE Workshop on Spoken Language Technology, SLT 2008 - Goa, India
Duration: 15 Dec 2008 - 19 Dec 2008

Publication series

Name: 2008 IEEE Workshop on Spoken Language Technology, SLT 2008 - Proceedings

Conference

Conference: 2008 IEEE Workshop on Spoken Language Technology, SLT 2008
Country: India
City: Goa
Period: 15/12/08 - 19/12/08

Keywords

  • Clustering methods
  • Natural languages
  • Smoothing methods
  • Speech recognition

