Knowledge integration for improving performance in LVCSR

Chen Yu Chiang, Sabato Marco Siniscalchi, Sin-Horng Chen, Chin Hui Lee

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

This paper presents a knowledge integration framework to improve performance in large vocabulary continuous speech recognition. Two types of knowledge sources, manner attribute and prosodic structure, are incorporated. For manner of articulation, six attribute detectors trained with an American English corpus (WSJ0) are utilized to rescore hypothesized phones in word lattices obtained by a baseline ASR system. For the prosodic structure, models trained with an unsupervised joint prosody labeling and modeling (PLM) technique using WSJ0 are used in lattice rescoring. Experimental results on the American English WSJ word recognition task of the Nov92 test set show that the proposed approach significantly outperforms the baseline system that does not use articulatory and prosodic information. The results also demonstrate the effectiveness and usefulness of the PLM technique in constructing prosodic models for American English ASR.

Original language: English
Pages (from-to): 1786-1790
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 1 Jan 2013
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: 25 Aug 2013 to 29 Aug 2013

Keywords

  • Attribute detector
  • Knowledge-based system
  • LVCSR
  • Prosody labeling/modeling
