Enriching Mandarin speech recognition by incorporating a hierarchical prosody model

Jyh Her Yang*, Ming Chieh Liu, Hao Hsiang Chang, Chen Yu Chiang, Yih-Ru Wang, Sin-Horng Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Scopus citations

Abstract

This paper presents a new probabilistic framework of Mandarin speech recognition by incorporating a sophisticated hierarchical prosody model into the conventional HMM-based system. The prosody model describes the relations of linguistic cues of various levels, break types and prosodic states which represent the prosody hierarchical structure, and prosody-related acoustic features. Aside from producing the recognized word sequences, the system also decodes other information including word's part-of-speech, punctuation marks, inter-syllable break types, and prosodic states of syllables. Experimental results on the TCC300 corpus, which consists of paragraphic utterances, showed that the proposed system significantly outperformed the baseline system. The word and character error rates decreased from 24.4% and 18.1% to 20.7% and 14.4% (or 15.2% and 20.4% relative improvements), respectively.

Original languageEnglish
Title of host publication2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages5052-5055
Number of pages4
DOIs
StatePublished - 18 Aug 2011
Event36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 201127 May 2011

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)1520-6149

Conference

Conference36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
CountryCzech Republic
CityPrague
Period22/05/1127/05/11

Keywords

  • Hierarchical prosody model
  • Mandarin speech recognition

Fingerprint Dive into the research topics of 'Enriching Mandarin speech recognition by incorporating a hierarchical prosody model'. Together they form a unique fingerprint.

  • Cite this

    Yang, J. H., Liu, M. C., Chang, H. H., Chiang, C. Y., Wang, Y-R., & Chen, S-H. (2011). Enriching Mandarin speech recognition by incorporating a hierarchical prosody model. In 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings (pp. 5052-5055). [5947492] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2011.5947492