Prosody-dependent acoustic modeling for Mandarin speech recognition

Tzu Hsuan Chiu, Chen Yu Chiang, Yuan Fu Liao*, Jyh Her Yang, Yih-Ru Wang, Sin-Horng Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


This paper reports a study on introducing prosodic information into acoustic modeling (AM) for speech recognition. It extends the conventional context-dependent (CD) triphone HMM approach so that each phone model also depends on the break type of the nearby inter-syllable boundary. Four break types are considered: major break, minor break, normal non-break, and tightly coupled non-break. In the training phase, breaks are labeled automatically by a previously proposed Prosody Labeling and Modeling algorithm. Prosody- and phonetic-dependent phone models are then constructed by standard decision-tree-based context clustering of HMMs. The effectiveness of the new AM was examined on a Mandarin syllable recognition task. Experimental results showed that the new approach outperformed the conventional CD-AM, both achieving a better syllable recognition rate and producing a more efficient syllable lattice with a better trade-off between complexity and syllable coverage rate.
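As a rough illustration of what "prosody-dependent" context could look like, the following minimal sketch appends the break type of the adjacent inter-syllable boundary to an ordinary triphone label. It is not the authors' implementation: the label format (`left-phone+right/L:.../R:...`), the break tags (`major`, `minor`, `normal`, `tight`), the `sil` edge context, and the function name `prosody_dependent_labels` are all assumptions made for this example. In this sketch only the first and last phone of a syllable carry a break tag, since only those touch a boundary.

```python
from typing import List

# Hypothetical tags for the four break types named in the abstract.
BREAK_TYPES = {"major", "minor", "normal", "tight"}


def prosody_dependent_labels(syllables: List[List[str]], breaks: List[str]) -> List[str]:
    """Return one label per phone: triphone context plus adjacent break types.

    syllables: phones grouped by syllable, e.g. [["n", "i"], ["h", "au"]].
    breaks:    break type at each inter-syllable boundary (len(syllables) - 1 items).
    """
    if len(breaks) != max(len(syllables) - 1, 0):
        raise ValueError("need exactly one break label per inter-syllable boundary")
    unknown = set(breaks) - BREAK_TYPES
    if unknown:
        raise ValueError(f"unknown break types: {sorted(unknown)}")

    # Flatten phones, remembering each phone's syllable index and position.
    flat = [(p, s, i == 0, i == len(syl) - 1)
            for s, syl in enumerate(syllables)
            for i, p in enumerate(syl)]

    labels = []
    for k, (phone, s, is_first, is_last) in enumerate(flat):
        left = flat[k - 1][0] if k > 0 else "sil"
        right = flat[k + 1][0] if k + 1 < len(flat) else "sil"
        # "x" means there is no inter-syllable boundary adjacent on that side.
        prev_break = breaks[s - 1] if is_first and s > 0 else "x"
        next_break = breaks[s] if is_last and s < len(syllables) - 1 else "x"
        labels.append(f"{left}-{phone}+{right}/L:{prev_break}/R:{next_break}")
    return labels
```

Labels of this kind could then serve as the context features that decision-tree clustering asks questions about, for example "is the following boundary a major break?" alongside the usual phonetic questions.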

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Speech Prosody, SP 2012
Publisher: Tongji University Press
Number of pages: 4
ISBN (Print): 9787560848693
State: Published - 1 Jan 2012
Event: 6th International Conference on Speech Prosody 2012, SP 2012 - Shanghai, China
Duration: 22 May 2012 to 25 May 2012

Publication series
Name: Proceedings of the 6th International Conference on Speech Prosody, SP 2012

Conference: 6th International Conference on Speech Prosody 2012, SP 2012


  • Acoustic modeling
  • Prosodic break
  • Prosody-dependent acoustic model
  • Speech recognition
