Latent prosody model of continuous Mandarin speech

Chen Yu Chiang*, Xiao Dong Wang, Yuan Fu Liao, Yih-Ru Wang, Sin-Horng Chen, Keikichi Hirose

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

The major difficulty in prosody modeling and automatic tone recognition of continuous Mandarin speech is the complex interaction of tones and prosody/intonation on F0 contours. In this study, we propose a latent prosody model (LPM) that jointly models the effects of tone and prosody state on F0. Its purposes are twofold: (1) automatic prosody state labeling and (2) improved tone recognition accuracy. The basic idea is to introduce latent prosody state variables into an additive statistical model of F0 that already accounts for the effects of tone and speaker. Experiments on the Tree-Bank corpus showed that the LPM not only gave meaningful prosody state labeling results but also improved the average tone recognition rate from 80.86% for a multi-layer perceptron (MLP) baseline to 82.55%.
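The abstract does not give the model's equations, so the following is only a minimal sketch of what an additive F0 model with a latent prosody state could look like. Every name and number in it (`TONE_EFFECT`, `SPEAKER_EFFECT`, `STATE_EFFECT`, the three-state inventory, the nearest-prediction rule) is a hypothetical assumption for illustration, not the authors' parameterization or estimation procedure.

```python
# Illustrative toy model. All values and names are hypothetical assumptions,
# not taken from the paper.
TONE_EFFECT = {1: 0.6, 2: -0.1, 3: -0.7, 4: 0.2}   # per-tone offset (toy log-F0 units)
SPEAKER_EFFECT = {"spk_a": 5.0, "spk_b": 4.4}      # per-speaker mean level
STATE_EFFECT = [-0.4, 0.0, 0.4]                    # offsets of the latent prosody states


def predict_f0(tone, speaker, state):
    """Additive prediction: tone + speaker + prosody-state contributions."""
    return TONE_EFFECT[tone] + SPEAKER_EFFECT[speaker] + STATE_EFFECT[state]


def label_state(observed_f0, tone, speaker):
    """Label the prosody state whose prediction is closest to the observation."""
    return min(range(len(STATE_EFFECT)),
               key=lambda s: abs(observed_f0 - predict_f0(tone, speaker, s)))


def recognize_tone(observed_f0, speaker, state):
    """Pick the tone that best explains the observation given a prosody state."""
    return min(TONE_EFFECT,
               key=lambda t: abs(observed_f0 - predict_f0(t, speaker, state)))
```

In this toy setting, `label_state` stands in for the prosody state labeling goal (tone known, state inferred), and `recognize_tone` stands in for tone recognition after the state's contribution to F0 is accounted for. A real system would estimate these effects from data and treat the state probabilistically rather than by nearest prediction.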

Original language: English
Title of host publication: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
State: Published - 6 Aug 2007
Event: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07 - Honolulu, HI, United States
Duration: 15 Apr 2007 to 20 Apr 2007

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 4
ISSN (Print): 1520-6149

Conference

Conference: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
Country: United States
City: Honolulu, HI
Period: 15/04/07 to 20/04/07

Keywords

  • Speech processing
  • Speech recognition
  • Tone recognition
