Prosodic model of Mandarin speech and its application to pitch level generation for text-to-speech

Shaw-Hwa Hwang*, Sin-Horng Chen

*Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-review

5 Scopus citations


A prosodic model of Mandarin speech is proposed that simulates the human pronunciation mechanism in order to explore the hidden pronunciation states embedded in the input text. Parameters representing these pronunciation states are then used to assist prosody information generation. The prosodic model is realized with a multirate recurrent neural network (MRNN). Two learning methods are proposed to train the MRNN. The first is an indirect method: it first uses an additional SRNN to track the dynamics of the prosody information of the utterance, and then takes the outputs of the SRNN's hidden layer as the desired targets for training the MRNN. The second is a direct method that integrates the MRNN with the subsequent multilayer perceptron (MLP) prosody synthesizers, so that the relation between the input linguistic features and the output prosody information is learned directly. Simulation results confirmed the effectiveness of the approach: most synthesized prosodic parameter sequences closely matched their original counterparts.
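The abstract does not give the MRNN's internal layout, so the following is only a minimal sketch of what a "multirate" recurrent network feeding an MLP synthesizer could look like, under loudly stated assumptions: there are two time scales (a fast state updated every syllable and a slow state updated only at word boundaries), all units are tanh, the MLP has one hidden layer with a linear output, and weights are random and untrained. The names `TwoRateRNN`, `tanh_layer`, and `init_layer` and all layer sizes are hypothetical, and neither the indirect (SRNN-target) nor the direct training method is shown.

```python
import math
import random


def tanh_layer(weights, bias, vec):
    """Affine map followed by tanh: tanh(W @ vec + b)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoRateRNN:
    """Hypothetical two-rate recurrent network: a fast (syllable-rate) state
    and a slow (word-rate) state, followed by an MLP output stage."""

    def __init__(self, n_in, n_fast, n_slow, n_out, n_mlp=8, seed=0):
        rng = random.Random(seed)
        self.n_fast, self.n_slow = n_fast, n_slow
        # Fast layer input: syllable features + previous fast state + current slow state.
        self.fast = init_layer(n_fast, n_in + n_fast + n_slow, rng)
        # Slow layer input: previous slow state + fast state at the word-final syllable.
        self.slow = init_layer(n_slow, n_slow + n_fast, rng)
        # Assumed MLP synthesizer: one tanh hidden layer, then a linear output layer.
        self.mlp_hidden = init_layer(n_mlp, n_fast, rng)
        self.mlp_out = init_layer(n_out, n_mlp, rng)

    def forward(self, syllables):
        """syllables: list of (feature_vector, is_word_final) pairs.
        Returns one output vector (e.g. pitch parameters) per syllable."""
        fast = [0.0] * self.n_fast
        slow = [0.0] * self.n_slow
        outputs = []
        for features, is_word_final in syllables:
            # Fast (syllable-rate) update happens on every syllable.
            fast = tanh_layer(*self.fast, list(features) + fast + slow)
            hidden = tanh_layer(*self.mlp_hidden, fast)
            w, b = self.mlp_out
            outputs.append([sum(wi * hi for wi, hi in zip(row, hidden)) + bi
                            for row, bi in zip(w, b)])
            if is_word_final:
                # Slow (word-rate) update happens only at word boundaries,
                # after the current syllable's output has been produced.
                slow = tanh_layer(*self.slow, slow + fast)
        return outputs


model = TwoRateRNN(n_in=3, n_fast=4, n_slow=2, n_out=4)
pitch = model.forward([([1.0, 0.0, 0.5], False), ([0.0, 1.0, 0.5], True)])
print(len(pitch), len(pitch[0]))  # → 2 4
```

The design choice being illustrated is only the rate split: word-level context enters the syllable-level computation through the slow state, so moving a word boundary changes the outputs of later syllables but not earlier ones.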

Original language: English
Pages (from-to): 616-618
Number of pages: 3
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
State: Published - 1 Jan 1995
Event: Proceedings of the 1995 20th International Conference on Acoustics, Speech, and Signal Processing. Part 1 (of 5) - Detroit, MI, USA
Duration: 9 May 1995 to 12 May 1995
