A statistics-based pitch contour model for Mandarin speech

Sin-Horng Chen, Wen Hsing Lai, Yih-Ru Wang

Research output: Contribution to journal, Article

22 Scopus citations

Abstract

A statistics-based syllable pitch contour model for Mandarin speech is proposed. This approach takes the mean and the shape of a syllable log-pitch contour as two basic modeling units and considers several affecting factors that contribute to their variations. The affecting factors include the speaker, prosodic state (which essentially represents the high-level linguistic components of F0 and will be explained more clearly in Sec. I), tone, and initial and final syllable classes. The parameters of the two modeling units were automatically estimated using the expectation-maximization (EM) algorithm. Experimental results showed that the root mean squared errors (RMSEs) of the reconstructed pitch period in the closed and open tests were 0.362 and 0.373 ms, respectively. This model provides a way to separate the effects of several major factors. All of the inferred values of the affecting factors were in close agreement with our prior linguistic knowledge. The model also gives a quantitative and more complete description of the coarticulation effect of neighboring tones than the conventional qualitative descriptions given by tone sandhi rules. In addition, the model can provide useful cues for determining prosodic phrase boundaries, including those occurring at intersyllable locations, with or without punctuation marks.
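As a rough illustration of the two modeling units and the error measure named in the abstract, the following minimal Python sketch splits a syllable F0 contour into a log-pitch mean and a mean-removed shape, then computes an RMSE in pitch period (ms). It is an assumption-laden sketch, not the paper's method: the function names are hypothetical, the shape is represented simply as the mean-removed log-F0 samples, and the affecting factors and EM estimation are not modeled.

```python
import math


def split_mean_and_shape(f0_hz):
    """Split a syllable F0 contour (Hz) into its log-pitch mean and shape.

    Assumption: the shape is taken to be the mean-removed log-F0 samples;
    the paper's actual shape parameterization may differ.
    """
    log_f0 = [math.log(f) for f in f0_hz]
    mean = sum(log_f0) / len(log_f0)
    shape = [v - mean for v in log_f0]
    return mean, shape


def reconstruct_f0(mean, shape):
    """Rebuild an F0 contour (Hz) from a log-pitch mean and shape."""
    return [math.exp(mean + s) for s in shape]


def pitch_period_rmse_ms(f0_ref_hz, f0_est_hz):
    """RMSE between two contours, measured in pitch period (ms)."""
    # Pitch period in milliseconds is 1000 / F0 when F0 is in Hz.
    sq_errs = [
        (1000.0 / ref - 1000.0 / est) ** 2
        for ref, est in zip(f0_ref_hz, f0_est_hz)
    ]
    return math.sqrt(sum(sq_errs) / len(sq_errs))
```

In this sketch, a contour that is split and then reconstructed without modification yields an RMSE of essentially zero; in the model described above, the mean and shape would instead be predicted from the affecting factors before reconstruction.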

Original language: English
Pages (from-to): 908-925
Number of pages: 18
Journal: Journal of the Acoustical Society of America
Volume: 117
Issue number: 2
State: Published - 1 Feb 2005
