This paper proposes a new approach to modeling syllable pitch contours for Mandarin speech. It takes the mean and the shape of the syllable pitch contour as two basic modeling units and considers several affecting factors that contribute to their variations. The parameters of the two models are estimated automatically by the EM algorithm. Experimental results showed RMSEs of 0.551 ms and 0.614 ms in the reconstructed pitch for the closed and open tests, respectively. All inferred values of the affecting factors agreed well with our prior linguistic knowledge. In addition, the prosodic states automatically labeled by the pitch mean model provided useful cues for determining prosodic phrase boundaries that occur at inter-syllable locations without punctuation marks. The approach is therefore a promising one for pitch modeling.
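As a rough illustration of the two modeling units and of the reported error measure, the sketch below splits a syllable pitch contour into a mean and a shape, rebuilds it, and computes the RMSE against the observed contour. The abstract does not specify how the shape is parameterized, so treating it as the mean-removed residual is an assumption; the function names and the sample values (pitch periods in ms) are hypothetical and are not taken from the paper.

```python
import math


def split_mean_and_shape(contour):
    """Split a contour into its mean and a mean-removed shape (assumed definition)."""
    mean = sum(contour) / len(contour)
    shape = [value - mean for value in contour]
    return mean, shape


def reconstruct(mean, shape):
    """Rebuild a contour by adding the mean back onto the shape."""
    return [mean + s for s in shape]


def rmse(original, reconstructed):
    """Root-mean-square error between two contours of equal length."""
    if len(original) != len(reconstructed):
        raise ValueError("contours must have the same length")
    squared = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return math.sqrt(squared / len(original))


# Hypothetical observed contour for one syllable (pitch period in ms).
observed = [7.0, 7.5, 8.0, 7.5]
mean, shape = split_mean_and_shape(observed)

# Stand-in for a model output: the mean is predicted 0.5 ms too high,
# while the shape is predicted exactly.
predicted = reconstruct(mean + 0.5, shape)
error = rmse(observed, predicted)
```

In this toy case the whole error comes from the mean unit, which shows why the two units can be assessed separately before looking at the combined reconstruction error.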
Number of pages: 4
State: Published - 1 Jan 2003
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, Geneva, Switzerland
Duration: 1 Sep 2003 → 4 Sep 2003