This paper proposes a hidden speaking rate. In contrast to the traditional raw speaking rate, which simply averages the number of syllables or phones per second with or without pauses, the proposed hidden speaking rate is estimated by normalizing out the effects of lexical information and prosodic structure using the existing speaking rate-dependent hierarchical prosodic model (SR-HPM). Its significance is illustrated by an analysis of speaking rate estimation on a Mandarin speech database containing four parallel speech corpora of a female professional announcer with fast, normal, medium and slow speaking rates. A prosody generation experiment on the same speech corpus shows that the hidden speaking rate represents the speaker's intended, or underlying, speaking rate more meaningfully and accurately than the conventional raw speaking rate.
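As a rough illustration of the conventional raw speaking rate that the abstract contrasts against, the sketch below counts syllables per second over an utterance, with or without pause time in the denominator. It does not implement the hidden speaking rate or the SR-HPM. The `Segment` type, the `raw_speaking_rate` function, and the toy timings are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-aligned unit of an utterance (hypothetical structure)."""
    start: float    # start time in seconds
    end: float      # end time in seconds
    is_pause: bool  # True for a silent pause, False for a syllable


def raw_speaking_rate(segments, include_pauses=True):
    """Return syllables per second, with or without pause time counted."""
    n_syllables = sum(1 for s in segments if not s.is_pause)
    duration = sum(
        s.end - s.start
        for s in segments
        if include_pauses or not s.is_pause
    )
    if duration <= 0:
        raise ValueError("total duration must be positive")
    return n_syllables / duration


# Toy utterance (made-up timings): four 0.25 s syllables and one 0.5 s pause.
utterance = [
    Segment(0.0, 0.25, False),
    Segment(0.25, 0.5, False),
    Segment(0.5, 1.0, True),
    Segment(1.0, 1.25, False),
    Segment(1.25, 1.5, False),
]

rate_with_pauses = raw_speaking_rate(utterance, include_pauses=True)      # 4 syllables / 1.5 s
rate_without_pauses = raw_speaking_rate(utterance, include_pauses=False)  # 4 syllables / 1.0 s
```

The two values differ only because of the pause, which shows how sensitive a raw rate is to factors other than articulation speed. That sensitivity is the kind of effect the abstract says the hidden speaking rate normalizes for.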
|Pages (from-to)||592-596|
|Journal||Proceedings of the International Conference on Speech Prosody|
|Publication status||Published - 1 Jan 2018|
|Event||9th International Conference on Speech Prosody, SP 2018 - Poznan, Poland|
Duration: 13 Jun 2018 → 16 Jun 2018