A speaking rate-controlled Mandarin TTS system

Chiao Hua Hsieh, Yih-Ru Wang, Chen Yu Chiang, Sin-Horng Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

In this paper, a new speaking rate-controlled Mandarin TTS system based on a speaking rate-dependent hierarchical prosodic model (SR-HPM) [6] is proposed. In the training phase, a data-driven approach automatically builds the SR-HPM directly from a large, prosody-unlabeled speech database containing utterances spoken at various speaking rates. The SR-HPM comprises 15 sub-models that describe the relationships among 3 types of prosodic-acoustic features of speech utterances, two types of prosodic tags specifying a 4-layer prosody hierarchy, linguistic features at various levels of the associated texts, and the speaking rate. In the test phase, the SR-HPM generates 4 prosodic-acoustic features: syllable pitch contours, syllable durations, syllable energy levels, and syllable-juncture pause durations. By combining these prosodic features with the spectral features generated by the HTS synthesizer, the system can produce natural speech at any speaking rate within the wide range of 0.15 to 0.3 seconds/syllable. A distinctive feature of the system, namely its ability to control both the occurrence frequencies of the various break types and their pause durations according to the given speaking rate, was demonstrated. A subjective test yielded mean opinion scores (MOS) of 3.35, 3.44, and 3.28 for fast (SR = 0.17 sec/syllable), medium (SR = 0.2 sec/syllable), and slow (SR = 0.25 sec/syllable) synthetic speech, respectively.
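Because the speaking rate (SR) is expressed in seconds per syllable, it helps to see what that unit means and why naive rate control falls short. The sketch below is a hypothetical illustration, not the authors' SR-HPM: it assumes SR is the total utterance duration (syllable durations plus juncture pauses) divided by the syllable count, and it implements only a uniform-scaling baseline that stretches every duration by the same factor. The names `Utterance`, `speaking_rate`, and `scale_to_rate` are invented for this example.

```python
from dataclasses import dataclass

# Supported speaking-rate range quoted in the abstract (seconds per syllable).
SR_MIN, SR_MAX = 0.15, 0.30


@dataclass
class Utterance:
    syllable_durations: list[float]  # seconds, one per syllable
    pause_durations: list[float]     # seconds, one per syllable juncture (n - 1)

    def speaking_rate(self) -> float:
        """Assumed definition: (syllable time + pause time) / syllable count."""
        if not self.syllable_durations:
            raise ValueError("utterance has no syllables")
        total = sum(self.syllable_durations) + sum(self.pause_durations)
        return total / len(self.syllable_durations)


def scale_to_rate(utt: Utterance, target_sr: float) -> Utterance:
    """Naive baseline (hypothetical): scale all durations uniformly so the
    utterance hits target_sr. The break pattern itself is left unchanged."""
    if not SR_MIN <= target_sr <= SR_MAX:
        raise ValueError(f"target SR {target_sr} outside [{SR_MIN}, {SR_MAX}]")
    factor = target_sr / utt.speaking_rate()
    return Utterance(
        [d * factor for d in utt.syllable_durations],
        [p * factor for p in utt.pause_durations],
    )


if __name__ == "__main__":
    utt = Utterance([0.2, 0.2, 0.2, 0.2], [0.0, 0.1, 0.0])
    print(round(utt.speaking_rate(), 3))                       # 0.225
    print(round(scale_to_rate(utt, 0.17).speaking_rate(), 3))  # 0.17
```

Uniform scaling reaches the target rate exactly, but it keeps every pause in place and merely shortens or lengthens it. The abstract's point is that the SR-HPM instead changes how often breaks of each type occur, as well as their pause durations, as the speaking rate varies, which a single scaling factor cannot capture.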

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 6900-6904
Number of pages: 5
State: Published - 18 Oct 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 to 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country: Canada
City: Vancouver, BC
Period: 26/05/13 to 31/05/13

Keywords

  • Mandarin prosody modeling
  • Speaking rate modeling
  • Speaking rate-controlled TTS
