Structural maximum a posteriori speaker adaptation of speaking rate-dependent hierarchical prosodic model for Mandarin TTS

I. Bin Liao, Chen Yu Chiang, Sin-Horng Chen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

In this paper, a structural maximum a posterior speaker adaptation method to adjust the existing speaking rate (SR) dependent hierarchical prosodic model (SR-HPM) to a new speaker's data for realizing a new voice of any given SR is discussed. The adaptive SR-HPM is formulated based on MAP estimation with a reference SR-HPM serving as an informative prior. The prior information provided by the reference SR-HPM is hierarchically organized by decision trees. The results of objective and subjective evaluations showed that the proposed method not only performed slightly better than the maximum likelihood-based model in the observed SR range of the target speaker's data, but also was much better in the unseen SR range.

Original languageEnglish
Title of host publication2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5625-5629
Number of pages5
ISBN (Electronic)9781479999880
DOIs
StatePublished - 18 May 2016
Event41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 201625 Mar 2016

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2016-May
ISSN (Print)1520-6149

Conference

Conference41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
CountryChina
CityShanghai
Period20/03/1625/03/16

Keywords

  • hierarchical prosodic model
  • Mandarin TTS
  • prosodic-acoustic features
  • speaker adaptation

Fingerprint Dive into the research topics of 'Structural maximum a posteriori speaker adaptation of speaking rate-dependent hierarchical prosodic model for Mandarin TTS'. Together they form a unique fingerprint.

Cite this