A consistency analysis on an acoustic module for Mandarin text-to-speech

Cheng Yu Yeh*, Shun Chieh Chang, Shaw-Hwa Hwang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

This work presents a consistency analysis of the acoustic module of a Mandarin text-to-speech (TTS) system as a way to improve speech quality. Motivated by an inspection of the human pronunciation process, consistency is interpreted as a high correlation of the warping curve between the spectrum and the prosody within a syllable. The analysis proceeds in three steps. First, the hidden Markov model (HMM) algorithm is used to decode the HMM-state sequences within a syllable and, at the same time, to divide them into three segments. Second, for a designated syllable, vector quantization (VQ) with the Linde-Buzo-Gray (LBG) algorithm is used to train a VQ codebook for each segment. Third, the prosodic vector of each segment is encoded as an index by the VQ codebooks, and the probability of each possible path is then evaluated as a prerequisite for analyzing the consistency. Experiments demonstrate that consistency is obtained when the syllable is located in exactly the same word. These results suggest a research direction: the warping process between the spectrum and the prosody within a syllable must be considered in a TTS system to improve speech quality.
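
The second and third steps of the procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the two-dimensional toy "prosodic vectors", the codebook size, and the splitting perturbation are all assumptions made for illustration, and the HMM segmentation of the first step is assumed to have been done already, so each syllable token arrives as three per-segment vectors. The sketch trains one codebook per segment with LBG-style splitting, encodes each token as a path of three codeword indices, and estimates each path's probability as its relative frequency.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(vec, codebook):
    """Index of the codeword closest to vec (the VQ encoding step)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def lbg(vectors, size, eps=0.01, iters=20):
    """Train a codebook of `size` codewords (a power of two) by LBG splitting."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c * (1 + s * eps) + s * eps for c in cw]
                    for cw in codebook for s in (1, -1)]
        # Refine with a fixed number of nearest-neighbour / centroid passes.
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # An empty cell keeps its previous codeword.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def path_probabilities(tokens, codebooks):
    """Relative frequency of each index path (one index per segment)."""
    paths = Counter(
        tuple(nearest(seg, cb) for seg, cb in zip(token, codebooks))
        for token in tokens
    )
    total = sum(paths.values())
    return {path: count / total for path, count in paths.items()}


# Made-up data: four tokens of one syllable, three segment vectors each.
tokens = [
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    [[1.1, 0.9], [2.1, 1.9], [3.1, 2.9]],
    [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]],
    [[5.1, 4.9], [6.1, 5.9], [7.1, 6.9]],
]
codebooks = [lbg([t[k] for t in tokens], size=2) for k in range(3)]
probs = path_probabilities(tokens, codebooks)
for path, p in sorted(probs.items()):
    print(path, p)
```

In this toy setting, probability mass concentrated on a few paths would indicate that the segments of a syllable tend to vary together. How such path probabilities are turned into a consistency measure is specific to the paper and is not reproduced here.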

Original language: English
Pages (from-to): 266-277
Number of pages: 12
Journal: Speech Communication
Volume: 55
Issue number: 2
State: Published - 1 Feb 2013

Keywords

  • Acoustic module
  • Consistency analysis
  • Hidden Markov model (HMM)
  • Speech synthesis
  • Text-to-speech (TTS)
  • Vector quantization (VQ)
