Consistency analysis of the spectrum and prosody within a syllable for Mandarin speech

Kuan Lin Chen, Cheng Yu Yeh*, Shaw-Hwa Hwang, Long Jhe Yan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This work presents a study of Mandarin speech focusing on consistency analysis of the spectrum and prosody within syllables. Identified by examining the human pronunciation process, this consistency can be interpreted as a high correlation between the warping curves of the spectrum and the prosody within a syllable. The consistency analysis consisted of three steps. First, the hidden Markov model (HMM) algorithm was used to decode the HMM state sequence within a syllable and, at the same time, divide it into three segments. Second, for a designated syllable, vector quantization (VQ) with the Linde-Buzo-Gray algorithm was employed to train a VQ codebook for the prosodic vector of each segment. Third, the prosodic vector of each segment was encoded as an index using the VQ codebooks, and the probability of each possible path through these indices was evaluated as a prerequisite for analyzing the consistency. Finally, two syllables were used as examples to verify the consistency property found in the experiments. The experiments demonstrate that consistency is clearly present when the syllable occurs in exactly the same word. These results suggest a research direction: the warping process between the spectrum and the prosody within a syllable must be considered in text-to-speech systems to improve the quality of synthesized speech.
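The second and third steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the HMM segmentation has already produced three segments per syllable token, uses a hypothetical two-dimensional prosodic vector, and estimates path probabilities as simple relative frequencies. The function names (`lbg`, `nearest`, `path_probabilities`) and the toy data are illustrative assumptions.

```python
from collections import Counter


def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vec)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg(vectors, size, eps=0.01, iters=20):
    """Train a VQ codebook by LBG splitting; size is assumed to be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each vector to its nearest codeword, then recompute centroids.
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(codebook, v)].append(v)
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def path_probabilities(tokens, codebooks):
    """Encode each token's three segments as VQ indices and return the
    relative frequency of every observed index path."""
    paths = [tuple(nearest(cb, seg) for cb, seg in zip(codebooks, tok))
             for tok in tokens]
    counts = Counter(paths)
    return {path: n / len(paths) for path, n in counts.items()}


# Toy data (hypothetical): each token is [segment 1, segment 2, segment 3],
# and each segment is a 2-D prosodic vector. Two pronunciation patterns.
tokens = [[[1 + 0.1 * k, 10.0], [2 + 0.1 * k, 10.0], [3 + 0.1 * k, 10.0]]
          for k in range(6)]
tokens += [[[5 + 0.1 * k, 20.0], [6 + 0.1 * k, 20.0], [7 + 0.1 * k, 20.0]]
           for k in range(4)]

# One codebook per segment position, trained on that segment's vectors only.
codebooks = [lbg([tok[s] for tok in tokens], size=2) for s in range(3)]
probs = path_probabilities(tokens, codebooks)
```

On this toy data only two index paths occur, one per pattern, so the probability mass is concentrated on a few paths. In this sketch, that concentration is what a consistent relationship between segments would look like; the paper's actual consistency measure may differ.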

Original language: English
Pages (from-to): 1851-1861
Number of pages: 11
Journal: Mathematical Methods in the Applied Sciences
Volume: 36
Issue number: 14
State: Published - 30 Sep 2013

Keywords

  • consistency analysis
  • hidden Markov model (HMM)
  • speech synthesis
  • text-to-speech (TTS)
  • vector quantization (VQ)
