A study of mutual information for GMM-based spectral conversion

Hsin Te Hwang, Yu Tsao, Hsin Min Wang, Yih-Ru Wang, Sin-Horng Chen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

The Gaussian mixture model (GMM)-based method has dominated the field of voice conversion (VC) for last decade. However, the converted spectra are excessively smoothed and thus produce muffled converted sound. In this study, we improve the speech quality by enhancing the dependency between the source (natural sound) and converted feature vectors (converted sound). It is believed that enhancing this dependency can make the converted sound closer to the natural sound. To this end, we propose an integrated maximum a posteriori and mutual information (MAPMI) criterion for parameter generation on spectral conversion. Experimental results demonstrate that the quality of converted speech by the proposed MAPMI method outperforms that by the conventional method in terms of formal listening test.

Original languageEnglish
Title of host publication13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages78-81
Number of pages4
StatePublished - 1 Dec 2012
Event13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: 9 Sep 201213 Sep 2012

Publication series

Name13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume1

Conference

Conference13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
CountryUnited States
CityPortland, OR
Period9/09/1213/09/12

Keywords

  • GMM
  • Mutual information
  • Voice conversion

Fingerprint Dive into the research topics of 'A study of mutual information for GMM-based spectral conversion'. Together they form a unique fingerprint.

Cite this