A probabilistic interpretation for artificial neural network-based voice conversion

Hsin Te Hwang, Yu Tsao, Hsin Min Wang, Yih-Ru Wang, Sin-Horng Chen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

Voice conversion (VC) using artificial neural networks (ANNs) has shown its capability to produce better sound quality of the converted speech than that using Gaussian mixture model (GMM). Although ANN-based VC works reasonably well, there is still room for further improvement. One of the promising ways is to adopt the successful techniques in statistical model-based parameter generation (SMPG), such as trajectory-based mapping approaches that are originally designed for GMM-based VC and hidden Markov model (HMM)-based speech synthesis. This study presents a probabilistic interpretation for ANN-based VC. In this way, ANN-based VC can easily incorporate the successful techniques in SMPG. Experimental results demonstrate that the performance of ANN-based VC can be effectively improved by two trajectory-based mapping techniques (maximum likelihood parameter generation (MLPG) algorithm and maximum likelihood-based trajectory mapping considering global variance (referred to as MLGV)), compared to the conventional ANN-based VC with frame-based mapping and the GMM-based VC with the MLPG algorithm. Moreover, ANN-based VC with the trajectory-based mapping techniques can achieve comparable performance when compared to the state-of-the-art GMM-based VC with the MLGV algorithm.

Original languageEnglish
Title of host publication2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages552-558
Number of pages7
ISBN (Electronic)9789881476807
DOIs
StatePublished - 19 Feb 2016
Event2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015 - Hong Kong, Hong Kong
Duration: 16 Dec 201519 Dec 2015

Publication series

Name2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015

Conference

Conference2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
CountryHong Kong
CityHong Kong
Period16/12/1519/12/15

Fingerprint Dive into the research topics of 'A probabilistic interpretation for artificial neural network-based voice conversion'. Together they form a unique fingerprint.

Cite this