Multi-view and multi-objective semi-supervised learning for large vocabulary continuous speech recognition

Xiaodong Cui*, Jing Huang, Jen-Tzung Chien

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Current hidden Markov model (HMM) acoustic modeling for large vocabulary continuous speech recognition (LVCSR) relies on the availability of abundant labeled transcriptions. Because speech labeling is both expensive and time-consuming, while a huge amount of unlabeled data is now easily available, semi-supervised learning (SSL) from both labeled and unlabeled data, which aims to reduce the development cost of LVCSR, is more important than ever. In this paper, we propose SSL for LVCSR using multiple views learned from different acoustic features and randomized decision trees. In addition, we develop multi-objective learning of HMM-based acoustic models by optimizing a hybrid criterion that combines the discriminative mutual information from labeled data with the entropy from unlabeled data. Experiments conducted on Broadcast News show the benefits of the proposed methods.
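
As a rough illustration of the kind of hybrid criterion the abstract describes, the sketch below combines a discriminative term computed on labeled data with an entropy term computed on unlabeled data. It is a toy under stated assumptions, not the paper's formulation: the frame-level log posterior stands in for the sequence-level mutual information criterion, and the function names, the weight `alpha`, and the sign convention (entropy is penalized) are all hypothetical choices made for this example.

```python
import math

def reference_log_posterior(posteriors, labels):
    """Sum of log posteriors of the reference class over labeled frames.

    Assumption: a frame-level stand-in for the discriminative
    mutual-information term, which in practice is sequence-level.
    """
    return sum(math.log(p[y]) for p, y in zip(posteriors, labels))

def conditional_entropy(posteriors):
    """Total entropy of the class posteriors over unlabeled frames."""
    return -sum(q * math.log(q) for p in posteriors for q in p if q > 0.0)

def hybrid_criterion(labeled_post, labels, unlabeled_post, alpha=0.1):
    """Hypothetical hybrid objective (to be maximized).

    Rewards confident, correct posteriors on labeled data and
    penalizes uncertain posteriors on unlabeled data; alpha is an
    assumed trade-off weight, not a value taken from the paper.
    """
    return (reference_log_posterior(labeled_post, labels)
            - alpha * conditional_entropy(unlabeled_post))

# Toy posteriors over two classes: one labeled frame, one unlabeled frame.
labeled = [[0.5, 0.5]]
labels = [0]
uncertain = [[0.5, 0.5]]
confident = [[1.0, 0.0]]

# Under this sign convention, confident predictions on unlabeled data score higher.
print(hybrid_criterion(labeled, labels, confident, alpha=1.0)
      > hybrid_criterion(labeled, labels, uncertain, alpha=1.0))
```

With `alpha` set to zero the criterion reduces to the purely supervised term, which makes the weight a convenient knob for checking how much the unlabeled data influences the result.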

Original language: English
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 4668-4671
Number of pages: 4
State: Published - 18 Aug 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 2011 → 27 May 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country: Czech Republic
City: Prague
Period: 22/05/11 → 27/05/11

Keywords

  • discriminative training
  • LVCSR
  • multi-objective learning
  • multi-view
  • semi-supervised learning
