Traditional Chinese parser and language modeling for Mandarin ASR

Ang Hsing Lin, Yih-Ru Wang, Sin-Horng Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

A new approach that uses a traditional Chinese parser to improve the language modeling of Mandarin speech recognition is proposed in this paper. The parser first applies a preprocessing step to correct word segmentation inconsistencies in the text corpus. It then employs a CRF-based word segmentation method and a CRF-based POS tagger to re-segment the texts, generating better word strings for training an n-gram language model (LM) for ASR. Experimental results on the TCC-300 corpus showed that the proposed method achieved a word error rate (WER) of 13.4%, a relative WER reduction of about 45% compared with the previous system.
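The pipeline summarized in the abstract (CRF tagging of characters, recovery of a word segmentation, then n-gram counting over the re-segmented text) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the BMES tag scheme, the toy sentence, and the function names are assumptions, and a real system would use a trained CRF model to produce the tags.

```python
# Hypothetical sketch of the post-CRF steps: per-character BMES tags
# are turned back into words, which then feed n-gram LM training.
from collections import Counter

def tags_to_words(chars, tags):
    """Recover a word segmentation from per-character BMES tags
    (B=begin, M=middle, E=end, S=single), as a CRF segmenter would emit."""
    words, cur = [], []
    for ch, tag in zip(chars, tags):
        cur.append(ch)
        if tag in ("E", "S"):      # word boundary reached
            words.append("".join(cur))
            cur = []
    if cur:                        # tolerate a truncated tag sequence
        words.append("".join(cur))
    return words

def bigram_counts(sentences):
    """Collect bigram counts (with sentence-boundary markers),
    the raw statistics behind an n-gram LM."""
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts

# Toy example (hypothetical tagging of 我愛台北, "I love Taipei"):
chars = list("我愛台北")
tags = ["S", "S", "B", "E"]
words = tags_to_words(chars, tags)   # → ['我', '愛', '台北']
counts = bigram_counts([words])
```

In a full system these counts would be smoothed into n-gram probabilities and compiled (e.g. into a weighted finite state transducer, one of the paper's keywords) for decoding.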

Original language: English
Title of host publication: 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation, O-COCOSDA/CASLRE 2013
DOIs
State: Published - 1 Dec 2013
Event: 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation, O-COCOSDA/CASLRE 2013 - Gurgaon, India
Duration: 25 Nov 2013 - 27 Nov 2013

Publication series

Name: 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation, O-COCOSDA/CASLRE 2013

Conference

Conference: 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation, O-COCOSDA/CASLRE 2013
Country: India
City: Gurgaon
Period: 25/11/13 - 27/11/13

Keywords

  • Chinese word segmentation
  • Conditional random field
  • Language model
  • Automatic speech recognition
  • Weighted finite state transducer
