Large-vocabulary continuous speech recognition systems: A look at some recent advances

George Saon*, Jen-Tzung Chien

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

76 Scopus citations

Abstract

Over the past decade or so, several advances have been made in the design of modern large-vocabulary continuous speech recognition (LVCSR) systems, to the point where their application has broadened from early speaker-dependent dictation systems to speaker-independent automatic broadcast news transcription and indexing, lecture and meeting transcription, conversational telephone speech transcription, open-domain voice search, medical and legal speech recognition, and call center applications, to name a few. The commercial success of these systems is an impressive testimony to how far research in LVCSR has come, and the aim of this article is to describe some of the technological underpinnings of modern systems. It must be said, however, that despite the commercial success and widespread adoption, the problem of large-vocabulary speech recognition is far from being solved: background noise, channel distortions, foreign accents, casual and disfluent speech, or unexpected topic changes can cause automated systems to make egregious recognition errors. This is because current LVCSR systems are not robust to mismatched training and test conditions and cannot handle context as well as human listeners, despite being trained on thousands of hours of speech and billions of words of text.

Original language: English
Article number: 6296522
Pages (from-to): 18-33
Number of pages: 16
Journal: IEEE Signal Processing Magazine
Volume: 29
Issue number: 6
State: Published - 1 Jan 2012