Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition

Yih Liang Shen, Chao Yuan Huang, Syu Siang Wang, Yu Tsao, Hsin Min Wang, Tai-Shih Chi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between the enhanced speech and a clean reference. An MSE-optimized model, however, may not directly improve the performance of an automatic speech recognition (ASR) system. If the goal is to minimize the recognition error, the recognition results should be used to design the objective function for optimizing the SE model. The difficulty is that an ASR system consists of multiple units, such as acoustic and language models, so its structure is usually complex and not differentiable. In this study, we propose adopting a reinforcement learning (RL) algorithm to optimize the SE model based on the recognition results. We evaluated the proposed RL-based SE system on the Mandarin Chinese broadcast news corpus (MATBN). Experimental results demonstrate that the proposed SE system can effectively improve the ASR results, with notable error rate reductions of 12.40% and 19.23% at signal-to-noise ratio (SNR) conditions of 0 dB and 5 dB, respectively.
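To illustrate why RL fits a non-differentiable recognizer, the following is a minimal sketch of a score-function (REINFORCE-style) update, not the authors' method: the abstract does not say which RL algorithm, action space, or reward the paper uses. Everything here is an assumption made for illustration: the policy picks one of three hypothetical enhancement actions, `toy_recognizer` is a made-up black box standing in for an ASR system, and the reward is the negative character error rate of its output.

```python
import math
import random


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_recognizer(action: int, ref: str) -> str:
    """Hypothetical black box (assumption): each enhancement action
    leads to a fixed number of misrecognized characters."""
    n_errors = (3, 1, 0)[action]
    return "#" * n_errors + ref[n_errors:]


def train(ref: str, n_actions: int = 3, steps: int = 2000,
          lr: float = 0.5, seed: int = 0):
    """REINFORCE on a softmax policy over discrete actions.

    Only the scalar reward from the recognizer is needed; no gradient
    flows through the recognizer itself.
    """
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    baseline = 0.0  # running average of rewards, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = -character_error_rate(ref, toy_recognizer(action, ref))
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. the logits is one_hot - probs.
        for k in range(n_actions):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    print(train("speech"))
```

In this toy setting the policy should shift its probability mass toward the action with the lowest character error rate. In a real system the discrete logits would be replaced by the parameters of an SE model and the toy function by an actual recognizer, at which point the choice of action space and reward design becomes the central question.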

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6750-6754
Number of pages: 5
ISBN (Electronic): 9781479981311
DOIs: https://doi.org/10.1109/ICASSP.2019.8683648
State: Published - 1 May 2019
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (Print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country: United Kingdom
City: Brighton
Period: 12/05/19 to 17/05/19

Keywords

  • automatic speech recognition
  • character error rate
  • deep neural network
  • reinforcement learning
  • speech enhancement

Cite this

    Shen, Y. L., Huang, C. Y., Wang, S. S., Tsao, Y., Wang, H. M., & Chi, T-S. (2019). Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings (pp. 6750-6754). [8683648] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2019.8683648