Voice activity detection based on frequency modulation of harmonics

Chung Chien Hsu, Tse En Lin, Jian Hueng Chen, Tai-Shih Chi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

13 Scopus citations

Abstract

In this paper, we propose a voice activity detection (VAD) algorithm based on spectro-temporal modulation structures of input sounds. A multi-resolution spectro-temporal analysis framework is used to inspect prominent speech structures. By comparing with an adaptive threshold, the proposed VAD distinguishes speech from non-speech based on the energy of the frequency modulation of harmonics. Compared with three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, our proposed VAD significantly outperforms them in non-stationary noises in terms of the receiver operating characteristic (ROC) curves and the recognition rates from a practical distributed speech recognition (DSR) system.

Original languageEnglish
Title of host publication2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages6679-6683
Number of pages5
DOIs
StatePublished - 18 Oct 2013
Event2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 201331 May 2013

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)1520-6149

Conference

Conference2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
CountryCanada
CityVancouver, BC
Period26/05/1331/05/13

Keywords

  • frequency modulation
  • spectro-temporal analysis
  • voice activity detection

Fingerprint Dive into the research topics of 'Voice activity detection based on frequency modulation of harmonics'. Together they form a unique fingerprint.

  • Cite this

    Hsu, C. C., Lin, T. E., Chen, J. H., & Chi, T-S. (2013). Voice activity detection based on frequency modulation of harmonics. In 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings (pp. 6679-6683). [6638954] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2013.6638954