An adaptive band-partitioning spectral entropy based speech detection in realistic noisy environments

Bing-Fei Wu, Kun Ching Wang

Research output: Contribution to conference (Paper)

3 Scopus citations

Abstract

Generally, the feature parameters used for speech detection are highly sensitive to the environment. Speech detection performance degrades severely in realistic noisy environments because those feature parameters cannot fully express the characteristics of a speech signal. This study therefore seeks acoustic fingerprints in the speech spectrogram as a robust feature for distinguishing speech from non-speech, especially in adverse environments. Based on the fact that the frequency energies of different types of noise are concentrated in different frequency bands [12], an ABSE (Adaptive Band-partitioning Spectral Entropy)-based speech detection algorithm is proposed for detecting speech signals in such environments. Additionally, the ABSE-based algorithm is demonstrated to work in real time with minimal processing delay. Experimental results indicate that the ABSE parameter is very effective across several SNRs (signal-to-noise ratios) and various noise conditions. Furthermore, the proposed ABSE-based algorithm outperforms other approaches and is reliable in a real car.
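
The abstract does not give the ABSE formulation, so the following is only a minimal sketch of the general idea of a band-wise spectral entropy, not the authors' algorithm. It assumes fixed, uniform bands (the adaptive band selection described in the paper is not reproduced), and the function names, the `num_bands` parameter, and the 64-sample frame length are hypothetical choices made for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_spectral_entropy(frame, num_bands=8):
    """Entropy (in nats) of the normalized band-energy distribution.

    Assumption: uniform, fixed bands. Bins left over when the spectrum
    does not divide evenly into num_bands are ignored.
    """
    power = power_spectrum(frame)[1:]  # drop the DC bin
    band_size = len(power) // num_bands
    energies = [sum(power[b * band_size:(b + 1) * band_size])
                for b in range(num_bands)]
    total = sum(energies)
    if total <= 0.0:
        return 0.0  # silent frame: no distribution to measure
    probs = [e / total for e in energies]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Example frames of 64 samples (hypothetical frame length).
tone = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63

tone_entropy = band_spectral_entropy(tone)        # energy in one band
impulse_entropy = band_spectral_entropy(impulse)  # flat spectrum
```

In this sketch, a frame whose energy sits in a single band yields an entropy near zero, while a frame with a flat spectrum yields the maximum value, `log(num_bands)`. A detector built on such a feature would compare it against a threshold, which is not specified here.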

Original language: English
Pages: 957-960
Number of pages: 4
State: Published - 1 Jan 2004
Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
Duration: 4 Oct 2004 - 8 Oct 2004

Conference

Conference: 8th International Conference on Spoken Language Processing, ICSLP 2004
Country: Korea, Republic of
City: Jeju, Jeju Island
Period: 4/10/04 - 8/10/04

  • Cite this

    Wu, B-F., & Wang, K. C. (2004). An adaptive band-partitioning spectral entropy based speech detection in realistic noisy environments. 957-960. Paper presented at 8th International Conference on Spoken Language Processing, ICSLP 2004, Jeju, Jeju Island, Korea, Republic of.