Multi-keyword spotting of telephone speech using orthogonal transform-based SBR and RNN prosodic model

Wern Jun Wang, Chun Jen Lee, Eng Fong Huang, Sin-Horng Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations

Abstract

In this paper, an orthogonal transform-based signal bias removal (OTSBR) approach and a recurrent neural network (RNN) prosodic model are proposed for multi-keyword spotting of telephone speech. OTSBR is employed in the pre-processing stage of acoustic decoding to estimate the channel bias and so eliminate the acoustic mismatch between the training and testing environments. The RNN prosodic model is adopted in the post-processing stage of acoustic decoding to detect word boundaries, which are used to reorder the keyword candidates produced by the keyword spotter. Simulations on a real speech database collected from the Phone Directory Assistant Service developed at Chunghwa Telecommunication Laboratories (CTL-PDAS) were performed to evaluate the proposed methods. Experimental results showed that a keyword detection rate of 71.0% and a top-5 keyword inclusion rate of 81.8% can be attained by incorporating OTSBR and the RNN prosodic model into the system.

Original language: English
Title of host publication: EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
Editors: Borge Lindberg, Henrik Benner, Paul Dalsgaard, Zheng-Hua Tan
Publisher: International Speech Communication Association
Pages: 2773-2776
Number of pages: 4
ISBN (Electronic): 8790834100, 9788790834104
State: Published - 1 Jan 2001
Event: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark
Duration: 3 Sep 2001 to 7 Sep 2001

Publication series

Name: EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology

Conference

Conference: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001
Country: Denmark
City: Aalborg
Period: 3/09/01 to 7/09/01


Cite this

Wang, W. J., Lee, C. J., Huang, E. F., & Chen, S-H. (2001). Multi-keyword spotting of telephone speech using orthogonal transform-based SBR and RNN prosodic model. In B. Lindberg, H. Benner, P. Dalsgaard, & Z-H. Tan (Eds.), EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology (pp. 2773-2776). (EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology). International Speech Communication Association.