This paper proposes a sample-based phone boundary detection algorithm. The method first extracts sample-based acoustic parameters: six sub-band signal envelopes, a sample-based Kullback-Leibler (KL) distance, and spectral entropy. The sample-based KL distance is then used to preselect boundary candidates. Finally, a supervised neural network makes the final boundary decision. Experiments on the TIMIT speech corpus yielded equal error rates (EERs) of 13.2% and 15.1% on the training and test sets, respectively. Moreover, at the EER operating point, 43.5% and 88.2% of the detected boundaries fell within error tolerances of 80 and 240 samples, respectively, of the manual labels.
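The abstract does not define its acoustic parameters, so the following is only an illustrative sketch of the kind of computation involved, not the authors' implementation. It assumes that spectral entropy is the Shannon entropy of a power spectrum normalized to sum to one, that the KL distance is the symmetric KL divergence between two such normalized spectra, and that candidates are preselected as local maxima of the distance curve above a threshold. The function names, the `threshold` parameter, and the smoothing constant `EPS` are all hypothetical.

```python
import math

EPS = 1e-12  # hypothetical smoothing constant to avoid log(0)


def normalize(power):
    """Scale a non-negative power spectrum so that it sums to 1."""
    total = sum(power) + EPS * len(power)
    return [(p + EPS) / total for p in power]


def spectral_entropy(power):
    """Shannon entropy (in nats) of the normalized power spectrum."""
    return -sum(p * math.log(p) for p in normalize(power))


def symmetric_kl(power_a, power_b):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), between two spectra."""
    p, q = normalize(power_a), normalize(power_b)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def preselect_candidates(distances, threshold):
    """Indices of local maxima of the distance curve that exceed threshold.

    Assumed preselection rule; the paper may use a different one.
    """
    return [
        i
        for i in range(1, len(distances) - 1)
        if distances[i] > threshold
        and distances[i] >= distances[i - 1]
        and distances[i] > distances[i + 1]
    ]


if __name__ == "__main__":
    flat = [1.0, 1.0, 1.0, 1.0]
    peaked = [10.0, 0.1, 0.1, 0.1]
    print(spectral_entropy(flat), spectral_entropy(peaked))
    print(symmetric_kl(flat, peaked))
    print(preselect_candidates([0.0, 0.2, 0.9, 0.3, 0.1, 0.6, 0.05], 0.5))
```

In a sample-based setting, the two spectra passed to `symmetric_kl` would presumably come from analysis windows placed just before and just after each sample. The surviving candidates, together with the sub-band envelopes and spectral entropy, would then be passed to the neural network classifier.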
|Publication status||Published - 1 December 2010|
|Event||11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan|
Duration: 26 September 2010 → 30 September 2010