A decision tree is built by successively splitting the observation frames of a phonetic unit according to the best phonetic questions. To prevent overly large tree models, a stopping criterion is required to suppress tree growth. It is crucial to exploit goodness-of-split criteria both to choose the best question for splitting each node and to test whether splitting should terminate; with such criteria, robust tree models can be established. In this study, we apply Hubert's Γ statistic as the node-splitting criterion and the T²-statistic as the stopping criterion. Hubert's Γ statistic is a cluster validity measure that characterizes the degree of clustering in the available data, which makes it well suited to selecting the best question for expanding a tree node. Further, we examine the population closeness of the two child nodes at a given significance level: the T²-statistic tests whether the corresponding mean vectors are close together, and splitting stops when this hypothesis is accepted. In continuous speech recognition experiments, the proposed methods achieve better recognition rates with smaller tree models than the maximum likelihood and minimum description length criteria.
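The abstract does not give the exact formulations used in the paper, but the two statistics it names have standard forms. Below is a minimal illustrative sketch, assuming a raw (unnormalized) Hubert Γ computed between a pairwise-distance matrix and a cluster-membership indicator, and the usual two-sample Hotelling T² test with its F-distribution p-value; the function names and the specific normalizations are this sketch's own choices, not the authors'.

```python
import numpy as np
from scipy import stats

def hubert_gamma(D, labels):
    """Raw Hubert Gamma: average of D[i, j] * Q[i, j] over pairs i < j,
    where Q[i, j] = 1 when points i and j lie in different clusters.
    Larger values indicate that the split separates distant frames."""
    n = len(labels)
    total, m = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            q = 1.0 if labels[i] != labels[j] else 0.0
            total += D[i, j] * q
            m += 1
    return total / m

def hotelling_t2(X1, X2):
    """Two-sample Hotelling T^2 on the mean vectors of X1 and X2
    (rows = observations), with the standard F-transform p-value.
    A large p-value means the child means are statistically close,
    i.e. the split would not be worth making."""
    n1, n2 = len(X1), len(X2)
    p = X1.shape[1]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    # Pooled covariance of the two child populations.
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
    # T^2 relates to an F(p, n1 + n2 - p - 1) statistic.
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    pval = 1.0 - stats.f.cdf(f, p, n1 + n2 - p - 1)
    return t2, pval
```

In a tree-growing loop one would, for each node, pick the question maximizing `hubert_gamma` over the induced two-way partition, then stop splitting when `hotelling_t2` on the two child populations yields a p-value above the chosen significance level.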
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
State: Published - 1 Jan 2002