Transformation-based adaptation, which adapts clusters of speaker-independent (SI) hidden Markov model (HMM) parameters to an enrolled speaker by using cluster-dependent transformation functions, is an effective algorithm for robust speech recognition. To obtain desirable performance for any amount of adaptation data, it is beneficial to build a tree structure of HMM parameters and use it to dynamically control the sharing of transformation parameters. Traditionally, the transformation sharing is determined by phonetic rules or by clustering the acoustic space of the training data, and the tree structure is then kept unchanged during speaker adaptation (SA). In this paper, we adapt the tree structure to the new environment so that the transformation parameters can be extracted adaptively by referring to the most recent hierarchy of HMM parameters. The adaptation of the hierarchical tree is incorporated into the maximum likelihood (ML) estimation of the transformation parameters. In a series of speaker adaptation experiments, we find that transformation-based adaptation with an adaptive hierarchy of HMM parameters outperforms adaptation with a static hierarchy across different tree depths and adaptation data lengths.
- Hidden Markov model
- Speaker adaptation
- Speech recognition
- Transformation-based adaptation
- Tree structure
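
The dynamic control of transformation sharing mentioned in the abstract can be illustrated with a minimal sketch. It is not the method of this paper: it only shows the conventional static-tree case, where each leaf cluster of HMM parameters shares the transformation of the deepest node on its path that has enough adaptation data. The names `Node`, `occupancy`, `assign_transforms`, `build_example_tree`, the threshold `min_frames`, and the frame counts are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node in a (hypothetical) tree of HMM parameter clusters."""
    name: str
    frames: int = 0  # adaptation frames observed at a leaf cluster
    children: List["Node"] = field(default_factory=list)


def occupancy(node: Node) -> int:
    """Total number of adaptation frames observed under this node."""
    if not node.children:
        return node.frames
    return sum(occupancy(child) for child in node.children)


def assign_transforms(root: Node, min_frames: int) -> Dict[str, str]:
    """Map each leaf cluster to the node whose transformation it shares.

    A node owns a transformation when its occupancy reaches min_frames.
    Leaves below that threshold fall back to the nearest ancestor that
    qualifies; the root is always available as a global fallback.
    """
    mapping: Dict[str, str] = {}

    def visit(node: Node, owner: str) -> None:
        if occupancy(node) >= min_frames:
            owner = node.name
        if not node.children:
            mapping[node.name] = owner
        for child in node.children:
            visit(child, owner)

    visit(root, root.name)
    return mapping


def build_example_tree() -> Node:
    """Toy tree with made-up frame counts (illustrative only)."""
    vowels = Node("vowels", children=[Node("a", 120), Node("i", 10)])
    consonants = Node("consonants", children=[Node("s", 30), Node("t", 5)])
    return Node("root", children=[vowels, consonants])


if __name__ == "__main__":
    tree = build_example_tree()
    # A strict threshold pushes sharing toward the root.
    print(assign_transforms(tree, min_frames=100))
    # A looser threshold lets more clusters own a transformation.
    print(assign_transforms(tree, min_frames=20))
```

In this sketch the tree itself never changes; only the threshold and the data counts decide where transformations are tied. The approach summarized in the abstract goes further by adapting the hierarchy itself together with the ML estimation of the transformation parameters, which this example does not attempt to reproduce.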