An HMM-based algorithm for content ranking and coherence-feature extraction

Chien-Liang Liu, Wen Hoar Hsaio, Chia-Hoang Lee, Hsiao Cheng Chi

Research output: Contribution to journal (Article)

16 Scopus citations

Abstract

In this paper, we propose an algorithm called the coherence hidden Markov model (coherence HMM) to extract coherence features and rank content. Coherence HMM is a variant of the HMM that models the stochastic process of essay writing: topics are the hidden states, and the sequence of clauses forms the observations. Its parameters are estimated with probabilistic latent semantic analysis. For coherence-feature extraction, support vector regression (SVR) trained on both surface features and coherence features is used for essay grading. The experimental results indicate that SVR benefits from the coherence features: the adjacent agreement rate and the exact agreement rate are 95.24% and 59.80%, respectively. When the same experiment is repeated on high-scoring essays, the adjacent agreement rate and the exact agreement rate are 98.33% and 64.50%, respectively. For content ranking, we design and implement an intelligent assisted blog writing system based on the coherence-HMM ranking model. Several corpora are employed to help users compose blog articles efficiently. When a user finishes composing a clause or sentence, the system offers candidate texts for reference based on the content of the current clause or sentence. The experimental results demonstrate that all participants benefit from the system and save considerable time when writing articles.
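The two agreement rates reported in the abstract can be illustrated with a minimal sketch. It assumes the definitions common in automated essay scoring: exact agreement is the fraction of essays where the predicted score equals the human score, and adjacent agreement is the fraction where the two differ by at most one point. The abstract does not spell out these definitions, so treat them as an assumption. The function name `agreement_rates` and the sample scores are hypothetical, and continuous SVR outputs are assumed to have been rounded to integer grades beforehand.

```python
from typing import Sequence, Tuple


def agreement_rates(human: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (exact, adjacent) agreement rates as fractions in [0, 1].

    Assumed definitions (not confirmed by the abstract):
      exact    -- predicted score equals the human score
      adjacent -- predicted score is within one point of the human score
    """
    if len(human) != len(predicted):
        raise ValueError("score lists must have the same length")
    if not human:
        raise ValueError("score lists must not be empty")

    n = len(human)
    exact = sum(h == p for h, p in zip(human, predicted)) / n
    adjacent = sum(abs(h - p) <= 1 for h, p in zip(human, predicted)) / n
    return exact, adjacent


# Hypothetical scores for illustration only; these are not data from the paper.
human_scores = [4, 3, 5, 2, 4]
model_scores = [4, 4, 3, 2, 5]

exact, adjacent = agreement_rates(human_scores, model_scores)
print(f"exact={exact:.2%}, adjacent={adjacent:.2%}")
```

Under these definitions every exact match also counts as adjacent, so the adjacent rate is never lower than the exact rate, which is consistent with the figures quoted above.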

Original language: English
Pages (from-to): 440-450
Number of pages: 11
Journal: IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans
Volume: 43
Issue number: 2
State: Published - 11 Nov 2013

Keywords

  • Coherence-feature extraction
  • Hidden Markov model (HMM)
  • Input devices and strategies
  • Natural language processing (NLP)
  • Predictive content
