Effective temporal data classification by integrating sequential pattern mining and probabilistic induction

Vincent Shin-Mu Tseng*, Chao Hui Lee

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

38 Scopus citations

Abstract

Data classification is an important topic in data mining because of its wide range of applications. Many methods have been proposed based on well-known learning models such as decision trees and neural networks. Although data classification has been widely discussed, relatively few studies have explored temporal data classification. Most existing research has focused on improving classification accuracy by using statistical models, neural networks, or distance-based methods. However, these approaches cannot explain their classification results to users. In many research settings, such as microarray gene expression analysis, users prefer interpretable classification information over a classifier that offers only high accuracy. In this paper, we propose a novel pattern-based data mining method, namely classify-by-sequence (CBS), for classifying large temporal datasets. The main idea behind CBS is to integrate sequential pattern mining with probabilistic induction. CBS is simple to implement, and its pattern-based architecture can supply clear classification information to users. In experimental evaluation, CBS delivered classification results with high accuracy on two real time series datasets. In addition, we designed a simulator to evaluate the performance of CBS on datasets with different characteristics. The experimental results show that CBS can discover hidden patterns and classify data effectively by utilizing the mined sequential patterns.
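
The abstract does not give the details of CBS, so the following is only a rough sketch of the general idea of pairing sequential pattern mining with a probability-style score. It is not the authors' algorithm. It assumes that time series have already been discretized into symbol sequences, and every name in it (`mine_patterns`, `train`, `classify`, `min_support`, `max_len`) is hypothetical. The weighting scheme, a pattern's support in one class divided by its summed support across the classes where it is frequent, is likewise an assumption chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(symbol in it for symbol in pattern)


def mine_patterns(sequences, min_support, max_len=3):
    """Brute-force miner: {pattern: support} for subsequences up to max_len.

    Illustrative only; a real miner would prune the search space.
    """
    counts = defaultdict(int)
    for seq in sequences:
        seen = set()
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                seen.add(tuple(seq[i] for i in idx))
        for pattern in seen:  # count each pattern once per sequence
            counts[pattern] += 1
    n = len(sequences)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def train(labelled, min_support=0.5, max_len=3):
    """Mine patterns per class and weight them (assumed scoring scheme)."""
    by_class = defaultdict(list)
    for seq, label in labelled:
        by_class[label].append(seq)
    mined = {c: mine_patterns(s, min_support, max_len) for c, s in by_class.items()}
    rules = {}
    for label, patterns in mined.items():
        for pattern, support in patterns.items():
            total = sum(m.get(pattern, 0.0) for m in mined.values())
            rules[(pattern, label)] = support / total
    return rules


def classify(rules, sequence):
    """Return (best_label, matched_patterns); the patterns act as the explanation."""
    scores = defaultdict(float)
    evidence = defaultdict(list)
    for (pattern, label), weight in rules.items():
        if is_subsequence(pattern, sequence):
            scores[label] += weight
            evidence[label].append(pattern)
    if not scores:
        return None, []
    best = max(scores, key=scores.get)
    return best, evidence[best]
```

Calling `classify(train(data), new_sequence)` returns both a label and the patterns that matched, which illustrates the kind of interpretable output the abstract contrasts with accuracy-only classifiers.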

Original language: English
Pages (from-to): 9524-9532
Number of pages: 9
Journal: Expert Systems with Applications
Volume: 36
Issue number: 5
State: Published - 1 Jul 2009

Keywords

  • Classification
  • Data mining
  • Scoring method
  • Sequential pattern
  • Temporal data
