An information granulation based data mining approach for classifying imbalanced data

Mu-Chen Chen*, Long Sheng Chen, Chun Chin Hsu, Wei Rong Zeng

*Corresponding author for this work

Research output: Contribution to journal (Article)

64 Scopus citations

Abstract

Recently, the class imbalance problem has attracted much attention from researchers in the field of data mining. When learning from imbalanced data, in which most examples are labeled as one class and only a few belong to another, traditional data mining approaches predict the crucial minority instances poorly. Unfortunately, many real-world data sets, such as those from health examination, inspection, credit fraud detection, spam identification and text mining, face this situation. In this study, we present a novel model called the "Information Granulation Based Data Mining Approach" to tackle this problem. The proposed methodology, which imitates the human ability to process information, acquires knowledge from Information Granules rather than from numerical data. This method also introduces a Latent Semantic Indexing based feature extraction tool that uses Singular Value Decomposition to dramatically reduce the data dimensions. In addition, several data sets from the UCI Machine Learning Repository are employed to demonstrate the effectiveness of our method. Experimental results show that our method can significantly increase the ability to classify imbalanced data.
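To make the imbalance problem concrete, the following is a minimal sketch of why overall accuracy can hide poor minority-class prediction. It is not taken from the paper: the 95/5 class split, the always-predict-majority baseline, the function names, and the choice of the geometric mean of per-class recalls (G-mean) as a summary measure are all illustrative assumptions, since the abstract does not state which evaluation measures were used.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives, treating `positive` as the minority class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def imbalance_report(y_true, y_pred, positive=1):
    """Return overall accuracy, per-class recall, and their geometric mean."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + tn + fp
    minority_recall = tp / (tp + fn) if (tp + fn) else 0.0
    majority_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "minority_recall": minority_recall,
        "majority_recall": majority_recall,
        "g_mean": math.sqrt(minority_recall * majority_recall),
    }


# Hypothetical data: 95 majority examples (label 0) and 5 minority examples (label 1).
y_true = [0] * 95 + [1] * 5
# A baseline that ignores the minority class and always predicts the majority label.
y_pred = [0] * 100

report = imbalance_report(y_true, y_pred)
print(report)
```

In this synthetic case the baseline is right on 95 of 100 examples, yet it identifies none of the minority instances, so a measure that accounts for both classes falls to zero.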

Original language: English
Pages (from-to): 3214-3227
Number of pages: 14
Journal: Information Sciences
Volume: 178
Issue number: 16
State: Published - 15 Aug 2008

Keywords

  • Data mining
  • Feed-forward neural network
  • Granular computing
  • Imbalanced data
  • Information granulation
  • Latent semantic indexing
