Efficiently mining uncertain high-utility itemsets

Jerry Chun Wei Lin*, Wensheng Gan, Philippe Fournier-Viger, Tzung Pei Hong, Vincent Shin-Mu Tseng

*Corresponding author for this work

Research output: Contribution to journal › Article

18 Scopus citations

Abstract

Data mining consists of deriving implicit, potentially meaningful and useful knowledge from databases, such as information about the most profitable items. High-utility itemset mining (HUIM) has thus emerged as an important research topic in data mining. However, most HUIM algorithms can only handle precise data, although big data collected in real-life applications through experimental measurements or noisy sensors is often uncertain. In this paper, an algorithm named Mining Uncertain High-Utility Itemsets (MUHUI) is proposed to efficiently discover potential high-utility itemsets (PHUIs) in uncertain data. Based on the probability-utility-list (PU-list) structure, the MUHUI algorithm mines PHUIs directly without generating candidates, and it avoids constructing PU-lists for numerous unpromising itemsets by applying several pruning strategies, which greatly improves its performance. Extensive experiments conducted on both real-life and synthetic datasets show that MUHUI significantly outperforms the state-of-the-art PHUI-List algorithm in terms of efficiency and scalability, and that it scales well when mining PHUIs in large-scale uncertain datasets.
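The abstract does not spell out when an itemset counts as a PHUI, so the following Python sketch rests on stated assumptions rather than on the paper's definitions. It assumes each transaction carries a single existence probability, and it treats an itemset as a PHUI when both its summed utility and its summed probability over the transactions containing it reach user-given thresholds. The data, the function names and the threshold parameters are hypothetical. The sketch enumerates every itemset by brute force, so it is not MUHUI: it uses no PU-lists and no pruning, and only illustrates the kind of output such an algorithm is meant to produce.

```python
from itertools import combinations

# Hypothetical toy database: each transaction has an existence probability
# (assumed tuple-level uncertainty) and purchased quantities per item.
TRANSACTIONS = [
    {"prob": 0.9, "items": {"a": 2, "b": 1}},
    {"prob": 0.6, "items": {"a": 1, "c": 3}},
    {"prob": 0.4, "items": {"a": 1, "b": 2, "c": 1}},
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"a": 5, "b": 3, "c": 1}


def utility_and_probability(itemset, transactions, unit_profit):
    """Sum utility and probability over the transactions containing itemset."""
    total_utility = 0
    total_prob = 0.0
    for t in transactions:
        if all(item in t["items"] for item in itemset):
            total_utility += sum(t["items"][i] * unit_profit[i] for i in itemset)
            total_prob += t["prob"]
    return total_utility, total_prob


def naive_phuis(transactions, unit_profit, min_utility, min_probability):
    """Brute-force baseline: test every non-empty itemset against both thresholds."""
    items = sorted({i for t in transactions for i in t["items"]})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u, p = utility_and_probability(itemset, transactions, unit_profit)
            if u >= min_utility and p >= min_probability:
                result[itemset] = (u, p)
    return result


if __name__ == "__main__":
    for itemset, (u, p) in naive_phuis(TRANSACTIONS, UNIT_PROFIT, 12, 1.2).items():
        print(itemset, u, round(p, 2))
```

On this toy data, the itemset (a, b, c) reaches the utility threshold but appears only in a low-probability transaction, so it is rejected; (b) is probable enough but not profitable enough. Because the number of itemsets tested grows exponentially with the number of items, approaches that avoid candidate generation and prune unpromising itemsets early, as the abstract describes for MUHUI, matter at scale.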

Original language: English
Pages (from-to): 2801-2820
Number of pages: 20
Journal: Soft Computing
Volume: 21
Issue number: 11
DOI: https://doi.org/10.1007/s00500-016-2159-1
State: Published - 1 Jun 2017

Keywords

  • Data mining
  • High-utility itemset
  • Large-scale dataset
  • Pruning strategies
  • Uncertainty

Cite this

Lin, J. C. W., Gan, W., Fournier-Viger, P., Hong, T. P., & Tseng, V. S-M. (2017). Efficiently mining uncertain high-utility itemsets. Soft Computing, 21(11), 2801-2820. https://doi.org/10.1007/s00500-016-2159-1