Efficient mining of a concise and lossless representation of high utility itemsets

Cheng Wei Wu*, Philippe Fournier-Viger, Philip S. Yu, S. Tseng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

35 Scopus citations

Abstract

Mining high utility itemsets from transactional databases is an important data mining task that refers to the discovery of itemsets with high utilities (e.g., high profits). Although several studies have addressed this task, current methods may present users with too many high utility itemsets, which degrades the execution time and memory efficiency of the mining task. To achieve high efficiency and provide users with a concise mining result, this paper proposes a novel framework for mining closed + high utility itemsets, which serve as a compact and lossless representation of high utility itemsets. We present an efficient algorithm called CHUD (Closed + High Utility itemset Discovery) for mining closed + high utility itemsets. Further, a method called DAHU (Derive All High Utility itemsets) is proposed to recover all high utility itemsets from the set of closed + high utility itemsets without accessing the original database. Results of experiments on real and synthetic datasets show that CHUD and DAHU are very efficient and achieve a massive reduction (up to 800 times in our experiments) in the number of high utility itemsets. In addition, when all high utility itemsets are recovered by DAHU, the approach combining CHUD and DAHU also outperforms the state-of-the-art algorithms for mining high utility itemsets.

Original language: English
Title of host publication: Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
Pages: 824-833
Number of pages: 10
State: Published, 1 Dec 2011
Event: 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada
Duration: 11 Dec 2011 to 14 Dec 2011

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 11th IEEE International Conference on Data Mining, ICDM 2011
Country: Canada
City: Vancouver, BC
Period: 11/12/11 to 14/12/11

Keywords

  • Closed high utility itemset
  • Frequent itemset
  • Lossless and concise representation
  • Utility mining
