Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits

Hua-Fu Li, Hsin-Yun Huang, Suh-Yin Lee

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Mining utility itemsets from data streams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Within the sliding-window framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform the existing approaches for discovering high-utility itemsets from data streams over sliding windows. In addition, we propose adapted versions of MHUI-BIT and MHUI-TID to handle the case of mining utility itemsets with negative item profits. Experiments show that these variants of MHUI-BIT and MHUI-TID are efficient approaches for mining high-utility itemsets with negative item profits over transaction-sensitive sliding windows of data streams.
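To make the Bitvector idea concrete, the following is a minimal Python sketch of a transaction-sensitive sliding window that keeps one bitvector per item and uses a bitwise AND to locate the transactions containing a given itemset. It is an illustration under stated assumptions, not the MHUI-BIT algorithm from the paper: the class name `BitvectorWindow`, its methods, and the sample profits are all hypothetical, item profits are assumed positive, and the LexTree-2HTU summary structure and any pruning are omitted.

```python
from collections import deque


class BitvectorWindow:
    """Toy sliding window over transactions with one bitvector per item.

    Hypothetical illustration only; not the authors' MHUI-BIT implementation.
    """

    def __init__(self, window_size, profits):
        self.window_size = window_size
        self.profits = profits   # item -> unit profit (assumed positive)
        self.window = deque()    # each transaction: dict item -> quantity
        self.bits = {}           # item -> int; bit i set if i-th transaction has the item

    def add(self, transaction):
        if len(self.window) == self.window_size:
            # Slide: drop the oldest transaction (bit 0) by shifting right.
            self.window.popleft()
            self.bits = {i: b >> 1 for i, b in self.bits.items() if b >> 1}
        self.window.append(transaction)
        pos = len(self.window) - 1
        for item in transaction:
            self.bits[item] = self.bits.get(item, 0) | (1 << pos)

    def utility(self, itemset):
        # AND the item bitvectors to find transactions containing every item.
        common = (1 << len(self.window)) - 1
        for item in itemset:
            common &= self.bits.get(item, 0)
        total = 0
        for pos, t in enumerate(self.window):
            if (common >> pos) & 1:
                total += sum(t[i] * self.profits[i] for i in itemset)
        return total


w = BitvectorWindow(window_size=3, profits={"a": 3, "b": 5, "c": 1})
for t in ({"a": 2, "b": 1}, {"a": 1, "c": 4}, {"a": 1, "b": 2, "c": 1}):
    w.add(t)
print(w.utility(("a", "b")))  # 24: transactions 1 and 3 contain both items
w.add({"b": 3})               # window slides; the first transaction expires
print(w.utility(("a", "b")))  # 13
```

An itemset would count as high-utility in this sketch when its window utility meets a user-chosen threshold. A TID-list variant would store, for each item, the list of transaction identifiers in the window instead of a bitvector and intersect those lists.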
Original language: English
Pages (from-to): 495-522
Number of pages: 28
Journal: Knowledge and Information Systems
Volume: 28
Issue number: 3
State: Published - Sep 2011

Keywords

  • Data mining; Data streams; Utility mining; High-utility itemsets; Utility itemset with positive item profits; Utility itemset with negative item profits
