Fast algorithms for mining high-utility itemsets with various discount strategies

Jerry Chun-Wei Lin*, Wensheng Gan, Philippe Fournier-Viger, Tzung Pei Hong, S. Tseng

*Corresponding author for this work

Research output: Contribution to journalArticle

21 Scopus citations

Abstract

In recent years, mining high-utility itemsets (HUIs) has emerged as a key topic in data mining. It consists of discovering sets of items generating a high profit in a transactional database by considering both purchase quantities and unit profits of items. Many algorithms have been proposed for this task. However, most of them assume the unrealistic assumption that unit profits of items remain unchanged over time. But in real-life, the profit of an item or itemset varies as a function of cost prices, sales prices and sale strategies. Recently, a three-phase algorithm has been proposed to mine HUIs, while considering that each item may have different discount strategies. However, the complete set of HUIs cannot be retrieved based on the traditional TWU model with its defined discount strategies. Moreover, it suffers from the well-known drawbacks of Apriori-based algorithms such as maintaining a huge amount of candidates in memory and repeatedly performing time-consuming database scans. In this paper, a HUI-DTP algorithm for mining HUIs when considering discount strategies of items is introduced. The HUI-DTP is designed as a two-phase algorithm to mine the complete set of HUIs based on a novel downward closure property and a vertical TID-list structure. Furthermore, the HUI-DMiner is an algorithm relying on a compact data structure (Positive-and-Negative Utility-list, PNU-list) and properties of two new pruning strategies to efficiently discover HUIs without candidate generation, while considerably reducing the size of the search space. Moreover, a strategy named Estimated Utility Co-occurrence Strategy which stores the relationships between 2-itemsets is also applied in the improved HUI-DEMiner algorithm to speed up computation. An extensive experimental study carried on several real-life datasets shows that the proposed algorithms outperform the previous best algorithm in terms of runtime, memory consumption and scalability.

Original languageEnglish
Pages (from-to)109-126
Number of pages18
JournalAdvanced Engineering Informatics
Volume30
Issue number2
DOIs
StatePublished - 1 Apr 2016

Keywords

  • Discount strategies
  • Downward closure property
  • High-utility itemsets
  • PNU-list
  • Pruning strategies

Fingerprint Dive into the research topics of 'Fast algorithms for mining high-utility itemsets with various discount strategies'. Together they form a unique fingerprint.

  • Cite this