High utility itemset (HUI) mining is an emerging topic in data mining. It refers to discovering all itemsets whose utility meets a user-specified minimum utility threshold, min-util. Setting min-util appropriately is difficult, however, and finding a suitable value by trial and error is tedious. If min-util is set too low, too many HUIs are generated, which can make the mining process very inefficient. If it is set too high, it is likely that no HUIs will be found. In this paper, we address these issues by proposing a new framework for top-k high utility itemset mining, where k is the desired number of HUIs to be mined. Two efficient algorithms, TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase), are proposed for mining such itemsets without the need to set min-util. We provide a structural comparison of the two algorithms and discuss their advantages and limitations. Empirical evaluations on both real and synthetic datasets show that the performance of the proposed algorithms is close to that of the optimal case of state-of-the-art utility mining algorithms.
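To make the problem statement concrete, the following is a minimal brute-force sketch of top-k high utility itemset mining. It is not TKU or TKO, whose internals the abstract does not describe; it only illustrates the task of returning the k itemsets with the highest utility without a user-supplied min-util. The toy transactions, the unit profits, and the function names `utility` and `top_k_hui` are all hypothetical, and the sketch assumes the common definition of utility as purchased quantity times unit profit, summed over the transactions that contain the whole itemset.

```python
from itertools import combinations
import heapq

# Hypothetical toy database: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 1, "d": 1},
]

# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing all items of `itemset`."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def top_k_hui(transactions, unit_profit, k):
    """Return the k itemsets with the highest utility by exhaustive enumeration.

    A min-heap of size k holds the best itemsets seen so far. Its smallest
    utility acts as an internal threshold that rises during the search, so
    the caller only supplies k and never a min-util value.
    """
    items = sorted({item for t in transactions for item in t})
    heap = []  # entries are (utility, itemset) pairs
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if u == 0:
                continue  # itemset occurs in no transaction
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, itemset))
    # Highest utility first; ties broken by itemset for a stable order.
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    for u, itemset in top_k_hui(transactions, unit_profit, k=2):
        print(itemset, u)
```

The enumeration is exponential in the number of items, so this is only usable on toy data. In this example the itemset `("c",)` has utility 5 while its superset `("a", "c")` has utility 19, which shows that utility does not shrink as itemsets grow. The support-based pruning familiar from frequent itemset mining therefore does not carry over directly.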
|Number of pages|14|
|Journal|IEEE Transactions on Knowledge and Data Engineering|
|State|Published - 1 Jan 2016|
- high utility itemset mining
- top-k high utility itemset mining
- top-k pattern mining
- utility mining