A disk-based mining algorithm for frequent pattern discovery from big data in distributed computing environments

Kawuu W. Lin*, Sheng Hao Chung, Chun Yuan Hsiao, Chun-Cheng Lin, Pei Ling Chen

*Corresponding author for this work

Research output: Contribution to journal (Article)

Abstract

In distributed computing environments, frequent pattern mining across multiple computing nodes can greatly improve mining efficiency. However, memory limitations may interrupt the kernel and the computing nodes when a frequent-pattern (FP) tree is built recursively or the FP-growth algorithm is run. In this paper, we propose disk-based FP-tree generation and node-based clustering mechanisms to address this insufficient-memory problem. Results from empirical evaluations show that the proposed method delivers excellent scalability.

Original language: English
Pages (from-to): 1259-1268
Number of pages: 10
Journal: Journal of Internet Technology
Volume: 17
Issue number: 6
State: Published - 1 Jan 2016

Keywords

  • Clustering
  • Data mining
  • Distributed computing
  • Frequent pattern mining
