Sequential pattern mining (SPAM) is one of the most interesting research problems in data mining. This paper defines a new research problem: mining data streams for sequential patterns. A data stream is an unbounded sequence of data elements arriving at a rapid rate. Because of these characteristics, mining sequential patterns from data streams is more difficult than mining them from large static databases, which makes it a challenging problem in data mining and knowledge discovery. To address it, an efficient single-pass algorithm, called IncSpam (Incremental Sequential pattern mining of streaming itemset-sequences), is proposed for discovering sequential patterns from streaming itemset-sequences over extended sliding window models. Within the IncSpam framework, a new sliding window model, called CSW-BV (Customer Sliding Window with Bit-Vectors), and an extended lexicographic tree-based data structure, called LesSeq-Tree (Lexicographic Sequence Tree), are developed to reduce the time and memory needed to slide the windows over the streaming data and to maintain all sequential patterns of the current sliding windows. Experimental results show that the proposed method is an efficient single-pass algorithm for mining sequential patterns from streaming data.
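The abstract does not give the internals of CSW-BV, so the following is only a minimal sketch of the general idea of a per-customer sliding window kept as bit-vectors, not the paper's data structure. The class `CustomerWindow`, its methods, and the helper `support` are hypothetical names introduced for illustration. Each item is assumed to own one integer bit-vector in which bit 0 marks the newest itemset of the window, so sliding the window becomes a shift and a mask.

```python
from collections import defaultdict


class CustomerWindow:
    """Hypothetical per-customer window: one integer bit-vector per item."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1
        # item -> bit-vector; bit 0 is the newest itemset, higher bits are older
        self.bits = defaultdict(int)

    def slide(self, itemset):
        """Append a new itemset and drop the oldest one if the window is full."""
        for item in list(self.bits):
            # Age the item by one position; the oldest bit falls off via the mask.
            self.bits[item] = (self.bits[item] << 1) & self.mask
            if self.bits[item] == 0:
                del self.bits[item]  # item no longer occurs in the window
        for item in itemset:
            self.bits[item] |= 1

    def contains_sequence(self, first, second):
        """True if `first` occurs in an itemset strictly older than one holding `second`."""
        older = self.bits.get(first, 0)
        if older == 0:
            return False
        oldest = older.bit_length() - 1  # position of the oldest occurrence of `first`
        newer_positions = (1 << oldest) - 1  # all positions newer than that occurrence
        return (self.bits.get(second, 0) & newer_positions) != 0


def support(windows, first, second):
    """Number of customers whose current window contains the sequence <first, second>."""
    return sum(w.contains_sequence(first, second) for w in windows.values())
```

Under these assumptions, sliding costs one shift per item currently in the window, and a two-item sequence check is a single mask and AND, which is the kind of saving a bit-vector representation is meant to give.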
|Number of pages||22|
|Journal||International Journal of Innovative Computing, Information and Control|
|State||Published - Mar 2012|
- Data streams; Data mining; Data stream mining; Sequential pattern mining