Mining useful information from large databases has become an important research area in recent years. Among the classes of knowledge derived, sequential patterns can be applied in many domains, such as market analysis, web click streams, and biological data. The fast updated sequential pattern tree (FUSP-tree) algorithm was proposed to update discovered sequential patterns in incremental mining. However, it must rescan the original database to maintain the discovered sequential patterns. This study proposes the PreFUSP-TREE-INS algorithm, which is based on the pre-large concept and maintains discovered sequential patterns without rescanning the original database until the cumulative number of newly added customer sequences exceeds a safety bound. Using pre-large sequences reduces the execution time for reconstructing the tree when old or new customer sequences are added to the original database. The pre-large sequences are defined by a lower and an upper support threshold, which prevent sequences from moving directly from large to small and vice versa. Experiments are conducted to show the performance of the proposed algorithm for various minimum support thresholds and ratios of inserted sequences.
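The pre-large idea summarized above can be illustrated with a minimal Python sketch. It is not the PreFUSP-TREE-INS algorithm itself and does not build any tree; it only shows how two thresholds split sequences into large, pre-large, and small, and how a safety bound could gate rescanning. The function names (`classify`, `safety_bound`, `needs_rescan`) are hypothetical, and the bound formula floor((Su - Sl) * d / (1 - Su)) is an assumption taken from earlier pre-large itemset work, since the abstract does not state it. Thresholds are passed as `Fraction` values to avoid floating-point rounding in the floor.

```python
from fractions import Fraction
from math import floor


def classify(count, total, lower, upper):
    """Label a sequence by its support ratio count / total.

    lower and upper are the two support thresholds (lower < upper).
    """
    support = Fraction(count, total)
    if support >= upper:
        return "large"
    if support >= lower:
        return "pre-large"
    return "small"


def safety_bound(total, lower, upper):
    """Number of insertions tolerated before a rescan.

    ASSUMPTION: uses floor((Su - Sl) * d / (1 - Su)), the bound from
    earlier pre-large itemset mining; the abstract gives no formula.
    """
    return floor((upper - lower) * total / (1 - upper))


def needs_rescan(inserted_so_far, total, lower, upper):
    """True once cumulative insertions exceed the safety bound."""
    return inserted_so_far > safety_bound(total, lower, upper)


if __name__ == "__main__":
    lower, upper = Fraction(2, 5), Fraction(1, 2)  # 40% and 50%
    print(classify(45, 100, lower, upper))
    print(safety_bound(100, lower, upper))
    print(needs_rescan(21, 100, lower, upper))
```

Because a small sequence must first pass through the pre-large band before it can become large, the counts kept for pre-large sequences are enough to update the results while the cumulative number of insertions stays within the bound.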
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Vision and Pattern Recognition
- Artificial Intelligence