Sequence Pattern Mining 27
Genetic Programming (GP) [12], [13] improved the expression ability of
GA by evolving individuals as tree structures. Although GP enables a more
explicit representation of reference rules, it may suffer from the problems of
loose structure and bloating, especially in dynamic problems.
Recently, many data mining methods to deal with sequence databases
have been proposed to satisfy the increasing demand for efficient information
mining in time-related dynamic systems. A. K. H. Tung proposed the inter-
transaction association rule mining [14], which not only extracts relationship
within the transaction, but also uses an extended Apriori method to obtain
inter-transaction associations by defining and mining frequent intertransac-
tion itemsets(FITI). This method uses the sequential association between
transactions to represent temporal relations between transaction items. Since
the method is basically an Apriori-based mining method, it shares the same
disadvantage of not able to deal with the databases with a very large number
of attributes.
TPrefixSpan algorithm [15] is a kind of nonambiguous temporal pattern
mining and it can deal with temporal databases using temporal relationships
between two intervals which was proposed by Kam and Fu in 2000 [16].
This method deals with interval-based event data, however, it overcomes
the ambiguous problem with interval-based temporal mining by correctly
describing the temporal relationship between every pair of events. As a result,
the method is capable of building an unique temporal sequence for every time
sequence of interval events. However, since there are 14 kinds of relationships
between every pair of attributes, it would suffer from efficiency problems
when processing the database with a large number of attributes.
Recurrent neural networks and associative memory can also deal with the
time related database. Recurrent neural networks are used for extracting
rules [17] as a form of Deterministic Finite-state Automata. The recurrent
connection and time delay between neurons allows the network to generate
time-varying patterns. Associative memory is mainly a content-addressable
structure that maps specific input representations to specific output repre-
sentations, and the time delay between neurons also enables to mimic the
time related relationship patterns in real-time systems.
1.3 GNP-Based Data Mining Method
Genetic Network Programming (GNP) [18],[19],[20] we introduce in section 2
is an further extension of Genetic Algorithm (GA). GNP uses directed graph
structures as solutions. The directed graph of GNP contributes to creating
quite compact programs and implicitly memorizing past action sequences in
the network flows. Also, GNP can find solutions without the bloating problem
compared with GP, because of the fixed number of nodes in GNP.
Unlike the other methods mentioned in the above, GNP-based data mining
allows the GNP individuals to evolve and extract association rules in the rule