and the sale periods of itemsets. Mining SPHUIs provides the
benefit of revealing the most popular itemsets in each short
period that also have high utilities. This type of patterns is more
interesting than HUIs for real-life applications such as market
basket analysis.
2. An efficient two-phase short-period high-utility itemset mining
(SPHUI
TP
) algorithm is presented to discover SPHUIs in a level-
wise manner. In its first phase, the algorithm utilizes an upper-
bound on the utilities of itemsets to find a small set of candidate
SPHUIs. Then, in the second phase, the algorithm identifies the
actual SPHUIs in this set of candidates.
3. To speed up the baseline SPHUI
TP
algorithm, two pruning
strategies are further introduced. The resulting algorithms are
respectively named SPHUI
MT
and SPHUI
TID
. The two strategies
are designed to reduce the search space for mining SPHUIs.
4. Experiments on various real-life and synthetic datasets are then
conducted to evaluate the performance of the three designed
approaches in terms of runtime, memory usage, number of can-
didates, number of patterns found, and scalability, on several
datasets.
The rest of this paper is organized as follows. Related work is
reviewed in Section 2. Section 3 introduces preliminaries and
defines the problem of SPHUIM. Section 4 provides details about
the proposed algorithm and presents the two pruning strategies.
A detailed example of how the proposed algorithm is applied on
a small database is presented in Section 5. Results of an extensive
experimental evaluation are discussed in Section 6. Finally, Sec-
tion 7 provides a conclusion and a discussion.
2. Related work
In data mining, the tasks of Association-rule mining (ARM) and
frequent itemset mining (FIM) are considered as fundamental, and
have thus attracted the interest of numerous researchers [2,13,14].
Agrawal et al. [1] first proposed the Apriori algorithm for discover-
ing association rules in a level-wise manner. The Apriori algorithm
first generates a large number of candidates (FIs) by considering a
minimum support threshold. Then, these itemsets are used to
reveal the association rules that respect a minimum confidence
threshold. This process allows discovering all association rules that
meet the minimum support and confidence thresholds set by the
user. However, this process is very time-consuming and can result
in long execution times and the consumption of a large amount of
memory. To address this issue, Han et al. designed a pattern-
growth algorithm called FP-Growth [2], which builds a compressed
tree structure using frequent items, and then derives all FIs from
the constructed tree structure without generating candidates.
Thereafter, several other algorithms have been proposed for FIM
and HUIM. But most of them are either based on the level-wise
or the pattern-growth approaches. In general, pattern-growth
methods perform better than level-wise approaches but are more
difficult to implement. Researchers have also proposed numerous
variations of the FIM problem such as sequential pattern mining
[15], top-k frequent pattern mining [14], frequent closed itemset
mining [3], maximal frequent itemset mining [13] and high utility
sequential pattern mining [16,17].
HUIM can be considered as an extension of ARM. It has been
extensively studied in recent decades [18–20] since it has numer-
ous applications in various fields of science and engineering [10].A
typical application of HUIM is market basket analysis. In this appli-
cation, HUIM provides crucial information to managers and deci-
sion makers for designing profitable sale strategies or taking
other strategic decisions. In HUIM, the purchase quantities in
transactions and the unit profits of items are considered to find
the itemsets that are highly profitable, called the high-utility item-
sets (HUIs). These itemsets are useful for managers and retailers as
they are the patterns that yield a high profit. To extract the set of
high-utility itemsets from a database, several algorithms have been
designed. Chan et al. [4] presented a framework to mine the top-k
closed utility patterns based on business objectives. Their approach
discovers not only frequent itemsets (FIs) but also HUIs. Yao et al.
[21] then defined utility mining as the problem of discovering prof-
itable itemsets in transactional databases by considering both the
purchase quantities of items in transactions and their unit profits.
A key difference between the tasks of FIM and HUIM is that the
downward closure property, also called the Apriori property, does
not hold in HUIM [21]. The Apriori property indicates that the
occurrence frequency of an itemset cannot be less than that of its
supersets. This property is widely used in FIM to reduce the num-
ber of candidate FIs.
In HUIM, this property does not hold since the utility of an item-
set can be less than, greater or equal to the utility of any of its
supersets. To obtain a downward closure property in HUIM and
be able to reduce the search space for mining HUIs, Liu et al. [22]
introduced a model called the transaction-weighted utilization
(TWU). This model consists of calculating an upper-bound on the
utility of itemsets that is downward-closed, called the
transaction-weighted downward closure (TWDC) property. A
two-phase algorithm was also designed to extract HUIs in transac-
tional databases. The first phase consists of mining the high
transaction-weighted utilization itemsets (HTWUIs) in a level-
wise manner. Then, the second phase consists of identifying the
HUIs among the HTWUIs. The aforementioned studies utilize a
generate-and-test approach where itemsets are discovered level-
by-level. This strategy requires to generate a large number of can-
didates. Thus, these algorithms have high time and space
complexities.
Ahmed et al. [23] then designed a tree-based structure named
HUP-tree and a corresponding algorithm to mine HUIs. The algo-
rithm utilizes the TWU model to find 1-HTWUIs (HTWUIs contain-
ing only one item) and keeps information in its tree structure to
support incremental and interactive mining. Lin et al. [24] then
presented the high-utility pattern (HUP)-growth algorithm for
mining HUIs. This algorithm is based on the TWU model and a
novel HUP-tree structure. The tree structure maintains information
about the purchase quantities of items in transactions to improve
the mining performance and discover HUIs without candidate gen-
eration. Tseng et al. proposed the UP-growth [25] and UP-growth+
[10] algorithms and several pruning strategies to mine the set of
HUIs based on their designed UP-tree structure. Nevertheless, the
above approaches have high time and space complexity, and the
search space in terms of number of itemsets considered by these
algorithms is very large.
To speed up the discovery of HUIs, Lan et al. [26] developed an
efficient projection-based indexing approach and a pruning strat-
egy to reduce the number of candidate itemsets. Liu et al. [27] pro-
posed a novel utility-list structure to keep the information
required for mining HUIs. They developed an algorithm named
HUI-Miner to directly discover HUIs without candidate generation.
The utility-list structure stores important information for identify-
ing HUIs: 1. the IDs of transactions where an itemset appears; 2.
the actual utilities of the itemset in these transactions; and 3. the
utility of items that could be appended to the itemset in these
transactions. Fournier-Viger et al. [18] further designed an Esti-
mated Utility Co-occurrence Structure (EUCS) to store information
about the relationships between 2-itemsets and proposed the FHM
algorithm to mine HUIs based on the utility-list structure. By rely-
ing on the EUCS, 2-itemsets can be easily discovered and
unpromising candidates can also be easily pruned. The develop-
ment of HUIM algorithms is a very active research area, and novel
J.C.-W. Lin et al. / Advanced Engineering Informatics 33 (2017) 29–43
31