be a k-itemset, where k is the length of the itemset. An itemset X is
said to be contained in a transaction T
q
if X # T
q
. A minimum
average-utility threshold d is set according to the user’s preference
(a positive integer). An example quantitative database is shown in
Table 1, which will be used as running example for the rest of this
paper. This database contains six transactions and six distinct
items, denoted with letters from (A)to(F). The profit table indi-
cates the unit profit of each item appearing in the database, and
is shown in Table 2. In the running example, the minimum
average-utility threshold is set to (d = 16%).
Definition 1. The average-utility of an item i
j
in a transaction T
q
is
denoted as auði
j
; T
q
Þ, and defined as:
auði
j
; T
q
Þ¼
qði
j
; T
q
Þprði
j
Þ
1
; ð1Þ
where qði
j
; T
q
Þ is the quantity of i
j
in T
q
, and prði
j
Þ is the unit profit
value of i
j
.
For example, the average-utility of items (A), (B), (C), (D), and (F)
in T
1
are respectively calculated as auðA; T
1
Þ¼
15
1
ð¼ 5Þ,
auðB; T
1
Þ¼
61
1
ð¼ 6Þ, auðC; T
1
Þ¼
32
1
ð¼ 6Þ, auðD; T
1
Þ¼
33
1
ð¼ 9Þ,
and auðF; T
1
Þ¼
61
1
ð¼ 6Þ.
Definition 2. The average-utility of a k-itemset X in a transaction
T
q
is denoted as auðX; T
q
Þ, and defined as:
auðX; T
q
Þ¼
P
i
j
2X ^X # T
q
qði
j
; T
q
Þprði
j
Þ
jXj
¼
P
i
j
2X ^X # T
q
qði
j
; T
q
Þprði
j
Þ
k
; ð2Þ
where k is the number of items in X.
For example, the average-utility of itemsets ðABÞ and ðABCÞ in T
1
are respectively calculated as auðABÞ =
15þ61
2
= (5.5) and
auðABCÞ
=
15þ61þ32
3
(=5.66).
Definition 3. The average-utility of an itemset X in D is denoted as
auðXÞ, and is defined as:
auðXÞ¼
X
X # T
q
^T
q
2D
auðX; T
q
Þ: ð3Þ
For example, the average-utilities of itemsets ðABÞ and ðABCÞ in
the database depicted in Table 1 are respectively calculated as
auðABÞ = auðAB; T
1
Þ + auðAB; T
4
Þ + auðAB; T
5
Þ = 5.5 + 7 + 12 (=24.5),
and auðABCÞ
= auðABC; T
1
Þ + auðABC; T
4
Þ + auðABC; T
5
Þ = 5.66 + 6.66
+ 10 (=22.32).
Definition 4. The transaction utility of a transaction T
q
is denoted
as tuðT
q
Þ, and defined as:
tuðT
q
Þ¼
X
i
j
2T
q
uði
j
; T
q
Þ: ð4Þ
For example, the utilities of transactions in Table 1 are respec-
tively calculated as tu ðT
1
Þ = 5 + 6 + 6 + 9 + 6 (=32), tuðT
2
Þ(=16),
tuðT
3
Þ(=22), tuðT
4
Þ(=28), tuðT
5
Þ(=37), and tuðT
6
Þ(=15).
Definition 5. The total utility of a database D is denoted as TU, and
defined as the sum of all transaction utilities, that is:
TU ¼
X
T
q
2D
tuðT
q
Þ: ð5Þ
For example, the total utility in the running example of Table 1
is calculated as TU = 32 + 16 + 22 + 28 + 37 + 15 (=150).
3.2. Problem statement
The problem of mining high average-utility itemsets is to dis-
cover the complete set of high average-utility itemsets (HAUIs).
An itemset X is an HAUI in a database D if its utility is no less than
the minimum average-utility count, specified by the user. The set
of HAUIs is thus formally defined as:
HAUIs fXjauðXÞ P TU dg: ð6Þ
4. The proposed HAUI-Miner algorithm
In this paper, we design an average-utility (AU)-list structure to
store the information needed by the mining process. Moreover, an
algorithm named HAUI-Miner is also developed to mine HAUIs
more efficiently than previous works. In traditional association rule
mining (ARM), the downward closure (DC) property is used to
reduce the search space and avoid the problem of the combinato-
rial explosion for mining HAUIs. In HAUIM, this property does not
hold for the average utility measure. To restore this property and
effectively reduce the search space, this paper introduces a
transaction-maximum utility downward closure (TMUDC) prop-
erty. It allows to prune unpromising candidates early, and thus
to reduce the search space to efficiently discover the actual HAUIs.
Definition 6. The transaction-maximum utility of a transaction T
q
is denoted as tmuðT
q
Þ, and defined as the maximum utility of items
in a transaction T
q
, that is:
tmuðT
q
Þ¼maxðfuði
j
Þji
j
2 T
q
gÞ: ð7Þ
For example, the transaction-maximum utility of T
1
is calcu-
lated as muðT
1
Þ = maxf5; 6; 6; 9; 6gÞ(=9). The transaction-
maximum utilities of the other transactions are calculated in the
same say, and are shown in Table 3.
Definition 7. The average-utility upper-bound of an itemset X is
denoted as auubðXÞ, and defined as the sum of the transaction-
maximum utilities of transactions containing X, that is:
auubðXÞ¼
X
X # T
q
^T
q
2D
tmuðT
q
Þ: ð8Þ
Table 1
A quantitative database.
TID Transaction (item, quantity)
1 A:1, B:6, C:3, D:3, F:6
2 B:2, C:3, E:2
3 A:2, C:1, D:2, E:1
4 A:1, B:9, C:3, D:2, F:2
5 A:3, B:9, C:3, D:1, E:1
6 C:4, D:1, E:1
Table 2
A profit table.
Item Profit
A 5
B 1
C 2
D 3
E 4
F 1
J.C.-W. Lin et al. / Advanced Engineering Informatics 30 (2016) 233–243
235