Efficient Algorithm for Mining High Average-Utility Itemsets


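"An efficient algorithm to mine high average-utility itemsets" is a research paper on algorithms for high average-utility itemset mining (HAUIM). In data mining, traditional association rule learning is usually based on frequent itemsets and relies mainly on support, that is, how often an itemset occurs in the dataset. Support does not always reflect the actual value or utility of an itemset, particularly in applications with complex cost and profit structures. Researchers have therefore turned to utility-driven mining methods, of which high average-utility itemset mining is one.

The paper argues that mining high average-utility itemsets (HAUIs) matters more as data mining applications multiply. Average utility accounts for both the total utility of an itemset and its size, so it evaluates itemsets of different lengths more fairly than total utility alone. The difficulty is that computing and mining HAUIs can be costly in both time and memory.

The paper proposes a new algorithm to address this. It uses a list structure to store and update the utility information of itemsets, which is intended to lower memory requirements and speed up computation, and it applies a pruning strategy to skip unpromising parts of the search space.

The paper was published in 2016 and had presumably been peer reviewed. Its keywords include "high average-utility itemsets", "list structure", and "data mining", which indicates that the work centers on using a list structure to mine HAUIs efficiently. In short, it examines the cost of HAUI mining and combines a list structure with pruning to reduce that cost, which is relevant to evaluating and mining valuable patterns more accurately in practical applications.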

be a k-itemset, where k is the length of the itemset. An itemset $X$ is said to be contained in a transaction $T_q$ if $X \subseteq T_q$. A minimum average-utility threshold $\delta$ is set according to the user's preference (a positive integer). An example quantitative database is shown in Table 1, which will be used as the running example for the rest of this paper. This database contains six transactions and six distinct items, denoted by the letters (A) to (F). The profit table indicates the unit profit of each item appearing in the database, and is shown in Table 2. In the running example, the minimum average-utility threshold is set to $\delta = 16\%$.

Definition 1. The average-utility of an item $i_j$ in a transaction $T_q$ is denoted as $au(i_j, T_q)$, and defined as:

$$au(i_j, T_q) = \frac{q(i_j, T_q) \times pr(i_j)}{1}, \tag{1}$$

where $q(i_j, T_q)$ is the quantity of $i_j$ in $T_q$, and $pr(i_j)$ is the unit profit value of $i_j$.

For example, the average-utilities of items (A), (B), (C), (D), and (F) in $T_1$ are respectively calculated as $au(A, T_1) = \frac{1 \times 5}{1}\ (= 5)$, $au(B, T_1) = \frac{6 \times 1}{1}\ (= 6)$, $au(C, T_1) = \frac{3 \times 2}{1}\ (= 6)$, $au(D, T_1) = \frac{3 \times 3}{1}\ (= 9)$, and $au(F, T_1) = \frac{6 \times 1}{1}\ (= 6)$.

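Definition 2. The average-utility of a k-itemset $X$ in a transaction $T_q$ is denoted as $au(X, T_q)$, and defined as:

$$au(X, T_q) = \frac{\sum_{i_j \in X \wedge X \subseteq T_q} q(i_j, T_q) \times pr(i_j)}{|X|} = \frac{\sum_{i_j \in X \wedge X \subseteq T_q} q(i_j, T_q) \times pr(i_j)}{k}, \tag{2}$$

where k is the number of items in $X$.

For example, the average-utilities of itemsets (AB) and (ABC) in $T_1$ are respectively calculated as $au(AB, T_1) = \frac{1 \times 5 + 6 \times 1}{2}\ (= 5.5)$ and $au(ABC, T_1) = \frac{1 \times 5 + 6 \times 1 + 3 \times 2}{3}\ (= 5.66)$.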
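Definitions 1 and 2 can be checked against the running example with a few lines of Python. This is only an illustrative sketch: the function name `au_in_transaction` and the convention of returning 0 for an itemset that is not contained in the transaction are assumptions made here, not notation from the paper. Exact fractions are used so that 17/3 is not rounded.

```python
from fractions import Fraction

# Unit profits (Table 2) and transaction T1 (Table 1) from the running example.
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}
T1 = {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6}


def au_in_transaction(itemset, transaction, profit):
    """Average-utility of a non-empty itemset in one transaction (Definition 2).

    Assumption of this sketch: returns 0 when the itemset is not contained
    in the transaction, so such transactions contribute nothing to a sum.
    """
    items = set(itemset)
    if not items.issubset(transaction):
        return Fraction(0)
    total = sum(transaction[i] * profit[i] for i in items)
    return Fraction(total, len(items))


au_a = au_in_transaction("A", T1, PROFIT)      # one item: reduces to Definition 1
au_ab = au_in_transaction("AB", T1, PROFIT)    # (1*5 + 6*1) / 2
au_abc = au_in_transaction("ABC", T1, PROFIT)  # (1*5 + 6*1 + 3*2) / 3
print(au_a, au_ab, au_abc)
```

The exact value of the last result is 17/3, which the text reports truncated as 5.66.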

Definition 3. The average-utility of an itemset $X$ in $D$ is denoted as $au(X)$, and is defined as:

$$au(X) = \sum_{X \subseteq T_q \wedge T_q \in D} au(X, T_q). \tag{3}$$

For example, the average-utilities of itemsets (AB) and (ABC) in the database depicted in Table 1 are respectively calculated as $au(AB) = au(AB, T_1) + au(AB, T_4) + au(AB, T_5) = 5.5 + 7 + 12\ (= 24.5)$, and $au(ABC) = au(ABC, T_1) + au(ABC, T_4) + au(ABC, T_5) = 5.66 + 6.66 + 10\ (= 22.32)$.
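The database-level sum of Definition 3 can be sketched as follows, using the data of Tables 1 and 2. The dictionary layout and the function name `au` are choices made for this illustration. Note that the value 22.32 in the text comes from adding the truncated terms 5.66 and 6.66; without truncation the sum is 67/3, about 22.33.

```python
# Running example: unit profits (Table 2) and transactions (Table 1).
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
}


def au(itemset, db, profit):
    """Average-utility of a non-empty itemset in the database (Definition 3)."""
    items = set(itemset)
    total = 0.0
    for transaction in db.values():
        if items.issubset(transaction):  # only transactions containing X count
            utility = sum(transaction[i] * profit[i] for i in items)
            total += utility / len(items)
    return total


au_ab = au("AB", DB, PROFIT)    # contained in T1, T4, T5
au_abc = au("ABC", DB, PROFIT)  # contained in T1, T4, T5
print(au_ab, round(au_abc, 2))
```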

Definition 4. The transaction utility of a transaction $T_q$ is denoted as $tu(T_q)$, and defined as:

$$tu(T_q) = \sum_{i_j \in T_q} u(i_j, T_q). \tag{4}$$

For example, the utilities of the transactions in Table 1 are respectively calculated as $tu(T_1) = 5 + 6 + 6 + 9 + 6\ (= 32)$, $tu(T_2)\ (= 16)$, $tu(T_3)\ (= 22)$, $tu(T_4)\ (= 28)$, $tu(T_5)\ (= 37)$, and $tu(T_6)\ (= 15)$.

Definition 5. The total utility of a database $D$ is denoted as $TU$, and defined as the sum of all transaction utilities, that is:

$$TU = \sum_{T_q \in D} tu(T_q). \tag{5}$$

For example, the total utility in the running example of Table 1 is calculated as $TU = 32 + 16 + 22 + 28 + 37 + 15\ (= 150)$.
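The transaction utilities and the total utility of Definitions 4 and 5 can be reproduced with a short sketch over the data of Tables 1 and 2. It assumes that the utility of an item in a transaction is its quantity times its unit profit, as in the worked example for $T_1$; the names `tu`, `tus`, and `TU` are chosen here for illustration.

```python
# Running example: unit profits (Table 2) and transactions (Table 1).
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
}


def tu(transaction, profit):
    """Transaction utility: sum of quantity * unit profit over its items."""
    return sum(qty * profit[item] for item, qty in transaction.items())


tus = {tid: tu(t, PROFIT) for tid, t in DB.items()}  # Definition 4, per TID
TU = sum(tus.values())                               # Definition 5
print(tus, TU)
```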

3.2. Problem statement

The problem of mining high average-utility itemsets is to discover the complete set of high average-utility itemsets (HAUIs). An itemset $X$ is an HAUI in a database $D$ if its average-utility is no less than the minimum average-utility count specified by the user, $TU \times \delta$. The set of HAUIs is thus formally defined as:

$$HAUIs \leftarrow \{X \mid au(X) \geq TU \times \delta\}. \tag{6}$$
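To make the problem statement concrete, the sketch below enumerates every non-empty itemset of the running example and keeps those whose average-utility reaches $TU \times \delta = 150 \times 16\% = 24$. This is a brute-force baseline written for illustration only, not the HAUI-Miner algorithm; it is exponential in the number of items, and the function names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Running example: unit profits (Table 2) and transactions (Table 1).
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
}


def au(itemset, db, profit):
    """Exact average-utility of a non-empty itemset in the database."""
    items = set(itemset)
    return sum(
        (Fraction(sum(t[i] * profit[i] for i in items), len(items))
         for t in db.values() if items.issubset(t)),
        Fraction(0),
    )


def mine_hauis_bruteforce(db, profit, delta):
    """Return {itemset: au} for all itemsets with au >= TU * delta."""
    total_utility = sum(q * profit[i] for t in db.values() for i, q in t.items())
    min_count = total_utility * delta
    items = sorted(profit)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            value = au(combo, db, profit)
            if value >= min_count:
                result["".join(combo)] = value
    return result


hauis = mine_hauis_bruteforce(DB, PROFIT, Fraction(16, 100))
for name, value in hauis.items():
    print(name, float(value))
```

On the running example this keeps ten itemsets: A, B, C, D, AB, AC, AD, BC, CD, and ACD.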

4. The proposed HAUI-Miner algorithm

In this paper, we design an average-utility (AU)-list structure to store the information needed by the mining process. Moreover, an algorithm named HAUI-Miner is developed to mine HAUIs more efficiently than previous works. In traditional association rule mining (ARM), the downward closure (DC) property is used to reduce the search space and avoid combinatorial explosion. In HAUIM, this property does not hold for the average-utility measure. To restore this property and effectively reduce the search space, this paper introduces a transaction-maximum utility downward closure (TMUDC) property. It allows unpromising candidates to be pruned early, and thus reduces the search space so that the actual HAUIs can be discovered efficiently.
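The claim that downward closure fails for average-utility can be seen directly in the running example: the itemset (BF) has a higher average-utility than its subset (F). The sketch below, with illustrative names, computes both values from Tables 1 and 2.

```python
# Running example: unit profits (Table 2) and transactions (Table 1).
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
}


def au(itemset, db, profit):
    """Average-utility of a non-empty itemset in the database."""
    items = set(itemset)
    return sum(
        sum(t[i] * profit[i] for i in items) / len(items)
        for t in db.values() if items.issubset(t)
    )


au_f = au("F", DB, PROFIT)    # F occurs in T1 and T4: 6 + 2
au_bf = au("BF", DB, PROFIT)  # (6 + 6) / 2 + (9 + 2) / 2
# A superset scoring higher than its subset means au is not anti-monotone,
# so a low au(X) alone cannot be used to discard the supersets of X.
print(au_f, au_bf)  # prints "8.0 11.5"
```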

Definition 6. The transaction-maximum utility of a transaction $T_q$ is denoted as $tmu(T_q)$, and defined as the maximum utility of the items in $T_q$, that is:

$$tmu(T_q) = \max(\{u(i_j, T_q) \mid i_j \in T_q\}). \tag{7}$$

For example, the transaction-maximum utility of $T_1$ is calculated as $tmu(T_1) = \max\{5, 6, 6, 9, 6\}\ (= 9)$. The transaction-maximum utilities of the other transactions are calculated in the same way, and are shown in Table 3.
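Since Table 3 is not reproduced in this excerpt, the transaction-maximum utilities can be recomputed from Tables 1 and 2. The sketch below does so under the assumption that an item's utility in a transaction is quantity times unit profit; the values it produces are computed here, not quoted from Table 3.

```python
# Running example: unit profits (Table 2) and transactions (Table 1).
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
}


def tmu(transaction, profit):
    """Transaction-maximum utility: largest item utility in the transaction."""
    return max(qty * profit[item] for item, qty in transaction.items())


tmus = {tid: tmu(t, PROFIT) for tid, t in DB.items()}
print(tmus)
```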

Definition 7. The average-utility upper-bound of an itemset $X$ is denoted as $auub(X)$, and defined as the sum of the transaction-maximum utilities of the transactions containing $X$, that is:

$$auub(X) = \sum_{X \subseteq T_q \wedge T_q \in D} tmu(T_q). \tag{8}$$
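A small sketch shows why this upper bound is useful for pruning. The average of the item utilities of $X$ in a transaction cannot exceed the largest item utility in that transaction, so $auub(X) \geq au(X)$; and any superset of $X$ is contained in a subset of the transactions containing $X$, so its bound cannot be larger. Hence an itemset with $auub(X) < TU \times \delta$ can be discarded together with all its supersets. The code below illustrates only this pruning condition on single items; it is not the AU-list-based HAUI-Miner procedure, and the names are assumptions of this sketch.

```python
# Running example: unit profits (Table 2) and transactions (Table 1).
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
}


def tmu(transaction, profit):
    """Transaction-maximum utility (Definition 6)."""
    return max(qty * profit[item] for item, qty in transaction.items())


def auub(itemset, db, profit):
    """Average-utility upper-bound (Definition 7)."""
    items = set(itemset)
    return sum(tmu(t, profit) for t in db.values() if items.issubset(t))


TU = sum(q * PROFIT[i] for t in DB.values() for i, q in t.items())
MIN_COUNT = TU * 16 / 100  # minimum average-utility count, delta = 16%

# Items whose upper bound falls below the threshold cannot appear in any HAUI.
promising = [i for i in sorted(PROFIT) if auub(i, DB, PROFIT) >= MIN_COUNT]
print(MIN_COUNT, promising, auub("F", DB, PROFIT))
```

In the running example only item F falls below the threshold (its bound is 18), so every itemset containing F can be skipped without computing its average-utility.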

Table 1
A quantitative database.

TID   Transaction (item, quantity)
1     A:1, B:6, C:3, D:3, F:6
2     B:2, C:3, E:2
3     A:2, C:1, D:2, E:1
4     A:1, B:9, C:3, D:2, F:2
5     A:3, B:9, C:3, D:1, E:1
6     C:4, D:1, E:1

Table 2
A profit table.

Item   Profit
A      5
B      1
C      2
D      3
E      4
F      1

J.C.-W. Lin et al. / Advanced Engineering Informatics 30 (2016) 233–243
