An Innovative Algorithm for Fast Mining of High Average-Utility Itemsets


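Mining high average-utility itemsets (HAUIs) has become an active research topic for transaction databases in recent years. Traditional data mining starts from finding frequent itemsets. Utility-based mining extends this idea by focusing on itemsets that may not occur often but contribute a large overall utility, and the average-utility variant additionally takes the size of each itemset into account. This measure matters in many practical settings, such as market basket analysis, customer loyalty analysis, and personalized recommendation systems.

The research paper "A fast algorithm for mining high average-utility itemsets" was written by Jerry Chun-Wei Lin, Shifeng Ren, Philippe Fournier-Viger, Tzung-Pei Hong, Ja-Hwung Su, and Bay Vo, and published in 2017 in Applied Intelligence (Appl Intell), DOI 10.1007/s10489-017-0896-1. Its core contribution is an efficient algorithm for quickly identifying the itemsets that have a high average utility in a given dataset.

The authors first review the background of the HAUI mining problem and the existing methods, stressing the efficiency problems those methods face on large-scale data. They then present the main idea of the new algorithm, which may build on improved association-rule mining strategies combined with statistical measures such as expected value or weighted frequency to speed up the search. The algorithm may use pruning techniques, parallel computation, or heuristic search strategies to reduce computational complexity and memory consumption. Its key characteristics may include:

1. **Efficiency**: the paper focuses on an algorithm that can handle large datasets and significantly shorten the time needed to discover high average-utility itemsets without sacrificing accuracy.
2. **Scalability**: for big-data environments, an architecture able to process massive numbers of transaction records, with support for distributed computation, so that performance holds for both real-time and batch data.
3. **Balance between accuracy and efficiency**: the mining process considers both the frequency of itemsets and their contribution to the overall utility, striking a good balance between precision and efficiency.
4. **Empirical analysis**: the paper may include a detailed experimental evaluation showing the performance of the new algorithm on databases of different sizes and types, along with comparisons against existing methods.

The article is an important contribution to the field of database mining and offers a new solution to the problem of mining high average-utility itemsets. For data scientists, database administrators, and business intelligence specialists, understanding and applying this work can help improve the precision and efficiency of data analysis in real business scenarios.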

J.C.-W. Lin et al.

Table 1 A transactional database

| TID | Items with their quantities |
| --- | --- |
| 1 | A:1, B:6, C:3, D:3, F:6 |
| 2 | B:2, C:3, E:2 |
| 3 | A:2, C:1, D:2, E:1 |
| 4 | A:1, B:9, C:3, D:2, F:2 |
| 5 | A:3, B:9, C:3, D:1, E:1 |
| 6 | C:4, D:1, E:1 |
| 7 | A:1, E:1 |

For example in Table 1, the utility of item $(A)$ in $T_1$ is calculated as $u(A, T_1) = q(A, T_1) \times pr(A) = 1 \times 5 = 5$.

Definition 2 The utility of an itemset $X$ in transaction $T_q$ is denoted as $u(X, T_q)$, and defined as:

$$u(X, T_q) = \sum_{i_j \in X \wedge X \subseteq T_q} u(i_j, T_q). \tag{2}$$

For example in Table 1, the utility of itemset $(AB)$ in $T_1$ is calculated as $u(AB, T_1) = u(A, T_1) + u(B, T_1) = 1 \times 5 + 6 \times 1 = 11$.

Definition 3 The utility of an itemset $X$ in a database $D$ is denoted as $u(X)$, and defined as:

$$u(X) = \sum_{X \subseteq T_q \wedge T_q \in D} u(X, T_q). \tag{3}$$

For example in Table 1, the utility of the itemset $(AB)$ is calculated as $u(AB) = u(AB, T_1) + u(AB, T_4) + u(AB, T_5) = 11 + 14 + 24 = 49$.
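The three utility measures used so far can be checked with a few lines of Python. This is only a minimal sketch on the running example: the names `DB`, `PROFIT`, `item_utility`, `itemset_utility_in_transaction`, and `itemset_utility` are chosen here for illustration and do not come from the paper. Returning 0 when an itemset is not contained in a transaction is an assumption that makes the sum in (3) easy to write.

```python
# Quantities from Table 1 and unit profits from Table 2 of the running example.
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
    7: {"A": 1, "E": 1},
}
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}


def item_utility(item, tid):
    """u(i, T): purchased quantity times unit profit."""
    return DB[tid][item] * PROFIT[item]


def itemset_utility_in_transaction(itemset, tid):
    """u(X, T): sum of item utilities, counted only if X is contained in T."""
    if not set(itemset).issubset(DB[tid]):
        return 0  # assumption: an itemset contributes nothing to transactions that lack it
    return sum(item_utility(i, tid) for i in itemset)


def itemset_utility(itemset):
    """u(X): sum of u(X, T) over every transaction of the database."""
    return sum(itemset_utility_in_transaction(itemset, tid) for tid in DB)


print(item_utility("A", 1))                      # 5
print(itemset_utility_in_transaction("AB", 1))   # 11
print(itemset_utility("AB"))                     # 49
```

Itemsets are passed as strings of single-letter item names, which works only because every item in the example has a one-character label.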

Definition 4 The transaction utility of a transaction $T_q$ is denoted as $tu(T_q)$, and defined as:

$$tu(T_q) = \sum_{i_j \in T_q} u(i_j, T_q), \tag{4}$$

in which the sum runs over every item $i_j$ in $T_q$.

For example in Table 1, the transaction utility of $T_1$ is calculated as $tu(T_1) = 5 + 6 + 6 + 9 + 6 = 32$.

Table 2 A profit table

| Item | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- |
| Profit | 5 | 1 | 2 | 3 | 4 | 1 |

Definition 5 The total utility of all transactions in a database $D$ is denoted as $TU$, and defined as:

$$TU = \sum_{T_q \in D} tu(T_q). \tag{5}$$

For example in Table 1, the total utility of the database is calculated as $TU = 32 + 16 + 22 + 28 + 37 + 15 + 9 = 159$.
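The transaction utilities and the total utility of the example can be reproduced as follows. This is a small sketch under the same reading of Tables 1 and 2; the names `DB`, `PROFIT`, `transaction_utility`, and `total_utility` are illustrative and not taken from the paper.

```python
# Quantities from Table 1 and unit profits from Table 2 of the running example.
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
    7: {"A": 1, "E": 1},
}
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}


def transaction_utility(tid):
    """tu(T): sum of quantity times unit profit over every item in the transaction."""
    return sum(qty * PROFIT[item] for item, qty in DB[tid].items())


def total_utility():
    """TU: sum of the transaction utilities of the whole database."""
    return sum(transaction_utility(tid) for tid in DB)


print([transaction_utility(tid) for tid in sorted(DB)])  # [32, 16, 22, 28, 37, 15, 9]
print(total_utility())                                   # 159
```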

The above definitions are used in traditional HUIM. An itemset $X$ is said to be a high-utility itemset (HUI) iff its utility in a database $D$ is no less than the minimum utility count (minimum utility threshold multiplied by the total utility of the database), that is:

$$HUI \leftarrow \{X \mid u(X) \ge TU \times \delta\}, \tag{6}$$

where $\delta$ is the user-defined minimum utility threshold.

For instance, suppose that the minimum utility threshold is set to 37%. The minimum utility count is then $159 \times 37\% = 58.83$. Thus, the HUIs in that database are (ABCDF : 60), (ABD : 67), (ABCD : 85), (ABC : 67), (BCD : 60), (AD : 59), and (ACD : 79), where the number written after each itemset indicates its utility.
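On a database this small, the HUI list can be verified by exhaustively testing every itemset against condition (6). The sketch below does exactly that. It is a brute-force check for the example only, not the mining algorithm of the paper, and the names `DB`, `PROFIT`, `utility`, and `brute_force_huis` are illustrative assumptions. It yields seven distinct itemsets.

```python
from itertools import combinations

# Quantities from Table 1 and unit profits from Table 2 of the running example.
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
    7: {"A": 1, "E": 1},
}
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}


def utility(itemset):
    """u(X): utility of X summed over the transactions that contain all of X."""
    total = 0
    for items in DB.values():
        if set(itemset).issubset(items):
            total += sum(items[i] * PROFIT[i] for i in itemset)
    return total


def brute_force_huis(delta):
    """Return {itemset: utility} for every itemset with u(X) >= TU * delta."""
    total_utility = sum(q * PROFIT[i] for items in DB.values() for i, q in items.items())
    min_utility = total_utility * delta
    all_items = sorted(PROFIT)
    result = {}
    # Exhaustive enumeration: 2^6 - 1 = 63 candidate itemsets for six items.
    for k in range(1, len(all_items) + 1):
        for itemset in combinations(all_items, k):
            u = utility(itemset)
            if u >= min_utility:
                result["".join(itemset)] = u
    return result


huis = brute_force_huis(0.37)
for name, u in sorted(huis.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(name, u)
```

Exhaustive enumeration grows exponentially with the number of items, which is why practical HUIM and HAUIM algorithms rely on upper bounds and pruning instead.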

Because the utility of an itemset tends to be greater for larger itemsets (itemsets containing more items), traditional HUIM favors long itemsets. As a solution, the task of high average-utility itemset mining (HAUIM) was proposed [11]. It provides a fairer measurement of the utility by taking each itemset's size into account when calculating its average-utility. HAUIM is based on the following definitions.

Definition 6 The average-utility of an item $(i_j)$ in a transaction $T_q$ is denoted as $au(i_j, T_q)$, and defined as:

$$au(i_j, T_q) = \frac{q(i_j, T_q) \times pr(i_j)}{1} = \frac{u(i_j, T_q)}{1}. \tag{7}$$

For example in Table 1, the average-utility of the item $(A)$ in transaction $T_1$ is calculated as $au(A, T_1) = \frac{1 \times 5}{1} = 5$, which is equal to its utility value in traditional HUIM.

Definition 7 The average-utility of a $k$-itemset $X$ in a transaction $T_q$ is denoted as $au(X, T_q)$, and defined as:

$$au(X, T_q) = \frac{\sum_{i_j \in X \wedge X \subseteq T_q} u(i_j, T_q)}{|X|}, \quad |X| = k. \tag{8}$$

For example in Table 1, the average-utility of the itemset $(AB)$ in transaction $T_1$ is calculated as $au(AB, T_1) = \frac{1 \times 5 + 6 \times 1}{2} = 5.5$.
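The effect of dividing by the itemset size can be seen by comparing utility and average-utility inside a single transaction. The sketch below is an illustration on the running example only. The names `DB`, `PROFIT`, `utility_in_transaction`, and `average_utility_in_transaction` are hypothetical, and returning 0 for an itemset that is not contained in the transaction is an assumption, since the definition covers only the case $X \subseteq T_q$.

```python
# Quantities from Table 1 and unit profits from Table 2 of the running example.
DB = {
    1: {"A": 1, "B": 6, "C": 3, "D": 3, "F": 6},
    2: {"B": 2, "C": 3, "E": 2},
    3: {"A": 2, "C": 1, "D": 2, "E": 1},
    4: {"A": 1, "B": 9, "C": 3, "D": 2, "F": 2},
    5: {"A": 3, "B": 9, "C": 3, "D": 1, "E": 1},
    6: {"C": 4, "D": 1, "E": 1},
    7: {"A": 1, "E": 1},
}
PROFIT = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4, "F": 1}


def utility_in_transaction(itemset, tid):
    """u(X, T): summed item utilities if X is contained in T, else 0 (assumption)."""
    items = DB[tid]
    if not set(itemset).issubset(items):
        return 0
    return sum(items[i] * PROFIT[i] for i in itemset)


def average_utility_in_transaction(itemset, tid):
    """au(X, T): u(X, T) divided by the number of items in X."""
    return utility_in_transaction(itemset, tid) / len(itemset)


for x in ("A", "AB", "ABCDF"):
    print(x, utility_in_transaction(x, 1), average_utility_in_transaction(x, 1))
# A 5 5.0
# AB 11 5.5
# ABCDF 32 6.4
```

In $T_1$ the utility grows from 5 to 32 as items are added, while the average-utility moves only from 5 to 6.4, which is the size normalization that HAUIM is meant to provide.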
