Extracting Non-Redundant Correlated Purchase Behaviors by Utility Measure

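The research paper "Extracting Non-Redundant Correlated Purchase Behaviors by Utility Measure" explores how a utility measure can be used to extract non-redundant, correlated purchase behaviors, with the aim of mining valuable information from massive user click and purchase data. In the context of e-commerce and big data analytics, understanding users' purchase behavior is essential for improving sales strategies, personalized recommendation, and market forecasting.

The article first points out that users' online shopping data, such as search, browsing, and purchase records, contains rich patterns of consumer behavior. These patterns reflect users' interests, needs, and preferences, and are of great commercial value to merchants. However, such data usually contains a large amount of redundant information, and analyzing it directly may yield duplicated and imprecise patterns.

The paper proposes a utility-measure-based method for identifying and extracting purchase behavior patterns that are significantly correlated yet non-redundant. The utility measure plays the key role here: it evaluates how much each purchase pattern contributes to user value, which helps filter out low-utility or duplicated patterns. The method takes into account the inherent correlation of purchase behavior, namely that certain products are often bought together because of a shared use or a specific consumer need.

To achieve this goal, the paper may involve the following techniques:

1. Data preprocessing: cleaning and transforming the raw user purchase data so that it can be analyzed. This may include deduplication, outlier handling, and missing-value imputation.
2. Purchase pattern mining: using clustering, association rule learning, or sequential pattern mining to find frequent patterns in users' purchase behavior.
3. Utility computation: defining and applying a utility function to quantify the value of each purchase pattern to the user. This may involve factors such as user ratings, product prices, and purchase frequency.
4. Non-redundancy checking: using projection or other mathematical means to detect and eliminate redundancy among patterns, so that the extracted purchase patterns are independent and valuable.
5. Result evaluation and validation: using appropriate evaluation metrics (such as precision, recall, and F1 score) to verify the effectiveness and efficiency of the proposed method.

The paper offers a novel perspective for understanding and mining user purchase behavior. With a utility measure, users' actual needs can be captured more accurately and the complexity of data analysis can be reduced, which is of practical significance for e-commerce platforms and market research.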

W. Gan et al. / Knowledge-Based Systems 143 (2018) 30–41

Recently, a novel list-based algorithm named HUI-Miner [33] was proposed to efficiently mine HUIs without candidate generation. It introduces a new data structure named utility-list, which stores the actual utility and remaining utility of an itemset within each transaction in the processed database. Thus, the one-phase HUI-Miner algorithm significantly outperforms the previous Two-Phase [34] and pattern-growth tree-based algorithms (i.e., IHUP [6], UP-Growth [40], and UP-Growth+ [41]). After that, two enhanced versions of HUI-Miner, FHM [18] and HUP-Miner [22], were developed. Experiments have shown that both FHM [18] and HUP-Miner [22] significantly outperform HUI-Miner on most static databases. At the same time, another approach named d²HUP was proposed to mine HUIs without candidate generation-and-test [35]. Ryang and Yun introduced an indexed list-based IMHUP algorithm for mining HUIs [38]. More recently, a faster one-phase approach called EFIM [45] was designed for mining HUIs, and it outperforms all the previous HUIM algorithms.
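As an illustration of the utility-list idea described above, the following Python sketch builds single-item utility-lists for a toy database. It is only a sketch under stated assumptions: the utility of an item in a transaction is taken to be its purchase quantity multiplied by its unit profit, and items are processed in a fixed total order supplied by the caller. The function name `build_utility_lists`, the toy data, and the chosen order are hypothetical and are not taken from HUI-Miner [33].

```python
from collections import defaultdict


def build_utility_lists(db, profit, order):
    """Return {item: [(tid, iutil, rutil), ...]} for every single item.

    db     : {tid: {item: quantity}}
    profit : {item: unit profit}
    order  : list of items fixing the processing order (an assumption here)

    iutil is the utility of the item in the transaction (assumed to be
    quantity * unit profit); rutil is the summed utility of the items that
    come after it in `order` within the same transaction.
    """
    rank = {item: pos for pos, item in enumerate(order)}
    lists = defaultdict(list)
    for tid, tx in db.items():
        remaining = 0
        # Walk the transaction from the last item to the first, so that
        # `remaining` always holds the utility of the items still to come.
        for item in sorted(tx, key=rank.__getitem__, reverse=True):
            iutil = tx[item] * profit[item]
            lists[item].append((tid, iutil, remaining))
            remaining += iutil
    return dict(lists)


# Hypothetical toy data.
db = {
    1: {"a": 2, "b": 1},
    2: {"a": 1, "c": 3},
    3: {"b": 2, "c": 1},
}
profit = {"a": 5, "b": 2, "c": 1}
ulists = build_utility_lists(db, profit, order=["a", "b", "c"])

# Total utility of {a}, and an upper bound on the utility of its extensions.
utility_a = sum(iutil for _, iutil, _ in ulists["a"])
bound_a = sum(iutil + rutil for _, iutil, rutil in ulists["a"])
```

Summing the `iutil` column gives the utility of the item over the whole database, while summing `iutil + rutil` bounds the utility of any extension of that item under the chosen order, which is what allows a list-based miner to prune without generating candidates.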

Developing efficient high-utility pattern mining algorithms is an active research area. Many algorithms have also been developed for various problems of HUIM, such as mining HUIs from dynamic environment databases [27], mining on-shelf high-utility itemsets whose products have different on-shelf behavior [23], top-k high-utility itemset mining without setting the minimum utility threshold [42], mining the up-to-date HUIs which can show the recent trend [26], and HUIM from big data [32]. Different from the traditional definition of HUIM, Hong et al. introduced the concept of the high average utility itemset [16], and some approaches have been developed in recent years, such as MAU-Growth [21]. Recently, several interesting issues of mining high-utility itemsets under various constraints have been extensively studied, such as HUIM with multiple minimum utility thresholds [30], HUIM with consideration of various discount strategies [29], utility-based association rule mining [36], mining HUIs over uncertain databases [25], mining HUIs with both positive and negative unit profits [31], and so on.

2.2. Affinity/correlation pattern mining

Traditional algorithms for frequent pattern mining (FPM) [14] and association rule mining (ARM) [2,3] use the support (frequency) of patterns as a constraint to prune the search space. However, this support-based approach has major drawbacks [39,43]. To address this issue, the problem of strong affinity pattern mining has been extensively studied in recent years [7,20,37,39,43]. To find strong affinity patterns containing low-support items, the concept of hyperclique patterns [20] was proposed. It defines a new measure named hyperclique (h)-confidence, which is equivalent to the all-confidence, to discover hyperclique patterns. With the h-confidence measure, a cross-support property is used to effectively eliminate spurious patterns. Besides, three interestingness measures called any-confidence, all-confidence, and bond [37] were proposed to find frequency-based or support-based correlated patterns.
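To make the three support-based measures concrete, the sketch below computes them on a toy set of transactions. It uses the commonly cited definitions, stated here as assumptions because the text above does not spell them out: all-confidence divides the support of the itemset by the largest single-item support, any-confidence divides it by the smallest single-item support, and bond divides it by the number of transactions that contain at least one of the items. The function names and data are hypothetical, and every item is assumed to occur in at least one transaction.

```python
def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for tx in db if items <= tx)


def all_confidence(db, itemset):
    """Support of the itemset over the largest single-item support."""
    return support(db, itemset) / max(support(db, [i]) for i in itemset)


def any_confidence(db, itemset):
    """Support of the itemset over the smallest single-item support."""
    return support(db, itemset) / min(support(db, [i]) for i in itemset)


def bond(db, itemset):
    """Support of the itemset over the count of transactions that
    contain at least one of its items."""
    items = set(itemset)
    disjunctive = sum(1 for tx in db if items & tx)
    return support(db, itemset) / disjunctive


# Hypothetical toy transactions (presence only, no quantities).
db = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}, {"a", "c"}]

ab_all = all_confidence(db, ["a", "b"])
ab_any = any_confidence(db, ["a", "b"])
ab_bond = bond(db, ["a", "b"])
```

Here item `a` is much more frequent than item `b`, so any-confidence is high while all-confidence and bond are lower, which shows why the stricter measures are better suited to filtering out patterns driven by a single popular item.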

The degree of the expectation-based correlation is highly influenced by the number of null transactions [43], i.e., transactions which do not contain any of the items whose correlation is being measured. Hence, such measures are not suitable for the study of correlations in large datasets, where the number of null transactions could be large and unstable. For the problem of contrasting positive and negative correlations, it is crucial to adopt a reliable correlation measure that is unaffected by the number of null transactions present in the database. Such measures are called null (transaction)-invariant [43]. The main property of a null-invariant measure is its independence of the total number of transactions in a database. According to the study in [43], the five known null-invariant measures, including all-confidence [37], Coherence [37], Cosine [43], and Kulczynski [9], can all be viewed as a generalized mean of conditional probabilities. Kulczynski [9] has the null (transaction)-invariant property, which implies that the correlation measure is independent of the dataset size [43].
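The effect of null transactions can be checked numerically. The sketch below compares the Kulczynski measure for two items, taken here as the arithmetic mean of the two conditional probabilities P(A|B) and P(B|A), with lift, which is used as an example of an expectation-based measure. The choice of lift, the function names, and the counts are assumptions made for illustration only.

```python
def kulczynski(n_a, n_b, n_ab):
    """Arithmetic mean of P(B|A) = n_ab / n_a and P(A|B) = n_ab / n_b.

    The total number of transactions does not appear in the formula.
    """
    return 0.5 * (n_ab / n_a + n_ab / n_b)


def lift(n_a, n_b, n_ab, n_total):
    """Observed co-occurrence probability over the one expected under
    independence; depends on the total number of transactions."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


# Hypothetical counts: A occurs in 100 transactions, B in 50, both in 40.
n_a, n_b, n_ab = 100, 50, 40
non_null = n_a + n_b - n_ab  # transactions containing A or B

kulc_values = []
lift_values = []
for n_null in (0, 1000, 100000):
    n_total = non_null + n_null
    kulc_values.append(kulczynski(n_a, n_b, n_ab))
    lift_values.append(lift(n_a, n_b, n_ab, n_total))
```

With these counts the Kulczynski value stays the same however many null transactions are added, whereas lift moves from below 1 (which would be read as a negative correlation) to well above 1 (a positive one) purely because the database grew.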

2.3. Comparative analysis with related work

Up to now, the problem of HUIM has been extensively studied. A summary of the developed HUIM algorithms is shown in Table 1. Although the above traditional models of HUIM can reflect the utility of the itemsets that exceed a minimum utility value, the inherent correlation of items inside the patterns has not been considered. In real-world situations, the discovered HUIs may be meaningless, invalid, or even misleading (having occurred by chance) if they are weakly correlated. Identifying correlation and utility information in databases can provide valuable information. Thus, it is a critical issue and a challenge to discover non-redundant and more correlated HUIs from transaction databases. In past studies, the HUIPM algorithm [7] and the faster FDHUP algorithm [28] were respectively proposed to simultaneously consider both the frequency affinity and utility of patterns. FDHUP introduces two compact structures named EI-table and FU-tree, and utilizes three pruning strategies to reduce the search space, and thus performs better than the HUIPM algorithm [7]. Both of them, however, consider only the co-occurrence frequency in the transactions as the correlation factor, which cannot reveal the real inherent correlation among items inside the desired patterns.

Thus, it is an important and challenging task to design an efficient algorithm for extracting non-redundant correlated purchase behaviors by utility measure. In this paper, the proposed method aims at discovering non-redundant correlated high-utility itemsets from transactional quantitative databases and has the following main differences compared to the state-of-the-art algorithms for mining high-utility itemsets.

• The mining goal w.r.t. derived patterns is different. All the above algorithms except HUIPM are designed to discover high-utility itemsets, while the CoHUIM algorithm aims at mining highly positively correlated, interesting patterns that have high utility values.

• The correlation measure used in this paper is quite different from the support affinity measure used in HUIPM. The support affinity was designed with only co-occurrence in mind, and it uses the utility relationship instead of the presence information to measure the correlation of the items inside an itemset. The utility factor of an object is, however, a user-specified importance value, and such a subjective criterion may be unsuitable for measuring the objective correlation relationship. Moreover, these measures fall outside our purpose, since the designed approach is intended to avoid mining misleading HUIs from transactional databases.

• Furthermore, we develop some techniques to improve mining performance. In addition to the projection technique, a global sorted downward closure (SDC) property is used in the proposed approach.

3. Preliminaries and problem statement

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a finite set of $m$ distinct items in a temporal transactional database $D = \{T_1, T_2, \ldots, T_n\}$, where each transaction $T_q = \{q(i_1, T_q), q(i_2, T_q), \ldots, q(i_j, T_q)\}$ is a subset of $I$ and has a unique identifier (TID). Notice that $q(i_j, T_q)$ in each $T_q$ is the purchase quantity of item $i_j$, which may differ from item to item. A unique profit $pr(i_j)$ is assigned to each item $i_j \in I$, which represents its importance (e.g., profit, interest, risk), and these values are stored in a profit table $ptable = \{pr(i_1), pr(i_2), \ldots, pr(i_m)\}$. An itemset $X \subseteq I$ with $k$ distinct items $\{i_1, i_2, \ldots, i_k\}$ is of length $k$ and is referred to as a
