Wireless Network User Behaviour Analysis: A New Approach with Topic Models and Clustering

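This research paper explores how topic models can be used to model and cluster the behaviour of wireless network users. The authors, Bingjie Leng, Jingchu Liu, Huimin Pan, Sheng Zhou, and Zhisheng Niu, are with the Tsinghua National Laboratory for Information Science and Technology and the Department of Electronic Engineering at Tsinghua University. They propose a user behaviour model based on the topic model from document classification, use logarithmic TF-IDF weights to build a high-dimensional sparse feature matrix, apply latent semantic analysis (LSA) to reduce it to a low-dimensional dense feature matrix, and finally cluster users with the K-means++ algorithm.

In wireless networks, understanding and analysing user behaviour matters for commercial decisions, for improving network service quality, and for social management. The paper, "Topic Model Based Behaviour Modeling and Clustering Analysis for Wireless Network Users", proposes a method that analyses user traffic logs and clusters users into groups marked by their most frequently visited websites, in order to reveal their interests and preferences.

First, the researchers adopt the concept of a topic model. It originates in natural language processing, where it is typically used for document classification and information retrieval. In user behaviour analysis, a topic model serves as a tool for mining hidden patterns, or topics, in users' network activity; these topics reflect users' interests and habits.

To build the user behaviour model, they use TF-IDF (term frequency - inverse document frequency) weights to quantify how important each visited website is for each user. TF-IDF is a statistical weighting that highlights terms that occur frequently in one document but rarely across the whole collection. In this setting a user plays the role of a document and a website plays the role of a term, so the weight highlights sites that a given user visits often but that few other users visit. Computing the TF-IDF value of every website for every user yields a high-dimensional sparse feature matrix in which each row is a user, each column is a website, and each entry is the TF-IDF weight of that website for that user.

Next, they apply latent semantic analysis (LSA) to this high-dimensional matrix. LSA is a dimensionality-reduction technique, typically based on a truncated singular value decomposition, that captures latent associations between features and turns the high-dimensional sparse matrix into a low-dimensional dense one. This lowers the computational cost while retaining the main feature information, which makes the subsequent clustering more effective.

Once the feature matrix has been reduced, the researchers partition the users with the K-means++ clustering algorithm. K-means++ is a variant of K-means that differs in how the initial cluster centres are chosen: each new centre is sampled with probability proportional to its squared distance from the nearest centre already chosen, so the initial centres tend to be well spread out. This reduces the risk of converging to a poor local optimum, although it does not eliminate it. Each resulting cluster represents a group of users with similar network behaviour.

Overall, the method uses topic modelling to analyse wireless network user behaviour, revealing behaviour patterns and providing a reference for commercial decisions and network optimisation. Possible applications of this data-driven approach in large-scale wireless networks include personalised recommendation, network resource allocation, and network security policy.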

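The K-means++ seeding rule mentioned in the summary above can be sketched in a few lines. This is a minimal illustration of the general seeding technique, not the authors' implementation: the function names `sq_dist` and `kmeanspp_seeds` and the toy points are assumptions made for the example, and the sketch assumes the data contain at least `k` distinct points.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seeds(points, k, rng):
    """Pick k initial centres by D^2 sampling (the K-means++ seeding rule).

    Assumes `points` contains at least k distinct points (hypothetical helper).
    """
    # The first centre is drawn uniformly at random.
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        # Points far from every centre are proportionally more likely to be picked;
        # points that are already centres have weight 0 and are not picked again.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

# Toy data: two tight groups of users in a 2-D feature space.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
seeds = kmeanspp_seeds(points, k=2, rng=random.Random(0))
```

After seeding, the usual K-means iterations (assign each point to its nearest centre, then recompute each centre as the mean of its points) would run from `seeds`.
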
Topic Model Based Behaviour Modeling and Clustering Analysis for Wireless Network Users

Bingjie Leng, Jingchu Liu, Huimin Pan, Sheng Zhou, and Zhisheng Niu

Tsinghua National Laboratory for Information Science and Technology

Department of Electronic Engineering

Tsinghua University, Beijing 100084, China

Email: {lengbj14, liu-jc12, phm13}@mails.tsinghua.edu.cn, {sheng.zhou, niuzhs}@tsinghua.edu.cn

Abstract—User behaviour analysis based on traffic logs in wireless networks can be beneficial to many fields in real life: not only for commercial purposes, but also for improving network service quality and social management. We cluster users into groups marked by the most frequently visited websites to find their preferences. In this paper, we propose a user behaviour model based on the topic model from document classification problems. We use logarithmic TF-IDF (term frequency - inverse document frequency) weighting to form a high-dimensional sparse feature matrix. Then we apply LSA (latent semantic analysis) to deduce the latent topic distribution and generate a low-dimensional dense feature matrix. K-means++, which is a classic clustering algorithm, is then applied to the dense feature matrix and several interpretable user clusters are found. Moreover, by combining the clustering results with additional demographic information, including age, gender, and financial information, we are able to uncover more realistic implications from the clustering results.

Keywords—traffic log, user behaviour modeling, clustering analysis, topic model.

I. INTRODUCTION

Thanks to the wide adoption of smart devices such as smart phones and tablets, people can now perform an unprecedented number of tasks online, ranging from news and finance to social networking and gaming. As a consequence, Internet browsing logs in wireless networks have become an essential source of information for analyzing users' hidden preferences and inferring their real-life behaviour. With a deeper understanding of the usage patterns of mobile users, network service providers are able to provide more personalized services and improve service quality as well. Users' browsing interests are also helpful in fields such as urban planning, mobile advertisement, transportation, education, etc. [1–3].

The most naive way to extract user behaviour from an Internet browsing dataset is to observe the long-term global statistics of various websites. But in this case, individual differences are masked. Conversely, if we focus the analysis on a single user, the similarity between users' browsing habits is ignored. Hence, clustering is an efficient way to strike a balance between these two extremes and extract the average behaviour of a group of users who have similar browsing histories. Therefore, we design and implement a process to cluster similar users into groups, each of which is labeled by the type of frequently visited websites.

To apply clustering algorithms, the first step is to represent each user with a profile vector through user behaviour modeling. In this paper, we propose a user behaviour modeling method based on the topic model, which was originally proposed for document classification, to generate an original profile matrix. To enhance the discriminative power of the original matrix, we apply TF-IDF (term frequency - inverse document frequency) weights to regenerate a high-dimensional feature matrix. With methods from latent semantic analysis (LSA) [4], we are able to obtain a low-dimensional feature matrix reflecting the distribution of different topics across all users. Finally, clustering algorithms such as K-means++ can be applied to the final feature matrix and the clustering results are analyzed.
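The TF-IDF weighting step can be illustrated with a small sketch. The exact logarithmic formula used in the paper is not given in this excerpt, so the code assumes one common variant: tf = 1 + ln(count) for visited sites and idf = ln(N / df), where N is the number of users and df is the number of users who visited the site. The function name `log_tfidf` and the example hostnames are hypothetical.

```python
import math
from collections import Counter

def log_tfidf(visits_per_user):
    """Build a user-by-site matrix of logarithmic TF-IDF weights.

    visits_per_user: one list of visited site names per user (a user is
    treated like a document, a site like a term). Assumed weighting:
    (1 + ln(count)) * ln(N / df); this is an illustrative variant only.
    """
    n_users = len(visits_per_user)
    counts = [Counter(v) for v in visits_per_user]
    sites = sorted({s for c in counts for s in c})
    # Document frequency: number of users who visited each site at least once.
    df = Counter(s for c in counts for s in c)
    matrix = []
    for c in counts:
        row = []
        for s in sites:
            if c[s] == 0:
                row.append(0.0)  # unvisited sites keep the matrix sparse
            else:
                tf = 1.0 + math.log(c[s])
                idf = math.log(n_users / df[s])
                row.append(tf * idf)
        matrix.append(row)
    return sites, matrix

# Three toy users; every user visits shop.example, so its idf is ln(1) = 0.
logs = [
    ["news.example", "news.example", "shop.example"],
    ["game.example", "shop.example"],
    ["news.example", "shop.example", "shop.example"],
]
sites, X = log_tfidf(logs)
```

In this toy input the site visited by every user receives zero weight, which is the discriminative effect the weighting is meant to provide: sites common to all users say nothing about individual preferences.
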

Concretely, we make the following contributions in this paper:

• We analyze the similarities and differences between network user modeling and document classification, and propose to utilize text mining algorithms for network user modeling problems.

• Based on the analysis of our dataset, we utilize logarithmic TF-IDF to generate a sparse feature matrix and use LSA for topic discovery and dimensionality reduction. To our knowledge, this is the first study to analyze user behaviour with a combination of these tools.

• We extract users' interests by clustering users with similar browsing habits into groups. We also examine our clustering results against additional demographic information, including age, gender, and financial information collected on campus over five months. Obvious preference differences are found across genders and age groups. This helps us explain our clustering findings and shows that our algorithm works effectively. Moreover, our findings can help with campus management in many aspects.
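The dimensionality-reduction step named in the second contribution can be sketched as follows. LSA is commonly computed as a truncated singular value decomposition, and this sketch assumes that formulation; the excerpt does not show the authors' implementation. To stay dependency-free it approximates the top right singular vectors by power iteration on the Gram matrix, which is only practical for tiny matrices. All function names and the toy matrix are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0.0 else v

def top_right_singular_vectors(X, k, iters=200):
    """Approximate the top-k right singular vectors of X by power
    iteration on X^T X, deflating after each vector is found."""
    m = len(X[0])
    gram = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    vectors = []
    for comp in range(k):
        # Fixed, deterministic starting vector (an arbitrary choice).
        v = normalise([1.0 / (i + 1 + comp) for i in range(m)])
        for _ in range(iters):
            v = normalise([dot(row, v) for row in gram])
        eigenvalue = dot(v, [dot(row, v) for row in gram])
        # Remove the component just found so the next pass finds the next one.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(m)]
                for i in range(m)]
        vectors.append(v)
    return vectors

def lsa_project(X, k):
    """Map each row of X to k latent-topic coordinates (X times V_k)."""
    V = top_right_singular_vectors(X, k)
    return [[dot(row, v) for v in V] for row in X]

# Toy weight matrix: users 0-1 only visit sites 0-1, users 2-3 only sites 2-3.
X = [
    [2.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
]
Z = lsa_project(X, k=2)
```

Here the four site columns collapse into two latent topics, and each user has a nonzero coordinate on only one of them. For real data, a linear algebra library's SVD routine would replace the hand-written power iteration.
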

The rest of the paper is outlined as follows. Section II introduces related work on user behaviour analysis in WLAN. Section III presents the network user behaviour modeling problem and explains its analogy with topic modeling in document classification. In Section IV, we introduce our datasets and the details of our algorithm implementation. In Section V, we present the clustering results and explain the findings. Finally, in Section VI, we conclude and discuss future work.

II. RELATED WORK

With the rapid development of wireless networks, the potential of user behaviour analysis has attracted tremendous attention recently. The most common method for user clustering is to apply K-means to the raw profile matrix. For example, the web browsing similarity among users of

arXiv:1511.05618v1 [cs.SI] 17 Nov 2015
