Identifying Spam Accounts in Social Networks with a Cascading Framework Based on Behavioral Factors


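This document examines the paper "A Cascading Framework for Uncovering Spammers in Social Networks". With the rapid spread of online social networks (OSNs), they have become an important platform for marketing and advertising, but spam has at the same time become a serious problem for OSNs and has drawn wide attention from academia and industry. The authors, Zejia Chen, Jiahai Yang, and Jessie Hui Wang, approach the problem from the perspective of user behavior. They look at relation creation, user activeness, user interaction, and tweet content, and assess how much each factor contributes to identifying spammers.

The experiments show that tweet content is the most important factor for spammer detection, which suggests that spam content tends to follow recognizable patterns. Relation creation comes second, plausibly because spammers create large numbers of fake or abnormal relations to extend their reach. Based on these behavior factors, the authors propose a novel cascading framework named CWB-SPAM. It trains one classifier on all the data and a second classifier on the instances that are difficult to classify, then applies the two in cascade. Experiments on a dataset crawled from Sina Microblog (新浪微博) show that it outperforms the classical algorithms the authors investigated in terms of F-score.

The paper offers a behavior-based view of the spam problem in social networks and an algorithmic basis for building more effective anti-spam strategies and tools, which matters for keeping social networks healthy and improving the user experience. Future work could further tune the model, consider more user-behavior features, and combine the approach with other machine learning and deep learning techniques to detect spam more precisely.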

A Cascading Framework for Uncovering Spammers in Social Networks

Zejia Chen, Jiahai Yang, Jessie Hui Wang

Tsinghua National Laboratory for Information Science and Technology

Dept. of Computer Science and Technology, Tsinghua University

Beijing, 100084, China

zejiachen@gmail.com, {yang,hwang}@cernet.edu.cn

Abstract—With their tremendous popularity, online social networks (OSNs) have become the most important platform for marketing and advertising in recent years. Meanwhile, spamming has become a very serious problem in OSNs, drawing the attention of both the academic and industry communities. In this paper, we investigate the problem of spammer detection from the perspective of user behaviors, including relation creation, user activeness, user interaction, and tweet content. We quantitatively explore their correlations with spammer detection and find that tweet content is the most important factor, followed by relation creation. Based on these behavior factors, we propose a novel cascading framework, CWB-SPAM, for spammer detection in OSNs. Experiments on a dataset crawled from Sina Microblog show that the proposed algorithm outperforms all the classical algorithms we investigated in terms of F-score. Experiments also demonstrate that, as a probabilistic classification model, CWB-SPAM has good ranking quality. It enables OSN operators to trade off precision against recall easily, so the algorithm can be used in different scenarios. We also note that the proposed framework can be used with other probabilistic binary classification models and thus applied in more scenarios.

Keywords—social network; user behavior; spammer detection; cascading framework

I. INTRODUCTION

As social networks have gained tremendous popularity in the past years, spammers have started to use OSNs as a new platform for their malicious behaviors. According to a survey by Harris Interactive, 80% of OSN users have received unwanted friend requests, messages, and postings in their OSN accounts in the past years, which means spammers are now very prevalent in social networks [1]. Many users have complained about terrible experiences in OSNs where they have to face too much spam. Moreover, spamming in OSNs harms both OSN users and operators. Viruses, phishing, or malware contained in spam can lead to users' financial loss or privacy disclosure. Users may then reluctantly choose to leave, and the user population of the OSN decreases as a result.

Considering the great impact of spammers in OSNs, both industry and academia have been making efforts to detect them. In industry, in order not to mistake normal users for spammers, OSN operators choose to apply simple high-precision strategies such as URL blacklists and single-feature-threshold classifiers [2], although these have much lower recall. Since these approaches exploit few aspects of behavior, spammers can easily adapt their behaviors to evade detection. In academia, researchers have noticed that spammers form tightly connected communities in OSNs [3,4] and have proposed detection approaches that exploit this social-graph characteristic. However, all these approaches rely on the assumption that the network is fast-mixing, which has been shown not to hold in large-scale OSNs such as Facebook and YouTube [5]. Recently, researchers have started to detect spammers by training classifiers on behavior features using traditional classification algorithms such as Random Forest. However, experiments in this paper reveal that these traditional algorithms do not work well for spammer detection in OSNs. In summary, although much work has already been done by the academic and industry communities, spammer detection remains an unsolved problem. Indeed, according to a research report published by NexGate, spam in OSNs including Facebook, Twitter, Google+, and YouTube grew by 355% during the first half of 2013 [6].

Footnote: F-score refers to the harmonic mean of precision and recall.
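To make the precision/recall behavior of a single-feature-threshold classifier concrete, the following minimal Python sketch evaluates one on toy data. The feature (`url_fraction`, the share of an account's posts that contain a URL), the ten labeled accounts, and both thresholds are hypothetical and do not come from the paper; only the F-score definition (harmonic mean of precision and recall) is taken from the text.

```python
def precision_recall_f(predicted, actual):
    """Return (precision, recall, F-score) for two parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall.
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def threshold_classifier(values, threshold):
    """Flag an account as a spammer when its single feature reaches the threshold."""
    return [v >= threshold for v in values]


# Hypothetical toy data: five spammers followed by five normal users.
url_fraction = [0.95, 0.90, 0.60, 0.40, 0.30, 0.70, 0.20, 0.10, 0.05, 0.00]
is_spammer = [True, True, True, True, True, False, False, False, False, False]

strict = precision_recall_f(threshold_classifier(url_fraction, 0.80), is_spammer)
loose = precision_recall_f(threshold_classifier(url_fraction, 0.25), is_spammer)

for name, (p, r, f) in (("strict", strict), ("loose", loose)):
    print(f"{name}: precision={p:.2f} recall={r:.2f} F-score={f:.2f}")
```

On this toy data the strict threshold flags only the two most extreme accounts, so it accuses no normal user but misses three of the five spammers. The loose threshold catches every spammer at the cost of one misclassified normal user.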

In this paper, we propose and evaluate a novel algorithm, CWB-SPAM, for spammer detection in OSNs. We build a classifier based on features extracted from various aspects of user behavior, which makes the detector more robust because it becomes much harder for spammers to adapt their behaviors. To improve the classifier's precision and recall, we propose a cascading framework in which we train one classifier on all the data and a second classifier on the instances that are difficult to classify. We then use these two classifiers in a cascade to detect spammers. In addition, we design CWB-SPAM as a probabilistic model so that OSN operators can easily trade off precision against recall in different scenarios. Experiments show that the proposed algorithm achieves a precision and a recall of about 90%. By changing the parameters of CWB-SPAM, we can instead achieve a precision of 95% with a recall of 81%, or a higher recall with a lower precision.
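The two-stage idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' CWB-SPAM implementation: this excerpt does not specify the base classifier, how difficult instances are selected, or how the two stages are combined at prediction time. Here a tiny Gaussian naive Bayes model (`TinyNaiveBayes`) stands in for the base classifier, an instance counts as difficult when its first-stage spam probability lies within a hypothetical `margin` of 0.5, and such instances are deferred to the second stage. The two features and the toy accounts are invented for the example.

```python
import math


class TinyNaiveBayes:
    """Minimal Gaussian naive Bayes; a stand-in for any probabilistic binary classifier."""

    def fit(self, X, y):
        # Both classes (0 = normal, 1 = spammer) must be present in y.
        self.stats = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small constant keeps every variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-3
                         for c, m in zip(cols, means)]
            self.stats[label] = (len(rows) / len(y), means, variances)
        return self

    def prob_spam(self, x):
        log_score = {}
        for label, (prior, means, variances) in self.stats.items():
            s = math.log(prior)
            for v, m, var in zip(x, means, variances):
                s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            log_score[label] = s
        # Normalize relative to the larger log score to avoid underflow.
        top = max(log_score.values())
        e0 = math.exp(log_score[0] - top)
        e1 = math.exp(log_score[1] - top)
        return e1 / (e0 + e1)


def train_cascade(X, y, margin=0.4):
    stage1 = TinyNaiveBayes().fit(X, y)
    # Assumption: "difficult" means the first-stage probability is within `margin` of 0.5.
    hard = [(x, t) for x, t in zip(X, y) if abs(stage1.prob_spam(x) - 0.5) < margin]
    stage2 = None
    if {t for _, t in hard} == {0, 1}:  # the second stage needs both classes
        stage2 = TinyNaiveBayes().fit([x for x, _ in hard], [t for _, t in hard])
    return stage1, stage2


def cascade_prob(model, x, margin=0.4):
    stage1, stage2 = model
    p = stage1.prob_spam(x)
    if stage2 is not None and abs(p - 0.5) < margin:
        return stage2.prob_spam(x)  # defer uncertain cases to the second stage
    return p


def is_spammer(model, x, threshold=0.5):
    # A higher threshold favors precision; a lower one favors recall.
    return cascade_prob(model, x) >= threshold


# Hypothetical features per account: (follow-request ratio, URL fraction in posts).
X = [(0.90, 0.80), (0.80, 0.90), (0.85, 0.70), (0.60, 0.50), (0.55, 0.60),
     (0.10, 0.10), (0.20, 0.05), (0.15, 0.20), (0.45, 0.50), (0.50, 0.40)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

model = train_cascade(X, y)
p_spam_like = cascade_prob(model, (0.95, 0.90))
p_normal_like = cascade_prob(model, (0.05, 0.05))
print(p_spam_like, p_normal_like)
```

Because the cascade outputs a probability rather than a hard label, an operator who wants fewer false accusations can raise `threshold` in `is_spammer`, and one who wants to catch more spammers can lower it.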

In summary, our contributions are as follows:

• After investigating various aspects of user behavior, we find that spammers differ from normal users in their behavior patterns of relation creation, user interaction, tweet content, and user activeness. We find that all of these have predictive power, but none can be used on its own to detect spammers effectively.
