A Cascading Framework for Uncovering
Spammers in Social Networks
Zejia Chen, Jiahai Yang, Jessie Hui Wang
Tsinghua National Laboratory for Information Science and Technology
Dept. of Computer Science and Technology, Tsinghua University
Beijing, 100084, China
zejiachen@gmail.com, {yang,hwang}@cernet.edu.cn
Abstract—With their tremendous popularity, online social
networks (OSNs) have become one of the most important platforms
for marketing and advertising in recent years. Meanwhile,
spamming has become a serious problem in OSNs, drawing the
attention of both the academic and industry communities. In this
paper, we investigate
the problem of spammer detection from the perspective of user
behaviors, including relation creation, user activeness, user
interaction and tweet content. We quantitatively explore their
correlations with spammer detection and find that tweet content
is the most important factor for spammer detection, followed by
relation creation. Based on these behavior factors, we propose a
novel cascading framework CWB-SPAM for spammer detection
in OSNs. Experiments on a dataset crawled from Sina Microblog
show that the proposed algorithm outperforms all classical
algorithms we investigated in terms of F-score¹. Experiments also
demonstrate that as a probabilistic classification model, the
proposed CWB-SPAM has good ranking quality. This lets OSN
operators easily trade off precision against recall, so that
the proposed algorithm can be used in different scenarios.
We also note that the proposed framework can be used with
other probabilistic binary classification models and thus
applied in more scenarios.
Keywords—social network; user behavior; spammer detection;
cascading framework
I. INTRODUCTION
As online social networks (OSNs) have gained tremendous
popularity in recent years, spammers have started to use them as
a new platform for malicious behavior. According to a survey by
Harris Interactive, 80% of OSN users have received unwanted
friend requests, messages and postings in their OSN accounts in
the past years, which indicates that spammers are now widespread
in social networks [1]. Many users have complained about their
poor experiences in OSNs, where they have to face too much
spam. Moreover, spamming in OSNs harms both OSN users and
operators. Viruses, phishing links or malware contained in spam
can lead to users' property loss or privacy disclosure. Users may
then reluctantly choose to leave, and the user population of OSNs
therefore decreases.
¹ F-score refers to the harmonic mean of precision and recall.
Considering the great impact of spammers in OSNs,
both industry and academia have been making efforts to detect
them. In industry, in order not to mistake normal users for
spammers, OSN operators choose to apply simple strategies
with high precision, such as URL blacklists and
single-feature-threshold classifiers [2], although these have
a much lower recall. Since these approaches exploit few aspects
of user behavior, spammers can easily adapt their behavior to
evade detection.
In academia, researchers have noticed that spammers tend to form
tightly connected communities in OSNs [3,4], and have proposed
detection approaches that exploit this social-graph characteristic.
However, all these approaches rely on the assumption that the
network is fast-mixing, which has been shown not to hold
in large-scale OSNs such as Facebook and YouTube [5]. More
recently, researchers have started to detect spammers by training
classifiers on behavior features using traditional classification
algorithms such as Random Forest. However, experiments in this
paper reveal that these traditional algorithms do not work well for
spammer detection in OSNs. In summary, although much work has
already been done by the academic and industry communities,
spammer detection remains an unsolved problem. Indeed,
according to a research report published by NexGate, spam in
OSNs including Facebook, Twitter, Google+ and YouTube grew by
355% during the first half of 2013 [6].
In this paper, we propose and evaluate a novel algorithm,
CWB-SPAM, for spammer detection in OSNs. We build a
classifier on features extracted from various aspects of
user behavior, which makes the detector more robust because
it is much harder for spammers to adapt their behaviors.
To improve the classifier's precision and recall, we propose a
cascading framework: we train a first classifier on all
data and a second classifier on the instances that are difficult
to classify, and then apply the two classifiers in cascade to
detect spammers. In addition, we design CWB-SPAM as a
probabilistic model so that OSN operators can easily trade off
precision against recall in different scenarios.
Experiments show that the proposed algorithm achieves a
precision and recall of about 90%. Moreover, by changing the
parameters of CWB-SPAM, we can achieve a precision of 95%
with a recall of 81%, or a higher recall with a lower precision.
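As an illustration only, the two-stage idea and the precision/recall tradeoff described above can be sketched in Python. This sketch is not the CWB-SPAM implementation. The logistic-regression base learner, the probability band [low, high] used to define "difficult" instances, the toy two-dimensional feature vectors, and all function names are assumptions made for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, epochs=500, lr=0.5):
    """Train a small logistic regression by batch gradient descent.
    Returns a function mapping a feature vector to P(spammer)."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_cascade(X, y, low=0.3, high=0.7):
    """Stage 1 is trained on all data; stage 2 only on the instances
    whose stage-1 probability falls in [low, high] (assumed criterion)."""
    stage1 = fit_logistic(X, y)
    hard = [(xi, yi) for xi, yi in zip(X, y) if low <= stage1(xi) <= high]
    # Fall back to stage 1 alone if the hard set lacks one of the classes.
    if len({yi for _, yi in hard}) < 2:
        return stage1
    stage2 = fit_logistic([h[0] for h in hard], [h[1] for h in hard])

    def predict_proba(x):
        p = stage1(x)
        # Uncertain instances are re-scored by the second classifier.
        return stage2(x) if low <= p <= high else p

    return predict_proba

def precision_recall_f(probs, y, threshold):
    """Precision, recall and F-score (their harmonic mean) at a threshold."""
    tp = sum(1 for p, yi in zip(probs, y) if p >= threshold and yi == 1)
    fp = sum(1 for p, yi in zip(probs, y) if p >= threshold and yi == 0)
    fn = sum(1 for p, yi in zip(probs, y) if p < threshold and yi == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Toy data (hypothetical): two behavior features per user, label 1 = spammer.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.45, 0.6], [0.5, 0.4],
     [0.55, 0.5], [0.7, 0.8], [0.8, 0.7], [0.9, 0.9], [0.6, 0.3]]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

model = fit_cascade(X, y)
probs = [model(x) for x in X]
for t in (0.3, 0.5, 0.7):
    print(t, precision_recall_f(probs, y, t))
```

Because the cascade outputs a probability rather than a hard label, raising the decision threshold can only shrink the set of users flagged as spammers. Recall therefore never increases as the threshold rises, while precision typically improves. This is the knob an operator would turn to favor one measure over the other.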
In summary, our contributions are as follows:
After investigating various aspects of user behavior,
we find that spammers differ from normal users
in their behavior patterns of relation creation, user
interaction, tweet content and user activeness. We
find that these aspects all have predictive power but
that none can be used on its own to detect spammers
effectively.
ISBN 978-3-901882-58-6 © 2014 IFIP