researches but did not provide sophisticated and efficient
methods of detection.
Subsequently, many approaches have been proposed to
combat spam in online social networks. These approaches can
be generally divided into two types: machine learning-based
(Hu et al., 2013; Lee et al., 2010; Lee and Kim, 2012; Wang et al.,
2015) and social-graph-based (Cao et al., 2012; Xue et al., 2013;
Yang et al., 2012) approaches. Chen et al. (2015) constructed a
huge ground-truth dataset consisting of 6.5 million spam tweets
and 6 million non-spam tweets, and then conducted a
comprehensive evaluation of different machine learning algorithms
using lightweight features.
Gao et al. (2012) presented an online
spam filtering system as a component of online social network
platforms. They aggregated user-generated messages into
campaigns using an incremental clustering algorithm and
used six features to distinguish spam campaigns.
Cao et al. (2012) designed an inference scheme to detect fake
accounts by computing the landing probability of early-terminated
random walks. Yang et al. (2013) further explored friend
invitation graphs and developed a detection system based on them.
However, because spammers can rapidly change their tactics
to evade such detection (Zhu et al., 2012), these approaches
are not very effective.
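To make the random-walk idea mentioned above more concrete, the following minimal Python sketch propagates walk probability from a set of trusted seed accounts over an undirected social graph for a small, fixed number of steps. It is an illustration under simplifying assumptions, not the scheme of Cao et al. (2012): the function name `landing_probabilities`, the uniform choice among seeds, and the edge-list input format are all hypothetical.

```python
from collections import defaultdict


def landing_probabilities(edges, seeds, steps):
    """Probability that a short random walk started at a seed lands on each node.

    edges: iterable of (u, v) pairs describing an undirected graph (assumed input format).
    seeds: trusted accounts where walks start; each must appear in `edges`.
    steps: number of walk steps; keeping it small terminates the walk early.
    """
    # Build an undirected adjacency list.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    missing = [s for s in seeds if s not in neighbors]
    if not seeds or missing:
        raise ValueError("seeds must be non-empty and present in the graph")

    # Every walk starts at a seed chosen uniformly at random.
    prob = {node: 0.0 for node in neighbors}
    for s in seeds:
        prob[s] += 1.0 / len(seeds)

    # At each step the walker moves to a uniformly chosen neighbor.
    for _ in range(steps):
        nxt = {node: 0.0 for node in neighbors}
        for node, p in prob.items():
            share = p / len(neighbors[node])
            for nb in neighbors[node]:
                nxt[nb] += share
        prob = nxt
    return prob
```

Because the walk is stopped after only a few steps, accounts that are connected to the seed region through few edges receive little probability, which is the intuition behind ranking accounts by this score.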
To meet this challenge, researchers have developed
countermeasures. Yang et al. (2013) identified and verified
several common evasion strategies used by spammers,
designed more sophisticated detection features based on
this analysis, and then proposed a formal model to evaluate
the robustness of the new features.
Fu et al. (2015) introduced
user carefulness, a metric that indicates how careful a
user is when following another user, and subsequently used
this parameter to adjust and improve existing features
and methods.
Chen et al. (2017) focused on the “Twitter spam
drift” problem, in which spammers post more tweets with
similar semantic meaning but different text to evade
detection; they proposed the “Lfun” approach, which learns from
unlabeled tweets, to address this problem.
However, the weakness of the above approaches is that they
do not address the critical issue: they still treat social
networks as static systems, whereas spammers are
constantly finding new evasion techniques. To address this problem,
we propose a dynamic metric to describe the temporal patterns
of users and develop a novel method for identifying spammers.
This is one of the main differences between our method and
previous approaches.
3. Preliminary
Before illustrating our study in detail, we provide the
motivation behind our work and the assumptions used in our
approach.
3.1. Motivation
As mentioned earlier, the means that spammers use to evade
detection are becoming increasingly sophisticated. Even if an
inspection system can discover the majority of the
spam accounts in one period, its capacity to do so in another
period is uncertain.
The main reason for this new challenge is that most
detection mechanisms characterize users on the basis of their
features at a single point in time, whereas spammers
continuously optimize their spamming strategies. This aspect motivated
us to gain deeper insight into users in terms of their temporal
evolution patterns and to design a detection system based on them.
3.2. Assumptions
To make the proposed approach more reasonable, we make the
following assumptions based on observations of the dataset
and the experience of previous studies.
Assumption 1. Spammers constantly change their spamming
strategies.
This assumption is mentioned and utilized in many existing
approaches, such as Liu et al. (2016) and Tan et al. (2013). The
reason behind this assumption is easy to understand: as the
intensity of detection increases, only those spammers who
adjust their strategies according to the detection method will
be able to survive. Meanwhile, legitimate users do not have
to make these adjustments and thus form relatively
non-volatile patterns.
Assumption 2. Spammers tend to control a large number of
accounts to spread spam.
The profits of spammers’ activities depend on the
number of users that their spam messages can reach.
Because features based on bursty properties, such as time
intervals, are broadly adopted, spammers cannot generate a massive
amount of spam using only a few accounts. Therefore,
spammers usually create or compromise a significant
number of accounts and use them to spread spam to a large
set of users, which results in spam accounts
with similar behavioral patterns.
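As a rough illustration of the kind of time-interval feature referred to above, the following Python sketch summarizes the gaps between consecutive posts of a single account. It is a minimal sketch under stated assumptions: the function name `interval_features`, the use of numeric timestamps in seconds, and the choice of mean and standard deviation as summary statistics are hypothetical and are not taken from any of the cited detectors.

```python
from statistics import mean, pstdev


def interval_features(timestamps):
    """Summarize the gaps (in seconds) between consecutive posts of one account.

    timestamps: posting times as numbers, e.g. Unix seconds (assumed format).
    Returns a dict with the mean and population standard deviation of the gaps,
    or None values when the account has fewer than two posts.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return {"mean_gap": None, "std_gap": None}
    return {"mean_gap": mean(gaps), "std_gap": pstdev(gaps)}
```

An account that posts in short, regular bursts yields a small mean gap and a small standard deviation, so a detector using such a feature limits how much spam any single account can send without standing out.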
4. Dynamic metric
In this section, we first present the activity measures that are
used to build our dynamic metric. These activity measures
principally consist of features based on users’ activities. We then
illustrate the proposed dynamic metric and the new features
for characterizing the evolution patterns and detecting
spammers.
4.1. Activity measures
The features that we use as activity measures to build the
dynamic metric are divided into two categories: graph-based
and non-graph-based. For a spammer, the first step in spreading
malicious information in social networks is to establish
social relationships with other users; thus, features based on
the social graph are a primary source of information about users’
preferences and characteristics. We select four of these features:
degree centrality, bidirectional link ratio, betweenness centrality, and
62 computers & security 72 (2018) 60–73