SURF: Detecting and Measuring Search Poisoning
Long Lu
College of Computing
Georgia Inst. of Technology
long@cc.gatech.edu
Roberto Perdisci
Dept. of Computer Science
University of Georgia
perdisci@cs.uga.edu
Wenke Lee
College of Computing
Georgia Inst. of Technology
wenke@cc.gatech.edu
ABSTRACT
Search engine optimization (SEO) techniques are often abused to
promote websites in search results, a practice known as blackhat
SEO. In this paper we tackle a newly emerging and especially
aggressive class of blackhat SEO, namely search poisoning. Unlike
other blackhat SEO techniques, which typically attempt to promote a
website’s ranking only under a limited set of search keywords
relevant to the website’s content, search poisoning techniques
disregard any term relevance constraint and are employed to poison
popular search keywords with the sole purpose of diverting large
numbers of users to short-lived, traffic-hungry websites for
malicious purposes.
To accurately detect search poisoning cases, we designed a novel
detection system called SURF. SURF runs as a browser component to
extract a number of robust (i.e., difficult to evade) detection
features from search-then-visit browsing sessions, and is able to
accurately classify malicious search user redirections resulting
from users clicking on poisoned search results. Our evaluation on
real-world search poisoning instances shows that SURF can achieve a
detection rate of 99.1% at a false positive rate of 0.9%.
Furthermore, we applied SURF to analyze a large dataset of
search-related browsing sessions collected over a period of seven
months starting in September 2010. Through this long-term
measurement study we were able to reveal new trends and interesting
patterns related to a great variety of poisoning cases, thus
contributing to a better understanding of the prevalence and
gravity of the search poisoning problem.
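The detection rate and false positive rate quoted above are standard classifier metrics: the fraction of poisoned sessions that are flagged, and the fraction of benign sessions that are wrongly flagged. The minimal Python sketch below only illustrates that arithmetic. The class and field names are hypothetical, and the counts are made-up values chosen to reproduce the two percentages; they are not SURF's evaluation data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix counts for a binary detector."""
    tp: int  # malicious redirections correctly flagged
    fn: int  # malicious redirections missed
    fp: int  # benign sessions wrongly flagged
    tn: int  # benign sessions correctly passed

    def detection_rate(self) -> float:
        # Share of malicious cases that the detector flags.
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Share of benign cases that the detector wrongly flags.
        return self.fp / (self.fp + self.tn)


# Illustrative counts only, not taken from the paper's dataset.
c = Confusion(tp=991, fn=9, fp=9, tn=991)
print(f"{c.detection_rate():.1%}")       # 99.1%
print(f"{c.false_positive_rate():.1%}")  # 0.9%
```

The two rates use different denominators (malicious versus benign cases), so each one can be quoted independently of the class balance in the test set.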
Categories and Subject Descriptors
H.3.3 [INFORMATION STORAGE AND RETRIEVAL]: Information Search
and Retrieval—Relevance feedback
General Terms
Security
Keywords
Search engine poisoning, Malicious search engine redirection,
Detection, Measurement
CCS’11, October 17–21, 2011, Chicago, Illinois, USA.
Copyright 2011 ACM 978-1-4503-0948-6/11/10 ...$10.00.
1. INTRODUCTION
Search engines, capable of digging the most relevant content out of
oceans of information, have become web surfers’ first choice when
seeking information on the web. In fact, for most websites more
than 70% of their visitors reach their pages through search
engines [6]. Therefore, website owners constantly strive to attract
more visits by optimizing their exposure in relevant search results.
To fulfill this need, web developers use a number of search engine
optimization (SEO) techniques, which can improve the visibility of a
website to search crawlers, highlight its relevance under certain
search terms, and promote its ranking in the search results.
Legitimate uses of SEO techniques are accepted and even
encouraged by search engines [1]. However, dishonest web developers
may choose to abuse these techniques in various ways to gain (or
cheat their way into) a favorable ranking in the search results, a
practice known as blackhat SEO. In this case, search crawlers are
presented with deceptive views of a website, which consist of
specially crafted webpages with inflated relevance to a set of
target search terms. Countermeasures against blackhat SEO have been
proposed mainly in the information retrieval community [18, 24],
but they have had very limited success against the recent surge of
blackhat SEO adopters [11]. Meanwhile, blackhat SEO has not
captured sufficient attention from the security community, perhaps
because such techniques have historically been employed by
non-harmful websites, including some high-profile ones [10], that
execute overly aggressive marketing strategies to win search users
from their competitors.
This paper tackles a newly emerging class of blackhat SEO
techniques developed by Internet miscreants to lure search users
into visiting malicious websites [7]. We refer to this new class of
blackhat SEO as search poisoning. Unlike other blackhat SEO
techniques, which typically attempt to promote a website’s ranking
only under a limited set of search keywords relevant to the
website’s content, search poisoning techniques disregard any term
relevance constraint. In practice, search poisoning techniques
target any search term that can maximize the number of incoming
search users (e.g., popular keywords). This sets search poisoning
apart from the SEO and blackhat SEO techniques adopted by regular
websites: if a regular website were promoted through search
poisoning, users landing on it via completely unrelated search terms
would likely be annoyed, and the website’s reputation could be
irreparably damaged. Therefore, we posit that search poisoning
cannot be used for legitimate purposes and is only useful to
short-lived, traffic-hungry websites that aim to attract search
users for malicious purposes.
We approach the search poisoning problem from a new angle
compared to previous work on blackhat SEO. We focus on detecting
malicious search user redirections, an essential component of