[10] in a system sharing many methodological similarities to our
iWatch.
A number of other tools independent of P3P have also been
developed over the years, including filtering and privacy-protecting
proxy servers, popup blockers, cookie blockers and
analysis tools, and anti-phishing tools. Because many of these
functions have since been absorbed by the latest
generation of web browsers, the number of such tools and the size
of their user base are unknown today.
Regardless of the underlying technology, HCI researchers have
been examining how to improve the usability and usefulness of
such systems, an early shortcoming of many of them. Classic papers
and studies include [37, 38]; this research showed that a secure
system would fail unless its security measures were made usable.
In recent years we have seen excellent papers on why phishing
attacks work [14, 13], and on how our tools and warnings tend to go
unheeded regardless of the information presented [39]. While these
are excellent results, more work clearly remains to be done in this
area, as there are far more studies of why things fail than of how
to make them succeed.
Our approach of harvesting and examining large amounts of data
with a web crawler has been employed by other security
and privacy researchers. Recently, this approach has produced
interesting results in the identification of malware- and
spyware-disseminating websites [30, 32]. In these studies, researchers were
able to scan and classify a large enough sample to convincingly
argue about the state of the Internet as a whole.
3. Definitions
Before diving into the meat of our study, it is important to define
certain terms in order to avoid misunderstanding or ambiguity.
Our definitions should usually match generally accepted
ones, but in some cases they are narrower, for practical
reasons.
In this paper, domain, web server, and website are used
interchangeably. While in the real world a given domain
can host many distinct sites, we differentiate between sites based
solely on domain names: a distinct domain name in our study
identifies a distinct domain. Our classification of domains was
very simplistic. We did not attempt to identify synonymous
domain names (www.theregister.co.uk is not recognized as a
synonym for www.theregister.com) or sub-domains
(news.bbc.co.uk and www.bbc.co.uk are not grouped together as
sub-domains of bbc.co.uk). The first is a hard problem and requires
either a set of records from domain registrars or extensive
hand-tuning. The second, though technically simple to implement,
would cause problems with hosting services and smaller or related
websites, which may lack unique second-level domain names.
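To make the domain rule concrete, here is a minimal Python sketch, assuming a site is identified by nothing more than the host name of a URL. The function names `site_key` and `naive_parent` are hypothetical and do not come from iWatch; the second function only illustrates the kind of naive sub-domain folding that the rule above deliberately avoids.

```python
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Identify a 'site' by the exact host name of a URL.

    No synonym detection and no sub-domain folding is attempted;
    urllib already lower-cases the host and drops scheme, port and path.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    return host


def naive_parent(host: str) -> str:
    """Hypothetical sub-domain folding that keeps only the last two labels.

    Shown only to illustrate the pitfall: it merges unrelated sites on a
    shared hosting service and reduces country-code names such as
    news.bbc.co.uk to the meaningless 'co.uk'.
    """
    return ".".join(host.split(".")[-2:])
```

Under this sketch, `site_key("http://www.theregister.com/")` and `site_key("http://www.theregister.co.uk/")` yield two different sites, exactly as the definition states, while `naive_parent` would lump `alice.example-host.com` and `bob.example-host.com` (hypothetical hosted sites) into a single domain.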
We will also use the terms 1st party and 3rd party frequently. In
this context a 1st party typically refers to the domain or website
which served the page, and a 3rd party is any other
domain/website which either receives information about the
transaction or supplies information or resources used by the
requested page. Examples are 3rd party cookies, webbugs, and
banner ads.
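The 1st/3rd party distinction can be sketched as a comparison between the host of the page and the host of each resource it references. This is a minimal illustration, assuming exact host comparison in line with the domain definition above; the function name `party` is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def party(page_url: str, resource_url: str) -> str:
    """Return '1st' or '3rd' for a resource referenced by a page.

    Assumption: hosts are compared exactly, with no synonym or
    sub-domain merging, mirroring the domain definition above.
    """
    page_host = urlsplit(page_url).hostname
    # Resolve relative references (e.g. "img/logo.gif") against the page first.
    res_host = urlsplit(urljoin(page_url, resource_url)).hostname
    return "1st" if res_host == page_host else "3rd"
```

Note that under this exact-host rule an image on www.bbc.co.uk served from news.bbc.co.uk counts as 3rd party, which is the trade-off the simplistic domain classification accepts.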
In this paper we discuss technologies such as P3P policies,
webbugs, cookies, popups, and banners. P3P stands for the
Platform for Privacy Preferences, a standard for specifying
privacy policies in a machine-readable XML format [8]. There are
two types of P3P policy: the compact policy (CP) and the full
policy. The compact policy is an abbreviated, keyword-based form of
the policy; it offers less detail and nuance, but is often used by
browsers to filter cookies. We use the terms P3P and P3P policy
interchangeably in this paper.
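A compact policy is typically delivered in a `P3P` HTTP response header whose value contains a `CP="..."` field of space-separated keyword tokens, optionally next to a `policyref="..."` field that links to a full policy. The sketch below, assuming a well-formed header value, splits such a value into its parts; `parse_p3p_header` is a hypothetical helper, and it does not validate the tokens against the P3P vocabulary.

```python
import re

# Matches key="value" fields such as CP="NOI DSP COR" or policyref="/w3c/p3p.xml".
_FIELD = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_p3p_header(value: str) -> dict:
    """Split a P3P header value into its policy reference and CP tokens."""
    fields = {key.lower(): val for key, val in _FIELD.findall(value)}
    return {
        "policyref": fields.get("policyref"),        # link to a full policy, if any
        "cp_tokens": fields.get("cp", "").split(),   # compact policy keywords
    }
```

For example, `parse_p3p_header('policyref="/w3c/p3p.xml", CP="NOI DSP COR"')` separates the reference `/w3c/p3p.xml` from the token list `["NOI", "DSP", "COR"]`, which a cookie filter could then inspect.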
The P3P protocol specifies three ways of publishing a P3P policy: in
the HTTP header (either as a compact policy or as a link to a
full policy), in the HTML document as a link tag, or at a
well-known location on the server. Because of some quirks in the way
web servers serve P3P policies (see the discussion in the
methodology section), the current version of iWatch only finds
policies posted in the HTTP header or in the body of the document;
it does not search the well-known location. To fetch these
remaining policies without bringing the crawler to a halt, we
delegate this task to a standalone program.
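The split between inline discovery and a separate fetcher can be sketched as follows. This is a minimal illustration, not iWatch's code: `policy_locations` is a hypothetical helper that reads the `P3P` header and any `<link rel="P3Pv1">` tags from an already-downloaded page, and only computes the `/w3c/p3p.xml` well-known URL (the location defined by the P3P 1.0 specification) so it can be queued for a standalone program rather than fetched by the crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/w3c/p3p.xml"  # well-known location defined by P3P 1.0


class _P3PLinkFinder(HTMLParser):
    """Collect href values of <link rel="P3Pv1" href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "p3pv1" and a.get("href"):
            self.hrefs.append(a["href"])


def policy_locations(page_url: str, headers: dict, html: str) -> dict:
    """Report where a page's P3P policy might be found (hypothetical helper)."""
    # HTTP header names are case-insensitive.
    header = next((v for k, v in headers.items() if k.lower() == "p3p"), None)
    finder = _P3PLinkFinder()
    finder.feed(html)
    return {
        "header": header,  # compact policy and/or policyref, found inline
        "link_tags": [urljoin(page_url, h) for h in finder.hrefs],  # found inline
        "well_known": urljoin(page_url, WELL_KNOWN_PATH),  # queued, not fetched here
    }
```

Keeping the well-known lookup out of the crawl loop means a slow or misconfigured server cannot stall page processing, which mirrors the motivation given above for delegating it.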
Privacy seals are, in this paper, a combination of different
certificates or trustmarks issued by TRUSTe and BBBOnline
(the BBBPrivacy and BBBReliability seals). These seals certify that
the site discloses or follows a minimum set of privacy-protection
and security practices. While different seals or certificates are
enforced by different agencies, have different meanings, and offer
different enforcement mechanisms and guarantees, they are all
meant to calm potential users' concerns. Given their relatively low
usage, the different seal programs are grouped together
for most of our analysis.
Webbugs, also known as web beacons or pixel tags, are a
collection of techniques used to tag and collect information about
web and email users without their knowledge. In a web page,
webbugs are typically used to track users navigating a given site,
and they have become quite ubiquitous. Technically, webbugs can be
implemented through a number of different techniques, but they are
most commonly associated with a transparent 1x1 pixel GIF,
invisible to the user. Webbugs are often used to augment the
tracking available with cookies, and they are most troubling when set
by third parties, usually without user knowledge or consent. In
iWatch we group a number of tracking techniques under the label
of webbugs, but only when these are set and used by 3rd parties.
We do not classify banner ads or 3rd party cookies as webbugs,
but rather track these separately.
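The classic 1x1 pixel form of a webbug can be flagged with a simple markup heuristic. The sketch below is hypothetical (the class name `WebbugFinder` is not from iWatch) and assumes two things: that the image declares its size through `width`/`height` attributes, and that 3rd party status is decided by exact host comparison. A real detector would also need to check the fetched image's actual dimensions and cover the other tracking techniques mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class WebbugFinder(HTMLParser):
    """Flag <img> tags declared as 1x1 pixels and served by a 3rd party.

    Hypothetical heuristic: it inspects only the width/height attributes
    in the markup, not the downloaded image, and compares hosts exactly.
    """

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.webbugs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src")
        if not src:
            return
        url = urljoin(self.page_url, src)
        tiny = a.get("width") == "1" and a.get("height") == "1"
        third_party = urlsplit(url).hostname != self.page_host
        # 1st party tiny images (e.g. layout spacers) are deliberately ignored.
        if tiny and third_party:
            self.webbugs.append(url)
```

Feeding it a page that contains both a 1st party spacer GIF and a 1x1 image from a tracker host reports only the tracker image, matching the rule that only 3rd party tracking counts as a webbug.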
Much has been written about cookies, so a discussion of how
they work and of their potential threats to user privacy is omitted
here. We note only that this work tracks the three
main categories of cookies separately: session cookies, defined as
cookies set by the 1st party that expire with the browsing
session; 1st party cookies, set by the 1st party to persist
beyond the session; and 3rd party cookies, which are set for any
domain other than the 1st party.
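The three cookie categories can be sketched as a classifier over `Set-Cookie` header values. Several details here are assumptions rather than the paper's stated method: a cookie's domain is taken from its `Domain` attribute, falling back to the host of the response that set it; it counts as 1st party when the page host domain-matches that domain (in the style of RFC 6265); and it is a session cookie when it carries neither `Expires` nor `Max-Age`. The function name `classify_cookie` is hypothetical.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


def classify_cookie(page_url: str, response_url: str, set_cookie: str) -> dict:
    """Map each cookie in one Set-Cookie header value to its category."""
    page_host = urlsplit(page_url).hostname
    jar = SimpleCookie()
    jar.load(set_cookie)
    result = {}
    for name, morsel in jar.items():
        # Domain attribute if present, else the host that sent the response.
        domain = (morsel["domain"] or urlsplit(response_url).hostname).lstrip(".").lower()
        first_party = page_host == domain or page_host.endswith("." + domain)
        persistent = bool(morsel["expires"] or morsel["max-age"])
        if not first_party:
            result[name] = "3rd party"
        elif persistent:
            result[name] = "1st party"
        else:
            result[name] = "session"
    return result
```

For instance, `sid=abc; Path=/` set by the page's own host is a session cookie, the same host setting `uid=42; Max-Age=31536000` yields a persistent 1st party cookie, and any cookie set by an ad server's host is 3rd party regardless of its lifetime.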
Unsolicited popups, or just popups for short, refer to the much-hated
technique of opening new browser windows, typically for
the purpose of advertising. Related techniques include the
pop-under, a popup that tries to hide itself. Popups present a
potential danger to end-users, as they often serve up content for
third parties, enabling these to track users much like webbugs.
Popups have received less attention in recent years, as
blocking tools and techniques have become ubiquitous and
effective.
Web banners, or banners for short, do not present a privacy risk in
and of themselves unless served by a third party. In that case,
they serve much the same function as a webbug, though they at least
remain visible to the user. Banners in our study are identified
by their size (these are the standardized sizes set by the Internet