We will try to give some intuition regarding the meaning of the guarantee. Suppose
the database contains social security numbers and web search histories, and
consider the query “How many people in the database have social security number
N and searched for ‘embarrassing medical condition’ 3 times in the past week?”
The true answer to this question is effectively yes or no (the count is either 0 or 1),
but the privacy mechanism may produce arbitrary outputs; these are then
interpreted as yes or no by the user.
Differential privacy should obscure N’s presence or absence in the database, as
well as whether or not the search history fits the profile. So consider an adversary
that interprets the response to the query, deterministically mapping responses to
{yes, no}. Let S_i, i ∈ {yes, no}, denote the pre-image of i under this mapping.
The privacy concern here is unbalanced: N does not want to be associated with
the embarrassing query, and a response interpreted as yes is undesirable. Let α be
the probability, over the coin flips of the mechanism, of producing a response in
S_yes when N is not in the database. Intuitively, we want to ensure that either α is small
or the interpretation is meaningless—if an output is frequently interpreted as yes,
even when N is not in the database, then the interpretation means nothing.
For concreteness, suppose ε = ln 3, so e^ε = 3. If, say, α < 1/7, then even if N is
in the database and satisfies the profile, the response is more likely to be mapped
to no, and if α is very small then the increased factor of three still results in a
very small probability of yes. On the other hand, suppose α = 1/2. Then in
some sense the system, together with the interpretation, is silly, since even when
N is present the probability of an incorrect interpretation is at least 1/6. To see
this, let P(N) be the Boolean variable describing whether or not N fits the profile.
Since α = 1 − α = 1/2, we have

    1/2 = Pr[S_{¬P(N)} | N OUT] ≤ 3 Pr[S_{¬P(N)} | N IN],

so Pr[S_{¬P(N)} | N IN] ≥ 1/6.
Finally, if α is large then the “bad” event of interpreting the response as yes
happens with significant probability even when N is not in the database, so there is
little harm in joining.
The argument we have just given can be interpreted in terms of hypothesis testing;
see, for example, Wasserman and Zhou [62] for more discussion.
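To make the arithmetic in the numerical example above concrete, here is a minimal sketch (not from the original text; the function names are ours) that evaluates the two bounds that ε-differential privacy with e^ε = 3 imposes on the adversary's interpretation.

import math

eps = math.log(3)  # the epsilon of the running example, so e^eps = 3

def max_yes_prob_when_in(alpha):
    # Differential privacy bounds Pr[S_yes | N IN] <= e^eps * Pr[S_yes | N OUT] = 3 * alpha.
    return math.exp(eps) * alpha

def min_wrong_prob_when_in(alpha):
    # From Pr[S_wrong | N OUT] <= e^eps * Pr[S_wrong | N IN] it follows that
    # Pr[S_wrong | N IN] >= Pr[S_wrong | N OUT] / e^eps = alpha / 3.
    return alpha / math.exp(eps)

print(max_yes_prob_when_in(1 / 7))    # about 0.43 < 1/2: "no" remains the more likely interpretation
print(min_wrong_prob_when_in(1 / 2))  # about 0.167 = 1/6: the misinterpretation bound cited above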
5. Consider “positive” responses, i.e., those interpreted as yes. Differential privacy
may be achieved not only by reducing the probability of a true positive, but also by
increasing the probability of a false positive. In other words, by indiscriminately
implicating people who may not even be in the database, regardless of whether
or not they satisfy the profile, we provide “cover” for the true positives. This is
the same philosophy as in randomized response [61], which indeed provides some
differential privacy: an embarrassing response may simply be the result of a coin
flip.
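The randomized response scheme of [61] can be made concrete with a minimal sketch (one common parameterization; the function name and the fair-coin choices are ours), which satisfies (ln 3)-differential privacy and so matches the ε = ln 3 used in the numerical example above.

import random

def randomized_response(truth):
    # With probability 1/2 report the truth; otherwise report the outcome of a
    # second, independent fair coin. An embarrassing "yes" may therefore be
    # nothing more than the result of a coin flip.
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

# Pr[report yes | truth yes] = 3/4 while Pr[report yes | truth no] = 1/4 (and
# symmetrically for "no"), so no output changes the likelihood ratio by more
# than a factor of 3 = e^{ln 3}: the scheme is (ln 3)-differentially private.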
6. Definition 1 extends to group privacy as well (and to the case in which an in-
dividual contributes more than a single row to the database): changing a group
of k rows in the data set induces a change of at most a multiplicative e^{kε} in the
corresponding output distribution.
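The e^{kε} factor follows by applying Definition 1 once per changed row. As a sketch (the notation M for the mechanism and S for an arbitrary set of outputs is ours, since Definition 1 is not reproduced in this excerpt): let x_0, x_1, . . . , x_k be any sequence of data sets in which consecutive data sets differ in a single row, so that x_0 and x_k differ in k rows. Then

    Pr[M(x_0) ∈ S] ≤ e^ε Pr[M(x_1) ∈ S] ≤ e^{2ε} Pr[M(x_2) ∈ S] ≤ · · · ≤ e^{kε} Pr[M(x_k) ∈ S].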