probability of successfully identifying the individuals from the counting queries to be no greater than $\rho \leq \frac{1}{10}$ (maximum inference probability). In order to achieve this protection goal with the existing theoretical result [15] (more details will be given in Section 3), we thus have: the upper bound of the differential privacy budget should satisfy $\epsilon \leq \frac{\Delta f}{\Delta v} \ln \frac{(n-1)\rho}{1-\rho}$, where $n$ represents the number of records, $\Delta f$ is the sensitivity of the query, $\Delta v$ is the maximum distance between the function values of every possible world (the same information needed to calculate the global sensitivity) [15], and $\rho$ is the maximum inference probability. Then, given $n = 600{,}000$ records, the bound yields

$$\epsilon \leq \frac{1}{1} \ln \frac{(600{,}000 - 1)\left(\frac{1}{10}\right)}{1 - \frac{1}{10}} \approx 11.1 \quad (1)$$

where $\Delta v$ is no greater than 1 for count queries.
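For concreteness, the following sketch (our own illustration, not code from [15]; it simply evaluates the bound above) reproduces the calculation in Eq. (1):

```python
import math

def epsilon_upper_bound(n, rho, delta_f=1.0, delta_v=1.0):
    """Upper bound on the privacy budget derived in [15]:
    epsilon <= (delta_f / delta_v) * ln((n - 1) * rho / (1 - rho))."""
    return (delta_f / delta_v) * math.log((n - 1) * rho / (1 - rho))

# Count query (delta_f = delta_v = 1), n = 600,000 records,
# and maximum inference probability rho = 1/10.
print(epsilon_upper_bound(600_000, 0.1))  # ~11.107, i.e. roughly 11.1
```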
In the above example, the upper bound of $\epsilon$ would be 11.1, which might exceed our expectation. In other words, such a large $\epsilon$ can satisfy $\rho \leq \frac{1}{10}$ in their interpretive inference model but can be vulnerable in other cases (for instance, in our proposed interpretive inference model, $\epsilon = 11.1$ would result in $\rho > \frac{1}{10}$), i.e., higher privacy risks than the data owners demand.
Furthermore, in the interpretive inference model proposed in [15], $\epsilon$ is proportional to $\ln(n)$, where $n$ is the data size. As $n$ increases, the upper bound of $\epsilon$ also increases. For a very large or very small $n$, the derived bound becomes meaningless (unbounded or negative, respectively).
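A quick numerical check (again our own sketch, assuming $\Delta f = \Delta v = 1$ and $\rho = \frac{1}{10}$) illustrates this behaviour for several data sizes:

```python
import math

rho, delta_f, delta_v = 0.1, 1.0, 1.0
for n in (5, 10, 1_000, 600_000, 10**9):
    eps = (delta_f / delta_v) * math.log((n - 1) * rho / (1 - rho))
    print(f"n = {n:>10}: epsilon bound = {eps:6.2f}")

# n = 5 yields a negative bound (the argument of ln is below 1), n = 10
# yields 0.0, and the bound then grows without limit as ln(n).
```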
From the above examples, we can see that existing
solutions have their inherent drawbacks. Motivated
by such observations, we propose a novel interpretive
inference model, which can be used to evaluate the
probability or confidence that the adversary will be able to identify any individual from the noise-injected query results over the dataset. This enables us to
understand the privacy implications of differentially
private techniques in a much clearer way.
1.3. Our Contributions
The major contributions of this paper are summarized
as follows.
• This paper presents a novel interpretive inference
model to convert the privacy bound in
differential privacy to inference probabilities that
any individual is included in the input data
(for queries). The proposed interpretive inference
model and the converted inference probabilities address the drawbacks of the existing models [15, 24].
• Based on the proposed interpretive inference
model, we present an instantiation for choosing an appropriate $\epsilon$ (the maximum privacy bound in differential privacy), which should effectively bound
the risks of inferring the presence/absence of indi-
viduals (given the maximum inference probabil-
ity) in generic differentially private algorithms.
• An in-depth theoretical analysis of our approach
is provided, and a set of experiments are
conducted to confirm the effectiveness of our
approach.
The rest of the paper is organized as follows. In
Section 2, we describe the preliminaries for differential
privacy. In Section 3, we present the analysis for two
representative existing works. Then, in Section 4, we
propose our interpretive inference model and the upper
bound for $\epsilon$ in differential privacy (given the maximum
inference probability). Section 5 demonstrates the
experimental results, and Section 6 reviews related
work. Finally, Section 7 gives the concluding remarks.
2. Preliminaries
In this section, we will first describe the basic
mechanism of differential privacy, and then present
the Laplace distribution, which underlies a generic differentially private approach.
2.1. Differential Privacy
The most commonly-used definition of differential privacy is $\epsilon$-differential privacy, which guarantees that any individual tuple has negligible influence on the published statistical results, in a probabilistic sense. Specifically, a randomized algorithm $A$ satisfies $\epsilon$-differential privacy if and only if, for any two databases $D_1$, $D_2$ that differ in exactly one record and any possible output $O$ of $A$, the ratio between the probability that $A$ outputs $O$ on $D_1$ and the probability that $A$ outputs $O$ on $D_2$ is bounded by a constant. Formally, we have

$$\frac{Prob(A(D_1) = O)}{Prob(A(D_2) = O)} \leq e^{\epsilon} \quad (2)$$
where $\epsilon$ is a constant specified by the user, $D_1$, $D_2$ differ in at most one element, and $e$ is the base of the natural logarithms. Intuitively, given the output $O$ of $A$, it is hard for the adversary to infer whether the original data is $D_1$ or $D_2$, if the parameter $\epsilon$ is sufficiently small. Similarly, $\epsilon$-differential privacy also provides any individual with plausible deniability that her/his record was in the databases.
The earliest and most widely-adopted approach for enforcing $\epsilon$-differential privacy is the Laplace mechanism [4], which works by injecting random noise $x \sim Lap(\lambda)$ that follows a Laplace distribution into the output $O$ of the original deterministic algorithm, so that the randomized algorithm $A$ returns $O + x$; that is, $A(D) = O + x$, where $\lambda = \frac{\Delta f}{\epsilon}$.
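As a minimal illustration of this mechanism (a sketch under our own naming, using numpy for the noise draw; not code from [4]), a count query can be answered as follows:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale lambda = sensitivity / epsilon,
    returning the randomized output A(D) = O + x."""
    rng = np.random.default_rng() if rng is None else rng
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Count query over a toy dataset: the sensitivity (delta f) is 1, since
# adding or removing one record changes the count by at most 1.
ages = [25, 31, 47, 52, 60]
true_count = sum(1 for a in ages if a >= 40)
print(true_count, laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
```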