2. We develop algorithms for approximate and exact pri-
vate nearest neighbor search. We utilize data mining
techniques to optimize query execution.
3. We show experimentally that the cost is reasonable;
hence our methods are applicable in practice.
2. RELATED WORK
Most existing approaches for private location-dependent
queries follow the user-anonymizer-LBS framework of Fig-
ure 1. The anonymizer sends to the LBS a Cloaking Region
(CR) instead of the actual user location; this procedure is
called cloaking. Ref. [11] combines spatial with temporal
cloaking. Each query q specifies a temporal interval δt that
the corresponding user u is willing to wait. If within δt,
K − 1 other clients in the vicinity of u also issue queries,
all these queries are combined in a single CR; otherwise, q
is rejected. In Casper [20], the anonymizer maintains the
locations of the clients using a pyramid data structure, sim-
ilar to a Quad-tree. Assume u asks a query and let c be the
lowest-level cell of the Quad-tree where u lies. If c contains
enough users (i.e., |c| ≥ K), c becomes the CR. Otherwise,
the horizontal c
h
and vertical c
v
neighbors of c are retrieved.
If |c ∪ c
h
| ≥ K or |c ∪ c
v
| ≥ K, the corresponding union of
cells becomes the CR; otherwise, the anonymizer retrieves
the parent of c and repeats this process recursively. In-
terval Cloak [14] is similar to Casper in terms of both the
data structure used by the anonymizer (a Quad-tree), and
the cloaking algorithm. The main difference is that Interval
Cloak does not consider neighboring cells at the same level
when determining the CR, but ascends directly to the an-
cestor level. Casper and Interval Cloak guarantee privacy
only for uniform distribution of user locations.
Hilbert Cloak [17] uses the Hilbert space filling curve to
map the 2-D space into 1-D values. These values are then
indexed by an annotated B
+
-tree. The algorithm partitions
the 1-D sorted list into groups of K users (the last group
may have up to 2K − 1 users). For querying user u the
algorithm finds the group to which u belongs, and returns
the minimum bounding rectangle of the group as the CR.
The same CR is returned for any user in a given group.
Hilbert Cloak guarantees privacy for any distribution of user
locations.
The previous approaches assume a static snapshot of user
locations and do not consider correlation attacks (e.g., his-
tory of user movement). In [5], correlation attacks are han-
dled as follows: At the initial timestamp t
0
, cloaking region
CR
0
is generated, which encloses a set AS of at least K
users. At a subsequent timestamp t
i
, the algorithm com-
putes a new anonymizing region CR
i
that encloses the same
users in AS, but contains their locations at timestamp t
i
.
There are two drawbacks: (i) As users move, the resulting
CR may grow very large, leading to prohibitive query cost.
(ii) If a user in AS disconnects from the service, the query
must be dropped. Furthermore, in [5] it is assumed that
there are no malicious users.
Privacy in LBS has also been studied in the context of
related problems. In [3], the CR is a closed region around
the query point, which is independent of the number of users
inside. Given CR, the LBS returns the probability that each
candidate result satisfies the query based on its location with
respect to the CR. Ref. [6], on the other hand, eliminates
the anonymizer by organizing users in a Peer-to-Peer sys-
tem. The querying user u searches his neighborhood until
he finds K − 1 other peers, which are used to construct the
CR. However, u tends to be close to the center of the CR;
therefore an attacker can identify u with high probability.
Ref. [12] also uses a Peer-to-Peer system to support dis-
tributed anonymization; although a centralized anonymizer
is not required, all users must trust each other. More rele-
vant to our work is the approach of [18]: In a preprocessing
phase, a trusted third party transforms (using 2-D to 1-D
mapping) and encrypts the database. The database is then
uploaded to the LBS, which does not know the decryption
key. All users possess tamper-resistant devices which store
the decryption key, but they do not know the key them-
selves. Users send encrypted queries to the LBS and decrypt
the answers to extract the results. The method assumes that
none of the tamper-resistant devices is compromised. If this
condition is violated, the privacy of all users is threatened.
Our work builds on the theoretical results of Private In-
formation Retrieval (PIR), which is defined as follows: a
server S holds a database with n bits, X = (X
1
. . . X
n
). A
user u has a particular index i and wishes to retrieve the
value of X
i
, without disclosing to S the value of i. The PIR
concept was introduced in [4] in an information theoretic
setting, requiring that even if S had infinite computational
power, it could not find i. It is proven in [4] that in any solu-
tion with a single server, u must receive the entire database
(i.e., Θ(n) cost). Nevertheless, in practice, it is sufficient
to ensure that S cannot find i with polynomial-time com-
putations; this problem is known as Computational PIR.
[19] showed that the communication cost for a single server
is Θ(n
ε
), where ε is an arbitrarily small positive constant.
Our work employs Computational PIR.
Several approaches employ cryptographic techniques to
privately answer NN queries over relational data. Most of
them are based on some version of the secure multiparty
computation problem [13]. Let two parties A and B hold ob-
jects a and b, respectively. They want to compute a function
f(a, b) without A learning anything about b and vice versa.
They encrypt their objects using random keys and follow a
protocol, which results into two “shares” S
A
and S
B
given to
A and B, respectively. By combining their shares, they com-
pute the value of f. In contrast to our problem (which hides
the querying user from the LBS), existing NN techniques as-
sume that the query is public, whereas the database is parti-
tioned into several servers, neither of which wants to reveal
their data to the others. Ref. [25] assumes vertically parti-
tioned data and uses secure multiparty computation to im-
plement a private version of Fagin’s [8] algorithm. Ref. [23]
follows a similar approach, but data is horizontally parti-
tioned among the servers. The computation cost is O(n
2
)
and may be prohibitive in practice. Ref. [1] also assumes
horizontally partitioned data, but focuses on top-k queries.
More relevant to our problem is the work of [16] which
uses PIR to compute the NN of a query point. The server
does not learn the query point and the user does not learn
anything more than the NN. To achieve this, the method
computes private approximations of the Euclidean distance
by adapting an algorithm [9] that approximates the Ham-
ming distance in {0, 1}
d
space (d is the dimensionality). The
cost of [16] is
˜
O(n
2
) for the exact NN and
˜
O(
√
n) for an ap-
proximation through sampling. The paper is mostly of the-
oretical interest, since the
˜
O notation hides polylogarithmic
factors that may affect the cost; the authors do not provide
any experimental evaluation of the algorithms.