Practical Private Information Retrieval Supporting
Keyword Search in the Cloud
Mengke Yu
∗
, Kaichen Yang
∗
, Lingbo Wei
†
and Jinyuan Sun
‡
∗
Key Laboratory of Electromagnetic Space Information
University of Science and Technology of China
Email: yumk@mail.ustc.edu.cn, ykcdxt@mail.ustc.edu.cn
†
Shanghai Jiao Tong University, Email: weilib@hotmail.com
‡
University of Tennessee, Email:jysun@utk.edu
Abstract—In cloud computing environment, just protecting
the contents of the queries from users to a large database server
is far away from enough. Because it does not protect the leak of
access patterns from careful observations. It is thus important to
make sure the server learning nothing about the queries including
access patterns. However, this implies an expensive computation
or communication cost of all the data on the server. Existing
solutions are not efficient due to their impractical communication
and computation cost. Besides, most of them do not support
keyword search. In this paper, we introduce the mechanism
of pricing to solve the problem of impractical cost. Using our
scheme called KSPIR, we achieve the minimum communication
and computation cost according to the flexible privacy and budget
specified by users. It is indeed a kind of tradeoff between the
cost of retrieval and the degree of privacy. It is worth noting
that it also supports keyword search. It allows users to retrieve
the data items containing the keywords they are interested in.
The experimental results confirm the correctness and efficiency
of KSPIR.
I. INTRODUCTION
With the rapid development of cloud computing, out-
sourced storage architectures become more prevalent than ever.
Data owners do not have to purchase expensive hardware for
storing or managing a large amount of data locally. More and
more data owners, therefore, are willing to upload their data to
cloud servers. Users who are interested in the data can retrieve
them anywhere from terminals with great flexibility.
But it also brings a threat to users’ privacy. Cloud servers
are untrusted and vulnerable to malicious attacks. Encryption
protects data privacy and prevents unauthorized accesses, but it
is far away from enough because it does not protect the leakage
of access patterns from careful observations. Traffic analysis
techniques can reveal sensitive information against privacy
[1]. For example, when users are in personalized search and
recommendation services through service providers, such as
Google, access history could disclose users’ retrieval habits or
identities; the frequencies of requests can reveal the popularity
of data; accesses to the same data may indicate the relationship
between multiple users.
This work was supported in part by the Natural Science Foundation of
China (NSFC) under Grants 61202140 and 61328208, by the Program for
New Century Excellent Talents in University under Grant NCET-13-0548, by
the Innovation Foundation of the Chinese Academy of Sciences under Grand
CXJJ-14-S132, and by the Youth Innovation Promotion Association of the
Chinese Academy of Sciences.
Fortunately, Private Information Retrieval (PIR) is a tech-
nique allowing users’ retrieving, without the server learning
which data items have been retrieved. It not only protects the
contents of queries, but also protects users’ access patterns.
In the formal PIR setting, the database is modeled as a n-
bit string DB = m
1
, ..., m
n
. The user obtains a particular
bit m
i
, and the server does not learn i. It is the foundation
of many privacy-sensitive applications, including anonymous
email, patent databases, domain name registration, and anony-
mous communication networks .
Obviously, a trivial but information-theoretically secure
way for users is to download the entire database and make
requests after decrypting the whole database. After that, they
then encrypt and upload them to the server again. However,
the communication cost is too huge to be practical. There-
fore, researchers came up with a large quantity of non-trivial
solutions. The number of bits transferred between users and
the server has to be smaller than the size of the whole
database. Unfortunately, the deployment of these protocols
on real hardware is mostly orders of magnitude slower than
trivially transferring of the entire database [2].
According to the formal PIR setting, most existing PIR
solutions only allow the retrieval by the address i, but i is not
available to users in cloud computing environment. To tackle
this challenge, extensions of the basic PIR [3][4][5] supporting
keyword search are proposed. Our paper is inspired by Michael
et al. [3], because it is among the best to design the scheme
which can also support keyword search and protect access
patterns. However, the bad news is that the hard problem of
huge cost in communication and computation also exists here.
This is exactly the problem we are going to solve.
As we all know, applying PIR to the database consisting of
n data items can make sure that the server cannot recognize
retrieving requests among n items. But users do not always
have such strict requirements on privacy. Sometimes, users
may only expect their retrieving requests to be hidden among
k data items (k << n). A relate concept is k-Anonymity
in which the probability to recognize an individual correctly
is at most 1/k. We introduce the mechanism of pricing here
and propose a scheme called KSPIR that achieves the tradeoff
between the cost of retrieval and the degree of privacy. That
is to say, users can sacrifice their privacy to exchange for less
cost or achieve higher level of privacy at the expense of higher
cost. More specifically, users can design the level of privacy
they need and the money they can afford at most. Then the
2014 Sixth International Conference on Wireless Communications and Signal Processing (WCSP)
978-1-4799-7339-2/14/$31.00 ©2014 IEEE