1939-1374 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2017.2762296, IEEE
Transactions on Services Computing
4
denote hospital 1 and h2 to denote hospital 2. The children
nodes of role nodes are medical diagnosis nodes and other
laboratory inspection and test nodes. Only these leaf nodes
contain EHR data, such as prescriptions and diagnosis and
so on. Thus, the tree has two types of nodes: leaf nodes
that contain real EHR data and the internal nodes of upper
levels that are actually indices for EHR data. The former
nodes are called data nodes, which are represented by the
rectangle, and the latter nodes are called index nodes, which
are represented by the oval. Obviously in this structure, all
data nodes are nested according to a role node, so that they
can be expediently retrieved by different roles of doctors.
In the preceding EHR data structure, the index nodes
in level 2 can be used as the keywords for search. When
the cloud finds the matched index nodes according to the
queried keywords, it returns the data nodes to the data
users. However, in such an inverted index-based structure,
not only the EHR data need to be encrypted when they
are stored in a remote healthcare cloud, but also the index
should be encrypted, since it is easy to infer sensitive
information about the patient from the index nodes. In this
paper, searchable encryption schemes are used to encrypt
the index and support search operations over encrypted da-
ta. The encryption scheme used to encrypt EHR data can be
any secure searchable encryption scheme. The data owner
can choose different encryption schemes based on different
healthcare application scenarios and their respective privacy
requirements.
In the rest of this paper, we let D = {d
1
, d
2
, · · · , d
m
}
denote the set of EHR data to be stored in a third party and
possibly untrusted healthcare cloud, where |D| = m is the
total number of EHR data. According to the preceding EHR
data structure, each EHR data d
i
is indexed by one or more
keywords. Let I = {w
1
, w
2
, · · · , w
n
} be the index in D,
where |I| = n is the total number of keywords. D
i∈[n]
⊆ D
denotes the set of EHR data are indexed by keyword w
i
.
The keywords and the sets of EHR data are respectively
encrypted by encryption algorithms Enc and E, and are
stored in the healthcare cloud. Note again that Enc and E are
different encryption schemes: Enc is a searchable encryption
scheme and E can be any secure encryption scheme chosen
by the data owner.
2.2 Privacy Requirements in SE Schemes
Intuitively, an SE scheme is secure if the server learns
nothing about the query as well as the documents except
the encrypted query results [1], [4], [5], [6]. In this section,
we discuss in detail the specific privacy requirements for
index-based data storage structures, where the cloud server
searches over a set of searchable index instead of searching
on encrypted data directly.
Data privacy is a basic requirement for the EHR data such
that the outsourced EHR data should not be revealed in
any form to any unauthorized parties, including the cloud
service providers. Typically, it can be guaranteed by the
encryption algorithms. The user who has the secret key can
effectively decrypt the encoded EHR data after retrieving
them from the cloud server. The encryption based protection
needs to consider the following three types of requirements
for privacy protection: (i) leakage due to search keywords,
Patient
Encrypt
Search Token
Cloud
Result
Upload
Encrypt
Index
SSE
Fig. 3. The First Scenario: Owner as Reader/Writer
(ii) leakage due to search patterns, and (iii) leakage due to
access patterns.
Privacy of search keywords. This is related to plaintext
privacy: if the cloud server deduces any association between
frequent keywords and encrypted dataset from the index,
it may learn the main content of the EHR data. Therefore,
searchable index should be constructed in such a way that
prevents the cloud server from performing such kind of as-
sociation attacks. This kind of security is also called security
against chosen keyword attack (CKA). In an SE scheme, the
randomness of encrypted index guarantees to prevent such
attacks from the cloud server.
Privacy of search patterns. Data users usually prefer
to keep their query from being exposed to others, i.e.,
the keyword indicated by the corresponding token. In the
literature [7], [8], [9], [10], this kind of leakage is called
search pattern. Namely search patterns reveal whether the
same search was performed in the past or not. Accessing
the search pattern allows the server to use statistical analysis
and determine information about the query keywords. Us-
ing deterministic trapdoors directly leaks the search pattern.
Existing randomized SE schemes use randomly generated
tokens to guarantee the privacy of a user’s search pattern.
This security notion of randomized SE schemes is called
predicate privacy [6]. Note that randomizing token generation
algorithm only contributes to defend outside adversaries
of the cloud server but not inner adversaries (e.g., cloud
administrators), because the entry of index touched in
each search process discloses the search pattern as well.
A possible approach to reduce this leakage is to re-order
the keywords and re-generate the index periodically, say
semimonthly or monthly.
Privacy of access patterns. Besides the above privacy
requirements, we note that the sequence of search outcomes
of most SE constructions will likely reveal the information
of the keywords since the cloud server will always return
the same document set for the same queried keyword. This
kind of leakage is referred to as access pattern [7]. Access
pattern refers to the information that is implied by the
query results. Only ORAM can hide access patterns. But
ORAM is computationally intensive and do not scale well
for real world datasets. In practice, one may reduce (but not
eliminate) the leakage of access patterns. For example, one
could randomly insert fake documents in the bitmaps [11].
In this paper, we focus on the techniques for ensuring
plaintext privacy against any adversary and predicate privacy
against outsider adversaries of the cloud server. The above-
mentioned techniques can be combined with the four types
of SE schemes to reduce search and access pattern leakage