Abstract—Multi-dimension bucketization is a typical
framework for preventing privacy disclosure in microdata with
multiple sensitive attributes. However, it suppresses too many
tuples when the microdata have more
than 3 sensitive attributes. Moreover, it does not generalize
quasi-identifiers, which makes the anonymized data vulnerable to
linking attacks. To overcome these drawbacks, we
propose an improved bucketization framework, named
ANGELMS. ANGELMS first vertically partitions the sensitive
attributes into several independent tables, then bucketizes
them according to the l-diversity principle and generalizes the
quasi-identifiers according to the k-anonymity principle. In
addition, we propose an MSB-KACA algorithm for the
k-anonymizing process of the ANGELMS framework.
Experiments show that the proposed framework generates
anonymized tables with less information loss and lower suppression
ratios than simple multi-dimension bucketization does.
I. INTRODUCTION
MICRODATA play an increasingly important role in
data analysis and scientific research. However,
publishing and sharing microdata threatens
individuals’ privacy. Therefore, several anonymity models
have been proposed to protect individuals’ privacy when
publishing microdata. k-anonymity [1], [2] is a simple and
effective model, which requires that each tuple in the released
data be indistinguishable from at least k-1 other tuples
with respect to the quasi-identifier. However,
it cannot resist homogeneity attacks or background
knowledge attacks, so enhanced anonymity models
have been proposed, such as l-diversity [3] and t-closeness
[4].
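The two requirements above can be checked mechanically. The following is a minimal sketch, not part of the proposed framework: it assumes rows are Python dictionaries, uses the simplest "distinct" reading of l-diversity (at least l different sensitive values per group), and all function names and the toy table are hypothetical.

```python
from collections import defaultdict


def group_by_qi(rows, qi_attrs):
    """Group rows (dicts) by their quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in qi_attrs)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, qi_attrs, k):
    """True if every QI group contains at least k tuples."""
    return all(len(g) >= k for g in group_by_qi(rows, qi_attrs).values())


def is_distinct_l_diverse(rows, qi_attrs, sensitive_attr, l):
    """True if every QI group has at least l distinct sensitive values.

    Assumption: "distinct l-diversity" only; other instantiations of
    l-diversity are stricter.
    """
    return all(
        len({r[sensitive_attr] for r in g}) >= l
        for g in group_by_qi(rows, qi_attrs).values()
    )


# Hypothetical, already generalized toy table (not taken from the paper).
released = [
    {"Age": "20-29", "ZipCode": "321**", "Disease": "flu"},
    {"Age": "20-29", "ZipCode": "321**", "Disease": "flu"},
    {"Age": "30-39", "ZipCode": "321**", "Disease": "flu"},
    {"Age": "30-39", "ZipCode": "321**", "Disease": "gastritis"},
]

print(is_k_anonymous(released, ["Age", "ZipCode"], 2))
print(is_distinct_l_diverse(released, ["Age", "ZipCode"], "Disease", 2))
```

The toy table is 2-anonymous, yet its first group carries a single disease value, which is exactly the situation a homogeneity attack exploits and the second check rejects.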
Manuscript received November 27, 2012. This work was supported by the
National Natural Science Foundation of China (No.61170108 and
No.6110019), the Natural Science Foundation of Zhejiang Province of China
(No.Y1100161), the Humanity and Social Science Foundation of Ministry of
Education of China (No.12YJCZH142).
F Luo is with the Department of Computer Science and technology,
Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC (e-mail:
lfw2565295@126.com).
J Han, corresponding author, is with the Department of Computer Science
and technology, Zhejiang Normal University, Jinhua, 321004, Zhejiang,
PRC (phone: 13750983528, e-mail: hanjm@zjnu.cn).
J Lu is with the Department of Computer Science and technology,
Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC (e-mail:
lujianfeng@zjnu.cn).
H Peng is with the Department of Computer Science and technology,
Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC (e-mail:
hpeng@zjnu.edu.cn).
Several techniques have also been proposed to implement
the above anonymity models. Generalization [1], [2] is a
typical one; its idea is to
replace the real values of the quasi-identifier with less specific but
semantically consistent values. Generalization distorts the original
data, which hampers data mining. Anatomy
[5] is another method for anonymizing microdata; its idea
is to release all the quasi-identifier and sensitive values
directly in two separate tables. However, releasing the
QI-values directly may lead to a higher breach probability
than generalization. To overcome these drawbacks, Tao et al.
[6] proposed ANGEL, an anonymization method that is as
effective as generalization in privacy protection but
retains significantly more information in the microdata. Leela
et al. [7] applied Angelization to preserve privacy in the
re-publication of dynamic microdata after insertions and deletions.
Li et al. [8] proposed slicing, which partitions the microdata
both horizontally and vertically.
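To make the contrast between generalization and Anatomy concrete, the following is a minimal sketch under stated assumptions: the interval width, the ZIP-code masking rule, the function names, and the toy rows are all hypothetical, and the grouping passed to `anatomize` is assumed to be given (a real Anatomy implementation would compute groups that satisfy l-diversity).

```python
def generalize_age(age, width=10):
    """Replace an exact age with an interval of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zipcode, keep=3):
    """Keep the first `keep` characters and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def anatomize(rows, groups, qi_attrs, sensitive_attr):
    """Anatomy-style release as two separate tables.

    Returns a quasi-identifier table with exact QI values plus a group id,
    and a sensitive table mapping (group id, value) to its count.
    `groups` is a list of lists of row indices, assumed to be given.
    """
    qi_table, sensitive_table = [], {}
    for gid, members in enumerate(groups, start=1):
        for i in members:
            entry = {a: rows[i][a] for a in qi_attrs}
            entry["GroupID"] = gid
            qi_table.append(entry)
            key = (gid, rows[i][sensitive_attr])
            sensitive_table[key] = sensitive_table.get(key, 0) + 1
    return qi_table, sensitive_table


# Hypothetical toy rows (not taken from the paper).
rows = [
    {"Age": 27, "ZipCode": "321004", "Disease": "flu"},
    {"Age": 34, "ZipCode": "321017", "Disease": "gastritis"},
]

print(generalize_age(27), generalize_zip("321004"))
qi_table, sensitive_table = anatomize(rows, [[0, 1]], ["Age", "ZipCode"], "Disease")
print(qi_table)
print(sensitive_table)
```

Generalization changes the QI values themselves, whereas the Anatomy-style release keeps them exact and only hides which sensitive value in the group belongs to which tuple.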
Most existing privacy-preserving techniques focus on
microdata with a single sensitive attribute and cannot be
directly applied to microdata with multiple sensitive attributes.
Only a few works have addressed microdata with multiple
sensitive attributes. Ye et al. [9] proposed a framework,
decomposition, to tackle privacy preservation in the multiple
sensitive attribute case. Yang et al. [10] proposed a Multiple Sensitive
Bucketization approach (MSB). They also proposed three
linear-complexity greedy algorithms to implement MSB,
namely, the maximal-bucket first algorithm (MBF), the
maximal single-dimension-capacity first algorithm
(MSDCF), and the maximal multiple-dimension-capacity first
algorithm (MMDCF). However, the MSB method is only suitable for
anonymizing microdata with few sensitive attributes, e.g., 2-3
sensitive attributes. For microdata with more sensitive
attributes, MSB results in high suppression ratios. For
example, Table I is an original dataset. We assume that
{Gender, ZipCode, Age} are the quasi-identifier attributes and
{Occupation, Salary, Physician, Disease} are the sensitive
attributes. MSB can produce a 3-diversity table from it. The
anonymized table has only one group, with tuples {t5, t6, t7},
presented in Table II; the remaining tuples are all suppressed. The
suppression ratio reaches 6/9, which greatly degrades the
quality of the published data.
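The arithmetic behind the suppression ratio, and the reason more sensitive attributes make valid groups harder to form, can be sketched as follows. This is an illustrative simplification, not the MSB algorithm: it assumes a "distinct" l-diversity check applied to every sensitive attribute of a group, and the function names and the sample group are hypothetical.

```python
from fractions import Fraction


def satisfies_l_diversity_on_all(group, sensitive_attrs, l):
    """True if the group has at least l distinct values on EVERY sensitive attribute.

    Each additional sensitive attribute adds one more condition,
    so fewer candidate groups qualify.
    """
    return all(len({t[a] for t in group}) >= l for a in sensitive_attrs)


def suppression_ratio(total_tuples, published_tuples):
    """Fraction of the original tuples that are not published."""
    return Fraction(total_tuples - published_tuples, total_tuples)


# Hypothetical group of three tuples (values are not taken from Table I).
group = [
    {"Occupation": "teacher", "Disease": "flu"},
    {"Occupation": "nurse", "Disease": "gastritis"},
    {"Occupation": "clerk", "Disease": "flu"},
]

print(satisfies_l_diversity_on_all(group, ["Occupation"], 3))
print(satisfies_l_diversity_on_all(group, ["Occupation", "Disease"], 3))

# Nine original tuples, three published, as in the example in the text.
print(suppression_ratio(9, 3))
```

With 9 original tuples and 3 published, the ratio is 6/9, which `Fraction` reduces to 2/3.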
In this paper, we propose a framework, called ANGELMS
(ANatomy and GEneraLization on Multiple Sensitive), for
microdata with multiple sensitive attributes, based on
ANGEL [6]. The main idea of ANGELMS is to vertically
partition the attributes into several sensitive attribute tables and
one quasi-identifier table. Then, the tuples in each table are
divided into groups. The quasi-identifier values of each
quasi-identifier group are generalized to the same
value under the k-anonymity principle. Meanwhile, the sensitive
values of each sensitive group are allocated to satisfy the
l-diversity requirement. For example, Table I contains 3
quasi-identifier attributes and 4 sensitive attributes. We can
partition Table I into three tables. The first one is a generalized
ANGELMS: A Privacy Preserving Data Publishing Framework for
Microdata with Multiple Sensitive Attributes
Fangwei Luo, Jianmin Han, Jianfeng Lu and Hao Peng
Third International Conference on Information Science and Technology
March 23-25, 2013; Yangzhou, Jiangsu, China
978-1-4673-2764-0/13/$31.00 ©2013 IEEE