SLOMS: A Privacy Preserving Data Publishing
Method for Multiple Sensitive Attributes
Microdata
Jianmin Han
Department of Computer Science and Technology
Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC
hanjm@zjnu.cn
Fangwei Luo, Jianfeng Lu and Hao Peng
Department of Computer Science and Technology
Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC
Abstract—Multi-dimension bucketization is a typical
method to anonymize multiple sensitive attributes. However,
the method yields low data utility when the microdata have
many sensitive attributes. In addition, it does not
generalize quasi-identifiers, which leaves the anonymized
data vulnerable to linking attacks. To address
these problems, this paper proposes the SLOMS method. The
method vertically partitions the multiple sensitive attributes
into several tables and bucketizes each sensitive attribute
table to implement l-diversity. At the same time, it
generalizes the quasi-identifiers to implement k-anonymity.
The paper also proposes the MSB-KACA algorithm to
anonymize microdata with multiple sensitive attributes by
SLOMS. Experiments show that SLOMS generates
anonymous tables with a lower suppression ratio and less
distortion than generalization and MSB.
Index Terms—k-anonymity, l-diversity, multi-dimension
bucketization method, SLOMS
I. INTRODUCTION
Microdata play an increasingly important role in data
analysis and scientific research. However, publishing and
sharing microdata can threaten individuals’ privacy.
Therefore, several anonymity models have recently been
proposed to protect individuals’ privacy in microdata
publishing. k-anonymity [1] is a simple and effective
model for protecting privacy in microdata; it requires
that each tuple be indistinguishable from at least k-1 other
tuples with respect to the quasi-identifier in the released
data. However, it cannot resist homogeneity attacks or
background knowledge attacks, so enhanced anonymity
models have been proposed, such as l-diversity [4] and
t-closeness [5].
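The two definitions above can be made concrete with a small check. The following is a minimal sketch, not code from the paper: it assumes tuples are Python dicts, uses the simplest "distinct" variant of l-diversity (at least l different sensitive values per group), and the helper names and the four sample rows are hypothetical.

```python
from collections import defaultdict

def group_by_qi(rows, qi):
    """Group rows (dicts) by their quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in qi)].append(row)
    return groups

def is_k_anonymous(rows, qi, k):
    """True if every QI group contains at least k tuples."""
    return all(len(g) >= k for g in group_by_qi(rows, qi).values())

def is_distinct_l_diverse(rows, qi, sensitive, l):
    """True if every QI group has at least l distinct sensitive values
    (distinct l-diversity, assumed here as the simplest variant)."""
    return all(
        len({r[sensitive] for r in g}) >= l
        for g in group_by_qi(rows, qi).values()
    )

# Hypothetical, already generalized sample data.
rows = [
    {"Gender": "M", "Age": "[20-30]", "Disease": "Flu"},
    {"Gender": "M", "Age": "[20-30]", "Disease": "Flu"},
    {"Gender": "F", "Age": "[30-40]", "Disease": "Flu"},
    {"Gender": "F", "Age": "[30-40]", "Disease": "Cancer"},
]
qi = ["Gender", "Age"]

two_anonymous = is_k_anonymous(rows, qi, 2)
two_diverse = is_distinct_l_diverse(rows, qi, "Disease", 2)
```

In this sample the table is 2-anonymous but not 2-diverse: every tuple in the first group carries the same disease, which is exactly the situation a homogeneity attack exploits.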
Several techniques have also been proposed to
implement the above anonymity models. Generalization
[1-3] is a typical one; its idea is to replace the real
values of quasi-identifiers with less specific but
semantically consistent values.
Generalization distorts the original data, which is
disadvantageous for data mining. Anatomy [6] is another
method for anonymizing microdata; its idea is to
release all the quasi-identifier and sensitive values
directly in two separate tables. However, releasing the
QI values directly may incur a higher breach
probability than generalization. To overcome these
drawbacks, Tao et al. [7] proposed ANGEL, an
anonymization method that is as effective as
generalization in privacy protection while retaining
higher data utility. Leela et al. [8] applied Angelization to
preserve privacy in the re-publication of dynamic microdata
after insertions or deletions. Li et al. [9] proposed slicing,
which anonymizes microdata by partitioning them
horizontally and vertically. Neha et al. [10] concluded
that slicing preserves data utility better than
generalization and that it also prevents membership
disclosure.
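The contrast between generalization and Anatomy can be illustrated as follows. This is a rough sketch under stated assumptions rather than the cited algorithms: the interval width, the ZIP-code masking rule, the `GroupID` and `Count` column names, the fixed grouping, and the sample rows are all hypothetical choices made for illustration.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Replace an exact age with a half-open interval of the given width."""
    low = (age // width) * width
    return f"[{low}-{low + width})"

def generalize_zip(zipcode, keep=3):
    """Keep the first `keep` digits and mask the rest."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

def anatomize(rows, groups, qi, sensitive):
    """Publish exact QI values and sensitive values in two tables
    that are linked only through a group id.
    `groups` is a list of lists of row indices (assumed given)."""
    qit, st = [], []
    for gid, members in enumerate(groups, start=1):
        for i in members:
            qit.append({**{a: rows[i][a] for a in qi}, "GroupID": gid})
        counts = Counter(rows[i][sensitive] for i in members)
        for value, count in sorted(counts.items()):
            st.append({"GroupID": gid, sensitive: value, "Count": count})
    return qit, st

# Hypothetical sample data.
rows = [
    {"Age": 27, "ZipCode": "321004", "Disease": "Flu"},
    {"Age": 23, "ZipCode": "321017", "Disease": "Gastritis"},
    {"Age": 35, "ZipCode": "321102", "Disease": "Flu"},
    {"Age": 38, "ZipCode": "321150", "Disease": "Flu"},
]

# Generalization: QI values become coarser, sensitive values stay in place.
generalized = [
    {"Age": generalize_age(r["Age"]),
     "ZipCode": generalize_zip(r["ZipCode"]),
     "Disease": r["Disease"]}
    for r in rows
]

# Anatomy: QI values stay exact, but are separated from sensitive values.
qit, st = anatomize(rows, [[0, 1], [2, 3]], ["Age", "ZipCode"], "Disease")
```

The generalized rows lose precision in `Age` and `ZipCode`, whereas the anatomized tables keep the exact QI values, which is why releasing them directly may raise the breach probability noted above.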
All of the above work focuses on microdata with a single
sensitive attribute. These methods lead to very low
data utility when they are applied directly to microdata with
multiple sensitive attributes. At present, only a
few studies concentrate on microdata with multiple
sensitive attributes. Yang et al. [11] proposed the Multiple
Sensitive Bucketization (MSB) approach. However, MSB
is only suitable for microdata with few
sensitive attributes, e.g., 2 to 3 sensitive attributes. For
microdata with more sensitive attributes, MSB
results in high suppression ratios. For example, Table I is
an original dataset. We assume that {Gender, ZipCode,
Age} are the quasi-identifier attributes and {Occupation,
Salary, Physician, Disease} are the sensitive attributes. We
can obtain a 3-diversity table by MSB; see Table II.
The anonymized table contains only one group, with tuples
{t5, t6, t7}, as presented in Table II; the remaining tuples
are all suppressed. The suppression ratio is 6/9, which greatly
degrades the quality of the published data.
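The 6/9 figure follows directly from counting the tuples that do not appear in the published table. A minimal sketch of that arithmetic is given here; the function name is hypothetical, and the tuple identifiers t1 to t9 are assumed from the stated ratio and the published group.

```python
from fractions import Fraction

def suppression_ratio(all_ids, published_ids):
    """Fraction of original tuples missing from the published table."""
    all_ids = set(all_ids)
    suppressed = all_ids - set(published_ids)
    return Fraction(len(suppressed), len(all_ids))

# Nine original tuples t1..t9 (assumed); only one group survives MSB.
all_ids = [f"t{i}" for i in range(1, 10)]
published = ["t5", "t6", "t7"]

ratio = suppression_ratio(all_ids, published)
print(ratio)  # Fraction reduces 6/9 to lowest terms
```

Two thirds of the tuples are lost in this example, even though only four sensitive attributes are involved.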
JOURNAL OF SOFTWARE, VOL. 8, NO. 12, DECEMBER 2013
doi:10.4304/jsw.8.12.3096-3104