ANGELMS: A New Privacy Protection Framework for Microdata with Multiple Sensitive Attributes

Research paper

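
"ANGELMS is a new data publishing framework for protecting the privacy of microdata with multiple sensitive attributes. It addresses two problems of traditional multi-dimension bucketization: heavy tuple suppression when the data have more than 3 sensitive attributes, and exposure to linking attacks because quasi-identifiers are not generalized. ANGELMS first vertically partitions the sensitive attributes into independent tables, then bucketizes them according to the l-diversity principle and generalizes the quasi-identifiers according to the k-anonymity principle. An algorithm named MSB-KACA supports the k-anonymization process. Experiments show that ANGELMS generates anonymized tables with lower information loss and a lower suppression ratio than simple multi-dimension bucketization. Microdata play an increasingly important role in data analysis and scientific research, but publishing and sharing them exposes personal privacy, so privacy protection has become a pressing problem."

The core privacy protection strategies in the ANGELMS framework are:

1. **Vertical partitioning of sensitive attributes**: Different sensitive attributes are separated into independent tables, so that no single table carries all of an individual's sensitive information. Each table handles only part of the sensitive attributes, which lowers the risk of disclosure.
2. **The l-diversity principle**: Each bucket (group) must contain at least l distinct values of a sensitive attribute. Even if an attacker knows which bucket an individual belongs to, the individual's sensitive value cannot be pinned down, because at least l candidate values remain.
3. **The k-anonymity principle**: Quasi-identifiers are generalized (for example, by replacing exact values with ranges or suppressing some detail) so that every record is indistinguishable from at least k-1 other records on its quasi-identifier values, which prevents an attacker from singling out an individual.
4. **The MSB-KACA algorithm**: This algorithm carries out the k-anonymization step of the ANGELMS framework. It performs the generalization of quasi-identifiers with the aim of balancing privacy protection against data utility, keeping information loss low while still meeting the anonymity requirement.
5. **Performance advantage**: Compared with simple multi-dimension bucketization, ANGELMS produces anonymized tables with lower information loss and a lower suppression ratio, so more of the data's analytical value is retained while privacy is protected.

Microdata privacy protection is a complex and important field. ANGELMS offers a solution aimed specifically at datasets with many sensitive attributes, and its design supports data publishing and sharing for research while reducing the threat to personal privacy.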

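The vertical partitioning step in item 1 can be illustrated with a minimal Python sketch. It only shows the column split; how ANGELMS groups tuples and links the resulting tables is not covered here. The function name `vertical_partition`, the toy rows, and the grouping of the four sensitive attributes into two tables are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def vertical_partition(
    rows: List[Row],
    qi_attrs: Sequence[str],
    sensitive_groups: Sequence[Sequence[str]],
) -> Tuple[List[Row], List[List[Row]]]:
    """Split each row into one quasi-identifier projection and one
    projection per group of sensitive attributes (column split only)."""
    qi_table = [{a: r[a] for a in qi_attrs} for r in rows]
    sensitive_tables = [
        [{a: r[a] for a in group} for r in rows]
        for group in sensitive_groups
    ]
    return qi_table, sensitive_tables


# Toy rows invented for illustration; they are not the paper's Table I.
rows = [
    {"Gender": "F", "ZipCode": "321004", "Age": 34,
     "Occupation": "Teacher", "Salary": 5000,
     "Physician": "Dr. A", "Disease": "Flu"},
    {"Gender": "M", "ZipCode": "321017", "Age": 38,
     "Occupation": "Nurse", "Salary": 6000,
     "Physician": "Dr. B", "Disease": "Asthma"},
]

qi_table, sensitive_tables = vertical_partition(
    rows,
    qi_attrs=["Gender", "ZipCode", "Age"],
    # Hypothetical grouping of the four sensitive attributes.
    sensitive_groups=[["Occupation", "Salary"], ["Physician", "Disease"]],
)
print(len(sensitive_tables), "sensitive tables plus one QI table")
```

In a real release the quasi-identifier table would still need generalization and each sensitive table would still need bucketization; the split alone provides no privacy guarantee.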

Abstract: Multi-dimension bucketization is a typical framework for preventing privacy disclosure of microdata with multiple sensitive attributes. However, it results in too much tuple suppression when the microdata have more than 3 sensitive attributes. Besides, it does not generalize quasi-identifiers, which makes the anonymized data vulnerable to linking attacks. To overcome these drawbacks, we propose an improved bucketization framework, named ANGELMS. ANGELMS first vertically partitions sensitive attributes into several independent tables, then bucketizes them according to the l-diversity principle and generalizes quasi-identifiers according to the k-anonymity principle. In addition, we propose an MSB-KACA algorithm for the k-anonymizing process of the ANGELMS framework. Experiments show that the proposed framework can generate anonymized tables with less information loss and a lower suppression ratio than simple multi-dimension bucketization does.

I. INTRODUCTION

Microdata play an increasingly important role in data analysis and scientific research. However, publishing and sharing microdata threatens individuals' privacy. Therefore, several anonymity models have been proposed to protect individuals' privacy when publishing microdata. k-anonymity [1], [2] is a simple and effective model, which requires that each tuple in the released data be indistinguishable from at least k-1 other tuples with respect to the quasi-identifier. However, it cannot resist the homogeneity attack or the background knowledge attack, so enhanced anonymity models have been proposed, such as l-diversity [3] and t-closeness [4].

Manuscript received November 27, 2012. This work was supported by the National Natural Science Foundation of China (No.61170108 and No.6110019), the Natural Science Foundation of Zhejiang Province of China (No.Y1100161), and the Humanity and Social Science Foundation of Ministry of Education of China (No.12YJCZH142).

F. Luo is with the Department of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC (e-mail: lfw2565295@126.com).

J. Han, corresponding author, is with the Department of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC (phone: 13750983528, e-mail: hanjm@zjnu.cn).

J. Lu is with the Department of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC (e-mail: lujianfeng@zjnu.cn).

H. Peng is with the Department of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, Zhejiang, PRC (e-mail: hpeng@zjnu.edu.cn).

Several techniques have also been proposed to implement the above anonymity models. Generalization [1], [2] is a typical one, whose idea is to replace the real value of a quasi-identifier with a less specific but semantically consistent value. Generalization distorts the original data, which is a disadvantage for data mining. Anatomy [5] is another effective method for anonymizing microdata, whose idea is to release all the quasi-identifier and sensitive values directly in two separate tables. However, releasing the QI-values directly may lead to a higher breach probability than generalization. To overcome these drawbacks, Tao et al. [6] proposed ANGEL, a new anonymization method that is as effective as generalization in privacy protection but retains significantly more information in the microdata. Leela et al. [7] applied Angelization to preserve privacy in the re-publication of dynamic microdata after insertions and deletions. Li et al. [8] proposed slicing, which partitions the microdata both horizontally and vertically.
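To make generalization and the k-anonymity condition concrete, here is a minimal Python sketch. It is not the MSB-KACA algorithm; it applies one fixed, hand-picked generalization (age to a decade range, zip code to a three-digit prefix) and then checks whether every quasi-identifier combination occurs at least k times. The helper names `generalize` and `is_k_anonymous`, the generalization rules, and the toy records are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, object]


def generalize(row: Row) -> Row:
    """Replace exact QI values with coarser, semantically consistent ones."""
    decade = (int(row["Age"]) // 10) * 10
    return {
        "Gender": row["Gender"],
        "ZipCode": str(row["ZipCode"])[:3] + "***",  # keep a 3-digit prefix
        "Age": f"[{decade}, {decade + 9}]",          # exact age -> decade range
    }


def is_k_anonymous(rows: List[Row], qi_attrs: Sequence[str], k: int) -> bool:
    """True if every QI value combination is shared by at least k tuples."""
    counts = Counter(tuple(r[a] for a in qi_attrs) for r in rows)
    return all(c >= k for c in counts.values())


qi = ["Gender", "ZipCode", "Age"]
# Toy records invented for illustration.
raw = [
    {"Gender": "F", "ZipCode": "321004", "Age": 31},
    {"Gender": "F", "ZipCode": "321017", "Age": 35},
    {"Gender": "F", "ZipCode": "321022", "Age": 38},
]
generalized = [generalize(r) for r in raw]

print(is_k_anonymous(raw, qi, k=3))          # False: every tuple is unique
print(is_k_anonymous(generalized, qi, k=3))  # True: all three share one QI value
```

The example also shows the cost the text mentions: after generalization the exact ages and zip codes are no longer available to an analyst.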

Most existing privacy-preserving techniques focus on microdata with a single sensitive attribute and cannot be directly applied to microdata with multiple sensitive attributes. A few works have addressed microdata with multiple sensitive attributes. Ye et al. [9] proposed a framework, decomposition, to tackle privacy preservation in the multiple-sensitive-attribute case. Yang et al. [10] proposed a Multiple Sensitive Bucketization approach (MSB). They also proposed three linear-complexity greedy algorithms to implement MSB, namely, the maximal-bucket first algorithm (MBF), the maximal single-dimension-capacity first algorithm (MSDCF), and the maximal multiple-dimension-capacity first algorithm (MMDCF). However, the MSB method is only suitable for anonymizing microdata with few sensitive attributes, e.g., 2-3 sensitive attributes. For microdata with more sensitive attributes, MSB results in high suppression ratios. For example, Table I is an original dataset. We assume that {Gender, ZipCode, Age} are quasi-identifier attributes and {Occupation, Salary, Physician, Disease} are sensitive attributes. We can achieve a 3-diverse table by MSB. The anonymized table has only one group, with tuples {t , t } presented in Table II; the remaining tuples are all suppressed. The suppression ratio reaches 6/9, which greatly degrades the quality of data publishing.
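The following Python sketch illustrates why more sensitive attributes make groups harder to form, and how a suppression ratio such as 6/9 arises. It assumes a simplified per-attribute reading of l-diversity (each sensitive attribute must take at least l distinct values inside a group), which may differ from the exact definition used by MSB. The names `is_l_diverse` and `suppression_ratio` and the toy group are hypothetical, and the ratio assumes nine original tuples of which three are published.

```python
from fractions import Fraction
from typing import Dict, List, Sequence

Row = Dict[str, object]


def is_l_diverse(group: List[Row], sensitive_attrs: Sequence[str], l: int) -> bool:
    """Simplified check: every sensitive attribute takes at least l
    distinct values within the group."""
    return all(len({r[a] for r in group}) >= l for a in sensitive_attrs)


def suppression_ratio(total: int, published: int) -> Fraction:
    """Fraction of the original tuples that were suppressed."""
    return Fraction(total - published, total)


sensitive = ["Occupation", "Salary", "Physician", "Disease"]
# Toy group invented for illustration; not taken from the paper's tables.
group = [
    {"Occupation": "Teacher", "Salary": 5000, "Physician": "Dr. A", "Disease": "Flu"},
    {"Occupation": "Nurse", "Salary": 6000, "Physician": "Dr. B", "Disease": "Asthma"},
    {"Occupation": "Clerk", "Salary": 4000, "Physician": "Dr. C", "Disease": "Flu"},
]

# Disease has only two distinct values, so the group fails on four attributes
# even though it passes when Disease is ignored.
print(is_l_diverse(group, sensitive[:3], l=3))  # True
print(is_l_diverse(group, sensitive, l=3))      # False

# Nine tuples in, three published: 6/9 of the data is suppressed.
print(suppression_ratio(total=9, published=3))  # 2/3
```

Each additional sensitive attribute adds another condition that all tuples in a group must satisfy at once, so fewer tuples can be grouped and more are suppressed.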

In this paper, we propose a framework, called ANGELMS (ANatomy and GEneraLization on Multiple Sensitive), for microdata with multiple sensitive attributes, based on ANGEL [6]. The main idea of ANGELMS is to vertically partition the attributes into several sensitive attribute tables and one quasi-identifier table. Then, the tuples in each table are divided into partitions. The quasi-identifier values of each quasi-identifier group are generalized to the same generalized value under the k-anonymity principle. Meanwhile, the sensitive values of each sensitive group are allocated so as to satisfy the l-diversity requirement. For example, Table I contains 3 quasi-identifier attributes and 4 sensitive attributes. We can partition Table I into three tables. The first one is a generalized

ANGELMS: A Privacy Preserving Data Publishing Framework for Microdata with Multiple Sensitive Attributes

Fangwei Luo, Jianmin Han, Jianfeng Lu and Hao Peng

Third International Conference on Information Science and Technology, March 23-25, 2013; Yangzhou, Jiangsu, China
