We will try to give some intuition regarding the meaning of the guarantee. Suppose
the database contains social security numbers and web search histories, and
consider the query “How many people in the database have social security number
N and searched for ‘embarrassing medical condition’ 3 times in the past week?”
The true answer to this question is effectively yes or no (the count is either 0 or 1),
but the privacy mechanism may produce arbitrary outputs; these are then
interpreted as yes or no by the user.
Differential privacy should obscure N’s presence or absence in the database, as
well as whether or not the search history fits the profile. So consider an adversary
that interprets the response to the query, deterministically mapping responses to
{yes, no}. Let S_i, i ∈ {yes, no}, denote the pre-image of i under this mapping.
The privacy concern here is unbalanced: N does not want to be associated with
the embarrassing query, and a response interpreted as yes is undesirable. Let α be
the probability, over the coin flips of the mechanism, of producing a response in
S_yes when N is not in the database. Intuitively, we want to ensure that either α is small
or the interpretation is meaningless—if an output is frequently interpreted as yes,
even when N is not in the database, then the interpretation means nothing.
For concreteness, suppose ε = ln 3, so e^ε = 3. If, say, α < 1/7, then even if N is
in the database and satisfies the profile, the response is more likely to be mapped
to no, and if α is very small then the increased factor of three still results in a
very small probability of yes. On the other hand, suppose α = 1/2. Then in
some sense the system, together with the interpretation, is silly, since even when
N is present the probability of an incorrect interpretation is at least 1/6. To see
this, let P(N) be the Boolean variable describing whether or not N fits the profile.
Since α = 1 − α = 1/2, we have

    1/2 = Pr[S_{¬P(N)} | N OUT] ≤ 3 Pr[S_{¬P(N)} | N IN],

so Pr[S_{¬P(N)} | N IN] ≥ 1/6.
Finally, if α is large then the “bad” event of interpreting the response as yes
happens with significant probability even when N is not in the database, so there is
little harm in joining.
The argument we have just given can be interpreted in terms of hypothesis testing;
see, for example, Wasserman and Zhou [62] for more discussion.
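To make the arithmetic in the numerical example above concrete, here is a minimal sketch (not from the original text; the function names are ours) that evaluates the two bounds that ε-differential privacy with e^ε = 3 imposes on the adversary's interpretation.

import math

eps = math.log(3)  # the epsilon of the running example, so e^eps = 3

def max_yes_prob_when_in(alpha):
    # Differential privacy bounds Pr[S_yes | N IN] <= e^eps * Pr[S_yes | N OUT] = 3 * alpha.
    return math.exp(eps) * alpha

def min_wrong_prob_when_in(alpha):
    # From Pr[S_wrong | N OUT] <= e^eps * Pr[S_wrong | N IN] it follows that
    # Pr[S_wrong | N IN] >= Pr[S_wrong | N OUT] / e^eps = alpha / 3.
    return alpha / math.exp(eps)

print(max_yes_prob_when_in(1 / 7))    # about 0.43 < 1/2: "no" remains the more likely interpretation
print(min_wrong_prob_when_in(1 / 2))  # about 0.167 = 1/6: the misinterpretation bound cited above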
5. Consider “positive” responses, i.e., those interpreted as yes. Differential privacy
may be achieved not only by reducing the probability of a true positive, but also by
increasing the probability of a false positive. In other words, by indiscriminately
implicating people who may not even be in the database, regardless of whether
or not they satisfy the profile, we provide “cover” for the true positives. This is
the same philosophy as in randomized response [61], which indeed provides some
differential privacy: an embarrassing response may simply be the result of a coin
flip.
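The randomized response scheme of [61] can be made concrete with a minimal sketch (one common parameterization; the function name and the fair-coin choices are ours), which satisfies (ln 3)-differential privacy and so matches the ε = ln 3 used in the numerical example above.

import random

def randomized_response(truth):
    # With probability 1/2 report the truth; otherwise report the outcome of a
    # second, independent fair coin. An embarrassing "yes" may therefore be
    # nothing more than the result of a coin flip.
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

# Pr[report yes | truth yes] = 3/4 while Pr[report yes | truth no] = 1/4 (and
# symmetrically for "no"), so no output changes the likelihood ratio by more
# than a factor of 3 = e^{ln 3}: the scheme is (ln 3)-differentially private.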
6. Definition 1 extends to group privacy as well (and to the case in which an in-
dividual contributes more than a single row to the database): changing a group
of k rows in the data set induces a change of at most a multiplicative e^{kε} in the
corresponding output distribution.
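The e^{kε} factor follows by applying Definition 1 once per changed row. As a sketch (the notation M for the mechanism and S for an arbitrary set of outputs is ours, since Definition 1 is not reproduced in this excerpt): let x_0, x_1, . . . , x_k be any sequence of data sets in which consecutive data sets differ in a single row, so that x_0 and x_k differ in k rows. Then

    Pr[M(x_0) ∈ S] ≤ e^ε Pr[M(x_1) ∈ S] ≤ e^{2ε} Pr[M(x_2) ∈ S] ≤ · · · ≤ e^{kε} Pr[M(x_k) ∈ S].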