queried each of the shadow models with its training data
(members), as well as unseen data (non-members) to re-
trieve the prediction scores of the shadow models. Multiple
binary classifiers were then trained, one per class label, to
predict the membership status.
Salem et al. [25] also exploited prediction scores and
trained a single class-agnostic neural network to distinguish
between members and non-members. In contrast to Shokri
et al. [26], their approach relies on a single shadow model.
The input of their attack model h consists of the k highest
prediction scores in descending order.
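The following sketch illustrates such a score-based attack (assuming NumPy and scikit-learn; the choice of k = 3 and of an MLP attack classifier are illustrative assumptions, not the exact setup of [25]):

import numpy as np
from sklearn.neural_network import MLPClassifier

def top_k_features(scores, k=3):
    # Keep the k highest prediction scores of each sample, sorted descending.
    return np.sort(scores, axis=1)[:, ::-1][:, :k]

def fit_attack_model(member_scores, nonmember_scores, k=3):
    # member_scores / nonmember_scores: softmax outputs of the shadow model
    # on its own training data (members) and on unseen data (non-members).
    X = np.vstack([top_k_features(member_scores, k),
                   top_k_features(nonmember_scores, k)])
    y = np.concatenate([np.ones(len(member_scores)),       # 1 = member
                        np.zeros(len(nonmember_scores))])  # 0 = non-member
    attack_model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    attack_model.fit(X, y)
    return attack_model

At attack time, the target model's scores for a known input are reduced to the same top-k representation and passed to the fitted classifier.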
Instead of focusing solely on the scores, Yeom et al. [33]
took advantage of the fact that the loss of a model is typically
lower on members than on non-members and fit a threshold to
the loss values. More recent approaches [3, 16] focused
on label-only attacks where only the predicted label for a
known input is observed.
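A minimal sketch of such a loss-threshold attack (the cross-entropy loss and the concrete threshold, e.g., the shadow model's average training loss, are assumptions for illustration):

import numpy as np

def cross_entropy(scores, labels, eps=1e-12):
    # Per-sample cross-entropy loss from softmax scores and true integer labels.
    return -np.log(scores[np.arange(len(labels)), labels] + eps)

def loss_threshold_attack(scores, labels, threshold):
    # Predict membership (1) whenever the per-sample loss falls below the threshold.
    return (cross_entropy(scores, labels) < threshold).astype(int)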
Most defense strategies either try to decrease the in-
formative value of prediction scores or reduce overfitting.
The informative value can be decreased by applying a high
temperature to the softmax function to increase the entropy of
the scores [26], by adding carefully crafted noise to the predictions [9],
or by outputting only the predicted label without any score [26].
Various regularization techniques were proposed to reduce
overfitting and thus the accuracy gap, e.g., L2 regularization
[26] and dropout [26, 25].
3 Overconfidence of Neural Networks
Neural networks usually output prediction scores, e.g.,
by applying a softmax function. To account for model uncertainty,
the prediction scores should ideally reflect the true probability
of a correct prediction, which is usually not the case. Aligning
the scores with these probabilities is referred to as model
calibration. Guo et al. [5] demonstrated that modern networks
tend to be overconfident in their predictions.
Generally, as Hein et al. [7] noted, many cases have been
reported in which neural networks produce high prediction
scores far away from the training data, e.g., on fooling images,
for out-of-distribution (OOD) images, in a medical diagnosis
task, but also on the original task. Hein et al. then proved
that ReLU networks are overconfident even on samples far away
from the training data.
Scaling the inputs of a ReLU network in fact allows one
to produce arbitrarily high prediction scores. Existing ap-
proaches to mitigate overconfidence can be grouped into
two categories: post-processing methods applied on top of
trained models and regularization methods modifying the
training process.
As a post-processing method, Guo et al. [5] proposed
temperature scaling using a single temperature parameter T
for scaling down the pre-softmax logits for all classes. The
larger T is, the more the resulting scores approach a uniform
distribution and the higher their entropy becomes.
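A minimal sketch of this post-processing step (assuming PyTorch; fitting T on held-out data is only indicated in the comment):

import torch
import torch.nn.functional as F

def temperature_scale(logits: torch.Tensor, T: float) -> torch.Tensor:
    # Divide the pre-softmax logits of all classes by a single temperature T
    # before applying the softmax; T > 1 softens the resulting scores.
    return F.softmax(logits / T, dim=1)

# In practice, T is chosen by minimizing the negative log-likelihood on a
# held-out validation set while keeping the network weights fixed.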
Kristiadi et al. [13] proposed a Bayesian approach. They
fixed the weights for all layers of a trained network except
the last one and used a Kronecker-factored Laplace approx-
imation (LA) on the weights of the final layer. Müller et al.
[17] demonstrated that label smoothing regularization [28]
not only improves the generalization of a model but also
implicitly leads to better model calibration. It reduces the
difference between the highest and the other logit values,
thus reducing overconfident predictions. The calibration of
a model can be measured by the expected calibration error
(ECE) [18]. It partitions the test predictions into confidence bins
and computes a weighted average of the absolute difference between
the accuracy and the average prediction score within each bin.
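Written out (following the binning formulation of [18], with the bin notation chosen here for illustration), the $n$ test predictions are partitioned into $M$ equally spaced confidence bins $B_m$, and the error is

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \, \bigl| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \bigr|,$$

where $\mathrm{acc}(B_m)$ denotes the accuracy and $\mathrm{conf}(B_m)$ the average prediction score within bin $B_m$.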
4 Do Not Trust Prediction Scores for MIAs
In this section, we will show that prediction scores cannot be
trusted for MIAs because score-based MIAs make membership
decisions mainly based on the maximum prediction score. As a
first step, we introduce our proposition and then verify our
claims empirically.
Formally, a neural network $f(x)$ using ReLU activations
decomposes the unrestricted input space $\mathbb{R}^m$ into a
finite set of polytopes (linear regions). We can then interpret
$f(x)$ as a piecewise affine function that is affine within each
polytope. Due to the limited number of polytopes, the outer
polytopes extend to infinity, which allows one to arbitrarily
increase the prediction scores by scaling inputs with a large
constant $\delta$ [7]. Applying these findings to MIAs results
in the following proposition:
Proposition 1. Given a ReLU-classifier, we can force al-
most any non-member input to be classified as a member
by score-based MIAs simply through scaling it by a large
constant.
Proof. Let $f \colon \mathbb{R}^m \to \mathbb{R}^d$ be a piecewise
affine ReLU-classifier. We define a score-based MIA inference model
$h \colon \mathbb{R}^d \to \{0, 1\}$, with $1$ indicating a
classification as member. For almost any input $x \in \mathbb{R}^m$
and a sufficiently small $\epsilon > 0$, it follows from
$\max_{i=1,\dots,d} f(x)_i \geq 1 - \epsilon$ that $h(f(x)) = 1$.
Since $\lim_{\delta \to \infty} \max_{i=1,\dots,d} f(\delta x)_i = 1$,
it already holds that $\lim_{\delta \to \infty} h(f(\delta x)) = 1$.
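The scaling effect behind the proof can be reproduced with a small numerical sketch (assuming PyTorch; the architecture and the randomly initialized weights are illustrative and not the models evaluated later):

import torch
import torch.nn as nn

# A small ReLU classifier; even untrained weights suffice to observe the effect.
f = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, 10))

x = torch.randn(1, 32)  # an arbitrary (non-member) input
with torch.no_grad():
    for delta in [1, 10, 100, 1000]:
        scores = torch.softmax(f(delta * x), dim=1)
        # The maximum prediction score approaches 1 as delta grows, so a
        # score-based attack model h would classify delta * x as a member.
        print(delta, scores.max().item())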
By scaling the whole non-member dataset, one can force the
false-positive rate (FPR) to be close to 100%. However, the
proposition holds only for ReLU networks and unbounded inputs,
i.e., inputs not restricted to the range $[0, 1]^m$. Next, we
empirically show that one cannot trust prediction scores for
MIAs in more general settings, without requiring input scaling
and with other activation functions.