iteratively. At each iteration, the pixel with the highest derivative is modified by a fixed amount (the
attack budget), the saliency map is recomputed, and the process repeats until the prediction changes
to a target class. The adversarial images produced by JSMA are subtle and effective for attacks, but
they remain excessively expensive to compute.
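The greedy loop described above can be sketched as follows. This is a deliberately simplified stand-in, not the actual JSMA saliency map (which scores pairs of pixels via the forward Jacobian); the linear model, its weights, and the step size `theta` are illustrative assumptions.

```python
import numpy as np

# Toy linear classifier (illustrative weights, not from the paper).
W = np.array([[1.0, 0.0, -1.0],
              [-1.0, 0.5, 1.0]])

predict = lambda v: int(np.argmax(W @ v))
grad_target = lambda v, t: W[t]  # d(target logit)/d(input) for a linear model

def greedy_saliency_attack(x, target, theta=0.1, max_iters=50):
    # Simplified JSMA-style loop: at each iteration, increase the single
    # pixel whose derivative most favors the target class by a fixed
    # budget theta, recompute, and stop once the prediction flips.
    x_adv = x.copy()
    for _ in range(max_iters):
        if predict(x_adv) == target:
            break
        g = grad_target(x_adv, target)
        i = int(np.argmax(g))            # most salient pixel
        x_adv[i] = min(x_adv[i] + theta, 1.0)
    return x_adv

x = np.array([0.8, 0.2, 0.1])            # toy input, predicted class 0
x_adv = greedy_saliency_attack(x, target=1)
```

Because only one pixel moves per iteration and the saliency must be recomputed each time, the cost of this loop grows with the number of pixels that need changing, which is the source of JSMA's expense.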
The fast gradient sign method (FGSM) [7] was introduced as a computationally inexpensive but
effective alternative to JSMA. FGSM follows the gradient direction of the cost function and
introduces a fixed amount of perturbation to maximize that cost. In practice, the examples produced
by this attack are more easily detectable and require a larger distortion to achieve misclassification
than those obtained from JSMA. An iterative version of FGSM, in which a smaller perturbation is applied
multiple times, was introduced by Kurakin et al. [14].
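Both variants can be sketched on a toy model. The linear classifier and all constants below are illustrative assumptions; the single-step function follows FGSM's sign-of-gradient rule, and the iterative variant applies smaller steps clipped to the original budget, in the spirit of Kurakin et al.

```python
import numpy as np

# Hypothetical linear classifier: logits = W x + b (illustrative weights).
W = np.array([[1.0, -2.0], [-1.0, 2.0]])
b = np.zeros(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_loss_wrt_x(x, y):
    # Gradient of the cross-entropy loss w.r.t. the input for this linear model.
    p = softmax(W @ x + b)
    return W.T @ (p - np.eye(2)[y])

def fgsm(x, y, eps):
    # Single-step FGSM: move eps along the sign of the input gradient.
    return x + eps * np.sign(grad_loss_wrt_x(x, y))

def iterative_fgsm(x, y, eps, steps, alpha):
    # Iterative variant: repeated small steps, clipped to the eps-ball around x.
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss_wrt_x(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

x = np.array([0.5, 0.2])      # toy input, true class 0
adv = fgsm(x, 0, eps=0.3)     # flips the prediction to class 1
```

A single gradient computation per example is what makes FGSM cheap relative to JSMA's per-pixel loop.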
Instead of using a fixed attack budget as the previous two methods do, DeepFool [22] was the first
method to compute and apply the minimal perturbation necessary for misclassification under the L2
norm. The method performs iterative steps in the adversarial direction of the gradient given by
a locally linear approximation of the classifier. This makes it more accurate than
FGSM and faster than JSMA, since all pixels are modified simultaneously at each step,
but its iterative nature still renders DeepFool computationally expensive. In [23, 24], the authors extend
DeepFool to craft a universal perturbation that can be applied indifferently to any instance: a fixed
distortion is computed from a set of inputs so as to maximize the predictive error of the model
on that sample. The perturbation is computed by a greedy approach and requires multiple iterations
over the given sample before converging. To the extent that the sample is representative of the data
distribution, the computed perturbation has a good chance of causing misclassification on unseen
samples as well.
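For an affine binary classifier the minimal L2 perturbation has a closed form, and this is the basic step that DeepFool iterates on a local linear approximation of a general classifier. A minimal sketch of that step, with illustrative weights:

```python
import numpy as np

def deepfool_linear(x, w, b, overshoot=0.02):
    # For an affine binary classifier f(x) = w.x + b, the minimal L2
    # perturbation reaching the decision boundary is -f(x) * w / ||w||^2.
    # A small overshoot pushes the point just past the boundary.
    f = w @ x + b
    r = -f * w / (w @ w)
    return x + (1 + overshoot) * r

w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 0.5])          # f(x) = 2.0 > 0, so class +1
x_adv = deepfool_linear(x, w, b)  # crosses to the other side of the boundary
```

On a nonlinear classifier this step is repeated, relinearizing around the current point until the label changes, which is where the iterative cost comes from.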
One method that aims to compute good approximations of Problem (1) while keeping the computational
cost of perturbing examples low was proposed by Carlini and Wagner [2]. The authors cast
the formulation of Szegedy et al. [31] into a more efficient optimization problem, which allows them
to craft effective adversarial samples with low distortion. They define three similar targeted attacks,
based on different distortion measures: L2, L0 and L∞, respectively. In practice, even these attacks
are computationally expensive.
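The L2 variant, for instance, minimizes the distortion plus a margin-based surrogate of targeted misclassification. The sketch below evaluates such an objective for a hypothetical linear model; the model, weights, and constants `c` and `kappa` are illustrative assumptions, not the authors' exact parameterization.

```python
import numpy as np

def cw_l2_objective(delta, x, target, logits_fn, c=1.0, kappa=0.0):
    # Carlini-Wagner-style L2 objective: squared distortion plus a margin
    # term that becomes non-positive once the target class wins by kappa.
    z = logits_fn(x + delta)
    other = np.max(np.delete(z, target))
    margin = max(other - z[target], -kappa)
    return float(delta @ delta + c * margin)

# Toy linear model (illustrative weights).
logits = lambda v: np.array([[1.0, -1.0], [-1.0, 1.0]]) @ v

x = np.array([0.5, 0.0])  # currently classified as class 0
obj_zero = cw_l2_objective(np.zeros(2), x, target=1, logits_fn=logits)
obj_move = cw_l2_objective(np.array([-0.3, 0.3]), x, target=1, logits_fn=logits)
```

In the full attack this objective is minimized over delta with a gradient-based optimizer, and the constant c is itself tuned by search, which is one reason the attack remains expensive.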
If it is difficult to find new methods that are both effective at jeopardizing a model and computationally
affordable, defending against adversarial attacks is an even harder task. On the one hand, a good
defense should harden a model against any known attack; on the other hand, it should not compromise
the discriminative power of the model. In the following paragraph, we report the most effective
defenses proposed for tackling adversarial examples.
Defenses A common technique for defending a model against adversarial examples consists in augmenting
the training data with perturbed examples (a technique known as 'adversarial training' [31]),
either by feeding the model both true and adversarial examples or by training it with the modified
objective function:
Ĵ(θ, x, y) = αJ(θ, x, y) + (1 − α)J(θ, x + ∆x, y)
with J the original loss function. The aim of such a defense is to increase the model's robustness in specific
directions (those of the adversarial perturbations) by ensuring that it predicts the same class for a true
example and its perturbations along those directions. In practice, the additional instances are crafted
against the considered model using one or more attack strategies, such as FGSM [7], DeepFool [22] and
virtual adversarial examples [21].
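The modified objective can be evaluated directly. The sketch below uses a hypothetical linear model and a fixed perturbation ∆x; both are illustrative assumptions, and in practice ∆x would come from an attack such as FGSM.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def J(theta, x, y):
    # Original loss J(theta, x, y): cross-entropy of a toy linear model
    # theta = (W, b) on input x with true label y.
    W, b = theta
    return float(-np.log(softmax(W @ x + b)[y]))

def J_hat(theta, x, y, delta_x, alpha=0.5):
    # Modified objective: alpha*J(theta, x, y) + (1-alpha)*J(theta, x+delta_x, y).
    return alpha * J(theta, x, y) + (1 - alpha) * J(theta, x + delta_x, y)

theta = (np.array([[1.0, -1.0], [-1.0, 1.0]]), np.zeros(2))
x, y = np.array([0.5, 0.0]), 0
delta_x = np.array([-0.2, 0.2])   # stand-in for an attack-crafted perturbation
loss = J_hat(theta, x, y, delta_x)
```

Setting α = 1 recovers standard training, while lower values of α weight the loss on the perturbed copy more heavily.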
However, adversarially training a model is effective only against adversarial examples crafted on the
original model, an improbable situation given that an attacker might not have access to
exactly the same model for computing the perturbations. Additionally, adversarial training has been
shown to be easily bypassed by a two-step attack [32], which first applies a random perturbation
to an instance and then performs any classical attack technique. The success of this new attack,
and of black-box attacks in general, is due to the sharpness of the loss around the training examples:
if smoothing the loss in a few adversarial directions makes gradient-based attacks ineffective in those