Iterative methods [9] apply the fast gradient update multiple times
with a small step size α. The iterative version
of FGSM (I-FGSM) can be expressed as:
x*_0 = x,   x*_{t+1} = x*_t + α · sign(∇_x J(x*_t, y)).   (3)
To make the generated adversarial examples satisfy the L∞ (or L2) bound,
one can clip x*_t into the ε-vicinity of x or simply set α = ε/T with T
being the number of iterations.
It has been shown that iterative methods are stronger white-
box adversaries than one-step methods at the cost of worse
transferability [10, 24].
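For concreteness, a minimal sketch of the iterative update in Eq. (3) follows, assuming a user-supplied function grad_loss(x, y) that returns ∇_x J(x, y) (the function name, the default ε, and the pixel-range clipping are our own choices, not specified in the papers cited above):

    import numpy as np

    def i_fgsm(x, y, grad_loss, eps=16/255., T=10):
        # Iterative FGSM (Eq. (3)) with alpha = eps / T, so the final
        # perturbation automatically satisfies the L_inf bound.
        alpha = eps / T
        x_adv = x.copy()
        for _ in range(T):
            g = grad_loss(x_adv, y)                     # assumed to return grad_x J(x, y)
            x_adv = x_adv + alpha * np.sign(g)
            x_adv = np.clip(x_adv, x - eps, x + eps)    # clip into the eps-vicinity of x
            x_adv = np.clip(x_adv, 0.0, 1.0)            # keep a valid pixel range (assumption)
        return x_adv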
Optimization-based methods [23] directly optimize the
distance between the real and adversarial examples sub-
ject to the misclassification of adversarial examples. Box-
constrained L-BFGS can be used to solve such a problem.
A more sophisticated way [1] is solving:
arg min_{x*}  λ · ‖x* − x‖_p − J(x*, y).   (4)
Since it directly optimizes the distance between an adversarial example
and the corresponding real example, there is no guarantee that the L∞ (L2)
distance is less than the required value. Like iterative methods,
optimization-based methods also lack efficacy in black-box attacks.
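As an illustrative sketch only (not the exact procedure of [1] or [23]), the objective in Eq. (4) with p = 2 can be minimized by plain gradient descent on x*; the values of λ, the step size, and the iteration count below are arbitrary, and grad_loss(x, y) is again a placeholder for ∇_x J(x, y):

    import numpy as np

    def optimization_attack(x, y, grad_loss, lam=0.1, lr=0.01, steps=100):
        # Gradient descent on  lam * ||x* - x||_2 - J(x*, y)  (Eq. (4) with p = 2).
        x_adv = x.copy()
        for _ in range(steps):
            diff = x_adv - x
            dist = np.linalg.norm(diff) + 1e-12          # avoid division by zero at the start
            grad_obj = lam * diff / dist - grad_loss(x_adv, y)
            x_adv = np.clip(x_adv - lr * grad_obj, 0.0, 1.0)
        return x_adv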
2.2. Defense methods
Among many attempts [13, 3, 15, 10, 24, 17, 11], adver-
sarial training is the most extensively investigated way to
increase the robustness of DNNs [5, 10, 24]. By injecting
adversarial examples into the training procedure, the adver-
sarially trained models learn to resist the perturbations in
the gradient direction of the loss function. However, they do
not confer robustness to black-box attacks due to the cou-
pling of the generation of adversarial examples and the pa-
rameters being trained. Ensemble adversarial training [24]
augments the training data with the adversarial samples pro-
duced not only from the model being trained, but also from
other hold-out models. Therefore, the ensemble adversari-
ally trained models are robust against one-step attacks and
black-box attacks.
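A schematic, hedged sketch of one adversarial-training step follows; make_adv and loss_grad are placeholder callables introduced here for illustration, and the equal clean/adversarial mix is an assumption rather than the exact recipe of [5, 10, 24]:

    def adversarial_training_step(params, x_batch, y_batch, make_adv, loss_grad, lr=0.1):
        # One SGD step on a mix of clean and adversarial examples.
        # make_adv(params, x, y): returns adversarial versions of x (e.g. via FGSM on the
        # current model, or on a hold-out model for ensemble adversarial training).
        # loss_grad(params, x, y): gradient of the training loss w.r.t. the parameters.
        x_adv = make_adv(params, x_batch, y_batch)
        g_clean = loss_grad(params, x_batch, y_batch)
        g_adv = loss_grad(params, x_adv, y_batch)
        return {k: params[k] - lr * 0.5 * (g_clean[k] + g_adv[k]) for k in params}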
3. Methodology
In this paper, we propose a broad class of momentum
iterative gradient-based methods to generate adversar-
ial examples, which can fool white-box models as well as
black-box models. In this section, we elaborate on the pro-
posed algorithms. We first illustrate how to integrate mo-
mentum into iterative FGSM, which induces a momentum
iterative fast gradient sign method (MI-FGSM) to generate
adversarial examples satisfying the L∞ norm restriction in
the non-targeted attack fashion. We then present several
methods on how to efficiently attack an ensemble of mod-
els. Finally, we extend MI-FGSM to the L2 norm bound and
targeted attacks, yielding a broad class of attack methods.
Algorithm 1 MI-FGSM
Input: A classifier f with loss function J; a real example x and its ground-truth label y;
Input: The size of perturbation ε; number of iterations T; decay factor µ.
Output: An adversarial example x* with ‖x* − x‖_∞ ≤ ε.
1: α = ε/T;
2: g_0 = 0; x*_0 = x;
3: for t = 0 to T − 1 do
4:    Input x*_t to f and obtain the gradient ∇_x J(x*_t, y);
5:    Update g_{t+1} by accumulating the velocity vector in the gradient direction as
          g_{t+1} = µ · g_t + ∇_x J(x*_t, y) / ‖∇_x J(x*_t, y)‖_1;   (6)
6:    Update x*_{t+1} by applying the sign gradient as
          x*_{t+1} = x*_t + α · sign(g_{t+1});   (7)
7: end for
8: return x* = x*_T.
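A minimal NumPy sketch of Algorithm 1 follows; grad_loss(x, y) is a placeholder for a framework-provided gradient of J with respect to the input, and the clipping to the valid pixel range is an extra practical detail not listed in the pseudocode:

    import numpy as np

    def mi_fgsm(x, y, grad_loss, eps=16/255., T=10, mu=1.0):
        # Algorithm 1 (MI-FGSM): accumulate L1-normalized gradients into a velocity
        # vector (Eq. (6)) and step in its sign direction (Eq. (7)).
        alpha = eps / T                       # line 1
        g = np.zeros_like(x)                  # line 2: g_0 = 0
        x_adv = x.copy()                      #         x*_0 = x
        for _ in range(T):                    # lines 3-7
            grad = grad_loss(x_adv, y)        # line 4: grad_x J(x*_t, y)
            g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)   # Eq. (6)
            x_adv = x_adv + alpha * np.sign(g)                    # Eq. (7)
            x_adv = np.clip(x_adv, 0.0, 1.0)  # valid pixel range (extra, not in Alg. 1)
        return x_adv                          # line 8: x* = x*_T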
3.1. Momentum iterative fast gradient sign method
The momentum method [18] is a technique for accelerat-
ing gradient descent algorithms by accumulating a velocity
vector in the gradient direction of the loss function across
iterations. The memorization of previous gradients helps to
barrel through narrow valleys, small humps and poor local
minima or maxima [4]. The momentum method also shows
its effectiveness in stochastic gradient descent to stabilize
the updates [20]. We apply the idea of momentum to gener-
ate adversarial examples and obtain tremendous benefits.
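For reference, a bare-bones sketch of the classical momentum update described above (variable names are ours):

    def momentum_step(theta, v, grad, mu=0.9, lr=0.01):
        # Classical momentum for gradient descent: accumulate a velocity vector
        # in the gradient direction, then move the parameters along it.
        v_new = mu * v + grad
        theta_new = theta - lr * v_new
        return theta_new, v_new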
To generate a non-targeted adversarial example x* from a real example x,
which satisfies the L∞ norm bound, gradient-based approaches seek the
adversarial example by solving the constrained optimization problem
arg max_{x*} J(x*, y),   s.t.   ‖x* − x‖_∞ ≤ ε,   (5)
where ε is the size of adversarial perturbation. FGSM gen-
erates an adversarial example by applying the sign of the
gradient to a real example only once (in Eq. (1)) by the
assumption of linearity of the decision boundary around
the data point. In practice, however, the linear assumption may not hold
when the distortion is large [12], which makes the adversarial example
generated by FGSM “underfit” the model, limiting its attack ability. In contrast, it-
erative FGSM greedily moves the adversarial example in
the direction of the sign of the gradient in each iteration (in
Eq. (3)). Therefore, the adversarial example can easily drop
into poor local maxima and “overfit” the model, making it
unlikely to transfer across models.
In order to break such a dilemma, we integrate momen-
tum into the iterative FGSM for the purpose of stabiliz-
ing update directions and escaping from poor local max-
ima. Therefore, the momentum-based method retains the
transferability of adversarial examples when increasing it-