Deep Learning: The Adversarial Example Challenge and Strategies to Address It
Deep Learning: Analysis of and Solutions to the Sample Misclassification Problem

This article summarizes an influential paper published at the International Conference on Learning Representations (ICLR) 2015 by Ian Goodfellow, Jonathon Shlens, and Christian Szegedy of Google. The paper, "Explaining and Harnessing Adversarial Examples," addresses a striking failure mode of deep learning models, neural networks in particular: their classification accuracy collapses on deliberately crafted inputs, so-called "adversarial examples," produced by adding small but destructive perturbations to ordinary samples.

Earlier work attributed this phenomenon to nonlinearity and overfitting. The authors argue instead that the susceptibility of neural networks to adversarial perturbations stems primarily from their essentially linear behavior. They support this view with new quantitative results, which both explain why adversarial examples transfer across different network architectures and training sets and resolve what had been a puzzling phenomenon.

Their key finding is that this linearity makes networks sensitive to perturbations too small for a human to notice, and it also makes adversarial examples simple and fast to generate. Building on this insight, the paper proposes adversarial training: injecting adversarial examples into the training process to improve the model's robustness and reduce its error rate on the test set.

The paper dissects a central problem in deep learning and supplies both a theoretical foundation and practical strategies for defending models against adversarial attacks, making it important for understanding and improving the robustness of deep learning systems. By understanding the linear weakness of neural networks, researchers can design more robust models to cope with the kinds of perturbations encountered in the real world.
$$\underbrace{\;x\;}_{\substack{\text{“panda”}\\ \text{57.7\% confidence}}} \;+\; .007 \times \underbrace{\operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)}_{\substack{\text{“nematode”}\\ \text{8.2\% confidence}}} \;=\; \underbrace{\;x + \epsilon\,\operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)\;}_{\substack{\text{“gibbon”}\\ \text{99.3\% confidence}}}$$
Figure 1: A demonstration of fast adversarial example generation applied to GoogLeNet (Szegedy
et al., 2014a) on ImageNet. By adding an imperceptibly small vector whose elements are equal to
the sign of the elements of the gradient of the cost function with respect to the input, we can change
GoogLeNet’s classification of the image. Here our ε of .007 corresponds to the magnitude of the
smallest bit of an 8 bit image encoding after GoogLeNet’s conversion to real numbers.
Let θ be the parameters of a model, x the input to the model, y the targets associated with x (for
machine learning tasks that have targets) and J(θ, x, y) be the cost used to train the neural network.
We can linearize the cost function around the current value of θ, obtaining an optimal max-norm constrained perturbation of

$$\eta = \epsilon\,\operatorname{sign}\left(\nabla_x J(\theta, x, y)\right).$$
We refer to this as the “fast gradient sign method” of generating adversarial examples. Note that the
required gradient can be computed efficiently using backpropagation.
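As a concrete illustration, here is a minimal sketch of the fast gradient sign method in PyTorch. This is our addition, not the paper's code: `model` and `loss_fn` are placeholders for any differentiable classifier and its training cost.

```python
import torch

def fgsm_example(model, loss_fn, x, y, epsilon):
    """Return x + epsilon * sign(grad_x J(theta, x, y)).

    A sketch, assuming `model` maps inputs to logits and `loss_fn`
    is the training cost J; both names are placeholders.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)       # J(theta, x, y)
    loss.backward()                   # one backprop pass yields grad_x J
    eta = epsilon * x.grad.sign()     # max-norm constrained perturbation
    return (x + eta).detach()
```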
We find that this method reliably causes a wide variety of models to misclassify their input. See Fig. 1 for a demonstration on ImageNet. We find that using ε = .25, we cause a shallow softmax classifier to have an error rate of 99.9% with an average confidence of 79.3% on the MNIST (LeCun et al., 1998) test set [1]. In the same setting, a maxout network misclassifies 89.4% of our adversarial examples with an average confidence of 97.6%. Similarly, using ε = .1, we obtain an error rate of 87.15% and an average probability of 96.6% assigned to the incorrect labels when using a convolutional maxout network on a preprocessed version of the CIFAR-10 (Krizhevsky & Hinton, 2009) test set [2]. Other simple methods of generating adversarial examples are possible. For example, we also found that rotating x by a small angle in the direction of the gradient reliably produces adversarial examples.

The fact that these simple, cheap algorithms are able to generate misclassified examples serves as evidence in favor of our interpretation of adversarial examples as a result of linearity. The algorithms are also useful as a way of speeding up adversarial training or even just analysis of trained networks.
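As one concrete pattern, adversarial training with fast-gradient-sign examples crafted on the fly costs roughly one extra forward/backward pass per batch. A minimal sketch, reusing the hypothetical `fgsm_example` helper above with assumed `loader` and `optimizer` objects; weighting the clean and adversarial losses equally is just one simple choice:

```python
# Sketch of adversarial training with on-the-fly fast-gradient-sign
# examples; all names are assumptions carried over from the sketch above.
for x, y in loader:
    x_adv = fgsm_example(model, loss_fn, x, y, epsilon=0.25)
    optimizer.zero_grad()
    loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
```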
5 ADVERSARIAL TRAINING OF LINEAR MODELS VERSUS WEIGHT DECAY
Perhaps the simplest possible model we can consider is logistic regression. In this case, the fast
gradient sign method is exact. We can use this case to gain some intuition for how adversarial
examples are generated in a simple setting. See Fig. 2 for instructive images.
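A quick numerical check of that exactness claim (our sketch with illustrative values, not the paper's code; notation as introduced just below): for logistic regression the gradient sign collapses to −y sign(w), so the fast method recovers the analytic worst-case direction regardless of x.

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.3
x, y = rng.normal(size=5), -1          # labels y in {-1, +1}

# J = softplus(-y (w.x + b)); hence grad_x J = -y w sigmoid(-y (w.x + b)).
z = -y * (w @ x + b)
grad_x = -y * w / (1.0 + np.exp(-z))   # sigmoid(z) = 1 / (1 + e^{-z})

# sigmoid(z) > 0, so sign(grad_x J) = -y sign(w): the fast gradient sign
# direction is exactly the worst case under a max-norm constraint.
assert np.array_equal(np.sign(grad_x), -y * np.sign(w))
```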
If we train a single model to recognize labels y ∈ {−1, 1} with

$$P(y = 1) = \sigma\left(w^\top x + b\right)$$

where σ(z) is the logistic sigmoid function, then training consists of gradient descent on

$$\mathbb{E}_{x, y \sim p_{\text{data}}}\, \zeta\left(-y\left(w^\top x + b\right)\right)$$

where ζ(z) = log(1 + exp(z)) is the softplus function. We can derive a simple analytical form for training on the worst-case adversarial perturbation of x rather than x itself, based on gradient sign
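Carrying that linearization through by hand (a derivation sketch from the setup above, not a quotation of the paper's remaining pages): since sign(∇ₓJ) here is −y sign(w), the worst-case max-norm perturbation and its effect on the activation are

$$\eta^{\ast} = -\epsilon\, y\, \operatorname{sign}(w), \qquad w^\top \eta^{\ast} = -\epsilon\, y\, \lVert w \rVert_1,$$

and substituting x + η* into the softplus objective (using y² = 1) gives

$$\mathbb{E}_{x, y \sim p_{\text{data}}}\, \zeta\left( \epsilon \lVert w \rVert_1 - y\left(w^\top x + b\right) \right),$$

a form reminiscent of an L¹ penalty, though it appears inside the activation rather than being added to the training cost.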
[1] This is using MNIST pixel values in the interval [0, 1]. MNIST data does contain values other than 0 or 1, but the images are essentially binary. Each pixel roughly encodes “ink” or “no ink”. This justifies expecting the classifier to be able to handle perturbations within a range of width 0.5, and indeed human observers can read such images without difficulty.

[2] See https://github.com/lisa-lab/pylearn2/tree/master/pylearn2/scripts/papers/maxout for the preprocessing code, which yields a standard deviation of roughly 0.5.