is typically performed in a white-box fashion, and so in order to utilize and properly compare against
the adversarial training techniques of Madry et al. (2017), it is important to have strong white-box
attacks.
For ease of presentation, we will describe the attacks assuming that $f : \mathbb{R} \to \mathbb{R}^k$ discretizes inputs
into thermometer encodings; in order to attack one-hot encodings, simply replace all instances of
$f_{\text{therm}}$ with $f_{\text{onehot}}$, $\tau$ with $\chi$, and $C$ with the identity function $I$. We represent the adversarial
image after $t$ steps of the attack as $z^t$, where the value of the $i$th pixel is $z_i^t$.
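To ground this notation, the following is a minimal NumPy sketch of the encoding functions, under the assumption of $k$ buckets uniformly quantizing pixel values in $[0, 1]$; the bucket count K = 16 and the uniform bucketing function b are illustrative choices rather than values from the text, while chi, tau, C, f_onehot, and f_therm mirror the notation above.

```python
import numpy as np

K = 16  # number of buckets; an illustrative choice

def b(x, k=K):
    """Bucket index of a pixel value x in [0, 1] (uniform quantization is our assumption)."""
    return int(np.clip(np.floor(x * k), 0, k - 1))

def chi(l, k=K):
    """One-hot code of bucket l: a vertex of the simplex Delta^k."""
    v = np.zeros(k)
    v[l] = 1.0
    return v

def C(v):
    """Cumulative sum along the bucket axis, so that C(chi(l)) = tau(l)."""
    return np.cumsum(v)

def tau(l, k=K):
    """Thermometer code of bucket l: ones from position l upward (the cumulative-sum convention)."""
    return C(chi(l, k))

def f_onehot(x, k=K):
    return chi(b(x, k), k)

def f_therm(x, k=K):
    return tau(b(x, k), k)
```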
The first attack, Discrete Gradient Ascent (DGA), follows the direction of the gradient of the loss
with respect to f(x), but is constrained at every step to be a discretized vector. If we have discretized
the input image into k-dimensional vectors using the one-hot encoding, this corresponds to moving
to a vertex of the simplex $(\Delta^k)^n$ at every step. The second attack, Logit-Space Projected Gradient
Ascent (LS-PGA), relaxes this assumption, allowing intermediate iterates to be in the interior of the
simplex. The final adversarial image is obtained by projecting the final point back to the nearest
vertex of the simplex.
Note that if the number of attack steps is 1, then the two attacks are equivalent; however, for larger
numbers of attack steps, LS-PGA is a generalization of DGA.
2.3.1 DISCRETE GRADIENT ASCENT (DGA)
Following PGD (Madry et al., 2017), we initialize DGA by placing each pixel into a random bucket
that is within ε of the pixel’s true value. At each step of the attack, we look at all buckets that are
within ε of the true value, and select the bucket likely to do the most ‘harm’, as estimated from the
gradient of the model’s loss at the previous step: for each candidate bucket, we linearly approximate
the change in loss from setting that bucket’s indicator variable to 1.
$$z_i^0 = f_{\text{therm}}\big(x_i + U(-\varepsilon, \varepsilon)\big)$$

$$\text{harm}(z_i^t)_l = \begin{cases} (z_i^t - \tau(l))^\top \cdot \dfrac{\partial L(z^t)}{\partial z_i^t} & \text{if } \exists\,(-\varepsilon \le \eta \le \varepsilon) \text{ s.t. } b(x_i + \eta) = l \\[4pt] 0 & \text{otherwise.} \end{cases}$$

$$z_i^{t+1} = \tau\Big(\operatorname*{arg\,max}_l \, \text{harm}(z_i^t)_l\Big)$$
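To make the update rule concrete, here is a minimal NumPy sketch of the initialization and one DGA step for a single image, reusing b, tau, f_therm, and K from the earlier sketch; the caller-supplied loss_grad standing in for $\partial L(z^t)/\partial z_i^t$ is our assumption, since the model itself is not shown.

```python
import numpy as np

def dga_init(x, eps, k=K):
    """z^0: thermometer-encode a uniformly perturbed copy of the image."""
    eta = np.random.uniform(-eps, eps, size=x.shape)
    return np.stack([f_therm(v, k) for v in np.clip(x + eta, 0.0, 1.0)])

def dga_step(z, x, loss_grad, eps, k=K):
    """One Discrete Gradient Ascent step.

    z:         current encoding, shape (n, k), one code row per pixel
    x:         true pixel values in [0, 1], shape (n,)
    loss_grad: caller-supplied callable returning dL(z)/dz, shape (n, k)
    """
    g = loss_grad(z)                          # gradient of the loss at the previous iterate
    z_next = np.empty_like(z)
    for i, xi in enumerate(x):
        lo = b(max(xi - eps, 0.0), k)         # lowest bucket reachable within eps
        hi = b(min(xi + eps, 1.0), k)         # highest bucket reachable within eps
        # We use -inf (rather than the equation's 0) so disallowed buckets never win the argmax.
        harm = np.full(k, -np.inf)
        for l in range(lo, hi + 1):
            harm[l] = (z[i] - tau(l, k)) @ g[i]   # harm(z_i^t)_l from the equation above
        z_next[i] = tau(int(np.argmax(harm)), k)
    return z_next
```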
Because the outcome of this optimization procedure will vary depending on the initial random per-
turbation, we suggest strengthening the attack by re-running it several times and using the pertur-
bation with the greatest loss. The pseudo-code for the DGA attack is given in Section B of the
appendix.
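A possible restart wrapper over the sketches above, with the step and restart counts as purely illustrative values (the paper's actual pseudo-code lives in its appendix):

```python
def dga_attack(x, loss, loss_grad, eps, steps=7, restarts=3, k=K):
    """Re-run DGA from several random initializations; keep the highest-loss result."""
    best_z, best_loss = None, -np.inf
    for _ in range(restarts):
        z = dga_init(x, eps, k)
        for _ in range(steps):
            z = dga_step(z, x, loss_grad, eps, k)
        current = loss(z)                 # caller-supplied scalar loss on the encoding
        if current > best_loss:
            best_z, best_loss = z, current
    return best_z
```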
2.3.2 LOGIT-SPACE PROJECTED GRADIENT ASCENT (LS-PGA)
To perform LS-PGA, we soften the discrete encodings into continuous relaxations, and then perform
standard Projected Gradient Ascent (PGA) on these relaxed values. We represent the distribution
over embeddings as a softmax over logits u, each corresponding to the unnormalized log-weight
of a specific bucket’s embedding. To improve the attack, we scale the logits with temperature T ,
allowing us to trade off between how closely our softmax approximates a true one-hot distribution
as in the Gumbel-softmax trick (Jang et al., 2016; Maddison et al., 2016), and how much gradient
signal the logits receive. At each step of a multi-step attack, we anneal this value via exponential
decay with rate δ.
$$z_i^t = C\!\left(\sigma\!\left(\frac{u_i^t}{T^t}\right)\right) \qquad\quad z_i^{\text{final}} = \tau\!\left(\operatorname*{arg\,max}\, u_i^{\text{final}}\right) \qquad\quad T^t = T^{t-1} \cdot \delta$$
We initialize each of the logits randomly with values sampled from a standard normal distribution.
At each step, we ensure that the model does not assign any probability to buckets that are not
within ε of the true value by fixing the corresponding logits to −∞. The model’s loss is a continuous
function of the logits, so we can simply utilize attacks designed for continuous-valued inputs, in
this case PGA.
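Putting the pieces together, this is a minimal sketch of LS-PGA for a single image, again reusing b, C, tau, and K from the first sketch; the step count, learning rate, initial temperature, and the caller-supplied grad_wrt_logits (which must differentiate the model's loss through $z_i = C(\sigma(u_i / T))$, e.g. via an autodiff framework) are all our assumptions.

```python
import numpy as np

def ls_pga(x, grad_wrt_logits, eps, steps=7, lr=0.1, T0=1.0, delta=0.9, k=K):
    """Logit-Space Projected Gradient Ascent for a single image.

    grad_wrt_logits: caller-supplied callable (u, T) -> dL/du, where the caller
    forms the relaxed encoding z_i = C(softmax(u_i / T)) and differentiates the
    model's loss through it (not shown here).
    """
    n = x.shape[0]
    u = np.random.randn(n, k)                # logits drawn from a standard normal
    reachable = np.zeros((n, k), dtype=bool)
    for i, xi in enumerate(x):
        reachable[i, b(max(xi - eps, 0.0), k): b(min(xi + eps, 1.0), k) + 1] = True
    u[~reachable] = -np.inf                  # no probability mass outside the eps-ball
    T = T0
    for _ in range(steps):
        g = grad_wrt_logits(u, T)            # dL/du through z = C(softmax(u / T))
        u = u + lr * np.nan_to_num(g)        # gradient ascent on the loss
        u[~reachable] = -np.inf              # re-fix the disallowed logits
        T *= delta                           # anneal: T^t = T^{t-1} * delta
    # Project to the nearest vertex: hard code of the highest-logit bucket.
    return np.stack([tau(int(np.argmax(u[i])), k) for i in range(n)])
```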