Figure 2: Taxonomy of adversarial attacks to deep neural networks (DNNs). "Back propagation" means an attacker can access the internal configurations of a DNN (e.g., performing gradient descent), and "Query" means an attacker can input any sample and observe the corresponding output.
successfully bypass 10 different detection methods designed for detecting adversarial examples [7].
• Transferability:
In the context of adversarial attacks, transferability means that adversarial examples generated from one model are very likely to be misclassified by another model as well. In particular, the aforementioned adversarial attacks have demonstrated that their adversarial examples are highly transferable from a DNN at hand to the targeted DNN. One possible explanation of this inherent attack transferability lies in the findings that DNNs commonly have strong generalization power and local linearity in feature extraction [40]. Notably, the transferability of adversarial attacks raises security concerns for machine learning applications based on DNNs, as malicious examples can be easily crafted even when the exact parameters of a targeted DNN are unknown (see the sketch after this item). More interestingly, the authors in [29] have shown that a carefully crafted universal perturbation applied to a set of natural images leads to misclassification of all considered images with high probability, suggesting the possibility of attack transferability from one image to another. Further analysis and justification of universal perturbations is given in [30].
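To make this property concrete, the following minimal sketch crafts a fast-gradient-sign perturbation on a source model only and measures how often it also fools an independently trained target model. It assumes PyTorch and two pretrained classifiers `source_model` and `target_model` with inputs in [0, 1]; these names and the perturbation size are illustrative assumptions, not artifacts of the works cited above.

```python
# Hedged sketch of cross-model transferability (assumptions: PyTorch, two
# independently trained classifiers `source_model` / `target_model`, and
# an input batch `x` with ground-truth labels `y`).
import torch
import torch.nn.functional as F

def fgsm_on_source(source_model, x, y, eps=0.03):
    """Craft a fast-gradient-sign perturbation using only the source model."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(source_model(x_adv), y)
    loss.backward()
    # One signed-gradient step, clipped back to the valid pixel range [0, 1].
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

def transfer_rate(source_model, target_model, x, y, eps=0.03):
    """Fraction of examples crafted on the source model that are also
    misclassified by the independent target model."""
    x_adv = fgsm_on_source(source_model, x, y, eps)
    with torch.no_grad():
        preds = target_model(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()
```

A high `transfer_rate` on held-out data is exactly the transferability phenomenon described above: the target model is never touched while the perturbation is being crafted.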
1.2 Black-box attacks and substitute models
While the definition of an open-box (white-box) attack on DNNs is clear and precise, namely having complete knowledge of and full access to a targeted DNN, the definition of a "black-box" attack on DNNs may vary with the capability of the attacker. From an attacker's perspective, a black-box attack may refer to the most challenging case where only benign images and their class labels are given, the targeted DNN is completely unknown, and one is prohibited from querying any information from the targeted classifier for adversarial attacks. This restricted setting, which we call a "no-box" attack setting, excludes the principal adversarial attacks introduced in Section 1.1, as they all require certain knowledge of, and back propagation through, the targeted DNN. Consequently, under this no-box setting the research focus is mainly on attack transferability from a self-trained DNN to a targeted but completely access-prohibited DNN.
On the other hand, in many scenarios an attacker does have the privilege to query a targeted DNN in order to obtain useful information for crafting adversarial examples. For instance, a mobile app or computer software featuring image classification (most likely trained with DNNs) allows an attacker to input any image at will and acquire classification results, such as confidence scores or a ranking of classes. An attacker can then leverage the acquired classification results to design more effective adversarial examples that fool the targeted classifier. In this setting, back propagation for gradient computation on the targeted DNN is still prohibited, as it requires knowledge of the internal configurations of the DNN, which are not available in the black-box setting. However, the adversarial query process can be iterated multiple times until the attacker finds a satisfactory adversarial example; a minimal query-loop sketch is given below. For instance, the authors in [26] have demonstrated a successful black-box adversarial attack on Clarifai.com, which is a black-box image classification system.
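As a deliberately simple illustration of such an iterated query process, the sketch below runs a random-search loop: it proposes small perturbations, queries an assumed black-box oracle `query_scores` (returning a NumPy array of class confidences) for each candidate, and keeps a perturbation whenever it lowers the confidence of the true class. The oracle name, step size, and query budget are hypothetical assumptions; this is not the attack method of any particular work cited here.

```python
# Hedged sketch of a query-only (black-box) attack loop: only input/output
# access to the target classifier is assumed; no gradients are computed.
import numpy as np

def random_search_attack(query_scores, x, true_label,
                         step=0.01, budget=1000, seed=0):
    """`query_scores(image)` is an assumed black-box oracle that returns a
    NumPy vector of class confidences for one image with pixels in [0, 1]."""
    rng = np.random.default_rng(seed)
    x_adv = x.copy()
    best_conf = query_scores(x_adv)[true_label]
    for _ in range(budget):
        # Propose a small random perturbation and query the target model.
        candidate = np.clip(x_adv + step * rng.standard_normal(x.shape), 0.0, 1.0)
        scores = query_scores(candidate)
        if scores.argmax() != true_label:
            return candidate  # misclassification achieved
        if scores[true_label] < best_conf:
            # Keep the perturbation whenever it lowers the true-class confidence.
            x_adv, best_conf = candidate, scores[true_label]
    return x_adv
```

Each iteration costs one query, so the attacker's budget directly bounds how refined the final perturbation can be.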
Due to its feasibility, the case where an attacker has free access to the input and output of a targeted DNN while still being prohibited from performing back propagation on the targeted DNN has been called a practical black-box attack setting for DNNs [8, 16, 17, 26, 34, 35]. For the rest of this paper, we use the term black-box adversarial attack to refer to this setting. For illustration, the attack settings and their limitations are summarized in Figure 2. It is worth noting that under this black-box setting, existing attack approaches tend to exploit the power of free query to train a substitute model [17, 34, 35], that is, a representative surrogate of the targeted DNN. The substitute model can then be attacked using any white-box attack technique, and the generated adversarial images are used to attack the targeted DNN. The primary advantage of training a substitute model is its total transparency to the attacker: essential attack procedures, such as back propagation for gradient computation, can be implemented on the substitute model to craft adversarial examples. Moreover, since the substitute model is representative of the targeted DNN in terms of its classification rules, attacking the substitute model is expected to resemble attacking the targeted DNN itself. In other words, adversarial examples crafted on a substitute model can be highly transferable to the targeted DNN, given the ability to query the targeted DNN at will. A sketch of this substitute-model pipeline is given below.
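The following minimal sketch of the substitute-model pipeline assumes PyTorch, a black-box label oracle `query_label` (one query per image), an attacker-chosen network `substitute`, and a pool of unlabeled images `x_pool`; all of these names and hyperparameters are illustrative assumptions rather than the exact procedures of [17, 34, 35]. The substitute is trained on query-labeled data and then attacked with an ordinary white-box gradient-sign step, whose output is finally submitted to the target model.

```python
# Hedged sketch of a substitute-model attack (assumptions: PyTorch, a
# black-box oracle `query_label` returning the target's predicted label,
# an attacker-chosen `substitute` network, and an unlabeled batch `x_pool`).
import torch
import torch.nn.functional as F

def train_substitute(substitute, query_label, x_pool, epochs=10, lr=1e-3):
    """Label the attacker's own images by querying the black-box target,
    then fit the substitute network on the query-labeled data."""
    y_pool = torch.tensor([query_label(x) for x in x_pool])  # one query per image
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(x_pool), y_pool)
        loss.backward()
        opt.step()
    return substitute

def craft_on_substitute(substitute, x, y, eps=0.03):
    """White-box gradient-sign step on the (fully transparent) substitute;
    the result is then submitted to the target, relying on transferability."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(substitute(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```

Practical substitute-model attacks typically refine this loop, for example by augmenting the attacker's dataset with further queries, but the core idea of transferring examples crafted on the transparent substitute remains the same.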
1.3 Defending against adversarial attacks
One common observation from the development of security-related research is that attack and defense often go hand in hand, and one's improvement depends on the other's progress. Similarly, in the context of the robustness of DNNs, more effective adversarial attacks are often driven by improved defenses, and vice versa. There is a vast amount of literature on enhancing the robustness of DNNs. Here we focus on defense methods that have been shown to be effective in tackling (a subset of) the adversarial attacks introduced in Section 1.1 while maintaining similar classification performance on benign examples. Based on the defense techniques employed, we categorize the defense methods proposed for enhancing the robustness of DNNs against adversarial examples as follows.
• Detection-based defense:
Detection-based approaches aim to differentiate an adversarial example from a set of benign examples using statistical tests or out-of-sample analysis. Interested readers can refer to recent works in [10, 13, 18, 28, 42, 43] and references therein for details. In particular, feature squeezing is