Fig. 4. (a): Malicious and usually imperceptible perturbations present in an input image can induce trained models to misclassification. Adapted from Klarreich [93]. (b): The objective of an adversarial attack is to generate a perturbation δx and insert it into a legitimate image x in order to make the resulting adversarial image x′ = x + δx cross the decision boundary. Adapted from Bakhti et al. [8].
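As a concrete illustration of the additive formulation x′ = x + δx in Fig. 4(b), the sketch below crafts a perturbation with the Fast Gradient Sign Method (FGSM) of Goodfellow et al. This is a minimal PyTorch example, not the procedure of any specific work surveyed here; the pretrained model and the budget ε are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal FGSM sketch: delta_x = epsilon * sign(grad_x L(f(x), y)).
# The pretrained model and epsilon below are illustrative assumptions.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm(x, y, epsilon=0.03):
    """Return the adversarial image x' = x + delta_x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    delta_x = epsilon * x.grad.sign()        # the perturbation
    return (x + delta_x).clamp(0.0, 1.0).detach()  # keep a valid image

# Usage: a random "image" stands in for a legitimate input x.
x = torch.rand(1, 3, 224, 224)
y = model(x).argmax(dim=1)                   # model's current label
x_adv = fgsm(x, y)
print("prediction changed:", (model(x_adv).argmax(dim=1) != y).item())
```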
3.2 Taxonomy of Attacks and Attackers
This section is also based on the concepts and definitions of the works of Akhtar and Mian, Barreno et al., Brendel et al., Kumar and Mehta, Xiao, and Yuan et al. [2, 10, 15, 97, 190, 198] to extend⁵ existing taxonomies which organize attacks and attackers. In the context of security, adversarial attacks and attackers are categorized under threat models. A threat model defines the conditions under which a defense is designed to provide security guarantees against certain types of attacks and attackers [19]. Basically, a threat model delimits (i) the knowledge an attacker has about the targeted classifier (such as its parameters and architecture), (ii) his goal with the adversarial attack and (iii) how he will perform the adversarial attack. A threat model can then be classified into six different axes: (i) attacker's influence, (ii) attacker's knowledge, (iii) security violation, (iv) attack specificity, (v) attack computation and (vi) attack approach.
3.2.1 Aacker’s Influence. This axis denes how the attacker will control the learning process
of deep learning models. According to Xiao [
190
], the attacker can perform two types of attack,
taking into account his inuence on the classication model: (i) causative or poisoning attacks and
(ii) evasive or exploratory attacks.
• Causative or poisoning attacks: in causative attacks, the attacker has influence on the deep learning model during its training stage. In this type of attack, the training samples are corrupted or the training set is polluted with adversarial examples in order to produce a classification model incompatible with the original data distribution (see the label-flipping sketch after this list);
• Evasive or exploratory attacks: in contrast to causative attacks, in evasive attacks the attacker has influence on the deep learning model during the inference or testing stage. Evasive attacks are the most common type of attack, where the attacker crafts adversarial examples that lead deep learning models to misclassification, usually with a high confidence on the prediction. Evasive attacks can also have an exploratory nature, where the attacker's objective is to gather information about the targeted model, such as its parameters, architecture, cost function, etc. The most common exploratory attack is the input/output attack, where the attacker provides the targeted model with adversarial images crafted by him. Afterwards, the attacker observes the outputs given by the model and tries to reproduce a substitute or surrogate model, so that it mimics the behavior of the targeted model (see the surrogate-training sketch after this list).
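The sketch below illustrates the causative setting with the simplest form of training-set pollution, label flipping. The dataset shape, number of classes, and flip ratio are illustrative assumptions, not a method advanced by the surveyed works.

```python
import torch

# Causative/poisoning sketch: flip a fraction of training labels so the
# model learned on them drifts from the original data distribution.
# Dataset size, number of classes, and flip ratio are assumptions.
def poison_labels(labels, num_classes=10, flip_ratio=0.2, seed=0):
    """Return a copy of `labels` with `flip_ratio` of them flipped."""
    g = torch.Generator().manual_seed(seed)
    labels = labels.clone()
    n_poison = int(flip_ratio * len(labels))
    idx = torch.randperm(len(labels), generator=g)[:n_poison]
    # Shift each chosen label by 1..num_classes-1, guaranteeing a wrong class.
    offset = torch.randint(1, num_classes, (n_poison,), generator=g)
    labels[idx] = (labels[idx] + offset) % num_classes
    return labels

# Usage: a toy training set; the victim would train on `y_poisoned`.
y_clean = torch.randint(0, 10, (1000,))
y_poisoned = poison_labels(y_clean)
print("labels flipped:", (y_clean != y_poisoned).sum().item())
```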
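To make the input/output attack concrete, the following sketch queries a black-box target with probe images and fits a substitute network on the observed labels, in the spirit of substitute-model attacks. Both architectures, the query budget, and the training schedule are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Exploratory input/output sketch: query a black-box target, then train a
# surrogate on the (input, observed label) pairs. The toy networks below
# stand in for the victim and the attacker's substitute.
target = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).eval()

def query_target(x):
    """Black-box access: only the predicted labels are observable."""
    with torch.no_grad():
        return target(x).argmax(dim=1)

surrogate = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64),
                          nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

queries = torch.rand(512, 1, 28, 28)   # attacker-chosen probe images
labels = query_target(queries)          # observed target outputs

for _ in range(100):                    # fit the surrogate to mimic the target
    opt.zero_grad()
    loss = loss_fn(surrogate(queries), labels)
    loss.backward()
    opt.step()

agreement = (surrogate(queries).argmax(dim=1) == labels).float().mean().item()
print(f"surrogate agrees with target on {agreement:.0%} of the queries")
```

Once the surrogate is trained, adversarial examples crafted against it (e.g., with the FGSM sketch above) can often be transferred to the original black-box target.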
⁵ Again, the novel topics proposed by this paper are highlighted by underlined font.