What is this passage about: Abstract: A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty in interpretability of the learned model to misclassify any inputs signed with the attacker’s chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision system. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model (malicious or benign). A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input, a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks. Index Terms: Trojan attack, Backdoor attack
Time: 2024-04-22 20:28:18 Views: 212
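This abstract is about trojan attacks on deep neural network (DNN) models. A trojan attack is a variant of data poisoning: it exploits the poor interpretability of the learned model to plant an effective backdoor in the DNN, so that any input stamped with the attacker's chosen trojan trigger is misclassified. Because the trigger is a secret that the attacker guards and exploits, detecting such trojaned inputs is hard, especially at run-time while the model is in active operation. The paper builds a run-time trojan attack detection system based on STRong Intentional Perturbation (STRIP) and focuses on vision systems. The authors intentionally perturb each incoming input, for example by superimposing various image patterns on it, and observe how random the classes predicted by the deployed model (which may be malicious or benign) are across the perturbed copies. Low entropy in the predicted classes violates the input-dependence property of a benign model and indicates a malicious input, which is the characteristic of a trojaned input. The high efficacy of the method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. With the false rejection rate (FRR) preset to 1%, the overall false acceptance rate (FAR) is below 1% for different types of triggers. On CIFAR10 and GTSRB, the empirical result is 0% for both FRR and FAR. The authors also evaluate the robustness of STRIP against a number of trojan attack variants and adaptive attacks.
Keywords: trojan attack, backdoor attack.
In short, the abstract lays out the paper's research content and method: it introduces the background and goal of trojan attacks, presents the authors' STRIP method, and reports case studies and performance evaluation on several datasets.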
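The detection idea described in the abstract (perturb the input, then check how random the predictions are) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `predict_proba` callable, the linear blending with weight `alpha`, and the quantile-based threshold are all assumptions made for illustration, since the abstract only says that image patterns are superimposed and that low entropy signals a trojaned input.

```python
import numpy as np

def shannon_entropy(probs, eps=1e-12):
    """Entropy in bits of each row of an (n, num_classes) probability array."""
    probs = np.clip(probs, eps, 1.0)  # avoid log2(0)
    return -np.sum(probs * np.log2(probs), axis=1)

def strip_entropy(x, overlay_images, predict_proba, alpha=0.5):
    """Average prediction entropy of `x` superimposed with each overlay image.

    x:              one input image, shape (H, W, C)
    overlay_images: images to superimpose, shape (n, H, W, C)
    predict_proba:  hypothetical callable mapping a batch (n, H, W, C)
                    to class probabilities of shape (n, num_classes)
    alpha:          blending weight (an assumption, not from the abstract)
    """
    perturbed = alpha * x[None, ...] + (1.0 - alpha) * overlay_images
    probs = predict_proba(perturbed)
    return float(np.mean(shannon_entropy(probs)))

def pick_threshold(clean_entropies, frr=0.01):
    """Entropy value below which a fraction `frr` of clean inputs falls."""
    return float(np.quantile(clean_entropies, frr))

def is_trojaned(x, overlay_images, predict_proba, threshold, alpha=0.5):
    """Flag the input when its perturbed predictions are suspiciously stable."""
    return strip_entropy(x, overlay_images, predict_proba, alpha) < threshold
```

In this sketch, a benign input should yield varied predictions once other images are blended in (high entropy), while an input carrying a strong trigger keeps being pushed to the attacker's target class (low entropy). The threshold would be calibrated on clean inputs so that only a preset fraction of them, the FRR, is rejected.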