Security Challenges of Google's Phishing Page Filter: A Study of Classifier Evasion Attacks


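The research paper "Cracking Classifiers for Evasion: A Case Study on the Google's Phishing Pages Filter" examines the vulnerability of machine learning classifiers in security applications, with a focus on classifiers deployed in client-side environments. The authors are Bin Liang, Miaoqiang Su, Wei You, Wenchang Shi, and Gang Yang, all from Renmin University of China, reachable at @ruc.edu.cn addresses.

Machine-learning-based classifiers are widely used in network security today, for example in anti-phishing page filtering. These classifiers have also become targets for malicious attackers. Many studies have addressed evasion attacks on online classifiers and the corresponding defenses, but the security of client-side classifiers has received too little attention. Earlier work mostly examined classifiers built only for experiments, and little is known about the security of widely used commercial classifiers. Google's phishing pages filter (GPPF), a classifier built into the Chrome browser with more than one billion users, is chosen as the case study for the security challenges of client-side classifiers.

The authors propose a new attack method, called "Classifier Evasion through Feature Perturbation", aimed specifically at client-side classifiers. The core of the attack is to deceive the classifier through small, hard-to-notice modifications (feature perturbations) so that it fails to recognize a potential threat such as a phishing website. Such an attack may lead users to visit dangerous pages unknowingly, threatening their personal data and privacy.

Besides describing how the attack is carried out, the authors analyze why it succeeds, including weaknesses in the classification model, limitations of the training dataset, and the sensitivity of feature selection. They also assess the feasibility of defenses, such as making the classifier more robust, diversifying the training dataset, and updating the classification model in real time to counter newly emerging attack techniques.

To make client-side classifiers more resistant, the paper recommends more comprehensive security evaluation, including simulating attacks in realistic deployment scenarios and continuously monitoring and updating the classifier. The study also advocates new defense mechanisms, such as exploiting the self-repair capability of deep learning or adopting layered defenses, to raise the difficulty and cost of an attack. The paper exposes the vulnerability of client-side classifiers in the security domain and offers insight into how such attacks evade detection and how to defend against them, which is of both theoretical and practical value for understanding and improving network security.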

However, when a classifier is deployed on the client-side computer, the situation may become worse. As shown in Figure 1(b), a client-side classifier operates in a white-box setting. The adversary can leverage almost any analysis technique, such as debugging, disassembling, code analysis, and dynamic taint tracking, to thoroughly analyze the target classifier. As a result, the adversary has an opportunity to gain more comprehensive knowledge of the classifier and to develop more sophisticated evasion attacks. A malformed instance crafted this way is applicable to all users of the classifier. Besides, if the adversary gains perfect knowledge of the classifier, she can even reengineer a new classifier for commercial purposes. In this study, it is assumed that the entire implementation and configuration of the client-side classifier are available to the adversary. The adversary can determine the type of classification model, the classification algorithm, and the feature extraction method by leveraging various techniques. Considering the advancement of modern analysis techniques, this assumption is reasonable.

Some client-side classifiers have introduced defense techniques to prevent the adversary from learning crucial information. For example, GPPF employs cryptographic techniques to protect the classification model. Unfortunately, this proves ineffective against classifier cracking (discussed in Sections 3 and 4).

2.2 Phishing and GPPF

According to the latest report [3] of the Anti-Phishing Working Group (APWG), phishing attacks remain widespread: the number of unique phishing reports submitted to APWG during Q4 of 2014 was 197,252, an increase of 18 percent from the 163,333 received in Q3. To minimize the impact of phishing attacks, a variety of methods have been proposed to detect phishing pages, involving machine learning [39][52][56] or other techniques [24][25][27][31][33][46][57][58].

Modern web browsers also provide detection tools to help end users against phishing attacks. Safe Browsing, a service offered by Chrome, aims to provide not only blacklists of malicious URLs but also a trained classifier (GPPF) that automatically detects phishing pages as a countermeasure to the phishing problem [4]. In Chrome, Safe Browsing serves as a guard: when a request arrives, the requested URL is checked before the content is allowed to begin loading. The URL is checked against two blacklists: malware and phishing. If the URL matches either blacklist, Chrome blocks the request and jumps to a warning page as shown in Figure 2. More importantly, for a URL that is not present in the blacklists, Chrome further invokes GPPF to determine whether it is legitimate or phishing. In practice, the phishing blacklist needs to be updated constantly, and users are vulnerable to newly created phishing websites that are not yet listed. GPPF plays an indispensable role in protecting end users from unknown phishing pages.
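The two-stage check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Chrome's actual code: `check_request`, `Verdict`, and the `classify_page` callback are hypothetical names, the blacklists are modeled as plain sets of URL strings, and the classifier is reduced to a function that returns `True` for a suspected phishing page.

```python
from enum import Enum
from typing import Callable, Set


class Verdict(Enum):
    BLOCKED_BY_BLACKLIST = "blocked_by_blacklist"
    FLAGGED_BY_CLASSIFIER = "flagged_by_classifier"
    ALLOWED = "allowed"


def check_request(
    url: str,
    malware_blacklist: Set[str],
    phishing_blacklist: Set[str],
    classify_page: Callable[[str], bool],
) -> Verdict:
    """Return a verdict for a requested URL.

    classify_page is a hypothetical stand-in for the client-side
    classifier; it returns True when the page looks like phishing.
    """
    # Stage 1: check the URL against both blacklists before loading.
    if url in malware_blacklist or url in phishing_blacklist:
        return Verdict.BLOCKED_BY_BLACKLIST
    # Stage 2: only URLs absent from the blacklists reach the classifier.
    if classify_page(url):
        return Verdict.FLAGGED_BY_CLASSIFIER
    return Verdict.ALLOWED
```

The ordering matters for the argument in the text: a newly created phishing page is, by definition, absent from the blacklists, so the classifier in the second stage is the only check it has to pass.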

In fact, GPPF is the local version of an internal Google classifier. Google developed and trained a scalable machine learning classifier on its servers to detect phishing websites and uses it to maintain Google's phishing blacklist automatically [56]. Training the classifier is a constant offline process. The training process uses a sample of roughly ten million URLs analyzed over the past three months as the training dataset. The number of URLs from a single domain is also limited to 150 per week to prevent a single domain from contributing too much to the classification model. Consequently, adversaries don't have an opportunity to alter the training dataset enough to make the trained classifier misclassify phishing pages as legitimate. However, to provide real-time detection of unknown phishing pages, the trained classifier is also implemented as a part of Safe Browsing, i.e., GPPF. As an internal component of the Chrome browser, GPPF is deployed and runs entirely in the user environment. This allows the adversary to freely analyze its implementation and configuration to construct more sophisticated phishing attacks.
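The per-domain cap mentioned above can be illustrated with a short sketch. It is not Google's training pipeline: the function name `cap_per_domain`, the `(url, week_index)` sample layout, and the use of the URL hostname as the "domain" are all assumptions made for illustration. Only the limit of 150 URLs per domain per week comes from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit

MAX_URLS_PER_DOMAIN_PER_WEEK = 150  # cap stated in the text


def cap_per_domain(
    samples: Iterable[Tuple[str, int]],
    limit: int = MAX_URLS_PER_DOMAIN_PER_WEEK,
) -> List[Tuple[str, int]]:
    """Keep at most `limit` URLs per (domain, week), in input order.

    Each sample is (url, week_index); week_index is a hypothetical
    integer identifying the week in which the URL was analyzed.
    """
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    kept: List[Tuple[str, int]] = []
    for url, week in samples:
        # Assumption: the hostname stands in for the "domain".
        domain = urlsplit(url).hostname or ""
        key = (domain, week)
        if counts[key] < limit:
            counts[key] += 1
            kept.append((url, week))
    return kept
```

With such a cap, an adversary who floods one domain with crafted pages contributes at most 150 samples per week, a small number next to a dataset of roughly ten million URLs.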

According to StatCounter [5], from Aug 2014 to Aug 2015 Chrome held an average market share of 48.6% and was the most popular web browser. In May 2015, Google announced that Chrome has over one billion active users [1]. This means that the web surfing of over one billion users is protected by GPPF. Note that if a phishing page can fool GPPF, it has a better chance of staying off Google's phishing blacklist. Furthermore, the phishing blacklist provided by Google is also employed in the Firefox and Safari browsers, as well as by Internet Service Providers (ISPs) [6]. We have reason to believe that a security breach of GPPF will potentially affect many more people than just the users of Chrome.

3. CRACKING GPPF

There is very limited public information about the design and implementation of GPPF. We choose to directly analyze Chromium, the development version of the Chrome browser, to crack GPPF. The cracking includes two main steps: (1) extracting the classification model of GPPF from Chromium; and (2) decrypting the hashed features of the model. Note that some sensitive details of the cracking are intentionally omitted to prevent them from being used for malicious purposes.
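As a generic illustration of why hashing feature names offers limited secrecy, the sketch below matches hashes against a list of guessed candidates. It is not the authors' procedure, whose details are omitted in the text. SHA-256 is an assumption (the text does not name the hash function), and `hash_feature`, `recover_features`, and the feature strings used with them are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Set


def hash_feature(name: str) -> str:
    # SHA-256 is assumed for illustration; the text does not name the hash.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def recover_features(hashed: Set[str], candidates: Iterable[str]) -> Dict[str, str]:
    """Map each recognized hash back to the candidate that produced it.

    Hashes with no matching candidate are simply left unrecovered.
    """
    recovered: Dict[str, str] = {}
    for name in candidates:
        digest = hash_feature(name)
        if digest in hashed:
            recovered[digest] = name
    return recovered
```

The point of the sketch is that an unsalted hash of a guessable string can be confirmed by anyone who can enumerate plausible strings, so the protection depends on how hard the original feature names are to guess.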

3.1 Extracting the Classification Model

3.1.1 Classification Algorithm

The multi-process architecture that Chrome/Chromium adopts makes it more robust. From the very brief description in [4], we know that the Browser process periodically fetches an updated model from Google's server and sends it to every Render process via an IPC channel. This allows the classification to be done in the Render process, which scores the requested page to tell whether it is phishing or not.
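The model distribution described above can be sketched as a toy simulation. Everything here is an assumption for illustration: the classes `Model`, `Renderer`, and `Browser` are hypothetical, a direct method call stands in for the IPC channel, and the weighted sum with a logistic score is a placeholder, since the text at this point does not state the classification algorithm.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Model:
    version: int
    weights: Dict[str, float]  # hypothetical feature-name -> weight mapping
    bias: float = 0.0


class Renderer:
    """Stands in for a Render process that scores pages locally."""

    def __init__(self) -> None:
        self.model: Optional[Model] = None

    def on_model_update(self, model: Model) -> None:
        # A plain method call replaces the IPC message in this sketch.
        self.model = model

    def score(self, features: Dict[str, float]) -> float:
        if self.model is None:
            raise RuntimeError("no model received yet")
        z = self.model.bias + sum(
            self.model.weights.get(name, 0.0) * value
            for name, value in features.items()
        )
        # Logistic score in (0, 1); assumed for illustration only.
        return 1.0 / (1.0 + math.exp(-z))


class Browser:
    """Stands in for the Browser process that distributes the model."""

    def __init__(self) -> None:
        self.renderers: List[Renderer] = []
        self.current: Optional[Model] = None

    def spawn_renderer(self) -> Renderer:
        renderer = Renderer()
        self.renderers.append(renderer)
        if self.current is not None:
            renderer.on_model_update(self.current)
        return renderer

    def publish(self, model: Model) -> None:
        # Called after each periodic fetch of an updated model.
        self.current = model
        for renderer in self.renderers:
            renderer.on_model_update(model)
```

The design point the sketch makes concrete is that every renderer holds a full copy of the model, which is what puts the model within reach of anyone who can inspect the client.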

Figure 2. Phishing warning page.
