破解微软验证码：新颖字符分割技术揭示文本型安全挑战

4星 · 超过85%的资源需积分: 10 55 浏览量更新于2024-09-17 1 收藏 1.08MB PDF 举报

验证码识别的文献深入探讨了一种广泛应用于在线安全保护中的关键技术。文本验证码是此类方案中最常见的形式，它通常要求用户解决一个文字识别任务，目的是通过让机器难以辨认来增强系统的安全性。然而，随着技术的发展，特别是神经网络等高级算法的进步，字符识别的准确率在不断攀升，这对验证码设计提出了新的挑战。论文《一种低成本攻击微软验证码》（A Low-cost Attack on a Microsoft CAPTCHA）由Jeff Yan和Ahmad Salah El Ahmad两位学者自纽卡斯尔大学计算机科学系撰写。他们研究的重点是针对微软自2002年起在Hotmail、MSN和Windows Live等服务中广泛应用的文本验证码进行破解。微软的验证码设计初衷是为了抵抗基于分割的字符识别攻击，设计师们经过多年的调整和优化，使其在一定程度上具备了分割抵抗能力。该研究提出了一系列创新的字符分割技术，这些技术对包括微软在内的多家大型互联网公司（如Yahoo和Google）的验证码系统构成了潜在威胁。尽管这些验证码旨在提高安全性，但作者展示了即使面对精心设计的分割抵抗策略，通过他们的新方法仍能以相对较低的成本实现较高的识别成功率。论文不仅分析了现有的验证码设计弱点，还可能探讨了如何利用深度学习、图像处理等技术绕过分割策略，以及如何改进反攻击策略以提升验证码的健壮性。此外，它还可能涉及如何在实际应用中评估攻击的可行性和防御措施的有效性。这篇关于验证码识别的文献提供了一个重要的视角，展示了当前在对抗文本验证码挑战中的最新进展和攻防战，对于验证码设计者、安全研究人员和开发者来说，这是一份值得深入研究的重要参考资料。

with machine learning algorithms, achieving a success rate from

4.89% to 66.2%.

Our own early work [14] has broken a number of CAPTCHAs

(including those hosted at Captchaservice.org, a web service

specialised for CAPTCHA generation) with almost 100% success

by simply counting the number of pixels of each segmented

character, although these schemes were all resistant to the best

OCR software on the market. In contrast to other work that relied

on sophisticated computer vision or machine learning algorithms,

this study used only simple pattern recognition algorithms but

exploited fatal design errors that were discovered in each scheme.

This is one of the few work examining the robustness of

CAPTCHA from the security angle.

PWNtcha [7] is an excellent web page that aims to “demonstrate

the inefficiency of many CAPTCHA implementations”. It briefly

comments on the weaknesses of about a dozen simple

CAPTCHAs, which were claimed to be broken with a success

ranging from 49% to 100%. However, no technical detail of the

attacks was publicly available. Many more CAPTCHAs were also

commented at this site. For example, both the MSN scheme and a

Yahoo CAPTCHA that will be discussed in this paper (i.e. Yahoo

Scheme 1 in Section 6.1) were regarded by this site as “very

good” and difficult to break.

Two interesting algorithms were proposed in [19] to amplify the

skill gap between humans and computers. The algorithms could

improve systems security for text-based CAPTCHAs, but are

orthogonal to this paper. (In this paper, we do not discuss other

types of CAPTCHAs such as image-based ones. For those who

are interested, an overview of image-based CAPTCHAs can be

found in [19].)

Usability and robustness are two fundamental issues with

CAPTCHAs, and they often interconnect with each other. In [21],

we examined usability issues that should be considered and

addressed in the design of CAPTCHAs, and discussed subtle

implications some of the issues can have on robustness.

One last note: a survey on CAPTCHAs research (including the

design of most early notable schemes) can be found in [ 13], and

the limitations of defending against bots with CAPTCHAs

(including protocol-level attacks) were discussed in [ 15].

3. THE MSN SCHEME

Fig 1 shows some sample challenges generated by the MSN

CAPTCHA scheme. We have no access to the codebase of the

MSN scheme, so we collected from Microsoft’s website 100

random samples that were generated in real time online at [16].

By studying [4, 5] and the samples we collected, we observed that

the MSN scheme (as deployed) has the following characteristics.

Fig 1. The MSN CAPTCHA: 4 sample challenges.

• Eight characters are used in each challenge;

• Only upper case letters and digits are used.

• Foreground (i.e. challenge text) is dark blue and background

light gray.

• Warping (both local and global) is used for character

distortion.

Local warp produces “small ripples, waves and elastic

deformations along the pixels of the character”, and it foils

“feature-based algorithms which may use character thickness

or serif features to detect and recognise characters” [6].

Characters in the first and second rows of Table 1 are largely

distorted by local warping.

Global warp generates character-level, elastic deformations

to foil template matching algorithms for character detection

and recognition. Characters in the third and fourth rows of

Table 1 are largely distorted by global warping.

• The following random arcs of different thicknesses are used

as the main anti-segmentation measure.

o Thick foreground arcs: These arcs are of foreground

color. Their thickness can be the same as the thick

portions of characters. They do not directly intersect

with any characters, so they are also called “non-

intersecting arcs”.

o Thin foreground arcs: These arcs are of foreground

color. Although they are typically not as thick as the

above type of arcs, their thickness can be the same as

the thin portions of characters. They intersect with thick

arcs, characters or both, and thus also called

“intersecting thin arcs”.

o Thin background arcs: These arcs are thin and of

background color. They cut through characters and

remove some character content (pixels).

Both local and global warping is commonly used for distortion in

text-based CAPTCHAs. Many schemes use background textures

and meshes in foreground and background colors as clutter to

increase robustness. However, random arcs of different

thicknesses are used as clutter in the MSN scheme. The rationale

was as follows. These arcs are themselves good candidates for

false characters. The mix of random arcs and characters would

confuse state of the art segmentation methods, providing strong

segmentation resistance [5].

4. A SEGMENTATION ATTACK

We have developed a low-cost attack that can effectively and

efficiently segment challenges generated by the MSN scheme.

Specifically, our attack achieves the following:

• Identify and remove random arcs

• Identify all character locations in the right order; in other

words, divide each challenge into 8 ordered segments, each

containing a single character.

Our attack is built on observing and analysing the 100 random

samples we collected – this is a “sample set”. The effectiveness of

this attack was tested not only on the sample set, but also on a

large test set of 500 random samples – the design of the attack

used no prior knowledge about any sample in this set. This

methodology follows the common practice in the fields such as

545

剩余11页未读，继续阅读

landiri

粉丝: 0
资源: 11

破解微软验证码：新颖字符分割技术揭示文本型安全挑战

开题报告基于神经网络的车牌识别研究-开题.doc

笛卡尔割圆法验证码（英文文献）

验证码资料文献

卷积神经网络的验证码识别文献

验证码识别资料

【验证码识别】基于matlab机器视觉数字验证码识别【含Matlab源码 7469期】.mp4

【验证码识别】基于matlab机器视觉数字验证码识别【含Matlab源码 7469期】.md

【验证码识别】基于matlab CNN卷积神经网络验证码识别【含Matlab源码 098期】.md

【验证码识别】基于matlab GUI不变矩验证码识别（带面板）【含Matlab源码 095期】.mp4

【验证码识别】基于matlab GUI不变矩验证码识别（带面板）【含Matlab源码 095期】.md

最新资源