x86/x64二进制文件全尺寸解析深入分析

93 浏览量更新于2024-07-14 收藏 421KB PDF 举报

"An In-Depth Analysis of Disassembly on Full-Scale x86-x64 Binaries" 这篇论文是2016年USENIX安全 symposium的会议论文，讨论了对full-scale x86-x64二进制文件的深入分析和反汇编技术。论文的作者来自Vrije Universiteit Amsterdam和Lastline, Inc.，他们对static disassembly技术进行了深入研究，分析了其在real software中的应用，特别是对binary protection schemes的影响。知识点一：静态反汇编技术的挑战静态反汇编是指将机器代码转换为高级语言代码的过程，这是一项非常复杂的技术挑战。论文作者指出，静态反汇编技术仍然是一个unsolved problem，即使在现代计算机科学领域中，这项技术仍然很难实现。知识点二：Full-Scale x86-x64 Binaries的反汇编论文的作者对full-scale x86-x64二进制文件进行了深入分析，讨论了对其进行反汇编的技术挑战和解决方案。他们指出，对于full-scale二进制文件，静态反汇编技术仍然存在很多问题和挑战。知识点三：Binary Protection Schemes 论文的作者还讨论了binary protection schemes在real software中的应用，包括对其进行反汇编和分析的技术挑战和解决方案。他们指出，对于binary protection schemes，静态反汇编技术仍然是一个非常复杂的技术挑战。知识点四：Disassembly技术在Computer Science领域中的应用论文的作者讨论了disassembly技术在Computer Science领域中的应用，包括对二进制文件的分析、反汇编和保护。他们指出，disassembly技术在modern computer science领域中扮演着非常重要的角色。知识点五：Static Disassembly技术的未来发展论文的作者讨论了static disassembly技术的未来发展方向，包括对其进行改进和优化的技术挑战和解决方案。他们指出，static disassembly技术仍然需要进一步的研究和发展，以满足modern computer science领域中的需求。这篇论文对full-scale x86-x64二进制文件的深入分析和反汇编技术进行了深入研究，讨论了对其进行反汇编的技术挑战和解决方案，并对disassembly技术在Computer Science领域中的应用进行了分析和讨论。

USENIX Association 25th USENIX Security Symposium 585

assemblers deviate from the traditional CFG; typically

by omitting indirect edges, and sometimes by deﬁning

a global CFG rather than per-function CFGs. Therefore,

we deﬁne the Interprocedural CFG (ICFG): the union of

all function-level CFGs, connected through interprocedu-

ral call and jump edges. This allows us to abstract from

the disassemblers’ varying CFG deﬁnitions, by focusing

our measurement on the coverage of basic blocks in the

ICFG. We pay special attention to hard-to-resolve basic

blocks, such as the heads of address-taken functions and

switch/case blocks reached via jump tables.

(5) Callgraph accuracy: The correctness of the digraph

G =(V

∪ V

, E

call

)

linking the set

of call sites to

the function starts

through call edges

call

⊆ V

×V

Similarly to the CFG, disassemblers deviate from the

traditional callgraph deﬁnition by including only direct

call edges. In our experiments, we therefore measure the

completeness of this direct callgraph, considering indirect

calls and tailcalls separately in our complex case analysis.

2.3 Complex Constructs

We also study the prevalence in real-world binaries of

complex corner cases which are often cited as particularly

harmful to disassembly [5, 23, 34].

(1) Overlapping/shared basic blocks: Basic blocks may

be shared between different functions, hindering disas-

semblers from properly separating these functions.

(2) Overlapping instructions: Since x86/x64 uses

variable-length instructions without any enforced memory

alignment, jumps can target any offset within a multi-byte

instruction. This allows the same code bytes to be in-

terpreted as multiple overlapping instructions, some of

which may be missed by disassemblers.

(3) Inline data and jump tables: Data bytes may be

mixed in with instructions in a code section. Examples of

potential inline data include jump tables or local constants.

Such data can cause false positive instructions, and can

desynchronize the instruction stream if the last few data

bytes are mistakenly interpreted as the start of a multi-

byte instruction. Disassembly then continues parsing this

instruction into the actual code bytes, losing track of the

instruction stream alignment.

(4) Switches/case blocks: Switches are a challenge for

basic block discovery, because the switch case blocks are

typically indirect jump targets (encoded in jump tables).

(5) Alignment bytes: Some code (i.e.,

nop

) or data

bytes may have no semantic meaning, serving only to

align other code for optimization of memory accesses.

Alignment bytes may cause desynchronization if they do

not encode valid instructions.

(6) Multi-entry functions: Functions may have multiple

basic blocks used as entry points, which can complicate

function start recognition.

<BB

cmp ecx, edx

jl <BB

jmp <BB

<BB

mov eax,[fptr+ecx]

call eax

<BB

mov eax,[fptr+edx]

call eax

<BB

cmp ecx, edx

jl <BB

jmp <BB

<BB

mov eax,[fptr+ecx]

call eax

<BB

mov eax,[fptr+edx]

call eax

Recursive Linear

Figure 1: Disassembly methods. Arrows show disassem-

bly ﬂow. Gray blocks show missed or corrupted code.

(7) Tail calls: In this common optimization, a function

ends not with a return, but with a jump to another function.

This makes it more difﬁcult for disassemblers to detect

where the optimized function ends.

2.4 Disassembly & Testing Environment

We conducted all disassembly experiments on an Intel

Core i5 4300U machine with 8GB of RAM, running

Ubuntu 15.04 with kernel 3.19.0-47. We compiled our

gcc

and

clang

test cases on this same machine. The

Visual Studio binaries were compiled on an Intel Core i7

3770 machine with 8GB of RAM, running Windows 10.

We tested nine popular industry and research dis-

assemblers: IDA Pro v6.7, Hopper v3.11.5, Dyninst

v9.1.0 [

], BAP v0.9.9 [

], ByteWeight v0.9.9 [

], Jakstab

v0.8.4 [

], angr v4.6.1.4 [

], PSI v1.1 [

] (the suc-

cessor of BinCFI [

]), and objdump v2.22. ByteWeight

yields only function starts, while Dyninst and PSI sup-

port only ELF binaries (for Dyninst, this is due to our

Linux testing environment). Jakstab supports only x86

PE binaries. We omit angr results for x86, as angr is opti-

mized for x64. PSI is based on objdump, with added error

correction. Section 3 shows that PSI (and all linear dis-

assemblers) perform equivalently to objdump; therefore,

we group these under the name linear disassembly.

All others are recursive descent disassemblers, illus-

trated in Figure 1. These follow control ﬂow to avoid

desynchronization by inline data, and to discover com-

plex cases like overlapping instructions. In contrast, linear

disassemblers like objdump simply decode all code bytes

consecutively, and may be confused by inline data, possi-

bly causing garbled code like

in the ﬁgure. Recursive

disassemblers avoid this problem, but may miss indirect

control ﬂow targets, such as f

and f

in the ﬁgure.

剩余18页未读，继续阅读

weixin_38643212

粉丝: 3
资源: 931

x86/x64二进制文件全尺寸解析深入分析

IC-2720双频FM对讲机维修手册：介绍、注意事项

解决IDA sp-analysis failed错误及结构体管理技巧

ARM32到X86_64二进制转换技术探究

Disassembly-and-Reassembly-of-a-Dell-System-Unit:探索2000年代中期计算机组成部分的学校项目

Sonic-2-Simon-Wai-Disassembly:支持SonLVL的Sonic 2 Simon Wai拆卸

x86 Disassembly

Donkey-Kong-NES-Disassembly:拆卸金刚街机游戏减去水泥厂

X86_Disassembly.pdf

Art Of Disassembly

The Art of Disassembly

最新资源