TAJ: Effective Taint Analysis for Industrial-Scale Web Applications


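"This document is the work of Omer Tripp, Marco Pistoia, Stephen Fink, Manu Sridharan, and Omri Weisman, who published 'TAJ: Effective Taint Analysis of Web Applications' at PLDI 2009. The paper examines how taint analysis applies to Web application security, in particular how it can detect common security vulnerabilities."

Taint analysis is a form of information-flow analysis that tracks values passing from untrusted sources (such as user input) to security-sensitive operations. Because user input in a Web application routinely reaches sensitive server-side operations, taint analysis is important for detecting potential vulnerabilities such as SQL injection and cross-site scripting (XSS). Most static taint analysis tools, however, fall short in three areas: scaling to large industrial Web applications, modeling the code artifacts that Web applications depend on, and generating reports that cover multiple attack vectors.

TAJ (Taint Analysis for Java) is a static taint analysis tool designed and implemented to meet the needs of industrial-scale applications. A key property of TAJ is scalability: it can analyze applications of virtually any size, thanks to a set of techniques aimed at the complexity and efficiency problems that limit existing tools. TAJ also handles the harder-to-analyze code elements of Web applications and produces consumable reports that give insight into a range of attack vectors, which helps developers understand and fix potential security problems.

The design and implementation of TAJ may include the following key techniques:

1. **Efficient data-flow analysis**: To handle large code bases, TAJ may use an optimized data-flow framework that quickly tracks and marks suspicious data-flow paths.
2. **Context sensitivity**: TAJ is a static tool; rather than analyzing the application dynamically, it applies a custom context-sensitivity policy so that data flow is tracked more precisely across different calling contexts.
3. **Adaptive analysis strategy**: TAJ may include analysis strategies that adapt to different kinds of Web applications and attack vectors, such as analyses customized for a specific vulnerability.
4. **Balancing precision and efficiency**: TAJ aims to keep precision high while speeding up the analysis through techniques such as pruning and abstraction.
5. **Report generation**: TAJ's reports may let developers see the location, type, and likely impact of a vulnerability, which makes it easier to fix.

The TAJ work underlines how practical and necessary taint analysis is for real-world Web application security, and shows how the limitations of current tools can be overcome to obtain a stronger solution suited to industrial settings.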

[...] “sanitizers”, and S is a set of “sinks”. A source is a method whose return value is considered tainted, or untrusted. A sanitizer is a method that manipulates its input to produce taint-free output. A sink is a pair (m, P), where m is a method that performs security-sensitive computations and P contains those parameters of m that are vulnerable to attack via tainted data. TAJ statically checks that no value derived from a source is passed as an input to a sink unless it first undergoes appropriate sanitization.
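The rule structure described above can be sketched as a small data type. This is a minimal illustration under stated assumptions, not TAJ's actual representation: the `Sink` and `SecurityRule` types, the use of signature strings to name methods, and the concrete methods in the example rule are all hypothetical choices made for this sketch.

```java
import java.util.Set;

// Hypothetical types for illustration only; these are not TAJ's classes.
class SecurityRuleDemo {
    /** An example XSS-style rule; the method names are illustrative choices. */
    static SecurityRule xssRule() {
        return new SecurityRule(
            Set.of("javax.servlet.ServletRequest.getParameter"),
            Set.of("java.net.URLEncoder.encode"),
            Set.of(new Sink("java.io.PrintWriter.println", Set.of(0))));
    }

    public static void main(String[] args) {
        SecurityRule rule = xssRule();
        System.out.println(rule.isSource("javax.servlet.ServletRequest.getParameter"));
        System.out.println(rule.isVulnerableArgument("java.io.PrintWriter.println", 0));
    }
}

/** A sink (m, P): a security-sensitive method plus the indices of its vulnerable parameters. */
record Sink(String method, Set<Integer> vulnerableParams) {}

/** A security rule: sources, sanitizers, and sinks, with methods named by signature strings. */
record SecurityRule(Set<String> sources, Set<String> sanitizers, Set<Sink> sinks) {

    boolean isSource(String method) {
        return sources.contains(method);
    }

    boolean isSanitizer(String method) {
        return sanitizers.contains(method);
    }

    /** True if tainted data passed as argument paramIndex of method would violate the rule. */
    boolean isVulnerableArgument(String method, int paramIndex) {
        for (Sink sink : sinks) {
            if (sink.method().equals(method) && sink.vulnerableParams().contains(paramIndex)) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping P as a set of parameter indices lets a checker flag only the arguments that matter, so tainted data reaching a harmless parameter of a sink method is not reported.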

TAJ consists of two phases. The first phase performs pointer analysis and builds a call graph. The second phase runs a novel slicing algorithm to track tainted data.

3.1 Pointer Analysis and Call-graph Construction

The TAJ architecture supports any preliminary pointer analysis and call graph construction algorithm. The current implementation relies on a context-sensitive variant of Andersen’s analysis [1] with on-the-fly call graph construction.

TAJ employs a custom context-sensitivity policy tuned to address precision and performance issues that arise when analyzing real-world code. Most methods are analyzed with one level of object sensitivity [18; 22], in which the context of a method invocation consists of the invoked method and the object abstraction representing the receiver. The policy also includes careful treatment of collections and security-related Application Programming Interfaces (APIs). In particular:

• Java collection classes are treated with unlimited-depth (up to recursion) object-sensitivity. This means that all internal objects of a collection are cloned for each collection instance. As a result, the contents of Java collections from different allocation sites are fully disambiguated, eliminating a major source of pointer-analysis pollution.

• The pointer analysis adds one level of call-string context to calls to library factory methods. These methods tend to pollute pointer-flow precision if handled without context sensitivity, because all the objects created by a factory method share the same allocation site.

• Taint-specific APIs, such as sources and sinks, are also analyzed with one level of call-string context. This is necessary due to the special role these APIs play in taint propagation. In the example given in Figure 1, this context allows TAJ to disambiguate the two calls to source method getParameter at lines 13 and 14, even though they are performed on the same receiver object.
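The policy in the three bullets above amounts to choosing a context abstraction per callee. The following is a rough sketch of such a selector, not TAJ's implementation: the `ContextKind` and `ContextPolicy` names, the string-based method lookup, the example class and method lists, and the precedence between the rules are all assumptions made for illustration.

```java
import java.util.Set;

// Hypothetical sketch of a context-selection policy; the names are illustrative, not TAJ's API.
class ContextPolicyDemo {
    static ContextPolicy examplePolicy() {
        return new ContextPolicy(
            Set.of("java.util.HashMap", "java.util.ArrayList"),
            Set.of("javax.xml.parsers.DocumentBuilderFactory.newInstance"),
            Set.of("javax.servlet.ServletRequest.getParameter"));
    }

    public static void main(String[] args) {
        ContextPolicy policy = examplePolicy();
        System.out.println(policy.select("java.util.HashMap", "put"));
        System.out.println(policy.select("javax.servlet.ServletRequest", "getParameter"));
        System.out.println(policy.select("com.example.Foo", "bar"));
    }
}

enum ContextKind {
    UNLIMITED_OBJECT_SENSITIVITY, // collection classes: internals cloned per collection instance
    ONE_CALL_STRING,              // factory methods and taint-specific APIs: one call site of context
    ONE_OBJECT_SENSITIVITY        // default: invoked method plus receiver abstraction
}

class ContextPolicy {
    private final Set<String> collectionClasses;
    private final Set<String> factoryMethods;
    private final Set<String> taintApis;

    ContextPolicy(Set<String> collectionClasses, Set<String> factoryMethods, Set<String> taintApis) {
        this.collectionClasses = collectionClasses;
        this.factoryMethods = factoryMethods;
        this.taintApis = taintApis;
    }

    /** Chooses the context abstraction for a call to declaringClass.method. */
    ContextKind select(String declaringClass, String method) {
        String qualified = declaringClass + "." + method;
        // Assumed precedence: call-string rules are checked before the collection rule.
        if (taintApis.contains(qualified) || factoryMethods.contains(qualified)) {
            return ContextKind.ONE_CALL_STRING;
        }
        if (collectionClasses.contains(declaringClass)) {
            return ContextKind.UNLIMITED_OBJECT_SENSITIVITY;
        }
        return ContextKind.ONE_OBJECT_SENSITIVITY;
    }
}
```

With one level of call-string context, two calls to the same source method on the same receiver are distinguished by their call sites, which is the effect the last bullet describes for getParameter.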

As for other dimensions of precision, the pointer analysis of TAJ is field-sensitive [29]. Furthermore, it relies on a Static Single Assignment (SSA) register-transfer language representation of each method [6], which gives a measure of flow sensitivity for points-to sets of local variables [14].

3.2 Hybrid Thin Slicing

Using the preliminary pointer analysis and call graph, the second phase of TAJ tracks data flow from tainted sources using hybrid thin slicing, a novel thin-slicing algorithm [33]. Hybrid thin slicing combines flow-insensitive reasoning about flow through the heap with flow- and context-sensitive tracking of flow through local variables.

Thin slicing [33] is a good basis for taint analysis since a thin slice typically captures the statements most relevant to a tainted flow. A forward thin slice from a statement t consists of those statements that are data-dependent on t [16], excluding base-pointer dependencies: for a store statement x.f=y, dependencies due to uses of the base pointer x are ignored; loads are handled similarly. Thin slices are typically much smaller and more understandable than program slices. Note that in [33], the term “thin slice” refers to a backward thin slice, in which data dependencies are considered in the opposite direction, while here we use this term to mean a forward thin slice.

Footnote (on sources): Some methods, such as RandomAccessFile.readFully in package java.io, receive parameters by reference and taint their internal state. TAJ also supports the specification of such methods as sources.
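The base-pointer rule for forward thin slices is easier to see on a concrete fragment. The example below is an illustrative program written for this purpose (the `Box` and `ThinSliceExample` names are hypothetical); the comments mark which statements would belong to the forward thin slice from the seed t under the definition above.

```java
// Illustrative program; "in" and "out" refer to membership in the forward thin slice from t.
class ThinSliceExample {

    /** Value flow through the heap: the stored and loaded value derives from t. */
    static String valueFlow(String input) {
        String y = input;   // t (seed)
        Box x = new Box();  // out: does not depend on t
        x.f = y;            // in: stores the value produced by t
        Box z = x;          // out: copies only the base pointer
        String v = z.f;     // in: loads the value written by the store above
        return v;           // in
    }

    /** Base-pointer use only: a traditional forward slice from t would include these statements. */
    static String basePointerOnly(Box input) {
        Box b = input;      // t (seed)
        b.f = "constant";   // out: b is only the base pointer of the store
        String w = b.f;     // out: b is only the base pointer; the loaded value is the constant
        return w;           // out
    }

    public static void main(String[] args) {
        System.out.println(valueFlow("user input"));
        System.out.println(basePointerOnly(new Box()));
    }
}

class Box {
    String f;
}
```

Read as taint tracking, the first method propagates the seed value to its result, while the second returns a constant even though the seed object is used to reach the field.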

Thin slices do not include control dependencies, and hence TAJ does not track the corresponding indirect information flow. Experience shows that attacks based on control dependence are rare and complex, and thus less important than direct vulnerabilities.

Hybrid thin slicing combines aspects of the previously proposed context-sensitive (CS) and context-insensitive (CI) thin slicing algorithms [33], achieving a better tradeoff between scalability and precision for taint analysis. Like CS thin slicing, hybrid thin slicing tracks flow through local variables with flow and context sensitivity. However, unlike CS thin slicing, the hybrid technique does not track heap data dependencies via additional method parameters and return values, as this treatment is a scalability bottleneck [33]. This handling of heap dependencies by CS thin slicing is also unsound for multi-threaded programs since it is partially flow-sensitive, and many of our target Web applications are multi-threaded. Instead, hybrid thin slicing tracks heap data dependencies via direct edges from stores to loads. Such edges are added based on the preliminary pointer analysis, as in CI thin slicing. As we shall show in Section 7, the hybrid approach yields better scalability than CS thin slicing and better precision than CI thin slicing (with better performance than CI in some cases).

Hybrid thin slicing performs a demand-driven traversal over a special System Dependence Graph (SDG) [16] called the Hybrid SDG (HSDG). Nodes in an HSDG correspond to load and store statements in the program, as well as call statements representing source and sink methods.

An HSDG has two types of edges representing data dependence: “direct edges” and “summary edges”. A direct edge connects a store to a load and represents a data dependence computed by a preliminary pointer analysis (as in CI thin slicing [33]). A summary edge can connect s to t if t is transitively data-dependent on s purely via flow through local variables; flow through the heap is excluded. Summary edges are obtained on demand by computing context-sensitive reachability over a no-heap SDG, that is, an SDG that elides all control- and data-dependence edges reflecting flow through heap locations. Note that the no-heap SDG includes no successor edges for sanitizer return and sink call statements, since we need not track flow beyond these statements.

TAJ computes the successors of a statement x in the HSDG on demand, as follows:

• If x = st is a store statement, then precomputed points-to information is used to connect st to all load statements l such that the base pointers of st and l are may-aliased.

• Otherwise, a context-sensitive slice is computed from program point x on the no-heap SDG using the Reps-Horwitz-Sagiv (RHS) tabulation algorithm [28]. All the statements in the slice corresponding to store instructions and sink invocations are registered as the successors of x.

Figure 2 shows an example, which displays the slice computed on the no-heap SDG corresponding to a load-to-store summary edge in the HSDG.

To find tainted flows, we compute reachability in the HSDG from each source-call statement s, adding the necessary direct and summary edges on demand. The nodes reachable from s represent the load, store, and sink statements directly data-dependent on s (ignoring base-pointer data dependencies). Our final output reconstructs thin slices from s to sensitive sinks via the HSDG and relevant no-heap SDGs.
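The on-demand traversal can be sketched as a worklist search over a toy HSDG. This is a simplified model under stated assumptions, not TAJ's code: statements are plain strings, and the two lookup maps are hypothetical stand-ins for results that would come from the preliminary pointer analysis (direct edges) and from RHS tabulation over the no-heap SDG (summary edges).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model; all names are hypothetical and the maps replace the real analyses.
class HybridSliceSketch {
    private final Set<String> stores;
    private final Map<String, Set<String>> aliasedLoads; // store -> loads whose base pointer may alias
    private final Map<String, Set<String>> noHeapSlice;  // statement -> stores and sinks reached via locals

    HybridSliceSketch(Set<String> stores,
                      Map<String, Set<String>> aliasedLoads,
                      Map<String, Set<String>> noHeapSlice) {
        this.stores = stores;
        this.aliasedLoads = aliasedLoads;
        this.noHeapSlice = noHeapSlice;
    }

    /** Successors of x in the HSDG, looked up only when x is visited. */
    Set<String> successors(String x) {
        if (stores.contains(x)) {
            return aliasedLoads.getOrDefault(x, Set.of()); // direct edges: store to may-aliased loads
        }
        return noHeapSlice.getOrDefault(x, Set.of());      // summary edges: flow through locals only
    }

    /** All HSDG nodes reachable from a source-call statement, in visiting order. */
    Set<String> reachableFrom(String source) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(source);
        while (!worklist.isEmpty()) {
            String x = worklist.remove();
            if (visited.add(x)) {
                worklist.addAll(successors(x));
            }
        }
        return visited;
    }

    /** A small example graph: source -> store1 -> load1 -> sink; store2 -> load2 is unrelated. */
    static HybridSliceSketch example() {
        return new HybridSliceSketch(
            Set.of("store1", "store2"),
            Map.of("store1", Set.of("load1"), "store2", Set.of("load2")),
            Map.of("source", Set.of("store1"), "load1", Set.of("sink")));
    }

    public static void main(String[] args) {
        System.out.println(example().reachableFrom("source"));
    }
}
```

In this model a sink has no entry in either map, which mirrors the rule that the no-heap SDG has no successor edges for sink call statements.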
