Grammar-based Whitebox Fuzzing
Patrice Godefroid
Microsoft Research
Redmond, WA, USA
pg@microsoft.com
Adam Kie
˙
zun
Massachusetts Institute of
Technology
Computer Science and Artificial
Intelligence Laboratory
Cambridge, MA, USA
akiezun@mit.edu
Michael Y. L evin
Microsoft Center for Software
Excellence
Redmond, WA, USA
mlevin@microsoft.com
Abstract
Whitebox fuzzing i s a form of automatic dynamic test gen-
eration, based on symbolic execution and constraint solving,
designed for security testing of large applications. Unfortu-
nately, the current effectiveness of whitebox fuzzing is lim-
ited when testing applications with highly-structured inputs,
such as compilers and interpreters. These applications pro-
cess their inputs in stages, such as lexing, parsing and evalu-
ation. Due to the enormous number of control paths in e arly
processing stages, whitebox fuzzing rarely reaches parts of
the application beyond those first stages.
In this paper, we study how to enhance whitebox fuzzing
of comp lex structured-input applications with a grammar-
based specification of their valid inputs. We present a novel
dynamic test g eneration algorithm where symbolic execu-
tion d irectly generates grammar-based constraints whose
satisfiability is checked using a custom grammar-based con-
straint solver. We have impl emented this algorithm and eval-
uated it on a large security-critical application, the JavaScript
interpreter of Internet Explorer 7 (IE7). Results of our ex-
periments show that grammar-based whitebox fuzzing ex-
plores deeper program paths and avoids dead-ends due to
non-parsable inputs. Compared to regular whitebox fuzzing,
grammar-based whitebox fuzzing increased coverage of the
code generation module of the IE7 JavaScript interpreter
from 53% to 81% while us ing three times fewer tests.
Categories and Subject Descriptors D.2.4 [Software Engi-
neering]: Software/Program Ve rification; D.2.5 [Software En-
gineering]: Testing and Debugging; F.3.1 [Logics and Mean-
ings of Programs]: Specifying and Verifying and Reasoning
about Programs
General Terms Veri fication, Algorithms, Reliability
Keywords Software Testing, Automatic Test Generation,
Grammars, Program Verification
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PLDI’08,
June 7–13, 2008, Tucson, Arizona, USA.
Copyright
c
2008 ACM 978-1-59593-860-2/08/06. . . $5.00
1. Introduction
Blackbox fuzzing is a form of testing, heavily used for finding
security vulnerabilities in software. It simply consists in ran-
domly modifying well-formed inputs and testing the result-
ing variants [3, 12]. Blackbox fuzzing sometimes uses gram-
mars to generate the well-formed inputs, as well as to encode
application-specific knowledge and test heuristics for guid-
ing the generation of input variants [1, 37].
A recently propos ed alternative, whitebox fuzzing [ 16],
combines fuzz testing with dynamic test generation [6, 14].
Whitebox fuzzing executes the program under test with an
initial, well-formed input, both concretely and symbolically.
During the execution of conditional statements, symbolic
execution creates constraints on program inputs. Those con-
straints capture how the program uses its inputs, and satis-
fying assignments for the negation of each constraint define
new inputs that exercise different control paths. Whitebox
fuzzing repeats this process for the newly created inputs,
with the goal of exercising many different control paths of
the program under test and finding bugs as fast as possi-
ble us ing various search heuristics. In practice, the search is
usually incomplete because the number of f easible control
paths may be astronomical (even infinite) and because the
precision of symbolic execution, constraint generation and
solving is inherently limited. Nevertheless, whitebox fuzzing
has been shown to be very effective in finding new s ecurity
vulnerabilities in several applications.
Unfortunately, the current effectiveness of whitebox
fuzzing is limited when tes ting applications with highly-
structured inputs. Examples of such applications are com-
pilers and interpreters. These applications process their in-
puts in stages, such as lexing, parsing and evaluation. Due
to the enormous number of control paths in early process-
ing stages, whitebox fuzzing rarely reaches parts of the
application beyo nd these first stages. For instance, there
are many possible sequences of blank-spaces/tabs/carriage-
returns/etc. separating tokens in most structured languages,
each corresponding to a different control path in the lexer. In
addition to path explosion, symbolic execution itself may be
defeated already in the first processing stages. For instance,
lexers often detect language keywords by comparing their
pre-computed, hard-coded hash values with the hash values
of strings read from the input; this effectively prevents sym-
bolic execution and constraint solving from ever generating
input strings that match those keywords since hash functions
cannot be inversed (i.e., given a constraint x == hash(y)
and a value f or x, one cannot compute a value for y that sat-
isfies this constraint).