and inject an event handler that is attached to the link tag (e.g.,
' onmouseover='alert(1)). To encode single quotes to
the HTML entity &#039;, the parameter ENT_QUOTES must
be passed to the function htmlentities().
$page = htmlentities($_GET['page']);
echo "<a href='$page'>click</a>";
Listing 2: Insufficient sanitization with htmlentities().
In our example, however, the application would still be
vulnerable. Instead of breaking the markup, an attacker can abuse
the diversity of web browsers and inject a JavaScript protocol
handler into the link (e.g., javascript:alert(1)). This
injection does not need any of the meta characters that are
encoded by htmlentities().
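The two injection paths can be contrasted in a minimal sketch. The payload strings are taken from the text above; the helper isSafeLinkTarget() and its http/https allowlist are hypothetical and show only one possible way to restrict the URL scheme. The flags are passed explicitly because the default flags of htmlentities() depend on the PHP version (since PHP 8.1 the default includes ENT_QUOTES).

```php
<?php
// Payload that breaks out of a single-quoted attribute.
$payload = "' onmouseover='alert(1)";

// ENT_COMPAT encodes double quotes only, so the payload survives unchanged.
$compat = htmlentities($payload, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES also encodes single quotes to &#039;.
$quotes = htmlentities($payload, ENT_QUOTES, 'UTF-8');

// The protocol handler contains no character that htmlentities() encodes.
$js = 'javascript:alert(1)';
$encoded = htmlentities($js, ENT_QUOTES, 'UTF-8');

// Hypothetical helper (not from the paper): accept only http/https targets.
function isSafeLinkTarget(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump($compat === $payload);   // payload untouched with ENT_COMPAT
var_dump($encoded === $js);       // protocol handler untouched
var_dump(isSafeLinkTarget($js));  // rejected by the scheme allowlist
```

Note that this allowlist also rejects relative URLs, which a real application may need to permit.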
Note that there are several other scopes that need to be
considered when using sanitization. For example, when user
input is used within style and script tags, or within event
handler attributes, additional sanitization is required. Previous
work failed to take the different scopes and their intrinsic
behaviors into account.
B. Intricacies of the PHP language
PHP is the fastest growing and most popular scripting
language for web applications. It is a highly dynamic language
with many complicated semantics [3] that are frequently used
by modern web applications [12]. In this section, we introduce
the most important language features our tool has to model
precisely in order to correctly identify the flow of tainted data
into sensitive sinks. In particular, the flow of tainted strings is
of interest for taint-style vulnerabilities.
1) Dynamic and Weak Typing: PHP is a dynamically typed
language and does not require an explicit declaration of
variables. The variable type is inferred on the first assignment at
runtime. Additionally, PHP is a weakly typed language and its
variables are not bound to a specific data type. Thus, data types
can be mixed with other data types at runtime. In Listing 3, the
string 'test' is evaluated to 0 to fit the mathematical operation
and added to 1. The integer result is stored in the variable
$var2, whose previous data type was string.
$var1 = 1; $var2 = 'test';
$var2 = $var1 + $var2; // 1
Listing 3: Addition of a string and an integer in PHP.
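The type changes described above can be observed with gettype(). The following is a minimal sketch with illustrative variable names that are not part of the original listing. It uses a numeric string on purpose, because the handling of non-numeric strings in arithmetic depends on the PHP version.

```php
<?php
$value = '41';                 // starts out as a string
$before = gettype($value);     // "string"

$value = $value + 1;           // numeric string is coerced to an integer
$after = gettype($value);      // "integer"

$text = $value . ' apples';    // integer is coerced back to a string

// Loose comparison converts types, strict comparison does not.
$loose = ('42' == 42);
$strict = ('42' === 42);

var_dump($before, $after, $value, $text, $loose, $strict);
```

For a static analysis this means that the type of a variable has to be tracked per assignment and cannot be fixed once per variable.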
2) Variable Variables: Variables are usually introduced
with the dollar character followed by an alphanumeric,
case-sensitive name. However, in PHP the name can also be an
expression, for example retrieved from another variable or the
return value of a function call that is only known at runtime
(see Listing 4). This makes it extremely difficult to analyze
the PHP language statically.
$name = "x"; $x = "test";
echo $$name; // test
$y = ${getVar()};
Listing 4: Variable variables in PHP.
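A runnable variant of this idea is sketched below. The function getVar() is given a hypothetical body here, since the listing above does not define it, and the other names are illustrative.

```php
<?php
// Hypothetical implementation; the name it returns is only known at runtime.
function getVar(): string
{
    return 'secret';
}

$secret = 'tainted';

// The variable name is assembled from parts before it is dereferenced.
$name = 'sec' . 'ret';
$viaVar = $$name;

// The variable name is the return value of a function call.
$viaCall = ${getVar()};

// A variable can also be created under a computed name.
${'dyn_' . 1} = 'created at runtime';

var_dump($viaVar, $viaCall, $dyn_1);
```

In each case, an analysis has to resolve the name expression before it can decide which variable is read or written.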
3) Dynamic Arrays: Arrays are hash tables that map
numbers or strings (referred to as keys) to values. The key name
can be omitted when initializing an array and generated at
runtime (see Listing 5). Furthermore, keys and values can be
dynamic, as well as the array name itself. When performing
a static analysis, it is a challenge to precisely model such a
dynamic array structure and the dynamic access to it.
$var = 6;
$arr = array('a', "4" => $var, 'foo' => 'c', 'd');
$arr[] = 'e';
// Array ([0] => a [4] => 6 [foo] => c [5] => d [6] => e)
print $arr[$var]; // e
Listing 5: Dynamically generated key names in an array.
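The key generation in Listing 5 can be made explicit with array_keys(). The sketch below reuses the array from the listing; the variables $keys, $key, and $viaKey are illustrative additions.

```php
<?php
$var = 6;
$arr = ['a', '4' => $var, 'foo' => 'c', 'd'];
$arr[] = 'e';

// The numeric string key "4" is normalized to the integer 4, and omitted
// keys continue counting from the largest integer key used so far.
$keys = array_keys($arr);

// The key used for an access can itself be computed at runtime.
$key = 'f' . 'oo';
$viaKey = $arr[$key];

var_dump($keys, $viaKey, $arr[$var]);
```

A precise array model therefore has to account for both the implicit key counter and the normalization of numeric string keys.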
4) Dynamic Constants: In PHP, it is possible to define
constant scalar values as in other programming languages like
C. However, the constant name can be dynamically defined by
the built-in function define() and dynamically accessed by
the built-in function constant(). Although a constant may
not change once it is defined, it is possible to define constants
conditionally in the program flow or under a name that is
generated dynamically from user input.
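As a minimal sketch of these two built-in functions, the following example defines a constant conditionally under a computed name and reads it back. The constant name APP_DEBUG and the variable names are illustrative assumptions.

```php
<?php
// The constant name is assembled at runtime.
$suffix = 'DEBUG';
$constName = 'APP_' . $suffix;

// Conditional definition: it only happens if no earlier code defined it.
if (!defined($constName)) {
    define($constName, true);
}

// Dynamic access by name, and ordinary access by identifier.
$viaName = constant($constName);
$direct = APP_DEBUG;

var_dump($viaName, $direct);
```

An analysis that only collects define() calls with literal names would miss this definition.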
5) Dynamic Functions: Several functions with the same
name can be defined conditionally by the developer. Thus, a
totally different function may be called depending on the
program flow. It is also possible to define a function B() within
another function A(); B() then only becomes available once A()
has been executed. Further, the built-in functions func_get_arg()
and func_get_args() allow the arguments of the function call
to be fetched dynamically by index.
1 $name = 'step' .(int)$_GET['id'];
2 $name();
3 array_walk($arr = array(1), $name);
Listing 6: Dynamically built and executed function name.
Listing 6 illustrates two different possibilities to call a
function dynamically (Reflection). The function name is built
dynamically in line 1 and is only known at runtime. It is
called in line 2 by appending parentheses to the variable $name
and used in line 3 as a callback function. The built-in function
create_function() dynamically creates function code.
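The features of this subsection can be combined in one minimal sketch. All function names (render, outer, inner, sumAll, step1) are hypothetical, and array_map() is used as the callback consumer instead of array_walk(). create_function() is left out because it was removed in PHP 8.0.

```php
<?php
// Conditional definition: which body render() gets depends on program flow.
$mode = 'loud';
if ($mode === 'loud') {
    function render($s) { return strtoupper($s); }
} else {
    function render($s) { return strtolower($s); }
}

// Nested definition: inner() does not exist until outer() has been called.
function outer() { function inner() { return 'inner'; } }
$innerBefore = function_exists('inner');
outer();
$innerAfter = function_exists('inner');

// Arguments fetched dynamically instead of through declared parameters.
function sumAll() { return array_sum(func_get_args()); }

function step1() { return 'first step'; }

// The function name is built at runtime and then called through the variable.
$name = 'step' . (int) '1';
$called = $name();

// The same mechanism applies to callbacks passed by name.
$mapped = array_map('render', ['a', 'b']);

var_dump($innerBefore, $innerAfter, sumAll(1, 2, 3), $called, $mapped);
```

To resolve such calls, an analysis has to reconstruct the possible values of the name string and know which definitions are active at the call site.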
6) Dynamic Code: The eval operator and the built-in
function assert() allow PHP code that is passed as a string
in the first argument to be evaluated directly. Other functions such
as preg_replace() allow the execution of dynamic PHP
code when used with certain modifiers. Dynamically generated
code is very challenging to analyze if the executed PHP code is
only known at runtime and cannot be reconstructed statically.
Furthermore, it introduces critical security vulnerabilities.
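A minimal sketch of dynamically evaluated code is given below. The code string is assembled from constant parts only, so it could still be reconstructed statically; the variable names are illustrative.

```php
<?php
// The operator is chosen at runtime and spliced into the code string.
$op = '+';
$code = 'return 2 ' . $op . ' 3;';

// eval() executes the string as PHP code and yields its return value.
$result = eval($code);

var_dump($code, $result);
```

If $op came from user input instead, the evaluated code would no longer be known statically and an attacker could execute arbitrary PHP code.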
7) Dynamic Includes: The code of large PHP projects is
often split into several files and directories. At runtime, the
code can be merged and executed conditionally. The PHP
operator include opens a specified file, evaluates its PHP
code, and returns to the code after the include operator. It
can be used as an expression within any other expression.
Furthermore, the file name of an inclusion can be built dynamically,
which implies that it is challenging to reconstruct it statically
in complex applications. During static analysis it is crucial to
resolve all file inclusions to analyze the PHP code correctly.
Additionally, tainted data within the file name leads to a File
Inclusion vulnerability.
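The following minimal sketch shows a dynamically built include path and the use of include as an expression. The file name demo_greeting.php and the temporary directory are assumptions made only to keep the example self-contained.

```php
<?php
// Create a small file to include, so that the example is self-contained.
$module = 'greeting';
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'demo_' . $module . '.php';
file_put_contents($path, "<?php\nreturn 'hello from ' . basename(__FILE__);\n");

// include evaluates the file and, used as an expression, yields the value
// that the included file returns.
$result = include $path;

unlink($path); // clean up the temporary file

var_dump($result);
```

Because $path is assembled from several parts, an analysis has to reconstruct the string before it can locate and analyze the included code.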