This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
YANG et al.: EFFECTIVE QAR SCHEME AGAINST APT 3
B. Game-Theoretic Approach
Game theory has extensive applications in the cybersecurity
domain [17]. The game-theoretic approach to APT defense
dates to the seminal work by van Dijk et al. [18]. In this arti-
cle, protecting a system from APT is reduced to what is called
the timing game, in which the defender chooses proper time
points of investing defense resources to minimize the APT
impact at a lower cost. Later, the timing game was adapted to
a variety of realistic situations [19], [20]. In recent years, the
timing game has been successfully applied to the protection
of cloud storage systems against APT [21]–[23]. On the other
hand, Hu et al. [24] investigated an APT response problem
through the game-theoretic approach. In this article, based on
a differential equation governing the evolution of the com-
promise fraction of the system, the attacker’s benefit, as well
as the APT impact are estimated. On this basis, the original
problem boils down to a continuous-kernel game under the
Nash equilibrium solution concept, which is solved analyti-
cally and numerically. In all of the above work, the impact of
lateral movement is neglected at all.
Protecting interdependent systems from APT is of practi-
cal importance. Inspired by [24] and [25] handled an APT
response problem for interdependent systems in the frame-
work of game theory. In this article, based on a node-level
epidemic model capturing the effect of the response scheme on
the network’s expected state, the attacker’s expected benefit, as
well as the expected APT impact are estimated. On this basis,
the original problem is modeled as a continuous-kernel game
problem under the Nash equilibrium solution concept, which
is solved with a heuristic algorithm. Later, Yang et al. [26]
extended this article to the situation where the attack scheme
and the defense scheme are both varying over time, leading
to a differential game-theoretic model. In the two papers, the
impact of lateral movement is taken into full account.
The game-theoretic modeling of an APT defense problem
builds on the assumption that the defender knows the statisti-
cal distributions of some aspects of APT. For the purpose of
approximating the distributions, the defender needs to collect
massive real APT data, which is a challenging task.
C. Optimal Control Approach
Optimal control theory, which deals with the problem of
finding a control law for a given dynamical system such that
a certain optimality criterion is achieved [27], has widespread
applications in the cybersecurity domain [28]–[31]. In the
optimal control modeling of an APT defense problem, the
defender needs only to know the average values of only a few
parameters associated with APT. In practice, estimating the
values requires much less real APT data than fitting the above-
mentioned distributions does. As a matter of fact, these average
values can be estimated through APT attack-defense manoeu-
vres [32]. Therefore, the optimal control approach to APT
defense is more practical than its game-theoretic counterpart,
at the cost of achieving suboptimal solutions.
Li et al. [33] opened up the optimal control approach to APT
defense. In this article, based on a node-level epidemic model
capturing the effect of the recovery scheme on the network’s
expected state, an APT response problem is modeled as an
optimal control problem, which is solved theoretically. One
major weakness of this article is that it neglects the indispens-
able quarantine manipulations in the APT response process. In
practice, to prevent compromised systems from compromising
other systems through lateral movement, all the compromised
systems must be timely disconnected from the network.
In this article, we consider quarantine manipulations as
well as recovery manipulations in the APT response pro-
cess. Inspired by [34]–[36], we propose a node-level epidemic
model characterizing the effect of the QAR scheme on the
network’s expected state. On this basis, we reduce the QAR
problem to an optimal control problem. By comparison with
a set of heuristic QAR schemes, we find that the QAR
scheme obtained by solving the optimality system is satis-
factory. Therefore, our model and results are more practical
than those presented in [33].
Reference [26] is related to this article. In this article, it is
assumed that the defender knows a set of functions related
to the cost for launching an APT. In practice, these func-
tions are usually unknown to the defender. In this case, the
repair scheme proposed in this article is infeasible. In the
optimal control framework adopted in this article, this demand
is removed. As a result, our quarantine and repair scheme
is more practical than the above-mentioned repair scheme.
Additionally, Yang et al. [26] neglected the necessary quar-
antine manupulations in APT response at all. In contrast, this
article takes these manipulations into full account.
III. M
ODELING OF THE QAR PROBLEM
This section is dedicated to the modeling of the QAR
problem. First, we introduce a set of terms and notations.
Second, we establish an epidemic model capturing the effect
of the QAR scheme on the expected state of the network. Next,
we estimate the expected APT impact. Finally, we model the
QAR problem as an optimal control problem. See Table I for
a complete list of the major notations introduced in this article
and their meanings.
A. Terms and Notations
Consider an organization. Let the graph G = (V, E) denote
the business relationship between the organization’s employ-
ees, where (a) V ={v
1
, v
2
,...,v
N
} stands for the set of all
the computer systems (nodes) used by the employees and (b)
each edge {v
i
, v
j
}∈E stands for that the employee owning the
computer system v
i
has business relation with the employee
owning the computer system v
j
.LetA = (a
ij
)
N×N
denote
the adjacency matrix of G, i.e., a
ij
= 1 or 0 according as
{v
i
, v
j
}∈E or not.
Suppose the organization’s cybersecurity team (the
defender, for short) has found indicators of APT at a certain
time denoted t = 0 and will respond in the given time
horizon [0, T] by performing a series of QAR manipulations.
At any time in the response period, each and every node is
in one of three possible states: 1) uncompromised; 2) com-
promised; and 3) quarantined, where all uncompromised
and compromised nodes are on the network, all quarantined