2.2. Probabilistic Model Checking of MDPs
Probabilistic model checking is an automated technique for constructing probabilistic mod-
els such as MDPs and then analysing them against behavioural specifications expressed
in temporal logic. It can be used either to verify that a specification is always satisfied,
regardless of any adversarial behaviour, or to synthesise a strategy under whose control the
system’s behaviour can be guaranteed to satisfy a specification.
These ideas are formalised below for the PRISM logic. We first require the following notation. Satisfaction of a path formula $\psi$ can be represented by a random variable $X_\psi : \mathit{IPaths}_M \to \mathbb{R}$, where $X_\psi(\pi) = 1$ if path $\pi$ satisfies $\psi$ and $X_\psi(\pi) = 0$ otherwise. For a reward structure $r$ and formula $\rho$, the random variable $X_{r,\rho} : \mathit{IPaths}_M \to \mathbb{R}$ is such that $X_{r,\rho}(\pi)$ equals the state reward or accumulated reward corresponding to $r$ and $\rho$ for path $\pi$.
Verifying that an MDP $M$ satisfies a formula $\Phi$, denoted $M \models \Phi$, is defined as follows.

Definition 5 (Verification problem for MDPs). The verification problem is: given an MDP $M$ and a formula $\Phi$, verify whether $M \models \Phi$, defined as:
\[
\begin{array}{rcl}
M \models \mathtt{P}_{\bowtie p}[\,\psi\,] & \Leftrightarrow & \forall \sigma \in \Sigma_M \,.\; \mathbb{E}^{\sigma}_M(X_\psi) \bowtie p \\[2pt]
M \models \mathtt{R}^{r}_{\bowtie q}[\,\rho\,] & \Leftrightarrow & \forall \sigma \in \Sigma_M \,.\; \mathbb{E}^{\sigma}_M(X_{r,\rho}) \bowtie q \, .
\end{array}
\]
In practice, we often solve a numerical verification problem: given an MDP $M$ and a formula $\mathtt{P}_{opt=?}[\,\psi\,]$ or $\mathtt{R}^{r}_{opt=?}[\,\rho\,]$, where $opt \in \{\min, \max\}$, compute $\mathbb{E}^{opt}_M(X)$, where $X = X_\psi$ or $X = X_{r,\rho}$, respectively, and:
\[
\mathbb{E}^{\min}_M(X) \,\stackrel{\text{def}}{=}\, \inf_{\sigma \in \Sigma_M} \mathbb{E}^{\sigma}_M(X)
\quad \text{and} \quad
\mathbb{E}^{\max}_M(X) \,\stackrel{\text{def}}{=}\, \sup_{\sigma \in \Sigma_M} \mathbb{E}^{\sigma}_M(X) \, .
\]
Closely related is the strategy synthesis problem.
Definition 6 (Strategy synthesis problem for MDPs). The strategy synthesis problem is: given an MDP $M$ and a formula $\Phi$ of the form $\mathtt{P}_{\bowtie p}[\,\psi\,]$ or $\mathtt{R}^{r}_{\bowtie q}[\,\rho\,]$, find a strategy $\sigma \in \Sigma_M$ such that $\Phi$ is satisfied in $M$ under $\sigma$, i.e., such that $\mathbb{E}^{\sigma}_M(X_\psi) \bowtie p$ or $\mathbb{E}^{\sigma}_M(X_{r,\rho}) \bowtie q$, respectively.
The numerical strategy synthesis problem is: given $M$ and a formula of the form $\mathtt{P}_{opt=?}[\,\psi\,]$ or $\mathtt{R}^{r}_{opt=?}[\,\rho\,]$, where $opt \in \{\min, \max\}$, find an optimal strategy $\sigma^\star \in \Sigma_M$ such that $\mathbb{E}^{\sigma^\star}_M(X) = \mathbb{E}^{opt}_M(X)$ for $X = X_\psi$ or $X = X_{r,\rho}$, respectively.
For general path formulae, optimal strategies are finite-memory and deterministic. On the
other hand, for some common cases (e.g., the probability or expected accumulated reward
to reach a target), memoryless deterministic strategies are sufficient.
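Continuing the hypothetical sketch above, for maximum reachability probabilities a memoryless deterministic strategy can be extracted from the converged values by acting greedily in each state; this is only a sketch, since among actions of equal value, tie-breaking may in general need graph-based analysis to guarantee progress towards the goal.
\begin{verbatim}
def greedy_strategy(mdp, goal_states, x):
    """In each non-goal state, pick an action achieving the maximal
    one-step backup value: a memoryless deterministic strategy.
    (Ties between equal-valued actions may need extra care.)"""
    return {
        s: max(mdp[s], key=lambda a: sum(p * x[t] for t, p in mdp[s][a]))
        for s in mdp if s not in goal_states
    }

sigma = greedy_strategy(mdp, goal_states, x_max)  # e.g. {"s0": "east", ...}
\end{verbatim}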
Example 2. Returning to the MDP from Example 1, verification-style queries using the PRISM logic include:
• $\mathtt{P}_{\geq 0.8}[\,\mathtt{F}^{\leq 10}\ \mathit{goal}\,]$ – under all possible strategies, the robot reaches its goal location within 10 steps with probability at least 0.8;
• $\mathtt{R}^{r_{\mathit{hazard}}}_{\leq 1.5}[\,\mathtt{C}^{\leq 20}\,]$ – for all possible strategies, the expected number of times that the robot enters the hazard location within the first 20 steps is at most 1.5;
and examples of numerical queries include:
• $\mathtt{P}_{\max=?}[\,\neg \mathit{hazard}\ \mathtt{U}\ \mathit{goal}\,]$ – what is the maximum probability that the goal can be reached while avoiding the hazard location?
• $\mathtt{R}^{r_{\mathit{steps}}}_{\min=?}[\,\mathtt{F}\ \mathit{goal}\,]$ – what is the minimum expected number of steps to reach the goal?
Above, we use the following reward structures: $r_{\mathit{steps}}$, which assigns 1 to all state-action pairs; and $r_{\mathit{hazard}}$, which assigns 1 to all states labelled with atomic proposition hazard.
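As an illustration of the last query, the following continues the hypothetical Python sketch, approximating $\mathtt{R}^{r_{\mathit{steps}}}_{\min=?}[\,\mathtt{F}\ \mathit{goal}\,]$ by value iteration. It assumes the goal is reached with probability 1 under every strategy (as holds for the toy MDP above); otherwise the minimum expected accumulated reward can be infinite and needs separate treatment.
\begin{verbatim}
def min_expected_steps(mdp, goal_states, eps=1e-8):
    """Iterate v(s) = min_a (1 + sum_{s'} P(s,a,s') * v(s')), with
    v = 0 on goal states, for the r_steps reward structure that
    assigns 1 to every state-action pair."""
    v = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if s in goal_states:
                continue
            new = min(1.0 + sum(p * v[t] for t, p in mdp[s][a])
                      for a in mdp[s])
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < eps:
            return v

v = min_expected_steps(mdp, goal_states)  # v["s0"]: min expected steps
\end{verbatim}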