概率模型检查与自主系统分析

版权申诉

185 浏览量更新于2024-07-06 收藏 2.46MB PDF 举报

"概率模型检验与自治_Probabilistic Model Checking and Autonomy.pdf" 这篇论文深入探讨了概率模型检验在自治系统设计和控制中的应用，以及其与不确定或对抗性环境中的自主系统操作的关联。概率模型检验是一种形式化方法，用于自动验证系统模型是否满足给定的时态逻辑规范，并能合成最优控制策略。关键词包括：概率建模、时态逻辑、模型检验、策略合成、随机博弈和均衡。这些关键词揭示了论文涉及的主要概念和技术领域。首先，概率建模是构建能够反映系统行为的概率模型的过程，它允许在不确定性和随机性环境下分析系统的性能和行为。这种模型通常包含随机事件和决策点，可以用来模拟复杂的动态系统。时态逻辑是一种强大的逻辑框架，用于描述系统的动态行为，如“在某个时间点之后总是发生某事”或“有时可能发生某事”。在概率模型检验中，时态逻辑规范用于定义系统必须满足的属性。模型检验是自动化工具，用于验证给定模型是否符合指定的逻辑规范。对于概率模型，这涉及到计算模型在所有可能执行路径上的概率，以确定是否满足规范。策略合成是模型检验的一个扩展，它不仅检查模型是否满足规范，还能找出使系统行为最优化的控制策略。这在自主系统控制中尤其重要，因为它们需要在复杂环境中做出最佳决策。随机博弈是多代理系统分析的一种工具，其中每个代理都有自己的目标和策略。在这些博弈中，寻找纳什均衡（Nash equilibrium）至关重要，这是一种所有代理都不再有动机改变策略的状态，因为它对每个人都最优。论文还讨论了这些技术如何应用于多代理系统，这些系统中存在多个自主实体，每个实体都有自己的行为和目标。在这样的系统中，概率模型检验可以帮助理解和设计协调策略，以确保在不可预知的环境中达到预期的系统性能。 "概率模型检验与自治"这一主题涵盖了从形式化建模到控制策略优化的广泛领域，对于理解并开发能够在不确定和竞争环境中有效运作的自主系统具有重要意义。通过这种方式，我们可以提高系统的可靠性、安全性和效率。

Above, we assume that a property Φ for an MDP comprises a single probabilistic (P) or

reward (R) operator. The syntax also includes path (ψ) and reward (ρ) formulae, both

evaluated over paths, and propositional logic (φ) formulae, evaluated over states. The

intuitive meaning of the P and R operators, from the initial state of an MDP, is:

• P

p

[ ψ ] – the probability of a path satisfying path formula ψ satisﬁes the bound  p;

• R

q

[ ρ ] – the expected value of reward formula ρ, under reward structure r, satisﬁes

the bound  q.

A propositional formula φ is satisﬁed (or holds) in a state s if it evaluates to true in that

state, where an atomic proposition a is true if s is labelled with a (i.e., a ∈ L(s)) and the

logical connectives (¬, ∧) are interpreted in the usual way.

For path formulae ψ, the core temporal operators are:

• X ψ (next) – ψ is satisﬁed in the next state;

• ψ

(bounded until) – ψ

is satisﬁed within k steps, and ψ

is satisﬁed until

that point;

• ψ

U ψ

(until) – ψ

is eventually satisﬁed, and ψ

is satisﬁed until then.

As is standard in model checking, we use the equivalences F ψ ≡ true U ψ (eventually)

and G ψ ≡ ¬F ¬ψ (always). If we restrict the sub-formulae of a path formula to be atomic

propositions, then we get the following common property classes:

• F a (reachability) – eventually a stated labelled with a is reached;

• G a (invariance) – a labels all states;

• F

a (step-bounded reachability) – a labels a state within the ﬁrst k steps;

• G

a (step-bounded invariance) – a labels states for at least the ﬁrst k steps.

Without this restriction, path formulae allow temporal operators to be nested. In fact the

syntax of path formulae given in Deﬁnition 4 is that of linear temporal logic (LTL) (7).

LTL can express a range of useful property classes, including:

• G F ψ (recurrence) – ψ is satisﬁed inﬁnitely often;

• F G ψ (persistence) – eventually ψ is always satisﬁed;

• G (ψ

→ X ψ

) – whenever ψ

is satisﬁed, ψ

is satisﬁed in the next state;

• G (ψ

→ F ψ

) – whenever ψ

is satisﬁed, ψ

is satisﬁed in the future.

Finally, considering reward formulae ρ, the three key operators are:

• I

(instantaneous reward) – state reward at time step k;

• C

(bounded cumulative reward) – reward accumulated over k steps;

• F φ (reachability reward ) – reward accumulated until a state satisfying φ is reached.

Although omitted from the syntax here for simplicity, it is also common to generalise the

third case and consider the expected reward accumulated until some co-safe LTL formula

is satisﬁed. Intuitively, these are path formulae ψ whose satisfaction occurs within ﬁnite

time; examples include (F a

) ∧ (F a

) and F (a

∧ F a

), which require states labelled with

and a

to be reached, either in any order (ﬁrst case) or in a speciﬁed order (second case).

www.annualreviews.org

•

Probabilistic Model Checking and Autonomy 5

2.2. Probabilistic Model Checking of MDPs

Probabilistic model checking is an automated technique for constructing probabilistic mod-

els such as MDPs and then analysing them against behavioural speciﬁcations expressed

in temporal logic. It can be used either to verify that a speciﬁcation is always satisﬁed,

regardless of any adversarial behaviour, or to synthesise a strategy under whose control the

system’s behaviour can be guaranteed to satisfy a speciﬁcation.

These ideas are formalised below for the PRISM logic. We ﬁrst require the following

notation. Satisfaction of a path formula ψ can be represented by a random variable X

IPaths

→ R where X

(π) = 1 if path π satisﬁes ψ and 0 otherwise. For a reward structure

r and formula ρ, the random variable X

r,ρ

: IPaths

→ R is such that X

r,ρ

(π) equals the

state reward or accumulated reward corresponding to r and ρ for path π.

Verifying that an MDP M satisﬁes a formula Φ, denoted M |= Φ, is deﬁned as follows.

Deﬁnition 5 (Veriﬁcation problem for MDPs). The veriﬁcation problem is: given an MDP

M and a formula Φ, verify whether M |= Φ, deﬁned as:

M |= P

p

[ ψ ] ⇔ ∀σ ∈ Σ



)  p



M |= R

q

[ ρ ] ⇔ ∀σ ∈ Σ



r,ρ

)  q



In practice, we often solve a numerical veriﬁcation problem: given an MDP M, formula

opt=?

[ ψ ] or R

opt=?

[ ρ ], where opt ∈ {min, max}, compute E

opt

(X) where X = X

X = X

r,ρ

, respectively, and:

min

(X)

def

= inf

σ∈Σ

(X) and E

max

(X)

def

= sup

σ∈Σ

(X) .

Closely related is the strategy synthesis problem.

Deﬁnition 6 (Strategy synthesis problem for MDPs). The strategy synthesis problem is:

given an MDP M and formula Φ of the form P

p

[ ψ ] or R

q

[ ρ ], ﬁnd a strategy σ ∈ Σ

such

that Φ is satisﬁed in M under σ, i.e., such that E

)  p or E

r,ρ

)  q, respectively.

The numerical strategy synthesis problem is: given M and a formula of the form

opt=?

[ ψ ] or R

opt=?

[ ρ ], where opt ∈ {min, max}, ﬁnd an optimal strategy σ



∈ Σ

such

that E

(X) = E

opt

(X) for X = X

or X = X

r,ρ

, respectively.

For general path formulae, optimal strategies are ﬁnite-memory and deterministic. On the

other hand, for some common cases (e.g., the probability or expected accumulated reward

to reach a target), memoryless deterministic strategies are suﬃcient.

Example 2. Returning to the MDP from Example 1, veriﬁcation-style queries using the

PRISM logic include:

• P

>0.8

[ F

610

goal ] – under all possible strategies, the robot reaches its goal location

within 10 steps with probability at least 0.8;

• R

hazard

61.5

[ C

620

] – for all possible strategies, the expected number of times that the robot

enters the hazard location within the ﬁrst 20 steps is at most 1.5;

and examples of numerical queries include:

• P

max=?

[ ¬hazard U goal ] – what is the maximum probability that the goal can be

reached while avoiding the hazard location?

• R

steps

min=?

[ F goal ] – what is the minimum expected number of steps to reach the goal?

Above, we use the following reward structures: r

steps

, which assigns 1 to all state-action

pairs; and r

hazard

, which assigns 1 to all states labelled with atomic proposition hazard.

6 Kwiatkowska et al.

剩余25页未读，继续阅读

易小侠

粉丝: 6634
资源: 9万+

概率模型检查与自主系统分析

Daphne Koller_Probabilistic Graphical Models

BLOB_A Probabilistic Model for Recommendation that Combines Organic and Bandit S

蚁群寻屋的空间相关概率模型_A Spatially Dependent Probabilistic Model for Hous

点云拟合的概率超二次曲面_Probabilistic_Superquadrics_fitting_to_point_clouds

CMPPF_IEEE34.zip_IEEE34潮流计算_Probabilistic monte_概率 潮流_概率潮流_蒙特卡洛

2011_CVPR_Probabilistic simultaneous pose and non-rigid shape recovery

manual for Probabilistic model checking prism

solved_probabilistic_graphical_models_principles_and_techniques:尝试学习概率性知识

GSM.rar_GSM_Markov channel model_Probabilistic_gsm matlab_markov

PMAPS.rar_probabilistic power_无迹变换_概率潮流_电网_能源

最新资源

CMPPF_IEEE34.zip_IEEE34潮流计算_Probabilistic monte_概率潮流_概率潮流_蒙特卡洛