Published as a conference paper at ICLR 2020
progress in deep RL through surfacing dozens of Atari 2600 games as learning environments
(Bellemare et al., 2013). Similar projects have been crucial to progress in continuous control
(Duan et al., 2016; Tassa et al., 2018), model-based RL (Wang et al., 2019) and even rich
3D games (Beattie et al., 2016). Performing well in these complex environments requires
the integration of many core agent capabilities. We might think of these benchmarks as
natural successors to ‘CartPole’ or ‘MountainCar’.
The Behaviour Suite for Reinforcement Learning offers a complementary approach to exist-
ing benchmarks in RL, with several novel components:
1. bsuite experiments enforce a specific methodology for agent evaluation beyond just the
environment definition. This is crucial for scientific comparisons, and its absence has
become a major problem for many benchmark suites (Machado et al., 2017) (Section 2).
2. bsuite aims to isolate core capabilities with targeted ‘unit tests’, rather than to integrate
general learning ability. Whereas other benchmarks evolve by increasing complexity, bsuite
aims to remove all confounds from the core agent capabilities of interest (Section 3).
3. bsuite experiments are designed with an emphasis on scalability rather than final per-
formance. Previous ‘unit tests’ (such as ‘Taxi’ or ‘RiverSwim’) are of fixed size; bsuite
experiments are specifically designed to vary their complexity smoothly (Section 2).
4. github.com/deepmind/bsuite places a strong emphasis on ease of use and on
compatibility with RL agents not specifically designed for bsuite. Evaluating an agent
on bsuite is practical even for agents designed for a different benchmark (Section 4).
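The idea of ‘varying complexity smoothly’ (point 3) can be illustrated with a toy
parameterized environment family. The sketch below is purely illustrative and is not the
bsuite implementation: `MemoryChain` is a hypothetical stand-in for a memory task whose
difficulty is indexed by a single length parameter, so an experiment can sweep over
increasing sizes rather than test one fixed-size instance.

```python
# A minimal sketch of a scalable experiment family: one toy environment
# indexed by a size parameter. `MemoryChain` is an illustrative stand-in,
# not the bsuite implementation of the 'memory length' experiment.
import random


class MemoryChain:
    """Remember a bit shown at step 0 and report it after `length` steps."""

    def __init__(self, length, seed=0):
        self.length = length
        self._rng = random.Random(seed)

    def reset(self):
        self._context = self._rng.randint(0, 1)
        self._t = 0
        return self._context  # observation is only informative at t = 0

    def step(self, action):
        self._t += 1
        done = self._t >= self.length
        # Reward only at the final step, for recalling the initial context.
        reward = float(action == self._context) if done else 0.0
        return 0, reward, done  # (observation, reward, episode-end flag)


# An experiment sweeps this family over smoothly increasing difficulty.
sweep = [MemoryChain(length) for length in (1, 2, 4, 8, 16, 32)]
```

An agent with perfect memory scores 1.0 at every length, while a memoryless agent
degrades as the length grows; the sweep exposes where each agent's capability breaks down.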
2 Experiments
This section outlines the experiments included in the Behaviour Suite for Reinforcement
Learning 2019 release. In the context of bsuite, an experiment consists of three parts:
1. Environments: a fixed set of environments determined by some parameters.
2. Interaction: a fixed regime of agent/environment interaction (e.g. 100 episodes).
3. Analysis: a fixed procedure that maps agent behaviour to results and plots.
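The three-part structure above can be sketched in code. The example below is a minimal
illustration under assumed names (`make_environments`, `interact`, `analyse`, and a
two-armed Bernoulli bandit are all hypothetical, not the bsuite API): a fixed set of
environments determined by parameters, a fixed interaction regime of 100 episodes, and a
fixed analysis mapping behaviour to results.

```python
# Sketch of a bsuite-style experiment's three parts. All names are
# illustrative stand-ins, not the actual bsuite interfaces.
import random


def make_environments(num_arms=2, num_seeds=3):
    """Part 1 - Environments: a fixed set determined by some parameters.
    Each 'environment' here is a tuple of Bernoulli arm probabilities."""
    rngs = [random.Random(seed) for seed in range(num_seeds)]
    return [tuple(rng.random() for _ in range(num_arms)) for rng in rngs]


def interact(arms, agent, num_episodes=100, seed=0):
    """Part 2 - Interaction: a fixed regime of exactly `num_episodes`
    one-step episodes; returns the observed episodic rewards."""
    rng = random.Random(seed)
    return [float(rng.random() < arms[agent(rng)]) for _ in range(num_episodes)]


def analyse(rewards):
    """Part 3 - Analysis: a fixed procedure mapping behaviour to a result
    (here, simply the mean episodic reward)."""
    return sum(rewards) / len(rewards)


random_agent = lambda rng: rng.randint(0, 1)  # a trivial baseline agent
results = [analyse(interact(arms, random_agent)) for arms in make_environments()]
```

Because the environments, interaction regime, and analysis are all fixed by the experiment
rather than chosen by the experimenter, results are directly comparable across agents.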
One crucial part of each bsuite analysis defines a ‘score’ that maps agent performance on
the task to [0, 1]. This score allows for agent comparison ‘at a glance’; the accompanying
Jupyter notebook includes further detailed analysis for each experiment. All experiments in bsuite only
measure behavioural aspects of RL agents. This means that they only measure properties
that can be observed in the environment, and are not internal to the agent. It is this choice
that allows bsuite to easily generate and compare results across different algorithms and
codebases. Researchers may still find it useful to investigate internal aspects of their agents
on bsuite environments, but it is not part of the standard analysis.
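One simple way such a behavioural score might be computed, shown here as a hedged sketch
rather than the exact bsuite formula, is to rescale an agent's mean return between a random
baseline and the optimal return, clipping the result into [0, 1]:

```python
def behaviour_score(mean_return, baseline, optimal):
    """Hypothetical score normalisation (not the exact bsuite formula):
    rescale mean return to [0, 1], clipping at both ends so that
    below-baseline behaviour scores 0 and optimal behaviour scores 1."""
    raw = (mean_return - baseline) / (optimal - baseline)
    return min(max(raw, 0.0), 1.0)
```

Because the score depends only on observed returns, it can be computed identically for any
agent and codebase, which is exactly what the behavioural restriction above enables.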
Every current and future bsuite experiment should target some key issue in RL. We aim
for simple behavioural experiments, where agents that implement some concept well score
better than those that don’t. For an experiment to be included in bsuite it should embody
five key qualities:
• Targeted: performance in this task corresponds to a key issue in RL.
• Simple: strips away confounding/confusing factors in research.
• Challenging: pushes agents beyond the normal range.
• Scalable: provides insight on scalability, not performance on one environment.
• Fast: iteration from launch to results in under 30min on standard CPU.
Where our current experiments fall short, we see this as an opportunity to improve the
Behaviour Suite for Reinforcement Learning in future iterations. We can do this both
through replacing experiments with improved variants, and through broadening the scope
of issues that we consider.
We maintain the full description of each of our experiments through the code and accom-
panying documentation at github.com/deepmind/bsuite. In the following subsections, we
review two bsuite experiments in detail: ‘memory length’ and ‘deep sea’. By presenting
these experiments as examples, we can emphasize what we think makes bsuite a valuable
tool for investigating core RL issues. We provide a high-level summary of all other current
experiments in Appendix A.