Generative Adversarial Imitation Learning
Jonathan Ho
OpenAI
hoj@openai.com
Stefano Ermon
Stanford University
ermon@cs.stanford.edu
Abstract
Consider learning a policy from example expert behavior, without interaction with
the expert or access to a reinforcement signal. One approach is to recover the
expert’s cost function with inverse reinforcement learning, then extract a policy
from that cost function with reinforcement learning. This approach is indirect
and can be slow. We propose a new general framework for directly extracting a
policy from data as if it were obtained by reinforcement learning following inverse
reinforcement learning. We show that a certain instantiation of our framework
draws an analogy between imitation learning and generative adversarial networks,
from which we derive a model-free imitation learning algorithm that obtains signif-
icant performance gains over existing model-free methods in imitating complex
behaviors in large, high-dimensional environments.
1 Introduction
We are interested in a specific setting of imitation learning—the problem of learning to perform a
task from expert demonstrations—in which the learner is given only samples of trajectories from
the expert, is not allowed to query the expert for more data while training, and is not provided a
reinforcement signal of any kind. There are two main approaches suitable for this setting: behavioral
cloning [18], which learns a policy as a supervised learning problem over state-action pairs from expert trajectories; and inverse reinforcement learning [23, 16], which finds a cost function under which the expert is uniquely optimal.
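To make the behavioral cloning baseline concrete, the following is a minimal sketch, not the paper's implementation: a small policy network fit by supervised regression to expert state-action pairs. The dimensions and the random arrays standing in for demonstrations are hypothetical placeholders.

# Minimal behavioral-cloning sketch (illustrative only; data and dimensions are placeholders).
import torch
import torch.nn as nn

state_dim, action_dim = 11, 3                    # hypothetical task dimensions
expert_states = torch.randn(5000, state_dim)     # stand-ins for recorded expert demonstrations
expert_actions = torch.randn(5000, action_dim)

# Deterministic policy fit by least squares: plain supervised regression
# from states to the expert's actions.
policy = nn.Sequential(
    nn.Linear(state_dim, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, action_dim),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    idx = torch.randint(0, expert_states.size(0), (256,))   # minibatch of state-action pairs
    loss = ((policy(expert_states[idx]) - expert_actions[idx]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

The only training signal is the expert's recorded actions; the loop never interacts with the environment or sees a reward.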
Behavioral cloning, while appealingly simple, only tends to succeed with large amounts of data, due
to compounding error caused by covariate shift [21, 22]. Inverse reinforcement learning (IRL), on the other hand, learns a cost function that prioritizes entire trajectories over others, so compounding error, a problem for methods that fit single-timestep decisions, is not an issue. Accordingly, IRL has succeeded in a wide range of problems, from predicting behaviors of taxi drivers [29] to planning footsteps for quadruped robots [20].
Unfortunately, many IRL algorithms are extremely expensive to run, requiring reinforcement learning
in an inner loop. Scaling IRL methods to large environments has thus been the focus of much recent
work [6, 13]. Fundamentally, however, IRL learns a cost function, which explains expert behavior
but does not directly tell the learner how to act. Given that the learner’s true goal often is to take
actions imitating the expert—indeed, many IRL algorithms are evaluated on the quality of the optimal
actions of the costs they learn—why, then, must we learn a cost function, if doing so possibly incurs
significant computational expense yet fails to directly yield actions?
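To make the cost of that inner loop explicit, the sketch below shows the generic structure of such a pipeline; the helper functions are hypothetical placeholders rather than any particular published IRL algorithm.

# Schematic IRL-then-RL pipeline (structure only; the helpers below are hypothetical stubs).

def reinforcement_learning(env, cost):
    """Placeholder: solve a full RL problem under the current cost (the expensive inner loop)."""
    raise NotImplementedError

def fit_cost_to_expert(cost, expert_trajectories, policy):
    """Placeholder: update the cost so expert trajectories score better than the learner's."""
    raise NotImplementedError

def irl_then_rl(env, expert_trajectories, cost, num_outer_iters=50):
    for _ in range(num_outer_iters):
        policy = reinforcement_learning(env, cost)            # full RL solve per cost update
        cost = fit_cost_to_expert(cost, expert_trajectories, policy)
    # A policy for acting is only extracted at the very end, after the cost is learned.
    return reinforcement_learning(env, cost)

Every pass through the outer loop pays for a complete RL solve, and the learner only obtains actions once the cost function has converged.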
We desire an algorithm that tells us explicitly how to act by directly learning a policy. To develop such
an algorithm, we begin in Section 3, where we characterize the policy given by running reinforcement
learning on a cost function learned by maximum causal entropy IRL [29, 30]. Our characterization
introduces a framework for directly learning policies from data, bypassing any intermediate IRL step.
Then, we instantiate our framework in Sections 4 and 5 with a new model-free imitation learning
algorithm. We show that our resulting algorithm is intimately connected to generative adversarial networks.
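As a rough reference for that characterization (a sketch in the notation the paper develops in Section 3, where $H(\pi) \triangleq \mathbb{E}_\pi[-\log \pi(a \mid s)]$ is the $\gamma$-discounted causal entropy; the regularized variants appear there), the two stages being composed are

\[
\mathrm{IRL}(\pi_E) = \operatorname*{arg\,max}_{c} \Big( \min_{\pi \in \Pi} -H(\pi) + \mathbb{E}_{\pi}[c(s,a)] \Big) - \mathbb{E}_{\pi_E}[c(s,a)],
\qquad
\mathrm{RL}(c) = \operatorname*{arg\,min}_{\pi \in \Pi} \; -H(\pi) + \mathbb{E}_{\pi}[c(s,a)],
\]

so the indirect pipeline criticized above computes the composition $\mathrm{RL} \circ \mathrm{IRL}(\pi_E)$, which is the object Section 3 characterizes.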
30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.