Abstract—In this paper, a mutual information (MI) and Extreme Learning Machine (ELM) based inverse reinforcement learning (IRL) algorithm, termed MEIRL, is proposed to construct nonlinear reward functions. The basic idea of MEIRL is that, as in GPIRL, the reward function is learned with a Gaussian process and the importance of each feature is obtained through automatic relevance determination (ARD). Mutual information is then employed to evaluate the impact of each feature on the reward function, and on this basis an extreme learning machine is introduced, together with an adaptive model construction procedure, to choose an optimal subset of features and thereby improve the performance of the original GPIRL algorithm. Furthermore, to demonstrate the effectiveness of MEIRL, a highway driving simulation is constructed. The simulation results show that MEIRL is comparable with state-of-the-art IRL algorithms in terms of generalization capability, but more efficient when the number of features is large.
I. INTRODUCTION
Reinforcement learning (RL) techniques solve problems through an agent that acquires experience by interacting with a dynamic environment [1-3]. The result is a policy that can accomplish complex tasks without specific instructions on how the tasks are to be achieved. However, RL requires the reward function to be specified in advance, and the difficulty of designing it motivated the introduction of inverse reinforcement learning (IRL), in which the reward function is derived from an expert's demonstrations [4-6].
Although the original IRL algorithms can largely achieve their objectives and their effectiveness has been demonstrated in several applications, they still rest on restrictive assumptions and constraints [7]. For example, the expert's demonstrations are assumed to be optimal, which is usually not the case when the demonstrations are noisy and imperfect. Moreover, the original IRL problem is ill-posed, so it is difficult to obtain a reasonable solution. These drawbacks restrict the range of applications of IRL.
To address the problems mentioned above, many researchers have contributed further refinements of IRL, some of which are summarized below.
Maximum margin planning (MMP) uses ideas similar to those of the original IRL algorithm [8]: the solver attempts to
make the demonstrations look better than any other solution by a margin, and it addresses the ill-posed problem by introducing proper loss functions. In addition, Ramachandran pointed out that the uncertainty in the recovered reward function is a main cause of the ill-posedness of IRL. Bayesian IRL was therefore proposed to model this uncertainty; it approaches the problem from a probability-distribution perspective and is able to deal with noisy and incomplete observations. Similar to Bayesian IRL, maximum entropy IRL also resolves the IRL problem probabilistically [10]. Differently from the former, however, it employs the principle of maximum entropy, which gives the least biased estimate consistent with the given information. The algorithm was applied to a set of GPS data collected from taxi drivers and recovered a reward function that can be used for route recommendation and for predicting drivers' behavior. Although these IRL algorithms mitigate the ill-posed problem to a certain extent, they still suffer from drawbacks: in particular, they generally learn the reward as a linear combination of features, which can be wrong when the true reward function is nonlinear, as illustrated below.
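As a concrete illustration of this limitation (the notation below is introduced here for exposition and does not appear in [10]), maximum entropy IRL assigns each demonstrated trajectory \zeta a probability that is exponential in its cumulative reward, with the reward assumed linear in the state features f_s:

P(\zeta \mid \theta) \propto \exp\Big(\sum_{s \in \zeta} \theta^{\mathrm{T}} f_s\Big), \qquad r(s) = \theta^{\mathrm{T}} f_s.

If the expert's true reward depends on the features nonlinearly, for example through products or thresholds of features, no weight vector \theta can represent it under this linear model.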
Levine presented a Gaussian process based algorithm for nonlinear inverse reinforcement learning (GPIRL) [11]. The goal of the algorithm is to learn the reward function of a Markov decision process from expert demonstrations, using Gaussian processes to represent the reward as a nonlinear function of the features while also determining the relevance of each feature to the expert's policy.
However, this is a costly procedure, as the running time of the optimization increases dramatically with the number of features M. Hence, GPIRL becomes computationally expensive when M is large, as the kernel sketch below suggests.
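The source of this cost can be sketched with a standard automatic relevance determination (ARD) kernel; the exact kernel of [11] may differ in its details, so the form below is illustrative only:

k(x, x') = \beta \exp\Big(-\tfrac{1}{2}\sum_{m=1}^{M} \lambda_m (x_m - x'_m)^2\Big).

Since one length-scale \lambda_m is attached to every feature, the hyperparameter vector grows linearly with M, and every evaluation of the Gaussian process likelihood and its gradient during hyperparameter optimization involves all M length-scales.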
In this paper, an inverse reinforcement learning algorithm named MEIRL is proposed by combining mutual information, the Extreme Learning Machine, and Gaussian processes. As in GPIRL, the reward function is constructed with a Gaussian process. A mutual information based criterion, Max-Relevance and Min-Redundancy (mRMR), is then employed to evaluate the importance of each feature to the reward function; on this basis the number of selected features is reduced, and with it the computational complexity of GPIRL. Furthermore, a computationally efficient neural network named the extreme learning machine is introduced, together with an adaptive model construction procedure, to refine the selection process and obtain a more compact subset of features; a sketch of this selection step is given below. Finally, a highway driving simulation is constructed to demonstrate the effectiveness of MEIRL.
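To make the selection step concrete, the following is a minimal sketch, not the authors' implementation: it assumes mutual information is estimated with scikit-learn's mutual_info_regression, and it uses a basic tanh ELM whose output weights are solved by pseudoinverse; the function names (mrmr_rank, elm_fit, elm_predict) are introduced here for illustration only.

# Minimal sketch of the two selection stages described above.
# Assumptions (not from the paper): mutual information is estimated with
# scikit-learn's nearest-neighbor estimator, and the ELM uses a tanh hidden
# layer whose output weights are solved by least squares (pseudoinverse).
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def mrmr_rank(X, r, k):
    """Rank k features by Max-Relevance Min-Redundancy against reward values r."""
    n_features = X.shape[1]
    relevance = mutual_info_regression(X, r)           # I(x_j; r)
    selected, remaining = [], list(range(n_features))
    while len(selected) < k and remaining:
        scores = []
        for j in remaining:
            if selected:
                redundancy = mutual_info_regression(
                    X[:, selected], X[:, j]).mean()    # mean I(x_j; x_i)
            else:
                redundancy = 0.0
            scores.append(relevance[j] - redundancy)   # mRMR criterion
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


def elm_fit(X, r, n_hidden=50, seed=None):
    """Fit a basic ELM: random hidden weights, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))        # fixed random input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                             # hidden-layer activations
    beta = np.linalg.pinv(H) @ r                       # output weights via pseudoinverse
    return W, b, beta


def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta


if __name__ == "__main__":
    # Toy usage: rank features with mRMR, then score increasingly large subsets
    # by ELM fit error, mimicking the adaptive model construction step.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    r = np.sin(X[:, 0]) + X[:, 1] ** 2                 # synthetic nonlinear "reward"
    order = mrmr_rank(X, r, k=5)
    for m in range(1, len(order) + 1):
        cols = order[:m]
        W, b, beta = elm_fit(X[:, cols], r, seed=0)
        err = np.mean((elm_predict(X[:, cols], W, b, beta) - r) ** 2)
        print(m, cols, round(float(err), 4))

In this toy usage, candidate subsets taken from the mRMR ranking are scored by the ELM's fit error, and the smallest subset beyond which the error stops improving would be retained; the real adaptive construction procedure is described later in the paper.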