Chi et al. propose the Comfort and Collision Risk (CCR) map to model human behavior; a sampling-based algorithm is then applied on the CCR map to achieve socially-compliant path planning. In [15], the Social Force Model (SFM) is introduced to achieve human-aware navigation in crowded environments. Later, in [16], the Extended SFM is adopted for proactive kinodynamic planning. Recently, in [17], Wang et al. utilize a potential function to model the relationship between robots and humans and thereby achieve socially-compliant path planning. By combining membrane computing rules with potential field methods, MemEAPF [18] and MemPBPF [19] are proposed to handle mobile robot path planning. An electrostatic potential field method [20] is proposed for mobile robot path planning in scattered environments. These model-based methods approximately model human behavior, but the robot still treats the human as a different type of obstacle and cannot directly imitate human navigation behavior.
Learning-based methods aim at generating trajectories that account for human behaviors by learning from data collected in real demonstrations. The Inverse Reinforcement Learning (IRL) technique is often used to find a cost function that guides robot path planning. In [21], Noe et al. use IRL to learn RRT*'s cost function from demonstrations. In [22], Henrik et al. model the cooperative navigation behavior of humans with IRL to achieve socially-compliant mobile robot navigation. Deep Learning (DL) techniques are also widely applied in this area. Mavrogiannis et al. [23] apply the Long Short-Term Memory (LSTM) architecture to learn multi-agent path topologies for socially competent planning. A convolutional neural network (CNN) is used in [24] to train a drone to fly autonomously in civilian environments. A Generative Adversarial Network (GAN) [25] can also be used to learn heuristics for sampling-based path planning. Meanwhile, a recurrent generative model [26] is proposed to achieve efficient heuristic generation for robot path planning. Learning-based methods let the robot directly imitate human navigation behavior, which has earned them great popularity; however, they require complex network models and long offline training sessions.
Unlike the methods mentioned above, the GMR-RRT* employs the GMR as a learning tool to learn navigation behavior from human demonstrations with the Expectation-Maximization (EM) algorithm. The learned distribution then acts as a nonuniform sampling function that guides the sampling-based planner to generate a feasible solution quickly. The simplicity of the GMR-RRT* makes it more efficient and affordable in practical applications.
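To make the idea of a nonuniform sampling function concrete, the following Python sketch mixes draws from a learned distribution with a uniform fallback over the workspace. This is only an illustration of biased sampling in general, not the paper's exact sampler; the bias ratio lambda_bias and the helper learned_sample are our own hypothetical assumptions.

```python
import numpy as np

def hybrid_sample(rng, learned_sample, bounds, lambda_bias=0.8):
    """Illustrative nonuniform sampler: with probability lambda_bias, draw
    from the learned distribution; otherwise fall back to a uniform draw so
    the planner keeps covering the whole workspace."""
    if rng.random() < lambda_bias:
        return learned_sample(rng)   # e.g., a draw from the GMR output
    lo, hi = bounds                  # workspace bounds, each shape (2,)
    return rng.uniform(lo, hi)       # uniform fallback over the workspace
```

Mixing in a uniform component is a common design choice for biased samplers, since it keeps every region of the free space reachable by the planner.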
III. Algorithm
In this section, we first introduce how to use the GMR to learn navigation behavior from human demonstrations. The learned behavior is then applied in the GMR-RRT* to achieve fast and high-quality path planning. Finally, we prove that the GMR-RRT* guarantees probabilistic completeness and asymptotic optimality.
A. Gaussian Mixture Regression
In this subsection, we follow previous work [9] to give a preliminary introduction to the GMR and discuss how to adapt it to sampling-based path planning. The GMR utilizes the Gaussian conditioning theorem to compute the distribution of the output given the input. First, a GMM encodes the joint distribution of input and output, using the EM algorithm to calculate the model parameters. Second, the output given the input (spatial data given temporal data in this paper) is computed through a linear combination of conditional expectations. The GMR therefore relies on the learned joint distribution instead of deriving the regression function directly. An illustration of the GMR calculation process is provided in Fig. 4.
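The first step can be carried out with any EM-based GMM implementation. The sketch below uses scikit-learn's GaussianMixture on stacked (time, x, y) datapoints; the variable names and the default K = 5 are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# trajectories: list of (N, 3) arrays, each row a datapoint xi_i = (xi_t, x, y)
def fit_gmm(trajectories, K=5, seed=0):
    data = np.vstack(trajectories)  # pool all rescaled demonstrations
    gmm = GaussianMixture(n_components=K, covariance_type="full",
                          random_state=seed).fit(data)  # EM under the hood
    # priors p_k, means mu_k, covariances Sigma_k of Eq. (1) below
    return gmm.weights_, gmm.means_, gmm.covariances_
```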
For each human-demonstrated trajectory, the length is rescaled to a fixed value $N$. The demonstrated trajectory $\xi = \{\xi_i\}_{i=1}^{N}$ is defined by $N$ observations $\xi_i \in \mathbb{R}^3$. Each datapoint $\xi_i = \{\xi_t, \xi_s\}$ encodes the time $\xi_t \in \mathbb{R}$ and the spatial position $\xi_s \in \mathbb{R}^2$. All trajectories are fed into the GMM with $K$ components, and the probability density function is defined as
$$
p(\xi_i) = \sum_{k=1}^{K} p_k\, p(\xi_i \mid k)
         = \sum_{k=1}^{K} p_k\, \mathcal{N}(\xi_i;\, \mu_k, \Sigma_k)
         = \sum_{k=1}^{K} \frac{p_k}{\sqrt{(2\pi)^3 |\Sigma_k|}}\,
           e^{-\frac{1}{2} (\xi_i - \mu_k)^{\top} \Sigma_k^{-1} (\xi_i - \mu_k)}, \tag{1}
$$
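For reference, Eq. (1) can be evaluated directly from the fitted parameters. The following sketch uses scipy.stats.multivariate_normal and assumes the weights/means/covariances returned by an EM fit such as the one above.

```python
from scipy.stats import multivariate_normal

def gmm_pdf(xi, weights, means, covs):
    """Mixture density p(xi) of Eq. (1) for a single datapoint xi in R^3."""
    return sum(p_k * multivariate_normal.pdf(xi, mean=mu_k, cov=S_k)
               for p_k, mu_k, S_k in zip(weights, means, covs))
```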
where $\{p_k, \mu_k, \Sigma_k\}$ are the prior, mean, and covariance matrix of Gaussian component $k$. When applying the GMR to learn the navigation behavior, the temporal value $\xi_t$ is used as the query point, and the corresponding spatial value $\xi_s$ is estimated through regression. Therefore, $\mu_k$ and $\Sigma_k$ can be represented separately as
$$
\mu_k = \{\mu_{t,k}, \mu_{s,k}\}, \qquad
\Sigma_k = \begin{pmatrix} \Sigma_{tt,k} & \Sigma_{ts,k} \\ \Sigma_{st,k} & \Sigma_{ss,k} \end{pmatrix}. \tag{2}
$$
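In code, the block partitioning of Eq. (2) is simple index slicing. The sketch below assumes the temporal dimension is stored first in each 3-D datapoint, as in the fitting sketch above.

```python
def partition(mu_k, Sigma_k):
    """Split a 3-D component into temporal (index 0) and spatial (1:3) blocks."""
    mu_t, mu_s = mu_k[0], mu_k[1:]
    S_tt = Sigma_k[0, 0]    # scalar temporal variance
    S_ts = Sigma_k[0, 1:]   # (2,) temporal-spatial covariance
    S_st = Sigma_k[1:, 0]   # (2,) spatial-temporal covariance
    S_ss = Sigma_k[1:, 1:]  # (2, 2) spatial covariance
    return mu_t, mu_s, S_tt, S_ts, S_st, S_ss
```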
Given $\xi_t$, the conditional distribution of $\xi_s$ in each Gaussian model $k$ is
$$
\begin{aligned}
p(\xi_s \mid \xi_t, k) &= \mathcal{N}(\xi_s;\, \hat{\xi}_{s,k}, \hat{\Sigma}_{ss,k}), \\
\hat{\xi}_{s,k} &= \mu_{s,k} + \Sigma_{st,k} (\Sigma_{tt,k})^{-1} (\xi_t - \mu_{t,k}), \\
\hat{\Sigma}_{ss,k} &= \Sigma_{ss,k} - \Sigma_{st,k} (\Sigma_{tt,k})^{-1} \Sigma_{ts,k},
\end{aligned} \tag{3}
$$
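A direct transcription of Eq. (3), reusing the hypothetical partition helper from the previous sketch; because the temporal block is scalar here, the matrix inverse reduces to a division.

```python
import numpy as np

def conditional(xi_t, mu_k, Sigma_k):
    """Conditional mean and covariance of xi_s given xi_t for component k, Eq. (3)."""
    mu_t, mu_s, S_tt, S_ts, S_st, S_ss = partition(mu_k, Sigma_k)
    xi_hat_s = mu_s + S_st * (xi_t - mu_t) / S_tt   # Eq. (3), conditional mean
    Sigma_hat = S_ss - np.outer(S_st, S_ts) / S_tt  # Eq. (3), conditional covariance
    return xi_hat_s, Sigma_hat
```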
where $\hat{\xi}_{s,k}$ is the conditional expectation and $\hat{\Sigma}_{ss,k}$ is the estimated conditional covariance. The complete conditional distribution of $\xi_s$ given $\xi_t$ is
$$
p(\xi_s \mid \xi_t) = \sum_{k=1}^{K} \beta_k\, \mathcal{N}(\xi_s;\, \hat{\xi}_{s,k}, \hat{\Sigma}_{ss,k}), \tag{4}
$$
where $\beta_k = p(k \mid \xi_t)$ is the so-called responsibility of Gaussian component $k$:
$$
\beta_k = \frac{p_k\, p(\xi_t \mid k)}{\sum_{i=1}^{K} p_i\, p(\xi_t \mid i)}
        = \frac{p_k\, \mathcal{N}(\xi_t;\, \mu_{t,k}, \Sigma_{tt,k})}{\sum_{i=1}^{K} p_i\, \mathcal{N}(\xi_t;\, \mu_{t,i}, \Sigma_{tt,i})}. \tag{5}
$$
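Putting Eqs. (4) and (5) together, the sketch below computes the responsibilities $\beta_k$ from the temporal marginals and assembles the full conditional mixture, from which spatial samples can then be drawn. It reuses the hypothetical conditional helper defined above; all names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def gmr_condition(xi_t, weights, means, covs):
    """Conditional mixture p(xi_s | xi_t) of Eq. (4) with betas from Eq. (5)."""
    comps, betas = [], []
    for p_k, mu_k, S_k in zip(weights, means, covs):
        mu_t, S_tt = mu_k[0], S_k[0, 0]
        betas.append(p_k * norm.pdf(xi_t, loc=mu_t, scale=np.sqrt(S_tt)))
        comps.append(conditional(xi_t, mu_k, S_k))
    betas = np.array(betas) / np.sum(betas)  # normalize, Eq. (5)
    return betas, comps

def sample_spatial(rng, xi_t, weights, means, covs):
    """Draw one spatial sample xi_s from the conditional mixture of Eq. (4)."""
    betas, comps = gmr_condition(xi_t, weights, means, covs)
    k = rng.choice(len(betas), p=betas)        # pick a component by responsibility
    return rng.multivariate_normal(*comps[k])  # draw xi_s from that component
```

Sampling a spatial point for a random time query in this way is one plausible realization of the nonuniform sampling function described in Section II, under the assumptions stated above.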