DRL methods and AUV motion control were mostly conducted in the
underwater horizontal plane, and there were relatively few studies on
motion control in the vertical plane and three-dimensional space.
In particular, for AUV motion control in three-dimensional space,
six-degree-of-freedom control was rarely involved, making these
methods difficult to employ in real experiments.
Therefore, this work explores AUV motion control in six degrees of
freedom based on posture control in three-dimensional space. As the X-
rudder AUV has higher safety, better maneuverability, and lower noise
(Xia et al., 2020), the present work chooses the torpedo-like X-rudder
under-actuated AUV as the research object. Based on the principle of
the DRL method, the present work uses the DDPG algorithm to train the
AUV on the Gazebo simulation platform, and the experimental results verify
the feasibility of the control strategy. First, the AUV agent is trained to
realize posture adjustment and maintenance. On this basis, data processing
methods to expand and stabilize the navigation control capability of the
AUV are proposed, so that the DDPG algorithm can be quickly deployed
for AUV motion control in six degrees of freedom. Then
posture adjusting, position tracking, and trajectory control experiments
are conducted successfully. The results prove that the proposed control
strategy has remarkable task generalization ability, with potential for
path planning, trajectory tracking, and obstacle avoidance, etc.
In this paper, the mathematical model and algorithm mechanism are
introduced in Section 2. Section 3 explains the detailed control strategy.
The posture control and trajectory control results are presented in Sec-
tion 4. Section 5 concludes the paper and discusses further research
interests.
2. Mathematical model and algorithm mechanism
2.1. Mathematical model
The ECA_A9 AUV, which has a torpedo-like shape and an under-
actuated X-rudder layout, is chosen for the following research. It has a
conventional axially mounted propulsion system, as shown in Fig. 1. The
AUV is approximately 2 m long and weighs approximately 70 kg, and it can
navigate underwater for more than 10 h. The external structure of this
AUV is similar to a submarine or torpedo, with a propulsion system
deployed at the stern, four independent fins on the tail, and an
additional structure on the top, without a sail rudder or bow rudder.
The East-North-Up coordinate system is used in the simulation
environment, as shown in Fig. 2, where the red axis represents the X-axis,
the green axis represents the Y-axis, and the blue axis represents the
Z-axis.
In the coordinate system mentioned above, the horizontal plane z = 0
is set as the sea level, and (0, 0, −50) is designated as the initial
position of the AUV. In this paper, the AUV is considered to have six
degrees of freedom, and the AUV model is based on Fossen's equations
(Fossen, 2011), as shown in Eq. (1):
$$M_{RB}\dot{\nu}_r + C_{RB}(\nu_r)\nu_r + g_0 = \tau_g \tag{1}$$
where $M_{RB}$ is the rigid-body inertia matrix, $\nu_r$ the velocity vector,
$C_{RB}(\nu_r)$ the matrix of rigid-body Coriolis and centripetal forces,
$g_0$ the restoring forces of gravity, and $\tau_g$ the external forces and
torques. $\tau_g$ can be calculated with Eq. (2):
$$\tau_g = -M_A\dot{\nu}_r - C_A(\nu_r)\nu_r - D(\nu_r)\nu_r - g(\eta) \tag{2}$$
where $M_A$ is the added-mass inertia matrix, $C_A(\nu_r)$ the matrix of
added-mass Coriolis and centripetal forces, $D(\nu_r)$ the damping matrix,
and $g(\eta)$ the restoring forces of buoyancy. Because this study conducts
its simulation experiments on the Robot Operating System, the Gazebo
platform, and the UUV Simulator project (Manhaes et al., 2016), the above
parameters characterizing the AUV's underwater dynamics are already
integrated into the open-source UUV Simulator project.
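To illustrate how Eqs. (1) and (2) combine, the minimal sketch below solves for the body-frame acceleration $\dot{\nu}_r$ after substituting Eq. (2) into Eq. (1). All matrices and numerical values are illustrative placeholders, not the ECA_A9 coefficients, which are supplied by UUV Simulator.

```python
import numpy as np

# Sketch of the rigid-body dynamics in Eqs. (1)-(2).
# Substituting Eq. (2) into Eq. (1) and collecting the acceleration terms gives
#   (M_RB + M_A) * nu_r_dot = -(C_RB(nu_r) + C_A(nu_r) + D(nu_r)) * nu_r - g_0 - g(eta)
# All matrices below are placeholders; the real ECA_A9 coefficients come from UUV Simulator.

def acceleration(nu_r, M_RB, M_A, C_RB, C_A, D, g0, g_eta):
    """Solve the combined Eqs. (1)-(2) for the body-frame acceleration nu_r_dot."""
    M = M_RB + M_A                                # total inertia (rigid body + added mass)
    rhs = -(C_RB + C_A + D) @ nu_r - g0 - g_eta   # Coriolis, damping, and restoring terms
    return np.linalg.solve(M, rhs)

# Hypothetical 6-DOF state and diagonal placeholder matrices
nu_r  = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])       # surge-only velocity
M_RB  = np.diag([70.0, 70.0, 70.0, 10.0, 40.0, 40.0])  # rigid-body inertia (placeholder)
M_A   = np.diag([5.0, 60.0, 60.0, 1.0, 20.0, 20.0])    # added mass (placeholder)
C_RB  = np.zeros((6, 6))                               # Coriolis matrices evaluated at nu_r
C_A   = np.zeros((6, 6))
D     = np.diag([10.0, 80.0, 80.0, 5.0, 30.0, 30.0])   # damping (placeholder)
g0    = np.zeros(6)                                    # gravity restoring forces
g_eta = np.zeros(6)                                    # buoyancy restoring forces g(eta)

print(acceleration(nu_r, M_RB, M_A, C_RB, C_A, D, g0, g_eta))  # nu_r_dot
```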
2.2. Algorithm mechanism
The basic principle of RL is that the agent selects an appropriate
action according to the current state and its behavioral policy, interacts
with the environment by performing the action, and receives a reward
from the environment. Then, the environment updates the state of the
agent. By cycling through the steps above, the agent continuously updates
its behavioral policy and learns the best policy to complete the task.
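As a minimal sketch of this interaction cycle, the hypothetical loop below shows one episode of state observation, action selection, reward collection, and policy update; the environment and agent interfaces are assumed for illustration and are not the authors' implementation.

```python
# Sketch of the state-action-reward cycle described above.
# The environment and agent interfaces (reset, step, select_action, update)
# are hypothetical placeholders.

def run_episode(env, agent, max_steps=200):
    state = env.reset()                                     # environment provides the initial state
    for _ in range(max_steps):
        action = agent.select_action(state)                 # choose action from the current policy
        next_state, reward, done = env.step(action)         # interact and receive the reward
        agent.update(state, action, reward, next_state, done)  # improve the behavioral policy
        state = next_state                                  # environment updates the agent's state
        if done:
            break
```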
To overcome the shortcomings of RL in dealing with high-
dimensional state spaces, scholars combined RL with deep neural
networks, namely the DRL method. DQN (Mnih et al., 2015) is the first
successful and influential algorithm of the DRL method; however, DQN can
only handle discrete and low-dimensional action spaces. Therefore, to
address the shortcomings of the DQN algorithm in the continuous action
control problem, scholars proposed the DDPG algorithm (Lillicrap et al.,