DRL methods and AUV motion control were mostly conducted in the
underwater horizontal plane, and there were relatively few studies on
motion control in the vertical plane and three-dimensional space.
In particular, for AUV motion control in three-dimensional space,
six-degree-of-freedom control was rarely involved, making these
methods difficult to employ in real experiments.
Therefore, this work explores AUV motion control in six degrees of
freedom based on posture control in three-dimensional space. As the X-
rudder AUV has higher safety, better maneuverability, and lower noise
(Xia et al., 2020), the present work chooses the torpedo-like X-rudder
under-actuated AUV as the research object. Based on the principle of
the DRL method, the present work uses the DDPG algorithm to train the
AUV on the Gazebo simulation platform, and the experimental results verify
the feasibility of the control strategy. First, the AUV agent is trained to
realize posture adjustment and maintenance. On this basis, data processing
methods to expand and stabilize the navigation control capability of the
AUV are proposed, so that the DDPG algorithm can be quickly deployed
for AUV motion control in six degrees of freedom. Then
posture adjusting, position tracking, and trajectory control experiments
are conducted successfully. The results prove that the proposed control
strategy has remarkable task generalization ability, with potential for
path planning, trajectory tracking, and obstacle avoidance, etc.
In this paper, the mathematical model and algorithm mechanism are
introduced in Section 2. Section 3 explains the detailed control strategy.
The posture control and trajectory control results are presented in Sec-
tion 4. Section 5 concludes the paper and discusses further research
interests.
2. Mathematical model and algorithm mechanism
2.1. Mathematical model
The ECA_A9 AUV, which has a torpedo-like shape and an under-
actuated X-rudder layout, is chosen for the following research. It has a
conventional axially mounted propulsion system, as shown in Fig. 1. The
AUV is approximately 2 m long and weighs approximately 70 kg, and it can
navigate underwater for more than 10 h. The external structure of this
AUV is similar to a submarine or torpedo, with a propulsion system
deployed at the stern, four independent fins on the tail, and an
additional structure on the top, without a sail rudder or bow rudder.
The East-North-Up coordinate system is used in the simulation
environment, as shown in Fig. 2, where the red axis represents the X-axis,
the green axis represents the Y-axis, and the blue axis represents the
Z-axis.
In the coordinate system mentioned above, the horizontal plane z = 0
is set as the sea level, and (0, 0, −50) is designated as the initial
position of the AUV. In this paper, the AUV is considered to have six
degrees of freedom, and the AUV model is based on Fossen's equations
(Fossen, 2011), as shown in Eq. (1):
$$M_{RB}\dot{\nu}_r + C_{RB}(\nu_r)\nu_r + g_0 = \tau_g \tag{1}$$
where $M_{RB}$ is the rigid-body inertia matrix, $\nu_r$ the velocity vector,
$C_{RB}(\nu_r)$ the matrix of rigid-body Coriolis and centripetal forces,
$g_0$ the restoring forces of gravity, and $\tau_g$ the external forces and
torques. $\tau_g$ can be calculated with Eq. (2):
$$\tau_g = -M_A\dot{\nu}_r - C_A(\nu_r)\nu_r - D(\nu_r)\nu_r - g(\eta) \tag{2}$$
where $M_A$ is the added-mass inertia matrix, $C_A(\nu_r)$ the matrix of
added-mass Coriolis and centripetal forces, $D(\nu_r)$ the damping matrix,
and $g(\eta)$ the restoring forces of buoyancy. Because this study conducts
its simulation experiments on the Robot Operating System, the Gazebo
platform, and the UUV Simulator project (Manhaes et al., 2016), the above
parameters characterizing the AUV's underwater dynamics are already
integrated into the open-source UUV Simulator project.
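To illustrate how Eqs. (1) and (2) combine, the minimal sketch below solves for the body-frame acceleration $\dot{\nu}_r$ after substituting Eq. (2) into Eq. (1). All matrices and numerical values are illustrative placeholders, not the ECA_A9 coefficients, which are supplied by UUV Simulator.

```python
import numpy as np

# Sketch of the rigid-body dynamics in Eqs. (1)-(2).
# Substituting Eq. (2) into Eq. (1) and collecting the acceleration terms gives
#   (M_RB + M_A) * nu_r_dot = -(C_RB(nu_r) + C_A(nu_r) + D(nu_r)) * nu_r - g_0 - g(eta)
# All matrices below are placeholders; the real ECA_A9 coefficients come from UUV Simulator.

def acceleration(nu_r, M_RB, M_A, C_RB, C_A, D, g0, g_eta):
    """Solve the combined Eqs. (1)-(2) for the body-frame acceleration nu_r_dot."""
    M = M_RB + M_A                                # total inertia (rigid body + added mass)
    rhs = -(C_RB + C_A + D) @ nu_r - g0 - g_eta   # Coriolis, damping, and restoring terms
    return np.linalg.solve(M, rhs)

# Hypothetical 6-DOF state and diagonal placeholder matrices
nu_r  = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])       # surge-only velocity
M_RB  = np.diag([70.0, 70.0, 70.0, 10.0, 40.0, 40.0])  # rigid-body inertia (placeholder)
M_A   = np.diag([5.0, 60.0, 60.0, 1.0, 20.0, 20.0])    # added mass (placeholder)
C_RB  = np.zeros((6, 6))                               # Coriolis matrices evaluated at nu_r
C_A   = np.zeros((6, 6))
D     = np.diag([10.0, 80.0, 80.0, 5.0, 30.0, 30.0])   # damping (placeholder)
g0    = np.zeros(6)                                    # gravity restoring forces
g_eta = np.zeros(6)                                    # buoyancy restoring forces g(eta)

print(acceleration(nu_r, M_RB, M_A, C_RB, C_A, D, g0, g_eta))  # nu_r_dot
```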
2.2. Algorithm mechanism
The basic principle of RL is that the agent selects an appropriate
action according to the current state and its behavioral policy, interacts
with the environment by performing the action, and receives a reward
from the environment. Then, the environment updates the state of the
agent. By cycling through the steps above, the agent continuously updates
its behavioral policy and learns the best policy to complete the task.
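As a minimal sketch of this interaction cycle, the hypothetical loop below shows one episode of state observation, action selection, reward collection, and policy update; the environment and agent interfaces are assumed for illustration and are not the authors' implementation.

```python
# Sketch of the state-action-reward cycle described above.
# The environment and agent interfaces (reset, step, select_action, update)
# are hypothetical placeholders.

def run_episode(env, agent, max_steps=200):
    state = env.reset()                                     # environment provides the initial state
    for _ in range(max_steps):
        action = agent.select_action(state)                 # choose action from the current policy
        next_state, reward, done = env.step(action)         # interact and receive the reward
        agent.update(state, action, reward, next_state, done)  # improve the behavioral policy
        state = next_state                                  # environment updates the agent's state
        if done:
            break
```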
To overcome the shortcomings of RL in dealing with high-
dimensional state spaces, scholars combined RL with deep neural
networks, namely the DRL method. DQN (Mnih et al., 2015) is the first
successful and influential algorithm of the DRL method; however, DQN can
only handle discrete and low-dimensional action spaces. Therefore, to
address the shortcomings of the DQN algorithm in the continuous action
control problem, scholars proposed the DDPG algorithm (Lillicrap et al.,