and mobile nodes. The source node must be aware of its
own location and the location of its final destination.
There are also some routing protocols based on Machine
Learning techniques. Q-Routing, designed for packet routing
in telephone networks, is one of the earliest such works and is
based on the Q-learning technique [21]. It finds the best
routing path by considering not only the number of hops to
the destination but also the contribution of congestion, and
hence of each node's queue length, to the latency.
simulation shows that under high network load, the
algorithm outperforms the shortest-path algorithm, which
considers only the number of hops to the destination. It also
performs well under changing network topologies. Since it
uses only local information at each node and is fully
distributed, it can be adapted to wireless networks as well.
An algorithm named Dual Reinforcement Q-Routing was later
presented to make the algorithm converge faster [22].
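To make the update concrete, the following is a minimal sketch of the Q-Routing rule in [21], written in Python; the data structures, function names, and learning-rate value are our own illustrative assumptions rather than part of the original algorithm.

    # Minimal sketch of the Q-Routing update rule from [21]; names are
    # illustrative. Q[d][y] at node x estimates the total time to deliver
    # a packet bound for destination d when forwarded via neighbor y.

    def best_estimate(Q, d):
        """Neighbor minimizing the estimated delivery time to d, and that
        estimate (min_z Q[d][z]); y reports this value back upstream."""
        y = min(Q[d], key=Q[d].get)
        return y, Q[d][y]

    def q_routing_update(Q, d, y, queue_delay, tx_delay, t_neighbor, eta=0.5):
        """After x forwards a packet for d to neighbor y, y reports back
        t_neighbor = min_z Q_y[d][z]; x then moves its estimate toward the
        observed queueing delay + transmission delay + y's estimate."""
        target = queue_delay + tx_delay + t_neighbor
        Q[d][y] += eta * (target - Q[d][y])

Because the update uses only the neighbor's reported estimate and locally observed delays, each node can run it independently, which is what makes the scheme fully distributed.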
AdaR [23] is a self-configuring routing protocol for
wireless sensor networks based on reinforcement learning.
When a node forwards a packet to another according to the
current routing policy, both the action and the associated
reward are appended to the packet. When the base station
receives the packet, which then contains samples collected
along the routing path, it calculates new weights for the
linear quality function (Q) and broadcasts them throughout
the whole network to improve the forwarding policy.
However, learning the environment in this way requires
delivering packets from the sensors to a centralized base
station, which is costly in the context of UWSNs.
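The centralized step described above can be sketched as follows; the feature encoding, function names, and the LSTD-style least-squares fit are our assumptions for illustration, not necessarily AdaR's exact formulation.

    import numpy as np

    # Hedged sketch of a base-station update for a linear Q-function,
    # Q(s, a) = w . phi(s, a): each hop appends a (features, reward) sample
    # to the packet, the base station refits w and broadcasts it.

    def refit_weights(samples, gamma=0.9):
        """samples: (phi_sa, reward, phi_next) triples carried by one packet,
        where phi_next encodes the next hop's greedy state-action pair."""
        k = len(samples[0][0])
        A = np.zeros((k, k))
        b = np.zeros(k)
        for phi_sa, r, phi_next in samples:
            A += np.outer(phi_sa, phi_sa - gamma * phi_next)  # normal equations
            b += r * phi_sa
        return np.linalg.lstsq(A, b, rcond=None)[0]  # new weights to broadcast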
Ant colony optimization (ACO) is another Machine Learning
technique used in routing for mobile ad-hoc networks. ACO
is inspired by the behavior of ants, which exchange information
with each other by laying down pheromone on their way
back to the nest after they have found food. In ACO
routing protocols, packets act like artificial ants that deposit
pheromone on the nodes they visit from the source (nest) to
the sink (food), so that subsequent packets converge on
near-optimal paths. AntNet [24] is a proactive ACO-based
algorithm in which every node in the network generates a
forward ant packet at regular intervals. After the forward
ant reaches the destination, a backward ant packet is
dispatched and sent all the way back. ARA [25] is a reactive
ACO-based algorithm that divides the routing process into a
route discovery phase and a route maintenance phase. Both
protocols achieve high delivery rates and near-shortest paths.
However, they also inherit the disadvantages of
proactive and reactive routing protocols: as a proactive
protocol, AntNet introduces considerable overhead because
nodes send ant packets at regular intervals, while ARA
suffers from a long route discovery phase. POSANT, a
geographic ACO-based routing protocol, is presented in [26];
however, its assumption that GPS is available does not
hold in UWSNs.
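For concreteness, the generic ACO forwarding and reinforcement steps can be sketched as below; this is not the exact AntNet [24] or ARA [25] rule, and all names and parameter values are illustrative.

    import random

    # Generic ACO forwarding sketch: a node picks the next hop with
    # probability proportional to pheromone^alpha, and a backward ant
    # reinforces the pheromone along the path it retraces.

    def choose_next_hop(pheromone, neighbors, alpha=1.0):
        """pheromone[n] is the pheromone level toward the sink via n."""
        weights = [pheromone[n] ** alpha for n in neighbors]
        return random.choices(neighbors, weights=weights, k=1)[0]

    def reinforce_path(pheromone, path, deposit=1.0, evaporation=0.1):
        """Backward ant: evaporate all entries, then deposit on the path."""
        for n in pheromone:
            pheromone[n] *= 1.0 - evaporation
        for n in path:
            pheromone[n] = pheromone.get(n, 0.0) + deposit

Evaporation lets the network forget stale routes, which is why ACO schemes cope with topology changes at the cost of the control traffic noted above.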
1.2 Paper Overview and Contributions
In this paper, we propose an adaptive, energy-efficient,
and lifetime-aware routing protocol, QELAR, to address
the issues mentioned above. QELAR is based on Q-learning,
a reinforcement learning technique for solving sequential
decision problems. By learning the environment and
evaluating an action-value function (the Q-value), which
gives the expected reward of taking an action in a given
state, the distributed learning agent is able to make
decisions automatically.
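For concreteness, the standard one-step Q-learning backup underlying this framework is sketched below in its textbook form; QELAR's specific states, actions, and rewards are defined later in the paper, and the names and parameter values here are illustrative.

    from collections import defaultdict

    Q = defaultdict(float)  # Q[(state, action)] -> estimated expected reward

    def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        """One-step Q-learning backup:
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
        best_next = max((Q[(s_next, a2)] for a2 in actions), default=0.0)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])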
We find that Q-learning performs well in UWSNs in the
following respects, exploiting or overcoming the important
properties of UWSNs:
- Low Overhead. In the underwater environment, with its
low bandwidth, high delay, and high acoustic communication
power consumption, broadcasting data throughout the
network is costly. Our proposed routing protocol is a hybrid
of proactive and reactive protocols: each node keeps routing
information only for its direct neighbors, a small subset of
the network, so the drawbacks of both proactive and reactive
routing protocols are avoided. The routing information is
updated by one-hop broadcasts rather than flooding.
- Dynamic Network Topology. In the harsh underwater
environment, link quality is not guaranteed. A node within
transmission range may become unavailable due to link
corruption, which results in topology changes; the movement
of nodes causes topology changes as well. With no GPS
available underwater, a routing protocol in such an
environment must be able to adapt to these changes. As we
show later, the Q-learning algorithm learns from the network
environment and adapts quickly to the current topology.
- Load Balance. Shortest-path routing algorithms reduce
both energy consumption and end-to-end delay by forcing
packets along the shortest path. However, nodes on the
shortest path drain their energy more quickly than others,
so shortest-path algorithms may reduce the lifetime of the
network. In contrast, once node energy is taken into
consideration in Q-learning, alternative paths can be chosen
so that network nodes are used in a fair manner.
- General Framework. Q-learning is a framework that
can be easily extended. Its behavior is largely determined
by the reward function, which gives a positive or negative
reinforcement to the agent after it makes a decision. By
judiciously designing the reward function, we make our
routing protocol energy-efficient and lifetime-aware: the
optimal policy (the selection of routing forwarders) is
derived through iterative Q-learning of the residual energy
of each node and the energy distribution among a group of
nodes (a hedged sketch of such a reward function follows
this list). Meanwhile, other factors such as end-to-end delay
and node density can easily be integrated as extensions, and
all the factors can be balanced according to our needs by
tuning the parameters in the reward function.
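As a hedged illustration of the General Framework point above, the sketch below shows one plausible shape for such a reward function; the terms, names, and weights are hypothetical stand-ins, not QELAR's actual reward function, which is derived later in the paper.

    def reward(hop_cost, residual_energy, group_energies, w1=1.0, w2=1.0):
        """Hypothetical per-hop reward (energies normalized to [0, 1]):
        penalize the hop itself, penalize forwarders with little residual
        energy, and penalize energy imbalance within the forwarder's
        neighborhood (group_energies)."""
        mean_e = sum(group_energies) / len(group_energies)
        imbalance = abs(residual_energy - mean_e)
        return -hop_cost - w1 * (1.0 - residual_energy) - w2 * imbalance

Tuning w1 and w2 corresponds to the parameter tuning mentioned in the General Framework item: larger values shift the policy away from the shortest path toward energy-balanced routes.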
More specifically, in our Q-learning-based routing
algorithm, the energy consumption of sensor nodes and the
residual-energy distribution among groups of nodes are
incorporated into the reward function as novel considerations.
With such a reward function, the forwarding policy can be
improved at runtime so as to achieve high energy efficiency
and a uniform energy distribution, and thus prolong the
network lifetime. The effects of different weights in the