A Stable and Energy-Efficient Routing Algorithm Based on Learning Automata Theory for MANET 45
is influenced by not only the relative mobility between nodes
but also the distribution of nodes. Overall energy conserva-
tion and balance should also be taken into consideration. In
In addition to the above two points, it must be stressed that the
traditional heuristic techniques and machine learning methods
used to design MANET routing protocols generally lack
scalability, require considerable hand-tuning, and incur
relatively high computation costs. To resolve these
problems, this paper proposes a stable and energy-efficient routing
algorithm for MANET using LA theory. Compared with tra-
ditional machine learning methods and heuristic algorithms,
LA theory has the following advantages: (1) LA theory is supported
by a complete mathematical proof [23-26]; (2) LA theory
is capable of global optimization at relatively low
cost in a dynamic environment; (3) LA theory has good
scalability, which is needed to optimize large-scale MANET
routing performance; (4) LA theory can map the computation
space to a probability space, ensuring normalization. The
optimization efficiency of traditional heuristic algorithms and
machine learning methods (e.g., ACO, PSO, GA) relies on the
construction of a heuristic function. In a dynamic
environment, it is difficult to construct a sufficiently good
heuristic function. In addition, these methods cannot always ensure
normalization. As a general rule, LA theory is used in
bio-computation and stochastic system control, both of which can be
regarded as dynamic environments. Owing to the dynamic
nature of MANET, we can use LA theory to find the
optimal route among the available paths.
In our solution, we begin by constructing a new node sta-
bility measurement model and defining the effective energy
ratio function. On this basis, we introduce a node-weighted
value function, which is used as the iteration parameter. We
then use LA theory to construct a MANET environment feed-
back model. In this feedback model, each node is equipped
with a learning automaton, enabling it to take actions by sensing
the surrounding environment. Based on LA theory, this pro-
cess can be represented through a rigorous linear probability
iteration; in other words, the relay node in the available paths
updates its weighted value according to the feedback signal,
which represents the result of sensing the environment (a
judging function, explained in Part A of Section IV, is used to
distinguish the type of feedback signal). When the feedback
signal is a reward signal, the node increases its weighted
value; conversely, it reduces its weighted value. Thus,
the current node can decide which node should be chosen as
the next hop from a group of available candidate nodes. Accordingly,
the path value defined in this feedback mechanism
is increased or reduced (before executing this mechanism,
we find all available paths from the source node to the
destination node, as explained in Part A of Section IV). Because of
the convergence of LA, we can finally choose the optimal path
with the highest path value from all of the available paths. The
chosen path will be stable enough to ensure overall energy
saving and balance.
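The linear weighted-value update described above can be sketched minimally as follows. The node names, the feedback labels, and the learning rates `a` and `b` are illustrative assumptions, not values taken from the paper:

```python
def update_weight(w, feedback, a=0.2, b=0.1):
    """Linearly reward or penalize a relay node's weighted value.

    w is assumed to lie in [0, 1]; a and b are illustrative
    learning rates for the reward and penalty cases.
    """
    if feedback == "reward":
        return w + a * (1.0 - w)   # reward: move the weight toward 1
    return w * (1.0 - b)           # penalty: shrink the weight toward 0

# Candidate next-hop nodes and their current weighted values (made-up data).
weights = {"n1": 0.5, "n2": 0.5, "n3": 0.5}

# Suppose the environment feedback rewards n2 twice and penalizes n1 once.
weights["n2"] = update_weight(weights["n2"], "reward")
weights["n2"] = update_weight(weights["n2"], "reward")
weights["n1"] = update_weight(weights["n1"], "penalty")

# The current node picks the candidate with the highest weighted value.
next_hop = max(weights, key=weights.get)
```

Under this scheme a repeatedly rewarded node accumulates weight toward 1 while a penalized node decays toward 0, which is why the path with the highest accumulated path value eventually dominates.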
III. PRELIMINARIES AND BACKGROUND
In this section, we present a brief overview of routing pro-
tocols and some preliminary information on LA theory.
A. Overview of Routing Protocols
Based on the relation with information [20,27], routing protocols
can be divided into several categories. In general, we
classify the protocols into three kinds: proactive, reactive and
hybrid protocols. Proactive routing protocols periodically
update routing messages so that data packet transmission
can be ensured. Reactive protocols initiate route discovery on
demand: when the source node has data packets to
be sent to a given destination node, it initiates route discovery
by broadcasting a route request packet. Upon receiving
the request packet, relay nodes rebroadcast it.
This process continues until the request packet arrives at the
destination node. Similar to the handshake mechanism, the
destination node generates a route reply packet and sends it
to the source node. In other words, the reply packet tracks
the reverse route already taken by the corresponding request
packet. As a compromise scheme, hybrid routing protocols
combine these two approaches and can be used in
hierarchically structured networks. Generally, proactive protocols
cause more energy consumption, which degrades
the network lifetime. Hybrid routing protocols use more control
information than reactive protocols and require a hierarchical
network structure.
B. Learning Automata
LA theory is a self-learning mechanism based on the theory
of stochastic processes [23,26]. As an adaptive decision-making
system, LA can enhance performance by using previous
knowledge to choose the best action from a limited set of
actions through repeated interactions with a random environ-
ment. Basic LA contains three key factors: a random environ-
ment, an automaton and a feedback system. The automaton
chooses actions based on the random environment and the en-
vironment responds to these actions by producing a feedback
signal. Based on the effect on the automaton, the feedback
signal can be divided into a ‘positive’ signal (reward signal)
and a ‘negative’ signal (penalty signal). Over a period of time, the
automaton can learn from the feedback signal to find an opti-
mal action (Fig. 1 shows the operating principle of LA).
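A minimal sketch of such an automaton, using the classic linear reward-penalty (L_R-P) update scheme against a simulated random environment; the learning rates `a` and `b` and the environment's reward probabilities are assumed for illustration:

```python
import random

class LearningAutomaton:
    """Linear reward-penalty (L_R-P) automaton over r actions.

    a is the reward learning rate, b the penalty learning rate
    (illustrative values, not taken from the paper).
    """
    def __init__(self, r, a=0.1, b=0.05):
        self.r, self.a, self.b = r, a, b
        self.p = [1.0 / r] * r  # start with a uniform action-probability vector

    def choose(self):
        # Sample an action index according to the current probabilities.
        x, acc = random.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if x < acc:
                return i
        return self.r - 1

    def update(self, i, rewarded):
        if rewarded:  # reward: shift probability mass toward chosen action i
            for j in range(self.r):
                if j == i:
                    self.p[j] += self.a * (1.0 - self.p[j])
                else:
                    self.p[j] *= (1.0 - self.a)
        else:         # penalty: shift probability mass away from action i
            for j in range(self.r):
                if j == i:
                    self.p[j] *= (1.0 - self.b)
                else:
                    self.p[j] = self.b / (self.r - 1) + (1.0 - self.b) * self.p[j]

# Simulated random environment: action 0 is rewarded 90% of the time,
# the other actions only 20% of the time.
random.seed(1)
la = LearningAutomaton(r=3)
for _ in range(2000):
    i = la.choose()
    la.update(i, random.random() < (0.9 if i == 0 else 0.2))
```

Both update rules keep the action-probability vector normalized (the terms sum to 1 after every step), which is the normalization property the paper attributes to LA theory; over repeated interactions the probability mass concentrates on the most frequently rewarded action.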
Definition 1 (Environment): The random environment is
an object interacting with the automaton. Usually, we set
E = {A, B, C} to describe the random environment, where
A = {α1, α2, ..., αt} represents the limited set of inputs
performed by the automaton, and αt represents the action at time