The total excitation of $+_i$ is

\[
\mathit{net}_i = \sum_{j=1}^{i} w_j x_j, \tag{2}
\]
where $x_j$ $(j = 1, 2, \ldots, i)$ are the inputs to node $+_i$. The output of the node $+_i$ is then calculated by

\[
\mathit{out}_i = f(a_i, b_i, \mathit{net}_i) = e^{-\left(\frac{\mathit{net}_i - a_i}{b_i}\right)^2}. \tag{3}
\]
The overall output of the flexible neural tree is computed recursively, from left to right, by a depth-first traversal.
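To make Eqs. (2) and (3) concrete, here is a minimal Python sketch of a single flexible neuron's computation; the function name and all numeric values are illustrative, not from the paper.

import math

def flexible_neuron(inputs, weights, a_i, b_i):
    # Eq. (2): total excitation net_i of node +_i.
    net_i = sum(w * x for w, x in zip(weights, inputs))
    # Eq. (3): Gaussian-like flexible activation out_i.
    return math.exp(-((net_i - a_i) / b_i) ** 2)

# Example: a +_2 node with two inputs (values are arbitrary).
out = flexible_neuron([0.5, 0.8], [1.2, -0.4], a_i=0.3, b_i=1.0)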
2.1. Tree structure optimization using PIPE
PIPE [28] combines probability vector coding of program instructions, population-based incremental learning [3], and
tree-coded programs. PIPE iteratively generates successive populations of functional programs according to an adaptive
probability distribution, represented as a probabilistic prototype tree (PPT), over all possible programs. Each iteration uses
the best program to refine the distribution. Thus, the structures of promising individuals are learned and encoded in the PPT.
The PPT is central to the PIPE algorithm: every program in a population is generated according to the PPT, so the PPT controls the population-generation process, and it is in turn adjusted as the population evolves. The PPT stores the knowledge gained from experience with programs (trees) and guides the evolutionary search. It holds the probability distribution over all possible programs that can be constructed from a predefined instruction set. The PPT is generally a complete $n$-ary tree with infinitely many nodes, where $n$ is the maximal number of function arguments.
Each node $N_j$ in the PPT, with $j \ge 0$, contains a variable probability vector $\vec{P}_j$. Each $\vec{P}_j$ has $n + m - 1$ components, where $n$ is the number of instructions in set $T$, $m - 1$ is the number of instructions in set $F$ (since the first instruction in $F$ is $+_2$), and $n + m - 1$ is the number of instructions in the instruction set $S$. Each component $P_j(I)$ of $\vec{P}_j$ denotes the probability of choosing instruction $I \in S$ at node $N_j$. Each vector $\vec{P}_j$ is initialized as follows:
\[
P_j(I) = \frac{P_T}{n} \quad \forall I : I \in T, \tag{4}
\]
\[
P_j(I) = \frac{1 - P_T}{m - 1} \quad \forall I : I \in F, \tag{5}
\]
where $P_T$ is the total probability of selecting terminal instructions.
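As a concrete illustration of Eqs. (4) and (5), the following Python sketch initializes one node's probability vector; the instruction names and the value of $P_T$ are arbitrary examples.

def init_node_probabilities(terminals, functions, p_t):
    # Eq. (4): each terminal instruction in T gets probability P_T / n.
    n = len(terminals)
    probs = {inst: p_t / n for inst in terminals}
    # Eq. (5): each function instruction in F gets (1 - P_T) / (m - 1);
    # len(functions) equals m - 1 because F starts at +_2.
    probs.update({inst: (1.0 - p_t) / len(functions) for inst in functions})
    return probs

# Example: T = {x1, x2, x3}, F = {+2, +3} (so m - 1 = 2), P_T = 0.6.
P_j = init_node_probabilities(["x1", "x2", "x3"], ["+2", "+3"], p_t=0.6)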
PIPE combines two forms of learning: generation-based learning (GBL) and elitist learning (EL). GBL learns from the best program of the current population, whereas EL learns from the best program found so far, also called the elitist program. GBL is PIPE's main learning algorithm; the purpose of EL is to use the best program found so far as an attractor. The overall PIPE learning framework is as follows:
1: repeat
2:    with probability $P_{el}$ do EL
3:    otherwise do GBL
4: until termination criterion is reached
Here $P_{el}$ is a user-defined constant in $[0, 1]$.
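Read as code, this frame is a simple stochastic dispatch. The Python sketch below assumes the two learning steps are supplied as callables, with a fixed iteration count standing in for the termination criterion; both assumptions are for illustration only.

import random

def pipe_learn(ppt, p_el, max_iters, elitist_learning, generation_based_learning):
    # Each iteration applies EL with probability P_el, otherwise GBL.
    for _ in range(max_iters):
        if random.random() < p_el:
            elitist_learning(ppt)            # adapt the PPT towards the elitist program
        else:
            generation_based_learning(ppt)   # sample, evaluate, adapt (Section 2.1.1)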
2.1.1. Generation-based learning
The main steps of generation-based learning are as follows:
(1) Creation of program population. A population of programs $PROG_j$ ($0 < j \le PS$; $PS$ is the population size) is generated using the prototype tree PPT.
(2) Population evaluation. Each program $PROG_j$ of the current population is evaluated on the given task and assigned a fitness value $FIT(PROG_j)$ according to the fitness function. The best program of the current population (the one with the smallest fitness value) is denoted $PROG_b$. The best program found so far (the elitist) is preserved in $PROG^{el}$.
(3) Learning from population. Prototype tree probabilities are modified such that the probability $P(PROG_b)$ of creating $PROG_b$ increases. This procedure is called adapting the PPT towards $PROG_b$. It is implemented as follows. First, $P(PROG_b)$ is computed by looking at all PPT nodes $N_j$ used to generate $PROG_b$:

\[
P(PROG_b) = \prod_{j : N_j \text{ used to generate } PROG_b} P_j\big(I_j(PROG_b)\big), \tag{6}
\]
where $I_j(PROG_b)$ denotes the instruction of program $PROG_b$ at node position $j$. Then a target probability $P_{TARGET}$ for $PROG_b$ is calculated:

\[
P_{TARGET} = P(PROG_b) + \big(1 - P(PROG_b)\big) \cdot lr \cdot \frac{\varepsilon + FIT(PROG^{el})}{\varepsilon + FIT(PROG_b)}. \tag{7}
\]
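A minimal Python sketch of Eqs. (6) and (7), assuming a program is represented as a map from PPT node position $j$ to the instruction chosen there, and each PPT node as a map from instruction to probability; lr and eps are user-chosen constants here.

def prob_of_program(ppt, prog_b):
    # Eq. (6): P(PROG_b) is the product, over all PPT nodes used to
    # generate PROG_b, of the probability of the instruction chosen there.
    p = 1.0
    for j, instruction in prog_b.items():
        p *= ppt[j][instruction]
    return p

def target_probability(p_b, fit_elitist, fit_b, lr, eps):
    # Eq. (7): target probability towards which P(PROG_b) is raised; a
    # better (smaller) FIT(PROG_b) relative to the elitist gives a larger
    # step. lr and eps are user-defined positive constants.
    return p_b + (1.0 - p_b) * lr * (eps + fit_elitist) / (eps + fit_b)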