Genetic CNN
Lingxi Xie, Alan Yuille
Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA
198808xc@gmail.com alan.l.yuille@gmail.com
Abstract
The deep convolutional neural network (CNN) is the
state-of-the-art solution for large-scale visual recognition.
Following some basic principles such as increasing network
depth and constructing highway connections, researchers
have manually designed a lot of fixed network architectures
and verified their effectiveness.
In this paper, we discuss the possibility of learning deep
network structures automatically. Note that the number
of possible network structures increases exponentially with
the number of layers in the network, which motivates us to
adopt the genetic algorithm to efficiently explore this large
search space. The core idea is to propose an encoding
method to represent each network structure in a fixed-length
binary string. The genetic algorithm is initialized by gen-
erating a set of randomized individuals. In each genera-
tion, we define standard genetic operations, e.g., selection,
mutation and crossover, to generate competitive individuals
and eliminate weak ones. The competitiveness of each
individual is defined as its recognition accuracy, which is
obtained via a standalone training process on a reference
dataset. We run the genetic process on CIFAR10, a small-
scale dataset, demonstrating its ability to find high-quality
structures which are little studied before. The learned pow-
erful structures are also transferrable to the ILSVRC2012
dataset for large-scale visual recognition.
1. Introduction
Visual recognition is a fundamental task in computer
vision, implying a wide range of applications. Recently, the
state-of-the-art algorithms on visual recognition are mostly
based on the deep Convolutional Neural Network (CNN).
Starting from the fundamental chain-styled network model-
s [19], researchers have been increasing the depth of the
network [32], as well as designing novel network mod-
ules [36][13] to improve recognition accuracy. Although
these modern networks have been shown to be efficient, we
note that their structures are manually designed, not learned,
which limits the flexibility of the approach.
In this paper, we reveal the possibility of automatically
learning the structure of deep neural networks. We consider
a constrained case, in which the network has a limited
number of stages, and each stage is defined as a set of pre-
defined building blocks such as convolution and pooling
layers. Even under these limitations, the total number of
possible network structures grows exponentially with the
number of layers, making it impractical to enumerate all the
candidates and find the best one. Instead, we formulate this
problem as optimization in a large search space, and apply
the genetic algorithm to exploring the space efficiently.
The genetic algorithm involves constructing an initial
population of individuals, and performing genetic opera-
tions to allow them to evolve in an iterative process. We
propose a novel encoding scheme to represent each network
structure as a fixed-length binary string, and define several
standard genetic operations, i.e., selection, mutation and
crossover, so that new competitive individuals are generated
from the previous generation and weak ones are eliminated.
The quality (fitness function) of each individual is deter-
mined by its recognition accuracy on a reference dataset.
To this end, we perform a complete training process for
each individual (i.e., network structure) which is inde-
pendent to the genetic algorithm. The genetic process
comes to an end after a fixed number of generations.
It is worth emphasizing that the genetic algorithm is
computationally expensive, as we need to undergo a com-
plete network training process for each generated individ-
ual. We adopt the strategy to run the genetic process on a
small dataset ( CIFAR10), in which we observe the ability
of the genetic algorithm to find effective network struc-
tures, and then transfer the learned top-ranked structures
to perform large-scale visual recognition. The learned
structures, most of which have been less studied before,
often perform better than the manually designed ones in
either small-scale or large-scale experiments.
The remainder of this paper is organized as follows.
Section 2 briefly introduces related work. Section 3 il-
lustrates the way of using the genetic algorithm to design
network structures. Experiments are shown in Section 4,
and conclusions are drawn in Section 5.
1379