
Network In Network

Min Lin (1,2), Qiang Chen (2), Shuicheng Yan (2)
(1) Graduate School for Integrative Sciences and Engineering
(2) Department of Electronic & Computer Engineering
National University of Singapore, Singapore
{linmin,chenqiang,eleyans}@nus.edu.sg
Abstract
We propose a novel deep network structure called “Network In Network”(NIN)
to enhance model discriminability for local patches within the receptive field. The
conventional convolutional layer uses linear filters followed by a nonlinear acti-
vation function to scan the input. Instead, we build micro neural networks with
more complex structures to abstract the data within the receptive field. We in-
stantiate the micro neural network with a multilayer perceptron, which is a potent
function approximator. The feature maps are obtained by sliding the micro net-
works over the input in a similar manner as CNN; they are then fed into the next
layer. Deep NIN can be implemented by stacking multiple of the above-described
structures. With enhanced local modeling via the micro network, we are able to uti-
lize global average pooling over feature maps in the classification layer, which is
easier to interpret and less prone to overfitting than traditional fully connected lay-
ers. We demonstrate state-of-the-art classification performance with NIN on
CIFAR-10 and CIFAR-100, and reasonable performance on the SVHN and MNIST
datasets.
1 Introduction
Convolutional neural networks (CNNs) [1] consist of alternating convolutional layers and pooling
layers. Convolutional layers take the inner product of a linear filter and the underlying receptive
field, followed by a nonlinear activation function, at every local portion of the input. The resulting
outputs are called feature maps.
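A minimal sketch of this computation is shown below (this code is not from the paper; the function name conv_feature_map, the tensor shapes, and the choice of ReLU as the nonlinearity are illustrative assumptions):

import numpy as np

def conv_feature_map(x, w, b):
    """Compute one feature map: x has shape (C, H, W), w has shape (C, k, k), b is a scalar bias."""
    C, H, W = x.shape
    _, k, _ = w.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[:, i:i + k, j:j + k]      # local receptive field
            out[i, j] = np.sum(patch * w) + b   # inner product with the linear filter
    return np.maximum(out, 0.0)                 # nonlinear activation (ReLU)

A full convolutional layer applies many such filters to the same input, yielding one feature map per filter.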
The convolution filter in CNN is a generalized linear model (GLM) for the underlying data patch,
and we argue that the level of abstraction is low with GLM. By abstraction we mean that the fea-
ture is invariant to the variants of the same concept [2]. Replacing the GLM with a more potent
nonlinear function approximator can enhance the abstraction ability of the local model. The GLM
can achieve a good extent of abstraction when the samples of the latent concepts are linearly
separable, i.e., when the variants of the concepts all live on one side of the separation plane defined
by the GLM. Thus, conventional CNN implicitly makes the assumption that the latent concepts are
linearly separable.
However, the data for the same concept often live on a nonlinear manifold; therefore, the
representations that capture these concepts are generally highly nonlinear functions of the input.
In NIN, the GLM is replaced with a "micro network" structure, which is a general nonlinear function
approximator. In this work, we choose the multilayer perceptron [3] as the instantiation of the micro
network, since it is a universal function approximator and a neural network trainable by
back-propagation.
The resulting structure, which we call an mlpconv layer, is compared with CNN in Figure 1. Both the
linear convolutional layer and the mlpconv layer map the local receptive field to an output feature
vector. The mlpconv layer maps the input local patch to the output feature vector with a multilayer
perceptron (MLP) consisting of multiple fully connected layers with nonlinear activation functions.
The MLP is shared among all local receptive fields. The feature maps are obtained by sliding the MLP
over the input in a similar manner as CNN and are then fed into the next layer.
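A minimal sketch of an mlpconv layer is given below (assuming PyTorch; the class name MLPConv, the channel sizes, and the kernel size are illustrative assumptions rather than the paper's exact configuration). Because the MLP is shared among all local receptive fields, its fully connected layers can be realized as 1x1 convolutions stacked on top of an ordinary convolution:

import torch
import torch.nn as nn

class MLPConv(nn.Module):
    """One mlpconv layer: a shared micro network (MLP) slid over the input."""
    def __init__(self, in_channels, hidden1, hidden2, out_channels,
                 kernel_size=5, padding=2):
        super().__init__()
        self.net = nn.Sequential(
            # First MLP layer: linear map from each local patch to hidden1 units.
            nn.Conv2d(in_channels, hidden1, kernel_size, padding=padding),
            nn.ReLU(inplace=True),
            # Remaining MLP layers act per spatial position, i.e. 1x1 convolutions.
            nn.Conv2d(hidden1, hidden2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden2, out_channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)  # output feature maps, fed into the next layer

# Example: sliding the micro network over a batch of 32x32 RGB inputs.
x = torch.randn(8, 3, 32, 32)
feature_maps = MLPConv(3, 192, 160, 96)(x)  # shape (8, 96, 32, 32)

The 1x1 convolutions make the per-position computation a full MLP while keeping the output a set of feature maps with the same spatial layout as those of a conventional convolutional layer.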