Must Know Tips/Tricks in Deep Neural Networks
Deep Neural Networks, especially Convolutional Neural Networks (CNNs), allow computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state of the art in visual object recognition, object detection, text recognition and many other domains such as drug discovery and genomics.
In addition, many solid papers have been published on this topic, and some high-quality open-source CNN software packages have been made available. There are also well-written CNN tutorials and CNN software manuals. However, there is still a lack of a recent, comprehensive summary of the details of how to implement an excellent deep convolutional neural network from scratch. Thus, we collected and summarized many implementation details for designing and training your own deep networks.
We assume you already have basic knowledge of deep learning; here we present the implementation details (tricks or tips) of Deep Neural Networks, especially CNNs for image-related tasks, covering among other things:
- some tips during training
- selection of activation functions
- some insights found from figures
- methods for ensembling multiple deep networks
If there are any problems/mistakes in these materials and slides, or if there is something important/interesting you think should be added, please feel free to contact us.
Sec. 1: Data Augmentation
Since deep networks need to be trained on a huge number of training images to achieve satisfactory performance, if the original image data set contains only a limited number of training images, it is better to do data augmentation to boost performance. Also, data augmentation becomes a must when training deep networks.
There are many ways to do data augmentation, such as the popular horizontal flips, random crops and color jittering. Moreover,
you could try combinations of multiple different processing, e.g., doing the rotation and random scaling at the same time. In addition, you can try
to raise saturation and value (S and V components of the HSV color space) of all pixels to a power between 0.25 and 4 (same for all pixels within
a patch), multiply these values by a factor between 0.7 and 1.4, and add to them a value between -0.1 and 0.1. Also, you could add a value
in [-0.1, 0.1] to the hue (H component of HSV) of all pixels in the image/patch.
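The HSV jitter described above can be sketched as follows. This is a minimal NumPy version; it assumes the image has already been converted to HSV with all three channels scaled to [0, 1], and the final clipping to the valid range is our addition:

```python
import numpy as np

def jitter_hsv(hsv, rng=None):
    """Randomly perturb an HSV image (a sketch of the scheme above).

    `hsv` is assumed to be a float array of shape (H, W, 3), with the
    H, S and V channels all scaled to [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    out = hsv.copy()
    # raise S and V to a power in [0.25, 4] (same exponent for all pixels)
    power = rng.uniform(0.25, 4.0)
    # multiply S and V by a factor in [0.7, 1.4]
    factor = rng.uniform(0.7, 1.4)
    # add a value in [-0.1, 0.1] to S and V
    shift = rng.uniform(-0.1, 0.1)
    out[..., 1:] = np.clip(out[..., 1:] ** power * factor + shift, 0.0, 1.0)
    # add a value in [-0.1, 0.1] to the hue, wrapping around the color circle
    out[..., 0] = (out[..., 0] + rng.uniform(-0.1, 0.1)) % 1.0
    return out
```

Each call draws fresh random parameters, so applying it again to the same image produces a different augmented sample.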
Another kind of data augmentation is fancy PCA, proposed by Krizhevsky et al. when training AlexNet in 2012. Fancy PCA alters the intensities of the RGB channels in training images. In practice, you firstly perform PCA on the set of RGB pixel values throughout your training images. Then, for each training image, just add the following quantity to each RGB image pixel (i.e., I_xy = [I_R, I_G, I_B]^T):

    [p_1, p_2, p_3] [alpha_1 * lambda_1, alpha_2 * lambda_2, alpha_3 * lambda_3]^T

where p_i and lambda_i are the i-th eigenvector and eigenvalue of the 3x3 covariance matrix of RGB pixel values, respectively, and alpha_i is a random variable drawn from a Gaussian with mean zero and standard deviation 0.1. Please note that each alpha_i is drawn only once for all the pixels of a particular training image until that image is used for training again. That is to say, when the model meets the same training image again, it will randomly draw new alpha_i values for data augmentation. According to the authors, fancy PCA could approximately capture an important property of natural images, namely, that object identity is invariant to changes in the intensity and color of the illumination. Regarding performance, this scheme reduced the top-1 error rate by over 1% in the ImageNet 2012 competition.
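A sketch of fancy PCA in NumPy follows. Note one simplification: for brevity the PCA here is computed from the single input image, whereas the text above computes it once over the RGB values of the whole training set; the float [0, 1] image format and the final clipping are also our assumptions:

```python
import numpy as np

def fancy_pca(image, alpha_std=0.1, rng=None):
    """Fancy PCA color augmentation (a per-image sketch).

    `image` is assumed to be a float array of shape (H, W, 3)
    with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    pixels = image.reshape(-1, 3)
    # 3x3 covariance matrix of the RGB pixel values
    cov = np.cov(pixels, rowvar=False)
    # eigenvalues lambda_i and eigenvectors p_i (columns) of the covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    # one alpha_i per image, drawn from N(0, alpha_std)
    alphas = rng.normal(0.0, alpha_std, size=3)
    # quantity added to every pixel: [p1, p2, p3] [a1*l1, a2*l2, a3*l3]^T
    delta = eigvecs @ (alphas * eigvals)
    return np.clip(image + delta, 0.0, 1.0)
```

Since the same `delta` is added to every pixel, the perturbation shifts the overall illumination color of the image rather than adding per-pixel noise.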
Now we have obtained a large number of training samples (images/crops), but please do not hurry! Actually, it is necessary to do pre-processing on
these images/crops. In this section, we will introduce several approaches for pre-processing.
The first and simplest pre-processing approach is to zero-center the data and then normalize it, which can be done with two lines of Python code:
>>> import numpy as np
>>> X -= np.mean(X, axis=0)   # zero-center
>>> X /= np.std(X, axis=0)    # normalize
where X is the input data (NumIns×NumDim). Another form of this pre-processing normalizes each dimension so that the min and max along the dimension are -1 and 1, respectively. It only makes sense to apply this pre-processing if you have a reason to believe that different input features have different scales (or units), but they should be of approximately equal importance to the learning algorithm. In the case of images, the relative scales of pixels are already approximately equal (and in the range from 0 to 255), so it is not strictly necessary to perform this additional pre-processing step.
Another pre-processing approach, similar to the first one, is PCA whitening. In this process, the data is first centered as described above. Then, you
can compute the covariance matrix that tells us about the correlation structure in the data:
>>> X -= np.mean(X, axis=0)             # zero-center the data (important)
>>> cov = np.dot(X.T, X) / X.shape[0]   # compute the covariance matrix
After that, you decorrelate the data by projecting the original (but zero-centered) data into the eigenbasis:
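A self-contained sketch of this decorrelation step, together with the whitening division that commonly follows it, is shown below on toy data. The toy data and the small 1e-5 constant (added to avoid dividing by a near-zero eigenvalue) are our assumptions:

```python
import numpy as np

# toy data: 100 samples, 5 dimensions, with an artificial correlation
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] += X[:, 0]                       # make dimensions 0 and 1 correlated

X -= np.mean(X, axis=0)                  # zero-center the data
cov = np.dot(X.T, X) / X.shape[0]        # covariance matrix
U, S, V = np.linalg.svd(cov)             # columns of U are the eigenbasis
Xrot = np.dot(X, U)                      # decorrelate: project onto eigenbasis
Xwhite = Xrot / np.sqrt(S + 1e-5)        # whiten: unit variance per dimension
```

After the projection, the covariance of `Xrot` is diagonal (the dimensions are decorrelated), and after the division the covariance of `Xwhite` is approximately the identity matrix.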