A Basic Introduction to Separable Convolutions
Anyone who takes a look at the architecture of MobileNet will
undoubtedly come across the concept of separable convolutions. But
what are they, and how do they differ from a normal convolution?
There are two main types of separable convolutions: spatial separable
convolutions, and depthwise separable convolutions.
. . .
Spatial Separable Convolutions
Conceptually, this is the easier of the two, and it illustrates the
idea of separating one convolution into two well, so I’ll start with it.
Unfortunately, spatial separable convolutions have some significant
limitations, meaning that they are not heavily used in deep learning.
The spatial separable convolution is so named because it deals
primarily with the spatial dimensions of an image and kernel: the
width and the height. (The other dimension, the “depth” dimension,
is the number of channels of each image.)
A spatial separable convolution simply divides a kernel into two
smaller kernels whose product reproduces the original. The most common
case would be to divide a 3x3 kernel into a 3x1 and a 1x3 kernel, like so:
Image 1: Separating a 3x3 kernel spatially
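To make the split concrete, here is a minimal sketch in plain Python. The kernel values, the 5x5 test image, and the helper name conv2d_valid are all assumptions chosen for illustration; they are not taken from any library. The sketch builds a 3x3 kernel as the product of a 3x1 and a 1x3 kernel, then checks that convolving with the two small kernels in sequence gives the same result as convolving once with the 3x3 kernel.

```python
# Illustrative values only; any column/row pair behaves the same way.
col = [3, 4, 5]   # the 3x1 kernel (one value per row)
row = [1, 2, 3]   # the 1x3 kernel (one value per column)

# The 3x3 kernel these two jointly represent is their outer product.
kernel = [[c * r for r in row] for c in col]


def conv2d_valid(image, k):
    """Slide k over image with no padding and stride 1.

    Hypothetical helper for this sketch. Like deep learning frameworks,
    it does not flip the kernel (strictly, a cross-correlation).
    """
    kh, kw = len(k), len(k[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * k[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A small made-up 5x5 single-channel "image" with integer pixels.
image = [[(7 * i + 3 * j) % 11 for j in range(5)] for i in range(5)]

# One pass with the full 3x3 kernel.
direct = conv2d_valid(image, kernel)

# Two passes: first the 3x1 kernel, then the 1x3 kernel.
col_kernel = [[c] for c in col]   # shape 3x1
row_kernel = [row]                # shape 1x3
two_step = conv2d_valid(conv2d_valid(image, col_kernel), row_kernel)

print(direct == two_step)  # True
```

Both routes produce the same 3x3 output, which is the whole point of the separation: the two small kernels together stand in for the single larger one.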