the shared parameters (usually once for each forward- or backward-propagation step) and works
well for GPUs in a single server that share a high-speed bus [14]. It can be used with larger
models, as per-node hardware constraints are no longer a limitation, but it is highly vulnerable
to worker failures.
1.2.3 Performance metrics
The term performance has a double interpretation in these systems: on one hand it refers
to the predictive accuracy of the model, and on the other to the computational speed of the
process. The first is platform-independent and is the metric used to compare models against
one another, whereas the second depends on the platform on which the model is deployed and
is mainly measured by metrics such as:
• Speedup: ratio of the solution time of the sequential algorithm to that of its parallel
counterpart
• Efficiency: ratio of speedup to the number of processors
• Scalability: efficiency as a function of an increasing number of processors
Some of these metrics are highly dependent on the cluster configuration, the type of
network used, and how efficiently the framework uses the underlying libraries and manages
resources.
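As a minimal sketch of how these metrics relate, the snippet below computes speedup and efficiency from wall-clock times; all timing values and processor counts are hypothetical placeholders standing in for measurements taken on a real cluster.

import sys

def speedup(t_sequential: float, t_parallel: float) -> float:
    # Ratio of sequential solution time to parallel solution time.
    return t_sequential / t_parallel

def efficiency(t_sequential: float, t_parallel: float, n_processors: int) -> float:
    # Speedup divided by the number of processors used.
    return speedup(t_sequential, t_parallel) / n_processors

t_seq = 1200.0  # hypothetical: seconds per epoch on one processor
for p, t_par in [(2, 650.0), (4, 360.0), (8, 210.0)]:  # hypothetical timings
    s = speedup(t_seq, t_par)
    e = efficiency(t_seq, t_par, p)
    print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")

Scalability is then observed by tracking how the printed efficiency evolves as the number of processors grows: an efficiency that stays close to 1 indicates good scaling.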
1.3 Deep learning overview
Despite having gained a great deal of notoriety in recent years, research in multilayer neural
networks spans many decades [57]. Conventional machine learning techniques had difficulty
processing natural data in its raw form, so to make classifiers more powerful it was common
to use generic non-linear features such as kernel methods and to build multi-stage, hand-tuned
pipelines of extracted features and discriminative classifiers [55]. However, those generic features
did not allow the learner to generalize well from the training examples, so one of the main
advantages of these new methods is that good features can be learned automatically with a
general-purpose learning procedure and without human engineering [41]. That is why
deep learning techniques are also referred to as representation learning methods: they are fed
raw data and build multiple levels of representation, obtained by composing simple but
nonlinear modules that each transform the representation at one level into a representation
at a higher, slightly more abstract level [41, 7]. These methods have dramatically improved
the state of the art in recent years in multiple areas such as speech recognition, visual object
recognition and detection, drug discovery and genomics.
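As a minimal, hypothetical sketch of this composition, the snippet below stacks two simple nonlinear modules (an affine map followed by a ReLU), each transforming one level of representation into a slightly more abstract one. The weights here are random rather than learned, so it only illustrates the composition of modules, not the training procedure.

import numpy as np

def module(x, W, b):
    # One simple but nonlinear module: affine map followed by a ReLU.
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=8)                           # raw input representation
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)  # hypothetical first-layer weights
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)   # hypothetical second-layer weights

h1 = module(x, W1, b1)   # first-level representation
h2 = module(h1, W2, b2)  # higher, slightly more abstract representation
print(h1.shape, h2.shape)  # (16,) (4,)

In an actual deep network the weight matrices would be adjusted by the general-purpose learning procedure mentioned above, so that the learned representations become useful features rather than random projections.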
1.3.1 Early tendencies and evolution
Early works on multilayer neural networks date back to Ivakhnenko [34] for feedforward
multilayer perceptron-type networks and to Fukushima [20] for today's version of convolutional
neural networks, inspired by the structure of the visual nervous system, with lower-order
hypercomplex cells in the early layers and higher-order hypercomplex cells in the later ones.
Although these works largely established the structure of deep neural networks that was later
popularised, they still lacked a good learning algorithm for updating the weights during the
training phase. It was not until the mid-1980s that backpropagation was popularised for neural networks