2 Deep Learning and Parallel Computing Environment for Bioengineering Systems
tency and other factors make this approach untenable.
As others have noted, GPUs are designed to handle high-dimensional matrices, which are central to many ML models. TPUs are designed specifically for ML models and do not include the hardware required for image display.
1.1.3 Computational Intelligence
Computational intelligence deals with systems that adapt automatically and organize themselves according to their implementation environment. By possessing attributes such as knowledge discovery, data abstraction, association and generalization, such a system can learn and deal with new situations in changing environments. Silicon-based computational intelligence comprises hybrids of paradigms such as artificial neural networks, fuzzy systems and evolutionary algorithms, augmented with knowledge elements, which are often designed to mimic one or more aspects of carbon-based biological intelligence [3].
1.1.4 GPU, Deep Learning and Computational Intelligence
GPUs are inherently parallel processors, which helps reduce the execution time of deep learning algorithms. By running deep learning in parallel on a GPU, computational intelligence applications that involve images, videos, etc., can be trained much faster, and the overall execution time drops drastically.
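One common way GPUs speed up training is data parallelism. The mini-batch is split across devices, and each device computes a gradient on its shard. The gradients are then combined. The sketch below is a minimal, CPU-only illustration in plain Python, under stated assumptions. The linear model, the mean-squared-error loss and the helper names (`mse_gradient`, `shard`, `data_parallel_gradient`) are hypothetical. The shards are processed one after another rather than on real GPUs. The point it shows is that combining shard gradients weighted by shard size reproduces the full-batch gradient.

```python
def mse_gradient(w, b, xs, ys):
    """Gradient of mean squared error for the model y_hat = w * x + b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def shard(data, num_devices):
    """Split data into num_devices contiguous shards of nearly equal size."""
    size, rem = divmod(len(data), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        end = start + size + (1 if i < rem else 0)
        shards.append(data[start:end])
        start = end
    return shards


def data_parallel_gradient(w, b, xs, ys, num_devices):
    """Per-shard gradients combined with weights proportional to shard size."""
    total = len(xs)
    dw = db = 0.0
    for x_s, y_s in zip(shard(xs, num_devices), shard(ys, num_devices)):
        if not x_s:
            continue  # more devices than samples: this device gets no work
        # On real hardware each shard would be processed concurrently on its own GPU.
        g_w, g_b = mse_gradient(w, b, x_s, y_s)
        dw += g_w * len(x_s) / total
        db += g_b * len(x_s) / total
    return dw, db


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x + 1.0 for x in xs]  # targets generated by y = 2x + 1
full = mse_gradient(0.5, 0.0, xs, ys)
parallel = data_parallel_gradient(0.5, 0.0, xs, ys, num_devices=3)
print("full-batch gradient:   ", full)
print("data-parallel gradient:", parallel)
```

In a real framework, the per-shard loop would run concurrently on separate devices, and the weighted sum would be a collective reduction step. The arithmetic is the same.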
The rest of the chapter is organized as follows. Section 1.2 discusses the role and types of parallelization in deep learning. Section 1.3 describes the role of the GPU in parallel deep learning. Section 1.4 presents the data flow of parallelization and a numerical example of how parallelization works in deep learning for a real-time application. Section 1.5 gives the implementation details and screenshots, and Section 1.6 summarizes the chapter.
1.2 DEEP LEARNING AND PARALLELIZATION
In this section, we discuss what parallel processing is and which algorithms are suitable for parallel deep learning. The analysis is based on the execution time and throughput of the algorithms.
1.2.1 Parallel Processing Concepts
The concept of parallel processing arose to facilitate the analysis of huge datasets and the extraction of meaningful information from them. Speech processing, medical imaging, bioinformatics and many similar fields face the difficulty of analyzing huge amounts of complex data. However, for some problems the run-time complexity cannot be improved even with many processors, typically because each step depends on the result of the previous one.
A parallel algorithm is called efficient (cost-optimal) when its run-time complexity multiplied by the number of processors equals the best run-time complexity achievable in sequential processing. Equivalently, p processors finish the job in 1/p of the best sequential time. Not everything should be parallelized. User interaction, for example, is a serial task. If one thread redraws the screen while another thread is handling a click on the same element, the two operations conflict, so they have to be performed sequentially. Sequential processing can even be faster than parallel processing when the parallel version must first gather all the data in one place, a step the sequential version does not need [4].
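The efficiency criterion above can be checked numerically with the usual speedup and efficiency ratios. The sketch below is illustrative only. The timings are made-up numbers, and cost-optimality is tested with the exact equality p·T_p = T_1 rather than the asymptotic definition. It also includes Amdahl's law, a standard result not taken from this chapter, to show why a serial portion of the work limits the benefit of adding processors.

```python
def speedup(t_serial, t_parallel):
    """S = T_1 / T_p: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """E = S / p; E == 1.0 means all p processors do useful work all the time."""
    return speedup(t_serial, t_parallel) / p


def is_cost_optimal(t_serial, t_parallel, p, rel_tol=1e-9):
    """Check p * T_p == T_1 (simplified, non-asymptotic form of the criterion)."""
    return abs(p * t_parallel - t_serial) <= rel_tol * t_serial


def amdahl_speedup(serial_fraction, p):
    """Amdahl's law: best possible speedup when part of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


# Hypothetical timings in seconds for a job run on 4 processors.
print(speedup(120.0, 30.0), efficiency(120.0, 30.0, 4), is_cost_optimal(120.0, 30.0, 4))  # 4.0 1.0 True
print(speedup(120.0, 40.0), efficiency(120.0, 40.0, 4), is_cost_optimal(120.0, 40.0, 4))  # 3.0 0.75 False
print(amdahl_speedup(0.1, 4), amdahl_speedup(0.1, 1000))
```

With a 10% serial fraction, Amdahl's bound stays below 10x no matter how many processors are added.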
In a single-processor system, a set of inputs is given to the processor, which returns the output after processing. The processor can be made faster by increasing its clock frequency, but only up to a point. The higher the frequency, the more heat the processor dissipates, and beyond a certain frequency limit the chip overheats and can be damaged.
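The frequency limit can be made concrete with the standard first-order model of dynamic power in CMOS chips. This is a textbook relation, not taken from this chapter's sources. Here α is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

Raising f usually also requires raising V, so power, and therefore heat, grows roughly faster than linearly with clock frequency. This is why designers add processors instead of pushing the frequency further.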
FIG. 1.1 Single processor execution.
To overcome the limitation of the single-processor execution shown in Fig. 1.1, we move to parallel processing, where more than one processor is used to process the data. This way the workload is divided among multiple processors. See Fig. 1.2.
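Dividing a workload among processors can be sketched with Python's standard `concurrent.futures.ProcessPoolExecutor`. This is a minimal illustration, not the chapter's implementation. The task (a sum of squares), the round-robin `split` helper and the worker count of 4 are hypothetical choices. Each process computes a partial result on its chunk, and the partial results are combined at the end.

```python
from concurrent.futures import ProcessPoolExecutor


def partial_sum_of_squares(chunk):
    """Work done by one worker process on its share of the data."""
    return sum(x * x for x in chunk)


def split(data, workers):
    """Divide data into `workers` chunks using round-robin assignment."""
    return [data[i::workers] for i in range(workers)]


def parallel_sum_of_squares(data, workers=4):
    """Run one chunk per worker process and combine the partial sums."""
    chunks = split(data, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order; sum() combines the per-process results.
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    data = list(range(100_000))
    print(parallel_sum_of_squares(data))
```

For small inputs, the cost of starting processes and copying data to them can outweigh the gain. This echoes the earlier point that parallel execution is not always faster.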
Parallel computing has its own disadvantages, such as dependency between processors, i.e., one processor might have to wait for the result of a process running on another processor. In modern computing, each processing unit on a chip is called a core. Terms such as dual-core and multi-core denote the number of cores, whereas Intel's i3, i5 and i7 labels are product tiers whose core counts vary across generations.
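The dependency problem can be illustrated with two threads connected by a queue. The second stage cannot proceed until the first stage delivers its result, so it sits idle. This is a hypothetical sketch under the following assumptions:

- Threads stand in for processors. In CPython, the global interpreter lock prevents threads from running Python code simultaneously, but blocking on a dependency behaves the same way.
- `time.sleep` stands in for real computation.
- `os.cpu_count()` reports the number of logical CPUs, which can exceed the number of physical cores when hardware threading is enabled.

```python
import os
import queue
import threading
import time


def producer(out_q):
    """Stage 1 (first 'processor'): computes a value the next stage needs."""
    time.sleep(0.1)  # stand-in for real computation
    out_q.put(21)


def consumer(in_q, results):
    """Stage 2 (second 'processor'): idles until stage 1 delivers its result."""
    start = time.perf_counter()
    value = in_q.get()  # blocks here: this wait is the inter-processor dependency
    results["waited_s"] = time.perf_counter() - start
    results["value"] = value * 2


def run_pipeline():
    q = queue.Queue()
    results = {}
    stage2 = threading.Thread(target=consumer, args=(q, results))
    stage1 = threading.Thread(target=producer, args=(q,))
    stage2.start()  # start the dependent stage first so it has to wait
    stage1.start()
    stage1.join()
    stage2.join()
    return results


if __name__ == "__main__":
    print("logical CPUs reported by the OS:", os.cpu_count())
    r = run_pipeline()
    print(r["value"], f"(consumer idled for {r['waited_s']:.2f} s)")
```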