Accuracy evaluation of deep belief networks with fixed-point arithmetic
Jingfei Jiang 1*, Rongdong Hu 1, Mikel Luján 2, Yong Dou 1

1 Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

2 University of Manchester, Manchester, M13 9PL, UK
Received 12 June 2014, www.tsi.lv
Abstract
Deep Belief Networks (DBNs) are state-of-the-art Machine Learning techniques and among the most important unsupervised learning algorithms. Training DBNs is computationally intensive, which naturally leads to investigating FPGA acceleration. Fixed-point arithmetic can be used when implementing DBNs in FPGAs to reduce execution time, but its implications for accuracy are not clear. Previous studies have focused only on accelerators using a few fixed bit-widths. A contribution of this paper is a comprehensive experimental evaluation of the effect of bit-width on various configurations of DBNs. Explicit performance change points are identified across the range of bit-widths. The impact of approximating the sigmoid function, a required part of DBNs, is also evaluated. A mixed bit-width DBN is proposed that fits the bit-widths of FPGA primitives and achieves performance similar to the software implementation. Our results provide a guide to inform design choices on bit-widths when implementing DBNs in FPGAs, clearly documenting the trade-off in accuracy.
Keywords: deep belief network, fixed-point arithmetic, bit-width, FPGA
* Corresponding author e-mail: jingfeijiang@nudt.edu.cn
1 Introduction
Deep neural networks have become a “hot topic” in the Machine Learning community, with successful results demonstrated with Deep Belief Networks (DBNs) [1], denoising autoencoders [2], sparse coding [3] and others. DBNs have been shown to be among the best neural networks even for challenging recognition, mining and synthesis tasks. A DBN is built from a class of neural networks known as Restricted Boltzmann Machines (RBMs). Running a DBN is a time-consuming task due to its large scale and processing characteristics. Experiments have often been reported to take weeks in order to search the large parameter space (numbers of layers and neurons, learning rate, momentum and all kinds of regularization terms) and to compute millions of parameters (weights and biases). One good example is Quoc Le et al. [4], who used a Google cluster of 1,000 machines (16,000 cores) for a week to demonstrate the success of large-scale unsupervised learning from internet image recognition.
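As a rough illustration (not the authors' implementation), the computation that dominates RBM, and hence DBN, training is a dense multiply-accumulate followed by a sigmoid for every hidden neuron, repeated over many neurons and training samples; the layer sizes below are assumptions chosen only for illustration.

/* Hedged sketch of the core RBM computation h_j = sigmoid(b_j + sum_i v_i * W_ij).
 * N_VISIBLE and N_HIDDEN are illustrative, not the configurations evaluated here. */
#include <math.h>

#define N_VISIBLE 256
#define N_HIDDEN  256

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

void rbm_hidden_activation(const double v[N_VISIBLE],
                           const double W[N_VISIBLE][N_HIDDEN],
                           const double b[N_HIDDEN],
                           double h[N_HIDDEN])
{
    for (int j = 0; j < N_HIDDEN; ++j) {
        double s = b[j];
        for (int i = 0; i < N_VISIBLE; ++i)
            s += v[i] * W[i][j];        /* multiply-accumulate dominates the runtime */
        h[j] = sigmoid(s);              /* nonlinearity applied per hidden neuron */
    }
}

Scaling this inner loop to millions of weights, over many training epochs and hyper-parameter settings, is what makes acceleration attractive.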
The long execution time of the training phase and of prediction is one critical barrier that has restricted the mass adoption of DBNs. Interest in the acceleration of DBNs has built up in recent years, and FPGAs are attractive platforms for accelerating them. For example, an RBM of 256x256 nodes was tested on a platform of four Xilinx Virtex II FPGAs and gained a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor [5]. Using an Altera Stratix III FPGA, Kim et al. [6] also gained significant speedup for a 256x1024 RBM. Multi-FPGA solutions were discussed in [7, 8] to determine the scalability of RBMs.
Existing FPGA implementations of neural networks often use large arrays of regular processing units to map neurons, partially or wholly, at a time. Weights and neuron values are stored in on-chip RAM during processing and are swapped out to off-chip memory afterwards. It is too expensive to support a large number of floating-point units on chip and to store values in on-chip RAM using the standard double-precision floating-point representation. Many previous FPGA designs for neural networks therefore used fixed bit-widths (8, 16 or 32 bits). Bit-widths that are integral multiples of a byte are convenient for aligning with other components (such as IP cores and user interfaces) and easier to design. Previous works have mainly analysed the impact of bit-widths on the accuracy and execution time of earlier styles of neural networks [9-11]. All reported FPGA designs of RBMs (a building component of DBNs) likewise selected fixed-point arithmetic with a fixed bit-width, e.g. 16 bits in [6, 8] or 32 bits in [5], without analysing the implications for accuracy in depth. Thus, it is not clear whether this kind of fixed bit-width is really the most suitable and area-efficient choice for DBNs.
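To make the bit-width question concrete, the following sketch (our illustration, not code from the cited accelerators) shows how a real-valued weight could be quantized into a signed fixed-point format with a chosen total bit-width and fractional split; the particular Q-format below is an assumption, not the configuration evaluated in this paper.

/* Hedged sketch of fixed-point quantization with a configurable bit-width.
 * TOTAL_BITS and FRAC_BITS are illustrative assumptions. */
#include <stdint.h>
#include <math.h>

#define TOTAL_BITS 16   /* total bit-width, e.g. matching an FPGA multiplier port */
#define FRAC_BITS  12   /* bits allocated to the fractional part */

/* Quantize a real-valued weight into a signed fixed-point integer. */
static int32_t to_fixed(double x)
{
    const int32_t max = (1 << (TOTAL_BITS - 1)) - 1;    /* e.g.  32767 */
    const int32_t min = -(1 << (TOTAL_BITS - 1));       /* e.g. -32768 */
    int32_t q = (int32_t)lround(x * (1 << FRAC_BITS));  /* scale and round */
    if (q > max) q = max;                               /* saturate on overflow */
    if (q < min) q = min;
    return q;
}

/* Convert back; the difference from the original value is the quantization error. */
static double to_double(int32_t q)
{
    return (double)q / (1 << FRAC_BITS);
}

Narrowing TOTAL_BITS or shifting the integer/fraction split trades on-chip storage and multiplier width against quantization error, which is exactly the accuracy trade-off studied in the rest of this paper.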
Using a bit-width unequal to the machine word-length on a standard processor or GPU may rarely deliver any