FINN: A Framework for Fast, Scalable Binarized Neural
Network Inference
Yaman Umuroglu*†, Nicholas J. Fraser*‡, Giulio Gambardella*, Michaela Blott*, Philip Leong‡, Magnus Jahre† and Kees Vissers*
*Xilinx Research Labs; †Norwegian University of Science and Technology; ‡University of Sydney
yamanu@idi.ntnu.no
To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 2017.
ABSTRACT
Research has shown that convolutional neural networks con-
tain significant redundancy, and high classification accuracy
can be obtained even when weights and activations are re-
duced from floating point to binary values. In this paper,
we present Finn, a framework for building fast and flexible
FPGA accelerators using a heterogeneous streaming
architecture. By utilizing a novel set of optimizations
that enable efficient mapping of binarized neural networks
to hardware, we implement fully connected, convolutional
and pooling layers, with per-layer compute resources being
tailored to user-provided throughput requirements. On a
ZC706 embedded FPGA platform drawing less than 25 W
total system power, we demonstrate up to 12.3 million image
classifications per second with 0.31 µs latency on the MNIST
dataset with 95.8% accuracy, and 21906 image classifications
per second with 283 µs latency on the CIFAR-10 and SVHN
datasets with 80.1% and 94.9% accuracy, respectively. To
the best of our knowledge, ours are the fastest classification
rates reported to date on these benchmarks.
1. INTRODUCTION
Convolutional Neural Networks (CNNs) have dramatically
improved in recent years, their performance now exceeding
that of other visual recognition algorithms [14], and even sur-
passing human accuracy on certain problems [23, 28]. They
are likely to play an important role in enabling ubiquitous
machine vision and intelligence on all kinds of devices, but a
significant computational challenge remains. Modern CNNs
may contain millions of floating-point parameters and require
billions of floating-point operations to recognize a single im-
age. Furthermore, these requirements tend to increase as re-
searchers explore deeper networks. For instance, AlexNet [14]
(the winning entry in the ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) [22] in 2012) required 244 MB of
parameters and 1.4 billion floating-point operations (GFLOP)
per image, while VGG-16 [24] from ILSVRC 2014 required
552 MB of parameters and 30.8 GFLOP per image.
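These figures assume 32-bit floating-point parameters. As a rough back-of-the-envelope illustration (our arithmetic, not a result from this paper), reducing each weight to a single bit, as the binarized networks discussed below do, shrinks the parameter footprint by roughly 32×:
\[
\frac{244\,\mathrm{MB}}{4\,\mathrm{B/weight}} \approx 61\,\mathrm{M\ weights} \;\Rightarrow\; \approx 7.6\,\mathrm{MB\ at\ 1\ bit/weight},
\qquad
\frac{552\,\mathrm{MB}}{4\,\mathrm{B/weight}} \approx 138\,\mathrm{M\ weights} \;\Rightarrow\; \approx 17\,\mathrm{MB\ at\ 1\ bit/weight}.
\]
Footprints of this size begin to fit in the on-chip memory of modern FPGAs, which motivates the approach described next.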
While the vast majority of CNN implementations use
floating-point parameters, a growing body of research demon-
strates that this approach incorporates significant redundancy.
Recently, it has been shown [5, 26, 21, 12, 31] that neu-
ral networks can classify accurately using one- or two-bit
quantization for weights and activations. Such a combina-
tion of low-precision arithmetic and a small memory footprint
presents a unique opportunity for fast and energy-efficient
image classification using Field-Programmable Gate Arrays
(FPGAs). FPGAs have much higher theoretical peak per-
formance for binary operations compared to floating point,
while the small memory footprint removes the off-chip mem-
ory bottleneck by keeping parameters on-chip, even for large
networks. Binarized Neural Networks (BNNs), proposed by
Courbariaux et al. [5], are particularly appealing since they
can be implemented almost entirely with binary operations,
with the potential to attain performance in the teraoperations
per second (TOPS) range on FPGAs.
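To make the "almost entirely binary operations" claim concrete, below is a minimal sketch of the standard XNOR-popcount formulation of a binarized dot product with a thresholded activation. The bit encoding (1 for +1, 0 for -1), the helper names binary_dot and binary_neuron, and the use of a compiler popcount builtin are illustrative choices on our part, not Finn's actual HLS implementation.

#include <cstdint>
#include <cstddef>

// Signed dot product of two {-1,+1} vectors packed one element per bit
// (bit 1 encodes +1, bit 0 encodes -1). Assumes nbits is a multiple of 64
// so that no padding bits pollute the popcount.
int binary_dot(const uint64_t *w, const uint64_t *a, std::size_t nwords, int nbits) {
    int matches = 0;
    for (std::size_t i = 0; i < nwords; ++i) {
        // XNOR marks positions where weight and activation agree; count them.
        matches += __builtin_popcountll(~(w[i] ^ a[i]));
    }
    // Each agreement contributes +1 to the dot product, each disagreement -1.
    return 2 * matches - nbits;
}

// A binarized neuron: compare the dot product against a precomputed threshold
// (which can absorb bias and batch normalization) to produce a 1-bit output.
bool binary_neuron(const uint64_t *w, const uint64_t *a,
                   std::size_t nwords, int nbits, int threshold) {
    return binary_dot(w, a, nwords, nbits) >= threshold;
}

No multipliers or floating-point units are involved, which is why FPGA lookup tables can sustain very high binary operation counts.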
In this work, we propose Finn, a framework for build-
ing scalable and fast BNN inference accelerators on FPGAs.
Finn-generated accelerators can perform millions of classi-
fications per second with sub-microsecond latency, thereby
making them ideal for supporting real-time embedded appli-
cations such as augmented reality, autonomous driving and
robotics. Compute resources can be scaled to meet a given
classification rate requirement. We demonstrate Finn’s capa-
bilities with a series of prototypes for classifying the MNIST,
SVHN and CIFAR-10 benchmark datasets. Our classification
rate results surpass the best previously published results by
over 48× for MNIST, 2.2× for CIFAR-10 and 8× for SVHN.
To the best of our knowledge, this is the fastest reported
neural network inference implementation on these datasets.
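The classification-rate scaling mentioned above follows the usual behavior of a streaming pipeline: throughput is limited by the slowest stage. In the notation below (ours, not the paper's),
\[
\mathrm{FPS} \approx \frac{f_{\mathrm{clk}}}{\max_{l}\, II_{l}},
\]
where $f_{\mathrm{clk}}$ is the accelerator clock frequency and $II_l$ is the number of clock cycles layer $l$ needs to process one image; allocating more compute resources to the slowest layers reduces $\max_l II_l$ and thus raises the achievable classification rate.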
The novel contributions are:
• Quantification of peak performance for BNNs on FPGAs using a roofline model (the roofline relation is sketched after this list).
• A set of novel optimizations for mapping BNNs onto FPGAs more efficiently.
• A BNN architecture and accelerator construction tool, permitting customization of throughput.
• A range of prototypes that demonstrate the potential of BNNs on an off-the-shelf FPGA platform.
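For reference, the roofline model behind the first contribution bounds attainable performance by both peak compute and memory traffic (notation ours):
\[
P_{\mathrm{attainable}} = \min\left(P_{\mathrm{peak}},\; I \cdot B_{\mathrm{mem}}\right),
\]
where $P_{\mathrm{peak}}$ is the device's peak compute throughput (here, binary operations per second), $B_{\mathrm{mem}}$ is the off-chip memory bandwidth, and $I$ is the arithmetic intensity in operations per byte transferred; Section 3 applies this bound to BNNs on FPGAs.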
The rest of this paper is organized as follows: Section 2
provides background on CNNs, BNNs, and their hardware
implementations. Section 3 discusses BNN accuracy and
peak performance on FPGAs. Section 4 describes Finn’s