PipeCNN: An OpenCL-Based Open-Source FPGA
Accelerator for Convolution Neural Networks
Dong Wang, Ke Xu and Diankun Jiang
Institute of Information Science
Beijing Jiaotong University
Beijing 100044, China
Email: {wangdong, 17112071, 16125141}@bjtu.edu.cn
Abstract—Convolutional neural networks (CNNs) have been
employed in many applications, such as image classification, video
analysis and speech recognition. Being compute-intensive, CNNs
are widely accelerated by GPUs, at the cost of high power dissipation.
Recently, studies have explored FPGAs as CNN accelerators
because of their reconfigurability and their advantage in energy
efficiency over GPUs, especially now that OpenCL-based high-level
synthesis tools provide fast verification and implementation flows.
In this paper, we demonstrate PipeCNN, an efficient FPGA
accelerator that can be implemented on a variety of FPGA
platforms with reconfigurable performance and cost. The PipeCNN
project is openly accessible, and can thus be used either by
researchers as a generic framework to explore new hardware
architectures or by teachers as an off-the-shelf design example
for academic courses related to FPGAs.
I. INTRODUCTION
Convolutional neural network (CNN) [1], [2], as an emerging
deep learning architecture, has received significant attention
in various applications, such as video surveillance, image
search, speech recognition, and robot vision. Currently,
GPUs are widely adopted as hardware accelerators for training
deep neural networks. Yet they are generally energy-inefficient
for embedded applications. FPGAs, which provide massive
processing elements, reconfigurable interconnections and lower
power dissipation, are naturally suited to implementing neural
network circuits. Moreover, FPGAs can also flexibly support
reduced data precision at the circuit level, which lowers the
memory footprint and bandwidth requirements, resulting in
better energy efficiency than GPUs.
Studies such as [4], [5] have reported efficient CNN accelerators
on embedded FPGA platforms. However, the traditional
register-transfer-level (RTL) design flows adopted in these
studies require deep background knowledge in digital circuit
design and great effort in writing complex RTL code, along
with time-consuming simulation and compilation, before
one can actually run an accelerator on hardware. Given the rapid
development of deep learning, these unfriendly features of the
RTL-based design flow hinder domain experts from utilizing
FPGAs to explore new architectures for neural network
accelerators.
This work was supported by NNSF of China Grants No. 61574013 and 61532005.

[Figure: deeply pipelined OpenCL kernels (MemRD, Conv, Pooling, LRN, MemWR), labeled as NDRange and single-threaded kernels, connected by channel pipes and to global memory.]

Fig. 1. The top-level architecture of PipeCNN.

High-Level Synthesis (HLS) tools, which enable automatic
compilation from high-level programs (C/C++) to low-level
RTL specifications, have become increasingly popular in both
academia and industry. Compared with the traditional
methodology, HLS tools offer a faster hardware development
cycle and software-friendly programming interfaces that can
be easily integrated with user applications [3].
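To give a flavor of the HLS programming model, the following C-level sketch (illustrative only; the function name and shape are assumptions, not taken from PipeCNN's source) shows the multiply-accumulate loop at the heart of a convolution kernel, the kind of loop an OpenCL HLS compiler can unroll and pipeline into RTL:

```c
#include <stddef.h>

/* Multiply-accumulate loop as it would appear inside an OpenCL
 * kernel body. An HLS compiler can unroll this loop and pipeline
 * it into a chain of DSP blocks, with no hand-written RTL.
 * (Illustrative sketch; not PipeCNN's actual code.) */
float mac_loop(const float *window, const float *weights, size_t taps)
{
    float acc = 0.0f;
    for (size_t i = 0; i < taps; ++i)  /* candidate for #pragma unroll */
        acc += window[i] * weights[i];
    return acc;
}
```

In an RTL flow, the same computation would require explicitly described registers, control logic and timing; in the HLS flow, the compiler derives them from this loop.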
In this paper, we introduce PipeCNN, an efficient OpenCL-based
CNN accelerator for FPGAs. A set of configurable
OpenCL kernels is designed to accelerate a wide range of
neural network models. Throughput and memory bandwidth
optimization schemes are also presented and discussed. All
the design files are openly accessible and can be downloaded
from [6].
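As one way such kernels can be made configurable (a hypothetical sketch; the name VEC_SIZE and this exact scheme are assumptions, not necessarily PipeCNN's actual parameters), a compile-time vectorization factor lets the same source trade FPGA resources for throughput:

```c
#include <stddef.h>

/* Hypothetical compile-time knob: how many multiply-accumulate
 * units the synthesized kernel instantiates per clock cycle.
 * Larger values consume more DSP blocks but raise throughput. */
#ifndef VEC_SIZE
#define VEC_SIZE 4
#endif

/* Dot product processed VEC_SIZE elements at a time, mirroring
 * how a parameterized OpenCL kernel scales between small and
 * large FPGAs. len is assumed to be a multiple of VEC_SIZE. */
float dot_vectorized(const float *a, const float *b, size_t len)
{
    float acc = 0.0f;
    for (size_t i = 0; i < len; i += VEC_SIZE) {
        float partial = 0.0f;
        for (size_t v = 0; v < VEC_SIZE; ++v)  /* fully unrolled in HLS */
            partial += a[i + v] * b[i + v];
        acc += partial;
    }
    return acc;
}
```

Recompiling with a different VEC_SIZE changes the synthesized area and throughput without touching the algorithm, which is the kind of reconfigurable performance/cost trade-off the accelerator targets.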
In the final demo, PipeCNN was implemented and evaluated
on three different FPGA platforms: Cyclone-V SEA5 SoC,
Stratix-V GXA7 and Arria-10 AX115. CNN-based image
classification applications were accelerated by PipeCNN.
The processing speed and power consumption were measured
and demonstrated at runtime, showing scalable performance
and cost that can meet different application requirements
and resource constraints.
II. ARCHITECTURE DESIGN AND OPTIMIZATION
A. Accelerator Architecture
As shown in Fig. 1, PipeCNN consists of a group of
OpenCL kernels that are cascaded using Altera's OpenCL
Channels extension. Two data mover kernels, namely MemRD