Imperial College London
Department of Electrical and Electronic Engineering
Final Year Project Report 2017
Project Title: PYNQ Classification - Python on Zynq FPGA for Neural Networks
Student: Erwei Wang
CID: 00816456
Course: 4T
Project Supervisor: Prof P.Y.K. Cheung
Second Marker: Prof G.A. Constantinides
ABSTRACT
Convolutional Neural Networks (CNNs) have achieved considerable success in solving a wide range of
classification problems. Traditionally, embedded CNN application prototypes have been implemented on
CPU- or GPU-based machines because of their short development time, at the cost of performance and energy
efficiency. However, recent advancements in high level synthesis (HLS) tools and PYNQ development boards
are making the prototyping effort on FPGAs comparable to that on CPUs or GPUs, making FPGAs a good
option for prototyping embedded CNN applications. This report presents an open-source framework
designed to enable fast deployment of embedded CNN applications on FPGA platforms. My framework
provides HLS CNN layers, which can be parameterised for a wide range of network specifications, and
delivers state-of-the-art performance at low power consumption. Compared with an implementation on the
PYNQ ARM CPU, my CIFAR-10 prototype shows up to 43x acceleration while maintaining a classification
accuracy of 73.7% and an energy efficiency of 1.953 frames/J.
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to my supervisor, Professor P.Y.K. Cheung, who not only provided
unstinting support and invaluable guidance throughout my four years of study at Imperial College, but also sparked
my motivation to pursue a career in scientific research.
I would like to thank Dr. Peter Ogden for providing the PYNQ FPGA data transfer API design, which became the
backbone of my project's architecture.
I would also like to thank Michaela Blott, Cathal McCabe, Giulio Gambardella and Andrea Solazzo from Xilinx
Ireland Lab for their warm hospitality during our visit, as well as their invaluable guidance on techniques for
optimising CNN implementations on FPGAs. Thanks also to Patrick Lysaght from Xilinx Lab, San Jose, for initiating
the PYNQ project and providing all the support I needed to make this a successful project.
Special thanks to Stylianos Venieris and Junyi Liu from the Circuits and Systems lab, as well as Aaron Zhao and
Daryl Mah, who provided insightful ideas on the project and the report.
ACRONYMS AND ABBREVIATIONS
AI Artificial Intelligence
API Application Programming Interface
ASIC Application-specific Integrated Circuit
BLAS Basic Linear Algebra Subprograms
BRAM Block Random Access Memory
BNN Binarised Neural Network
CNN Convolutional Neural Network
CPU Central Processing Unit
DAG Directed Acyclic Graph
DSP Digital Signal Processor
DMA Direct Memory Access
FPGA Field-programmable Gate Array
GPU Graphics Processing Unit
HLS High Level Synthesis
HPC High-performance Computing
HTC High-throughput Computing
IP Intellectual Property
NIN Network in Network
NN Neural Network
OS Operating System
PYNQ Python Productivity for Zynq
RAM Random Access Memory
ReLU Rectified Linear Unit
RTL Register Transfer Level
SDF Synchronous Dataflow
SDFG Synchronous Dataflow Graph
SoC System on a Chip
CONTENTS
1. Introduction
2. Project Scope
3. Background
3.1. What is a Convolution Neural Network (CNN)?
3.2. Framework High Level Interface Design
3.2.1. Why Do We Need Another High Level CNN Framework?
3.2.2. Existing Field-programmable Gate Array (FPGA) CNN Frameworks and Their High Level Interfaces
3.2.3. Installing CNN Frameworks on Embedded ARM Chipset
3.3. FPGA Layer Intellectual Property (IP) Library Design
3.3.1. Why Is FPGA Good at Accelerating CNN?
3.3.2. Related Works on Accelerating 2D Convolution on FPGAs
3.4. Data Quantisation
3.4.1. Why Do We Quantise CNN?
3.4.2. Existing CNN Quantisation Schemes
3.5. PYNQ Platform
3.5.1. What is Python Productivity for Zynq (PYNQ) Platform?
3.5.2. Alternative Platforms for CNN Deployment
3.6. Summary
4. Implementation - Overall Architecture
5. Implementation - ARM Linux Operating System (OS) Side
5.1. Framework Installation and Setup on PYNQ Linux OS
5.1.1. Caffe
5.1.2. TensorFlow
5.1.3. Theano
5.1.4. Evaluation
5.2. Application Programming Interface (API) for FPGA-Central Processing Unit (CPU) Data Transmission
6. Implementation - Zynq FPGA Side
6.1. Data Streaming Architecture
6.2. Quantisation
6.2.1. 32-bit Floating-point Format
6.2.2. Fixed-point Data Quantisation
6.3. FPGA Layer IPs
6.3.1. Convolution Layer
6.3.2. Pooling Layer
6.3.3. Fully-connected Layer