IEICE TRANS. ELECTRON., VOL.E102–C, NO.7 JULY 2019
BRIEF PAPER
Special Section on Analog Circuits and Their Application Technologies
A ReRAM-Based Row-Column-Oriented Memory Architecture for
Convolutional Neural Networks
Yan CHEN†,††a), Jing ZHANG†, Yuebing XU†, Yingjie ZHANG†††, Nonmembers, Renyuan ZHANG††, Member, and Yasuhiko NAKASHIMA††, Fellow
SUMMARY An efficient resistive random access memory (ReRAM) structure is developed for accelerating convolutional neural networks (CNNs) through in-memory computation. A novel ReRAM cell circuit is designed with two-directional (2-D) accessibility. The entire memory system is organized as a 2-D array, in which specific memory cells can be accessed identically with both column and row locality. For the in-memory computation of CNNs, only the relevant cells in an identical sub-array are accessed by 2-D read-out operations, which is difficult to implement with conventional ReRAM cells. In this manner, the redundant (column or row) accesses of conventional ReRAM structures are avoided, eliminating unnecessary data movement when CNNs are processed in-memory. From the simulation results, the energy and bandwidth efficiency of the proposed memory structure are 1.4x and 5x those of a state-of-the-art ReRAM architecture, respectively.
key words: data locality, ReRAM, convolutional neural networks, row-column-oriented access
1. Introduction
Deep learning (DL) has achieved noticeable advances in a
series of cognitive applications, such as visual recognition,
object detection, speech recognition, and so forth [1]–[3].
In particular, convolutional neural networks (CNNs) have
been established as a powerful class of DL models for vi-
sual recognition. As CNN models grow deeper, there is an increasing need for powerful and efficient acceleration of CNN computation, especially for models approaching the scale of the human brain.
The emerging resistive random access memory (ReRAM) cells [4], [5] are promising for brain-scale CNN deployments, owing to their capability of efficiently performing arithmetic operations beyond data storage. This overcomes the well-known "memory wall" problem of conventional FPGA/ASIC-based accelerators [6], [7], which are deemed difficult to apply to brain-scale CNN deployments. In general, ReRAM-based accelerators mainly consist of process elements and storage components, both of which are implemented with ReRAM cells.
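As an illustration of this capability (our own sketch, not taken from the paper), the arithmetic role of a ReRAM crossbar is commonly modeled as an analog matrix-vector multiplication: weights are stored as cell conductances, input activations are applied as word-line voltages, and each bit-line current accumulates their products. A minimal idealized model in Python, ignoring device non-idealities (the function name crossbar_mvm is ours):

import numpy as np

def crossbar_mvm(G, V):
    # Idealized crossbar read-out: the current on bit line j is the
    # dot product of the word-line voltages V and column j of the
    # conductance matrix G (Kirchhoff's current law), i.e. I = G^T V.
    return G.T @ V

# 4x3 crossbar: a 4x3 weight matrix stored as conductances (siemens)
G = np.random.uniform(1e-6, 1e-4, size=(4, 3))
V = np.random.uniform(0.0, 0.2, size=4)  # read voltages on 4 word lines
I = crossbar_mvm(G, V)                   # 3 bit-line currents = MVM result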
Manuscript received November 9, 2018.
Manuscript revised January 31, 2019.
†The authors are with the College of Electrical and Information Engineering, Hunan University, China.
††The authors are with the Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma-shi, 630–0192 Japan.
†††The author is with the College of Computer Science and Electronic Engineering, Hunan University, China.
a) E-mail: chenyan1226@hnu.edu.cn
DOI: 10.1587/transele.2018CTS0001
Despite the excellent computation capability of ReRAM cells for CNN deployments, there is significant energy overhead in the storage component due to the massive number of memory accesses. PRIME [4] focuses on mapping DL applications onto the ReRAM crossbar array and dynamically configures ReRAM cells as process elements or as storage components for energy saving. PipeLayer [5] replicates the weight parameters of CNNs in ReRAM crossbar arrays before inference to reduce data movement and boost throughput, but it cannot save energy by fully exploiting the locality of the activations of CNNs. Since the weight parameters can be prepared before CNN inference, reducing the memory accesses for activations becomes critical for energy saving. Nevertheless, it is not easy to reuse the activations, because the reusable input activations reside in both the rows and the columns of the raw feature maps of CNNs. Intuitively, ReRAM memory components that enable both row and column accesses can achieve efficient data locality, because they only need to provide the few newly added row or column activations, as the sketch after this paragraph illustrates. Although RC-NVM [8] builds a ReRAM-based row/column memory component for in-memory database applications, it cannot be adopted for large-scale CNN deployments due to the severe sneak-path issue caused by its rigorous demand for symmetric ReRAM cells.
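To make the locality argument concrete, the following back-of-the-envelope sketch (ours, under simplifying assumptions) counts the activations fetched when a KxK convolution window slides horizontally by stride S. With 2-D (row-and-column) access only the S newly exposed columns are read; with row-only access, all K rows covering the window must be read again, even though most of their values were already fetched for the previous window:

def reads_per_step(K, S, row_column_access):
    # Activations fetched per horizontal window step, assuming
    # overlapping windows (S < K) and that a row-only memory must
    # re-read every row spanning the KxK window.
    assert S < K, "windows must overlap for reuse to apply"
    return K * S if row_column_access else K * K

# Example: 3x3 kernel, stride 1 -> 3 newly fetched values with 2-D
# access versus 9 re-read values with row-only access.
print(reads_per_step(3, 1, True), reads_per_step(3, 1, False))  # 3 9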
In this paper, we propose a ReRAM-based memory architecture composed of two-layered ReRAM cells and two control transistors. It enables two-directional (2-D) accesses, i.e., both row and column accesses, to exploit the locality of activations and reduce data movement in CNN inference. Evaluation results on representative CNN models show that the proposed design achieves 1.4x energy saving and 5.0x bandwidth saving over a state-of-the-art ReRAM architecture.
2. Preliminaries and Motivations
CNN models mainly consist of convolutional (Conv) layers and fully-connected (FC) layers. In particular, Conv layers account for over 90% of the computation in most representative CNN models [7]. Figure 1(a) depicts the convolving operations of Conv layers. Output activations (out) in each feature map are generated by convolving the N channels of shared kernel weights (w) with the input activations (ia) under a stride size S. M groups of w generate the M channels of out. Furthermore, the computation of Conv layers can unify that of FC layers.
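For reference, the convolving operation described above can be written directly in the paper's notation; the following NumPy sketch is ours (array layouts and names are assumptions, not code from the paper):

import numpy as np

def conv_layer(ia, w, S):
    # ia:  input activations, shape (N, H, W)
    # w:   M groups of N-channel KxK kernel weights, shape (M, N, K, K)
    # out: M output channels under stride S, shape (M, H_out, W_out)
    N, H, W = ia.shape
    M, _, K, _ = w.shape
    H_out, W_out = (H - K) // S + 1, (W - K) // S + 1
    out = np.zeros((M, H_out, W_out))
    for m in range(M):              # each group of w -> one channel of out
        for i in range(H_out):
            for j in range(W_out):
                patch = ia[:, i*S:i*S+K, j*S:j*S+K]
                out[m, i, j] = np.sum(patch * w[m])
    return out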