IEICE TRANS. ELECTRON., VOL.E102–C, NO.7 JULY 2019
BRIEF PAPER
Special Section on Analog Circuits and Their Application Technologies
A ReRAM-Based Row-Column-Oriented Memory Architecture for
Convolutional Neural Networks
Yan CHEN†,††a), Jing ZHANG†, Yuebing XU†, Yingjie ZHANG†††, Nonmembers, Renyuan ZHANG††, Member, and Yasuhiko NAKASHIMA††, Fellow
SUMMARY An efficient resistive random access memory (ReRAM) structure is developed for accelerating convolutional neural networks (CNNs) through in-memory computation. A novel ReRAM cell circuit is designed with two-directional (2-D) accessibility. The entire memory system is organized as a 2-D array, in which specific memory cells can be accessed identically with both column and row locality. For the in-memory computation of CNNs, only the relevant cells in an identical sub-array are accessed by 2-D read-out operations, which is difficult to implement with conventional ReRAM cells. In this manner, the redundant (column or row) accesses of conventional ReRAM structures are avoided, eliminating unnecessary data movement when CNNs are processed in-memory. From the simulation results, the energy and bandwidth efficiency of the proposed memory structure are 1.4x and 5x those of a state-of-the-art ReRAM architecture, respectively.
key words: data locality, ReRAM, convolutional neural networks, row-column-oriented access
1. Introduction
Deep learning (DL) has achieved noticeable advances in a
series of cognitive applications, such as visual recognition,
object detection, speech recognition, and so forth [1]–[3].
In particular, convolutional neural networks (CNNs) have
been established as a powerful class of DL models for vi-
sual recognition. As CNN models grow deeper, there is an increasing need for powerful and efficient acceleration of CNN computation, especially for models approaching the scale of the human brain.
The emerging resistive random access memory (ReRAM) cells [4], [5] are promising for brain-scale CNN deployments, owing to their capability of efficiently performing arithmetic operations beyond data storage. This overcomes the well-known "memory wall" problem of conventional FPGA/ASIC-based accelerators [6], [7], which are deemed difficult to apply to brain-scale CNN deployments. In general, ReRAM-based accelerators mainly consist of process elements and storage components, both of which are implemented with ReRAM cells.
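As an illustration of this capability (our own sketch, not taken from the paper), the arithmetic role of a ReRAM crossbar is commonly modeled as an analog matrix-vector multiplication: weights are stored as cell conductances, input activations are applied as word-line voltages, and each bit-line current accumulates their products. A minimal idealized model in Python, ignoring device non-idealities (the function name crossbar_mvm is ours):

import numpy as np

def crossbar_mvm(G, V):
    # Idealized crossbar read-out: the current on bit line j is the
    # dot product of the word-line voltages V and column j of the
    # conductance matrix G (Kirchhoff's current law), i.e. I = G^T V.
    return G.T @ V

# 4x3 crossbar: a 4x3 weight matrix stored as conductances (siemens)
G = np.random.uniform(1e-6, 1e-4, size=(4, 3))
V = np.random.uniform(0.0, 0.2, size=4)  # read voltages on 4 word lines
I = crossbar_mvm(G, V)                   # 3 bit-line currents = MVM result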
Manuscript received November 9, 2018.
Manuscript revised January 31, 2019.
†The authors are with the College of Electrical and Information Engineering, Hunan University, China.
††The authors are with the Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma-shi, 630–0192 Japan.
†††The author is with the College of Computer Science and Electronic Engineering, Hunan University, China.
a) E-mail: chenyan1226@hnu.edu.cn
DOI: 10.1587/transele.2018CTS0001
Despite the excellent computation capability of ReRAM cells for CNN deployments, there is significant energy overhead in the storage component due to the massive number of memory accesses. PRIME [4] focuses on mapping DL applications onto the ReRAM crossbar array and dynamically configures ReRAM cells as process elements or as storage components for energy saving. PipeLayer [5] replicates the weight parameters of CNNs in ReRAM crossbar arrays before inference to reduce data movement and boost throughput, but it cannot save energy by fully exploiting the locality of the activations of CNNs. Since the weight parameters can be prepared before CNN inference, reducing the memory accesses for activations becomes critical for energy saving. Nevertheless, it is not easy to reuse the activations, because the reusable input activations reside in both the rows and the columns of the raw feature maps of CNNs. Intuitively, ReRAM memory components that enable both row and column accesses can achieve efficient data locality, because they only need to provide the few newly added row or column activations, as the sketch after this paragraph illustrates. Although RC-NVM [8] builds a ReRAM-based row/column memory component for in-memory database applications, it cannot be adopted for large-scale CNN deployments due to the severe sneak-path issue caused by its rigorous demand for symmetric ReRAM cells.
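To make the locality argument concrete, the following back-of-the-envelope sketch (ours, under simplifying assumptions) counts the activations fetched when a KxK convolution window slides horizontally by stride S. With 2-D (row-and-column) access only the S newly exposed columns are read; with row-only access, all K rows covering the window must be read again, even though most of their values were already fetched for the previous window:

def reads_per_step(K, S, row_column_access):
    # Activations fetched per horizontal window step, assuming
    # overlapping windows (S < K) and that a row-only memory must
    # re-read every row spanning the KxK window.
    assert S < K, "windows must overlap for reuse to apply"
    return K * S if row_column_access else K * K

# Example: 3x3 kernel, stride 1 -> 3 newly fetched values with 2-D
# access versus 9 re-read values with row-only access.
print(reads_per_step(3, 1, True), reads_per_step(3, 1, False))  # 3 9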
In this paper, we propose a ReRAM-based memory architecture composed of two-layered ReRAM cells and two control transistors. It enables two-directional (2-D) accesses, i.e., both row and column accesses, to exploit the locality of activations and reduce data movement in CNN inference. Evaluation results on representative CNN models show that the proposed design achieves 1.4x energy saving and 5.0x bandwidth saving over a state-of-the-art ReRAM architecture.
2. Preliminaries and Motivations
CNN models mainly consist of convolutional (Conv) layers and fully-connected (FC) layers. In particular, Conv layers account for over 90% of the computation in most representative CNN models [7]. Figure 1(a) depicts the convolving operations of Conv layers. Output activations (out) in each feature map are generated by convolving the N channels of shared kernel weights (w) with the input activations (ia) under a stride size S. M groups of w generate the M channels of out. Furthermore, the computation of Conv layers can unify that of FC layers.
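For reference, the convolving operation described above can be written directly in the paper's notation; the following NumPy sketch is ours (array layouts and names are assumptions, not code from the paper):

import numpy as np

def conv_layer(ia, w, S):
    # ia:  input activations, shape (N, H, W)
    # w:   M groups of N-channel KxK kernel weights, shape (M, N, K, K)
    # out: M output channels under stride S, shape (M, H_out, W_out)
    N, H, W = ia.shape
    M, _, K, _ = w.shape
    H_out, W_out = (H - K) // S + 1, (W - K) // S + 1
    out = np.zeros((M, H_out, W_out))
    for m in range(M):              # each group of w -> one channel of out
        for i in range(H_out):
            for j in range(W_out):
                patch = ia[:, i*S:i*S+K, j*S:j*S+K]
                out[m, i, j] = np.sum(patch * w[m])
    return out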