IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 11, NOVEMBER 2014 1623
A Reliability-Aware Address Mapping Strategy for
NAND Flash Memory Storage Systems
Yi Wang, Min Huang, Zili Shao, Henry C. B. Chan, Luis Angel D. Bathen,
and Nikil D. Dutt, Fellow, IEEE
Abstract—The increasing density of NAND flash memory leads
to a dramatic increase in the bit error rate of flash, which greatly
reduces the ability of error correcting codes (ECC) to handle
multibit errors.
NAND flash memory is normally used to store
the file system metadata and page mapping information. Thus,
a broken physical page containing metadata may cause an unin-
tended and severe change in functionality of the entire flash.
This paper presents Meta-Cure, a novel hardware and file system
interface that transparently protects metadata in the presence of
multibit faults. Meta-Cure exploits built-in ECC and replication
in order to protect pages containing critical data, such as file
system metadata. Redundant pairs are formed at run time and
distributed to different physical pages to protect against failures.
Meta-Cure requires no changes to the file system, on-chip hierarchy,
or hardware implementation of the flash memory chip. We
evaluate Meta-Cure on a real embedded platform using a
variety of I/O traces. The evaluation platform adopts dual ARM
Cortex A9 processor cores with 64 Gb NAND flash memory. We
have evaluated the effectiveness of Meta-Cure on the New Technology
File System (NTFS). Experimental results show that
the proposed technique can reduce uncorrectable page errors by
70.38% with less than 7.86% time overhead in comparison with
conventional error correction techniques.
Manuscript received November 27, 2013; revised January 30, 2014
and April 7, 2014; accepted July 28, 2014. Date of current version
October 16, 2014. This work was supported in part by NSF
award CCF-1029783 (Variability Expedition), in part by the Research Grants
Council of the Hong Kong Special Administrative Region, China, under Grant
GRF 15213814, in part by the Germany/Hong Kong Joint Research Scheme
sponsored by the Research Grants Council of Hong Kong and the German
Academic Exchange Service (Reference G_HK021/12), in part by
the National Natural Science Foundation of China under Project 61272103 and
Project 61373049, in part by National 863 Program 2013AA013202, in part by
the Hong Kong Polytechnic University under Grant 4-ZZD7, Grant G-YK24,
Grant G-YM10, and Grant G-YN36, and in part by Academician Workstation
Construction Projects in Guangdong Province (2012B090500020). This paper
is a revised version; a preliminary version appeared in the
Proceedings of the 49th IEEE/ACM Annual Design Automation Conference
(DAC’12) [41]. This paper was recommended by Associate Editor J. Henkel.
Y. Wang is with the Guangdong Province Key Laboratory of Popular High
Performance Computers, Shenzhen University, Shenzhen 518060, China, and
also with the Department of Computing, Hong Kong Polytechnic University,
Hong Kong (e-mail: csywang@comp.polyu.edu.hk).
M. Huang is with Automatic Test and Control Institute, Harbin Institute of
Technology, Harbin 150001, China (e-mail: jimmyhuanghit@gmail.com).
Z. Shao is with Embedded Systems and CPS Laboratory, Department
of Computing, Hong Kong Polytechnic University, Hong Kong (e-mail:
cszlshao@comp.polyu.edu.hk).
H. C. B. Chan is with the Department of Computing, Hong Kong
Polytechnic University, Hong Kong (e-mail: cshchan@comp.polyu.edu.hk).
L. A. D. Bathen is with the Programming Systems Laboratory,
Intel Corporation, Santa Clara, CA 95052-8119 USA (e-mail:
danny.bathen@intel.com).
N. D. Dutt is with the Center for Embedded Computer Systems, University
of California, Irvine, Irvine, CA 92697 USA (e-mail: dutt@uci.edu).
Digital Object Identifier 10.1109/TCAD.2014.2347929
Index Terms—Error correcting codes (ECC), memory man-
agement, metadata,
NAND flash memory, redundancy, reliability.
I. INTRODUCTION
During the past decades, the capacity of NAND flash memory
has been increasing dramatically. The dramatic
improvements in capacity and price of NAND flash memory
have been driven by aggressive shrinking of process
geometry and the increase of the number of bits stored in
each memory cell. However, the increasing density of flash
causes severe reliability issues. Previous studies have shown
that most multilevel cell (MLC) flash chips experience a sharp
increase in bit error rate after a number of reprograms [1]. With
the trend of increasing capacity of flash memory, the bit error
rate will become even worse because of closer voltage levels
assigned to consecutive logic states.
Error correcting codes (ECC) are widely used in NAND
flash memory to ensure data integrity. ECC can effectively
protect data when the bit error rate is relatively low (e.g.,
SECDED ECC can provide single-error correction and double-
error detection). Since the strength of ECC is predefined and
fixed by the chip manufacturer, its ability to handle multibit
errors diminishes over the lifetime of the flash: as the flash wears,
error counts grow beyond what the ECC was originally designed to correct.
This poses a threat to the integrity of metadata (e.g., file system
metadata, page mappings) stored in flash [2]. If a flash mem-
ory page contains metadata, the data corruption of the page is
very serious, as it may cause an unintended change in the functionality
of the entire flash. While ECC-based approaches improve
reliability, they are limited by their fixed strength. This paper
presents Meta-Cure, a replication-based approach that significantly
improves flash reliability and thus extends the lifetime of flash.
Meta-Cure aims to enhance the built-in ECC and to provide a more
reliable NAND flash memory storage system.
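As a rough illustration of the SECDED behavior mentioned above, and of why a second copy helps once errors exceed the ECC's strength, the following Python sketch implements a toy extended Hamming(8,4) code. It is not the ECC used in real flash chips, which protect far larger units with stronger codes, and it is not Meta-Cure's actual mechanism. The names encode, decode, and read_with_replica are hypothetical and introduced here only for illustration.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]      # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # covers positions 4, 5, 6, 7
    for bit in code[1:]:
        code[0] ^= bit                         # overall parity over positions 1..7
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions of all set bits
    overall = 0
    for bit in code:
        overall ^= bit                         # 0 when overall parity is consistent

    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        # syndrome == 0 here means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with consistent overall parity: two flips detected.
        return None, "uncorrectable"

    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, status


def read_with_replica(primary, replica):
    """Hypothetical fallback: use the second copy if the first is uncorrectable."""
    data, status = decode(primary)
    if status == "uncorrectable":
        data, status = decode(replica)
    return data, status


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)
print(decode(flip(word, 5)))                    # one flipped bit
print(decode(flip(word, 5, 6)))                 # two flipped bits
print(read_with_replica(flip(word, 5, 6), word))
print(decode(flip(word, 1, 2, 4)))              # three flipped bits
```

With one flipped bit the decoder repairs the word, and with two it reports the word as uncorrectable, at which point the replica supplies the data. With three flipped bits this toy decoder reports "corrected" yet returns the wrong nibble, which is the kind of failure beyond the designed ECC strength that the paragraph above describes.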
In order to provide a reliable flash memory storage system,
the primary objective is to ensure the integrity of metadata
stored in flash while considering the distinct characteristics
of flash memory (i.e., limited erase count of flash memory
cells and “out-of-place update”). Most existing approaches
aim to minimize the amount of critical data and/or
the number of block erases [3], [4]. Reliability is
enhanced through the modification of different components
in the NAND flash architecture, such as file system [5], [6],
parallelism of solid-state drive (SSD) [7], [8], flash trans-
lation layer (FTL) [9], [10], or the optimization of garbage
0278-0070 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.