An Efficient Sudden-Power-off-Recovery Design with Guaranteed
Booting Time for Solid State Drives
Yu-Ming Chang(1,3), Ping-Hsien Lin(1,4), Ye-Jyun Lin(1,5), Tai-Chun Kuo(1,6), Yuan-Hao Chang(2,10), Yung-Chun Li(1,7), Hsiang-Pang Li(1,8), KC Wang(1,9)
(1) Macronix International Co., Ltd., Emerging System Lab., Hsinchu 300, Taiwan, R.O.C.
(2) Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, R.O.C.
{(3)sanchang, (4)pinghsienlin, (5)yejyunlin, (6)tjkuo, (7)monixslee, (8)sbli, (9)kcwang}@mxic.com.tw, (10)johnson@iis.sinica.edu.tw
Abstract
Solid state drives (SSDs), which deliver high-bandwidth and low-latency performance, have become the mainstream storage devices in modern systems. Over the past years, a great deal of research has been conducted to improve SSD performance or reliability through parallel or efficient address translation designs. By contrast, little work has been done to guarantee the booting/recovery time of SSDs after a sudden power-off. Motivated by the fact that fast-growing SSD capacity is gradually making existing scanning and recovery processes infeasible and unacceptable, we propose an efficient sudden-power-off-recovery design that recovers an SSD with guaranteed booting time. The proposed design was implemented on an SSD prototyping platform equipped with in-house NAND flash memories and was evaluated with various benchmarks. The results demonstrate that after a sudden power-off, the prototyped SSD can be recovered within a guaranteed and bounded booting time between 80 ms and 200 ms.
1. Introduction
Solid state drives (SSDs), which include multiple flash-memory chips, have become a popular alternative to hard disk drives (HDDs) in recent years because of their shock resistance, high energy efficiency, and high I/O performance. To be compatible with the existing storage interface, i.e., logical block addresses (LBAs), each SSD needs to keep track of the address translation information from LBAs to their corresponding physical addresses in flash memory. The address translation information is usually maintained/cached in the internal RAM of the SSD. Once a sudden power-off (or power loss) occurs, the address translation information in RAM is lost and must be recovered by scanning the whole flash-memory space. Such a scanning and recovery process is extremely time-consuming and is gradually becoming infeasible and unacceptable as storage capacity grows rapidly. This observation motivates us to explore a solution that can efficiently recover the address translation information and thereby make SSDs resilient to sudden power losses and system crashes.
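To make the scaling argument concrete, the short Python sketch below estimates a lower bound on the time needed to read every page of an SSD once. All device parameters (16 KiB pages, a 50 us page read, 32 perfectly parallel units) are hypothetical values chosen for illustration, not measurements from this paper, and transfer and ECC time are ignored.

```python
def full_scan_time_s(capacity_bytes, page_size_bytes, t_read_us, parallel_units):
    """Lower-bound time (seconds) to read every page of an SSD once.

    Assumes reads proceed in perfectly parallel rounds across `parallel_units`
    (e.g., chips or planes) and ignores data transfer and ECC decoding time.
    """
    pages = capacity_bytes // page_size_bytes
    rounds = -(-pages // parallel_units)  # ceiling division
    return rounds * t_read_us / 1_000_000


if __name__ == "__main__":
    # Hypothetical device: 16 KiB pages, 50 us page read, 32 parallel units.
    for tib in (1, 2, 4):
        print(tib, "TiB ->", full_scan_time_s(tib * 2**40, 16 * 2**10, 50, 32), "s")
```

Under these assumptions a 1 TiB drive already needs roughly 105 s just to read its pages, and the bound doubles with every doubling of capacity, which is why a full scan does not scale as a boot-time recovery strategy.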
An SSD usually consists of multiple flash-memory chips. Each chip is composed of a large number of blocks, and each block consists of a fixed number of pages, where a block is the basic unit of erase operations and a page is the unit of read/write operations. Each page is divided into a data area and a spare area. The data area stores user data, while the spare area, also called the out-of-band (OOB) area, maintains housekeeping information such as error correction codes (ECC) and information on logical block addresses. Because of the write-once property, a page cannot be overwritten unless its residing block is erased. A typical solution to this constraint is out-of-place update, which writes updated data to free pages to improve write performance. As a result, multiple versions of the same data may coexist in the SSD at the same time. The up-to-date version is called valid data, and the old versions are considered invalid data. Pages with valid data (resp. invalid data) are referred to as valid pages (resp. invalid pages). Because of out-of-place update, each SSD needs management software, i.e., the flash translation layer (FTL), to maintain the address translation information that maps each LBA to its corresponding valid page/data. Note that we use “address translation information” and “mapping information” interchangeably when there is no ambiguity. In addition, the FTL includes a garbage collector that is activated to reclaim the space occupied by invalid data when there is not enough free space in the SSD. Because each block endures only a limited number of program/erase (P/E) cycles, the FTL also includes a wear leveler that prolongs the lifetime of the SSD by spreading erase operations evenly across flash blocks so that no block wears out prematurely.
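The mechanics described above (write-once pages, out-of-place update, an LBA-to-page map kept in RAM, and LBA information in the OOB area) can be sketched as a toy page-level FTL in Python. This is an illustrative model, not the FTL used in this paper: the OOB fields (`oob_lba`, a global write sequence number `oob_seq`) and all class and method names are hypothetical, and garbage collection and wear leveling are deliberately left out. The `rebuild_by_full_scan` method models the baseline recovery discussed in the introduction, which must read every page to rebuild the lost map.

```python
class Page:
    """One flash page: a data area plus a (hypothetical) OOB area."""

    def __init__(self):
        self.data = None      # user data; None means the page is still free
        self.oob_lba = None   # LBA recorded in the spare (OOB) area
        self.oob_seq = None   # hypothetical global write sequence number


class ToyFTL:
    """Toy page-level FTL with out-of-place updates and a RAM-resident map.

    Garbage collection and wear leveling are not modeled; writes simply fail
    once every page has been programmed.
    """

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.flash = [[Page() for _ in range(pages_per_block)] for _ in range(num_blocks)]
        self.l2p = {}         # LBA -> (block, page); lives in RAM
        self.seq = 0
        self.next_free = 0    # linear index of the next free page

    def write(self, lba, data):
        blk, pg = divmod(self.next_free, self.pages_per_block)
        if blk >= len(self.flash):
            raise RuntimeError("out of free pages (no garbage collection in this toy)")
        page = self.flash[blk][pg]
        # Write-once property: a programmed page is never overwritten in place.
        assert page.data is None
        self.seq += 1
        page.data, page.oob_lba, page.oob_seq = data, lba, self.seq
        # Out-of-place update: any older copy of this LBA becomes invalid.
        self.l2p[lba] = (blk, pg)
        self.next_free += 1

    def read(self, lba):
        blk, pg = self.l2p[lba]
        return self.flash[blk][pg].data

    def power_loss(self):
        # Only the RAM-resident map is modeled as lost; the write frontier
        # (next_free, seq) is assumed to survive in this toy.
        self.l2p = {}

    def rebuild_by_full_scan(self):
        """Baseline recovery: read every page's OOB area; the newest copy wins."""
        newest = {}  # LBA -> (seq, (block, page))
        reads = 0
        for b, block in enumerate(self.flash):
            for p, page in enumerate(block):
                reads += 1
                if page.oob_lba is None:
                    continue
                old = newest.get(page.oob_lba)
                if old is None or page.oob_seq > old[0]:
                    newest[page.oob_lba] = (page.oob_seq, (b, p))
        self.l2p = {lba: loc for lba, (_, loc) in newest.items()}
        return reads


if __name__ == "__main__":
    ftl = ToyFTL(num_blocks=4, pages_per_block=4)
    ftl.write(7, b"v1")
    ftl.write(3, b"x")
    ftl.write(7, b"v2")     # out-of-place update of LBA 7
    ftl.power_loss()
    print(ftl.rebuild_by_full_scan(), ftl.read(7))
```

In the usage above, only three pages were written, yet the rebuild still reads all 16 pages; the number of reads depends on the device size, not on how much data actually changed.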
Due to the importance of address translation information in SSDs, many excellent FTL designs have been proposed to resolve the management issue and to achieve a good compromise between read/write performance and RAM space requirements [7,10,13,16]. Building on on-demand loading/storing of address translation information, some designs include an update mapping table that logs only the modified mapping information to reduce the overhead of loading/storing address translation information [14]. Another research direction is garbage collection (GC), which also has a significant impact on SSD performance. The simplest solution is the greedy policy, which selects the block with the largest number of invalid pages to minimize the overhead of moving valid pages out of the to-be-erased block [17]; others proposed hot-cold swapping and data clustering to improve GC performance by considering the age of valid data [6,9,11,12]. Meanwhile, to resolve the endurance issue, some researchers focused on wear-leveling designs that extend the lifetime of flash storage devices by moving data among flash blocks to prevent any block from wearing out excessively [1, 2, 5, 15]. Furthermore, as manufacturing technology advances, the reliability of flash memory has drawn much attention in recent years. To tackle this issue, some researchers proposed improving the reliability of flash memory by reducing write disturbance [4] or by adding parity information to strengthen error correction [3, 19]. However, little work focuses on how to efficiently and reliably recover the address translation information of SSDs after sudden power losses, even though the fast-growing capacity of SSDs is gradually making exhaustive scanning methods infeasible.
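The greedy GC policy mentioned above can be illustrated with a few lines of Python. This is a minimal sketch of the general policy, not the implementation from [17]: the per-block invalid-page counters and the `excluded` set (e.g., the block currently receiving writes) are assumed bookkeeping, and ties are broken by the lowest block index.

```python
def greedy_victim(invalid_counts, excluded=()):
    """Return the index of the block with the most invalid pages.

    invalid_counts[i] is the (hypothetically tracked) number of invalid pages
    in block i. Blocks listed in `excluded` are never chosen. Because max()
    returns the first maximal element, ties go to the lowest block index.
    """
    candidates = [i for i in range(len(invalid_counts)) if i not in excluded]
    if not candidates:
        raise ValueError("no candidate block for garbage collection")
    return max(candidates, key=lambda i: invalid_counts[i])


if __name__ == "__main__":
    pages_per_block = 64                  # hypothetical block size
    counts = [10, 60, 60, 3]
    victim = greedy_victim(counts, excluded={1})
    # For a full block, the valid pages that must be copied out before erasing:
    print(victim, pages_per_block - counts[victim])
```

Choosing the block with the most invalid pages minimizes the number of valid pages copied in the current GC round, but it ignores data age, which is what the hot-cold and clustering approaches cited above try to account for.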
In this work, an efficient sudden-power-off recovery (SPOR) design is proposed to enhance the reliability of SSDs with guaranteed booting time and without additional hardware support. The proposed design aims at recovering address translation information correctly and efficiently after any (normal or sudden) power-off. In particular, the design is configurable such that the booting (recovery) time can be bounded and guaranteed by adjusting, at runtime, how frequently address translation information is synchronized from RAM to flash memory. Note that the proposed design needs to read only a relatively small number of pages during system recovery, in contrast to past recovery designs that scan the whole flash-memory space. The evaluation results show that the average booting time is 132 ms across all investigated benchmarks. Moreover, the booting time is bounded between the theoretical best-case and worst-case values derived from our analysis.
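This part of the paper does not yet describe how the bound is realized, so the Python sketch below only illustrates, under a generic checkpoint-plus-replay assumption of our own rather than the paper's design, why capping the number of writes between synchronizations bounds the pages read at boot. All names and parameters (`map_pages`, `sync_interval`, a 50 us page read, 32 parallel units, a 100 ms budget) are hypothetical.

```python
import math


def worst_case_boot_reads(map_pages, sync_interval):
    """Worst-case pages read at boot under a generic checkpoint-plus-replay scheme.

    map_pages: pages holding the last synchronized mapping table (hypothetical).
    sync_interval: user-page writes allowed between two synchronizations.
    At most `sync_interval` pages can have been written since the last
    synchronization, so only their OOB areas need to be replayed.
    """
    return map_pages + sync_interval


def boot_time_ms(reads, t_read_us, parallel_units):
    """Read time in ms, assuming perfectly parallel rounds of page reads."""
    return math.ceil(reads / parallel_units) * t_read_us / 1000


def max_sync_interval(budget_ms, map_pages, t_read_us, parallel_units):
    """Largest sync interval whose worst-case boot time stays within budget_ms."""
    rounds = (budget_ms * 1000) // t_read_us     # whole read rounds that fit
    max_reads = rounds * parallel_units
    if max_reads < map_pages:
        raise ValueError("budget too small to read the mapping checkpoint itself")
    return max_reads - map_pages


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the paper.
    n = max_sync_interval(budget_ms=100, map_pages=2048, t_read_us=50, parallel_units=32)
    print(n, boot_time_ms(worst_case_boot_reads(2048, n), 50, 32))
```

The trade-off is the one the configurability targets: a smaller interval lowers the worst-case boot time but triggers more synchronizations, and thus more flash writes, during runtime.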
The rest of the paper is organized as follows. Section 2 presents the proposed sudden-power-off-recovery (SPOR) design. Section 3 reports the experimental results. Section 4 concludes the paper.
2. A Sudden Power-off Recovery Design
2.1 Overview
In this section, a sudden power-off recovery (SPOR) design is proposed to enhance the reliability of an SSD by enabling it to survive any sudden power-off, even without a power-off notification to the SSD controller. More specifically, the SPOR design can efficiently recover the address translation information after sudden power losses. Moreover, it is implemented purely in software, so no additional hardware resource, e.g., a super capacitor, is required. To preserve flexibility, this design is configurable and able to control the expected
978-1-4673-8833-7/16/$31.00 ©2016 IEEE