2023 IEEE International Solid-State Circuits Conference
ISSCC 2023 / SESSION 28 / HIGH-DENSITY MEMORIES AND HIGH-SPEED INTERFACE / 28.2
28.2 A High-Performance 1Tb 3b/Cell 3D-NAND Flash with a
194MB/s Write Throughput on over 300 Layers
Byungryul Kim, Seungpil Lee, Beomseok Hah, Kangwoo Park, Yongsoon Park,
Kangwook Jo, Yujong Noh, Hyeoncheon Seol, Hyunsoo Lee, Jaehyeon Shin,
Seongjin Choi, Youngdon Jung, Sungho Ahn, Yonghun Park, Sujeong Oh,
Myungsu Kim, Seonguk Kim, Hyunwook Park, Taeho Lee, Haeun Won,
Minsung Kim, Cheulhee Koo, Yeonjoo Choi, Suyoung Choi, Sechun Park,
Dongkyu Youn, Junyoun Lim, Wonsun Park, Hwang Hur, Kichang Kwean,
Hongsok Choi, Woopyo Jeong, Sungyong Chung, Jungdal Choi, Seonyong Cha
SK hynix Semiconductor, Icheon, Korea
As data produced by multimedia explodes and demand for data storage increases, the most important topics for the NAND-Flash memory field are continuous performance improvements and cost/bit reduction. To improve performance, features that improve the quality of service (QoS) as well as the read/write performance [1] are required. To reduce the cost/bit, the number of stacked layers needs to increase while the pitch between stacked layers decreases, and the increasing WL resistance produced by the decreased stack pitch must be managed. To overcome these challenges, this paper presents techniques applied to a >300-layer 1Tb 3b/cell (TLC) 3D-NAND Flash memory: 1) A triple-verify program (TPGM) technique is used to improve program performance. 2) An adaptive unselected string pre-charge (AUSP) technique is used to reduce disturb and program time (tPROG). 3) A programmed dummy string (PDS) technique is used to reduce WL settling time. 4) An all-pass rising (APR) technique is used to reduce the read time (tR). 5) A plane-level read retry (PLRR) technique is used during erase to improve the QoS.
The TPGM scheme reduces tPROG by narrowing the cell threshold voltage (VTH) distribution. One way to reduce program time is to increase the step voltage (VSTEP) of incremental step pulse programming, but a larger VSTEP makes the VTH distribution wider. Improving the VTH distribution is therefore essential to increasing the step voltage and reducing the program time. In a program operation, the threshold voltage shift (ΔVTH) is determined by the difference between the step voltage applied to the WL and the channel voltage (VCH). Figures 28.2.1(a) and 28.2.1(b) compare the double-verify program (DPGM) scheme with the TPGM scheme. The DPGM scheme [2] divides cells into three groups according to the program verify (PV) levels and then controls the channel voltage of each group by applying three different BL voltages (VBL). VDD is applied to the group 1 (GR1) BLs to isolate the channels, so the cells of GR1 are not programmed. VA is applied to the group 2 (GR2) BLs, and ΔVTH = VSTEP – VA. 0V is applied to the group 3 (GR3) BLs, and ΔVTH = VSTEP. In DPGM, the VTH distribution is thus improved by using two values of ΔVTH. TPGM adds one more group (ΔVTH = VSTEP – VB, VA > VB) to the three groups of DPGM: it categorizes cells into four groups according to their PV levels and drives the channel voltage of each group by applying four different BL voltages. Figure 28.2.1(c) illustrates the counter driving scheme that prevents the BL coupling effect. BL1 is driven through series-connected NMOS devices and is set to VREF1 – VTHN, while BL2 is initially set to VDD and is discharged to VREF2 + VTHP through a series-connected PMOS and NMOS. VTHN and VTHP represent the threshold voltages of the NMOS and the PMOS. The rise of BL1 is affected by the fall of BL2; however, the BL1 level does not exceed the target level because the coupling is in the opposite direction. The counter driving scheme enhances BL settling and TPGM efficiency. Converting the VTH distribution improvement into program time reduction yields approximately a 10% reduction in program time.
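
The group-dependent step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the chip's algorithm: the function name `delta_vth`, the verify-level values, and the rule that cells closer to the target receive the larger BL voltage are hypothetical. Only the BL voltages per group (VDD, VA, VB, 0V) and the relation ΔVTH = VSTEP – VBL come from the text.

```python
def delta_vth(vth, pv_levels, bl_biases, v_step):
    """Return the VTH shift one program pulse gives a cell (idealized).

    pv_levels : verify levels in ascending order; the last one is the target.
                Two levels model DPGM, three levels model TPGM.
    bl_biases : BL voltage for each band below the target, ordered from the
                band farthest from the target to the nearest one,
                e.g. [0.0, VB, VA] for TPGM (assumed ordering).
    v_step    : program step voltage VSTEP.
    """
    if len(bl_biases) != len(pv_levels):
        raise ValueError("need one BL bias per verify level")
    if vth >= pv_levels[-1]:
        # GR1: the BL is at VDD, the channel is isolated, no programming.
        return 0.0
    # Count how many of the lower verify levels the cell has already passed.
    band = sum(vth >= pv for pv in pv_levels[:-1])
    return v_step - bl_biases[band]


# Hypothetical voltages, chosen to be exactly representable in binary.
V_STEP, V_A, V_B = 0.5, 0.25, 0.125
tpgm_levels = [2.5, 2.75, 3.0]
tpgm_biases = [0.0, V_B, V_A]

for vth in (1.0, 2.6, 2.8, 3.0):
    print(vth, delta_vth(vth, tpgm_levels, tpgm_biases, V_STEP))
```

With two verify levels and biases `[0.0, V_A]` the same function gives the three DPGM groups; the third verify level is what adds the intermediate step VSTEP – VB.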
The AUSP scheme reduces tPROG by tightening the cell’s VTH distribution. A program pulse is preceded by an unselected-string precharge (USP) [3] period to initialize all channels. USP prevents insufficient channel boosting during a program pulse by precharging channels with VDD, but a hot-carrier injection (HCI) disturbance occurs, as shown in Fig. 28.2.2(a). A voltage below VPASS (VLOW) is applied to all WLs, and the selected cell with a VTH higher than VLOW is turned off. The source-selection line (SSL) side channel is pre-charged to VDD and the drain-selection line (DSL) side channel is undriven. The voltage difference between the SSL- and DSL-side channels creates a high electric field, which produces the HCI disturbance. In the AUSP scheme, the SSL-side dummy WL is controlled by VDWL, and VDWL – VTH(DummyCell) is applied to the channel. HCI disturbances are reduced due to the lower electric field. Figure 28.2.2(b) illustrates the incremental channel initialization voltage, which is proportional to the number of program loops. The channel initialization voltage corresponds to the SSL-side channel voltage; a higher channel initialization voltage is required for higher program loops. The channel initialization voltage can be lowered for lower program loops, thereby reducing HCI disturb further. As shown in Fig. 28.2.2(c), the cell’s VTH distribution widens after programming, while programming with AUSP results in a narrower VTH distribution compared to a conventional USP. This narrower VTH distribution contributes to a tPROG reduction of around 2%.
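
The loop-dependent channel initialization can be sketched as follows. The relation channel voltage = VDWL – VTH(DummyCell) is from the text; the linear VDWL ramp, the clamp at a maximum value, the function name, and all numbers are assumptions made only for illustration.

```python
def channel_init_voltage(loop, v_dwl_start, v_dwl_step, vth_dummy, v_max):
    """SSL-side channel initialization voltage for a given program loop.

    Assumes VDWL rises linearly with the loop index (hypothetical ramp) and
    that the channel cannot be initialized above v_max (hypothetical clamp,
    e.g. the VDD precharge level of a conventional USP).
    """
    v_dwl = v_dwl_start + v_dwl_step * loop
    v_channel = v_dwl - vth_dummy          # VDWL - VTH(DummyCell), per the text
    return min(max(v_channel, 0.0), v_max)


# Hypothetical numbers: early loops get a low initialization voltage and
# therefore a lower SSL-to-DSL channel voltage difference.
for loop in (0, 4, 10):
    print(loop, channel_init_voltage(loop, 1.0, 0.25, 0.5, 2.0))
```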
The PDS scheme reduces tR and tPROG by programming dummy cells of the dummy strings. DSLs are divided by the DSL cut, as shown in Fig. 28.2.3(a), which separates each DSL; meanwhile, the dummy WLs, main WLs, and SSLs are connected to several strings in the 3D-NAND cell array. A dummy string produced by the DSL cut acts as a capacitive load when a WL rises or falls, which delays WL settling. Figures 28.2.3(b) and 28.2.3(c) present the different channel conditions of an unprogrammed dummy string and a programmed dummy string. In an unprogrammed dummy string, all the cells are turned on, and the channel voltage becomes 0V via the source-line voltage (VSL) when VPASS is applied to all WLs. The non-floating channel acts as a capacitive load and affects the WL settling time. The PDS scheme programs the VTH of the dummy string’s SSL-side dummy cell above VPASS to turn off the dummy cell. As the SSL-side dummy cell is turned off, the floating channel no longer acts as a capacitive load and the WL settling time is reduced.
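
A first-order way to see the effect is a lumped RC estimate. This is a rough sketch, not a model from the paper: it assumes a single-pole WL, treats a floating dummy-string channel as adding no load at all, and uses hypothetical component values and a hypothetical function name.

```python
import math


def wl_settling_time(r_wl, c_wl, n_dummy, c_dummy, dummy_programmed,
                     tolerance=0.01):
    """Time for a single-pole RC word line to settle within `tolerance`.

    r_wl, c_wl       : lumped WL resistance (ohm) and capacitance (F)
    n_dummy, c_dummy : number of dummy strings and the load each adds (F)
                       while its channel is tied to the source line
    dummy_programmed : True when the SSL-side dummy cell is turned off,
                       so the channel floats (idealized as zero added load)
    """
    c_load = c_wl if dummy_programmed else c_wl + n_dummy * c_dummy
    return r_wl * c_load * math.log(1.0 / tolerance)


# Hypothetical values only.
t_unprogrammed = wl_settling_time(1e5, 2e-12, 4, 1e-13, dummy_programmed=False)
t_programmed = wl_settling_time(1e5, 2e-12, 4, 1e-13, dummy_programmed=True)
print(t_unprogrammed, t_programmed)
```

In this idealization the settling time scales with the total load, so removing the dummy-string term shortens it by the ratio of the two capacitances.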
The APR scheme reduces tR by reducing the WL rise time. The different resistance and capacitance characteristics of each WL require different VPASS sources to be connected to each WL group, and one source is selected by the switch circuits. As depicted in Fig. 28.2.4(a), in a conventional scheme one target VPASS source is selected and applied to the dedicated WL during the VPASS rise time. As shown in Fig. 28.2.4(b), the APR scheme divides the VPASS rise time into two parts, A and B. In part A, all VPASS sources are connected to all WLs to reduce the WL rise time. In part B, one target VPASS source is applied to the dedicated WL, the same as in the conventional VPASS rising scheme. The APR scheme reduces tR by around 2%.
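
The two-part switching can be summarized as a selection rule. The sketch below is illustrative only: the function name, the WL group names, and the source names are hypothetical, and it assumes the full set of VPASS sources is exactly the set of per-group targets.

```python
def connected_sources(part, wl_group, target_of, scheme="APR"):
    """Return the set of VPASS sources switched onto one WL group.

    part      : "A" (first part of the rise) or "B" (remainder of the rise)
    wl_group  : name of the WL group being driven
    target_of : dict mapping each WL group to its target VPASS source
    scheme    : "APR" or "conventional"
    """
    if part not in ("A", "B"):
        raise ValueError("part must be 'A' or 'B'")
    if scheme == "APR" and part == "A":
        # Part A: every source drives every WL to shorten the rise.
        return set(target_of.values())
    # Part B, or the conventional scheme throughout: target source only.
    return {target_of[wl_group]}


targets = {"WL_low": "VPASS1", "WL_mid": "VPASS2", "WL_high": "VPASS3"}
print(sorted(connected_sources("A", "WL_mid", targets)))
print(sorted(connected_sources("B", "WL_mid", targets)))
```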
As program/erase (P/E) cycles increase, the number of erroneous bits also increases; adjusting the read voltage bias can reduce the number of erroneous bits. The read retry (RR) scheme, which changes the read level, is one effective method to overcome this situation. However, in a conventional RR the read level can only be changed when the read operations for all planes in the NAND device are completed. As a result, the read performance is determined by the last plane to finish. In this work, a PLRR scheme is used to alleviate read performance deterioration in the NAND controller. Figure 28.2.5 shows an example PLRR sequence: the read level is changed regardless of the operations occurring in other planes. Therefore, the read performance is improved compared to the conventional RR, since subsequent read commands can be issued immediately. In addition, the PLRR effect becomes greater as the number of planes increases.
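
The difference between the two retry policies can be sketched with a small timing model. This is a simplified illustration, not the sequence of Fig. 28.2.5: the function names and the per-attempt durations (in microseconds) are hypothetical, and the model assumes a plane's retry in the conventional scheme starts only after every plane has finished the current attempt.

```python
def finish_times_conventional(attempts):
    """Per-plane completion time when retries wait for all planes.

    attempts[p] lists the duration of each read attempt on plane p
    (first read followed by any retries).
    """
    finish = [0] * len(attempts)
    round_start = 0
    for k in range(max(len(a) for a in attempts)):
        active = [(p, a[k]) for p, a in enumerate(attempts) if k < len(a)]
        for plane, duration in active:
            finish[plane] = round_start + duration
        # The next read-level change waits for the slowest active plane.
        round_start += max(duration for _, duration in active)
    return finish


def finish_times_plrr(attempts):
    """Per-plane completion time when each plane retries independently."""
    return [sum(a) for a in attempts]


# Hypothetical durations in microseconds for a four-plane device.
attempts = [[40], [50, 45], [40, 40, 40], [60]]
print(finish_times_conventional(attempts))
print(finish_times_plrr(attempts))
```

In this model every plane that needs a retry finishes earlier under the plane-level policy, because it no longer waits for the slowest plane before its read level can change.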
In this work, five new techniques are introduced to achieve a high-performance 1Tb 3b/cell 3D-NAND Flash memory using a peripheral-circuit-under-cell-array architecture. The key comparison table, shown in Fig. 28.2.6, reports a 20Gb/mm² bit density, which is achieved by using over 300 stacked WLs, with an improved program throughput, tR, and bit density compared to prior work [4]. A die microphotograph of the fabricated TLC NAND chip is shown in Fig. 28.2.7.
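
As a quick check of the throughput claim, the two write-throughput figures that appear in this text (194MB/s in the title of this paper and 164MB/s in the title of reference [4]) can be compared directly. The helper name is hypothetical, and the calculation uses only those two numbers, not the contents of Fig. 28.2.6.

```python
def relative_gain(new, old):
    """Fractional improvement of `new` over `old`."""
    return (new - old) / old


gain = relative_gain(194.0, 164.0)  # MB/s, this work vs. [4]
print(f"{gain:.1%}")
```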
References:
[1] A. Grossi et al., “Quality-of-service implications of enhanced program algorithms for
charge-trapping NAND in future solid-state drives,” IEEE Trans. Device Mater. Rel., vol.
15, no. 3, pp. 363-369, Sept. 2015.
[2] C. Miccoli et al., “Investigation of the programming accuracy of a double-verify ISPP
algorithm for nanoscale NAND Flash memories,” IEEE IRPS, pp. 5.1-5.6, 2011.
[3] R. Yamashita et al., “A 512Gb 3b/cell flash memory on 64-word-line-layer BiCS
technology,” ISSCC, pp. 196-197, 2017.
[4] M. Kim et al., “A 1Tb 3b/Cell 8th-Generation 3D-NAND Flash Memory with 164MB/s
Write Throughput and a 2.4Gb/s Interface,” ISSCC, pp. 136-137, 2022.
978-1-6654-9016-0/23/$31.00 ©2023 IEEE