• 2023 IEEE International Solid-State Circuits Conference
ISSCC 2023 / SESSION 33 / NON-VOLATILE MEMORY AND COMPUTE-IN-MEMORY / 33.1
33.1 A 16nm 32Mb Embedded STT-MRAM with a 6ns Read-Access
Time, a 1M-Cycle Write Endurance, 20-Year Retention at
150°C and MTJ-OTP Solutions for Magnetic Immunity
Po-Hao Lee, Chia-Fu Lee, Yi-Chun Shih, Hon-Jarn Lin, Yen-An Chang,
Cheng-Han Lu, Yu-Lin Chen, Chieh-Pu Lo, Chung-Chieh Chen,
Cheng-Hsiung Kuo, Tan-Li Chou, Chia-Yu Wang, J. J. Wu, Roger Wang,
Harry Chuang, Yih Wang, Yu-Der Chih, Tsung-Yung Jonathan Chang
TSMC, Hsinchu, Taiwan
Embedded non-volatile memory (eNVM) is an essential element for microcontrollers
(MCUs) used in automotive applications. As the automotive market transitions to greater
electrification and autonomy, MCU content in the car is growing, driven by integration
to simplify system design, an electrical and electronic (E/E) architectural evolution to
domain/zone control, and over-the-air (OTA) updates that push eNVM densities beyond 128Mb.
To support this transition, technology nodes are migrating from 55/40nm to 28/16nm.
In addition, traditional charge-based embedded Flash will be replaced by back-end-of-line
(BEOL) memories, such as STT-MRAM [1-3], PCRAM [4], and RRAM. Of these
candidates, STT-MRAM is the most promising solution for automotive applications due
to its high-temperature data retention, high write endurance, and fast write speed.
However, STT-MRAM still faces several challenges from the inherent properties of
magnetic tunnel junctions (MTJs) and the side effects of process integration, such as
array-level variability and magnetic-field interference (MFI). In this work, several design
solutions are proposed to overcome these challenges: (1) a novel merged-local-reference
scheme is used to overcome the array-edge effect on the MTJs; (2) write-bias segment
trimming is used to mitigate the near-far effect for better write endurance; (3) an
MTJ-based one-time programmable (OTP) memory is used to preserve critical data during the
wafer-level chip-scale packaging (WLCSP) process (360°C, 3hr); and (4) a novel
sensing-reference scheme is used so that this MTJ-OTP is immune to external
magnetic-field interference. A 32Mb STT-MRAM test chip based on these proposed solutions is
successfully fabricated in a 16nm FinFET CMOS process. Measured results confirm its
excellent performance and manufacturability for next-generation automotive MCU
applications.
Figure 33.1.1 shows the organization of the 16nm 32Mb STT-MRAM macro, which consists
of four 8Mb banks sharing one global analog bias-generator circuit (bandgap,
reference-voltage generator, and charge pump) and the control logic. Four 2Mb sub-banks share
one local voltage regulator. Each sub-bank has four 0.5Mb arrays. The 0.5Mb array
features 512b per BL and 1224b per WL in a butterfly-array configuration. Each logical
IO uses a 32:1 data-cell column and a 2:1 reference-cell column. The BL and SL drivers
are on both sides of the array. A sense amplifier with programmable offset-cancellation
trimming is in the center of the IO.
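The organization numbers above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the paper: the constant and function names are illustrative, and it assumes that every physical column on a WL belongs to a logical IO made of 32 data columns plus 2 reference columns.

```python
# Organization figures quoted in the text.
BANKS = 4
SUB_BANKS_PER_BANK = 4
ARRAYS_PER_SUB_BANK = 4
ARRAY_MBIT = 0.5
CELLS_PER_BL = 512        # "512b per BL"
CELLS_PER_WL = 1224       # "1224b per WL"
DATA_COLS_PER_IO = 32     # 32:1 data-cell column mux
REF_COLS_PER_IO = 2       # 2:1 reference-cell column mux


def macro_capacity_mbit():
    """Total macro capacity implied by the bank/sub-bank/array hierarchy."""
    return BANKS * SUB_BANKS_PER_BANK * ARRAYS_PER_SUB_BANK * ARRAY_MBIT


def ios_per_array():
    """Number of logical IOs per array.

    Assumption (not stated in the text): all columns on a WL are split
    evenly into IOs of 32 data columns + 2 reference columns.
    """
    cols_per_io = DATA_COLS_PER_IO + REF_COLS_PER_IO
    ios, remainder = divmod(CELLS_PER_WL, cols_per_io)
    if remainder:
        raise ValueError("columns do not divide evenly into IOs")
    return ios


def data_bits(n_data_ios):
    """Data-cell count of one array for a given number of data IOs."""
    return CELLS_PER_BL * DATA_COLS_PER_IO * n_data_ios


if __name__ == "__main__":
    print(macro_capacity_mbit(), "Mb macro")
    print(ios_per_array(), "IOs per array")
    print(data_bits(32), "data bits with 32 data IOs")
```

Under these assumptions the hierarchy adds up to 32Mb, and 1224 columns divide into 36 IOs, which matches the 36 MTJs averaged by the merged reference described later. Thirty-two of those IOs already account for 0.5Mb of data cells; the role of the remaining four is not described in the text.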
Figure 33.1.2 shows the 1T1MTJ bitcell structure, the MTJ cross-section, and the array
architecture. Each 1T1MTJ bitcell is 0.033μm² with routing for WL, BL, and common SL (CSL) on
metals 3, 6, and 2, respectively. Due to the small STT-MRAM read margin, an MTJ-based
reference sensing scheme, which combines both AP and P MTJ states, is used to
generate a reference sensing current that tracks the bitcell current across temperature.
The BL and SL are on different metal layers and have different metal widths and
spacings, which results in different resistance variation and further degrades the read
margin. This work therefore uses a local WL-location tracking reference scheme, which uses the
same WL for the data and reference cells. The parasitic resistance due to
BL/SL/WL and the location dependence of the MTJ resistance are then common to both
cells, so they can be eliminated by differential sensing.
The read-sensing scheme is critical to overcoming the small read margin of STT-MRAM [6].
To eliminate the impact of MTJ variation on the local reference scheme, two
techniques are implemented: (1) a merged-reference SA scheme (Fig. 33.1.3 top) is
designed to average 36 MTJs, with a default 1:1 R_P:R_AP ratio, to set the reference current
as the average of I_P and I_AP, thereby tracking MTJ process and temperature variation; (2)
during the wafer-sort test, the built-in self-test (BIST) engine searches for the optimum
R_P:R_AP ratio that achieves the lowest read failure rate and maximizes the sensing
margin of each local reference-trim unit (16 × 36 = 576b). The V_DD vs. access time Shmoo
plot shows a <6ns read at 0.7V, from -40 to 150°C.
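The ratio search performed by the BIST can be illustrated with a toy model. This is a sketch under stated assumptions, not the chip's algorithm: it assumes the R_P:R_AP ratio is set by how many of the 36 merged reference branches are in the P state, treats the reference as the mean branch current, and uses hypothetical cell currents in arbitrary units. All function names are illustrative.

```python
N_REF = 36  # MTJs averaged by the merged reference (from the text)


def reference_current(n_p, i_p, i_ap, n_ref=N_REF):
    """Mean current of n_ref merged branches, n_p of them in the P state."""
    return (n_p * i_p + (n_ref - n_p) * i_ap) / n_ref


def read_failures(i_ref, p_cells, ap_cells):
    """Count cells on the wrong side of the reference.

    P cells (low resistance) must read above i_ref, AP cells below it.
    """
    return (sum(i <= i_ref for i in p_cells)
            + sum(i >= i_ref for i in ap_cells))


def search_best_ratio(p_cells, ap_cells, i_p, i_ap):
    """Return the n_p with the fewest failures, then the largest margin."""
    best_key, best_n_p = None, None
    for n_p in range(1, N_REF):
        i_ref = reference_current(n_p, i_p, i_ap)
        fails = read_failures(i_ref, p_cells, ap_cells)
        margin = min(min(p_cells) - i_ref, i_ref - max(ap_cells))
        key = (fails, -margin)  # fewer failures first, then wider margin
        if best_key is None or key < best_key:
            best_key, best_n_p = key, n_p
    return best_n_p


if __name__ == "__main__":
    # Hypothetical measured currents for one reference-trim unit.
    p_cells = [20.0, 19.0, 18.5]
    ap_cells = [12.0, 13.0, 16.0]   # one weak AP cell near the midpoint
    print(reference_current(18, 20.0, 12.0))            # default 1:1 ratio
    print(search_best_ratio(p_cells, ap_cells, 20.0, 12.0))
```

In this toy data set the default 1:1 setting places the reference exactly on the weak AP cell, so the search moves the ratio toward more P branches. That is the kind of per-unit correction the text attributes to the BIST engine.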
In prior work, an MTJ-OTP was used to achieve data retention during WLCSP and
immunity to magnetic fields [6]. In this work, we implement MTJ-OTP cells at the edge
of the arrays, so that they can share periphery circuits with the main array for a low area
overhead. The ratio of MTJ-OTP:STT-MRAM is 12kb:2Mb. The OTP devices are 3T1MTJ
cells for MTJ breakdown and 1T1MTJ for read. Figure 33.1.4 shows a comparison
between a normal MRAM read and the two types of MTJ-OTP read. For a normal MRAM
read operation, the reference branches are merged to generate a ½I_AP + ½I_P reference
current, where the read margin is about 10%. One MTJ-OTP read scheme is R_P-based,
where the reference is the I_P current, and a stored-1 is R_AP while a stored-0 is R_BD
(the resistance of a broken-down MTJ). The read margin for this scheme can reach 47%.
However, it is sensitive to magnetic interference under high magnetic fields without
shielding. We propose using an R_BD-based read, which uses I_BD as the reference and
shifts the reference between I_BD and I_P by tuning the merged reference trim branches:
stored-0 is R_BD, while stored-1 can be either R_P or R_AP. Although this scheme achieves
a smaller read margin of 22%, it is more
suitable for applications that require high magnetic-field immunity.
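The difference between the two MTJ-OTP read schemes can be sketched as a pair of comparators. The resistance values, read voltage, and the `alpha` weighting below are hypothetical and chosen only so that R_BD < R_P < R_AP; the paper does not give them. The sketch shows why a field-induced flip between AP and P matters for the R_P-based read but not for the R_BD-based read.

```python
V_READ = 0.1  # hypothetical read voltage, volts

# Hypothetical resistances in ohms; only the ordering BD < P < AP matters.
R_OHMS = {"BD": 500.0, "P": 3000.0, "AP": 6000.0}


def cell_current(state):
    """Read current of a cell in the given state ("BD", "P" or "AP")."""
    return V_READ / R_OHMS[state]


def read_rp_based(state):
    """R_P-based read: the reference is I_P.

    Returns 0 for a broken-down cell, 1 for an AP cell, and None when the
    cell current equals the reference (a cell that has flipped to P).
    """
    i_ref = cell_current("P")
    i_cell = cell_current(state)
    if i_cell > i_ref:
        return 0
    if i_cell < i_ref:
        return 1
    return None


def read_rbd_based(state, alpha=0.5):
    """R_BD-based read: the reference sits between I_BD and I_P.

    alpha is an illustrative stand-in for the merged-reference trim.
    """
    i_ref = alpha * cell_current("BD") + (1 - alpha) * cell_current("P")
    return 0 if cell_current(state) > i_ref else 1


if __name__ == "__main__":
    for state in ("BD", "AP", "P"):
        print(state, read_rp_based(state), read_rbd_based(state))
```

In this model a stored-1 cell that an external field flips from AP to P becomes indistinguishable from the reference in the R_P-based read, while the R_BD-based read still returns 1 for both magnetic states.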
To replace embedded Flash, STT-MRAM requires an MTJ with a high energy barrier (Eb)
[7] so that it can retain stored data during infrared reflow. As a result, the MTJ switching
current can be several hundred μA or more. This work uses a write-voltage generator, with
WL location-bias trimming, to generate write voltages. This provides a sufficient MTJ
switching current for reliable writes and minimizes MTJ voltage stress for higher write
endurance (>1M cycles). Figure 33.1.5 shows the write Shmoo plot for different WL
segments. Due to the CSL structure, the SL parasitic resistance of each row is lower
than that of the row's BL; hence, the total SL/BL parasitic resistance is not uniform: it
is smaller on the near side and larger on the far side. If a fixed voltage is applied for write
operations, the bitcells on the far side will have a lower voltage across the MTJ, while
those on the near side will have a higher voltage. This location dependency may cause
additional write soft errors on the far side or write hard errors on the near
side. This work uses a write-segment trim scheme to achieve fast write performance
(100ns write pulse) and high write endurance (>1M cycles) by trimming the write voltage
for different WL locations. A 1M-cycle endurance test shows that the endurance error
rate is reduced from 0.19 to 0.01ppm by using the write-segment trim scheme.
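The near-far effect and the benefit of per-segment trimming can be illustrated with a simple resistive-divider model. This is a sketch under assumptions that are not in the paper: the parasitic resistance is taken to grow linearly with row index, the cell is a fixed resistance, and the number of segments and all numeric values are hypothetical.

```python
ROWS = 512  # cells per BL, from the array description


def parasitic_ohms(row, ohms_per_row=1.0):
    """Total BL + SL parasitic resistance seen by a row (assumed linear)."""
    return row * ohms_per_row


def v_mtj(v_write, row, r_cell=4000.0):
    """Voltage across the cell from a resistive divider with the parasitics."""
    return v_write * r_cell / (r_cell + parasitic_ohms(row))


def segment_trim(target_v, n_segments, r_cell=4000.0):
    """Write voltage per WL segment so the mid-segment row sees target_v."""
    seg_len = ROWS // n_segments
    trims = []
    for seg in range(n_segments):
        mid_row = seg * seg_len + seg_len // 2
        trims.append(target_v * (r_cell + parasitic_ohms(mid_row)) / r_cell)
    return trims


def spread(n_segments, target_v=0.5):
    """Max minus min cell voltage over all rows for a given segment count."""
    trims = segment_trim(target_v, n_segments)
    seg_len = ROWS // n_segments
    volts = [v_mtj(trims[row // seg_len], row) for row in range(ROWS)]
    return max(volts) - min(volts)


if __name__ == "__main__":
    print("one fixed voltage :", spread(1))
    print("four trim segments:", spread(4))
```

With a single write voltage the near rows are over-stressed and the far rows under-driven. Splitting the array into segments with their own trim narrows the spread of the cell voltage, which is the mechanism the text credits for the lower endurance error rate.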
Figure 33.1.6 shows the chip probe (CP) flow for the wafer-sort test. In addition to the MRAM
memory macro, there are two accompanying soft IPs: a BIST module with a standard
JTAG interface and a memory controller (MC). The BIST supports sense-amplifier
self-trimming, local-reference self-trimming, and data-cell self-repair to assist the production
test flow. The MC provides read/write access to the MRAM macro and implements a
double-error-correction, triple-error-detection (DECTED) error-correcting code (ECC).
The intelligent write algorithm is the main function of the MRAM controller; it provides
reliable write operations (>69Mb/s including write-bias setup and verify/retry time)
and high write endurance (>1M cycles). It uses read-before-write to decide which
bits need to be written and adjusts the write-voltage setting for different WL segments.
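The controller behaviour described above (read-before-write, a segment-dependent write voltage, and verify/retry) can be sketched as a small loop. The `ToyMram` class, its failure model, the trim table, and the retry limit are all hypothetical; the paper does not specify the controller interface.

```python
ROWS = 512
SEG_TRIM_V = [0.50, 0.52, 0.54, 0.56]  # hypothetical per-segment voltages


def write_voltage(row):
    """Pick the trim voltage of the WL segment that contains this row."""
    return SEG_TRIM_V[row * len(SEG_TRIM_V) // ROWS]


class ToyMram:
    """Toy array: bits in stubborn_mask ignore the first pulse on a row."""

    def __init__(self, stubborn_mask=0):
        self.words = [0] * ROWS
        self.stubborn_mask = stubborn_mask
        self.pulse_count = {}

    def read(self, row):
        return self.words[row]

    def write_bits(self, row, mask, value, voltage):
        # voltage is accepted for interface realism; the toy model ignores it.
        n = self.pulse_count.get(row, 0)
        self.pulse_count[row] = n + 1
        if n == 0:
            mask &= ~self.stubborn_mask  # stubborn bits fail the first pulse
        self.words[row] = (self.words[row] & ~mask) | (value & mask)


def intelligent_write(mem, row, new_word, max_retries=3):
    """Write new_word and return the number of write pulses issued."""
    pulses = 0
    while True:
        # Read-before-write doubles as the verify step after each pulse.
        diff = mem.read(row) ^ new_word
        if diff == 0:
            return pulses
        if pulses > max_retries:
            raise RuntimeError("write failed after retries")
        # Only the differing bits are pulsed, which limits MTJ stress.
        mem.write_bits(row, diff, new_word, write_voltage(row))
        pulses += 1


if __name__ == "__main__":
    mem = ToyMram(stubborn_mask=0b0100)
    print(intelligent_write(mem, 10, 0b1101))  # one retry for the stubborn bit
    print(intelligent_write(mem, 10, 0b1101))  # data already matches
```

Writing only the bits that differ avoids unnecessary pulses on cells that already hold the right value, and the verify step lets the loop stop as soon as the data matches.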
Figure 33.1.7 shows the specification summary table and die photograph of the MRAM
test chip. A 32Mb embedded MRAM chip using a 16nm FinFET logic process is
presented. This product-grade MRAM macro requires only two power supplies: core and
IO voltage. A 6ns read-access time and a read power of 0.8μA/MHz/b are demonstrated.
Standby mode consumes less than 107μA at 25°C. Low-power standby mode consumes less
than 66.7μA at 25°C and has a <100ns wake-up time, thereby meeting the requirements
for low-power applications. This MRAM technology passes the 1M-cycle write-endurance
test and the 20-year data-retention test at 150°C.
References:
[1] O. Glowinski et al., “MRAM as Embedded Non-Volatile Memory Solution for 22FFL
FinFET Technology,” IEDM, pp. 18.1.1-18.1.4, 2018.
[2] Y.-D. Chih et al., “A 22nm 32Mb Embedded STT-MRAM with 10ns Read Speed, 1M
Cycle Write Endurance, 10 Years Retention at 150°C and High Immunity to Magnetic
Field Interference,” ISSCC, pp. 222-223, 2020.
[3] Y.-C. Shih et al., “A Reflow-Capable, Embedded 8Mb STT-MRAM Macro with 9ns
Read Access Time in 16nm FinFET Logic CMOS Process,” IEDM, pp. 11.4.1-11.4.4, 2020.
[4] D. Min et al., “18nm FDSOI Technology Platform embedding PCM & Innovative
Continuous-Active Construct Enhancing Performance for Leading-Edge MCU
Applications,” IEDM, pp. 13.1.1-13.1.4, 2021.
[5] Y.-D. Chih et al., “Design Challenges and Solutions of Emerging Nonvolatile Memory
for Embedded Applications,” IEDM, pp. 2.4.1-2.4.4, 2021.
[6] W. J. Gallagher et al., “Recent Progress and Next Directions for Embedded MRAM
Technology,” IEEE VLSI Tech., pp. T190-T191, 2019.