The Cortex-R52 branch prediction mechanisms detect branches at an early stage in the pipeline and
redirect instruction fetching to the appropriate address immediately, rather than waiting for the
branch to reach the end of the pipeline. However, not all branches are predicted in this way.
Branch Target Address Cache
The PFU contains a 16-entry Branch Target Address Cache (BTAC) to predict the target address
of indirect branches (except for subroutine returns). The BTAC implementation is
architecturally transparent, so it does not have to be flushed on a context switch.
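As an illustration only, the following Python sketch models a small target address cache of the kind described above. It is a behavioral toy and not the Cortex-R52 implementation: the class name, the lookup by branch address, and the replacement policy (the least recently updated entry is evicted) are all assumptions, because the source states only the entry count.

```python
from collections import OrderedDict


class TargetAddressCache:
    """Toy model of a small branch target address cache (hypothetical)."""

    def __init__(self, entries=16):
        self.entries = entries
        # Maps branch address -> most recently observed target address.
        self.table = OrderedDict()

    def predict(self, branch_addr):
        # Return the cached target, or None when no prediction is available.
        return self.table.get(branch_addr)

    def update(self, branch_addr, actual_target):
        # Assumption: evict the least recently updated entry when full.
        if branch_addr in self.table:
            self.table.pop(branch_addr)
        elif len(self.table) >= self.entries:
            self.table.popitem(last=False)
        self.table[branch_addr] = actual_target
```

A lookup that returns None corresponds to an indirect branch that is not predicted, so the fetch is redirected only when the branch resolves.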
Branch predictor
The branch predictor is a global type that uses branch history registers and a 2048-entry pattern
history prediction table.
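The following Python sketch shows one common way that a global predictor can combine a history register with a pattern history table. It is not the Cortex-R52 design. The 2-bit saturating counters, the 11-bit history length, and the XOR of the history with the branch address (a gshare-style index) are assumptions made for illustration; the source specifies only that the predictor is global and that the table has 2048 entries.

```python
class GlobalBranchPredictor:
    """Toy global-history predictor with a pattern history table (hypothetical)."""

    def __init__(self, table_entries=2048):
        # table_entries must be a power of two so that the mask works.
        self.mask = table_entries - 1
        # Global history of recent branch outcomes, one bit per branch.
        self.history = 0
        # Assumption: 2-bit saturating counters, reset to weakly not-taken (1).
        self.counters = [1] * table_entries

    def _index(self, pc):
        # Assumption: index = word-aligned branch address XOR global history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

In this sketch, a branch that is repeatedly taken is predicted taken once the history register has filled with ones and the counter selected by that history has been incremented past the threshold.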
Return stack
The PFU includes an 8-entry call-return stack to accelerate returns from subroutine calls. For
each subroutine call, the return address is pushed onto a hardware stack. When a subroutine
return is recognized, the address held in the return stack is popped, and the PFU uses it as the
predicted return address. The return stack is architecturally transparent, so it does not have to be
flushed on a context switch.
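The push and pop behavior described above can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the hardware design: the class and method names are hypothetical, and the behavior when more than eight calls are nested (the oldest entry is discarded) is assumed because the source does not describe overflow handling.

```python
class ReturnStack:
    """Toy model of a fixed-depth call-return stack (hypothetical)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_addr):
        # Assumption: when the stack is full, the oldest entry is dropped.
        if len(self.stack) == self.depth:
            self.stack.pop(0)
        self.stack.append(return_addr)

    def on_return(self):
        # Pop the predicted return address, or None when the stack is empty.
        return self.stack.pop() if self.stack else None
```

With this model, returns from up to eight nested calls are predicted in last-in, first-out order. Deeper nesting loses the outermost return addresses, so those returns are not predicted.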
Exception Target Address Cache
The Exception Target Address Cache (ETAC) is a structure used to reduce the best-case latency
of IRQ and FIQ exceptions by caching the address of the generic handler for these exceptions.
The ETAC is enabled out of reset. Writing 1 to CPUACTLR.ETACDIS in the system register
disables the ETAC.
The ETAC supports caching of Interrupt (IRQ) and Fast Interrupt (FIQ) vector entries only.
Other types of exceptions neither allocate entries nor hit in the ETAC. This is because a fast
response to the IRQ and FIQ exceptions is most critical in real-time systems.
A vector is only cached in the ETAC if the vector is in a TCM. A vector located in any other
type of memory never allocates or hits in the ETAC. This is because the TCMs are the only
memories with a perfect response. Other memories can be subject to cache misses and in these
cases the savings that the ETAC offers are minimal compared to the latency of the cache miss.
The ETAC only caches the vector corresponding to the IRQ or FIQ exception if the instruction
in the vector table is a compatible instruction. Compatible instructions are all encodings of B
#immed. If the exception vector is not a compatible instruction, the ETAC does not cache that
exception.
The IRQ and FIQ exceptions can be taken to either Exception Level EL1 or EL2, depending on
the exception level at the time of the interrupt and the values of HCR.IMO and HCR.FMO. The
ETAC independently supports both IRQ and FIQ exceptions taken to both EL1 and EL2, which
means that there are four independent entries, one for each of these cases.
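The allocation conditions above can be summarized in a short Python sketch. It is a behavioral illustration only: the names Etac, etac_can_cache, allocate, and lookup are hypothetical, the target exception level is passed in rather than derived from HCR.IMO and HCR.FMO, and the model assumes that a disabled ETAC simply never allocates or hits.

```python
def etac_can_cache(exception, vector_in_tcm, vector_is_b_immed, etacdis=0):
    """Return True when a vector satisfies every caching condition listed above."""
    return (
        etacdis == 0                       # ETAC not disabled through ETACDIS
        and exception in ("IRQ", "FIQ")    # only IRQ and FIQ vectors are cached
        and vector_in_tcm                  # the vector must be located in a TCM
        and vector_is_b_immed              # the vector must be a B #immed encoding
    )


class Etac:
    """Toy model with one entry per (exception, target exception level) pair."""

    def __init__(self):
        self.etacdis = 0      # 0 out of reset, so the ETAC is enabled
        self.entries = {}     # (exception, target EL) -> cached handler address

    def allocate(self, exception, target_el, handler_addr,
                 vector_in_tcm, vector_is_b_immed):
        if target_el not in (1, 2):
            raise ValueError("IRQ and FIQ are taken to EL1 or EL2 only")
        if not etac_can_cache(exception, vector_in_tcm,
                              vector_is_b_immed, self.etacdis):
            return False
        self.entries[(exception, target_el)] = handler_addr
        return True

    def lookup(self, exception, target_el):
        # Assumption: a disabled ETAC never hits.
        if self.etacdis:
            return None
        return self.entries.get((exception, target_el))
```

The four possible keys, (IRQ, EL1), (IRQ, EL2), (FIQ, EL1), and (FIQ, EL2), correspond to the four independent entries.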
For more information on:
• CPUACTLR, see 3.3.19 CPU Auxiliary Control Register on page 3-90.
• HCR, see 3.3.39 Hyp Configuration Register on page 3-111.
1.2.2 Advanced SIMD and floating-point support
The Advanced SIMD and floating-point support in each core uses NEON™ technology, a SIMD
architecture.
The Advanced SIMD and floating-point feature provides:
• Instructions for single-precision (C programming language float type) data-processing operations.
• Optional instructions for double-precision (C double type) data-processing operations.
• Combined Multiply and Accumulate instructions for increased precision (Fused MAC).
• Hardware support for conversion, addition, subtraction, multiplication with optional accumulate,
division, and square-root.
• Hardware support for denormals and all IEEE Standard 754-2008 rounding modes.
• For single-precision floating-point, there are 32 32-bit single-precision registers or 16 64-bit
double-precision registers. If the optional instructions for double-precision and Advanced SIMD are
included, a total of 32 64-bit double-precision registers or 16 128-bit registers are available.
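The register counts in the last item describe different views of the same storage. In the AArch32 register file, each 64-bit register overlays a pair of 32-bit registers, and each 128-bit register overlays a pair of 64-bit registers. The following Python sketch models this overlay for the full 32 x 64-bit configuration. The class and method names are hypothetical, and the model uses a little-endian byte array purely as an illustration.

```python
import struct


class FpRegisterBank:
    """Toy model of overlapping 32-bit, 64-bit, and 128-bit register views."""

    def __init__(self):
        # 32 registers of 64 bits each, which is also 16 registers of 128 bits.
        self.raw = bytearray(32 * 8)

    def write_s(self, n, value):
        # 32-bit registers 0 to 31 overlay only the first 16 64-bit registers.
        assert 0 <= n < 32
        struct.pack_into("<f", self.raw, 4 * n, value)

    def read_s(self, n):
        assert 0 <= n < 32
        return struct.unpack_from("<f", self.raw, 4 * n)[0]

    def write_d(self, n, value):
        assert 0 <= n < 32
        struct.pack_into("<d", self.raw, 8 * n, value)

    def read_d(self, n):
        assert 0 <= n < 32
        return struct.unpack_from("<d", self.raw, 8 * n)[0]

    def read_q_bytes(self, n):
        # 128-bit register n overlays 64-bit registers 2n and 2n+1.
        assert 0 <= n < 16
        return bytes(self.raw[16 * n:16 * n + 16])
```

For example, writing the double-precision value 1.0 to 64-bit register 0 changes the values read from 32-bit registers 0 and 1, because they occupy the same bits.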
100026_0102_00_en Copyright © 2016–2019 Arm Limited or its affiliates. All rights
reserved.
Non-Confidential