"Cortex-R52 official documentation: Arm Cortex-R52 Processor Technical Reference Manual"

PDF, 3.45MB. Updated 2024-02-01.

"CortexR52_TRM.pdf" is the official Technical Reference Manual (TRM) for the Arm® Cortex®-R52 processor. It describes the processor's architecture, features, and functionality. The document is copyright Arm Limited (or its affiliates), with all rights reserved.

The Cortex-R52 processor is designed for safety-critical embedded systems that require high performance and reliability. The manual covers the processor's capabilities, including its pipeline structure, Memory Protection Units (MPUs), instruction set architecture, and system control coprocessor.

The release information section records the document history. The first release, for r0p0, was issued on August 12, 2016, and later releases covered r1p0 and r1p2. Each release is marked as either Confidential or Non-Confidential, which indicates how the document may be accessed and distributed.

The manual is intended for developers, engineers, and designers who build systems around the Cortex-R52 processor. It is the primary reference for the processor's technical specifications when developing safe, reliable, high-performance embedded systems.

Table 1-1 Configuration options

| Feature | Options | Done at |
|---|---|---|
| Number of cores | 1-4 cores. | Implementation |
| Lock-step | Redundant logic, flops, and comparators for DCLS included or not. DCLS configuration with 2, 4, 6, or 8 instances of the core logic, depending on the number of cores configured. | Implementation |
| EL1-controlled MPU | 16, 20, or 24 programmable EL1-controlled Memory Protection Unit (MPU) regions per core. | Implementation |
| EL2-controlled MPU | 0, 16, 20, or 24 programmable EL2-controlled MPU regions per core. | Implementation |
| AXI bus protection | No bus protection included, only signal integrity protection included, or signal integrity and interconnect protection included. | Implementation |
| RAM protection | Included for all RAMs or not included. | Implementation |
| Number of interrupts (SPIs) into the interrupt controller | 32-960 in multiples of 32, with a minimum of 32 interrupts per core. | Implementation |
| AXIS ID bits | Any non-zero number, but 5-16 preferred. | Implementation |
| Advanced SIMD and floating-point capabilities for each core | Single-precision floating-point only, or single-precision and double-precision floating-point and Advanced SIMD. | Implementation |
| Size of each of the three TCMs on each core | 0KB, or 8KB-1MB (powers of 2). | Implementation |
| TCM wait states for each core | 0 or 1 wait states. | Implementation |
| Instruction cache size for each core | 4KB, 8KB, 16KB, 32KB, or excluded completely. | Implementation |
| Data cache size for each core | 4KB, 8KB, 16KB, 32KB, or excluded completely. | Implementation |
| External device interfaces to GIC | 0 or 1 external devices. | Implementation |
| Flash ECC scheme | Switches the flash memory integrity protection scheme between 64-bit ECC and 128-bit ECC. | Implementation |
| Reset all registers | Only required registers or all programmer-visible registers are reset in the hardware. | Integration |
| TCM boot | Boot with ATCM enabled and at address 0x0, or disabled at reset. | Integration |
| Flash boot | Boot with flash memory enabled or disabled at reset. | Integration |
| Flash interface base address | Base address of Flash interface region. | Integration |
| Flash region present tie-off | Flash region access to AXIM interface or Flash interface. | Integration |
| LLPP interface base address and size | Base address and size (powers of 2) of LLPP region. | Integration |
| LLPP region present tie-off | LLPP region access to AXIM interface or LLPP. | Integration |
| TCMs base addresses | Base address of TCM regions as seen by AXIS interface. | Integration |
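Several implementation-time options in Table 1-1 are simple numeric constraints, so they can be checked mechanically. The C sketch below encodes a few rows (TCM size, cache size, MPU region counts, and SPI count) as predicates. The function names are hypothetical, and this is not an Arm configuration tool. It only restates the table.

```c
#include <stdbool.h>
#include <stdint.h>

#define KB 1024u

/* True if x is a non-zero power of two. */
static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Size of one TCM: 0KB, or a power of two from 8KB to 1MB. */
bool tcm_size_ok(uint32_t bytes) {
    return bytes == 0 ||
           (is_pow2(bytes) && bytes >= 8 * KB && bytes <= 1024 * KB);
}

/* Instruction or data cache: 4, 8, 16, or 32KB, or 0 when excluded. */
bool cache_size_ok(uint32_t bytes) {
    return bytes == 0 || bytes == 4 * KB || bytes == 8 * KB ||
           bytes == 16 * KB || bytes == 32 * KB;
}

/* EL1-controlled MPU: 16, 20, or 24 regions. EL2-controlled MPU may also be 0. */
bool mpu_regions_ok(unsigned regions, bool el2) {
    return regions == 16 || regions == 20 || regions == 24 ||
           (el2 && regions == 0);
}

/* SPIs: 32-960 in multiples of 32, with at least 32 per core (1-4 cores). */
bool spi_count_ok(unsigned spis, unsigned cores) {
    return cores >= 1 && cores <= 4 && spis % 32 == 0 &&
           spis >= 32 && spis <= 960 && spis >= 32 * cores;
}
```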

Processor configurations

The Cortex-R52 processor can be configured to implement Dual-Core Lock-Step (DCLS) or Split/Lock configurations.

DCLS

In DCLS configurations, there is a second, redundant copy of the majority of the core logic for each core, and a redundant copy of the shared logic.

1 Introduction

1.1 About the Cortex-R52 processor

reserved.

1-16

Non-Confidential

The redundant logic is driven by the same inputs as the functional logic. In particular, the redundant core logic shares the same cache RAMs and TCMs as the functional core. Therefore, only one set of cache RAMs and TCMs is required. The redundant logic operates in lock-step with the core, but does not directly affect the processor behavior in any way. The processor outputs to the rest of the system and the core outputs to the cache RAMs and TCMs are driven exclusively by the functional core.

During implementation, comparator logic can be included to compare the outputs of the redundant logic and the functional logic. These comparators can detect a single fault that occurs in either set of logic because of radiation or circuit failure. When used with RAM error detection schemes, the system can be protected from faults.
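The lock-step principle can be illustrated with a small simulation. Two copies of the same logic receive identical inputs, only the functional copy drives outputs, and a comparator flags any divergence. In this sketch, core_step is a hypothetical stand-in for one cycle of core logic. For brevity, the fault is injected only into the redundant copy, although the text notes that a fault in either copy is detectable. A real comparator works on hardware signals, not C variables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one cycle of core logic. Both copies receive
 * the same inputs and, without faults, produce the same outputs. */
static uint32_t core_step(uint32_t state, uint32_t input) {
    return (state * 33u) ^ input;
}

typedef struct {
    uint32_t functional; /* drives the processor outputs */
    uint32_t redundant;  /* shadow copy, observed only by the comparator */
} lockstep_pair;

/* Advance both copies by one step. fault_mask models a single-event upset
 * that flips bits in the redundant copy (0 means no fault). Returns true
 * if the comparator detects a mismatch. */
bool lockstep_cycle(lockstep_pair *p, uint32_t input, uint32_t fault_mask) {
    p->functional = core_step(p->functional, input);
    p->redundant = core_step(p->redundant, input) ^ fault_mask;
    return p->functional != p->redundant;
}
```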

If you are implementing a DCLS configuration, contact Arm for more information.

Split/Lock

In Split/Lock configuration, the processor must have two or four physical cores, each a complete copy of the core logic. The following table shows how the cores are used in Lock mode and Split mode. In the following table:

• The number of physical cores is N.

• The number of cores used in Lock mode is LOCK_N.

• The number of cores used in Split mode is SPLIT_N.

Table 1-2 Split/Lock configuration

| N | LOCK_N | SPLIT_N |
|---|---|---|
| 2 | 1 | 2 |
| 4 | 2 | 4 |

In Lock mode, the higher-order cores function as redundant copies of the lower-order cores. For example, if N is 4, only the lower-order cores, core 0 and core 1, are logically present. Core 2 and core 3 are the higher-order cores. They are not logically present, but function as redundant copies. Although these cores are physically present, their inputs and outputs, cache RAMs, and TCMs are disabled and must not be used in Lock mode.

In Split mode, all interfaces, cache RAMs, and TCMs associated with the number of physical cores selected are present and enabled, but redundancy checking is not possible.

Similar to DCLS, comparator logic can be included to compare the outputs of the redundant logic and functional logic during Lock mode operation. Split mode operation disables the comparator logic.

For Split/Lock, an additional input signal, CFGSLSPLIT, must be driven to select Split mode or Lock mode. If Lock mode is selected, all the DCLS signals must be driven in addition to CFGSLSPLIT. If Split mode is selected, only CLKINDCLS must be driven in addition to CFGSLSPLIT. For more information on DCLS signals, see A.15 DCLS signals on page Appx-A-585.

If you are implementing Split/Lock configuration, contact Arm for more information.
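The Split/Lock rules above, together with Table 1-2, can be summarized in a short C model. The model assumes that CFGSLSPLIT is active-high, so that 1 selects Split mode. The text does not state the polarity. The struct and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    unsigned n;      /* physical cores N: 2 or 4 */
    bool cfgslsplit; /* models CFGSLSPLIT; assumed true = Split mode */
} splitlock_cfg;

/* Cores usable by software: SPLIT_N = N in Split mode, LOCK_N = N/2 in Lock mode. */
unsigned usable_cores(splitlock_cfg c) {
    return c.cfgslsplit ? c.n : c.n / 2;
}

/* In Lock mode, only the lower-order cores (index < N/2) are logically present. */
bool core_present(splitlock_cfg c, unsigned core) {
    return core < usable_cores(c);
}

/* Comparators, if implemented, check outputs only in Lock mode. */
bool comparators_active(splitlock_cfg c) { return !c.cfgslsplit; }

/* Inputs that must be driven in addition to CFGSLSPLIT. */
const char *extra_inputs(splitlock_cfg c) {
    return c.cfgslsplit ? "CLKINDCLS" : "all DCLS signals";
}
```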

Related reference

1.2 Component blocks on page 1-18


The Cortex-R52 branch prediction mechanisms detect branches at an early stage in the pipeline and redirect instruction fetching to the appropriate address immediately, rather than waiting for the branch to reach the end of the pipeline. However, not all branches are predicted in this way.

Branch Target Address Cache

The PFU contains a 16-entry Branch Target Address Cache (BTAC) to predict the target address of indirect branches (except for subroutine returns). The BTAC implementation is architecturally transparent, so it does not have to be flushed on a context switch.
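A 16-entry BTAC can be sketched as a small table that maps the address of an indirect branch to its last observed target. The text gives only the entry count. The direct-mapped organization, the index bits, and the full-PC tag used here are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTAC_ENTRIES 16u /* entry count from the text */

typedef struct {
    bool valid;
    uint32_t branch_pc; /* full PC kept as the tag (assumption) */
    uint32_t target;
} btac_entry;

typedef struct { btac_entry e[BTAC_ENTRIES]; } btac;

/* Assumed index: word-aligned PC bits [5:2]. */
static unsigned btac_index(uint32_t pc) { return (pc >> 2) % BTAC_ENTRIES; }

/* Look up a predicted target for the branch at pc. Returns false on a miss. */
bool btac_lookup(const btac *b, uint32_t pc, uint32_t *target) {
    const btac_entry *ent = &b->e[btac_index(pc)];
    if (ent->valid && ent->branch_pc == pc) {
        *target = ent->target;
        return true;
    }
    return false;
}

/* After an indirect branch resolves, record its actual target. */
void btac_update(btac *b, uint32_t pc, uint32_t target) {
    btac_entry *ent = &b->e[btac_index(pc)];
    ent->valid = true;
    ent->branch_pc = pc;
    ent->target = target;
}
```

A lookup only supplies a prediction, and the prediction is checked when the branch resolves. A stale entry therefore costs time but never changes program results, which is consistent with the structure being architecturally transparent.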

Branch predictor

The branch predictor is a global predictor that uses branch history registers and a 2048-entry pattern history prediction table.
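A global predictor combines recent branch outcomes (the global history) with the branch address to select a counter in the pattern history table. The sketch below uses the 2048-entry table size from the text. The gshare-style XOR index, the 11-bit history length, and the 2-bit saturating counters are common textbook choices assumed here. The TRM does not describe them.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHT_ENTRIES 2048u           /* table size from the text */
#define PHT_MASK (PHT_ENTRIES - 1u) /* 11-bit index */

typedef struct {
    uint32_t ghr;             /* global history: one bit per recent branch */
    uint8_t pht[PHT_ENTRIES]; /* assumed 2-bit saturating counters (0-3) */
} global_predictor;

/* Assumed gshare-style index: PC bits XOR global history. */
static unsigned pht_index(const global_predictor *p, uint32_t pc) {
    return ((pc >> 1) ^ p->ghr) & PHT_MASK;
}

/* Predict taken when the selected counter is 2 or 3. */
bool predict_taken(const global_predictor *p, uint32_t pc) {
    return p->pht[pht_index(p, pc)] >= 2;
}

/* Train the selected counter with the resolved outcome, then shift it into history. */
void train(global_predictor *p, uint32_t pc, bool taken) {
    uint8_t *c = &p->pht[pht_index(p, pc)];
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    p->ghr = ((p->ghr << 1) | (taken ? 1u : 0u)) & PHT_MASK;
}
```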

Return stack

The PFU includes an 8-entry call-return stack to accelerate returns from subroutine calls. For each subroutine call, the return address is pushed onto a hardware stack. When a subroutine return is recognized, the address held in the return stack is popped, and the PFU uses it as the predicted return address. The return stack is architecturally transparent, so it does not have to be flushed on a context switch.
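The call-return stack behaves like a small LIFO: calls push and recognized returns pop. The sketch uses the 8-entry depth from the text. Overflow behavior is not described here, so overwriting the oldest entry is an assumption. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 8u /* depth from the text */

typedef struct {
    uint32_t addr[RS_DEPTH];
    unsigned top;   /* index of the next free slot, modulo RS_DEPTH */
    unsigned count; /* valid entries, saturating at RS_DEPTH */
} return_stack;

/* On a subroutine call, push the return address. When the stack is full,
 * the oldest entry is overwritten (assumed overflow policy). */
void rs_push(return_stack *rs, uint32_t return_addr) {
    rs->addr[rs->top] = return_addr;
    rs->top = (rs->top + 1) % RS_DEPTH;
    if (rs->count < RS_DEPTH) rs->count++;
}

/* On a recognized return, pop the predicted return address.
 * Returns false when the stack is empty (no prediction). */
bool rs_pop(return_stack *rs, uint32_t *predicted) {
    if (rs->count == 0) return false;
    rs->top = (rs->top + RS_DEPTH - 1) % RS_DEPTH;
    rs->count--;
    *predicted = rs->addr[rs->top];
    return true;
}
```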

Exception Target Address Cache

The Exception Target Address Cache (ETAC) reduces the best-case latency of IRQ and FIQ exceptions by caching the address of the generic handler for these exceptions.

The ETAC is enabled out of reset. Writing 1 to the CPUACTLR.ETACDIS bit disables the ETAC.

The ETAC supports caching of Interrupt (IRQ) and Fast Interrupt (FIQ) vector entries only. Other types of exceptions do not allocate or hit in the ETAC, because a fast response to IRQ and FIQ exceptions is most critical in real-time systems.

A vector is only cached in the ETAC if the vector is in a TCM. A vector located in any other type of memory never allocates or hits in the ETAC, because the TCMs are the only memories with a perfect response, that is, a fixed access latency with no misses. Other memories can be subject to cache misses, and in these cases the savings that the ETAC offers are minimal compared to the latency of the cache miss.

The ETAC only caches the vector corresponding to the IRQ or FIQ exception if the instruction in the vector table is a compatible instruction. Compatible instructions are all encodings of B #immed. If the exception vector is not a compatible instruction, the ETAC does not cache that exception.

IRQ and FIQ exceptions can be taken to either Exception Level EL1 or EL2, depending on the Exception Level at the time of the interrupt and the values of HCR.IMO and HCR.FMO. The ETAC supports IRQ and FIQ exceptions taken to EL1 and to EL2 independently, so there are four independent entries, one for each of these cases.
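The ETAC rules in this section can be collected into one eligibility check and a selector for the four independent entries. This sketch is simplified. Interrupt routing considers only the current Exception Level, HCR.IMO, and HCR.FMO, and ignores masking and any other routing controls. The compatible-instruction check decodes only the A32 encoding of B #immed, which is an assumption, because other encodings of B also qualify. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EXC_IRQ, EXC_FIQ, EXC_OTHER } exc_kind;

/* Simplified routing for IRQ/FIQ only: an interrupt taken at EL2 stays at
 * EL2. From EL0/EL1, it goes to EL2 when HCR.IMO (IRQ) or HCR.FMO (FIQ) is set. */
unsigned target_el(exc_kind k, unsigned current_el, bool imo, bool fmo) {
    if (current_el == 2) return 2;
    bool route_to_el2 = (k == EXC_FIQ) ? fmo : imo;
    return route_to_el2 ? 2 : 1;
}

/* A32 B #imm: condition field not 0b1111, and bits [27:24] == 0b1010. */
static bool is_a32_b_imm(uint32_t insn) {
    return (insn & 0x0F000000u) == 0x0A000000u && (insn >> 28) != 0xFu;
}

/* Whether a vector may allocate in, or hit in, the ETAC. */
bool etac_eligible(exc_kind k, bool etacdis, bool vector_in_tcm,
                   uint32_t vector_insn) {
    return !etacdis && (k == EXC_IRQ || k == EXC_FIQ) && vector_in_tcm &&
           is_a32_b_imm(vector_insn);
}

/* One of four independent entries: {IRQ, FIQ} x {EL1, EL2}. */
unsigned etac_slot(exc_kind k, unsigned el) {
    return (k == EXC_FIQ ? 2u : 0u) + (el == 2 ? 1u : 0u);
}
```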

For more information on:

• CPUACTLR, see 3.3.19 CPU Auxiliary Control Register on page 3-90.

• HCR, see 3.3.39 Hyp Configuration Register on page 3-111.

1.2.2 Advanced SIMD and floating-point support

The Advanced SIMD and floating-point feature of each core uses NEON™ technology, a SIMD architecture.

The Advanced SIMD and floating-point feature provides:

• Instructions for single-precision (C programming language float type) data-processing operations.

• Optional instructions for double-precision (C double type) data-processing operations.

• Combined Multiply and Accumulate instructions for increased precision (Fused MAC).

• Hardware support for conversion, addition, subtraction, multiplication with optional accumulate, division, and square-root.

• Hardware support for denormals and all IEEE Standard 754-2008 rounding modes.

• For single-precision floating-point only, there are 32 32-bit single-precision registers, which can also be used as 16 64-bit double-precision registers. If the optional double-precision and Advanced SIMD instructions are included, a total of 32 64-bit double-precision registers, or 16 128-bit registers, are available.
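The register counts above describe different views of one register bank. Each pair of single-precision registers forms one double-precision register, and each pair of double-precision registers forms one 128-bit register (32 × 32 bits = 16 × 64 bits, and 32 × 64 bits = 16 × 128 bits). The helper names in this sketch are hypothetical. The mapping follows the standard AArch32 S, D, and Q register views.

```c
/* Location of a narrower register inside a wider one. */
typedef struct {
    unsigned reg;  /* index of the containing wider register */
    unsigned half; /* 0 = low half, 1 = high half */
} alias;

/* S<2n> and S<2n+1> are the low and high halves of D<n> (S0-S31 -> D0-D15). */
alias s_to_d(unsigned s) { alias a = { s / 2, s % 2 }; return a; }

/* D<2n> and D<2n+1> are the low and high halves of Q<n> (D0-D31 -> Q0-Q15).
 * D16-D31 exist only when double-precision and Advanced SIMD are included. */
alias d_to_q(unsigned d) { alias a = { d / 2, d % 2 }; return a; }
```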


Related reference

Chapter 15 Advanced SIMD and floating-point support on page 15-527

1.2.3 GIC Distributor

The GIC Distributor receives, prioritizes, and routes physical interrupts to the appropriate interrupt target.

The output of the GIC Distributor is the highest priority pending interrupt for each interrupt target. An interrupt target is either the GIC CPU interface for a core or an export port for connection to an external device such as a Direct Memory Access (DMA) controller.
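The per-target selection that the Distributor performs can be sketched as a scan for the pending, enabled interrupt with the numerically lowest priority value. In the GIC, lower values mean higher priority. The data layout is hypothetical, and breaking ties by the lowest interrupt ID is an assumption made only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned id;      /* interrupt ID */
    uint8_t priority; /* lower value = higher priority */
    bool pending;
    bool enabled;
    unsigned target;  /* index of the interrupt target */
} irq_state;

/* Returns the ID of the highest-priority pending, enabled interrupt routed
 * to target, or -1 if there is none. Ties go to the lowest ID (assumption). */
int highest_pending(const irq_state *irqs, unsigned n, unsigned target) {
    int best = -1;
    uint8_t best_prio = 0xFF;
    for (unsigned i = 0; i < n; i++) {
        const irq_state *s = &irqs[i];
        if (!s->pending || !s->enabled || s->target != target) continue;
        if (best < 0 || s->priority < best_prio ||
            (s->priority == best_prio && (int)s->id < best)) {
            best = (int)s->id;
            best_prio = s->priority;
        }
    }
    return best;
}
```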

Related reference

Chapter 9 Generic Interrupt Controller on page 9-259

1.2.4 GIC CPU interface

The GIC CPU interfaces handle interrupt preemption for both physical and virtual interrupts for each core.

The virtual part of each GIC CPU interface is divided into hypervisor registers and guest OS registers. The hypervisor uses the GIC CPU interface to generate virtual interrupts for the guest OS.
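Preemption in a CPU interface can be sketched with the general GICv3 model. An interrupt is signalled only if its priority is higher than the priority mask. It preempts the active handler only if its group priority (the priority bits above the binary point) is higher than the current running priority. This is a simplified sketch. It assumes ICC_BPR0-style binary-point semantics and treats the running priority as a group priority. Chapter 9 gives the Cortex-R52 specifics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Group priority: with binary point value bpr (0-7), bits [7:bpr+1] of the
 * priority form the group priority (assumed ICC_BPR0-style semantics). */
static uint8_t group_priority(uint8_t prio, unsigned bpr) {
    return (uint8_t)(prio & (0xFFu << (bpr + 1)));
}

/* True if a pending interrupt passes the priority mask (pmr) and its group
 * priority is higher (numerically lower) than the running priority. A
 * running priority of 0xFF means that no interrupt is active. */
bool can_preempt(uint8_t pending_prio, uint8_t pmr, uint8_t running_prio,
                 unsigned bpr) {
    return pending_prio < pmr &&
           group_priority(pending_prio, bpr) < running_prio;
}
```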

Related reference

Chapter 9 Generic Interrupt Controller on page 9-259

1.2.5 Memory system

The Cortex-R52 memory system provides different memories and interfaces depending on your implementation.

For use by contexts without strong real-time requirements, each Cortex-R52 core has a dedicated 128-bit AXIM interface for memory, instruction and data, and peripheral access.

For use by real-time contexts, each Cortex-R52 core can also have:

• Three unified TCMs, each 8KB-1MB, providing lowest-latency access for instructions and data.

• An optional 32-bit AXI4 LLPP interface for device data accesses to private peripherals.

• A 128-bit read-only Flash interface.

• ECC protection for all TCM and flash memories, providing single error correction and double error detection (SECDED).

• TCM access for DMA through the AXIS interface.

• TCM testing using the MBIST interface.
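SECDED means that a single flipped bit can be corrected and two flipped bits can be detected. The toy C code below shows the idea with an extended Hamming code over 4 data bits (Hamming(7,4) plus an overall parity bit). This is not the code that the Cortex-R52 uses. This section does not give the processor's ECC granule or check-bit layout, and the function names are hypothetical.

```c
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_DOUBLE_ERROR } ecc_status;

static unsigned bit(unsigned v, unsigned pos) { return (v >> pos) & 1u; }

/* Encode 4 data bits into 8 bits. Positions 1-7 hold a Hamming(7,4) codeword
 * (parity at 1, 2, 4; data at 3, 5, 6, 7). Bit 0 holds the overall parity
 * that extends single-error correction to double-error detection. */
uint8_t secded_encode(uint8_t data) {
    unsigned d1 = bit(data, 0), d2 = bit(data, 1), d3 = bit(data, 2), d4 = bit(data, 3);
    unsigned p1 = d1 ^ d2 ^ d4; /* covers positions 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4; /* covers positions 3, 6, 7 */
    unsigned p4 = d2 ^ d3 ^ d4; /* covers positions 5, 6, 7 */
    unsigned cw = (p1 << 1) | (p2 << 2) | (d1 << 3) | (p4 << 4) |
                  (d2 << 5) | (d3 << 6) | (d4 << 7);
    unsigned overall = 0;
    for (unsigned i = 1; i < 8; i++) overall ^= bit(cw, i);
    return (uint8_t)(cw | overall);
}

/* Decode a codeword. On ECC_OK or ECC_CORRECTED, *data receives the 4 data
 * bits. On ECC_DOUBLE_ERROR, *data is left unchanged. */
ecc_status secded_decode(uint8_t cw, uint8_t *data) {
    unsigned syndrome = 0, parity = 0;
    for (unsigned i = 1; i < 8; i++)
        if (bit(cw, i)) syndrome ^= i; /* XOR of positions of set bits */
    for (unsigned i = 0; i < 8; i++) parity ^= bit(cw, i);
    ecc_status st = ECC_OK;
    if (syndrome != 0 && parity == 0) return ECC_DOUBLE_ERROR;
    if (parity != 0) { /* odd number of flips: assume exactly one */
        if (syndrome != 0) cw ^= (uint8_t)(1u << syndrome);
        else cw ^= 1u; /* the overall parity bit itself flipped */
        st = ECC_CORRECTED;
    }
    *data = (uint8_t)(bit(cw, 3) | (bit(cw, 5) << 1) | (bit(cw, 6) << 2) |
                      (bit(cw, 7) << 3));
    return st;
}
```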

Note

A real-time context can also access the AXIM interface, although such an access might not be desirable depending on the system design.

Each Cortex-R52 core has optional Harvard caches, which can be used to cache data from the Flash interface and the AXIM interface. The cache behavior depends on the memory attributes.

Each core has:

• Store buffer with merging and forwarding (as appropriate) for stores.

• 4-way instruction cache of 4-32KB.

• Instruction linefill buffering.

• 4-way data cache of 4-32KB with Write-Through behavior.

• Data read buffers.

• ECC protection for all cache memories (including tag RAM).

• 64-bit datapath for loads and stores to caches.

• Cache maintenance operations according to Arm architecture.

• Cache memory testing using the MBIST interface.
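Cache geometry follows from size, associativity, and line length: sets = size / (ways × line length). The sketch below uses the 4-way associativity and 4-32KB sizes from the text. The 64-byte line length is an assumption for illustration, so the set counts and address fields change if the real line length differs.

```c
#include <stdint.h>

#define WAYS 4u        /* 4-way, from the text */
#define LINE_BYTES 64u /* assumed line length for this sketch */

typedef struct { uint32_t tag, set, offset; } cache_addr;

/* Number of sets for a cache of size_bytes (4KB-32KB, non-zero). */
uint32_t num_sets(uint32_t size_bytes) {
    return size_bytes / (WAYS * LINE_BYTES);
}

/* Split an address into tag, set index, and byte offset within the line. */
cache_addr split_addr(uint32_t addr, uint32_t size_bytes) {
    uint32_t sets = num_sets(size_bytes);
    cache_addr a;
    a.offset = addr % LINE_BYTES;
    a.set = (addr / LINE_BYTES) % sets;
    a.tag = addr / (LINE_BYTES * sets);
    return a;
}
```

Under this assumption, a 32KB cache has 128 sets, so an address splits into a 6-bit offset, a 7-bit set index, and the remaining upper bits as the tag.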

