TMS320C6657 Dual-Core DSP Technical Manual: Power Management and Task Scheduling

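"TMS320C6657 Data Sheet". The TMS320C6657 is a high-performance dual-core digital signal processor (DSP) from Texas Instruments, designed for both fixed-point and floating-point arithmetic. The data sheet was revised more than once in 2012. It carries current production data and falls under the Texas Instruments standard warranty terms: production processing does not necessarily include testing of all parameters, but products conform to specifications. The data sheet covers the main features of the TMS320C6655/57, including the processor architecture, performance, power management, and task scheduling. The TMS320C6657 in particular has a dual-core configuration, which suits parallel and compute-intensive workloads.

The August 2012 revision (SPRS814A) updated the following:

1. **Tracer description**: likely covers improvements to the debug tooling and trace hardware, in support of more efficient debugging and performance analysis.
2. **McBSP (Multichannel Buffered Serial Port) timing requirements table**: updated the timing parameters of the multichannel buffered serial interface.
3. **Thermal characteristics data**: likely adds information on heat dissipation and temperature management for stable operation across operating conditions.
4. **DDR3 EMIF data in the memory map summary table**: added a footnote describing the interface configuration for double data rate synchronous dynamic random-access memory (DDR3 SDRAM).
5. **SmartReflex voltage parameters**: added CVDD and the SmartReflex voltage to the SmartReflex switching table. SmartReflex is a power-saving technology that adjusts the supply voltage dynamically to reduce power consumption.
6. **DDR3 PLL initialization sequence**: removed from the data sheet and moved to the PLL controller user guide, which suggests that PLL (phase-locked loop) configuration and initialization are covered in more detail in that separate document.

The initial version (SPRS814), released in March 2012, marked the first formal release of the TMS320C6657 and contained the basic design and technical specifications of the processor. The processor family targets applications such as audio and video processing, network infrastructure, medical imaging, industrial automation, and advanced driver-assistance systems. Its processing power and flexible power management make it a good fit for embedded systems that need high-performance computing.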

SPRS814A—August 2012

Fixed and Floating-Point Digital Signal Processor

TMS320C6655/57

www.ti.com

Submit Documentation Feedback

1.3 Functional Block Diagram

Figure 1-1 shows the functional block diagram of the device.

Figure 1-1 Functional Block Diagram

[Figure 1-1 image not reproduced. Blocks labeled in the diagram (TCI6655/57): C66x CorePac, 1 or 2 cores @ up to 1.25 GHz (2nd core, C6657 only), with 32KB L1 P-Cache, 32KB L1 D-Cache, and 1024KB L2 Cache; Coprocessors: VCP2 ×2, TCP3d; Memory Subsystem: MSMC, 1MB MSM SRAM, 32-Bit DDR3 EMIF; Multicore Navigator: Queue Manager, Packet DMA; TeraNet; HyperLink; EDMA; PLL; Boot ROM; Debug & Trace; Power Management; Semaphore; Security / Key Manager; Timers; Ethernet MAC with SGMII; SRIO ×4; PCIe ×2; SPI; UART ×2; UPP; McBSP ×2; GPIO; EMIF16.]


2 Device Overview

2.1 Device Characteristics

Table 2-1 Characteristics of the TMS320C6655/57 Processor

| Category | HARDWARE FEATURES | TMS320C6655 | TMS320C6657 |
|---|---|---|---|
| Peripherals | DDR3 Memory Controller (32-bit bus width) [1.5 V I/O] (clock source = DDRREFCLKN\|P) | | |
| | DDR3 Maximum Data Rate (MT/s) | 1333 | 1333 |
| | EDMA3 (64 independent channels) [DSP/3 clock rate] | 1 | 1 |
| | High-speed 1×/2×/4× Serial RapidIO Port (4 lanes) | 1 | 1 |
| | PCIe (2 lanes) | 1 | 1 |
| | 10/100/1000 Ethernet | 1 | 1 |
| | Management Data Input/Output (MDIO) | 1 | 1 |
| | HyperLink | 1 | 1 |
| | EMIF16 | 1 | 1 |
| | McBSP | 2 | 2 |
| | SPI | 1 | 1 |
| | UART | 2 | 2 |
| | uPP | 1 | 1 |
| | I²C | 1 | 1 |
| | 64-Bit Timers (configurable) (internal clock source = CPU/6 clock frequency) | 8 (each configurable as two 32-bit timers) | 8 (each configurable as two 32-bit timers) |
| | General-Purpose Input/Output port (GPIO) | 32 | 32 |
| Encoder/Decoder Coprocessors | VCP2 (clock source = CPU/3 clock frequency) | 2 | 2 |
| | TCP3d (clock source = CPU/2 clock frequency) | 1 | 1 |
| On-Chip Memory | CorePac Memory | 32KB L1 Program Memory [SRAM/Cache]; 32KB L1 Data Memory [SRAM/Cache]; 1024KB L2 Unified Memory/Cache | 32KB L1 Program Memory [SRAM/Cache]; 32KB L1 Data Memory [SRAM/Cache]; 1024KB L2 Unified Memory/Cache |
| | ROM Memory | 128KB L3 ROM | 128KB L3 ROM |
| | Multicore Shared Memory | 1024KB MSM SRAM | 1024KB MSM SRAM |
| C66x CorePac Revision ID | CorePac Revision ID Register (address location: 0181 2000h) | See Section 5.5 "C66x CorePac Revision" on page 103 | See Section 5.5 "C66x CorePac Revision" on page 103 |
| JTAG BSDL_ID | JTAGID register (address location: 0262 0018h) | See Section 3.3.3 "JTAG ID (JTAGID) Register Description" on page 72 | See Section 3.3.3 "JTAG ID (JTAGID) Register Description" on page 72 |
| Frequency | MHz | 1250 (1.25 GHz) | 1250 (1.25 GHz) |
| | | 1000 (1.0 GHz) | 1000 (1.0 GHz) |
| | | - | 850 (0.85 GHz) |
| Cycle Time | ns | 0.8 (1.25 GHz) | 0.8 (1.25 GHz) |
| | | 1 (1.0 GHz) | 1 (1.0 GHz) |
| | | - | 1.175 (0.85 GHz) |
| Voltage | Core (V) | SmartReflex variable supply | SmartReflex variable supply |
| | I/O (V) | 1.0 V, 1.5 V, and 1.8 V | 1.0 V, 1.5 V, and 1.8 V |
| Process Technology | μm | 0.040 μm | 0.040 μm |
| BGA Package | 21 mm × 21 mm | 625-Pin Flip-Chip Plastic BGA (CZH or GZH) | 625-Pin Flip-Chip Plastic BGA (CZH or GZH) |
| Product Status (1) | Production Data (PD) | PD | PD |

(1) PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of Texas Instruments standard warranty. Production processing does not necessarily include testing of all parameters.
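The clock-related entries in Table 2-1 reduce to simple arithmetic: cycle time is the reciprocal of the CPU frequency, and several blocks run from a divided CPU clock. The following is a minimal Python sketch of that arithmetic. The function names and the `DIVIDERS` dictionary are illustrative and are not TI identifiers, and reading the EDMA3 "DSP/3" rate as CPU/3 is an assumption.

```python
# Divider values taken from Table 2-1; the dictionary itself is illustrative.
# Assumption: the EDMA3 "DSP/3 clock rate" is the CPU clock divided by 3.
DIVIDERS = {
    "EDMA3": 3,
    "VCP2": 3,
    "TCP3d": 2,
    "64-bit timers": 6,
}


def cycle_time_ns(freq_mhz):
    """Return the clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def derived_clock_mhz(cpu_mhz, divider):
    """Return the clock of a block that runs at CPU clock / divider."""
    return cpu_mhz / divider


for cpu_mhz in (1250, 1000, 850):
    print(f"{cpu_mhz} MHz CPU: cycle time {cycle_time_ns(cpu_mhz):.3f} ns")
    for block, divider in DIVIDERS.items():
        print(f"  {block}: {derived_clock_mhz(cpu_mhz, divider):.2f} MHz")
```

The computed period at 850 MHz is about 1.176 ns, whereas the table lists 1.175 ns, so the table value appears to be truncated or rounded differently.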


2.2 DSP Core Description

The C66x Digital Signal Processor (DSP) extends the performance of the C64x+ and C674x DSPs through enhancements and new features. Many of the new features target increased performance for vector processing. The C64x+ and C674x DSPs support 2-way SIMD operations for 16-bit data and 4-way SIMD operations for 8-bit data. On the C66x DSP, the vector processing capability is improved by extending the width of the SIMD instructions. C66x DSPs can execute instructions that operate on 128-bit vectors. For example, the QMPY32 instruction is able to perform the element-to-element multiplication between two vectors of four 32-bit data each. The C66x DSP also supports SIMD for floating-point operations. Improved vector processing capability (each instruction can process multiple data in parallel) combined with the natural instruction-level parallelism of the C6000 architecture (e.g., execution of up to 8 instructions per cycle) results in a very high level of parallelism that can be exploited by DSP programmers through the use of TI's optimized C/C++ compiler.
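To make the QMPY32 description concrete, the following Python sketch models an element-to-element multiply of two vectors of four signed 32-bit values. It is a behavioral illustration only: the function names are hypothetical, and keeping just the low 32 bits of each product is an assumption, since the text above does not state the result width.

```python
def to_int32(value):
    """Wrap an integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def qmpy32_sketch(a, b):
    """Element-wise product of two 4-lane vectors of signed 32-bit integers.

    Assumption: each lane keeps only the low 32 bits of its product.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each operand must hold exactly four 32-bit lanes")
    return [to_int32(to_int32(x) * to_int32(y)) for x, y in zip(a, b)]


result = qmpy32_sketch([1, -2, 3, 70000], [10, 20, -30, 70000])
print(result)  # [10, -40, -90, 605032704]
```

The last lane shows the wrap-around: 70000 × 70000 exceeds 32 bits, so only the low 32 bits survive under the stated assumption.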

The C66x DSP consists of eight functional units, two register files, and two data paths as shown in Figure 2-1. The two general-purpose register files (A and B) each contain 32 32-bit registers for a total of 64 registers. The general-purpose registers can be used for data or can be data address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit data, 40-bit data, and 64-bit data. Multiplies also support 128-bit data. 40-bit-long or 64-bit-long values are stored in register pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or 32 MSBs in the next upper register (which is always an odd-numbered register). 128-bit data values are stored in register quadruplets, with the 32 LSBs of data placed in a register that is a multiple of 4 and the remaining 96 MSBs in the next 3 upper registers.
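The register-pair and register-quadruplet layout described above can be sketched in Python as follows. The function name and the use of plain register indices (0 to 31 within one register file) are illustrative assumptions, not part of any TI tool.

```python
def split_into_registers(value, bits, base_reg):
    """Map a 40-, 64-, or 128-bit value onto consecutive 32-bit registers.

    Returns {register_index: 32-bit word}, least significant word first.
    40/64-bit values need an even base register (a pair); 128-bit values
    need a base register that is a multiple of 4 (a quadruplet).
    """
    if bits in (40, 64):
        words, alignment = 2, 2
    elif bits == 128:
        words, alignment = 4, 4
    else:
        raise ValueError("supported widths are 40, 64, and 128 bits")
    if base_reg % alignment != 0:
        raise ValueError(f"base register must be a multiple of {alignment}")
    if base_reg < 0 or base_reg + words > 32:
        raise ValueError("registers must lie within one 32-register file")
    value &= (1 << bits) - 1  # keep only the declared width
    return {base_reg + i: (value >> (32 * i)) & 0xFFFFFFFF for i in range(words)}


pair = split_into_registers(0x1122334455667788, 64, 4)
print({reg: hex(word) for reg, word in pair.items()})
# {4: '0x55667788', 5: '0x11223344'}
```

For a 40-bit value the upper register of the pair holds only the remaining 8 MSBs, for example `split_into_registers(0xAB12345678, 40, 2)` places `0x12345678` in register 2 and `0xAB` in register 3.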

The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from memory to the register file and store results from the register file into memory.
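As a small illustration of the unit layout just described, the Python sketch below records each functional unit with its role and derives the peak issue rate. The dictionary and function names are hypothetical, and the per-second figure is a theoretical ceiling that assumes every unit issues on every cycle.

```python
# Roles as described in the text; the data structure itself is illustrative.
FUNCTIONAL_UNITS = {
    ".M1": "multiply",
    ".M2": "multiply",
    ".L1": "arithmetic, logical, and branch",
    ".L2": "arithmetic, logical, and branch",
    ".S1": "arithmetic, logical, and branch",
    ".S2": "arithmetic, logical, and branch",
    ".D1": "load and store",
    ".D2": "load and store",
}


def peak_instructions_per_cycle():
    """Each unit can execute one instruction per clock cycle."""
    return len(FUNCTIONAL_UNITS)


def peak_instructions_per_second(cpu_hz):
    """Theoretical ceiling for one core; real code rarely fills all units."""
    return peak_instructions_per_cycle() * cpu_hz


print(peak_instructions_per_cycle())             # 8
print(peak_instructions_per_second(1.25e9))      # 10000000000.0
```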

Each C66x .M unit can perform one of the following fixed-point operations each clock cycle: four 32 × 32 bit multiplies, sixteen 16 × 16 bit multiplies, four 16 × 32 bit multiplies, four 8 × 8 bit multiplies, four 8 × 8 bit multiplies with add operations, and four 16 × 16 multiplies with add/subtract capabilities. There is also support for Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and modems require complex multiplication. Each C66x .M unit can perform one 16 × 16 bit complex multiply with or without rounding capabilities, two 16 × 16 bit complex multiplies with rounding capability, and a 32 × 32 bit complex multiply with rounding capability. The C66x can also perform two 16 × 16 bit and one 32 × 32 bit complex multiply instructions that multiply a complex number with a complex conjugate of another number with rounding capability.
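The complex multiplies mentioned above decompose into real multiplies and adds. The Python sketch below shows that decomposition for integer operands, including the variant that multiplies by the complex conjugate of the second operand. It is a mathematical illustration under stated assumptions: the function name is hypothetical, and the rounding, scaling, and saturation behavior of the actual instructions is not modeled because the text does not define it.

```python
def complex_multiply(a, b, conjugate_b=False):
    """Multiply two complex integers given as (real, imag) tuples.

    Uses four real multiplies and two adds/subtracts.
    If conjugate_b is True, b is replaced by its complex conjugate first.
    No rounding or saturation is applied (not modeled in this sketch).
    """
    a_re, a_im = a
    b_re, b_im = b
    if conjugate_b:
        b_im = -b_im
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return (real, imag)


print(complex_multiply((3, 4), (1, -2)))                    # (11, -2)
print(complex_multiply((3, 4), (1, -2), conjugate_b=True))  # (-5, 10)
```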

Communication signal processing also requires an extensive use of matrix operations. Each C66x .M unit is capable of multiplying a [1 × 2] complex vector by a [2 × 2] complex matrix per cycle with or without rounding capability. A version also exists allowing multiplication of the conjugate of a [1 × 2] vector with a [2 × 2] complex matrix.
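The vector-by-matrix operation described above can be written out explicitly. The following Python sketch uses the built-in `complex` type to show the arithmetic for a [1 × 2] vector and a [2 × 2] matrix, with an optional conjugate of the vector. The function name is hypothetical, and fixed-point formats and rounding are not modeled.

```python
def vec_mat_multiply(vec, mat, conjugate_vec=False):
    """Multiply a [1 x 2] complex vector by a [2 x 2] complex matrix.

    vec is [v0, v1]; mat is [[m00, m01], [m10, m11]] (row-major).
    Returns the [1 x 2] result [v0*m00 + v1*m10, v0*m01 + v1*m11].
    """
    if conjugate_vec:
        vec = [v.conjugate() for v in vec]
    return [vec[0] * mat[0][col] + vec[1] * mat[1][col] for col in range(2)]


vector = [1 + 2j, 3 - 1j]
matrix = [[1 + 0j, 0 + 1j],
          [2 - 1j, 1 + 1j]]

print(vec_mat_multiply(vector, matrix))                      # [(6-3j), (2+3j)]
print(vec_mat_multiply(vector, matrix, conjugate_vec=True))  # [(8-3j), (4+5j)]
```

The operation needs four complex multiplies, and each complex multiply takes four real multiplies, for sixteen real multiplies in total. That count matches the sixteen 16 × 16 bit multiplies per cycle quoted for an .M unit, although the text does not state that the two figures are linked.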

Each C66x .M unit also includes IEEE floating-point multiplication operations from the C674x DSP, which includes one single-precision multiply each cycle and one double-precision multiply every 4 cycles. There is also a mixed-precision multiply that allows multiplication of a single-precision value by a double-precision value and an operation allowing multiplication of two single-precision numbers resulting in a double-precision number. The C66x DSP improves the performance over the C674x double-precision multiplies by adding an instruction allowing one double-precision multiply per cycle and also reduces the number of delay slots from 10 down to 4. Each C66x .M unit can also perform one of the following floating-point operations each clock cycle: one, two, or four single-precision multiplies or a complex single-precision multiply.
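To gauge what the double-precision improvement means in practice, the Python sketch below estimates the cycles needed for a run of independent double-precision multiplies on a single .M unit. It uses a deliberately simple pipeline model that is an assumption of this sketch, not a statement from the data sheet: a new multiply issues every `issue_interval` cycles, and the last result is available `delay_slots` cycles after its issue cycle. Pairing the 4-cycle issue interval with 10 delay slots for the C674x-style multiply is likewise an assumption.

```python
def cycles_for_multiplies(count, issue_interval, delay_slots):
    """Estimate cycles to complete `count` independent multiplies on one unit.

    Simple model (assumption): one multiply issues every `issue_interval`
    cycles; each result is ready `delay_slots` cycles after its issue cycle.
    """
    if count <= 0:
        return 0
    return (count - 1) * issue_interval + 1 + delay_slots


n = 100
c674x_style = cycles_for_multiplies(n, issue_interval=4, delay_slots=10)
c66x_style = cycles_for_multiplies(n, issue_interval=1, delay_slots=4)
print(c674x_style, c66x_style)  # 407 104
```

Under this model, long runs approach a 4× difference, and the shorter delay mainly helps short runs and dependent chains.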

The .L and .S units can now support up to 64-bit operands. This allows for new versions of many of the arithmetic, logical, and data packing instructions to allow for more parallel operations per cycle. Additional instructions were added yielding performance enhancements of the floating-point addition and subtraction instructions, including the ability to perform one double-precision addition or subtraction per cycle. Conversion to/from integer and single-precision values can now be done on both .L and .S units on the C66x. Also, by taking advantage of the larger
