TMS320C6678: Multicore Fixed and Floating-Point Digital Signal Processor Data Manual


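This is the data manual for the TMS320C6678 multicore fixed and floating-point digital signal processor, published by Texas Instruments in February 2012. It contains production data for the device and covers the processor's revision history, features, hardware interfaces, and configuration parameters.

The TMS320C6678 is a high-performance digital signal processor from Texas Instruments (TI). It is designed for complex real-time computing tasks in areas such as multimedia, communications, image processing, and embedded systems. Its multicore architecture runs several processing tasks at once, which gives it substantial parallel computing capability.

Revision SPRS691C, released in February 2012, made the following updates and changes:

1. Added the TeraNet connection diagrams and added bridge numbers to the connection tables, documenting the on-chip interconnect in more detail.
2. Renamed TPCC to EDMA3CC (EDMA3 Channel Controller) and TPTC to EDMA3TC (EDMA3 Transfer Controller).
3. Renamed the chip-level interrupt controller from INTC to CIC (Chip Interrupt Controller).
4. Added initialization sequences for the DDR3 PLL (phase-locked loop) and the PASS PLL, which are needed to bring up the corresponding clocks correctly.
5. Added a section describing the DEVSPEED register, which relates to the device's operating speed.
6. Updated the device frequency information.
7. Corrected the configuration/data memory-map addresses for SPI (Serial Peripheral Interface), DDR3, and HyperBridge.
8. Restricted the output divide value in the SECCTL (PLL Secondary Control) register to a maximum of divide-by-2.

Users can consult the data manual for current design and application information on the device.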

SPRS691C—February 2012

Multicore Fixed and Floating-Point Digital Signal Processor

TMS320C6678

www.ti.com

1.3 Functional Block Diagram

Figure 1-1 shows the functional block diagram of the TMS320C6678 device.

Figure 1-1 Functional Block Diagram

[Figure not reproduced. Blocks shown in the diagram: 8 C66x CorePac cores at up to 1.25 GHz, each with 32KB L1 P-Cache, 32KB L1 D-Cache, and 512KB L2 Cache; Memory Subsystem with MSMC, 4MB MSM SRAM, and 64-Bit DDR3 EMIF; Multicore Navigator with Queue Manager and Packet DMA; Network Coprocessor with Ethernet Switch, SGMII, Packet Accelerator, and Security Accelerator; TeraNet; HyperLink; SRIO ×4; PCIe ×2; TSIP ×2; UART; SPI; GPIO; EMIF 16; Boot ROM; Semaphore; Power Management; Debug & Trace; PLL ×3; EDMA ×3.]

2 Device Overview

2.1 Device Characteristics

Table 2-1 shows the significant features of the device.

Table 2-1 Device Characteristics

HARDWARE FEATURES: TMS320C6678

Peripherals
  DDR3 Memory Controller (64-bit bus width) [1.5 V I/O] (clock source = DDRREFCLKN|P)
  EDMA3 (16 independent channels) [DSP/2 clock rate]: 1
  EDMA3 (64 independent channels) [DSP/3 clock rate]: 2
  High-speed 1×/2×/4× Serial RapidIO Port (4 lanes): 1
  PCIe (2 lanes): 1
  10/100/1000 Ethernet: 2
  Management Data Input/Output (MDIO): 1
  HyperLink: 1
  EMIF16: 1
  TSIP: 2
  SPI: 1
  UART: 1
  I2C: 1
  64-Bit Timers (configurable) (internal clock source = CPU/6 clock frequency): Sixteen 64-bit (each configurable as two 32-bit timers)
  General-Purpose Input/Output Port (GPIO): 16

Accelerators
  Packet Accelerator: 1
  Security Accelerator (1)

On-Chip Memory
  Size (Bytes): 8832KB
  Organization:
    256KB L1 Program Memory [SRAM/Cache]
    256KB L1 Data Memory [SRAM/Cache]
    4096KB L2 Unified Memory/Cache
    4096KB MSM SRAM
    128KB L3 ROM

C66x CorePac Revision ID
  CorePac Revision ID Register (address location: 0181 2000h): See Section 5.5 "C66x CorePac Revision" on page 107.

JTAG BSDL_ID
  JTAGID register (address location: 0262 0018h): See Section 3.3.3 "JTAG ID (JTAGID) Register Description" on page 73.

Frequency (MHz)
  1250 (1.25 GHz)
  1000 (1.0 GHz)

Cycle Time (ns)
  0.8 ns (1.25 GHz)
  1 ns (1.0 GHz)

Voltage
  Core (V): SmartReflex variable supply
  I/O (V): 1.0 V, 1.5 V, and 1.8 V

Process Technology (μm): 0.040 μm

BGA Package (24 mm × 24 mm): 841-Pin Flip-Chip Plastic BGA (CYP)

Product Status (2)
  Product Preview (PP), Advance Information (AI), or Production Data (PD): PD

End of Table 2-1

(1) The Security Accelerator function is subject to export control and will be enabled only for approved device shipments.
(2) PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of Texas Instruments standard warranty. Production processing does not necessarily include testing of all parameters.
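The on-chip memory total and the cycle times in Table 2-1 can be cross-checked with simple arithmetic. The sketch below assumes the per-core sizes shown in Figure 1-1 (32KB L1P, 32KB L1D, 512KB L2) apply to each of the eight cores. All variable and function names are illustrative and do not come from the datasheet.

```python
# Cross-check of the Table 2-1 figures. Names are illustrative only.
CORES = 8

# Per-core sizes in KB, as shown in the functional block diagram.
L1P_PER_CORE_KB = 32
L1D_PER_CORE_KB = 32
L2_PER_CORE_KB = 512

# Shared memories in KB, as listed in Table 2-1.
MSM_SRAM_KB = 4096
L3_ROM_KB = 128

l1p_kb = CORES * L1P_PER_CORE_KB   # 256
l1d_kb = CORES * L1D_PER_CORE_KB   # 256
l2_kb = CORES * L2_PER_CORE_KB     # 4096
total_kb = l1p_kb + l1d_kb + l2_kb + MSM_SRAM_KB + L3_ROM_KB  # 8832


def cycle_time_ns(freq_mhz: int) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


print(total_kb)
print(cycle_time_ns(1250), cycle_time_ns(1000))
```

The computed total of 8832KB matches the "Size (Bytes)" row, and the periods of 0.8 ns and 1 ns match the "Cycle Time" row.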

2.2 DSP Core Description

The C66x Digital Signal Processor (DSP) extends the performance of the C64x+ and C674x DSPs through enhancements and new features. Many of the new features target increased performance for vector processing. The C64x+ and C674x DSPs support 2-way SIMD operations for 16-bit data and 4-way SIMD operations for 8-bit data. On the C66x DSP, the vector processing capability is improved by extending the width of the SIMD instructions. C66x DSPs can execute instructions that operate on 128-bit vectors. For example, the QMPY32 instruction performs element-to-element multiplication between two vectors of four 32-bit data each. The C66x DSP also supports SIMD for floating-point operations. Improved vector processing capability (each instruction can process multiple data in parallel) combined with the natural instruction-level parallelism of the C6000 architecture (e.g., execution of up to 8 instructions per cycle) results in a very high level of parallelism that DSP programmers can exploit through TI's optimized C/C++ compiler.
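The element-to-element multiplication described for QMPY32 can be modeled in a few lines. This is a behavioral sketch only, not TI's definition of the instruction. The function names are hypothetical, and the assumption that each lane keeps the low 32 bits of its product is ours; the text above states only that four 32-bit elements are multiplied pairwise.

```python
def to_signed32(value: int) -> int:
    """Wrap an integer into the signed 32-bit range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def qmpy32_model(a, b):
    """Element-wise multiply of two 4-lane vectors of signed 32-bit values.

    Assumption: each lane keeps only the low 32 bits of its product.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected two vectors of four 32-bit elements")
    return [to_signed32(x * y) for x, y in zip(a, b)]


print(qmpy32_model([1, 2, 3, 4], [5, 6, 7, 8]))  # [5, 12, 21, 32]
```

A single call stands in for one 128-bit SIMD operation: four independent products come from one instruction instead of four scalar multiplies.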

The C66x DSP consists of eight functional units, two register files, and two data paths as shown in Figure 2-1. The two general-purpose register files (A and B) each contain 32 32-bit registers for a total of 64 registers. The general-purpose registers can be used for data or can be data address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit data, 40-bit data, and 64-bit data. Multiplies also support 128-bit data. 40-bit-long or 64-bit-long values are stored in register pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or 32 MSBs in the next upper register (which is always an odd-numbered register). 128-bit data values are stored in register quadruplets, with the 32 LSBs of data placed in a register that is a multiple of 4 and the remaining 96 MSBs in the next 3 upper registers.
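The register-pair and register-quadruplet layout described above can be illustrated with a small model. The helper names below are hypothetical and the sketch treats values as unsigned bit patterns; it shows only how the 32-bit words are distributed, with the least significant word first.

```python
MASK32 = 0xFFFFFFFF


def split_into_registers(value: int, width_bits: int):
    """Split an unsigned value into 32-bit register words, least significant first.

    40-bit and 64-bit values occupy a register pair; 128-bit values occupy a quadruplet.
    """
    if width_bits not in (40, 64, 128):
        raise ValueError("width must be 40, 64, or 128 bits")
    if not 0 <= value < (1 << width_bits):
        raise ValueError("value does not fit in the stated width")
    count = 4 if width_bits == 128 else 2
    return [(value >> (32 * i)) & MASK32 for i in range(count)]


def register_names(file_letter: str, base: int, count: int):
    """Names of the registers holding a pair (count=2) or a quadruplet (count=4).

    The base register must be even for a pair and a multiple of 4 for a quadruplet.
    """
    if count not in (2, 4):
        raise ValueError("count must be 2 or 4")
    if base % count != 0:
        raise ValueError("base register must be a multiple of the group size")
    return [f"{file_letter}{base + i}" for i in range(count)]


words = split_into_registers(0x1122334455667788, 64)
print(list(zip(register_names("A", 4, 2), [hex(w) for w in words])))
```

For the 64-bit example, the low word 0x55667788 lands in the even register A4 and the high word 0x11223344 in the odd register A5.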

The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from memory to the register file and store results from the register file into memory.
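Because each of the eight units can execute one instruction per cycle, an upper bound on instruction issue follows directly from the clock rate. The sketch below computes that theoretical ceiling. It is not a benchmark figure from the datasheet, the function name is hypothetical, and real code rarely keeps every unit busy on every cycle.

```python
FUNCTIONAL_UNITS = (".M1", ".L1", ".D1", ".S1", ".M2", ".L2", ".D2", ".S2")


def peak_instructions_per_second(freq_mhz: int, cores: int = 1) -> int:
    """Upper bound: every unit on every core executes one instruction each cycle."""
    return len(FUNCTIONAL_UNITS) * freq_mhz * 1_000_000 * cores


print(peak_instructions_per_second(1250))           # one core at 1.25 GHz
print(peak_instructions_per_second(1250, cores=8))  # all eight cores
```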

Each C66x .M unit can perform one of the following fixed-point operations each clock cycle: four 32 × 32 bit multiplies, sixteen 16 × 16 bit multiplies, four 16 × 32 bit multiplies, four 8 × 8 bit multiplies, four 8 × 8 bit multiplies with add operations, and four 16 × 16 bit multiplies with add/subtract capabilities. There is also support for Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and modems require complex multiplication. Each C66x .M unit can perform one 16 × 16 bit complex multiply with or without rounding capabilities, two 16 × 16 bit complex multiplies with rounding capability, and a 32 × 32 bit complex multiply with rounding capability. The C66x can also perform two 16 × 16 bit and one 32 × 32 bit complex multiply instructions that multiply a complex number with a complex conjugate of another number with rounding capability.
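The arithmetic behind a 16 × 16 bit complex multiply, and its conjugate variant, is sketched below. This is a plain mathematical model with hypothetical function names. The rounding, scaling, and result packing of the actual instructions are not described in the text above and are deliberately left out, so the results here are full-precision integers.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def _check_int16(*values: int) -> None:
    """Reject components that do not fit in a signed 16-bit field."""
    for v in values:
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError(f"{v} does not fit in a signed 16-bit value")


def cmpy16(a, b):
    """(ar + j*ai) * (br + j*bi); each operand is a (real, imag) tuple."""
    (ar, ai), (br, bi) = a, b
    _check_int16(ar, ai, br, bi)
    return (ar * br - ai * bi, ar * bi + ai * br)


def cmpy16_conj(a, b):
    """(ar + j*ai) * conj(br + j*bi), the conjugate form of the multiply."""
    (ar, ai), (br, bi) = a, b
    _check_int16(ar, ai, br, bi)
    return (ar * br + ai * bi, ai * br - ar * bi)


print(cmpy16((1, 2), (3, 4)))       # (-5, 10)
print(cmpy16_conj((1, 2), (3, 4)))  # (11, 2)
```

Each complex product needs four real multiplies and two additions, which is why a unit that issues several 16 × 16 bit multiplies per cycle can complete a complex multiply in a single instruction.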

Communication signal processing also requires an extensive use of matrix operations. Each C66x .M unit is capable of multiplying a [1 × 2] complex vector by a [2 × 2] complex matrix per cycle with or without rounding capability. A version also exists allowing multiplication of the conjugate of a [1 × 2] vector with a [2 × 2] complex matrix.
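The vector-by-matrix operation above works out as follows. The sketch uses Python's built-in complex type and hypothetical function names. It shows the mathematics only and does not model the fixed-point formats or rounding of the hardware instruction.

```python
def vec_mat_1x2_2x2(v, m):
    """Multiply a [1 x 2] complex row vector by a [2 x 2] complex matrix."""
    return [
        v[0] * m[0][0] + v[1] * m[1][0],
        v[0] * m[0][1] + v[1] * m[1][1],
    ]


def conj_vec_mat_1x2_2x2(v, m):
    """Same product, using the conjugate of the [1 x 2] vector."""
    return vec_mat_1x2_2x2([x.conjugate() for x in v], m)


v = [1 + 1j, 2 + 0j]
m = [[1 + 0j, 1j],
     [0j, 1 + 0j]]
print(vec_mat_1x2_2x2(v, m))       # [(1+1j), (1+1j)]
print(conj_vec_mat_1x2_2x2(v, m))  # [(1-1j), (3+1j)]
```

Each output element is a sum of two complex products, so the full operation amounts to four complex multiplies and two complex additions.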

Each C66x .M unit also includes IEEE floating-point multiplication operations from the C674x DSP, which includes one single-precision multiply each cycle and one double-precision multiply every 4 cycles. There is also a mixed-precision multiply that allows multiplication of a single-precision value by a double-precision value and an operation allowing multiplication of two single-precision numbers resulting in a double-precision number. The C66x DSP improves the performance over the C674x double-precision multiplies by adding an instruction allowing one double-precision multiply per cycle and also reduces the number of delay slots from 10 down to 4. Each C66x .M unit can also perform one of the following floating-point operations each clock cycle: one, two, or four single-precision multiplies or a complex single-precision multiply.
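The value of multiplying two single-precision numbers into a double-precision result can be seen numerically. Two 24-bit significands produce at most a 48-bit product, which fits in the 53-bit significand of an IEEE double, so the widened result carries no rounding error. The sketch below demonstrates this with the standard library. The function names are hypothetical, and how the hardware instruction rounds is not stated in the text above.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mul_sp_to_dp(a: float, b: float) -> float:
    """Multiply two single-precision inputs and keep the result in double precision."""
    return to_f32(a) * to_f32(b)


def mul_sp_to_sp(a: float, b: float) -> float:
    """Multiply two single-precision inputs and round the result back to single precision."""
    return to_f32(mul_sp_to_dp(a, b))


def product_is_exact(a: float, b: float) -> bool:
    """Check with rational arithmetic that the double-precision product has no rounding error."""
    exact = Fraction(to_f32(a)) * Fraction(to_f32(b))
    return Fraction(mul_sp_to_dp(a, b)) == exact


print(product_is_exact(0.1, 0.3))
print(mul_sp_to_dp(0.1, 0.3), mul_sp_to_sp(0.1, 0.3))
```

Keeping the product in double precision is useful when many products are accumulated, because the rounding step that a single-precision result would add to each product is avoided.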

The .L and .S units can now support up to 64-bit operands. This allows for new versions of many of the arithmetic, logical, and data packing instructions to allow for more parallel operations per cycle. Additional instructions were added yielding performance enhancements of the floating-point addition and subtraction instructions, including the ability to perform one double-precision addition or subtraction per cycle. Conversion to/from integer and single-precision values can now be done on both .L and .S units on the C66x. Also, by taking advantage of the larger
