18 Device Overview Copyright 2012 Texas Instruments Incorporated
SPRS689D—March 2012
Multicore Fixed and Floating-Point System-on-Chip
TMS320C6670
www.ti.com
The C66x Central Processing Unit (CPU) extends the performance of the C64x+ and C674x CPUs through
enhancements and new features. Many of the new features target increased performance for vector processing. The
C64x+ and C674x CPUs support 2-way SIMD operations for 16-bit data and 4-way SIMD operations for 8-bit data.
On the C66x CPU, the vector processing capability is improved by extending the width of the SIMD instructions.
C66x CPUs can execute instructions that operate on 128-bit vectors. For example, the QMPY32 instruction performs
an element-by-element multiplication of two vectors of four 32-bit values each. The C66x CPU also
supports SIMD for floating-point operations. Improved vector processing capability (each instruction can process
multiple data in parallel) combined with the natural instruction-level parallelism of the C6000 architecture (e.g.,
execution of up to 8 instructions per cycle) results in a very high level of parallelism that can be exploited by DSP
programmers through the use of TI's optimized C/C++ compiler.
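The element-by-element multiply described above can be sketched in portable C. This is a minimal host-side model, not TI code: the function name vec4_mul32 is hypothetical, and keeping only the low 32 bits of each product is an assumption of this sketch rather than something stated in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable model of a 4-lane, 32-bit element-wise multiply.
 * Assumption: each lane keeps only the low 32 bits of its product. */
static void vec4_mul32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int lane = 0; lane < 4; ++lane) {
        /* Multiply as unsigned so that wraparound is well defined in C. */
        uint32_t product = (uint32_t)a[lane] * (uint32_t)b[lane];
        /* Converting back to signed is implementation-defined for large
         * values; common two's-complement targets simply reinterpret it. */
        out[lane] = (int32_t)product;
    }
}
```

On the device, a loop of this shape is the kind of code the compiler may map onto a single SIMD instruction; the model only shows the lane-by-lane arithmetic.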
The C66x CPU consists of eight functional units, two register files, and two data paths as shown in Figure 2-1. The
two general-purpose register files (A and B) each contain thirty-two 32-bit registers for a total of 64 registers. The
general-purpose registers can be used for data or can be data address pointers. The data types supported include
packed 8-bit data, packed 16-bit data, 32-bit data, 40-bit data, and 64-bit data. Multiplies also support 128-bit data.
40-bit-long or 64-bit-long values are stored in register pairs, with the 32 LSBs of data placed in an even register and
the remaining 8 or 32 MSBs in the next upper register (which is always an odd-numbered register). 128-bit data
values are stored in register quadruplets, with the 32 LSBs of data placed in a register that is a multiple of 4 and the
remaining 96 MSBs in the next 3 upper registers.
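The register pair and quadruplet layout can be illustrated with a small host-side model. The names store_pair and store_quad are hypothetical, the register file is modelled as a plain array, and placing the upper words of a 128-bit value in ascending order of significance is an assumption; the text only says they occupy the next three registers.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32  /* one register file (A or B) holds 32 registers */

/* Hypothetical model: place a 64-bit value in an even/odd register pair. */
static void store_pair(uint32_t regs[NUM_REGS], int even, uint64_t value)
{
    assert(even >= 0 && even + 1 < NUM_REGS && even % 2 == 0);
    regs[even]     = (uint32_t)value;          /* 32 LSBs in the even register */
    regs[even + 1] = (uint32_t)(value >> 32);  /* 32 MSBs in the next (odd) register */
}

/* Hypothetical model: place a 128-bit value, given as four 32-bit words
 * (words[0] is least significant), in a register quadruplet.
 * Assumption: significance increases with the register number. */
static void store_quad(uint32_t regs[NUM_REGS], int base, const uint32_t words[4])
{
    assert(base >= 0 && base + 3 < NUM_REGS && base % 4 == 0);
    for (int i = 0; i < 4; ++i) {
        regs[base + i] = words[i];
    }
}
```

The assertions encode the alignment rules from the text: pairs start at an even register and quadruplets at a register number that is a multiple of 4.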
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one instruction
every clock cycle. The .M functional units perform all multiply operations. The .S and .L units perform a general set
of arithmetic, logical, and branch functions. The .D units primarily load data from memory to the register file and
store results from the register file into memory.
Each C66x .M unit can perform one of the following fixed-point operations each clock cycle: four 32 × 32 bit
multiplies, sixteen 16 × 16 bit multiplies, four 16 × 32 bit multiplies, four 8 × 8 bit multiplies, four 8 × 8 bit multiplies
with add operations, and four 16 × 16 multiplies with add/subtract capabilities. There is also support for Galois field
multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and modems require
complex multiplication. Each C66x .M unit can perform one 16 × 16 bit complex multiply with or without rounding
capabilities, two 16 × 16 bit complex multiplies with rounding capability, and a 32 × 32 bit complex multiply with
rounding capability. The C66x can also perform two 16 × 16 bit and one 32 × 32 bit complex multiply instructions
that multiply a complex number by the complex conjugate of another number, with rounding capability.
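As a reference for the arithmetic involved, a 16 × 16 bit complex multiply and its conjugate variant can be written in portable C as follows. This is a sketch under stated assumptions: the types cplx16 and cplx32 and the function names are hypothetical, no rounding or scaling is applied, and the hardware's behavior in the overflow corner case is not modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16;  /* hypothetical 16-bit complex operand */
typedef struct { int32_t re, im; } cplx32;  /* hypothetical 32-bit complex result */

/* a * b = (a.re*b.re - a.im*b.im) + j(a.re*b.im + a.im*b.re), no rounding. */
static cplx32 cmul16(cplx16 a, cplx16 b)
{
    int64_t re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    int64_t im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    /* Inputs that are all -32768 can exceed 32 bits; not modelled here. */
    assert(re >= INT32_MIN && re <= INT32_MAX);
    assert(im >= INT32_MIN && im <= INT32_MAX);
    cplx32 r = { (int32_t)re, (int32_t)im };
    return r;
}

/* a * conj(b) = (a.re*b.re + a.im*b.im) + j(a.im*b.re - a.re*b.im), no rounding. */
static cplx32 cmul16_conj(cplx16 a, cplx16 b)
{
    int64_t re = (int64_t)a.re * b.re + (int64_t)a.im * b.im;
    int64_t im = (int64_t)a.im * b.re - (int64_t)a.re * b.im;
    assert(re >= INT32_MIN && re <= INT32_MAX);
    assert(im >= INT32_MIN && im <= INT32_MAX);
    cplx32 r = { (int32_t)re, (int32_t)im };
    return r;
}
```

Each complex product needs four real multiplies plus an add and a subtract, which is why a dedicated .M unit operation is useful for FFT and modem kernels.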
Communication signal processing also requires an extensive use of matrix operations. Each C66x .M unit is capable
of multiplying a [1 × 2] complex vector by a [2 × 2] complex matrix per cycle with or without rounding capability.
A version also exists allowing multiplication of the conjugate of a [1 × 2] vector with a [2 × 2] complex matrix.
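The [1 × 2] by [2 × 2] complex product can be sketched in portable C to show the arithmetic performed. The names below are hypothetical, the 16-bit operand width is an assumption (this section does not state the element width), and no rounding is applied.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t re, im; } celem16;  /* hypothetical 16-bit complex element */
typedef struct { int32_t re, im; } cacc32;   /* hypothetical 32-bit complex result */

/* y[j] = sum over i of v[i] * m[i][j], for a [1 x 2] vector and [2 x 2] matrix.
 * If conj_v is nonzero, the conjugate of each vector element is used. */
static void cvecmat_1x2_2x2(const celem16 v[2], celem16 m[2][2],
                            cacc32 y[2], int conj_v)
{
    for (int j = 0; j < 2; ++j) {
        int64_t re = 0;
        int64_t im = 0;
        for (int i = 0; i < 2; ++i) {
            int64_t vr = v[i].re;
            int64_t vi = conj_v ? -(int64_t)v[i].im : (int64_t)v[i].im;
            re += vr * m[i][j].re - vi * m[i][j].im;
            im += vr * m[i][j].im + vi * m[i][j].re;
        }
        /* Extreme inputs can exceed 32 bits; that case is not modelled. */
        assert(re >= INT32_MIN && re <= INT32_MAX);
        assert(im >= INT32_MIN && im <= INT32_MAX);
        y[j].re = (int32_t)re;
        y[j].im = (int32_t)im;
    }
}
```

Each output element takes two complex multiplies and one complex add, so the full product corresponds to four complex multiplies.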
Each C66x .M unit also includes IEEE floating-point multiplication operations from the C674x CPU. This includes
one single-precision multiply each cycle and one double-precision multiply every 4 cycles. There is also a
mixed-precision multiply that allows multiplication of a single-precision value by a double-precision value and an
operation allowing multiplication of two single-precision numbers resulting in a double-precision number. The
Table 2-1 Characteristics of the C6670 SoC (Part 2 of 2)

Hardware Features                                                        TMS320C6670
BGA Package          24 mm × 24 mm                                       841-Pin Flip-Chip Plastic BGA (CYP)
Process Technology   μm                                                  0.040 μm
Product Status (2)   Product Preview (PP), Advance Information (AI),     PD
                     or Production Data (PD)
End of Table 2-1

1 The Security Accelerator function is subject to export control and will be enabled only for approved device shipments.
2 PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of Texas Instruments standard warranty. Production processing does not necessarily include testing of all parameters.