4 M.B. Taylor et al.
tiles across the area of a chip and connecting neighboring routers together, a com-
plete on-chip communication network is created.
The use of a general network router in each tile distinguishes tiled multicores
from other mainstream multicores such as Intel’s Core processors, Sun’s Niagara
[21], and the Cell Broadband Engine [18]. Most of these multicores have distributed
processing elements but still connect cores together using non-scalable centralized
structures such as bus interconnects, crossbars, and shared caches. The Cell
processor uses ring networks that are physically scalable but can suffer from
significant performance degradation due to congestion as the number of cores increases.
Although these designs are adequate for small numbers of cores, they will not scale
to the thousand-core chips we will see within the next decade.
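
The scaling argument can be made concrete with a rough back-of-the-envelope sketch. The Python below is not taken from the chapter; it assumes idealized topologies (a bidirectional ring and a square k x k mesh), uniform traffic between all pairs of nodes, and hypothetical function names chosen for illustration. It compares the average hop count and the number of links crossing the bisection, two quantities commonly used to reason about congestion.

```python
from itertools import product


def ring_avg_hops(n):
    """Average shortest-path hop count between distinct nodes on a bidirectional ring of n nodes."""
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)


def mesh_avg_hops(k):
    """Average Manhattan distance between distinct tiles on a k x k mesh (brute force)."""
    tiles = list(product(range(k), repeat=2))
    total = sum(
        abs(ax - bx) + abs(ay - by)
        for (ax, ay) in tiles
        for (bx, by) in tiles
    )
    return total / (len(tiles) * (len(tiles) - 1))


def ring_bisection_links(n):
    """Cutting a ring into two halves always severs exactly two links, whatever n is."""
    return 2


def mesh_bisection_links(k):
    """Cutting a k x k mesh (even k) down the middle severs one link per row, i.e. k links."""
    return k


if __name__ == "__main__":
    for k in (4, 8, 16, 32):
        n = k * k
        print(
            f"{n:5d} cores: ring hops {ring_avg_hops(n):7.2f}, bisection {ring_bisection_links(n)}; "
            f"mesh hops {mesh_avg_hops(k):6.2f}, bisection {mesh_bisection_links(k)}"
        )
```

Under these assumptions, the ring's average distance grows linearly with the core count while its bisection stays fixed at two links, whereas the mesh's distance and bisection both grow with the square root of the core count. This is only a topological estimate; it ignores link width, router latency, and real traffic patterns.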
Tiled multicores distribute both computation and communication structures,
providing advantages in efficiency, scalability, design costs, and versatility. As
mentioned previously, smaller, simpler cores are faster and more efficient due to the
scaling properties of certain internal processor structures. In addition, they pro-
vide fast, cheap access to local resources (such as caches) and incur extra cost
only when additional distant resources are required. Centralized designs, on the
other hand, force every access to incur the costs of using a single large, dis-
tant resource. This is true to a lesser extent even for other multicore designs
with centralized interconnects. Every access that leaves a core must use the sin-
gle large interconnect. In a tiled multicore, an external access is routed through
the on-chip network and uses only the network segments between the source and
destination.
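
To illustrate how an access in a tiled design touches only the network segments between source and destination, here is a minimal sketch. It assumes a 2D mesh with tiles addressed by (x, y) coordinates and dimension-ordered (X-then-Y) routing; the routing policy and the names `xy_route` and `shares_links` are assumptions made for this example, not details given in this section.

```python
def xy_route(src, dst):
    """Return the links used from src to dst on a 2D mesh, travelling along X first, then Y.

    Each link is a pair of adjacent tile coordinates ((x0, y0), (x1, y1)).
    """
    x, y = src
    links = []
    # Move along the X dimension until the destination column is reached.
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    # Then move along the Y dimension until the destination row is reached.
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def shares_links(route_a, route_b):
    """True if the two routes use at least one common link (in the same direction)."""
    return bool(set(route_a) & set(route_b))


if __name__ == "__main__":
    near = xy_route((0, 0), (1, 0))   # neighbouring tiles: a single link
    far = xy_route((0, 0), (3, 2))    # distant tiles: more links, higher cost
    other = xy_route((3, 3), (3, 2))  # unrelated traffic elsewhere on the chip
    print(len(near), len(far), shares_links(near, other))
```

In this model the cost of an access is proportional to the distance travelled, and transfers in different regions of the chip use disjoint links, so they do not contend with each other. With a single centralized interconnect, by contrast, every transfer would occupy the same shared structure.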
Tiled multicore architectures are specifically designed to scale easily as improve-
ments in process technology provide more transistors on each chip. Because tiled
multicores use distributed communication structures as well as distributed compu-
tation, processors of any size can be built by simply laying down additional tiles.
Moving to a new process generation does not require any redesign or re-verification
of the tile design. Besides future scalability, this property has enormous advantages
for design costs today. To design a huge billion-transistor chip, one only needs to
design, lay out, and verify a small, relatively simple tile and then replicate it as
needed to fill the die area. Multicores with centralized interconnect allow much of
the core design to be re-used, but still require some customized layout for each core.
In addition, the interconnect may need to be completely redesigned to accommodate
more cores.
As we will see in Sect. 1.5, tiled multicores are also much more versatile than
traditional general-purpose processors. This versatility stems from the fact that,
much like FPGAs, tiled multicores provide large quantities of general processing
resources and allow the application to decide how best to use them. This is in con-
trast to large monolithic processors where the majority of die area is consumed
by special-purpose structures that may not be needed by all applications. If an
application does need a complex function, it can dedicate some of the resources
to emulating it in software. Thus, tiled multicores are, in a sense, more general than
general-purpose processors. They can provide competitive performance on single-
threaded ILP (instruction-level parallelism) applications as well as applications that