Zynq UltraScale+ MPSoC Embedded Design Methodology Guide 17
UG1228 (v1.0) March 31, 2017
www.xilinx.com
Chapter 2: Processing System
necessary data movers and software drivers to enable the rest of your APU- or RPU-bound
software to transparently use the accelerated software portions. SDSoC thereby streamlines
the software acceleration process by greatly simplifying the steps involved. Choosing
between SDSoC and manual offloading is therefore a trade-off between ease of
implementation and hand-crafted performance tuning.
An additional aspect to keep in mind is the maximum clock speed of each processing block:
• APU – Up to 1.5 GHz
• RPU – Up to 600 MHz
• GPU – Up to 667 MHz
Note:
Keep in mind that these are maximum speeds. Although each block can run at up to its listed
speed, it is unlikely to run at that speed all the time, nor will doing so necessarily make sense
for your design.
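To put the maximum clock speeds into perspective, the following minimal C sketch converts a clock rate and a time window into an upper bound on available clock cycles. The helper name cycle_budget and the macro names are hypothetical and chosen for illustration only; the frequencies are the maxima listed above. The result is a ceiling, not a prediction, because a block rarely runs at its maximum rate and the figure ignores memory stalls and interrupt latency.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum clock rates from the list above, in Hz. Macro names are illustrative. */
#define APU_MAX_HZ 1500000000ULL
#define RPU_MAX_HZ  600000000ULL
#define GPU_MAX_HZ  667000000ULL

/*
 * Hypothetical helper: upper bound on the number of clock cycles available
 * in a window of period_ns nanoseconds at clock_hz. Integer division
 * truncates any fractional cycle. The product fits in 64 bits for windows
 * of up to several seconds at these clock rates.
 */
static uint64_t cycle_budget(uint64_t clock_hz, uint64_t period_ns)
{
    assert(clock_hz > 0);
    return (clock_hz * period_ns) / 1000000000ULL;
}
```

For example, a 10 microsecond control loop leaves at most cycle_budget(RPU_MAX_HZ, 10000) = 6000 cycles on the RPU, and at most 15000 cycles on the APU at its maximum rate.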
With its ARM® Cortex®-A53 processors, the APU is the fastest general purpose computing
resource on the Zynq UltraScale+ MPSoC device. At first glance it might therefore seem to
be the best candidate for workloads requiring maximum computing power, especially since
you can have up to four Cortex-A53 processors on the Zynq UltraScale+ MPSoC device.
Maximum frequency, however, does not necessarily mean best fit for function. The APU's
Cortex-A53 processors, for instance, are not as well suited to real-time workloads as the
RPU's ARM® Cortex®-R5 processors. Among many other factors, there is therefore a
trade-off between performance and determinism in choosing between the APU and the RPU.
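The performance-versus-determinism trade-off can be captured as a first-pass rule of thumb. The C sketch below is purely illustrative: the types workload_t and block_t, the function first_pass_block, and the idea of estimating a required clock rate are all assumptions made for this example and are not part of any Xilinx tool or API. It encodes only what the text states: deterministic workloads lean toward the RPU, and everything else defaults to the faster APU.

```c
#include <stdbool.h>
#include <stdint.h>

/* RPU maximum clock rate from the list above, in Hz. */
#define RPU_MAX_HZ 600000000ULL

/* Hypothetical result of a first-pass placement decision. */
typedef enum {
    BLOCK_APU,          /* general-purpose, highest clock rate */
    BLOCK_RPU,          /* better suited to real-time workloads */
    BLOCK_NEEDS_REVIEW  /* no clear fit; revisit the design, e.g. PL offload */
} block_t;

/* Hypothetical workload description; both fields are designer estimates. */
typedef struct {
    bool     needs_determinism; /* bounded, predictable response time required */
    uint64_t required_hz;       /* estimated clock rate the workload needs */
} workload_t;

/* First-pass heuristic only; the final choice depends on many other factors. */
static block_t first_pass_block(const workload_t *w)
{
    if (w->needs_determinism) {
        /* Deterministic work goes to the RPU if it fits under its maximum clock. */
        return (w->required_hz <= RPU_MAX_HZ) ? BLOCK_RPU : BLOCK_NEEDS_REVIEW;
    }
    /* Throughput-oriented work defaults to the APU. */
    return BLOCK_APU;
}
```

A result of BLOCK_NEEDS_REVIEW signals that neither block is an obvious fit, which is the point at which splitting the workload or offloading part of it to the PL becomes worth evaluating.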
Once the most likely candidate blocks for housing a given functionality have been
identified, you still need to determine the best way to move data between blocks through
the interconnect, and how each processing location interacts with the processing
resources internal to the system as well as with interfaces and resources in the outside
world. The interconnect and interrupt processing are discussed in detail later in this chapter.
For all aspects related to peripheral I/O, refer to Chapter 10, Peripherals. For information
regarding memory, refer to Chapter 6, Memory. For more information regarding the
PL's capabilities, including its built-in accelerators, refer to Chapter 5, Programmable Logic.
Note that while the present guidelines might recommend a given processing block, it is
entirely possible that, after reviewing all of the content related to a given part of your
design, an alternate, better-suited configuration becomes evident for your specific
product needs. The decision tree presented earlier, for example,
recommended using the RPU for your real-time software. Your design might, instead, call
for running a real-time operating system (RTOS) on the APU with the Cortex-R5 processors
being run bare-metal. Another example is network communications. The above
recommendations categorize network communication as being best slated for the APU. Yet,
the PL contains integrated blocks for 100G Ethernet and PCIe which, together, can be used
to efficiently accomplish network-related tasks that would typically be designated for the
APU. The Xilinx white paper Unleash the Unparalleled Power and Flexibility of Zynq
UltraScale+ MPSoCs (WP470) [Ref 10] describes the flexibility of the Zynq UltraScale+
MPSoC and outlines such an example use case for a data center application. It also covers two