Tracy Profiler The user manual
The following conditions also need to apply, but don’t trouble yourself with them too much. You would
probably already know if you were breaking any.
• Only little-endian CPUs are supported.
• Virtual address space must be limited to 48 bits.
• The Tracy server requires a CPU that is able to handle misaligned memory accesses.
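If you are unsure whether a platform meets the first two conditions, a quick sanity check can be sketched in C++. The helper names `IsLittleEndian` and `FitsIn48Bits` are hypothetical and not part of Tracy. The pointer check is only a heuristic on sample addresses, not a proof. Support for misaligned accesses is deliberately not tested, since dereferencing a misaligned pointer is undefined behavior in standard C++.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Tracy): returns true when the host stores
// the least significant byte of a multi-byte integer first.
bool IsLittleEndian()
{
    const std::uint32_t value = 1;
    unsigned char firstByte = 0;
    // Copy the first byte in memory order; it is 1 only on little-endian hosts.
    std::memcpy( &firstByte, &value, 1 );
    return firstByte == 1;
}

// Hypothetical helper (not part of Tracy): returns true when the given address
// has no bits set above bit 47, i.e. it fits in a 48-bit address space.
// This inspects a single sample pointer only; it cannot prove that every
// address the program will ever use stays below the 48-bit boundary.
bool FitsIn48Bits( const void* ptr )
{
    const auto addr = static_cast<std::uint64_t>( reinterpret_cast<std::uintptr_t>( ptr ) );
    return ( addr >> 48 ) == 0;
}
```

For example, you could call `FitsIn48Bits` on the address of a local variable and of a heap allocation early in `main()`, and print a warning if either check or `IsLittleEndian()` fails.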
2.2 Check your environment
It is not an easy task to reliably measure the performance of an application on modern machines. There are
many factors affecting program execution characteristics, some of which you will be able to minimize, and
others you will have to live with. It is critically important that you understand how these variables impact
profiling results, as it is key to understanding the data you get.
2.2.1 Operating system
In a multitasking operating system, applications compete with each other for system resources. This has a
visible effect on the measurements performed by the profiler, which you may or may not accept.
In order to get the most accurate profiling results, you should minimize interference caused by other
programs running on the same machine. Before starting a profile session, close all web browsers, music
players, instant messengers, and all other non-essential applications like Steam, Uplay, etc. Make sure you
don’t have the debugger hooked into the profiled program, as it also has an impact on the timing results.
Interference caused by other programs can be seen in the profiler if context switch capture (section 3.13.3)
is enabled.
Debugger in Visual Studio
In MSVC you would typically run your program using the Start Debugging menu option, which is
conveniently available as the F5 shortcut. You should instead use the Start Without Debugging option,
available as the Ctrl+F5 shortcut.
2.2.2 CPU design
Where to even begin here? Modern processors are such complex beasts that it’s almost impossible to
say anything with certainty about how they will behave. Cache configuration, prefetcher logic, memory timings,
branch predictor, and execution unit counts are all the drivers of instructions-per-cycle uplift nowadays, after the
megahertz race hit the wall. Not only is it incredibly difficult to reason about, but you also need to take
into account how the CPU topology affects things, which is described in more detail in section 3.13.4.
Nevertheless, let’s take a look at the ways we can try to stabilize the profiling data.
2.2.2.1 Superscalar out-of-order speculative execution
Also known as: the spectre thing we have to deal with now.
You must be aware that most processors available on the market¹⁴ do not execute machine code in a linear
way, as laid out in the source code. This can lead to counterintuitive timing results reported by Tracy. Trying
to get more ’reliable’ readings¹⁵ would require changing the behavior of the code, and this is not something a
profiler should do. Instead, Tracy shows you what the hardware is really doing.
This is a complex subject and the details vary from one CPU to another. You can read a brief rundown of the
topic at the following address: https://travisdowns.github.io/blog/2019/06/11/speed-limits.html.
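As an illustration of out-of-order execution (a minimal sketch, not taken from Tracy), the two functions below perform the same number of floating-point additions. In `SumSingleChain` each addition must wait for the previous result, while `SumFourChains` splits the work into four independent dependency chains that an out-of-order core can execute in parallel. On many CPUs the second version therefore runs noticeably faster, even though both appear to do the same amount of work. `double` is used because compilers do not reorder floating-point additions unless flags such as `-ffast-math` permit it. The function names and the `MeasureMs` helper are hypothetical, and the actual speedup depends on the CPU and compiler settings.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// One accumulator: every addition depends on the result of the previous one,
// so the additions form a single serial dependency chain.
double SumSingleChain( const std::vector<double>& data )
{
    double sum = 0.0;
    for( std::size_t i = 0; i < data.size(); i++ ) sum += data[i];
    return sum;
}

// Four accumulators: the four additions in each iteration are independent,
// so an out-of-order core can overlap their execution.
double SumFourChains( const std::vector<double>& data )
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for( ; i + 4 <= data.size(); i += 4 )
    {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    // Handle the remaining elements when the size is not a multiple of four.
    for( ; i < data.size(); i++ ) s0 += data[i];
    return ( s0 + s1 ) + ( s2 + s3 );
}

// Rough wall-clock timing of one call of fn, in milliseconds. The result is
// written to a volatile sink so the compiler cannot discard the computation.
template<typename F>
double MeasureMs( F fn )
{
    static volatile double sink;
    const auto t0 = std::chrono::steady_clock::now();
    sink = fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>( t1 - t0 ).count();
}
```

For a rough comparison, fill a large `std::vector<double>` and compare `MeasureMs( [&]{ return SumSingleChain( data ); } )` with the same call for `SumFourChains`. Note that with arbitrary floating-point inputs the two sums may differ slightly, because the additions happen in a different order.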
¹⁴ With the exception of low-cost ARM CPUs.
¹⁵ And by saying ’reliable’, what you really mean is: behaving the way you expect it to.