没有合适的资源?快使用搜索试试~ 我知道了~
首页Optimizing software in C++
资源详情
资源评论
资源推荐

1.
Optimizing software in C++
An optimization guide for Windows, Linux and Mac
platforms
By Agner Fog. Copenhagen University College of Engineering.
Copyright © 2004 - 2011. Last updated 2011-01-30.
Contents
1 Introduction ....................................................................................................................... 3
1.1 The costs of optimizing ............................................................................................... 4
2 Choosing the optimal platform........................................................................................... 4
2.1 Choice of hardware platform....................................................................................... 4
2.2 Choice of microprocessor ........................................................................................... 6
2.3 Choice of operating system......................................................................................... 6
2.4 Choice of programming language ............................................................................... 7
2.5 Choice of compiler ...................................................................................................... 9
2.6 Choice of function libraries........................................................................................ 11
2.7 Choice of user interface framework........................................................................... 13
2.8 Overcoming the drawbacks of the C++ language...................................................... 14
3 Finding the biggest time consumers ................................................................................ 15
3.1 How much is a clock cycle? ...................................................................................... 15
3.2 Use a profiler to find hot spots .................................................................................. 16
3.3 Program installation .................................................................................................. 18
3.4 Automatic updates .................................................................................................... 18
3.5 Program loading ....................................................................................................... 19
3.6 Dynamic linking and position-independent code ....................................................... 19
3.7 File access................................................................................................................ 21
3.8 System database...................................................................................................... 21
3.9 Other databases ....................................................................................................... 21
3.10 Graphics and other system resources..................................................................... 22
3.11 Network access ...................................................................................................... 22
3.12 Memory access....................................................................................................... 22
3.13 Context switches..................................................................................................... 22
3.14 Dependency chains ................................................................................................ 23
3.15 Execution unit throughput ....................................................................................... 23
4 Performance and usability ............................................................................................... 23
5 Choosing the optimal algorithm ....................................................................................... 25
6 Development process...................................................................................................... 26
7 The efficiency of different C++ constructs........................................................................ 26
7.1 Different kinds of variable storage............................................................................. 26
7.2 Integers variables and operators............................................................................... 29
7.3 Floating point variables and operators ...................................................................... 31
7.4 Enums ...................................................................................................................... 33
7.5 Booleans................................................................................................................... 33
7.6 Pointers and references............................................................................................ 36
7.7 Function pointers ...................................................................................................... 37
7.8 Member pointers....................................................................................................... 37
7.9 Smart pointers .......................................................................................................... 38
7.10 Arrays .....................................................................................................................38
7.11 Type conversions.................................................................................................... 40
7.12 Branches and switch statements............................................................................. 43
7.13 Loops......................................................................................................................45
7.14 Functions................................................................................................................ 47

2
7.15 Function parameters ............................................................................................... 49
7.16 Function return types .............................................................................................. 50
7.17 Structures and classes............................................................................................ 51
7.18 Class data members (properties) ............................................................................ 51
7.19 Class member functions (methods)......................................................................... 52
7.20 Virtual member functions ........................................................................................ 53
7.21 Runtime type identification (RTTI)........................................................................... 54
7.22 Inheritance.............................................................................................................. 54
7.23 Constructors and destructors .................................................................................. 54
7.24 Unions .................................................................................................................... 55
7.25 Bitfields................................................................................................................... 55
7.26 Overloaded functions .............................................................................................. 56
7.27 Overloaded operators ............................................................................................. 56
7.28 Templates............................................................................................................... 56
7.29 Threads .................................................................................................................. 59
7.30 Exceptions and error handling ................................................................................ 60
7.31 Other cases of stack unwinding .............................................................................. 64
7.32 Preprocessing directives......................................................................................... 64
7.33 Namespaces........................................................................................................... 64
8 Optimizations in the compiler .......................................................................................... 64
8.1 How compilers optimize ............................................................................................ 64
8.2 Comparison of different compilers............................................................................. 72
8.3 Obstacles to optimization by compiler....................................................................... 76
8.4 Obstacles to optimization by CPU............................................................................. 79
8.5 Compiler optimization options ................................................................................... 79
8.6 Optimization directives.............................................................................................. 81
8.7 Checking what the compiler does ............................................................................. 83
9 Optimizing memory access ............................................................................................. 86
9.1 Caching of code and data ......................................................................................... 86
9.2 Cache organization................................................................................................... 86
9.3 Functions that are used together should be stored together...................................... 87
9.4 Variables that are used together should be stored together ...................................... 87
9.5 Alignment of data...................................................................................................... 89
9.6 Dynamic memory allocation ...................................................................................... 89
9.7 Container classes ..................................................................................................... 91
9.8 Strings ...................................................................................................................... 94
9.9 Access data sequentially .......................................................................................... 94
9.10 Cache contentions in large data structures ............................................................. 95
9.11 Explicit cache control .............................................................................................. 97
10 Multithreading................................................................................................................ 99
10.1 Hyperthreading ..................................................................................................... 101
11 Out of order execution................................................................................................. 101
12 Using vector operations............................................................................................... 103
12.1 AVX instruction set and YMM registers................................................................. 104
12.2 Automatic vectorization......................................................................................... 105
12.3 Explicit vectorization ............................................................................................. 107
12.4 Mathematical functions for vectors........................................................................ 119
12.5 Aligning dynamically allocated memory................................................................. 122
12.6 Aligning RGB video or 3-dimensional vectors ....................................................... 122
12.7 Conclusion............................................................................................................ 123
13 Make critical code in multiple versions for different CPUs............................................ 123
13.1 CPU dispatch strategies........................................................................................ 124
13.2 Test and maintenance .......................................................................................... 126
13.3 Implementation ..................................................................................................... 126
13.4 CPU dispatching in Gnu compiler ......................................................................... 128
13.5 CPU dispatching in Intel compiler ......................................................................... 129
14 Specific optimization tips ............................................................................................. 135
14.1 Use lookup tables ................................................................................................. 135

3
14.2 Bounds checking .................................................................................................. 137
14.3 Use bitwise operators for checking multiple values at once................................... 138
14.4 Integer multiplication ............................................................................................. 139
14.5 Integer division...................................................................................................... 141
14.6 Floating point division ........................................................................................... 142
14.7 Don't mix float and double..................................................................................... 143
14.8 Conversions between floating point numbers and integers ................................... 144
14.9 Using integer operations for manipulating floating point variables ......................... 145
14.10 Mathematical functions ....................................................................................... 149
15 Metaprogramming ....................................................................................................... 149
16 Testing speed.............................................................................................................. 152
16.1 The pitfalls of unit-testing ...................................................................................... 154
16.2 Worst-case testing ................................................................................................ 155
17 Overview of compiler options....................................................................................... 157
18 Literature..................................................................................................................... 160
19 Copyright notice .......................................................................................................... 161
1 Introduction
This manual is for advanced programmers and software developers who want to make their
software faster. It is assumed that the reader has a good knowledge of the C++
programming language and a basic understanding of how compilers work. The C++
language is chosen as the basis for this manual for reasons explained on page 7 below.
This manual is based mainly on my study of how compilers and microprocessors work. The
recommendations are based on the x86 family of microprocessors from Intel, AMD and VIA
including the 64-bit versions. The x86 processors are used in the most common platforms
with Windows, Linux, BSD and Mac OS X operating systems, though these operating
systems can also be used with other microprocessors. Many of the advices may apply to
other platforms and other compiled programming languages as well.
This is the first in a series of five manuals:
1. Optimizing software in C++: An optimization guide for Windows, Linux and Mac
platforms.
2. Optimizing subroutines in assembly language: An optimization guide for x86
platforms.
3. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for
assembly programmers and compiler makers.
4. Instruction tables: Lists of instruction latencies, throughputs and micro-operation
breakdowns for Intel, AMD and VIA CPUs.
5. Calling conventions for different C++ compilers and operating systems.
The latest versions of these manuals are always available from www.agner.org/optimize
.
Copyright conditions are listed on page 161 below.
Those who are satisfied with making software in a high-level language need only read this
first manual. The subsequent manuals are for those who want to go deeper into the
technical details of instruction timing, assembly language programming, compiler
technology, and microprocessor microarchitecture. A higher level of optimization can
sometimes be obtained by the use of assembly language for CPU-intensive code, as
described in the subsequent manuals.

4
Please note that my optimization manuals are used by thousands of people. I simply don't
have the time to answer questions from everybody. So please don't send your programming
questions to me. You will not get any answer. Beginners are advised to seek information
elsewhere and get a good deal of programming experience before trying the techniques in
the present manual. There are various discussion forums on the Internet where you can get
answers to your programming questions if you cannot find the answers in the relevant
books and manuals.
I want to thank the many people who have sent me corrections and suggestions for my
optimization manuals. I am always happy to receive new relevant information.
1.1 The costs of optimizing
University courses in programming nowadays stress the importance of structured and
object-oriented programming, modularity, reusability and systematization of the software
development process. These requirements are often conflicting with the requirements of
optimizing the software for speed or size.
Today, it is not uncommon for software teachers to recommend that no function or method
should be longer than a few lines. A few decades ago, the recommendation was the
opposite: Don't put something in a separate subroutine if it is only called once. The reasons
for this shift in software writing style are that software projects have become bigger and
more complex, that there is more focus on the costs of software development, and that
computers have become more powerful.
The high priority of structured software development and the low priority of program
efficiency is reflected, first and foremost, in the choice of programming language and
interface frameworks. This is often a disadvantage for the end user who has to invest in
ever more powerful computers to keep up with the ever bigger software packages and who
is still frustrated by unacceptably long response times, even for simple tasks.
Sometimes it is necessary to compromise on the advanced principles of software develop-
ment in order to make software packages faster and smaller. This manual discusses how to
make a sensible balance between these considerations. It is discussed how to identify and
isolate the most critical part of a program and concentrate the optimization effort on that
particular part. It is discussed how to overcome the dangers of a relatively primitive
programming style that doesn't automatically check for array bounds violations, invalid
pointers, etc. And it is discussed which of the advanced programming constructs are costly
and which are cheap, in relation to execution time.
2 Choosing the optimal platform
2.1 Choice of hardware platform
The choice of hardware platform has become less important than it used to be. The
distinctions between RISC and CISC processors, between PC's and mainframes, and
between simple processors and vector processors are becoming increasingly blurred as the
standard PC processors with CISC instruction sets have got RISC cores, vector processing
instructions, multiple cores, and a processing speed exceeding that of yesterday's big
mainframe computers.
Today, the choice of hardware platform for a given task is often determined by
considerations such as price, compatibility, second source, and the availability of good
development tools, rather than by the processing power. Connecting several standard PC's
in a network may be both cheaper and more efficient than investing in a big mainframe
computer. Big supercomputers with massively parallel vector processing capabilities still

5
have a niche in scientific computing, but for most purposes the standard PC processors are
preferred because of their superior performance/price ratio.
The CISC instruction set (called x86) of the standard PC processors is not optimal from a
technological point of view. This instruction set is maintained for the sake of backwards
compatibility with a lineage of software that dates back to around 1980 where RAM memory
and disk space were scarce resources. However, the CISC instruction set is better than its
reputation. The compactness of the code makes caching more efficient today where cache
size is a limited resource. The CISC instruction set may actually be better than RISC in
situations where code caching is critical. The worst problem of the x86 instruction set is the
scarcity of registers. This problem has been alleviated in the 64-bit extension to the x86
instruction set where the number of registers has been doubled. A register is a small piece
of storage inside the CPU which can be accessed faster than RAM storage.
Thin clients that depend on network resources are not recommended for critical applications
because the response times for network resources cannot be controlled.
This manual is based on the standard PC platform with an Intel, AMD or VIA processor and
a Windows, Linux, BSD or Mac operating system running in 32-bit or 64-bit mode. Much of
the advice given here may apply to other platforms as well, but the examples have been
tested only on PC platforms.
Graphics accelerators
The choice of platform is obviously influenced by the requirements of the task in question.
For example, a heavy graphics application is preferably implemented on a platform with a
graphics coprocessor or graphics accelerator card. Some systems also have a dedicated
physics processor for calculating the physical movements of objects in a computer game or
animation.
It is possible in some cases to use the high processing power of the processors on a
graphics accelerator card for other purposes than rendering graphics on the screen.
However, such applications are highly system dependent and therefore not recommended if
portability is important. This manual does not cover graphics processors.
Programmable logic devices
A programmable logic device is a chip that can be programmed in a hardware definition
language, such as VHDL or Verilog. Common devices are CPLDs and FPGAs. The
difference between a software programming language, e.g. C++, and a hardware definition
language is that the software programming language defines an algorithm of sequential
instructions, where a hardware definition language defines hardware circuits consisting of
digital building blocks such as gates, flip-flops, multiplexers, arithmetic units, etc. and the
wires that connect them. The hardware definition language is inherently parallel because it
defines electrical connections rather than sequences of operations.
A complex digital operation can often be executed faster in a programmable logic device
than in a microprocessor because the hardware can be wired for a specific purpose.
It is possible to implement a microprocessor in an FPGA as a so-called softcore. Such a
softcore is much slower than a dedicated microprocessor and therefore not advantageous
by itself. But a solution where a softcore activates critical application-specific instructions
that are coded in a hardware definition language in the same chip can be a very efficient
solution in some cases. An even more powerful solution is the combination of a dedicated
microprocessor core and an FPGA in the same chip. Such hybrid solutions are now used in
some embedded systems.
A look in my crystal ball reveals that similar solutions may some day be implemented in PC
processors. The application program will be able to define application-specific instructions
剩余160页未读,继续阅读















安全验证
文档复制为VIP权益,开通VIP直接复制

评论2