C++ Code Optimization Guide: Key Points and Strategies


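C++ Code Optimization is a detailed guide written by Agner Fog, copyright 2006 and last updated on 13 August 2006. It targets the Windows, Linux and Mac platforms and aims to help readers understand how to optimize software efficiently in C++. The chapters cover the following key points:

1. **Introduction**: The author first explains why software optimization matters, including its possible costs and benefits, and stresses that optimization decisions must weigh the choice of hardware, operating system, programming language, compiler and user interface framework.
2. **Choosing the optimal platform**:
   - **Hardware platform**: choose a suitable processor architecture based on the needs and performance requirements of the target application.
   - **Microprocessor**: understand the characteristics of different processors, such as instruction sets and features, to maximize efficiency.
   - **Operating system**: the operating system has a major influence on program performance, so kernel mechanisms, scheduling policies and similar factors must be considered.
   - **Programming language**: the C++ language has limitations of its own, but they can be overcome by sensible use, for example by avoiding unnecessary overhead.
   - **Compiler**: know the optimization options of different compilers and how they affect code performance.
   - **User interface framework**: choose a framework that renders and interacts efficiently and avoids unnecessary resource consumption.
3. **Finding the performance bottlenecks**: use analysis to locate the main time consumers in a program, such as:
   - **Clock cycles**: understand the basic unit of program execution.
   - **Hot spots in the code**: use a profiler to locate inefficient regions of code.
   - **Installation and loading**: optimize program deployment and startup.
   - **File access**: reduce disk I/O and speed up data access.
   - **System resources**: watch how efficiently memory, CPU, hard disk and network are used.
   - **Context switches**: reduce switching between processes to improve concurrent performance.
   - **Dependencies**: optimize function call chains and reduce the performance loss caused by indirect calls.
   - **Execution unit throughput**: structure the code so that it makes full use of the processor.
4. **Performance and usability**: optimization is not only about speed, but also about keeping the program easy to use and responsive.
5. **Choosing the optimal algorithm**: choosing an algorithm with low time complexity and good space efficiency is the key to better performance.
6. **The efficiency of C++ constructs**: a detailed discussion of how C++ features such as the different kinds of variable storage, integer and floating point arithmetic, enumerated types and Booleans affect performance, with optimization advice.

By reading this book, learners can acquire a practical set of C++ optimization strategies that improve program performance across different platforms. Understanding these concepts and applying them in real projects helps to produce more efficient and more maintainable C++ software.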

3.12 Execution unit throughput

There is an important distinction between the latency and the throughput of an execution

unit. For example, it takes five clock cycles to do a floating point addition on a Pentium 4.

But it is possible to start a new floating point addition every clock cycle. This means that if

each addition depends on the result of the preceding addition then you will have only one

addition every five clock cycles. But if all the additions are independent then you can have

one addition every clock cycle.
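
The difference between the two cases can be sketched in C++. This is only an illustration and not code from the manual: the function names sum_chained and sum_independent and the choice of four accumulators are assumptions made for the example. The first function forms one long dependence chain, while the second keeps four partial sums that do not depend on each other.

```cpp
#include <cassert>
#include <cstddef>

// One accumulator: every addition needs the result of the previous one,
// so the additions form a single long dependence chain.
double sum_chained(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// Four accumulators: the four additions in each iteration are independent
// of each other, so they can overlap in the pipeline of the addition unit.
double sum_independent(const double* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];   // remaining 0 to 3 elements
    return (s0 + s1) + (s2 + s3);
}
```

Because floating point addition is not associative, the two functions may differ in the last bits of the result for general input. This is also the reason why a compiler will normally not make this transformation on its own unless relaxed floating point options are enabled.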

The highest performance that can possibly be obtained in a computationally intensive

program is achieved when none of the time-consumers mentioned in the above sections are

dominating and there are no long dependence-chains. In this case, the performance is

limited by the throughput of the execution units rather than by the latency or by memory

access.

The execution core of modern microprocessors is split between several execution units.

Typically, there are two or more integer units, one floating point addition unit, and one

floating point multiplication unit. This means that it is possible to do an integer addition, a

floating point addition, and a floating point multiplication at the same time.

Code that does floating point calculations should therefore preferably have a balanced mix of additions and multiplications. Subtractions use the same unit as additions. Divisions take longer and use the multiplication unit. It is possible to do integer operations in between the floating point operations without reducing the performance because the integer operations use a different execution unit. For example, a loop that does floating point calculations will typically use integer operations for incrementing a loop counter, comparing the loop counter with its limit, etc. In most cases, you can assume that these integer operations do not add to the total computation time.

4 Performance and usability

A better performing software product is one that saves time for the user. Time is a precious

resource for many computer users and much time is wasted on software that is slow,

difficult to use, incompatible or error prone. All these problems are usability issues, and I

believe that software performance should be seen in the broader perspective of usability. A

list of literature on usability is given on page 132.

This is not a manual on usability, but I think that it is necessary here to draw the attention of

software programmers to some of the most common obstacles to efficient use of software.

The following list points out some typical sources of frustration and waste of time for

software users as well as important usability problems that software developers should be

aware of.

• Big runtime frameworks. The .NET framework and the Java virtual machine are frameworks that typically use far more resources than the programs they are running. Such frameworks are frequent sources of resource problems and compatibility problems, and they waste a lot of time during installation of the framework itself, during installation of the program that runs under the framework, during start of the program, and while the program is running. The main reason why such runtime frameworks are used at all is for the sake of cross-platform portability. Unfortunately, the cross-platform compatibility is not always as good as expected. I believe that the portability could be achieved more efficiently by better standardization of programming languages, operating systems, and APIs.

• Memory swapping. Software developers typically have more powerful computers with more RAM than end users have. The developers may therefore fail to see the excessive memory swapping and other resource problems that cause the resource-hungry applications to perform poorly for the end user.

• Installation problems. The procedures for installation and uninstallation of programs

should be standardized and done by the operating system rather than by individual

installation tools.

• Automatic updates. Automatic updating of software can cause problems if the network is unstable or if the new version causes problems that were not present in the old version. Updating mechanisms often disturb the users with nagging pop-up messages saying "please install this important new update" or even telling the user to restart the computer while he or she is busy concentrating on important work. The updating mechanism should never interrupt the user but only show a discreet icon signaling the availability of an update, or update automatically when the computer is restarted anyway. Software distributors often abuse the update mechanism to advertise new versions of their software. This is annoying to the user.

• Compatibility problems. All software should be tested on different platforms, different

screen resolutions, different system color settings and different user access rights.

Software should use standard API calls rather than self-styled hacks and direct

hardware access. Available protocols and standardized file formats should be used.

Web systems should be tested in different browsers, different platforms, different

screen resolutions, etc. Accessibility guidelines should be obeyed (See literature

page 132).

• Copy protection. Some copy protection schemes are based on hacks that violate or

circumvent operating system standards. Such schemes are frequent sources of

compatibility problems and system breakdown. Many copy protection schemes are

based on hardware identification. Such schemes cause problems when the

hardware is updated. Most copy protection schemes are annoying to the user and

prevent legitimate backup copying without effectively preventing illegitimate copying.

The benefits of a copy protection scheme should be weighed against the costs in

terms of usability problems and necessary support.

• Hardware updating. The change of a hard disk or other hardware often requires that

all software be reinstalled and user settings are lost. It is not unusual for the

reinstallation work to take a whole workday or more. Current operating systems need

better support for hard disk copying.

• Security. The vulnerability of software with network access to virus attacks and other

abuse is extremely costly to many users.

• Background services. Many services that run in the background are unnecessary for

the user and a waste of resources. Consider running the services only when

activated by the user.

• Take user feedback seriously. User complaints should be regarded as a valuable

source of information about bugs, compatibility problems, usability problems and

desired new features. User feedback should be handled in a systematic manner to

make sure the information is utilized appropriately. Users should get a reply about

investigation of the problems and planned solutions. Patches should be easily

available from a website.

5 Choosing the optimal algorithm

The first thing to do when you want to optimize a piece of CPU-intensive software is to find

the best algorithm. The choice of algorithm is very important for tasks such as sorting,

searching, or mathematical calculations. In such cases, you can obtain much more by

choosing the best algorithm than by optimizing the first algorithm that comes to mind. In

some cases you may have to test several different algorithms in order to find the one that

works best on a typical set of test data.
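
As an illustration of how much the choice of algorithm can matter, the sketch below compares two ways of searching for a value. It is not taken from the manual: the function names find_linear and find_sorted are hypothetical, and the second function assumes that the data are already sorted in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear search: examines up to n elements.
// Returns the index of key, or -1 if it is not found.
long find_linear(const std::vector<int>& v, int key) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == key) return static_cast<long>(i);
    return -1;
}

// Binary search with std::lower_bound: about log2(n) comparisons,
// but the vector must already be sorted in ascending order.
long find_sorted(const std::vector<int>& v, int key) {
    // Debug-only check of the precondition; disabled when NDEBUG is defined.
    assert(std::is_sorted(v.begin(), v.end()));
    auto it = std::lower_bound(v.begin(), v.end(), key);
    if (it != v.end() && *it == key) return static_cast<long>(it - v.begin());
    return -1;
}
```

For a handful of elements the linear search may well be the faster of the two, which is why testing on a typical set of data is worthwhile before committing to one of them.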

The discussion of different algorithms is beyond the scope of this manual. You have to

consult the general literature on algorithms for standard tasks such as sorting and

searching, or the specific literature for more complicated mathematical tasks.

Before you start to code, you may consider whether others have done the job before you.

Optimized function libraries for many standard tasks are available from a number of

sources. For example, the "Intel Math Kernel Library" contains many functions for common

mathematical calculations including linear algebra and statistics, and the "Intel Integrated

Performance Primitives" library contains many functions for audio and video processing,

signal processing, data compression and cryptography (www.intel.com

6 The efficiency of different C++ constructs

Most programmers have little or no idea how a piece of program code is translated into

machine code and how the microprocessor handles this code. For example, many

programmers do not know that double precision calculations are just as fast as single

precision. And who would know that a template class is more efficient than a polymorphous

class?

This chapter aims to explain the relative efficiency of different C++ language elements in order to help the programmer choose the most efficient alternative. The theoretical background is further explained in the other volumes in this series of manuals.

6.1 Different kinds of variable storage

Variables and objects are stored in different parts of the memory, depending on how they

are declared in a C++ program. This has influence on the efficiency of the data cache (see

page 72). Data caching is poor if data are scattered randomly around in the memory. It is

therefore important to understand how variables are stored. The storage principles are the

same for simple variables, arrays and objects.

Storage on the stack

Variables declared with the keyword auto are stored on the stack. The keyword auto is practically never used because automatic storage is the default for all variables and objects that are declared inside any function.

The stack is a part of memory that is organized in a first-in-last-out fashion. It is used for

storing function return addresses (i.e. where the function was called from), function

parameters, local variables, and for saving registers that have to be restored before the

function returns. Every time a function is called, it allocates the required amount of space on

the stack for all these purposes. This memory space is freed when the function returns. The

next time a function is called, it can use the same space for the parameters of the new

function.

The stack is the most efficient place to store data because the same range of memory

addresses is reused again and again. If there are no big arrays, then it is almost certain that

this part of the memory is mirrored in the level-1 data cache, where it is accessed quite fast.

The lesson we can learn from this is that all variables and objects should preferably be

declared inside the function in which they are used.
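
A minimal sketch of this advice follows. The function mean_of_squares and its limit of eight elements are hypothetical and chosen only for illustration. Everything the function needs is declared inside it, so the data occupy stack space that is reused from call to call.

```cpp
#include <cassert>

// Hypothetical helper: average of the squares of n values (n <= 8).
// The scratch array and the accumulator are local, so they live on the
// stack and occupy memory that is reused by every function call.
double mean_of_squares(const double* x, int n) {
    assert(n > 0 && n <= 8);
    double squares[8];              // small local array on the stack
    for (int i = 0; i < n; ++i) squares[i] = x[i] * x[i];
    double sum = 0.0;               // local accumulator
    for (int i = 0; i < n; ++i) sum += squares[i];
    return sum / n;
}
```

An optimizing compiler may keep the accumulator in a register rather than on the stack, which is faster still.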

It is possible to make the scope of a variable even smaller by declaring it inside a {}

bracket. However, most compilers do not free the memory used by a variable until the

function returns even though it could free the memory when exiting the {} bracket in which

the variable is declared.

Global or static storage

Variables that are declared outside of any function are called global variables. They can be

accessed from any function. Global variables are stored in a static part of the memory. The

static memory is also used for variables declared with the static keyword, for floating

point constants, string constants, array initializer lists, switch statement jump tables, and

virtual function tables.

The static data area is usually divided into three parts: one for constants that are never

modified by the program, one for initialized variables that may be modified by the program,

and one for uninitialized variables that may be modified by the program.

The advantage of static data is that they can be initialized to desired values before the

program starts. The disadvantage is that the memory space is occupied throughout the

whole program execution, even if the variable is only used in a small part of the program.

This makes data caching less efficient.

Do not make variables global if you can avoid it. Global variables may be needed for communication between different threads, but that's about the only situation where they are unavoidable. It may be useful to make a variable global if it is accessed by several different functions and you want to avoid the overhead of transferring the variable as function parameter. But it may be a better solution to make the functions that access the same variable members of the same class and store the shared variable inside the class. Which solution you prefer is a matter of programming style.
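
The class-based alternative might look like the following sketch. The class Counter and the function count_up_to are hypothetical names invented for this example. The variable that would otherwise be global becomes a data member shared by the member functions.

```cpp
#include <cassert>

// Hypothetical example: two functions share a running total.
// Instead of a global variable, the total is a data member of the
// class that both functions belong to.
class Counter {
public:
    void add(int amount) { total_ += amount; }
    void reset()         { total_ = 0; }
    int  total() const   { return total_; }
private:
    int total_ = 0;      // shared state, stored inside the object
};

// The object itself can be a local variable of the calling function,
// which places the shared variable on the stack.
int count_up_to(int n) {
    assert(n >= 0);
    Counter c;
    for (int i = 1; i <= n; ++i) c.add(i);
    return c.total();
}
```

The member functions reach total_ through the object, so nothing has to be passed as an explicit parameter, and the data stay together with the code that uses them.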

It is often preferable to make a lookup-table static. Example:

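// Example 6.1
float SomeFunction (int x) {
   // Static lookup table: filled in once, when the program is loaded
   static const float list[] = {1.1f, 0.3f, -2.0f, 4.4f, 2.5f};
   return list[x];                  // x must be in the range 0 to 4
}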

The advantage of using static here is that the list does not need to be initialized when the function is called. The values are simply put there when the program is loaded into memory. If the word static is removed from the above example, then all five values have to be put into the list every time the function is called. This is done by copying the entire list from static memory to stack memory. Copying constant data from static memory to the stack is a waste of time in most cases, but it may be optimal in special cases where the data are used many times in a loop where almost the entire level-1 cache is used in a number of arrays that you want to keep together on the stack.

String constants and floating point constants are stored in static memory. Example:

// Example 6.2
// a, b, c and d are floating point variables declared elsewhere
a = b * 3.5;
c = d + 3.5;

Here, the constant 3.5 will be stored in static memory. Most compilers will recognize that

the two constants are identical so that only one constant needs to be stored. All identical

constants in the entire program will be joined together in order to minimize the amount of

cache space used for constants.

Integer constants are usually included as part of the instruction code. You can assume that

there are no caching problems for integer constants.

Register storage

A limited number of variables can be stored in registers instead of main memory. A register

is a small piece of memory inside the CPU used for temporary storage. Variables that are

stored in registers are accessed very fast. All optimizing compilers will automatically choose

the most often used variables in a function for register storage.

The number of registers is very limited. There are approximately six integer registers

available for general purposes in 32-bit operating systems and fourteen integer registers in

64-bit systems.

Floating point variables use a different kind of registers. There are eight floating point

registers available in 32-bit operating systems and sixteen in 64-bit operating systems.

Some compilers have difficulties making floating point register variables in 32-bit mode

without the SSE2 instruction set.

Volatile

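The volatile keyword specifies that a variable cannot be stored in a register, not even temporarily, so that every read and write in the source code accesses memory. This is necessary for variables that can be changed from outside the normal program flow, for example by another thread. Note that volatile by itself does not make an access atomic and does not synchronize threads; C++11 and later provide std::atomic for that purpose. Volatile storage prevents the compiler from doing any kind of optimization on the variable. It is sometimes used for turning off optimization of a particular variable.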
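
One common use of this effect is to keep a compiler from removing a loop whose result it would otherwise consider unused, for example when timing a piece of code. The sketch below illustrates the idea; the function busy_work and the variable sink are hypothetical names.

```cpp
#include <cassert>

// Hypothetical timing helper: repeats a cheap calculation many times.
// Because sink is volatile, the compiler must perform every store to it
// and cannot remove the loop as dead code.
int busy_work(int repetitions) {
    assert(repetitions >= 0);
    volatile int sink = 0;
    for (int i = 0; i < repetitions; ++i) {
        sink = i * 2;            // each store must actually be performed
    }
    return sink;                 // value of the last store, or 0
}
```

The price is that sink is read from and written to memory on every access, so volatile should be limited to the one variable where this behavior is wanted.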

Thread-local storage

Most compilers can make thread-local storage of static and global variables by using the

keyword __thread or __declspec(thread). Such variables have one instance for

each thread. Thread-local storage is inefficient because it is accessed through a pointer

stored in a thread environment block. Thread-local storage should be avoided, if possible,

and replaced by storage on the stack (see above, p. 18). Variables stored on the stack

always belong to the thread in which they are created.

Far

Systems with segmented memory, such as DOS and 16-bit Windows, allow variables to be stored in a far data segment by using the keyword far (arrays can also be huge). Far storage, far pointers, and far procedures are inefficient. If a program has too much data for one segment then it is recommended to use a different operating system that allows bigger segments (32-bit or 64-bit systems).

Dynamic memory allocation

Dynamic memory allocation is done with the operators new and delete or with the

functions malloc and free. These operators and functions consume a significant amount

of time. A part of memory called the heap is reserved for dynamic allocation. The heap can

easily become fragmented when objects of different sizes are allocated and deallocated in

random order. The heap manager can spend a lot of time cleaning up spaces that are no

longer used and searching for vacant spaces. This is called garbage collection. Objects that

are allocated in sequence are not necessarily stored sequentially in memory. They may be

scattered around at different places when the heap has become fragmented. This makes

data caching inefficient.
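
One way to limit both the number of allocations and the scattering of data is to allocate all objects of the same kind in one contiguous block. The following sketch uses std::vector for this. It is an illustration under assumed names (Point, make_points), not a recommendation taken from the manual.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Hypothetical example: build n points with a single heap allocation.
// reserve() requests the memory for all n elements up front, so the
// elements are contiguous and push_back never has to reallocate.
std::vector<Point> make_points(std::size_t n) {
    std::vector<Point> pts;
    pts.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        pts.push_back(Point{static_cast<double>(i),
                            2.0 * static_cast<double>(i)});
    }
    assert(pts.size() == n);
    return pts;
}
```

Allocating each Point separately with new would require n calls to the heap manager and would give no guarantee that neighboring points are close to each other in memory.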

Dynamic memory allocation also tends to make the code more complicated and error-prone. The program has to keep pointers to all allocated objects and keep track of when they are no longer used. It is important that all allocated objects are also deallocated in all possible cases of program flow. Failure to do so is a common source of error known as a memory leak. An even worse kind of error is to access an object after it has been deallocated. The program logic may need extra overhead to prevent such errors.
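
In C++11 and later, which are newer than this text, the bookkeeping can be delegated to a smart pointer. The sketch below assumes a hypothetical Buffer type with a counter of live objects, added only so that the effect can be observed. std::unique_ptr deletes the object on every path out of the function.

```cpp
#include <cassert>
#include <memory>

struct Buffer {
    explicit Buffer(int n) : size(n) { ++live; }
    ~Buffer() { --live; }
    int size;
    static int live;     // number of Buffer objects currently alive
};
int Buffer::live = 0;

// Hypothetical function with an early return. The unique_ptr deletes
// the Buffer on every path out of the function, so no path leaks it.
int use_buffer(int n) {
    assert(n >= 0);
    std::unique_ptr<Buffer> buf(new Buffer(n));
    if (buf->size == 0) return -1;   // early exit, buffer still freed
    return buf->size * 2;
}
```

This removes the risk of a leak on the early return, but it does not remove the cost of the allocation itself. If the object is only needed inside the function, a plain local variable on the stack remains the cheaper choice.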
