Causal Profiling: Finding the Code That Matters for Performance Optimization - The COZ Algorithm

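"COZ - Finding Code that Counts with Causal Profiling - 2015 (090-curtsinger) - Computer Science"

Performance optimization is a critical task in software development. To find optimization opportunities, developers typically rely on software profilers. Conventional profilers, however, only report where a program spends its time. That is not enough to guide effective optimization, because speeding up code that consumes a lot of time may have little effect on overall performance. Developers end up wasting time and resources, and important optimization opportunities are hard to find.

The paper "COZ: Finding Code that Counts with Causal Profiling" is by Charlie Curtsinger (Department of Computer Science, Grinnell College) and Emery D. Berger (Department of Computer Science, University of Massachusetts Amherst). It introduces a new approach to performance analysis called causal profiling. Unlike earlier approaches, causal profiling indicates exactly which code developers should focus their optimization effort on and quantifies the potential impact of those optimizations. It works by running performance experiments during program execution. Each experiment calculates the impact of a potential optimization by virtually speeding up code: it inserts pauses that slow down all other code running concurrently. In this way, causal profiling simulates how the performance of the whole program would change if a particular block of code were optimized.

Where a conventional profiler shows only how run time is distributed, causal profiling shows cause and effect: how pieces of code depend on one another and how each affects performance. This helps developers identify code that accounts for a small share of run time yet has a large effect on overall performance, and avoid spending effort on code that does not matter. Causal profiling can also help uncover performance bottlenecks in concurrent and multithreaded programs, where the effect of optimizing one piece of code may depend on other code executing at the same time.

Causal profiling is thus an important complement to conventional performance analysis tools: it lets developers locate performance-critical code more accurately and target their optimizations accordingly. The description highlights high-performance computing and large-scale concurrent applications as the areas where the approach is most useful.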

2. Causal Profiling Overview

This section describes the major steps in collecting, processing, and interpreting a causal profile with COZ, our prototype causal profiler.

Profiler startup. A user invokes COZ using a command of the form coz run --- <program> <args>. At the beginning of the program's execution, COZ collects debug information for the executable and all loaded libraries. Users may specify file and binary scope, which restricts COZ's experiments to speedups in the specified files. By default, COZ will consider speedups in any source file from the main executable. COZ builds a map from instructions to source lines using the program's debug information and the specified scope. Once the source map is constructed, COZ creates a profiler thread and resumes normal execution.
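The instruction-to-source-line map with a scope filter can be sketched as follows. This is a minimal illustration, not COZ's implementation: the `LineEntry` record, the `build_source_map` and `lookup` helpers, and the sample addresses are all hypothetical, and a real profiler would read the address ranges from the program's debug information.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class LineEntry(NamedTuple):
    start: int   # first instruction address covered by this entry
    end: int     # one past the last address covered
    file: str
    line: int


def build_source_map(entries, scope):
    """Keep only entries whose source file is in scope, sorted by address."""
    in_scope = [e for e in entries if e.file in scope]
    return sorted(in_scope, key=lambda e: e.start)


def lookup(source_map, address) -> Optional[tuple]:
    """Return (file, line) for an instruction address, or None if out of scope."""
    starts = [e.start for e in source_map]
    i = bisect_right(starts, address) - 1
    if i >= 0 and source_map[i].start <= address < source_map[i].end:
        return (source_map[i].file, source_map[i].line)
    return None


# Hypothetical debug information: two lines in the main executable, one in a library.
debug_info = [
    LineEntry(0x1000, 0x1010, "main.c", 10),
    LineEntry(0x1010, 0x1030, "main.c", 11),
    LineEntry(0x2000, 0x2040, "libfoo.c", 7),
]
source_map = build_source_map(debug_info, scope={"main.c"})
```

With this scope, an address inside `libfoo.c` resolves to `None`, so a sample taken there could never be selected for a virtual speedup.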

Experiment initialization. COZ's profiler thread begins an experiment by selecting a line to virtually speed up, and a randomly-chosen percent speedup. Both parameters must be selected randomly; any systematic method of exploring lines or speedups could lead to systematic bias in profile results. One might assume that COZ could exclude lines or virtual speedup amounts that have not shown a performance effect early in previous experiments, but prioritizing experiments based on past results would prevent COZ from identifying an important line if its performance only matters after some warmup period. Once a line and speedup have been selected, the profiler thread saves the number of visits to each progress point and begins the experiment.
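A small sketch of experiment initialization is shown below. The text only requires that the line and the speedup be chosen at random and that the progress-point counters be saved; the uniform distribution, the 5% step in `SPEEDUPS`, and the names `Experiment` and `start_experiment` are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Experiment:
    line: str               # source line selected for virtual speedup
    speedup_pct: int        # virtual speedup, in percent
    visits_at_start: dict   # progress-point counters when the experiment began


# Hypothetical set of candidate speedups: 0% to 100% in steps of 5.
SPEEDUPS = list(range(0, 101, 5))


def start_experiment(in_scope_lines, progress_counters, rng=random):
    """Pick a line and a speedup at random, then snapshot the counters."""
    line = rng.choice(in_scope_lines)
    speedup = rng.choice(SPEEDUPS)
    # Copy the counters so later visits do not alter the saved starting point.
    return Experiment(line, speedup, dict(progress_counters))


lines = ["main.c:10", "main.c:11", "worker.c:42"]
counters = {"requests_done": 120}
exp = start_experiment(lines, counters, random.Random(0))
counters["requests_done"] += 7   # visits during the experiment
```

Because the selection never looks at earlier results, a line that only matters after a warmup period stays as likely to be chosen late in the run as early.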

Applying a virtual speedup. Every time the profiled program creates a thread, COZ begins sampling the instruction pointer from this thread. COZ processes samples within each thread to implement a sampling version of virtual speedups. In Section 3.4, we show the equivalence between the virtual speedup mechanism shown in Figure 3 and the sampling approach used by COZ. Every time a sample is available, a thread checks whether the sample falls in the line of code selected for virtual speedup. If so, it forces other threads to pause. This process continues until the profiler thread indicates that the experiment has completed.
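One way to picture the per-sample check is a shared delay counter with a local counter per thread, simulated here without real threads. The excerpt only states that a thread whose sample falls in the selected line forces the other threads to pause; the counter bookkeeping, the `DelayState` class, and the 400 ns delay are assumptions for this sketch.

```python
class DelayState:
    """Shared delay counter plus one local counter per thread."""

    def __init__(self, thread_ids, delay_ns):
        self.delay_ns = delay_ns
        self.global_count = 0
        self.local_count = {t: 0 for t in thread_ids}
        self.paused_ns = {t: 0 for t in thread_ids}

    def on_sample(self, thread_id, sampled_line, selected_line):
        """Process one instruction-pointer sample taken in thread_id."""
        if sampled_line == selected_line:
            # This thread ran the selected line: every other thread owes one delay.
            self.global_count += 1
            self.local_count[thread_id] += 1
        # Pay any delays this thread still owes.
        owed = self.global_count - self.local_count[thread_id]
        if owed > 0:
            # Stand-in for actually sleeping the thread.
            self.paused_ns[thread_id] += owed * self.delay_ns
            self.local_count[thread_id] = self.global_count


state = DelayState(["t1", "t2"], delay_ns=400)
state.on_sample("t1", "main.c:10", "main.c:10")   # t1 is in the selected line
state.on_sample("t2", "main.c:99", "main.c:10")   # t2 pays one delay
state.on_sample("t1", "main.c:10", "main.c:10")
state.on_sample("t2", "main.c:99", "main.c:10")   # t2 pays another delay
```

In this run `t1` never pauses, while `t2` accumulates two delays, which is the relative slowdown that stands in for speeding up the selected line.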

Ending an experiment. COZ ends the experiment after a pre-determined time has elapsed. If there were too few visits to progress points during the experiment (five is the default minimum), COZ doubles the experiment time for the rest of the execution. Once the experiment has completed, the profiler thread logs the results of the experiment, including the effective duration of the experiment (runtime minus the total inserted delay), the selected line and speedup, and the number of visits to all progress points. Before beginning the next experiment, COZ will pause for a brief cooloff period to allow any remaining samples to be processed.
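The end-of-experiment bookkeeping can be sketched as below. The effective-duration formula and the minimum of five visits come from the text; the function name, the record layout, and the choice to compare the sum over all progress points against the minimum are assumptions.

```python
MIN_VISITS = 5  # default minimum stated in the text


def end_experiment(line, speedup_pct, runtime_ns, inserted_delay_ns,
                   visits_before, visits_after, experiment_length_ns):
    """Return (log_record, next_experiment_length_ns)."""
    delta = {p: visits_after[p] - visits_before.get(p, 0) for p in visits_after}
    record = {
        "line": line,
        "speedup_pct": speedup_pct,
        # Effective duration: runtime minus the total inserted delay.
        "effective_duration_ns": runtime_ns - inserted_delay_ns,
        "visits": delta,
    }
    # Too few progress-point visits: use longer experiments from now on.
    if sum(delta.values()) < MIN_VISITS:
        experiment_length_ns *= 2
    return record, experiment_length_ns


record, next_len = end_experiment(
    line="main.c:10", speedup_pct=40,
    runtime_ns=500_000_000, inserted_delay_ns=120_000_000,
    visits_before={"requests_done": 120}, visits_after={"requests_done": 123},
    experiment_length_ns=500_000_000,
)
```

Here only three visits were observed, so the next experiment runs for twice as long.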

Figure 3: Illustration of virtual speedup: (a) shows the original execution of two threads running functions f and g; (b) shows the effect of actually speeding up f by 40%; (c) shows the effect of virtually speeding up f by 40%. Each time f runs in one thread, all other threads pause for 40% of f's original execution time (shown as ellipsis). The difference between the runtime in (c) and the original runtime plus n_f · d (the number of times f ran times the delay size) is the same as the effect of actually optimizing f.
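The relationship in the caption can be checked with a small worked example. This is a deliberately simplified model, assuming two independent threads that run in parallel with no synchronization between them; the run counts and durations are made up for illustration.

```python
def runtime(thread_a_ns, thread_b_ns):
    """Two threads run in parallel; the program ends when both have finished."""
    return max(thread_a_ns, thread_b_ns)


n_f, t_f = 3, 100           # thread A runs f three times, 100 ns each (hypothetical)
t_g = 280                   # thread B runs g for 280 ns in total (hypothetical)
speedup_pct = 40
d = t_f * speedup_pct // 100   # delay inserted per execution of f: 40 ns

original = runtime(n_f * t_f, t_g)              # (a) original program
actual = runtime(n_f * (t_f - d), t_g)          # (b) f really is 40% faster
virtual = runtime(n_f * t_f, t_g + n_f * d)     # (c) thread B pauses d per run of f

effect_actual = original - actual
effect_virtual = (original + n_f * d) - virtual
```

In this model the actual optimization shortens the run from 300 ns to 280 ns, and the virtual speedup run takes 400 ns, which is 20 ns less than the original runtime plus the 120 ns of inserted delay. Both routes report the same 20 ns effect.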

Producing a causal profile. After an application has been profiled with COZ, the results of all the performance experiments can be combined to produce a causal profile. Each experiment has two independent variables: the line chosen for virtual speedup and the amount of virtual speedup. COZ records the dependent variable, the rate of visits to each progress point, in two numbers: the total number of visits to each progress point and the effective duration of the experiment (the real runtime minus the total length of all pauses). Experiments with the same independent variables can be combined by adding the progress point visits and experiment durations.
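Combining experiments that share the same independent variables amounts to a grouped sum. The sketch below assumes a single progress point and a hypothetical log format (a list of dictionaries); neither is taken from COZ's actual output.

```python
from collections import defaultdict


def combine(experiments):
    """Sum visits and effective durations for each (line, speedup) pair."""
    totals = defaultdict(lambda: {"visits": 0, "duration_ns": 0})
    for e in experiments:
        key = (e["line"], e["speedup_pct"])
        totals[key]["visits"] += e["visits"]
        totals[key]["duration_ns"] += e["duration_ns"]
    return dict(totals)


# Hypothetical log: two baseline experiments and one 40% experiment on the same line.
log = [
    {"line": "main.c:10", "speedup_pct": 0, "visits": 50, "duration_ns": 1_000},
    {"line": "main.c:10", "speedup_pct": 0, "visits": 48, "duration_ns": 1_000},
    {"line": "main.c:10", "speedup_pct": 40, "visits": 60, "duration_ns": 1_000},
]
combined = combine(log)
```

Keeping visits and duration as separate sums, rather than averaging per-experiment rates, means longer experiments naturally carry more weight in the combined rate.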

Once experiments have been combined, COZ groups experiments by the line that was virtually sped up. Any lines that do not have a measurement of 0% virtual speedup are discarded; without this baseline measurement we cannot compute a percent speedup relative to the original program. Measuring this baseline separately for each line guarantees that any line-dependent overhead from virtual speedups, such as the additional cross-thread communication required to insert delays when a frequently-executed line runs, will not skew profile results. By default, COZ will also discard any lines with fewer than 5 different virtual speedup amounts (a plot that only shows the effect of a 75% virtual speedup is not particularly useful). Finally, we compute the percent program
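The grouping and the two discard rules described above (a missing 0% baseline, or fewer than five distinct speedup amounts) can be sketched as follows. The data layout, the helper names, and the sample values are hypothetical.

```python
MIN_SPEEDUP_AMOUNTS = 5  # default stated in the text


def group_and_filter(combined):
    """Group by line; drop lines lacking a 0% baseline or enough speedup amounts."""
    by_line = {}
    for (line, speedup_pct), totals in combined.items():
        by_line.setdefault(line, {})[speedup_pct] = totals
    return {
        line: points
        for line, points in by_line.items()
        if 0 in points and len(points) >= MIN_SPEEDUP_AMOUNTS
    }


def point(visits):
    """Hypothetical combined measurement for one (line, speedup) pair."""
    return {"visits": visits, "duration_ns": 1_000}


combined = {("a.c:1", s): point(50 + s) for s in (0, 10, 20, 30, 40)}
# b.c:2 has five speedup amounts but no 0% baseline.
combined.update({("b.c:2", s): point(50) for s in (10, 20, 30, 40, 50)})
# c.c:3 has a baseline but only two speedup amounts.
combined[("c.c:3", 0)] = point(50)
combined[("c.c:3", 75)] = point(55)
kept = group_and_filter(combined)
```

Only `a.c:1` survives both rules, so it is the only line that would appear in the resulting profile.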
