An In-Depth Look at Java Performance Tuning


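"Java Performance Tuning is a professional book from O'Reilly that focuses on performance optimization techniques for Java applications. First published by O'Reilly & Associates, Inc. in 2000, it aims to help Java developers and teams improve execution efficiency and resolve performance bottlenecks. The book may cover every layer of the Java platform, including JVM tuning, memory management, thread optimization, code optimization strategies, and the use of tools. It also acknowledges the designations and trademarks claimed by manufacturers, noting in particular the O'Reilly logo and the Java series trademarks, and stating that the image associated with the topic of Java performance tuning is a trademark of O'Reilly. Although O'Reilly is independent of Sun Microsystems (now part of Oracle), the book may discuss performance optimization for Sun Microsystems' Java technology in detail."

Within the topic of Java performance tuning, several areas are essential:

1. **Java Virtual Machine (JVM) tuning**: Understanding how the JVM works is the basis of performance optimization. This includes choosing a garbage collector, sizing the heap, setting the ratio between the young and old generations, and configuring the method area.
2. **Memory management**: Pay attention to memory leaks, object lifecycles, and reference types (strong, soft, weak, and phantom references), and learn to detect and resolve memory problems with memory-analysis tools.
3. **Thread optimization**: Improve concurrent performance in multithreaded environments through thread-pool configuration and careful use of locks, and by avoiding deadlocks and race conditions.
4. **Code optimization**: Write efficient code: avoid redundant computation, reduce unnecessary object creation, use StringBuilder instead of repeated String concatenation, and choose appropriate data structures.
5. **Monitoring and profiling tools**: Use tools such as JConsole and VisualVM, or third-party profilers such as JProfiler, to monitor and analyze application performance and locate bottlenecks.
6. **Performance benchmarking**: Run performance tests with JUnit or another test framework to quantify the effect of each optimization, and use JMH (Java Microbenchmark Harness) for microbenchmarks.
7. **CPU and disk I/O optimization**: Reduce unnecessary computation and I/O, optimize database queries, and use caching to improve response times.
8. **Network optimization**: Optimize network communication, for example by reducing the number of TCP/IP handshakes and by using more efficient serialization and deserialization libraries.
9. **Logging and diagnostics**: Set appropriate log levels and use logs for troubleshooting, while keeping the performance cost of log output in mind.
10. **Continuous integration and automation**: Build an automated performance-testing process so that potential performance problems are identified quickly after every code change.

Mastering these areas helps developers build high-performance Java applications and improves system stability and scalability. Java performance tuning is a continuous process: as technologies and applications evolve, new optimization strategies and techniques keep appearing.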

O’reilly - Java Performance Tuning


Your general benchmark suite should be based on real functions used in the end application, but at the same time should not rely on user input, as this can make measurements difficult. Any variability in input times or any other part of the application should either be eliminated from the benchmarks or precisely identified and specified within the performance targets. There may be variability, but it must be controlled and reproducible.

1.6.3 The Benchmark Harness

There are tools for testing applications in various ways.[2] These tools focus mostly on testing the robustness of the application, but as long as they measure and report times, they can also be used for performance testing. However, because their focus tends to be on robustness testing, many tools interfere with the application's performance, and you may not find a tool you can use adequately or cost-effectively. If you cannot find an acceptable tool, the alternative is to build your own harness.

[2] You can search the Web for java+perf+test to find performance-testing tools. In addition, some Java profilers are listed in Chapter 15.

Your benchmark harness can be as simple as a class that sets some values and then starts the main() method of your application. A slightly more sophisticated harness might turn on logging and timestamp all output for later analysis. GUI-run applications need a more complex harness and require either an alternative way to execute the graphical functionality without going through the GUI (which may depend on whether your design can support this), or a screen event capture and playback tool (several such tools exist[3]). In any case, the most important requirement is that your harness correctly reproduces user activity and data input and output. Normally, whatever regression-testing apparatus you have (and presumably are already using) can be adapted to form a benchmark harness.

[3] JDK 1.3 introduced a new java.awt.Robot class, which provides for generating native system-input events, primarily to support automated testing of Java GUIs.
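As an illustration of the simplest kind of harness described above, the following sketch sets a value, starts an application entry point, and reports the elapsed wall-clock time. It is only a sketch under assumptions: the `applicationMain` method stands in for your real application's main(), the `app.mode` property is a made-up setting, and the lambda syntax requires a Java version far newer than the one current when this text was written.

```java
import java.util.function.Consumer;

class SimpleHarness {
    /** Runs the entry point once and returns the elapsed wall-clock time in milliseconds. */
    static long timeRun(Consumer<String[]> entryPoint, String[] args) {
        long start = System.currentTimeMillis();
        entryPoint.accept(args);
        return System.currentTimeMillis() - start;
    }

    // Hypothetical stand-in for the real application's main().
    static void applicationMain(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        // Timestamp the output so it can be analyzed later.
        System.out.println("[" + System.currentTimeMillis() + "] work done, sum=" + sum);
    }

    public static void main(String[] args) {
        // "Sets some values": a made-up property used only for illustration.
        System.setProperty("app.mode", "benchmark");
        long elapsed = timeRun(SimpleHarness::applicationMain, args);
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

In practice you would pass your own application's entry point to `timeRun` and keep the harness itself as thin as possible so that it does not distort the timings.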

The benchmark harness should not test the quality or robustness of the system. Operations should be normal: startup, shutdown, noninterrupted functionality. The harness should support the different configurations your application operates under, and any randomized inputs should be controlled; but note that the random sequence used in tests should be reproducible. You should use a realistic amount of randomized data and input. It is helpful if the benchmark harness includes support for logging statistics and easily allows new tests to be added. The harness should be able to reproduce and simulate all user input, including GUI input, and should test the system across all scales of intended use, up to the maximum numbers of users, objects, throughputs, etc. You should also validate your benchmarks, checking some of the values against actual clock time to ensure that no systematic or random bias has crept into the benchmark harness.
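One common way to make a randomized input sequence reproducible is to seed the generator explicitly, so every benchmark run sees the same data. The sketch below assumes integer inputs; the class name, seed value, and bounds are illustrative choices, not something the text prescribes.

```java
import java.util.Arrays;
import java.util.Random;

class ReproducibleInput {
    /** Generates the same pseudo-random sequence every time it is called with the same seed. */
    static int[] generate(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] data = new int[count];
        for (int i = 0; i < count; i++) {
            data[i] = random.nextInt(bound);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] firstRun = generate(42L, 5, 1000);
        int[] secondRun = generate(42L, 5, 1000);
        System.out.println("identical across runs: " + Arrays.equals(firstRun, secondRun));
    }
}
```

Recording the seed alongside the benchmark results lets you rerun exactly the same test later.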

For the multiuser case, the benchmark harness must be able to simulate multiple users working, including variations in user access and execution patterns. Without this support for variations in activity, the multiuser tests inevitably miss many bottlenecks encountered in actual deployment and, conversely, do encounter artificial bottlenecks that are never encountered in deployment, wasting time and resources. It is critical in multiuser and distributed applications that the benchmark harness correctly reproduces user-activity variations, delays, and data flows.
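A minimal sketch of simulated users with varying delays might look like the following. It assumes one thread per simulated user and a per-user seeded random "think time" between operations; the counter increment is a placeholder for a real application request, and all names and parameters are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class MultiUserSimulation {
    /** Runs the simulated users and returns the number of operations completed. */
    static int simulate(int users, int operationsPerUser, int maxThinkMillis, long seed) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        AtomicInteger completed = new AtomicInteger();
        for (int u = 0; u < users; u++) {
            // A different but reproducible delay sequence for each user.
            Random random = new Random(seed + u);
            pool.submit(() -> {
                try {
                    for (int i = 0; i < operationsPerUser; i++) {
                        Thread.sleep(random.nextInt(maxThinkMillis + 1)); // simulated user delay
                        completed.incrementAndGet(); // placeholder for a real operation
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("operations completed: " + simulate(4, 5, 20, 1L));
    }
}
```

Because each user draws its delays from its own seeded generator, the users do not move in lockstep, which avoids the artificially synchronized bursts of activity that uniform delays would produce.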

1.6.4 Taking Measurements

Each run of your benchmarks needs to be under conditions that are as identical as possible; otherwise it becomes difficult to pinpoint why something is running faster (or slower) than in another test. The benchmarks should be run multiple times, and the full list of results retained, not just the average and deviation or the ranged percentages. Also note the time of day that benchmarks are being run and any special conditions that apply, e.g., weekend or after hours in the office. Sometimes the variation can give you useful information. It is essential that you always run an initial benchmark to precisely determine the initial times. This is important because, together with your targets, the initial benchmarks specify how far you need to go and highlight how much you have achieved when you finish tuning.
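The advice to run benchmarks several times and keep every result can be sketched as below. The `run` helper and the placeholder workload are assumptions made for illustration; a real harness would also record the time of day and any special conditions with each list of results.

```java
import java.util.ArrayList;
import java.util.List;

class RepeatedRuns {
    /** Runs the benchmark the given number of times and keeps every elapsed time (ms). */
    static List<Long> run(Runnable benchmark, int repetitions) {
        List<Long> results = new ArrayList<>();
        for (int i = 0; i < repetitions; i++) {
            long start = System.currentTimeMillis();
            benchmark.run();
            results.add(System.currentTimeMillis() - start);
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute a real benchmark function.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i);
            }
        };
        // Print the full list, not just a summary statistic.
        System.out.println("all results (ms): " + run(workload, 5));
    }
}
```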

It is more important to run all benchmarks under the same conditions than to achieve the end-user environment for those benchmarks, though you should try to target the expected environment. It is possible to switch environments by running all benchmarks on an identical implementation of the application in two environments, thus rebasing your measurements. But this can be problematic: it requires detailed analysis because different environments usually have different relative performance between functions (thus your initial benchmarks could be relatively skewed compared with the current measurements).

Each set of changes (and preferably each individual change) should be followed by a run of benchmarks to precisely identify improvements (or degradations) in the performance across all functions. A particular optimization may improve the performance of some functions while at the same time degrading the performance of others, and obviously you need to know this. Each set of changes should be driven by identifying exactly which bottleneck is to be improved and how much of a speedup is expected. Using this methodology rigorously gives your effort a precise target.

You need to verify that any particular change does improve performance. It is tempting to change something small that you are sure will give an "obvious" improvement, without bothering to measure the performance change for that modification (because "it's too much trouble to keep running tests"). But you could easily be wrong. Jon Bentley once discovered that eliminating code from some simple loops can actually slow them down.[4] If a change does not improve performance, you should revert to the previous version.

[4] "Code Tuning in Context" by Jon Bentley, Dr. Dobb's Journal, May 1999. An empty loop in C ran slower than one that contained an integer increment operation.

The benchmark suite should not interfere with the application. Be on the lookout for artificial performance problems caused by the benchmarks themselves. This is very common if no thought is given to normal variation in usage. A typical situation might be benchmarking multiuser systems without proper user simulation (e.g., user delays not simulated, causing much higher throughput than would ever be seen; user data variation not simulated, causing all tests to try to use the same data at the same time; activities artificially synchronized, giving bursts of activity and inactivity; etc.). Be careful not to measure artificial situations, such as full caches with exactly the data needed for the test (e.g., running the test multiple times sequentially without clearing caches between runs). There is little point in performing tests that hit only the cache, unless this is the type of work the users will always perform.

When tuning, you need to alter any benchmarks that are quick (under five seconds) so that the code applicable to the benchmark is tested repeatedly in a loop to get a more consistent measure of where any problems lie. By comparing timings of the looped version with a single-run test, you can sometimes identify whether caches and startup effects are altering times in any significant way.
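The comparison between a single run and a looped run can be sketched as follows. The task and iteration count are hypothetical, and on a modern JVM the just-in-time compiler may optimize away trivial work, so treat the numbers from a sketch like this with caution.

```java
class LoopedBenchmark {
    /** Returns the total elapsed wall-clock time (ms) for running the task repeatedly. */
    static long timeMillis(Runnable task, int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Hypothetical quick task; replace with the code path under test.
        Runnable quickTask = () -> new StringBuilder("benchmark").reverse().toString();
        int iterations = 1_000_000;

        long singleRun = timeMillis(quickTask, 1);
        long looped = timeMillis(quickTask, iterations);

        System.out.println("single run (ms): " + singleRun);
        System.out.println("looped average per iteration (ms): " + (double) looped / iterations);
    }
}
```

A single-run time much larger than the looped per-iteration average suggests that startup or cache effects dominate the single run.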

Optimizing code can introduce new bugs, so the application should be tested during the optimization phase. A particular optimization should not be considered valid until the application using that optimization's code path has passed quality assessment.


Optimizations should also be completely documented. It is often useful to retain the previous code in comments for maintenance purposes, especially as some kinds of optimized code can be more difficult to understand (and therefore to maintain).

It is typically better (and easier) to tune multiuser applications in single-user mode first. Many multiuser applications can obtain 90% of their final tuned performance if you tune in single-user mode and then identify and tune just a few major multiuser bottlenecks (which are typically a sort of give-and-take between single-user performance and general system throughput). Occasionally, though, there will be serious conflicts that are revealed only during multiuser testing, such as transaction conflicts that can slow an application to a crawl. These may require a redesign or rearchitecting of the application. For this reason, some basic multiuser tests should be run as early as possible to flush out potential multiuser-specific performance problems.

Tuning distributed applications requires access to the data being transferred across the various parts of the application. At the lowest level, this can be a packet sniffer on the network or server machine. One step up from this is to wrap all the external communication points of the application so that you can record all data transfers. Relay servers are also useful. These are small applications that just reroute data between two communication points. Most useful of all is a trace or debug mode in the communications layer that allows you to examine the higher-level calls and communication between distributed parts.
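One way to wrap a communication point so that transfers can be recorded is to decorate its output stream. The sketch below only counts the bytes written; the class name is hypothetical, and a fuller version might also log timestamps or copy the data to a trace file.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class RecordingOutputStream extends FilterOutputStream {
    private long bytesWritten;

    RecordingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward the whole array in one call and record its size.
        out.write(b, off, len);
        bytesWritten += len;
    }

    long getBytesWritten() {
        return bytesWritten;
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayOutputStream stands in for a socket's output stream here.
        RecordingOutputStream recorder = new RecordingOutputStream(new ByteArrayOutputStream());
        recorder.write("hello".getBytes("UTF-8"));
        System.out.println("bytes transferred: " + recorder.getBytesWritten());
    }
}
```

In a real application the wrapped stream would typically be the one obtained from a socket, and an equivalent wrapper would be placed around the input side.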

1.7 What to Measure

The main measurement is always wall-clock time. You should use this measurement to specify almost all benchmarks, as it's the real-time interval that is most appreciated by the user. (There are certain situations, however, in which system throughput might be considered more important than the wall-clock time; e.g., servers, enterprise transaction systems, and batch or background systems.)

The obvious way to measure wall-clock time is to get a timestamp using System.currentTimeMillis() and then subtract this from a later timestamp to determine the elapsed time. This works well for elapsed time measurements that are not short.[5]

[5] System.currentTimeMillis() can take up to half a millisecond to execute. Any measurement including the two calls needed to measure the time difference should be over an interval greater than 100 milliseconds to ensure that the cost of the System.currentTimeMillis() calls is less than 1% of the total measurement. I generally recommend that you do not make more than one time measurement (i.e., two calls to System.currentTimeMillis()) per second.

Other types of measurements have to be system-specific and often application-specific. You can measure:

• CPU time (the time allocated on the CPU for a particular procedure)
• The number of runnable processes waiting for the CPU (this gives you an idea of CPU contention)
• Paging of processes
• Memory sizes
• Disk throughput
• Disk scanning times
• Network traffic, throughput, and latency
• Transaction rates
• Other system values

However, Java doesn't provide mechanisms for measuring these values directly, and measuring them requires at least some system knowledge, and usually some application-specific knowledge (e.g., what is a transaction for your application?).
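The timer-overhead guideline given for System.currentTimeMillis() in this section (up to half a millisecond per call, two calls per measurement, overhead kept under 1%) is simple arithmetic: 2 × 0.5 ms = 1 ms of overhead, which is 1% of a 100 ms interval. The sketch below just encodes that calculation; the half-millisecond figure is the text's worst-case estimate for JVMs of its era, not a measured value for current JVMs.

```java
class TimerOverhead {
    // Worst-case cost of one System.currentTimeMillis() call, as quoted in the text.
    static final double MAX_CALL_COST_MILLIS = 0.5;
    // One measurement needs a start timestamp and an end timestamp.
    static final int CALLS_PER_MEASUREMENT = 2;

    /** Fraction of the measured interval that could be timer overhead. */
    static double overheadFraction(double intervalMillis) {
        return CALLS_PER_MEASUREMENT * MAX_CALL_COST_MILLIS / intervalMillis;
    }

    /** Shortest interval for which the overhead stays at or below the given fraction. */
    static double minIntervalMillis(double maxFraction) {
        return CALLS_PER_MEASUREMENT * MAX_CALL_COST_MILLIS / maxFraction;
    }

    public static void main(String[] args) {
        System.out.println("overhead at 100 ms: " + overheadFraction(100.0));
        System.out.println("minimum interval for 1% overhead (ms): " + minIntervalMillis(0.01));
    }
}
```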


You need to be careful when running tests that have small differences in timings. The first test is usually slightly slower than any other tests. Try doubling the test run so that each test is run twice within the VM (e.g., rename main() to maintest(), and call maintest() twice from a new main()).

There are almost always small variations between test runs, so always use averages to measure differences and consider whether those differences are relevant by calculating the variance in the results.
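The average and variance mentioned above can be computed as in the following sketch. It assumes the timings are collected in milliseconds and uses the sample variance (dividing by n − 1), which is one reasonable choice among several; the timing values in main() are made up for illustration.

```java
class TimingStats {
    /** Arithmetic mean of the timings. */
    static double mean(long[] timings) {
        double sum = 0;
        for (long t : timings) {
            sum += t;
        }
        return sum / timings.length;
    }

    /** Sample variance (n - 1 denominator); requires at least two timings. */
    static double variance(long[] timings) {
        double m = mean(timings);
        double squaredDiffs = 0;
        for (long t : timings) {
            squaredDiffs += (t - m) * (t - m);
        }
        return squaredDiffs / (timings.length - 1);
    }

    public static void main(String[] args) {
        long[] before = {510, 495, 502, 498}; // made-up timings in ms
        long[] after = {480, 505, 490, 501};  // made-up timings in ms
        System.out.println("before: mean=" + mean(before) + " variance=" + variance(before));
        System.out.println("after:  mean=" + mean(after) + " variance=" + variance(after));
    }
}
```

If the difference between the two means is small compared with the spread of the individual runs, the apparent improvement may be nothing more than run-to-run noise.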

For distributed applications, you need to break down measurements into times spent on each component, times spent preparing data for transfer and recovering it after transfer (e.g., marshalling and unmarshalling objects and writing to and reading from a buffer), and times spent in network transfer. Each separate machine used on the networked system needs to be monitored during the test if any system parameters are to be included in the measurements. Timestamps must be synchronized across the system (this can be done by measuring offsets from one reference machine at the beginning of tests). Taking measurements consistently from distributed systems can be challenging, and it is often easier to focus on one machine, or one communication layer, at a time. This is usually sufficient for most tuning.
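Synchronizing timestamps by measuring offsets from a reference machine can be sketched as below. This is a deliberately simplified illustration: it assumes both clock readings were taken at effectively the same instant and ignores the network delay involved in obtaining the reference reading, which a real setup would need to estimate. All names and values are hypothetical.

```java
class ClockOffsets {
    /** Offset of a machine's clock relative to the reference clock, in milliseconds. */
    static long offset(long referenceMillis, long localMillis) {
        return localMillis - referenceMillis;
    }

    /** Converts a timestamp taken on a machine into the reference machine's time base. */
    static long toReferenceTime(long localTimestamp, long offsetMillis) {
        return localTimestamp - offsetMillis;
    }

    public static void main(String[] args) {
        // Readings taken "simultaneously" at the start of the test (made-up values).
        long referenceClock = 1_000_000L;
        long serverClock = 1_000_250L;
        long serverOffset = offset(referenceClock, serverClock);

        // A later event timestamped on the server, expressed in reference time.
        long eventOnServer = 1_005_250L;
        System.out.println("server offset (ms): " + serverOffset);
        System.out.println("event in reference time: " + toReferenceTime(eventOnServer, serverOffset));
    }
}
```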

1.8 Don't Tune What You Don't Need to Tune

The most efficient tuning you can do is not to alter what works well. As they say, "If it ain't broke, don't fix it." This may seem obvious, but the temptation to tweak something just because you have thought of an improvement has a tendency to override this obvious statement.

The second most efficient tuning is to discard work that doesn't need doing. It is not at all uncommon for an application to be started with one set of specifications and to have some of the specifications change over time. Many times the initial specifications are much more generic than the final product. However, the earlier generic specifications often still have their stamps in the application. I frequently find routines, variables, objects, and subsystems that are still being maintained but are never used and never will be used, since some critical aspect of these resources is no longer supported. These redundant parts of the application can usually be chopped without any bad consequences, often resulting in a performance gain.

In general, you need to ask yourself exactly what the application is doing and why. Then question whether it needs to do it in that way, or even if it needs to do it at all. If you have third-party products and tools being used by the application, consider exactly what they are doing. Try to be aware of the main resources they use (from their documentation). For example, a zippy DLL (shared library) that is speeding up all your network transfers is using some resources to achieve that speedup. You should know that it is allocating larger and larger buffers before you start trying to hunt down the source of your mysteriously disappearing memory. Then you can realize that you need to use the more complicated interface to the DLL that restricts resource usage, rather than a simple and convenient interface. And you will have realized this before doing extensive (and useless) object profiling to determine why your application is being a memory hog.

When benchmarking third-party components, you need to apply a good simulation of exactly how you will use those products. Determine characteristics from your benchmarks and put the numbers into your overall model to determine if the performance targets can be reached. Be aware that vendor benchmarks are typically useless for a particular application. Break your application down into a hugely simplified version for a preliminary benchmark implementation to test third-party components. You should make a strong attempt to include all the scaling necessary so that you are benchmarking a fully scaled usage of the components, not some reduced version that will reveal little about the components in full use.


1.9 Performance Checklist

• Specify the required performance.
  o Ensure performance objectives are clear.
  o Specify target response times for as much of the system as possible.
  o Specify all variations in benchmarks, including expected response ranges (e.g., 80% of responses for X must fall within 3 seconds).
  o Include benchmarks for the full range of scaling expected (e.g., low to high numbers of users, data, files, file sizes, objects, etc.).
  o Specify and use a benchmark suite based on real user behavior. This is particularly important for multiuser benchmarks.
  o Agree on all target times with users, customers, managers, etc., before tuning.

• Make your benchmarks long enough: over five seconds is a good target.
  o Use elapsed time (wall-clock time) for the primary time measurements.
  o Ensure the benchmark harness does not interfere with the performance of the application.
  o Run benchmarks before starting tuning, and again after each tuning exercise.
  o Take care that you are not measuring artificial situations, such as full caches containing exactly the data needed for the test.

• Break down distributed application measurements into components, transfer layers, and network transfer times.

• Tune systematically: understand what affects the performance; define targets; tune; monitor and redefine targets when necessary.
  o Approach tuning scientifically: measure performance; identify bottlenecks; hypothesize on causes; test hypothesis; make changes; measure improved performance.
  o Determine which resources are limiting performance: CPU, memory, or I/O.
  o Accurately identify the causes of the performance problems before trying to tune them.
  o Use the strategy of identifying the main bottlenecks, fixing the easiest, then repeating.
  o Don't tune what does not need tuning. Avoid "fixing" nonbottlenecked parts of the application.
  o Measure that the tuning exercise has improved speed.
  o Target one bottleneck at a time. The application running characteristics can change after each alteration.
  o Improve a CPU limitation with faster code and better algorithms, and fewer short-lived objects.
  o Improve a system-memory limitation by using fewer objects or smaller long-lived objects.
  o Improve I/O limitations by targeted redesigns or speeding up I/O, perhaps by multithreading the I/O.

• Work with user expectations to provide the appearance of better performance.
  o Hold back releasing tuning improvements until there is at least a 20% improvement in response times.
  o Avoid giving users a false expectation that a task will be finished sooner than it will.
  o Reduce the variation in response times. Bear in mind that users perceive the mean response time as the actual 90th percentile value of the response times.
  o Keep the user interface responsive at all times.
  o Aim to always give user feedback. The interface should not be dead for more than two seconds when carrying out tasks.
  o Provide the ability to abort or carry on alternative tasks.
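A response-range target such as the checklist's example (80% of responses for X must fall within 3 seconds) can be checked mechanically against recorded timings. The sketch below assumes response times are recorded in milliseconds; the class name, method names, and sample data are illustrative only.

```java
class ResponseTarget {
    /** Fraction of recorded responses that completed within the limit. */
    static double fractionWithin(long[] responseMillis, long limitMillis) {
        int within = 0;
        for (long t : responseMillis) {
            if (t <= limitMillis) {
                within++;
            }
        }
        return (double) within / responseMillis.length;
    }

    /** True if at least the required fraction of responses met the limit. */
    static boolean meetsTarget(long[] responseMillis, long limitMillis, double requiredFraction) {
        return fractionWithin(responseMillis, limitMillis) >= requiredFraction;
    }

    public static void main(String[] args) {
        long[] responses = {1200, 2500, 2900, 3100, 800, 4500, 2000, 1500, 2700, 2950}; // made-up data
        System.out.println("fraction within 3 s: " + fractionWithin(responses, 3000));
        System.out.println("meets 80% target: " + meetsTarget(responses, 3000, 0.8));
    }
}
```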
