O’Reilly - Java Performance Tuning
Your general benchmark suite should be based on real functions used in the end application, but at
the same time should not rely on user input, as this can make measurements difficult. Any
variability in input times or any other part of the application should either be eliminated from the
benchmarks or precisely identified and specified within the performance targets. There may be
variability, but it must be controlled and reproducible.
1.6.3 The Benchmark Harness
There are tools for testing applications in various ways [2]. These tools focus mostly on testing the
robustness of the application, but as long as they measure and report times, they can also be used for
performance testing. However, because their focus tends to be on robustness testing, many tools
interfere with the application's performance, and you may not find a tool you can use adequately or
cost-effectively. If you cannot find an acceptable tool, the alternative is to build your own harness.
[2] You can search the Web for java+perf+test to find performance-testing tools. In addition, some Java profilers are listed in Chapter 15.
Your benchmark harness can be as simple as a class that sets some values and then starts the main()
method of your application. A slightly more sophisticated harness might turn on logging and
timestamp all output for later analysis. GUI-run applications need a more complex harness and
require either an alternative way to execute the graphical functionality without going through the
GUI (which may depend on whether your design can support this), or a screen event capture and
playback tool (several such tools exist [3]). In any case, the most important requirement is that your
harness correctly reproduces user activity and data input and output. Normally, whatever
regression-testing apparatus you have (and presumably are already using) can be adapted to form a
benchmark harness.
[3] JDK 1.3 introduced a new java.awt.Robot class, which provides for generating native system-input events, primarily to support automated
testing of Java GUIs.
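As a rough illustration of the simplest kind of harness described above, the following sketch sets a value, starts an application's main() method, and prints a timestamped elapsed time. It is a minimal example under stated assumptions, not a complete harness: the class names SimpleHarness and DemoApp and the property name app.config are hypothetical, and DemoApp only stands in for the real application.

```java
import java.lang.reflect.Method;

public class SimpleHarness {

    /** Hypothetical stand-in for the real application's entry point. */
    public static class DemoApp {
        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            System.out.println("DemoApp finished, sum=" + sum);
        }
    }

    /** Invokes the static main(String[]) of the named class; returns elapsed milliseconds. */
    public static long timeMain(String className, String[] args) throws Exception {
        Method main = Class.forName(className).getMethod("main", String[].class);
        long start = System.nanoTime();
        // The cast passes the array as a single argument rather than as varargs.
        main.invoke(null, (Object) args);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Set the values the application reads at startup (hypothetical property name).
        System.setProperty("app.config", "benchmark");

        // Default to the demo application unless a class name is given.
        String target = args.length > 0 ? args[0] : DemoApp.class.getName();
        long elapsed = timeMain(target, new String[0]);

        // Timestamp the result so it can be analyzed later.
        System.out.println(System.currentTimeMillis() + " " + target
                + " took " + elapsed + " ms");
    }
}
```

Passing the fully qualified name of your own main class as the first argument would time that application instead of the demo. The measured time includes class loading and any startup work done inside main().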
The benchmark harness should not test the quality or robustness of the system. Operations should
be normal: startup, shutdown, and uninterrupted functionality. The harness should support the different
configurations your application operates under, and any randomized inputs should be controlled so
that the random sequence used in tests is reproducible. You should use a realistic
amount of randomized data and input. It is helpful if the benchmark harness includes support for
logging statistics and easily allows new tests to be added. The harness should be able to reproduce
and simulate all user input, including GUI input, and should test the system across all scales of
intended use, up to the maximum numbers of users, objects, throughputs, etc. You should also
validate your benchmarks, checking some of the values against actual clock time to ensure that no
systematic or random bias has crept into the benchmark harness.
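One common way to keep randomized input reproducible is to generate it from a fixed seed, so that every benchmark run sees the same sequence. The sketch below assumes java.util.Random is an acceptable source of test data; the class name ReproducibleInput, the method generate, and the seed value are illustrative only.

```java
import java.util.Arrays;
import java.util.Random;

public class ReproducibleInput {

    /** Generates count pseudo-random values in [0, bound) from a fixed seed. */
    public static int[] generate(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary value; record it alongside the results
        int[] first = generate(seed, 5, 1000);
        int[] second = generate(seed, 5, 1000);

        // The same seed yields the same sequence on every run.
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```

Logging the seed with each set of results makes it possible to rerun exactly the same input later.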
For the multiuser case, the benchmark harness must be able to simulate multiple users working,
including variations in user access and execution patterns. Without this support for variations in
activity, the multiuser tests inevitably miss many bottlenecks encountered in actual deployment and,
conversely, do encounter artificial bottlenecks that are never encountered in deployment, wasting
time and resources. It is critical in multiuser and distributed applications that the benchmark harness
correctly reproduces user-activity variations, delays, and data flows.
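The multiuser requirement can be sketched with one thread per simulated user, each inserting its own varying "think time" between operations. This is only an outline under assumptions: MultiUserSimulation, performOperation, and the parameter names are hypothetical, and a real harness would replace performOperation with calls into the application and would model access patterns taken from actual usage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiUserSimulation {

    /** Hypothetical unit of user work; a real harness would call the application here. */
    static void performOperation(int userId, int step) {
        Math.sqrt(userId * 31.0 + step); // placeholder work
    }

    /** Runs the simulated users concurrently; returns each user's elapsed time in ms. */
    public static List<Long> run(int users, int opsPerUser, int maxThinkMillis, long baseSeed)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                final int userId = u;
                Callable<Long> session = () -> {
                    // A per-user seed varies the delays between users
                    // while keeping each user's sequence reproducible.
                    Random random = new Random(baseSeed + userId);
                    long start = System.nanoTime();
                    for (int step = 0; step < opsPerUser; step++) {
                        Thread.sleep(random.nextInt(maxThinkMillis + 1));
                        performOperation(userId, step);
                    }
                    return (System.nanoTime() - start) / 1_000_000;
                };
                futures.add(pool.submit(session));
            }
            List<Long> elapsed = new ArrayList<>();
            for (Future<Long> future : futures) {
                elapsed.add(future.get()); // waits for that user's session to finish
            }
            return elapsed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> times = run(4, 5, 20, 1234L);
        for (int i = 0; i < times.size(); i++) {
            System.out.println("user " + i + ": " + times.get(i) + " ms");
        }
    }
}
```

The delay sequences are reproducible, but thread scheduling is not, so the measured times will still vary somewhat from run to run.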
1.6.4 Taking Measurements
Each run of your benchmarks needs to be under conditions that are as identical as possible;
otherwise it becomes difficult to pinpoint why something is running faster (or slower) than in
another test. The benchmarks should be run multiple times, and the full list of results retained, not
just the average and deviation or the ranged percentages. Also note the time of day that benchmarks