Skip to content

Latest commit

 

History

History
215 lines (176 loc) · 14.2 KB

README-proftool.md

File metadata and controls

215 lines (176 loc) · 14.2 KB

README

proftool is a utility for capturing and examining profiles of Java programs. It's intended to provide machine level details about the execution to aid with JIT performance analysis. It currently consists of three primary components:

  • a JVMTI agent for capturing all the assembly generated by the JVM.
  • a parser for Linux perf output
  • a parser for HotSpot LogCompilation information

By combining these components into a single command line, the perf profile information can be attributed to the JIT code. Proftool can also capture and examine profiles of standalone executables built with Native Image. Profile collection currently only supports Linux perf though once the data is captured then the profile can be viewed anywhere.

This suite must be built first to create the JVMTI agent library and the suite is normally dynamically imported to make the commands available through mx.

Recording a profile

There are currently 2 ways to capture profiles. The direct way is to use the profrecord command which accepts either a full Java command line, an executable compiled by Native Image, or a Truffle language launcher command line. profrecord launches the command line with extra arguments to collect the data.

$ mx profrecord -E fop /home/graal/oraclejdk1.8.0_291-jvmci-21.1-b03/bin/java -jar dacapo.jar fop -n 56
$ mx profrecord -E myprofile /.../graalvm/bin/js ...
$ mx profrecord -E image_profile ./my_native_image ...

The required argument -E <name> specifies the directory that will contain the output the files. It's an error if it already exists but the -O will overwrite the directory if it already exists. mx profrecord should work with any Java command line you can supply.

The second way is to use the proftool profiler with the mx benchmark command. The proftool profiler conflicts with the rss tracker which is enabled by default, so it currently must be explicitly turned off. For example, running:

$ mx benchmark dacapo:fop --tracker none -- --profiler proftool

produces a uniquely named directory like proftool_fop_2021-07-11_181530 which contains a profile of the full benchmark run. Note that the --tracker and --profiler arguments are on different sides of the -- separator. The same process applies to Native Image benchmarks, e.g.:

$ mx --env ni-ce benchmark dacapo-native-image:fop --tracker none -- --jvm=native-image --jvm-config=default-ce \
    --profiler proftool

After capturing the profile, the perf binary output needs to be converted to text using the profpackage command because the current benchmark profiler infrastructure can't do that automatically.

$ mx profpackage proftool_fop_2021-07-11_181530
Created /home/graal/ws/graal/compiler/proftool_fop_2021-07-11_181530.zip

This command additionally packs everything into a zip to make it easy to move profiles around, though that step can be skipped with the -n option. Both the zip and directory form can be used directly by all the proftool commands so use whichever form is most convenient. Packaging as a zip it intended to simplify capturing the profile on a Linux machine then moving it to another machine for analysis.

Not all benchmark suites actually support the --profiler option even though it's broadly advertised in the help output. At the current time, only the dacapo, scala-dacapo, renaissance and renaissance-legacy suites fully support the --profiler option. Their Native Image counterparts (e.g., dacapo-native-image) are also supported. The JMH benchmarks only correctly support it when the JMH option -f 0 is used to suppress forking by the harness. Note that this changes the way the benchmark is run and might not produce the same results. With JMH you can always fall back to the JMH perfasm profiler which is sufficient for most uses.

For other benchmarks, you can always use mx profrecord directly by capturing the command line used to launch the benchmark with mx -v benchmark ... and then passing that to mx profrecord. For example, for the fop benchmark above:

$ mx -v benchmark dacapo:fop --tracker none 

displays a very long command line in the middle of all the output that includes something like the follow:

/home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56

For compactness, the sample output above doesn't include all required arguments. Copy and paste that line to use it with mx profrecord:

$ mx profrecord -E fop /home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56

Examining a profile.

The primary command for examining profiles is profhot. This gives a summary of the top C functions and the top JIT methods. The JIT methods are additionally disassembled and annotated with the profile information. Truncated sample output looks like this:

$ mx profhot proftool_fop_2021-07-11_181530.zip
Hot C functions:
  Percent   Name
    5.39%   SpinPause
    2.07%   ParMarkBitMap::mark_obj
    1.98%   ParallelCompactData::add_obj
    0.95%   GreyObjectsWalker_walkGreyObjects_e59a80a81022647b793f2f995d251cf18cbcbab4
    0.79%   ControlFlowGraph_identifyBlocks_91c6bd2376323c10cda627f1ed5130f90d88a82f
    0.69%   EconomicMapImpl_compareKeys_139d60c35d64f232b5b9fac87391585f45801e09
    0.67%   ArrayList_add_43f6399c3b46d50c69e7ebaee5e44ba3b6a870f3
    0.58%   ControlFlowGraph_identifyBlock_7f27028f31036b771721050156eba6ce33899dfa
    0.55%   SchedulePhase$Instance_scheduleEarliestIterative_f11f1d84d698be305ec40ea50fa995f7185d393f
    0.54%   EconomicMapImpl_find_70589f86ac2abbc40566645a1af83af4aa5c89f0

Hot generated code:
  Percent   Compile Id  Name
    1.97%         2557  sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
    1.86%         3101  org.apache.fop.render.ps.PSRenderer.renderText(org.apache.fop.area.inline.AbstractTextArea, java.lang.String, int[])
    1.27%               Interpreter
    1.12%         3353  org.apache.fop.fo.FObj.addChildNode(org.apache.fop.fo.FONode)
    0.75%         3112  java.text.DigitList.set(boolean, double, int, boolean)
    0.63%         1357  org.apache.fop.fo.properties.PropertyMaker.get(int, org.apache.fop.fo.PropertyList, boolean, boolean)
    0.61%         3145  org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)
    0.51%         3110  java.text.DecimalFormat.subformat(java.lang.StringBuffer, java.text.Format$FieldDelegate, boolean, boolean, int, int, int, int)
    0.34%         2787  org.apache.fop.layoutmgr.inline.TextLayoutManager.addAreas(org.apache.fop.layoutmgr.PositionIterator, org.apache.fop.layoutmgr.LayoutContext)
    0.32%         2738  org.apache.fop.layoutmgr.inline.TextLayoutManager.createTextArea(org.apache.fop.traits.MinOptMax, int, org.apache.fop.layoutmgr.LayoutContext, int, int, int, boolean)

sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
0x7f7a69251e80-0x7f7a69252d38 (samples=613, period=2591391597)
Hot region 1
       0x7f7a69252082:  sub     ecx, edi        ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:252
       0x7f7a69252084:  mov     edx, ecx        ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:287
       0x7f7a69252086:  add     edx, r13d
       0x7f7a69252089:  add     edx, dword ptr [rax + rsi*4 + 0x10]     ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:307
 0.16% 0x7f7a6925208d:  cmp     edx, 0x40       ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:330
       0x7f7a69252090:  jge     0x7f7a69252a62
       0x7f7a69252096:  sub     r10d, r14d      ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:372
       0x7f7a69252099:  sub     r10d, edi       ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:259
       0x7f7a6925209c:  mov     edi, r10d       ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:294
       0x7f7a6925209f:  add     edi, dword ptr [rax + rbx*4 + 0x10]
       0x7f7a692520a3:  inc     edi
       0x7f7a692520a5:  cmp     edi, 0x40
       0x7f7a692520a8:  jge     0x7f7a69252c6f
 ...

Every piece of generated code is disassembled and broken into hot regions where ticks are clustered. This gives a general overview of the hot parts of the execution.

mx profhot can also summarize profiles of native images, but the output is slightly different. To view the annotated code, it is necessary to run the perf command given by profhot. For example:

$ mx profhot proftool_*
Hot code:
  Percent   Name
   10.40%   com.oracle.svm.core.genscavenge.GreyObjectsWalker.walkGreyObjects()
    4.06%   [unknown]
    3.26%   java.util.HashMap.computeIfAbsent(Object, Function)
    2.54%   java.util.stream.AbstractPipeline.copyInto(Sink, Spliterator)
    2.29%   java.util.stream.AbstractPipeline.evaluate(TerminalOp)
    2.12%   java.util.HashMap$EntrySpliterator.forEachRemaining(Consumer)
    2.06%   java.util.regex.Pattern$CharPropertyGreedy.match(Matcher, int, CharSequence)
    1.92%   java.util.stream.Collectors.lambda$groupingBy$53(Function, Supplier, BiConsumer, Map, Object)
    1.88%   java.util.stream.ReferencePipeline.collect(Collector)
    1.87%   java.lang.StringLatin1.toUpperCase(String, byte[], Locale)

Display annotated code by running:
  perf report -Mintel --sort symbol -i /path/to/perf_binary_file

Proftool displays only one section where symbols might come from different executables or libraries. This is because a native image may statically link non-Java code, and proftool cannot distinguish Java symbols from foreign symbols. As another consequence, non-Java method names might be formatted as if they were Java names.

An [unknown] name might come from a dynamic library or the kernel if perf cannot resolve the symbols. Run with --dso to display the name of the library. If there are no Java symbols at all, try building the executable with -g so that debug info is generated. Read the manual to learn more about debugging native images.

Proftool provides limited extra support for Truffle programs at the moment. The only extra information printed is the Truffle provided name of the generated nmethod when printing the nmethod names. This results in summary output like this for the octane splay benchmark:

Hot generated code:
  Percent   Name
    3.04%   10260: "MeasureDefault#2" OptimizedCallTarget.profiledPERoot(Object[])
    1.79%   5934: "GeneratePayloadTree#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.74%   10118: "SplayTree.remove#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.65%   Interpreter
    0.18%   7976: "SplayRun#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.16%   11695: FastDtoa.digitGen(DiyFp, DiyFp, DiyFp, DtoaBuffer, int)
    0.13%   8914: "SplayTree.splay_#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.07%   1345: OptimizedCallTarget.callBoundary(Object[])
    0.07%   jshort_disjoint_arraycopy
    0.07%   7877: "SplayTree.splay_#1" OptimizedCallTarget.profiledPERoot(Object[])

OptimizedCallTarget.profiledPERoot is the Java method that is used by the partial evaluator when compiling Truffle methods and the name in quotes should correspond to the compiled JavaScript function. More support is planned for future versions of proftool.

Other options to profhot give more control over how this is printed. It's possible to strip package names from the output using the '-s' option which can make the output more compact. The debug information associated with a pc can be quite deep which will overwhelm the actual assembly output. The '-c' option can be used to control the number of frames printed, so passing 0 will hide the frame information completely and 1 will show just deepest inline frame.

Checking basic block relative frequencies

Proftool has a functionality to check the relative frequencies of the basic blocks that the graal compiler computes during its transformations. It works by telling the compiler to dump the relative frequencies of blocks during the code emittion. To enable it, use the debug option -Dgraal.PrintBBInfoPath with the path where to dump the basic block information. During compilation the compiler will create a new file in the provided directory (if provided otherwise nothing is dumped) for each compilation and write the block information in it. To simplify its use, proftool can automatically manage this option. By using the -B or --with-basic-block-info option of the mx profrecord command, it will create a directory called block_info inside the experiment directory and tell the compiler to dump the block information into it. Similarly with mx benchmarks, one can use the proftool-with-blocks profiler to enable the dumping of block information in the experiment directory.

To check the blocks relative frequencies, use the mx checkblocks command. It requires a path to a proftool experiment directory. If the block_info directory is missing from this experiment or if an other one should be used, it can be provided using the -b or --block-info options. Otherwise it will use the block_info directory from the proftool experiment. Checkblocks firstly counts the number of perf sample and their total period for each blocks. Then for the hottest methods, it performs two kind of checks.

  • The first one computes the relative frequency of each block with respect to the first block and compares it with the relative frequency from the compiler. Note that if the first block didn't got any sample this check won't be done and a warning will be printed.
  • The second check computes and compares the relative frequency of each block with respect to the most frequent block. Note that if the hottest block from the compiler and from perf aren't the same the check won't be performed and a warning will be printed. Additionally if the 5 hottest blocks from the compiler and from perf are disjoint sets then something is probably wrong so an error is printed. These checks are not perfect and so to manually look at the data, checkblocks can print it using the -r or --raw flag.