Tags merge optimization #4959

mstyura · 2024-04-10T21:11:33Z

I've observed unexpectedly significant CPU time spent in Tags merge function in one of the production services which rely on micrometer:

I've slightly refactored internals of Tags class to take advantage of tags set is being already sorted array, which could be then combined with other sorted set in linear time.

I've also added microbenchmark for tags concat operation to see the performance gain before I can test this change in production.

Below I'm providing results from Apple M2 Pro + JDK 21 machine:

Before:

Benchmark             Mode  Cnt    Score   Error  Units
TagsBenchmark.dotAnd  avgt    2  204.243          ns/op
TagsBenchmark.of      avgt    2  104.397          ns/op

Benchmark                     Mode  Cnt     Score   Error  Units
TagsMergeBenchmark.mergeTags  avgt    2  2244.210          ns/op

Full JMH output (before)

# JMH version: 1.37
# VM version: JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS
# VM invoker: /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Duser.country=PL -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.micrometer.benchmark.core.TagsMergeBenchmark.mergeTags

# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration   1: 646.455 ±(99.9%) 5.904 ns/op
# Warmup Iteration   2: 2216.230 ±(99.9%) 20.156 ns/op
Iteration   1: 2243.049 ±(99.9%) 33.435 ns/op
Iteration   2: 2245.372 ±(99.9%) 27.381 ns/op


Result "io.micrometer.benchmark.core.TagsMergeBenchmark.mergeTags":
  2244.210 ns/op


# Run complete. Total time: 00:00:40

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark                     Mode  Cnt     Score   Error  Units
TagsMergeBenchmark.mergeTags  avgt    2  2244.210          ns/op


# JMH version: 1.37
# VM version: JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS
# VM invoker: /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Duser.country=PL -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.micrometer.benchmark.core.TagsBenchmark.dotAnd

# Run progress: 0.00% complete, ETA 00:01:20
# Fork: 1 of 1
# Warmup Iteration   1: 184.006 ±(99.9%) 2.288 ns/op
# Warmup Iteration   2: 202.977 ±(99.9%) 1.666 ns/op
Iteration   1: 202.850 ±(99.9%) 2.525 ns/op
Iteration   2: 205.637 ±(99.9%) 2.360 ns/op


Result "io.micrometer.benchmark.core.TagsBenchmark.dotAnd":
  204.243 ns/op


# JMH version: 1.37
# VM version: JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS
# VM invoker: /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Duser.country=PL -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.micrometer.benchmark.core.TagsBenchmark.of

# Run progress: 50.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration   1: 116.358 ±(99.9%) 1.819 ns/op
# Warmup Iteration   2: 107.735 ±(99.9%) 1.373 ns/op
Iteration   1: 104.108 ±(99.9%) 1.247 ns/op
Iteration   2: 104.686 ±(99.9%) 1.438 ns/op


Result "io.micrometer.benchmark.core.TagsBenchmark.of":
  104.397 ns/op


# Run complete. Total time: 00:01:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark             Mode  Cnt    Score   Error  Units
TagsBenchmark.dotAnd  avgt    2  204.243          ns/op
TagsBenchmark.of      avgt    2  104.397          ns/op

After:

Benchmark             Mode  Cnt    Score   Error  Units
TagsBenchmark.dotAnd  avgt    2  118.698          ns/op
TagsBenchmark.of      avgt    2  100.275          ns/op

Benchmark                     Mode  Cnt    Score   Error  Units
TagsMergeBenchmark.mergeTags  avgt    2  180.412          ns/op

Full JMH output (after)

# JMH version: 1.37
# VM version: JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS
# VM invoker: /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Duser.country=PL -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.micrometer.benchmark.core.TagsMergeBenchmark.mergeTags

# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration   1: 183.716 ±(99.9%) 2.371 ns/op
# Warmup Iteration   2: 188.851 ±(99.9%) 1.677 ns/op
Iteration   1: 180.315 ±(99.9%) 1.748 ns/op
Iteration   2: 180.508 ±(99.9%) 1.816 ns/op


Result "io.micrometer.benchmark.core.TagsMergeBenchmark.mergeTags":
  180.412 ns/op


# Run complete. Total time: 00:00:40

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark                     Mode  Cnt    Score   Error  Units
TagsMergeBenchmark.mergeTags  avgt    2  180.412          ns/op


# JMH version: 1.37
# VM version: JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS
# VM invoker: /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Duser.country=PL -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.micrometer.benchmark.core.TagsBenchmark.dotAnd

# Run progress: 0.00% complete, ETA 00:01:20
# Fork: 1 of 1
# Warmup Iteration   1: 124.894 ±(99.9%) 2.132 ns/op
# Warmup Iteration   2: 116.994 ±(99.9%) 1.534 ns/op
Iteration   1: 117.101 ±(99.9%) 2.191 ns/op
Iteration   2: 120.295 ±(99.9%) 1.594 ns/op


Result "io.micrometer.benchmark.core.TagsBenchmark.dotAnd":
  118.698 ns/op


# JMH version: 1.37
# VM version: JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS
# VM invoker: /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Duser.country=PL -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.micrometer.benchmark.core.TagsBenchmark.of

# Run progress: 50.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration   1: 106.194 ±(99.9%) 0.890 ns/op
# Warmup Iteration   2: 101.206 ±(99.9%) 1.222 ns/op
Iteration   1: 101.113 ±(99.9%) 1.249 ns/op
Iteration   2: 99.436 ±(99.9%) 1.208 ns/op


Result "io.micrometer.benchmark.core.TagsBenchmark.of":
  100.275 ns/op


# Run complete. Total time: 00:01:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark             Mode  Cnt    Score   Error  Units
TagsBenchmark.dotAnd  avgt    2  118.698          ns/op
TagsBenchmark.of      avgt    2  100.275          ns/op

Part of #5140.

mstyura · 2024-04-10T21:21:37Z

Hello @jonatan-ivanov! Could you please review this MR? Thanks a lot in advance!

mstyura · 2024-04-16T04:39:32Z

Hello @shakuzen, could you please review this merge request when you have a chance? Thank you in advance!

shakuzen · 2024-04-22T03:39:56Z

Thanks for the pull request. It's on our list of things to review.

mstyura · 2024-04-23T05:55:34Z

Thanks! Is there anything I can do to facilitate review, like maybe separating MR into 2 smaller independent MRs: one with benchmark and follow up with refactoring? So it may be easier to review?

mstyura · 2024-05-15T20:26:26Z

Hello there! May I kindly ask what is the status of review of this PR? May I somehow facilitate review making it happen any time soon? Thanks a lot in advance!

shakuzen

Thank you for all the work on this. It's very appreciated. We just haven't had a chance to give it the attention it deserves yet. I'll be busy traveling for a conference through the end of next week, but I should be able to properly review and test this after that. I've left some initial quick feedback.

shakuzen · 2024-05-23T07:21:18Z

micrometer-core/src/main/java/io/micrometer/core/instrument/Tags.java

+    private Tags(Tag[] sortedSet, int length) {
+        assert isSortedSet(sortedSet, length) : "bug on caller side: construction invariant violated";
+        this.sortedSet = sortedSet;
+        this.length = length;


I'm not understanding why we would take a length parameter here instead of get the length from the sortedSet array. I would guess it is mostly for the assertion effectively checking the logic of the dedup method. If that's the case, see my other comment about the assertion.

This constructor now assumes that the first length elements of sortedSet array are ordered and unique - so basically sortedSet[0..length] is a sorted set of tags. This is enforced by "construction", I mean constructor is private and other factory methods guarantee that arguments passed into sortedSet + length satisfy sorted set property.
This constructor now used not only after sort + dedup calls which guarantee that sorted set property of sortedSet[0..length] holds, but also from several other places like creating Tags from two other Tags objects when two ordered sets are merged such that merged sortedSet[0..length] must also be ordered set.
assert here is just an extra safety measure to make sure the enforcement by "construction" is additionally checked on non-production runs like tests. I'm quite confident there is no bugs in merge procedure so I can remove this assert.

shakuzen · 2024-05-23T07:23:05Z

micrometer-core/src/main/java/io/micrometer/core/instrument/Tags.java

-        Arrays.sort(this.tags);
-        dedup();
+    private Tags(Tag[] sortedSet, int length) {
+        assert isSortedSet(sortedSet, length) : "bug on caller side: construction invariant violated";


We don't have any other Java asserts in our code now. I think it'd be better to consider introducing them separate from the performance optimization proposed by this PR.

So maybe then also split this PR into one which adds only benchmark without modifying Tags and another which does optimisation inside Tags?

mstyura added 2 commits May 23, 2024 09:12

Added benchmark to measure Tags.and(Tags) operation.

0b93af8

Optimised Tags merge.

dfbdde3

mstyura force-pushed the optimize-tags-merge branch from 32ced6c to dfbdde3 Compare May 23, 2024 07:19

mstyura mentioned this pull request May 23, 2024

Speedup operations on Tags. #5140

Open

mstyura force-pushed the optimize-tags-merge branch 3 times, most recently from 18a1aed to 6b96b4d Compare May 23, 2024 13:42

Check if tags array is already sorted set before sorting.

e7a42c8

mstyura force-pushed the optimize-tags-merge branch from 6b96b4d to e7a42c8 Compare May 23, 2024 13:45

shakuzen reviewed May 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tags merge optimization #4959

Tags merge optimization #4959

mstyura commented Apr 10, 2024 •

edited

mstyura commented Apr 10, 2024

mstyura commented Apr 16, 2024

shakuzen commented Apr 22, 2024

mstyura commented Apr 23, 2024

mstyura commented May 15, 2024

shakuzen left a comment

shakuzen May 23, 2024

mstyura May 23, 2024

shakuzen May 23, 2024

mstyura May 23, 2024

Tags merge optimization #4959

Are you sure you want to change the base?

Tags merge optimization #4959

Conversation

mstyura commented Apr 10, 2024 • edited

mstyura commented Apr 10, 2024

mstyura commented Apr 16, 2024

shakuzen commented Apr 22, 2024

mstyura commented Apr 23, 2024

mstyura commented May 15, 2024

shakuzen left a comment

Choose a reason for hiding this comment

shakuzen May 23, 2024

Choose a reason for hiding this comment

mstyura May 23, 2024

Choose a reason for hiding this comment

shakuzen May 23, 2024

Choose a reason for hiding this comment

mstyura May 23, 2024

Choose a reason for hiding this comment

mstyura commented Apr 10, 2024 •

edited