Reduce iterator allocations on hot paths #10745
Conversation
I'm not sure these adjustments are necessary. It would be better if there were some performance comparisons. If there is no particularly obvious performance gap, it may be better not to modify this, because the entire code base has various usages. What do you think? @laurit
hey @johnbley, can you send a JMH test showing the difference between
and
I would like to think that any difference would get JIT'd away, but definitely curious now... you can add any new JMH tests here: https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/javaagent-tooling/src/jmh
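As a rough, runnable illustration of the two loop shapes being compared (this is not a JMH test, and the class and method names here are invented for the sketch; a real measurement should use JMH with the GC profiler to count allocations):

```java
import java.util.Arrays;
import java.util.List;

// Crude sketch of the two iteration shapes under discussion.
public class LoopShapes {

  // Enhanced-for over a List compiles to iterator()/hasNext()/next(),
  // which allocates an Iterator unless the JIT scalar-replaces it.
  static long sumViaList(List<Integer> values) {
    long sum = 0;
    for (int v : values) {
      sum += v;
    }
    return sum;
  }

  // Indexed loop over an array: no iterator object is ever created.
  static long sumViaArray(int[] values) {
    long sum = 0;
    for (int i = 0; i < values.length; i++) {
      sum += values[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumViaList(Arrays.asList(1, 2, 3, 4, 5))); // 15
    System.out.println(sumViaArray(new int[] {1, 2, 3, 4, 5}));   // 15
  }
}
```

Whether the iterator allocation survives JIT compilation is exactly the open question here; escape analysis often eliminates it, which is why a JMH comparison with `-prof gc` is the right way to settle it.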
OK, so I looked into benchmarking this and found some interesting stuff. First, a bit of background: I've been doing this for more than two decades now, and historically, for optimizing Java instrumentation, I've found that paying attention to each allocation in the instrumented path is a great way to reduce overhead. I threw together a simple utility that makes 100,000 faked-out servlet calls (which don't actually use the real servlet implementation and allocate no objects themselves). Our instrumentation generated ~10.8 million garbage objects (108 per instrumented invocation) on the main path alone.
"Bad" (no warmup loops) microbenchmark data - you can see that the
"Bad" microbenchmark code:
...ion-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanSuppressors.java
- for (SpanLinksExtractor<? super REQUEST> spanLinksExtractor : spanLinksExtractors) {
-   spanLinksExtractor.extract(spanLinksBuilder, parentContext, request);
- }
+ spanLinksExtractors.forEach(
I didn't try this out but I suspect that this is a capturing lambda that might also allocate. I think we should do some micro benchmarks to verify whether this allocates or not and whether it ends up allocating more or less than the iterator. It might be easier to replace the lists with arrays and keep using the extended for on them as that doesn't need to allocate anything for iteration. Alternatively could also use a counted loop over the array as that doesn't need to allocate the iterator, though extended for will probably look nicer.
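A minimal sketch of the array approach suggested here (the `Extractor` type and class names are invented for illustration): javac lowers an extended-for over an array to a plain index loop, so no `Iterator` is ever allocated.

```java
// Sketch: extended-for over an array needs no iterator object, so it
// keeps the readable loop syntax without any per-call allocation.
public class ArrayIteration {

  interface Extractor {
    void extract(StringBuilder out);
  }

  private final Extractor[] extractors;

  ArrayIteration(Extractor... extractors) {
    this.extractors = extractors;
  }

  // Compiled equivalently to: for (int i = 0; i < extractors.length; i++)
  String runAll() {
    StringBuilder out = new StringBuilder();
    for (Extractor e : extractors) {
      e.extract(out);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    ArrayIteration it =
        new ArrayIteration(out -> out.append("a"), out -> out.append("b"));
    System.out.println(it.runAll()); // ab
  }
}
```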
Based on my understanding of invokedynamic
and lambda compilation this doesn't look like an allocating/capturing instance, but I could be wrong. Also totally happy to switch many of these fields from List<? extends Foo>
to simple Foo[]
- just let me know which style you want.
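For reference, a small sketch of the capturing/non-capturing distinction being debated (class and method names invented here): a lambda that references no enclosing locals can be handed out as a cached instance, while one that captures a local bundles up that state at each evaluation. Instance reuse for the non-capturing case is HotSpot behavior, not a spec guarantee.

```java
import java.util.function.Supplier;

// Sketch: capturing vs non-capturing lambdas.
public class LambdaCapture {

  static Supplier<Integer> nonCapturing() {
    return () -> 42; // captures nothing
  }

  static Supplier<Integer> capturing(int x) {
    return () -> x; // captures the local x, so each evaluation carries state
  }

  public static void main(String[] args) {
    // Non-capturing: LambdaMetafactory can return the same cached instance.
    System.out.println(nonCapturing() == nonCapturing()); // typically true on HotSpot
    // Capturing: in practice a fresh instance per evaluation.
    System.out.println(capturing(1) == capturing(1)); // false in practice
    System.out.println(capturing(7).get()); // 7
  }
}
```

Note that the `forEach` lambda in the diff above references `spanLinksBuilder`, `parentContext`, and `request` from the enclosing method, which is the capturing shape; a microbenchmark would confirm what that costs after JIT.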
  BySpanKey(Set<SpanKey> spanKeys) {
-   this.spanKeys = spanKeys;
+   this.spanKeys = spanKeys.toArray(new SpanKey[spanKeys.size()]);
using new SpanKey[0]
might be a bit more performant on jdk 8 https://shipilev.net/blog/2016/arrays-wisdom-ancients/
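The two `toArray` idioms compared in that article, sketched with `String` standing in for `SpanKey` (both produce the same result; the zero-length form lets the collection allocate the correctly sized array itself):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the two toArray idioms from the review comment.
public class ToArrayIdioms {

  // Pre-sized array: the list copies into the array the caller sized.
  static String[] presized(List<String> keys) {
    return keys.toArray(new String[keys.size()]);
  }

  // Zero-length array: the list allocates the right-sized array itself;
  // per the shipilev.net article this is as fast or faster on modern JDKs.
  static String[] zeroLength(List<String> keys) {
    return keys.toArray(new String[0]);
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("server", "client");
    System.out.println(Arrays.equals(presized(keys), zeroLength(keys))); // true
  }
}
```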
Happy to change if you like, but note that this is not in the hot path and only happens per-Instrumenter at startup.
Happy to change based on some of the feedback here; please just let me know what you think.
can you add a new section after https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/contributing/style-guideline.md#javautilstreamstream-usage to document the usage recommendations?
I went with @laurit's recommendation and dropped the stream API usage entirely in favor of simple arrays.
  // operation listeners run after span start, so that they have access to the current span
  // for capturing exemplars
  long startNanos = getNanos(startTime);
- for (OperationListener operationListener : operationListeners) {
-   context = operationListener.onStart(context, attributes, startNanos);
+ for (int i = 0; i < operationListeners.length; i++) {
could leave this as for (OperationListener operationListener : operationListeners) {
though as the doEnd
needs to iterate in reverse order that one needs to use the counted loop, so using a counted loop here is also fine
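The forward/reverse symmetry being discussed can be sketched as follows (the `Listener` type and method names here are invented, not the actual `OperationListener` code): start listeners run first-to-last, end listeners last-to-first, and since extended-for cannot iterate an array in reverse, both sides use counted loops.

```java
// Sketch of symmetric forward/reverse counted loops over an array.
public class ListenerOrder {

  interface Listener {
    void onEvent(StringBuilder log);
  }

  static void runStart(Listener[] listeners, StringBuilder log) {
    for (int i = 0; i < listeners.length; i++) {
      listeners[i].onEvent(log);
    }
  }

  static void runEnd(Listener[] listeners, StringBuilder log) {
    // Reverse order requires the counted loop; keeping the forward loop
    // counted as well preserves the visual symmetry.
    for (int i = listeners.length - 1; i >= 0; i--) {
      listeners[i].onEvent(log);
    }
  }

  public static void main(String[] args) {
    Listener[] listeners = {
      log -> log.append("A"), log -> log.append("B"), log -> log.append("C")
    };
    StringBuilder log = new StringBuilder();
    runStart(listeners, log);
    runEnd(listeners, log);
    System.out.println(log); // ABCCBA
  }
}
```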
Yeah, I kept it that way for the symmetry.
Took a stab at it, let me know what you think.
Thanks!
Allocating iterators on hot paths adds unnecessary overhead. Prefer forEach or int i=0 iteration for these locations. This is a straightforward optimization that doesn't add additional code complexity.

I looked for static analysis tools to do this across the code base, and did not find anything particularly great. IntelliJ's "Loop can be collapsed with stream api" suggestions came the closest, but they sometimes produced uglier code than the original, or it was questionable whether they impacted performance at all.

This is not an exhaustive exploration of this topic, just a first and easy cut at it.