SIGSEGV caused by ElasticApmTracer.captureException #3583

Open
kelunik opened this issue Apr 4, 2024 · 4 comments
Labels
agent-java community Issues and PRs created by the community triage

Comments

@kelunik
Contributor

kelunik commented Apr 4, 2024

SIGSEGV caused by ElasticApmTracer.captureException:

V  [libjvm.so+0xa9808d]  LinkResolver::resolve_invoke(CallInfo&, Handle, constantPoolHandle const&, int, Bytecodes::Code, JavaThread*)+0x20d
V  [libjvm.so+0x82436f]  InterpreterRuntime::resolve_invoke(JavaThread*, Bytecodes::Code)+0x15f
V  [libjvm.so+0x8248a7]  InterpreterRuntime::resolve_from_cache(JavaThread*, Bytecodes::Code)+0x37
j  co.elastic.apm.agent.impl.ElasticApmTracer.captureException(JLjava/lang/Throwable;Lco/elastic/apm/agent/impl/transaction/ElasticContext;Ljava/lang/ClassLoader;)Lco/elastic/apm/agent/impl/error/ErrorCapture;+43
j  co.elastic.apm.agent.impl.ElasticApmTracer.captureAndReportException(JLjava/lang/Throwable;Lco/elastic/apm/agent/impl/transaction/ElasticContext;)Ljava/lang/String;+9
j  co.elastic.apm.agent.impl.transaction.AbstractSpan.captureExceptionAndGetErrorId(JLjava/lang/Throwable;)Ljava/lang/String;+16
j  co.elastic.apm.agent.impl.transaction.AbstractSpan.captureException(Ljava/lang/Throwable;)Lco/elastic/apm/agent/impl/transaction/AbstractSpan;+16
j  co.elastic.apm.agent.impl.transaction.AbstractSpan.captureException(Ljava/lang/Throwable;)Lco/elastic/apm/agent/tracer/AbstractSpan;+2
j  co.elastic.apm.agent.okhttp.OkHttp3ClientInstrumentation$OkHttpClient3ExecuteAdvice.onAfterExecute(Lokhttp3/Response;Ljava/lang/Throwable;[Ljava/lang/Object;)V+70
J 32507 c2 okhttp3.internal.connection.RealCall.execute()Lokhttp3/Response; (161 bytes) @ 0x00007f565abda8d8 [0x00007f565abd8c20+0x0000000000001cb8]

Steps to reproduce

Exact steps are unknown so far, but these crashes happen sporadically. They seem to be always related to OkHttpClient3ExecuteAdvice.

Expected behavior

No crash.

@github-actions github-actions bot added agent-java community Issues and PRs created by the community triage labels Apr 4, 2024
@JonasKunz
Contributor

Are you running on Java 17+?

There have been other similar crashes, like #3521. We have been in contact with Oracle and managed to reproduce the issue with them; it definitely seems to be a JVM bug: https://bugs.openjdk.org/browse/JDK-8322726

In version 1.48.1 we have added an undocumented configuration option -Delastic.apm.safe_exceptions=3 to work around this issue at the cost of losing observability: with this option set, the agent will avoid touching application exceptions and use placeholder exceptions instead. So the exception counts are still correct, just the exception details will be lost.

@kelunik
Contributor Author

kelunik commented Apr 4, 2024

Yes, running Java 17.0.10. I've seen the undocumented option, but wasn't sure what impact it really has. Can this be documented? Especially as =3 suggests there are multiple different configuration values.

Thanks for linking the upstream bug!

@JonasKunz
Contributor

We are hoping for the JVM bug to be fixed soon. When that happens we'll remove the option from the agent again, which is why we are not planning to have it officially documented / supported, but I can give a quick explanation here:

The new option safe_exceptions is a bit-flag for enabling/disabling certain workarounds:

  • “Redacted Exceptions”: We record a “surrogate” exception which we create where we would have recorded the application exception. This means the error count and at least a similar stacktrace are preserved, but the original exception type and message will not be recorded.

  • “Map-less propagation”: This is a workaround for crashes which seemed to happen with exceptions only captured in spring exception handlers. We would put those exceptions into the servlet request attributes and later extract them, which in turn sometimes caused a corrupted heap due to a bad exception pointer. Instead of putting the exception into the servlet request attributes, it is simply added to the Transaction immediately.

    So the configuration option can be used as follows:

  • -Delastic.apm.safe_exceptions=3: “Redacted Exceptions” and “Map-less propagation” are both enabled

  • -Delastic.apm.safe_exceptions=2: Only “Map-less propagation” is enabled

  • -Delastic.apm.safe_exceptions=1: Only “Redacted Exceptions” is enabled

  • -Delastic.apm.safe_exceptions=0: None of the workarounds are enabled (default)
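To make the bit-flag reading concrete, here is a minimal sketch of how the four values decompose. It is not the agent's actual implementation: the class name `SafeExceptionsFlags`, its constants, and its methods are hypothetical, and the bit assignments (1 for “Redacted Exceptions”, 2 for “Map-less propagation”) are inferred only from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper for illustration only; it is NOT part of the Elastic APM agent.
 * Bit assignments are inferred from the values listed in this thread.
 */
public final class SafeExceptionsFlags {

    // Assumed bit values: 1 = "Redacted Exceptions", 2 = "Map-less propagation".
    public static final int REDACTED_EXCEPTIONS = 1;
    public static final int MAPLESS_PROPAGATION = 2;

    private SafeExceptionsFlags() {
    }

    /** Returns true if the given bit is set in the configured value. */
    public static boolean isEnabled(int value, int flag) {
        return (value & flag) != 0;
    }

    /** Lists the workarounds that a given safe_exceptions value would switch on. */
    public static List<String> describe(int value) {
        List<String> enabled = new ArrayList<>();
        if (isEnabled(value, REDACTED_EXCEPTIONS)) {
            enabled.add("Redacted Exceptions");
        }
        if (isEnabled(value, MAPLESS_PROPAGATION)) {
            enabled.add("Map-less propagation");
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Walk through the four values discussed in the thread.
        for (int value = 0; value <= 3; value++) {
            System.out.println("safe_exceptions=" + value + " -> " + describe(value));
        }
    }
}
```

Because each workaround occupies its own bit, 3 is simply 1 | 2, which is why it enables both at once.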

@jackshirazi
Contributor

To be closed after a JVM release with https://bugs.openjdk.org/browse/JDK-8322726 fixed (we also expect a backport to 17 and 21, so wait for those for tracking purposes before closing).
