SIGSEGV in clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) #1437

Open
0x6675636b796f75676974687562 opened this issue Nov 10, 2023 · 9 comments


@0x6675636b796f75676974687562
Contributor

I'm calling org.bytedeco.llvm.global.clang.clang_visitChildren(CXCursor, CXCursorVisitor, CXClientData) from my Java code. This works when parsing small C++ codebases, but I repeatedly get a SIGSEGV with larger C++ codebases such as CMake's.

The behaviour is the same regardless of Java version (11 or 17) and of platform (Windows or Linux): the JVM crashes with SIGSEGV / EXCEPTION_ACCESS_VIOLATION.

The problematic frame is

C  [libclang.so.16+0x523ac0]  clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+0x70

This frame corresponds to the clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) function declared in CursorVisitor.h and implemented in CIndex.cpp.

If we look at the output of nm (when executed against ~/.javacpp/cache/llvm-16.0.4-1.5.9-linux-x86_64.jar/org/bytedeco/llvm/linux-x86_64/libclang.so.16), the address of both clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) and clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias] is the same:

0000000000523a50 t clang::cxcursor::CursorVisitor::Visit(CXCursor, bool)
0000000000523a50 t clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]

If we debug the core with gdb, the problematic frame will indeed be at offset 0x523ac0 (which is exactly 0x523a50 + 0x70):

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fa4a203dac0 in clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias] ()
   from /home/.../.javacpp/cache/llvm-16.0.4-1.5.9-linux-x86_64.jar/org/bytedeco/llvm/linux-x86_64/libclang.so.16
#2  0x00007fa4a203dd4d in clang::cxcursor::CursorVisitor::handleDeclForVisitation(clang::Decl const*) [clone .localalias] ()
   from /home/.../.javacpp/cache/llvm-16.0.4-1.5.9-linux-x86_64.jar/org/bytedeco/llvm/linux-x86_64/libclang.so.16
#3  0x00007fa4a203d457 in clang::cxcursor::CursorVisitor::VisitChildren(CXCursor) [clone .localalias] ()
   from /home/.../.javacpp/cache/llvm-16.0.4-1.5.9-linux-x86_64.jar/org/bytedeco/llvm/linux-x86_64/libclang.so.16
#4  0x00007fa4a203d9a9 in clang_visitChildren () from /home/.../.javacpp/cache/llvm-16.0.4-1.5.9-linux-x86_64.jar/org/bytedeco/llvm/linux-x86_64/libclang.so.16
#5  0x00007fa5417bab0a in Java_org_bytedeco_llvm_global_clang_clang_1visitChildren ()
   from /home/.../.javacpp/cache/llvm-16.0.4-1.5.9-linux-x86_64.jar/org/bytedeco/llvm/linux-x86_64/libjniclang.so
#6  0x00007fa5b1279298 in ?? ()
#7  0x000000070d5c6038 in ?? ()
#8  0x000000070d5c64f8 in ?? ()
#9  0x000000070d5c6850 in ?? ()
#10 0x00000007074057d8 in ?? ()
#11 0x00007fa588b30718 in ?? ()
#12 0x0000000000000000 in ?? ()
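The address arithmetic above can be double-checked with a few lines of Java. This is only a sketch: the three constants are copied from the nm output, the "problematic frame" line, and frame #1 of the backtrace, and the class and method names are made up for illustration. The "load base" figure assumes that the nm addresses are relative to the address at which libclang.so.16 was mapped.

```java
// Address arithmetic from the nm output and the gdb backtrace.
final class FrameOffsetCheck {
    // Symbol address of CursorVisitor::Visit as reported by nm.
    static final long SYMBOL_OFFSET = 0x523a50L;
    // Offset into the function, from the "problematic frame" line (+0x70).
    static final long INSTRUCTION_OFFSET = 0x70L;
    // Return address of frame #1 in the gdb backtrace.
    static final long RUNTIME_ADDRESS = 0x00007fa4a203dac0L;

    /** Offset of the reported frame inside libclang.so.16. */
    static long libraryOffset() {
        return SYMBOL_OFFSET + INSTRUCTION_OFFSET;
    }

    /** Assumed mapping address of libclang.so.16 in the crashed process. */
    static long loadBase() {
        return RUNTIME_ADDRESS - libraryOffset();
    }

    public static void main(String[] args) {
        System.out.printf("library offset: 0x%x%n", libraryOffset());
        System.out.printf("load base:      0x%x%n", loadBase());
    }
}
```

The computed load base is page-aligned, which is consistent with the frame in the JVM crash report and frame #1 in gdb referring to the same instruction.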

Here's the disassembly of the function body:

    0x7fa4a203da4f                                                                                  nop
    0x7fa4a203da50 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]>      push   %r12
    0x7fa4a203da52 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+2>    push   %rbp
    0x7fa4a203da53 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+3>    push   %rbx
    0x7fa4a203da54 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+4>    mov    0x20(%rsp),%eax
    0x7fa4a203da58 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+8>    lea    -0x46(%rax),%edx
    0x7fa4a203da5b <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+11>   cmp    $0x3,%edx
    0x7fa4a203da5e <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+14>   jbe    0x7fa4a203db69 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+281>
    0x7fa4a203da64 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+20>   lea    -0x1(%rax),%edx
    0x7fa4a203da67 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+23>   mov    %rdi,%rbx
    0x7fa4a203da6a <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+26>   mov    %esi,%ebp
    0x7fa4a203da6c <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+28>   cmp    $0x26,%edx
    0x7fa4a203da6f <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+31>   jbe    0x7fa4a203db20 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+208>
    0x7fa4a203da75 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+37>   sub    $0x258,%eax
    0x7fa4a203da7a <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+42>   cmp    $0x4,%eax
    0x7fa4a203da7d <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+45>   jbe    0x7fa4a203db20 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+208>
    0x7fa4a203da83 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+51>   mov    0x54(%rbx),%edx
    0x7fa4a203da86 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+54>   test   %edx,%edx
    0x7fa4a203da88 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+56>   je     0x7fa4a203da9a <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+74>
    0x7fa4a203da8a <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+58>   mov    0x58(%rbx),%eax
    0x7fa4a203da8d <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+61>   test   %eax,%eax
    0x7fa4a203da8f <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+63>   je     0x7fa4a203da9a <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+74>
    0x7fa4a203da91 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+65>   test   %bpl,%bpl
    0x7fa4a203da94 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+68>   je     0x7fa4a203db78 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+296>
    0x7fa4a203da9a <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+74>   mov    0x48(%rbx),%rdi
    0x7fa4a203da9e <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+78>   push   0x28(%rbx)
    0x7fa4a203daa1 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+81>   xor    %r12d,%r12d
    0x7fa4a203daa4 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+84>   push   0x20(%rbx)
    0x7fa4a203daa7 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+87>   push   0x18(%rbx)
    0x7fa4a203daaa <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+90>   push   0x10(%rbx)
    0x7fa4a203daad <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+93>   push   0x58(%rsp)
    0x7fa4a203dab1 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+97>   push   0x58(%rsp)
    0x7fa4a203dab5 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+101>  push   0x58(%rsp)
    0x7fa4a203dab9 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+105>  push   0x58(%rsp)
    0x7fa4a203dabd <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+109>  call   *0x38(%rbx)
  > 0x7fa4a203dac0 <clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) [clone .localalias]+112>  add    $0x40,%rsp


How can I further diagnose the issue?

@saudet
Member

saudet commented Nov 11, 2023

Please try to set the "org.bytedeco.javacpp.nopointergc" system property to "true".
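One way to apply this suggestion from code is sketched below. It assumes that JavaCPP reads the property when its classes are first initialized, so the property has to be set before any org.bytedeco class is touched; passing -Dorg.bytedeco.javacpp.nopointergc=true on the java command line should be equivalent. The class and method names are hypothetical.

```java
// Sketch: set the JavaCPP system property before any JavaCPP class is loaded.
final class NoPointerGcExample {
    static final String PROPERTY = "org.bytedeco.javacpp.nopointergc";

    /** Must run before the first use of any org.bytedeco class (assumption). */
    static void disablePointerGc() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        disablePointerGc();
        // The libclang calls (clang_visitChildren and friends) would start here.
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```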

@saudet
Member

saudet commented Nov 11, 2023

Did you set the LIBCLANG_DISABLE_CRASH_RECOVERY environment variable to 1?
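A running JVM cannot change its own environment, so the variable has to be set by whatever launches the process (shell, IDE run configuration, build tool). If the parser runs in a child process, a launcher along these lines could set it. This is a sketch only: the class name, the method name, and the command passed in are placeholders.

```java
import java.util.List;

// Sketch: start a child process with LIBCLANG_DISABLE_CRASH_RECOVERY=1.
final class ChildJvmLauncher {
    static final String VARIABLE = "LIBCLANG_DISABLE_CRASH_RECOVERY";

    /** Prepares (but does not start) the given command with the variable set. */
    static ProcessBuilder withCrashRecoveryDisabled(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put(VARIABLE, "1");
        builder.inheritIO(); // forward the child's stdout/stderr to this process
        return builder;
    }

    public static void main(String[] args) {
        // Placeholder command; a real launcher would pass the actual JVM arguments.
        ProcessBuilder builder = withCrashRecoveryDisabled(List.of("java", "-version"));
        System.out.println(VARIABLE + "=" + builder.environment().get(VARIABLE));
        // Calling builder.start() would launch the child with the variable set.
    }
}
```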

@0x6675636b796f75676974687562
Contributor Author

> Did you set the LIBCLANG_DISABLE_CRASH_RECOVERY environment variable to 1?

@saudet, thank you for the hints.

LIBCLANG_DISABLE_CRASH_RECOVERY was initially set to 1, but changing its value back to 0 makes no difference.

Setting org.bytedeco.javacpp.nopointergc doesn't change anything, either.

@0x6675636b796f75676974687562
Contributor Author

Since the same issue also occurs on Windows, I tried to analyze the .mdmp file with WinDbg.

Similarly, the access violation is reported at a ret instruction (on Linux, the disassembly is slightly different: %rsp is incremented instead, but the meaning is the same):

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(32868.337ac): Access violation - code c0000005 (first/second chance not available)
For analysis of this file, run !analyze -v
ntdll!NtGetContextThread+0x14:
00007ffc`f51cee34 c3              ret

Running !analyze -v results in

*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*** WARNING: Unable to verify checksum for jniclang.dll

KEY_VALUES_STRING: 1

    Key  : AV.Fault
    Value: Execute

    Key  : Analysis.CPU.mSec
    Value: 905

    Key  : Analysis.Elapsed.mSec
    Value: 79916

    Key  : Analysis.IO.Other.Mb
    Value: 4

    Key  : Analysis.IO.Read.Mb
    Value: 17

    Key  : Analysis.IO.Write.Mb
    Value: 34

    Key  : Analysis.Init.CPU.mSec
    Value: 311

    Key  : Analysis.Init.Elapsed.mSec
    Value: 20867

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 91

    Key  : Failure.Bucket
    Value: SOFTWARE_NX_FAULT_c0000005_libclang.dll!Unknown

    Key  : Failure.Hash
    Value: {58c64d95-3ee2-2504-9cfe-4b7ff0ac9dd8}

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 1122330

    Key  : Timeline.Process.Start.DeltaSec
    Value: 44

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Version
    Value: 10.0.19041.1

    Key  : WER.Process.Version
    Value: 17.0.3.1


FILE_IN_CAB:  hs_err_pid206952.mdmp

NTGLOBALFLAG:  0

PROCESS_BAM_CURRENT_THROTTLED: 0

PROCESS_BAM_PREVIOUS_THROTTLED: 0

APPLICATION_VERIFIER_FLAGS:  0

CONTEXT:  (.ecxr)
rax=0000000000000000 rbx=0000005f14efd7f0 rcx=0000005f14efd770
rdx=0000005f14efd750 rsi=0000000000000001 rdi=0000005f14efd940
rip=0000000000000000 rsp=0000005f14efd728 rbp=0000000000000000
 r8=000001861d9201a0  r9=0000000000000004 r10=00007ffc91b849e0
r11=00000000000000a2 r12=0000000000000000 r13=0000005f14efdb30
r14=0000005f14efd940 r15=0000000000000001
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
00000000`00000000 ??              ???
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 0000000000000000
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000008
   Parameter[1]: 0000000000000000
Attempt to execute non-executable address 0000000000000000

PROCESS_NAME:  java.exe

EXECUTE_ADDRESS: 0

FAILED_INSTRUCTION_ADDRESS: 
+0
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  0000000000000008

EXCEPTION_PARAMETER2:  0000000000000000

STACK_TEXT:  
0000005f`14efd728 00007ffc`7448985f     : 00000186`7f7f2080 0000005f`14efd780 00000186`11304c10 0000005f`14efd840 : 0x0
0000005f`14efd730 00007ffc`74494428     : 0000005f`14efd908 0000005f`14efd810 0000005f`14efd940 00000000`00000000 : libclang!clang_getBuildSessionTimestamp+0xabef
0000005f`14efd7a0 00007ffc`7448a0ef     : 0000005f`14efd8a0 00000186`255f0140 0000005f`14efd899 00000186`1d8698c0 : libclang!clang_defaultReparseOptions+0x59e8
0000005f`14efd820 00007ffc`744a4685     : 00000000`00000000 00000186`11304c10 00000001`00000000 00000000`00000000 : libclang!clang_getBuildSessionTimestamp+0xb47f
0000005f`14efd900 00007ffc`735b9cec     : 00000006`0a309038 00000186`11304eb8 00000000`00000000 00000186`15e8cfd0 : libclang!clang_visitChildren+0xe5
0000005f`14efda50 00000186`720f8fc7     : 00000006`0a309038 00000186`14a14c30 0000005f`14efdb10 00000006`0a309010 : jniclang!Java_org_bytedeco_llvm_global_clang_clang_1visitChildren+0x10c
0000005f`14efdaa0 00000006`0a309038     : 00000186`14a14c30 0000005f`14efdb10 00000006`0a309010 0000005f`14efdad8 : 0x00000186`720f8fc7
0000005f`14efdaa8 00000186`14a14c30     : 0000005f`14efdb10 00000006`0a309010 0000005f`14efdad8 00000006`072680f8 : 0x00000006`0a309038
0000005f`14efdab0 0000005f`14efdb10     : 00000006`0a309010 0000005f`14efdad8 00000006`072680f8 00000006`072680b0 : 0x00000186`14a14c30
0000005f`14efdab8 00000006`0a309010     : 0000005f`14efdad8 00000006`072680f8 00000006`072680b0 00000006`07268080 : 0x0000005f`14efdb10
0000005f`14efdac0 0000005f`14efdad8     : 00000006`072680f8 00000006`072680b0 00000006`07268080 00000000`00000000 : 0x00000006`0a309010
0000005f`14efdac8 00000006`072680f8     : 00000006`072680b0 00000006`07268080 00000000`00000000 00000006`03c057b0 : 0x0000005f`14efdad8
0000005f`14efdad0 00000006`072680b0     : 00000006`07268080 00000000`00000000 00000006`03c057b0 00000186`142a18e0 : 0x00000006`072680f8
0000005f`14efdad8 00000006`07268080     : 00000000`00000000 00000006`03c057b0 00000186`142a18e0 00000006`04899c00 : 0x00000006`072680b0
0000005f`14efdae0 00000000`00000000     : 00000006`03c057b0 00000186`142a18e0 00000006`04899c00 0000005f`14efdb18 : 0x00000006`07268080


SYMBOL_NAME:  libclang+abef

MODULE_NAME: libclang

IMAGE_NAME:  libclang.dll

STACK_COMMAND:  ~43s; .ecxr ; kb

FAILURE_BUCKET_ID:  SOFTWARE_NX_FAULT_c0000005_libclang.dll!Unknown

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  16.0.4.0

FAILURE_ID_HASH:  {58c64d95-3ee2-2504-9cfe-4b7ff0ac9dd8}

Followup:     MachineOwner
---------
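For reading the EXCEPTION_RECORD above: for an access violation (c0000005), Parameter[0] encodes the kind of access and Parameter[1] the address involved. A small decoder, with hypothetical class and method names, makes the values in this dump explicit.

```java
// Sketch: decode Parameter[0] of an access-violation exception record.
final class AccessViolationInfo {
    /** Maps the access-type code to a readable description. */
    static String describe(long accessType) {
        if (accessType == 0) {
            return "read";
        }
        if (accessType == 1) {
            return "write";
        }
        if (accessType == 8) {
            return "execute (DEP violation)";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        long parameter0 = 0x8L; // Parameter[0] from the dump
        long parameter1 = 0x0L; // Parameter[1] from the dump
        System.out.printf("%s at address 0x%x%n", describe(parameter0), parameter1);
    }
}
```

With the values from this dump, the record describes an attempt to execute address 0, which matches rip=0000000000000000 in the context record and frame #0 at 0x0000000000000000 in the Linux backtrace.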

@0x6675636b796f75676974687562
Contributor Author

@saudet, I found out that libclang crashes only when accessed from multiple JVM threads, and works reliably in a single-threaded mode.

I'll try to create a minimal reproducer. If I'm successful in doing so, I'll share it here.
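Until the root cause is known, one possible workaround is to funnel every libclang call through a single dedicated thread, so the rest of the application can stay multi-threaded. The sketch below is not taken from the project: LibclangExecutor is a hypothetical helper, and the lambda returning the thread name stands in for the real clang_visitChildren call.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: runs all native work on one dedicated thread.
final class LibclangExecutor implements AutoCloseable {
    private final ExecutorService worker =
            Executors.newSingleThreadExecutor(task -> {
                Thread thread = new Thread(task, "libclang-worker");
                thread.setDaemon(true);
                return thread;
            });

    /** Runs the given work on the worker thread and waits for its result. */
    <T> T call(Callable<T> nativeWork) {
        try {
            return worker.submit(nativeWork).get();
        } catch (Exception e) {
            throw new IllegalStateException("native call failed", e);
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }

    /** Calls in from several threads and records which thread did the work. */
    static Set<String> demo() {
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        try (LibclangExecutor executor = new LibclangExecutor()) {
            Thread[] callers = new Thread[4];
            for (int i = 0; i < callers.length; i++) {
                // In real code the lambda would wrap the libclang call.
                callers[i] = new Thread(() ->
                        workerNames.add(executor.call(() -> Thread.currentThread().getName())));
                callers[i].start();
            }
            for (Thread caller : callers) {
                caller.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return workerNames;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [libclang-worker]
    }
}
```

Any visitor callback invoked by the native code would then also run on the worker thread, so it should not block waiting on the threads that submitted the work.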

@saudet
Member

saudet commented Nov 21, 2023 via email

@0x6675636b796f75676974687562
Contributor Author

> I don't think libclang is thread safe

It should be thread-safe, actually, although I haven't found any official confirmation yet. But, citing this message:

> I'm using Clang via libclang. I've found llvm_start_multithreaded() so is it possible to turn libclang to thread-safe mode with it? Does it affect libclang for parsing, tokenizing, indexing?
>
> This is already called by libclang. I believe accessing different CXTranslationUnit's concurrently is fine, but the CXTranslationUnit itself is not thread-safe.

Yet other people have encountered the same problem as I have.

So I'll look further into this and get back to you.
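If the quoted statement holds (different translation units may be used concurrently, but a single one may not), one way to act on it is to pair each translation unit with its own lock. The sketch below is an assumption-laden illustration, not confirmed libclang guidance: GuardedUnit is a hypothetical wrapper, and an int[] counter stands in for a CXTranslationUnit.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical wrapper: serializes access to one unit, leaving other units free.
final class GuardedUnit<T> {
    private final T unit;
    private final ReentrantLock lock = new ReentrantLock();

    GuardedUnit(T unit) {
        this.unit = unit;
    }

    /** Runs the action while holding this unit's lock. */
    <R> R with(Function<T, R> action) {
        lock.lock();
        try {
            return action.apply(unit);
        } finally {
            lock.unlock();
        }
    }

    /** Four threads update one guarded counter; the lock keeps the total exact. */
    static int demo() {
        GuardedUnit<int[]> guarded = new GuardedUnit<>(new int[1]);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    guarded.with(counter -> ++counter[0]);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Integer total = guarded.with(counter -> counter[0]);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 4000
    }
}
```

Compared with a single global lock, this keeps parallelism across translation units while still preventing two threads from visiting the same unit at once.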

@firefligher

firefligher commented Apr 24, 2024

Hey!

I just found this discussion when I encountered pretty much the same issue:

  • Setup: Kotlin + JNA + libclang 14.0.6-12, openjdk 17.0.11 (both installed via APT on Debian 12.5)
  • Crashes with SIGSEGV when calling clang_visitChildren at different points of execution (usually somewhere inside a JNA function).
  • On my end, it seems like the occurrence of the segfault correlates with the number of (function) declarations inside the C file that I am visiting.
  • For now (I haven't investigated in depth), the error seems to vanish when the environment variable LIBCLANG_DISABLE_CRASH_RECOVERY is set to 1. (I have absolutely no idea what this variable does or why this works.)

Due to that behavior, my suspicion is a pointer issue in clang, but that's just a very vague gut feeling.

@saudet
Member

saudet commented Apr 24, 2024

If it works with LIBCLANG_DISABLE_CRASH_RECOVERY=1, then that's expected, yes:
https://github.com/bytedeco/javacpp-presets/tree/master/llvm#documentation
