Crash when running CoreFoundation applications (and wqthread-related problems) #4

Open
Qix- opened this issue Feb 23, 2020 · 9 comments
Labels
bug Something isn't working help wanted Extra attention is needed
@Qix-

Qix- commented Feb 23, 2020

I'm trying really hard to debug this but getting nowhere, even with lldb, vgdb and valgrind --vgdb=yes.

The new valgrind patch for macOS Mojave installs successfully, but on a few non-trivial applications Memcheck raises a SIGILL. The application in question otherwise runs fine.

Here is the stack:

==47185== valgrind: Unrecognised instruction at address 0x108afc25c.
==47185==    at 0x108AFC25C: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AFBD3A: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AFB2B1: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AF2720: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AF267C: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AE6B9A: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108DD5F3B: ??? (in /usr/lib/system/libsystem_notify.dylib)
==47185==    by 0x108DD4B77: ??? (in /usr/lib/system/libsystem_notify.dylib)
==47185==    by 0x108DD2F6E: ??? (in /usr/lib/system/libsystem_notify.dylib)
==47185==    by 0x104EF87B9: _CFPrefsExtractQuadrupleFromPathIfPossible (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==47185==    by 0x108AE163C: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AE2D4A: ??? (in /usr/lib/system/libdispatch.dylib)
==47185== Your program just tried to execute an instruction that Valgrind
==47185== did not recognise.  There are two possible reasons for this.
==47185== 1. Your program has a bug and erroneously jumped to a non-code
==47185==    location.  If you are running Memcheck and you just saw a
==47185==    warning about a bad jump, it's probably your program's fault.
==47185== 2. The instruction is legitimate but Valgrind doesn't handle it,
==47185==    i.e. it's Valgrind's fault.  If you think this is the case or
==47185==    you are not sure, please let us know and we'll try to fix it.
==47185== Either way, Valgrind will now raise a SIGILL signal which will
==47185== probably kill your program.

The only other output comes before the application even starts - I don't know if it's relevant or not:

--47185-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--47185-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--47185-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)

I'm having issues creating a minimal reproduction case, as I have no idea which part of the application is causing this to happen (since the stacktrace doesn't give me any application-specific information).

Any tips on how to debug this would be appreciated :)

@LouisBrunner LouisBrunner added the bug Something isn't working label Feb 23, 2020
@LouisBrunner
Owner

@Qix-, thanks for your report!

The UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option messages are not relevant for this problem (but we should fix them sooner or later...).

Without a program to test, it will be extremely tricky for me to debug. As _CFPrefsExtractQuadrupleFromPathIfPossible appears in the stack trace, CoreFoundation is involved, which means we might be able to reproduce it with non-CLI programs (Safari or something similar, maybe?).

This looks really similar to the ptr_munge bug on macOS 10.15 (where valgrind was not setting up the binary properly and hit a ud2 instruction added by some kind of ASSERT), so more verbose output might give us a bit more information. Could you include the full output of a valgrind run with -d -v -v -v -v -v --trace-syscalls=yes --trace-flags=11111111 --trace-children=yes?
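
That much tracing produces a lot of text, so it is probably easiest to redirect it into a file and attach the file. A minimal sketch: the flags are exactly the ones listed above, while the function name `valgrind_trace`, the `VALGRIND` override variable and the default log file name are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: run an application under valgrind with the tracing
# flags requested above and collect stdout and stderr in one log file.
# VALGRIND may point at a different valgrind binary (assumed name).
valgrind_trace() {
    app="$1"
    log="${2:-valgrind-trace.log}"
    if [ -z "$app" ]; then
        echo "usage: valgrind_trace /path/to/app [logfile]" >&2
        return 2
    fi
    # Redirect both streams so the trace and the program output stay
    # interleaved in the order they were written.
    "${VALGRIND:-valgrind}" -d -v -v -v -v -v \
        --trace-syscalls=yes \
        --trace-flags=11111111 \
        --trace-children=yes \
        "$app" > "$log" 2>&1
}
```

After sourcing the file, `valgrind_trace /path/to/my/app` would leave the output in `valgrind-trace.log`; the function returns whatever exit status valgrind returned.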

@LouisBrunner
Owner

I actually managed to reproduce the problem fairly easily, running valgrind with Hex Fiend (I am guessing any .app will work).

I have good and bad news: I already have a fix, which I recovered from an old patch (you can try it here). However, you will probably run into another problem right away: SIGSEGV on start_wqthread.

I also have a fix for that issue, but it is really experimental (see here) and needs polishing.

@Qix-
Author

Qix- commented Feb 23, 2020

Yep, definitely using CoreFoundation (it's a game engine, with a bunch of window calls). Glad it wasn't localized to my application ^^

As for testing the patches, should I rebase one onto the other in order to test both, or does the last link include them both? And should I build via the makefiles or is there a fancy brew command to do so?

@LouisBrunner
Owner

I just merged kevent_id because it was done and it's now obvious that it's needed, which means you can use the feature/wqthread_fix branch directly (keeping in mind that it will probably crash).

Unfortunately, I am not aware of any brew command that does that; you will need to build via the Makefile.
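
For anyone following along, a build from a checkout might look roughly like the sketch below. It uses the usual autotools steps (`./autogen.sh`, `./configure`, `make`, `make install`) on the feature/wqthread_fix branch named above. The `run` and `build_valgrind` helpers and the `PREFIX` and `DRY_RUN` variables are made up for this example, and installing into a private prefix is a choice made here to keep the experimental build away from any existing install.

```shell
#!/bin/sh
# Sketch, assuming the current directory is a git checkout of the valgrind
# fork that contains the feature/wqthread_fix branch.

run() {
    # Print each command; execute it unless DRY_RUN=1 (hypothetical switch).
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

build_valgrind() {
    branch="${1:-feature/wqthread_fix}"
    # A prefix under $HOME avoids needing sudo for the install step.
    prefix="${PREFIX:-$HOME/valgrind-wqthread}"
    run git checkout "$branch" &&
    run ./autogen.sh &&
    run ./configure --prefix="$prefix" &&
    run make &&
    run make install
}
```

Running `DRY_RUN=1 build_valgrind` only prints the commands; without it, the resulting binary would end up at `$HOME/valgrind-wqthread/bin/valgrind` unless `PREFIX` says otherwise.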

@LouisBrunner LouisBrunner changed the title Memcheck causing segfault in otherwise working application Crash when running CoreFoundation applications (and wqthread-related problems) Feb 23, 2020
@LouisBrunner LouisBrunner pinned this issue Feb 23, 2020
@Qix-
Author

Qix- commented Feb 23, 2020

Output from feature/wqthread_fix. Built valgrind with ./autogen.sh && ./configure && make && sudo make install && valgrind /path/to/my/app

--77136-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)

valgrind: m_syswrap/syswrap-amd64-darwin.c:512 (void wqthread_hijack(Addr, Addr, Addr, Addr, Int, Addr)): Assertion 'tst->os_state.pthread - magic_delta == self' failed.

host stacktrace:
==77136==    at 0x2580521E9: ???

sched status:
  running_tid=0

Thread 1: status = VgTs_WaitSys syscall unix:266 (lwpid 771)
==77136==    at 0x108DF895E: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x104F016A6: -[CFPrefsPlistSource handleReply:toRequestNewDataMessage:onConnection:retryCount:error:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F011DA: -[CFPrefsPlistSource handleReply:toRequestNewDataMessage:onConnection:retryCount:error:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F00CF8: -[CFPrefsSearchListSource handleReply:toRequestNewDataMessage:onConnection:retryCount:error:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x108EABAB9: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x104F00C3A: __34-[_CFXPreferences canLookUpAgents]_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F00AFE: ___CFPrefsDirectMode_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F00527: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F004FD: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x108AEB671: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AFBA42: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AFB595: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x10504622A: __CFStringEncodingICUToBytes (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F8354C: __64-[_CFXPreferences copyKeyListForIdentifier:user:host:container:]_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFF40E: -[__NSArrayM getObjects:range:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFEE41: -[CFPrefsSearchListSource alreadylocked_copyValueForKey:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFEA08: _CFStringCheckAndGetCharacters (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFE950: CFStringHashISOLatin1CString (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFE90E: CFStringHashISOLatin1CString (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE89B5: -[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE8677: -[_CFXPreferencesHandle copyPrefs] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE83C8: ___CFPrefsCopyDefaultPreferences_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE7F88: CFArrayGetCount (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x102AB29ED: -[NSUserDefaults(NSUserDefaults) init] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==77136==    by 0x102ABAB4F: +[NSUserDefaults(NSUserDefaults) standardUserDefaults] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==77136==    by 0x103760C77: +[NSApplication initialize] (in /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit)
==77136==    by 0x1066AE4B1: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x1066AE864: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x1066AE79A: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x1066AF62E: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x10669E68F: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x10669E113: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x10056E8E8: Cocoa_RegisterApp (SDL_cocoaevents.m:404)
==77136==    by 0x10057603B: Cocoa_CreateDevice (SDL_cocoavideo.m:58)
==77136==    by 0x100487B3A: SDL_VideoInit_REAL (SDL_video.c:505)
==77136==    by 0x10037D379: SDL_InitSubSystem_REAL (SDL.c:206)
==77136==    by 0x10037D712: SDL_Init_REAL (SDL.c:291)
==77136==    by 0x1003953E6: SDL_Init (SDL_dynapi_procs.h:85)
==77136==    by 0x100221249: tide::renderer::init_render_thread() (renderer.cc:79)
==77136==    by 0x1001BD0E3: main (main.cc:60)
client stack range: [0x105E97000 0x106696FFF] client SP: 0x106694E28
valgrind stack range: [0x700001AA0000 0x700001B9FFFF] top usage: 9800 of 1048576

Thread 2: status = VgTs_WaitSys syscall unix:368 (lwpid 5387)
==77136==    at 0x108DF8BFA: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x108E4D6E5: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==77136==    by 0x108E4D3FC: ??? (in /usr/lib/system/libsystem_pthread.dylib)
client stack range: ??????? client SP: 0x70000DDE6F98
valgrind stack range: [0x700006142000 0x700006241FFF] top usage: 3304 of 1048576

Thread 3: status = VgTs_WaitSys syscall mach:31 (lwpid 9987)
==77136==    at 0x108DF721A: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x108DF7767: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x108EA60D7: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA5E30: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EB59F2: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EAA1C6: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA9D93: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA9BAB: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA9B36: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x104F00707: -[_CFXPreferences withConnectionForRole:performBlock:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F0051A: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F004FD: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x108AEB671: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AFAF94: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AEB63C: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AF9508: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AF9B45: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108E4D6B2: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==77136==    by 0x108E4D3FC: ??? (in /usr/lib/system/libsystem_pthread.dylib)
client stack range: ??????? client SP: 0x70000DE69B28
valgrind stack range: [0x700006246000 0x700006345FFF] top usage: 3872 of 1048576

If you'd like, I can do a run with the high-verbosity flags you mentioned before.

@LouisBrunner
Owner

No, it's alright. I get the exact same error, so at least there is comfort in that...

I'll need to continue working on the wqthread fix as it's obviously still buggy.

@Qix-
Author

Qix- commented Feb 23, 2020

Alright, sounds good :) Let me know if I can help in any way! Thanks for all of the work you've done, it's incredibly appreciated.

@LouisBrunner
Owner

Thanks a lot for your kind words! I will try to look into it, but if you want to investigate yourself, it would be greatly appreciated.

@ernesernesto

I'm not sure if this is the right place to ask, but is there any tool besides valgrind that can be used to detect memory leaks? Because of the wqthread issues, it's impossible to check multithreaded apps with valgrind on OS X according to https://bugs.kde.org/show_bug.cgi?id=380269, which might be related. The link is already 4 years old, so I guess it's safe to assume it won't be fixed soon.
