Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

M1/M2 Compatibility #56

Open
gilaroni opened this issue Nov 20, 2022 · 51 comments
Open

M1/M2 Compatibility #56

gilaroni opened this issue Nov 20, 2022 · 51 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@gilaroni
Copy link

will there be a version for m1?

@LouisBrunner LouisBrunner changed the title m1 compatability M1/M2 Compatibility Dec 21, 2022
@LouisBrunner LouisBrunner added enhancement New feature or request help wanted Extra attention is needed labels Dec 21, 2022
@LouisBrunner
Copy link
Owner

Hi @gilaroni,

I have started on M1 support, it might take a bit but it's on the roadmap, yes.

@MalwarePup
Copy link

Hello,
any news here?

@eliesaikali
Copy link

Any news here, its always not possible to install valgrind ....

brew install --HEAD LouisBrunner/valgrind/valgrind
==> Fetching louisbrunner/valgrind/valgrind
==> Cloning https://github.com/LouisBrunner/valgrind-macos.git
Updating /Users/xxx/Library/Caches/Homebrew/valgrind--git
==> Checking out branch main
Already on 'main'
Your branch is up to date with 'origin/main'.
HEAD is now at ee485f9 docs: Update README for Homebrew error (#72)
==> Installing valgrind from louisbrunner/valgrind
==> ./autogen.sh
==> ./configure --prefix=/opt/homebrew/Cellar/valgrind/HEAD-ee485f9 --enable-only64bit --build=amd64-darwin
==> make
Last 15 lines from /Users/xxx/Library/Logs/Homebrew/valgrind/03.make:
fixup_macho_loadcmds.c:465:22: error: use of undeclared identifier 'x86_thread_state64_t'
= (x86_thread_state64_t*)(&w32s[2]);
^
fixup_macho_loadcmds.c:467:36: error: no member named '__rsp' in 'struct __darwin_arm_thread_state64'; did you mean '__sp'?
init_rsp = state64->__rsp;
^~~~~
__sp
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/include/mach/arm/_structs.h:141:13: note: '__sp' declared here
__uint64_t __sp; /* Stack pointer x31 */
^
7 errors generated.
make[2]: *** [fixup_macho_loadcmds] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

If reporting this issue please do so at (not Homebrew/brew or Homebrew/homebrew-core):
https://github.com/louisbrunner/homebrew-valgrind/issues

valgrind's formula was built from an unstable upstream --HEAD.
This build failure is expected behaviour.
Do not create issues about this on Homebrew's GitHub repositories.
Any opened issues will be immediately closed without response.
Do not ask for help from Homebrew or its maintainers on social media.
You may ask for help in Homebrew's discussions but are unlikely to receive a response.
Try to figure out the problem yourself and submit a fix as a pull request.
We will review it but may or may not accept it.

@TommyJD93
Copy link

any update for the M1/M2 compatibility?

@hacknus
Copy link

hacknus commented Mar 19, 2023

ran into the same error as @eliesaikali on my M2 Pro MacBook Pro running macOS Ventura 13.2.1 (22D68).

@whisper-bye
Copy link

+1

@carl-alphonce
Copy link

M1 Pro MacBook Pro running macOS Venture 13.3.1. Same issue as @eliesaikali and @hacknus.

@fakecore
Copy link

Any details about the roadmap of Valgrind runs in the M1 arch?

@zakariazh
Copy link

any update for the M1/M2 compatibility?

@moliqingwa

This comment was marked as spam.

@JoonasMykkanen
Copy link

Still need help?

@kalip2
Copy link

kalip2 commented Jul 8, 2023

Would really like Valgrind w/ MacOS Silicon for school purposes. We're using it at school!

brew install --HEAD LouisBrunner/valgrind/valgrind                  6s
Error: Valgrind is currently incompatible with ARM-based Macs, see https://github.com/LouisBrunner/valgrind-macos/issues/56

@kalip2
Copy link

kalip2 commented Jul 14, 2023

Pleeeassseee bump this up. We're using valgrind in our cs101 class to help with memory leaks in projects related to singly linked lists and pointers. We're also going to need valgrind in our upcoming cs102 class. I don't mind using our college's ubuntu desktops with valgrind, but the labs close at 10pm, and a lot of us do our best studying after 10pm.

@paulfloyd
Copy link
Contributor

Speaking from experience, a huge amount of work is needed to get things running smoothly.

@MartinDelille
Copy link

What are the alternative to valgrind on MacOS what would be usable in a continuous integration environment ?

@eliesaikali
Copy link

What are the alternative to valgrind on MacOS what would be usable in a continuous integration environment ?

@MartinDelille You can use leaks if you like on MacOS

@MartinDelille
Copy link

I wasn't aware of this xcode tool! Thanks a lot! 👌

@MalwarePup
Copy link

Speaking from experience, a huge amount of work is needed to get things running smoothly.

Yes probably, but some people want to contribute like @JoonasMykkanen, but he doesn't get any answer

@paulfloyd
Copy link
Contributor

I recently got a Raspberry Pi 5 and I've started work on adding support for FreeBSD arm64.

Initially that went well. It didn't take long to get things to compile and to be able to load guest binaries. However I'm now stuck early on in program startup, getting various crashes in the backend generated code. Not sure yet what is causing this - memory layout, TLS.

Anyway, if I get it to work fairly well and merge it upstream it may help a bit with Darwin arm64.

@LouisBrunner
Copy link
Owner

LouisBrunner commented Feb 23, 2024

Quick update, I already posted this on Discord, but I found a nice workaround for the compilation issue and managed to load the guest binary quite reliably.

@paulfloyd Same thing for me. I am getting crashes from the is_imm64_to_ireg_EXACTLY4 assertion and random SIGILLs from the "event check" part of the VEX-translated source block. I don't have a clue where either of those issues comes from at the moment. Definitely keep me updated if you fix anything.

@paulfloyd
Copy link
Contributor

paulfloyd commented Feb 24, 2024

@LouisBrunner OK, very interesting. I'm seeing exactly the same thing.

When the assert happens the guest code is

==11397== at 0x4016BD8: ??? (in /libexec/ld-elf.so.1)

Which is

0x15e38 + 0x1bd8 = 0x17a10

   179e8: aa1803e1      mov     x1, x24
   179ec: f9001118      str     x24, [x8, #0x20]
   179f0: 940030b9      bl      0x23cd4 <_rtld_atfork_post+0x2e98>
   179f4: f9430b57      ldr     x23, [x26, #0x610]
   179f8: 9108f2e9      add     x9, x23, #0x23c
   179fc: f9400128      ldr     x8, [x9]
   17a00: b240010a      orr     x10, x8, #0x1
   17a04: f94417e8      ldr     x8, [sp, #0x828]
   17a08: f900012a      str     x10, [x9]
   17a0c: b40000a8      cbz     x8, 0x17a20 <__tls_get_addr+0xee8>
   17a10: f9400508      ldr     x8, [x8, #0x8]

And the assert in VEX is

#0  chainXDirect_ARM64 (endness_host=VexEndnessLE, place_to_chain=0x10029911e8, 
    disp_cp_chain_me_EXPECTED=0x3817fc1c <vgPlain_disp_run_translations+76>, place_to_jump_to=0x1002991220) at priv/host_arm64_defs.c:6093

Initially I thought that I was getting some address range error and writing into and corrupting the generated code. If you're seeing the same thing that is less likely. I did try Linux arm64 with clang (but using libstdc++ not libc++ - don't think that makes a difference) and I didn't see any errors like this. I'll see if I can use ld.lld.

@LouisBrunner
Copy link
Owner

It's difficult to pin down because I have many different scenarios.

Running with vgdb

I get SIGILLs during the early stage of dyld setup (around this area https://github.com/apple-oss-distributions/dyld/blob/d1a0f6869ece370913a3f749617e457f3b4cd7c4/dyld/dyldMain.cpp#L1195). The exact guest RIP or instruction is never the same as the crash happens in the VEX-generated code. It's the evCheck which looks something like this:

->  0x700000fb9f30: ldur   w9, [x21, #0x8]
    0x700000fb9f34: subs   w9, w9, #0x1
    0x700000fb9f38: stur   w9, [x21, #0x8]
    0x700000fb9f3c: b.pl   0x700000fb9f48

IIRC the crash is always on the load.

Running with lldb

I get a crash on the is_imm64_to_ireg_EXACTLY4 assert in chainXDirect_ARM64 like you. I think this is because lldb is able to bypass the SIGILLs (sometimes it reports them, sometimes it silences them and sometimes you can bypass them, not sure how/why).

Running directly

A healthy mix of SIGILLs, asserts and sometimes mmap failure (not sure yet why this is so mercurial). What is so odd to me is that it's so consistent. As I am doing this testing, I only get SIGILLs but yesterday when I tried it was only asserts. This morning it was mmap failures.

Summary

While it's really likely we are encountering the same issue, I also have other problems which might be causing this. Or the evCheck probem and the assertion are related somehow.

@paulfloyd
Copy link
Contributor

I also get SIGILLs, for instance if I single step using vgdb or --vex-guest-max-insns=1

This isn't code that I know at all well unfortunately.

This looks too similar for it to be a coincidence. I don't see much in the way of platform dependent stuff in any of the files, so it's a bit of a mystery why there is no problem on Linux. I tried adding an "vassert(False)" in chainXDirect_ARM64 on Linux this morning and it triggered straight away, so the problem isn't that macOS and FreeBSD use that function and not Linux.

@paulfloyd
Copy link
Contributor

If I understand the code correctly it works as follows.

place_to_chain points to generated code that performs the 4 opcode load of x9 and the blr subroutine call to the address in x9.

(gdb) x /5i place_to_chain
   0x1002991468:        mov     x9, #0x1220                     // #4640
   0x100299146c:        movk    x9, #0x299, lsl #16
   0x1002991470:        movk    x9, #0x10, lsl #32
   0x1002991474:        movk    x9, #0x0, lsl #48
   0x1002991478:        br      x9

Then the assert is checking that the address above corresponds to disp_cp_chain_me_EXPECTED

I think that address is 0x1002991220 which contains

(gdb) x /i 0x1002991220
   0x1002991220:        ldur    w9, [x21, #8]

That address should match disp_cp_chain_me_EXPECTED, but that contains something completely different.

(gdb) p disp_cp_chain_me_EXPECTED
$5 = (const void *) 0x38180094 <vgPlain_disp_run_translations+76>

The function in the assert is a bit hard to follow as it doesn't extract the address and compare the two addresses, it generates opcodes from the address and compares the opcodes.

My thoughts at the moment are that this us a problem of matching chainXDirect_ARM64 and unchainXDirect_ARM64. It looks to me like "chain" is being called on instructions that have been already "chained" once but not "unchained".

@paulfloyd
Copy link
Contributor

And there may be something in this bugzilla https://bugs.kde.org/show_bug.cgi?id=412377 since there is some connection between chaining and the icache:

   VexInvalRange vir
       = LibVEX_UnChain( arch_host, endness_host, place_to_patch, 
                         place_to_jump_to_EXPECTED, disp_cp_chain_me );
   VG_(invalidate_icache)( (void*)vir.start, vir.len );

@paulfloyd
Copy link
Contributor

Hmm no Linux doesn't seem to use unchainXDirect_ARM64 but every place_to_chain address is unique. That's not the case on FreeBSD.

@LouisBrunner
Copy link
Owner

I also get SIGILLs, for instance if I single step using vgdb or --vex-guest-max-insns=1

Good shout, --vex-guest-max-insns=1 is very good at raising SIGILLs within the first few instructions.

This isn't code that I know at all well unfortunately.

Me neither, it's quite tough.

This looks too similar for it to be a coincidence. I don't see much in the way of platform dependent stuff in any of the files, so it's a bit of a mystery why there is no problem on Linux. I tried adding an "vassert(False)" in chainXDirect_ARM64 on Linux this morning and it triggered straight away, so the problem isn't that macOS and FreeBSD use that function and not Linux.

That's true but while using breakpoints in LLDB I was able to run dyld for a while before it crashed, quite far actually. It also crashed for a completely unrelated reason (EXC_BAD_ACCESS). Thus I think something very odd is afoot if slowing down execution/using breakpoints bypassed any such issue.

If I understand the code correctly it works as follows.

Thanks for the deep dive, I haven't looked at the assert much because the SIGILL is much more problematic for me right now.

And there may be something in this bugzilla https://bugs.kde.org/show_bug.cgi?id=412377 since there is some connection between chaining and the icache:

This is also where I have to do special JIT memory permission tricks due to changes in Apple Silicon (https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon#Disable-Write-Protections-Before-You-Generate-Instructions). I wonder if this plays a factor as well.

Hmm no Linux doesn't seem to use unchainXDirect_ARM64 but every place_to_chain address is unique. That's not the case on FreeBSD.

I haven't checked in details but they looked pretty unique on macOS. What do they look like on FreeBSD?

@LouisBrunner
Copy link
Owner

Quick update: thanks to @paulfloyd we have found the source of the SIGILLs/assertions (which was that cache invalidation for JIT was only enabled for Linux). This particular problem is now fixed on both macOS and FreeBSD. I also fixed some syscall issues and added even more support for new arm64 instructions.

While this is good news as Apple Silicon support is definitely moving forward given that Valgrind is now reliably running part of the guest binaries, it is not yet running them completely without crash. However we have never been this close to getting it working and I am very hopeful for the near future!

@rogerburtonpatel
Copy link

I don't have the bandwidth to contribute to this right now-- can I buy you a coffee or some takeout? You're holding all our hopes and dreams.

@paulfloyd
Copy link
Contributor

FreeBSD arm64 is now working reasonably well. No signals or threads just yet.

== 709 tests, 235 stderr failures, 47 stdout failures, 5 stderrB failures, 6 stdoutB failures, 0 post failures ==

@julienhouyet

This comment was marked as duplicate.

@LouisBrunner
Copy link
Owner

LouisBrunner commented Feb 29, 2024

Great news, I have managed to run a guest binary on M1 through Valgrind for the first time. But while I am very pleased, let me be absolutely clear: this is in no way stable and can at best be described as an experimental prototype. I just thought it was too much progress not to post it.

If you'd like to test it for yourself, you can clone this repo and build the feature/m1 branch directly. Do not expect it to work, as it probably won't (at the moment only simple Unix utilities, most likely without threads/forks work). If it crashes, do not open a new issue or post a comment here for now (I will close/delete them). Posting on Discord is fine however.

Summary of the roadmap ahead:

  • Most binaries failing: need to investigate where those SIGSEGV come from (most likely initializers related) -> seemingly an issue when objc is being used
  • Many mmaps still failing with "Operation not permitted": no clue yet, very inconsistent (maybe some kind of kernel heuristic?)
  • Clearing initializers at startup: currently extremely hardcoded, might be doable using dyld (because Valgrind has this now)
  • mprotect + most errors: need to identify malloc metadata (would love to replicate the vmmap logic)
  • Other errors: combination of dyld shenanigans and other unknown issues
  • Other errors: still a lot left, esp with libxpc, libobjC, malloc shenanigans, etc etc etc
  • VEX PAUTH support: need actual support in place, or maybe passing directly to host
  • Malloc/free tracking is broken: just like on macOS 11 x86 when it came out
  • sys_icache_invalidate/___chkstk_darwin: not GPL compatible, will need a complete rewrite
  • MDEP Syscall class: clean up as it doesn't seem to exist in arm
  • Threads: currently broken, crash with NSThreadPoisoned
  • Fork: needs reviewing, most likely broken
  • Signals (syswrap/fixup_guest_state_after_syscall_interrupted): needs reviewing, most likely broken
  • GET_STARTREGS, get_StackTrace_wrk, N_CFI_REGS, pub_tool_machine: needs reviewing
  • ALLOW_RWX_WRITE: arguments are probably wrong in place
  • Tests: they don't even build yet

@Miljoen
Copy link

Miljoen commented Apr 16, 2024

Thanks for all of the hard work on this @LouisBrunner, getting valgrind to work for Apple Silicon is going to be so great for anyone trying to do development in C/C++ on a Mac.

As it stands I switched to Ubuntu for my tasks that require memchecks, but just know that many of us cannot wait to jump back and check out any first stable version.

@paulfloyd
Copy link
Contributor

And a bit of news from upstream. This morning I pushed the code for FreeBSD arm64. Other than the occasional failure related to setting up memory for new threads (I think) it works pretty well.

Might help a bit with stuff like signal resumption where Darwin and FreeBSD have very similar code.

@rdoeffinger
Copy link

rdoeffinger commented Apr 23, 2024

Not sure if helpful, but FYI I got something semiworking on a complex app with 2 patches, one that adds a (hackish) DC_ZVA instruction implementation and another that avoids crashes trying to load debug info for a lot a system libraries.
mypatch.diff.txt
EDIT: Also needs this, as some things assume 64-byte cache lines it seems:

--- a/coregrind/m_machine.c
+++ b/coregrind/m_machine.c
@@ -1859,7 +1859,8 @@ Bool VG_(machine_get_hwcaps)( void )
      vai.hwcaps |= VEX_HWCAPS_ARM64_LRCPC;
      vai.hwcaps |= VEX_HWCAPS_ARM64_DIT;
 
-     ULong ctr_el0 = 0;
+     // 64 byte cachelines
+     ULong ctr_el0 = 0x00040004;
 #else
      r = VG_(sigprocmask)(VKI_SIG_UNBLOCK, &tmp_set, &saved_set);
      vg_assert(r == 0);

With this, memcheck seems to work perfectly for my case. --tool=massif however does not...

@paulfloyd
Copy link
Contributor

There are a few bugzilla items similar to this.

First we need a better implementation of the mrs instructions.

Are they documented for Darwin?

@LouisBrunner
Copy link
Owner

Not sure if helpful, but FYI I got something semiworking on a complex app with 2 patches

Oh, very interesting! Does your app uses threads at all? Do you interface with objc in any way?

one that adds a (hackish) DC_ZVA instruction implementation and another that avoids crashes trying to load debug info for a lot a system libraries. mypatch.diff.txt EDIT: Also needs this, as some things assume 64-byte cache lines it seems:

--- a/coregrind/m_machine.c
+++ b/coregrind/m_machine.c
@@ -1859,7 +1859,8 @@ Bool VG_(machine_get_hwcaps)( void )
      vai.hwcaps |= VEX_HWCAPS_ARM64_LRCPC;
      vai.hwcaps |= VEX_HWCAPS_ARM64_DIT;
 
-     ULong ctr_el0 = 0;
+     // 64 byte cachelines
+     ULong ctr_el0 = 0x00040004;
 #else
      r = VG_(sigprocmask)(VKI_SIG_UNBLOCK, &tmp_set, &saved_set);
      vg_assert(r == 0);

Thanks, I will look into those. If you have any extra background on the reasoning behind those fixes, that would be very nice.

Some specific questions I have:

  • Regarding the debugging fix: does your binary have no debug symbols or truncated ones?
  • Regarding the cache lines: any reference for this?

With this, memcheck seems to work perfectly for my case. --tool=massif however does not...

If you get it working, feel free to post here. I am focusing mostly on memcheck as it's the default tool and there is still so much that is broken.

Are they documented for Darwin?

Couldn't find much personally. The OSS source code have a few mentions of the Apple-specific registries and that's kind of it. I don't even think you can run mrs at all in user-space (at least Valgrind always got SIGILL'd).

First we need a better implementation of the mrs instructions.

Not sure what you mean by this. I added a few extra ones here, here and here.

@paulfloyd
Copy link
Contributor

I've only started looking at this.

My understanding is that the MSR instructions are kind of like a 'soft' version of CPUID. Unlike CPUID which returns baked-in values MRS traps to the kernel. What I don't know yet is whether it is always safe to use a dirty helper or not. It will be a problem if the kernel reports capabilities that Valgrind doesn't support.

@rdoeffinger
Copy link

First we need a better implementation of the mrs instructions.

I don't think so, the OS parts of macOS at least seem to not bother and just assume things, like the granularity at which dc zva works As far as I can tell the libc usages don't care what the MSR contains (understandable, the loop would be far more complex and costly if it did).

Oh, very interesting! Does your app uses threads at all? Do you interface with objc in any way?

No, no threads at all (by default). Just a command-line application primarily targetting Linux, so no objc either.
There is a python dependency but for other reasons we can run that in a separate process, so as long as fork works all that can be skipped (though fork does not seem to work in massif is one of the problems).

Regarding the debugging fix: does your binary have no debug symbols or truncated ones?

Sorry, I should indeed have written more: there is no issue with my binary. The problem is with in-memory system libraries where this triggered. First I tried to skip all offending binaries by name in img_from_memory (it ONLY happens with system libraries and when img_from_memory is used), but it got a bit much and went with this more general solution.

Regarding the cache lines: any reference for this?

Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check.
The current code sets this to an invalid value, hoping that generic code would not then use this instruction, but macOS OS-level code is not generic...
Purely by arch-spec we'd just need to make sure that the DCZID_EL0 return value and the size used by DC_ZVA match, but as far as I can tell that will not work on macOS, unless it all matches what the actual M processors do.
DISCLAIMER: it's all guesswork from my side.

If you get it working, feel free to post here. I am focusing mostly on memcheck as it's the default tool and there is still so much that is broken.

Yes, makes total sense. The massif issues, whatever they are, seem a bit beyond my skills. For now I'll just hope that someone else does some fixes that just happen to fix that, too!

@paulfloyd
Copy link
Contributor

First we need a better implementation of the mrs instructions.

I don't think so, the OS parts of macOS at least seem to not bother and just assume things, like the granularity at which dc zva works As far as I can tell the libc usages don't care what the MSR contains (understandable, the loop would be far more complex and costly if it did).

macOS isn't my only concern. I also want to ensure that everything works correctly on FreeBSD and Linux.

@rdoeffinger
Copy link

rdoeffinger commented Apr 24, 2024

None of my patches should change anything for FreeBSD or Linux.
At most there is a question if on non-macOS either the "dc zva" instruction should keep producing an illegal instruction error, or alternatively if DCZID_EL0 should be changed to report a value indicating support for dc zva.
But that seems at best a tiny improvement of questionable practical relevance that would come far down the priority list.

@paulfloyd
Copy link
Contributor

@rdoeffinger
Copy link

That ticket is old and lacks reproducers, but it sounds like Android might have the same issue (optimized libraries not checking if the instruction is available before using it).
If the cache line size is correctly detected by the (existing) code on Android, my patch should also work for that case.
A bit polish and/or finally updating the code for DCZID_EL0 to return the instruction as supported might make sense though.

@rdoeffinger
Copy link

rdoeffinger commented Apr 24, 2024

I put my patch also on that ticket, maybe there is some user feedback on it. Anyway feel free to re-use any part of my patches in any way you want.

@Yuri6037
Copy link

Yuri6037 commented Apr 30, 2024

Hello,

I've just tried building the M1 branch on my M1 Max and I got the exact same issue than rdoeffinger related to debug info.
This is the log I get on ANY application before applying rdoeffinger patch:

➜  valgrind-macos git:(feature/m1) ./vg-in-place ls
==18327== Memcheck, a memory error detector
==18327== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==18327== Using Valgrind-3.23.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==18327== Apple Silicon support is currently experimental, see https://github.com/LouisBrunner/valgrind-macos/issues/56 in case of issues.
==18327== Command: ls
==18327== 
==18327== Valgrind: debuginfo reader: ensure_valid failed:
==18327== Valgrind:   during call to ML_(img_strdup)
==18327== Valgrind:   request for range [845575592, +1) exceeds
==18327== Valgrind:   valid image size of 777363456 for image:
==18327== Valgrind:   "/usr/lib/system/libsystem_darwindirectory.dylib"
==18327== 
==18327== Valgrind: debuginfo reader: Possibly corrupted debuginfo file.
==18327== Valgrind: I can't recover.  Giving up.  Sorry.
==18327== 

After applying the patch and recompiling vg-in-place is now able to trace trivial applications such as ls. I have tried running a more advanced own app but it has 2 issues:

  • VG is unable to load a binary if it depends on a DYLD_LIBRARY_PATH. Environment seem completely ignored when launching vg.
  • When placing the dylib in a location the VG accepts it ends up throwing LOTs of memory errors.

This is the output of VG when attempting to trace a custom CLI tool using a Rust dylib (everything is built with debug info including the Rust dylib):

➜  valgrind-macos git:(feature/m1) ✗ DYLD_LIBRARY_PATH=../bpx-edit-core/target/debug/ ./vg-in-place ../bpx-edit-core/a.out
==19581== Memcheck, a memory error detector
==19581== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==19581== Using Valgrind-3.23.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==19581== Apple Silicon support is currently experimental, see https://github.com/LouisBrunner/valgrind-macos/issues/56 in case of issues.
==19581== Command: ../bpx-edit-core/a.out
==19581== 
--19581-- run: /usr/bin/dsymutil "/Users/xx/Projects/valgrind-macos/libBPXEditCore.dylib"
--19581-- UNKNOWN mach_msg2 unhandled MACH64_MSG_VECTOR option
--19581-- UNKNOWN mach_msg2 unhandled MACH64_MSG_VECTOR option (repeated 2 times)
==19581== Conditional jump or move depends on uninitialised value(s)
==19581==    at 0x18B0F2154: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028E192F: dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028EEE1B: dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581== 
==19581== Conditional jump or move depends on uninitialised value(s)
==19581==    at 0x18ACE9FE4: object_getClass (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x18ADC2A8B: xpc_dictionary_get_int64 (in /usr/lib/system/libxpc.dylib)
==19581==    by 0x18B0F217F: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581== 
==19581== Conditional jump or move depends on uninitialised value(s)
==19581==    at 0x18ACE9FE8: object_getClass (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x18ADC2A8B: xpc_dictionary_get_int64 (in /usr/lib/system/libxpc.dylib)
==19581==    by 0x18B0F217F: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581== 
==19581== Use of uninitialised value of size 8
==19581==    at 0x18ACE9FEC: object_getClass (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x18ADC2A8B: xpc_dictionary_get_int64 (in /usr/lib/system/libxpc.dylib)
==19581==    by 0x18B0F217F: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581== 
[SNIP]
==19581== 
==19581== Process terminating with default action of signal 5 (SIGTRAP)
==19581==    at 0x18CA3D9E0: _NSThreadPoisoned (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18C272D07: -[NSThread init] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18CA3E6CF: ____mainNSThread_block_invoke (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18AF0A3E7: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==19581==    by 0x18AF0BC67: _dispatch_once_callout (in /usr/lib/system/libdispatch.dylib)
==19581==    by 0x18C272C9F: _NSThreadGet0 (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18C2728FF: _NSInitializePlatform (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18ACE7CF3: load_images (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x1028A478F: dyld4::RuntimeState::notifyObjCInit(dyld4::Loader const*) (in /usr/lib/dyld)
==19581==    by 0x1028AD44F: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const (in /usr/lib/dyld)
==19581==    by 0x1028AD3FF: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const (in /usr/lib/dyld)
==19581==    by 0x1028AD3FF: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const (in /usr/lib/dyld)
==19581== 
==19581== HEAP SUMMARY:
==19581==     in use at exit: 33,966 bytes in 492 blocks
==19581==   total heap usage: 609 allocs, 117 frees, 42,575 bytes allocated
==19581== 
==19581== LEAK SUMMARY:
==19581==    definitely lost: 10,336 bytes in 323 blocks
==19581==    indirectly lost: 0 bytes in 0 blocks
==19581==      possibly lost: 2,536 bytes in 38 blocks
==19581==    still reachable: 21,094 bytes in 131 blocks
==19581==         suppressed: 0 bytes in 0 blocks
==19581== Rerun with --leak-check=full to see details of leaked memory
==19581== 
==19581== Use --track-origins=yes to see where uninitialised values come from
==19581== For lists of detected and suppressed errors, rerun with: -s
==19581== ERROR SUMMARY: 1285 errors from 161 contexts (suppressed: 2 from 2)
[1]    19581 trace trap  DYLD_LIBRARY_PATH=../bpx-edit-core/target/debug/ ./vg-in-place

I wonder if these errors are due to VG internal issues or if they are real and Apply really needs to fix their dyld... Are these errors expected?

EDIT: I also wonder why it's calling _NSThreadPoisoned as in my test I have not used any threads.

@LouisBrunner
Copy link
Owner

@Yuri6037 FYI, I edited your comment to remove the bulk of the errors (as they aren't particularly useful for this discussion and can still be accessed through the Github edit history otherwise) because your message made it difficult to scroll through this issue.

Regarding the debugging fix: does your binary have no debug symbols or truncated ones?

Sorry, I should indeed have written more: there is no issue with my binary. The problem is with in-memory system libraries where this triggered. First I tried to skip all offending binaries by name in img_from_memory (it ONLY happens with system libraries and when img_from_memory is used), but it got a bit much and went with this more general solution.

I've just tried building the M1 branch on my M1 Max and I got the exact same issue than rdoeffinger related to debug info. This is the log I get on ANY application before applying rdoeffinger patch:

Good to know that it isn't an isolated issue. I have yet to reproduce it, I will have to look into it a bit more.

  • VG is unable to load a binary if it depends on a DYLD_LIBRARY_PATH. Environment seem completely ignored when launching vg.

Valgrind should provide all DYLD env vars to the real dyld when running your binary. If you have a MRE, I would gladly look into that issue.

  • When placing the dylib in a location the VG accepts it ends up throwing LOTs of memory errors.

I wonder if these errors are due to VG internal issues or if they are real and Apply really needs to fix their dyld... Are these errors expected?

This is usually due to Valgrind not knowing about how the memory is laid out on macOS. In the past, they have been hidden through a copious amount of suppressions (a special Valgrind file). I haven't seen those specific one yet, so I don't know what's causing it but the general rule I found is: if you dig down to why an issue like _NSThreadPoisoned happens, you usually fix a lot of seemingly unrelated warnings (by mapping memory correctly).

EDIT: I also wonder why it's calling _NSThreadPoisoned as in my test I have not used any threads.

As stated in my latest update, this is currently expected. I haven't had time to investigate why that happens apart that objc is being loaded at some point. You don't have to use threads especially but if you use any macOS capability which rely on objc, you will have this crash.

Regarding the cache lines: any reference for this?

Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check. The current code sets this to an invalid value, hoping that generic code would not then use this instruction, but macOS OS-level code is not generic... Purely by arch-spec we'd just need to make sure that the DCZID_EL0 return value and the size used by DC_ZVA match, but as far as I can tell that will not work on macOS, unless it all matches what the actual M processors do. DISCLAIMER: it's all guesswork from my side.

I am not very knowledgeable about this so thanks for this background as it will make easier to research and fix.

@paulfloyd
Copy link
Contributor

Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check. The current code sets this to an invalid value, hoping that generic code would not then use this instruction, but macOS OS-level code is not generic... Purely by arch-spec we'd just need to make sure that the DCZID_EL0 return value and the size used by DC_ZVA match, but as far as I can tell that will not work on macOS, unless it all matches what the actual M processors do. DISCLAIMER: it's all guesswork from my side.

I am not very knowledgeable about this so thanks for this background as it will make easier to research and fix.

I just pushed a bunch of changes upstream for arm64 concerning several mrs and dc opcodes. That includes mrs dczd_el0 and dc zva.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests