Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimalloc_rust may be unsound due to calls to getenv #68

Open
xerxes12354 opened this issue Apr 28, 2021 · 2 comments
Open

mimalloc_rust may be unsound due to calls to getenv #68

xerxes12354 opened this issue Apr 28, 2021 · 2 comments

Comments

@xerxes12354
Copy link

I was looking at the time crate wondering why the time crate didn't give offsets, and I stumbled upon this thread on Internals. This seems like a big problem so I went looking for FFI crates where this could be a problem.

Looking at mimalloc source code, and the list of environment variables https://microsoft.github.io/mimalloc/environment.html, it looks like this wrapper has a data race when set_env is called in another thread.

@thomcc
Copy link
Contributor

thomcc commented May 2, 2021

There's no soundness hole, since the only safe way of using this crate is by setting the #[global_allocator]. Both env::set_var and thread::spawn will allocate memory with the global allocator, which will cause mimalloc to run its initialization if not already run, which is where the reads from the environment occur.


More details

Mimalloc reads these values from the environment either in a static constructor (e.g. __attribute__((constructor)) — which means it happens before Rust code can normally run, since it's before rust's main), or on the first allocation, which ever comes first.

Unix/posix platforms are the platforms that have a thread-unsafe setenv (it's fine on windows, although this turns out not to matter really). The way this can trigger is only in multithreaded scenarios, e.g. one thread must be doing a env::set_var, while the C code on another thread performs getenv.

There are two ways to use mimalloc provided by this repo, one of which could have a soundness hole, the other can't:

  1. As a global_allocator. This doesn't require you type unsafe, and so in principal could be considered an "unsound" API if there's a way to trigger the issue you describe.

  2. The APIs provided by the -sys crate. These are is entirely unsafe, and more-or-less have a safety contract of "I'm using the C library correctly", and so, they can't really be unsound (as "unsound" essentially means "can cause UB in a safe api").

    That said, this could be a bit concerning, or worth documenting if it's plausible to trigger by accident. Thankfully, it's not — it's very very hard.

    The only way to trigger the issue would be if you are not using mimalloc as a global_allocator, you run code prior to main to spawn a thread, detach the thread, and have it do env::set_var calls that attempt to race mimalloc's reads of the env. I say that this is "very hard" to trigger by accident because:

    • Running env::set_var is already pretty unlikely to do on a thread — it's almost always used for initialization.

    • Running Rust code prior to main is is "impossible" with the features provided by the language and standard library. Crates like ctor or startup can enable it on some targets by using features of the executable format and/or OS.

    • Even then, spawning a thread prior to main is wild — essentially pathological behavior (and already unsound with some of the ctor crate's APIs).

      • Doing it and immediately calling env::set_var is basically ato the point of "can only happen if deliberately trying to trigger the bug"
      • Even if you are trying to trigger that bug deliberately, remember that you still have to be using unsafe functions from the libmimalloc-sys crate.
    • More broadly, calling into libstd prior to main is considered unsupported by Rust's stdlib currently. The stack check probably won't work (which is likely a small soundness hole, arguably), and also many APIs will fail or panic prior to main (potentially including std::thread::spawn, although against all odds, that one seems to work at least on my machine).

      • Pretty quickly, if you use these crates, you realize the startup/ctor functions pretty much shouldn't touch anything in std:: so it's unlikely to do something as crazy as spawning a thread prior to main which sets some env vars.
    • Additionally, the user-provided ctor running prior to mimalloc's seems a bit unlikely as well — neither of those crates provide ways to set a priority on the ctor function.

I could probably go on further but yeah, there's no soundness hole here, the only way you'll trigger this is by both:

  1. Using the unsafe API (and only the unsafe API), thus you're taking responsibility for guaranteeing that your calls are sound.
  2. Through an extremely unlikely set of circumstances, that are entirely under your control (... don't spawn a thread before main that does env::set_var).

A bit of a rant about this particular problem

We're somewhat lucky here that the issue can't trigger soundly for mimalloc, as almost any potential fix would be too costly for performance — allocation is insanely hot, and taking an env lock around the allocator would absolutely destroy performance in multi-threaded use cases, which are a big selling point for mimalloc, and something its algorithms are tuned for (I guess we could ask upstream for a way to disble env options, but that's unfortunate too).

The thing is, std::env::set_var shouldn't be treating setenv as safe. C code has treated it as thread-unsafe for a long time, and it's only "safe" in Rust if you exclusively use libstd's apis for it. This is a bad pattern — in general, any API to OS functionality (such as anything in libc) that is only safe if you exclusively a particular wrapper API is pretty much broken. I think we know that now, but didn't when the std::env APIs were designed.

This mismatch is hard to fix, but IMO libstd needs to do something about it (the current "solution" is not really viable for very many applications — anything where Rust isn't calling C, but is called by C, anything where no_std is required, let alone wrapping reasonable C apis that read from the env ...). Even libstd suffers from a variant of the bug — rust-lang/rust#27970, and thankfully, most of the solutions to the bug discussed (at least recently) seem to be in favor of fixing it in such a way that would fix non-libstd usage too.

To add insult to injury, even in cases where the issue does exist (like with the time crate), it's extremely unlikely to trigger IRL — it pretty much requires brute-force trying to cause the issue. Moreover, there's really no workaround — you'll note that the time crate had to remove the functionality...

So yah, 🤷 I don't really know if it's great to go around filing issues about this for what are essentially stdlib soundness holes, especially if they will almost never be hit in real code, especially if they have no workaround for most cases... (IMO the time crate having to remove the functionality (and getting a CVE), is pretty ridiculous/unfortunate), but whatever, thankfully it doesn't matter for mimalloc.

Sorry for the rant, but I find this specific issue deeply annoying.

@enkore
Copy link

enkore commented Feb 13, 2024

mimalloc has an undocumented MI_NO_GETENV build flag which may be desirable to expose through a crate feature. Not because this is unsound, but because one might want to build binaries which aren't influenced through a bunch of environment variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants