Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does the wasm memory model have tear-free non-atomic loads? #199

Open
sunfishcode opened this issue May 17, 2023 · 7 comments
Open

Does the wasm memory model have tear-free non-atomic loads? #199

sunfishcode opened this issue May 17, 2023 · 7 comments

Comments

@sunfishcode
Copy link
Member

We recently learned that Musl (used in both wasi-libc and Emscripten) is using non-atomic 32-bit aligned loads from memory locations which can be stored to by other threads without synchronization, apparently on the assumption that compilers will always codegen these to a single instruction, effectively intending something like a very relaxed ordering. On Wasm, LLVM compiles these to plain i32.load wasm instructions.

Are there any guarantees in the proposed Wasm memory model that a plain aligned i32.load of memory that can be simultaneously stored to by aligned i32 stores in other threads won't tear?

@conrad-watt
Copy link
Collaborator

conrad-watt commented May 17, 2023

Are there any guarantees in the proposed Wasm memory model that a plain aligned i32.load of memory that can be simultaneously stored to by aligned i32 stores in other threads won't tear?

Yes, naturally aligned i32 loads and stores won't tear. I sketched out this table a while ago, which should still be good.

@dschuff
Copy link
Member

dschuff commented May 17, 2023

Relatedly, are non-atomic stores guaranteed to be observed by "later" non-atomic loads on another thread? (or by atomic loads?)

@conrad-watt
Copy link
Collaborator

conrad-watt commented May 17, 2023

It depends what you mean by "later". There is a concept of happens-before in the specification. If a non-atomic store (transitively) happens-before a non-atomic load it is guaranteed to be observed.

All operations taking place sequentially in the same thread are happens-before each other (so the single thread semantics works as you expect). In addition, synchronizing atomic_sc store-load pairs create a cross-thread happens-before relation between the two accesses, as do a few other atomic operations such as wait/notify.

So if you have a non-atomic store to x in one thread, followed by an atomic store to y in the same thread, and in a second thread do an atomic load of y which observes the value stored in the first thread, all subsequent loads of x in the second thread are guaranteed to observe the first-thread's non-atomic store to x (intuition - you know the load in the second thread is happening "later").

Thread 1                                                     Thread 2
(a) store x
(b) store_sc y  ---- reads from, creating happens-before---> (c) load_sc y
                                                             (d) load x      // observes the first thread's store

(a) happens-before (b) happens-before (c) happens-before (d) in this example

However, if all the atomic_sc accesses in the above example are changed to non-atomics, this is not guaranteed (in the model or in real hardware).

@tlively
Copy link
Member

tlively commented May 17, 2023

IIUC, if there are no atomics forming that happens-before relationship across threads A and B, then a non-atomic store on thread A might in principle never be observed on thread B, even if it's sitting in a loop doing non-atomic loads of the same address.

@conrad-watt
Copy link
Collaborator

conrad-watt commented May 17, 2023

Yes, that's right.

EDIT: I originally wrote the below, but in light of the point about hoisting at the bottom, I don't think it's advisable to use this code 🙃


It can still be useful to (very carefully) do a loop of non-atomic loads in certain circumstances, so long as the read value is then "confirmed" using an atomic load or CAS. For example, the following isn't an unreasonable spin-lock implementation (assuming Wasm semantics for non-atomics and CAS - in C11 you need a relaxed atomic load instead of a non-atomic load to avoid UB).

while (true) {
  lock_status = non_atomic_load
  if (lock_status = held by someone else) {
    continue
  } else {
    try to acquire the lock with a CAS
    break if successful
  }
}

Technically the memory model says that it's never guaranteed that the non_atomic_load will ever observe another thread unlocking the lock, but if it does observe the unlocked state, the subsequent CAS makes everything synchronise, so the only gotcha is that the memory model allows the possibility of an infinite loop.

In practice the infinite loop never happens with any reasonable compilation scheme, and if you know the loop has terminated, then you're ok even from the theoretical memory model point of view, since the CAS synchronised everything up.

Actually, thinking things through, it might be permissible for a Wasm compiler to hoist the non_atomic_load out of the loop, and then you'd be screwed :P

@dschuff
Copy link
Member

dschuff commented May 17, 2023

IIUC, if there are no atomics forming that happens-before relationship across threads A and B, then a non-atomic store on thread A might in principle never be observed on thread B, even if it's sitting in a loop doing non-atomic loads of the same address.

Yeah, this is basically what I was getting at. Non-atomics never make happens-before relationships with anything on other threads (other than indirectly via a program order relationship with an atomic on the same thread).

All that is to say, maybe we should add some cheaper atomics (acq/rel and maybe relaxed?) to wasm soon :)

@conrad-watt
Copy link
Collaborator

conrad-watt commented May 17, 2023

All that is to say, maybe we should add some cheaper atomics (acq/rel and maybe relaxed?) to wasm soon :)

My personal vision is acq/rel, and then a "load-store-ordered" atomic which is marginally stronger than relaxed but avoids a lot of the mess associated with it (probably implemented using the "bogus conditional branch" strategy benchmarked in sections 4.22 and 5.2 of this paper).

I have to say though, currently I'm pushing hard to get current threads phase 4 landed, so I'd rather not spend too much time thinking about extensions before this is finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants