thread-local state not available to user programs - how to introduce support into stdlib/compiler? #11770
Comments
There's some confusion here. It would help if you'd refer to the actual code on ocaml/ocaml and not to old commits on ocaml-multicore.
Now what was your question, exactly?
Thanks for the clarification, my initial assumption was that it is thread-unsafe too, until I got corrected on the forums. What do you think of the following change to str.mli to document this and avoid the confusion?
The annotations are added on individual functions rather than on the entire module because, as long as those functions are avoided, using Str is both domain- and thread-safe. It is only the use of those functions that triggers the use of domain-local (and not thread-safe) state. Perhaps a separate alert should be used for thread safety vs. domain safety, though; what do you think?
The OCaml equivalent PR is here
Looks like there are 2 separate issues here:
Domain.DLS is the provided API for domain-local (~ per-core) storage for OCaml programs, and it is reasonably efficient.
The fact that only one domain can access its domain-local state guarantees that it can perform unsynchronized accesses without racing against other domains; this is important for performance. If you want per-domain state that all domains can access (say, per-domain mailboxes for message passing), a hashtable on the domain id sounds like a reasonable choice, but you have to be very careful about synchronization. Currently it is easier to implement something slightly more efficient in C, by using an array indexed on the domain index.
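As a minimal sketch of the unsynchronized per-domain access described above, the following uses only `Domain.DLS.new_key` and `Domain.DLS.get` from the OCaml 5 standard library. The names `counter_key`, `bump` and `read` are illustrative and do not come from the discussion.

```ocaml
(* Each domain gets its own counter, created lazily on first access. *)
let counter_key : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

(* Unsynchronized increment: only the current domain can reach this ref
   (assuming a single thread per domain). *)
let bump () = incr (Domain.DLS.get counter_key)

let read () = !(Domain.DLS.get counter_key)

let () =
  bump ();
  bump ();
  (* A new domain starts from its own initial value, not the parent's. *)
  let child = Domain.spawn (fun () -> bump (); read ()) in
  let in_child = Domain.join child in
  Printf.printf "main=%d child=%d\n" (read ()) in_child
  (* prints "main=2 child=1" *)
```

Note that the sketch is only race-free when each domain runs a single thread; with several threads on one domain they would all share the same `ref`.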
Indeed, Thread programming is pretty much unchanged from OCaml 4. It is not clear to me why you would naturally have more need for efficient thread-local state in OCaml 5 than in OCaml 4. Because it is still the case that only one thread (per domain) runs OCaml code at a time, this can reasonably be implemented as a (domain-local) hashtable over thread ids.
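One possible reading of "a domain-local hashtable over thread ids" is sketched below. This is an assumption-laden illustration, not an API from the standard library: the types `per_domain` and `tls` and the functions `create` and `get` are hypothetical names. The sketch adds a per-domain mutex because a thread switch can occur in the middle of a `Hashtbl` update, and it must be linked with the threads library.

```ocaml
(* Build with the threads library, for example:
   ocamlfind ocamlopt -package threads.posix -linkpkg tls_sketch.ml *)

(* Per-domain table from Thread.id to that thread's value. The mutex
   guards against a thread switch in the middle of a table update. *)
type 'a per_domain = { lock : Mutex.t; table : (int, 'a) Hashtbl.t }

(* Hypothetical thread-local handle: an initializer plus a DLS key. *)
type 'a tls = { init : unit -> 'a; slot : 'a per_domain Domain.DLS.key }

let create init =
  { init;
    slot =
      Domain.DLS.new_key (fun () ->
        { lock = Mutex.create (); table = Hashtbl.create 16 }) }

(* Return the calling thread's value, creating it on first use. *)
let get tls =
  let { lock; table } = Domain.DLS.get tls.slot in
  let id = Thread.id (Thread.self ()) in
  Mutex.lock lock;
  Fun.protect
    ~finally:(fun () -> Mutex.unlock lock)
    (fun () ->
      match Hashtbl.find_opt table id with
      | Some v -> v
      | None ->
        let v = tls.init () in
        Hashtbl.add table id v;
        v)

let counter = create (fun () -> ref 0)

let () =
  incr (get counter);
  let worker_total = ref 0 in
  let t =
    Thread.create
      (fun () ->
        incr (get counter);
        incr (get counter);
        worker_total := !(get counter))
      ()
  in
  Thread.join t;
  Printf.printf "main=%d worker=%d\n" !(get counter) !worker_total
  (* prints "main=1 worker=2" *)
```

The sketch never removes entries, so values of finished threads stay in the table; a real implementation would need some cleanup strategy. It also pays for a lock and a hash lookup on every access, which is the overhead the issue is concerned about.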
Concerning the documentation, we could add a word of caution in the documentation for |
This is true. At the same time, it might be possible to extend the OCaml 5 domain-local state mechanism to implement thread-local state, with the same API. I don't know if that's something worth looking into. Also, I'd like us to keep in mind that thread-local state, like domain-local state, is a quick fix; avoiding global state entirely is a better way to go about parallelism and concurrency. Going back to the Str library, it could easily be given a stateless API; it's just that nobody cared to do so because everybody and their dog was clamoring that Str sucks and PCRE / RE / whatever is so much better.
Except that Re.Str doesn't provide a stateless API either, it just copied Str's API and replaced the regex engine (Re.Posix/etc., sure I can see the benefits there for thread safety). Too late to make changes for 5.0 here (except maybe for adding doc comments/alerts), but I would suggest:
What do you think? Currently there isn't a straightforward way for an application that uses Str regexes and groups to be thread-safe, other than converting to an alternative regex library. Because of the different syntax, that conversion is very error-prone: even though several people have reviewed such a change, some buggy regexes inevitably slipped through due to subtleties around different escaping, special characters, etc.
@polytypic proposed a nice solution for fast TLS here: https://discuss.ocaml.org/t/a-hack-to-implement-efficient-tls-thread-local-storage/13264 and suggested the discussion continue here. To reply to @gasche from earlier:
Now threads can live on distinct domains (and, in my opinion, are still the right primitive for concurrency; domains are not the right interface for users and should be a hidden implementation detail). This means that TLS is a more general and a safer interface than DLS (which is not enough if you have threads on at least 2 domains). The fact that |
#11193 added concurrency safety annotations to the stdlib, which is a great start, however looking at https://github.com/ocaml/ocaml/blob/trunk/otherlibs/str/str.mli I don't see any such annotations.
That module was already unsafe to use in multi-threaded programs if you relied on retrieving matched groups (which were stored in a global), but this PR changed it to use per-pthread state.
Which is good; I assume this means that Str is now actually both multi-thread and multi-domain safe (contrary to Re.Str, which has all the problems of the old Str module, as discussed on the forum)?
However, that solution only works for C stubs (using pthread thread-local state). OCaml 5 offers per-domain state (Domain.DLS), which would suffice for the case of one thread per domain, but wouldn't be safe for the case of N threads per domain (which I assume is possible in multicore).
Since multicore programs are likely to also be multi-threaded, not just multi-domain, it'd be good to have a built-in solution in the standard library that would make global OCaml state both multi-thread and multi-domain safe. Although global state should in general be discouraged, it might be a necessary intermediate step in updating a program to be safe, or an optimization for a scalable, low-overhead multi-thread/multi-domain data structure.
A careful use of the Atomic module in the stdlib could result in such safety, however perhaps at a greater cost than desired (it'd store the state in a memory page shared between all cores, and updating that might have a higher cost than using purely thread-local state).
Here is one such (non-OCaml) example where atomic counters cause performance issues with lots of cores: https://pkolaczk.github.io/server-slower-than-a-laptop/
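For comparison, a shared counter built on the stdlib `Atomic` module might look like the sketch below (the names `hits` and `worker` are illustrative). It is correct under concurrent updates, but every increment targets the same memory location from every domain, which is the contention pattern the linked article describes.

```ocaml
(* A single counter shared by all domains. *)
let hits = Atomic.make 0

(* Every increment is an atomic read-modify-write on the same location. *)
let worker n () =
  for _ = 1 to n do
    Atomic.incr hits
  done

let () =
  let domains = List.init 4 (fun _ -> Domain.spawn (worker 10_000)) in
  List.iter Domain.join domains;
  Printf.printf "total=%d\n" (Atomic.get hits)
  (* prints "total=40000" *)
```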
Having low-overhead per-thread state storage accessible from OCaml programs might also enable other optimizations (e.g. storing and incrementing per-thread counters, histograms, etc. in fast-paths and summing them up only at query time allowing lower overhead logging/profiling/etc.; all those per-thread counters would be stored in separate memory pages, avoiding cross-core synchronization)
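The "increment locally, sum at query time" idea can be sketched with what OCaml 5 provides today, under one important assumption: the shards here are per-domain (via `Domain.DLS`), not per-thread, since the stdlib has no thread-local storage. The names `registry`, `shard_key`, `incr_local` and `total` are hypothetical.

```ocaml
(* Registry of every per-domain counter, guarded by a mutex. *)
let registry : int Atomic.t list ref = ref []
let registry_lock = Mutex.create ()

(* Each domain lazily creates its own shard and registers it once. *)
let shard_key : int Atomic.t Domain.DLS.key =
  Domain.DLS.new_key (fun () ->
    let c = Atomic.make 0 in
    Mutex.lock registry_lock;
    registry := c :: !registry;
    Mutex.unlock registry_lock;
    c)

(* Fast path: touches only the calling domain's shard. *)
let incr_local () = Atomic.incr (Domain.DLS.get shard_key)

(* Slow path: sum every registered shard at query time. *)
let total () =
  Mutex.lock registry_lock;
  let sum = List.fold_left (fun acc c -> acc + Atomic.get c) 0 !registry in
  Mutex.unlock registry_lock;
  sum

let () =
  let work () = for _ = 1 to 1_000 do incr_local () done in
  let domains = List.init 3 (fun _ -> Domain.spawn work) in
  work ();
  List.iter Domain.join domains;
  Printf.printf "total=%d\n" (total ())
  (* prints "total=4000" *)
```

The shards are atomics so that `total` can read them from another domain without a data race; the increments themselves are uncontended across domains. Nothing in this sketch guarantees that shards land on separate cache lines or memory pages, so it only approximates the layout described above.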
[Thread.id] might allow implementing some of this, although at the cost of having an additional data structure, and additional C library calls (see here for an OCaml example: https://godbolt.org/z/cbh88Yc37), whereas pthread thread-local state might be accessible from a CPU register already, e.g. if you look at how GCC implements thread_local, as demonstrated by this small example.
What would be the best way to introduce such thread-local storage support into OCaml? (probably too late for 5.0 at this point).
Although it might be possible to prototype some of this in an external library, some assistance from the compiler might be needed to get truly low overhead.