Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread-local state not available to user programs - how to introduce support into stdlib/compiler? #11770

Open
edwintorok opened this issue Nov 28, 2022 · 7 comments

Comments

@edwintorok
Copy link
Contributor

edwintorok commented Nov 28, 2022

#11193 added concurrency safety annotations to the stdlib, which is a great start, however looking at https://github.com/ocaml/ocaml/blob/trunk/otherlibs/str/str.mli I don't see any such annotations.
That module was already unsafe to use with multi-threaded programs if you relied on retrieving matched groups (which were stored in a global), but this PR changed it used per-pthread state.
Which is good, I assume this mean that Str is now actually both multi-thread and multi-domain safe? (contrary to Re.Str which has all the problems of the old Str module as discussed on the forum

However that solution only works for C stubs (using pthread thread local state). OCaml 5 offers per-domain state (Domain.DLS), which although would suffice for the 1 thread per 1 domain, it wouldn't be safe for the N threads per 1 domain case (which I assume is possible in multicore).
Since multicore programs are likely to also be multi-threaded, not just multi-domain it'd be good to have a built-in solution in the standard library that would make global OCaml state both multi-thread and multi-domain safe (although global state should in general be discouraged it might be a necessary intermediate step in updating a program to be safe, or as an optimization for a scalable, low-overhead multi-thread/multi-domain data structure).

A careful use of the Atomic module in the stdlib could result in such safety, however perhaps at a greater cost than desired (it'd store the state in a memory page shared between all cores, and updating that might have a higher cost than using purely thread-local state).
Here is one such (non-OCaml) example where atomic counters cause performance issues with lots of cores: https://pkolaczk.github.io/server-slower-than-a-laptop/

Having low-overhead per-thread state storage accessible from OCaml programs might also enable other optimizations (e.g. storing and incrementing per-thread counters, histograms, etc. in fast-paths and summing them up only at query time allowing lower overhead logging/profiling/etc.; all those per-thread counters would be stored in separate memory pages, avoiding cross-core synchronization)

[Thread.id] might allow implementing some of this, although at the cost of having an additional data structure, and additional C library calls (see here for an OCaml example: https://godbolt.org/z/cbh88Yc37) whereas pthread thread local state might be accessible from a CPU register already, e.g. if you look at how GCC implements thread_local as demonstrated by this small example

What would be the best way to introduce such thread local storage support into OCaml? (probably too late for 5.0 at this point).

Although it might be possible to prototype some of this in an external library, some assistance from the compiler might be needed to get truly low overhead.

@xavierleroy
Copy link
Contributor

That module was already unsafe to use with multi-threaded programs if you relied on retrieving matched groups (which were stored in a global), but this PR changed it used per-pthread state. Which is good, I assume this mean that Str is now actually both multi-thread and multi-domain safe?

There's some confusion here. It would help if you'd refer to the actual code on ocaml/ocaml and not to old commits on ocaml-multicore.

  • The part of Str that's written in C currently has no global state whatsoever, so it's domain- and thread-safe. In OCaml 4 there was a bit of gratuitous global state, which was briefly turned into pthread TLS in the ocaml-multicore development, before sanity prevailed and all global state was removed.
  • The OCaml API for Str is stateful (the "matched groups" part). In OCaml 5 domain-local state is used, so that the library is domain-safe. It remains thread-unsafe like in OCaml 4.

Now what was your question, exactly?

@edwintorok
Copy link
Contributor Author

That module was already unsafe to use with multi-threaded programs if you relied on retrieving matched groups (which were stored in a global), but this PR changed it used per-pthread state. Which is good, I assume this mean that Str is now actually both multi-thread and multi-domain safe?

There's some confusion here.

  • The part of Str that's written in C currently has no global state whatsoever, so it's domain- and thread-safe. In OCaml 4 there was a bit of gratuitous global state, which was briefly turned into pthread TLS in the ocaml-multicore development, before sanity prevailed and all global state was removed.
  • The OCaml API for Str is stateful (the "matched groups" part). In OCaml 5 domain-local state is used, so that the library is domain-safe. It remains thread-unsafe like in OCaml 4.

Thanks for the clarification, my initial assumption was that it is thread-unsafe too, until I got corrected on the forums.
I should've looked closer at the implementation (I made the assumption that the OCaml code somehow uses the global state from the -- now thread-safe -- C stub)

What do you think of the following change to str.mli to document this and avoid the confusion?

diff --git a/otherlibs/str/str.mli b/otherlibs/str/str.mli
index 8fdb1ea27..909075c1d 100644
--- a/otherlibs/str/str.mli
+++ b/otherlibs/str/str.mli
@@ -158,6 +158,8 @@ val match_end : unit -> int
    details). *)

 val matched_group : int -> string -> string
+[@@alert unsynchronized_access
+    "Str matched group is domain local, but not thread safe state."]
 (** [matched_group n s] returns the substring of [s] that was matched
    by the [n]th group [\(...\)] of the regular expression that was
    matched by the last call to a matching or searching function (see
@@ -173,6 +175,8 @@ val matched_group : int -> string -> string
    because the first group itself was not matched. *)

 val group_beginning : int -> int
+[@@alert unsynchronized_access
+    "Str matched group is domain local, but not thread safe state."]
 (** [group_beginning n] returns the position of the first character
    of the substring that was matched by the [n]th group of
    the regular expression that was matched by the last call to a
@@ -183,6 +187,8 @@ val group_beginning : int -> int
    the regular expression. *)

 val group_end : int -> int
+[@@alert unsynchronized_access
+    "Str matched group is domain local, but not thread safe state."]
 (** [group_end n] returns
    the position of the character following the last character of
    substring that was matched by the [n]th group of the regular
@@ -221,6 +227,8 @@ val substitute_first : regexp -> (string -> string) -> string -> string
    matching the regular expression is replaced. *)

 val replace_matched : string -> string -> string
+[@@alert unsynchronized_access
+    "Str matched group is domain local, but not thread safe state."]
 (** [replace_matched repl s] returns the replacement text [repl]
    in which [\1], [\2], etc. have been replaced by the text
    matched by the corresponding groups in the regular expression

Annotations added on individual functions rather than on the entire module, because as long as those functions are avoided using Str is both domain- and thread-safe. It is just the use of those functions that trigger the use of domain-local (and not thread-safe) state.

Perhaps a separate alert should be used for thread safety vs domain safety though, what do you think?

It would help if you'd refer to the actual code on ocaml/ocaml and not to old commits on ocaml-multicore.

The OCaml equivalent PR is here

Now what was your question, exactly?

Looks like there are 2 separate issues here:

  • document the thread-unsafety of Str

  • efficient access to per-CPU state. I'll open a separate issue about this, because it is entirely unrelated to how Str works (there is nothing in Str that could be reused for this purpose, I was wrong)
    (From a quick look: the assembly generated for Domain.DLS.get/set is probably close enough to what I had in mind: it requires 2 memory accesses instead of 1, but there are no C calls needed; however Domain.DLS.set is not implemented as a primitive, but as a C call.
    There is no way to fold over all domain local state, unless I'd manually introduce some all-domain rendezvous code into each domain.
    Thread-local state is still unavailable AFAICT, however domain local state may be sufficient for the optimizations I had in mind.)

@gasche
Copy link
Member

gasche commented Nov 29, 2022

Domain.DLS is the provided API for domain-local (~ per-core) storage for OCaml programs, and it is reasonably efficient.

There is no way to fold over all domain local state, unless I'd manually introduce some all-domain rendezvous code into each domain.

The fact that only one domain can access its domain-local state guarantees that it can perform unsynchronized accesses without racing against other domains; this is important for performance.

If you want per-domain state that all domains can access (say, per-domain mailboxes for message passing), a hashtable on the domain id sounds like a reasonable choice, but you have to be very careful about synchronization. Currently it is easier to implement something slightly more efficient in C, by using an array indexed on the domain index.

Thread-local state is still unavailable AFAICT, however domain local state may be sufficient for the optimizations I had in mind.)

Indeed, Thread programming is pretty much unchanged from OCaml 4. It is not clear to me why you would naturally have more needs of efficient thread-local state in OCaml 5 than in OCaml 4.

Because it is still the case that only one thread (per domain) runs OCaml code at the same time, this can be reasonable implemented as a (domain-local) hashtable over thread ids.

@Octachron
Copy link
Member

Concerning the documentation, we could add a word of caution in the documentation for Str, but I am not sure if it is useful to add an alert to specific Str functions: we are not trying to encourage people to write new code using Str.

@xavierleroy
Copy link
Contributor

Thread programming is pretty much unchanged from OCaml 4. It is not clear to me why you would naturally have more needs of efficient thread-local state in OCaml 5 than in OCaml 4. Because it is still the case that only one thread (per domain) runs OCaml code at the same time, this can be reasonable implemented as a (domain-local) hashtable over thread ids.

This is true. At the same time, it might be possible to extend the OCaml 5 domain-local state mechanism to implement thread-local state, with the same API. I don't know if that's something worth looking into.

Also, I'd like us to keep in mind that thread-local state, like domain-local state, is a quick fix; avoiding global state entirely is a better way to go about parallelism and concurrency.

Going back to the Str library, it could easily be given a stateless API; it's just that nobody cared to do so because everybody and their dog was clamoring that Str sucks and PCRE / RE / whatever is so much better.

@edwintorok
Copy link
Contributor Author

Going back to the Str library, it could easily be given a stateless API; it's just that nobody cared to do so because everybody and their dog was clamoring that Str sucks and PCRE / RE / whatever is so much better.

Except that Re.Str doesn't provide a stateless API either, it just copied Str's API and replaced the regex engine (Re.Posix/etc., sure I can see the benefits there for thread safety).

Too late to make changes for 5.0 here (except maybe for adding doc comments/alerts), but I would suggest:

  • remove the non-threadsafe functions from Re.Str, introduce new ones that are thread safe (need to modify the function signature, so won't be a drop-in replacement anymore, but I think that'd be good: one reason for switching from Str to Re was to gain thread safety). I've opened an issue on the upstream project a while ago about this, if it is useful I can attempt to follow up with a PR.

  • once Re.Str is updated, and the new API is tested on some applications, and if there is enough interest the Str in OCaml 5.x could be updated to have these additional thread-safe group functions with same signatures and deprecate the old ones.
    IIUC this is the preferred way for the stdlib usually: prototype/try out changes in an external lib first before making the change in stdlib

What do you think?

[Currently there isn't a straightforward way for an application that uses Str regexes and groups to be thread-safe, other than converting to an alternative regex library (and due to the different syntax it is a very error-prone process, even though several people have reviewed such a change, inevitably some buggy regexes slipped through due to subtleties around different escaping/special chars/etc.)
Nothing new compared to OCaml 4 of course as you said, just that OCaml 5 has sparked some renewed interest in checking thread safety.]

@c-cube
Copy link
Contributor

c-cube commented Oct 18, 2023

@polytypic proposed a nice solution for fast TLS here: https://discuss.ocaml.org/t/a-hack-to-implement-efficient-tls-thread-local-storage/13264 and suggested the discussion continue here.

To reply to @gasche from earlier:

Indeed, Thread programming is pretty much unchanged from OCaml 4. It is not clear to me why you would naturally have more needs of efficient thread-local state in OCaml 5 than in OCaml 4.

Now threads can live on distinct domains (and, in my opinion, are still the right primitive for concurrency; domains are not the right interface for users and should be a hidden implementation detail). This means that TLS is a more general and a safer interface than DLS (which is not enough if you have threads on at least 2 domains).

The fact that Str was broken in presence of multiple threads on OCaml 4 doesn't mean it should be in OCaml 5, imho. And there are other situations where TLS is quite useful; for me, a prime example is instrumentation and ambient context to carry around data that can't be passed a regular function arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants