
Audit stdlib for mutable state #10960

Closed
abbysmal opened this issue Jan 28, 2022 · 31 comments
@abbysmal
Contributor

abbysmal commented Jan 28, 2022

This is moved from ocaml-multicore/ocaml-multicore#757

This issue tracks the status of auditing stdlib for mutable state. OCaml 5.00 stdlib will have the following guarantees:

  1. Memory safety -- no crashes if stdlib modules are used concurrently by multiple domains
  2. Modules with mutable interfaces, such as Stack and Queue, are not made thread-safe
  3. Modules with top-level mutable state are made safe(r), or the unsafety is documented

There are two categories by which we classify the stdlib modules.

Top-level state

The module has some top-level mutable state that may cause surprising behaviours. For example, the Arg module has a top-level mutable ref called current that represents the cursor into the Sys.argv arguments being parsed. If two domains use the Arg module concurrently, they may see arguments being skipped. These cases either need to be:

  1. fixed to be safe for concurrent access (like Filename.temp_file, Format module for predefined buffers) or
  2. their behaviour documented and left alone (such as the Arg module; it is reasonable to expect only the main domain to parse command-line arguments).
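As an illustration of the first category of fixes, here is a minimal sketch, assuming OCaml 5's Atomic module, of turning a racy top-level ref in the style of Arg.current into an atomic counter. The names next_unsafe and next are hypothetical, not stdlib code:

```ocaml
(* Hypothetical sketch: the racy version can skip or duplicate
   positions, because two domains may both read [current] before
   either one increments it. *)
let current = ref 0
let next_unsafe () =
  let n = !current in
  current := n + 1;
  n

(* Fixed version: a single atomic read-modify-write per call, in the
   spirit of the fixes applied to e.g. Filename.temp_file. *)
let current_atomic = Atomic.make 0
let next () = Atomic.fetch_and_add current_atomic 1
```

With Atomic.fetch_and_add, every caller observes a distinct position even under parallel use.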

Mutable interface

The module may create mutable state and return it. For example, Queue, Stack, Hashtbl, etc. These modules will be left as sequential only and not thread-safe. Multiple concurrent invocations may lead to non-linearizable behaviours. We leave it to the user to put a mutex around the use of the mutable structure (or use thread-safe libraries such as domainslib).
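The suggested user-side discipline -- a mutex around each shared mutable structure -- can be sketched as a hypothetical wrapper around Queue (assuming OCaml 5's Mutex and Fun.protect; this is not a stdlib module):

```ocaml
(* Hypothetical sketch: serialize all operations on a shared Queue.t
   behind one Mutex.t, so that each call appears atomic. *)
module Safe_queue = struct
  type 'a t = { q : 'a Queue.t; m : Mutex.t }

  let create () = { q = Queue.create (); m = Mutex.create () }

  (* Run [f] on the underlying queue while holding the lock. *)
  let with_lock t f =
    Mutex.lock t.m;
    Fun.protect ~finally:(fun () -> Mutex.unlock t.m) (fun () -> f t.q)

  let push x t = with_lock t (fun q -> Queue.push x q)
  let pop_opt t = with_lock t Queue.take_opt
end
```

A finer-grained or lock-free structure (e.g. from domainslib) would scale better; the point here is only the safety contract.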

Not all mutable interfaces are unsafe. For example, concurrent array get and set are fine, but we still mark Array as having a mutable interface. The reason is that we also use "mutable interface" to indicate cases where the observed concurrent behaviour cannot be explained by assuming that each API call to the module executes atomically (linearizability). For example, even though an individual get or set of an Array field is safe, we mark Array as a mutable interface because iteration functions that modify the same array may leave it in a state that cannot be explained by linearizability.

Stdlib modules

The column "needs work" tracks whether code changes need to be made for OCaml 5.00 MVP.

The "Needs work" column will be N if the work has already been done. For example, the Format module has top-level mutable state, which has already been made domain-safe in the Multicore OCaml 5.00 branch. Another example is Printexc, which has been made thread-safe in OCaml trunk in a manner forward-compatible with Multicore.

"Needs work" does not encompass documentation: the column may be N while the documentation still needs to be updated.

| Module | Top-level state | Mutable interface | Needs work | Notes |
|---|---|---|---|---|
| arg.ml | Y | Y | ?? | `current` |
| array.ml | N | Y | N | |
| arrayLabels.ml | N | N | N | Only refers to Array |
| atomic.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| bigarray.ml | N | Y | N | |
| bool.ml | N | N | N | |
| buffer.ml | N | Y | Y | Segfault on parallel access (see #11279) |
| bytes.ml | N | Y | N | Document unsynchronized mixed-size accesses |
| bytesLabels.ml | N | N | N | Only refers to Bytes |
| callback.ml | N | N | N | |
| camlinternalAtomic.ml | N | N | N | |
| camlinternalFormat.ml | N | Y | N | See type `buffer` |
| camlinternalFormatBasics.ml | N | N | N | |
| camlinternalLazy.ml | N | Y | Y | Lazy must be handled specially for memory safety. Unify RacyLazy and Undefined exceptions? |
| camlinternalMod.ml | N | Y | N | See uses of Obj.set_field, Lazy.force |
| camlinternalOO.ml | | | | |
| char.ml | N | N | N | |
| complex.ml | N | N | N | |
| condition.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| digest.ml | N | N | N | |
| domain.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| effectHandlers.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| either.ml | N | N | N | |
| ephemeron.ml | N | N | Y | New ephemerons are immutable. Implement Bucket module as in OCaml trunk. |
| filename.ml | Y | Y | Y | `current_temp_dir_name` and its setter/getter functions |
| float.ml | N | N | N | |
| format.ml | Y | Y | N | OCaml 5.00 makes pre-defined formatters safe |
| fun.ml | N | N | N | |
| gc.ml | N | Y | ?? | `alarm` may need attention; caml_gc_get/set may need attention |
| hashtbl.ml | Y | Y | ?? | Uses Random state, which has been made domain-local. What about the non-atomic top-level ref `randomized`? |
| in_channel.ml | N | N | Y | Concurrent calls to input_all will likely not return contiguous chunks (already an issue on 4.x) |
| int.ml | N | N | N | |
| int32.ml | N | N | N | |
| int64.ml | N | N | N | |
| lazy.ml | N | N | N | The complexity is handled in camlinternalLazy.ml |
| lexing.ml | N | Y | N | |
| list.ml | N | N | N | |
| listLabels.ml | N | N | N | Only refers to the List module |
| map.ml | N | N | N | |
| marshal.ml | N | N | N | Clarify documentation about marshaling a concurrently modified object. Since the OCaml memory model rules out out-of-thin-air values, no crashes. |
| moreLabels.ml | N | N | N | |
| mutex.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| nativeint.ml | N | N | N | |
| obj.ml | N | Y | N | |
| oo.ml | | | | |
| option.ml | N | N | N | |
| out_channel.ml | | | | |
| parsing.ml | Y | Y | ?? | `current_lookahead_fun` and a few effectful functions |
| pervasives.ml | | | | Is now Stdlib |
| printexc.ml | Y | Y | Y | Top-level state has been made atomic. See raw_backtrace_entries, which returns an array (which could be modified concurrently). set_uncaught_exception needs to be checked (uses a global reference) |
| printf.ml | | | | |
| queue.ml | N | Y | N | |
| random.ml | Y | Y | N | Splittable PRNG work tracked in #10742 |
| result.ml | N | N | N | |
| scanf.ml | ? | N | N | Scanf is thread-unsafe because of internal buffer usage; additionally the buffers may share a channel, which complicates the situation further. Should probably be documented as thread-unsafe. |
| semaphore.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| seq.ml | N | N | N | |
| set.ml | N | N | N | |
| stack.ml | N | Y | ?? | Already not thread-safe? |
| stdLabels.ml | N | N | N | |
| std_exit.ml | N | N | N | |
| stdlib.ml | ? | ? | Y | caml_shutdown, as called by Stdlib.exit, may access a non-atomic startup_count variable. This may cause issues under Multicore. Should be atomic? |
| string.ml | N | N | N | Remove deprecated functions in 5.00? |
| stringLabels.ml | N | N | N | |
| sys.ml | Y | Y | Y | caml_ml_enable_runtime_warnings is a global boolean. To be documented or made atomic? |
| uchar.ml | N | N | N | |
| unit.ml | N | N | N | |
| weak.ml | N | Y | N | |

otherlibs

| Module | Top-level state | Mutable interface | Needs work | Notes |
|---|---|---|---|---|
| win32unix/unix.ml | | | | |
| bigarray/bigarray.ml | | | | |
| unix/unix.ml | | | | |
| unix/unixLabels.ml | | | | |
| str/str.ml | N | N | N | Has been made thread-safe with #10670 |
| systhreads/thread.ml | Y | N | Y | set_uncaught_exception modifies a global reference |
| systhreads/event.ml | N | N | N | |
| dynlink/ | Y | Y | Y | Dynlink should have a mutex inside it to ensure it doesn't crash, especially in bytecode (excerpt from WG 4) |
@abbysmal
Contributor Author

I moved this issue from the Multicore repository so that we may resume work on it now that Multicore has been merged onto trunk.

  • There are still a few items where the situation is unknown. (non-annotated rows)
  • I have a feeling a few rows were addressed since this was last updated
  • I updated a few rows already (printexc and systhreads specifically; I think the uncaught_exception_handler bit may need to be looked into, or documented for a multicore setting).

I think maintainers can edit the initial post, so we can try to keep the table up to date, if that sounds reasonable?
If anyone has time to look into fixing individual items or documenting the empty rows, feel free to give it a look; I will try to address a few items in the coming days.

@Octachron
Member

My memory of the last meeting is that, for dynlink, it was probably better to add a mutex to the dynlinking functions to avoid any kind of race on the dynlinking code.

abbysmal added this to the 5.00.0 milestone Feb 1, 2022
@abbysmal
Contributor Author

abbysmal commented Feb 1, 2022

@Octachron Indeed, I updated this entry to reflect the notes from WG 4, thanks.

I made a first pass through the list today and filled in a few empty rows. I also removed a few lines related to modules that were removed since the initial writing of this list.

Tomorrow I will take a stab at more unknown items on the list (there are bigger fish like sys and unix).

For the printf row, I assume we entirely solved the issues we had with it when the work on format was done a while back, if I am not mistaken? @kayceesrk @gasche

@gasche
Member

gasche commented Feb 1, 2022

Printf and Format use different output engines, and I only discussed Format. (For Format, there is a performance-related work item in #10876 that is just right for @Octachron to handle once the 4.14 release becomes less interesting.)

I just looked, there is no mutable state in Printf (or really CamlinternalFormat). Of course the default formatters (printf, eprintf) use stdout and stderr from Stdlib, and the following output operations:

  • output_string, output_char
  • flush

My understanding is that those operations are protected by per-channel mutexes, so multicore usage is safe, but the output will be mangled if several domains try to print simultaneously. The current approach of Format provides better printing guarantees, as interleavings only happen at flush points. We could also implement it for channels, but it would be expensive. (It would be nice if we could at least give Printf enough control to lock the channel for the printing of its whole format string, but it's some amount of work and I don't know if anyone is going to fight for it.)

@xavierleroy
Contributor

> The current approach of Format provides better printing guarantees, as interleavings only happen on flush points. We could also implement it for channels, but it would also be expensive. (It would be nice if we could at least leave enough control to Printf to lock the channel for the printing of its whole format string, but it's some amount of work and I don't know if anyone is going to fight for it.)

There's the flockfile / funlockfile approach of POSIX threads: provide library functions to temporarily lock an input channel or output channel, preventing other domains or threads from using it. Then, a program that wants to print something without interference can take the lock for the duration of the print operations. Possibly printf and fprintf could take the lock automatically, but that may be too expensive.

Implementation-wise, it means using recursive locks to protect I/O channels, something that I considered in the past and that would be easy to implement.

@xavierleroy
Contributor

Something like:

```ocaml
Out_channel.with_lock : out_channel -> (unit -> 'a) -> 'a
```

that can be used like this:

```ocaml
Out_channel.with_lock stdout (fun () ->
  print_string "Hello! "; print_string "world."; print_newline ())
```
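Pending such an API, a user-level approximation can be sketched by pairing a channel with an explicit mutex. This is hypothetical code, not the proposed stdlib implementation, which would instead reuse the channel's internal (possibly recursive) lock:

```ocaml
(* Hypothetical user-level stand-in for the proposed
   [Out_channel.with_lock]: one explicit mutex per wrapped channel. *)
type locked_out_channel = { oc : out_channel; m : Mutex.t }

let wrap oc = { oc; m = Mutex.create () }

(* Run [f] on the channel while no other [with_lock] user can write. *)
let with_lock t f =
  Mutex.lock t.m;
  Fun.protect ~finally:(fun () -> Mutex.unlock t.m) (fun () -> f t.oc)
```

This keeps a group of writes contiguous with respect to other with_lock users of the same wrapper, but not against writers that bypass it -- which is exactly why a stdlib-level lock would be preferable.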

@gasche
Member

gasche commented Feb 1, 2022

> Possibly printf and fprintf could take the lock automatically, but it may be too expensive.

I think it would be nice to try this. If it's not too expensive, it could provide much better printing behavior when using printf or fprintf in practice. However, we don't have any guarantees over how long the lock would be held, as printf can run user-defined code in the middle of its output logic (%a and %t).

On the other hand, exposing the locking function to user programs is probably a bad idea, because it locks the whole domain, and we promised not to do that (remember, this is the reason why we forbid concurrent Lazy.force). I guess the current mantra is that users should use a higher-level library that provides asynchronous IO.

@Octachron
Member

On the subject of Dynlink, the current state on trunk is that only the main domain is allowed to call any Dynlink functions. Do we want to switch to a global-lock version (I have a draft version available)? Or would it be better to document the current limitation? The small advantage that I see for the global-lock version is that the few functions that read the current configuration (main_program_units, public_dynamically_loaded_units, and all_units) seem fine to call concurrently from any domain.

@gasche
Member

gasche commented Feb 9, 2022

Personally I find the global-lock version more tempting, because I don't want to have to wonder about which domain I'm in when I write code. Plus it simplifies the documentation -- nothing has to be said. This said, I have very little experience using Dynlink.

@xavierleroy
Contributor

Concerning Dynlink, I guess the functions would be safe (= not crash the program) if protected by a global lock. Some global OCaml tables are modified, e.g. Symtable in bytecode, and then a code fragment is registered with the runtime (using a lock-free skiplist!). All in all, it looks like this can be done from any domain, in mutual exclusion.

However, the API remains dangerous if it's used by two domains (or two threads) concurrently, as one could do set_allowed_units, then the other set_allowed_units with a different list, and the first would then load code with the wrong list. So, we'll need a better API eventually.

@Octachron
Member

Concurrent writes to the list of allowed_units are already dangerous with threads in OCaml 4, aren't they?

@xavierleroy
Contributor

Yes, the API is already questionable in the presence of threads.

@OlivierNicole
Contributor

Hi everyone,

I spent some time lately reviewing a few Stdlib modules ({in,out}_channel.ml, stdlib.ml, sys.ml, scanf.ml); I summarize my findings here, with help from @Engil.

Needs fixing, actionable

  • {In,Out}_channel.close should lock the channel. As of now, concurrent reads and close can cause non-linearizable results due to a field write without a lock. In particular, I was able to trigger segfaults when concurrently closing and writing to out_channels.
  • {In,Out}_channel.pos reads non-atomic fields; as a consequence, the result may be nonsensical when calling pos concurrently with read/write operations. Should be documented (or alternatively make pos acquire a lock).
  • flush: either fd should be made atomic, or reading it in caml_ml_flush should happen behind the lock; it is unclear which is the best option. In the current situation, the contract in the Out_channel.close documentation is not respected, as a bad interaction between concurrent uses of flush and close may yield an unexpected Sys_error. This issue is inherited by flush_all.
  • Scanf functions are not thread-safe on shared channels (because of mutable underlying buffers). This should probably be documented.

Unclear if action required

  • In_channel.input_all is not guaranteed to read a contiguous chunk from the channel if there are concurrent readers, but this is already the case in 4.x.
  • In {In,Out}_channel, set_binary_mode, set_buffered and is_buffered take no lock, so sequential consistency is not to be expected. However, these are probably only used in setup code, so concurrent accesses seem unlikely. Maybe this should be documented, maybe not, maybe we should acquire a lock; unclear.
  • Sys.caml_ml_enable_runtime_warnings and Sys.caml_ml_runtime_warnings_enabled read and write a global boolean, respectively. Maybe document that these are non-atomic? Unclear.

Needs fixing, and possibly hints at bigger problems

The caml_shutdown primitive, called by Stdlib.exit (through caml_sys_exit) accesses a non-atomic startup_count counter. This seems unsafe if two threads do that concurrently.

This is probably OK in 4.x, but for multicore startup_count at least needs to be atomic. In addition, after discussing with @Engil, we fear there may be deeper issues with runtime unloading in Multicore. Ideally should be investigated further.

@abbysmal
Contributor Author

abbysmal commented Apr 4, 2022

Thank you @OlivierNicole for your investigations! I updated the table according to your findings.

@xavierleroy
Contributor

Thanks for the careful reading. I agree that {In,Out}_channel.close and most other I/O channel primitives should lock the channel. One reason why e.g. the pos operation doesn't lock the channel in OCaml 4 is that the C primitive doesn't release the master lock, so there are no other threads executing concurrently. But this is no longer true in OCaml 5...

@Octachron
Member

Octachron commented Apr 29, 2022

Taking into account #11171, #11193, #11225, #11227 and #11228, I am left with the following items for the stdlib audit:


Changes required:

- unified uncaught exception handlers mechanism for threads, domains and global handler

Documentation:


Did I miss anything?

@kayceesrk
Contributor

kayceesrk commented Apr 30, 2022

> arrays: document that operations on arrays are not linearizable

For non-float arrays, individual reads and writes respect the memory model; memory safety is preserved and there are no crashes under racy access. The blit operation isn't linearizable, but it preserves memory safety. With concurrent blit operations, one may see updates made by any of the concurrent threads of execution; the value in a field is always one written by one of the racing threads, i.e., no tearing.
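The no-tearing guarantee for non-float arrays can be illustrated with a small sketch (assuming OCaml 5's Domain module; this demonstrates the guarantee, not a recommended pattern): two domains race to blit different boxed values over the same array, and every field ends up holding one of the racily written values.

```ocaml
(* Sketch: concurrent Array.blit on a pointer array is racy and not
   linearizable, but each field still holds a value written by one of
   the racing domains -- no tearing, no crash. *)
let n = 10_000
let target = Array.make n (Some 0)

let () =
  let ones = Array.make n (Some 1) in
  let twos = Array.make n (Some 2) in
  let d1 = Domain.spawn (fun () -> Array.blit ones 0 target 0 n) in
  let d2 = Domain.spawn (fun () -> Array.blit twos 0 target 0 n) in
  Domain.join d1; Domain.join d2
```

Which of the two values a given field holds is nondeterministic, but it is always one of them, never a mixture of bits.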

For float arrays, operations do not respect the OCaml memory model. Tearing is possible. However, since float arrays do not contain pointers, there is no possibility of crashes. For context, see ocaml-multicore/ocaml-multicore#822.

> ephemerons: what should be done for ephemerons in 5.0?

I'd mark ephemerons as not concurrency-safe, and users must implement their own synchronization. The proposed new API has been included, but the necessary implementation can come after 5.0. For context, see #10737.

@Octachron
Member

Octachron commented May 2, 2022

Ephemeron weak hash tables are documented as concurrency-unsafe in #11193.
Tearing only happens for float arrays on architectures where the word size and double size are mismatched, doesn't it? I will document that point. It also sounds worthwhile to document that blit (and other whole-array operations?) is not an atomic operation.
However, I would expect the memory-safety property to belong rather to a "memory model for advanced users" chapter, since it is a property of the whole language rather than a piece of information specific to arrays.

@kayceesrk
Contributor

kayceesrk commented May 2, 2022

> Tearing only happens for float arrays on architectures where the word size and double size are mismatched, isn't it?

Yes, for all operations except blit. For float arrays, we use memmove for blit, which doesn't guarantee aligned, word-sized writes; memmove may choose to do a byte-by-byte copy of values. Hence, tearing may happen with blit even on architectures where word size and double size agree.

> It sounds also worthwhile to document that blit (and other whole array operations?) is not an atomic operation.

Yes, indeed.

> However, I would expect the memory safety property to rather belong to a memory model for advanced user chapter since it is a property of the whole language rather than a piece of information specific to arrays.

This sounds OK. The fact that the program doesn't crash under data races is the baseline expectation, and I guess there's no point explicitly mentioning that for array operations.

@Octachron
Member

I had forgotten that Lazy was already documented in terms of concurrency safety (https://github.com/ocaml/ocaml/blob/trunk/stdlib/lazy.mli#L43). Do we need more documentation, or is the current state sufficient?

@OlivierNicole
Contributor

I am looking at camlinternalOO.ml but feel a bit at a loss, as I know little about how OO is implemented in OCaml and how this module is used. In addition, @dra27 tells me that OO has already been the subject of much work by @kayceesrk and others, so maybe this module should be considered dealt with already?

I can take care of unix.ml which is more my area.

@OlivierNicole
Contributor

In case of concurrent calls to Unix.{in,out}_channel_of_descr on Windows, the two accesses to CRT_fd_val(handle) in this function can race:

```c
int win_CRT_fd_of_filedescr(value handle)
{
  if (CRT_fd_val(handle) != NO_CRT_FD) {
    return CRT_fd_val(handle);
  } else {
    int fd = _open_osfhandle((intptr_t) Handle_val(handle), O_BINARY);
    if (fd == -1) uerror("channel_of_descr", Nothing);
    CRT_fd_val(handle) = fd;
    return fd;
  }
}
```

What do we do for racy accesses in C code? Do we treat it as UB?

@xavierleroy
Contributor

> In case of concurrent calls to Unix.{in,out}_channel_of_descr on Windows, the two accesses to CRT_fd_val(handle) in this function can race

Well spotted! I'm not sure we have a systematic doctrine to deal with these races, but in this particular case it looks like a perfect use case for a compare-and-swap, solving the race condition for a very low cost.

@OlivierNicole
Contributor

Follow-up PR: #11304.

@Octachron
Member

Octachron commented Jul 20, 2022

Summarizing the current state of the audit: we still have to integrate the documentation before the final release, and there is one known issue to fix in Buffer.

Concerning the internal modules CamlinternalMod and CamlinternalOO, there is a small issue within CamlinternalOO: the statistics for class, method and instance-variable creation will be inaccurate in the presence of multiple domains, due to the use of references. But I am not sure whether this matters.

The runtime unloading has not been audited as far as I know (beyond the issue with the startup_count counter), but that sounds like an issue that can wait for OCaml 5.1 or later?

Documentation:

Required fixes

@OlivierNicole
Contributor

OlivierNicole commented Aug 5, 2022

I finished reviewing unix.ml. This led me to open a number of PRs and issues, some of which have already been addressed:

Remaining points for which I have not opened tickets (cc @Engil):

  • On both Unix-like systems and Windows, calling closedir concurrently with one of readdir, rewinddir or closedir may not yield sequentially consistent results.
  • On Unix-like systems, functions open_process_*, process_*_pid and close_process_* all access a shared hash table, and thus can be sequentially inconsistent.

(Edit: improved readability, removed unrelated PRs)

@OlivierNicole
Contributor

OlivierNicole commented Aug 24, 2022

Regarding OO programming, in my understanding, when calling a method the method address is resolved and possibly cached by modifying the bytecode:

```c
*pc = (li-3)*sizeof(value);
```

Isn't this a data race in case of multiple domains executing the same GETPUBMET opcode concurrently?

@OlivierNicole
Contributor

OlivierNicole commented Aug 24, 2022

As far as I can tell the same is true in native code, where instead of a word in the bytecode a global method cache is mutated.

Edit: Thinking again, as in OCaml data races are locally bound and here the absence of sequential consistency does not matter (as the cached value is checked before use), we are probably safe in native code.

@xavierleroy
Contributor

This is a cache: any integer value in the correct range is a safe value, although better performance is achieved if the value is the one last written. So, this is a benign race condition. For the bytecode interpreter, we'd better inform the C compiler of this fact, e.g. with atomic relaxed accesses? For the ocamlopt-generated code, no change is required.

@OlivierNicole
Contributor

Proposed change for the bytecode interpreter as #11512.

@Octachron
Member

With the last Buffer change, I propose to close this meta-issue. The remaining open questions for OCaml 5.1 would probably be better served by fresh issues.
