Audit stdlib for mutable state #757

kayceesrk opened this issue Nov 23, 2021 · 5 comments

@kayceesrk
Contributor

kayceesrk commented Nov 23, 2021

This issue tracks the status of auditing stdlib for mutable state. OCaml 5.00 stdlib will have the following guarantees:

  1. Memory safety -- no crashes if stdlib modules are concurrently used by multiple domains
  2. Modules with mutable interfaces such as Stack and Queue are not made thread-safe
  3. Modules with top-level mutable state are made safe(r), or their unsafety is documented

There are two categories by which we classify the stdlib modules.

Top-level state

The module has some top-level mutable state that may cause surprising behaviours. For example, the Arg module has a top-level mutable ref called current that represents the cursor of the Sys.argv argument being parsed. If two domains use the Arg module concurrently, they may see arguments being skipped. These cases need to be either:

  1. fixed to be safe for concurrent access (like Filename.temp_file, Format module for predefined buffers) or
  2. their behaviour documented and left alone (such as the Arg module; it is reasonable to expect only the main domain to parse command-line arguments).
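As a rough illustration of what "fixed to be safe for concurrent access" can look like, here is a minimal sketch of two common options for a piece of top-level state: make it atomic, or make it domain-local. The names shared_counter, fresh_id, cursor_key and advance_cursor are hypothetical and are not taken from the stdlib sources; which option fits a given module depends on whether domains are meant to share the state.

```ocaml
(* Hypothetical top-level state; this is not the actual stdlib code. *)

(* Option A: a single counter shared by all domains, updated atomically. *)
let shared_counter : int Atomic.t = Atomic.make 0
let fresh_id () = Atomic.fetch_and_add shared_counter 1

(* Option B: every domain gets its own private cursor through
   domain-local storage, so no synchronisation is needed. *)
let cursor_key : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

let advance_cursor () =
  let cursor = Domain.DLS.get cursor_key in
  incr cursor;
  !cursor

(* Two domains hammer both pieces of state; each returns the final value
   of its own domain-local cursor. *)
let per_domain_counts =
  let worker () =
    for _ = 1 to 1_000 do ignore (fresh_id ()) done;
    for _ = 1 to 9 do ignore (advance_cursor ()) done;
    advance_cursor ()
  in
  let d1 = Domain.spawn worker in
  let d2 = Domain.spawn worker in
  (Domain.join d1, Domain.join d2)

let () =
  let c1, c2 = per_domain_counts in
  Printf.printf "shared=%d cursor(d1)=%d cursor(d2)=%d\n"
    (Atomic.get shared_counter) c1 c2
```

With a plain `int ref` instead of the atomic, increments from the two domains could be lost; with domain-local storage, each domain only ever sees its own cursor.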

Mutable interface

The module may create mutable state and return it. For example, Queue, Stack, Hashtbl, etc. These modules will be left as sequential only and not thread-safe. Multiple concurrent invocations may lead to non-linearizable behaviours. We leave it to the user to put a mutex around the use of the mutable structure (or use thread-safe libraries such as domainslib).
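A minimal sketch of the "put a mutex around it" approach for a Queue shared between domains. The locked_queue type and its helper functions are hypothetical names introduced only for this example; a real program might prefer a purpose-built concurrent structure instead.

```ocaml
(* A Queue.t paired with the mutex that guards every access to it.
   Hypothetical wrapper, written for illustration only. *)
type 'a locked_queue = { lock : Mutex.t; items : 'a Queue.t }

let create () = { lock = Mutex.create (); items = Queue.create () }

(* Run [f] while holding the lock, releasing it even if [f] raises. *)
let with_lock q f =
  Mutex.lock q.lock;
  Fun.protect ~finally:(fun () -> Mutex.unlock q.lock) f

let push q x = with_lock q (fun () -> Queue.push x q.items)
let pop_opt q = with_lock q (fun () -> Queue.take_opt q.items)
let length q = with_lock q (fun () -> Queue.length q.items)

let () =
  let q = create () in
  let producer () = for i = 1 to 1_000 do push q i done in
  let d1 = Domain.spawn producer in
  let d2 = Domain.spawn producer in
  Domain.join d1;
  Domain.join d2;
  Printf.printf "length after both producers: %d\n" (length q)
```

Because every operation takes the same lock, each push and pop appears to happen atomically, which is the linearizable behaviour the bare Queue module does not promise.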

Not all mutable interfaces are unsafe. For example, concurrent array get and set are fine. We nevertheless mark Array as having a mutable interface, because the label also covers cases where the observed concurrent behaviour cannot be explained by assuming that each API call to the module executes atomically (linearizability). Although an individual get or set of an array element is safe, iteration functions that modify the same array concurrently may leave it in a state that no linearizable execution could produce.
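To make the Array case concrete, here is a small sketch in which two domains each overwrite the whole array through an iteration function. The helper overwrite and the sizes are made up for the example. If each call were atomic, the array would end up all 1s or all 2s; in practice a mixture is possible (though not guaranteed to show up on any given run). What the memory model does ensure is that every element holds one of the values actually written.

```ocaml
let n = 100_000
let a = Array.make n 0

(* Overwrite every element of [a] with [v], one set at a time. *)
let overwrite v = Array.iteri (fun i _ -> a.(i) <- v) a

let () =
  let d1 = Domain.spawn (fun () -> overwrite 1) in
  let d2 = Domain.spawn (fun () -> overwrite 2) in
  Domain.join d1;
  Domain.join d2;
  let count v =
    Array.fold_left (fun acc x -> if x = v then acc + 1 else acc) 0 a
  in
  (* The split between 1s and 2s varies from run to run. *)
  Printf.printf "ones=%d twos=%d\n" (count 1) (count 2)
```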

Stdlib modules

The column "needs work" tracks whether code changes need to be made for OCaml 5.00 MVP.

The "needs work" column is N if the work has already been done. For example, the Format module has top-level mutable state, which has already been made domain-safe in the Multicore OCaml 5.00 branch. Another example is Printexc, which has been made thread-safe in OCaml trunk in a way that is forward-compatible with multicore.

"Needs work" does not cover documentation: a module may be marked N and still need its documentation updated.

| Module | Top-level state | Mutable interface | Needs work | Notes |
|---|---|---|---|---|
| arg.ml | Y | Y | ?? | current |
| array.ml | N | Y | N | |
| arrayLabels.ml | N | N | N | Only refers to Array |
| atomic.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| bigarray.ml | N | Y | N | |
| bool.ml | N | N | N | |
| buffer.ml | N | Y | N | |
| bytes.ml | N | Y | N | Document unsynchronized mixed sized accesses |
| bytesLabels.ml | N | N | N | Only refers to Bytes |
| callback.ml | N | N | N | |
| camlinternalAtomic.ml | N | N | N | |
| camlinternalFormat.ml | N | Y | N | see type buffer |
| camlinternalFormatBasics.ml | N | N | N | |
| camlinternalLazy.ml | N | Y | Y | Lazy must be handled specially for memory-safety. Unify RacyLazy and Undefined exceptions? |
| camlinternalMod.ml | N | Y | N | See uses of Obj.set_field, Lazy.force |
| camlinternalOO.ml | | | | |
| char.ml | | | | |
| complex.ml | | | | |
| condition.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| digest.ml | N | N | N | |
| domain.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| effectHandlers.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| either.ml | N | N | N | |
| ephemeron.ml | N | N | Y | New ephemerons are immutable. Implement Bucket module as in OCaml trunk. |
| filename.ml | | | | |
| float.ml | N | N | N | |
| format.ml | Y | Y | N | OCaml 5.00 makes pre-defined formatters safe |
| fun.ml | N | N | N | |
| gc.ml | | | | |
| genlex.ml | | | | |
| hashtbl.ml | Y | Y | ?? | Uses Random state, which has been made domain-local. What about the non-atomic top-level ref randomized? |
| in_channel.ml | | | | |
| int.ml | N | N | N | |
| int32.ml | N | N | N | |
| int64.ml | N | N | N | |
| lazy.ml | N | N | N | The complexity is handled in camlinternalLazy.ml |
| lexing.ml | N | Y | N | |
| list.ml | N | N | N | |
| listLabels.ml | N | N | N | Just refers to the List module |
| map.ml | N | N | N | |
| marshal.ml | N | N | N | Clarify documentation about marshaling a concurrently modified object. No crashes, because the OCaml memory model ensures the absence of out-of-thin-air values. |
| moreLabels.ml | N | N | N | |
| mutex.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| nativeint.ml | | | | |
| obj.ml | N | Y | N | |
| oo.ml | | | | |
| option.ml | N | N | N | |
| out_channel.ml | | | | |
| parsing.ml | | | | |
| pervasives.ml | | | | |
| printexc.ml | Y | Y | N | Top-level state has been made atomic. See raw_backtrace_entries which returns an array (which could be modified concurrently). |
| printf.ml | | | | |
| queue.ml | N | Y | N | |
| random.ml | Y | Y | N | Splittable PRNG work tracked in ocaml/ocaml#10742. |
| result.ml | | | | |
| scanf.ml | | | | |
| semaphore.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| seq.ml | N | N | N | |
| set.ml | N | N | N | |
| stack.ml | | | | |
| stdLabels.ml | | | | |
| std_exit.ml | | | | |
| stdlib.ml | | | | |
| stream.ml | | | | |
| string.ml | N | N | N | Remove deprecated functions in 5.00? |
| stringLabels.ml | N | N | N | |
| sys.ml | | | | |
| uchar.ml | N | N | N | |
| unit.ml | N | N | N | |
| weak.ml | N | Y | N | |

otherlibs

Module Top-level state Mutable interface Needs work Notes
win32unix/unix.ml
bigarray/bigarray.ml
unix/unix.ml
unix/unixLabels.ml
str/str.ml
systhreads/threadUnix.ml
systhreads/thread.ml
systhreads/event.ml
dynlink/ ?? Many files here
@Octachron
Contributor

It would probably be better to also add str.ml to this table, even if it is not strictly part of the standard library?

@kayceesrk
Contributor Author

Good point. It would be useful to extend this to include otherlibs as well.

@kayceesrk
Contributor Author

@Octachron I added a table for otherlibs files. I don't know what to do about dynlink right now. Specifically, dynlink_compilerlibs has lots of files.

@avsm
Collaborator

avsm commented Nov 24, 2021

Another idea that's come up in discussions about this is to add an ocamldoc tag to each of these stdlib modules that indicates how thread-safe a particular module or function is expected to be. An equivalent convention is present in a javadoc extension.
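A rough sketch of what such per-function annotations might look like in documentation comments. The tag name @domain_safety and its values are invented here purely for illustration; no such convention exists in ocamldoc today, and a documentation generator would need to be taught to render it. SEQ_STACK and Seq_stack are likewise hypothetical names.

```ocaml
(* [@domain_safety] below is a hypothetical custom tag. *)
module type SEQ_STACK = sig
  type 'a t

  (** Create an empty stack.
      @domain_safety safe: only allocates fresh state. *)
  val create : unit -> 'a t

  (** Push an element on top of the stack.
      @domain_safety sequential-only: callers must synchronise. *)
  val push : 'a -> 'a t -> unit

  (** Pop the top element, if any.
      @domain_safety sequential-only: callers must synchronise. *)
  val pop_opt : 'a t -> 'a option
end

(* Backed directly by the stdlib Stack, which is not thread-safe. *)
module Seq_stack : SEQ_STACK = struct
  type 'a t = 'a Stack.t
  let create = Stack.create
  let push = Stack.push
  let pop_opt = Stack.pop_opt
end
```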

@Octachron
Contributor

Octachron commented Nov 24, 2021

It is the bytecode side of Dynlink that pulls part of the compiler bytecode library (in particular Symtable) as a dependency.

On the native side, it only uses Cmxs_format, Cmi_format (and Misc.String) as dependencies. I would thus suggest to focus on the native side for tomorrow.
