
Meta-issue for OCaml 5.0 release goals #11013

Closed
13 of 16 tasks
Octachron opened this issue Feb 14, 2022 · 16 comments

@Octachron
Member

Octachron commented Feb 14, 2022

This is a meta-issue checklist to keep track of the issues that ought to be resolved before branching OCaml 5.0.
(For maintainers: please feel free to edit this post to update the status or descriptions of the various items on this list.)

Platforms

32-bit:

Arm 64

Windows

  • mingw-64 with winpthreads

Middle-end

  • flambda: check optimisation safety

Tooling

Runtime

Stdlib

Documentation

CI

@dra27 added this to the 5.0.0 milestone Mar 16, 2022
@dra27
Member

dra27 commented Mar 21, 2022

The technical documentation for the memory model (updated PLDI paper with object publication) task is unlikely to be doable for 5.0 - are there objections to doing that update after 5.0 is released (but before 5.1)?

@gadmm
Contributor

gadmm commented Mar 26, 2022

I think people were agreeing that #10915 was important for 5.0, and we only proposed that it was delayable because we had no idea how long it would take. Now #11057 is there to fix it, and new bugs in the current implementation were discovered in the process, so I propose to put it on the list.

Can you please also add #10832? (Or maybe it is implicit as a sub-goal of #10960 even if it is not mentioned there?)

@dra27 For the technical documentation, there is not much to do, and it is useful for experts in the community to understand how the memory model works. I don't think it would be very hard to adapt Stephen's post into documentation. A bigger chunk of this task is #10992, and it looks important for 5.0 (it directly affects how users are going to write and port C code). On the other hand, while I suggested the update to the PLDI benchmarks (thinking that it would be easy to do), I would be surprised if an updated PLDI paper was necessary for 5.0.

@Octachron
Member Author

Concerning the flambda side, I think it would be nice to have some written confirmation that the flambda optimisations have been reviewed as safe. This is mostly to confirm that the initial assessment that flambda optimisations are safe still holds (ping @lthls, @mshinwell?).

@lthls
Contributor

lthls commented May 5, 2022

I'm not sure what "reviewed as safe" means. I haven't seen much work on reviewing non-flambda optimisations (for example, simplif.ml, but there are optimisations everywhere in the backend), so I don't even know what such a review would look like.
Flambda can move code around (and remove code), but as long as we don't have a concept of critical sections this should be perfectly safe (the rules for which code can be moved are described by the semantics_of_primitives.ml file, and I believe it has been updated with the correct semantics for the multicore primitives). We can also propagate information on immutable values, but that's normally not a problem. The only exception I'm aware of is Lazy values, which can appear immutable although they are not; as far as I know this is handled by using C primitives for accessing and modifying lazy values (more or less; there are some subtle details here). If additional code using the Obj module has been added to the standard library, it might need reviewing for compatibility with flambda, but I haven't heard about anything like that.
It's possible that there are bugs that will only appear with flambda, as we can hit more corner cases of the semantics, but that's not specific to multicore.
So my opinion on this is that flambda optimisations are assumed to be safe, and unless multicore has changed the semantics of sequential OCaml programs in known ways there isn't anything in particular to review.

@gadmm
Contributor

gadmm commented May 6, 2022

The OCaml memory model is different from the C memory model in that it indeed explicitly restricts the valid transformations in sequential programs in known ways in order to reason about data races, for instance preserving load-to-store ordering. So naively I would think too that there is something to check, but maybe it is obvious for flambda authors that the memory model is respected. Out of curiosity I went to see semantics_of_primitives.ml; loads are marked as having a coeffect and stores as having an effect, so I expect that you already respect the coeffect-to-effect ordering. Do I expect correctly?
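As an illustrative sketch (not part of the original comment; the names are hypothetical), the load-to-store ordering mentioned above means a compiler may not hoist a store above a preceding load, even to a disjoint location:

```ocaml
(* Hypothetical sketch: under the OCaml memory model, a load followed by a
   store must stay in program order. In semantics_of_primitives.ml terms,
   the load is a coeffect and the store is an effect, and effects may not
   move around coeffects. *)
let data : int ref = ref 0
let flag : bool ref = ref false

let observe () =
  let seen = !flag in   (* load: a coeffect *)
  data := 1;            (* store: an effect; may not be hoisted above the load *)
  seen
```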

Regarding Lazy, my takeaway was that there is another issue than the one you describe due to the following polling critical section:

let x = Obj.repr lzv in
(* START no safe points. If a GC occurs here, then the object [x] may be
   short-circuited, and getting the first field of [x] would get us the wrong
   value. Luckily, the compiler does not insert GC safe points at this place,
   so it is ok. *)
let t = Obj.tag x in
if t = Obj.forward_tag then
  (Obj.obj (Obj.field x 0) : 'arg)
(* END no safe points *)

My conclusion was that the [@poll error] attribute is not expressive enough as it stands to express this, as previously discussed in this thread. (On the other hand, the slightly more expressive variant that was previously proposed would be enough.)

@lthls
Contributor

lthls commented May 6, 2022

The OCaml memory model is different from the C memory model in that it indeed explicitly restricts the valid transformations in sequential programs in known ways in order to reason about data races, for instance preserving load-to-store ordering. So naively I would think too that there is something to check, but maybe it is obvious for flambda authors that the memory model is respected. Out of curiosity I went to see semantics_of_primitives.ml; loads are marked as having a coeffect and stores as having an effect, so I expect that you already respect the coeffect-to-effect ordering. Do I expect correctly?

Yes. Effects cannot be moved around anything that has effects or coeffects, and coeffects cannot be moved around anything that has effects. There's no location associated to the effects either: if you update two different references in sequence, even if the compiler could prove that the references do not alias it wouldn't be allowed to swap the updates (unless the references are transformed into mutable variables, but that's another story, and I don't think Flambda 1 does anything clever with mutable variables anyway).
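As a small sketch of the point above (not from the original comment; names are hypothetical), two stores to references are both effects, so they are kept in program order even when they provably do not alias:

```ocaml
(* Hypothetical sketch: both assignments are effects, so flambda keeps
   them in program order even though [r1] and [r2] never alias. *)
let r1 : int ref = ref 0
let r2 : int ref = ref 0

let update () =
  r1 := 1;  (* effect *)
  r2 := 2   (* effect: not swapped with the previous store *)
```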

One thing you may want to take into account is that our model for Flambda has always assumed that an interrupt (signal, GC or whatever) can occur at any time, running arbitrary code. So in a way, our model was already adapted for multicore as it expected even sequential code to be potentially interleaved with anything. However, lots of OCaml programs use immutable data structures that can still be optimised, so that still left us a lot of interesting things to do.

Regarding Lazy, my takeaway was that there is another issue than the one you describe due to the following polling critical section:

let x = Obj.repr lzv in
(* START no safe points. If a GC occurs here, then the object [x] may be
   short-circuited, and getting the first field of [x] would get us the wrong
   value. Luckily, the compiler does not insert GC safe points at this place,
   so it is ok. *)
let t = Obj.tag x in
if t = Obj.forward_tag then
  (Obj.obj (Obj.field x 0) : 'arg)
(* END no safe points *)

My conclusion was that the [@poll error] attribute is not expressive enough as it stands to express this, as previously discussed in this thread. (On the other hand, the slightly more expressive variant that was previously proposed would be enough.)

From what I read in the thread you linked, it seems that you assume that Flambda understands [@poll error]. That's not the case. As far as I'm aware, the only place in the compiler that checks this attribute is the Polling module, which does the actual placement of polls, and occurs later in the backend. All the rest of the compiler simply ignores it. In fact, you can even get the compiler to drop the attribute in some specific cases if you know all the weird optimisations that the compiler performs:

let allocate () = Array.make 10 0
let g () =
  ((fun _ -> allocate ())[@poll error]) 0
let h () = (fun _ -> allocate ())[@poll error]

You won't get any errors on this code, because the attributes were dropped during translation to Lambda code.

If you want to be serious about critical sections, you should introduce primitives in the compiler for that. Right now, the code is safe only because there's no part of the current compiler (except afl instrumentation, but it is disabled on this file) that would insert a poll there. But it would be quite easy for a future patch to break this without anyone noticing.

@gadmm
Contributor

gadmm commented May 9, 2022

From what I read in the thread you linked, it seems that you assume that Flambda understands [@poll error].

I assume Flambda should understand [@poll error]; indeed, remember the earlier problem where Flambda could in theory break atomicity requirements due to inlining: #10035. Also, the original PR #10462 mentions that support for Flambda is needed. I am under the impression that [@poll error] has the same atomicity problem, and that a minimal implementation of [@poll error] in Flambda should prevent inlining. In the code I see that Flambda "understands" [@poll error] at least in the sense that it carries the annotation over to the back-end, and I cannot tell whether this is enough to avoid all inlining as it stands.

Regarding the code in question, I take note that it is safe currently (albeit fragile).

[@poll error] is only meant to be used at top level, I think, which would address your example.

@lthls
Contributor

lthls commented May 10, 2022

I don't seem to be getting my point across. The Flambda team has made a more-or-less formal model of the behaviour of OCaml programs. This model is based on our understanding of the compiler's other passes and optimisations, and is usually stricter than what we are seeing in other parts of the compiler. So whenever I point out that something might break with Flambda, it means that it might also be broken without Flambda. But it's easier to show how to break it with hypothetical Flambda transformations, because it's well-defined which transformations are permitted and which are not.
For example, if you look at the original example in #10035, then pushing the allocation down the branch would have been a perfectly valid optimisation for the non-flambda compiler too, it just happens that nobody had implemented it (Flambda didn't implement it either, I think, but we know that it should be legal because we have a model).
In fact, if you alter this example very slightly, you can get the allocation pushed down the branch using any moderately recent version of the compiler:

let r : float ref = ref 0.

let rec push_this x =
  let old_val = r.contents in
  let new_val = x +. old_val in
  let success =
    if r.contents == old_val
    then (r.contents <- new_val; true)
    else false
  in
  if not success then push_this x

So my estimation is that Flambda is as safe as the alternative, and that it shouldn't block the 5.0 release.

Regarding [@poll error], you can't make Flambda (or any compiler pass before Polling) understand it because there is no notion of poll in the corresponding intermediate representations. So we're stuck with the current approach of making the annotation imply [@inline never][@local never] and hope that the whole pipeline happens to remove all polls and not insert any new ones (and doesn't remove the attributes).

Finally, a few more examples of (slightly more realistic) code using [@poll error] that I consider badly handled:

let[@poll error] f1 (x : int) (a : int array) =
  a.(x)

module Array = struct
  let[@inline] get x a = Array.get x a
  (* Note: the following definition gives the same result *)
  (* let get = Array.get *)
end

let[@poll error] f2 (x : int) (a : int array) =
  a.(x)

let[@inline] (let*) x f =
  match x with
  | None -> None
  | Some y -> f y

let[@poll error] f3 (x : int option) =
  let* y = x in
  if y > 0 then x else None

The function f1 works as intended, but the function f2 doesn't (except with Flambda 2); and the function f3 will compile correctly with Flambda but not with Closure, which will be unable to remove all the polls.
I'm not saying that there is anything that we need to fix, but in my opinion this shows the limits of the [@poll error] solution. In particular, I don't think we can guarantee that if a piece of code compiles correctly with [@poll error], it will continue to do so in the future.

@Octachron
Member Author

@lthls: First, thank you for your answer.

I mostly wanted to confirm that Flambda is deemed safe by the Flambda team (in the sense that there is no expected difficulty ahead), outside of the time pressure of a meeting.

I am personally convinced by your argument that Flambda uses a stricter set of constraints on effects and coeffects than multicore OCaml requires. And it seems reasonable to consider that any potential bug in Flambda would be not qualitatively different from a bug in the Closure middle-end. Thus I will strike this item from the branching checklist.

@gadmm
Contributor

gadmm commented May 11, 2022

@lthls Reading your examples carefully, I think this is working as intended, since [@poll error] is not meant to remove polls. I agree about the limitations, especially in terms of forward compatibility with respect to reasoning about polling locations, but I do not think this is something [@poll error] was meant to address.

@kayceesrk
Contributor

@Octachron

Regarding the API (Effects, Domains, Atomic): the API docs seem to be in a reasonable state now. Perhaps one thing we could do is point to the corresponding manual pages rather than add a short summary of the module at the top. What I am not sure about is which URL to use for this.

@Octachron
Member Author

Linking to the manual in the API documentation is indeed not that straightforward. I will have a look at adding a link with the right ocamldoc incantation and html post-processing to the Effect module this week.

@OlivierNicole
Contributor

Regarding the manual, we probably want to update this section https://v2.ocaml.org/manual/intfc.html#s%3Ac-gc-harmony, which still talks about the master lock.

@Octachron
Member Author

Rereading this meta-issue, the only remaining high-level items are the documentation issue and the last part of the stdlib audit (mostly unix and buffer). The finer points are better tracked by the 5.0 milestone and the stdlib audit meta-issue. I am thus closing this meta-issue.

@gadmm
Contributor

gadmm commented Oct 4, 2022

The finer points are better tracked by the 5.0 milestone and the stdlib audit meta-issue.

Right, but it seems that the 5.0 milestone is missing #10992 and #11426.

@Octachron
Member Author

For #11426, I can reasonably commit some time to do an audit before (or during) the beta.

I fear, however, that #10992 is beyond the scope of 5.0 in terms of review and development fatigue. I am thus assigning it to the 5.1 milestone.
