
Meta-issue for OCaml 5.0 release goals #11013

Closed
13 of 16 tasks
Octachron opened this issue Feb 14, 2022 · 16 comments

@Octachron
Member

Octachron commented Feb 14, 2022

This is a meta-issue checklist to keep track of the issues that ought to be resolved before branching OCaml 5.0.
(For maintainers: please feel free to edit this post to update the status or descriptions of the various items on this list.)

Platforms

32-bit:

Arm 64

Windows

  • mingw-64 with winpthreads

Middle-end

  • flambda: check optimisation safety

Tooling

Runtime

Stdlib

Documentation

CI

@dra27 added this to the 5.0.0 milestone Mar 16, 2022
@dra27
Member

dra27 commented Mar 21, 2022

The technical documentation for the memory model (updated PLDI paper with object publication) task is unlikely to be doable for 5.0 - are there objections to doing that update after 5.0 is released (but before 5.1)?

@gadmm
Contributor

gadmm commented Mar 26, 2022

I think people were agreeing that #10915 was important for 5.0, and we only proposed that it was delayable because we had no idea how long it would take. Now #11057 is there to fix it, and new bugs in the current implementation were discovered in the process, so I propose to put it on the list.

Can you please also add #10832? (Or maybe it is implicit as a sub-goal of #10960 even if it is not mentioned there?)

@dra27 For the technical documentation, there is not much to do, and it is useful for experts in the community to understand how the memory model works. I don't think it would be very hard to adapt Stephen's post into documentation. A bigger chunk of this task is #10992, and it looks important for 5.0 (it directly affects how users are going to write and port C code). On the other hand, while I suggested the update to the PLDI benchmarks (thinking that it would be easy to do), I would be surprised if an updated PLDI paper was necessary for 5.0.

@Octachron
Member Author

Concerning the flambda side, I think it would be nice to have some written confirmation that the flambda optimisations have been reviewed as safe. This is mostly to confirm that the initial assessment that flambda optimisations are safe still holds (ping @lthls, @mshinwell?).

@lthls
Contributor

lthls commented May 5, 2022

I'm not sure what "reviewed as safe" means. I haven't seen much work on reviewing non-flambda optimisations (for example, simplif.ml, but there are optimisations everywhere in the backend), so I don't even know what such a review would look like.
Flambda can move code around (and remove code), but as long as we don't have a concept of critical sections this should be perfectly safe (the rules for which code can be moved are described by the semantics_of_primitives.ml file, and I believe it has been updated with the correct semantics for the multicore primitives). We can also propagate information on immutable values, but that's normally not a problem. The only exception I'm aware of is Lazy values, which can appear immutable although they are not; as far as I know this is handled by using C primitives for accessing and modifying lazy values (more or less; there are some subtle details here). If additional code using the Obj module has been added to the standard library, it might need reviewing for compatibility with flambda, but I haven't heard about anything like that.
It's possible that there are bugs that will only appear with flambda, as we can hit more corner cases of the semantics, but that's not specific to multicore.
So my opinion on this is that flambda optimisations are assumed to be safe, and unless multicore has changed the semantics of sequential OCaml programs in known ways there isn't anything in particular to review.

@gadmm
Contributor

gadmm commented May 6, 2022

The OCaml memory model is different from the C memory model in that it indeed explicitly restricts the valid transformations in sequential programs in known ways in order to reason about data races, for instance preserving load-to-store ordering. So naively I would think too that there is something to check, but maybe it is obvious for flambda authors that the memory model is respected. Out of curiosity I went to see semantics_of_primitives.ml; loads are marked as having a coeffect and stores as having an effect, so I expect that you already respect the coeffect-to-effect ordering. Do I expect correctly?
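As an illustrative sketch (not part of the original comment; the names are hypothetical), the load-to-store ordering mentioned above means a compiler may not hoist a store above a preceding load, even to a disjoint location:

```ocaml
(* Hypothetical sketch: under the OCaml memory model, a load followed by a
   store must stay in program order. In semantics_of_primitives.ml terms,
   the load is a coeffect and the store is an effect, and effects may not
   move around coeffects. *)
let data : int ref = ref 0
let flag : bool ref = ref false

let observe () =
  let seen = !flag in   (* load: a coeffect *)
  data := 1;            (* store: an effect; may not be hoisted above the load *)
  seen
```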

Regarding Lazy, my takeaway was that there is another issue than the one you describe due to the following polling critical section:

let x = Obj.repr lzv in
(* START no safe points. If a GC occurs here, then the object [x] may be
   short-circuited, and getting the first field of [x] would get us the wrong
   value. Luckily, the compiler does not insert GC safe points at this place,
   so it is ok. *)
let t = Obj.tag x in
if t = Obj.forward_tag then
  (Obj.obj (Obj.field x 0) : 'arg)
(* END no safe points *)

My conclusion was that the [@poll error] attribute is not expressive enough as it stands to express this, as previously discussed in this thread. (On the other hand, the slightly more expressive variant that was previously proposed would be enough.)

@lthls
Contributor

lthls commented May 6, 2022

The OCaml memory model is different from the C memory model in that it indeed explicitly restricts the valid transformations in sequential programs in known ways in order to reason about data races, for instance preserving load-to-store ordering. So naively I would think too that there is something to check, but maybe it is obvious for flambda authors that the memory model is respected. Out of curiosity I went to see semantics_of_primitives.ml; loads are marked as having a coeffect and stores as having an effect, so I expect that you already respect the coeffect-to-effect ordering. Do I expect correctly?

Yes. Effects cannot be moved around anything that has effects or coeffects, and coeffects cannot be moved around anything that has effects. There's no location associated to the effects either: if you update two different references in sequence, even if the compiler could prove that the references do not alias it wouldn't be allowed to swap the updates (unless the references are transformed into mutable variables, but that's another story, and I don't think Flambda 1 does anything clever with mutable variables anyway).
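As a small sketch of the point above (not from the original comment; names are hypothetical), two stores to references are both effects, so they are kept in program order even when they provably do not alias:

```ocaml
(* Hypothetical sketch: both assignments are effects, so flambda keeps
   them in program order even though [r1] and [r2] never alias. *)
let r1 : int ref = ref 0
let r2 : int ref = ref 0

let update () =
  r1 := 1;  (* effect *)
  r2 := 2   (* effect: not swapped with the previous store *)
```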

One thing you may want to take into account is that our model for Flambda has always assumed that an interrupt (signal, GC or whatever) can occur at any time, running arbitrary code. So in a way, our model was already adapted for multicore as it expected even sequential code to be potentially interleaved with anything. However, lots of OCaml programs use immutable data structures that can still be optimised, so that still left us a lot of interesting things to do.

Regarding Lazy, my takeaway was that there is another issue than the one you describe due to the following polling critical section:

let x = Obj.repr lzv in
(* START no safe points. If a GC occurs here, then the object [x] may be
   short-circuited, and getting the first field of [x] would get us the wrong
   value. Luckily, the compiler does not insert GC safe points at this place,
   so it is ok. *)
let t = Obj.tag x in
if t = Obj.forward_tag then
  (Obj.obj (Obj.field x 0) : 'arg)
(* END no safe points *)

My conclusion was that the [@poll error] attribute is not expressive enough as it stands to express this, as previously discussed in this thread. (On the other hand, the slightly more expressive variant that was previously proposed would be enough.)

From what I read in the thread you linked, it seems that you assume that Flambda understands [@poll error]. That's not the case. As far as I'm aware, the only place in the compiler that checks this attribute is the Polling module, which does the actual placement of polls, and occurs later in the backend. All the rest of the compiler simply ignores it. In fact, you can even get the compiler to drop the attribute in some specific cases if you know all the weird optimisations that the compiler performs:

let allocate () = Array.make 10 0
let g () =
  ((fun _ -> allocate ())[@poll error]) 0
let h () = (fun _ -> allocate ())[@poll error]

You won't get any errors on this code, because the attributes were dropped during translation to Lambda code.

If you want to be serious about critical sections, you should introduce primitives in the compiler for that. Right now, the code is safe only because there's no part of the current compiler (except afl instrumentation, but it is disabled on this file) that would insert a poll there. But it would be quite easy for a future patch to break this without anyone noticing.

@gadmm
Contributor

gadmm commented May 9, 2022

From what I read in the thread you linked, it seems that you assume that Flambda understands [@poll error].

I assume Flambda should understand [@poll error]; indeed, remember the earlier problem where Flambda could in theory break atomicity requirements due to inlining: #10035. Also, the original PR #10462 mentions that support for Flambda is needed. I am under the impression that [@poll error] has the same atomicity problem, and that a minimal implementation of [@poll error] in Flambda should prevent inlining. In the code I see that Flambda "understands" [@poll error] at least in the sense that it carries the annotation over to the back-end, and I cannot tell whether this is enough to avoid all inlining as it stands.

Regarding the code in question, I take note that it is safe currently (albeit fragile).

[@poll error] is only meant to be used at top level, I think, which would address your example.

@lthls
Contributor

lthls commented May 10, 2022

I don't seem to be getting my point across. The Flambda team has made a more-or-less formal model of the behaviour of OCaml programs. This model is based on our understanding of the compiler's other passes and optimisations, and is usually stricter than what we are seeing in other parts of the compiler. So whenever I point out that something might break with Flambda, it means that it might also be broken without Flambda. But it's easier to show how to break it with hypothetical Flambda transformations, because it's well-defined which transformations are permitted and which are not.
For example, if you look at the original example in #10035, then pushing the allocation down the branch would have been a perfectly valid optimisation for the non-flambda compiler too, it just happens that nobody had implemented it (Flambda didn't implement it either, I think, but we know that it should be legal because we have a model).
In fact, if you alter this example very slightly, you can get the allocation pushed down the branch using any moderately recent version of the compiler:

let r : float ref = ref 0.

let rec push_this x =
  let old_val = r.contents in
  let new_val = x +. old_val in
  let success =
    if r.contents == old_val
    then (r.contents <- new_val; true)
    else false
  in
  if not success then push_this x

So my estimation is that Flambda is as safe as the alternative, and that it shouldn't block the 5.0 release.

Regarding [@poll error], you can't make Flambda (or any compiler pass before Polling) understand it because there is no notion of poll in the corresponding intermediate representations. So we're stuck with the current approach of making the annotation imply [@inline never][@local never] and hope that the whole pipeline happens to remove all polls and not insert any new ones (and doesn't remove the attributes).

Finally, a few more examples of (slightly more realistic) code using [@poll error] that I consider badly handled:

let[@poll error] f1 (x : int) (a : int array) =
  a.(x)

module Array = struct
  let[@inline] get x a = Array.get x a
  (* Note: the following definition gives the same result *)
  (* let get = Array.get *)
end

let[@poll error] f2 (x : int) (a : int array) =
  a.(x)

let[@inline] (let*) x f =
  match x with
  | None -> None
  | Some y -> f y

let[@poll error] f3 (x : int option) =
  let* y = x in
  if y > 0 then x else None

The function f1 works as intended, but the function f2 doesn't (except with Flambda 2); and the function f3 will compile correctly with Flambda but not with Closure, which will be unable to remove all the polls.
I'm not saying that there is anything that we need to fix, but in my opinion this shows the limits of the [@poll error] solution. In particular, I don't think we can guarantee that if a piece of code compiles correctly with [@poll error], it will continue to do so in the future.

@Octachron
Member Author

@lthls: First, thank you for your answer.

I mostly wanted to confirm that Flambda is deemed safe by the Flambda team (in the sense that there is no expected difficulty ahead), outside of the time pressure of a meeting.

I am personally convinced by your argument that Flambda uses a stricter set of constraints on effects and coeffects than multicore OCaml requires. And it seems reasonable to consider that any potential bug in Flambda would be not qualitatively different from a bug in the Closure middle-end. Thus I will strike this item from the branching checklist.

@gadmm
Contributor

gadmm commented May 11, 2022

@lthls Reading your examples carefully, I think this is working as intended, since [@poll error] is not meant to remove polls. I agree about the limitations, especially in terms of forward compatibility with respect to reasoning about polling locations, but I do not think this is something [@poll error] was meant to address.

@kayceesrk
Contributor

@Octachron

Regarding the API (Effects, Domains, Atomic): the API docs seem to be in a reasonable state now. Perhaps one thing we could do is point to the corresponding manual pages rather than add a short summary of the module at the top. What I am not sure about is which URL to use for this.

@Octachron
Member Author

Linking to the manual in the API documentation is indeed not that straightforward. I will have a look at adding a link with the right ocamldoc incantation and html post-processing to the Effect module this week.

@OlivierNicole
Contributor

Regarding the manual, we probably want to update this section https://v2.ocaml.org/manual/intfc.html#s%3Ac-gc-harmony, which still talks about the master lock.

@Octachron
Member Author

Rereading this meta-issue, the only remaining high-level items are the documentation issue and the last part of the stdlib audit (mostly unix and buffer). The finer points are better tracked by the 5.0 milestone and the stdlib audit meta-issue. I am thus closing this meta-issue.

@gadmm
Contributor

gadmm commented Oct 4, 2022

The finer points are better tracked by the 5.0 milestone and the stdlib audit meta-issue.

Right, but it seems that the 5.0 milestone is missing #10992 and #11426.

@Octachron
Member Author

For #11426, I can reasonably commit some time to do an audit before (or during) the beta.

I fear, however, that #10992 is beyond the scope of 5.0 in terms of review and development fatigue. I am thus assigning it to the 5.1 milestone.
