Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow more CSE of loads from immutable blocks #367

Closed

Conversation

mshinwell
Copy link
Contributor

Morally speaking, this patch is small: it changes CSEgen so that values computed by loading from an immutable place are not killed by stores.

The specific near-term aim is to permit accesses to closure environments to be subject to CSE.

With flambda I measure this patch giving a 2.5% speedup on Knuth-Bendix, which is quite significant for a small change.

@lpw25 has a more complete implementation of the front-end portion of this patch on the multicore branch that, instead of adding Pimmutable_field, alters Pfield to always carry such information (together with a flag saying whether the value is always immediate). This could have fairly wide applicability, e.g. in reading from record fields, which more often than not are immutable. I would like to consider merging that patch in the future---I haven't got time right now---but I think this one is a reasonable step along the way.

(I will adjust the code for the other backends if this is accepted.)

@mshinwell
Copy link
Contributor Author

(I should add that Leo's patch is a prerequisite for multicore.)

@mshinwell mshinwell changed the title Allow CSE of loads from immutable blocks Allow more CSE of loads from immutable blocks Dec 22, 2015
@xavierleroy
Copy link
Contributor

Various remarks in random order:

  • I thought about doing this kind of CSE a while ago but remembered that it would break in the presence of tail-recursion modulo constructors, either done by the compiler (in the future) or by Obj witchcraft (as in Batteries).
  • Is this really a prerequisite for flambda? Could we please try to merge some version of flambda that works, before embarking in yet another optimization?
  • This affects the platform-specific parts of the backend, but only AMD64 is done yet.
  • The speedup on KB is probably noise, since KB uses essentially no mutable data.
  • On the other hand, Bigarray-based codes could benefit significantly, if the bigarray dimensions and data pointer are accessed with immutable loads.
  • In Cmmgen, the immutable load on line 570 looks wrong (we're reading from a float array!)
  • The rest of Cmmgen errs on the side of caution, which is better than the opposite. Still, consider some more Immutable loads, esp. related to Bigarrays. Also, reading the header word of a block should be Immutable! (Obj.truncate notwithstanding.)
  • In Lambda, why not Pfield of int * mutable_flag for symmetry with Pmakeblock ?
  • Asttypes.Immutable is quite a mouthful. Do yourself a favor and open Asttypes.

@mshinwell
Copy link
Contributor Author

Yes, this one should probably not be a prerequisite for flambda actually. Thanks for the other comments; I will make Cmmgen less conservative and get some numbers on other benchmarks. (I'm not sure that KB is within noise actually, although I take your point.)

@mshinwell
Copy link
Contributor Author

Oh, and as regards header words---I thought we couldn't assume that in the compiler due to Obj.truncate.

@xavierleroy
Copy link
Contributor

Concerning Obj.truncate and CSE, it is a runtime function, not an inlined primitive, and we don't CSE across function calls anyway (because it hurts register allocation most of the time). Obj.set_field, on the other hand, is a lot more problematic...

@gasche
Copy link
Member

gasche commented Dec 22, 2015

I thought about doing this kind of CSE a while ago but remembered that it would break in the presence of tail-recursion modulo constructors, either done by the compiler (in the future) or by Obj witchcraft (as in Batteries).

The unsafe cast for destination-passing lists in Batteries is used to coerce a variable from a mutable type to an immutable one. In particular, no value is created at an immutable type then coerced to mutable, so there is no read on the immutable value that could persist across a mutation (as no mutation are ever done after the conversion is made).

There is another part of Batteries that coerces a variable at an immutable type to a mutable type, and then does a mutation. We are careful never to create value at the immutable type (they are create at the mutable type, then coerced). In any case, the soundness of this unsafe write is justified by the fact that it does not change the value semantically (it is a rebalancing in a splay tree), so in particular optimizing a read to a stale value (ignoring a rebalancing) is sound and correct, it is just less efficient in one particular case. (To avoid this, we would need to refine the analysis to preserve the immutable loads at a store, but invalidate those that can be proved to be at the same location; or to not propagate "immutable" type information if a value that can be proved equal is typed "mutable" in another part of the program. I don't know precisely which kind of type-directed propagation of mutability the compiler does today (as opposed to mutability of constructors or fields), there may be none at all.)

@damiendoligez damiendoligez added this to the 4.04-or-later milestone Jan 22, 2016
@damiendoligez damiendoligez removed this from the 4.04 milestone Aug 2, 2016
@xavierleroy
Copy link
Contributor

This pull request has been asleep for more than a year. What do we do with it? I note that part of this PR (the mutability information over loads in Cmm) is already in 4.05.

@mshinwell
Copy link
Contributor Author

A very conservative version of the mutability-in-Cmm stuff is in 4.05, yes. I need to discuss a couple of things with people about the remainder of this patch and will update this pull request again shortly.

@mshinwell
Copy link
Contributor Author

@let-def and I discussed the interaction between this kind of CSE and TRMC. The conclusion was that it would be safe.

I will clean up this patch to remove the parts that have been superceded by the mutability annotations now in trunk.

As far as header words go: @stedolan would like to prohibit the changing of the tag on existing blocks. A new function would be provided (if I remember correctly) to copy a block and give the copy a new tag. @alainfrisch I seem to remember you make some use of tag changing: would such a function suffice for your purposes? If we could do this then I think we can CSE loads of header words.

@mshinwell
Copy link
Contributor Author

@alainfrisch Ping, on the tag changing thing above.

@alainfrisch
Copy link
Contributor

would such a function suffice for your purposes

Isn't it the case that since Obj.set_tag refers to a C function and is not an inlined primitive, we don't do CSE anyway through it? This is what a previous note from @xavierleroy suggests (about truncate). So I'm not sure to understand the problem.

Anyway, it would have to use a function that copies a block and changes its tag, instead of set_tag, this would create some extra allocations in rather critical parts, but I don't want our use of undocumented hacks to stop progress, so please consider we are ok with the change.

FWIW, our main use of set_tag is to set Object_tag on freshly created records whose second field is an integer (most often a unique id). This is to be able to use the generic comparison/hash functions on complex data structures referring to those records. We could always patch our local compiler so that a specific attribute on a record type declaration would have the effect of using Object_tag instead of 0 when creating values of this type. This would be more efficient than the current situation.

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 27, 2017
@stedolan
Copy link
Contributor

stedolan commented Oct 2, 2017

@alainfrisch Would it be acceptable for there to be a new primitive Obj.with_tag which copies a block and changes its tag, along with an optimisation that turns (Obj.with_tag t1 (makeblock t2 (...))) into (makeblock t1 (...))? This way, we wouldn't have to worry about tags changing at runtime, but the pattern of changing the tag of a just-created record would continue to involve no extra allocations.

I care about this for multicore, where Obj.set_tag even less safe than it is currently, since other threads doing GC may access the tag in parallel.

@alainfrisch
Copy link
Contributor

Don't worry about our use cases. As said, if what we do currently is no longer available, we will just add the ability to choose explicitly the tag when declaring the record type. We already maintain a custom version of the compiler and this extra patch seems tiny enough.

@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone Jun 1, 2018
@xavierleroy
Copy link
Contributor

Could we get an update on this PR? For the record, Obj.with_tag has been available since 4.09, and Obj.set_tag is deprecated.

@damiendoligez
Copy link
Member

@mshinwell ping for benchmarks

anmolsahoo25 pushed a commit to anmolsahoo25/ocaml that referenced this pull request Aug 25, 2020
mshinwell added a commit to mshinwell/ocaml that referenced this pull request Mar 26, 2021
@gasche
Copy link
Member

gasche commented Apr 17, 2021

This PR appears to be stale since at least 2017 and has been independently reimplemented by #9562. I propose to close it in a few days, unless someone complains.

@xavierleroy
Copy link
Contributor

This PR (#367) includes propagation of mutability information down to Cmm which was reimplemented later in #966 and merged in trunk. So, a large part of this PR is obsolete. #9562 starts with a Cmm contaning mutability information on loads, so it's a lot closer to the current code base. I move to close this PR (#367).

@gasche
Copy link
Member

gasche commented Apr 18, 2021

Wish granted.

@gasche gasche closed this Apr 18, 2021
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
* model the margin bottom parameter
* remove webpack fs shim, not needed now
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Signed-off-by: Christine Rose <professor.rose@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants