Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[comments only] matching.ml: explain the compilation strategy for switches on constructors #10512

Merged

Conversation

gasche
Copy link
Member

@gasche gasche commented Jul 15, 2021

I'm working on constructor unboxing with @nchataing, and this involves looking at the lambda/matching.ml machinery. There are some decisions / design choices around the compilation of switches that are not obvious -- I understood them after trying some changes and realizing they were not ideal. This comments-only PR tries to document and explain the current state of this code.

cc @maranget @xavierleroy

In one case there is an optimization that conflicts with the general explanation I propose -- that we want to avoid generating several bytecode instructions. I believe that this optimization could and probably should be removed, but the details are a bit subtle and I left this idea for a future PR.

@gasche
Copy link
Member Author

gasche commented Jul 15, 2021

P.S.: One reason for sending this PR is that having a documentation of the current code would help our implementation (with @nchataing) of matching on unboxed constructors. Having would have helped us in the past, but now we can update the documentation as we change things, it serves as a useful sanity check, and it will also help reviewers of our work.

@gasche gasche force-pushed the constructor-matching-compilation-comment branch from dd72c8f to 93b78ad Compare July 15, 2021 11:25
@gasche
Copy link
Member Author

gasche commented Jul 16, 2021

After more tinkering, I believe that the idea of preferring switch in bytecode is not accurate: in very-sparse cases where most of the cases on one of the two domains are missing, switch generates a large bytecode instruction that lists many exit labels, when a test followed by a better compilation runs slightly slower but can reduce bytecode size significantly.

An edge case of interest is when cstr_nonconsts is large (many block constructors exist at this type), but consts = [] (none of them are matched in this pattern); this occurs several times in the compiler codebase due to programs of the form match token with SEMISEMI -> true | _ -> false. In this edge case, the current optimization when same_actions nonconsts = Some act generates better bytecode than a switch -- otherwise it is worse, but the other cases happen rarely. I don't know whether this is intentional.

This sounds like something to discuss with @maranget.

@gasche gasche force-pushed the constructor-matching-compilation-comment branch from 93b78ad to d6a0f14 Compare September 14, 2021 21:09
@gasche
Copy link
Member Author

gasche commented Sep 14, 2021

I tweaked the comments to explain why we probably want the test-generating approach when all non-constant actions are shared. I discussed the matter with @maranget yesterday, and would be interested in a review :-)

@gasche gasche force-pushed the constructor-matching-compilation-comment branch from d6a0f14 to affe759 Compare September 14, 2021 21:13
@gasche
Copy link
Member Author

gasche commented Oct 2, 2021

I'm still looking for reviews (in particular approvals!) on this comment-only PR. @maranget (or @trefis maybe), would you be interested?

@trefis
Copy link
Contributor

trefis commented Oct 4, 2021

(I had a look, but am not knowledgeable enough to approve or reject)

@lthls
Copy link
Contributor

lthls commented Oct 4, 2021

One small comment: despite the message (Optimizations that only work well for one backend should be done in the backend.), these comments only discuss bytecode compilation. In particular, the optimisation that compiles matches on lists or options to a single Lifthenelse is bad for the native backend, and I wish this was documented better.

@gasche
Copy link
Member Author

gasche commented Oct 4, 2021

@lthls can you explain why this optimization would be bad for the native backend? This is news to me.

@lthls
Copy link
Contributor

lthls commented Oct 4, 2021

First, with the native backend if x then ... is compiled to if x == 1 then ... while if isint x then ... is compiled to if x & 1 then .... Which one is better depends on the architecture, as the equality can be encoded in most architectures with a single instruction, but on x86 at least the use of byte registers can make the encoding of x & 1 more compact.
I don't think it makes much of a difference in the end, but generating Lswitch instead of Lifthenelse would in theory allow the backend to choose the best implementation.

But the real issue is that it introduces non-boolean arguments to Lifthenelse, and this in turns disables some optimisations in the backend. As an example, Cmmgen.transl_if has code to translate if x then true else false to x, but it has to be disabled in most cases because unless x is a primitive operation that is guaranteed to return a boolean, we can't be sure that the optimisation is correct.
This gets particularly annoying with Flambda 2, so we've disabled non-boolean Lifthenelse generation when the backend is Flambda 2 (though that's not relevant here yet).

@gasche
Copy link
Member Author

gasche commented Oct 4, 2021

Two remarks:

  • It would be easy to check Clflags.native to change the code generation on native-code; why don't you suggest this if you think it would improve things? (My intuition is that maybe something could be improved in the backend, to better results, but maybe only generating Lswitch would still be a right default.)
  • I think another point on Lifthenelse is that the constant/affine optimization only happens in the backend on Lswitch, not Lifthenelse; but then again maybe this could be fixed in the native backend, I'm not sure what is the right level to change/improve things.

But the real issue is that it introduces non-boolean arguments to Lifthenelse

We could introduce an iszero primitive to generate Lifthenelse(iszero(arg), e1, e2).

@lthls
Copy link
Contributor

lthls commented Oct 4, 2021

It would be easy to check Clflags.native to change the code generation on native-code; why don't you suggest this if you think it would improve things? (My intuition is that maybe something could be improved in the backend, to better results, but maybe only generating Lswitch would still be a right default.)

I'll see if I can make a PR out of our relevant patches. But I think that your original comment that backend-specific optimisations should be done in the relevant backend is correct, and that moving some of the code in matching.ml to bytegen.ml would make sense too.

We could introduce an iszero primitive to generate Lifthenelse(iszero(arg), e1, e2).

For the native backend at least, keeping the original Lswitch is better because it has more information. If that doesn't work, a primitive isnonzero could be helpful indeed.

@gasche
Copy link
Member Author

gasche commented Mar 5, 2023

I rebased the PR on top of the current trunk. @lthls, would you mind having another look?

My understanding is that @lthls objections to the documentation were mostly fixed by his own follow-up PR #10681. In that PR we optimized native code generation slightly more, and we simplified the assumptions between bytecomp/matching and the backend. One place remains in matching.ml where the interests of bytecode and native compilations diverge, but this place is clearly marked by a Clflags.native_code conditional with explanatory comments.

All this to say: to me, the rebased PR is in a good state and the proposed documentation improves the compiler codebase.

@gasche gasche force-pushed the constructor-matching-compilation-comment branch from 454593a to 5697dfd Compare March 5, 2023 13:09
@gasche gasche force-pushed the constructor-matching-compilation-comment branch from 5697dfd to e870586 Compare March 5, 2023 13:10
Copy link
Contributor

@lthls lthls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still annoyed to have a comment saying that backend-specific optimisations should stay in their respective backends immediately followed by two bytecode-specific optimisations, but I can live with that. I may propose a PR moving bytecode-specific optimisations to the bytecode backend, where they belong, later.

(* This case is very frequent, it corresponds to
options and lists. *)
(* Keeping the Pisint test would make the bytecode
slightly worse, but it lets the native compilre generate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
slightly worse, but it lets the native compilre generate
slightly worse, but it lets the native compiler generate

@gasche
Copy link
Member Author

gasche commented Mar 5, 2023

Thanks! I like how productive you are when my comments annoy you. I should write more.

(I'll merge now and fix the typo later, because.)

@gasche gasche merged commit 6a5d7ab into ocaml:trunk Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants