Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve code-generation for inlined comparisons #10228

Merged
merged 3 commits into from Feb 22, 2021

Conversation

stedolan
Copy link
Contributor

Non-Flambda compilers currently generate terrible code when functions involving comparisons get inlined:

let int_eq (a : int) (b : int) = (a = b)
let pair_eq (a0, a1) (b0, b1) =
  int_eq a0 b0 && int_eq a1 b1
let same p q =
  if pair_eq p q then "same" else "diff"

The amd64 assembly for "same" is:

	movq	%rax, %rdi
	movq	(%rbx), %rax
	movq	(%rdi), %rsi
	cmpq	%rax, %rsi
	sete	%al
	movzbq	%al, %rax
	leaq	1(%rax,%rax), %rax
	cmpq	$1, %rax
	je	.L106
	movq	8(%rbx), %rax
	movq	8(%rdi), %rbx
	cmpq	%rax, %rbx
	sete	%al
	movzbq	%al, %rax
	leaq	1(%rax,%rax), %rax
	cmpq	$1, %rax
	je	.L106
	movq	camlBr__1@GOTPCREL(%rip), %rax
	ret
	.align	4
.L106:
	movq	camlBr__2@GOTPCREL(%rip), %rax
	ret

The cmpq/sete/movzbq/leaq/cmpq sequence is: do the comparison, convert the result to an integer, tag the integer, compare the tagged integer with zero.

The reason this is so longwinded is that after inlining, a Clambda expression of the form if (let x = ... in ... = ...) then ... else ... arises. The Cmm generator doesn't know what to do with a let expression as a conditional, and gives up.

The Cmm generator already has lots of optimisations for particular forms of if. For instance, this is why the code generated for && above is good, involving direct branching and no conversion / tagging. This patch adds another case to this optimisation, to handle if let, converting:

if (let x = E in COND) then THEN else ELSE
==>
let x = E in (if COND then THEN else ELSE)

(In normal OCaml, this is slightly dodgy transformation as you might shadow some other x. In Cmm, names are unique so it's fine)

With this transformation, the generated code is:

	movq	(%rbx), %rdi
	movq	(%rax), %rsi
	cmpq	%rdi, %rsi
	jne	.L106
	movq	8(%rbx), %rbx
	movq	8(%rax), %rax
	cmpq	%rbx, %rax
	jne	.L106
	movq	camlBr__1@GOTPCREL(%rip), %rax
	ret
	.align	4
.L106:
	movq	camlBr__2@GOTPCREL(%rip), %rax
	ret

@gasche
Copy link
Member

gasche commented Feb 16, 2021

Doesn't the same problem come up with arithmetic optimizations, etc.? (I wonder if it would be possible to keep track of an "application context" in the inlining logic, and extrude inlined let-bindings out of the application context.)

@chambart
Copy link
Contributor

I didn't try this example, but I suppose that the reason that flambda handles that properly is due to the un-ANF pass. This kind of problems would arise everywhere in a term in ANF style. So we had to add this pass to 'compact' every terms. This is a clambda to clambda pass. So it might be possible to simply apply that pass for the non-flambda backend too.

Since I didn't check, it might be necessary to turn the code in ANF before applying un-ANF for it to be efficient in that kind of cases (to lift the let out of the if condition)

@chambart
Copy link
Contributor

Just to be clear, I don't suggest we do that, this fix seems to be sufficient for that kind of cases. It might just handle a few more situations that we didn't notice yet. But they probably don't appear very often.

@chambart
Copy link
Contributor

By the way I did read the diff, It is ok.

@stedolan
Copy link
Contributor Author

Doesn't the same problem come up with arithmetic optimizations, etc.? (I wonder if it would be possible to keep track of an "application context" in the inlining logic, and extrude inlined let-bindings out of the application context.)

Yes, it does. For instance, this generates two additions rather than one:

let plus1 x = x + 1
let plus2 x = plus1 (plus1 x)

But conditionals are much worse: it's expensive to get the result of a conditional into a register, since you need to extract flags + sign extend + tag. If you're just going to compare again, that's a lot of wasted work / code size. Missing an opportunity for folding some arithmetic together is also bad, but not nearly as bad.

@chambart I don't think Un_anf will necessarily help here. I want to hoist let out of if even when it is not used linearly:

if (let x = !r in x.foo = x.bar) then ...
==>
let x = !r in if x.foo < x.bar then .. .

If I understand correctly, the only thing that Un_anf can do is to remove the let by substituting it away, which isn't a good idea here (I don't want to duplicate the load).

(However, I agree that applying Un_anf to Clambda code generated by closure.ml might well be a good idea anyway)

@chambart
Copy link
Contributor

chambart commented Feb 16, 2021

@chambart I don't think Un_anf will necessarily help here. I want to hoist let out of if even when it is not used linearly:

That's why I think the roundtrip to ANF and then back to compact expression might be useful. That's kind of the point of The essence of compiling with continuation. Well normaly you don't need to go back to expression form, but the way selectgen works requires it.

begin match str, boxed_number with
| Immutable, _ -> Clet (v, cexp, body)
| Mutable, bn -> Clet_mut (v, typ_of_boxed_number bn, cexp, body)
end

and transl_let env str kind id exp body =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep the name transl_let for what is now called transl_let' and call it directly from the only call site (for dealing with Ulet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely a good idea to lift "let" bindings in this case. There may be other cases, we'll see later. On the other hand, full A-normalization is probably too costly and too risky (it can increase lifetimes of heap-allocated data, if I'm not mistaken).

I agree with @alainfrisch's suggestion to not introduce transl_let' and just change the interface of transl_let.

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I pushed a Changes update directly and will now merge.

@xavierleroy xavierleroy merged commit 000564e into ocaml:trunk Feb 22, 2021
garrigue pushed a commit to garrigue/ocaml that referenced this pull request Mar 3, 2021
Compile `if (let x = E in COND) then IFSO else IFNOT` like `let x = E in if COND then IFSO else IFNOT`.
smuenzel pushed a commit to smuenzel/ocaml that referenced this pull request Mar 30, 2021
Compile `if (let x = E in COND) then IFSO else IFNOT` like `let x = E in if COND then IFSO else IFNOT`.
stedolan added a commit to stedolan/ocaml that referenced this pull request May 24, 2022
…28) (ocaml#563)

* Improve code-generation for inlined comparisons (ocaml#10228)

Compile `if (let x = E in COND) then IFSO else IFNOT` like `let x = E in if COND then IFSO else IFNOT`.

Co-authored-by: Stephen Dolan <sdolan@janestreet.com>
lpw25 pushed a commit to lpw25/ocaml that referenced this pull request Jun 21, 2022
…t upstream PR#10228) (ocaml#563)

* Improve code-generation for inlined comparisons (ocaml#10228)

Compile `if (let x = E in COND) then IFSO else IFNOT` like `let x = E in if COND then IFSO else IFNOT`.

Co-authored-by: Stephen Dolan <sdolan@janestreet.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants