Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline and specialize Stdlib.{min,max} #10251

Closed
wants to merge 7 commits into from

Conversation

alainfrisch
Copy link
Contributor

@alainfrisch alainfrisch commented Feb 24, 2021

Fix #5541, fix #4808.

Stdlib.{min,max} are currently not inlined; this is because their size is above the default inlining threshold. Compiling Stdlib with -inline 2 would be enough to have them inlined, but since type-specialization of primitives happen before inlining, one ends up with a C call to the generic comparison, even if arguments are integers or floats.

This PR implements Stdlib.{min,max} with two new %-primitives %min, %max, applies the same specialization as for comparison primitives, then expands them to their definition when generating the Lambda code. (No new runtime primitive has been added, so no need to adapt js_of_ocaml for instance.)

Some micro-bencharks:

(* int64 *)
let r = ref 0L in for i = 1 to 1_000_000_000 do r := max !r (Int64.of_int i) done
(* float *)
let r = ref 0. in for i = 1 to 1_000_000_000 do r := max !r (float i) done
(* int *)
let r = ref 0 in for i = 1 to 1_000_000_000 do r := max !r i done
(* string *)
let r = ref "" in for i = 1 to 1_000_000_000 do r := max !r (if i mod 2 = 1 then "Hello" else "World") done
trunk PR speedup
int64 15s 0.875s x17
float 13.6s 0.98s x13
int 6.0s 0.48s x12
string 16.6s 5.1s x3

No noticeable impact for non-specialized cases.

Notes for the reviewer:

  • The first two commits are general cleanup on the primitive specialization code; they potentially lose some sharing of primitive descriptions in lambda code, but I suspect this is completely negligible.
  • The real stuff are the third and fifth commits, with a bootstrap of the compiler in-between.
  • There is no explicit performance test (I haven't seen any other test doing explicit time measurement).

Also, one could consider further improvements:

  • Better code generation in the backend (a branchless implementation for integers, for instance).
  • Explicit type specialized versions {Int,Float,...}.{min,max}

@alainfrisch
Copy link
Contributor Author

@xavierleroy : since you were the one suspending #5541 do you have an opinion on this PR? ("I'm not expecting a solution soon, which is why I'm putting this PR in suspended state." => indeed, that was 9 years ago!)

@xavierleroy
Copy link
Contributor

#5541 was 9 years ago... What I meant at the time is that type specialization after inlining the (OCaml) definition of min and max was far away in the future, and still is (?). You sidestep the problem by turning min and max into primitives. On the one hand it does the job. On the other hand, it feels sad that operations that are perfectly definable in OCaml are being turned into primitives just to get better optimization.

@alainfrisch
Copy link
Contributor Author

I perfectly agree, and I'll be happier with an inliner that would having to do that. But it also feels sad to know that the polymorphic comparison will be involved when min/max are used on simple numbers. Do you think the workaround is decent enough? If the inliner ever gets better, we can always get rid of it. Or perhaps one should just expose specialized functions {Int,Int32,Int64,Float}.{min,max} (and deprecate the functions in Stdlib?).

(One could argue that not is also definable in OCaml! And all comparison could be derived from the single %compare operation -- except that's not really the case, but oh well.)

@alainfrisch
Copy link
Contributor Author

@xavierleroy : what do you think we should do here? Merge this PR, expose explicit functions like Int.{min,max} (also in Float, Int64, etc), accept the bad performance when people use min/max on numbers until we get a better inliner?

@xavierleroy
Copy link
Contributor

I'm still not enthusiastic about adding "min" and "max" as primitives with specific rules for type specialization.

Half of the dev team wants to kill polymorphic equality. While I stand up for polymorphic equality, I agree that polymorphic "min" and "max" are questionable and it would make sense to remove them. Adding special compiler support for them goes against this grain.

Having specialized min and max functions in Float, Int, Int32, etc, would make more sense to me. Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

@dbuenzli
Copy link
Contributor

Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

That already happened, the discussion starts here.

@xavierleroy
Copy link
Contributor

xavierleroy commented Apr 30, 2021

Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

That already happened, the discussion starts here.

Right, I forgot about it. The current semantics (return NaN as soon as one of the arguments is NaN) looks healthy to me (*). IEEE 754-2008 made different choices, which were partially reverted in IEEE 754-2019, leaving me confused.

(*) Even though it doesn't map to any "fmin" or "fmax" processor instruction I known of. so don't expect a super fast implementation any time soon.

@xavierleroy
Copy link
Contributor

If anyone is interested, here is a lot of background info on FP min/max: http://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

@dbuenzli
Copy link
Contributor

(*) Even though it doesn't map to any "fmin" or "fmax" processor instruction I known of. so don't expect a super fast implementation any time soon.

Note that this PR also introduced the equivalent to the C99 fmin and fmax (NaN as missing data) under the names Float.{min,max}_num.

@alainfrisch
Copy link
Contributor Author

Having specialized min and max functions in Float, Int, Int32, etc, would make more sense to me.

Ok, this makes sense! I've opened #10389 accordingly.

The first two commits are general code cleanup (which was useful to implement this PR), but I don't think it's worth spending reviewing time on them, so I'm closing this PR.

@alainfrisch alainfrisch closed this May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pervasives.min/max are not inlined better compilation of min and max
3 participants