Inline and specialize Stdlib.{min,max} #10251

alainfrisch · 2021-02-24T11:35:13Z

Stdlib.{min,max} are currently not inlined; this is because their size is above the default inlining threshold. Compiling Stdlib with -inline 2 would be enough to have them inlined, but since type-specialization of primitives happen before inlining, one ends up with a C call to the generic comparison, even if arguments are integers or floats.

This PR implements Stdlib.{min,max} with two new %-primitives %min, %max, applies the same specialization as for comparison primitives, then expands them to their definition when generating the Lambda code. (No new runtime primitive has been added, so no need to adapt js_of_ocaml for instance.)

Some micro-bencharks:

(* int64 *)
let r = ref 0L in for i = 1 to 1_000_000_000 do r := max !r (Int64.of_int i) done
(* float *)
let r = ref 0. in for i = 1 to 1_000_000_000 do r := max !r (float i) done
(* int *)
let r = ref 0 in for i = 1 to 1_000_000_000 do r := max !r i done
(* string *)
let r = ref "" in for i = 1 to 1_000_000_000 do r := max !r (if i mod 2 = 1 then "Hello" else "World") done

	trunk	PR	speedup
int64	15s	0.875s	x17
float	13.6s	0.98s	x13
int	6.0s	0.48s	x12
string	16.6s	5.1s	x3

No noticeable impact for non-specialized cases.

Notes for the reviewer:

The first two commits are general cleanup on the primitive specialization code; they potentially lose some sharing of primitive descriptions in lambda code, but I suspect this is completely negligible.
The real stuff are the third and fifth commits, with a bootstrap of the compiler in-between.
There is no explicit performance test (I haven't seen any other test doing explicit time measurement).

Also, one could consider further improvements:

Better code generation in the backend (a branchless implementation for integers, for instance).
Explicit type specialized versions {Int,Float,...}.{min,max}

alainfrisch · 2021-03-08T08:17:49Z

@xavierleroy : since you were the one suspending #5541 do you have an opinion on this PR? ("I'm not expecting a solution soon, which is why I'm putting this PR in suspended state." => indeed, that was 9 years ago!)

xavierleroy · 2021-03-08T18:18:31Z

#5541 was 9 years ago... What I meant at the time is that type specialization after inlining the (OCaml) definition of min and max was far away in the future, and still is (?). You sidestep the problem by turning min and max into primitives. On the one hand it does the job. On the other hand, it feels sad that operations that are perfectly definable in OCaml are being turned into primitives just to get better optimization.

alainfrisch · 2021-03-08T20:09:04Z

I perfectly agree, and I'll be happier with an inliner that would having to do that. But it also feels sad to know that the polymorphic comparison will be involved when min/max are used on simple numbers. Do you think the workaround is decent enough? If the inliner ever gets better, we can always get rid of it. Or perhaps one should just expose specialized functions {Int,Int32,Int64,Float}.{min,max} (and deprecate the functions in Stdlib?).

(One could argue that not is also definable in OCaml! And all comparison could be derived from the single %compare operation -- except that's not really the case, but oh well.)

alainfrisch · 2021-04-28T20:35:20Z

@xavierleroy : what do you think we should do here? Merge this PR, expose explicit functions like Int.{min,max} (also in Float, Int64, etc), accept the bad performance when people use min/max on numbers until we get a better inliner?

xavierleroy · 2021-04-29T17:41:38Z

I'm still not enthusiastic about adding "min" and "max" as primitives with specific rules for type specialization.

Half of the dev team wants to kill polymorphic equality. While I stand up for polymorphic equality, I agree that polymorphic "min" and "max" are questionable and it would make sense to remove them. Adding special compiler support for them goes against this grain.

Having specialized min and max functions in Float, Int, Int32, etc, would make more sense to me. Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

dbuenzli · 2021-04-29T18:49:20Z

Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

That already happened, the discussion starts here.

xavierleroy · 2021-04-30T16:19:13Z

Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

That already happened, the discussion starts here.

Right, I forgot about it. The current semantics (return NaN as soon as one of the arguments is NaN) looks healthy to me (*). IEEE 754-2008 made different choices, which were partially reverted in IEEE 754-2019, leaving me confused.

(*) Even though it doesn't map to any "fmin" or "fmax" processor instruction I known of. so don't expect a super fast implementation any time soon.

xavierleroy · 2021-04-30T16:27:16Z

If anyone is interested, here is a lot of background info on FP min/max: http://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

dbuenzli · 2021-04-30T16:37:29Z

(*) Even though it doesn't map to any "fmin" or "fmax" processor instruction I known of. so don't expect a super fast implementation any time soon.

Note that this PR also introduced the equivalent to the C99 fmin and fmax (NaN as missing data) under the names Float.{min,max}_num.

alainfrisch · 2021-05-03T07:58:06Z

Having specialized min and max functions in Float, Int, Int32, etc, would make more sense to me.

Ok, this makes sense! I've opened #10389 accordingly.

The first two commits are general code cleanup (which was useful to implement this PR), but I don't think it's worth spending reviewing time on them, so I'm closing this PR.

Alain Frisch added 7 commits March 8, 2021 09:11

Code cleanup

3494c4f

Drop (probably premature) optimization to simplify the code

35c18c4

Add %min and %max internal primitives

65c2e36

Bootstrap

9e3e9ba

Implement Stdlib.{min,max} using the new primitives

5f78c73

Add test

f71299e

Changelog

90cf5e7

alainfrisch force-pushed the afrisch_inline_min_max branch from 386c1d6 to 90cf5e7 Compare March 8, 2021 08:12

alainfrisch mentioned this pull request May 3, 2021

Expose {Int,Int32,Int64,Nativeint}.{min,max} #10389

Closed

alainfrisch closed this May 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inline and specialize Stdlib.{min,max} #10251

Inline and specialize Stdlib.{min,max} #10251

alainfrisch commented Feb 24, 2021 •

edited

alainfrisch commented Mar 8, 2021

xavierleroy commented Mar 8, 2021

alainfrisch commented Mar 8, 2021

alainfrisch commented Apr 28, 2021

xavierleroy commented Apr 29, 2021

dbuenzli commented Apr 29, 2021

xavierleroy commented Apr 30, 2021 •

edited

xavierleroy commented Apr 30, 2021

dbuenzli commented Apr 30, 2021

alainfrisch commented May 3, 2021

Inline and specialize Stdlib.{min,max} #10251

Inline and specialize Stdlib.{min,max} #10251

Conversation

alainfrisch commented Feb 24, 2021 • edited

alainfrisch commented Mar 8, 2021

xavierleroy commented Mar 8, 2021

alainfrisch commented Mar 8, 2021

alainfrisch commented Apr 28, 2021

xavierleroy commented Apr 29, 2021

dbuenzli commented Apr 29, 2021

xavierleroy commented Apr 30, 2021 • edited

xavierleroy commented Apr 30, 2021

dbuenzli commented Apr 30, 2021

alainfrisch commented May 3, 2021

alainfrisch commented Feb 24, 2021 •

edited

xavierleroy commented Apr 30, 2021 •

edited