Slower compilation 4.07 => 4.08 #8776

alainfrisch · 2019-06-28T13:57:24Z

This is to document one case of significant degradation in compilation time between 4.07 and 4.08, on a synthetic example (originally created to stress-tess #8774). In case someone wants to investigate further...

Results:

4.07: 10s
4.08, trunk: 19s

Reproduction:

After a make world, run:

 boot/ocamlrun ./ocaml -I stdlib gen.ml > bar.ml && time boot/ocamlrun ./ocamlc -I stdlib -c bar.ml

with gen.ml:

let n1 = 200
let n2 = 200
let n3 = 40

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in

  print_endline "type t = A: 'a -> t";
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done;
  print_endline "let f x =";
  for i = 1 to n1 do
    print_endline "match x with A _ ->";
  done;
  for i = 1 to n3 do
    Printf.printf "a%i," i;
  done;
  print_endline "()"

The text was updated successfully, but these errors were encountered:

nojb · 2019-06-28T14:09:38Z

-dtimings output shows a big increase in the lambda generation phase:
4.07.1:

8.581s bar.ml
  0.084s parsing
    0.084s parser
  8.141s typing
  0.196s transl
  0.157s generate
  0.003s other
0.006s other

4.08.0:

16.359s bar.ml
  00.102s parsing
    00.102s parser
  08.271s typing
  07.814s transl
  00.172s generate
00.012s other

mshinwell · 2019-06-28T14:48:06Z

I think at least some of this might be coming from the calls to Typeopt.value_kind (and hence Subst.typexp) introduced in simplify_cases in matching.ml in #2156.

nojb · 2019-06-28T14:50:30Z

Am not sure, if I didn't make a mistake, the same slowdown (worse in fact) can be obtained without any pattern matching at all:

let n2 = 200
let n3 = 40

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done

mshinwell · 2019-06-28T15:03:10Z

@nojb I see the same degradation in the same function with your second example (indeed it seems worse too).

mshinwell · 2019-06-28T15:05:01Z

(gdb) inf break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007e55c0 bytecomp/matching.ml:593
	breakpoint already hit 8000 times
        c

lpw25 · 2019-06-28T15:15:58Z

Looks like Typeopt.scrape_ty is calling Ctype.correct_levels which means it is making a completely unnecessary copy of the type. It should be adjusted to only do that when there is actually something to expand.

trefis · 2019-06-28T15:23:58Z

It should be adjusted to only do that when there is actually something to expand.

Or it could just not do the copy: it's only there to update the level so the expansion won't fail. But we could also have a version of the expansion that doesn't check the levels (we're past typechecking at this point, so all this doesn't matter anymore)

alainfrisch · 2019-06-28T16:03:32Z

I confirm that not calling correct_levels removes a good part of the observed slowdown. The call to correct_levels was there already in the first version of typeopt.ml (ea8fe59#diff-5ad81800134284c48d42f76b958990b3), but it is called much more often now, so it is worth optimizing it.

My understanding is that, if we don't introduce a variant of the expansion that ignores levels, not calling correct_levels can only miss some optimizations (but not fail); is that right?

trefis · 2019-06-28T16:07:47Z

My understanding is that, if we don't introduce a variant of the expansion that ignores levels, not calling correct_levels can only miss some optimizations (but not fail); is that right?

That sounds right yes.

lpw25 · 2019-06-28T16:16:11Z

not calling correct_levels can only miss some optimizations (but not fail); is that right?

That's right, but when an optimisation was missed would be very unpredictable and depend on things like if -principal was used. So it'd be better to try and get this right. I think only calling correct_levels and expand_head_opt when the type_desc is actually Tconstr would probably be both correct and quick in most cases.

kit-ty-kate · 2019-07-24T13:07:33Z

Any news on this issue before the release of 4.08.1?

gasche · 2019-07-24T13:22:45Z

To my best knowledge the answer is "no". (But I would be surprised if that was the cause of the +164% increase reported #8811, which we also don't know.)

mshinwell · 2020-04-16T16:31:52Z

@alainfrisch Would you like to bring this PR to a conclusion?

alainfrisch · 2020-04-16T17:45:40Z

(This is an issue, not a PR.) Reading the discussion again, it seems the slowdown is confirmed, and some possible improvements that could improve performance significantly have been identified but not implemented. Unless I missed something, it sounds reasonable to keep the issue open.

github-actions · 2021-04-19T04:27:16Z

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

gasche · 2021-04-19T05:42:52Z

I can reproduce the slowdown for recent versions of OCaml:

$ for ver in 4.07.1 4.08.0 4.09.0 4.10.0 4.11.1 4.12.0; do
     echo -n "$ver: ";
     opam exec --switch=$ver -- ocamlc -c -dtimings prog.ml | grep prog.ml;
   done
4.07.1: 0.468s prog.ml
4.08.0: 1.979s prog.ml
4.09.0: 1.924s prog.ml
4.10.0: 2.050s prog.ml
4.11.1: 1.933s prog.ml
4.12.0: 1.997s prog.ml

nojb mentioned this issue Jul 17, 2019

4.08 switch creation time on raspian - +164% ! #8811

Closed

nojb mentioned this issue Feb 22, 2020

Performance regression in OCaml 4.08 #9326

Closed

github-actions bot added the Stale label Apr 19, 2021

github-actions bot removed the Stale label Apr 21, 2021

gasche mentioned this issue Jun 30, 2021

Rewrite get_unboxed_type_representation using a terminating algorithm instead of fuel. #10479

Draft

nojb mentioned this issue Sep 9, 2021

Fix performance regression introduced in 4.08 #10624

Merged

nojb closed this as completed in #10624 Sep 10, 2021

gasche mentioned this issue Nov 6, 2021

Add value kinds on blocks #10716

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slower compilation 4.07 => 4.08 #8776

Slower compilation 4.07 => 4.08 #8776

alainfrisch commented Jun 28, 2019

nojb commented Jun 28, 2019

mshinwell commented Jun 28, 2019

nojb commented Jun 28, 2019

mshinwell commented Jun 28, 2019

mshinwell commented Jun 28, 2019

lpw25 commented Jun 28, 2019

trefis commented Jun 28, 2019

alainfrisch commented Jun 28, 2019 •

edited

trefis commented Jun 28, 2019

lpw25 commented Jun 28, 2019

kit-ty-kate commented Jul 24, 2019

gasche commented Jul 24, 2019 •

edited

mshinwell commented Apr 16, 2020

alainfrisch commented Apr 16, 2020

github-actions bot commented Apr 19, 2021

gasche commented Apr 19, 2021 •

edited

Slower compilation 4.07 => 4.08 #8776

Slower compilation 4.07 => 4.08 #8776

Comments

alainfrisch commented Jun 28, 2019

nojb commented Jun 28, 2019

mshinwell commented Jun 28, 2019

nojb commented Jun 28, 2019

mshinwell commented Jun 28, 2019

mshinwell commented Jun 28, 2019

lpw25 commented Jun 28, 2019

trefis commented Jun 28, 2019

alainfrisch commented Jun 28, 2019 • edited

trefis commented Jun 28, 2019

lpw25 commented Jun 28, 2019

kit-ty-kate commented Jul 24, 2019

gasche commented Jul 24, 2019 • edited

mshinwell commented Apr 16, 2020

alainfrisch commented Apr 16, 2020

github-actions bot commented Apr 19, 2021

gasche commented Apr 19, 2021 • edited

alainfrisch commented Jun 28, 2019 •

edited

gasche commented Jul 24, 2019 •

edited

gasche commented Apr 19, 2021 •

edited