Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slower compilation 4.07 => 4.08 #8776

Closed
alainfrisch opened this issue Jun 28, 2019 · 16 comments · Fixed by #10624
Closed

Slower compilation 4.07 => 4.08 #8776

alainfrisch opened this issue Jun 28, 2019 · 16 comments · Fixed by #10624

Comments

@alainfrisch
Copy link
Contributor

This is to document one case of significant degradation in compilation time between 4.07 and 4.08, on a synthetic example (originally created to stress-tess #8774). In case someone wants to investigate further...

Results:

  • 4.07: 10s
  • 4.08, trunk: 19s

Reproduction:

After a make world, run:

 boot/ocamlrun ./ocaml -I stdlib gen.ml > bar.ml && time boot/ocamlrun ./ocamlc -I stdlib -c bar.ml 

with gen.ml:

let n1 = 200
let n2 = 200
let n3 = 40

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in

  print_endline "type t = A: 'a -> t";
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done;
  print_endline "let f x =";
  for i = 1 to n1 do
    print_endline "match x with A _ ->";
  done;
  for i = 1 to n3 do
    Printf.printf "a%i," i;
  done;
  print_endline "()"
@nojb
Copy link
Contributor

nojb commented Jun 28, 2019

-dtimings output shows a big increase in the lambda generation phase:
4.07.1:

8.581s bar.ml
  0.084s parsing
    0.084s parser
  8.141s typing
  0.196s transl
  0.157s generate
  0.003s other
0.006s other

4.08.0:

16.359s bar.ml
  00.102s parsing
    00.102s parser
  08.271s typing
  07.814s transl
  00.172s generate
00.012s other

@mshinwell
Copy link
Contributor

I think at least some of this might be coming from the calls to Typeopt.value_kind (and hence Subst.typexp) introduced in simplify_cases in matching.ml in #2156.

@nojb
Copy link
Contributor

nojb commented Jun 28, 2019

Am not sure, if I didn't make a mistake, the same slowdown (worse in fact) can be obtained without any pattern matching at all:

let n2 = 200
let n3 = 40

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done

@mshinwell
Copy link
Contributor

@nojb I see the same degradation in the same function with your second example (indeed it seems worse too).

@mshinwell
Copy link
Contributor

(gdb) inf break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007e55c0 bytecomp/matching.ml:593
	breakpoint already hit 8000 times
        c

@lpw25
Copy link
Contributor

lpw25 commented Jun 28, 2019

Looks like Typeopt.scrape_ty is calling Ctype.correct_levels which means it is making a completely unnecessary copy of the type. It should be adjusted to only do that when there is actually something to expand.

@trefis
Copy link
Contributor

trefis commented Jun 28, 2019

It should be adjusted to only do that when there is actually something to expand.

Or it could just not do the copy: it's only there to update the level so the expansion won't fail. But we could also have a version of the expansion that doesn't check the levels (we're past typechecking at this point, so all this doesn't matter anymore)

@alainfrisch
Copy link
Contributor Author

alainfrisch commented Jun 28, 2019

I confirm that not calling correct_levels removes a good part of the observed slowdown. The call to correct_levels was there already in the first version of typeopt.ml (ea8fe59#diff-5ad81800134284c48d42f76b958990b3), but it is called much more often now, so it is worth optimizing it.

My understanding is that, if we don't introduce a variant of the expansion that ignores levels, not calling correct_levels can only miss some optimizations (but not fail); is that right?

@trefis
Copy link
Contributor

trefis commented Jun 28, 2019

My understanding is that, if we don't introduce a variant of the expansion that ignores levels, not calling correct_levels can only miss some optimizations (but not fail); is that right?

That sounds right yes.

@lpw25
Copy link
Contributor

lpw25 commented Jun 28, 2019

not calling correct_levels can only miss some optimizations (but not fail); is that right?

That's right, but when an optimisation was missed would be very unpredictable and depend on things like if -principal was used. So it'd be better to try and get this right. I think only calling correct_levels and expand_head_opt when the type_desc is actually Tconstr would probably be both correct and quick in most cases.

@kit-ty-kate
Copy link
Member

Any news on this issue before the release of 4.08.1?

@gasche
Copy link
Member

gasche commented Jul 24, 2019

To my best knowledge the answer is "no". (But I would be surprised if that was the cause of the +164% increase reported #8811, which we also don't know.)

@mshinwell
Copy link
Contributor

@alainfrisch Would you like to bring this PR to a conclusion?

@alainfrisch
Copy link
Contributor Author

(This is an issue, not a PR.) Reading the discussion again, it seems the slowdown is confirmed, and some possible improvements that could improve performance significantly have been identified but not implemented. Unless I missed something, it sounds reasonable to keep the issue open.

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Apr 19, 2021
@gasche
Copy link
Member

gasche commented Apr 19, 2021

I can reproduce the slowdown for recent versions of OCaml:

$ for ver in 4.07.1 4.08.0 4.09.0 4.10.0 4.11.1 4.12.0; do
     echo -n "$ver: ";
     opam exec --switch=$ver -- ocamlc -c -dtimings prog.ml | grep prog.ml;
   done
4.07.1: 0.468s prog.ml
4.08.0: 1.979s prog.ml
4.09.0: 1.924s prog.ml
4.10.0: 2.050s prog.ml
4.11.1: 1.933s prog.ml
4.12.0: 1.997s prog.ml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants