Optimize performance in case of local links #219

Martoon-00 · 2022-11-17T14:54:48Z

Clarification and motivation

One use case that I would like to be very fast is running xrefcheck on local links only, so that as a user I could have a very responsive "verify >> edit" loop.

However, it currently seems to take quite a long time.

Acceptance criteria

We investigated what the bottlenecks are (probably via profiling) and tried to optimize them where sensible.

Martoon-00 · 2022-11-17T14:55:54Z

Note: as far as I understand, making IO concurrent does not make much sense. But if pure logic takes a significant amount of time, it should be parallelized.

Martoon-00 · 2022-11-17T15:04:31Z

I tried xrefcheck on Morley, and it takes 0.75s to check the local links.

File traversal turned out to take 0.13s and scanning 0.08s, so it looks like most of the time is consumed by verification. I'm not sure how this can be, since it is mostly pure code if we exclude external link checks 🤔

Sorokin-Anton · 2022-11-17T16:18:57Z

IMHO, this proportion of time may be a result of laziness (so the scan was reported as finished before, e.g., all anchors were parsed).

Martoon-00 · 2022-11-17T16:24:37Z

Ah yes, I first measured it incorrectly, but then remembered the need to run some evaluateNF after the scan. Otherwise, the result that I reported could be completely skewed, that's true.
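The laziness pitfall discussed here can be illustrated with a minimal sketch. It assumes the `deepseq` package and uses `evaluate . force` as a stand-in for `evaluateNF`; `scanRepo` and `verify` are hypothetical placeholders, not xrefcheck functions.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Hypothetical stand-in for the scan phase: lazily builds a list of "anchors".
scanRepo :: Int -> [Int]
scanRepo n = map (\x -> (x * x) `mod` 7) [1 .. n]

-- Hypothetical stand-in for the verification phase.
verify :: [Int] -> Int
verify = sum

-- Run an action and return its result together with the CPU time in seconds.
timed :: IO a -> IO (a, Double)
timed act = do
  start <- getCPUTime
  res <- act
  end <- getCPUTime
  pure (res, fromIntegral (end - start) / 1e12)

main :: IO ()
main = do
  -- Without 'force', the scan would hand over an unevaluated thunk and its
  -- cost would be attributed to the verification phase instead.
  (anchors, scanTime) <- timed (evaluate (force (scanRepo 1000000)))
  (total, verifyTime) <- timed (evaluate (verify anchors))
  putStrLn ("scan:   " ++ show scanTime ++ "s")
  putStrLn ("verify: " ++ show verifyTime ++ "s (result " ++ show total ++ ")")
```

Replacing `evaluate (force ...)` with a plain `pure` in the first `timed` call should shift most of the measured time from the scan line to the verify line, which is the kind of misattribution described above.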

Martoon-00 · 2022-11-19T00:52:47Z

I tried profiling this in my free time, and if I interpret the results correctly, we really have something to optimize in the verification logic. I ran xrefcheck on the Morley repository with -m local-only and generated a flamegraph.

(I also had evaluateNF applied at the end of the scan logic, so that parsing is not counted as part of verification.)

I get approximately the same flamegraph if focusing on memory, not time.

I believe something like this is fully expected, given that we introduced proper handling of filepath comparison only recently.

It looks like in both of the widest places on the flamegraph, equalFilePath takes all the time.

equalFilePath on all systems is approximately (==) `on` normalize, and I believe that if we extract the normalization out, this will result in a dramatic time saving. Perhaps keeping the normalized filepaths in an appropriate data structure instead of [String] would also make sense, at least to improve the asymptotic complexity (keeping them in [FilePath] is an old issue, however).
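A minimal sketch of this idea, assuming the `filepath` and `containers` packages: paths are normalised once when they enter a set, so later comparisons are plain `(==)`/`compare` on already-normalised values. The names `NormalizedPath`, `mkNormalizedPath`, `mkIndex` and `fileExists` are hypothetical and not taken from xrefcheck.

```haskell
import qualified Data.Set as Set
import System.FilePath (dropTrailingPathSeparator, normalise)

-- A path that has already been normalised. Values should only be built
-- through 'mkNormalizedPath' so that the derived Eq/Ord are meaningful.
newtype NormalizedPath = NormalizedPath FilePath
  deriving (Eq, Ord, Show)

-- Normalise once, up front. This approximates the non-Windows behaviour of
-- 'equalFilePath'; case folding on Windows is deliberately left out.
mkNormalizedPath :: FilePath -> NormalizedPath
mkNormalizedPath = NormalizedPath . dropTrailingPathSeparator . normalise

-- Build the index of known files a single time.
mkIndex :: [FilePath] -> Set.Set NormalizedPath
mkIndex = Set.fromList . map mkNormalizedPath

-- Each lookup normalises only the queried path and then performs
-- O(log n) comparisons, instead of re-normalising both sides per pair.
fileExists :: Set.Set NormalizedPath -> FilePath -> Bool
fileExists index path = Set.member (mkNormalizedPath path) index

main :: IO ()
main = do
  let index = mkIndex ["docs/./guide.md", "README.md", "src//Main.hs"]
  print (fileExists index "docs/guide.md")
  print (fileExists index "src/Main.hs")
  print (fileExists index "missing.md")
```

Compared with a linear scan over [FilePath] using equalFilePath, this performs one normalisation per stored path and one per query, rather than two per comparison.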

This is basically what #197 is about.

YuriRomanowski · 2022-12-12T12:20:28Z

Also, we can try to use Text instead of String if it eventually boils down to a simple (==).

Martoon-00 · 2022-12-12T14:30:42Z

Yep, good note.

I think we could go even further. Comparing with a simple (==) @Text would be inefficient: the compared paths often share a common prefix, so a mere linear comparison would take too much time in practice. And I believe that in practice we can get almost instant comparisons.

Martoon-00 · 2022-12-12T14:32:03Z

Let's do this right after #230 (otherwise we will get severe merge conflicts, and after that PR, introducing Text would be trivial).

YuriRomanowski · 2022-12-12T14:33:36Z

> I think we could go even further. Comparing with a simple (==) @Text would be inefficient: the compared paths often share a common prefix, so a mere linear comparison would take too much time in practice. And I believe that in practice we can get almost instant comparisons.

We can just compare reversed paths 😄

Martoon-00 · 2022-12-12T14:37:44Z

Yep, this should work quite well 😸
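The reversed-path trick from this exchange could look roughly as follows. This is only a sketch, assuming the `text` and `containers` packages; `RevPath`, `mkRevPath` and `unRevPath` are hypothetical names, not part of xrefcheck.

```haskell
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

-- A path stored back to front, so that ordering comparisons start at the
-- file name, where paths from the same directory tree tend to differ.
newtype RevPath = RevPath Text
  deriving (Eq, Ord, Show)

mkRevPath :: FilePath -> RevPath
mkRevPath = RevPath . T.reverse . T.pack

-- Recover the original path, e.g. for error messages.
unRevPath :: RevPath -> FilePath
unRevPath (RevPath t) = T.unpack (T.reverse t)

main :: IO ()
main = do
  let a = mkRevPath "docs/very/long/shared/prefix/intro.md"
      b = mkRevPath "docs/very/long/shared/prefix/usage.md"
      files = Map.fromList [(a, "intro"), (b, "usage")]
  -- The ordering is decided within the reversed file name, long before
  -- the shared directory prefix would be reached.
  print (compare a b)
  print (Map.lookup b files)
  putStrLn (unRevPath a)
```

The trade-off is that the reversal has to be paid once per path, and the stored form must be reversed again whenever the path is shown to the user.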

aeqz · 2022-12-22T19:39:46Z

I have tried to generate the same flame graph after #230, using what I believe is the repository under test (https://gitlab.com/morley-framework/morley), and also running evaluateNF after the scan:

It seems that this time verification took slightly less than half of the running time.

These are the runtime statistics in my case, with a non-profiling build from the master branch:

> xrefcheck -m local-only +RTS -s -RTS
All repository links are valid.                                                      
     501,631,104 bytes allocated in the heap
      45,211,984 bytes copied during GC
       5,647,024 bytes maximum residency (8 sample(s))
         343,256 bytes maximum slop
              21 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       392 colls,   392 par    0.185s   0.032s     0.0001s    0.0013s
  Gen  1         8 colls,     7 par    0.155s   0.047s     0.0058s    0.0319s

  Parallel GC work balance: 41.78% (serial 0%, perfect 100%)

  TASKS: 16 (1 bound, 15 peak workers (15 total), using -N6)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.007s elapsed)
  MUT     time    0.195s  (  0.334s elapsed)
  GC      time    0.339s  (  0.078s elapsed)
  EXIT    time    0.000s  (  0.011s elapsed)
  Total   time    0.535s  (  0.431s elapsed)

  Alloc rate    2,576,338,313 bytes per MUT second

  Productivity  36.4% of total user, 77.6% of total elapsed

aeqz · 2023-01-09T17:33:29Z

I have tried again with the changes made by PR #263, and both memory usage and overall execution time have improved:

> xrefcheck -m local-only +RTS -s -RTS
All repository links are valid.                                                      
     118,896,832 bytes allocated in the heap
      17,366,568 bytes copied during GC
       3,093,824 bytes maximum residency (5 sample(s))
         356,080 bytes maximum slop
              15 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        94 colls,    94 par    0.073s   0.013s     0.0001s    0.0012s
  Gen  1         5 colls,     4 par    0.044s   0.011s     0.0023s    0.0049s

  Parallel GC work balance: 31.94% (serial 0%, perfect 100%)

  TASKS: 16 (1 bound, 15 peak workers (15 total), using -N6)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.004s elapsed)
  MUT     time    0.089s  (  0.257s elapsed)
  GC      time    0.117s  (  0.024s elapsed)
  EXIT    time    0.000s  (  0.013s elapsed)
  Total   time    0.207s  (  0.298s elapsed)

  Alloc rate    1,337,301,840 bytes per MUT second

  Productivity  43.0% of total user, 86.4% of total elapsed
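To put the two reports side by side, the relative improvement can be computed from the "bytes allocated in the heap" and "Total time (elapsed)" figures above. The sketch below only does that arithmetic; the `Run` type and helper names are made up for illustration. It works out to roughly 76% less allocation and about 31% less elapsed time.

```haskell
-- Figures copied from the two "+RTS -s" reports above.
data Run = Run
  { allocatedBytes :: Double
  , elapsedSeconds :: Double
  }

before, after :: Run
before = Run { allocatedBytes = 501631104, elapsedSeconds = 0.431 }
after  = Run { allocatedBytes = 118896832, elapsedSeconds = 0.298 }

-- Relative reduction of a metric, as a percentage of the "before" value.
reductionPercent :: (Run -> Double) -> Double
reductionPercent metric = 100 * (1 - metric after / metric before)

main :: IO ()
main = do
  putStrLn ("allocation reduced by " ++ show (reductionPercent allocatedBytes) ++ "%")
  putStrLn ("elapsed time reduced by " ++ show (reductionPercent elapsedSeconds) ++ "%")
```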

Martoon-00 · 2023-01-19T18:47:51Z

😮 Nice to see that!

Martoon-00 · 2023-01-27T01:51:44Z

I tried running the recent version of xrefcheck (from master) on the Morley repository (yeah, your guess was correct!), and now it takes 0.44 sec for me, which feels quite instant!

Looking at the flamegraph you provided, I found it suspicious that matchesGlobPatterns takes so much time. And hilariously, it looks like we somehow fell into that very problem: we compile glob patterns every time we want to perform a match instead of compiling them once in advance. So I created #272, which will probably speed things up considerably.
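The difference between the two approaches can be sketched as follows. This assumes the `Glob` package (`System.FilePath.Glob`) with its `compile` and `match` functions; the function names below are hypothetical and do not reproduce the actual matchesGlobPatterns code.

```haskell
import System.FilePath.Glob (Pattern, compile, match)

-- Slow variant: every call recompiles all patterns from their source strings.
matchesRecompiling :: [String] -> FilePath -> Bool
matchesRecompiling rawPatterns path =
  any (\raw -> match (compile raw) path) rawPatterns

-- Compile the patterns a single time, e.g. right after reading the config.
compilePatterns :: [String] -> [Pattern]
compilePatterns = map compile

-- Fast variant: reuses the already compiled patterns for every path.
matchesCompiled :: [Pattern] -> FilePath -> Bool
matchesCompiled patterns path = any (`match` path) patterns

main :: IO ()
main = do
  let patterns = compilePatterns ["docs/**/*.md", "*.txt"]
      paths = ["docs/guide/intro.md", "notes.txt", "src/Main.hs"]
  mapM_ (\p -> putStrLn (p ++ ": " ++ show (matchesCompiled patterns p))) paths
```

Both variants give the same answers; the second one simply moves the compilation cost out of the per-file loop.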

I'm personally satisfied with what we already have, so I suggest leaving #272 for future work (not in the 0.3.0 milestone), and only doing the following:

Make sure we apply the reverse Ord hack or other optimization for filepaths comparison;
Check how fast xrefcheck is after [#239][#249] Further filepath refactor #263.

And then close this ticket if we observe no significant regression.

Martoon-00 added this to the 0.3.0 milestone Nov 17, 2022

Martoon-00 mentioned this issue Nov 17, 2022

[#147] Update readme #216

Merged

13 tasks

aeqz mentioned this issue Jan 9, 2023

[#239][#249] Further filepath refactor #263

Merged

8 tasks

Martoon-00 mentioned this issue Jan 27, 2023

Get rid of matchesGlobPatterns #272

Open
