Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize performance in case of local links #219

Open
Martoon-00 opened this issue Nov 17, 2022 · 14 comments
Open

Optimize performance in case of local links #219

Martoon-00 opened this issue Nov 17, 2022 · 14 comments
Milestone

Comments

@Martoon-00
Copy link
Member

Martoon-00 commented Nov 17, 2022

Clarification and motivation

One use case that I wanted to see very fast - running xrefcheck on local links only. So that as a user, I could have a very responsive "verify >> edit" loop.

However, currently it seems to take quite long.

Acceptance criteria

  • We investigated what are the bottlenecks (probably via profiling), and tried to optimize them if sensible.
@Martoon-00 Martoon-00 added this to the 0.3.0 milestone Nov 17, 2022
@Martoon-00
Copy link
Member Author

Note: as far as I understand it correctly, making IO concurrent does not make much sense. But if pure logic takes a decent amount of time, it should be parallelized.

@Martoon-00
Copy link
Member Author

I tried xrefcheck on Morley, and it takes 0.75s to check the local links.

0.13s turned out to be consumed for files traversal, 0.08s on scanning, so looks like the most portion of time is consumed by verification. And I'm not sure how can this be, it is mostly pure code if we exclude external links checks 🤔

@Martoon-00 Martoon-00 mentioned this issue Nov 17, 2022
13 tasks
@Sorokin-Anton
Copy link
Contributor

IMHO, this proportion of time may be result of laziness (so it reported about finishing scan before e.g. all anchors were parsed)

@Martoon-00
Copy link
Member Author

Martoon-00 commented Nov 17, 2022

Ah yes, I first measured it incorrectly but then remembered the need in running some evaluateNF after the scan. Otherwise, the result that I reported could be completely spoiled, that's true.

@Martoon-00
Copy link
Member Author

Martoon-00 commented Nov 19, 2022

I tried profiling this in a free time, and if I interpret the results correctly, we really have something to optimize in the verification logic. I have run on Morley repository with -m local-only:

morley-run

(I also had evaluateNF applied at the end of the scan logic to exclude parsing from being computed as part of verification).

I get approximately the same flamegraph if focusing on memory, not time.

I believe, something like this is fully expected, given that we brought proper handling of filepaths comparison just recently.


Looks like in both fat places, equalFilePath takes all the time.

equalFilePath on all systems is approximately (==) 'on' normalize, and I believe that if we extract normalization out, this will result in a dramatic time save. Perhaps keeping the normalized filepaths in an appropriate data structure instead of [String] would also make sense, at least to decrease the asymptotic (keeping them in [FilePath] is an old issue however).

This is basically what #197 is about.

@YuriRomanowski
Copy link
Contributor

Also we can try to use Text instead of String if it eventually boils down to simple (==)

@Martoon-00
Copy link
Member Author

Yep, good note.

I think we could go even further. Comparing on simple (==) @Text would be inefficient as oftentimes the compared paths share the common prefix, mere linear comparison will take too much time in practice. And I believe practically we can get mostly instant comparisons.

@Martoon-00
Copy link
Member Author

Let's do this right after #230 (otherwise we will get severe merge conflicts, and after that PR introducing Text would be trivial).

@YuriRomanowski
Copy link
Contributor

YuriRomanowski commented Dec 12, 2022

I think we could go even further. Comparing on simple (==) @Text would be inefficient as oftentimes the compared paths share the common prefix, mere linear comparison will take too much time in practice. And I believe practically we can get mostly instant comparisons.

We can just compare reversed paths 😄

@Martoon-00
Copy link
Member Author

Yep, this should quite work 😸

@aeqz
Copy link
Contributor

aeqz commented Dec 22, 2022

I have tried to generate the same flame graph after #230, if I am not wrong with the repository under the test (https://gitlab.com/morley-framework/morley), and also running evaluateNF after the scan:

output

It seems that this time verification has been slightly less than half of the running time.

These are the runtime statistics im my case, with a no profiling build from the master branch:

> xrefcheck -m local-only +RTS -s -RTS
All repository links are valid.                                                      
     501,631,104 bytes allocated in the heap
      45,211,984 bytes copied during GC
       5,647,024 bytes maximum residency (8 sample(s))
         343,256 bytes maximum slop
              21 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       392 colls,   392 par    0.185s   0.032s     0.0001s    0.0013s
  Gen  1         8 colls,     7 par    0.155s   0.047s     0.0058s    0.0319s

  Parallel GC work balance: 41.78% (serial 0%, perfect 100%)

  TASKS: 16 (1 bound, 15 peak workers (15 total), using -N6)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.007s elapsed)
  MUT     time    0.195s  (  0.334s elapsed)
  GC      time    0.339s  (  0.078s elapsed)
  EXIT    time    0.000s  (  0.011s elapsed)
  Total   time    0.535s  (  0.431s elapsed)

  Alloc rate    2,576,338,313 bytes per MUT second

  Productivity  36.4% of total user, 77.6% of total elapsed

@aeqz
Copy link
Contributor

aeqz commented Jan 9, 2023

I have tried again with the changes made by the #263 PR, and both memory usage and overall execution time have been improved:

> xrefcheck -m local-only +RTS -s -RTS
All repository links are valid.                                                      
     118,896,832 bytes allocated in the heap
      17,366,568 bytes copied during GC
       3,093,824 bytes maximum residency (5 sample(s))
         356,080 bytes maximum slop
              15 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        94 colls,    94 par    0.073s   0.013s     0.0001s    0.0012s
  Gen  1         5 colls,     4 par    0.044s   0.011s     0.0023s    0.0049s

  Parallel GC work balance: 31.94% (serial 0%, perfect 100%)

  TASKS: 16 (1 bound, 15 peak workers (15 total), using -N6)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.004s elapsed)
  MUT     time    0.089s  (  0.257s elapsed)
  GC      time    0.117s  (  0.024s elapsed)
  EXIT    time    0.000s  (  0.013s elapsed)
  Total   time    0.207s  (  0.298s elapsed)

  Alloc rate    1,337,301,840 bytes per MUT second

  Productivity  43.0% of total user, 86.4% of total elapsed

@Martoon-00
Copy link
Member Author

😮 Nice to see that!

@Martoon-00
Copy link
Member Author

Martoon-00 commented Jan 27, 2023

I tried to run the recent version of xrefcheck (from master) on Morley repository (yeah, your guess was correct!), and now it takes 0.44sec for me, feels quite instant!

Looking at the flamegraph you provided, I found it suspicious that matchesGlobPatterns takes so much time. And hilariously, looks like we somehow fell into that very problem - we compile glob patterns every time we want to perform a match instead of compiling once in advance. So I created #272, and probably it will optimize things out considerably.

I'm personally satisfied with what we already have, so I suggest leaving #272 for future work (not in 0.3.0 milestone), and only:

And then close this ticket if we observe no significant regress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants