Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

very slow with 10k tags #248

Open
mgaunard opened this issue Mar 16, 2020 · 11 comments
Open

very slow with 10k tags #248

mgaunard opened this issue Mar 16, 2020 · 11 comments

Comments

@mgaunard
Copy link

mgaunard commented Mar 16, 2020

Loading every single page takes 20-30 seconds. If I remove the get tags function from the context it only takes a couple of seconds.

Tags don't seem that important, how come they have such a terrible performance impact?
Could some caching logic help?

@jonashaag
Copy link
Owner

I'd be interesting in speeding things up. Can you provide an example repo that I can use to reproduce the issue?

You can also simply disable ctags for the time being.

@jelmer
Copy link
Contributor

jelmer commented Mar 16, 2020

Is this about git tags or ctags?

@mgaunard
Copy link
Author

This is about git tags.
Nothing to do with ctags.

@jonashaag
Copy link
Owner

I did some investigation today. We have multiple causes here.

The time spent compiling the list of tags (and branches) for display is spent 50% on listing refs (Dulwich’s refs.as_dict()), and 50% on looking up each ref’s time stamp for sorting.

Then, looking up refs and checking their time stamps is not cached at the moment.

We can easily cache the time stamp part; there’s already logic in the code base for cache invalidation based on changed list-of-refs. The ref lookup would still need to be performed on each page load, since it is used as cache validator, so we can only save 50% of page load time using this method.

@jelmer two Dulwich/Git questions here:

  • Is there a faster way to check for changed refs (tags and branches) than compiling the dict of refs? Probably we can use ref files’ modification time stamps as a cheap has-any-ref-been-modified pre-check, and only then resolve each ref to check for changes? Is this a reliable method?
  • Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.

@jonashaag
Copy link
Owner

Btw, Mathias, if you have no more than 10k tags and page load takes up to 30 seconds, are you using a terribly slow file system or operating system or a very slow computer? My benchmarks were more in the range of 1-2 seconds for 10k tags on a moderately capable MacBook.

@jelmer
Copy link
Contributor

jelmer commented Mar 18, 2020 via email

@jonashaag
Copy link
Owner

Thanks!

Me: Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.

Do you think we could gain some speedup by batching the ref lookups?

the overhead of tracking timestamps versus
actually reading the files is also non-trivial (both in terms of
performance and in terms of additional code).

Code-wise, isn't it only a matter of calling os.stat() on each ref file? Do you think that's slower than reading each file's contents?

@jelmer
Copy link
Contributor

jelmer commented Mar 21, 2020 via email

@jelmer
Copy link
Contributor

jelmer commented Jun 28, 2020

You can now use RefsContainer.watch() to wait for changes to refs:

r = Repo('.')
with r.refs.watch() as w:
    for (ref, sha) in w:
        print('Ref %s has been updated to %s' % (ref, sha))

Note that you'd probably have to run this from another thread, since it will block until one of the refs changes.

@jonashaag
Copy link
Owner

Great news!

Minor downer is that inotify is Linux-only, so no macOS or BSD support. But that’s probably easy to add, and maybe entirely unnecessary since 99% of users with huge number of repositories will run Linux anyways.

@jelmer
Copy link
Contributor

jelmer commented Jun 28, 2020

The API on the Dulwich side should be generic enough that we could add a Windows-specific implementation (perhaps using watchdog?) to it, without any changes on the klaus side.

jonashaag added a commit that referenced this issue Oct 12, 2020
Not working on caching at the moment...

This reverts commit e21cf6a.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants