
optimization for find #4783

Open
aawsome opened this issue Apr 28, 2024 · 2 comments

Comments

@aawsome
Contributor

aawsome commented Apr 28, 2024

Output of restic version

restic 0.16.4 compiled with go1.21.6 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

I re-implemented the restic find command for rustic, see rustic-rs/rustic#1136.

After the implementation, I wanted to compare rustic with restic, both with respect to the find results and with respect to performance.
TL;DR: restic is incredibly slow.

For my repository on a local disk (which, admittedly, contains many mostly identical or similar snapshots), I got:
restic:

1331.31user 111.21system 24:23.76elapsed 98%CPU (0avgtext+0avgdata 151184maxresident)k
64528inputs+40outputs (0major+104798minor)pagefaults 0swaps

rustic:

100.11user 37.98system 2:27.27elapsed 93%CPU (0avgtext+0avgdata 96088maxresident)k
20512inputs+0outputs (131major+4932230minor)pagefaults 0swaps

I think the reason is that restic processes all snapshots one by one without reusing any results obtained from previously processed snapshots. In rustic I remember the result list (even an empty one) for each (tree_id/path) pair which has been processed, and don't search it again if the same pair reappears in another snapshot.
IMO the same optimization could also be implemented easily in restic and would improve find performance a lot in many cases.
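The caching idea can be sketched in Go (restic's language) roughly as follows. This is a minimal illustration under simplifying assumptions, not rustic's or restic's actual code: the `Tree`, `Entry`, `Finder` and `Find` names are hypothetical, trees live in an in-memory map instead of being loaded from the repository, and matching is done on file names only.

```go
package main

import (
	"fmt"
	"path"
)

// Hypothetical, simplified repository model (not restic's real types): a tree
// holds file names and named subdirectories that reference other trees by ID.
type Entry struct{ Name, TreeID string }

type Tree struct {
	Files []string
	Dirs  []Entry
}

// cacheKey is the (tree_id, path) pair used as the memoization key.
type cacheKey struct{ treeID, dir string }

// Finder searches snapshots and remembers the result list of every
// (tree ID, path) pair it has already processed.
type Finder struct {
	trees   map[string]Tree
	match   func(name string) bool
	cache   map[cacheKey][]string
	Visited int // number of trees actually scanned (cache misses)
}

func NewFinder(trees map[string]Tree, match func(string) bool) *Finder {
	return &Finder{trees: trees, match: match, cache: map[cacheKey][]string{}}
}

// Find returns all matching paths below the tree treeID located at dir.
// Results are cached even when empty: the lookup checks the ok flag, so an
// empty list still counts as "already searched". A parent's list contains all
// matches of its subtrees, so a cache hit on a parent also yields those matches.
func (f *Finder) Find(treeID, dir string) []string {
	key := cacheKey{treeID, dir}
	if res, ok := f.cache[key]; ok {
		return res
	}
	f.Visited++
	t := f.trees[treeID]
	var res []string
	for _, name := range t.Files {
		if f.match(name) {
			res = append(res, path.Join(dir, name))
		}
	}
	for _, d := range t.Dirs {
		res = append(res, f.Find(d.TreeID, path.Join(dir, d.Name))...)
	}
	f.cache[key] = res
	return res
}

func main() {
	// Two different snapshot roots sharing an identical /folder subtree,
	// plus a third snapshot identical to the first one.
	trees := map[string]Tree{
		"t-folder": {Files: []string{"a", "b"}},
		"root1":    {Files: []string{"x"}, Dirs: []Entry{{"folder", "t-folder"}}},
		"root2":    {Files: []string{"y"}, Dirs: []Entry{{"folder", "t-folder"}}},
	}
	f := NewFinder(trees, func(name string) bool { return name == "a" })
	for _, root := range []string{"root1", "root2", "root1"} {
		fmt.Println(root, f.Find(root, "/"))
	}
	fmt.Println("trees scanned:", f.Visited)
}
```

With the cache, the three snapshots cause only 3 tree scans, whereas walking every snapshot from scratch would scan 6. Since restic trees are content-addressed, unchanged directories in different snapshots share the same tree ID, which is what makes such cache hits frequent in repositories with many similar snapshots. Keying on the path as well as the tree ID keeps the reported paths correct when the same tree appears at a different location.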

What are you trying to do? What problem would this solve?

Find files within the repository efficiently.

@MichaelEischer
Member

In rustic I remember the result list (even an empty one) for each (tree_id/path) pair which has been processed

Does that mean that if /folder/a matches some filter, then that match is also propagated up to /folder? From what I understand, /folder won't be processed again if it exactly matches the one from a previous snapshot; thus, a match also has to be propagated to all parent folders.

@aawsome
Contributor Author

aawsome commented May 5, 2024

Does that mean that if /folder/a matches some filter, then that match is also propagated up to /folder? From what I understand, /folder won't be processed again if it exactly matches the one from a previous snapshot; thus, a match also has to be propagated to all parent folders.

Exactly!
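A small Go sketch of why this propagation matters: merely skipping an already-seen (tree ID, path) pair would lose matches in later snapshots, whereas caching each subtree's result list (which already contains every match below it) does not. The types and the `naiveFind`/`memoFind` functions are hypothetical illustrations, not code from restic or rustic, and matching is by exact file name.

```go
package main

import (
	"fmt"
	"path"
)

// Hypothetical tree model: file names plus named subtrees referenced by ID.
type Entry struct{ Name, TreeID string }

type Tree struct {
	Files []string
	Dirs  []Entry
}

// naiveFind skips any (tree ID, path) pair it has walked before but does not
// remember what it found there, so matches inside a shared subtree are only
// reported for the first snapshot containing it.
func naiveFind(trees map[string]Tree, seen map[[2]string]bool, id, dir, want string) []string {
	key := [2]string{id, dir}
	if seen[key] {
		return nil // subtree skipped: its matches are silently lost
	}
	seen[key] = true
	var res []string
	t := trees[id]
	for _, name := range t.Files {
		if name == want {
			res = append(res, path.Join(dir, name))
		}
	}
	for _, d := range t.Dirs {
		res = append(res, naiveFind(trees, seen, d.TreeID, path.Join(dir, d.Name), want)...)
	}
	return res
}

// memoFind caches each subtree's result list (even an empty one). Since a
// parent's list includes its children's matches, a cache hit on /folder
// still yields /folder/a: the match is propagated to all parent folders.
func memoFind(trees map[string]Tree, cache map[[2]string][]string, id, dir, want string) []string {
	key := [2]string{id, dir}
	if res, ok := cache[key]; ok {
		return res
	}
	var res []string
	t := trees[id]
	for _, name := range t.Files {
		if name == want {
			res = append(res, path.Join(dir, name))
		}
	}
	for _, d := range t.Dirs {
		res = append(res, memoFind(trees, cache, d.TreeID, path.Join(dir, d.Name), want)...)
	}
	cache[key] = res
	return res
}

func main() {
	// Two snapshots with different roots but an identical /folder subtree.
	trees := map[string]Tree{
		"t-folder": {Files: []string{"a"}},
		"root1":    {Files: []string{"x"}, Dirs: []Entry{{"folder", "t-folder"}}},
		"root2":    {Files: []string{"y"}, Dirs: []Entry{{"folder", "t-folder"}}},
	}
	seen := map[[2]string]bool{}
	cache := map[[2]string][]string{}
	for _, root := range []string{"root1", "root2"} {
		fmt.Println(root, "naive:", naiveFind(trees, seen, root, "/", "a"),
			"memoized:", memoFind(trees, cache, root, "/", "a"))
	}
}
```

The memoized variant reports /folder/a for both snapshots, while the naive variant reports nothing for root2, because the shared /folder subtree is skipped without its earlier result.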
