optimization for `find` #4783

aawsome · 2024-04-28T21:21:24Z

Output of `restic version`

restic 0.16.4 compiled with go1.21.6 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

I re-implemented the restic find command for rustic, see rustic-rs/rustic#1136.

After the implementation I wanted to compare it with restic, both with respect to the find results and with respect to performance.
TL;DR: restic is incredibly slow, roughly 10x slower than rustic in wall-clock time on my repo.

For my repo on a local disk (admittedly with many snapshots, most of them identical or similar), I got:
restic:

1331.31user 111.21system 24:23.76elapsed 98%CPU (0avgtext+0avgdata 151184maxresident)k
64528inputs+40outputs (0major+104798minor)pagefaults 0swaps

rustic:

100.11user 37.98system 2:27.27elapsed 93%CPU (0avgtext+0avgdata 96088maxresident)k
20512inputs+0outputs (131major+4932230minor)pagefaults 0swaps

I think the reason is that restic processes all snapshots one by one without reusing any results from previously processed snapshots. In rustic I remember the result list (even an empty one) for each (tree_id/path) pair which has been processed and don't search again if that pair reappears in another snapshot.
IMO the same optimization could also be trivially implemented in restic and would improve find performance a lot in many cases.
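
To make the idea concrete, here is a minimal sketch of caching results per (tree_id, path) pair. It is not the rustic or restic code: the in-memory `TREES` and `SNAPSHOTS` layout, the `find` and `search` names, and the use of `fnmatch` for pattern matching are all assumptions made for illustration. The cached list for a directory holds every match in its subtree, so a cache hit on a parent folder still reports matches found deeper down.

```python
from fnmatch import fnmatch

# Hypothetical in-memory repository: tree ID -> list of (name, subtree ID or None).
# A subtree ID of None marks a file; anything else marks a directory.
TREES = {
    "root1": [("folder", "t_folder"), ("notes.txt", None)],
    "root2": [("folder", "t_folder"), ("todo.txt", None)],
    "t_folder": [("a.txt", None), ("b.log", None)],
}
# Two snapshots with different roots that share the same "folder" tree.
SNAPSHOTS = [("snap1", "root1"), ("snap2", "root2")]


def find(pattern, snapshots=SNAPSHOTS, trees=TREES):
    """Return per-snapshot matches and the list of trees actually walked."""
    cache = {}   # (tree_id, path) -> all matching paths inside that subtree
    visits = []  # trees walked without a cache hit, kept only for illustration

    def search(tree_id, path):
        key = (tree_id, path)
        if key in cache:
            # Seen in an earlier snapshot: reuse the result, even if it is empty.
            return cache[key]
        visits.append(key)
        matches = []
        for name, subtree_id in trees[tree_id]:
            full = path.rstrip("/") + "/" + name
            if fnmatch(name, pattern):
                matches.append(full)
            if subtree_id is not None:
                # Matches from child trees are propagated up into the parent's list.
                matches.extend(search(subtree_id, full))
        cache[key] = matches
        return matches

    results = {snap: search(root, "/") for snap, root in snapshots}
    return results, visits


if __name__ == "__main__":
    results, visits = find("*.txt")
    print(results)
    print(visits)
```

In this toy repository the shared `t_folder` tree is walked only once, although it appears in both snapshots. The path is part of the key because the reported paths depend on where the tree is mounted in the snapshot.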

What are you trying to do? What problem would this solve?

Find files within the repository efficiently.


MichaelEischer · 2024-05-05T17:55:51Z

> In rustic I remember the result list (even an empty one) for each (tree_id/path) pair which has been processed

Does that mean that if /folder/a matches some filter, then that match is also propagated up to /folder? From what I understood /folder won't be processed again if it exactly matches that from a previous snapshot; thus a match also has to be propagated to all parent folders.

aawsome · 2024-05-05T19:16:17Z

> Does that mean that if /folder/a matches some filter, then that match is also propagated up to /folder? From what I understood /folder won't be processed again if it exactly matches that from a previous snapshot; thus a match also has to be propagated to all parent folders.

Exactly!
