Add possibility to ignore deleted files in certain locations #70

Open
Wastus opened this issue Sep 26, 2023 · 18 comments

Comments

@Wastus

Wastus commented Sep 26, 2023

I'd love to see a feature where I can specify files or folders (or a regular expression) that should not count toward the deleted file count but are still synced (so I can't use the SnapRAID exclude feature).

I know this also means that the script will have to parse the output of the diff instead of just reading the summary.

Why would I need something like this?

I have a rolling backup running onto my SnapRAID, certain parts will get deleted after a day or two and new things will pop up. This makes a mess of the deleted file count: you never know how high it will be, and honestly I don't really care. Of course, with the backup excluded from counting, someone could delete my whole backup and the sync would just continue, but that is a risk I'd rather take than set the delete threshold to 1000 and miss that I deleted the wrong folder in some after-work delirium.
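
The request above boils down to counting deletions line by line instead of reading the summary. A minimal sketch of that idea follows; it is not the script's implementation. The sample text, the `remove <path>` line format and the function name `count_deleted` are assumptions made for illustration, so check them against your own `snapraid diff` log.

```python
import re

# Hypothetical sample of `snapraid diff` output. The "<action> <path>" line
# format is an assumption; verify it against a real diff log.
SAMPLE_DIFF = """\
add backup/kopia/p1a2b3.f
remove backup/kopia/p9f8e7.f
remove backup/kopia/q4d5c6.f
remove media/movies/holiday.mkv
update documents/notes.txt
"""


def count_deleted(diff_output, ignore_patterns):
    """Return (counted, ignored) numbers of removed files.

    A removed file is ignored when any regular expression in
    ignore_patterns matches its path; it is still synced either way.
    """
    compiled = [re.compile(p) for p in ignore_patterns]
    counted = ignored = 0
    for line in diff_output.splitlines():
        action, _, path = line.strip().partition(" ")
        if action != "remove" or not path:
            continue  # only deletions matter for this count
        if any(rx.search(path) for rx in compiled):
            ignored += 1
        else:
            counted += 1
    return counted, ignored


counted, ignored = count_deleted(SAMPLE_DIFF, [r"^backup/kopia/"])
print(f"deleted files counted: {counted}, ignored: {ignored}")
```

Only the `counted` value would then be compared against the delete threshold, so a deleted media file still stops the sync while backup rotation does not.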

@auanasgheps
Owner

I have thought about this in the past, but this script relies on SnapRAID data. Altering such data is very complex, since we're dealing with a raw log.

I'm not a real coder, my abilities are very limited.

> I have a rolling backup running onto my SnapRAID, certain parts will get deleted after a day or two and new things will pop up.

I feel you. I'm testing a backup solution from my Windows clients to my server called UrBackup, which is very cool, but likes to cycle files around for maintenance, producing lots of deleted and moved files.

Let's work around this issue:

  • You could entirely disable the deleted file detection, or enable the alert but still sync (that's what I do)

  • You could use the added/deleted threshold feature (ADD_DEL_THRESHOLD), which allows a sync that would otherwise violate the delete threshold if the ratio of added to deleted files is greater than the value set.

  • I could add an alternative method, inspired by UrBackup itself:

Generate a random file in a folder that the user chooses, store its SHA256 and check the file before every sync. If it has been changed or deleted, stop the sync; otherwise proceed.
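
The canary-file idea described above could look roughly like the sketch below. It is only an illustration of the proposal, not an existing feature of the script; the file name `.snapraid_canary` and both function names are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path

CANARY_NAME = ".snapraid_canary"  # hypothetical file name


def create_canary(folder: Path) -> str:
    """Write random bytes to the canary file and return their SHA256 digest."""
    data = os.urandom(1024)
    (folder / CANARY_NAME).write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def canary_intact(folder: Path, expected_digest: str) -> bool:
    """True only if the canary still exists and its content is unchanged."""
    path = folder / CANARY_NAME
    if not path.is_file():
        return False
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected_digest


# Demo in a temporary folder: the sync would proceed while the canary is
# intact and stop once the file has been deleted.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    digest = create_canary(folder)
    print("sync allowed:", canary_intact(folder, digest))
    (folder / CANARY_NAME).unlink()
    print("sync allowed:", canary_intact(folder, digest))
```

In practice the digest would have to be stored somewhere outside the watched folder, so that wiping the folder cannot remove it as well.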

@Wastus
Author

Wastus commented Sep 27, 2023

I've created a quick draft of how this can be implemented; it's not thoroughly tested yet.

So far I have failed to produce a "copy" event in the SnapRAID diff output, so I can't derive that count with the new method and it is still determined the same way as before. Equal files are (luckily) not listed, so that count stays the same as well.

Side note: I'm using Kopia, so far it's blazingly fast allowing almost real-time backups for my whole system drive containing 440 GB (I excluded Windows and Programs).

@auanasgheps
Owner

auanasgheps commented Sep 28, 2023

Thank you for the PR! Please keep testing it and let me know. I would like to make a release shortly, so this will eventually be in a subsequent release.

Kopia is on my radar, but I haven't tested it yet. It looks like a good replacement, since it manages the backup in blocks and not individual files.

Are you simply using an SMB share from your client to store backups, or have you deployed the Kopia Server? (There's almost no documentation about it.)

But what is the actual issue with Kopia and SnapRAID? Does it remove a lot of files when cycling through backups?

@Wastus
Author

Wastus commented Sep 28, 2023

I'm running Kopia server in a docker image, which saves data to the Snapraid pool.

I also found the instructions for running Kopia as a server lacking; only a forum post explained a bit more. It is still a bit cumbersome compared to the streamlined experience I had with most other docker images: you need to execute some commands directly in the running container, and you need to adjust the start command after the first run.
I'd say it is currently better suited to running as a normal service.

I still have to fine-tune the SnapRAID config; last night it failed because Kopia was doing some work in cache and log files (I didn't expect that while no client is active). But the main issue for the script is, that blocks get deleted on the backup rotation (it keeps the last x snapshots, x daily snapshots, etc.) and this triggers my deleted threshold easily.

@tehniemer
Contributor

That sounds a bit like Borg backup, which I use in conjunction with the add/del threshold to essentially ignore the blocks deleted by Borg since approximately the same number are added back each backup.
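
As a rough illustration of how the add/del threshold interacts with the delete threshold, here is a minimal sketch. The function name, parameter names and the exact comparison operators are assumptions, not the script's code; only the general rule (allow the sync when the ratio of added to deleted files is greater than the configured value) comes from this thread.

```python
def sync_allowed(added, deleted, del_threshold, add_del_threshold):
    """Decide whether a sync may run.

    Assumptions for this sketch: reaching del_threshold counts as a
    violation, and an add_del_threshold of 0 disables the ratio check.
    """
    if deleted == 0 or deleted < del_threshold:
        return True  # delete threshold not violated
    if add_del_threshold <= 0:
        return False  # ratio check disabled, violation stands
    # Backup tools that rotate blocks add roughly as many files as they delete.
    return added / deleted > add_del_threshold


print(sync_allowed(added=480, deleted=500, del_threshold=100, add_del_threshold=0.9))
print(sync_allowed(added=10, deleted=500, del_threshold=100, add_del_threshold=0.9))
```

The first call models a backup rotation (480 added, 500 deleted, ratio 0.96) and passes; the second models an accidental mass deletion and is blocked.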

@auanasgheps
Owner

@Wastus since you run Kopia Server as a container (that's what I'd like to do), you can pause/stop it before running the script and resume it when done. I do this with other containers and it's a great feature!
Otherwise you should use the custom hook to stop and start the service.
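
For readers who want to see the pause-and-resume pattern outside the script, here is a minimal sketch using the `docker pause` and `docker unpause` commands. The `paused` helper is hypothetical and the container name `kopia` is an assumption; the demo uses a fake runner so it executes without Docker installed.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def paused(containers, run=subprocess.run):
    """Pause the given containers, then always unpause them afterwards."""
    paused_ok = []
    try:
        for name in containers:
            run(["docker", "pause", name], check=True)
            paused_ok.append(name)
        yield
    finally:
        # Resume in reverse order, even if the wrapped work failed.
        for name in reversed(paused_ok):
            run(["docker", "unpause", name], check=True)


# Demo with a fake runner that only records the commands.
calls = []


def fake_run(cmd, check=True):
    calls.append(cmd)


with paused(["kopia"], run=fake_run):
    calls.append(["snapraid", "sync"])  # placeholder for the real work

print(calls)
```

Dropping the `run=fake_run` argument makes the helper call the real Docker CLI.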

Can you please link the forum post? It would help me to start working on the Server part.

> But the main issue for the script is, that blocks get deleted on the backup rotation (it keeps the last x snapshots, x daily snapshots, etc.) and this triggers my deleted threshold easily.

All right, now I get it. You could still try the ADD/DEL ratio feature.

@tehniemer Borg is, in my opinion, the best backup service, period. I use it to back up my whole NAS to an offsite NAS.
But it has a major caveat: no Windows client, and Windows File History is garbage.
But yes, Kopia uses a similar approach, which in my opinion is really efficient (at least on the Borg side; I have limited experience with Kopia).

@Wastus
Author

Wastus commented Oct 3, 2023

@auanasgheps
To save you from digging through that post, here is my approach. Replace the MyXyz placeholders with your values. I'm not sure everything is necessary; the local path surely isn't, but I needed it to back up my phone's rsynced files (there is no Kopia for Android yet).

version: '3.7'
services:
    kopia:
        image: kopia/kopia:latest
        hostname: datavault
        restart: unless-stopped
        ports:
            - "51515:51515"
        environment:
            # replace with your own secret
            KOPIA_PASSWORD: MyVeryGoodSecret
            TZ: Europe/Berlin
        volumes:
            - /MyPersistentPath/config:/app/config
            - /MyPersistentPath/cache:/app/cache
            - /MyPersistentPath/logs:/app/logs
            - /MyPersistentPath/backup:/app/backup
            # optional: an extra local path to back up
            - /MyOtherPathIWantToLocalBackup:/mnt/MyLocalBackupPath
        # regular start command, used after the certificate has been generated
        entrypoint: ["/bin/kopia", "server", "start", "--tls-cert-file", "/app/config/my.cert", "--tls-key-file", "/app/config/my.key", "--address=0.0.0.0:51515", "--override-username=MyUser@MyServer", "--server-username=MyUser@MyServer", "--server-password=MyServerSecretIsGood"]

For the first run, you need a different entrypoint to generate the certificate (note the "--tls-generate-cert"). Use it only once, because it creates a new certificate every time it runs:
entrypoint: ["/bin/kopia", "server", "start", "--tls-generate-cert", "--tls-cert-file", "/app/config/my.cert", "--tls-key-file", "/app/config/my.key", "--address=0.0.0.0:51515", "--override-username=MyUser@MyServer", "--server-username=MyUser@MyServer", "--server-password=MyServerSecretIsGood"]

Have a look at the output of the first start, there should be something along the lines of:
SERVER CERT SHA256: 48537cce585fed39fb26c639eb8ef38143592ba4b4e7677a84a31916398d40f7
You need this fingerprint when setting up your backups from remote devices.

For adding users (you see your user in KopiaUI), the normal documentation is quite helpful. I just log into the container using Portainer, but that boils down to running bash in an interactive console with docker and then executing the commands there.

@auanasgheps
Owner

Thank you @Wastus your advice worked! I was able to create a working Kopia server!

@Wastus
Author

Wastus commented Oct 20, 2023

I've added the docker containers to the script, and that is also working quite well so far; thanks for that feature.
I'll add an optional output of the deleted files, because currently I can't really tell whether it's working as intended in all cases.

@auanasgheps
Owner

@Wastus what do you mean by this?
You mean that you're pausing/stopping containers using the built in feature?
What about the optional output?

I just want to understand the whole situation.

@Br33ce

Br33ce commented Mar 12, 2024

Would this solve the following "problem"?
I've got an extensive media library with .nfo files that contain the metadata. Those files get updated frequently, which I'm aware of, but it still triggers the diff check.
Excluding *.nfo would solve this, wouldn't it?

@tehniemer
Contributor

Yes, but you would exclude those files or locations in your snapraid.conf file, not this script. See section 7 of the manual

@Br33ce

Br33ce commented Mar 13, 2024

But with that approach they wouldn't get synced at all. I've read a workaround by having two .conf files and renaming them right before a sync / diff.
That's why I thought this script might be a more elegant solution.

@Wastus
Author

Wastus commented Mar 13, 2024

Actually the addition I wrote is exactly for those use cases: you have files that you know change a lot but should still be synced, yet you still want to stop a sync when only 20 other files changed.

There is currently no way to choose which kinds of changes the pattern is excluded from, so it applies to deletions and updates alike.

That might not be fine-grained enough control for you @Br33ce.
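
To make "applies to deletions and updates alike" concrete, the sketch below tallies every change type and drops any path matching an ignore pattern from all of them. It is an illustration only, not the code from the draft PR; the sample text, its `<action> <path>` line format and the name `tally_changes` are assumptions.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical diff excerpt; the "<action> <path>" format is an assumption.
SAMPLE_DIFF = """\
update movies/Alien (1979)/movie.nfo
update movies/Heat (1995)/movie.nfo
remove movies/Old/movie.nfo
remove movies/Old/movie.mkv
update documents/taxes.ods
add movies/New/movie.mkv
"""


def tally_changes(diff_output, ignore_globs):
    """Count changes per action, skipping paths that match any glob.

    The same globs are applied to every action, so an ignored path is
    left out of the removed and the updated totals alike.
    """
    counts = Counter()
    for line in diff_output.splitlines():
        action, _, path = line.strip().partition(" ")
        if not path:
            continue
        if any(fnmatchcase(path, glob) for glob in ignore_globs):
            counts["ignored"] += 1
        else:
            counts[action] += 1
    return counts


print(tally_changes(SAMPLE_DIFF, ["*.nfo"]))
```

With the `*.nfo` glob, the three .nfo changes land in `ignored`, while the deleted .mkv and the updated spreadsheet still count toward their thresholds.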

@Br33ce

Br33ce commented Mar 13, 2024

Hmm if I can only include the pattern "*nfo" it's fine grained enough because those nfo files will be updated and deleted (and I don't care 😅 in terms of snapraid). So I will wait for the PR then.

@tehniemer
Contributor

I originally misinterpreted the intent of this discussion and now understand and am very intrigued.

@auanasgheps
Owner

auanasgheps commented Mar 14, 2024

I like the idea of excluding files from the total calculations, but it requires some work. We can consider working on the PR in the future.
At the moment I'm focused on wrapping up the next release.

@Br33ce

Br33ce commented Mar 14, 2024

Sounds good! The next release is more important than this because it's only QOL 😊


4 participants