Support explicit relative paths using common /./ notation #4762
Comments
Take a look at #4026 which proposes to change the absolute path model and is under consideration. |
@damoclark I did see that ticket while I was looking for workarounds, but that doesn't address the issue that it would still store useless (and wrong for the purposes of restore) path information. It also seems like a more significant undertaking, because it re-architects what restic uses to identify backups, so this proposal seems like a much less impactful change. |
It does address the issue you describe, using the
A novel approach, but I don't think it is a good idea encoding special meaning within paths using the current directory
Unfortunately, I think re-architecting this part of restic is necessary. If done well, impact can be minimised. In fact, I actually think it will make restic easier and more intuitive to use. And yes, it will be a substantial undertaking. An interim solution might be worthwhile, but the challenge with half measures, such as the one you suggest is that they might appear simple, but risk breaking other things, such as parent matching. These ideas are discussed in #4026. Mucking around with the absolute paths under the current model, in my view, is risky. I appreciate the need for a quick fix for your use case. I have similar requirements, although mine are for backing up LVM and APFS snapshots, rather than ZFS, and for storing backup archives. I propose that the sooner the present model that relies upon absolute paths for identifying backup-sets is replaced, the sooner and easier a great number of existing issues and limitations of restic can be resolved. The evidence to me suggests it's worth the trouble. In the meantime, there are some clever hacks you can use to work around your zfs issue, using symbolic links. They are barbaric, but they work if you are interested. :) |
I never claimed it was never useful. I said in my case it wasn't useful since it was wrong in my specific situation. Which is why I'm looking for a way to override the default assumption that the path things are stored at now (backup/snapshot creation time) is meaningful in the future, which is often correct, but sometimes not.
You are correct that that could potentially happen, and if it is desired to protect against this "unintended" use, by all means, add a However it should be noted that if some script specifies a relative path and then some other part blindly prepends something to that path such that it ended up with an embedded
Perhaps. But those semantics do not actually handle the example I've given in this ticket. If you'll note, I've specified different relative roots in the paths to be backed up in my example, which
I don't know that I'd go so far as necessary, but I do agree that it seems like a good addition. But your proposal also does not completely solve my use case. Introducing labels does not alleviate the problem that useless and distracting "wrong" paths are being stored as an intrinsic part of the backup, and there is no mechanism to correct them. But I also note that your proposed re-architecture ticket has been around for about 18 months without anyone showing any indication of being willing to take it on. In fact, you noted yourself in #4026 (comment) that it doesn't look like it will be implemented anytime soon. Whereas the change I've proposed is much simpler, and I might even be willing to take a stab at making a PR for it, assuming there was some indication from the maintainers that it might be accepted.
I don't see how symbolic links could help, since restic resolves paths to absolute, which would undo any symbolic links. I already mentioned the only reasonable workaround I see for this in the OP, which is using bind mounts. I do agree that having to use bind mounts is pretty barbaric (as well as an incomplete solution). Can you explain how symlinks can help? |
Hi Evan,
I understand now what you are saying.
Actually, they do if I have understood your requirements correctly. You want:
Hope I understand this correctly.
This is incorrect.
restic backup -x --time "2024-01-01 02:00:00" --host argon \
    -C /mnt/backups/.zfs/snapshot/20240101-020000/argon/root/ . \
    -C /mnt/backups/.zfs/snapshot/20240101-020000/argon/ boot \
    -C /mnt/backups/.zfs/snapshot/20240101-020000/argon/ usr \
    -C /mnt/backups/.zfs/snapshot/20240101-020000/argon/ var
Again, based on my correct interpretation of your requirements. :)
Strong words, I know. :). My confidence in making such a bold statement is based on my observations in the forums and the github issues over an extended period of time. The current model confuses people, generates lots of questions, tricky feature requests, and workarounds, taking up maintainers' time that I think could be better spent. That is why I think a change to the underlying model is necessary.
You need to read #4026 more thoroughly, as it's a discussion with evolving ideas based on contributions from others. I actually prefer to use 'name' rather than label - naming backup-sets.
Yes. As you can tell, I was disappointed by that revelation. But I'm not doing all the work and I fully respect that.
Go for it. I see you have referenced #3200. Have a good read, as it would be a great place to start. I was disappointed that Alex closed that PR. Also, have a good read of #4026 and Michael's comments in particular. Cracking this nut, if it can be done in the short term, would benefit many (including me). My only concern is that some users could get themselves in a 'bit of a pickle' by 'fudging' the paths of a snapshot without decoupling its implicit meaning as a backup set identifier, especially if they don't know what they are doing, or make a mistake.
I made a mistake here. You are correct that symbolic links won't help you with zfs. The symbolic links have helped me with APFS, because it's a single volume that I am backing up (Macintosh HD - Data) and I don't need to reassemble different mount points. Essentially, for each APFS snapshot, I create a symbolic link This way, the absolute path is always I think you are correct RE bind mounts for zfs. D. |
I was not aware that -C could be specified multiple times for tar. But even if that is the case, I would guess that However after already having had a cursory look at how restic processes paths to be backed up (as a single list of paths), it would seem significantly more complicated to have to change all the internals to be able to handle a more complicated data structure, whereas it would seem that my suggested method could still be done with the same simple list of paths, since both items (actual and wanted path) are communicated within each individual path. |
No 'ifs'. It's a fact. Try it for yourself. BSD & GNU tar alike.
I have already done the research on this, in terms of how tar behaves and how restic could better implement it - it's in #4026. I'm not going to restate things here.
This is not a convincing argument to me. What is identical is the concept. That is what's important. Those who are already familiar with the concept will be able to easily apply it to restic. It's also a simple concept, so those who aren't will learn a new concept.
I strongly advise you not to pursue your proposed approach of using the And naturally, it's your call as the PR initiator, and it's the Restic team's call as to whether they accept it.
But... Without turning into a computer science lecture, you are creating a semantic overload. Encoding special meaning in the paths can only cause unintended grief for users down the road. I have already provided a concrete example in the way of concatenating user-provided path segments. So I'm not just speculating.
Part of the issues with the current model of the paths being used to match parent snapshots is that the paths have been endowed with a special meaning beyond their literal meaning of being a path to 'what' is being backed up. The path can change, while the 'what' may not. It is why, for instance, databases use arbitrary incremental numbers as primary keys (i.e. surrogate versus natural keys).
The Furthermore, with the tar syntax tar cv -C /home fred margot barry say, if you only wanted to back up those three home directories, relative to /home. The concept is: We teach this simple concept to users who don't already know it in the doco, and these semantics become very powerful, without the potential aforementioned problems. This problem has already been solved - let's stand on the shoulders of giants. :)
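For anyone who wants to try the tar behaviour referred to above, here is a minimal sketch. It assumes a tar that accepts several -C options in create mode (GNU tar and bsdtar both document this). The directory names are made up for the demonstration and only loosely mirror the snapshot layout discussed in this thread.

```shell
#!/bin/sh
# Throwaway tree standing in for a snapshot directory (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/snap/argon/root/etc" "$tmp/snap/argon/boot"
echo "hosts"  > "$tmp/snap/argon/root/etc/hosts"
echo "kernel" > "$tmp/snap/argon/boot/vmlinuz"

# Each -C changes directory before the members that follow it are added,
# so members are stored relative to that directory, not by absolute path.
tar -cf "$tmp/out.tar" \
    -C "$tmp/snap/argon/root" etc \
    -C "$tmp/snap/argon" boot

# List the stored member names, then clean up.
listing=$(tar -tf "$tmp/out.tar")
printf '%s\n' "$listing"
rm -rf "$tmp"
```

The listing contains etc/hosts and boot/vmlinuz without any trace of the temporary directory, which is the "relative to a chosen root" effect being discussed. Whether and how restic would adopt such semantics is exactly what is under debate here.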
I think you are arguing that your solution is much quicker and easier to implement than #4026. If I have misunderstood, disregard my response below and help me understand. I am not contesting that your short-term solution would be easier, notwithstanding the challenges of retrospectively enabling path mapping and/or relative paths under a model that is heavily reliant on absolute paths, and of doing so in a way that doesn't compound the issues of semantic overload. That is where I don't think your short-term solution is necessarily easier, and will require careful work. This is why I have been advocating for changing the model earlier, rather than later. By doing so, these problems (and many others) become much easier to solve. Or at least I think they will be. Again, I am guided by those who are very familiar with the code-base here. Very happy to have intellectual discussion and debate on these ideas with you. But with sincerity, I would prefer it if you do the readings I suggested earlier first. This way, we have a common starting point for discussing new ideas going forward. Damien. |
This discussion completely mixes two only partially related issues. #4026 primarily addresses how to identify backup sets (something similar can already be achieved using
That won't behave as you'd expect. The
There is a resistance to blindly adding flags without first discussing whether it is actually necessary; things should just work without requiring users to assemble the right magic combination of options. More flags that interact with each other in surprising ways don't help anyone. Although, the file layout issue will likely require some additional options.
I can only take a look at so many things at a time, which also means that I can't get involved in every discussion as otherwise I wouldn't get anything done.
The usual rule of thumb is that a more complicated implementation is ok if that simplifies the interface (all within certain limits obviously).
I won't have time to have a closer look at this and the related issues before the development cycle for restic 0.18.0 starts. Probably anything you start working on now won't match the outcome of the corresponding discussions. |
Important things first. My comment wasn't a criticism. It is okay that I am disappointed - you don't owe me, or anyone else anything. You have to make decisions and they can't always please everyone. And I take your point RE discussion. I'm making my reply deliberately brief. :)
Not entirely, Michael. In fact, this point was discussed at length in the issue. But let's leave that for a more appropriate time...
No problem. Appreciate the clarity.
This important point slipped by me, due to my lack of experience with ZFS. Thanks for this Michael. There are proposed ideas in #4026 and elsewhere to address this - at a later time.
Thanks for this. I totally missed #555. And especially that rsync already partially adopts what Evan was proposing by applying special meaning to
I share your philosophy. |
That is actually unrelated to ZFS. Using Linux, you can only mount a volume over a folder. That is to mount a volume at some path you have to use |
Yes, I understand this. My misunderstanding was with how ZFS was arranging its mounted snapshots. What you describe is an important problem to solve for restic going forward, but it has been solved by others already. This provides some guidance on how to approach it within the model of restic. GNU and BSD tar approaches are a good place to start as discussed in #4026. I have great respect for the work you and your team do on Restic. Backup technologies are high-stakes complicated beasts, especially when they are multi-platform (including the black sheep Windows juggernaut). |
Output of restic version
restic 0.16.4 (v0.16.4-0-g3786536dc) compiled with go1.21.8 on linux/amd64
What should restic do differently? Which functionality do you think we should add?
Add support for explicit relative path designations (specifying the intended root of path) using common /some/path/./another/path notation to anchor the portion of the path to be stored.
Backups and exclude/includes should be able to be specified with the new "root" of the path starting after the /./ part. So the path above becomes /another/path for all storage/referencing purposes, including: parent use/detection, include/excludes, etc.
This is a simpler, and already known method of accomplishing the indication of relative path intent.
(See also: #2714, #2246, #2993, #3131, #3200, and possibly others)
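To make the proposed mapping concrete, here is a minimal sketch in plain POSIX shell. The helper names source_path and stored_path are hypothetical and are not part of restic. The sketch also assumes that the first /./ in a path is the anchor, which the proposal does not spell out.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the proposed /./ anchor; not restic code.

# source_path: where the files really live on disk (the anchor is dropped).
source_path() {
    case "$1" in
        */./*) printf '%s/%s\n' "${1%%/./*}" "${1#*/./}" ;;
        *)     printf '%s\n' "$1" ;;   # no anchor: path is used unchanged
    esac
}

# stored_path: the path as it would be recorded, rooted after the first anchor.
stored_path() {
    case "$1" in
        */./*) printf '/%s\n' "${1#*/./}" ;;
        *)     printf '%s\n' "$1" ;;   # no anchor: path is stored unchanged
    esac
}

source_path /some/path/./another/path   # prints /some/path/another/path
stored_path /some/path/./another/path   # prints /another/path
```

Under this reading, two snapshot directories that differ only in the part before the anchor would map to the same stored path, which is the property the request relies on for parent detection.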
The forced absolute path design that restic currently uses to identify backups actually causes several problems, and there are several tickets (some of which are listed above) already trying to solve or abate them.
What are you trying to do? What problem would this solve?
In my current use case, I have backups that were done via zfs snapshots, and I'd like to "move"/convert them into restic instead.
However, since each snapshot appears within zfs under its own unique snapshot directory, restic resists being informed that those are actually the same files, just from another point in time, and clutters the snapshot listing with path information that is completely irrelevant, as well as breaking restic's parent and file-change detection and forcing all files to be completely rescanned. It may also result in restic being less efficient at storing those changes (this is just a guess).
Example command:
Admittedly, this can be worked around by using bind mounts to position the directories to be backed up in a manner that restic can no longer mistakenly infer that they are different sources, but that requires root privileges and is overly convoluted when this capability should (I think) be provided within restic. And the bind mount workaround still does not allow the paths to actually be stored/referenced at the actual root of the filesystem if that is what is desired.
Did restic help you today? Did it make you happy in any way?
I massively appreciate restic being able to reduce the overall storage needs for backups of often changing but otherwise relatively very similar files like mbox mail files, where one message deleted near the beginning of the file causes the entire rest of the file to be rewritten (and thus changed as far as zfs is concerned) even though the actual total data changed may have only been a few kb deleted.