
The RAID backend #479

Open
kenkendk opened this issue Aug 5, 2014 · 26 comments

@kenkendk
Member

kenkendk commented Aug 5, 2014

From kenneth@hexad.dk on September 27, 2011 15:48:04

The initial motivation for this feature is the fact that you can get free online storage space from multiple providers, but usually only a few gigabytes each. Getting the amount of space required for a decent photo collection usually means paying for it.

What if Duplicati had a backend that would use multiple storage providers and thus enable you to pool together all the small free storage options into a single large one?

This can be achieved by creating a meta-backend that has no configuration itself, but has a list of other backends and their options, similar to:
webdav://user:pass@host1
ftp://user:pass@host2 --use-ssl
...

The meta-backend would then create instances of the real backends, based on the configuration, and relay the requests to those instances. The UI side could also be a collection of the UIs for the real backends.
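As a rough illustration of the relaying, here is a minimal sketch in Python-style pseudocode (Duplicati's real backend interface is C# and differs from this, so the class and method names are only illustrative):

class MetaBackend:
    def __init__(self, backends):
        # 'backends' are already-constructed real backend instances,
        # e.g. one per line of the configuration list shown above
        self.backends = backends

    def list(self):
        # merge the file listings of all destinations
        names = set()
        for b in self.backends:
            names.update(b.list())
        return sorted(names)

    def get(self, name, target_path):
        # try each destination until one has the file
        for b in self.backends:
            if name in b.list():
                return b.get(name, target_path)
        raise FileNotFoundError(name)

    def put(self, name, source_path):
        # the placement policy goes here; the simplest choice uploads
        # the file to every destination (full mirroring)
        for b in self.backends:
            b.put(name, source_path)

    def delete(self, name):
        for b in self.backends:
            if name in b.list():
                b.delete(name)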

One problem with this is how to choose which backend to use, and here there is no golden solution. Some users would prefer that each backend holds a full copy (more resistant to failures), others would prefer that the data is spread out as much as possible (optimal space usage), others that each destination is filled first and then "spills" over to the next backend, etc.

Another problem is how to handle ambiguity, that is: if a file exists once but was supposed to exist twice, do we consider it deleted, or do we assume that the file should exist and the copy is erroneously missing? The same dilemma arises if we find the same file in two different versions: which one is the right one?

Stepping back a bit, it is clear that this is very similar to how RAID works, and the different solutions have different RAID names: RAID-0, RAID-1, RAID-N, etc. Apart from prioritizing the targets, RAID algorithms exist for solving these problems. The implementation should be able to handle any configuration of N destinations with R redundant copies (where R <= N).
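For the "N destinations with R redundant copies" case, the placement could be as simple as deterministically choosing R of the N destinations per remote volume. A minimal sketch (the hashing policy is just one possible choice and is not taken from Duplicati's code):

import hashlib

def pick_destinations(volume_name, backends, r):
    # deterministically choose r of the n destinations for one remote
    # volume, so the same placement can be recomputed later
    n = len(backends)
    start = int(hashlib.sha256(volume_name.encode()).hexdigest(), 16) % n
    return [backends[(start + i) % n] for i in range(r)]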

Even if the motivation is utilization of free online storage, this backend could also be used for enterprise backups that require geographically distributed copies.

Original issue: http://code.google.com/p/duplicati/issues/detail?id=479

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/3560186-the-raid-backend?utm_campaign=plugin&utm_content=tracker%2F4870652&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F4870652&utm_medium=issues&utm_source=github).
@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From imi...@gmail.com on September 27, 2011 13:48:27

Depending on which backends one chooses, it is conceivable that a user could aggregate 4 or 5 of them, and create a free virtual single backend with 40+ GB of storage. My initial reactions:

(1) Wow! Very exciting idea.
(2) This could be an invention, and you might want to explore protecting this idea as Intellectual Property.
(3) The prevailing business model of the cloud-storage industry is to give away the first 1-5 GB and profit from those users who fill up the free storage and purchase more. Your idea demolishes this model, and it might force cloud storage providers to re-think the free GBs, or to significantly modify their APIs to block Duplicati.

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From kenneth@hexad.dk on September 28, 2011 00:40:02

  1. Thank you :)

  2. I do not like (software) patents. Writing it here hopefully makes it "prior art", so no one can patent the idea.

  3. Yes, that could be a problem if Duplicati users reach critical mass, but I would think that only skilled users would be able to figure out how to do this. It could also be a problem if someone registers 10 accounts with a single provider and then pools them.

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From rst...@gmail.com on December 19, 2011 00:33:01

The first use case that comes into my mind is: I have 25GB of online storage from my provider and get 25GB from Windows Live SkyDrive. I have a few backup jobs that require 30GB. I do not want to define what job goes where. I simply want to store all jobs on my "25+25 Raid drive".

One thing we must make sure of: to restore files, it must be sufficient to point Duplicati at one of the targets, and it must be able to access all targets from there.

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From kenneth@hexad.dk on January 24, 2012 12:20:05

Issue 547 has been merged into this issue.

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From james.co...@gmail.com on February 17, 2012 11:05:50

Good idea, but would this also introduce a need for redundancy? Say, allowing 1 out of 3 providers to go down.

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From rst...@gmail.com on February 17, 2012 23:44:55

Asking for redundancy is a good question. We have not discussed this yet, but we are still collecting ideas and suggestions.

Pro

  • Reliability

Con

  • Servers with similar size required
  • Configuration required (might become difficult with many servers)

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From rst...@gmail.com on February 17, 2012 23:51:35

Just noticed that redundancy has also been suggested in issue 234. The question is whether both can be combined easily...

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From Daniel....@gmail.com on May 10, 2012 18:00:22

Yes to redundancy, i.e. having the same backup go to 2 or 3 different locations, but spanning or "RAID" across multiple online storage providers is a massive data loss waiting to happen. Too many providers are involved in one solution; providers change policies etc., and you've lost a chunk of data. Straight-up redundancy or mirroring, if you want to consider that RAID, would be best and wouldn't cause cloud storage providers to remove free storage. I want to have my backup go to 1 or 2 different cloud hosts as well as to a local NAS. This type of solution would work well.

@kenkendk
Member Author

kenkendk commented Aug 5, 2014

From jgbree...@gmail.com on January 16, 2014 18:46:47

I independently had this idea recently; glad it's being considered and I would love to see it! (Publishing prior art also, yay! It shows the idea is almost obvious, so it's less likely to be patentable.)

I think going the RAID way (it should be easy to copy the algorithms from e.g. the Linux implementation or other open-source code) is the most powerful, though the idiosyncrasies of each provider/API might take some careful and bug-prone code to combine well into consistent behaviour across the supported cloud storage. But I love the idea, especially with RAID 5 (or 1 or 6 or whatever). The potential performance improvement of "striping" may even show up (though it might just come down to network/provider speed at the time, and if you have more blocks on the slower provider, tough luck).

I even (as a developer) seriously thought about how I might do it with existing tools, and found s3ql (which, like this project, stores data on cloud back-ends, but only one at a time, and converts it into a mountable filesystem, a bit like another NFS).

Btw, lots of the comments made early on are still relevant to general backup and not specific to the technical issues this idea raises.

@bensoibj

bensoibj commented Apr 8, 2016

+1. Very good idea! It also came to my mind and, no surprise, it already exists (here or in #1265).

It would be great if the backend configuration could be modified afterwards: if your cloud storage is almost full, you simply add a further one for Duplicati. And if a provider closes its service for whatever reason, you should be able to move the backup data to another location and tell Duplicati about the move.

@shoeper

shoeper commented Mar 13, 2017

I also like this idea very much.

Asking for redundancy is a good question. We have not discussed this yet, but we are still collecting ideas and suggestions.
Pro
Reliability
Con
Servers with similar size required

One could handle it by having a group per redundant copy. Each group can contain any number of backends. Thus a user could balance the groups to have about the same amount of space, and Duplicati could make sure that each file is saved once in every group.
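A minimal sketch of that group idea (free_space() is an assumed helper, not an existing Duplicati API): each volume gets one upload target per group, e.g. whichever member of the group currently has the most free space:

def targets_for_volume(groups):
    # one target per group = one redundant copy per group
    return [max(group, key=lambda backend: backend.free_space())
            for group in groups]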

A good idea could be to have some tool that can keep the backends in sync. I have the following scenario in mind: at home I configure an online backup without redundancy. On my gigabit-connected server (which has only a little storage) I could use the tool with a config that includes redundancy, and it would replicate the data to match that configuration. In the best case this tool wouldn't even need my backup passphrase, but would just sync one source (the backend defined at home) with the others I define. This way one could get backups online fast and also make sure there are no issues if one provider closes, or whatever.

@technofab

I also agree. Multiple backup destinations are a good idea not only for redundancy but also for fault tolerance of the save operation.

@TuRDMaN

TuRDMaN commented Oct 16, 2017

Just leaving my comment here to say that I would LOVE the ability to set multiple destinations, in one way or another. The ideas that have been discussed here all sound great.

@cy2k

cy2k commented Mar 10, 2018

Just curious if this is something that might still be worked on? All I'm really looking for is what CrashPlan did, where you create one backup set, and target it to a local destination and a cloud destination. Exact same backup, just to two locations identically, so you don't have to maintain two separate jobs. Seems a bit simpler than what some of the other requests here are for, so I'm hopeful that it could be done.

@piegamesde
Contributor

If you have multiple backup destinations, I'd like to have control over the execution:

  • Execute all
  • Execute until one succeeds
  • Execute until one fails

This would cover most of the use cases of that kind (see the sketch below).
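A minimal sketch of those three modes (names and structure are made up for illustration):

def run_backups(destinations, run_one, mode="all"):
    for dest in destinations:
        ok = run_one(dest)          # run_one() returns True on success
        if mode == "until-success" and ok:
            break                   # stop once one destination worked
        if mode == "until-failure" and not ok:
            break                   # stop as soon as one destination fails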

@duplicatibot

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/merge-maybe-split-backups-on-stackip/10506/3

@duplicatibot

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/backup-to-multiple-cloud/13555/3

@duplicatibot

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/backup-to-multiple-cloud/13555/4

@duplicatibot

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/split-upload-to-different-destinations/16365/2

@DiagonalArg

Can't you just do this with already existing tools?

https://en.wikipedia.org/wiki/Rclone
https://en.wikipedia.org/wiki/Tahoe-LAFS

@ts678
Collaborator

ts678 commented Jan 6, 2024

Can't you just do this with already existing tools?

It would be a far quicker plan for an interested person to pursue that now than to hope Duplicati reinvents it.

Exact same backup, just to two locations identically

This might be rclone sync in Duplicati run-script-after.
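For example (paths and remote names are placeholders), the backup job could target a local folder and a post-backup script could mirror it with rclone:

--run-script-after=/usr/local/bin/mirror-backup.sh
(where the script runs something like: rclone sync /mnt/local-backup remote2:duplicati-backup)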

creating a meta-backend that has no configuration itself, but has a list of other backends and their options,

This might be rclone union with Duplicati Rclone backend, or maybe rclone mount instead to act as filesystem:

The union backend joins several remotes together to make a single unified view of them.
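For reference, a minimal rclone.conf sketch of such a union (remote names are placeholders; the exact option names depend on the rclone version, so check the rclone union docs):

[pooled]
type = union
upstreams = providerA: providerB:

Duplicati's Rclone backend could then be pointed at the pooled: remote.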

Whoever volunteers to look should probably read Duplicati Storage Providers to find something that Duplicati can already use. Going indirectly through the Duplicati Rclone backend will probably get a few more, but the more steps used, the slower it may be.

Fragility will increase, and troubleshooting will grow more complex, possibly involving multiple parties who supply various parts. While this is an argument for Duplicati to do it all (and well), the available volunteers are all occupied finishing the basic backup.

Another thing to look for in a third-party helper is portability. Duplicati runs on various OSes, and it would be nice if the favored helper did too.

Support for popular destinations would be good too. Tahoe-LAFS cloud support seems to be at the proposal level, but I'm not sure.
The developer documentation's Overview explains what Duplicati needs: nothing fancy is needed from the destination side, unlike the source side.

There might be a directory somewhere of candidate systems, but if it doesn't exist, some of them compare themselves to competitors.
JuiceFS Community Edition compares itself to six others, including s3ql (mentioned above). Some of the open-source projects look quite active. Some mention similar projects, so you can follow the leads until you find the right one. Ask if you have questions about how Duplicati uses a destination.

@LastDragon-ru

LastDragon-ru commented Jan 6, 2024

This might be rclone union with Duplicati Rclone backend,

Works well, but see #4748 and #4553.

@ts678
Collaborator

ts678 commented Jan 6, 2024

Works well, but see

Note the workarounds given in both: use the screen 5 options rather than screen 2. #4553 links to a possible code fix, but it's not in yet, which I suppose underscores my point that there is no lack of work on the basic backup to keep the available developers busy.

On another note, I forgot last time to urge people to heed provider terms of service. Some have rules about multiple accounts. Economically, I'm not sure it's worthwhile to gather free accounts when paid storage costs single digit US cents per GB-month. While this is an interesting idea (and maybe fun to try to see what can be done), some labor and maintenance issues also arise.

If you rclone sync to improve disaster resilience, remember that you'll need to recreate the local database unless you sync it as well. If you do sync it, note that it's unencrypted, so you might want to add encryption to the sync if you don't trust the remote.

@duplicatibot

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/quota-size-check-to-not-be-exceeded/17268/4

@LastDragon-ru

LastDragon-ru commented Jan 6, 2024

Economically, I'm not sure it's worthwhile to gather free accounts when paid storage costs single digit US cents per GB-month.

It may matter for some people in some countries. But RAID is useful not only for saving money. For example, I use it to have a local copy on an old HDD (a 10+ year old 2 TB Seagate, so technically it can die at any time...). A local backup also restores much faster than one in the cloud.

I personally would prefer better integration with rclone rather than a new storage backend: storing and editing the rclone config in the UI, built-in binaries, etc.

@ts678
Collaborator

ts678 commented Jan 6, 2024

But RAID is useful not only for saving money.

I agree, and that's why I mentioned other uses while wondering only if aggregating free accounts was worthwhile.

Regarding other countries, prices do likely vary (and currencies certainly do), but the Internet is mostly worldwide.
Some countries are firewalled in, you might want a service that works in your language, etc. There can be reasons.

Regarding integration (which is a good point because I'm suggesting someone search for things to integrate with),
you're reflecting my comment that there's labor and maintenance, though Duplicati labor might reduce your labor.

Integrating Rclone would be a separate enhancement request, probably similarly slowed by the lack of volunteers.
Regarding binaries, Rclone is native code on various CPU+OS combinations, while Duplicati is more portable. Native code complicates things; however, a future Duplicati might drop the portability of .NET Framework and Mono in favor of a more native scheme.
That's another thing that has to happen, even beyond the basic need to get Duplicati out of its current Beta state.
