Replicate all dependencies of a dataset first #572

JakobR · 2020-08-01T07:55:07Z

Assuming we want to replicate the following pool:

NAME            USED  AVAIL  REFER  MOUNTPOINT              ORIGIN
testpool1      1.10M  38.2M   288K  /Volumes/testpool1      -
testpool1/A     326K  38.2M   293K  /Volumes/testpool1/A    testpool1/B@b
testpool1/A/D   303K  38.2M   288K  /Volumes/testpool1/A/D  -
testpool1/B    35.5K  38.2M   292K  /Volumes/testpool1/B    testpool1/C@a
testpool1/C     306K  38.2M   290K  /Volumes/testpool1/C    -

Note the clone dependencies: A -> B -> C.

Currently, syncoid notices that A and B are clones and defers syncing them.
There are two problems:

1. Syncing A/D fails because we have deferred A.
2. The clone relation A -> B will not be recreated since the list of deferred datasets does not take into account clone relations between them.

This PR solves both of these problems by collecting all dependencies of a dataset and syncing them before the dataset itself.
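
To illustrate the ordering idea, here is a minimal Python sketch. It is not the PR's actual code (syncoid itself is written in Perl); the function name `replication_order` and the `origins` mapping are hypothetical and exist only for this example. It assumes a dataset depends on its parent dataset and, if it is a clone, on the dataset its origin snapshot belongs to, and it emits every dependency before the dataset itself.

```python
def replication_order(origins):
    """Return dataset names ordered so that dependencies come first.

    `origins` (hypothetical input) maps each dataset name to its origin
    snapshot ("pool/dataset@snap"), or to None if it is not a clone.
    """
    order = []
    done = set()

    def visit(name, stack):
        # Skip datasets already emitted or outside the replicated set.
        if name in done or name not in origins:
            return
        if name in stack:
            raise ValueError(f"dependency cycle at {name}")
        stack = stack | {name}

        # Assumption: a child can only be received once its parent exists.
        parent = name.rpartition("/")[0]
        if parent:
            visit(parent, stack)

        # A clone needs the dataset that owns its origin snapshot.
        origin = origins[name]
        if origin:
            visit(origin.split("@", 1)[0], stack)

        done.add(name)
        order.append(name)

    for name in sorted(origins):
        visit(name, frozenset())
    return order


origins = {
    "testpool1": None,
    "testpool1/A": "testpool1/B@b",
    "testpool1/A/D": None,
    "testpool1/B": "testpool1/C@a",
    "testpool1/C": None,
}
print(replication_order(origins))
```

For the pool above this prints the order testpool1, testpool1/C, testpool1/B, testpool1/A, testpool1/A/D: C precedes B, B precedes A, and A/D comes only after A.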

One problematic case remains: if a dataset depends (transitively) on one of its own children, e.g.:

NAME            USED  AVAIL  REFER  MOUNTPOINT              ORIGIN
testpool1/E    58.5K  38.7M   298K  /Volumes/testpool1/E    testpool1/E/D@e
testpool1/E/D  37.5K  38.7M   296K  /Volumes/testpool1/E/D  testpool1/A@d

Here, the first run of syncoid will fail to sync E/D.
I've chosen to ignore this case for now because

1) it seems quite artificial and not like something that would occur in practice very often, and
2) a second run of syncoid will successfully sync E/D too (although the clone relation E -> E/D is lost).
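
The remaining case could be detected up front with a check along the following lines. This is only a sketch under stated assumptions, not something the PR implements: `depends_on_own_child` and the `origins` mapping are hypothetical names, and the check follows only the chain of clone origins, so it covers the situation shown above but not every possible transitive dependency.

```python
def depends_on_own_child(name, origins):
    """Return True if `name` is, via its chain of clone origins,
    a clone of one of its own descendants.

    `origins` (hypothetical input) maps dataset names to their origin
    snapshot ("pool/dataset@snap"), or to None for non-clones.
    """
    seen = set()
    current = name
    while True:
        origin = origins.get(current)
        if not origin:
            return False
        current = origin.split("@", 1)[0]
        if current in seen:
            # Defensive guard against malformed input with an origin loop.
            return False
        seen.add(current)
        if current.startswith(name + "/"):
            return True


origins = {
    "testpool1/A": None,
    "testpool1/E": "testpool1/E/D@e",
    "testpool1/E/D": "testpool1/A@d",
}
for dataset in sorted(origins):
    print(dataset, depends_on_own_child(dataset, origins))
```

Only testpool1/E is flagged here, since its origin snapshot lives in its own child E/D. A tool could use such a check to warn about the dataset instead of failing on the first run.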


phreaker0 mentioned this pull request Apr 25, 2023

Fail on missing clone #817

Open
