Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replicate all dependencies of a dataset first #572

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

JakobR
Copy link
Contributor

@JakobR JakobR commented Aug 1, 2020

Assuming we want to replicate the following pool:

NAME            USED  AVAIL  REFER  MOUNTPOINT              ORIGIN
testpool1      1.10M  38.2M   288K  /Volumes/testpool1      -
testpool1/A     326K  38.2M   293K  /Volumes/testpool1/A    testpool1/B@b
testpool1/A/D   303K  38.2M   288K  /Volumes/testpool1/A/D  -
testpool1/B    35.5K  38.2M   292K  /Volumes/testpool1/B    testpool1/C@a
testpool1/C     306K  38.2M   290K  /Volumes/testpool1/C    -

Note the clone dependencies: A -> B -> C.

Currently, syncoid notices that A and B are clones and defers syncing them.
There are two problems:

  1. Syncing A/D fails because we have deferred A.
  2. The clone relation A -> B will not be recreated since the list of deferred datasets does not take into account clone relations between them.

This PR solves both of these problems by collecting all dependencies of a dataset and syncing them before the dataset itself.


One problematic case remains: if a dataset depends (transitively) on one of its own children, e.g.:

NAME            USED  AVAIL  REFER  MOUNTPOINT              ORIGIN
testpool1/E    58.5K  38.7M   298K  /Volumes/testpool1/E    testpool1/E/D@e
testpool1/E/D  37.5K  38.7M   296K  /Volumes/testpool1/E/D  testpool1/A@d

Here, the first run of syncoid will fail to sync E/D.
I've chosen to ignore this case for now because

  1. it seems quite artificial and not like something that would occur in practice very often, and
  2. a second run of syncoid will successfully sync E/D too (although the clone relation E -> E/D is lost).

Assuming we want to replicate the following pool:

```
NAME            USED  AVAIL  REFER  MOUNTPOINT              ORIGIN
testpool1      1.10M  38.2M   288K  /Volumes/testpool1      -
testpool1/A     326K  38.2M   293K  /Volumes/testpool1/A    testpool1/B@b
testpool1/A/D   303K  38.2M   288K  /Volumes/testpool1/A/D  -
testpool1/B    35.5K  38.2M   292K  /Volumes/testpool1/B    testpool1/C@a
testpool1/C     306K  38.2M   290K  /Volumes/testpool1/C    -
```

Note the clone dependencies: `A -> B -> C`.

Currently, syncoid notices that `A` and `B` are clones and defers syncing them.
There are two problems:

1. Syncing `A/D` fails because we have deferred `A`.

2. The clone relation `A -> B` will not be recreated since the list of deferred datasets does not take into account clone relations between them.

This PR solves both of these problems by collecting all dependencies of a dataset and syncing them before the dataset itself.

---

One problematic case remains: if a dataset depends (transitively) on one of its own children, e.g.:

```
NAME            USED  AVAIL  REFER  MOUNTPOINT              ORIGIN
testpool1/E    58.5K  38.7M   298K  /Volumes/testpool1/E    testpool1/E/D@e
testpool1/E/D  37.5K  38.7M   296K  /Volumes/testpool1/E/D  testpool1/A@d
```

Here, the first run of syncoid will fail to sync `E/D`.
I've chosen to ignore this case for now because
1) it seems quite artificial and not like something that would occur in practice very often, and
2) a second run of syncoid will successfully sync `E/D` too (although the clone relation `E -> E/D` is lost).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant