Add a dry-run option to dregsy #71

Open

wants to merge 1 commit into base: master
Conversation

daconstenla
Contributor

This code adds a new optional parameter, --dry-run, to dregsy's command line.
It allows easier debugging of tag matching for a given configuration, without syncing everything to the target Docker registry.

  • faster
  • simpler
  • cheaper
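A minimal sketch of such a flag, using Go's standard flag package. The flag-set name and helper function are assumptions for illustration, not dregsy's actual code:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseDryRun parses the command line and reports whether --dry-run was
// given. Hypothetical helper; dregsy's real flag handling may differ.
func parseDryRun(args []string) (bool, error) {
	fs := flag.NewFlagSet("dregsy", flag.ContinueOnError)
	dryRun := fs.Bool("dry-run", false,
		"only report which tags would be synced, without pushing anything")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *dryRun, nil
}

func main() {
	dry, err := parseDryRun(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if dry {
		fmt.Println("dry run: no changes will be made")
	}
}
```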

@xelalexv
Owner

xelalexv commented Jun 1, 2022

Great PR! Thanks! A dry-run is definitely a good idea.

Before going into review, could you please remove the commit for building on ARM64? This is a separate concern and should be addressed separately. I think it's best to first open an issue for that. I've been thinking about adding cross-compilation to the project recently, but would probably solve that by moving the build into hack/devenvutil, and just make parameterized calls to that from the Makefile.

Once the commit is removed, please squash the remaining two commits, I'll then review.

@daconstenla daconstenla force-pushed the add-dry-run-command branch 2 times, most recently from c47d16a to 4ed09b5 on June 1, 2022 17:05
@daconstenla
Contributor Author

Removed the M1-related changes and stuck to the dry-run feature.

While I was in the code, I included some additional information:

  • show tags from the target registry
  • show the difference (expanded tags) - (existing tags in target) ~ tags to be synced
  • show the difference (existing tags in target) - (expanded tags) ~ tags not synced with the current config

which might also help with finding problems and/or understanding the current state (and maybe with cleanup on the target).
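The two subtractions above boil down to a set difference over tag lists. A minimal sketch, with the helper name assumed:

```go
package main

import "fmt"

// tagDiff returns the tags in a that are not in b, preserving a's order.
func tagDiff(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, t := range b {
		seen[t] = true
	}
	var out []string
	for _, t := range a {
		if !seen[t] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	expanded := []string{"1.20", "1.21", "latest"}
	existing := []string{"1.20", "1.19"}
	fmt.Println(tagDiff(expanded, existing)) // tags that would be synced
	fmt.Println(tagDiff(existing, expanded)) // target tags not covered by the config
}
```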

@xelalexv
Owner

xelalexv commented Jun 1, 2022

I've just merged another PR, so you will have to rebase. Sorry about that! But there are not that many conflicts, so it shouldn't be too complicated.

Mea culpa

@daconstenla
Contributor Author

Saw it, so I just did it @xelalexv; the code is ready to be reviewed 😜

xelalexv added a commit that referenced this pull request Jun 2, 2022
@xelalexv
Owner

xelalexv commented Jun 2, 2022

I did a first review. Here are my comments:

  • I think we should push dry run handling into the relays. The code for expanding source tag lists is already present there, and auth & TLS verify work correctly. With the current implementation, getting tag lists may fail for certain registries, since certDir is not set.

  • Going into the relays would additionally exercise all the mechanics involved, just short circuit the actual sync. Have a look at the branch I pushed, it's a sketch of what I have in mind.

  • Target tag lists and result output could then be done via a helper object. This could actually be set on the relay instead of just a dry run flag.

  • When retrieving target tag lists, we need to consider the case where the target repo does not yet exist (we should not create it during a dry run). This is easy since all expanded source tags would then apply. We just need to handle it properly.

  • We could improve the output. The dry-run log output seems a bit hard to read, in particular when many tags are involved. Maybe we could create a JSON/YAML/diff file or whatever instead of writing to log. That could then also be processed with generic tools.

  • Make all interval tasks one-off. We may not want the dry-run to run indefinitely.
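For the missing-target-repo case, one way to handle it is to treat a not-yet-existing repo as an empty tag list, so every expanded source tag counts as "would be synced" and nothing gets created during the dry run. A sketch; the sentinel error and function names are assumptions:

```go
package main

import (
	"errors"
	"fmt"
)

// errRepoNotFound stands in for a registry answering 404 on the tag list;
// name and mechanism are assumptions for this sketch.
var errRepoNotFound = errors.New("repository not found")

// targetTagsForDryRun wraps a tag-list lookup so that a missing target
// repo simply yields an empty list: all expanded source tags then apply,
// and we never create the repo during the dry run.
func targetTagsForDryRun(list func() ([]string, error)) ([]string, error) {
	tags, err := list()
	if errors.Is(err, errRepoNotFound) {
		return nil, nil
	}
	return tags, err
}

func main() {
	// simulate a target repo that does not exist yet
	tags, err := targetTagsForDryRun(func() ([]string, error) {
		return nil, errRepoNotFound
	})
	fmt.Println(tags, err)
}
```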

@xelalexv
Owner

xelalexv commented Jun 3, 2022

I've just noticed that starting with Skopeo 1.8.0, the sync command which we use in dregsy also introduces a dry-run mode. I think we should take advantage of that once that Skopeo version is available via packages in Ubuntu and Alpine. This will have an impact on the design of this feature.

@xelalexv
Owner

xelalexv commented Jun 3, 2022

I've just switched to building our own Skopeo binary, and merged those changes. So we can now use latest Skopeo release and start experimenting with Skopeo's dry-run feature.

@daconstenla
Contributor Author

I've just switched to building our own Skopeo binary, and merged those changes. So we can now use latest Skopeo release and start experimenting with Skopeo's dry-run feature.

I'm going to rebase and give it a try, thanks @xelalexv 👌

@xelalexv
Owner

xelalexv commented Jun 3, 2022

I'll be away for the next two weeks, so no need to rush 😉

daconstenla pushed a commit to daconstenla/dregsy that referenced this pull request Jun 3, 2022
@daconstenla
Contributor Author

In my initial attempt to push --dry-run down to the relays:

On skopeo's relay, I tried skopeo's native --dry-run option, and the result is:

  • it requires using the sync subcommand instead of copy (it reminded me of Allow use of skopeo's sync command instead of copy #58)
  • it requires reformatting the parameters if we use sync
  • it is not as fast as the proposed solution, because it actually launches skopeo once per tag to be synced

On docker's relay:

  • there is no native --dry-run option available, since the relay uses docker push directly; I guess we could just print the command if we wanted to implement some kind of dry-run execution.

Conclusion: maybe we should delegate tag retrieval to the relays, but not the actual dry-run execution.

I'm going to try it out and will share more details; for now I'm pushing the current state, which uses the native dry-run option only in skopeo's relay.

@xelalexv
Owner

xelalexv commented Jun 3, 2022

Ah, sorry about that! I was jumping to conclusions here. I guess I was too thrilled to see a dry run option in the Skopeo release notes 🤣 But we use skopeo copy, not sync, so that doesn't help for now. So I agree: we just push dry run into the relays, short circuit copy for Skopeo and pull & push for Docker, and make the output available in some form.

daconstenla pushed a commit to daconstenla/dregsy that referenced this pull request Jun 7, 2022
@daconstenla
Contributor Author

Interestingly, while going over the relay code I found that the Docker relay depends on the Skopeo relay, because docker does not support listing the tags of a repository. That makes me wonder whether the relays are the right place for the dry run to happen. The biggest advantage is that it keeps things more flexible and adjustable at relay level, but the source of information is always skopeo's implementation 😌.

I wonder if we should implement (maybe outside of this PR) a custom relay function that lists images without the skopeo dependency, using plain HTTP requests to the registry:

# list repositories
curl 'http://localhost:5000/v2/_catalog'
{"repositories":["mirror/nginx","nginx"]}

# list tags for nginx
curl 'http://localhost:5000/v2/nginx/tags/list'
{"name":"nginx","tags":["latest"]}

# list tags for mirror/nginx
curl 'http://localhost:5000/v2/mirror/nginx/tags/list'
{"name":"mirror/nginx","tags":["1.21.1-perl","1.20-perl"]}
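For illustration, the same tags/list call could be made from Go without the skopeo dependency. This is only a sketch of hitting the Registry V2 endpoint shown above: it ignores auth, TLS, and Link-header pagination, and the function names are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

type tagList struct {
	Name string   `json:"name"`
	Tags []string `json:"tags"`
}

// parseTagList decodes a /v2/<repo>/tags/list response body.
func parseTagList(data []byte) ([]string, error) {
	var tl tagList
	if err := json.Unmarshal(data, &tl); err != nil {
		return nil, err
	}
	return tl.Tags, nil
}

// listTags fetches the tag list for a repo from a V2 registry, e.g.
// listTags("http://localhost:5000", "mirror/nginx"). A real relay would
// also need auth, TLS verification, and paging via the Link header.
func listTags(registry, repo string) ([]string, error) {
	resp, err := http.Get(fmt.Sprintf("%s/v2/%s/tags/list", registry, repo))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("registry returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseTagList(body)
}

func main() {
	if tags, err := listTags("http://localhost:5000", "nginx"); err == nil {
		fmt.Println(tags)
	}
}
```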

@daconstenla
Contributor Author

daconstenla commented Jun 7, 2022

⚠️ I just realised the dry-run output generated a single file with only the latest task, a newbie mistake 🤦
Adjusted: now one file per task * mapping

daconstenla pushed a commit to daconstenla/dregsy that referenced this pull request Jun 16, 2022
@daconstenla
Contributor Author

@xelalexv I've just rebased the code, could we take a look at the current state? Remarks? Changes?

@daconstenla
Contributor Author

Just removed all the changes not related to the task at hand and squashed them into a single commit 👌

@daconstenla daconstenla force-pushed the add-dry-run-command branch 2 times, most recently from 40fdc27 to 49a2cd8 on June 20, 2022 12:18
@daconstenla
Contributor Author

Just rebased it again 👌

@xelalexv
Owner

I'm preparing a release just now, so you'll unfortunately have to rebase once more, but it should be trivial. With the release out of the way, we can continue the review of this PR.

and obtain the final list of labels that'll be synced as well as tags that
are in the target repo and are not updated based on the current configuration
@daconstenla
Contributor Author

Rebased once more 👌

@xelalexv
Owner

First my apologies for the really long delay! Right after my vacation, I was totally swamped with work.

Now for the PR. It's great progress! Here are a few comments on things we could improve further:

  1. Since I recently changed the relay.Sync(...) interface to take a SyncOptions struct, it's maybe cleaner now to not persist the dry run flag in relay instances, but add it to the SyncOptions when calling relay.Sync().

  2. I'm not quite sure about dumping the dry run result into separate output files. I'd prefer a single output. Also, the instantiation of the map you're passing to util.DumpMapAsJson is the same in both relays. Since it's quite large, and the keys contained in the map may change as we develop dry run further, the code duplication is a concern. I think it's justifiable to create a dedicated type for dry run result collection, which could be passed in the sync options, and could also function as the dry run flag (see above).

  3. We should also maintain a hierarchy in the collected data, e.g. instead of saying:

    {
        "task name": "task 1",
        "task index": "0",
        ...
    }

    we could say:

    {
        "task": {
            "name": "task 1",
            "index": "0"
        },
        ...
    }

    I think this could make post-processing of dry run results a lot easier.

  4. It should be possible to choose between YAML & JSON for output.

  5. Have you tried this with source/target registries that require authentication?

@daconstenla
Contributor Author

Thanks for the improvement suggestions @xelalexv.

  1. Since I recently changed the relay.Sync(...) interface to take a SyncOptions struct, it's maybe cleaner now to not persist the dry run flag in relay instances, but add it to the SyncOptions when calling relay.Sync().

Does this mean we expect dry-run to be configured on task level? I mean SyncOptions are on task level, right? 🤔

  2. I'm not quite sure about dumping the dry run result into separate output files. I'd prefer a single output. Also, the instantiation of the map you're passing to util.DumpMapAsJson is the same in both relays. Since it's quite large, and the keys contained in the map may change as we develop dry run further, the code duplication is a concern. I think it's justifiable to create a dedicated type for dry run result collection, which could be passed in the sync options, and could also function as the dry run flag (see above).
  3. We should also maintain a hierarchy in the collected data, e.g. instead of saying:

Got it, let me try to do so.

  4. It should be possible to choose between YAML & JSON for output.

Could it be a further iteration?

  5. Have you tried this with source/target registries that require authentication?

Yes, I did, and it has worked so far. In fact, I've been using this development version for dry-run validations for the last few weeks.

@xelalexv
Owner

  1. Since I recently changed the relay.Sync(...) interface to take a SyncOptions struct, it's maybe cleaner now to not persist the dry run flag in relay instances, but add it to the SyncOptions when calling relay.Sync().

Does this mean we expect dry-run to be configured on task level? I mean SyncOptions are on task level, right?

Hadn't thought about that. Right now it's just about persisting the dry-run flag cleanly in only one place. Passing this then each time via the SyncOptions would open up a way to do per task dry-runs, amongst other things. Whether per task dry-run is meaningful is a different story.

  4. It should be possible to choose between YAML & JSON for output.

could it be a further iteration?

Of course. We can start with JSON, and then extend later. We just need to keep this in mind during initial implementation.

  5. Have you tried this with source/target registries that require authentication?

Yes, I did, and it has worked so far. In fact, I've been using this development version for dry-run validations for the last few weeks.

Great. I'll try myself later on.

@daconstenla
Contributor Author

I'm not quite sure about dumping the dry run result into separate output files. I'd prefer a single output. Also, the instantiation of the map you're passing to util.DumpMapAsJson is the same in both relays. Since it's quite large, and the keys contained in the map may change as we develop dry run further, the code duplication is a concern. I think it's justifiable to create a dedicated type for dry run result collection, which could be passed in the sync options, and could also function as the dry run flag (see above).
We should also maintain a hierarchy in the collected data, e.g. instead of saying:

I'm trying to think about how to do this, and now I remember why I generated one file per task. Right now we call s.relay.Sync(...) once per task, which means that inside the function we only know about the task currently being executed. If we wanted to generate a map with all tasks' results, we'd need to return the --dry-run results back to ./internal/pkg/sync/sync.go and then write all the data at once.

Ideas? comments?

@xelalexv
Owner

... I think it's justifiable to create a dedicated type for dry run result collection, which could be passed in the sync options, and could also function as the dry run flag (see above)...

I'm trying to think about how to do this, and now I remember why I generated one file per task. Right now we call s.relay.Sync(...) once per task, which means that inside the function we only know about the task currently being executed. If we wanted to generate a map with all tasks' results, we'd need to return the --dry-run results back to ./internal/pkg/sync/sync.go and then write all the data at once.

Ideas? comments?

In the simplest form the collector could for example just be a map that we create at the start of a dry-run, and which we set in SyncOptions each time we sync a task. Each task then adds its dry-run output under its key. When all tasks are done, we can post-process the collected outputs. We could take this further by making this map a type and attach the post-processing functionality to it.
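That minimal map-based collector could look roughly like this; the SyncOptions field and function names are assumptions about dregsy's interface, not its actual code:

```go
package main

import "fmt"

// SyncOptions sketch: the collector map travels with every Sync call.
type SyncOptions struct {
	DryRun map[string][]string // nil means: this is not a dry run
}

// syncTask short-circuits the real sync when a collector is present and
// records the tags that would have been synced under the task's key.
func syncTask(name string, wouldSync []string, opts *SyncOptions) {
	if opts.DryRun != nil {
		opts.DryRun[name] = wouldSync
		return
	}
	// ... the real sync would run here ...
}

func main() {
	// the map is created once, at the start of the dry run
	opts := &SyncOptions{DryRun: map[string][]string{}}
	syncTask("task 1", []string{"1.21", "latest"}, opts)
	syncTask("task 2", []string{"1.19"}, opts)
	// all tasks are done: post-process the collected outputs in one place
	fmt.Println(len(opts.DryRun), "tasks collected")
}
```

Promoting the map to a named type, as suggested, would let the post-processing (JSON dump, later YAML) live as methods on the collector itself.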
