Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c8d: Just enough Windows support to run the test suite #46539

Merged
merged 7 commits into from
Jan 17, 2024

Conversation

TBBle
Copy link
Contributor

@TBBle TBBle commented Sep 24, 2023

- What I did

Building upon #46402 #46572 to bring the containerd image-store support on Windows up to something that is worth having tests enabled for.

- How I did it

Firstly, cherry-picked #46402 onto the mainline branch, once containerd 1.7.6 had been vendored by #45966. Update: Currently based directly on #46402 until it merges, unless something important lands in mainline first.

Secondly, got the snapshotter tests functional on Windows matching the Linux version. (Two-liner, most of the work was done by @rumpl in #46402).

Violently beat my head against Iteratively worked through failures observed in the integration test suite setup, building the busybox image. As it happens, this step does most of the things we seem to care about, so it was a good test-case.

Rebased the results of the above after #46572 landed, which integrated the outcomes of the second paragraph already.

Remaining TODOs to undraft this PR:


- Hack 1 (53777db)

builder-next specifies the host network provider, but that fails on Windows, blocking daemon startup even though we don't currently support builder-next.

My current hack is to use auto, since that means cni if configured and if not, on Windows it falls back to null, and on Linux it falls back to host.

I've currently only done this for Windows with a runtime.GOOS test, but I'm inclined to think we should use auto always, since: it does the right thing on Linux, and if we ever do need a more-complex network setup, CNI is the obvious path. If we ever do get builder-next working for Windows, it'll need to use CNI anyway, even for the equivalent of host networking.


- Hack 2 (6e48f48 or b099f42)

The windowfilter graphdriver is a bit weird: When you "mount" an image from it, it actually only does the volume setup (Prepare and Activate in HCS terms) and then returns the \\?\Volume{GUID}\ formatted volume name, where other graphdrivers (and all the containerd snapshotters) actually mount the image to a host directory and return that directory.

However, starting a container as we do now requires us to pass that \\?\Volume{GUID}\-formatted volume name in the OCI spec as Root.Path.

Normally, containerd doesn't expose that volume name to callers, but we can use the underlying Windows API to get the volume name from the mount point, which relies on internal knowledge of the containerd/mount implementation (earlier implementation of the same idea that I'm less confident in).

The BindFilter approach was pretty messy though, and I didn't like it very much. (There's probably a bunch of overlooked edge cases there too. I have no idea why I see four copies of the bind-mount there, which is another vote against the older BindFilter version.) The GetLayerMountPath isn't too bad, it's still poking at internal details, but they're details that containerd's mount package could trivially provide an API to expose, if we go this way.

Beyond cleaning it up, an alternative approach would be to not populate Windows.LayerFolders nor Root.Path at all (removing the need for 1ad566e as well) and instead use containerd's WithSnapshot and WithSnapshotter config-functor-things when setting up the container.

However, this would, at least on Windows, require us to unmount the directory before starting the container, as this flow makes hcsshim responsible for both mounting of the root layer and deriving Windows.LayerFolders from the snapshot. Note that in the latter case, there's a bug where hcsshim'll append to any Windows.LayerFolders passed in (0.12 onwards rejects this case explicitly instead), so we must not populate it in this case. nil is fine, even though it's out-of-spec.

This should be feasible as we already have the conditionalMountOnStart hook, but code flows after that is called need to be checked to ensure they are correctly handling the case of not mounting the container filesystem locally. For example, I noticed code on the Windows side that checks the same conditions as the current conditionalMountOnStart check, and if they match, does a temporary mount of the image; clearly there's some scope for refactoring that to not duplicate the condition around.

If it was just Windows, I'd have tried to go with this approach. However, I think it'd be cleaner if we use WithSnapshot for both Windows and Linux when executing containers. This would be a larger change, and probably has repercussions I'm unaware of at this time, since I haven't really worked to understand the overarching flow and its rationales. (Most other containerd-calling code appears to use WithNewSnapshot, but moby currently has a pile of stuff it wants to do to the image first, so I'm not considering that here.)


- How to verify it

CI.

This PR has the containerd-integration label applied, so the new snapshotter test suite introduced in #46402 #46572 will run for this PR (and any PR with that label), giving us both Windows and Linux coverage. As long as the failures in the snapshotter test suite are consistent across platforms (and I didn't break anything), we're probably fine.

All of the changes made here (except an interface change while the compiler will validate) are only in the "use snapshotter" code-path, so verification should be low-risk.

If someone wants to take the time and try this out with the WASM runtime, that'd be awesome. I have no particular plans to do so, even though that's what started me on this piece of work, months ago.

- Description for the changelog

Containerd image store on Windows is slightly usable.

- A picture of a cute animal (not mandatory but encouraged)

image

@rumpl rumpl added the containerd-integration Issues and PRs related to containerd integration label Sep 24, 2023
@rumpl
Copy link
Member

rumpl commented Sep 24, 2023

Added the label 👍

Comment on lines 335 to 358
// TODO: Do we actually need to put it back aterwards?
// The only place that calls this in-tree is (b *Builder) exportImage and
// that is called from the end of (b *Builder) performCopy which has a
// `defer rwLayer.Release()` pending.
Copy link
Contributor Author

@TBBle TBBle Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about the WCOW snapshotter implementation in containerd, once you Commit a snapshot, any future changes you make to it will leave it in an inconsistent state. This is because (unlike the graphdriver), Commit updates the snapshot in-place from a RWLayer to a ROLayer. The RWLayer data (sandbox.vhdx) is kept around so that 'Diff' works later, but it's actually impossible from the API to get a new read-write reference to that layer, as the old snapshot key is invalidated, and the new snapshot key can only be used as a parent for new layers.

We get around this here by capturing the mount data before Commit, but I realise now that's more hacky than I intended, as due to the difference with the graphdriver behaviour, anything touching that data will make future Diff calls inconsistent.

So perhaps rwlayer.Commit() should imply Release(), matching containerd's API? That's how it's currently used now, anyway. That's not a container image store-specific change though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the interface Godoc that's true for the Commit method in general, not only WCOW:

	// Commit captures the changes between key and its parent into a snapshot
	// identified by name.  The name can then be used with the snapshotter's other
	// methods to create subsequent snapshots.
	//
	// A committed snapshot will be created under name with the parent of the
	// active snapshot.
	//
	// After commit, the snapshot identified by key is removed.
	Commit(ctx context.Context, name, key string, opts ...Opt) error

So the current Linux implementation already assumes that the layer won't be used anymore after Commit.
IMO, we can just skip the remount. It would be nice if we could keep the API consistent and make the graphdriver's Commit also release the layer, but given the single usage and the fact that we're probably not going to have any new code that would use this I don't think it's strictly needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit that removes the remount.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about rebasing this PR the other day, so thank you for this.

If more non-trivial work is needed for this PR, and I'm doing it, do you have any concerns if I squash that "remount removal" commit into the commit it's modifying ("Temporarily unmount RWLayer during Commit")?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I did it as a separate commit to not mix my changes with your original PR and make it more visible was was changed. Will squash it now.

GetLayerFolders(img *image.Image, rwLayer layer.RWLayer) ([]string, error)
GetLayerFolders(img *image.Image, rwLayer layer.RWLayer, containerID string) ([]string, error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be thrilled if someone points out a better thing to pass into this API that can provide what's needed for both graphdriver and snapshotter implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think passing the containerID makes sense here. I also did the same for removing the RWLayer field from container struct (that's only a draft though).

builder/builder-next/controller.go Show resolved Hide resolved
daemon/oci_windows.go Outdated Show resolved Hide resolved
@TBBle TBBle force-pushed the containerd_image_store_pr46402 branch from 7d9b386 to 4860d25 Compare September 25, 2023 06:09
.github/workflows/.windows.yml Outdated Show resolved Hide resolved
@TBBle TBBle force-pushed the containerd_image_store_pr46402 branch 2 times, most recently from 5a63f11 to 14a28b1 Compare September 29, 2023 14:14
@TBBle TBBle force-pushed the containerd_image_store_pr46402 branch from 14a28b1 to b099f42 Compare October 16, 2023 12:05
@TBBle TBBle force-pushed the containerd_image_store_pr46402 branch 2 times, most recently from 606a279 to 14da6a1 Compare November 3, 2023 16:33
@TBBle TBBle marked this pull request as ready for review November 3, 2023 16:33
@vvoland vvoland force-pushed the containerd_image_store_pr46402 branch from 14da6a1 to ff8840d Compare January 17, 2024 09:13
@vvoland
Copy link
Contributor

vvoland commented Jan 17, 2024

@TBBle I rebased the PR to get a fresh CI run.

@thaJeztah thaJeztah added this to the 25.0.0 milestone Jan 17, 2024
@@ -332,32 +332,19 @@ func (rw *rwlayer) Commit() (_ builder.ROLayer, outErr error) {
}()

// Temporarily unmount the layer, required by the containerd windows snapshotter.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer temporary. We also might want to mention that the windowsfilter graphdriver also remounts it afterwards, as it justifies the new final comment paragraph.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, missed that part of the comment. Fixed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

windowsfilter graphdriver also remounts it afterwards

Hmmm are you sure about that? I looked into the code and couldn't find it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, there's this code in the image service (so level above the graphdriver):

// TODO: this mount call is not necessary as we assume that TarStream() should
// mount the layer if needed. But the Diff() function for windows requests that
// the layer should be mounted when calling it. So we reserve this mount call
// until windows driver can implement Diff() interface correctly.
_, err = rwlayer.Mount(mountLabel)
if err != nil {
return nil, err
}

suggesting the opposite - that the Diff actually needs the layer to be mounted.
OTOH, this PR works so it looks like that comment might actually be outdated/wrong?

@TBBle
Copy link
Contributor Author

TBBle commented Jan 17, 2024

I had a quick look at the test run, and most of the amazingly few integration test failures on Windows mentioned "file in use", e.g., container restart tests, which suggests the logic around the impedance mismatch between graphdriver and snapshotter might still be incorrect or inconsistent. But it might also be unrelated, if we're leaking a file handle or something.

My opinion is that we can improve that by moving containerd-storage to use WithSnapshot for executing containers as described in the original post, rather than mounting and passing-in the mount-point as we do now. But that's a wider question, and I think landing this work first is better, because more tests better.

@thaJeztah
Copy link
Member

@TBBle are you near a computer to do a new rebase and to fix up that comment? We just merged #46634, which makes all the tests green on Linux, and (sorry for the long delay on this PR), we want to merge this PR to get Windows tests "functional"

I agree that we may want to look at follow-ups, but at least your work on this PR (thank you SO much) brings us much further in the good direction.

Let me know if you have time / opportunity to rebase / push, otherwise I can ask someone to do so 👍

* conatinerd => containerd
* ROLayer => RWLayer

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This is consistent with layerStore's CreateRWLayer behaviour.

Potentially this can be refactored to avoid creating the -init layer,
but as noted in layerStore's initMount, this name may be special, and
should be cleared-out all-at-once.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Needed for Diff on Windows. Don't remount it afterwards as the layer is
going to be released anyway.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The existing API ImageService.GetLayerFolders didn't have access to the
ID of the container, and once we have that, the snapshotter Mounts API
provides all the information we need here.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
That means 'null', not that we can call builder-next on Windows. If and
when we do get builder-next going, this will need to be solved properly
in some way.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The actual divergence is due to differences in the snapshotter and
graphfilter mount behaviour on Windows, but the snapshotter behaviour is
better, so we deal with it here rather than changing the snapshotter
behaviour.

We're relying on the internals of containerd's Windows mount
implementation here. Unless this code flow is replaced, future work is
to move getBackingDeviceForContainerdMount into containerd's mount
implementation.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
@vvoland vvoland force-pushed the containerd_image_store_pr46402 branch from ae478a8 to e8f4bfb Compare January 17, 2024 15:29
@thaJeztah
Copy link
Member

Looks indeed that there's some remaining failures (filesystem in use);

=== Failed
=== FAIL: github.com/docker/docker/integration-cli TestDockerCLICreateSuite/TestCreateWithWorkdir (3.29s)
    docker_cli_create_test.go:292: assertion failed: 
        Command:  D:\a\moby\moby\out\docker.exe cp foo:c:\home\foo\bar c:\tmp
        ExitCode: 1
        Error:    exit status 1
        Stdout:   
        Stderr:   Error response from daemon: failed to mount C:\Users\runneradmin\AppData\Local\Temp\moby-root\rootfs\windows\d48fe588c5f8aa74a34a3af9f4da4a489ab63ac8aac8960b914f0f6d93b6a183: failed to activate layer C:\Users\RUNNER~1\AppData\Local\Temp\ctn-root\io.containerd.snapshotter.v1.windows\snapshots\138: hcsshim::ActivateLayer failed in Win32: The process cannot access the file because it is being used by another process. (0x20)
        
        
        Failures:
        ExitCode was 1 expected 0
        Expected no error

We (@rumpl @vvoland) discussed this on Slack, and want to;

  • merge this PR (it give "100%" more coverage on Windows with snapshotters)
  • create a new epic for "fix remaining failures on Windows" (similar to [epic] fix remaining failing tests with the containerd image store #46760)
  • enable containerd snapshotter tests by default on Linux (now that Linux is green)
  • keep the containerd snapshotter tests for windows gated by the containerd-integration label

Waiting for CI to finish, but once it does, we can go ahead and merge this one.

Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 😄

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graph driver tests are still green; failures with snapshotter are currently expected (see above)

LGTM

let's bring this one in 🎉 ❤️

@thaJeztah thaJeztah merged commit 436bf27 into moby:master Jan 17, 2024
134 of 138 checks passed
@vvoland
Copy link
Contributor

vvoland commented Jan 17, 2024

Thanks @TBBle for making this PR! Sorry it had to wait so long!

@rumpl
Copy link
Member

rumpl commented Jan 17, 2024

Wooohoooo 🎉❤️🎉

@TBBle
Copy link
Contributor Author

TBBle commented Jan 18, 2024

Awesome, thank you for fixing and merging. Sorry, I posted my comment yesterday and immediately went to bed, but looks like it wasn't a blocker. Yay!

@TBBle TBBle deleted the containerd_image_store_pr46402 branch January 18, 2024 07:05
@thaJeztah
Copy link
Member

Sorry, I posted my comment yesterday and immediately went to bed, but looks like it wasn't a blocker. Yay!

Ha! Yes, I'm aware you're in a very different timezone than us, so thought "let's wait a few minutes, and otherwise let's make the changes". We decided to take the risk of a possible race condition, and to carry the last few bits.

I also saw @vvoland started to have a look at some of the remaining failures in this PR;

(review / input welcome!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants