
[release/1.7] Fix various timing issues with docker pusher #9921

Merged

Conversation

jedevc
Contributor

@jedevc jedevc commented Mar 4, 2024

Backport of #8379, cc @dmcgowan.

It would be nice to be able to grab this update downstream for buildkit/buildx, see docker/buildx#2232 (comment).

io.Pipe produces a PipeReader and a PipeWriter - a close on the write
side causes an error on both the read and write sides, while a close on
the read side causes an error on only the read side. Previously, we
explicitly prohibited closing from the read side.

However, http.Request.Body requires that "calling Close should unblock a
Read waiting for input". Our reader will not do this - calling Close
becomes a no-op. This can cause a deadlock because client.Do may never
terminate in some circumstances.

We need the Reader side to close its side of the pipe as well, which it
already does using the go standard library - otherwise, we can hang
forever, writing to a pipe that will never be closed.

Allowing the requester to close the body should be safe - we never reuse
the same reader between requests, as the result of body() will never be
reused by the guarantees of the standard library.

Signed-off-by: Justin Chadwell <me@jedevc.com>
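
To illustrate the io.Pipe behaviour this commit relies on, here is a minimal standalone sketch (names are illustrative, not the pusher code): closing the read side unblocks a blocked Write with io.ErrClosedPipe, which is what lets the transport's Close call terminate an otherwise stuck request body.

    package main

    import (
        "fmt"
        "io"
    )

    func main() {
        pr, pw := io.Pipe()

        done := make(chan struct{})
        go func() {
            // Blocks until the read side either reads or closes.
            _, err := pw.Write([]byte("layer data"))
            fmt.Println("write unblocked with:", err) // io: read/write on closed pipe
            close(done)
        }()

        // Closing the reader is what http.Request.Body.Close is expected to do:
        // it releases the blocked writer with io.ErrClosedPipe instead of
        // letting it hang forever.
        pr.Close()
        <-done
    }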
If Close is called externally before a request is attempted, then we
will accidentally attempt to send to a closed channel, causing a panic.

To avoid this, we can check to see if Close has been called, using a
done channel. If this channel is ever done, we drop any incoming errors,
requests or pipes - we don't need them, since we're done.

Signed-off-by: Justin Chadwell <me@jedevc.com>
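
A rough sketch of the done-channel guard this commit describes, using illustrative field names (errC, done) rather than the exact pushWriter definition: Close only ever closes done, and senders select on done, so a late error is dropped instead of panicking on a send that nobody will receive.

    package main

    import (
        "errors"
        "fmt"
        "sync"
    )

    type writer struct {
        errC      chan error
        done      chan struct{}
        closeOnce sync.Once
    }

    // Close never closes the data channels, only done, so concurrent
    // senders cannot panic on a send to a closed channel.
    func (w *writer) Close() error {
        w.closeOnce.Do(func() { close(w.done) })
        return nil
    }

    // setError drops the error if Close has already been called.
    func (w *writer) setError(err error) {
        select {
        case w.errC <- err:
        case <-w.done:
            // already closed: nobody is waiting for this error
        }
    }

    func main() {
        w := &writer{errC: make(chan error), done: make(chan struct{})}
        w.Close()
        w.setError(errors.New("late failure")) // safely dropped, no panic
        fmt.Println("no panic after Close")
    }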
If we get io.ErrClosedPipe in pushWriter.Write, there are three possible
scenarios:

- The request has failed, we need to attempt a reset, so we can expect a
  new pipe incoming on pipeC.
- The request has failed, we don't need to attempt a reset, so we can
  expect an incoming error on errC.
- Something else externally has called Close, so we can expect the done
  channel to be closed.

This patch ensures that we block for as long as possible (while still
handling each of the above cases, so we avoid hanging), to make sure
that we properly return an appropriate error message each time.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
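
Putting the three cases together, a sketch of the blocking select this commit describes. The field names (pipe, pipeC, errC, done) follow the commit text, and content.ErrReset is assumed to be the reset sentinel that content.Copy retries on; the real pushWriter differs in detail.

    package docker // illustrative package name, not the real file

    import (
        "errors"
        "io"

        "github.com/containerd/containerd/content" // ErrReset is assumed to live here
    )

    // pushWriter here is an illustrative subset of the fields named in the
    // commit text, not the full containerd type.
    type pushWriter struct {
        pipe  *io.PipeWriter
        pipeC chan *io.PipeWriter
        errC  chan error
        done  chan struct{}
    }

    func (pw *pushWriter) Write(p []byte) (n int, err error) {
        n, err = pw.pipe.Write(p)
        if !errors.Is(err, io.ErrClosedPipe) {
            return n, err
        }
        // The pipe was closed under us: block until we learn why.
        select {
        case newPipe := <-pw.pipeC:
            // The request failed and is being reset: adopt the new pipe and
            // ask the caller to replay the content from the beginning.
            pw.pipe = newPipe
            return 0, content.ErrReset
        case werr := <-pw.errC:
            // The request failed and will not be retried.
            return 0, werr
        case <-pw.done:
            // Close was called externally; surface the pipe error.
            return 0, io.ErrClosedPipe
        }
    }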
If a writer continually asks to be reset then it should always succeed -
it should be the responsibility of the underlying content.Writer to
stop producing ErrReset after some amount of time and to instead return
the underlying issue - which pushWriter already does today, using the
doWithRetries function.

doWithRetries already has a separate cap of 6 requests (the original
attempt plus 5 retries), and it seems this was previously overridden by
content.Copy's maximum of 5 attempts, hiding the original error.

Signed-off-by: Justin Chadwell <me@jedevc.com>
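
For illustration, a generic bounded-retry loop in the shape described above (doWithRetries itself is more involved): the cap lives in one place, and once it is exhausted the underlying error is returned rather than being hidden behind another reset.

    package main

    import (
        "errors"
        "fmt"
    )

    // maxAttempts mirrors the 6-request cap mentioned above:
    // the original attempt plus 5 retries.
    const maxAttempts = 6

    // doWithRetriesSketch is an illustrative stand-in, not containerd's
    // doWithRetries: it retries a request up to the cap and then surfaces
    // the last underlying error.
    func doWithRetriesSketch(do func() error, retryable func(error) bool) error {
        var err error
        for attempt := 1; attempt <= maxAttempts; attempt++ {
            if err = do(); err == nil {
                return nil
            }
            if !retryable(err) {
                return err // a permanent failure should not be masked by more resets
            }
        }
        return err
    }

    func main() {
        calls := 0
        err := doWithRetriesSketch(
            func() error { calls++; return errors.New("registry timeout") },
            func(error) bool { return true },
        )
        fmt.Printf("gave up after %d calls: %v\n", calls, err)
    }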
If sending two messages from goroutine X:

	a <- 1
	b <- 2

And receiving them in goroutine Y:

	select {
	case <- a:
	case <- b:
	}

Either branch of the select can trigger first - so when we call
.setError and .Close next to each other, we don't know whether the done
channel will close first or the error channel will receive first, and
sometimes we get an incorrect error message as a result.

We resolve this by not sending both signals - instead, we can have
.setError *imply* .Close, by having the pushWriter call .Close on
itself, after receiving an error.

Signed-off-by: Justin Chadwell <me@jedevc.com>
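
A small standalone demonstration of the ordering this commit establishes (channel names are illustrative): the failing side sends only the error, and the receiver closes done itself after receiving it, so the select can no longer pick the "wrong" branch first.

    package main

    import (
        "errors"
        "fmt"
    )

    func main() {
        errC := make(chan error)
        done := make(chan struct{})

        go func() {
            // The failing request sends exactly one signal: the error.
            errC <- errors.New("request failed")
        }()

        select {
        case err := <-errC:
            // Receiving the error implies Close: we shut down ourselves
            // instead of racing a separate close(done) from the sender.
            close(done)
            fmt.Println("got error:", err)
        case <-done:
            // Unreachable now: done is only closed after the error arrives.
            fmt.Println("closed before the error was delivered")
        }
    }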
We also need an additional check to avoid setting both the error and
the response, which can create a race where they can arrive in the
receiving thread in either order.

If we hit an error, we don't need to send the response.

> There is a condition where the registry (unexpectedly, not to spec)
> returns 201 or 204 on the put before the body is fully written. I would
> expect that the http library would issue close and could fall into a
> deadlock here. We could just read respC and call setResponse. In that
> case ErrClosedPipe would get returned and Commit shouldn't be called
> anyway.

Signed-off-by: Justin Chadwell <me@jedevc.com>
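
And a sketch of the single-outcome rule from this commit: the goroutine driving the request reports either an error or a response, never both, so the receiving side cannot see them arrive in a surprising order. (The report helper and channel layout are illustrative, not the containerd code.)

    package main

    import (
        "errors"
        "fmt"
        "net/http"
    )

    // report sends exactly one outcome for a request: an error or a
    // response, never both.
    func report(errC chan<- error, respC chan<- *http.Response, resp *http.Response, err error) {
        if err != nil {
            errC <- err // on failure the response, if any, is never sent
            return
        }
        respC <- resp
    }

    func main() {
        errC := make(chan error, 1)
        respC := make(chan *http.Response, 1)

        report(errC, respC, nil, errors.New("put failed"))

        select {
        case err := <-errC:
            fmt.Println("error:", err)
        case resp := <-respC:
            fmt.Println("status:", resp.Status)
        }
    }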
@k8s-ci-robot

Hi @jedevc. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dmcgowan dmcgowan merged commit 33f877f into containerd:release/1.7 Mar 4, 2024
54 checks passed
@dmcgowan dmcgowan changed the title [release/1.7] Backport fix various timing issues with docker pusher [release/1.7] Fix various timing issues with docker pusher Mar 8, 2024