
store: serialize container deletion #1722

Merged
2 commits merged into containers:main on Sep 27, 2023

Conversation

giuseppe (Member)

Serialize the call to containersStore.Delete(id), since it attempts to recursively remove the graphroot directory.

Left unserialized, that removal defeats the other goroutine that attempts to safely remove the graphroot with EnsureRemoveAll. Depending on which goroutine is faster, there can be a flake like:

2023-09-26T17:49:02.6708666Z stderr: Error: cleaning up storage: removing container 6ebff2c6c6f5fe78c158956a88467ef7af6f6a7c3d40334d248c7b7409341230 root filesystem: 1 error occurred: * unlinkat /var/tmp/podman_test3482607530/root/overlay-containers/6ebff2c6c6f5fe78c158956a88467ef7af6f6a7c3d40334d248c7b7409341230/userdata/shm: device or resource busy

This is a difficult condition to trigger, but I am hitting it constantly in containers/crun#1312.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
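In other words, the fix boils down to taking a lock around the recursive removal. Below is a minimal sketch of the idea only; the mutex name, the stub containerStore type, and the simplified signatures are illustrative, not the actual containers/storage API:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// containerStore is a stand-in for the real containers store; the real
// type in containers/storage has a much richer API.
type containerStore struct {
	graphRoot string
}

// Delete recursively removes the container's directory under the graphroot.
func (s *containerStore) Delete(id string) error {
	return os.RemoveAll(filepath.Join(s.graphRoot, "overlay-containers", id))
}

// deleteMu serializes recursive removals of the graphroot, which is the
// essence of this fix.
var deleteMu sync.Mutex

func deleteContainer(s *containerStore, id string) error {
	// Without the lock, two goroutines can walk the same tree at once;
	// if one hits a still-mounted path (such as userdata/shm), the other
	// can fail with EBUSY ("device or resource busy") as in the flake above.
	deleteMu.Lock()
	defer deleteMu.Unlock()
	return s.Delete(id)
}

func main() {
	s := &containerStore{graphRoot: os.TempDir()}
	if err := deleteContainer(s, "example-container-id"); err != nil {
		fmt.Println("delete failed:", err)
	}
}
```

With the lock held, only one goroutine can tear down a given directory tree at a time, so the plain recursive removal can no longer race with the EnsureRemoveAll path.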

Commits (2):

1. EnsureRemoveAll already implements the trivial rm -rf attempt first, so there is no need to try it before calling EnsureRemoveAll. (A sketch of this pattern follows the commit list.)

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2. Serialize the call to containersStore.Delete(id), since it attempts to recursively remove the graphroot directory. (The rest of this commit message matches the PR description above.)

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
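To illustrate why the extra rm -rf attempt was redundant, here is a simplified sketch of the retry pattern that EnsureRemoveAll follows: the first loop iteration is itself the trivial os.RemoveAll attempt, and only a busy-mount failure triggers the unmount-and-retry fallback. This is an illustration under stated assumptions (Linux-only unix.Unmount, a bounded retry loop, and the hypothetical name ensureRemoveAllSketch), not the actual containers/storage implementation:

```go
package main

import (
	"errors"
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// ensureRemoveAllSketch mimics the shape of EnsureRemoveAll: the first loop
// iteration is the trivial "rm -rf" attempt, so callers gain nothing by
// running os.RemoveAll themselves beforehand.
func ensureRemoveAllSketch(dir string) error {
	for i := 0; i < 100; i++ {
		err := os.RemoveAll(dir)
		if err == nil {
			return nil
		}
		var pe *os.PathError
		if !errors.As(err, &pe) || !errors.Is(pe.Err, unix.EBUSY) {
			return err // not a busy mount; nothing more we can do
		}
		// Some path inside dir is still mounted (like userdata/shm in
		// the flake above); detach it and try the removal again.
		if err := unix.Unmount(pe.Path, unix.MNT_DETACH); err != nil {
			return fmt.Errorf("unmount %s: %w", pe.Path, err)
		}
	}
	return fmt.Errorf("could not remove %s after repeated retries", dir)
}

func main() {
	if err := ensureRemoveAllSketch("/tmp/example-graphroot"); err != nil {
		fmt.Println(err)
	}
}
```

Since the plain removal attempt is always made first inside the loop, a caller-side os.RemoveAll before the call only duplicates work, which is what the first commit drops.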
giuseppe (Member, Author)

@edsantiago this could be the reason for the ".*/userdata/shm: device or resource busy" flake we see occasionally

@vrothberg PTAL

vrothberg (Member) left a comment

Code LGTM

Could you open a test-PR against Podman?

openshift-ci bot (Contributor) commented Sep 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

giuseppe added a commit to giuseppe/libpod that referenced this pull request Sep 27, 2023
giuseppe (Member, Author)

sure, opened here: containers/podman#20163

vrothberg (Member)

I have a feeling you will make @edsantiago very very very happy with this fix. The flakes haunted us for so long.

rhatdan (Member) commented Sep 27, 2023

LGTM

rhatdan merged commit f969739 into containers:main on Sep 27, 2023 (18 of 19 checks passed).
edsantiago (Collaborator)

OMG thank you so much @giuseppe!
