Fix race between serve and immediate shutdown on the server #175

Merged: 2 commits, Oct 29, 2024


klihub commented Sep 13, 2024

Fix a race where an asynchronous server.Serve() invoked in a goroutine races with an almost immediate server.Shutdown().

If Shutdown() finishes its locked closing of listeners before Serve() gets around to adding the new one, Serve() will sit stuck forever in l.Accept(), unless the caller closes the listener in addition to calling Shutdown().

This is probably almost impossible to trigger in real life, but unit tests which run the server and client in the same process can trigger it. If a test then tries to verify that Serve() returns a final ErrServerClosed error after Shutdown(), it gets stuck forever.

klihub requested review from dmcgowan and fuweid on September 13, 2024 13:10
klihub force-pushed the fixes/serve-listen-shutdown-race branch 2 times, most recently from f4a5a58 to 4ca1d79, on September 13, 2024 13:24

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>

Fix a race where an asynchronous server.Serve() invoked in a
goroutine races with an almost immediate server.Shutdown().
If Shutdown() finishes its locked closing of listeners before
Serve() gets around to adding the new one, Serve() will sit
stuck forever in l.Accept(), unless the caller closes the
listener in addition to calling Shutdown().

This is probably almost impossible to trigger in real life,
but some of the unit tests, which run the server and client
in the same process, occasionally do trigger this. Then, if
the test tries to verify a final ErrServerClosed error from
Serve() after Shutdown(), it gets stuck forever.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
klihub force-pushed the fixes/serve-listen-shutdown-race branch from 4ca1d79 to c4d96d5 on September 16, 2024 06:53
AkihiroSuda merged commit b71d9de into containerd:main on Oct 29, 2024
11 checks passed
klihub deleted the fixes/serve-listen-shutdown-race branch on October 29, 2024 06:59
alam0rt commented Jan 15, 2025

> This is probably almost impossible to trigger in real life

I think we've definitely come across this: containerd/containerd#8981 (comment)

Going to test out 2.0.2 and will let you know if we still get these issues.

Mengkzhaoyun pushed a commit to open-beagle/containerd that referenced this pull request on Feb 7, 2025

containerd 2.0.2

Welcome to the v2.0.2 release of containerd!

The second patch release for containerd 2.0 includes a number of bug fixes and improvements.

* Remove confusing warning in cri runtime config migration ([#11256](containerd/containerd#11256))
* Fix runtime platform loading in cri image plugin init ([#11248](containerd/containerd#11248))

* Update runc binary to v1.2.4 ([#11239](containerd/containerd#11239))

Please try out the release binaries and report any issues at
https://github.com/containerd/containerd/issues.

* Jin Dong
* Derek McGowan
* Akihiro Suda
* Kazuyoshi Kato
* Henry Wang
* Krisztian Litkey
* Phil Estes
* Samuel Karp
* Sebastiaan van Stijn
* Akhil Mohan
* Brian Goff
* Chongyi Zheng
* Maksym Pavlenko
* Mike Brown
* Pierre Gimalac
* Wei Fu
<details><summary>23 commits</summary>
<p>

* Prepare release notes for v2.0.2 ([#11245](containerd/containerd#11245))
  * [`cdaf4dfb4`](containerd/containerd@cdaf4df) Prepare release notes for v2.0.2
* Update platforms to latest rc ([#11259](containerd/containerd#11259))
  * [`eb125e1dd`](containerd/containerd@eb125e1) Update platforms to latest rc
* Remove confusing warning in cri runtime config migration ([#11256](containerd/containerd#11256))
  * [`468079c5c`](containerd/containerd@468079c) Remove confusing warning in cri runtime config migration
* Fix runtime platform loading in cri image plugin init ([#11248](containerd/containerd#11248))
  * [`a2d9d4fd5`](containerd/containerd@a2d9d4f) Fix runtime platform loading in cri image plugin init
* make sure console master tty is closed on task exit ([#11246](containerd/containerd#11246))
  * [`184ffad01`](containerd/containerd@184ffad) Add integ test to check tty leak
  * [`17181ed33`](containerd/containerd@17181ed) fix master tty leak due to leaking init container object
* Bump up otelttrpc to 0.1.0 ([#11242](containerd/containerd#11242))
  * [`8666e7422`](containerd/containerd@8666e74) Bump up otelttrpc to 0.1.0
* ctr: `ctr images import --all-platforms`: fix unpack ([#11236](containerd/containerd#11236))
  * [`c4270430d`](containerd/containerd@c427043) ctr: `ctr images import --all-platforms`: fix unpack
* Update runc binary to v1.2.4 ([#11239](containerd/containerd#11239))
  * [`7373ddd70`](containerd/containerd@7373ddd) update runc binary to v1.2.4
* downgrade go-difflib and go-spew to tagged releases ([#11222](containerd/containerd#11222))
  * [`f34147772`](containerd/containerd@f341477) downgrade go-difflib and go-spew to tagged releases
* Add a build tag to disable std `plugin` import ([#11213](containerd/containerd#11213))
  * [`dca769485`](containerd/containerd@dca7694) chore: add a build tag to disable containerd plugin import
* Update golangci to 1.60.3 ([#11187](containerd/containerd#11187))
  * [`5942b3fcb`](containerd/containerd@5942b3f) Update golangci to 1.60.3
</p>
</details>
<details><summary>6 commits</summary>
<p>

* Add dependabot and upgrade golang and dependency versions ([containerd/otelttrpc#3](containerd/otelttrpc#3))
  * [`2d46141`](containerd/otelttrpc@2d46141) upgrade golang, deps, CI versions
  * [`64922e7`](containerd/otelttrpc@64922e7) Add dependabot CI
* Fix concurrent map panic on metadata ([containerd/otelttrpc#2](containerd/otelttrpc#2))
  * [`2ba3be1`](containerd/otelttrpc@2ba3be1) Fix concurrent map panic on inject metadata
  * [`f50a922`](containerd/otelttrpc@f50a922) UT for concurrent inject/extract metadata
</p>
</details>
<details><summary>6 commits</summary>
<p>

* Move windows matcher logic so all platforms can use ([containerd/platforms#22](containerd/platforms#22))
  * [`7c58292`](containerd/platforms@7c58292) Move windows matcher logic so all platforms can use
* replace testify with stdlib in tests ([containerd/platforms#21](containerd/platforms#21))
  * [`86a86b7`](containerd/platforms@86a86b7) replace testify with stdlib in tests
* Replace arm64 minor variant logic with lookup table ([containerd/platforms#18](containerd/platforms#18))
  * [`364665a`](containerd/platforms@364665a) Replace arm64 minor variant logic with lookup table
</p>
</details>
<details><summary>5 commits</summary>
<p>

* Add MD.Clone function ([containerd/ttrpc#177](containerd/ttrpc#177))
  * [`430f734`](containerd/ttrpc@430f734) Add MD.Clone
* server: fix a Serve() vs. (immediate) Shutdown() race ([containerd/ttrpc#175](containerd/ttrpc#175))
  * [`c4d96d5`](containerd/ttrpc@c4d96d5) server: fix Serve() vs. immediate Shutdown() race.
  * [`ed6c3ba`](containerd/ttrpc@ed6c3ba) server_test: add Serve()/Shutdown() race test.
</p>
</details>

* **github.com/containerd/otelttrpc**  ea5083fda723 -> v0.1.0
* **github.com/containerd/platforms**  v1.0.0-rc.0 -> v1.0.0-rc.1
* **github.com/containerd/ttrpc**      v1.2.6 -> v1.2.7
* **github.com/davecgh/go-spew**       d8f796af33cc -> v1.1.1
* **github.com/pmezard/go-difflib**    5d4384ee4fb2 -> v1.0.0
* **github.com/stretchr/testify**      v1.9.0 -> v1.10.0

Previous release can be found at [v2.0.1](https://github.com/containerd/containerd/releases/tag/v2.0.1)
* `containerd-<VERSION>-<OS>-<ARCH>.tar.gz`:         ✅Recommended. Dynamically linked with glibc 2.31 (Ubuntu 20.04).
* `containerd-static-<VERSION>-<OS>-<ARCH>.tar.gz`:  Statically linked. Expected to be used on non-glibc Linux distributions. Not position-independent.

In addition to containerd, typically you will have to install [runc](https://github.com/opencontainers/runc/releases)
and [CNI plugins](https://github.com/containernetworking/plugins/releases) from their official sites too.

See also the [Getting Started](https://github.com/containerd/containerd/blob/main/docs/getting-started.md) documentation.
dmcgowan changed the title from "server: fix a Serve() vs. (immediate) Shutdown() race" to "Fix race between serve and immediate shutdown on the server" on Feb 24, 2025