Fix Start ctx expiration making app unstoppable #1017

Merged
merged 1 commit into from
Jan 9, 2023

Conversation

Contributor

@sywhang sywhang commented Jan 9, 2023

This fixes #1015.

When Start() is called, it passes the start context (bounded by StartTimeout) to the relayer goroutine, which monitors both the Stop signal and the OS signal handler.

The goroutine then exits when the start context expires, which happens naturally with the default setting (15s timeout) or with any finite timeout value.

Once that happens, the app no longer responds to OS signals such as SIGTERM or SIGINT.

This was a regression introduced in the 1.19 release via #989.

To fix this, this commit stops the relayer goroutine from selecting on the start context's completion.
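
The regression and the fix can be illustrated with a minimal, standalone sketch. This is not the code in signal.go: `relayWithCtx`, `relayFixed`, and `delivered` are hypothetical names, the channels stand in for the real signal handler, and the "OS signal" is simulated by writing to a channel rather than sending a signal to the process.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"
)

// relayWithCtx mimics the regression (hypothetical helper): the relay also
// selects on the start context, so it exits once that context expires.
func relayWithCtx(startCtx context.Context, sigs <-chan os.Signal, stop <-chan struct{}, out chan<- os.Signal) {
	select {
	case <-startCtx.Done():
		return // relay is gone; signals arriving later are never forwarded
	case <-stop:
		return
	case sig := <-sigs:
		out <- sig
	}
}

// relayFixed mimics the fix (hypothetical helper): the relay only watches the
// stop channel and incoming signals, so start-context expiry does not end it.
func relayFixed(sigs <-chan os.Signal, stop <-chan struct{}, out chan<- os.Signal) {
	select {
	case <-stop:
		return
	case sig := <-sigs:
		out <- sig
	}
}

// delivered reports whether a signal sent after the start context has expired
// is still forwarded to the application.
func delivered(withCtx bool) bool {
	startCtx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	sigs := make(chan os.Signal, 1)
	out := make(chan os.Signal, 1)
	stop := make(chan struct{})
	defer close(stop)

	done := make(chan struct{})
	go func() {
		defer close(done)
		if withCtx {
			relayWithCtx(startCtx, sigs, stop, out)
		} else {
			relayFixed(sigs, stop, out)
		}
	}()

	<-startCtx.Done() // the start timeout elapses
	select {          // give the relay a moment to react to the expiry
	case <-done:
	case <-time.After(50 * time.Millisecond):
	}

	sigs <- os.Interrupt // simulated signal; nothing is sent to the process

	select {
	case <-out:
		return true
	case <-time.After(200 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("start ctx in select, signal delivered:", delivered(true))
	fmt.Println("start ctx not in select, signal delivered:", delivered(false))
}
```

In this sketch the first variant drops the late signal because the relay has already returned, while the second variant still forwards it.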

Verified that all tests still pass without any goroutine leak, except for one test that triggers a panic to exercise the panic handler.

So that this one test does not force every other test to opt in to goleak.VerifyNone(t) individually in each subtest, I moved the panic test into a separate test package. This lets us continue to use goleak.VerifyNone() in app_test.


codecov bot commented Jan 9, 2023

Codecov Report

Merging #1017 (b36c345) into master (213eb86) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1017      +/-   ##
==========================================
- Coverage   98.14%   98.14%   -0.01%     
==========================================
  Files          39       39              
  Lines        1994     1992       -2     
==========================================
- Hits         1957     1955       -2     
  Misses         29       29              
  Partials        8        8              
Impacted Files Coverage Δ
signal.go 100.00% <ø> (ø)


Contributor Author

sywhang commented Jan 9, 2023

Note to reviewers: I will kick off an internal Go monorepo build pinned to this branch.

@sywhang sywhang merged commit f0d73c6 into uber-go:master Jan 9, 2023
@sywhang sywhang deleted the fix-start-timeout branch January 9, 2023 18:25
Development

Successfully merging this pull request may close these issues.

1.19 Regression: It's impossible to shutdown from an OS signal after startCtx is Done().
3 participants