Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race in lifecycle execution #1062

Merged
merged 1 commit into from
Mar 28, 2023
Merged

Conversation

sywhang
Copy link
Contributor

@sywhang sywhang commented Mar 28, 2023

In #1061, a fix was made to ensure all Stop hooks whose corresponding Start hooks ran successfully were run.

This does not fix a case where it's possible for the Start hooks to race with the Stop hooks. In particular, it's possible for a Start hook to be in the middle of execution when Fx App bails out of the execution loop due to Start lifecycle context expiration. If the Stop hook happens to be running when the Start hook finishes running, the same hook will be run twice because the l.hooks and l.numStarted can change in the middle of the Stop hook execution loop.

The fix is quite simple - we need to snapshot this state before going into the Stop hook execution loop so that the changed state due to the Start hook that updated the state does not affect the Stop hook execution loop.

Verified that the attached test fails without the fix made in this PR with the following error:

--- FAIL: TestAppStart (0.00s)
    --- FAIL: TestAppStart/race_test (0.30s)
        app_test.go:1252:
            	Error Trace:	/Users/sungyoon/go/src/github.com/uber-go/fx/app_test.go:1252
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:338
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:300
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:700
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:781
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/asm_arm64.s:1172
            	Error:      	expected each hook to run exactly once only
            	Test:       	TestAppStart/race_test
FAIL

In uber-go#1061, a fix was made to ensure all Stop hooks whose corresponding
Start hooks ran successfully were run.

This does not fix a case where it's possible for the Start hooks to
race with the Stop hooks. In particular, it's possible for a Start
hook to be in the middle of execution when Fx App bails out of the
execution loop due to Start lifecycle context expiration. If the
Stop hook happens to be running when the Start hook finishes running,
the same hook will be run twice because the l.hooks and l.numStarted
can change in the middle of the Stop hook execution loop.

The fix is quite simple - we need to snapshot this state before going
into the Stop hook execution loop so that the changed state due to the
Start hook that updated the state does not affect the Stop hook execution
loop.

Verified that the attached test fails without the fix made in this PR
with the following error:
```
--- FAIL: TestAppStart (0.00s)
    --- FAIL: TestAppStart/race_test (0.30s)
        app_test.go:1252:
            	Error Trace:	/Users/sungyoon/go/src/github.com/uber-go/fx/app_test.go:1252
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:338
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:300
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:700
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:781
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/asm_arm64.s:1172
            	Error:      	expected each hook to run exactly once only
            	Test:       	TestAppStart/race_test
FAIL

```
@codecov
Copy link

codecov bot commented Mar 28, 2023

Codecov Report

Merging #1062 (85113b1) into master (8b301d4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1062   +/-   ##
=======================================
  Coverage   98.52%   98.52%           
=======================================
  Files          39       39           
  Lines        2989     2992    +3     
=======================================
+ Hits         2945     2948    +3     
  Misses         37       37           
  Partials        7        7           
Impacted Files Coverage Δ
internal/lifecycle/lifecycle.go 93.63% <100.00%> (+0.08%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sywhang sywhang merged commit c9134b8 into uber-go:master Mar 28, 2023
@sywhang sywhang deleted the fix-race-lifecycle branch March 28, 2023 03:15
sywhang added a commit to sywhang/fx that referenced this pull request Mar 28, 2023
The race test that was added in Start/Stop hooks in uber-go#1062. This test
can fail if the machine is sufficiently slow that the second Stop
hook does not finish running before the test finishes.

This removes the check for the second hook from running - this is still
adequately testing the code path we wanted to test since the race
would occur on the first stop hook with the second start hook.
sywhang added a commit that referenced this pull request Mar 28, 2023
The race test that was added in Start/Stop hooks in #1062. This test can
fail if the machine is sufficiently slow that the second Stop hook does
not finish running before the test finishes.

This removes the check for the second hook from running - this is still
adequately testing the code path we wanted to test since the race would
occur on the first stop hook with the second start hook.
JacobOaks added a commit that referenced this pull request Mar 30, 2023
#1062 introduced a test for fixing a race condition introduced by #1061.
#1063 tuned this race condition because it was failing on Windows
machines. This PR changes the test to use channels instead of relying on
timing to provide more deterministic reproduction of the race, with the
caveat being that it's not triggered by context timeout or cancel
anymore.

I have verified that this consistently fails pre-#1062 and consistently
passes post-#1062.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants