test/integration: Avoid data race from FileMutex #12049
Changelog[uncommitted] (2023-02-03)
```go
// ctx.Err will be non-nil when the context finishes
// either because it timed out or because it got canceled.
for ctx.Err() == nil {
	if err := mutex.Lock(); err != nil {
		time.Sleep(1 * time.Second)
```
No need to change this in this changeset, but I wonder whether this sleep loop could be improved to speed up tests. If many of the operations locked by synchronouslyDo take only milliseconds, it might be worth tuning.
Ah, answered my own question: it's only used for building components, so 1s is probably on par with how long the script takes to run.
bors r+
Build failed (retrying...)
bors cancel
Canceled.
Cherry-picks #12049 into the test PR since that also includes some fixes for how component setup works.
bors merge
Build failed
bors retry
```go
}()

select {
case <-time.After(timeout):
case <-ctx.Done():
```
Can we get rid of this select and just wait on lockWait?
My worry is that we could start running fn() in the goroutine, then the timeout hits and this select fires, and we just leave the goroutine running and never see its result.
I guess that means lockWait needs to carry a bool saying whether it timed out or not.
Build failed
I think given this is an improvement over what's currently in and is ready to go, we should merge it. But I'm going to look into whether we can delete file locks from our system entirely. They're used in two places:
Ack.
Never mind, #12065 was incorrect.
integration_util_test contains a helper function, synchronouslyDo, which attempts to run an operation while holding a file lock, with a timeout, in a blocking manner. This implementation has a couple of issues.

First, there's a data race between Lock() and Unlock() of the mutex: the function defers an unlock regardless of whether a lock was acquired, and if the timing is right, this causes a data race in FileMutex on reading and writing the fsunlock field.

```
WARNING: DATA RACE
Read at 0x00c000388040 by goroutine 16:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Unlock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:64 +0x3e
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func1()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:263 +0x39
  [..]

Previous write at 0x00c000388040 by goroutine 17:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Lock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:55 +0x72
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func2()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:269 +0x4e
```

This data race is not expected because the contract for FileMutex is that the Unlock method should only be called after Lock, typically from the same goroutine; synchronouslyDo does not do this.

Secondly, synchronouslyDo has a minor bug: it will *eventually* run the function once the lock has been acquired, even if the timeout has expired and the test has failed by then.

Resolve these issues by making the following changes:

- use a context to track the timeout
- defer an unlock only if a lock was successfully acquired
- run the operation only if we still have time to run it

Includes a previously failing test case.
Force-pushed from e6588b2 to dd69dfc.
bors r+
12025: [sdks/go] Delegate alias computation to the engine r=abhinav a=Zaid-Ajaj

Fixes #11066
Addresses #11697

Credit to `@abhinav` for making aliases unit-testable by intercepting `RegisterResource` calls.

> I did change the test slightly so that it either checks for `AliasURNs: []string` or `Aliases: []*pulumirpc.Alias` because I've made it such that one of them is `nil` depending on `supportsAliasSpecs`.

## Checklist

- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrant bumping the Pulumi Service API version

12028: Require linting before running unit, integ, and smoke tests. r=abhinav a=RobbieMcKinstry

# Description

**Update:** With #12031, linting runs in about a minute.

Linting should be an extremely low-watermark requirement for evaluating build health. Blocking on it allows us to reduce the number of concurrent runners that are canceled early.

**Trade-offs:**

* This should delay CI by the amount of time it takes to lint: _e.g._ CI will be ~5 minutes slower on the happy path.
* When a job is queued that fails a lint check, fewer runners will be soaked up just to fail lint checks. This will decrease the overall queue time across all builds.
* Ultimately, we're trading slower happy-path builds for smarter build scheduling.
* We can mitigate the linting bottleneck by speeding up the linting process (#12023).

This PR supports but isn't sufficient for #12019.

## Checklist

**This PR is intended to impact CI only, and thus does not justify a CHANGELOG entry or a test.**

- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrant bumping the Pulumi Service API version

12043: sdk/go: Don't store DependsOn in a lossy form r=abhinav a=abhinav

The `DependsOn` and `DependsOnInputs` resource options store their captured information on the `resourceOptions` struct in a lossy format: they store function references. This makes it impossible to go back to the original lists of resources or resource array inputs for use cases like #11698.

As a step towards making this possible, replace the stored closures with interfaces. The implementations in the first commit are a drop-in replacement for the prior behavior with no logic changes whatsoever.

The second commit makes a minor optimization: it adds URNs to the same set instead of constantly allocating new sets and combining them afterwards.

Refs #11698

12046: Use 'errors' not 'pkg/errors' in go codegen r=abhinav a=Frassle

# Description

Continue cleanup of our use of pkg/errors. This changes our Go code generator to stop using it; there are still a few places in the SDK using it, so go.mod files will still reference it. It looks like the only thing the code generator used "pkg/errors" for was `New`, which is also in "errors".

## Checklist

- [ ] I have added tests that prove my fix is effective or that my feature works (covered by existing tests)
- [x] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrant bumping the Pulumi Service API version

12047: pkg/errors cleanup for sdk/go/common/resource/config r=abhinav a=Frassle

Continuing pkg/errors cleanup.

12049: test/integration: Avoid data race from FileMutex r=abhinav a=abhinav

integration_util_test contains a helper function, synchronouslyDo, which attempts to run an operation while holding a file lock, with a timeout, in a blocking manner. This implementation has a couple of issues.

First, there's a data race between Lock() and Unlock() of the mutex: the function defers an unlock regardless of whether a lock was acquired, and if the timing is right, this causes a data race in FileMutex on reading and writing the fsunlock field.

```
WARNING: DATA RACE
Read at 0x00c000388040 by goroutine 16:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Unlock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:64 +0x3e
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func1()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:263 +0x39
  [..]

Previous write at 0x00c000388040 by goroutine 17:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Lock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:55 +0x72
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func2()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:269 +0x4e
```

This data race is not expected because the contract for FileMutex is that the Unlock method should only be called after Lock, typically from the same goroutine; synchronouslyDo does not do this.

Secondly, synchronouslyDo has a minor bug: it will *eventually* run the function once the lock has been acquired, even if the timeout has expired and the test has failed by then.

Resolve these issues by making the following changes:

- use a context to track the timeout
- defer an unlock only if a lock was successfully acquired
- run the operation only if we still have time to run it

Includes a failing test case.

12068: test/integration: Don't panic during setup r=abhinav a=abhinav

Currently, component setup panics if there's an error. This isn't great because when it panics, it fails to notify the outer goroutine that's waiting for setup, which leaves that goroutine waiting for 10 minutes before giving up. The net effect is that when setup fails for a test, it takes 10 minutes to kill the test even if setup failed within seconds.

Fix this by using testify and logging errors right away.

Co-authored-by: Zaid Ajaj <zaid.naom@gmail.com>
Co-authored-by: Robbie McKinstry <robbie@pulumi.com>
Co-authored-by: Abhinav Gupta <abhinav@pulumi.com>
Co-authored-by: Fraser Waters <fraser@pulumi.com>
Build failed (retrying...)
Build succeeded
integration_util_test contains a helper function synchronouslyDo
which attempts to run an operation while holding a file lock, with a timeout,
in a blocking manner.

This implementation has a couple of issues.

First, there's a data race between Lock() and Unlock() of the mutex:
the function defers an unlock regardless of whether a lock was acquired,
and if the timing is right, this causes a data race in FileMutex
on reading and writing the fsunlock field.

This data race is not expected because the contract for FileMutex is
that the Unlock method should only be called after Lock,
typically from the same goroutine; synchronouslyDo does not do this.

Secondly, synchronouslyDo has a minor bug:
it will eventually run the function once the lock has been acquired,
even if the timeout has expired and the test has failed by then.

Resolve these issues by making the following changes:

- use a context to track the timeout
- defer an unlock only if a lock was successfully acquired
- run the operation only if we still have time to run it

Includes a failing test case.