test/integration: Avoid data race from FileMutex #12049

Merged
merged 1 commit into from
Feb 4, 2023

Conversation

@abhinav abhinav commented Feb 2, 2023

integration_util_test contains a helper function synchronouslyDo
which attempts to run an operation with a file lock with a timeout,
in a blocking manner.
This implementation has a couple issues.

First, there's a data race between Lock() and Unlock() of the mutex:
the function defers an unlock regardless of whether a lock was acquired,
and if the timing is right, it causes a data race in FileMutex
on reading and writing the fsunlock field.

WARNING: DATA RACE
Read at 0x00c000388040 by goroutine 16:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Unlock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:64 +0x3e
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func1()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:263 +0x39
  [..]

Previous write at 0x00c000388040 by goroutine 17:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Lock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:55 +0x72
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func2()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:269 +0x4e

This data race is not expected because the contract for FileMutex is
that the Unlock method should only be called after Lock,
typically from the same goroutine; synchronouslyDo does not do this.

Secondly, synchronouslyDo has a minor bug:
it will run the function eventually when the lock has been acquired
even if the timeout has expired and the test has failed by then.

Resolve these issues by making the following changes:

  • use a context to track the timeout
  • defer an unlock only if a lock was successfully acquired
  • run the operation only if we still have time to run it

Includes a failing test case.


abhinav commented Feb 2, 2023

@abhinav abhinav added the impact/no-changelog-required This issue doesn't require a CHANGELOG update label Feb 2, 2023

pulumi-bot commented Feb 2, 2023

Changelog

[uncommitted] (2023-02-03)


// ctx.Err will be non-nil when the context finishes
// either because it timed out or because it got canceled.
for ctx.Err() == nil {
	if err := mutex.Lock(); err != nil {
		time.Sleep(1 * time.Second)

Member

No need to change this in this changeset, but I wonder whether this sleep loop could be improved to speed up tests. If many of the operations guarded by synchronouslyDo take only milliseconds, the interval might be worth tuning.

Member

Ah, answered my own question: it's only used for building components, so 1s is probably on par with the script being run.


abhinav commented Feb 2, 2023

bors r+

bors bot added a commit that referenced this pull request Feb 2, 2023
12025: [sdks/go] Delegate alias computation to the engine r=Zaid-Ajaj a=Zaid-Ajaj

Fixes #11066
Addresses #11697 

Credit to `@abhinav` for making aliases unit-testable by intercepting `RegisterResource` calls. 

> I did change the test slightly so that it either checks for `AliasURNs: []string` or `Aliases: []*pulumirpc.Alias` because I've made it such that one of them is `nil` depending on `supportsAliasSpecs`

## Checklist

- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrants bumping the Pulumi Service API version


12046: Use 'errors' not 'pkg/errors' in go codegen r=Frassle a=Frassle

# Description

Continue the cleanup of our use of pkg/errors. This changes our Go code generator to stop using it; there are still a few places in the SDK using it, so go.mod files will still reference it.

Looks like the only thing the code generator used "pkg/errors" for was `New` which is also on "errors".

## Checklist

- [ ] I have added tests that prove my fix is effective or that my feature works - Covered by existing tests
- [x] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrants bumping the Pulumi Service API version


12049: test/integration: Avoid data race from FileMutex r=abhinav a=abhinav


Co-authored-by: Zaid Ajaj <zaid.naom@gmail.com>
Co-authored-by: Fraser Waters <fraser@pulumi.com>
Co-authored-by: Abhinav Gupta <abhinav@pulumi.com>

bors bot commented Feb 2, 2023

Build failed (retrying...):

bors bot added a commit that referenced this pull request Feb 2, 2023
12046: Use 'errors' not 'pkg/errors' in go codegen r=Frassle a=Frassle


12049: test/integration: Avoid data race from FileMutex r=abhinav a=abhinav


Co-authored-by: Fraser Waters <fraser@pulumi.com>
Co-authored-by: Abhinav Gupta <abhinav@pulumi.com>

abhinav commented Feb 3, 2023

bors cancel


bors bot commented Feb 3, 2023

Canceled.

abhinav added a commit that referenced this pull request Feb 3, 2023
Cherry-picks #12049 into the test PR
since that also includes some fixes for how component setup works.

Frassle commented Feb 3, 2023

bors merge

bors bot added a commit that referenced this pull request Feb 3, 2023
12049: test/integration: Avoid data race from FileMutex r=Frassle a=abhinav


Co-authored-by: Abhinav Gupta <abhinav@pulumi.com>

bors bot commented Feb 3, 2023

Build failed:


Frassle commented Feb 3, 2023

Error: ../../../../go/pkg/mod/gocloud.dev@v0.27.0/secrets/azurekeyvault/akv.go:49:2: github.com/Azure/azure-sdk-for-go@v66.0.0+incompatible: read "https:/proxy.golang.org/@v/v66.0.0+incompatible.zip": http2: server sent GOAWAY and closed the connection; LastStreamID=7, ErrCode=NO_ERROR, debug="server_shutting_down"
Go cache building failed

bors retry

bors bot added a commit that referenced this pull request Feb 3, 2023
12049: test/integration: Avoid data race from FileMutex r=Frassle a=abhinav


Co-authored-by: Abhinav Gupta <abhinav@pulumi.com>
}()

select {
case <-time.After(timeout):
case <-ctx.Done():

Member

Can we get rid of this select and just wait for lockWait?
My worry is that we could start running fn() in the goroutine, then the timeout hits and this select fires, and we just leave the goroutine running and don't get to see its result.

Member

I guess that means lockWait needs to carry a bool indicating whether it timed out or not.


bors bot commented Feb 3, 2023

Build failed:


Frassle commented Feb 3, 2023

I think that, given this is an improvement over what's currently in and is ready to go, we should merge it. But I'm going to take a look to see if we can just delete file locks from our system entirely. They're used in two places:

  1. Here, where I think we can actually just use a process mutex because we don't run test processes in parallel.
  2. In the plugin code, which even with a lock I'm pretty sure is racy.


abhinav commented Feb 3, 2023

Ack.
Superseded by #12065

@abhinav abhinav closed this Feb 3, 2023
@abhinav abhinav deleted the abhinav/integration-sync-do-race branch February 3, 2023 19:02
@abhinav abhinav restored the abhinav/integration-sync-do-race branch February 3, 2023 20:08

abhinav commented Feb 3, 2023

Never mind, #12065 was incorrect.
We do run external binaries in parallel, so we need the file lock.

integration_util_test contains a helper function synchronouslyDo
which attempts to run an operation with a file lock with a timeout,
in a blocking manner.
This implementation has a couple issues.

First, there's a data race between Lock() and Unlock() of the mutex:
the function defers an unlock regardless of whether a lock was acquired,
and if the timing is right, it causes a data race in FileMutex
on reading and writing the fsunlock field.

```
WARNING: DATA RACE
Read at 0x00c000388040 by goroutine 16:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Unlock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:64 +0x3e
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func1()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:263 +0x39
  [..]

Previous write at 0x00c000388040 by goroutine 17:
  github.com/pulumi/pulumi/sdk/v3/go/common/util/fsutil.(*FileMutex).Lock()
      /Users/runner/work/pulumi/pulumi/sdk/go/common/util/fsutil/lock.go:55 +0x72
  github.com/pulumi/pulumi/tests/integration.synchronouslyDo.func2()
      /Users/runner/work/pulumi/pulumi/tests/integration/integration_util_test.go:269 +0x4e
```

This data race is not expected because the contract for FileMutex is
that the Unlock method should only be called after Lock,
typically from the same goroutine; synchronouslyDo does not do this.

Secondly, synchronouslyDo has a minor bug:
it will run the function *eventually* when the lock has been acquired
even if the timeout has expired and the test has failed by then.

Resolve these issues by making the following changes:

- use a context to track the timeout
- defer an unlock only if a lock was successfully acquired
- run the operation only if we still have time to run it

Includes a previously failing test case.
@abhinav abhinav changed the base branch from master to abhinav/macos-latest February 3, 2023 21:15
@abhinav abhinav force-pushed the abhinav/integration-sync-do-race branch from e6588b2 to dd69dfc Compare February 3, 2023 21:15
Base automatically changed from abhinav/macos-latest to master February 3, 2023 22:06

abhinav commented Feb 3, 2023

bors r+

bors bot added a commit that referenced this pull request Feb 3, 2023
12025: [sdks/go] Delegate alias computation to the engine r=abhinav a=Zaid-Ajaj


12028: Require linting before running unit, integ, and smoke tests. r=abhinav a=RobbieMcKinstry

# Description

**Update:** With #12031 linting runs in about a minute.

Linting should be an extremely low-watermark requirement for evaluating build health. Blocking on it allows us to reduce the number of concurrent runners that are canceled early.

**Trade-offs:**
* This should delay CI time by the amount of time it takes to lint: _e.g._ CI will be ~5 minutes slower on the happy path.
* When a job is queued that fails a lint check, fewer runners will be soaked up just to fail lint checks. This will decrease the overall queue time across all builds.
* Ultimately, we're trading slower happy-path builds for smarter build scheduling. 
* We can mitigate the linting bottleneck by speeding up the linting process (#12023).

This PR supports but isn't sufficient for #12019

## Checklist

**This PR is intended to impact CI only, and thus does not justify a CHANGELOG entry or a test.**

- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrants bumping the Pulumi Service API version


12043: sdk/go: Don't store DependsOn in a lossy form r=abhinav a=abhinav



The `DependsOn` and `DependsOnInputs` resource options
store their captured information on the `resourceOptions` struct
in a lossy format: they store function references.

This makes it impossible to go back to the original lists of resources
or resource array inputs for use cases like #11698.

As a step towards making this possible,
replace the stored closures with interfaces.

The implementations in the first commit
are a drop-in replacement for the prior behavior
with no logic changes whatsoever.

The second commit makes a minor optimization:
it adds URNs to the same set instead
of constantly allocating new sets and combining them afterwards.

Refs #11698


12046: Use 'errors' not 'pkg/errors' in go codegen r=abhinav a=Frassle


12047: pkg/errors cleanup for sdk/go/common/resource/config r=abhinav a=Frassle

Continuing pkg/errors cleanup.

12049: test/integration: Avoid data race from FileMutex r=abhinav a=abhinav


12068: test/integration: Don't panic during setup r=abhinav a=abhinav

Currently, component setup panics if there's an error.
This isn't great because when it panics,
it fails to notify the outer goroutine that's waiting for setup,
which leaves that goroutine waiting for 10 minutes before giving up.

The net effect of this is that when setup fails for a test,
it takes 10 minutes to kill the test
even if setup failed within seconds.

Fix this by using testify and logging errors right away.


Co-authored-by: Zaid Ajaj <zaid.naom@gmail.com>
Co-authored-by: Robbie McKinstry <robbie@pulumi.com>
Co-authored-by: Abhinav Gupta <abhinav@pulumi.com>
Co-authored-by: Fraser Waters <fraser@pulumi.com>

bors bot commented Feb 4, 2023

Build failed (retrying...):


bors bot commented Feb 4, 2023

Build succeeded:

@bors bors bot merged commit 7675a4e into master Feb 4, 2023
@bors bors bot deleted the abhinav/integration-sync-do-race branch February 4, 2023 01:53
Labels
impact/no-changelog-required This issue doesn't require a CHANGELOG update
3 participants