Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Fix NPE race during hedging #10007

Merged
merged 1 commit into from Apr 11, 2023
Merged

core: Fix NPE race during hedging #10007

merged 1 commit into from Apr 11, 2023

Conversation

ejona86
Copy link
Member

@ejona86 ejona86 commented Apr 3, 2023

The problem was one hedge was committed before another had drained start(). This was not testable because HedgingRunnable checks whether scheduledHedgingRef is cancelled, which is racy, but there's no way to deterministically trigger either race.

The same problem couldn't be triggered with retries because only one attempt will be draining at a time. Retries with cancellation also couldn't trigger it, for the surprising reason that the noop stream used in cancel() wasn't considered drained.

This commit marks the noop stream as drained with cancel(), which allows memory to be garbage collected sooner and exposes the race for tests. That then showed the stream as hanging, because inFlightSubStreams wasn't being decremented.

Fixes #9185

The problem was one hedge was committed before another had drained
start(). This was not testable because HedgingRunnable checks whether
scheduledHedgingRef is cancelled, which is racy, but there's no way to
deterministically trigger either race.

The same problem couldn't be triggered with retries because only one
attempt will be draining at a time. Retries with cancellation also
couldn't trigger it, for the surprising reason that the noop stream used
in cancel() wasn't considered drained.

This commit marks the noop stream as drained with cancel(), which allows
memory to be garbage collected sooner and exposes the race for tests.
That then showed the stream as hanging, because inFlightSubStreams
wasn't being decremented.

Fixes grpc#9185
@ejona86
Copy link
Member Author

ejona86 commented Apr 11, 2023

Friendly ping

@ejona86 ejona86 merged commit 8d98e5f into grpc:master Apr 11, 2023
5 checks passed
@ejona86 ejona86 deleted the hedge-npe branch April 11, 2023 20:10
@ejona86 ejona86 added the TODO:backport PR needs to be backported. Removed after backport complete label Apr 13, 2023
@ejona86 ejona86 removed the TODO:backport PR needs to be backported. Removed after backport complete label Apr 13, 2023
zeebe-bors bot added a commit to zeebe-io/zeebe-cluster-testbench that referenced this pull request May 17, 2023
828: fix(deps): update dependency io.grpc:grpc-api to v1.55.1 r=korthout a=renovate[bot]

[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [io.grpc:grpc-api](https://togithub.com/grpc/grpc-java) | `1.54.0` -> `1.55.1` | [![age](https://badges.renovateapi.com/packages/maven/io.grpc:grpc-api/1.55.1/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/maven/io.grpc:grpc-api/1.55.1/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/maven/io.grpc:grpc-api/1.55.1/compatibility-slim/1.54.0)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/maven/io.grpc:grpc-api/1.55.1/confidence-slim/1.54.0)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>grpc/grpc-java</summary>

### [`v1.55.1`](https://togithub.com/grpc/grpc-java/releases/tag/v1.55.1)

[Compare Source](https://togithub.com/grpc/grpc-java/compare/v1.54.1...v1.55.1)

The 1.55.0 release failed. There were no artifacts published for it.

##### API Changes

-   services: Rename `MetricRecorder.setQps`/`clearQps` to `setQpsMetric`/`clearQpsMetric` ([#&#8203;10031](https://togithub.com/grpc/grpc-java/issues/10031))

##### Behavior Changes

-   gcp-observability: Remove monitored resource detection for logging ([grpc/grpc-java#10020). The cloud libraries will fill in these details instead
-   protoc-gen-grpc-java: binaries for Linux ARM and PPC are now built using Ubuntu 18.04. They will no longer work on Ubuntu 16.04 and Debian 9

##### New Features

-   api: Stabilize the frequently used compression APIs ([#&#8203;9942](https://togithub.com/grpc/grpc-java/issues/9942)): `CallOptions.withCompression`, `CallOptions.getCompressor`, `AbstractStub.withCompression`,  `ServerCall.setCompression`, `ServerCall.setMessageCompression`
-   api: Stabilize `Detachable` and `HasByteBuffer`
-   gcp-observability: Stabilize `GcpObservability` ([grpc/grpc-java#10024). The GcpObservability API provides a simple way to export logging, tracing, and metrics to Google Cloud Operations. See [the Google Cloud blog post](https://cloud.google.com/blog/products/networking/introducing-grpc-observability-for-microservices).
-   census: Add new tracer annotation to indicate the time when name resolution completed for those RPCs that experienced name resolution delay, or the time when picking subchannel completed for those RPCs that experienced picking subchannel delay.  ([#&#8203;10014](https://togithub.com/grpc/grpc-java/issues/10014), [#&#8203;10044](https://togithub.com/grpc/grpc-java/issues/10044))
-   protoc-gen-grpc-java: binary for s390x is now published ([#&#8203;9455](https://togithub.com/grpc/grpc-java/issues/9455)). The glibc version used is available in Ubuntu 20.04, Debian 11, and CentOS 9 and later
-   authz: Added `FileWatcherAuthorizationServerInterceptor` ([#&#8203;9775](https://togithub.com/grpc/grpc-java/issues/9775))
-   services: Added `OrcaMetricReportingServerInterceptor.create(MetricRecorder)` which adds common metrics per-RPC ([#&#8203;9902](https://togithub.com/grpc/grpc-java/issues/9902))
-   android: Add `UdsChannelBuilder` for using LocalSocket an Android ([#&#8203;8418](https://togithub.com/grpc/grpc-java/issues/8418))
-   alts: Observe the `GRPC_ALTS_MAX_CONCURRENT_HANDSHAKES` environment variable user to adjust the max number of concurrent ALTS handshakes ([#&#8203;10016](https://togithub.com/grpc/grpc-java/issues/10016))
-   binder: Expose client identity via `PeerUid` and `PeerUids` ([#&#8203;9952](https://togithub.com/grpc/grpc-java/issues/9952))
-   binder: Add `BindServiceFlags.setAllowActivityStarts()` for `BIND_ALLOW_ACTIVITY_STARTS` added in Android U ([#&#8203;10008](https://togithub.com/grpc/grpc-java/issues/10008))

##### Bug Fixes

-   core: Fix NPE race during hedging ([grpc/grpc-java#10007), fixing a Netty buffer memory leak for cancelled RPCs
-   core: Allow transparent retries after a retry attempt and the configured max retries was 1 ([#&#8203;10066](https://togithub.com/grpc/grpc-java/issues/10066))
-   okhttp: properly implement `OkHttpServerBuilder.maxConnectionAgeGrace()` ([#&#8203;9968](https://togithub.com/grpc/grpc-java/issues/9968))
-   xds: Enable federation support. See [gRFC A47](https://togithub.com/grpc/proposal/blob/master/A47-xds-federation.md)
-   xds: Enable Weighted Round Robin LB policy support. See [gRFC A58](https://togithub.com/grpc/proposal/blob/master/A58-client-side-weighted-round-robin-lb-policy.md)
-   xds: Avoid ClassCastException if the control plane changes the top-level policy ([#&#8203;10091](https://togithub.com/grpc/grpc-java/issues/10091)). This is expected to be unlikely, but is possible
-   xds: Fix `java.util.NoSuchElementException: SecurityProtocolNegotiators$ClientSdsHandler#&#8203;0` ([#&#8203;10118](https://togithub.com/grpc/grpc-java/issues/10118)). This error did not cause any problems, other than unnecessary logging
-   xds: Avoid using the default locale for case insensitive path matching ([#&#8203;10148](https://togithub.com/grpc/grpc-java/issues/10148))
-   googleapis: Enable ignore_resource_deletion for `google-c2p:` resolver’s default xds bootstrap ([#&#8203;10121](https://togithub.com/grpc/grpc-java/issues/10121))
-   rls: Refresh name resolution on rejected addresses ([#&#8203;10032](https://togithub.com/grpc/grpc-java/issues/10032))

##### New Examples

-   Keepalive ([#&#8203;9956](https://togithub.com/grpc/grpc-java/issues/9956))
-   Cancellation ([#&#8203;9962](https://togithub.com/grpc/grpc-java/issues/9962))
-   Deadline ([#&#8203;9958](https://togithub.com/grpc/grpc-java/issues/9958))
-   Using waitForReady ([#&#8203;9960](https://togithub.com/grpc/grpc-java/issues/9960))
-   Client and Server sharing ([#&#8203;9969](https://togithub.com/grpc/grpc-java/issues/9969))
-   Reflection ([#&#8203;9955](https://togithub.com/grpc/grpc-java/issues/9955))
-   Doing debug ([#&#8203;9957](https://togithub.com/grpc/grpc-java/issues/9957))
-   Health service ([#&#8203;9991](https://togithub.com/grpc/grpc-java/issues/9991))
-   Error details ([#&#8203;9997](https://togithub.com/grpc/grpc-java/issues/9997))
-   Custom load balancing ([#&#8203;9951](https://togithub.com/grpc/grpc-java/issues/9951))
-   gRPC-level reverse proxy ([#&#8203;10059](https://togithub.com/grpc/grpc-java/issues/10059))

##### Dependencies

-   protobuf-java and protobuf-java-util upgraded to 3.22.3 ([#&#8203;10045](https://togithub.com/grpc/grpc-java/issues/10045))

##### Acknowledgements

-   [`@&#8203;carl-mastrangelo](https://togithub.com/carl-mastrangelo)`
-   [`@&#8203;haubenr](https://togithub.com/haubenr)`
-   [`@&#8203;jpd236](https://togithub.com/jpd236)`
-   [`@&#8203;kenk42292](https://togithub.com/kenk42292)`

### [`v1.54.1`](https://togithub.com/grpc/grpc-java/releases/tag/v1.54.1)

[Compare Source](https://togithub.com/grpc/grpc-java/compare/v1.54.0...v1.54.1)

##### Bug Fixes

-   core: Fix NPE race during hedging ([grpc/grpc-java#10046), fixing a Netty buffer memory leak for cancelled RPCs

##### Behavior Changes

-   gcp-observability: Remove monitored resource detection for logging ([grpc/grpc-java#10026). The cloud libraries will fill in these details instead

##### API stabilizations

-   Stabilize GcpObservability ([grpc/grpc-java#10027)
    -   The GcpObservability API provides users with a simple way to export logging, tracing, and metrics to Google Cloud Operations. For more information, please see [this blog post](https://cloud.google.com/blog/products/networking/introducing-grpc-observability-for-microservices).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://app.renovatebot.com/dashboard#github/zeebe-io/zeebe-cluster-testbench).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS43NC4wIiwidXBkYXRlZEluVmVyIjoiMzUuODkuMSIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==-->


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants