PickFirstLeafLoadBalancer does not emit TRANSIENT_FAILURE states #11082

Open
raboof opened this issue Apr 6, 2024 · 6 comments

@raboof
Contributor

raboof commented Apr 6, 2024

When using channel.notifyWhenStateChanged and trying to connect to addresses that don't accept connections, the PickFirstLoadBalancer emits TRANSIENT_FAILURE states, while the PickFirstLeafLoadBalancer just stays in CONNECTING.
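For readers who want to observe the same symptom, the following is a minimal sketch of a state-watching loop, not the code from the failing test. `StateSource`, `Watcher`, and `FakeSource` are hypothetical names. The interface only mirrors the shape of `ManagedChannel.getState(boolean)` and `ManagedChannel.notifyWhenStateChanged(ConnectivityState, Runnable)`, and the local `State` enum stands in for `io.grpc.ConnectivityState` so that the example runs without grpc-java on the classpath. The point it illustrates is that the callback is one-shot, so it has to be registered again after every change.

```java
import java.util.ArrayList;
import java.util.List;

public class StateWatchSketch {
    /** Stand-in for io.grpc.ConnectivityState (same constant names). */
    enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

    /** Hypothetical interface mirroring the two ManagedChannel methods used here. */
    interface StateSource {
        State getState(boolean requestConnection);
        void notifyWhenStateChanged(State source, Runnable callback);
    }

    /** Records every state it observes by re-registering the one-shot callback. */
    static final class Watcher {
        final List<State> seen = new ArrayList<>();
        private final StateSource channel;

        Watcher(StateSource channel) { this.channel = channel; }

        void start() {
            // With a real ManagedChannel, `true` asks an IDLE channel to start
            // connecting; the fake below ignores the flag.
            State current = channel.getState(true);
            seen.add(current);
            watchFrom(current);
        }

        private void watchFrom(State last) {
            // The callback runs once, when the state differs from `last`,
            // so it is registered again from inside itself.
            channel.notifyWhenStateChanged(last, () -> {
                State next = channel.getState(false);
                seen.add(next);
                watchFrom(next);
            });
        }
    }

    /** Fake source for the demo; a real run would adapt a ManagedChannel instead. */
    static final class FakeSource implements StateSource {
        private State state = State.IDLE;
        private Runnable pending;

        public State getState(boolean requestConnection) { return state; }

        public void notifyWhenStateChanged(State source, Runnable callback) {
            // Run immediately if the state has already diverged, else park the callback.
            if (source != state) { callback.run(); } else { pending = callback; }
        }

        void set(State next) {
            if (next == state) return;
            state = next;
            Runnable cb = pending;
            pending = null;
            if (cb != null) cb.run();
        }
    }

    /** Drives the fake through the sequence the old balancer was expected to produce. */
    static List<State> demo() {
        FakeSource fake = new FakeSource();
        Watcher watcher = new Watcher(fake);
        watcher.start();
        fake.set(State.CONNECTING);
        fake.set(State.TRANSIENT_FAILURE);
        fake.set(State.CONNECTING);
        return watcher.seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the behaviour reported here, a watcher of this shape would record CONNECTING and then nothing further, because no later state change ever triggers the callback.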

What version of gRPC-Java are you using?

I noticed this after updating from 1.62.2 to 1.63.0. I can also reproduce it on 1.62.2 when I set the GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST environment variable to true (which has become the default in 1.63.0).

What is your environment?

Linux (NixOS unstable), Oracle Java 1.8.0_362

What did you expect to see?

Alternating between CONNECTING and TRANSIENT_FAILURE states

What did you see instead?

Silence after entering the CONNECTING state

Steps to reproduce the bug

I don't have a particularly minimal reproducer, but can reliably show the problem with the "NonBalancingIntegrationSpecNetty" test in Pekko gRPC (apache/pekko-grpc#271 (comment))

@temawi
Contributor

temawi commented Apr 8, 2024

@larry-safran You might be best to investigate this one.

@YifeiZhuang
Contributor

I cannot reproduce it in a unit test yet. The existing unit tests cover some basic situations in which the channel reports TRANSIENT_FAILURE after CONNECTING, e.g. with a single address and with multiple addresses, once the initial pass over all the addresses has completed:
https://github.com/grpc/grpc-java/blob/master/core/src/test/java/io/grpc/internal/PickFirstLeafLoadBalancerTest.java#L333
https://github.com/grpc/grpc-java/blob/master/core/src/test/java/io/grpc/internal/PickFirstLeafLoadBalancerTest.java#L531

@raboof
Contributor Author

raboof commented Apr 11, 2024

Thank you for looking into this, much appreciated - I'll see if I can find the time to dig into what's different between those grpc-java unit tests and the behavior I'm seeing in the pekko-grpc integration test. Might have to wait a couple of weeks, though - busy period here.

@larry-safran
Contributor

Are you providing multiple addresses or only a single address? Does it stay in CONNECTING only for a limited time, or does it remain that way permanently?

@raboof
Contributor Author

raboof commented Apr 16, 2024 via email

@raboof
Contributor Author

raboof commented Apr 25, 2024

I created a possible unit test reproducer in https://github.com/grpc/grpc-java/compare/master...raboof:grpc-java:test-for-PickFirstLeafLoadBalancer-11082?expand=1 - I'm not too familiar with the grpc-java codebase, so it is possible that I'm misunderstanding something and not accurately reproducing the issue, but it might be a good starting point for further analysis. The behaviour does look similar to what I'm seeing in the pekko-grpc failure, where isPassComplete keeps returning false (because addressIndex.isValid() remains true).
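The dependency described in this comment can be pictured with a toy model. This is a deliberately simplified sketch and not grpc-java's implementation: `AddressCursor`, `stateAfterAllFail`, and the `advanceOnFailure` flag are hypothetical, and the example address is made up. It only shows that if the pass-complete check requires the address cursor to be exhausted, then a cursor that stays valid keeps the reported state at CONNECTING. It makes no claim about why the cursor stays valid in the real balancer.

```java
import java.util.Arrays;
import java.util.List;

public class PassCompleteSketch {
    /** Hypothetical cursor over the resolved addresses. */
    static final class AddressCursor {
        private final List<String> addresses;
        private int position = 0;

        AddressCursor(List<String> addresses) { this.addresses = addresses; }

        /** True while the cursor still points at an address. */
        boolean isValid() { return position < addresses.size(); }

        void increment() { if (isValid()) position++; }
    }

    /** Toy rule: a pass is complete only once the cursor has run off the end. */
    static boolean isPassComplete(AddressCursor cursor) {
        return !cursor.isValid();
    }

    /**
     * Returns the state a toy balancer would publish after every address failed.
     * advanceOnFailure models whether a failed attempt moves the cursor forward.
     */
    static String stateAfterAllFail(List<String> addresses, boolean advanceOnFailure) {
        AddressCursor cursor = new AddressCursor(addresses);
        for (int attempt = 0; attempt < addresses.size(); attempt++) {
            if (advanceOnFailure) cursor.increment();
        }
        return isPassComplete(cursor) ? "TRANSIENT_FAILURE" : "CONNECTING";
    }

    public static void main(String[] args) {
        List<String> addrs = Arrays.asList("10.0.0.1:50051"); // made-up address
        System.out.println(stateAfterAllFail(addrs, true));
        System.out.println(stateAfterAllFail(addrs, false));
    }
}
```

In this model the second call never leaves CONNECTING, which matches the symptom of `isPassComplete` returning false for as long as `isValid()` returns true.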
