Update grpc-core, grpc-interop-testing, ... to 1.29.0 #933
Conversation
Reading the release notes for 1.29.0 I suspect the issue is related to grpc/grpc-java#6657. Before the upgrade the test would output:
But according to grpc/grpc-java#6657, the subchannel now remains in TRANSIENT_FAILURE. I think we'd need a scheduler with a timeout or similar... 🤔
val failure =
  client.sayHello(HelloRequest(s"Hello friend")).failed.futureValue.asInstanceOf[StatusRuntimeException]
failure.getStatus.getCode should be(Code.UNAVAILABLE)
client.closed.failed.futureValue shouldBe a[ClientConnectionException]
Removing this line fixes the test.
grpc-java 1.29.0 changed the way subchannel events are notified (in load-balanced situations). With that change, the ChannelUtils listener for channel changes no longer works when using load-balanced channels because it can no longer count reconnection attempts. As a consequence, a scenario like the one in the test will no longer fail but retry indefinitely.
Once I realized the reconnection would retry indefinitely, I decided the test could be removed.
And, if we remove the test and then start pulling on the thread, we should probably deprecate withConnectionAttempts(2) (or maybe re-implement the underlying logic, if we consider that a connectionAttempts counter still makes sense now that we are using the internal NameResolver engine provided by grpc-java).
Before pushing forward and deleting all usages of connectionAttempts I preferred raising this comment for discussion.
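For reference, the kind of attempt counting being discussed could be sketched against the plain grpc-java channel-state API roughly as below. This is only an illustration of the idea, not the actual ChannelUtils code; the endpoint and the attempt limit are placeholders.

```scala
import io.grpc.{ ConnectivityState, ManagedChannel, ManagedChannelBuilder }

// Illustration only, not the actual ChannelUtils code: count how often the channel
// enters TRANSIENT_FAILURE and give up once a maximum number of attempts is reached.
def watchReconnects(channel: ManagedChannel, maxAttempts: Int)(onGiveUp: () => Unit): Unit = {
  def loop(failures: Int): Unit = {
    val current = channel.getState(false)
    val seenFailures =
      if (current == ConnectivityState.TRANSIENT_FAILURE) failures + 1 else failures
    if (seenFailures >= maxAttempts) onGiveUp()
    else channel.notifyWhenStateChanged(current, () => loop(seenFailures))
  }
  loop(0)
}

// Hypothetical usage against a placeholder endpoint:
val channel = ManagedChannelBuilder.forAddress("127.0.0.1", 8080).usePlaintext().build()
watchReconnects(channel, maxAttempts = 2)(() => channel.shutdown())
```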
I guess it makes sense to follow the lead of the reference implementation and just keep retrying when load balancing, then.
if we remove the test and then start pulling on the thread, we should probably deprecate withConnectionAttempts(2)
Isn't it also used in the non-loadbalanced scenario?
Isn't it also used in the non-loadbalanced scenario?
Yeah, I had the same thought over breakfast earlier today. I think we don't have a test for that and, even if we had such a test, we should improve the documentation or API around withConnectionAttempts.
My main concern is whether we should improve the load-balanced use cases. For example:
- in situations where none of the records returned by the service discovery are valid, I'd like to see the aggregated channel fail eventually (like it does in non-balanced usages). Or, at least, I'd like to see failure traces in the logs (I think we can have a look at logging settings for that).

One of the reasons this took me a while to diagnose was that the test code (aka user code) had zero visibility into the underlying failures. Also, I (the developer) didn't see any apparent failure or retries happening even with DEBUG logs.
in situations where none of the records returned by the service discovery are valid, I'd like to see the aggregated channel fail eventually
I agree. If we don't have any insight into the progress of reconnecting / number of reconnects, I guess the best we could do is wait for a timeout: if a channel remains in TRANSIENT_FAILURE for X amount of time, assume it won't ever successfully connect and fail.
Does the reference implementation have any such timeout?
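One possible shape for such a timeout, sketched against grpc-java's channel-state API plus a plain scheduler. This is purely illustrative; the function name, the duration, and the failure handling are placeholder assumptions, not a proposal for the final API.

```scala
import java.util.concurrent.{ Executors, ScheduledFuture, TimeUnit }
import io.grpc.{ ConnectivityState, ManagedChannel }

// Sketch only: if the channel stays in TRANSIENT_FAILURE for longer than `timeoutSeconds`,
// assume it will never connect and invoke `onFailure`. Names and values are placeholders.
def failAfterTransientFailure(channel: ManagedChannel, timeoutSeconds: Long)(onFailure: () => Unit): Unit = {
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  var pending: Option[ScheduledFuture[_]] = None

  def watch(): Unit = {
    val current = channel.getState(false)
    current match {
      case ConnectivityState.TRANSIENT_FAILURE if pending.isEmpty =>
        // Start the clock the first time we observe TRANSIENT_FAILURE.
        pending = Some(scheduler.schedule(new Runnable { def run(): Unit = onFailure() }, timeoutSeconds, TimeUnit.SECONDS))
      case ConnectivityState.READY =>
        // The channel recovered, so cancel the pending failure.
        pending.foreach(_.cancel(false))
        pending = None
      case _ => ()
    }
    channel.notifyWhenStateChanged(current, () => watch())
  }

  watch()
}
```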
Isn't it also used in the non-loadbalanced scenario?
Yeah, I had the same thought over breakfast earlier today. I think we don't have a test for that and, even if we had such a test, we should improve the documentation or API around withConnectionAttempts.
I had the answer above drafted and then I realized that I don't know how to control whether the Channel provided is a single Channel or a load-balanced one with multiple sub-channels.
Maybe we're no longer using a non-load-balanced client. The client factory methods all create a ServiceDiscovery instance (maybe akka-dns, maybe config, maybe other) and, since we always inject a NameResolver based on that ServiceDiscovery, it's possible we're always running a Channel with sub-channel(s).
I need to dig deeper into that.
(I had multiple GH windows and posted 2 variants of the same comment)
Ok, so with grpc-java 1.29.0 we should keep withConnectionAttempts around, but it'll only be used when withGrpcLoadBalancingType is not invoked. I've just tested that and it works.
I think we have to improve the documentation with multiple things (see the settings sketch after this list):
- when withGrpcLoadBalancingType is used, withConnectionAttempts is ignored, the underlying (~~akka-grpc~~ grpc-java) implementation for backoff reconnecting will be used indefinitely, and client.closed only completes when client.close() is directly invoked. Note: TIL the load balancing in the akka-grpc implementation is per call.
- when withGrpcLoadBalancingType is not used, but a ServiceDiscovery is provided, all calls are sent to a single server instance but users can map over client.closed to reconnect again, which will trigger a new ServiceDiscovery lookup.
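To make those two modes concrete, here is a rough sketch of what the client settings could look like on the user side; the service name, host, port, and load-balancing policy string are placeholder assumptions.

```scala
import akka.actor.ActorSystem
import akka.grpc.GrpcClientSettings

implicit val system: ActorSystem = ActorSystem("client")

// Mode 1: load balancing enabled. Per the discussion above, withConnectionAttempts
// is ignored here; grpc-java keeps retrying with backoff and client.closed only
// completes after an explicit client.close().
val balancedSettings =
  GrpcClientSettings
    .usingServiceDiscovery("echo-service") // placeholder service name
    .withGrpcLoadBalancingType("round_robin")

// Mode 2: no load balancing. All calls go to a single resolved instance and
// withConnectionAttempts still applies; users can map over client.closed to
// rebuild the client, triggering a new ServiceDiscovery lookup.
val singleInstanceSettings =
  GrpcClientSettings
    .connectToServiceAt("127.0.0.1", 8080) // placeholder endpoint
    .withTls(false)
    .withConnectionAttempts(2)
```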
Does the reference implementation have any such timeout?
I'm reading docs here and there trying to see what other choices we have.
The upstream default is pick_first (no load balancing)...
Again! 🤦🏼♂️ I should refresh the page before answering...
the underlying (akka-grpc) implementation for backoff reconnecting will be used
Do you mean grpc-java?
TIL the load balancing in the akka-grpc implementation is per call
What do you mean?
all calls are sent to a single server instance but users can map over client.closed to reconnect again which will trigger a new ServiceDiscovery lookup.
I would expect that when that single server instance fails, grpc-java will automatically fail over to another instance (triggering a new ServiceDiscovery lookup) as needed. Do you think we need to do something extra there?
In any case it would be good to add tests for those scenarios.
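As a rough illustration of the "map over client.closed to reconnect" idea in the non-load-balanced mode: EchoServiceClient stands in for any generated akka-grpc client, and the restart-on-failure policy is just an assumption for the sketch.

```scala
import scala.concurrent.ExecutionContext
import akka.actor.ActorSystem
import akka.grpc.GrpcClientSettings

implicit val system: ActorSystem = ActorSystem("client")
implicit val ec: ExecutionContext = system.dispatcher

// Sketch: when the client gives up (client.closed fails after the configured
// connection attempts), build a fresh client, which triggers a new ServiceDiscovery lookup.
def connect(settings: GrpcClientSettings): EchoServiceClient = {
  val client = EchoServiceClient(settings) // placeholder generated client
  client.closed.failed.foreach { _ =>
    // Naive restart: a real version would add backoff, a stop condition, and publish
    // the new client reference to callers (e.g. via an AtomicReference).
    connect(settings)
  }
  client
}
```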
(🤦🏼‍♂️ Today is not my day)
In both statements above I meant grpc-java (not akka-grpc).
TIL the load balancing in the ~~akka-grpc~~ grpc-java implementation is per call
🤦🏼‍♂️ It is round-robin! What did I expect?
I would expect that when that single server instance fails, grpc-java will automatically fail over to another instance (triggering a new ServiceDiscovery lookup) as needed. Do you think we need to do something extra there?
I think our code listening for the number of re-connection attempts is the issue here, and we may want to remove it once and for all. I mean, having moved to an implementation based on NameResolver, even when using pick-first you are probably correct and grpc-java is doing a new name resolution. So it doesn't make sense to close after a certain number of reconnection attempts. We could keep the code around for logging purposes (even though that code is useless in round-robin), but we can probably drop the withConnectionAttempts setting and the closed future.
In any case would be good to add tests for those scenario's.
+1
Solid analysis, I think we have a good grasp of what's going on here. I think 'just retrying indefinitely' does not provide enough observability. I think we can and should give the option to fail when connecting fails. As a future improvement we could expose the Future that is resolved when the channel becomes 'ready' and/or allow configuring a timeout. Perhaps that isn't needed yet though.
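A sketch of how such a 'ready' Future could be derived from grpc-java's channel-state API; this is purely illustrative and not a committed API shape.

```scala
import scala.concurrent.{ Future, Promise }
import io.grpc.{ ConnectivityState, ManagedChannel }

// Sketch: complete a Future the first time the channel reports READY.
def readyFuture(channel: ManagedChannel): Future[Unit] = {
  val promise = Promise[Unit]()
  def watch(): Unit = {
    val current = channel.getState(false)
    if (current == ConnectivityState.READY) promise.trySuccess(())
    else channel.notifyWhenStateChanged(current, () => watch())
  }
  watch()
  promise.future
}
```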
(this does suggest it'd be good to have a different setting for 'retry before becoming ready' and 'retry on failures after becoming ready', but master doesn't have that either, so that can be a future improvement)
Updates grpc-core, grpc-interop-testing, ... from 1.28.1 to 1.29.0.
GitHub Release Notes - Version Diff
I'll automatically update this PR to resolve conflicts as long as you don't change it yourself.
If you'd like to skip this version, you can just close this PR. If you have any feedback, just mention me in the comments below.
Configure Scala Steward for your repository with a .scala-steward.conf file.
Have a fantastic day writing Scala!
Ignore future updates
Add this to your .scala-steward.conf file to ignore future updates of this dependency:
labels: library-update, semver-minor