Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unavailable in grpc queries when android app goes to background #11018

Open
rostik404 opened this issue Mar 15, 2024 · 13 comments
Open

Unavailable in grpc queries when android app goes to background #11018

rostik404 opened this issue Mar 15, 2024 · 13 comments
Assignees
Labels

Comments

@rostik404
Copy link
Contributor

Hello!
I have a problem with grpc queries in kotlin app when my app goes to background. I have a next scheme of relation between my app and backend - when backend needs to wake up a phone, I'm doing next things:

  • Sending a push from the backend to the phone through the Firebase
  • In my app on the phone I'm listening events from firebase.
  • When I'm getting the push from the firebase, I'm creating CoroutineWorker
  • Inside of CoroutineWorker I'm going to my backend via grpc

Firebase listener:

class FCMListenerService : FirebaseMessagingService() {
    override fun onMessageReceived(remoteMessage: RemoteMessage) {
        Log.d(TAG, "Got push message: ${remoteMessage.rawData.toString()}, priority ${remoteMessage.priority}")
        val constraints = Constraints.Builder()
            .setRequiredNetworkType(NetworkType.CONNECTED)
            .build()

        val workRequest = OneTimeWorkRequestBuilder<ExecuteNextOperationWorker>().setConstraints(constraints).build()
        WorkManager.getInstance(applicationContext).enqueue(workRequest)
    }

Coroutine worker:

...
override suspend fun doWork(): Result {
    val channel = OkHttpChannelBuilder
                .forAddress(config.host, config.port)
                .sslSocketFactory(deviceSslProvider.getSSLSocketFactory())
                .useTransportSecurity()
                .intercept(GrpcLoggerInterceptor())
                .defaultServiceConfig(getServiceConfig())
                .enableRetry()
                .build()
   val myService = MyServiceGrpcKt.MyServiceCoroutineStub(channel)
   val myRequest = MyServiceOuterClass.MyRequest.build()

   return myService.myOperation(myRequest)
}

It works fine, when app works in foreground, but when I'm locking the phone, after few minutes, this grpc queries are not passing to the backend - I'm getting an error:

2024-03-15 11:06:10.098 29021-5531  io.grpc.StatusException: UNAVAILABLE: Unable to resolve host my-backend.com

Also, I added http queries near before grpc queries - http queries work fine even in background mode. Seems like the problem just with grpc queries.

I thought that problem is related with #8860, but I've updated grpc to the last version and the problem is still reproducing.

My build.gradle file:

implementation("io.grpc:grpc-okhttp:1.62.2")
implementation("io.grpc:grpc-protobuf:1.62.2")
implementation("io.grpc:grpc-stub:1.62.2")
@larry-safran
Copy link
Contributor

The DNS name resolver will usually produce a stacktrace with a Cause that may help direct you.

@sergiitk
Copy link
Member

To rule out HTTP request having DNS cached, could you please test the same flow, but once you create gRPC Channel, call ManagedChannel.getState(true) to force gRPC channel to connect? We suspect this issue could be related to gRPC lazy-loading the channel and not caching DNS.

@sergiitk sergiitk added the Waiting on reporter there was a request for more information without a response or answer or advice has been provided label Mar 20, 2024
@rostik404
Copy link
Contributor Author

Hello!
I added some logs and noticed that after phone inactivity chanel comes to TRANSIENT_FAILURE state, so I added some logic before my requests:

val currentChannelState = channel.getState(false)

if (currentChannelState == ConnectivityState.TRANSIENT_FAILURE || currentChannelState == ConnectivityState.SHUTDOWN) {
            Log.d(TAG, "Channel is in $currentChannelState switching to Idle")

            channel.enterIdle()
        }

Now it works correct

@rostik404
Copy link
Contributor Author

rostik404 commented Mar 22, 2024

Also, during the debugging of this problem, I've noticed, that retries are not working.
Here is my config:

.defaultServiceConfig(
                    mapOf(
                        "methodConfig" to listOf(
                            mapOf(
                                "retryPolicy" to mapOf(
                                    "maxAttempts" to "5",
                                    "initialBackoff" to "0.5s",
                                    "maxBackoff" to "10s",
                                    "backoffMultiplier" to 1.2,
                                    "retryableStatusCodes" to listOf("UNAVAILABLE")
                                )
                            )
                        )
                    )
                )
.maxRetryAttempts(5)
.enableRetry()

I see in logs just one request:

2024-03-22 14:56:41.689 22515-22562 GrpcLogger              com.test.app                       D  method -> test.app.v0.TestService/test, message -> ...

And after that, I'm getting an exception with Unavailbale error. There is no evidence of any retries in the logs.

Could you please tell, what am I doing wrong?

UPD

Also, I tried to pass the name of my service I need to retry, it didn't work :(

methodConfig" to listOf(
                            mapOf(
                                "name" to listOf(mapOf("service" to "test.app.v0.TestService")),
                                "retryPolicy" to mapOf(
                                    "maxAttempts" to "5",
                                    ...
                            )
                        )

@sergiitk sergiitk removed the Waiting on reporter there was a request for more information without a response or answer or advice has been provided label Mar 22, 2024
@rostik404
Copy link
Contributor Author

rostik404 commented Mar 25, 2024

Finally, I've passed service and method to the name field of the config and now retires are working for passed method. This example was very helpful https://github.com/grpc/grpc-java/tree/master/examples/src/main/java/io/grpc/examples/retrying (btw, I found some problem with bazel for this example #11037)

But I expected, retries would be enabled for all methods, if specific name of the method was not passed. Also, I expected, that retries would be enabled for all services if specific name of the service was not passed.

Could you please tell what is correct behavior in theese cases?

UPD
Okay, according to the methodConfig title of the config field, seems like behavior is correct :)
But maybe there is a way to apply retries for all services/methods of the service?

UPD
According to this doc, retries should be applied to all method of the sevice, if I've passed just service name https://github.com/grpc/grpc/blob/master/doc/service_config.md#example and actually it works :)
But still have no idea, how to pass retry policy for all services.
And is it possible to see info about retries in logs interceptor? Now I see just info about first request
Here is the interceptor:

class GrpcLoggerInterceptor(): ClientInterceptor {
    override fun <ReqT : kotlin.Any?, RespT : kotlin.Any?> interceptCall(
        method: MethodDescriptor<ReqT, RespT>?,
        callOptions: CallOptions?,
        next: Channel?
    ): ClientCall<ReqT, RespT>  =
        object : ForwardingClientCall.SimpleForwardingClientCall<ReqT, RespT>(
            next?.newCall(method, callOptions)
        ) {
            override fun sendMessage(message: ReqT) {
                Log.d(TAG, "method -> ${method?.fullMethodName}, message -> $message")

                super.sendMessage(message)
            }
        }
}

@rostik404
Copy link
Contributor Author

rostik404 commented Mar 26, 2024

btw, initial problem may be related with android dns cache, as you supposed above. The whole error looks like:

Status{code=UNAVAILABLE, description=Unable to resolve host my-awesome-host.com, cause=java.lang.RuntimeException: java.net.UnknownHostException: Unable to resolve host "my-awesome-host.com": No address associated with hostname
at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
at io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)

And unfortunatelly, retries are not working if this error was thrown. It can be reporduced if you'll pass non-existent host to your retry example https://github.com/grpc/grpc-java/blob/master/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldClient.java#L130

@temawi
Copy link
Contributor

temawi commented Mar 26, 2024

A note on your service config - in your earlier comment you had a service config that you wanted to apply to all calls, but it was missing the required "name" field. But your other example config did look correct so I suppose in at least some of your testing retry was configured correctly. You can check out the service config user guide for an example config.

But more saliently, it looks like you have uncovered a bug in how the default service config is applied. I created issue #11040 to track the fix for that. Note that this does not seem like an Android specific issue.

@rostik404
Copy link
Contributor Author

@temawi, thanks, my config really was correct, but I suppose retries didn't work because of the problem I described here.
Could you please check it? I added notes how to reproduce this case. Seems like retries are not working if we are getting an exception from dns resolver (or any other exceptions, I suppose) – is it expected behaviour? Is there any way to retry this case?

@temawi
Copy link
Contributor

temawi commented Mar 28, 2024

Unfortunately in situations where the first DNS lookup fails the default config will not be honored. That is a bug in gRPC that needs to get fixed as part of #11040.

Until then, what I can think of as a workaround would be to create your own NameResolver by extending the DnsNameResolver class and have it always return the service config you need. I think you can get by with just wrapping the doResolve() method.

The Custom Name Resolvers user guide gives some context on this and has a link to an example custom name resolver implementation.

Did you have any other issues here besides the service config not getting applied?

@rostik404
Copy link
Contributor Author

rostik404 commented Mar 29, 2024

@temawi, thanks, I'll try to use custom resolver

Did you have any other issues here besides the service config not getting applied?

Unfortunately, I did :(
In my android app I'm making some grpc calls right after establishing wifi connection. I'm using network callback to know when connection was established. And than I'm facing with a problem: channel stucks in CONNECTING state. I tried to add several calls with backoff – channel still being in CONNECTING state:

12:39:46.296 32366-32366 OBSERVER      com.test.app D  [type: WIFI[], state: CONNECTED/CONNECTED, reason: (unspecified), extra: , failover: false, available: true, roaming: false]
12:39:46.310 32366-32435 TestApi       com.test.app D  Channel is in IDLE before making request
12:39:46.310 32366-32435 TestApi       com.test.app D  GONNA MAKE GRPC CALL
12:39:46.403 32366-32437 GrpcLogger    com.test.app D  method -> test.app.v0.TestSerivce/Method
12:39:46.404 32366-32437 ErrorHandler  com.test.app D  Error io.grpc.StatusException: DEADLINE_EXCEEDED: ClientCall started after CallOptions deadline was exceeded. Deadline has been exceeded for 14.029396739s. 
12:39:49.304 32366-32437 TestApi       com.test.app D  Channel is in CONNECTING before making request
12:39:49.304 32366-32437 TestApi       com.test.app D  GONNA MAKE GRPC CALL
12:39:49.314 32366-32436 GrpcLogger    com.test.app D  method -> test.app.v0.TestSerivce/Method
12:39:49.316 32366-32436 ErrorHandler  com.test.app D  Error io.grpc.StatusException: DEADLINE_EXCEEDED: ClientCall started after CallOptions deadline was exceeded -16,994016358 seconds ago. Name resolution delay 0,000000000 seconds.
12:39:52.314 32366-32435 TestApi       com.test.app D  Channel is in CONNECTING before making request
12:39:52.314 32366-32435 TestApi       com.test.app D  GONNA MAKE GRPC CALL
12:39:52.322 32366-32435 ErrorHandler  com.test.app D  Error io.grpc.StatusException: DEADLINE_EXCEEDED: ClientCall started after CallOptions deadline was exceeded -19,999490615 seconds ago. Name resolution delay 0,000000000 seconds.

If I add Thread.sleep(2000) before making first grpc call – everything works correct. The problem is reproducing only If we are making call right after wifi connection has been etablished.
Here is the full example https://gist.github.com/rostik404/e920f25cc34ec11dde08d030ed4daea4

Also, I added deadlineAfter to my client. Withot adding dedlineAfter it stuckes in CONNECTING state on first call

@temawi
Copy link
Contributor

temawi commented Apr 9, 2024

Just a note that I haven't forgotten about this issue - but I'm not sure how to help troubleshoot it further. This could even be an Android issue where that notification comes in a bit before the network stack is fully ready (but that's just speculation). I take it you are still facing this issue?

@rostik404
Copy link
Contributor Author

rostik404 commented Apr 18, 2024

Hello!
Thank you for continuing to help with this issue :)

This could even be an Android issue where that notification comes in a bit before the network stack is fully ready

Maybe, but HTTP calls seem to be fine. It seems like I'm having a problem only with gRPC calls.
Additionally, it is interesting that retries are not helpful if the first attempt was not successful..

@temawi
Copy link
Contributor

temawi commented May 2, 2024

We noticed one thing when reviewing your question - you are using HttpcChannelBuilder. Can you try using AndroidChannelBuilder instead? AndroidChannelBuilder has code to react to the network becoming available and it's likely you are suffering from not having that behavior enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants