[BUG] Timeout on consuming message from azure service bus through Application Gateway #40020
Comments
Thank you for your feedback. Tagging and routing to the team member best able to assist. |
Hello @fanemama, the first error message indicates that the background scheduled task responsible for renewing the auth token at regular intervals fails to do so at some point. If this happens often, it may indicate an unstable network between the application running the consumer and the AD endpoint or the broker. I wonder whether a restricted network, a proxy, or some firewall rules are causing the connection to drop often. |
Hi @anuchandy, after the different meetings we had with support, the conclusion was that it is not a network issue. |
@fanemama, let me prepare a setup (SB resource behind app-gateway and SDK consumer running in a Docker instance) to see if this can be reproduced. |
FYI @anuchandy, I will now take the lead on this, helping @fanemama. Thank you for preparing this setup. |
Hi @anuchandy, any news on this ? Thanks. |
Hi @lazhar, I’ve been looking into this. One thing I noticed is that if the gateway front end sends FIN+ACK followed by TCP RST, the underlying proton-j library does not signal connection termination to the application. I’ve created an issue in that project’s JIRA: [PROTON-2823] Proton-J does not raise transport closed when TCP FIN+ACK arrives followed by TCP RST. I’m not sure whether your environment is impacted by this. Here are a few observations that I thought would be helpful to share -
Also, may I know the Service Bus SDK version, the mode of authentication (e.g., connection string), and what the receive code looks like? |
Hello @anuchandy, thank you very much for your help and feedback. On the infrastructure and network side, we noticed the same FIN+ACK and TCP RST behavior, and you confirm my assumption that the application does not get the signal and thus doesn't try to restart the connection. We have already set the timeout on Application Gateway to 120s (following this note: https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-websocket#backendaddresspool-backendhttpsetting-and-routing-rule-configuration). 120s is double the default timeout of ASB message locks, which keeps us on the safe side even if it is not optimized. And indeed, we noticed a little improvement on some applications. I'll let @lazhar answer about the code details. Thank you again for your help. |
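Since the application may never be told that the connection is gone, one possible application-side workaround is an idle watchdog that restarts the receiver after a long silence. The following is only a minimal sketch under stated assumptions and is not part of the Azure SDK: `ReceiverWatchdog`, its methods, the restart callback, and the five-minute threshold are all hypothetical names and values chosen for illustration. Note that a legitimately empty queue looks the same as a dead connection to this check, so the threshold trades unnecessary restarts against detection delay.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper: restarts a receiver that has been silent for too long. */
final class ReceiverWatchdog {
    private final long maxIdleNanos;
    private final LongSupplier clock;      // injectable so the logic can be tested without sleeping
    private final Runnable restartAction;  // assumption: closes and rebuilds the receiver client
    private final AtomicLong lastActivityNanos;

    ReceiverWatchdog(Duration maxIdle, LongSupplier clock, Runnable restartAction) {
        this.maxIdleNanos = maxIdle.toNanos();
        this.clock = clock;
        this.restartAction = restartAction;
        this.lastActivityNanos = new AtomicLong(clock.getAsLong());
    }

    /** Call from the message handler whenever a message (or other sign of life) arrives. */
    void recordActivity() {
        lastActivityNanos.set(clock.getAsLong());
    }

    /** Call periodically, e.g. from a ScheduledExecutorService. Returns true if a restart ran. */
    boolean check() {
        long now = clock.getAsLong();
        if (now - lastActivityNanos.get() < maxIdleNanos) {
            return false;
        }
        // Reset the timer first so one silent period triggers exactly one restart.
        lastActivityNanos.set(now);
        restartAction.run();
        return true;
    }

    public static void main(String[] args) {
        // Demo with a fake clock; production code would pass System::nanoTime.
        AtomicLong fakeNow = new AtomicLong(0);
        ReceiverWatchdog watchdog = new ReceiverWatchdog(
                Duration.ofMinutes(5),
                fakeNow::get,
                () -> System.out.println("restarting receiver"));

        fakeNow.set(Duration.ofMinutes(6).toNanos()); // six minutes of silence
        System.out.println("restart triggered: " + watchdog.check());
    }
}
```

In a real consumer, `recordActivity()` would be called from the message callback and `check()` from a scheduled task, with the restart callback closing the existing client and building a new one.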
Hi @anuchandy, thank you for your help and for opening the ticket about Proton-J. We will try to set the heartbeat option in our Java applications and see if it helps. We are also trying to see if this heartbeat option can be set directly in the code somewhere. |
@anuchandy I saw this option in the .NET SDK. Is it the heartbeat we are talking about? Do you know the equivalent in the Java SDK? |
@anuchandy and to answer your questions:
|
Context:
We are using Azure Service Bus via an application gateway (custom endpoint) with the transport type: AmqpTransportType.AMQP_WEB_SOCKETS.
Our consumer regularly encounters a connection timeout issue, and the application stops consuming messages.
We are constantly forced to restart the application to consume messages again.
Do you have a solution for our issue, and an explanation for this behaviour?
Stack trace:
{"az.sdk.message":"Error occurred while refreshing token that is not retriable. Not scheduling refresh task. Use ActiveClientTokenManager.authorize() to schedule task again.","exception":"Could not emit tick 256 due to lack of requests (interval doesn't support small downstream requests that replenish slower than the ticks)","scopes":[https:// t](https://*******,"audience":"amqp://"}
{"az.sdk.message":"Timeout waiting for RemoteClose. Manually terminating EndpointStates and completing close.","connectionId":"MF_a2709f_1714114657307","entityPath":"","linkName":""}
{"az.sdk.message":"onLinkRemoteClose","connectionId":"MF_6ffaaf_1714548329200","errorCondition":"amqp:link:detach-forced","errorDescription":"The link 'G14:5461966: ' is force detached. Code: publisher(link162650). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00.","linkName":"","entityPath":"***********"}