
GOAWAY http/2 frame intermittently observed on high traffic on adding idleTimeout #2649

Closed
owaiscs001 opened this issue Jan 12, 2023 · 17 comments · Fixed by #2660
Labels
type/bug A general bug

Comments

@owaiscs001

Problem statement

While sending high traffic (same message and payload), we intermittently observed some 503 responses.
We then collected a tcpdump over the same time interval and observed a GOAWAY HTTP/2 frame sent by the Netty server every 10 minutes.
We had set idleTimeout = 600000 using the addServerCustomizers call shown below:

factory.addServerCustomizers(httpServer -> httpServer.idleTimeout(Duration.ofMillis(idleTimeout)));

In the next test we changed idleTimeout to 420000 and observed GOAWAY HTTP/2 frames in the tcpdump alternating every 3 min 23 sec and 3 min 37 sec.
In the final test we disabled idleTimeout by commenting out the addServerCustomizers call, and no further 503s were observed.

//factory.addServerCustomizers(httpServer -> httpServer.idleTimeout(Duration.ofMillis(idleTimeout)));
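
For reference, a complete customizer applying this idleTimeout could look like the sketch below. This is an illustration only (the class name and the hard-coded timeout value are made up for the example); it mirrors the wiretap customizer shown later in this thread.

import java.time.Duration;

import org.springframework.boot.web.embedded.netty.NettyReactiveWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.stereotype.Component;

@Component
public class IdleTimeoutServerCustomizer
        implements WebServerFactoryCustomizer<NettyReactiveWebServerFactory> {

    // Illustrative value: 600000 ms (10 minutes), matching the first test above.
    private final long idleTimeout = 600000;

    @Override
    public void customize(NettyReactiveWebServerFactory factory) {
        // Close connections that stay idle for the configured duration.
        factory.addServerCustomizers(httpServer ->
                httpServer.idleTimeout(Duration.ofMillis(idleTimeout)));
    }
}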

Expected Behavior

If traffic is flowing, Netty must not send a GOAWAY HTTP/2 frame.

Actual Behavior

Netty sends a GOAWAY HTTP/2 frame after the idleTimeout even while traffic is running.
Could you also provide some guidance on how to debug this issue?

  • Reactor version(s) used: 1.0.23
  • Other relevant libraries versions (eg. netty, ...):
    +--- org.springframework.boot:spring-boot-starter-webflux:2.7.4
    |    +--- org.springframework.boot:spring-boot-starter:2.7.4
    |    +--- org.springframework.boot:spring-boot-starter-json:2.7.4
    |    +--- org.springframework.boot:spring-boot-starter-reactor-netty:2.7.4
    |    |    \--- io.projectreactor.netty:reactor-netty-http:1.0.23
  • JVM version (java -version): 17.0.5
  • OS and version (eg. uname -a): Linux
@owaiscs001 owaiscs001 added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Jan 12, 2023
@violetagg
Member

@owaiscs001 Please provide a reproducible example.

@violetagg violetagg added for/user-attention This issue needs user attention (feedback, rework, etc...) and removed status/need-triage A new issue that still need to be evaluated as a whole labels Jan 13, 2023
@violetagg violetagg self-assigned this Jan 13, 2023
@owaiscs001
Author

Hi @violetagg,

Providing a reproducible example is difficult because of the size of my application.
Could you please point me to any logs I could enable or metrics I could track to debug this issue?
As I have already mentioned, after disabling idleTimeout I no longer see any GOAWAY HTTP/2 frames.

Any help here would be appreciated.

@violetagg
Member

@owaiscs001 Try to enable wiretap and check when the GOAWAY appears: https://projectreactor.io/docs/netty/release/reference/index.html#_wire_logger

@owaiscs001
Author

Hi @violetagg,

While investigating this issue I came across the following ticket, which asked Reactor Netty to send a GOAWAY frame when the idleTimeout occurs.
#2412

The fix for that issue became available in v1.0.23, which is the version I used for my testing, and in this version I observed the problem, i.e. the idleTimeout kicking in even when the connection has traffic.

I then fell back to v1.0.21, where the GOAWAY fix is not present, and there I did not see the problem:

  • The idle timeout kicked in when there was no traffic.
  • When traffic was present, the idle timeout never kicked in.

Hence it looks like the fix for ticket #2412 (47deeb2) has broken this behavior.

Could you please take a look and share your feedback?

@violetagg
Member

violetagg commented Jan 17, 2023

@owaiscs001 You are not able to observe this issue with 1.0.21 because the change 47deeb227c3859d4eeb82e9984f090b05b26cd28 adds this functionality for HTTP/2. In short, with 1.0.21 you do not have an idle timeout for HTTP/2.

Please at least provide some logs from the failure.

@violetagg
Member

Also, please specify which protocol configuration you use (see the configuration sketch after this list):

  • H2
  • H2 and HTTP/1.1
  • H2C
  • H2C and HTTP/1.1
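
As a minimal sketch, and assuming the same WebServerFactoryCustomizer approach shown earlier in this issue, these combinations are selected on the Reactor Netty server with HttpServer#protocol (the enum values come from reactor.netty.http.HttpProtocol):

import org.springframework.boot.web.embedded.netty.NettyReactiveWebServerFactory;
import reactor.netty.http.HttpProtocol;

class ProtocolCustomizerSketch {

    // H2C together with HTTP/1.1 (cleartext). For H2 plus HTTP/1.1 use
    // HttpProtocol.H2 and HttpProtocol.HTTP11 instead, and also configure TLS.
    static void configure(NettyReactiveWebServerFactory factory) {
        factory.addServerCustomizers(httpServer ->
                httpServer.protocol(HttpProtocol.H2C, HttpProtocol.HTTP11));
    }
}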

@violetagg
Member

Do you see a log entry like the one below in the log file? If yes, please provide all log entries that contain the corresponding connection id.
For example, in the log entry below the connection id is 1a72d7d4:

15:41:28.669 [reactor-http-nio-10] DEBUG r.n.http.server.HttpServerOperations - [1a72d7d4, L:/0:0:0:0:0:0:0:1:55114 - R:/0:0:0:0:0:0:0:1:55115] Connection was idle for [500ms], as per configuration the connection will be closed.

You can find more about how to correlate by connection id here: https://projectreactor.io/docs/netty/release/reference/index.html#_http2_3

@owaiscs001
Author

owaiscs001 commented Jan 18, 2023

Hi @violetagg

I am using H2C.
I set idleTimeout to 60000 ms and collected logs as described in the comment above.
I observed the log below after one minute, while traffic was running:
"message":"[c8395ca4, L:/192.168.140.231:8080 - R:/192.168.140.207:40068] Connection was idle for [60000ms], as per configuration the connection will be closed."
The client then established a new connection and started sending again.
The same thing repeated every minute.

For reference, I'm attaching the entire log for one connection.
Netty_Timeout.log

@violetagg violetagg removed the for/user-attention This issue needs user attention (feedback, rework, etc...) label Jan 18, 2023
@owaiscs001
Author

Hi @violetagg

Could you please provide some input on why this behavior occurs?
Do you think this is an issue?

@violetagg
Member

@owaiscs001 Based on the logs it seems that your server is configured with H2C and HTTP/1.1. I did some code review and I think I found one potential issue. I'll create a PR.

Are you able to test a PR or a SNAPSHOT version?

@violetagg violetagg added this to the 1.0.28 milestone Jan 19, 2023
@owaiscs001
Author

Hi @violetagg

Yes, I will test the snapshot version.

Thanks

violetagg added a commit that referenced this issue Jan 19, 2023
- When the server is configured with HTTP/1.1 and H2C ensure Http2MultiplexHandler
is at the end of the pipeline because this handler doesn't forward all channelRead events.
- When the server is configured with HTTP/1.1 and H2 and the client sends HTTP/1.1,
when channelActive event happens, HttpTrafficHandler is still not in the pipeline.
In this use case add IdleTimeoutHandler when HttpTrafficHandler is added to the pipeline.

Fixes #2649
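
For background, the idle-timeout idiom in a plain Netty pipeline looks roughly like the sketch below. This is not Reactor Netty's internal IdleTimeoutHandler, just an illustration with made-up handler names of why ordering matters: the idle-state handler must sit where channelRead events still reach it. If a handler placed before it does not forward all channelRead events (the commit above notes Http2MultiplexHandler behaves this way), traffic never resets the idle state and a busy connection gets closed, which is the symptom reported in this issue.

import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

final class IdleTimeoutPipelineSketch {

    // Adds idle detection plus a handler that closes the channel once the
    // connection has seen no reads or writes for the given timeout.
    static void addIdleTimeout(ChannelPipeline pipeline, long timeoutMillis) {
        // IdleStateHandler only "sees" the reads and writes that pass through it,
        // so its position in the pipeline determines what counts as activity.
        pipeline.addLast("idleState",
                new IdleStateHandler(0, 0, timeoutMillis, TimeUnit.MILLISECONDS));
        pipeline.addLast("idleClose", new ChannelDuplexHandler() {
            @Override
            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
                if (evt instanceof IdleStateEvent
                        && ((IdleStateEvent) evt).state() == IdleState.ALL_IDLE) {
                    // Close the connection once it has been fully idle.
                    ctx.close();
                }
                else {
                    super.userEventTriggered(ctx, evt);
                }
            }
        });
    }
}
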
@violetagg violetagg reopened this Jan 20, 2023
@violetagg
Member

violetagg commented Jan 20, 2023

@owaiscs001
Can you please try the 1.0.28-SNAPSHOT version from https://repo.spring.io/snapshot?

@owaiscs001
Author

Hi @violetagg

We have tested 1.0.28-SNAPSHOT.
We set the HTTP idleTimeout to 60000 ms.

Our test:
Sent traffic to Netty 1.0.28-SNAPSHOT.
Stopped the traffic and waited for the HTTP idleTimeout.

Our observation:
Netty did not close the connection after 60000 ms.
The connection stayed intact until the client's HTTP idle timeout (12 min in this case) was reached, at which point the client closed the connection.

In the logs, "Connection was idle for [60000ms], as per configuration the connection will be closed." was not observed.
For your reference, we are also attaching a log file.

Netty_Timeout2.log

@violetagg
Member

@owaiscs001 Without a reproducible example I'm out of ideas. The tests that I added do not reproduce the behaviour shown in your logs.

@violetagg violetagg added the for/user-attention This issue needs user attention (feedback, rework, etc...) label Jan 23, 2023
@violetagg
Member

@owaiscs001 If you try to enable wiretap and provide these extended logs, I might be able to see what's happening.

@Component
public class MyNettyWebServerCustomizer
        implements WebServerFactoryCustomizer<NettyReactiveWebServerFactory> {
    @Override
    public void customize(NettyReactiveWebServerFactory factory) {
        factory.addServerCustomizers(httpServer -> httpServer.wiretap(true));
    }
}

More about wiretap: https://projectreactor.io/docs/netty/release/reference/index.html#_wire_logger

@owaiscs001
Author

Hi @violetagg,

Thanks for the fix.
My bad in the testing; I have re-verified and it is working as expected.

After setting idleTimeout I received this log: "message":"[id: 0xe563440c, L:/192.168.140.236:8080 - R:/192.168.219.86:43948] OUTBOUND GO_AWAY: lastStreamId=11 errorCode=0 length=0 bytes=","endOfBatch"

Can you please share when you will be publishing 1.0.28?

@violetagg
Member

That's amazing!

Regarding publishing 1.0.28: our tentative plan is for 14.02.

@violetagg violetagg removed the for/user-attention This issue needs user attention (feedback, rework, etc...) label Jan 25, 2023