Downgrade response time HTTP-2 after 1.3.0 to 2.0.1 #321

Open

Vkutovoy92 opened this issue Nov 16, 2023 · 26 comments

Comments

Vkutovoy92 commented Nov 16, 2023

Hello! We're seeing a huge degradation in response time after updating from 1.3.0 to 2.0.1.
My client config:

Opts = #{
    connect_timeout => 5000,
    retry => 100,
    retry_timeout => 1000 % 1s
},
Args = [
    {size, PoolSize},
    {start_mfa, {gun, start_link, [self(), Url, 443, Opts]}},
    {supervisor_period, 1},
    {supervisor_intensity, 1000},
    {supervisor_restart, permanent}
],
Result = erlpool:start_pool(PoolName, Args),

And the request:

StreamRef = gun:post(ConnPid, Path, [
{<<"content-type">>, "application/json"},
{<<"content-length">>, integer_to_binary(size(ReqBody))}
], ReqBody),

GunResp = gun:await(ConnPid, StreamRef, ?TIMEOUT),
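(ConnPid is not shown above; presumably it is checked out from the erlpool pool per request. A minimal sketch of that step, assuming erlpool's erlpool:pid/1 API:)

%% Assumption: one Gun connection per pool worker, fetched per request.
ConnPid = erlpool:pid(PoolName),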

Maybe we need to add some extra options that weren't required in 1.3?


essen commented Nov 17, 2023

Hello, I'm not sure what you mean by "downgrade" here or what you're measuring exactly. I don't think there are any changes that require configuration, although if you're connecting over HTTP/2 some settings are best tweaked.

@Vkutovoy92 (Author)

Response time from the services has increased. I only changed 1.3.0 to 2.0.1, without any settings changes, and got worse percentiles. Nothing changed except the Gun version.

I can show graphs.


essen commented Nov 17, 2023

I need a way to reproduce this, but having data would help understand what this is about, yes.

@Vkutovoy92 (Author)

There are 2 services on AWS talking HTTP/2 via Gun; the response time from the 2nd service grew after updating the 1st service to Gun 2.0.1.
[Screenshot 2023-11-17 at 13:10:46]
The update to 2.0.1 happened at 20:00.

@Vkutovoy92 (Author)

So, in percentiles, BEFORE:
[Screenshot 2023-11-17 at 13:13:48]
AFTER:
[Screenshot 2023-11-17 at 13:13:46]
No changes at all except updating the client to the new Gun.


essen commented Nov 17, 2023

4194682 was meant to improve HTTP/2 performance when receiving larger bodies. But perhaps this had a negative impact in your case.

What is the size of the body you send, and what is the size of the body you receive (roughly)?

Another change is that send_timeout is now enabled by default.

There may be a few other things. If you can try different Gun commits it could help identify when things started getting worse. I could provide a few interesting commits to upgrade to and see what happens.
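For illustration, one way the send_timeout default could be overridden in 2.0 (a sketch only, assuming Gun 2.x's tcp_opts option passes standard gen_tcp socket options through; the values are illustrative, not recommendations):

Opts = #{
    connect_timeout => 5000,
    retry => 100,
    retry_timeout => 1000, % 1s
    tcp_opts => [
        %% Illustrative override of the send timeout enabled by default in 2.x.
        {send_timeout, 30000},
        {send_timeout_close, true}
    ]
},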

@Vkutovoy92 (Author)

received_bytes sent_bytes
4004 4687
586 735
588 737
149 296
992 1213
200 299
592 740
6036 7050
568 716
1110 1324
265 366
330 454
1759 2071
149 296
825 1003
149 294
581 722
187 285
270 368
184 283
211 301
287 384
828 1003
211 301
998 1213
4123 4746
630 750
669 817
283 384
149 295
149 296
149 295
1072 1286
479 587
263 361
4815 5518
149 295
149 295
265 366
149 295
1048 1236
149 295
2676 2969
2135 2067
1108 1319
517 661
149 294
671 838
149 297
149 297
265 366
149 295
479 587
1581 1789
225 324


essen commented Nov 17, 2023

Yeah small. So it's possible the higher default is causing trouble for the server. Try setting the http2_opts for initial_connection_window_size and initial_stream_window_size to their default values (65535). Or if you don't want to upgrade to test, try changing the default in 1.3.0 to 8000000 and see if that makes things worse.

@Vkutovoy92 (Author)

And another
received_bytes sent_bytes
558 2193
867 2983
343 1959
872 2997
868 3152
379 548
868 3188
869 3244
195 718
417 2354
493 683
676 2415
208 270
498 2838
871 3292
522 2471
634 2351
204 503
645 1326
343 1798
557 2113
923 3681
340 1797
198 709
177 272
197 722
618 2760
535 2502
494 684
642 2375
372 914
496 686
372 546
375 893
197 783
198 703
509 2728
198 718
721 3105
721 3033
557 2135
536 2729
535 2744

@Vkutovoy92 (Author)

> Yeah small. So it's possible the higher default is causing trouble for the server. Try setting the http2_opts for initial_connection_window_size and initial_stream_window_size to their default values (65535). Or if you don't want to upgrade to test, try changing the default in 1.3.0 to 8000000 and see if that makes things worse.

But I don't see initial_stream_window_size or initial_connection_window_size in the Gun options:

-type opts() :: #{
	connect_timeout => timeout(),
	http_opts       => http_opts(),
	http2_opts      => http2_opts(),
	protocols       => [http | http2],
	retry           => non_neg_integer(),
	retry_timeout   => pos_integer(),
	trace           => boolean(),
	transport       => tcp | tls | ssl,
	transport_opts  => [gen_tcp:connect_option()] | [ssl:connect_option()],
	ws_opts         => ws_opts()
}.

and the http2_opts type is just:

-type http2_opts() :: #{
	keepalive => timeout()
}.


essen commented Nov 17, 2023

Right, it's not available in 1.3, sorry. I guess the only way to properly test is to upgrade to 2.0 and set the option there to 65535.

#{ http2_opts => #{ initial_connection_window_size => 65535, initial_stream_window_size => 65535 }}


Vkutovoy92 commented Nov 17, 2023

> Right, it's not available in 1.3, sorry. I guess the only way to properly test is to upgrade to 2.0 and set the option there to 65535.

> #{ http2_opts => #{ initial_connection_window_size => 65535, initial_stream_window_size => 65535 }}

Thanks! I'll try it later!
Do I have to add any additional options to

#{
    http2_opts => #{
        keepalive => 60 * 1000,
        initial_connection_window_size => 65535,
        initial_stream_window_size => 65535
    },
    connect_timeout => 5000,
    retry => 100,
    retry_timeout => 1000 % 1s
},

besides the ones you gave?


essen commented Nov 17, 2023

Go with that for now and let's see.


Vkutovoy92 commented Nov 17, 2023

> initial_connection_window_size => 65535, initial_stream_window_size => 65535

It really works! Much better with these options!

First, 1.3.0:
[Screenshot 2023-11-17 at 19:34:47]

After, 2.0.1:
[Screenshot 2023-11-17 at 19:34:54]


essen commented Nov 17, 2023

Glad to hear. I didn't expect that increasing the default value would have such a negative impact; I will need to do further experiments and perhaps change either the value or the related algorithm to better handle both cases (small bodies and large bodies). Might be worth looking at what browsers are doing too.

@Vkutovoy92 (Author)

> Glad to hear. I didn't expect that increasing the default value would have such a negative impact; I will need to do further experiments and perhaps change either the value or the related algorithm to better handle both cases (small bodies and large bodies). Might be worth looking at what browsers are doing too.

Another example:

1.3.0
[Screenshot 2023-11-17 at 20:00:31]

2.0.1 default
[Screenshot 2023-11-17 at 20:00:36]

2.0.1 + extra options
[Screenshot 2023-11-17 at 20:00:41]

So you can see that 2.0.1 without the extra options really degrades response time.

@Vkutovoy92 (Author)

Maybe you can advise how to choose these values correctly? What's the rule?

@Vkutovoy92 (Author)

I have a service that works correctly with the default Gun settings; I'll give you the body sizes on Monday. It's really interesting: no degradation at all.


essen commented Nov 17, 2023

> Maybe you can advise how to choose these values correctly? What's the rule?

It's just a control for how much memory you are willing to use, with the downside that the lower the value (and the memory usage), the lower the performance. Gun defaults those values to 8MB to favor performance with large bodies. But in Gun, setting this value to 8MB has no effect on its own: no buffer gets allocated immediately, it just means there can be roughly 8MB in transit at once between your application and the server.

Clearly the server you are connected to doesn't seem to like that though. Maybe on your service that runs well you are connected to a different server. Or perhaps it's an issue related to shared resources or bandwidth. Figuring out the real cause is likely to be difficult.
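A rough back-of-the-envelope using the body sizes posted above (an illustration only, assuming a typical response of about 1 KB):

%% How many ~1 KB responses fit in flight at once for each window size.
TypicalBody = 1000,
[Window div TypicalBody || Window <- [65535, 8000000]].
%% => [65, 8000]

Even the RFC default of 65535 leaves plenty of headroom for this workload, which is consistent with the improvement seen after lowering the windows.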

@Vkutovoy92 (Author)

> Maybe you can advise how to choose these values correctly? What's the rule?

> It's just a control for how much memory you are willing to use, with the downside that the lower the value (and the memory usage), the lower the performance. Gun defaults those values to 8MB to favor performance with large bodies. But in Gun, setting this value to 8MB has no effect on its own: no buffer gets allocated immediately, it just means there can be roughly 8MB in transit at once between your application and the server.

> Clearly the server you are connected to doesn't seem to like that though. Maybe on your service that runs well you are connected to a different server. Or perhaps it's an issue related to shared resources or bandwidth. Figuring out the real cause is likely to be difficult.

So:
initial_connection_window_size - is it the maximum amount of body data across all streams at once?

initial_stream_window_size - is it the maximum for a single stream?


essen commented Nov 17, 2023

Best I redirect you to the spec, see https://datatracker.ietf.org/doc/html/rfc7540#section-6.9.1 for full details about flow control. The two options are what Gun will set for the initial values for the flow control window. Then Gun has an algorithm that ensures there's always some space in the window, see cow_http2_machine:ensure_window for the implementation.

@Vkutovoy92 (Author)

Morning :) Now I'm getting a new error:

{badmap,{'EXIT',{{badmatch,{error,{stream_error,{closed,{error,closed}}}}}

And the stacktrace points to the line with
{ok, Body} = gun:await_body(ConnPid, StreamRef, ?TIMEOUT),

Maybe some additional options are needed?


essen commented Nov 18, 2023

That just indicates the server closed a connection. Please open a separate ticket with the stacktrace.


dubrovine commented Nov 18, 2023

> That just indicates the server closed a connection. Please open a separate ticket with the stacktrace.

It's already open:
#291

Actually, that's why I switched back to Gun 1.3.0.

@RoadRunnr

Have you tried using tcp_opts => [{nodelay, true}] in gun:opts()?

The way the HTTP/2 handler is implemented in Gun means that Gun will use two separate gen_tcp:send calls for every HTTP/2 request, one send for the header frame and one for the data frame. This leads to two TCP fragments being sent, and that pattern triggers a bad interaction between the TCP Nagle algorithm and TCP delayed ACK.

The observable effect is typically a 40ms delay in the request.
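A sketch of what that suggestion would look like in the pool config from the first comment (assuming Gun 2.x's tcp_opts is passed through to gen_tcp):

Opts = #{
    connect_timeout => 5000,
    retry => 100,
    retry_timeout => 1000, % 1s
    tcp_opts => [
        %% Disable Nagle so the HEADERS and DATA frames are not held back
        %% waiting for the peer's delayed ACK.
        {nodelay, true}
    ]
},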

@Vkutovoy92 (Author)

> Have you tried using tcp_opts => [{nodelay, true}] in gun:opts()?

> The way the HTTP/2 handler is implemented in Gun means that Gun will use two separate gen_tcp:send calls for every HTTP/2 request, one send for the header frame and one for the data frame. This leads to two TCP fragments being sent, and that pattern triggers a bad interaction between the TCP Nagle algorithm and TCP delayed ACK.

> The observable effect is typically a 40ms delay in the request.

No, I haven't. I used initial_connection_window_size/initial_stream_window_size and it helped reduce the response time.
