Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websockets: Count pings from server as activity for idle_timeout #2718

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

xim
Copy link

@xim xim commented Aug 11, 2023

If the stream is receiving control packets like ping, don't count it as idle. This means you can enable timeout_opt.keep_alive_ping on only one side to get heartbeat.

Addresses issue #2716

@codecov
Copy link

codecov bot commented Aug 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e42cf8a) 93.03% compared to head (c8ffce2) 93.03%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2718      +/-   ##
===========================================
- Coverage    93.03%   93.03%   -0.01%     
===========================================
  Files          178      178              
  Lines        13687    13684       -3     
===========================================
- Hits         12734    12731       -3     
  Misses         953      953              
Files Coverage Δ
include/boost/beast/websocket/impl/stream_impl.hpp 96.16% <100.00%> (-0.03%) ⬇️
include/boost/beast/websocket/stream_base.hpp 26.31% <ø> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e42cf8a...c8ffce2. Read the comment docs.

@klemens-morgenstern
Copy link
Collaborator

Can't this be configured already?

beast::websocket::stream_base::timeout to;
my_ws.get_option(to);
to.keep_alive_pings = true;
my_ws.set_option(to);

@xim
Copy link
Author

xim commented Aug 11, 2023

This PR changes the behavior when keep_alive_ping is false.

Without the change: receiving control frames does not count as "activity"

With the change: receiving control frames counts as activity

I could make that behaviour change another config member of the timeout struct if wanted.

@klemens-morgenstern
Copy link
Collaborator

I understand that - but why? Isn't setting the option achieving the same thing? What's the difference?

@xim
Copy link
Author

xim commented Aug 11, 2023

Ah. The setting makes you send ping frames at an even interval. This change counts incoming control frames as activity regardless of whether or not you are sending pings yourself.

@klemens-morgenstern
Copy link
Collaborator

Receiving control frames is not activity.

So we would need another option to allow the pings to be counted as such. This all seems off to me, why don't you just send pings or increase the timeout?

@xim
Copy link
Author

xim commented Aug 14, 2023

I first reported this because I was confused by the idle_timeout / keep_alive_pings config options; Since a pong is activity when you've sent a ping, shouldn't a ping be activity as well?

If not, timely disconnect detection needs ping and pong to be sent in both directions, doubling the number of packets sent/received for no additional benefit.

Not a big issue, but I feel counting ping as activity is the more intuitive behaviour. Adding this as a bool incoming_ping_keepalives = false option also makes sense if keeping the old behaviour as default is a priority.

@xim
Copy link
Author

xim commented Aug 30, 2023

Updated the PR with tests, @ashtum

I feel using a second wall clock time may be excessive, but seems to be somewhat in line with what other tests have done

@xim
Copy link
Author

xim commented Sep 1, 2023

Can/should I add/change documentation somehow? Is it sufficient that I update the inline documentation in stream_base.hpp?

@ashtum
Copy link
Collaborator

ashtum commented Sep 1, 2023

Yes, please update the docs here as well (if there is anything related).

@ashtum
Copy link
Collaborator

ashtum commented Sep 16, 2023

@xim Anything I can help with?

@xim
Copy link
Author

xim commented Sep 16, 2023

Sorry, @ashtum, this just disappeared in the Big Pile of Stuff. I'll try to update docs on Monday

@xim
Copy link
Author

xim commented Oct 2, 2023

Apologies for forgetting about the documentation update. I rebased on develop, added documentation and pushed again.

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

1 similar comment
@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@xim
Copy link
Author

xim commented Oct 2, 2023

I'm happy with this now. If you want me to squash all commits, or fixup my fixup, or anything, just give the word =)

@klemens-morgenstern
Copy link
Collaborator

This can be squashed on our side, this looks like a single commit.

From what I can tell @vinniefalco approves the behavioural change, so everything looks good to me.

@xim
Copy link
Author

xim commented Oct 4, 2023

Looks like one of the tests has a timer that's 500ms and expects it to have been triggered after 600ms. I'll make it more permissive.

@xim
Copy link
Author

xim commented Oct 4, 2023

Made that 750ms, effectively bumping the slack from 100ms to 250ms

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@xim
Copy link
Author

xim commented Oct 4, 2023

Apologies for the noise. Tweaking these values is hard.

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

1 similar comment
@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@ashtum
Copy link
Collaborator

ashtum commented Oct 9, 2023

@xim Why did you alter the timing values in the tests? Did you encounter any failures as a result of the tight timings?

@xim
Copy link
Author

xim commented Oct 9, 2023

Yes. I had to increase/change margins many times before all the different CI builders were green at the same time. The original timing worked once across the board, but not on the next run. The current timing should be stable.

I'd love to see these tests run with stubbed time, for stability and execution speed. But don't know if that's readily available.

@xim
Copy link
Author

xim commented Nov 26, 2023

I mean we have been setting idle_counter to zero unconditionally.

Yes, but without keep_alive_pings the value of idle_counter is ignored and we go straight to timeout.

@ashtum
Copy link
Collaborator

ashtum commented Nov 26, 2023

Yes, but without keep_alive_pings the value of idle_counter is ignored and we go straight to timeout.

Yeah, my point was that this might have been an unintentional behavior because we wanted to only send the ping message when keep_alive_pings is set and accidentally ignored the situation when idle_counter is zero and keep_alive_pings is unset.

Besides that, other parts of the code show that we have been counting control frames as activity (by unconditionally setting idle_counter to zero).

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@ashtum
Copy link
Collaborator

ashtum commented Dec 25, 2023

@xim, could you please squash these commits into a single commit?
And it seems there is still a problem in the unit tests for the ping operation: https://github.com/boostorg/beast/actions/runs/7004420416/job/19732998803?pr=2718#step:9:174

@xim
Copy link
Author

xim commented Dec 26, 2023

Yeah, the timing there is apparently night-impossible to get correct in a way that always passes on all platforms. In that one case, a ping hasn't been processed within 400ms. I can increase everything 25% again so it's 500ms. But is anyone has a way of running these tests with stubbed time, that would be preferable...

xim added a commit to xim/beast that referenced this pull request Dec 26, 2023
If the stream is receiving control packets like ping, don't count it as
idle. This means you can enable `timeout_opt.keep_alive_ping` on only
one side to get heartbeat.

Make tests that verify current behaviour.

Update documentation to match changes in PR boostorg#2718.

Addresses issue boostorg#2716
@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@xim
Copy link
Author

xim commented Dec 26, 2023

The failures appear unrelated to my commit

Comment on lines 116 to 131
BEAST_EXPECT(ws.impl_->idle_counter == 0);

test::run_for(ioc_, std::chrono::milliseconds(1250));
// After 1.25s idle, no timeout but idle counter is 1
BEAST_EXPECT(ws.impl_->idle_counter == 1);

es.async_ping();
test::run_for(ioc_, std::chrono::milliseconds(500));
// The server sent a ping at 1.25s mark, and we're now at 1.75s mark.
// We haven't hit the idle timer yet (happens at 1s, 2s, 3s)
BEAST_EXPECT(ws.impl_->idle_counter == 0);
BEAST_EXPECT(!got_timeout);

test::run_for(ioc_, std::chrono::milliseconds(750));
// At 2.5s total; should have triggered the idle timer
BEAST_EXPECT(ws.impl_->idle_counter == 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the tight timing is a consequence of asserting idle_counter in multiple places. There is a good chance that these timings could lead to non-deterministic test results. Therefore, I suggest not checking for idle_counter and instead only verifying the side effect (inactivity timeout doesn't occur when receiving pings).

If the stream is receiving control packets like ping, don't count it as
idle. This means you can enable `timeout_opt.keep_alive_ping` on only
one side to get heartbeat.

Make tests that verify current behaviour.

Update documentation to match changes in PR boostorg#2718.

Addresses issue boostorg#2716
@ashtum
Copy link
Collaborator

ashtum commented Feb 15, 2024

@xim, I've removed the idle_counter assertions in the hope that this will reduce the chance of getting non-deterministic test results. Let's see if we can run CI multiple times with all tests passing.

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@ashtum ashtum force-pushed the develop branch 2 times, most recently from 3491313 to 14a1550 Compare February 15, 2024 11:47
@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://2718.beastdocs.prtest.cppalliance.org/libs/beast/doc/html/index.html

@@ -227,8 +227,6 @@ class stream<NextLayer, deflateSupported>::read_some_op
// Handle ping frame
if(impl.rd_fh.op == detail::opcode::ping)
{
impl.update_timer(this->get_executor());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the timer is necessary; otherwise, it will attempt to ping the other side even after receiving a ping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inactivity_timeout is triggered regardless of having received pings
5 participants