Data channels quickly slow down to a crawl under pressure #1088

Open
lostmsu opened this issue Mar 20, 2024 · 3 comments
Comments

lostmsu (Contributor) commented Mar 20, 2024

Please see #1087 for the test project.

P.S. I think a unit test for transmitting a few GiB of data should be put in place.

lostmsu (Contributor, Author) commented Mar 20, 2024

OK, I did some midnight debugging, and I believe the main cause is an incorrect reset of _congestionWindow in both places where it is set to _defaultMTU.

In the bit that refers to the T3-rtx timer, I believe this line above it is the cause:

.Where(x => now.Subtract(x.LastSentAt).TotalMilliseconds > (_hasRoundTripTime ? _rto : _rtoInitialMilliseconds))

Here the T3-rtx timer is considered expired once RTO milliseconds have passed since the chunk was last sent. I believe this is wrong because, from my understanding of the RFC, the T3-rtx timer is not a property of the chunk but of the destination. I might be mistaken here; maybe LastSentAt is a clever trick that achieves exactly that, but it is not immediately clear from the code.
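To make the distinction concrete, here is a minimal sketch (purely illustrative, not SIPSorcery's code; the class and field names are mine and only loosely mirror the identifiers in this issue) contrasting the per-chunk check in the quoted line with a single per-destination T3-rtx timer as I read RFC 9260 section 6.3:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch only.
class SctpChunkSketch
{
    public uint TSN;
    public DateTime LastSentAt;
}

class T3RtxSketch
{
    private readonly Dictionary<uint, SctpChunkSketch> _unconfirmedChunks = new Dictionary<uint, SctpChunkSketch>();
    private double _rto = 1000; // RTO in milliseconds (RFC 9260 RTO.Initial is 1 second).

    // Per-chunk check, in the spirit of the quoted Where(...) clause: any chunk
    // whose own LastSentAt is older than RTO is treated as timed out, so a burst
    // of outstanding chunks can keep triggering the timeout path.
    public IEnumerable<SctpChunkSketch> ChunksExpiredPerChunk(DateTime now) =>
        _unconfirmedChunks.Values
            .Where(c => now.Subtract(c.LastSentAt).TotalMilliseconds > _rto);

    // Per-destination timer, as I read RFC 9260 section 6.3: one T3-rtx timer
    // runs for the destination. It is started when data is sent while it is not
    // running, restarted when the earliest outstanding TSN advances, and stopped
    // when nothing is outstanding. Only when this single timer expires is the
    // timeout path (retransmit + cwnd collapse) taken, and only once.
    private DateTime? _t3RtxStartedAt; // null => timer not running

    public bool T3RtxExpired(DateTime now) =>
        _t3RtxStartedAt.HasValue &&
        now.Subtract(_t3RtxStartedAt.Value).TotalMilliseconds > _rto;
}
```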

The second place that enters retransmit mode, I believe, simply uses the wrong formula from the RFC. It should use cwnd = ssthresh instead, i.e. the line should be _congestionWindow = _slowStartThreshold.
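For reference, a rough sketch of the two different adjustments in RFC 9260 (the class, methods, and initial values here are hypothetical; only the field names are taken from this issue):

```csharp
using System;

// Illustrative only -- not the library's actual methods.
class CwndAdjustmentSketch
{
    private uint _defaultMTU = 1228;            // example value only
    private uint _congestionWindow = 4 * 1228;  // RFC 9260 initial cwnd is min(4*MTU, max(2*MTU, 4404))
    private uint _slowStartThreshold = uint.MaxValue;

    // RFC 9260 section 7.2.3: when the T3-rtx timer expires,
    // ssthresh = max(cwnd / 2, 4 * MTU) and cwnd = 1 * MTU.
    public void OnT3RtxExpired()
    {
        _slowStartThreshold = Math.Max(_congestionWindow / 2, 4 * _defaultMTU);
        _congestionWindow = _defaultMTU;
    }

    // RFC 9260 section 7.2.4: when entering fast recovery,
    // ssthresh = max(cwnd / 2, 4 * MTU) and cwnd = ssthresh,
    // i.e. _congestionWindow = _slowStartThreshold, not _defaultMTU.
    public void OnEnterFastRecovery()
    {
        _slowStartThreshold = Math.Max(_congestionWindow / 2, 4 * _defaultMTU);
        _congestionWindow = _slowStartThreshold;
    }
}
```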

Additionally, I believe that in CalculateCongestionWindow all comparisons against _congestionWindow must be replaced with "or equal" comparisons, again from my reading of RFC 9260. For example, section 7.2.1 says:

MUST use the slow-start algorithm to increase cwnd only if the current congestion window is being fully utilized

I interpret this as outstandingBytes >= _congestionWindow while the code has outstandingBytes > _congestionWindow.
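A hypothetical sketch of how that comparison would read (this is not the actual CalculateCongestionWindow, which has more state and branches; the fields and constants are mine):

```csharp
using System;

// Illustrative only: isolates the ">=" comparison being discussed.
class SlowStartSketch
{
    private uint _defaultMTU = 1228;            // example value only
    private uint _congestionWindow = 4 * 1228;
    private uint _slowStartThreshold = uint.MaxValue;

    public uint CalculateCongestionWindow(uint outstandingBytes, uint ackedBytes)
    {
        if (_congestionWindow < _slowStartThreshold)
        {
            // Slow start (RFC 9260 7.2.1): only grow cwnd when the window is
            // being fully utilized, hence ">=" rather than ">".
            if (outstandingBytes >= _congestionWindow)
            {
                _congestionWindow += Math.Min(ackedBytes, _defaultMTU);
            }
        }
        else
        {
            // Congestion avoidance (7.2.2): the same "fully utilized" condition
            // gates the increase; simplified here to one MTU per call instead of
            // tracking partial_bytes_acked.
            if (outstandingBytes >= _congestionWindow)
            {
                _congestionWindow += _defaultMTU;
            }
        }
        return _congestionWindow;
    }
}
```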

lostmsu added a commit to BorgGames/sipsorcery that referenced this issue Mar 20, 2024

some of the fixes are probably legit, but I believe the T3-rtx timer is not implemented correctly

kinda works around sipsorcery-org#1088

lostmsu added a commit to BorgGames/sipsorcery that referenced this issue Mar 20, 2024

some of the fixes are probably legit, but I believe the T3-rtx timer is not implemented correctly

kinda works around sipsorcery-org#1088

lostmsu added a commit to BorgGames/sipsorcery that referenced this issue Mar 21, 2024
ris-work commented

@lostmsu, thank you for finding this out. I would like to apply the patches to the version I am currently using. GitHub says the commits don't belong to any branch. Are the patches under the same license as SIPSorcery?

I also noticed that on a not-really-bad server, with dd if=/dev/zero bs=1k piped to it, the throughput is about 768 kB/s (kilobytes) and the CPU usage is not bad, but okay. I have no idea how to monkey-patch C# projects locally, so it might take some time. Rust gives me around 2-3 MB/s, but due to some issues I had to move from webrtc-rs with TURN relay candidates via coturn to SIPSorcery. I could not pinpoint the details, but node-wrtc works well under the same circumstances, though it is unmaintained, at about 1.3 Mb/s (which is pretty okay for a language like JS).

lostmsu (Contributor, Author) commented Mar 28, 2024

@ris-work same license

If you want to experiment, ignore the first commit; I edited it and force-pushed it as the second commit. You would need to get that branch and cherry-pick the commits from it (the branch has many breaking changes of varying stability).
