
Replace pool with bytes in readLoop #2758

Merged
merged 1 commit into master from remove_pool on Apr 24, 2024

Conversation

cnderrauber
Member

Replace pool with bytes in readLoop

Description

Reference issue

Fixes #...


codecov bot commented Apr 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.60%. Comparing base (a9e88d2) to head (f8b3e6a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2758      +/-   ##
==========================================
- Coverage   78.63%   78.60%   -0.04%     
==========================================
  Files          87       87              
  Lines        8202     8198       -4     
==========================================
- Hits         6450     6444       -6     
- Misses       1279     1281       +2     
  Partials      473      473              
Flag   Coverage            Δ
go     80.18% <100.00%>    (-0.04%) ⬇️
wasm   64.41% <ø>          (ø)

Flags with carried forward coverage won't be shown.


Contributor

@boks1971 left a comment


lgtm!

@cnderrauber cnderrauber merged commit d851a44 into master Apr 24, 2024
18 checks passed
@cnderrauber cnderrauber deleted the remove_pool branch April 24, 2024 09:02
@Sean-Der
Member

@cnderrauber Would you mind explaining the reasoning for the change?

Does this reduce allocation for a single data channel? What is the impact if the user has many?

The original change was driven by data. It would be nice to match that!

@cnderrauber
Member Author

cnderrauber commented Apr 25, 2024

@Sean-Der It does not revert 159ba5a; it is a simplification. The original behavior allocated a 65KB buffer on every read call, then @bshi changed it to take a buffer from a pool on every loop iteration. But we don't need a pool here: since readLoop always needs a byte buffer, we can just allocate it once at the start and release it at the end.
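
For illustration, a minimal sketch of the single-allocation approach described above (simplified, hypothetical names; not the actual pion/webrtc code):

```go
// Illustrative sketch of the single-allocation readLoop described above
// (simplified, hypothetical names; not the actual pion/webrtc code).
package example

import "io"

const maxMessageSize = 65535

func readLoop(r io.Reader, onMessage func([]byte)) {
	// Allocate the 64KB receive buffer once per data channel and reuse it
	// for the channel's whole lifetime, instead of taking a buffer from a
	// pool on every iteration.
	buffer := make([]byte, maxMessageSize)
	for {
		n, err := r.Read(buffer)
		if err != nil {
			return
		}
		// Copy the payload out, since buffer is overwritten by the next read.
		msg := make([]byte, n)
		copy(msg, buffer[:n])
		onMessage(msg)
	}
}
```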

@cnderrauber
Member Author

goos: darwin
goarch: arm64
pkg: github.com/pion/webrtc/v4
                     │    old.txt    │                  new.txt                  │
                     │    sec/op     │     sec/op      vs base                   │
DataChannelSend2-10     1.860µ ± ∞ ¹   1.837µ ±   ∞ ¹        ~ (p=1.000 n=1+3) ²
DataChannelSend4-10     3.882µ ± ∞ ¹   2.175µ ±   ∞ ¹        ~ (p=0.400 n=3+2) ²
DataChannelSend8-10     2.797µ ± ∞ ¹   3.271µ ±   ∞ ¹        ~ (p=1.000 n=1)   ²
DataChannelSend16-10   13.133µ ± ∞ ¹   5.842µ ±   ∞ ¹        ~ (p=0.333 n=2)   ²
DataChannelSend32-10     1.119 ± ∞ ¹    1.030 ± 19%          ~ (p=0.683 n=4+8)
geomean                 49.49µ         37.94µ          -23.33%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                     │    old.txt    │                  new.txt                  │
                     │     B/op      │      B/op        vs base                  │
DataChannelSend2-10    1.274Ki ± ∞ ¹   1.243Ki ±   ∞ ¹  -2.45% (n=1+3)
DataChannelSend4-10    1.305Ki ± ∞ ¹   1.252Ki ±   ∞ ¹       ~ (p=0.200 n=3+2) ²
DataChannelSend8-10    1.284Ki ± ∞ ¹   1.256Ki ±   ∞ ¹       ~ (p=1.000 n=1)   ²
DataChannelSend16-10   1.303Ki ± ∞ ¹   1.318Ki ±   ∞ ¹       ~ (p=1.000 n=2)   ²
DataChannelSend32-10   6.831Mi ± ∞ ¹   7.465Mi ± 22%         ~ (p=0.683 n=4+8)
geomean                7.208Ki         7.226Ki          +0.25%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                     │   old.txt    │                 new.txt                  │
                     │  allocs/op   │   allocs/op     vs base                  │
DataChannelSend2-10     33.00 ± ∞ ¹    32.00 ±   ∞ ¹       ~ (p=1.500 n=1+3) ²
DataChannelSend4-10     33.00 ± ∞ ¹    32.00 ±   ∞ ¹  -3.03% (n=3+2)
DataChannelSend8-10     33.00 ± ∞ ¹    32.00 ±   ∞ ¹       ~ (p=1.000 n=1)   ²
DataChannelSend16-10    24.50 ± ∞ ¹    32.00 ±   ∞ ¹       ~ (p=0.333 n=2)   ²
DataChannelSend32-10   87.43k ± ∞ ¹   73.69k ± 68%         ~ (p=0.808 n=4+8)
geomean                 150.4          150.5          +0.08%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
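
For readers unfamiliar with the output format: the tables above are benchstat comparisons of old.txt and new.txt, and the -10 suffix on the benchmark names is GOMAXPROCS. A hypothetical skeleton of the kind of benchmark that produces such columns (not the actual pion/webrtc benchmark) looks like:

```go
// Hypothetical benchmark skeleton (not the actual pion/webrtc benchmark);
// it only shows how sec/op, B/op and allocs/op columns like the ones above
// are produced. Run with `go test -bench . -benchmem -count N` and compare
// the two outputs with `benchstat old.txt new.txt`.
package example

import "testing"

// send is a stand-in for writing one message over a data channel.
func send(p []byte) { _ = p }

func BenchmarkDataChannelSend(b *testing.B) {
	b.ReportAllocs() // emit B/op and allocs/op alongside sec/op
	payload := make([]byte, 1024)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			send(payload)
		}
	})
}
```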

@bshi

bshi commented Apr 26, 2024

But we don't need a pool here: since readLoop always needs a byte buffer, we can just allocate it once at the start and release it at the end.

The reason that the pool is required is that, under high concurrency and GC pressure, memory allocation explodes and performance is crippled.

From the benchmarks in 159ba5a: note how allocs/op stays constant as parallelism increases (and goes to 0 with the pooled approach). Note also how time/op stays relatively constant because GC pressure has been alleviated.

[benchmark chart from 159ba5a showing allocs/op and time/op as parallelism increases]
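
For illustration, a minimal sketch of the pooled pattern described here (hypothetical names, not the actual code from 159ba5a): each iteration borrows a buffer from a sync.Pool and returns it, so steady-state allocations stay roughly flat regardless of how many readLoops run concurrently.

```go
// Hypothetical sketch of the pooled approach (simplified; not the actual
// code from 159ba5a). Concurrent readLoops share buffers through the pool
// instead of each holding or allocating its own.
package example

import (
	"io"
	"sync"
)

const maxMessageSize = 65535

var bufferPool = sync.Pool{
	New: func() any { return make([]byte, maxMessageSize) },
}

func readLoopPooled(r io.Reader, onMessage func([]byte)) {
	for {
		// Borrow a 64KB buffer for this iteration only.
		buffer := bufferPool.Get().([]byte)
		n, err := r.Read(buffer)
		if err != nil {
			bufferPool.Put(buffer)
			return
		}
		// Copy the payload out before returning the buffer to the pool.
		msg := make([]byte, n)
		copy(msg, buffer[:n])
		bufferPool.Put(buffer)
		onMessage(msg)
	}
}
```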

Two things to note about the most recent benchmark numbers in the previous comment:

  • The number of measurements needs to be higher to have more confidence in them (-count).
  • The more interesting thing is that the old measurements already show signs of GC pressure. At some point, other changes in and around this code (not just this one) have introduced performance regressions. It's probably worth repeating the profiling exercise from Reduce DataChannel.readLoop allocations #1516 at this point.
                     │    old.txt    │                  new.txt                  │
                     │    sec/op     │     sec/op      vs base                   │
DataChannelSend2-10     1.860µ ± ∞ ¹   1.837µ ±   ∞ ¹        ~ (p=1.000 n=1+3) ²
DataChannelSend4-10     3.882µ ± ∞ ¹   2.175µ ±   ∞ ¹        ~ (p=0.400 n=3+2) ²
DataChannelSend8-10     2.797µ ± ∞ ¹   3.271µ ±   ∞ ¹        ~ (p=1.000 n=1)   ²
DataChannelSend16-10   13.133µ ± ∞ ¹   5.842µ ±   ∞ ¹        ~ (p=0.333 n=2)   ²
DataChannelSend32-10     1.119 ± ∞ ¹    1.030 ± 19%          ~ (p=0.683 n=4+8) <-- 1 second per operation (!!!)
geomean                 49.49µ         37.94µ          -23.33%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

I would recommend rolling back this change and profiling to figure out where this regression came from (or perhaps just bisect and benchmark for a quick search). Instead of eliminating pooling, I suspect the team needs to increase the use of pools.

CC @Sean-Der

@cnderrauber
Member Author

This is different from the code before #1516. The problem with the original code was that it allocated a 65KB buffer on every read, which caused the GC pressure. This change allocates the buffer only once per data channel readLoop and reuses it for the entire channel lifecycle. I don't think data channel creation is a high-frequency operation that needs a pool.
