
Replace pool with bytes in readLoop #2758

Merged
merged 1 commit into master from remove_pool on Apr 24, 2024

Conversation

cnderrauber
Member

Replace pool with bytes in readLoop

Description

Reference issue

Fixes #...


codecov bot commented Apr 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.60%. Comparing base (a9e88d2) to head (f8b3e6a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2758      +/-   ##
==========================================
- Coverage   78.63%   78.60%   -0.04%     
==========================================
  Files          87       87              
  Lines        8202     8198       -4     
==========================================
- Hits         6450     6444       -6     
- Misses       1279     1281       +2     
  Partials      473      473              
Flag   Coverage            Δ
go     80.18% <100.00%>    (-0.04%) ⬇️
wasm   64.41% <ø>          (ø)

Flags with carried forward coverage won't be shown.


Contributor

@boks1971 left a comment


lgtm!

@cnderrauber cnderrauber merged commit d851a44 into master Apr 24, 2024
18 checks passed
@cnderrauber cnderrauber deleted the remove_pool branch April 24, 2024 09:02
@Sean-Der
Member

@cnderrauber Would you mind explaining the reasoning for the change?

Does this reduce allocation for a single data channel? What is the impact if the user has many?

The original change was driven by data. It would be nice to match that!

@cnderrauber
Member Author

cnderrauber commented Apr 25, 2024

@Sean-Der It does not revert 159ba5a; it is a simplification. The original behavior allocated a 65KB buffer on every read call, then @bshi changed it to take a buffer from a pool on every loop iteration. But we don't need a pool here: since readLoop always needs a byte buffer, we can just allocate it once at the start and release it at the end.
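
For illustration, a minimal sketch of the single-allocation approach described above (simplified, hypothetical names; not the actual pion/webrtc code):

```go
// Illustrative sketch of the single-allocation readLoop described above
// (simplified, hypothetical names; not the actual pion/webrtc code).
package example

import "io"

const maxMessageSize = 65535

func readLoop(r io.Reader, onMessage func([]byte)) {
	// Allocate the 64KB receive buffer once per data channel and reuse it
	// for the channel's whole lifetime, instead of taking a buffer from a
	// pool on every iteration.
	buffer := make([]byte, maxMessageSize)
	for {
		n, err := r.Read(buffer)
		if err != nil {
			return
		}
		// Copy the payload out, since buffer is overwritten by the next read.
		msg := make([]byte, n)
		copy(msg, buffer[:n])
		onMessage(msg)
	}
}
```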

@cnderrauber
Member Author

goos: darwin
goarch: arm64
pkg: github.com/pion/webrtc/v4
                     │    old.txt    │                  new.txt                  │
                     │    sec/op     │     sec/op      vs base                   │
DataChannelSend2-10     1.860µ ± ∞ ¹   1.837µ ±   ∞ ¹        ~ (p=1.000 n=1+3) ²
DataChannelSend4-10     3.882µ ± ∞ ¹   2.175µ ±   ∞ ¹        ~ (p=0.400 n=3+2) ²
DataChannelSend8-10     2.797µ ± ∞ ¹   3.271µ ±   ∞ ¹        ~ (p=1.000 n=1)   ²
DataChannelSend16-10   13.133µ ± ∞ ¹   5.842µ ±   ∞ ¹        ~ (p=0.333 n=2)   ²
DataChannelSend32-10     1.119 ± ∞ ¹    1.030 ± 19%          ~ (p=0.683 n=4+8)
geomean                 49.49µ         37.94µ          -23.33%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                     │    old.txt    │                  new.txt                  │
                     │     B/op      │      B/op        vs base                  │
DataChannelSend2-10    1.274Ki ± ∞ ¹   1.243Ki ±   ∞ ¹  -2.45% (n=1+3)
DataChannelSend4-10    1.305Ki ± ∞ ¹   1.252Ki ±   ∞ ¹       ~ (p=0.200 n=3+2) ²
DataChannelSend8-10    1.284Ki ± ∞ ¹   1.256Ki ±   ∞ ¹       ~ (p=1.000 n=1)   ²
DataChannelSend16-10   1.303Ki ± ∞ ¹   1.318Ki ±   ∞ ¹       ~ (p=1.000 n=2)   ²
DataChannelSend32-10   6.831Mi ± ∞ ¹   7.465Mi ± 22%         ~ (p=0.683 n=4+8)
geomean                7.208Ki         7.226Ki          +0.25%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                     │   old.txt    │                 new.txt                  │
                     │  allocs/op   │   allocs/op     vs base                  │
DataChannelSend2-10     33.00 ± ∞ ¹    32.00 ±   ∞ ¹       ~ (p=1.500 n=1+3) ²
DataChannelSend4-10     33.00 ± ∞ ¹    32.00 ±   ∞ ¹  -3.03% (n=3+2)
DataChannelSend8-10     33.00 ± ∞ ¹    32.00 ±   ∞ ¹       ~ (p=1.000 n=1)   ²
DataChannelSend16-10    24.50 ± ∞ ¹    32.00 ±   ∞ ¹       ~ (p=0.333 n=2)   ²
DataChannelSend32-10   87.43k ± ∞ ¹   73.69k ± 68%         ~ (p=0.808 n=4+8)
geomean                 150.4          150.5          +0.08%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
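
For readers unfamiliar with the output format: the tables above are benchstat comparisons of old.txt and new.txt, and the -10 suffix on the benchmark names is GOMAXPROCS. A hypothetical skeleton of the kind of benchmark that produces such columns (not the actual pion/webrtc benchmark) looks like:

```go
// Hypothetical benchmark skeleton (not the actual pion/webrtc benchmark);
// it only shows how sec/op, B/op and allocs/op columns like the ones above
// are produced. Run with `go test -bench . -benchmem -count N` and compare
// the two outputs with `benchstat old.txt new.txt`.
package example

import "testing"

// send is a stand-in for writing one message over a data channel.
func send(p []byte) { _ = p }

func BenchmarkDataChannelSend(b *testing.B) {
	b.ReportAllocs() // emit B/op and allocs/op alongside sec/op
	payload := make([]byte, 1024)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			send(payload)
		}
	})
}
```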

@bshi

bshi commented Apr 26, 2024

But we don't need a pool here: since readLoop always needs a byte buffer, we can just allocate it once at the start and release it at the end.

The reason that the pool is required is that, under high concurrency and GC pressure, memory allocation explodes and performance is crippled.

From the benchmarks in 159ba5a: note how allocs/op stays constant as parallelism increases (and goes to 0 with the pooled approach). Note also how time/op stays relatively constant because GC pressure has been alleviated.

[benchmark chart from 159ba5a showing allocs/op and time/op as parallelism increases]
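
For illustration, a minimal sketch of the pooled pattern described here (hypothetical names, not the actual code from 159ba5a): each iteration borrows a buffer from a sync.Pool and returns it, so steady-state allocations stay roughly flat regardless of how many readLoops run concurrently.

```go
// Hypothetical sketch of the pooled approach (simplified; not the actual
// code from 159ba5a). Concurrent readLoops share buffers through the pool
// instead of each holding or allocating its own.
package example

import (
	"io"
	"sync"
)

const maxMessageSize = 65535

var bufferPool = sync.Pool{
	New: func() any { return make([]byte, maxMessageSize) },
}

func readLoopPooled(r io.Reader, onMessage func([]byte)) {
	for {
		// Borrow a 64KB buffer for this iteration only.
		buffer := bufferPool.Get().([]byte)
		n, err := r.Read(buffer)
		if err != nil {
			bufferPool.Put(buffer)
			return
		}
		// Copy the payload out before returning the buffer to the pool.
		msg := make([]byte, n)
		copy(msg, buffer[:n])
		bufferPool.Put(buffer)
		onMessage(msg)
	}
}
```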

Two things to note about the most recent benchmark numbers in the previous comment:

  • The number of measurements needs to be higher to have more confidence in them (-count).
  • The more interesting thing is that the old measurements already show signs of GC pressure. At some point, other changes in and around this code (not just this one) have introduced performance regressions. It's probably worth repeating the profiling exercise from Reduce DataChannel.readLoop allocations #1516 at this point.
                     │    old.txt    │                  new.txt                  │
                     │    sec/op     │     sec/op      vs base                   │
DataChannelSend2-10     1.860µ ± ∞ ¹   1.837µ ±   ∞ ¹        ~ (p=1.000 n=1+3) ²
DataChannelSend4-10     3.882µ ± ∞ ¹   2.175µ ±   ∞ ¹        ~ (p=0.400 n=3+2) ²
DataChannelSend8-10     2.797µ ± ∞ ¹   3.271µ ±   ∞ ¹        ~ (p=1.000 n=1)   ²
DataChannelSend16-10   13.133µ ± ∞ ¹   5.842µ ±   ∞ ¹        ~ (p=0.333 n=2)   ²
DataChannelSend32-10     1.119 ± ∞ ¹    1.030 ± 19%          ~ (p=0.683 n=4+8) <-- 1 second per operation (!!!)
geomean                 49.49µ         37.94µ          -23.33%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

I would recommend rolling back this change and profiling to figure out where this regression came from (or perhaps just bisect and benchmark for a quick search). Instead of eliminating pooling, I suspect the team needs to increase the use of pools.

CC @Sean-Der

@cnderrauber
Member Author

This is different from the code before #1516. The problem with the original code was that it allocated a 65KB buffer on every read, which caused the GC pressure. This change allocates the buffer only once per data channel readLoop and reuses it for the entire channel lifecycle. I don't think data channel creation is a high-frequency operation that needs a pool.
