Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in Buffer.concat #45802

Open
wemeetagain opened this issue Dec 9, 2022 · 3 comments
Open

Performance regression in Buffer.concat #45802

wemeetagain opened this issue Dec 9, 2022 · 3 comments
Labels
buffer Issues and PRs related to the buffer subsystem. performance Issues and PRs related to the performance of Node.js.

Comments

@wemeetagain
Copy link

Version

v18.12.1

Platform

Linux ruby 5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

Buffer

What steps will reproduce the bug?

Run this under node v18 vs node v16

const randomBuffer = (length) => Buffer.from(Array.from({length}, () => Math.random() * 255))

const testCases = [
  {bytesPerBuffer: 32, numBuffers: 10},
]

for (const {bytesPerBuffer, numBuffers} of testCases) {
  const results = []
  for (let i = 0; i < 100_000; i++) {
    const buffers = Array.from({length: numBuffers}, () => randomBuffer(bytesPerBuffer))
    const before = performance.now()
    Buffer.concat(buffers)
    const after = performance.now()
    results.push(after - before)
  }
  const avgTime = results.reduce((a, b) => a + b, 0) / results.length
  console.log(`concat ${numBuffers} buffers, ${bytesPerBuffer} each: ${avgTime}s`)
}

How often does it reproduce? Is there a required condition?

Always reproduceable

What is the expected behavior?

Expected behavior is that there is no performance regression.

What do you see instead?

Buffer.concat is ~2.5x slower after node v16 (v17 and beyond)

Additional information

No response

@VoltrexKeyva VoltrexKeyva added buffer Issues and PRs related to the buffer subsystem. performance Issues and PRs related to the performance of Node.js. labels Dec 10, 2022
@RafaelGSS
Copy link
Member

RafaelGSS commented Dec 15, 2022

HI @wemeetagain. I've run it locally and it seems fixed in v19. See below:

v14

concat 10 buffers, 32 each: 0.00033s

v16

concat 10 buffers, 32 each: 0.00044s

v18

concat 10 buffers, 32 each: 0.00114s

v19

concat 10 buffers, 32 each: 0.0004s


Note: I haven't investigated the cause, but, I'd wait for the next v18 release and compare.

@krk
Copy link
Contributor

krk commented Dec 30, 2022

I bisected with the help of nvm, found that 17.2.0 introduced the regression. Then profiling with perf using this modified reproducer:

const randomBuffer = (length) => Buffer.from(Array.from({ length }, () => Math.random() * 255))

const ranBuf = randomBuffer(32);

const results = []
const buffers = Array.from({ length: 10 }, () => ranBuf)

const before = performance.now();
for (let i = 0; i < 10000000; i++) {
    Buffer.concat(buffers)
}
const after = performance.now()
console.log(after - before);

17.1.0 (good)

  28,82%  node     node                 [.] Builtins_TypedArrayPrototypeSet                                                                                                                                       
   6,52%  node     node                 [.] Builtins_CreateTypedArray                                                                                                                                             
   4,11%  node     [JIT] tid 6479       [.] 0x00007f307e848d70                                                                                                                                                    
   3,53%  node     [JIT] tid 6479       [.] 0x00007f307e848d2e                                                                                                                                                    
   2,92%  node     libc-2.26.so         [.] __memmove_avx_unaligned_erms                                                                                                                                          
   2,66%  node     node                 [.] Builtins_LoadGlobalIC                                                                                                                                                 
   1,71%  node     libc-2.26.so         [.] _int_malloc                                                                                                                                                           
   1,54%  node     [JIT] tid 6479       [.] 0x00007f307e848d3d                                                                                                                                                    
   1,52%  node     [JIT] tid 6479       [.] 0x00007f307e848a46                                                                                                                                                    
   1,43%  node     libc-2.26.so         [.] malloc
...
   0,02%  node  node  [.] Builtins_LoadIC                                                                                                                                                                         
   0,08%  node  node  [.] Builtins_Call_ReceiverIsAny                                                                                                                                                             
   0,00%  node  node  [.] Builtins_TypedArrayPrototypeLength                                                                                                                                                     
   0,81%  node  node  [.] Builtins_CallFunction_ReceiverIsAny                                                                                                                                                     

   Builtins_LoadICTrampoline NOT sampled.

17.2.0 (bad)

  32,32%  node     node                 [.] Builtins_LoadIC
  13,60%  node     node                 [.] Builtins_Call_ReceiverIsAny
  12,57%  node     node                 [.] Builtins_TypedArrayPrototypeLength
   9,43%  node     node                 [.] Builtins_CallFunction_ReceiverIsAny
   8,66%  node     node                 [.] Builtins_TypedArrayPrototypeSet
   1,82%  node     node                 [.] Builtins_CreateTypedArray
   1,69%  node     node                 [.] Builtins_LoadICTrampoline
   0,96%  node     [JIT] tid 6491       [.] 0x00007f7ab7048fa0
   0,93%  node     node                 [.] Builtins_LoadGlobalIC
   0,85%  node     libc-2.26.so         [.] __memmove_avx_unaligned_erms
   0,59%  node     libc-2.26.so         [.] _int_malloc
   0,38%  node     libc-2.26.so         [.] malloc

In 17.2.0, Builtins_* dominate the call stacks, pointing to a v8 issue.

Further bisecting with git identifies #40488, which updates v8 from 9.5.172.25 to 9.6.180.14 as the PR when this regression was introduced.

@spollack
Copy link

Question: is the perf fix likely to be picked up for node 18.x? or will we have to wait for 20.x to have the fix in an LTS? Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

No branches or pull requests

5 participants