src: use find instead of char-by-char in FromFilePath() #50288

lemire · 2023-10-19T14:48:14Z

In PR https://github.com/nodejs/node/pull/50253/files, an optimization was proposed for the FromFilePath() function. This function replaces every occurrence of the '%' character with the string '%25'. The expectation is that the strings passed to FromFilePath typically contain no '%', or only a few. This led me to write a blog post with a non-trivial benchmark and realistic data: For processing strings, streams in C++ can be slow. My work suggests that a loop calling find is faster. Furthermore, we can do one better and avoid allocating a temporary std::string in the common case where no '%' is found.
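A minimal sketch of the find-based approach described above, assuming a hypothetical helper named `EscapePercent` (this is not the code that landed in src/node_url.cc). It mirrors the `substr(0, pos + 1)` / `"25"` pattern from the reviewed diff and returns `false` without building anything when the input has no '%', so the caller can keep using the original view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not the actual Node.js code): replaces every '%' in
// `path` with "%25" using repeated find() calls instead of a per-character
// loop. Returns false, leaving `out` untouched, when `path` has no '%', so the
// caller can keep using `path` and no temporary std::string is built.
bool EscapePercent(std::string_view path, std::string& out) {
  std::size_t pos = path.find('%');
  if (pos == std::string_view::npos) {
    return false;  // common case: nothing to escape, no allocation
  }
  out.clear();
  do {
    out += path.substr(0, pos + 1);  // copy the run up to and including '%'
    out += "25";                     // turn that '%' into "%25"
    path.remove_prefix(pos + 1);     // continue right after the '%'
    pos = path.find('%');            // may compile down to a memchr call
  } while (pos != std::string_view::npos);
  out += path;  // copy the tail after the last '%'
  return true;
}
```

For example, `EscapePercent("/a%b/%", out)` returns `true` and leaves `"/a%25b/%25"` in `out`, while `EscapePercent("/tmp/file.txt", out)` returns `false` and performs no allocation.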

If you use my benchmarking code, you will find that code similar to the PR is several times more efficient than the current code (5 GB/s vs. 0.34 GB/s). (Note: the benchmark does not include the URL parsing, which is considered separately.) Here are my numbers on my MacBook with LLVM 14 (you can run your own benchmarks):

[Screenshot: benchmark results, 2023-10-22, 12:52:02]

Clang compiles the second inner loop to the following:

.LBB0_12: # =>This Inner Loop Header: Depth=1
  lea r14, [r13 + 1]
  cmp r15, r14
  mov rdx, r14
  cmovb rdx, r15
  mov rax, rbp
  sub rax, qword ptr [rbx + 8]
  cmp rax, rdx
  jb .LBB0_13
  mov rdi, rbx
  mov rsi, r12
  call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@PLT
  mov rax, qword ptr [rbx + 8]
  and rax, -2
  cmp rax, qword ptr [rsp + 16] # 8-byte Folded Reload
  je .LBB0_17
  mov edx, 2
  mov rdi, rbx
  lea rsi, [rip + .L.str]
  call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@PLT
  cmp r15, r13
  jbe .LBB0_21
  add r12, r14
  sub r15, r14
  je .LBB0_24
  mov rdi, r12
  mov esi, 37
  mov rdx, r15
  call memchr@PLT
  test rax, rax
  je .LBB0_27
  mov r13, rax
  sub r13, r12
  cmp r13, -1
  jne .LBB0_12

There are two calls to string append, as you'd expect. Otherwise, there is no allocation. For example, there are no calls to a std::string_view constructor because it gets optimized away. Remember that std::string_view instances are non-allocating. That's why we typically pass them by value rather than by reference.

The adversarial scenario is one where the entire string is made of '%'. You can benchmark this case with my code by passing --adversarial to the benchmark program (benchmark --adversarial). In that scenario, the repeated calls to find do not help. All tested functions are slow in this adversarial case, but the new code is about 50% slower. That's to be expected. If you do expect strings to contain a large fraction of '%' characters, then the PR is not beneficial. But the point of the PR is that, on realistic inputs, it can improve performance by about 10x.

Results on my MacBook (LLVM 14; you can run your own benchmarks):

[Screenshot: benchmark results, 2023-10-22, 12:52:55]

It is possible to use a slightly more complicated function that does a first pass, counting the number of '%' characters, and allocates accordingly. Empirically (see the _count results in the screenshot), this makes little difference to performance, in either the realistic or the adversarial case. However, requiring two passes over the data is slightly more complex, so I opt for the simplest efficient implementation.
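For comparison, a minimal sketch of the two-pass "count, then reserve" variant discussed here, using a hypothetical name `EscapePercentCounted`. Each '%' grows by two bytes when it becomes "%25", so the exact output size is `path.size() + 2 * count`, and a single `reserve` call makes the subsequent appends reallocation-free.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of the two-pass variant (hypothetical name; not the merged code):
// count the '%' characters, reserve the exact output size (each '%' becomes
// the three bytes "%25"), then append without any further reallocation.
std::string EscapePercentCounted(std::string_view path) {
  const std::size_t count =
      static_cast<std::size_t>(std::count(path.begin(), path.end(), '%'));
  std::string out;
  out.reserve(path.size() + 2 * count);  // one allocation up front
  for (std::size_t pos = path.find('%'); pos != std::string_view::npos;
       pos = path.find('%')) {
    out += path.substr(0, pos + 1);  // run up to and including the '%'
    out += "25";                     // complete the "%25" escape
    path.remove_prefix(pos + 1);     // continue after the '%'
  }
  out += path;  // tail after the last '%' (or the whole input if none)
  return out;
}
```

The trade-off is an extra pass over the input in exchange for a single allocation; unlike the single-pass sketch, this version always returns a new std::string, even when no '%' is present.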

Note that appending to an std::string, even character by character, has linear complexity. Each append does not translate into a new allocation. Rather, the capacity grows exponentially (typically doubling), so only a logarithmic number of reallocations occurs as the string grows.
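One way to observe this is to count how often `capacity()` changes while appending one character at a time. The sketch below uses a hypothetical function `CountReallocations`; the exact numbers depend on the standard library (libstdc++ and libc++ double the capacity, MSVC grows by 1.5x, and all start from a small-string buffer), but the count grows roughly like log(n), not like n.

```cpp
#include <cstddef>
#include <string>

// Appends `n` characters one at a time and counts how often the capacity
// changes, i.e. how many times the library switched to a larger buffer.
// With geometric growth this count is roughly logarithmic in `n`.
std::size_t CountReallocations(std::size_t n) {
  std::string s;
  std::size_t reallocations = 0;
  std::size_t capacity = s.capacity();  // small-string buffer, if any
  for (std::size_t i = 0; i < n; ++i) {
    s += 'a';
    if (s.capacity() != capacity) {  // capacity changed: a new buffer
      capacity = s.capacity();
      ++reallocations;
    }
  }
  return reallocations;
}
```

For one million appends, this typically reports a few dozen reallocations or fewer, not anything close to one million.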

nodejs-github-bot · 2023-10-19T14:48:20Z

Review requested:

@nodejs/url

nodejs-github-bot · 2023-10-19T15:07:58Z

CI: https://ci.nodejs.org/job/node-test-pull-request/55004/

nodejs-github-bot · 2023-10-19T18:03:37Z

CI: https://ci.nodejs.org/job/node-test-pull-request/55020/

src/node_url.cc

Co-authored-by: Tobias Nießen <tniessen@tnie.de>

lemire · 2023-10-19T22:41:44Z

@tniessen Thanks for the review of the comments. I have committed your proposed changes. I think that the code itself is correct and contains no superfluous lines. Do you agree?

nodejs-github-bot · 2023-10-19T23:00:26Z

CI: https://ci.nodejs.org/job/node-test-pull-request/55028/

nodejs-github-bot · 2023-10-20T18:05:16Z

CI: https://ci.nodejs.org/job/node-test-pull-request/55066/

nodejs-github-bot · 2023-10-20T23:22:56Z

CI: https://ci.nodejs.org/job/node-test-pull-request/55069/

src/node_url.cc

nodejs-github-bot · 2023-10-21T01:58:48Z

CI: https://ci.nodejs.org/job/node-test-pull-request/55075/

bnoordhuis · 2023-10-22T09:35:27Z

src/node_url.cc

+  // Escape '%' characters to a temporary string.
+  std::string escaped_file_path;
+  do {
+    escaped_file_path += file_path.substr(0, pos + 1);


Potential poor performance due to reallocating here. Imagine a pathological input like "%".repeat(1e6).

Better to count the number of % characters, then preallocate a buffer of size file_path.size() + 2 * count.

There is no reallocation due to the string_view.

Empirically, doing a count + reserve does not improve the performance in the adversarial case where every character is a '%'. It makes the code slightly more complicated, however.

> Remember that std::string_view instances are non-allocating

Yes, but escaped_file_path is a string, not a string_view, and that's what is being appended to.

> Empirically, doing a count + reserve does not improve the performance in the adversarial case

Honestly, I'm not worried about maximum performance here. I'd rather have predictable worst-case performance.

@bnoordhuis I'm going to merge this in a couple of hours since this is not a review that blocks.

> I'd rather have predictable worst-case performance.

Appending character-by-character to an std::string in C++ provides predictable performance: https://lemire.me/blog/2023/10/23/appending-to-an-stdstring-character-by-character-how-does-the-capacity-grow/

It is a common usage pattern in C++ (with std::vector, std::string, etc.). Complexity-wise, it is linear time with or without reserve. A reserve may improve the performance, but it does not change the complexity of the algorithm. In my mind, you'd only want to do a reserve if it improves the real-world performance. It is effectively a performance/efficiency optimization.

Bugs are always possible, but in my mind, neither this PR nor the previous code (written by @anonrig I think) could degenerate performance-wise. They use linear-time, safe algorithms.

I should stress that this code is not nearly as good as it could be, but further gains require changing ada.

> Appending character-by-character to an std::string in C++ provides predictable performance

You're making an observation based on a sample size of one. I suspect you're also working on the assumption that (re)allocation is a O(1) operation.

Now, the change in this PR doesn't make the code materially worse (only longer and more complex) so I'm not going to block it but it also doesn't make it materially better, it's just rummaging in the margin.

Count + alloc on the other hand is a material improvement because it changes the asymptotic runtime behavior.

(I'm not interested in discussing this further because time is finite and if I haven't gotten my point across by now, it's never going to happen.)

> I'm not interested in discussing this further because time is finite and if I haven't gotten my point across by now, it's never going to happen.

There must be a misunderstanding. I suspect I explained myself poorly. I can see no reason why we would even disagree. Let me try to clarify.

I am saying that if you benchmark the following function...

  std::string my_string;
  while (my_string.size() <= volume) {
    my_string += "a";
  }

... the time per entry (time/volume) is effectively constant with existing C++ runtime libraries (libstdc++, libc++, Visual Studio). My blog post has an analysis, and even a benchmark (complete with C++ code).

Another way to put it is that the running time is proportional to volume. And yes, I mean "the measured, wall-clock time".
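A rough way to check this yourself: wrap the loop quoted above in a timer and report nanoseconds per appended character. This is a minimal stand-in with a hypothetical function name `NanosecondsPerAppend`, not the benchmark from the blog post; if appending is amortized O(1), the ratio should stay roughly flat as `volume` grows (for example, from 10^4 to 10^7), modulo timer noise and cache effects.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Times the append loop quoted above and returns the average cost of one
// append in nanoseconds (wall-clock, via steady_clock).
double NanosecondsPerAppend(std::size_t volume) {
  const auto start = std::chrono::steady_clock::now();
  std::string my_string;
  while (my_string.size() <= volume) {
    my_string += "a";  // one-character append, no reserve()
  }
  const auto stop = std::chrono::steady_clock::now();
  const double ns =
      std::chrono::duration<double, std::nano>(stop - start).count();
  // Dividing by the final size gives time per entry (time/volume).
  return ns / static_cast<double>(my_string.size());
}
```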

Library implementers make it so because this is a common usage pattern: it would be considered a bug if it took quadratic time, and we would know. Early on, the dynamic arrays in Swift could be tricked into doing one allocation per append; that was quickly fixed.

> I suspect you're also working on the assumption that (re)allocation is a O(1) operation.

I am assuming that allocating (or copying) N bytes takes O(N) time. With this model, insertion in a dynamic array with capacity that is expanded by a constant factor (which is how C++ runtime libraries do it) ensures that inserting an element is constant time (amortized).
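A short version of that argument in formulas, under the stated model (allocating or copying N bytes costs O(N)), assuming the capacity starts at c_0, is multiplied by a constant factor r > 1 at each growth (r = 2 for libstdc++ and libc++, 1.5 for MSVC), and at least one growth occurs while building a string of final length N:

```latex
% Capacities after successive growths: C_k = c_0 r^k, k = 0, ..., K,
% where C_K is the final capacity. A growth only happens when the string no
% longer fits, so the previous capacity is below the final length N:
C_{K-1} < N \le C_K = r\,C_{K-1} < rN .
% Total bytes allocated over the whole construction (geometric series):
\sum_{k=0}^{K} C_k = c_0\,\frac{r^{K+1}-1}{r-1} < \frac{r}{r-1}\,C_K < \frac{r^2}{r-1}\,N .
% Bytes copied when moving to a larger buffer (at most the old capacity):
\sum_{k=0}^{K-1} C_k < \frac{r}{r-1}\,C_{K-1} < \frac{r}{r-1}\,N .
```

For r = 2 this gives fewer than 4N bytes allocated and fewer than 2N bytes copied: both linear in N, i.e. amortized O(1) per appended character.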

Let us consider a toy example where you construct a 1024-character string, starting with a capacity of 1 byte and doubling the capacity each time it runs out (that's not realistic, but it is simple)...

1st character: allocate 1 byte.

2nd character: double the capacity to 2 bytes.

3rd character: double the capacity to 4 bytes.

5th character: double the capacity to 8 bytes.

9th character: double the capacity to 16 bytes.

17th character: double the capacity to 32 bytes.

33rd character: double the capacity to 64 bytes.

65th character: double the capacity to 128 bytes.

129th character: double the capacity to 256 bytes.

257th character: double the capacity to 512 bytes.

513th character: double the capacity to 1024 bytes. The remaining characters, up to the 1024th, fit without further growth.

If you sum it up, you get 1 + 2 + 4 + ... + 512 + 1024 = 2047. The result is general: to construct a string of N bytes, you allocate fewer than 4N bytes in total (2N - 1 when N is a power of two, as here). The key point is that this has amortized linear complexity. Both libstdc++ and libc++ use a growth factor of 2, whereas Microsoft and Facebook prefer a smaller factor, but the analysis is the same.
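The toy example can be replayed in a few lines. This sketch (hypothetical function name `TotalBytesAllocated`) models only the simplified scheme from the example: capacity starts at zero, the first character allocates 1 byte, and the capacity doubles whenever the next character does not fit. Real libraries start from a small-string buffer instead.

```cpp
#include <cstddef>

// Toy model: returns the total number of bytes ever allocated while
// appending `n` characters one at a time, when the first character
// allocates 1 byte and the capacity doubles whenever the string is full.
std::size_t TotalBytesAllocated(std::size_t n) {
  std::size_t capacity = 0;
  std::size_t total = 0;
  for (std::size_t size = 1; size <= n; ++size) {  // append character #size
    if (size > capacity) {                          // does not fit: grow
      capacity = (capacity == 0) ? 1 : 2 * capacity;
      total += capacity;                            // a new buffer
    }
  }
  return total;
}
```

`TotalBytesAllocated(1024)` returns 2047, and for one million characters the total stays below 4 million bytes, consistent with the linear bound.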

Feel free to reach out to me at daniel@lemire.me if I did not clarify that point. We can set up a videoconference if needed.

nodejs-github-bot · 2023-10-22T16:56:36Z

CI: https://ci.nodejs.org/job/node-test-pull-request/55124/

nodejs-github-bot · 2023-10-24T18:09:11Z

Landed in c89bae1

PR-URL: nodejs#50288 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br> Reviewed-By: Tobias Nießen <tniessen@tnie.de>

lemire added 2 commits October 19, 2023 10:05

src: use find instead of char-by-char in FromFilePath()

3f4deaa

fix typo

95e9f4c

nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Oct 19, 2023

adding space

a61db5d

anonrig approved these changes Oct 19, 2023

anonrig added request-ci Add this label to start a Jenkins CI on a PR. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Oct 19, 2023

github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 19, 2023

H4ad approved these changes Oct 19, 2023

H4ad added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Oct 19, 2023

tniessen reviewed Oct 19, 2023

src/node_url.cc: three review threads (outdated, resolved)

lemire and others added 2 commits October 19, 2023 18:33

Update src/node_url.cc

8c6a185

Co-authored-by: Tobias Nießen <tniessen@tnie.de>

Update src/node_url.cc

11d038f

Co-authored-by: Tobias Nießen <tniessen@tnie.de>

tniessen approved these changes Oct 19, 2023

anonrig approved these changes Oct 19, 2023

anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 19, 2023

github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 19, 2023

github-actions bot mentioned this pull request Oct 20, 2023

CI Reliability 2023-10-20 nodejs/reliability#692

github-actions bot mentioned this pull request Oct 21, 2023

CI Reliability 2023-10-21 nodejs/reliability#693

eugeneo reviewed Oct 21, 2023

src/node_url.cc (outdated, resolved)

simplifying by not setting pos to zero.

ba7f31d

github-actions bot mentioned this pull request Oct 22, 2023

CI Reliability 2023-10-22 nodejs/reliability#694

bnoordhuis reviewed Oct 22, 2023

H4ad added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 22, 2023

github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 22, 2023

This was referenced Oct 23, 2023

CI Reliability 2023-10-23 nodejs/reliability#695

CI Reliability 2023-10-24 nodejs/reliability#696

anonrig approved these changes Oct 24, 2023

anonrig added the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 24, 2023

nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 24, 2023

nodejs-github-bot merged commit c89bae1 into nodejs:main Oct 24, 2023
53 checks passed

targos mentioned this pull request Nov 12, 2023

v21.2.0 release proposal #50681

Merged

UlisesGascon mentioned this pull request Dec 12, 2023

v20.11.0 proposal #51124

Merged
