Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: use ostringstream instead of string in FromFilePath() #50253

Closed
wants to merge 1 commit into from

Conversation

pluris
Copy link
Contributor

@pluris pluris commented Oct 18, 2023

When performing string concatenation operations, it is more advantageous in terms of performance to use std::ostringstream rather than std::string. This approach is also used elsewhere, such as debug_utils and node_errors.

Refs: #46410

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/url

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Oct 18, 2023
@anonrig
Copy link
Member

anonrig commented Oct 18, 2023

What is the performance impact of the new proposed change? Did you had any chance to benchmark it?

@pluris
Copy link
Contributor Author

pluris commented Oct 19, 2023

I ran benchmark/url/* and there doesn't seem to be much difference.
Also, this is not a binding API, so I don't know if I should create a benchmark. Could you please give me some advice?
To compare std::string and std::ostringstream, I created the some code as follows.

#include <iostream>
#include <chrono>
#include <sstream>

std::string FromFilePath1(const std::string_view file_path) {
  std::string escaped_file_path;
  for (size_t i = 0; i < file_path.length(); ++i) {
    escaped_file_path += file_path[i];
    if (file_path[i] == '%') escaped_file_path += "25";
  }
  return escaped_file_path;
}

std::string FromFilePath2(const std::string_view file_path) {
  std::ostringstream escaped_file_path;
  for (size_t i = 0; i < file_path.length(); ++i) {
    escaped_file_path << file_path[i];
    if (file_path[i] == '%') escaped_file_path << "25";
  }
  return escaped_file_path.str();
}

int main() {
    const std::string_view sample_url= "https://nodejs.org/en/blog/";

    std::chrono::system_clock::time_point start = std::chrono::system_clock::now();
    for (int i = 0;i < 1000000;i++)
        FromFilePath1(sample_url);
    std::chrono::duration<double> sec = std::chrono::system_clock::now() - start;
    std::cout << "result 1 : " << sec.count() <<" seconds"<< std::endl;
    // Prints : result 1 : 1.63845 seconds
    start = std::chrono::system_clock::now();
    
    for (int i = 0;i < 1000000;i++)
        FromFilePath2(sample_url);
    
    sec = std::chrono::system_clock::now() - start;
    std::cout << "result 2 : " << sec.count() <<" seconds"<< std::endl;
    // Prints : result 2 : 1.17444 seconds

    return 0;
}

@lemire
Copy link
Member

lemire commented Oct 19, 2023

I ran my own benchmarks. Please see my blog post:

For processing strings, streams in C++ can be slow

Here is the gist:

At least in this case, I find that the stream version is four times slower than naive string processing, and it is 30 times slower than the optimized ‘find’ approach. Your results will vary depending on your system, but I generally consider the use of streams in C++ as a hint that there might be poor performance.

I understand that there has been performance engineering guided by the belief that streams in C++ are efficient. My recommendation is to re-assess this belief and conduct benchmarks.

In fact, it is entirely possible that removing streams where they have been added could be a strategy to improve performance.

For benchmarking, I recommend using relatively large inputs. Here is why: Benchmarking is hard: processors learn to predict branches.

@pluris
Copy link
Contributor Author

pluris commented Oct 19, 2023

@lemire Thanks for the good information. This PR will be closed. 😄

@pluris pluris closed this Oct 19, 2023
@pluris pluris deleted the fix/string_to_ostringstream branch November 8, 2023 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants