Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: move ToNamespacedPath to c++ #52135

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

anonrig
Copy link
Member

@anonrig anonrig commented Mar 18, 2024

This is a step forward to move all node:fs implementation to C++. Since we now have support for win32.pathResolve() in C++, we can move forward with moving ToNamespacedPath functions to C++.

In a follow up pull-request, I'll move FromNamespacedPath() from node_url.cc to path.cc

cc @nodejs/fs @RafaelGSS

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/url

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Mar 18, 2024
@anonrig anonrig force-pushed the simplify-fs branch 2 times, most recently from 003590b to 2d120ca Compare March 18, 2024 15:16
@richardlau richardlau added fs Issues and PRs related to the fs subsystem / file system. windows Issues and PRs related to the Windows platform. labels Mar 18, 2024
@anonrig
Copy link
Member Author

anonrig commented Mar 18, 2024

cc @nodejs/platform-windows Any idea where this error/warning is thrown from?

src/path.cc Outdated Show resolved Hide resolved
src/path.cc Outdated Show resolved Hide resolved
@LiviaMedeiros
Copy link
Contributor

AFAICT the changes are implying that getValidatedPath() should do path.ToNamespacedPath() internally, but there's no changes to getValidatedPath() itself in this PR. Does it happen implicitly? If yes, on which step?

src/path.cc Outdated Show resolved Hide resolved
src/path.h Outdated Show resolved Hide resolved
src/path.cc Show resolved Hide resolved
@anonrig
Copy link
Member Author

anonrig commented Mar 18, 2024

AFAICT the changes are implying that getValidatedPath() should do path.ToNamespacedPath() internally, but there's no changes to getValidatedPath() itself in this PR. Does it happen implicitly? If yes, on which step?

getValidatedPath() result is called with toNamespacedPath() to ensure Windows long paths are handled correctly. I didn't quite understand your comment, but this pull request basically replaces toNamespacedPath(getValidatedPath(input)) with getValidatedPath(input) and calls toNamespacedPath from C++

@anonrig anonrig force-pushed the simplify-fs branch 2 times, most recently from d93f610 to 5c62bf2 Compare March 18, 2024 18:08
@LiviaMedeiros
Copy link
Contributor

getValidatedPath() result is called with toNamespacedPath() to ensure Windows long paths are handled correctly. I didn't quite understand your comment, but this pull request basically replaces toNamespacedPath(getValidatedPath(input)) with getValidatedPath(input) and calls toNamespacedPath from C++

Sorry, I read the changes wrong. It doesn't affect js-side toNamespacedPath() itself but adds cpp-side version that is used in cpp implementations for each fs method.
We might need to doublecheck that every method uses this. Of course, it's non-blocking concern: I'm just pointing out that after this change we'll have to check cpp side instead of js.
Right now I'm only seeing that Mkdtemp lacks it here:

node/src/node_file.cc

Lines 2820 to 2823 in d93f610

THROW_IF_INSUFFICIENT_PERMISSIONS(
env, permission::PermissionScope::kFileSystemWrite, tmpl.ToStringView());
const enum encoding encoding = ParseEncoding(isolate, args[1], UTF8);

Which mirrors current (might be bugged on Windows if called with long prefix) js implementation.

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 18, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 18, 2024
@nodejs-github-bot
Copy link
Collaborator

@RafaelGSS
Copy link
Member

This should land first #52148.

Copy link
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @joyeecheung suggested on my initial PR for node::pathResolve, I suggest you do the same for path.toNamespacePath. Change the bindings in a different PR and run the tests to see if it works.

Example: #51295

@anonrig
Copy link
Member Author

anonrig commented Mar 19, 2024

@RafaelGSS the function signature is not 1 to 1. We're always using BufferValue in node:fs.

@joyeecheung
Copy link
Member

joyeecheung commented Mar 19, 2024

I think this is different from #51295 in that, the other PR only used the C++ version in a not very well-tested branch (only when permission is enabled) so if it's broken, we may not be able to know due to the low test coverage. This PR however changed most fs bindings so if it's broken, it's probably more likely that some tests would fail on Windows.

That said I think we already don't have enough test cases to check the kind of paths that are need to be handled specially by toNamespacedPath(). Perhaps it would be useful to check the cases in #51097 too

@anonrig anonrig force-pushed the simplify-fs branch 2 times, most recently from 8ab9cf7 to a611c98 Compare March 21, 2024 14:57
@RafaelGSS
Copy link
Member

RafaelGSS commented Mar 27, 2024

So, I ran some benchmarks on this PR as I believe the real intention to move fs to C++ is to improve performance somehow. Looks like this PR doesn't improve performance (due to high variance I can't say it decreases performance though). But here's the results:

toNamespacedPath

$ node benchmark/compare.js --old ./node --new ./node2 --filter toNamespacedPath path | Rscript benchmark/compare.R
[00:00:47|% 100| 2/2 files | 60/60 runs | 8/8 configs]: Done
                                                                                                 confidence improvement accuracy (*)   (**)  (***)
path/toNamespacedPath-posix.js n=100000 path=''                                                        ***     -4.22 %       ±1.99% ±2.66% ±3.46%
path/toNamespacedPath-posix.js n=100000 path='.'                                                         *     -1.93 %       ±1.52% ±2.03% ±2.66%
path/toNamespacedPath-posix.js n=100000 path='/Users/rafaelgss/repos/os/node/benchmark/path/..'         **     -2.27 %       ±1.66% ±2.21% ±2.87%
path/toNamespacedPath-posix.js n=100000 path='/home/node/..'                                           ***     -3.72 %       ±1.63% ±2.17% ±2.83%
path/toNamespacedPath-posix.js n=100000 path='/tmp/bar'                                                  *     -2.67 %       ±2.06% ±2.74% ±3.57%
path/toNamespacedPath-posix.js n=100000 path='bar/baz'                                                  **     -3.20 %       ±1.92% ±2.56% ±3.33%
path/toNamespacedPath-win32.js n=100000 path=''                                                                -0.75 %       ±4.06% ±5.42% ±7.08%
path/toNamespacedPath-win32.js n=100000 path='/Users/rafaelgss/repos/os/node/benchmark/path/..'                -0.09 %       ±0.39% ±0.52% ±0.67%
path/toNamespacedPath-win32.js n=100000 path='\\\\e.exe'                                                       -0.14 %       ±2.04% ±2.73% ±3.58%
path/toNamespacedPath-win32.js n=100000 path='c:../a'                                                          -0.10 %       ±0.99% ±1.31% ±1.71%
path/toNamespacedPath-win32.js n=100000 path='c:/blah\\\\blah'                                                 -0.27 %       ±1.25% ±1.66% ±2.16%
path/toNamespacedPath-win32.js n=100000 path='c:/ignore'                                                        0.50 %       ±0.90% ±1.20% ±1.58%
path/toNamespacedPath-win32.js n=100000 path='d:/games'                                                         0.50 %       ±0.58% ±0.78% ±1.01%
path/toNamespacedPath-win32.js n=100000 path='d:\\\\a/b\\\\c/d'                                                -0.53 %       ±0.88% ±1.18% ±1.55%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 14 comparisons, you can thus
expect the following amount of false-positive results:
  0.70 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.14 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

fs --filter=bench-read

$ node benchmark/compare.js --old ./node --new ./node2 --filter bench-read fs | Rscript benchmark/compare.R
[00:01:10|% 100| 5/5 files | 60/60 runs | 2/2 configs]: Done
                                                                        confidence improvement accuracy (*)    (**)   (***)
fs/bench-readSync.js n=10000 type='existing'                                          -0.49 %       ±2.37%  ±3.15%  ±4.10%
fs/bench-readSync.js n=10000 type='non-existing'                                       0.98 %       ±2.60%  ±3.49%  ±4.58%
fs/bench-readdir.js withFileTypes='false' dir='lib' n=10                               0.39 %       ±2.43%  ±3.24%  ±4.22%
fs/bench-readdir.js withFileTypes='false' dir='test/parallel' n=10                    -0.10 %       ±0.34%  ±0.45%  ±0.59%
fs/bench-readdir.js withFileTypes='true' dir='lib' n=10                               -0.86 %       ±1.86%  ±2.47%  ±3.22%
fs/bench-readdir.js withFileTypes='true' dir='test/parallel' n=10                     -0.03 %       ±0.40%  ±0.53%  ±0.69%
fs/bench-readdirSync.js withFileTypes='false' dir='lib' n=10                           1.68 %       ±2.15%  ±2.89%  ±3.83%
fs/bench-readdirSync.js withFileTypes='false' dir='test/parallel' n=10                -0.17 %       ±0.23%  ±0.30%  ±0.39%
fs/bench-readdirSync.js withFileTypes='true' dir='lib' n=10                           -0.03 %       ±0.97%  ±1.30%  ±1.69%
fs/bench-readdirSync.js withFileTypes='true' dir='test/parallel' n=10                 -0.23 %       ±0.47%  ±0.63%  ±0.81%
fs/bench-readlinkSync.js n=1000 type='invalid'                                        -7.06 %       ±8.81% ±11.75% ±15.35%
fs/bench-readlinkSync.js n=1000 type='valid'                                           2.62 %       ±9.90% ±13.18% ±17.15%
fs/bench-readvSync.js n=100000 type='invalid'                                         -0.30 %       ±0.91%  ±1.21%  ±1.58%
fs/bench-readvSync.js n=100000 type='valid'                                            0.84 %       ±1.45%  ±1.93%  ±2.51%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 14 comparisons, you can thus
expect the following amount of false-positive results:
  0.70 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.14 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

It's also important to mention I no longer have access to a dedicated machine to perform those benchmarks, so I have used my local MacOS M2 machine. I have created a benchmark specific to path.toNamespacedPath as my initial thought was the impact of the JS -> C++ layer for users of this API (#52236).

The benchmark used ff7b27faf3540f9e8d73bf66635d2cffedd9d341 (This PR) for node2 and 85830da4fd814404ba490232430dc05d14868124 for node (benchmark PR).

Maybe we should re-evaluate this movement of moving things to C++ claiming for performance (if that's the case).

@anonrig
Copy link
Member Author

anonrig commented Mar 27, 2024

This move is required so that we can move getValidatedPath function. For macOS this function is a noop, for Windows it does call some complex functions.

Unless we move getValidatedPath and toNamespacedPath functions to C++ in the same call, it's normal to not see any improvement on macOS

@lemire
Copy link
Member

lemire commented Mar 27, 2024

@RafaelGSS From first principles, I would expect this PR to have little impact on performance. As @anonrig states, we are moving a trivial function (everywhere except under Windows) to C++.

Once it is done right, there is no further downside, however. E.g., it should not impact maintainability.

It is not like any JavaScript hacker really loved ToNamedspacePath.

@RafaelGSS
Copy link
Member

RafaelGSS commented Mar 28, 2024

This move is required so that we can move getValidatedPath function. For macOS this function is a noop, for Windows it does call some complex functions.

Unless we move getValidatedPath and toNamespacedPath functions to C++ in the same call, it's normal to not see any improvement on macOS

Note that I'm using path.posix. and path.win32. on the benchmarks, so I expect my operating system to not be relevant to this particular benchmark (despite of process.cwd() of course), am I missing something?

UPDATE: Actually, it seems the current API of path.toNamespacedPath will not call the C++ function, so we are basically handling two functions that behave similarly. I think there's no problem with that then, it can just be a problem if we attempt to change the API to call this new C++ function. My benchmark for this case is irrelevant.

Copy link
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 28, 2024
test/cctest/test_path.cc Outdated Show resolved Hide resolved
@richardlau richardlau added the baking-for-lts PRs that need to wait before landing in a LTS release. label Mar 28, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 28, 2024
@nodejs-github-bot
Copy link
Collaborator

@anonrig
Copy link
Member Author

anonrig commented Mar 28, 2024

@nodejs/build Why does this pull-request doesn't trigger request-ci? Any idea?

@richardlau
Copy link
Member

https://ci.nodejs.org/job/node-test-commit/69402/console

01:02:16 + REBASE_ONTO=27493a1dd73f41c07f04ca5d89d001e340a9feec
01:02:16 + git rebase --committer-date-is-author-date 27493a1dd73f41c07f04ca5d89d001e340a9feec
01:02:16 Rebasing (1/16)
Rebasing (2/16)
Rebasing (3/16)
dropping 19a943600a666c79e94bfd8e5149bc76c723d009 fix: InternalModuleStat should use ToNamespacedPath -- patch contents already upstream
01:02:16 Rebasing (4/16)
Rebasing (5/16)
Rebasing (6/16)
Rebasing (7/16)
Rebasing (8/16)
Rebasing (9/16)
Rebasing (10/16)
Auto-merging src/path.cc
01:02:17 CONFLICT (content): Merge conflict in src/path.cc
01:02:17 error: could not apply a0d2d1a1018... fix ToNamespacedPath
01:02:17 hint: Resolve all conflicts manually, mark them as resolved with
01:02:17 hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
01:02:17 hint: You can instead skip this commit: run "git rebase --skip".
01:02:17 hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
01:02:17 Could not apply a0d2d1a1018... fix ToNamespacedPath
01:02:17 Build step 'Execute shell' marked build as failure

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 28, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 28, 2024
@nodejs-github-bot
Copy link
Collaborator

Co-authored-by: Daniel Lemire <daniel@lemire.me>
@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 28, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 28, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

test/cctest/test_path.cc Outdated Show resolved Hide resolved
Co-authored-by: Daniel Lemire <daniel@lemire.me>
@anonrig anonrig added request-ci Add this label to start a Jenkins CI on a PR. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Apr 5, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 5, 2024
@nodejs-github-bot
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
baking-for-lts PRs that need to wait before landing in a LTS release. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. fs Issues and PRs related to the fs subsystem / file system. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants