Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain files aren't hardlinked on Windows, and are duplicated instead. #7046

Closed
2 of 4 tasks
MatthewKing opened this issue Sep 4, 2023 · 9 comments · Fixed by #7177 or #7255
Closed
2 of 4 tasks

Certain files aren't hardlinked on Windows, and are duplicated instead. #7046

MatthewKing opened this issue Sep 4, 2023 · 9 comments · Fixed by #7177 or #7255

Comments

@MatthewKing
Copy link

MatthewKing commented Sep 4, 2023

Verify latest release

  • I verified that the issue exists in the latest pnpm release

pnpm version

No response

Which area(s) of pnpm are affected? (leave empty if unsure)

No response

Link to the code that reproduces this issue or a replay of the bug

https://github.com/MatthewKing/pnpm-issue-7046-reproduction

Reproduction steps

There are full step-by-step instructions in the following repository: https://github.com/MatthewKing/pnpm-issue-7046-reproduction
There is also a script that reproduces the issue (reproduce.ps1)

A quick summary, though:

  1. Restore a project with a dependency on "next".
  2. Check if "node_modules.pnpm@next+swc-win32-x64-msvc@13.4.19\node_modules@next\swc-win32-x64-msvc\next-swc.win32-x64-msvc.node" is hardlinked
  3. The file in step 2 should be hardlinked, but it isn't. Rather, it is just duplicated.

Describe the Bug

When using pnpm on Windows 11, some files that I would expect to be hardlinked are not hardlinked. It appears that they are simply copied, leading to duplication. In particular, files such as next-swc.win32-x64-msvc.node are quite large (110mb), and are being duplicated rather than hardlinked. This means that I'm not getting the full benefit of the content addressable store.

Expected Behavior

I would expect that these files are hardlinked, rather than duplicated.

Which Node.js version are you using?

18.16.0

Which operating systems have you used?

  • macOS
  • Windows
  • Linux

If your OS is a Linux based, which one it is? (Include the version if relevant)

No response

@zkochan
Copy link
Member

zkochan commented Sep 5, 2023

If creating a hard link fails then pnpm creates a copy:

function linkOrCopy (existingPath: string, newPath: string) {
try {
fs.linkSync(existingPath, newPath)
} catch (err: any) { // eslint-disable-line
// If a hard link to the same file already exists
// then trying to copy it will make an empty file from it.
if (err['code'] === 'EEXIST') return
// In some VERY rare cases (1 in a thousand), hard-link creation fails on Windows.
// In that case, we just fall back to copying.
// This issue is reproducible with "pnpm add @material-ui/icons@4.9.1"
fs.copyFileSync(existingPath, newPath)
}
}

@MatthewKing
Copy link
Author

MatthewKing commented Sep 6, 2023

Thanks, good to know. So I guess the next question is that why is this consistently failing?

I've tried the following, with consistent results:

Action Result
Create hardlink using pnpm Consistently fails
Create hardlink manually using fs.linkSync Consistently succeeds
Create hardlink manually using Powershell Consistently succeeds

Anything else I can do to help here? I'm not particularly familiar with this ecosystem so I'm unsure how to proceed.

@MatthewKing
Copy link
Author

MatthewKing commented Sep 6, 2023

After looking at it a bit more, I don't think the copy is being created because something is going wrong with the creation of the hardlink. Rather, I believe that the clone-or-copy package import method is being selected for this specific package (due to opts.requiresBuild being true). I'm not familiar enough with the codebase or the npm ecosystem to know if this is intended of not, or why this particular package is using clone-or-copy when the others are using auto. Thoughts? Thanks

@zkochan
Copy link
Member

zkochan commented Sep 6, 2023

That's actually true. Maybe we can figure out some solution to deduplicate build packages but this was done like this intentionally. There was an issue with built projects linked from the store. For instance, some packages are published with an empty file and then the build script write to the empty file. So when we were hard linking that empty file from the store, it was rewritten for all packages with an empty file. We can't allow any modifications to happen to files in the store.

The good news is that pnpm uses side-effects cache, so next time you install the package, it will not be built, so it should be hard linked from the store. But even after building the package we should be able to relink it to the store (though might make no sense from performance perspective).

@MatthewKing
Copy link
Author

MatthewKing commented Sep 7, 2023

Thanks, that makes a lot of sense.

The good news is that pnpm uses side-effects cache, so next time you install the package, it will not be built, so it should be hard linked from the store.

Is there something I have to do to get the side-effects-cache to function like this? With the package in my reproduction above, no matter how many times I install the package I never get a hardlink to the store version - always a copy.

The following reproduction will demonstrate this (on my Windows machine at least):

git clone https://github.com/MatthewKing/pnpm-issue-7046-reproduction repo1
git clone https://github.com/MatthewKing/pnpm-issue-7046-reproduction repo2
cd .\repo1
pnpm install
.\reproduce.ps1
cd ..\repo2
pnpm install
.\reproduce.ps1

(Both runs of reproduce.ps1 script show that the file is copied not hardlinked, and that the file in the store has no hardlinks elsewhere).

Am I misunderstanding how things should work, or is there another area I can investigate and/or help with? Thanks again for your time on this issue. Cheers

@zkochan
Copy link
Member

zkochan commented Sep 13, 2023

No, there's nothing you should do. It should work by default.

As you can see in the code below:

const pkgImportMethod = (opts.requiresBuild && !isBuilt)

A package is copied only if it requires a build and it is not built. So in theory it should work. I think it is also covered with tests (I hope so). If it doesn't work for you, we should debug it.

@MatthewKing
Copy link
Author

Yeah, it still doesn't work for me (on two machines, reproduced as per the git repo above).

Not sure if this helps, but I've noticed that, for the packages it doesn't work on:

  • opts.sideEffectsCacheKey is undefined
  • opts.filesResponse.sideEffects is also undefined

@tris203
Copy link
Contributor

tris203 commented Oct 7, 2023

have deleted my last comment, as I have done more testing since then, and discovered the bug is due to a cobination of two changes:

  • 0abfe17: New optional option added to package importer: requiresBuild. When requiresBuild is true, the package should only be imported using cloning or copying.

and

  • 07e7b1c: Optional dependencies are always marked as requiresBuild as they are not always fetched and as a result there is no way to check whether they need to be built or not.

manually editing the pnpm-lock.yaml to include

  /@next/swc-win32-x64-msvc@13.4.19:
    resolution: {integrity: sha512-YzA78jBDXMYiINdPdJJwGgPNT3YqBNNGhsthsDoWHL9p24tEJn9ViQf/ZqTbwSpX/RrkPupLfuuTH2sf73JBAw==}
    engines: {node: '>= 10'}
    cpu: [x64]
    os: [win32]
    requiresBuild: false
    dev: false
    optional: true

Results in the file being hardlinked correctly

@tris203
Copy link
Contributor

tris203 commented Oct 7, 2023

Related issues that caused these two changes are:
#2038 and #3238

I think its 2038 that needs a change personally, but i dont know how we get around it, I will give it some more thought

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants