Fix file cloning on macOS #5001

Closed
zkochan opened this issue Jul 9, 2022 · 23 comments · Fixed by #7031

Comments

@zkochan
Member

zkochan commented Jul 9, 2022

pnpm currently uses hard links (not cloning) on macOS. This is due to an issue in Node.js: libuv/libuv#3654

As a result, pnpm is much slower on macOS. bun doesn't appear to have this issue and is therefore a lot faster than pnpm on macOS. If Node.js refuses to merge a fix, maybe there is another way to work around it.

Contributors are welcomed as I don't have a mac and probably won't have till the war ends.
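
To make the distinction between cloning and hard linking concrete, here is a minimal sketch of a clone-first import with fallbacks, using only Node.js's built-in fs API. The helper name importFile, the file names, and the fallback order are assumptions made for illustration and are not taken from pnpm's source. Whether the clone step succeeds depends on the filesystem and on the Node.js/libuv build in use.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

type ImportMethod = 'clone' | 'hardlink' | 'copy'

// Hypothetical helper (not pnpm's actual implementation).
// Tries a copy-on-write clone first, then a hard link, then a plain copy.
function importFile (from: string, to: string): ImportMethod {
  try {
    // COPYFILE_FICLONE_FORCE requests a copy-on-write clone and
    // fails instead of silently falling back to a regular copy.
    fs.copyFileSync(from, to, fs.constants.COPYFILE_FICLONE_FORCE)
    return 'clone'
  } catch {
    // Cloning is not available here; try the next strategy.
  }
  try {
    fs.linkSync(from, to)
    return 'hardlink'
  } catch {
    // Hard linking failed (for example, across devices); copy the bytes.
    fs.copyFileSync(from, to)
    return 'copy'
  }
}

// Usage: import a file into a temporary directory and report the method used.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'import-demo-'))
const src = path.join(dir, 'store-file.txt')
const dest = path.join(dir, 'project-file.txt')
fs.writeFileSync(src, 'hello')
const method = importFile(src, dest)
console.log(`imported via ${method}`)
```

The sketch uses the synchronous fs calls only to keep it short; an installer would more likely use the promise-based equivalents.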

@zkochan zkochan pinned this issue Jul 17, 2022
@bd82

bd82 commented Sep 6, 2022

Hello @zkochan

This is unrelated to the macOS issue, but this issue made me aware of the clone vs. hard-link distinction in pnpm.
I initially thought hard links were pnpm's primary approach, as they are strongly emphasized in the FAQ and docs.

However, I just realized that my Linux environment (a container on K8s) is using hard links and not clones (reflinks) with pnpm.
So I was wondering:

  1. Would using a Linux file system that supports reflinks significantly increase performance on Linux?
  2. Are reflinks commonly supported in Linux file systems, or are they still considered a rare feature?
  3. Or are reflinks on Linux mostly a correctness feature to avoid accidental modifications to the store's contents?

Thanks for your work on PNPM 👍

@zkochan
Member Author

zkochan commented Sep 6, 2022

There are Linux filesystems that support cloning, for instance Btrfs.

@Silic0nS0ldier
Sponsor Contributor

Silic0nS0ldier commented Nov 15, 2022

The timeline for this capability landing (returning?) in Node.js (via libuv) isn't looking great.

Any word on just how much faster pnpm would perform on macOS with file cloning? Will make it much easier to prioritise potentially working on this as part of my current project.

EDIT: Ah, I initially misread. bun appears to be much faster because of it. Apologies for asking that which does not exist. I'll try and make time to investigate this.

@shirotech
Contributor

When I run man cp in my Mac terminal, there is an option to use clone mode. Can you confirm if that is the same clone mode you are referring to?

-c    copy files using clonefile(2)

Here is the complete man command output:

CP(1)                       General Commands Manual                      CP(1)

NAME
     cp – copy files

SYNOPSIS
     cp [-R [-H | -L | -P]] [-fi | -n] [-alpsvXx] source_file target_file
     cp [-R [-H | -L | -P]] [-fi | -n] [-alpsvXx]
        source_file ... target_directory
     cp [-f | -i | -n] [-alPpsvx] source_file target_file
     cp [-f | -i | -n] [-alPpsvx] source_file ... target_directory

DESCRIPTION
     In the first synopsis form, the cp utility copies the contents of the
     source_file to the target_file.  In the second synopsis form, the
     contents of each named source_file is copied to the destination
     target_directory.  The names of the files themselves are not changed.  If
     cp detects an attempt to copy a file to itself, the copy will fail.

     The following options are available:

     -H    If the -R option is specified, symbolic links on the command line
           are followed.  (Symbolic links encountered in the tree traversal
           are not followed.)

     -L    If the -R option is specified, all symbolic links are followed.

     -P    No symbolic links are followed.  This is the default if the -R
           option is specified.

     -R    If source_file designates a directory, cp copies the directory and
           the entire subtree connected at that point.  If the source_file
           ends in a /, the contents of the directory are copied rather than
           the directory itself.  This option also causes symbolic links to be
           copied, rather than indirected through, and for cp to create
           special files rather than copying them as normal files.  Created
           directories have the same mode as the corresponding source
           directory, unmodified by the process' umask.

           In -R mode, cp will continue copying even if errors are detected.

           Note that cp copies hard linked files as separate files.  If you
           need to preserve hard links, consider using tar(1), cpio(1), or
           pax(1) instead.

     -a    Archive mode.  Same as -RpP options. Preserves structure and
           attributes of files but not directory structure.

     -f    If the destination file cannot be opened, remove it and create a
           new file, without prompting for confirmation regardless of its
           permissions.  (The -f option overrides any previous -n option.)

           The target file is not unlinked before the copy.  Thus, any
           existing access rights will be retained.

     -i    Cause cp to write a prompt to the standard error output before
           copying a file that would overwrite an existing file.  If the
           response from the standard input begins with the character ‘y’ or
           ‘Y’, the file copy is attempted.  (The -i option overrides any
           previous -n option.)

     -l    Create hard links to regular files in a hierarchy instead of
           copying.

     -n    Do not overwrite an existing file.  (The -n option overrides any
           previous -f or -i options.)

     -p    Cause cp to preserve the following attributes of each source file
           in the copy: modification time, access time, file flags, file mode,
           user ID, and group ID, as allowed by permissions.  Access Control
           Lists (ACLs) and Extended Attributes (EAs), including resource
           forks, will also be preserved.

           If the user ID and group ID cannot be preserved, no error message
           is displayed and the exit value is not altered.

           If the source file has its set-user-ID bit on and the user ID
           cannot be preserved, the set-user-ID bit is not preserved in the
           copy's permissions.  If the source file has its set-group-ID bit on
           and the group ID cannot be preserved, the set-group-ID bit is not
           preserved in the copy's permissions.  If the source file has both
           its set-user-ID and set-group-ID bits on, and either the user ID or
           group ID cannot be preserved, neither the set-user-ID nor set-
           group-ID bits are preserved in the copy's permissions.

     -s    Create symbolic links to regular files in a hierarchy instead of
           copying.

     -v    Cause cp to be verbose, showing files as they are copied.

     -X    Do not copy Extended Attributes (EAs) or resource forks.

     -x    File system mount points are not traversed.

     -c    copy files using clonefile(2)

     For each destination file that already exists, its contents are
     overwritten if permissions allow.  Its mode, user ID, and group ID are
     unchanged unless the -p option was specified.

     In the second synopsis form, target_directory must exist unless there is
     only one named source_file which is a directory and the -R flag is
     specified.

     If the destination file does not exist, the mode of the source file is
     used as modified by the file mode creation mask (umask, see csh(1)).  If
     the source file has its set-user-ID bit on, that bit is removed unless
     both the source file and the destination file are owned by the same user.
     If the source file has its set-group-ID bit on, that bit is removed
     unless both the source file and the destination file are in the same
     group and the user is a member of that group.  If both the set-user-ID
     and set-group-ID bits are set, all of the above conditions must be
     fulfilled or both bits are removed.

     Appropriate permissions are required for file creation or overwriting.

     Symbolic links are always followed unless the -R flag is set, in which
     case symbolic links are not followed, by default.  The -H or -L flags (in
     conjunction with the -R flag) cause symbolic links to be followed as
     described above.  The -H, -L and -P options are ignored unless the -R
     option is specified.  In addition, these options override each other and
     the command's actions are determined by the last one specified.

     If cp receives a SIGINFO (see the status argument for stty(1)) signal,
     the current input and output file and the percentage complete will be
     written to the standard output.

     If cp encounters an I/O error during the copy, then cp may leave a
     partially copied target_file in place.  cp specifically avoids cleaning
     up the output file in error cases to avoid further data loss in cases
     where the source may not be recoverable.  Alternatives, like install(1),
     may be preferred if stronger guarantees about the target_file are
     required.

EXIT STATUS
     The cp utility exits 0 on success, and >0 if an error occurs.

EXAMPLES
     Make a copy of file foo named bar:

           $ cp foo bar

     Copy a group of files to the /tmp directory:

           $ cp *.txt /tmp

     Copy the directory junk and all of its contents (including any
     subdirectories) to the /tmp directory:

           $ cp -R junk /tmp

COMPATIBILITY
     Historic versions of the cp utility had a -r option.  This implementation
     supports that option, however, its behavior is different from historical
     FreeBSD behavior.  Use of this option is strongly discouraged as the
     behavior is implementation-dependent.  In FreeBSD, -r is a synonym for
     -RL and works the same unless modified by other flags.  Historical
     implementations of -r differ as they copy special files as normal files
     while recreating a hierarchy.

     The -l, -s, -v, -x and -n options are non-standard and their use in
     scripts is not recommended.

LEGACY DESCRIPTION
     In legacy mode, -f will override -i.  Also, under the -f option, the
     target file is always unlinked before the copy.  Thus, new access rights
     will always be set.

     In -R mode, copying will terminate if an error is encountered.

     For more information about legacy mode, see compat(5).

SEE ALSO
     install(1), mv(1), rcp(1), umask(2), fts(3), compat(5), symlink(7)

STANDARDS
     The cp command is expected to be IEEE Std 1003.2 (“POSIX.2”) compatible.

HISTORY
     A cp command appeared in Version 1 AT&T UNIX.

macOS 13.1                     February 23, 2022                    macOS 13.1

@zkochan
Member Author

zkochan commented Jan 25, 2023

can you confirm if that is the same clone mode you are referring to?

I think so

@Brooooooklyn

@zkochan, do you have an interest in trying https://github.com/sverrejoh/rclonefile?

@zkochan
Member Author

zkochan commented Mar 2, 2023

We should give it a try. If it works, we can use it.

@Silic0nS0ldier
Sponsor Contributor

Silic0nS0ldier commented Mar 4, 2023

In case it helps, rclonefile@1.0.1 was tested with an internal Yarn Classic fork. Shaved 4 seconds off install duration.

The benchmark was run with:

  • Cache fully populated
  • --ignore-scripts
  • Lockfile existing and valid
  • A very large dependency closure
  • Yarn not using workspaces
  • pnpm using workspaces to aid local dependency discovery
  • pnpm running with --filter to match yarn install scoping

yarn@1.22.19: ~35s
@canva/yarn@0.0.0: ~22s
@canva/yarn@0.0.0 with rclonefile@1.0.1: ~18s
pnpm@7.25.0: ~17s

I also made an attempt to integrate rclonefile with pnpm@019e4f2de7b202a44c903a53e97c9c034a32682a, but it either had no impact on total runtime or I missed some copy logic (quite likely). Timing could also be a factor, mds_stores was determined to run whenever source changes occurred despite Spotlight exclusions. 🤷

EDIT: That said, even if the clonefile syscall doesn't offer a perceivable improvement, the copy-on-write behaviour still means reduced disk space requirements without the corruption risk that using hard links for the same purpose carries.
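
The corruption risk mentioned for hard links can be shown with a short sketch that uses only Node.js's built-in fs API. The file names are hypothetical stand-ins for a store file and a file linked into a project. Because both paths point at the same inode, overwriting one in place changes the other, which a copy-on-write clone would avoid.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'))
// Hypothetical stand-ins for a file in the store and its link inside a project.
const storeFile = path.join(dir, 'store.js')
const projectFile = path.join(dir, 'project.js')

fs.writeFileSync(storeFile, 'original')
fs.linkSync(storeFile, projectFile)

// A hard link is a second name for the same inode.
const sameInode = fs.statSync(storeFile).ino === fs.statSync(projectFile).ino

// Overwriting through the project path truncates and rewrites the shared inode,
// so the "store" copy changes as well.
fs.writeFileSync(projectFile, 'patched')
const storeContent = fs.readFileSync(storeFile, 'utf8')

console.log({ sameInode, storeContent })
```

After the second write, reading the store path returns the patched content, since no separate copy of the data ever existed.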

@bnoordhuis

The libuv PR was just merged and a new libuv release is imminent. It should be available in a node release in a few weeks.

@zkochan
Member Author

zkochan commented Mar 13, 2023

Great, do you know which major versions of Node.js will include the fix?

@bnoordhuis

I'm afraid my crystal ball is cloudy today. The next or next-next v19.x seems like a safe bet, but for older release lines it's up to the discretion of the releaser on duty.

@danielbayley

as I don't have a mac and probably won't have till the war ends.

@zkochan Of all the many reasons to want Putin dead, this one would probably push me over the edge! Donated some BTC/ETH.

@shirotech
Contributor

Contributors are welcomed as I don't have a mac and probably won't have till the war ends.

If you still don't have a Mac, GitHub Actions supports macOS runners, so unit tests can be run there: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners

@zkochan
Member Author

zkochan commented Mar 23, 2023

Thanks!

I do have a Mac now. Bit purchased a workstation for me at the end of November. It helped me work during the winter, when there were regular power outages. Luckily, electricity is fine now too.

@sgarfinkel

sgarfinkel commented May 27, 2023

Tested rclonefile on Mac and it seems to work well, although I'm not certain how to confirm that the result is a clone and not just a copy. @zkochan This seems like a straightforward fix, just a small enhancement to this function, I think:

async function cloneFile (from: string, to: string) {

But it would be helpful if you could confirm. I assume you’re still accepting PRs on this?

The other question, though, is whether pnpm wants to depend on this package or include its own wrapper around this API.

@rubnogueira

The libuv pull request is included in libuv v1.45.0.

Node's master branch was bumped to that version: nodejs/node@9e68f94

The fix will probably be included in Node v20.3.0.

@mischnic

mischnic commented Jun 2, 2023

No, it was reverted because it caused some regressions that couldn't be fixed in a timely manner: libuv/libuv#3987
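
For anyone following along, the libuv version bundled with a given Node.js build can be inspected through process.versions.uv. The sketch below compares it against 1.45.0; the helper names parseVersion and isAtLeast are made up for this example. Because the change was reverted, as noted above, the version number alone does not confirm whether a particular build actually clones files.

```typescript
// Parse a dotted version such as "1.45.0" into its numeric parts.
function parseVersion (version: string): number[] {
  return version.split('.').map((part) => Number.parseInt(part, 10) || 0)
}

// Returns true when `version` is greater than or equal to `minimum`.
function isAtLeast (version: string, minimum: string): boolean {
  const a = parseVersion(version)
  const b = parseVersion(minimum)
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0)
    if (diff !== 0) return diff > 0
  }
  return true
}

// process.versions.uv is the libuv version this Node.js binary was built with.
const bundledLibuv: string = process.versions.uv
console.log(`Node ${process.version} bundles libuv ${bundledLibuv}`)
console.log(`libuv >= 1.45.0: ${isAtLeast(bundledLibuv, '1.45.0')}`)
```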

@sgarfinkel

Will it be backported? This is sort of useless if it's not in the LTS releases.

@sgarfinkel

@zkochan Is this no longer a priority? I noticed it was unpinned.

@zkochan
Member Author

zkochan commented Sep 1, 2023

There is nothing we can do about it.

@sgarfinkel

We can use rclonefile; it worked well in my testing.

zkochan added a commit that referenced this issue Oct 4, 2023
…7031)

close #5001

---------

Co-authored-by: Ignacio Aldama Vicente <sr.drabx@gmail.com>
@sgarfinkel

sgarfinkel commented Oct 4, 2023

Amazing work, thanks @zkochan.

When will this be released to a stable version?

@zkochan
Member Author

zkochan commented Oct 5, 2023

In a few days
