
Nix assumes file size reported by fstat is accurate when copying files to the Nix store #10667

Open · LordMZTE opened this issue on May 8, 2024 · 4 comments

@LordMZTE commented on May 8, 2024

Describe the bug

When a file is copied to the Nix store from a local path, and an fstat on that file returns a size of 0, an empty file will end up in the Nix store even though the file may yield more data when read.

This mismatch between actual file size and reported file size is considered absolutely legal under Linux (citation needed). Nix should handle it correctly.
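For context, this kind of mismatch is easy to observe outside of FUSE as well: many procfs files report a size of 0 from stat but still return data when read. The following is a minimal sketch (not Nix code) that compares the fstat-reported size with the number of bytes actually readable; /proc/version is used here as an example of such a file:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    // /proc/version is used as an example of a file whose stat size (0)
    // does not match the data that read() returns.
    const char *path = "/proc/version";

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st{};
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return 1; }

    // Count how many bytes read() actually yields until EOF.
    char buf[4096];
    ssize_t n, total = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);

    std::printf("fstat size: %lld, bytes readable: %lld\n",
                (long long) st.st_size, (long long) total);
    return 0;
}
```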

Steps To Reproduce

The filesystem I used to reproduce this is my own FUSE3-based filesystem, confgenfs, which behaves this way by design: files are generated on demand and may differ every time they are read, so a correct size cannot be reported up front. I also recall observing this behavior with other (typically FUSE-based) filesystems, but did not find another one while writing this issue.

  1. Set up a filesystem that behaves as stated, such as confgenfs.
  2. Observe that stat reports a size of 0, that reading the file yields data, and that copying the file to the store nevertheless produces an empty store path:
sh-5.2$ stat confgenfs/.gtkrc-2.0
  File: confgenfs/.gtkrc-2.0
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: 0,48	Inode: 18          Links: 1
Access: (0444/-r--r--r--)  Uid: ( 1000/lordmzte)   Gid: ( 1000/lordmzte)
Access: 1970-01-01 01:00:00.000000000 +0100
Modify: 1970-01-01 01:00:00.000000000 +0100
Change: 1970-01-01 01:00:00.000000000 +0100
 Birth: -
sh-5.2$ cat confgenfs/.gtkrc-2.0
gtk-icon-theme-name="candy-icons"
gtk-cursor-theme-name="LyraX-cursors"
gtk-theme-name="Catppuccin-Mocha-Standard-Red-Dark"
gtk-font-name="Iosevka NF 11"
sh-5.2$ nix eval --impure --expr '"${./confgenfs/.gtkrc-2.0}"'
"/nix/store/0mcflvh26xlj0qhlw748yy7ix31666vg-.gtkrc-2.0"
sh-5.2$ cat /nix/store/0mcflvh26xlj0qhlw748yy7ix31666vg-.gtkrc-2.0
sh-5.2$ 
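For illustration, a filesystem with this behavior can be sketched with libfuse roughly as follows. This is a hypothetical stand-in for confgenfs, not its actual code: getattr reports st_size = 0 because the content is only produced on demand, while read generates and returns the data.

```cpp
// Minimal sketch of a FUSE filesystem whose getattr reports st_size == 0
// even though read() returns data. Hypothetical stand-in for confgenfs.
// Assumed build command: g++ -std=c++17 sketch.cc $(pkg-config --cflags --libs fuse3)
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>

#include <algorithm>
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/stat.h>

static const char *kPath = "/.gtkrc-2.0";

// Content is "generated" on every read; its length is unknown to getattr.
static std::string generate() {
    return "gtk-icon-theme-name=\"candy-icons\"\n";
}

static int cg_getattr(const char *path, struct stat *st, struct fuse_file_info *) {
    std::memset(st, 0, sizeof *st);
    if (std::strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0555;
        st->st_nlink = 2;
        return 0;
    }
    if (std::strcmp(path, kPath) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = 0;  // size cannot be known without generating the file
        return 0;
    }
    return -ENOENT;
}

static int cg_read(const char *path, char *buf, size_t size, off_t off,
                   struct fuse_file_info *) {
    if (std::strcmp(path, kPath) != 0)
        return -ENOENT;
    std::string data = generate();
    if ((size_t) off >= data.size())
        return 0;  // EOF
    size_t n = std::min(size, data.size() - (size_t) off);
    std::memcpy(buf, data.data() + off, n);
    return (int) n;
}

int main(int argc, char *argv[]) {
    struct fuse_operations ops = {};
    ops.getattr = cg_getattr;
    ops.read    = cg_read;
    return fuse_main(argc, argv, &ops, nullptr);
}
```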

Expected behavior

The file should be read until read() (or whatever syscall is used) returns a length of 0, instead of stopping at the length returned by fstat, so that the file is correctly copied to the store.
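In other words, the copy loop would ideally look something like the following sketch (plain POSIX reads, not Nix's actual copying code), which keeps reading until EOF regardless of what fstat reported:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Copy src to dst by reading until read() returns 0, ignoring st_size.
// A sketch of the expected behavior, not Nix's actual implementation.
static bool copy_until_eof(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) return false;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return false; }

    char buf[64 * 1024];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out, buf + off, n - off);
            if (w < 0) { close(in); close(out); return false; }
            off += w;
        }
    }
    close(in);
    close(out);
    return n == 0;  // 0 means clean EOF; a negative value means a read error
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
        return 1;
    }
    return copy_until_eof(argv[1], argv[2]) ? 0 : 1;
}
```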

nix-env --version output
nix-env (Nix) 2.22.0

Additional context

Nix does actually open the file in this case, but it does not read all of the data from it; it stops at the size reported by the filesystem.


@LordMZTE added the bug label on May 8, 2024
@edolstra (Member) commented:

This is hard to support because the NAR file format puts the size of the file before its contents. So if we can't rely on the file size reported by lstat(), we would have to read the entire file into memory first, which would be a problem for supporting large files.

Related: #10019
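For reference, here is a simplified sketch of how a regular file is serialized in a NAR (my paraphrase of the format, not Nix's code): the contents are written as a length-prefixed string (64-bit little-endian length, then the bytes, zero-padded to an 8-byte boundary), so the writer has to know the full size before it can emit any content bytes. The full archive additionally starts with a version magic and wraps directories, which is omitted here.

```cpp
#include <cstdint>
#include <iostream>
#include <ostream>
#include <string>

// Write a NAR-style string: u64 little-endian length, the bytes, then zero
// padding to an 8-byte boundary. Simplified sketch of the nix-archive
// encoding, not taken from Nix's sources.
static void writeString(std::ostream &out, const std::string &s) {
    uint64_t len = s.size();
    unsigned char lenBuf[8];
    for (int i = 0; i < 8; ++i)
        lenBuf[i] = (len >> (8 * i)) & 0xff;
    out.write(reinterpret_cast<char *>(lenBuf), sizeof lenBuf);
    out.write(s.data(), s.size());
    static const char zeros[8] = {0};
    out.write(zeros, (8 - s.size() % 8) % 8);
}

// Serialize a regular-file entry. The contents must be fully known up front:
// their length is written before their bytes, which is why a streaming writer
// cannot simply "keep reading until EOF" at this point.
static void writeRegularFile(std::ostream &out, const std::string &contents,
                             bool executable) {
    writeString(out, "(");
    writeString(out, "type");
    writeString(out, "regular");
    if (executable) {
        writeString(out, "executable");
        writeString(out, "");
    }
    writeString(out, "contents");
    writeString(out, contents);  // length prefix comes before the data
    writeString(out, ")");
}

int main() {
    writeRegularFile(std::cout, "hello\n", false);
    return 0;
}
```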

@LordMZTE (Author) commented:

I see. Here are a few thoughts on how this could theoretically be solved. Note that I don't really know how feasible these are, since I'm not familiar with how Nix works internally:

  • Read the file as we do now, but then attempt to continue reading. If this returns more data than expected, keep adding to the NAR file and then seek back to the beginning of the file, updating the size.
    • This wouldn't work if the NAR file is created in a streaming fashion.
  • Buffer the entire file into memory or a temporary file if and only if the filesystem reports a size of zero and a read call yields data (see the sketch at the end of this comment).
    • This would still break for filesystems that report a non-zero size that is smaller than the actual file size.

Some things to consider:

  • What if the filesystem reports a size larger than the actual file? Would this lead to an invalid NAR file being generated, or to an error? We may be able to employ a strategy similar to my first suggestion above.
  • How do we handle infinitely large files? What should happen if someone attempts to use, for example, /dev/random? Should there be a hard limit on how large a file can be, and should that limit apply only to files that report a size of zero? (The sketch below includes such a guard.)
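To make the second idea above concrete, here is a sketch that trusts a non-zero reported size (as today) but falls back to buffering until EOF when the reported size is 0, with a hard cap so that something like /dev/random cannot be buffered forever. The function name and the limit value are made up for illustration; this is not Nix code:

```cpp
#include <algorithm>
#include <cstdio>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical cap for files whose reported size is 0; the value is made up.
static const size_t kZeroSizeReadLimit = 64 * 1024 * 1024;

// Sketch only: trust a non-zero st_size as an upper bound (current behavior),
// otherwise read until EOF, bailing out past the hard limit.
static std::string readFileForNar(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open file");

    struct stat st{};
    if (fstat(fd, &st) < 0) { close(fd); throw std::runtime_error("fstat failed"); }

    std::string data;
    char buf[64 * 1024];
    ssize_t n;

    if (st.st_size > 0) {
        // Non-zero reported size: read at most that many bytes, as today.
        size_t remaining = st.st_size;
        while (remaining > 0 &&
               (n = read(fd, buf, std::min(remaining, sizeof buf))) > 0) {
            data.append(buf, n);
            remaining -= n;
        }
    } else {
        // Reported size 0: fall back to reading until EOF, with a hard cap.
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            data.append(buf, n);
            if (data.size() > kZeroSizeReadLimit) {
                close(fd);
                throw std::runtime_error("zero-size file grew beyond the hard limit");
            }
        }
    }
    close(fd);
    return data;
}

int main(int argc, char *argv[]) {
    if (argc != 2) return 1;
    std::printf("read %zu bytes\n", readFileForNar(argv[1]).size());
    return 0;
}
```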

@fricklerhandwerk (Contributor) commented:

Triaged in Nix team meeting:

  • @roberth: we could special-case the 0: if the size is reported as 0, we could still try reading the file and buffer it.

    • @edolstra: at least that would work. I wonder what POSIX has to say about what should happen when the file size is not reported correctly (such files do exist, e.g. under /proc).
  • @edolstra: we used to buffer essentially entire files in memory, but that simply doesn't scale.

    • it could be done, but it's hard to do in constant memory.
  • "This wouldn't work if the NAR file is created in a streaming fashion."

    @Ericson2314: It is created in a streaming fashion; we want to use constant space and often create NARs directly into a sink such as a pipe or socket.

  • "This mismatch between actual file size and reported file size is considered absolutely legal under Linux (citation needed)"

    @Ericson2314: We would love to see that citation :)

@nixos-discourse commented:

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-05-15-nix-team-meeting-minutes-146/45491/1
