Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.readSync throws EOF Error instead of returning 0 (Windows, if STDIN is a pipe) #35997

Open
RomainMuller opened this issue Nov 6, 2020 · 36 comments
Labels
fs Issues and PRs related to the fs subsystem / file system.

Comments

@RomainMuller
Copy link

RomainMuller commented Nov 6, 2020

  • Version: 14.15.0
  • Platform: Windows 10 x86_64
  • Subsystem: fs

What steps will reproduce the bug?

Given the following node script:

/// script.js
const fs = require('fs');

console.error('Node: ' + process.execPath);
console.error('Version: ' + process.version);
console.error('Platform: ' + process.platform);

console.error('process.stdin: ' + process.stdin.constructor.name);

const fd = 0; // STDIN
const buffer = Buffer.alloc(4_096);

let read;
do {
  read = fs.readSync(fd, buffer, 0, buffer.length, null);
  if (read != 0) {
      console.log('STDIN: ' + buffer.slice(0, read).toString('utf-8'))
  }
} while (read != 0);

console.log('EOF was (normally) reached.')

Will behave differently based on how it is invoked:

$ node script.js
Node: C:\Program Files\nodejs\node.exe
Version: v14.15.0
Platform: win32
process.stdin: ReadStream
Data!
STDIN: Data!
^Z
EOF was (normally) reached.


$ echo 'Data!' | node script.js
Node: C:\Program Files\nodejs\node.exe
Version: v14.15.0
Platform: win32
process.stdin: Socket
STDIN: Data!
internal/fs/utils.js:308
    throw err;
    ^

Error: EOF: end of file, read
    at Object.readSync (fs.js:592:3)
    at Object.<anonymous> (H:\repro\test\script.js:14:13)
    at Module._compile (internal/modules/cjs/loader.js:1063:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1092:10)
    at Module.load (internal/modules/cjs/loader.js:928:32)
    at Function.Module._load (internal/modules/cjs/loader.js:769:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:72:12)
    at internal/main/run_main_module.js:17:47 {
  errno: -4095,
  syscall: 'read',
  code: 'EOF'
}

How often does it reproduce? Is there a required condition?

This issue happens 100% of the time when process.stdin is a Socket instance.

What is the expected behavior?

The expected and documented behavior is that fs.readSync returns 0 when EOF is reached.

What do you see instead?

Instead, an Error: EOF is thrown, which is unexpected.

Additional information

This problem was surfaced as users are starting to see failures on Windows at aws/aws-cdk#11314.

We have additionally seen issues where in similar conditions (process.stdin is a socket/pipe), attempting to perform fs.readSync on file descriptor 0 (aka STDIN), occasionally results in unexpected EAGAIN errors being thrown, which might be related. That was highlighted in aws/aws-cdk#5187 and we are currently working with a pretty ugly workaround that would be nice if removed (but that other issue is a lot more difficult to steadily reproduce, as it does not always trigger, and may be caused by side-effects of another operation).

@RomainMuller
Copy link
Author

The stack trace of the error points at the following code:

node/lib/fs.js

Lines 590 to 592 in c1da528

const result = binding.read(fd, buffer, offset, length, position,
undefined, ctx);
handleErrorFromBinding(ctx);

I basically expect the windows binding returns different codes for Socket than for ReadStream and in turn causes the difference in behavior.

@mmomtchev
Copy link
Contributor

You will get much better results with:

/// script.js
const fs = require('fs');

console.error('Node: ' + process.execPath);
console.error('Version: ' + process.version);
console.error('Platform: ' + process.platform);

console.error('process.stdin: ' + process.stdin.constructor.name);

const buffer = Buffer.alloc(4_096);

let read;
do {
    read = fs.readFileSync('/dev/stdin', buffer, 0, buffer.length, null);
    if (read != 0) {
        console.log('STDIN: ' + buffer.slice(0, read).toString('utf-8'));
    }
} while (read != 0);

console.log('EOF was (normally) reached.');

Don't fiddle with the 0, 1 and 2 file descriptors. They are open in non-blocking mode and libuv doesn't expect this when you are passing a file descriptor.

@RomainMuller
Copy link
Author

RomainMuller commented Nov 9, 2020

Hey @mmomtchev,

read = fs.readFileSync('/dev/stdin', buffer, 0, buffer.length, null);

This is interesting and all... But /dev/stdin is not a thing on Windows, and I cannot require WSL... 🤷🏻‍♂️
Or does fs have some magic tricks to make it look like this is a file that exists?

@mmomtchev
Copy link
Contributor

No, you are right, I can't think of any easy way to synchronously read from stdin on Windows
There are npm modules or you could go the async way
There is an awful solution here http://corpus.hubwiz.com/2/node.js/3430939.html that will simulate blocking access, repeating the call on EAGAIN and breaking the loop on EOF

But the real answer is that Node is built for async I/O

@RomainMuller
Copy link
Author

We already have the ugly code to loop on EAGAIN...
Unfortunately, the use-case I have does not lend itself to async exchange here... As there are ordering guarantees I must maintain.

The real thing I'm concerned with here, is how fs.readSync throws an EOF error... When it is documented to never do this and instead return 0.

@mmomtchev
Copy link
Contributor

It says that the descriptor you are passing must be opened in blocking mode.

@RomainMuller
Copy link
Author

RomainMuller commented Nov 9, 2020

It says that the descriptor you are passing must be opened in blocking mode.

Where though?

I'm looking at: https://nodejs.org/api/fs.html#fs_fs_readsync_fd_buffer_offset_length_position, and the only place on this page where a requirement for fd to be in blocking (or non-blocking for that matters) mode is on createReadStream and createWriteStream, none of which I am using...


In any case, in light of your confirming that we are doing things likely to trip libuv in weird and awkward ways, I've been implementing a more comprehensive (dirty) workaround in aws/jsii#2238 that also catches the EOF error; which would buy us time to figure out if/how we can proceed without needing to do synchronous I/O on our STD{IN,OUT,ERR}...

@mmomtchev
Copy link
Contributor

@RomainMuller in fact, the stdin is not opened in non-blocking mode on Windows, that is the case only on Linux and OSX, so maybe there is a solution to your problem

@mmomtchev
Copy link
Contributor

On Windows libuv does not properly handle pipes when operating in file mode. I wonder if it is supposed to, because normally, there is a separate pipe mode, but it is a trivial change. In any case, this issue should be taken to https://github.com/libuv/libuv
It happens because the writing process exits and closes the anonymous pipe.
Feel free to open it there, referencing this issue.

This is everything that is needed to make it work:

--- a/deps/uv/src/win/fs.c
+++ b/deps/uv/src/win/fs.c
@@ -918,7 +918,7 @@ void fs__read(uv_fs_t* req) {
     SET_REQ_RESULT(req, bytes);
   } else {
     error = GetLastError();
-    if (error == ERROR_HANDLE_EOF) {
+    if (error == ERROR_HANDLE_EOF || error == ERROR_BROKEN_PIPE) {
       SET_REQ_RESULT(req, bytes);
     } else {
       SET_REQ_WIN32_ERROR(req, error);

@mmomtchev
Copy link
Contributor

But then again, you will end up with Windows-specific JS code, which is not the Node way

@mmomtchev
Copy link
Contributor

@ronag is this supposed to work?

@ronag
Copy link
Member

ronag commented Nov 9, 2020

Sorry, sync api's is not my area of expertise.

@mmomtchev
Copy link
Contributor

It is not specific to synchronous mode. This is getting stranger by the minute.
On Linux this leaves stdin in non-blocking mode:

const fs = require('fs');
console.error('process.stdin: ' + process.stdin.constructor.name);
(async () => {
    let read;
    read = fs.createReadStream(null, { fd: 0 })
    for await (const r of read)
        console.log(r)
    console.log('EOF was (normally) reached.')
})();

but this leaves it in blocking mode:

const fs = require('fs');

(async () => {
    let read;
    read = fs.createReadStream(null, { fd: 0 })
    for await (const r of read)
        console.log(r)
    console.log('EOF was (normally) reached.')
})();

The difference is accessing process.stdin.constructor.name. Adding a simple console.error doesn't change anything.

@mmomtchev
Copy link
Contributor

Easiest way to check if a file descriptor is in blocking or non-blocking mode is cat /proc/<pid>/fdinfo/<fd> - look at flags, the fourth digit from the right to the left will be 4 for non-blocking mode

@mmomtchev
Copy link
Contributor

@RomainMuller, please disregard what I said, it seems that Node has a problem with handling consistently stdin across different situations and different platforms

@mmomtchev
Copy link
Contributor

mmomtchev commented Nov 10, 2020

@joyeecheung, the type of stdin is reassigned in the process.stdin getter??

If users are not supposed to be playing with 0, 1 and 2 file descriptors anymore, it should probably be mentioned, and maybe even add a warning
I have seen tons of examples with these, there is a widely used AsyncResource tutorial which does writeSync(1, ...)

@mmomtchev
Copy link
Contributor

mmomtchev commented Nov 10, 2020

So there are three separate issues

  • Should the type of stdin be reassigned in the process.stdin getter - meaning getting the value of process.stdin has side effects
  • Should stdin be a non-blocking tty on Linux - meaning that the 0, 1 and 2 fds become unusable for the user
  • A pipe stdin on Windows when used as a file, does not accept EPIPE as a valid EOF leading to @RomainMuller's original issue

@vtjnash
Copy link
Contributor

vtjnash commented Nov 10, 2020

The 2nd bullet point there has two tracking issues for it already (libuv/libuv#2962 and libuv/libuv#2062)

@ftoh
Copy link

ftoh commented Nov 16, 2020

There are some examples I think have the same reason.
Windows 7 via bash from Git for Windows, Node v13.14.0
works as expected:

$ echo test | node.exe -p 'require("fs").readFileSync(0)'
$ echo test | node.exe -e 'require("fs").readFile(0, console.log)'

doesn't works:

$ curl -s http://ifconfig.me/all.json | node.exe -p 'require("fs").readFileSync(0)'
<...>
Error: EOF: end of file, read
    at Object.readSync (fs.js:578:3)
    at tryReadSync (fs.js:353:20)
    at Object.readFileSync (fs.js:390:19)
    at [eval]:1:15
    at Script.runInThisContext (vm.js:131:20)
    at Object.runInThisContext (vm.js:297:38)
    at Object.<anonymous> ([eval]-wrapper:10:26)
    at Module._compile (internal/modules/cjs/loader.js:1118:30)
    at evalScript (internal/process/execution.js:94:25)
    at internal/main/eval_string.js:23:3 {
  errno: -4095,
  syscall: 'read',
  code: 'EOF'
}

$ curl -s http://ifconfig.me/all.json | node.exe -e 'require("fs").readFile(0, console.log)'
<...>
[Error: EOF: end of file, read] {
  errno: -4095,
  code: 'EOF',
  syscall: 'read'
}

In all cases constructor name of stdin is Socket

@mmomtchev
Copy link
Contributor

@joyeecheung @ronag @mcollina

There are three separate issues here:

  • The first and most important, is that stdio seems to change type based on an auto-detection code. I think that this is a design decision by @joyeecheung . If this is indeed the case, than the users need clarification on the 0, 1 and 2 file descriptors. As, depending on the type of the stdio (pipe, tty, file...), these can switch to non-blocking mode, they should not be used anymore as they are not compatible with the file operations of libuv. This should be made clear, especially since I found at least one popular tutorial on AsyncResource which suggests that people make use of writeFileSync(1, ...) in order to avoid the asynchronous operation of console.log. Currently using the 0, 1 and 2 file descriptors will lead to completely random results, depending on the stdio's type and OS. This is not limited to writeFileSync, all synchronous and asynchronous file operations that bypass the ReadStream on these descriptors are affected.
    Changing this requires rewriting lots of code.

  • The second one is that fact that the stdio reassignment takes places in the getter of process.stdin / process.stdout. I think that this was an oversight when trying to lazy-load the ReadStream. So user code will behave in a radically different manner if one adds a console.log(process.stdin.constructor.name) as @RomainMuller did.
    Fixing this is trivial.

  • The status of the third issue - that a pipe stdin on Windows when used as a file, does not accept EPIPE as a valid EOF leading to @RomainMuller's original issue - depends on the decisions for the first two

@mcollina
Copy link
Member

The first and most important, is that stdio seems to change type based on an auto-detection code. I think that this is a design decision by @joyeecheung . If this is indeed the case, than the users need clarification on the 0, 1 and 2 file descriptors. As, depending on the type of the stdio (pipe, tty, file...), these can switch to non-blocking mode, they should not be used anymore as they are not compatible with the file operations of libuv. This should be made clear, especially since I found at least one popular tutorial on AsyncResource which suggests that people make use of writeFileSync(1, ...) in order to avoid the asynchronous operation of console.log. Currently using the 0, 1 and 2 file descriptors will lead to completely random results, depending on the stdio's type and OS. This is not limited to writeFileSync, all synchronous and asynchronous file operations that bypass the ReadStream on these descriptors are affected.
Changing this requires rewriting lots of code.

This is not really a design decision but a necessity. STDIO can be of different types and the operations on the file descriptors MUST match those cases. I stumbled upon this quite a lot myself. Check out https://www.npmjs.com/package/sonic-boom. This handles most of the edge cases related to use fd and writeFileSync. Note that what looks like random results are in fact not random at all.

https://github.com/nodejs/node/blob/master/lib/internal/bootstrap/switches/is_main_thread.js contains all the logic that create a different implementation depending on the type of the file descriptors.

Note that you can find a lot of deeply incorrect information on blog posts about Node.js.

The second one is that fact that the stdio reassignment takes places in the getter of process.stdin / process.stdout. I think that this was an oversight when trying to lazy-load the ReadStream. So user code will behave in a radically different manner if one adds a console.log(process.stdin.constructor.name) as @RomainMuller did.
Fixing this is trivial.

Can you please articulate this? A PR would be welcomed. It's important that we lazy-load things to speed up the boostrap of Node.js.

The status of the third issue - that a pipe stdin on Windows when used as a file, does not accept EPIPE as a valid EOF leading to @RomainMuller's original issue - depends on the decisions for the first two

This looks like a bug. PR?

@mmomtchev
Copy link
Contributor

@mcollina

Can you please articulate this? A PR would be welcomed. It's important that we lazy-load things to speed up the boostrap of Node.js.

For example, currently, on Linux when stdin is a tty (ie without redirection), fd 0, 1 and 2 are blocking and support readFileSync(0, ...) and readFile(0, ...). As soon as one accesses process.stdin - for example with console.log(process.stdin.constructor.name), they switch to non-blocking mode and readFile(0, ...) fails with EAGAIN as non-blocking file descriptors are not supported by libuv. I don't think this is by design? Unless the design decision is that these descriptors are not to be used by the user code anymore which should be made clear? Maybe this is the best solution if these FDs are going go be handled behind the scenes, switching blocking mode on and off in a transparent way.

This looks like a bug. PR?

If the user is allowed to use FDs 0, 1 and 2, than yes, maybe. The problem is that when stdin is a pipe, it is handled as a Socket and libuv doesn't like it when one uses the file operations on it - this is the libuv issue I created - libuv/libuv#3043 - but its status is not very clear as it is not clear whether this should be allowed or not

@mmomtchev
Copy link
Contributor

@mcollina , on Linux readFileSync('/dev/stdin') works perfectly in all cases, but there is no Windows equivalent. The readStream works perfectly in all cases too. Maybe the best decision is to simply disallow usage of 0, 1 and 2 instead of adding a huge amount of complexity - it already bad enough.

@mcollina
Copy link
Member

For example, currently, on Linux when stdin is a tty (ie without redirection), fd 0, 1 and 2 are blocking and support readFileSync(0, ...) and readFile(0, ...). As soon as one accesses process.stdin - for example with console.log(process.stdin.constructor.name), they switch to non-blocking mode and readFile(0, ...) fails with EAGAIN as non-blocking file descriptors are not supported by libuv.

It's totally possible to handle EAGAIN with synchronous code. See https://github.com/mcollina/sonic-boom/blob/master/index.js#L94-L105 for more details. Note that the sleep operation that is performed is https://github.com/davidmarkclements/atomic-sleep/blob/b8149d3ca276c84a54fa8fa1478f9cc79aabc15a/index.js#L18.

@mcollina , on Linux readFileSync('/dev/stdin') works perfectly in all cases, but there is no Windows equivalent. The readStream works perfectly in all cases too. Maybe the best decision is to simply disallow usage of 0, 1 and 2 instead of adding a huge amount of complexity - it already bad enough.

I'll be -1 to that.

@mmomtchev
Copy link
Contributor

Ok, there is a solution to the non-blocking problem - I find it awful since it is essentially a way around the fundamentals of libuv and Node, but never mind, it is possible and it seems to be what people do - there is another example of this in my comments above

In this case the only thing that needs solving is the pipe closing on Windows - if this is to be allowed - and in this case this is a libuv problem which should support reading from pipes in file mode - libuv/libuv#3043

@mmomtchev
Copy link
Contributor

@mcollina Just to make it clear - the official position on that one is that FD 0, 1 and 2 can be both blocking or non-blocking depending on the situation and the OS, and it is up to the user code to handle this?

@mcollina
Copy link
Member

Just to make it clear - the official position on that one is that FD 0, 1 and 2 can be both blocking or non-blocking depending on the situation and the OS, and it is up to the user code to handle this?

Yes.

@joyeecheung
Copy link
Member

I don't think this is by design? Unless the design decision is that these descriptors are not to be used by the user code anymore which should be made clear? Maybe this is the best solution if these FDs are going go be handled behind the scenes, switching blocking mode on and off in a transparent way.

I don't remember what the original behavior was but it wasn't by design. I vaguely remember (could be wrong) that we used to turn them non-blocking from the start and allowing them to be blocking until you access process.stdin and friends should be just an artifact of lazy-loading them.

@mmomtchev
Copy link
Contributor

@joyeecheung, the current solution when it comes to FDs 0, 1 and 2 is absolutely horrible - ie they tend to transparently switch from blocking to non-blocking, requiring the user to reimplement the async read loop in his code by eventually microsleeping... However me too (and I didn't take @mcollina word for granted, I spent some time reading/testing) I came to the conclusion that there is simply no other solution. I still think that Node should not incite people to write horrible code, so this should not be official 😄 , but jokes apart, people are using these FDs, and most of the time they don't even realize something very weird is going on.

They only viable long-term solution I see is that libuv supports reading from regular files in non-blocking mode - something that would be totally useless except to give Node (and the user code) an uniform abstraction layer.

When it comes to the lazy-loading - at this point it doesn't change anything to not do it - the user code will still have to expect these FDs to be in either mode.

I will only try pushing a PR to libuv for the EOF problem on Windows - now that we all agree that reading from a pipe in file mode should be supported.

@joyeecheung
Copy link
Member

If the confusing part is that the FD switches from blocking to non-blocking once you access process.stdin etc. (due to lazy loading), I guess we could just do something early and make sure that the mode is consistently non-blocking for users (which is probably the old behavior).

@mmomtchev
Copy link
Contributor

Yes, it is exactly what made that issue so confusing for me. Changing this however this will have a detrimental performance effect on process creation for everyone and it won't really accomplish anything - the user will still need to support both blocking and non-blocking FDs

@vtjnash
Copy link
Contributor

vtjnash commented Nov 18, 2020

Changing it to non-blocking is often likely going to be problematic for any native C libraries that nodejs interacts with, as well as any programs that are execv from nodejs. Thus, I don't recommend doing that. It's also potentially quite problematic in the (very infrequent) case where some other external library calls dup2 to replace the stdio handles with something else.

@mmomtchev
Copy link
Contributor

Me too, as an old-school UNIX guy, I was initially appalled by the design decision to have a non-blocking stdio - it is something that is known to break things. But how do you implement pipes in libuv? They are currently built around net.Socket and net.Socket is always in non-blocking mode. And at this point it is not even about blocking or non-blocking - it is about it being sometimes blocking and sometimes non-blocking - which is the major problem. But having a consistent stdio requires that libuv fully supports both files, pipes and ttys in one of those modes - which is not the case. You are libuv, would you consider a PR that adds a completely useless support for blocking I/O on pipes/sockets/ttys (through threads) or non-blocking supports for files (that actually blocks and also requires threads)? And the only reason for doing it would be to have a consistent stdio in Node across all platforms? I am not sure it is worth the effort.
By the way, while researching this, I stumbled upon an effort by DJ Bernstein to standardize a new POSIX file API - that will have separate read_nonblocking / write_nonblocking calls instead of a flag attached to the descriptor. There was even an attempt to implement it in Linux (readv2 / writev2 calls) but the PR has been frozen for years.

@bughit
Copy link

bughit commented Nov 24, 2020

the following patch appears to work (tested with piped curl)

fs.readSync = function(origReadSync) {
  return function newReadSync() {
    try {
      return origReadSync.apply(this, arguments);
    } catch (e) {
      if (e.code === 'EOF')
        return 0;
      else
        throw e;
    }
  };
}(fs.readSync);

@mmomtchev
Copy link
Contributor

@bughit it is a remarkably ugly solution, so I think it will be very appropriate to include it in the read loop with the microsleep 😄
@RomainMuller there is a work around for your issue, this is libuv, so the final fix won't happen before Node 16

@PoojaDurgad PoojaDurgad added the fs Issues and PRs related to the fs subsystem / file system. label Dec 16, 2020
0x2b3bfa0 added a commit to iterative/cml that referenced this issue May 14, 2021
The fs.readSync function throws EOF instead of returning 0 like in other systems.
See nodejs/node#35997 for more information.
DavidGOrtega pushed a commit to iterative/cml that referenced this issue May 18, 2021
The fs.readSync function throws EOF instead of returning 0 like in other systems.
See nodejs/node#35997 for more information.
DavidGOrtega added a commit to iterative/cml that referenced this issue May 19, 2021
* No await runner (#523)

* cml-publish mime and repo (#519)

* cml-publish mime and repo

* cleanup

* image/svg+xml

* buffer

* resolve

* text buffer outputs plain

* native bolean

* snapshot

* Fix pipe reading logic on Windows (#516)

The fs.readSync function throws EOF instead of returning 0 like in other systems.
See nodejs/node#35997 for more information.

* minor doc fixes (#534)

* cml-pr command (#510)

* test-cml-pr

* packages

* pre-release

* log isci

* log isci

* user_name

* log gitlab url

* log gitlab url

* short 8

* len paths

* no undef

* git

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* files

* sha2

* sha2

* log filepath

* log filepath

* log filepath

* exists and  clean

* SHA vs SHA2

* pre-release

* pre-release 2

* pre-release 3

* pre-release 3

* pre-release 3

* pre-release 3

* pre-release 3

* pre-release 3

* git back

* git back

* git back

* release

* REPO_TOKEN

* html_url

* cml pr defaults

* cml pr defaults

* test strict again

* CI

* vars

* shorter params

* snapshots and olivaw

* test david

* BB fix

* snapshots

* No git sha dependencies (#517)

* check head_sha

* 0.4.0

Co-authored-by: Helio Machado <0x2b3bfa0+git@googlemail.com>
Co-authored-by: Casper da Costa-Luis <work@cdcl.ml>
@shnooshnoo
Copy link

Hi @RomainMuller, I wanted to look into this issue, but was not able to reproduce (tried 18.18.2, 20.10.0 and 22.0.0-pre). Do you think this can be closed now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

No branches or pull requests

10 participants