
Timing problem when piping process.stdin|stdout|stderr #616

Closed
ehmicky opened this issue Dec 17, 2023 · 1 comment · Fixed by #618

Comments


ehmicky commented Dec 17, 2023

Execa helps users pass data to/from processes

We implement multiple features with the following basic idea:

  1. Spawn the process
  2. As soon as spawned, pipe some value to process.stdin or from process.stdout|stderr

The features that use this idea include: the input option, the inputFile option, and the stdio/stdin/stdout/stderr options when their value is a file URL, a file path, a web stream (not a Node.js stream) or an iterable.

All those features revolve around the same idea: making it easy for users to write/read any type of value to/from a process, whether raw strings, files, streams or iterables.
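For illustration, here is a minimal sketch of that strategy (not Execa's actual implementation; the command and input value are arbitrary), writing a raw string to a child process' stdin:

```js
import { spawn } from 'node:child_process';
import { Readable } from 'node:stream';

// 1. Spawn the process, with stdin/stdout/stderr set to 'pipe'
const childProcess = spawn('cat', [], { stdio: ['pipe', 'pipe', 'pipe'] });

// 2. As soon as it is spawned, pipe the value to its stdin.
//    By this point, the child might already have been running for a short while.
Readable.from(['example input\n']).pipe(childProcess.stdin);

childProcess.stdout.pipe(process.stdout);
```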

This simplifies working with processes a lot. The alternative is for users to manually pipe values, which has many pitfalls to avoid (such as proper error handling). Streams and processes are two topics many of us find confusing.
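For example, with the input option, a single Execa call replaces that manual piping (simplified usage sketch; the sort command and input string are arbitrary):

```js
import { execa } from 'execa';

// Execa spawns `sort`, pipes the string to its stdin, and takes care of the
// associated stream error handling, instead of the user wiring it up manually.
const { stdout } = await execa('sort', { input: 'banana\napple\ncherry' });
console.log(stdout); // "apple\nbanana\ncherry"
```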

Why do we implement those features like we do?

I cannot think of another way to implement those features. This is due to child_process.spawn() being quite restrictive with the stdio option:

  • It only allows: pipe/overlapped/null/undefined (which we use as described above), inherit/ipc/ignore (which cannot be used for the above purpose)
  • It also allows passing a Node.js stream. One might think this would be a better way to implement the above features. However, Node.js requires this stream to have an underlying file descriptor, which is a major restriction in our case.
    • Side note: we can't even implement the inputFile option by calling child_process.spawn(..., { stdio: [createReadStream(filePath), ...] }) because createReadStream() does not have an underlying file descriptor until its open event has been emitted. We cannot await that open event because Execa must return the child process synchronously.
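A small illustration of that side note (assuming a hypothetical input.txt file):

```js
import { createReadStream } from 'node:fs';

const fileStream = createReadStream('input.txt');
// No underlying file descriptor yet: the file has not been opened.
console.log(fileStream.pending); // true

fileStream.once('open', (fd) => {
  // The file descriptor only becomes available here, asynchronously.
  // But the child process must be spawned (and returned) synchronously,
  // so we cannot wait for this event.
  console.log(fd);
});
```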

So, I feel like, for the above features, we are stuck with the strategy of: spawn the process, then pipe to/from it.

Problem

However, this strategy creates a race condition: even if we pipe "as soon as" the process is spawned, the process might already have been running for many milliseconds.

This problem is often softened by the following:

  • Processes that read stdin should not assume the data is available on process start, but should wait for it instead.
  • The values written to stdout/stderr are buffered at the OS level.
  • Although processes might hit the stdout/stderr buffer max limit, they should then apply back pressure, i.e. wait for the buffer to drain before continuing to write to it.
  • Processes should not exit until their writes to stdout/stderr have been consumed.

I.e. everything should work fine with well-behaved processes (or underlying runtimes). However, this is a potential gotcha that I felt was worth writing down in an issue, in case anyone is experiencing this problem or has any feedback.
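As a concrete example of a well-behaved consumer, a child script (run as an ES module) can wait for stdin instead of assuming the data is already available when it starts; a minimal sketch:

```js
// child.mjs (sketch): read stdin as it arrives instead of expecting it to be
// ready at start-up, so data piped shortly after spawn is still received.
for await (const chunk of process.stdin) {
  process.stdout.write(chunk);
}
```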


ehmicky commented Dec 17, 2023

Added some tests to document the above behavior, at #618.

See also #474
