
Timing problem when piping process.stdin|stdout|stderr #616

Closed
ehmicky opened this issue Dec 17, 2023 · 1 comment · Fixed by #618

Comments


ehmicky commented Dec 17, 2023

Execa helps users pass data to/from processes

We implement multiple features with the following basic idea:

  1. Spawn the process
  2. As soon as spawned, pipe some value to process.stdin or from process.stdout|stderr

The features that use this idea include: the input option, the inputFile option, and the stdio/stdin/stdout/stderr options when their value is a file URL, a file path, a web stream (not a Node.js stream) or an iterable.

All those features revolve around the same idea: making it easy for users to write/read any type of value to/from a process, whether raw strings, files, streams or iterables.
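For illustration, here is a minimal sketch of that strategy (not Execa's actual implementation; the command and input value are arbitrary), writing a raw string to a child process' stdin:

```js
import { spawn } from 'node:child_process';
import { Readable } from 'node:stream';

// 1. Spawn the process, with stdin/stdout/stderr set to 'pipe'
const childProcess = spawn('cat', [], { stdio: ['pipe', 'pipe', 'pipe'] });

// 2. As soon as it is spawned, pipe the value to its stdin.
//    By this point, the child might already have been running for a short while.
Readable.from(['example input\n']).pipe(childProcess.stdin);

childProcess.stdout.pipe(process.stdout);
```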

This simplifies working with processes a lot. The alternative is for users to manually pipe values, which has many pitfalls to avoid (such as proper error handling). Streams and processes are two topics many of us find confusing.
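For example, with the input option, a single Execa call replaces that manual piping (simplified usage sketch; the sort command and input string are arbitrary):

```js
import { execa } from 'execa';

// Execa spawns `sort`, pipes the string to its stdin, and takes care of the
// associated stream error handling, instead of the user wiring it up manually.
const { stdout } = await execa('sort', { input: 'banana\napple\ncherry' });
console.log(stdout); // "apple\nbanana\ncherry"
```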

Why do we implement those features like we do?

I cannot think of another way to implement those features. This is due to child_process.spawn() being quite restrictive with the stdio option:

  • It only allows: pipe/overlapped/null/undefined (which we use as described above), inherit/ipc/ignore (which cannot be used for the above purpose)
  • It also allows passing a Node.js stream. One might think this would be a better way to implement the above features. However, Node.js requires this stream to have an underlying file descriptor, which is a major restriction in our case.
    • Side note: we can't even implement the inputFile option by calling child_process.spawn(..., { stdio: [createReadStream(filePath), ...] }) because createReadStream() does not have an underlying file descriptor until its open event has been emitted. We cannot await that open event because Execa must return the child process synchronously.
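A small illustration of that side note (assuming a hypothetical input.txt file):

```js
import { createReadStream } from 'node:fs';

const fileStream = createReadStream('input.txt');
// No underlying file descriptor yet: the file has not been opened.
console.log(fileStream.pending); // true

fileStream.once('open', (fd) => {
  // The file descriptor only becomes available here, asynchronously.
  // But the child process must be spawned (and returned) synchronously,
  // so we cannot wait for this event.
  console.log(fd);
});
```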

So, I feel like, for the above features, we are stuck with the strategy of: spawn the process, then pipe to/from it.

Problem

However, this strategy creates a race condition: even if we pipe "as soon as" the process is spawned, the process might already have been running for many milliseconds.

This problem is often softened by the following:

  • Processes that read stdin should not assume the data is available on process start, but should wait for it instead.
  • The values written to stdout/stderr are buffered at the OS level.
  • Although processes might hit the stdout/stderr buffer max limit, they should then apply back pressure, i.e. wait for the buffer to drain before continuing to write to it.
  • Processes should not exit until their writes to stdout/stderr have been consumed.

I.e. everything should work fine with well-behaved processes (or underlying runtimes). However, this is a potential gotcha that I felt was worth writing down in an issue, in case anyone is experiencing this problem or has any feedback.
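As a concrete example of a well-behaved consumer, a child script (run as an ES module) can wait for stdin instead of assuming the data is already available when it starts; a minimal sketch:

```js
// child.mjs (sketch): read stdin as it arrives instead of expecting it to be
// ready at start-up, so data piped shortly after spawn is still received.
for await (const chunk of process.stdin) {
  process.stdout.write(chunk);
}
```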


ehmicky commented Dec 17, 2023

Added some tests to document the above behavior, at #618.

See also #474
