Separate subtest stdout from TAP output #45529

kmannislands · 2022-11-19T23:58:03Z

What is the problem this feature will solve?

Currently, there is no separation between test TAP output coming from node:test and arbitrary user stdout written from the test.

This is problematic in cases like:

import { test } from "node:test";

test("passes", async () => {
  await new Promise((res) =>
    setTimeout(() => {
      console.log("not ok");
      res();
    }, 10)
  );
});

This test will either output invalid tap by writing not ok before the TAP version 14 line or output invalid TAP by writing not ok after the TAP version line and therefore reporting the results of more tests than the plan allows for.

This is clearly a contrived example but more generally, this introduces the following problems:

Spec compliance

The TAP version 13 contains wording around this

Any output that is not a version, a plan, a test line, a YAML block, a diagnostic or a bail out is incorrect. How a harness handles the incorrect line is undefined. Test::Harness silently ignores incorrect lines, but will become more stringent in the future. TAP::Harness reports TAP syntax errors at the end of a test run.

The TAP version 14 spec comments on this as well:

Any line that is not a valid version, plan, test point, YAML diagnostic, pragma, a blank line, or a bail out is invalid TAP.

A Harness may silently ignore invalid TAP lines, pass them through to its own stderr or stdout, or report them in some other fashion. However, Harnesses should not treat invalid TAP lines as a test failure by default.

My interpretation of these two specs is that any test that writes to std out falls in one of two buckets
- Outputs valid tap - See 2
- Ouputs anything else - Outputs invalid TAP, making the job of any parser implementation more difficult
In any case, I would expect that a language core test runner should output valid TAP by default when users write to stdout.
Accidental control flow via console.log
console.loging or otherwise writing string like ok, not ok to stdout will impact harness's ability to interpret test results. Hopefully we can agree this is undesirable.
Ambiguity between stdout from different subtests within a file
This is secondary to this feature request but it would be quite nice if in addition to solving 1 and 2 above, the stdout from each subtest could be cleanly separated by node:test. This would enable richer userland implementations on top of run to do common test runner things like only show logs written from a failing subtest.

What is the feature you are proposing to solve the problem?

De-interleaved stdout output from TAP output of the node test runner tests. See proposals below

What alternatives have you considered?

Some ideas:

Use a separate channel to communicate test result output

Here's a diagram representing how it works today:

Each box is a fully fledged OS process. Running a single test directly with node ./path/to/file.test.js outputs TAP just like running a set of tests with node --test does. This is a pleasing application of composite design and this behavior should be preserved through any changes.

The proposal here is that an additional channel is introduced:

This solution solves problems 1 and 2 but not 3. Assuming separate processes are still used, an implementation would likely involve using an IPC channel between the head program and the test executor program. The executor program would detect the presence of an IPC channel and post messages to that IPC channel instead of stdout when it is present. Since the IPC channel messages are an implementation detail they do not have to be TAP. JSON seems like a natural choice but any encoding scheme is possible.

A toy implementation of this design showing the messaging channel mechanism and how to cleanly vary between it and direct stdout: https://github.com/kmannislands/node-test-executor-proposal

Pros:
- Potential for increased efficiency - It should be possible to outperform the current implementation by picking an efficient and simple to encode/decode messaging format. Taken to its extreme, a custom binary encoding scheme could be used for max perf with no user-facing impact. After all, test pass/fail could be one bit far fewer than ok/not ok in UTF-8.
- Simplifies run implementation significantly - Obviates the TAP parser in run, which accounts for a few thousand lines of relatively complex code. Also assuming IPC channels, test messages will arrive and be processed in full removing a lot of the current complexity around handling results serialized as TAP strings to streaming stdout.
- Lends itself more easily to a user-facing API for run - Currently, the return value of run is a TAPStream where users are exposed to quite a bit of implementation detail about TAP. This strikes me as a leaky abstraction--as a user I would expect the return value of run to be like TestResultStream and emit messages that make sense in the domain of a test runner rather than expose me to internal detail of TAP via a TAP text stream or TAP lexer tokens.
Cons:
- Introduces variance in behavior between running a test directly with node and running via run or node:test
- Users can still potentially write to the process's IPC channel, introducing a similar problem to before albeit to a lesser extent since IPC comminication is much more niche in node than console.log. This could be remediated via a special encoding of messages on the IPC channel or probably something else more elegant like a custom additional IPC channel for node:test since this is a core implementation.
- Also diverges from TAP spec on a should
  
  A harness that is collecting output from a test program should read and interpret TAP from the process’s standard output, not standard error. (Handling of test standard error is implementation-specific.)
  
  That being said, it's a should and when pursuing this design, I don't think the TAP spec necessarily applies.
Include stdout either in the yaml diagnostic block or as diagnostic/comment lines
Not sure the best way to implement something like this in the node core but the idea could potentially solve all 3 problems mentioned above while preserving the high-level architecture of node test.

The proposal here is that test std out writes are intercepted and either written to the TAP output as comments/diagnostics or in the body of the yaml diagnostic.

So the desired outcome is something like the contrived example above would output TAP like:
```
TAP version 13
# test-1: not ok
# Subtest: passes
ok 1 - passes
  ---
  duration_ms: 2.229
  ...
1..1
```
or
```
TAP version 13
# Subtest: passes
ok 1 - passes
  ---
  duration_ms: 2.229
  stdOut: |-
    not ok
  ...
1..1
```
Pros:
- Minimal change to the current architecture of node:test
- No difference between direct test runs and running via run
- Solves problem 3, enabling cooler stuff to be built with node:test!
  Cons:
- Seems likely to introduce some funny business with process.stdout
Redirect test stdout to stderr
This is the simplest option to implement. The proposal here is that when running a test, process.stdout is changed by side-effect of calling test or describe to reference process.stderr instead. The reference to stdout should be held on to and only accessible by node:test this ensuring that only node:test can write to it.

The text was updated successfully, but these errors were encountered:

kmannislands · 2022-11-19T23:58:59Z

cc @MoLow @cjihrig keen to hear your feedback.

cjihrig · 2022-11-20T03:58:30Z

Spec compliance

I'm not convinced that this is a spec violation. TAP exists on stdout, which is also where most test authors are likely to dump their debugging logs. I imagine this is also why the spec doesn't define any harness behavior other than NOT treating the test as a failure.

It would be nice to take any invalid lines and format them as a diagnostic or part of the YAML block. I believe the TAP parser PR does the former.

Accidental control flow via console.log

As far as I know, this is a common thing with TAP-first tools. I know it's possible with node-tap. I've also seen mention of this from other ecosystems. For example, ctap's diag() function specifically calls out that it does not interfere with test output. I haven't dug into all of the tools from other ecosystems, but I encourage you to survey them so we can coexist with them.

Ambiguity between stdout from different subtests within a file

This is definitely something that we can improve. I believe the TAP parser PR helps out here. We also want to do something similar with stderr.

Use a separate channel to communicate test result output

In addition to the Cons you listed, this would only work in the context of the Node.js CLI runner. In theory, you should be able to mix and match TAP tools, including existing parsers from npm and even other languages.

Spawning Node with an IPC channel also changes the process - for example, by defining process.send(). I think we should keep the test processes as close to the default configuration as possible to avoid surprising people with additional edge cases. I think the child processes could be spawned with additional non-IPC stdio to avoid this case, but the other Cons remain.

Include stdout either in the yaml diagnostic block or as diagnostic/comment lines

I agree with this. I think this ties in with the "Ambiguity between stdout from different subtests within a file" point. I don't think this is something that can be done with 100% accuracy though.

Redirect test stdout to stderr

I don't think we should do this. Again, the more things we tweak, the more likely we are to surprise people and need to document edge cases.

My personal opinion is that we should try to capture as much non-TAP stdout and stderr as we can from the tests and show them as diagnostics/YAML, but the rest of the concerns here are not really a big deal (In years of working on projects using TAP I've personally never ran into an issue related to tests logging things to stdout).

kmannislands · 2022-11-23T00:54:02Z

Taking a step back, a few questions that I may have incorrectly assumed the answers to that could help me understand your positions better:

What are the design goals of node:test? Why include it in core?

Why was TAP selected for node:test?

cjihrig · 2022-11-23T01:14:04Z

I think those questions are mostly covered in the issue suggesting adding a test runner to core: #40954

Speaking only for myself here: I want a simple, fast test runner that is specifically designed for Node code (not browsers or any other environment). I also don't want to install a ton of dependencies for something that I need on literally every project because the npm registry is hard for me to trust. Also, adding tooling like a test runner is becoming increasingly common for JS runtimes (see Deno and Bun for example). I don't know if I would have picked TAP myself if others hadn't brought it up, but it is used widely (including in Node already), is pretty simple, and does the job well enough.

EDIT: I should clarify that I trust the npm registry itself. It's the module authors that I don't trust.

iambumblehead · 2022-12-01T01:26:58Z

I'm just sort of lurking here but, what a nice response

kmannislands · 2022-12-01T02:32:50Z

Agreed, thanks for taking the time @cjihrig and thanks for linking that issue. I hadn't stumbled upon that yet and it's really useful for understanding the design decision process behind node:test. Solid thinking there and the motivations align with what I'd hope for personally.

FWIW, my motivations for getting involved are very similar. Working on a little passion project, I couldn't stomach the literal hundreds of dependencies that a vanilla jest install brings in so I decided to check out node:test.

Working with the run API, I started figuring out how to parse the TAP stream. Read up on the spec and realized that sticking with the goal of zero dependencies, the nested YAML document was going to be a problem. The parser that node has now is a great start but I can't help but think there are still problems lurking around those YAML docs down the road. Implementing a spec compliant YAML parser from scratch is a huge lift and I think the main friction between the JS and TAP worlds.

Assuming the IPC message approach is a dead-end, I'd like to talk about how the TAP frictions can be alleviated. Did you have any thoughts on my proposal to use a JSON subset of YAML for the nested docs in node:test TAP output? #43344 (comment)

At some point (around when this stops being marked as experimental), it would be good to write a spec that extends say TAP 14 and additionally specifies the contents/structure of the embedded YAML doc.

cjihrig · 2022-12-01T20:25:13Z

I haven't given a lot of thought to parsing YAML. The TAP spec doesn't currently call for any YAML parsing, and I think we can get pretty far without needing to parse it. For example, we want to add reporter support. In core we could log the YAML as is, regex out specific fields that we need, or possibly support parsing some subset of YAML that we need. Userland reporters would be free to npm install anything they want/need. Maybe @MoLow or @manekinekko have additional thoughts there?

Did you have any thoughts on my proposal to use a JSON subset of YAML for the nested docs in node:test TAP output?

I'm not sure about that. It will make the YAML easier to parse at the cost of making it more difficult to read.

it would be good to write a spec that extends say TAP 14

I doubt that we will write any spec, but once the test runner is stable we can discuss treating the TAP output according to semver or some other versioning scheme (similar to what we do with diagnostic reports already).

manekinekko · 2022-12-02T10:22:38Z

possibly support parsing some subset of YAML that we need.

Actually, we already look for specific keys in the YAML block (in the TAP parser). We can add more rules to this logic.

node/lib/internal/test_runner/tap_parser.js

Lines 899 to 922 in d385106

    
           #YAMLLine() { 
        
             const yamlLiteral = this.#readNextLiterals(); 
        
             const { 0: key, 1: value } = StringPrototypeSplit(yamlLiteral, ':'); 
        
             // Note that this.#lastTestPointDetails has been cleared when we encounter a YAML start marker 
        
             switch (key) { 
        
               case 'duration_ms': 
        
                 this.#lastTestPointDetails.duration = Number(value); 
        
                 break; 
        
               // Below are diagnostic properties introduced in https://github.com/nodejs/node/pull/44952 
        
               case 'expected': 
        
                 this.#lastTestPointDetails.expected = Boolean(value); 
        
                 break; 
        
               case 'actual': 
        
                 this.#lastTestPointDetails.actual = Boolean(value); 
        
                 break; 
        
               case 'operator': 
        
                 this.#lastTestPointDetails.operator = String(value); 
        
                 break; 
        
             } 
        
             ArrayPrototypePush(this.#yamlBlockBuffer, yamlLiteral); 
        
           }

MoLow · 2023-02-07T09:14:15Z

now that both TAP parser and reporters landed - closing this as not planned

kmannislands added the feature request Issues that request new features to be added to Node.js. label Nov 19, 2022

MoLow added the test_runner label Nov 29, 2022

MoLow closed this as not planned Won't fix, can't repro, duplicate, stale Feb 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate subtest stdout from TAP output #45529

Separate subtest stdout from TAP output #45529

kmannislands commented Nov 19, 2022

kmannislands commented Nov 19, 2022

cjihrig commented Nov 20, 2022

kmannislands commented Nov 23, 2022

cjihrig commented Nov 23, 2022 •

edited

iambumblehead commented Dec 1, 2022

kmannislands commented Dec 1, 2022

cjihrig commented Dec 1, 2022 •

edited

manekinekko commented Dec 2, 2022

MoLow commented Feb 7, 2023

Separate subtest stdout from TAP output #45529

Separate subtest stdout from TAP output #45529

Comments

kmannislands commented Nov 19, 2022

What is the problem this feature will solve?

What is the feature you are proposing to solve the problem?

What alternatives have you considered?

kmannislands commented Nov 19, 2022

cjihrig commented Nov 20, 2022

kmannislands commented Nov 23, 2022

cjihrig commented Nov 23, 2022 • edited

iambumblehead commented Dec 1, 2022

kmannislands commented Dec 1, 2022

cjihrig commented Dec 1, 2022 • edited

manekinekko commented Dec 2, 2022

MoLow commented Feb 7, 2023

cjihrig commented Nov 23, 2022 •

edited

cjihrig commented Dec 1, 2022 •

edited