Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various issues with assert.snapshot() #44466

Closed
tniessen opened this issue Aug 31, 2022 · 23 comments
Closed

Various issues with assert.snapshot() #44466

tniessen opened this issue Aug 31, 2022 · 23 comments
Labels
assert Issues and PRs related to the assert subsystem.

Comments

@tniessen
Copy link
Member

tniessen commented Aug 31, 2022

Copied from #44429 (comment):


  • I guess that the made-up serialization format was supposed to be human readable, but it really does not seem appropriate for assertions. I don't expect it to be reliable in the presence of things such as

    { [util.inspect.custom]() { return '#*#*#*#*#*#*#*#*#*#*#*#'; } }
  • util.inspect() potentially omits information, leading to false negatives:

    const a = new ArrayBuffer(101);
    const snapshot1 = util.inspect(a);
    new Uint8Array(a)[a.byteLength - 1] = 1;
    const snapshot2 = util.inspect(a);
    assert.strictEqual(snapshot1, snapshot2);
    // passes even though the inspected value changed!
  • util.inspect() is affected by util.inspect.defaultOptions, which means that two snapshots of the same value can be different, leading to false positives.

  • Looking at the change history for util.inspect(), it seems that the default options changed between versions. This would potentially also break existing .snapshot files, leading to false positives.

  • The documentation of util.inspect() literally says:

    The util.inspect() method returns a string representation of object that is intended for debugging. The output of util.inspect may change at any time and should not be depended upon programmatically.

  • Reading and potentially writing a file at some opinionated location during a call to an assert API seems very unusual to me, especially because it causes the API to become async. It also seems like this could easily lead to unexpected conflicts.

    This is particularly problematic when the main module is not the actual test file (e.g., when using mocha or other test runners).

  • I understand its purpose, but the addition of a CLI option to "refresh" snapshots (during API calls within assert) seems unusual to me (but maybe that's common practice somewhere).


Because this is an experimental feature, we can still revert it or modify it to at least address some of these issues.

@tniessen tniessen added the assert Issues and PRs related to the assert subsystem. label Aug 31, 2022
@BridgeAR
Copy link
Member

BridgeAR commented Aug 31, 2022

A couple of the issues can be overcome by settings the options. I believe serializing with our v8 serializer would indeed be safer?

@benjamingr
Copy link
Member

Copying my comment from the other issue:

I think we generally just missed this in the original PR and hadn't considered the point you're making here and I think that after I've read your point it I agree with you. We should absolutely change the serialization to not always use util.inspect because of the false negatives/positives that might produce.


So, I consider this a bug in the current implementation (since the results aren't always correct) and I think we should modify this.

@benjamingr
Copy link
Member

FWIW Jest also has a addSnapshotSerializer API

@MoLow
Copy link
Member

MoLow commented Sep 1, 2022

A couple of the issues can be overcome by settings the options. I believe serializing with ourv8 serializer would indeed be safer?

@BridgeAR are you referring to https://nodejs.org/api/v8.html#new-serializer ?

@tniessen
Copy link
Member Author

tniessen commented Sep 1, 2022

v8.serialize() is generally safe, however, the output may also depend on the version of V8 and thus on the version of Node.js being used. I believe it is backward compatible, so a newer version of V8 will usually be able to deserialize data produced by an earlier version of V8, but the converse might not be true: data serialized by a newer version of V8 might not be readable by older versions of V8. If snapshots are supposed to be committed to git repositories, this might be a problem.

@MoLow
Copy link
Member

MoLow commented Sep 1, 2022

another approach would be to only accept strings, and escape strings that match snapshot delimiters, then we have no serialization

@benjamingr
Copy link
Member

v8.serialize() is generally safe, however, the output may also depend on the version of V8 and thus on the version of Node.js being used. I believe it is backward compatible, so a newer version of V8 will usually be able to deserialize data produced by an earlier version of V8, but the converse might not be true: data serialized by a newer version of V8 might not be readable by older versions of V8. If snapshots are supposed to be committed to git repositories, this might be a problem.

I think "snapshots generated in newer versions of Node only work in these newer versions" is a fair approach (and if you need to change versions just regenerate the snapshots).

A different alternative would be for us to use Jest's serializer (or one of the other userland ones)

@tniessen
Copy link
Member Author

tniessen commented Sep 1, 2022

another approach would be to only accept strings, and escape strings that match snapshot delimiters, then we have no serialization

Aside from how restrictive this limitation would be, why does assert.snapshot() use some unreliable made-up format instead of JSON or some other well-defined format? (Or even JavaScript files, which is what jest seems to do... not that I necessarily like the comparison.)

I think "snapshots generated in newer versions of Node only work in these newer versions" is a fair approach (and if you need to change versions just regenerate the snapshots).

I am not sure. My understanding of assert.snapshot() is that it is supposed to replace explicit assertions with .snapshot files that are committed to the repository. If someone creates said .snapshot file with version X of Node.js and commits it to some repository and then someone else checks out the repository with version Y < X, it might unexpectedly fail, even if both X and Y are non-EOL versions of Node.js and both support assert.snapshot().


This feature appears sufficiently complicated while easy to implement in userland that I am not sure including it in core is a good idea.

@BridgeAR
Copy link
Member

BridgeAR commented Sep 1, 2022

why does assert.snapshot() use some unreliable made-up format instead of JSON

We would be limited to only accept input that is fully compliant to JSON and that contains no values that are stripped by JSON. Ideally, we serialize as much information as possible to support a wide range of inputs.

@tniessen
Copy link
Member Author

tniessen commented Sep 1, 2022

@BridgeAR My comment was in response to the suggestion to only serialize key-value pairs of strings, in which case JSON would be more than enough :)

@benjamingr
Copy link
Member

I played a bit with moving serialization to the v8.serializer/deserialize API and making it synchronous.

I think that:

  • Making it synchronous is nice and simplifies code - so I am in favor of that change especially since JS doesn't have async stacks (only traces) - so this is an improvement and I'm in favor.
  • Using v8.serialize/deserialize kind of sucks because it's a binary serialization format and it makes one of the most useful aspects of snapshot testing (being able to look at the diff during code review and seeing how behavior change) impossible - you basically end up with binary blocks.

Some thoughts:

  • We can save the snapshot twice (one binary with 100% fidelity using the v8.serialize API and one textual with inspect with infinite depth for DX)
  • We can JSON.stringify with a reasonable reviver which is a pretty common approach and support customSerializer/deserializer and that can take v8.serialize/deserialize for sacrificing developer experience for more supported cases.

@BridgeAR
Copy link
Member

BridgeAR commented Sep 1, 2022

Having a sync API is tricky for big snapshots but we could accept that and just document it so that everyone is aware not to save huge files in a snapshot.

It is possible to prohibit custom inspect and define all options so that the user's custom settings won't impact the output (we should define the options anyway to guarantee the value is properly stored by inspecting hidden values etc.). So the first three mentioned issues above can all easily be resolved. The main issue I see with inspect is that the output may change at any point of time and it would not be backwards compatible. We could improve that by adding the Node.js version to the output and to allow to pass that through to a duplicate implementation that contains the implementation of each version that changed something on the output format. Not nice but it would work.
It would at least still be possible to diff the code during code reviews that way.

We can save the snapshot twice

This might not be sufficient as users might be confused in case the output format changed while switching versions and they just refreshed the snapshot and now there seem to be changes even though that's not actually the case.

@tniessen
Copy link
Member Author

tniessen commented Sep 3, 2022

We can save the snapshot twice (one binary with 100% fidelity using the v8.serialize API and one textual with inspect with infinite depth for DX)

Data redundancy leads to inconsistency, which is something an assert API should avoid IMHO.

So the first three mentioned issues above can all easily be resolved. The main issue I see with inspect is that the output may change at any point of time and it would not be backwards compatible. We could improve that by adding the Node.js version to the output and to allow to pass that through to a duplicate implementation that contains the implementation of each version that changed something on the output format. Not nice but it would work.
It would at least still be possible to diff the code during code reviews that way.

It seems to me that util.inspect() was not at all designed for this purpose, and using it to reliably compare potentially complex data structures still seems like a hacky approach to me.

Finding a simple serialization format suitable for snapshots might be difficult. Jest seems to use CJS files as snapshots, which export the values.


The current implementation appears to unconditionally store the snapshot based on the main module path. Isn't this problematic when using test runners such as mocha, where the main module is not the user's test file?

If this API is going to be complex (custom serializers, where to store snapshots, tooling such as --update-assert-snapshot, matching using asymmetric matchers etc.), does it need to live in core?

@benjamingr
Copy link
Member

The current implementation appears to unconditionally store the snapshot based on the main module path. Isn't this problematic when using test runners such as mocha, where the main module is not the user's test file?

Not when I tested it (either with Mocha nor with the native test runner) but it's worth exploring and I think making the path configurable is definitely something we should (eventually) do.

If this API is going to be complex (custom serializers, where to store snapshots, tooling such as --update-assert-snapshot, matching using asymmetric matchers etc.), does it need to live in core?

Those things don't seem overly complex honestly compared to most stuff that lives in core and I do think snapshot testing is something users would expect to live in core since we ship a test runner.

That said - I think it's totally fine to experiment with it for a while and iron out issues (e.g. fixing serialization) and eventually decide we'd rather vendor it in its own package. I think it'll be easier to estimate the complexity and weigh the trade-off later based on feedback (both from core and from users and adoption).

@tniessen
Copy link
Member Author

tniessen commented Sep 4, 2022

The current implementation appears to unconditionally store the snapshot based on the main module path. Isn't this problematic when using test runners such as mocha, where the main module is not the user's test file?

Not when I tested it

Example using mocha, leading to the arbitrary snapshot path colliding for two separate test files:

tniessen@ubuntuserver:~/dev/snapshot-problems$ node -v
v18.8.0
tniessen@ubuntuserver:~/dev/snapshot-problems$ cat test1.mjs 
import { snapshot } from 'node:assert';
console.log('test1', process.mainModule?.filename, process.argv[1]);
await snapshot({ foo: 'bar' }, 'foo1');
tniessen@ubuntuserver:~/dev/snapshot-problems$ cat test2.mjs 
import { snapshot } from 'node:assert';
console.log('test2', process.mainModule?.filename, process.argv[1]);
await snapshot({ foo: 'bar' }, 'foo2');
tniessen@ubuntuserver:~/dev/snapshot-problems$ npx -y mocha test1.mjs 
test1 /home/tniessen/.npm/_npx/508606763866ae01/node_modules/mocha/bin/mocha.js /home/tniessen/.npm/_npx/508606763866ae01/node_modules/.bin/mocha


  0 passing (1ms)

tniessen@ubuntuserver:~/dev/snapshot-problems$ npx -y mocha test2.mjs 
test2 /home/tniessen/.npm/_npx/508606763866ae01/node_modules/mocha/bin/mocha.js /home/tniessen/.npm/_npx/508606763866ae01/node_modules/.bin/mocha

AssertionError [ERR_ASSERTION]: Snapshot "foo2" does not exist
    at new AssertionError (node:internal/assert/assertion_error:467:5)
    at snapshot (node:internal/assert/snapshot:125:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///home/tniessen/dev/snapshot-problems/test2.mjs:3:1

Those things don't seem overly complex honestly compared to most stuff that lives in core and I do think snapshot testing is something users would expect to live in core since we ship a test runner.

Maybe it should be part of node:test then? node:assert tends to be a stable, unopiniated, and clean API that does not do any I/O and does not need command-line flags etc.

@benjamingr
Copy link
Member

Maybe it should be part of node:test then? node:assert tends to be a stable, unopiniated, and clean API that does not do any I/O and does not need command-line flags etc.

That sounds reasonable to me (though unlike your point on serialization and Colin's about a sync API I don't have a strong feeling of where this should live other than it should be experimental) so:

  • Make the API sync
  • Use JSON.stringify with util.inspect as a reviver for custom stuff JSON doesn't deal with
  • Move it from assert to node:test
  • Change the pathing of where the snapshot is stored and test that with mocha/jest/whatever

@MoLow @tniessen how does that sound?

Additionally - feel free to split those tasks into issues and ask for help - even if we agree on everything it's likely you shouldn't have to be the one to do all the work.

@MoLow
Copy link
Member

MoLow commented Sep 4, 2022

Make the API sync

Souns Ok

Use JSON.stringify with util.inspect as a reviver for custom stuff JSON doesn't deal with

Sounds ok to me - but I am not sure this will address all of the concerns raised here (mainly the stability/backwards compatibility of util.inspect)
the reason why I suggested we should only accept strings is so serialization would be 100% userland's responsibility

Move it from assert to node:test

no strong opinion

Change the pathing of where the snapshot is stored and test that with mocha/jest/whatev

in the case of node:test I think the current setting is fine, I think this should be configurable and use the current setting as a default. the initial implementation accepted a target and we moved that for simplicity

@benjamingr
Copy link
Member

I would strongly prefer to serialize the input using JSON (Which is what most userland snapshot libs tend to do) + known extensions than only accepting strings.

Other than that I agree with everything you said.

@MoLow
Copy link
Member

MoLow commented Sep 6, 2022

CC @tniessen is this recap good enough to start working on it?

@tniessen
Copy link
Member Author

tniessen commented Jan 4, 2023

Sorry about the delay. I strongly feel that the current API is problematic and does not belong in node:assert in its current state. The function should not be used, so removing the API from node:assert would be my first priority. If someone finds a way to fix the various problems, they could suggest adding it to node:test or maybe node:assert again.

@intrnl
Copy link

intrnl commented Feb 9, 2023

is there any changes with regards to the file format that these snapshots are going to be written into?

or did the JSON serialization imply that the snapshot file is going to be in JSON as well? (so not just the snapshot values)

my worry is that JSON will make reading multi-line snapshots harder (especially with diffs)

@tniessen
Copy link
Member Author

tniessen commented Feb 9, 2023

@intrnl No snapshot API exists at the moment. The serialization format of a future snapshot API, if any, has not been defined.

@tniessen
Copy link
Member Author

I am going to close this issue since the problematic API has been removed and because I am not aware of any related feature requests or development activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assert Issues and PRs related to the assert subsystem.
Projects
None yet
Development

No branches or pull requests

5 participants