Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reportVersion semantics are not defined #349

Closed
addaleax opened this issue Jan 17, 2020 · 31 comments
Closed

reportVersion semantics are not defined #349

addaleax opened this issue Jan 17, 2020 · 31 comments

Comments

@addaleax
Copy link
Member

The process.report/--experimental-report feature comes with output that contains a version number; however, it is unclear what that version number means and when it is incremented.

For nodejs/node#31386, it would be a good starting point to know if purely additive changes to the JSON format (i.e. only adding previously non-existent keys) should lead to version bumps.

@jasnell also suggested switching to semver, whereas @cjihrig pointed out that in the PR that added versioning, a single-integer versioning scheme was requested. I personally find it hard to make a decision on this question without knowing how consumers are supposed to interact with the version number.

I'm adding this to the WG agenda, I hope that's okay.

@richardlau
Copy link
Member

cc @boneskull

@mhdawson
Copy link
Member

For me I think it depends on whether it's ok to ask consumers to check the Node.js version as well. If a breaking change will be signalled by a bump in the Node.js major, then the version number as is might be ok with it being bumped every time something new is added to the report.

Having said that it may be better to use SemVer and separate out from the Node.js version. For example, a new Node.js major may have no breaking changes in the report so using the Node.js version for when you may need a new version of tools processing reports is sub-optimal.

The previous comparisons to N-API and ABI using a single number don't necessarily apply in my mind because ABI is always SemVer major and N-API is only SemVer minor so you only need 1 number to represent them.

@boneskull
Copy link

as a tooling author consuming these—and I realize I may be the only one—semver seems like an overcomplication; more difficult to parse and of dubious value. if someone wants to argue for semver, I’m happy to listen about what it’ll enable that I’m failing to consider.

if the report format changes in any way (new keys, changed, removed, or even moved since the order is deterministic; changed types of values; consider the entire tree) the version should be bumped.

@gireeshpunathil
Copy link
Member

I agree with @boneskull . Given that report is a diagnostic feature, IMO modification is all about more data, less data or structural changes, and the main party affected is tools. semver means adding more complexity here.

pinging @june07 (who developed an inspector plugin that understands and renders report https://github.com/june07/NiM ) for an opinion

@june07
Copy link

june07 commented Jan 23, 2020

@gireeshpunathil Thank you for including me in the discussion.

While I see the benefit in simplicity vs complexity, I do think that semantic versioning would be the better choice. And given that as software developers we're all very familiar with semver, I'm not even sure I'd categorize using semver as adding to complexity, in fact I think it will help save us from complexity down the road. @boneskull has done a great job with rtk and in fact NiM and related tooling (BrakeCODE) use the rtk library, which demonstrates perfectly the sort of pipelines that do/may exist amongst different software, all which do/may interact with these reports.

In example, JSON fields are currently added to reports that are ingested by BrakeCODE tooling (see screenshot) to encapsulate processing done by the rtk library. While this did not have to be included directly in the report schema, I think doing so is helpful as it eliminates introducing yet another parent data type. But, having an agreed upon standard (semver) of bumping report versions might make things nicer for other potential consumers of the data further down the pipeline, giving better transparency into data schema. NiM is a perfect example, because it is a consumer of reports from BrakeCODE. While adding keys should not equate to breaking changes, it might be nice to have a way to represent those additions via versioning.

it would be a good starting point to know if purely additive changes to the JSON format (i.e. only adding previously non-existent keys) should lead to version bumps.

No with single integer versioning but yes with semver.

I think using a single integer version scheme might be somewhat limiting, non breaking changes being essentially invisible. @mhdawson brings out a good point regarding the Node.js version, having report versioning tightly coupled with Node.js versioning such that report versions are bumped in sync with Node.js is probably bad since changes in one don't necessarily indicate breaking changes in the other. Semver obviously solves this problem. And while more complicated, ultimately we’re all used to the semver paradigm and the complication really only lies in how semver will map to JSON schema changes vs code changes as we’re all very used to, not not much differently I assume.

In working together to agree on the answer to...

it is unclear what that version number means and when it is incremented.

I feel that semver empowers one to answer that question in the most elegant way, while considering more than breaking changes.

@gireeshpunathil
Copy link
Member

thanks @june07 for the insights! Feel free to join our one of the upcoming diagnostic WG meetings when we take this up. the next one is going to be on 29th, but unlikely to pick this one up (due to pre-decided agenda, pinging @hekike for a confirmation. )

@boneskull
Copy link

To be clear, my complexity concern is mainly around avoid string-parsing acrobatics or pulling in the userland semver package.

I see semver is useful for developers who can choose what version of something to consume.

For example, it enables automated upgrades, assuming the contract is upheld. Or you may know you don’t want to upgrade to a new major release, because it’s likely to contain breaking changes. It works pretty well most of the time!

But a tool consuming a report file won’t be able to choose the version of said report file; the concept of an “upgrade” doesn’t really apply to the use case.

A tool will change its behavior based on the version of the report, however. Regardless of what a semver version “means” in a report (e.g., minor version means “new field”), the behavior must still be defined based on information that semver cannot provide.

Example:

If we add a report field called foo in a hypothetical version 2.1.0, a tool will read the version of the report file, and if it’s >=2.1.0 the tool must do whatever it’s going to do with foo. If it is <2.1.0 it will not. Note that the semantic version is unable to tell the tool that foo was added. 😉

If v2.1.1 is a bug fix to field bar, a very similar decision will need to be made—assuming the tool cares about the fix—but this time, based on >=2.1.1 / <2.1.1.

Likewise with a breaking change in v3.0.0. Note that the semver conventions ~ and ^ are not currently applicable, and will not be until we have a situation in which the same behavior is broken more than once between different major releases.

OTOH, if you are using incrementing integers for versions, the decision-making is exactly the same as above example... except a tool doesn’t have to parse semver!

Another problem—this is not technical, but of the “people” variety—is that if a tooling author might see semver and think it’s generally okay to restrict the tool’s operation based on the major release number—because that’s how people use semver! For example, only supporting ^2.0.0, when in fact nobody knows if a v3 report will work or not.

In summary, SemVer was designed to give people better control of automated software upgrades. I just can’t see how our use case fits with that aim.

@jasnell
Copy link
Member

jasnell commented Jan 25, 2020

A simplified two digit semver approach works also here. Just major and minor/patch.

@richardlau
Copy link
Member

Not too keen on a two digit approach. Single digit as @boneskull points out is easier for tools to parse. If we need more digits then semver at least is well defined and has existing parsers that could be reused. If tools were doing their own parsing they could just ignore the patch level (since by definition changes to the patch level should not break them).

@boneskull
Copy link

@jasnell From your point-of-view, what are the advantages of using semver here?

@gireeshpunathil
Copy link
Member

We deliberated in the last(-to-last) WG meeting on this.

Report being a data (as opposed to code), patch does not make sense anyways. So the contention is really between x.y semantics versus x semantics.

I propose a single digit versioning with version bump on every structural changes wherein structural change is anything that:

  • adds a new key
  • removes a key (though chances are less for this to happen)

It is reasonable to expect report parsers to parse it in JSON-native way, instead of line-parsing, char-parsing etc. One of the main motivation for the report to be in JSON format was easy-parsing for consuming tools. Because of this, section movements / aesthetic modifications do not cause a version bump.

It is reasonable to expect no data type changes being applied to the fields.

@mhdawson
Copy link
Member

It is reasonable to expect no data type changes being applied to the fields.

Can we depend on that? Maybe we should add a point to your list that says:

  • changes the data type for a field

@gireeshpunathil
Copy link
Member

makes sense @mhdawson . so the modified proposal is:

a single digit versioning with version bump on every structural changes wherein structural change is anything that:

  • adds a new key
  • removes a key
  • changes the data type for a field

@legendecas
Copy link
Member

legendecas commented Feb 17, 2020

It is reasonable to expect no data type changes being applied to the fields.

Will a format change of string types be counted as a significant change? Like the format of error stacks.

@gireeshpunathil
Copy link
Member

gireeshpunathil commented Feb 18, 2020

Will a format change of string types be counted as a significant change? Like the format of error stacks.

@legendecas - I don't think so, as this sort of change implies no change in the report's functionality, but a change in node core w.r.t data representation.

@mmarchini
Copy link
Contributor

On the other hand, if the formatting of a string changes, clients consuming the report could break, so maybe this should be considered a breaking change (thus bumping the number)?

@boneskull
Copy link

I agree with @mmarchini. @gireeshpunathil said it himself--the report is data, not software. It does not have functionality; it has a format. If that format changes, it should be considered a bumpable change. My preference is that we bump on every change to the format; I don't see the advantage of not bumping.

@jasnell
Copy link
Member

jasnell commented Feb 26, 2020

Changes in format changes absolutely have to be handled as breaking, requiring a version bump.

I'm absolutely fine with a single digit version but that does leave a couple of additional unanswered questions...

(a) Is the version number bumped once per change landed in master?

Let's say the current version is 1, and I land two changes in master in separate commits over the course of 2 weeks. Is the version now 2 or 3?

(b) What is the backport policy?

Let's say that LTS 12.x report version is 3, and master is updated to version 4. Let's say it's an additive change. Are all report changes backportable to 12.x and how do we determine that?

(c) What if a bug is found in an LTS branch that does not exist in master?

Let's say that LTS 12.x report version is 3, and master has had a breaking change that alters the format and bumps the version to 4. Now a bug in the 12.x version 3 report is found and needs to be fixed only in that branch, what does the new version number become?

@june07
Copy link

june07 commented Feb 26, 2020

For those so inclined, I think this would be a good read.
https://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/

@boneskull
Copy link

Let me prefix this by saying I really don't want to beat this to death--IMO there's enough analysis paralysis going on in nodejs without this particular issue. So I'm going to decline to comment after this one.

From the snowplow article:

When versioning a data schema, we are concerned with the backwards-compatibility between the new schema and existing data represented in earlier versions of the schema

I am also concerned with this, which is good!

From their definitions:

ADDITION when you make a schema change that is compatible with all historical data

They go on to offer an example of adding an optional field which would satisfy this requirement. Indeed, this would satisfy the requirement in the context of a well-defined JSON schema which allows arbitrary fields.

OTOH, we do not have a well-defined JSON schema for reports (maybe we should), so we cannot make these guarantees. For example, a consumer may base their parsing of a report based on a field count. An addition would change the field count. Because there's no contract--a schema, in this case--which says extra fields are allowed, we can't claim adding a field won't break a consumer.

From the section about REVISION:

REVISION when you make a schema change which may prevent interaction with some historical data

Essentially, this means "we don't know". Wouldn't it be better to err on the side of caution and go with MODEL, which is equivalent to MAJOR?

So if ADDITION is not applicable, and REVISION is dubious, that leaves a single number.

At this point, anything more than a single number seems like overengineering to me. We can always change it later if we need to.

To @jasnell's questions

(a) Is the version number bumped once per change landed in master?

It's probably easier to say "if you're going to land a PR which changes the format, bump the version" than the alternative, which is... I don't know what it is, but it's a PITA.

(b) What is the backport policy?

What's the prior art?

(c) What if a bug is found in an LTS branch that does not exist in master?

What's a "bug" in terms of the diagnostic report format? 😄

@jasnell
Copy link
Member

jasnell commented Feb 26, 2020

What's the prior art?

full-semver, only patch and minors backported.

What's a "bug" in terms of the diagnostic report format?

Incorrect data being generated, for example.

One possible way to deal with this would be to make the report format version relative to the Node.js major, with the counter reset to zero at each major release. For instance... All 12.x related changes: 12.1, 12.2, 12.3 etc ... All 13.x related changes 13.1, 13.2, 13.3.. and so on.

@addaleax
Copy link
Member Author

@jasnell I think in that case you could just use the Node.js version number. I don’t think such a format would be helpful, because we will most likely maintain backwards-compatibility up to additive changes in the future, and making it relative to the Node.js version number would make it look like there are breaking changes when there are none.

Honestly, I’d just go with semver at this point. In the worst case, it means that we’re exposing more information about the format than necessary, which seems okay to me.

@mhdawson
Copy link
Member

I think the key question raised by @jasnell is around backports. If the number is going to be meaningful across versions then we will have to either backport all of the changes up to a certain report version or none (ie if you want to backport the change which results in version 10 you need to include all changes which resulted in 1,2,3,4,5,6,7,8,9. You can't just pick individual changes out of sequence).

If the number is not going to be meaningful across versions then making it relative to the Node.js version might make sense but I'm not sure that is all that useful as @addaleax mentions.

I think if we commit to backporting so that the numbers are meaningful across versions then a single number is probably ok, but I'm also fine with SemVer.

@jasnell
Copy link
Member

jasnell commented Feb 26, 2020

With regards to the backport issue, it's perfectly valid to declare that the diagnostic report format is not subject to the normal semver rules and therefore breaking changes to the diagnostic format can occur in any release, even within an LTS line. This allows backports to be made any time. I would limit this to format changes. Major changes that impact the overall function of the report mechanism (e.g. removing it, changing the command line flags, etc) would fall under normal semver rules.

@github-actions
Copy link

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Jul 21, 2022
@gireeshpunathil
Copy link
Member

I would like to resurrect this from being stale, and would like to move forward with single digit versioning, and on top of that, what @jasnell suggested in #349 (comment)

problem: right now we don't have a rule around versioning, and that needs to be fixed
opportunity: the above proposal gives a reasonable way forward, we can refine it as we go, if need be.

putting it back to diagnostics WG agenda.

@gireeshpunathil
Copy link
Member

@RafaelGSS - you asked, why can't we attach the containing Node.js version for the report version too? .

here is the issue with that: if there are more than one structural changes in the report that a Node.js version carry, then we cannot differentiate between those.

@No9
Copy link
Member

No9 commented Aug 24, 2022

As agreed in the diagnostics meeting I said I would take a look.

To me #349 (comment) from Chris is a good summary.
Specifically the point on there being no schema in the current implementation making the conversation somewhat academic.

With that I'm +1 on the suggestion from @gireeshpunathil to move this forward with a single number as it reflects the current implementation and can be revisited.

gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Oct 18, 2022
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: nodejs#28121 (comment)
gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Oct 21, 2022
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: nodejs#28121 (comment)
nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Oct 21, 2022
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: #28121 (comment)
PR-URL: #45050
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
@gireeshpunathil
Copy link
Member

this is done via nodejs/node#45050 , closing.

RafaelGSS pushed a commit to nodejs/node that referenced this issue Nov 1, 2022
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: #28121 (comment)
PR-URL: #45050
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
RafaelGSS pushed a commit to nodejs/node that referenced this issue Nov 10, 2022
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: #28121 (comment)
PR-URL: #45050
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
danielleadams pushed a commit to nodejs/node that referenced this issue Dec 30, 2022
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: #28121 (comment)
PR-URL: #45050
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
danielleadams pushed a commit to nodejs/node that referenced this issue Dec 30, 2022
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: #28121 (comment)
PR-URL: #45050
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
danielleadams pushed a commit to nodejs/node that referenced this issue Jan 3, 2023
Diagnostics report has a version number representing its format,
yet its rule is not defined. This doc change specifies the rule.

Refs: nodejs/diagnostics#349
Refs: #28121 (comment)
PR-URL: #45050
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants