Discussion: Criteria for upgrading diagnostic reporting from experimental to stable #26293

cjihrig · 2019-02-24T19:54:08Z

I don't think it's too early to start discussing the criteria for eventually moving the new diagnostic reporting functionality out of experimental status. As a diagnostic tool, the criteria might be different from n-api, http2, or workers, as the ecosystem is less likely (though not impossible) to build other modules on top of the core functionality.

gireeshpunathil · 2019-02-25T04:09:51Z

@cjihrig - here are few questions / considerations that I can think of:

user interfaces (flags, API semantics): are they consumable in the current form, and ratified by 'some' real users? [ my take is that we should; may be evangelize through few conferences and then collect some feedback ]
or aret those need to be ratified by end users, instead our conviction is sufficient?
platform coverage: I guess we don't have any platform skipped for report that Node.js is supported.
test coverage: I don't know the latest on this, but there were few new flags that came up which did not have coverage, when I ported it.
worker support: do we need to support workers, to be able to call it stable? [ my take is we don't need to, this is something we can do later without changing the external interfaces ]
if so, what does it mean by worker support? many of the sections in the report are process-scopped (versions, command line, shared libs, etc.) while few are environment / isolate scopped (call stack, JS heap, uv loop etc.). A full worker support in my opinion would be: those sections should have an array of artifacts, covering all the workers and the main. In addition, worker data inclusion could be configurable.
should the unhandled native crashes (GPF) be supported under fatalexception? Right now this is not supported, as GPF handling requires signal safe operations, and report needs to be heavily refactored. [ my take is that it depends on the number of such cases we get, as well as end user suggestions ].
right now there is a bug in the report. When fatal error occurs with OOM, a report is generated irrespective of the configuration. This is because the env data where we store the config data is unavailabe in fatal error contexts; so we cannot figure out what the user request was. This could be solved in a couple of ways, but worker support decision can potentially influence how we want to attack this.

Overall, I agree that it is reasonable to discuss the experimental exit criteria at this point, and identify / action on items that we agree upon.

cjihrig · 2019-04-03T01:26:09Z

I think this list can be condensed a bit down to:

User adoption. Again, I don't think many modules will be built on top of this feature.
Test coverage. I checked the coverage, and I think this metric is beyond good enough for classification as stable. Some code paths that are covered are not reported in our coverage reports.
Handling of GPF and OOM crashes. I would classify these as bugs, but not blocking bugs.
Worker support.

I'd really like to see this move out of experimental before Node 12 goes LTS.

richardlau · 2019-06-07T16:30:26Z

nodejs/diagnostics#306 proposing versioning the report format.

cjihrig · 2019-06-07T16:47:46Z

Proof of concept for the versioning thing at #28121.

richardlau · 2020-01-17T00:02:01Z

Support for Workers: #31386

cjihrig · 2020-02-09T17:49:05Z

I'm working on unflagging this. These are the changes I'm implementing:

Make the node_report configure option a no-op. The idea is that report will always be available moving forward.
Make the --experimental-report CLI flag a no-op and remove its uses in the code and docs. The experimental warning will also be removed.
Make the --report-on-fatalerror CLI flag experimental. It seems like most remaining issues are related to this flag and things like OOM scenarios. I don't think it's worth keeping everything else experimental because of this.

I'll follow up with semver-major changes:

Remove the no-op node_report configure option.
Remove the no-op --experimental-report CLI flag.

Is anyone opposed to this plan? It seems like the report feature is mostly stable at this point, and it would be great to get more people using it by unflagging.

sam-github · 2020-02-09T18:27:18Z

LGTM, but consider leaving the no-op CLI flag around for as long as its needed in some supported Node.js versions, this seems to be the pattern we're using for other options (--http-parser) that don't do anything, but are still allowed to be specified so the same invocation works for across all release lines.

legendecas · 2020-02-10T01:45:54Z

Make the --experimental-report CLI flag a no-op and remove its uses in the code and docs. The experimental warning will also be removed.

Then would an opt-out option to disable the generation of node-report make sense with it?

cjihrig · 2020-02-10T14:40:46Z

Then would an opt-out option to disable the generation of node-report make sense with it?

I might be misunderstanding your comment, but I don't think that's necessary. --experimental-report currently enables the diagnostic report feature, but the reports should only be generated if --report-uncaught-exception, --report-on-signal, --report-on-fatalerror, etc. is used.

gireeshpunathil · 2020-03-11T02:23:54Z

@cjihrig -

v14.0.0 is being planned with March 24th as the development-cut date #32181 . It would be great to release report as stable in 14, given it is a major plus slated for future LTS. Are you already working on items listed in #26293 (comment) ? let me know if I can be of any help!

cjihrig · 2020-03-11T02:27:51Z

Yes, my plan is to have this ready to land as stable in Node 14.

legendecas · 2020-03-11T07:22:37Z

Make the --report-on-fatalerror CLI flag experimental. It seems like most remaining issues are related to this flag and things like OOM scenarios. I don't think it's worth keeping everything else experimental because of this.

Just bringing #29881 up to see if --report-on-fatalerror not been respected (conditional generating reports on fatal error regardless of the presence of the flag) will be counted as a blocker to the stable promotion?

gireeshpunathil · 2020-03-11T07:30:12Z

@legendecas - as quoted somewhere, the issue #29881 is trying to address is a bug, not a missing feature, so should not be a blocker for elevation to stable.

However, I don't see major challenges in getting that addressed in 2 week's time as well!

HarshithaKP · 2020-03-12T09:46:29Z

Rework of #29881 in #32207.

This commit makes the configure --without-report flag a no-op. This commit also updates a test that depends on the report CLI flags being conditionally present. PR-URL: nodejs#32242 Fixes: nodejs#26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

The report feature won't ever be disabled moving forward, so checking for its existence in the tests is no longer needed. PR-URL: nodejs#32242 Fixes: nodejs#26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit makes the --experimental-report CLI flag a no-op. PR-URL: nodejs#32242 Fixes: nodejs#26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit updates the stability documentation for the report feature. All diagnostic report APIs are now listed as stable, with the exception of report-on-fatalerror, which still has a few bugs to work out. PR-URL: nodejs#32242 Fixes: nodejs#26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

sam-github · 2020-03-16T19:51:15Z

I'm reopening this because #32242 is marked semver-major, which will block backport to 12.x

Why is moving to un-experimental considered semver-major? In what case is that a breaking change? It seems semver-minor to me. Changing stability to be less stable would be a breaking change, I can see that, but the opposite seems backportable.

addaleax · 2020-03-16T19:55:24Z

@sam-github I guess that’s a question for @cjihrig, but yes, it does not seem semver-major to me at all.

This commit removes all #ifdef NODE_REPORT checks in the src directory. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit makes the configure --without-report flag a no-op. This commit also updates a test that depends on the report CLI flags being conditionally present. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

The report feature won't ever be disabled moving forward, so checking for its existence in the tests is no longer needed. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit makes the --experimental-report CLI flag a no-op. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit updates the stability documentation for the report feature. All diagnostic report APIs are now listed as stable, with the exception of report-on-fatalerror, which still has a few bugs to work out. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit removes all #ifdef NODE_REPORT checks in the src directory. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit makes the configure --without-report flag a no-op. This commit also updates a test that depends on the report CLI flags being conditionally present. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

The report feature won't ever be disabled moving forward, so checking for its existence in the tests is no longer needed. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit makes the --experimental-report CLI flag a no-op. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit updates the stability documentation for the report feature. All diagnostic report APIs are now listed as stable, with the exception of report-on-fatalerror, which still has a few bugs to work out. PR-URL: #32242 Fixes: #26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

This commit removes all #ifdef NODE_REPORT checks in the src directory. PR-URL: nodejs#32242 Fixes: nodejs#26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

PR-URL: nodejs#32242 Fixes: nodejs#26293 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>