Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

capture an error for unhandledRejection #2366

Closed
trentm opened this issue Oct 12, 2021 · 11 comments · Fixed by #3323
Closed

capture an error for unhandledRejection #2366

trentm opened this issue Oct 12, 2021 · 11 comments · Fixed by #3323
Assignees
Labels
8.9-candidate agent-nodejs Make available for APM Agents project planning. good first issue
Milestone

Comments

@trentm
Copy link
Member

trentm commented Oct 12, 2021

Currently the agent will capture an APM error for an uncaughtException (process event) -- depending on the captureExceptions config option (default true). We should do the same for unhandledRejection. I'm not sure if we'd want that to be a separate config option or piggy-back on captureExceptions.

// example-unhandled-rejection.js
var apm = require('./').start({
  metricsInterval: '0s',
  serviceName: 'example-unhandled-rejection'
})

Promise.reject('boom')

Using node v12, without the APM agent:

% ELASTIC_APM_ACTIVE=false node example-unhandled-rejection.js
(node:68494) UnhandledPromiseRejectionWarning: boom
(node:68494) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:68494) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

and with the APM agent:

% node example-unhandled-rejection.js
(node:68500) UnhandledPromiseRejectionWarning: boom
(node:68500) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:68500) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

No "error" event is reported to APM server in the second case. It should be, without changing the behaviour of the node app (which I hope is possible).

Node has the following option:

  --unhandled-rejections=...      define unhandled rejections behavior. Options
                                  are 'strict' (raise an error), 'warn'
                                  (enforce warnings) or 'none' (silence
                                  warnings)

and the default value for that has changed over the major node.js versions. I think this probably means we should make this separately configurable. The default should be to capture an exception.

@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Oct 12, 2021
@trentm
Copy link
Member Author

trentm commented Oct 12, 2021

This came from discussion from a user that was confused that "errors are not being reported to APM." The issue was that their app output showed ... UnhandledPromiseRejectionWarning ... warning output, but there was nothing reported to APM.

@sardanam
Copy link

sardanam commented Jan 8, 2022

Look like i am hitting the same issue. due to ELK APM server network problem, agent is timing out. Due to STRICT configuration at node level, this result into application call itself failing, Though we don't want application call to fail just because APM server is not reachable/timing out.

@trentm
Copy link
Member Author

trentm commented Jan 10, 2022

@sardanam I think this might be something different. It is definitely a bug in the APM agent if it is crashing the application when it is unable to talk to APM server. Would you be willing to open a separate issue (to not get mixed up with this one) and provide some more details?

  • Do you have a stacktrace and console output you can share?
  • What elastic-apm-node version are you using?

@david-luna
Copy link
Member

Hi @trentm

Two questions regarding this issue:

  • When you say no "error" is reported to APM you mean that unhandled rejections should be treated as errors?
  • A promise could be rejected with a reason that could be a instance of an Error or something completely different. So should the agent construct and Error object from any rejection reason which is not an error?

@trentm
Copy link
Member Author

trentm commented Feb 8, 2023

@david-luna

  • When you say no "error" is reported to APM you mean that unhandled rejections should be treated as errors?

I mean that the APM agent should report an "error" event to the APM server so it can be reported in the Kibana APM app.

  • A promise could be rejected with a reason that could be a instance of an Error or something completely different. So should the agent construct and Error object from any rejection reason which is not an error?

The .captureError() agent method does support an Error, a message string, or a specifc parameterized message object. So perhaps the logic could be:

  • if err instanceof Error then use it as is
  • if typeof err === 'string' the use it as is
  • else: Perhaps we could use err.toString()?

While looking at this, I notice the (new to me) origin second argument to Event 'uncaughtException' added in Node v12, v10.17.0.

This makes me wonder if just handling uncaughtException suffices. For example, if user code has an unhandled rejection and --unhandled-rejections is set to strict or throw (the default for modern Node.js versions) -- i.e. values for which the user considers an unhandled rejection to be an error case -- then our handling of process.on('uncaughtException') will already capture an APM error event. If --unhandled-rejections is set to one of the other values (e.g. "warn", "none"), then this indicates the user does not consider an unhandled rejection to be an error case. In other words, maybe this issue isn't necessary.

If I re-run the original example-unhandled-rejection.js I posted above -- now with a newer Node.js version which has --unhandled-rejections=throw as the default -- then we get the expected behaviour:

% node --version
v16.19.0

% node example-unhandled-rejection.js
{"log.level":"info","@timestamp":"2023-02-08T18:40:11.912Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"Sending error to Elastic APM: {\"id\":\"51170e84eb39acdf94f87d3bb5020173\"}"}

and the error reported to APM server is:

    {
        "error": {
            "id": "51170e84eb39acdf94f87d3bb5020173",
            "timestamp": 1675881611904000,
            "context": {
                "user": {},
                "tags": {},
                "custom": {}
            },
            "exception": {
                "message": "This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason \"boom\".",
                "type": "UnhandledPromiseRejection",
                "handled": false,
                "code": "ERR_UNHANDLED_REJECTION",
                "stacktrace": [
                    {
                        "filename": "UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch. The promise rejected with the reason \"boom\".",
                        "library_frame": true,
                        "abs_path": "UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch. The promise rejected with the reason \"boom\"."
                    }
                ]
            },
            "culprit": "undefined (UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch. The promise rejected with the reason \"boom\".)"
        }
    }

I think this issue can be closed as not necessary.
@david-luna Make sense? I'd love a sanity check.


Side note: Note the separate issue for perhaps switching to uncaughtExceptionMonitor (Added in Node.js: v13.7.0, v12.17.0): #2367 This would be a bit complex to do in the agent while we still support earlier Node.js versions.

@david-luna
Copy link
Member

Hi @trentm

This came from discussion from a user that was confused that "errors are not being reported to APM." The issue was that their app output showed ... UnhandledPromiseRejectionWarning ... warning output, but there was nothing reported to APM.

This makes me thing some customers either were not aware of this option or development team did not have control on how the node process is started in the host machine.

Implementing a handler for rejections may come handy in the second case but that implementation has already an expiration date. I guess a paragraph in captureExceptions config option could be enough. The note should explain that APM is capable of reporting unhandled rejections with the proper value in --unhandled-rejections option.

But also I think it might be a good idea to inform in the console messages that the --unhandled-rejections can be changed to catch them as errors. Something of the sorts

process.on('uncaughtException', function () {
  console.warn('Elastic APM detected an unhandled Promise rejection. If you want the agent to report this please make sure --unhandled-rejections option is set to strict mode');
})

This way we notice:

  • customers with apps running older version of node
  • customers with recent version of node but with --unhandled-rejections set to warn.

So to summarize we can do 2 things:

  • update docs
  • log to stdout about the rejection and how to allow the agent to capture it.

@trentm
Copy link
Member Author

trentm commented Feb 13, 2023

This makes me thing some customers either were not aware of this option or development team did not have control on how the node process is started in the host machine.

Or perhaps explicitly using --unhandled-rejection=throw with older versions of node doesn't result in node sending the uncaughtException process event? I didn't go back through older node versions trying "node --unhandled-rejection=throw|strict example-unhandled-rejection.js" to see if/where that changed.

Most likely "some customers either were not aware of this option", as you suggest.

I guess a paragraph in captureExceptions config option could be enough. ...

Yes, and perhaps also in the shared section of "docs/shared-set-up.asciidoc":

// Error logging
// tag::error-logging[]
By default, the Node.js agent will watch for uncaught exceptions and send them to Elastic APM automatically.
But in most cases, errors are not thrown but returned via a callback,
...

that is used in a number of the "Get started ..." doc pages, e.g. https://www.elastic.co/guide/en/apm/agent/nodejs/current/express.html#express-error-logging

But also I think it might be a good idea to inform in the console messages that the --unhandled-rejections can be changed to catch them as errors. Something of the sorts

process.on('uncaughtException', ...

Did you mean process.on('unhandledRejection', ... here? If I understand correctly, if a user uses node --unhandled-rejections=warn|none ... then there will not be a process 'uncaughtException' event.


If we consider only more recent versions of node, for a moment -- those versions where --unhandled-rejections=throw is the default -- a user using node --unhandled-rejections=warn ... is an explicit choice by them to consider unhandled rejections to not be errors. I'm not sure there is a lot of value in adding a console.warn(...) message at that point.
Granted there could be the case where the person adding --unhandled-rejections=warn and the person monitoring the deployed app could be on separate teams -- one in Dev, one in SRE for example.

I haven't played with code for this, but my suspicion and worry is that if the APM agent added a 'unhandledRejection' handler, then it would potentially change the user's application behaviour for the case when --unhandled-rejections is throw. The APM agent should avoid changing application behaviour if it can.

Aside: if you want to read a LONG passionate discussion on how --unhandled-rejections=throw can to be the default: see nodejs/node#33021 which changed the default for node v15.0.0.

@david-luna
Copy link
Member

david-luna commented Feb 15, 2023

Hi @trentm

I'll run some test but I'd say that adding a listener to the unhandledRejection event won't change app behavior since it will not swallow the rejection and other listeners (form the app code) will be called as well.

The goal of my proposal is:

  • if app is running on an older version with the default mode set to warn the agent would log the recommendation
  • if app is running a more recent version of node, so default mode is throw, agent just reports the error since the rejection handler is not called
  • and the tricky one: if app is running with a recent version but the flag is set on purpose to warn we have to options.
    • log the message like in 1st case
    • not logging at all. we could inspect execArgv to see if the flag was set on purpose

On these two options I'd like the one which logs anyway. I like the tools I?m using to give me hints on how to use them but probably is best to not log since probably stdout is store somewhere and we are adding more bytes to it

@david-luna
Copy link
Member

So I did some tests with different node versions to see how it behaves when adding the handlers and doing some variations on the --unhandled-rejections mode.

The script used to perform tests is quite straightforward. We inspect process object to resolve which is the mode and if it's overwritten by the CLI option when the process starts. then we add a couple of handlers for rejections and an exception handler.

// Check the version
const [major, minor, patch] = process.versions.node.split('.').map(n => parseInt(n, 10))
console.log('Node major version: ' + major)

// Check if --unhandled-rejections override is present and resolve the mode
const defaultMode = major >= 15 ? 'throw' : 'warn';
const overrideMode = process.execArgv.find(arg => arg.indexOf('--unhandled-rejections') !== -1)
const currentMode = overrideMode ? overrideMode.split('=')[1] : defaultMode
console.log('--unhandled-rejections mode is ' + currentMode, overrideMode ? '(override)' : '(default)')

// This could be APM handler
process.on('unhandledRejection', function (rej) {
  console.log('APM agent detected unhandledRejection', rej);
})

// And this the monitored APP handler
process.on('unhandledRejection', function (rej) {
  console.log('Application detected unhandledRejection', rej);
})

// APM or Application handler for exceptions
process.on('uncaughtException', function (err) {
  console.log('uncaughtException => ', err);
})

Promise.reject('boom')

I can summarize the mode is consistent across versions and that indeed adding a handler changes the behavior of the App when the mode is the default throw or warn-with-error-code. The handler prevents the process to be terminated and probably leaving the App in an unknown state.

About the other modes:

  • in strict mode an exception is thrown so the agent is already handling it
  • in warn mode Node will not print the message and call the handler instead
  • in none mode the handler will get called anyway

This validates your thought about changing the behavior of the App because of the handlers. IMHO the most we can do is to detect the mode & version at startup and provide a suggestion if applies.

  • Node 12 and no CLI option => suggest to pass it with value throw
  • Node 16 and CLI option set to none => suggest to remove the option

So for sure the documentation update is a must and this suggestion message at startup is a nice to have that could avoid some questions from customers in the future.

@trentm
Copy link
Member Author

trentm commented Feb 21, 2023

@david-luna Thanks for verifying that it would cause app behaviour changes.

So for sure the documentation update is a must

I think some doc updates could be helpful to users, yes.

and this suggestion message at startup is a nice to have that could avoid some questions from customers in the future.

I'm still unsure about this. For one, I don't recall any user questions about this in the past two years. As well:

  • Node 12 and no CLI option => suggest to pass it with value throw

In this case, the default is warn and its message already mentions using strict:

(node:98383) UnhandledPromiseRejectionWarning: boom
(node:98383) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:98383) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

The throw option didn't exist at the time.

  • Node 16 and CLI option set to none => suggest to remove the option

This suggests that the APM agent knows better than the user, which isn't always the case. Granted, I don't know of a valid reason for using 'none'.

@david-luna
Copy link
Member

Hi @trentm

I'll proceed to update the docs and do not implement the suggestion message in the node process. As you said this implies APM agent knowing better than the user and it is not true always.

Let me know if you have any suggestion for the copy. You can write on the PR (which I'm going to create) directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.9-candidate agent-nodejs Make available for APM Agents project planning. good first issue
Projects
3 participants