Handle nextTick task errors #37

Closed
wants to merge 1 commit

Conversation

@slorber commented Mar 24, 2015

When draining the nextTick task queue, if a single task throws, the error propagates and immediately stops the draining of the queue. It also leaves the attribute draining = true, so subsequent calls to nextTick never trigger draining again.

I think draining should complete for all queued tasks; an error in one task should not prevent the other queued tasks from executing.

I chose to handle exceptions by rethrowing them in a setTimeout, as that seems more natural than using a logger. Since exceptions are "exceptional", this should not be a big performance problem.
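
For illustration, a minimal sketch of that approach (not the actual PR diff; task stands for one queued callback):

// Wrap each queued task so one failure cannot abort the drain. The error is
// rethrown inside a setTimeout, so it still surfaces as an uncaught exception
// without stopping the remaining tasks.
try {
    task();
} catch (err) {
    setTimeout(function () {
        throw err;
    }, 0);
}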

This solves serious issues that appeared with nextTick in Browserify >= 8.1.0 (for example with the Q promise library).

See:
kriskowal/q#669
browserify/browserify#1179

…cistant state and stop process in case of a single nextTick task error
@calvinmetcalf
Collaborator

So the root of the error is that Q is misidentifying browserified code as running in Node rather than in the browser. Like Node, this shim makes no guarantees about what an uncaught exception will do inside a nextTick.

That being said, if you add a second setTimeout after this line that checks whether draining is still true and, if so, cleans up, it will likely accomplish the same thing without causing a massive performance penalty.

@calvinmetcalf
Collaborator

Line 11 would be another place you could schedule it, something like:

...
function cleanUpNextTick() {
    draining = false;
    queue = [];
}
function drainQueue() {
    if (draining) {
        return;
    }
    var timeout = setTimeout(cleanUpNextTick, 0);
    ...
    draining = false;
    clearTimeout(timeout);

this would make sure all pending events were booted from the queue as well

edit: was able to simplify it more

@slorber
Author

slorber commented Mar 24, 2015

Sorry, I won't be able to help more; I'm not a Node.js dev and I don't know the intended behavior of Node's nextTick.

Just giving a solution that worked fine for me.

IMHO using a cleanup timeout won't solve all problems: when registering 2 Q handlers with then, if one of them raises an error the second should still be called. It's not just about cleaning up; it's about ensuring that all queued tasks will be executed, and that is the behavior we had before with a simple setTimeout for each task.
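
For illustration (a hypothetical snippet, not taken from the linked issues, assuming the usual Q API), the scenario I mean is:

var p = Q(42);
p.then(function () { throw new Error('boom'); });        // first handler fails...
p.then(function (v) { console.log('still runs', v); });  // ...the second handler should still be called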

@calvinmetcalf
Collaborator

IMHO using a cleanup timeout won't solve all problems: when registering 2 Q handlers with then, if one of them raises an error the second should still be called. It's not just about cleaning up; it's about ensuring that all queued tasks will be executed, and that is the behavior we had before with a simple setTimeout for each task.

Since Q uses its own task queue, this likely won't break it.

The tricky thing is just figuring out the right behavior, because Node just crashes the process here.
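
For reference, the Node behavior being referred to (my illustration, not code from this thread):

process.nextTick(function () {
    throw new Error('boom');
});
// In real Node this prints the stack trace and exits the process with a non-zero
// code, unless an 'uncaughtException' handler has been installed.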

@defunctzombie
Owner

I am open to a PR that fixes stuff up here; however, I think this could be a Q issue, since it is trying to be clever and identify where it is running instead of using standard JS primitives like setTimeout(0).

@calvinmetcalf
Collaborator

See #38 for my take on it; the downside is that while it handles errors better, it is a bit more complex.

@slorber
Author

slorber commented Mar 24, 2015

I don't know... what is the purpose of this nextTick shim?

I guess the original ambition was to allow Node.js modules using nextTick to run in the browser with Browserify, right?

So the default behavior of Node.js, with both setTimeout and nextTick, is to crash whenever there is an uncaught exception, as far as I understand. This permits failing fast and restarting the process in a clean state.

In my opinion it does not really make sense to reproduce this behavior in the browser, since we don't want the UI to become totally unusable. And this is not the behavior of setTimeout in the browser anyway: when an error is thrown it does not crash the browser/tab, it just logs an uncaught exception in the console.

The question is: why would we want a different behavior for nextTick in the browser? If the UI were stateless and we could recover it easily, then failing fast would be fine. But UIs are stateful, so we must fail safe in all cases and not make the UI totally unusable on any uncaught exception.

So for me it does not make sense to reproduce the fail-fast behavior of Node's nextTick in the browser.

The former implementation was based on setTimeout and it worked fine, right (I don't see any issue)? I'm totally OK with perf improvements, but why change a behavior that seemed fine for everybody until now?

process.nextTick(function() { throw new Error("error"); });
process.nextTick(function() { console.log("message"); });

The old behavior would log both the error and the message.
The current behavior would only log the error (and leaves the queue in an inconsistent state, but that will be fixed and is not my point here).

I think the old behavior is better for the browser.

@calvinmetcalf
Collaborator

The old behavior would log both the error and the message.

That was not previously the case in all browsers: inconsistently, certain browsers that used MutationObserver would copy the queue array first while others would unshift. But give #38 a test, as that should do what you want.
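
For context, a minimal sketch (my illustration, not this library's actual code) of the MutationObserver scheduling trick mentioned above:

var queue = [];
var node = document.createTextNode('');
// Any mutation of the text node fires the observer callback asynchronously.
new MutationObserver(drainQueue).observe(node, { characterData: true });

function drainQueue() {
    while (queue.length) {
        queue.shift()();   // one variant consumes the live queue; others copy the array first
    }
}

function nextTick(task) {
    if (queue.push(task) === 1) {   // only schedule a drain for the first task queued
        node.data = node.data === '1' ? '0' : '1';
    }
}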

@slorber
Author

slorber commented Mar 24, 2015

@calvinmetcalf yes, it solves the Q test case, as it does not leave an inconsistent state and thus subsequent nextTicks can still be executed.

But notice that a much simpler implementation also solves my use case. I really don't understand why you would want to handle errors in a scheduled cleanup task.

process.nextTick = function(task) {
    if ( !queue ) {
        queue = [task];
    } else {
        queue.push(task);
    }
    if ( !nextTickTimer ) {
        nextTickTimer = setTimeout(function processNextTick() {
            nextTickTimer = undefined;
            for (var i = 0, len = queue.length; i < len; i++) {
                queue[i]();
            }
        },0);
    }
};

I don't really understand why your solution is so much more complex than that; am I missing something?

Also, I noticed that in your cleanup task you have some logic to reschedule queued tasks that could not be processed because an error was thrown in a previous task. That behavior seems fine to me (at least that's what I understand of your code).
After all, my whole concern about the former vs. current behavior is about guaranteeing that all submitted tasks will be called, even if some tasks in the current tick fail. I am OK with making the current tick fail fast and rescheduling the uncalled tasks for the next tick.

But I still think it could be much simpler. Why not use a try/catch/finally instead of timers to handle errors and recovery?

@calvinmetcalf
Collaborator

In your example:

process.nextTick(function() { throw new Error("error"); });
process.nextTick(function() { console.log("message"); });

"message" won't be printed until process.nextTick is called again some other time, because nothing will restart the draining after an error.

In general, if you have the debugger open and set to pause on uncaught errors, do you want it to pause in the setTimeout inside this library, or at the original place where the error was thrown?

@slorber
Author

slorber commented Mar 24, 2015

@calvinmetcalf I agree that with my implementation "message" will never be printed. But that is actually because my implementation is a simpler rewrite of the current behavior, which is not fine for me :)

I could rewrite my pull request like this:

process.nextTick = function(task) {
    if ( !queue ) {
        queue = [task];
    } else {
        queue.push(task);
    }
    if ( !nextTickTimer ) {
        nextTickTimer = setTimeout(function processNextTick() {
            nextTickTimer = undefined;
            for (var i = 0, len = queue.length; i < len; i++) {
                try {
                    queue[i]();
                } catch (e) {
                    setTimeout(function () {
                        throw e;
                    }, 0);
                }
            }
        },0);
    }
};

This will print "message" and is much simpler than my initial PR, IMHO.

I understand your point about losing the original location of the error when using setTimeout like that... I think this happens when you rethrow an error, right?

I think your solution could still be simplified; have you tried using a try/finally without catching the error? Like:

var processCompleted = false;
try {
    // tie up a resource
    processQueue();
    processCompleted = true;
} finally {
    if (!processCompleted) {
        reschedule(); // ...
    }
}

@calvinmetcalf
Collaborator

So based on some perfs, the try block isn't as bad as I expected it to be. Interesting.

@calvinmetcalf
Collaborator

Though a try/finally does cause some issues in Firefox: jsperf.com/asyncstuff/9/edit

edit: could be the labeled statement

@slorber
Author

slorber commented Mar 24, 2015

Yes @calvinmetcalf, and you could probably make it faster by moving the try outside of the loop and keeping track of the index of the last processed task, so that tasks which were never called can be rescheduled.

Something like this:

var queue = undefined;
var nextTickTimer = undefined;

function processNextTick() {
    nextTickTimer = undefined;
    var i = 0;
    try {
        for (var len = queue.length; i < len; i++) {
            queue[i]();
        }
    } finally {
        // If there is an error and there's at least one unprocessed task we reschedule it in next tick
        if (i+1 <= queue.length) {
            queue = queue.slice(i+1);
            nextTickTimer = setTimeout(processNextTick, 0);
        }
    }
}


process.nextTick = function (task) {
    if (!queue) {
        queue = [task];
    } else {
        queue.push(task);
    }
    if (!nextTickTimer) {
        nextTickTimer = setTimeout(processNextTick, 0);
    }
};

@calvinmetcalf
Collaborator

var queue = [];
var draining = false;

function drainQueue() {
    if (draining) {
        return;
    }
    draining = true;
    var currentQueue;
    var len = queue.length;
    while(len) {
        currentQueue = queue;
        queue = [];
        var i = -1;
        try {
            while (++i < len) {
                currentQueue[i]();
            }
        } finally {
            if (i++ < len) {
                queue = currentQueue.slice(i).concat(queue);
                draining = false;
                return drainQueue();
            }
        }

        len = queue.length;
    }
    draining = false;
}
process.nextTick = function (fun) {
    queue.push(fun);
    if (!draining) {
        setTimeout(drainQueue, 0);
    }
};

is also really slow in Firefox; I think Firefox doesn't like finally blocks.

@slorber
Author

slorber commented Mar 25, 2015

@calvinmetcalf I still can't understand the complexity of your code requiring 2 queue variables and 2 loops. Also, calling drainQueue() in a finally block does not seem appropriate to me at all: finally blocks are intended for cleanup and should not trigger extra calls that could lead to recursion.

@calvinmetcalf
Collaborator

The 2 queues are there because you need to deal with process.nextTick being called recursively, which in this code causes the new task to be put into a new array, not the one we are iterating. You could put recursive tasks into the current array, and that works fine until nested nextTicks make the array so long that its length no longer fits into a 32-bit integer, at which point a massive slowdown happens in V8 (Chrome and Node): the type signature of the function changes, so the optimized code that was generated for it is thrown out (the function is almost certainly optimized at this point, as we are something like 64k iterations in), and this just tanks performance. I discovered this while perf-testing a promise library, where after a certain threshold of parallel work performance plummeted and it eventually ran out of memory. So the 2 queues prevent any one queue from getting too long; it also means we don't have a massive array hanging around the whole time, and functions can be garbage collected much sooner. A comment in there would likely be helpful.
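
To illustrate the recursive case described above (hypothetical usage, assuming the two-queue drain from the earlier snippet):

process.nextTick(function outer() {
    process.nextTick(function inner() {
        // With two queues, `inner` is pushed onto the fresh `queue`, not onto the
        // `currentQueue` array being iterated, so the array we loop over never grows
        // and can be garbage collected as soon as this pass over it finishes.
    });
});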

With regards to the finally block, that is what they are used for and it is a best practice, but in the end it is just a block of code that always runs after the try and catch clauses. I don't like the recursive function call either, and I originally tried to do it with a labeled jump statement, but it seems that having the finally in there made Firefox about half as fast.


@calvinmetcalf
Collaborator

Try 1 was:

function drainQueue() {
    if (draining) {
        return;
    }
    draining = true;
    var currentQueue, i;
    var len = queue.length;
    var done;
    while(len) {
        currentQueue = queue;
        queue = [];
        i = -1;
        done = false;
        loopit: while (++i < len) {
            try {
                currentQueue[i]();
                done = true;
            } finally {
                if (!done) {
                    continue loopit;
                }
            }
        }
        len = queue.length;
    }
    draining = false;
}

@defunctzombie
Owner

Has this PR been superseded by #38?

@calvinmetcalf
Collaborator

Yes

