Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notify the user when a notebook kernel autorestarts #6246

Merged
merged 35 commits into from May 15, 2019

Conversation

jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Apr 25, 2019

References

Fixes #4273

This tackles some of the same issues as #4724, but is way less ambitious.

Code changes

  • Restarting a kernel now does not disconnect the websocket, but keeps the same websocket open. This helps us get all the messages from a restarted kernel. The logic around the kernel ready promise and status was reworked to reflect basically connection status (i.e., a kernel is not ready if it is not connected, but messages will still be queued and sent when ready automatically).
  • The kernel state transition logic is refined
  • A new autorestarting kernel state is added for when a kernel is restarted automatically by the server (for example, if the kernel runs out of memory and dies, the server recognizes this from the heartbeat, and tries restarting it automatically)
  • A dialog is popped up the first time an autorestarting status is set in a row in a notebook, notifying the user that a kernel is being restarted automatically. Like the classic notebook, if multiple autorestarts happen in a row, only one dialog pops up.
  • Documentation for shutting down a kernel that is already dead is changed to match existing behavior (i.e., shutting down a dead kernel does not return a rejected promise, but just clears the websocket and returns normally).
  • jlpm run coverage now does not bail on the first failed test, but instead runs through all tests. This is so we can see all test failures instead of just the first one in CI.
  • Converted more promise-based code to async/await
  • Kernel busy status indicator now shows busy when state is busy, starting, or restarting, instead of just showing idle when state was idle. Perhaps state should also show busy when state is autorestarting?
  • Consoles now don't make a separate kernel info request to get the banner, but instead just wait until the kernel is ready and use the cached kernel info.
  • Kernel messages are saved from the very start, and sent as soon as the kernel is ready (which means as soon as a request kernel info request succeeds, so the kernel is a known idle state). This means you don't need to wait until a kernel is ready before sending a message, and you can just await the resulting future.done promise. This cuts down on a lot of test console errors where the kernel was disposed before it was done getting replies back.
  • The logic around disposing kernel futures on kernel disposal was reworked to be more careful about edge cases.
  • Several tests are restructured to cut down on the number of kernel startups in tests to save time.
  • Fixed a few other errors in tests, and now run a few tests that had been skipped because they spuriously seemed to fail (around kernel restarts)

User-facing changes

A dialog pops up in a notebook when a kernel is autorestarted, with the same text as the classic notebook.

import os
os.kill(os.getpid(), 9)

Screen Shot 2019-05-15 at 3 39 04 AM

Backwards-incompatible changes

  • Kernel state transitions are subtly different and have a new state, autorestarting.

@jupyterlab-dev-mode
Copy link

Thanks for making a pull request to JupyterLab!

To try out this branch on binder, follow this link: Binder

@jasongrout jasongrout added this to the 1.0 milestone Apr 27, 2019
@jasongrout
Copy link
Contributor Author

jasongrout commented May 1, 2019

One of the things I am seeing in my logging is that if I open a notebook, then open an associated console, then close the associate console, the websocket for the console kernel connection stays open, which means that it receives all of the broadcast iopub messages. We should probably close the kernel connection when a console is closed.

Edit: I opened #6290 to deal with this issue

@jasongrout
Copy link
Contributor Author

This is now getting much closer. Restarts now don't disconnect and reconnect the websocket. The kernel status should switch from restarting to autorestarting when an autorestart happened, so we can listen for that state to notify the user. Popping up a dialog is the next step.

…he banner.

Instead of requesting the kernel info a second time, just use the original kernel info we request for a kernel connection.
This makes sure we don’t miss messages during the restart, and helps the browser not have so many different websocket connections.
Now that we don’t reconnect when restarting, we’ll just show the banner right away.
Tests are still failing. This commit is a checkpoint in the work.
This gets the test suite passing, and narrows down our work to the tests that are still failing.
I think the handler wasn’t getting removed after the check, so it was failing on the next signal emission. testEmission automatically removes the handler after the check, taking care of this issue.
jupyter/notebook#3705 noted some issues with kernel messages. I think these were mainly because we disconnected and reconnected the websocket on restart. We no longer do this, and these tests seem to pass, so let’s remove the TODO and keep an eye on it.
…n to match existing behavior.

Throwing an error when shutting down a dead kernel causes a lot of test failures. So we keep the existing behavior and update the documentation, rather than changing the behavior to match the documentation.
@jasongrout jasongrout changed the title [WIP] Autorestart Notify the user when kernel autorestarts May 15, 2019
@jasongrout
Copy link
Contributor Author

@blink1073, @ian-r-rose, and everyone else - I think this is ready for review.

@jasongrout jasongrout changed the title Notify the user when kernel autorestarts Notify the user when a notebook kernel autorestarts May 15, 2019
Copy link
Member

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging through that state machine, nicely done!

@blink1073 blink1073 merged commit dcda928 into jupyterlab:master May 15, 2019
@lock lock bot added the status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. label Aug 7, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Aug 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pkg:notebook pkg:services status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. tag:Feature Parity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No warning/error message when kernal is forced to restart
2 participants