Skip to content

noteline allows to streamline Jupyter Notebooks from experimenting to production

License

Notifications You must be signed in to change notification settings

aarondietz234/noteline

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 

Repository files navigation

Problem Statement

Notebook Streamlined (noteline) attempts to solve the following problem:

It takes far too long for a fully functional notebook to reach production.

These time delays have many consequences, including:

  • Many notebooks that provide value never reach production
  • When notebooks do reach production, their logic needs to be rewritten from the ground up to fit into the pre-existing production infrastructure
  • It is almost impossible to execute a notebook as part of a pipeline

noteline as the Solution

noteline aims to solve the main problem and reduce the amount of time required to deploy artifacts generated by Notebooks into production. To achieve this, we are planning to create the following components:

  • noteline-schema - a project that defines how all the metadata stored in Notebooks
  • noteline-sdk-core - core noteline SDK for manipulating the Notebooks metadata
  • noteline-sdk-gcp - SDK for doing different operations with the fully reproducible Notebooks on GCP
  • noteline-cli-gcp - CLI for executing different actions with Notebooks on GCP
  • noteline-kf - Notebook as a step in Kubeflow Pipeline
  • noteline-examples - collections of different reference architectures of building different pipelines

Key Principles

Two main principles distinguish noteline among other projects that are aiming to solve the same or similar problem:

  • We are using UODP during the implementation of the noteline
  • noteline have "low bar, high ceiling."

"low bar, high ceiling." principle means that if you are starting building new infrastructure, noteline will have a set of opinionated pre-configured ways to bootstrap a new Pipeline for you. At the same time, if you already have implemented infrastructure, you can pick any noteline sub-modules and use only them. In many cases, you will be able to keep your current infrastructure and build your pipeline with minimum changes.

Alternative Solutions

Several pre-existing solutions are focusing on solving a similar (or very close) problems. Apart from the solutions that are aiming to solve the same problem ("it takes weeks for a fully functional notebook to reach production"), we also have evaluated solutions that solving specific sub-problems:

  • non-reproducibility of Notebooks outside of the original environment
  • inability to deploy inference logic directly from the Notebook
  • inability to execute Notebooks in the background

There is a solution that exists for each of the problems from the list. However, no one system connects all the answers in one simple-to-use system. Some evaluated solutions do provide reach and connected ecosystem; however, at the same time, they are very opinionated and therefore requiring to redesign infrastructure around them.

Existing Solutions

Hosted

  • GCP Cloud AI Platform
  • Azure
  • SageMaker
  • DeepNote
  • PaperSpace
  • DataBrick
  • Netflix stack: PolyNote/Papermill

About

noteline allows to streamline Jupyter Notebooks from experimenting to production

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published