datadrivers/effective-guide-mlops

effective-guide-mlops

End-to-end machine learning pipeline with the Sagemaker Python SDK


This repository provides an example end-to-end machine learning pipeline on AWS built with the Sagemaker Python SDK. It leans on other resources (e.g. here and here); however, it provides a unified end-to-end example in a single notebook, from data processing to deployment of a REST API. This is not production ready, but it will give you a good first intuition of how to orchestrate the ML lifecycle on AWS via the Sagemaker SDK.

The main resource for this guide is the notebook ml_pipeline.ipynb in the folder notebooks. The easiest way to follow along is to launch a notebook instance on AWS Sagemaker and pull the repository into your JupyterLab environment.

1. Data

The Penguin Dataset from Alison Horst is an alternative to the famous iris dataset that can be used for demonstrating various ML tasks. Read more here.

|   | species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    | year |
|---|---------|-----------|----------------|---------------|-------------------|-------------|--------|------|
| 1 | Adelie  | Torgersen | 39.1           | 18.7          | 181               | 3750        | male   | 2007 |
| 2 | Adelie  | Torgersen | 39.5           | 17.4          | 186               | 3800        | female | 2007 |
| 3 | Adelie  | Torgersen | 40.3           | 18.0          | 195               | 3250        | female | 2007 |
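To get a feel for the schema, the sample rows above can be reconstructed as a pandas DataFrame (only the three rows shown here; the full dataset has more rows and is loaded in the notebooks):

```python
import pandas as pd

# The three sample rows from the table above (values copied from the table).
penguins = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Adelie"],
        "island": ["Torgersen", "Torgersen", "Torgersen"],
        "bill_length_mm": [39.1, 39.5, 40.3],
        "bill_depth_mm": [18.7, 17.4, 18.0],
        "flipper_length_mm": [181, 186, 195],
        "body_mass_g": [3750, 3800, 3250],
        "sex": ["male", "female", "female"],
        "year": [2007, 2007, 2007],
    }
)
print(penguins.shape)  # (3, 8)
```

The same columns are what eda.ipynb explores and what the pipeline later consumes.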

2. Objective

The goal is to train a classifier that predicts the sex/gender of a penguin based on all other variables available.
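A minimal local sketch of this objective, using scikit-learn rather than the Sagemaker SDK (the notebook may use a different model; the rows below are hypothetical values modeled on the penguin schema, not real data):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training rows modeled on the penguin columns (illustrative only).
df = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"],
        "island": ["Torgersen", "Torgersen", "Biscoe", "Biscoe", "Dream", "Dream"],
        "bill_length_mm": [39.1, 39.5, 46.1, 44.9, 50.0, 46.5],
        "bill_depth_mm": [18.7, 17.4, 13.2, 13.3, 19.5, 17.9],
        "flipper_length_mm": [181, 186, 211, 213, 196, 192],
        "body_mass_g": [3750, 3800, 4500, 5100, 3900, 3500],
    }
)
y = ["male", "female", "male", "female", "male", "female"]  # target: sex

numeric = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
categorical = ["species", "island"]

# Scale numeric features, one-hot encode categoricals, then fit a classifier.
clf = Pipeline(
    [
        ("prep", ColumnTransformer(
            [("num", StandardScaler(), numeric),
             ("cat", OneHotEncoder(handle_unknown="ignore"), categorical)]
        )),
        ("model", LogisticRegression()),
    ]
)
clf.fit(df, y)
preds = clf.predict(df)
```

In the actual tutorial, the preprocessing and training happen as separate Sagemaker jobs; this sketch only shows the modelling objective itself.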

3. Resources

Notebooks:
  • stored in /notebooks
  • eda.ipynb: visual exploration of the data
  • ml_pipeline.ipynb: orchestrates preprocessing of the data, model training, and deployment of the model as an endpoint
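The three stages ml_pipeline.ipynb orchestrates on Sagemaker (preprocessing, training, deployment) can be mimicked locally to see the shape of the workflow. A hedged sketch, where "deployment" is just a serialised model behind a predict function standing in for a Sagemaker endpoint (the data, model choice, and function name are illustrative, not taken from the notebook):

```python
import io

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# 1. Preprocessing: scale hypothetical numeric features
#    (stand-in for the notebook's Sagemaker processing step).
X, y = make_classification(n_samples=40, n_features=4, random_state=0)
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# 2. Training: fit a model
#    (the notebook runs this as a Sagemaker training job).
model = RandomForestClassifier(random_state=0).fit(X_scaled, y)

# 3. "Deployment": serialise and reload the model, then serve predictions
#    through a function -- a local stand-in for a Sagemaker endpoint
#    behind a REST API.
buffer = io.BytesIO()
joblib.dump(model, buffer)
buffer.seek(0)
served_model = joblib.load(buffer)

def invoke_endpoint(rows):
    """Mimic an endpoint call: preprocess incoming rows, then predict."""
    return served_model.predict(scaler.transform(np.asarray(rows)))

preds = invoke_endpoint(X[:5])
```

The notebook replaces each local step with the corresponding Sagemaker SDK construct, which is exactly what the walkthrough below covers.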

4. Tutorial Walkthrough

  • head over to notebooks/ml_pipeline.ipynb and follow the procedure