
MLOps project - course from DataTalksClub

Online web service for predicting taxi ride duration, based on public taxi trip datasets provided by the New York City Taxi & Limousine Commission (TLC).

Test the online service using the simple web page at the root of the Cloud Run service URL: https://rides-ucpkfmi6pq-ue.a.run.app/

Test the online service from the command line:

URL=https://rides-ucpkfmi6pq-ue.a.run.app/predict  # Cloud Run service URL
curl -X POST \
    -H 'Content-type: application/json' \
    -d '{"PULocationID": 43, "DOLocationID": 151, "trip_distance": 1.01}' \
    "${URL}"

General technical implementation:

  • The model is built as a scikit-learn pipeline that combines preprocessing with a linear regression model. Experiment tracking is performed through MLflow, and the model registry is handled by MLflow with Google Cloud Storage as the models sink. This development stage, from accessing datasets to training the model, is orchestrated with Prefect. Evidently AI, together with a metrics database, has been set up for the model-building pipeline.
  • The prediction service is deployed to Google Cloud: Cloud Build and Artifact Registry are used to containerize the service, and Cloud Run is the execution environment, exposing it as a Flask web service that uses MLflow to access the model registry sink.
  • All cloud resources are provisioned with Terraform, and most reproducibility steps are automated with Bash scripts.
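As a rough illustration of the first point, the model can be pictured as a scikit-learn pipeline that vectorizes the ride features named in the prediction payload (PULocationID, DOLocationID, trip_distance) and fits a linear regression. This is a sketch, not the project's actual training code: the training rows below are made up, and the MLflow call is shown only as a comment.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy training rows; feature names follow the service's prediction payload.
rides = [
    {"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.01},
    {"PULocationID": "138", "DOLocationID": "236", "trip_distance": 9.30},
]
durations = [7.5, 32.0]  # made-up targets, in minutes

# Preprocessing (one-hot encoding via DictVectorizer) + linear regression in one pipeline.
pipeline = make_pipeline(DictVectorizer(), LinearRegression())
pipeline.fit(rides, durations)

pred = pipeline.predict([{"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.0}])
print(float(pred[0]))

# In the project, the fitted pipeline would then be logged to MLflow, e.g.:
# mlflow.sklearn.log_model(pipeline, artifact_path="model")
```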

Up and running (reproducibility)

Cloud resources

Prerequisites:

  • Unix-like system with the following CLI tools: bash, gcloud, jq, make, terraform

Provision cloud infrastructure on an existing Google Cloud project:

  • if using Cloud Shell:
    make terraform_apply
  • if not using Cloud Shell:
    gcloud auth application-default login
    
    GCP_PROJECT_ID=
    gcloud config set project "${GCP_PROJECT_ID}"
    
    make terraform_apply

Share the generated output.json file at infra/gcp/terraform/ if using another computer/instance for development; this enables development without direct access to the Terraform state. Copy it to the same location on the development machine.

Development

Prerequisites:

  • Unix-like system with the following CLI tools: bash, gcloud, jq, make
  • Docker Compose
  • Python>=3.8 distribution

Set up access to Google Cloud if not already configured:

gcloud auth application-default login

GCP_PROJECT_ID=
gcloud config set project "${GCP_PROJECT_ID}"

MLflow server:

# Using default python3 >=3.8
make mlflow_server

# Using custom Python interpreter for compatibility
make mlflow_server PYTHON_BASE_INTERPRETER=python3.8

Metrics database and Prefect server:

make compose_up

Run ML pipelines:
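The pipeline commands themselves live in the repository; as a hypothetical sketch of the stages such a training pipeline covers (in the project these steps are orchestrated as Prefect tasks/flows, with MLflow handling tracking and registry), the helper names and data below are illustrative only:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

def load_training_data():
    # Stand-in for reading a NYC TLC trips dataset; rows are made up.
    rides = [
        {"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.01},
        {"PULocationID": "138", "DOLocationID": "236", "trip_distance": 9.30},
    ]
    durations = [7.5, 32.0]
    return rides, durations

def train_model(rides, durations):
    model = make_pipeline(DictVectorizer(), LinearRegression())
    model.fit(rides, durations)
    return model

def register_model(model):
    # Stand-in for registering via MLflow with GCS as the models sink, e.g.:
    # mlflow.sklearn.log_model(model, artifact_path="model")
    return "models:/rides/1"  # hypothetical registered-model URI

rides, durations = load_training_data()
model = train_model(rides, durations)
model_uri = register_model(model)
print(model_uri)
```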

Deploy the web service to Cloud Run (see services/rides/README.md):

export MLFLOW_MODEL_URI=
export MLFLOW_MODEL_VERSION=
export REGION=us-east1
make rides_build_deploy

Test the service:

URL=  # Cloud Run service URL
curl -X POST \
    -H 'Content-type: application/json' \
    -d '{"PULocationID": 43, "DOLocationID": 151, "trip_distance": 1.01}' \
    "${URL}"
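For scripting, the same request can be issued from Python using only the standard library. The helper name below and the assumption that the service returns a JSON body are illustrative; substitute your own Cloud Run service URL:

```python
import json
from urllib import request

def predict_duration(url, ride, timeout=10):
    """POST a ride payload to the /predict endpoint and return the parsed JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(ride).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

ride = {"PULocationID": 43, "DOLocationID": 151, "trip_distance": 1.01}
# result = predict_duration("https://<cloud-run-service-url>/predict", ride)  # needs a deployed service
print(json.dumps(ride))
```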