
MLOps project - course from DataTalksClub

Online web service for predicting taxi ride duration, based on public taxi trip datasets provided by the New York City Taxi & Limousine Commission (TLC).

Test the online service using the simple web page at the root of the Cloud Run service URL: https://rides-ucpkfmi6pq-ue.a.run.app/

Test the online service from the command line:

URL=https://rides-ucpkfmi6pq-ue.a.run.app/predict  # Cloud Run service URL
curl -X POST \
    -H 'Content-type: application/json' \
    -d '{"PULocationID": 43, "DOLocationID": 151, "trip_distance": 1.01}' \
    "${URL}"

General technical implementation:

  • The model is built as a scikit-learn pipeline that combines preprocessing with a linear regression model. Experiment tracking is performed through MLflow, and the model registry is handled by MLflow with Google Cloud Storage as the models sink. This development stage, from accessing datasets to training the model, is orchestrated with Prefect. Evidently AI, together with a metrics database, has been set up for the model-building pipeline.
  • The prediction service is deployed to Google Cloud: Cloud Build and Artifact Registry are used to containerize the service, and Cloud Run is the execution environment, exposing it as a Flask web service that uses MLflow to access the model registry sink.
  • All cloud resources are provisioned with Terraform, and most reproducibility steps are automated with Bash scripts.
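As a rough illustration of the first point, the model can be pictured as a scikit-learn pipeline that vectorizes the ride features named in the prediction payload (PULocationID, DOLocationID, trip_distance) and fits a linear regression. This is a sketch, not the project's actual training code: the training rows below are made up, and the MLflow call is shown only as a comment.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy training rows; feature names follow the service's prediction payload.
rides = [
    {"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.01},
    {"PULocationID": "138", "DOLocationID": "236", "trip_distance": 9.30},
]
durations = [7.5, 32.0]  # made-up targets, in minutes

# Preprocessing (one-hot encoding via DictVectorizer) + linear regression in one pipeline.
pipeline = make_pipeline(DictVectorizer(), LinearRegression())
pipeline.fit(rides, durations)

pred = pipeline.predict([{"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.0}])
print(float(pred[0]))

# In the project, the fitted pipeline would then be logged to MLflow, e.g.:
# mlflow.sklearn.log_model(pipeline, artifact_path="model")
```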

Up and running (reproducibility)

Cloud resources

Prerequisites:

  • Unix-like system with the following CLI tools: bash, gcloud, jq, make, terraform

Provision cloud infrastructure on an existing Google Cloud project:

  • if using Cloud Shell:
    make terraform_apply
  • if not using Cloud Shell:
    gcloud auth application-default login
    
    GCP_PROJECT_ID=
    gcloud config set project "${GCP_PROJECT_ID}"
    
    make terraform_apply

Share the generated output.json file at infra/gcp/terraform/ if using another computer/instance for development; this enables development without direct access to the Terraform state. Copy it to the same location on the development machine.

Development

Prerequisites:

  • Unix-like system with the following CLI tools: bash, gcloud, jq, make
  • Docker Compose
  • Python>=3.8 distribution

Set up access to Google Cloud if not already configured:

gcloud auth application-default login

GCP_PROJECT_ID=
gcloud config set project "${GCP_PROJECT_ID}"

MLflow server:

# Using default python3 >=3.8
make mlflow_server

# Using custom Python interpreter for compatibility
make mlflow_server PYTHON_BASE_INTERPRETER=python3.8

Metrics database and Prefect server:

make compose_up

Run ML pipelines:
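The pipeline commands themselves live in the repository; as a hypothetical sketch of the stages such a training pipeline covers (in the project these steps are orchestrated as Prefect tasks/flows, with MLflow handling tracking and registry), the helper names and data below are illustrative only:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

def load_training_data():
    # Stand-in for reading a NYC TLC trips dataset; rows are made up.
    rides = [
        {"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.01},
        {"PULocationID": "138", "DOLocationID": "236", "trip_distance": 9.30},
    ]
    durations = [7.5, 32.0]
    return rides, durations

def train_model(rides, durations):
    model = make_pipeline(DictVectorizer(), LinearRegression())
    model.fit(rides, durations)
    return model

def register_model(model):
    # Stand-in for registering via MLflow with GCS as the models sink, e.g.:
    # mlflow.sklearn.log_model(model, artifact_path="model")
    return "models:/rides/1"  # hypothetical registered-model URI

rides, durations = load_training_data()
model = train_model(rides, durations)
model_uri = register_model(model)
print(model_uri)
```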

Deploy the web service to Cloud Run (see services/rides/README.md):

export MLFLOW_MODEL_URI=
export MLFLOW_MODEL_VERSION=
export REGION=us-east1
make rides_build_deploy

Test the service:

URL=  # Cloud Run service URL
curl -X POST \
    -H 'Content-type: application/json' \
    -d '{"PULocationID": 43, "DOLocationID": 151, "trip_distance": 1.01}' \
    "${URL}"
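For scripting, the same request can be issued from Python using only the standard library. The helper name below and the assumption that the service returns a JSON body are illustrative; substitute your own Cloud Run service URL:

```python
import json
from urllib import request

def predict_duration(url, ride, timeout=10):
    """POST a ride payload to the /predict endpoint and return the parsed JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(ride).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

ride = {"PULocationID": 43, "DOLocationID": 151, "trip_distance": 1.01}
# result = predict_duration("https://<cloud-run-service-url>/predict", ride)  # needs a deployed service
print(json.dumps(ride))
```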