Devalent/aws-realtime-predictions


Automated Predictions with Machine Learning on AWS

An end-to-end example of a serverless machine learning pipeline for multiclass classification on AWS with SageMaker Pipelines, Data Wrangler, Athena and XGBoost. See this blog post for details.

Prerequisites

  • Node.js
  • Python
  • AWS CLI
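Before running the scripts, you can confirm that the prerequisites are on your PATH with a small shell check. This is a minimal sketch: check_prerequisites is a hypothetical helper, and it assumes Python is invoked as python3 (on some systems the command is python).

```bash
# Hypothetical helper: report each prerequisite as found or missing on PATH.
# Assumption: Python is invoked as python3 (some systems use python instead).
check_prerequisites() {
  # Fall back to the tools listed above when no arguments are given.
  if [ "$#" -eq 0 ]; then
    set -- node python3 aws
  fi
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  # Non-zero exit status if anything is missing.
  return "$status"
}
```

Run check_prerequisites with no arguments to check node, python3 and aws, or pass other command names to check those instead.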

Optional:

Installation

Before you proceed, set the MAXMIND_LICENSE_KEY environment variable to a valid MaxMind license key. If it is not set, IP address lookup will be disabled.
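One way to set the key is to export it in the shell session you will use for the installation. The value below is a placeholder, and maxmind_status is a hypothetical helper that only checks whether the variable is non-empty, not whether the key is valid.

```bash
# Placeholder value: replace it with your own MaxMind license key.
export MAXMIND_LICENSE_KEY="your-maxmind-license-key"

# Hypothetical helper: report whether the key is set in the current shell
# (it only checks that the variable is non-empty, not that the key is valid).
maxmind_status() {
  if [ -n "${MAXMIND_LICENSE_KEY:-}" ]; then
    echo "MAXMIND_LICENSE_KEY is set"
  else
    echo "MAXMIND_LICENSE_KEY is not set: IP address lookup will be disabled"
  fi
}

maxmind_status
```

Use export (not a plain assignment) and run bash init.sh from the same session: only exported variables are passed to child processes such as the install script.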

Install all required dependencies with the following command:

bash init.sh

Deployment

The following command deploys all resources and launches an inference server:

bash deploy.sh

The deployed infrastructure is serverless and incurs no hourly costs while idle, except for the inference server, which costs $0.056 per hour. Consider shutting down the inference server when you don't need it (see npm run stop below). Shutting down the server does not remove any data.
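To see what leaving the inference server running adds up to, the hourly rate can be multiplied by the running time. This sketch uses the $0.056 per hour figure quoted above (actual AWS pricing may change) and assumes an average month of 730 hours; inference_cost is a hypothetical helper.

```bash
# Hypothetical helper: estimated inference server cost in USD for N hours,
# based on the $0.056/hour rate quoted in this README.
inference_cost() {
  awk -v hours="$1" 'BEGIN { printf "%.2f\n", hours * 0.056 }'
}

inference_cost 24    # one day; prints 1.34
inference_cost 730   # about one month (730 h average); prints 40.88
```

In other words, forgetting to run npm run stop costs roughly $1.34 per day, or about $40.88 over an average month.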

All resources are deployed to the Oregon region (us-west-2) and are managed by three CloudFormation stacks.
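To check what has been deployed, you can list the CloudFormation stacks in us-west-2 with the AWS CLI. The stack names are not given in this README, so this sketch lists every stack that finished creating or updating; the output may also include stacks unrelated to this project, such as the CDK bootstrap stack. list_deployed_stacks is a hypothetical helper.

```bash
# Hypothetical helper: print the names of stacks in us-west-2 whose
# status is CREATE_COMPLETE or UPDATE_COMPLETE.
list_deployed_stacks() {
  aws cloudformation list-stacks \
    --region us-west-2 \
    --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE \
    --query 'StackSummaries[].StackName' \
    --output text
}
```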

Execution

To start an execution of the ML pipeline, run:

bash invoke.sh

It returns an AWS Console URL for the Step Functions pipeline, which you can use to track the execution. You can also go to SageMaker and launch the Studio application to follow the progress of the ML workflow.
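If you prefer the terminal to the console, you can poll the execution status with the AWS CLI. This is a sketch under assumptions: you pass the execution ARN, which the Step Functions console shows on the execution page, and POLL_SECONDS is a hypothetical setting for the polling interval (30 seconds by default). wait_for_execution is a hypothetical helper.

```bash
# Hypothetical helper: poll a Step Functions execution until it leaves the
# RUNNING state; exit status is 0 only if the execution SUCCEEDED.
wait_for_execution() {
  local arn="$1" status
  while :; do
    # describe-execution reports a status such as RUNNING, SUCCEEDED or FAILED.
    status="$(aws stepfunctions describe-execution \
      --region us-west-2 \
      --execution-arn "$arn" \
      --query status \
      --output text)"
    echo "status: $status"
    if [ "$status" != "RUNNING" ]; then
      break
    fi
    sleep "${POLL_SECONDS:-30}"
  done
  [ "$status" = "SUCCEEDED" ]
}
```

If the CLI call itself fails, the status is empty, so the loop stops and the function returns a non-zero exit status instead of polling forever.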

Project structure

Test data that will be deployed to S3.

AWS CDK project with infrastructure definition.

npm commands:

  • npm run bootstrap - bootstrap the AWS CDK environment (required only when deploying for the first time);
  • npm run deploy - deploy the main infrastructure (no hourly costs);
  • npm run runtime - deploy the runtime infrastructure (hourly costs incurred);
  • npm run stop - delete the runtime infrastructure (the data will be retained).
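The commands above can be grouped into a first-time deployment and a stop/resume cycle for the inference server. This is a sketch with assumptions: the functions are hypothetical, they must be run from the AWS CDK project folder, the bootstrap, deploy, runtime order is inferred from the descriptions above, and deploy.sh may already run some of these steps for you.

```bash
# Hypothetical helpers; run them from the AWS CDK project folder.

# First deployment: bootstrap CDK, deploy the main infrastructure,
# then deploy the runtime infrastructure (hourly costs start here).
first_deployment() {
  npm run bootstrap && npm run deploy && npm run runtime
}

# Stop the inference server to avoid hourly costs; data is retained.
pause_inference() {
  npm run stop
}

# Bring the inference server back by redeploying the runtime infrastructure.
resume_inference() {
  npm run runtime
}
```

Because first_deployment chains the commands with &&, a failed step stops the sequence instead of deploying on top of a broken environment.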

SageMaker pipeline definitions and Python scripts.

Serverless.js project with a Lambda API.

npm commands:

  • npm run deploy - deploy the service.