Automated Predictions with Machine Learning on AWS

An end-to-end example of a serverless machine learning pipeline for multiclass classification on AWS with SageMaker Pipelines, Data Wrangler, Athena and XGBoost. See this blog post for details.

Prerequisites

Node.js
Python
AWS CLI

Optional:

MaxMind license key (free)
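A quick way to confirm that the prerequisites are on your PATH is sketched below. The helper name `missing_tools` is hypothetical (it is not part of this repository), and the executable names `node`, `python3` and `aws` are assumptions about how the tools are installed; on some systems Python is available as `python` instead.

```shell
# Print the name of every command passed as an argument that is not on PATH.
# missing_tools is a hypothetical helper, not part of this repository.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Assumed executable names; adjust them to your installation.
missing_tools node python3 aws
```

If nothing is printed, all three commands were found.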

Installation

Before you proceed, set the MAXMIND_LICENSE_KEY environment variable to a valid license key. If the variable is not set, IP address lookup is disabled.

Install all required dependencies with the command:

bash init.sh
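As an illustration, the license key can be exported in the current shell before running the installer. This is only a sketch: `maxmind_status` is a hypothetical helper that reports whether the variable is non-empty, and `your-license-key` is a placeholder, not a real key.

```shell
# Report whether IP address lookup would be enabled, based only on
# whether MAXMIND_LICENSE_KEY is non-empty (hypothetical helper).
maxmind_status() {
  if [ -n "${MAXMIND_LICENSE_KEY:-}" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

# Typical use, with a placeholder value:
#   export MAXMIND_LICENSE_KEY="your-license-key"
#   bash init.sh
echo "IP address lookup: $(maxmind_status)"
```

The helper does not validate the key itself; it only shows which mode to expect.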

Deployment

The following command deploys all resources and launches an inference server:

bash deploy.sh

The deployed infrastructure is serverless and incurs no hourly costs when idle, except for the inference server, which costs $0.056 per hour. Consider shutting down the inference server when you don't need it (see npm run stop below). Shutting down the server does not remove any data.
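To put the hourly rate in perspective, the running cost is the rate multiplied by the number of hours the server stays up. The sketch below uses the $0.056 figure quoted above; the helper name `estimate_cost` and the 720-hour month (30 days) are illustrative assumptions, and actual AWS pricing may differ.

```shell
# estimate_cost RATE HOURS: print RATE * HOURS rounded to two decimals.
# Hypothetical helper for a rough estimate only.
estimate_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

estimate_cost 0.056 24    # one day
estimate_cost 0.056 720   # 30 days, left running continuously
```

Under these assumptions the two calls print 1.34 and 40.32 (US dollars).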

All resources are deployed to the Oregon region (us-west-2) and are managed by three CloudFormation stacks.
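One way to check the result of a deployment is to list the CloudFormation stacks in the region with the AWS CLI. The stack names are not stated here, so this sketch lists every stack in the region rather than guessing them; `list_stacks` is a hypothetical helper name, and the command requires configured AWS credentials.

```shell
# List every CloudFormation stack in a region together with its status.
# list_stacks is a hypothetical helper; the region defaults to us-west-2.
list_stacks() {
  aws cloudformation describe-stacks \
    --region "${1:-us-west-2}" \
    --query 'Stacks[].[StackName,StackStatus]' \
    --output table
}

# Usage (requires configured AWS credentials):
#   list_stacks
```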

Execution

To start the ML pipeline execution, run the command:

bash invoke.sh

The command returns an AWS Console URL for the Step Functions pipeline, which you can use to track the execution. To check the ML workflow progress, you can also open SageMaker and launch the Studio application.

Project structure

data

Test data that will be deployed to S3.

infrastructure

AWS CDK project with infrastructure definition.

npm commands:

npm run bootstrap - bootstrap the AWS CDK (required when deploying for the first time);
npm run deploy - deploy the main infrastructure (no hourly costs);
npm run runtime - deploy the runtime infrastructure (hourly costs incurred);
npm run stop - delete the runtime infrastructure (the data will be retained).
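The commands above are run from inside the infrastructure directory. A small wrapper like the following lets you call them from the repository root, for example to stop the inference server between experiments. `infra_npm` is a hypothetical name, and the sketch assumes the dependencies have already been installed.

```shell
# Run one of the infrastructure npm scripts from the repository root.
# infra_npm is a hypothetical wrapper; the subshell keeps the caller's
# working directory unchanged.
infra_npm() {
  (cd infrastructure && npm run "$1")
}

# Usage:
#   infra_npm stop      # delete the runtime infrastructure
#   infra_npm runtime   # deploy it again when needed
```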

pipeline

SageMaker pipeline definitions and Python scripts.

service

Serverless.js project with a Lambda API.

npm commands:

npm run deploy - deploy the service.

Devalent/aws-realtime-predictions
