
A machine learning at scale demo on flight delay prediction. The project includes an exploration of a series of data transformation and ML pipelines in Apache Spark (via Databricks).


chenliny-zz/Flight_Delay_Prediction


Machine Learning At Scale (Spark, Spark SQL, Spark ML)

Access the project via Databricks here


Flight delays disrupt airline and airport scheduling, inconveniencing passengers and causing substantial economic losses. As a result, there is growing interest in predicting delays in advance in order to optimize operations and improve customer satisfaction. The objective of this playground project is to predict flight departure delays, at scale, two hours before departure. The project explores a series of data transformation and ML pipelines in Apache Spark (using Databricks) and concludes with some challenges faced along the way and key lessons learned.
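Predicting two hours ahead means every feature must come from information already available at prediction time. The following is a minimal pure-Python sketch of that cutoff. It assumes the prediction is made exactly two hours before the scheduled departure and that a flight counts as delayed when it departs 15 or more minutes late. The threshold, function names, and sample data are hypothetical and are not taken from the project's notebook.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Assumption: the prediction is made exactly two hours before scheduled departure.
PREDICTION_LEAD = timedelta(hours=2)

# Assumption (hypothetical threshold): a departure 15+ minutes late counts as delayed.
DELAY_THRESHOLD_MINUTES = 15


def delay_label(departure_delay_minutes: float) -> int:
    """Binary target: 1 if the flight left at least the threshold late, else 0."""
    return int(departure_delay_minutes >= DELAY_THRESHOLD_MINUTES)


def latest_usable_observation(
    scheduled_departure: datetime, observation_times: Iterable[datetime]
) -> Optional[datetime]:
    """Return the most recent observation already known at prediction time, or None.

    Anything recorded after (departure - 2h) would leak future information
    into the features, so it is discarded.
    """
    cutoff = scheduled_departure - PREDICTION_LEAD
    usable = [t for t in observation_times if t <= cutoff]
    return max(usable) if usable else None


if __name__ == "__main__":
    departure = datetime(2015, 1, 1, 14, 0)
    observations = [
        datetime(2015, 1, 1, 11, 0),
        datetime(2015, 1, 1, 12, 0),
        datetime(2015, 1, 1, 12, 30),  # too late: after the 12:00 cutoff
    ]
    print(latest_usable_observation(departure, observations))  # 2015-01-01 12:00:00
    print(delay_label(22))  # 1
```

In Spark, the same rule would more likely be expressed as a timestamp condition in a join than as a per-row Python function. This sketch only illustrates the leakage-avoidance logic.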

The Databricks notebook is connected to AWS, where Databricks can create and manage compute and VPC resources. In the notebook, data was accessed through a mounted AWS S3 bucket.

Datasets used in the project include the following:

The project can be directly accessed via Spark Playground - Flight Delay Prediction. This repository also contains the .dbc and .py versions of the Databricks notebook.
