
The goal is to train a linear regression model that predicts Deerfoot Trail commute times from weather and accident conditions, using Spark RDDs and MLlib.


vaibhav50596/DeerfootTrailAnalysis


Deerfoot Trail Analysis using Spark and Linear Regression (PySpark MLlib)

Project Abstract

The Deerfoot Trail commute analysis project calculates commute time statistics and predicts commute times from a fixed set of inputs. It also measures how accurately a machine learning model predicts commute times from those inputs. The analysis is of interest because it uses readily available public data to produce predictions that could benefit a large number of people. Although it covers a fixed time period for one specific roadway, it could be extended to predict commute times for major roadways across the country in real time. Besides giving the project team an opportunity to learn and apply Spark concepts, the results could have real-world applications in transportation forecasting, planning, and safety.
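As a rough illustration of the accuracy question raised above, the sketch below fits a one-feature least-squares line and reports the root-mean-square error (RMSE) of its predictions. It is plain Python rather than Spark MLlib so that it runs without a Spark installation, and RMSE is only one possible accuracy measure; the project's own metric is not stated here. The feature name `snow_cm`, the toy numbers, and the helpers `fit_line` and `rmse` are all hypothetical and are not taken from the project's data or notebooks.

```python
import math


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up toy values for illustration only (not project data).
snow_cm = [0.0, 1.0, 2.0, 3.0, 4.0]
commute_min = [30.0, 33.0, 33.0, 37.0, 37.0]

slope, intercept = fit_line(snow_cm, commute_min)
predictions = [slope * x + intercept for x in snow_cm]
error = rmse(commute_min, predictions)

print(f"slope={slope:.2f} intercept={intercept:.2f} rmse={error:.2f}")
```

In practice the error would be computed on held-out data rather than on the training points, so that it reflects how well the model generalizes.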

How to run?

The project consists of Jupyter notebook files, so Jupyter must be installed on your machine.

ENSF612_Spark_Project_1.ipynb requires the 'deerfoot.csv' file.

ENSF612_Spark_Project_2.ipynb requires the 'deerfoot_part2-1.csv', 'eng-daily-01012013-12312013.csv' and 'eng-daily-01012014-12312014.csv' files.
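Before opening the notebooks, it can help to confirm that the input files are in place. The following is a minimal sketch that assumes the CSV files sit in the same directory the notebooks are run from (the location is not specified above); the helper name `missing_inputs` and the `data_dir` parameter are hypothetical.

```python
from pathlib import Path

# Input files each notebook needs, as listed above.
REQUIRED_FILES = {
    "ENSF612_Spark_Project_1.ipynb": ["deerfoot.csv"],
    "ENSF612_Spark_Project_2.ipynb": [
        "deerfoot_part2-1.csv",
        "eng-daily-01012013-12312013.csv",
        "eng-daily-01012014-12312014.csv",
    ],
}


def missing_inputs(data_dir="."):
    """Return {notebook: [missing CSV names]} for files absent from data_dir."""
    base = Path(data_dir)
    missing = {}
    for notebook, csv_names in REQUIRED_FILES.items():
        absent = [name for name in csv_names if not (base / name).is_file()]
        if absent:
            missing[notebook] = absent
    return missing


if __name__ == "__main__":
    problems = missing_inputs()
    for notebook, absent in problems.items():
        print(f"{notebook} is missing: {', '.join(absent)}")
    if not problems:
        print("All input files found.")
```

If the data lives elsewhere, pass that directory to `missing_inputs`, for example `missing_inputs("data")`.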
