Vision Zero

GitHub repo: https://github.com/thoughts1053/visionzero

Directory Explanations

fetch: Python code to programmatically download our data sources

profile: Java MapReduce code to profile each data source

classes and jar included for Java 1.7
Build instructions can be found inside corresponding etl directory

etl: Java MapReduce code to filter and clean each data source

classes and jar included for Java 1.7
Build instructions can be found inside corresponding etl directory

hive: Hive analytic scripts for each data source

schema: sets up the schema for each data source
by_month: creates a monthly aggregate table for each data source so that later queries run faster and are quicker to iterate on
running_averages: computes 3-, 6-, and 12-month running averages for each field
percent_change: computes the change in each field from the previous month
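
The two derived metrics above can be illustrated outside Hive. The following is a minimal Python sketch, not the repository's Hive scripts: it assumes the running average is a trailing window over the monthly totals and that the change from the previous month is expressed as a percentage. The function names and sample numbers are made up for illustration.

```python
def running_average(values, window):
    """Trailing mean over the last `window` months.

    Months that do not yet have `window` months of history get None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def percent_change(values):
    """Percent change of each month relative to the previous month.

    The first month has no predecessor, and a zero predecessor has no
    defined percent change, so both cases yield None.
    """
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(None if prev == 0 else (cur - prev) * 100.0 / prev)
    return out


if __name__ == "__main__":
    monthly_totals = [100, 110, 121, 90]  # made-up monthly counts
    for window in (3, 6, 12):
        print(window, running_average(monthly_totals, window))
    print(percent_change(monthly_totals))
```

With only four months of sample data, the 6- and 12-month windows return None for every month, which is why the aggregate by_month tables need enough history before these metrics become meaningful.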

results

local downloads of the results from the hive scripts
includes screenshots of running the analytics and etl
includes tableau file and image exports of charts for data visualization

web: allows for realtime data access and visualization

Uses ssh and impyla to connect to Dumbo and query Impala
Option to run analytics in realtime and see output
Option to download results of analytics to csv
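
As a rough illustration of the query-then-download flow described above, here is a minimal sketch using impyla's DB-API interface. It is not the code in the web directory. It assumes an SSH tunnel already forwards a local port to an Impala daemon (the host, port, and the helper names `query_impala` and `to_csv` are all assumptions), and any table name passed in the SQL is up to the caller.

```python
import csv
import io


def query_impala(sql, host="localhost", port=21050):
    """Run one query through impyla and return (column_names, rows).

    Assumes `host:port` reaches an Impala daemon, for example through an
    SSH tunnel opened beforehand. impyla is imported lazily so that the
    CSV helper below can be used without it installed.
    """
    from impala.dbapi import connect

    conn = connect(host=host, port=port)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        header = [col[0] for col in cur.description]
        return header, cur.fetchall()
    finally:
        conn.close()


def to_csv(header, rows):
    """Serialize a header and result rows to CSV text for download."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

A caller would pass the result of `query_impala(...)` straight into `to_csv(*result)` and serve the returned text as a file download.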

Steps to Reproduce from Scratch

Fetch data

python fetch/collisions.py
python fetch/violations.py
python fetch/tlc.py
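
For readers who have not opened the fetch scripts, the sketch below shows the general shape such a downloader might take. It is an assumption about the approach, not the contents of fetch/collisions.py or its siblings: the `download` helper, its `opener` parameter, and any URL passed to it are hypothetical.

```python
import shutil
import urllib.request


def download(url, dest_path, opener=urllib.request.urlopen):
    """Stream the response body at `url` into `dest_path`.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    with opener(url) as response, open(dest_path, "wb") as out:
        # copyfileobj streams in chunks, so large exports are never
        # held in memory all at once.
        shutil.copyfileobj(response, out)
    return dest_path
```

Streaming to disk matters here because the raw data sources can be large, and the files are later copied into HDFS as-is.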

Put data into hdfs

hdfs dfs -mkdir visionzero
hdfs dfs -put data visionzero

Extract, transform, and load the data

Build the jar files (see the etl directory for instructions)
hadoop jar etl/tlc/TlcEtlDriver.jar visionzero/data/original/tlc visionzero/data/formatted/tlc
hadoop jar etl/violations/ViolationsEtlDriver.jar visionzero/data/original/violations visionzero/data/formatted/violations
hadoop jar etl/collisions/CollisionsEtlDriver.jar visionzero/data/original/collisions visionzero/data/formatted/collisions
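
The three ETL commands above follow one naming pattern, so they can be generated rather than typed. The helper below is a small sketch under that assumption; `etl_command` and `SOURCES` are illustrative names, and the script only prints the commands instead of running them.

```python
SOURCES = ["tlc", "violations", "collisions"]


def etl_command(source):
    """Build the `hadoop jar` argument list for one data source.

    Mirrors the paths used in the commands above: the jar lives in
    etl/<source>/ and is named <Source>EtlDriver.jar.
    """
    jar = "etl/{0}/{1}EtlDriver.jar".format(source, source.capitalize())
    return [
        "hadoop",
        "jar",
        jar,
        "visionzero/data/original/" + source,
        "visionzero/data/formatted/" + source,
    ]


if __name__ == "__main__":
    for source in SOURCES:
        print(" ".join(etl_command(source)))
```

On a machine with Hadoop configured, each list could be passed to `subprocess.run(cmd, check=True)` so that a failed job stops the loop.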

Use Hive to Analyze the Data

Update {base_path} in the schema files
Connect to Hive
Create database "vision_zero"
Run the schema scripts:
- hive/tlc/schema.q
- hive/violations/schema.q
- hive/collisions/schema.q
Run the aggregate data scripts:
- hive/tlc/tlc_by_month.q
- hive/violations/violations_by_month.q
- hive/collisions/collisions_by_month.q
View output and run other scripts
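
The Hive steps above can be sketched as a small helper that fills in the `{base_path}` placeholder and lists the scripts in the order they should run (all schema scripts first, then the by_month aggregates). This is an illustration only: the function names are hypothetical, and the sample script text in the demo is invented, not copied from the repository's schema files.

```python
SOURCES = ["tlc", "violations", "collisions"]


def fill_base_path(script_text, base_path):
    """Replace every {base_path} placeholder in a Hive script's text."""
    return script_text.replace("{base_path}", base_path)


def hive_script_order(sources):
    """Return script paths in run order: schemas first, then aggregates."""
    schemas = ["hive/{0}/schema.q".format(s) for s in sources]
    aggregates = ["hive/{0}/{0}_by_month.q".format(s) for s in sources]
    return schemas + aggregates


if __name__ == "__main__":
    # Invented one-line stand-in for a schema file, for demonstration only.
    sample = "LOCATION '{base_path}/data/formatted/tlc';"
    print(fill_base_path(sample, "/user/someone/visionzero"))
    for path in hive_script_order(SOURCES):
        print(path)
```

Each listed path could then be run with the Hive CLI's `-f` option after the "vision_zero" database has been created.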

Realtime Web View

Follow the README in the web directory to set up the server
Run the local server
Download results via the browser or view the data in realtime
