Public repository containing the materials and slides for Breuninger's data2day 2022 presentation.

e-breuninger/data2day-2022

data2day-2022

Public repository containing the material for the 2022 data2day conference.

More on the program: From PoCs to Large Scale ML Operationalization Covering the End-to-End Pipeline.

Developers

This repository is owned and maintained by the E-Breuninger Developer Team.

For any feedback or inquiries related to this repository, you can contact Olivier Bénard (Data Software Engineer).

Dependencies

The dependencies are managed via poetry. We recommend using this tool and integrating it into your workflow. If you decide otherwise, the necessary requirements are also listed in the requirements.txt file.

Note: You might have to switch your Python version. We recommend pyenv as a Python version manager, which can be installed via brew install pyenv.

Quick Start

To install all the dependencies and rapidly start getting your hands dirty:

  1. Create a settings.toml file based on the following template:
[default]
LOG_LEVEL = "DEBUG"
LATITUDE = "<google-map-latitude>"
LONGITUDE = "<google-map-longitude>"
APP_PATH = "/absolute/path/to/the/local/repository/"
  2. Create a .secrets.toml file based on the following template (you can leave the default value if you have no key):
[default]
google_map_api_key = "<your-google-map-api-key>"
  3. Install all the dependencies in the virtual environment via poetry:

     poetry install
    
  4. You are ready to go and can start the Jupyter notebook kernel:

     make notebook
    

The only thing left to do is to navigate to notebooks/ and play with the notebooks.

Bonus: If you want to publish some changes, you first need to install pre-commit:

    make pre-commit-install

This helps ensure that the code you push meets the project's development standards and that the GitHub CI/CD pipeline succeeds, i.e. that your code will be accepted.

Notes:

  • If you do not have poetry already, install it via brew install poetry.
  • The Google Maps API key is used to display the weather stations on Google Maps. However, you do not need it: without a key, or with an invalid one, the developer mode is activated by default. It offers fewer features but still does the job.

Architecture

  • The data2day_2022/ folder contains the reusable parts of the code, such as the SQL queries and the helpers package.
  • The datasets/ folder contains the template you have to fill in to make the forecast.
  • The notebooks/ folder contains a couple of Jupyter notebooks that hold the main logic of the code.
  • The results/ folder contains the results generated by the notebooks.
  • The slides/ folder contains the anonymised presentation in .pdf format.
  • The tests/ folder contains a couple of unit tests for the code.
  • The .pre-commit-config.yaml file defines the checks executed at commit time, before the code can be pushed.
  • The Makefile contains a series of frequently used commands, e.g. make check or make notebook.
  • The .secrets.toml and settings.toml are parametrisation files containing the variables used in the code.

Running your own forecast

  • You can parametrise the series you want to predict using the datasets/customer_frequentation.csv file. Fill it with your own data, following this template:
date quantity
<YYYY-MM-DD> <float>
  • Rainfall data for Stuttgart in 2018 has been retrieved and collected in the results/weather_prpc.csv file. You can, however, query the initial tables on BigQuery using notebooks/weather_data_on_biqguery.ipynb. Results will be saved under the results/ folder.
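
Before running the notebooks on your own data, it can help to check that the file matches the date/quantity template. The following is a minimal sketch under stated assumptions: the file is assumed to be comma-separated with a header row (the delimiter is not specified above, so it is exposed as a parameter), the function name parse_series is hypothetical, and the sample values are made up for illustration.

```python
import csv
import io
from datetime import date


def parse_series(csv_text: str, delimiter: str = ",") -> list[tuple[date, float]]:
    """Parse and validate rows of the form <YYYY-MM-DD>, <float>.

    Hypothetical helper for illustration, not part of this repository.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    if reader.fieldnames != ["date", "quantity"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")

    rows = []
    # Data starts on line 2, right after the header.
    for line_number, row in enumerate(reader, start=2):
        try:
            rows.append((date.fromisoformat(row["date"]), float(row["quantity"])))
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_number}: {exc}") from exc
    return rows


# Made-up sample data following the template.
sample = "date,quantity\n2018-01-01,120.5\n2018-01-02,98\n"
series = parse_series(sample)
print(len(series), "rows parsed")
```

To check a real file, read it first, for example with pathlib.Path("datasets/customer_frequentation.csv").read_text(), and pass the text to parse_series.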

Troubleshooting

This section is empty so far. Should you encounter an issue not covered in this documentation, please contact us.