This is a part of the course TDT4173 - Machine Learning at NTNU.

nicklasbekkevold/covid-19-clustering

 
 

COVID-19 Clustering

This project is part of the course TDT4173 - Machine Learning at NTNU. The project proposal is available in project_proposal.md.

Clustering methods being evaluated:

  • Agglomerative Clustering
  • BIRCH
  • DBSCAN
  • k-Means
  • Mean Shift
  • Spectral Clustering
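
To make one of the listed methods concrete, the following is a minimal pure-Python sketch of k-Means (Lloyd's algorithm). It is an illustration only: the function names `k_means` and `squared_distance` are hypothetical, and the project itself may well rely on a library implementation instead.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, n_iter=100, seed=0):
    """Cluster `points` (a list of tuples of floats) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Made-up toy data: two well-separated groups of three points each.
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = k_means(toy, k=2)
    print(labels, centroids)
```

Unlike k-Means, methods such as DBSCAN and Mean Shift do not take the number of clusters as an input, which is one reason a comparison across methods is informative.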

Data set

This project uses the "Mortality Risk of COVID-19" dataset from Our World in Data (https://ourworldindata.org/mortality-risk-covid). The dataset contains country-by-country data on the mortality risk of COVID-19.

Installation guide

Prerequisites

  • Python (version 3.8 or higher)
  • A package manager. We recommend conda or pip.

Installing dependencies

All project dependencies are listed in requirements.txt, each pinned to a specific version, and can be installed with a single command.

If you are using conda, run the following on the command line:

conda install --file requirements.txt

If you are using pip, run the following on the command line:

pip install -r requirements.txt
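
Because the dependencies are meant to be pinned, a quick check can confirm that no entry has drifted to an open version range. The sketch below is a hedged illustration: `unpinned_requirements` is a hypothetical helper that is not part of the repository, and it assumes one plain `name==version` entry per line, with no extras, environment markers, or URLs.

```python
import re

# A requirement counts as pinned if it has the form "name==version".
# Assumption: one requirement per line, no extras, markers, or URLs.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.*+!_-]+$")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with '=='."""
    unpinned = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

From the repository root it could be used as `unpinned_requirements(Path("requirements.txt").read_text())`, with `Path` imported from `pathlib`; an empty list means every entry is pinned.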

Running scripts

Important! All Python scripts assume that they are run from the repository root. Do not run them from a subdirectory (such as src).

👍 Example of correct usage:

.../covid-19-clustering python src/preprocessing.py

👎 Example of incorrect usage:

.../covid-19-clustering/src python preprocessing.py
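
A likely reason for this rule (an assumption, since the scripts are not shown here) is that they open files through relative paths such as the data folder, and relative paths resolve against the current working directory. The following sketch shows one way a script could fail early with a clear message. `ROOT_MARKERS`, `is_repo_root`, and `require_repo_root` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Entries that the file structure in this README places at the repository root.
ROOT_MARKERS = ("src", "requirements.txt")


def is_repo_root(directory):
    """Return True if `directory` contains every expected root-level entry."""
    directory = Path(directory)
    return all((directory / marker).exists() for marker in ROOT_MARKERS)


def require_repo_root():
    """Exit with a helpful message when the working directory is not the root."""
    if not is_repo_root(Path.cwd()):
        raise SystemExit(
            "Run this script from the repository root, "
            "e.g. python src/preprocessing.py"
        )
```

Calling `require_repo_root()` at the top of a script would turn a confusing "file not found" error into an explicit instruction.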

File structure

The notebooks folder contains Jupyter notebooks. The data folder contains raw CSV files from OWID (Our World in Data), as well as cleaned and processed files.

📂covid-19-clustering
┣ 📁.github (CI config)
┣ 📁.vscode (vscode editor config)
┣ 📁data (raw, clean, and processed csv files)
┣ 📁models (persisted models with metadata)
┣ 📁notebooks (jupyter notebooks)
┣ 📁results (clustering assignment and metrics for each model as well as plots)
┣ 📁src
┃ ┣ 📂evaluation (Python scripts for comparing models)
┃ ┣ 📂model (Python scripts for training and persisting the models)
┃ ┣ 📂visualization (Python scripts for making visualizations)
┃ ┣ 📜preprocessing.py (data cleaning and preprocessing; same steps as the EDA in 📁notebooks)
┃ ┣ 📜utils.py
┣ 📁tests
┣ 📜.flake8
┣ 📜.gitignore
┣ 📜project_proposal.md
┣ 📜README.md (this file)
┣ 📜requirements.txt (3rd-party dependencies / packages)
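
The models folder is described as holding persisted models with metadata. As a hedged illustration of what that can look like, the sketch below pickles a model and writes a JSON metadata file next to it. The function names, file naming scheme, and metadata fields are all assumptions made for this example; the repository's actual format is not documented here.

```python
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path


def save_model(model, name, params, directory="models"):
    """Pickle `model` and write a JSON metadata file next to it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    model_path = directory / f"{name}.pkl"
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    metadata = {
        "name": name,
        "params": params,  # hyperparameters used for training
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    (directory / f"{name}.json").write_text(json.dumps(metadata, indent=2))
    return model_path


def load_model(name, directory="models"):
    """Return the (model, metadata) pair written by save_model."""
    directory = Path(directory)
    with (directory / f"{name}.pkl").open("rb") as f:
        model = pickle.load(f)
    metadata = json.loads((directory / f"{name}.json").read_text())
    return model, metadata
```

Keeping the metadata in a separate, human-readable file makes it possible to inspect how a model was trained without unpickling it. Pickle files should only be loaded from trusted sources.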
