This project is part of the course TDT4173 - Machine Learning at NTNU. The project proposal is available in project_proposal.md.
Clustering methods being evaluated:
- Agglomerative Clustering
- BIRCH
- DBSCAN
- k-Means
- Mean Shift
- Spectral Clustering
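Assuming the models are built with scikit-learn (an assumption; the pinned versions live in requirements.txt), the six methods can be instantiated and compared on toy data roughly like this. The hyperparameters below are placeholders, not the values used in the project:

```python
import numpy as np
from sklearn.cluster import (
    AgglomerativeClustering, Birch, DBSCAN, KMeans, MeanShift, SpectralClustering
)
from sklearn.metrics import silhouette_score

# Toy data: two well-separated blobs, standing in for the real country features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])

# One instance per method under evaluation (hyperparameters are placeholders).
models = {
    "Agglomerative Clustering": AgglomerativeClustering(n_clusters=2),
    "BIRCH": Birch(n_clusters=2),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=3),
    "k-Means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "Mean Shift": MeanShift(),
    "Spectral Clustering": SpectralClustering(n_clusters=2, random_state=0),
}

for name, model in models.items():
    labels = model.fit_predict(X)
    print(f"{name}: {len(set(labels))} clusters, "
          f"silhouette = {silhouette_score(X, labels):.3f}")
```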
This project uses the "Mortality Risk of COVID-19" dataset from Our World in Data (https://ourworldindata.org/mortality-risk-covid). The dataset contains country-by-country data on the mortality risk of the COVID-19 pandemic.
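As a minimal sketch of what the OWID data looks like once loaded (a tiny inline sample stands in for the real CSV files in the data folder; the column names follow the OWID schema, but the numbers are made up):

```python
import pandas as pd

# Inline stand-in for the raw OWID export (values are illustrative only).
df = pd.DataFrame({
    "location": ["Norway", "Sweden"],
    "total_cases": [100_000, 200_000],
    "total_deaths": [500, 2_000],
})

# Case fatality rate: confirmed deaths divided by confirmed cases --
# one of the mortality-risk measures discussed on the OWID page.
df["case_fatality_rate"] = df["total_deaths"] / df["total_cases"]
print(df)
```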
All project dependencies are listed, pinned to specific versions, in requirements.txt and can easily be installed.
If you are using conda, run the following at the command-line:
conda install --file requirements.txt
If you are using pip, run the following at the command-line:
pip install -r requirements.txt
Important! All the Python scripts are assumed to be executed from the project root. Do not run them from a subdirectory (such as src).
👍 Example of correct usage:
.../covid-19-clustering$ python src/preprocessing.py
👎 Example of incorrect usage:
.../covid-19-clustering/src$ python preprocessing.py
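The reason for this constraint is presumably that the scripts open data files via paths relative to the working directory. A minimal sketch of how such a relative path resolves (the filename is hypothetical):

```python
from pathlib import Path

# A relative path such as "data/clean.csv" resolves against the *current
# working directory*, not against the script's own location -- so launching
# a script from src/ would look for src/data/clean.csv and fail.
relative = Path("data") / "clean.csv"
print(relative.resolve())  # absolute path under the directory you ran from
```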
The files in the notebooks folder are Jupyter notebooks. The data folder contains the raw CSV files from OWID, as well as the processed and cleaned files.
📂covid-19-clustering
┣ 📁.github (CI config)
┣ 📁.vscode (vscode editor config)
┣ 📁data (raw, clean, and processed csv files)
┣ 📁models (persisted models with metadata)
┣ 📁notebooks (jupyter notebooks)
┣ 📁results (clustering assignment and metrics for each model as well as plots)
┣ 📁src
┃ ┣ 📁evaluation (Python scripts for comparing models)
┃ ┣ 📁model (Python scripts for training and persisting the models)
┃ ┣ 📁visualization (Python scripts for making visualizations)
┃ ┣ 📜preprocessing.py (data cleaning and preprocessing; script version of the EDA notebook in 📁notebooks)
┃ ┣ 📜utils.py
┣ 📁tests
┣ 📜.flake8
┣ 📜.gitignore
┣ 📜project_proposal.md
┣ 📜README.md (this file)
┣ 📜requirements.txt (3rd-party dependencies / packages)
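As a rough illustration of what "persisted models with metadata" in the models folder could look like (a hedged sketch, not the actual format used by the scripts in src/model; the filenames and metadata fields are hypothetical):

```python
import json
from pathlib import Path

import joblib
import numpy as np
from sklearn.cluster import KMeans

# Fit a model on placeholder data (the real features come from data/).
X = np.random.default_rng(0).normal(size=(30, 2))
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Persist the fitted estimator plus a small JSON metadata sidecar.
out = Path("models")
out.mkdir(exist_ok=True)
joblib.dump(model, out / "kmeans.joblib")
(out / "kmeans.json").write_text(json.dumps({
    "model": "k-Means",
    "n_clusters": 3,
    "inertia": float(model.inertia_),
}))
```

A persisted model can then be reloaded with `joblib.load("models/kmeans.joblib")` for evaluation or plotting.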