
MLHEP-2021 Baseline

Dataset

The dataset is available on CoCalc in the folder /home/user/share/competition and on Kaggle.

The train and test folders contain images in .png format. The train filenames encode the particle energy, and the folder names give the particle class.

train
├── ER
└── NR

The most important part of each filename is the particle energy, which takes values in {1.0, 3.0, 6.0, 10.0, 20.0, 30.0} keV and is marked with a _keV suffix (see the parsing sketch after the listing below):

0.00012752991074573__CYGNO_60_40_ER_30_keV_930V_30cm_IDAO_iso_crop_hist_pic_run4_ev846;1.png
0.0001805024851534__CYGNO_60_40_ER_10_keV_930V_30cm_IDAO_iso_crop_hist_pic_run2_ev317;1.png
0.00013557143164375__CYGNO_60_40_ER_10_keV_930V_30cm_IDAO_iso_crop_hist_pic_run2_ev842;1.png
0.00019057084775703__CYGNO_60_40_ER_3_keV_930V_30cm_IDAO_iso_crop_hist_pic_run2_ev116;1.png
0.0001135022106767__CYGNO_60_40_ER_10_keV_930V_30cm_IDAO_iso_crop_hist_pic_run5_ev136;1.png
0.0001275016178883__CYGNO_60_40_ER_3_keV_930V_30cm_IDAO_iso_crop_hist_pic_run2_ev485;1.png
0.0001375808674508__CYGNO_60_40_ER_30_keV_930V_30cm_IDAO_iso_crop_hist_pic_run3_ev662;1.png
0.0011665058173393__CYGNO_60_40_ER_10_keV_930V_30cm_IDAO_iso_crop_hist_pic_run5_ev574;1.png
0.0011465791675372__CYGNO_60_40_ER_3_keV_930V_30cm_IDAO_iso_crop_hist_pic_run2_ev114;1.png
0.0011065850424555__CYGNO_60_40_ER_3_keV_930V_30cm_IDAO_iso_crop_hist_pic_run4_ev868;1.png
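As an illustration, the class and energy can be recovered from paths like the ones above. The following is a minimal parsing sketch, not part of the baseline code; the regular expression and helper name are just for illustration.

    import re
    from pathlib import Path

    # Minimal parsing sketch (not part of the baseline code): the class is read
    # from the parent folder name (ER/NR) and the energy from the "<E>_keV"
    # token in the filename, as described above.
    ENERGY_RE = re.compile(r"_(\d+(?:\.\d+)?)_keV_")

    def parse_sample(path):
        path = Path(path)
        label = 0 if path.parent.name == "ER" else 1  # ER=0, NR=1 as stated below
        match = ENERGY_RE.search(path.name)
        if match is None:
            raise ValueError(f"no energy token in {path.name}")
        return label, float(match.group(1))

    print(parse_sample(
        "train/ER/0.00012752991074573__CYGNO_60_40_ER_30_keV_930V_30cm_IDAO_iso_crop_hist_pic_run4_ev846;1.png"
    ))  # -> (0, 30.0)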

Data splitting

The dataset contains roughly 30k samples distributed over two classes, ER=0 and NR=1. The train and test sets are interleaved by particle energy according to the following scheme:

Interleaving of the files:

Energy (keV)   He   e
1              *    -
3              -    *
6              *    -
10             -    *
20             *    -
30             -    *

* is training; - is testing
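For reference, the scheme can be transcribed directly into code. The dictionary below is a hand transcription of the table above, not something read from the baseline's configuration:

    # Transcription of the interleaving table: which energies (keV) of each
    # particle type appear in the training split; the remaining ones are in test.
    TRAIN_ENERGIES = {
        "He": {1.0, 6.0, 20.0},  # rows marked with * in the He column
        "e": {3.0, 10.0, 30.0},  # rows marked with * in the e column
    }

    def in_train(particle, energy_kev):
        """Return True if a (particle, energy) pair belongs to the training split."""
        return energy_kev in TRAIN_ENERGIES[particle]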

Setup without CoCalc

  1. Download the data from Kaggle
  2. Extract it with tar -xvf MLHEP-2021-train.tar.xz && tar -xvf MLHEP-2021-test.tar.xz
  3. Install poetry
  4. Set up the poetry environment with poetry install
  5. Change config.ini[DATA][DatasetPath] to the location of the extracted data (see the sketch after this list)
  6. Prefix all your commands with poetry run
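Step 5 can be done by hand in any editor; the following is a minimal sketch of doing it programmatically, assuming config.ini is a standard INI file with the [DATA] section and DatasetPath key referenced above:

    import configparser

    # Point the baseline at the extracted data (step 5). Assumes config.ini is a
    # standard INI file with a [DATA] section and a DatasetPath key; editing the
    # file by hand works just as well.
    config = configparser.ConfigParser()
    config.read("config.ini")
    config["DATA"]["DatasetPath"] = "/path/to/extracted/data"
    with open("config.ini", "w") as f:
        config.write(f)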

Training

If you want to retrain the model from scratch, just run:

mv checkpoints checkpoints_bk && python train.py

This will move the original checkpoints aside and run the experiment again. Note that you can also modify config.ini to control checkpoint loading without moving the files.

Results

To generate the report just run:

python report.py

This will generate ./resultsreport.log in the current directory, containing summary information, along with a number of plots in the ./results/ directory.


Classification plots

Regression plots

Submission

To generate submission_classification.csv.gz and submission_regression.csv.gz, run:

python generate_submission.py

By default, it will use the checkpoints provided in the repository. If you trained your own model, set the paths to your checkpoints in config.ini.
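As a quick sanity check of the generated files, the following sketch (assuming pandas is installed in the environment) prints their shapes and first rows; it makes no assumption about the exact column layout:

    import pandas as pd

    # Quick sanity check of the generated submissions. pandas reads the gzipped
    # CSVs directly from the .csv.gz files.
    for name in ("submission_classification.csv.gz", "submission_regression.csv.gz"):
        df = pd.read_csv(name)
        print(name, df.shape)
        print(df.head())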
