Project for a receipt analysis of a dataset from Kaggle

JiriValasek/ReceiptAnalysis

ReceiptAnalysis

Project for a receipt analysis of a dataset from Kaggle.

The full report (in Czech only) is here

Requirements

  • This project has not been optimized for memory use, so 32 GB of RAM is recommended for analyzing 100k records.
  • Analyzing more records requires either optimizing the code or more RAM.
  • Python v3.11.x (used through pyenv)
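On a machine with less RAM, one option is to cut the dataset down before running the scripts. The following is a minimal sketch, not part of the project: `write_subset` is a hypothetical helper that copies the header and the first 100k data rows of the CSV using only the standard library. Whether the project scripts limit the record count themselves is not stated here, so treat this as an assumption.

```python
import csv
from itertools import islice
from pathlib import Path


def write_subset(src: Path, dst: Path, max_records: int = 100_000) -> int:
    """Copy the header and at most max_records data rows from src to dst.

    Returns the number of data rows written (the header is not counted).
    """
    with src.open(newline="", encoding="utf-8") as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to copy
        writer.writerow(header)
        written = 0
        # islice stops after max_records rows without reading the rest.
        for row in islice(reader, max_records):
            writer.writerow(row)
            written += 1
        return written
```

For example, `write_subset(Path("data/kz.csv"), Path("data/kz_100k.csv"))` streams the file row by row, so the subsetting step itself needs very little memory. The output name `kz_100k.csv` is illustrative; the scripts are only documented to read `kz.csv`.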

Usage

  1. Clone the repository
  2. Register on Kaggle
  3. Download the dataset "eCommerce purchase history from electronics store"
  4. Place the dataset (kz.csv) into the /data directory
  5. Install Poetry
  6. Install dependencies: cd /path/to/ReceiptAnalysis && poetry install
  7. Run preprocessing: poetry run preprocessing
  8. Run clustering: poetry run clustering
  9. Generate association rules: poetry run associative_rules
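The three `poetry run` steps can also be chained from a small driver script. The sketch below is not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical helpers, and it assumes that `/data` refers to the `data` directory in the repository root and that the steps must run in the order listed.

```python
import subprocess
from pathlib import Path

# Script names taken from the usage steps, in the order they are listed.
STEPS = ("preprocessing", "clustering", "associative_rules")


def pipeline_commands(steps=STEPS):
    """Return one `poetry run <step>` command per pipeline step."""
    return [["poetry", "run", step] for step in steps]


def run_pipeline(repo_dir: Path, runner=subprocess.run) -> None:
    """Run every step from repo_dir, stopping at the first failure."""
    # Assumption: the dataset lives in <repo>/data/kz.csv.
    dataset = repo_dir / "data" / "kz.csv"
    if not dataset.is_file():
        raise FileNotFoundError(f"Expected the Kaggle dataset at {dataset}")
    for command in pipeline_commands():
        # check=True raises CalledProcessError if a step exits non-zero.
        runner(command, cwd=repo_dir, check=True)
```

Calling `run_pipeline(Path("/path/to/ReceiptAnalysis"))` after `poetry install` runs the steps back to back. The `runner` parameter exists only so the function can be exercised without invoking Poetry.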

Outputs

  • Saved NumPy matrices are in /data
  • Saved Matplotlib figures are in /images
  • Saved text outputs of the scripts are in /outputs
  • Saved clusters and cluster rules are in /rules
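To check what a run produced, the four output directories can be listed in one pass. This is a minimal sketch with a hypothetical helper, `list_outputs`. It assumes the directories sit in the repository root and makes no assumption about file names or formats.

```python
from pathlib import Path

# Directory names taken from the Outputs list, relative to the repository root.
OUTPUT_DIRS = ("data", "images", "outputs", "rules")


def list_outputs(repo_dir: Path, dirs=OUTPUT_DIRS) -> dict:
    """Map each output directory name to the sorted file names inside it.

    Directories that do not exist yet map to an empty list.
    """
    found = {}
    for name in dirs:
        folder = repo_dir / name
        if folder.is_dir():
            found[name] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[name] = []
    return found
```

An empty list for `rules` after running all three steps would suggest that the rule-generation step did not complete.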

Used tutorials
