> ⚠️ This repository was archived by the owner on Sep 23, 2021 and is now read-only.

⚡️ Classification on Persian Database using MLP and CNN in PyTorch


wildonion/PersianDatabaseClassification

Persian Database Classification Task using PyTorch


Author: Mohammaderfan Arefimoghaddam (marefimoghaddam@unixerr.com)

If you have any questions, please feel free to contact me.

IFHCDB database

IFHCDB paper

To extract the data, please contact Dr. Karim Faez.

⚙️ Environment Settings

  • PyTorch 1.7
  • Python 3.8
  • CUDA 10.2
  • Ubuntu 20.04.1 LTS

⚖️ Performance

⚠️ Both models were trained on <device_names_here> using the default hyperparameters.

✅ MLP:

✅ CNN:

🔧 Setup

docker build -t pdc .

⚠️ The uvloop module is not supported on Windows!

Download the Persian Database dataset CSV files and extract images.tar.xz inside the dataset folder.

💻 Usage

Run trainer.py to train the selected network (cnn or mlp):

sudo docker run pdc trainer.py --network mlp --batch-size 32 --num-workers 4 --epochs 200 --learning-rate 1e-3 --device cpu

After the training process finishes, run the bot.py 🤖 server to serve predictions through the Telegram Bot API.

sudo docker run pdc bot.py

📋 Procedures

📌 Preprocessing

Both models are trained on CSV files containing the NumPy arrays of the Persian Database images and their associated labels. If you want to preprocess the images of another dataset from scratch, just run the _img_to_csv.py script inside the utils folder to resize them and store their NumPy arrays in the related CSV files.

python utils/_img_to_csv.py --path /path/to/dataset --image-size 64

📌 Calculating std and mean of your dataset

In order to normalize the images of your dataset, you have to calculate the mean and std of your data. Using one of the methods in the _cal_mean_std.py script inside the utils folder, you can calculate those parameters and normalize (standard-scale) your images to build the train and valid dataset pipelines. More information about calculating mean and std in PyTorch.

⚠️ Remember to pass a dataloader object into those methods.

mean, std = CalMeanStd0(training_dataloader)

or

mean, std = CalMeanStd1(training_dataloader)
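CalMeanStd0 and CalMeanStd1 are the repo's own helpers and their exact implementations aren't reproduced here, but the two standard approaches they likely correspond to can be sketched in NumPy (a list of batch arrays stands in for what a DataLoader yields): either gather every batch and compute the moments once, or accumulate running sums of x and x² and derive the moments at the end, which avoids holding the whole dataset in memory:

```python
import numpy as np

def mean_std_concat(batches):
    """Method 0: stack every batch in memory, then compute the moments directly."""
    pixels = np.concatenate([np.asarray(b).reshape(-1) for b in batches])
    return pixels.mean(), pixels.std()

def mean_std_running(batches):
    """Method 1: accumulate sum(x) and sum(x^2); std = sqrt(E[x^2] - E[x]^2)."""
    total, total_sq, count = 0.0, 0.0, 0
    for b in batches:
        flat = np.asarray(b).reshape(-1).astype(np.float64)
        total += flat.sum()
        total_sq += (flat ** 2).sum()
        count += flat.size
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```

Both methods agree up to floating-point error; the running version is the one you'd want for datasets too large to concatenate.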

⚠️ The trainer.py script does this automatically for the CSV dataset 🙂

📌 Building pipelines and dataloaders

The training and valid dataset pipelines normalize all images using the calculated mean and std and convert them into PyTorch tensors. Finally, we pass the pipelines to DataLoader objects to prepare them for training and evaluation.
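A minimal sketch of that step, assuming the images are already in memory as arrays (the placeholder data and batch size here are illustrative, not the repo's actual pipeline code):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data: 64x64 grayscale images as uint8 plus integer labels.
images = torch.randint(0, 256, (100, 1, 64, 64), dtype=torch.uint8)
labels = torch.randint(0, 10, (100,))

x = images.float() / 255.0
mean, std = x.mean(), x.std()   # the values computed in the previous step
x = (x - mean) / std            # standard-scaler normalization

train_loader = DataLoader(TensorDataset(x, labels), batch_size=32, shuffle=True)

for batch_x, batch_y in train_loader:
    # batch_x: (32, 1, 64, 64) normalized float tensors, ready for the model
    break
```

After normalization the data has zero mean and unit std, which is what the model expects at training and evaluation time.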

📌 Training and evaluating on selected model

I implemented the backpropagation algorithm from scratch, using the chain rule to compute the gradients that gradient descent uses to train and tune the weights of the MLP model. You can see it in the backward function.
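The repo's backward function isn't reproduced here, but the chain-rule idea can be sketched for a one-hidden-layer MLP with sigmoid activations and MSE loss (all names, shapes, and the learning rate below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, w2):
    z1 = x @ w1            # (batch, hidden) pre-activations
    a1 = sigmoid(z1)
    z2 = a1 @ w2           # (batch, out) pre-activations
    a2 = sigmoid(z2)
    return a1, a2

def backward(x, y, w1, w2, lr=0.5):
    """One manual gradient-descent step via the chain rule."""
    a1, a2 = forward(x, w1, w2)
    # dL/da2 for MSE, chained through the sigmoid: sigma'(z) = a * (1 - a)
    delta2 = (a2 - y) * a2 * (1 - a2)           # error at the output layer
    grad_w2 = a1.T @ delta2
    delta1 = (delta2 @ w2.T) * a1 * (1 - a1)    # error propagated back one layer
    grad_w1 = x.T @ delta1
    w1 -= lr * grad_w1                          # in-place weight updates
    w2 -= lr * grad_w2
    return ((a2 - y) ** 2).mean()
```

Calling backward repeatedly on the same batch drives the loss down, which is a quick sanity check that the chain-rule gradients are correct.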

For the CNN model I used the built-in backward method of the loss function. It automatically backpropagates through the network, computing the gradient of every weight via the computational graph so the optimizer can update them. You can access the gradient tensor of a specific layer's weights like so: self.fc1.weight.grad.
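That autograd flow can be sketched as follows; the tiny network below is illustrative rather than the repo's actual architecture (only the fc1 attribute name mirrors the text):

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 64 * 64, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        return self.fc1(x.flatten(1))

model = TinyCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(4, 1, 64, 64)
y = torch.randint(0, 10, (4,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()                 # autograd walks the computational graph
grad = model.fc1.weight.grad    # per-weight gradients are now populated
optimizer.step()                # and the optimizer uses them to update the weights
```

After loss.backward(), every parameter's .grad tensor holds the derivative of the loss with respect to that parameter, which is exactly what the manual MLP version computes by hand.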

📊 MLP Plotted history

📊 CNN Plotted history

📌 Prediction

Start predicting 🔮 with pdc bot 😎✌️