Skip to content
/ ironn Public

An image recognition project through a partnership between the University of British Columbia Master of Data Science program and Quebec Iron Ore

License

Notifications You must be signed in to change notification settings

mfqqio/ironn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ironn

An image recognition project through a partnership between the University of British Columbia Master of Data Science program and Quebec Iron Ore


Contributors: Jingyun Chen, Socorro Dominguez, Milos Milic, Aditya Sharma

Project Summary

This project has been developed in order for QIO to leverage Machine Learning (ML) techniques to improve ore yields while reducing operational costs. We are providing a Python package that contains a trained neural network that categorises Ore, Dilution Waste and Contamination Waste in candidate images to 85% accuracy. We are also providing two dashboard applications that can be used in assessing image quality.

Repo Structure

    ubc_geodetection_review/
├── data
│   ├── images
│   └── tagfiles                                             # data: images + json tags                   
└── ironn                     
    ├── modules                                              # all modules for the package
    │   ├── arch                                             # model architectures used
    │   ├── dashboards                                 
    │   │   ├── iCA                                          # image Colour Analysis
    │   │   └── iQA                                          # image Quality Assurance
    │   ├── dataloader                                       # dataloader + related files
    │   ├── output                                           # preprocessing output
    │   │   └── stat_test_result
    │   └──  preprocessing                                   # preprocessing modules
    └── tests                                                # all tests for the modules
         └── test_data

The following is the description of each folder in detail.

Data

Images

Over 1000 pictures of the mine’s blasted face in Lac Bloom, Fermont. Different types of blast and locations. Images are jpg or JPG format.

Tagfiles

Label files with coordinates of different rock types. Can be linked to the images files by name. Files come in JSON format.

IRONN

Modules

Architecture

This folder contains 4 python scripts for two model architectures including FCN and UNET.

  • fcn.py

    This script contains a class named FCN8s(), which inherits from the torch.nn.Module class and it defines the architecture of the Fully Convolutional Networks currently being used for the model.

  • vgg.py

    This script contains a class named VGGNet(), which inherits from the PyTorch VGG module and is used to modify the existing VGG pre-trained model architecture into a Fully Convolutional version which can be fed into the FCN8s class.

  • unet.py

    This script contains the forward pass structure of the UNET architecture in Pytorch which inherits from the nn.Module class.

  • unet_parts.py

    This is the script where the smaller pieces of UNET architecture are defined which are then ultimately used in unet.py

Dataloader

This folder contains one script for loading in the data for training and testing based on the PyTorch DataLoader module.

  • dataset.py This script contains the following 4 classes, which are:

    • CustomTransform() class: it adds transformations to images such as rotation and cropping, which allows doing data augmentation to increase the number of images by creating multiple images from a single image
    • QioTrain() class: it inherits from the Dataset class from the Pytorch utils.data module and is used for loading the training dataset.
    • QioVal() class: it is used for loading the validation dataset.
    • QioTest() class: it is used for loading the test dataset.
Output

This folder contains all spreadsheets and images generated using scripts inside preprocessing folder, which includes:

  • img_tbl_w_labels.csv file

    It contains the name of the image file, associated polygon coordinates for individual rock type, coordinates for combined rock types, and coordinates for each convex hull. Generated using get_polygon_coords.py.

  • img_tbl_w_roughness.csv file

    It contains the name of the image files, rock type, and associated 9 colour features (i.e. skewness, kurtosis, mean pixel on 3 channels) extracted from each rock type in every image. Generated using get_roughness.py.

  • exif_data_summary.csv file

    It contains the name of the image files, camera type, focal length, lens aperture, zoom, exposure time, image size, and megapixel. Generated using exif_summary.py.

  • complete_joined_df.csv file

    It contains all the features in both roughness and exif summary data. Generated using join_data.py.

  • general_outlier_imgs.csv file

    It contains the name of the image files that need to be removed in the training data. Generated using find_outlier.py.

  • stat_test_results folder

    It contains 6 csv files including results from anova test, pairwise test, as well as calculating mean pixel values for each rock type. Generated using stat_test.py.

  • 3 sample images

    Sample images used for eda report and presentation. Generated using get_roughness.py.

Preprocessing

This folder contains all python scripts that are used to conduct colour, exif data analysis, as well as preprocess training data. It includes:

  • convexhull.py

    It contains one class named MyJarvisWalk(), which includes set of functions that allow the user to input a list of tuples that represent a set of coordinates and return the coordinates belonging to the convex hull of the coordinates.

  • exif_summary.py

    It takes in a specific dictionary key that looks at specific EXIF information from an image file and returns a data frame of the images analyzed and the image feature results for each image.

  • export_poly_train_data.py

    This script is used to create an appropriate directory structure and create labels from the raw json files and original images. After this script is run, we have the training and the testing data created from the original images and labels.

  • export_ch_train_data.py

    This script works similar to the export_poly_train_data.py script except that it creates the training data for the basic model used to predict blasted faces using convex hulls.

  • find_outlier.py

    It reads image roughness data, generates an outlier table for each rock type and saves as a csv file. An outlier is defined as values that are 2 standard deviations away from the mean of each colour feature.

  • get_polygon_coord.py

    It takes every image name, corresponding tag file to extract polygon coordinates and save the results (a table that contains: image name, coordinate dictionary and convex hull) as a csv file.

  • get_roughness.py

    It takes every image name to extract roughness statistics of each area of interest and save the results in a table as a csv file.

  • join_data.py

    It combined roughness data and EXIF data and saves the results as a csv file.

  • stat_test.py

    It allows user to conduct anova test, pairwise test and determine mean pixel value on 3 channels for each rock type based on roughness data and saves test results as csv files.

  • utils.py

    It contains a set of small functions that are required for the model building process.

Dashboards
iCA

This app is a proof of concept created by plotly dash that helps users play around with colour features and camera effect data for each rock type and detect which images have unusual behaviours and can be further used for image quality check.

This folder contains five items including:

  • app.py: it contains main application code.
  • apps folder: it contains two Python scripts (camera_effect.py, colouranalyzer.py).
  • assets folder: it contains the IRONN logo image.
  • requirements.txt: it contains all dependencies used for creating the application.
  • results folder: it is used for storing all analysis results.
iQA

This shiny app will allow a user to go through all or a specific set of images to see how well they fared in the predictive algorithm. The app will look at images in three dummy folders explained below:

  • The “original” folder holds the original images
  • The “superimposed_train_labelme” folder holds the image with the labelMe mask superimposed on it
  • The “train_predict” folder that holds the images with the predicted mask superimposed

The user can upload a csv file of image names they would like to analyse and then mark the results of the images. The results of the analysis are saved as a csv file.

Tests

Test data

JSON and JPG test files that are required in the tests

train.py

This is the main file from where the model training is invoked and contains the training loop, the validation loop and code for saving model performance.

Dependencies

All python packages used in the product are listed below:

Python 3.6.8

  • numpy==1.15.4
  • pandas==0.23.0
  • Pillow==5.1.0
  • opencv-python== 4.1.0.25
  • matplotlib==2.2.2
  • scipy==1.1.0
  • statsmodels==0.9.0
  • seaborn==0.9.0
  • torchvision==0.2.2
  • pytorch==1.0.1
  • scikit-image==0.15.0
  • scikit-learn==0.21.2

About

An image recognition project through a partnership between the University of British Columbia Master of Data Science program and Quebec Iron Ore

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published