
Reading-digits

Reading alphanumeric characters from real-world images is a hard problem. While OCR techniques combined with other algorithms have largely solved it for binary documents and scanned images, it remains open for images of real-world scenes, where conditions such as brightness, color, and cluttered backgrounds interfere with identifying and extracting numbers and characters.

Deep learning, however, can handle these variations by learning directly from the pixels. Using CNNs, character recognition in natural-scene images can be solved very efficiently.

The following deep learning model was designed to classify individual characters after they are extracted from the full image. For this, the Street View House Numbers (SVHN) dataset has been used: a large collection of natural images containing house numbers, gathered from Google Street View.

Link: Street View House Numbers (SVHN) dataset: http://ufldl.stanford.edu/housenumbers/

A MATLAB script is provided alongside the dataset for extracting individual digits along with their labels.

Steps for Training and Testing

Clone the repo using: git clone https://github.com/sbhmajum369/read-digits.git

A) Install the dependencies if you are working in a local environment such as a PC or Mac. You can create a virtual environment in your project folder and install the packages that don't ship with the standard Python installation (see the commands after the list below).

For this you will need:

  1. TensorFlow
  2. OpenCV
  3. Matplotlib
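
For example, with Python's built-in venv (a minimal sketch; these are the standard PyPI package names, and the repo does not pin exact versions):

```
python3 -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate
pip install tensorflow opencv-python matplotlib
```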

B) After installation, arrange your dataset into an 'Image' folder and a 'Label.txt' file.

Here, the SVHN dataset has been used. For that, run the see_bboxes.m file in MATLAB, providing the appropriate folder locations under the names above.
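
If you prefer to stay in Python, the bounding-box metadata in SVHN's digitStruct.mat (a MATLAB v7.3 file, which is HDF5 underneath) can also be read with h5py. A minimal sketch, assuming the format-1 dataset layout; the function below is illustrative and not part of this repo:

```python
import h5py

def read_digit_struct(path):
    """Yield (filename, boxes) pairs from SVHN's digitStruct.mat (HDF5/v7.3)."""
    with h5py.File(path, 'r') as f:
        names = f['digitStruct/name']
        bboxes = f['digitStruct/bbox']
        for i in range(names.shape[0]):
            # The file name is stored as an array of character codes.
            fname = ''.join(chr(c) for c in f[names[i][0]][:].flatten())
            box = f[bboxes[i][0]]
            def field(key):
                ds = box[key]
                if ds.shape[0] == 1:          # single digit: value stored inline
                    return [float(ds[0][0])]
                # multiple digits: each entry is a reference to the value
                return [float(f[ds[j][0]][0][0]) for j in range(ds.shape[0])]
            # Note: in SVHN, label 10 denotes the digit 0.
            boxes = {k: field(k) for k in ('label', 'left', 'top', 'width', 'height')}
            yield fname, boxes
```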

C) After dataset preparation, place the 'Image' folder and the label file inside the project folder, then run the pre-processing scripts 'Image_Preprocess.py' and 'Output_Preprocessing.py', passing their location.

This produces two NumPy files: one storing the images and the other their corresponding labels, for training, validation, and testing.
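
As a rough illustration of what this stage produces (a sketch only; the repo's actual scripts may use different image sizes, label parsing, and output file names):

```python
import os
import cv2
import numpy as np

IMG_DIR = 'Image'         # folder produced in step B
LABEL_FILE = 'Label.txt'  # one label per line, matching sorted image order (assumption)
IMG_SIZE = 32             # target side length; the actual scripts may differ

images = []
for fname in sorted(os.listdir(IMG_DIR)):
    img = cv2.imread(os.path.join(IMG_DIR, fname))
    if img is None:
        continue  # skip files that are not readable images
    images.append(cv2.resize(img, (IMG_SIZE, IMG_SIZE)))

X = np.asarray(images, dtype=np.float32) / 255.0  # scale pixels to [0, 1]
with open(LABEL_FILE) as f:
    y = np.asarray([int(line.strip()) for line in f if line.strip()])

np.save('images.npy', X)
np.save('labels.npy', y)
```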

D) Run the 'Model_train.py' file for training and testing from the command line, providing the following inputs in order (a splitting sketch follows the list):

  1. Location of the numpy files
  2. Training, validation, and test sizes
  3. Number of epochs
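
For instance, a straightforward way those sizes could be applied to the saved arrays (a hedged sketch; the actual script may shuffle or split differently, and the file names are the assumed ones from step C):

```python
import numpy as np

X = np.load('images.npy')
y = np.load('labels.npy')

# Sizes supplied on the command line (example values only)
n_train, n_val, n_test = 30000, 5000, 5000

X_train, y_train = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:n_train + n_val], y[n_train:n_train + n_val]
X_test, y_test = (X[n_train + n_val:n_train + n_val + n_test],
                  y[n_train + n_val:n_train + n_val + n_test])
```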

On line 46 of 'Model_train.py', select which model from 'models.py' to train with.
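
For reference, a minimal VGG-style Keras model of the kind 'models.py' provides (the layer counts and sizes here are assumptions, not the repo's exact architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def vgg_small(input_shape=(32, 32, 3), num_classes=10):
    """A small VGG-style stack: repeated Conv-Conv-Pool blocks, then dense layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    # Integer labels, so sparse categorical cross-entropy is used.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```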

Both VGG models achieved 97% accuracy during testing, using a total dataset of 40,000 images.