Object Localization

Object localization is a hard task because we must not only recognize the object but also predict its exact position in the image.

Detection Approaches

We can simplify the object detection problem by:

  1. Ignoring low-confidence predictions
  2. Predicting bounding boxes instead of the exact object mask, by mixing layers with different receptive fields, though this is easier said than done.

There are two main approaches driving detection algorithms, namely:

  1. The YOLO-like approach, where anchor boxes extracted via k-means clustering are used, and
  2. The SSD-like approach, where a fixed number of predefined bounding boxes is used.

Part A: TinyImageNet

Objective:

  • Download the TINY IMAGENET dataset.
  • Train ResNet18 on this dataset (70/30 split) for 50 epochs, targeting 50%+ validation accuracy.

Parameters

  1. Augmentations - horizontal flip, padding, random crop, Cutout, and normalization
  2. Batch size - 256
  3. Model - ResNet18 with 200 classes
  4. Optimizer - SGD (momentum=0.9, weight_decay=0.0001)
  5. Scheduler - OneCycleLR (max_lr=0.02, epochs=30, pct_start=1/3, anneal_strategy='linear', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=10.0, final_div_factor=10); see the sketch below
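The snippet below is a minimal sketch of how this optimizer/scheduler configuration could be wired up in PyTorch. The model and data loader are stand-ins for the repo's actual ones, which may differ (for example, TinyImageNet ResNet18 variants often adapt conv1 and the max-pool for 64x64 inputs):

```python
import torch
import torch.nn.functional as F
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Stand-ins for the real model and TinyImageNet loader (batch size 256 as above).
model = resnet18(num_classes=200)
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 64, 64), torch.randint(0, 200, (512,))),
    batch_size=256,
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                            momentum=0.9, weight_decay=0.0001)
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.02,
    epochs=30,                          # scheduler horizon from the list above
    steps_per_epoch=len(train_loader),
    pct_start=1 / 3,                    # linear warm-up for the first third
    anneal_strategy="linear",
    cycle_momentum=True,
    base_momentum=0.85,
    max_momentum=0.95,
    div_factor=10.0,                    # initial_lr = max_lr / 10
    final_div_factor=10,                # min_lr = initial_lr / 10
)

for epoch in range(30):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()                # OneCycleLR steps once per batch
```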

Results

  1. Best train accuracy - 77.83%
  2. Best test accuracy - 57.55%

Figure 1: Training log

Figure 2: Accuracy plot

Grad-CAM on Misclassified Images

Figure 3: Grad-CAM on misclassified images
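For reference, below is a minimal Grad-CAM sketch using forward/backward hooks. This is an assumed re-implementation, not necessarily the repo's utility: it weights the last conv block's activations by the spatially pooled gradients of the top-scoring class:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Stand-in model (the repo's TinyImageNet ResNet18 may differ slightly).
model = resnet18(num_classes=200).eval()

# Capture activations and gradients of the last residual block.
feats, grads = {}, {}
model.layer4.register_forward_hook(
    lambda m, i, o: feats.update(value=o.detach()))
model.layer4.register_full_backward_hook(
    lambda m, gi, go: grads.update(value=go[0].detach()))

img = torch.randn(1, 3, 64, 64)           # TinyImageNet images are 64x64
logits = model(img)
logits[0, logits.argmax()].backward()     # gradient of the top class score

# Weight each channel by its average gradient, sum, ReLU, then upsample.
weights = grads["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * feats["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=img.shape[2:], mode="bilinear",
                    align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
print(cam.shape)                          # torch.Size([1, 1, 64, 64])
```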

Part B: Analyzing COCO data format (Link)

COCO Dataset

Objective

  1. Learn the schema of the COCO (link) object detection dataset. This file has the same schema; you'll need to discover what those numbers are.
  2. Identify the following for this dataset:
    • The class distribution (along with the class names), together with a graph
    • The anchor boxes for k = 3, 4, 5, and 6, calculated and drawn

What is the COCO Dataset?

COCO annotations are inspired by the Common Objects in Context (COCO) dataset.

"COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints."

It is one of the best image datasets available, so it is widely used in cutting-edge image recognition research and in many open source projects.

COCO Dataset Format

The COCO dataset is formatted in JSON and is a collection of “info”, “licenses”, “images”, “annotations”, “categories” (in most cases), and “segment info”.
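Loading such a file is straightforward; the sketch below uses a hypothetical file name, `annotations.json`, and simply lists the top-level keys:

```python
import json

# "annotations.json" is a hypothetical name for the downloaded COCO-style file.
with open("annotations.json") as f:
    coco = json.load(f)

# Typically: ['info', 'licenses', 'images', 'annotations', 'categories']
print(list(coco.keys()))
```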

An example of data from (link) is shown below:

id: 1, height: 782, width: 439, bbox: [359, 292, 83, 199]

id     - image ID
height - original image height
width  - original image width
bbox   - bounding box in COCO format (x top-left, y top-left, width, height)
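A small sketch of working with this format follows, converting the COCO [x, y, width, height] box into corner coordinates and into dimensions normalized by the image size (the field names mirror the sample above):

```python
def coco_bbox_to_corners(bbox):
    """COCO [x_top_left, y_top_left, w, h] -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h

def normalized_dims(bbox, img_w, img_h):
    """Box width/height relative to the image, in [0, 1]."""
    _, _, w, h = bbox
    return w / img_w, h / img_h

sample = {"id": 1, "height": 782, "width": 439, "bbox": [359, 292, 83, 199]}
print(coco_bbox_to_corners(sample["bbox"]))   # (359, 292, 442, 491)
print(normalized_dims(sample["bbox"], sample["width"], sample["height"]))
```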

Class Distribution

There are 80 categories in the provided dataset; their distribution is shown below:

Figure 4: COCO class distribution
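A class distribution like this can be computed with a simple counter. The sketch below assumes the annotations have been parsed into dicts with a `class` field; this is a stand-in structure, not necessarily the file's exact layout:

```python
from collections import Counter

# Stand-in for the parsed dataset file.
annotations = [
    {"class": "person"}, {"class": "car"}, {"class": "person"},
]

counts = Counter(a["class"] for a in annotations)
for name, n in counts.most_common():
    print(f"{name:>10s}: {n}")
# counts.most_common() can then be fed to e.g. matplotlib's bar() for the graph.
```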

What is a Bounding Box?

A bounding box, in essence, is a rectangle that surrounds an object, specifying its position, class (e.g. car, person), and confidence (how likely it is to be at that location). Bounding boxes are mainly used in object detection, where the aim is to identify the position and type of multiple objects in an image.

Figure 5: Scatter plot of normalized bounding box dimensions

What is an Anchor Box?

Anchor boxes are a set of template bounding boxes that an object detection model uses as references when predicting bounding boxes.

State-of-the-art object detection systems currently do the following:

  1. Create thousands of “anchor boxes” or “prior boxes” for each predictor, representing the ideal location, shape, and size of the object it specializes in predicting. To determine the ideal set of boxes, we use k-means clustering (see the sketch after this list).

  2. For each anchor box, calculate which object’s bounding box has the highest overlap divided by non-overlap. This is called Intersection over Union, or IoU.

  3. If the highest IoU is greater than 50%, tell the anchor box that it should detect the object that gave the highest IoU.

  4. Otherwise, if the highest IoU is between 40% and 50%, tell the neural network that the true detection is ambiguous and not to learn from that example.

  5. If the highest IoU is less than 40%, then the anchor box should predict that there is no object.
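The sketch below shows IoU-based k-means anchor clustering in the style popularized by YOLOv2. It is an assumed re-implementation on random stand-in data, not the repo's exact code: boxes are compared by width/height only, as if sharing a top-left corner, and the clustering distance is 1 − IoU:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) boxes and (k, 2) anchors, width/height only."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the anchor with the highest IoU (lowest 1 - IoU).
        assign = iou_wh(boxes, anchors).argmax(axis=1)
        new = np.array([
            boxes[assign == i].mean(axis=0) if np.any(assign == i) else anchors[i]
            for i in range(k)
        ])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors, iou_wh(boxes, anchors).max(axis=1).mean()

# Stand-in normalized (width, height) pairs, like the scatter plot above.
boxes = np.abs(np.random.default_rng(1).normal(0.3, 0.1, size=(500, 2)))
for k in (3, 4, 5, 6):
    anchors, mean_iou = kmeans_anchors(boxes, k)
    print(f"k={k}: mean IoU={mean_iou:.3f}")
```

Plotting the mean IoU against k gives the elbow curve shown below.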

Figure 6: Elbow method

Figure 7: Mean IoU

Anchor Box Clusters

(Plots of the clustered bounding boxes and the resulting anchor boxes for k = 3, 4, 5, and 6.)
