GokulVSD/MonoDAC

Monocular Depth Estimation via a Fully Convolutional Deep Neural Network, utilising Atrous Convolutions, with 3D Point Cloud Visualisation.

Final Year Project
Developed during our 7th and 8th semesters as part of our undergraduate degree coursework.

Generating depth maps from a single monocular RGB image, colloquially known as depth estimation, has long been known to be an ill-posed problem, since many different 3D scenes can project to the same 2D image. Traditional depth estimation techniques rely on inference from stereo RGB pairs, on depth cues, or on laser-based LiDAR sensors, which produce sparse or dense point clouds depending on the size and cost of the sensor. Most modern smartphones contain more than one image sensor; however, utilising these sensors for depth estimation is infeasible, as smartphone vendors restrict access to one image sensor at a time. In other cases, the sensors differ in quality and focal length, rendering them inadequate for depth inference. Producing depth maps for monocular RGB images is a crucial task because of their use in Depth-of-Field (DoF) image processing, Augmented Reality (AR), and Simultaneous Localisation and Mapping (SLAM).

To tackle the above problem, we propose a fully convolutional deep neural network (DCNN) approach to learning and generating depth maps from single RGB images. The network employs an encoder-decoder architecture and utilises atrous convolution layers and Atrous Spatial Pyramid Pooling (ASPP) for semantic segmentation and for feature pooling and extraction. We also apply bicubic upsampling convolutions to further boost depth estimation accuracy, and we simplify previously proposed architectures to improve performance, taking into consideration the computational and accuracy constraints that plague prior efforts.
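As a rough illustration of the atrous convolution and ASPP ideas mentioned above, here is a minimal 1D sketch in plain Python. It is not the project's network (the training code and model are withheld); the function names `atrous_conv1d`, `receptive_field`, and `aspp_1d` are hypothetical and exist only for this example. An atrous convolution samples its input with gaps of `rate` between kernel taps, which widens the receptive field without adding weights. An ASPP-style block runs several rates in parallel and stacks the results per position.

```python
def receptive_field(kernel_size, rate):
    """Number of input positions spanned by one atrous kernel application."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1D atrous (dilated) convolution.

    Computed as a cross-correlation, which is what deep learning
    frameworks conventionally call a convolution.
    """
    span = receptive_field(len(kernel), rate)
    if len(signal) < span:
        return []
    return [
        # Kernel taps are spaced `rate` samples apart.
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """ASPP-style sketch: the same kernel at several rates, stacked per position.

    Assumes an odd kernel length so that zero padding keeps every
    branch the same length as the input.
    """
    branches = []
    for rate in rates:
        pad = (len(kernel) - 1) * rate // 2
        padded = [0.0] * pad + list(signal) + [0.0] * pad
        branches.append(atrous_conv1d(padded, kernel, rate))
    # One feature vector per input position, one entry per rate.
    return [list(features) for features in zip(*branches)]


if __name__ == "__main__":
    ramp = [1, 2, 3, 4, 5, 6, 7]
    print(atrous_conv1d(ramp, [1, 0, -1], rate=1))
    print(atrous_conv1d(ramp, [1, 0, -1], rate=2))
    print(receptive_field(3, 2))
```

In a real network the parallel branches would use separate learned 2D kernels and be concatenated along the channel axis; the sketch shares one kernel only to keep the effect of the rate visible.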

We showcase the results of our model in a 3D point cloud view. The model was trained on a subset of the NYUv2 dataset, which contains RGB and depth map pairs that were constructed using Kinect sensors in an unsupervised manner.
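To make the step from a depth map to a 3D point cloud concrete, the following is a minimal sketch of pinhole back-projection in plain Python. It is an assumption about how such a view can be built, not the project's actual code: the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and the real camera parameters are not given in this README. The resulting points could then be handed to a renderer such as Open3D.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-space 3D points.

    depth  : 2D list of depth values, indexed as depth[row][col]
    fx, fy : focal lengths in pixels (assumed, not from the project)
    cx, cy : principal point in pixels (assumed, not from the project)
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Skip pixels with no valid depth reading.
                continue
            # Invert the pinhole projection u = fx * x / z + cx.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    tiny_depth = [[1.0, 2.0],
                  [0.0, 4.0]]
    for point in depth_to_point_cloud(tiny_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(point)
```

Each pixel keeps its image-plane direction and is pushed out along the camera ray by its estimated depth, so errors in the depth map show up directly as distortions in the cloud.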

Documentation


Data-Flow Diagram Level 2


Network Architecture

Screenshots


Dependencies

TensorFlow: dataflow and differentiable programming library

Keras: neural-network library

NumPy: multi-dimensional arrays and matrices

PIL: Python Imaging Library

Pillow: a fork of PIL

Scikit-learn: various classification, regression and clustering algorithms

Scikit-image: segmentation, transformations, color manipulation, filtering, morphology, feature detection

Open3D: 3D rendering

OpenCV2: real-time computer vision library

Flask: web server


Other Requirements

OpenGL 3.5.5 or newer

IP Camera and a network connection


Training code and model withheld due to academic constraints.
