Fusing Geometrical Knowledge with Deep Learning for Reliable Dense Stereo Matching

Max Mehltretter, M. Sc.

Main Supervisor: C. Heipke
Co-Supervisor: B. Wagner


Reconstructing depth information from a single image pair is a classical task in photogrammetry and the minimal case of the well-known structure from motion problem, which rests on the idea that 3D structure can be recovered from the projected 2D motion field of a scene acquired with a moving sensor. Dense stereo matching is a special case of this task: it determines depth not only for salient feature points, but for every pixel, or at least the majority of pixels, in a stereo image pair. In principle, this task can be interpreted as the inverse of a perspective projection, which leads directly to its major difficulty: projecting the 3D scene onto a 2D image plane reduces the dimensionality. Consequently, the inverse operation generally has no unique solution, which makes the problem ill-posed. To determine a solution nevertheless, point correspondences between the two images of a pair generally have to be identified first. However, especially under challenging conditions, depth reconstruction approaches may fail to identify the correct correspondences for all pixels. This raises the question of how reliable such a solution is, a question of great relevance for application domains such as robotics and autonomous driving.
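
The loss of information under perspective projection can be illustrated with a minimal sketch. It assumes an ideal pinhole camera at the origin looking along the Z axis; the function name `project` and the numeric values are hypothetical and chosen only for illustration.

```python
def project(point, focal_length=1.0):
    """Ideal pinhole projection of a 3D point (X, Y, Z) onto the image plane.

    Assumption for illustration: camera at the origin, optical axis along Z,
    no lens distortion, principal point at (0, 0).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points on the same viewing ray (hypothetical values).
near = (1.0, 2.0, 4.0)
far = tuple(2.5 * c for c in near)  # same ray, 2.5 times farther away

# Both points map to the same image coordinates, so a single image
# cannot tell them apart: depth along the ray is lost.
print(project(near))
print(project(far))
```

Because every point on a viewing ray maps to the same pixel, depth can only be recovered by adding a second view and identifying which pixel in that view shows the same scene point.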

Therefore, one of the objectives of this work is to investigate and emphasise the importance and usefulness of reliability in the context of dense stereo matching. To this end, a methodology is developed that estimates the uncertainty of the individual components and propagates it through the stereo matching pipeline. The resulting uncertainty information is used directly within the pipeline to improve the robustness and accuracy of the final depth estimate. It is also provided as an additional output and is therefore available to subsequent applications that build on the reconstructed depth information.
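
To give an impression of what propagating uncertainty can mean in the simplest case, the following sketch applies standard first-order error propagation to the textbook relation Z = f · B / d for a rectified stereo pair. This is not the methodology developed in the project; the function name, the parameter names and all numeric values are assumptions made for illustration.

```python
def depth_with_uncertainty(disparity_px, sigma_disparity_px,
                           focal_length_px, baseline_m):
    """Depth and its standard deviation for a rectified stereo pair.

    Uses Z = f * B / d and the first-order approximation
    sigma_Z = |dZ/dd| * sigma_d = (f * B / d**2) * sigma_d.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    depth_m = focal_length_px * baseline_m / disparity_px
    sigma_depth_m = depth_m / disparity_px * sigma_disparity_px
    return depth_m, sigma_depth_m


# Hypothetical camera: 700 px focal length, 0.5 m baseline.
# The same disparity uncertainty of 0.5 px is assumed for both pixels.
near_depth, near_sigma = depth_with_uncertainty(35.0, 0.5, 700.0, 0.5)
far_depth, far_sigma = depth_with_uncertainty(7.0, 0.5, 700.0, 0.5)

print(f"near: {near_depth:.1f} m +/- {near_sigma:.2f} m")
print(f"far:  {far_depth:.1f} m +/- {far_sigma:.2f} m")
```

Under these assumptions the depth uncertainty grows with the square of the depth, so the same disparity error matters far more for distant points than for close ones, which is one reason a per-pixel uncertainty is useful to downstream applications.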

The proposed methodology is furthermore based on the symbiosis of geometric principles and deep learning. Since the principles of geometry are well known and do not have to be learned from scratch, this fusion promises two major advantages: a significant reduction in the amount of training data required, and the combination of the improved performance of learning-based approaches with the general validity of geometric principles. Consequently, the influence of image variations on the learning-based elements can be minimised, making the proposed approach reliably applicable to a wide range of applications and scenarios.

The developed methodology is evaluated on a wide variety of publicly available and well-established datasets, as well as on data captured in the context of this project. These data cover not only different sensor setups, such as varying sensor types and baselines, but also varying external conditions, such as changing illumination and both indoor and outdoor scenes. In this way, the reliability of the proposed methodology is examined and assessed comprehensively.

Fig. 1: Example image from the KITTI 2012 stereo dataset, with depth information (from near in yellow to far in dark blue) and associated uncertainty (from low in green to high in red).
Max Mehltretter, Dr. -Ing.
Address
Institut für Photogrammetrie und GeoInformation
Nienburger Straße 1
30167 Hannover
Building
Room