3D point reconstruction from 2D images

I know this must exist, but I'm having enormous trouble finding the right search terms.

Say I have a bunch of labelled 3D points, and I capture multiple 2D images of them. If I want to reconstruct the 3D points from those images, are there well-established algorithms/libraries for doing this?

This is presumably the basis for 3D facial recognition, which is a well-established field of research, but the general case (i.e. non-faces) doesn't seem to have an obvious literature that I can find.

One way I can see of approaching this is as an optimisation problem, where each 2D image imposes constraints on the points, and one reconstructs the 3D points by minimising the total violation of these constraints (a rough sketch of what I mean follows below). This does feel like one of those problems that has a sneaky linear algebra solution, though.
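To make that concrete, here's a rough sketch of the kind of optimisation I have in mind, framed as minimising reprojection error. I'm assuming the camera projection matrices are known, and all names here are illustrative, not from an existing library:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(flat_pts, Ps, obs):
    """Reprojection error of candidate 3D points across all images."""
    pts = flat_pts.reshape(-1, 3)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    res = []
    for P, uv in zip(Ps, obs):  # one known 3x4 projection matrix per image
        q = (P @ pts_h.T).T
        res.append((q[:, :2] / q[:, 2:3] - uv).ravel())  # pixel residuals
    return np.concatenate(res)

# Ps: list of known 3x4 camera matrices; obs: list of Nx2 arrays of the
# observed 2D positions of the labelled points in each image.
# x0 = np.zeros(3 * N)  # or any rough initial guess for the N points
# fit = least_squares(residuals, x0, args=(Ps, obs))
# fit.x.reshape(-1, 3)  # the reconstructed 3D points
```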

Does this class of problem have a literature I can search? Does it have existing libraries? (I'd be really surprised if there wasn't something in OpenCV, but I don't really know what I'm looking for)

Tags: 3d-reconstruction, computer-vision

Category: Data Science


Are you familiar with multi-view geometry? There's a classic book on it: Multiple View Geometry in Computer Vision by Hartley and Zisserman. A good starting point is the Wikipedia article on 3D reconstruction from multiple images.

Essentially, yes, the idea is to solve an optimization problem. Let the object be a point set $P$ in 3D. In the general case, we have $N_C$ cameras $\{C_k\}_{k=1}^{N_C}$, such that $$ q_{ik} = \mathcal{I}_k\mathcal{E}_k p_i $$ $$ (x_{ik},y_{ik})=(q_{ik,1}/q_{ik,3},\; q_{ik,2}/q_{ik,3}) $$ where $p_i\in P$ is the $i$th point in homogeneous coordinates, $\mathcal{I}_k$ is the intrinsic camera parameter matrix of camera $k$, $\mathcal{E}_k$ is its extrinsic camera parameter matrix, and $q_{ik}$ is the projection of $p_i$ into camera $k$, with image coordinates $(x_{ik},y_{ik})$. (Check out the camera matrix and camera calibration for more.)
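To make the notation concrete, here is a minimal NumPy sketch of these projection equations. The intrinsic and extrinsic matrices are made-up example values, not calibrated parameters:

```python
import numpy as np

# Intrinsics I_k: focal lengths and principal point (example values).
I_k = np.array([[800.0,   0.0, 320.0],
                [  0.0, 800.0, 240.0],
                [  0.0,   0.0,   1.0]])

# Extrinsics E_k = [R | t]: world-to-camera rotation and translation.
E_k = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])

# One 3D point p_i in homogeneous coordinates.
p_i = np.array([1.0, -0.5, 2.0, 1.0])

q = I_k @ E_k @ p_i               # q_{ik} = I_k E_k p_i
x, y = q[0] / q[2], q[1] / q[2]   # perspective division
print(x, y)
```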

How to solve this problem depends on how much information you have (especially regarding the camera parameters) and on what assumptions you make (e.g. perspective vs. orthographic projection equations). Note that some ambiguity can be inevitable, depending on the number of cameras/images and their viewpoints, as well as the number and location of corresponding points. If you have prior knowledge of the 3D structure itself, that can also be used.

In the two-view case, the (grandly named) fundamental matrix and the essential matrix allow estimation of the 3D point positions (up to the scale of the translation) based on the epipolar constraints. With more views, one can, for example, triangulate via direct linear transformation (DLT) estimation.
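As a hedged sketch of how the two-view pipeline looks in OpenCV (the intrinsics and the synthetic scene below are assumptions to keep the snippet self-contained; real matched points would come from feature detection and matching):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])  # assumed intrinsics

# Synthesize a random 3D scene in front of both cameras.
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))

def project(P, X):
    q = (P @ np.hstack([X, np.ones((len(X), 1))]).T).T
    return q[:, :2] / q[:, 2:3]

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # camera 1 at the origin
R_true, _ = cv2.Rodrigues(np.array([0.0, 0.3, 0.0]))  # camera 2: small rotation
t_true = np.array([[1.0], [0.0], [0.0]])              # ... plus a translation
P2 = K @ np.hstack([R_true, t_true])

pts1, pts2 = project(P1, X), project(P2, X)           # corresponding 2D points

# Estimate the essential matrix and recover the relative pose (up to scale).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Triangulate with the recovered pose; result is homogeneous (4xN).
P2_est = K @ np.hstack([R, t])
X_h = cv2.triangulatePoints(P1, P2_est, pts1.T, pts2.T)
X_rec = (X_h[:3] / X_h[3]).T  # reconstructed points, up to an overall scale
```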

A nice place to start might be the PhD thesis of Daniel Martinec on Robust Multiview Reconstruction.

Two libraries of potential interest are OpenCV and OpenMVG.


Now, entirely separate from this view as a classical computer vision problem is the machine learning approach. The idea is to complement (or sometimes replace) the geometric relations from multiview geometry (which can be brittle due to noise, measurement error, etc.) with data-driven estimation. Searching for terms like "learning-based 3D reconstruction" or "deep multi-view stereo" is a good starting point.

Such approaches may be more in line, I suppose, with the average users of a "Data Science" site :)

Note that these approaches tend to assume that no point-wise correspondence is known, which makes the problem significantly harder. In your case, however, it sounds like the correspondences are known (though you haven't given an explicit formalization). Keep this in mind when looking at these various methods, as an easier approach may be preferable.


Perhaps "depth estimation" (or related search terms like: depth estimation from image, depth estimation from mono, depth estimation from stereo ...)?

There is a lot of research in that area right now.
