Geometric Context from a Single Image

The paper “Geometric Context from a Single Image” describes a method for estimating the coarse 3D structure of a scene from a single 2D image. The first step classifies each image pixel into one of three geometric classes: part of the ground plane, sticking up from the ground plane, or part of the sky. The pixels sticking up from the ground are then subdivided into planar (i.e. flat) surfaces facing various directions (left, right, or toward the camera) and non-planar surfaces that are either porous (e.g. foliage) or solid (e.g. a person).
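For concreteness, this label hierarchy can be written out directly. The short Python sketch below just encodes the paper's taxonomy; the identifier names are illustrative, not taken from the paper:

```python
from enum import Enum

class MainClass(Enum):
    """Top-level geometric classes from the paper."""
    GROUND = "ground"        # the supporting ground plane
    VERTICAL = "vertical"    # surfaces sticking up from the ground
    SKY = "sky"

class VerticalSubclass(Enum):
    """Subdivision of the VERTICAL class."""
    # Planar surfaces, by orientation relative to the camera
    PLANAR_LEFT = "left"
    PLANAR_CENTER = "center"
    PLANAR_RIGHT = "right"
    # Non-planar surfaces
    POROUS = "porous"        # e.g. foliage
    SOLID = "solid"          # e.g. a person or a car
```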

After examining a sample of 300 outdoor images collected from Google image search, the researchers found that over 97% of the image pixels belonged to the ground plane, to surfaces roughly perpendicular to the ground plane, or to the sky. They also found that the camera axis in most images is roughly parallel to the ground, tilted by no more than about 15 degrees. These observations justify the simplified geometric model the method assumes.

To label an image, pixels are first grouped into “superpixels”: small sets of contiguous pixels that are likely to belong to the same surface. The superpixels are then grouped in multiple ways, producing several hypotheses about how the image divides into larger segments. Each hypothesized segment is scored with a learned probability that it is homogeneous (i.e. corresponds to a single surface) together with a distribution over the geometric labels, and these scores are combined across hypotheses to label each pixel (see the sketch below). Using models trained on this labeled data, the researchers demonstrated the approach by detecting cars in a set of test images.
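The hypothesis-combination step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the paper grows candidate segments from superpixels using learned pairwise likelihoods and classifies them with boosted decision trees over color, texture, location, and shape features. Here, running a graph-based segmentation at several scales (scikit-image's `felzenszwalb`, an implementation of the Felzenszwalb–Huttenlocher algorithm the paper also uses for superpixels) stands in for the multiple hypotheses, and `classify_segment` is a hypothetical callback standing in for the learned classifiers:

```python
import numpy as np
from skimage.segmentation import felzenszwalb

def geometric_labels(image, scales, classify_segment, n_labels=3):
    """Combine several segmentation hypotheses into per-pixel label confidences.

    classify_segment(image, mask) -> (label_probs, p_homogeneous) is a
    hypothetical stand-in for the paper's learned classifiers: a distribution
    over the n_labels geometric classes (ground, vertical, sky) and the
    probability that the segment covers a single surface.
    """
    h, w = image.shape[:2]
    confidence = np.zeros((h, w, n_labels))

    for scale in scales:
        # One hypothesis: a grouping of the image into candidate segments.
        # Varying the scale of a graph-based segmentation is a simple proxy
        # for the paper's learned superpixel-merging procedure.
        segments = felzenszwalb(image, scale=scale, sigma=0.8, min_size=50)
        for seg_id in np.unique(segments):
            mask = segments == seg_id
            label_probs, p_homogeneous = classify_segment(image, mask)
            # Weight the segment's label distribution by how likely the
            # segment is to be homogeneous, and accumulate per pixel.
            confidence[mask] += p_homogeneous * np.asarray(label_probs)

    confidence /= len(scales)  # average over hypotheses
    return confidence.argmax(axis=2), confidence
```

With a real classifier plugged in, the argmax map gives each pixel's most likely geometric class; labeling the vertical subclasses proceeds in the same way over the pixels assigned to the vertical class.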