position and orientation automatically and
without the need for a pre-defined set of
“ground control”, visible points at known
three-dimensional positions (Westoby et al.,
2012). The need for a high degree of overlap
to cover the full geometry of the object or
scene of interest gives rise to the name:
structure derived from a moving sensor.
Whilst the exact implementation of SfM may
vary with how it is coded, the general
approach has been outlined by other authors
(Westoby et al., 2012; James and Robson,
2012; Fonstad et al., 2013; Micheletti et al.,
2014) and only a brief explanation is required
here. In essence, multiple views of an object
are captured with a digital camera from a
range of different positions. A scale-invariant feature transform (SIFT) then identifies common feature points across the image set, sufficient to establish the spatial relationships between the original image locations in an arbitrary 3-D coordinate system. A sparse bundle adjustment (e.g. Snavely et al., 2008) is used to transform the measured image coordinates into 3-D points covering the area of interest. The result is a set of three-dimensional feature-point locations in the form of a sparse point cloud, expressed in the same local 3-D coordinate system.
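To make the feature-matching stage described above concrete, the following minimal sketch (an illustrative addition, not part of the original text) uses the OpenCV library in Python to detect SIFT key points in two overlapping photographs and retain distinctive correspondences; the image filenames and the ratio-test threshold are assumptions for illustration only.

```python
# Minimal sketch of the SIFT feature-matching stage of an SfM pipeline,
# using OpenCV in Python. Filenames and parameter values are illustrative only.
import cv2

# Load two overlapping photographs as greyscale images (hypothetical paths)
img1 = cv2.imread("view_01.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_02.jpg", cv2.IMREAD_GRAYSCALE)

# Detect SIFT key points and compute their descriptors in each image
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors between the two images, keeping only distinctive
# correspondences via Lowe's ratio test
matcher = cv2.BFMatcher()
candidates = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in candidates if m.distance < 0.7 * n.distance]

print(f"{len(good)} putative feature correspondences found")
```

In a full SfM pipeline, correspondences of this kind from many image pairs feed the sparse bundle adjustment that recovers the camera positions and the sparse point cloud.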
Accurate key-point correspondence requires visually distinct texture in the imagery, which can present a problem with some objects and/or lighting conditions. The sparse point cloud is then densified using Multi-View Stereo (MVS) techniques (e.g. Furukawa and Ponce, 2010; Rothermel et al., 2012). It is the ability of
these techniques to generate very high-resolution datasets, whilst isolating and removing gross errors, that now allows such visually impressive 3-D models to be generated so easily compared to traditional stereo-based DEM generation methods involving "stereo-matching" (Remondino et al., 2014). Effectively, because of the ease with which sensor distortion can be modelled, all consumer-grade digital cameras, including the
ubiquitous “smartphone”, can acquire
valuable geomorphic data (Micheletti et al.,
2014). Furthermore, the recent development
of low-cost, sometimes free, internet-based
processing systems enables the upload,
processing and download of the derived 3-D
data in just a few minutes, potentially during
field data collection. This is in direct contrast
to traditional photogrammetric software,
where the user is forced to define and to
determine interior and exterior orientation
parameters explicitly. Most SfM platforms are
now fully automated. An advantage of SfM is therefore that it provides a black-box tool in which expert supervision is unnecessary. This can also be a disadvantage: the user has much less involvement in data quality control, and the origins of error in the data may not be identifiable.
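As an illustrative aside (not from the original text), the statement above that sensor distortion "can be modelled" can be made concrete: a self-calibrating bundle adjustment typically estimates a small set of lens distortion coefficients for the camera. The short Python sketch below applies a simple two-coefficient radial (Brown-type) distortion model to normalised image coordinates; the coefficient values are hypothetical.

```python
# Illustrative sketch of a simple radial (Brown-type) lens distortion model,
# of the kind estimated as "additional parameters" in a self-calibrating
# bundle adjustment. Coefficient values below are hypothetical.
import numpy as np

def apply_radial_distortion(xy, k1, k2):
    """Map ideal (undistorted) normalised image coordinates to distorted ones."""
    x, y = xy[:, 0], xy[:, 1]
    r2 = x ** 2 + y ** 2                    # squared radial distance from the principal point
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2   # radial distortion polynomial
    return np.column_stack((x * factor, y * factor))

# Example: distort a few ideal points with hypothetical coefficients
ideal = np.array([[0.0, 0.0], [0.2, 0.1], [0.4, -0.3]])
print(apply_radial_distortion(ideal, k1=-0.15, k2=0.02))
```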
This paper presents guidelines and a
workflow for the application of SfM
photogrammetry with a hand-held camera, to help users avoid generating inaccurate datasets. Examples and considerations are
taken from a study conducted by Micheletti et
al. (2014) involving ground-based imagery.
Although not discussed formally here, all
principles also remain valid for images
obtained using other approaches, such as from Unmanned Aerial Vehicles (UAVs) or drones.
Photogrammetric heritage
The term Structure-from-Motion evolved within the machine vision community, specifically for tracking points across sequences of images acquired from different
positions (e.g. Spetsakis and Aloimonos,
1999; Boufama et al., 1993; Szeliski and
Kang, 1994). SfM owes its existence to
innovations and mathematical models
developed many generations ago, particularly
in photogrammetry. The coplanarity
condition, now used to establish the spatial
relationship between images, was applied in
the 1950s and 1960s for numerical aerial
triangulation and mapping from aerial
photography (Thompson, 1965). The bundle
adjustment, which implements the collinearity
condition to establish a mathematically
rigorous relationship between image and
object, was developed later by Brown (1971, 1976), Kenefick et al. (1972) and Granshaw (1980). Only perfect metric cameras generate distortion-free images. However, a "self-calibrating" bundle adjustment (Kenefick et al., 1972; Faig and Moniwa, 1973) can model and estimate additional parameters that represent a wide range of internal distortions associated with consumer-grade digital cameras. Unfortunately, much of this
important pioneering work necessary to
establish both appropriate camera models
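For reference, and as an addition not present in the original text, the collinearity condition referred to above can be written in its standard form. It relates an object point $(X, Y, Z)$ to its image coordinates $(x, y)$ through the interior orientation (principal distance $f$ and principal point $(x_0, y_0)$) and the exterior orientation (rotation matrix elements $r_{ij}$ and perspective centre $(X_0, Y_0, Z_0)$):

$$
x - x_0 = -f\,\frac{r_{11}(X - X_0) + r_{12}(Y - Y_0) + r_{13}(Z - Z_0)}{r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)}, \qquad
y - y_0 = -f\,\frac{r_{21}(X - X_0) + r_{22}(Y - Y_0) + r_{23}(Z - Z_0)}{r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)}.
$$

A self-calibrating bundle adjustment augments these equations with additional parameters (for example, radial distortion terms of the kind sketched earlier) and solves simultaneously for interior orientation, exterior orientation and object coordinates by minimising the reprojection error over all images.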