3D Computer Vision: Exploring Efficient Methods and Applications

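PDF, 23.14 MB, updated 2024-07-18

"3D Computer Vision: Efficient Methods and Applications" is an English-language book on computer graphics and computer vision. Its author is Christian Wöhler of Daimler's Research and Advanced Engineering department. The book introduces the foundations of three-dimensional computer vision and examines recent advances in the field, as well as methods for specific applications.

3D techniques are central to computer vision. They concern how to recover three-dimensional information about objects from two-dimensional images, and how to build and interpret three-dimensional scenes. The book presumably begins with the underlying theory, such as camera models, projective geometry, stereo matching, and reconstruction algorithms. These foundations are essential for understanding 3D vision systems, because they explain how images are captured from multiple viewpoints and how three-dimensional space is reconstructed from them.

The "new methods" mentioned in the book may include recent innovations in 3D vision. Examples are deep learning for 3D reconstruction, or novel sensors based on structured light or lidar. Such methods typically improve the efficiency and accuracy of data acquisition, which makes 3D modelling and analysis more precise.

The detailed description of "application-specific systems" may cover autonomous driving, robot navigation, augmented reality, virtual reality, and industrial inspection:

- In autonomous driving, 3D vision is used to detect roads, obstacles, and traffic signs.
- In robotics, it helps robots understand their environment and manipulate objects precisely.
- In AR/VR, it merges the real and the digital world.
- In industrial inspection, it enables high-precision quality control.

The book may also discuss optimisation strategies for data processing, such as parallel computing, real-time processing, and memory management, all of which are critical when handling large amounts of 3D data. The author may further address error analysis, robustness, and the evaluation of 3D vision system performance.

The copyright notice states that the book is subject to Springer's terms, which legally restrict copying and use of its content. Any use must comply with German copyright law and requires Springer's permission, and infringements may lead to legal disputes.

Overall, the book offers comprehensive theoretical and practical knowledge of 3D computer vision together with recent technical developments. This makes it an important reference for researchers, engineers, and students in the field. Studying it in depth enables readers to master the key techniques and applications of 3D vision and gives them a solid foundation for practical projects.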

1 Geometric Approaches to Three-dimensional Scene Reconstruction

1.1 The Pinhole Camera Model

define a world coordinate system as soon as multiple cameras are involved. The orientation and translation of each camera $i$ with respect to this world coordinate system are then expressed by ${}^{C_i}_{W}T$, which transforms a point ${}^{W}\mathbf{x}$ from the world coordinate system $W$ into the camera coordinate system $C_i$. The transformation ${}^{C_i}_{W}T$ has two parts:

- a rotational part $R_i$, which is an orthonormal matrix of size $3 \times 3$ determined by three independent parameters, e.g. the Euler rotation angles (Craig, 1989);
- a translation vector $\mathbf{t}_i$, which denotes the offset between the coordinate systems.

This decomposition yields

$$ {}^{C_i}\mathbf{x} = {}^{C_i}_{W}T\,{}^{W}\mathbf{x} = R_i\,{}^{W}\mathbf{x} + \mathbf{t}_i. \qquad (1.2) $$
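The decomposition in Eq. (1.2) can be sketched in Python with NumPy. The text only says that three independent parameters, such as Euler angles, determine $R_i$. The Z-Y-X rotation order used below is therefore an assumption, and both function names are hypothetical.

```python
import numpy as np

def euler_to_rotation(alpha, beta, gamma):
    """Hypothetical helper: R = Rz(gamma) @ Ry(beta) @ Rx(alpha), angles in radians.

    The Z-Y-X composition order is an assumed convention; other Euler
    conventions give different matrices for the same three angles.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    rz = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def world_to_camera(R, t, x_w):
    """Hypothetical helper for Eq. (1.2): camera coordinates = R @ x_w + t."""
    return R @ np.asarray(x_w, dtype=float) + np.asarray(t, dtype=float)

R = euler_to_rotation(0.1, -0.2, 0.3)
# An orthonormal rotation satisfies R^T R = I and det(R) = +1.
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # prints: True True
```

The three angles are the "three independent parameters" mentioned in the text. The nine matrix entries are not independent, because orthonormality imposes six constraints on them.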

Furthermore, the image formation process is determined by the intrinsic parameters $\{c_j\}_i$ of each camera $i$. Some of these are lens-specific, while others are sensor-specific. For a pinhole camera equipped with a digital sensor, these parameters comprise:

- the principal distance $b$;
- the effective numbers of pixels per unit length $k_u$ and $k_v$ along the horizontal and the vertical image axis, respectively;
- the pixel skew angle $\theta$;
- the coordinates $u_0$ and $v_0$ of the principal point in the image plane.

For most modern camera sensors, the skew angle amounts to $\theta = 90^\circ$ and the pixels are square with $k_u = k_v$.

For a real lens system, however, the observed image coordinates of scene points may deviate from those given by Eq. (1.1) due to the effect of lens distortion. In this work we employ the lens distortion model by Brown (1966, 1971), which has been extended by Heikkilä and Silvén (1997) and by Bouguet (1999). The distorted coordinates ${}^{I}\hat{\mathbf{x}}_d$ of a point in the image plane are obtained from the undistorted coordinates ${}^{I}\hat{\mathbf{x}}$ according to

$$ {}^{I}\hat{\mathbf{x}}_d = \left(1 + k_1 r^2 + k_3 r^4\right) {}^{I}\hat{\mathbf{x}} + \mathbf{d}_t, \qquad (1.3) $$

where ${}^{I}\hat{\mathbf{x}} = (\hat{u}, \hat{v})^T$ and $r^2 = \hat{u}^2 + \hat{v}^2$.

If radial distortion is present, straight lines in the object space that cross the optical axis still appear straight in the image. However, the observed distance of a point in the image from the principal point deviates from the distance expected according to Eq. (1.1).

The vector

$$ \mathbf{d}_t = \begin{pmatrix} 2k_2\hat{u}\hat{v} + k_4\left(r^2 + 2\hat{u}^2\right) \\ k_2\left(r^2 + 2\hat{v}^2\right) + 2k_4\hat{u}\hat{v} \end{pmatrix} \qquad (1.4) $$

is termed tangential distortion. If tangential distortion occurs, straight lines in the object space that cross the optical axis appear bent in some directions in the image.
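Eqs. (1.3) and (1.4) translate directly into a function. This is a sketch that assumes NumPy. It follows the coefficient indexing of the equations above ($k_1$ and $k_3$ radial, $k_2$ and $k_4$ tangential). The name `distort` is hypothetical.

```python
import numpy as np

def distort(u_hat, v_hat, k1, k2, k3, k4):
    """Hypothetical helper: undistorted -> distorted image coordinates, Eqs. (1.3)-(1.4).

    Works element-wise on scalars or NumPy arrays of equal shape.
    """
    u_hat = np.asarray(u_hat, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    r2 = u_hat**2 + v_hat**2
    radial = 1.0 + k1 * r2 + k3 * r2**2                        # radial factor of Eq. (1.3)
    du = 2.0 * k2 * u_hat * v_hat + k4 * (r2 + 2.0 * u_hat**2)  # first component of d_t
    dv = k2 * (r2 + 2.0 * v_hat**2) + 2.0 * k4 * u_hat * v_hat  # second component of d_t
    return radial * u_hat + du, radial * v_hat + dv

# Pure radial distortion only rescales the distance from the principal point.
print(distort(3.0, 4.0, k1=0.01, k2=0.0, k3=0.0, k4=0.0))
```

Calibration and rectification usually need the opposite mapping, from distorted to undistorted coordinates. For this model that inverse has no closed form and is typically computed iteratively.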

When a film is used as an imaging sensor, $\hat{u}$ and $\hat{v}$ directly denote metric distances on the film with respect to the principal point. The principal point has to be determined by an appropriate calibration procedure (cf. Section 1.4).

When a digital camera sensor is used, the transformation

$$ {}^{S}\mathbf{x} = {}^{S}_{I}T\,{}^{I}\hat{\mathbf{x}} \qquad (1.5) $$

from the image coordinate system into the sensor coordinate system is defined in the general case by an affine transformation ${}^{S}_{I}T$. This holds as long as the sensor has no "exotic" architecture such as a hexagonal pixel raster, for which the transformation would be even more complex. The corresponding coordinates ${}^{S}\mathbf{x} = (u, v)^T$ are measured in pixels.
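The text does not spell out the entries of the affine transformation ${}^{S}_{I}T$. One plausible form, assumed here, is the one implied by the projective matrix of Eq. (1.9), combined with $\hat{u} = -bx/z$ and $\hat{v} = -by/z$ from Eq. (1.7). It gives $u = k_u(\hat{u} + \hat{v}\cot\theta) + u_0$ and $v = k_v\hat{v}/\sin\theta + v_0$. The sketch below uses NumPy, and its function names are hypothetical.

```python
import numpy as np

def image_to_sensor_matrix(k_u, k_v, theta, u0, v0):
    """Hypothetical homogeneous 3x3 form of the affine map S_I T of Eq. (1.5).

    Assumed entries (read off Eq. (1.9)):
        u = k_u * (u_hat + v_hat * cot(theta)) + u0
        v = k_v * v_hat / sin(theta) + v0
    """
    cot = np.cos(theta) / np.sin(theta)
    return np.array([
        [k_u, k_u * cot, u0],
        [0.0, k_v / np.sin(theta), v0],
        [0.0, 0.0, 1.0],
    ])

def image_to_sensor(T, u_hat, v_hat):
    """Hypothetical helper: apply the affine map to metric image coordinates."""
    u, v, _ = T @ np.array([u_hat, v_hat, 1.0])
    return u, v

T = image_to_sensor_matrix(k_u=100.0, k_v=100.0, theta=np.pi / 2, u0=320.0, v0=240.0)
print(image_to_sensor(T, 0.5, -0.25))
```

Because the map is affine and invertible, `np.linalg.inv(T)` recovers metric image coordinates from pixels. Eq. (1.11) relies on this inverse.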

At this point it is useful to define a projection function $P\left({}^{C_i}_{W}T, \{c_j\}_i, {}^{W}\mathbf{x}\right)$. It projects a point ${}^{W}\mathbf{x}$ defined in the world coordinate system into the sensor coordinate system of camera $i$ by means of a perspective projection as defined in Eq. (1.1), with

$$ {}^{S_i}\mathbf{x} = P\left({}^{C_i}_{W}T, \{c_j\}_i, {}^{W}\mathbf{x}\right). \qquad (1.6) $$

Since Eq. (1.1) is based on Euclidean geometry, it is nonlinear in $z$, which implies that the function $P$ is nonlinear as well. $P$ depends on the extrinsic camera parameters defined by the transformation ${}^{C_i}_{W}T$, and on the lens-specific and sensor-specific intrinsic camera parameters $\{c_j\}_i$.
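Chaining Eqs. (1.2) to (1.5) gives one possible realisation of the projection function $P$. This sketch makes three assumptions:

- Eq. (1.1) has the form $\hat{u} = -bx/z$, $\hat{v} = -by/z$, which is consistent with the matrix in Eq. (1.7).
- The distortion of Eqs. (1.3) and (1.4) is applied before the sensor mapping.
- The affine sensor map has the entries implied by Eq. (1.9).

The function name and the dictionary keys are hypothetical.

```python
import numpy as np

def project(R, t, c, x_w):
    """Hypothetical sketch of P(C_i_W T, {c_j}_i, W x) from Eq. (1.6).

    R, t : extrinsic parameters of camera i (Eq. (1.2)).
    c    : dict of intrinsics with assumed keys b, k_u, k_v, theta, u0, v0
           and distortion coefficients k1..k4.
    Returns the pixel coordinates (u, v) in the sensor system S_i.
    """
    x, y, z = R @ np.asarray(x_w, dtype=float) + np.asarray(t, dtype=float)  # Eq. (1.2)
    u_hat, v_hat = -c["b"] * x / z, -c["b"] * y / z  # assumed Eq. (1.1): division by z
    r2 = u_hat**2 + v_hat**2                          # lens distortion, Eqs. (1.3)-(1.4)
    radial = 1.0 + c["k1"] * r2 + c["k3"] * r2**2
    u_d = radial * u_hat + 2.0 * c["k2"] * u_hat * v_hat + c["k4"] * (r2 + 2.0 * u_hat**2)
    v_d = radial * v_hat + c["k2"] * (r2 + 2.0 * v_hat**2) + 2.0 * c["k4"] * u_hat * v_hat
    cot = np.cos(c["theta"]) / np.sin(c["theta"])     # assumed affine sensor map, Eq. (1.5)
    u = c["k_u"] * (u_d + cot * v_d) + c["u0"]
    v = c["k_v"] * v_d / np.sin(c["theta"]) + c["v0"]
    return np.array([u, v])

camera = dict(b=0.01, k_u=1e5, k_v=1e5, theta=np.pi / 2, u0=320.0, v0=240.0,
              k1=0.0, k2=0.0, k3=0.0, k4=0.0)
print(project(np.eye(3), np.zeros(3), camera, [0.1, -0.05, 2.0]))
```

The division by `z` in the second line of the function body is the nonlinearity the text refers to. Doubling the depth of a point halves its pixel offset from the principal point.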

Formulation in Terms of Projective Geometry

To circumvent the nonlinear formulation of perspective projection in Euclidean geometry, it is advantageous to express the image formation process in the more general mathematical framework of projective geometry (Faugeras, 1993; Birchfield, 1998).

A point $\mathbf{x} = (x, y, z)^T$ in three-dimensional Euclidean space is represented in three-dimensional projective space by the homogeneous coordinates $\tilde{\mathbf{x}} = (X, Y, Z, W)^T = (x, y, z, 1)^T$. Overall scaling is unimportant, so $(X, Y, Z, W)^T$ is equivalent to $(\lambda X, \lambda Y, \lambda Z, \lambda W)^T$ for any nonzero value of $\lambda$. To recover the Euclidean coordinates of a point given in three-dimensional projective space, the first three coordinates $X$, $Y$, and $Z$ are divided by the fourth coordinate $W$ according to $\mathbf{x} = (X/W, Y/W, Z/W)^T$.

The general transformation in three-dimensional projective space is a multiplication by a $4 \times 4$ matrix. For the projection from a three-dimensional world into a two-dimensional image plane, a matrix of size $3 \times 4$ is sufficient. Hence, analogous to Eq. (1.1), in projective geometry the projection of a scene point ${}^{C}\tilde{\mathbf{x}}$ defined in the camera coordinate system $C$ into the image coordinate system $I$ is given by the linear relation

$$ {}^{I}\tilde{\mathbf{x}} = \begin{pmatrix} -b & 0 & 0 & 0 \\ 0 & -b & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} {}^{C}\tilde{\mathbf{x}}. \qquad (1.7) $$

This formulation of perspective projection is widely used in the fields of computer vision (Faugeras, 1993) and computer graphics (Foley et al., 1993).

An important class of projective transforms is defined by the essential matrix, which contains the extrinsic parameters of two pinhole cameras observing a scene from two different viewpoints. The fundamental matrix is a generalisation of the essential matrix and additionally contains the intrinsic camera parameters (Birchfield, 1998). The essential and the fundamental matrix will be explained in more detail in Section 1.3, in the context of the epipolar constraint of stereo image analysis.
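The homogeneous formulation of Eq. (1.7) can be checked numerically. Applying the $3 \times 4$ matrix is a single linear operation, and the division by $z$ happens only when converting back to Euclidean coordinates. Below is a minimal NumPy sketch with hypothetical helper names.

```python
import numpy as np

def to_homogeneous(x):
    """Hypothetical helper: append W = 1, i.e. (x, y, z) -> (x, y, z, 1)."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def from_homogeneous(x_h):
    """Hypothetical helper: divide by the last coordinate, e.g. (X, Y, Z, W) -> (X/W, Y/W, Z/W)."""
    x_h = np.asarray(x_h, dtype=float)
    return x_h[:-1] / x_h[-1]

def perspective_matrix(b):
    """The 3x4 matrix of Eq. (1.7) for principal distance b."""
    return np.array([[-b, 0.0, 0.0, 0.0],
                     [0.0, -b, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

M = perspective_matrix(0.05)
x_c = [0.2, 0.1, 4.0]
image_point = from_homogeneous(M @ to_homogeneous(x_c))      # equals (-b x / z, -b y / z)
scaled = from_homogeneous(7.0 * (M @ to_homogeneous(x_c)))  # overall scale is irrelevant
print(image_point, np.allclose(image_point, scaled))
```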

In the formulation of projective geometry, the transformation from the world coordinate system $W$ into the camera coordinate system $C_i$ is defined by the $3 \times 4$ matrix

$$ {}^{C_i}_{W}\tilde{T} = \left[R_i \mid \mathbf{t}_i\right]. \qquad (1.8) $$

The projection from the coordinate system $C_i$ of camera $i$ into the sensor coordinate system $S_i$ is given by the matrix

$$ A_i = \begin{pmatrix} \alpha_u & \alpha_u \cot\theta & u_0 \\ 0 & \alpha_v / \sin\theta & v_0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (1.9) $$

with $\alpha_u$, $\alpha_v$, $\theta$, $u_0$, and $v_0$ as the intrinsic parameters of the pinhole camera $i$. In Eq. (1.9), the scale parameters are defined as $\alpha_u = -b k_u$ and $\alpha_v = -b k_v$.

The complete image formation process can then be described by the projective $3 \times 4$ matrix $P_i$. This matrix combines a perspective projection with the intrinsic and extrinsic camera parameters according to

$$ {}^{S_i}\tilde{\mathbf{x}} = P_i\,{}^{W}\tilde{\mathbf{x}} = A_i\left[R_i \mid \mathbf{t}_i\right] {}^{W}\tilde{\mathbf{x}}, \qquad (1.10) $$

such that $P_i = A_i\left[R_i \mid \mathbf{t}_i\right]$. For each camera $i$, the linear projective transformation $P_i$ describes the image formation process in projective space.
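Eqs. (1.8) to (1.10) combine into a few lines of NumPy. The function names in this sketch are hypothetical. Lens distortion is not part of the linear model $P_i$, so the sketch ignores it.

```python
import numpy as np

def intrinsic_matrix(b, k_u, k_v, theta, u0, v0):
    """Hypothetical helper: A_i of Eq. (1.9), with alpha_u = -b k_u and alpha_v = -b k_v."""
    alpha_u, alpha_v = -b * k_u, -b * k_v
    return np.array([
        [alpha_u, alpha_u * np.cos(theta) / np.sin(theta), u0],
        [0.0, alpha_v / np.sin(theta), v0],
        [0.0, 0.0, 1.0],
    ])

def projection_matrix(A, R, t):
    """Hypothetical helper: P_i = A_i [R_i | t_i], Eqs. (1.8) and (1.10)."""
    Rt = np.hstack([np.asarray(R, dtype=float), np.reshape(t, (3, 1))])  # Eq. (1.8)
    return A @ Rt

def project_points(P, X_w):
    """Hypothetical helper: project an (n, 3) array of world points to (n, 2) pixels."""
    X_w = np.asarray(X_w, dtype=float)
    X_h = np.hstack([X_w, np.ones((len(X_w), 1))])  # homogeneous world points
    x_h = X_h @ P.T                                  # row n holds P @ X_h[n]
    return x_h[:, :2] / x_h[:, 2:3]                 # back to Euclidean pixel coordinates

A = intrinsic_matrix(b=0.01, k_u=1e5, k_v=1e5, theta=np.pi / 2, u0=320.0, v0=240.0)
P = projection_matrix(A, np.eye(3), np.zeros(3))
print(project_points(P, [[0.1, -0.05, 2.0], [0.0, 0.0, 5.0]]))
```

For the first point, a direct Euclidean evaluation with $\hat{u} = -bx/z = -0.0005$ gives $u = k_u\hat{u} + u_0 = 270$ pixels. The matrix product reproduces this value. The point on the optical axis maps to the principal point $(u_0, v_0)$.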

1.2 Bundle Adjustment Methods

Most geometric methods for three-dimensional scene reconstruction from multiple images are based on establishing corresponding points in the images. For a scene point ${}^{W}\mathbf{x}$ observed in $N$ images, the corresponding image points ${}^{S_i}\mathbf{x}$ in each image $i$, where $i = 1, \ldots, N$, can be determined manually or by automatic correspondence search methods. Given the extrinsic and intrinsic camera parameters, each image point ${}^{S_i}\mathbf{x}$ defines a ray in three-dimensional space. In the absence of measurement errors, all $N$ rays intersect in the scene point ${}^{W}\mathbf{x}$.
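With the projection matrices $P_i$ of Eq. (1.10) known, the intersection of the rays can be computed by the standard linear (DLT) triangulation. This method is not described in this excerpt and is swapped in here only as an illustration. Each view contributes two linear equations in the homogeneous scene point. With noisy measurements the rays no longer meet exactly, and the SVD then returns a compromise that is least-squares in an algebraic sense, not in the sense of reprojection error. The example camera matrix `K` uses the common positive-focal-length convention rather than $\alpha_u = -bk_u$; the triangulation itself does not depend on this choice. Names are hypothetical.

```python
import numpy as np

def triangulate_dlt(P_list, x_list):
    """Hypothetical helper: linear triangulation of one scene point from N >= 2 views.

    P_list: 3x4 projection matrices P_i; x_list: pixel coordinates (u_i, v_i).
    """
    rows = []
    for P, (u, v) in zip(P_list, x_list):
        # u = (P[0] . X) / (P[2] . X)  =>  (u P[2] - P[0]) . X = 0, likewise for v
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The homogeneous point is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    X_h = vt[-1]
    return X_h[:3] / X_h[3]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # baseline of 1 along x
X_true = np.array([0.3, -0.2, 4.0])
x1, x2 = (P @ np.append(X_true, 1.0) for P in (P1, P2))
print(triangulate_dlt([P1, P2], [x1[:2] / x1[2], x2[:2] / x2[2]]))
```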

The first general scene reconstruction methods based on images acquired from different views were developed by, for example, Kruppa (1913) and Finsterwalder (1899). Overviews of these early methods are given by Åström (1996) and Luhmann (2003). These methods aim to determine the intrinsic and extrinsic camera parameters and the three-dimensional coordinates of the scene points. Kruppa (1913) presents an analytical solution for the scene structure and the extrinsic camera parameters from a minimal set of five corresponding image points.

Classical bundle adjustment methods (Brown, 1958; Luhmann, 2003; Lourakis and Argyros, 2004) jointly recover scene points and camera parameters from a set of $K$ corresponding image points. The measured image coordinates of the scene points in the images of the $N$ cameras are denoted by the sensor coordinates ${}^{S_i}\mathbf{x}_k$, where $i = 1, \ldots, N$ and $k = 1, \ldots, K$. The image coordinates inferred from the extrinsic camera parameters ${}^{C_i}_{W}T$, the intrinsic camera parameters $\{c_j\}_i$, and the $K$ scene point coordinates ${}^{W}\mathbf{x}_k$ are given by Eq. (1.6). Bundle adjustment corresponds to a minimisation of the reprojection error

$$ E_\mathrm{B} = \sum_{i=1}^{N} \sum_{k=1}^{K} \left\| {}^{S}_{I}T^{-1} \left[ P\left({}^{C_i}_{W}T, \{c_j\}_i, {}^{W}\mathbf{x}_k\right) - {}^{S_i}\mathbf{x}_k \right] \right\|^2. \qquad (1.11) $$

The transformation by ${}^{S}_{I}T^{-1}$ in Eq. (1.11) ensures that the reprojection error is measured in Cartesian image coordinates. It can be omitted in two cases:

- a film is used for image acquisition, on which Euclidean distances are measured in a Cartesian coordinate system;
- the pixel raster of the digital camera sensor is orthogonal ($\theta = 90^\circ$) and the pixels are square ($k_u = k_v$).

This special case corresponds to ${}^{S}_{I}T$ in Eq. (1.5) describing a similarity transform.
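A vectorised evaluation of Eq. (1.11) might look as follows. The sketch makes these assumptions:

- Every scene point is observed in every image. Real bundle adjustment sums only over actual detections.
- The predicted coordinates have already been computed with Eq. (1.6).
- All cameras share one affine sensor transform.

Because ${}^{S}_{I}T$ is affine, only its linear $2 \times 2$ part acts on the difference of two pixel positions. This is the interpretation of the bracketed difference in Eq. (1.11) used here. The function name is hypothetical.

```python
import numpy as np

def reprojection_error(T_sensor, predicted, measured):
    """Hypothetical helper: sum of squared reprojection errors in the sense of Eq. (1.11).

    T_sensor  : 3x3 homogeneous affine image-to-sensor map S_I T of Eq. (1.5),
                assumed to be shared by all N cameras.
    predicted : (N, K, 2) pixel coordinates P(...) from Eq. (1.6).
    measured  : (N, K, 2) measured pixel coordinates S_i x_k.
    """
    L_inv = np.linalg.inv(np.asarray(T_sensor, dtype=float)[:2, :2])
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    metric = diff @ L_inv.T  # maps every pixel difference back to metric image units
    return float(np.sum(metric**2))

T = np.array([[100.0, 0.0, 320.0], [0.0, 50.0, 240.0], [0.0, 0.0, 1.0]])
predicted = np.array([[[330.0, 250.0], [320.0, 240.0]]])  # N = 1 camera, K = 2 points
measured = np.array([[[320.0, 240.0], [320.0, 240.0]]])
print(reprojection_error(T, predicted, measured))
```

In this example the pixels are not square (100 versus 50 pixels per unit length). The same 10-pixel offset therefore corresponds to different metric distances along the two axes, and the ${}^{S}_{I}T^{-1}$ factor accounts for exactly this difference.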

The bundle adjustment approach can be used for calibration of the intrinsic and extrinsic camera parameters, reconstruction of the three-dimensional scene structure, or estimation of object pose. Depending on the scenario, some or all of the parameters ${}^{C_i}_{W}T$, $\{c_j\}_i$, and ${}^{W}\mathbf{x}_k$ may be unknown. They are obtained by minimising the reprojection error $E_\mathrm{B}$ with respect to the unknown parameters. As long as the scene is static, using $N$ simultaneously acquired images (stereo image analysis, cf. Section 1.3) is equivalent to evaluating a sequence of $N$ images acquired by a single moving camera (structure from motion).
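When the cameras are known and only a scene point ${}^{W}\mathbf{x}_k$ is unknown, minimising Eq. (1.11) reduces to a small nonlinear least-squares problem. The sketch below solves it with plain Gauss-Newton steps, one of the techniques named in the next paragraph. It assumes distortion-free cameras given by projection matrices $P_i$ (Eq. (1.10)) and square pixels, so the ${}^{S}_{I}T^{-1}$ factor is dropped. The function name is hypothetical, and the sketch has no step control or convergence test.

```python
import numpy as np

def refine_point_gauss_newton(P_list, x_list, X0, iterations=10):
    """Hypothetical helper: refine one scene point by Gauss-Newton on the reprojection error.

    P_list: fixed 3x4 projection matrices; x_list: measured pixels (u_i, v_i);
    X0: initial guess for the 3D point.
    """
    X = np.asarray(X0, dtype=float).copy()
    for _ in range(iterations):
        residuals, jacobian = [], []
        for P, (u_meas, v_meas) in zip(P_list, x_list):
            num_u, num_v, den = P @ np.append(X, 1.0)
            residuals += [num_u / den - u_meas, num_v / den - v_meas]
            # quotient rule: d(num/den)/dX = (P[row, :3] * den - num * P[2, :3]) / den**2
            jacobian.append((P[0, :3] * den - num_u * P[2, :3]) / den**2)
            jacobian.append((P[1, :3] * den - num_v * P[2, :3]) / den**2)
        # Gauss-Newton step: least-squares solution of J dX = -r
        step, *_ = np.linalg.lstsq(np.array(jacobian), -np.array(residuals), rcond=None)
        X += step
    return X

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0])
x_meas = []
for P in (P1, P2):
    h = P @ np.append(X_true, 1.0)
    x_meas.append(h[:2] / h[2])
print(refine_point_gauss_newton([P1, P2], x_meas, X0=[0.4, -0.1, 3.5]))
```

A full bundle adjuster would add the camera parameters to the unknown vector. It would also typically switch to Levenberg-Marquardt damping to handle poor initial guesses.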

Minimisation of Eq. (1.11) involves nonlinear optimisation techniques such as the Gauss-Newton or the Levenberg-Marquardt approach (Press et al., 1992). The reprojection error of scene point ${}^{W}\mathbf{x}_k$ in image $i$ depends only on ${}^{W}\mathbf{x}_k$ and on the parameters ${}^{C_i}_{W}T$ and $\{c_j\}_i$ of camera $i$. It enters the error function only if the scene point is detected in that image. Each term therefore involves only a small subset of the unknowns, which leads to a sparse set of nonlinear equations. The algorithm by Lourakis and Argyros (2004) exploits this sparsity of the optimisation problem.

The error function defined by Eq. (1.11) may have a large number of local minima, so reasonable initial guesses must be provided for the parameters to be estimated.

As long as no a priori knowledge about the camera positions is available, bundle adjustment in general recovers the scene structure only up to an unknown constant scale factor. The reason is that increasing the mutual distances between the scene points by a constant factor can be compensated by increasing, by the same factor, the mutual distances between the cameras and their distances to the scene. However, this scale factor can be obtained if additional information about the scene is known, such as the distance between two scene points.
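The scale ambiguity can be verified numerically. Multiply all scene points and all camera translations by the same factor. By Eq. (1.2) the camera coordinates are then scaled by that factor, and the perspective division cancels it. Every projection, and hence $E_\mathrm{B}$, is therefore unchanged. This is a minimal sketch with a hypothetical projection helper and an assumed positive-focal-length intrinsic matrix `K`.

```python
import numpy as np

def project(K, R, t, X_w):
    """Hypothetical helper: distortion-free projection of an (n, 3) array of world points."""
    X_c = np.asarray(X_w, dtype=float) @ R.T + t   # Eq. (1.2), applied row-wise
    x_h = X_c @ K.T                                 # homogeneous pixel coordinates
    return x_h[:, :2] / x_h[:, 2:3]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-1.0, 0.0, 0.5])
X_w = np.array([[0.3, -0.2, 4.0], [-0.5, 0.4, 6.0], [1.0, 1.0, 5.0]])

s = 3.7  # arbitrary scale factor
same = np.allclose(project(K, R, t, X_w), project(K, R, s * t, s * X_w))
print(same)  # True: the images cannot distinguish the two reconstructions
```

Fixing one known distance between two reconstructed points determines `s`. This corresponds to the additional scene information mentioned in the text.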

Difficulties may occur in the presence of false correspondences or gross errors in the determined point positions in the images. These cause the distribution of reprojection errors to deviate strongly from the assumed Gaussian distribution. Lourakis and Argyros (2004) point out that in realistic scenarios, assuming a Gaussian distribution of the measurement errors systematically underestimates the fraction of large errors.

Two approaches address this problem:

- Outliers among the established correspondences can be detected using, for example, the random sample consensus (RANSAC) method (Fischler and Bolles, 1981) in combination with a minimal-case five-point algorithm (Nistér, 2004).
- Alternatively, it is often useful to reduce the weight of large reprojection errors. This corresponds to replacing the $L_2$ norm in Eq. (1.11) by a suitable different norm, and the approach is termed the M-estimator technique (Rey, 1983).
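The M-estimator idea can be illustrated with iteratively reweighted least squares (IRLS). The text does not fix a particular norm. This sketch assumes the Huber loss, which is quadratic for residuals up to a threshold $\delta$ and linear beyond it; it is one common choice. The sketch applies it to a simple line fit with one gross outlier rather than to full bundle adjustment. Function names are hypothetical.

```python
import numpy as np

def huber_weights(residuals, delta):
    """Hypothetical helper: IRLS weights of the Huber loss, 1 for |r| <= delta, delta/|r| beyond."""
    a = np.abs(residuals)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, delta))

def irls_huber(A, b, delta=1.0, iterations=50):
    """Hypothetical helper: minimise sum_i rho(A[i] @ x - b[i]) for the Huber loss rho by IRLS,
    starting from the ordinary least-squares solution."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    for _ in range(iterations):
        w = np.sqrt(huber_weights(A @ x - b, delta))  # scale rows by the square root of the weights
        x, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
    return x

xs = np.arange(10.0)
ys = 2.0 * xs + 1.0
ys[7] += 50.0  # one gross error
A = np.column_stack([xs, np.ones_like(xs)])
x_ls, *_ = np.linalg.lstsq(A, ys, rcond=None)
x_rob = irls_huber(A, ys)
print("least squares:", x_ls, "Huber IRLS:", x_rob)
```

Ordinary least squares lets the single outlier pull the slope far from its true value of 2. The Huber fit caps the outlier's influence, so its slope stays close to 2.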

A further drawback of the correspondence-based geometric bundle adjustment approach is that correspondences can only be reliably extracted in textured image parts. In the presence of large, weakly or repetitively textured regions, this leads to a sparse three-dimensional reconstruction.

1.3 Geometric Aspects of Stereo Image Analysis

The reconstruction of three-dimensional scene structure based on two images acquired from different positions and viewing directions is termed stereo image analysis. In this section we consider the "classical" Euclidean approach to this important field of image-based three-dimensional scene reconstruction (cf. Section 1.3.1). We also consider its formulation in terms of projective geometry (cf. Section 1.3.2).

1.3.1 Euclidean Formulation of Stereo Image Analysis

In this section, we begin with an introduction in terms of Euclidean geometry, essentially following the derivation described by Horn (1986). We assume that the world coordinate system is identical with the coordinate system of camera 1, i.e. the transformation matrix ${}^{C_1}_{W}T$ is the identity. The relative orientation of camera 2 with respect to camera 1 is given by ${}^{C_2}_{C_1}T$ and is assumed to be known. Section 1.4 treats the problem of camera calibration, i.e. the determination of the extrinsic and intrinsic camera parameters.

A point ${}^{I_1}\hat{\mathbf{x}} = (\hat{u}_1, \hat{v}_1)^T$ in image 1 corresponds to a ray through the origin of the camera coordinate system according to

$$ {}^{C_1}\mathbf{x} = \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} = s \begin{pmatrix} \hat{u}_1 \\ \hat{v}_1 \\ b \end{pmatrix}, \qquad (1.12) $$

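where $s$ is assumed to be a positive real number. According to Eq. (1.2), the points on this ray have the following coordinates in the coordinate system of camera 2:

$$ {}^{C_2}\mathbf{x} = \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} = R\,{}^{C_1}\mathbf{x} + \mathbf{t} = \begin{pmatrix} (r_{11}\hat{u}_1 + r_{12}\hat{v}_1 + r_{13} b)\,s + t_1 \\ (r_{21}\hat{u}_1 + r_{22}\hat{v}_1 + r_{23} b)\,s + t_2 \\ (r_{31}\hat{u}_1 + r_{32}\hat{v}_1 + r_{33} b)\,s + t_3 \end{pmatrix} \qquad (1.13) $$

Here $r_{ij}$ are the elements of the orthonormal rotation matrix $R$, and $t_i$ are the elements of the translation vector $\mathbf{t}$ (cf. Eq. (1.2)). In the image coordinate system of camera 2, the coordinates of the vector ${}^{I_2}\hat{\mathbf{x}} = (\hat{u}_2, \hat{v}_2)^T$ are given by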
