Multi-View 3D Reconstruction Tutorial

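"Multi-View Stereo: A Tutorial" is a technical paper by Yasutaka Furukawa (Washington University in St. Louis) and Carlos Hernández (Google Inc.) on the theory of multi-view stereo (MVS) and its application to 3D reconstruction. It was published in Foundations and Trends in Computer Graphics and Vision, Volume 9, Issue 1-2 (2013), and runs to 148 pages. MVS is an important research area in computer vision: it uses images taken from several viewpoints to recover the 3D geometry of a scene. At its core, the technique matches image content across views to build dense depth maps, from which the 3D reconstruction is obtained. In practice, MVS is widely used in archaeology, architecture, film production, robot navigation, and virtual reality. The paper may cover the following key topics:

1. **Image registration**: the foundation of MVS. Images from different viewpoints are aligned precisely so that the same scene point corresponds consistently across images.
2. **Feature detection and matching**: the use of classic feature descriptors such as SIFT and SURF, and the search for corresponding points between images to establish correspondences between views.
3. **Sparse-to-dense depth reconstruction**: starting from the initial feature matches, the depth map is expanded and optimized until dense depth is available for the whole scene.
4. **Stereo matching algorithms**: for example cost-aggregation methods (such as Semi-Global Matching) and depth-optimization methods, which compute a depth value for each pixel.
5. **Geometric consistency checks**: depth recovered from different views is checked for agreement, and implausible estimates are discarded to improve accuracy.
6. **Post-processing**: steps such as depth-map smoothing and hole filling that improve the quality of the result.
7. **Multi-view geometry**: for example computing the fundamental and essential matrices and using them to recover the 3D structure of the scene.
8. **Real-time and large-scale 3D reconstruction**: how to optimize the algorithms when handling large amounts of data or when real-time performance is required.
9. **Application examples**: the paper may show MVS in real projects, such as digitizing cultural heritage sites or producing film visual effects.

This tutorial paper introduces the basic principles of multi-view stereo and may also discuss recent advances and open challenges in depth, which makes it a valuable reference for readers studying 3D reconstruction.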


6 Introduction

Figure 1.3: Different MVS capture setups. From left to right: a controlled MVS capture using diffuse lights and a turntable, outdoor capture of small-scale scenes, and crowd-sourcing from online photo-sharing websites.

In this chapter we will give more insight into the first three main stages of MVS: imagery collection, camera parameter estimation, and 3D geometry reconstruction. Chapter 2 develops the notion of photo-consistency as the main signal being optimized by MVS algorithms. Chapter 3 presents and compares some of the most successful MVS algorithms. Chapter 4 discusses the use of domain knowledge, in particular structural priors, in improving the reconstruction quality. Chapter 5 gives an overview of successful applications, available software, and best practices. Finally, Chapter 6 describes some of the current limitations of MVS as well as research directions to solve them.

1.1 Imagery collection

One can roughly classify MVS capture setups into three categories (see Figure 1.3):

• Laboratory setting,

• Outdoor small-scale scene capture,

• Large-scale scene capture using fleets or crowd-sourcing, e.g., cars, planes, drones, and the Internet.

MVS algorithms first started in a laboratory setting [184, 147, 58], where the light conditions could be easily controlled and the camera could be easily calibrated, e.g. from a robotic arm [165], rotation table [93], fiducial markers [2, 43, 192], or early SfM algorithms [62]. MVS algorithms went through two major developments that took them to their current state: they left the laboratory setting for small-scale outdoor scenes [174, 102, 85, 169, 190], e.g. a building facade or a fountain, then scaled up to much larger scenes, e.g. entire buildings and cities [129, 153, 97, 69].

These major changes were not solely due to developments in the MVS field itself. They resulted from a combination of new hardware to capture better images, more computation power, and scalable camera estimation algorithms.

Improvements in hardware: Two areas of hardware improvement had the most impact on MVS: digital cameras and computation power. Digital photography became mainstream, and digital image sensors constantly improved in terms of resolution and quality. Additionally, mass production and miniaturization of geo-positioning sensors (GPS) made them ubiquitous in digital cameras, tablets, and mobile phones. Although the precision of commercial units is not enough for MVS purposes, it does provide an initial estimate of the camera parameters that can be refined using Computer Vision techniques. The second significant hardware improvement was computation power. The rise of inexpensive computer clusters [5] and general-purpose GPU computation [6] enabled SfM algorithms [25, 64] and MVS algorithms [69] to easily handle tens of thousands of images.
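As a rough illustration of how a GPS tag can seed the camera position before refinement, the sketch below converts latitude, longitude, and altitude into approximate east/north/up offsets in metres from a reference fix. The function name `gps_to_local_enu`, the spherical Earth radius, and the small-area approximation are all assumptions made for this example; they are not taken from the tutorial, and a production pipeline would more likely use a proper geodetic library.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)

def gps_to_local_enu(lat_deg, lon_deg, alt_m, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """Approximate east/north/up offsets in metres from a reference GPS fix.

    Valid only for small areas (a few kilometres), where the Earth's
    surface can be treated as locally flat.
    """
    d_lat = math.radians(lat_deg - ref_lat_deg)
    d_lon = math.radians(lon_deg - ref_lon_deg)
    # One radian of longitude spans fewer metres away from the equator.
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(ref_lat_deg))
    north = EARTH_RADIUS_M * d_lat
    up = alt_m - ref_alt_m
    return east, north, up
```

The resulting offsets would serve only as a starting point: consumer GPS errors of several metres are far larger than what MVS needs, so the positions would still have to be refined by SfM.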

Improvements in Structure-from-Motion algorithms: Researchers have been working on visual reconstruction algorithms for decades [183, 182]. However, only relatively recently have these techniques matured enough to be used in large-scale industrial applications. Nowadays industrial algorithms are able to estimate camera parameters for millions of images. Two slightly different techniques have made great progress in recent years: Structure-from-Motion (SfM) [88] and Visual Simultaneous Localization and Mapping (VSLAM) [53]. Both rely on the correspondence cue and the assumption that the scene is rigid. SfM is most commonly used to compute camera models of unordered sets of images, usually offline, while VSLAM specializes in computing the [...]


[...] Figure 1.4 bottom left). Undistorting the images simplifies the MVS algorithm and often leads to faster processing times. Some cameras, e.g. those in mobile phones, incorporate dedicated hardware to remove radial distortion during the processing of the image just after its capture. Note, however, that rectifying wide-angle images will introduce resampling artifacts as well as field-of-view cropping. To avoid these issues, MVS pipelines can support radial distortion and more complicated camera models directly, at the expense of extra complexity.
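To make the trade-off above concrete, here is a minimal sketch of a radial distortion model and its inverse on normalized image coordinates. It assumes a two-coefficient polynomial model with hypothetical coefficients `k1` and `k2`; the excerpt does not specify which distortion model the authors have in mind, so this is one common choice rather than the tutorial's own.

```python
import numpy as np

def distort(xy, k1, k2):
    """Apply polynomial radial distortion to normalized image points (N x 2)."""
    xy = np.asarray(xy, dtype=float)
    r2 = np.sum(xy**2, axis=-1, keepdims=True)  # squared radius of each point
    return xy * (1.0 + k1 * r2 + k2 * r2**2)

def undistort(xy_d, k1, k2, iterations=20):
    """Invert the model by fixed-point iteration.

    The polynomial has no closed-form inverse, so we repeatedly divide the
    distorted point by the distortion factor evaluated at the current
    estimate of the undistorted point.
    """
    xy_d = np.asarray(xy_d, dtype=float)
    xy_u = xy_d.copy()  # initial guess: no distortion
    for _ in range(iterations):
        r2 = np.sum(xy_u**2, axis=-1, keepdims=True)
        xy_u = xy_d / (1.0 + k1 * r2 + k2 * r2**2)
    return xy_u
```

Undistorting whole images applies this inverse mapping per pixel and resamples, which is where the resampling artifacts and cropping come from. A pipeline that supports distortion directly would instead call something like `distort` inside its projection function and leave the pixels untouched.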

Finally, rolling shutter is another source of complexity, particularly important for video processing applications (see Figure 1.4 right). A digital sensor with an electronic rolling shutter exposes each row of an image at a slightly different time. This is in contrast to global shutters, where the whole image is exposed at the same time. A rolling shutter often provides higher sensor throughput at the expense of a more complicated camera model. As a result, if the camera or the scene is moving while the image is captured, each row of the image effectively captures a slightly different scene. If the camera or scene motion is slow with respect to the shutter speed, rolling shutter effects can be small enough to be ignored. Otherwise the camera projection model needs to incorporate the effects [63].
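The row-by-row exposure described above can be sketched as a projection function in which the camera pose depends on the image row the point lands on. The example assumes a pinhole camera with identity rotation, a camera center moving at constant velocity, and rows exposed top to bottom with a fixed per-row delay `line_delay`. These names and simplifications are illustrative assumptions, not the model of [63].

```python
import numpy as np

def project_global(X, K, c):
    """Pinhole projection with identity rotation and camera center c."""
    x = K @ (np.asarray(X, dtype=float) - np.asarray(c, dtype=float))
    return x[:2] / x[2]

def project_rolling(X, K, c0, velocity, line_delay, iterations=10):
    """Project X when image row v is exposed at time v * line_delay.

    The row depends on the projection and the projection depends on the
    row's exposure time, so we iterate from a global-shutter initial guess.
    """
    c0 = np.asarray(c0, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    uv = project_global(X, K, c0)  # initial guess: camera frozen at t = 0
    for _ in range(iterations):
        t = uv[1] * line_delay  # exposure time of the row currently hit
        uv = project_global(X, K, c0 + velocity * t)
    return uv
```

With zero velocity the result equals the global-shutter projection. With slow motion or a small `line_delay` the difference stays well below a pixel, which matches the remark that the effect can then be ignored.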

1.3 Structure from Motion

There is a vast literature on Structure-from-Motion algorithms, and it is not our intention to thoroughly review it here. In the following, we will discuss some of the key aspects of SfM and how they relate to MVS algorithms.

SfM algorithms take as input a set of images and produce two things: the camera parameters of every image, and a set of 3D points visible in the images, which are often encoded as tracks. A track is defined as the 3D coordinates of a reconstructed 3D point and the list of corresponding 2D coordinates in a subset of the input images. Most of the current state-of-the-art SfM algorithms share the same basic processing pipeline (see Figure 1.5):
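Returning to the definition of a track given above, the two SfM outputs can be sketched as a small data structure. The class and field names (`Track`, `SfMResult`, `observations`, `cameras`) are hypothetical and chosen for illustration; real SfM packages use their own formats. The helper `covisible_tracks` shows one way tracks can be queried downstream, for example to find which image pairs share enough points to be matched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Track:
    """A reconstructed 3D point plus its 2D observations, keyed by image index."""
    xyz: Tuple[float, float, float]
    observations: Dict[int, Tuple[float, float]] = field(default_factory=dict)

@dataclass
class SfMResult:
    """SfM output: per-image camera parameters and the list of tracks."""
    cameras: Dict[int, dict]  # image index -> camera parameters (format left open)
    tracks: List[Track]

def images_seeing(track: Track) -> List[int]:
    """Sorted indices of the images in which the track is observed."""
    return sorted(track.observations)

def covisible_tracks(result: SfMResult, i: int, j: int) -> List[Track]:
    """Tracks observed in both image i and image j."""
    return [t for t in result.tracks
            if i in t.observations and j in t.observations]
```

Storing observations per track, rather than per image, keeps each 3D point together with all of its 2D evidence, which is the form the definition in the text describes.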

