Recurrent Multi-view Stereo Network: High-Resolution Depth Inference

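RMVS, i.e. the Recurrent Multi-view Stereo Network (R-MVSNet), is a deep learning framework for high-resolution multi-view stereo depth inference. It addresses the scalability problem that current learning-based multi-view stereo (MVS) methods face on high-resolution scenes.

Deep learning has shown strong results in multi-view stereo reconstruction, but current learning-based methods are limited mainly by memory-hungry cost volume regularization, which makes MVS hard to apply to high-resolution scenes. To address this, the authors propose R-MVSNet, a scalable multi-view stereo framework based on recurrent neural networks. Instead of regularizing the entire 3D cost volume in one go, R-MVSNet uses a gated recurrent unit (GRU) to regularize the 2D cost maps one after another along the depth direction. This greatly reduces memory consumption and makes high-resolution reconstruction feasible.

The core of R-MVSNet is its recurrent structure. Because the GRU processes the 2D cost maps sequentially along the depth dimension, the full 3D volume never has to be regularized at once, which removes the memory bottleneck while maintaining reconstruction quality. The paper reports state-of-the-art performance on high-resolution image data that earlier methods could hardly process. The approach also allows large, complex scenes such as cityscapes or natural environments to be reconstructed more efficiently, since the recurrent network carries context across depth planes as it goes.

In short, R-MVSNet offers a new solution for multi-view geometric reconstruction: it keeps the strengths of learning-based methods while removing the memory limit on high-resolution scenes, and it may help advance high-accuracy, large-scale 3D reconstruction. Future work may focus on further improving the network's efficiency and generalization, and on deployment in applications such as autonomous driving and virtual reality.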

[Figure 2 diagram. Blocks: Feature Extraction; Differentiable Homography Warping & Variance Cost Metric (M); Cost Maps C(0), C(1), C(2), …, C(D-1); Recurrent Regularization; Regularized Cost Maps C_r(0), C_r(1), C_r(2), …, C_r(D-1); Softmax; Loss Computation (Loss against the One-hot GT Depth Map). Legend: Conv + BN + ReLU, stride = 1; Conv, stride = 1; Conv + BN + ReLU, stride = 2; GRU unit.]

Figure 2: The R-MVSNet architecture. Deep image features are extracted from input images and then warped to the fronto-parallel planes of the reference camera frustum. The cost maps are computed at different depths and are sequentially regularized by the convolutional GRU. The network is trained as a classification problem with the cross-entropy loss.
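The classification training described in the caption can be illustrated with a minimal NumPy sketch. It assumes the regularized cost maps are used directly as logits for the softmax along the depth axis, and that the ground-truth depth map has already been quantized to a plane index per pixel; the function name `depth_cross_entropy` and the optional `valid_mask` are hypothetical and not taken from the paper.

```python
import numpy as np

def depth_cross_entropy(logits, gt_index, valid_mask=None):
    """Mean cross-entropy between a per-pixel softmax over depth and a one-hot target.

    logits:     (D, H, W) regularized cost maps, treated as logits (assumption).
    gt_index:   (H, W) integer index of the ground-truth depth plane per pixel.
    valid_mask: optional (H, W) boolean mask of pixels that have ground truth.
    """
    # Numerically stable log-softmax along the depth axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Selecting the log-probability at the ground-truth index is the same as
    # summing onehot(d) * log p(d) over all depth planes d.
    picked = np.take_along_axis(log_prob, gt_index[None, ...], axis=0)[0]
    nll = -picked
    if valid_mask is None:
        return float(nll.mean())
    return float(nll[valid_mask].mean())
```

With uniform logits over D planes the loss equals log D, and it approaches zero as the probability mass concentrates on the ground-truth plane.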

the end of MVSNet to further enhance the depth map quality. As deep image features $\{\mathbf{F}_i\}_{i=1}^{N}$ are downsized during the feature extraction, the output depth map size is 1/4 of the original image size in each dimension.

MVSNet has shown state-of-the-art performance on the DTU dataset [1] and the intermediate set of the Tanks and Temples dataset [17], which contain scenes with outside-looking-in camera trajectories and small depth ranges. However, MVSNet can only handle a maximum reconstruction scale of $H \times W \times D = 1600 \times 1184 \times 256$ on a Tesla P100 GPU with 16 GB of memory, and it fails on larger scenes, e.g., the advanced set of Tanks and Temples. To resolve the scalability issue, especially for wide-depth-range reconstructions, we introduce the novel recurrent cost volume regularization in the next section.
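To get a feel for the scale quoted above, the following back-of-the-envelope sketch computes the raw size of one cost volume at the 1/4-resolution output. The 32 feature channels and 4-byte floats are assumptions for illustration only, and the helper `cost_volume_bytes` is hypothetical. Peak memory in a real network is considerably higher, because the intermediate activations of the regularization network must be stored as well.

```python
def cost_volume_bytes(height, width, depth_planes, channels=32,
                      bytes_per_value=4, downscale=4):
    """Return (voxel count, raw bytes) of a cost volume at the downsized resolution.

    channels and bytes_per_value are illustrative assumptions, not values
    stated in the text.
    """
    h, w = height // downscale, width // downscale
    voxels = h * w * depth_planes
    return voxels, voxels * channels * bytes_per_value

voxels, volume_bytes = cost_volume_bytes(1600, 1184, 256)
slice_bytes = volume_bytes // 256   # one depth plane of the same volume

print(voxels)                       # 30310400
print(volume_bytes / 2**30)         # roughly 3.6 GiB for the whole volume
print(slice_bytes / 2**20)          # roughly 14.5 MiB for a single depth plane
```

A single depth plane is D = 256 times smaller than the full volume, which is why processing the planes one at a time is attractive for wide depth ranges.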

3.2. Recurrent Regularization

Sequential Processing. An alternative to globally regularizing the cost volume $\mathbf{C}$ in one go is to process the volume sequentially along the depth direction. The simplest sequential approach is winner-take-all plane sweeping stereo [7], which crudely replaces the pixel-wise depth value with the better one and thus suffers from noise (Fig. 1 (a)). To improve on this, cost aggregation methods [29, 31] filter the matching cost $\mathbf{C}(d)$ at different depths (Fig. 1 (b)) so as to gather spatial context information for each cost estimation. In this work, we follow the idea of sequential processing and propose a more powerful recurrent regularization scheme based on the convolutional GRU. The proposed method is able to gather spatial as well as uni-directional context information in the depth direction (Fig. 1 (c)), which achieves regularization results comparable to the full-space 3D CNNs but is much more efficient in runtime memory.
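The simplest sequential scheme mentioned above, winner-take-all, can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the implementation from [7]: it assumes that a lower matching cost is better, and the function name `winner_take_all` is hypothetical. Only the running best cost and its depth index are kept, so memory stays at the size of a single cost map no matter how many depth planes are swept.

```python
import numpy as np

def winner_take_all(cost_maps):
    """Sequential winner-take-all over an iterable of (H, W) cost maps.

    Lower cost is treated as better (assumption). Returns the per-pixel index
    of the best depth plane and the corresponding cost.
    """
    best_cost, best_index = None, None
    for d, cost in enumerate(cost_maps):
        if best_cost is None:
            # The first plane initializes the running estimate.
            best_cost = cost.astype(np.float64, copy=True)
            best_index = np.zeros(cost.shape, dtype=np.int64)
            continue
        # Replace the pixel-wise estimate wherever the new plane is better.
        better = cost < best_cost
        best_cost = np.where(better, cost, best_cost)
        best_index = np.where(better, d, best_index)
    return best_index, best_cost
```

Because each pixel is decided independently and no context is shared between neighbours or depth planes, the result is noisy, which is the weakness that cost aggregation and the recurrent scheme are meant to address.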

Convolutional GRU. Cost volume $\mathbf{C}$ could be viewed as $D$ cost maps $\{\mathbf{C}(i)\}_{i=1}^{D}$ concatenated in the depth direction. If we denote the output of regularized cost maps as $\{\mathbf{C}_r(i)\}_{i=1}^{D}$, for the ideal sequential processing at the $t$-th step, $\mathbf{C}_r(t)$ should be dependent on the cost map of the current step $\mathbf{C}(t)$ as well as those of all previous steps $\{\mathbf{C}(i)\}_{i=1}^{t-1}$. Specifically, in our network we apply a convolutional variant of GRU to aggregate such temporal context information in the depth direction, which corresponds to the time direction in language processing. In the following, we denote '$\odot$' as the element-wise multiplication, '$[\,]$' the concatenation and '$*$' the convolution operation. Cost dependencies are formulated as:

$$\mathbf{C}_r(t) = (1 - \mathbf{U}(t)) \odot \mathbf{C}_r(t-1) + \mathbf{U}(t) \odot \mathbf{C}_u(t) \tag{1}$$

where $\mathbf{U}(t)$ is the update gate map that decides whether to update the output for the current step, $\mathbf{C}_r(t-1)$ is the regularized cost map of the last step, and $\mathbf{C}_u(t)$ could be viewed as the updated cost map in the current step, which is defined as:

$$\mathbf{C}_u(t) = \sigma_c\big(\mathbf{W}_c * [\mathbf{C}(t),\ \mathbf{R}(t) \odot \mathbf{C}_r(t-1)] + \mathbf{b}_c\big) \tag{2}$$

$\mathbf{R}(t)$ here is the reset gate map that decides how much the previous $\mathbf{C}_r(t-1)$ should affect the current update. $\sigma_c(\cdot)$ is
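One recurrence step of Eq. (1) and Eq. (2) can be sketched in NumPy as follows. This is a minimal sketch under several assumptions, not the authors' implementation: the convolution is simplified to a 1x1 kernel (a per-pixel linear map), the gate maps U(t) and R(t) are passed in by the caller because the excerpt breaks off before defining how they are computed, and the nonlinearity `sigma_c` defaults to tanh as in a standard GRU since its definition is also cut off. The names `conv1x1` and `gru_update` are hypothetical.

```python
import numpy as np

def conv1x1(weight, x):
    """1x1 convolution: weight (C_out, C_in) applied per pixel to x (C_in, H, W)."""
    return np.einsum("oi,ihw->ohw", weight, x)

def gru_update(cost_t, reg_prev, update_gate, reset_gate, w_c, b_c,
               sigma_c=np.tanh):
    """One step of Eq. (1) and Eq. (2) with gate maps supplied by the caller.

    cost_t:      (C_in, H, W) cost map C(t) of the current depth plane.
    reg_prev:    (C_out, H, W) regularized cost map C_r(t-1).
    update_gate: (C_out, H, W) map U(t) with values in [0, 1].
    reset_gate:  (C_out, H, W) map R(t) with values in [0, 1].
    w_c, b_c:    (C_out, C_in + C_out) weights and (C_out,) bias of a 1x1 kernel
                 (simplification; the kernel size is an assumption).
    sigma_c:     element-wise nonlinearity; tanh is an assumed default.
    """
    # Eq. (2): candidate map from the input and the reset-gated previous output.
    stacked = np.concatenate([cost_t, reset_gate * reg_prev], axis=0)
    updated = sigma_c(conv1x1(w_c, stacked) + b_c[:, None, None])
    # Eq. (1): per-pixel convex blend of the previous output and the candidate.
    return (1.0 - update_gate) * reg_prev + update_gate * updated
```

Sweeping the depth planes then amounts to calling `gru_update` once per plane and carrying its return value forward as `reg_prev`, so only the current cost map and one regularized map need to be held in memory at a time.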
