Unsupervised Deep Image Stitching: A Parallax-Tolerant Method

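PDF (9.18 MB), updated 2024-06-15

"Parallax-Tolerant Unsupervised Deep Image Stitching" is an ICCV 2023 paper by Lang Nie, Chunyu Lin, Kang Liao, Shuaicheng Liu, and Yao Zhao. The authors are affiliated with Beijing Jiaotong University, the Beijing Key Laboratory of Advanced Information Science and Network Technology, and the University of Electronic Science and Technology of China. The paper addresses parallax tolerance in unsupervised deep image stitching.

Traditional image stitching methods typically rely on complex hand-crafted geometric features (points, lines, edges, etc.) to improve performance. These features are limited in large-parallax or low-texture scenes. For example, traditional methods such as LPC can fail in low-texture scenes that lack sufficient geometric features (Fig. 1(b)). Deep methods avoid hand-crafted features but have their own weakness. The deep method UDIS (Unsupervised Deep Image Stitching) handles large parallax by blurring the parallax regions, which loses image detail (Fig. 1(a)). The paper proposes a method that overcomes both limitations and produces satisfactory stitching results in challenging large-parallax and low-texture scenes.

The method achieves parallax tolerance through a two-stage design trained without supervision. The warp stage combines a global homography with a flexible thin-plate spline (TPS) warp, aligning the images while preserving shape. The composition stage generates composition masks that blend the warped images seamlessly. Together, the two stages avoid both the blurring of UDIS and the failure cases of feature-based methods. The method is evaluated on the UDIS-D dataset, introduced in the authors' earlier UDIS work, which contains large-parallax and low-texture images. The experiments show that, compared with existing methods, the approach better preserves image detail and handles complex scenes, producing higher-quality stitched images. This opens new technical possibilities for applications such as drone aerial photography, panoramic photography, and virtual reality.

The paper's contributions are:
1. A parallax-tolerant unsupervised deep image stitching method that addresses the shortcomings of traditional methods in large-parallax and low-texture scenes.
2. A model that learns to adapt to different parallax conditions without relying on hand-crafted geometric features.
3. An evaluation on the UDIS-D dataset, a resource for research on large-parallax image stitching.
4. Experimental results demonstrating the method's effectiveness and broad application potential, especially under challenging image conditions.

The work has both theoretical and practical value for advancing image processing, particularly the use of unsupervised learning for image stitching.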

[Figure 2 diagram]
- Warp stage: a pretrained ResNet50 extracts features. A contextual correlation layer at 1/16 resolution feeds Regression Net 1, whose 4-pt motion is converted by DLT into the homography H ("Warp via H") and the initial control-point motion. The warped 1/8-resolution features then pass through a second contextual correlation layer and Regression Net 2, which outputs the residual control-point motion. TPS solving (Eq. 4) gives the TPS warp ("Warp via TPS").
- Composition stage: an encoder-decoder predicts weights for composing the warped images into the stitched image (S).

Figure 2: An overview of the proposed parallax-tolerant unsupervised stitching network. Our framework consists of two stages: warp and composition. The first stage predicts a robust and flexible warp to align images with shape preservation. The second stage composites the seamless stitched image by generating composition masks corresponding to warped images.

3.1.2 Pipeline of Warp

As shown in Fig. 2, given $I_r$ and $I_t$, we first adopt ResNet50 [17] with pretrained parameters as our backbone to extract semantic features. It maps a 3-channel image to high-dimensional semantic features at 1/16 of the original resolution. The contextual correlation layer [43] then aggregates the correlation between these feature maps ($F_r^{1/16}$ and $F_t^{1/16}$) into 2-channel feature flows. Subsequently, a regression network estimates the 4-pt parameterization of the homography warp. This global warp also generates the initial motions of the control points.
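The DLT box in Fig. 2 converts the 4-pt parameterization into a homography. The minimal pure-Python sketch below illustrates that step. It assumes the regression network outputs one (dx, dy) offset per image corner and fixes the bottom-right entry of H to 1. The names `homography_from_4pt` and `apply_h` are hypothetical, not taken from the authors' code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4pt(corners: Sequence[Point], motion: Sequence[Point]) -> Matrix:
    """DLT with h33 = 1: each corner (x, y) must map to (x + dx, y + dy)."""
    a: Matrix = []
    b: List[float] = []
    for (x, y), (dx, dy) in zip(corners, motion):
        u, v = x + dx, y + dy
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(hm: Matrix, p: Point) -> Point:
    """Project a point through the homography (homogeneous divide)."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)
```

For example, applying the same offset (10, -5) to all four corners of a 512×512 image recovers a pure translation. In that case `apply_h` moves (100, 100) to (110, 95) up to floating-point rounding.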

Next, we warp the higher-resolution (1/8) feature maps with the estimated homography to embed the homographic prior into the subsequent workflow. Another contextual correlation layer and regression network then predict the residual motions of the control points. Added to the initial motions, these residuals yield a robust and flexible TPS warp.
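Eq. (4), which defines the TPS solve, lies outside this excerpt. The sketch below therefore substitutes the textbook thin-plate spline, with kernel $U(r)=r^2\log r^2$ plus an affine part. It illustrates the data flow only. Each mesh vertex $p$ starts at its homography-predicted position $H(p)$, which gives the initial motion. A residual from the second regression network is then added, and a TPS is fitted through the moved control points. The grid size, the helper names (`grid_points`, `fit_tps`, `tps_from_motions`), and the forward source-to-destination direction are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def apply_h(hm: Matrix, p: Point) -> Point:
    """Project a point through a 3x3 homography."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


def grid_points(width: float, height: float, rows: int, cols: int) -> List[Point]:
    """(rows + 1) x (cols + 1) uniformly spaced control points, row-major."""
    return [(width * c / cols, height * r / rows)
            for r in range(rows + 1) for c in range(cols + 1)]


def tps_kernel(r2: float) -> float:
    """U(r) = r^2 log r^2, with U(0) = 0."""
    return r2 * math.log(r2) if r2 > 0 else 0.0


def fit_tps(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Solve [[K, P], [P^T, 0]] [w; a] = [dst; 0] once per output coordinate."""
    n = len(src)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        a[i][n:n + 3] = [1.0, xi, yi]
        a[n][i], a[n + 1][i], a[n + 2][i] = 1.0, xi, yi
    return [solve_linear(a, [p[k] for p in dst] + [0.0] * 3) for k in range(2)]


def eval_tps(src: Sequence[Point], coeffs: List[List[float]], p: Point) -> Point:
    """Evaluate the fitted spline (affine part + radial terms) at p."""
    n, (x, y) = len(src), p
    out = []
    for c in coeffs:
        val = c[n] + c[n + 1] * x + c[n + 2] * y
        val += sum(c[i] * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
                   for i, (xi, yi) in enumerate(src))
        out.append(val)
    return out[0], out[1]


def tps_from_motions(hm: Matrix, residual: Sequence[Point],
                     src: Sequence[Point]) -> List[List[float]]:
    """Destination = homography-warped vertex (initial motion) + residual motion."""
    dst = [(hx + rx, hy + ry)
           for (hx, hy), (rx, ry) in zip((apply_h(hm, p) for p in src), residual)]
    return fit_tps(src, dst)
```

With zero residuals the spline reduces to the affine part, so a translation-only H is reproduced everywhere. A non-zero residual bends the warp locally while still passing exactly through every moved control point.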

3.1.3 Optimization of Warp

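To achieve content alignment and shape preservation simultaneously, we design our objective function $\mathcal{L}_{warp}$ with two terms, alignment and distortion:

$$\mathcal{L}_{warp} = \mathcal{L}_{alignment} + \omega \mathcal{L}_{distortion}, \tag{5}$$

where $\omega$ weights the distortion term.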

For the alignment, we encourage the overlapping regions to stay consistent at the pixel level. Let $\varphi(\cdot,\cdot)$ denote the warping operation and $\mathbf{1}$ an all-one matrix with the same resolution as the input images. The alignment loss is defined as follows:

$$\mathcal{L}_{alignment} = \lambda \left\| I_r \cdot \varphi(\mathbf{1}, H) - \varphi(I_t, H) \right\| + \lambda \left\| I_t \cdot \varphi(\mathbf{1}, H^{-1}) - \varphi(I_r, H^{-1}) \right\| + \left\| I_r \cdot \varphi(\mathbf{1}, TPS) - \varphi(I_t, TPS) \right\|, \tag{6}$$

where $H$ and $TPS$ are the warp parameters, and $\lambda$ is a hyperparameter that balances the impacts of the different transformations.
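A small pure-Python sketch of Eq. (6) on toy grayscale images makes the masking concrete. Warping the all-one matrix yields a validity mask, so reference pixels with no warped counterpart contribute nothing. The sketch makes several assumptions:
- $\varphi$ is nearest-neighbour backward warping.
- The norm is a mean absolute (ℓ1-style) difference.
- Plain callables stand in for the H, $H^{-1}$ and TPS warps.
- The weight `lam` is a hypothetical placeholder, since λ's value is not given in this excerpt.

The last helper simply mirrors Eq. (5).

```python
from typing import Callable, List

Image = List[List[float]]
Matrix = List[List[float]]
WarpFn = Callable[[Image], Image]


def invert_3x3(m: Matrix) -> Matrix:
    """Inverse via the adjugate; assumes m is non-singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def warp(img: Image, hm: Matrix) -> Image:
    """phi(img, H): backward nearest-neighbour warp; out-of-range pixels become 0."""
    inv = invert_3x3(hm)
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            w = inv[2][0] * x + inv[2][1] * y + inv[2][2]
            sx = round((inv[0][0] * x + inv[0][1] * y + inv[0][2]) / w)
            sy = round((inv[1][0] * x + inv[1][1] * y + inv[1][2]) / w)
            if 0 <= sx < cols and 0 <= sy < rows:
                out[y][x] = img[sy][sx]
    return out


def masked_term(ref: Image, target: Image, phi: WarpFn) -> float:
    """Mean of |ref * phi(1) - phi(target)|: one bracket of Eq. (6)."""
    mask = phi([[1.0] * len(row) for row in ref])
    moved = phi(target)
    diffs = [abs(r * m - t)
             for rr, mr, tr in zip(ref, mask, moved)
             for r, m, t in zip(rr, mr, tr)]
    return sum(diffs) / len(diffs)


def alignment_loss(i_r: Image, i_t: Image, phi_h: WarpFn, phi_h_inv: WarpFn,
                   phi_tps: WarpFn, lam: float) -> float:
    """Eq. (6): two lambda-weighted homography terms plus the TPS term."""
    return (lam * masked_term(i_r, i_t, phi_h)
            + lam * masked_term(i_t, i_r, phi_h_inv)
            + masked_term(i_r, i_t, phi_tps))


def warp_objective(l_alignment: float, l_distortion: float, omega: float) -> float:
    """Eq. (5): total warp loss."""
    return l_alignment + omega * l_distortion
```

If the target is the reference shifted right by one pixel, warping it with a one-pixel leftward translation makes every masked term zero. The pixels pushed outside the frame are excluded by the warped all-one mask rather than counted as errors.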

For the distortion, we link adjacent control points in the warped target image to form a mesh and introduce two constraints: an inter-grid constraint $\ell_{inter}$ and an intra-grid constraint $\ell_{intra}$. The former preserves geometric structures in non-overlapping regions, while the latter reduces projective distortions. Initially, we approximated a similarity transformation by DLT for every grid in non-overlapping regions and took the 4-pt projective error as the loss. However, this constraint, although commonly used in traditional methods [16, 37], does not work in deep learning schemes. Instead, we re-explore the constraints from a more intuitive perspective: the grid edge.

Similar to [42], we penalize grid edges $e$ whose magnitude exceeds a threshold. Let $\{e_{hor}\}$ and $\{e_{ver}\}$ be the collections of horizontal and vertical edges. The intra-grid constraint is:

$$\ell_{intra} = \frac{1}{(U+1)\times V}\sum_{e\in\{e_{hor}\}} \sigma\big(\langle e, \vec{i}\rangle - \tau_{hor}\big) + \frac{1}{U\times(V+1)}\sum_{e\in\{e_{ver}\}} \sigma\big(\langle e, \vec{j}\rangle - \tau_{ver}\big), \tag{7}$$

where $\vec{i}$ / $\vec{j}$ is the horizontal/vertical unit vector, $\tau_{hor}$ / $\tau_{ver}$ is the corresponding edge-length threshold, and $\sigma(\cdot)$ is the ReLU function. Preventing the grid shape from dramatic scaling in this way reduces projective distortions.
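Eq. (7) can be sketched on a mesh stored as $(U+1)$ rows of $(V+1)$ warped vertices. With that layout, the horizontal and vertical edge counts match the normalizers in the equation. The sketch is in plain Python. The threshold values are not recoverable from this excerpt, so `tau_hor` and `tau_ver` are hypothetical inputs rather than the paper's settings. The mesh layout is also an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Mesh = List[List[Point]]  # (U + 1) rows x (V + 1) columns of warped (x, y) vertices


def relu(v: float) -> float:
    """sigma(.) in Eq. (7)."""
    return max(v, 0.0)


def intra_grid_loss(mesh: Mesh, tau_hor: float, tau_ver: float) -> float:
    """Eq. (7): penalise horizontal/vertical edge components longer than a threshold."""
    rows, cols = len(mesh), len(mesh[0])
    # <e, i>: x-component of each horizontal edge; (U + 1) * V of them
    hor = [relu((mesh[r][c + 1][0] - mesh[r][c][0]) - tau_hor)
           for r in range(rows) for c in range(cols - 1)]
    # <e, j>: y-component of each vertical edge; U * (V + 1) of them
    ver = [relu((mesh[r + 1][c][1] - mesh[r][c][1]) - tau_ver)
           for r in range(rows - 1) for c in range(cols)]
    return sum(hor) / len(hor) + sum(ver) / len(ver)
```

On a regular 2×2-cell mesh with 32-pixel edges and thresholds of 40, the loss is zero. Stretching a single horizontal edge to 52 pixels costs (52 - 40) / 6 = 2.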

We encourage each edge pair (two successive edges in the horizontal or vertical direction, denoted $e_{s1}, e_{s2}$) to be collinear. This gives the inter-grid constraint:

$$\ell_{inter} = \frac{1}{Q}\sum_{\{e_{s1}, e_{s2}\}} S_{s1,s2}\cdot\left(1 - \frac{\langle e_{s1}, e_{s2}\rangle}{\|e_{s1}\|\cdot\|e_{s2}\|}\right), \tag{8}$$

where $Q$ is the number of edge pairs and $S_{s1,s2}$ is a 0-1 label that is set to 1 if the edge pair lies in non-overlapping regions. We preserve structures only in non-overlapping regions to avoid adverse effects on the alignment.
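Eq. (8) can likewise be sketched in plain Python. Each edge pair is formed from three consecutive vertices along a row or a column, so $Q$ counts pairs in both directions. This excerpt does not specify how the 0-1 label $S_{s1,s2}$ is derived. Here it is assumed to be 1 when all three vertices are flagged as non-overlapping in a hypothetical boolean mask `non_overlap`. A small `eps` (also an assumption) guards against zero-length edges.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Mesh = List[List[Point]]


def inter_grid_loss(mesh: Mesh, non_overlap: List[List[bool]],
                    eps: float = 1e-8) -> float:
    """Eq. (8): mean of S * (1 - cosine) over successive edge pairs."""
    rows, cols = len(mesh), len(mesh[0])
    triples = [((r, c), (r, c + 1), (r, c + 2))   # horizontal edge pairs
               for r in range(rows) for c in range(cols - 2)]
    triples += [((r, c), (r + 1, c), (r + 2, c))  # vertical edge pairs
                for r in range(rows - 2) for c in range(cols)]
    total = 0.0
    for idx in triples:
        # hypothetical labelling rule: pair counts only if all three vertices are non-overlapping
        s = 1.0 if all(non_overlap[r][c] for r, c in idx) else 0.0
        (ax, ay), (bx, by), (cx, cy) = (mesh[r][c] for r, c in idx)
        e1, e2 = (bx - ax, by - ay), (cx - bx, cy - by)
        cos = (e1[0] * e2[0] + e1[1] * e2[1]) / (math.hypot(*e1) * math.hypot(*e2) + eps)
        total += s * (1.0 - cos)
    return total / len(triples)  # divide by Q
```

A regular mesh gives a loss of (numerically) zero because every edge pair is collinear. Bending one row by 45 degrees adds (1 - cos 45°) / Q, and masking that vertex as overlapping removes the penalty again.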
