preserving warping (CPW) [31] to align overlapping regions for
small local adjustment while using the homography to maintain
the global image structure. Instead of aligning the pixels of the
overlapping area, Lin et al. [28] proposed to find a local region in
which to stitch the images, which preserves curves and lines during
stitching.
Although traditional image stitching methods have achieved
promising performance, they cannot handle low-texture scenarios.
2.2. Deep image stitching
Deep image stitching is still in development, since labeled data are
hard to collect. In [37,50], synthetic datasets are proposed to address
this problem. In addition, a content revision network is proposed in [37]
to generate the stitched image after image registration.
However, the performance of these methods on real-world datasets is
unreliable, and the resolution of the network input is limited.
2.3. Deep homography schemes
Homography estimation is an important part of image stitching,
and deep homography can also be regarded as a significant step toward
deep image stitching. The deep homography solution was first proposed
in [8], where a synthetic dataset and a VGG-style network were put
forward together. Then, Nguyen et al. [36] proposed an unsupervised
version of [8], in which a photometric loss is adopted to measure the
pixel error between warped images. Le et al. [22] and Zhang et al. [48]
proposed content-aware networks to reject parallax regions and dynamic
areas. Deep Lucas-Kanade networks [3,51] were also presented to align
a template image with a source image. Besides, Koguciuk et al. [20]
proposed to increase robustness using a perceptual loss, and Ye et al.
[45] replaced homography offsets with motion bases to enhance
estimation performance.
Nevertheless, in scenes with low overlap rates, the performance of
these solutions drops because of the limited receptive fields of
convolutional layers.
3. Our method
In this section, we discuss our multi-scale deep homography
module, edge-preserved deformation module, and size-free
schemes, respectively.
3.1. Multi-scale deep homography
Although deep homography methods [8,36,48,22,3] have outperformed
traditional solutions in scenes with high overlap rates, deep
homography estimation in scenes with low overlap rates is still
challenging due to the limited receptive fields of neural networks. To
overcome this challenge, the proposed multi-scale deep homography
network integrates a feature pyramid and feature correlation into one
network, which increase the utilization of feature maps and expand the
receptive field, respectively. The architecture of the proposed
multi-scale deep homography network is shown in Fig. 2.
Feature Pyramid. After the images are fed into our network, they
are processed by 8 convolutional layers, where the number of filters
per layer is set to 64, 64, 128, 128, 256, 256, 512, and 512,
respectively. A max-pooling layer is adopted after every two
convolutional layers, producing multi-scale features $F$, $F_{1/2}$,
$F_{1/4}$, and $F_{1/8}$. As shown in Fig. 2, we select $F_{1/2}$,
$F_{1/4}$, and $F_{1/8}$ to form a three-layer feature pyramid. The
features of each layer in the pyramid are used to estimate the
homography, and we transmit the estimated homography of the upper
layer to the lower layer to enhance the prediction accuracy
progressively. Thus, among the features of the four scales, the
features of three scales are used for subsequent homography
regression, significantly improving feature utilization.
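For concreteness, the pooling hierarchy described above can be sketched as follows. This is a minimal NumPy illustration: the convolutional layers between poolings are omitted, and the input shape is hypothetical.

```python
import numpy as np

def max_pool_2x2(feat):
    """2x2 max-pooling with stride 2 on a (H, W, C) feature map."""
    h, w, c = feat.shape
    feat = feat[: h - h % 2, : w - w % 2]          # crop odd borders
    blocks = feat.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

def build_pyramid(feat):
    """Return features at scales 1, 1/2, 1/4, and 1/8."""
    scales = [feat]
    for _ in range(3):
        scales.append(max_pool_2x2(scales[-1]))
    return scales  # [F, F_1/2, F_1/4, F_1/8]

F = np.random.rand(128, 128, 64)
F1, F2, F4, F8 = build_pyramid(F)
print(F2.shape, F4.shape, F8.shape)  # (64, 64, 64) (32, 32, 64) (16, 16, 64)
```

Only the three coarser scales, $F_{1/2}$, $F_{1/4}$, and $F_{1/8}$, feed the pyramid in Fig. 2.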
Feature Correlation. To increase the receptive field of our network,
the feature correlation layer [38,14,39,18] is used here to strengthen
feature matching explicitly. Formally, the correlation $c$ between the
reference feature $F_A^l \in \mathbb{R}^{W^l \times H^l \times C^l}$
and the target feature
$F_B^l \in \mathbb{R}^{W^l \times H^l \times C^l}$ can be calculated
as,
$$c\left(x_A^l, x_B^l\right) = \frac{\left\langle F_A^l\left(x_A^l\right), F_B^l\left(x_B^l\right)\right\rangle}{\left|F_A^l\left(x_A^l\right)\right|\left|F_B^l\left(x_B^l\right)\right|}, \quad x_A^l, x_B^l \in \mathbb{Z}^2, \tag{1}$$
where $x_A^l$ and $x_B^l$ are 2-D spatial locations in $F_A^l$ and $F_B^l$, respectively.
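Eq. (1) is simply a cosine similarity between two feature vectors. A minimal NumPy sketch (the map shapes and the small eps stabilizer are our own illustrative choices):

```python
import numpy as np

def correlation(feat_a, feat_b, x_a, x_b, eps=1e-8):
    """Normalized correlation between (W, H, C) feature maps feat_a and
    feat_b at integer locations x_a = (i, j) and x_b = (k, l)."""
    va = feat_a[x_a]           # C-dimensional feature vector at x_A
    vb = feat_b[x_b]           # C-dimensional feature vector at x_B
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + eps))

fa = np.random.rand(8, 8, 16)
fb = np.random.rand(8, 8, 16)
print(correlation(fa, fa, (2, 3), (2, 3)))  # ~1.0: identical feature vectors
```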
Specifying the search radius along the width (or height) axis as $R_w$
(or $R_h$), we obtain
$c \in \mathbb{R}^{W^l \times H^l \times (2R_w+1)(2R_h+1)}$ [43] by
Eq. (1). Specifically, we calculate the global correlation by setting
$R_w$ (or $R_h$) equal to $W^l$ (or $H^l$), and we calculate the local
correlation when $R_w$ (or $R_h$) is less than $W^l$ (or $H^l$). By
applying global correlation and local correlation in our network, we
predict the homography progressively from global to local.
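The local correlation volume can be sketched as follows, assuming NumPy and a single shared radius $R = R_w = R_h$; the looping implementation is illustrative, not the paper's actual layer:

```python
import numpy as np

def local_correlation(feat_a, feat_b, radius):
    """Correlate every location of feat_a with its (2R+1)^2 neighbors in
    feat_b, returning a (W, H, (2R+1)^2) cost volume of Eq. (1) values."""
    w, h, c = feat_a.shape
    # normalize feature vectors once so each dot product is a cosine
    na = feat_a / (np.linalg.norm(feat_a, axis=-1, keepdims=True) + 1e-8)
    nb = feat_b / (np.linalg.norm(feat_b, axis=-1, keepdims=True) + 1e-8)
    pad = np.pad(nb, ((radius, radius), (radius, radius), (0, 0)))
    out = np.empty((w, h, (2 * radius + 1) ** 2))
    k = 0
    for dx in range(2 * radius + 1):
        for dy in range(2 * radius + 1):
            shifted = pad[dx:dx + w, dy:dy + h]
            out[..., k] = (na * shifted).sum(axis=-1)
            k += 1
    return out

fa = np.random.rand(8, 8, 16)
fb = np.random.rand(8, 8, 16)
vol = local_correlation(fa, fb, radius=2)
print(vol.shape)  # (8, 8, 25), i.e. (W, H, (2R+1)^2)
```

Setting the radius to the full map size recovers the global variant described above.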
After extracting pyramid features and calculating feature
correlations, we adopt a simple regression network comprising three
convolutional layers and two fully connected layers to predict the
eight vertex offsets (the x and y displacements of the four corners)
of the target image, which uniquely determine a homography. To be more
specific, every layer of our three-layer pyramid predicts residual
offsets $\Delta_i, i = 1, 2, 3$. Every feature correlation in the
pyramid is calculated between the warped target feature and the
reference feature, rather than between the original target feature and
the reference feature. In this way, each layer in the pyramid only
learns to predict the residual homography offsets instead of the
complete offsets, and $\Delta_i$ can be calculated as follows:
$$\Delta_i = \mathcal{H}_{4pt}\left(F_A^{1/2^{4-i}},\ \mathcal{W}\left(F_B^{1/2^{4-i}},\ DLT\left(\sum_{n=0}^{i-1}\Delta_n\right)\right)\right), \tag{2}$$
where $\mathcal{H}_{4pt}$ is the operation of estimating the residual
offsets from the reference feature map and the warped target feature
map, $\mathcal{W}$ warps the target feature map using the homography,
and $DLT$ converts the offsets to the corresponding homography. We
specify $\Delta_0 = 0$, which means the initial predicted offsets are
all zero. The final predicted offsets can be calculated as follows:
$$\Delta_{wh} = \Delta_1 + \Delta_2 + \Delta_3. \tag{3}$$
After that, image registration can be implemented by solving
the homography and warping the input images.
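The $DLT$ step that converts vertex offsets into a homography can be sketched as follows; this is a standard 4-point Direct Linear Transform in NumPy, not necessarily the exact solver used in the paper:

```python
import numpy as np

def dlt_from_offsets(corners, offsets):
    """corners: (4, 2) source vertices; offsets: (4, 2) predicted shifts.
    Solves the 8x8 DLT linear system with h33 fixed to 1."""
    dst = corners + offsets
    A, b = [], []
    for (x, y), (u, v) in zip(corners, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

corners = np.array([[0, 0], [128, 0], [0, 128], [128, 128]], float)
H = dlt_from_offsets(corners, np.zeros((4, 2)))
print(np.round(H, 6))  # identity: zero offsets leave the corners in place
```

Warping the input image with the resulting $3 \times 3$ matrix then completes the registration.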
Objective Function: Our multi-scale deep homography network is trained
in a supervised manner. Given the ground-truth offsets
$\hat{\Delta}_{wh}$, we design the following objective function:
$$\mathcal{L}_H = w_1\left\|\hat{\Delta}_{wh} - \Delta_1\right\| + w_2\left\|\hat{\Delta}_{wh} - \Delta_1 - \Delta_2\right\| + w_3\left\|\hat{\Delta}_{wh} - \Delta_1 - \Delta_2 - \Delta_3\right\|, \tag{4}$$
where $w_1$, $w_2$, and $w_3$ represent the weights of each layer in
the three-layer pyramid.
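Eq. (4) penalizes each pyramid level on its accumulated estimate, not on its individual residual. A minimal NumPy sketch; the Euclidean norm and the unit weights are illustrative assumptions, since both are hyperparameters here:

```python
import numpy as np

def homography_loss(gt, deltas, weights=(1.0, 1.0, 1.0)):
    """gt: (8,) ground-truth offsets; deltas: list of three (8,) residual
    predictions. Level i is penalized on the running sum Δ_1 + ... + Δ_i."""
    loss, acc = 0.0, np.zeros_like(gt)
    for w, d in zip(weights, deltas):
        acc = acc + d                       # accumulated offsets so far
        loss += w * np.linalg.norm(gt - acc)
    return loss

gt = np.ones(8)
deltas = [0.5 * np.ones(8), 0.3 * np.ones(8), 0.2 * np.ones(8)]
# the third term is ~0 here because the residuals sum exactly to gt
print(homography_loss(gt, deltas))
```

This progressive supervision encourages each level to correct whatever error remains after the levels above it.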
3.2. Edge-preserved deformation network
Stitching images with a global homography can easily produce
artifacts in scenes with parallax. To eliminate the ghosting effects,
L. Nie, C. Lin, K. Liao et al.
Neurocomputing 491 (2022) 533–543
535