6186 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 30, 2021
A. Feature-Based Image Stitching
According to their strategies for eliminating artifacts,
feature-based image stitching algorithms can be divided
into the following two categories:
1) Adaptive Warping Methods: Considering that a single
transformation model is not enough to accurately align images
with parallax, the idea of combining multiple parametric
alignment models to align the images as much as possible
is introduced. In [11], the dual-homography warping (DHW)
is presented to align the foreground and the background,
respectively. This method works well in the scene composed
of two p redominating planes but shows poor performance in
more complex scenes. Lin et al. [12] apply multiple smoothly
varying affine (SVA) transformations in different regions,
enhancing local deformation and alignment performance.
Zaragoza et al. [13] propose the as-projective-as-possible
(APAP) approach, which partitions an image into dense grids
and allocates each grid a corresponding homography by
weighting the features. In practice, APAP still exhibits
parallax artifacts in the vicinity of object boundaries,
where dramatic depth changes can occur. To address this
problem, warping residual vectors are proposed in [19] to
distinguish matching features from different depth planes,
contributing to more naturally stitched images.
2) Seam-Driven Methods: Seam-driven image stitching
methods are also influential, acquiring natural stitched images
by hiding the artifacts. Inspired by the idea of interactive
digital photomontage [39], Gao et al. [24] propose to choose
the best homography with the lowest seam-related cost from
candidate homography matrices. Then the artifacts are hidden
through seam cutting. Referring to the optimization strategy of
content-preserving warps (CPW) [40], Zhang and Liu [22] pro-
pose a seam-based local alignment approach while maintaining
the global image structure using an optimal homography. This
work was also extended to stereoscopic image stitching [41].
Using iterative warp and seam estimation, Lin et al. [23]
find the optimal local area for stitching, which preserves
curve and line structures during image stitching.
These feature-based algorithms contribute to perceptually
natural stitched results. However, they rely heavily on the
quality of feature detection, often failing in scenes with few
features or at low resolution.
B. Learning-Based Image Stitching
Obtaining a real dataset for stitching is difficult. In addition,
deep stitching is quite challenging for scenes with a low
overlap rate and large parallax. Hampered by these two
problems, learning-based image stitching is still in development.
1) View-Fixed Methods: View-fixed image stitching methods
are task-driven, designed for specific application scenarios
such as autonomous driving [6], [7] and surveillance
video [4]. In these works, end-to-end networks are proposed
to stitch images from fixed views, but they cannot be
extended to stitch images from arbitrary views.
2) View-Free Methods: To stitch images from arbitrary
views using CNNs, some researchers propose to adopt CNNs
in the stage of feature detection [32], [33]. However, these
methods cannot strictly be regarded as complete learning-based
frameworks. The first complete learning-based framework
to stitch images from arbitrary views was proposed
in [35]. The images can be stitched through three stages:
homography estimation, spatial transformation, and content
refinement. Nevertheless, this work cannot handle input
images with arbitrary resolutions due to the fully connected
layers in the network, and the stitching quality in real
applications is unsatisfactory. Following this deep stitching pipeline,
an edge-preserved deep image stitching solution was proposed
in [36], removing the limitation on input resolution and
significantly improving stitching performance in real scenes.
C. Deep Homography Schemes
The first deep homography method was put forward in [42],
where a VGG-style [27] network was used to predict the eight
offsets of the four vertices of an image, thus uniquely
determining a corresponding homography. Nguyen et al. [37] proposed
the first unsupervised deep homography approach with the
same architecture as [42], together with an effective unsupervised loss.
Introducing spatial attention into the deep homography network,
Zhang et al. [38] propose a content-aware unsupervised
network, contributing to SOTA performance in small-baseline
deep homography. In [43], multi-scale features are extracted
to predict the homography from coarse to fine using image
pyramids.
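To make the 4-point parameterization concrete: once the network predicts the eight corner offsets, the implied 3×3 homography follows from solving the standard 8×8 direct linear transform (DLT) system. The sketch below is our own illustrative NumPy formulation, not code from [42]; the function and variable names are assumptions:

```python
import numpy as np

def homography_from_offsets(corners, offsets):
    """Recover the 3x3 homography that maps the four image corners to
    corners + offsets (the 4-point parameterization). Solves the 8x8
    DLT system under the normalization h_33 = 1."""
    src = np.asarray(corners, dtype=float)        # (4, 2) corner coordinates
    dst = src + np.asarray(offsets, dtype=float)  # (4, 2) displaced corners
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h7*x + h8*y + 1) = h1*x + h2*y + h3, and similarly for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A), np.asarray(b))
    return np.append(h, 1.0).reshape(3, 3)
```

With all-zero offsets this recovers the identity homography, and a common offset shared by the four corners yields a pure translation, which is why the eight offsets uniquely determine the warp.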
Besides that, the deep homography network is usually
adopted as a part of the view-free image stitching frameworks
[35], [36]. Different from [37], [38], [42], [43], deep
homography in image stitching is more challenging, since the
baseline between input images is usually 2×∼3× larger.
III. UNSUPERVISED COARSE IMAGE ALIGNMENT
Given two high-resolution input images, we first estimate
the homography using a deep homography network in an
unsupervised manner. Then the input images can be warped
to align each other coarsely in the proposed stitching-domain
transformer layer.
A. Unsupervised Homography
The existing unsupervised deep homography methods [37],
[38] take image patches as the input, shown as the white
squares in Fig. 3(a). The objective function of these
methods can be expressed as Eq. (1):
L_PW = ||P(I_A) − P(H(I_B))||_1, (1)
where I_A, I_B represent the full images of the reference image
and the target image, respectively. P(·) is the operation of
extracting an image patch from a full image, and H(·) warps
one image to align with the other using the estimated homography.
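Eq. (1) can be sketched numerically as follows. This is a minimal NumPy illustration rather than the networks' actual differentiable implementation: the warp uses nearest-neighbor sampling, invalid pixels are filled with zeros, and the patch location arguments are hypothetical parameters of our own:

```python
import numpy as np

def warp_homography(img, H):
    """H(.): warp img by homography H via inverse mapping with
    nearest-neighbor sampling; out-of-bounds pixels become 0 (invalid)."""
    h, w = img.shape[:2]
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = Hinv @ pts                       # back-project output coordinates
    src = src[:2] / src[2]
    sx = np.rint(src[0]).astype(int).reshape(h, w)
    sy = np.rint(src[1]).astype(int).reshape(h, w)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

def patch(img, top, left, size):
    """P(.): extract a size x size patch from a full image."""
    return img[top:top + size, left:left + size]

def loss_pw(I_A, I_B, H, top, left, size):
    """L_PW = || P(I_A) - P(H(I_B)) ||_1, as in Eq. (1)."""
    return np.abs(patch(I_A, top, left, size)
                  - patch(warp_homography(I_B, H), top, left, size)).sum()
```

For identical images under an identity homography the loss is exactly zero; as the estimated homography deviates, content surrounding the target patch is pulled into the warped patch, which is the padding behavior analyzed below.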
From Eq. (1), we can see that to make the warped target
patch close to the reference patch, the extra content around
the target patch is utilized to pad the invalid pixels in the
warped target patch. We call this a padding-based constraint
strategy. This strategy works well in small-baseline [38],