Fast Implementation of Image Mosaicing on GPU
Yixiang Lu¹, Qingwei Gao¹,∗, Shuai Chen¹, Dong Sun¹, Yi Xia¹, Xueming Peng¹,²
¹ School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
² Shanghai Huawei Technology Co., Ltd, Shanghai 200120, China
Abstract—Image mosaicing has been studied and widely used
in many fields of computer science, but the steps of feature
matching, warping and blending involve a large amount of
computation, so it cannot meet the real-time demands of some
applications. Fortunately, related parallel operations that can
speed up the mosaicing process have been developed and
implemented on the Graphics Processor Unit (GPU). In this paper,
we present a parallel implementation of image mosaicing on the
GPU using the Compute Unified Device Architecture (CUDA). It
achieves shorter execution times than an implementation on the
central processing unit (CPU). With an integrated GPU GTX745
used in the experiment, we achieved a speedup of up to 27.6
times for large input images.
Index Terms—Image mosaicing; Matching; Parallel; Graphics
Processor Unit (GPU).
I. INTRODUCTION
Image mosaicing is an active area of research in the fields of
photogrammetry, computer vision, image processing and com-
puter graphics. It can be defined as a process of constructing
panoramic image mosaics from a sequence of partial images
obtained from different views [1]. Early applications of
image mosaicing focused mainly on the construction of large
aerial and satellite photographs from collections of images
[2]. Nowadays, a variety of new applications of mosaicing
have emerged, including scene stabilization and change
detection [3], increasing the field of view and resolution
[4], video compression [5], wide-area video surveillance [6],
the construction of virtual environments [7] and image-based
rendering [8]. A typical mosaicing process mainly consists of
three different steps of image processing, that is, registration,
warping and interpolation, and blending. Image registration is
the key task of image mosaicing [9]. Registration refers to the
establishment of a geometric transformation between a pair
of images depicting the same scene, and the transformation is
determined by a planar homography with 8 degrees of freedom.
If the homography contains errors, the images will be
misaligned, which makes the subsequent blending difficult.
To estimate the elements of the homography accurately,
we must search for the best correctly matched feature points,
which are used to estimate the homography. However, this
search is computationally very expensive,
especially for large images. Moreover, when
the mosaicing technique is applied to video processing (e.g. video
indexing and wide-area video surveillance), which involves a
very large number of images, the mosaicing speed is very
important in such practical applications.
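As a minimal sketch of what the 8 degrees of freedom mean in practice, the following Python fragment maps a pixel through a 3×3 planar homography. The function names `apply_homography` and `normalize_homography` and the example matrix are illustrative assumptions and are not taken from the implementation described in this paper. A homography has nine entries but is defined only up to a common scale factor, which leaves eight free parameters.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists.

    The point is lifted to homogeneous coordinates (x, y, 1), multiplied
    by H, and divided by the third component to return to the image plane.
    """
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point is mapped to infinity")
    return xh / w, yh / w


def normalize_homography(H):
    """Rescale H so that H[2][2] == 1 (assumes H[2][2] is nonzero).

    Scaling all nine entries by the same factor does not change the
    mapping, so fixing one entry leaves 8 degrees of freedom.
    """
    s = H[2][2]
    return [[v / s for v in row] for row in H]


# Hypothetical example: a pure translation by (5, -3).
H = [[1, 0, 5],
     [0, 1, -3],
     [0, 0, 1]]
H_scaled = [[2 * v for v in row] for row in H]  # same mapping, different scale

print(apply_homography(H, 2, 4))
print(apply_homography(H_scaled, 2, 4))
```

Both calls return the same point, which illustrates the scale ambiguity. Since each pixel is mapped independently of the others, a mapping of this kind is a natural candidate for per-pixel parallel execution during warping.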
In recent years, the Graphics Processor Unit (GPU) has
attracted researchers’ attention in many fields for its massive
parallel computational power. Using the GPU as a coprocessor
to accelerate algorithms with a heavy computational
burden has become common practice, and many
image processing algorithms have already been successfully
implemented on the GPU. For example, Luo and Duraiswami [10]
implemented a complete version (including all stages
of the algorithm) of the Canny edge detector under CUDA, and
achieved a speedup of more than 3 times over a straightforward
CPU implementation. Their work included the
hysteresis labeling connected component stage, which was not
included in previous GPU versions; this is the main reason
they could not achieve a larger speedup.
For image matching and mosaicing, many related applications
are also available on GPU. In [11], Schatz and Trapnell
implemented a string-matching program that runs on the GPU
and achieved a speedup of as much as 35x over the equivalent
CPU-bound version. They presented a string-matching kernel for
use with CUDA, which performs a parallelized search of a
suffix tree to find exact matches for a set of query strings.
M. Adam et al. [12] presented a novel approach to the local
alignment of images for a real-time video stitching application
on the GPU. To achieve a nearly double-sized panorama, they mainly
focused on stitching the margin regions of high definition
stereo images. To accelerate the assembly of large mosaics of
electron microscope images, K. U. Venkataraju [13] proposed
to use texture memory lookups to speed up access to
microscopy image tiles, together with data-parallel computing, which
leads to the root of complexity of the calculation. However,
using unsigned char as the image data type results in
slightly inaccurate pixel values in the mosaic.
Although the papers mentioned above achieved good results,
none of them considered two extremely
time-consuming steps, namely feature matching and random
sample consensus (RANSAC). As two key processes in image
registration, they should be considered in GPU-accelerated
parallel algorithms.
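To illustrate why feature matching is costly and why it suits parallel hardware, the following Python sketch performs brute-force nearest-neighbour matching between two sets of descriptors. It is an assumption-laden illustration only: the descriptor type, the squared Euclidean distance, the ratio test with threshold 0.8, and the names `squared_distance` and `match_features` are not specified in the text above and are chosen here for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def match_features(desc_a, desc_b, ratio=0.8):
    """Brute-force matching of descriptors in desc_a against desc_b.

    For each descriptor in desc_a, find its nearest and second-nearest
    neighbours in desc_b and accept the match only if the nearest one is
    clearly closer (ratio test on distances, applied here to squared
    distances). The cost is len(desc_a) * len(desc_b) distance evaluations.
    """
    matches = []
    # Each iteration of the outer loop is independent of the others,
    # so on a GPU one thread could handle one query descriptor.
    for i, da in enumerate(desc_a):
        best_j, best, second = -1, float("inf"), float("inf")
        for j, db in enumerate(desc_b):
            d = squared_distance(da, db)
            if d < best:
                best_j, second, best = j, best, d
            elif d < second:
                second = d
        if best_j >= 0 and best < (ratio ** 2) * second:
            matches.append((i, best_j))
    return matches


# Hypothetical 2-D descriptors; real descriptors are much longer.
desc_a = [[0, 0], [10, 10]]
desc_b = [[10, 11], [0, 1], [50, 50]]
print(match_features(desc_a, desc_b))
```

The number of distance evaluations grows with the product of the two descriptor counts, which is consistent with the observation that matching dominates the running time when many feature points are present. The accepted pairs would then be passed to RANSAC to estimate the homography, a step not shown here.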
In this paper, a parallel image mosaicing method implemented
on the GPU using the Compute Unified Device Architecture
(CUDA) programming model is presented. To reduce computation
time efficiently, this paper mainly focuses on the most
time-consuming part of mosaicing. In fact, for most precision
mosaicing, the execution time mainly depends on the number
of matched point pairs in the overlapping images, not on the
image size. Thus, our method starts with feature matching and
2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI 2017)
978-1-5386-1936-0/17/$31 ©2017 IEEE