Image Blending Techniques Based on GPU Acceleration
Jung Soo Kim
Department of Electronics and
Computer Engineering
Hanyang University
Seoul, Korea
+82 2-2220-4701
jungsookim4080@gmail.com
Min-Kyu Lee
Department of Electronics and
Computer Engineering
Hanyang University
Seoul, Korea
+82 2-2220-4701
hanlovelan@hanyang.ac.kr
Ki-Seok Chung*
Department of Electronics and
Computer Engineering
Hanyang University
Seoul, Korea
+82 2-2220-4701
kchung@hanyang.ac.kr
ABSTRACT
Today, image blending is widely used to produce high-resolution images
in medical, aerospace, and even defense applications. Blending images
requires several filters and processing steps, such as Gaussian
pyramid construction, Laplacian pyramid construction, and multi-band
computation. However, these computations involve a large number of
arithmetic operations. As the processing capability of graphics
processing units (GPUs) has grown rapidly, GPUs have
commonly been used to supplement central processing units
(CPUs) for high-performance computing, and a significant speedup can
be achieved by employing such hardware accelerators.
In this paper, we present a fast implementation of image
blending using the compute unified device architecture
(CUDA). The proposed implementation utilizes the shared memory of the
GPU better than conventional implementations, leading to a
better speedup: it improves the overall execution time by a factor of
3.9 compared to a conventional implementation.
CCS Concepts
• Computing methodologies➝Computer graphics➝Image
manipulation➝Image processing.
Keywords
Image blending; GPGPU; CUDA; padding; shared memory;
image pyramid; multi-resolution spline; multi-band blending.
1. INTRODUCTION
Image blending is a technique that combines several images into a
single output image. In the past, image blending was used only to
combine images. Since the introduction of multi-band
blending [1-3], also known as multi-resolution spline,
image blending has been used not only to blend images
but also to correct and calibrate images from different sensors so
that the images join naturally at the seams. For these reasons,
multi-band blending has often been adopted in a variety of computer
vision areas such as medical and aerospace applications.
Processors must provide high processing capability to satisfy the
growing need to process large amounts of data. For CPUs, processing
power is improved mainly by increasing either the clock frequency or
the number of cores. However, neither approach yields a
groundbreaking performance improvement. Thus, a new processor
architecture is necessary to achieve much better performance.
One widely adopted method to overcome this CPU limitation is to
employ a general-purpose graphics processing unit (GPGPU) [4]. GPUs
were originally developed mainly to accelerate graphics processing.
However, owing to their relatively regular hardware architecture, the
number of cores in a GPU has increased tremendously. When a GPU
executes a massively data-parallel application, thousands of threads
run in parallel, yielding an excellent speedup over the CPU for many
data-parallel workloads. Therefore, more and more application areas
find it advantageous to employ GPUs for performance improvement. The
compute unified device architecture (CUDA) is a parallel programming
framework and application programming interface (API) model provided
by NVIDIA; it allows software developers to use a CUDA-enabled GPU
for general-purpose processing [5].
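As a minimal illustration of this CUDA execution model, the following
kernel blends two grayscale images pixel by pixel. This is only a
sketch of how one thread maps to one output element; the kernel name,
the alpha-blend operation, and the 256-thread block size are our own
illustrative choices, not the implementation described in this paper.

```cuda
// Illustrative sketch only: a per-pixel alpha blend of two grayscale
// images, showing how CUDA maps one thread to one output element.
__global__ void alphaBlend(const float* a, const float* b,
                           float* out, float alpha, int n)
{
    // Each of the thousands of concurrent threads computes one pixel.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = alpha * a[i] + (1.0f - alpha) * b[i];
}

// Host side: launch enough 256-thread blocks to cover all n pixels.
// alphaBlend<<<(n + 255) / 256, 256>>>(d_a, d_b, d_out, 0.5f, n);
```

Because every thread performs the same independent arithmetic on its
own pixel, such kernels scale naturally with the number of GPU cores.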
The higher the required resolution, or the more images the blender
has to handle, the more critical the computational capability
becomes. In this paper, the CUDA framework is employed for image
blending to guarantee sufficient processing power.
To maximize the performance improvement obtained from the GPU, we
compare two implementations: one without shared memory and one with
shared memory [6], showing that utilizing shared memory is very
advantageous for performance. Without shared memory, each thread must
copy data directly from the global memory, implemented as Graphics
DDR SDRAM (GDDR), to the cores. With shared memory, data is copied
from the global memory to the shared memory once, so that all threads
in the same block can reuse the copied data. In this paper, we show
how shared memory should be utilized to reduce the memory loading
time. This is the main contribution of this paper.
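The shared-memory access pattern described above can be sketched as
follows for a one-dimensional 5-tap filter pass, the kind of
operation used when building a Gaussian pyramid. The kernel name,
tile size, and filter radius are assumptions made for this sketch,
not the paper's actual implementation: each block first stages its
tile of the input, plus a 2-pixel halo (padding), into on-chip shared
memory, and all subsequent reads hit shared memory instead of GDDR.

```cuda
// Illustrative sketch only (assumed names and sizes).
#define TILE   256   // threads per block
#define RADIUS 2     // halo width for a 5-tap filter

__global__ void filterRow(const float* in, float* out,
                          const float* w, int n)
{
    __shared__ float tile[TILE + 2 * RADIUS];

    int g = blockIdx.x * blockDim.x + threadIdx.x;   // global index
    int s = threadIdx.x + RADIUS;                    // shared index

    // Cooperative load: the body of the tile ...
    tile[s] = (g < n) ? in[g] : 0.0f;
    // ... and the halo cells at both ends of the block.
    if (threadIdx.x < RADIUS) {
        int l = g - RADIUS, r = g + TILE;
        tile[s - RADIUS] = (l >= 0) ? in[l] : 0.0f;
        tile[s + TILE]   = (r < n)  ? in[r] : 0.0f;
    }
    __syncthreads();   // all data staged before any thread reads it

    if (g < n) {
        float acc = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            acc += w[k + RADIUS] * tile[s + k];  // shared-memory reads
        out[g] = acc;
    }
}
```

Each input pixel is fetched from global memory once per block rather
than once per tap, which is the source of the reduced memory loading
time that this paper exploits.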
The rest of this paper is organized as follows. Section 2 introduces
previous studies on image blending, and Section 3 describes the CUDA
computing framework. Section 4 presents the proposed implementation
of image blending. Section 5 shows experimental results and evaluates
the proposed scheme in terms of execution time. Finally, Section 6
concludes the paper.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from
Permissions@acm.org.
ICIGP 2018, February 24–26, 2018, Hong Kong, Hong Kong
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6367-9/18/02…$15.00
https://doi.org/10.1145/3191442.3191471