Journal of Computational Science 17 (2016) 591–598
A parallel Non-Local means denoising algorithm implementation with OpenMP and OpenCL on Intel Xeon Phi Coprocessor
Huming Zhu∗, Yanfei Wu, Pei Li, Duo Wang, Wei Shi, Peng Zhang, Licheng Jiao
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Xidian University, Xi’an, Shaanxi Province 710071, China
Article info
Article history: Received 30 October 2015; Received in revised form 24 April 2016; Accepted 7 July 2016; Available online 11 July 2016
Keywords: Parallel algorithm; Non-Local means denoising; OpenMP; OpenCL; MIC
Abstract
The Non-Local means (NLM) denoising algorithm computes similarity weights between the pixel being denoised and the pixels in its search area by means of a similarity function. For denoising textured and edge regions, NLM performs better than many other existing denoising algorithms because it exploits the redundant information in images. However, NLM is slow because of its heavy computational load. The Intel Xeon Phi Coprocessor, based on the Intel Many Integrated Core (MIC) architecture, has recently shown considerable advantages in accelerating computation. We therefore design OpenMP and OpenCL parallelization strategies for the MIC architecture, starting from the serial NLM algorithm, and run experiments on CPU, GPU, and MIC with images of different sizes. The experiments suggest that the OpenMP-based NLM algorithm performs better on the Xeon Phi 7120 than on the Xeon E5 2692 when the image size is greater than or equal to 1024 × 1024, that the OpenCL-based NLM algorithm performs better on the Xeon Phi 7120 than on the NVIDIA Kepler K20M GPU, and that the OpenCL-based NLM algorithm performs slightly better than the OpenMP-based one when both are implemented on the Intel Xeon Phi 7120.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
Image denoising is the process of recovering the original image from a noisy one. In recent years, a variety of image denoising algorithms have been proposed, which are generally classified as spatial-domain or frequency-domain methods. Classical spatial-domain algorithms include isotropic linear filtering and median filtering, while mature frequency-domain algorithms include Wiener filtering and wavelet thresholding [1]. In 2005, Buades et al. proposed a representative filtering method called the Non-Local Means (NLM) denoising algorithm [2], which is an improvement on bilateral filtering. However, NLM is computationally demanding, as each noisy pixel is replaced by a weighted average of all the pixels in a large search window or in the whole image. Many variants have therefore arisen to reduce the computational time, either by exploiting the highly parallelizable nature of NLM or by reducing the computational time for a single pixel or location. In 2006, Wang et al. proposed a fast Non-Local Means denoising algorithm [3], in which the Fast Fourier Transform (FFT) is used to accelerate the weight calculation. In 2009, Karnati et al. proposed a modified multi-resolution pyramid architecture to accelerate the computation of window similarity [4]. In 2014, Bhujle et al. proposed a novel speed-up strategy that builds a dictionary to search for similar patches very quickly and reduce the computational cost [5].
∗ Corresponding author. E-mail addresses: zhuhum@mail.xidian.edu.cn (H. Zhu), wuyanfei@stu.xidian.edu.cn (Y. Wu), 1570558611@qq.com (P. Li), 770453932@qq.com (D. Wang), 1063929728@qq.com (W. Shi), 1943379881@qq.com (P. Zhang), jlcxidian@163.com (L. Jiao).
Given the computational complexity, none of the algorithms mentioned above can yet meet real-time requirements, so accelerating image denoising is of practical importance. Fortunately, the development of computing platforms makes it possible to address the problem. In 2013, Palma et al. proposed a fully 3D NLM denoising method on a multi-GPU architecture that met the requirement of acceptable performance for real-time scenarios [6].
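The weighted average that makes NLM expensive, and the per-pixel independence that makes it parallelizable, can be sketched in a few lines of C++ with an OpenMP pragma. This is only a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the function name `nlm_denoise`, the parameters `patch_r`, `search_r` and `h`, the unweighted mean squared patch distance, and the clamped border handling are all choices made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Mean squared difference between the patches centred at (y0, x0) and (y1, x1).
// Assumption: coordinates are clamped, so patches near the border reuse edge pixels.
static float patch_distance(const std::vector<float>& img, int width, int height,
                            int y0, int x0, int y1, int x1, int patch_r) {
    float sum = 0.0f;
    for (int dy = -patch_r; dy <= patch_r; ++dy) {
        for (int dx = -patch_r; dx <= patch_r; ++dx) {
            int ya = std::min(std::max(y0 + dy, 0), height - 1);
            int xa = std::min(std::max(x0 + dx, 0), width - 1);
            int yb = std::min(std::max(y1 + dy, 0), height - 1);
            int xb = std::min(std::max(x1 + dx, 0), width - 1);
            float diff = img[ya * width + xa] - img[yb * width + xb];
            sum += diff * diff;
        }
    }
    int side = 2 * patch_r + 1;
    return sum / static_cast<float>(side * side);
}

// Hypothetical NLM sketch: every output pixel is a weighted average of the
// pixels in its search window, with weight exp(-d^2 / h^2) for patch distance d^2.
std::vector<float> nlm_denoise(const std::vector<float>& in, int width, int height,
                               int patch_r, int search_r, float h) {
    assert(width > 0 && height > 0 && patch_r >= 0 && search_r >= 0 && h > 0.0f);
    assert(in.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<float> out(in.size());

    // Output pixels do not depend on each other, so rows are split across threads.
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float weight_sum = 0.0f;
            float value_sum = 0.0f;
            for (int sy = y - search_r; sy <= y + search_r; ++sy) {
                if (sy < 0 || sy >= height) continue;
                for (int sx = x - search_r; sx <= x + search_r; ++sx) {
                    if (sx < 0 || sx >= width) continue;
                    float d2 = patch_distance(in, width, height, y, x, sy, sx, patch_r);
                    float w = std::exp(-d2 / (h * h));
                    weight_sum += w;
                    value_sum += w * in[sy * width + sx];
                }
            }
            // The centre pixel always contributes weight 1, so weight_sum >= 1.
            out[y * width + x] = value_sum / weight_sum;
        }
    }
    return out;
}
```

For each pixel the sketch performs up to (2·search_r + 1)² patch comparisons of (2·patch_r + 1)² pixels each, which is the cost that motivates acceleration. If the compiler is not invoked with OpenMP support, the pragma is ignored and the loop simply runs serially.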
http://dx.doi.org/10.1016/j.jocs.2016.07.001
1877-7503/© 2016 Elsevier B.V. All rights reserved.
The Intel MIC architecture is specially designed for high performance computing (HPC) and is regarded as a next-generation platform. Its advantages are the simplicity of programming and the convenience of using existing tools [7]. Application source code can be modified more easily on MIC than on GPU because its framework is similar to that of a CPU. The Intel tools, including the compiler, profiling tool and debugging tool, can be used in