GPU programming using C++ AMP
Petrika Manika
Dept. of Informatics
University of Tirana
petrika.manika@fshn.edu.al
Elda Xhumari
Dept. of Informatics
University of Tirana
elda.xhumari@fshn.edu.al
Julian Fejzaj
Dept. of Informatics
University of Tirana
julian.fejzaj@fshn.edu.al
Abstract
Nowadays, a challenge for programmers is to
make their programs better, where "better"
means simpler, more portable and much faster
in execution. Heterogeneous computing is a
new methodology in the field of computer science.
GPGPU programming is a new and
challenging technique used for
solving problems of a data-parallel nature. In
this paper we describe this programming
methodology with a focus on GPU
programming using the C++ AMP language, and
what kinds of problems are suitable for
acceleration using these parallel techniques.
Finally, we describe the solution to a simple
problem using C++ AMP and the advantages
of this solution.
1. Introduction
Implementing an algorithm as a solution to a
difficult problem requires deep analysis. Today there
are many tools that facilitate this work for analysts,
as well as the translation into a programming language
for programmers, but difficulties always remain when
execution speed is important. When execution speed is
not the main requirement, programmers can find a
solution more easily and quickly by writing source
code whose instructions are executed serially. When
execution speed is the primary requirement of the
proposed algorithm, parallel programming becomes more
important. Besides parallel source code whose
instructions are executed in parallel on the CPU
(Central Processing Unit), a newer methodology is
GPGPU programming.
General-purpose computing on graphics processing
units (GPGPU, rarely GPGP or GP²U) is the use of a
graphics processing unit (GPU), which typically
handles computation only for computer graphics, to
perform computation in applications traditionally
handled by the central processing unit (CPU)¹. The
architecture of graphics processing units (GPUs) is
very well suited for data-parallel problems. They
support extremely high throughput through many
parallel processing units and very high memory
bandwidth. For problems that match the GPU
architecture well, it is common to achieve a 2×
speedup over a CPU implementation of the same
problem, and tuned implementations can outperform
the CPU by a factor of 10 to 100. Programming these
processors, however, remains a challenge because the
architecture differs so significantly from the CPU. This
paper describes the benefits of GPU programming
using the C++ AMP language, and what kinds of problems
are suitable for acceleration using these parallel
techniques.
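To make "data-parallel" concrete, the following is a minimal sketch in
standard C++ (not C++ AMP itself, which requires the amp.h header). The
names add_element and add_vectors are hypothetical and chosen only for
illustration. The point is that each output element depends only on the
input elements at the same index, so the loop iterations are independent
and could, in principle, each be handed to a separate GPU thread. In
C++ AMP, a per-element body of this shape would typically be passed to
concurrency::parallel_for_each.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element kernel: reads only index i of the inputs and
// writes only index i of the output, so no iteration depends on another.
inline void add_element(const std::vector<float>& a,
                        const std::vector<float>& b,
                        std::vector<float>& out,
                        std::size_t i) {
    out[i] = a[i] + b[i];
}

// Serial driver: applies the kernel to every index in turn. Because the
// iterations are independent, they could run in any order or all at once.
std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        add_element(a, b, out, i);
    }
    return out;
}
```

A problem where iteration i needs the result of iteration i-1 (for
example, a running total written in this naive form) does not have this
property and is a poorer fit for the GPU without restructuring.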
2. Performance Improvements
The term "personal computer" was introduced for the
first time in 1975. Over the decades, the idea of having
a personal computer became possible and real.
Nowadays nearly everyone owns various electronic
devices, from desktop computers and laptops to
smartphones. Over the years, technological evolution
has made these devices work much faster.
Manufacturers kept increasing the number of
transistors on a single chip, but this ran into the
problem of the heat produced by these chips. Because of
this problem, manufacturers started to produce multicore
machines with two or more CPU cores in a computer.
However, adding CPU cores did not make everything
faster.
We can divide software into two groups: parallel-aware
and parallel-unaware. Parallel-unaware
software uses only about 1/4 or 1/8 of the available CPU
cores, while parallel-aware software can reach an
execution speed 2× or 4× that of parallel-unaware
software, in proportion to the number of CPU cores.
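The difference between the two groups can be sketched with a simple
summation in standard C++. This is an illustrative sketch only: the
function names sum_serial and sum_parallel are hypothetical, and the
even chunking scheme is an assumption rather than a method taken from
this paper. The parallel-unaware version uses one thread regardless of
how many cores exist, while the parallel-aware version splits the data
into one contiguous chunk per worker thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Parallel-unaware: a single thread walks the whole range.
long long sum_serial(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Parallel-aware: each worker sums its own chunk into its own slot,
// so the threads never write to the same location.
long long sum_parallel(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (std::thread& t : pool) {
        t.join();  // wait for every chunk before combining
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A caller might pick the worker count from
std::thread::hardware_concurrency(), falling back to 1 when it returns
0. Any actual speedup depends on the data size and the cost of starting
threads, so small inputs may run no faster than the serial version.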
1
General-purpose computing on graphics processing