International Journal of Computer Applications (0975 – 8887)
Volume 72– No.18, June 2013
38
New Approach for Graph Algorithms on GPU using
CUDA
1
Gunjan Singla,
2
Amrita Tiwari,
3
Dhirendra Pratap Singh
Department of Computer Science and Engineering
Maulana Azad National Institute of Technology
Bhopal, Madhya Pradesh
ABSTRACT
Large Graph algorithms like Breadth-First Search (BFS),
Depth-First Search(DFS), shortest path algorithms etc. used
frequently in various engineering and real world applications
that demand execution of these algorithms in large graphs
having millions of edges and sequential implementation of
these algorithms takes large amount of time. Today’s
Graphics Processing Units (GPUs) provide a platform to
implement such applications with high computation power
and massively multithreaded architecture at low price. In this
paper, we present parallel implementations of two basic graph
algorithms breadth-first search and Dijkstra’s single source
shortest path algorithm by using a new approach called edge
based kernel execution on GPU. The performance analysis of
parallel implementation over the serial execution gives a good
speed-up.
Keywords
SSSP (Single Source Shortest Path) problem, Dijkstra’s
algorithm, BFS (Breadth -First Search), CUDA (Compute
Unified Device Architecture) model, GPU(Graphic
Processing Unit).
1. INTRODUCTION
Graphs are the commonly used data structures that describe a
set of objects as nodes and the connections between them as
edges. A large number of graph operations are present, such
as minimum spanning tree, breadth-first search, shortest path
etc., having applications in different problem domains like
VLSI chip layout [1], phylogeny reconstruction [2] , data
mining, and network analysis[3].
With the development of computer and information
technology, researches on graph algorithms get wide
attention. In particular, the Single Source Shortest Path
(SSSP) problem is a major problem in graph theory which
computes the weight of the shortest path from a source vertex
to all other vertices in a weighted directed graph. The most
well-known algorithm for solving this problem was given by
Dijkstra in 1959 [4] with non-negative edge weights and
further, more work is done considering it as base algorithm.
So far, many different variants of Dijkstra’s algorithm have
implemented sequentially as well as in parallel manner. In all
parallel implementations, a thread corresponds to a node in
graph database but in our implementation, a thread
corresponds to edges and as number of edges is greater than
number of nodes, comparatively more degree of parallelism is
achieved.
We have also given parallel implementation of BFS [5] [6] on
the basis of edges as it is one of the basic paradigm for the
design of efficient graph algorithm and hence, requires high
degree of parallelism. Given a graph G= (V, E) with m edges,
n vertices and a source vertex s, BFS traverses the edges of G
to discover every vertex that is reachable from s.
At present, the serial graph algorithms have reached the time
limitation as they used to take a large amount of time.
Therefore, the parallel computation is an efficient way to
improve the performance by applying some constraints on the
data and taking the advantage of the hardware available
currently.
Different implementations of parallel algorithms for the SSSP
problem are reviewed in [7]. Bader et al. [8], [9] use CRAY
supercomputer to perform BFS and single pair shortest path
on very large graphs. A. Crauser et al. [10] have given a
PRAM implementation of Dijkstra’s algorithm while such
methods are fast, hardware used in them is very expensive. N.
Jasika et al. [11] presented a parallel dijkstra’s algorithm
using OpenMP (Open Multi-Processing) and OpenCL (Open
Computing Language) which gives good results over serial
algorithm. Pedro J. Martín et al. [12] have given an efficient
parallel dijkstra’s algorithm on GPU using CUDA. L. Luo et
al. [13] have given a GPU implementation of BFS which
gives around 10X speed-up over the algorithm given by P.
Harish et al. [14].
In this paper, we present new edge based parallel
implementations of Dijkstra’s algorithm and breadth first
search (BFS) on GPU using CUDA handling large graphs up
to 2 million edges. We show the results for the speed-up
obtained by our parallel algorithm over its serial execution.
The rest of the paper is organized as follows: CUDA basics
along with GPU architecture is discussed in Section 2. Graph
representation used by our implementation is discussed in
Section 3. Section 4 presents edge based parallel Dijkstra’s
algorithm with a subsequent edge based parallel BFS
implementation in section 5. Performance analysis of our
implementation on various types of graphs is done in section 6
and finally concluded in section 7.
2. CUDA MODEL ON GPU
Graphics Processing Unit (GPU) was introduced by NVIDIA
and has four types of memory in it i.e. shared memory,
constant memory, texture memory and global memory. Its
design does not have any memory restrictions as one can
access all these memory available on the device except shared
memory with no restrictions on its representation though the
access times may differ for different types of memory. It uses
a massively multithreaded computing architecture called
CUDA for parallel processing of data. In CUDA
programming model, GPU is referred as device and CPU is
referred as host. Basically, CUDA device is a multi-core co-