C
o
n
s
i
s
t
e
n
t
*
C
o
m
p
l
e
t
e
*
W
e
l
l
D
o
c
u
m
e
n
t
e
d
*
E
a
s
y
t
o
R
e
u
s
e
*
*
E
v
a
l
u
a
t
e
d
*
P
o
P
*
A
r
t
i
f
a
c
t
*
A
E
C
Gunrock: A High-Performance Graph
Processing Library on the GPU
Yangzihao Wang, Andrew Davidson
∗
, Yuechao Pan, Yuduo Wu
†
, Andy Riffel, John D. Owens
University of California, Davis
{yzhwang, aaldavidson, ychpan, yudwu, atriffel, jowens}@ucdavis.edu
Abstract
For large-scale graph analytics on the GPU, the irregularity of data
access/control flow and the complexity of programming GPUs have
been two significant challenges for developing a programmable
high-performance graph library. “Gunrock,” our high-level bulk-
synchronous graph-processing system targeting the GPU, takes
a new approach to abstracting GPU graph analytics: rather than
designing an abstraction around computation, Gunrock instead
implements a novel data-centric abstraction centered on operations
on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high-performance
GPU computing primitives and optimization strategies with a high-
level programming model that allows programmers to quickly
develop new graph primitives with small code size and minimal
GPU programming knowledge. We evaluate Gunrock on five graph
primitives (BFS, BC, SSSP, CC, and PageRank) and show that
Gunrock has on average at least an order of magnitude speedup over
Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU
high-level graph library.
1. Introduction
Graphs are ubiquitous data structures that can represent relation-
ships between people (social networks), computers (the Internet),
biological and genetic interactions, and elements in unstructured
meshes, just to name a few. In this paper, we describe “Gunrock,”
our graphics processor (GPU)-based system for graph processing
that delivers high performance in computing graph analytics with
its high-level, data-centric parallel programming model. Unlike pre-
vious GPU graph programming models that focus on sequencing
computation steps, our data-centric model’s key abstraction is the
frontier, a subset of the edges or vertices within the graph that is
currently of interest. All Gunrock operations are bulk-synchronous
and manipulate this frontier, either by computing on values within it
or by computing a new frontier from it.
At a high level, Gunrock targets graph primitives that are iter-
ative, convergent processes. Among the graph primitives we have
∗
Currently an employee at Google.
†
Currently an employee at IBM.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without
fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice
and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to
lists, contact the Owner/Author. Request permissions from permissions@acm.org or Publications Dept., ACM, Inc., fax
+1 (212) 869-0481. Copyright 2016 held by Owner/Author. Publication Rights Licensed to ACM.
PPoPP ’16 March 12-16, 2016, Barcelona, Spain
Copyright
c
2016 ACM 978-1-4503-4092-2/16/03.. .$15.00
DOI: http://dx.doi.org/10.1145/2851141.2851145
implemented and evaluated in Gunrock, we focus in this paper on
breadth-first search (BFS), single-source shortest path (SSSP), be-
tweenness centrality (BC), PageRank, and connected components
(CC). Though the GPU’s excellent peak throughput and energy
efficiency [
17
] have been demonstrated across many application
domains, these applications often exploit regular, structured par-
allelism. The inherent irregularity of graph data structures leads
to irregularity in data access and control flow, making an efficient
implementation on GPUs a significant challenge.
Our goal with Gunrock is to deliver the performance of cus-
tomized, complex GPU hardwired graph primitives with a high-
level programming model that allows programmers to quickly de-
velop new graph primitives. To do so, we must address the chief
challenge in a highly parallel graph processing system: managing
irregularity in work distribution. Gunrock integrates sophisticated
load-balancing and work-efficiency strategies into its core. These
strategies are hidden from the programmer; the programmer instead
expresses what operations should be performed on the frontier rather
than how those operations should be performed. Programmers can
assemble complex and high-performance graph primitives from op-
erations that manipulate the frontier (the “what”) without knowing
the internals of the operations (the “how”).
Our contributions are as follows:
1.
We present a novel data-centric abstraction for graph operations
that allows programmers to develop graph primitives at a high
level of abstraction while simultaneously delivering high per-
formance. This abstraction, unlike the abstractions of previous
GPU programmable frameworks, is able to elegantly incorpo-
rate profitable optimizations—kernel fusion, push-pull traversal,
idempotent traversal, and priority queues—into the core of its
implementation.
2.
We design and implement a set of simple and flexible APIs that
can express a wide range of graph processing primitives at a
high level of abstraction (at least as simple, if not more so, than
other programmable GPU frameworks).
3.
We describe several GPU-specific optimization strategies for
memory efficiency, load balancing, and workload management
that together achieve high performance. All of our graph primi-
tives achieve comparable performance to their hardwired coun-
terparts and significantly outperform previous programmable
GPU abstractions.
4.
We provide a detailed experimental evaluation of our graph
primitives with performance comparisons to several CPU and
GPU implementations.
Gunrock is currently available in an open-source repository
at http://gunrock.github.io/ and is currently available for use by
external developers.