Gunrock: A High-Performance Graph Processing Library on the GPU

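Gunrock is a high-performance graph processing library for the GPU, described in a 2016 paper by authors at the University of California, Davis. The paper appeared at the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16) and covers the library's design and evaluation.

Graphs can represent social networks, computer networks, biological interactions, and many other structures, so analyzing them quickly matters in many applications. Gunrock uses the GPU's parallelism to process large graphs efficiently. The paper evaluates five graph primitives: breadth-first search (BFS), single-source shortest path (SSSP), betweenness centrality (BC), connected components (CC), and PageRank. It reports, on average, at least an order of magnitude speedup over the CPU-based Boost and PowerGraph, performance comparable to the fastest hardwired GPU primitives, and better performance than other high-level GPU graph libraries.

Gunrock's central design idea is a data-centric abstraction built on operations over a frontier of vertices or edges. It aims to balance performance and expressiveness: load balancing and other GPU optimizations live inside the library, so programmers can write new graph primitives with little code and minimal GPU programming knowledge.

The paper is freely available under the open access policy of the University of California Academic Senate. Readers can reach the original through its permalink and DOI.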

Gunrock: A High-Performance Graph Processing Library on the GPU

Yangzihao Wang, Andrew Davidson∗, Yuechao Pan, Yuduo Wu†, Andy Riffel, John D. Owens

University of California, Davis

{yzhwang, aaldavidson, ychpan, yudwu, atriffel, jowens}@ucdavis.edu

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. “Gunrock,” our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a novel data-centric abstraction centered on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five graph primitives (BFS, BC, SSSP, CC, and PageRank) and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.

1. Introduction

Graphs are ubiquitous data structures that can represent relationships between people (social networks), computers (the Internet), biological and genetic interactions, and elements in unstructured meshes, just to name a few. In this paper, we describe “Gunrock,” our graphics processor (GPU)-based system for graph processing that delivers high performance in computing graph analytics with its high-level, data-centric parallel programming model. Unlike previous GPU graph programming models that focus on sequencing computation steps, our data-centric model’s key abstraction is the frontier, a subset of the edges or vertices within the graph that is currently of interest. All Gunrock operations are bulk-synchronous and manipulate this frontier, either by computing on values within it or by computing a new frontier from it.
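To make the frontier idea concrete, here is a minimal sequential sketch of BFS written as repeated frontier-to-frontier steps. It is not Gunrock's API: the `CsrGraph` layout and the names `expand_frontier` and `bfs_depths` are assumptions made for illustration, and the loops that a GPU would run in parallel are written as plain CPU loops.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR graph: the neighbors of vertex v are
// col_indices[row_offsets[v]] .. col_indices[row_offsets[v + 1] - 1].
struct CsrGraph {
    std::vector<int> row_offsets;
    std::vector<int> col_indices;
    int num_vertices() const { return static_cast<int>(row_offsets.size()) - 1; }
};

// One bulk-synchronous step: visit every edge leaving the current frontier,
// update per-vertex values (depth), and return the next frontier.
std::vector<int> expand_frontier(const CsrGraph& g, const std::vector<int>& frontier,
                                 std::vector<int>& depth, int next_depth) {
    std::vector<int> next;
    for (int u : frontier) {
        for (int e = g.row_offsets[u]; e < g.row_offsets[u + 1]; ++e) {
            int v = g.col_indices[e];
            if (depth[v] == -1) {  // -1 marks an unvisited vertex
                depth[v] = next_depth;
                next.push_back(v);
            }
        }
    }
    return next;
}

// BFS is then just "replace the frontier until it is empty".
std::vector<int> bfs_depths(const CsrGraph& g, int source) {
    assert(source >= 0 && source < g.num_vertices());
    std::vector<int> depth(g.num_vertices(), -1);
    depth[source] = 0;
    std::vector<int> frontier{source};
    for (int level = 1; !frontier.empty(); ++level) {
        frontier = expand_frontier(g, frontier, depth, level);
    }
    return depth;
}
```

The driver loop never names an individual vertex; it only hands one frontier to a step and receives the next, which is the sense in which the model is data-centric.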

∗ Currently an employee at Google.

† Currently an employee at IBM.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, contact the Owner/Author. Request permissions from permissions@acm.org or Publications Dept., ACM, Inc.

PPoPP ’16 March 12-16, 2016, Barcelona, Spain

© 2016 ACM 978-1-4503-4092-2/16/03...$15.00

DOI: http://dx.doi.org/10.1145/2851141.2851145

At a high level, Gunrock targets graph primitives that are iterative, convergent processes. Among the graph primitives we have implemented and evaluated in Gunrock, we focus in this paper on breadth-first search (BFS), single-source shortest path (SSSP), betweenness centrality (BC), PageRank, and connected components (CC). Though the GPU’s excellent peak throughput and energy efficiency have been demonstrated across many application domains, these applications often exploit regular, structured parallelism. The inherent irregularity of graph data structures leads to irregularity in data access and control flow, making an efficient implementation on GPUs a significant challenge.
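As an illustration of what "iterative, convergent" can mean for one of the listed primitives, the sketch below computes SSSP by relaxing edges out of a frontier until no distance improves. It is a sequential stand-in rather than Gunrock code; `WeightedCsr` and `sssp` are hypothetical names, and non-negative integer weights are assumed.

```cpp
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical weighted CSR graph; weights[e] belongs to edge col_indices[e].
// Non-negative weights are assumed.
struct WeightedCsr {
    std::vector<int> row_offsets;
    std::vector<int> col_indices;
    std::vector<int> weights;
};

// Frontier-based SSSP: each iteration relaxes every edge leaving the frontier.
// Vertices whose distance improved form the next frontier. The process has
// converged when an iteration improves nothing, i.e. the frontier is empty.
std::vector<int> sssp(const WeightedCsr& g, int source) {
    const int n = static_cast<int>(g.row_offsets.size()) - 1;
    assert(source >= 0 && source < n);
    std::vector<int> dist(n, std::numeric_limits<int>::max());
    dist[source] = 0;
    std::vector<int> frontier{source};
    while (!frontier.empty()) {
        std::vector<int> next;
        std::vector<char> queued(n, 0);  // keeps each vertex in `next` at most once
        for (int u : frontier) {         // dist[u] is finite for every frontier vertex
            for (int e = g.row_offsets[u]; e < g.row_offsets[u + 1]; ++e) {
                int v = g.col_indices[e];
                int candidate = dist[u] + g.weights[e];
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    if (!queued[v]) {
                        queued[v] = 1;
                        next.push_back(v);
                    }
                }
            }
        }
        frontier = std::move(next);
    }
    return dist;
}
```

Note how uneven the inner loop is: a frontier vertex with thousands of outgoing edges costs far more than one with a single edge, which is the irregularity in control flow and data access that the paragraph above describes.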

Our goal with Gunrock is to deliver the performance of customized, complex GPU hardwired graph primitives with a high-level programming model that allows programmers to quickly develop new graph primitives. To do so, we must address the chief challenge in a highly parallel graph processing system: managing irregularity in work distribution. Gunrock integrates sophisticated load-balancing and work-efficiency strategies into its core. These strategies are hidden from the programmer; the programmer instead expresses what operations should be performed on the frontier rather than how those operations should be performed. Programmers can assemble complex and high-performance graph primitives from operations that manipulate the frontier (the “what”) without knowing the internals of the operations (the “how”).
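The split between "what" and "how" can be sketched as follows. The caller passes only a per-edge functor; the operator decides how edges are assigned to workers. The strategy shown (a prefix sum over frontier degrees, then equal-sized chunks of the flattened edge list) is one common load-balancing scheme chosen for illustration, not a description of Gunrock's internals, and `for_each_frontier_edge` is a hypothetical name. Workers are simulated by a sequential loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR graph, as in a typical adjacency-array layout.
struct CsrGraph {
    std::vector<int> row_offsets;
    std::vector<int> col_indices;
};

// "How": split the frontier's edges into near-equal chunks, one per worker,
// so a high-degree vertex does not overload a single worker.
// "What": the caller-supplied op(src, dst), applied once per frontier edge.
// Returns how many edges each worker processed.
template <typename EdgeOp>
std::vector<int> for_each_frontier_edge(const CsrGraph& g, const std::vector<int>& frontier,
                                        int num_workers, EdgeOp op) {
    assert(num_workers > 0);
    // scan[i] = number of edges owned by frontier[0 .. i-1] (exclusive prefix sum).
    std::vector<int> scan(frontier.size() + 1, 0);
    for (std::size_t i = 0; i < frontier.size(); ++i) {
        int u = frontier[i];
        scan[i + 1] = scan[i] + (g.row_offsets[u + 1] - g.row_offsets[u]);
    }
    const int total = scan.back();
    const int chunk = (total + num_workers - 1) / num_workers;
    std::vector<int> work(num_workers, 0);
    for (int w = 0; w < num_workers; ++w) {  // each w would be a parallel worker
        const int begin = std::min(w * chunk, total);
        const int end = std::min(begin + chunk, total);
        for (int k = begin; k < end; ++k) {
            // Find the frontier slot that owns flattened edge k:
            // the last i with scan[i] <= k.
            std::size_t i = std::upper_bound(scan.begin(), scan.end(), k) - scan.begin() - 1;
            int u = frontier[i];
            int e = g.row_offsets[u] + (k - scan[i]);
            op(u, g.col_indices[e]);
            ++work[w];
        }
    }
    return work;
}
```

Swapping in a different partitioning scheme would change only the body of the operator; code that calls it with a functor would stay the same.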

Our contributions are as follows:

• We present a novel data-centric abstraction for graph operations that allows programmers to develop graph primitives at a high level of abstraction while simultaneously delivering high performance. This abstraction, unlike the abstractions of previous GPU programmable frameworks, is able to elegantly incorporate profitable optimizations (kernel fusion, push-pull traversal, idempotent traversal, and priority queues) into the core of its implementation.

• We design and implement a set of simple and flexible APIs that can express a wide range of graph processing primitives at a high level of abstraction (at least as simple, if not more so, than other programmable GPU frameworks).

• We describe several GPU-specific optimization strategies for memory efficiency, load balancing, and workload management that together achieve high performance. All of our graph primitives achieve comparable performance to their hardwired counterparts and significantly outperform previous programmable GPU abstractions.

• We provide a detailed experimental evaluation of our graph primitives with performance comparisons to several CPU and GPU implementations.
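Of the optimizations named in the first contribution, push-pull traversal is perhaps the easiest to show in a few lines. The sketch below follows the general idea as it is usually described for BFS, not Gunrock's implementation: a push step starts from frontier vertices and writes to their unvisited neighbors, while a pull step starts from unvisited vertices and looks for any neighbor in the frontier. All names are hypothetical, and the graph is assumed to be undirected (symmetric CSR) so that out-neighbors and in-neighbors coincide.

```cpp
#include <cassert>
#include <vector>

// Hypothetical symmetric (undirected) CSR graph.
struct CsrGraph {
    std::vector<int> row_offsets;
    std::vector<int> col_indices;
    int num_vertices() const { return static_cast<int>(row_offsets.size()) - 1; }
};

// Push: every frontier vertex scans its edges and claims unvisited neighbors.
std::vector<int> push_step(const CsrGraph& g, const std::vector<int>& frontier,
                           std::vector<int>& depth, int level) {
    std::vector<int> next;
    for (int u : frontier) {
        for (int e = g.row_offsets[u]; e < g.row_offsets[u + 1]; ++e) {
            int v = g.col_indices[e];
            if (depth[v] == -1) {
                depth[v] = level;
                next.push_back(v);
            }
        }
    }
    return next;
}

// Pull: every unvisited vertex scans its edges and stops at the first
// neighbor found in the frontier, which can skip many edge checks when
// the frontier is large.
std::vector<int> pull_step(const CsrGraph& g, const std::vector<int>& frontier,
                           std::vector<int>& depth, int level) {
    std::vector<char> in_frontier(g.num_vertices(), 0);
    for (int u : frontier) in_frontier[u] = 1;
    std::vector<int> next;
    for (int v = 0; v < g.num_vertices(); ++v) {
        if (depth[v] != -1) continue;
        for (int e = g.row_offsets[v]; e < g.row_offsets[v + 1]; ++e) {
            if (in_frontier[g.col_indices[e]]) {
                depth[v] = level;
                next.push_back(v);
                break;
            }
        }
    }
    return next;
}

// BFS that uses one direction throughout. A real implementation would pick
// the direction per iteration with some heuristic, which is omitted here.
std::vector<int> bfs(const CsrGraph& g, int source, bool use_pull) {
    std::vector<int> depth(g.num_vertices(), -1);
    depth[source] = 0;
    std::vector<int> frontier{source};
    for (int level = 1; !frontier.empty(); ++level) {
        frontier = use_pull ? pull_step(g, frontier, depth, level)
                            : push_step(g, frontier, depth, level);
    }
    return depth;
}
```

Both directions produce the same depths on an undirected graph; they differ only in how many edges they inspect, which is why choosing between them per iteration can pay off.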

Gunrock is available in an open-source repository at http://gunrock.github.io/ and is ready for use by external developers.
