A New Strategy for Graph Algorithms on the GPU: CUDA Implementations of BFS and Dijkstra's Algorithm

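"New Approach for Graph Algorithms on GPU using CUDA" (2013): this paper explores a new way of implementing graph algorithms on the GPU with CUDA, targeting large-scale graph algorithms such as breadth-first search (BFS), depth-first search (DFS), and shortest path algorithms. These algorithms are widely used in engineering and real-world applications, and conventional sequential implementations take a long time on large graphs with millions of edges. Modern graphics processing units (GPUs) offer high computational power and a massively parallel architecture, and can run such applications at low cost.

The authors, Gunjan Singla, Amrita Tiwari, and Dhirendra Pratap Singh of the Department of Computer Science and Engineering at Maulana Azad National Institute of Technology, propose a new GPU execution scheme: edge-based kernel execution. The approach is aimed at the performance of BFS and Dijkstra's single-source shortest path algorithm when run in parallel on the GPU. Traditional CPU implementations of these algorithms are usually node-centric, processing one node and its adjacency list at a time. The edge-based kernel execution strategy instead processes many edges simultaneously, which matches the GPU's parallel processing capability more closely and can significantly reduce execution time.

The performance analysis shows the advantage of the parallel implementation over sequential execution. In general, the GPU's parallelism greatly increases data processing speed, especially when there are many concurrent tasks. The paper may discuss the design of the parallel algorithms in detail, including how to distribute the workload effectively, how to use the GPU's streaming multiprocessors (SMs), and how to avoid or minimize data transfer and synchronization overhead. It may also cover memory management strategies, such as using global, shared, and texture memory to optimize data access. Memory bandwidth and data locality are key performance factors when implementing graph algorithms on the GPU, and the authors may also discuss how suitable data structures and algorithm design maximize parallelism and memory efficiency.

This 2013 paper presents a new method for executing graph algorithms efficiently on the GPU with CUDA, and is relevant to understanding and optimizing large-scale graph processing. The method applies to graph theory and algorithms, and may extend to other fields that depend on fast, parallel computation, such as network analysis, social network research, bioinformatics, and graph neural networks in machine learning.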

International Journal of Computer Applications (0975 – 8887)

Volume 72 – No. 18, June 2013

New Approach for Graph Algorithms on GPU using CUDA

Gunjan Singla, Amrita Tiwari, Dhirendra Pratap Singh

Department of Computer Science and Engineering

Maulana Azad National Institute of Technology

Bhopal, Madhya Pradesh

ABSTRACT

Large graph algorithms such as Breadth-First Search (BFS), Depth-First Search (DFS), and shortest path algorithms are used frequently in engineering and real-world applications that require running them on large graphs with millions of edges, where sequential implementations take a large amount of time. Today's Graphics Processing Units (GPUs) provide a platform for such applications, with high computation power and a massively multithreaded architecture at low price. In this paper, we present parallel implementations of two basic graph algorithms, breadth-first search and Dijkstra's single source shortest path algorithm, using a new approach called edge-based kernel execution on the GPU. The performance analysis shows that the parallel implementation achieves a good speed-up over serial execution.

Keywords

SSSP (Single Source Shortest Path) problem, Dijkstra's algorithm, BFS (Breadth-First Search), CUDA (Compute Unified Device Architecture) model, GPU (Graphics Processing Unit).

1. INTRODUCTION

Graphs are commonly used data structures that describe a set of objects as nodes and the connections between them as edges. There are many graph operations, such as minimum spanning tree, breadth-first search, and shortest path, with applications in problem domains such as VLSI chip layout [1], phylogeny reconstruction [2], data mining, and network analysis [3].

With the development of computer and information technology, research on graph algorithms has received wide attention. In particular, the Single Source Shortest Path (SSSP) problem is a major problem in graph theory: it computes the weight of the shortest path from a source vertex to all other vertices in a weighted directed graph. The best-known algorithm for this problem, for graphs with non-negative edge weights, was given by Dijkstra in 1959 [4], and much subsequent work takes it as the base algorithm. Many variants of Dijkstra's algorithm have since been implemented, both sequentially and in parallel. In existing parallel implementations, a thread corresponds to a node in the graph; in our implementation, a thread corresponds to an edge. Because the number of edges is greater than the number of nodes, a comparatively higher degree of parallelism is achieved.
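
The one-thread-per-edge idea can be illustrated with a small sequential emulation in Python. This is only a sketch under stated assumptions, not the authors' kernel: the function name `edge_parallel_sssp`, the `(u, v, w)` edge-list layout, and the repeat-until-stable sweep (closer to Bellman-Ford relaxation than to a priority-queue Dijkstra) are all choices made here for illustration. Each iteration of the inner loop stands in for the work a single GPU thread would do on its edge.

```python
import math

def edge_parallel_sssp(num_nodes, edges, source):
    """Shortest distances from `source`; `edges` is a list of (u, v, w), w >= 0."""
    dist = [math.inf] * num_nodes
    dist[source] = 0
    changed = True
    while changed:                      # one pass = one hypothetical kernel launch
        changed = False
        snapshot = list(dist)           # distances as they stood before this pass
        for u, v, w in edges:           # on a GPU: one thread per edge
            candidate = snapshot[u] + w
            if candidate < dist[v]:     # on a GPU this update would need an atomic minimum
                dist[v] = candidate
                changed = True
    return dist

# Directed example graph with 5 nodes; node 4 is unreachable from node 0.
example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(edge_parallel_sssp(5, example_edges, 0))  # [0, 3, 1, 4, inf]
```

Reading from a snapshot mimics threads that all see the distances from the previous pass, so the result does not depend on the order in which edges are visited.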

We also give an edge-based parallel implementation of BFS [5] [6], as it is one of the basic paradigms for the design of efficient graph algorithms and hence benefits from a high degree of parallelism. Given a graph G = (V, E) with m edges, n vertices, and a source vertex s, BFS traverses the edges of G to discover every vertex that is reachable from s.
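
A level-synchronous BFS can be phrased over edges in the same spirit. The sketch below is a sequential Python emulation, not the paper's implementation; the name `edge_parallel_bfs`, the directed `(u, v)` edge list, and the use of `-1` for "not yet discovered" are assumptions made for this example. In each pass, every edge checks whether its tail lies on the current frontier and its head is still undiscovered.

```python
def edge_parallel_bfs(num_nodes, edges, source):
    """BFS level of every node from `source`; -1 marks unreachable nodes."""
    level = [-1] * num_nodes
    level[source] = 0
    current = 0
    while True:                          # one pass = one hypothetical kernel launch
        discovered = False
        for u, v in edges:               # on a GPU: one thread per edge
            if level[u] == current and level[v] == -1:
                level[v] = current + 1   # every writer stores the same value
                discovered = True
        if not discovered:
            break                        # frontier is empty, traversal is done
        current += 1
    return level

# Directed example graph with 6 nodes; node 5 is unreachable from node 0.
example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(edge_parallel_bfs(6, example_edges, 0))  # [0, 1, 1, 2, 3, -1]
```

For an undirected graph, each edge would be listed in both directions. Because every edge that discovers a vertex in a given pass writes the same level, concurrent writes to the same vertex would not conflict in this formulation.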

At present, serial graph algorithms have reached their performance limit: on large graphs they take a large amount of time. Parallel computation is therefore an efficient way to improve performance, by applying some constraints on the data and taking advantage of currently available hardware.

Different implementations of parallel algorithms for the SSSP problem are reviewed in [7]. Bader et al. [8], [9] use a CRAY supercomputer to perform BFS and single pair shortest path on very large graphs. A. Crauser et al. [10] give a PRAM implementation of Dijkstra's algorithm. While such methods are fast, the hardware they use is very expensive. N. Jasika et al. [11] present a parallel Dijkstra's algorithm using OpenMP (Open Multi-Processing) and OpenCL (Open Computing Language), which gives good results over the serial algorithm. Pedro J. Martín et al. [12] give an efficient parallel Dijkstra's algorithm on the GPU using CUDA. L. Luo et al. [13] give a GPU implementation of BFS that achieves around 10X speed-up over the algorithm given by P. Harish et al. [14].

In this paper, we present new edge-based parallel implementations of Dijkstra's algorithm and breadth-first search (BFS) on the GPU using CUDA, handling large graphs of up to 2 million edges. We report the speed-up obtained by our parallel algorithms over their serial execution. The rest of the paper is organized as follows. CUDA basics and the GPU architecture are discussed in Section 2. The graph representation used by our implementation is discussed in Section 3. Section 4 presents the edge-based parallel Dijkstra's algorithm, followed by the edge-based parallel BFS implementation in Section 5. Section 6 analyzes the performance of our implementation on various types of graphs, and Section 7 concludes the paper.

2. CUDA MODEL ON GPU

The Graphics Processing Unit (GPU) was introduced by NVIDIA and has four types of memory: shared memory, constant memory, texture memory, and global memory. Its design imposes no memory restrictions: all of these memories on the device, except shared memory, can be accessed with no restrictions on data representation, although access times differ between memory types. It uses a massively multithreaded computing architecture called CUDA for parallel processing of data. In the CUDA programming model, the GPU is referred to as the device and the CPU as the host. Basically, a CUDA device is a multi-core co-
