Eurographics Symposium on Rendering 2010
Jason Lawrence and Marc Stamminger
(Guest Editors)
Volume 29 (2010), Number 4
Real-Time Concurrent Linked List Construction on the GPU
Jason C. Yang, Justin Hensley, Holger Grün, and Nicolas Thibieroz
Advanced Micro Devices, Inc.
Abstract
We introduce a method to dynamically construct highly concurrent linked lists on modern graphics processors.
Once constructed, these data structures can be used to implement a host of algorithms useful in creating complex
rendering effects in real time. We present a straightforward way to create these linked lists using generic atomic
operations available in APIs such as OpenGL 4.0 and DirectX 11. We also describe several possible applications of
our algorithm. One example uses per-pixel linked lists for order-independent transparency; as a consequence, we
are able to directly implement fully programmable blending, which frees developers from the restrictions imposed
by current graphics APIs. The second uses linked lists to implement real-time indirect shadows.
Categories and Subject Descriptors
(according to ACM CCS): I.3.6 [Computer Graphics]: Methodology and
Techniques—Graphics data structures and data types
1. Introduction
Linked lists are a common and basic data structure in com-
puter science, and, in the past, have been difficult to con-
struct on the GPU with good performance. Pixel shaders
lacked scatter capability and could only write a fixed number
of bytes to a set screen position.
Recent APIs (e.g., DirectX
R
11 and OpenGL
R
4.0) and
hardware features include the ability to perform atomic
memory operations to arbitrary locations, which is important
for building linked lists and other complex data structures.
In this paper, we describe a general method of dynamically
constructing linked lists on the GPU fast enough for real-
time rendering. The basic idea behind our algorithm is to
use one buffer to store all linked-list node data and another
buffer to store head pointers that reference the start of the
linked lists in the first buffer. This is possible using atomic
memory operations.
Linked list-style data structures have many rendering ap-
plications. We describe two examples – order-independent
transparency and indirect shadowing. In general, our method
can be used to implement programmable blend, which is a
desired API feature.
The contributions of this paper include:
• Fast creation of linked lists of arbitrary size using existing
GPUs and APIs.
• Integration of our method into the standard graphics
pipeline.
• Example applications for rendering – order-independent
transparency and indirect shadows.
2. Related Work
Linked lists can be created on the GPU using multi-pass ren-
dering methods [RBA08], but this is not efficient.
Our method is similar to the A-buffer [Car84] style of per-
pixel data structures and is the most general method of han-
dling multiple fragments per-pixel. Developed for REYES
software rendering, the A-buffer maintained an unbounded,
sorted list of fragment data per-pixel. Fragment data in-
cluded depth, color, transparency, and pixel coverage. Our
method can also be thought of as a true implementation of
the A-buffer on the GPU.
There have been many variations and optimizations to
the A-buffer method that improve performance or reduce
the memory requirements. The R-buffer [Wit01] is a pro-
posed hardware implementation of the A-buffer that uses a
single FIFO to store all scene fragments. The irregular Z-
buffer [JLBM05] proposes modifications to existing hard-
ware that would enable construction of linked lists. The F-
buffer [MP01] [HPS], stencil-routed A-buffer [MB07], Z
3
algorithm [JC99], and k-buffer [CICS05] [BCL
∗
07] are also
c
2010 The Author(s)
Journal compilation
c
2010 The Eurographics Association and Blackwell Publishing Ltd.
Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and
350 Main Street, Malden, MA 02148, USA.
DOI: 10.1111/j.1467-8659.2010.01725.x