Real-Time Concurrent Linked List Construction on the GPU: Real-Time Rendering and Transparency Applications


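This document covers the paper "Real-Time Concurrent Linked List Construction on the GPU", which describes how to dynamically construct highly concurrent linked lists on modern graphics processors. The technique matters for real-time rendering because it supports complex visual effects. The authors, Jason C. Yang, Justin Hensley, Holger Grün, and Nicolas Thibieroz, present a simple and efficient method that builds these data structures with the generic atomic operations available in APIs such as OpenGL 4.0 and DirectX 11. This is of particular interest to game developers working on effects such as real-time transparency and shadows.

One key application uses per-pixel linked lists for order-independent transparency (OIT). This lets developers implement fully programmable blending directly, free of the restrictions that current graphics APIs place on transparency handling, so rendering logic can be customized more freely. A second application uses linked lists to build indirect shadows in real time, which is useful for complex, dynamically changing lighting; processing the lists in parallel on the GPU is what makes the shadow computation fast enough for interactive use.

The full paper may also cover the design of the linked-list data structure, how to keep data consistent and correct under concurrent access, and how to tune performance for the GPU's parallel architecture. It may further discuss how to integrate the technique into real projects, including likely problems and their workarounds, and how it interacts with other parts of the graphics pipeline (for example, texture mapping and vertex shaders).

In short, the paper studies how to exploit GPU hardware to improve games and real-time graphics applications, and it offers both a theoretical basis and practical guidance.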

Eurographics Symposium on Rendering 2010

Jason Lawrence and Marc Stamminger (Guest Editors)

Volume 29 (2010), Number 4

Real-Time Concurrent Linked List Construction on the GPU

Jason C. Yang, Justin Hensley, Holger Grün, and Nicolas Thibieroz

Advanced Micro Devices, Inc.

Abstract

We introduce a method to dynamically construct highly concurrent linked lists on modern graphics processors. Once constructed, these data structures can be used to implement a host of algorithms useful in creating complex rendering effects in real time. We present a straightforward way to create these linked lists using generic atomic operations available in APIs such as OpenGL 4.0 and DirectX 11. We also describe several possible applications of our algorithm. One example uses per-pixel linked lists for order-independent transparency; as a consequence, we are able to directly implement fully programmable blending, which frees developers from the restrictions imposed by current graphics APIs. The second uses linked lists to implement real-time indirect shadows.

Categories and Subject Descriptors (according to ACM CCS): I.3.6 [Computer Graphics]: Methodology and Techniques: Graphics data structures and data types

1. Introduction

Linked lists are a common and basic data structure in computer science, and, in the past, have been difficult to construct on the GPU with good performance. Pixel shaders lacked scatter capability and could only write a fixed number of bytes to a set screen position.

Recent APIs (e.g., DirectX 11 and OpenGL 4.0) and hardware features include the ability to perform atomic memory operations to arbitrary locations, which is important for building linked lists and other complex data structures.

In this paper, we describe a general method of dynamically constructing linked lists on the GPU fast enough for real-time rendering. The basic idea behind our algorithm is to use one buffer to store all linked-list node data and another buffer to store head pointers that reference the start of the linked lists in the first buffer. This is possible using atomic memory operations.
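The two-buffer idea can be illustrated with a small CPU-side model. This is only a sketch under stated assumptions, not the paper's shader code: the names `LinkedListBuffers`, `insert`, `traverse`, and `build_concurrently` are hypothetical, the global allocation counter and the atomic exchange on the head pointer are assumed forms of the "atomic memory operations" mentioned above, and a lock stands in for the hardware atomics.

```python
import threading

NULL = -1  # assumed sentinel meaning "end of list"


class LinkedListBuffers:
    """CPU-side model: one node buffer plus one head-pointer buffer."""

    def __init__(self, num_lists, max_nodes):
        self.heads = [NULL] * num_lists    # head-pointer buffer, one entry per list
        self.payload = [None] * max_nodes  # node buffer: data stored in each node
        self.next = [NULL] * max_nodes     # node buffer: index of the next node
        self.counter = 0                   # assumed global node-allocation counter
        self._lock = threading.Lock()      # stands in for hardware atomics

    def _atomic_increment(self):
        # Returns the counter value before the increment.
        with self._lock:
            old = self.counter
            self.counter += 1
            return old

    def _atomic_exchange(self, list_id, new_head):
        # Stores new_head and returns the previous head in one step.
        with self._lock:
            old = self.heads[list_id]
            self.heads[list_id] = new_head
            return old

    def insert(self, list_id, data):
        node = self._atomic_increment()     # claim a unique slot in the node buffer
        if node >= len(self.payload):
            return False                    # node buffer exhausted: drop the item
        prev = self._atomic_exchange(list_id, node)  # new node becomes the head
        self.payload[node] = data
        self.next[node] = prev              # link to the previous head
        return True

    def traverse(self, list_id):
        # Intended to run only after all inserts have finished (a separate pass).
        out = []
        node = self.heads[list_id]
        while node != NULL:
            out.append(self.payload[node])
            node = self.next[node]
        return out


def build_concurrently(num_lists=4, per_thread=100, num_threads=8):
    buffers = LinkedListBuffers(num_lists, num_threads * per_thread)

    def worker(tid):
        for i in range(per_thread):
            buffers.insert((tid + i) % num_lists, (tid, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffers
```

In this model every insert prepends to its list, so nodes come back in an order that depends on thread scheduling. Any consumer that needs a particular order has to sort while traversing.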

Linked list-style data structures have many rendering applications. We describe two examples: order-independent transparency and indirect shadowing. In general, our method can be used to implement programmable blend, which is a desired API feature.

The contributions of this paper include:

• Fast creation of linked lists of arbitrary size using existing GPUs and APIs.

• Integration of our method into the standard graphics pipeline.

• Example applications for rendering: order-independent transparency and indirect shadows.

2. Related Work

Linked lists can be created on the GPU using multi-pass rendering methods [RBA08], but this is not efficient.

Our method is similar to the A-buffer [Car84] style of per-pixel data structures and is the most general method of handling multiple fragments per-pixel. Developed for REYES software rendering, the A-buffer maintained an unbounded, sorted list of fragment data per-pixel. Fragment data included depth, color, transparency, and pixel coverage. Our method can also be thought of as a true implementation of the A-buffer on the GPU.
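To make the per-pixel fragment list concrete, the sketch below resolves one pixel from an unordered list of fragments. It is an illustration under assumptions rather than the paper's resolve pass: `over` and `resolve_pixel` are hypothetical names, each fragment is assumed to be a `(depth, rgba)` pair with non-premultiplied color and larger depth meaning farther away, and pixel coverage is left out. The `blend` parameter shows where a programmable blend function would plug in.

```python
def over(src, dst):
    """Standard 'over' blend of a non-premultiplied RGBA source onto an RGB destination."""
    r, g, b, a = src
    return (r * a + dst[0] * (1 - a),
            g * a + dst[1] * (1 - a),
            b * a + dst[2] * (1 - a))


def resolve_pixel(fragments, background, blend=over):
    """Sort the pixel's fragments by depth, then composite them back to front.

    fragments: iterable of (depth, rgba) in arbitrary order.
    background: RGB tuple used as the starting destination color.
    blend: any function (src_rgba, dst_rgb) -> rgb; swappable by the caller.
    """
    color = background
    for _depth, rgba in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = blend(rgba, color)
    return color
```

Because the fragments are sorted inside `resolve_pixel`, the result does not depend on the order in which they were submitted, which is the property order-independent transparency needs.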

There have been many variations and optimizations to the A-buffer method that improve performance or reduce the memory requirements. The R-buffer [Wit01] is a proposed hardware implementation of the A-buffer that uses a single FIFO to store all scene fragments. The irregular Z-buffer [JLBM05] proposes modifications to existing hardware that would enable construction of linked lists. The F-buffer [MP01] [HPS], stencil-routed A-buffer [MB07], Z³ algorithm [JC99], and k-buffer [CICS05] [BCL∗07] are also

© 2010 The Author(s)
Journal compilation © 2010 The Eurographics Association and Blackwell Publishing Ltd.
Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
DOI: 10.1111/j.1467-8659.2010.01725.x
