
1
Optimising the Graphics Pipeline
Koji Ashida
Hi. My name is Koji Ashida. I'm part of the developer technology group at
NVIDIA and we are the guys who tend to assist game developers in creating
new effects, optimizing their graphics engine and taking care of bugs.
Today, I'm going to talk to you about optimizing the graphics pipeline.

2
Overview
[Diagram: the application and driver running on the CPU feed the GPU; potential bottleneck locations lie along the whole path.]
The bottleneck limits the overall data throughput.
In general, every application, and even every frame, has a different bottleneck.
For a pipelined architecture, getting good performance depends on finding and eliminating the impact of bottlenecks.
We're going to go through all the stages of the GPU plus talk a little bit about
what happens on the CPU side that can bottleneck a graphics application.
As we all know, in order to render a scene, the application which is running
on the CPU must send data and instructions across to the device. It
communicates with the device through the driver. Then, once the device has
the data, it processes the data using the graphics chip itself, and finally
writes it out to the frame buffer. Because this whole process is a single
pipeline from the CPU to the last stage of the GPU, any one of those stages
can be a potential bottleneck. The good thing about a pipeline is that it's
very efficient and, in particular, it's very efficient at rendering graphics
because it can parallelize a lot of operations. The bad thing about a pipeline
is that, once you do have a bottleneck, the whole pipeline runs at that speed,
so you really want to level off all the stages in the pipeline so that they
have equal workloads. Hopefully, by the end of this presentation, you'll have
a very good idea of how to either reduce the workload on a certain stage or
increase the workload on the other stages so that you get better visual quality.
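To make that point concrete, here is a tiny illustrative C++ snippet; the per-stage timings are made up, not measured from any real application. Because the stages overlap, the steady-state frame time is roughly the time of the slowest stage, which is why balancing the workloads matters.

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical per-frame stage times in milliseconds (made-up numbers).
    const double cpuMs = 3.0, vertexMs = 2.0, rasterMs = 1.0, pixelMs = 6.0, ropMs = 2.5;

    // With the stages overlapping, throughput is set by the slowest stage,
    // not by the sum of all stages.
    const double frameMs = std::max({cpuMs, vertexMs, rasterMs, pixelMs, ropMs});
    std::printf("bottleneck-limited frame time: %.1f ms (~%.0f fps)\n",
                frameMs, 1000.0 / frameMs);

    // Lightening the pixel stage below 3.0 ms would buy nothing further here,
    // because the CPU stage would then become the new bottleneck.
    return 0;
}
```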

3
Locating and eliminating bottlenecks
Locating: for each stage
  Vary its workload
  Is the overall performance noticeably affected?
  Underclock it
  Is the overall performance noticeably affected?
Eliminating:
  Decrease the workload of the stage causing the bottleneck
  Increase the workload of the stages that are not the bottleneck
Now, please remember that, for any given scene, you're going to have
different bottlenecks for different objects, different materials and different
parts of the scene, so it can be fairly difficult to find where the impact is.
Once you've identified the bottleneck, you have two choices. You can try to
reduce the overall workload of that stage and thereby increase the frame rate,
or you can increase the workload of all the other stages and thereby increase
the visual quality. Which one you choose depends on your target frame rate:
if the application is already running at 80 frames per second, then instead
of seeking to run at 100 frames per second, you may increase the workload on
the other stages. The basic approach we're going to use is to step through
each stage and vary its workload. If that stage is the bottleneck, the overall
frame rate will change. If it isn't, we should see no difference.
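As a rough sketch of that procedure, a bottleneck-isolation pass might look like the C++ below. The engine hooks (MeasureAverageFrameTimeMs, SetNullPixelShader and friends) are hypothetical stand-ins for whatever controls your renderer exposes, not real API calls; only the methodology is the point.

```cpp
#include <cstdio>

// Hypothetical engine hooks; the stubs exist only so the sketch compiles.
static double MeasureAverageFrameTimeMs()  { return 0.0; } // render N frames, return the average
static void   SetNullPixelShader(bool)     {}              // output a constant colour
static void   SetTinyTextures(bool)        {}              // substitute 2x2 textures
static void   SetLowResBackbuffer(bool)    {}              // e.g. render at half resolution
static void   SetTrivialVertexShader(bool) {}              // transform-only vertex program

static void RunTest(const char* name, void (*toggle)(bool), double baselineMs) {
    toggle(true);
    const double ms = MeasureAverageFrameTimeMs();
    toggle(false);
    // If cutting this stage's workload changes the frame time noticeably,
    // that stage is (part of) the bottleneck for this scene.
    std::printf("%-44s %6.2f ms  (baseline %6.2f ms)\n", name, ms, baselineMs);
}

int main() {
    const double baseline = MeasureAverageFrameTimeMs();
    RunTest("null pixel shader (fragment limited?)",      SetNullPixelShader,     baseline);
    RunTest("2x2 textures (texture bandwidth limited?)",  SetTinyTextures,        baseline);
    RunTest("low-res backbuffer (fill/FB bandwidth?)",    SetLowResBackbuffer,    baseline);
    RunTest("trivial vertex shader (transform limited?)", SetTrivialVertexShader, baseline);
    return 0;
}
```

The underclocking test from the slide works the same way: lower the core or memory clock and check whether the frame rate scales with it.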

4
Graphics rendering pipeline
[Diagram: the CPU and system memory feed the GPU across AGP; video memory holds geometry, commands, textures and the frame buffer; the on-chip stages (vertex shading (T&L), triangle setup, rasterization, fragment shading and raster operations) are buffered by the pre-TnL cache, post-TnL cache and texture cache.]
This is an overall view of the graphics pipeline. To the far left, we see the
CPU, which is where the application and driver are going to be running, and
the CPU is communicating with the graphics device through the AGP bus.
These days, we have AGP8X, so it's running pretty fast. It communicates
both with the graphics chip itself and with video memory through the graphics
chip memory controller. In video memory, typically what's stored is static
geometry or semi-static geometry, also the command stream, textures,
preferably compressed, and, of course, your frame buffer and any other
intermediate surfaces that you have. Then, on the actual hardware chip, we
have some caches to do buffering and to ensure that the pipeline's running
as optimally as possible.
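The talk itself is API-agnostic, but as one concrete (and assumed) way of getting static geometry resident in video memory, an OpenGL 1.5 buffer object with a static usage hint lets the driver place the data there once instead of re-sending it across AGP every frame:

```cpp
#include <GL/gl.h>   // assumes an OpenGL 1.5 context (or the ARB_vertex_buffer_object extension)
#include <cstddef>

// Upload vertex data once; GL_STATIC_DRAW hints that it is written once and
// drawn many times, so the driver is free to keep it in video memory.
GLuint CreateStaticVertexBuffer(const void* vertices, std::size_t bytes) {
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, static_cast<GLsizeiptr>(bytes), vertices, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    return vbo;
}
```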

5
Potential bottlenecks
[Diagram: the same pipeline as before, annotated with where each limit can occur: CPU limited, AGP transfer limited, vertex transform limited, setup limited, rasterization limited, fragment (pixel) shader limited, texture bandwidth limited, and frame-buffer bandwidth limited.]
输限制
The first real module that we encounter is the vertex Shader, also called the
vertex program units, and this is where you're going to do your transform and
lighting of vertices. These vertices then get put into some sort of vertex
cache that operates in different fashions. Some operate as FIFOs, some as
least recently used, or LRU, and, obviously, current high-end cards have
larger caches than the older generation and mainstream cards. Then the
triangle setup stage is reached. This module reads vertices from the cache
and this is where basically the polygon is formed. Once the polygon is
formed, the rasterization is where it gets broken up into pixels. So now the
next stage only accepts pixels and this is the fragment shading or pixel
shader stage. The pixel shader stage is typically where a lot of time is spent
these days and we're going to spend a good amount of time analyzing this.
After the pixel shading stage come the raster operations, meaning alpha
blending and the stencil and Z buffer tests. These can contribute to the
bottleneck, but typically do not. And, finally, you can have traffic from the
pixel shader and raster module to the frame buffer and back: alpha blending
has to read the frame buffer, and the stencil and Z tests have to read stencil
and Z values. The fragment shader can also read textures, so you can have a
texture bottleneck as you're accessing memory again.
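One practical consequence of the post-TnL cache, sketched below under the same OpenGL assumption as before: indexed drawing is what lets a vertex shared by several triangles be transformed once and then reused from the cache, provided the index order keeps reuse close together (cache-aware reordering tools such as NVTriStrip existed for exactly this purpose).

```cpp
#include <GL/gl.h>   // assumes an OpenGL 1.5 context; fixed-function vertex arrays for brevity

// Draw an indexed mesh; indices that repeat a vertex hit the post-TnL cache
// instead of re-running vertex transform and lighting.
void DrawIndexedMesh(GLuint vbo, GLuint ibo, GLsizei indexCount) {
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, 0);   // positions only; other attributes omitted
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```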