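TensorRT Developer Guide: Using the C++ and Python APIs

"TensorRT Developer Guide, for TensorRT 7.0.0, which introduces how to use TensorRT efficiently to optimize deep learning inference."

TensorRT is a high-performance deep learning inference optimizer and runtime from NVIDIA that enables fast, accurate deployment of deep learning models. This developer guide describes TensorRT's features, how it works, and how to build and run models through the C++ and Python APIs.

### Chapter 1: What Is TensorRT?

1.1 **Benefits of TensorRT** TensorRT's main benefit is higher inference speed and efficiency while preserving accuracy. By analyzing and optimizing the model ahead of time and integrating tightly with the hardware, it can run inference faster than the original training framework.

1.1.1 **Who Can Benefit From TensorRT** Developers, researchers, and enterprises can all benefit, particularly those who deploy deep learning models in production and have demanding performance requirements.

1.2 **Where TensorRT Fits** TensorRT sits between the trained model and the application, converting a trained model into an engine that executes efficiently on the GPU.

1.3 **How TensorRT Works** TensorRT parses the model's computation graph, performs static shape analysis, eliminates redundant operations, and uses low-level math libraries to run the remaining operations efficiently on the hardware.

1.4 **Capabilities TensorRT Provides** Capabilities include model optimization, reduced-precision quantization, and memory management.

1.5 **How To Get TensorRT** The TensorRT SDK, which includes the libraries, header files, and documentation, can be downloaded from NVIDIA's official website.

### Chapter 2: Using The C++ API

2.1 **Instantiating TensorRT Objects In C++** The C++ API is used to create and manage objects such as the network definition, the parsers, and the engine builder.

2.2 **Creating A Network Definition In C++** A network can be created from scratch or imported from a pre-trained model through a parser. Supported model formats include Caffe, TensorFlow UFF, and ONNX.

2.2.1 **Creating A Network From Scratch Using The C++ API** Developers can build the computation graph by hand, defining the layers, their connections, and the outputs.

2.2.2 **Importing A Model Using A Parser In C++** A parser populates the network from a trained model, which simplifies network construction.

2.2.3 **Importing A Model Using The C++ Caffe Parser API** A dedicated API parses Caffe models and converts them into a network that TensorRT understands.

2.2.4 **Importing A TensorFlow Model Using The C++ UFF Parser API** TensorFlow models are first converted to the Universal Framework Format (UFF) and then imported.

2.2.5 **Importing A Model Using The C++ ONNX Parser API** Models in ONNX (Open Neural Network Exchange) format can be imported directly.

2.3 **Building An Engine In C++** The builder optimizes the network definition and produces an executable inference engine.

2.4 **Serializing A Model In C++** An engine can be serialized to disk so that it can be stored and loaded later.

2.5 **Performing Inference In C++** The API provides interfaces for running inference: supplying input data and retrieving the output.

2.6 **Memory Management In C++** The API provides mechanisms for controlling memory allocation and release so that GPU resources are used efficiently.

2.7 **Refitting An Engine** TensorRT allows an engine to be refitted with new weights without rebuilding it.

### Chapter 3: Using The Python API

3.1 **Importing TensorRT Into Python** Python developers use the TensorRT Python interface through `import tensorrt`, as with any other library.

3.2 **Creating A Network Definition In Python** The Python API offers interfaces similar to the C++ ones for creating networks, building engines, and running inference.

3.3 **Remaining Sections** The rest of the chapter covers model import, engine building, serialization, inference, and memory management in Python.

Beyond the basic concepts, the TensorRT Developer Guide provides many code examples that help developers understand and use TensorRT to improve the inference performance of deep learning models on platforms such as NVIDIA Jetson.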

Using The C++ API

www.nvidia.com

TensorRT Developer's Guide SWE-SWDOCTRT-001-DEVG_vTensorRT 7.0.0|12

‣ by reading the serialized engine from the disk. In this case, the performance is better, since the steps of parsing the model and creating intermediate objects are bypassed.

An object of type ILogger needs to be created globally. It is used as an argument to various methods of the TensorRT API. A simple example demonstrating the creation of the logger is shown here:

class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        // suppress info-level messages
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

A global TensorRT API method called createInferBuilder(gLogger) is used to create an object of type IBuilder. For more information, see the IBuilder class reference.

A method called createNetwork defined for IBuilder is used to create an object of type INetworkDefinition.

One of the available parsers is created (Caffe, ONNX, or UFF) using the INetworkDefinition as the input:

‣ ONNX: auto parser = nvonnxparser::createParser(*network, gLogger);
‣ Caffe: auto parser = nvcaffeparser1::createCaffeParser();
‣ UFF: auto parser = nvuffparser::createUffParser();

A method called parse() from the object of type IParser is called to read the model file and populate the TensorRT network.

A method called buildCudaEngine() of IBuilder is called to create an object of ICudaEngine type.

The engine can be optionally serialized and dumped into a file.

The execution context is used to perform inference.

If the serialized engine is preserved and saved to a file, you can bypass most of the steps described above.

A global TensorRT API method called createInferRuntime(gLogger) is used to create an object of type IRuntime.

The rest of the inference is identical for those two usage models.
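The disk round trip behind the serialized-engine path can be sketched with the standard library alone. This is a minimal sketch, not the guide's own code: writeBlob and readBlob are hypothetical helper names, and no TensorRT call is made so that the block compiles without the SDK. In a real program the bytes written would come from the IHostMemory returned by ICudaEngine::serialize(), and the buffer read back would be passed to IRuntime::deserializeCudaEngine().

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Write a serialized engine blob to disk (hypothetical helper).
// With TensorRT, `data` and `size` would come from ICudaEngine::serialize().
inline void writeBlob(const std::string& path, const void* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot open " + path + " for writing");
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    if (!out)
        throw std::runtime_error("write to " + path + " failed");
}

// Read the whole file back into host memory (hypothetical helper).
// The returned buffer is what would be handed to the runtime for deserialization.
inline std::vector<char> readBlob(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        throw std::runtime_error("cannot open " + path + " for reading");
    const std::streamsize size = in.tellg();
    in.seekg(0, std::ios::beg);
    std::vector<char> blob(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(blob.data(), size))
        throw std::runtime_error("read from " + path + " failed");
    return blob;
}
```

Reading the file in one pass keeps the blob contiguous in host memory, which is the form a pointer-plus-size deserialization call expects.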

The builder or runtime will be created with the GPU context associated with the creating thread. Although a default context will be created if one does not already exist, it is advisable to create and configure the CUDA context before creating a runtime or builder object.


2.2. Creating A Network Definition In C++

The first step in performing inference with TensorRT is to create a TensorRT network from your model.

The easiest way to achieve this is to import the model using the TensorRT parser library, which supports serialized models in the formats demonstrated by the following samples:

‣ “Hello World” For TensorRT (sampleMNIST), located in the GitHub repository (Caffe, both BVLC and NVCaffe)
‣ “Hello World” For TensorRT From ONNX (sampleOnnxMNIST), located in the GitHub repository
‣ Import A TensorFlow Model And Run Inference (sampleUffMNIST), located in the GitHub repository (used for TensorFlow)

An alternative is to define the model directly using the TensorRT API. This requires you to make a small number of API calls to define each layer in the network graph and to implement your own import mechanism for the model’s trained parameters.

In either case, you must explicitly tell TensorRT which tensors are required as outputs of inference. Tensors that are not marked as outputs are considered to be transient values that may be optimized away by the builder. There is no restriction on the number of output tensors; however, marking a tensor as an output may prohibit some optimizations on that tensor.

Input and output tensors must also be given names (using ITensor::setName()). At inference time, you will supply the engine with an array of pointers to input and output buffers. To determine the order in which the engine expects these pointers, you can query using the tensor names.
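The name-based ordering described above can be illustrated without the SDK. In this sketch, makeBindings and BindingLookup are hypothetical names; the lookup callback stands in for ICudaEngine::getBindingIndex(), and nbBindings stands in for ICudaEngine::getNbBindings(). It is an illustration of the idea under those assumptions, not code from the guide.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a name-to-slot query such as ICudaEngine::getBindingIndex():
// returns the slot the engine expects for a tensor name, or -1 if unknown.
using BindingLookup = std::function<int(const std::string&)>;

// Place each named buffer into the slot reported by `lookup` (hypothetical helper).
// Returns an empty vector if any name is unknown or out of range.
inline std::vector<void*> makeBindings(
    int nbBindings,
    const BindingLookup& lookup,
    const std::vector<std::pair<std::string, void*>>& namedBuffers)
{
    assert(nbBindings >= 0);
    std::vector<void*> bindings(static_cast<std::size_t>(nbBindings), nullptr);
    for (const auto& entry : namedBuffers)
    {
        const int index = lookup(entry.first);
        if (index < 0 || index >= nbBindings)
            return {}; // unknown tensor name: signal failure
        bindings[static_cast<std::size_t>(index)] = entry.second;
    }
    return bindings;
}
```

Because the slots are resolved by name, the caller can list its buffers in any order and the resulting array still matches what the engine expects.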

An important aspect of a TensorRT network definition is that it contains pointers to model weights, which are copied into the optimized engine by the builder. If a network was created via a parser, the parser will own the memory occupied by the weights, and so the parser object should not be deleted until after the builder has run.
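The ownership rule above applies equally when you supply weights yourself: the network holds only pointers, so the host buffers must stay alive until the builder has finished. The sketch below shows one way to arrange that. WeightsView is a hypothetical stand-in for nvinfer1::Weights (a pointer plus an element count), and WeightStore is an illustrative owner class; neither is part of TensorRT.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for nvinfer1::Weights: a non-owning view of host memory.
struct WeightsView
{
    const void* values; // pointer into host memory owned elsewhere
    std::int64_t count; // number of elements
};

// Owns the host buffers so every view handed out stays valid until the store
// is destroyed, which should happen only after the builder has run.
class WeightStore
{
public:
    WeightsView add(const std::string& name, std::vector<float> data)
    {
        assert(!name.empty());
        std::vector<float>& stored = mBuffers[name];
        stored = std::move(data);
        return WeightsView{stored.data(), static_cast<std::int64_t>(stored.size())};
    }

    WeightsView get(const std::string& name) const
    {
        const auto it = mBuffers.find(name);
        if (it == mBuffers.end())
            return WeightsView{nullptr, 0};
        return WeightsView{it->second.data(), static_cast<std::int64_t>(it->second.size())};
    }

private:
    // std::map is node-based, so adding new entries never moves existing buffers.
    std::map<std::string, std::vector<float>> mBuffers;
};
```

Keeping the store in the same scope as the builder, and destroying it afterwards, mirrors the advice not to delete the parser until the engine has been built.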

2.2.1. Creating A Network Definition From Scratch Using The C++ API

Instead of using a parser, you can also define the network directly to TensorRT via the network definition API. This scenario assumes that the per-layer weights are ready in host memory to pass to TensorRT during the network creation.

In the following example, we will create a simple network with Input, Convolution, Pooling, FullyConnected, Activation and SoftMax layers. To see the code in totality, refer to Building A Simple MNIST Network Layer By Layer (sampleMNISTAPI) located in the opensource/sampleMNISTAPI directory in the GitHub repository.

Create the builder and the network:

IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetwork();

Add the Input layer to the network, with the input dimensions. A network can have multiple inputs, although in this sample there is only one:

auto data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{1, INPUT_H, INPUT_W});

Add the Convolution layer with the number of output feature maps, kernel size, stride, and weights for filter and bias. addInput already returns a tensor, so it is passed directly; for later layers, getOutput(0) retrieves the tensor reference from the layer:

auto conv1 = network->addConvolution(*data, 20, DimsHW{5, 5}, weightMap["conv1filter"], weightMap["conv1bias"]);
conv1->setStride(DimsHW{1, 1});

Weights passed to TensorRT layers are in host memory.

Add the Pooling layer:

auto pool1 = network->addPooling(*conv1->getOutput(0), PoolingType::kMAX, DimsHW{2, 2});
pool1->setStride(DimsHW{2, 2});

Add the FullyConnected and Activation layers:

auto ip1 = network->addFullyConnected(*pool1->getOutput(0), 500, weightMap["ip1filter"], weightMap["ip1bias"]);
auto relu1 = network->addActivation(*ip1->getOutput(0), ActivationType::kRELU);

Add the SoftMax layer to calculate the final probabilities and set it as the output:

auto prob = network->addSoftMax(*relu1->getOutput(0));
prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);

Mark the output:

network->markOutput(*prob->getOutput(0));
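To check that the layer parameters above fit together, the tensor sizes can be worked out by hand. The sketch below does that arithmetic at compile time. It assumes INPUT_H = INPUT_W = 28 (MNIST-sized images, a value not stated in this section), no padding, and the usual floor-rounded window formula; outputExtent and the k-prefixed constants are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Output extent of a convolution or pooling window along one spatial axis,
// with floor rounding: (input + 2 * padding - window) / stride + 1.
constexpr int outputExtent(int input, int window, int stride, int padding = 0)
{
    return (input + 2 * padding - window) / stride + 1;
}

// Assumption: INPUT_H = INPUT_W = 28 (MNIST-sized input).
constexpr int kInputH = 28;
constexpr int kInputW = 28;

// conv1: 20 output maps, 5x5 kernel, stride 1, no padding -> 20 x 24 x 24
constexpr int kConvH = outputExtent(kInputH, 5, 1);
constexpr int kConvW = outputExtent(kInputW, 5, 1);

// pool1: 2x2 max pooling, stride 2 -> 20 x 12 x 12
constexpr int kPoolH = outputExtent(kConvH, 2, 2);
constexpr int kPoolW = outputExtent(kConvW, 2, 2);

// Number of values the FullyConnected layer receives per image.
constexpr int kFcInputs = 20 * kPoolH * kPoolW;

static_assert(kConvH == 24 && kConvW == 24, "conv1 output should be 24 x 24");
static_assert(kPoolH == 12 && kPoolW == 12, "pool1 output should be 12 x 12");
static_assert(kFcInputs == 2880, "ip1 receives 20 * 12 * 12 values");
```

Under these assumptions the ip1 filter would need 500 x 2880 weights, which is a quick sanity check on the size of the buffer behind weightMap["ip1filter"] before the builder runs.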

2.2.2. Importing A Model Using A Parser In C++

The builder must be created before the network because it serves as a factory for the network. Different parsers have different mechanisms for marking network outputs.

To import a model using the C++ Parser API, you will need to perform the following high-level steps:

Create the TensorRT builder and network.

IBuilder* builder = createInferBuilder(gLogger);
nvinfer1::INetworkDefinition* network = builder->createNetwork();

For an example on how to create the logger, see Instantiating TensorRT Objects in C++.

Create the TensorRT parser for the specific format.

ONNX

auto parser = nvonnxparser::createParser(*network, gLogger);


2.2.4. Importing A TensorFlow Model Using The C++ UFF Parser API

The following steps illustrate how to import a TensorFlow model using the C++ Parser API.

For new projects, it’s recommended to use the TF-TRT integration as a method for converting your TensorFlow network to use TensorRT for inference. For integration instructions, see Accelerating Inference In TF-TRT User Guide.

Importing from the TensorFlow framework requires you to convert the TensorFlow model into the intermediate format UFF (Universal Framework Format). For more information about the conversion, see Converting A Frozen Graph To UFF.

For more information about the UFF import, see Importing A TensorFlow Model And Running Inference (sampleUffMNIST) located in the GitHub repository.

Create the builder and network:

IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetwork();

Create the UFF parser:

IUffParser* parser = createUffParser();

Declare the network inputs and outputs to the UFF parser:

parser->registerInput("Input_0", DimsCHW(1, 28, 28), UffInputOrder::kNCHW);
parser->registerOutput("Binary_3");

Parse the imported model to populate the network:

parser->parse(uffFile, *network, nvinfer1::DataType::kFLOAT);

2.2.5. Importing An ONNX Model Using The C++ Parser API

The following steps illustrate how to import an ONNX model using the C++ Parser API.

In general, the newer version of the ONNX Parser is designed to be backward compatible up to opset 7. There could be some exceptions when the changes were not backward compatible. In this case, convert the earlier ONNX model file into a later supported version. For more information on this subject, see ONNX Model Opset Version Converter.

It is also possible that the user model was generated by an exporting tool supporting later opsets than supported by the ONNX parser shipped with TensorRT. In this case, check whether the latest version of TensorRT released to GitHub, onnx-tensorrt, supports the required version. The supported version is defined by the
