Programmability
Machine learning is experiencing explosive growth, not only in the size and complexity of
models but also in the diversity of neural network architectures. Even experts find it difficult to
survey the available architectures and choose the appropriate model for a given AI business
problem.
After a deep learning model is coded and trained, it is optimized for a specific runtime
inference environment. NVIDIA addresses the training and inference challenges with two key
tools. For coding, AI-based service developers use CUDA, a parallel computing platform and
programming model for general computing on GPUs. For inference, they use TensorRT,
NVIDIA's programmable inference accelerator.
CUDA helps data scientists by simplifying the steps needed to implement an algorithm on the
NVIDIA platform. TensorRT, the programmable inference accelerator, takes a trained neural
network and optimizes it for runtime deployment. It evaluates different levels of floating-point
and integer precision, such as FP32, FP16, and INT8, so that developers and operations teams
can balance the accuracy the system requires against the performance it can deliver.
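As a rough illustration of that precision trade-off in code, the sketch below uses the TensorRT
Python API (assuming TensorRT 8.x) to import an ONNX model and build an engine with FP16
kernels enabled. The file names are hypothetical placeholders, and INT8 would additionally
require a calibration dataset.

    # Sketch: build a TensorRT engine with reduced (FP16) precision enabled.
    # "model.onnx" and "model.engine" are hypothetical placeholder paths.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    # Import a trained network that was exported to ONNX.
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    # Let TensorRT use FP16 kernels where they improve performance;
    # INT8 would also need trt.BuilderFlag.INT8 plus calibration data.
    config.set_flag(trt.BuilderFlag.FP16)

    # Serialize the optimized engine for runtime deployment.
    engine_bytes = builder.build_serialized_network(network, config)
    with open("model.engine", "wb") as f:
        f.write(engine_bytes)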
Developers can use TensorRT directly from within the TensorFlow framework to optimize
models for AI-based service delivery. TensorRT can also import Open Neural Network
Exchange (ONNX) models from a variety of frameworks, including Caffe2, MXNet, and
PyTorch. While deep learning development still requires coding at a technical level, these
integrations help data scientists make better use of their valuable time.
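The TensorFlow integration mentioned above, commonly known as TF-TRT, can look roughly
like the following sketch, assuming TensorFlow 2.x built with TensorRT support; the
SavedModel directory names are hypothetical placeholders.

    # Sketch: optimize a TensorFlow SavedModel with TF-TRT.
    # "saved_model" and "saved_model_trt" are hypothetical directories.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt_tf

    converter = trt_tf.TrtGraphConverterV2(input_saved_model_dir="saved_model")
    converter.convert()                # replace supported subgraphs with TensorRT ops
    converter.save("saved_model_trt")  # write the optimized SavedModel to disk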
Measuring Programmability
Programmability affects developer productivity and therefore time-to-market. TensorRT
accelerates AI inference on multiple popular frameworks, including Caffe2, Kaldi, MXNet,
PyTorch, and TensorFlow. In addition, TensorRT can ingest CNN, RNN, and MLP networks,
and it offers a Custom Layer API for novel, unique, or proprietary layers, so developers can
implement their own CUDA kernel functions. TensorRT also supports the Python scripting
language, allowing developers to integrate a TensorRT-based inference engine into a Python
development environment, as the sketch below illustrates.
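A minimal sketch of that Python integration follows, assuming a serialized engine built as shown
earlier and the pycuda package for device memory management; the engine path and tensor
shapes are hypothetical placeholders.

    # Sketch: run inference from Python with a serialized TensorRT engine.
    # "model.engine" and the I/O shapes are hypothetical placeholders.
    import numpy as np
    import pycuda.autoinit          # creates a CUDA context on import
    import pycuda.driver as cuda
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    with open("model.engine", "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # Allocate host and device buffers for one input and one output.
    h_input = np.random.random((1, 3, 224, 224)).astype(np.float32)
    h_output = np.empty((1, 1000), dtype=np.float32)
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)

    # Copy the input to the GPU, run the engine, copy the result back.
    cuda.memcpy_htod(d_input, h_input)
    context.execute_v2([int(d_input), int(d_output)])
    cuda.memcpy_dtoh(h_output, d_output)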
Programmability in Action
Baker Hughes, a GE company (BHGE), is a leading oilfield services company. It helps oil and
gas companies in all aspects of exploration, extraction, processing, and delivery. At each step of
this process, AI can help oil and gas companies better understand the massive volumes of data
their operations create. Each type of business need can call for a different type of deep learning
model, which means programmers must be able to efficiently implement, test, and instantiate
multiple models.
BHGE uses CUDA and TensorRT to create the deep learning models that help its customers
identify and locate oil and gas resources. BHGE also uses NVIDIA hardware, including DGX-1
servers for model training and DGX Stations at the deskside or on remote offshore platforms for
inference.