Glow: Graph Lowering Compiler Techniques for
Neural Networks
Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Summer Deng, Roman Dzhabarov,
James Hegeman, Roman Levenstein, Bert Maher, Satish Nadathur, Jakob Olesen,
Jongsoo Park, Artem Rakhov, Misha Smelyanskiy
Facebook
Abstract
This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation. The high-level intermediate representation allows the optimizer to perform domain-specific optimizations. The lower-level instruction-based address-only intermediate representation allows the compiler to perform memory-related optimizations, such as instruction scheduling, static memory allocation, and copy elimination. At the lowest level, the optimizer performs machine-specific code generation to take advantage of specialized hardware features. Glow features a lowering phase that enables the compiler to support a large number of input operators as well as a large number of hardware targets by eliminating the need to implement all operators on all targets. The lowering phase is designed to reduce the input space and allow new hardware backends to focus on a small number of linear algebra primitives.
1 Introduction
The end of power savings due to Moore's Law, combined with the increased demand for compute power driven by machine learning, has led to a wave of innovation in computer architecture. Hennessy and Patterson [1] present five principles that guide the design of machine-learning domain-specific architectures (DSAs): dedicated local memories, large numbers of arithmetic units, simple forms of parallelism, reduced bitwidths, and domain-specific programming models. Compilers need to perform advanced whole-graph optimizations in order to execute neural networks efficiently on DSAs. In this paper we describe some of these techniques.
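As a concrete illustration of the kind of whole-graph optimization alluded to above, consider fusing a convolution with the activation that follows it, so a backend can execute both in one pass over the data. The sketch below is hypothetical and greatly simplified: it models the graph as a linear sequence of operator names rather than a true dataflow graph, and the node names are illustrative, not Glow's actual API.

```python
# Toy whole-graph optimization: fuse each (Conv -> Relu) pair into a
# single ConvRelu operator. A real compiler would pattern-match over a
# dataflow graph; a flat operator list keeps the idea visible.

def fuse_conv_relu(graph):
    """Return a new operator list with Conv followed by Relu fused."""
    out = []
    i = 0
    while i < len(graph):
        if graph[i] == "Conv" and i + 1 < len(graph) and graph[i + 1] == "Relu":
            out.append("ConvRelu")  # fused node replaces the pair
            i += 2
        else:
            out.append(graph[i])
            i += 1
    return out

print(fuse_conv_relu(["Conv", "Relu", "MaxPool", "Conv", "Relu"]))
# -> ['ConvRelu', 'MaxPool', 'ConvRelu']
```

Fusion of this kind is only possible because the compiler sees the whole graph at once, rather than visiting and executing one node at a time.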
Traditional machine learning frameworks iterate over the nodes in the graph and execute them one by one. Unfortunately, this node-visitor method of execution is inefficient, even on traditional processors. As a result, machine learning frameworks have started to hand over the graph to compilers [2] that execute code more efficiently. Given the increasing importance of neural networks, the need for energy efficiency in data centers and on mobile devices, and the design principles of domain-specific architectures, we believe that the machine learning frameworks of the future will focus on providing attractive programming models on top of a layer that integrates compilers for many different targets.
In the Glow project, we focus on the lower parts of the software stack. We work to provide PyTorch [3] and other frameworks with a low-level graph and a code generator for neural networks. The name Glow is an abbreviation for Graph-Lowering, which is the main technique that the compiler uses for generating efficient code. The Glow low-level graph will not replace the machine learning high-level graph, in the same way that the low-level intermediate representation in compilers does not replace the abstract syntax tree. We aim to provide a useful compiler toolkit that will allow hardware developers to focus on implementing efficient acceleration hardware, which may differ widely in capabilities, and to use Glow to automate compilation tasks such as instruction selection, memory allocation, and graph scheduling. The full compiler toolkit is open-source and publicly available.¹
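To make the graph-lowering idea concrete, the sketch below rewrites a single high-level node into the linear algebra primitives a backend would actually have to implement, in the spirit of lowering a fully-connected layer into a matrix multiplication followed by a broadcasted bias addition. Glow itself is written in C++; the node kinds and the `lower` function here are illustrative assumptions, not Glow's real API.

```python
# Hypothetical sketch of graph lowering: a high-level FullyConnected
# node is rewritten into primitive nodes (MatMul and BatchedAdd), so a
# new backend only needs to implement the small primitive set.

class Node:
    def __init__(self, kind, inputs):
        self.kind = kind      # operator name, e.g. "MatMul"
        self.inputs = inputs  # list of producer Nodes

def lower(node):
    """Rewrite a high-level node into primitives; pass primitives through."""
    if node.kind == "FullyConnected":
        data, weights, bias = node.inputs
        matmul = Node("MatMul", [data, weights])
        return Node("BatchedAdd", [matmul, bias])
    return node  # already a primitive

fc = Node("FullyConnected",
          [Node("Input", []), Node("Weights", []), Node("Bias", [])])
lowered = lower(fc)
print(lowered.kind)            # -> BatchedAdd
print(lowered.inputs[0].kind)  # -> MatMul
```

After lowering, the backend never sees `FullyConnected` at all; it only needs to generate code for the handful of primitives that every high-level operator decomposes into.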
2 Related Work
2.1 Relationship to Neural Network Frameworks
Frameworks such as PyTorch [3], Caffe [4], and TensorFlow [5] have found success by providing a useful
¹ http://github.com/pytorch/glow
arXiv:1805.00907v2 [cs.PL] 4 May 2018