
Workshop track - ICLR 2018
DLVM: A MODERN COMPILER INFRASTRUCTURE FOR
DEEP LEARNING SYSTEMS
Richard Wei
Departments of Computer Science & Linguistics
University of Illinois at Urbana-Champaign
Urbana, IL 61801
xwei12@illinois.edu
Lane Schwartz
Department of Linguistics
University of Illinois at Urbana-Champaign
Urbana, IL 61801
lanes@illinois.edu
Vikram Adve
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL 61801
vadve@illinois.edu
ABSTRACT
Deep learning software demands reliability and performance. However, many existing
deep learning frameworks are software libraries that act as an unsafe DSL embedded
in Python paired with a computation graph interpreter. We present DLVM, a design
and implementation of a compiler infrastructure with a linear algebra intermediate
representation, algorithmic differentiation by adjoint code generation, domain-
specific optimizations, and a code generator targeting GPUs via LLVM. Designed
as a modern compiler infrastructure inspired by LLVM, DLVM is more modular
and more generic than existing deep learning compiler frameworks, and supports
tensor DSLs with high expressivity. With our prototypical staged DSL embedded
in Swift, we argue that the DLVM system enables modular, safe, and performant
frameworks for deep learning.
1 INTRODUCTION
Within the deep learning community, most current approaches to neural networks make use of
high-level frameworks with a tensor domain-specific language (DSL) such as Torch (Collobert et al.,
2011), TensorFlow (Abadi et al., 2016), PyTorch (PyTorch Development Team, 2016), and MXNet
(Chen et al., 2015). Traditionally, developers would build a computation graph (or dynamically
generate graph nodes) using a DSL and let the framework interpret the computation graph on parallel
architectures such as NVIDIA GPUs. While using hand-tuned GPU subroutines usually yields the
best performance for complex operators, advanced compiler techniques can be applied to simplify
computation, merge high-level operators based on shaping conditions, and fuse compatible element-
wise operators into a single kernel to minimize the latency between kernel launches. Recent projects
such as the TensorFlow XLA compiler (Leary & Wang, 2017) and the NNVM compiler (NNVM, 2017),
which includes TVM (Chen et al., 2017), have begun to apply compiler techniques to deep learning
systems, targeting LLVM (Lattner & Adve, 2004) and various back-ends to achieve good performance.
However, their design and implementation have not entirely followed the established best practices
of widely used industrial compiler frameworks.
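To make the payoff of fusion concrete, consider the following minimal Swift sketch (purely
illustrative; the function names and the CPU array-based setup are hypothetical, not DLVM's API
or IR). The unfused version runs three separate passes, analogous to three kernel launches with
intermediate buffers; the fused version computes the same result in a single pass:

    // Unfused: each operator is a separate pass ("kernel") with a temporary buffer.
    func unfused(_ x: [Float], _ y: [Float]) -> [Float] {
        let scaled = x.map { $0 * 2 }                // pass 1: scale
        let summed = zip(scaled, y).map { $0 + $1 }  // pass 2: add
        return summed.map { max(0, $0) }             // pass 3: ReLU
    }

    // Fused: one pass over the data, no intermediate buffers.
    func fused(_ x: [Float], _ y: [Float]) -> [Float] {
        var out = [Float](repeating: 0, count: x.count)
        for i in x.indices {
            out[i] = max(0, x[i] * 2 + y[i])         // scale + add + ReLU in one step
        }
        return out
    }

On a GPU, a fusing compiler performs the analogous rewrite across kernels, eliminating both the
intermediate tensors and the launch latency between them.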
Moreover, some frameworks use operator-overloading algorithmic differentiation (AD) to compute
gradients, which leaves the gradient computation opaque to optimization. The other approach to AD,
source code transformation, can produce more efficient code. While frameworks such as TensorFlow
already perform AD as a graph transformation and apply various optimizations, their AD transformation
is implemented as part of the DSL library rather than as a transformation pass in the pipeline of
their compiler framework. Making AD part of the compiler framework would greatly simplify the
development of DSLs, achieving separation of concerns.
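As an illustration of the difference, the following Swift sketch shows, for a scalar function, the
kind of adjoint code a source-code-transformation AD pass would generate (hand-written here; the
names are hypothetical, and DLVM's actual adjoint generation operates on its tensor IR rather than
on Swift source):

    // Primal: f(x, w) = (x * w)^2
    func f(_ x: Float, _ w: Float) -> Float {
        let t = x * w
        return t * t
    }

    // Adjoint: given the seed dy = dL/df, propagate gradients backwards
    // through each primal statement in reverse order.
    func fAdjoint(_ x: Float, _ w: Float, _ dy: Float) -> (dx: Float, dw: Float) {
        let t = x * w        // recompute (or reuse) the primal intermediate
        let dt = dy * 2 * t  // d(t * t)/dt = 2t
        let dx = dt * w      // d(x * w)/dx = w
        let dw = dt * x      // d(x * w)/dw = x
        return (dx, dw)
    }

Because the adjoint is ordinary code rather than a trace of overloaded operators, it can flow
through the same optimization passes as any other function.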