PeachPy: A Python Framework for
Developing High-Performance Assembly
Kernels
Marat Dukhan
School of Computational Science & Engineering
Georgia Institute of Technology, Atlanta, GA, USA
Abstract—We introduce PeachPy, a Python frame-
work which aids the development of assembly kernels
for high-performance computing. PeachPy automates
several routine tasks in assembly programming such as
allocating registers and adapting functions to different
calling conventions. By representing assembly instruc-
tions and registers as Python objects, PeachPy enables
developers to use Python for assembly metaprogram-
ming, and thus provides a modern alternative to tradi-
tional macro processors in assembly programming. The
current version of PeachPy supports x86-64 and ARM
architectures.
I. INTRODUCTION
We consider the problem of how to enable produc-
tive assembly language programming. The use of
assembly still plays an important role in develop-
ing performance-critical computational kernels in
high-performance computing. For instance, recent
studies have shown how the performance of many
computations in dense linear algebra depends
critically on a relatively small number of highly tuned
implementations of microkernel code [1], for which
high-level compilers produce only suboptimal
implementations [2]. In cases like these, manual
low-level programming may be the only option.
Unfortunately, existing mainstream tools for
assembly-level programming remain primitive: they are
tedious and time-consuming to use compared to
higher-level programming models and languages.
Our goal is to ease assembly programming. In
particular, we wish to enable an assembly pro-
grammer to build high-performing code for a va-
riety of operations (i.e., not just for linear algebra),
data types, and processors, and to do so using a
relatively small amount of assembly code, com-
bined with easy-to-use metaprogramming facilities
and nominal automation of routine tasks, provided
the automation does not hurt performance. Toward this end,
we are developing PeachPy, a new Python-based
framework that aids the development of high-
performance assembly kernels.
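To make the idea of assembly metaprogramming concrete, the following toy sketch shows how a single Python function might emit the body of a vector-add kernel for two data types and an arbitrary unroll factor. It is an illustration under stated assumptions, not PeachPy's actual interface: the `Register` and `Instruction` classes and the `vector_add_body` function are hypothetical names introduced here, and the sketch assumes x86-64 SSE instructions in Intel operand order, with `rdi`, `rsi`, and `rdx` holding the output and the two input pointers.

```python
# Toy illustration only: these classes are hypothetical and are NOT PeachPy's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    """A register represented as a plain Python object."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Instruction:
    """An instruction represented as a mnemonic plus a tuple of operands."""
    mnemonic: str
    operands: tuple

    def __str__(self):
        return f"{self.mnemonic} " + ", ".join(str(op) for op in self.operands)


def vector_add_body(element_type, unroll):
    """Emit an unrolled SSE vector-add body for the given element type."""
    # One Python function covers several data types by selecting the mnemonic.
    add = {"float": "ADDPS", "double": "ADDPD"}[element_type]
    body = []
    for i in range(unroll):
        a = Register(f"xmm{2 * i}")      # holds 16 bytes of the first input
        b = Register(f"xmm{2 * i + 1}")  # holds 16 bytes of the second input
        offset = 16 * i                  # each XMM register is 16 bytes wide
        body.append(Instruction("MOVUPS", (a, f"[rsi + {offset}]")))
        body.append(Instruction("MOVUPS", (b, f"[rdx + {offset}]")))
        body.append(Instruction(add, (a, b)))
        body.append(Instruction("MOVUPS", (f"[rdi + {offset}]", a)))
    return body
```

Calling `print("\n".join(map(str, vector_add_body("double", 4))))` would print a four-way unrolled double-precision variant. A Python loop and a dictionary lookup here stand in for what a traditional macro processor would express with macro repetition and conditional assembly.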
PeachPy joins a vast pool of tools that blend
Python and code generation. Code generation is
used by Python programs for various reasons and
use cases. Some projects, such as PyPy [3] and
Cython [4], use code generation to improve the
performance of Python code itself. Cython achieves
this goal by statically compiling Python sources to
machine code (via a C compiler) and providing a
syntax to specify types of Python variables. PyPy
instead relies on JIT-compilation and type inference.
Another group of code generation tools
comprises Python bindings to widely used
general-purpose code-generation targets. This group
includes the LLVM-Py [5], PyCUDA, and PyOpenCL [6]
projects. In essence, these examples focus on
accelerating Python through low-level code generation.
By contrast, we are interested primarily in the
task of generating assembly code using Python.
This style of Python-assisted assembly program-
ming was pioneered by CorePy [7], the works of
Malas et al. [8], Dongarra and Luszczek [9], and
several other projects [10, 11]. CorePy [7] enabled
a programmer to write an assembly
program in Python and compile it from the Python
interpreter. The authors suggested using CorePy
to optimize performance-sensitive parts of a Python
application. Malas et al. [8] used Python to
auto-tune assembly for the PowerPC 450 processors
powering Blue Gene/P supercomputers. Their assembly
framework could simulate the processor pipeline
and reschedule instructions to avoid data hazards
and improve performance. The assembly
code generator of Dongarra and Luszczek [9] focuses on
assembly programmer productivity, and is the
most similar in spirit to PeachPy. The primary
use case for their code generator is cross-compilation
to ARM, but it also supports a debug mode,
in which the code generator outputs equivalent C
code.
The audience for PeachPy is an optimization
expert writing multiple but similar kernels in as-
PyHPC 2013, November 18, 2013, Denver, Colorado, USA