Grapefruit: An Open-Source, Full-Stack, and
Customizable Automata Processing on FPGAs
Reza Rahimi¹, Elaheh Sadredini², Mircea Stan¹, Kevin Skadron²
¹Department of Electrical & Computer Engineering, ²Department of Computer Science
University of Virginia, Charlottesville, Virginia, USA
Email: {rahimi, elaheh, mircea, skadron}@virginia.edu
Abstract—Regular expressions have been widely used in var-
ious application domains such as network security, machine
learning, and natural language processing. Increasing demand for accelerated regular expression matching, or equivalently, finite automata processing, has motivated many efforts to design FPGA accelerators.
However, no existing framework is at once publicly available, comprehensive, parameterizable, general, full-stack, and easy to use for design-space exploration across the growing range of pattern-matching applications on FPGAs. In this paper,
we present Grapefruit, the first open-source, full-stack, efficient,
scalable, and extendable automata processing framework on
FPGAs. Grapefruit is equipped with an integrated, highly parameterizable compiler that supports automata simulation, verification, minimization, transformation, and optimization. Our modular and
standard design allows researchers to add capabilities and explore
various features for a target application. Our experimental results show that the hardware generated by Grapefruit performs 9%-80% better than prior work (which is not fully end-to-end), and that its multi-stride solution achieves 3.4× higher throughput than a single-stride solution.
I. INTRODUCTION
Finite automata are an efficient computational model for widely used pattern-recognition languages such as regular expressions, with established applications in network security [1], [2] and log analysis [3], and newly demonstrated applications in domains such as data mining [4], [5], [6], [7], bioinformatics [8], [9], machine learning [10], [11], natural language processing [12], [13], and big data analytics [14], all of which have been shown to benefit greatly from accelerated automata processing.
Researchers are increasingly exploiting hardware accelera-
tors to meet demanding real-time requirements as performance
growth in conventional processors is slowing. In particular,
several FPGA-based regex implementations for single-stride
[15], [16], [17], [18], [19] and multi-stride [20], [21], [22],
[23] automata processing have been proposed to improve the
performance of regex matching. These solutions provide a
reconfigurable substrate to lay out the rules in hardware by
placing-and-routing automata states and connections onto a
pool of hardware units in logic- or memory-based fabrics. This
allows a large number of automata to be executed in parallel,
up to the hardware capacity, in contrast to von Neumann
architectures such as CPUs that must handle one rule at a time
in each core. Most of the current FPGA solutions are inspired
by network applications such as Network Intrusion Detection
Systems (NIDS). However, patterns in other applications can
have different structure and behavior (e.g., higher fan-out), which makes it difficult for NIDS-based FPGA solutions to map such automata onto FPGA resources efficiently [18], [24], [25].
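To make this processing model concrete, the following Python sketch (a software model we provide only for illustration; it is not Grapefruit's RTL or API) mimics one clock cycle of such a spatial design: each homogeneous state element stores a symbol match set and the identities of its predecessor states, the input symbol is broadcast to all elements, and every element updates independently, so in hardware all of them can update at once:

# Hypothetical software model of one processing cycle in a spatial,
# AP-style design; names and structure are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StateElement:
    match_set: frozenset      # symbols this homogeneous state matches
    predecessors: tuple       # indices of states wired into this one
    start: bool = False       # start states are enabled on every cycle
    reporting: bool = False   # reporting (accepting) state

def step(states, active, symbol):
    # active[i] is the activation bit of state i; returns the next activation
    # bits and the indices of reporting states that fired this cycle.
    nxt, reports = [], []
    for i, s in enumerate(states):            # in hardware: all i in parallel
        enabled = s.start or any(active[p] for p in s.predecessors)
        nxt.append(enabled and symbol in s.match_set)
        if nxt[-1] and s.reporting:
            reports.append(i)
    return nxt, reports

# The pattern "ab" as two state elements: element 0 matches 'a' and is a
# start state; element 1 matches 'b', follows element 0, and reports.
states = [StateElement(frozenset("a"), (), start=True),
          StateElement(frozenset("b"), (0,), reporting=True)]
active = [False, False]
for ch in "xabx":
    active, reports = step(states, active, ch)
    if reports:
        print("report on symbol", repr(ch), "from states", reports)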
To enable architectural research, trade-off analysis, and
performance comparison with other architectures on the grow-
ing range of applications, an open-source, full-stack, param-
eterized, optimized, scalable, easy-to-use, and easy-to-verify
framework for automata processing is required. REAPR [15] is
a reconfigurable engine for automata processing that generates FPGA configurations following a processing model very similar to that of the Micron Automata Processor (AP) [26]. The RTL generated from the automata graph is a flat design, which leads to very long compilation times. Because of this flat-design approach, the solution does not scale, and synthesis fails for larger designs. Moreover, REAPR only
generates the matching kernel and does not provide a full-stack
solution or even the automata reporting architecture.
Bo et al. [27] extend REAPR and provide an end-to-end
solution on FPGAs using SDAccel for the I/O. However,
their I/O design has two issues. First, the input stream must be segmented into limited-size chunks. Second, the reporting structure is very simple: whenever any state generates a report, a long vector whose size equals the total number of reporting states, and which is mostly filled with zeroes, is read and sent to the host. This reporting architecture may become a
bottleneck for applications with frequent but sparse reporting,
which is a common reporting behavior [28]. Casias et al. [29]
also extended REAPR and proposed a tree-shaped hierarchical
pipeline architecture. However, their solution generates the
HDL code for only the kernel and does not provide a full-
stack solution (i.e., broadcasting input symbols to the logic elements and getting the reporting data out of the FPGA chip).
Furthermore, their source code is not publicly available.
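To make the cost of such a dense reporting scheme concrete, the short Python sketch below contrasts a report vector with one bit per reporting state (read out whenever any state fires, as described above) with a compact per-report record; the vector size and function names are assumptions chosen only for illustration:

# Illustrative comparison (hypothetical parameters, not any tool's actual code)
# of dense report vectors versus compact report records under sparse reporting.
NUM_REPORT_STATES = 4096              # assumed number of reporting states

def dense_report(fired):
    # One bit per reporting state, mostly zero, shipped to the host
    # whenever any state reports in a cycle.
    vec = bytearray(NUM_REPORT_STATES // 8)
    for sid in fired:
        vec[sid // 8] |= 1 << (sid % 8)
    return vec                        # 512 bytes per reporting cycle

def compact_report(cycle, fired):
    # One small (cycle, state-id) record per firing state instead.
    return [(cycle, sid) for sid in fired]

# A single firing state costs the full 512-byte vector in the dense format,
# but only one small record in the compact one.
print(len(dense_report([7])), "bytes vs", compact_report(10, [7]))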
Researchers are interested in using a tool that gives them the flexibility to explore design-space parameters comprehensively. For example, in automata processing, the symbol size impacts throughput and hardware cost [30], and none of the prior tools provides support for exploring it. Similarly, the reporting architecture can be a performance bottleneck that significantly reduces throughput [28] (up to 46X stall overhead in the Micron AP). To improve performance, Liu et al. [31]