1. It was done in C. Segmentation faults were occurring every day.
2. There were macros everywhere to handle different data types in the
same way. For example, executing SUM(column_int32) is not the same as
executing SUM(column_int64). This not only caused problems when programming,
because a lot of code had to be duplicated, but it also had an impact on
execution time, since for each value produced, the function for the
corresponding data type had to be called. Basically the same cost as a
virtual table in C++ (not to be confused with the "Virtual Tables" of
SQLite, which are a completely different thing).
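The cost of that per-value dispatch can be sketched in C++ (this is an illustration of the problem, not VTor's actual code): a generic column interface pays one virtual call per value, while a type-specialized loop does not.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical generic column interface: SUM over it must make one
// virtual call per value, which the CPU cannot inline or pipeline well.
struct Column {
    virtual ~Column() = default;
    virtual int64_t valueAt(size_t i) const = 0;  // dispatched per value
    virtual size_t size() const = 0;
};

struct Int32Column : Column {
    std::vector<int32_t> data;
    int64_t valueAt(size_t i) const override { return data[i]; }
    size_t size() const override { return data.size(); }
};

// Generic SUM: a virtual call for every single value.
int64_t sumGeneric(const Column& col) {
    int64_t acc = 0;
    for (size_t i = 0; i < col.size(); ++i)
        acc += col.valueAt(i);
    return acc;
}

// Type-specialized SUM: the tight loop that the macros (or generated
// code) produce for each concrete type, with no per-value dispatch.
template <typename T>
int64_t sumTyped(const std::vector<T>& data) {
    int64_t acc = 0;
    for (T v : data) acc += v;
    return acc;
}
```

Writing `sumTyped` once per type is exactly the code duplication the macros were hiding.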
It was at this point that we learned how a database engine really works. It turns out that
what you learn in college about relational algebra and those kinds of things is not an
invention and can be used in real life!
We studied the paper "Efficiently Compiling Efficient Query Plans for Modern Hardware"
[38]. The main problem, the authors say, is the dispatch of methods at run time: the same
problem we mentioned earlier with the data types! This causes thousands or millions of
virtual function calls during the execution of a query plan, which stalls the execution
pipeline of the CPU.
The solution proposed by this paper is to compile queries to machine code using
LLVM [1], which solves the problem of virtual functions by directly generating the
necessary code for each data type. As an experiment, we started to implement the ideas
of this paper. The main ideas of [38] are:
1. Processing is data centric and not operator centric. Data is processed such that we
can keep it in CPU registers as long as possible. Operator boundaries are blurred to
achieve this goal.
2. Data is not pulled by operators but pushed towards the operators. This results in
much better code and data locality.
3. Queries are compiled into native machine code using the optimizing LLVM compiler
framework.
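The first two ideas can be sketched together in C++ (a minimal sketch of the push model from [38], not Snel's actual code): each operator pushes tuples into its consumer, so a scan, a filter, and an aggregation collapse into a single loop and the running value can stay in a register.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// In the push model the producer drives the loop; there are no per-tuple
// next() calls pulling data up through an operator tree.
using Consumer = std::function<void(int64_t)>;

void scan(const std::vector<int64_t>& column, const Consumer& consume) {
    for (int64_t v : column)  // one tight loop over the data
        consume(v);
}

// Filter and aggregation are just the code inside the consumer: once the
// pipeline is fused, operator boundaries disappear and the accumulator
// can live in a CPU register for the whole scan.
int64_t sumWherePositive(const std::vector<int64_t>& column) {
    int64_t acc = 0;
    scan(column, [&](int64_t v) {
        if (v > 0)     // filter
            acc += v;  // aggregate
    });
    return acc;
}
```

Here `std::function` still carries indirection; the point of compiling with LLVM is that the generated code fuses these steps with no indirection at all.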
An additional advantage was that using relational algebra gave us much more flexibility
in solving queries than VTor did. So, we went on developing and developing.
It took one year, but in the end we were able to finish it. We called this new engine "SNEL",
an acronym for "SQL Native Execution for LLVM". Well, that's the excuse; actually, we
chose Snel because it means "fast" in Dutch and the name seemed right to us.
When developing Snel we made the following decisions:
1. We were going to continue using the mechanism of SQLite virtual tables, which
was already implemented.
2. We were going to use the same VTor tables, so that only the virtual table module
had to be switched from VTor to Snel, and the results had to be the same, with an
improvement in performance. Therefore, we could continue using the mechanism
of mmaping the files containing the column data.
3. We did it in C++, since we had come to hate C and, besides, the LLVM API is in C++.
While C++ brought a few problems of its own, the number of segmentation faults and
memory leaks was greatly reduced. Now at least we had exceptions!
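Decision 2 can be illustrated with a rough POSIX sketch of mmaping a column file and treating it directly as an array of values. The file layout here (raw, native-endian int32 values, nothing else) and the function name are assumptions for illustration, not Snel's actual on-disk format.

```cpp
#include <cstdint>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a column file into memory and view it as an int32 array.
// Assumed layout: the file contains nothing but raw int32 values.
const int32_t* mapInt32Column(const char* path, size_t* countOut) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    void* base = mmap(nullptr, static_cast<size_t>(st.st_size),
                      PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after closing the descriptor
    if (base == MAP_FAILED) return nullptr;
    *countOut = static_cast<size_t>(st.st_size) / sizeof(int32_t);
    return static_cast<const int32_t*>(base);
}
```

The attraction of this scheme is that the operating system pages column data in and out on demand, so the engine never copies it into its own buffers.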