1. It was done in C. Segmentation faults were occurring every day.
2. There were macros everywhere to handle different data types in the
same way. For example, executing SUM(column_int32) is not the same as
executing SUM(column_int64). This not only caused problems when programming,
because a lot of code had to be duplicated, but it also had an impact on
execution time, since for each value produced, the function for the
corresponding data type had to be called. Basically the same cost as a
virtual table in C++ (not to be confused with the "Virtual Tables" of
SQLite, which are a completely different thing).
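The cost of that per-value dispatch can be sketched in C++ (this is an illustration of the problem, not VTor's actual code): a generic column interface pays one virtual call per value, while a type-specialized loop does not.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical generic column interface: SUM over it must make one
// virtual call per value, which the CPU cannot inline or pipeline well.
struct Column {
    virtual ~Column() = default;
    virtual int64_t valueAt(size_t i) const = 0;  // dispatched per value
    virtual size_t size() const = 0;
};

struct Int32Column : Column {
    std::vector<int32_t> data;
    int64_t valueAt(size_t i) const override { return data[i]; }
    size_t size() const override { return data.size(); }
};

// Generic SUM: a virtual call for every single value.
int64_t sumGeneric(const Column& col) {
    int64_t acc = 0;
    for (size_t i = 0; i < col.size(); ++i)
        acc += col.valueAt(i);
    return acc;
}

// Type-specialized SUM: the tight loop that the macros (or generated
// code) produce for each concrete type, with no per-value dispatch.
template <typename T>
int64_t sumTyped(const std::vector<T>& data) {
    int64_t acc = 0;
    for (T v : data) acc += v;
    return acc;
}
```

Writing `sumTyped` once per type is exactly the code duplication the macros were hiding.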
It was at this point that we learned how a database engine really works. It turns out that
what you learn in college about relational algebra and those kinds of things is not an
invention and can be used in real life!
We studied the paper "Efficiently Compiling Efficient Query Plans for Modern Hardware"
[38]. The main problem, the authors say, is the dispatch of methods at run time: the same
problem we mentioned earlier with the data types! This causes thousands or millions of
virtual function calls during the execution of a query plan, which stalls the execution
pipeline of the CPU.
The solution proposed by this paper is to compile queries to machine code using
LLVM [1], which solves the problem of virtual functions by directly generating the
necessary code for each data type. As an experiment, we started to implement the ideas
of this paper. The main ideas of [38] are:
1. Processing is data centric and not operator centric. Data is processed such that we
can keep it in CPU registers as long as possible. Operator boundaries are blurred to
achieve this goal.
2. Data is not pulled by operators but pushed towards the operators. This results in
much better code and data locality.
3. Queries are compiled into native machine code using the optimizing LLVM compiler
framework.
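The first two ideas can be sketched together in C++ (a minimal sketch of the push model from [38], not Snel's actual code): each operator pushes tuples into its consumer, so a scan, a filter, and an aggregation collapse into a single loop and the running value can stay in a register.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// In the push model the producer drives the loop; there are no per-tuple
// next() calls pulling data up through an operator tree.
using Consumer = std::function<void(int64_t)>;

void scan(const std::vector<int64_t>& column, const Consumer& consume) {
    for (int64_t v : column)  // one tight loop over the data
        consume(v);
}

// Filter and aggregation are just the code inside the consumer: once the
// pipeline is fused, operator boundaries disappear and the accumulator
// can live in a CPU register for the whole scan.
int64_t sumWherePositive(const std::vector<int64_t>& column) {
    int64_t acc = 0;
    scan(column, [&](int64_t v) {
        if (v > 0)     // filter
            acc += v;  // aggregate
    });
    return acc;
}
```

Here `std::function` still carries indirection; the point of compiling with LLVM is that the generated code fuses these steps with no indirection at all.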
An additional advantage was that using relational algebra gave us much more flexibility
in solving queries than VTor did. So, we went on developing and developing.
It took one year, but in the end we were able to finish it. We called this new engine "SNEL",
an acronym for "SQL Native Execution for LLVM". Well, that's the excuse; actually, we
chose Snel because it means "fast" in Dutch and the name seemed right to us.
When developing Snel we made the following decisions:
1. We were going to continue using the mechanism of SQLite virtual tables, which
was already implemented.
2. We were going to use the same VTor tables, so that only the virtual table module
had to be switched from VTor to Snel, and the results had to be the same, with an
improvement in performance. Therefore, we could continue using the mechanism
of mmaping the files containing the column data.
3. We did it in C++, since we had come to hate C and, besides, the LLVM API is in C++.
While C++ brought a few problems of its own, the number of segmentation faults and
memory leaks was greatly reduced. Now at least we had exceptions!
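Decision 2 can be illustrated with a rough POSIX sketch of mmaping a column file and treating it directly as an array of values. The file layout here (raw, native-endian int32 values, nothing else) and the function name are assumptions for illustration, not Snel's actual on-disk format.

```cpp
#include <cstdint>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a column file into memory and view it as an int32 array.
// Assumed layout: the file contains nothing but raw int32 values.
const int32_t* mapInt32Column(const char* path, size_t* countOut) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    void* base = mmap(nullptr, static_cast<size_t>(st.st_size),
                      PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after closing the descriptor
    if (base == MAP_FAILED) return nullptr;
    *countOut = static_cast<size_t>(st.st_size) / sizeof(int32_t);
    return static_cast<const int32_t*>(base);
}
```

The attraction of this scheme is that the operating system pages column data in and out on demand, so the engine never copies it into its own buffers.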