Database Query Execution: A Comparative Analysis of Vectorization and Compilation

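"This paper compares vectorized execution with compilation in query execution. Its authors are Juliusz Sompolski, Marcin Zukowski and Peter Boncz, affiliated with VectorWise B.V. and Vrije Universiteit Amsterdam. The paper analyzes how vectorized and compiled strategies behave on modern CPUs for analytical database workloads, and uses three use cases (Project, Select and Hash Join) to expose the strengths and weaknesses of each approach."

Vectorization and compilation are two key techniques in database query execution, each with its own strengths. Vectorized execution processes data in the form of vectors, handling many values per operator call instead of one tuple at a time. This suits modern processors that support SIMD (Single Instruction Multiple Data), where one instruction applies the same operation to several data elements and so raises throughput. The vectorized model is mainly a way to reduce interpretation overhead: it amortizes the interpreter's function calls and jumps over a whole vector and improves instruction code locality.

Compilation, in particular Just-In-Time (JIT) compilation, translates a query into machine code. This removes interpreter overhead altogether and lets the compiler apply further optimizations such as loop unrolling and dead code elimination. According to the paper, many of the benefits of compilation (reduced interpretation overhead, better instruction code locality, opportunities to use SIMD instructions) had previously been obtained by vectorized execution, so the two approaches are complementary rather than mutually exclusive.

From their study inside the Ingres VectorWise database system, the authors conclude that compilation should always be combined with block-wise query execution, in which operators pass blocks of tuples to each other rather than single tuples. The three use cases are classic operations: Project computes the expressions that introduce new columns, Select filters tuples, and Hash Join combines two data sets. One contribution of the paper is identifying three cases where "loop-compilation" strategies are inferior to vectorized execution. The authors therefore propose a careful merging of the two strategies for optimal performance.

Both vectorization and compilation are important means of improving query efficiency. On modern CPU architectures, understanding how to combine them effectively matters to designers and tuners of database systems, and studying specific workloads and operators in depth helps make better use of hardware resources.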

Vectorization vs. Compilation in Query Execution

Juliusz Sompolski

VectorWise B.V.

julek@vectorwise.com

Marcin Zukowski

VectorWise B.V.

marcin@vectorwise.com

Peter Boncz

Vrije Universiteit Amsterdam

p.a.boncz@vu.nl

ABSTRACT

Compiling database queries into executable (sub-) programs provides substantial benefits compared to traditional interpreted execution. Many of these benefits, such as reduced interpretation overhead, better instruction code locality, and providing opportunities to use SIMD instructions, have previously been provided by redesigning query processors to use a vectorized execution model. In this paper, we try to shed light on the question of how state-of-the-art compilation strategies relate to vectorized execution for analytical database workloads on modern CPUs. For this purpose, we carefully investigate the behavior of vectorized and compiled strategies inside the Ingres VectorWise database system in three use cases: Project, Select and Hash Join. One of the findings is that compilation should always be combined with block-wise query execution. Another contribution is identifying three cases where “loop-compilation” strategies are inferior to vectorized execution. As such, a careful merging of these two strategies is proposed for optimal performance: either by incorporating vectorized execution principles into compiled query plans or using query compilation to create building blocks for vectorized processing.

1. INTRODUCTION

Database systems provide many useful abstractions such as data independence, ACID properties, and the possibility to pose declarative complex ad-hoc queries over large amounts of data. This flexibility implies that a database server has no advance knowledge of the queries until run-time, which has traditionally led most systems to implement their query evaluators using an interpretation engine. Such an engine evaluates plans consisting of algebraic operators, such as Scan, Join, Project, Aggregation and Select. The operators internally include expressions, which can be boolean conditions used in Joins and Select, calculations used to introduce new columns in Project, and functions like MIN, MAX and SUM used in Aggregation. Most query interpreters follow the so-called iterator-model (as described in Volcano [5]), in which each operator implements an API that consists of open(), next() and close() methods. Each next() call produces one new tuple, and query evaluation follows a “pull” model in which next() is called recursively to traverse the operator tree from the root downwards, with the result tuples being pulled upwards.

This work is part of an MSc thesis being written at Vrije Universiteit Amsterdam.

The author also remains affiliated with CWI Amsterdam.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Proceedings of the Seventh International Workshop on Data Management on New Hardware (DaMoN 2011), June 13, 2011, Athens, Greece.

It has been observed that the tuple-at-a-time model leads to interpretation overhead: the situation that much more time is spent in evaluating the query plan than in actually calculating the query result. Additionally, this tuple-at-a-time interpretation model particularly affects high-performance features introduced in modern CPUs [13]. For instance, the fact that units of actual work are hidden in the stream of interpreting code and function calls prevents compilers and modern CPUs from getting the benefits of deep CPU pipelining and SIMD instructions, because for these the work instructions should be adjacent in the instruction stream and independent of each other.
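The iterator model and its per-tuple overhead can be made concrete with a small sketch. The C code below is an illustration only, not code from VectorWise or Volcano: the `Operator` struct, the `ScanState` and `FilterState` types and all function names are hypothetical, and tuples are reduced to a single `int` for brevity. It shows a Scan feeding a Select through open(), next() and close(), with every tuple costing at least two indirect function calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator interface in the style of the iterator model. */
typedef struct Operator Operator;
struct Operator {
    void (*open)(Operator *self);
    int  (*next)(Operator *self, int *out); /* 1 = tuple produced, 0 = end */
    void (*close)(Operator *self);
    Operator *child;  /* input operator, NULL for a leaf */
    void *state;      /* operator-specific state */
};

/* Scan: produces the values of an in-memory array one at a time. */
typedef struct { const int *data; size_t n; size_t pos; } ScanState;

static void scan_open(Operator *self) { ((ScanState *)self->state)->pos = 0; }
static int scan_next(Operator *self, int *out) {
    ScanState *s = self->state;
    if (s->pos >= s->n) return 0;
    *out = s->data[s->pos++];
    return 1;
}
static void scan_close(Operator *self) { (void)self; }

/* Select: passes on only the tuples with value > threshold. */
typedef struct { int threshold; } FilterState;

static void filter_open(Operator *self) { self->child->open(self->child); }
static int filter_next(Operator *self, int *out) {
    FilterState *s = self->state;
    int v;
    /* One indirect call into the child per input tuple. */
    while (self->child->next(self->child, &v)) {
        if (v > s->threshold) { *out = v; return 1; }
    }
    return 0;
}
static void filter_close(Operator *self) { self->child->close(self->child); }

/* Pulls tuples from the root of the plan and sums them. */
long run_plan_sum(Operator *root) {
    long sum = 0;
    int v;
    assert(root != NULL);
    root->open(root);
    while (root->next(root, &v)) sum += v;
    root->close(root);
    return sum;
}

/* Builds Scan -> Select(value > 10) over a tiny table and runs it. */
long demo_iterator_sum(void) {
    static const int data[] = {5, 12, 7, 20, 15};
    ScanState scan_state = { data, sizeof data / sizeof data[0], 0 };
    FilterState filter_state = { 10 };
    Operator scan = { scan_open, scan_next, scan_close, NULL, &scan_state };
    Operator filter = { filter_open, filter_next, filter_close,
                        &scan, &filter_state };
    return run_plan_sum(&filter);
}
```

In this sketch the only real work per tuple is one comparison and one addition, while the surrounding calls through function pointers are interpretation logic. Because those calls separate the work instructions, a compiler cannot easily fuse them into one loop or vectorize them.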

Related Work: Vectorized execution. MonetDB [2] reduced interpretation overhead by using bulk processing, where each operator would fully process its input, and only then invoke the next execution stage. This idea has been further improved in the X100 project [1], later evolving into VectorWise, with vectorized execution. It is a form of block-oriented query processing [8], where the next() method produces a block of tuples (typically 100-10000) rather than a single tuple. In the vectorized model, data is represented as small single-dimensional arrays (vectors), easily accessible for CPUs. The effect is (i) that the percentage of instructions spent in interpretation logic is reduced by a factor equal to the vector-size, and (ii) that the functions that perform work now typically process an array of values in a tight loop. Such tight loops can be optimized well by compilers, e.g. unrolled when beneficial, and enable compilers to generate SIMD instructions automatically. Modern CPUs also do well on such loops, as function calls are eliminated, branches get more predictable, and out-of-order execution in CPUs often takes multiple loop iterations into execution concurrently, exploiting the deeply pipelined resources of modern CPUs. It was shown that vectorized execution can improve data-intensive (OLAP) queries by a factor of 50.
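A minimal sketch of a vectorized primitive, under stated assumptions, may help here. The names `map_add_int`, `vectorized_sum_of_adds` and `demo_vectorized_sum`, and the vector size of 1024, are chosen for illustration and are not taken from X100 or VectorWise. The primitive is a tight loop over arrays, and the driver calls it once per block, so the number of calls drops by roughly the vector size compared with one call per tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vector size; the text quotes 100-10000 tuples per block. */
#define VECTOR_SIZE 1024

/* Map primitive: res[i] = a[i] + b[i] for one vector.
 * The loop body has no calls and no data-dependent branches, so an
 * optimizing compiler may unroll it or emit SIMD instructions. */
static void map_add_int(size_t n, int *res, const int *a, const int *b) {
    for (size_t i = 0; i < n; i++) {
        res[i] = a[i] + b[i];
    }
}

/* Driver: walks two columns block by block, calls the primitive once per
 * block, and sums the results. *calls counts primitive invocations. */
long vectorized_sum_of_adds(const int *a, const int *b, size_t n,
                            size_t *calls) {
    int res[VECTOR_SIZE];
    long sum = 0;
    assert(a != NULL && b != NULL && calls != NULL);
    *calls = 0;
    for (size_t off = 0; off < n; off += VECTOR_SIZE) {
        size_t len = (n - off < VECTOR_SIZE) ? (n - off) : VECTOR_SIZE;
        map_add_int(len, res, a + off, b + off);
        (*calls)++;
        for (size_t i = 0; i < len; i++) sum += res[i];
    }
    return sum;
}

/* Runs the driver on 2500 rows with a[i] = i and b[i] = 1. */
long demo_vectorized_sum(size_t *calls) {
    enum { N = 2500 };
    int a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 1; }
    return vectorized_sum_of_adds(a, b, N, calls);
}
```

For the 2500 rows in `demo_vectorized_sum`, the primitive is invoked 3 times instead of 2500 times, which is the amortization described in point (i). Whether the compiler actually generates SIMD code for `map_add_int` depends on the compiler, its flags and the target CPU.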

Related Work: Loop-compilation. An alternative strategy for eliminating the ill effects of interpretation is using Just-In-Time (JIT) query compilation. On receiving a query
