SQL Query Optimization: Moving GROUP-BY Ahead of JOIN

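Updated 2024-09-17. 1.48 MB PDF.

"SQL Optimization (for database developers)" is a paper on database query optimization by Surajit Chaudhuri and Kyuseok Shim of Hewlett-Packard Laboratories. It focuses on optimizing SQL queries that contain Group-By. The authors propose transformations that allow a group-by and its aggregate functions to be evaluated before one or more joins, which can significantly reduce the cost of processing the query. The decision to apply a transformation is based on cost estimates, and a traditional System-R style optimizer can be extended with the greedy conservative heuristic they propose. Their experiments indicate that the approach improves plan quality with a relatively small increase in optimization cost, and that it also applies to Select Distinct queries.

Query optimization is central to database performance, especially with large data volumes. Traditional systems typically evaluate the group-by and the aggregate functions only after all joins have completed, which can mean processing more data than necessary. Where the transformation is valid, moving the group-by ahead of a join reduces the number of rows the join has to handle. The greedy conservative heuristic searches for a plan that preserves the query result while lowering execution cost, and the plan it produces is at least as good as, and sometimes better than, the plan of a traditional optimizer. The benefit is most visible for complex queries that combine multi-table joins with aggregation, and the same approach helps queries that must eliminate duplicate rows.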

1.4 Related Work

In a recent paper [YL93], Yan and Larson identified a transformation that enables pushing the group-by past joins. Their approach is based on deriving two queries, one with and the other without a group-by clause, from the given SQL query. The result of the given query is obtained by joining the two queries so formed. Thus, in their approach, given a query, there is a unique alternate placement for the group-by operator. Observe that the transformation reduces the space of choices for join ordering since the ordering is considered only within each query. Our transformations vastly generalize their proposal and also avoid the problem of the reduced search space for join ordering. For example, the alternative execution suggested in Example 1.1 cannot be obtained by transformations in [YL93].
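
To make the general idea of evaluating a group-by before a join concrete, the following is a minimal sketch in Python using the standard sqlite3 module. The emp and dept tables, their columns, and the data are hypothetical and are not taken from the paper. The sketch illustrates only the general idea, not the specific transformation of Yan and Larson or the alternative plan of Example 1.1. Early grouping is valid in this instance because dno is the key of dept, so each group joins with at most one dept row.

```python
import sqlite3


def run_both_plans():
    """Return the results of the late-grouping and early-grouping queries."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical schema: every employee belongs to exactly one department.
    cur.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY, budget INTEGER)")
    cur.execute("CREATE TABLE emp (eno INTEGER PRIMARY KEY, dno INTEGER, sal INTEGER)")
    cur.executemany("INSERT INTO dept VALUES (?, ?)",
                    [(1, 500), (2, 50), (3, 900)])
    cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                    [(10, 1, 100), (11, 1, 200), (12, 2, 300),
                     (13, 3, 50), (14, 3, 70)])

    # Join first, group afterwards (the traditional placement).
    late = cur.execute("""
        SELECT e.dno, SUM(e.sal)
        FROM emp e JOIN dept d ON e.dno = d.dno
        WHERE d.budget > 100
        GROUP BY e.dno
    """).fetchall()

    # Group emp first, then join the smaller intermediate result with dept.
    early = cur.execute("""
        SELECT g.dno, g.total
        FROM (SELECT dno, SUM(sal) AS total FROM emp GROUP BY dno) g
        JOIN dept d ON g.dno = d.dno
        WHERE d.budget > 100
    """).fetchall()

    conn.close()
    # Sort in Python so the comparison does not depend on row order.
    return sorted(late), sorted(early)
```

Calling run_both_plans() returns two equal lists, [(1, 300), (3, 120)], for this data. Whether the early-grouping form is cheaper depends on the data and on the optimizer, which is why the paper makes the choice on the basis of cost estimates.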

Prior work on group-by has addressed the problem of pipelining group-by and aggregation with join [D87, Kl82b] as well as the use of group-by to flatten nested SQL queries [K82, D87, G87, M92]. However, these problems are orthogonal to the problem of optimizing queries containing group-by that we are addressing in this paper.

1.5 Outline

Section 2 discusses the preliminary concepts and assumptions. In Section 3, we define the proposed transformations. Section 4 is devoted to the optimization algorithm. In Section 5, we discuss the experimental results using our implementation of the optimizer. The results in this section demonstrate that incorporating the transformations in the traditional cost-based optimizer is practical and results in significant improvement in the quality of the plan produced.

2 Preliminaries and Notation

2.1 Query

We will follow the operational semantics associated with SQL queries [DD93, ISO92]. We assume that the query is a single block SQL query, as below.

    Select All <columnlist> AGG1(b1)..AGGn(bn)
    From <tablelist>
    Where cond1 And cond2 ... And condn
    Group By col1,..colj

The WHERE clause of the query is a conjunction of simple predicates. SQL semantics require that <columnlist> must be among col1,..colj. In the above notation, AGG1..AGGn represent built-in SQL aggregate functions. In this paper, we will not be discussing the cases where there is a HAVING and/or an ORDER BY clause in the query. We will also assume that there are no nulls in the database. These extensions are addressed in [CS94].

We refer to columns in {b1,..bn} as the aggregating columns of the query. The columns in {col1,..colj} are called grouping columns of the query. The functions {AGG1,..AGGn} are called the aggregating functions of the query. For the purposes of this paper, we will assume that every aggregate function has one of the following forms: Sum(colname), Max(colname) or Min(colname). Thus, we have excluded Avg and Count as well as cases where the aggregate functions apply on columns with the qualifier Distinct. In Section 3.4, we will discuss extensions of our techniques.
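
The restrictions on the query form can be written down as a small data structure. The following is a minimal sketch in Python; the class SingleBlockQuery, its field names, and the violations method are hypothetical names introduced only for illustration and do not come from the paper. It checks the two conditions stated above: every selected column must be a grouping column, and every aggregating function must be Sum, Max or Min.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The aggregate forms the text assumes for the purposes of the paper.
ALLOWED_AGGREGATES = {"SUM", "MAX", "MIN"}


@dataclass
class SingleBlockQuery:
    """Hypothetical container for a single block query of the assumed form."""
    select_columns: List[str]            # <columnlist>
    aggregates: List[Tuple[str, str]]    # (aggregating function, aggregating column)
    tables: List[str]                    # relations in the From clause
    predicates: List[str]                # conjuncts of the Where clause
    grouping_columns: List[str]          # col1,..colj

    def violations(self) -> List[str]:
        """List the ways this query departs from the assumed form."""
        problems = []
        for col in self.select_columns:
            if col not in self.grouping_columns:
                problems.append(f"{col} is selected but is not a grouping column")
        for func, col in self.aggregates:
            if func.upper() not in ALLOWED_AGGREGATES:
                problems.append(f"{func}({col}) is outside the Sum/Max/Min forms")
        return problems


# A query of the assumed form, and one that breaks both conditions.
ok_query = SingleBlockQuery(["dno"], [("Sum", "sal")], ["emp", "dept"],
                            ["emp.dno = dept.dno"], ["dno"])
bad_query = SingleBlockQuery(["eno"], [("Avg", "sal")], ["emp"], [], ["dno"])
```

Here ok_query.violations() is empty, while bad_query.violations() reports both the non-grouping column eno and the excluded function Avg.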

2.2 Extended Annotated Join Trees

An execution plan for a query specifies choice of access methods for each relation and an ordering of joins in the query. Traditionally, such an execution plan is represented syntactically as an annotated join tree [S*79] where the root is a group-by operation and each leaf node is a scan operation. An internal node represents a join operation. The annotations of a join node include the choice of the join method, as well as the selection conditions and the list of projection attributes. We assume that the selection conditions are evaluated and projections are applied as early as possible. The optimization problem is to choose a plan of least cost from its execution space. For optimization efficiency, the execution space is often restricted to be the class of left-deep join trees. These are annotated join trees where the right child of every internal node is a leaf.

The transformations that we propose introduce group-by operators as internal nodes. Therefore, we define extended annotated join trees which are annotated join trees except that a group-by may also occur as an internal node. Likewise, we can define extended left-deep join trees. These are trees subject to the same restrictions as the traditional left-deep join trees. For example, in Figure 1, tree (a) denotes a left-deep join tree, whereas trees (b) and (c) are extended left-deep join trees since the group-by occurs as an internal node in these trees. Finally, note that we mark the scan nodes by the name of the relation.
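
The tree shapes described in this section can be sketched as a small data structure. The following Python sketch is illustrative only: the class names Scan, Join and GroupBy and the two checking functions are hypothetical, annotations such as join methods and grouping columns are omitted, and the example trees are not the trees of Figure 1. It also makes one assumption that the text does not spell out: when checking an extended left-deep tree, a group-by node is treated as a unary operator that is looked through, so a group-by placed directly over a scan still counts as a leaf-like right child.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Scan:
    relation: str          # scan nodes are marked by the relation name


@dataclass
class Join:
    left: "Node"
    right: "Node"


@dataclass
class GroupBy:
    child: "Node"


Node = Union[Scan, Join, GroupBy]


def skip_group_bys(node):
    """Look through any chain of group-by nodes (assumption of this sketch)."""
    while isinstance(node, GroupBy):
        node = node.child
    return node


def is_left_deep(root) -> bool:
    """Traditional shape: group-by only at the root, every right child a scan."""
    node = root.child if isinstance(root, GroupBy) else root
    while isinstance(node, Join):
        if not isinstance(node.right, Scan):
            return False
        node = node.left
    return isinstance(node, Scan)


def is_extended_left_deep(root) -> bool:
    """Same restriction, but group-by nodes may also occur as internal nodes."""
    node = skip_group_bys(root)
    while isinstance(node, Join):
        if not isinstance(skip_group_bys(node.right), Scan):
            return False
        node = skip_group_bys(node.left)
    return isinstance(node, Scan)


# Hypothetical example trees.
plain = GroupBy(Join(Join(Scan("A"), Scan("B")), Scan("C")))
early_group = GroupBy(Join(GroupBy(Scan("A")), Scan("B")))
bushy = GroupBy(Join(Scan("A"), Join(Scan("B"), Scan("C"))))
```

Under these assumptions, plain passes both checks, early_group passes only the extended check because a group-by sits below the join, and bushy fails both because the right child of its top join is itself a join.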

2.3 Group-By as an Operator

We assume that a group-by operator in an extended join tree is specified using the following annotations: (a) Grouping Columns (b) Aggregating Columns. The meaning of these annotations is analogous to the corresponding properties for the query (see Section 2.1). In reality, we need somewhat more elaborate annotations, including aggregating functions, but such details are not germane to our discussion here. We consider the question of determining the annotations of a group-by node if we place it immediately above a join or a scan node n.

Definition 2.1: Join columns of a node n are columns of n that participate in join predicates that are evalu-
