hash join 和hash军hash join outer
时间: 2023-07-21 16:48:31 浏览: 70
Hash Join是一种常见的关系型数据库连接操作,它使用哈希表来处理连接操作。具体来说,Hash Join将连接操作拆分为两个步骤:首先,将要连接的两个表中的一个表按照连接键进行哈希分组,然后将另一个表中的记录与哈希分组后的结果进行匹配,最终得到连接的结果。
而Hash Join Outer(也称为Hash Join Full Outer)是Hash Join的一种扩展形式,它可以处理左外连接、右外连接和全外连接。具体来说,Hash Join Outer将连接操作拆分为三个步骤:首先,将左表和右表分别按照连接键进行哈希分组;然后,将左表和右表中的记录与哈希分组后的结果进行匹配,并将匹配成功的结果输出;最后,将左表和右表中未匹配成功的记录输出,同时输出对应的空值。
总的来说,Hash Join和Hash Join Outer都是常见的关系型数据库连接操作,它们可以有效地处理大规模数据集的连接操作,提高查询效率。
相关问题
Oracle的执行计划的hash join和hash join outer 什么情况下会执行
在Oracle数据库中,Hash Join和Hash Join Outer的执行取决于许多因素,包括表大小、连接键的选择、可用内存大小等等。
通常情况下,如果要连接的表很小,那么Oracle可能会选择使用Nested Loop Join(嵌套循环连接)或Sort Merge Join(排序合并连接)等其他连接算法。但是,如果要连接的表非常大,而且内存资源充足,那么Oracle就可能会选择使用Hash Join来处理连接操作。
具体来说,如果Oracle认为Hash Join的执行代价比其他连接算法更低,那么它就会选择Hash Join。而对于Hash Join Outer,则是在处理左外连接、右外连接和全外连接时使用的,如果需要进行这些连接操作,Oracle就会选择Hash Join Outer来执行。
需要注意的是,虽然Hash Join和Hash Join Outer可以提高连接操作的效率,但是它们也需要消耗大量的内存资源,因此在实际使用中需要根据具体情况进行权衡和优化。
hashjoin 源码
hashjoin 是一种常用的关系型数据库的查询算法,用于将两个表中共同的列进行匹配并连接起来。具体实现可以参考 PostgreSQL 数据库中的代码实现。
在 PostgreSQL 中,hashjoin 的实现主要包括三个部分:构建哈希表、扫描哈希表和匹配结果。
构建哈希表:
```c
static HashJoinTable
ExecHashTableCreate(PlanState *parent,
List *hashOperators, /* hash function to use for each join key */
long nbuckets, /* # buckets in hashtable */
Size entrysize, /* size of each entry in hashtable */
bool use_variable_hash_iv)
{
HashJoinTable hashtable;
int nbuckets_est = nbuckets;
int log2_nbuckets;
/* Limit nbuckets to at most INT_MAX; must do this before sizing to power of 2 */
if ((double) nbuckets_est * (double) entrysize > (double) INT_MAX)
nbuckets_est = (int) floor((double) INT_MAX / (double) entrysize);
/* Size hash table to a power of 2 */
log2_nbuckets = my_log2(nbuckets_est);
hashtable = (HashJoinTable) palloc0(HJTUPLE_OVERHEAD +
sizeof(HashJoinTableData) +
(entrysize * (1 << log2_nbuckets)));
hashtable->nbuckets = nbuckets_est;
hashtable->log2_nbuckets = log2_nbuckets;
hashtable->buckets = (HashJoinTuple *)
(((char *) hashtable) + HJTUPLE_OVERHEAD + sizeof(HashJoinTableData));
hashtable->hash_iv = GetPerTupleExprContext(parent)->ecxt_hashjoin_outer;
/* Initialize all hash bucket headers to empty */
MemSet(hashtable->buckets, 0, sizeof(HashJoinTuple) << log2_nbuckets);
/* Set up array containing OIDs of hash operators */
ExecChooseHashFuncs(hashOperators,
hashtable->hashfunctions,
hashtable->nbuckets,
use_variable_hash_iv);
return hashtable;
}
```
扫描哈希表:
```c
static TupleTableSlot *
ExecScanHashBucket(HashJoinState *hjstate,
ExprContext *econtext)
{
HashJoinTable hashtable = hjstate->hj_HashTable;
AttrNumber *hj_OuterHashKeys = hjstate->hj_OuterHashKeys;
TupleTableSlot *innerTupleSlot = hjstate->hj_InnerTupleSlot;
TupleTableSlot *outerTupleSlot = hjstate->hj_OuterTupleSlot;
HashJoinTuple hashTuple;
uint32 hashvalue;
int bucketno;
/* loop until we find a join tuple */
for (;;)
{
hashvalue = ExecHashGetBucket(hjstate, hashtable,
hj_OuterHashKeys,
econtext,
false);
bucketno = ExecHashGetBucketNumber(hashvalue, hashtable->log2_nbuckets);
/*
* Scan the bucket for matching tuples.
*/
for (hashTuple = hashtable->buckets[bucketno];
hashTuple != NULL;
hashTuple = hashTuple->next)
{
if (hashTuple->hashvalue != hashvalue)
continue;
/* Found a match? Then report and save tuple */
if (ExecQualAndReset(hashTuple->hashressupport, econtext))
{
/* save the matching tuple */
ExecStoreMinimalTuple(HJTUPLE_MINTUPLE(hashTuple),
innerTupleSlot,
false);
/* set up for next join tuple, if any */
hjstate->hj_CurHashValue = hashvalue;
hjstate->hj_CurBucketNo = bucketno;
return outerTupleSlot;
}
}
/*
* No match in this bucket; check for additional matches in outer
* batches.
*/
if (!ExecScanHashTableForUnmatched(hjstate, econtext))
return NULL; /* need new outer tuple */
}
}
```
匹配结果:
```c
static TupleTableSlot *
ExecHashJoin(HashJoinState *node)
{
PlanState *outerNode = outerPlanState(node);
HashJoinTable hashtable = node->hj_HashTable;
TupleTableSlot *innerTupleSlot = node->hj_InnerTupleSlot;
TupleTableSlot *outerTupleSlot = node->hj_OuterTupleSlot;
ExprContext *econtext = node->js.ps.ps_ExprContext;
TupleTableSlot *result;
MinimalTuple tuple;
/*
* Reset per-tuple memory context to free any expression evaluation
* storage allocated in the previous tuple cycle.
*/
ResetExprContext(econtext);
/*
* if first time through, read all inner tuples into hashtable
*/
if (!node->hj_CurHashValue)
{
/* Reset hash table to empty */
ExecHashTableReset(hashtable);
/* Load hashtable with inner tuples */
ExecHashJoinNewBatch(node);
/* If inner relation is completely empty, return no rows */
if (hashtable->totalTuples == 0)
return NULL;
}
/*
* We read the outer tuple in the previous iteration, which means that we
* have to check for additional join matches for it before continuing.
*/
if (node->hj_JoinState == HJ_NEED_NEW_OUTER)
{
if (!ExecScanHashTableForUnmatched(node, econtext))
return NULL; /* need new outer tuple */
}
/*
* Now check for any matches
*/
for (;;)
{
/*
* If we've run out of inner tuples, then the current outer tuple
* can't have a match, so we're done with it.
*/
if (node->hj_CurTuple == NULL)
{
if (!ExecScanHashTableForUnmatched(node, econtext))
break; /* need new outer tuple */
continue; /* search next hash bucket */
}
/*
* Check for join match.
*/
if (ExecQual(node->js.ps.qual, econtext))
{
/*
* qualification was satisfied so we project and return the
* slot containing joined tuples, making sure that the slot is
* labeled with the join's rowtype.
*/
ExecProject(node->js.ps.ps_ProjInfo);
result = node->js.ps.ps_ProjInfo->pi_slot;
/*
* We return the first (and only) qualifying join tuple. The
* executor doesn't support the idea of generating multiple
* join rows from one outer tuple when there are multiple
* matching inner tuples (compare the semantics of a nested
* loops join).
*/
if (hashtable->nbatch == 1)
{
/* In single-batch case, just return the result */
return result;
}
else
{
/*
* Before returning the first join tuple, force the
* other tuples in the same join group to be fetched and
* appended to the result list.
*/
tuple = ExecFetchSlotMinimalTuple(innerTupleSlot);
ExecHashTableMarkCurBucket(hjstate);
ExecHashTableGetBucketAndBatch(hashtable,
node->hj_CurHashValue,
&node->hj_CurBucketNo,
&node->hj_CurTuple,
&node->hj_CurBucketBuf);
/*
* Set the next tuple to return, if any. Done in this order
* so that if there is only one tuple in the group, we don't
* advance the pointers at all.
*/
if (node->hj_CurTuple != NULL)
node->hj_NextTuple = node->hj_CurTuple->next;
else
node->hj_NextTuple = NULL;
/* Remember there's a join tuple available */
node->hj_JoinState = HJ_NEED_NEW_OUTER;
/* And return the first tuple */
return result;
}
}
/*
* Didn't match this time. Try next tuple in inner relation.
*/
node->hj_CurTuple = node->hj_CurTuple->next;
}
/*
* no more matches
*/
return NULL;
}
```
以上代码是 PostgreSQL 中 hashjoin 的基本实现,可以作为参考。
相关推荐
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![text/plain](https://img-home.csdnimg.cn/images/20210720083646.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![docx](https://img-home.csdnimg.cn/images/20210720083331.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![docx](https://img-home.csdnimg.cn/images/20210720083331.png)