LINEAR HASHING : A NEW TOOL FOR FILE AND TABLE ADDRESSING.
Witold Litwin
I. N. R. I. A.
78 150 Le Chesnay, France.
Abstract.
Linear hashing is a hashing in which the address space may grow or shrink dynamically. A file or a table may then support any number of insertions or deletions without access or memory load performance deterioration. A record in the file is, in general, found in one access, while the load may stay practically constant up to 90 %. A record in a table is found in a mean of 1.7 accesses, while the load is constantly 80 %. No other algorithms attaining such a performance are known.
1. INTRODUCTION
The most fundamental data structures are files
and tables of records identified by a primary key.
Hashing and trees (B-tree, binary tree,.. ) are
the basic addressing techniques for those files
and tables; thousands of publications have dealt with this subject. If a file or a table is almost static, hashing allows a record to be found, in general, in one access. A tree always requires several accesses. However, when the file or the table is, as usual, dynamic, then a tree still works reasonably well, while the performance of hashing may become very bad. It may even become necessary to rehash all records into a new file.
We have shown, however, in /LIT77/ that hashing may be a tool for dynamic files, if the hashing function is dynamically modified in the course of insertions or deletions. We have called this new type of hashing virtual hashing (VH), in contrast to the well known hashing with a static function, which we will refer to as classical hashing.
Through an algorithm called VH0 /LIT77/, we have shown that a record in a dynamic file may typically be found in two accesses, while the load stays close to 70 % /LIT77a/. Another algorithm, called VH1 /LIT78/, /LIT78a/, has shown that a record may even be found typically in one access, while the load during insertions oscillates between 45 % and 90 % and is 67.5 % on average. It showed also that the average load during insertions may be always greater than 63 % and almost always greater than 85 %, if we accept that the average successful search requires 1.6 accesses /LIT79a/. Finally, a generalisation of VH1, called VH2, has shown that for a similar load, the average successful search requires very close to one access /LIT79a/. Two other algorithms similar to VH0 have been proposed, Dynamic Hashing (DH) /LAR78/ and Extendible Hashing (EH) /FAG78/.
Since trees typically lead to more than 3 or 4 accesses per search and to a load close to 70 % /KNU74/, /COM79/, all these VH algorithms offer better access performance for similar or higher load factors.
VH0, DH and EH require at least two accesses per search because the data structure which represents the dynamically created hashing functions must be on the disk. VH1 and VH2 are faster, because the functions are represented by a bit table which, depending on file parameters, needs 1 kbyte of core (main storage) per file for 7 000 to more than 1 500 000 records. In this paper, we present the algorithm which goes further: only a few bytes of core suffice now for a file of any size. For any number of insertions or deletions, the load of a file may therefore be high and a record may be found, in general, in one access. No other algorithms attaining such a performance are known.
The algorithm is called Linear Virtual Hashing or Linear Hashing in short (LH). The choice of file parameters may lead to a mean number of accesses per successful search not greater than 1.03, while the load stays close to 60 %. It may also lead to a load staying equal to 90 % while the successful search requires 1.35 accesses in the average.

CH1534-7/80/0000-0212$00.75 © 1980 IEEE
Even if the buffer in core may contain only one record, a search in the file needs 1.7 accesses in the average while the load remains at 80 %. This property makes LH probably the best performing tool for dynamic tables as well.
The next section describes the principles of LH. We first show the basic schema for hashing. We then discuss the computing of the physical addresses of buckets, when the storage for them is allocated in a non-contiguous manner. Finally, we present some variants of the basic schema.
Section 3 shows the performance of the Linear Hashing. First, we show access and memory load performance of the basic schema. Next, the performance is analysed for a variant with a, so-called, load control.
Section 4 concludes the paper. We sum up the advantages which Linear Hashing brings, we show some application areas and, finally, we indicate directions for further research.
2. PRINCIPLES OF THE LINEAR HASHING

2.1. Basic schema
We recall that hashing is a technique which addresses records provided with an identifier called primary key or, simply, key. The key, let it be c, is usually a non-negative integer and, in a work on the addressing by primary key, we may disregard the rest of the record. A simple pseudo-random function, let it be h, called a hashing function, assigns to c the memory cell identified by the value h(c). The hashing by division, c |--> c mod M ; M = 2,3,.. ; is an example of a hashing function. The cells are called buckets and may contain b records, b = 1,2,... The record is inserted into the bucket h(c), called primary bucket for c, unless the bucket is already full. The search for c always starts with the access to the bucket h(c).
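In modern terms, the basic schema above can be sketched as follows. This is a hypothetical illustration, not code from the paper; the names insert and search are ours, while N, b and h match the notation just introduced:

```python
N = 100  # number of buckets, addresses 0..N-1
b = 5    # bucket capacity in records

def h(c):
    """Hashing by division: assigns to the key c the bucket address c mod N."""
    return c % N

buckets = [[] for _ in range(N)]

def insert(c):
    """Store c in its primary bucket h(c); signal a collision if it is full."""
    bucket = buckets[h(c)]
    if len(bucket) < b:
        bucket.append(c)
        return True
    return False  # collision: the primary bucket is already full

def search(c):
    """The search always starts (and, without overflow, ends) at bucket h(c)."""
    return c in buckets[h(c)]

insert(4900)  # h(4900) = 0
```

A successful search thus costs a single bucket access as long as no bucket overflows.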
If the bucket is full when c should be stored, we speak about a collision. An algorithm called a collision resolution method (CRM) is then applied which, typically, stores c in a bucket m such that m ≠ h(c). c then becomes an overflow record and the bucket m is called overflow bucket for c. If (i) overflow buckets are not primary for any c, (ii) each of them is devoted to only one h(c) and (iii) a new overflow bucket for an h(c) is chained to the existing ones, then we have a bucket chaining CRM. If, in particular, the capacity b' of an overflow bucket is b' = 1, we call this CRM separate chaining /KNU74/.
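A bucket chaining CRM satisfying (i)-(iii) can be sketched as follows (an illustrative reconstruction, not the paper's code; the class name Bucket is ours, and b' is written b_prime):

```python
b = 5        # primary bucket capacity
b_prime = 1  # overflow bucket capacity; b' = 1 gives separate chaining

class Bucket:
    def __init__(self, capacity):
        self.records = []
        self.capacity = capacity
        self.overflow = None  # next overflow bucket chained to this h(c)

def insert(bucket, c):
    """Walk the chain for h(c); append a new overflow bucket when all are full."""
    while True:
        if len(bucket.records) < bucket.capacity:
            bucket.records.append(c)
            return
        if bucket.overflow is None:
            bucket.overflow = Bucket(b_prime)
        bucket = bucket.overflow

def search(bucket, c):
    """Return the number of bucket accesses needed to find c, or None."""
    accesses = 0
    while bucket is not None:
        accesses += 1
        if c in bucket.records:
            return accesses
        bucket = bucket.overflow
    return None

primary = Bucket(b)
for key in (0, 100, 200, 300, 400, 500):  # six keys, capacity 5: one overflows
    insert(primary, key)
```

As the text notes next, the overflow record (here 500) costs at least two accesses, while the first five keys are still found in one.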
A search for an overflow record requires at least two accesses. If all collisions are resolved only by overflow record creations, as it was assumed until recently /LIT77/, then access performance must rapidly deteriorate when primary buckets become full. If the insertion of c leads to a collision and no records already stored in the bucket h(c) should become overflow records, then c may be stored in its primary bucket only if a new hashing function is chosen. The new function, let it be h', should assign new addresses to some of the records hashed with h on h(c) and the file should be reorganized in consequence. If h = h' for all other records, the reorganizing needs to move only a few records and so may be performed dynamically. The new function is then called by us a dynamically created hashing function or, shortly, a split function.
The modification to the hashing function and to the file is called a split; h(c) is, under the circumstances, the split address. The idea in VH in general and so, in particular, in LH is to use splits in order to avoid the accumulation of overflow records. Splits are typically performed during some insertions. All splits result from the application of split functions. For LH, as well as for VH0 and for VH1, the basic split functions are defined as follows /LIT79/, /LIT80/:
- Let C be the key space. Let h0 : C --> {0, 1,.., N-1} be the function that is used to load the file. The functions h1, h2,.., hi,.. are called split functions for h0 if they obey the following requirements:

(1)  hi : C --> {0, 1,.., 2^i N - 1}

(2)  For any c, either:

         hi(c) = h_{i-1}(c)                    (2.1)
     or:
         hi(c) = h_{i-1}(c) + 2^{i-1} N        (2.2)
We assume that, typically, each hi ; i = 0,1,.. ; hashes randomly. This means that the probability that c is mapped by hi to a given address is 1/(2^i N). This also means that (2.1) and (2.2) are equiprobable events.
Fig. 1 illustrates the use of split functions. The file is created with h0 : c |--> c mod N, where N = 100. The bucket capacity is b = 5 records. For split functions we choose the hashing by division, namely we put:

    hi : c |--> c mod 2^i N.

This choice respects (2) since, obviously, for any non-negative integers k, L, either:

    k mod 2L = k mod L
or:
    k mod 2L = k mod L + L.
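This identity is easy to check mechanically. The following sketch (ours, not the paper's) verifies that the split functions by division indeed obey requirement (2):

```python
N = 100

def h(i, c):
    """hi(c) = c mod (2**i * N), the split functions chosen by division."""
    return c % (2**i * N)

# Requirement (2): hi(c) is either h_{i-1}(c) or h_{i-1}(c) + 2**(i-1) * N.
for c in range(0, 50000, 7):   # an arbitrary sample of keys
    for i in range(1, 5):
        assert h(i, c) in (h(i - 1, c), h(i - 1, c) + 2**(i - 1) * N)
```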
We assume that a collision occurs during the insertion of c = 4900. Instead of simply storing c as an overflow record, we change h0 to the following h:

    h(c) = h1(c)    if h0(c) = 0
    h(c) = h0(c)    otherwise.

We then reorganize the file. We thus have applied h1 as the split function and we have performed the split for the address 0. The hashing function h results from the split and is a dynamic hashing function.
[Figure 1 (diagram): (a) the file of buckets 0, 1,.., 53,.., 99 under h0, with 4 900 colliding on bucket 0; (b) buckets 0,.., 99 and the new bucket 100 after the split with h1.]
Fig. 1. The use of a split for a collision resolution. (a) - a collision occurs for the bucket 0. (b) - the collision is resolved without creating an overflow record and the address space is extended.
Fig. 1.b shows the new state of the file. Since h = h0 for each address except 0, no records other than those hashed to 0 have been moved. Since h ≠ h0 for the records which h0 hashed to the address 0, for approximately half the number of these records the address has been changed. It follows from (2) that all these records had the same new address which, under the circumstances, had to be 100. A new bucket has therefore been appended to the last bucket of the file, to which all these records have been moved in one move. Since for all these records the bucket 100 is henceforward the primary bucket, they are all accessible in one access. In particular, this is also the case of the new record, i.e., 4 900. On the other hand, the records which remained in the bucket 0 continue to be accessible in one access. In contrast to what could be done if a classical hashing was used, the split has resolved the collision without creating an overflow record and without access performance deterioration.
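The split of Fig. 1 can be replayed in a few lines. The five keys filling bucket 0 below are our own arbitrary choice; everything else follows the example (N = 100, b = 5, c = 4 900):

```python
N, b = 100, 5
h0 = lambda c: c % N        # the loading function
h1 = lambda c: c % (2 * N)  # the split function applied to address 0

buckets = {a: [] for a in range(N)}
for key in (0, 100, 300, 500, 700):  # five keys that h0 hashes to 0
    buckets[h0(key)].append(key)

c = 4900
assert h0(c) == 0 and len(buckets[0]) == b  # inserting c collides on bucket 0

# Split the address 0: rehash its records (and c) with h1; by (2), records
# whose address changes all land in the new bucket N appended to the file.
old = buckets[0] + [c]
buckets[0] = [k for k in old if h1(k) == 0]
buckets[N] = [k for k in old if h1(k) == N]
```

Afterwards every record, including 4 900, is again reachable in one access from its primary bucket, and no overflow record was created.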
Let us now assume that the file addressed with h0 undergoes a sequence of insertions which did not yet lead to overflow records. Furthermore, let us assume that a split is performed iff a collision occurs. The natural idea would be to split the bucket which undergoes the collision; this was implicit for all algorithms for VH. However, split addresses must then be random and this must lead to dynamic hashing functions using tables. Dynamic hashing functions which do not need tables may be obtained only if the split addresses are chosen in a predefined order. To perform splits in some predefined order, instead of splitting the bucket which undergoes the collision, is the main new idea in the linear hashing.
Let m be the address of a collision. Let n be the address of a split to be performed in the course of the resolution of this collision. Since the values of m are random while those of n are predefined, usually n ≠ m. If so, we assume that the new record is stored as an overflow record from the bucket m through a classical CRM, bucket chaining for instance. Next, we assume that n is given by a pointer which thus indicates the bucket to be split.
For the first N collisions, the buckets are pointed in the linear order 0, 1, 2,.., N-1 and all splits use h1. (2.2) implies then that the file becomes progressively larger, including one after another the buckets N, N+1,.., 2N-1. A bucket usually undergoes a split not when it undergoes the collision, but with some delay. The delay corresponds to the number of buckets which has to be pointed while the pointer travels up, from the address indicated in the moment of the collision, to the address of this collision.
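Although the paper develops the complete addressing rule further on, the pointer mechanism just described can already be sketched. This is our reconstruction under the stated assumptions (one split of the pointed bucket per collision; for brevity, overflow records simply stay in their primary bucket's list instead of going through a separate CRM):

```python
N, b = 4, 2  # deliberately tiny parameters, for illustration only

class LH:
    def __init__(self):
        self.i = 0                       # current level of split functions
        self.n = 0                       # pointer: next bucket to be split
        self.buckets = [[] for _ in range(N)]

    def h(self, level, c):
        return c % (2**level * N)        # split functions by division

    def address(self, c):
        a = self.h(self.i, c)
        if a < self.n:                   # bucket a was already split at level i
            a = self.h(self.i + 1, c)
        return a

    def insert(self, c):
        a = self.address(c)
        self.buckets[a].append(c)
        if len(self.buckets[a]) > b:     # collision: split the pointed bucket n
            self.split()

    def split(self):
        old = self.buckets[self.n]
        self.buckets.append([])          # new bucket n + 2**i * N, appended
        self.buckets[self.n] = [c for c in old
                                if self.h(self.i + 1, c) == self.n]
        self.buckets[-1] = [c for c in old
                            if self.h(self.i + 1, c) != self.n]
        self.n += 1
        if self.n == 2**self.i * N:      # every bucket of the level is split:
            self.i += 1                  # move to the next level and
            self.n = 0                   # restart the pointer at 0

f = LH()
for c in range(40):
    f.insert(c)
```

Note that the bucket m of a collision is usually not the bucket n that gets split; m is only split later, when the pointer reaches it.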
With this mechanics, no matter what is the address, let it be m1, of the first collision, LH performs the first split using h1 and for the address 0. The records from the bucket 0 are then randomly distributed between the bucket 0 and a new bucket N, while, unless m1 = 0, an overflow record is created for the bucket m1. The second collision, no matter what is its address, let us say m2, leads to an analogous result, except that,