A Dynamic Multidimensional Index Framework for Large-Scale Cloud Data: Efficiency Gains and Applications of Skip-Octree

Research paper


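Managing large-scale cloud data is now indispensable for many enterprises, and the rapid growth of cloud computing keeps raising the demand for data storage and retrieval. Traditional cloud storage systems usually rely on key-value storage, which makes complex multidimensional queries inefficient: such queries typically have to scan the entire dataset, costing both time and resources. To overcome this limitation, the paper proposes a dynamic multidimensional index framework based on the skip list and the Octree, called Skip-Octree.

A skip list is a probabilistic data structure that adds levels on top of an ordered linked list to speed up lookups, giving expected logarithmic search time. An Octree is a space-partitioning data structure suited to multidimensional data: it recursively divides the data space into eight equal parts, forming a tree that narrows the range a query has to examine. Combining the two in a large-scale cloud environment yields a clearly layered index structure that is easy to implement in a distributed system.

The main contributions of the paper are:

1. Design and implementation: an Octree index structure based on the randomized skip list, which reduces the complexity of building a multidimensional index in a cloud storage system. The design allows a query to locate a subset of the data quickly, rather than only key information along a single dimension.
2. Algorithms: a set of core index operations for this structure, including a range query algorithm, an index maintenance algorithm, and a dynamic expansion strategy. They aim to optimize query performance and to keep index adjustment efficient as the data volume changes.
3. Experimental evaluation: experiments demonstrating the effectiveness and efficiency of the Skip-Octree index. According to the reported results, the index outperforms traditional methods on complex multidimensional queries, and the advantage grows on large datasets.

Keywords: cloud storage, multidimensional index, distributed index, Skip-Octree, skip list, Octree.

In summary, the paper offers a solution for efficient multidimensional queries over large-scale cloud data. By combining the skip list with the Octree, it builds a multidimensional index that adapts to dynamic data, with the aim of improving data management and query performance in cloud computing.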


In this study, we propose a novel skip list and Octree-based dynamic index. As far as we know, ours is the first work to set up an auxiliary cloud index using a skip list and Octree structure.

Framework of the Skip-Octree index

In cloud storage systems, a whole dataset is distributed and stored on multiple data servers. Therefore, query performance is mainly affected by two aspects. One is how to effectively locate the data servers that store the data the user requires. The other is how to improve the efficiency of data access on each local data server. In this study, a new double-layer cloud indexing framework based on Octree and skip list is proposed.

Background of Octree and skip list

Octree is a type of multidimensional data structure with which a multidimensional data space is recursively divided into eight equal subspaces (namely quadrants) until a quadrant contains only one data object. In addition, Octree adopts a tree-based storage structure. For an Octree, the original data space is represented as the root node. Then, eight quadrants, which act as the eight child nodes of the root, are generated by space partition. However, when data is both sparse and skewed, the query performance of Octree is worse than that of sequential retrieval. Hence, the compressed Octree was proposed in [28]. In a compressed Octree, all empty paths are removed. Compared with the R-tree [29], the space division method of the compressed Octree is simpler, and no space overlap occurs. Therefore, the compressed Octree is used to index local data in this study. For simplicity, the compressed Octree is also called Octree in our cloud index framework.
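The recursive eight-way split described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Node`, `child_index`, `insert`, and `contains` are hypothetical, the space is assumed to be an axis-aligned cube, and only non-empty children are allocated. It does not perform the path compression of the compressed Octree in [28].

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]

@dataclass
class Node:
    center: Point                     # center of this cubic subspace
    half: float                       # half of the cube's edge length
    point: Optional[Point] = None     # set only on a leaf holding one point
    children: Optional[List[Optional["Node"]]] = None  # 8 slots once split

def child_index(center: Point, p: Point) -> int:
    """Bit a of the result is 1 when p lies on the upper side of axis a."""
    return sum(1 << axis for axis in range(3) if p[axis] >= center[axis])

def _descend(node: Node, p: Point) -> Node:
    """Return the child subspace containing p, creating it if absent."""
    i = child_index(node.center, p)
    if node.children[i] is None:
        q = node.half / 2
        c = tuple(node.center[a] + (q if (i >> a) & 1 else -q) for a in range(3))
        node.children[i] = Node(center=c, half=q)
    return node.children[i]

def insert(node: Node, p: Point) -> None:
    """Insert p, splitting leaves until every leaf holds at most one point."""
    while True:
        if node.children is None:
            if node.point is None or node.point == p:
                node.point = p
                return
            # Leaf already occupied: split it and re-home the old point.
            old, node.point = node.point, None
            node.children = [None] * 8
            insert(_descend(node, old), old)
        node = _descend(node, p)

def contains(node: Optional[Node], p: Point) -> bool:
    """Follow one root-to-leaf path; empty subspaces end the search early."""
    while node is not None:
        if node.children is None:
            return node.point == p
        node = node.children[child_index(node.center, p)]
    return False
```

Because a lookup follows a single path and stops at the first unallocated child, its cost depends on the depth of the tree. Two points that lie very close together force many successive splits, which is the kind of skew that motivates the compressed variant.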

The skip list [30] is a randomized data structure that organizes elements with hierarchical ordered linked lists. Thus, it is an extension of the ordered list. Because query processing on each layer can skip many elements, a skip list can provide query performance comparable to that of a balanced binary tree. In addition, because a randomized algorithm is adopted to maintain balance rather than strictly enforced balancing, the insertion and deletion operations in a skip list are much simpler and considerably faster than those in a balanced binary tree. Furthermore, the skip list is well suited to parallel computation. Insertions can be performed in parallel at different positions of the ordered list without rebalancing the global data structure. The skip list has been embedded in some popular key-value stores such as LevelDB and Redis. Strictly speaking, a skip list is not a search tree, but its expected time complexity is O(log n), which is similar to that of a balanced binary search tree. In our Skip-Octree, the idea of the skip list is utilized to accelerate data retrieval in Octree.
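To make the layered-list idea concrete, here is a minimal single-threaded skip list in Python. It is a textbook-style sketch rather than code from the paper or from LevelDB or Redis. The class name `SkipList`, the `MAX_LEVEL` cap of 16, and the `seed` parameter are assumptions chosen for illustration.

```python
import random

class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # forward[i] = next node on layer i

class SkipList:
    MAX_LEVEL = 16  # assumed cap; enough for roughly 2**16 keys

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1                           # layers currently in use

    def _random_level(self):
        # Promote a node one layer up with probability 1/2, repeatedly.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        update = [self._head] * self.MAX_LEVEL    # last node before key, per layer
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        nxt = node.forward[0]
        if nxt is not None and nxt.key == key:
            return                                # ignore duplicates
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(key, level)
        for i in range(level):                    # splice in; no global rebalancing
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self._head
        for i in range(self._level - 1, -1, -1):  # top layer first, then move down
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

An insertion only rewires the pointers immediately before the new node on each layer it joins, which is why no global rebalancing step is needed.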

Skip-Octree index specification

Octree is an efficient three-dimensional space partition method. However, in a cloud environment, extensive data can enlarge Octree to such an extent that it becomes inaccessible. In this section, our proposed index structure called Skip-Octree is described. Skip-Octree provides a hierarchical view of the compressed Octree to allow for logarithmic expected-time querying.

Design of Skip-Octree

Based on the randomizing idea of a skip list, the original dataset is randomly divided into subsets with a probability of 1/2. In addition, an individual Octree is constructed for each subset.
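The random division can be sketched as repeated coin flipping. In the hypothetical helper `sample_levels` below, `levels[0]` is the original dataset and each later level keeps every point of the previous level with probability 1/2. The stopping rule (stop before the first empty level) and the use of plain lists in place of per-level Octrees are assumptions made to keep the sketch short.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def sample_levels(points: Sequence[Point], rng: random.Random) -> List[List[Point]]:
    """Return nested subsets: levels[0] holds all points, and
    levels[i] keeps each point of levels[i-1] with probability 1/2."""
    levels = [list(points)]
    while True:
        kept = [p for p in levels[-1] if rng.random() < 0.5]
        if not kept:          # assumed stopping rule: drop the empty level
            return levels
        levels.append(kept)
```

In a full implementation an Octree would then be built over each returned level. The lists here only show how the subsets nest.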

In Fig. 1, Q_0, Q_1, and Q_2 are three datasets, where Q_0 is the original dataset, Q_1 is a subset of Q_0 that contains approximately half of its data, and Q_2 is a subset of Q_1. The query request is processed from right to left, that is, from the smallest Octree to the largest. For each non-empty subspace, a pointer links it between different layers of the Octree. For example, if a user wants to search for a keyword k, the hierarchical Octree index first performs this query request at Q_2. Then, because k is not found in Q_2, the query request is redirected to Q_1. Finally, Q_0 receives this query request and obtains k. Because this query procedure has similar properties to those of a skip list, the hierarchical Octree is essentially a skip list reconstruction.

Definition of Skip-Octree. The Skip-Octree is defined by a sequence of subsets L_i of the input points S, with L_0 = S. For i > 0, L_i is sampled from L_{i-1} by keeping each point with a probability of 1/2. For each L_i, a compressed Octree Q_i is built for the points in L_i. Therefore, the Q_i can be seen as forming a sequence of levels in the skip list such that L_0 and L_top are the bottom and top levels, respectively.
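A short calculation shows why this sampling keeps the structure small. It assumes that the coin flips are independent across points and levels, and it writes n = |S|. The symbols n and c are introduced here for illustration only.

```latex
\begin{align*}
\Pr[p \in L_i] &= 2^{-i} \quad \text{for each } p \in S,\\
\mathbb{E}\,|L_i| &= n\,2^{-i},\\
\sum_{i \ge 0} \mathbb{E}\,|L_i| &= n \sum_{i \ge 0} 2^{-i} = 2n,\\
\Pr[L_i \neq \emptyset] &\le n\,2^{-i} \quad \text{(union bound)}.
\end{align*}
```

The expected total number of points stored over all levels is therefore at most twice the input size. For i = c log_2 n, the last bound gives a probability of at most n^{1-c} that level i is non-empty, so the number of levels is O(log n) in expectation.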

As Fig. 2 illustrates, a skip list is a randomized data structure in which level 0, denoted as L_0, records all original data. In the same manner, L_1 records approximately half of the data of L_0, and L_2 records approximately half of those of L_1. In Skip-Octree, L_0, L_1, and L_2 correspond to the three hierarchical Octrees Q_0, Q_1, and Q_2. The multidimensional data space is partitioned by Octree to obtain multiple levels of subspaces. The skip list is used to organize these hierarchical data points and accelerate query performance. In a skip list, the same nodes in the upper and lower layers are linked by pointers. Thus, starting with a pointer to the root node in the topmost layer, we can find a specific keyword by moving the pointer down. In addition, with the locality sensitive hashing function [31], the

He et al. Journal of Cloud Computing: Advances, Systems and Applications (2016) 5:10 Page 3 of 11
