Design and Optimization of a Hybrid Index for Temporal Big Data


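"A hybrid index for temporal big data". In big data, and in temporal big data processing in particular, efficient data management and retrieval are critical. Temporal big data refers to data sets that are continuously updated over time; each item carries a timestamp that records when a state held or when an event occurred. The research paper titled "A hybrid index for temporal big data" proposes the SHB+-Tree (Segmentation Hybrid B+ Tree), which aims to improve query performance on temporal big data.

The SHB+-Tree is a hybrid index structure that combines the strengths of temporal indexes and object indexes. Traditional temporal indexes focus on the historical versions and time ranges of the data, whereas object indexes index the data by the properties of the data objects. By combining the two, the SHB+-Tree supports complex queries on temporal big data more effectively, including lookups and analysis over a specific time range.

The segmented storage strategy proposed in the paper is a key component of the SHB+-Tree. It splits the data set into multiple time periods, and the data within each period is organized in a separate B+ tree. Segmentation reduces disk I/O during queries and therefore speeds them up. It also lets the system adjust the storage strategy dynamically as the data grows and changes.

To build the SHB+-Tree, the paper proposes a bottom-up index construction method. Construction starts from the lowest-level data segments and merges upward until the complete global index structure is formed. Compared with a top-down method, the bottom-up approach uses memory more effectively and avoids loading large amounts of data at once, which makes it well suited to large temporal data sets.

The researchers ran experiments to validate the SHB+-Tree. The results indicate that it outperforms traditional indexing techniques in query performance, storage efficiency, and scalability.

Keywords: temporal big data, hybrid index, SHB+-Tree, segmented storage, bottom-up index construction, query performance. The paper offers a useful theoretical and technical reference for IT professionals who work on optimizing temporal big data processing and analysis.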
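The segmented storage idea in the summary above can be illustrated with a minimal Python sketch. It is not the paper's implementation: a sorted list stands in for each per-segment B+ tree, and the fixed `SEGMENT_LENGTH`, the `(timestamp, object_id, value)` record layout, and the function names `build_segments` and `range_query` are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

SEGMENT_LENGTH = 100  # assumed width of one time segment, in arbitrary time units


def build_segments(records, segment_length=SEGMENT_LENGTH):
    """Group (timestamp, object_id, value) records into time segments.

    Each segment keeps its records sorted by timestamp; the sorted list is a
    stand-in for the per-segment B+ tree described in the text.
    """
    segments = defaultdict(list)
    for record in records:
        segments[record[0] // segment_length].append(record)
    for bucket in segments.values():
        bucket.sort(key=lambda r: r[0])
    return dict(segments)


def range_query(segments, t_start, t_end, segment_length=SEGMENT_LENGTH):
    """Return records with t_start <= timestamp <= t_end, in time order.

    Only segments that overlap the query window are visited; all other
    segments are skipped without being read.
    """
    result = []
    first, last = t_start // segment_length, t_end // segment_length
    for seg_id in range(first, last + 1):
        bucket = segments.get(seg_id, [])
        keys = [r[0] for r in bucket]
        lo = bisect_left(keys, t_start)
        hi = bisect_right(keys, t_end)
        result.extend(bucket[lo:hi])
    return result
```

With this layout, a query over the window 100 to 200 reads only the segments numbered 1 and 2, however many other segments exist. In a disk-based system that pruning is where the reduction in I/O would come from; the sketch keeps everything in memory and rebuilds the key list on each query for brevity.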

Future Generation Computer Systems 72 (2017) 264–272

journal homepage: www.elsevier.com/locate/fgcs

A hybrid index for temporal big data

Mei Wang (a), Meng Xiao (a), Sancheng Peng (b, c, ∗), Guohua Liu (a)

(a) School of Computer Science and Technology, Donghua University, Shanghai, 201620, PR China

(b) School of Informatics, Guangdong University of Foreign Studies, Guangzhou, Guangdong Province, 510420, PR China

(c) Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou, Guangdong Province, 510420, PR China

Highlights

• A novel segmentation-based hybrid index, the SHB+-Tree, is proposed for temporal big data.

• The proposed index integrates the advantages of the temporal index and the object index.

• A segmented storage strategy is proposed.

• A bottom-up index construction approach is provided.

• Experiments are conducted to verify the effectiveness of the proposed method.

Article info

Article history:

Received 15 November 2015

Received in revised form 14 May 2016

Accepted 6 August 2016

Available online 26 August 2016

Keywords:

Big data

Temporal database

Temporal index

SHB+-Tree index

Segmented storage

Abstract

Temporal indexes are an important way to accelerate queries over temporal big data. However, current temporal indexes do not support a variety of query types well, and it is hard for them to balance the efficiency of query execution against the cost of index construction and maintenance. In this paper, we propose a novel segmentation-based hybrid B+-Tree index, called the SHB+-Tree, for temporal big data. First, the temporal data stored in a temporal table is divided into segments according to time order. In each segment, a hybrid index is constructed by integrating a temporal index and an object index, and the two indexes share the underlying temporal data. Construction and maintenance performance is improved by applying the segmented storage strategy and a bottom-up index construction approach to every part of the hybrid index. Experimental results on a benchmark data set verify the effectiveness and efficiency of the proposed method.
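To make the abstract's two central ideas concrete, the following Python sketch shows one segment that holds a temporal index and an object index over shared rows, with each index built bottom-up from sorted keys. It is a generic bulk-loading illustration under stated assumptions, not the construction algorithm from the paper: `FANOUT`, the `(timestamp, object_id, value)` row layout, and the names `bulk_load`, `lookup`, `group_rows`, and `HybridSegment` are all hypothetical.

```python
from bisect import bisect_right

FANOUT = 4  # assumed node capacity; a real B+ tree derives it from the page size


def bulk_load(entries, fanout=FANOUT):
    """Build a static B+-tree-like index bottom-up.

    `entries` is a list of (key, row_ids) pairs sorted by unique key.
    Every node is a (min_key, children) pair; a leaf's children are entries.
    Returns (root, height), where height counts the levels above the leaves.
    """
    if not entries:
        return None, 0
    # Leaf level: pack consecutive sorted entries into nodes.
    level = [(entries[i][0], entries[i:i + fanout])
             for i in range(0, len(entries), fanout)]
    height = 0
    # Build each parent level from the one below until one root remains.
    while len(level) > 1:
        level = [(level[i][0], level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
        height += 1
    return level[0], height


def lookup(index, key):
    """Return the row ids stored under `key`, or an empty list."""
    node, height = index
    if node is None:
        return []
    for _ in range(height):
        children = node[1]
        # Choose the last child whose minimum key does not exceed `key`.
        pos = bisect_right([child[0] for child in children], key) - 1
        node = children[max(pos, 0)]
    return next((ids for k, ids in node[1] if k == key), [])


def group_rows(rows, column):
    """Map each distinct value of `column` to the positions of its rows."""
    groups = {}
    for row_id, row in enumerate(rows):
        groups.setdefault(row[column], []).append(row_id)
    return sorted(groups.items())


class HybridSegment:
    """One time segment: two indexes that point into the same row storage."""

    def __init__(self, rows):
        self.rows = rows                                  # stored once, shared
        self.temporal = bulk_load(group_rows(rows, 0))    # key: timestamp
        self.by_object = bulk_load(group_rows(rows, 1))   # key: object id

    def at_time(self, timestamp):
        return [self.rows[i] for i in lookup(self.temporal, timestamp)]

    def history_of(self, object_id):
        return [self.rows[i] for i in lookup(self.by_object, object_id)]
```

Both indexes store only row positions, so the rows themselves are kept once per segment. Building from sorted input fills the leaves in a single pass and derives each parent level from the level below it, which avoids the repeated node splits that inserting one key at a time would cause.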

1. Introduction

In an era where data are produced over time and shared at an unprecedented pace, mining the information in big data has become increasingly crucial. Temporal information is the natural and basic description of how real-world objects develop and change, and almost everything has explicit or implicit temporal features. Because traditional snapshot databases record information only as of a specific time, they struggle to reflect the dynamic changes of the real world sufficiently and accurately. Managing and retrieving temporal big data is therefore becoming increasingly urgent in most modern database systems.

Temporal big data management has already attracted wide attention in both academia and industry. Tang [1] proposed the concept of bi-temporal data early on. In that work, each tuple of the temporal table carries two time intervals, each of the form [start, end], representing transaction time and valid time (a.k.a. system time and application time, respectively). He also proposed using a time interval as a key, a breakthrough for traditional databases, which accept only numeric or character keys. On this basis, many temporal database prototypes have been implemented, such as TimeDB [2] and TempDB [3]. Driven by this research and by real applications, ISO/IEC published a new edition of the SQL standard, SQL:2011 [4,5], in December 2011, which includes important functionality for creating and manipulating temporal tables. Meanwhile, many popular commercial databases, such as Oracle [6], IBM DB2 [7], and SAP HANA [8], also include temporal features. As temporal databases have developed, some key technologies of traditional databases have been re-examined. As an important way to accelerate query performance, indexing has received considerable attention. Some index structures have been proposed to support

∗ Corresponding author. E-mail address: psc346@aliyun.com (S. Peng).

http://dx.doi.org/10.1016/j.future.2016.08.002

