HB-File: A US-ELM-Driven Storage Optimization Scheme for High-Dimensional Big Data


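HB-File is an efficient and effective storage structure for high-dimensional big data, based on the US-ELM (Unsupervised Extreme Learning Machine) algorithm. It targets the growing demand for processing high-dimensional big data that has come with the rapid development of computer and Internet technologies. High-dimensional big data comes from sources such as network transaction data, user review data and multimedia data. It combines the characteristics of high-dimensional data and big data, which raises new problems and challenges for processing and optimization, so a storage structure designed for such data is especially important.

US-ELM is a variant of the Extreme Learning Machine (ELM). ELM builds on a single-hidden-layer feedforward neural network: the hidden-node weights and biases are initialized randomly, and only the output-layer weights are solved for (by least squares in the standard supervised case), which makes training fast. US-ELM applies the same random hidden layer to unlabeled data, so it is used for embedding and clustering rather than for supervised classification or regression. On high-dimensional data, this family of algorithms is computationally fast and generalizes well.

The HB-File storage structure builds on US-ELM to address the storage efficiency and query performance of high-dimensional big data. According to the article, HB-File appears to use a distributed file system such as HDFS (Hadoop Distributed File System) as its foundation and to combine it with US-ELM to organize and index the data. In this way, HB-File is intended to reduce data redundancy, lower storage cost and speed up data retrieval. High-dimensional data typically suffers from the "curse of dimensionality": as the number of dimensions grows, the differences between data points shrink, which makes classification and clustering harder. Through the dimensionality reduction that US-ELM provides, HB-File may obtain a compact representation of high-dimensional data, lowering its complexity and thereby improving storage and query efficiency. HB-File may also take the scalability and fault tolerance of big data systems into account, to cope with growing data volumes and possible hardware failures; this could include replication strategies, failure detection and recovery mechanisms, and data distribution and load balancing strategies.

HB-File is a solution proposed for the storage challenges of high-dimensional big data. It draws on the strengths of US-ELM and aims to provide an efficient, flexible and scalable storage architecture, offering a new approach and technical support for managing and analyzing high-dimensional big data.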
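The ELM training step mentioned in the summary above (random hidden-layer weights, output weights solved by least squares) can be sketched as follows. This is a minimal illustration of the basic supervised ELM only; it is not US-ELM and not the paper's algorithm. The function names `elm_fit` and `elm_predict`, the tanh activation, the hidden-layer size and the toy data are all assumptions made for the example.

```python
import numpy as np


def elm_fit(X, y, n_hidden=50, seed=0):
    """Train a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn once at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Only the output weights are fitted, by solving min ||H beta - y||^2.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the fitted output weights."""
    return np.tanh(X @ W + b) @ beta


# Toy regression problem (illustrative data only, not from the paper).
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because the hidden layer is never updated, training reduces to a single linear solve, which is where the speed advantage described above comes from.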

Neurocomputing 261 (2017) 184–192


journal homepage: www.elsevier.com/locate/neucom

HB-File: An efficient and effective high-dimensional big data storage structure based on US-ELM

Linlin Ding, Yu Liu, Baishuo Han, Shiwen Zhang, Baoyan Song ∗

School of Information, Liaoning University, Shenyang 110036, China

Article info

Article history:
Received 30 September 2015
Revised 11 June 2016
Accepted 16 June 2016
Available online 16 February 2017

Keywords:
US-ELM
HDFS
Big data
High-dimensional data

Abstract

With the rapid development of computer and Internet technologies, the amount of data in all walks of life is increasing sharply, and large volumes of high-dimensional big data are accumulating, such as network transaction data, user review data and multimedia data. High-dimensional big data combines the typical features of both high-dimensional data and big data, which brings new problems and great challenges for processing and optimization. In this setting, the storage structure of high-dimensional big data is a critical factor that fundamentally affects processing performance. However, because of the very large number of dimensions, existing data storage techniques such as row-store and column-store are not well suited to high-dimensional, large-scale data. Therefore, in this paper, we present an efficient high-dimensional big data storage structure based on US-ELM, the High-dimensional Big Data File, named HB-File. We then propose a fuzzy clustering algorithm based on US-ELM that differentiates the key dimensions from the non-key dimensions of high-dimensional big data and also yields the clusters of the key dimensions. After that, we propose the execution and API of HB-File on Hadoop, the open-source implementation of MapReduce. Extensive experiments show the effectiveness of HB-File for storing high-dimensional big data.

Introduction

With the rapid development of computers and the improvement of human cognitive abilities, people's understanding of things continues to broaden and deepen. Many attributes are derived to describe things and entities, so high-dimensional data is generated, such as network transaction data, mine microseism data, user review data and multimedia data. With the arrival of the data explosion era, many of the data sets to be processed and analyzed have become "big data", so more and more high-dimensional data forms high-dimensional big data. For example, close to 3.2 billion user comments are posted on Facebook every day. High-dimensional big data contains valuable knowledge and information, and it has theoretical importance and a wide range of applications. Besides the four typical characteristics of big data (Volume, Variety, Value and Velocity), high-dimensional big data also has its own complex structure and numerous dimensions. That is, high-dimensional big data combines the typical features of both high-dimensional data and big data, which brings new problems and challenges for query processing and optimization. In this setting, the storage structure of high-dimensional big data is a critical factor that fundamentally affects processing performance.

∗ Corresponding author. E-mail address: bysong@lnu.edu.cn (B. Song).

However, the existing storage structures for big data are not suitable for high-dimensional big data because of its numerous dimensions. For example, the column-store structure, typified by HBase [1], fits data with sparse columns well. But because the dimensions of high-dimensional big data are numerous and closely related to one another, managing such data with pure column-store technology would require numerous join operations among the dimensions when reconstructing the data. If we instead use a row-store structure, typified by HDFS [2], each data record would be very long because it has so many dimensions. Each data block would then hold only a few records, which reduces storage efficiency. In short, an efficient storage model for high-dimensional big data is urgently needed.

Therefore, in this paper, we present an efficient high-dimensional big data storage model, the High-dimensional Big Data File, named HB-File. First, a table stored high-dimensional big data

http://dx.doi.org/10.1016/j.neucom.2016.06.080
