Distributed Storage Explained: HBase and BigTable

PDF, 67 KB, updated 2024-08-07

"理解HBase和BigTable的概念，这两者都是分布式数据存储系统，尤其是HBase作为开源实现的Google BigTable的变体。" HBase和BigTable是两种强大的分布式数据存储系统，尤其适用于处理大规模、非结构化或半结构化的数据。它们的设计灵感来源于谷歌的BigTable论文，而HBase则是Apache Hadoop生态系统中的一个关键组件，专门针对海量数据存储进行了优化。首先，让我们从BigTable的基本概念开始。BigTable被定义为“稀疏、分布式、持久的多维排序映射”。这个定义可能让人感到困惑，但其实它在说BigTable是一个可以存储大量数据的表格，这些数据分布在网络上的多个节点上，且能够根据多个维度进行排序。这里的“稀疏”意味着表格中的大多数单元格可能是空的，因为并非所有数据都必须在表格中预先分配。 BigTable的数据模型由行、列族、列和时间戳组成。行是数据的主键，可以快速定位到数据；列族是一组相关的列，它们共享相同的前缀；列是在列族内的具体属性；时间戳用于版本控制，每个单元格可以有多个版本，便于追踪数据的变化。 HBase作为BigTable的开源实现，保留了这些基本特性，但在设计上更注重于与Hadoop的集成。HBase利用HDFS（Hadoop分布式文件系统）作为其底层存储，并通过Zookeeper进行协调和管理。HBase的数据模型与BigTable相似，但它提供了更灵活的API，使得开发者更容易在Java和其他支持的语言中使用。在决定何时使用HBase而非传统的关系型数据库时，主要考虑以下几点： 1. 数据规模：如果数据量巨大，传统的单机数据库可能无法应对，HBase的分布式架构能很好地扩展存储能力。 2. 数据类型：HBase适合处理非结构化或半结构化数据，如日志、传感器数据等，而传统数据库更适合结构化数据。 3. 查询模式：HBase擅长随机读取和基于行键的快速查询，而复杂的SQL查询则不是它的强项。 4. 实时性需求：HBase提供低延迟的读写操作，对于实时数据处理场景非常有用。 HBase和BigTable是为大数据环境设计的，它们以列式存储、分布式架构和强大的扩展性解决了传统数据库在处理大规模数据时面临的挑战。理解它们的概念和工作原理对于有效地利用这些系统至关重要。通过深入学习，开发者可以更好地判断何时应选择HBase来满足特定的存储和分析需求。

Understanding HBase and BigTable

https://dzone.com/articles/understanding-hbase-and-bigtab

The hardest part about learning HBase (the open source implementation of Google's BigTable) is just wrapping your mind around the concept of what it actually is.

I find it rather unfortunate that these two great systems contain the words "table" and "base" in their names, which tends to cause confusion among RDBMS-indoctrinated individuals (like myself).

This article aims to describe these distributed data storage systems from a conceptual standpoint. After reading it, you should be better able to make an educated decision about when you might want to use HBase and when you'd be better off with a "traditional" database.

It's all in the terminology

Fortunately, Google's BigTable paper clearly explains what BigTable actually is. Here is the first sentence of the "Data Model" section:

A Bigtable is a sparse, distributed, persistent multidimensional sorted map.

Note: At this juncture I like to give readers the opportunity to collect any brain matter which may have left their skulls upon reading that last line.

The BigTable paper continues, explaining that:

The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
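That sentence translates almost directly into a Python dict. This is a rough sketch under simple assumptions: the whole store is one dict keyed by (row key, column key, timestamp) tuples, values are raw `bytes`, and the row and column names are invented for illustration. The helper `latest` is hypothetical. The sort key puts newer timestamps first, since the paper notes that versions of a cell are stored newest first so that recent versions can be read first.

```python
# Each key is a (row key, column key, timestamp) triple; each value is
# uninterpreted bytes, so decoding them is up to the application.
cells = {
    ("com.example.www", "contents:", 3): b"<html>v1</html>",
    ("com.example.www", "contents:", 5): b"<html>v2</html>",
    ("com.example.www", "anchor:other.example", 9): b"Example",
}


def latest(cells, row, column):
    """Return the value with the highest timestamp for (row, column), or None."""
    versions = [(ts, value) for (r, c, ts), value in cells.items()
                if r == row and c == column]
    if not versions:
        return None
    # Compare on the timestamp only.
    return max(versions, key=lambda v: v[0])[1]


# Sorted view: by row, then column, then newest timestamp first.
ordered = sorted(cells, key=lambda k: (k[0], k[1], -k[2]))

print(latest(cells, "com.example.www", "contents:"))
for key in ordered:
    print(key)
```

A real system would not scan every cell to find the latest version; the sorted order exists so that the newest version of a cell can be found by position rather than by search.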

Along those lines, the HbaseArchitecture page of the Hadoop wiki posits that:

HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes.
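The "crazily-varying columns" idea is easiest to see with nested dicts. A small Python sketch, assuming a table is modeled as a dict from row key to that row's own dict of columns; the rows, column names, and values are invented for illustration. Each row stores only the columns it actually has, so a missing cell costs nothing.

```python
# A sparse table: each row maps only the columns it actually has.
table = {
    "row1": {"info:name": b"alice", "info:email": b"alice@example.com"},
    "row2": {"info:name": b"bob", "stats:logins": b"42"},
    "row3": {"metrics:cpu": b"0.75"},
}

# Every distinct column used anywhere in the table.
all_columns = sorted({col for row in table.values() for col in row})

# Cells actually stored versus cells a dense rows-by-columns grid would need.
stored_cells = sum(len(row) for row in table.values())
dense_cells = len(table) * len(all_columns)

print(all_columns)
print(f"{stored_cells} cells stored vs {dense_cells} in a dense grid")

# An absent cell is simply not there; there is no NULL placeholder.
print(table["row3"].get("info:name"))
```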

Although all of that may seem rather cryptic, it makes sense once you break it down a word at a time. I like to discuss the terms in this sequence: map, persistent, distributed, sorted, multidimensional, and sparse.

Rather than trying to picture a complete system all at once, I find it easier to build up a mental framework piecemeal, to ease into it...

map

At its core, HBase/BigTable is a map. Depending on your programming language background, you may be more familiar with the terms associative array (PHP), dictionary (Python), Hash (Ruby), or Object (JavaScript).

From the Wikipedia article, a map is "an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value."
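In Python terms, that definition is simply a dict. A minimal sketch with made-up keys and values, showing that each key is associated with exactly one value and that writing to an existing key replaces its value rather than adding a second one.

```python
# A map (Python dict): a collection of keys, each associated with one value.
colors = {"apple": "red", "banana": "yellow", "grape": "purple"}

# Look up the single value associated with a key.
print(colors["banana"])

# Assigning to an existing key replaces its value; a key never holds two values.
colors["apple"] = "green"
print(colors["apple"], len(colors))
```

The remaining terms in the definition (persistent, distributed, sorted, multidimensional, sparse) each add one property on top of this basic structure.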
