Bigdata® RDF Database Technical Whitepaper: A High-Performance Open-Source Solution


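"This technical whitepaper, published by SYSTAP, LLC, covers the design and technical details of the bigdata® RDF database. The database is a standards-based, high-performance, scalable, open-source graph database. It supports the SPARQL 1.1 family of specifications, including Query, Update, Basic Federated Query, and Service Description. The platform emphasizes durable named solution sets, efficient storage and querying of reified statement models, and scalable graph analytics. It is multi-tenant and can be deployed as an embedded database, as a standalone server, or as a highly available replication cluster. It also supports a horizontally sharded federation of services, similar to Google's Bigtable, Apache Accumulo, or Cassandra. The open-source platform has been under continuous development since 2006 and is dual-licensed under GPLv2 and a commercial license; a number of well-known companies OEM, resell, or embed it in their applications. SYSTAP, LLC, the project's principal developer, maintains the open-source project and offers support subscriptions to both commercial users and the open-source community. The whitepaper examines how to build, optimize, and manage complex graph data structures as data volumes grow, and how to use the bigdata® RDF database for efficient data processing and analysis. It is a useful technical reference for organizations and individuals working with large-scale graph data."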

The bigdata® RDF Database Technical Whitepaper

SYSTAP, LLC Page 5 of 25 5/29/2013

In bigdata, an index maps unsigned byte[] keys to byte[] values. Mechanisms are provided which support the encoding of single and multi-field numeric, ASCII, and Unicode data as keys. Likewise, extensible mechanisms provide for (de)serialization of application data as byte[]s for values. An index entry is known as a “tuple”. In addition to the key and value, a tuple contains a “deleted” flag, which is used to prevent reads through to historical data in index views (discussed below), and a revision timestamp, which supports optional transaction processing based on Multi-Version Concurrency Control (MVCC). The IndexMetadata object is used to configure both local and scale-out indices. Some of its most important attributes are the index name, the index UUID, the branching factor, the objects that know how to serialize application keys and both serialize and deserialize application values stored in the index, and the key and value coder objects.
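To make the idea of order-preserving unsigned byte[] keys concrete, here is a minimal sketch in Java. It is not the bigdata key builder: the class name `SortableKeys`, its methods, and the 0x00 field terminator are assumptions chosen for illustration. It shows one common way to encode a signed long and a two-field (ASCII, long) key so that unsigned byte-wise comparison matches the natural ordering of the fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the bigdata API.
final class SortableKeys {

    /** Encode a signed long so that unsigned byte order matches numeric order. */
    static byte[] encodeLong(long v) {
        long u = v ^ Long.MIN_VALUE; // flip the sign bit: negatives sort first
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) { // big-endian: most significant byte first
            b[i] = (byte) u;
            u >>>= 8;
        }
        return b;
    }

    /**
     * Two-field key: ASCII text, a 0x00 terminator, then the encoded long.
     * Assumes the text itself never contains a 0x00 byte.
     */
    static byte[] encodeKey(String ascii, long v) {
        byte[] s = ascii.getBytes(StandardCharsets.US_ASCII);
        byte[] k = new byte[s.length + 1 + 8];
        System.arraycopy(s, 0, k, 0, s.length);
        k[s.length] = 0; // terminator keeps "ab" ordered before "abc"
        System.arraycopy(encodeLong(v), 0, k, s.length + 1, 8);
        return k;
    }

    /** Compare keys as unsigned bytes (requires Java 9 or later). */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

With this encoding, `compare(encodeLong(-1), encodeLong(0))` is negative, so a range scan over the raw bytes visits numeric values in ascending order without decoding them.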

The B+Tree never overwrites records (nodes or leaves) on the disk. Instead, it uses copy-on-write for clean records, expands them into Java objects for fast mutation, and places them onto a hard reference ring buffer for that B+Tree instance. On eviction from the ring buffer, and during checkpoint operations, records are coded into their binary format and written to the backing store.
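The eviction-driven write path described above can be sketched roughly as follows. This is a simplified illustration, not the bigdata implementation: the class `DirtyNodeRing`, its methods, and the eviction callback are hypothetical, and the sketch ignores details such as a node appearing in the buffer more than once.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;

// Hypothetical fixed-capacity buffer of hard references to mutable records.
final class DirtyNodeRing<T> {
    private final ArrayDeque<T> ring = new ArrayDeque<>();
    private final int capacity;
    private final Consumer<T> onEvict; // stands in for "code the record and write it"

    DirtyNodeRing(int capacity, Consumer<T> onEvict) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.onEvict = onEvict;
    }

    /** Retain a mutated record; the oldest record is evicted when the buffer is full. */
    void touch(T node) {
        if (ring.size() == capacity) {
            onEvict.accept(ring.removeFirst());
        }
        ring.addLast(node);
    }

    /** Checkpoint: evict everything still buffered, oldest first. */
    void checkpoint() {
        while (!ring.isEmpty()) {
            onEvict.accept(ring.removeFirst());
        }
    }
}
```

The design point the sketch tries to capture is that serialization cost is paid only when a record leaves the buffer or at a checkpoint, so a record that is mutated many times while buffered is coded once.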

Records can be directly accessed in their coded form. The default key coding technique is front coding, which supports fast binary search with good compression. Canonical Huffman coding is supported for values. Custom coders may be defined, and can be significantly faster for specific applications.
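As an illustration of front coding, the sketch below stores each key in a sorted run as the number of leading bytes it shares with the previous key plus the remaining suffix. The class `FrontCoding` and its `Entry` type are hypothetical and do not reflect the bigdata coder's binary layout. In particular, the sketch decodes sequentially; coders that need binary search over the coded form generally also store some keys in full at regular intervals, which is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of front coding over a sorted list of keys.
final class FrontCoding {

    /** One coded key: bytes shared with the previous key, then the remaining suffix. */
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Code each key relative to its predecessor; the first key is stored in full. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int max = Math.min(prev.length, key.length);
            int n = 0;
            while (n < max && prev[n] == key[n]) n++; // length of the common prefix
            out.add(new Entry(n, Arrays.copyOfRange(key, n, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuild the full keys by replaying the shared prefixes in order. */
    static List<byte[]> decode(List<Entry> coded) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : coded) {
            byte[] key = Arrays.copyOf(prev, e.shared + e.suffix.length);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

Keys with long common prefixes, such as URIs in the same namespace, compress well under this scheme because only the differing tail of each key is stored.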

The high-level API for the B+Tree includes methods that operate on a single key-value pair (insert, lookup, contains, remove), methods that operate on key ranges (rangeCount, rangeIterator), and a set of methods to submit Java procedures that are mapped against the index and execute locally on the appropriate data services (see below). Scale-out applications make extensive use of the key-range methods, mapped index procedures, and asynchronous write buffers to ensure high performance with distributed data.

The rangeCount(fromKey,toKey) method is of particular relevance for query planning. The B+Tree nodes internally track the number of tuples spanned by each separator key. Using this information, the B+Tree can report the cardinality of a key-range on an index using only two key probes against the index. This range count will be exact unless delete markers are being used, in which case it will be an upper bound (the range count includes the tuples with delete markers). Fast range counts are also available on a federation, where a key-range may span multiple index partitions.
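The two-probe range count, and why delete markers turn it into an upper bound, can be sketched as follows. This is an illustration under stated assumptions, not the bigdata B+Tree: a sorted array stands in for the tree, keys are longs rather than byte[]s for brevity, the range is assumed to be half-open (fromKey inclusive, toKey exclusive), and the class `RangeCountSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for an index: sorted distinct keys plus delete markers.
final class RangeCountSketch {
    private final long[] keys;       // sorted ascending, no duplicates
    private final boolean[] deleted; // deleted[i] marks keys[i] as a delete marker

    RangeCountSketch(long[] sortedKeys, boolean[] deleted) {
        if (sortedKeys.length != deleted.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        this.keys = sortedKeys;
        this.deleted = deleted;
    }

    /** Position of the first key >= k; in a B+Tree this is one root-to-leaf probe. */
    private int probe(long k) {
        int i = Arrays.binarySearch(keys, k);
        return i >= 0 ? i : -(i + 1);
    }

    /** Fast count over [fromKey, toKey): two probes; includes delete markers. */
    long rangeCount(long fromKey, long toKey) {
        return Math.max(0, probe(toKey) - probe(fromKey));
    }

    /** Exact count: must visit every tuple in the range to skip delete markers. */
    long rangeCountExact(long fromKey, long toKey) {
        int from = probe(fromKey);
        int to = probe(toKey);
        long n = 0;
        for (int i = from; i < to; i++) {
            if (!deleted[i]) n++;
        }
        return n;
    }
}
```

For a query planner the fast count is usually sufficient, since an upper bound on cardinality costs two probes regardless of how many tuples the range spans, while the exact count costs a scan of the range.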

Scale-Up Architecture

The Journal manages a backing store, provides low-level mechanisms for writing and reading allocations on that store, and has higher-level mechanisms for registering and operating on indices. There are several different backing store models for the Journal. The most important are described below.

On the mapping of unsigned byte[] keys to byte[] values: we are reviewing this design decision with respect to column-wise storage.

Reed, D.P. "Naming and Synchronization in a Decentralized Computer System". MIT dissertation. http://www.lcs.mit.edu/publications/specpub.php?id=773

Huffman coding, http://en.wikipedia.org/wiki/Huffman_coding

Canonical Huffman coding, http://en.wikipedia.org/wiki/Canonical_Huffman_code
