Figure 2: System Layers (from bottom to top: Distributed Memory Storage hosting the Memory Cloud, a distributed key-value store; the Message Passing Framework; the Trinity Specification Language; the Graph Model; and Graph Operations such as GetInlinks(), Outlinks.Foreach(...), etc.)
the data storage. Due to the diversity of graphs and the diversity of graph applications, it is hard, if not entirely impossible, to support efficient general-purpose graph computation using a fixed graph schema. Instead of using a fixed graph schema and fixed computation models, Trinity lets users define graph schemas, communication protocols, and computation paradigms through TSL.
3. THE MEMORY CLOUD
We create a distributed memory cloud as Trinity's storage infrastructure. The memory cloud consists of 2^p memory trunks, each of which is stored on a machine. Usually, we have 2^p > m, where m is the number of machines. In other words, each machine hosts multiple memory trunks. The reason we partition a machine's local memory space into multiple memory trunks is twofold: 1) trunk-level parallelism can be achieved without any locking overhead; 2) the performance of a single huge hash table is suboptimal due to a higher probability of hashing conflicts. Essentially, the entire memory cloud is partitioned into 2^p memory trunks. To support fault-tolerant data persistence, these memory trunks are also backed up in a shared distributed file system called TFS (Trinity File System), which is similar to HDFS [10].
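As an illustration, the following C++ sketch shows one way (an assumption for illustration, not Trinity's actual placement policy) to spread 2^p memory trunks over m machines when 2^p > m, so that every machine hosts multiple trunks:

```cpp
// Minimal sketch of trunk placement, assuming a round-robin policy.
#include <cstdint>
#include <vector>

std::vector<int> InitialTrunkPlacement(int p, int m) {
    const uint64_t trunkCount = 1ull << p;          // 2^p memory trunks in total
    std::vector<int> trunkToMachine(trunkCount);
    for (uint64_t i = 0; i < trunkCount; ++i)
        trunkToMachine[i] = static_cast<int>(i % m); // each machine hosts several trunks
    return trunkToMachine;
}
```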
On top of the memory cloud, we create a key-value store. A key-value pair forms the most basic data structure of the graph system or any system built on top of the memory cloud. Here, keys are 64-bit globally unique identifiers, and values are blobs of arbitrary length. As the memory cloud is distributed across multiple machines, we cannot address a key-value pair using its physical memory address. To address a key-value pair, Trinity uses a hashing mechanism. In order to locate the value of a given key, we first 1) identify the machine that stores the key-value pair, and then 2) locate the key-value pair in one of the memory trunks on that machine. Through this hashing mechanism (shown in Figure 3), we provide a globally addressable memory space.
Specifically, given a 64-bit key, to locate its corresponding value in the memory cloud, we hash the key to a p-bit value i ∈ [0, 2^p − 1]. This means the key-value pair is stored in memory trunk i within the memory cloud. To find out which machine memory trunk i is in, we maintain an "addressing table" that contains 2^p slots, where each slot stores a machine ID. Essentially, we implement a consistent hashing mechanism that allows machines to join and leave the memory cloud (described later).
Figure 3: Data Partitioning and Addressing (a 64-bit UID is hashed to a p-bit code that indexes the 2^p-slot addressing table mapping trunks to machines; each memory trunk on a Trinity slave stores cell bytes located by UID, offset, and size, and the trunks are backed up in the Trinity File System)
Furthermore, in order for global addressing to work, each machine keeps a replica of the addressing table. We describe how we ensure the consistency of the addressing tables in Section 6.2.
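To make the addressing scheme concrete, here is a minimal C++ sketch of the key-to-machine resolution described above; the hash function, the value of p, and the AddressingTable type are illustrative assumptions rather than Trinity's actual implementation:

```cpp
// Sketch of global addressing: hash a 64-bit key to a p-bit trunk id,
// then look up the hosting machine in the replicated addressing table.
#include <cstdint>
#include <functional>
#include <vector>

constexpr int p = 8;                          // assumed number of trunk-id bits
constexpr uint64_t kTrunkCount = 1ull << p;   // 2^p memory trunks

struct AddressingTable {
    std::vector<int> slot;                    // slot[i] = machine ID hosting trunk i
    AddressingTable() : slot(kTrunkCount, 0) {}
};

// Hash a 64-bit key to a p-bit value i in [0, 2^p - 1].
uint64_t TrunkId(uint64_t key) {
    return std::hash<uint64_t>{}(key) & (kTrunkCount - 1);
}

// Resolve the machine that currently hosts the key's trunk.
int MachineOf(uint64_t key, const AddressingTable& table) {
    return table.slot[TrunkId(key)];
}
```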
We then locate the key-value pair in memory trunk i, which is stored on the machine whose ID is in slot i of the addressing table. Each memory trunk is associated with a hash table. We hash the 64-bit key again to find the offset and size of the key-value pair in the hash table. Given the memory offset and the size, we retrieve the key-value pair from the memory trunk.
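The per-trunk lookup can be sketched as follows; the MemoryTrunk layout and the use of an in-memory hash map from key to (offset, size) are assumptions for illustration only:

```cpp
// Sketch of the trunk-local lookup: the per-trunk hash table maps a 64-bit
// key to the cell's offset and size inside the trunk's memory buffer.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct CellAddr { uint64_t offset; uint32_t size; };

struct MemoryTrunk {
    std::vector<uint8_t> memory;                   // contiguous trunk storage
    std::unordered_map<uint64_t, CellAddr> index;  // key -> (offset, size)

    // Copy the cell bytes out of the trunk; returns empty if the key is absent.
    std::vector<uint8_t> Load(uint64_t key) const {
        auto it = index.find(key);
        if (it == index.end()) return {};
        const CellAddr& a = it->second;
        return std::vector<uint8_t>(memory.begin() + a.offset,
                                    memory.begin() + a.offset + a.size);
    }
};
```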
The addressing table provides a mechanism that allows machines to dynamically join and leave the memory cloud. When a machine fails, we reload the memory trunks it owns from the TFS onto other alive machines. All we need to do is update the addressing table so that the corresponding slots point to the machines that now host the data. Similarly, when new machines join the memory cloud, we relocate some memory trunks to the new machines and update the addressing table accordingly.
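A minimal sketch of the failover step is shown below, assuming a simple round-robin reassignment policy; the actual reassignment policy and the TFS reload itself are outside this illustration:

```cpp
// Sketch of updating the addressing table after a machine failure: every slot
// that pointed to the failed machine is repointed to a live machine that has
// reloaded the corresponding trunk from TFS.
#include <vector>

void FailOver(std::vector<int>& addressingTable,
              int failedMachine,
              const std::vector<int>& liveMachines) {
    size_t next = 0;
    for (int& slot : addressingTable) {
        if (slot == failedMachine) {
            // In the real system the trunk is first reloaded from TFS on the
            // target machine; here we only show the table update.
            slot = liveMachines[next++ % liveMachines.size()];
        }
    }
}
```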
Each key-value pair in the memory cloud may carry some metadata for various purposes. Most notably, we may associate each key-value pair with a spin lock. Spin locks are used for concurrency control and physical memory pinning. Multiple threads may try to access the same key-value pair concurrently, and a physical key-value pair may also be moved by the memory defragmentation thread, as elaborated in Section 6.1. Therefore, we must ensure a key-value pair is locked and pinned to a fixed memory position before allowing any thread to manipulate it. For applications that are not read-only, the spin lock mechanism gives a thread exclusive access to a pinned physical key-value pair by requiring all threads to acquire the lock before accessing or moving a cell.
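A per-cell spin lock of the kind described above might look like the following sketch; the CellSpinLock type and the LockFor helper are hypothetical and only illustrate the acquire-before-access discipline:

```cpp
// Sketch of a per-cell spin lock: both worker threads and the memory
// defragmentation thread must hold the lock before reading, writing, or
// moving a cell, which keeps the cell pinned at a fixed position meanwhile.
#include <atomic>

struct CellSpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

    void Lock()   { while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void Unlock() { flag.clear(std::memory_order_release); }
};

// Usage (LockFor is a hypothetical helper that finds the cell's lock):
//   CellSpinLock& lock = LockFor(key);
//   lock.Lock();   /* read, write, or move the cell */   lock.Unlock();
```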
4. DATA MODEL
Trinity is designed to handle graph data of diverse characteristics. In this section, we describe data modeling issues