Megalloc*: Fast Distributed Memory Allocator for
NVM-based Cluster
Songping Yu, Nong Xiao, Mingzhu Deng, Yuxuan Xing, Fang Liu, Wei Chen
State Key Laboratory of High Performance Computing
National University of Defense Technology
Changsha, China
we.isly@163.com
Abstract—As emerging Non-Volatile Memory (NVM) technologies, such as 3D XPoint, enter production, there has been a recent push in the big data processing community from storage-centric towards memory-centric architectures. In large-scale systems, distributed memory management over a traditional TCP/IP network exposes a performance bottleneck: the CPU-centric network stack involves context switches, memory copies, and related overheads. Remote Direct Memory Access (RDMA) offers a tremendous performance advantage over TCP/IP by allowing direct access to remote memory, bypassing the OS kernel. In this paper, we propose Megalloc, a distributed NVM allocator that exposes the NVM of a cluster of machines as a shared address space based on RDMA. First, it makes memory allocation metadata directly accessible to each machine and allocates NVM in a coarse-grained way; second, it adopts fine-grained memory chunks for applications to read and store data; finally, it delivers high distributed memory allocation performance.
Keywords—distributed memory, memory allocator, RDMA,
Non-Volatile Memory
I. INTRODUCTION
Recent years have seen a growing demand for large-scale data mining and data analysis applications, spurred by the development of novel solutions from both industry and the sciences. MapReduce [1] is a framework introduced by Google for programming commodity computer clusters to perform large-scale data processing; it relieves application developers of the complex details of running a distributed program, such as data distribution, task scheduling, and fault tolerance [2]. It moved the large-scale data processing community one step forward and gave birth to the open-source Apache top-level project Hadoop [3]. We should bear in mind, however, that MapReduce's innovative data processing model does not change the disk-oriented storage architecture it builds on; it has been shown that disk performance has not improved as rapidly as disk capacity, and that it is increasingly difficult to scale disk-based systems to meet the needs of large-scale data applications [4].
RAMCloud [4] argued for a new approach that shifts the primary locus of online data from disk to random access memory: information is kept entirely in DRAM, with disk relegated to a backup/archival role in the data center. This new storage paradigm spawned a data processing variant of the MapReduce model, Spark [5], which keeps datasets in memory as much as possible. Spark presents a data abstraction for big data analytics, the Resilient Distributed Dataset (RDD) [6], a coarse-grained, deterministic, immutable data structure that can be persisted in memory, on disk, or both; when memory capacity is insufficient to hold the data sets, data are evicted to storage.
It is necessary to expand storage capacity with HDD/SSD due to DRAM's density limitation [7][8]. However, HDD/SSD brings significant I/O overhead. In response to this issue, emerging Non-Volatile Memory (NVM) technologies, such as Phase Change Memory (PCM) [9] and 3D XPoint [10], incorporate a host of desirable features: access speeds comparable to DRAM, storage-like persistence, low power consumption, and byte addressability. These new types of memory show promise as candidate main memory, with performance comparable to DRAM and much higher capacity. In particular, NVM products are expected to hit the market in the next few years; for example, 3D XPoint technology has been announced by Intel and Micron with an expected arrival time of 2016 [10]. NVMs such as 3D XPoint are also expected to be deployed with 4x the capacity of DRAM in future systems [11], so that storage I/O is eliminated because the data reside in NVM.
Storing big data requires coalescing the NVM of many machines. In the currently popular shared-nothing architecture, distributed memory provisioning is achieved by connecting machines over a traditional TCP/IP network, and each machine allocates its own memory with a local memory allocator (e.g., glibc malloc, the JVM). However, on the one hand, the memory allocation performance of this scheme suffers from the network bottleneck; on the other hand, connecting machines with a fast network (e.g., InfiniBand [13]) and accessing memory with Remote Direct Memory Access (RDMA) technology exposes the allocation overhead of local memory allocators such as glibc malloc.
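To make this contrast concrete, the following minimal sketch (using the standard libibverbs API, not code from Megalloc) shows how a one-sided RDMA read fetches remote memory without involving the remote CPU or OS kernel; the queue pair qp, completion queue cq, registered local memory region local_mr, and the remote address/rkey pair are assumed to have been established during connection setup.

/* Minimal sketch: post a one-sided RDMA read with libibverbs.
 * qp, cq, local_buf, local_mr, remote_addr, and remote_rkey are assumed
 * to have been created and exchanged during connection setup. */
#include <infiniband/verbs.h>
#include <stdint.h>

static int rdma_read(struct ibv_qp *qp, struct ibv_cq *cq,
                     void *local_buf, uint32_t len, struct ibv_mr *local_mr,
                     uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,   /* local destination buffer */
        .length = len,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_READ,   /* one-sided: remote CPU not involved */
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr; /* remote virtual address */
    wr.wr.rdma.rkey        = remote_rkey; /* remote memory region key */

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))  /* hand the work request to the NIC */
        return -1;

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)  /* busy-poll until the read completes */
        ;
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}

A TCP/IP request for the same data would traverse the remote kernel's network stack and require the remote CPU to copy the data; avoiding that path is what makes the local allocator's own overhead the dominant cost under RDMA.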
In this paper, we design a distributed memory allocator, Megalloc, which exposes the NVM of all machines in the cluster as a shared memory space through RDMA and allocates remote NVM dynamically. In allocating dynamically, Megalloc follows two basic principles: 1)
* Megalloc is short for megabyte allocator, pronounced [meg'æləuk]