Megalloc*: Fast Distributed Memory Allocator for
NVM-based Cluster
Songping Yu, Nong Xiao, Mingzhu Deng, Yuxuan Xing, Fang Liu, Wei Chen
State Key Laboratory of High Performance Computing
National University of Defense Technology
Changsha, China
we.isly@163.com
Abstract—As emerging Non-Volatile Memory (NVM) technologies, such as 3D XPoint, enter production, there has been a recent push in the big data processing community from storage-centric towards memory-centric architectures. In large-scale systems, distributed memory management over a traditional TCP/IP network exposes a performance bottleneck: the CPU-centric network stack involves context switches, memory copies, and related overheads. Remote Direct Memory Access (RDMA) offers a tremendous performance advantage over TCP/IP by allowing direct access to remote memory, bypassing the OS kernel. In this paper, we propose Megalloc, a distributed NVM allocator that exposes the NVM of a cluster of machines as a shared address space based on RDMA. First, it makes memory allocation metadata directly accessible to each machine and allocates NVM in a coarse-grained way; second, it adopts fine-grained memory chunks for applications to read and store data; finally, it delivers high distributed memory allocation performance.
Keywords—distributed memory, memory allocator, RDMA,
Non-Volatile Memory
I. INTRODUCTION
Recent years have seen a growing demand for large-scale data mining and data analysis applications, spurred by the development of novel solutions from both industry and the sciences. MapReduce [1] is a framework introduced by Google for programming commodity computer clusters to perform large-scale data processing; it relieves application developers of the complex details of running a distributed program, such as data distribution, task scheduling, and fault tolerance [2]. It moved the large-scale data processing community one step forward and gave birth to the open-source Apache top-level project Hadoop [3]. We should bear in mind, however, that MapReduce's innovative data processing model does not change the disk-oriented storage architecture it builds on; it has been shown that disk performance has not improved as rapidly as disk capacity, and that it is increasingly difficult to scale disk-based systems to meet the needs of large-scale data applications [4].
RAMCloud [4] argued for a new approach that shifts the primary locus of online data from disk to random access memory: information is kept entirely in DRAM, with disk relegated to a backup/archival role in the data center. This new storage paradigm spawned a data processing variant of the MapReduce model, Spark [5], which keeps datasets in memory as much as possible. Spark presents a data abstraction for big data analytics, the Resilient Distributed Dataset (RDD) [6], a coarse-grained, deterministic, immutable data structure that can be persisted in memory, on disk, or both; when memory capacity is insufficient to hold the data sets, data are evicted to storage.
It is necessary to expand storage capacity with HDD/SSD due to DRAM's density limitation [7][8]. However, HDD/SSD brings significant I/O overhead. In response to this issue, emerging Non-Volatile Memory (NVM) technologies, such as Phase Change Memory (PCM) [9] and 3D XPoint [10], incorporate a host of desirable features: access speeds comparable to DRAM, storage-like persistence, low power consumption, and byte addressability. These new types of memory show promise as candidate main memory, with performance comparable to DRAM and much higher capacity. In particular, NVM products are expected to hit the market in the next few years; for example, 3D XPoint technology has been announced by Intel and Micron with an expected arrival time of 2016 [10]. NVMs such as 3D XPoint are also expected to be deployed with 4x the capacity of DRAM in future systems [11], so that storage I/O is eliminated because the data reside in NVM.
Storing big data requires coalescing the NVM of many machines. In the currently popular shared-nothing architecture, distributed memory provisioning is achieved by connecting machines over a traditional TCP/IP network, and each machine allocates its own memory with a local memory allocator (e.g., glibc malloc, the JVM). However, on the one hand, the memory allocation performance of this scheme suffers from the network bottleneck; on the other hand, connecting machines with a fast network (e.g., InfiniBand [13]) and accessing memory with Remote Direct Memory Access (RDMA) technology exposes the allocation overhead of local memory allocators such as glibc malloc.
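To make this contrast concrete, the following minimal sketch (using the standard libibverbs API, not code from Megalloc) shows how a one-sided RDMA read fetches remote memory without involving the remote CPU or OS kernel; the queue pair qp, completion queue cq, registered local memory region local_mr, and the remote address/rkey pair are assumed to have been established during connection setup.

/* Minimal sketch: post a one-sided RDMA read with libibverbs.
 * qp, cq, local_buf, local_mr, remote_addr, and remote_rkey are assumed
 * to have been created and exchanged during connection setup. */
#include <infiniband/verbs.h>
#include <stdint.h>

static int rdma_read(struct ibv_qp *qp, struct ibv_cq *cq,
                     void *local_buf, uint32_t len, struct ibv_mr *local_mr,
                     uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,   /* local destination buffer */
        .length = len,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_READ,   /* one-sided: remote CPU not involved */
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr; /* remote virtual address */
    wr.wr.rdma.rkey        = remote_rkey; /* remote memory region key */

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))  /* hand the work request to the NIC */
        return -1;

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)  /* busy-poll until the read completes */
        ;
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}

A TCP/IP request for the same data would traverse the remote kernel's network stack and require the remote CPU to copy the data; avoiding that path is what makes the local allocator's own overhead the dominant cost under RDMA.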
In this paper, we design a distributed memory allocator, Megalloc, which exposes the NVM of all machines in the cluster as a shared memory space through RDMA and allocates remote NVM dynamically. In allocating dynamically, Megalloc follows two basic principles: 1)
* Megalloc is short for megabyte allocator, pronounced [meg'æləuk]