systems [21, 71, 73] bypass the DRAM page cache and access
NVMM directly using a technique called Direct Access (DAX)
or eXecute In Place (XIP), avoiding extra copies between
NVMM and DRAM in the storage stack. NOVA is a DAX
file system and we expect that all NVMM file systems will
provide these (or similar) features. We describe currently
available DAX file systems in Section 2.4.
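
To make the direct access path concrete, the sketch below shows a user-space
program mapping a file on a DAX-mounted NVMM file system and updating it with
ordinary loads and stores; the mount point and file name are only illustrative,
and the file is assumed to already span at least one page.

#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Path is illustrative: any file on a DAX-mounted NVMM file system. */
    int fd = open("/mnt/pmem/file", O_RDWR);
    if (fd < 0)
        return 1;

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    /* With DAX, this store reaches NVMM directly; no DRAM page-cache copy
     * sits between the application and the persistent media. */
    memcpy(p, "hello", 5);

    munmap(p, 4096);
    close(fd);
    return 0;
}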
Write reordering Modern processors and their caching
hierarchies may reorder store operations to improve perfor-
mance. The CPU’s memory consistency protocol makes guar-
antees about the ordering of memory updates, but existing
models (with the exception of research proposals [20, 46]) do
not provide guarantees on when updates will reach NVMMs.
As a result, a power failure may leave the data in an inconsis-
tent state.
NVMM-aware software can avoid this by explicitly flush-
ing caches and issuing memory barriers to enforce write
ordering. The x86 architecture provides the clflush in-
struction to flush a CPU cacheline, but clflush is strictly
ordered and needlessly invalidates the cacheline, incurring a
significant performance penalty [6, 76]. Also, clflush only
sends data to the memory controller; it does not guarantee
the data will reach memory. Memory barriers such as Intel’s
mfence instruction enforce order on memory operations be-
fore and after the barrier, but mfence only guarantees that all
CPUs have the same view of memory. It does not impose
any constraints on the order of data writebacks to NVMM.
Intel has proposed new instructions that fix these prob-
lems, including clflushopt (a more efficient version of
clflush), clwb (to explicitly write back a cache line with-
out invalidating it) and PCOMMIT (to force stores out to
NVMM) [26, 79]. NOVA is built with these instructions
in mind. In our evaluation we use a hardware NVMM emu-
lation system that approximates the performance impacts of
these instructions.
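
The sketch below shows how NVMM-aware software might combine these instructions
to persist a range of NVMM addresses. It assumes the clwb intrinsic from
<immintrin.h> (compile with -mclwb); the persist_range helper is hypothetical,
not a NOVA interface.

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

#define CACHELINE 64

/* persist_range is a hypothetical helper, not part of NOVA. */
static void persist_range(const void *addr, size_t len)
{
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end   = (uintptr_t)addr + len;

    for (uintptr_t p = start; p < end; p += CACHELINE)
        _mm_clwb((void *)p);   /* write the line back without invalidating it */

    _mm_sfence();              /* order the write-backs before later stores */

    /* In the model the paper assumes, a PCOMMIT (or equivalent) would follow
     * here to force the flushed data from the memory controller to NVMM. */
}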
Atomicity POSIX-style file system semantics require
many operations to be atomic (i.e., to execute in an “all or
nothing” fashion). For example, the POSIX rename re-
quires that if the operation fails, neither the file with the old
name nor the file with the new name shall be changed or
created [53]. Renaming a file is a metadata-only operation,
but some atomic updates apply to both file system metadata
and data. For instance, appending to a file atomically updates
the file data and changes the file’s length and modification
time. Many applications rely on atomic file system operations
for their own correctness.
Storage devices typically provide only rudimentary guaran-
tees about atomicity. Disks provide atomic sector writes and
processors guarantee only that 8-byte (or smaller) aligned
stores are atomic. To build the more complex atomic updates
that file systems require, programmers must combine these
primitives using more sophisticated techniques.
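
For illustration, the following sketch commits a larger update by publishing it
with a single aligned 8-byte store once the new data is durable; the inode and
entry layouts are illustrative rather than NOVA's on-media format, and
persist_range is the hypothetical helper sketched above.

#include <stdint.h>
#include <stddef.h>

void persist_range(const void *addr, size_t len);   /* flush + fence, as sketched above */

struct entry {
    char data[48];               /* new metadata or file data, written first   */
};

struct inode {
    uint64_t log_tail;           /* 8-byte aligned; the final store commits all */
};

/* Write the new entry, make it durable, then publish it with one atomic,
 * aligned 8-byte store; a crash exposes either the whole entry or none of it. */
void append_and_commit(struct inode *ino, struct entry *new_entry)
{
    persist_range(new_entry, sizeof(*new_entry));
    ino->log_tail = (uint64_t)(uintptr_t)new_entry;
    persist_range(&ino->log_tail, sizeof(ino->log_tail));
}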
2.3. Building complex atomic operations
Existing file systems use a variety of techniques like journal-
ing, shadow paging, or log-structuring to provide atomicity
guarantees. These work in different ways and incur different
types of overheads.
Journaling Journaling (or write-ahead logging) is widely
used in journaling file systems [24, 27, 32, 71] and
databases [39, 43] to ensure atomicity. A journaling system
records all updates to a journal before applying them and, in
case of power failure, replays the journal to restore the system
to a consistent state. Journaling requires writing data twice:
once to the log and once to the target location. To improve
performance, journaling file systems usually journal only
metadata. Recent work has proposed back pointers [17]
and decoupling ordering from durability [16] to reduce the
overhead of journaling.
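
The following sketch outlines the basic write-ahead logging protocol for a
single metadata word, assuming the hypothetical persist_range helper from
above; real journals batch variable-size records, but the ordering of the
steps is the essential point.

#include <stdint.h>
#include <stddef.h>

void persist_range(const void *addr, size_t len);   /* flush + fence, as sketched above */

/* Journal entry layout is illustrative; real journals hold variable-size records. */
struct journal_entry {
    uint64_t dest;       /* address of the metadata word being updated */
    uint64_t value;      /* new value for that word                     */
    uint64_t committed;  /* set to 1 (with one 8-byte store) to commit  */
};

void journaled_update(struct journal_entry *je, uint64_t *dest, uint64_t value)
{
    /* 1. Record the intended update in the journal and make it durable. */
    je->dest  = (uint64_t)(uintptr_t)dest;
    je->value = value;
    je->committed = 0;
    persist_range(je, sizeof(*je));

    /* 2. Commit the entry; after this point recovery will replay it. */
    je->committed = 1;
    persist_range(&je->committed, sizeof(je->committed));

    /* 3. Apply the update in place (the "second write" of journaling). */
    *dest = value;
    persist_range(dest, sizeof(*dest));

    /* 4. Retire the entry so recovery does not replay it again. */
    je->committed = 0;
    persist_range(&je->committed, sizeof(je->committed));
}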
Shadow paging Several file systems use a copy-on-write
mechanism called shadow paging [8, 20, 25, 54]. Shadow
paging file systems rely heavily on their tree structure to
provide atomicity. Rather than modifying data in-place during
a write, shadow paging writes a new copy of the affected
page(s) to an empty portion of the storage device. Then, it
splices the new pages into the file system tree by updating
the nodes between the pages and root. The resulting cascade
of updates is potentially expensive.
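
The sketch below illustrates this cascade for a simple tree whose interior
nodes are pages of child pointers; alloc_page and persist_range are
hypothetical helpers, and the caller would finally splice in the returned
root with a single atomic update.

#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define PAGE_SIZE 4096

void *alloc_page(void);                             /* hypothetical NVMM/disk page allocator */
void persist_range(const void *addr, size_t len);   /* flush + fence, as sketched above      */

/* Interior nodes are pages of child pointers; depth 0 nodes hold data.
 * path[d] gives the child index to follow at depth d. */
void *cow_update(void *node, int depth, const int *path,
                 const void *data, size_t len)
{
    void *copy = alloc_page();
    memcpy(copy, node, PAGE_SIZE);                  /* shadow copy; old page stays untouched */

    if (depth == 0) {
        memcpy(copy, data, len);                    /* apply the write to the copy */
    } else {
        void **children = copy;
        children[path[depth]] =                     /* splice the new child into the copy */
            cow_update(children[path[depth]], depth - 1, path, data, len);
    }

    persist_range(copy, PAGE_SIZE);
    return copy;   /* one new page per level: the cascade of updates up to the root */
}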
Log-structuring Log-structured file systems (LFSs) [55,
60] were originally designed to exploit hard disk drives’ high
performance on sequential accesses. LFSs buffer random
writes in memory and convert them into larger, sequential
writes to the disk, making the best of hard disks’ strengths.
Although LFS is an elegant idea, implementing it effi-
ciently is complex, because LFSs rely on writing sequentially
to contiguous free regions of the disk. To ensure a consistent
supply of such regions, LFSs constantly clean and compact
the log to reclaim space occupied by stale data.
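
A minimal sketch of the core LFS write path follows: random block writes are
buffered into a segment, the block map is redirected to the new locations, and
a full segment is written out with one sequential I/O. All names and sizes
are illustrative.

#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define BLOCK_SIZE  4096
#define SEG_BLOCKS  256                  /* blocks per log segment (illustrative) */

void write_segment(uint64_t disk_addr, const void *buf, size_t len);  /* one sequential write */

struct lfs {
    uint64_t *block_map;                 /* logical block number -> newest on-disk address */
    uint8_t   seg_buf[SEG_BLOCKS * BLOCK_SIZE];
    uint32_t  seg_fill;                  /* blocks buffered in the current segment */
    uint64_t  log_head;                  /* on-disk address where the next segment goes */
};

void lfs_write_block(struct lfs *fs, uint64_t lblock, const void *data)
{
    /* Buffer the (possibly random) write and remap the block to its new home. */
    memcpy(fs->seg_buf + (size_t)fs->seg_fill * BLOCK_SIZE, data, BLOCK_SIZE);
    fs->block_map[lblock] = fs->log_head + (uint64_t)fs->seg_fill * BLOCK_SIZE;
    fs->seg_fill++;

    if (fs->seg_fill == SEG_BLOCKS) {
        /* Full segment: one large sequential write to the disk. */
        write_segment(fs->log_head, fs->seg_buf, sizeof(fs->seg_buf));
        fs->log_head += sizeof(fs->seg_buf);
        fs->seg_fill = 0;
        /* Overwritten blocks leave stale copies behind in older segments;
         * the cleaner must eventually compact those segments to reclaim space. */
    }
}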
Log cleaning adds overhead and degrades the performance
of LFSs [3, 61]. To reduce cleaning overhead, some LFS
designs separate hot and cold data and apply different clean-
ing policies to each [69, 70]. SSDs also perform best under
sequential workloads [9, 14], so LFS techniques have been
applied to SSD file systems as well. SFS [38] classifies file
blocks based on their update likelihood, and writes blocks
with similar “hotness” into the same log segment to reduce
cleaning overhead. F2FS [30] uses multi-head logging, writes
metadata and data to separate logs, and writes new data di-
rectly to free space in dirty segments at high disk utilization
to avoid frequent garbage collection.