Proceedings of the 2016 USENIX OSDI Conference: Frontiers of Operating Systems Design and Implementation

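The USENIX OSDI 2016 proceedings (Proceedings of USENIX OSDI 2016, OSDI '16) collect leading research in operating systems. The conference is organized by USENIX, the Advanced Computing Systems Association, which was founded in 1975. OSDI (Operating Systems Design and Implementation) was first held in 1994 and takes place every two years, with the aim of advancing the design and implementation of operating systems. The conference usually runs for three days and, by this listing's account, has published about 27 papers per edition since 2002. OSDI '16 was held November 2 to 4, 2016, in Savannah, Georgia, USA, and brought together researchers and industry practitioners from around the world. Sponsors included Facebook, Google, Microsoft Research, NetApp, and VMware, together with industry partners such as the FreeBSD Foundation and Platinum-level sponsors. All rights are reserved by the USENIX Association; the complete work may be reproduced for noncommercial educational or research purposes, and individuals may print a single copy for personal use. The papers cover topics including kernel optimization, virtualization, distributed system architecture, security mechanisms, memory management, parallel computing, and emerging technologies.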

that M maps each inode number in range (0, i) to a block

number in range (0, b). Finally, the root directory con-

straint requires D to map ﬁle names to inode numbers

in range (0, i). These three constraints are all Yggdrasil

needs to verify YminLFS (see §2.3).

2.3 Veriﬁcation

To verify that the YminLFS implementation (§2.2) sat-

isﬁes the FSSpec speciﬁcation (§2.1), Yggdrasil uses the

Z3 solver [15] to prove a two-part crash reﬁnement theo-

rem (§3). The ﬁrst part of the theorem deals with crash-

free executions. It requires the implementation and spec-

iﬁcation to behave alike in the absence of crashes: if both

YminLFS and FSSpec start in equivalent and consistent

states, they end up in equivalent and consistent states.

The veriﬁer deﬁnes equivalence using the speciﬁcation’s

equivalent predicate (§2.1), and consistency using the

implementation’s consistency invariants (§2.2).

The second part of the theorem deals with crashing

executions. It requires the implementation to exhibit no

more crash states (disk states after a crash) than the spec-

iﬁcation: each possible state of the YminLFS implemen-

tation (including states caused by crashes and reordered

writes) must be equivalent to some crash state of FSSpec.

Counterexamples. If there is any bug in the imple-

mentation or consistency invariants, the veriﬁer will gen-

erate a counterexample to help programmers understand

the bug. A counterexample consists of a concrete trace

of the implementation that violates the crash reﬁnement

theorem. As an example, consider the potential missing

ﬂush bug described in §2.2. If we remove the ﬂush

between the last two writes in the implementation of

mknod, Yggdrasil outputs the following counterexample:

# Pending writes

lfs.py:167 mknod write(new_imap_blkno, imap)

# Synchronized writes

lfs.py:148 mknod write(new_blkno, new_ino)

lfs.py:154 mknod write(new_parentdata, parentdata)

lfs.py:160 mknod write(new_parentblkno, parentinode)

lfs.py:170 mknod write(SUPERBLOCK, sb)

# Crash point

[..]

lfs.py:171 mknod flush()

The output describes the bug by showing the point at

which the system crashes and the list of writes pending

in the cache (along with their source code locations). In

this example, the write of the new inode mapping block

(step 4 above) is still pending, but the write to update the

superblock to point to that block (step 5) has reached the

disk, corrupting YminLFS’s state.

The visualization of “pending” and “synchronized”

writes in the counterexample is speciﬁc to the asyn-

chronous disk model; one can extend Yggdrasil with new

disk models and customized visualizations.

Our initial YminLFS implementation contained two

other bugs: one in the lookup logic and one in the data

layout. Neither of the bugs appeared during testing runs.

Both bugs were found by the veriﬁer in a matter of sec-

onds, and we quickly localized and ﬁxed them by exam-

ining the resulting counterexamples.

Proofs. If the Yggdrasil veriﬁer ﬁnds no counterexam-

ples to the crash reﬁnement theorem, then none exist, and

we have obtained a proof of correctness. In particular,

the crash refinement theorem holds for all disks with up to $2^{64}$ blocks, and for every trace of file system operations, regardless of its length. After we fixed the bugs in

our initial YminLFS implementation, the veriﬁer proved

its correctness in under 30 seconds.

It is worth noting that the theorem holds if the ﬁle sys-

tem is the only user of the disk. For instance, it does not

hold if an adversary corrupted the ﬁle system image by

directly modifying the disk. To address this issue, one

can run fsck generated by Yggdrasil, which guarantees

to detect any such inconsistencies.

2.4 Optimizations and compilation

As described in §2.2, YminLFS’s mknod implementation

uses ﬁve disk ﬂushes. Yggdrasil provides a greedy opti-

mizer that tries to remove every disk ﬂush and re-verify

the code. Running the optimizer on the mknod code re-

moves three out of the ﬁve ﬂushes within three minutes,

while still guaranteeing correctness.
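The greedy strategy can be sketched in a few lines. This is not Yggdrasil's code: the program is modeled as a list of operation names, and `toy_verify` is a hypothetical stand-in for a full re-verification run (an assumed rule requiring a flush directly before the superblock write and another one after it).

```python
from typing import Callable, List, Sequence


def greedy_remove_flushes(ops: Sequence[str],
                          verify: Callable[[List[str]], bool]) -> List[str]:
    """Drop each flush in turn, keeping the removal only if verify() still passes."""
    current = list(ops)
    i = 0
    while i < len(current):
        if current[i] == "flush":
            candidate = current[:i] + current[i + 1:]
            if verify(candidate):
                current = candidate  # removal is safe; re-examine the same index
                continue
        i += 1
    return current


def toy_verify(ops: List[str]) -> bool:
    """Hypothetical verifier stand-in (an assumption, not Yggdrasil's check):
    the superblock write must be directly preceded by a flush and followed by one."""
    if "write:superblock" not in ops:
        return False
    sb = ops.index("write:superblock")
    return sb > 0 and ops[sb - 1] == "flush" and "flush" in ops[sb + 1:]


program = ["write:inode", "flush", "write:dir", "flush",
           "write:imap", "flush", "write:superblock", "flush"]
optimized = greedy_remove_flushes(program, toy_verify)
```

Because every candidate is re-verified, the loop itself needs no correctness argument; a wrong removal is simply rejected by the verifier.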

The optimized and veriﬁed YminLFS implementation,

which is in Python, is executable but slow. Yggdrasil

invokes the Cython compiler [3] to generate C code from

Python for better performance. It also provides a small

bridge to connect the generated C code to FUSE [17].

The result is a single-threaded user-space ﬁle system.

2.5 Summary

We have demonstrated how to specify, implement, de-

bug, verify, optimize, and execute the YminLFS ﬁle sys-

tem using Yggdrasil. Compared to previous ﬁle sys-

tem veriﬁcation work, push-button veriﬁcation eases the

proof burden and enables automated features such as vi-

sualizing bugs and optimizing code.

Since there is no need to manually prove or annotate

implementation code when using Yggdrasil, the veriﬁ-

cation effort is spent mainly on writing the speciﬁcation

and coming up with consistency invariants about the on-

disk data format. We ﬁnd the counterexample visualizer

useful for ﬁnding bugs in these two parts.

The trusted computing base (TCB) includes the ﬁle

system speciﬁcation, Yggdrasil’s veriﬁer, visualizer, and

compiler (but not the optimizer), their dependencies (i.e.,

the Z3 solver, Python, and gcc), as well as FUSE and the

Linux kernel. See §6 for discussion on limitations.

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 5

3 The Yggdrasil architecture

In Yggdrasil, the core notion of correctness is crash re-

ﬁnement. This section gives a formal deﬁnition of crash

reﬁnement, and describes how Yggdrasil’s components

use this deﬁnition to support veriﬁcation, counterexam-

ple visualization, and optimization.

3.1 Reasoning about systems with crashes

In Yggdrasil, programmers write both speciﬁcations and

implementations (referred to as “systems” in this section)

as state machines: each system comprises a state and a

set of operations that transition the state. A transition

can occur only if the system is in a consistent state, as

determined by its consistency invariant I. This invariant

is a predicate over the system’s state, indicating whether

it is consistent or corrupted; see §2.2 for an example.

Consider a specification $F_0$ and an implementation $F_1$. Our goal is to show that $F_1$ is correct with respect to $F_0$. Since both systems are state machines, a strawman definition of correctness is that they transition in lock step (i.e., bisimulation): starting from equivalent consistent states, if the same operation is invoked on both systems, they will transition to equivalent consistent states (where equivalence between states is defined by a system-specific predicate). However, this bisimulation-based definition is too strong for systems that interact with external storage, as it does not account for non-determinism from disk reorderings, crashes, or recovery.

To address this shortcoming, we introduce crash refinement as a new definition of correctness. At a high level, crash refinement says that $F_1$ is correct with respect to $F_0$ if, starting from equivalent consistent states and invoking the same operation on both systems, any state produced by $F_1$ is equivalent to some state produced by $F_0$. To formalize this intuition, we define the behavior of a system in the presence of crashes, formalize crash refinement for individual operations, and extend the resulting definition to entire systems.

System operations. We model the behavior of a sys-

tem operation with a function f that takes three inputs:

• its current state s;

• an external input x, such as data to write; and

• a crash schedule b, which is a set of boolean values

denoting the occurrence of crash events.

Applying f to these inputs, written as f(s, x, b), pro-

duces the next state of the system.

As a concrete example, consider a single disk write operation that writes value v to disk address a. The external input to the write operation's function $f_w$ is the pair (a, v). The state s is the disk content before the write; s(a) gives the old value at the address a. The asynchronous disk model in Yggdrasil generates a pair of boolean values (on, sync) as the crash schedule. The on value indicates whether the write operation completed successfully by storing its data into the volatile cache. The sync value indicates whether the write's effect has been synchronized from the volatile cache to stable storage. After executing the write operation, the disk is updated to contain v at the address a only if both on and sync are true, and left unchanged otherwise (e.g., the system crashed before completing the write, or before synchronizing it to stable storage):

$$f_w(s, x, b) = s[a \mapsto \text{if } on \wedge sync \text{ then } v \text{ else } s(a)],$$

where x = (a, v) and b = (on, sync).
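The write model translates almost directly into code. The sketch below is illustrative only: it represents the disk state as a Python dictionary and assumes that unwritten blocks read as 0, neither of which is taken from Yggdrasil itself.

```python
from typing import Dict, Tuple

Disk = Dict[int, int]  # address -> block value (illustrative representation)


def disk_write(s: Disk, x: Tuple[int, int], b: Tuple[bool, bool]) -> Disk:
    """Return the next disk state for a write x = (a, v) under schedule b = (on, sync)."""
    a, v = x
    on, sync = b
    nxt = dict(s)  # states are treated as immutable values
    # The new value reaches stable storage only if the write completed AND was synced.
    nxt[a] = v if (on and sync) else s.get(a, 0)  # assumption: unwritten blocks read 0
    return nxt


before = {0: 7}
landed = disk_write(before, (0, 9), (True, True))
lost_in_cache = disk_write(before, (0, 9), (True, False))
crashed_early = disk_write(before, (0, 9), (False, False))
```

Only the schedule (True, True) changes the stable state; the two crashing schedules leave the old value 7 at address 0.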

Crash refinement. To define crash refinement for a given schedule, we start from a special case where write operations always complete and their effects are synchronized to disk. That is, the crash schedule is the constant vector true. Let $s_0 \sim s_1$ denote that $s_0$ and $s_1$ are equivalent states according to a user-defined equivalence relation (as in §2.1). We write $s_0 \sim_{I_0,I_1} s_1$ to say that $s_0$ and $s_1$ are equivalent and consistent according to their respective system invariants $I_0$ and $I_1$:

$$s_0 \sim_{I_0,I_1} s_1 \;\triangleq\; I_0(s_0) \wedge I_1(s_1) \wedge s_0 \sim s_1.$$

With a crash-free schedule true, two functions $f_0$ and $f_1$ are equivalent if they produce equivalent and consistent output states when given the same external input x, as well as equivalent and consistent starting states:

Definition 1 (Crash-free equivalence). Given two functions $f_0$ and $f_1$ with their system consistency invariants $I_0$ and $I_1$, respectively, we say $f_0$ and $f_1$ are crash-free equivalent if the following holds:

$$\forall s_0, s_1, x.\ (s_0 \sim_{I_0,I_1} s_1) \Rightarrow (s_0' \sim_{I_0,I_1} s_1')$$

where $s_0' = f_0(s_0, x, \mathit{true})$ and $s_1' = f_1(s_1, x, \mathit{true})$.

Next, we allow for the possibility of crashes. We say that $f_1$ is correct with respect to $f_0$ if, for any crash schedule, the state produced by $f_1$ with that schedule is equivalent to a state produced by $f_0$ with some schedule:

Definition 2 (Crash refinement without recovery). Function $f_1$ is a crash refinement (without recovery) of $f_0$ if (1) $f_0$ and $f_1$ are crash-free equivalent and (2) the following holds:

$$\forall s_0, s_1, x, b_1.\ \exists b_0.\ (s_0 \sim_{I_0,I_1} s_1) \Rightarrow (s_0' \sim_{I_0,I_1} s_1')$$

where $s_0' = f_0(s_0, x, b_0)$ and $s_1' = f_1(s_1, x, b_1)$.
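For very small state spaces, Definition 2 can be checked by brute-force enumeration instead of an SMT solver. The sketch below is a toy under stated assumptions, not Yggdrasil's verifier: the specification function (called `spec_write`) models an atomic 2-bit register, the two candidate implementations store the bits separately, and all names are hypothetical.

```python
from itertools import product


def find_counterexample(f0, f1, states0, states1, inputs, n0, n1, equiv,
                        inv0=lambda s: True, inv1=lambda s: True):
    """Return (s0, s1, x, b1) violating crash refinement without recovery, or None."""
    def rel(s0, s1):  # equivalent AND consistent
        return inv0(s0) and inv1(s1) and equiv(s0, s1)

    scheds0 = list(product([False, True], repeat=n0))
    scheds1 = list(product([False, True], repeat=n1))
    for s0, s1, x in product(states0, states1, inputs):
        if not rel(s0, s1):
            continue
        # Clause (1): crash-free equivalence under the all-true schedules.
        if not rel(f0(s0, x, (True,) * n0), f1(s1, x, (True,) * n1)):
            return (s0, s1, x, (True,) * n1)
        # Clause (2): every implementation outcome matches SOME specification outcome.
        for b1 in scheds1:
            s1_next = f1(s1, x, b1)
            if not any(rel(f0(s0, x, b0), s1_next) for b0 in scheds0):
                return (s0, s1, x, b1)
    return None


def spec_write(s, x, b):      # specification: atomic 2-bit register
    (on,) = b
    return x if on else s


def atomic_write(s, x, b):    # implementation: both bits land together or not at all
    on, sync = b
    return (x & 1, x >> 1) if (on and sync) else s


def torn_write(s, x, b):      # buggy implementation: each bit may land independently
    lo, hi = s
    on_lo, on_hi = b
    return (x & 1 if on_lo else lo, x >> 1 if on_hi else hi)


def equiv(s0, s1):
    return s0 == s1[0] + 2 * s1[1]


STATES0, INPUTS = range(4), range(4)
STATES1 = list(product([0, 1], repeat=2))
ok = find_counterexample(spec_write, atomic_write, STATES0, STATES1, INPUTS, 1, 2, equiv)
bad = find_counterexample(spec_write, torn_write, STATES0, STATES1, INPUTS, 1, 2, equiv)
```

Here `ok` is `None`, while `bad` reports a torn write: starting from 0 and writing 3, only the high bit lands, leaving the value 2, which the atomic specification can never produce.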

Finally, we consider the possibility that the system

may run a recovery function upon reboot. A recovery

function r is a system operation (as deﬁned above) that

takes no external input (as it is executed when the system

starts). It should also be idempotent: even if the system

crashes during recovery and re-runs the recovery func-

tion many times, the resulting state should be the same

once the recovery is complete.


Definition 3 (Recovery idempotence). A recovery function r is idempotent if the following holds:

$$\forall s, b.\ r(s, \mathit{true}) = r(r(s, b), \mathit{true}).$$

Note that this definition accounts for multiple crash-reboot cycles during recovery, by repeated application of the idempotence definition on each intermediate crash state $r(s, b)$, $r(r(s, b), b')$, ..., where $b, b', \ldots$ are the schedules for each crash during recovery.
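Recovery idempotence can likewise be tested exhaustively on a toy model. The following sketch assumes a hypothetical two-block disk with a one-entry redo log; the state layout and the two-bit schedule (applied, cleared) are illustrative choices, not Yggdrasil's.

```python
from itertools import product


def recover(s, b):
    """Replay the one-entry log (if any), then clear it; b = (applied, cleared)."""
    log, data = s
    applied, cleared = b
    if log is None:
        return s
    if applied:
        a, v = log
        data = data[:a] + (v,) + data[a + 1:]  # replaying the same entry twice is harmless
        if cleared:
            log = None
    return (log, data)


def bump(s, b):
    """Deliberately non-idempotent 'recovery': increments every block when it runs."""
    log, data = s
    return (log, tuple(d + 1 for d in data)) if b[0] else s


def is_idempotent(r, states, n):
    """Check r(s, true) == r(r(s, b), true) for every state s and schedule b."""
    all_true = (True,) * n
    return all(r(s, all_true) == r(r(s, b), all_true)
               for s in states
               for b in product([False, True], repeat=n))


LOGS = [None] + [(a, v) for a in range(2) for v in range(2)]
STATES = [(log, data) for log in LOGS for data in product(range(2), repeat=2)]
replay_ok = is_idempotent(recover, STATES, 2)
bump_ok = is_idempotent(bump, STATES, 2)
```

Replaying a redo log passes the check because applying the same logged write again yields the same data, whereas `bump` fails since a crash in the middle of recovery changes the final result.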

Definition 4 (Crash refinement with recovery). Given two functions $f_0$ and $f_1$, their system consistency invariants $I_0$ and $I_1$, respectively, and a recovery function r, $f_1$ with r is a crash refinement of $f_0$ if (1) $f_0$ and $f_1$ are crash-free equivalent; (2) r is idempotent; and (3) the following holds:

$$\forall s_0, s_1, x, b_1.\ \exists b_0.\ (s_0 \sim_{I_0,I_1} s_1) \Rightarrow (s_0' \sim_{I_0,I_1} s_1')$$

where $s_0' = f_0(s_0, x, b_0)$ and $s_1' = r(f_1(s_1, x, b_1), \mathit{true})$.

Furthermore, systems may run background operations that do not change the externally visible state of a system (i.e., no-ops), such as garbage collection.

Definition 5 (No-op). Function f with a recovery function r is a no-op if (1) r is idempotent, and (2) the following holds:

$$\forall s_0, s_1, x, b_1.\ (s_0 \sim_{I_0,I_1} s_1) \Rightarrow (s_0 \sim_{I_0,I_1} s_1')$$

where $s_1' = r(f(s_1, x, b_1), \mathit{true})$.

With per-function crash refinement and no-ops, we can now define crash refinement for entire systems.

Definition 6 (System crash refinement). Given two systems $F_0$ and $F_1$, and a recovery function r, $F_1$ is a crash refinement of $F_0$ if every function in $F_1$ with r is either a crash refinement of the corresponding function in $F_0$ or a no-op.

The rest of this section will describe Yggdrasil’s compo-

nents based on the deﬁnition of crash reﬁnement.

3.2 The veriﬁer

Given two file systems, $F_0$ and $F_1$, Yggdrasil's verifier checks that $F_1$ is a crash refinement of $F_0$ according to Definition 6. To do so, the verifier performs symbolic execution [6, 24] for each operation $f_i \in F_i$ to obtain an SMT encoding of the operation's output, $f_i(s_i, x, b_i)$, when applied to a symbolic input x (represented as a bitvector), symbolic disk state $s_i$ (represented as an uninterpreted function over bitvectors), and symbolic crash schedule $b_i$ (represented as booleans). It then invokes the Z3 solver to check the validity of either the no-op identity (Definition 5) if $f_1$ is a no-op, or else the per-function crash refinement formula (Definition 4) for the corresponding functions $f_0 \in F_0$ and $f_1 \in F_1$.

To capture all execution paths in the SMT encoding of $f_i(s_i, x, b_i)$, the verifier adopts a "self-finitizing" symbolic execution scheme [49], which simply unrolls loops and recursion without bounding the depth. Since this scheme will fail to terminate on non-finite code, the verifier requires file systems to be implemented in a finite way: for instance, loops must be bounded [50]. In our experience (further discussed in §4), the finiteness requirement does not add much programming burden.

To prove the validity of the per-function crash refinement formula, the verifier uses Z3 to check if the formula's negation is unsatisfiable. If so, the result is a proof that $f_1$ is a crash refinement of $f_0$. Otherwise, Z3 produces a model of the formula's negation, which represents a concrete counterexample to crash refinement: disk states $s_0$ and $s_1$, an input x, and a crash schedule $b_1$, such that $s_0 \sim_{I_0,I_1} s_1$ but there is no crash schedule $b_0$ that satisfies $f_0(s_0, x, b_0) \sim_{I_0,I_1} f_1(s_1, x, b_1)$.

Checking the satisfiability of the negated crash refinement formula in Definition 4 requires reasoning about quantifiers. In general, such queries are undecidable. In our case, the problem is decidable because the quantifiers range over finite domains, and the formula is expressed in a decidable combination of decidable theories (i.e., equality with uninterpreted functions and fixed-width bitvectors) [51]. Moreover, Z3 can solve this problem in practice because the crash schedule $b_0$, which is a set of boolean variables, is the only universally quantified variable in the negated formula. As many file system specifications have simple semantics, the crash schedule $b_0$ has few boolean variables, often only one (e.g., the transaction in §2.1), which makes the reasoning efficient.
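Written out, and assuming the convention that subscript 0 denotes the specification and subscript 1 the implementation, the query for clause (3) of Definition 4 is obtained by negating the refinement formula and pushing the negation through the quantifiers and the implication:

```latex
\neg\,\bigl(\forall s_0, s_1, x, b_1.\ \exists b_0.\
      (s_0 \sim_{I_0,I_1} s_1) \Rightarrow (s_0' \sim_{I_0,I_1} s_1')\bigr)
\;\equiv\;
\exists s_0, s_1, x, b_1.\ \forall b_0.\
      (s_0 \sim_{I_0,I_1} s_1) \wedge \neg\,(s_0' \sim_{I_0,I_1} s_1'),
\qquad
s_0' = f_0(s_0, x, b_0),\quad s_1' = r(f_1(s_1, x, b_1), \mathit{true}).
```

If this formula is unsatisfiable, the original formula is valid; if it is satisfiable, the witnesses for $s_0$, $s_1$, $x$, and $b_1$ form the counterexample. The specification's schedule $b_0$ is the only variable left under a universal quantifier.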

The veriﬁer’s symbolic execution engine supports all

regular Python code with concrete (i.e., non-symbolic)

values. For symbolic values, it supports booleans, ﬁxed-

width integers, maps, and lists of concrete length, as well

as regular control ﬂow including conditionals and loops,

but no exceptions or coroutines. It does not support sym-

bolic execution into C library code.

3.3 The counterexample visualizer

To make counterexamples to validity easier to understand, Yggdrasil provides a visualizer for the asynchronous disk model. Given a counterexample model of the formula in Definition 4, the visualizer produces concrete disk event traces (e.g., see §2.3) as follows. First, it uses the crash schedule $b_1$ to identify the boolean variable on that indicates where the system crashed, and relates that location to the implementation source code with a stack trace. Second, it evaluates the boolean sync variables that indicate whether a write is synchronized to disk, and prints out the pending writes with their corresponding source locations to help identify unintended reorderings. Yggdrasil also allows programmers to supply their own plugin visualizer for data structures specific to their file system images. We found this facility useful when developing YminLFS and Yxv6.

3.4 The optimizer

The Yggdrasil optimizer improves the run-time perfor-

mance of implementation code. Yggdrasil treats the op-

timizer as untrusted and re-veriﬁes the optimized code it

generates. This simple design, made possible by push-

button veriﬁcation, allows programmers to plug in cus-

tom optimizations without the burden of supplying a cor-

rectness proof. We provide one built-in optimization that

greedily removes disk ﬂush operations (see §2.4), imple-

mented by rewriting the Python abstract syntax tree.
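A flush-removing rewrite over the Python abstract syntax tree can be sketched with the standard `ast` module. This is a minimal illustration rather than Yggdrasil's optimizer: the class and function names are hypothetical, a flush is assumed to be any statement of the form `obj.flush()`, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class DropFlush(ast.NodeTransformer):
    """Delete the target-th statement of the form `<obj>.flush()` (0-based)."""

    def __init__(self, target: int):
        self.target = target
        self.seen = 0

    def visit_Expr(self, node: ast.Expr):
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and call.func.attr == "flush"):
            index = self.seen
            self.seen += 1
            if index == self.target:
                return None  # returning None removes the statement
        return node


def drop_flush(source: str, target: int) -> str:
    """Return source code with one flush statement removed."""
    tree = DropFlush(target).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


SRC = """
def mknod(disk):
    disk.write(1, 'inode')
    disk.flush()
    disk.write(0, 'superblock')
    disk.flush()
"""
candidate = drop_flush(SRC, 0)  # remove the first flush only
```

Each such candidate would then be handed back to the verifier, which accepts or rejects it; the rewrite itself is never trusted.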

4 The Yxv6 ﬁle system

This section describes the design, implementation, and

veriﬁcation of the Yxv6 journaling ﬁle system. At a

high level, verifying the correctness of Yxv6 requires

Yggdrasil to obtain an SMT encoding of both the speciﬁ-

cation and implementation through symbolic execution,

and to invoke an SMT solver to prove the crash reﬁne-

ment theorem. A simple approach, used by YminLFS in

§2, is to directly prove crash reﬁnement between the en-

tire ﬁle system speciﬁcation and implementation. How-

ever, the complexity of Yxv6 makes such a proof in-

tractable for state-of-the-art SMT solvers. To address this

issue, Yxv6 employs a modular design enabled by crash

reﬁnement to scale up SMT reasoning.

4.1 Design overview

Yxv6 uses crash reﬁnement to achieve scalable SMT rea-

soning in three steps. First, to reduce the size of SMT

encodings, Yxv6 stacks ﬁve layers of abstraction, each

consisting of a speciﬁcation and implementation, starting

with an asynchronous disk speciﬁcation (§4.2). We use

Yggdrasil to prove crash reﬁnement theorems for each

layer, showing that each correctly implements its speciﬁ-

cation. Upper layers then use the speciﬁcations of lower

layers, rather than their implementations, in order to ac-

celerate veriﬁcation. This layered approach effectively

bounds the reasoning to a single layer at a time.

Second, many ﬁle system operations touch only a

small part of the disk. To allow the SMT solver to ex-

ploit this locality, Yxv6 explicitly uses multiple separate

disks rather than one. For example, by storing the free

bitmap on a separate disk, the SMT solver can easily

infer that updating it does not affect the rest of the ﬁle

system. We then prove crash reﬁnement from this multi-

disk system to a more space-efﬁcient ﬁle system that uses

only a single disk (§4.3). The result of these ﬁrst two

steps is Yxv6+sync, a synchronous ﬁle system that com-

mits a transaction for each system call (by forcing the log

to disk), similar to xv6 [14] and FSCQ [7].

[Figure 3: The stack of layers of Yxv6. From bottom to top: Layer 1, the asynchronous disk specification over the block device (Axiom); Layer 2, the transactional disk, implemented by write-ahead logging; Layer 3, the virtual transactional disk, implemented by block pointers; Layer 4, inodes, implemented by Yxv6 inodes; Layer 5, regular files, symbolic links, and directories, implemented by Yxv6 files (each a Theorem). Within each layer, a shaded box represents the specification; a (white) box represents the implementation; and the implementation is a crash refinement of its specification, denoted using an arrow. Each implementation (except for the lowest layer) builds on top of a specification from the layer below, denoted using a circle.]

Finally, for better run-time performance, we imple-

ment an optimized variant of Yxv6+sync that groups

multiple system calls into one transaction [19] and com-

mits only when the log is full or upon fsync. We prove

the resulting ﬁle system, called Yxv6+group_commit, is

a crash reﬁnement of Yxv6+sync with a more relaxed

crash consistency model (§4.4).

4.2 Stacking layers of abstraction

Figure 3 shows the ﬁve abstraction layers of Yxv6. Each

layer consists of a speciﬁcation and an implementation

that is written using a lower-level speciﬁcation. We de-

scribe each of these layers in turn.

Layer 1: Asynchronous disk. The lowest layer of the

stack is a speciﬁcation of an asynchronous disk. This

speciﬁcation comprises the asynchronous disk model we

used in §2.2 to implement YminLFS. Since the imple-

mentation of a physical block device is opaque, we as-

sume the speciﬁcation correctly models the block de-

vice (i.e., the speciﬁcation is more conservative and al-

lows more behavior than real hardware), as follows:

Axiom 1. A block device is a crash reﬁnement of the

asynchronous disk speciﬁcation.

Layer 2: Transactional disk. The next layer introduces the abstraction of a transactional disk, which manages multiple separate data disks, and offers the following operations:

• d.begin_tx() starts a transaction;

• d.commit_tx() commits a transaction;

• d.write_tx(j, a, v) adds to the current transaction a

write of value v to address a on disk j; and

• d.read(j, a) returns the value at address a on disk j.

The speciﬁcation says that operations executed within

the same transaction are atomic (i.e., all-or-nothing) and

sequential (i.e., transactions cannot be reordered).
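The four operations and the all-or-nothing property can be illustrated with a small executable model. This is a sketch under assumptions, not the Yxv6 specification: the class name is hypothetical, a single boolean `on` stands in for the crash schedule of a commit, reads are assumed to see only committed data, and unwritten blocks are assumed to read as 0.

```python
class TxDiskSpec:
    """Toy model of a transactional disk that manages several data disks."""

    def __init__(self, ndisks: int):
        self._disks = [dict() for _ in range(ndisks)]  # disk j: address -> value
        self._pending = None                           # writes of the open transaction

    def begin_tx(self) -> None:
        self._pending = []

    def write_tx(self, j: int, a: int, v: int) -> None:
        # Writes are only recorded here; nothing reaches a disk before commit.
        self._pending.append((j, a, v))

    def commit_tx(self, on: bool = True) -> None:
        """Apply all pending writes in order if on is True, none of them otherwise."""
        if on:
            for j, a, v in self._pending:
                self._disks[j][a] = v
        self._pending = None

    def read(self, j: int, a: int) -> int:
        return self._disks[j].get(a, 0)  # assumption: unwritten blocks read 0


d = TxDiskSpec(ndisks=2)
d.begin_tx()
d.write_tx(0, 5, 11)
d.write_tx(1, 5, 22)
d.commit_tx(on=False)            # crash: the whole transaction is dropped
after_crash = (d.read(0, 5), d.read(1, 5))

d.begin_tx()
d.write_tx(0, 5, 11)
d.write_tx(1, 5, 22)
d.commit_tx()                    # success: both writes become visible together
after_commit = (d.read(0, 5), d.read(1, 5))
```

No schedule can make one write of a transaction visible without the other, which is the atomicity the specification demands of any implementation.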

The implementation uses the standard write-ahead

logging technique [19, 31]. It uses one asynchronous

disk (from layer 1) for the log, and a set of asynchronous

disks for data. Using a single transactional disk to man-

age multiple data disks allows higher layers to separate

writes within a transaction (e.g., updates to data and

inode blocks will not interfere), which helps scale SMT

reasoning; §4.3 reﬁnes the multiple disks to one.

The implementation is parameterized by the transac-

tion size limit k (i.e., the maximum number of writes in

one transaction). The log disk uses a ﬁxed number of

blocks, determined by k, as a header to store log entry

addresses, and the remaining blocks to store log entry

data. The ﬁrst entry in the ﬁrst header block is a counter

of log entries; the consistency invariant for the transac-

tional disk layer says that this counter is always zero after

recovery. The Yxv6+sync ﬁle system sets k = 10, while

Yxv6+group_commit sets k = 511. For each of these

settings, we prove the following theorem:

Theorem 2. The write-ahead logging implementation is

a crash reﬁnement of the transactional disk speciﬁcation.

Layer 3: Virtual transactional disk. The speciﬁca-

tion of the virtual transactional disk is similar to that

of the transactional disk, but instead uses 64-bit virtual

disk addresses [22]. Each virtual address can be mapped

to a physical disk address or unmapped later; reads and

writes are valid for mapped addresses only. We will use

this abstraction to implement inodes in the upper layer.

The virtual transactional disk implementation uses the

standard block pointers approach. It uses one transac-

tional disk managing at least three data disks: one to

store the free block bitmap, another to store direct block

pointers, and the third to store both data and singly in-

direct block pointers (higher layers will add additional

disks). The free block bitmap disk stores only one bit in

each of its blocks, which simpliﬁes SMT reasoning but

wastes disk space; §4.3 will reﬁne it to a more space-

efﬁcient version.

The implementation relies on two consistency invariants: (1) the mapping from virtual disk addresses to physical disk addresses is injective (i.e., each physical address is mapped at most once), and (2) if a virtual disk address is mapped to physical address a, the a-th bit in the block bitmap must be marked as used. We use these invariants to prove the following theorem:

Theorem 3. The block pointer implementation is a crash

reﬁnement of the virtual transactional disk speciﬁcation.
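The two invariants of this layer are simple enough to state as an executable predicate. The sketch below uses illustrative data structures (a dictionary for the virtual-to-physical mapping and a list of booleans for the bitmap); the on-disk representation in Yxv6 differs.

```python
from typing import Dict, Sequence


def invariants_hold(mapping: Dict[int, int], bitmap: Sequence[bool]) -> bool:
    """mapping holds only the mapped virtual addresses; bitmap[a] is True if block a is used."""
    phys = list(mapping.values())
    # (1) Injectivity: no physical address is the target of two virtual addresses.
    injective = len(set(phys)) == len(phys)
    # (2) Every mapped physical block is marked as used in the bitmap.
    marked_used = all(0 <= a < len(bitmap) and bitmap[a] for a in phys)
    return injective and marked_used


good = invariants_hold({0: 2, 5: 3}, [False, False, True, True])
aliased = invariants_hold({0: 2, 5: 2}, [False, False, True, True])
unmarked = invariants_hold({0: 1}, [False, False])
```

The second and third calls fail for different reasons: two virtual addresses share physical block 2, and physical block 1 is mapped while still marked free.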

Layer 4: Inodes. The fourth layer introduces the abstraction of inodes. Each inode is uniquely identified using a 32-bit inode number. The specification maps an inode number to $2^{32}$ blocks, and to a set of metadata such as size, mtime, and mode.

The implementation is straightforward thanks to the virtual transactional disk specification. It simply splits the 64-bit virtual disk address space into $2^{32}$ ranges, and each inode takes one range, which has $2^{32}$ "virtual" blocks, similar to NVMFS/DFS [22]. Inode metadata resides on a separate disk managed by the virtual transactional disk (which now has four data disks). There are no consistency invariants in this layer. We prove the following theorem:

Theorem 4. The Yxv6 inode implementation is a crash

reﬁnement of the inode speciﬁcation.
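One way to picture the split of the virtual address space is as bit packing. The sketch below assumes that the inode number occupies the upper 32 bits of the 64-bit virtual address and the block offset the lower 32 bits; the text only says that each inode owns one contiguous range, so this particular layout and the function names are assumptions.

```python
INODE_BITS = 32
OFFSET_BITS = 32
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def virtual_address(ino: int, off: int) -> int:
    """Pack (inode number, block offset) into one 64-bit virtual disk address."""
    if not 0 <= ino < (1 << INODE_BITS):
        raise ValueError("inode number must fit in 32 bits")
    if not 0 <= off < (1 << OFFSET_BITS):
        raise ValueError("block offset must fit in 32 bits")
    return (ino << OFFSET_BITS) | off


def split_address(vaddr: int) -> tuple:
    """Recover (inode number, block offset) from a virtual disk address."""
    return vaddr >> OFFSET_BITS, vaddr & OFFSET_MASK


first_block_of_inode_7 = virtual_address(7, 0)
round_trip = split_address(virtual_address(7, 42))
```

Because the ranges of different inodes never overlap, a write to one inode's block cannot alias a block of another inode, which is consistent with this layer needing no consistency invariants of its own.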

Layer 5: File system. The top layer of the ﬁle system

is an extended version of FSSpec given in §2, with regular

ﬁles, directories, and symbolic links.

The implementation builds on top of the inode speci-

ﬁcation, using a separate inode bitmap disk and another

for orphan inodes. Both are managed by the virtual trans-

actional disk (which now has six data disks plus the log

disk, giving a total of seven disks). There are two consis-

tency invariants: (1) if an inode is not marked as used in

the inode bitmap disk, its size must be zero in the meta-

data; and (2) if an inode has n blocks, no “virtual” block

larger than n is mapped. Using these invariants, we prove

the ﬁnal crash reﬁnement theorem:

Theorem 5. The Yxv6 implementation of ﬁles is a crash

reﬁnement of the speciﬁcation of regular ﬁles, symbolic

links, and directories.

Finitization. The Yggdrasil veriﬁer requires Yxv6 op-

erations to be ﬁnite, as mentioned in §3.2. Most ﬁle sys-

tem operations satisfy this requirement, as they use only

a small number of disk reads and writes. For example,

moving a ﬁle involves updating only the source and des-

tination directories. However, there are two exceptions.

First, search-related procedures, such as ﬁnding a free

bit in a bitmap, may need to read many blocks. We

choose not to verify the bit-ﬁnding algorithm, but in-

stead adopt the idea of validation [38, 46, 48] to imple-

ment such search algorithms. The validator, which we

do verify, simply checks that an index returned by the

search is indeed marked free in the bitmap and if not,

fails the operation with an error code. We use similar
