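EXPLODE: A lightweight system for detecting storage system errors
eXplode (EXPLODE) is a lightweight, general-purpose system for finding serious errors in storage systems. It was developed by Junfeng Yang, Can Sar, and Dawson Engler at the Computer Systems Laboratory at Stanford University. Storage systems such as file systems, databases, and RAID make a key promise: data a user has committed should be saved safely and never lost or corrupted. Because these systems often hold the only copy of the data, losing it can be catastrophic.
Keeping this contract is hard. Storage system code must correctly handle a crash at any program point, no matter how the data is split between volatile and persistent storage at that moment. Meeting this requirement demands a high degree of robustness and makes the code very difficult to write.
eXplode's key idea is to adapt model checking, a comprehensive but usually heavyweight formal verification technique, to real systems. Using user-written checkers, which may be tailored to a specific system, eXplode drives the storage system into tricky corner cases, including (but not limited to) error handling during crash recovery. This approach is more systematic than pure testing while staying lightweight enough to apply to large, real-world storage systems.
Compared with conventional testing, eXplode's structured checking process can uncover latent bugs that rarely surface during normal operation. Finding them helps developers fix problems early and reduces the risk of data loss, which makes eXplode a practical tool for maintaining the integrity of complex storage systems.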
1 : const char *dir = "/mnt/sbd0/test-dir";
2 : const char *file = "/mnt/sbd0/test-file";
3 : static void do_fsync(const char *fn) {
4 :     int fd = open(fn, O_RDONLY);
5 :     fsync(fd);
6 :     close(fd);
7 : }
8 : void FsChecker::mutate(void) {
9 :     switch(choose(4)) {
10:     case 0: systemf("mkdir %s%d", dir, choose(5)); break;
11:     case 1: systemf("rmdir %s%d", dir, choose(5)); break;
12:     case 2: systemf("rm %s", file); break;
13:     case 3: systemf("echo \"test\" > %s", file);
14:         if(choose(2) == 0)
15:             sync();
16:         else {
17:             do_fsync(file);
18:             // fsync parent to commit the new directory entry
19:             do_fsync("/mnt/sbd0");
20:         }
21:         check_crash_now(); // invokes check() for each crash
22:         break;
23:     }
24: }
25: void FsChecker::check(void) {
26:     ifstream in(file);
27:     if(!in)
28:         error("fs", "file gone!");
29:     char buf[1024];
30:     in.read(buf, sizeof buf);
31:     in.close();
32:     if(strncmp(buf, "test", 4) != 0)
33:         error("fs", "wrong file contents!");
34: }
Figure 2: Example file system checker. We omit the class initialization code and some sanity checks.
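Figure 2's check() has one subtle spot. If the file holds fewer than four bytes (say, a crash left it truncated), in.read on line 30 fills only part of buf, and the strncmp on line 32 then reads bytes that were never initialized. The caption notes that some sanity checks are omitted, so the original checker may already guard against this. A minimal standard C++ sketch of a stricter comparison follows; `file_starts_with` is a hypothetical helper, not an EXPLODE routine.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (not part of EXPLODE): returns true only if `path`
// can be opened and its first expected.size() bytes equal `expected`.
// Only bytes that were actually read are compared.
bool file_starts_with(const std::string &path, const std::string &expected) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;  // corresponds to the "file gone!" case
    std::string buf(expected.size(), '\0');
    in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    // A short read (e.g. a truncated file) leaves gcount() below buf.size().
    return in.gcount() == static_cast<std::streamsize>(buf.size()) &&
           buf == expected;
}
```

Because echo writes "test" followed by a newline, the helper compares a prefix rather than the whole file, just as the strncmp in Figure 2 does.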
Checkers range from aggressively system-specific (or even code-version-specific) to fairly generic. Their size scales with the complexity of the invariants checked, from a few tens to many thousands of lines.
Figure 2 shows a file system checker that checks a simple correctness property: a file that has been synchronously written to disk (using either the fsync or the sync system call) should persist after a crash. Mail servers, databases and other application storage systems depend on this behavior to prevent crash-caused data obliteration. While simple, the checker illustrates features common to many checkers, and it catches some interesting bugs.
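The property depends on the application forcing both the file data and its directory entry to disk, which is why Figure 2 fsyncs the parent directory after fsyncing the file (lines 17 to 19). Below is a minimal POSIX sketch of that pattern as an application might write it. It assumes Linux semantics, where a directory can be opened read-only and passed to fsync; `durable_create` and `fsync_path` are hypothetical names, not EXPLODE routines.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: open `path` read-only and fsync it. On Linux this
// works for directories as well as regular files (assumption elsewhere).
static bool fsync_path(const std::string &path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0)
        return false;
    bool ok = (fsync(fd) == 0);
    close(fd);
    return ok;
}

// Hypothetical helper: write `data` to dir/name. A true result means the
// file contents were fsync'd and the parent directory was fsync'd too,
// so the new directory entry has also been committed.
bool durable_create(const std::string &dir, const std::string &name,
                    const std::string &data) {
    const std::string path = dir + "/" + name;
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;
    std::string::size_type off = 0;
    while (off < data.size()) {
        ssize_t n = write(fd, data.data() + off, data.size() - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;  // interrupted before writing; retry
            close(fd);
            return false;
        }
        off += static_cast<std::string::size_type>(n);
    }
    bool ok = (fsync(fd) == 0);   // commit file data and metadata
    ok = (close(fd) == 0) && ok;  // always close, keep the first failure
    return ok && fsync_path(dir); // commit the directory entry
}
```

A caller should treat the file as durable only after `durable_create` returns true. Skipping the directory fsync is exactly the kind of omission a crash checker like Figure 2's is designed to expose.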
The mutate method calls choose(4) (line 9) to fork and perform each of four possible actions: (1) create a directory, (2) delete it, (3) create a test file, or (4) delete it. The first two actions then call choose(5) and create or delete one of five directories (the directory name is based on choose's return value). The file creation action calls choose(2) (line 14) and forces the test file to disk using sync in one child and fsync in the other. As Figure 3 shows, one mutate call therefore creates thirteen children: five for each directory action, one for file deletion, and two for file creation.
Figure 3: Choices made by one invocation of the mutate method in Figure 2's checker. It creates thirteen children.
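The branching in Figure 3 can be reproduced without a real storage system. EXPLODE's choose forks to explore each alternative; the sketch below swaps in a different technique, re-executing the code from the start and replaying a recorded prefix of choices, as stateless model checkers commonly do. `ChoiceExplorer` and `mutate_shape` are hypothetical names, and `mutate_shape` keeps only the branching structure of Figure 2's mutate, not its system calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for choose(): instead of forking, each run of the
// body replays a recorded prefix of choices and picks 0 for any new choice.
class ChoiceExplorer {
public:
    // Returns a value in [0, n).
    int choose(int n) {
        if (pos_ < path_.size()) {
            assert(path_[pos_].limit == n);  // body must be deterministic
            return path_[pos_++].value;
        }
        path_.push_back({0, n});
        ++pos_;
        return 0;
    }

    // Runs `body` once per distinct sequence of choices, depth first,
    // and returns the number of runs (leaves of the choice tree).
    int explore(const std::function<void(ChoiceExplorer &)> &body) {
        int leaves = 0;
        path_.clear();
        for (;;) {
            pos_ = 0;
            body(*this);
            ++leaves;
            // Backtrack: drop exhausted choices, advance the deepest open one.
            while (!path_.empty() &&
                   path_.back().value + 1 == path_.back().limit)
                path_.pop_back();
            if (path_.empty())
                return leaves;
            ++path_.back().value;
        }
    }

private:
    struct Choice {
        int value;  // alternative taken on this run
        int limit;  // n passed to choose(n)
    };
    std::vector<Choice> path_;
    std::size_t pos_ = 0;
};

// Branching structure of Figure 2's mutate(), without the system calls.
void mutate_shape(ChoiceExplorer &x, std::vector<std::string> &log) {
    switch (x.choose(4)) {
    case 0: log.push_back("mkdir " + std::to_string(x.choose(5))); break;
    case 1: log.push_back("rmdir " + std::to_string(x.choose(5))); break;
    case 2: log.push_back("rm file"); break;
    case 3: log.push_back(x.choose(2) == 0 ? "create+sync" : "create+fsync"); break;
    }
}
```

Running `explore` on `mutate_shape` executes the body once per leaf of Figure 3's tree, so it reports thirteen runs. Replay trades extra re-execution for not needing fork, which is only workable here because the toy body has no external state to restore.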
The checker calls EXPLODE to check crashes. While other code in the system can also initiate such checking, this is typically the mutate method's responsibility: it issues the operations that change the storage system, so it knows the correct system state and when this state changes. In our example, after mutate forces the file to disk, it calls the EXPLODE routine check_crash_now(). EXPLODE then generates all crash disks at the exact moment of the call and invokes the check method on each after repairing and mounting it using the underlying storage component (see §3.3). The check method checks whether the test file exists (line 27) and has the right contents (line 32). While simple, this exact checker catches an interesting bug in JFS where, upon a crash, an fsync'd file loses all its contents; the loss is triggered by the corner-case reuse of a directory inode as a file inode (§7.3 discusses a more sophisticated version of this checker).
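This excerpt does not say how EXPLODE enumerates crash disks. Purely as an illustration, the sketch below assumes a simplified model: the disk is a map from block numbers to contents, writes still sitting in a write-back cache target distinct blocks, and any subset of them may have reached the disk when the crash happens. It skips the repair-and-mount step and calls a checker predicate directly on each of the 2^n candidate disks. `check_all_crash_disks`, `Disk`, and `Write` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Disk = std::map<int, std::string>;    // block number -> contents
using Write = std::pair<int, std::string>;  // pending (block, contents)

// Hypothetical model: each pending write independently may or may not have
// reached the disk at crash time. Builds every such crash disk, runs
// `check` on it, and returns how many crash disks failed the check.
int check_all_crash_disks(const Disk &durable,
                          const std::vector<Write> &pending,
                          const std::function<bool(const Disk &)> &check) {
    const std::size_t n = pending.size();
    assert(n < 20);  // 2^n crash disks; keep the sketch tractable
    int failures = 0;
    for (unsigned long mask = 0; mask < (1UL << n); ++mask) {
        Disk crash = durable;
        for (std::size_t i = 0; i < n; ++i)
            if (mask & (1UL << i))
                crash[pending[i].first] = pending[i].second;  // write landed
        if (!check(crash))
            ++failures;
    }
    return failures;
}
```

With the test file's block still pending, exactly half of the crash disks lack it and fail the check; once sync or fsync has made the write durable, nothing is pending and every crash disk passes. Under this model, that is why calling check_crash_now() only after forcing the file to disk turns any failure into a real bug.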
So far we have described how a single mutate call works. The next section shows how it fits in the checking process. In addition, checking crashes at only a single code point is crude; Section 6 describes the routines EXPLODE provides for more comprehensive checking.
3.3 Setting up checked code: Storage components
Since EXPLODE checks live storage systems, these systems must be up and running. For each storage subsystem involved in checking, clients provide a storage component that implements five methods:
1. init: one-time initialization, such as formatting a file system partition or creating a fresh database.
2. mount: set up the storage system so that operations can be performed on it.
3. unmount: tear down the storage system; used by EXPLODE to clear the storage system's state so it can explore a different one (§5.2).
4. recover: repair the storage system after an EXPLODE-simulated crash.
5. threads: return the thread IDs for the storage system's kernel threads. EXPLODE reduces nondeterminism by only running these threads when it wants to (§5.2).
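The five methods map naturally onto an abstract C++ class. The sketch below is hypothetical: the method names follow the list above, but the signatures, the use of `pid_t` for thread IDs, and the helper `check_one_crash_disk` are assumptions rather than EXPLODE's actual API. The helper follows the order described earlier for check_crash_now(), repairing and then mounting a crash disk before running the checker, and finally unmounting so another state can be explored.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <vector>

// Hypothetical interface; real EXPLODE signatures may differ.
class StorageComponent {
public:
    virtual ~StorageComponent() = default;
    virtual void init() = 0;     // one-time setup, e.g. format a partition
    virtual void mount() = 0;    // make the system ready for operations
    virtual void unmount() = 0;  // tear down to clear in-memory state
    virtual void recover() = 0;  // repair after a simulated crash
    virtual std::vector<pid_t> threads() = 0;  // kernel threads to schedule
};

// Toy component that only records the order in which methods are called.
class LoggingComponent : public StorageComponent {
public:
    std::vector<std::string> calls;
    void init() override { calls.push_back("init"); }
    void mount() override { calls.push_back("mount"); }
    void unmount() override { calls.push_back("unmount"); }
    void recover() override { calls.push_back("recover"); }
    std::vector<pid_t> threads() override { return {}; }  // none in the toy
};

// Hypothetical driver step for one crash disk: repair, mount, check, unmount.
template <typename Check>
void check_one_crash_disk(StorageComponent &sc, Check check) {
    sc.recover();
    sc.mount();
    check();
    sc.unmount();
}
```

A component for a real file system might run its formatting tool in init and its repair tool (such as fsck) in recover; the toy LoggingComponent only records call order so the sequence can be inspected.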