块级访问模式挖掘：提升分布式存储服务器预取性能

185 浏览量更新于2024-08-29 收藏 2.15MB PDF 举报

本文是一篇研究论文，标题为“通过挖掘块访问模式进行存储服务器上的预取”，发表在《计算机并行与分布式系统》(IEEE Transactions on Parallel and Distributed Systems)期刊上，其国际标准数字标识符为DOI:10.1109/TPDS.2015.2496595。该文章探讨了在分布式文件系统中广泛应用的一种优化技术——数据预取，其目标是减少网络和磁盘延迟，从而提升并行或分布式应用的I/O性能。传统的数据预取方法是由客户端文件系统主导的，它基于对用户访问模式的理解来预测和预先加载可能需要的数据。然而，本文作者—— Jianwei Liao、Franc¸ois Trahay、Balazs Geroﬁ 和 Yutaka Ishikawa（IEEE成员）提出了一个新的视角，即在存储服务器层面挖掘和分析块访问模式，以实现更精细的预取策略。在分布式文件系统环境中，数据的访问往往是无序且不可预测的，这给预取带来了挑战。通过在服务器端收集和分析大量数据块的访问历史，可以识别出潜在的访问模式，如热点数据区域、频繁访问的序列或周期性访问。这些模式的发现可以帮助服务器主动预测用户的下一次请求，从而在数据尚未被实际请求时将其提前加载到内存或缓存中。论文的关键贡献可能包括以下几点： 1. **新颖的预取策略**：提出了一种新的数据预取算法，它不再完全依赖于客户端的行为，而是通过服务器端的实时监控和分析，提高了预取的准确性和效率。 2. **访问模式挖掘**：开发了数据挖掘技术来识别和理解存储设备上的数据访问规律，这可能涉及到机器学习算法，如关联规则挖掘或时间序列分析。 3. **性能评估**：论文可能会提供实验结果，展示这种基于块访问模式的预取策略在实际分布式系统中的性能提升，以及它如何优于传统的客户端驱动预取方法。 4. **可扩展性和适应性**：讨论了新方法在处理大规模分布式环境中的可行性和适应性，可能包括处理多用户并发访问、动态数据分布等复杂情况。 5. **系统设计与实现**：论文可能包含了实现这一创新预取机制的系统架构和技术细节，包括如何集成到现有的分布式文件系统中，以及如何处理数据保护和隐私问题。这篇论文为存储服务器上的数据预取提供了全新的思路，它将预取决策的控制权转移到服务器端，利用块访问模式分析来提高I/O性能，对于现代大数据和云计算环境下高效的数据管理具有重要的理论和实践价值。

IEEE TRANSACTIONS ON PARALLEL DISTRIBUTED COMPUTING 3

servers in World Wide Web. Their observation was that

the web servers, which are responsible to handle access

requests from several clients, can make predictions on

which ﬁles are most likely to be demanded in the

near future [32]. I. Zhang [29] has implemented types

of prefetching schemes to improve the performance of

reading ﬁles and directories in WheelFS, which is a FUSE-

based distributed ﬁle system that aims to offer ﬂexible

wide-area storage for distributed applications [30]. Re-

cently, Y. Yin et al. have proposed the IOSIG tool based

on their previous work [23], which can keep track of

parallel I/O calls of an application and then analyze the

collected information to provide a clear understanding

of I/O behavior of the application on the client machine

[17]. As a consequence, the client ﬁle systems can issue

prefetching requests or adjusting layout requests to the

storage servers for I/O optimization after certain access

predictions. However, tracing and analyzing I/O calls on

the client node causes extra space and time overhead,

thus the IOSIG tool may not be a good choice for

conﬁguration-limited client machines to conduct I/O

optimization operations.

Furthermore, C. Amza’s group [36] and X. Zhang’s

group [37] are the pioneers of storage server side

prefetching in network based ﬁle systems. Both groups

proposed their prefetching schemes running on storage

servers, and their evaluation veriﬁed the effectiveness of

server-side prefetching. These schemes, however, either

require modiﬁcations of the applications or are only

working for a very limited number of block access

patterns. In brief, although block access history reveals

the behavior of disk traces, there are no general stor-

age server-side prefetching schemes that analyze block

access history in a distributed ﬁle system for yielding

better system performance. The well-known reasons for

this are the difﬁculties in modeling block access history

to generate block access patterns, as well as the aporias

in deciding the destination client ﬁle system for pushing

the prefetched data from storage servers.

3DATA PREFETCHING ON STO RAG E

SERVERS

In general, the sequence of block access on the stor-

age server is ordered in time, so that the block access

sequence can be split into successive parts by a con-

stant time interval, meaning that the sequence resembles

typical time series [13], [34]. This is a crucial fact for

understanding the proposal of this paper, which is a

server-side data prefetching mechanism that considers

block access history on the storage servers as a time

series, and then tries to classify various block access

patterns from the series. Consequently, it predicts the

future block access requests by matching the ﬁxed access

patterns with current block access events in the predic-

tion window to guide reading block data in advance, and

ﬁnally the fetched data will be pushed to the relevant

Client'ﬁle'system Storage'server Low'level'ﬁle'system

Network

Applica7on

read%(fd,%4096,%0)



read%(stripe_fd,%4096,%0)%

%%+%piggybacked%info.



read%(stripe_fd,%4096,%0)



4096%bytes%required%data



4096%bytes%required%data



required%data



read%(stripe_fd,%4096,%8192)



4096%bytes%required%data



4096%bytes%prefetched%data



read%(fd,%4096,%8192)



required%data



(

Forecas-ng(I/O(access(to(

prefetch(block(data(a9er(

analyzing(access(history(

&(piggybacked/info.



(

Caching(&(managing(

prefetched(data

Compu7ng'with'

input'data

Read'latency



Logging(access(event(

with(piggybacked/info.



①'

②'

③'

④'

⑤'

Fig. 1. Overview of the proposed server-side data

prefetching mechanism. The assumed synopsis of a read

operation is read(int ﬁldes, size t size, off t off)

client ﬁle systems to fulﬁll potential I/O requests on the

client side.

Figure 1 presents the basic idea of our proposed

server-side prefetching mechanism, where the interac-

tion between the client ﬁle system and storage server

can be described through the following steps:

1) After contacting the metadata server to know

which storage server needs to be accessed for the

actual data request, the client ﬁle system sends

the corresponding storage server an I/O request

accompanying with some additional information

about the application and the client ﬁle system

(labeled as piggybacked info.).

2) The storage server, fetches the requested data from

the low level ﬁle system, and records the block

access event along.

3) The application on the client machine can perform

computing tasks after receiving the required data

from the storage server.

4) Meanwhile, the storage server is able to forecast

future block I/O access requests by analyzing the

history of block I/O access events and the piggy-

backed client information. Therefore, the storage

server can issue relevant physical read requests to

the low level ﬁle system for reading data (that are

predicted to be accessed by the future I/O requests)

in advance.

5) Finally, the prefetched data are forwarded to the

corresponding client ﬁle system (determined by

the piggybacked client identiﬁcation) proactively.

As a result, when the prediction of I/O access

is successful, the buffered data on the client ﬁle

system can be returned to the application instantly,

and then read latency can be reduced to a great

extent.

In brief, the storage server can predict the future block

剩余13页未读，继续阅读

weixin_38725623

粉丝: 4
资源: 940

块级访问模式挖掘：提升分布式存储服务器预取性能

Accelerating BFS via Data Structure-Aware Prefetching on GPU

A Software Instruction Prefetching Method

IBM System Storage DS8000 Architecture and Implementation

简要介绍一下Page Size Aware Cache Prefetching的主要内容

关于短视频调度流程的国内外研究现状以及对应的参考文献

multiepochsdataloader

cortex-a7预取指令一般预取多少位？

vue2 h5页面怎么唤起app

数据预取的国内外研究现状

容器 hdfs 预读取

最新资源