Each I/O has to traverse several layers on its way from the application to
the hardware. The block layer allows applications to access
diverse storage devices in a uniform way and provides the
storage device drivers with a single point of entry from all
applications, thus hiding the complexity and diversity of
storage devices. In addition, the block layer implements
I/O scheduling, which performs operations called
merging and sorting to significantly improve the performance
of the system as a whole.
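The merging operation mentioned above can be illustrated with a minimal sketch: if an incoming request begins exactly where a queued request ends, the two are folded into one larger sequential request. The `io_req` structure and `try_back_merge()` below are illustrative simplifications, not the kernel's actual `struct request` machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request descriptor; the real Linux
 * struct request carries far more state. */
struct io_req {
    unsigned long sector;   /* starting sector */
    unsigned long nr_sects; /* length in sectors */
};

/* Back-merge: if 'next' begins exactly where 'cur' ends, fold it
 * into 'cur' so the driver sees one larger sequential request. */
static bool try_back_merge(struct io_req *cur, const struct io_req *next)
{
    if (cur->sector + cur->nr_sects != next->sector)
        return false;           /* not contiguous: keep them separate */
    cur->nr_sects += next->nr_sects;
    return true;
}
```

Fewer, larger requests amortize per-request overhead in every layer below the block layer, which is why merging improves system-wide performance.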
The SCSI layer mainly constructs SCSI commands from the
I/O requests delivered by the block layer. The libfc (FCP) layer
maps SCSI commands to Fibre Channel (FC) frames as
defined in the Fibre Channel Protocol for SCSI (FCP)
standard [18]. The FCoE layer encapsulates FC frames into
FCoE frames, and de-encapsulates FCoE frames into FC
frames, as specified by the FC-BB-6 standard [3]. In other words, the SCSI,
FCP, and FCoE layers mainly translate the I/O requests from the
block layer into FCoE command frames. The Ethernet
driver transmits/receives FCoE frames to/from the hardware.
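The FCoE encapsulation step amounts to wrapping the FC frame in an Ethernet header, an FCoE header, and a short FCoE trailer. The sketch below computes the resulting on-wire length; the field widths follow our reading of the FC-BB frame format (14-byte FCoE header including the SOF byte, 4-byte trailer including the EOF byte), and the helper name is ours, not a kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN       14  /* dst MAC + src MAC + EtherType          */
#define ETH_P_FCOE 0x8906  /* EtherType assigned to FCoE             */
#define FCOE_HLEN      14  /* version + reserved (13 B) + SOF (1 B)  */
#define FCOE_TLEN       4  /* EOF (1 B) + reserved (3 B)             */

/* Total on-wire length (excluding the Ethernet FCS) after
 * encapsulating an FC frame of fc_len bytes into an FCoE frame.
 * In the kernel this framing is built inside an sk_buff. */
static size_t fcoe_frame_len(size_t fc_len)
{
    return ETH_HLEN + FCOE_HLEN + fc_len + FCOE_TLEN;
}
```

For a maximum-size FC frame (24-byte FC header, 2112-byte payload, 4-byte CRC, i.e., 2140 bytes), this yields 2172 bytes, which is why FCoE requires jumbo-capable Ethernet links.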
The main I/O performance factors in the Open-FCoE stack can
be summarized as follows: (1) the I/O-Issuing Side translates the
I/O requests into FCoE-format frames; (2) the I/O-Completion
Side informs the I/O-issuing threads of the I/O completions;
(3) Parallel Processing and Synchronization implements
parallel access on multi-core servers. In this section, we
describe and investigate the current Open-FCoE stack
with respect to the above factors.
2.1 Issue 1: High Synchronization Overhead from
Single Queue & Shared Lock Mechanism
Fig. 2 shows the I/O request transmission process in the
SCSI/FCP/FCoE layers of the Open-FCoE stack when multiple
cores/threads submit I/O requests to a remote target in a
multi-core system. We describe it as follows:
1) The SCSI layer builds the SCSI command structure
describing the I/O operation received from the block layer;
it then acquires the shared lock when: (1) enqueueing
the SCSI command into the shared queue in the SCSI
layer; and (2) dispatching the SCSI command from
the shared queue in the SCSI layer to the FCP layer.
2) The FCP layer builds an internal data structure (FCP
request) describing the SCSI command received from the
SCSI layer, and acquires the shared lock when
enqueueing the FCP request into the internal shared
queue in the FCP layer. It then initializes an FC
frame in an sk_buff structure for the FCP request, and
delivers the sk_buff structure to the FCoE layer.
3) The FCoE layer encapsulates the FC frame into an FCoE
frame, and then acquires the shared lock when:
(1) enqueueing the FCoE frame; and (2) dequeueing
the FCoE frame to transmit it to the network
through the standard interface dev_queue_xmit().
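The single queue & shared lock pattern recurring in all three steps can be modeled with the minimal sketch below: every producer and consumer, regardless of which core it runs on, must take the same mutex. The structure and function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QLEN 256

/* One queue shared by all cores, guarded by one lock. */
struct shared_queue {
    pthread_mutex_t lock;      /* the single shared lock */
    void *items[QLEN];
    int head, tail, count;
};

static bool sq_enqueue(struct shared_queue *q, void *cmd)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);   /* all producers serialize here */
    if (q->count < QLEN) {
        q->items[q->tail] = cmd;
        q->tail = (q->tail + 1) % QLEN;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

static void *sq_dequeue(struct shared_queue *q)
{
    void *cmd = NULL;
    pthread_mutex_lock(&q->lock);   /* consumers serialize here too */
    if (q->count > 0) {
        cmd = q->items[q->head];
        q->head = (q->head + 1) % QLEN;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return cmd;
}
```

With N cores issuing I/O concurrently, the lock becomes a serialization point: each enqueue and dequeue in each of the three layers contends for it, so per-core throughput degrades as N grows.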
The shared lock clearly provides the synchronization
required for the shared queue on multi-core servers. However,
such a single queue & shared lock mechanism in the SCSI/
FCP/FCoE layers limits software scalability
on multi-core systems.
To improve scalability, modern servers
employ cache-coherent Non-Uniform Memory Access (cc-
NUMA) in their multi-core architecture, such as the one depicted
in Fig. 3, which corresponds to the servers used in our work. In such an
architecture, several well-known characteristics [11],
[19], [20], [21], [22], [23], [24] significantly impact
software performance, such as Migratory Sharing,
False Sharing, and the large latency gap between
local and remote memory accesses. These characteristics make it
challenging to develop multi-threaded
software for cc-NUMA multi-core systems [25].
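False sharing, for instance, arises when logically independent per-core data happen to occupy the same cache line, so updates from different cores ping-pong the line across the coherence fabric. A common remedy is to pad each item out to a full cache line, as sketched below (the 64-byte line size is a typical x86 value, assumed here).

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64   /* typical x86 cache-line size; an assumption */

/* Unpadded: counters of adjacent cores can share one cache line,
 * so independent updates still cause coherence traffic. */
struct ctr_packed {
    unsigned long v;
};

/* Padded: each counter occupies a full cache line, eliminating
 * false sharing at the cost of extra memory. */
struct ctr_padded {
    unsigned long v;
    char pad[CACHELINE - sizeof(unsigned long)];
};
```

Per-core queue designs in later sections of such stacks typically combine this padding with NUMA-local allocation so that each core touches only lines it owns.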
Fig. 1. Architecture of Linux Open-FCoE stack.
Fig. 2. Process of I/O requests transmission in the current Open-FCoE
stack.
Fig. 3. Multi-core architecture with cache coherent non-uniform memory
access (cc-NUMA).
2516 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 28, NO. 9, SEPTEMBER 2017