PacketUsher: a DPDK-Based Packet I/O Engine for Commodity PC
Zhijun Xu∗, Liang Zhou†, Li Feng‡, Yujun Zhang†, Jun Zhang§ and Huadong Ma∗
∗Beijing University of Posts and Telecommunications, Beijing, China
Email: {xuzhijun, mhd}@bupt.edu.cn
†Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Email: {zhouliang, zhmj}@ict.ac.cn
‡Faculty of Information Technology, Macau University of Science and Technology, Macau, China
Email: lfeng@must.edu.mo
§Inner Mongolia University, Hohhot, China
Email: zhangjun@imu.edu.cn
Abstract—Deploying network applications on commodity PCs is increasingly important because of their flexibility and low cost. Due to high packet I/O overheads, however, the performance of these applications is low. In this paper, we present PacketUsher, an efficient packet I/O engine built on the DPDK libraries. By replacing the standard I/O routine with PacketUsher, we can remarkably accelerate both I/O-intensive and compute-intensive applications on commodity PCs. As a case study of an I/O-intensive application, our RFC 2544 benchmark over PacketUsher achieves the same testing results as a dedicated commercial device. For a compute-intensive application, our application-layer traffic generator over PacketUsher achieves more than 4 times its original performance and outperforms existing frameworks by about 3 times.
Index Terms—network application; packet I/O; DPDK; I/O-
intensive; compute-intensive
I. INTRODUCTION
Software packet processing on commodity PCs is an ideal way to deploy network applications, especially since the rise of Network Function Virtualization (NFV) [1]. It is inexpensive to operate, makes it easy to switch between vendors, and is well suited to accommodate future software innovations [2]. While a significant step forward in some respects, it is a step backwards in others: the flexibility of commodity PCs comes at the cost of discouragingly low performance, which is mainly limited by packet I/O overheads. For example, the sendto() system call of FreeBSD takes 942 ns on average to transmit a packet, and RouteBricks (a software router) reports that 66% of its CPU cycles are spent on packet I/O [3].
To address the issue of costly packet I/O, current works aim to bypass the operating system and design novel packet I/O frameworks that take direct control of the hardware. Research [4] demonstrates that replacing the raw packet I/O APIs of a general-purpose OS with novel packet I/O frameworks such as Netmap [5] can transparently accelerate software routers, including Open vSwitch [6] and Click [7].
PF_RING [8] is a novel packet I/O framework for commodity PCs. Its zero-copy version can achieve line-rate (14.881 Mpps) packet I/O on a 10 Gbit/s link [9]. However, this version is not free for commercial companies or common users. The open-source Netmap usually takes 90 CPU cycles to send or receive a packet [5], but it is not convenient to deploy (it sometimes requires recompiling the Linux kernel) and suffers packet loss at high frame rates. Intel DPDK [10] is a set of open-source libraries for high-performance packet processing. It reduces the cost of packet I/O to less than 80 CPU cycles [10], and many companies (Intel, 6WIND, Radisys, etc.) already support DPDK in their products. In this paper, we use the DPDK libraries to design an efficient packet I/O engine for common users.
We argue that a packet I/O engine for commodity PCs should have four properties: low coupling with user applications, multi-thread safety, a simple packet I/O API, and high-speed packet I/O performance. This design goal motivates us to implement the PacketUsher packet I/O engine on commodity PCs. PacketUsher brings a noticeable performance improvement for both I/O-intensive and compute-intensive applications. For example, it makes our RFC 2544 benchmark (I/O-intensive) produce the same testing results as dedicated hardware, and gives our application-layer traffic generator (compute-intensive) a performance improvement of more than 4 times.
The remainder of this paper is organized as follows. Section II introduces background knowledge. Section III presents PacketUsher and its performance evaluation. Section IV presents two case studies.
II. BACKGROUND KNOWLEDGE
A. Overheads of Standard Packet I/O
The standard packet I/O mechanism in a general-purpose OS is interrupt-driven. It has three main overheads: interrupt handling, buffer allocation and memory copy.
Interrupt handling: At high frame rates, the interrupt-driven mechanism suffers from receive livelock [12]. Previous works [3] [4] [13] use batch processing to mitigate receive livelock; however, some received packets may still be dropped if the OS fails to handle interrupt requests in time. Another possible method is to replace the interrupt-driven mechanism with polling, which periodically checks for the arrival of packets on the NICs. Its drawback is that custom drivers must be used instead of standard ones. A compromise is Linux NAPI [14], which uses an interrupt to signal the arrival of packets and then uses polling to receive packets in batches.
Buffer allocation: Buffer allocation is another time-consuming operation. Allocating a buffer for every transmitted or received packet consumes substantial system resources. Previous works