54 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 32, NO. 1, JANUARY 2014
applies a TCP-friendly mechanism for congestion control. We
choose digital fountain code [15] (specifically, LT code [12])
to achieve reliable data delivery, and adopt TFRC [14] to
adjust the data sending rates and maintain reasonable band-
width utilization. We explain below why we choose these
technologies to design LTTP.
There is a tradeoff between bandwidth overhead (redun-
dancy) and performance when choosing the coding scheme.
For example, some erasure codes, such as Reed-Solomon
erasure codes [16], have very low redundancy and can restore
the original data from any set of encoding data whose size
equals that of the original data; but the time required for
encoding and decoding is prohibitive for real-time transmission,
particularly considering that link speeds in data centers reach
1 Gbps or even 10 Gbps.
We choose digital fountain code as the coding scheme.
Firstly, digital fountain code can restore original data from
the encoding data whose size is marginally larger than that of
the original data, introducing reasonable bandwidth overhead.
As we will show later in simulations, the gain we get from
keeping network goodput high significantly outweighs the
bandwidth cost we pay. Secondly, digital fountain code can
provide good performance in encoding and decoding. Thirdly,
digital fountain code only cares about how much encoding
data has been received (enough to restore the original data),
rather than which encoding data are received. Thus, out-of-order
delivery is no longer an issue. As a consequence, data
loss caused by switch buffer overflow does not have an obvious
negative effect on network throughput.
There are different implementations of digital fountain code,
such as LT code [12] and Raptor code [17]. Although Raptor
code outperforms LT code, we still choose LT code to realize
digital fountain code in LTTP. This is because, when the data
size is small, the performance difference between LT code and
Raptor code is negligible [18]; in the TCP Incast scenario, the
data exchanged between client and server are usually small.
In addition, Raptor code needs to generate intermediate
symbols first, and then uses these intermediate symbols as
the input of the LT encoding algorithm to produce the
encoding data. Therefore, the implementation complexity of
Raptor code is much higher than that of LT code.
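To make the encoding and decoding processes concrete, the following Python sketch shows the core of LT coding: each encoding symbol is the XOR of a randomly chosen set of source blocks, and the decoder recovers blocks by repeatedly "peeling" symbols with exactly one unknown neighbor. The toy degree distribution here is ours for illustration only; a real implementation uses the robust soliton distribution [12].

```python
import random

def lt_encode_symbol(blocks, rng):
    """Produce one LT encoding symbol: sample a degree, pick that many
    source blocks uniformly at random, and XOR them together.
    NOTE: the degree distribution below is a toy stand-in for the
    robust soliton distribution used by real LT codes."""
    if rng.random() < 0.1:
        d = 1
    else:
        d = rng.randint(2, len(blocks))
    neighbors = rng.sample(range(len(blocks)), d)
    sym = 0
    for i in neighbors:
        sym ^= blocks[i]
    return neighbors, sym

def lt_decode(k, received):
    """Peeling decoder: repeatedly find a symbol with exactly one
    unresolved neighbor, recover that block by XOR-ing out the known
    ones, and continue until all k source blocks are recovered."""
    recovered = {}
    symbols = [(set(n), s) for n, s in received]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for nbrs, val in symbols:
            unknown = nbrs - recovered.keys()
            if len(unknown) == 1:
                i = unknown.pop()
                v = val
                for j in nbrs - {i}:
                    v ^= recovered[j]
                recovered[i] = v
                progress = True
    # None signals that not enough useful symbols arrived yet
    return recovered if len(recovered) == k else None
```

Because the decoder only needs the set of received symbols, not their arrival order, this sketch also illustrates why out-of-order delivery is harmless to fountain-coded transport.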
As for congestion control, recent work [19] shows that
when all the senders adopt digital fountain based protocols
and act as selfish players to inject data into the network as fast
as they can, a Nash equilibrium can be reached eventually. At
this equilibrium, the throughput of each flow is similar
to that when all the senders use TCP. However, in the typical
many-to-one communication pattern where TCP Incast occurs,
the transferred data volume is very small, and it is highly
probable that the Nash equilibrium cannot be reached before
all the data have been transferred. Therefore, we still have to
make an extra effort to handle congestion control in LTTP.
We simply resort to the existing TCP-friendly mechanisms,
a good summary of which is presented in [20]. Generally,
they can be classified into rate-based schemes and window-
based schemes. To achieve TCP friendliness, rate-based
schemes adjust the sending rate based on feedback from the
receiver, while window-based schemes adopt a window
(similar to the congestion window in TCP) at the sender or
receiver. However, window-based schemes may lead to
a typical sawtooth pattern in throughput. We therefore choose
TFRC [14], a rate-based scheme that provides a smoother
sending rate, as the congestion control mechanism in LTTP.
The analysis in Section IV will show the advantage of rate-
based congestion control in improving bandwidth utilization
for many-to-one communication.
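TFRC derives the allowed sending rate from the TCP throughput equation, using the segment size, round-trip time, and measured loss event rate. A minimal sketch in Python follows; the equation is the one TFRC uses (RFC 5348), while the function name and the t_RTO = 4*RTT approximation are illustrative choices on our part.

```python
from math import sqrt

def tfrc_rate(s, rtt, p, t_rto=None, b=1):
    """TCP throughput equation used by TFRC (RFC 5348).
    s     - segment size in bytes
    rtt   - round-trip time in seconds
    p     - loss event rate (0 < p <= 1)
    t_rto - retransmission timeout; commonly approximated as 4*rtt
    b     - packets acknowledged by a single ACK
    Returns the allowed sending rate in bytes per second."""
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * sqrt(2 * b * p / 3)
             + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

Because the rate is a smooth function of the loss event rate rather than a window halved on each loss, the sender's rate avoids the sawtooth pattern of window-based schemes.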
We emphasize that LT code can restore the original data
from any set of encoding data, as long as the number of
packet losses/errors falls within a reasonable range. Under
severe congestion, however, packet losses may increase,
and the receiver may not be able to restore the original data
within a reasonable time. In extreme cases, the receiver may
never receive enough encoding data to restore the original data
successfully. In that case, both the encoding process at the sender
side and the decoding process at the receiver side enter an endless
loop: the encoding process does not receive the terminating
signal and keeps producing and sending encoding data, while
the decoding process keeps waiting for more encoding data
to restore the original data. This situation can be fixed by
setting a timer at the sender: if the sender does not receive the
terminating signal before the timer expires, it terminates the
communication actively.
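The sender-side safeguard can be sketched as a simple loop with a deadline. Here `send_symbol` and `poll_terminate` are hypothetical callbacks standing in for the actual LT encoder and the control channel; the timeout value is likewise illustrative.

```python
import time

def sender_loop(send_symbol, poll_terminate, timeout_s=5.0):
    """Keep producing and sending encoding packets until the client's
    terminating signal arrives; give up when the timer expires so the
    encoder cannot loop forever under severe packet loss.
    send_symbol and poll_terminate are hypothetical callbacks."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if poll_terminate():       # terminating signal received
            return "terminated"
        send_symbol()              # emit one more LT encoding packet
    return "timed_out"             # abort the transfer actively
```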
B. Overall Framework
The complete framework of LTTP to support many-to-one
communication in data centers includes two parts, i.e., the data
channel from each server to the client, and the control channel
between the client and each server. In the data channel, we
improve LT code for reliable data transport, and adopt TFRC
for controlling the traffic sending rate at servers. The control
channel is employed by the client to issue data requests to
servers and send terminating signals to the servers as soon as
the requested data have been restored. The servers also use
the control channel to send decoding parameters to the client.
The decoding parameters include the original data size and the
block size (the block size is defined in Section III-C), which
are used by the client to execute the decoding process (we will
discuss the decoding process in Section III-C). For the control
channel messages, the data size is small enough to be put into
a single packet. Hence, it is unnecessary to employ coding
for transmission. Instead, we establish a TCP connection for
each client-server pair to deliver the control channel messages
reliably.
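As an illustration of why the decoding parameters fit in a single packet, they could be serialized as two fixed-width integers. The wire layout below (two 32-bit unsigned big-endian fields) is our assumption for the sketch, not a format specified by LTTP.

```python
import struct

# Hypothetical wire layout for the decoding-parameters message:
# original data size and block size, each a 32-bit unsigned
# big-endian integer, i.e. an 8-byte payload in total.
PARAMS_FMT = "!II"

def pack_params(data_size, block_size):
    """Serialize the decoding parameters for the control channel."""
    return struct.pack(PARAMS_FMT, data_size, block_size)

def unpack_params(payload):
    """Parse a decoding-parameters message back into two integers."""
    return struct.unpack(PARAMS_FMT, payload)
```

Since such a message is only a few bytes, sending it over the per-pair TCP control connection is cheap and reliable.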
Fig. 2 illustrates the workflow in LTTP. First, the client
establishes control channels (TCP connections) to all the
servers. Second, the client sends requests to all the servers
simultaneously through the control channel, asking the servers
to start sending the data. Third, upon receiving the request, each
server uses the control channel to send the decoding parameters
back to the client; meanwhile, it starts to employ LT code
to produce and send encoding packets continually. TFRC
is used by both the servers and the client to control the sending
rate. Finally, as soon as the original data are successfully
restored, the client sends a terminating signal through the
control channel to the corresponding server, which informs the
server to stop encoding.
In our implementation, the upper applications on both the
server side and the client side are responsible for making