Enhancing TCP Incast Congestion Control
Over Large-scale Datacenter Networks
Lei Xu¹, Ke Xu¹, Yong Jiang², Fengyuan Ren³, Haiyang Wang⁴
l-xu12@mails.tsinghua.edu.cn, xuke@mail.tsinghua.edu.cn, jiangy@sz.tsinghua.edu.cn,
renfy@mail.tsinghua.edu.cn, haiyang@d.umn.edu
¹ Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
² Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong 518055, China
³ Department of Computer Science & Technology, Tsinghua University, Beijing 100084, China
⁴ Department of Computer Science, University of Minnesota Duluth, MN, USA
Abstract—The many-to-one traffic pattern in datacenter networks
causes the Incast congestion problem for the Transmission Control
Protocol (TCP) and puts unprecedented pressure on cloud service
providers. To address heavy Incast, we present Receiver-oriented
Datacenter TCP (RDTCP). The proposal is motivated by the oscillatory
queue size observed when handling heavy Incast traffic and by the
substantial potential of the receiver in congestion control. RDTCP
adopts both open- and closed-loop congestion control. We provide a
systematic discussion of its design issues and implement a prototype
to examine its performance. The evaluation results indicate that,
under increasingly heavy Incast, RDTCP reduces the mean queue size by
47.5% and the 99th-percentile latency by 51.2% on average relative to
TCP, and by 43.6% and 11.7%, respectively, relative to Incast
congestion Control for TCP (ICTCP).
I. INTRODUCTION
In high-bandwidth, low-latency cloud datacenter networks, the
many-to-one traffic pattern is known to cause the severe TCP Incast
congestion problem [8], [9]. In the partition/aggregate architecture,
a user request is farmed out among many worker nodes, which send
their results back almost simultaneously to the aggregators that are
responsible for combining the data and returning the final results to
the user.
When many senders transmit data simultaneously to the same receiver,
the output queue at the last-hop switch can overflow, causing Incast
[10], [11]. Under Incast, some flows experience severe packet drops
and long Flow Completion Times (FCT) [12]. Worse, Incast creates
long-latency flows that miss strict deadlines, leaving users with
poor-quality service and enterprises with revenue loss [6], [7].
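The overflow described above can be illustrated with a deliberately
simplified sketch. It is not the model used in this paper: it assumes
every sender bursts one full window at the same instant and that the
switch drains nothing during the burst. The function name, the window
of 10 packets, and the buffer of 128 packets are hypothetical values
chosen only for illustration.

```python
def incast_drops(num_senders, window_pkts, buffer_pkts):
    """Toy estimate of packets dropped at the last-hop output queue.

    Assumption (not from the paper): all senders burst one window
    simultaneously and the queue does not drain during the burst.
    """
    arriving = num_senders * window_pkts
    return max(0, arriving - buffer_pkts)


if __name__ == "__main__":
    # Hypothetical parameters: 10-packet windows, 128-packet buffer.
    for senders in (8, 16, 64):
        print(senders, incast_drops(senders, 10, 128))
```

Even in this crude setting, the number of dropped packets grows
linearly with the number of concurrent senders once the aggregate
burst exceeds the buffer, which is why higher in-degree Incast is
harder to handle.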
Our aim is to handle numerous concurrent flows under the many-to-one
traffic pattern and diverse workloads, and to meet the challenges
posed by upcoming datacenter networks of unprecedented scale.
In this paper, we take an initial step towards understanding the
performance of existing datacenter-related TCP designs over
large-scale datacenter networks. Our experiments indicate that
typical TCP variants fail to cope with the heavier Incast that arises
as datacenters scale up. Additionally, we observe that the receiver
has advantages in congestion control.
To address higher in-degree Incast, we present Receiver-oriented
Datacenter TCP (RDTCP), a protocol that allows the receiver to
dominate congestion control. In addition to receiver-dominant control
of the congestion window, RDTCP combines open-loop congestion
control, i.e., a centralized scheduler, with closed-loop congestion
control, i.e., ECN, to respond to congestion.¹
In this paper, we present RDTCP and implement it in ns3 based on
TCP-NewReno [1]. Our contributions are as follows. First, we identify
the pitfalls of TCP-related Incast congestion control in large-scale
datacenter networks and tease out the factors that cause transport
deficiency and performance collapse. Second, we propose a hybrid
framework that seamlessly integrates the centralized scheduler and
ECN at the receiver, and that proves effective in addressing
large-scale Incast.²
The rest of the paper is organized as follows. Section II reviews
related work. Section III describes the motivations for RDTCP.
Section IV focuses on the centralized scheduler and ECN mechanisms of
RDTCP. Section V evaluates the performance of RDTCP under heavy
Incast and other scenarios. Finally, Section VI concludes the paper.
¹ Congestion Experienced (CE) bits in IP headers and ECN-Echo (ECE)
bits in TCP headers are used to convey congestion information in ECN
packets. We use the term ECN packets to refer to packets that are
marked either with the ECE code point in TCP headers or with the CE
code point in IP headers.
² Our RDTCP code for ns3 can be obtained with the command “svn
checkout http://2013rdtcp.googlecode.com/svn/trunk/ 2013rdtcp-read-only”.