54 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 32, NO. 1, JANUARY 2014
applies a TCP-friendly mechanism for congestion control. We
choose digital fountain code [15] (specifically, LT code [12])
to achieve reliable data delivery, and adopt TFRC [14] to
adjust the data sending rates and maintain reasonable band-
width utilization. We explain below why we choose these
technologies to design LTTP.
There is a tradeoff between bandwidth overhead (redun-
dancy) and performance when choosing the coding scheme.
For example, some erasure codes, such as Reed-Solomon
erasure codes [16], have very low redundancy and can restore
the original data from any set of encoding data whose size
equals that of the original data; but the time required for
encoding and decoding is prohibitive for real-time transmission,
particularly considering that link speeds in data centers reach
1 Gbps or even 10 Gbps.
We choose digital fountain code as the coding scheme.
Firstly, digital fountain code can restore original data from
the encoding data whose size is marginally larger than that of
the original data, introducing reasonable bandwidth overhead.
As we will show later in simulations, the gain we get from
keeping network goodput high significantly outweighs the
bandwidth cost we pay. Secondly, digital fountain code can
provide good performance in encoding and decoding. Thirdly,
digital fountain code only cares about how much encoding
data has been received (enough to restore the original data),
rather than which encoding data are received. Thus, out-of-order
delivery is no longer an issue. As a consequence, data
loss caused by switch buffer overflow does not have an obvious
negative effect on network throughput.
There are different implementations of digital fountain code,
such as LT code [12] and Raptor code [17]. Although Raptor
code outperforms LT code, we still choose LT code to realize
digital fountain code in LTTP. This is because, when the data
size is small, the performance difference between LT code and
Raptor code is negligible [18]; in the TCP Incast scenario, the
data exchanged between client and server are usually small.
In addition, Raptor code needs to generate intermediate
symbols first, and then uses these intermediate symbols as
the input of the LT encoding algorithm to produce the
encoding data. Therefore, the implementation complexity of
Raptor code is much higher than that of LT code.
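To make the encoding and decoding processes concrete, the following Python sketch shows the core of LT coding: each encoding symbol is the XOR of a randomly chosen set of source blocks, and the decoder recovers blocks by repeatedly "peeling" symbols with exactly one unknown neighbor. The toy degree distribution here is ours for illustration only; a real implementation uses the robust soliton distribution [12].

```python
import random

def lt_encode_symbol(blocks, rng):
    """Produce one LT encoding symbol: sample a degree, pick that many
    source blocks uniformly at random, and XOR them together.
    NOTE: the degree distribution below is a toy stand-in for the
    robust soliton distribution used by real LT codes."""
    if rng.random() < 0.1:
        d = 1
    else:
        d = rng.randint(2, len(blocks))
    neighbors = rng.sample(range(len(blocks)), d)
    sym = 0
    for i in neighbors:
        sym ^= blocks[i]
    return neighbors, sym

def lt_decode(k, received):
    """Peeling decoder: repeatedly find a symbol with exactly one
    unresolved neighbor, recover that block by XOR-ing out the known
    ones, and continue until all k source blocks are recovered."""
    recovered = {}
    symbols = [(set(n), s) for n, s in received]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for nbrs, val in symbols:
            unknown = nbrs - recovered.keys()
            if len(unknown) == 1:
                i = unknown.pop()
                v = val
                for j in nbrs - {i}:
                    v ^= recovered[j]
                recovered[i] = v
                progress = True
    # None signals that not enough useful symbols arrived yet
    return recovered if len(recovered) == k else None
```

Because the decoder only needs the set of received symbols, not their arrival order, this sketch also illustrates why out-of-order delivery is harmless to fountain-coded transport.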
As for congestion control, recent work [19] shows that
when all the senders adopt digital fountain based protocols
and act as selfish players to inject data into the network as fast
as they can, a Nash equilibrium can be reached eventually. At
this equilibrium, the throughput of each flow is similar
to that when all the senders use TCP. However, in the typical
many-to-one communication pattern where TCP Incast occurs,
the transferred data volume is very small, and it is highly
probable that the Nash equilibrium cannot be reached before
all the data have been transferred. Therefore, we still have to
make an extra effort to handle congestion control in LTTP.
We simply resort to the existing TCP-friendly mechanisms,
a good summary of which is presented in [20]. Generally,
they can be classified into rate-based schemes and window-
based schemes. To achieve TCP friendliness, rate-based
schemes adjust the sending rate based on feedback from the
receiver, while window-based schemes adopt a window
(similar to the congestion window in TCP) at the sender or
receiver. However, window-based schemes may lead to
a typical sawtooth pattern in throughput. We therefore choose
TFRC [14], a rate-based scheme that provides a smoother
sending rate, as the congestion control mechanism in LTTP.
The analysis in Section IV will show the advantage of rate-
based congestion control in improving bandwidth utilization
for many-to-one communication.
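TFRC derives the allowed sending rate from the TCP throughput equation, using the segment size, round-trip time, and measured loss event rate. A minimal sketch in Python follows; the equation is the one TFRC uses (RFC 5348), while the function name and the t_RTO = 4*RTT approximation are illustrative choices on our part.

```python
from math import sqrt

def tfrc_rate(s, rtt, p, t_rto=None, b=1):
    """TCP throughput equation used by TFRC (RFC 5348).
    s     - segment size in bytes
    rtt   - round-trip time in seconds
    p     - loss event rate (0 < p <= 1)
    t_rto - retransmission timeout; commonly approximated as 4*rtt
    b     - packets acknowledged by a single ACK
    Returns the allowed sending rate in bytes per second."""
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * sqrt(2 * b * p / 3)
             + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

Because the rate is a smooth function of the loss event rate rather than a window halved on each loss, the sender's rate avoids the sawtooth pattern of window-based schemes.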
We emphasize that LT code can restore the original data
from any set of encoding data, as long as the number of
packet losses/errors falls within a reasonable range. Under
severe congestion, however, packet losses may increase,
and the receiver may not be able to restore the original data
within a reasonable time. In extreme cases, the receiver may
never receive enough encoding data to restore the original data
successfully. In that case, both the encoding process at the sender
side and the decoding process at the receiver side enter an endless
loop: the encoding process does not receive the terminating
signal and keeps producing and sending encoding data, while
the decoding process keeps waiting for more encoding data
to restore the original data. This situation can be fixed by
setting a timer at the sender: if the sender does not receive the
terminating signal before the timer expires, it terminates the
communication actively.
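The sender-side safeguard can be sketched as a simple loop with a deadline. Here `send_symbol` and `poll_terminate` are hypothetical callbacks standing in for the actual LT encoder and the control channel; the timeout value is likewise illustrative.

```python
import time

def sender_loop(send_symbol, poll_terminate, timeout_s=5.0):
    """Keep producing and sending encoding packets until the client's
    terminating signal arrives; give up when the timer expires so the
    encoder cannot loop forever under severe packet loss.
    send_symbol and poll_terminate are hypothetical callbacks."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if poll_terminate():       # terminating signal received
            return "terminated"
        send_symbol()              # emit one more LT encoding packet
    return "timed_out"             # abort the transfer actively
```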
B. Overall Framework
The complete framework of LTTP to support many-to-one
communication in data centers includes two parts, i.e., the data
channel from each server to the client, and the control channel
between the client and each server. In the data channel, we
improve LT code for reliable data transport, and adopt TFRC
for controlling the traffic sending rate at servers. The control
channel is employed by the client to issue data requests to
servers and send terminating signals to the servers as soon as
the requested data have been restored. The servers also use
the control channel to send decoding parameters to the client.
The decoding parameters include the original data size and the
block size (the block size is defined in Section III-C), which
are used by the client to execute the decoding process (we will
discuss the decoding process in Section III-C). For the control
channel messages, the data size is small enough to be put into
a single packet. Hence, it is unnecessary to employ coding
for transmission. Instead, we establish a TCP connection for
each client-server pair to deliver the control channel messages
reliably.
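As an illustration of why the decoding parameters fit in a single packet, they could be serialized as two fixed-width integers. The wire layout below (two 32-bit unsigned big-endian fields) is our assumption for the sketch, not a format specified by LTTP.

```python
import struct

# Hypothetical wire layout for the decoding-parameters message:
# original data size and block size, each a 32-bit unsigned
# big-endian integer, i.e. an 8-byte payload in total.
PARAMS_FMT = "!II"

def pack_params(data_size, block_size):
    """Serialize the decoding parameters for the control channel."""
    return struct.pack(PARAMS_FMT, data_size, block_size)

def unpack_params(payload):
    """Parse a decoding-parameters message back into two integers."""
    return struct.unpack(PARAMS_FMT, payload)
```

Since such a message is only a few bytes, sending it over the per-pair TCP control connection is cheap and reliable.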
Fig. 2 illustrates the workflow in LTTP. First, the client
establishes control channels (TCP connections) to all the
servers. Second, the client sends requests to all the servers
simultaneously through the control channel, asking the servers
to start sending the data. Third, upon receiving the request, each
server uses the control channel to send the decoding parameters
back to the client; meanwhile, it starts to employ LT code
to produce and send encoding packets continually. TFRC
is used by both the servers and the client to control the sending
rate. Finally, as soon as the original data are successfully
restored, the client sends a terminating signal through the
control channel to the corresponding server, which informs the
server to stop encoding.
In our implementation, the upper applications on both the
server side and the client side are responsible for making