
Figure 2: PlanetLab nodes used for DNS lookups
nodes, which are spread across North America, South America,
Europe, Asia, and Australia. The queries were performed March
27-29, 2013. These distributed DNS queries help ensure that we
gather a comprehensive set of DNS records for each cloud-using
subdomain and capture any geo-location-specific cloud usage.
We refer to the list of cloud-using subdomains, and their associ-
ated DNS records, as the Alexa subdomains dataset.
Packet Capture Dataset. Our second primary dataset is a series
of packet traces captured at the border of the University of
Wisconsin-Madison campus network.³ We captured full IP packets
whose source or destination IP address fell within the public
address ranges published by EC2 and Azure. The capture was
performed from Tuesday, June 26 to Monday, July 2, 2012, giving
us a full week of traffic and a total of 1.4TB of data. The total
Internet traffic averaged approximately 7Gbps during the capture,
with about 1% of the traffic going to/coming from EC2 or Azure.
Due to the relatively low rate of traffic being captured, no loss
occurred during the capture process (according to tcpdump and
counters reported by the border router). To protect user privacy, we
anonymized the IP addresses of clients within the university
network, and we only report aggregate statistics.
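The selection rule for captured packets (source or destination address inside a published cloud range) can be illustrated with a minimal sketch. The text does not describe how the capture filter was implemented, so this only shows the matching logic. The prefixes below are placeholders taken from the RFC 5737 documentation ranges, not actual EC2 or Azure prefixes, and the names `CLOUD_RANGES`, `cloud_of`, and `keep_packet` are hypothetical.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 documentation ranges). The real lists
# would come from the address ranges published by each provider.
CLOUD_RANGES = {
    "EC2": ["198.51.100.0/24"],
    "Azure": ["203.0.113.0/24"],
}

# Parse the CIDR strings once so each lookup is a simple containment test.
NETWORKS = {
    cloud: [ipaddress.ip_network(cidr) for cidr in cidrs]
    for cloud, cidrs in CLOUD_RANGES.items()
}


def cloud_of(ip):
    """Return the cloud whose ranges contain ip, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for cloud, networks in NETWORKS.items():
        if any(addr in net for net in networks):
            return cloud
    return None


def keep_packet(src_ip, dst_ip):
    """Keep a packet if either endpoint falls within a cloud range."""
    return cloud_of(src_ip) is not None or cloud_of(dst_ip) is not None
```

A linear scan over prefixes is adequate for a sketch; a capture running at line rate would more plausibly use a prefix trie or a filter compiled into the capture tool.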
Since our traces contain full packets, we were able to perform an
in-depth analysis of network and transport layer information (e.g.,
IP addresses, protocols, ports), application layer information (e.g.,
HTTP hostnames, HTTP content-type, HTTPS certificates), and
packet payloads. We extracted relevant information from the traces
using Bro [33], a network monitoring and traffic analysis tool. We
refer to these traces as the packet capture dataset.
We recognize that a packet trace from a single campus vantage
point may not reflect the “typical” usage patterns of services
deployed in IaaS clouds. Correspondingly, we only leverage the packet
capture for analyses that cannot be conducted using our Alexa
subdomains dataset, namely protocol usage (§3.1), popularity
estimates based on traffic volume and flow counts (§3.2), and flow
characteristics (§3.3).
3. WEB-FACING CLOUD TENANTS
In this section, we explore what applications are being hosted
on public IaaS clouds. We start by analyzing the packet capture
to identify the types of applications being hosted. This analysis
suggests (unsurprisingly) that web applications represent a large,
important set of cloud tenants. We then turn to examining which
of the most popular websites are using clouds. We view popularity
both globally, via the Alexa top website rankings, and locally, via
the volume of traffic associated with each domain in the packet
capture. We also analyze the traffic patterns of cloud-using services,
including flow characteristics and content types served.
³ The university has seven /24 IP blocks and one /16 IP block.
Cloud   Bytes  Flows
EC2     81.73  80.70
Azure   18.27  19.30
Total     100    100
Table 1: Percent of traffic volume and percent of flows associated
with each cloud in the packet capture.
                  EC2          Azure         Overall
Protocol       Bytes  Flows  Bytes  Flows  Bytes  Flows
ICMP            0.01   0.03   0.01   0.18   0.01   0.06
HTTP (TCP)     16.26  70.45  59.97  65.41  24.24  69.48
HTTPS (TCP)    80.90   6.52  37.20   6.92  72.94   6.60
DNS (UDP)       0.11  10.33   0.10  11.59   0.11  10.58
Other (TCP)     2.40   0.40   2.41   1.10   2.40   0.60
Other (UDP)     0.28   0.19   0.31  14.77   0.28   3.00
Total            100    100    100    100    100    100
Table 2: Percent of traffic volume and percent of flows associated
with each protocol in the packet capture.
3.1 Protocols and Services
We first examine the fraction of bytes and flows in the packet
capture that are associated with each cloud (Table 1). We only
consider flows that were initiated within the university and destined
for EC2 or Azure. We observe that the majority of cloud traffic,
both as measured by volume and number of flows, is EC2-related:
81.73% of bytes (80.70% of flows) are associated with EC2, while
Azure accounts for 18.27% of bytes (19.30% of flows).
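The per-cloud shares in Table 1 amount to a simple aggregation over flow records. The following is a minimal sketch, assuming each flow has already been reduced to a (cloud, initiated_inside, num_bytes) tuple; that record layout and the function name `cloud_shares` are hypothetical and do not reflect the format produced by the paper's tooling.

```python
from collections import defaultdict


def cloud_shares(flows):
    """Return {cloud: (percent_of_bytes, percent_of_flows)}.

    flows: iterable of (cloud, initiated_inside, num_bytes) tuples, where
    initiated_inside is True for flows opened by a university host.
    """
    byte_totals = defaultdict(int)
    flow_totals = defaultdict(int)
    for cloud, initiated_inside, num_bytes in flows:
        if not initiated_inside:
            continue  # only flows initiated within the university are counted
        byte_totals[cloud] += num_bytes
        flow_totals[cloud] += 1

    all_bytes = sum(byte_totals.values())
    all_flows = sum(flow_totals.values())
    return {
        cloud: (
            100.0 * byte_totals[cloud] / all_bytes if all_bytes else 0.0,
            100.0 * flow_totals[cloud] / all_flows,
        )
        for cloud in flow_totals
    }
```

Because the two percentages are computed over different denominators (total bytes versus total flows), a cloud's byte share and flow share need not match, as the small gap between 81.73% and 80.70% for EC2 shows.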
Next, we use the packet capture to study the application-layer
protocols used by cloud tenants. Table 2 shows the percentage of
bytes (and flows) using a specific protocol relative to the total num-
ber of bytes (and flows) for EC2, Azure, and the capture as a whole.
We observe that more than 99% of bytes in the packet capture are
sent and received using TCP, with less than 1% of bytes associated
with UDP or ICMP. The vast majority of this TCP traffic is HTTP
and HTTPS. The proportion of HTTPS traffic is far higher than
that seen for general web services in the past (roughly 6% [18]);
as we will show later, HTTPS traffic is dominated by cloud stor-
age services. Interestingly, the majority of Azure’s TCP traffic is
HTTP (59.97%) while the majority of EC2’s TCP traffic is HTTPS
(80.90%).
The breakdown by flow count is less skewed towards TCP, with
UDP flows accounting for 14% of flows in the packet capture. This
is largely due to DNS queries, which account for 11% of flows but
carry few bytes.
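The Overall columns of Table 2 should equal the per-cloud columns weighted by each cloud's share from Table 1. The sketch below checks this using only the published values; the names `CLOUD_SHARE`, `TABLE2`, and `overall_from_clouds` are introduced here for illustration. Since the inputs are rounded to two decimals, agreement is expected only to within roughly 0.1 percentage points.

```python
# (bytes, flows) share of each cloud in percent, from Table 1.
CLOUD_SHARE = {"EC2": (81.73, 80.70), "Azure": (18.27, 19.30)}

# (bytes, flows) percentages per protocol, from Table 2.
TABLE2 = {
    "ICMP":        {"EC2": (0.01, 0.03),   "Azure": (0.01, 0.18),   "Overall": (0.01, 0.06)},
    "HTTP (TCP)":  {"EC2": (16.26, 70.45), "Azure": (59.97, 65.41), "Overall": (24.24, 69.48)},
    "HTTPS (TCP)": {"EC2": (80.90, 6.52),  "Azure": (37.20, 6.92),  "Overall": (72.94, 6.60)},
    "DNS (UDP)":   {"EC2": (0.11, 10.33),  "Azure": (0.10, 11.59),  "Overall": (0.11, 10.58)},
    "Other (TCP)": {"EC2": (2.40, 0.40),   "Azure": (2.41, 1.10),   "Overall": (2.40, 0.60)},
    "Other (UDP)": {"EC2": (0.28, 0.19),   "Azure": (0.31, 14.77),  "Overall": (0.28, 3.00)},
}


def overall_from_clouds(protocol):
    """Recompute the Overall (bytes, flows) percentages for one protocol."""
    result = []
    for i in range(2):  # index 0 = bytes, index 1 = flows
        total = sum(
            CLOUD_SHARE[cloud][i] / 100.0 * TABLE2[protocol][cloud][i]
            for cloud in CLOUD_SHARE
        )
        result.append(total)
    return tuple(result)


# Share of flows carried over UDP in the whole capture (DNS + other UDP).
udp_flow_share = TABLE2["DNS (UDP)"]["Overall"][1] + TABLE2["Other (UDP)"]["Overall"][1]
```

For example, HTTP bytes recompute as 0.8173 × 16.26 + 0.1827 × 59.97 ≈ 24.25, in line with the 24.24 reported, and the UDP flow share sums to 13.58, which the text rounds to 14%.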
As one would expect, public IaaS clouds are also used for non-
web-based services. In the packet capture, we find a small fraction
of non-HTTP(S) TCP traffic and non-DNS UDP traffic going to
both EC2 and Azure. This traffic includes SMTP, FTP, IPv6-in-
IPv4, SSH, IRC, and other traffic that Bro could not classify.
Summary and implications. While we analyze a single vantage
point, our measurements suggest that web services using HTTP(S)
represent an important set of WAN-intensive cloud tenants. The
extent to which compute-intensive workloads (that may not result
in a large impact on network traffic) are prevalent as cloud tenants
remains an interesting open question. In the following sections we
examine which tenants host web services on public clouds and
dive deeper into their traffic patterns.
3.2 Popular Cloud-Using (Sub)Domains
Cloud-using Alexa domains. We now consider what subset of the
Alexa top 1 million websites use the cloud to (partly) host their
services. Recall that Alexa provides an estimate of the most pop-