J. Parallel Distrib. Comput. 119 (2018) 162–171
A data-driven approach of performance evaluation for cache server
groups in content delivery network
Ziyan Wu (a), Zhihui Lu (a,*), Wei Zhang (a), Jie Wu (a), Shalin Huang (b), Patrick C.K. Hung (c)
(a) School of Computer Science, Fudan University, Shanghai 200433, China
(b) Wangsu Science & Technology Co., Ltd., Shanghai, China
(c) Faculty of Business and IT, University of Ontario Institute of Technology, Canada
Highlights
• We frame the CDN performance evaluation problem as a sequence learning problem.
• We use representation learning with an LSTM auto-encoder to extract useful features from CDN monitoring log data.
• We use a deep neural network to predict the reach rate of the CDN service, and empirical studies show that our method outperforms state-of-the-art methods.
Article info
Article history:
Received 31 January 2018
Received in revised form 4 April 2018
Accepted 16 April 2018
Available online 27 April 2018
Keywords:
Edge computing
Deep learning
Content delivery network
Sequence learning
Predictive analysis
High dimensional data
Abstract
In industry, Content Delivery Network (CDN) service providers are increasingly using data-driven
mechanisms to build performance models of their service-providing systems. Building a model that
accurately describes the performance of the existing infrastructure is crucial for making resource
management decisions. Conventional approaches that use hand-tuned parameters or linear models have
their drawbacks. Recently, the data-driven paradigm has been shown to greatly outperform traditional
methods in modeling complex systems. We design a data-driven approach to building a reasonable and
feasible performance model for CDN cache server groups. We use a deep LSTM auto-encoder to capture
the temporal structures in the high-dimensional monitoring data, and a deep neural network to predict
the reach rate, which is a client QoS measurement from the CDN service providers’ perspective. The
experimental results show that our model outperforms state-of-the-art models.
© 2018 Elsevier Inc. All rights reserved.
1. Introduction
There is a trend [7,11,13,18,20] in both academia and industry
toward using data-driven methods to model complex networked systems.
Traditional approaches typically use simple heuristics. These
methods have several drawbacks: lacking knowledge of the real-world
environment, they cannot accurately reflect complex systems.
Driven by the opportunity to collect and analyze data (e.g.,
application quality measurements from end users), many recent
proposals have demonstrated the promise of using deep learning to
characterize and optimize networked systems. Drawing a parallel
with the success of deep learning in pattern recognition, we use
deep learning methods and treat networked systems as a black box,
instead of using an empirical analytical model to describe the
complex interactions among different features.
* Corresponding author. E-mail address: lzh@fudan.edu.cn (Z. Lu).
Uploading all data or deploying all applications to a centralized
cloud is infeasible because of the excessive latency and limited
bandwidth of the Internet. A promising approach to addressing the
centralized cloud bottleneck is edge computing. Edge computing
pushes applications, data and computing power (services) away
from centralized points to the logical extremes of a network. Edge
computing replicates fragments of information across distributed
networks of web servers, which may be spread over a vast area.
As a technological paradigm, edge computing is also referred to
as mesh computing, peer-to-peer computing, autonomic (self-healing)
computing, grid computing, and by other names implying
non-centralized, nodeless availability [5]. A CDN (content delivery
network or content distribution network) is a typical representative
of edge computing. A CDN is a globally distributed networked
system deployed across the edge of the Internet. Composed of
geographically distributed cache servers, CDNs deliver cached
content to customers worldwide based on their geographic locations.
Extensively using cache servers, content delivery over CDN has
https://doi.org/10.1016/j.jpdc.2018.04.010
0743-7315/© 2018 Elsevier Inc. All rights reserved.