Manifold Structure Analysis of Internet Traffic Matrix Based on
E-Isomap
Shi Hao
1
, Yin Baoqun
1
, QianYekui
2
, Lei Yingke
3
1. Department of Automation, University of Science and Technology of China, Anhui Hefei 230027, P.R. China
E-mail: haoshi@mail.ustc.edu.cn, bqyin@ustc.edu.cn
2. Department of Missile, Air Defense Forces Academy of PLA, Henan Zhengzhou 450052, P.R. China
E-mail: qyk1129@163.com
3. Electronic Engineering Institute of PLA, Anhui Hefei 230037, P.R. China
E-mail: leiyingke@163.com
Abstract: With the advent of big data, the Internet data traffic has taken over people’s lives. How to analyze these mass data
has become an imperative problem to be solved recently. Traffic matrix has been a useful traffic model for addressing a variety
of problems from the whole-network perspective. The principal challenge presented by Origin-Destination (OD) traffic matrix
is that OD traffic matrix usually forms a high dimensional multivariate structure. In this paper we introduce an improved
Isomap algorithm, Efficient-Isomap (E-Isomap), which is a much more efficient nonlinear dimension reduction tool than the
classic Isomap, then we apply E-Isomap to real OD traffic matrix taken from the backbone network (Abilene). The simulation
results show that the high-dimensional OD traffic matrix has a small intrinsic dimension and there indeed exists a
low-dimensional manifold structure. The computing analysis and residual variance analysis indicate that E-Isomap is capable
of finding the manifold structure of the high-dimensional OD traffic matrix more efficiently in keeping accuracy constant.
Key Words: traffic matrix, manifold structure, E-Isomap
1 Introduction
A lot of effort has been put into the study of traffic on a
single link in isolation by far [1, 2]. However, a wide
variety of significant problems confronted by researchers
nowadays ask for modeling and analysis of network traffic
on all links simultaneously, including anomaly detection,
traffic engineering, traffic forecasting and capacity
planning [3-6]. Therefore, traffic matrix was proposed in [7]
and researches on network traffic’s temporal-spatial
coherence have made some progress so far. In [8], Lakhina
et al used dimensionality analysis to interpret the linear
structure of the traffic matrix and they applied Principle
Component Analysis (PCA) to decompose the structure of
OD traffic matrix into three main constituents. But PCA is
a linear dimensionality reduction technique, seeking an
alternate linear lower-dimensional approximation to the
given high-dimensional structure. Whereas, whether there
exists intrinsically a more intricate underlying nonlinear
structure for the traffic matrix is a problem that needs
further study.
In this paper, we use an improved Isomap algorithm,
E-Isomap to analyze the OD traffic matrix’s nonlinear
manifold structure. Classic Isomap algorithm is a
frequently-used nonlinear dimensionality reduction method.
However, Isomap suffers from computational complexity
issues when the number of the sample points is too large.
In order to overcome this problem, Tenenbaum et al.
proposed a modification on Isomap which is called
*
This paper is supported in part by the National Science
Foundation of China under grant Nos. 61174124, 61233003, in
part by Research Fund for the Doctoral Program of Higher
Education of China under grant No.20123402110029 and in part
by Natural Science Program of the Anhui High Education Bureau
of China under grant No.KJ2012A286.
Landmark-Isomap (L-Isomap). Unfortunately, L-Isomap is
not stable enough because of the randomness of the
landmark point selection. To address this issue, we use an
extended minimum subset cover (MSC) strategy instead, to
make sure that the selected landmark points are distributed
evenly across the sampling points. We call this evolved
algorithm, E-Isomap. Then a series of experiments verify
the effectiveness and feasibility of E-Isomap.
Finally, we apply E-Isomap to the OD traffic matrix data
set taken from Abilene. It is found that the 144-dimensional
traffic matrix has a small intrinsic dimension. From the
embedding results, we can see that there indeed exists an
underlying nonlinear structure hidden behind the
high-dimensional observation traffic matrix data.
This paper is organized as follows. We begin in Section
2 with an introduction of the OD traffic matrix modeling.
In Section 3, we discuss the extended MSC strategy and
introduce the E-Isomap in detail. Additionally, we compare
E-Isomap’s complexity with classic Isomap’s by applying
them to artificial data sets. The simulation results on traffic
matrix data sets are presented in Section 4. We investigate
the traffic matrices’ intrinsic dimensionality and its
manifold structure. Finally, concluding remarks are
presented in Section 5.
2 OD Traffic Matrix Modeling
In this section, formal approach to OD traffic matrix is
defined, which also facilitates discussion in the subsequent
sections.
An OD flow comprises all traffic originating from a
given source and delivering to a given destination. Let
n
denote the number of all sources and destinations in a
network. A traffic matrix is naturally represented by a three
dimensional, nonnegative hyper-matrix
tX
, with
,ij th
entry
,
()
ij
Xt
. Each entry represents the traffic
Proceedings of the 33rd Chinese Control Conference
Jul
28-30, 2014, Nan
in
, China
5498