A Landmark Selection Method for L-Isomap Based on Greedy
Algorithm and its Application
Hao Shi, Baoqun Yin, Xiaofeng Zhang, Yu Kang and Yingke Lei
Abstract— Isometric feature mapping (Isomap) is a widely-
used nonlinear dimensionality reduction method, but it suffers
from high computational complexity. L-Isomap is a variant of
Isomap which is faster than Isomap. In this algorithm, a subset
of points are chosen out of the total data points as landmark
points so as to simplify the embedding computation. In this
paper, we propose a novel landmark selection method for L-
Isomap based on a greedy algorithm. Experiments performed
on synthetic and physical data sets validate the effectiveness
of the proposed method. Internet traffic matrix has been an
effective model to analyzing the Internet. However, the Internet
traffic matrix data usually possesses high dimensionality. In this
paper, we apply the improved L-Isomap to the real Internet
traffic matrix data to investigate its low-dimensional features.
The experiment results show that the Internet traffic matrix
has a small intrinsic dimension and there indeed exists a low-
dimensional manifold structure.
I. INTRODUCTION
In the last few decades, dimensionality reduction (DR)
has gradually become a research hotspot, since most emerg-
ing applications are concerned with the high-dimensional
attributes [1-3]. Traditionally, DR was performed using linear
methods such as Principal Component Analysis (PCA) [4]
and Metric Multidimensional Scaling (MDS) [5]. However,
these linear techniques fail to deal with data with nonlinear
structures. Therefore, a vast number of nonlinear techniques
have drawn great interest. In contrast to the linear meth-
ods, the nonlinear techniques are capable of handling the
complex nonlinear data. Representative nonlinear methods
include Isomap [6], Locally Linear Embedding (LLE) [7]
and local tangent space alignment (LTSA) [8]. Of the list-
ed algorithms, Isomap is representative of global methods,
attempting to preserve intrinsic global properties of high-
dimensional observation data. Isomap has been applied in
This work is supported in part by the National Natural Science Foun-
dation of China under grant Nos. 61174124,61233003,61272333, in part
by Research Fund for the Doctoral Program of Higher Education of China
under grant No. 20123402110029 and in part by Natural Science Research
Program of the Anhui High Education Bureau of China under grant No.
KJ2012A286.
Hao Shi is with Department of Automation, University
of Science and Technology of China, 230027, Hefei, China
haoshi@mail.ustc.edu.cn
Baoqun Yin is with Department of Automation, University of Science and
Technology of China, 230027, Hefei, China bqyin@ustc.edu.cn
Xiaofeng Zhang is with Department of Automation, Universi-
ty of Science and Technology of China, 230027, Hefei, China
zhxf325@mail.ustc.edu.cn
Yu Kang is with Department of Automation, University
of Science and Technology of China, 230027, Hefei, China
kangduyu@ustc.edu.cn
Yingke lei is Electronic Engineering Institute, 230027, Hefei, China
leiyingke@163.com
many different domains, such as stellar spectra [9], the
detection of collective behaviors in animal species [10-11],
protein interaction prediction [12] and damage localization
[13]. Isomap incorporates geodesic distance between every
pair of N items on a weighted neighborhood graph with
the MDS and produces the low-dimensional embedding by
the eigen-decomposition of the N × N geodesic distance
matrix. When the number of samples increases, Isomap
has two computational bottlenecks: geodesic distance matrix
computation and MDS eigenvalue calculation [6]. To over-
come the computational limitations, Silva et al. proposed
L-Isomap in [14]. L-Isomap only compute the geodesic
distance matrix D
n,N
between n landmark points and N
data points instead of between all pairwise data points. MDS
is then applied to matrix D
n,N
and obtains the embedding
of the landmarks. The non-landmark points are projected
through the psedo inverse transformation of the landmark
coordinates. In this way, L-Isomap overcomes those two
inefficiencies of Isomap.
Landmark selection for L-Isomap is still an open question,
since the number and the distribution of the landmark points
can be arbitrary. As pointed out in [14], more landmarks
mean more stability. But too many landmarks will make
L-Isomap loss of its value. Meanwhile, poorly distributed
landmarks will distort the embedding result. So far, several
landmark selection approaches have been proposed. In [15],
landmarks were selected adopting the “minimum spanning
tree cut” method, which prefers to choose the set of boundary
samples as landmarks. The “maxmin” method, was proposed
to reduce uncertainty and improve landmarks for classifica-
tion [16, 17]. In [18], landmarks were selected based on
mixed-integer optimization.
In this paper, there are two main contributions. First, we
propose an alternative landmark selection method by find-
ing the minimum subset cover(MSC) of the neighborhood
graph. The problem is resolved by a greedy approximation
algorithm. Secondly, we apply this improved L-Isomap to
Internet traffic matrix to investigate its underlying nonlinear
structure.
The paper is organized as follows. Section II briefly
reviews the Isomap and L-Isomap. Section III gives the
landmark selection method based on a greedy algorithm.
Experiment results on synthetic and physical data sets are
given in Section IV, in order to validate the performance
of our method. Then we apply this improved L-Isomap to
Internet traffic matrix in Section V. Finally, conclusions and
future extensions are discussed in Section VI.
2015 IEEE 54th Annual Conference on Decision and Control (CDC)
December 15-18, 2015. Osaka, Japan
978-1-4799-7885-4/15/$31.00 ©2015 IEEE 7371