Greedy-Algorithm-Optimized Landmark Selection for L-Isomap and Its Application

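"A Landmark Selection Method for L-Isomap Based on a Greedy Algorithm and Its Application to Internet Traffic Matrix Analysis." Among nonlinear dimensionality reduction techniques, Isometric Feature Mapping (Isomap) is widely used, but its computational complexity is high. L-Isomap, a variant of Isomap, was introduced to improve efficiency: it selects a subset of the data points as landmarks to simplify the embedding computation, which reduces the computational load. This paper presents a new landmark selection method for L-Isomap based on a greedy algorithm and validates it experimentally.

A greedy algorithm makes the locally optimal choice at each step and builds a global solution incrementally. Applied to landmark selection for L-Isomap, this strategy finds representative landmarks so as to minimize the overall error and preserve the topological structure of the data. The proposed method processes the data points iteratively, each time selecting as a landmark the point that best represents the remaining points, until a preset number of landmarks is reached.

In the experiments, the authors compare conventional L-Isomap with the new method on synthetic and physical data sets. The results indicate that the proposed greedy algorithm selects landmarks more effectively, lowering the computational cost while preserving the quality of the dimensionality reduction.

Given the importance of Internet traffic matrix analysis, the authors also apply the improved L-Isomap to real Internet traffic matrix data. The high dimensionality of such data makes analysis difficult, and dimensionality reduction can reveal its low-dimensional characteristics. The experiments show that the Internet traffic matrix has a small intrinsic dimension and that a low-dimensional manifold structure does exist. This offers a new perspective on Internet traffic patterns and can support network management and optimization.

In short, the paper contributes a greedy landmark selection strategy for L-Isomap that improves its computational efficiency and, on a real problem, reveals the low-dimensional structure of Internet traffic matrices, which is of both theoretical and practical value for network analysis and prediction.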

A Landmark Selection Method for L-Isomap Based on Greedy Algorithm and its Application

Hao Shi, Baoqun Yin, Xiaofeng Zhang, Yu Kang and Yingke Lei

Abstract: Isometric feature mapping (Isomap) is a widely used nonlinear dimensionality reduction method, but it suffers from high computational complexity. L-Isomap is a faster variant of Isomap in which a subset of the data points is chosen as landmark points so as to simplify the embedding computation. In this paper, we propose a novel landmark selection method for L-Isomap based on a greedy algorithm. Experiments performed on synthetic and physical data sets validate the effectiveness of the proposed method. The Internet traffic matrix has been an effective model for analyzing the Internet. However, Internet traffic matrix data usually has high dimensionality. In this paper, we apply the improved L-Isomap to real Internet traffic matrix data to investigate its low-dimensional features. The experimental results show that the Internet traffic matrix has a small intrinsic dimension and that there indeed exists a low-dimensional manifold structure.

I. INTRODUCTION

In the last few decades, dimensionality reduction (DR) has gradually become a research hotspot, since most emerging applications involve high-dimensional attributes [1-3]. Traditionally, DR was performed using linear methods such as Principal Component Analysis (PCA) [4] and Metric Multidimensional Scaling (MDS) [5]. However, these linear techniques fail on data with nonlinear structure. Nonlinear techniques, which are capable of handling complex nonlinear data, have therefore drawn great interest. Representative nonlinear methods include Isomap [6], Locally Linear Embedding (LLE) [7] and Local Tangent Space Alignment (LTSA) [8]. Of the listed algorithms, Isomap is representative of global methods, attempting to preserve intrinsic global properties of high-dimensional observation data. Isomap has been applied in many different domains, such as stellar spectra [9], the detection of collective behaviors in animal species [10-11], protein interaction prediction [12] and damage localization [13]. Isomap combines the geodesic distances between every pair of N items on a weighted neighborhood graph with MDS and produces the low-dimensional embedding by eigen-decomposition of the N × N geodesic distance matrix. As the number of samples increases, Isomap has two computational bottlenecks: geodesic distance matrix computation and MDS eigenvalue calculation [6]. To overcome these computational limitations, Silva et al. proposed L-Isomap in [14]. L-Isomap computes only the geodesic distance matrix $D_{n,N}$ between n landmark points and N data points instead of between all pairs of data points. MDS is then applied to the matrix $D_{n,N}$ to obtain the embedding of the landmarks. The non-landmark points are projected through the pseudo-inverse transformation of the landmark coordinates. In this way, L-Isomap overcomes both inefficiencies of Isomap.

This work is supported in part by the National Natural Science Foundation of China under grant Nos. 61174124, 61233003, 61272333, in part by the Research Fund for the Doctoral Program of Higher Education of China under grant No. 20123402110029, and in part by the Natural Science Research Program of the Anhui High Education Bureau of China under grant No. KJ2012A286.

Hao Shi is with the Department of Automation, University of Science and Technology of China, 230027, Hefei, China haoshi@mail.ustc.edu.cn

Baoqun Yin is with the Department of Automation, University of Science and Technology of China, 230027, Hefei, China bqyin@ustc.edu.cn

Xiaofeng Zhang is with the Department of Automation, University of Science and Technology of China, 230027, Hefei, China zhxf325@mail.ustc.edu.cn

Yu Kang is with the Department of Automation, University of Science and Technology of China, 230027, Hefei, China kangduyu@ustc.edu.cn

Yingke Lei is with the Electronic Engineering Institute, 230027, Hefei, China leiyingke@163.com
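The distance step of L-Isomap described in the Introduction, computing geodesic distances only from the n landmarks rather than between all pairs, can be illustrated with a small sketch. It is not the authors' implementation: the helper names (`knn_graph`, `geodesic_from`, `landmark_distance_matrix`), the symmetric k-nearest-neighbor graph and the toy quarter-circle data are all assumptions made for illustration, and the MDS and pseudo-inverse projection steps are left out.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights (assumed construction)."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d  # keep the graph undirected
    return adj

def geodesic_from(adj, source):
    """Dijkstra shortest-path distances from one source node to every node."""
    dist = [math.inf] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def landmark_distance_matrix(points, landmarks, k):
    """n x N matrix of graph distances: one Dijkstra run per landmark, not per point."""
    adj = knn_graph(points, k)
    return [geodesic_from(adj, l) for l in landmarks]

# Toy data: 10 points on a quarter of the unit circle, with the two ends as landmarks.
points = [(math.cos(i * math.pi / 18), math.sin(i * math.pi / 18)) for i in range(10)]
D = landmark_distance_matrix(points, landmarks=[0, 9], k=2)
```

With n landmarks the sketch runs Dijkstra n times instead of N times. On the toy arc the graph distance between the two end points falls between the straight-line chord and the true arc length, which is the behavior a geodesic approximation should show.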

Landmark selection for L-Isomap is still an open question, since the number and the distribution of the landmark points can be arbitrary. As pointed out in [14], more landmarks mean more stability, but too many landmarks will make L-Isomap lose its value. Meanwhile, poorly distributed landmarks will distort the embedding result. So far, several landmark selection approaches have been proposed. In [15], landmarks were selected using the “minimum spanning tree cut” method, which tends to choose boundary samples as landmarks. The “maxmin” method was proposed to reduce uncertainty and improve landmarks for classification [16, 17]. In [18], landmarks were selected based on mixed-integer optimization.
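For orientation, the “maxmin” idea is commonly described as farthest-point sampling: starting from a seed, repeatedly add the point whose minimum distance to the landmarks chosen so far is largest. The sketch below follows that common description and may differ in detail from [16, 17]. The function name `maxmin_landmarks`, the fixed seed index and the use of Euclidean rather than graph distances are assumptions.

```python
import math

def maxmin_landmarks(points, n_landmarks, seed_index=0):
    """Farthest-point ("maxmin") selection using Euclidean distances (assumption)."""
    landmarks = [seed_index]
    # min_dist[i] = distance from point i to its nearest chosen landmark
    min_dist = [math.dist(p, points[seed_index]) for p in points]
    while len(landmarks) < n_landmarks:
        # pick the point that is farthest from all landmarks chosen so far
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        landmarks.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return landmarks

# Two clusters on a line: around x = 0..2 and around x = 10..11.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
chosen = maxmin_landmarks(points, 3)  # indices of the selected landmarks
```

On this toy input the second landmark is the far end of the other cluster, and the third is the point that is then least well represented, so the landmarks spread out over the data instead of clustering near the seed.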

This paper makes two main contributions. First, we propose an alternative landmark selection method by finding the minimum subset cover (MSC) of the neighborhood graph. The problem is solved by a greedy approximation algorithm. Second, we apply this improved L-Isomap to the Internet traffic matrix to investigate its underlying nonlinear structure.
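The precise formulation is given in Section III, which is not part of this excerpt. As a rough illustration only, the standard greedy approximation for set cover can be applied to a neighborhood graph as follows, under the assumption that each candidate landmark covers itself and its graph neighbors. The function name `greedy_cover_landmarks`, the tie-breaking rule and the toy path graph are hypothetical.

```python
def greedy_cover_landmarks(neighbors):
    """Greedy set cover on a neighborhood graph.

    neighbors: dict mapping each node to the set of its adjacent nodes.
    Assumption: a landmark covers itself plus its neighbors.
    """
    cover_sets = {v: {v} | set(nbrs) for v, nbrs in neighbors.items()}
    uncovered = set(neighbors)
    landmarks = []
    while uncovered:
        # choose the node covering the most still-uncovered nodes;
        # ties go to the smallest node label because the candidates are sorted
        best = max(sorted(cover_sets), key=lambda v: len(cover_sets[v] & uncovered))
        landmarks.append(best)
        uncovered -= cover_sets[best]
    return landmarks

# Toy neighborhood graph: a path 0-1-2-3-4-5.
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
landmarks = greedy_cover_landmarks(neighbors)
covered = set().union(*({v} | neighbors[v] for v in landmarks))
```

Every uncovered node covers at least itself, so each iteration makes progress and the loop terminates. The number of landmarks is an output of the procedure in this sketch rather than an input.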

The paper is organized as follows. Section II briefly reviews Isomap and L-Isomap. Section III gives the landmark selection method based on a greedy algorithm. Experimental results on synthetic and physical data sets are given in Section IV to validate the performance of our method. We then apply the improved L-Isomap to the Internet traffic matrix in Section V. Finally, conclusions and future extensions are discussed in Section VI.

2015 IEEE 54th Annual Conference on Decision and Control (CDC), December 15-18, 2015. Osaka, Japan
