A Novel Landmark Point Selection Method for L-ISOMAP: Manifold Structure Analysis of Internet Traffic Matrices

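"Manifold Structure Analysis of Internet Traffic Matrices Based on E-Isomap"

In today's information age, nonlinear dimensionality reduction (NLDR) has become a key tool for handling complex data sets, particularly in diverse research fields such as chemoinformatics, earth science, and Internet traffic analysis. E-Isomap (Extended Isometric Feature Mapping) is an extension of the Isomap algorithm. Isomap is an effective nonlinear manifold learning method that reveals the intrinsic geometric structure of high-dimensional data. It reduces dimensionality by constructing a low-dimensional embedding that preserves the distances between data points, thereby capturing the nonlinear features of the data in the low-dimensional space.

The "manifold structure analysis of Internet traffic matrices based on E-Isomap" in the description refers to applying E-Isomap to Internet traffic data in order to understand the complex structure of that data. Internet traffic data is typically high-dimensional and nonlinear, and traditional linear dimensionality reduction methods may fail to capture its underlying patterns. E-Isomap handles such data better and reveals the manifold structure hidden behind the network traffic matrix, which helps network administrators monitor network state, predict traffic changes, detect anomalous behaviour, and optimize the allocation of network resources.

The paper focuses on L-Isomap (Landmark Isometric Feature Mapping), a method that improves the computational efficiency of Isomap on large data sets. Rather than performing the expensive computations on all data points, L-Isomap selects a subset of representative points as landmark points to approximate the manifold of the whole data set. The newly proposed landmark selection method first finds a minimum cover of the neighbourhood sets to obtain landmark candidate points, then removes the candidates that lie in the neighbourhoods of other points; the remaining candidates are the selected landmarks. This reduces the amount of computation and improves the scalability of the algorithm while preserving the accuracy of the dimensionality reduction.

In the experiments, the authors validate the new method on synthetic and real physical data sets. Comparison and analysis of the results show that the method lowers computational complexity while effectively preserving the manifold structure of the data, which is significant for understanding and analysing the complexity of Internet traffic matrices. In summary, the paper explores a novel landmark point selection strategy for Internet traffic analysis which, combined with E-Isomap, offers a new route to processing and understanding large-scale nonlinear data efficiently. It has theoretical value and application prospects for network management and optimization and for network security.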

A Novel Landmark Point Selection Method for L-ISOMAP

Hao Shi, Baoqun Yin, Yizhao Bao and Yingke Lei

Abstract— Isometric feature mapping (ISOMAP) presents remarkable performance for nonlinear dimensionality reduction in diversified research domains. Landmark-ISOMAP (L-ISOMAP) has been proposed to improve the scalability of ISOMAP by performing the most complicated computations on a subset of points referred to as landmarks. In this paper, we present a novel landmark point selection method for L-ISOMAP. The approach first attempts to find a minimum set cover of the neighbourhood sets and takes the corresponding data points, referred to as landmark candidate points. After that, it removes from the candidate point set the points which belong to neighbour sets of other points; the remaining candidate points are the landmarks. We run several experiments on synthetic and physical data sets, and the experimental results validate the effectiveness of our proposed method.

I. INTRODUCTION

Nowadays nonlinear dimensionality reduction (NLDR) has received extensive attention, since most emerging applications, such as chemoinformatics [1], global climate patterns [2], and face recognition [3], involve high-dimensional attributes. Manifold learning is one of the NLDR techniques [4]. Currently, many manifold learning methods are available, such as ISOMAP [5], locally linear embedding (LLE) [6], local tangent space alignment (LTSA) [7], Laplacian eigenmaps (LE) [8], and self-organizing maps (SOMs) [9]. Among the above methods, ISOMAP is a global manifold learning method. It attempts to recover the original global geometry of observed data drawn from a nonlinear high-dimensional manifold. So far, ISOMAP has been applied in many different fields, such as stellar spectra [10], protein interaction prediction [11] and damage localization [12]. When the number of sample data points increases, ISOMAP faces two computational bottlenecks: computing the shortest path matrix (SPM) and the eigenvalue calculation of metric multidimensional scaling (MDS) [5]. To reduce the computational complexity, Silva et al. proposed L-ISOMAP [13]. L-ISOMAP only computes the shortest paths between each data point and the landmarks instead of between all pairs of data points. In this way, L-ISOMAP addresses both of these inefficiencies of ISOMAP.

This work is supported in part by the National Natural Science Foundation of China under grant Nos. 61174124, 61233003, 61272333, in part by Research Fund for the Doctoral Program of Higher Education of China under grant No. 20123402110029 and in part by Natural Science Research Program of the Anhui High Education Bureau of China under grant No. KJ2012A286.

Hao Shi is with Department of Automation, University of Science and Technology of China, 230027, Hefei, China haoshi@mail.ustc.edu.cn

Baoqun Yin is with Department of Automation, University of Science and Technology of China, 230027, Hefei, China bqyin@ustc.edu.cn

Yizhao Bao is with Department of Automation, University of Science and Technology of China, 230027, Hefei, China zhxf325@mail.ustc.edu.cn

Yingke Lei is with Electronic Engineering Institute, 230027, Hefei, China leiyingke@163.com
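As an illustration of how L-ISOMAP restricts the shortest-path computation to landmarks, the following minimal Python sketch computes graph distances only from a few landmarks to all points, rather than between all pairs. It is not taken from the paper: the helper names (`knn_graph`, `dijkstra`, `landmark_geodesics`), the symmetric k-NN graph, and the default `k=6` are assumptions made for this example.

```python
import heapq
import numpy as np


def knn_graph(X, k):
    """Adjacency dicts {i: {j: dist}} of a symmetric k-NN graph (assumed rule)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    adj = {i: {} for i in range(len(X))}
    for i in range(len(X)):
        # Position 0 of the sorted row is the point itself, so skip it.
        for j in np.argsort(dist[i])[1:k + 1]:
            adj[i][int(j)] = float(dist[i, j])
            adj[int(j)][i] = float(dist[i, j])
    return adj


def dijkstra(adj, source):
    """Single-source shortest-path lengths on a weighted graph."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, np.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < best.get(v, np.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


def landmark_geodesics(X, landmarks, k=6):
    """n x N matrix of graph distances from each of n landmarks to all N points."""
    adj = knn_graph(X, k)
    out = np.full((len(landmarks), len(X)), np.inf)
    for row, s in enumerate(landmarks):
        for v, d in dijkstra(adj, s).items():
            out[row, v] = d
    return out


# Ten collinear points; the landmarks are the two end points.
X = np.arange(10, dtype=float).reshape(-1, 1)
D = landmark_geodesics(X, [0, 9], k=2)
print(D.shape)  # (2, 10)
```

With n landmarks and N points, only n single-source searches are run, so the distance matrix has n rows instead of N.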

How to designate landmarks for L-ISOMAP is still an open question, since the appropriate number and distribution of the landmark points are uncertain. More landmarks mean more stability [13]. Poorly distributed landmarks result in a poor embedding. So far, several landmark selection approaches have been proposed. In [14], landmarks were selected using the “minimum spanning tree cut” method, which tends to choose the set of boundary samples as landmarks. The “maxmin” method was proposed to reduce uncertainty and improve landmarks for classification [15,16]. In [17], landmarks were selected based on mixed-integer optimization. In our previous work [18], an approach called Fast-ISOMAP was developed, based on the Minimum Set Cover problem. It tries to find a minimum set cover of the neighbourhood sets by a greedy heuristic algorithm, and the corresponding points are taken as landmark points. But the landmark points selected by this means may be neighbours of each other. Thus, in this paper, we further optimize the landmark point selection method. The points whose neighbourhoods constitute the minimum set cover of the whole neighbourhood graph are referred to as landmark candidate points. We delete from the set of candidate points the ones which belong to neighbour sets of other points, and the remaining candidate points are the landmarks.
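The two stages described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `greedy_cover` and `prune_neighbouring` are hypothetical, each neighbourhood set is assumed to contain the point itself, ties are broken by smallest index, and "other points" is interpreted here as the candidates already kept, processed in selection order.

```python
def greedy_cover(neigh):
    """Greedy heuristic for a small set cover of the neighbourhood sets.

    neigh maps each point index to its neighbourhood, a set that is
    assumed to include the point itself. Returns the landmark candidates
    in the order in which they were selected.
    """
    uncovered = set(neigh)
    candidates = []
    while uncovered:
        # Pick the neighbourhood covering the most still-uncovered points;
        # ties go to the smallest index (an assumption, for determinism).
        best = max(neigh, key=lambda i: (len(neigh[i] & uncovered), -i))
        candidates.append(best)
        uncovered -= neigh[best]
    return candidates


def prune_neighbouring(candidates, neigh):
    """Drop candidates that lie in the neighbourhood of an already kept one."""
    kept = []
    for c in candidates:
        if not any(c in neigh[other] for other in kept):
            kept.append(c)
    return kept


# Toy example: 7 points on a path, each adjacent to its left/right neighbour.
neigh = {i: {j for j in (i - 1, i, i + 1) if 0 <= j <= 6} for i in range(7)}
candidates = greedy_cover(neigh)
landmarks = prune_neighbouring(candidates, neigh)
print(candidates)  # [1, 4, 5]
print(landmarks)   # [1, 4]
```

In this toy run, candidates 4 and 5 are neighbours of each other, which is the situation the pruning stage targets. After pruning, point 6 is no longer inside any landmark's neighbourhood; how the paper treats such points is not stated in the text shown here.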

The paper is organized as follows. Section II briefly reviews ISOMAP and L-ISOMAP. Section III gives the landmark selection method. Experiment results on synthetic and physical data sets are given in Section IV, in order to show the performance of our method. Finally, conclusions and future extensions are discussed in Section V.

II. ISOMAP AND L-ISOMAP

In this section, we briefly review the algorithms for ISOMAP and L-ISOMAP [4,13].

A. ISOMAP

Let $X = \{x_1, \cdots, x_N\} \subset M \subset \mathbb{R}^D$, where $M$ is a manifold in the $D$-dimensional Euclidean space $\mathbb{R}^D$. ISOMAP attempts to unravel the $d$-dimensional embedding of $X$, where $d \ll D$. ISOMAP consists of three steps. First, create the neighborhood graph $G$ by using either the $k$ nearest neighbor ($k$-NN) rule or the $\varepsilon$-ball rule. Secondly, compute the SPM by implementing Floyd's algorithm or Dijkstra's algorithm. Finally, find the $d$-dimensional embedding by means of the MDS algorithm. In practice, ISOMAP scales poorly to large data
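The three steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function name `isomap`, the symmetric k-NN rule, the defaults `k=6` and `d=2`, and the use of classical MDS via double centering are assumptions made for the example.

```python
import numpy as np


def isomap(X, k=6, d=2):
    """Embed the rows of X into d dimensions with a basic ISOMAP."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Step 1: k-NN neighbourhood graph G (inf marks "no edge").
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]  # position 0 is the point itself
        G[i, nbrs] = dist[i, nbrs]
        G[nbrs, i] = dist[i, nbrs]

    # Step 2: shortest path matrix (SPM) via Floyd's algorithm, O(n^3).
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")

    # Step 3: classical MDS on the SPM (double centering + eigendecomposition).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Points on a straight line in 3-D, spaced 3 units apart.
X = np.outer(np.arange(20.0), [1.0, 2.0, 2.0])
Y = isomap(X, k=2, d=1)
print(Y.shape)  # (20, 1)
```

Steps 2 and 3 both operate on a full N x N matrix here, which corresponds to the two bottlenecks named in the introduction.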

12th IEEE International Conference on Control & Automation (ICCA)

Kathmandu, Nepal, June 1-3, 2016
