增量式邻接图流形学习算法

需积分: 9 82 浏览量更新于2024-10-13 收藏 2.36MB PDF 举报

"这篇文章主要探讨了使用邻接图进行增进式流形学习的方法，特别是针对高维数据的增量式等距嵌入。该研究由Dongfang Zhao和Li Yang发表在IEEE PAMI上，旨在解决非线性数据嵌入过程中因数据流的不完整或偏斜而带来的挑战。" 在流形学习领域，许多方法都采用自下而上的策略来捕获高维空间中数据流形的内在结构。这个过程通常始于构建一个连接的邻域图，以便所有数据点能被映射到一个单一的坐标系统中。然而，在许多应用中，需要对高维数据进行增量式的降维处理。当输入数据流不完全或随时间变化时，建立一个连接的邻域图对于增量数据嵌入的成功至关重要。这篇论文提出了一种算法，用于在添加新数据点或删除旧数据点后更新k-边连通和k-连通的邻域图。这些更新机制确保了即使数据集发生变化，邻域图仍能保持其连通性，从而维护数据点间的拓扑关系。此外，文章还介绍了一种简单的算法，用于更新邻域图上所有对之间的最短距离。这种最短距离的更新对于保持数据点间的相对距离信息至关重要，因为流形学习的目标之一就是尽可能保持原始数据点之间的局部几何结构。通过结合增量式古典多维尺度分析（incremental classical multidimensional scaling, iMDS），这些方法能够动态地适应数据的变化，而无需重新计算整个数据集的嵌入。这在处理大规模数据流或实时数据分析任务时尤其有用，因为它减少了计算复杂性和存储需求。 "基于邻接图的增进式流形学习"提供了一个高效且灵活的框架，用于处理高维数据的在线学习和降维问题。通过对邻域图的增量更新，以及对最短距离的管理，该方法能够在数据流变化时保持数据流形的结构，从而为非线性数据分析提供了新的工具。这对于诸如机器学习、模式识别、计算机视觉等领域的应用具有重要意义。

is the euclidean distance between its end vertices. Let

MST

denote the set of edges of a minimal spanning

tree (MST) of G, MST

¼ mstðV;EÞ.LetMST

denote the set of edges of an MST of G with MST

removed, that is, MST

¼ mstðV;E MST

Þ. Simi-

larly, let MST

denote the set of edges of an MST of

G with [

i1

j¼1

MST

removed, MST

¼ mstðV;E

[

i1

j¼1

MST

Þ:k-MST constructs the k-edge-connected

NG as ðV;[

i¼1

MST

Þ. It is easy to verify that the NG

is k-edge-connected: Taking any pair of vertices,

there must exist k paths between them, each of

which belongs to a distinct MST.

2. k-VC. k-VC uses a similar idea as Kruskal’s MST

algorithm by applying a greedy algorithm to construct

k-connected NGs. It adds each edge, in a nondecreas-

ing order of edge length, to a partially formed NG if

the end vertices of the edge are not yet k-connected.

The k-connectivity can be tested by querying whether

there are k vertex-disjoint paths between the two

vertices. Such a test can be performed using network

flow techniques [23] by assuming every vertex has a

unit flow capacity. Because edges are added in a

nondecreasing order of edge length, the constructed

NG consists of a set of the shortest possible edges to

keep the graph k-connected. The k-VC algorithm takes

advantage of the block property [17] of k-connectivity

in order to terminate the iterative process of edge

insertions early.

For incremental isometric data embedding, Law and Jain

[15] proposed an incremental version of Isomap for data

insertion. The algorithm works in three steps: First, it uses

k-NN (the -neighbor approach is not an option for

incremental embedding) to build NGs. When a new data

point u is inserted, it identifies new edges (between u and

its k nearest neighbors) to be inserted and replaces edges in

existing neighborhoods (if u is closer to an existing point v

than v’s kth nearest neighbor, u will replace the kth nearest

neighbor point in v’s neighborhood) in order to ensure that

every point connects to its k nearest neighbors. Second, it

updates the estimated geodesic distances between vertex

pairs that may possibly be affected by u. For edge deletion,

it does this by identifying all pairs of vertices whose

shortest paths may go through an edge to be dropped and

reruns the Dijkstra’s shortest path algorithm to calculate the

shortest distances between these pairs of vertices. A similar

process applies for edge insertion (note that all inserted

edges are incident to u). In the third step, the algorithm

calculates the low-dimensional coordinates of the new point

and updates the coordinates of the old points by using

subspace iteration with Ritz acceleration. Experiments show

that the algorithm produces similar results as the original

Isomap algorithm.

This paper is concerned with two issues associated with

incremental Isomap. The first issue is on the k-NN approach

to build NGs. Although it can be easily implemented, an

incremental version of k-NN would suffer from the same

problem in the batch

k-NN of building disconnected NGs. In

fact, the problem of disconnected NG is more serious in an

incremental setting because a continuous data stream may

have its data unevenly distributed from time to time even

though the whole data set is evenly distributed. Built with

k-NN, a connected NG may suddenly become disconnected

after a new data point comes in. In this paper, we propose two

algorithms to update NGs that guarantee k-edge-connectiv-

ity and k-connectivity, respectively, of the constructed NGs.

Consequently, this makes our method applicable to a large

variety of input data sets even when the data 1) are

undersampled, 2) are unevenly distributed, or 3) come in a

skewed sequence. Therefore, this paper presents techniques

that enable us to perform Isomap-like data embedding on

data sets in more realistic scenarios in the real world.

The second issue with incremental Isomap is on

updating APSDs. Edge deletion may break existing shortest

paths, whereas edge insertion may create the new shortest

paths. All shortest paths that go through the changed edges

need to be updated. The algorithm given in [15] works by

rerunning Dijkstra’s shortest path algorithm on a set of

vertices whose shortest paths may possibly be affected by

the graph change. In this paper, we present a simple

alternative to update all-pair shortest paths by following an

idea given in [18] on updating APSDs with edge insertion/

deletion in the context of database view maintenance. In

addition, we consider both data insertion and data deletion

in this paper. This enables incremental embedding of an

unbounded data stream by setting up a sliding window

over the data stream.

3UPDATING CONNECTED NGs

3.1 Updating k-Edge-Connected NGs

This section describes an incremental version of the k-MST

algorithm [25], which builds a k-edge-connected NG by

combining k MSTs.

There exist efficient algorithms [4], [22] for updating a

single MST to accommodate a new vertex. However, these

algorithms require that all new edges are incident to the

new vertex. This restriction makes them inapplicable to our

problem of updating k edge-disjoint MSTs to accommodate

a new vertex, where new edges to be inserted include not

only the ones incident to the new vertex but also the edges

that are removed from previous MSTs.

When a new vertex comes, we have k MSTs to update,

where edges removed from one MST are candidates to be

inserted into the next MST. The basic idea of our updating

algorithm is simple: Suppose there are n vertices on an

MST, a new vertex introduces n new candidate edges; when

a candidate edge is inserted into the MST, a cycle may be

created and the MST is then updated by dropping the

longest edge in the cycle. In implementation, such a cycle

can be easily found by using a rooted tree structure to

represent the MST, where each vertex except the root has

one parent vertex. The n candidate edges incident to the

new vertex are first attempted to be inserted into MST

. The

first edge is always inserted to join the new vertex to the

MST. Each of the following edges may either be dropped or

replace an existing edge on the MST. The uninserted edges

and the replaced edges are candidates to be inserted to

MST

, and so on. This process continues until MST

updated. There is no need for another round of update

because the edges left by MST

have no chances to be

inserted in MST

. Please note that the incremental k-MST

algorithm produces exactly the same NGs as the batch

version of k-MST. This is similar to the incremental k-NN

[15], which produces identical results as the batch k-NN.

The incremental k-MST algorithm is given in Algorithm 1.

Each MST has a root. A list, elist, is used to keep the candidate

edges to be inserted into each MST. It is initialized as the set of

88 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 1, JANUAR Y 2009

剩余12页未读，继续阅读

fms365

粉丝: 0
资源: 1

增量式邻接图流形学习算法

基于流形学习的Matlab代码.zip_matlab 流形学习_基于Matlab的流形学习的开发代码_流学习_流形学习 matl

流形学习算法matlab编写

L2_distance.zip_L2_distance_LE流形学习_流形学习 LE_流形学习 算法_流形学习LE

LPP.rar_LPP matlab_LPP算法_matlab LPP_流形学习_流形学习lpp

流形学习工具箱

流形学习MVU 算法

计算机研究 -基于流形学习的数据聚类与可视化.pdf

基于邻接图的面向对象遥感图像分割算法.caj

流形学习算法分析及比较

有关流形学习论文.docx

最新资源

L2_distance.zip_L2_distance_LE流形学习_流形学习 LE_流形学习算法_流形学习LE