is the euclidean distance between its end vertices. Let
MST
1
denote the set of edges of a minimal spanning
tree (MST) of G, MST
1
¼ mstðV;EÞ.LetMST
2
denote the set of edges of an MST of G with MST
1
removed, that is, MST
2
¼ mstðV;E MST
1
Þ. Simi-
larly, let MST
i
denote the set of edges of an MST of
G with [
i1
j¼1
MST
j
removed, MST
i
¼ mstðV;E
[
i1
j¼1
MST
j
Þ:k-MST constructs the k-edge-connected
NG as ðV;[
k
i¼1
MST
i
Þ. It is easy to verify that the NG
is k-edge-connected: Taking any pair of vertices,
there must exist k paths between them, each of
which belongs to a distinct MST.
2. k-VC. k-VC uses a similar idea as Kruskal’s MST
algorithm by applying a greedy algorithm to construct
k-connected NGs. It adds each edge, in a nondecreas-
ing order of edge length, to a partially formed NG if
the end vertices of the edge are not yet k-connected.
The k-connectivity can be tested by querying whether
there are k vertex-disjoint paths between the two
vertices. Such a test can be performed using network
flow techniques [23] by assuming every vertex has a
unit flow capacity. Because edges are added in a
nondecreasing order of edge length, the constructed
NG consists of a set of the shortest possible edges to
keep the graph k-connected. The k-VC algorithm takes
advantage of the block property [17] of k-connectivity
in order to terminate the iterative process of edge
insertions early.
For incremental isometric data embedding, Law and Jain
[15] proposed an incremental version of Isomap for data
insertion. The algorithm works in three steps: First, it uses
k-NN (the -neighbor approach is not an option for
incremental embedding) to build NGs. When a new data
point u is inserted, it identifies new edges (between u and
its k nearest neighbors) to be inserted and replaces edges in
existing neighborhoods (if u is closer to an existing point v
than v’s kth nearest neighbor, u will replace the kth nearest
neighbor point in v’s neighborhood) in order to ensure that
every point connects to its k nearest neighbors. Second, it
updates the estimated geodesic distances between vertex
pairs that may possibly be affected by u. For edge deletion,
it does this by identifying all pairs of vertices whose
shortest paths may go through an edge to be dropped and
reruns the Dijkstra’s shortest path algorithm to calculate the
shortest distances between these pairs of vertices. A similar
process applies for edge insertion (note that all inserted
edges are incident to u). In the third step, the algorithm
calculates the low-dimensional coordinates of the new point
and updates the coordinates of the old points by using
subspace iteration with Ritz acceleration. Experiments show
that the algorithm produces similar results as the original
Isomap algorithm.
This paper is concerned with two issues associated with
incremental Isomap. The first issue is on the k-NN approach
to build NGs. Although it can be easily implemented, an
incremental version of k-NN would suffer from the same
problem in the batch
k-NN of building disconnected NGs. In
fact, the problem of disconnected NG is more serious in an
incremental setting because a continuous data stream may
have its data unevenly distributed from time to time even
though the whole data set is evenly distributed. Built with
k-NN, a connected NG may suddenly become disconnected
after a new data point comes in. In this paper, we propose two
algorithms to update NGs that guarantee k-edge-connectiv-
ity and k-connectivity, respectively, of the constructed NGs.
Consequently, this makes our method applicable to a large
variety of input data sets even when the data 1) are
undersampled, 2) are unevenly distributed, or 3) come in a
skewed sequence. Therefore, this paper presents techniques
that enable us to perform Isomap-like data embedding on
data sets in more realistic scenarios in the real world.
The second issue with incremental Isomap is on
updating APSDs. Edge deletion may break existing shortest
paths, whereas edge insertion may create the new shortest
paths. All shortest paths that go through the changed edges
need to be updated. The algorithm given in [15] works by
rerunning Dijkstra’s shortest path algorithm on a set of
vertices whose shortest paths may possibly be affected by
the graph change. In this paper, we present a simple
alternative to update all-pair shortest paths by following an
idea given in [18] on updating APSDs with edge insertion/
deletion in the context of database view maintenance. In
addition, we consider both data insertion and data deletion
in this paper. This enables incremental embedding of an
unbounded data stream by setting up a sliding window
over the data stream.
3UPDATING CONNECTED NGs
3.1 Updating k-Edge-Connected NGs
This section describes an incremental version of the k-MST
algorithm [25], which builds a k-edge-connected NG by
combining k MSTs.
There exist efficient algorithms [4], [22] for updating a
single MST to accommodate a new vertex. However, these
algorithms require that all new edges are incident to the
new vertex. This restriction makes them inapplicable to our
problem of updating k edge-disjoint MSTs to accommodate
a new vertex, where new edges to be inserted include not
only the ones incident to the new vertex but also the edges
that are removed from previous MSTs.
When a new vertex comes, we have k MSTs to update,
where edges removed from one MST are candidates to be
inserted into the next MST. The basic idea of our updating
algorithm is simple: Suppose there are n vertices on an
MST, a new vertex introduces n new candidate edges; when
a candidate edge is inserted into the MST, a cycle may be
created and the MST is then updated by dropping the
longest edge in the cycle. In implementation, such a cycle
can be easily found by using a rooted tree structure to
represent the MST, where each vertex except the root has
one parent vertex. The n candidate edges incident to the
new vertex are first attempted to be inserted into MST
1
. The
first edge is always inserted to join the new vertex to the
MST. Each of the following edges may either be dropped or
replace an existing edge on the MST. The uninserted edges
and the replaced edges are candidates to be inserted to
MST
2
, and so on. This process continues until MST
k
is
updated. There is no need for another round of update
because the edges left by MST
k
have no chances to be
inserted in MST
1
. Please note that the incremental k-MST
algorithm produces exactly the same NGs as the batch
version of k-MST. This is similar to the incremental k-NN
[15], which produces identical results as the batch k-NN.
The incremental k-MST algorithm is given in Algorithm 1.
Each MST has a root. A list, elist, is used to keep the candidate
edges to be inserted into each MST. It is initialized as the set of
88 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 1, JANUAR Y 2009