However, the representation of $\mathbf{y}_i$ in the dictionary $\mathbf{Y}$ is not unique in general. This comes from the fact that the number of data points in a subspace is often greater than its dimension, i.e., $N_\ell > d_\ell$. As a result, each $\mathbf{Y}_\ell$, and consequently $\mathbf{Y}$, has a nontrivial nullspace, giving rise to infinitely many representations of each data point.
The key observation in our proposed algorithm is that among all solutions of (2), there exists a sparse solution, $\mathbf{c}_i$, whose nonzero entries correspond to data points from the same subspace as $\mathbf{y}_i$. We refer to such a solution as a subspace-sparse representation.
More specifically, a data point $\mathbf{y}_i$ that lies in the $d_\ell$-dimensional subspace $\mathcal{S}_\ell$ can be written as a linear combination of $d_\ell$ other points in general directions from $\mathcal{S}_\ell$. As a result, ideally, a sparse representation of a data point finds points from the same subspace, where the number of the nonzero elements corresponds to the dimension of the underlying subspace.
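As a small illustration of these two facts (our own toy example, not part of the paper), the NumPy sketch below draws 10 points from a 2-dimensional subspace of $\mathbb{R}^3$: the resulting dictionary is rank-deficient, so representations are not unique, yet any point of the subspace can be reconstructed exactly from only $d_\ell = 2$ of the other points.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 2-dimensional subspace of R^3, spanned by the columns of B (toy data).
B = rng.standard_normal((3, 2))

# N_l = 10 points from that subspace as columns of Y_l, so N_l > d_l = 2.
Y_l = B @ rng.standard_normal((2, 10))

# Rank 2 with 10 columns: Y_l has a nontrivial nullspace, so any point in the
# subspace has infinitely many representations in this dictionary.
print(np.linalg.matrix_rank(Y_l))  # -> 2

# Still, a point of the subspace can be written exactly using only d_l = 2
# other points of the same subspace (points in general directions).
y = Y_l[:, 0]
coeffs, *_ = np.linalg.lstsq(Y_l[:, 1:3], y, rcond=None)
print(np.allclose(Y_l[:, 1:3] @ coeffs, y))  # -> True
```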
For a system of equations such as (2) with infinitely many solutions, one can restrict the set of solutions by minimizing an objective function such as the $\ell_q$-norm of the solution$^1$ as
$$\min \|\mathbf{c}_i\|_q \quad \text{s.t.} \quad \mathbf{y}_i = \mathbf{Y}\mathbf{c}_i, \;\; c_{ii} = 0. \tag{3}$$
Different choices of $q$ have different effects on the obtained solution. Typically, by decreasing the value of $q$ from infinity toward zero, the sparsity of the solution increases, as shown in Fig. 3. The extreme case of $q = 0$ corresponds to the general NP-hard problem [51] of finding the sparsest representation of the given point, as the $\ell_0$-norm counts the number of nonzero elements of the solution.
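For instance, for $q = 2$ the program in (3) reduces to a minimum $\ell_2$-norm problem with a closed-form solution via the pseudoinverse. The sketch below (a toy example with our own data, not from the paper) shows that this solution is typically dense, so it reveals little about which points share a subspace with $\mathbf{y}_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dictionary: 10 points from each of two 2-dimensional subspaces of R^5.
Y = np.hstack([rng.standard_normal((5, 2)) @ rng.standard_normal((2, 10))
               for _ in range(2)])

i = 0
Y_minus_i = np.delete(Y, i, axis=1)        # enforce c_ii = 0 by removing y_i

# q = 2 instance of program (3): minimum l2-norm solution via the pseudoinverse.
c_l2 = np.linalg.pinv(Y_minus_i) @ Y[:, i]
c_l2 = np.insert(c_l2, i, 0.0)             # restore the zero at position i

# The minimum-norm solution is dense: it mixes points from both subspaces and
# says little about which subspace y_i belongs to.
print(np.count_nonzero(np.abs(c_l2) > 1e-8))  # typically N - 1 = 19
```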
Since we are interested in efficiently finding a nontrivial sparse representation of $\mathbf{y}_i$ in the dictionary $\mathbf{Y}$, we consider minimizing the tightest convex relaxation of the $\ell_0$-norm, i.e.,
$$\min \|\mathbf{c}_i\|_1 \quad \text{s.t.} \quad \mathbf{y}_i = \mathbf{Y}\mathbf{c}_i, \;\; c_{ii} = 0, \tag{4}$$
which can be solved efficiently using convex programming tools [48], [49], [50] and is known to prefer sparse solutions [29], [30], [31].
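As an illustration of what such a solver call might look like, here is a minimal sketch using CVXPY (our choice of convex programming tool for illustration; the references [48], [49], [50] describe other solvers, and the function name is ours):

```python
import cvxpy as cp
import numpy as np

def sparse_representation(Y: np.ndarray, i: int) -> np.ndarray:
    """Solve program (4): min ||c||_1  s.t.  y_i = Y c,  c_ii = 0."""
    N = Y.shape[1]
    c = cp.Variable(N)
    objective = cp.Minimize(cp.norm(c, 1))
    constraints = [Y @ c == Y[:, i], c[i] == 0]
    cp.Problem(objective, constraints).solve()
    return c.value
```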
We can also rewrite the sparse optimization program (4) for all data points $i = 1, \ldots, N$ in matrix form as
$$\min \|\mathbf{C}\|_1 \quad \text{s.t.} \quad \mathbf{Y} = \mathbf{Y}\mathbf{C}, \;\; \operatorname{diag}(\mathbf{C}) = \mathbf{0}, \tag{5}$$
where $\mathbf{C} \triangleq [\mathbf{c}_1 \; \mathbf{c}_2 \; \cdots \; \mathbf{c}_N] \in \mathbb{R}^{N \times N}$ is the matrix whose $i$th column corresponds to the sparse representation of $\mathbf{y}_i$, $\mathbf{c}_i$, and $\operatorname{diag}(\mathbf{C}) \in \mathbb{R}^N$ is the vector of the diagonal elements of $\mathbf{C}$.
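A direct translation of (5) into the same CVXPY setting (again a sketch with our own naming, not the paper's implementation) solves for all columns of $\mathbf{C}$ at once:

```python
import cvxpy as cp
import numpy as np

def sparse_coefficient_matrix(Y: np.ndarray) -> np.ndarray:
    """Solve program (5): min ||C||_1  s.t.  Y = Y C,  diag(C) = 0."""
    N = Y.shape[1]
    C = cp.Variable((N, N))
    objective = cp.Minimize(cp.sum(cp.abs(C)))   # entrywise l1-norm ||C||_1
    constraints = [Y @ C == Y, cp.diag(C) == 0]
    cp.Problem(objective, constraints).solve()
    return C.value
```

Since (5) decouples over the columns of $\mathbf{C}$, the $N$ per-point programs of form (4) can equivalently be solved independently, which is often cheaper in practice.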
Ideally, the solution of (5) corresponds to subspace-
sparse representations of the data points, which we use next
to infer the clustering of the data. In Section 4, we study
conditions under which the convex optimization program
in (5) is guaranteed to recover a subspace-sparse represen-
tation of each data point.
2.2 Clustering Using Sparse Coefficients
After solving the proposed optimization program in (5),
we obtain a sparse representation for each data point whose
nonzero elements ideally correspond to points from the
same subspace. The next step of the algorithm is to infer
the segmentation of the data into different subspaces using
the sparse coefficients.
To address this problem, we build a weighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$, where $\mathcal{V}$ denotes the set of $N$ nodes of the graph corresponding to the $N$ data points and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ denotes the set of edges between nodes. $\mathbf{W} \in \mathbb{R}^{N \times N}$ is a symmetric nonnegative similarity matrix representing the weights of the edges, i.e., node $i$ is connected to node $j$ by an edge whose weight is equal to $w_{ij}$. An ideal similarity matrix $\mathbf{W}$, hence an ideal similarity graph $\mathcal{G}$, is one in which nodes that correspond to points from the same subspace are connected to each other and there are no edges between nodes that correspond to points in different subspaces.
Note that the sparse optimization program ideally recovers a subspace-sparse representation of each point, i.e., a representation whose nonzero elements correspond to points from the same subspace as the given data point. This provides an immediate choice of the similarity matrix as $\mathbf{W} = |\mathbf{C}| + |\mathbf{C}|^\top$. In other words, each node $i$ connects itself to a node $j$ by an edge whose weight is equal to $|c_{ij}| + |c_{ji}|$.
The reason for the symmetrization is that, in general, a data point $\mathbf{y}_i \in \mathcal{S}_\ell$ can write itself as a linear combination of some points including $\mathbf{y}_j \in \mathcal{S}_\ell$. However, $\mathbf{y}_j$ may not necessarily choose $\mathbf{y}_i$ in its sparse representation. By this particular choice of the weight, we make sure that nodes $i$ and $j$ get connected to each other if either $\mathbf{y}_i$ or $\mathbf{y}_j$ is in the sparse representation of the other.$^2$
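As a sketch of how this graph can be formed and then segmented in code (spectral clustering of the similarity graph is how the method obtains the final segmentation; the scikit-learn call below is our own illustrative implementation choice, not the paper's), one could write:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_coefficients(C: np.ndarray, n_subspaces: int) -> np.ndarray:
    """Form the similarity graph W = |C| + |C|^T and segment it spectrally."""
    W = np.abs(C) + np.abs(C).T              # symmetric, nonnegative weights
    labels = SpectralClustering(
        n_clusters=n_subspaces,
        affinity="precomputed",               # interpret W as graph weights
        random_state=0,
    ).fit_predict(W)
    return labels
```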
Fig. 3. Three subspaces in $\mathbb{R}^3$ with 10 data points in each subspace, ordered such that the first and the last 10 points belong to $\mathcal{S}_1$ and $\mathcal{S}_3$, respectively. The solution of the $\ell_q$-minimization program in (3) for $\mathbf{y}_i$ lying in $\mathcal{S}_1$ is shown for $q = 1, 2, \infty$. Note that as the value of $q$ decreases, the sparsity of the solution increases. For $q = 1$, the solution corresponds to choosing two other points lying in $\mathcal{S}_1$.
1. The $\ell_q$-norm of $\mathbf{c}_i \in \mathbb{R}^N$ is defined as $\|\mathbf{c}_i\|_q \triangleq \left(\sum_{j=1}^{N} |c_{ij}|^q\right)^{1/q}$.
2. To obtain a symmetric similarity matrix, one can directly impose the constraint $\mathbf{C} = \mathbf{C}^\top$ in the optimization program. However, this increases the complexity of the optimization program and, in practice, does not perform better than the postsymmetrization of $\mathbf{C}$ described above. See also [52] for other approaches to processing the similarity matrix.