Robust Subspace Segmentation by Low-Rank Representation
Guangcan Liu†  roth@sjtu.edu.cn
Zhouchen Lin‡  zhoulin@microsoft.com
Yong Yu†  yyu@apex.sjtu.edu.cn

†Shanghai Jiao Tong University, NO. 800, Dongchuan Road, Min Hang District, Shanghai, China, 200240
‡Microsoft Research Asia, NO. 49, Zhichun Road, Hai Dian District, Beijing, China, 100190
Abstract
We propose low-rank representation (LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. Given a set of data vectors, LRR seeks the lowest-rank representation among all the candidates that represent all vectors as linear combinations of the bases in a dictionary. Unlike the well-known sparse representation (SR), which computes the sparsest representation of each data vector individually, LRR aims at finding the lowest-rank representation of a collection of vectors jointly. LRR better captures the global structure of data, giving a more effective tool for robust subspace segmentation from corrupted data. Both theoretical and experimental results show that LRR is a promising tool for subspace segmentation.
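Schematically, writing X = [x_1, x_2, ..., x_n] for the matrix of data vectors and A for a dictionary (symbols introduced here only to illustrate the abstract; the formal setup is developed in the body of the paper), the representation described above amounts to a nuclear-norm program of the form

    min_Z ||Z||_*   s.t.   X = AZ,

where ||Z||_* denotes the nuclear norm of Z (the sum of its singular values), a convex surrogate of rank(Z), and each column of Z holds the coefficients that express the corresponding column of X as a linear combination of the bases in A.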
1. Introduction
In scientific data analysis and system engineering, one usually needs a parametric model to characterize a given set of data. To this end, linear models such as linear subspaces are possibly the most common choice, mainly because they are easy to compute and are also often effective in real applications. (There is no loss of generality in assuming that the subspaces are linear, i.e., they all contain the origin. For the affine subspaces that do not contain the origin, we can always increase the dimension of the ambient space by one and identify each affine subspace with the linear subspace that it spans. So we always use the term "subspace" to denote "linear subspace" and "affine subspace" in this work.) So the
subspaces have been gaining much attention in recent years. For example, the hotly discussed matrix completion problem (Candès & Recht, 2009; Keshavan et al., 2009; Candès et al., 2009) is essentially based on the hypothesis that the data is drawn from a low-rank subspace. However, a given data set is seldom well
described by a single subspace. A more reasonable model is to consider data as lying near several subspaces, leading to the challenging problem of subspace segmentation. Here, the goal is to segment (or cluster) data into clusters with each cluster corresponding to a subspace. Subspace segmentation is an important data clustering problem as it arises in numerous research areas, including machine learning (Lu & Vidal, 2006), computer vision (Ho et al., 2003), image processing (Fischler & Bolles, 1981) and system identification.
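Stated a little more formally (with notation used here only for illustration), one observes data vectors X = [x_1, x_2, ..., x_n] sampled from a union of k subspaces S_1 ∪ S_2 ∪ ... ∪ S_k, and the goal is to partition the columns of X into k groups such that each group contains exactly the samples lying in (or near, when noise is present) one of the S_i. For affine subspaces, the lifting x ↦ (x, 1) discussed above reduces the problem to the linear case.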
Previous Work. According to their mechanisms of representing the subspaces, existing works can be roughly divided into four main categories: mixture of Gaussians, factorization, algebraic, and compressed sensing.
In statistical learning, mixed data are typically modeled as a set of independent samples drawn from a mixture of probabilistic distributions. As a single subspace can be well modeled by a Gaussian distribution, it is straightforward to assume that each probabilistic distribution is Gaussian, hence the name mixture of Gaussians model. The problem of segmenting the data is then converted into a model estimation problem. The estimation can be performed either by using the Expectation Maximization (EM) algorithm to find a maximum likelihood estimate, as done in (Gruber & Weiss, 2004), or by iteratively finding a min-max estimate, as adopted by K-subspaces (Ho et al., 2003) and Random Sample Consensus (RANSAC) (Fischler & Bolles, 1981).
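To make the alternating mechanism behind K-subspaces concrete, the following NumPy sketch (an illustration written for this discussion, not the exact procedure of Ho et al. (2003); the function name, signature, and the assumption that all subspaces share a common dimension are ours) repeatedly fits a fixed-dimension basis to each cluster via SVD and reassigns every sample to the subspace with the smallest projection residual:

import numpy as np

def k_subspaces(X, k, dim, n_iter=50, seed=0):
    # X: (D, N) data matrix, one sample per column; k subspaces of assumed dimension dim.
    rng = np.random.default_rng(seed)
    D, N = X.shape
    labels = rng.integers(k, size=N)              # random initial segmentation
    for _ in range(n_iter):
        bases = []
        for j in range(k):
            Xj = X[:, labels == j]
            if Xj.shape[1] == 0:                  # re-seed an empty cluster
                Xj = X[:, rng.integers(N, size=dim)]
            U, _, _ = np.linalg.svd(Xj, full_matrices=False)
            bases.append(U[:, :dim])              # PCA basis of cluster j
        # distance of every sample to every fitted subspace (projection residual)
        residuals = np.stack(
            [np.linalg.norm(X - B @ (B.T @ X), axis=0) for B in bases])
        new_labels = residuals.argmin(axis=0)     # reassignment step
        if np.array_equal(new_labels, labels):    # converged
            break
        labels = new_labels
    return labels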
These methods are sensitive to noise and outliers. So