particular, the clever golfing scheme [Gross 2011] plays a crucial role in our analysis,
and we introduce two novel modifications to this scheme.
Despite these similarities, our ideas depart from the literature on matrix completion
on several fronts. First, our results are obviously of a different nature. Second, we
could think of our separation problem, and the recovery of the low-rank component, as
a matrix completion problem. Indeed, instead of having a fraction of observed entries
available and the other missing, we have a fraction available, but do not know which
one, while the other is not missing but entirely corrupted. Although this is a harder
problem, one way to think of our algorithm is that it simultaneously detects the
corrupted entries, and perfectly fits the low-rank component to the remaining entries
that are deemed reliable. In this sense, our methodology and results go beyond matrix
completion. Third, we introduce a novel derandomization argument that
allows us to fix the signs of the nonzero entries of the sparse component. We believe
that this technique will have many applications. One such application is in the area
of compressive sensing, where assumptions about the randomness of the signs of a
signal are common, and merely made out of convenience rather than necessity; this is
important because assuming independent signal signs may not make much sense for
many practical applications when the involved signals can all be nonnegative (such as
images).
We mentioned earlier the related work [Chandrasekaran et al. 2009], which also
considers the problem of decomposing a given data matrix into sparse and low-rank
components, and gives sufficient conditions for convex programming to succeed. These
conditions are phrased in terms of two quantities. The first is the maximum ratio
between the $\ell_\infty$ norm and the operator norm, restricted to the subspace generated
by matrices whose row or column spaces agree with those of $L_0$. The second is the
maximum ratio between the operator norm and the $\ell_\infty$ norm, restricted to the subspace
of matrices that vanish off the support of $S_0$. Chandrasekaran et al. [2009] show that
when the product of these two quantities is small, then the recovery is exact for a
certain interval of the regularization parameter.
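In symbols (the notation below is ours, introduced only for this comparison), these two quantities can be written as
\[
\xi(T) \;=\; \max_{N \in T,\; N \neq 0} \frac{\|N\|_\infty}{\|N\|},
\qquad
\alpha(\Omega) \;=\; \max_{N \in \Omega,\; N \neq 0} \frac{\|N\|}{\|N\|_\infty},
\]
where $T$ is the subspace generated by matrices whose row or column spaces agree with those of $L_0$, $\Omega$ is the subspace of matrices vanishing off the support of $S_0$, $\|\cdot\|$ denotes the operator norm, and $\|\cdot\|_\infty$ the entrywise $\ell_\infty$ norm; exact recovery is then guaranteed whenever the product $\xi(T)\,\alpha(\Omega)$ is sufficiently small.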
One very appealing aspect of this condition is that it is completely deterministic: it
does not depend on any random model for $L_0$ or $S_0$. It yields a corollary that can be
easily compared to our result: suppose $n_1 = n_2 = n$ for simplicity, and let $\mu_0$ be the
smallest quantity satisfying (1.2); then correct recovery occurs whenever
\[
\max_j \,\#\{i : [S_0]_{ij} \neq 0\} \times \sqrt{\mu_0 r/n} \;<\; 1/12.
\]
The left-hand side is at least as large as $\rho_s \sqrt{\mu_0 n r}$, where $\rho_s$ is the fraction of entries
of $S_0$ that are nonzero. Since $\mu_0 \geq 1$ always, this statement only guarantees recovery
if $\rho_s = O((nr)^{-1/2})$; that is, even when $\operatorname{rank}(L_0) = O(1)$, only vanishing fractions of the
entries in $S_0$ can be nonzero.
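To spell out the comparison (a short check, using only the quantities already defined): the maximum number of nonzero entries in a column of $S_0$ is at least the average number over the $n$ columns, so
\[
\max_j \,\#\{i : [S_0]_{ij} \neq 0\} \;\geq\; \frac{1}{n}\sum_{j=1}^{n} \#\{i : [S_0]_{ij} \neq 0\} \;=\; \frac{\rho_s n^2}{n} \;=\; \rho_s n,
\]
and hence the left-hand side above is at least $\rho_s n \cdot \sqrt{\mu_0 r/n} = \rho_s \sqrt{\mu_0 n r}$. Combined with $\mu_0 \geq 1$, the condition forces $\rho_s < \tfrac{1}{12}(\mu_0 n r)^{-1/2} \leq \tfrac{1}{12}(n r)^{-1/2}$.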
In contrast, our result shows that for incoherent $L_0$, correct recovery occurs with
high probability for $\operatorname{rank}(L_0)$ on the order of $n/[\mu \log^2 n]$ and a number of nonzero
entries in $S_0$ on the order of $n^2$. That is, matrices of large rank can be recovered from
non-vanishing fractions of sparse errors. This improvement comes at the expense of
introducing one piece of randomness: a uniform model on the error support.
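For reference, the recovery discussed in this comparison is carried out by Principal Component Pursuit, the convex program studied in this article,
\[
\text{minimize} \;\; \|L\|_* + \lambda \|S\|_1 \qquad \text{subject to} \;\; L + S = M,
\]
whose regularization parameter $\lambda$ is the subject of the next paragraph.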
A difference with the results in Chandrasekaran et al. [2009] is that our analysis
leads to the conclusion that a single universal value of $\lambda$, namely $\lambda = 1/\sqrt{n}$, works with
high probability for recovering any low-rank, incoherent matrix. In Chandrasekaran
et al. [2009], the parameter $\lambda$ is data-dependent, and may have to be selected by solving
a number of convex programs. The distinction between our results and Chandrasekaran
et al. [2009] is a consequence of differing assumptions about the origin of the data
matrix $M$. We regard the universality of $\lambda$ in our analysis as an advantage, since it may