Robust Principal Component Analysis Based On L1-2 Metric
Fanlong Zhang, Zhangjing Yang, Minghua Wan, Guowei Yang
School of Technology
Nanjing Audit University
Nanjing, China
csfzhang@126.com, yzj@nau.edu.cn, wmh36@sina.com, ygw_ustb@163.com
Abstract—Robust principal component analysis (RPCA) is a newly emerged method for the exact recovery of corrupted low-rank matrices. Given a data matrix, RPCA can decompose it exactly into the sum of a low-rank matrix and a sparse matrix by minimizing a weighted combination of the nuclear norm and the L1 norm. It assumes that the error matrix is sparse and measures it by the L1 norm. However, the L1 norm often leads to biased estimates, and the solution is not as accurate as desired. Recently the difference of the L1 and L2 norms, called the L1-2 metric, was proposed as an approximation to the L0 norm. Motivated by the L1-2 metric's better approximation to the L0 norm than the L1 norm, this paper presents a method called robust principal component analysis based on the L1-2 metric (RPCA-L1-2) for recovering corrupted data. This method measures the data error by the L1-2 metric. Moreover, RPCA-L1-2 is solved by DC (difference of convex functions) programming. Extensive experiments on removing occlusion from face images and background modeling from surveillance videos demonstrate the effectiveness of the proposed method.
Keywords—robust principal component analysis; low-rank; L1-2 metric; DC programming
I. INTRODUCTION
Principal component analysis (PCA) [1] is widely investigated and applied in pattern recognition and machine learning for subspace learning and feature extraction. PCA, however, is sensitive to outliers. To overcome this limitation, a surge of robust principal component analysis methods has been proposed. Wright et al. recently established the robust principal component analysis (RPCA) [2] [3] method, which assumes that the error matrix is sparse and the clean data matrix is low rank. Under the restricted isometry property (RIP) condition, RPCA can decompose the corrupted data exactly into the sum of a low-rank matrix and a sparse matrix by minimizing a weighted combination of the nuclear norm and the L1 norm. As an important extension of RPCA, low-rank representation (LRR) [4], [5] was presented to segment subspaces from a union of multiple linear subspaces. LRR seeks the lowest-rank representation among all the candidates that represent all vectors as linear combinations of the basis vectors in a dictionary. Like RPCA, LRR also assumes that the error term is sparse.
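As a concrete illustration of the RPCA decomposition described above, the program min ||D||_* + λ||E||_1 subject to X = D + E can be solved by alternating singular value thresholding and soft thresholding. The sketch below uses common inexact-ALM defaults (λ = 1/√max(m, n), a geometric μ schedule); it is our illustration, not code or parameter settings from this paper:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca_ialm(X, lam=None, rho=1.5, n_iter=200):
    """Inexact-ALM-style loop for min ||D||_* + lam*||E||_1  s.t.  X = D + E.

    lam = 1/sqrt(max(m, n)) and the mu schedule are common defaults,
    assumed here rather than taken from the paper.
    """
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(X, 2)        # initial penalty parameter
    D = np.zeros_like(X)
    E = np.zeros_like(X)
    Y = np.zeros_like(X)                    # Lagrange multiplier
    for _ in range(n_iter):
        D = svt(X - E + Y / mu, 1.0 / mu)   # low-rank update
        T = X - D + Y / mu
        E = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)  # sparse update
        Y = Y + mu * (X - D - E)            # dual ascent on the constraint
        mu = min(mu * rho, 1e7)
    return D, E
```

On synthetic data (a random rank-one matrix plus a few large sparse errors), this loop typically separates the two components to high accuracy.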
Most of the abovementioned methods characterize the error via the L1 or L2 norm, both of which are convex regularizers. However, convex regularizers often lead to inaccurate solutions [6]. As a result, many nonconvex regularizers have been designed, such as the capped-L1 norm [6], the Lp norm [7], and the log-sum penalty [8]. Recently the difference of the L1 and L2 norms [9] [10], called the L1-2 metric, was proposed as a nonconvex approximation to the L0 norm. The L1-2 metric is nonconvex yet Lipschitz continuous. The computational results in [10] show that even when the RIP condition is not satisfied, the L1-2 metric can work better than existing nonconvex regularizers.
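As a small numerical illustration (ours, not from the paper), the L1-2 metric of a vector is zero exactly when the vector has at most one nonzero entry, mirroring the L0 norm's preference for sparsity:

```python
import numpy as np

def l1_minus_l2(x):
    """The L1-2 metric ||x||_1 - ||x||_2 of a vector x."""
    return np.abs(x).sum() - np.sqrt((x ** 2).sum())

# A 1-sparse vector attains the minimum value 0, like the L0 norm
# rewarding sparsity; a dense vector with the same energy scores higher.
print(l1_minus_l2(np.array([3.0, 0.0, 0.0, 0.0])))  # 0.0
print(l1_minus_l2(np.array([1.5, 1.5, 1.5, 1.5])))  # 6.0 - 3.0 = 3.0
```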
Inspired by the L1-2 metric's better approximation to the L0 norm, this paper presents a method called robust principal component analysis based on the L1-2 metric (RPCA-L1-2) for recovering corrupted data. RPCA-L1-2 measures the data error by the L1-2 metric instead of the L1 norm used in RPCA.
Although the L1-2 metric is nonconvex, it can be decomposed into the difference of two convex functions. Then DC programming [11] [12] can be employed to solve our model. Here "DC" means "difference of convex functions": DC programming is a special kind of optimization method whose objective function can be decomposed into the difference of two convex functions. In [10], DC programming is also employed to solve compressed sensing based on the L1-2 metric. Furthermore, Lou and Yan [17] derive an analytical solution for the proximal operator of the L1-2 metric, which makes some fast L1 solvers applicable to L1-2.
The contributions include two aspects. (1) A robust data recovery model, called RPCA-L1-2, is proposed. The motivation is that the L1-2 metric is a better approximation to the L0 norm than the L1 norm. (2) DC programming is employed to solve the proposed model. The DC algorithm decomposes the original problem into a series of RPCA problems, which can be solved efficiently by the inexact augmented Lagrange multiplier (inexact ALM) algorithm [11].
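To make the DC iteration concrete, the following sketch applies a DCA loop to a vector-valued L1-2 regularized least-squares problem: at each outer step the concave term -λ||x||_2 is linearized at the current iterate, and the resulting convex L1 subproblem (the analogue of the inner RPCA subproblems) is solved by ISTA. This is an illustrative analogue under our own problem setup, not the paper's matrix algorithm:

```python
import numpy as np

def soft(v, t):
    """Soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_l1(A, b, lin, lam, n_iter=500):
    """Solve the convex subproblem min_x 0.5||Ax-b||^2 + <lin, x> + lam||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b) + lin
        x = soft(x - step * grad, step * lam)
    return x

def dca_l1_minus_l2(A, b, lam, n_outer=20):
    """DCA loop: linearize the concave part -lam*||x||_2 at x_k, then solve
    the resulting convex L1 problem (analogue of the inner RPCA steps)."""
    x = np.zeros(A.shape[1])
    for _ in range(n_outer):
        nx = np.linalg.norm(x)
        lin = -lam * x / nx if nx > 0 else np.zeros_like(x)
        x = ista_l1(A, b, lin, lam)
    return x
```

For example, with A the identity and a 1-sparse b, the linearized term cancels the soft-thresholding bias on the surviving entry, so the DCA iterates recover b exactly, whereas plain L1 shrinks it.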
The rest of this paper is organized as follows. Section
II reviews the related work. Section III presents our model
and corresponding algorithm. Section IV reports
experimental results. Section V offers conclusions.
II. RELATED WORKS
Given a data set $X = [x_1, x_2, \ldots, x_n]$, where each $x_i$ is a sample, the nuclear norm of the matrix $X$ is defined by $\|X\|_* = \sum_i \sigma_i$, which is the sum of the singular values of $X$. Besides, the Frobenius (L2) and L1 norms of a matrix $X$ are defined by $\|X\|_F = (\sum_{i,j} X_{ij}^2)^{1/2}$ and $\|X\|_1 = \sum_{i,j} |X_{ij}|$, respectively, where $X_{ij}$ denotes the $(i,j)$-th entry.
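The three matrix norms above can be checked numerically; this is a small NumPy sketch of ours, not code from the paper:

```python
import numpy as np

X = np.array([[3.0, 0.0],
              [4.0, 5.0]])

nuclear = np.linalg.svd(X, compute_uv=False).sum()  # sum of singular values
frobenius = np.sqrt((X ** 2).sum())                 # ||X||_F
l1 = np.abs(X).sum()                                # ||X||_1 (entrywise)

# The Frobenius norm equals the L2 norm of the singular values,
# so ||X||_F <= ||X||_* always holds.
print(nuclear, frobenius, l1)
```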
A. RPCA
The data X is usually corrupted. RPCA tries to
decompose X into two matrices D and E, where the
2017 4th IAPR Asian Conference on Pattern Recognition
2327-0985/17 $31.00 © 2017 IEEE
DOI 10.1109/ACPR.2017.8