feature samples' manifold structure is also considered.
Although graph-based NMF algorithms incorporate prior
knowledge of the manifold structure into the NMF framework,
overemphasizing local structure preservation can sometimes
degrade their performance. A structure preserving nonnega-
tive matrix factorization (SPNMF) is proposed in [10], in
which the intrinsic manifold is approximated with local
and distant graphs. Experimental results demonstrate that
SPNMF outperforms GNMF and its variants.
All the abovementioned NMF variants are based on the
premise that the input data are noise-free; this is impractical
for real-world data samples, which are often corrupted by
noise. To boost the robustness of NMF to outliers, an
$\ell_{2,1}$-norm based NMF reconstruction function is applied in [41, 42]. In
[43], the $\ell_{2,1}$-norm and $\ell_1$-norm are used to formulate the graph
embedding and data reconstruction functions, which endows
the algorithm with robustness to unreliable graphs and noisy
labels.
Among the existing methods, SPNMF is the most closely related
to our method and performs better on some specific datasets.
There are still some differences between our proposed algo-
rithm and SPNMF. Firstly, the Laplacian matrix of the distant
repulsion graph in SPNMF needs to be updated in each
iteration, which is time-consuming. Additionally, to learn a
better parts-based representation, the basis vectors of SPNMF
are required to be as orthogonal as possible. Finally, SPNMF is
sensitive to noisy data points.
3. The Proposed Method
Our goal is to perform matrix factorization on the hidden
semantic subspace. To this end, we impose local affinity
and distant repulsion structure preservation on the new
robust NMF framework. In this section, we will first explain
the structure preservation and robust data reconstruction
terms of RSPNMF, followed by its optimization algorithm.
Finally, the convergence of the proposed algorithm is proved.
3.1. Problem Formulation. Despite the diverse motivations of
various dimensionality reduction algorithms, most of them
can be explained within a graph embedding framework.
Let $G = \{X, W\}$ be an undirected weighted graph, where
the vertex set $X$ corresponds to a dataset and $W \in \mathbb{R}^{N \times N}_{+}$ is
an affinity matrix whose elements measure the similarity of
each pair of vertices. In graph embedding algorithms, the graph $G$
characterizes the prior knowledge of the geometric structure
of the data distribution.
If we use the $k_1$-nearest neighboring graph to character-
ize the local structure of the data distribution, the weight matrix
$W = [w_{ij}]_{N \times N}$ can be defined as follows:
$$
w_{ij} =
\begin{cases}
e^{-\|x_i - x_j\|^2 / \sigma^2}, & \text{if } x_i \in N_{k_1}(x_j) \text{ or } x_j \in N_{k_1}(x_i), \\
0, & \text{otherwise},
\end{cases}
\tag{4}
$$
where $\sigma$ is the bandwidth parameter and $N_{k_1}(x_i)$ denotes the
set of $k_1$ nearest neighbors of $x_i$.
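As a concrete illustration, the following minimal sketch (in Python with NumPy; it is not from the original paper, and the function name and interface are our own) shows how the heat-kernel weight matrix in (4) could be constructed:

```python
import numpy as np

def knn_affinity(X, k1, sigma):
    """Heat-kernel affinity over a symmetrized k1-NN graph, as in (4).

    X     : (N, d) array, one data sample per row
    k1    : number of nearest neighbors
    sigma : bandwidth parameter of the heat kernel
    """
    # Pairwise squared Euclidean distances.
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)          # a sample is not its own neighbor

    N = X.shape[0]
    W = np.zeros((N, N))
    nn = np.argsort(D2, axis=1)[:, :k1]   # k1 nearest neighbors per sample
    for i in range(N):
        for j in nn[i]:
            w = np.exp(-D2[i, j] / sigma ** 2)
            W[i, j] = W[j, i] = w         # the "or" condition symmetrizes W
    return W
```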
The local invariance assumption, which encourages neigh-
boring data pairs in the original space to remain close in the
low-dimensional embedding subspace, can be formulated as
$$
\min_{V} \frac{1}{2} \sum_{i,j=1}^{N} \|v_i - v_j\|^2 w_{ij},
\tag{5}
$$
where $v_i$ is the corresponding low-dimensional representa-
tion of $x_i$.
By defining the diagonal matrix $D$ and the Laplacian matrix $L$
of graph $G$ as
$$
L = D - W, \qquad D_{ii} = \sum_{j \neq i} w_{ij}, \quad \forall i,
\tag{6}
$$
we can rewrite (5) as
$$
\begin{aligned}
\min_{V} \frac{1}{2} \sum_{i,j=1}^{N} \|v_i - v_j\|^2 w_{ij}
&= \min_{V} \left( \sum_{i=1}^{N} v_i^{T} v_i D_{ii} - \sum_{i,j=1}^{N} v_i^{T} v_j w_{ij} \right) \\
&= \min_{V} \left( \operatorname{Tr}(V^{T} D V) - \operatorname{Tr}(V^{T} W V) \right) \\
&= \min_{V} \operatorname{Tr}(V^{T} L V),
\end{aligned}
\tag{7}
$$
where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix.
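The identity in (7) is easy to verify numerically. The following short check (a sketch we add for illustration; it assumes $V$ stores one representation $v_i$ per row) confirms that the pairwise form equals the trace form:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3
V = rng.random((N, K))                     # row i is v_i
W = rng.random((N, N))
W = (W + W.T) / 2                          # symmetric affinity matrix
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))                 # degree matrix, D_ii = sum_j w_ij
L = D - W                                  # graph Laplacian, as in (6)

lhs = 0.5 * sum(W[i, j] * np.sum((V[i] - V[j]) ** 2)
                for i in range(N) for j in range(N))
rhs = np.trace(V.T @ L @ V)
assert np.isclose(lhs, rhs)                # the identity in (7) holds
```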
Equation (7) essentially enforces the smoothness of the
dimensionality reduction process. It is based on two vital
assumptions: firstly, neighboring data samples in the high-
dimensional space are semantically similar; secondly,
preserving the affinity structure plays a great role in learning the
low-dimensional representation.
The local invariance assumption essentially exploits the
favorable relationship among similar data samples in the unsu-
pervised setting; however, it ignores the unfavorable relation-
ship between divergent data pairs. In this paper, we conjecture
that distant data pairs are always semantically different. A
new distant neighboring graph $G^{C} = \{X, W^{C}\}$, which describes
the repulsion relationship between dissimilar data
pairs, is also constructed. The corresponding weight matrix
$W^{C} = [w^{c}_{ij}]_{N \times N}$ is defined as follows:
$$
w^{c}_{ij} =
\begin{cases}
e^{-\|x_i - x_j\|^2 / \sigma^2}, & \text{if } x_i \in N_{k_2}(x_j) \text{ or } x_j \in N_{k_2}(x_i), \\
1, & \text{otherwise},
\end{cases}
\tag{8}
$$
where $N_{k_2}(x_i)$ denotes the $k_2$ remotest data samples of $x_i$ in
the given dataset.
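Mirroring the earlier $k_1$-NN construction, here is a minimal sketch of how $W^{C}$ in (8) could be built (again our own illustrative code, not from the paper):

```python
import numpy as np

def distant_repulsion_weights(X, k2, sigma):
    """Weight matrix W^C of the distant repulsion graph, as in (8).

    Pairs lying in each other's k2 *remotest* sets receive a heat-kernel
    weight (small when far apart); all remaining pairs receive weight 1.
    """
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T

    N = X.shape[0]
    Wc = np.ones((N, N))
    far = np.argsort(D2, axis=1)[:, -k2:]  # k2 remotest samples per point
    for i in range(N):
        for j in far[i]:
            w = np.exp(-D2[i, j] / sigma ** 2)
            Wc[i, j] = Wc[j, i] = w        # the "or" condition symmetrizes W^C
    return Wc
```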
From (8), we can find that the larger the distance
between $x_i$ and $x_j$, the smaller the value of $w^{c}_{ij}$. If we want
the corresponding low-dimensional representations $v_i$ and $v_j$