information was embedded in the coefficient matrix to yield a constrained matrix factorization in Liu, Wu, Cai, and Huang (2012).
We can easily observe that most of the above methods do not employ the discriminant information of the data, which plays a critical role in data analysis. However, extracting such discriminant information in an unsupervised setting, as in nonnegative matrix factorization, is very challenging. This central problem is the focus of this work, and the details are presented in the following.
3. The proposed DON approach
In this section, we introduce our approach, i.e., Discriminative Orthogonal Nonnegative matrix factorization (DON). First, we derive the objective function from manifold discriminant learning. Second, we present the optimization framework for solving the objective function and the convergence proof of the multiplicative update algorithm. Finally, we give a brief analysis of the computational complexity of the proposed method. Before presenting our approach, we begin with a short review of NMF.
3.1. A review of NMF
Given $n$ data points with $m$ features, we denote the input data by the matrix $X \in \mathbb{R}^{m \times n}_{+}$. Here, the symbol $\mathbb{R}_{+}$ denotes the set of real matrices with nonnegative entries. This collection of data is expected to be categorized or partitioned into $c$ groups. Nonnegative matrix factorization (NMF) (Lee & Seung, 1999) aims to find two nonnegative matrices $B \in \mathbb{R}^{m \times c}_{+}$ and $V \in \mathbb{R}^{n \times c}_{+}$ whose product approximates the original data matrix as closely as possible, i.e.,

$X \approx B V^{T}, \qquad (1)$

where $B$ is treated as a basis matrix and $V$ is a coefficient matrix. In practical applications, the reduced dimension $c$ is generally much lower than the rank of $X$, i.e., $c \ll m$, $c \ll n$. Each row $v_{i}$ of the coefficient matrix is regarded as the low-dimensional representation of the data point $x_{i}$ under the new basis.
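As an illustrative aside (not part of the original paper), the factorization in Eq. (1) can be sketched with the standard multiplicative updates for the Frobenius-norm objective commonly associated with Lee and Seung; the function name nmf, the iteration count, and the random initialization below are our own choices, assuming NumPy is available.

```python
import numpy as np

def nmf(X, c, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (m x n, nonnegative) by B V^T with B (m x c) and V (n x c),
    using multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    B = rng.random((m, c))
    V = rng.random((n, c))
    for _ in range(n_iter):
        # B <- B * (X V) / (B V^T V)
        B *= (X @ V) / (B @ V.T @ V + eps)
        # V <- V * (X^T B) / (V B^T B)
        V *= (X.T @ B) / (V @ B.T @ B + eps)
    return B, V

# Toy example: 100 points with 50 features, reduced to c = 5 components.
X = np.random.default_rng(1).random((50, 100))
B, V = nmf(X, c=5)
print(np.linalg.norm(X - B @ V.T) / np.linalg.norm(X))  # relative reconstruction error
```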
3.2. Manifold discriminant learning
In this work, our goal is to obtain a good data representation
that preserves both the local geometrical structure and the global
discriminating information. Therefore, we propose to exploit manifold discriminant learning to achieve this goal. In particular, we explicitly employ local manifold learning to reflect the intrinsic geometry of the data distribution, and global discriminant learning to equip the new data representation with discriminating power. In the following, we describe these two components in turn.
3.2.1. Local manifold learning
In many real-world applications, data points often reside on a much lower-dimensional manifold, which has attracted much attention to manifold learning in recent years (Belkin & Niyogi, 2001). Generally, nearby data points are likely to have similar features and to be categorized or partitioned into the same group. It is often assumed that nearby data points will also be close in the new data space, which is called the ``manifold assumption'' (Belkin et al., 2006). Therefore, if we introduce this idea into nonnegative matrix factorization, i.e., local manifold learning, it is expected that nearby data points within a small neighborhood and their corresponding representations share a common intrinsic geometrical structure.
To capture this intrinsic data structure, we take advantage of
the graph-based manifold method, which is also adopted in other
NMF variants (Cai et al., 2011a, 2011b). The input data points are modeled as a graph with $n$ vertices, where an edge is established if one data point is among the $k$ nearest neighbors of the other. In particular, the weight matrix $W$ should be designed to reflect the local relationship, and there are several weighting schemes, such as binary weights, Gaussian weights, and dot-product weights (Belkin et al., 2006). Here, we adopt the Gaussian weights, i.e.,
$W_{ij} = \begin{cases} e^{-\|x_{i} - x_{j}\|^{2} / \sigma^{2}}, & \text{if } x_{i} \in N_{k}(x_{j}) \text{ or } x_{j} \in N_{k}(x_{i}), \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$
where $\sigma$ is the bandwidth parameter and $N_{k}(x_{i})$ denotes the set of $k$ nearest neighbors of $x_{i}$. The graph Laplacian matrix is $L = D - W$, where $D_{ii} = \sum_{j} W_{ij}$, and $L$ is a discrete approximation to the Laplace–Beltrami operator on the manifold (Belkin & Niyogi, 2001).
Therefore, maximizing the smoothness of the new data representation essentially amounts to minimizing the gap between pairs of data points within a small neighborhood in the lower-dimensional data space (Cai et al., 2011b), i.e.,

$\min_{V} \; \operatorname{Tr}(V^{T} L V), \qquad (3)$

where $\operatorname{Tr}(\cdot)$ is the matrix trace operator. In this way, nearby data points are encouraged to be as close as possible in the new data space.
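For concreteness, a small Python sketch of the graph construction in Eq. (2) and the smoothness term in Eq. (3) could look like the following; it assumes NumPy, and the helper names gaussian_knn_graph and smoothness are illustrative rather than anything defined in the paper.

```python
import numpy as np

def gaussian_knn_graph(X, k=5, sigma=1.0):
    """Weight matrix of Eq. (2): W_ij = exp(-||x_i - x_j||^2 / sigma^2) if x_i is among
    the k nearest neighbors of x_j or vice versa, and 0 otherwise.
    X is m x n with one data point per column, matching the paper's convention."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                       # a point is not its own neighbor
    knn = np.argsort(D2, axis=1)[:, :k]                # k nearest neighbors of each point
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), knn.ravel()] = True
    mask |= mask.T                                     # symmetric "or" condition of Eq. (2)
    np.fill_diagonal(D2, 0.0)
    return np.where(mask, np.exp(-np.maximum(D2, 0.0) / sigma ** 2), 0.0)

def smoothness(W, V):
    """Tr(V^T L V) with L = D - W, the local-manifold term minimized in Eq. (3)."""
    L = np.diag(W.sum(axis=1)) - W
    return np.trace(V.T @ L @ V)
```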
3.2.2. Global discriminant learning
In order to endow the learned data representation with discriminating power, we attempt to discover the global discriminant information embedded in the data space. Previous studies in Ye et al. (2007) and Yang et al. (2011) have shown that the discriminant relation can be disclosed by introducing a scaled indicator matrix and using the between-class scatter matrix and the total scatter matrix of the data.
To achieve this, we follow Ye et al. (2007) and denote the indicator matrix by $Y \in \{0, 1\}^{n \times c}$. Then, its scaled indicator matrix is expressed by

$F = Y (Y^{T} Y)^{-1/2}. \qquad (4)$
Each column of F is given by
$F_{j} = \big[\, 0, \ldots, 0, \overbrace{1, \ldots, 1}^{n_{j}}, 0, \ldots, 0 \,\big]^{T} \big/ \sqrt{n_{j}}, \qquad (5)$

where $n_{j}$ is the sample size of the $j$-th group $C_{j}$. In our approach, we expect the learned data representation $V$ to characterize the structure of $F$ in order to capture the discriminating ability and yield promising learning results in the low-dimensional data space $\mathbb{R}^{n \times c}_{+}$. Consequently, we use a very small constant $\varepsilon$ to control the degree to which the data representation $V$ approaches $F$, i.e., $\|V - F\|^{2} \leqslant \varepsilon$.
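A brief sketch of how the scaled indicator matrix of Eqs. (4) and (5) could be built from a label vector is given below; it assumes NumPy, and the helper name scaled_indicator is hypothetical and introduced only for illustration.

```python
import numpy as np

def scaled_indicator(labels, c=None):
    """Scaled indicator matrix F of Eqs. (4)-(5): F = Y (Y^T Y)^{-1/2}, where Y is the
    one-hot (n x c) indicator of the group assignments. Column j of F equals
    1/sqrt(n_j) for the n_j samples in group C_j and 0 elsewhere, so F^T F = I_c."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    c = c if c is not None else labels.max() + 1
    Y = np.zeros((n, c))
    Y[np.arange(n), labels] = 1.0
    # Y^T Y is diagonal with the group sizes n_j, so its inverse square root is cheap.
    return Y / np.sqrt(Y.sum(axis=0))

labels = np.array([0, 0, 1, 2, 2, 2])
F = scaled_indicator(labels)
print(np.allclose(F.T @ F, np.eye(3)))   # True: columns of F are orthonormal
```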
Define a centering matrix $H = I_{n} - \frac{1}{n} \mathbf{1}_{n} \mathbf{1}_{n}^{T}$, where $\mathbf{1}_{n}$ is a column vector of all ones and $I_{n}$ is an identity matrix. Intuitively, we want to maximize the inter-distance among groups relative to the intra-distance within individual groups. Let $\hat{X} = X H$ be the centered data matrix; then the between-class scatter matrix $S_{b} = \hat{X} F F^{T} \hat{X}^{T}$ and the total scatter matrix $S_{t} = \hat{X} \hat{X}^{T}$ (Ye et al., 2007) should satisfy the formulation

$\max_{F} \; \operatorname{Tr}\big[(S_{t} + \mu I_{m})^{-1} S_{b}\big], \qquad (6)$
where the parameter $\mu = 10^{-10}$ is utilized to avoid the singularity problem. Since $\operatorname{Tr}(F^{T} H F) = c - 1$ is a constant, (6) is equivalent to (Yang et al., 2011)

$\min_{F} \; \operatorname{Tr}\big[F^{T}\big(H - \hat{X}^{T}(\hat{X}\hat{X}^{T} + \mu I_{m})^{-1}\hat{X}\big)F\big]. \qquad (7)$
As a result, we can obtain a data representation with discriminating power by minimizing the above objective.
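To make the criterion concrete, a possible NumPy sketch of evaluating the term in Eq. (7) for a given data matrix $X$ and indicator $F$ (or a relaxed representation $V$) is shown below; the default value of mu follows the text, while the function name and everything else are our own illustration rather than the authors' implementation.

```python
import numpy as np

def discriminant_objective(X, F, mu=1e-10):
    """Value of the criterion in Eq. (7): Tr[F^T (H - Xc^T (Xc Xc^T + mu I_m)^{-1} Xc) F],
    where H = I_n - (1/n) 1 1^T is the centering matrix and Xc = X H is the centered data.
    X is m x n (features x samples); F is the n x c scaled indicator (or a relaxation V)."""
    m, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n
    Xc = X @ H                                  # centered data matrix \hat{X}
    St = Xc @ Xc.T + mu * np.eye(m)             # regularized total scatter S_t + mu I_m
    M = H - Xc.T @ np.linalg.solve(St, Xc)      # H - \hat{X}^T (S_t + mu I_m)^{-1} \hat{X}
    return np.trace(F.T @ M @ F)
```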