to be predicted. Given the whole set of examples and partial label information, we hope to complete $Y_U$.
In the following, we first model multi-label learning as a matrix completion problem, then enhance it with manifold regularization to better exploit the intrinsic manifold structure of the data, and finally describe the optimization process with ADMM.
3.1. Matrix completion for multi-label learning
We first introduce an intermediate instance matrix $(X_L^0, X_U^0)$, such that the observed instance matrix $(X_L, X_U)$ is sampled from $(X_L^0, X_U^0)$ with i.i.d. Gaussian noise: $(X_L, X_U) = (X_L^0 + \epsilon, X_U^0 + \epsilon)$, where $\epsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$. Correspondingly, the intermediate label matrix $(Y_L^0, Y_U^0) \in \mathbb{R}^{t \times n}$ can be represented as a linear combination of the instance matrix under a weight matrix $W \in \mathbb{R}^{t \times d}$: $(Y_L^0, Y_U^0) = W(X_L^0, X_U^0)$.
The joint matrix is denoted by $Z = (Y_L^0, Y_U^0; X_L^0, X_U^0)$. Given the linear projection $W$, the ranks of these two matrices $(X_L^0, X_U^0)$ and $Z$ satisfy (Goldberg et al., 2010)

\[
\mathrm{rank}(Z) \le \mathrm{rank}\big((X_L^0, X_U^0)\big) + 1. \quad (1)
\]
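One way to see the bound: every row of $Y^0 = W X^0$ lies in the row space of $X^0$, and the ''+1'' accommodates the all-ones bias row enforced by the constraint $z_m = \mathbf{1}^\top$ introduced later in this subsection. A small numerical check, given as a sketch in numpy (the dimensions, the true rank $r$, and the variable names are illustrative only):

import numpy as np

rng = np.random.default_rng(0)
d, t, n, r = 20, 5, 100, 4                     # feature dim, label dim, samples, true rank

X0 = rng.standard_normal((d, r)) @ rng.standard_normal((r, n))   # low-rank noiseless features
W = rng.standard_normal((t, d))                                  # linear label map
Y0 = W @ X0                                                      # noiseless soft labels (t x n)

# Joint matrix Z = (Y0; X0), plus the all-ones row implied by the
# constraint z_m = 1^T; that row raises the rank by at most one.
Z = np.vstack([Y0, X0, np.ones((1, n))])

assert np.linalg.matrix_rank(Z) <= np.linalg.matrix_rank(X0) + 1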
Further, we construct an observed matrix $M$ partitioned into four blocks $M_x^L$, $M_x^U$, $M_y^L$, and $M_y^U$ as follows:

\[
M = \begin{pmatrix} Y_L & Y_U \\ X_L & X_U \end{pmatrix} = \begin{pmatrix} M_y^L & M_y^U \\ M_x^L & M_x^U \end{pmatrix}, \quad (2)
\]

where the blocks are $X_L = M_x^L$, $X_U = M_x^U$, $Y_L = M_y^L$, and $Y_U = M_y^U = 0$. Obviously, the matrix $M$ has the same size as $Z$.
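To make the block structure of Eq. (2) concrete, the sketch below (a numpy illustration; the function name and the convention of passing the three observed blocks separately are assumptions, not part of the original formulation) stacks the observed blocks and fills the unknown block $M_y^U$ with zeros:

import numpy as np

def build_observed_matrix(X_L, X_U, Y_L):
    """Assemble M as in Eq. (2) from the observed blocks.

    X_L: d x l labeled features, X_U: d x u unlabeled features,
    Y_L: t x l known labels in {-1, +1}.
    """
    t = Y_L.shape[0]                 # number of labels
    u = X_U.shape[1]                 # number of unlabeled instances
    Y_U = np.zeros((t, u))           # unobserved block M_y^U set to 0
    M_y = np.hstack([Y_L, Y_U])      # top block: labels (t x n)
    M_x = np.hstack([X_L, X_U])      # bottom block: features (d x n)
    return np.vstack([M_y, M_x])     # (t + d) x n, the same size as Z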
According to Eq. (1), the rank of $Z$ should be much smaller than its ambient dimensions. Specifically, the estimation of the unobserved part of $Z$ can be treated as a matrix completion problem, which relies on minimizing the rank of $Z$. However, it is very difficult to directly solve a rank minimization problem. Fortunately, rank minimization can be relaxed to minimization of the nuclear norm of $Z$, that is, $\|Z\|_* = \sum_k \sigma_k(Z)$, where $\sigma_k(Z)$ is the $k$th singular value of $Z$ (Candès & Recht, 2009; Candès & Tao, 2010).
Now, with the consideration of the noise on $(X_L^0, X_U^0)$ and $Y_L^0$, the optimization problem can be formulated as follows (Cabral et al., 2011; Goldberg et al., 2010):

\[
\begin{aligned}
\min_{Z}\;\; & \mu \|Z\|_* + \frac{1}{|\Omega_X|} \sum_{(i,j)\in\Omega_X} c_x(z_{t+i,j}, x_{ij}) + \frac{\lambda}{|\Omega_Y|} \sum_{(i,j)\in\Omega_Y} c_y(z_{ij}, y_{ij}) \\
\text{s.t.}\;\; & z_m = \mathbf{1}^\top,\; m = t + d + 1,
\end{aligned}
\]
where $\Omega_X$ denotes the index set of observed data entries of $(X_L, X_U)$, i.e., $1 \le i \le d$ and $1 \le j \le n$ for any $x_{ij}$, and $\Omega_Y$ denotes the index set of known label entries of $Y_L$. Note that in the transductive setting, since all the instances are observed, $\Omega_X$ actually refers to the indices of rows and columns of the whole data. Here $c_x(z_{t+i,j}, x_{ij})$ is the loss between elements of $(X_L, X_U)$ and $(X_L^0, X_U^0)$, and $c_y(z_{ij}, y_{ij})$ is the loss between the soft labels $Y_L^0$ and the actual labels $Y_L$. In this paper, we choose the squared loss for $c_x(\cdot)$, namely $c_x(z_{t+i,j}, x_{ij}) = \frac{1}{2}(z_{t+i,j} - x_{ij})^2$, as in Goldberg et al. (2010).
For the label noise (from continuous $z_{ij} \in \mathbb{R}$ to $y_{ij} \in \{-1, +1\}$), we choose $c_y(z_{ij}, y_{ij}) = \frac{1}{\alpha}\log(1 + \exp(-\alpha z_{ij} y_{ij}))$, as in Eq. (14), to model the loss (Cabral et al., 2011). This loss between soft and binary labels is slightly different from the definition in Goldberg et al. (2010). In contrast to Goldberg et al. (2010), we use the loss $c_y(\cdot)$ with an additional parameter $\alpha$ because it has been reported to perform better than the sigmoid function (Cabral et al., 2011). There are many possible ways to implement the general mapping from the predicted soft labels $Z_y$ to binary labels. We will state one of the possible solutions in Section 5.
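For concreteness, the following sketch evaluates the objective above for a candidate $Z$ with the squared loss $c_x$ and the $\alpha$-scaled logistic loss $c_y$ just described (a numpy illustration; the boolean-mask encoding of $\Omega_Y$, the default parameter values, and the omission of the bias row are simplifying assumptions, not the authors' implementation):

import numpy as np

def objective(Z, M, omega_y, t, mu=1.0, lam=1.0, alpha=1.0):
    """mu*||Z||_* + averaged feature loss + lambda * averaged label loss.

    Z, M    : (t + d) x n matrices (bias row omitted for brevity)
    omega_y : boolean t x n mask of the observed label entries Omega_Y;
              in the transductive setting every feature entry is observed.
    """
    Z_y, Z_x = Z[:t], Z[t:]
    M_y, M_x = M[:t], M[t:]

    nuclear = np.linalg.norm(Z, 'nuc')            # sum of singular values of Z

    # c_x: squared loss, averaged over all observed feature entries
    loss_x = np.mean(0.5 * (Z_x - M_x) ** 2)

    # c_y: (1/alpha) * log(1 + exp(-alpha * z * y)), averaged over Omega_Y
    z, y = Z_y[omega_y], M_y[omega_y]
    loss_y = np.mean(np.log1p(np.exp(-alpha * z * y))) / alpha

    return mu * nuclear + loss_x + lam * loss_y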
3.2. Manifold regularized matrix completion
Under the setting of semi-supervised learning, a natural assumption is that if two instances $x_i$ and $x_j$ are close in the intrinsic geometry of the data distribution, then the corresponding predicted labels $Z_y^i$ and $Z_y^j$ are also similar to each other. This assumption is usually referred to as the smoothness (or manifold) assumption (Belkin & Niyogi, 2001; Bengio, Delalleau, & Le Roux, 2006; Chapelle, Schölkopf, Zien, et al., 2006; Liu, Xu, Wu, & Wang, 2016).
Suppose the mapping from the instance $x_i$ to its label $Z_y^i$ can be denoted by a function $f(x_i) = Z_y^i$; then the smoothness of $f$ along the geodesics in the compact manifold of the data can be measured as $\|f\|_{\mathcal{M}}^2$.
In reality, the data manifold is usually unknown. A symmetric weight matrix $W$ was defined in Zhu, Ghahramani, Lafferty, et al. (2003),

\[
W_{ij} = \exp\!\left(-\frac{\sum_{m=1}^{d}(x_i^m - x_j^m)^2}{\sigma^2}\right),
\]

which can be used to approximate the smoothness measure $\|f\|_{\mathcal{M}}^2$. We further define $L = D - W$, where $D = \mathrm{diag}(d_i)$ is a diagonal matrix with $d_i = \sum_j W_{ij}$. $L \in \mathbb{R}^{n\times n}$ is called the graph Laplacian matrix (Chung, 1997).
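A minimal sketch of this graph construction in numpy, assuming a fully connected Gaussian-weight graph with a single global bandwidth $\sigma$ (practical implementations often sparsify $W$ with $k$-nearest neighbors):

import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Build W_ij = exp(-sum_m (x_i^m - x_j^m)^2 / sigma^2) and return L = D - W.

    X: d x n data matrix whose columns are instances.
    """
    sq_norms = np.sum(X ** 2, axis=0)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    W = np.exp(-np.maximum(sq_dists, 0.0) / sigma ** 2)
    np.fill_diagonal(W, 0.0)               # drop self-loops; L = D - W is unaffected
    D = np.diag(W.sum(axis=1))             # degrees d_i = sum_j W_ij
    return D - W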
Therefore, we hope the predictions for $Y_U^0$ remain consistent with those of their neighbors, which motivates the choice of the quadratic loss (Cai, He, Wu, & Han, 2008; Zhu, 2005; Zhu et al., 2003):

\[
\begin{aligned}
E(Z_y) &= \frac{1}{2}\sum_{i,j} W_{ij}\,\|f(x_i) - f(x_j)\|_2^2 \quad (3)\\
&= \frac{1}{2}\sum_{i,j} W_{ij}\,\|Z_y^i - Z_y^j\|_2^2\\
&= \sum_i d_i (Z_y^i)^2 - \sum_{i,j} W_{ij}\, Z_y^i Z_y^j = Z_y^\top L Z_y,
\end{aligned}
\]
where $\|\cdot\|_2$ denotes the $L_2$ norm. Suppose that $Z_y = \mathrm{argmin}_{Z_y} E(Z_y)$; then we find that $L Z_y = 0$. The property $L Z_y = (D - W)Z_y = 0$ is called the harmonic property; it suggests that the predicted label value of an instance is the average of its neighbors' values,

\[
Z_y^i = \frac{\sum_{j} W_{ij} Z_y^j}{d_i} = \frac{\sum_{j} W_{ij} Z_y^j}{\sum_j W_{ij}}, \quad (4)
\]

which satisfies the manifold assumption mentioned at the beginning of this section.
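The identities in Eqs. (3) and (4) can be checked numerically. The sketch below uses a single label dimension $f \in \mathbb{R}^n$ in place of $Z_y$ and a random symmetric weight matrix (both assumptions made only for illustration); it verifies that the pairwise penalty equals the Laplacian quadratic form and that $Lf = 0$ is exactly the neighbor-averaging property:

import numpy as np

rng = np.random.default_rng(1)
n = 50

A = rng.random((n, n))
W = np.triu(A, 1) + np.triu(A, 1).T          # symmetric weights, zero diagonal
d = W.sum(axis=1)                            # degrees d_i = sum_j W_ij
L = np.diag(d) - W                           # graph Laplacian

f = rng.standard_normal(n)                   # predicted values of one label over n instances

# Eq. (3): the pairwise smoothness penalty equals the quadratic form f^T L f
pairwise = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
assert np.isclose(pairwise, f @ L @ f)

# Eq. (4): f - (W f)/d equals (L f)/d elementwise, so L f = 0 holds exactly
# when each prediction is the weighted average of its neighbors
neighbor_avg = (W @ f) / d
assert np.allclose(f - neighbor_avg, (L @ f) / d)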
The matrix form of the energy term in Eq. (3) is $E(Z_y) = \mathrm{Tr}(Z_y^\top L Z_y)$. In this paper, we exploit the geometrical structure of the whole data to regularize the multi-label learning problem and incorporate it as a regularization term, as shown in Eq. (5):

\[
\begin{aligned}
\min_{Z} \quad & \|Z\|_* + \mathrm{Tr}(Z_y^\top L Z_y) \quad (5)\\
\text{s.t.} \quad & C_x(Z_x, M_x) = 0,\\
& C_y(Z_y^L, M_y^L) = 0.
\end{aligned}
\]
This term ensures that the label predictions for each class in $Z_y$ satisfy local consistency, where $Z_y = (Y_L^0, Y_U^0)$.
In Eq. (5), the manifold term appears to be a ''hard'' regularization added to the nuclear-norm loss, but in fact it is not. It is a soft constraint that can be controlled by a parameter introduced by ADMM during optimization. The details will be explained in Section 4.3.