condition (Lemma A.3), where we denote $f_x(z, D) \triangleq L_x(z, D) + h(z)$. One immediate observation is that $\lambda \geq \|D^T x\|_\infty \Leftrightarrow 0 \in \arg\min f_x(z, D)$. We assume $\lambda < \|D^T x\|_\infty$ and that the solution to (1) is unique. Sufficient conditions for uniqueness in the overcomplete case (i.e., $p > m$) are extensively studied in the literature [49, 50, 51]. Tibshirani showed that the solution is unique with probability one if the entries of $D$ are drawn from a continuous probability distribution [51] (Assumption 2.3). We argue that as long as the data $x \in \mathcal{X}$ are sampled from a continuous distribution, this assumption holds for the entire learning process. The assumption has been previously considered in analyses of unfolded sparse coding networks [48] and can be extended to $\ell_1$-regularized optimization problems [51, 52].
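To see the equivalence above, recall the lasso optimality condition: $0 \in \arg\min f_x(z, D)$ if and only if $0 \in \partial f_x(0, D)$. Using $\nabla_1 L_x(0, D) = -D^T x$ and the standard $\ell_1$ subdifferential $\partial h(0) = \lambda[-1, 1]^p$, a one-line derivation gives

$$0 \in -D^T x + \lambda[-1,1]^p \;\Longleftrightarrow\; D^T x \in \lambda[-1,1]^p \;\Longleftrightarrow\; \|D^T x\|_\infty \leq \lambda.$$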
Assumption 2.3 (Lasso uniqueness). The entries of the dictionary $D$ are continuously distributed. Hence, the minimizer of (1) is unique, i.e., $\{\hat{z}\} = \arg\min f_x(z, D)$ with probability one.
Lemma 2.1 states the fixed-point property of the encoder recursion [53]. Given the definitions of Lipschitz and Lipschitz-differentiable functions (Definitions A.1 and A.2), the loss $L$ and the function $h$ satisfy the following Lipschitz properties, which play an important role in our analysis.
Lemma 2.1 (Fixed-point property of lasso). Given Assumption 2.3, we have $0 \in \nabla_1 L(\hat{z}, D) + \partial h(\hat{z})$. The minimizer is a fixed point of the mapping, i.e., $\hat{z} = P_{\alpha h}(\hat{z} - \alpha \nabla_1 L(\hat{z}, D)) = \Phi(\hat{z})$ [53].
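For the least-squares loss and $h(z) = \lambda\|z\|_1$, the mapping $\Phi$ is a single ISTA step, with $P_{\alpha h}$ the entrywise soft-thresholding operator. The following is a minimal NumPy sketch of iterating $\Phi$ until the fixed point $\hat{z}$ is reached numerically; the problem sizes, step size $\alpha = 1/\|D\|_2^2$, and tolerance are illustrative assumptions, not values from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    # P_{tau * ||.||_1}: proximal operator of the l1 norm, applied entrywise.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(x, D, lam, alpha=None, t_max=5000, tol=1e-10):
    # Iterate z <- Phi(z) = P_{alpha h}(z - alpha * grad_1 L_x(z, D)),
    # where grad_1 L_x(z, D) = D^T (Dz - x) for the least-squares loss.
    if alpha is None:
        alpha = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L_1, ensures convergence
    z = np.zeros(D.shape[1])
    for _ in range(t_max):
        z_next = soft_threshold(z - alpha * D.T @ (D @ z - x), alpha * lam)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))   # overcomplete: p = 50 > m = 20
x = rng.standard_normal(20)
lam = 0.5
z_hat = ista(x, D, lam)

# Fixed-point property (Lemma 2.1): Phi(z_hat) == z_hat up to tolerance.
alpha = 1.0 / np.linalg.norm(D, 2) ** 2
residual = soft_threshold(z_hat - alpha * D.T @ (D @ z_hat - x), alpha * lam) - z_hat
print(np.max(np.abs(residual)))  # ~1e-10
```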
Lemma 2.2 (Lipschitz-differentiable least squares). Given $L_x(z, D) = \frac{1}{2}\|x - Dz\|_2^2$, $D$, and Assumption 2.2, the loss is Lipschitz differentiable. Let $L_1$ and $L_2$ denote the Lipschitz constants of the first derivatives $\nabla_1 L_x(z, D)$ and $\nabla_2 L_x(z, D)$, and $L_{11}$ and $L_{21}$ the Lipschitz constants of the second derivatives $\nabla^2_{11} L_x(z, D)$ and $\nabla^2_{21} L_x(z, D)$, all w.r.t. $z$. Let $\nabla_1 L_x(z, D)$ be $L_D$-Lipschitz w.r.t. $D$.
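For this least-squares loss, the derivatives have closed forms from which candidate constants can be read off; a sketch (the paper's exact constants may additionally absorb the bounds from Assumption 2.2):

$$\nabla_1 L_x(z, D) = D^T(Dz - x), \qquad \nabla_2 L_x(z, D) = (Dz - x)z^T, \qquad \nabla^2_{11} L_x(z, D) = D^T D.$$

For instance, $\|\nabla_1 L_x(z, D) - \nabla_1 L_x(z', D)\|_2 = \|D^T D(z - z')\|_2 \leq \|D\|_2^2 \|z - z'\|_2$, so one may take $L_1 = \|D\|_2^2$; the remaining constants follow similarly once $x$, $z$, and $D$ are bounded.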
Lemma 2.3 (Lipschitz proximal). Given $h(z) = \lambda\|z\|_1$, its proximal operator has a bounded subderivative, i.e., $\|\partial P_h(z)\|_2 \leq c_{\mathrm{prox}}$.
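For the $\ell_1$ penalty, $P_h$ is the entrywise soft-thresholding operator, so the bound is explicit (here $c_{\mathrm{prox}} = 1$ suffices; the paper's constant may be defined more generally):

$$P_h(z)_j = \mathrm{sign}(z_j)\max(|z_j| - \lambda, 0), \qquad \partial P_h(z) = \mathrm{diag}(d), \quad d_j \in \begin{cases}\{1\} & |z_j| > \lambda \\ [0,1] & |z_j| = \lambda \\ \{0\} & |z_j| < \lambda,\end{cases}$$

so that $\|\partial P_h(z)\|_2 \leq 1$.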
3 Main Results
The gradients defined in PUDLE (Algorithm 2) can be compared against the local direction at each update of classical alternating minimization (Algorithm 1). Assuming there are infinite samples, i.e.,

$$\text{Best local direction:}\qquad \hat{g} \triangleq \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \nabla_2 L_{x^i}(\hat{z}^i, D) = \mathbb{E}_{x\in\mathcal{X}}\big[\nabla_2 L_x(\hat{z}, D)\big] \qquad (5)$$

where $\hat{z} = \arg\min_{z\in\mathbb{R}^p} L_x(z, D) + h(z)$. Additionally, to assess the estimators for model recovery, and hence dictionary learning, we compare them against gradients that point towards $D^*$ and $\tilde{D}$, namely

$$\text{Desired gradient for } D^*:\qquad g^* \triangleq \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \nabla_2 L_{x^i}(z^{i*}, D) = \mathbb{E}_{x\in\mathcal{X}}\big[\nabla_2 L_x(z^*, D)\big]$$
$$\text{Desired gradient for } \tilde{D}:\qquad \tilde{g} \triangleq \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \nabla_2 L_{x^i}(\tilde{z}^i, D) = \mathbb{E}_{x\in\mathcal{X}}\big[\nabla_2 L_x(\tilde{z}, D)\big] \qquad (6)$$

where $z^* = \arg\min_{z\in\mathbb{R}^p} L_x(z, D^*) + h(z)$. To see why the above are desired directions, we highlight that $(z^*, D^*)$ is a critical point of the expected risk. Hence, given the current $D$, to reach the critical point by gradient descent, we move towards the direction minimizing $\mathbb{E}_{x\in\mathcal{X}}[L_x(z^*, D)]$. Similarly, $(\tilde{z}, \tilde{D})$ is a critical point of the loss $L$, which also reaches zero for data following the model (3). Hence, to reach $\tilde{D} \in \arg\min_{D\in\mathcal{D}} \mathbb{E}_{x\in\mathcal{X}}[L_x(\tilde{z}, D)]$, we move towards the direction minimizing the loss in expectation. Given these directions, we analyze the errors of the gradients $g_t^{\mathrm{dec}}$, $g_t^{\mathrm{ae\text{-}lasso}}$, and $g_t^{\mathrm{ae\text{-}ls}}$, assuming infinite samples. In this regard, we first study the forward pass.
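To make (5) concrete: with finitely many samples, $\hat{g}$ is approximated by the empirical average of the dictionary gradient evaluated at exact lasso solutions. A minimal sketch, reusing the ista solver from the sketch above and the closed form $\nabla_2 L_x(z, D) = (Dz - x)z^T$; the batch size is an illustrative assumption:

```python
def empirical_g_hat(X, D, lam):
    # Finite-n approximation of (5): average nabla_2 L_{x^i}(z_hat^i, D),
    # where each z_hat^i is the exact lasso solution for sample x^i.
    g = np.zeros_like(D)
    for x_i in X:
        z_hat_i = ista(x_i, D, lam)
        g += np.outer(D @ z_hat_i - x_i, z_hat_i)  # nabla_2 L_x(z_hat, D)
    return g / len(X)

X = rng.standard_normal((100, 20))      # n = 100 samples of dimension m = 20
g_hat = empirical_g_hat(X, D, lam=0.5)  # -> E_x[nabla_2 L_x(z_hat, D)] as n grows
```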
3.1 Forward pass
We show two convergence results in the forward pass, one for the code $z_t$ and another for its Jacobian, i.e.,

Definition 3.1 (Code Jacobian). Given $D$, the Jacobian of $z_t$ is defined as $J_t \triangleq \frac{\partial z_t}{\partial D}$.
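A minimal sketch of Definition 3.1: computing $J_t$ for an encoder of $t$ unfolded ISTA iterations by central finite differences over the entries of $D$ (an illustrative stand-in for automatic differentiation through the forward pass; it reuses soft_threshold, x, and D from the sketches above, and the values of t and eps are assumptions):

```python
def unfold(x, D, lam, t, alpha):
    # z_t: the encoder's forward pass, t unfolded ISTA iterations.
    z = np.zeros(D.shape[1])
    for _ in range(t):
        z = soft_threshold(z - alpha * D.T @ (D @ z - x), alpha * lam)
    return z

def code_jacobian(x, D, lam, t, alpha, eps=1e-6):
    # J_t = dz_t/dD, flattened to shape (p, m*p), by central differences.
    m, p = D.shape
    J = np.zeros((p, m * p))
    for k in range(m * p):
        E = np.zeros_like(D)
        E.flat[k] = eps
        J[:, k] = (unfold(x, D + E, lam, t, alpha)
                   - unfold(x, D - E, lam, t, alpha)) / (2 * eps)
    return J

alpha = 1.0 / np.linalg.norm(D, 2) ** 2  # step size fixed at the unperturbed D
J_50 = code_jacobian(x, D, lam=0.5, t=50, alpha=alpha)
# As t grows, z_t -> z_hat and J_t -> J_hat, at the rates bounded in this section.
```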
The forward-pass analysis gives upper bounds on the error between $z_t$ and $\hat{z}$ and the error between $J_t$ and $\hat{J}$ as a function of the number of unfolded iterations $t$. We will require these errors in Section 3.2, where we analyze the gradient estimation errors. Similar to [28], the error associated with $g_t^{\mathrm{dec}}$ depends on the convergence of the code. Unlike $g_t^{\mathrm{dec}}$, the convergence of backpropagation with the gradient estimates $g_t^{\mathrm{ae\text{-}lasso}}$ and $g_t^{\mathrm{ae\text{-}ls}}$ relies on the convergence properties of both the code and the Jacobian [31]. Our forward-pass theory builds on studies by Gilbert on the convergence of variables and their derivatives in an iterative process governed by a smooth operator [54]. Moreover, Hale et al. studied the convergence of fixed-point iterations for $\ell_1$-regularized optimization problems [55]. In Proposition 3.1, we re-state a result from [55] on support selection.