Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization


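"The block-coordinate gradient descent method is an algorithm for nonsmooth separable optimization problems with linear equality constraints. The coordinate block is chosen by the Gauss-Southwell-q rule, which guarantees sufficient predicted descent. The method is shown to converge globally to first-order stationarity and, under a local error bound assumption, to converge at a linear rate. If f is convex with Lipschitz continuous gradient, the method finds an ϵ-optimal solution within O(n^2/ϵ) iterations. If P is separable, the Gauss-Southwell-q rule can be implemented in O(n) operations when m = 1 and in O(n^2) operations when m > 1. This complexity analysis applies in particular to support vector machine training, where f is convex quadratic, P is separable, and m = 1."

The block-coordinate gradient descent (BCGD) method targets optimization problems with linear constraints. The task is to find a vector of n real variables that minimizes the weighted sum of a smooth function f and a convex function P, subject to m linear equality constraints.

The choice of coordinate block is the key ingredient. BCGD uses the Gauss-Southwell-q rule, which is based on predicted descent: at each iteration it selects a block of coordinates whose predicted descent is a sufficient fraction of the descent predicted when all coordinates are updated. This choice guarantees global convergence to a first-order stationary point, that is, a point at which no feasible direction yields first-order descent.

Under a local error bound assumption, BCGD also converges at a linear rate. If f is convex with Lipschitz continuous gradient, the method finds an approximate solution within O(n^2/ϵ) iterations, where ϵ is the target accuracy, so the iteration bound grows in proportion to 1/ϵ as the accuracy requirement tightens.

When P is separable, meaning that it is a sum of functions of the individual coordinates, the rule can be implemented efficiently. With a single linear constraint (m = 1) the block can be chosen in O(n) operations. With more than one constraint (m > 1) the cost rises to O(n^2) operations.

Support vector machine (SVM) training fits this setting. There f is a convex quadratic, P is separable, and m = 1, because margin maximization leads to a quadratic program with an equality constraint. BCGD is therefore a practical option for SVM training on large data sets.

In summary, the block-coordinate gradient descent method combines convergence guarantees with low per-iteration cost on linearly constrained nonsmooth separable problems, and these properties carry over to applications such as SVM training.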
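For reference, the problem class described above can be written in symbols as follows. This is only a sketch of the notation: the names $A$ and $b$ for the constraint data, $c > 0$ for the weight on $P$, and $P_j$ for the separable pieces are assumptions made here for illustration and are not given in the description.

```latex
\min_{x \in \mathbb{R}^n} \; F_c(x) = f(x) + c\,P(x)
\quad \text{subject to} \quad Ax = b,
\qquad A \in \mathbb{R}^{m \times n},\; b \in \mathbb{R}^m,
% separable case:
\qquad P(x) = \sum_{j=1}^{n} P_j(x_j).
```

Here $f$ is smooth, $P$ is convex but possibly nonsmooth, and $m$ is the number of linear equality constraints.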

J Optim Theory Appl (2009) 140: 513–535

Various stepsize rules for smooth optimization [8, 9, 11] can be adapted to our setting. The following Armijo rule, used in [6, 29], is simple, requires only function evaluations, and seems effective in theory and practice.

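Armijo Rule

Choose $\alpha^k_{\rm init} > 0$ and let $\alpha^k$ be the largest element of $\{\alpha^k_{\rm init}\beta^j\}_{j=0,1,\ldots}$ satisfying

$$F_c(x^k + \alpha^k d^k) \le F_c(x^k) + \alpha^k \sigma \Delta^k, \tag{7}$$

where $0 < \beta < 1$, $0 < \sigma < 1$, $0 \le \gamma < 1$, and

$$\Delta^k \overset{\rm def}{=} \nabla f(x^k)^T d^k + \gamma\,(d^k)^T H^k d^k + cP(x^k + d^k) - cP(x^k). \tag{8}$$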

Since $H^k \succ 0$ and $0 \le \gamma < 1$, we see from Lemma 2.1 that

$$F_c(x^k + \alpha d^k) \le F_c(x^k) + \alpha \Delta^k + o(\alpha), \quad \forall \alpha \in (0, 1],$$

and $\Delta^k \le (\gamma - 1)\,(d^k)^T H^k d^k < 0$ whenever $d^k \ne 0$. Since $0 < \sigma < 1$, this shows that $\alpha^k$ given by the Armijo rule is well defined and positive. By choosing $\alpha^k_{\rm init}$ based on the previous stepsize $\alpha^{k-1}$, the number of function evaluations can be kept small in practice. Notice that $\Delta^k$ increases with $\gamma$, so larger stepsizes will be accepted if we choose either $\sigma$ near 0 or $\gamma$ near 1.
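The backtracking search in the Armijo rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the restriction to a diagonal $H^k$ (passed as `H_diag`), the default parameter values, and the cap `max_backtracks` are all assumptions made here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def axpy(alpha: float, d: Vector, x: Vector) -> List[float]:
    """Return x + alpha * d as a new list."""
    return [xi + alpha * di for xi, di in zip(x, d)]


def predicted_decrease(grad: Vector, d: Vector, H_diag: Vector, x: Vector,
                       P: Callable[[Vector], float], c: float,
                       gamma: float = 0.0) -> float:
    """Delta^k as in (8), assuming H^k is diagonal with diagonal H_diag."""
    linear = sum(g * di for g, di in zip(grad, d))
    curvature = sum(h * di * di for h, di in zip(H_diag, d))
    return linear + gamma * curvature + c * (P(axpy(1.0, d, x)) - P(x))


def armijo_stepsize(F: Callable[[Vector], float], x: Vector, d: Vector,
                    delta: float, alpha_init: float = 1.0, beta: float = 0.5,
                    sigma: float = 0.1, max_backtracks: int = 60) -> float:
    """Largest alpha in {alpha_init * beta**j} satisfying condition (7)."""
    if delta >= 0.0:
        raise ValueError("Delta^k must be negative for a nonzero direction")
    F_x = F(x)
    alpha = alpha_init
    for _ in range(max_backtracks):
        # Accept alpha as soon as the sufficient-decrease test (7) holds.
        if F(axpy(alpha, d, x)) <= F_x + alpha * sigma * delta:
            return alpha
        alpha *= beta
    raise RuntimeError("no acceptable stepsize within max_backtracks trials")


# Toy usage (hypothetical data): f(x) = 2 x^2, no nonsmooth term (c = 0),
# and H = 1 underestimates the curvature, so the unit step is rejected.
f = lambda x: 2.0 * sum(xi * xi for xi in x)
P = lambda x: sum(abs(xi) for xi in x)
c = 0.0
F = lambda x: f(x) + c * P(x)
x0, grad0, H0, d0 = [1.0], [4.0], [1.0], [-4.0]
delta0 = predicted_decrease(grad0, d0, H0, x0, P, c)
alpha0 = armijo_stepsize(F, x0, d0, delta0)
```

Only values of $F_c$ are evaluated inside the loop, which matches the remark that the rule requires only function evaluations. Passing the previous stepsize as `alpha_init` is one way to keep the number of trials small.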

For convergence, the index subset $J^k$ must be chosen judiciously. We will choose $J^k$ according to the Gauss-Southwell-q rule, which was introduced in [6] for the case of $m = 0$ and was shown in [6], [29] to be effective in theory and practice. Specifically, let

$$q_H(x;J) \overset{\rm def}{=} \left( \nabla f(x)^T d + \tfrac{1}{2}\, d^T H d + cP(x + d) - cP(x) \right)_{d = d_H(x;J)}, \tag{9}$$

which is the predicted descent when $x$ is moved along the direction $d_H(x;J)$. The Gauss-Southwell-q rule chooses the index subset $J^k$ to achieve sufficient predicted descent, i.e.,

$$q_{D^k}(x^k;J^k) \le \upsilon\, q_{D^k}(x^k;N), \tag{10}$$

where $D^k \succ 0$ (typically diagonal) and $0 < \upsilon \le 1$. In fact, it suffices that $D^k \succeq 0$ for our analysis. We will discuss in Sect. 6 how to efficiently implement this rule when $P$ is separable and piecewise-linear/quadratic.
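To make the rule concrete, here is a small Python sketch for the simplest setting only: the unconstrained case $m = 0$ mentioned above, with $P(x) = \|x\|_1$ and a diagonal $D^k$. These choices are assumptions made for illustration. With linear constraints ($m \ge 1$) the coordinates of the direction are coupled and the per-coordinate formulas below no longer apply. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Minimizer of 0.5 * (u - z)**2 + t * |u| over u."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descents(x: Sequence[float], grad: Sequence[float],
                        D_diag: Sequence[float], c: float
                        ) -> Tuple[List[float], List[float]]:
    """Per-coordinate direction d_j and predicted descent q_j (each q_j <= 0).

    Valid for m = 0, P = l1 norm, and diagonal D: the subproblem then
    separates, so q_D(x; J) is the sum of q_j over j in J.
    """
    d, q = [], []
    for xj, gj, Dj in zip(x, grad, D_diag):
        dj = soft_threshold(xj - gj / Dj, c / Dj) - xj
        qj = gj * dj + 0.5 * Dj * dj * dj + c * (abs(xj + dj) - abs(xj))
        d.append(dj)
        q.append(qj)
    return d, q


def gauss_southwell_q(q: Sequence[float], upsilon: float = 0.5
                      ) -> Tuple[List[int], float]:
    """Greedily pick J so that sum_{j in J} q_j <= upsilon * sum_j q_j."""
    q_full = sum(q)
    order = sorted(range(len(q)), key=lambda j: q[j])  # most negative first
    J: List[int] = []
    q_J = 0.0
    for j in order:
        J.append(j)
        q_J += q[j]
        if q_J <= upsilon * q_full:  # condition (10) in this special case
            break
    return sorted(J), q_J


# Toy usage (hypothetical data).
d, q = coordinate_descents([0.0, 0.0, 0.0], [-3.0, 0.5, -2.0],
                           [1.0, 1.0, 1.0], c=1.0)
J_small, _ = gauss_southwell_q(q, upsilon=0.5)
J_large, _ = gauss_southwell_q(q, upsilon=0.9)
```

A smaller $\upsilon$ allows a smaller block $J^k$, while $\upsilon$ near 1 forces the block to capture almost all of the predicted descent available over the full index set $N$.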

3 Properties of Search Direction

In this section we derive various properties of the search direction $d_H(x;J)$ and the corresponding predicted descent $q_H(x;J)$. These properties will be used in later sections to analyze the convergence rate and the complexity of the CGD method.

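Formally, we say that a feasible point $x$ is a stationary point of $F_c$ if $x \in \operatorname{dom} F_c$ and $F_c'(x;d) \ge 0$ for all feasible directions $d$, i.e., all $d$ in the null space of the constraint matrix. The following lemma gives an alternative characterization of stationarity.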
