Understanding GBDT: An Analysis of Jerome Friedman's Gradient Boosting Machine

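The original GBDT paper, written by Jerome H. Friedman and published in The Annals of Statistics, Vol. 29, No. 5 (2001), examines the theory and application of the Gradient Boosting Machine (GBM).

Gradient Boosting Decision Trees (GBDT) is a widely used machine learning algorithm. It is an ensemble method: it builds a sequence of weak predictors and combines them into a single strong predictor. The original paper, "Greedy Function Approximation: A Gradient Boosting Machine", lays out the basic idea and mechanics of the method.

The core of GBDT is gradient boosting, an iterative procedure. At each iteration a new decision tree is fit to the negative gradient of the loss evaluated at the current model; for squared-error loss this negative gradient is simply the residual. Each tree therefore corrects the errors left by the model so far, and the scaled tree predictions are summed to form the final model.

Beyond the basic framework, the paper discusses the choice of base learner, loss function, and optimization strategy. The loss function strongly affects model behavior; options include squared error, absolute error, and the binomial log-likelihood used for two-class logistic regression. During training, each update moves along the fitted negative-gradient direction with a step size chosen by a line search. This is a greedy, stagewise strategy: terms added in earlier iterations are not readjusted.

The paper also covers regularization to limit overfitting (for example, shrinking each update), the effect of tree size on the trade-off between generalization and training cost, and the extension to classification problems. It further compares gradient tree boosting with related procedures in simulation studies.

GBDT is practical and efficient, and it is applied to regression, classification, feature selection, computer vision, and natural language processing. It is a frequent choice in data science competitions, and libraries such as XGBoost and LightGBM provide optimized implementations that improve training speed and accuracy.

"Greedy Function Approximation: A Gradient Boosting Machine" remains a key resource for understanding the GBDT algorithm and the theory behind it, and is a valuable reference for researchers and practitioners studying ensemble learning and decision tree models.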
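The iterative procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's algorithm as published: it assumes squared-error loss (so the negative gradient equals the residual), a single predictor, two-leaf trees (stumps), and a fixed learning rate in place of a line search. The names `fit_stump`, `stump_predict`, `boost`, and `predict` are hypothetical.

```python
# Minimal gradient boosting sketch for squared-error loss (illustrative only).
# All function names here are hypothetical, not taken from the paper.

def fit_stump(xs, targets):
    """Least-squares fit of a two-leaf tree on one predictor.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate splits of the form x <= t
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        bl = sum(left) / len(left)
        br = sum(right) / len(right)
        sse = sum((r - bl) ** 2 for r in left) + sum((r - br) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, bl, br)
    _, t, bl, br = best
    return t, bl, br


def stump_predict(stump, x):
    t, bl, br = stump
    return bl if x <= t else br


def boost(xs, ys, n_rounds=50, learning_rate=0.5):
    f0 = sum(ys) / len(ys)  # initial constant model
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of (y - F)^2 / 2 with respect to F is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return f0, stumps, learning_rate


def predict(model, x):
    f0, stumps, lr = model
    return f0 + lr * sum(stump_predict(s, x) for s in stumps)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 0.9, 1.0, 3.0, 3.2, 2.9]
model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)  # error of constant model
mse = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(baseline, mse)
```

On this toy data the training error of the boosted model falls well below that of the constant model, since each round removes part of the remaining residual.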

GREEDY FUNCTION APPROXIMATION 1195

This implies that h(x; a) is fit (by least-squares) to the sign of the current

residuals in line 4 of Algorithm 1. The line search (line 5) becomes

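$$
\begin{aligned}
\rho_m &= \arg\min_{\rho} \sum_{i=1}^{N} \left| y_i - F_{m-1}(x_i) - \rho\, h(x_i; a_m) \right| \\
&= \arg\min_{\rho} \sum_{i=1}^{N} \left| h(x_i; a_m) \right| \cdot \left| \frac{y_i - F_{m-1}(x_i)}{h(x_i; a_m)} - \rho \right| \\
&= \operatorname{median}_W \left\{ \frac{y_i - F_{m-1}(x_i)}{h(x_i; a_m)} \right\}_1^N, \qquad w_i = \left| h(x_i; a_m) \right|.
\end{aligned}
\tag{14}
$$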

Here $\operatorname{median}_W\{\cdot\}$ is the weighted median with weights $w_i$. Inserting these

results [(13), (14)] into Algorithm 1 yields an algorithm for least absolute

deviation boosting, using any base learner h(x; a).
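The weighted-median line search in (14) can be illustrated with a short Python sketch. The function names `weighted_median`, `lad_line_search`, and `lad_loss` are hypothetical, and the tie-breaking convention (returning the smallest value at which the cumulative weight reaches half the total) is an assumption; any value in the minimizing interval would serve. Observations with $h(x_i; a_m) = 0$ are skipped because their loss terms do not depend on $\rho$.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    raise ValueError("empty input")


def lad_line_search(y, f_prev, h):
    """rho_m as in (14): weighted median of residual / h with weights |h|."""
    ratios, weights = [], []
    for yi, fi, hi in zip(y, f_prev, h):
        if hi != 0.0:  # terms with h = 0 are constant in rho
            ratios.append((yi - fi) / hi)
            weights.append(abs(hi))
    return weighted_median(ratios, weights)


def lad_loss(rho, y, f_prev, h):
    """Sum of absolute deviations after stepping rho along h."""
    return sum(abs(yi - fi - rho * hi) for yi, fi, hi in zip(y, f_prev, h))


# Toy values chosen for illustration only.
y = [3.0, 1.0, 4.0, 10.0]
f_prev = [1.0, 1.5, 2.0, 2.0]   # current model F_{m-1}(x_i)
h = [1.0, -0.5, 1.0, 2.0]       # base learner output h(x_i; a_m)

rho = lad_line_search(y, f_prev, h)
print(rho)  # 2.0
```

For these values the ratios are 2, 1, 2, 4 with weights 1, 0.5, 1, 2, so the cumulative weight first reaches half of 4.5 at the value 2, and a direct evaluation of the absolute-deviation loss confirms that nearby step sizes do no better.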

4.3. Regression trees. Here we consider the special case where each base

learner is a J-terminal node regression tree [Breiman, Friedman, Olshen

and Stone (1983)]. Each regression tree model itself has the additive form

$$
h(x; \{b_j, R_j\}_1^J) = \sum_{j=1}^{J} b_j\, 1(x \in R_j).
\tag{15}
$$

Here $\{R_j\}_1^J$ are disjoint regions that collectively cover the space of all joint

values of the predictor variables x. These regions are represented by the ter-

minal nodes of the corresponding tree. The indicator function 1(.) has the

value 1 if its argument is true, and zero otherwise. The "parameters" of this

base learner (15) are the coefficients $\{b_j\}_1^J$, and the quantities that define the

boundaries of the regions $\{R_j\}_1^J$. These are the splitting variables and the

values of those variables that represent the splits at the nonterminal nodes of

the tree. Because the regions are disjoint, (15) is equivalent to the prediction

rule: if $x \in R_j$ then $h(x) = b_j$.
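The equivalence between the additive form (15) and the prediction rule can be checked with a small Python sketch. It assumes a single predictor and represents each region as a half-open interval; the interval boundaries, the coefficients, and the names `indicator`, `tree_additive`, and `tree_rule` are all made up for illustration.

```python
# Three disjoint regions that together cover the real line (hypothetical values).
regions = [(float("-inf"), 2.0), (2.0, 5.0), (5.0, float("inf"))]
coeffs = [1.5, -0.3, 4.0]  # one coefficient b_j per region R_j


def indicator(x, region):
    """1 if x lies in the half-open interval [lo, hi), else 0."""
    lo, hi = region
    return 1 if lo <= x < hi else 0


def tree_additive(x):
    """Additive form: sum over j of b_j * 1(x in R_j)."""
    return sum(b * indicator(x, r) for b, r in zip(coeffs, regions))


def tree_rule(x):
    """Prediction rule: return b_j for the one region that contains x."""
    for b, r in zip(coeffs, regions):
        if indicator(x, r):
            return b
    raise ValueError("regions do not cover x")


for x in [-1.0, 2.0, 4.9, 5.0, 100.0]:
    print(x, tree_additive(x), tree_rule(x))
```

Because the regions are disjoint, exactly one indicator is nonzero for any input, so the sum collapses to a single coefficient and both functions agree.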

For a regression tree, the update at line 6 of Algorithm 1 becomes

$$
F_m(x) = F_{m-1}(x) + \rho_m \sum_{j=1}^{J} b_{jm}\, 1(x \in R_{jm}).
\tag{16}
$$

Here $\{R_{jm}\}_1^J$ are the regions defined by the terminal nodes of the tree at

the $m$th iteration. They are constructed to predict the pseudoresponses $\{\tilde{y}_i\}_1^N$

(line 3) by least-squares (line 4). The $\{b_{jm}\}$ are the corresponding least-squares

coefficients,

$$
b_{jm} = \operatorname{ave}_{x_i \in R_{jm}} \tilde{y}_i.
$$

The scaling factor $\rho_m$ is the solution to the "line search" at line 5.
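One update of the form (16) can be traced numerically in Python. The sketch below makes several assumptions that go beyond the passage: the loss is squared error (so the pseudoresponses are the residuals and the line search has a closed form), and the terminal-node assignment `leaf_ids` is given rather than learned. The names `leaf_means` and `ls_line_search` are hypothetical.

```python
from collections import defaultdict


def leaf_means(leaf_ids, pseudo):
    """b_jm: average pseudoresponse over the observations in each terminal node."""
    sums, counts = defaultdict(float), defaultdict(int)
    for j, r in zip(leaf_ids, pseudo):
        sums[j] += r
        counts[j] += 1
    return {j: sums[j] / counts[j] for j in sums}


def ls_line_search(y, f_prev, h):
    """rho minimizing sum (y_i - F_{m-1}(x_i) - rho * h_i)^2.

    Closed form valid for squared-error loss only; assumes h is not all zero.
    """
    num = sum((yi - fi) * hi for yi, fi, hi in zip(y, f_prev, h))
    den = sum(hi * hi for hi in h)
    return num / den


# Toy values chosen for illustration only.
y = [1.0, 2.0, 6.0, 8.0]
f_prev = [4.0, 4.0, 4.0, 4.0]   # current model F_{m-1}(x_i)
leaf_ids = [0, 0, 1, 1]         # assumed terminal node of each observation

pseudo = [yi - fi for yi, fi in zip(y, f_prev)]   # residuals under squared error
b = leaf_means(leaf_ids, pseudo)                  # {0: -2.5, 1: 3.0}
h = [b[j] for j in leaf_ids]                      # tree output at each x_i
rho = ls_line_search(y, f_prev, h)
f_new = [fi + rho * hi for fi, hi in zip(f_prev, h)]
print(b, rho, f_new)
```

Under squared-error loss the line search returns a step of 1 here, because the leaf averages are already the least-squares fit to the residuals. For other losses, such as absolute deviation, the step generally differs from 1 and has to be computed from the corresponding line search.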
