Stacked Regression: A Method for Improving Prediction Accuracy

685 KB PDF, updated 2024-07-18

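"Technical details of stacked regression, the stacking method used in Kaggle competitions." In machine learning, stacked regression is a technique for improving prediction accuracy, and it is widely used in data competitions such as Kaggle. It forms linear combinations of different predictors, such as regression trees, linear subset regressions, and ridge regressions, to build a stronger predictive model. The idea of stacking originated with Wolpert (1992).

1. Basic principle of stacked regression

The core of stacked regression is to use cross-validation data and least squares under non-negativity constraints to determine the weights given to the different predictors. First, the original data set is split into cross-validation folds (for example, leave-one-out or 10-fold). Each predictor (for example, a decision tree or a random forest) is trained with one fold held out and then used to predict the held-out fold. Collecting these out-of-fold predictions yields a new data set in which each sample's features are the cross-validated predictions of all the predictors.

2. Non-negativity and determining the weights

In the new data set, each sample contains the predicted values of all the predictors. Least squares under non-negativity constraints is then used to determine the weights, so every weight is non-negative (some may be exactly zero) and no predictor enters the combination with a negative sign. Because the predictors are usually highly correlated, unconstrained least-squares weights tend to be unstable, with large positive and negative values cancelling one another; the constraint prevents this and helps preserve the predictive ability of the combination.

3. Demonstrated effectiveness

The effectiveness of stacked regression is demonstrated by stacking regression trees of different sizes and, in a simulation, linear subset regressions and ridge regressions, then comparing the stacked predictor with the individual ones. In practice it frequently gives better prediction accuracy than the best single predictor, but this is not guaranteed: Theorem 1 in the excerpt below characterizes, in a simplified setting, when the best single predictor is already the best stacked predictor.

4. Why it works

There are several main reasons. First, as an ensemble method, stacking can exploit complementary strengths of the predictors, which lets the combined model adapt better to complex data. Second, the non-negativity constraint makes the predictors cooperate rather than compete. Third, the weights are fitted on cross-validated predictions, which reflect how each predictor performs on data it was not trained on; if in-sample fits were used instead, the most complex predictor (for example, the largest tree) would tend to receive nearly all the weight. This use of cross-validation helps the combination generalize to unseen data.

5. Applications and keywords

Stacked regression is widely applicable. Keywords: stacking, non-negativity, trees, subset regression, combinations. The method can be applied to prediction tasks with a numerical target variable, for example financial-market volatility forecasting, weather forecasting, and medical diagnosis.

6. Conclusion

Stacked regression is a powerful tool in machine learning: by combining several predictive models, it can improve prediction accuracy. Understanding how it works is valuable for improving model performance in data-science projects, especially with high-dimensional data and complex dependencies.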
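The two steps above, cross-validated predictions followed by non-negative least squares, can be sketched as follows. This is a minimal illustration under stated assumptions, not Breiman's code. Ridge regressions with different penalties stand in for the base predictors. The helper names `ridge_fit`, `nnls_cd`, `level_one_data` and `stack` are hypothetical. The non-negative least-squares step uses cyclic coordinate descent, which is a swapped-in solver; `scipy.optimize.nnls` is a standard alternative. In this sketch only the cross-validated predictions enter the non-negative least-squares step, and the base models are then refit on all the data for the final predictor.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression without intercept: (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def nnls_cd(Z, y, n_sweeps=2000):
    """Minimise ||y - Z a||^2 subject to a >= 0 by cyclic coordinate descent."""
    G, b = Z.T @ Z, Z.T @ y
    a = np.zeros(Z.shape[1])
    for _ in range(n_sweeps):
        for k in range(a.size):
            if G[k, k] > 0:
                # Exact minimiser over a[k] with the other weights fixed, clipped at zero.
                r = b[k] - G[k] @ a + G[k, k] * a[k]
                a[k] = max(0.0, r / G[k, k])
    return a


def level_one_data(X, y, lams, n_folds=10, seed=0):
    """Column k holds the out-of-fold (cross-validated) predictions of ridge model k."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    Z = np.empty((n, len(lams)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        for k, lam in enumerate(lams):
            Z[held_out, k] = X[held_out] @ ridge_fit(X[train], y[train], lam)
    return Z


def stack(X, y, lams, n_folds=10):
    """Fit non-negative weights on CV predictions, then refit each base model on all data."""
    alpha = nnls_cd(level_one_data(X, y, lams, n_folds), y)
    betas = [ridge_fit(X, y, lam) for lam in lams]

    def predict(X_new):
        # Stacked predictor: sum_k alpha_k v_k(x).
        return sum(a * (X_new @ beta) for a, beta in zip(alpha, betas))

    return alpha, predict


if __name__ == "__main__":
    # Synthetic zero-mean data, so omitting the intercept is harmless here.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    y = X @ np.linspace(1.0, 0.0, 10) + rng.normal(size=200)
    alpha, predict = stack(X, y, lams=[0.1, 10.0, 100.0, 1000.0])
    print("weights:", alpha.round(3), "sum:", round(float(alpha.sum()), 3))
```

The printed weights are non-negative by construction. Breiman observed that their sum tends to be close to 1 even when that is not imposed, but the sketch does not enforce it.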

52 L. BREIMAN

Now consider imposing the non-negativity constraints on the $\{\alpha_k\}$ together with the additional constraint $\sum_k \alpha_k = 1$. For any $\{\alpha_k\}$ satisfying the constraints $\alpha_k \ge 0$, $\sum_k \alpha_k = 1$,

$$v(x) = \sum_k \alpha_k v_k(x)$$

is an "interpolating" predictor. That is, for every value of $x$,

$$\min_k v_k(x) \le v(x) \le \max_k v_k(x).$$

So, what our procedure does is to find the best "interpolating" predictor.
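The interpolation property can be checked numerically. The sketch below assumes synthetic predictions (a random matrix `V` whose row k holds $v_k(x_i)$) and draws the weights from a Dirichlet distribution, which always gives $\alpha_k \ge 0$ with $\sum_k \alpha_k = 1$. Both the data and the names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 50))           # V[k, i] = v_k(x_i): 5 synthetic predictors at 50 points
alpha = rng.dirichlet(np.ones(5))      # alpha_k >= 0 and sum_k alpha_k = 1
v = alpha @ V                          # stacked predictor v(x_i) = sum_k alpha_k v_k(x_i)

# Interpolation: at every x_i the stacked value lies between the extreme individual predictions.
tol = 1e-12
assert np.all(V.min(axis=0) - tol <= v) and np.all(v <= V.max(axis=0) + tol)
```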

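To do some more exploration, note that, because $\sum_k \alpha_k = 1$,

$$\sum_n \Big(y_n - \sum_k \alpha_k v_k(x_n)\Big)^2 = \sum_{i,j} \alpha_i \alpha_j R_{ij},$$

where $R_{ij}$ is the matrix of residual products

$$R_{ij} = \sum_n \big(y_n - v_i(x_n)\big)\big(y_n - v_j(x_n)\big).$$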
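A quick numerical check of this identity follows. It relies on $\sum_k \alpha_k = 1$, since then $y_n - \sum_k \alpha_k v_k(x_n) = \sum_k \alpha_k \big(y_n - v_k(x_n)\big)$, so the squared error is a quadratic form in the residual matrix. The predictions below are synthetic (noisy copies of `y`) and are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 4
y = rng.normal(size=n)
V = y[:, None] + rng.normal(scale=0.5, size=(n, K))   # V[n, k] = v_k(x_n), synthetic predictors

E = y[:, None] - V        # residuals y_n - v_k(x_n)
R = E.T @ E               # R_ij = sum_n (y_n - v_i(x_n)) (y_n - v_j(x_n))

alpha = rng.dirichlet(np.ones(K))        # weights with alpha_k >= 0, sum_k alpha_k = 1
lhs = np.sum((y - V @ alpha) ** 2)       # sum_n (y_n - sum_k alpha_k v_k(x_n))^2
rhs = alpha @ R @ alpha                  # alpha^t R alpha
assert np.isclose(lhs, rhs)
```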

Now suppose we want to determine the $\{\alpha_k\}$ as the minimizers of $\alpha^t R \alpha$ under the non-negativity and sum-one constraints (in practice the cross-validation data are used to determine the $\{\alpha_k\}$, but we ignore this to present the conclusions in a simple form). An important question is: under what conditions on the matrix $R$ is the best single predictor also the best stacked predictor?
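The constrained minimization of $\alpha^t R \alpha$ over $\alpha \ge 0$, $\sum_k \alpha_k = 1$ can be solved numerically by any quadratic-programming method. The sketch below uses projected gradient descent with a sort-based Euclidean projection onto the probability simplex. This choice of solver is an assumption for illustration, not the one used in the paper. The function names and the $2 \times 2$ matrix are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]   # last index where the condition holds
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def simplex_stack_weights(R, n_iter=5000):
    """Minimise alpha^t R alpha subject to alpha >= 0, sum(alpha) = 1 by projected gradient."""
    R = np.asarray(R, dtype=float)
    lip = max(2.0 * np.linalg.eigvalsh(R).max(), 1e-12)   # Lipschitz constant of the gradient 2 R alpha
    alpha = np.full(R.shape[0], 1.0 / R.shape[0])          # start from equal weights
    for _ in range(n_iter):
        alpha = project_simplex(alpha - (2.0 / lip) * (R @ alpha))
    return alpha


R = np.array([[1.0, 0.2], [0.2, 2.0]])
alpha = simplex_stack_weights(R)
print(alpha.round(4), float(alpha @ R @ alpha))   # approximately [0.6923 0.3077] and 0.7538
```

For this matrix the objective along $\alpha = (t, 1 - t)$ is $2.6t^2 - 3.6t + 2$, minimized at $t = 9/13$. The stacked value, about $0.754$, beats the best single predictor's $R_{11} = 1$.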

By the best single predictor we mean the $v_k$ such that $R_{kk} = \min_j R_{jj}$. Then the following holds:

THEOREM 1

The best single predictor $v_k$ is also the best stacked predictor if and only if

$$R_{kk} \le R_{ik} \quad \text{for all } i.$$

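Proof: Since the minimization of $\alpha^t R \alpha$ under $\alpha \ge 0$, $\sum_k \alpha_k = 1$ is a quadratic programming problem, the Kuhn-Tucker conditions are necessary and sufficient for a solution (see, for example, Luenberger (1984)). In this case the conditions are that there is a scalar $\lambda$ and a vector $\mu$ such that

$$R\alpha = \lambda \mathbf{1} + \mu,$$

where $\alpha_k > 0 \Rightarrow \mu_k = 0$ and $\alpha_k = 0 \Rightarrow \mu_k \ge 0$. Write $\delta_k(j) = 1$ if $j = k$ and $0$ otherwise. If $\alpha_j = \delta_k(j)$ is the solution, then

$$\sum_j R_{ij}\,\delta_k(j) = R_{ik} = \lambda + \mu_i,$$

implying that $R_{kk} = \lambda$ and $R_{ik} \ge R_{kk}$. Conversely, suppose that $R_{kk} \le R_{ik}$ for all $i$. Then for $\alpha_i = \delta_k(i)$,

$$\sum_i R_{ji}\,\delta_k(i) = R_{jk}.$$

Putting $\lambda = R_{kk}$ gives $R_{jk} = \lambda + \mu_j$, where $\mu_j = R_{jk} - R_{kk}$ has the requisite properties: $\mu_k = 0$ and $\mu_j \ge 0$ for all $j$. ∎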
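Both directions of the proof can be exercised numerically. The sketch below tests the Theorem 1 condition directly and, separately, checks the Kuhn-Tucker conditions $R\alpha = \lambda\mathbf{1} + \mu$ for a candidate weight vector. As in the proof, the factor 2 in the gradient $2R\alpha$ is absorbed into $\lambda$ and $\mu$. The helper names and the two $2 \times 2$ matrices are hypothetical examples chosen for illustration, not data from the paper.

```python
import numpy as np


def best_single_is_best_stacked(R):
    """Theorem 1 condition: with k = argmin_j R_jj, is R_kk <= R_ik for every i?"""
    R = np.asarray(R, dtype=float)
    k = int(np.argmin(np.diag(R)))
    return k, bool(np.all(R[k, k] <= R[:, k]))


def kkt_holds(R, alpha, tol=1e-8):
    """Check R alpha = lam * 1 + mu, with mu_k = 0 where alpha_k > 0 and mu_k >= 0 where alpha_k = 0."""
    R, alpha = np.asarray(R, dtype=float), np.asarray(alpha, dtype=float)
    if np.any(alpha < -tol) or abs(alpha.sum() - 1.0) > tol:
        return False                          # alpha is not feasible
    g = R @ alpha
    support = alpha > tol
    lam = g[support].mean()                   # mu_k = 0 on the support, so (R alpha)_k = lam there
    mu = g - lam
    return bool(np.all(np.abs(mu[support]) <= tol) and np.all(mu[~support] >= -tol))


R_a = np.array([[1.0, 0.2], [0.2, 2.0]])   # R_21 = 0.2 < R_11 = 1: condition fails
R_b = np.array([[1.0, 1.5], [1.5, 3.0]])   # R_21 = 1.5 >= R_11 = 1: condition holds
delta_0 = np.array([1.0, 0.0])              # all weight on the best single predictor

print(best_single_is_best_stacked(R_a), kkt_holds(R_a, delta_0))   # (0, False) False
print(best_single_is_best_stacked(R_b), kkt_holds(R_b, delta_0))   # (0, True) True
print(kkt_holds(R_a, np.array([9 / 13, 4 / 13])))                  # True
```

For `R_a` the single-predictor vertex violates $\mu \ge 0$, while the interior point $(9/13, 4/13)$ satisfies the conditions with equal entries of $R\alpha$. This is the case where stacking strictly improves on the best single predictor.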
