training data is used. In addition, only a random subset of predictor variables is considered
for splitpoint selection at each node. The size of the random subset, called mtry, is the
single tuning parameter of the algorithm, even though results are typically nearly optimal
over a wide range of this parameter. The value of mtry can be fine-tuned on the out-of-bag
samples. For regression, the prediction of random forests for a new data point X = x is the
averaged response of all trees. For details see Breiman (2001). The algorithm is somewhat
related to boosting (Schapire et al., 1998), with trees as learners. Yet, with random forests,
each tree is grown using the original observations of the response variable, while boosting
tries to fit the residuals after taking into account the prediction of previously generated
trees (Friedman et al., 2000).
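
As a concrete illustration (a minimal sketch, not part of the original algorithm description), a random forest regressor can be fit and mtry tuned on the out-of-bag samples roughly as follows; the sketch assumes scikit-learn's RandomForestRegressor, in which the max_features argument plays the role of mtry, and uses toy data.

```python
# Sketch: fit a random forest regressor and tune mtry (max_features) on the
# out-of-bag samples; assumes scikit-learn's RandomForestRegressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(500, 10))            # toy predictors, p = 10
y = X[:, 0] + 0.1 * rng.normal(size=500)   # toy response

best_mtry, best_oob, best_rf = None, -np.inf, None
for mtry in range(1, X.shape[1] + 1):
    rf = RandomForestRegressor(
        n_estimators=200,       # number of trees k
        max_features=mtry,      # size of the random predictor subset per split (mtry)
        oob_score=True,         # evaluate on the out-of-bag samples
        bootstrap=True,
        random_state=0,
    ).fit(X, y)
    if rf.oob_score_ > best_oob:
        best_mtry, best_oob, best_rf = mtry, rf.oob_score_, rf

# Prediction for a new data point x: the averaged response of all trees.
x_new = rng.uniform(size=(1, 10))
y_hat = best_rf.predict(x_new)
```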
Some Notation Following the notation of Breiman (2001), call $\theta$ the random parameter vector that determines how a tree is grown (e.g. which variables are considered for splitpoints at each node). The corresponding tree is denoted by $T(\theta)$. Let $B$ be the space in which $X$ lives, that is $X : \Omega \mapsto B \subseteq \mathbb{R}^p$, where $p \in \mathbb{N}^+$ is the dimensionality of the predictor variable. Every leaf $\ell = 1, \ldots, L$ of a tree corresponds to a rectangular subspace of $B$. Denote this rectangular subspace by $R_\ell \subseteq B$ for every leaf $\ell = 1, \ldots, L$. For every $x \in B$, there is one and only one leaf $\ell$ such that $x \in R_\ell$ (corresponding to the leaf that is obtained when dropping $x$ down the tree). Denote this leaf by $\ell(x, \theta)$ for tree $T(\theta)$.
The prediction of a single tree $T(\theta)$ for a new data point $X = x$ is obtained by averaging over the observed values in leaf $\ell(x, \theta)$. Let the weight vector $w_i(x, \theta)$ be given by a positive constant if observation $X_i$ is part of leaf $\ell(x, \theta)$ and 0 if it is not. The weights sum to one, and thus

$$ w_i(x, \theta) \;=\; \frac{\mathbf{1}_{\{X_i \in R_{\ell(x,\theta)}\}}}{\#\{j : X_j \in R_{\ell(x,\theta)}\}} . \qquad (4) $$
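
To make definition (4) concrete, the following sketch (an illustration only, not the paper's implementation) computes the weight vector of a single fitted tree; it assumes a scikit-learn tree whose apply method returns the index of the leaf containing each sample, and the hypothetical names X_train and x for the training predictors and the new point.

```python
def single_tree_weights(tree, X_train, x):
    """Weights w_i(x, theta) of equation (4) for a fitted scikit-learn tree:
    equal positive weight for the training points that fall into the leaf
    l(x, theta) containing x, zero otherwise; the weights sum to one."""
    train_leaves = tree.apply(X_train)            # leaf index of every X_i
    leaf_of_x = tree.apply(x.reshape(1, -1))[0]   # leaf index l(x, theta) of x
    in_leaf = (train_leaves == leaf_of_x)         # indicator 1{X_i in R_{l(x,theta)}}
    return in_leaf / in_leaf.sum()                # divided by #{j : X_j in R_{l(x,theta)}}
```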
The prediction of a single tree, given covariate $X = x$, is then the weighted average of the original observations $Y_i$, $i = 1, \ldots, n$,

$$ \text{single tree:} \qquad \hat{\mu}(x) \;=\; \sum_{i=1}^{n} w_i(x, \theta)\, Y_i . $$
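
Continuing the sketch above (with the hypothetical names X, y, x_new and single_tree_weights from the previous snippets), the single-tree prediction is this weighted sum of the original responses; for a tree fit directly on the original observations it coincides with the mean response in the leaf containing x.

```python
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(random_state=0).fit(X, y)   # one tree T(theta)
w = single_tree_weights(tree, X, x_new[0])
mu_hat_tree = np.dot(w, y)    # sum_i w_i(x, theta) * Y_i
# Equals tree.predict(x_new): the mean response in leaf l(x, theta).
```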
Using random forests, the conditional mean $E(Y \mid X = x)$ is approximated by the averaged prediction of $k$ single trees, each constructed with an i.i.d. vector $\theta_t$, $t = 1, \ldots, k$. Let $w_i(x)$ be the average of $w_i(x, \theta_t)$ over this collection of trees,

$$ w_i(x) \;=\; k^{-1} \sum_{t=1}^{k} w_i(x, \theta_t) . \qquad (5) $$
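
The averaging in (5) can be sketched the same way; the snippet below (again only an illustration, continuing the hypothetical helpers above) assumes a fitted scikit-learn RandomForestRegressor, whose estimators_ attribute holds the individual trees.

```python
def forest_weights(rf, X_train, x):
    """Average the single-tree weights w_i(x, theta_t) over the k trees, eq. (5)."""
    k = len(rf.estimators_)
    w = np.zeros(X_train.shape[0])
    for t in rf.estimators_:                   # one fitted tree per theta_t
        w += single_tree_weights(t, X_train, x)
    return w / k
```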
The prediction of random forests is then

$$ \text{Random Forests:} \qquad \hat{\mu}(x) \;=\; \sum_{i=1}^{n} w_i(x)\, Y_i . $$
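
Putting the pieces together (still only a sketch built on the hypothetical helpers above), the random forest prediction is recovered as a weighted sum of the original observations.

```python
w_forest = forest_weights(best_rf, X, x_new[0])
mu_hat_forest = np.dot(w_forest, y)   # sum_i w_i(x) * Y_i
# Close to best_rf.predict(x_new); small differences arise because each
# scikit-learn tree averages the responses of its bootstrap sample rather
# than the original observations used in the formula above.
```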
The approximation of the conditional mean of $Y$, given $X = x$, is thus given by a weighted sum over all observations. The weights vary with the covariate $X = x$ and tend to be large for those $i \in \{1, \ldots, n\}$ where the conditional distribution of $Y$, given $X = X_i$, is similar to the conditional distribution of $Y$, given $X = x$ (Lin and Jeon, 2002).