(guaranteed) bounds on the prediction risk based on VC-theory
have been proposed in [1].
A. Classical Model Selection Criteria
There are two general approaches for estimating prediction
risk for regression problems with finite data. One is based
on data resampling. The other approach is to use analytic
estimates of the prediction risk as a function of the empirical
risk (training error) penalized (adjusted) by some measure of
model complexity. Once an accurate estimate of the prediction
risk is found, it can be used for model selection by choosing the
model complexity which minimizes the estimated prediction
risk. In the statistical literature, various prediction risk esti-
mates have been proposed for model selection (in the linear
case). In general, these estimates all take the form of
$$\text{estimated risk} \;=\; r\!\left(\frac{d}{n}\right)\cdot\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2 \qquad (7)$$
where $r(p)$, with $p = d/n$, is a monotonically increasing function of the ratio
of model complexity (degrees of freedom) $d$ to the training
sample size $n$ [6]. The function $r$ is often called a penalization
factor because it inflates the average residual sum of squares
for increasingly complex models. The following forms of $r(p)$
have been proposed in the statistical literature:
final prediction error (FPE): $r(p) = \dfrac{1+p}{1-p}$
Schwartz’ criterion (SC): $r(p, n) = 1 + \dfrac{\ln n}{2}\cdot\dfrac{p}{1-p}$
generalized cross-validation (GCV): $r(p) = \dfrac{1}{(1-p)^{2}}$
Shibata’s model selector (SMS): $r(p) = 1 + 2p$
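As an illustration, the following sketch (not taken from the paper; the function names and the least-squares residuals are assumed) computes the penalized risk estimate (7) for a given complexity $d$ and sample size $n$ using the four penalization factors above.

```python
# Illustrative sketch (not from the paper): the penalized risk estimate (7)
# for model complexity d and sample size n, using the classical penalization
# factors listed above. Residuals would come from a least-squares fit.
import numpy as np

def penalization_factor(d, n, criterion="fpe"):
    """Return r(p, n) for p = d/n (valid for p < 1)."""
    p = d / n
    if criterion == "fpe":    # final prediction error
        return (1 + p) / (1 - p)
    if criterion == "sc":     # Schwartz' criterion
        return 1 + (np.log(n) / 2) * p / (1 - p)
    if criterion == "gcv":    # generalized cross-validation
        return 1 / (1 - p) ** 2
    if criterion == "sms":    # Shibata's model selector
        return 1 + 2 * p
    raise ValueError(f"unknown criterion: {criterion}")

def estimated_risk(residuals, d, criterion="fpe"):
    """Average residual sum of squares inflated by the penalization factor."""
    residuals = np.asarray(residuals)
    n = len(residuals)
    empirical_risk = np.mean(residuals ** 2)
    return penalization_factor(d, n, criterion) * empirical_risk
```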
All these classical approaches are motivated by asymptotic
arguments for linear models and therefore apply well for large
training sets. In fact, for large $n$, prediction estimates provided
by FPE, GCV, and SMS are asymptotically equivalent. More-
over, the model selection criteria above are all based on a
parametric philosophy. That is, the goal of model selection is
to select the terms of the approximating function so as to
match the target function (under the assumption that the target
function is contained in the set of linear approximating func-
tions). The success of the model selection criteria is measured
tions). The sucess of the model selection criteria is measured
according to this philosophy [7], [10], [11], [12]. This classical
approach can be contrasted to the VC-theory approach, where
the goal of model selection is to choose the approximating
function with the lowest prediction risk (irrespective of the
number of terms chosen).
Classical model selection criteria were designed with spe-
cific applications in mind. For example, FPE was originally de-
signed for model identification for autoregressive time series,
and GCV was developed as an estimate for cross-validation
(itself an estimate of prediction risk) in spline smoothing. Only
SC and SMS were developed for the generic
problem of regression. Note that application of the minimum
description length (MDL) arguments [13] yields a penalization
factor identical to the Schwartz criterion, though the latter was
derived using a Bayesian formulation. A recent model selection
criterion described in [14] has the goal of minimizing prediction
risk for regression.
Typically, the classical criteria are constructed by first defin-
ing the prediction risk in terms of the linear approximating
function. Then, asymptotic arguments are used to develop limit
distributions for various components of the prediction risk,
leading to an asymptotic form of the prediction risk. Finally,
the data, along with an estimate of the noise variance, are used
to estimate the expected value of the asymptotic prediction
risk. For example, the FPE criterion assumes a Gaussian
distribution in developing the asymptotic prediction risk.
In addition, FPE depends on an estimate of the noise variance
given by
$$\hat{\sigma}^{2} = \frac{1}{n-d}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}.$$
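As a minimal sketch of this estimate (assuming residuals from a least-squares fit with $d$ degrees of freedom; the function name is illustrative):

```python
# Minimal sketch (assumed name): the noise variance estimate used by FPE,
# computed from least-squares residuals with d degrees of freedom.
import numpy as np

def noise_variance_estimate(residuals, d):
    """sigma_hat^2 = (sum of squared residuals) / (n - d)."""
    residuals = np.asarray(residuals)
    n = len(residuals)
    return np.sum(residuals ** 2) / (n - d)
```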
There are several common assumptions underlying all these
model selection criteria:
1) The target function is linear.
2) The set of linear functions of the learning machine con-
tains the target function. That is, the learning machine
provides an unbiased estimate.
3) The noise is independent and identically distributed.
4) The empirical risk is minimized.
Additional assumptions reflecting the noise distribution and
limit distributions are also applied in the development of each
selection criterion.
Another popular alternative (to analytic methods) is to
choose model complexity using resampling. In this paper
we consider leave-one-out cross-validation (CV). Under this
approach, the prediction risk is estimated via cross-validation,
and the model providing lowest estimated risk is chosen.
It can be shown [4] that the cross-validation estimate of
prediction risk is asymptotically (for large $n$)
equivalent to analytic model selection criteria (such as FPE,
GCV, and SMS). Unfortunately, the computational cost of
CV grows linearly with the number of samples, and often
becomes prohibitively large for practical applications. Addi-
tional complications arise in the context of using resampling
with nonlinear estimators (such as neural networks), due to
existence of multiple local minima and the dependence of the
final solution (obtained by an optimization algorithm) on the
initial conditions (weight initialization)—see [4]. Nevertheless,
resampling remains the preferred approach for model selection
in many learning methods. In this paper, we use CV as a
benchmark method for comparing various analytic methods.
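For concreteness, the following hypothetical sketch (assuming univariate polynomial models fit by least squares) illustrates leave-one-out CV for choosing model complexity; the loop over held-out samples shows why the cost grows linearly with the number of samples.

```python
# Illustrative sketch (assumptions: univariate polynomial models fit by least
# squares; names are hypothetical): leave-one-out CV estimate of prediction
# risk, and model selection by minimizing that estimate.
import numpy as np

def loo_cv_risk(x, y, degree):
    """Mean squared leave-one-out prediction error for a polynomial fit."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i                      # hold out sample i
        coeffs = np.polyfit(x[mask], y[mask], degree)  # fit on remaining n - 1
        errors.append((y[i] - np.polyval(coeffs, x[i])) ** 2)
    return float(np.mean(errors))                      # cost grows linearly with n

def select_degree(x, y, max_degree):
    """Choose the polynomial degree with the lowest estimated risk."""
    risks = {d: loo_cv_risk(x, y, d) for d in range(1, max_degree + 1)}
    return min(risks, key=risks.get)
```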
It can be shown [12] that the above estimates of the pre-
diction risk (excluding SC) are not consistent in the following
sense: The probability of selecting the model with the same
number of terms as the target function does not converge to
one as the number of observations is increased (with a fixed
maximum number of basis functions). In addition, resampling
methods for model selection may suffer from the same lack
of consistency. For example, leave-one-out CV which has