A Powerful Tool for Robust Regression: An Introduction to Least Median of Squares

114 KB PDF · updated 2024-07-21

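Least Median of Squares (LMS) is a robust regression method used in real-world data analysis. Conventional linear regression usually relies on ordinary least squares (OLS). OLS is typically justified by assuming that the errors are independent, have constant variance, and (for exact inference) are normally distributed. When a data set contains outliers, however, OLS can be pulled toward those extreme values, and the resulting estimates may be far from the true relationship. LMS addresses this problem by choosing the line that minimizes the median of the squared residuals rather than their mean (the mean squared error, MSE). The median does not depend on how large the most extreme residuals are, so LMS is much less sensitive to outliers than OLS and gives more robust parameter estimates. The main ideas are as follows:

1. **Specify the model**: Assume the linear model Y = β0 + β1X + ε. Here Y is the dependent variable, X is the independent variable, and ε is a random error term distributed as N(0, σ²), that is, normal with zero mean and constant variance.
2. **Collect a sample**: Draw n observations of (X, Y) from the population to form the sample data set.
3. **Conventional method**: OLS finds the best-fitting line by minimizing the sum of squared residuals (SSR), where each residual is the vertical distance from an observation to the line. The OLS parameter estimates are therefore the values that minimize the residual sum of squares.
4. **The LMS alternative**: LMS uses the median instead of the mean. Its objective is the line that minimizes the median of the squared sample residuals, so a small number of extreme values cannot substantially change the parameter estimates.
5. **Computation**: LMS has no closed-form solution. It is computed with search or optimization algorithms that look for the line with the smallest median squared residual, and the resulting b0 and b1 are the parameter estimates. LMS is computationally more demanding than OLS, but its results are far less affected by outliers.
6. **Advantages and applications**: LMS gives more reliable results than OLS for data sets that contain outliers or have skewed distributions. It is applied as a robust regression technique in economics, finance, and engineering, for example in quality control, economic forecasting, and market analysis.

In summary, least median of squares is an important statistical tool. It is particularly suited to regression analysis when data quality is poor and outliers may be present, and its core idea is to keep the fitted model robust to contaminated observations. Understanding LMS helps analysts build more reliable models for real problems.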
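The contrast between steps 3 and 4 can be made concrete with a small, brute-force sketch. This is an illustration under stated assumptions, not the author's method or any library's algorithm. Every pair of points proposes a slope. For that slope, the intercept minimizing the h-th smallest absolute residual is the midpoint of the narrowest window that covers h of the values y − b1·x. The sketch uses h = n // 2 + 1, which equals the median for odd n. The function names `ols_fit`, `shortest_window`, and `lms_fit` and the data (11 points on y = 2 + 3x, with the last y raised by 22) are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def ols_fit(x, y):
    """OLS for y = b0 + b1*x: minimizes the sum of squared residuals."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def shortest_window(values, h):
    """Midpoint and width of the narrowest window holding h sorted values."""
    v = sorted(values)
    width, start = min((v[i + h - 1] - v[i], i) for i in range(len(v) - h + 1))
    return (v[start] + v[start + h - 1]) / 2, width


def lms_fit(x, y):
    """Brute-force LMS sketch: minimize the h-th smallest squared residual.

    Every pair of points proposes a slope; for that slope, the best intercept
    is the midpoint of the narrowest window covering h residuals y - b1*x.
    With h = n // 2 + 1 the criterion is the median for odd n.
    """
    n = len(x)
    h = n // 2 + 1
    best = None
    for i, j in combinations(range(n), 2):
        if x[i] == x[j]:
            continue  # the pair defines a vertical line; skip it
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0, width = shortest_window([yi - b1 * xi for xi, yi in zip(x, y)], h)
        crit = (width / 2) ** 2  # h-th smallest squared residual at (b0, b1)
        if best is None or crit < best[0]:
            best = (crit, b0, b1)
    return best[1], best[2]


# Hypothetical data: y = 2 + 3x for x = 1..11, with the last y raised by 22.
x = list(range(1, 12))
y = [2 + 3 * xi for xi in x]
y[-1] += 22
print("OLS (b0, b1):", ols_fit(x, y))  # (-2.0, 4.0): pulled by one point
print("LMS (b0, b1):", lms_fit(x, y))  # (2.0, 3.0): the clean line
```

The exhaustive pair search costs roughly O(n³ log n), so it only suits small samples. Practical LMS software instead examines a sample of candidate subsets.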

Least Median of Squares


…the “Make Y6 the Outlier” button to change the outlier from Y11 to Y6. The bias is smaller when the outlier lies in the center of the data. Click the “Make Y11 the Outlier” button to return the outlier to its initial position, Y11.
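The OLS slope formula shows why the outlier's position matters. The slope b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² is linear in the y-values, so shifting one value y_k by d changes the slope by d(x_k − x̄) / Σ(x_i − x̄)². A point at the center (x_k = x̄) therefore leaves the slope unchanged and only moves the intercept, while a point at the end of the range tilts the whole line. The sketch below assumes hypothetical data: 11 equally spaced points on y = 2 + 3x with a shift of 22. This is not the document's Dirty data set, and the function names are illustrative.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def data_with_outlier(position, shift=22):
    """Hypothetical data: y = 2 + 3x at x = 1..11, with Y<position> shifted up."""
    x = list(range(1, 12))
    y = [2 + 3 * xi for xi in x]
    y[position - 1] += shift  # positions are 1-based, like Y6 and Y11
    return x, y


for position in (11, 6):
    b0, b1 = ols_fit(*data_with_outlier(position))
    print(f"outlier at Y{position}: intercept = {b0:.2f}, slope = {b1:.2f}")
# outlier at Y11: intercept = -2.00, slope = 4.00
# outlier at Y6: intercept = 4.00, slope = 3.00
```

The true line has intercept 2 and slope 3. The end-point outlier distorts both estimates. The central outlier leaves the slope exact and shifts only the intercept.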

Mount et al. [1997] provide a good real-world example of the problems that arise when least squares is applied to dirty data. Their application is computer vision.

Panel (a) of Figure 1.3 is an aerial photograph of a road (the thick black line) with other objects scattered about.

In panel (b), a least squares fit has been superimposed. Notice how the points in the lower left-hand corner of the picture drag the least squares line down, so the white line does not capture the road well.

Panel (c) also fits a line, but not with the usual least squares algorithm. Instead of minimizing the sum of squared residuals, it chooses the intercept and slope that minimize the median of the squared residuals. This approach is abbreviated LMS.

Figure 1.3: Identifying the Road
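The two criteria contrasted in panels (b) and (c) can be written side by side. The notation follows the model Y = β0 + β1X + ε from the summary at the top of this page: b0 and b1 are candidate coefficients, r_i is the residual of observation i, and med denotes the sample median.

```latex
\begin{aligned}
r_i(b_0, b_1) &= y_i - b_0 - b_1 x_i, \qquad i = 1, \dots, n,\\
\text{OLS:}\quad (\hat\beta_0, \hat\beta_1) &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} r_i^2(b_0, b_1),\\
\text{LMS:}\quad (\hat\beta_0, \hat\beta_1) &= \arg\min_{b_0,\, b_1} \operatorname*{med}_{1 \le i \le n} r_i^2(b_0, b_1).
\end{aligned}
```

For example, with n = 11 the median is the 6th smallest squared residual. Making any of the five largest squared residuals even larger leaves the LMS criterion unchanged but increases the OLS sum.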

The authors argue that this example “demonstrates the enhanced performance obtained by LMS versus OLS in detecting straight road segments in a noisy aerial image.”

Of course, least squares would perform well if we simply threw out the outliers (point Y11 in the Dirty data set example, or the points in the lower left-hand corner of the picture above). In fact, this is essentially the strategy adopted by another robust estimator, Least Trimmed Squares (LTS). LTS fits least squares to only the h observations with the smallest squared residuals, so the observations that fit worst are effectively trimmed, i.e., deleted, from the data.
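Both strategies in this paragraph, deleting a known outlier by hand and trimming with LTS, can be illustrated on hypothetical data. The sketch below is not the author's code and not a production algorithm. It assumes simple regression and the common choice h = (n + p + 1) // 2 with p = 2 coefficients. It starts from a single OLS fit and refines it with concentration steps ("C-steps"): each step refits on the h best-fitting points and repeats until that subset stops changing. Implementations such as FAST-LTS start from many random subsets instead. The names `ols_fit` and `lts_fit` are illustrative.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def lts_fit(x, y, max_iter=100):
    """LTS sketch: C-steps from a single OLS start (not FAST-LTS).

    Keeps the h = (n + 3) // 2 points with the smallest squared residuals,
    refits OLS on them, and stops once the kept subset no longer changes.
    """
    n = len(x)
    h = (n + 3) // 2  # (n + p + 1) // 2 with p = 2 coefficients
    subset = frozenset(range(n))  # start from OLS on all points
    for _ in range(max_iter):
        order = sorted(subset)
        b0, b1 = ols_fit([x[i] for i in order], [y[i] for i in order])
        sq = [(y[i] - b0 - b1 * x[i]) ** 2 for i in range(n)]
        # Indices of the h smallest squared residuals (ties broken by position).
        kept = frozenset(sorted(range(n), key=sq.__getitem__)[:h])
        if kept == subset:
            break
        subset = kept
    return b0, b1


x = list(range(1, 12))
y = [2 + 3 * xi for xi in x]
y[-1] += 22  # hypothetical outlier at Y11

print(ols_fit(x, y))            # (-2.0, 4.0): all points
print(ols_fit(x[:-1], y[:-1]))  # (2.0, 3.0): outlier thrown out by hand
print(lts_fit(x, y))            # (2.0, 3.0): outlier trimmed by LTS
```

Each C-step cannot increase the sum of the h smallest squared residuals, which is why the loop settles. A single start can still converge to a poor subset when outliers are numerous.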

Determining what constitutes an outlier is controversial. Since outlier detection and trimming would take us far afield, we will concentrate on the properties of LMS and how it compares to conventional least squares.
