R in Detail: Principles and Applications of Logistic Regression

Updated 2024-07-21. PDF, 125 KB.

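Logistic regression is widely used in statistics and machine learning to model the probability of a binary (or, more generally, categorical) outcome. In R it is fitted as a generalized linear model: a linear predictor combined with the logit (log-odds) link function. The link lets real-valued explanatory variables drive a response that, once transformed back, always lies between 0 and 1 and can be read as a probability.

Equation (1) states the basic model, where logit(p) = log(p / (1 - p)) is the log odds of the event given the input features (x1, x2, ..., xk):

logit(p) = β0 + β1 * x1 + β2 * x2 + ... + βk * xk

The log odds are therefore linear in the explanatory variables: a one-unit increase in xi adds βi to the log odds, which is the same as multiplying the odds by the constant factor e^(βi).

Equation (2) exponentiates both sides of Equation (1) to express the model in terms of the odds:

p / (1 - p) = e^(β0 + β1 * x1 + β2 * x2 + ... + βk * xk)

When the explanatory variables are categorical and coded as 0/1 indicators, some factors simplify. If xi = 0 (false), the factor e^(βi * xi) equals e^0 = 1 and drops out of the product; if xi = 1, it reduces to e^(βi). Only the factors for indicators equal to 1 remain, together with the intercept factor e^(β0).

Equation (3) gives the logistic function, the inverse of the logit, which maps the log odds z = β0 + β1 * x1 + ... + βk * xk back to a probability strictly between 0 and 1:

π = e^z / (1 + e^z)

Figure 1 shows how the logistic function maps a continuous value z onto the probability scale, which helps in relating model output to observed data.

In summary, logistic regression in R applies a linear model on the logit scale, which makes it suitable for binary classification problems and gives the output a probability interpretation. It handles categorical explanatory variables naturally, and the logistic function keeps predicted probabilities in the valid range. In practice the model is fitted with glm(), which belongs to R's built-in stats package, using family = binomial.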
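The relationship between the logit, the odds, and the logistic function described above can be checked numerically. The following is a minimal sketch, not code from the original notes: the helper names `logit` and `logistic`, the coefficients `b0` and `b1`, and the simulated data are all assumptions made for illustration. `qlogis()`, `plogis()`, and `glm()` are standard functions in R's stats package.

```r
# Logit (log odds) and its inverse, the logistic function, written out by hand.
logit <- function(p) log(p / (1 - p))
logistic <- function(z) exp(z) / (1 + exp(z))

p <- c(0.1, 0.5, 0.9)
z <- logit(p)
round_trip <- logistic(z)  # applying the inverse should recover p

# R ships the same pair of functions as qlogis() and plogis().
matches_builtin <- isTRUE(all.equal(z, qlogis(p))) &&
  isTRUE(all.equal(round_trip, plogis(z)))

# Hypothetical coefficients for a single 0/1 indicator x.
b0 <- -1.0
b1 <- 0.5
odds_x0 <- exp(b0 + b1 * 0)      # x = 0: the factor exp(b1 * 0) is 1
odds_x1 <- exp(b0 + b1 * 1)      # x = 1: the factor is exp(b1)
odds_ratio <- odds_x1 / odds_x0  # equals exp(b1)

# Simulated data (an assumption), fitted with glm() and the binomial family.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = logistic(b0 + b1 * x))
fit <- glm(y ~ x, family = binomial)
eta <- predict(fit, type = "link")       # linear predictor (log odds)
prob <- predict(fit, type = "response")  # fitted probabilities
```

Here `prob` is the logistic transform of `eta`, so every fitted value lies between 0 and 1 regardless of how large the linear predictor becomes.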

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.24384  -1.34325   0.04954   1.01488   6.40094

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -1.31827    0.12221 -10.787  < 2e-16
catd            -0.16931    0.10032  -1.688 0.091459
catm             0.17858    0.08952   1.995 0.046053
catn             0.66672    0.09651   6.908 4.91e-12
catv            -0.76754    0.21844  -3.514 0.000442
followsP         0.95255    0.07400  12.872  < 2e-16
followsV         0.53408    0.05660   9.436  < 2e-16
factor(class)2   1.27045    0.10320  12.310  < 2e-16
factor(class)3   1.04805    0.10355  10.122  < 2e-16
factor(class)4   1.37425    0.10155  13.532  < 2e-16

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 958.66  on 51  degrees of freedom
Residual deviance: 198.63  on 42  degrees of freedom
AIC: 446.10

Number of Fisher Scoring iterations: 4
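The z value and Pr(>|z|) columns of the summary above follow directly from the estimates and standard errors, and exponentiating an estimate gives the multiplicative effect on the odds. The sketch below recomputes these for three rows of the table. The variable names are illustrative, and it assumes the usual two-sided Wald test based on the normal distribution, which is what `summary()` reports for a binomial `glm`.

```r
# Estimates and standard errors copied from three rows of the summary table.
est <- c("(Intercept)" = -1.31827, catm = 0.17858, followsP = 0.95255)
se  <- c("(Intercept)" =  0.12221, catm = 0.08952, followsP = 0.07400)

# Wald statistic: estimate divided by its standard error.
z <- est / se

# Two-sided p-value from the standard normal distribution.
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

# exp(coefficient) is the factor by which the odds are multiplied
# when the corresponding indicator switches from 0 to 1.
odds_factor <- exp(est)

print(round(cbind(z = z, p = p, odds_factor = odds_factor), 4))
```

For example, the followsP estimate of 0.95255 corresponds to multiplying the odds of deletion by exp(0.95255), relative to the baseline following environment.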

Residual deviance is the difference in G² = −2 log L between a maximal model that has a separate parameter for each cell in the model and the built model. Changes in the deviance (the difference in the quantity −2 log L) for two models which can be nested in a reduction will be approximately χ²-distributed with dof equal to the change in the number of estimated parameters. Thus the difference in deviances can be tested against the χ² distribution for significance. The same concerns about this approximation being valid only for reasonably sized expected counts (as with contingency tables and multinomials in Suppes (1970)) still apply here, but we (and most people) ignore this caution and use the statistic as a rough indicator when exploring to find good models.
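The deviance comparison of two nested models can be sketched as follows. The data are simulated and the names (`x1`, `x2`, `small`, `big`) are assumptions for illustration only; the original data set behind the summary is not reproduced here. `deviance()`, `df.residual()`, `pchisq()`, and `anova()` are standard R functions.

```r
# Simulated binary response that depends on x1 but not on x2 (an assumption).
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * x1))

# Two nested models: 'small' is 'big' with the x2 term removed.
small <- glm(y ~ x1, family = binomial)
big <- glm(y ~ x1 + x2, family = binomial)

# Change in deviance and in the number of estimated parameters.
dev_diff <- deviance(small) - deviance(big)
df_diff <- df.residual(small) - df.residual(big)

# Upper-tail chi-squared probability: a small value means the extra
# parameter improves the fit by more than chance would explain.
p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

# anova() performs the same comparison and reports it as a table.
tab <- anova(small, big, test = "Chisq")
print(tab)
```

The hand-computed `dev_diff` should agree with the Deviance column of the table that `anova()` prints.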

We’re usually mainly interested in the relative goodness of models, but nevertheless, the high residual deviance shows that the model is unlikely to have generated the data (pchisq(198.63, 42) ≈ 1, so the upper-tail p-value is essentially 0). However, it certainly fits the data better than the null model (which means that a fixed mean probability of deletion is used for all cells): pchisq(958.66-198.63, 9) ≈ 1.
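The two pchisq values quoted above can be reproduced from the deviances and degrees of freedom in the model summary. This is a small sketch with illustrative variable names. Note that `pchisq()` returns the lower-tail probability by default, so a value near 1 corresponds to an upper-tail p-value near 0, which `lower.tail = FALSE` returns directly.

```r
# Deviances and degrees of freedom taken from the model summary.
null_dev <- 958.66
null_df <- 51
resid_dev <- 198.63
resid_df <- 42

# Goodness of fit of the built model against the maximal model.
fit_cdf <- pchisq(resid_dev, df = resid_df)                    # lower tail, close to 1
fit_p <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)  # upper tail, close to 0

# Improvement of the built model over the null model.
dev_drop <- null_dev - resid_dev  # 760.03
df_drop <- null_df - resid_df     # 9
improve_cdf <- pchisq(dev_drop, df = df_drop)
improve_p <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

print(c(fit_cdf = fit_cdf, fit_p = fit_p,
        improve_cdf = improve_cdf, improve_p = improve_p))
```

Reporting the upper tail avoids the ambiguity of "≈ 1": both tests reject, the first against the fitted model as a full description of the data, the second against the null model in favour of the fitted one.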

What can we see from the parameters of this model? catd and catm have different effects, but both are not very clearly significantly different from the effect of cata (the default value). All following environments seem distinctive. For class, all of class 2–4 seem to have somewhat similar effects, and we might model class as a two way distinction. It seems like we cannot profitably drop a whole factor, but we can test that with the anova function to give an analysis of deviance table, or the drop1 function to try dropping each factor:

> anova(ced.logr, test="Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: ced.del

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    51     958.66
cat   4   314.88        47     643.79 6.690e-67
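The cat row of the table can be checked by hand: its p-value is the upper-tail χ² probability of the reported deviance on 4 degrees of freedom. The sketch below does that check and then shows a `drop1()` call on simulated data, because the original data set is not available here. The data frame `d`, its columns, and the model `fit` are assumptions made for illustration.

```r
# Check the 'cat' row: deviance 314.88 on 4 degrees of freedom.
cat_p <- pchisq(314.88, df = 4, lower.tail = FALSE)

# The deviance column is the drop in residual deviance when 'cat' is added;
# it matches 314.88 up to rounding of the printed values.
cat_dev <- 958.66 - 643.79

# drop1() on simulated data (an assumption) with one factor and one covariate.
set.seed(42)
n <- 300
d <- data.frame(
  g = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x = rnorm(n)
)
d$y <- rbinom(n, size = 1, prob = plogis(-0.3 + 0.8 * (d$g == "b") + 0.5 * d$x))
fit <- glm(y ~ g + x, family = binomial, data = d)

# Each row tests removing one whole term from the full model.
dropped <- drop1(fit, test = "Chisq")
print(dropped)
```

Unlike the sequential `anova()` table, `drop1()` tests each term against the full model, so its result does not depend on the order in which terms were entered.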


