A Study and Application of Interval Censoring in Parametric Regression Models
Updated 2024-07-16
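This document, "A Study of Interval Censoring in Parametric Regression Models", examines how to handle interval-censored data in parametric regression models. With advances in statistical software, researchers can apply these models with relative ease, even when a large share of the observations are known only to within an interval. Interval-censored data arise routinely in lifetime data analysis, for example when estimating time to disease onset or survival time in medical research.

The author describes a strategy in which both the location and the dispersion parameters are given regression equations, which helps capture trend and variability in the data. For right- (or left-) censored observations, a mixture model can be used: it assigns a probability that an individual belongs to a group that has not experienced (and will not experience) the event, as opposed to the group that does. This mixing probability can itself be made to depend on a regression equation, which makes the model more flexible.

The core of the paper compares models based on nine different distributions on three heavily censored real data sets and on one simulated data set. It finds that, in parametric models for interval data, the interval bounds can sometimes be ignored and the density evaluated at the censoring point used instead. In such cases, valid prediction and inference can be built on central values rather than on the full interval.

This shortcut, however, presupposes that the distribution and the censoring pattern of the data agree with the assumed model. If they do not, the model may need to be modified, or a nonparametric method may be more suitable. The paper offers practical guidance for applying parametric regression models to interval-censored data, particularly where individual differences and uncertainty must be taken into account. With this approach, researchers can more accurately uncover the patterns behind interval data and draw sound statistical inferences.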
A STUDY OF INTERVAL CENSORING 333
In a regression problem, the location parameter, $\mu$, is usually allowed to vary with the
conditions, $x_i$, under study. Here, I shall use the log link function,
$$\log(\mu_i) = g_1(x_i, \beta_1)$$
for the first four densities above, and the identity link function,
$$\mu_i = g_1(x_i, \beta_1)$$
for the five ‘logged’ densities, where $g_1(\cdot)$ is some general regression function that may be
nonlinear in the parameters. Note that, in the first case, $\mu$ refers to $y$ and, in the second
case, to $\log(y)$. Other link functions are also possible.
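A minimal sketch of the two link functions, written in Python for illustration. It assumes the simplest linear form for the regression function (the text allows it to be nonlinear), and the names `linear_predictor` and `location` are hypothetical, not part of the author's software.

```python
import math

def linear_predictor(x, beta):
    """Illustrative linear choice for g_1(x_i, beta_1): intercept plus slopes."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

def location(x, beta, link="log"):
    """Return mu_i for covariates x under the chosen link."""
    eta = linear_predictor(x, beta)
    if link == "log":
        return math.exp(eta)  # log(mu_i) = eta, so mu_i is positive, on the y scale
    if link == "identity":
        return eta            # mu_i = eta, on the log(y) scale for 'logged' densities
    raise ValueError(f"unknown link: {link}")

# Same covariate and coefficients under both links (values are arbitrary).
mu_log = location([2.0], [0.5, 0.25], link="log")
mu_identity = location([2.0], [0.5, 0.25], link="identity")
```

With these arbitrary coefficients the linear predictor is the same in both calls; only the scale on which it is interpreted changes.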
I shall also consider two other, less common, regression models, each in addition to
the regression equation for the location parameter, µ. These models, thus, contain two
regression equations (Lindsey, 1974b). The first extension allows the dispersion parameter,
φ, to vary with the conditions:
$$\log(\phi_i) = g_2(x_i, \beta_2)$$
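A double regression of this kind can be sketched as a negative log likelihood in which both parameters depend on a covariate. The sketch below is an assumption-laden illustration: it picks a log normal density, treats the dispersion as the variance of log(y), and uses linear regression functions with one covariate. The function names are hypothetical.

```python
import math

def lognormal_logpdf(y, mu, phi):
    """Log density of y when log(y) is normal with mean mu and variance phi."""
    z = math.log(y) - mu
    return -math.log(y) - 0.5 * math.log(2.0 * math.pi * phi) - z * z / (2.0 * phi)

def neg_loglik(beta1, beta2, xs, ys):
    """Minus log likelihood with separate regressions for location and dispersion."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta1[0] + beta1[1] * x             # identity link on the log(y) scale
        phi = math.exp(beta2[0] + beta2[1] * x)  # log link keeps the dispersion positive
        total -= lognormal_logpdf(y, mu, phi)
    return total
```

Setting the slope in `beta2` to zero recovers the usual single-regression model with constant dispersion, so the two models can be compared directly.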
The second extension involves a finite mixture whereby either the left or the right censored
observations may come from a mixture of two populations (Boag, 1949; Berkson and Gage,
1952; Cutler and Axtell, 1963; Haybittle, 1965; Farewell, 1977, 1982, 1986; Schmidt and
Witte, 1988; Kuk and Chen, 1992; Maller, 1993; Moulton and Halsey, 1995):
$$f_M(y; \mu, \phi) = (1 - \xi) z + \xi \left[ F(y^R; \mu, \phi) - F(y^L; \mu, \phi) \right]$$
where z is a binary indicator taking the value one if an observation is right censored and
zero otherwise, so that 1 − ξ is the probability of belonging to the group never having the
event. (Again, the integral can be approximated by a density.) Then, this probability can
also be allowed to vary with the conditions:
$$\log\left(\frac{\xi_i}{1 - \xi_i}\right) = g_3(x_i, \beta_3)$$
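The mixture contribution and its logistic regression can be sketched as follows. This is an illustration under stated assumptions: an exponential distribution with mean `mu` stands in for the cumulative function (so there is no dispersion parameter), right censoring is coded as an infinite right endpoint, and all function names are hypothetical.

```python
import math

def inv_logit(eta):
    """Invert log(xi / (1 - xi)) = eta to recover xi."""
    return 1.0 / (1.0 + math.exp(-eta))

def exp_cdf(y, mu):
    """Exponential CDF with mean mu (an illustrative stand-in for F)."""
    return 1.0 - math.exp(-y / mu)

def mixture_contribution(y_left, y_right, xi, mu):
    """(1 - xi) z + xi [F(y_right) - F(y_left)], with z = 1 when right censored."""
    z = 1.0 if math.isinf(y_right) else 0.0
    return (1.0 - xi) * z + xi * (exp_cdf(y_right, mu) - exp_cdf(y_left, mu))

# A right-censored observation whose left endpoint is the median of F.
xi = inv_logit(0.0)
contribution = mixture_contribution(10.0 * math.log(2.0), math.inf, xi, 10.0)
```

For a right-censored observation the bracketed term reduces to the survivor function at the left endpoint, so the contribution is the probability of never having the event plus the probability of having it later.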
I have written two general functions in the statistical language, R (Ihaka and Gentleman,
1996), that are available from me. They handle these two double regression models for
the nine distributions described above, as well as a number of discrete distributions. For
the continuous distributions, the user can choose either the usual density-based likelihood
or that based on the difference of cumulative functions in Equation (2). If $g_j(\cdot)$ is a linear
function of the parameters, the regression model may be specified using the Wilkinson and
Rogers (1973) notation.
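The two likelihood choices can be contrasted in a short sketch. Equation (2) is not reproduced in this section, so the code assumes it is the difference of the cumulative function at the interval endpoints; a normal distribution is used purely for illustration, and the function names are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(y, mu, sigma):
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def interval_likelihood(y_left, y_right, mu, sigma):
    """Probability of the interval: difference of cumulative functions."""
    return normal_cdf(y_right, mu, sigma) - normal_cdf(y_left, mu, sigma)

def density_approximation(y_left, y_right, mu, sigma):
    """Density at the midpoint times the interval width."""
    mid = 0.5 * (y_left + y_right)
    return normal_pdf(mid, mu, sigma) * (y_right - y_left)
```

For an interval that is narrow relative to the spread of the distribution the two values nearly coincide; for a wide interval the density-based value can even exceed one, which is where the choice matters.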
In the examples to follow, the inference criterion for comparing the models under
consideration, whether differing in functional form or in the number of parameters, will be
their ability to predict the observed data, that is how probable they make the data. In other
words, they will be compared directly through the minimized −log likelihood. When the
numbers of parameters in models differ, this may be penalized by adding the number of
estimated parameters, a form of the Akaike information criterion (AIC, see Akaike, 1973).
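The penalized criterion described here is the minimized −log likelihood plus the number of estimated parameters, which is half of the conventional AIC (−2 log L + 2p) and therefore ranks models identically. A minimal sketch, with hypothetical function names and made-up numbers that are not results from the paper:

```python
def penalized_neg_loglik(neg_loglik, n_params):
    """Minimized -log likelihood plus the number of estimated parameters."""
    return neg_loglik + n_params

def preferred(models):
    """models maps a name to (minimized -log L, number of parameters)."""
    return min(models, key=lambda name: penalized_neg_loglik(*models[name]))

# Hypothetical values for illustration only.
candidates = {
    "one-parameter model": (105.2, 1),
    "two-parameter model": (104.8, 2),
}
best = preferred(candidates)
```

Here the extra parameter lowers the −log likelihood by less than one unit, so the simpler model has the smaller penalized value.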
334 LINDSEY
Table 1. Intervals, in months, between visits within which subjects changed from
HIV-negative to positive and the corresponding frequencies, $n_i$, with time measured
starting in December, 1979, from Carstensen (1996).

$y_i^L$   $y_i^R$   $n_i$      $y_i^L$   $y_i^R$   $n_i$
   0        24       24          28        ∞         8
   0        39        2          39        57        3
  24        28        4          39       113        2
  24        39        1          39        ∞        15
  24        57       10          57        88        5
  24        88        3          57       113        1
  24       113        4          57        ∞        22
  24        ∞        61          88       113        1
  28        39        4          88        ∞        34
  28        88        1         113        ∞        92
Smaller values indicate relatively more preferable models. Intervals of precision for the
parameters of interest will be constructed using normed profile likelihoods (that is, the
likelihood is normed by dividing by its maximum value and the profile obtained by varying
the parameter of interest over the range of values under study while maximizing over all
other parameters).
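The construction of a normed profile likelihood can be sketched with a simple case. The example assumes a normal model in which the mean is the parameter of interest and the variance is the nuisance parameter, so that the inner maximization has a closed form; the data values and function names are hypothetical.

```python
import math

def profile_loglik(mu, ys):
    """Normal log likelihood at mu, maximized over the variance (constants dropped)."""
    n = len(ys)
    var_hat = sum((y - mu) ** 2 for y in ys) / n  # variance estimate for this fixed mu
    return -0.5 * n * math.log(var_hat)

def normed_profile(mu_grid, ys):
    """Profile likelihood divided by its maximum, evaluated on a grid of mu values."""
    mu_hat = sum(ys) / len(ys)  # overall maximum likelihood estimate of the mean
    max_ll = profile_loglik(mu_hat, ys)
    return [(mu, math.exp(profile_loglik(mu, ys) - max_ll)) for mu in mu_grid]

ys = [4.1, 5.3, 4.8, 6.0, 5.2]                # hypothetical observations
grid = [4.0 + 0.1 * k for k in range(21)]     # range of values under study
curve = normed_profile(grid, ys)
```

An interval of precision is then the set of grid values whose normed likelihood lies above a chosen cutoff; the cutoff itself is not specified in this section.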
In interval censored data, the likelihoods of parametric and nonparametric models are
not generally comparable because the latter do not give the probability of right-censored
observations. These are only used conditionally, in the risk set. However, I shall provide
some nonparametric results for visual comparison in graphs. Because this is not the centre
of interest, I shall use midpoints for calculating Kaplan-Meier estimates, rather than the
more sophisticated procedures of Turnbull (1974, 1976). The examples in Lindsey and
Ryan (1998) confirm that this is justified.
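A sketch of a midpoint-based Kaplan-Meier calculation, using the counts from Table 1. It assumes that an interval with a finite right endpoint is treated as an event at its midpoint and that a right-censored observation is censored at its left endpoint; the latter convention and the function names are assumptions, not details given in the text.

```python
import math

# Table 1 rows as (y_left, y_right, n); math.inf marks right censoring.
TABLE1 = [
    (0, 24, 24), (0, 39, 2), (24, 28, 4), (24, 39, 1), (24, 57, 10),
    (24, 88, 3), (24, 113, 4), (24, math.inf, 61), (28, 39, 4), (28, 88, 1),
    (28, math.inf, 8), (39, 57, 3), (39, 113, 2), (39, math.inf, 15),
    (57, 88, 5), (57, 113, 1), (57, math.inf, 22), (88, 113, 1),
    (88, math.inf, 34), (113, math.inf, 92),
]

def midpoint_records(rows):
    """Map each row to (time, is_event, n): midpoint if finite, left endpoint if censored."""
    records = []
    for y_left, y_right, n in rows:
        if math.isinf(y_right):
            records.append((float(y_left), False, n))
        else:
            records.append(((y_left + y_right) / 2.0, True, n))
    return records

def kaplan_meier(records):
    """Return [(t, S(t))] at each distinct event time."""
    event_times = sorted({t for t, is_event, _ in records if is_event})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(n for u, _, n in records if u >= t)
        events = sum(n for u, is_event, n in records if is_event and u == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve

km_curve = kaplan_meier(midpoint_records(TABLE1))
```

The frequencies in `TABLE1` sum to 297, the number of subjects reported for these data.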
3. HIV Infection
I shall first consider an example of highly censored observations with no explanatory
variables. Carstensen (1996) gives data on diagnosis of 297 Danish homosexuals for HIV
antibody positivity at six widely spaced time points between December, 1981, and May,
1989. Many people were not present for all visits. An additional complicating problem
with these data is that the time origin, when all individuals were uninfected, is unknown;
the data are doubly censored. Following Carstensen, I assume that the time origin is the
same for all individuals, provisionally taking this to be December 1979, and present the
data in this form in Table 1.
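The month scale in Table 1 follows from the provisional origin of December 1979. As a small check, with a hypothetical helper name, the first and last visit dates given above convert as follows:

```python
def months_since_origin(year, month, origin_year=1979, origin_month=12):
    """Whole months elapsed between the origin month and the given month."""
    return (year - origin_year) * 12 + (month - origin_month)

first_visit = months_since_origin(1981, 12)  # December 1981
last_visit = months_since_origin(1989, 5)    # May 1989
```

These evaluate to 24 and 113, the smallest nonzero and the largest finite endpoints in Table 1.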
Thus, one question concerns the point at which individuals were not yet infected. A
second question relates to estimation of the proportion of the group who were HIV-positive
by 1990. This is intimately linked to a third question: is there a subgroup that will never