Markov Chain Monte Carlo Methods: Efficient Analysis of Correlated Count Data


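Markov chain Monte Carlo (MCMC) is a family of simulation-based methods for numerical statistical computation that is particularly useful for complex models and large datasets. The paper "Markov Chain Monte Carlo Analysis of Correlated Count Data" was written by Siddhartha Chib (Washington University in St. Louis) and Rainer Winkelmann (IZA, Bonn). It focuses on how MCMC can be used to handle correlation among count outcomes.

The authors propose a class of models in which correlated latent effects capture the dependence among the counts. The framework covers several cases, including multivariate normal and multivariate-t assumptions on the latent effects. Conventional methods struggle with models of this kind. MCMC instead constructs a Markov chain whose stationary distribution is the posterior distribution, which makes estimation feasible in high-dimensional parameter spaces.

The algorithm relies on Metropolis-Hastings steps: a candidate value is drawn from a proposal distribution (a random walk is one common choice) and is then accepted or rejected, so that over many iterations the draws approximate the target distribution.

The paper pays particular attention to multivariate count data and illustrates the methods with two real datasets of six and sixteen correlated counts. Data of this kind may arise, for example, as the numbers of occurrences of several related events or as interaction frequencies in a social network. Univariate models cannot capture the dependence between such outcomes, which is what the proposed model is designed to handle.

The keywords "latent effects", "Metropolis-Hastings algorithm", "multivariate count data" and "Poisson-lognormal distribution" summarize the paper's subject and methods. For statisticians and economists, the paper offers an approach and practical guidance for applying MCMC to count data with complex correlation.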

Markov Chain Monte Carlo Analysis of Correlated Count Data

Siddhartha Chib

John M. Olin School of Business, Washington University, St. Louis, MO 63130 (chib@olin.wustl.edu)

Rainer Winkelmann

IZA Bonn, 53072 Bonn, Germany (winkelmann@iza.org)

This article is concerned with the analysis of correlated count data. A class of models is proposed in which the correlation among the counts is represented by correlated latent effects. Special cases of the model are discussed and a tuned and efficient Markov chain Monte Carlo algorithm is developed to estimate the model under both multivariate normal and multivariate-t assumptions on the latent effects. The methods are illustrated with two real data examples of six and sixteen variate correlated counts.

KEY WORDS: Latent effects; Metropolis–Hastings algorithm; Multivariate count data; Poisson–lognormal distribution.

A large literature on the analysis of count data is now available (Cameron and Trivedi 1998, Winkelmann 2000), but only a small portion of it deals with correlated counts. Correlated counts typically arise in three varieties: as genuine "multivariate" data on several related counted outcomes, as longitudinal measurements on a large number of subjects over a short period of time, or as measurements on a small set of subjects over a long period of time (the seemingly unrelated case). Although the longitudinal situation has been actively studied (e.g., see Hausman, Hall, and Griliches 1984; Blundell, Griffith, and Van Reenen 1995; Wooldridge 1997; Chib, Greenberg, and Winkelmann 1998, henceforth CGW) and a number of useful models and approaches are available, the other cases have been analyzed only under simplifying assumptions (King 1989; Jung and Winkelmann 1993; Gurmu and Elder 1998; Munkin and Trivedi 1999). The latter approaches either do not allow a general correlation structure or are difficult to extend beyond the case of a few outcomes.

This article is an effort to deal with both problems. To model the correlation among a large number of counts in a flexible fashion, we introduce a set of correlated latent effects, one for each subject and outcome. Conditioned on the latent effects, the counts are assumed to be independent Poisson with a conditional mean function that depends on the latent effects and a set of covariates. To complete the model we assume that the latent effects follow a multivariate Gaussian distribution with a zero mean vector and full unrestricted covariance matrix. As an extension of this model, we also consider the case in which the latent effects follow a multivariate-t distribution. To estimate this model, we develop a Markov chain Monte Carlo (MCMC) simulation method that is based on the work of CGW. Under this framework, we are able to sample the posterior distribution of the parameters and latent effects without computing the likelihood function of the model.
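To make the last point concrete: conditional on the latent effects, the density of a subject's counts is just a product of Poisson terms, so a sampler can update the latent effects directly and never needs the likelihood with those effects integrated out. The following is a minimal sketch of one such update, not the authors' tuned algorithm (whose details are not given in this excerpt). It assumes a plain random-walk Metropolis-Hastings proposal, two outcomes, and known parameters; the names `log_target`, `sample_latent`, `eta`, `d_inv` and `step`, as well as all numerical values, are illustrative assumptions.

```python
import math
import random


def log_target(b, y, eta, d_inv):
    """Log density of one subject's latent effects given its counts,
    up to an additive constant.

    eta[j] stands for the covariate part of the log mean of outcome j,
    and d_inv is the inverse of the latent-effect covariance matrix.
    """
    size = len(b)
    # Poisson log-likelihood terms with log mean eta[j] + b[j]
    log_pois = sum(y[j] * (eta[j] + b[j]) - math.exp(eta[j] + b[j])
                   for j in range(size))
    # Zero-mean Gaussian log density of the latent effects
    quad = sum(b[j] * d_inv[j][k] * b[k]
               for j in range(size) for k in range(size))
    return log_pois - 0.5 * quad


def sample_latent(y, eta, d_inv, n_iter, step, rng):
    """Random-walk Metropolis-Hastings draws (an assumed, simplified proposal)."""
    b = [0.0] * len(y)
    current = log_target(b, y, eta, d_inv)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [bj + rng.gauss(0.0, step) for bj in b]
        candidate = log_target(proposal, y, eta, d_inv)
        # Symmetric proposal: accept with probability min(1, target ratio).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            b, current = proposal, candidate
            accepted += 1
        draws.append(list(b))
    return draws, accepted / n_iter


# Illustrative inputs: d_inv is the inverse of [[0.4, 0.2], [0.2, 0.3]]
y = [3, 7]
eta = [0.5, 1.0]
d_inv = [[3.75, -2.5], [-2.5, 5.0]]
draws, rate = sample_latent(y, eta, d_inv, 5000, 0.4, random.Random(2))
```

Only the Poisson terms and the Gaussian density of the latent effects enter `log_target`. A full sampler would alternate an update like this with updates of the regression coefficients and the covariance matrix.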

The methods that we develop in this article can be applied to datasets with large numbers of correlated counts. We demonstrate this feature by fitting our model to a problem with 16 response variables. In our view this is an important illustration that highlights what is possible from a Bayesian simulation-based perspective.

The rest of the article is organized as follows. In Section 1 we present the basic model and some special cases and extensions. The fitting algorithm is developed in Section 2, while Section 3 gives two real data examples. Section 4 concludes.

1. MODEL

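Following the usual notation for multivariate data, let $y_i = (y_{i1}, \ldots, y_{iJ})$ denote the collection of $J$ counts on the $i$th subject in the sample, $i \le n$. Let $b_i = (b_{i1}, \ldots, b_{iJ})$ denote a set of $J$ subject and outcome-specific latent effects, and suppose that, conditioned on $b_i$ and parameters $\beta = (\beta_1, \ldots, \beta_J)$, the counts $y_{ij}$, $j \le J$, are independent Poisson:

$$y_{ij} \mid b_i, \beta \sim \text{Poisson}(\mu_{ij}), \qquad \mu_{ij} = \exp(x_{ij}'\beta_j + b_{ij}), \qquad j \le J,\ i \le n, \tag{1}$$

where $x_{ij}$ are covariates. To model the correlation among the counts, let

$$b_i \mid D \sim N_J(0, D), \qquad i \le n, \tag{2}$$

where $D$ is an unrestricted covariance matrix.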
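The two-stage structure of (1) and (2) can be illustrated by simulating from it: draw the latent effects from a zero-mean multivariate normal, then draw each count from a Poisson distribution whose log mean is the linear predictor plus the latent effect. The sketch below is only an illustration under assumed values. The helper names (`cholesky`, `draw_poisson`, `simulate_counts`), the coefficient values, the covariance matrix and the sample size are all hypothetical and do not come from the article.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    size = len(a)
    low = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(r + 1):
            s = a[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            low[r][c] = math.sqrt(s) if r == c else s / low[c][c]
    return low


def draw_poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(x, beta, d, rng):
    """Draw the counts of every subject under equations (1) and (2).

    x[i][j] is the covariate vector of subject i and outcome j,
    beta[j] the coefficient vector of outcome j, d the J x J covariance matrix.
    """
    low = cholesky(d)
    n_outcomes = len(beta)
    counts = []
    for x_i in x:
        z = [rng.gauss(0.0, 1.0) for _ in range(n_outcomes)]
        # b_i = L z has covariance L L' = D, so b_i ~ N_J(0, D)
        b_i = [sum(low[j][k] * z[k] for k in range(j + 1))
               for j in range(n_outcomes)]
        y_i = []
        for j in range(n_outcomes):
            log_mu = sum(xv * bv for xv, bv in zip(x_i[j], beta[j])) + b_i[j]
            y_i.append(draw_poisson(math.exp(log_mu), rng))
        counts.append(y_i)
    return counts


# Illustrative setup: J = 2 outcomes, an intercept and one covariate, n = 500
rng = random.Random(0)
beta = [[0.5, 0.3], [1.0, -0.2]]
d = [[0.4, 0.2], [0.2, 0.3]]
x = [[[1.0, rng.gauss(0.0, 1.0)] for _ in range(2)] for _ in range(500)]
y = simulate_counts(x, beta, d, rng)
```

Because the off-diagonal element of `d` is positive, the two simulated count series are positively correlated even though they are independent given the latent effects.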

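To understand some of the features of this model, let $v_{ij} = \exp(b_{ij})$ and $v_i = (v_{i1}, \ldots, v_{iJ})$. Then $v_i \sim LN_J(\mu, \Sigma)$, a multivariate lognormal distribution with mean $\mu = \exp(0.5\,\text{diag}(D))$ and dispersion matrix $\Sigma = \text{diag}(\mu)[\exp(D) - 11']\text{diag}(\mu)$, where the exponentials are taken elementwise. Hence, $y_{ij} \mid \lambda_{ij}, v_{ij} \sim \text{Poisson}(\lambda_{ij} v_{ij})$, where $\lambda_{ij} = \exp(x_{ij}'\beta_j)$. This is, therefore, in the form of a Poisson–lognormal distribution as discussed by Aitchison and Ho (1989).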
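The lognormal moments can be checked numerically. For exponentiated zero-mean normal effects with covariance matrix $D$, the standard identities give a mean of $\exp(0.5\,d_{jj})$ for element $j$ and a covariance of $\mu_j \mu_k (\exp(d_{jk}) - 1)$ between elements $j$ and $k$. The sketch below compares these closed forms with Monte Carlo estimates for an assumed 2 x 2 matrix. The function names, the matrix values and the number of draws are illustrative assumptions.

```python
import math
import random


def lognormal_moments(d):
    """Mean vector and dispersion matrix of v = exp(b) when b ~ N(0, D)."""
    size = len(d)
    mu = [math.exp(0.5 * d[j][j]) for j in range(size)]
    sigma = [[mu[j] * mu[k] * (math.exp(d[j][k]) - 1.0) for k in range(size)]
             for j in range(size)]
    return mu, sigma


def sample_moments(d, n_draws, rng):
    """Monte Carlo estimates of the same quantities for a 2 x 2 matrix D."""
    # Cholesky factor of a 2 x 2 covariance matrix, written out by hand
    l11 = math.sqrt(d[0][0])
    l21 = d[1][0] / l11
    l22 = math.sqrt(d[1][1] - l21 * l21)
    vs = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        vs.append((math.exp(l11 * z1), math.exp(l21 * z1 + l22 * z2)))
    mean = [sum(v[j] for v in vs) / n_draws for j in range(2)]
    cov = [[sum((v[j] - mean[j]) * (v[k] - mean[k]) for v in vs) / (n_draws - 1)
            for k in range(2)] for j in range(2)]
    return mean, cov


d = [[0.4, 0.2], [0.2, 0.3]]                 # illustrative covariance matrix
mu, sigma = lognormal_moments(d)
mc_mean, mc_cov = sample_moments(d, 200_000, random.Random(1))
```

With this many draws, `mc_mean` and `mc_cov` should agree with `mu` and `sigma` up to simulation noise. The positive off-diagonal entry of `sigma` is what induces correlation among the counts.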

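In this setup, the expectation and variance of the marginal joint distribution of $y_i$ can be derived without integration. Let $\tilde\lambda_{ij} = \lambda_{ij}\exp(0.5\,d_{jj})$ (i.e., $\tilde\lambda_{ij}$ and $\lambda_{ij}$ differ only by a constant factor), $\tilde\lambda_i = (\tilde\lambda_{i1}, \ldots, \tilde\lambda_{iJ})$, and $\tilde\Lambda_i = \text{diag}(\tilde\lambda_i)$. Applying the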

Journal of Business & Economic Statistics, October 2001, Vol. 19, No. 4, p. 428
