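RMark: A New Approach to MARK Analysis Using R

This article introduces RMark, an R package for MARK analyses that supports building linear models for wildlife population assessment. The authors, Jeffrey L. Laake and Eric A. Rexstad, are affiliated with the U.S. National Marine Fisheries Service and the University of St Andrews respectively, and both have extensive research experience in statistics and wildlife conservation.

RMark gives ecologists and biostatisticians an alternative way to build linear models for MARK. MARK is a standard tool for analyzing mark-recapture data to estimate population parameters such as abundance, survival and movement. What sets RMark apart is that models are specified with R formulas instead of MARK's graphical design matrix (DM) template; the design matrix, which links the predictor variables to the parameters being modeled, is generated from the formula.

RMark is not limited to basic models. It also supports more complex structures such as multistate (multistrata) and robust design models, and it helps researchers compare models using AIC (Akaike information criterion) or other criteria. Combined with the flexibility and statistical functionality of R, it lets ecologists and wildlife managers build models and estimate parameters from mark-recapture data more efficiently.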

C.3. How RMark works

CJS). With the all-different PIMS it is easier to display real parameters in PIM format, to associate labels with the real parameters, and to use model averaging on the real parameters from different models, which will have different simplified PIM coding.

It should be helpful to examine the recoded PIMS for some other models, so without describing how we got them, we show the recoded PIMS for parameter p with ~time, ~Time and ~Time+age models with Phi(~1) as shown above for design matrices:

~time or ~Time, group = Group 1

      2   3   4   5   6   7
  1   2   3   4   5   6   7
  2       3   4   5   6   7
  3           4   5   6   7
  4               5   6   7
  5                   6   7
  6                       7

~Time + age, group = Group 1

      2   3   4   5   6   7
  1   2   3   4   5   6   7
  2       8   9  10  11  12
  3          13  14  15  16
  4              17  18  19
  5                  20  21
  6                      22

Notice that the recoded PIMS for the ~Time+age model have 21 different parameters, as with the all-different PIMS, because with that model all of the rows of the design matrix for p are different. However, the PIM is recoded to start at 2 because Phi(~1) only requires a single parameter.
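The recoding described above can be imitated in a few lines of base R. The sketch below is not RMark's own code: the helper names recode_pim and as_pim and the way the design data are built are assumptions made for illustration. It creates design data for p with 7 occasions, builds the design matrix for each formula with model.matrix, and gives identical design-matrix rows the same index, starting at 2 because Phi(~1) uses index 1.

```r
# Design data for p in a CJS model with 7 occasions: one row per
# (cohort, recapture occasion) cell of the triangular PIM.
nocc <- 7
dd <- do.call(rbind, lapply(1:(nocc - 1), function(cohort) {
  data.frame(cohort = cohort, occ = (cohort + 1):nocc)
}))
dd$time <- factor(dd$occ)              # time as a factor (~time)
dd$Time <- dd$occ - min(dd$occ)        # time as a numeric trend (~Time)
dd$age  <- factor(dd$occ - dd$cohort)  # occasions since release, as a factor

# Hypothetical helper: identical design-matrix rows share one index.
recode_pim <- function(formula, data, offset = 0) {
  dm  <- model.matrix(formula, data)
  key <- apply(dm, 1, paste, collapse = " ")
  match(key, unique(key)) + offset
}

# Hypothetical helper: lay the indices out as an upper-triangular PIM.
as_pim <- function(index, data, nocc) {
  pim <- matrix(NA, nocc - 1, nocc - 1,
                dimnames = list(1:(nocc - 1), 2:nocc))
  pim[cbind(data$cohort, data$occ - 1)] <- index
  pim
}

# offset = 1 because Phi(~1) already occupies index 1
pim_time     <- as_pim(recode_pim(~time, dd, offset = 1), dd, nocc)
pim_Time_age <- as_pim(recode_pim(~Time + age, dd, offset = 1), dd, nocc)
print(pim_time, na.print = "")
print(pim_Time_age, na.print = "")
```

Replacing ~time with ~Time in the first call gives the same PIM, since both formulas produce one distinct design-matrix row per occasion.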

To a large extent the PIM/design simplification is transparent to you as a user in analyzing the data, except that simplification does create a conflict between the labeling of real parameters in the MARK output and the labeling of real parameters in output from summary and other functions in R. When the PIMS are simplified there is no attempt to create a unique meaningful label for the real parameters in the input file sent to mark.exe. It uses the label associated with the first real parameter translated to the new PIM coding. However, the labeling of real parameters in R is maintained with the use of the all-different PIM structure. So use R when you want to look at real parameter values with their labels, and ignore the labels in the MARK output file for real parameters.

Chapter C. RMark - an alternative approach to building linear models in MARK

PIM simplification is done for all parameters except those that use the mlogit link, like ψ in the multistrata model and pent in POPAN. The mlogit link ensures that a specified set of probabilities sums to 1, but it is implemented in MARK by summing over the unique real parameter indices and not the full set of real parameters. For example, suppose you had 5 strata (A to E) and wanted to estimate 4 real parameters for transitions from A while constraining the transitions to D and E to be equal (ψ^AB, ψ^AC, ψ^AD = ψ^AE). If you give these 4 parameters indices 1 to 4, then the mlogit link will work properly because it will sum across all 4, but if you give the parameters the indices 1,2,3,3 to constrain the last two parameters then the sum will include only the first 3 parameters and it will not count the third parameter twice. Thus, an all-different PIM structure is required for parameters that use the mlogit link, and any equality constraints must be implemented with the design matrix without any simplification of the PIMS. This restriction on mlogit links does not affect how you use RMark but may affect the speed at which MARK computes the parameter estimates, because the number of parameters and the size of the design matrix are larger without PIM simplification.
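A small numerical illustration of the mlogit issue, written in base R. This is only a sketch of the arithmetic described in the text, not MARK's implementation; the link-scale values in eta and the helper mlogit are hypothetical.

```r
# Hypothetical link-scale values for transitions from A to B, C, D and E,
# with the D and E values equal (the constraint described in the text).
eta <- c(AB = 0.2, AC = -0.5, AD = 0.1, AE = 0.1)

# Multinomial logit: each probability is exp(eta) over 1 + the sum of
# exp(eta) across the set; the remainder is the probability of staying in A.
mlogit <- function(eta_set, eta_all = eta_set) {
  exp(eta_all) / (1 + sum(exp(eta_set)))
}

# All-different indices 1,2,3,4: the sum runs over all four parameters.
psi_ok <- mlogit(eta)

# Simplified indices 1,2,3,3: the sum runs over the 3 unique indices only.
index   <- c(1, 2, 3, 3)
psi_bad <- mlogit(eta[!duplicated(index)], eta)

sum(psi_ok)   # below 1, leaving a valid probability of remaining in A
sum(psi_bad)  # above 1 with these values, so the sum-to-1 constraint fails
```

With the all-different indices the equality of the D and E transitions still holds, because it comes from the equal link-scale values rather than from shared indices.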

As we showed above, model.matrix in R is the workhorse for creation of design matrices from formulas; however, it cannot directly cope with individual covariates in the design matrix structure of MARK, which uses the name of the individual covariate in the design matrix. To be generally useful, the formula notation needed to encompass individual covariates and this led to 'trick number 4', which is probably the only clever trick in the RMark implementation. But we'll delay divulging it until section C.16.

There are just a few more things you should understand before we move on. Note that the indices are "stacked on top of each other" to get unique indices for all of the parameters. Thus, for our example there are 21 φ parameters numbered 1 to 21 and 21 p parameters numbered 22 to 42. This ordering of the index numbers is done in a consistent fashion for each model. For example, p always follows φ in the CJS model. However, in most places in the code where you have to specify indices (see C.11 - fixing real parameters) you will typically only need to identify the parameter with the parameter-specific index, which is the row number in the design matrix. Thus, in most cases for p, the parameters are identified by the indices 1 to 21. The only exception is situations in which you are referring to parameter indices across parameter types (e.g., both φ and p), as with the function covariate.predictions (C.16).
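The relationship between the parameter-specific index and the stacked index can be written out in base R. The helper names n_cells and all_index below are hypothetical and exist only to show the arithmetic for the 7-occasion CJS example.

```r
# Number of cells in a triangular PIM for a CJS model with nocc occasions
n_cells <- function(nocc) (nocc - 1) * nocc / 2

# Parameters in the order used for CJS: Phi first, then p
n_par  <- c(Phi = n_cells(7), p = n_cells(7))
offset <- cumsum(n_par) - n_par   # Phi starts after 0 indices, p after 21

# Hypothetical helper: parameter-specific index -> index across all parameters
all_index <- function(par, i) unname(offset[par]) + i

all_index("Phi", 1:21)  # 1 to 21
all_index("p", 1:21)    # 22 to 42
```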

For most models in MARK, the design matrix could be graphically displayed in the following manner:

  design for parameter 1   0                        0     0
  0                        design for parameter 2   0     0
  0                        0                        ...   0
  0                        0                        0     design for parameter k

where none of the different types of parameters (e.g., p, φ, etc.) share columns of the design matrix. Parameter types can share the same covariate (e.g., time for both φ and p), but the effect of that covariate is not the same for the different types of parameters, so the covariates are represented by different columns in the design matrix. For most models, this works quite well but there are some exceptions, including parameters "p" and "c" in the closed and robust design models, parameters "p1" and "p2" in the MSOccupancy model, and "GammaPrime" and "GammaDoublePrime" in the robust design models. In each of these cases the parameter has a different name but it is effectively the same type of parameter, so it is quite reasonable to build models in which they "share" covariates or are equated. To accommodate this exception, the parameter listed first is set as the dominant parameter and the formula for the dominant parameter is given a special argument "share" that can be set to TRUE or FALSE. If it is set to TRUE, then the design data are combined 'on the fly' and an extra column is added for the non-dominant parameter to enable fitting additive models. See section C.19 for an example.
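The block-diagonal layout described above can be assembled in base R as follows. This is a minimal sketch for a toy Phi(~1) p(~time) model with three rows per parameter; block_diag is a hypothetical helper and the toy design data are assumptions, not output from RMark.

```r
# Hypothetical helper: place each parameter's design matrix on the diagonal
# of a larger matrix so that no columns are shared between parameter types.
block_diag <- function(...) {
  blocks <- list(...)
  out <- matrix(0, sum(sapply(blocks, nrow)), sum(sapply(blocks, ncol)))
  row0 <- 0
  col0 <- 0
  for (b in blocks) {
    out[row0 + seq_len(nrow(b)), col0 + seq_len(ncol(b))] <- b
    row0 <- row0 + nrow(b)
    col0 <- col0 + ncol(b)
  }
  out
}

# Toy design data: three real parameters for each of Phi and p
phi_dm <- model.matrix(~1, data.frame(time = factor(1:3)))     # 3 x 1
p_dm   <- model.matrix(~time, data.frame(time = factor(2:4)))  # 3 x 3

dm <- block_diag(phi_dm, p_dm)
dm  # 6 rows (real parameters) by 4 columns (betas)
```

Both parameters could use time as a covariate here, yet each would get its own columns, which is why the effect of time can differ between φ and p.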

C.4. Dissecting the function "mark"

Now that you have been introduced to some of the ideas on the inner workings of RMark like design data and PIM structure and simplification, we'll discuss the steps that are taken in producing an analysis and along the way we will expand the concept of design data to include group structure. The function mark is actually quite simple because it is a convenience function that calls 5 other functions that actually do the work in the following order:

1. process.data

2. make.design.data

3. make.mark.model

4. run.mark.model

5. summary.mark

Why do you care? Primarily because the function has dual calling modes for efficiency and to enable adding/modifying the design data. Depending on the arguments that you pass mark, it will either start with process.data or it will skip directly to make.mark.model. This allows you to do the first 2 steps once, optionally modify the design data, and then run a whole series of models on the data without repeating the first 2 steps in each call to mark.
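The two calling modes might be used as in the following sketch. It assumes the RMark package is installed and, for the mark calls, that the MARK program itself is available; the flag run.models and the object names m0, m1 and m2 are introduced here only for illustration.

```r
# Sketch of the two calling modes; needs the RMark package, and the mark()
# calls additionally need the MARK program to be installed.
run.models <- FALSE  # assumption: set to TRUE on a machine where MARK is installed

if (requireNamespace("RMark", quietly = TRUE)) {
  library(RMark)
  data(dipper)

  # Steps 1 and 2, done once
  dipper.process <- process.data(dipper, model = "CJS", groups = "sex")
  dipper.ddl     <- make.design.data(dipper.process)

  if (run.models) {
    # Mode 1: pass the raw data frame; mark() starts with process.data
    m0 <- mark(dipper, model = "CJS")

    # Mode 2: pass processed data and design data; mark() skips the first
    # 2 steps and goes directly to make.mark.model
    m1 <- mark(dipper.process, dipper.ddl,
               model.parameters = list(Phi = list(formula = ~1),
                                       p   = list(formula = ~time)))
    m2 <- mark(dipper.process, dipper.ddl,
               model.parameters = list(Phi = list(formula = ~sex),
                                       p   = list(formula = ~1)))
  }
}
```

In the second mode the same dipper.process and dipper.ddl objects are reused for every model, so any changes made to the design data apply to all of them.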

C.4.1. Function process.data

The first function, process.data, literally does what its name implies. It takes the input data frame and the user-defined arguments and creates a list (processed data) containing the data and numerous defined attributes that the remaining functions use in defining the analysis models. The following are the primary attributes that are set:

1. model: the type of analysis model (e.g., "CJS", "Known", "POPAN"); see help for function mark (?mark) for a complete listing of the supported models

2. begin.time: the time of the first capture/release occasion for labeling

3. time.intervals: the lengths of the time intervals between capture occasions

4. groups: the list of factor variables in the data to define groups

5. initial.ages: the age of animals at first capture/release corresponding to the levels of the age grouping variable (age.var)

6. nocc: number of capture/encounter occasions, which is determined from the contents of the "ch" field in the data and the type of analysis model (model).

As an example, we will use the dipper data and the field sex to create 2 groups in the data and define fictitious beginning time and time intervals for the data:

> library(RMark)
> data(dipper)
> dipper.process=process.data(dipper,model="CJS",begin.time=1980,
                              time.intervals=c(1,.5,1,.75,.25,1),groups="sex")

The resulting object (dipper.process) is a list containing the data and its attributes. The names of the elements of the list can be viewed with the names function:


> names(dipper.process)
 [1] "data"             "model"            "mixtures"         "freq"
 [5] "nocc"             "nocc.secondary"   "time.intervals"   "begin.time"
 [9] "age.unit"         "initial.ages"     "group.covariates" "nstrata"
[13] "strata.labels"

Note that there are many more attributes than described above. Some, like mixtures, nstrata, nocc.secondary and strata.labels, are only relevant to specific models but these are often included with a default, NULL or empty value for models in which they are not relevant. Specific elements of the list can be extracted as illustrated:

> dipper.process$nocc
[1] 7
> dipper.process$group.covariates
     sex
1 Female
2   Male
> dipper.process$begin.time
[1] 1980
> dipper.process$strata.labels
character(0)
> dipper.process$nocc.secondary
NULL
> dipper.process$time.intervals
[1] 1.00 0.50 1.00 0.75 0.25 1.00

From the first 5 rows of the field freq it is obvious that this is the structure used to create the frequency data for the MARK input file with the defined grouping structure and the column labels as the group labels:

> dipper.process$freq[1:5,]
  sexFemale sexMale
1         1       0
2         1       0
3         1       0
4         1       0
5         1       0
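A frequency table with this shape can be built from a grouping factor in base R. The sketch below uses a made-up three-animal data frame and model.matrix; it only shows one way such a structure could arise and is not a description of how process.data creates it internally.

```r
# Hypothetical data: one row per animal with a capture history and a sex factor
toy <- data.frame(ch  = c("1010000", "1100000", "0010100"),
                  sex = factor(c("Female", "Female", "Male")),
                  stringsAsFactors = FALSE)

# One indicator column per group; removing the intercept (- 1) keeps a
# column for every level of sex, labelled sexFemale and sexMale.
freq <- model.matrix(~ sex - 1, toy)
freq

# Column sums give the number of animals in each group
colSums(freq)
```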

The structure of the encounter history and the analysis depends on the analysis model that you choose, like "CJS" above. Thus, it is necessary to process the data frame (data) containing the encounter history and a chosen model to define the relevant values which will be used by the remaining functions. For example, the number of capture occasions (nocc) is automatically computed based on the length of the encounter history (ch) in data; however, this is dependent on the type of analysis model. For models such as "CJS", "Pradel" and others, it is simply the length of ch, whereas for "Burnham" and "Barker" models, the encounter history contains capture and resight/recovery values so nocc is one-half the length of ch. Likewise, the number of time.intervals depends on the model. For models such as "CJS", "Pradel" and others, the number of time.intervals is nocc-1, whereas for capture-recovery (or resight) models the number of time.intervals is nocc. The default time interval is unit time (1) and if this is adequate, the function will assign the appropriate length; otherwise the appropriate number of values must be given.
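The rules in this paragraph can be summarized in a short base R sketch. It covers only the models named in the text; the helper names are hypothetical and the real process.data handles many more model types.

```r
# Models named in the text whose histories hold two values per occasion
two_per_occasion <- c("Burnham", "Barker")

# Hypothetical helper: number of occasions from the encounter history length
nocc_from_ch <- function(ch, model) {
  len <- nchar(ch[1])
  if (model %in% two_per_occasion) len / 2 else len
}

# Hypothetical helper: number of time intervals expected for the model
n_intervals <- function(nocc, model) {
  if (model %in% two_per_occasion) nocc else nocc - 1
}

# Default unit-length intervals of the appropriate length
default_intervals <- function(ch, model) {
  rep(1, n_intervals(nocc_from_ch(ch, model), model))
}

nocc_from_ch("1010110", "CJS")               # 7
default_intervals("1010110", "CJS")          # six intervals of length 1
nocc_from_ch("10001100000000", "Burnham")    # 7
default_intervals("10001100000000", "Burnham")  # seven intervals of length 1
```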


A processed data frame can only be analyzed using the model that was specified in the call to process.data. The model value is used by the functions make.design.data and make.mark.model to define the design data and the appropriate input file structure for MARK. Thus, if the data are going to be analyzed with different underlying models, create different processed data objects, possibly using the type of model as an extension. For example,

dipper.cjs=process.data(dipper,model="CJS")

dipper.popan=process.data(dipper,model="POPAN")

The process.data function will report any inconsistencies in the lengths of the capture history values and any invalid entries in the capture history. For example, with the "CJS" model, the capture history should only contain 0 and 1 whereas for "Barker" it can contain 0,1,2. For "Multistrata" models, the code will automatically identify the number of strata (nstrata) and strata labels (strata.labels) based on the unique alphabetic codes used in the capture histories. For "Robust" design models, the number of secondary occasions (nocc.secondary) is determined by the specified time.intervals.
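Checks of this kind are easy to express in base R. The following sketch is an illustration only: check_ch and strata_labels are hypothetical helpers, not functions from RMark, and the capture histories are made up.

```r
# Hypothetical helper: report inconsistent lengths and invalid entries
check_ch <- function(ch, allowed = c("0", "1")) {
  chars <- unique(unlist(strsplit(ch, "")))
  list(same_length = length(unique(nchar(ch))) == 1,
       invalid     = setdiff(chars, allowed))
}

# Hypothetical helper: strata labels are the unique alphabetic codes
strata_labels <- function(ch) {
  chars <- unique(unlist(strsplit(ch, "")))
  sort(chars[grepl("[A-Za-z]", chars)])
}

check_ch(c("1010110", "0110000"))                              # consistent, no invalid entries
check_ch(c("1010110", "01200"))                                # lengths differ, "2" is invalid
check_ch(c("1020110", "0110002"), allowed = c("0", "1", "2"))  # values allowed for "Barker"
strata_labels(c("A0B0", "0BCA"))                               # strata A, B and C
```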

The argument begin.time specifies the time for the first capture/release occasion. This is used in creating the levels of the time factor variable in the design data and for labeling parameters. If begin.time varies by group, enter a vector of times with one for each group.

The argument groups can contain one or more character strings specifying the names of factor variables contained in data. A group is created for each unique combination of the levels of the factor variables. Further examples of grouping and use of age variables will be given later and they can be found in the help documentation with R (?process.data and ?example.data).
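To see how the number of groups follows from the factor variables, consider this base R sketch. The data frame and the second factor, colony, are hypothetical and only serve to show how unique combinations of levels are counted.

```r
# Hypothetical data with two factor variables
toy <- data.frame(sex    = factor(c("Female", "Male", "Male", "Female", "Male")),
                  colony = factor(c("N", "N", "S", "S", "N")))

# One group per combination of levels that occurs in the data
grp <- interaction(toy$sex, toy$colony, drop = TRUE)
nlevels(grp)   # 4 groups
table(grp)     # number of animals in each group
```

With groups=c("sex","colony") in a call to process.data, each of these combinations would correspond to one group.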

C.4.2. Function make.design.data

The next step is to create the design data and PIM structure, which depend on the selected type of analysis model (e.g., CJS or Multistrata), number of occasions, grouping variables and other attributes of the data that were defined in the processed data. The processed data is the first and primary argument to the function make.design.data, which creates the design data. For parameters with triangular PIMS the default design data are cohort, age and time and any grouping factor variables that were defined. For parameters with square PIMS, there is only one row so the cohort variable is not automatically included in the design data, but there are ways to create a cohort structure in this case with groups.

In creating the factor variables for cohort, age, and time, a separate factor level is created for each value of the variable. However, you can optionally bin the values into intervals in creating the factor variable. For example, if birds were always classified as either young (< 1) or as adult (1+), then age.bins could be specified in the call to make.design.data. However, if you wanted the option to model age based on all levels of the factor and other models with some ages collapsed into intervals, then it is best to allow make.design.data to create the default factor variables and create additional design data with the function add.design.data or using R statements and functions. There are many other features of make.design.data including restricting parameters to use "time" or "constant" PIMS, setting the subtraction stratum for "Multistrata" models, and automatic removal of unused design data. These features are described in the help files (?make.design.data and ?add.design.data) and they are described in more detail in later sections.
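The idea of binning ages into intervals can be shown with the base R function cut. The ages and cut points below are hypothetical; in RMark itself the bins would be supplied through age.bins or add.design.data as described above.

```r
# Hypothetical ages (in years) as they might appear in design data
age <- 0:5

# Default treatment: a separate factor level for each age
age_factor <- factor(age)
nlevels(age_factor)   # 6 levels

# Binned treatment: young (< 1) versus adult (1+); right = FALSE makes the
# intervals closed on the left, i.e. [0,1) and [1,Inf)
age_binned <- cut(age, breaks = c(0, 1, Inf), right = FALSE,
                  labels = c("young", "adult"))
table(age_binned)
```

Keeping both versions of the variable side by side corresponds to the recommendation above to retain the default factor and add the collapsed one as extra design data.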

For now, a simple example with the dipper data will suffice to illustrate this step and explain the basic concepts. But before we do that we'll reprocess the data to use annual time intervals rather than the fictitious ones used above:
