
Running a program
As indicated above, you run a program by typing its name from the command line. You should
know which folder your sequence file, tree file, and control file are in, relative to your working folder.
If you are inexperienced, you may copy the executables to the folder containing your data files. Depending
on the model used, codeml may need a data file such as grantham.dat, dayhoff.dat,
jones.dat, wag.dat, mtREV24.dat, or mtmam.dat, so you should copy these files as well.
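As a minimal sketch of the check described above, the following Python snippet reports which of the named data files are absent from a folder. The function name missing_data_files and the constant CODEML_DATA_FILES are hypothetical helpers introduced here for illustration; they are not part of PAML. Which file codeml actually needs depends on the model, so pass only the names relevant to your analysis.

```python
from pathlib import Path

# File names as listed in the manual text. Which one is required depends on
# the model chosen, so this tuple is only a convenient default (assumption).
CODEML_DATA_FILES = (
    "grantham.dat", "dayhoff.dat", "jones.dat",
    "wag.dat", "mtREV24.dat", "mtmam.dat",
)


def missing_data_files(folder, needed=CODEML_DATA_FILES):
    """Return the names in `needed` that are not regular files in `folder`."""
    folder = Path(folder)
    return [name for name in needed if not (folder / name).is_file()]
```

Calling missing_data_files(".", ["wag.dat"]) before a run returns an empty list when wag.dat is already in the working folder.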
The programs produce result files, with names such as rub, lnf, rst, or rates. You should
not use these names for your own files, as otherwise they will be overwritten.
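To guard against accidental overwriting, one could scan the working folder for files that already carry one of these names before starting a run. The sketch below does that in Python. The helper clashing_files is hypothetical, and the set of names covers only the four mentioned above; the text says "names such as", so the programs may write other files too.

```python
from pathlib import Path

# Only the result-file names mentioned in the manual text; the real set of
# output files may be larger (assumption: this list is not exhaustive).
RESERVED_OUTPUT_NAMES = {"rub", "lnf", "rst", "rates"}


def clashing_files(folder):
    """Return sorted names of files in `folder` that a run could overwrite."""
    folder = Path(folder)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.name in RESERVED_OUTPUT_NAMES
    )
```

If the returned list is not empty, rename or move those files before running the program.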
Example data sets
The examples/ folder contains many example data sets. They were used in the original papers to
test the new methods, and I included them so that you could duplicate our results in the papers.
Sequence alignments, control files, and detailed readme files are included. They are intended to
help you get familiar with the input data formats and with interpretation of the results, and also to
help you discover bugs in the program. If you are interested in a particular analysis, get a copy of
the paper that described the method and analyze the example dataset to duplicate the published
results. This is particularly important because the manual, as it is written, describes the meanings of
the control variables used by the programs but does not clearly explain how to set up the control file
to conduct a particular analysis.
examples/HIVNSsites/: This folder contains example data files for the HIV-1 env V3 region
analyzed in Yang et al. (2000b). The data set is for demonstrating the NSsites models
described in that paper, that is, models of variable ω ratios among amino acid sites. Those
models are called the “random-sites” models by Yang & Swanson (2002) since a priori we
do not know which sites might be highly conserved and which under positive selection.
They are also known as “fishing-expedition” models. The included data set is the 10th data
set analyzed by Yang et al. (2000b) and the results are in table 12 of that paper. Look at the
readme file in that folder.
examples/lysin/: This folder contains the sperm lysin genes from 25 abalone species
analyzed by Yang, Swanson & Vacquier (2000a) and Yang and Swanson (2002). The data
set is for demonstrating both the “random-sites” models (as in Yang, Swanson & Vacquier
(2000a)) and the “fixed-sites” models (as in Yang and Swanson (2002)). In the latter paper,
we used structural information to partition amino acid sites in the lysin into the “buried” and
“exposed” classes and assigned and estimated different ω ratios for the two partitions. The
hypothesis is that the sites exposed on the surface are likely to be under positive selection.
Look at the readme file in that folder.
examples/lysozyme/: This folder contains the primate lysozyme c genes of Messier and
Stewart (1997), re-analyzed by Yang (1998). This is for demonstrating codon models that
assign different ω ratios for different branches in the tree, useful for testing positive
selection along lineages. Those models are sometimes called branch models or branch-specific
models. Both the “large” and the “small” data sets in Yang (1998) are included.
Those models require the user to label branches in the tree, and the readme file and included
tree file explain the format in great detail. See also the later section “Tree file and
representations of tree topology” about specifying branch/node labels.
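As a rough illustration of branch labelling, the sketch below assumes the convention of marking a branch with "#" followed by an integer after the corresponding clade in the parenthetical tree. The readme file and the tree-file section remain the authoritative description of the format. The tree string and taxon names are hypothetical, and branch_labels is a helper written for this example only.

```python
import re


def branch_labels(newick):
    """Return the sorted distinct integer labels written as '#<n>' in a tree string."""
    return sorted({int(m) for m in re.findall(r"#\s*(\d+)", newick)})


# Hypothetical tree: the branch leading to the (human, chimp) clade is labelled 1.
tree = "((human, chimp) #1, (mouse, rat));"
print(branch_labels(tree))  # → [1]
```

A quick check of this kind can confirm that the tree file contains the labels you intended before running a branch model. It does not validate the tree itself.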