in the GT field (for example ‘./.’ for a diploid genotype and ‘.’ for haploid genotype). The meanings of the
separators are as follows (see the PS field below for more details on incorporating phasing information into the
genotypes):
◦ / : genotype unphased
◦ | : genotype phased
• DP : read depth at this position for this sample (Integer)
• FT : sample genotype filter indicating if this genotype was “called” (similar in concept to the FILTER field).
Again, use PASS to indicate that all filters have been passed, a semicolon-separated list of codes for filters
that fail, or ‘.’ to indicate that filters have not been applied. These values should be described in the meta-
information in the same way as FILTERs (String, no whitespace or semicolons permitted)
• GL : genotype likelihoods comprised of comma separated floating point log10-scaled likelihoods for all possible
genotypes given the set of alleles defined in the REF and ALT fields. In presence of the GT field the same
ploidy is expected and the canonical order is used; without GT field, diploidy is assumed. If A is the allele in
REF and B,C,... are the alleles as ordered in ALT, the ordering of genotypes for the likelihoods is given by:
F(j/k) = (k*(k+1)/2)+j. In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the
ordering is: AA,AB,BB,AC,BC,CC, etc. For example: GT:GL 0/1:-323.03,-99.29,-802.53 (Floats)
• GLE : genotype likelihoods of heterogeneous ploidy, used in presence of uncertain copy number. For example:
GLE=0:-75.22,1:-223.42,0/0:-323.03,1/0:-99.29,1/1:-802.53 (String)
• PL : the phred-scaled genotype likelihoods rounded to the closest integer (and otherwise defined in the same
way as the GL field) (Integers)
• GP : the phred-scaled genotype posterior probabilities (and otherwise defined in the same way as the GL field);
intended to store imputed genotype probabilities (Floats)
• GQ : conditional genotype quality, encoded as a phred quality −10log10 p(genotype call is wrong, conditioned
on the site’s being variant) (Integer)
• HQ : haplotype qualities, two comma separated phred qualities (Integers)
• PS : phase set. A phase set is defined as a set of phased genotypes to which this genotype belongs. Phased
genotypes for an individual that are on the same chromosome and have the same PS value are in the same
phased set. A phase set specifies multi-marker haplotypes for the phased genotypes in the set. All phased
genotypes that do not contain a PS subfield are assumed to belong to the same phased set. If the genotype in
the GT field is unphased, the corresponding PS field is ignored. The recommended convention is to use the
position of the first variant in the set as the PS identifier (although this is not required). (Non-negative 32-bit
Integer)
• PQ : phasing quality, the phred-scaled probability that alleles are ordered incorrectly in a heterozygote (against
all other members in the phase set). We note that we have not yet included the specific measure for precisely
defining “phasing quality”; our intention for now is simply to reserve the PQ tag for future use as a measure
of phasing quality. (Integer)
• EC : comma separated list of expected alternate allele counts for each alternate allele in the same order as
listed in the ALT field (typically used in association analyses) (Integers)
• MQ : RMS mapping quality, similar to the version in the INFO field. (Integer)
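The genotype ordering used by GL, PL and GP, and the phred scaling used by PL, can be illustrated with a short Python sketch. The function names (genotype_index, genotype_order, gl_to_pl) are hypothetical and not part of this specification; the sketch assumes diploid genotypes and takes "phred-scaled" to mean -10 times the log10-scaled likelihood, with no further normalisation.

```python
def genotype_index(j: int, k: int) -> int:
    """Position of diploid genotype j/k in a GL/PL/GP list.

    Implements F(j/k) = (k*(k+1)/2) + j, where j and k are allele
    indices (0 = REF, 1 = first ALT, ...) and j <= k.
    """
    if j > k:
        # The ordering is defined for j <= k; sort the pair first.
        j, k = k, j
    return k * (k + 1) // 2 + j


def genotype_order(n_alleles: int) -> list[str]:
    """All unphased diploid genotypes in canonical order."""
    return [f"{j}/{k}" for k in range(n_alleles) for j in range(k + 1)]


def gl_to_pl(gl: list[float]) -> list[int]:
    """Phred-scale log10 likelihoods and round to the closest integer.

    Assumption: no normalisation is applied. Python's round() rounds
    exact halves to the nearest even integer.
    """
    return [round(-10 * value) for value in gl]


if __name__ == "__main__":
    print(genotype_order(3))   # triallelic site: AA,AB,BB,AC,BC,CC
    print(genotype_index(1, 2))  # index of genotype BC
    print(gl_to_pl([-323.03, -99.29, -802.53]))
```

With three alleles, genotype_order returns 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, which corresponds to the AA,AB,BB,AC,BC,CC ordering given for GL.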
If any of the fields is missing, it is replaced with the missing value. For example if the FORMAT is GT:GQ:DP:HQ
then 0|0:.:23:23,34 indicates that GQ is missing. Trailing fields can be dropped (with the exception of the GT
field, which should always be present if specified in the FORMAT field).
See below for additional genotype fields used to encode structural variants. Additional Genotype fields can be
defined in the meta-information. However, software support for such fields is not guaranteed.