Compute basic population global statistics
pop_global_stats.Rd
This function computes basic population global statistics, following the notation in Nei (1987), which in turn is based on Nei and Chesser (1983):
- observed heterozygosity (\(\hat{h}_o\), column header Ho)
- expected heterozygosity, also known as gene diversity (\(\hat{h}_s\), column header Hs)
- total heterozygosity (\(\hat{h}_t\), column header Ht)
- genetic differentiation between subpopulations (\(D_{ST}\), column header Dst)
- corrected total population diversity (\(\hat{h'}_t\), column header Htp)
- corrected genetic differentiation between subpopulations (\(D'_{ST}\), column header Dstp)
- \(\hat{F}_{ST}\) (column header Fst)
- corrected \(\hat{F'}_{ST}\) (column header Fstp)
- \(\hat{F}_{IS}\) (column header Fis)
- Jost's \(\hat{D}\) (column header Dest)
Usage
pop_global_stats(.x, by_locus = FALSE, n_cores = bigstatsr::nb_cores())
Arguments
- .x
a gen_tibble (usually grouped, as obtained by using dplyr::group_by(); use on a single population will return a number of quantities as NA/NaN)
- by_locus
boolean, determining whether the statistics should be returned by locus (TRUE), or as a single genome-wide value (FALSE, the default).
- n_cores
number of cores to be used; it defaults to bigstatsr::nb_cores()
Details
We use the notation of Nei (1987). That notation was for loci with \(m\) alleles, but in our case we only have two alleles, so \(m=2\).
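As a minimal sketch of what \(m=2\) means in practice, the snippet below takes one biallelic locus in a single population and derives the two allele frequencies and the homozygote proportions from genotype dosages. The dosage coding (0, 1 or 2 copies of the alternate allele) and all object names are assumptions made for illustration; they are not taken from the package internals.

```r
# Hypothetical genotype dosages (copies of the alternate allele) for one
# biallelic locus in a single population of six diploid individuals.
dosage <- c(0, 1, 1, 2, 0, 1)

# Frequencies of the two alleles (m = 2): alternate and reference.
x_alt <- mean(dosage) / 2
x_ref <- 1 - x_alt

# Proportions of the two homozygote classes in this population.
X_hom <- c(ref = mean(dosage == 0), alt = mean(dosage == 2))

# With only two alleles, one minus the summed homozygote proportions
# is simply the proportion of heterozygotes (dosage == 1).
h_obs <- 1 - sum(X_hom)
```

With two alleles the sum over \(i\) in the formulas that follow has only two terms, which is why a single dosage vector per locus is enough to compute them.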
Within population observed heterozygosity \(\hat{h}_o\) for a locus with \(m\) alleles is defined as:
\(\hat{h}_o= 1-\sum_{k=1}^{s} \sum_{i=1}^{m} \hat{X}_{kii}/s\)
where
\(\hat{X}_{kii}\) represents the proportion of homozygote \(i\) in the sample for the \(k\)th population and
\(s\) the number of populations,
following equation 7.38 in Nei (1987), p. 164.
Within-population expected heterozygosity (gene diversity) \(\hat{h}_s\) for a locus with \(m\) alleles is defined as:
\(\hat{h}_s=(\tilde{n}/(\tilde{n}-1))[1-\sum_{i=1}^{m}\bar{\hat{x}_i^2}-\hat{h}_o/(2\tilde{n})]\)
where
\(\tilde{n}=s/\sum_k 1/n_k\) (i.e. the harmonic mean of \(n_k\)) and
\(\bar{\hat{x}_i^2}=\sum_k \hat{x}_{ki}^2/s\),
following equation 7.39 in Nei (1987), p. 164.
Total heterozygosity (total gene diversity) \(\hat{h}_t\) for a locus with \(m\) alleles is defined as:
\(\hat{h}_t = 1-\sum_{i=1}^{m} \bar{\hat{x}}_i^2 + \hat{h}_s/(\tilde{n}s) - \hat{h}_o/(2\tilde{n}s)\)
where
\(\bar{\hat{x}}_i=\sum_k \hat{x}_{ki}/s\),
following equation 7.40 in Nei (1987), p. 164.
The amount of gene diversity among samples \(D_{ST}\) is defined as:
\(D_{ST} = \hat{h}_t - \hat{h}_s\)
following the equation provided in the text at the top of page 165 in Nei (1987).
The corrected amount of gene diversity among samples \(D'_{ST}\) is defined as:
\(D'_{ST} = (s/(s-1))D_{ST}\)
following the equation provided in the text at the top of page 165 in Nei (1987).
Total corrected heterozygosity (total gene diversity) \(\hat{h'}_t\) is defined as:
\(\hat{h'}_t = \hat{h}_s + D'_{ST}\)
following the equation provided in the text at the top of page 165 in Nei (1987).
\(\hat{F}_{IS}\) is defined as:
\(\hat{F}_{IS} = 1 - \hat{h}_o/\hat{h}_s\)
following equation 7.41 in Nei (1987), p. 164.
\(\hat{F}_{ST}\) is defined as:
\(\hat{F}_{ST} = 1 - \hat{h}_s/\hat{h}_t = D_{ST}/\hat{h}_t\)
following equation 7.43 in Nei (1987), p. 165.
\(\hat{F'}_{ST}\) is defined as:
\(\hat{F'}_{ST} = D'_{ST}/\hat{h'}_t\)
following the explanation provided in the text at the top of page 165 in Nei (1987).
Jost's \(\hat{D}\) is defined as:
\(\hat{D} = (s/(s-1))((\hat{h'}_t-\hat{h}_s)/(1-\hat{h}_s))\)
as defined by Jost (2008).
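The definitions above can be traced by hand for a single locus. The base R sketch below is an illustration only, not the package's implementation: the helper name nei_stats_one_locus() and the toy data are hypothetical, genotypes are assumed to be coded as dosages (0, 1, 2) with no missing values, and the first sum in \(\hat{h}_t\) is taken over the squared mean allele frequencies across populations, as in Nei (1987).

```r
# Hypothetical helper: per-locus statistics for one biallelic locus.
# dosage: copies of the alternate allele (0, 1, 2), no missing values.
# pop:    population label of each individual.
nei_stats_one_locus <- function(dosage, pop) {
  groups <- split(dosage, pop)
  s <- length(groups)                  # number of populations
  n_k <- lengths(groups)               # sample size per population
  n_tilde <- s / sum(1 / n_k)          # harmonic mean of the n_k

  # Allele frequencies: one row per population, one column per allele.
  x_alt <- vapply(groups, function(d) mean(d) / 2, numeric(1))
  x <- cbind(ref = 1 - x_alt, alt = x_alt)

  # Observed heterozygosity: heterozygote proportion averaged over populations.
  ho <- mean(vapply(groups, function(d) mean(d == 1), numeric(1)))

  mean_sq <- colMeans(x^2)             # mean over populations of squared frequencies
  sq_mean <- colMeans(x)^2             # squared mean frequency over populations

  hs <- n_tilde / (n_tilde - 1) * (1 - sum(mean_sq) - ho / (2 * n_tilde))
  ht <- 1 - sum(sq_mean) + hs / (n_tilde * s) - ho / (2 * n_tilde * s)
  dst <- ht - hs
  dstp <- s / (s - 1) * dst
  htp <- hs + dstp

  c(Ho = ho, Hs = hs, Ht = ht, Dst = dst, Htp = htp, Dstp = dstp,
    Fst = dst / ht, Fstp = dstp / htp, Fis = 1 - ho / hs,
    Dest = s / (s - 1) * (htp - hs) / (1 - hs))
}

# Toy data: two populations of four individuals each.
dosage <- c(0, 0, 1, 1, 2, 2, 1, 1)
pop <- rep(c("A", "B"), each = 4)
res <- nei_stats_one_locus(dosage, pop)
round(res, 4)
```

For this toy locus the alternate allele has frequency 0.25 in population A and 0.75 in population B, which gives Hs = 5/12, Ht = 25/48, Fst = 0.2 and Fis = -0.2. With a single population, s - 1 is zero, so the corrected quantities are undefined, in line with the NA/NaN values mentioned for ungrouped input.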
All these statistics are first computed by locus, and then averaged across loci (including any monomorphic locus) to obtain genome-wide values. The function uses the same algorithm as hierfstat::basic.stats() but is optimized for speed and memory usage.
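To illustrate what including monomorphic loci does to the averages, the sketch below uses a hypothetical table of per-locus values for three loci, one of which is monomorphic. Only the three heterozygosity columns are shown, and the numbers are invented for illustration rather than produced by the function.

```r
# Hypothetical per-locus values; locus "l3" is monomorphic, so all of its
# heterozygosities are zero.
per_locus <- rbind(
  l1 = c(Ho = 0.50, Hs = 0.40, Ht = 0.50),
  l2 = c(Ho = 0.30, Hs = 0.30, Ht = 0.40),
  l3 = c(Ho = 0.00, Hs = 0.00, Ht = 0.00)
)

# Genome-wide values: plain means over all loci, monomorphic ones included.
genome_wide <- colMeans(per_locus)

# For comparison only: the same means with the monomorphic locus dropped.
polymorphic_only <- colMeans(per_locus[rowSums(per_locus) > 0, , drop = FALSE])

rbind(genome_wide, polymorphic_only)
```

Each monomorphic locus contributes zeros to the means, so the genome-wide heterozygosities are lower than they would be if only polymorphic loci were averaged.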
References
Nei M, Chesser R (1983) Estimation of fixation indexes and gene diversities. Annals of Human Genetics, 47, 253-259.
Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, pp. 164-165.
Jost L (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015-4026.