
This function computes basic global population statistics, following the notation of Nei (1987), which is in turn based on Nei and Chesser (1983):

  • observed heterozygosity (\(\hat{h}_o\), column header Ho)

  • expected heterozygosity, also known as gene diversity (\(\hat{h}_s\), Hs)

  • total heterozygosity (\(\hat{h}_t\), Ht)

  • genetic differentiation between subpopulations (\(D_{ST}\), Dst)

  • corrected total population diversity (\(\hat{h'}_t\), Htp)

  • corrected genetic differentiation between subpopulations (\(D'_{ST}\), Dstp)

  • \(\hat{F}_{ST}\) (Fst)

  • corrected \(\hat{F'}_{ST}\) (Fstp)

  • \(\hat{F}_{IS}\) (Fis)

  • Jost's \(\hat{D}\) (Dest)

Usage

pop_global_stats(.x, by_locus = FALSE, n_cores = bigstatsr::nb_cores())

Arguments

.x

a gen_tibble, usually grouped by population with dplyr::group_by(); when used on a single population, several quantities are returned as NA/NaN

by_locus

boolean, determining whether the statistics are returned by locus (TRUE) or as a single genome-wide value (FALSE, the default).

n_cores

the number of cores to be used; defaults to bigstatsr::nb_cores()

Value

a tibble with the statistics as columns, and either a single row of genome-wide values (by_locus = FALSE) or one row per locus (by_locus = TRUE)

Details

We use the notation of Nei (1987). That notation was developed for loci with \(m\) alleles; here all loci are biallelic, so \(m = 2\).
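With \(m = 2\), each locus reduces to a few per-population summaries: the sample size \(n_k\), the frequency of one allele (the other is its complement), and the proportion of homozygotes \(\sum_i \hat{X}_{kii}\), which is one minus the proportion of heterozygotes. The base R sketch below derives these for a single locus. It assumes genotypes are coded as alternate-allele dosages (0, 1, 2) with NA for missing calls; the helper locus_pop_summary() is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): per-population summaries
# for one biallelic locus. Assumes genotypes are coded as alternate-allele
# dosages 0/1/2, with NA for missing calls.
locus_pop_summary <- function(dosage, pop) {
  keep <- !is.na(dosage)
  dosage <- dosage[keep]
  pop <- factor(pop[keep])
  n_k <- as.vector(table(pop))                        # individuals genotyped per population
  x_alt <- as.vector(tapply(dosage, pop, mean)) / 2   # alternate allele frequency
  het_k <- as.vector(tapply(dosage == 1, pop, mean))  # proportion of heterozygotes
  data.frame(
    pop = levels(pop),
    n = n_k,
    x_ref = 1 - x_alt,
    x_alt = x_alt,
    hom = 1 - het_k                                   # sum_i X_kii when m = 2
  )
}

dosage <- c(0, 1, 2, 1, 0, 0, 1, 2)
pop <- c("A", "A", "A", "A", "B", "B", "B", "B")
summ <- locus_pop_summary(dosage, pop)

# Observed heterozygosity (Nei 1987, eq. 7.38): 1 - sum_k sum_i X_kii / s
h_o <- 1 - mean(summ$hom)
```

In this toy example population A has a homozygote proportion of 0.5 and population B of 0.75, so \(\hat{h}_o = 0.375\).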

  • Within-population observed heterozygosity \(\hat{h}_o\) for a locus with \(m\) alleles is defined as:
    \(\hat{h}_o = 1-\sum_{k=1}^{s} \sum_{i=1}^{m} \hat{X}_{kii}/s\)
    where
    \(\hat{X}_{kii}\) is the proportion of homozygotes for allele \(i\) in the sample from the \(k\)th population and
    \(s\) is the number of populations,
    following equation 7.38 in Nei (1987), p. 164.

  • Within-population expected heterozygosity (gene diversity) \(\hat{h}_s\) for a locus with \(m\) alleles is defined as:
    \(\hat{h}_s = (\tilde{n}/(\tilde{n}-1))[1-\sum_{i=1}^{m}\overline{\hat{x}_i^2}-\hat{h}_o/(2\tilde{n})]\)
    where
    \(\hat{x}_{ki}\) is the frequency of allele \(i\) in the \(k\)th population, \(n_k\) is the number of individuals sampled from the \(k\)th population,
    \(\tilde{n}=s/\sum_k 1/n_k\) (i.e. the harmonic mean of the \(n_k\)) and
    \(\overline{\hat{x}_i^2}=\sum_k \hat{x}_{ki}^2/s\),
    following equation 7.39 in Nei (1987), p. 164.

  • Total heterozygosity (total gene diversity) \(\hat{h}_t\) for a locus with \(m\) alleles is defined as:
    \(\hat{h}_t = 1-\sum_{i=1}^{m} \bar{\hat{x}}_i^2 + \hat{h}_s/(\tilde{n}s) - \hat{h}_o/(2\tilde{n}s)\)
    where
    \(\bar{\hat{x}}_i=\sum_k \hat{x}_{ki}/s\) is the mean frequency of allele \(i\) across populations,
    following equation 7.40 in Nei (1987), p. 164.

  • The amount of gene diversity among samples \(D_{ST}\) is defined as:
    \(D_{ST} = \hat{h}_t - \hat{h}_s\)
    following the equation given in the text at the top of p. 165 of Nei (1987).

  • The corrected amount of gene diversity among samples \(D'_{ST}\) is defined as:
    \(D'_{ST} = (s/(s-1))D_{ST}\)
    following the equation given in the text at the top of p. 165 of Nei (1987).

  • Corrected total heterozygosity (corrected total gene diversity) \(\hat{h'}_t\) is defined as:
    \(\hat{h'}_t = \hat{h}_s + D'_{ST}\)
    following the equation given in the text at the top of p. 165 of Nei (1987).

  • \(\hat{F}_{IS}\) is defined as:
    \(\hat{F}_{IS} = 1 - \hat{h}_o/\hat{h}_s\)
    following equation 7.41 in Nei (1987), p. 164.

  • \(\hat{F}_{ST}\) is defined as:
    \(\hat{F}_{ST} = 1 - \hat{h}_s/\hat{h}_t = D_{ST}/\hat{h}_t\)
    following equation 7.43 in Nei (1987), p. 165.

  • \(\hat{F'}_{ST}\) is defined as:
    \(\hat{F'}_{ST} = D'_{ST}/\hat{h'}_t\)
    following the explanation given in the text at the top of p. 165 of Nei (1987).

  • Jost's \(\hat{D}\) is defined as:
    \(\hat{D} = (s/(s-1))((\hat{h}_t-\hat{h}_s)/(1-\hat{h}_s)) = D'_{ST}/(1-\hat{h}_s)\)
    following Jost (2008).
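Putting the definitions together, the base R sketch below computes every statistic for a single biallelic locus from per-population sample sizes n, frequencies x of one allele, and observed heterozygote proportions het. The function name nei_locus_stats() and its inputs are hypothetical, chosen for illustration; this is not the package's optimized implementation. Jost's \(\hat{D}\) is computed as in Jost (2008), \((s/(s-1))(\hat{h}_t-\hat{h}_s)/(1-\hat{h}_s)\), which equals \(D'_{ST}/(1-\hat{h}_s)\).

```r
# Hypothetical sketch: Nei (1987) statistics for one biallelic locus (m = 2).
# n:   individuals sampled per population
# x:   frequency of one allele per population
# het: observed proportion of heterozygotes per population
nei_locus_stats <- function(n, x, het) {
  s <- length(n)                          # number of populations
  freqs <- cbind(1 - x, x)                # s x m matrix of allele frequencies
  n_tilde <- s / sum(1 / n)               # harmonic mean sample size
  h_o <- mean(het)                        # eq. 7.38
  mean_sq <- sum(colMeans(freqs^2))       # sum_i mean_k(x_ki^2)
  sq_mean <- sum(colMeans(freqs)^2)       # sum_i (mean_k x_ki)^2
  h_s <- n_tilde / (n_tilde - 1) * (1 - mean_sq - h_o / (2 * n_tilde))  # eq. 7.39
  h_t <- 1 - sq_mean + h_s / (n_tilde * s) - h_o / (2 * n_tilde * s)    # eq. 7.40
  d_st <- h_t - h_s
  d_stp <- s / (s - 1) * d_st
  h_tp <- h_s + d_stp
  c(Ho = h_o, Hs = h_s, Ht = h_t, Dst = d_st, Htp = h_tp, Dstp = d_stp,
    Fst = d_st / h_t, Fstp = d_stp / h_tp, Fis = 1 - h_o / h_s,
    Dest = d_stp / (1 - h_s))             # Jost (2008)
}

# Two strongly differentiated populations of 10 individuals each
st <- nei_locus_stats(n = c(10, 10), x = c(0.1, 0.9), het = c(0.2, 0.2))
```

For this input the sketch gives \(\hat{h}_o = 0.2\), \(\hat{h}_s = 17/90 \approx 0.189\), \(\hat{h}_t \approx 0.504\) and \(\hat{F}_{ST} \approx 0.626\). Note the distinction between \(\overline{\hat{x}_i^2}\) (mean of squares, used in \(\hat{h}_s\)) and \(\bar{\hat{x}}_i^2\) (square of the mean, used in \(\hat{h}_t\)).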

All these statistics are first computed by locus and then averaged across loci (including any monomorphic loci) to obtain genome-wide values. The function uses the same algorithm as hierfstat::basic.stats() but is optimized for speed and memory usage.
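Because monomorphic loci are kept in the average, they pull genome-wide heterozygosities towards zero. The base R sketch below illustrates this for \(\hat{h}_o\) and \(\hat{h}_s\) only, using the biallelic forms of equations 7.38 and 7.39. The helper locus_ho_hs() is hypothetical, and the sketch does not show how the package handles the ratio statistics, which are 0/0 at a monomorphic locus.

```r
# Hypothetical helper: Ho and Hs (Nei 1987, eqs 7.38-7.39) for one biallelic locus.
# n: individuals per population; x: allele frequency per population;
# het: observed heterozygote proportion per population.
locus_ho_hs <- function(n, x, het) {
  s <- length(n)
  n_tilde <- s / sum(1 / n)           # harmonic mean sample size
  h_o <- mean(het)
  mean_sq <- mean(x^2 + (1 - x)^2)    # sum_i of mean_k(x_ki^2) when m = 2
  h_s <- n_tilde / (n_tilde - 1) * (1 - mean_sq - h_o / (2 * n_tilde))
  c(Ho = h_o, Hs = h_s)
}

per_locus <- rbind(
  locus1 = locus_ho_hs(n = c(10, 10), x = c(0.1, 0.9), het = c(0.2, 0.2)),
  locus2 = locus_ho_hs(n = c(10, 10), x = c(0, 0), het = c(0, 0))  # monomorphic
)

# Genome-wide values: plain mean over all loci, monomorphic ones included
genome_wide <- colMeans(per_locus)
```

Here genome_wide gives Ho = 0.1 and Hs = 17/180 (about 0.094), half of the values at the polymorphic locus alone.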

References

Nei M, Chesser R (1983) Estimation of fixation indexes and gene diversities. Annals of Human Genetics, 47, 253-259.

Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, pp. 164-165.

Jost L (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015-4026.