Compute the population expected heterozygosity
pop_het_exp.Rd
This function computes expected population heterozygosity (also referred to as gene diversity, to avoid the potentially misleading use of the term "expected" in this context), using the formula of Nei (1987).
Arguments
- .x
a gen_tibble (usually grouped, as obtained by using dplyr::group_by(); otherwise the full tibble will be considered as belonging to a single population).
- by_locus
boolean, determining whether Hs should be returned by locus (TRUE), or as a single genome-wide value (FALSE, the default).
- include_global
boolean, determining whether, besides the population-specific estimates, a global estimate should be appended. Note that this will return a vector of n populations plus 1 (the global value), or a matrix with n+1 columns if by_locus=TRUE.
- n_cores
number of cores to be used; it defaults to bigstatsr::nb_cores().
Value
a vector of mean population expected heterozygosities (if by_locus=FALSE), or a matrix of estimates by locus (rows are loci, columns are populations, if by_locus=TRUE)
Details
Within population expected heterozygosity (gene diversity) \(\hat{h}_s\) for a locus with \(m\) alleles is defined as:
\(\hat{h}_s=\frac{\tilde{n}}{\tilde{n}-1}\left[1-\sum_{i}^{m}\bar{\hat{x}_i^2}-\frac{\hat{h}_o}{2\tilde{n}}\right]\)
where
\(\tilde{n}=s/\sum_k 1/n_k\) (i.e. the harmonic mean of the \(s\) sample sizes \(n_k\)) and
\(\bar{\hat{x}_i^2}=\sum_k \hat{x}_{ki}^2/s\)
following equation 7.39 in Nei (1987), p. 164. In our specific case, there are only two alleles, so \(m=2\). \(\hat{h}_s\) at the genome level for each population is simply the mean of the locus estimates for that population.
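The formula above can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes a single population treated on its own (so \(s=1\) and \(\tilde{n}\) reduces to the number of genotyped individuals), genotypes coded as counts of the alternate allele (0, 1, 2, with NA for missing), and missing genotypes simply dropped per locus. The helper name `het_exp_locus` and the toy matrix `geno` are hypothetical and only serve the illustration.

```r
# Sketch of gene diversity at one biallelic locus for a single population.
# Assumption: s = 1, so n-tilde is the number of non-missing genotypes.
het_exp_locus <- function(g) {
  g <- g[!is.na(g)]        # drop missing genotypes (an assumption of this sketch)
  n <- length(g)           # genotyped individuals
  p <- sum(g) / (2 * n)    # frequency of the alternate allele
  h_o <- mean(g == 1)      # observed heterozygosity at this locus
  # n/(n-1) * [1 - sum of squared allele frequencies - h_o/(2n)], with m = 2
  n / (n - 1) * (1 - p^2 - (1 - p)^2 - h_o / (2 * n))
}

# Toy data: 3 loci (rows) by 4 individuals (columns)
geno <- rbind(
  c(0, 1, 1, 2),
  c(0, 0, 0, 0),
  c(2, 1, NA, 1)
)

# One estimate per locus, then the genome-wide value as their mean
by_locus <- apply(geno, 1, het_exp_locus)
genome_wide <- mean(by_locus)
```

Here `by_locus` plays the role of one column of the by-locus matrix, and `genome_wide` corresponds to the single value per population described above as the mean of the locus estimates.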