Clump loci based on a Linkage Disequilibrium threshold
loci_ld_clump.Rd
This function uses clumping to remove SNPs at high LD. When used with its default options, clumping based on MAF is similar to standard pruning (as done by PLINK with "–indep-pairwise (size+1) 1 thr.r2", but it results in a better spread of SNPs over the chromosome.
Usage
loci_ld_clump(.x, ...)
# S3 method for class 'tbl_df'
loci_ld_clump(.x, ...)
# S3 method for class 'vctrs_bigSNP'
loci_ld_clump(
.x,
S = NULL,
thr_r2 = 0.2,
size = 100/thr_r2,
exclude = NULL,
use_positions = TRUE,
n_cores = 1,
return_id = FALSE,
...
)
# S3 method for class 'grouped_df'
loci_ld_clump(.x, ...)
Arguments
- .x
a
gen_tibble
object- ...
currently not used.
- S
A vector of loci statistics which express the importance of each SNP (the more important is the SNP, the greater should be the corresponding statistic).
For example, ifS
follows the standard normal distribution, and "important" means significantly different from 0, you must useabs(S)
instead.
If not specified, MAFs are computed and used.- thr_r2
Threshold over the squared correlation between two SNPs. Default is
0.2
.- size
For one SNP, window size around this SNP to compute correlations. Default is
100 / thr.r2
for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200). Ifuse_positions = FALSE
, this is a window in number of SNPs, otherwise it is a window in kb (genetic distance). Ideally, use positions, as they provide a more sensible approach.- exclude
Vector of SNP indices to exclude anyway. For example, can be used to exclude long-range LD regions (see Price2008). Another use can be for thresholding with respect to p-values associated with
S
.- use_positions
boolean, if TRUE (the default),
size
is in kb, if FALSE size is the number of SNPs.- n_cores
number of cores to be used
- return_id
boolean on whether the id of SNPs to keep should be returned. It defaults to FALSE, which returns a vector of booleans (TRUE or FALSE)
Value
a boolean vector indicating whether the SNP should be kept (if 'return_id = FALSE', the default), else a vector of SNP indices to be kept (if 'return_id = TRUE')
Details
Any missing values in the genotypes of a gen_tibble
passed to loci_ld_clump
will cause an error. To deal with missingness, see gt_impute_simple()
.