gen_tibble objects
gt_pca_autoSVD.Rd
This function performs Principal Component Analysis on a gen_tibble, using a fast truncated SVD with initial pruning and then iterative removal of long-range LD regions. This function is a wrapper for bigsnpr::snp_autoSVD().
gt_pca_autoSVD(
  x,
  k = 10,
  fun_scaling = bigsnpr::snp_scaleBinom(),
  thr_r2 = 0.2,
  use_positions = TRUE,
  size = 100/thr_r2,
  roll_size = 50,
  int_min_size = 20,
  alpha_tukey = 0.05,
  min_mac = 10,
  max_iter = 5,
  n_cores = 1,
  verbose = TRUE
)
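A call might look like the sketch below. It assumes the function is exported by the tidypopgen package and that `gt` is an existing gen_tibble large enough for the method. The helper name `run_autosvd_pca` and its `n_components` argument are hypothetical; only arguments listed in the usage above are passed on.

```r
# Hypothetical helper: run the LD-aware PCA on an existing gen_tibble `gt`.
# Assumes the tidypopgen package is installed; it is only needed when the
# helper is called, not when it is defined.
run_autosvd_pca <- function(gt, n_components = 10) {
  tidypopgen::gt_pca_autoSVD(
    gt,
    k = n_components,      # number of singular vectors/values to compute
    thr_r2 = 0.2,          # clumping threshold on the squared correlation
    use_positions = TRUE,  # window size defined from SNP positions
    n_cores = 1,           # no parallelism
    verbose = FALSE        # silence per-iteration messages
  )
}
```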
x: a gen_tbl object.
k: Number of singular vectors/values to compute. Default is 10. This algorithm should be used to compute a few singular vectors/values.
fun_scaling: Usually this can be left unset, as it defaults to bigsnpr::snp_scaleBinom(), which is the appropriate function for biallelic SNPs. Alternatively, it is possible to use a custom function (see bigsnpr::snp_autoSVD() for details).
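To make the default scaling concrete, the sketch below reproduces binomial scaling in base R on a toy genotype matrix: each SNP is centered by 2p and divided by sqrt(2p(1 - p)), where p is the estimated allele frequency. The matrix `G` and all variable names are made up for illustration, and this is not the bigsnpr implementation itself.

```r
# Toy genotype matrix: 4 individuals (rows) x 3 biallelic SNPs (columns),
# coded as counts of one allele (0, 1 or 2). Values fill column by column.
G <- matrix(
  c(0, 1, 2, 1,
    2, 2, 1, 0,
    0, 0, 1, 1),
  nrow = 4,
  dimnames = list(NULL, c("snp1", "snp2", "snp3"))
)

p <- colMeans(G) / 2                 # estimated allele frequency per SNP
center_vec <- 2 * p                  # expected genotype under Binomial(2, p)
scale_vec <- sqrt(2 * p * (1 - p))   # binomial standard deviation

# Subtract the center and divide by the scale, column by column.
G_scaled <- sweep(sweep(G, 2, center_vec, "-"), 2, scale_vec, "/")
```

A monomorphic SNP would give a scale of zero, so such variants need to be excluded before scaling.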
thr_r2: Threshold over the squared correlation between two SNPs. Default is 0.2. Use NA if you want to skip the clumping step.
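As an illustration of the quantity being thresholded, the following base R sketch computes the squared correlation between two toy SNPs and compares it with the default threshold. The genotype vectors are invented, and the snippet only shows the comparison, not the clumping procedure itself.

```r
# Two toy SNPs genotyped in the same 8 individuals (allele counts 0/1/2).
snp_a <- c(0, 1, 2, 1, 0, 2, 1, 0)
snp_b <- c(0, 1, 2, 2, 0, 2, 1, 1)

thr_r2 <- 0.2                  # default threshold
r2 <- cor(snp_a, snp_b)^2      # squared Pearson correlation

# TRUE means the pair exceeds the threshold, so the two SNPs would be
# treated as being in LD with each other.
exceeds_threshold <- r2 > thr_r2
```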
use_positions: a boolean on whether the position is used to define size, or whether the size should be in number of SNPs. Default is TRUE.
size: For one SNP, window size around this SNP to compute correlations. Default is 100 / thr_r2 for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200). If use_positions is FALSE, this is a window in number of SNPs; otherwise it is a window in kb. Using the positions is recommended when they are available.
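The default window sizes quoted above follow directly from the formula 100 / thr_r2, as this short base R check shows:

```r
# Default window size implied by a few clumping thresholds.
thr_r2 <- c(0.2, 0.1, 0.5)
size <- 100 / thr_r2
data.frame(thr_r2 = thr_r2, size = size)
```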
roll_size: Radius of rolling windows to smooth log-p-values. Default is 50.
int_min_size: Minimum number of consecutive outlier SNPs in order to be reported as a long-range LD region. Default is 20.
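The effect of a minimum run length can be sketched in base R with rle(). The `outlier` vector below is invented, and the code only illustrates the idea of keeping sufficiently long runs of consecutive outliers; it is not the code used by bigsnpr.

```r
int_min_size <- 20

# Toy per-SNP outlier flags along a chromosome: one run of 25 outliers
# and one short run of 3.
outlier <- c(rep(FALSE, 5), rep(TRUE, 25), rep(FALSE, 10),
             rep(TRUE, 3), rep(FALSE, 7))

runs <- rle(outlier)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only runs of outliers that are long enough.
is_region <- runs$values & runs$lengths >= int_min_size
regions <- data.frame(start = run_start[is_region], end = run_end[is_region])
```

Here only the run covering SNPs 6 to 30 is retained; the run of 3 outliers is too short to be reported.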
alpha_tukey: The type-I error rate in outlier detection (that is further corrected for multiple testing). Default is 0.05.
min_mac: Minimum minor allele count (MAC) for variants to be included. Default is 10.
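For readers unfamiliar with the MAC, the base R sketch below computes it for a toy genotype matrix and applies the default cutoff. The data are invented, and treating the cutoff as inclusive (MAC >= min_mac) is an assumption made for this illustration.

```r
min_mac <- 10

# Toy genotypes: 20 individuals x 3 SNPs, coded as allele counts (0/1/2).
G <- cbind(
  snp1 = rep(c(0, 1, 2, 1), 5),
  snp2 = c(1, rep(0, 19)),
  snp3 = c(rep(2, 18), 1, 0)
)

allele_count <- colSums(G)                             # copies of the counted allele
mac <- pmin(allele_count, 2 * nrow(G) - allele_count)  # copies of the rarer allele
keep <- mac >= min_mac                                 # assumed inclusive cutoff
```

With these data the minor allele counts are 20, 1 and 3, so only snp1 passes the filter.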
max_iter: Maximum number of iterations of outlier detection. Default is 5.
n_cores: Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().
verbose: Output some information on the iterations? Default is TRUE.
a gt_pca object, which is a subclass of big_SVD; this is an S3 list with elements:
d, the eigenvalues (singular values, i.e. as variances),
u, the scores for each sample on each component (the left singular vectors),
v, the loadings (the right singular vectors),
center, the centering vector,
scale, the scaling vector,
method, a string defining the method (in this case 'autoSVD'),
call, the call that generated the object.
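To give a feel for the shapes of d, u, v and center, the sketch below builds a list with the same element names from a plain base R svd() on simulated genotypes. It is only an analogy: the data are random, the matrix is centered but not scaled, a full rather than truncated SVD is computed, and the result is an ordinary list, not a gt_pca object.

```r
set.seed(1)
n_ind <- 30   # individuals
n_snp <- 8    # SNPs
k <- 3        # components to keep

# Simulated allele counts (0/1/2) for illustration only.
G <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3), nrow = n_ind)

center_vec <- colMeans(G)
G_centered <- sweep(G, 2, center_vec, "-")

sv <- svd(G_centered, nu = k, nv = k)
pca_like <- list(
  d = sv$d[seq_len(k)],  # first k singular values
  u = sv$u,              # n_ind x k left singular vectors
  v = sv$v,              # n_snp x k right singular vectors
  center = center_vec    # one centering value per SNP
)
str(pca_like)
```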
Note: rather than accessing these elements directly, it is better to use tidy and augment. See gt_pca_tidiers.
Using gt_pca_autoSVD requires a reasonably large dataset, as the function iteratively removes regions of long-range LD.