This function performs Principal Component Analysis on a gen_tibble, by partial SVD through the eigen decomposition of the covariance. It works well if the number of individuals is much smaller than the number of loci; otherwise, gt_pca_randomSVD() is a better option. This function is a wrapper for bigstatsr::big_SVD().
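The idea of obtaining an SVD through an eigen decomposition can be illustrated in base R. The following is only a minimal sketch on simulated data, not the code used by bigstatsr: it assumes binomial scaling (centre 2p, scale sqrt(2p(1 - p))), uses a full eigen() call on the small individuals-by-individuals cross-product matrix rather than a partial decomposition, and all object names (geno, X, K) are hypothetical.

```r
set.seed(1)
n_ind <- 20    # individuals (rows)
n_loci <- 200  # loci (columns)
k <- 3         # number of components to keep

# Simulated genotype dosages (0, 1, 2); purely illustrative data
p_true <- runif(n_loci, 0.2, 0.8)
geno <- sapply(p_true, function(p) rbinom(n_ind, size = 2, prob = p))
# Drop loci without variation
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]

# Binomial scaling (assumed here): centre by 2p, scale by sqrt(2p(1 - p))
p <- colMeans(geno) / 2
X <- sweep(geno, 2, 2 * p, "-")
X <- sweep(X, 2, sqrt(2 * p * (1 - p)), "/")

# Eigen decomposition of the small n_ind x n_ind cross-product matrix
K <- tcrossprod(X)
eig <- eigen(K, symmetric = TRUE)

d <- sqrt(eig$values[1:k])                      # singular values
u <- eig$vectors[, 1:k, drop = FALSE]           # left singular vectors
v <- crossprod(X, u) %*% diag(1 / d, nrow = k)  # right singular vectors

# Reference: direct SVD of the scaled matrix
ref <- svd(X, nu = k, nv = k)
```

Because the cross-product matrix has one row and column per individual, this route is cheap when individuals are far fewer than loci, which matches the recommendation above. The signs of the columns of u and v are arbitrary, so they may differ from those in ref.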

Usage

gt_pca_partialSVD(
  x,
  k = 10,
  fun_scaling = bigsnpr::snp_scaleBinom(),
  total_var = TRUE
)

Arguments

x

a gen_tbl object

k

Number of singular vectors/values to compute. Default is 10. This algorithm should be used to compute a few singular vectors/values.

fun_scaling

Usually this can be left unset, as it defaults to bigsnpr::snp_scaleBinom(), which is the appropriate function for biallelic SNPs. Alternatively, it is possible to use a custom function (see bigsnpr::snp_autoSVD() for details).

total_var

a boolean indicating whether to compute the total variance of the matrix. Default is TRUE. Using FALSE will speed up computation, but the total variance will not be stored in the output (and thus it will not be possible to assign a proportion of variance explained to the components).

Value

a gt_pca object, which is a subclass of big_SVD; this is a named S3 list with elements:

  • d, the singular values (their squares are proportional to the variance captured by each component),

  • u, the scores for each sample on each component (the left singular vectors)

  • v, the loadings (the right singular vectors)

  • center, the centering vector,

  • scale, the scaling vector,

  • method, a string defining the method (in this case 'partialSVD'),

  • call, the call that generated the object.

  • square_frobenious, used to compute the proportion of variance explained by the components (optional)

Note: rather than accessing these elements directly, it is better to use tidy and augment. See gt_pca_tidiers.
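To see why the total variance is needed for the proportion of variance explained, consider the following base R sketch. It assumes that d holds singular values of the centred and scaled matrix and that the stored total is the squared Frobenius norm of that matrix; the matrix X and the names sq_frob and prop_var are hypothetical stand-ins, and the tidiers may present the result differently.

```r
set.seed(42)
# Stand-in for a centred and scaled genotype matrix (20 individuals, 100 loci)
X <- scale(matrix(rnorm(20 * 100), nrow = 20))
k <- 5

# All singular values of X (only the first k would be available from a partial SVD)
d <- svd(X, nu = 0, nv = 0)$d

# Squared Frobenius norm: the sum of all squared entries of X,
# which equals the sum of all squared singular values
sq_frob <- sum(X^2)

# Proportion of variance explained by each of the first k components
prop_var <- d[1:k]^2 / sq_frob
```

With only the first k singular values available, the denominator cannot be recovered from d alone, which is why setting total_var = FALSE makes the proportions unavailable.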