
This method summarizes information about the components of a gt_dapc object from the tidypopgen package. The matrix parameter determines which component is returned.

Usage

# S3 method for class 'gt_dapc'
tidy(x, matrix = "eigenvalues", ...)

Arguments

x

A gt_dapc object (as returned by gt_dapc()).

matrix

Character specifying which component of the DAPC should be tidied.

  • "samples", "scores", or "x": returns information about the map from the original space into the linear discriminant axes.

  • "v", "rotation", "loadings" or "variables": returns information about the map from the discriminant axes back into the original space (i.e. the genotype frequencies). Note that these are different from the loadings linking to the PCA scores (which are available in the element $loadings of the dapc object).

  • "d", "eigenvalues" or "lds": returns information about the eigenvalues.
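The aliases above collapse onto three components. The sketch below encodes that mapping with R's `switch()` fall-through. `dapc_component()` is a hypothetical helper written only for illustration: it is not part of tidypopgen and does not claim to mirror how `tidy()` matches `matrix` internally.

```r
# Hypothetical helper (not part of tidypopgen): map each documented alias
# of `matrix` onto the component it selects. Empty arguments in switch()
# fall through to the next non-empty value.
dapc_component <- function(matrix) {
  switch(matrix,
    samples = , scores = , x = "scores",
    v = , rotation = , loadings = , variables = "loadings",
    d = , eigenvalues = , lds = "eigenvalues",
    stop("unknown value for `matrix`: ", matrix, call. = FALSE)
  )
}

# Named character vector: x -> "scores", rotation -> "loadings", lds -> "eigenvalues"
vapply(c("x", "rotation", "lds"), dapc_component, character(1))
```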

...

Not used. Needed to match generic signature only.

Value

A tibble::tibble with columns depending on the component of DAPC being tidied.

If matrix is "scores", each row in the tidied output corresponds to one observation on one discriminant axis. The columns are:

row

ID of the original observation (i.e. rowname from original data).

LD

A number indicating the discriminant axis.

value

The score of the observation on that discriminant axis, i.e. the location of the observation in discriminant space.
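Each tidied output is in long format. As a sketch under stated assumptions, the base R code below reshapes a small, made-up matrix of discriminant scores (observations in rows, axes in columns) into the same `row`/`LD`/`value` layout. It illustrates the shape of the output only and is not tidypopgen's implementation.

```r
# Hypothetical discriminant scores for three individuals on two axes
# (values invented for illustration only)
scores <- matrix(
  c( 1.5, -0.2,
     0.7,  0.3,
    -1.1,  0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), c("LD1", "LD2"))
)

# Long format: one row per (observation, discriminant axis) pair,
# grouped by observation as in the example output
scores_long <- data.frame(
  row = rep(rownames(scores), each = ncol(scores)),
  LD = rep(seq_len(ncol(scores)), times = nrow(scores)),
  value = as.vector(t(scores))
)
scores_long
```

With two discriminant axes, each observation contributes two rows, which is consistent with the 352 rows (176 individuals × 2 axes) in the scores example.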

If matrix is "loadings", each row in the tidied output corresponds to one variable (locus) on one discriminant axis. The columns are:

column

The variable labels (colnames) of the data set on which the PCA was performed, i.e. the loci.

LD

A character vector indicating the discriminant axis (e.g. "LD1").

value

The loading of the variable on the indicated discriminant axis.
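Because the loadings come back in long format, a common next step is to reshape them or to rank loci by their loading on each axis. The base R sketch below works only on the ten rows printed in the Examples, so it cannot identify the top loci among all 79 loci; the object names are illustrative.

```r
# First ten rows of tidy(dapc_res, matrix = "loadings"), copied from the Examples
loadings_long <- data.frame(
  column = rep(c("rs3441", "rs4173", "rs6157", "rs7502", "rs7892"), each = 2),
  LD = rep(c("LD1", "LD2"), times = 5),
  value = c(-0.00389, -0.00831, -0.0157, 0.0121, 0.0122,
            -0.162, 0.163, 0.0172, 0.0880, 0.0206)
)

# Wide view: one row per locus, one column per discriminant axis
loadings_wide <- xtabs(value ~ column + LD, data = loadings_long)

# Locus with the largest absolute loading on each axis (within these ten rows)
top_loci <- sapply(split(loadings_long, loadings_long$LD),
                   function(d) d$column[which.max(abs(d$value))])
top_loci
```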

If "eigenvalues", the columns are:

LD

An integer vector indicating the discriminant axis.

eigenvalue

The eigenvalue of this discriminant axis.

cumulative

Cumulative sum of the eigenvalues up to and including this discriminant axis (note that this is NOT expressed as a percentage of the total).
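In the example output, the `cumulative` column is consistent with a running sum of the `eigenvalue` column. Assuming that, the base R sketch below recomputes it from the rounded printed eigenvalues and, if a proportion is wanted, rescales it by the total over all axes. Small differences from the printed `cumulative` values come from rounding.

```r
# Eigenvalues as printed (rounded) by tidy(dapc_res, matrix = "eigenvalues")
eig <- c(225, 33.4, 2.29, 0.283)

# Assumption: 'cumulative' is a running sum of the eigenvalues
cumulative <- cumsum(eig)

# Rescale to a proportion of the summed eigenvalues over all axes
prop_cumulative <- cumulative / sum(eig)
round(prop_cumulative, 3)
```

Here the first discriminant axis alone accounts for roughly 86% of the summed eigenvalues.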

Examples

# Create a gen_tibble of lobster genotypes
bed_file <-
  system.file("extdata", "lobster", "lobster.bed", package = "tidypopgen")
lobsters <- gen_tibble(bed_file,
  backingfile = tempfile("lobsters"),
  quiet = TRUE
)

# Remove monomorphic loci and impute
lobsters <- lobsters %>% select_loci_if(loci_maf(genotypes) > 0)
lobsters <- gt_impute_simple(lobsters, method = "mode")

# Create PCA and run DAPC
pca <- gt_pca_partialSVD(lobsters)
populations <- as.factor(lobsters$population)
dapc_res <- gt_dapc(pca, n_pca = 6, n_da = 2, pop = populations)

# Tidy scores
tidy(dapc_res, matrix = "scores")
#> # A tibble: 352 × 3
#>    row      LD  value
#>    <chr> <dbl>  <dbl>
#>  1 Ale04     1  3.87 
#>  2 Ale04     2  0.132
#>  3 Ale05     1  3.96 
#>  4 Ale05     2 -0.402
#>  5 Ale06     1  3.25 
#>  6 Ale06     2 -0.801
#>  7 Ale08     1  3.06 
#>  8 Ale08     2  0.398
#>  9 Ale13     1  1.60 
#> 10 Ale13     2  1.05 
#> # ℹ 342 more rows

# Tidy eigenvalues
tidy(dapc_res, matrix = "eigenvalues")
#> # A tibble: 4 × 3
#>      LD eigenvalue cumulative
#>   <int>      <dbl>      <dbl>
#> 1     1    225.          225.
#> 2     2     33.4         259.
#> 3     3      2.29        261.
#> 4     4      0.283       261.

# Tidy loadings
tidy(dapc_res, matrix = "loadings")
#> # A tibble: 158 × 3
#>    column LD       value
#>    <chr>  <chr>    <dbl>
#>  1 rs3441 LD1   -0.00389
#>  2 rs3441 LD2   -0.00831
#>  3 rs4173 LD1   -0.0157 
#>  4 rs4173 LD2    0.0121 
#>  5 rs6157 LD1    0.0122 
#>  6 rs6157 LD2   -0.162  
#>  7 rs7502 LD1    0.163  
#>  8 rs7502 LD2    0.0172 
#>  9 rs7892 LD1    0.0880 
#> 10 rs7892 LD2    0.0206 
#> # ℹ 148 more rows