This function runs the fastmixture algorithm for model-based population genetic clustering by calling the fastmixture Python module. If you use this function, please cite the paper by Santander, Refoyo-Martínez, and Meisner (2024).

gt_fastmixture(
  x,
  k,
  n_runs = 1,
  threads = 1,
  seed = 42,
  outprefix = "fastmixture",
  iter = 1000,
  tole = 0.5,
  batches = 32,
  supervised = NULL,
  check = 5,
  power = 11,
  output_path = getwd(),
  chunk = 8192,
  als_iter = 1000,
  als_tole = 1e-04,
  no_freqs = TRUE,
  random_init = TRUE,
  safety = TRUE
)

Arguments

x

either a tidypopgen::gen_tibble or the path to a binary PLINK file (without the .bed extension)

k

the number of ancestral components (clusters), either a single value or a vector

n_runs

the number of repeats for each k value

threads

the number of threads to use (1)

seed

the random seed (42); this should be a vector of length n_runs, with one seed per run

outprefix

the prefix of the output files (fastmixture)

iter

the maximum number of iterations (1000)

tole

the tolerance in log-likelihood units between iterations (0.5)

batches

the maximum number of mini-batches (32)

supervised

the name of the file with the supervised labels (NULL)

check

the interval, in iterations, between convergence checks (5)

power

the number of power iterations in the randomised SVD (11)

output_path

the path where the output files, including the Q matrices, will be saved (the current working directory)

chunk

the number of SNPs in chunk operations (8192)

als_iter

the maximum number of iterations in the ALS algorithm (1000)

als_tole

the tolerance for the RMSE of P between iterations (1e-4)

no_freqs

if TRUE, do not save the P matrix of ancestral allele frequencies (TRUE)

random_init

random initialisation of parameters (TRUE)

safety

add extra safety steps in unstable optimizations (TRUE)

Value

an object of class gt_admix. See tidypopgen::gt_admixture() for details.

Details

The returned object contains the Q matrices of ancestry proportions, which can be plotted with autoplot() and tidied with the tidy() methods from the tidypopgen package.
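A minimal usage sketch follows. The input path "path/to/my_data" is hypothetical, and the example assumes that tidypopgen, ggplot2 and the package providing gt_fastmixture() are attached, and that the fastmixture Python module is installed. The autoplot() arguments (type, k, run) are assumed from tidypopgen's autoplot method for gt_admix objects; check that help page before relying on them.

```r
library(tidypopgen)
library(ggplot2)
# Assumption: the package that provides gt_fastmixture() is also attached.

# Hypothetical PLINK fileset path/to/my_data.bed/.bim/.fam; x takes the
# path without the .bed extension
res <- gt_fastmixture(
  x = "path/to/my_data",
  k = 2:4,            # fit K = 2, 3 and 4
  n_runs = 2,         # two runs for each value of K
  seed = c(101, 202), # one seed per run, so length(seed) == n_runs
  threads = 4
)

# Bar plot of the ancestry proportions for K = 3, first run
# (argument names assumed from tidypopgen's autoplot method for gt_admix)
autoplot(res, type = "barplot", k = 3, run = 1)
```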

References

C. G. Santander, A. Refoyo-Martínez, J. Meisner (2024). Faster model-based estimation of ancestry proportions. bioRxiv 2024.07.08.602454; doi: https://doi.org/10.1101/2024.07.08.602454