gt_fastmixture.Rd
This function implements the fastmixture algorithm for population genetics clustering by calling the fastmixture Python module. If you use this function, make sure that you cite the relevant paper by Santander, Refoyo-Martínez, and Meisner (2024).
gt_fastmixture(
x,
k,
n_runs = 1,
threads = 1,
seed = 42,
outprefix = "fastmixture",
iter = 1000,
tole = 0.5,
batches = 32,
supervised = NULL,
check = 5,
power = 11,
output_path = getwd(),
chunk = 8192,
als_iter = 1000,
als_tole = 1e-04,
no_freqs = TRUE,
random_init = TRUE,
safety = TRUE
)
x: either a tidypopgen::gen_tibble, or the name of the binary PLINK file (without the .bed extension)
k: the number of ancestral components (clusters), either a single value or a vector
n_runs: the number of repeats for each k value
threads: the number of threads to use (1)
seed: the random seed (defaults to 42); it should be a vector of length n_runs
outprefix: the prefix of the output files (fastmixture)
iter: the maximum number of iterations (1000)
tole: the tolerance in log-likelihood units between iterations (0.5)
batches: the maximum number of mini-batches (32)
supervised: the name of the file with the supervised labels (NULL)
check: the number of iterations to check for convergence (5)
power: the number of power iterations in the randomised SVD (11)
output_path: the path where the Q matrices will be saved (the current working directory)
chunk: the number of SNPs in chunk operations (8192)
als_iter: the maximum number of iterations in the ALS (alternating least squares) algorithm (1000)
als_tole: the tolerance for the RMSE of P between iterations (1e-4)
no_freqs: if TRUE, do not save the P matrix (TRUE)
random_init: use random initialisation of parameters (TRUE)
safety: add extra safety steps in unstable optimizations (TRUE)
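The sketch below shows one way to fit several values of k with repeated runs. It assumes that the package providing gt_fastmixture is loaded, that the fastmixture Python module is installed, and that a PLINK fileset with the hypothetical prefix my_data exists. The seed values are arbitrary; the only constraint taken from the argument descriptions is that seed has one entry per run (length n_runs). The call is guarded, so the snippet does nothing if the function or the input file is missing.

```r
# Hypothetical input: a PLINK fileset my_data.bed/.bim/.fam in the working directory
bed_prefix <- "my_data"
k_values <- 2:4   # fit K = 2, 3 and 4
n_runs <- 3       # three repeats for each K

# One seed per repeat, since seed should be a vector of length n_runs
# (these particular values are arbitrary)
seeds <- c(42L, 123L, 2024L)
stopifnot(length(seeds) == n_runs)

# Only run if gt_fastmixture is available and the .bed file exists
if (exists("gt_fastmixture", mode = "function") &&
    file.exists(paste0(bed_prefix, ".bed"))) {
  admix_res <- gt_fastmixture(
    x = bed_prefix,
    k = k_values,
    n_runs = n_runs,
    threads = 4,
    seed = seeds,
    output_path = tempdir()  # keep Q matrix files out of the working directory
  )
  # The documented return type is an object of class gt_admix
  stopifnot(inherits(admix_res, "gt_admix"))
}
```

Writing the output to tempdir() avoids cluttering the working directory, which is what output_path defaults to.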
An object of class gt_admix. See tidypopgen::gt_admixture() for details.
This function returns a q_matrix that can be plotted with autoplot and tidied with tidy methods from the tidypopgen package.
C. G. Santander, A. Refoyo Martinez, J. Meisner (2024) Faster model-based estimation of ancestry proportions. bioRxiv 2024.07.08.602454; doi: https://doi.org/10.1101/2024.07.08.602454