Generate a report of what would happen to each SNP in a merge
rbind_dry_run.Rd
This function provides an overview of the fate of each SNP in two gen_tibble objects in the case of a merge. Only SNPs found in both objects will be kept. One object is used as a reference, and SNPs in the other dataset will be flipped and/or have their alleles swapped as needed. SNPs that have different alleles in the two datasets will also be dropped.
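The rules above can be illustrated with a small base R sketch. This is not the package's implementation: `classify_snps()`, `complement()`, the column names `a1`/`a2` and the toy data are all hypothetical, and the handling of ambiguous SNPs is an assumption based on the description of `flip_strand` in the Arguments. It returns a single table for brevity, whereas the function documented here returns one table per dataset.

```r
# Illustrative only: hypothetical helpers, not the package's code.
complement <- function(x) chartr("ACGT", "TGCA", x)

classify_snps <- function(ref, target, flip_strand = FALSE) {
  # Keep only SNPs present in both tables, matched on locus name
  m <- merge(ref, target, by = "id", suffixes = c("_ref", "_tgt"))
  same    <- m$a1_tgt == m$a1_ref & m$a2_tgt == m$a2_ref
  swapped <- m$a1_tgt == m$a2_ref & m$a2_tgt == m$a1_ref
  # Strand-ambiguous SNPs (A/T, C/G): the two alleles complement each other.
  # Assumption: they are only dropped when strand flipping is checked.
  ambiguous <- flip_strand & (m$a1_ref == complement(m$a2_ref))
  # Compare the complementary strand only when requested
  c1 <- complement(m$a1_tgt)
  c2 <- complement(m$a2_tgt)
  unmatched    <- !same & !swapped
  flip_same    <- flip_strand & unmatched & c1 == m$a1_ref & c2 == m$a2_ref
  flip_swapped <- flip_strand & unmatched & c1 == m$a2_ref & c2 == m$a1_ref
  to_flip <- (flip_same | flip_swapped) & !ambiguous
  to_swap <- (swapped | flip_swapped) & !ambiguous
  data.frame(
    id = m$id,
    to_keep = (same | swapped | to_flip) & !ambiguous,
    alleles_mismatched = !(same | swapped | flip_same | flip_swapped),
    to_flip = to_flip,
    to_swap = to_swap,
    stringsAsFactors = FALSE
  )
}

# Toy data: rs6 is only in the target, so it never appears in the result
ref <- data.frame(
  id = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  a1 = c("A", "A", "A", "A", "A"),
  a2 = c("G", "G", "G", "G", "T"),
  stringsAsFactors = FALSE
)
target <- data.frame(
  id = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6"),
  a1 = c("A", "G", "T", "A", "A", "C"),
  a2 = c("G", "A", "C", "C", "T", "T"),
  stringsAsFactors = FALSE
)

res <- classify_snps(ref, target, flip_strand = TRUE)
print(res)
```

In this toy case rs2 only needs its alleles swapped, rs3 matches after strand flipping, rs4 has alleles that cannot be reconciled, and rs5 is an ambiguous A/T SNP.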
Arguments
- ref
either a gen_tibble object, or the path to the PLINK bim file; the alleles in this object will be used as a template to flip the ones in target and/or swap their order as necessary.
- target
either a gen_tibble object, or the path to the PLINK bim file
- use_position
boolean of whether a combination of chromosome and position should be used for matching SNPs. By default, rbind uses the locus name, so this is set to FALSE. When using 'use_position=TRUE', make sure chromosomes are coded in the same way in both gen_tibbles (a mix of e.g. 'chr1', '1' or 'chromosome1' can be the reason why an unexpectedly large number of variants are dropped when merging).
- flip_strand
boolean on whether strand flipping should be checked to match the two datasets. Ambiguous SNPs (i.e. A/T and C/G) will also be removed. It defaults to FALSE.
- quiet
boolean on whether to omit reporting to screen
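The warning about chromosome coding under use_position can be checked before merging. The sketch below uses a hypothetical helper, `normalise_chromosome()`, which is not part of the package; it assumes the only differences are a leading 'chr' or 'chromosome' prefix, and it works on plain character vectors rather than on gen_tibble objects.

```r
# Hypothetical helper: strip a leading "chr" or "chromosome" prefix so that
# "chr1", "chromosome1" and "1" all end up with the same label.
normalise_chromosome <- function(x) {
  sub("^(chromosome|chr)", "", as.character(x), ignore.case = TRUE)
}

# Invented chromosome labels from two datasets
ref_chr    <- c("chr1", "chr2", "chrX")
target_chr <- c("1", "2", "chromosome3")

# Labels used by the reference that never occur in the target, before and
# after normalisation; a long list before normalisation hints at a coding clash
missing_raw  <- setdiff(ref_chr, target_chr)
missing_norm <- setdiff(normalise_chromosome(ref_chr),
                        normalise_chromosome(target_chr))
print(missing_raw)
print(missing_norm)
```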
Value
a list with two data.frames, named target and ref. Each data.frame has nrow() equal to the number of loci in the respective dataset, a column id with the locus name, and boolean columns to_keep (the valid loci that will be kept in the merge), alleles_mismatched (loci found in both datasets but with mismatched alleles, leading to those loci being dropped), to_flip (loci that need to be flipped to align the two datasets, only found in the target data.frame) and to_swap (loci for which the order of alleles needs to be swapped to align the two datasets, only found in the target data.frame).
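
As a sketch of how such a report might be inspected, the example below builds a mock list with the structure described above and summarises it with base R. The values are invented for illustration; in practice the list would come from the function itself.

```r
# Mock report mirroring the documented structure; the values are made up.
report <- list(
  target = data.frame(
    id = c("rs1", "rs2", "rs3", "rs4"),
    to_keep = c(TRUE, TRUE, TRUE, FALSE),
    alleles_mismatched = c(FALSE, FALSE, FALSE, TRUE),
    to_flip = c(FALSE, FALSE, TRUE, FALSE),
    to_swap = c(FALSE, TRUE, FALSE, FALSE),
    stringsAsFactors = FALSE
  ),
  ref = data.frame(
    id = c("rs1", "rs2", "rs3", "rs4", "rs9"),
    to_keep = c(TRUE, TRUE, TRUE, FALSE, FALSE),
    alleles_mismatched = c(FALSE, FALSE, FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE
  )
)

# Number of loci in each category of the target table
flag_cols <- c("to_keep", "alleles_mismatched", "to_flip", "to_swap")
target_counts <- vapply(report$target[flag_cols], sum, integer(1))
print(target_counts)

# Names of the target loci that would be lost in the merge
dropped_ids <- report$target$id[!report$target$to_keep]
print(dropped_ids)
```

Checking these counts before the real merge makes it easier to spot an unexpectedly large number of dropped loci.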