
Generate a report of what would happen to each SNP in a merge
Source: R/rbind_dry_run.R
This function provides an overview of the fate of each SNP in two gen_tibble objects in the case of a merge. Only SNPs found in both objects will be kept. One object is used as a reference, and SNPs in the other dataset will be flipped and/or have their alleles swapped as needed. SNPs that have different alleles in the two datasets will also be dropped.
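The decision described above can be sketched for a single SNP in base R. This is only an illustration of the logic, not the package's implementation: the helper names complement() and classify_snp() and the returned labels are hypothetical, and the choice to drop ambiguous SNPs only when strand checking is on is an assumption.

```r
# Illustrative sketch only; not the tidypopgen implementation.
# complement() and classify_snp() are hypothetical helpers.
complement <- function(x) chartr("ACGT", "TGCA", x)

classify_snp <- function(ref_a1, ref_a2, tgt_a1, tgt_a2, flip_strand = FALSE) {
  # A/T and C/G SNPs are their own complement, so strand cannot be inferred
  ambiguous <- complement(ref_a1) == ref_a2
  # Assumption: ambiguous SNPs are only dropped when strand checking is on
  if (flip_strand && ambiguous) return("drop_ambiguous")

  # Same alleles in the same order: nothing to do
  if (ref_a1 == tgt_a1 && ref_a2 == tgt_a2) return("keep")
  # Same alleles in reverse order: swap them in the target
  if (ref_a1 == tgt_a2 && ref_a2 == tgt_a1) return("swap")

  if (flip_strand) {
    # Try the complementary strand of the target alleles
    f1 <- complement(tgt_a1)
    f2 <- complement(tgt_a2)
    if (ref_a1 == f1 && ref_a2 == f2) return("flip")
    if (ref_a1 == f2 && ref_a2 == f1) return("flip_and_swap")
  }

  # Alleles cannot be reconciled: the SNP would be dropped
  "alleles_mismatched"
}

classify_snp("A", "G", "G", "A")
classify_snp("A", "G", "T", "C", flip_strand = TRUE)
classify_snp("A", "G", "A", "C")
```

The three calls illustrate a swap, a strand flip and an irreconcilable mismatch respectively.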
Arguments
- ref
either a gen_tibble object, or the path to the PLINK bim file; the alleles in this object will be used as the template to flip the ones in target and/or swap their order as necessary.
- target
either a gen_tibble object, or the path to the PLINK bim file.
- use_position
boolean of whether a combination of chromosome and position should be used for matching SNPs. By default, rbind uses the locus name, so this is set to FALSE. When using 'use_position=TRUE', make sure chromosomes are coded in the same way in both gen_tibbles (a mix of e.g. 'chr1', '1' or 'chromosome1' can be the reason if an unexpectedly large number of variants are dropped when merging).
- flip_strand
boolean on whether strand flipping should be checked to match the two datasets. Ambiguous SNPs (i.e. A/T and C/G) will also be removed. It defaults to FALSE.
- quiet
boolean on whether to omit reporting to screen.
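As a sketch of the chromosome-coding check mentioned for use_position, the base R snippet below harmonises chromosome labels before comparing two sets of loci. The helper normalise_chr() is hypothetical and not part of the package, and the example vectors are made up; how the cleaned labels are written back into a gen_tibble is not shown here.

```r
# Hypothetical helper (not part of the package): strip a leading
# "chr" or "chromosome" prefix so that labels are comparable.
normalise_chr <- function(x) {
  sub("^(chromosome|chr)", "", x, ignore.case = TRUE)
}

# Made-up chromosome labels from two datasets
ref_chr    <- c("chr1", "chr2", "chrX")
target_chr <- c("1", "chromosome2", "X")

# Before harmonising, no label matches; afterwards all of them do
sum(target_chr %in% ref_chr)
sum(normalise_chr(target_chr) %in% normalise_chr(ref_chr))
```

Running a check of this kind on the chromosome columns of both datasets before a position-based merge may help explain an unexpectedly large number of dropped variants.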
Value
a list with two data.frames, named target and ref. Each data.frame has nrow() equal to the number of loci in the respective dataset, a column id with the locus name, and boolean columns to_keep (the valid loci that will be kept in the merge), alleles_mismatched (loci found in both datasets but with mismatched alleles, leading to those loci being dropped), to_flip (loci that need to be flipped to align the two datasets, only found in the target data.frame) and to_swap (loci for which the order of alleles needs to be swapped to align the two datasets, only found in the target data.frame).
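
A report with this structure can be summarised with a few lines of base R. The object below is a hand-built mock that follows the column layout described above; the locus names and values are made up, and a real report would come from the function itself.

```r
# Mock report mimicking the documented structure (values are made up)
report <- list(
  target = data.frame(
    id = c("rs1", "rs2", "rs3", "rs4"),
    to_keep = c(TRUE, TRUE, FALSE, FALSE),
    alleles_mismatched = c(FALSE, FALSE, TRUE, FALSE),
    to_flip = c(FALSE, TRUE, FALSE, FALSE),
    to_swap = c(TRUE, FALSE, FALSE, FALSE)
  ),
  ref = data.frame(
    id = c("rs1", "rs2", "rs3", "rs5"),
    to_keep = c(TRUE, TRUE, FALSE, FALSE),
    alleles_mismatched = c(FALSE, FALSE, TRUE, FALSE)
  )
)

# Number of target loci carrying each flag
flags <- c("to_keep", "alleles_mismatched", "to_flip", "to_swap")
flag_counts <- vapply(report$target[flags], sum, integer(1))

# Names of the target loci that would survive the merge
kept_ids <- report$target$id[report$target$to_keep]

# Target loci dropped without an allele mismatch, e.g. because they are
# absent from the reference or were removed for another reason
dropped_other <- with(report$target, id[!to_keep & !alleles_mismatched])

flag_counts
kept_ids
dropped_other
```

Inspecting these counts before merging gives a quick check on how many loci would be lost and why.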