gen_tibble
A gen_tibble stores genotypes for individuals in a tidy format.
gen_tibble(
x,
...,
valid_alleles = c("A", "T", "C", "G"),
missing_alleles = c("0", "."),
backingfile = NULL,
quiet = FALSE
)
# S3 method for class 'character'
gen_tibble(
x,
...,
parser = c("vcfR", "cpp"),
chunk_size = NULL,
valid_alleles = c("A", "T", "C", "G"),
missing_alleles = c("0", "."),
backingfile = NULL,
quiet = FALSE
)
# S3 method for class 'matrix'
gen_tibble(
x,
indiv_meta,
loci,
...,
ploidy = 2,
valid_alleles = c("A", "T", "C", "G"),
missing_alleles = c("0", "."),
backingfile = NULL,
quiet = FALSE
)
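The usage above shows one generic with methods for character and matrix inputs, so the class of x decides which arguments apply. As a minimal sketch of how R's S3 dispatch makes that choice, here is a toy generic. describe_input() and its methods are hypothetical and are not part of the package.

```r
# Toy S3 generic (hypothetical, not tidypopgen code): the method is chosen
# from the class of x, just as gen_tibble() dispatches on character vs matrix.
describe_input <- function(x, ...) UseMethod("describe_input")
describe_input.character <- function(x, ...) "file path: read from disk"
describe_input.matrix <- function(x, ...) "dosage matrix: needs indiv_meta and loci"
describe_input.default <- function(x, ...) stop("unsupported input class")

describe_input("my_pop.bed")                     # character method
describe_input(matrix(0L, nrow = 2, ncol = 3))   # matrix method
```

A matrix has the implicit class c("matrix", "array", ...), which is why the matrix method is found even though the object was never given an explicit class.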
x: can be one of:
- a string giving the path to a PLINK BED or PED file. The associated BIM and FAM files (for BED) or MAP file (for PED) are expected to be in the same directory and to have the same file name.
- a string giving the path to an RDS file storing a bigSNP object from the bigsnpr package (usually created with bigsnpr::snp_readBed()).
- a string giving the path to a VCF file. Note that we currently read the whole VCF into memory with vcfR, so only smallish VCF files can be imported. Only biallelic SNPs will be considered.
- a string giving the path to a packedancestry .geno file. The associated .ind and .snp files are expected to be in the same directory and to share the same file name prefix.
- a genotype matrix of dosages (0, 1, 2, NA) giving the dosage of the alternate allele, with one row per individual and one column per locus.
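For the file-based inputs, the companion files must sit next to the main file and share its name. The sketch below uses a hypothetical helper, expected_companions() (not part of the package), to list the files one would check for before calling gen_tibble(); it assumes lower-case companion extensions, which may not match every dataset.

```r
# Hypothetical helper (not part of tidypopgen): list the companion files
# expected next to a genotype file, based on its extension.
expected_companions <- function(path) {
  ext <- tolower(tools::file_ext(path))
  prefix <- tools::file_path_sans_ext(path)
  companions <- switch(ext,
    bed  = c("bim", "fam"),   # PLINK binary
    ped  = "map",             # PLINK text
    geno = c("ind", "snp"),   # packedancestry
    character(0)              # .rds and .vcf need no companions
  )
  # paste0() would turn a zero-length argument into "prefix.", so return early
  if (length(companions) == 0) return(character(0))
  paste0(prefix, ".", companions)
}

expected_companions("data/my_pop.bed")
#> [1] "data/my_pop.bim" "data/my_pop.fam"
```

Wrapping the result in all(file.exists(...)) gives a quick pre-flight check before a long import.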
...: if x is the name of a VCF file, additional arguments passed to vcfR::read.vcfR(). Otherwise, unused.
valid_alleles: a vector of valid allele values; it defaults to 'A', 'T', 'C' and 'G'.
missing_alleles: a vector of values in the BIM file or loci data frame that indicate a missing allele value (e.g. for a monomorphic locus with only one allele). It defaults to '0' and '.' (the same as PLINK 1.9).
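To make the interplay between valid_alleles and missing_alleles concrete, here is a sketch that sorts allele codes into the three possible cases using the documented defaults. classify_alleles() is hypothetical, not the package's internal check, and treating an existing NA as missing is an assumption.

```r
# Hypothetical helper (not tidypopgen's internal code): label each allele code
# as "valid", "missing" or "invalid" using the documented default vectors.
classify_alleles <- function(alleles,
                             valid_alleles = c("A", "T", "C", "G"),
                             missing_alleles = c("0", ".")) {
  out <- rep("invalid", length(alleles))
  out[alleles %in% valid_alleles] <- "valid"
  # assumption: an NA already present in the loci table also counts as missing
  out[alleles %in% missing_alleles | is.na(alleles)] <- "missing"
  out
}

classify_alleles(c("A", "0", ".", "N", "G"))
```

Here "N" comes out as invalid because it is in neither vector.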
backingfile: the path, including the file name without extension, for the backing files used to store the data (the .bk and .RDS extensions are added automatically). This is not needed if x is already an .RDS file. If x is a .BED file and backingfile is left NULL, the backing file will be saved in the same directory as the BED file, using the same file name but a different extension (.bk rather than .bed). The same logic applies to .vcf files. If x is a genotype matrix and backingfile is NULL, a temporary file will be created (but note that R will delete it at the end of the session!).
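The default locations described for backingfile can be mirrored in a few lines. backing_paths() is a hypothetical helper that follows this page's description; it is not the package's own logic, and the exact extension case used on disk may differ.

```r
# Hypothetical helper (not part of tidypopgen) mirroring the documented
# defaults for where the backing files end up.
backing_paths <- function(x, backingfile = NULL) {
  if (is.null(backingfile)) {
    if (is.character(x)) {
      # .bed/.vcf input: same directory and file name, extension dropped
      backingfile <- tools::file_path_sans_ext(x)
    } else {
      # matrix input: a temporary file, deleted when the R session ends
      backingfile <- tempfile()
    }
  }
  # the extensions are appended automatically, so backingfile has none
  c(bk = paste0(backingfile, ".bk"), rds = paste0(backingfile, ".RDS"))
}

backing_paths("data/my_pop.bed")              # next to the BED file
backing_paths("data/my_pop.bed", "out/run1")  # explicit prefix, no extension
```

Passing an explicit prefix outside tempdir() is the simple way to keep the backing files of a matrix-based gen_tibble beyond the current session.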
quiet: if FALSE (the default), provide information on the files used to store the data.
parser: the name of the parser used for VCF files, either "cpp" to use a fast C++ parser, or "vcfR" to use the R package vcfR. The latter is slower but more robust; if "cpp" gives an error, try "vcfR" in case your VCF has an unusual structure.
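The advice to fall back on "vcfR" when "cpp" fails can be wrapped in a small helper. read_vcf_with_fallback() is hypothetical; it calls gen_tibble() only with arguments listed on this page and forwards any extra arguments to the vcfR attempt, where they reach vcfR::read.vcfR().

```r
# Hypothetical wrapper (not a tidypopgen function): try the fast C++ VCF
# parser first, then retry with vcfR, forwarding `...` to the vcfR attempt.
read_vcf_with_fallback <- function(vcf_path, backingfile = NULL,
                                   quiet = FALSE, ...) {
  if (!requireNamespace("tidypopgen", quietly = TRUE)) {
    stop("the tidypopgen package is required for this sketch")
  }
  tryCatch(
    tidypopgen::gen_tibble(vcf_path, parser = "cpp",
                           backingfile = backingfile, quiet = quiet),
    error = function(e) {
      message("cpp parser failed: ", conditionMessage(e),
              "; retrying with parser = \"vcfR\"")
      # extra arguments (e.g. verbose = FALSE) are passed on to read.vcfR()
      tidypopgen::gen_tibble(vcf_path, parser = "vcfR",
                             backingfile = backingfile, quiet = quiet, ...)
    }
  )
}

# Example call (needs a real VCF file):
# gt <- read_vcf_with_fallback("my_pop.vcf", verbose = FALSE)
```

If a failed "cpp" attempt leaves partial backing files behind, the retry may need a different backingfile; this sketch does not handle that case.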
chunk_size: the number of loci or individuals (depending on the format) processed at a time (currently used if x is a VCF or packedancestry file).
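As a generic illustration of what processing "a chunk at a time" means, the sketch below splits indices into consecutive blocks of at most chunk_size. chunk_indices() is hypothetical and does not claim to reproduce how the package chunks its input.

```r
# Generic sketch (not tidypopgen's code): split 1..n into consecutive blocks
# of at most chunk_size elements, as a chunked reader would visit them.
chunk_indices <- function(n, chunk_size) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / chunk_size))
}

chunks <- chunk_indices(10, chunk_size = 4)
lengths(chunks)  # block sizes: 4, 4, 2
```

Smaller chunks lower peak memory at the cost of more passes; the last block simply holds whatever remains.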
indiv_meta: a list, data.frame or tibble with compulsory columns 'id' and 'population', plus any additional metadata of interest. This is only used if x is a genotype matrix; otherwise, this information is extracted directly from the files.
loci: a data.frame or tibble with compulsory columns 'name', 'chromosome', 'position', 'genetic_dist', 'allele_ref' and 'allele_alt'. This is only used if x is a genotype matrix; otherwise, this information is extracted directly from the files.
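A minimal sketch of building a gen_tibble from a matrix, with individuals as rows and loci as columns. The toy values are made up for illustration; the gen_tibble() call uses only arguments listed on this page and is skipped when tidypopgen is not installed.

```r
# Toy input: 3 individuals (rows) x 6 loci (columns); each value is the
# dosage of the alternate allele (0, 1, 2 or NA).
geno <- rbind(
  c(1, 1, 0, 1, 1, 0),
  c(2, 1, 0, 0, 0, 0),
  c(2, 2, 0, 0, 1, NA)
)
indiv_meta <- data.frame(
  id = c("a", "b", "c"),
  population = c("pop1", "pop1", "pop2")
)
loci <- data.frame(
  name = paste0("rs", 1:6),
  chromosome = paste0("chr", c(1, 1, 1, 1, 2, 2)),
  position = as.integer(c(3, 5, 65, 343, 23, 456)),
  genetic_dist = as.double(rep(0, 6)),
  allele_ref = c("A", "T", "C", "G", "C", "T"),
  allele_alt = c("T", "C", NA, "C", "G", "A")  # locus 3 is monomorphic
)

# Basic consistency checks between the three inputs
stopifnot(
  nrow(geno) == nrow(indiv_meta),
  ncol(geno) == nrow(loci),
  all(geno %in% c(0, 1, 2, NA))
)

if (requireNamespace("tidypopgen", quietly = TRUE)) {
  gt <- tidypopgen::gen_tibble(
    x = geno, indiv_meta = indiv_meta, loci = loci,
    backingfile = tempfile("example_gt"), quiet = TRUE
  )
  print(gt)
}
```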
ploidy: the ploidy of the samples (either a single value, or a vector of values for mixed ploidy). Only used when creating a gen_tibble from a genotype matrix; otherwise, ploidy is determined automatically from the data as they are read.
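With mixed ploidy, a sensible sanity check before calling gen_tibble() is that each individual's dosages lie between 0 and its own ploidy. This rests on an assumption (a dosage counts copies of the alternate allele, so it cannot exceed ploidy), and check_dosages() is a hypothetical helper, not a package function.

```r
# Hypothetical check (not part of tidypopgen): every non-missing dosage in
# row i must lie in 0..ploidy[i].
check_dosages <- function(geno, ploidy = 2) {
  if (!length(ploidy) %in% c(1, nrow(geno))) {
    stop("ploidy must be a single value or one value per individual")
  }
  ploidy <- rep_len(ploidy, nrow(geno))
  # geno <= ploidy recycles ploidy down each column, so row i uses ploidy[i]
  ok <- is.na(geno) | (geno >= 0 & geno <= ploidy)
  all(ok)
}

mixed <- rbind(c(0, 1, 2), c(0, 1, 1), c(1, 0, NA))
check_dosages(mixed, ploidy = c(2, 2, 1))  # TRUE
check_dosages(mixed, ploidy = 1)           # FALSE: row 1 contains a 2
```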
Value: an object of class gen_tbl.
Details: when loading packedancestry files, missing alleles will be converted from 'X' to NA.
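The conversion described above amounts to replacing one sentinel code with NA. The sketch below shows that step on a plain character vector; x_to_na() is hypothetical and is not the package's own code.

```r
# Sketch of the documented conversion (not tidypopgen's internal code):
# 'X' marks a missing allele in packedancestry input and becomes NA.
x_to_na <- function(alleles) {
  # which() drops any NA comparisons, so pre-existing NAs are left untouched
  alleles[which(alleles == "X")] <- NA
  alleles
}

x_to_na(c("A", "X", "G"))
```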