Constructor for a gen_tibble
A gen_tibble stores genotypes for individuals in a tidy format.
Usage
gen_tibble(
x,
...,
valid_alleles = c("A", "T", "C", "G"),
missing_alleles = c("0", "."),
backingfile = NULL,
quiet = FALSE
)
# S3 method for class 'character'
gen_tibble(
x,
...,
parser = c("vcfR", "cpp"),
n_cores = 1,
chunk_size = NULL,
valid_alleles = c("A", "T", "C", "G"),
missing_alleles = c("0", "."),
backingfile = NULL,
quiet = FALSE
)
# S3 method for class 'matrix'
gen_tibble(
x,
indiv_meta,
loci,
...,
ploidy = 2,
valid_alleles = c("A", "T", "C", "G"),
missing_alleles = c("0", "."),
backingfile = NULL,
quiet = FALSE
)
Arguments
- x
can be:
a string giving the path to a PLINK BED or PED file. The associated BIM and FAM files (for BED), or MAP file (for PED), are expected to be in the same directory and to have the same file name.
a string giving the path to an RDS file storing a bigSNP object from the bigsnpr package (usually created with bigsnpr::snp_readBed()).
a string giving the path to a VCF file. Only biallelic SNPs will be considered. When parser = "vcfR", the whole VCF is read into memory, so only smallish VCFs can be imported that way.
a string giving the path to a packedancestry .geno file. The associated .ind and .snp files are expected to be in the same directory and to share the same file name prefix.
a genotype matrix of dosages (0, 1, 2, NA) giving the dosage of the alternate allele.
- ...
if x is the name of a VCF file, additional arguments passed to vcfR::read.vcfR(). Otherwise, unused.
- valid_alleles
a vector of valid allele values; it defaults to 'A', 'T', 'C' and 'G'.
- missing_alleles
a vector of values in the BIM file or loci data frame that indicate a missing allele (e.g. at a monomorphic locus, where only one allele is observed). It defaults to '0' and '.' (the same as PLINK 1.9).
- backingfile
the path, including the file name without extension, for the backing files used to store the data (the .bk and .RDS extensions are added automatically). This is not needed if x is already an .RDS file. If x is a BED or VCF file and backingfile is left NULL, the backing file will be saved in the same directory as the input file, using the same file name but a different extension (e.g. .bk rather than .bed). If x is a genotype matrix and backingfile is NULL, a temporary file will be created (but note that R will delete it at the end of the session!).
- quiet
if FALSE (the default), provide information on the files used to store the data; set to TRUE to suppress these messages.
- parser
the name of the parser used for VCF files: either "cpp" to use a fast C++ parser, or "vcfR" to use the R package vcfR. The latter is slower but more robust; if "cpp" gives an error, try "vcfR" in case your VCF has an unusual structure.
- n_cores
the number of cores to use for parallel processing
- chunk_size
the number of loci or individuals (depending on the format) processed at a time (currently used if x is a VCF or packedancestry file).
- indiv_meta
a list, data.frame or tibble with compulsory columns 'id' and 'population', plus any additional metadata of interest. This is only used if x is a genotype matrix; otherwise this information is extracted directly from the files.
- loci
a data.frame or tibble with compulsory columns 'name', 'chromosome', 'position', 'genetic_dist', 'allele_ref' and 'allele_alt'. This is only used if x is a genotype matrix; otherwise this information is extracted directly from the files.
- ploidy
the ploidy of the samples (either a single value, or a vector of values for mixed ploidy). Only used if creating a gen_tibble from a matrix of data; otherwise, ploidy is determined automatically from the data as they are read.
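As a rough illustration of the inputs expected by the matrix method, the base R sketch below builds a toy dosage matrix with matching indiv_meta and loci tables and checks them against the requirements listed above. The helper check_matrix_inputs() is hypothetical: it is not part of the package and does not reproduce the package's own validation. It assumes that individuals are stored in rows and loci in columns, and that a ploidy vector, if given, holds one value per individual.

```r
# Toy data (hypothetical): 3 individuals (rows) x 4 loci (columns),
# coded as dosages of the alternate allele.
geno <- rbind(
  c(0, 1, 2, NA),
  c(1, 1, 0, 2),
  c(2, 0, 1, 1)
)

indiv_meta <- data.frame(
  id = c("ind1", "ind2", "ind3"),
  population = c("pop1", "pop1", "pop2")
)

loci <- data.frame(
  name = paste0("rs", 1:4),
  chromosome = c("1", "1", "2", "2"),
  position = c(100L, 250L, 80L, 400L),
  genetic_dist = rep(0, 4),
  allele_ref = c("A", "T", "C", "G"),
  allele_alt = c("G", "C", "T", "0") # "0" marks a missing allele
)

# Hypothetical helper: checks the documented requirements for the
# matrix method; not the package's own validation code.
check_matrix_inputs <- function(geno, indiv_meta, loci, ploidy = 2,
                                valid_alleles = c("A", "T", "C", "G"),
                                missing_alleles = c("0", ".")) {
  # compulsory columns
  stopifnot(all(c("id", "population") %in% names(indiv_meta)))
  loci_cols <- c("name", "chromosome", "position", "genetic_dist",
                 "allele_ref", "allele_alt")
  stopifnot(all(loci_cols %in% names(loci)))
  # assumed orientation: one row per individual, one column per locus
  stopifnot(nrow(geno) == nrow(indiv_meta), ncol(geno) == nrow(loci))
  # dosages must be whole numbers between 0 and the individual's ploidy;
  # a length-nrow vector recycles down each column, so row i uses max_dosage[i]
  max_dosage <- rep_len(ploidy, nrow(geno))
  ok <- is.na(geno) | (geno >= 0 & geno <= max_dosage & geno == round(geno))
  stopifnot(all(ok))
  # alleles: values in missing_alleles become NA, the rest must be valid
  alleles <- c(loci$allele_ref, loci$allele_alt)
  alleles[alleles %in% missing_alleles] <- NA
  stopifnot(all(is.na(alleles) | alleles %in% valid_alleles))
  invisible(TRUE)
}

check_matrix_inputs(geno, indiv_meta, loci)
```

With the package loaded, the same three objects map onto the x, indiv_meta and loci arguments of the matrix method, e.g. gen_tibble(geno, indiv_meta = indiv_meta, loci = loci, quiet = TRUE); with backingfile left NULL, the data would be stored in a temporary file.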
Details
VCF files: the fast cpp parser is used by default. Both the cpp and vcfR parsers attempt to establish ploidy from the first variant; if that variant is on a sex chromosome (or mtDNA), the parser will fail with 'Error: a genotype has more than max_ploidy alleles...'. To successfully import such a VCF, reorder the variants so that the first chromosome is an autosome, using a tool such as vcftools. Currently, only biallelic SNPs are supported. If haploid variants (e.g. on sex chromosomes) are included in the VCF, they are not transformed into homozygous calls; instead, reference alleles are counted as 0 and alternative alleles as 1.

packedancestry files: when loading packedancestry files, missing alleles are converted from 'X' to NA.
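The dosage coding for VCF genotypes described above (count of alternate alleles, with haploid calls not expanded into homozygous calls) can be illustrated with a small base R sketch. gt_to_dosage() is a hypothetical helper that works on VCF GT strings; it is not the package's parser, and it assumes biallelic sites where allele index 0 is the reference and 1 the alternate.

```r
# Hypothetical helper: alternate-allele dosage from VCF GT strings,
# biallelic sites only (allele index 0 = REF, 1 = ALT).
gt_to_dosage <- function(gt) {
  vapply(gt, function(g) {
    # split phased ("|") or unphased ("/") calls; haploid calls have one allele
    alleles <- strsplit(g, "[/|]")[[1]]
    if (any(alleles == ".")) return(NA_integer_) # missing call
    idx <- suppressWarnings(as.integer(alleles))
    if (any(is.na(idx)) || any(idx > 1L)) {
      stop("only biallelic genotypes (alleles 0 and 1) are supported: ", g)
    }
    sum(idx) # number of ALT alleles; a haploid "1" stays 1, not 2
  }, integer(1), USE.NAMES = FALSE)
}

gt_to_dosage(c("0/0", "0/1", "1|1", "./.", "0", "1"))
# returns c(0L, 1L, 2L, NA, 0L, 1L)
```

The haploid call "1" yields a dosage of 1, whereas the diploid homozygous call "1/1" yields 2, which is the distinction the paragraph above draws.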