A gen_tibble stores genotypes for individuals in a tidy format.

Usage

gen_tibble(
  x,
  ...,
  valid_alleles = c("A", "T", "C", "G"),
  missing_alleles = c("0", "."),
  backingfile = NULL,
  quiet = FALSE
)

# S3 method for class 'character'
gen_tibble(
  x,
  ...,
  parser = c("vcfR", "cpp"),
  n_cores = 1,
  chunk_size = NULL,
  valid_alleles = c("A", "T", "C", "G"),
  missing_alleles = c("0", "."),
  backingfile = NULL,
  quiet = FALSE
)

# S3 method for class 'matrix'
gen_tibble(
  x,
  indiv_meta,
  loci,
  ...,
  ploidy = 2,
  valid_alleles = c("A", "T", "C", "G"),
  missing_alleles = c("0", "."),
  backingfile = NULL,
  quiet = FALSE
)

Arguments

x

can be:

  • a string giving the path to a PLINK BED or PED file. The associated BIM and FAM files (for BED) or MAP file (for PED) are expected to be in the same directory and to share the same file name prefix.

  • a string giving the path to an RDS file storing a bigSNP object from the bigsnpr package (usually created with bigsnpr::snp_readBed())

  • a string giving the path to a VCF file. Note that the whole VCF is currently read into memory with vcfR, so only smallish VCF files can be imported. Only biallelic SNPs will be considered.

  • a string giving the path to a packedancestry .geno file. The associated .ind and .snp files are expected to be in the same directory and share the same file name prefix.

  • a genotype matrix giving the dosage of the alternate allele (0, 1, 2 or NA).

...

if x is the path to a VCF file, additional arguments passed to vcfR::read.vcfR(). Otherwise, unused.

valid_alleles

a vector of valid allele values; it defaults to 'A', 'T', 'C' and 'G'.

missing_alleles

a vector of values in the BIM file/loci dataframe that indicate a missing allele (e.g. when we have a monomorphic locus with only one allele). It defaults to '0' and '.' (the same as PLINK 1.9).

backingfile

the path, including the file name without extension, for the backing files used to store the data (the .bk and .RDS extensions are added automatically). This is not needed if x is already an .RDS file. If x is a .BED or a VCF file and backingfile is left NULL, the backing file will be saved in the same directory as the input file, using the same file name but with a different extension (e.g. .bk rather than .bed). If x is a genotype matrix and backingfile is NULL, a temporary file will be created (but note that R will delete it at the end of the session!).

quiet

logical; if FALSE (the default), information on the files used to store the data is printed; if TRUE, these messages are suppressed.

parser

the name of the parser used for VCF files, either "cpp" to use a fast C++ parser, or "vcfR" to use the R package vcfR. The latter is slower but more robust; if "cpp" gives an error, try using "vcfR" in case your VCF has an unusual structure.

n_cores

the number of cores to use for parallel processing

chunk_size

the number of loci or individuals (depending on the format) processed at a time (currently only used if x is a VCF or packedancestry file)

indiv_meta

a list, data.frame or tibble with compulsory columns 'id' and 'population', plus any additional metadata of interest. This is only used if x is a genotype matrix. Otherwise this information is extracted directly from the files.

loci

a data.frame or tibble, with compulsory columns 'name', 'chromosome', 'position', 'genetic_dist', 'allele_ref' and 'allele_alt'. This is only used if x is a genotype matrix. Otherwise this information is extracted directly from the files.

ploidy

the ploidy of the samples (either a single value, or a vector of values for mixed ploidy). Only used if creating a gen_tibble from a matrix of data; otherwise, ploidy is determined automatically from the data as they are read.
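
As a minimal sketch of how the inputs for the matrix method might be assembled, assuming the tidypopgen package is installed and that rows of the matrix are individuals and columns are loci. The sample IDs, population labels and locus names below are invented for illustration only:

```r
# Toy dosage matrix: 3 individuals (rows) x 4 loci (columns); NA marks a missing call
genotypes <- rbind(
  c(0, 1, 2, NA),
  c(1, 1, 0, 2),
  c(2, 0, 1, 1)
)

# Individual metadata with the compulsory 'id' and 'population' columns
indiv_meta <- data.frame(
  id = c("ind1", "ind2", "ind3"),
  population = c("pop_a", "pop_a", "pop_b")
)

# Locus table with the compulsory columns listed for the 'loci' argument
loci <- data.frame(
  name = paste0("rs", 1:4),
  chromosome = paste0("chr", c(1, 1, 2, 2)),
  position = as.integer(c(100, 250, 80, 400)),
  genetic_dist = as.double(rep(0, 4)),
  allele_ref = c("A", "T", "C", "G"),
  allele_alt = c("T", "C", "G", "A")
)

# The matrix dimensions must agree with the two tables
stopifnot(
  nrow(genotypes) == nrow(indiv_meta),
  ncol(genotypes) == nrow(loci)
)

# Only attempt the call if tidypopgen is available
if (requireNamespace("tidypopgen", quietly = TRUE)) {
  toy_gt <- tidypopgen::gen_tibble(
    genotypes,
    indiv_meta = indiv_meta,
    loci = loci,
    quiet = TRUE
  )
}
```

Because backingfile is left as NULL here, the data are written to a temporary file that will not survive the R session.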

Value

an object of the class gen_tbl.

Details

  • VCF files: the parser is chosen with the parser argument (the first value shown in Usage, "vcfR", is the default). Both the cpp and vcfR parsers attempt to establish ploidy from the first variant; if that variant is on a sex chromosome (or mtDNA), the parser will fail with 'Error: a genotype has more than max_ploidy alleles...'. To successfully import such a VCF, reorder the variants so that the first chromosome is an autosome, using a tool such as vcftools. Currently, only biallelic SNPs are supported. If haploid variants (e.g. on sex chromosomes) are included in the VCF, they are not transformed into homozygous calls; instead, reference alleles are counted as 0 and alternative alleles as 1.

  • packedancestry files: when loading packedancestry files, missing alleles will be converted from 'X' to NA.
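
The advice to switch parsers when one fails on an unusual VCF could be wrapped up as follows. This is only a sketch: read_vcf_with_fallback() is a hypothetical helper, not part of tidypopgen, and its reader argument exists purely so that the fallback logic can be tried without a real VCF.

```r
# Hypothetical helper: try the fast "cpp" parser first and, if it raises an
# error, retry with the slower but more robust "vcfR" parser.
# 'reader' defaults to tidypopgen::gen_tibble; the default is evaluated lazily,
# so tidypopgen is only needed when no other reader is supplied.
read_vcf_with_fallback <- function(path, ..., reader = tidypopgen::gen_tibble) {
  tryCatch(
    reader(path, parser = "cpp", ...),
    error = function(e) {
      message("cpp parser failed (", conditionMessage(e), "); retrying with vcfR")
      reader(path, parser = "vcfR", ...)
    }
  )
}

# Usage, assuming a file "my_samples.vcf" exists (hypothetical path):
# gt <- read_vcf_with_fallback("my_samples.vcf", quiet = TRUE)
```

Note that this fallback does not help with the ploidy error described above, since both parsers establish ploidy from the first variant.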