gt_roh_window.Rd
This function uses a sliding-window approach to look for runs of homozygosity (or
heterozygosity) in a diploid genome. This function uses the package selectRUNS,
which implements an approach equivalent to the one in PLINK.
gt_roh_window(
x,
window_size = 15,
threshold = 0.05,
min_snp = 3,
heterozygosity = FALSE,
max_opp_window = 1,
max_miss_window = 1,
max_gap = 10^6,
min_length_bps = 1000,
min_density = 1/1000,
max_opp_run = NULL,
max_miss_run = NULL
)
window_size: the size of the sliding window, in number of SNP loci (default = 15)
threshold: the threshold of overlapping windows of the same state (homozygous/heterozygous) needed to call a SNP as part of a run (default = 0.05)
min_snp: minimum n. of SNPs in a run (default = 3)
heterozygosity: should we look for runs of heterozygosity (instead of homozygosity)? (default = FALSE)
max_opp_window: max n. of SNPs of the opposite type (e.g. heterozygous SNPs for runs of homozygosity) in the sliding window (default = 1)
max_miss_window: max n. of missing SNPs in the sliding window (default = 1)
max_gap: max distance between consecutive SNPs for them to still be considered part of a potential run (default = 10^6 bps)
min_length_bps: minimum length of a run in bps (default = 1000 bps = 1 kbps)
min_density: minimum n. of SNPs per kbps (default = 1/1000)
max_opp_run: max n. of opposite genotype SNPs in the run (optional, default = NULL)
max_miss_run: max n. of missing SNPs in the run (optional, default = NULL)
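To make the roles of window_size, threshold, max_opp_window, max_miss_window and min_snp more concrete, here is a minimal base R sketch of a PLINK-style sliding-window scan for a single individual on one chromosome. It is a simplified illustration, not the code used by this function: the helper names `sketch_roh_snps()` and `sketch_runs()` and the 0/1/2 genotype coding are assumptions made for the example, and the position-based filters (max_gap, min_length_bps, min_density) are ignored.

```r
# Simplified illustration only; not the implementation used by gt_roh_window().
# Assumed coding: 0 or 2 = homozygous, 1 = heterozygous, NA = missing.
sketch_roh_snps <- function(geno, window_size = 15, threshold = 0.05,
                            max_opp_window = 1, max_miss_window = 1) {
  n <- length(geno)
  n_win <- n - window_size + 1
  stopifnot(n_win >= 1)
  # Does the window starting at SNP i qualify as homozygous?
  win_ok <- vapply(seq_len(n_win), function(i) {
    w <- geno[i:(i + window_size - 1)]
    sum(w == 1, na.rm = TRUE) <= max_opp_window &&
      sum(is.na(w)) <= max_miss_window
  }, logical(1))
  # For each SNP, the share of the windows covering it that qualify
  prop_ok <- vapply(seq_len(n), function(j) {
    covering <- max(1, j - window_size + 1):min(j, n_win)
    mean(win_ok[covering])
  }, numeric(1))
  prop_ok >= threshold
}

# Collapse consecutive in-run SNPs into runs with at least min_snp SNPs
sketch_runs <- function(in_run, min_snp = 3) {
  r <- rle(in_run)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep <- r$values & r$lengths >= min_snp
  data.frame(first_snp = starts[keep], last_snp = ends[keep],
             nSNP = r$lengths[keep])
}

# Toy genotypes: 10 homozygous, 10 heterozygous, 10 homozygous SNPs
geno <- c(rep(0, 10), rep(1, 10), rep(2, 10))
in_run <- sketch_roh_snps(geno, window_size = 5, threshold = 0.05,
                          max_opp_window = 0, max_miss_window = 0)
sketch_runs(in_run)
```

With these toy genotypes the sketch reports two runs of 10 SNPs each (SNPs 1 to 10 and 21 to 30), while the heterozygous block in the middle is excluded because no window covering it qualifies.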
A data frame with runs of homozygosity or heterozygosity in the analysed dataset. The returned data frame contains the following seven columns:
group: population, breed, case/control, etc.
id: individual identifier
chrom: chromosome on which the run is located
nSNP: number of SNPs in the run
from: starting position of the run, in bps
to: end position of the run, in bps
lengthBps: size of the run, in bps
This function returns a data frame with all runs detected in the dataset. This data frame can then be written out to a csv file. The data frame is, in turn, the input for other functions of the detectRUNS package that create plots and produce statistics from the results (see plots and statistics functions in this manual, and/or refer to the detectRUNS vignette).
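As a rough illustration of working with a result of this shape, the base R sketch below builds a small mock data frame with the seven documented columns, sums the run lengths per individual, and writes the runs to a CSV file. All values in `mock_roh` are invented for the example and do not come from any real dataset; a real analysis would use the data frame returned by gt_roh_window() instead.

```r
# Hypothetical values, only mimicking the seven documented columns.
mock_roh <- data.frame(
  group = c("popA", "popA", "popB"),
  id = c("ind1", "ind1", "ind2"),
  chrom = c("1", "2", "1"),
  nSNP = c(20L, 35L, 50L),
  from = c(100000, 500000, 250000),
  to = c(300000, 900000, 1250000),
  lengthBps = c(200000, 400000, 1000000)
)

# Total length covered by runs, per individual (and its group)
total_by_id <- aggregate(lengthBps ~ group + id, data = mock_roh, FUN = sum)
total_by_id

# Write the runs to a CSV file (a temporary file is used here)
out_file <- tempfile(fileext = ".csv")
write.csv(mock_roh, out_file, row.names = FALSE)
```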
If the gen_tibble is grouped, then the grouping variable is used to fill in the
'group' column. Otherwise, the 'group' column is filled with the same values as the
'id' column.
# don't run the example
if (FALSE) {
  # PED file shipped with the detectRUNS package
  sheep_ped <- system.file("extdata", "Kijas2016_Sheep_subset.ped",
    package = "detectRUNS"
  )
  sheep_gt <- tidypopgen::gen_tibble(sheep_ped,
    backingfile = tempfile(),
    quiet = TRUE
  )
  # group by population so that the 'group' column of the result is filled in
  sheep_gt <- dplyr::group_by(sheep_gt, population)
  sheep_roh <- gt_roh_window(sheep_gt)
  detectRUNS::plot_Runs(runs = sheep_roh)
}