SVSNiPeR
Structural Variations detection using SNP genotyping data in R
Last updated 2021-11-02
What this package does
The SVSNiPeR package provides helper functions to use in SNP genotyping array data analysis. The goal is to detect genomic regions subject to structural variations.
How to install this package
To install SVSNiPeR, use the following command:
devtools::install_git("https://forgemia.inra.fr/jonathan.kitt/svniper")
How to use this package
Single genotyping array
If your project consists of a single genotyping array, use the following analysis pipeline.
library(svsniper)
1) Read list of SNPs and list of samples
The analysis requires a list of physical positions for the SNPs, as shown below:
probeset_id | chromosome | position |
---|---|---|
AX-12345678 | chr1A | 123456 |
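As a minimal sketch of loading such a list, assuming it is stored as a tab-separated file with the three columns shown above: the inline text stands in for a real file, the second SNP is toy data, and `snp_positions` is a hypothetical name. `readr::read_tsv()` on a file path would serve the same purpose.

```r
# Toy SNP position list (tab-separated); in practice, pass a file path instead of `text`
snp_text <- "probeset_id\tchromosome\tposition\nAX-12345678\tchr1A\t123456\nAX-87654321\tchr1B\t654321"

snp_positions <- read.delim(text = snp_text, stringsAsFactors = FALSE)

# Check that the three expected columns are present before going further
required_cols <- c("probeset_id", "chromosome", "position")
stopifnot(all(required_cols %in% names(snp_positions)))

str(snp_positions)
```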
We also recommend you use a list of genotyped samples, as shown below:
unique_id | file_name | sample_name | definition |
---|---|---|---|
id01 | sample01.CEL | sample01 | reference |
id02 | sample02.CEL | sample02 | sample |
A genotyped sample is defined either as a sample or as a reference. References are used to normalise calculations in further steps.
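The sample list can be split by its definition column, for instance as follows. This is a sketch on toy data matching the table above; `reference_files` and `sample_files` are hypothetical names, not objects required by the package.

```r
# Toy sample list (tab-separated), mirroring the table above
sample_text <- "unique_id\tfile_name\tsample_name\tdefinition\nid01\tsample01.CEL\tsample01\treference\nid02\tsample02.CEL\tsample02\tsample"
samples <- read.delim(text = sample_text, stringsAsFactors = FALSE)

# Every row must be labelled either "reference" or "sample"
stopifnot(all(samples$definition %in% c("reference", "sample")))

# CEL file names for each group
reference_files <- samples$file_name[samples$definition == "reference"]
sample_files <- samples$file_name[samples$definition == "sample"]
```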
1) Read Axiom output files
Three files are obtained using the Axiom genotyping pipeline:
- AxiomGT1.calls.txt
- AxiomGT1.confidences.txt
- AxiomGT1.summary.txt
To read these files, use the following commands:
axiom_calls <- readr::read_tsv(path_to_axiom_calls_file)
axiom_confidences <- svsniper::read_confidences(path_to_axiom_confidences_file)
axiom_summary <- svsniper::read_summary(path_to_axiom_summary_file)
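Axiom output files may begin with metadata lines starting with `#`. If yours do, those lines need to be skipped when reading. The sketch below shows this with base R on a toy file written to a temporary path; the file content is invented for illustration, and whether the svsniper readers already handle such lines is not documented here. `readr::read_tsv()` offers a `comment` argument for the same purpose.

```r
# Write a toy calls file with two metadata lines, as a stand-in for AxiomGT1.calls.txt
path_to_axiom_calls_file <- tempfile(fileext = ".txt")
writeLines(c(
  "#%toy-metadata=1",
  "#%another-line=2",
  "probeset_id\tsample01.CEL\tsample02.CEL",
  "AX-12345678\t0\t2"
), path_to_axiom_calls_file)

# comment.char = "#" drops the metadata lines;
# check.names = FALSE keeps the CEL file names unchanged as column names
axiom_calls_toy <- read.delim(path_to_axiom_calls_file, comment.char = "#",
                              check.names = FALSE, stringsAsFactors = FALSE)
```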
Optional step: filter SNPs
You may want to remove SNPs with bad confidence scores, and/or, depending on
the type of analysis you want to run, SNPs with high minor allele frequencies.
In order to filter out SNPs, three functions are available:
a) svsniper::count_confidences(axiom_confidences, threshold = 0.15)
This function counts, for each SNP, the number of samples with a confidence score above the
defined threshold (defaults to 0.15, the value used in the Affymetrix Axiom
tools), and returns a table as shown below:
probeset_id | threshold_pass | threshold_fail |
---|---|---|
AX-12345678 | 96 | 0 |
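To illustrate the shape of this table, here is a base R sketch on toy confidence scores. It is not the package's implementation. It assumes one row per SNP and one column per CEL file, and it assumes that a score at or below the threshold counts as a pass (lower Axiom confidence scores generally indicate more reliable calls); check this direction against the actual output of the function.

```r
# Toy confidence table: one row per SNP, one column per CEL file
conf_toy <- data.frame(
  probeset_id = c("AX-12345678", "AX-87654321"),
  sample01.CEL = c(0.01, 0.20),
  sample02.CEL = c(0.05, 0.10),
  check.names = FALSE, stringsAsFactors = FALSE
)

threshold <- 0.15
scores <- as.matrix(conf_toy[, -1])

# Assumption: pass means score <= threshold, fail means score > threshold
confidence_counts <- data.frame(
  probeset_id = conf_toy$probeset_id,
  threshold_pass = rowSums(scores <= threshold),
  threshold_fail = rowSums(scores > threshold)
)
```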
b) svsniper::count_alleles(axiom_calls)
This function will return a table as shown below:
probeset_id | count_aa | count_ab | count_bb | count_na | count_otv |
---|---|---|---|---|---|
AX-12345678 | 41 | 2 | 53 | 0 | 0 |
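The counting itself can be sketched in base R on toy data. The numeric encoding used here (0 = AA, 1 = AB, 2 = BB, -1 = no call, -2 = OTV) is an assumption about the calls file and should be checked against your own data; the sample columns `s1` to `s4` are hypothetical.

```r
# Toy calls table: one row per SNP, one column per sample
calls_toy <- data.frame(
  probeset_id = c("AX-12345678", "AX-87654321"),
  s1 = c(0, 2), s2 = c(1, -1), s3 = c(2, 2), s4 = c(0, -2),
  stringsAsFactors = FALSE
)

calls <- as.matrix(calls_toy[, -1])

# Count each genotype code per SNP (assumed encoding, see above)
allele_count_toy <- data.frame(
  probeset_id = calls_toy$probeset_id,
  count_aa = rowSums(calls == 0),
  count_ab = rowSums(calls == 1),
  count_bb = rowSums(calls == 2),
  count_na = rowSums(calls == -1),
  count_otv = rowSums(calls == -2)
)
```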
c) svsniper::calculate_maf(allele_count)
This function takes as its argument a table obtained using the svsniper::count_alleles()
function, and returns a table as shown below:
probeset_id | count_aa | count_ab | count_bb | count_na | count_otv | maf |
---|---|---|---|---|---|---|
AX-12345678 | 41 | 2 | 53 | 0 | 0 | 0.436 |
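The example value can be checked by hand. One formula that reproduces 0.436 is the smaller homozygous count divided by the sum of both homozygous counts. This is inferred from the example row only and is not a confirmed description of how calculate_maf() works; the allele-based definition, shown for comparison, gives a slightly different value.

```r
# Counts from the example row above
count_aa <- 41
count_ab <- 2
count_bb <- 53

# Homozygote-only ratio: 41 / (41 + 53), which matches the table value
maf_homozygous <- min(count_aa, count_bb) / (count_aa + count_bb)
round(maf_homozygous, 3)

# Allele-based definition: each heterozygote contributes one A and one B allele
n_a <- 2 * count_aa + count_ab
n_b <- 2 * count_bb + count_ab
maf_allelic <- min(n_a, n_b) / (n_a + n_b)
```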
The count_alleles and calculate_maf functions can be called in a pipe:
svsniper::count_alleles(axiom_calls) %>% svsniper::calculate_maf()
We recommend saving a list of filtered SNPs for use in downstream analysis.
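One possible way to build and save such a list is sketched below on toy data. The cutoff `max_maf`, the toy MAF values, and the object names are hypothetical; choose the filter and its direction according to your analysis.

```r
# Toy MAF table, in the shape returned after the MAF calculation
maf_toy <- data.frame(
  probeset_id = c("AX-12345678", "AX-87654321", "AX-11111111"),
  maf = c(0.436, 0.020, 0.250),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: drop SNPs whose MAF is above 0.30
max_maf <- 0.30
kept_snps <- maf_toy$probeset_id[maf_toy$maf <= max_maf]

# Save one probeset ID per line, then reload to confirm the round trip
filtered_snps_file <- tempfile(fileext = ".txt")
writeLines(kept_snps, filtered_snps_file)
kept_snps_reloaded <- readLines(filtered_snps_file)
```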
2) Extract a and b signal values
In order to calculate the signal intensity, we must first extract a and b signal values for each SNP and each genotyped sample. We can then remove the SNPs we filtered out in the previous step.
signal_a <- svsniper::extract_a(axiom_summary)
signal_b <- svsniper::extract_b(axiom_summary)
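The sketch below illustrates both ideas on toy data: separating a and b signals, then keeping only the retained SNPs. It assumes the summary table holds two rows per SNP, with probeset IDs suffixed `-A` and `-B`, and that the extracted tables carry a `probeset_id` column. How extract_a() and extract_b() work internally is not documented here, so treat the splitting step as illustrative; `kept_snps` is a hypothetical vector of retained probeset IDs.

```r
# Toy summary table: two rows per SNP (A and B signals), one column per CEL file
summary_toy <- data.frame(
  probeset_id = c("AX-12345678-A", "AX-12345678-B", "AX-87654321-A", "AX-87654321-B"),
  sample01.CEL = c(1500.2, 300.1, 250.7, 1800.9),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Split rows by suffix, then strip the suffix so both tables share the same IDs
signal_a_toy <- summary_toy[grepl("-A$", summary_toy$probeset_id), ]
signal_b_toy <- summary_toy[grepl("-B$", summary_toy$probeset_id), ]
signal_a_toy$probeset_id <- sub("-A$", "", signal_a_toy$probeset_id)
signal_b_toy$probeset_id <- sub("-B$", "", signal_b_toy$probeset_id)

# Keep only the SNPs retained in the filtering step
kept_snps <- c("AX-12345678")
signal_a_kept <- signal_a_toy[signal_a_toy$probeset_id %in% kept_snps, ]
signal_b_kept <- signal_b_toy[signal_b_toy$probeset_id %in% kept_snps, ]
```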