The function filterVars
filters VCF files based on user definable
quality parameters. It sequentially imports each VCF file into R, applies the
filtering on an internally generated VRanges
object and then writes
the results to a new subsetted VCF file. The filter parameters are passed on to
the corresponding argument as a character string. The function applies this
filter to the internally generated VRanges
object using the standard
subsetting syntax for two dimensional objects such as: vr[filter, ]
.
The parameter files (filter_gatk.param
, filter_sambcf.param
and
filter_vartools.param
), used in the filtering steps, define the paths to
the input and output VCF files which are stored in new SYSargs
instances.
Filter variants called by GATK
The below example filters for variants that are supported by $\ge$x
reads and $\ge$80% of them support the called variants. In addition, all
variants need to pass $\ge$x
of the soft filters recorded in the VCF
files generated by GATK. Since the toy data used for this workflow is
very small, the chosen settings are unreasonabley relaxed. A more
reasonable filter setting is given in the line below (here commented
out).
library(VariantAnnotation)
library(BBmisc) # Defines suppressAll()
args <- systemArgs(sysma="param/filter_gatk.param", mytargets="targets_gatk.txt")[1:4]
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))>=1"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
suppressAll(filterVars(args, filter, varcaller="gatk", organism="A. thaliana"))
writeTargetsout(x=args, file="targets_gatk_filtered.txt", overwrite=TRUE)
Filter variants called by BCFtools
The following shows how to filter the VCF files generated by BCFtools
using
similar parameter settings as in the previous filtering of the GATK
results.
args <- systemArgs(sysma="param/filter_sambcf.param", mytargets="targets_sambcf.txt")[1:4]
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
suppressAll(filterVars(args, filter, varcaller="bcftools", organism="A. thaliana"))
writeTargetsout(x=args, file="targets_sambcf_filtered.txt", overwrite=TRUE)
Filter variants called by VariantTools
The following shows how to filter the VCF files generated by VariantTools
using
similar parameter settings as in the previous filtering of the GATK
results.
library(VariantAnnotation)
library(BBmisc) # Defines suppressAll()
args <- systemArgs(sysma="param/filter_vartools.param", mytargets="targets_vartools.txt")[1:4]
filter <- "(values(vr)$n.read.pos.ref + values(vr)$n.read.pos) >= 2 & (values(vr)$n.read.pos / (values(vr)$n.read.pos.ref + values(vr)$n.read.pos) >= 0.8)"
# filter <- "(values(vr)$n.read.pos.ref + values(vr)$n.read.pos) >= 20 & (values(vr)$n.read.pos / (values(vr)$n.read.pos.ref + values(vr)$n.read.pos) >= 0.8)"
filterVars(args, filter, varcaller="vartools", organism="A. thaliana")
writeTargetsout(x=args, file="targets_vartools_filtered.txt", overwrite=TRUE)
Check filtering outcome for one sample
length(as(readVcf(infile1(args)[1], genome="Ath"), "VRanges")[,1])
length(as(readVcf(outpaths(args)[1], genome="Ath"), "VRanges")[,1])