SNPnexus for Covid-19

SNPnexus has a wide-ranging user-base, spanning a multitude of institutes worldwide, many of whom are working actively on translational COVID-19 projects. SNPnexus for COVID is a cutting-edge analytical platform dedicated to enhancing the clinical value of COVID-19 sequencing projects by providing researchers with easy access to diverse multifactorial datasets and information resources that typically require substantial time and computational power to mine.

SNPnexus for COVID allows users to analyse the functional implication of individual variants in COVID-19 patient genomes and to prioritise these variants based on sequences that demonstrate clinical utility for the prevention, management and/or treatment of COVID-19. It can also provide unique insights into variations in the pathogenesis of COVID-19 patient cohorts among distinct patient populations.

SNPnexus for COVID offers significant scope for the acceleration of biomarker and drug discovery efforts within a timeframe that is clinically relevant for patients and national healthcare services. Such insights are essential to inform strategies for risk management within vulnerable populations and to shape clinical decision-making on candidate therapeutics.

Currently, SNPnexus for Covid supports data from the GRCh38/hg38 human genome assembly. The underlying database gathers data from several sources, the main ones being UCSC and Ensembl. You can download a table with links to the original sources here.

The table below describes all the data sources for this SNPnexus release:

Category | Annotation | Source (GRCh38/hg38) | Update time
Known SNP information | | Ensembl Variation 95; dbSNP 151 | Jan 2019
Gene Annotation | RefSeq | UCSC hg38 | Mar 2019
 | Ensembl | Ensembl 95 | Jan 2019
 | UCSC | UCSC hg38 | Nov 2018
 | CCDS | UCSC hg38 | Mar 2019
Genotype-Tissue Expression | | GTEx v8 | Aug 2019
Protein Interactions | | IntAct | Sep 2020
 | | DrugBank | Jul 2020
Reactome Pathways | | Reactome | Aug 2019
Protein Effect | SIFT | SIFT (Ensembl Variation 95) | Jan 2019
 | PolyPhen | PolyPhen-2 (Ensembl Variation 95) | Jan 2019
Population Data | HapMap | HapMap (Ensembl Variation 95) | Nov 2018
 | 1000 Genomes | 1000 Genomes (Ensembl Variation 95) | Nov 2018
 | gnomAD Exome Data | Ensembl Variation Genotype (gnomad v2.1) | Mar 2019
 | gnomAD Genome Data | Ensembl Variation Genotype (gnomad v2.1) | Mar 2019
Gene Annotation | miRBASE | v.22.1 | Mar 2018
 | CpG Islands | UCSC hg38 | Mar 2019
 | TarBase miRNA | Ensembl Variation 95 | Dec 2018
 | Other RNAs | UCSC hg38 | Nov 2018
 | ENCODE regions | Ensembl Regulatory Building 95 | Dec 2018
 | RoadMap Epigenomics | Ensembl Regulatory Building 95 | Dec 2018
 | Ensembl Regulatory Build | Ensembl Regulatory Building 95 | Dec 2018
Phenotype/Disease Association | Human Phenotype Ontology | HPO | Aug 2020
 | ClinVar | NCBI hg38 | Mar 2020
 | GWAS | UCSC hg38 | Oct 2020
Conserved Elements | PhastConsElements | UCSC hg38 | Sep 2015
 | GERP++ | Ensembl Compara 95 | Nov 2018
Structural Variations | | UCSC hg38 | Sep 2016
Non-coding Variation Scoring | CADD | v1.5 |
 | FunSeq2 | v2.1.6 |

SNPnexus for Covid currently accepts query input data in three different forms (genomic position, chromosomal region or dbSNP id). Users can annotate a single SNP, insertion/deletion (InDel) or block substitution by selecting one of the input formats and supplying the required data in the graphical interface. It also allows users to run batch queries by uploading an appropriately formatted input file or pasting the queries into the interface. Finally, this version allows the user to upload up to 12 batch queries using the "Multiple Samples" option. Each of these queries must be in a separate file and follow the SNPnexus format. The formats are explained in more detail below.

At this time, SNPnexus for Covid accepts inputs aligned to the GRCh38/hg38 human genome assembly.

Genomic Position

Users can annotate a newly discovered variant by providing the following data in the interface: type (Chromosome/Contig/Clone), name, relative position, reference nucleotide/s (Allele1), observed nucleotide/s (Allele2), positive (1) or negative (-1) strand. A one-based coordinate system is used to describe genomic position. Multi-allelic variations are not supported in this version of SNPnexus for Covid.

Insertions and Deletions (InDels) and Block Substitutions. The tool supports insertions and deletions by using "-" as the placeholder. Users need to set Allele1=- to indicate that Allele2 is inserted at the corresponding genomic position. Similarly, Allele2=- denotes the deletion of Allele1 from the given genomic position. As with single nucleotide substitutions, the tool also supports block substitutions, where the user provides Allele1 and Allele2 data of the same or different lengths.

Here are a few examples on the hg38 assembly:

Type Id Position Allele1 Allele2 Strand
Chromosome 1 942451 T C 1
Chromosome 3 9810376 - GAT 1
Chromosome 7 25226951 TA GTT 1
Contig GL000006.2 21916451 A G 1
Clone AL606500.8 119473 GCT - 1

Note that the tool supports multiple nucleotides in place of Allele1 and Allele2. However, for practical reasons, users are discouraged from providing very large blocks that may span more than one adjacent functional region (e.g., adjacent intronic and exonic regions). In that case, the predicted function of the variant reported by the tool is based on the first functional region.
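The "-" placeholder rules above can be sketched in a few lines of Python. This is only an illustration of how the Allele1/Allele2 fields map to a variant type; the function name `classify_variant` and its return labels are hypothetical and are not part of SNPnexus.

```python
def classify_variant(allele1: str, allele2: str) -> str:
    """Classify a genomic position query from its Allele1/Allele2 fields.

    "-" is the placeholder described in the text: Allele1 == "-" means
    Allele2 is inserted, Allele2 == "-" means Allele1 is deleted.
    The returned labels are illustrative only.
    """
    if allele1 == "-" and allele2 == "-":
        raise ValueError("Allele1 and Allele2 cannot both be '-'")
    if allele1 == "-":
        return "insertion"
    if allele2 == "-":
        return "deletion"
    if len(allele1) == 1 and len(allele2) == 1:
        return "single nucleotide substitution"
    return "block substitution"


# The example queries listed above: (type, name, position, allele1, allele2, strand)
examples = [
    ("Chromosome", "1", 942451, "T", "C", 1),
    ("Chromosome", "3", 9810376, "-", "GAT", 1),
    ("Chromosome", "7", 25226951, "TA", "GTT", 1),
    ("Contig", "GL000006.2", 21916451, "A", "G", 1),
    ("Clone", "AL606500.8", 119473, "GCT", "-", 1),
]

for _type, name, position, allele1, allele2, _strand in examples:
    print(name, position, classify_variant(allele1, allele2))
```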

Chromosomal Region

Users can query for known SNPs in a given chromosomal region (up to 1Mb) by providing the following data: chromosome, start position, end position. The tool will identify and annotate all the known SNPs located in the selected region. Here are a few examples:

Chromosome Start End
3 9798000 9799000
1 100000000 100050000
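Before submitting a region, it can be useful to check its size against the 1Mb limit. The sketch below assumes that the length is counted inclusively (end - start + 1) and that exactly 1,000,000 bases is still accepted; neither detail is stated here, so treat both as assumptions. The names `MAX_REGION_BP`, `region_length` and `is_valid_region` are hypothetical.

```python
MAX_REGION_BP = 1_000_000  # "up to 1Mb"; inclusive boundary is an assumption


def region_length(start: int, end: int) -> int:
    """Length of a one-based, inclusive region."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def is_valid_region(start: int, end: int) -> bool:
    """True if the region is within the assumed 1Mb limit."""
    return region_length(start, end) <= MAX_REGION_BP


# The two example regions listed above
for chromosome, start, end in [("3", 9798000, 9799000), ("1", 100000000, 100050000)]:
    print(chromosome, region_length(start, end), is_valid_region(start, end))
```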

dbSNP rs#

Users can also query for known SNPs by providing the corresponding dbSNP rs identifiers. Here are a few examples of dbSNP rs#:

dbSNP rs#
rs293794
rs1052133
rs3136820
rs2272615
rs2953993
rs1799782
rs25487
rs2248690
rs4918
rs1071592

Batch Query

SNPnexus for Covid allows users to submit batch queries when dealing with large numbers of variations. Users can either paste the variant list directly into the designated text area or upload a file containing the queries. SNPnexus for Covid currently supports uploading one or multiple batch queries, with a limit of 200,000 variants per file. Batch queries should be VCF files or tab-delimited text files in the format specified below. Single or multiple batch queries can be uploaded zipped or gzipped if needed.
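A quick local check of the per-file limit can save a failed upload. The following sketch counts query lines in a plain or gzipped file; it assumes one variant per non-empty line and treats lines starting with "#" as headers. How the server itself counts variants is not described here, and the function names are hypothetical.

```python
import gzip
from pathlib import Path

MAX_VARIANTS_PER_FILE = 200_000  # limit stated in the text


def count_query_lines(path) -> int:
    """Count non-empty, non-header lines in a plain or gzipped query file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


def within_limit(path) -> bool:
    """True if the file holds no more variants than the stated limit."""
    return count_query_lines(path) <= MAX_VARIANTS_PER_FILE
```

For example, `within_limit("my_queries.txt.gz")` (a hypothetical file name) returns False when the file would need pre-filtering or splitting before upload.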

Tab-delimited Text

We only allow batch query using genomic position and/or dbsnp rs# formats. No chromosomal region query data is allowed. Each variant must be on a new line with tab-delimited data in one of the following formats:

< Type    Name    Position    Allele1    Allele2    Strand >         # Genomic position data for novel SNPs
< "dbsnp"    rs# >                                                   # dbSNP rs number for known SNPs
                                    

An example of a batch query is shown below; it can be pasted directly into the text area provided in the interface:

Chromosome  1   100002626   A   T   1
Contig  NT_023736   2025395 A   T   1
Clone   AC105270    154799  A   T   1
dbsnp   rs293794
dbsnp   rs1052133
                                    

Alternatively, users can upload batch query files (.txt) like this example. Note that known SNPs must be preceded by the keyword "dbsnp" to be recognized as dbSNP rs#.
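A batch query in this tab-delimited format can be generated programmatically. The sketch below uses Python's standard csv module to emit one variant per line and to prefix rs identifiers with the "dbsnp" keyword; the helper name `format_batch_query` is hypothetical.

```python
import csv
import io


def format_batch_query(position_queries, rs_ids) -> str:
    """Return tab-delimited batch query text.

    position_queries: iterable of (type, name, position, allele1, allele2, strand)
    rs_ids: iterable of dbSNP rs identifiers, e.g. "rs293794"
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for query in position_queries:
        writer.writerow(query)
    for rs_id in rs_ids:
        # Known SNPs must be preceded by the keyword "dbsnp"
        writer.writerow(["dbsnp", rs_id])
    return buffer.getvalue()


text = format_batch_query(
    [("Chromosome", "1", 100002626, "A", "T", 1)],
    ["rs293794", "rs1052133"],
)
print(text, end="")
```

The returned string can be pasted into the interface or written to a .txt file for upload.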

Variant Call Format (VCF) File

Variant Call Format (VCF) is a flexible and extendable standard format for variation data. SNPnexus allows users to upload VCF files (.vcf), containing SNPs, InDels and block substitutions, directly onto the server. An example input VCF file is shown below:

##fileformat=VCFv4.1
##fileDate=20121001
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr3	9798773	rs1052133	C	G	.	.	.
chr3	9791667	.	AGA	-	.	.	.
chr16	50763779	.	-	C	.	.	.
chr20	1230237	.	T	.	.	.	.
chr20	1234567	.	GTC	G	.	.	.
chr20	1234568	.	T	TA	.	.	.

This example shows, in order: a simple SNP, a deletion of 3 bases (AGA), an insertion of one base (C), a monomorphic reference site with no alternate allele (which the tool ignores), a deletion of 2 bases (TC), and an insertion of one base (A).

A VCF file should contain 8 fixed, mandatory columns, as shown by the third header line in the example. SNPnexus for Covid only uses genomic positions (CHROM, POS fields) and allele information (REF, ALT fields) from the input; any other information contained in the input file is ignored and has no effect on the annotation. Missing values (for insertions or deletions) in the VCF file are represented by '.'. The tool ignores input lines that don't follow the correct format. Please consult here for details about the format.
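To preview which fields of a VCF file are relevant to the annotation, the four columns named above can be extracted with a few lines of Python. This is a rough sketch, not the server's parser: it skips header lines, skips lines with fewer than 8 tab-separated columns or a non-numeric POS, and skips monomorphic records whose ALT is '.'. The function name `extract_variants` is hypothetical.

```python
def extract_variants(lines):
    """Return (chrom, pos, ref, alt) tuples from VCF text lines."""
    variants = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # does not follow the 8-column format
        chrom, pos, _id, ref, alt = fields[:5]
        if alt == ".":
            continue  # monomorphic reference, no alternate allele
        variants.append((chrom, int(pos), ref, alt))
    return variants


vcf_text = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr3\t9798773\trs1052133\tC\tG\t.\t.\t.
chr20\t1230237\t.\tT\t.\t.\t.\t.
chr20\t1234567\t.\tGTC\tG\t.\t.\t.
"""

for variant in extract_variants(vcf_text.splitlines()):
    print(variant)
```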

SNPnexus for Covid allows users to upload batch queries larger than 200,000 variants by using the pre-filters. When a set of genes is selected as a pre-filter, the tool focuses the analysis on these genes and ignores variants that don't overlap them (including ± 2Kb upstream/downstream). The user can specify a set of genes, upload a text file with the Ensembl IDs (ENSG) or use one or both of the preset lists of genes linked to SARS-CoV-2 that are available in the web interface. The queries should contain fewer than 200,000 variants after the pre-filtering stage.
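The effect of a gene pre-filter can be approximated locally to estimate how many variants will remain. The sketch below keeps variants that fall within a gene interval extended by 2 kb on each side. The gene coordinates in the example are made up for illustration, and whether the server treats the interval boundaries inclusively is an assumption.

```python
FLANK_BP = 2_000  # the +/- 2Kb window described in the text


def overlaps_gene(chrom: str, pos: int, gene) -> bool:
    """gene is (chrom, start, end); boundaries are treated as inclusive (assumption)."""
    gene_chrom, gene_start, gene_end = gene
    return chrom == gene_chrom and gene_start - FLANK_BP <= pos <= gene_end + FLANK_BP


def prefilter(variants, genes):
    """Keep (chrom, pos, ...) variants that overlap at least one gene window."""
    return [
        variant for variant in variants
        if any(overlaps_gene(variant[0], variant[1], gene) for gene in genes)
    ]


# Hypothetical gene interval and variants, for illustration only
genes = [("3", 10_000, 20_000)]
variants = [("3", 8_500), ("3", 7_999), ("3", 22_000), ("1", 15_000)]
print(prefilter(variants, genes))
```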

The table containing genomic annotations has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
ID: Genomic Position ID <chromosome/contig/clone id,":",position,":","allele",":",strand>
dbSNP: link to dbSNP, if known
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele (IUPAC): Observed allele
Minor Allele: Minor allele observed in global population, if known
Minor Allele Frequency: Minor allele frequency observed in global population, if known
Contig: Variant mapped contig location
contigPosition: Variant start position on contig
Band: SNP cytogenetic location

The table containing information on overlapped or nearest genes has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
Overlapped Gene: Name of the gene (HGNC system) to which the variant is overlapped
Type: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Annotation: Summary of whether the variant overlapped with the coding, intronic or untranslated regions of the various transcript isoforms of the gene, as annotated from Ensembl gene system.
Nearest Upstream Gene: If variant is not overlapped with any gene, then the gene whose end position is nearest to the variant on the left (considering the alignment of genes on the positive strand as left-to-right)
Type of Nearest Upstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Distance to Nearest Upstream Gene: distance from the end position of the nearest upstream gene.
Nearest Downstream Gene: If variant is not overlapped with any gene, then the gene whose start position is nearest to the variant on the right (considering the alignment of genes on the positive strand as left-to-right)
Type of Nearest Downstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Distance to Nearest Downstream Gene: distance from the start position of the nearest downstream gene.
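The upstream/downstream definitions above can be illustrated with a small function. It works on genes of a single chromosome given as (name, start, end) on the positive-strand coordinate axis, and returns the nearest gene ending to the left of the variant and the nearest gene starting to the right. The gene names and coordinates in the example are hypothetical, and this sketch does not claim to reproduce how SNPnexus computes these columns.

```python
def nearest_genes(pos: int, genes):
    """Return ((distance, name) upstream, (distance, name) downstream).

    Either element is None when no gene lies on that side.
    """
    # Genes that end before the variant, measured from their end position
    upstream = [(pos - end, name) for name, start, end in genes if end < pos]
    # Genes that start after the variant, measured from their start position
    downstream = [(start - pos, name) for name, start, end in genes if start > pos]
    return (
        min(upstream) if upstream else None,
        min(downstream) if downstream else None,
    )


# Hypothetical genes on one chromosome
genes = [("GENE_A", 100, 200), ("GENE_B", 500, 600), ("GENE_C", 900, 950)]
print(nearest_genes(300, genes))
```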

The result table containing gene/protein consequences on a particular gene annotation system may have the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Variant: Examined alleles <reference allele,"|", observed allele(s) >. For Insertion, reference allele is "-". For other cases, reference allele is the allele found in reference genome sequence. Observed allele(s) can be multi-allelic separated by "|" depending on the input Allele2. If input Allele1 does not match with reference allele, then Allele1 becomes the first observed allele.
Strand: On which strand the variant is observed (1 or -1)
Symbol: Gene symbol
Gene: Gene name in the corresponding annotation system
Transcript: Transcript name in the corresponding annotation system
Entrez Gene: Entrez gene id
Predicted Function: Predicted function of the SNP/InDel/block substitution based on its location on the transcript. The result is based on the first nucleotide position of the variation. Possible categories: coding, intronic, intronic (splice_site), 5utr, 3utr, 5upstream, 3downstream, non-coding, non-coding intronic, non-coding intronic (splice_site). More detailed information on the predicted function is available in the "Detail" column (previously "Note").
CDNA Position: SNP position on the cDNA, if the predicted function is coding, 3'UTR or 5'UTR
CDS Position: SNP position on the CDS, if the predicted function is coding
AA Position: Position of the first amino acid (possibly) affected in the resultant peptide chain, if the predicted function is coding
AA Change: Peptide <reference amino acid(s),">", observed amino acid(s)_1 [,"|", observed amino acid(s)_2, ... ] >
Detail (previously Note column): Detailed functional type for the variation. If the variation occurs over a single coding exon of a transcript, the type of the consequences on the corresponding protein is given. Possible values: syn (synonymous), nonsyn (non-synonymous) [stop-gain or stop-loss], frameshift [stop-gain or stop-loss], pepshift (peptide shift, block substitution). Preceded by "*", if the reference protein is found incomplete (missing stop-codon).
However, if the variation occurs over more than one functional region on the transcript, the corresponding regions are given separated by "-".
Splice Distance: Distance to splice junction, if the predicted function is intronic
Proteins: Reference and observed peptide sequences separated by "|", if the predicted function is coding. Available only in the downloadable text files.

The Genotype/Tissue Expression result table, which contains extracts from the Genotype-Tissue Expression (GTEx) project, comprises the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
GTEx ID: Variant Identifier in GTEx format
REF Allele: Reference allele
ALT Allele: Observed allele
Gene: Name of gene (HGNC system) to which the variant is overlapped
Tissue: Tissue site detail
NES: Normalized effect size (effect of the ALT allele relative to the REF allele in the human genome reference).
eQTL Plot: Links to eQTL multi-tissue comparison plots

The Human Phenotype Ontology result table contains the following columns:

Gene: Name of gene (HGNC system) to which the variant is overlapped
UniProt ID: UniProt identifier associated with the gene
Protein: Name of the protein (UniProt)
HPO ID: ID from Human Phenotype Ontology
Phenotype: Phenotype associated with the UniProt ID

The GWAS catalogue result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Catalogue ID: ID of SNP associated with trait
Region: Chromosome band/region of SNP
Genes: Reported Gene(s)
Allele_frequency: Risk Allele Frequency
Trait: Disease or trait assessed in study
Population: Initial sample population for the study
Platform: Platform and [SNPs passing Quality Control]
Pubmed: Pubmed id of publication of the study

The ClinVar result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
Variation: Reference to Observed Allele
Type: Type of Variant
Clinical Significance: Whether identified as Pathogenic or Benign or uncertain
Phenotypes: List of phenotypes associated with the variant

The Protein-Drug-Target result table contains the following columns:

Gene: Name of gene (HGNC system) to which the variant is overlapped
Protein: UniProt identifier associated with the gene
Name: Name of the protein (UniProt) associated with the gene
Target Protein: UniProt identifier of the protein interacting with the source gene
Target Name: Name of the target protein (UniProt)
Target Type: Type of the target protein (Human, viral or homologue)
Drug ID: DrugBank identifier of the drug associated with the Target Protein
Drug Name: Name of drug associated with the Target Protein
Type: Type of drug (Small molecule or biotech)
Groups: Group of drug (approved, experimental, investigational, etc)
Action: Action of drug over target protein (potentiator, inhibitor, antagonist, binder, etc)

The Reactome Pathways result table contains the following columns:

Pathway ID: Link to Reactome Pathway
Description: Pathway description
Parent(s): Immediate parents of the Pathway
p-Value: Statistical significance of the Pathway calculated using the Fisher's Exact Test for all the genes involved in the original queryset
Genes Involved: Genes from the original queryset involved in the Pathway
Variation IDs: Variation in the original query affecting the genes involved in the Pathway. Available only in the downloadable text file.
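For readers who want to see how a pathway p-value of this kind is obtained, a one-sided Fisher's exact test for over-representation is equivalent to a hypergeometric tail probability. The sketch below computes it with the standard library only. The choice of background gene set and whether SNPnexus uses a one-sided or two-sided test are not stated here, so this is an illustration of the general technique rather than the tool's exact calculation; the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(overlap: int, query_size: int,
                       pathway_size: int, background_size: int) -> float:
    """P(X >= overlap) when drawing query_size genes from background_size genes,
    of which pathway_size belong to the pathway (hypergeometric upper tail)."""
    upper = min(query_size, pathway_size)
    total = comb(background_size, query_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Toy numbers: 2 of 2 query genes fall in a 2-gene pathway, background of 4 genes
print(enrichment_p_value(2, 2, 2, 4))
```

Smaller values indicate that the overlap between the query genes and the pathway is less likely to have arisen by chance.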

The SIFT result table containing the predicted effect on protein has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
SNP: SNP name
Variant: <reference allele,"/",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Gene: Gene name
AA Position: Position of the amino acid affected in the resultant peptide chain
Wild AA: Reference amino acid
Mutant AA: Observed amino acid
Score: SIFT prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: SIFT predicted effect on protein based on the score and SIFT median. Possible values: Deleterious (score <= 0.05 and median <= 3.25); Deleterious - Low Confidence (score <= 0.05 and median > 3.25); Tolerated (score > 0.05 and median <= 3.25); Tolerated - Low Confidence (score > 0.05 and median > 3.25).

The PolyPhen result table containing the predicted effect on protein has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
SNP: SNP name
Variant: <reference allele,"/",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Gene: Protein name in the Ensembl gene annotation system
AA Position: Position of the amino acid affected in the resultant peptide chain
Wild AA: Reference amino acid
Mutant AA: Observed amino acid
Score: PolyPhen prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: PolyPhen predicted effect on protein based on the score. Possible values: Probably Damaging (score > 0.908), Possibly Damaging (0.446 < score <= 0.908), Benign (score <= 0.446).
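The PolyPhen score thresholds listed above translate directly into a small classification function. This sketch only restates the cut-offs given in the Prediction column; the function name `polyphen_prediction` is hypothetical.

```python
def polyphen_prediction(score: float) -> str:
    """Map a PolyPhen score in [0, 1] to the prediction label described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen score must be between 0 and 1")
    if score > 0.908:
        return "Probably Damaging"
    if score > 0.446:
        return "Possibly Damaging"
    return "Benign"


for score in (0.95, 0.908, 0.5, 0.446, 0.1):
    print(score, polyphen_prediction(score))
```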

The result table containing the specific HapMap population data has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Observed allele
ASW Frequency: Percentage of observed samples with the allele in population with African Ancestry in SouthWestern US
CEU Frequency: Percentage of observed samples with the allele in population with Northern and Western European Ancestry from Utah residents
CHB Frequency: Percentage of observed samples with the allele in the Han Chinese in Beijing, China from HapMap phase 3 population
CHD Frequency: Percentage of observed samples with the allele in population with Chinese Ancestry in Metropolitan Denver, US
GIH Frequency: Percentage of observed samples with the allele in the Gujarati Indians in Houston, Texas population
HCB Frequency: Percentage of observed samples with the allele in population from Unrelated Han Chinese in Beijing, China from the International HapMap project
JPT Frequency: Percentage of observed samples with the allele in the Japanese in Tokyo, Japan population
LWK Frequency: Percentage of observed samples with the allele in the Luhya in Webuye, Kenya population
MEX Frequency: Percentage of observed samples with the allele in population with Mexican Ancestry in Los Angeles, US
MKK Frequency: Percentage of observed samples with the allele in the Masai in Kinyawa, Kenya (MKK) population
TSI Frequency: Percentage of observed samples with the allele in the Toscani in Italia population
YRI Frequency: Percentage of observed samples with the allele in the Yoruba in Ibadan, Nigeria population

The result table containing the specific 1000 Genomes Super Population data has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Observed allele
Minor Allele: Allele with the Minor Frequency observed
AFR Frequency: Percentage of observed samples with the allele in the African super population
AMR Frequency: Percentage of observed samples with the allele in the Admixed American super population
EAS Frequency: Percentage of observed samples with the allele in the East Asian super population
EUR Frequency: Percentage of observed samples with the allele in the European super population
SAS Frequency: Percentage of observed samples with the allele in the South Asian super population

The result table containing the specific exome gnomAD Population data has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Observed allele
Minor Allele: Allele with the Minor Frequency observed
AFR Frequency: Percentage of observed samples with the allele in the African/African American population
AMR Frequency: Percentage of observed samples with the allele in the Latino population
EAS Frequency: Percentage of observed samples with the allele in the East Asian population
FIN Frequency: Percentage of observed samples with the allele in the Finnish population
NFE Frequency: Percentage of observed samples with the allele in the Non-Finnish European population
SAS Frequency: Percentage of observed samples with the allele in the South Asian population
OTH Frequency: Percentage of observed samples with the allele in the Other populations

The result table containing the specific genome gnomAD Population data has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Observed allele
Minor Allele: Allele with the Minor Frequency observed
AFR Frequency: Percentage of observed samples with the allele in the African/African American population
AMR Frequency: Percentage of observed samples with the allele in the Latino population
EAS Frequency: Percentage of observed samples with the allele in the East Asian population
FIN Frequency: Percentage of observed samples with the allele in the Finnish population
NFE Frequency: Percentage of observed samples with the allele in the Non-Finnish European population
OTH Frequency: Percentage of observed samples with the allele in the Other populations

The miRBASE result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
TFBS ID: TFBS id
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Name: microRNA name
Accession: miRBASE accession number
Strand: + or -
Type / Description: miRNA type. Possible values: mature miRNA, miRNA_primary_transcript

The CpG Island prediction result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the CpG island in the chromosome
Region End: End position of the CpG island in the chromosome
CpG Island: Name of the CpG Island
Length: Island Length
CpG%: Percentage of island that is CpG
C/G%: Percentage of island that is C or G
Ratio: Ratio of observed to expected CpG in island

The TargetBase (TarBase) miRNA target sites result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the target site in the chromosome
Region End: End position of the target site in the chromosome
Strand: + or -
miRNA: miRNA targeting the site
Accession: miRBASE accession number
Gene: Gene name

The miRNAs/snoRNAs/scaRNAs result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Name: Name of the miRNA/snoRNA/scaRNA
Score: Prediction scores. Possible values: 0 to 1000
Strand: + or -
Type: Type of RNA

The ENCODE and Roadmap Epigenomics result tables have the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the regulatory feature in the chromosome
Region End: End position of the regulatory feature in the chromosome
Feature Type Class: Regulatory feature class
Feature Type: Regulatory feature name
Epigenome: Epigenome or cell name

The Ensembl Regulatory Build result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the regulatory feature in the chromosome
Region End: End position of the regulatory feature in the chromosome
Feature Type Class: Regulatory feature class
Epigenome: Epigenome or cell name
Activity: State of activity (hg38)

The Vertebrate Alignment and Conservation (Phast) result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the conserved element in the chromosome
Region End: End position of the conserved element in the chromosome
Id: Name of the aligned element
Score: Estimated probability score for conservation as determined from PHAST package. Possible values: 0 to 1000

The Genomic Evolutionary Rate Profiling (GERP++) result table contains the following information:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the conserved element in the chromosome
Region End: End position of the conserved element in the chromosome
Element RS Score: Rejected Substitutions score for the conserved element as determined from GERP++ package.
Base RS Score: Rejected Substitutions score calculated per base as determined from GERP++ package.

The CADD result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position on chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
Raw Score: "Raw" unaltered CADD-score for the variation. It has relative meaning, with higher values indicating that a variant is more likely to be simulated (or "not observed") and therefore more likely to have deleterious effects.
PHRED: PHRED-like scaled CADD-score, -10*log10(rank/total), ranking a variant relative to all possible substitutions of the human genome. A score ≥ 10 indicates that the variant is predicted to be among the 10% most deleterious possible substitutions in the human genome, a score ≥ 20 indicates the 1% most deleterious, and so on.
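The PHRED-like scaling can be checked numerically. The sketch below applies the formula -10*log10(rank/total) and its inverse; the function names are hypothetical, and `rank` is assumed to be 1 for the most deleterious substitution.

```python
import math


def cadd_phred(rank: int, total: int) -> float:
    """PHRED-like scaled score for a variant ranked `rank` out of `total`."""
    if not 1 <= rank <= total:
        raise ValueError("expected 1 <= rank <= total")
    return -10 * math.log10(rank / total)


def top_fraction(phred: float) -> float:
    """Fraction of all substitutions ranked at or above a given PHRED score."""
    return 10 ** (-phred / 10)


print(cadd_phred(1, 10))    # a variant at the top 10% boundary
print(cadd_phred(1, 100))   # a variant at the top 1% boundary
print(top_fraction(30))     # fraction corresponding to PHRED 30
```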

The FunSeq2 result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score. "." as observed allele indicates any other nucleotide other than reference allele.
Non-coding Score: Given as p-values in the range [0, 1], with higher scores indicating variants predicted as more likely to be functional.

The structural variations result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Chrom Start: Start position of the structural variation in the chromosome
Chrom End: End position of the structural variation in the chromosome
Type: Type of structural variation
Reference: Literature reference for the study that included this variant
PubMed: Pubmed id of publication of the study
Method: Brief description of method/platform
Sample: Description of sample population for the study
Gain: Copy number gains
Loss: Copy number losses

Once the results are complete, and depending on the set of annotations originally selected by the user, SNPnexus supports a set of filters on the results:

Common Variants (available for multi-sample queries): Restrict to variants common across all variant files uploaded
Filtering by Type of Variant: The user can select to show only variants that map to a known dbSNP, show only novel variants, or both.
Filtering by MAF Global Threshold: Only show variants with a Global Allele Frequency lower than the threshold set by the user.
Filtering by Gene(s): Only show variants that overlap the specified genes.
Filtering by Genomic Consequence: Only show variants with specific genomic consequence. Options available are Coding Non-Synonymous, Coding Synonymous, UTR and Intronic.
Filtering by Predicted Effect: Filter variants based on the predicted protein consequence based on the SIFT and PolyPhen predictions. Options available are Benign and Damaging. This option is only available if SIFT or PolyPhen was an annotation selected by the user for the input query.
Filtering by Conserved Region: Filter variants that lie within or outside a conserved region. This option is only available if Phast was an annotation selected by the user for the input query.
Filtering by Phenotype Association: Filter variants that have a known or unknown phenotypic association based on COSMIC and ClinVar data. This option is only available if COSMIC or ClinVar was selected by the user for the input query.
Filtering by Pathway: Only show variants related to specific pathways. This option is only available if Reactome Pathway was selected by the user for the input query.
Filtering by Genotype/Tissue Expression: Present variants associated with a selected tissue type in the Genotype-Tissue Expression (GTEx) project
Filtering by Phenotype Association (HPO): Present variants related to a specific phenotypic abnormality as defined by the Human Phenotype Ontology (HPO)
Filtering by Drug-Protein Target (DrugBank): Show targets for a defined drug.
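Similar filters can be reproduced offline on a downloaded result file. The sketch below applies a MAF threshold to tab-delimited rows using the column names "Variation ID" and "Minor Allele Frequency" from the genomic annotation table; the exact header names in the downloadable files are an assumption, the sample values are made up, and rows without a known frequency are simply skipped here (how the web interface handles them is not stated).

```python
import csv
import io


def filter_by_maf(rows, threshold: float):
    """Keep rows whose Minor Allele Frequency is below the threshold."""
    kept = []
    for row in rows:
        try:
            maf = float(row["Minor Allele Frequency"])
        except (KeyError, TypeError, ValueError):
            continue  # frequency missing or not numeric
        if maf < threshold:
            kept.append(row)
    return kept


# Made-up sample rows, for illustration only
sample = (
    "Variation ID\tMinor Allele Frequency\n"
    "rs1\t0.01\n"
    "rs2\t0.30\n"
    "rs3\t\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
for row in filter_by_maf(rows, 0.05):
    print(row["Variation ID"], row["Minor Allele Frequency"])
```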