Query the human genome
Functional Impact Assessment
The search bar accepts either a variant position in the format "chr:pos" (for example, chr12:128797635) or a gene name. The application queries functionally annotated data from six patients, as outlined in the referenced paper. Gene names in the database follow the RefGene naming format, which can be looked up in the UCSC Genome Browser.
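As an illustration of the two accepted input forms, the following minimal Python sketch separates a "chr:pos" query from a gene name. The function name `parse_query`, the `Query` type, and the exact pattern are assumptions made for this example; the application's own parsing may differ.

```python
import re
from typing import NamedTuple, Optional


class Query(NamedTuple):
    kind: str              # "variant" or "gene"
    chrom: Optional[str]   # e.g. "chr12" for a variant query
    pos: Optional[int]     # genomic position for a variant query
    gene: Optional[str]    # gene name for a gene query


# Assumed pattern: "chr" + chromosome name, a colon, then a numeric position.
_VARIANT_RE = re.compile(r"^(chr(?:[0-9]{1,2}|X|Y|M)):([0-9]+)$")


def parse_query(text: str) -> Query:
    """Classify a search-bar input as a variant position or a gene name."""
    text = text.strip()
    if not text:
        raise ValueError("empty query")
    match = _VARIANT_RE.match(text)
    if match:
        return Query("variant", match.group(1), int(match.group(2)), None)
    # Anything that is not "chr:pos" is treated as a gene name (assumption).
    return Query("gene", None, None, text)
```

For example, `parse_query("chr12:128797635")` yields a variant query on chr12, while `parse_query("GATA4")` is treated as a gene name.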
Rendered Plot Description
- The rendered plot on this tab comprises two subplots. The bottom plot displays genes within the specified region, as defined by the search box and flanking parameters. The top plot visualizes variants within the defined region. Each rectangle in the plot represents a variant, with the x-axis indicating the position in the genome and the y-axis representing the normalized expression in SuRE.
- The bottom of the rectangle starts from the minimum expressing allele of the variant. For instance, at position chr12:128797635 where the reference allele is T and C is the alternate allele, if T has a mean SuRE expression of 75.32 and C has a mean expression of 2.47, then the rectangle will be positioned at chr12:128797635 on the x-axis. The bottom of the rectangle starts at 2.47 and extends up to 75.32, visually representing the impact of the variant. The opacity and thickness (along the x-axis) of the rectangle indicate the significance of the variant.
- The color of each rectangle indicates whether the maximum expressing allele for a variant is the reference allele or alternate allele. This interactive visualization allows users to click on variants of interest to center the plot around the selected variant.
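The rectangle geometry described above can be sketched in a few lines of Python. This is only an illustration of the stated rules: the names `VariantRect` and `variant_rectangle` are hypothetical, ties are arbitrarily assigned to REF, and the mapping from significance to opacity and thickness is not modelled because it is not specified here.

```python
from typing import NamedTuple


class VariantRect(NamedTuple):
    x: int               # genomic position on the x-axis
    bottom: float        # mean SuRE expression of the lower-expressing allele
    top: float           # mean SuRE expression of the higher-expressing allele
    higher_allele: str   # "REF" or "ALT"; determines the rectangle colour


def variant_rectangle(pos: int, ref_mean: float, alt_mean: float) -> VariantRect:
    """Compute the vertical extent and colour category of one variant."""
    bottom = min(ref_mean, alt_mean)
    top = max(ref_mean, alt_mean)
    # Assumption: if both alleles express equally, label the variant as REF.
    higher = "REF" if ref_mean >= alt_mean else "ALT"
    return VariantRect(pos, bottom, top, higher)


# The example from the text: T (REF) = 75.32, C (ALT) = 2.47.
rect = variant_rectangle(128797635, ref_mean=75.32, alt_mean=2.47)
```

Here `rect` spans 2.47 to 75.32 at position 128797635 and is coloured as a variant whose reference allele expresses higher.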
Variant Data Overview
In this tab, you'll find comprehensive details of all variants shown in the Functional Impact Assessment tab. The table includes the following columns:
- Chromosome: Indicates the chromosome where the variant is located.
- Position: Specifies the genomic position of the variant according to the human hg38 reference genome.
- REF: Indicates the reference allele at the variant's position.
- ALT: Indicates the alternate allele observed at the variant's position.
- rsID: The rsID assigned to the variant, if available in dbSNP150.
- Population allele frequency gnomAD 3.1.2: Provides the allele frequency of the variant in the gnomAD 3.1.2 database across various populations.
- Genotype in SuREXX: Indicates the genotype of the variant in each SuREXX sample (e.g., SuRE38, SuRE57, etc.).
- ALT allele coverage: Specifies the coverage (number of fragments) for the alternate allele.
- REF allele coverage: Specifies the coverage (number of fragments) for the reference allele.
- REF allele mean expression: Indicates the mean expression level of the reference allele across samples.
- ALT allele mean expression: Indicates the mean expression level of the alternate allele across samples.
- p-value: Indicates the p-value calculated based on the Wilcoxon rank sum test between the reference and alternate alleles for each variant.
- Description: Indicates if the variant is an raQTL or not.
You can download this table using the provided download button. When a variant is selected, it will be highlighted with a yellow background.
For detailed information on the calculation of these values, please refer to the referenced paper.
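To give a feel for the p-value column, the sketch below implements a two-sided Wilcoxon rank sum test in plain Python using the normal approximation. It is an illustrative approximation only: it applies no tie or continuity correction, the function names are hypothetical, and the exact procedure used for the table is the one described in the referenced paper.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(ref: Sequence[float], alt: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for ref, two-sided p-value) via normal approximation."""
    n1, n2 = len(ref), len(alt)
    if n1 == 0 or n2 == 0:
        raise ValueError("both alleles need at least one observation")
    ranks = average_ranks(list(ref) + list(alt))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p_value


# Hypothetical per-fragment expression values for the two alleles.
u_stat, p_value = rank_sum_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
```

With small fragment counts an exact test is usually preferable to the normal approximation, so treat the values from this sketch as indicative only.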
Gene Expression Overview
In this tab, we visualise gene expression of genes within the window shown in the Functional Impact Assessment tab. The data plotted below is a collection of transcriptomic data from multiple sources, including ENCODE, GEO, and ArrayExpress. The data encompasses gene expression in:
- Fetal human heart at developmental stages.
- Fetal tissues during development.
- Healthy (non-failing) adult human heart.
- In-vitro differentiated cardiomyocytes and undifferentiated human embryonic stem cell (H1).
- AC16 cell line where SuRE constructs are transfected.
- Human neural crest cells, an approximation of human neural crest cells that migrate from the neural tube and enter the heart from the pharyngeal arches.
We highlight the expression of TBX18 and GATA4, known cardiac stem cell marker genes that are highly expressed during development, and of HNF4A, a liver-specific gene, for contrast.
Data Sources and Descriptions
- PGA5W - PGA25W: scRNAseq data averaged over all cells from GEO accession GSE106118. PGA stands for post gestation age and can be compared to Carnegie stages by the table mentioned here. This range covers stages from 5 weeks to 25 weeks post-gestation.
- Adrenal, Brain, Kidney, Liver, Lower limb, Lung, Palate, RPE (Eye), Stomach, Testes, Thyroid, Tongue, Upper limb, Ventricle (Heart left ventricle): Data from ArrayExpress - Functional Genomics Data accession number E-MTAB-3928, from fetal human embryo CS 19-20.
- Adult.Left.Ventricle: Average of gene expression across 14 healthy non-failing left ventricle hearts of individuals in the study GEO accession GSE116250.
- hESC, hVCM: Data from GEO accession GSE186958. hESC refers to human embryonic stem cells, and hVCM refers to in vitro differentiated ventricular cardiomyocytes at day 80 from hESC.
- AC16: Data from GEO accession GSE109716.
- HumanNCC: Human neural crest cells from ENCODE, GEO accession GSM5330586. These are in vitro differentiated neural crest cells derived from H9 hESCs (female embryo, 5 days).
SuRE Profiles
In this tab, we explore how different datasets complement our understanding of the functional SuRE data showcased in the first tab.
- SuRE Profiles of Patients: We start by examining SuRE profiles from congenital heart disease patients, as discussed in the referenced paper. These profiles offer insights into how the genomic region behaves functionally within individuals.
- AC16 ATACseq: Since SuRE libraries were introduced into the AC16 Human Cardiomyocyte Cell Line, we also examine AC16 ATACseq data that highlights regions of open chromatin within the endogenous AC16 genome.
- PhastCons Score: BigWig track representing conservation scores across 30 mammalian species for the region of interest, highlighting evolutionary conservation patterns within the locus.
Transcription Factor Binding Site Impact (TFBSi)
When a variant is selected in the Functional Impact Assessment tab, this section assesses potential disruptions to transcription factor (TF) binding caused by the selected variant.
Analysis Details
- Predicted Alignment: This feature showcases alignments of TFs to both reference (REF) and alternate (ALT) sequences for which the TFBS is affected by the selected variant.
- Predicted Affinity Change: Directly below the predicted alignment, a table lists information about each affected TF, such as motif ID, motif name, TF class, and TF family. It also gives the TFBSi score for the change in TF binding affinity between REF and ALT.
For a deeper understanding of the methodology and calculations used to predict affinity changes, please refer to the referenced paper.
Genome Aggregation Database (gnomAD) Viewer
When a variant is selected in the Functional Impact Assessment tab, this tab serves as a gateway to the Genome Aggregation Database (gnomAD), providing population-level information about the selected variant.
Key Features
- Allele Distribution: Visualizes the distribution of the reference (REF) and alternate (ALT) alleles across diverse populations, offering valuable insights into allele frequencies.
- Age Distribution: Presents the age distribution of individuals carrying the variant, aiding in understanding its prevalence across different age groups.
- Additional Information: Provides supplementary details such as Ensembl Variant Effect Predictor (VEP) analysis, Combined Annotation Dependent Depletion (CADD), phyloP scores, etc., enhancing understanding of the variant's potential functional impact.
Explore gnomAD: gnomAD
ClinVar Viewer
When a variant is selected in the Functional Impact Assessment tab, this tab enables exploration of the ClinVar database, a publicly accessible archive containing reports of human variations classified for diseases and drug responses, along with supporting evidence. ClinVar processes submissions reporting variants found in patient samples, classifications for diseases and drug responses, information about the submitter, and other pertinent data. Variants are classified into categories such as Likely Pathogenic, Pathogenic, and Benign, among others. For further information on the classification criteria, please refer to ClinVar's classification documentation.
Explore ClinVar: ClinVar
Uploaded Tracks Viewer
This tab provides an overview of the files uploaded by the user. The tool supports three input formats: TSV, BED, and BigWig. Each file type is used for specific purposes and must adhere to specific format requirements, as outlined below:
- TSV Files (MPRA Data):
These files must contain exactly 7 columns:
- Chromosome: The chromosome where the variant is located. Must start with "chr_".
- Position: The exact position of the variant (numeric).
- Reference Allele: The original allele at the position.
- Alternate Allele: The mutated allele at the position.
- Reference Signal: The MPRA signal for the reference allele.
- Alternate Signal: The MPRA signal for the alternate allele.
- P-value: The statistical significance of the variant in the MPRA data.
When viewing a variant in the SuRE data, if an MPRA TSV file has been uploaded, the application cross-checks whether the variant is also present in the uploaded MPRA data. If the variant is found, the system highlights it with a yellow line segment, making it easy to spot its position in the uploaded data. This highlighting behavior is consistent with the approach used in the Functional Impact Assessment tab.
- BigWig Files: These binary files store genomic data and are processed using the rtracklayer package in R. A small portion of the file is read to ensure it is valid and accessible. If any issues are encountered, an error message will be displayed.
- BED Files:
These files must contain at least 3 columns with no header. The first three columns should include:
- Chromosome
- Start Position
- End Position
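The format requirements listed above can be checked before uploading with a short script. The following Python sketch is illustrative only and does not reproduce the application's own checks (which, for BigWig, use rtracklayer in R). It assumes the TSV has no header row and that the required chromosome prefix is `chr` (as in chr12); adjust the prefix if your files must literally begin with "chr_". The function names are hypothetical. The BigWig check only inspects the 4-byte signature at the start of the file.

```python
import struct

BIGWIG_MAGIC = 0x888FFC26  # signature stored in the first 4 bytes of a BigWig file


def validate_mpra_row(line: str) -> tuple:
    """Parse one MPRA TSV row; raise ValueError if it breaks the 7-column format."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    chrom, pos, ref, alt, ref_signal, alt_signal, p_value = fields
    if not chrom.startswith("chr"):  # assumed prefix rule
        raise ValueError(f"chromosome must start with 'chr': {chrom!r}")
    return (chrom, int(pos), ref, alt,
            float(ref_signal), float(alt_signal), float(p_value))


def validate_bed_row(line: str) -> tuple:
    """Parse the first three BED columns; a header line fails the int() conversion."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return chrom, start, end


def looks_like_bigwig(header: bytes) -> bool:
    """Check the leading 4 bytes against the BigWig signature in either byte order."""
    if len(header) < 4:
        return False
    little = struct.unpack("<I", header[:4])[0]
    big = struct.unpack(">I", header[:4])[0]
    return BIGWIG_MAGIC in (little, big)
```

To check a BigWig file on disk, read its first four bytes in binary mode and pass them to `looks_like_bigwig`; for TSV and BED files, call the row validators on each line.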
- To extract data for a specific chromosome, use a tool like bcftools or bedtools.
- For BigWig files, use the bigWigToBedGraph utility to convert them to BedGraph, filter for your region of interest, and then convert back to BigWig.
- For BED files, extract rows containing the chromosome or region of interest using tools like bedtools, awk or a text editor.
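If none of these tools is at hand, BED rows for a chromosome or region can also be extracted with a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes tab-separated input and the usual BED convention of 0-based, half-open intervals.

```python
from typing import Iterable, Iterator, Optional


def filter_bed_lines(lines: Iterable[str], chrom: str,
                     start: Optional[int] = None,
                     end: Optional[int] = None) -> Iterator[str]:
    """Yield BED lines on `chrom` that overlap [start, end), if bounds are given."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != chrom:
            continue
        feat_start, feat_end = int(fields[1]), int(fields[2])
        if start is not None and feat_end <= start:
            continue  # feature ends before the region begins
        if end is not None and feat_start >= end:
            continue  # feature begins after the region ends
        yield line


# Small in-memory example; with real files, pass an open file handle instead.
example = ["chr1\t100\t200\n", "chr1\t500\t600\n", "chr2\t100\t200\n"]
subset = list(filter_bed_lines(example, "chr1", start=150, end=300))
```

Writing the yielded lines to a new file gives a smaller BED file restricted to the region of interest.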
It is not advisable to upload more than 10 files at once. Viewing too many files can result in a cluttered and difficult-to-read visualization.