Integration of dilated cardiomyopathy genomics with transcriptomics from the human heart implicates regulatory molecular mechanisms
Description
TensorQTL Results for TOPCHeF
This repository contains the complete set of quantitative trait locus (QTL) summary statistics generated using TensorQTL. The dataset includes cis- and trans- eQTL and sQTL analyses, permutation-based significance results, and fine-mapping results using SuSiE.
All large result directories are provided as compressed tar.gz archives.
Directory and File Overview
cis-eQTL Results
cis_eQTL_nominal.tar.gz
Nominal cis-eQTL association results testing genetic variants within a predefined cis-window around each gene.
Typical contents (per-chromosome files):
- Variant–gene pairs
- Effect size (beta)
- Standard error
- Nominal p-value
These results are intended for downstream filtering, visualization, and coloc analyses.
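As a rough illustration of downstream filtering, a Wald z-score can be derived from the effect size and standard error of a nominal association. The sketch below uses a made-up record and hypothetical field names (`beta`, `se`); the actual column names in the parquet files should be checked before use. The normal approximation is an assumption made for simplicity and is not necessarily how the reported nominal p-values were computed.

```python
import math


def z_score(beta: float, se: float) -> float:
    """Wald z-score from an effect size and its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


def normal_two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical record for illustration only; not taken from the dataset.
record = {"variant": "variant_1", "gene": "gene_1", "beta": 0.30, "se": 0.05}

z = z_score(record["beta"], record["se"])
p_approx = normal_two_sided_p(z)
```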
cis_eQTL_permutation.tar.gz
Permutation-based cis-eQTL results used to estimate gene-level empirical p-values and false discovery rates (FDR).
Typical contents:
- Gene-level permutation p-values
- Empirical significance estimates
These files are typically used to identify significantly regulated genes (eGenes).
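For readers who want to re-derive an eGene list from the gene-level p-values, one common option is the Benjamini-Hochberg procedure. The sketch below is a stand-in under that assumption; the FDR method actually applied to these files is not stated here and may differ (for example, a q-value approach).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(ids, pvalues, fdr=0.05):
    """Keep the identifiers whose adjusted p-value is at or below the FDR level."""
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, q in zip(ids, adjusted) if q <= fdr]
```

Calling `select_significant(gene_ids, gene_pvalues)` with lists read from the permutation files would return the genes passing a 5% FDR under this stand-in procedure.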
cis_eQTL_SuSiE.tar.gz
Fine-mapping results for significant cis-eQTLs using SuSiE (Sum of Single Effects).
Typical contents:
- Credible sets
- Posterior inclusion probabilities (PIPs)
- Lead variants per credible set
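To make the relationship between posterior probabilities, credible sets, and lead variants concrete, the sketch below builds a credible set for a single causal signal by accumulating variants in decreasing order of posterior probability until the requested coverage is reached. It is a simplified illustration that assumes the probabilities for one signal sum to 1; it is not the SuSiE implementation, and the variant names and values are made up.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose summed posterior reaches `coverage`.

    `posteriors` maps variant ID -> posterior probability for ONE signal.
    Returns the variants ordered from highest to lowest probability.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, prob in ranked:
        selected.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for illustration only.
example = {"v1": 0.60, "v2": 0.30, "v3": 0.06, "v4": 0.04}
cs = credible_set(example)
lead_variant = cs[0]  # the variant with the highest posterior probability
```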
cis-sQTL Results
cis_sQTL_nominal.tar.gz
Nominal cis-sQTL association results testing genetic variants for associations with splicing phenotypes.
Typical contents:
- Variant–splicing event pairs
- Effect size (beta)
- Standard error
- Nominal p-value
cis_sQTL_permutation.tar.gz
Permutation-based cis-sQTL results providing event-level empirical p-values and FDR estimates.
Used to identify significantly regulated splicing events (sQTLs).
cis_sQTL_SuSiE.tar.gz
Fine-mapping results for significant cis-sQTLs generated using SuSiE.
Includes credible sets and posterior probabilities for putatively causal variants.
trans-QTL Results
trans_eQTL.tar.gz
Genome-wide trans-eQTL association results testing variant–gene pairs located on different chromosomes or beyond the cis-window.
trans_sQTL.tar.gz
Genome-wide trans-sQTL association results for splicing phenotypes.
File Formats
- All result files are Parquet files (.parquet).
- Each tar.gz archive contains the per-chromosome result files.
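A minimal sketch of unpacking one of the archives is shown below, using only the Python standard library. The function name and the choice to flatten all files into a single output directory are assumptions for illustration; the internal directory layout of the archives is not described here.

```python
import tarfile
from pathlib import Path


def extract_parquet_members(archive_path, dest_dir):
    """Extract every .parquet file from a tar.gz archive into dest_dir.

    Returns the sorted list of extracted file paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".parquet"):
                # Flatten to the base name so nothing is written outside dest.
                member.name = Path(member.name).name
                tar.extract(member, path=dest)
                extracted.append(dest / member.name)
    return sorted(extracted)
```

Each extracted file can then be loaded with a Parquet reader of your choice, for example `pandas.read_parquet`, which requires a Parquet engine such as pyarrow to be installed.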
Software and Methods
- QTL mapping was performed using TensorQTL.
- Fine-mapping was conducted using SuSiE as implemented in TensorQTL workflows.
- Analyses were performed on normalized gene expression and splicing phenotypes with appropriate covariate adjustment (e.g., genotype PCs, expression PCs, and other technical covariates).
- Summary statistic coordinates are reported according to the hg38 (GRCh38) human reference genome.
- Human reference genome used for mapping (GENCODE release 34): https://www.gencodegenes.org/human/release_34.html
- A1 (effect allele or minor allele) and A2 (non-effect allele or reference allele) are standardized across the summary statistics.
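When combining these summary statistics with an external GWAS, the effect direction usually has to be expressed relative to the same allele in both datasets. The sketch below shows the basic sign flip under the A1/A2 convention described above. The function name is hypothetical, and strand flips and multi-allelic sites are deliberately not handled.

```python
def harmonize_beta(qtl_a1, qtl_a2, qtl_beta, gwas_a1, gwas_a2):
    """Express a QTL beta relative to the GWAS effect allele.

    Returns the (possibly sign-flipped) beta, or None when the allele
    pairs do not match in either orientation.
    """
    if (qtl_a1, qtl_a2) == (gwas_a1, gwas_a2):
        return qtl_beta   # same effect allele, keep the sign
    if (qtl_a1, qtl_a2) == (gwas_a2, gwas_a1):
        return -qtl_beta  # effect and non-effect alleles swapped
    return None           # mismatched alleles; exclude or inspect manually
```

Returning `None` rather than guessing keeps ambiguous variants out of a colocalization input until they have been checked.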
Intended Use
These data are intended for:
- Reproducibility of published QTL analyses
- Secondary analyses and meta-analyses
- Colocalization and fine-mapping studies
- Integration with GWAS and other functional genomics datasets
Contact
For questions regarding the dataset or analysis details, please contact the corresponding author listed in the associated manuscript or Connor Murray, PhD (csm6hg@virginia.edu).
Methods/Code
Gene and variant mappability calculation: A concern for cis- and trans-QTL mapping is that stretches of similar sequence across distinct regions of the genome cause alignment errors in short-read experiments. Alignment errors can inflate false positive signals and increase the multiple-testing burden, especially for trans-QTL, which typically have smaller effect sizes (Saha & Battle, 2019). We used the Nextflow implementation of crossmap (https://github.com/porchard/crossmap-nextflow) with the GENCODE v34 GTF, an exon k-mer length of 100 bp, a UTR k-mer length of 36 bp, and up to two mismatches allowed, to calculate a gene-level BED file of mappability scores (Orchard et al., 2025). We use these mappability scores for filtering during the QTL mapping and QC described below.
Expression Quantitative Trait Locus (eQTL) Mapping: To understand how genetic variants influence the expression level (i.e., transcript abundance) of genes across cases and non-failing hearts, we ran TensorQTL v1.0.10 (Taylor-Weiner et al., 2019) on the filtered SNP dataset with a flanking window of ± 1 Mbp and 10,000 permutations enabled to estimate permutation p-values. As an initial covariate model, we included sex, age, disease group (e.g., affected versus non-failing), cardiomyopathy status (i.e., DCM & ICM), RNA PCs, and the first 5 PCs of ancestry (i.e., DNA PCs). Sex, disease group, and cardiomyopathy diagnoses are encoded as binary values (0, 1). Using this general covariate model, we ran the cis mode to identify all significant eQTL for each gene based on a permutation p-value < 0.05 while varying the number of RNA PCs included as covariates from 1 to 100 (e.g., RNA PC1 … RNA PCs1-100). The covariate model with RNA PCs1-70 yielded the largest number of significant eGenes and is used for every follow-up eQTL analysis (Supplemental Figure 4A). Additionally, the number of eGenes appears to saturate (i.e., reach a statistical plateau) somewhere between the inclusion of RNA PCs1-50 and RNA PCs1-75 (Supplemental Figure 4A&B). The cis-nominal mode of TensorQTL was then run to obtain summary statistics for every variant–gene pair for use in colocalization analyses of significant candidate eGenes, using the RNA PCs1-70 covariate model to ensure computational efficiency (see colocalization methodology below). The final covariate model used to map cis-eQTL therefore comprised sex, age, disease group, cardiomyopathy status, RNA PCs1-70, and the first 5 DNA PCs.
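The covariate sets described above can be sketched as lists of column names, one per candidate number of RNA PCs. All names below (`sex`, `RNA_PC1`, and so on) are hypothetical labels chosen for illustration and are not taken from the authors' pipeline.

```python
def build_covariate_names(n_rna_pcs: int, n_dna_pcs: int = 5) -> list[str]:
    """Covariate names for a model with the given numbers of RNA and DNA PCs."""
    # Hypothetical labels for the fixed covariates named in the text.
    base = ["sex", "age", "disease_group", "cardiomyopathy_status"]
    rna = [f"RNA_PC{i}" for i in range(1, n_rna_pcs + 1)]
    dna = [f"DNA_PC{i}" for i in range(1, n_dna_pcs + 1)]
    return base + rna + dna


# One candidate model per number of RNA PCs, from 1 to 100.
candidate_models = {k: build_covariate_names(k) for k in range(1, 101)}

# The model with RNA PCs 1-70, which the text reports as the one retained.
final_model = candidate_models[70]
```

Under these assumptions the retained model has 79 covariates: 4 fixed covariates, 70 RNA PCs, and 5 DNA PCs.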
Variants were considered fine-mapped cis-variants if they appeared in the 95% credible sets output by TensorQTL SuSiE for the relevant cis-eQTL regions. We also investigated trans-eQTL, or distal loci influencing a gene’s expression, to understand long-range interactions between variants and gene expression profiles. We used the trans mode of TensorQTL with the covariate model listed above to estimate trans-eQTL (MAF > 0.01) and retained only regions that were significant at a conservative genome-wide cut-off (p < 5 × 10^-8). We summarized the counts of significant eGenes and eVariants to assess whether any regions show support for being hotspots, and we report on their trans-eQTL profiles.
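The trans-eQTL filtering and hotspot summary described above can be sketched as follows. The record fields (`variant`, `gene`, `maf`, `pval`) and the toy records are hypothetical; only the two thresholds come from the text.

```python
from collections import Counter


def significant_trans(records, maf_min=0.01, p_max=5e-8):
    """Keep associations with MAF above maf_min and p-value below p_max."""
    return [r for r in records if r["maf"] > maf_min and r["pval"] < p_max]


def genes_per_variant(records):
    """Count the distinct genes associated with each variant."""
    pairs = {(r["variant"], r["gene"]) for r in records}
    return Counter(variant for variant, _ in pairs)


# Made-up records for illustration only; not results from the dataset.
toy_records = [
    {"variant": "v1", "gene": "g1", "maf": 0.20, "pval": 1e-9},
    {"variant": "v1", "gene": "g2", "maf": 0.20, "pval": 2e-10},
    {"variant": "v2", "gene": "g1", "maf": 0.005, "pval": 1e-12},  # fails MAF
    {"variant": "v3", "gene": "g3", "maf": 0.30, "pval": 1e-4},    # fails p
]

hits = significant_trans(toy_records)
hotspot_counts = genes_per_variant(hits)
```

Variants associated with many distinct genes in `hotspot_counts` would be candidates for closer inspection as possible hotspots.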
Splicing quantitative trait loci (sQTL) mapping: We assessed splicing QTL (sQTL) by first running regtools junctions extract v0.5.2 (Cotto et al., 2023) on each BAM file to obtain the counts of every exon-exon junction, with a minimum intron length of 50 bp (-m 50) and an anchor length of 8 (-a 8). We then clustered the introns found among the junction files using the leafcutter v0.2.9 (Li et al., 2018) toolset. We used the cluster_prepare_fastqtl.py script (https://github.com/broadinstitute/gtex-pipeline/blob/master/qtl/leafcutter/src/cluster_prepare_fastqtl.py), considering introns shorter than 500 kbp (-l 500000), requiring at least 10 reads per cluster (--min_clu_reads 10), and requiring that at least a 0.001 fraction of reads in a cluster support a junction (--min_clu_ratio 0.001). We removed clusters with no counts in most of the samples, removed low-complexity clusters, normalized the matrix, and then inverse normalized each splicing phenotype. This normalized matrix was used as input to the cis-sQTL scans, and splicing PCs were used as scan covariates. We also added gene information from the collapsed GENCODE GTF v34, and for each gene, variants within 1 Mbp of the gene’s TSS were tested. Gene TSS locations were determined using pyqtl’s gtf_to_tss_bed function. Trans-sQTL were tested with the same covariates as the cis-sQTL, restricted to MAF > 0.05, and only sites that remained significant after a conservative Bonferroni correction based on the number of unique isoforms tested (α = 5 × 10^-8 / 80,786) were retained.
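As a worked example of the Bonferroni correction stated above, the trans-sQTL threshold can be computed directly from the two numbers given in the text. The variable and function names are illustrative.

```python
GENOME_WIDE_ALPHA = 5e-8    # genome-wide cut-off stated in the text
N_ISOFORMS_TESTED = 80_786  # number of unique isoforms tested, from the text

# Bonferroni-corrected threshold, roughly 6.2e-13.
trans_sqtl_threshold = GENOME_WIDE_ALPHA / N_ISOFORMS_TESTED


def passes_trans_sqtl_threshold(pval: float) -> bool:
    """True when a p-value is below the corrected threshold."""
    return pval < trans_sqtl_threshold
```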
Access information
Other publicly accessible locations of the data:
Abstract (English)
Heart failure (HF) is a leading global cause of morbidity and mortality, yet the regulatory molecular mechanisms that link genetic variation to cardiac dysfunction remain elusive. To bridge this gap, we created the Trans-Omics for Precision Medicine in Congestive Heart Failure (TOPCHeF) resource, a multi-omics dataset comprising >700 human left-ventricular tissue samples, including dilated cardiomyopathy (DCM), ischemic cardiomyopathy (ICM), and non-failing controls, with paired whole-genome and RNA sequencing. By mapping expression- (eQTL) and splicing- (sQTL) quantitative trait loci directly in diseased human hearts, we identified over 10,000 transcripts with significant eQTL and 8,600 isoforms with significant sQTL, across both coding and non-coding genes, many of which overlap loci previously associated with HF and emerging novel gene associations. Single-locus colocalization with a large-scale DCM genome-wide association study revealed 21 expression and 17 splicing-QTL that share causal variants with disease risk. These include known Mendelian cardiomyopathy risk genes such as FLNC and ACTN2, and novel regulatory candidates like CAMK2D, LMF1, MYOZ1, SKI, SYNPO2L, and TKT. Several loci also showed coordinated effects on both gene expression and RNA splicing, implicating calcium signaling, cytoskeletal organization, and metabolic pathways in HF pathogenesis. Together, these results help define the regulatory landscape of the failing human heart and establish TOPCHeF as a foundational resource for connecting genetic variation to transcriptional and splicing molecular mechanisms in HF research.
Files
(15.3 GB)
| MD5 checksum | Size |
|---|---|
| 6cfee90b66716930f02b23741f0a4fc6 | 2.7 GB |
| 75a418d0c9f485f3a0ffd398eec9a0f5 | 2.0 MB |
| f494cbd0d8151f7abb5f66b43f9ba23f | 3.6 MB |
| 229e7c452644235161ce8785bb0dd6da | 9.4 GB |
| a60edd22d3d7e82d3358347a953572b2 | 8.4 MB |
| 349833de192bf494e19f799d4c97dc20 | 3.0 MB |
| 1db3ff801a0b96e1674f9181b1d8e2e0 | 2.7 GB |
| 7d51c77fcb611b39808cab3d8ef7105a | 475.1 MB |
Additional details
Funding
Dates
- Available: 2025-12-14 (TOPCHeF Dataset Available)
Software
- Repository URL: https://github.com/connor122721/nf-eqtls