Integration of dilated cardiomyopathy genomics with transcriptomics from the human heart implicates regulatory molecular mechanisms

Murray, Connor

doi:10.5281/zenodo.17932667

Published December 15, 2025 | Version v1

Dataset Open

Murray, Connor (Data manager)

TensorQTL Results for TOPCHeF

This repository contains the complete set of quantitative trait locus (QTL) summary statistics generated using TensorQTL. The dataset includes cis- and trans-eQTL and sQTL analyses, permutation-based significance results, and fine-mapping results using SuSiE.

All large result directories are provided as compressed tar.gz archives.

Directory and File Overview

cis-eQTL Results

cis_eQTL_nominal.tar.gz
Nominal cis-eQTL association results testing genetic variants within a predefined cis-window around each gene.

Typical contents (per-chromosome files):

Variant–gene pairs
Effect size (beta)
Standard error
Nominal p-value

These results are intended for downstream filtering, visualization, and colocalization (coloc) analyses.
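As a minimal sketch of that downstream use, the Python below converts the nominal rows for one gene into the inputs a coloc-style analysis expects: the effect size, its sampling variance (varbeta = SE²), and a z-score. It assumes the rows have already been read from the per-chromosome Parquet files (for example with pandas.read_parquet) into records. The field names in `NominalRow` are hypothetical and may not match the archive's actual column headers.

```python
import math
from dataclasses import dataclass


@dataclass
class NominalRow:
    # Hypothetical field names; the archive's column headers may differ.
    phenotype_id: str
    variant_id: str
    beta: float
    se: float
    pval: float


def to_coloc_inputs(rows, phenotype_id, p_max=1.0):
    """Map variant_id -> (beta, varbeta, z) for one gene, keeping rows with pval <= p_max."""
    out = {}
    for r in rows:
        if r.phenotype_id != phenotype_id or r.pval > p_max:
            continue
        if math.isnan(r.se) or r.se <= 0:
            continue  # no usable sampling variance for this variant
        out[r.variant_id] = (r.beta, r.se ** 2, r.beta / r.se)
    return out
```

Keeping varbeta rather than only the z-score lets the same dictionary feed either a beta/varbeta or a z-score based colocalization tool.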

cis_eQTL_permutation.tar.gz
Permutation-based cis-eQTL results used to estimate gene-level empirical p-values and false discovery rates (FDR).

Typical contents:

Gene-level permutation p-values
Empirical significance estimates

These files are typically used to identify significantly regulated genes (eGenes).
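A hedged sketch of calling eGenes at a chosen FDR from gene-level empirical p-values follows. It uses the Benjamini–Hochberg procedure as a simple stand-in; TensorQTL-based workflows commonly use Storey q-values instead, so an eGene set produced this way may differ slightly from the published one. Gene IDs and p-values passed to `call_egenes` are placeholders.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_egenes(gene_pvals, fdr=0.05):
    """Return sorted gene IDs whose adjusted p-value is <= fdr."""
    genes = list(gene_pvals)
    adjusted = benjamini_hochberg([gene_pvals[g] for g in genes])
    return sorted(g for g, q in zip(genes, adjusted) if q <= fdr)
```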

cis_eQTL_SuSiE.tar.gz
Fine-mapping results for significant cis-eQTLs using SuSiE (Sum of Single Effects).

Typical contents:

Credible sets
Posterior inclusion probabilities (PIPs)
Lead variants per credible set
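SuSiE represents each of its L single effects as a probability vector (alpha) over the variants in a region. Assuming such an alpha matrix is available, the sketch below shows how PIPs and a 95% credible set per effect can be derived from it, taking the highest-probability member as the lead variant. It omits the purity filter (minimum absolute LD correlation within a set) that SuSiE applies before reporting credible sets, so it is illustrative rather than a reimplementation.

```python
def pips_from_alpha(alpha):
    """alpha[l][j] = P(single effect l is variant j); PIP_j = 1 - prod_l (1 - alpha[l][j])."""
    n_variants = len(alpha[0])
    pips = []
    for j in range(n_variants):
        prob_none = 1.0
        for effect in alpha:
            prob_none *= 1.0 - effect[j]
        pips.append(1.0 - prob_none)
    return pips


def credible_set(alpha_l, coverage=0.95):
    """Smallest set of variant indices, highest alpha first, whose total alpha reaches coverage."""
    order = sorted(range(len(alpha_l)), key=lambda j: alpha_l[j], reverse=True)
    members, total = [], 0.0
    for j in order:
        members.append(j)
        total += alpha_l[j]
        if total >= coverage:
            break
    return members  # members[0] is the lead variant of this set
```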

cis-sQTL Results

cis_sQTL_nominal.tar.gz
Nominal cis-sQTL association results testing genetic variants for associations with splicing phenotypes.

Typical contents:

Variant–splicing event pairs
Effect size (beta)
Standard error
Nominal p-value

cis_sQTL_permutation.tar.gz
Permutation-based cis-sQTL results providing event-level empirical p-values and FDR estimates.

These files are used to identify splicing events with significant sQTLs.

cis_sQTL_SuSiE.tar.gz
Fine-mapping results for significant cis-sQTLs generated using SuSiE.

Includes credible sets and posterior probabilities for putatively causal variants.

trans-QTL Results

trans_eQTL.tar.gz
Genome-wide trans-eQTL association results testing variant–gene pairs located on different chromosomes or beyond the cis-window.
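To make the cis/trans distinction concrete, here is a minimal sketch that labels a variant–gene pair. It assumes the ±1 Mb TSS window described in the methods and a hypothetical `chrom_pos_ref_alt` variant ID format; the archive's actual variant IDs may be encoded differently.

```python
def parse_variant_id(variant_id):
    """Hypothetical 'chrom_pos_ref_alt[_build]' format -> (chrom, pos)."""
    chrom, pos = variant_id.split("_")[:2]
    return chrom, int(pos)


def classify_pair(variant_id, gene_chrom, gene_tss, window=1_000_000):
    """'cis' if on the gene's chromosome within +/- window of its TSS, otherwise 'trans'."""
    chrom, pos = parse_variant_id(variant_id)
    if chrom != gene_chrom:
        return "trans"  # inter-chromosomal pair
    return "cis" if abs(pos - gene_tss) <= window else "trans"
```

Some studies restrict trans analyses to inter-chromosomal pairs only; with this sketch that would mean keeping only pairs where the chromosomes differ.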

trans_sQTL.tar.gz
Genome-wide trans-sQTL association results for splicing phenotypes.

File Formats

All result files are parquet files (.parquet).
Each tar.gz archive contains the per-chromosome result files.
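A minimal sketch for inspecting an archive without extracting it, using Python's standard tarfile module. It assumes the chromosome appears in each file name as chr1 through chr22, chrX, or chrY; that naming pattern is an assumption, and Parquet files that do not match are grouped under "unknown".

```python
import re
import tarfile
from collections import defaultdict

# Assumed naming convention: a 'chrN' token somewhere in each file name.
CHROM_RE = re.compile(r"(chr(?:\d+|X|Y))")


def parquet_members_by_chrom(archive_path):
    """Group the .parquet members of a .tar.gz archive by chromosome token."""
    groups = defaultdict(list)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(".parquet"):
                continue
            match = CHROM_RE.search(member.name)
            groups[match.group(1) if match else "unknown"].append(member.name)
    return {chrom: sorted(names) for chrom, names in groups.items()}
```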

Software and Methods

QTL mapping was performed using TensorQTL.
Fine-mapping was conducted using SuSiE as implemented in TensorQTL workflows.
Analyses were performed on normalized gene expression and splicing phenotypes with appropriate covariate adjustment (e.g., genotype PCs, expression PCs, and other technical covariates).
Summary statistic coordinates are reported according to the hg38 (GRCh38) human reference genome.
Reference genome and gene annotation used for mapping (GENCODE release 34): https://www.gencodegenes.org/human/release_34.html
A1 (effect allele or minor allele) and A2 (non-effect allele or reference allele) are standardized across the summary statistics.
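When integrating these statistics with a GWAS, both effect sizes must refer to the same allele. A minimal harmonization sketch follows, assuming beta is reported per copy of A1 and the GWAS reports its own effect and other alleles. Matching alleles keep their sign, swapped alleles flip it, and strand-ambiguous A/T or C/G SNPs are dropped; dropping them is a conservative choice made for this sketch, not necessarily what the authors did.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonize(qtl_a1, qtl_a2, beta, gwas_effect, gwas_other):
    """Return the QTL beta expressed relative to the GWAS effect allele, or None if unalignable."""
    qtl_a1, qtl_a2 = qtl_a1.upper(), qtl_a2.upper()
    gwas_effect, gwas_other = gwas_effect.upper(), gwas_other.upper()
    if COMPLEMENT.get(qtl_a1) == qtl_a2:
        return None  # strand-ambiguous A/T or C/G SNP: dropped in this sketch
    if (qtl_a1, qtl_a2) == (gwas_effect, gwas_other):
        return beta  # same effect allele
    if (qtl_a1, qtl_a2) == (gwas_other, gwas_effect):
        return -beta  # alleles swapped: flip the sign
    return None  # alleles do not match (e.g., strand flip or different alleles)
```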

Intended Use

These data are intended for:

Reproducibility of published QTL analyses
Secondary analyses and meta-analyses
Colocalization and fine-mapping studies
Integration with GWAS and other functional genomics datasets

Contact

For questions regarding the dataset or analysis details, please contact the corresponding author listed in the associated manuscript or Connor Murray, PhD (csm6hg@virginia.edu).

Methods/Code

Gene and variant mappability calculation: A concern for cis- and trans-QTL mapping is that stretches of similar sequence in distinct regions of the genome cause alignment errors in short-read experiments. These alignment errors can inflate false-positive signals and increase the multiple-testing burden, especially for trans-QTL, which typically have smaller effect sizes (Saha & Battle, 2019). We used the Nextflow implementation of crossmap (https://github.com/porchard/crossmap-nextflow) with the GENCODE v34 GTF, an exon k-mer length of 100 bp, a UTR k-mer length of 36 bp, and up to two mismatches to generate a gene-level BED file of mappability scores (Orchard et al., 2025). We used these mappability scores for filtering during QTL mapping and QC.
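A minimal sketch of the filtering step, assuming the gene-level BED file carries the gene ID in column 4 and the mappability score in column 5 (a hypothetical layout; check the crossmap-nextflow output) and using a hypothetical cutoff of 0.8, since the threshold used in the study is not stated here.

```python
def read_gene_mappability(lines):
    """Parse tab-separated BED-like lines into {gene_id: score}; column layout is hypothetical."""
    scores = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments, and BED header lines
        fields = line.rstrip("\n").split("\t")
        scores[fields[3]] = float(fields[4])
    return scores


def keep_mappable(gene_ids, scores, min_score=0.8):
    """Keep genes scoring >= min_score (hypothetical cutoff); genes without a score are dropped."""
    return [g for g in gene_ids if scores.get(g, 0.0) >= min_score]
```

Because `read_gene_mappability` accepts any iterable of lines, an open file handle can be passed directly.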

Expression Quantitative Trait Locus (eQTL) Mapping: To understand how genetic variants influence gene expression levels (i.e., transcript abundance) across cases and non-failing hearts, we ran TensorQTL v1.0.10 (Taylor-Weiner et al., 2019) on the filtered SNP dataset with a flanking window of ± 1 Mb and 10,000 permutations to estimate permutation p-values. As an initial covariate model, we included sex, age, disease group (e.g., affected versus non-failing), cardiomyopathy status (i.e., DCM versus ICM), RNA PCs, and the first 5 ancestry PCs (i.e., DNA PCs). Sex, disease group, and cardiomyopathy diagnosis were encoded as binary (0/1) values. Using this general covariate model, we ran TensorQTL in cis mode to identify all significant eQTL for each gene (permutation p-value < 0.05) while varying the number of RNA PCs included as covariates from 1 to 100 (i.e., from RNA PC 1 alone up to RNA PCs 1–100). The covariate model with RNA PCs 1–70 yielded the largest number of significant eGenes and was used for every follow-up eQTL analysis (Supplemental Figure 4A). The number of eGenes appeared to saturate (i.e., reach a statistical plateau) between RNA PCs 1–50 and RNA PCs 1–75 (Supplemental Figure 4A&B). We then ran TensorQTL in cis-nominal mode with the RNA PCs 1–70 covariate model to obtain variant–gene summary statistics for colocalization analyses of significant candidate eGenes, to ensure computational efficiency (see colocalization methodology below). The final covariate model we used to map cis-eQTL is as follows:
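The covariate model search described above can be sketched as follows. `covariate_names` lists the covariates named in the text using hypothetical column labels; `best_pc_count` picks the RNA PC count that yields the most eGenes; and `saturation_point` is one simple, hypothetical way to locate the plateau (the smallest PC count reaching 95% of the peak). The eGene counts passed in would come from separate TensorQTL cis runs; none are reproduced here.

```python
def covariate_names(n_rna_pcs, n_dna_pcs=5):
    """Covariates named in the text; the column labels themselves are hypothetical."""
    clinical = ["sex", "age", "disease_group", "cardiomyopathy_status"]  # binary except age
    rna_pcs = [f"RNA_PC{i}" for i in range(1, n_rna_pcs + 1)]
    dna_pcs = [f"DNA_PC{i}" for i in range(1, n_dna_pcs + 1)]
    return clinical + rna_pcs + dna_pcs


def best_pc_count(egene_counts):
    """egene_counts maps number of RNA PCs -> eGenes found; ties favor fewer PCs."""
    return min(egene_counts, key=lambda k: (-egene_counts[k], k))


def saturation_point(egene_counts, fraction=0.95):
    """Smallest RNA PC count reaching `fraction` of the peak eGene count (a heuristic)."""
    peak = max(egene_counts.values())
    return min(k for k, n in egene_counts.items() if n >= fraction * peak)
```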

Variants were considered fine-mapped cis-variants if they appeared in the 95% credible sets output by TensorQTL's SuSiE implementation for the relevant cis-eQTL regions. We also investigated trans-eQTL, i.e., distal loci influencing a gene's expression, to understand long-range interactions between variants and gene expression profiles. We ran TensorQTL in trans mode with the covariate model described above to estimate trans-eQTL (MAF > 0.01) and retained only associations passing a conservative genome-wide threshold (p < 5 × 10⁻⁸). We summarized the numbers of significant eGenes and eVariants to identify regions with support for being trans-eQTL hotspots and report their trans-eQTL profiles.
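A sketch of the genome-wide filter and a simple hotspot summary, assuming trans hits are available as (variant chromosome, variant position, gene ID, p-value) tuples. Binning variants into fixed 1 Mb windows and requiring at least two distinct eGenes per window are hypothetical choices made for illustration, not the study's hotspot definition.

```python
from collections import defaultdict

GENOME_WIDE = 5e-8  # genome-wide significance threshold from the text


def trans_hotspots(hits, bin_size=1_000_000, min_genes=2):
    """Map (chrom, bin_start) -> sorted eGenes for bins with >= min_genes significant eGenes."""
    genes_per_bin = defaultdict(set)
    for chrom, pos, gene, pval in hits:
        if pval < GENOME_WIDE:
            genes_per_bin[(chrom, pos // bin_size)].add(gene)
    return {
        (chrom, b * bin_size): sorted(genes)
        for (chrom, b), genes in genes_per_bin.items()
        if len(genes) >= min_genes
    }
```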

Splicing quantitative trait loci (sQTL) mapping: We assessed splicing QTL (sQTL) by first running regtools junctions extract v0.5.2 (Cotto et al., 2023) on each BAM file to count every exon–exon junction, with a minimum intron length of 50 bp (-m 50) and a minimum anchor length of 8 bp (-a 8). We then clustered the introns found across the junction files using the leafcutter v0.2.9 (Li et al., 2018) toolset. We used the cluster_prepare_fastqtl.py script (https://github.com/broadinstitute/gtex-pipeline/blob/master/qtl/leafcutter/src/cluster_prepare_fastqtl.py), considering introns shorter than 500 kb (-l 500000) and clusters with at least 10 reads (--min_clu_reads 10), and keeping clusters in which at least a 0.001 fraction of reads supported a junction (--min_clu_ratio 0.001). We removed clusters with no counts in most samples as well as low-complexity clusters, normalized the matrix, and then inverse-normal transformed each splicing phenotype. This normalized matrix was used as input to the cis-sQTL scans, with splicing PCs as scan covariates. We also added gene information from the collapsed GENCODE v34 GTF; for each gene, variants within 1 Mb of the gene's TSS were tested. Gene TSS locations were determined using pyqtl's gtf_to_tss_bed function. Trans-sQTL were tested with the same covariates as the cis-sQTL and restricted to variants with MAF > 0.05, and we retained only sites that were significant after a conservative Bonferroni correction for the number of unique isoforms tested (α = 5 × 10⁻⁸ / 80,786).
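Two numeric steps in this paragraph can be sketched with the Python standard library: a rank-based inverse normal transform of one splicing phenotype across samples, and the Bonferroni-corrected trans-sQTL threshold. The r/(n + 1) rank offset with averaged ties is one common convention and an assumption here; other pipelines use Blom-type offsets. With the values in the text, the threshold works out to roughly 6.19 × 10⁻¹³.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Rank-based INT: 1-based rank r (ties averaged) -> Phi^-1(r / (n + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def bonferroni_alpha(base=5e-8, n_tests=80_786):
    """Per-test threshold: genome-wide alpha divided by the number of isoforms tested."""
    return base / n_tests
```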

Access information

Other publicly accessible locations of the data:

connor122721/nf-eqtls: Nextflow pipeline for disease QTL mapping and follow-up analyses

Abstract (English)

Heart failure (HF) is a leading global cause of morbidity and mortality, yet the regulatory molecular mechanisms that link genetic variation to cardiac dysfunction remain elusive. To bridge this gap, we created the Trans-Omics for Precision Medicine in Congestive Heart Failure (TOPCHeF) resource, a multi-omics dataset comprising >700 human left-ventricular tissue samples, including dilated cardiomyopathy (DCM), ischemic cardiomyopathy (ICM), and non-failing controls, with paired whole-genome and RNA sequencing. By mapping expression- (eQTL) and splicing- (sQTL) quantitative trait loci directly in diseased human hearts, we identified over 10,000 transcripts with significant eQTL and 8,600 isoforms with significant sQTL, across both coding and non-coding genes, many of which overlap loci previously associated with HF, alongside emerging novel gene associations. Single-locus colocalization with a large-scale DCM genome-wide association study revealed 21 expression- and 17 splicing-QTL that share causal variants with disease risk. These include known Mendelian cardiomyopathy risk genes such as FLNC and ACTN2, and novel regulatory candidates like CAMK2D, LMF1, MYOZ1, SKI, SYNPO2L, and TKT. Several loci also showed coordinated effects on both gene expression and RNA splicing, implicating calcium signaling, cytoskeletal organization, and metabolic pathways in HF pathogenesis. Together, these results help define the regulatory landscape of the failing human heart and establish TOPCHeF as a foundational resource for connecting genetic variation to transcriptional and splicing molecular mechanisms in HF research.

Files (15.3 GB)

Name	Size	MD5 checksum
cis_eQTL_nominal.tar.gz	2.7 GB	6cfee90b66716930f02b23741f0a4fc6
cis_eQTL_permutation.tar.gz	2.0 MB	75a418d0c9f485f3a0ffd398eec9a0f5
cis_eQTL_SuSiE.tar.gz	3.6 MB	f494cbd0d8151f7abb5f66b43f9ba23f
cis_sQTL_nominal.tar.gz	9.4 GB	229e7c452644235161ce8785bb0dd6da
cis_sQTL_permutation.tar.gz	8.4 MB	a60edd22d3d7e82d3358347a953572b2
cis_sQTL_SuSiE.tar.gz	3.0 MB	349833de192bf494e19f799d4c97dc20
trans_eQTL.tar.gz	2.7 GB	1db3ff801a0b96e1674f9181b1d8e2e0
trans_sQTL.tar.gz	475.1 MB	7d51c77fcb611b39808cab3d8ef7105a
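After downloading, each archive can be checked against its published MD5 checksum. A minimal Python sketch using hashlib, reading in 1 MiB chunks so that multi-gigabyte archives such as cis_sQTL_nominal.tar.gz are never loaded into memory at once:

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches the published checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

For example, verify("cis_eQTL_permutation.tar.gz", "75a418d0c9f485f3a0ffd398eec9a0f5") should return True for an intact download.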

Additional details

Funding
National Institutes of Health: Integrative genomic and transcriptomic investigation of human heart failure mechanisms (5R01HL170012-02)

Dates
Available: 2025-12-14
TOPCHeF Dataset Available

Software
Repository URL: https://github.com/connor122721/nf-eqtls
