Preprint
Article

This version is not peer-reviewed.

Discovery and Validation of Molecular Biomarkers Contributing to the Risk of Coronary Artery Disease

Submitted:

01 April 2026

Posted:

02 April 2026


Abstract
Background and Aims: This study aimed to systematically search for molecular biomarkers that contribute to the risk of coronary artery disease (CAD). Methods and Results: A SNP-based multiomics analysis plan was used to identify biomarkers contributing to the risk of CAD through a two-step discovery and validation design. By integrating CAD GWAS data with epigenome, transcriptome, and proteome quantitative trait loci (QTLs) from blood, 44 CpG sites, 37 transcripts, and 27 proteins were identified as contributing to the risk of CAD. The identified biomarkers shared interactions and were enriched in lipid metabolism-related processes. The PCSK9 protein was under the regulatory impact of the APOC1, GZMA, and GRN proteins. The impact of the SMARCA4 and PSRC1 transcripts on CAD was mediated through lipids, whereas the influence of the FES transcript on the risk of CAD was attributed to blood pressure. Finally, while 53% of the transcripts identified in the discovery stage were validated, this ratio was 20% for the protein biomarkers and 24% for the CpG sites. Conclusions: This study identified biomarkers contributing to the risk of CAD through a two-step discovery and validation analysis; furthermore, it provided insights into the paths by which several biomarkers influence the risk of CAD and underlined the efficiency of transcriptome platforms in identifying biomarkers.

Introduction

Following the era of Genome-Wide Association Studies (GWAS), functional studies have been initiated to understand the mechanisms by which a single nucleotide polymorphism (SNP) exerts its impact on a phenotype. Conducting such studies is important to understand the underlying biology of a phenotype. From a clinical perspective, the results from these studies are also important for finding biomarkers for diagnostic and therapeutic purposes.
Traditionally, conducting functional studies has been tedious and has required significant laboratory resources. However, in parallel with GWAS, quantitative trait loci (QTL) mapping studies have in recent years investigated the genetics of omics at various levels (e.g., transcriptome and proteome). Statistical methods have also been developed that combine QTL data with GWAS findings to conduct in silico functional studies. The results of such studies have enabled researchers to systematically investigate the molecular biology of a phenotype and identify biomarkers for downstream applications.
An issue with the initial wave of in silico functional studies was the lack of independent data for validation. This is important because, in addition to finding consensus biomarkers for clinical application, validation also helps to prioritize omics platforms for follow-up research. Although conducting validation analysis was often not possible in the past, progress in QTL mapping studies has since circumvented this issue, and it is expected that such efforts will become more cell-type specific in the upcoming years.
Motivated by these advancements, this study aimed to combine GWAS data for coronary artery disease (CAD) with several sources of QTLs in order to systematically investigate molecular biomarkers contributing to the risk of CAD through a two-step discovery and validation analysis. Furthermore, post-hoc analyses were conducted to investigate the function of the biomarkers and their relationships.

Methods

Figure 1 provides an overview of the analysis plan. I conducted a Transcriptome-Wide Association Study (TWAS), a Proteome-Wide Association Study (PWAS), and an Epigenome-Wide Association Study (EWAS) for CAD by integrating GWAS data with eQTL, pQTL, and mQTL data (Table S1). QTLs from different sources were used to identify biomarkers through a two-step discovery and validation design (Table S1). For each QTL type, it was ensured that the selected discovery and validation studies did not include overlapping participants, thereby reducing the risk of false-positive findings. In the following sections, I detail the sources of data.

GWAS Data:

GWAS summary statistics for CAD were obtained from the recent meta-analysis of the CARDIoGRAMplusC4D consortia and UK Biobank [1]. In that study, the authors conducted a GWAS in a sample comprising 181,522 cases among 1,165,690 participants of predominantly European ancestry. After examining the association of over 20 million SNPs with CAD, the authors identified 241 independent SNPs (P<5e-8) contributing to the risk of CAD.

eQTL Data:

Previously, the eQTLGen consortium investigated the genetic architecture of the blood transcriptome. This consortium integrated eQTLs for 19,960 genes from 37 studies, encompassing over 31,000 blood and peripheral blood mononuclear cell (PBMC) samples. The majority of these samples were of European origin. Gene expression profiling was conducted using a combination of expression arrays and RNA sequencing (which accounted for 20.3% of the samples) [2]. The consortium provided public access to its eQTL summary association statistics. After processing these data, eQTLs (P<5e-8) for 16,732 genes were obtained for this study and used as the input data for the discovery step.
To re-investigate the findings from the discovery step, a second set of eQTLs was obtained from the INTERVAL study, in which the authors conducted eQTL mapping in blood samples from 4,732 participants of European origin [3]. The INTERVAL study is a randomized trial of healthy blood donors who were recruited at England’s National Health Service Blood and Transplant center. Gene expression profiling of the samples was performed by whole-blood Illumina RNA sequencing, and the authors provided eQTL summary association statistics for 17,362 genes. From this dataset, eQTLs for 15,298 genes that were also reported in the eQTLGen study were obtained for validation.

pQTL Data:

Summary association statistics for SNPs regulating proteins (protein quantitative trait loci, pQTLs) were obtained from the UK Biobank Pharma Proteomics Project (UKB-PPP) [4], which is a collaboration between UK Biobank and several biopharmaceutical companies. The project aims to characterize the plasma proteomic profiles of UK Biobank participants. So far, the project has provided pQTLs for 2,923 proteins in 34,557 subjects of European descent. The authors generated the results by measuring proteins using the Olink proteomics assay and linking the data to the genotypes of the UK Biobank participants.
For the purpose of replication, pQTL data from the study by Ferkingstad et al. [5] were used. In that study, the authors used the SomaScan assay to quantify protein levels in plasma samples taken from 35,559 Icelanders of European descent who participated in the deCODE Health study. The authors provided pQTLs for 4,719 proteins.

mQTL Data:

To examine the contribution of CpG sites to CAD, mQTL data from Hatton et al. [6] were used. The authors reported mQTLs for 404,503 CpG sites following the meta-analysis of three sets of results generated from samples of participants of European ancestry. I used these data as the input for the discovery stage. The authors also reported mQTL data for 404,503 CpG sites from 2,099 participants of East Asian origin, generated by combining the findings from two East Asian cohorts through meta-analysis. These data were selected as the input for the validation analysis because the authors reported that the mQTL effect sizes from their two datasets (European and East Asian) are highly correlated (r=0.92).

Analyses:

For each omics layer (epigenome, transcriptome, proteome), a search was first performed in the discovery datasets to identify functional elements (CpG sites, transcripts, proteins) that share a SNP (P<5e-8) with CAD. Next, Mendelian randomization (MR) was used to test the impact of the identified elements on CAD. Significant findings were then re-investigated using data from the validation datasets.
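The first filtering step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory layout in which the QTL summary statistics are stored as a mapping from element to SNP to P value; the function name, the layout, and the toy values are illustrative assumptions and are not taken from the study.

```python
# Hypothetical layout (an assumption for illustration):
#   qtl_pvalues  = {element_id: {snp_id: p_value}}
#   gwas_pvalues = {snp_id: p_value}
GENOME_WIDE_P = 5e-8


def elements_sharing_snp(qtl_pvalues, gwas_pvalues, threshold=GENOME_WIDE_P):
    """Return elements with at least one SNP significant in both QTL and GWAS data."""
    gwas_hits = {snp for snp, p in gwas_pvalues.items() if p < threshold}
    shared = {}
    for element, snp_p in qtl_pvalues.items():
        common = sorted(
            snp for snp, p in snp_p.items() if p < threshold and snp in gwas_hits
        )
        if common:
            shared[element] = common
    return shared


# Toy values, not from the study.
qtl = {
    "geneA": {"rs1": 1e-12, "rs2": 0.3},
    "geneB": {"rs3": 1e-9},
    "geneC": {"rs1": 1e-4},
}
gwas = {"rs1": 2e-10, "rs2": 1e-20, "rs3": 0.01}

candidates = elements_sharing_snp(qtl, gwas)
print(candidates)
```

In this toy example only geneA is carried forward, because rs1 is the only SNP that passes the threshold in both datasets.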
I used the GSMR algorithm implemented in the GCTA software to conduct the Mendelian randomization. Compared with other MR methods, GSMR automatically detects and removes SNPs that have a pleiotropic effect on both exposure and outcome. In addition, it accounts for the sampling variance in beta estimates and the LD among SNPs; as such, it is reported to be statistically more powerful than other MR approaches [7].
Genotype data from the 1000 Genomes Project (n = 503 of European ancestry) were used to calculate the LD between SNPs. Of note, in addition to investigating the relation between functional elements and CAD, Mendelian randomization was also used to investigate the interdependencies between the functional elements.
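To make the logic of summary-data Mendelian randomization concrete, the following Python sketch computes per-SNP Wald ratios and combines them with a fixed-effect inverse-variance-weighted (IVW) estimator. This is a deliberately simplified stand-in and not the GSMR algorithm: it assumes independent instruments, ignores LD and the sampling variance of the SNP-exposure effects, and performs no pleiotropy (HEIDI-outlier) filtering. Function names and input values are illustrative assumptions.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP causal estimate and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(snps):
    """Fixed-effect IVW combination of per-SNP Wald ratios.

    snps: iterable of (beta_exposure, beta_outcome, se_outcome) tuples,
    assumed here to be independent instruments (a simplification).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for beta_x, beta_y, se_y in snps:
        estimate, se = wald_ratio(beta_x, beta_y, se_y)
        weight = 1.0 / se**2
        weighted_sum += weight * estimate
        weight_total += weight
    beta = weighted_sum / weight_total
    se = math.sqrt(1.0 / weight_total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, p


# Toy instruments, not from the study: every SNP implies a ratio of 0.2.
instruments = [(0.10, 0.020, 0.005), (0.20, 0.040, 0.005), (0.15, 0.030, 0.005)]
beta, se, p = ivw_estimate(instruments)
print(beta, se, p)
```

The same structure applies whether the exposure is a transcript, a protein, or a CpG site; only the source of the SNP-exposure effects changes.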

Functional Interaction:

The STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) online tool (Version 12.0) [8] was used to investigate whether the genes identified through the TWAS and PWAS analyses are biologically connected and/or enriched in a biological process. For this purpose, I entered the genes into the STRING online tool and carried out the functional analysis by specifying human as the organism and using the default settings. The STRING database is a collection of protein-protein interactions gathered from numerous sources, including experimental data, computational prediction methods, and other biological databases. It is freely accessible and is regularly updated.

Conditional Analysis:

To investigate whether a gene mediates its impact on the risk of CAD through a secondary trait, a search was first conducted to identify traits that share SNPs (P<5e-8) with the gene. Next, conditional analysis was conducted to investigate whether the impact of the gene on CAD is mediated by the candidate trait. For this purpose, the GWAS summary statistics of CAD were first adjusted for the candidate trait using the mtCOJO algorithm [7]; next, the association between the gene and CAD was investigated using Mendelian randomization. Finally, the magnitude of the gene-CAD association was compared between the unadjusted and adjusted models using a Z-test to determine whether the impact of the gene on CAD is mediated through the candidate trait [7].
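The final comparison step can be sketched in Python as a two-sample Z-test on the unadjusted and adjusted MR estimates. The text only states that a Z-test was used, so the exact form below is an assumption: it treats the two estimates as independent, which is a simplification because both are derived from the same CAD GWAS. The function name and input values are illustrative.

```python
import math


def z_test_difference(beta_unadj, se_unadj, beta_adj, se_adj):
    """Z-test for a difference between two effect estimates.

    Assumes the two estimates are independent (a simplifying assumption).
    Returns the Z statistic and the two-sided normal P value.
    """
    z = (beta_unadj - beta_adj) / math.sqrt(se_unadj**2 + se_adj**2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Toy values, not from the study: the effect shrinks from 0.30 to 0.10
# after adjusting the outcome GWAS for the candidate trait.
z, p = z_test_difference(0.30, 0.03, 0.10, 0.04)
print(z, p)
```

A small P value here would indicate that adjusting for the candidate trait changes the gene-CAD estimate by more than expected from sampling error, which is consistent with mediation through that trait.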

Results

Transcriptome-Wide Association Study:

In its phase 1 effort, the eQTLGen study provided eQTLs for 19,960 genes. By examining these data through the analysis plan described in Figure 1, 813 transcripts that shared at least one eQTL with CAD (P<5e-8) were first identified and subjected to Mendelian randomization analysis to investigate their causal impact on CAD. Of these, 78 genes showed a significant impact on the risk of CAD (P<5e-8). eQTLs for 70 of these genes were available in the INTERVAL study, and I used them to validate the findings from the previous step. The MR analysis in the INTERVAL study confirmed the influence (P<5e-8) of 37 genes on the risk of CAD (Figure 2 and Table S2). Notably, all genes showed a direction of effect concordant with the findings based on the eQTLGen study. The effect sizes were highly correlated between discovery and validation (Pearson’s r=0.90, P=2.4e-14, Figure 2).
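The agreement between discovery and validation estimates reported above rests on two simple quantities: the Pearson correlation of the effect sizes and the fraction of biomarkers with the same sign in both datasets. A minimal Python sketch of both is given here; the function names and the toy effect sizes are illustrative assumptions and do not reproduce the values in Table S2.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def concordant_fraction(x, y):
    """Fraction of pairs whose effect estimates share the same sign."""
    same_sign = sum(1 for a, b in zip(x, y) if a * b > 0)
    return same_sign / len(x)


# Toy effect sizes (betas), not from the study.
discovery = [0.10, -0.20, 0.30, -0.05]
validation = [0.12, -0.18, 0.25, -0.07]

r = pearson_r(discovery, validation)
concordance = concordant_fraction(discovery, validation)
print(r, concordance)
```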
The impact of a number of these genes on CAD is known. LPL is an enzyme that is involved in triglyceride metabolism. LIPA hydrolyzes cholesteryl esters and triglycerides. SREBF1 regulates cholesterol and fatty acid synthesis. SCARB1 encodes a plasma membrane receptor for high-density lipoprotein cholesterol, and FLT1 encodes a member of the vascular endothelial growth factor receptor family and promotes endothelial cell proliferation, survival, and angiogenesis.
In both the discovery and validation steps, SMARCA4 showed the largest impact on the risk of CAD. Conditional analysis indicated that this gene likely impacts the risk of CAD by modifying lipid levels (Table S3). PSRC1 was another gene that showed a strong association with lipid traits, and adjusting for lipids attenuated its impact on the risk of CAD. The influence of FES on the risk of CAD appeared to be due to the regulatory influence of this gene on blood pressure, as adjusting for blood pressure significantly attenuated its association with CAD (Table S3).

Proteome-Wide Association Study:

From the total of 2,941 proteins reported in the UKB-PPP study, 1,406 proteins shared at least one SNP (P<5e-8) with CAD. MR analysis identified 188 proteins that showed a significant impact on the risk of CAD. Of these, 137 had pQTL data in the deCODE study. In the validation analysis, 28 proteins showed a significant impact on the risk of CAD, and the majority (N=27) showed concordant effects between the two studies (Figure 3, Table S4). The correlation in effect sizes between the replication and discovery analyses was strong (Pearson’s r = 0.72, P = 2.2e-6, Figure 3). The influence of a number of these proteins on the risk of CAD is known. PLA2G7 (also known as lipoprotein-associated phospholipase A2) plays a key role in LDL metabolism and inflammatory processes. PCSK9 regulates LDL receptor degradation. IL6R is involved in inflammatory processes, and ADIPOQ is involved in lipid metabolism and endothelial protection. APOC1 is involved in lipoprotein metabolism, and IGF1R is implicated in vascular smooth muscle cell proliferation and plaque stability.
Given that both the UKB-PPP and deCODE studies have provided full summary association statistics for the blood proteome, MR analysis was then performed to examine the interactions between the identified proteins. The analysis uncovered a subset of proteins that share regulatory interactions (Figure 4, Table S5). Notably, APOC1, GZMA, and GRN influence the level of PCSK9. GZMA also exerts a regulatory impact on the levels of GOLM2 and CXCL13. GZMA encodes granzyme A, which is involved in the lysis of target cells by lymphocytes.
The gene sets identified from the TWAS and PWAS did not overlap; however, after combining the two gene sets and entering them into STRING, I noted numerous functional interactions (Figure 5a). Furthermore, gene-ontology enrichment analysis indicated that the identified transcripts and proteins are enriched in biological processes related to lipid metabolism (Figure 5b).
MR analysis also highlighted interactions between several transcripts and proteins (Table S6). Higher expression of PSRC1 contributed to lower levels of LAG3 (B=-1.2, P=4.4e-39) and GRN (B=-1.1, P=3.6e-187). Higher expression of MAPKAPK5-AS1 contributed (B=-0.7, P=1.6e-17) to lower LAG3. Higher expression of C4orf3 (B=0.6, P=2.2e-118) contributed to lower level of fatty acid binding protein 2 (FABP2). A similar trend was observed for the association between RP11-33B1.1 and FABP2 (B=-0.4, P=2.3e-72).

Epigenome-Wide Association Study:

By examining the mQTL data for the discovery analysis, 4,887 CpG sites that shared at least one SNP with CAD (P<5e-8) were first identified. MR analysis of these sites identified 184 CpG sites that showed a significant impact on the risk of CAD. I then re-examined the contribution of these 184 sites to CAD using the data from the validation step. A total of 44 CpG sites that showed a significant impact on the risk of CAD were identified (Figure 6, Table S7). Although the validation dataset was from East Asia, the identified CpG sites showed a direction of effect concordant with the discovery results (Pearson’s r=0.97, P=2.2e-16, Figure 6). A cluster of CpG sites around ADAMTS7 on chromosome 15 showed the strongest impact on the risk of CAD (Figure 6). cg22753661 and cg12645284 were located in the second exon of ADAMTS7 and had a negative impact on the risk of CAD, whereas cg15571903 and cg00540400 upstream of this gene and cg00639195 in the first intron of this gene had a positive impact on the risk of CAD. This reflects the complex landscape of genomic regulation at this locus.
Several CpG sites displayed regulatory impact on biomarkers identified from the TWAS and PWAS steps (Table S6). cg00908766 upstream of PSRC1 impacted the expression of this gene (B=-0.25, P=1.45e-157), GRN protein level (B=-0.6, P=1.8e-78), and LAG3 protein level (B=-0.2, P=2.9e-28). Furthermore, as indicated in the previous section, higher expression of PSRC1 contributed to lower levels of GRN. Several CpG sites at the NT5C2 locus impacted the expression of this gene. NT5C2 regulates purine metabolism and energy balance and is linked to lipid metabolism and metabolic traits [9]. Another notable interaction was the impact of cg05552543 downstream of CFDP1 on the expression of this gene (B=-0.2, P=2.9e-28). Furthermore, higher expression of this gene was associated with lower levels of pancreatic lipase related protein 1, PNLIPRP1 (B=-1.2, P=4.4e-39). CFDP1 is implicated in chromatin remodeling [10] and is reported to be essential for cardiac development and function [11].

Discussion

In the post-GWAS era, one topic of research is to identify the functional elements through which a SNP exerts its impact on the endpoint phenotype. Identifying such elements is important both for developing biomarkers and for understanding the biology of the phenotype. With the availability of QTL data from large consortia and the development of statistical tools that rely on publicly available data, it is possible to minimize the use of resources by conducting in silico functional studies. Furthermore, with the evolution of biobank studies and the availability of their QTL data, it is possible to conduct cross-comparison studies in order to increase the likelihood of finding genuine biomarkers and to identify promising directions of research. Following these developments, the aim of this study was to search for functional elements contributing to the risk of CAD through a two-step discovery and validation analysis. Altogether, a total of 108 biomarkers, including 37 transcripts, 27 proteins, and 44 CpG sites, were identified that displayed a causal impact on the risk of CAD (Table S1).
By combining the transcript and protein biomarkers and conducting in silico functional analysis, I noted that the identified genes are overrepresented in biological processes related to coronary artery disease. Furthermore, Mendelian randomization underlined the presence of statistical interactions between several biomarkers. Notably, I found that GZMA, APOC1, and GRN regulate the level of PCSK9. These proteins could exert their impact through an immune path. All three proteins share immune functions and are known to promote the release of inflammatory cytokines [12,13,14] that could increase PCSK9 activation [15]. The influence of GZMA on CXCL13 could also be attributed to the release of cytokines such as TNF-α and IL-1β that are known to upregulate CXCL13 expression [16]. By performing conditional analysis, I found that SMARCA4 influences CAD primarily through lipid regulation. As a component of the SWI/SNF chromatin remodeling complex, SMARCA4 is known to influence the regulation of genes involved in lipid metabolism [17]. Furthermore, it has been shown that peroxisome proliferator-activated receptor gamma (PPARG) requires SWI/SNF enzymes, including SMARCA4, during adipogenesis [18]. The role of lipid traits in mediating the impact of PSRC1 on CAD is also in agreement with previous findings. It has been documented that overexpression of PSRC1 could inhibit the accumulation of cholesteryl ester in foam cells and alleviate the development of atherosclerosis, while PSRC1 deficiency could increase trimethylamine N-oxide production and impair cholesterol transport, thereby accelerating atherosclerosis [19]. The mechanism whereby blood pressure mediates the influence of FES on CAD is less clear. It has been reported that knocking down FES promoted the migration of monocytes and vascular smooth muscle cells [20]. Therefore, FES could influence CAD risk by regulating vascular inflammation and stiffness.
While the TWAS and PWAS results were functionally related (Figure 5), no individual genes were shared between the two sets. This lack of overlap stems in part from the high statistical burden of the study’s two-step validation process, which requires P < 5e-8 at each step and consequently decreases the likelihood of finding shared genes. For example, a lower level of the SWAP70 transcript contributed to a higher risk of CAD (Table S2). Similarly, its protein level was negatively associated with the risk of CAD in the deCODE study (B=-0.05, P=4.5e-10); however, it did not pass the genome-wide significance threshold in UK Biobank (B=-0.12, P=1.12e-7). Another explanation could lie in the nature of protein biomarkers. Previous findings indicate that mRNA–protein correlations are typically poor, indicating that protein levels fluctuate independently and substantially [21,22,23]. This is apparently because several processes, including post-translational regulation and protein degradation, influence the daily level of a protein. Another contributing factor could be differences in protein assay methods. A number of studies have used the Olink and SomaScan platforms to quantify protein levels in blood samples. The consensus from these studies is that the correlation of protein levels between platforms is modest [24,25,26,27]. In this study, while 53% of the transcripts identified in the discovery stage were validated, this ratio was 20% for the protein biomarkers.
The replication rate of CpG sites was also low. Of the 184 CpG sites identified at the discovery step, only 44 (24%) were replicated. This could be due to the overall lower heritability of CpG sites compared to transcripts and proteins. The mean narrow-sense heritability (h2) of the methylation level at a CpG site in blood is estimated to be 0.14 [28], which is lower than the values estimated for a transcript (h2=0.4) [29] and a protein (h2=0.4) [30]. Ancestry differences could also have an impact: the discovery-phase mQTL data were derived from European cohorts, while the validation-phase data were sourced from East Asian participants.
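The validation percentages quoted for the three omics layers can be reproduced with a few lines of Python. The sketch assumes that the denominators are the numbers of discovery-stage biomarkers that could be tested in the validation data (70 transcripts, 137 proteins, and 184 CpG sites, as reported in the Results); the variable and function names are illustrative.

```python
def validation_rate(validated, tested):
    """Fraction of tested discovery-stage biomarkers confirmed in validation."""
    return validated / tested


# (validated, tested in validation data), counts taken from the Results section.
counts = {
    "transcripts": (37, 70),
    "proteins": (27, 137),
    "CpG sites": (44, 184),
}

rates = {
    layer: round(100 * validation_rate(validated, tested))
    for layer, (validated, tested) in counts.items()
}
print(rates)
```

With these counts the rounded rates are 53%, 20%, and 24%, matching the figures given in the text.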
In this study, it was possible to investigate connections between proteins and identify those that share regulatory interactions because the authors of the pQTL studies provided access to their full summary statistics. It would be valuable if future QTL mapping studies also provided access to their full summary statistics. In addition to identifying regulatory interactions, such data will allow for more comprehensive studies, such as calculating heritability and genetic correlation and conducting cross-comparison studies. Compared to the other QTL datasets, the mQTL datasets had small sample sizes. Furthermore, it is documented that current methylation arrays cover a small fraction (< 4%) of the total CpG sites in the human genome (~28 million) [31]. Therefore, it is important to repeat multiomics studies as QTLs from studies with larger sample sizes and denser arrays become available. Another limitation of this study is that the QTLs used were derived from blood. Although blood is a proxy for many tissues, it would be ideal to repeat the analyses using QTL data from tissues that are more relevant to the biology of CAD.
In summary, by leveraging blood QTL data, a search was conducted for biomarkers contributing to the risk of CAD through a two-step discovery and validation analysis. Altogether, 108 biomarkers, including 44 CpG sites, 37 transcripts, and 27 proteins, were identified as contributing to the risk of CAD. Transcriptome analysis appeared to be more efficient in identifying biomarkers. The identified biomarkers shared functional interactions; further analyses provided insights into the paths whereby several of the identified biomarkers influence the risk of CAD.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org.

Funding

This research received no funding.

Institutional Review Board Statement

Not applicable.

Acknowledgments

This research work was enabled in part by computational resources and support provided by Compute Ontario and the Digital Research Alliance of Canada.
Declaration of generative AI and AI-assisted technologies in the manuscript preparation process: During the preparation of this work, the author used ChatGPT, Microsoft Copilot, and Google Gemini to investigate the function of genes. After using these services, the author reviewed and edited the content as needed and takes full responsibility for the content of the published article.

Conflicts of Interest

The author declares no competing interests.

References

  1. Aragam, K. G.; et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat. Genet. 2022, 54, 1803–1815.
  2. Võsa, U.; et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 2021, 53, 1300–1310.
  3. Xu, Y.; et al. An atlas of genetic scores to predict multi-omic traits. Nature 2023, 616, 123–131.
  4. Sun, B. B.; et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 2023, 622, 329–338.
  5. Ferkingstad, E.; et al. Large-scale integration of the plasma proteome with genetics and disease. Nat. Genet. 2021, 53, 1712–1721.
  6. Hatton, A. A.; et al. Genetic control of DNA methylation is largely shared across European and East Asian populations. Nat. Commun. 2024, 15, 2713.
  7. Zhu, Z.; et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat. Commun. 2018, 9, 224.
  8. Szklarczyk, D.; et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023, 51, D638–D646.
  9. Johanns, M.; et al. Genetic deletion of soluble 5′-nucleotidase II reduces body weight gain and insulin resistance induced by a high-fat diet. Mol. Genet. Metab. 2019, 126, 377–387.
  10. Messina, G.; et al. The human Cranio Facial Development Protein 1 (Cfdp1) gene encodes a protein required for the maintenance of higher-order chromatin organization. Sci. Rep. 2017, 7, 45022.
  11. Giardoglou, P.; Deloukas, P.; Dedoussis, G.; Beis, D. Cfdp1 Is Essential for Cardiac Development and Function. Cells 2023, 12, 1994.
  12. Wensink, A. C.; Hack, C. E.; Bovenschen, N. Granzymes Regulate Proinflammatory Cytokine Responses. J. Immunol. 2015, 194, 491–497.
  13. Saeedi-Boroujeni, A.; et al. Progranulin (PGRN) as a regulator of inflammation and a critical factor in the immunopathogenesis of cardiovascular diseases. J. Inflamm. 2023, 20, 1.
  14. Berbée, J. F. P.; et al. Apolipoprotein CI enhances the biological response to LPS via the CD14/TLR4 pathway by LPS-binding elements in both its N- and C-terminal helix. J. Lipid Res. 2010, 51, 1943–1952.
  15. Glerup, S.; Schulz, R.; Laufs, U.; Schlüter, K.-D. Physiological and therapeutic regulation of PCSK9 activity in cardiovascular disease. Basic Res. Cardiol. 2017, 112, 32.
  16. Lisignoli, G.; et al. IL1beta and TNFalpha differently modulate CXCL13 chemokine in stromal cells and osteoblasts isolated from osteoarthritis patients: evidence of changes associated to cell maturation. Exp. Gerontol. 2004, 39, 659–665.
  17. Barutcu, A. R.; et al. SMARCA4 regulates gene expression and higher-order chromatin structure in proliferating mammary epithelial cells. Genome Res. 2016, 26, 1188–1201.
  18. Salma, N.; Xiao, H.; Mueller, E.; Imbalzano, A. N. Temporal recruitment of transcription factors and SWI/SNF chromatin-remodeling enzymes during adipogenic induction of the peroxisome proliferator-activated receptor gamma nuclear hormone receptor. Mol. Cell. Biol. 2004, 24, 4651–4663.
  19. Li, Y.; et al. (Apo)Lipoprotein Profiling with Multi-Omics Analysis Identified Medium-HDL-Targeting PSRC1 with Therapeutic Potential for Coronary Artery Disease. Adv. Sci. 2025, 12, 2413491.
  20. Karamanavi, E.; et al. The FES Gene at the 15q26 Coronary-Artery-Disease Locus Inhibits Atherosclerosis. Circ. Res. 2022, 131, 1004–1017.
  21. Wegler, C.; et al. Global variability analysis of mRNA and protein concentrations across and within human tissues. NAR Genomics Bioinforma 2020, 2, lqz010.
  22. Vogel, C.; Marcotte, E. M. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 2012, 13, 227–232.
  23. Edfors, F.; et al. Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol. Syst. Biol. 2016, 12, 883.
  24. Wang, B.; et al. Comparative studies of 2168 plasma proteins measured by two affinity-based platforms in 4000 Chinese adults. Nat. Commun. 2025, 16, 1869.
  25. Raffield, L. M.; et al. Comparison of Proteomic Assessment Methods in Multiple Cohort Studies. PROTEOMICS 2020, 20, 1900278.
  26. Pietzner, M.; et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nat. Commun. 2021, 12, 6822.
  27. Rooney, M. R.; et al. Comparison of Proteomic Measurements Across Platforms in the Atherosclerosis Risk in Communities (ARIC) Study. Clin. Chem. 2023, 69, 68–79.
  28. Villicaña, S.; et al. Genetic impacts on DNA methylation help elucidate regulatory genomic processes. Genome Biol. 2023, 24, 176.
  29. Kanchibhotla, S. C.; et al. Heritability of Gene Expression Measured from Peripheral Blood in Older Adults. Genes 2024, 15, 495.
  30. Drouard, G.; et al. Twin Study Provides Heritability Estimates for 2321 Plasma Proteins and Assesses Missing SNP Heritability. J. Proteome Res. 24, 2689–2697.
  31. Christiansen, S. N.; et al. Reproducibility of the Infinium methylationEPIC BeadChip assay using low DNA amounts. Epigenetics 17, 1636–1645.
Figure 1. Analysis pipeline used to search for biomarkers contributing to the risk of CAD and characteristics of the QTL data. A two-step discovery and validation procedure was used to identify biomarkers contributing to the risk of CAD. QTL data from independent sources were obtained and their contribution to the risk of CAD was investigated using Mendelian randomization. The names of the studies used for discovery and validation are shown in dark red and dark teal, respectively. The resulting biomarkers were subjected to functional analysis to investigate their enrichment in biological processes and identify the interactions between them. Detailed characteristics of the QTL data, including the study information and the number of biomarkers analyzed, are available in Table S1.
Figure 2. The influence on CAD of the 37 genes identified from the TWAS. Mendelian randomization was used to examine the association with CAD of eQTLs from the eQTLGen study (discovery panel) and the INTERVAL study (validation panel). For each gene indicated on the y-axis, the circle and the bar represent the effect size (beta coefficient) and its standard error, respectively. Detailed summary association statistics are available in Table S2.
Figure 3. The influence on CAD of the 27 proteins identified from the PWAS. Mendelian randomization was used to examine the association with CAD of pQTLs from the UKB-PPP (discovery panel) and the deCODE study (validation panel). For each protein indicated on the y-axis, the circle and the bar represent the effect size (beta coefficient) and its standard error, respectively. Detailed summary association statistics are available in Table S4.
Figure 4. Proteins that share functional interactions. Mendelian randomization was used to identify protein pairs that share regulatory interactions in the UKB-PPP dataset. An edge ending in an arrow indicates that as the level of the source protein increases, the level of the target protein also increases, whereas an edge ending in an oval indicates an inverse association. Statistical values on the edges describe the associations. Complete statistical details based on the UKB-PPP and deCODE studies are available in Table S5.
Figure 5. Functional analysis based on genes identified from the transcriptome and proteome analyses. Genes identified from the TWAS (orange nodes) and the PWAS (cyan nodes) were entered into the STRING online tool for functional analysis. a) The analysis revealed numerous functional interactions. The P-value indicates that the network has significantly more interactions (P=5.8e-13) than the background (an equally sized gene set randomly drawn from the genome). b) GO-term enrichment analysis indicated that the identified genes are enriched in biological processes related to lipid metabolism. ID is the GO-BP identifier in the Gene Ontology database. Count In Network is the number of proteins in the gene list that are annotated with a particular term (first number) compared with the number of genes in the genome that are assigned to the term (second number). Strength describes how large the enrichment effect is. It is based on the ratio of the number of proteins in the network that are annotated with the term to the number of proteins expected to have the same annotation in a random network of the same size; STRING reports this ratio on a log10 scale. Q-values (Q) were computed by adjusting P-values with the Benjamini and Hochberg approach for controlling the false discovery rate.
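The two enrichment statistics described in the Figure 5 caption can be illustrated with a minimal sketch. It assumes that Strength is the log10 of the observed-to-expected ratio (the scale on which STRING reports it) and that Q-values follow the standard Benjamini and Hochberg step-up adjustment. The function names and all numbers below are hypothetical illustrations, not values or code from this study.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def enrichment_strength(observed, expected):
    """log10 of annotated proteins observed vs. expected by chance."""
    return math.log10(observed / expected)


# Hypothetical P-values for five GO terms (not taken from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)

# Hypothetical term: 10 annotated proteins observed, 1 expected.
strength = enrichment_strength(10, 1)
```

In this sketch a Strength of 1 corresponds to a tenfold excess of annotated proteins over the random expectation, and terms with equal adjusted values (the third and fourth here) arise from the monotonicity step of the adjustment.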
Figure 6. The influence on CAD of the 44 CpG sites identified from the EWAS. Mendelian randomization was used to examine the association with CAD of mQTLs from the discovery panel and the validation panel. For each CpG site indicated on the y-axis, the circle and the bar represent the effect size (beta coefficient) and its standard error, respectively. Detailed summary association statistics are available in Table S7.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.