Preprint
Article

This version is not peer-reviewed.

A ß-Hydroxybutyrylation–FOXM1/CENPA Axis Links Ketone-Body Metabolism to Mitotic Transcription in Triple-Negative Breast Cancer

Submitted:

20 June 2026

Posted:

22 June 2026

You are already at the latest version

Abstract
Lysine β-hydroxybutyrylation (Kbhb) is a metabolite-derived post-translational modification of histone and non-histone proteins that couples β-hydroxybutyrate (BHB) availability to gene expression. Yet the transcription factors that govern the Kbhb substrate program in cancer remain unidentified. Existing studies have cataloged Kbhb-modified substrates or examined individual proteins, without identifying the transcriptional regulators of the program in a defined tumor context. Here, we performed network-based Master Regulator Analysis (MRA), implemented in the viper package, on a molecular signature restricted to experimentally validated Kbhb substrates, across two independent Basal-like breast cancer cohorts profiled on orthogonal platforms: TCGA-BRCA (RNA-seq; n = 195 tumor, 113 normal) and METABRIC (microarray; n = 209 tumor, 148 normal). Dataset-specific regulatory networks were inferred with ARACNe-AP and integrated by cross-platform Stouffer meta-analysis. Of 1,493 Kbhb substrates, 1,322 and 1,213 were expressed in the respective cohorts. The analysis identified seven concordant transcriptional master regulators (six activated, one repressed; cross-cohort NES correlation r = 0.64), with CENPA (meta-NES +4.55) and FOXM1 (meta-NES +4.27) as the dominant drivers. These findings nominate a BHB–Kbhb–FOXM1/CENPA axis linking ketone-body metabolism to mitotic transcription, with potentially protumoral implications for ketogenic regimens in triple-negative breast cancer.
Keywords: 
;  ;  ;  ;  

1. Introduction

Breast cancer remains the most frequently diagnosed malignancy among women, and within it, the triple-negative subtype (TNBC), defined by the absence of estrogen receptor, progesterone receptor, and HER2 expression, carries the poorest prognosis and the fewest targeted treatment options [1]. TNBC, which largely corresponds to the PAM50 Basal-like molecular class, is characterized by a high propensity for early metastasis and a profoundly rewired transcriptional and metabolic state, with dysregulated lipid metabolism increasingly recognized as a hallmark of the subtype [2]. Because the aggressive phenotype of Basal-like tumors is sustained by coordinated, large-scale transcriptional reprogramming rather than by a single driver lesion, the central biological question is not merely which genes are deregulated but which transcription factors orchestrate their deregulation. Identifying the master regulators that govern these programs is therefore a prerequisite for understanding—and ultimately interrupting—the transcriptional logic of the Basal-like phenotype.
One metabolic input now known to act directly on the transcriptional machinery is the ketone body β-hydroxybutyrate (BHB). Produced endogenously during fasting, prolonged exercise, and ketogenic dietary regimens, BHB was long regarded primarily as an energy carrier delivered to peripheral organs, yet it is now appreciated to be far more than a metabolite: it functions as an endogenous inhibitor of histone deacetylases, as a ligand for cell-surface receptors, and as a substrate that links cellular energy state to gene expression [3,4]. A defining example of this signaling capacity is lysine β-hydroxybutyrylation (Kbhb), a post-translational modification in which a β-hydroxybutyryl group is covalently attached to lysine residues [5]. On histones, Kbhb behaves as an activating mark—installed by the acetyltransferase p300 and removed by HDAC1/2—that decorates the promoters of transcriptionally active genes [5]. Its reach extends well beyond chromatin: proteome-wide mapping identified 1493 Kbhb substrates, with marked enrichment in the spliceosome, DNA repair, and chromatin-remodeling machinery [6]. A recently described non-canonical route, in which BHB-derived acetoacetate is converted to cytosolic acetyl-CoA, supplies the acyl donor that p300 uses to deposit these marks and supports tumor growth in vivo, closing the biochemical loop between circulating BHB and the Kbhb program [7].
Converging evidence now implicates Kbhb as a pro-tumoral modification across several cancer types. In TNBC, elevated ketogenesis promotes metastasis through Kbhb of calpastatin, which relieves calpain inhibition and drives epithelial-to-mesenchymal transition [2]. In pancreatic cancer, BHB-driven Kbhb of the transcription factor Snail blocks its ubiquitin-mediated degradation and accelerates metastasis, with circulating BHB clinically associated with disease progression [8]; BHB has likewise been reported to sustain colorectal cancer through metabolic plasticity and apoptosis resistance [9]. Together with the emergence of the writer enzymes p300 and CBP as candidate anticancer drug targets [10], these findings have raised concern that ketogenic diets—often proposed as cancer adjuvants yet supported by inconsistent and rarely subtype-stratified evidence—could be counterproductive in certain settings, including TNBC [1]. Despite this momentum, existing studies have characterized Kbhb either by cataloging modified substrates and testing them for differential expression and pathway enrichment, or by dissecting the modification of individual proteins. None has asked which transcription factors govern expression of the Kbhb substrate program as a whole within a defined tumor context—a question that demands network inference and master regulator analysis rather than substrate cataloging or differential-expression enumeration.
Here, we address that gap by performing network-based Master Regulator Analysis (MRA), implemented in the viper package [11], on a molecular signature deliberately restricted to the experimentally validated Kbhb substrates of Huang et al. (2021). Restricting the signature to the substrates of the modification—rather than to the full tumor-versus-normal expression profile—focuses the analysis on the regulators that specifically govern the Kbhb program instead of the generic regulators of the malignant phenotype, an approach inherited and extended from earlier signature-restricted MRA in breast cancer [12]. We applied this strategy to the Basal-like subtype in two independent cohorts profiled on orthogonal platforms—TCGA-BRCA (RNA-seq; n = 195 tumor, 113 normal) and METABRIC (microarray; n = 209 tumor, 148 normal)—inferring dataset-specific regulatory networks with ARACNe-AP [13] and integrating the cohorts by cross-platform meta-analysis. Of the 1493 substrates, 1322 and 1213 were expressed in the TCGA and METABRIC Basal cohorts, respectively. This analysis identified seven concordant transcriptional master regulators (six activated and one repressed; cross-cohort NES correlation r = 0.64 over 75 common regulators), with CENPA (meta-NES +4.55) and FOXM1 (meta-NES +4.27) as the dominant drivers. These results nominate a BHB–Kbhb–FOXM1/CENPA axis that couples ketone-body metabolism to mitotic transcription, and suggest that ketogenic, BHB-raising regimens could paradoxically reinforce tumorigenic transcription in Basal-like breast cancer.

2. Results

2.1. The Kbhb Proteome Is Enriched for Transcriptional and Cell-Cycle Regulators in Basal BRCA

To define the transcriptional landscape of lysine β-hydroxybutyrylation (Kbhb) in breast cancer, we compiled a gene set of 1493 experimentally validated Kbhb substrates from the proteomic study by Huang et al. [6], which identified 3248 unique Kbhb sites in 1397 human proteins using immunoaffinity enrichment and LC-MS/MS after β-hydroxybutyrate (BHB) treatment of HEK293 cells.
Gene Ontology analysis of this set revealed significant enrichment for proteins involved in DNA replication, chromosome organization, cell-cycle regulation, and transcriptional control—precisely the protein classes that localize to chromatin and are subject to histone mark-mediated regulation. Of the 1493 Kbhb substrates, 1322 were expressed in the TCGA Basal cohort (log2(TPM + 1) > 1 in ≥20% of Basal tumors) and 1213 in the METABRIC [14] Basal cohort (log2 intensity > 6 in ≥20% of Basal samples), establishing a robust, platform-specific gene set for downstream analyses.

2.2. ARACNe-AP Networks Capture the Basal-like Regulatory Landscape

Dataset-specific transcriptional regulatory networks were inferred from PAM50 Basal-like tumor expression matrices using ARACNe-AP [13,15] with the 1639 human transcription factors (TFs) defined by Lambert et al. [16] as candidate regulators (100 bootstraps, mutual information p < 1 × 10−8). The TCGA Basal network (n = 195 tumor samples) comprised 183,950 regulatory edges; the METABRIC Basal network (n = 209 tumor samples) comprised 110,780 edges. The use of two independent datasets—RNA-seq (TCGA) and Illumina HT-12 microarray (METABRIC)—enabled cross-platform validation of all downstream findings.

2.3. CENPA and FOXM1 Are the Top Transcriptional Master Regulators of the Kbhb Programme in TCGA

Master Regulator Analysis (MRA) [17,18] was performed using the msVIPER [11] algorithm with a molecular signature restricted to the expressed Kbhb substrates—rather than the full tumor-versus-normal differential expression profile—to identify TFs that preferentially regulate the Kbhb transcriptional program rather than general oncogenic differences [12]. The t-statistic-based signature was computed by comparing Basal tumor samples (n = 195) with solid-tissue normal samples (n = 113).
In the TCGA discovery cohort, MYBL2 emerged as the top master regulator (NES = +3.08), followed by CENPA (NES = +2.97), FOXM1 (NES = +2.70), CBX2 (NES = +2.51), and E2F1 (NES = +2.47). HMGA1 (NES = +2.38) and CHCHD3 (NES = +2.33) were also significant. No significantly repressed regulators were identified in this cohort. In total, 13 regulators reached FDR < 0.05 (Supplementary Figure S1A). Shadow analysis [18] revealed a convergent regulatory hierarchy: CENPA, FOXM1, and MYBL2 each independently explain part of the inferred activity of FOXK2, while CBX2 activity is partially explained by MYBL2, suggesting that CENPA and MYBL2 co-occupy the hierarchical apex of the Kbhb regulatory landscape in TCGA (Supplementary Figure S1C).

2.4. Independent Validation in METABRIC Confirms the Core TMR Set

MRA was repeated in the METABRIC validation cohort using the ComBat-corrected microarray expression matrix and the independently inferred METABRIC Basal network. CENPA (NES = +3.46) and FOXM1 (NES = +3.33) again ranked first and second among activated regulators, while BNC2 (NES = −3.12), VEZF1 (NES = −3.03), KLF9 (NES = −2.88), and MZF1 (NES = −2.76) were the most strongly repressed. In total, 10 regulators reached FDR < 0.05 in METABRIC (Supplementary Figure S1B). Shadow analysis in this cohort identified CENPA as the primary hierarchical regulator, partially explaining the inferred activity of DNMT1, E2F3, E2F7, and MXD4, while FOXM1 partially explains XBP1 activity (Supplementary Figure S1D). The direction of all seven regulators identified in the cross-platform meta-analysis was concordant across cohorts (NES correlation TCGA vs. METABRIC: r = 0.64). This cross-platform concordance demonstrates that the identified TMRs reflect a robust biological program rather than platform-specific noise.

2.5. Meta-Analysis Identifies Seven Concordant TMRs; CENPA Is the Hierarchical Master

To integrate MRA results across both cohorts, we applied the Stouffer method to the 75 regulators detected in both datasets, computing a meta-NES as Zmeta = (ZTCGA + ZMETABRIC)/√2, followed by Benjamini–Hochberg multiple-testing correction. Seven regulators reached FDR < 0.05: six activated (CENPA, FOXM1, HMGA1, CHCHD3, E2F7, and ZNF232) and one repressed (VEZF1) (Table 1, Figure 1). CENPA achieved the highest meta-NES (+4.55, FDR = 0.0004), confirming its role as the primary transcriptional master regulator of the Kbhb program in Basal-like breast cancer.

2.6. Per-Sample VIPER Activity Reveals Coherent Activation of the Kbhb Program Across Basal Tumors

To assess whether the identified TMRs are consistently active across individual tumors—rather than driven by a subset of samples—we estimated per-sample TF activity using single-sample VIPER, independently in TCGA and METABRIC, with their respective regulons. Activity scores were z-normalized per TF within each cohort to account for scale differences between RNA-seq and microarray platforms, and then combined into a single TF × sample matrix.
The resulting heatmap (Figure 1C) reveals a stratified activation pattern across the seven significant TMRs. CENPA and FOXM1 show broad, consistent activation across most Basal tumor samples in both cohorts, concordant with their highest meta-NES values. HMGA1 displays a similar but more moderate gradient, with activation evident across most samples but greater inter-tumoral variability. CHCHD3, E2F7, and ZNF232, which carry lower meta-NES values, show a more heterogeneous pattern, as expected from their weaker aggregate signal. VEZF1 presents the inverse profile, with consistently low or negative activity across both cohorts, concordant with its repressed status. Notably, within each cohort, samples were ordered by the mean activity of the activated TMRs, and the resulting left-to-right gradient confirms that this ordering captures a biologically meaningful axis of Kbhb program activity. Taken together, these results indicate that the Kbhb regulatory program is a characteristic—albeit heterogeneous—feature of the Basal-like subtype, with CENPA and FOXM1 as its most robust and consistent drivers.

2.7. The Kbhb Transcriptional Program Is Differentially Expressed in Basal Tumors Across Both Platforms

To characterize the transcriptional state of Kbhb substrates in Basal BRCA, we performed differential expression (DE) analysis comparing Basal tumors to normal tissue in each cohort independently: DESeq2 [36] on raw STAR counts for TCGA (n = 195 tumors, n = 113 normal), and limma [37] on ComBat-corrected [38] log2 microarray intensities for METABRIC (n = 209 tumors, n = 148 normal). A uniform significance criterion (adjusted p < 0.05 and |log2FC| ≥ 0.5) was applied across both platforms to account for the compressed dynamic range of Illumina microarray data relative to RNA-seq.
Of the 1357 Kbhb genes tested in at least one cohort, 673 showed significant DE in at least one platform. Of these, 155 were concordantly upregulated and 22 concordantly downregulated across TCGA and METABRIC, defining a core Kbhb DE program (Figure 2, Figure S1). The larger number of TCGA-only DE genes (476 total: 376 up, 99 down) versus METABRIC-only (≈21) is expected given the greater statistical power of RNA-seq over microarray. The global correlation of log2FC values across platforms was r = 0.76 (Pearson, n = 1201 genes with results in both cohorts), confirming broad cross-platform reproducibility.
Among the 7 significant TMRs, all showed evidence of DE in at least one dataset. Notably, CENPA showed the highest statistical significance in TCGA (−log10(FDR) ≈ 300), well above FOXM1 (≈200), consistent with its role as the hierarchical master regulator. HMGA1 was the only TMR to qualify as a concordantly upregulated Kbhb substrate—i.e., it is both a regulator and a regulated target of the Kbhb program.

2.8. Volcano Plots and Expression Heatmaps Confirm Broad Upregulation of Kbhb Substrates in Tumors

Volcano plots (Figure S2A,B) show the distribution of DE results across all Kbhb genes and the superimposed positions of the seven significant TMRs (highlighted in orange). In both cohorts, Kbhb substrates show a net upregulation bias in Basal tumors: in TCGA, 530 Kbhb genes are significantly upregulated versus 122 downregulated; in METABRIC, 168 versus 29. The seven TMRs are visible in the upper region of the volcano plots, confirming their strong and significant differential expression.
Hierarchical clustering of all significantly DE Kbhb genes (TCGA: n = 1130; METABRIC: n = 913) using (1 − Pearson r) distance and Ward D2 linkage (Figure S2C,D) reveals a coherent transcriptional pattern: tumor and normal samples segregate into distinct clusters on both platforms, and the dominant Kbhb gene cluster shows elevated expression in tumor, further supporting coordinate upregulation of the Kbhb program in Basal BRCA.

2.9. ORA of TMR Regulons Reveals Convergence on Pathways for the Mitotic Cell Cycle and Chromosome Segregation

To characterize the biological functions regulated by each TMR, we performed over-representation analysis (ORA) of their transcriptional regulons (TCGA Basal ARACNe-AP network) against two complementary pathway databases: Gene Ontology Biological Process (GO-BP) and Reactome. GO-BP results were simplified to remove semantic redundancy (similarity cutoff = 0.6). A shared universe of all protein-coding genes in the TCGA Basal expression matrix served as the background.
The ORA dotplot (Figure 3) shows a striking convergence of the six activated TMR regulons on mitotic cell-cycle pathways. In Reactome, the most widely shared pathways across TMRs include M Phase, Mitotic Metaphase and Anaphase, Separation of Sister Chromatids, G2/M Transition, and Cell Cycle Checkpoints—with CENPA and FOXM1 showing the most significant enrichment (lowest adjusted p-values, deepest red in Figure 3). In GO-BP, chromosome segregation, nuclear division, DNA replication, and kinetochore assembly dominate the shared landscape.
Notably, the regulons of CENPA, FOXM1, HMGA1, and CHCHD3 are also enriched for RNA processing terms (mRNA splicing, RNA localization, ribonucleoprotein complex biogenesis), indicating a secondary layer of post-transcriptional regulation within the Kbhb program. VEZF1, the sole repressed TMR, is enriched exclusively in two GO-BP terms (positive regulation of gluconeogenesis; cell surface receptor protein serine/threonine kinase signaling), with no significant Reactome pathway enrichment at FDR < 0.05, consistent with its restricted role in endothelial/vascular identity.

2.10. The TMR Regulons Preferentially Target Concordantly Upregulated Kbhb Genes

To integrate network topology with differential expression, we visualized the ARACNe-AP-supported regulatory edges linking the seven TMRs to their Kbhb DE target genes. All edges present in the TCGA Basal network between a TMR and a Kbhb-DE gene were retained, yielding 436 interactions across 436 unique gene–TMR pairs.
The chord diagram (Figure 4) illustrates the architecture of this regulatory network: CENPA (85 targets), E2F7 (89), and FOXM1 (83) have the largest regulons among Kbhb DE genes, followed by CHCHD3 (78), HMGA1 (43), and VEZF1 (42), while ZNF232 has the most restricted overlap (6 targets). Concordantly upregulated genes constitute the dominant category across all six activated TMRs, consistent with coherent transcriptional activation of Kbhb substrates in Basal tumors. A subset of genes is co-regulated by multiple TMRs: genes targeted by three or more TMRs are labeled in the outer track and represent candidate central nodes of the Kbhb regulatory network.
The Sankey diagram (Figure S3) quantifies the distribution of ARACNe-supported interactions across TMR and DE categories, confirming that CENPA, FOXM1, E2F7, and CHCHD3 each contribute a substantial proportion of regulatory connections to the concordantly upregulated Kbhb gene set.

3. Discussion

The transcriptional consequences of lysine β-hydroxybutyrylation in cancer have so far been examined almost exclusively by cataloging differentially expressed Kbhb substrates and testing them for pathway enrichment, an approach that identifies which genes carry the mark but leaves unanswered the question of who controls their expression. By applying network-based master regulator analysis to a molecular signature restricted to experimentally validated Kbhb substrates [12] this study reframes that question and identifies the transcriptional controllers upstream of the Kbhb program rather than its downstream members. The central result is reproducible across two orthogonal platforms: CENPA and FOXM1 are the dominant transcriptional master regulators of the Kbhb program in Basal-like breast cancer, recovered as the two highest-scoring activated regulators in both the TCGA RNA-seq discovery cohort and the METABRIC microarray validation cohort.
CENPA emerged not only as the strongest regulator (meta-NES = +4.55) but also as the hierarchical apex of the inferred network. As a centromeric histone H3 variant, CENPA defines centromeric chromatin epigenetically and is essential for kinetochore assembly and faithful chromosome segregation; its expression peaks at G2/M and is transcriptionally controlled in part by FOXM1 [19]. CENPA is overexpressed in most human cancers, and its levels correlate with tumor aggressiveness, invasiveness, and metastasis in breast cancer [19], while its forced overexpression is causally linked to aneuploidy and genomic instability [20]. Consistent with an organizing role, shadow analysis in METABRIC placed CENPA above DNMT1, E2F3, E2F7, and MXD4, indicating that it partially accounts for the inferred activity of several other regulators. This hierarchical position must, however, be read in light of CENPA’s atypical classification as a transcription factor: it is cataloged as a low-specificity DNA-binding protein on the basis of the centromeric nucleosome structure rather than a sequence-specific motif [16]. Its ARACNe-derived regulon therefore most plausibly reflects co-regulation through chromatin architecture rather than canonical, motif-driven transcription, a distinction that is mechanistically important but does not diminish the coherence of the signal.
FOXM1 serves as the canonical counterpart to this chromatin-centric regulator and anchors the mechanistic axis proposed here. It is a master regulator of G2/M progression and the transcription factor most strongly associated with poor survival in breast cancer, particularly in TNBC, while being nearly absent from normal adult tissues [21]. Critically, FOXM1 targets such as CENPA, AURKA, and BUB1 are themselves Kbhb substrates in the Huang proteome [6], and FOXM1 directly transactivates CENPA to drive proliferation, migration, and glycolysis in TNBC cell lines [22], co-upregulating centrosome amplification and clustering genes in this subtype [23]. Because histone Kbhb is reversibly regulated by p300 and HDAC1/2, and H3K9bhb has been linked to active promoter-associated transcription through reader-mediated mechanisms [6,40,41], these observations converge on a coherent feedforward model: elevated BHB raises Kbhb on the chromatin of FOXM1 and CENPA target genes, reinforcing transcription of the mitotic and centromeric machinery that FOXM1 and CENPA already drive. The clinical corollary is direct and cautionary. Ketogenic diets, which raise circulating BHB and are being explored as cancer adjuvants, could paradoxically feed this BHB-Kbhb-FOXM1/CENPA axis and act as a protumoral stimulus in Basal/TNBC, a concern reinforced by the precedent that BHB accumulation promotes hepatocellular carcinoma through H3K9bhb [41] (Figure 5). This possibility warrants preclinical testing before dietary BHB elevation is considered in this subtype.
The remaining regulators enrich rather than complicate this picture. HMGA1 is unique in being simultaneously an activated TMR and a concordantly upregulated Kbhb substrate: a chromatin architectural protein that drives stem-like, mesenchymal, and metastatic programs in TNBC [24,25]. Its dual status suggests a self-reinforcing node, in which Kbhb modification of HMGA1 and HMGA1-driven transcription of additional Kbhb substrates sustain the program. CHCHD3 is the most unexpected finding: a MICOS-complex inner-membrane protein governing cristae architecture and metabolic homeostasis [27,28], whose appearance as a regulator is best read as a candidate mitochondria-to-nucleus retrograde link consistent with the mitochondrial origin of BHB, although its TF status rests on a single promoter-binding study and demands validation. VEZF1, the sole repressed TMR, is a vascular endothelial factor whose loss aligns with the dedifferentiated phenotype of Basal tumors; given its genome-wide role in modulating Pol II elongation and splicing [32], its repression may relax elongation constraints on Kbhb-regulated RNA-processing genes, a notion compatible with the secondary enrichment of mRNA splicing and ribonucleoprotein biogenesis terms across the CENPA, FOXM1, HMGA1, and CHCHD3 regulons. E2F7 and ZNF232 are best framed as hypotheses: the apparent activation of the atypical repressor E2F7 likely reflects context-dependent co-regulation in the TP53- and RB-altered Basal background and is partially shadowed by CENPA [29,30], whereas the poorly characterized zinc-finger ZNF232 constitutes a genuinely novel prediction requiring experimental follow-up [31].
Several limitations temper these conclusions. The Kbhb gene set derives from HEK293 cells treated with exogenous BHB [6], and tissue-specific substrates in breast epithelium may differ. ARACNe-AP infers co-expression rather than direct binding, so the regulons, especially those of CENPA and CHCHD3, whose DNA-binding evidence is atypical [16], await ChIP-seq or CUT&RUN confirmation. The analysis is observational, leaving the BHB-Kbhb-TMR-phenotype chain to be established by experimental perturbation. The compressed dynamic range of the METABRIC microarray likely underestimates the Kbhb program. Within these bounds, the genes co-regulated by three or more TMRs represent the highest-confidence core of the network and the most compelling targets for the functional and single-cell studies that should follow.

4. Materials and Methods

The overall analytical workflow is summarized in Figure 6. Briefly, a gene set of experimentally validated Kbhb substrates was derived from a published proteomic dataset [6] and served as the basis for transcriptional master regulator analysis in two independent cohorts of Basal-like breast cancer: TCGA-BRCA (RNA-seq) and METABRIC (Illumina HT-12 microarray). Dataset-specific transcriptional regulatory networks were inferred with ARACNe-AP, and master regulator analysis was performed with msVIPER using a Kbhb-restricted molecular signature. Results from both cohorts were integrated via Stouffer meta-analysis with Benjamini–Hochberg correction to identify transcription factors that consistently regulate the Kbhb transcriptional program across platforms.

4.1. Kbhb Gene Set

The Kbhb gene set was derived from the proteomic study by Huang et al. (2021), which identified 3248 unique Kbhb sites in 1397 human proteins using immunoaffinity enrichment and liquid chromatography–tandem mass spectrometry (LC-MS/MS) in HEK293 cells treated with 10 mM β-hydroxybutyrate (BHB) for 24 h. Sites were filtered to a false discovery rate (FDR) < 1%, MaxQuant score ≥ 40, and phosphoRS localization probability ≥ 0.75. After mapping to HGNC gene symbols and removing ambiguous entries, the final gene set comprised 1493 unique gene symbols (hereafter referred to as the Kbhb proteome).

4.2. Patient Cohorts and Gene Expression Data

4.2.1. TCGA-BRCA (Discovery)

RNA-seq STAR Counts data for 1231 breast cancer samples were downloaded from the Genomic Data Commons (GDC) portal using the Bioconductor package TCGAbiolinks [43]. Raw counts and transcripts per million (TPM) estimates were retrieved as a SummarizedExperiment object. Clinical annotations, including PAM50 molecular subtype, were obtained from the TCGA Pan-Cancer Clinical Data Resource. The analysis was restricted to PAM50 Basal-like primary tumors (n = 195) and solid tissue normal samples (n = 113). Only protein-coding genes were retained, and duplicate Ensembl IDs mapping to the same HGNC symbol were resolved by taking the column-wise median.

4.2.2. METABRIC (Validation)

The METABRIC dataset was downloaded from cBioPortal [44] (brca_metabric; n = 2509). Illumina HT-12 microarray expression data (log2-transformed) and clinical annotations were obtained. Samples were filtered to those annotated as Basal in the CLAUDIN_SUBTYPE field (n = 209) and those annotated as Normal (n = 148). Genes with multiple probes were collapsed using the column-wise median.

4.3. Data Pre-Processing and Normalization

TCGA RNA-seq data were normalized using TPM and a log2(TPM + 1) transformation. No additional batch correction was applied because all samples were processed through the GDC harmonized pipeline. Genes with log2(TPM + 1) > 1 in at least 20% of Basal tumor samples were retained, yielding 14,488 protein-coding genes.
METABRIC microarray data were already provided on a log2 scale. Basal and Normal samples were first filtered for expression (log2 intensity > 6 in at least 20% of Basal samples), yielding 11,976 genes. The filtered matrix was then batch-corrected using ComBat from the Bioconductor sva package [38], with cohort identity (five contributing centers) as the batch variable and tumor/normal group membership as a covariate in the model matrix. Correction was applied jointly to Basal and Normal samples before network inference.

4.4. Transcriptional Network Inference

Dataset-specific transcriptional regulatory networks were inferred using ARACNe-AP, applied separately to the TCGA Basal (n = 195) and METABRIC Basal (n = 209) expression matrices. The human transcription factor list from Lambert et al. (n = 1639 TFs) was used to define candidate regulators. Network inference was performed with a mutual information p-value threshold of 1 × 10−8 (DPI tolerance = 0), 100 bootstrap iterations, and a fixed random seed. Bootstrap networks were consolidated using ARACNe-AP’s built-in consolidation step. The final TCGA network comprised 183,950 edges; the METABRIC network comprised 110,780 edges.

4.5. Transcriptional Master Regulator Analysis

Master Regulator Analysis (MRA) was performed using the msVIPER algorithm implemented in the Bioconductor viper package [11]. For each dataset, the ARACNe-AP network was converted into a regulon object using the aracne2regulon function.
The molecular signature was derived by computing two-sample t-statistics for each Kbhb gene expressed in the respective dataset (TCGA: 1322 genes; METABRIC: 1213 genes), comparing Basal tumor samples with matched normal tissue using the rowTtest function. This signature was restricted to Kbhb substrates—rather than the full differential expression profile—to identify transcription factors that preferentially regulate the Kbhb transcriptional program rather than general tumor-versus-normal differences [12]. A null model was generated from 1000 random gene permutations (sampling with replacement, ttestNull, seed = 1). Regulators with fewer than 25 targets in the signature were excluded. Shadow analysis [18] was performed to identify regulators whose activity is explained by another regulator (shadow, minsize = 25).

4.6. Meta-Analysis Across Datasets

To integrate MRA results from TCGA and METABRIC, regulators detected in both datasets (n = 75 in common) were combined using the Stouffer method [45,46]. Because NES values from msVIPER approximate standard normal distributions under the null hypothesis, the meta-NES was computed using the Stouffer method [45,46] as:
Z meta = Z TCGA   +   Z METABRIC 2
Meta p-values were derived from the standard normal distribution (p = 2 · Φ(−|Zmeta|)) and were adjusted for multiple comparisons using the Benjamini–Hochberg procedure. Regulators with FDR < 0.05 were considered significant.

4.7. Per-Sample TF Activity and Visualization

Per-sample transcription factor activity was estimated using the single-sample VIPER algorithm, applied independently to the TCGA and METABRIC expression matrices with their respective regulons (minimum regulon size = 25). Because TCGA (RNA-seq) and METABRIC (microarray) operate on different scales, activity scores were z-score normalized per TF across samples within each cohort before combining. The normalized matrices were then concatenated into a single TF × sample activity matrix for joint visualization.

4.8. Software and Reproducibility

Bioinformatic analyses were performed in R across two computing environments. Primary analyses—data acquisition, pre-processing, network inference, msVIPER, Stouffer meta-analysis, and differential expression (DESeq2/limma)—were executed in R version 4.5.2 on a Linux server (2 × Intel Xeon Platinum 8368, 152 cores, 1 TiB RAM). Over-representation analysis, expression heatmaps, and regulatory network diagrams (circos and Sankey) were generated locally in R version 4.6.0. Key package versions—server: TCGAbiolinks 2.38.0 [43], SummarizedExperiment 1.40.0 [47], sva 3.58.0 [38], viper 1.44.0 [11], DESeq2 1.50.2 [36], limma 3.66.0 [37], clusterProfiler 4.18.4 [48], ggplot2 4.0.3 [33], ggrepel 0.9.8 [34], patchwork 1.3.2 [35], ggraph 2.2.2 [49], tidygraph 1.3.1 [50]; local (R 4.6.0): clusterProfiler 4.20.0 [48], ReactomePA 1.56.0 [51], org.Hs.eg.db 3.23.1 [52], pheatmap 1.0.13 [53]. ARACNe-AP was run as a standalone Java application on the Linux server. Analysis scripts are available at [https://github.com/hachepunto/kbhb_mra].

4.9. Differential Gene Expression Analysis

4.9.1. TCGA-BRCA

Raw STAR unstranded counts were extracted from the SummarizedExperiment object. Duplicate HGNC symbols were resolved by computing the column-wise median across all rows mapping to the same symbol and rounding to the nearest integer (DESeq2 requirement), consistent with the TPM-based deduplication in Section 4.3. Samples were restricted to PAM50 Basal-like primary tumors (n = 195) and solid tissue normal samples (n = 113). Genes with ≥10 counts in at least 20% of samples were retained before modeling. Differential expression was estimated using DESeq2 with the design formula ~condition (Tumor vs. Normal). Wald test p-values were adjusted for multiple comparisons using the Benjamini–Hochberg procedure (α = 0.05).

4.9.2. METABRIC

Differential expression was performed directly on the ComBat-corrected log2 microarray intensities (Section 4.3) using the limma package. A design matrix was constructed with model.matrix(~condition). Linear models were fitted with lmFit, and moderated t-statistics were computed with eBayes. Adjusted p-values were obtained via topTable using the Benjamini–Hochberg method.

4.11. Cross-Cohort Concordance of Kbhb Differential Expression

To assess the reproducibility of the Kbhb transcriptional response across platforms, a consensus table was constructed from all Kbhb genes present in at least one cohort’s DE results (n = 1357 genes). Genes were assigned to one of six mutually exclusive categories based on significance (adjusted p < 0.05 and |log2FC| ≥ 0.5) and direction in each cohort: Concordant up (significant and upregulated in both), Concordant down (significant and downregulated in both), Discordant (significant in both but in opposite directions), TCGA only, METABRIC only, and Not DE. Of the 673 genes with DE evidence in at least one cohort, 155 were concordantly upregulated and 22 concordantly downregulated across both platforms.

4.12. Over-Representation Analysis of Significant TMR Regulons

To characterize the biological functions regulated by each of the seven significant TMRs, over-representation analysis (ORA) was performed on their transcriptional regulons using two complementary pathway databases.

4.12.1. Gene Lists

For each significant TMR, the set of target genes was extracted from the TCGA Basal ARACNe-AP regulon object. Target gene symbols were converted to Entrez IDs using the bitr function from the clusterProfiler package [48] with the Homo sapiens annotation database (org.Hs.eg.db). Genes without a unique Entrez ID mapping were discarded. The background universe was defined as all protein-coding genes in the TCGA Basal expression matrix with a valid Entrez ID.

4.12.2. GO Biological Process

ORA against the Gene Ontology Biological Process (GO-BP) database was performed using the compareCluster function (clusterProfiler) with fun =enrichGO”, ontology “BP”, Benjamini–Hochberg p-value adjustment (pvalueCutoff = 0.05, qvalueCutoff = 0.2), and readable = TRUE. To reduce redundancy introduced by the hierarchical structure of the GO-BP ontology, the resulting term set was simplified using clusterProfiler::simplify (similarity cutoff = 0.6, ranked by p.adjust).

4.12.3. Reactome

ORA against the Reactome pathway database was performed using compareCluster with fun =enrichPathway” from the ReactomePA package [51] (organism = “human”, same p-value and q-value thresholds, readable = TRUE). No simplification step was applied, as Reactome uses a non-redundant hierarchical structure at the pathway level.

4.13. Network Visualization of TMR–Kbhb Gene Interactions

To integrate network topology with differential expression, ARACNe-AP edges in the TCGA Basal network were filtered to retain interactions in which the regulator was one of the seven significant TMRs and the target was a differentially expressed Kbhb gene (adjusted p < 0.05 and |log2FC| ≥ 0.5 in at least one cohort). This yielded 436 TMR–gene interactions covering [N] unique Kbhb DE target genes.

5. Conclusions

This work introduces a Kbhb-substrate-restricted master regulator analysis framework that moves beyond cataloging differentially expressed modification sites to identify the transcription factors that control their expression. It reveals CENPA and FOXM1 as the hierarchical apex of the β-hydroxybutyrylation transcriptional program in Basal-like breast cancer across two independent platforms. The coherence of this program, anchored by a feedforward BHB–Kbhb–FOXM1/CENPA axis that couples metabolic state to mitotic and centromeric gene expression, suggests that elevated BHB—as occurs under ketogenic dietary regimens—could paradoxically reinforce tumorigenic transcription in TNBC. Beyond the canonical oncogenes, ZNF232 emerges as a consistently co-activated but functionally uncharacterized regulator, making it a priority candidate for the first experimental dissection of its role in cancer. The pharmacological tractability of FOXM1 [54] and chromatin-modifying enzymes such as p300/CBP and class I HDACs [6,10], together with emerging evidence that ketogenesis or BHB exposure can promote TNBC metastasis and breast-cancer cell survival under treatment [1,2,55], translates these computational findings into concrete hypotheses for therapeutic and dietary-intervention research.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org, Figure S1: msVIPER master regulator analysis results and regulatory hierarchy per cohort. Figure S2: Differential expression of Kbhb genes in Basal BRCA. Figure S3: Quantitative distribution of ARACNe-supported TMR–Kbhb gene interactions by DE category. Table S1 (Supplementary_TableS1_ORA_GO_BP.tsv): Complete GO-BP over-representation analysis results for the seven significant TMR regulons (adjusted p < 0.05, q-value < 0.2; semantic redundancy removed at similarity cutoff = 0.6). Table S2 (Supplementary_TableS2_ORA_Reactome.tsv): Complete Reactome over-representation analysis results for the seven significant TMR regulons (adjusted p < 0.05, q-value < 0.2).

Author Contributions

Conceptualization, H.T. and E.H.-L.; Methodology, H.T.; Software, H.T.; Validation, H.T. and E.H.-L.; Formal Analysis, H.T.; Investigation, H.T.; Resources, E.H.-L.; Data Curation, H.T.; Writing—Original Draft Preparation, H.T.; Writing—Review & Editing, H.T. and E.H.-L.; Visualization, H.T.; Supervision, E.H.-L.; Project Administration, E.H.-L.; Funding Acquisition, E.H.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This study used only publicly available, de-identified genomic datasets (TCGA-BRCA and METABRIC) and did not involve direct contact with human subjects or biological samples.

Data Availability Statement

The TCGA-BRCA dataset is publicly available via the Genomic Data Commons (https://portal.gdc.cancer.gov/). The METABRIC dataset is available through cBioPortal (https://www.cbioportal.org/). All analysis code and intermediate results are available from the corresponding authors upon reasonable request.

Acknowledgments

The authors wish to thank the members of the Computational Genomics Division at INMEGEN for helpful discussions. The authors acknowledge the TCGA Research Network (https://www.cancer.gov/tcga) and the METABRIC consortium for making their datasets publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Urzì, A.G.; Tropea, E.; Gattuso, G.; Spoto, G.; Marsala, G.; Calina, D.; Libra, M.; Falzone, L. Ketogenic Diet and Breast Cancer: Recent Findings and Therapeutic Approaches. Nutrients 2023, 15, 4357. [Google Scholar] [CrossRef] [PubMed]
  2. Jiang, H.; Zeng, Y.; Yuan, X.; Chen, L.; Xu, X.; Jiang, X.; Li, Q.; Li, G.; Yang, H. Ketogenesis Promotes Triple-Negative Breast Cancer Metastasis via Calpastatin β-Hydroxybutyrylation. Lipids Health Dis. 2024, 23, 371. [Google Scholar] [CrossRef] [PubMed]
  3. Newman, J.C.; Verdin, E. β-Hydroxybutyrate: Much More than a Metabolite. Diabetes Res. Clin. Pract. 2014, 106, 173–181. [Google Scholar] [CrossRef] [PubMed]
  4. Han, Y.-M.; Ramprasath, T.; Zou, M.-H. β-Hydroxybutyrate and Its Metabolic Effects on Age-Associated Pathology. Exp. Mol. Med. 2020, 52, 548–555. [Google Scholar] [CrossRef] [PubMed]
  5. Xie, Z.; Zhang, D.; Chung, D.; Tang, Z.; Huang, H.; Dai, L.; Qi, S.; Li, J.; Colak, G.; Chen, Y.; et al. Metabolic Regulation of Gene Expression by Histone Lysine β-Hydroxybutyrylation. Mol. Cell 2016, 62, 194–206. [Google Scholar] [CrossRef] [PubMed]
  6. Huang, H.; Zhang, D.; Weng, Y.; Delaney, K.; Tang, Z.; Yan, C.; Qi, S.; Peng, C.; Cole, P.A.; Roeder, R.G.; et al. The Regulatory Enzymes and Protein Substrates for the Lysine β-Hydroxybutyrylation Pathway. Sci. Adv. 2021, 7, eabe2771. [Google Scholar] [CrossRef] [PubMed]
  7. Kaluba, F.C.; Rogers, T.J.; Jeong, Y.-J.; House, R. ‘Rae’ J.; Waldhart, A.; Sokol, K.H.; Daniels, S.R.; Lee, C.J.; Longo, J.; Johnson, A.; et al. An Alternative Route for β-Hydroxybutyrate Metabolism Supports Cytosolic Acetyl-CoA Synthesis in Cancer Cells. Nat. Metab. 2025, 7, 2033–2044. [Google Scholar] [CrossRef] [PubMed]
  8. Jiang, W.; Wang, M.; Wang, J.; Hao, Q.; Li, Y.; Liu, L.; Zhou, T.; Song, W.; Liu, J.; Liu, M.; et al. β-Hydroxybutyrate Promotes Cancer Metastasis through β-Hydroxybutyrylation-Dependent Stabilization of Snail. Nat. Commun. 2025, 16, 6592. [Google Scholar] [PubMed]
  9. Rashidi, P.; Bagheri, Z.; Khodayar, Z.; Tarkashvand, S.; Elahirad, N.; Akhoondi, R.; Moghaddam, S.H.; Sanati, M.; Haghighatjou, R.; Yekani, R.; et al. β-Hydroxybutyrate, a Primary Metabolite of Ketogenic Diets and Its Dual Role in Modulating Colorectal Cancer: From Molecular Mechanisms to Therapeutic Insights. Clin. Exp. Med. 2026, 26, 168. [Google Scholar] [CrossRef] [PubMed]
  10. Masci, D.; Puxeddu, M.; Silvestri, R.; Regina, G.L. Targeting CBP and P300: Emerging Anticancer Agents. Molecules 2024, 29, 4524. [Google Scholar] [CrossRef] [PubMed]
  11. Alvarez, M.J.; Shen, Y.; Giorgi, F.M.; Lachmann, A.; Ding, B.B.; Ye, B.H.; Califano, A. Functional Characterization of Somatic Mutations in Cancer Using Network-Based Inference of Protein Activity. Nat. Genet. 2016, 48, 838–847. [Google Scholar] [CrossRef] [PubMed]
  12. Tapia-Carrillo, D.; Tovar, H.; Velazquez-Caldelas, T.E.; Hernandez-Lemus, E. Master Regulators of Signaling Pathways: An Application to the Analysis of Gene Regulation in Breast Cancer. Front. Genet. 2019, 10, 352. [Google Scholar] [CrossRef] [PubMed]
  13. Lachmann, A.; Giorgi, F.M.; Lopez, G.; Califano, A. ARACNe-AP: Gene Network Reverse Engineering through Adaptive Partitioning Inference of Mutual Information. Bioinformatics 2016, 32, 2233–2235. [Google Scholar] [CrossRef] [PubMed]
  14. Curtis, C.; Shah, S.P.; Chin, S.-F.; Turashvili, G.; Rueda, O.M.; Dunning, M.J.; Speed, D.; Lynch, A.G.; Samarajiwa, S.; Yuan, Y.; et al. The Genomic and Transcriptomic Architecture of 2000 Breast Tumours Reveals Novel Subgroups. Nature 2012, 486, 346–352. [Google Scholar] [CrossRef] [PubMed]
  15. Margolin, A.A.; Nemenman, I.; Basso, K.; Wiggins, C.; Stolovitzky, G.; Favera, R.; Califano, A. ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinform. 2006, 7, S7. [Google Scholar] [CrossRef] [PubMed]
  16. Lambert, S.A.; Jolma, A.; Campitelli, L.F.; Das, P.K.; Yin, Y.; Albu, M.; Chen, X.; Taipale, J.; Hughes, T.R.; Weirauch, M.T. The Human Transcription Factors. Cell 2018, 175, 598–599. [Google Scholar] [CrossRef] [PubMed]
  17. Carro, M.S.; Lim, W.K.; Alvarez, M.J.; Bollo, R.J.; Zhao, X.; Snyder, E.Y.; Sulman, E.P.; Anne, S.L.; Doetsch, F.; Colman, H.; et al. The Transcriptional Network for Mesenchymal Transformation of Brain Tumours. Nature 2010, 463, 318–325. [Google Scholar] [CrossRef] [PubMed]
  18. Lefebvre, C.; Rajbhandari, P.; Alvarez, M.J.; Bandaru, P.; Lim, W.K.; Sato, M.; Wang, K.; Sumazin, P.; Kustagi, M.; Bisikirska, B.C.; et al. A Human B-cell Interactome Identifies MYB and FOXM1 as Master Regulators of Proliferation in Germinal Centers. Mol. Syst. Biol. 2010, 6, 377. [Google Scholar] [CrossRef] [PubMed]
  19. Renaud-Pageot, C.; Quivy, J.-P.; Lochhead, M.; Almouzni, G. CENP-A Regulation and Cancer. Front. Cell Dev. Biol. 2022, 10, 907120. [Google Scholar] [CrossRef] [PubMed]
  20. Amato, A.; Schillaci, T.; Lentini, L.; Leonardo, A.D. CENPA Overexpression Promotes Genome Instability in pRb-Depleted Human Cells. Mol. Cancer 2009, 8, 119. [Google Scholar] [CrossRef] [PubMed]
  21. Katzenellenbogen, B.S.; Guillen, V.S.; Katzenellenbogen, J.A. Targeting the Oncogenic Transcription Factor FOXM1 to Improve Outcomes in All Subtypes of Breast Cancer. Breast Cancer Res. 2023, 25, 76. [Google Scholar] [CrossRef] [PubMed]
  22. Shen, X.; Zhong, J.; Yu, P.; Liu, F.; Peng, H.; Chen, N. YTHDC1-dependent m6A Modification Modulated FOXM1 Promotes Glycolysis and Tumor Progression through CENPA in Triple-negative Breast Cancer. Cancer Sci. 2024, 115, 1881–1895. [Google Scholar] [CrossRef] [PubMed]
  23. Rida, P.; Baker, S.; Saidykhan, A.; Bown, I.; Jinna, N. FOXM1 Transcriptionally Co-Upregulates Centrosome Amplification and Clustering Genes and Is a Biomarker for Poor Prognosis in Androgen Receptor-Low Triple-Negative Breast Cancer. Cancers 2024, 16, 3191. [Google Scholar] [CrossRef] [PubMed]
  24. Shah, S.N.; Cope, L.; Poh, W.; Belton, A.; Roy, S.; Talbot, C.C.; Sukumar, S.; Huso, D.L.; Resar, L.M.S. HMGA1: A Master Regulator of Tumor Progression in Triple-Negative Breast Cancer Cells. PLoS ONE 2013, 8, e63419. [Google Scholar] [CrossRef] [PubMed]
  25. Méndez, O.; Pérez, J.; Soberino, J.; Racca, F.; Cortés, J.; Villanueva, J. Clinical Implications of Extracellular HMGA1 in Breast Cancer. Int. J. Mol. Sci. 2019, 20, 5950. [Google Scholar] [CrossRef] [PubMed]
  26. Petrosino, S.; Pacor, S.; Pegoraro, S.; Gazziero, V.A.; Canarutto, G.; Piazza, S.; Manfioletti, G.; Sgarra, R. HMGA1 Regulates the Expression of Replication-Dependent Histone Genes and Cell-Cycle in Breast Cancer Cells. Int. J. Mol. Sci. 2022, 24, 594. [Google Scholar] [CrossRef] [PubMed]
  27. Darshi, M.; Mendiola, V.L.; Mackey, M.R.; Murphy, A.N.; Koller, A.; Perkins, G.A.; Ellisman, M.H.; Taylor, S.S. ChChd3, an Inner Mitochondrial Membrane Protein, Is Essential for Maintaining Crista Integrity and Mitochondrial Function*. J. Biol. Chem. 2011, 286, 2918–2932. [Google Scholar] [CrossRef] [PubMed]
  28. Kuai, Z.T.; Zheng, X.M.; Li, Y.Y.; Wang, T.Q. CHCHD3(MIC19): Mitochondrial Cristae Structure Regulation and Disease Associations. Front. Mol. Biosci. 2026, 13, 1861303. [Google Scholar] [CrossRef]
  29. Westendorp, B.; Mokry, M.; Koerkamp, M.J.A.G.; Holstege, F.C.P.; Cuppen, E.; Bruin, A. de E2F7 Represses a Network of Oscillating Cell Cycle Genes to Control S-Phase Progression. Nucleic Acids Res. 2012, 40, 3511–3523. [Google Scholar] [CrossRef] [PubMed]
  30. Moreno, E.; Pandit, S.K.; Toussaint, M.J.M.; Bongiovanni, L.; Harkema, L.; van Essen, S.C.; van Liere, E.A.; Westendorp, B.; Bruin, A. de Atypical E2Fs Either Counteract or Cooperate with RB during Tumorigenesis Depending on Tissue Context. Cancers 2021, 13, 2033. [Google Scholar] [CrossRef] [PubMed]
  31. Mavrogiannis, L.A.; Argyrokastritis, A.; Tzitzikas, N.; Dermitzakis, E.; Sarafidou, T.; Patsalis, P.C.; Moschonas, N.K. ZNF232: Structure and Expression Analysis of a Novel Human C2H2 Zinc Finger Gene1, Member of the SCAN/LeR Domain Subfamily. Biochim. Biophys. Acta (BBA)—Gene Struct. Expr. 2001, 1518, 300–305. [Google Scholar] [CrossRef] [PubMed]
  32. Gowher, H.; Brick, K.; Camerini-Otero, R.D.; Felsenfeld, G. Vezf1 Protein Binding Sites Genome-Wide Are Associated with Pausing of Elongating RNA Polymerase II. Proc. Natl. Acad. Sci. USA 2012, 109, 2370–2375. [Google Scholar] [CrossRef] [PubMed]
  33. Wickham, H. Ggplot2, Elegant Graphics for Data Analysis. Use R! 2016. [Google Scholar] [CrossRef]
  34. Slowikowski, K. Ggrepel: Automatically Position Non-Overlapping Text Labels with “Ggplot2”. CRAN: Contrib. Packag. 2016. [Google Scholar] [CrossRef]
  35. Pedersen, T.L. Patchwork: The Composer of Plots. CRAN: Contrib. Packag. 2019. [Google Scholar] [CrossRef]
  36. Love, M.I.; Huber, W.; Anders, S. Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2. Genome Biol. 2014, 15, 550. [Google Scholar] [CrossRef] [PubMed]
  37. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies. Nucleic Acids Res. 2015, 43, e47. [Google Scholar] [CrossRef] [PubMed]
  38. Leek, J.T.; Johnson, W.E.; Parker, H.S.; Jaffe, A.E.; Storey, J.D. The Sva Package for Removing Batch Effects and Other Unwanted Variation in High-Throughput Experiments. Bioinform. Oxf. Engl. 2012, 28, 882–883. [Google Scholar] [CrossRef] [PubMed]
  39. Gu, Z.; Gu, L.; Eils, R.; Schlesner, M.; Brors, B. Circlize Implements and Enhances Circular Visualization in R. Bioinform. (Oxf. Engl.) 2014, 30, 2811–2812. [Google Scholar] [CrossRef] [PubMed]
  40. Chen, C.; Chen, C.; Wang, A.; Jiang, Z.; Zhao, F.; Li, Y.; Han, Y.; Niu, Z.; Tian, S.; Bai, X.; et al. ENL Reads Histone β-Hydroxybutyrylation to Modulate Gene Transcription. Nucleic Acids Res. 2024, 52, 10029–10039. [Google Scholar] [CrossRef] [PubMed]
  41. Zhou, T.; Cheng, X.; He, Y.; Xie, Y.; Xu, F.; Xu, Y.; Huang, W. Function and Mechanism of Histone β-Hydroxybutyrylation in Health and Disease. Front. Immunol. 2022, 13, 981285. [Google Scholar] [CrossRef] [PubMed]
  42. Lambert, S.A.; Jolma, A.; Campitelli, L.F.; Das, P.K.; Yin, Y.; Albu, M.; Chen, X.; Taipale, J.; Hughes, T.R.; Weirauch, M.T. The Human Transcription Factors. Cell 2018, 172, 650–665. [Google Scholar] [CrossRef] [PubMed]
  43. Silva, T.C.; Colaprico, A.; Olsen, C.; Malta, T.M.; Bontempi, G.; Ceccarelli, M.; Berman, B.P.; Noushmehr, H. TCGAbiolinksGUI: A Graphical User Interface to Analyze Cancer Molecular and Clinical Data. F1000Research 2018, 7, 439. [Google Scholar] [CrossRef]
  44. Cerami, E.; Gao, J.; Dogrusoz, U.; Gross, B.E.; Sumer, S.O.; Aksoy, B.A.; Jacobsen, A.; Byrne, C.J.; Heuer, M.L.; Larsson, E.; et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 2012, 2, 401–404. [Google Scholar] [CrossRef] [PubMed]
  45. Riley, J.W.; Stouffer, S.A.; Suchman, E.A.; Devinney, L.C.; Star, S.A.; Williams, R.M. The American Soldier: Adjustment During Army Life. Am. Sociol. Rev. 1949, 14, 557. [Google Scholar] [CrossRef]
  46. Zaykin, D.V. Optimally Weighted Z-test Is a Powerful Method for Combining Probabilities in Meta-analysis. J. Evol. Biol. 2011, 24, 1836–1841. [Google Scholar] [CrossRef] [PubMed]
  47. Morgan, M.; Obenchain, V.; Hester, J.; Pagès, H. SummarizedExperiment: A Container (S4 Class) for Matrix-like Assays. 2026. [Google Scholar] [CrossRef]
  48. Yu, G.; Wang, L.-G.; Han, Y.; He, Q.-Y. clusterProfiler: An R Package for Comparing Biological Themes among Gene Clusters. Omics J. Integr. Biol. 2012, 16, 284–287. [Google Scholar] [CrossRef] [PubMed]
  49. Pedersen, T.L. Ggraph: An Implementation of Grammar of Graphics for Graphs and Networks. CRAN: Contrib. Packag. 2017. [Google Scholar] [CrossRef]
  50. Pedersen, T.L. Tidygraph: A Tidy API for Graph Manipulation. 2024. [Google Scholar] [CrossRef]
  51. Yu, G.; He, Q.-Y. ReactomePA: An R/Bioconductor Package for Reactome Pathway Analysis and Visualization. Mol. Biosyst. 2015, 12, 477–479. [Google Scholar] [CrossRef] [PubMed]
  52. Carlson, M. Org.Hs.Eg.Db: Genome Wide Annotation for Human. 2026. [Google Scholar]
  53. Kolde, R. Pheatmap: Pretty Heatmaps. 2025. [Google Scholar]
  54. Merjaneh, N.; Hajjar, M.; Lan, Y.-W.; Kalinichenko, V.V.; Kalin, T.V. The Promise of Combination Therapies with FOXM1 Inhibitors for Cancer Treatment. Cancers 2024, 16, 756. [Google Scholar] [CrossRef] [PubMed]
  55. Mashinchian, M.; Vandghanooni, S.; Karamibonari, A.R.; Eskandani, M. β-Hydroxybutyrate Promotes Chemoresistance and Proliferation in Breast Cancer Cells. Biochem. Biophys. Rep. 2025, 44, 102217. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Transcriptional master regulators of the Kbhb program in Basal BRCA. (A) Scatter plot of normalized enrichment scores (NES) from msVIPER applied to the Kbhb-restricted molecular signature in TCGA Basal (x-axis, n = 195) versus METABRIC Basal (y-axis, n = 209). Each point represents one of the 75 regulators detected in both cohorts; color indicates concordance direction (red: concordant activation; blue: concordant repression; gray: discordant); point size encodes −log10(meta p-value). Pearson r = 0.64. (B) Lollipop chart of meta-NES (Stouffer combination) for the top 20 regulators ranked by |meta-NES|. The filled circle represents the meta-NES; overlaid triangles (METABRIC) and circles (TCGA) show the individual cohort NES values. Red segments: activated regulators; blue segments: repressed. The seven regulators reaching FDR < 0.05 are labeled. (C) Heatmap of per-sample VIPER activity scores (z-scored per TF within each cohort) for the top 25 regulators by |meta-NES|, across all Basal tumor samples from both cohorts. Samples are grouped by cohort (METABRIC, left; TCGA, right) and sorted within each group by mean activity of activated TMRs. Generated with ggplot2 [33], ggrepel [34], and patchwork [35].
Figure 1. Transcriptional master regulators of the Kbhb program in Basal BRCA. (A) Scatter plot of normalized enrichment scores (NES) from msVIPER applied to the Kbhb-restricted molecular signature in TCGA Basal (x-axis, n = 195) versus METABRIC Basal (y-axis, n = 209). Each point represents one of the 75 regulators detected in both cohorts; color indicates concordance direction (red: concordant activation; blue: concordant repression; gray: discordant); point size encodes −log10(meta p-value). Pearson r = 0.64. (B) Lollipop chart of meta-NES (Stouffer combination) for the top 20 regulators ranked by |meta-NES|. The filled circle represents the meta-NES; overlaid triangles (METABRIC) and circles (TCGA) show the individual cohort NES values. Red segments: activated regulators; blue segments: repressed. The seven regulators reaching FDR < 0.05 are labeled. (C) Heatmap of per-sample VIPER activity scores (z-scored per TF within each cohort) for the top 25 regulators by |meta-NES|, across all Basal tumor samples from both cohorts. Samples are grouped by cohort (METABRIC, left; TCGA, right) and sorted within each group by mean activity of activated TMRs. Generated with ggplot2 [33], ggrepel [34], and patchwork [35].
Preprints 219432 g001
Figure 2. Cross-cohort concordance of Kbhb differential expression in Basal BRCA. (A) Circular packing chart of the 1357 Kbhb genes tested for differential expression (adjusted p < 0.05 and |log2FC| ≥ 0.5). The solid outer circle encompasses all genes with significant DE in at least one cohort (Sig. DE, n = 673); the gray circle represents genes without significant DE on either platform (Not DE, n = 684). Inner bubbles are area-proportional to gene count and colored by concordance category: concordant upregulated (dark red, n = 155), concordant downregulated (dark blue, n = 22), TCGA-only up/down (orange/light blue), METABRIC-only (greens), and discordant (purple). Generated with ggraph and tidygraph. (B) Scatter plot of log2FC values (Basal tumor/normal) in TCGA (x-axis) versus METABRIC (y-axis) for the 1201 Kbhb genes with results in both cohorts. Points are colored by concordance category. The dashed diagonal indicates perfect concordance (slope = 1). Pearson r = 0.76. The 10 concordant genes with the highest combined |log2FC| are labeled; HMGA1 is highlighted (white text, dark background) as the sole TMR that is simultaneously a concordantly upregulated Kbhb substrate. Generated with ggplot2 and ggrepel.
Figure 2. Cross-cohort concordance of Kbhb differential expression in Basal BRCA. (A) Circular packing chart of the 1357 Kbhb genes tested for differential expression (adjusted p < 0.05 and |log2FC| ≥ 0.5). The solid outer circle encompasses all genes with significant DE in at least one cohort (Sig. DE, n = 673); the gray circle represents genes without significant DE on either platform (Not DE, n = 684). Inner bubbles are area-proportional to gene count and colored by concordance category: concordant upregulated (dark red, n = 155), concordant downregulated (dark blue, n = 22), TCGA-only up/down (orange/light blue), METABRIC-only (greens), and discordant (purple). Generated with ggraph and tidygraph. (B) Scatter plot of log2FC values (Basal tumor/normal) in TCGA (x-axis) versus METABRIC (y-axis) for the 1201 Kbhb genes with results in both cohorts. Points are colored by concordance category. The dashed diagonal indicates perfect concordance (slope = 1). Pearson r = 0.76. The 10 concordant genes with the highest combined |log2FC| are labeled; HMGA1 is highlighted (white text, dark background) as the sole TMR that is simultaneously a concordantly upregulated Kbhb substrate. Generated with ggplot2 and ggrepel.
Preprints 219432 g002
Figure 3. Over-representation analysis (ORA) of the seven significant TMR regulons. Dotplot showing the top 20 pathways most broadly shared across TMR regulons in GO-BP (upper panel) and Reactome (lower panel). The x-axis lists the seven TMRs, ordered left to right by descending meta-NES. Each dot represents a significant enrichment (adjusted p < 0.05, q-value < 0.2) for that TMR–pathway combination. Dot size encodes the Gene Ratio (fraction of regulon genes annotated to the pathway); dot color encodes the adjusted p-value on a log10 scale (red: most significant; blue: least significant). GO-BP results were simplified to remove semantic redundancy (similarity cutoff = 0.6). VEZF1, the sole repressed TMR, shows no significant Reactome enrichment and appears without data points in the lower panel. Complete ORA results are provided in Supplementary Tables Supplementary_TableS1_ORA_GO_BP.tsv and Supplementary_TableS2_ ORA_Reactome.tsv. Generated with ggplot2.
Figure 3. Over-representation analysis (ORA) of the seven significant TMR regulons. Dotplot showing the top 20 pathways most broadly shared across TMR regulons in GO-BP (upper panel) and Reactome (lower panel). The x-axis lists the seven TMRs, ordered left to right by descending meta-NES. Each dot represents a significant enrichment (adjusted p < 0.05, q-value < 0.2) for that TMR–pathway combination. Dot size encodes the Gene Ratio (fraction of regulon genes annotated to the pathway); dot color encodes the adjusted p-value on a log10 scale (red: most significant; blue: least significant). GO-BP results were simplified to remove semantic redundancy (similarity cutoff = 0.6). VEZF1, the sole repressed TMR, shows no significant Reactome enrichment and appears without data points in the lower panel. Complete ORA results are provided in Supplementary Tables Supplementary_TableS1_ORA_GO_BP.tsv and Supplementary_TableS2_ ORA_Reactome.tsv. Generated with ggplot2.
Preprints 219432 g003
Figure 4. Regulatory interactions between the seven TMRs and differentially expressed Kbhb genes in Basal BRCA (ARACNe-AP TCGA Basal network). Chord diagram with the left arc (upper semicircle) encoding the seven TMRs—sector width proportional to the number of Kbhb DE targets with ARACNe support—and the right arc (lower semicircle) encoding the Kbhb DE genes grouped and colored by concordance category (concordant up: dark red; concordant down: dark blue; TCGA-only up/down: orange/light blue; METABRIC-only: greens). Each chord connects a TMR to a target gene and is colored by the TMR source. The inner track encodes the number of distinct TMRs regulating each gene (gray-to-red gradient). Gene labels (format: SYMBOL (n)) are displayed for genes targeted by ≥3 TMRs. TMR sector labels show the TMR symbol and the total number of Kbhb DE targets. Generated with circlize [39].
Figure 4. Regulatory interactions between the seven TMRs and differentially expressed Kbhb genes in Basal BRCA (ARACNe-AP TCGA Basal network). Chord diagram with the left arc (upper semicircle) encoding the seven TMRs—sector width proportional to the number of Kbhb DE targets with ARACNe support—and the right arc (lower semicircle) encoding the Kbhb DE genes grouped and colored by concordance category (concordant up: dark red; concordant down: dark blue; TCGA-only up/down: orange/light blue; METABRIC-only: greens). Each chord connects a TMR to a target gene and is colored by the TMR source. The inner track encodes the number of distinct TMRs regulating each gene (gray-to-red gradient). Gene labels (format: SYMBOL (n)) are displayed for genes targeted by ≥3 TMRs. TMR sector labels show the TMR symbol and the total number of Kbhb DE targets. Generated with circlize [39].
Preprints 219432 g004
Figure 5. Proposed BHB–Kbhb–FOXM1/CENPA regulatory axis in Basal-like breast cancer. β-Hydroxybutyrate (BHB), produced endogenously or elevated by ketogenic dietary regimens, drives lysine β-hydroxybutyrylation (Kbhb) of histones—a reaction mediated by β-hydroxybutyryl-CoA and catalyzed by the acetyltransferase p300; the mark is removed by HDAC1/2. The resulting H3K9bhb activating mark is enriched at the regulatory targets of FOXM1 and CENPA, the two top transcriptional master regulators (TMRs) of the Kbhb program identified by msVIPER analysis across TCGA and METABRIC (meta-NES: CENPA = +4.55, FOXM1 = +4.27; Stouffer meta-analysis). FOXM1 directly transactivates CENPA, and together they drive a mitotic transcriptional program encompassing AURKA, BUB1, centrosome clustering genes, and broader G2/M machinery, promoting proliferation and genomic instability in TNBC. The elevation of circulating BHB induced by ketogenic dietary regimens may paradoxically reinforce this protumoral axis in Basal/TNBC. Pharmacological nodes amenable to therapeutic intervention include FOXM1 (small-molecule inhibitors in preclinical development), the p300/HDAC1-2 writer–eraser pair (existing inhibitor classes), and secreted HMGA1 (antibody-based strategies). NES, normalized enrichment score; TMR, transcriptional master regulator; Kbhb, lysine β-hydroxybutyrylation; TNBC, triple-negative breast cancer.
Figure 5. Proposed BHB–Kbhb–FOXM1/CENPA regulatory axis in Basal-like breast cancer. β-Hydroxybutyrate (BHB), produced endogenously or elevated by ketogenic dietary regimens, drives lysine β-hydroxybutyrylation (Kbhb) of histones—a reaction mediated by β-hydroxybutyryl-CoA and catalyzed by the acetyltransferase p300; the mark is removed by HDAC1/2. The resulting H3K9bhb activating mark is enriched at the regulatory targets of FOXM1 and CENPA, the two top transcriptional master regulators (TMRs) of the Kbhb program identified by msVIPER analysis across TCGA and METABRIC (meta-NES: CENPA = +4.55, FOXM1 = +4.27; Stouffer meta-analysis). FOXM1 directly transactivates CENPA, and together they drive a mitotic transcriptional program encompassing AURKA, BUB1, centrosome clustering genes, and broader G2/M machinery, promoting proliferation and genomic instability in TNBC. The elevation of circulating BHB induced by ketogenic dietary regimens may paradoxically reinforce this protumoral axis in Basal/TNBC. Pharmacological nodes amenable to therapeutic intervention include FOXM1 (small-molecule inhibitors in preclinical development), the p300/HDAC1-2 writer–eraser pair (existing inhibitor classes), and secreted HMGA1 (antibody-based strategies). NES, normalized enrichment score; TMR, transcriptional master regulator; Kbhb, lysine β-hydroxybutyrylation; TNBC, triple-negative breast cancer.
Preprints 219432 g005
Figure 6. Analytical workflow for identifying transcriptional master regulators of the Kbhb program in Basal-like breast cancer. (1) A gene set of 1493 Kbhb substrates was derived from the proteomic study of Huang et al. (2021). (2) Two independent cohorts were selected: TCGA-BRCA (Basal tumors n = 195, normal tissue n = 113; RNA-seq) and METABRIC (Basal tumors n = 209, normal tissue n = 148; Illumina HT-12 microarray). (3) Expression matrices were preprocessed independently per platform: TCGA data were log2(TPM + 1) transformed and filtered to protein-coding genes expressed in ≥20% of Basal samples (14,488 genes); METABRIC data were first filtered (log2 intensity > 6 in ≥ 20% of Basal samples; 11,976 genes) and then batch-corrected using ComBat (5 cohorts, biological group as covariate). (4) Dataset-specific transcriptional regulatory networks were inferred using ARACNe-AP with 1639 candidate transcription factors [42], mutual information p < 1 × 10−8, and 100 bootstrap iterations, yielding 183,950 edges (TCGA) and 110,780 edges (METABRIC). (5) Master Regulator Analysis was performed using msVIPER with a Kbhb-restricted molecular signature (two-sample t-statistics comparing Basal tumor vs. normal tissue, restricted to expressed Kbhb genes), a null model of 1000 permutations, and shadow analysis to resolve regulatory hierarchy. (6) NES values from both cohorts were integrated by Stouffer meta-analysis with Benjamini–Hochberg FDR correction; regulators with FDR < 0.05 were defined as significant cross-platform validated TMRs. Analytical workflow figures were drafted with assistance from ChatGPT (OpenAI), subsequently reviewed, adapted, and validated by the authors.
Figure 6. Analytical workflow for identifying transcriptional master regulators of the Kbhb program in Basal-like breast cancer. (1) A gene set of 1493 Kbhb substrates was derived from the proteomic study of Huang et al. (2021). (2) Two independent cohorts were selected: TCGA-BRCA (Basal tumors n = 195, normal tissue n = 113; RNA-seq) and METABRIC (Basal tumors n = 209, normal tissue n = 148; Illumina HT-12 microarray). (3) Expression matrices were preprocessed independently per platform: TCGA data were log2(TPM + 1) transformed and filtered to protein-coding genes expressed in ≥20% of Basal samples (14,488 genes); METABRIC data were first filtered (log2 intensity > 6 in ≥ 20% of Basal samples; 11,976 genes) and then batch-corrected using ComBat (5 cohorts, biological group as covariate). (4) Dataset-specific transcriptional regulatory networks were inferred using ARACNe-AP with 1639 candidate transcription factors [42], mutual information p < 1 × 10−8, and 100 bootstrap iterations, yielding 183,950 edges (TCGA) and 110,780 edges (METABRIC). (5) Master Regulator Analysis was performed using msVIPER with a Kbhb-restricted molecular signature (two-sample t-statistics comparing Basal tumor vs. normal tissue, restricted to expressed Kbhb genes), a null model of 1000 permutations, and shadow analysis to resolve regulatory hierarchy. (6) NES values from both cohorts were integrated by Stouffer meta-analysis with Benjamini–Hochberg FDR correction; regulators with FDR < 0.05 were defined as significant cross-platform validated TMRs. Analytical workflow figures were drafted with assistance from ChatGPT (OpenAI), subsequently reviewed, adapted, and validated by the authors.
Preprints 219432 g006
Table 1. Significant transcriptional master regulators (TMRs) of the Kbhb programme in Basal BRCA.
Table 1. Significant transcriptional master regulators (TMRs) of the Kbhb programme in Basal BRCA.
Regulator NES TCGA NES METABRIC NES Meta FDR Biological Relevance
CENPA +2.97 +3.46 +4.55 0.0004 Centromeric histone H3 variant. Overexpressed in cancer and linked to genome instability, kinetochore function and cell-cycle progression; consistent with overlap between the Kbhb programme and chromosomal machinery. [19,20]
FOXM1 +2.70 +3.33 +4.27 0.0007 Master regulator of G2/M progression. Well-documented oncogene in breast cancer/TNBC; FOXM1 has been linked experimentally to CENPA-dependent TNBC proliferation, migration and glycolysis, and to centrosome/cell-cycle programmes. [21,22,23]
HMGA1 +2.38 +2.60 +3.52 0.011 Chromatin architectural protein. Promotes oncogene transcription and tumour progression in TNBC, with roles in stem-like phenotypes, migration/invasion and cell-cycle/histone-gene regulation. Sole TMR that is also a concordantly upregulated Kbhb substrate. [24,25,26]
CHCHD3 +2.33 +2.19 +3.20 0.026 Inner mitochondrial membrane protein (MIC19/CHCHD3). Core MICOS-associated factor required for crista integrity and mitochondrial function; supports the hypothesis that this TMR links BHB-driven metabolic state with mitochondria-to-nucleus signalling. [27,28]
E2F7 (no sig.) +2.82 +3.08 0.031 Atypical E2F repressor of G1/S transcription. Reported to repress oscillating cell-cycle genes and control S-phase progression; its apparent activation here may reflect context-dependent co-regulation of DNA-replication/cell-cycle targets shared with CENPA. Subject to shadow correction by CENPA in METABRIC. [29,30]
ZNF232 +1.78 +2.38 +2.94 0.035 Poorly characterised C2H2 zinc-finger/SCAN-domain protein predicted to participate in transcriptional regulation; original structural and expression analysis supports nuclear localization and broad tissue expression. Novel prediction requiring experimental validation. [31]
VEZF1 −1.25 −3.03 −3.02 0.031 Vascular endothelial zinc-finger factor. REPRESSED—consistent with loss of endothelial/vascular identity in Basal BRCA; VEZF1 is linked to endothelial development/angiogenesis and genome-wide transcriptional regulation. [32]
NES: normalised enrichment score (msVIPER). META: METABRIC. meta-NES: Stouffer-combined score. FDR: Benjamini–Hochberg-adjusted meta p-value. Positive NES indicates activation; negative indicates repression.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Accessibility

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated