Preprint
Article

This version is not peer-reviewed.

Baseline Gut Microbiome Profiling in Therapy-Naive Patients with Locally Advanced Rectal Cancer Identifies Signatures of Response to Total Neoadjuvant Therapy

Submitted: 02 June 2026

Posted: 03 June 2026


Abstract

Background/Objectives: Total neoadjuvant therapy (TNT) is standard for locally advanced rectal cancer, yet treatment response is biologically heterogeneous, underscoring the need for predictive biomarkers. This study aimed to characterize how baseline and on-treatment gut microbiome ecology relate to clinically relevant response dimensions, and to identify baseline-level markers that could be integrated into a pre-treatment response score. Methods: Shotgun metagenomic profiling of the stool microbiome was performed in a real-world cohort of 13 patients with locally advanced rectal cancer (LARC) treated with TNT followed by total mesorectal excision, together with 18 non-rectal cancer (non-RC) controls. Treatment response was assessed in a pathology-anchored framework integrating both tumour regression and nodal response, with favourable response defined by Modified Ryan TRG1–2 and two-category nodal downstaging. Results: Longitudinal analysis from the therapy-naive baseline through TNT to the end of treatment showed that tumour topography was associated with distinct baseline microbiome states and divergent treatment-associated remodelling trajectories. In our cohort, age also stratified microbiome behaviour, with patients aged ≥70 years showing higher baseline pathogen-associated and antimicrobial resistance-linked burden together with a more contraction-prone ecological trajectory. Partitioning the microbiome into below-mean and above-mean relative-abundance fractions showed that biomass was concentrated in dominant taxa, whereas much of the diversity resided in the low-abundance background. Favourable TNT response was associated more with preservation of this low-abundance complexity than with higher overall diversity alone. TNT remodelled, but did not eliminate, a microbiome layer enriched in genotoxicity- and virulence-associated signals, which remained structured by age, tumour location, and response category. 
Among baseline taxa, Phocaeicola coprophilus emerged as the only species-level signal consistently associated with favourable outcome across both pathological and nodal response dimensions, discriminating patients with concordantly favourable outcomes from all others. To integrate the most informative pre-treatment microbiome features, an exploratory Microbiome Baseline Score was generated, showing promising preliminary performance in this pilot cohort, with a sensitivity of 0.875 and a specificity of 1.00, warranting validation in larger independent cohorts. Conclusions: Our results support the concept that clinically relevant response-associated information may already be embedded in the therapy-naive gut microbiome and can be condensed into a proof-of-concept baseline score for future microbiome-informed pre-treatment stratification in LARC.


1. Introduction

Locally advanced rectal cancer (LARC) is now increasingly managed within a total neoadjuvant therapy (TNT) framework, in which concurrent chemoradiotherapy and consolidation chemotherapy are delivered before surgery to improve local control, enhance tumour regression, and increase the likelihood of a favourable long-term outcome [1,2]. Yet, despite this therapeutic intensification, response remains biologically heterogeneous [3]. Some patients achieve substantial primary tumour regression together with marked nodal downstaging, whereas others derive more limited benefit despite an apparently similar clinicopathological presentation at baseline [4,5,6]. This persistent heterogeneity highlights the need for biologically informative markers to refine pre-treatment stratification and improve our understanding of why patients traverse different response trajectories under TNT.
Recent TNT-focused analyses demonstrate that higher rates of pathological complete response, improved disease-free survival, and reduced distant relapse are achievable in LARC, but these benefits are not uniformly distributed, underscoring the importance of developing response-predictive biomarkers [6].
Within this context, the gut microbiome has emerged as a plausible and potentially clinically relevant ecological layer linking host condition, mucosal barrier integrity, inflammatory tone, microbial metabolism, and treatment tolerance [7]. More broadly, in colorectal cancer, dysbiosis has been associated with oralisation-like microbial shifts, depletion of obligate anaerobes and short-chain fatty acid (SCFA)-associated taxa, enrichment of pathobiont-like and virulence-linked organisms, and expansion of microbial functions with potential genotoxic relevance [8,9].
At the same time, radiotherapy and chemotherapy are themselves strong ecological perturbations, capable of reshaping the intestinal niche through barrier injury, inflammatory activation, redox imbalance, and altered nutrient landscapes [10,11]. These observations make it biologically plausible that the pre-treatment microbiome may influence, or at least reflect, treatment compatibility, while longitudinal microbiome remodelling during TNT may encode additional information about ecological resilience or vulnerability.
However, the microbiome literature in LARC remains relatively limited, and several clinically meaningful dimensions remain insufficiently explored. Most prior studies have focused on baseline prediction of pathological response or treatment-related toxicity, often in conventional neoadjuvant chemoradiotherapy settings rather than in contemporary TNT-oriented cohorts [12,13,14].
Longitudinal designs are less common, and explicit anatomical stratification by tumour location within the rectum has rarely been incorporated, even though tumour topography is already embedded in clinical decision-making and may influence both mucosal ecology and treatment exposure [15,16]. Likewise, older adults remain underrepresented in major TNT datasets, although rectal cancer is predominantly diagnosed later in life, making age-linked microbiome behaviour an important but understudied dimension [17]. More broadly, bulk microbiome summaries may obscure whether clinically relevant ecological signals reside in the dominant biomass-bearing core or in the low-abundance diversity reservoir, which is increasingly recognized as a biologically meaningful component of microbial ecosystems [18].
These gaps provide a strong rationale for integrated, pathology-anchored microbiome analyses in real-world LARC-TNT cohorts. Particularly informative are study designs in which treatment benefit is not inferred from tumour regression alone, but is assessed across complementary pathological dimensions, including both primary tumour response and nodal response. Such an approach captures not only local tumour shrinkage but also lymph node-level treatment effect, thereby offering a more stringent biological framework for response analysis.
Within this framework, it becomes possible to ask several clinically and ecologically relevant questions: whether therapy-naive microbiome states differ according to tumour location or host age; whether TNT drives convergent or divergent ecological remodelling across patient subgroups; whether potentially harmful microbial signals linked to virulence, pathobiont persistence, or genotoxicity are attenuated or retained during treatment; and whether favourable outcome is associated with preservation of microbial complexity, ecological resilience, or specific baseline species-level signals.
The present work was designed against this background. Using longitudinal shotgun metagenomic profiling of stool samples from a pathology-anchored LARC-TNT cohort, we sought to characterize how baseline and on-treatment microbiome ecology relates to clinically relevant response dimensions. Particular emphasis was placed on tumour topography, host age, abundance-partitioned diversity structure, genotoxicity- and virulence-associated microbial signals, and the identification of baseline species-level markers with potential translational value. In addition, because a central unmet need in LARC is the development of clinically interpretable pre-treatment biomarkers, we explored whether the most informative baseline microbiome features could be integrated into a composite proof-of-concept score.
In this sense, the present study was intended not only to describe microbiome alterations during TNT, but also to test the broader concept that response-associated information may already be embedded in the therapy-naive gut microbiome and could ultimately contribute to microbiome-informed pre-treatment stratification in rectal cancer.

2. Materials and Methods

Study Design

The study was conducted at the Department of Oncoradiology, University of Debrecen Clinical Centre. Patient enrolment took place between 21 August 2023 and 8 April 2024, during which 13 adult patients with a confirmed diagnosis of locally advanced rectal cancer were recruited. To ensure eligibility for total neoadjuvant therapy, participants were required to have an Eastern Cooperative Oncology Group (ECOG) performance status of 0–2, reflecting adequate general condition to tolerate the planned treatment regimen.
Patients were excluded from participation if they had any of the following: concomitant or second primary malignancy; prior radiotherapy to any anatomical region; previous exposure to systemic chemotherapy; or radiological or histological evidence of distant metastatic disease. Further exclusion criteria encompassed active systemic infection, immune-mediated disorders, or any clinical contraindication to chemoradiotherapy, including bone marrow suppression (anaemia, leukopenia, or thrombocytopenia), untreated hepatic or renal impairment, or cardiac insufficiency. Individuals with a history of habitual alcohol consumption, illicit drug use, or active tobacco smoking were also excluded. For all participants, demographic variables (age, sex, and body mass index [BMI]) and clinicopathological data, including initial clinical stage, tumour localisation within the rectum, and post-operative pathological stage, were retrieved. In addition, 18 healthy volunteers free from clinical signs or symptoms of rectal cancer (RC) were recruited to serve as non-RC controls.
All participants received a full explanation of the study by the responsible clinician and provided written informed consent prior to enrolment. The study protocol received ethical approval from the Institutional Review Board of the University of Debrecen (ethics approval number: DE RKEB/IKEB 6474-2023). All procedures were conducted in accordance with the principles of the Declaration of Helsinki and applicable institutional and national regulatory guidelines.

Treatment Protocol

Total neoadjuvant therapy was initiated following formal multidisciplinary tumour board review. Radiotherapy was delivered to a total dose of 50.4 Gy in 1.8 Gy daily fractions over approximately 5.5 weeks. Concurrent chemotherapy comprised weekly bolus 5-fluorouracil (5-FU; 500 mg/m²) during weeks one and five, followed by a three-month consolidation regimen of FOLFOX or XELOX.
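The prescription above implies the fraction count to which the Fx0/Fx14/Fx28 sampling labels in the Results appear to refer. The short sketch below makes the arithmetic explicit under two assumptions that are ours, not the protocol's: that "FxN" denotes the number of radiotherapy fractions delivered, and that fractions were given on five days per week. Doses are held in cGy so that the division is exact in integer arithmetic.

```python
# Minimal sketch of the fractionation arithmetic for 50.4 Gy in 1.8 Gy fractions.
# Assumptions (not stated in the protocol): "FxN" = N fractions delivered,
# and treatment is given on 5 days per week.

TOTAL_DOSE_CGY = 5040         # 50.4 Gy expressed in cGy
DOSE_PER_FRACTION_CGY = 180   # 1.8 Gy expressed in cGy
FRACTIONS_PER_WEEK = 5        # assumed weekday schedule


def fraction_count(total_cgy: int, per_fraction_cgy: int) -> int:
    """Number of fractions; integer cGy avoids floating-point rounding."""
    if total_cgy % per_fraction_cgy != 0:
        raise ValueError("total dose is not a whole number of fractions")
    return total_cgy // per_fraction_cgy


n_fx = fraction_count(TOTAL_DOSE_CGY, DOSE_PER_FRACTION_CGY)
sampling_points = {"baseline": 0, "midpoint": n_fx // 2, "end of RT": n_fx}
weeks = n_fx / FRACTIONS_PER_WEEK

print(n_fx)             # 28
print(sampling_points)  # {'baseline': 0, 'midpoint': 14, 'end of RT': 28}
print(weeks)            # 5.6
```

Under these assumptions, 28 weekday fractions span about 5.6 weeks, consistent with the "approximately 5.5 weeks" stated above.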
Pelvic simulation CT was acquired without contrast at 3 mm slice thickness (Siemens SOMATOM go.Sim). Target volume delineation, encompassing the gross tumour volume (GTV), clinical target volume (CTV), and planning target volume (PTV), followed RTOG Contouring Atlas guidelines, with planning performed in RayStation (RaySearch Laboratories AB, Stockholm, Sweden) incorporating MRI and PET-CT image fusion. Treatment plan quality was verified by dose–volume histogram (DVH) analysis, with organ-at-risk (OAR) constraints assessed per QUANTEC guidelines. Radiotherapy was delivered using Elekta Versa HD linear accelerators with 6 MV photon beams via IMRT or VMAT techniques under institutional image-guidance protocols.

Assessment of Treatment Response

Baseline clinical staging was established before TNT according to the AJCC 8th edition TNM classification using pelvic MRI, whereas postoperative pathological staging (ypTNM) was determined from resected specimens after completion of TNT and surgery. Treatment response was then assessed using both primary tumour regression and nodal response.
Tumour regression after TNT was graded according to the Modified Ryan Tumour Regression Grade (TRG) system, with TRG1–2 classified as a favourable pathological response and TRG3 as an unfavourable response. Nodal response was evaluated by comparing the baseline clinical nodal stage (cN) with the postoperative pathological nodal stage (ypN), with reductions in nodal category classified as nodal downstaging. More generally, tumour and nodal downstaging were defined as reductions in T and N categories between cTNM and ypTNM, respectively, and were used as pathology-anchored indicators of response to TNT.
To integrate primary tumour and nodal responses into a single patient-level outcome, composite response categories were defined as follows. Patients with both a favourable tumour response (TRG1–2) and a favourable nodal response, defined as two-category nodal downstaging from cN to ypN, were classified as double-favourable responders (DFR). All remaining patients were classified as non-double-favourable responders (non-DFR).
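The composite outcome can be written as a small decision rule. The sketch below is hypothetical: the function names are ours, nodal stages are compared at the main-category level (N0, N1, N2, with sub-categories such as N1b or N2a collapsed, which is an assumption), and the two-category threshold follows the DFR definition given in the Abstract and in Section 3.4. Grades are restricted to the TRG1–3 range used in the text.

```python
# Hypothetical sketch of the DFR / non-DFR rule; names are illustrative only.


def nodal_category(stage: str) -> int:
    """Map labels such as 'cN2a' or 'ypN0' to the main N category (0, 1 or 2)."""
    digit = stage.upper().split("N", 1)[1][0]  # character right after 'N'
    return int(digit)


def is_favourable_trg(trg: int) -> bool:
    """Modified Ryan TRG1-2 counts as favourable, TRG3 as unfavourable."""
    if trg not in (1, 2, 3):
        raise ValueError(f"TRG outside the 1-3 range used in the text: {trg}")
    return trg <= 2


def nodal_downstaging(cn: str, ypn: str) -> int:
    """Number of main nodal categories gained between cN and ypN."""
    return nodal_category(cn) - nodal_category(ypn)


def response_group(trg: int, cn: str, ypn: str, min_steps: int = 2) -> str:
    """'DFR' if tumour and nodal responses are both favourable, else 'non-DFR'."""
    favourable_nodal = nodal_downstaging(cn, ypn) >= min_steps
    return "DFR" if is_favourable_trg(trg) and favourable_nodal else "non-DFR"


# Toy inputs, not patient data:
print(response_group(1, "cN2a", "ypN0"))  # DFR
print(response_group(2, "cN1b", "ypN0"))  # non-DFR (one-category downstaging only)
print(response_group(3, "cN2b", "ypN0"))  # non-DFR (TRG3)
```

Under this rule, only patients staged cN2 at baseline can reach two-category nodal downstaging, which is worth keeping in mind when interpreting DFR group sizes.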

Sample Collection and Nucleic Acid Extraction

Fecal samples were collected from patients at three timepoints: prior to the commencement of TNT (baseline), at the midpoint of therapy, and upon completion of the radiotherapy course. Samples were self-collected by patients either at home or in the hospital using sterile plastic specimen containers. The fecal samples were then transported immediately to the laboratory, where they were aliquoted upon arrival to minimize freeze-thaw cycles and stored at -80 °C for long-term preservation. DNA extraction was performed as previously described by our group [19,20] using the DNeasy® PowerSoil® Pro Kit (Qiagen, Germany, Cat. 47016) following the manufacturer’s instructions.

Shotgun Sequencing and Data Processing

Genomic DNA was randomly fragmented and subjected to end-repair, A-tailing, and Illumina adapter ligation at Novogene Bioinformatics Technology (Beijing, China). Adapter-ligated fragments were size-selected, PCR-amplified, and purified, with library quality and quantity assessed by Qubit fluorometry, real-time PCR, and bioanalyser-based size distribution profiling. Shotgun metagenomic sequencing was performed on an Illumina NovaSeq 6000 platform (Illumina, USA) using 150 bp paired-end reads, generating a minimum of 20 million reads per sample. Where necessary, DNA was re-extracted and purified prior to sequencing to meet required purity (OD260/280 = 1.8–2.0) and concentration (≥ 10 ng/µL) thresholds.
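The acceptance thresholds quoted above can be expressed as a simple check. This is a hypothetical helper, not part of the sequencing provider's or the authors' workflow: the record fields are our own, the DNA criteria apply before library preparation while the read-count criterion applies after sequencing, and the 20 million figure is treated as reads per sample exactly as stated.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    sample_id: str
    od260_280: float       # DNA purity ratio (pre-library check)
    conc_ng_per_ul: float  # DNA concentration (pre-library check)
    read_count: int        # reads obtained for the sample (post-sequencing check)


def qc_failures(s: SampleQC) -> list:
    """Return the thresholds the sample fails; an empty list means it passes."""
    failures = []
    if not (1.8 <= s.od260_280 <= 2.0):
        failures.append("OD260/280 outside 1.8-2.0")
    if s.conc_ng_per_ul < 10:
        failures.append("concentration below 10 ng/uL")
    if s.read_count < 20_000_000:
        failures.append("fewer than 20 million reads")
    return failures


# Hypothetical samples: the second would be re-extracted and purified.
print(qc_failures(SampleQC("S01", 1.92, 24.5, 23_400_000)))  # []
print(qc_failures(SampleQC("S02", 1.65, 7.8, 21_000_000)))
```

A sample failing only the DNA criteria corresponds to the re-extraction case described above.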
Raw sequencing quality was assessed using FastQC. Metagenomic analysis was conducted with the SqueezeMeta pipeline (v1.6.3) in co-assembly mode without binning [21]. Paired-end reads were assembled using MEGAHIT, followed by taxonomic and functional annotation via DIAMOND (v2.19) against the GenBank database [22,23]. Contig reliability was confirmed by mapping reads back to co-assembled sequences to evaluate coverage for downstream analyses. All computations were executed on the Komondor HPC system at the KIFÜ Hungarian High-Performance Computing Competence Centre, using 48 CPU cores and 90 GB RAM per sample.
For antibiotic resistance profiling, sequencing data underwent quality control using KneadData, incorporating Trimmomatic for adapter trimming and Bowtie2 for host read removal [24,25,26]. Antibiotic resistance genes were subsequently predicted using the Resistance Gene Identifier (RGI) software against the Comprehensive Antibiotic Resistance Database (CARD) [27]. Virulence factors were detected using AMRFinderPlus (https://github.com/ncbi/amr), with sequence assembly for this step performed using MEGAHIT (https://github.com/voutcn/MEGAHIT).

Statistical Analysis and Visualization

Variables are presented either as aggregate relative frequency (ARF) or as mean ± SD. All statistical analyses were performed using GraphPad Prism (version 8.0.1). Continuous variables were analysed using nonparametric tests. Three pre-specified families of comparisons were defined to reflect the hierarchical study design:
(i) RC patients versus non-RC controls. Differences between non-RC controls (n = 18) and RC patient subgroups (upper/middle rectum Fx0, n = 7; upper/middle rectum Fx28, n = 7; lower rectum Fx0, n = 6; lower rectum Fx28, n = 6) were assessed using the Kruskal-Wallis test followed by Dunn’s multiple comparisons test, restricted to the four pre-specified pairwise comparisons of interest.
(ii) Within-patient changes in the RC cohort. Changes between pre-treatment (Fx0) and end-of-treatment (Fx28) samples were evaluated using the Wilcoxon matched-pairs signed-rank test, performed separately for each stratification of interest (e.g., tumour location). Resulting p values within this family were adjusted for multiple comparisons using the Benjamini-Hochberg false discovery rate procedure (Q = 0.05).
(iii) Between-stratum comparisons within the RC cohort. Differences between strata (e.g., tumour location) at each timepoint were assessed using the Kruskal-Wallis test followed by Dunn’s multiple comparisons test, restricted to the two pre-specified pairwise comparisons of interest.
All statistical tests were two-tailed, and adjusted p values < 0.05 were considered statistically significant.
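The three comparison families can be illustrated with a dependency-free sketch. It is not the GraphPad Prism implementation: it computes Dunn’s pooled-rank z statistics (with the standard tie correction) only for the pre-specified pairs, applies a Bonferroni-type adjustment over the number of those pairs, which is our assumption about how the restricted comparisons are corrected, and implements the Benjamini-Hochberg step-up adjustment used for family (ii). The Kruskal-Wallis omnibus test that precedes Dunn’s test (available, for example, as `scipy.stats.kruskal`) is omitted to keep the sketch self-contained; group names and values are hypothetical.

```python
import math
from itertools import chain


def average_ranks(values):
    """1-based ranks of the pooled values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def dunn_test(groups, pairs):
    """Dunn's rank z-test for selected pairs, using ranks pooled over all groups.

    groups: dict name -> list of values; pairs: list of (name_a, name_b).
    Returns {pair: (z, raw_p, adjusted_p)}; adjusted_p multiplies by len(pairs)
    (Bonferroni-type; an assumption, not Prism's documented behaviour).
    """
    names = list(groups)
    pooled = list(chain.from_iterable(groups[n] for n in names))
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / (12 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12 - tie_term
    results = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p
        results[(a, b)] = (z, p, min(1.0, p * len(pairs)))
    return results


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical values only (not study data):
groups = {"controls": [0.1, 0.2, 0.15, 0.12], "low_Fx0": [0.4, 0.5, 0.45],
          "mid_Fx0": [0.3, 0.35, 0.25]}
print(dunn_test(groups, [("controls", "low_Fx0"), ("controls", "mid_Fx0")]))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In this sketch, the `pairs` argument encodes the restriction to pre-specified comparisons: the z statistics use ranks pooled over all groups, but only the requested pairs are tested and counted in the adjustment.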
Data visualization was performed using RStudio (version 2025.09.2-418). Bar plots and double pie charts were made with the ‘ggplot2’ package (v.4.0.1) [28]. An alluvial plot was generated using the ‘plotly’ package (v.4.11.0) [29]. Shannon diversity was calculated with the ‘vegan’ package (v.2.6-10) [30]. Venn diagrams were made with the ‘limma’ package (v.3.58.1) [31]. To evaluate the discriminative power of our baseline microbiome signals, the ‘pROC’ package (v.1.18.5) was used [32].
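For reference, the Shannon index returned by `vegan::diversity()` with its default settings is H' = -sum(p_i ln p_i), where p_i is the proportion of taxon i in the sample; vegan normalizes each sample to proportions and uses the natural logarithm by default. A minimal Python equivalent, written for illustration rather than taken from the authors' scripts, is:

```python
import math


def shannon(abundances):
    """Shannon index H' = -sum(p_i * ln p_i); zero counts contribute nothing.

    Counts or relative frequencies are both accepted, because the values are
    first normalized to proportions (as vegan::diversity does by default).
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample contains no abundance")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


print(shannon([25, 25, 25, 25]))  # equals ln(4), about 1.386 (maximal for 4 taxa)
print(shannon([97, 1, 1, 1]))     # much lower: one taxon dominates
```

Because H' rewards both richness and evenness, a fraction made up of many rare taxa (as in the low- versus high-abundance partition in Section 3.4) can reach a high index while contributing little biomass.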

3. Results

3.1. Cohort Characteristics and Pathology-Based Response Stratification

This retrospective cohort comprised 13 patients with locally advanced rectal cancer who underwent total neoadjuvant therapy followed by total mesorectal excision (TME). Patients were enrolled between 21 August 2023 and 8 April 2024. Eligibility was restricted to patients fit to receive neoadjuvant treatment, defined as an ECOG performance status of 0–2. Exclusion criteria included concomitant malignancy, previous radiotherapy at any anatomical site, prior chemotherapy, and evidence of distant metastatic disease.
Baseline clinical staging (AJCC 8th edition TNM classification, pelvic MRI) and postoperative pathological staging (ypTNM) were performed as described in the Materials and Methods. Patient-level distributions across baseline clinical stage, age group, tumour location, and pathology-based TNT response categories are summarized in Figure 1a.
Treatment response was assessed along two complementary dimensions, as defined in the Materials and Methods: primary tumour regression (Modified Ryan TRG1–2, favourable; TRG3, unfavourable) and nodal response (reduction in nodal category from cN to ypN).

3.2. Tumour Location Was Associated with Distinct Longitudinal Microbiome Trajectories During the Chemoradiotherapy Phase of TNT

We compared microbiome features between Fx0 and Fx28 after stratifying the cohort by tumour location (low rectum vs. mid/upper rectum). We next examined the balance between obligate anaerobic and aerotolerant taxa, expressed as an obligate anaerobe-to-aerotolerant balance in which negative values indicate predominance of aerotolerant taxa, positive values predominance of obligate anaerobes, and values near zero an approximate equilibrium (Figure 2a). In patients with low-rectum tumours, the baseline configuration at Fx0 was shifted toward aerotolerant taxa at the expense of obligate anaerobes, as reflected by a strongly negative balance (-0.84). By Fx28, however, this imbalance had largely resolved, with the balance approaching equilibrium and showing a slight predominance of obligate anaerobes (0.009). In contrast, the mid/upper-rectum subgroup displayed a persistent predominance of obligate anaerobic taxa at both time points, with a positive balance already at baseline (0.11) that increased further by Fx28 (1.08).
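Because the reported balance takes negative values under aerotolerant predominance and values near zero at equilibrium, it behaves like a log ratio rather than a plain ratio. The sketch below assumes a base-2 log ratio of the aggregate relative frequencies of the two groups; the log base, the inputs, and the function name are assumptions for illustration, not the authors' stated formula.

```python
import math


def anaerobe_balance(obligate_arf: float, aerotolerant_arf: float) -> float:
    """Assumed balance: log2(obligate / aerotolerant).

    < 0: aerotolerant taxa predominate; 0: equilibrium; > 0: obligate anaerobes predominate.
    The base-2 logarithm is an illustrative choice, not taken from the text.
    """
    if obligate_arf <= 0 or aerotolerant_arf <= 0:
        raise ValueError("both fractions must be positive for a log ratio")
    return math.log2(obligate_arf / aerotolerant_arf)


# Hypothetical fractions, not study data:
print(anaerobe_balance(0.40, 0.40))  # 0.0  (equilibrium)
print(anaerobe_balance(0.20, 0.40))  # -1.0 (aerotolerant taxa twice as abundant)
print(anaerobe_balance(0.40, 0.20))  # 1.0  (obligate anaerobes twice as abundant)
```

A log ratio is symmetric around zero, so a two-fold excess in either direction gives values of equal magnitude and opposite sign, which a plain ratio would not.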
We next assessed the aggregate relative frequency of oralisation-associated taxa across the two tumour-location subgroups (Figure 2b). At Fx0, both subgroups showed significantly elevated levels relative to non-RC controls (0.00015). In the low-rectum subgroup, the aggregate relative frequency reached 0.0012 (p = 0.0097 vs. controls). In the mid/upper-rectum subgroup, the corresponding baseline value was 0.005 (p = 0.0087 vs. controls), a markedly stronger enrichment, approximately 4.2-fold higher than in the low-rectum subgroup at the same time point. By Fx28, oralisation-associated taxa had declined in both subgroups, consistent with treatment-associated attenuation of this dysbiotic signal. The reduction was more pronounced in the mid/upper-rectum subgroup (p = 0.0156), in which the end-of-treatment level approached that of non-RC controls.
The aggregate relative frequency of biofilm-forming taxa was likewise elevated relative to non-RC controls (Figure 2c). However, pronounced enrichment was detected only in the low-rectum subgroup at Fx0, where the aggregate relative frequency reached 0.21, approximately 2.4-fold higher than in controls (0.087). In this subgroup, the value declined to 0.17 by Fx28, indicating a treatment-associated reduction, although it remained approximately 2.0-fold higher than the control level. By contrast, in the mid/upper-rectum subgroup, the aggregate relative frequency remained essentially unchanged between Fx0 and Fx28 (0.14 at both time points). Although this corresponded to an approximately 1.6-fold elevation relative to controls, the difference did not reach statistical significance (p > 0.05).
SCFA-producing taxa were grouped according to their predominant reported metabolic output into acetate-, butyrate-, and propionate-associated producer groups (Figure 2d). Across both tumour-location subgroups, the total aggregate relative frequency of SCFA-associated taxa remained below that observed in non-RC controls, indicating an overall depletion of SCFA-related functional potential in the rectal cancer cohort. However, the temporal direction of change differed by tumour location. In the low-rectum subgroup, the total SCFA-associated signal showed a slight, non-significant decline over TNT, despite an increase in the butyrate-associated fraction from 0.042 to 0.069 and a decrease in the acetate-associated fraction from 0.097 to 0.070. In the mid/upper-rectum subgroup, by contrast, the total SCFA-associated signal increased overall, with the most marked change observed in the propionate-associated fraction, which rose from 0.077 to 0.147. The butyrate-associated fraction remained relatively stable (0.056 to 0.047), whereas acetate-associated producers declined from 0.078 to 0.047. Notably, this upward shift in the mid/upper-rectum subgroup brought the overall SCFA-associated signal close to the control level by Fx28, whereas the low-rectum subgroup showed an overall decline despite internal restructuring of the individual SCFA-producer fractions.

3.3. Age-Associated Differences in Diversity, Potential Pathogen Load, and AMR Burden During TNT

An age-dependent divergence in longitudinal trajectories emerged after distinguishing patients aged <70 years from those aged ≥70 years (Figure 3a). In the ≥70-year subgroup, estimated biomass was higher at baseline but declined markedly by Fx28, with ARF falling from 0.38 to 0.30, an absolute decrease of 0.08 and a relative reduction of approximately 21.1%. By contrast, patients aged <70 years showed a modest increase in ARF between Fx0 and Fx28, from 0.312 to 0.35, an absolute increase of 0.038 and a relative increase of approximately 12.2%, consistent with a comparatively stable biomass profile over the same interval.
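The absolute and relative changes quoted above follow directly from the reported ARF values. A small helper, hypothetical and intended only for checking such figures, makes the convention explicit: relative change is expressed with respect to the Fx0 value.

```python
def changes(fx0: float, fx28: float):
    """Absolute change (Fx28 - Fx0) and relative change with respect to Fx0."""
    absolute = fx28 - fx0
    return absolute, absolute / fx0


for label, fx0, fx28 in [(">=70 years", 0.38, 0.30), ("<70 years", 0.312, 0.35)]:
    absolute, relative = changes(fx0, fx28)
    print(f"{label}: {absolute:+.3f} ({relative:+.1%})")
# >=70 years: -0.080 (-21.1%)
# <70 years: +0.038 (+12.2%)
```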
Assessment of the aggregate relative frequency of putative pathogen-associated taxa indicated a higher overall burden in patients aged ≥70 years than in those aged <70 years (Figure 3b), and this higher burden was maintained throughout the chemoradiotherapy phase of TNT. Averaged across the two time points, the aggregate relative frequency was 0.225 in the ≥70-year subgroup and 0.165 in the <70-year subgroup.
Longitudinally, this signal showed only a slight numerical decline in the older subgroup, from 0.23 at Fx0 to 0.22 at Fx28. By contrast, the <70-year subgroup showed a quantitatively larger decrease, from 0.18 to 0.15, corresponding to an approximately 16.7% reduction.
Because age was hypothesized to influence the accumulation of antimicrobial resistance (AMR)-associated features, AMR burden was compared across age strata over the course of treatment (Figure 3c). At baseline, patients aged ≥70 years showed a higher AMR burden than those aged <70 years (23,940 vs. 19,845), indicating greater initial accumulation of resistance-associated reads in the older subgroup. By Fx28, AMR burden had declined in both age strata, from 23,940 to 16,143 in the ≥70-year subgroup and from 19,845 to 16,991 in the <70-year subgroup. As a result, by the end of the chemoradiotherapy phase, AMR burden was slightly higher in patients aged <70 years than in those aged ≥70 years (16,991 vs. 16,143).

3.4. Response-Stratified Longitudinal Changes in Low- vs. High-Abundance-Partitioned Microbiome Diversity During TNT

Shannon diversity was assessed after dichotomizing patients into two TNT response-defined subgroups: a double-favourable response group (DFR), comprising patients with both a favourable pathological response (Modified Ryan TRG1–2) and a favourable nodal response (two-category nodal downstaging), and a non-double-favourable response group (non-DFR), comprising all remaining patients.
Although the difference was not statistically significant, the DFR group showed a slightly higher mean Shannon value than the non-DFR group (3.48 vs. 3.42, p > 0.05; Figure 4a).
We then examined longitudinal within-group changes over the course of TNT and observed opposite directional trends in the two TNT response-defined subgroups (Figure 4b). In the DFR group, mean Shannon increased from 3.38 at baseline to 3.62 by the end of treatment, whereas in the non-DFR group, it decreased from 3.53 to 3.42. Neither change reached statistical significance (p > 0.05).
We next classified detected species into low-abundance taxa (LAT) and high-abundance taxa (HAT) according to whether their relative frequency fell below or above the mean relative-frequency threshold (0.000344), and examined both the biomass contribution and taxonomic coverage of these two abundance fractions in the DFR and non-DFR groups (Figure 4c).
In both groups, LAT represented only a minor fraction of the estimated microbial biomass, but this fraction was slightly larger in DFR than in non-DFR (3% vs. 2%), whereas the remaining biomass was attributable to HAT. Thus, although the low-abundance compartment contributed only a limited share of the total biomass, its relative contribution was modestly greater in patients with double-favourable response.
In contrast, taxonomic coverage showed the opposite pattern. Overall, 90.1% of all detected species belonged to the LAT fraction, whereas only 9.9% were assigned to HAT, indicating that the microbiome was biomass-dominated by a relatively restricted set of high-abundance taxa, while the vast majority of species resided in the low-abundance background.
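The abundance partition can be sketched as follows. The function is hypothetical: it applies the mean relative frequency as the cut-off, assigns taxa exactly at the mean to HAT (the text specifies only "below or above"), and returns the LAT share of summed relative frequency (a proxy for biomass) and of taxonomic coverage. The toy profile is invented for illustration.

```python
def partition_by_mean(rel_freq):
    """Split taxa at the mean relative frequency into LAT (< mean) and HAT (>= mean).

    rel_freq: dict taxon -> relative frequency.
    Returns (threshold, LAT share of summed frequency, LAT share of taxa).
    """
    values = list(rel_freq.values())
    threshold = sum(values) / len(values)
    lat = [v for v in values if v < threshold]
    biomass_share = sum(lat) / sum(values)
    coverage_share = len(lat) / len(values)
    return threshold, biomass_share, coverage_share


# Toy profile: two dominant taxa and eight rare ones (hypothetical values).
profile = {"dominant_1": 0.60, "dominant_2": 0.36}
profile.update({f"rare_{i}": 0.005 for i in range(8)})
threshold, biomass, coverage = partition_by_mean(profile)
print(f"threshold={threshold:.3f}, LAT biomass={biomass:.0%}, LAT taxa={coverage:.0%}")
# threshold=0.100, LAT biomass=4%, LAT taxa=80%
```

If the relative frequencies of all N detected taxa sum to 1, the mean cut-off is simply 1/N. With right-skewed abundance profiles, such a cut-off places most taxa in LAT while HAT carries most of the biomass, the same qualitative pattern reported above (90.1% of species in LAT versus 2–3% of biomass).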
We next examined how Shannon diversity differed between the LAT and HAT fractions within the DFR and non-DFR groups (Figure 4d). The LAT fraction showed significantly higher Shannon diversity than the HAT fraction only in the non-DFR group, both at baseline (Fx0; non-DFR p = 0.0156; DFR p > 0.05) and at the end of TNT (Fx28; non-DFR p = 0.0156; DFR p > 0.05).
When longitudinal trends in Shannon diversity were examined, opposite directional changes were observed in the LAT and HAT fractions (Figure 4e). In both the DFR and non-DFR groups, Shannon diversity decreased in the LAT fraction from Fx0 to Fx28, falling from 4.23 to 3.98 in DFR and from 3.67 to 3.61 in non-DFR. By contrast, the HAT fraction showed an increase over the same interval, rising from 2.60 to 2.77 in DFR and from 2.22 to 2.43 in non-DFR.
Notably, although LAT-associated Shannon diversity declined in both subgroups during TNT, the Fx28 value in DFR (3.98) remained higher than the baseline Fx0 value in non-DFR (3.67). Likewise, in the HAT fraction, the baseline Shannon value in DFR (2.60) exceeded the Fx28 value observed in non-DFR (2.43), despite the longitudinal increase detected in both groups.

3.5. Potential Genotoxic Burden and Virulence-Associated Gene Profiles in LARC During TNT

We next quantified the aggregate relative frequency of a predefined, literature-informed panel of genotoxicity-associated taxa (GAT) as a species-level proxy for potential genotoxicity-related burden. This panel comprised species for which genotoxic activity has been described in specific strains or toxin-producing subsets in the mechanistic literature, including Escherichia coli, Campylobacter jejuni, Helicobacter pylori, Shigella dysenteriae, Shigella flexneri, Shigella sonnei, Aggregatibacter actinomycetemcomitans, Bacteroides fragilis, Clostridium perfringens, and Morganella morganii (see Supplementary File 1). The GAT panel should therefore be interpreted as an ecological enrichment signal rather than as direct evidence of functionally validated genotoxic strains in the analysed samples.
Using this predefined GAT panel, we quantified the aggregate relative frequency of these taxa across the TNT timeline (Figure 5a). Relative to non-RC controls (ARF = 0.084), the aggregate GAT signal was elevated at all evaluated patient time points, reaching 0.14 at Fx0, 0.17 at Fx14, and 0.134 at Fx28, corresponding to approximately 1.67-fold, 2.02-fold, and 1.60-fold higher abundance, respectively. The signal peaked at Fx14 and subsequently declined by Fx28, although it remained above the control reference level.
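A minimal sketch of the GAT aggregate and the fold-change arithmetic follows. It assumes, since the text does not say, that a sample's ARF is the sum of the relative frequencies of the panel species detected in it; how per-sample values are combined at group level (for example, by averaging) is left open. The sample profile is hypothetical, whereas the fold-change check uses the ARF values reported above.

```python
GAT_PANEL = {
    "Escherichia coli", "Campylobacter jejuni", "Helicobacter pylori",
    "Shigella dysenteriae", "Shigella flexneri", "Shigella sonnei",
    "Aggregatibacter actinomycetemcomitans", "Bacteroides fragilis",
    "Clostridium perfringens", "Morganella morganii",
}


def aggregate_relative_frequency(sample, panel):
    """Sum of relative frequencies of panel species in one sample (assumed definition)."""
    return sum(freq for species, freq in sample.items() if species in panel)


# Hypothetical species-level profile for one sample:
sample = {"Escherichia coli": 0.05, "Bacteroides fragilis": 0.03,
          "Faecalibacterium prausnitzii": 0.10}
print(round(aggregate_relative_frequency(sample, GAT_PANEL), 3))  # 0.08

# Fold changes relative to the control ARF reported in the text (0.084):
control = 0.084
for timepoint, arf in [("Fx0", 0.14), ("Fx14", 0.17), ("Fx28", 0.134)]:
    print(timepoint, f"{arf / control:.2f}")
# Fx0 1.67
# Fx14 2.02
# Fx28 1.60
```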
We then assessed whether this GAT-associated signal showed subgroup-specific differences at the end of TNT relative to baseline in the tumour location- and age-stratified groups (Figure 5b,c). With respect to tumour location (Figure 5b), GAT abundance remained higher than in non-RC controls at both evaluated treatment stages. In patients with low rectal tumours, ARF reached 0.27 at Fx0 and 0.16 at Fx28, corresponding to approximately 3.21-fold and 1.90-fold higher abundance than controls, respectively. In the mid/upper rectum subgroup, the corresponding values were 0.12 at Fx0 and 0.10 at Fx28, indicating more modest elevations of approximately 1.43-fold and 1.19-fold relative to controls.
A similarly marked divergence emerged after age stratification. Patients aged ≥70 years showed a consistently higher GAT burden than those aged <70 years at both evaluated treatment stages (Figure 5c). In the ≥70-year subgroup, ARF values were 0.19 at Fx0 and 0.20 at Fx28, corresponding to approximately 2.26-fold and 2.38-fold higher abundance than controls, respectively. By contrast, in patients aged <70 years, the corresponding values were 0.09 at Fx0 and 0.07 at Fx28. This represented only a marginal elevation above controls at baseline (~1.07-fold), followed by a decline to a level below the control reference by the end of TNT (~0.83-fold of control, i.e. approximately 16.7% lower).
Shotgun metagenomic profiling was also used to quantify the aggregate occurrence of virulence-associated genes (VAGs) with potential relevance to mucosal persistence, host interaction, and treatment-linked ecological fitness in the LARC-TNT setting (Figure 5d). As summarized in Supplementary File 2, the detected VAG repertoire was not restricted to classical toxin determinants, but also included genes related to adhesion, colonization, iron acquisition, immune evasion, and host adaptation. The lowest aggregate VAG count was observed in non-RC controls (40), whereas an intermediate count (82) was detected in patients with a double-favourable response (DFR), defined by favourable pathological response together with favourable nodal response. The highest aggregate VAG count (155) was found in the non-double-favourable response group (non-DFR), which included all remaining patients.

3.6. Exclusivity-Based Identification of Outcome-Associated Species Across TNT

A further aim of the study was to identify species showing response-exclusive occurrence within our response-based dichotomized groups (DFR vs. non-DFR) (Figure 6a). To improve robustness and translational relevance, screening was restricted to species showing above-mean abundance, thereby prioritizing candidate indicators expected to have more stable detectability.
Particular emphasis was placed on the treatment-naive baseline (Figure 6a/1), given its greater potential for early response stratification, while Fx14 (Figure 6a/2) and Fx28 (Figure 6a/3) were additionally examined.
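The exclusivity screen described above can be sketched in two steps: restrict to above-mean species, then keep species detected in one response group and in no sample of the other. The Python sketch below assumes that "above-mean" means a species' mean relative abundance across all samples exceeds the mean over all species, and that "exclusive occurrence" means detection (relative abundance > 0) in at least one sample of a group and in no sample of the other group. Both definitions, the toy data, and the function names are illustrative assumptions rather than the study's exact pipeline.

```python
from statistics import mean

Profiles = dict[str, dict[str, float]]  # sample_id -> {species: relative abundance}


def above_mean_species(abundance: Profiles) -> set[str]:
    """Species whose mean relative abundance across samples exceeds the mean over all species."""
    species = {sp for profile in abundance.values() for sp in profile}
    n_samples = len(abundance)
    per_species = {
        sp: sum(profile.get(sp, 0.0) for profile in abundance.values()) / n_samples
        for sp in species
    }
    threshold = mean(per_species.values())
    return {sp for sp, m in per_species.items() if m > threshold}


def exclusive_species(
    abundance: Profiles, groups: dict[str, str], candidates: set[str]
) -> dict[str, set[str]]:
    """For each group, candidate species detected in that group and in no other group."""
    detected: dict[str, set[str]] = {g: set() for g in groups.values()}
    for sample_id, profile in abundance.items():
        detected[groups[sample_id]].update(
            sp for sp, x in profile.items() if x > 0 and sp in candidates
        )
    return {
        g: found - set().union(*(s for h, s in detected.items() if h != g))
        for g, found in detected.items()
    }


# Hypothetical toy data: two favourable and two unfavourable patients.
abundance = {
    "p1": {"A": 0.3, "B": 0.6, "C": 0.1},
    "p2": {"A": 0.4, "B": 0.6},
    "p3": {"A": 0.3, "D": 0.7},
    "p4": {"A": 0.4, "D": 0.6},
}
groups = {"p1": "favourable", "p2": "favourable", "p3": "unfavourable", "p4": "unfavourable"}
candidates = above_mean_species(abundance)  # C falls below the mean and is dropped
print(exclusive_species(abundance, groups, candidates))
```

In this toy example, species C is also confined to the favourable group but is removed by the above-mean filter, which mirrors the emphasis on candidates expected to have more stable detectability.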
At baseline, the subset of high-abundance taxa comprised 79 detected species. Within this subset, Phocaeicola coprophilus emerged as the only species showing exclusive occurrence in the favourable group for both endpoints, being confined to the favourable nodal response group and, independently, to the favourable pathological response group (Figure 6a). Notably, no species with exclusive occurrence was detected in the unfavourable groups at Fx0 (Figure 6a/1).
However, the longitudinal behaviour of P. coprophilus differed between the two endpoints. In the pathological response-based analysis, P. coprophilus remained exclusively associated with the favourable pathological response group at all three evaluated time points (Figure 6a), indicating a stable association with favourable pathological response throughout TNT. This exclusivity pattern broadened over time: at Fx0, P. coprophilus was the sole exclusive species in the favourable pathological response group, whereas at Fx14 (Figure 6a/2) this pattern was shared with Bifidobacterium bifidum, and by Fx28 (Figure 6a/3) two additional species, Ligilactobacillus salivarius and Bifidobacterium pseudocatenulatum, also showed exclusive occurrence in the favourable pathological response group.
By contrast, in the nodal response-based analysis, P. coprophilus retained exclusivity for the favourable nodal response group only at baseline (Figure 6a/1). At Fx14 (Figure 6a/2) and Fx28 (Figure 6a/3), it was no longer confined to that group. Instead, the only species showing exclusive occurrence in the non-favourable nodal response group at both post-baseline time points was Candidatus Evtepia excrementipullorum.
The corresponding longitudinal relative-frequency profiles of P. coprophilus across the same response-defined subgroup comparisons are shown in Figure 6b.

3.7. Integration of Baseline Microbiome Biomarkers into a Response Score

To integrate Fx0 baseline microbiome signals, we combined a shared species-level marker with two endpoint-specific community-level distance classifiers into a single exploratory framework (Figure 7).
Among all baseline high-abundance taxa (HAT), Phocaeicola coprophilus emerged as the only species-level signal consistently associated with a favourable outcome across both principal clinical endpoints, namely favourable pathological response and favourable nodal response.
Its diagnostic performance was first evaluated against a combined binary outcome defined as the double-favourable response group (DFR), comprising patients with both favourable pathological response (Modified Ryan TRG1–2) and favourable nodal response, defined here as two-category nodal downstaging, versus the non-double-favourable response group (non-DFR), which included all remaining patients. Under this definition, the baseline P. coprophilus signal yielded an AUC of 0.90, with a sensitivity of 1.00 and a specificity of 0.80, supporting its role as a shared favourable baseline marker (Figure 7a).
We next assessed whether baseline global ecological deviation from the non-RC reference microbiome configuration could also discriminate subsequent treatment outcome. For this purpose, a non-RC reference centroid was first generated from the control samples and used as an ecological reference point. The Reference Distance Index (RDI) was then calculated for each treatment-naive baseline sample (Fx0) as the Bray-Curtis distance from this non-RC reference centroid. Thus, higher RDI values indicated greater baseline deviation from the non-RC reference microbiome. ROC analyses were subsequently performed separately for the two clinical endpoints: favourable versus unfavourable pathological response, and favourable nodal response versus no nodal downstaging (Supplementary File 3).
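A minimal Python sketch of the RDI computation follows. It assumes that the non-RC reference centroid is the arithmetic mean of the control relative-abundance profiles, which the text does not specify. The Bray-Curtis formula itself (sum of absolute differences divided by the sum of all abundances) is the standard definition and matches `scipy.spatial.distance.braycurtis` for non-negative inputs. Function names and toy profiles are hypothetical.

```python
def species_union(*profiles: dict[str, float]) -> list[str]:
    """Sorted list of every species seen in any profile (fixes the vector order)."""
    return sorted({sp for profile in profiles for sp in profile})


def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def reference_centroid(controls: list[dict[str, float]], species: list[str]) -> list[float]:
    """Assumed centroid: per-species mean relative abundance across control samples."""
    n = len(controls)
    return [sum(p.get(sp, 0.0) for p in controls) / n for sp in species]


def reference_distance_index(sample: dict[str, float], centroid: list[float], species: list[str]) -> float:
    """RDI: Bray-Curtis distance of one baseline sample from the control centroid."""
    vector = [sample.get(sp, 0.0) for sp in species]
    return bray_curtis(vector, centroid)


# Hypothetical toy profiles (relative abundances summing to 1).
controls = [{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]
patient = {"A": 0.2, "C": 0.8}
species = species_union(*controls, patient)
centroid = reference_centroid(controls, species)
rdi = reference_distance_index(patient, centroid, species)
print(f"RDI = {rdi:.2f}")
```

Higher values indicate greater deviation from the control configuration; a sample sharing no species with the centroid would reach the maximum value of 1.0.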
For pathological response, the optimal RDI cut-off was 0.77, yielding an AUC of 0.926, a sensitivity of 1.00, and a specificity of 0.89 (Figure 7b). For nodal response, the optimal cut-off was 0.79, yielding an AUC of 0.80, a sensitivity of 0.85, and a specificity of 0.80 (Figure 7c). Based on these results, three binary baseline components were defined for score construction.
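The reported Youden indices follow directly from the sensitivity and specificity at each optimal cut-off (J = sensitivity + specificity − 1; for example, 0.85 + 0.80 − 1 = 0.65 for the nodal classifier). The Python sketch below shows one common way to obtain an AUC (the Mann-Whitney formulation) and a Youden-optimal cut-off from per-patient scores. It assumes the rule "score ≥ cut-off predicts the favourable class"; because the text does not state on which side of the RDI cut-off the favourable class lies, a lower-is-favourable score would be passed in negated. Function names and toy data are hypothetical.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: list[float], labels: list[int]) -> tuple[float, float, float, float]:
    """Return (cut-off, sensitivity, specificity, J) maximizing J for the rule 'score >= cut-off'."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0  # Youden index
        if best is None or j > best[3]:
            best = (t, sens, spec, j)
    return best


# Hypothetical scores; label 1 = favourable response.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 1, 0, 1, 0]
print(roc_auc(scores, labels), youden_cutoff(scores, labels))
```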
First, the P. coprophilus indicator (PCI) was derived by dichotomizing the baseline P. coprophilus signal according to its ROC-based favourable-side threshold obtained from the DFR versus non-DFR analysis.
Second, the pathological response-specific classifier (CTRG) was coded according to whether the baseline RDI fell on the favourable side of the TRG-specific cut-off.
Third, the nodal response-specific classifier (CND) was coded according to whether the baseline RDI fell on the favourable side of the nodal-response-specific cut-off.
The exploratory Microbiome Baseline Score (MBS) was then constructed as a weighted combination of these three binary components, thereby integrating the shared species-level signal with the two endpoint-specific ecological classifiers. Component weights were assigned according to their ROC-derived Youden indices, which were 0.80 for PCI, 0.89 for CTRG, and 0.65 for CND. After normalization, the corresponding weights were 0.34, 0.38, and 0.28, respectively, yielding the following score:
MBS = 0.34 × PCI + 0.38 × CTRG + 0.28 × CND
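The weight normalization can be checked with a few lines of Python. The sketch recomputes each Youden index from the reported sensitivity and specificity, divides each by their sum (2.34), and rounds to two decimals. The dictionary and function names are illustrative.

```python
# Reported (sensitivity, specificity) at each component's optimal cut-off.
REPORTED = {"PCI": (1.00, 0.80), "CTRG": (1.00, 0.89), "CND": (0.85, 0.80)}


def youden_index(sensitivity: float, specificity: float) -> float:
    return sensitivity + specificity - 1.0


def normalized_weights(performance: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Divide each component's Youden index by the sum of all indices."""
    j = {name: youden_index(se, sp) for name, (se, sp) in performance.items()}
    total = sum(j.values())
    return {name: value / total for name, value in j.items()}


weights = {name: round(w, 2) for name, w in normalized_weights(REPORTED).items()}
print(weights)  # {'PCI': 0.34, 'CTRG': 0.38, 'CND': 0.28}
```

After rounding, the three weights sum to 1.00, so the MBS ranges from 0 to 1.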
When evaluated against the same DFR versus non-DFR outcome definition, the MBS achieved optimal performance at a cut-off of 0.52, with a sensitivity of 0.875 and a specificity of 1.00 (Figure 7d).
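Because PCI, CTRG, and CND are binary, the MBS can take only eight distinct values. Assuming each component is coded 1 when favourable and 0 otherwise, and that patients with MBS ≥ 0.52 are assigned to the favourable side, the enumeration below shows that every single component scores below 0.52 and every pair scores above it. Under these assumptions, the cut-off therefore behaves as a two-of-three rule. This follows from the stated weights and is not an additional result; the code is an illustrative sketch.

```python
from itertools import product

WEIGHTS = (0.34, 0.38, 0.28)  # PCI, CTRG, CND
CUTOFF = 0.52


def mbs(pci: int, ctrg: int, cnd: int) -> float:
    """Weighted sum of the three binary baseline components."""
    return sum(w * c for w, c in zip(WEIGHTS, (pci, ctrg, cnd)))


for components in product((0, 1), repeat=3):
    score = mbs(*components)
    side = "favourable" if score >= CUTOFF else "non-favourable"
    print(components, f"MBS={score:.2f}", side)
```

Since no attainable score lies between 0.38 and 0.62, any cut-off within that interval would classify patients identically.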

4. Discussion

The present cohort was assembled within a contemporary total neoadjuvant therapy-based treatment framework for locally advanced rectal cancer. In this LARC-TNT pilot, baseline staging was established by pelvic MRI, followed by pathology-anchored post-treatment assessment using ypTNM after total mesorectal excision.
An important methodological strength of the study is that treatment benefit was defined using two complementary pathology-based dimensions: (i) favourable pathological response and (ii) favourable nodal response. By capturing both tumour-level regression and lymph node-level treatment effect, our framework provides a biologically more informative assessment of therapeutic efficacy. Furthermore, requiring a two-category decrease in nodal stage for a favourable nodal response further increased the biological stringency of the response definition. Our cohort also represents an older real-world population than many landmark TNT datasets, in which median ages have generally clustered in the early sixties and older adults have remained underrepresented.
The published microbiome literature in locally advanced rectal cancer remains limited, and longitudinal studies explicitly stratified by tumour location (low rectum vs. mid/upper rectum) appear to be uncommon. As summarized in the recent literature, the field has so far been dominated by studies aiming to predict pathological response or treatment-related toxicity from baseline or on-treatment microbiome features, mostly in neoadjuvant chemoradiotherapy settings rather than in anatomically stratified longitudinal designs [33].
This gap is clinically relevant because tumour location is already embedded in contemporary LARC decision-making, and more broadly, colorectal cancer microbiome research has only recently begun to show that fecal microbiome architecture differs according to tumour topography and that location-aware models may outperform location-agnostic biomarker strategies [34,35].
TNT did not induce a uniform redox-ecological shift across tumour locations. In low-rectum cases, treatment appeared to correct a baseline aerotolerant-skewed imbalance toward near-equilibrium, whereas mid/upper-rectum cases already displayed a modest obligate-anaerobe-favoured profile at baseline, which became even more pronounced by the end of TNT. This indicates that TNT-associated microbial remodelling is not merely treatment-driven but is also shaped by tumour topography: low-rectum cases move toward redox-ecological balance, whereas mid/upper-rectum cases consolidate an obligate-anaerobe-dominant state. This is consistent with the hypothesis that low-rectum tumours may arise in, or be associated with, a more aerotolerant-shifted microbial environment that partially normalizes during treatment, whereas mid/upper-rectum tumours may already occupy a more anaerobe-favoured niche that becomes further reinforced by TNT [36].
Baseline enrichment of oralisation-associated taxa was observed in both tumour-location subgroups, but this pattern is unlikely to reflect distal barrier injury alone. If oralisation were primarily driven by local distal damage, the strongest signal would be expected in low-rectum tumours. Instead, enrichment was more pronounced in mid/upper-rectum cases relative to controls, suggesting that oralisation represents a broader tumour-associated ecological disturbance [37,38]. This interpretation is consistent with colorectal cancer studies linking oral-associated taxa, such as Fusobacterium, Peptostreptococcus, Parvimonas, Porphyromonas, Prevotella, and Streptococcus, to inflammatory and tumour-associated microbial niches [39]. The decline of these taxa by the end of TNT in both subgroups further suggests that oralisation was partly treatment-responsive and more closely related to the baseline tumour-associated niche than to therapy-induced barrier injury alone.
The behaviour of the inferred SCFA-producer taxa should be considered in light of an important methodological limitation. The SCFA-related groupings used here were taxon-based and assigned according to the predominantly reported metabolic orientation of each species, rather than derived from direct metabolite measurements, and individual taxa may contribute to more than one SCFA depending on ecological context.
Even with this caveat, the overall directional patterns remained biologically informative. In the mid/upper rectum subgroup, the aggregate SCFA-associated signal moved toward control-like levels by Fx28, whereas in the low-rectum subgroup, it declined further below baseline. Given that colorectal cancer is commonly associated with depletion of butyrate-producing bacteria, reduced fecal butyrate and, to a lesser extent, acetate [40], the mid/upper pattern appears more compatible with partial restoration of fermentative community function during TNT, whereas the low-rectum pattern suggests persistence, or possibly deepening, of fermentation-related ecological fragility.
Notably, the low-rectum subgroup displayed the lowest overall SCFA-associated signal but the highest relative contribution of butyrate-associated producers. This does not necessarily imply preserved butyrate-generating capacity in absolute terms; rather, it is more consistent with a functionally narrowed residual community in which butyrate-associated taxa account for a larger proportion of a quantitatively contracted fermentative pool. Such a configuration is compatible with reduced functional redundancy, implying that an apparently favourable butyrate-enriched fraction may persist within an overall metabolically depleted and potentially less resilient system [41].
By contrast, the mid/upper rectum subgroup, in which the propionate-associated fraction contributed most strongly while the total SCFA-associated signal recovered, may have undergone broader fermentative reassembly through a relatively more propionogenic route. This is notable because propionate also contributes to mucosal and immune homeostasis, and acetate/propionate flux can participate in cross-feeding networks that support downstream butyrate production [42,43].
Age-based dichotomization at 70 years was clinically justified in the present cohort because older adults remain underrepresented in TNT-oriented rectal cancer trial populations, whereas rectal cancer is predominantly diagnosed later in life. In major TNT studies, median ages have generally been around 61–62 years, and only 12% of patients in PRODIGE-23 were aged ≥70 years [44]. Against this background, the age-associated microbiome separation observed in our cohort should be regarded as hypothesis-generating evidence that host age context may be associated with treatment-linked microbial behaviour during TNT.
Based on our data, patients aged ≥70 years showed higher treatment-naive aggregate relative frequency, a higher baseline burden of putative pathogen-associated taxa, and a higher baseline AMR burden than patients aged <70 years, consistent with a microbiome configuration at treatment initiation that may be ecologically less favourable and enriched in pathobiont- and resistance-associated signals [45]. Such an interpretation is biologically plausible, because ageing has repeatedly been linked to reduced microbiome resilience, greater instability under stress, and increased susceptibility to expansion of inflammation- and pathogen-associated community members [46,47].
In parallel, colorectal cancer-associated microbiomes are also known to be enriched in virulence- and resistance-linked functions [48], making it plausible that age and disease context may converge to amplify ecological vulnerability at treatment initiation.
Within this LARC-TNT framework, the modest decline in aggregate relative frequency by Fx28 in the ≥70-year-old subgroup is of particular interest, as it may indicate an initially expanded but ecologically fragile microbial configuration. Importantly, this biomass contraction occurred alongside only a minimal reduction in the pathogen-related signal. By contrast, the <70-year-old subgroup displayed a more stable biomass trajectory and a modest decline in putative pathogen-associated taxa during the chemoradiotherapy phase of TNT, starting from a lower baseline AMR burden.
At baseline, therapy-naive patients aged ≥70 years had a significantly higher AMR burden in the gut microbiome than those aged <70 years, and they showed the most pronounced longitudinal decline. By the end of TNT, the AMR load in the ≥70-year-old group had fallen below that observed in the younger subgroup. To our knowledge, this age- and time-dependent reversal in AMR dynamics has not been previously reported in LARC cohorts receiving TNT, although age-related accumulation of resistance genes in the gut resistome has been described in other settings [49,50].
Although Shannon diversity differed only minimally between DFR and non-DFR at the cohort level, the two groups showed opposite directional trajectories during TNT, with a numerical increase in DFR and a decrease in non-DFR. Even though neither change reached statistical significance, the contrasting temporal pattern may still be biologically informative, because it suggests that a favourable combined TNT response was associated not simply with higher diversity per se, but with preservation or partial restoration of microbiome complexity over the course of treatment. Conversely, the declining trend in non-DFR may indicate a less resilient microbiome configuration, potentially more susceptible to treatment-associated ecological contraction.
Most estimated biomass was carried by high-abundance taxa (HAT), whereas most detected species belonged to the low-abundance taxa (LAT) fraction. This LAT/HAT distinction is not entirely novel from a microbial ecology perspective, because microbial communities typically show a strongly skewed abundance structure, with a relatively small dominant fraction coexisting with a much larger low-abundance or rare biosphere background [18]. What makes the distinction useful here is that it separates two ecological properties that are often conflated in bulk microbiome summaries: biomass dominance and taxonomic breadth. In the broader literature, low-abundance taxa are increasingly recognized as biologically meaningful community members rather than analytical noise, with evidence suggesting that they can contribute disproportionately to biodiversity, species turnover, community dynamics, and ecosystem function [18].
This partition matters because global Shannon values alone may obscure whether ecological complexity is concentrated in the dominant fraction or in the low-abundance background. Thus, a central message of our findings is that microbiome complexity in LARC-TNT was embedded not primarily in the dominant biomass-bearing taxa, but in the low-abundance background.
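The HAT/LAT partition can be illustrated on a single relative-abundance profile. The Python sketch assumes that taxa are split at the mean relative abundance of the species detected in that sample, and that Shannon diversity (natural logarithm) is computed within each fraction after renormalization. These are illustrative assumptions, and the toy profile is hypothetical. The sketch shows how a few dominant taxa can carry most of the biomass while a larger rare background carries most of the within-fraction diversity.

```python
import math


def shannon(values: list[float]) -> float:
    """Shannon index (natural log) of a set of abundances after renormalizing to sum 1."""
    total = sum(values)
    if total <= 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def partition_by_mean(profile: dict[str, float]) -> tuple[dict[str, float], dict[str, float]]:
    """Split detected species into above-mean (HAT) and at-or-below-mean (LAT) fractions."""
    detected = {sp: a for sp, a in profile.items() if a > 0}
    threshold = sum(detected.values()) / len(detected)
    hat = {sp: a for sp, a in detected.items() if a > threshold}
    lat = {sp: a for sp, a in detected.items() if a <= threshold}
    return hat, lat


def summarize(profile: dict[str, float]) -> dict[str, dict[str, float]]:
    """Species count, biomass share, and within-fraction Shannon index for HAT and LAT."""
    total = sum(profile.values())
    summary = {}
    for name, fraction in zip(("HAT", "LAT"), partition_by_mean(profile)):
        summary[name] = {
            "n_species": len(fraction),
            "biomass_share": sum(fraction.values()) / total,
            "shannon": shannon(list(fraction.values())),
        }
    return summary


# Hypothetical profile: two dominant taxa plus ten rare taxa.
profile = {"dominant_1": 0.40, "dominant_2": 0.30, **{f"rare_{i}": 0.03 for i in range(10)}}
for name, stats in summarize(profile).items():
    print(name, stats["n_species"], round(stats["biomass_share"], 2), round(stats["shannon"], 2))
```

Here, HAT holds about 70% of the biomass in two species, whereas LAT holds ten species and a higher within-fraction Shannon index (ln 10 ≈ 2.30 versus about 0.68).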
Our data suggest that TNT remodelled, but did not eradicate, a microbiome layer enriched in taxa and gene functions linked to potential genotoxicity and virulence, an interpretation supported by two concordant observations. First, the aggregate signal of genotoxicity-associated taxa remained elevated above the non-RC reference at all patient time points and peaked at Fx14. Existing longitudinal LARC studies have shown that neoadjuvant chemoradiotherapy dynamically reshapes the gut microbiome, whereas mechanistic colorectal cancer research indicates that the strongest evidence for microbiome-linked genotoxicity centres on pks-positive Escherichia coli, enterotoxigenic Bacteroides fragilis, and CDT-producing Campylobacter spp. [51]. The Fx14 peak was particularly notable because it is compatible with a mid-treatment stress-window model, presumably arising from the ability of fractionated chemoradiotherapy to reshape the rectal niche through barrier injury, inflammatory activation, and ecological filtering, rather than through uniform suppression of all potentially harmful taxa. Second, the virulence gene-level signal was lowest in non-RC controls and in patients with double-favourable response, but higher in non-DFR patients.
Current biomarker research in LARC has focused mainly on tumour- and host-derived response signals, such as transcriptomic, immune, and genomic features [52], while microbiome studies have mostly examined taxonomic predictors of response or toxicity rather than a composite virulence-gene burden [53].
Importantly, based on shotgun metagenomic profiling, the virulence-gene signal in our cohort was not dominated by classical genotoxins. Instead, it was mainly characterized by a broader repertoire of persistence-associated functions, including genes related to adhesion, colonization, iron acquisition, host survival, and immune evasion. Recurrently detected examples included iss, fdeC, fimbrial or colonization-associated genes (lpfA-O113, sfaS, sfaF/focD), and several siderophore modules, particularly salmochelin (iroB/iroC/iroD/iroE/iroN), aerobactin (iucA/iucB/iucC/iucD/iutA), and yersiniabactin transport (ybtP/ybtQ). This pattern is consistent with a microbiome-derived virulence profile supporting mucosal persistence and microbial fitness under inflammatory and iron-restricted conditions, rather than a purely toxin-driven state. Genotoxic markers, including clbB/clbN and cdtB, were also detected, but represented a smaller component within this broader virulence architecture. Overall, the adverse microbial signal in LARC appears to reflect not only potential DNA-damaging capacity, but also a wider pathobiont-like ecological program that may favour persistence within the treatment-perturbed rectal niche.
Phocaeicola coprophilus emerged in our dataset as a baseline candidate marker of favourable response in the LARC-TNT setting, and its potential translational relevance rests on three convergent observations in our cohort: it was detectable in baseline samples before treatment, it was confined to the DFR group at Fx0, and its separation from the non-DFR group became more pronounced during TNT. This pattern is notable because published LARC microbiome biomarker studies, as well as recent reviews, have focused predominantly on broader taxonomic shifts, pathway-level alterations, or multi-feature response signatures [54,55], rather than on P. coprophilus itself as a species-level marker of treatment response.
The biological plausibility of this signal is supported by emerging mechanistic and clinicopathological evidence. Recent work showed that radiation-induced intestinal fibrosis was associated with depletion of P. coprophilus, increased kynurenine levels, and activation of the IDO1–kynurenine–AHR axis, whereas restoration of the species, or supplementation with its metabolite 6-methyluracil, attenuated fibrogenesis [56]. In the context of EBRT-based TNT, this provides a biologically credible framework in which persistence of P. coprophilus may reflect a less fibrosis-permissive and more treatment-compatible mucosal state under radiation stress.
A complementary, albeit indirect, line of support comes from distal colorectal cancer, where reduced mucosa-associated Phocaeicola abundance was linked to nodal and distant metastasis and correlated positively with CD3+ and CD8+ tumour-infiltrating lymphocytes [43]. Although those observations were made primarily at the genus level and cannot be directly extrapolated to P. coprophilus, they are directionally consistent with the concept that Phocaeicola abundance tracks with a less aggressive and more immune-supportive local tumour ecosystem [56,57].
At the same time, interpretation of this signal requires caution. In microbiome datasets, apparent absence should not be equated with confirmed biological absence, but rather understood primarily as non-detectability within the constraints of the method and dataset [58].
Within this framework, P. coprophilus is more plausibly interpreted not as an isolated protective factor, but as a sentinel of a more resilient and ecologically preserved baseline microbiome state.
In this proof-of-concept exploratory analysis, the Microbiome Baseline Score was established as a composite stool microbiome metric that integrates a species-level signal with treatment response-specific ecological distance classifiers at the therapy-naive baseline. The findings suggest that clinically relevant response-associated information may already be embedded in the baseline gut microbiome and can be summarized in a clinically interpretable form before TNT initiation.
The proposed framework is based on a simple logic. First, each baseline microbiome profile is compared with a non-RC reference centroid generated from control samples, yielding the Reference Distance Index, which reflects the degree of deviation from the reference microbiome state. The RDI is then assessed against two endpoint-specific cut-offs, one for pathological response and one for nodal response, and the corresponding classifiers are coded according to whether the sample falls on the favourable side of each threshold. In parallel, the baseline Phocaeicola coprophilus signal is evaluated and converted into a binary PCI component. These three components are then combined into the weighted MBS, with higher values indicating a baseline microbiome profile more consistent with the double-favourable response pattern.
Translationally, the MBS provides a concise way to condense complex pre-treatment microbiome information into a single patient-level score. Rather than implying a direct causal role for the microbiome or for any individual species, the score should be interpreted as a summary of a broader microbial state that may reflect treatment compatibility and mucosal resilience. Although not intended as a ready-to-use clinical decision tool, the MBS may provide a hypothesis-generating framework for future microbiome-informed pre-treatment stratification in LARC.

5. Conclusions

This LARC-TNT pilot was designed within a clinically contemporary treatment framework and captured response through both primary tumour regression and nodal downstaging, thereby providing a biologically stringent basis for microbiome-response analyses.
Tumour topography emerged not merely as an anatomical descriptor, but as a determinant of baseline microbial organization and treatment-linked ecological remodelling during TNT.
The microbiome changes observed during TNT were not uniform, but anatomically patterned, with evidence suggesting location-dependent restructuring of oralisation-associated and anaerobe–aerotolerant community features.
Functional interpretation of the inferred SCFA-related compartment suggests that mid/upper rectal tumours were associated with partial fermentative recovery during TNT, whereas low-rectal tumours retained signs of ecological and metabolic fragility.
Older age was associated with a more vulnerable therapy-naive microbiome state, characterized by higher pathogen-associated and AMR-linked burden and by reduced ecological resilience during treatment.
Our data revealed an age-dependent reversal in antimicrobial resistance dynamics: patients aged ≥70 years entered TNT with a higher baseline AMR burden, yet showed the steepest decline over treatment, resulting in a lower AMR load than younger patients by the end of chemoradiotherapy.
Microbiome complexity in LARC-TNT was embedded predominantly in the low-abundance background rather than in the dominant biomass-bearing core, highlighting the low-abundance fraction as a biologically meaningful reservoir of ecological resilience.
Favourable TNT response was associated less with higher diversity per se than with preservation, or partial restoration, of microbiome complexity over time.
TNT remodelled, but did not eradicate, a microbiome layer enriched in genotoxicity- and virulence-associated signals, consistent with persistence of a treatment-perturbed pathobiont-like ecological program.
The adverse microbial signal in non-favourable responders was defined by a broader persistence-oriented virulence repertoire linked to adhesion, colonization, iron acquisition, and host adaptation.
Phocaeicola coprophilus emerged as a pre-treatment-accessible, response-associated candidate marker whose detectability may index a more resilient and treatment-compatible baseline microbial state.
Our results support the concept that clinically relevant response-associated information may already be embedded in the therapy-naive gut microbiome and can be condensed into a proof-of-concept baseline score for future microbiome-informed pre-treatment stratification in LARC.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org: S1: List of genotoxicity-associated taxa, with supporting references; S2: List of virulence-associated genes (VAGs) identified in the study cohort; S3: Fx0 Bray-Curtis distances from the non-RC reference centroid, with the corresponding patients and their post-treatment responses.

Author Contributions

KG and MB recruited patients and provided biological samples. MM, DP, ESZT, and KB processed the samples, extracted nucleic acids, and performed shotgun sequencing. MM and PF carried out the bioinformatic and statistical analyses and prepared the figures. DS, SZM, JM, KT, ECS, and MS contributed to data interpretation. JR shaped the conceptual framework and critically revised the manuscript for important intellectual content. ÁK and MP conceived and supervised the study, guided the analyses, and interpreted the data. All authors read and approved the final manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study protocol received ethical approval from the Institutional Review Board of the University of Debrecen (ethics approval number: DE RKEB/IKEB 6474-2023). All procedures were conducted in accordance with the principles of the Declaration of Helsinki and applicable institutional and national regulatory guidelines. All participants received a full explanation of the study by the responsible clinician and provided written informed consent before enrolment.

Data Availability Statement

All sequence data used in the analyses were deposited in the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra) under PRJNA1254622. Until publication, please use the reviewer’s link: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1254622?reviewer=2nsk9sqdrga47durci9tjee88d.

Acknowledgments

The study was supported by the University of Debrecen Scientific Research Bridging Fund (DETKA). MM was supported by the PhD Excellence Scholarship from the Count István Tisza Foundation for the University of Debrecen. Supported by the University of Debrecen Program for Scientific Publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Steike, D.R.; Pepper, N.B.; Gravemeyer, S.; Bremer, A.; Glasbrenner, B.; Brüwer, M.; Pascher, A.; Domagk, D.; Biermann, L.; Lenz, P.; et al. Total neoadjuvant therapy for locally advanced rectal cancer: barriers to implementation in real-world practice. J. Cancer Res. Clin. Oncol. 2026, 152, 31.
  2. Boublikova, L.; Novakova, A.; Simsa, J.; Lohynska, R. Total neoadjuvant therapy in rectal cancer: the evidence and expectations. Crit. Rev. Oncol. 2023, 192, 104196.
  3. Brunsell, T.H.; Cengija, V.; Sveen, A.; Bjørnbeth, B.A.; Røsok, B.I.; Brudvik, K.W.; Guren, M.G.; Lothe, R.A.; Abildgaard, A.; Nesbakken, A. Heterogeneous radiological response to neoadjuvant therapy is associated with poor prognosis after resection of colorectal liver metastases. Eur. J. Surg. Oncol. (EJSO) 2019, 45, 2340–2346.
  4. Johnson, G.G.; Park, J.; Helewa, R.M.; Goldenberg, B.A.; Nashed, M.; Hyun, E. Total neoadjuvant therapy for rectal cancer: a guide for surgeons. Can. J. Surg. 2023, 66, E196–E201.
  5. Lavery, A.; Turkington, R.C. Transcriptomic biomarkers for predicting response to neoadjuvant treatment in oesophageal cancer. Gastroenterol. Rep. 2020, 8, 411–424.
  6. Yacoub, H.; Zenzri, Y.; Cherif, D.; Ben Mansour, H.; Attia, N.; Mokrani, C.; Ben Zid, K.; Letaief, F.; Maamouri, N.; Mezlini, A. Predictors of pathological complete response after total neoadjuvant treatment using short course radiotherapy for locally advanced rectal cancer. BMC Gastroenterol. 2025, 25, 1–8.
  7. Duan, T.; Ren, Z.; Jiang, H.; Ding, Y.; Wang, H.; Wang, F. Gut microbiome signature in response to neoadjuvant chemoradiotherapy in patients with rectal cancer. Front. Microbiol. 2025, 16, 1543507.
  8. Garvey, M. Intestinal Dysbiosis: Microbial Imbalance Impacts on Colorectal Cancer Initiation, Progression and Disease Mitigation. Biomedicines 2024, 12, 740.
  9. Badero, O.J.; Meribole, E.S.; Omokore, O.; O Quadri, I.; Kingdom, P.; Ifeanyichukwu, O.-C.C.; O Ogunnoiki, S.; Samuel-Ogunnoiki, P.M.; Adeyoola, O.; Osibowale, B.; et al. Gut Microbiota and Colorectal Cancer: Is Microbial Dysbiosis in Carcinogenesis an Emerging Risk Factor? Cureus 2026, 18.
  10. Gao, Y.; Zeng, B.; Wang, Z.; Liang, S.; Yang, Y. Faecalcrobiota metabolites: emerging insights into cancer radiotherapy outcomes. Front. Microbiol. 2025, 16, 1663835.
  11. Lu, L.; Li, F.; Gao, Y.; Kang, S.; Li, J.; Guo, J. Microbiome in radiotherapy: an emerging approach to enhance treatment efficacy and reduce tissue injury. Mol. Med. 2024, 30, 1–36.
  12. Lin, Y.-E.; Chang, T.-H.; Chou, T.-W.; Lin, J.-C.; Hung, L.-C.; Huang, C.-C.; Lin, J.-B.; Lee, J. Predictive factors of pathologic good response after neoadjuvant chemoradiotherapy for rectal cancer. Discov. Oncol. 2025, 16, 1–12.
  13. Assaf, D.; Lawrence, Y.; Margalit, O.; Shacham-Shmueli, E.; Bear, L.; Elbaz, N.; Lebedayev, A.; Ram, E.; Anderson, Y.; Gruper, O.; et al. Predictors and Long-Term Outcomes of Pathological Complete Response Following Neoadjuvant Treatment and Radical Surgery for Locally Advanced Rectal Cancer. J. Clin. Med. 2025, 14, 4251.
  14. Zhang, W.; Fu, X.; Wen, L.; Yang, Y.; Zhang, D. Predicting treatment response to neoadjuvant chemotherapy in locally advanced rectal cancer: A combined deep learning and machine learning approach utilizing longitudinal multi-sequence MRI. Eur. J. Radiol. Open 2026, 16, 100739.
  15. Álvarez-Aguilera, M.; Moreno, A.C.; Cózar, M.Á.B.; Menéndez, N.V.; de la Osa, J.M.O.; Miranda, J.L.D.; Quirós, I.A.; Gutiérrez, G.M.; Martín, J.J.B.; Ramírez, C.C.; et al. Influence of topography on the mucosa-associated gut microbiota in colon cancer. 2026, 104, 800298.
  16. Keikes, L.; Kos, M.; Verbeek, X.A.A.M.; Van Vegchel, T.; Nagtegaal, I.D.; Lahaye, M.J.; Romero, A.M.; De Bruijn, S.; Verheul, H.M.W.; Rütten, H.; et al. Conversion of a colorectal cancer guideline into clinical decision trees with assessment of validity. Int. J. Qual. Heal. Care 2021, 33. [Google Scholar] [CrossRef]
  17. Vallicelli, C.; Barbara, S.J.; Fabbri, E.; Perrina, D.; Griggio, G.; Agnoletti, V.; Catena, F. Geriatric Approaches to Rectal Cancer: Moving Towards a Patient-Tailored Treatment Era. J. Clin. Med. 2025, 14, 1159. [Google Scholar] [CrossRef]
  18. Jousset, A.; Bienhold, C.; Chatzinotas, A.; Gallien, L.; Gobet, A.; Kurm, V.; Küsel, K.; Rillig, M.C.; Rivett, D.W.; Salles, J.F.; et al. Where less may be more: How the rare biosphere pulls ecosystems strings. ISME J. 2017, 11, 853–862. [Google Scholar] [CrossRef]
  19. Petrilla, A.; Nemeth, P.; Fauszt, P.; Szilagyi-Racz, A.; Mikolas, M.; Szilagyi-Tolnai, E.; David, P.; Stagel, A.; Gal, F.; Gal, K.; et al. Comparative analysis of the postadmission and antemortem oropharyngeal and rectal swab microbiota of ICU patients. Sci. Rep. 2024, 14, 1–17. [Google Scholar] [CrossRef]
  20. Mikolas, M.; Fauszt, P.; Petrilla, A.; Nemeth, P.; David, P.; Szilagyi-Tolnai, E.; Szilagyi-Racz, A.; Stagel, A.; Gal, F.; Gal, K.; et al. Analysis of ICU resistome dynamics in patients, staff and environment for the identification of predictive biomarkers of sepsis and early mortality. Sci. Rep. 2025, 15, 1–23. [Google Scholar] [CrossRef]
  21. Tamames, J.; Puente-Sánchez, F. SqueezeMeta, A Highly Portable, Fully Automatic Metagenomic Analysis Pipeline. Front. Microbiol. 2019, 9, 3349. [Google Scholar] [CrossRef]
  22. Buchfink, B.; Xie, C.; Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60. [Google Scholar] [CrossRef]
  23. Sayers, E. W.; et al. GenBank 2023 update. Nucleic Acids Res. 2023, 51, D141–D144. [Google Scholar] [CrossRef]
  24. biobakery/kneaddata. bioBakery. 2026.
  25. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef] [PubMed]
  26. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359. [Google Scholar] [CrossRef] [PubMed]
  27. Alcock, B.P.; Huynh, W.; Chalil, R.; Smith, K.W.; Raphenya, A.R.; A Wlodarski, M.; Edalatmand, A.; Petkau, A.; A Syed, S.; Tsang, K.K.; et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 2022, 51, D690–D699. [Google Scholar] [CrossRef] [PubMed]
  28. Wickham, H.; Chang, W.; Henry, L.; Pedersen, T.L.; Takahashi, K.; Wilke, C.; Woo, K.; Yutani, H.; Dunnington, D.; Brand, T.
  29. Sievert, C.; Parmer, C.; Hocking, T.; Chamberlain, S.; Ram, K.; Corvellec, M.; Despouy.
  30. Oksanen, J.; Simpson, G.L.; Blanchet, F.G.; Kindt, R.; Legendre, P.; Minchin, P.R.; et al. Vegan: Community Ecology Package. 2025). Available online: https://CRAN.R-project.org/ https://CRAN.R-project.org/package=vegan (accessed on 18 September 2025).
  31. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015, 43, e47. [Google Scholar] [CrossRef] [PubMed]
  32. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011, 12, 77. [Google Scholar] [CrossRef]
  33. Domilescu, I.; Miutescu, B.; Horhat, F.G.; Popescu, A.; Nica, C.; Ghiuchici, A.M.; Gadour, E.; Sîrbu, I.; Hutanu, D. Gut-Microbiome Signatures Predicting Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer: A Systematic Review. Metabolites 2025, 15, 412. [Google Scholar] [CrossRef]
  34. Lin, Y.; Lau, H.C.-H.; Liu, C.; Ding, X.; Sun, Y.; Rong, J.; Zhang, X.; Wang, L.; Yuan, K.; Miao, Y.; et al. Multi-cohort analysis reveals colorectal cancer tumor location-associated fecal microbiota and their clinical impact. Cell Host Microbe 2025, 33, 589–601.e3. [Google Scholar] [CrossRef]
  35. Miyake, T.; Mori, H.; Yasukawa, D.; Hexun, Z.; Maehira, H.; Ueki, T.; Kojima, M.; Kaida, S.; Iida, H.; Shimizu, T.; et al. The Comparison of Fecal Microbiota in Left-Side and Right-Side Human Colorectal Cancer. Eur. Surg. Res. 2021, 62, 248–254. [Google Scholar] [CrossRef]
  36. Jin, M.; Shang, F.; Wu, J.; Fan, Q.; Chen, C.; Fan, J.; Liu, L.; Nie, X.; Zhang, T.; Cai, K.; et al. Tumor-Associated Microbiota in Proximal and Distal Colorectal Cancer and Their Relationships With Clinical Outcomes. Front. Microbiol. 2021, 12. [Google Scholar] [CrossRef]
  37. Gan, G.; Chen, R.; Zheng, P.; Long, K.; Cheng, K.K.Y.; Sulaiman, J.E.; Huang, X. Oral pathogens meet the gut microbiome: new mechanistic insights on systemic disease. Front. Cell. Infect. Microbiol. 2026, 15, 1673512. [Google Scholar] [CrossRef]
  38. Huo, T.; Huang, X.; Liao, J.; Zhang, H.; Hu, L.; Xie, M. The bidirectional effects and mechanisms of the oral and gut microbiomes: a narrative review. Front. Immunol. 2026, 17, 1697413. [Google Scholar] [CrossRef]
  39. Flemer, B.; Warren, R.D.; Barrett, M.P.; Cisek, K.; Das, A.; Jeffery, I.B.; Hurley, E.; O‘Riordain, M.; Shanahan, F.; O’Toole, P.W. The oral microbiota in colorectal cancer is distinctive and predictive. Gut 2018, 67, 1454–1463. [Google Scholar] [CrossRef] [PubMed]
  40. Pandey, H.; Tang, D.W.T.; Wong, S.H.; Lal, D. Gut Microbiota in Colorectal Cancer: Biological Role and Therapeutic Opportunities. Cancers 2023, 15, 866. [Google Scholar] [CrossRef] [PubMed]
  41. Klier, K.M.; Anantharaman, K. An updated view of metabolic handoffs in microbiomes. Trends Microbiol. 2025, 34, 98–112. [Google Scholar] [CrossRef] [PubMed]
  42. Facchin, S.; Calgaro, M.; Savarino, E.V. Rethinking Short-Chain Fatty Acids: A Closer Look at Propionate in Inflammation, Metabolism, and Mucosal Homeostasis. Cells 2025, 14, 1130. [Google Scholar] [CrossRef] [PubMed]
  43. Yuan, M.; Gao, K.; Peng, K.; Bi, S.; Cui, X.; Liu, Y. A Review of Nutritional Regulation of Intestinal Butyrate Synthesis: Interactions Between Dietary Polysaccharides and Proteins. Foods 2025, 14, 3649. [Google Scholar] [CrossRef]
  44. Daprà, V.; Airoldi, M.; Bartolini, M.; Fazio, R.; Mondello, G.; Tronconi, M.C.; Prete, M.G.; D’agostino, G.; Foppa, C.; Spinelli, A.; et al. Total Neoadjuvant Treatment for Locally Advanced Rectal Cancer Patients: Where Do We Stand? Int. J. Mol. Sci. 2023, 24, 12159. [Google Scholar] [CrossRef]
  45. Ciernikova, S.; Sevcikova, A.; Mladosievicova, B.; Mego, M. Microbiome in Cancer Development and Treatment. Microorganisms 2023, 12, 24. [Google Scholar] [CrossRef]
  46. Kadyan, S.; Park, G.; Singh, T.P.; Patoine, C.; Singar, S.; Heise, T.; Domeier, C.; Ray, C.; Kumar, M.; Behare, P.V.; et al. Microbiome-based therapeutics towards healthier aging and longevity. Genome Med. 2025, 17, 1–19. [Google Scholar] [CrossRef]
  47. Bradley, E.; Haran, J. The human gut microbiome and aging. Gut Microbes 16, 2359677. [CrossRef]
  48. Burns, M.B.; Lynch, J.; Starr, T.K.; Knights, D.; Blekhman, R. Virulence genes are a signature of the microbiome in the colorectal tumor microenvironment. Genome Med. 2015, 7, 1–12. [Google Scholar] [CrossRef] [PubMed]
  49. Tavella, T.; Turroni, S.; Brigidi, P.; Candela, M.; Rampelli, S. The Human Gut Resistome up to Extreme Longevity. mSphere 2021, 6, e0069121. [Google Scholar] [CrossRef] [PubMed]
  50. Zhang, T.; Wang, J.; Feng, Q.; Xu, X.; Zhu, W.; Mao, S.; Liu, J. Age- and diet-driven assembly of the gut antibiotic resistome in humans and food-producing animals. Gut Microbes 2026, 18, 2610052. [Google Scholar] [CrossRef]
  51. Bai, B.; Ma, J.; Xu, W.; Chen, X.; Chen, X.; Lv, C.; Su, W.; Li, Y.; Sun, H.; Zhang, B.; et al. Gut microbiota and colorectal cancer: mechanistic insights, diagnostic advances, and microbiome-based therapeutic strategies. Front. Microbiol. 2025, 16, 1699893. [Google Scholar] [CrossRef] [PubMed]
  52. Carvalho, J.V.M.; Meyer, J.; Ris, F.; Durham, A.; Bornand, A.; Ricoeur, A.; Corrò, C.; Koessler, T. Narrative Review: Predictive Biomarkers of Tumor Response to Neoadjuvant Radiotherapy or Total Neoadjuvant Therapy of Locally Advanced Rectal Cancer Patients. Cancers 2025, 17, 2229. [Google Scholar] [CrossRef] [PubMed]
  53. Zhang, M.; Liu, J.; Xia, Q. Role of gut microbiome in cancer immunotherapy: from predictive biomarker to therapeutic target. Exp. Hematol. Oncol. 2023, 12, 1–30. [Google Scholar] [CrossRef]
  54. Wu, Z.; Yang, Z.; Lyu, C.; Sun, B.; Zhang, R.; Li, H.; Chen, J. Gut microbiota and neoadjuvant chemoradiotherapy in locally advanced rectal cancer: a review of current evidence and emerging insights. Ther. Adv. Med. Oncol. 2026, 18. [Google Scholar] [CrossRef]
  55. Stepanyan, A.; Kotsafti, A.; Rosato, A.; Castagliuolo, I.; Scarpa, M.; Scarpa, M.; Agostini, M.; Angriman, I.; Antoniutti, M.; Bao, Q.R.; et al. Gut microbiota-associated predictors as biomarkers of neoadjuvant treatment response in rectal cancer-a systematic review. Br. J. Cancer 2026. [Google Scholar] [CrossRef]
  56. Zhang, J.; Wang, Z.; Li, S.; Luo, C.; Li, H.; Ma, S.; Wang, P.; Liu, H.; Sun, L.; Yin, Y.; et al. Phocaeicola coprophilus -Derived 6-Methyluracil Attenuates Radiation-Induced Intestinal Fibrosis by Suppressing the IDO1-Kynurenine-AHR Axis. Adv. Sci. 2026, 13, e18502. [Google Scholar] [CrossRef]
  57. Ota, G.; Inoue, R.; Saito, A.; Kono, Y.; Kitayama, J.; Sata, N.; Horie, H. Reduced Abundance of Phocaeicola in Mucosa-associated Microbiota Is Associated with Distal Colorectal Cancer Metastases Possibly through an Altered Local Immune Environment. J. Anus Rectum Colon 2024, 8, 235–245. [Google Scholar] [CrossRef]
  58. Xia, Y. Statistical normalization methods in microbiome data with application to microbiome cancer research. Gut Microbes 2023, 15, 2244139. [Google Scholar] [CrossRef]
Figure 1. Demographic composition and clinical annotation of the study cohort, from baseline characteristics to TNT response. a) Alluvial plot showing patient-level baseline clinical characteristics and pathology-based post-TNT responses in the LARC cohort, including initial clinical stage, age group, tumour location, and response categories after TNT and surgery. Green boxes mark patients with double-favourable responses (DFR). TRG+ = favourable pathological response (Modified Ryan TRG1–2); TRG− = unfavourable pathological response (Modified Ryan TRG3); ND+ = favourable nodal response (two-category nodal downstaging, e.g., N2→N0); ND− = incomplete or absent nodal response. Percentages indicate within-group outcome distributions (top pair: TRG+/TRG−; bottom pair: ND+/ND−); red denotes unfavourable outcomes.
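The response coding in the Figure 1 legend reduces to two boolean rules. The Python sketch below is a minimal illustration, not the authors' code. It assumes that nodal categories are ranked only on the main N0 < N1 < N2 scale, with sub-categories such as N1a or N2b ignored, and that a favourable nodal response requires a drop of at least two categories. The function names are hypothetical.

```python
# Hypothetical encoding of the Figure 1 response categories (not the authors' code).
# Assumption: only the main N categories are ranked; sub-categories are ignored.
N_RANK = {"N0": 0, "N1": 1, "N2": 2}


def trg_favourable(modified_ryan_trg: int) -> bool:
    """TRG+ for Modified Ryan TRG1-2, TRG- for TRG3, as stated in the legend."""
    if modified_ryan_trg not in (1, 2, 3):
        raise ValueError("the legend only defines TRG1-3")
    return modified_ryan_trg in (1, 2)


def nodal_favourable(clinical_n: str, pathological_n: str) -> bool:
    """ND+ when the N category falls by at least two steps (e.g. N2 -> N0)."""
    return N_RANK[clinical_n] - N_RANK[pathological_n] >= 2


def is_dfr(modified_ryan_trg: int, clinical_n: str, pathological_n: str) -> bool:
    """Double-favourable response: TRG+ and ND+ together."""
    return trg_favourable(modified_ryan_trg) and nodal_favourable(clinical_n, pathological_n)


# Usage with made-up patients: (TRG, clinical N, pathological N)
for trg, cn, pn in [(1, "N2", "N0"), (2, "N1", "N0"), (3, "N2", "N0")]:
    print(trg, cn, pn, "DFR" if is_dfr(trg, cn, pn) else "non-DFR")
```

Under this reading, only patients staged cN2 can reach ND+. This is worth keeping in mind when interpreting the size of the DFR group.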
Figure 2. Tumour location-stratified longitudinal microbiome feature trajectories from baseline to the end of the chemoradiotherapy phase within TNT. Bar plots summarize aggregated microbiome features in RC patients with tumours located in the low rectum or mid/upper rectum at Fx0 (baseline, pre-treatment; yellow) and Fx28 (end of the chemoradiotherapy phase within TNT; blue). a) Balance between obligate anaerobic and aerotolerant taxa, expressed as the log2-transformed ratio of obligate anaerobic to aerotolerant taxa. b) Aggregated relative frequency of oralisation-associated taxa. c) Aggregated relative frequency of biofilm-forming taxa. d) Aggregated relative frequency of SCFA-producer-associated taxa, shown as stacked contributions of acetate-, butyrate-, and propionate-associated producer groups. Numerical labels indicate the corresponding values. Dashed horizontal lines denote the corresponding non-RC control levels, where shown. Asterisks indicate statistically significant differences relative to the corresponding non-RC control level (** p < 0.01).
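Panels (a) to (d) rest on two simple operations. The first sums the relative frequencies of the species assigned to a functional group. The second takes the log2 ratio of two such sums. The minimal Python sketch below assumes per-sample relative-frequency profiles keyed by species name. It adds a small pseudocount so that the ratio stays finite when a group is absent. The group memberships and the pseudocount value are illustrative assumptions, not the study's curated taxon lists.

```python
import math

# Illustrative relative-frequency profile for one stool sample (made-up values).
profile = {"taxon_a": 0.40, "taxon_b": 0.20, "taxon_c": 0.10, "taxon_d": 0.30}

# Hypothetical group assignments; the study's curated lists are not reproduced here.
OBLIGATE_ANAEROBES = {"taxon_a", "taxon_b"}
AEROTOLERANT = {"taxon_c"}


def aggregated_relative_frequency(profile: dict[str, float], group: set[str]) -> float:
    """Sum of relative frequencies of the species that belong to `group`."""
    return sum(freq for species, freq in profile.items() if species in group)


def log2_obligate_to_aerotolerant(profile: dict[str, float],
                                  obligate: set[str],
                                  aerotolerant: set[str],
                                  pseudocount: float = 1e-6) -> float:
    """log2 of (obligate anaerobe ARF / aerotolerant ARF); the pseudocount avoids log(0)."""
    num = aggregated_relative_frequency(profile, obligate) + pseudocount
    den = aggregated_relative_frequency(profile, aerotolerant) + pseudocount
    return math.log2(num / den)


ratio = log2_obligate_to_aerotolerant(profile, OBLIGATE_ANAEROBES, AEROTOLERANT)
print(f"log2(obligate/aerotolerant) = {ratio:.3f}")
```

A positive value means that obligate anaerobes outweigh aerotolerant taxa in that sample. Given the corresponding group lists, the same summing step would also yield the oralisation, biofilm-forming, and SCFA-producer aggregates in panels (b) to (d).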
Figure 3. Age-stratified longitudinal microbiome features during TNT. Patients were dichotomized using a pragmatic age cut-off of 70 years, yielding two subgroups (<70 years and ≥70 years). a) Aggregate relative frequency (ARF), used as a proxy for estimated microbial biomass, at Fx0 and Fx28 across the two age strata. b) Aggregate relative frequency of putative pathogen-associated taxa at Fx0 and Fx28 in the two age strata. c) Antimicrobial resistance (AMR) burden, expressed as total AMR-associated read count, at Fx0 and Fx28 in patients aged <70 years and ≥70 years. Numerical labels indicate the corresponding values shown in each panel.
Figure 4. TNT response-stratified analysis of aggregate and abundance-partitioned Shannon diversity. Patients were dichotomized into a double-favourable response group (DFR), defined by favourable pathological response (Modified Ryan TRG1–2) together with favourable nodal response, and a non-double-favourable response group (non-DFR), which included all remaining patients. a) Shannon diversity in the two TNT response-defined subgroups. b) Shannon diversity at Fx0 and Fx28 within the DFR and non-DFR subgroups. c) Donut charts showing the proportional contribution of high-abundance taxa (HAT) and low-abundance taxa (LAT) to total relative frequency and taxonomic coverage in the two response-defined subgroups. HAT and LAT were defined according to whether species relative frequency was above or below the mean relative-frequency threshold, respectively. d) Shannon diversity of the HAT and LAT fractions at Fx0 and Fx28 in the DFR and non-DFR subgroups. e) Longitudinal changes in Shannon diversity within the LAT and HAT fractions between Fx0 and Fx28 in the DFR and non-DFR subgroups. Numerical labels indicate the corresponding values shown in each panel. Asterisks denote statistically significant comparisons (* p < 0.05).
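The HAT/LAT split and the fraction-level Shannon indices can be sketched as follows. This minimal Python illustration rests on three assumptions, none of which the legend specifies, so they may differ from the authors' implementation:

- The mean threshold is taken over the detected species of a single profile.
- Species exactly at the mean are placed in LAT.
- Each fraction is renormalised to proportions before its Shannon diversity is computed with the natural logarithm.

```python
import math


def shannon(freqs) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) after renormalising to proportions."""
    positive = [f for f in freqs if f > 0]
    total = sum(positive)
    return -sum((f / total) * math.log(f / total) for f in positive)


def partition_by_mean(profile: dict[str, float]):
    """Split detected species into above-mean (HAT) and at-or-below-mean (LAT) fractions."""
    detected = {s: f for s, f in profile.items() if f > 0}
    threshold = sum(detected.values()) / len(detected)
    hat = {s: f for s, f in detected.items() if f > threshold}
    lat = {s: f for s, f in detected.items() if f <= threshold}
    return hat, lat


def fraction_summary(profile: dict[str, float]) -> dict[str, dict[str, float]]:
    """Share of total frequency, share of taxa, and Shannon index for HAT and LAT."""
    hat, lat = partition_by_mean(profile)
    total_freq = sum(hat.values()) + sum(lat.values())
    n_taxa = len(hat) + len(lat)
    return {
        name: {
            "frequency_share": sum(frac.values()) / total_freq,
            "taxa_share": len(frac) / n_taxa,
            "shannon": shannon(frac.values()),
        }
        for name, frac in (("HAT", hat), ("LAT", lat))
    }


# Made-up profile: two dominant species and a low-abundance background.
example = {"sp1": 0.50, "sp2": 0.30, "sp3": 0.10, "sp4": 0.05, "sp5": 0.05}
for fraction, stats in fraction_summary(example).items():
    print(fraction, {k: round(v, 3) for k, v in stats.items()})
```

In this toy profile, the HAT fraction holds 80% of the total frequency but only 40% of the species. The LAT fraction has the higher Shannon index. This illustrates the kind of biomass-versus-diversity split that panel (c) summarises.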
Figure 5. Genotoxicity-associated taxonomic and gene-level signatures across TNT and clinical subgroups. a) Aggregated relative frequency of genotoxicity-associated taxa (GAT) in non-RC controls and across the TNT timeline (Fx0, Fx14, and Fx28) in the RC cohort. b) Aggregated relative frequency of GAT stratified by tumour location (low rectum vs. mid/upper rectum) at baseline and at the end of treatment. c) Aggregated relative frequency of GAT stratified by age group (≥70 years vs. <70 years) at baseline and at the end of treatment. d) Aggregated occurrence of virulence-associated genes (VAGs) in non-RC controls, in patients with a double-favourable response (DFR; favourable pathological and nodal response), and in patients without a double-favourable response (non-DFR). Dashed horizontal lines in panels (b) and (c) indicate the corresponding non-RC control reference level.
Figure 6. Response-stratified exclusivity analysis of species occurrence across TNT. a) Venn diagrams show shared and subgroup-exclusive species within the above-mean-abundance fraction (HAT) at Fx0, Fx14, and Fx28, analysed separately for nodal and pathological response. Favourable nodal response was defined as two-category nodal downstaging; favourable pathological response was defined as Modified Ryan TRG1–2. Numbers indicate shared and exclusive species counts. Species listed below the diagrams denote taxa exclusive to the indicated subgroup at the corresponding treatment stage. b) Bar plots show the longitudinal relative frequency of Phocaeicola coprophilus in the same response-defined subgroup comparisons. The dashed red line indicates the non-RC control reference level.
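The shared and subgroup-exclusive counts in the Venn diagrams reduce to set intersection and set difference. The minimal Python sketch below assumes that each response-defined subgroup is summarised by the set of species in its above-mean (HAT) fraction at a given time point. How the authors pooled patients into a subgroup-level HAT set is not stated, so that step is left out. The species names are placeholders, not study results.

```python
def exclusivity(subgroup_a: set[str], subgroup_b: set[str]) -> dict[str, set[str]]:
    """Shared and subgroup-exclusive species between two response-defined subgroups."""
    return {
        "shared": subgroup_a & subgroup_b,   # intersection: present in both subgroups
        "only_a": subgroup_a - subgroup_b,   # exclusive to subgroup A
        "only_b": subgroup_b - subgroup_a,   # exclusive to subgroup B
    }


# Placeholder HAT species sets for one time point (e.g. Fx0), not real data.
hat_favourable = {"species_1", "species_2", "species_3"}
hat_unfavourable = {"species_2", "species_3", "species_4", "species_5"}

venn = exclusivity(hat_favourable, hat_unfavourable)
for region, species in venn.items():
    print(region, len(species), sorted(species))
```

Running the same comparison separately for nodal and pathological response, at Fx0, Fx14, and Fx28, would give one diagram per comparison, as laid out in panel (a).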
Figure 7. Exploratory integration of baseline microbiome signals into a composite response score. a) ROC-based evaluation of baseline (Fx0) Phocaeicola coprophilus for discriminating the double-favourable response group (DFR), comprising patients with both favourable pathological response (Modified Ryan TRG1–2) and favourable nodal response (two-category nodal downstaging), from the non-double-favourable response group (non-DFR). The left panel shows sensitivity and specificity across thresholds, whereas the right panel shows the density distribution of baseline P. coprophilus values in the DFR and non-DFR groups. b) ROC analysis of the baseline Reference Distance Index (RDI), defined as the Bray-Curtis distance from the non-RC reference centroid, for discrimination between favourable and unfavourable pathological response. The left panel shows sensitivity and specificity across thresholds, whereas the right panel shows the density distribution of baseline Bray-Curtis distances in the two pathological response groups. c) ROC analysis of baseline RDI for discrimination between patients with and without nodal downstaging. The left panel shows sensitivity and specificity across thresholds, whereas the right panel shows the density distribution of baseline Bray-Curtis distances in the two nodal response groups. d) ROC-based evaluation of the exploratory Microbiome Baseline Score (MBS), constructed as a weighted combination of the P. coprophilus indicator (PCI), the pathological response-specific distance classifier (CTRG), and the nodal response-specific distance classifier (CND), according to the formula MBS = 0.34 × PCI + 0.38 × CTRG + 0.28 × CND. The left panel shows sensitivity and specificity across thresholds, whereas the right panel shows the density distribution of MBS values in the DFR and non-DFR groups. For each analysis, the area under the curve (AUC), sensitivity, specificity, and optimal cut-off are indicated within the panel.
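The baseline measures in this figure combine three steps: a Bray-Curtis distance to the control centroid (RDI), an ROC analysis with a cut-off, and a weighted sum for the MBS. The Python sketch below is a simplified, hypothetical re-implementation built on these assumptions:

- The centroid is the mean relative-abundance vector of the non-RC controls. By contrast, vegan's `betadisper` places centroids in principal-coordinate space.
- The AUC is computed in its Mann-Whitney form.
- The optimal cut-off maximises the Youden index, with higher scores predicting the positive class.
- PCI, CTRG, and CND are 0/1 indicators obtained by thresholding.

Only the MBS weights are taken from the legend.

```python
def bray_curtis(x: list[float], y: list[float]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def centroid(profiles: list[list[float]]) -> list[float]:
    """Mean abundance vector of the reference (non-RC) profiles (assumed definition)."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def roc_auc(pos: list[float], neg: list[float]) -> float:
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), i.e. the Mann-Whitney form."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: list[float], neg: list[float]) -> tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximising sens + spec - 1.

    A score >= cut-off is called positive; the first cut-off wins on ties.
    """
    best = None
    for t in sorted(set(pos) | set(neg)):
        sens = sum(p >= t for p in pos) / len(pos)
        spec = sum(n < t for n in neg) / len(neg)
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best


def microbiome_baseline_score(pci: int, ctrg: int, cnd: int) -> float:
    """MBS = 0.34 x PCI + 0.38 x CTRG + 0.28 x CND (weights from the legend)."""
    return 0.34 * pci + 0.38 * ctrg + 0.28 * cnd


# Toy usage with made-up profiles and scores (not study data).
controls = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
rdi = bray_curtis([0.1, 0.2, 0.7], centroid(controls))
print(f"RDI = {rdi:.3f}")

dfr_scores, non_dfr_scores = [0.9, 0.8, 0.7, 0.4], [0.1, 0.5, 0.3]
print(f"AUC = {roc_auc(dfr_scores, non_dfr_scores):.3f}")
print("Youden cut-off, sens, spec =", youden_cutoff(dfr_scores, non_dfr_scores))
print("MBS for PCI=1, CTRG=1, CND=0:", microbiome_baseline_score(1, 1, 0))
```

Under the binary-indicator assumption, the MBS can take at most eight distinct values between 0 and 1, because the three weights sum to 1.00.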
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated