1. Introduction
Follicular lymphoma (FL) and marginal zone lymphoma (MZL) are indolent low-grade B-cell lymphomas (LGBCLs) that constitute a heterogeneous subset of non-Hodgkin lymphomas (NHL). FL accounts for 12%–25% of all NHL cases, whereas MZL, including the mucosa-associated lymphoid tissue and nodal subtypes, accounts for 7%–10.3% [1,2,3]. Despite their typically indolent clinical course, these lymphomas carry a persistent risk of histologic transformation (HT), most often to aggressive lymphomas such as diffuse large B-cell lymphoma (DLBCL). HT, with a reported incidence of 4%–14.3%, is associated with significantly worse clinical outcomes [4,5].
Established prognostic indices for LGBCLs include the Follicular Lymphoma International Prognostic Index (FLIPI) [6], the PRIMA-Prognostic Index [7], and the Mucosa-Associated Lymphoid Tissue Lymphoma Prognostic Index (MALT-IPI) [8]. However, these indices, as well as models enhanced with next-generation sequencing (NGS), were designed primarily to predict survival rather than to estimate the specific risk of HT [9,10].
Predicting HT is challenging for several reasons. First, HT is relatively rare, and the resulting class imbalance can bias algorithms toward the majority non-transformed group, distort performance metrics, and reduce statistical power. Second, HT is inherently time-dependent, so models based solely on fixed time points can be misleading. Survival analysis, in turn, requires careful handling of right-censoring and time-varying hazard rates.
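To make the second point concrete, the following Python sketch contrasts what is actually known about each patient at a 24-month horizon with the label a fixed-timepoint classifier implicitly assigns. The cohort is hypothetical toy data, not study patients, and the names `Patient`, `horizon_label`, and `naive_label` are illustrative. A patient censored before the horizon has unknown status, yet the naive approach counts that patient as non-transformed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    time: float  # months from diagnosis to HT or last follow-up (hypothetical)
    event: bool  # True if HT was observed at `time`, False if censored


def horizon_label(p: Patient, horizon: float) -> Optional[int]:
    """What is actually known at `horizon`: 1 = HT by then,
    0 = followed through the horizon without HT, None = censored earlier (unknown)."""
    if p.event and p.time <= horizon:
        return 1
    if p.time >= horizon:
        return 0
    return None


def naive_label(p: Patient, horizon: float) -> int:
    """Label a fixed-timepoint classifier implicitly uses: no observed HT means 'no HT'."""
    return 1 if (p.event and p.time <= horizon) else 0


# Hypothetical toy cohort, not study data
cohort: List[Patient] = [
    Patient(8, True),    # HT at 8 months
    Patient(30, False),  # censored at 30 months, HT-free through the horizon
    Patient(10, False),  # lost to follow-up at 10 months
    Patient(40, True),   # HT at 40 months, after the horizon
]

HORIZON = 24
true_labels = [horizon_label(p, HORIZON) for p in cohort]
naive_labels = [naive_label(p, HORIZON) for p in cohort]
print(true_labels)   # [1, 0, None, 0]
print(naive_labels)  # [1, 0, 0, 0]
```

Survival models keep such patients in the risk set up to their censoring time instead of forcing a label, which is one reason time-to-event methods suit this problem.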
Recent studies have therefore attempted to improve HT prediction in LGBCLs by combining clinical parameters from established prognostic indices, such as FLIPI, with genetic data from NGS [11,12]. However, comprehensive comparisons between machine learning survival models and classical classifiers are limited, and the incremental predictive value of adding NGS data to these models is not yet fully understood.
To address this knowledge gap, we developed and compared several predictive models to distinguish patients who experience HT from those who do not, using both traditional classification methods and survival-based machine learning techniques. We aimed to evaluate the performance of these models across independent cohorts of patients with FL and MZL, assess the added value of incorporating NGS data, and provide practical insights into optimizing HT risk prediction strategies.
2. Materials and Methods
2.1. Patient Selection
This multicenter, retrospective cohort study analyzed clinical data from 1,068 patients diagnosed with LGBCLs at seven hospitals in the Republic of Korea: Yeoido, Seoul, Bucheon, Incheon, Daejeon, Eunpyeong, and Ui-jeongbu St. Mary’s. Patients diagnosed between January 2011 and February 2026 were included. Eligible patients were aged 18 years or older and had a diagnosis of FL (grades 1, 2, or 3A) or MZL, including the splenic, nodal, and extranodal mucosa-associated lymphoid tissue (MALT) variants. The baseline diagnostic workup comprised a complete blood count, measurement of serum lactate dehydrogenase (LDH), bone marrow biopsy, and computed tomography of the neck, chest, abdomen, and pelvis.
The study was approved by the Institutional Review Board and Ethics Committee of the Catholic Medical Center, South Korea (approval numbers: SC23WISI0093 for the training/validation cohorts and XC25RIDI0050 for the test cohort). Owing to the retrospective nature of the analysis, the requirement for written informed consent was waived. All procedures were conducted in accordance with applicable ethical guidelines and regulations, including adherence to the principles outlined in the Declaration of Helsinki.
Demographic, clinical, and pathological data, including the variables used in the established FLIPI and MALT-IPI indices, were systematically collected from medical records. Collected variables included demographic details (age at diagnosis and sex), clinical features (nodal involvement, Ann Arbor stage, pathological subtype, bone marrow involvement, and pleural effusion), and laboratory results (LDH and hemoglobin levels). Staging was evaluated consistently using the Lugano classification to maintain uniformity across participating institutions [13].
2.2. NGS
NGS was performed using the QIAseq Pan-cancer Multimodal Panel (Qiagen, Hilden, Germany), which targets DNA alterations across 523 cancer-related genes. DNA was isolated from formalin-fixed, paraffin-embedded (FFPE) tissue samples obtained at diagnosis using the QIAamp DNA FFPE Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. DNA quantity and quality were assessed using the QIAseq DNA QuantiMIZE Array Kit to confirm suitability for subsequent library preparation.
NGS library preparation followed the QIAseq Multimodal Panel HT Handbook (Qiagen). The workflow comprised enzymatic DNA fragmentation, end repair, addition of adenine overhangs (A-tailing), adapter ligation, and amplification by polymerase chain reaction (PCR). Unique molecular indices (UMIs) were incorporated during library construction to reduce the impact of PCR duplicates and sequencing artifacts, thereby improving variant detection accuracy. Libraries were individually indexed, quantified by quantitative PCR, and checked on an Agilent Bioanalyzer to verify fragment size distribution and adapter removal.
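The role of the UMIs can be illustrated with a minimal Python sketch. For simplicity, it assumes that reads sharing an alignment start and a UMI are PCR copies of one original molecule, and it collapses each such family to its majority base. The read tuples and the function name `collapse_umi_families` are hypothetical, and this is not the algorithm implemented in QIAGEN's software, which uses more elaborate consensus and error models.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[int, str, str]  # (alignment start, UMI sequence, base observed at the site)


def collapse_umi_families(reads: Iterable[Read]) -> Dict[Tuple[int, str], str]:
    """Group reads sharing start position and UMI (assumed copies of one molecule)
    and keep the majority base of each family."""
    families: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for start, umi, base in reads:
        families[(start, umi)].append(base)
    # Counter.most_common breaks ties by first occurrence; real tools apply stricter rules.
    return {key: Counter(bases).most_common(1)[0][0] for key, bases in families.items()}


# Hypothetical reads covering one variant site
reads: List[Read] = [
    (100, "ACGT", "T"),
    (100, "ACGT", "T"),
    (100, "ACGT", "C"),  # PCR/sequencing error inside one family
    (100, "GGCA", "C"),
    (105, "ACGT", "T"),
]
molecules = collapse_umi_families(reads)
raw_t_fraction = sum(base == "T" for _, _, base in reads) / len(reads)
umi_t_fraction = sum(base == "T" for base in molecules.values()) / len(molecules)
print(len(molecules), raw_t_fraction, umi_t_fraction)
```

Counting molecules rather than reads keeps a heavily amplified fragment, or an error introduced within a single family, from distorting the apparent variant allele fraction.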
Sequencing was performed on an Illumina platform (Illumina, San Diego, CA, USA) using Qiagen’s custom sequencing primers. Sequence data analysis involved variant calling using QIAGEN's CLC Genomics Workbench software, with subsequent variant annotation and clinical interpretation using QIAGEN Clinical Insight Interpret.
2.3. Statistical Analysis
Categorical variables were compared using the chi-square test or Fisher’s exact test, as appropriate. Overall survival (OS) was estimated using the Kaplan–Meier method, and differences between groups were assessed with the log-rank test. Univariate Cox proportional hazards models were used to identify predictors significantly associated with HT. The cumulative incidence of HT was estimated with death as a competing risk, and Gray’s test was used for between-group comparisons. Model discrimination was assessed using Harrell’s concordance index (C-index); values closer to 1.0 indicate stronger discriminative ability, and values near 0.5 indicate performance no better than chance. Statistical significance was defined as a two-sided p-value below 0.05 [14].
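The competing-risk estimate can be sketched as follows. This minimal Python version of the Aalen–Johansen estimator assumes a status coding of 0 = censored, 1 = HT, and 2 = death without HT, and it uses hypothetical toy data. It is not the R code used in the study and omits Gray’s test, variance estimates, and confidence intervals. For comparison, it also computes the naive 1 − Kaplan–Meier estimate that treats deaths as censored.

```python
from typing import List, Sequence, Tuple


def cumulative_incidence(times: Sequence[float], status: Sequence[int],
                         cause: int = 1) -> List[Tuple[float, float]]:
    """Aalen-Johansen cumulative incidence of `cause`.
    Assumed status codes: 0 = censored, 1 = HT, 2 = death without HT."""
    event_times = sorted({t for t, s in zip(times, status) if s != 0})
    surv = 1.0  # all-cause event-free survival just before t (Kaplan-Meier)
    cif = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        d_cause = sum(1 for ti, s in zip(times, status) if ti == t and s == cause)
        d_any = sum(1 for ti, s in zip(times, status) if ti == t and s != 0)
        cif += surv * d_cause / at_risk  # reach t event-free, then HT at t
        surv *= 1 - d_any / at_risk      # any event (HT or death) removes patients
        curve.append((t, cif))
    return curve


def one_minus_km(times: Sequence[float], status: Sequence[int], cause: int = 1) -> float:
    """Naive estimate treating competing deaths as censored (tends to overestimate)."""
    surv = 1.0
    for t in sorted({t for t, s in zip(times, status) if s == cause}):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, s in zip(times, status) if ti == t and s == cause)
        surv *= 1 - d / at_risk
    return 1 - surv


# Hypothetical months of follow-up and status codes
times = [2, 3, 4, 5, 6]
status = [1, 2, 0, 1, 2]
print(cumulative_incidence(times, status))
print(one_minus_km(times, status))
```

On these toy data the naive estimate (about 0.6) exceeds the competing-risk estimate (about 0.5) because it implicitly assumes that patients who died could still have transformed later.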
All statistical analyses and data visualizations were performed using R software (version 4.2.3; R Foundation for Statistical Computing, Vienna, Austria). Survival models (Cox proportional hazards, Lasso-Cox [15], random survival forest [RSF] [16], gradient-boosted Cox [GBM-Cox] [17], and extreme gradient boosting [XGBoost]-Cox [18]) and classification algorithms (logistic regression, Lasso logistic regression [19], random forest [20], gradient boosting [21], and XGBoost [22]) were trained on a cohort of 592 patients (training set) and validated in an independent cohort of 384 patients (validation set) to predict HT. The top-performing survival models (XGBoost-Cox, Lasso-Cox, and GBM-Cox) were further evaluated on a separate test set (n = 92). Model performance was evaluated based on accuracy, time-dependent area under the curve (AUC), sensitivity, and specificity at 6, 12, 18, 24, and 36 months after diagnosis. Optimal binary risk cutoffs were determined by maximizing Youden’s index. Additional analyses evaluated the incremental improvement in predictive performance achieved by incorporating NGS-derived variables. NGS models were defined as those in which five or more mutations were detected in the test data. Principal component analysis (PCA) biplots were used to elucidate the contributions of NGS-derived predictors to the final predictive models. Nomograms were generated using the “rms” package to illustrate the impact of each variable in the models [23].
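A simplified sketch of the horizon-specific metrics follows. At a horizon t, cases are patients with HT by t and controls are those still HT-free after t; the AUC is the probability that a case scores above a control, and Youden’s index is J = sensitivity + specificity − 1. For brevity, the sketch assumes that patients censored before t can simply be dropped, whereas estimators such as those in the R package timeROC reweight the remaining patients by the inverse probability of censoring. The function names and data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def horizon_groups(times: Sequence[float], events: Sequence[int],
                   scores: Sequence[float], horizon: float) -> Tuple[List[float], List[float]]:
    """Cases: HT observed by `horizon`. Controls: still HT-free after `horizon`.
    Patients censored before `horizon` are dropped (simplification: no IPCW)."""
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= horizon]
    controls = [s for t, e, s in zip(times, events, scores) if t > horizon]
    return cases, controls


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """P(case score > control score), counting ties as 1/2."""
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def youden_cutoff(cases: Sequence[float],
                  controls: Sequence[float]) -> Tuple[Optional[float], float]:
    """Cutoff maximizing J = sensitivity + specificity - 1 (high risk if score >= cutoff)."""
    best_cut: Optional[float] = None
    best_j = -1.0
    for cut in sorted(set(cases) | set(controls)):
        sens = sum(s >= cut for s in cases) / len(cases)
        spec = sum(s < cut for s in controls) / len(controls)
        j = sens + spec - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical follow-up (months), HT indicator, and model risk scores
times = [5, 10, 20, 30, 40, 8]
events = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.45, 0.7, 0.2, 0.5, 0.4]

cases, controls = horizon_groups(times, events, scores, horizon=24)
print(cases, controls)                 # [0.9, 0.45] [0.2, 0.5]
print(auc(cases, controls))            # 0.75
print(youden_cutoff(cases, controls))  # (0.45, 0.5)
```

Among cutoffs with equal J, the sketch keeps the lowest one, which favors sensitivity; the document does not state how ties were handled in the study.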
3. Results
3.1. Patient Characteristics and Survival Outcomes
Table 1 summarizes the baseline characteristics of the total cohort of 1,068 patients. The median age at diagnosis was 52 years (range, 18–94 years). FL was the predominant subtype (n = 744, 69.7%), and MZL accounted for 324 cases (30.3%). In the total cohort, the HT group had a significantly higher prevalence of pleural effusion, elevated LDH levels, anemia, and Ann Arbor stage III–IV disease (Table 1).
Patients diagnosed before May 1, 2020, were assigned to the training cohort (n = 592, 55.4%), and those diagnosed thereafter were assigned to the validation/test cohort (n = 476, 44.6%). The training and validation/test cohorts differed significantly in several clinical features. Specifically, the validation/test cohort included a higher proportion of patients with involvement of more than four nodal sites, axial bone involvement, splenomegaly, anemia, and advanced-stage disease, whereas the training cohort had a significantly higher proportion of males and of patients with the MZL subtype (Table S1). However, subgroup analyses comparing the presence of HT between the training and validation/test cohorts revealed no significant differences across variables (Table S2).
The median follow-up was 5.2 years (range, 1–13.1 years) in the training cohort and 2.2 years (range, 0.8–5.2 years) in the validation/test cohort. In the training cohort, the 5-year OS rate among patients who experienced HT was 85.2% (95% confidence interval [CI], 71.1%–100%), significantly lower than the 93.7% (95% CI, 91.6%–95.8%) observed in the non-HT group (P = 0.021). In the validation/test cohort, the 2-year OS rate was also significantly lower in the HT group (80.7%; 95% CI, 63.2%–100%) than in the non-HT group (96.8%; 95% CI, 95.1%–98.5%) (P < 0.001; Figure 1A and 1B). The cumulative incidence of HT was 3.1% at 5 years in the training cohort and 3.0% at 2 years in the validation/test cohort (Figure 1C and 1D).
3.2. Model Comparison: Classification vs. Survival Models
We evaluated several machine learning methods for predicting HT, including traditional classification models (logistic regression, Lasso logistic regression, random forest, gradient boosting, and XGBoost) and survival-based models (Cox proportional hazards regression, Lasso-Cox, RSF, GBM-Cox, and XGBoost-Cox). In the training and validation datasets, time-dependent AUCs were comparable between logistic regression and Cox proportional hazards models and between random forest and RSF. However, the survival models employing regularization or boosting (Lasso-Cox, GBM-Cox, and XGBoost-Cox) outperformed their traditional classification counterparts (Lasso logistic regression, gradient boosting, and XGBoost).
Further evaluation of the survival models on the independent test set (n = 92) included optimizing binary risk thresholds using Youden’s index to improve sensitivity. The XGBoost-Cox model performed best, achieving the highest accuracy, time-dependent AUC, sensitivity, and specificity at 12, 18, and 24 months. XGBoost-Cox also yielded the highest C-index (0.836), outperforming Lasso-Cox (0.734) and GBM-Cox (0.706) (Figure 2).
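For readers less familiar with the C-index, the minimal Python sketch below computes Harrell’s version on hypothetical data. A pair is usable when the patient with the shorter follow-up had an observed HT, and it is concordant when that patient also had the higher predicted risk. Tied event times are simply skipped here, which is one of several tie conventions in use; the function name is illustrative.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Share of usable pairs in which the patient who transformed earlier
    had the higher predicted risk (risk ties count 1/2)."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an observed HT
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still HT-free when i transformed
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical follow-up (months), HT indicator, and predicted risk
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
risk = [0.9, 0.3, 0.2, 0.4]
print(harrell_c_index(times, events, risk))  # 0.75
```

Here three of the four usable pairs are ranked correctly; the patient who transformed at 6 months was scored below a patient still HT-free at 8 months.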
3.3. Incorporating NGS into the Model
We further examined the impact of incorporating NGS data into the predictive models using the test set. Cox proportional hazards regression identified TP53, KMT2A, BLM, ATR, and RAD50 mutations as having significant hazard ratios for HT (Table 2). Incorporating NGS variables improved the XGBoost-Cox model, consistently increasing accuracy and specificity across multiple time points. Although the gain in AUC was modest, integrating NGS data improved long-term accuracy compared with the clinical-only model (Figure 2A).
TP53 and BLM mutations were notably more prevalent in the HT group than in the non-HT group (Figure 3B and Table 2). The nomogram in Figure S2 illustrates the relative impact of the variables in each model.
PCA highlighted distinct patterns among predictors. The first two principal components (PC1 and PC2) explained 55% and 38.4% of the variance, respectively (Figure 3C). PC1 separated genes with strong positive loadings (CREBBP, BCL2, STAT6, KMT2D, TNFRSF14, BTK, BRAF, and EZH2) from mutations with negative loadings (KMT2A, ATR, FGFR1, and TCF7L2). The clinical-only model (excluding NGS data) loaded primarily on PC2, indicating reliance on clinically distinct features. In contrast, the NGS-integrated model had a negative PC1 loading and clustered closely with key NGS mutations. The “Observe” vector, representing actual HT events, aligned closely with DNA-repair and tumor-suppressor genes, including TP53, BLM, and RAD50. These findings support the clinical and prognostic value of incorporating NGS-derived genetic profiles into HT prediction models for patients with LGBCLs.
4. Discussion
This study of 1,068 patients with LGBCLs showed that survival-based models improved the prediction of HT. Time-dependent modeling proved particularly valuable, underscoring the importance of incorporating temporal information into HT prediction. Our findings are consistent with previous research: Fernández-Miranda et al. reported a C-index of 0.618 for Cox proportional hazards models based solely on clinical data [11], and the present study yielded a comparable C-index of approximately 0.586 (data not shown).
Notably, certain survival modeling approaches had distinct advantages. Although standard Cox proportional hazards models performed similarly to logistic regression, regularized models such as Lasso-Cox clearly outperformed their corresponding classical classifiers (e.g., Lasso logistic regression). Coefficient estimates in the standard Cox model can be unstable in the presence of multicollinearity or limited sample sizes, and Lasso-Cox mitigates this instability. By applying an L1 penalty controlled by the regularization parameter λ, Lasso-Cox shrinks the coefficients of less informative variables to exactly zero, selecting relevant variables and reducing overfitting. This approach is especially beneficial in high-dimensional, sparse settings in which only a few predictors genuinely affect prognosis [15].
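How Lasso-Cox sets coefficients exactly to zero can be sketched in Python. The objective below is the negative Cox log partial likelihood divided by n plus λ times the L1 norm of β, minimized by proximal gradient descent (ISTA): each gradient step is followed by soft-thresholding. This solver is swapped in for brevity; the coordinate-descent algorithm of reference [15] (as implemented in glmnet) differs, and its objective scaling may differ by a constant factor. The data and function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _linear_predictor(beta: Sequence[float], X: Matrix) -> List[float]:
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def neg_log_partial_likelihood(beta, X, times, events) -> float:
    """Cox negative log partial likelihood; risk set of i is {j: t_j >= t_i} (Breslow ties)."""
    eta = _linear_predictor(beta, X)
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            nll -= eta[i] - math.log(sum(math.exp(eta[j]) for j in risk))
    return nll


def grad_nll(beta, X, times, events) -> List[float]:
    """Gradient: minus the sum over events of (x_i - risk-set weighted mean of x)."""
    eta = _linear_predictor(beta, X)
    g = [0.0] * len(beta)
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(eta[j]) for j in risk]
        total = sum(w)
        for k in range(len(beta)):
            risk_mean = sum(wj * X[j][k] for wj, j in zip(w, risk)) / total
            g[k] -= X[i][k] - risk_mean
    return g


def soft_threshold(z: float, thr: float) -> float:
    return math.copysign(max(abs(z) - thr, 0.0), z)


def objective(beta, X, times, events, lam: float) -> float:
    n = len(times)
    return neg_log_partial_likelihood(beta, X, times, events) / n + lam * sum(abs(b) for b in beta)


def lasso_cox_ista(X, times, events, lam: float, step: float = 0.5, n_iter: int = 500) -> List[float]:
    """Gradient step on the smooth part, then soft-thresholding (the L1 proximal step)."""
    n = len(times)
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = grad_nll(beta, X, times, events)
        beta = [soft_threshold(b - step * gk / n, step * lam) for b, gk in zip(beta, g)]
    return beta


# Hypothetical data: feature 0 marks the earliest transformations, feature 1 is noise
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]

beta_strong = lasso_cox_ista(X, times, events, lam=10.0)  # heavy penalty
beta_mild = lasso_cox_ista(X, times, events, lam=0.15)    # mild penalty
print(beta_strong, beta_mild)
```

With the heavy penalty both coefficients stay at zero; with the mild penalty the informative feature receives a positive coefficient. In practice λ is swept along a path and chosen, for example, by cross-validation.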
Additionally, GBM-Cox and XGBoost-Cox consistently outperformed the RSF models. This advantage is likely attributable to the way gradient boosting builds decision trees sequentially, with each tree fitted to the errors of the preceding ensemble. This process increases the model’s ability to capture difficult-to-predict cases and reduces bias. In contrast, RSF averages fully grown trees built by bagging, which may make it less capable of capturing the subtle effects of low-frequency mutations and complex interactions [24,25]. XGBoost and XGBoost-Cox both use gradient boosting but differ in their loss functions: a binary classification loss for XGBoost and the Cox partial likelihood for XGBoost-Cox. Zha et al. previously demonstrated that the FLIPI-C model, initially developed using XGBoost, could be improved by applying the XGBoost-Cox model [26]. In this study, we applied optimal binary risk thresholds derived by maximizing Youden’s index to enhance model sensitivity. Clinically, sensitivity is typically more important than specificity in predicting HT, because clinicians prioritize identifying patients who may develop HT over confirming those who will not. By tuning the decision threshold with Youden’s index, our models could mitigate the effects of class imbalance and tailor the operating point to clinical decision-making.
Incorporating NGS data notably improved model accuracy, mainly by increasing specificity. Mutation analysis revealed that TP53 mutations were significantly more prevalent in patients who experienced HT, consistent with existing literature [27,28]. Observed HT events also aligned closely with DNA-repair and tumor-suppressor genes, including TP53, BLM, and RAD50, suggesting that these genes play important roles in transformation risk. However, some discrepancies with previous reports were noted. For instance, although previous studies associated mutations in MYC, CDKN2A, and TNFRSF14 with HT, TP53 was the only one of these previously reported genes whose association with HT was confirmed in our cohort [11]. Although BLM mutations were prevalent among HT cases in our data, direct clinical evidence for their role remains limited, and existing support derives primarily from mouse model studies. Similarly, RAD50 and KMT2A mutations were associated with higher hazard ratios in our cohort, but prior clinical documentation of their specific roles in HT is sparse. These discrepancies may reflect differences in lymphoma subtype composition or ethnic background across cohorts [29,30,31].
Our findings support integrating advanced machine learning methods into clinical practice, particularly for cohorts with inherent limitations such as class imbalance or variable-specific sparsity. Compared with classical classification methods, survival-based machine learning, especially XGBoost-Cox, demonstrated superior predictive capability for HT in patients with FL or MZL. The nonlinear, interaction-sensitive nature of tree-based models likely captures intricate relationships among NGS variables more effectively than linear approaches, thereby improving predictive accuracy.
Nevertheless, machine learning models have notable limitations in interpretability and direct clinical applicability. The optimal timing for evaluating a patient’s risk of HT also remains an important consideration. The median time to transformation in our cohort was approximately 1.7 years, slightly shorter than the previously reported interval of approximately 2 years, within which HT is most frequently detected [28]. Clinicians should therefore closely monitor patients with baseline risk factors, including advanced stage, elevated LDH levels, and anemia, during this critical early period [4,5].
This study also has inherent limitations related to its retrospective design and the absence of external validation, especially for the models incorporating NGS data. Prospective validation, including assessments of circulating tumor DNA and multimodal imaging, is recommended to refine predictive models for HT. In addition, treatment-related outcome measures were not standardized: treatment timing and decisions varied across institutions, making these data unsuitable for incorporation into the predictive models. Future research should therefore integrate treatment data into time-dependent models. Lastly, the relatively small number of HT cases limited statistical power. Nonetheless, this class imbalance reflects real-world clinical practice.
5. Conclusions
This multicenter retrospective study demonstrated that survival-based machine learning models, particularly XGBoost-Cox, improved the prediction of HT in patients with FL and MZL. Incorporating NGS data further improved model accuracy and highlighted key genetic alterations, such as TP53, BLM, and RAD50, as significant predictors. These findings support integrating molecular profiling into routine prognostic assessment to enable early identification and timely intervention for patients at high risk of HT. Prospective studies with external validation are required to confirm these findings and to facilitate the development of personalized therapeutic strategies for patients with LGBCLs.
Supplementary Materials
The following supporting information can be downloaded at the website of this paper posted on Preprints.org, Figure S1: Flow diagram of patient selection and cohort stratification; Figure S2: Nomogram for aggressive histologic transformation (A) in the training set and (B) with next-generation sequencing data.
Author contributions
Conceptualization, T.-Y.K. and Y.-W.J.; methodology, T.-Y.K.; validation, G.J.M., S.-G.C. and S.K.; formal analysis, T.Y.K.; data curation, J.H.L. and B.S.K.; resources, T.-J.K., E.J.H., J.W.J. and H.S.W.; writing—original draft preparation, T.Y.K; writing review and editing, Y.-W.J., T.-J.K., E.J.H., G.J.M., S.-G.C., S.K., J.H.L., B.S.K., J.W.J., and H.S.W.; supervision, Y.-W.J.; project administration, Y.-W.J. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board and Ethics Committee of the Catholic Medical Center, South Korea (approval numbers: SC23WISI0093 for the training/validation cohorts and XC25RIDI0050 for the test cohort).
Informed Consent Statement
The requirement for patient consent was waived because of the retrospective design of the study.
Data Availability Statement
Acknowledgments
Not applicable.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| AUC | Area under the curve |
| DLBCL | Diffuse large B-cell lymphoma |
| FL | Follicular lymphoma |
| FLIPI | Follicular Lymphoma International Prognostic Index |
| GBM-Cox | Gradient-boosted Cox model |
| HT | Histologic transformation |
| LGBCL | Low-grade B-cell lymphoma |
| MALT-IPI | Mucosa-Associated Lymphoid Tissue Lymphoma Prognostic Index |
| ML | Machine learning |
| MZL | Marginal zone lymphoma |
| NGS | Next-generation sequencing |
| NHL | Non-Hodgkin lymphoma |
| PCA | Principal component analysis |
| RSF | Random survival forest |
| CI | Confidence interval |
| HR | Hazard ratio |
| FFPE | Formalin-fixed, paraffin-embedded |
References
1. Perry, A.M.; Diebold, J.; Nathwani, B.N.; MacLennan, K.A.; Müller-Hermelink, H.K.; Bast, M.; Boilesen, E.; Armitage, J.O.; Weisenburger, D.D. Non-Hodgkin lymphoma in the developing world: Review of 4539 cases from the International Non-Hodgkin Lymphoma Classification Project. Haematologica 2016, 101, 1244–1250.
2. Teras, L.R.; DeSantis, C.E.; Cerhan, J.R.; Morton, L.M.; Jemal, A.; Flowers, C.R. 2016 US lymphoid malignancy statistics by World Health Organization subtypes. CA Cancer J. Clin. 2016, 66, 443–459.
3. Abro, B.; Maurer, M.J.; Habermann, T.M.; Burack, W.R.; Chapman, J.R.; Cohen, J.B.; Friedberg, J.W.; Inghirami, G.; Kahl, B.S.; Larson, M.C.; et al. Real-world impact of differences in the WHO and ICC classifications of non-Hodgkin lymphoma: A LEO cohort study analysis. Blood 2024, 144, 2063–2066.
4. Wagner-Johnston, N.D.; Link, B.K.; Byrtek, M.; Dawson, K.L.; Hainsworth, J.; Flowers, C.R.; Friedberg, J.W.; Bartlett, N.L. Outcomes of transformed follicular lymphoma in the modern era: A report from the National LymphoCare Study (NLCS). Blood 2015, 126, 851–857.
5. Bult, J.A.A.; Huisman, F.; Zhong, Y.; Veltmaat, N.; Kluiver, J.; Tonino, S.H.; Vermaat, J.S.P.; Chamuleau, M.E.D.; Diepstra, A.; van den Berg, A.; et al. A population-based study of transformed marginal zone lymphoma: Identifying outcome-related characteristics. Blood Cancer J. 2023, 13, 130.
6. Solal-Céligny, P.; Roy, P.; Colombat, P.; White, J.; Armitage, J.O.; Arranz-Saez, R.; Au, W.Y.; Bellei, M.; Brice, P.; Caballero, D.; et al. Follicular lymphoma international prognostic index. Blood 2004, 104, 1258–1265.
7. Bachy, E.; Maurer, M.J.; Habermann, T.M.; Gelas-Dore, B.; Maucort-Boulch, D.; Estell, J.A.; Van den Neste, E.; Bouabdallah, R.; Gyan, E.; Feldman, A.L.; et al. A simplified scoring system in de novo follicular lymphoma treated initially with immunochemotherapy. Blood 2018, 132, 49–58.
8. Thieblemont, C.; Cascione, L.; Conconi, A.; Kiesewetter, B.; Raderer, M.; Gaidano, G.; Martelli, M.; Laszlo, D.; Coiffier, B.; Lopez Guillermo, A.; et al. A MALT lymphoma prognostic index. Blood 2017, 130, 1409–1417.
9. Mosquera Orgueira, A.; Cid López, M.; Peleteiro Raíndo, A.; Abuín Blanco, A.; Díaz Arias, J.Á.; González Pérez, M.S.; Antelo Rodríguez, B.; Bao Pérez, L.; Ferreiro Ferro, R.; Aliste Santos, C.; et al. Personally tailored survival prediction of patients with follicular lymphoma using machine learning transcriptome-based models. Front. Oncol. 2021, 11, 705010.
10. Hopper, M.A.; Wenzl, K.; Hartert, K.T.; Krull, J.E.; Dropik, A.R.; Novak, J.P.; Manske, M.K.; Serres, M.R.; Sarangi, V.; Larson, M.C.; et al. Molecular classification and identification of an aggressive signature in low-grade B-cell lymphomas. Hematol. Oncol. 2023, 41, 644–654.
11. Fernández-Miranda, I.; Pedrosa, L.; González-Rincón, J.; Espinet, B.; de la Cruz Vicente, F.; Climent, F.; Gómez, S.; Royuela, A.; Camacho, F.I.; Martín-Acosta, P.; et al. Generation and external validation of a histologic transformation risk model for patients with follicular lymphoma. Mod. Pathol. 2024, 37, 100516.
12. Dreval, K.; Hilton, L.K.; Cruz, M.; Shaalan, H.; Ben-Neriah, S.; Boyle, M.; Collinge, B.; Coyle, K.M.; Duns, G.; Farinha, P.; et al. Genetic subdivisions of follicular lymphoma defined by distinct coding and noncoding mutation patterns. Blood 2023, 142, 561–573.
13. Cheson, B.D.; Fisher, R.I.; Barrington, S.F.; Cavalli, F.; Schwartz, L.H.; Zucca, E.; Lister, T.A. Recommendations for initial evaluation, staging, and response assessment of Hodgkin and non-Hodgkin lymphoma: The Lugano classification. J. Clin. Oncol. 2014, 32, 3059–3068.
14. Longato, E.; Vettoretti, M.; Di Camillo, B. A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. J. Biomed. Inform. 2020, 108, 103496.
15. Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for Cox's proportional hazards model via coordinate descent. J. Stat. Softw. 2011, 39, 1–13.
16. Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random survival forests. Ann. Appl. Stat. 2008, 2, 841–860.
17. Mayr, A.; Binder, H.; Gefeller, O.; Schmid, M. The evolution of boosting algorithms. From machine learning to statistical modelling. Methods Inf. Med. 2014, 53, 419–427.
18. Wang, Y.; Wang, L.; Rastegar-Mojarad, M.; Moon, S.; Shen, F.; Afzal, N.; Liu, S.; Zeng, Y.; Mehrabi, S.; Sohn, S.; et al. Clinical information extraction applications: A literature review. J. Biomed. Inform. 2018, 77, 34–49.
19. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1–22.
20. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
21. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
22. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
23. Harrell, F.E.; Lee, K.L.; Mark, D.B. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 1996, 15, 361–387.
24. Chen, W.; Zhou, B.; Jeon, C.Y.; Xie, F.; Lin, Y.C.; Butler, R.K.; Zhou, Y.; Luong, T.Q.; Lustigova, E.; Pisegna, J.R.; et al. Machine learning versus regression for prediction of sporadic pancreatic cancer. Pancreatology 2023, 23, 396–402.
25. Cygu, S.; Seow, H.; Dushoff, J.; Bolker, B.M. Comparing machine learning approaches to incorporate time-varying covariates in predicting cancer survival time. Sci. Rep. 2023, 13, 1370.
26. Zha, J.; Chen, Q.; Zhang, W.; Jing, H.; Ye, J.; Liu, H.; Yu, H.; Yi, S.; Li, C.; Zheng, Z.; et al. A machine learning-based model to predict POD24 in follicular lymphoma: A study by the Chinese workshop on follicular lymphoma. Biomark. Res. 2025, 13, 2.
27. Kridel, R.; Chan, F.C.; Mottok, A.; Boyle, M.; Farinha, P.; Tan, K.; Meissner, B.; Bashashati, A.; McPherson, A.; Roth, A.; et al. Histological transformation and progression in follicular lymphoma: A clonal evolution study. PLoS Med. 2016, 13, e1002197.
28. Pasqualucci, L.; Khiabanian, H.; Fangazio, M.; Vasishtha, M.; Messina, M.; Holmes, A.B.; Ouillette, P.; Trifonov, V.; Rossi, D.; Tabbò, F.; et al. Genetics of follicular lymphoma transformation. Cell Rep. 2014, 6, 130–140.
29. Warren, M.; Chung, Y.J.; Howat, W.J.; Harrison, H.; McGinnis, R.; Hao, X.; McCafferty, J.; Fredrickson, T.N.; Bradley, A.; Morse, H.C. Irradiated Blm-deficient mice are a highly tumor prone model for analysis of a broad spectrum of hematologic malignancies. Leuk. Res. 2010, 34, 210–220.
30. Schuetz, J.M.; MaCarthur, A.C.; Leach, S.; Lai, A.S.; Gallagher, R.P.; Connors, J.M.; Gascoyne, R.D.; Spinelli, J.J.; Brooks-Wilson, A.R. Genetic variation in the NBS1, MRE11, RAD50 and BLM genes and susceptibility to non-Hodgkin lymphoma. BMC Med. Genet. 2009, 10, 117.
31. Gindin, T.; Murty, V.; Alobeid, B.; Bhagat, G. MLL/KMT2A translocations in diffuse large B-cell lymphomas. Hematol. Oncol. 2015, 33, 239–246.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).