Submitted: 29 May 2026
Posted: 01 June 2026
Abstract

Keywords:
1. Introduction
2. Materials and Methods
2.1. Design and Reporting Framework
2.2. Information Sources and Search Strategy
2.3. Eligibility Criteria
- Included pediatric patients or pediatric imaging data.
- Evaluated AI, machine learning, deep learning, or computer-aided diagnostic approaches.
- Addressed VUR detection, VUR grading, low/high-grade classification, AI-assisted interpretation, or clinically related VUR prediction.
- Used VCUG or accepted VUR-related imaging or clinical data.
- Reported diagnostic performance, reliability, agreement, or clinician-comparison metrics.
- Used human pediatric datasets.
2.4. Study Selection and Data Charting
2.5. Evidence Classification and Confidence Mapping
2.6. Synthesis Approach
3. Results
3.1. Study Selection

3.2. Characteristics of Included Studies
| Study | Year | Venue | DOI | Evidence category | Confidence |
|---|---|---|---|---|---|
| Li et al. [4] | 2024 | eClinicalMedicine | 10.1016/j.eclinm.2024.102466 | Direct VCUG grading / AI-assisted interpretation | High/moderate |
| Khondker et al. [5] | 2022 | Journal of Urology | 10.1097/JU.0000000000002987 | Quantitative grading / reliability | High/moderate |
| Wu et al. [6] | 2025 | Research | 10.34133/research.0771 | AI-assisted interpretation | High/moderate |
| Khondker et al. [7] | 2022 | Journal of Pediatric Urology | 10.1016/j.jpurol.2021.10.009 | Quantitative grading | Moderate |
| Eroglu et al. [8] | 2021 | Computer Methods and Programs in Biomedicine | 10.1016/j.cmpb.2021.106369 | Direct VCUG grading | Moderate |
| Chen et al. [9] | 2025 | Journal of Imaging Informatics in Medicine | 10.1007/s10278-025-01438-1 | Direct VCUG grading | Moderate |
| Wang et al. [10] | 2024 | Journal of Pediatric Urology | 10.1016/j.jpurol.2023.11.003 | Indirect prediction | Low/moderate |
| Ergun et al. [11] | 2024 | Journal of Surgery and Medicine | 10.28982/josam.8020 | Direct VCUG grading | Low/moderate |
| Kabir et al. [12] | 2024 | Journal of Pediatric Urology | 10.1016/j.jpurol.2023.10.030 | Low/high-grade classification | Low/moderate |
| Alqaraleh et al. [13] | 2025 | Data and Metadata | 10.56294/dm2025756 | Classification study | Low/moderate |
| Estrada et al. [14] | 2019 | Journal of Urology | 10.1097/JU.0000000000000186 | Indirect clinical prediction | Moderate |
| Kose et al. [15] | 2020 | BioMed Research International | 10.1155/2020/1895076 | Indirect clinical prediction | Low/moderate |
| Wahed et al. [16] | 2024 | NETAPPS Conference | 10.1109/NETAPPS63333.2024.10823509 | Conference / limited report | Low |
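This minimal Python sketch shows how the charted confidence ratings can be summarized. It transcribes the Confidence column of the table above and counts the studies at each level. The dictionary layout and the `tally` helper are hypothetical illustrations, not the authors' charting tool.

```python
from collections import Counter

# Confidence ratings transcribed from the study-characteristics table,
# keyed by reference number (hypothetical data structure for illustration).
CONFIDENCE = {
    4: "High/moderate",   # Li et al. 2024
    5: "High/moderate",   # Khondker et al. 2022, J. Urol.
    6: "High/moderate",   # Wu et al. 2025
    7: "Moderate",        # Khondker et al. 2022, J. Pediatr. Urol.
    8: "Moderate",        # Eroglu et al. 2021
    9: "Moderate",        # Chen et al. 2025
    10: "Low/moderate",   # Wang et al. 2024
    11: "Low/moderate",   # Ergun et al. 2024
    12: "Low/moderate",   # Kabir et al. 2024
    13: "Low/moderate",   # Alqaraleh et al. 2025
    14: "Moderate",       # Estrada et al. 2019
    15: "Low/moderate",   # Kose et al. 2020
    16: "Low",            # Wahed et al. 2024
}


def tally(ratings: dict[int, str]) -> Counter:
    """Count how many included studies fall into each confidence level."""
    return Counter(ratings.values())


if __name__ == "__main__":
    counts = tally(CONFIDENCE)
    for level in ("High/moderate", "Moderate", "Low/moderate", "Low"):
        print(f"{level}: {counts[level]}")
    print("Total:", sum(counts.values()))
```

Run on the table as transcribed, the sketch gives 3 high/moderate, 4 moderate, 5 low/moderate and 1 low-confidence study, 13 in total.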
3.3. AI Diagnostic Performance
| Study | Main finding | Clinical interpretation |
|---|---|---|
| Li 2024 [4] | External AUC 0.944 for unilateral VUR and 0.924 for bilateral VUR | Strong externally validated signal |
| Khondker 2022 [5] | AUC 0.84; reliability improved 3.6-fold | AI may improve grading consistency |
| Wu 2025 [6] | With AI assistance, clinician AUC rose from 0.6288 to 0.9641 for left VUR and from 0.7305 to 0.9506 for right VUR | Strong signal for clinician support |
| Eroglu 2021 [8] | Accuracy 96.9% | Promising, but the validation design warrants cautious interpretation |
| Chen 2025 [9] | Patient-level accuracy 0.84; AUC 0.82 | Clinically plausible side-specific performance |
| Small/public datasets [11,13,16] | Near-perfect metrics or incompletely reported results | Exploratory only; risk of overfitting and reporting bias |
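The AUC values above can be read as a probability. An AUC is the chance that the model gives a higher score to a randomly chosen VUR-positive case than to a randomly chosen VUR-negative case, with ties counted as one half. This is the Mann-Whitney formulation of the ROC AUC. The minimal Python sketch below computes it directly. The scores are hypothetical and are not taken from any included study.

```python
def auc_mann_whitney(pos_scores: list[float], neg_scores: list[float]) -> float:
    """ROC AUC as the probability that a positive case outscores a negative
    case; tied pairs contribute one half (Mann-Whitney formulation)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical model scores for reflux-positive and reflux-negative cases.
    positives = [0.9, 0.8, 0.4]
    negatives = [0.7, 0.3, 0.2]
    print(round(auc_mann_whitney(positives, negatives), 3))
```

In the example, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic and intended only for illustration. Rank-based implementations are preferable for real datasets.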
3.4. Radiologist and Clinician Comparator Evidence
| Study | Baseline clinician performance | AI-assisted performance | Interpretation |
|---|---|---|---|
| Li 2024 [4] | Clinician comparison performed | Improved junior and senior clinician performance | AI may support readers across experience levels |
| Khondker 2022 [5] | Similar accuracy; lower reliability | Reliability improved 3.6-fold | AI may standardize grading |
| Wu 2025 [6] | Left VUR AUC 0.6288; right VUR AUC 0.7305 | Left VUR AUC 0.9641; right VUR AUC 0.9506 | Strongest signal for AI-assisted interpretation |
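Differencing the reported point estimates makes the size of the comparator effect for Wu et al. [6] explicit. The short Python sketch below reproduces only that arithmetic. Without confidence intervals or a paired comparison of correlated ROC curves (for example, DeLong's test) on the underlying data, these differences do not establish statistical significance.

```python
# Reader AUCs for Wu et al. [6], transcribed from the comparator table.
UNAIDED = {"left": 0.6288, "right": 0.7305}
AI_ASSISTED = {"left": 0.9641, "right": 0.9506}


def absolute_gain(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Absolute AUC change (assisted minus unaided) for each side, to 4 d.p."""
    return {side: round(after[side] - before[side], 4) for side in before}


if __name__ == "__main__":
    print(absolute_gain(UNAIDED, AI_ASSISTED))
```

The sketch gives absolute gains of 0.3353 AUC points for left VUR and 0.2201 for right VUR.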
3.5. Direct Versus Indirect Evidence
3.6. Clinical Management Implications
4. Evidence Quality Map
| Domain | Main concern | Clinical impact |
|---|---|---|
| Search reproducibility | Screening and extraction exports were used; conventional database exports were not available | Limits claims of comprehensive systematic search |
| Full-text assessment | Full-text verification status was not available in the supplied export information | Requires manual verification before a formal systematic review |
| Study heterogeneity | Different target tasks, datasets, units of analysis, and outcomes | Prevents reliable meta-analysis |
| Comparator evidence | Limited direct AI-versus-radiologist or AI-versus-clinician studies | Restricts superiority claims |
| Validation quality | Only some studies reported external validation | Real-world performance remains uncertain |
| Small datasets | Near-perfect metrics were reported in some small cohorts | Overfitting risk |
| Clinical management evidence | No outcome-based management studies | Limits implementation claims |
| Reporting quality | Incomplete sensitivity, calibration, confidence interval, and decision-curve reporting | Reduces comparability and implementation readiness |
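The map notes two problems: confidence intervals were often missing, and small cohorts reported near-perfect metrics. The minimal Python sketch below illustrates both points. It computes sensitivity and specificity from a 2×2 confusion matrix, each with a Wilson score 95% interval. The counts are hypothetical and do not come from any included study, and the helper names are illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def report(tp: int, fn: int, tn: int, fp: int) -> dict[str, tuple[float, float, float]]:
    """Sensitivity and specificity as (estimate, lower, upper) from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, *wilson_interval(tp, tp + fn)),
        "specificity": (spec, *wilson_interval(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical small cohort: 30 reflux-positive and 30 reflux-negative cases.
    for name, (est, lo, hi) in report(tp=29, fn=1, tn=28, fp=2).items():
        print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Suppose 29 of 30 positive cases are detected. The point sensitivity is then about 0.97, but the Wilson interval runs from roughly 0.83 to 0.99. A near-perfect estimate from a small cohort therefore leaves substantial uncertainty. The Wilson interval is used here instead of the simpler Wald interval because it stays within [0, 1] and behaves better near the boundaries. Other methods, such as Clopper-Pearson, are also defensible.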
5. Discussion
5.1. Principal Findings
5.2. Interpretation in the Context of Current Evidence
5.3. Clinical Implementation
5.4. Reporting Standards for Future AI Studies
5.5. Limitations
5.6. Future Directions
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Use of AI Tools
Conflicts of Interest
References
- Lebowitz, R.L.; Olbing, H.; Parkkulainen, K.V.; Smellie, J.M.; Tamminen-Mobius, T.E. International system of radiographic grading of vesicoureteric reflux. Pediatr. Radiol. 1985, 15, 105–109. [CrossRef]
- Peters, C.A.; Skoog, S.J.; Arant, B.S.; Copp, H.L.; Elder, J.S.; Hudson, R.G.; Khoury, A.E.; Lorenzo, A.J.; Pohl, H.G.; Shapiro, E.; Snodgrass, W.T.; Diaz, M. Summary of the AUA guideline on management of primary vesicoureteral reflux in children. J. Urol. 2010, 184, 1134–1144. [CrossRef]
- Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.J.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann. Intern. Med. 2018, 169, 467–473. [CrossRef]
- Li, Z.; Tan, Z.; Wang, Z.; Tang, W.; Ren, X.; Fu, J.; Wang, G.; Chu, H.; Chen, J.; Duan, Y.; et al. Development and multi-institutional validation of a deep learning model for grading of vesicoureteral reflux on voiding cystourethrogram: a retrospective multicenter study. EClinicalMedicine 2024, 69, 102466. [CrossRef]
- Khondker, A.; Kwong, J.C.C.; Yadav, P.; Chan, J.Y.H.; Singh, A.; Skreta, M.; et al. Multi-institutional validation of improved vesicoureteral reflux assessment with simple and machine learning approaches. J. Urol. 2022, 208, 1314–1322. [CrossRef]
- Wu, M.; Li, Z.; Liu, Y.; et al. Dynamic multi-image weighting for automated detection and diagnosis of abnormal urinary tract on voiding cystourethrography with a deep learning system: a retrospective, large-scale, multicenter study. Research 2025, 8, 0771. [CrossRef]
- Khondker, A.; Kwong, J.C.C.; Rickard, M.; Skreta, M.; Keefe, D.T.; Lorenzo, A.J.; Erdman, L. A machine learning-based approach for quantitative grading of vesicoureteral reflux from voiding cystourethrograms: methods and proof of concept. J. Pediatr. Urol. 2022, 18, 78.e1–78.e7. [CrossRef]
- Eroglu, Y.; Yildirim, K.; Cinar, A.; Yildirim, M. Diagnosis and grading of vesicoureteral reflux on voiding cystourethrography images in children using a deep hybrid model. Comput. Methods Programs Biomed. 2021, 210, 106369. [CrossRef]
- Chen, G.; Su, L.; Wang, S.; Liu, X.; Wu, W.; Zhang, F.; Zhao, Y.; Zhu, L.; Zhang, H.; Wang, X.; Yu, G. Automated grading of vesicoureteral reflux (VUR) using a dual-stream CNN model with deep supervision. J. Imaging Inform. Med. 2025, 38, 3517–3525. [CrossRef]
- Wang, H.-H.S.; Li, M.; Cahill, D.; Panagides, J.; Logvinenko, T.; Chow, J.; Nelson, C. A machine learning algorithm predicting risk of dilating VUR among infants with hydronephrosis using UTD classification. J. Pediatr. Urol. 2024, 20, 271–278. [CrossRef]
- Ergun, O.; Serel, T.A.; Ozturk, S.A.; Serel, H.B.; Soyupek, S.; Hoscan, B. Deep-learning-based diagnosis and grading of vesicoureteral reflux: a novel approach for improved clinical decision-making. J. Surg. Med. 2024, 8, 12–16. [CrossRef]
- Kabir, S.; Pippi Salle, J.L.; Chowdhury, M.E.H.; Abbas, T.O. Quantification of vesicoureteral reflux using machine learning. J. Pediatr. Urol. 2024, 20, 257–264. [CrossRef]
- Alqaraleh, M.; Alzboon, M.S.; Al-Batah, M.S.; Al Aesa, L.Y.; Abu-Arqoub, M.H.; Marie, R.R.; Alsmadi, F.H. Machine learning-based quantification of vesicoureteral reflux with enhancing accuracy and efficiency. Data Metadata 2025, 4, 756. [CrossRef]
- Estrada, C.R.; Nelson, C.P.; Wang, H.H.; Bertsimas, D.; Dunn, J.; Li, M.; Zhuo, D.; et al. Targeted workup after initial febrile urinary tract infection: using a novel machine learning model to identify children most likely to benefit from voiding cystourethrogram. J. Urol. 2019, 202, 144–152. [CrossRef]
- Kose, T.; Ozgur, S.; Cosgun, E.; Keskinoglu, A.; Keskinoglu, P. Effect of missing data imputation on deep learning prediction performance for vesicoureteral reflux and recurrent urinary tract infection clinical study. BioMed Res. Int. 2020, 2020, 1895076. [CrossRef]
- Wahed, M.A.; Alzboon, M.; Alqaraleh, M.; et al. Enhancing diagnostic precision in pediatric urology: machine learning models for automated grading of vesicoureteral reflux. In Proceedings of the 2024 7th International Conference on Internet Applications, Protocols, and Services (NETAPPS), 2024. [CrossRef]
- Mongan, J.; Moy, L.; Kahn, C.E., Jr. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): a guide for authors and reviewers. Radiol. Artif. Intell. 2020, 2, e200029. [CrossRef]
- Tejani, A.S.; Klontzas, M.E.; Gatti, A.A.; et al. Checklist for Artificial Intelligence in Medical Imaging: CLAIM 2024 update. Radiol. Artif. Intell. 2024. [CrossRef]
- Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024, 385, e078378. [CrossRef]
- Sounderajah, V.; Guni, A.; Liu, X.; et al. The STARD-AI reporting guideline for diagnostic accuracy studies using artificial intelligence. Nat. Med. 2025. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).