Submitted: 06 June 2026
Posted: 08 June 2026
Abstract
Keywords:
1. Introduction
Contributions
- We introduce UM-TOTA, the first ViT-based architecture for ovarian ultrasound that integrates concept-based interpretability directly into the multi-task learning framework, rather than applying it post-hoc. All predictions pass through a clinical concept bottleneck whose targets are informed by the International Ovarian Tumor Analysis (IOTA) and Ovarian-Adnexal Reporting and Data System (O-RADS) criteria, ensuring that model explanations reflect the actual decision process rather than approximating it after the fact.
- We develop a knowledge-guided concept supervision framework that maps ground-truth tumor labels to IOTA- and O-RADS-defined clinical concept targets, going beyond generic concept bottleneck implementations to embed domain-specific diagnostic knowledge directly into model training. This enables clinicians to verify model reasoning using criteria that they already apply in practice.
- We demonstrate that a unified four-task architecture performing classification, malignancy detection, segmentation, and concept-based interpretability in a single forward pass achieves a 66.7% parameter reduction and a 65.5% inference latency reduction relative to sequential single-task deployment, supported by concrete hardware measurements, while maintaining diagnostic accuracy comparable to that of specialised single-task models.
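The knowledge-guided concept supervision described in the second contribution can be pictured as a lookup from a ground-truth tumor label to a vector of concept targets. The following is a minimal sketch under stated assumptions: the concept names follow the IOTA/O-RADS descriptors listed in this paper, but the class names (`example_benign_cyst`, `example_malignant_mass`) and every target value are hypothetical placeholders, not the mapping used by the authors.

```python
# Concept names follow the IOTA/O-RADS-style descriptors named in this paper.
CONCEPTS = (
    "vascularization",
    "papillary_projections",
    "solid_components",
    "ascites_presence",
    "boundary_clarity",
    "cystic_components",
    "homogeneous_texture",
    "acoustic_shadowing",
    "posterior_enhancement",
    "shape_regularity",
)

# HYPOTHETICAL label-to-concept priors: class names and values are
# illustrative placeholders, not the authors' actual mapping.
CONCEPT_PRIORS = {
    "example_benign_cyst": {
        "cystic_components": 1.0,
        "boundary_clarity": 1.0,
        "shape_regularity": 1.0,
        "posterior_enhancement": 1.0,
    },
    "example_malignant_mass": {
        "solid_components": 1.0,
        "vascularization": 1.0,
        "papillary_projections": 1.0,
        "ascites_presence": 1.0,
    },
}


def concept_targets(label):
    """Return concept targets in CONCEPTS order; unlisted concepts are 0.0."""
    priors = CONCEPT_PRIORS[label]
    unknown = set(priors) - set(CONCEPTS)
    if unknown:
        raise ValueError(f"unknown concepts: {sorted(unknown)}")
    return [priors.get(name, 0.0) for name in CONCEPTS]


targets = concept_targets("example_malignant_mass")
```

Keeping the concept order fixed in one tuple means the same index can be used for the supervision target and for the predicted concept activation.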
2. Related Works
2.1. Deep Learning Architectures and Multi-Task Strategies in Ovarian Cancer
2.2. From Post-Hoc Explanation to Concept-Based Interpretability in Ovarian Tumor Diagnosis
2.3. Evaluation Methodologies in Medical AI
3. Methodology
3.1. Dataset and Data Preparation
3.1.1. Dataset
3.1.2. Data Preparation and Augmentation
3.2. Unified Multi-Task Ovarian Tumor Architecture (UM-TOTA)
1. Eight-class tumor classification
2. Three-class malignancy detection
3. ROI-based tumor segmentation
4. Prediction of clinical semantic concepts for transparent diagnosis
- Transformer Backbone: A ViT encoder that extracts global contextual features from ultrasound images.
- Task Heads: Parallel classification, malignancy detection, and segmentation heads generate their respective predictions.
- Concept Bottleneck Module: Predicts interpretable clinical indicators such as boundary clarity, shape regularity, vascularization, and solid-component presence.
- Clinical Reasoning Unit: Integrates concept activations into attention-weighted decision pathways.
- Multi-Task Loss Coordinator: Joint optimization with dynamic weighting ensures stable convergence during training.
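The source states only that the Multi-Task Loss Coordinator uses joint optimization with dynamic weighting; the exact scheme is not given here. As one possible illustration, the sketch below weights each task by the inverse of its exponentially smoothed loss so that no single task dominates the total. The class name `DynamicLossWeights`, the momentum value, and the example losses are all assumptions made for this sketch.

```python
class DynamicLossWeights:
    """Inverse-magnitude task weighting over an exponential moving average.

    ASSUMPTION: this is one generic balancing scheme, not the paper's method.
    """

    def __init__(self, tasks, momentum=0.9, eps=1e-8):
        self.tasks = tuple(tasks)
        self.momentum = momentum
        self.eps = eps
        self.running = {}  # task name -> smoothed loss value

    def update(self, losses):
        """Update smoothed losses; return weights that sum to len(tasks)."""
        for task in self.tasks:
            value = float(losses[task])
            if task in self.running:
                previous = self.running[task]
                self.running[task] = (
                    self.momentum * previous + (1.0 - self.momentum) * value
                )
            else:
                self.running[task] = value
        inverse = {t: 1.0 / (self.running[t] + self.eps) for t in self.tasks}
        scale = len(self.tasks) / sum(inverse.values())
        return {t: inverse[t] * scale for t in self.tasks}


def total_loss(losses, weights):
    """Weighted sum of per-task losses (weights are treated as constants)."""
    return sum(weights[task] * losses[task] for task in weights)


tasks = ("classification", "malignancy", "segmentation", "concepts")
weigher = DynamicLossWeights(tasks)
# Hypothetical per-task losses for a single training step.
step_losses = {
    "classification": 2.0,
    "malignancy": 1.0,
    "segmentation": 0.5,
    "concepts": 0.5,
}
weights = weigher.update(step_losses)
combined = total_loss(step_losses, weights)
```

In a real training loop the weights would be computed from detached loss values so that gradients flow only through the task losses themselves.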
3.2.1. ViT Backbone Architecture
3.2.2. Task-Specific Head Architecture Design
3.2.3. Concept Bottleneck Model Integration
3.2.4. Clinical Reasoning Module Architecture
3.2.5. Multi-Task Loss Coordination System
3.3. Ablation Study Design and Task Configuration
3.4. Training and Validation Protocol
3.5. Comprehensive Evaluation Framework
4. Results and Discussion
4.1. Multi-Task Learning Performance and Clinical Significance
4.1.1. Task-Specific Performance Analysis and Comparison with Related Studies
4.1.2. Comparison with CNN-Based Multi-Task Baseline
4.1.3. Deployment Efficiency Analysis
4.2. Clinical Interpretability Results and Trust Building
4.2.1. IOTA/O-RADS Guideline Alignment
4.2.2. Clinical Reasoning Transparency and Trust Building
4.2.3. Interpretable Multi-Task Integration
4.3. Ablation Study Insights and Technical Innovations
- full_model: {classification: True, malignancy: True, segmentation: True, concepts: True}
- no_concepts: {classification: True, malignancy: True, segmentation: True, concepts: False}
- no_class_no_mal: {classification: False, malignancy: False, segmentation: True, concepts: True}
- segmentation_only: {classification: False, malignancy: False, segmentation: True, concepts: False}
- class_mal_only: {classification: True, malignancy: True, segmentation: False, concepts: False}
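The five ablation variants above are plain on/off switches over the four tasks, so they map naturally onto a configuration dictionary. The sketch below copies the flags exactly as listed; the helper `active_tasks` is a hypothetical convenience function added for illustration.

```python
# Task switches copied from the ablation list above.
ABLATION_CONFIGS = {
    "full_model": {
        "classification": True, "malignancy": True,
        "segmentation": True, "concepts": True,
    },
    "no_concepts": {
        "classification": True, "malignancy": True,
        "segmentation": True, "concepts": False,
    },
    "no_class_no_mal": {
        "classification": False, "malignancy": False,
        "segmentation": True, "concepts": True,
    },
    "segmentation_only": {
        "classification": False, "malignancy": False,
        "segmentation": True, "concepts": False,
    },
    "class_mal_only": {
        "classification": True, "malignancy": True,
        "segmentation": False, "concepts": False,
    },
}


def active_tasks(variant):
    """Return the names of the tasks enabled for a given ablation variant."""
    return [task for task, enabled in ABLATION_CONFIGS[variant].items() if enabled]


for name in ABLATION_CONFIGS:
    print(f"{name}: {', '.join(active_tasks(name))}")
```

Comparing `full_model` with `no_concepts` isolates the concept bottleneck, since those two variants differ in that single flag.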
4.3.1. Task Combination Impact Analysis
- Classification Figure: no_concepts (82.44% ± 1.94%) > class_mal_only (80.67% ± 2.25%) > full_model (80.26% ± 1.10%)
- Malignancy Figure: no_concepts (92.38% ± 0.85%) > class_mal_only (91.22% ± 1.22%) > full_model (90.88% ± 1.14%)
- Segmentation Figure: segmentation_only (80.88% ± 1.04%) > full_model (77.29% ± 1.29%) > no_concepts (75.50% ± 0.94%)
4.3.2. Technical Component Validation
4.3.2.1. Quantified Interpretability Effect:
- Classification: 2.18 percentage points cost for clinical interpretability
- Malignancy detection: 1.50 percentage points cost for transparent reasoning
- Segmentation: 1.79 percentage points gain from concept-guided features
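The three figures above follow from subtracting the no_concepts result from the full_model result for each task, using the means reported in Section 4.3.1. A short check of that arithmetic, with the dictionary layout and function name chosen here purely for illustration:

```python
# Mean scores (in %) reported for the full_model and no_concepts variants.
RESULTS = {
    "classification": {"full_model": 80.26, "no_concepts": 82.44},
    "malignancy": {"full_model": 90.88, "no_concepts": 92.38},
    "segmentation": {"full_model": 77.29, "no_concepts": 75.50},
}


def concept_effect(task):
    """Percentage-point change when the concept bottleneck is enabled.

    Negative values are a cost, positive values are a gain.
    """
    row = RESULTS[task]
    return round(row["full_model"] - row["no_concepts"], 2)


effects = {task: concept_effect(task) for task in RESULTS}
for task, delta in effects.items():
    kind = "gain" if delta > 0 else "cost"
    print(f"{task}: {abs(delta):.2f} percentage points {kind}")
```

These are differences of fold-averaged means, so they say nothing on their own about statistical significance.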
4.3.3. Optimal Clinical Configuration
- High-volume screening: The no_concepts variant (82.44% ± 1.94% classification accuracy, 92.38% ± 0.85% malignancy accuracy).
- Surgical planning: The segmentation_only variant (80.88% ± 1.04% Dice), as it is focused on boundary precision.
- Research/teaching: The full_model variant, which gives balanced performance and interpretable reasoning.
- Real-time decision support: The class_mal_only variant for its good performance and efficiency.
4.4. Comparative Analysis and Clinical Deployment
4.4.1. State-of-the-Art Positioning and Clinical Adoption Advantages
4.5. Limitations and Future Directions
5. Conclusion
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| AUC | Area Under the Curve |
| CA125 | Cancer Antigen 125 |
| CBM | Concept Bottleneck Model |
| CI | Confidence Interval |
| CNN | Convolutional Neural Network |
| CNN-MTL | CNN-based Multi-Task Learning baseline |
| DSC | Dice Similarity Coefficient |
| IoU | Intersection over Union |
| IOTA | International Ovarian Tumor Analysis |
| LIME | Local Interpretable Model-agnostic Explanations |
| MMOTU | Multi-Modality Ovarian Tumor Ultrasound |
| MTL | Multi-Task Learning |
| OC | Ovarian Cancer |
| O-RADS | Ovarian-Adnexal Reporting and Data System |
| ROC | Receiver Operating Characteristic |
| SHAP | SHapley Additive exPlanations |
| TCAV | Testing with Concept Activation Vectors |
| UM-TOTA | Unified Multi-Task Ovarian Tumor Architecture |
| ViT | Vision Transformer |
| VRAM | Video Random Access Memory |
| XAI | Explainable Artificial Intelligence |
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- World Cancer Research Fund International. Ovarian Cancer Statistics. 2022. (accessed on 2026-04-18).
- Cancer Research UK. Screening for Ovarian Cancer, 2025. Last reviewed 17 February 2025. (accessed on 2023-06-21).
- Sahu, S.A.; Shrivastava, D. A Comprehensive Review of Screening Methods for Ovarian Masses: Towards Earlier Detection. Cureus 2023, 15, e48534.
- Tang, C.; Xu, Z.; Duan, H.; Zhang, S. Advancements in artificial intelligence for ultrasound diagnosis of ovarian cancer: a comprehensive review. Front. Oncol. 2025, 15, 1581157.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE, 2016; pp. 770–778.
- Qadir, H.A.; Shin, Y.; Solhusvik, J.; Bergsland, J.; Aabakken, L.; Balasingham, I. Polyp Detection and Segmentation Using Mask R-CNN: Does a Deeper Feature Extractor CNN Always Perform Better? In Proceedings of the 2019 13th International Symposium on Medical Information and Communication Technology (ISMICT), 2019; pp. 1–6.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015); Springer; Lecture Notes in Computer Science; 2015; Vol. 9351, pp. 234–241.
- Sanderson, E.; Matuszewski, B.J. FCN-Transformer Feature Fusion for Polyp Segmentation. In Proceedings of the Medical Image Understanding and Analysis (MIUA 2022), Cambridge, UK, 24–26 August 2022; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Vol. 13413, pp. 892–907.
- Wang, R.; Cai, Y.; Lee, I.K.; Hu, R.; Purkayastha, S.; Pan, I.; Yi, T.; Tran, T.M.L.; Lu, S.; Liu, T.; et al. Evaluation of a convolutional neural network for ovarian tumor differentiation based on magnetic resonance imaging. Eur. Radiol. 2021, 31, 4960–4971.
- Sengupta, D.; Ali, S.N.; Bhattacharya, A.; Mustafi, J.; Mukhopadhyay, A.; Sengupta, K. A deep hybrid learning pipeline for accurate diagnosis of ovarian cancer based on nuclear morphology. PLOS ONE 2022, 17.
- Hsu, S.T.; Su, Y.J.; Hung, C.H.; Chen, M.J.; Lu, C.H.; Kuo, C.E. Automatic ovarian tumors recognition system based on ensemble convolutional neural network with ultrasound imaging. BMC Med. Inform. Decis. Mak. 2022, 22, 298.
- Ghoniem, R.M.; Algarni, A.D.; Refky, B.; Ewees, A.A. Multi-Modal Evolutionary Deep Learning Model for Ovarian Cancer Diagnosis. Symmetry 2021, 13(4).
- Zhao, Q.; Lyu, S.; Bai, W.; Cai, L.; Liu, B.; Cheng, G.; Wu, M.; Sang, X.; Yang, M.; Chen, L. MMOTU: A Multi-Modality Ovarian Tumor Ultrasound Image Dataset for Unsupervised Cross-Domain Semantic Segmentation. arXiv 2023, arXiv:2207.06799.
- Xu, T.; Farahani, H.; Bashashati, A. Multi-Resolution Vision Transformer for Subtype Classification in Ovarian Cancer Whole-Slide Histopathology Images. 2022.
- Alahmadi, A. Towards ovarian cancer diagnostics: A vision transformer-based computer-aided diagnosis framework with enhanced interpretability. Results Eng. 2024, 23, 102651.
- Alshdaifat, E.H.; Gharaibeh, H.; Sindiani, A.M.; Madain, R.; Al-Mnayyis, A.M.; Abu Mhanna, H.Y.; Almahmoud, R.E.; Akhdar, H.F.; Amin, M.; Nasayreh, A.; et al. Hybrid vision transformer and Xception model for reliable CT-based ovarian neoplasms diagnosis. Intell.-Based Med. 2025, 11, 100227.
- Wei, S.; Hu, Z.; Tan, L. Res-ECA-UNet++: an automatic segmentation model for ovarian tumor ultrasound images based on residual networks and channel attention mechanism. Front. Med. 2025.
- Musa, A.A.; Fernando, A. Vision Transformer for Ovarian Tumor Classification: A Comparative Study with CNNs on Ultrasound Imaging. In Proceedings of the 2026 IEEE International Conference on Consumer Electronics (ICCE), Dubai, United Arab Emirates, 2026; pp. 1–6.
- Gunning, D.; Aha, D. DARPA’s Explainable Artificial Intelligence (XAI) Program. AI Mag. 2019, 40(2), 44–58.
- Borys, K.; Schmitt, Y.A.; Nauta, M.; Seifert, C.; Krämer, N.; Friedrich, C.M.; Nensa, F. Explainable AI in medical imaging: An overview for clinical practitioners – Saliency-based XAI approaches. Eur. J. Radiol. 2023, 162, 110787.
- Pang, W.; Ke, X.; Tsutsui, S.; Wen, B. Integrating Clinical Knowledge into Concept Bottleneck Models. In Proceedings of the Medical Image Computing and Computer Assisted Intervention – MICCAI 2024; Linguraru, M.G., Dou, Q., Feragen, A., Giannarou, S., Glocker, B., Lekadir, K., Schnabel, J.A., Eds.; Springer Nature Switzerland, 2024; pp. 243–253.
- Koh, P.W.; Nguyen, T.; Tang, Y.S.; Mussmann, S.; Pierson, E.; Kim, B.; Liang, P. Concept bottleneck models. In Proceedings of the 37th International Conference on Machine Learning (ICML’20); JMLR.org, 2020; Vol. 119, pp. 5338–5348.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), 2017; pp. 618–626, ISSN 2380-7504.
- Patrício, C.; Neves, J.C.; Teixeira, L.F. Explainable Deep Learning Methods in Medical Image Classification: A Survey. ACM Comput. Surv. 2023, 56, 85:1–85:41.
- Shen, D.; Wu, G.; Suk, H.I. Deep Learning in Medical Image Analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248.
- Wu, C.; Wang, Y.; Wang, F. Deep Learning for Ovarian Tumor Classification with Ultrasound Images. In Proceedings of the Advances in Multimedia Information Processing – PCM 2018; Hong, R., Cheng, W.H., Yamasaki, T., Wang, M., Ngo, C.W., Eds.; Springer International Publishing, 2018; pp. 395–406.
- Wang, H.; Liu, C.; Zhao, Z.; Zhang, C.; Wang, X.; Li, H.; Wu, H.; Liu, X.; Li, C.; Qi, L.; et al. Application of Deep Convolutional Neural Networks for Discriminating Benign, Borderline, and Malignant Serous Ovarian Tumors From Ultrasound Images. Front. Oncol. 2021, 11.
- Srivastava, S.; Kumar, P.; Chaudhry, V.; Singh, A. Detection of Ovarian Cyst in Ultrasound Images Using Fine-Tuned VGG-16 Deep Learning Network. SN Comput. Sci. 2020, 1(2), 1–8.
- Karimzadeh, M.; Vakanski, A.; Xian, M.; Zhang, B. Post-Hoc Explainability of BI-RADS Descriptors in a Multi-Task Framework for Breast Cancer Detection and Segmentation. In Proceedings of the 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), 2023; pp. 1–6, ISSN 2161-0371.
- Christiansen, F.; Konuk, E.; Ganeshan, A.R.; Welch, R.; Palés Huix, J.; Czekierdowski, A.; Leone, F.P.G.; Haak, L.A.; Fruscio, R.; Gaurilcikas, A.; et al. International multicenter validation of AI-driven ultrasound detection of ovarian cancer. Nat. Med. 2025, 31, 189–196.
- Su, C.; Miao, K.; Zhang, L.; Yu, X.; Guo, Z.; Li, D.; Xu, M.; Zhang, Q.; Dong, X. Multimodal Deep Learning Based on Ultrasound Images and Clinical Data for Better Ovarian Cancer Diagnosis. J. Imaging Inform. Med. 2025.
- Heinrich, M.P. Intra-operative Ultrasound to MRI Fusion with a Public Multimodal Discrete Registration Tool. In Proceedings of the Simulation, Image Processing, and Ultrasound Systems for Assisted Diagnosis and Navigation; Stoyanov, D., Taylor, Z., Aylward, S., Tavares, J.M.R., Xiao, Y., Simpson, A., Martel, A., Maier-Hein, L., Li, S., Rivaz, H., et al., Eds.; Springer International Publishing, 2018; pp. 159–164.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv 2021.
- Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.; Xu, D. UNETR: Transformers for 3D Medical Image Segmentation. arXiv 2021, arXiv:2103.10504.
- Li, L.; He, L.; Guo, W.; Ma, J.; Sun, G.; Ma, H. PMFFNet: A hybrid network based on feature pyramid for ovarian tumor segmentation. PLOS ONE 2024, 19.
- Nazir, M.; Shakil, S.; Khurshid, K. End-to-End Multi-task Learning Architecture for Brain Tumor Analysis with Uncertainty Estimation in MRI Images. J. Imaging Inform. Med. 2024, 37, 2149–2172.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16); Association for Computing Machinery, 2016; pp. 1135–1144.
- Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
- Kim, B.; Wattenberg, M.; Gilmer, J.; Cai, C.; Wexler, J.; Viegas, F.; Sayres, R. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). arXiv 2018, arXiv:1711.11279.
- Wang, H.; Hou, J.; Chen, H. Concept Complement Bottleneck Model for Interpretable Medical Image Diagnosis. arXiv 2024, arXiv:2410.15446.
- Lucieri, A.; Bajwa, M.N.; Braun, S.A.S.; Malik, M.I.; Dengel, A.; Ahmed, S. ExAID: A multimodal explanation framework for computer-aided diagnosis of skin lesions. Comput. Methods Programs Biomed. 2022, 215, 106620.
- Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. arXiv 2019, arXiv:1811.10154.
- Timmerman, D.; Valentin, L.; Bourne, T.H.; Collins, W.P.; Verrelst, H.; Vergote, I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) group. Ultrasound Obstet. Gynecol. 2000, 16, 500–505. Available online: https://obgyn.onlinelibrary.wiley.com/doi/pdf/10.1046/j.1469-0705.2000.00287.x.
- Andreotti, R.F.; Timmerman, D.; Strachowski, L.M.; Froyman, W.; Benacerraf, B.R.; Bennett, G.L.; Bourne, T.; Brown, D.L.; Coleman, B.G.; Frates, M.C.; et al. O-RADS US Risk Stratification and Management System: A Consensus Guideline from the ACR Ovarian-Adnexal Reporting and Data System Committee. Radiology 2020, 294, 168–185.
- Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI’95), Volume 2; 1995; pp. 1137–1143.
- Arlot, S.; Celisse, A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010, 4; arXiv:0907.4728.
- Roberts, M.; Driggs, D.; Thorpe, M.; Gilbey, J.; Yeung, M.; Ursprung, S.; Aviles-Rivero, A.I.; Etmann, C.; McCague, C.; Beer, L.; et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 2021, 3, 199–217.
- Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
- Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Laak, J.A.W.M.v.d.; Ginneken, B.v.; Sánchez, C.I. A Survey on Deep Learning in Medical Image Analysis. Med. Image Anal. 2017, 42, 60–88.
- Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30.
- Nadeau, C.; Bengio, Y. Inference for the Generalization Error. Mach. Learn. 2003, 52(3), 239–281.
- Bouthillier, X.; Delaunay, P.; Bronzi, M.; Trofimov, A.; Nichyporuk, B.; Szeto, J.; Sepah, N.; Raff, E.; Madan, K.; Voleti, V. Accounting for Variance in Machine Learning Benchmarks. arXiv 2021, arXiv:2103.03098.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009; pp. 248–255, ISSN 1063-6919.
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
- Kato, S.; Hotta, K. Adaptive t-vMF Dice Loss for Multi-class Medical Image Segmentation. arXiv 2022.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE, 2016; pp. 2818–2826.
- Ruder, S. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv 2017, arXiv:1706.05098.
- Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
- Timmerman, D.; Testa, A.C.; Bourne, T.; Ameye, L.; Jurkovic, D.; Van Holsbeke, C.; Paladini, D.; Van Calster, B.; Vergote, I.; Van Huffel, S.; et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet. Gynecol. 2008, 31, 681–690. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/uog.5365.
- Garg, S.; Kaur, A.; Mohi, J.K.; Sibia, P.K.; Kaur, N. Evaluation of IOTA Simple Ultrasound Rules to Distinguish Benign and Malignant Ovarian Tumours. J. Clin. Diagn. Res. JCDR 2017, 11, TC06–TC09.
- Mitchell, S.; Nikolopoulos, M.; El-Zarka, A.; Al-Karawi, D.; Al-Zaidi, S.; Ghai, A.; Gaughran, J.E.; Sayasneh, A. Artificial Intelligence in Ultrasound Diagnoses of Ovarian Cancer: A Systematic Review and Meta-Analysis. Cancers 2024, 16(2).
- Schäfer, R.; Nicke, T.; Höfener, H.; Lange, A.; Merhof, D.; Feuerhake, F.; Schulz, V.; Lotz, J.; Kiessling, F. Overcoming data scarcity in biomedical imaging with a foundational multi-task model. Nat. Comput. Sci. 2024, 4, 495–509.
- Rhanoui, M.; Alaoui Belghiti, K.; Mikram, M. Multi-Task Deep Learning for Simultaneous Classification and Segmentation of Cancer Pathologies in Diverse Medical Imaging Modalities. Onco 2025, 5(3).
- Yu, T.; Kumar, S.; Gupta, A.; Levine, S.; Hausman, K.; Finn, C. Gradient Surgery for Multi-Task Learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual Event, 6–12 December 2020; Curran Associates, Inc., 2020; pp. 5824–5836.
- Bui, P.N.; Le, D.T.; Bum, J. Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis. arXiv 2024, arXiv:2412.00351.

| Task | Metric | Mean ± Std | Minimum | Maximum |
|---|---|---|---|---|
| 8-Class Classification | Accuracy | 80.26% ± 1.10% | 79.3% | 83.0% |
| 8-Class Classification | Precision | 81.07% ± 0.89% | – | – |
| 8-Class Classification | Sensitivity (Recall) | 80.26% ± 1.10% | – | – |
| 8-Class Classification | Specificity | 97.06% ± 0.15% | – | – |
| 8-Class Classification | F1-Score | 80.24% ± 1.00% | – | – |
| 3-Class Malignancy | Accuracy | 90.88% ± 1.14% | 89.4% | 91.8% |
| 3-Class Malignancy | Precision | 90.94% ± 1.27% | – | – |
| 3-Class Malignancy | Sensitivity (Recall) | 90.88% ± 1.14% | – | – |
| 3-Class Malignancy | Specificity | 90.41% ± 1.60% | – | – |
| 3-Class Malignancy | F1-Score | 90.57% ± 1.22% | – | – |
| Segmentation | Dice Score | 77.29% ± 1.29% | 75.1% | 79.0% |
| Segmentation | IoU | 66.57% ± 1.40% | – | – |
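The two segmentation metrics in the table, the Dice score (DSC) and IoU, are both overlap ratios between a predicted and a reference mask. The sketch below computes them for binary masks stored as nested lists of 0/1 values; the function name and the tiny example masks are illustrative assumptions, and the paper's own evaluation code is not shown in the source.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested 0/1 lists."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = pred_area = target_area = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            pred_area += p
            target_area += t
            intersection += p * t

    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treat as perfect agreement by convention.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Hypothetical 3x3 masks: 4 predicted pixels, 5 reference pixels, 4 shared.
pred_mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
true_mask = [[1, 1, 1], [0, 1, 1], [0, 0, 0]]
dice, iou = dice_and_iou(pred_mask, true_mask)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single mask pair the two metrics are linked by IoU = Dice / (2 − Dice). That identity holds per image only, so scores averaged over many images, such as those in the table, need not satisfy it.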
| Study | Accuracy | Sensitivity | Specificity | Interpretability | Multi-Task? |
|---|---|---|---|---|---|
| Zhao et al. [13] | 80.60% | – | – | × | × |
| Mitchell et al. [64] | – | 81.00% | 92.00% | – | × |
| Garg et al. [63] (IOTA Rules) | 86.66% | 91.66% | 84.84% | Manual Rules | × |
| Christiansen et al. [30] | – | 89.31% | 82.67% | – | × |
| Karimzadeh et al. [29] | 91.30% | 94.00% | 85.80% | Post-hoc (SHAP) | ✓ |
| Nazir et al. [39] | 95.10% | – | – | × | ✓ |
| UM-TOTA (8-Class) | 80.26% | 80.26% | 97.06% | Inherent | ✓ |
| UM-TOTA (Malignancy) | 90.88% | 90.88% | 90.41% | Inherent | ✓ |
| Model | Backbone | 8-Class Accuracy | Malignancy Accuracy | Dice Score |
|---|---|---|---|---|
| CNN-MTL | ResNet50 | 79.31% ± 2.96% | 90.13% ± 0.88% | 77.71% ± 0.52% |
| UM-TOTA | ViT-Base/16 | 80.26% ± 1.10% | 90.88% ± 1.14% | 77.29% ± 1.29% |
| p-value (paired t-test) | – | 0.178 (ns) | 0.396 (ns) | 0.463 (ns) |
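The p-values in the last row come from a paired t-test, which compares the two models fold by fold rather than comparing their overall means. The per-fold scores are not given in the source, so the sketch below uses clearly hypothetical fold accuracies and computes only the t statistic and degrees of freedom; the function name is likewise an assumption.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# HYPOTHETICAL per-fold accuracies (%), not the paper's measurements.
model_a_folds = [80.0, 81.0, 79.0, 82.0, 80.0]
model_b_folds = [79.0, 81.5, 78.0, 80.0, 79.5]

t_stat, dof = paired_t_statistic(model_a_folds, model_b_folds)

# Two-sided critical value of Student's t at alpha = 0.05 with 4 degrees
# of freedom.
T_CRITICAL_DF4 = 2.776
significant = abs(t_stat) > T_CRITICAL_DF4
print(f"t = {t_stat:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

With only five folds the test has few degrees of freedom, so moderate mean differences can easily fail to reach significance, which is consistent with the "ns" entries above.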
| Metric | Sequential (3×ViT) | UM-TOTA (Unified) | Improvement |
|---|---|---|---|
| Parameters | 259.8 M | 86.6 M | 66.7% reduction |
| Latency (ms/img) | 26.01 ms | ms | 65.5% reduction |
| GPU Memory (MB) | 20.8 MB | 7.4 MB | 64.2% reduction |
| Throughput (img/s) | 48.9 | 147.9 | 202.7% increase |
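The "Improvement" column can be recomputed from the other two columns as a relative change against the sequential baseline. The sketch below does this for the rows whose values are fully present in the table; the helper names are assumptions made for illustration. Because the inputs are the rounded table entries, the recomputed memory and throughput figures differ from the reported ones by a few tenths of a percentage point.

```python
def percent_reduction(baseline, unified):
    """Relative decrease of `unified` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - unified) / baseline


def percent_increase(baseline, unified):
    """Relative increase of `unified` with respect to `baseline`, in percent."""
    return 100.0 * (unified - baseline) / baseline


# Values copied from the deployment table (sequential 3xViT vs. UM-TOTA).
param_reduction = percent_reduction(259.8, 86.6)   # parameters, millions
memory_reduction = percent_reduction(20.8, 7.4)    # GPU memory, MB
throughput_gain = percent_increase(48.9, 147.9)    # images per second

print(f"parameters: {param_reduction:.1f}% reduction")
print(f"GPU memory: {memory_reduction:.1f}% reduction")
print(f"throughput: {throughput_gain:.1f}% increase")
```

The parameter figure is the simplest to interpret: replacing three separate backbones of equal size with one shared backbone removes two thirds of the weights.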
| Concept (IOTA/O-RADS) | Benign | Malignant | Importance | p-value |
|---|---|---|---|---|
| Vascularization | 0.167 | 0.399 | | |
| Papillary projections | 0.072 | 0.204 | | |
| Solid components | 0.313 | 0.644 | | |
| Ascites presence | 0.068 | 0.199 | | |
| Boundary clarity | 0.669 | 0.626 | | |
| Cystic components | 0.606 | 0.356 | | |
| Homogeneous texture | 0.687 | 0.493 | | |
| Acoustic shadowing | 0.259 | 0.405 | | |
| Posterior enhancement | 0.475 | 0.257 | | |
| Shape regularity | 0.405 | 0.641 | | |
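One simple way to read the Benign and Malignant columns is to rank the concepts by how far apart the two groups are. The sketch below does this with the values from the table, which are assumed here to be mean concept activations per group. The gap is only a rough proxy chosen for illustration; it is not the paper's Importance measure.

```python
# (benign, malignant) values copied from the concept table above.
CONCEPT_ACTIVATIONS = {
    "Vascularization": (0.167, 0.399),
    "Papillary projections": (0.072, 0.204),
    "Solid components": (0.313, 0.644),
    "Ascites presence": (0.068, 0.199),
    "Boundary clarity": (0.669, 0.626),
    "Cystic components": (0.606, 0.356),
    "Homogeneous texture": (0.687, 0.493),
    "Acoustic shadowing": (0.259, 0.405),
    "Posterior enhancement": (0.475, 0.257),
    "Shape regularity": (0.405, 0.641),
}


def rank_by_gap(activations):
    """Sort concepts by |malignant - benign|, largest separation first."""
    gaps = {
        name: round(malignant - benign, 3)
        for name, (benign, malignant) in activations.items()
    }
    return sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)


ranked = rank_by_gap(CONCEPT_ACTIVATIONS)
for name, gap in ranked:
    direction = "higher in malignant" if gap > 0 else "higher in benign"
    print(f"{name}: {gap:+.3f} ({direction})")
```

A positive gap means the concept is more active in malignant cases, a negative gap that it is more active in benign cases.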
| Task | Model Configuration | Metric Value | Compared Model | t-stat | p-value |
|---|---|---|---|---|---|
| 8-Class Classification | UM-TOTA (Full Model) | 80.26% ± 1.10% | – | – | – |
| 8-Class Classification | vs. Specialized (class_mal_only) | – | 80.67% ± 2.25% | | |
| 8-Class Classification | vs. Ablated (no_concepts) | – | 82.44% ± 2.17% | | |
| 3-Class Malignancy | UM-TOTA (Full Model) | 90.88% ± 1.14% | – | – | – |
| 3-Class Malignancy | vs. Specialized (class_mal_only) | – | 91.22% ± 1.37% | | |
| 3-Class Malignancy | vs. Ablated (no_concepts) | – | 92.38% ± 0.95% | | |
| Segmentation | UM-TOTA (Full Model) | 77.29% ± 1.29% | – | – | – |
| Segmentation | vs. Specialized (segmentation_only) | – | 80.88% ± 1.04% | | *** |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).