Submitted:
23 December 2024
Posted:
25 December 2024
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Existing AI Applications
2.2. Review of Existing Datasets
2.3. Limitations of Existing Datasets
3. Materials and Methods
3.1. Endometriosis Dataset
3.1.1. Overview
3.1.2. Data Collection and Composition
3.1.3. Annotation Details
3.1.4. Video Characteristics
3.1.5. Data Organization and Accessibility
3.1.6. Considerations and Challenges
3.2. Methods
3.3. Data Preprocessing
3.4. Object Detection Models
- FasterRCNN: We used a FasterRCNN model with a ResNet50 backbone, initialised with pretrained ImageNet1K_V2 weights from PyTorch’s torchvision model zoo. The model has 60.27M parameters and was trained with heavy augmentations to handle the high variability in the dataset.
- MaskRCNN: This model [14], also built on a ResNet50 backbone initialised with pretrained ImageNet1K_V2 weights, was trained on segmentation masks derived from bounding boxes. Due to the lack of precise segmentation maps, we filled the bounding boxes to create rectangular masks, which impacted the model’s performance. Copy-paste augmentation was not applied to MaskRCNN. We employed MaskRCNN in an unconventional manner, as a generator of bounding boxes, for two reasons tied to the dataset and its annotated tissue classes. First, one tissue class was provided with a few hundred precise segmentation masks, from which we generated bounding boxes so that annotations and model inputs shared a consistent format. Second, two classes originally annotated with bounding boxes included a significant amount of background: these tissues often resemble string-like structures lying at oblique angles, so their bounding boxes capture substantial portions of the surrounding background and annotation precision suffers. We hypothesized that generating bounding boxes from MaskRCNN segmentation masks would produce annotations that represent the target structures more faithfully with less background, and would thereby improve model training and performance.
- YOLOv9: We used YOLOv9e, the largest YOLOv9 model, known for its extensive architecture and high performance on object detection tasks. The model was pretrained on the COCO dataset [17] and supports 640 × 640 resolution inputs. It was fine-tuned on our dataset with the default Ultralytics augmentations.
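The two conversions described for MaskRCNN, filling a bounding box to obtain a rectangular mask and deriving a tight bounding box from a segmentation mask, can be sketched in plain Python. This is a minimal illustration and not the authors' code: the names `box_to_mask`, `mask_to_box` and `foreground_fraction`, the `(x_min, y_min, x_max, y_max)` pixel convention with exclusive upper bounds, and the nested-list mask representation are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max) in pixels, upper bounds exclusive.
Box = Tuple[int, int, int, int]
# Binary mask stored as rows of 0/1 values.
Mask = List[List[int]]


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Fill a bounding box to create a rectangular binary mask."""
    x_min, y_min, x_max, y_max = box
    return [
        [1 if (x_min <= x < x_max and y_min <= y < y_max) else 0 for x in range(width)]
        for y in range(height)
    ]


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tightest box enclosing all foreground pixels, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def foreground_fraction(mask: Mask, box: Box) -> float:
    """Fraction of the box area covered by foreground; low values mean mostly background."""
    x_min, y_min, x_max, y_max = box
    area = (x_max - x_min) * (y_max - y_min)
    filled = sum(mask[y][x] for y in range(y_min, y_max) for x in range(x_min, x_max))
    return filled / area


# A thin diagonal structure, standing in for string-like tissue at an oblique angle.
diagonal = [[1 if x == y else 0 for x in range(6)] for y in range(6)]
diag_box = mask_to_box(diagonal)

# A filled box is its own tight bounding box.
rect_mask = box_to_mask((1, 2, 4, 5), height=6, width=6)
```

In the diagonal example only 6 of the 36 pixels inside the enclosing box are foreground, which illustrates why boxes around thin oblique structures contain mostly background, whereas a filled rectangular mask covers its box completely.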
3.5. Training Strategy
3.5.1. Stratified vs. Non-Stratified Splits
3.5.2. Augmentation Techniques
3.6. Experimental Setup
3.6.1. Evaluation Metrics
- Precision: measures the proportion of correctly predicted positive instances among all predicted positives.
- Recall: measures the proportion of actual positives (true lesions) that the model detects.
- mAP50: mean Average Precision at an Intersection over Union (IoU) threshold of 0.50; a prediction counts as correct only if it overlaps the ground-truth box at least this much.
- mAP50-95: assesses mean Average Precision over a range of IoU thresholds (0.50 to 0.95).
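- Fitness: a single summary score. The values reported in the results table are consistent with a weighted combination of the two mAP metrics (0.1 × mAP50 + 0.9 × mAP50-95, as used in Ultralytics) rather than with an F-1 score computed from precision and recall.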
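As a minimal sketch of how these quantities relate, the snippet below computes the IoU of two boxes and derives precision, recall and the F-1 score from true-positive, false-positive and false-negative counts. The function names, the `(x_min, y_min, x_max, y_max)` box convention and the example numbers are illustrative assumptions; they are not taken from the study's evaluation code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-1 from detection counts at a fixed IoU threshold."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Two 10 x 10 boxes shifted by half their width overlap with IoU = 50 / 150.
example_iou = iou((0, 0, 10, 10), (5, 0, 15, 10))

# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed lesions.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

With these made-up inputs the IoU is one third, so the prediction would not count as a match at the 0.50 threshold, and the counts give a precision of 0.8 with a recall of two thirds.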
3.7. Challenges and Limitations
4. Results
4.1. Main Results






YOLOv9 Performance
- Stratified split (Figure 3): YOLOv9 showed lower precision and recall compared to FasterRCNN, with substantial variation in the stratified scenario. The F-1 curves demonstrate challenges in maintaining a balance between precision and recall, particularly on the validation and test sets.
- Non-Stratified split (Figure 4): YOLOv9 improved its performance in the non-stratified case, as reflected in higher precision, recall, and mAP scores. The non-stratified F-1 curves also show better consistency across training and validation sets.
FasterRCNN Performance
- Stratified split (Figure 5): FasterRCNN demonstrated superior performance across all metrics in the stratified scenario, with precision exceeding 0.97 and relatively stable recall. The mAP50 and mAP50-95 scores indicate that the model was able to detect and localize objects with high accuracy.
- Non-Stratified split (Figure 6): While the performance of FasterRCNN remained high in the non-stratified scenario, slight differences were observed in precision and recall across the training, validation, and test datasets. F-1 curves for FasterRCNN consistently showed better generalization and balance across all splits compared to YOLOv9.
Comparison of Test Performance Metrics
5. Discussion of Challenges and Future Directions
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Hummelshoj, L.; Prentice, A.; Groothuis, P. Update on Endometriosis: 9th World Congress on Endometriosis, 14–17 September 2005, Maastricht, the Netherlands. Women’s Health 2006, 2, 53–56.
2. Smolarz, B.; Szyłło, K.; Romanowicz, H. Endometriosis: Epidemiology, Classification, Pathogenesis, Treatment and Genetics (Review of Literature). International Journal of Molecular Sciences 2021, 22, 10554.
3. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015, 39.
4. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018, 42, 386–397.
5. Wang, C.Y.; Yeh, I.H.; Liao, H.Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information; Springer: Cham, Switzerland, 2024; pp. 1–24.
6. Foti, P.; Palmucci, S.; Vizzini, I.; Libertini, N.; Coronella, M.; Spadola, S.; Caltabiano, R.; Iraci Sareri, M.; Basile, A.; Milone, P.; et al. Endometriosis: clinical features, MR imaging findings and pathologic correlation. Insights into Imaging 2018, 9.
7. Sivajohan, B.; Elgendi, M.; Menon, C.; Allaire, C.; Yong, P.; Bedaiwy, M. Clinical use of artificial intelligence in endometriosis: a scoping review. npj Digital Medicine 2022, 5, 109.
8. Nifora, C.; Chasapi, M.K.; Chasapi, L.; Koutsojannis, C. Deep Learning Improves Accuracy of Laparoscopic Imaging Classification for Endometriosis Diagnosis. Journal of Clinical and Medical Surgery 2024, 4, 1137–1145.
9. Leibetseder, A.; Schoeffmann, K.; Keckstein, J.; Keckstein, S. Endometriosis detection and localization in laparoscopic gynecology. Multimedia Tools and Applications 2022, 81.
10. Hong, W.; Kao, C.; Kuo, Y.; Wang, J.; Chang, W.; Shih, C. CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80. CoRR 2020, arXiv:2012.12453.
11. Carstens, M.; Rinner, F.; Bodenstedt, S.; Jenke, A.; Weitz, J.; Distler, M.; Speidel, S.; Kolbinger, F. The Dresden Surgical Anatomy Dataset for Abdominal Organ Segmentation in Surgical Data Science. Scientific Data 2023, 10.
12. Leibetseder, A.; Kletz, S.; Schoeffmann, K.; Keckstein, S.; Keckstein, J. GLENDA: Gynecologic Laparoscopy Endometriosis Dataset; 2019; pp. 439–450.
13. Yoon, J.; Hong, S.; Hong, S.; Lee, J.; Shin, S.; Park, B.; Sung, N.; Yu, H.; Kim, S.; Park, S.; et al. Surgical Scene Segmentation Using Semantic Image Synthesis with a Virtual Surgery Environment; 2022; pp. 551–561.
14. Fujita, H.; Itagaki, M.; Hooi, Y.K.; Ichikawa, K.; Kawano, K.; Yamamoto, R. Detector Algorithms of Bounding Box and Segmentation Mask of a Mask R-CNN Model. arXiv 2020, arXiv:2010.13783.
15. Figueiredo, R.B.D.; Mendes, H.A. Analyzing Information Leakage on Video Object Detection Datasets by Splitting Images Into Clusters With High Spatiotemporal Correlation. IEEE Access 2024, 12, 47646–47655.
16. Apicella, A.; Isgrò, F.; Prevete, R. Don’t Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning. arXiv 2024, arXiv:2401.13796.
17. Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. CoRR 2014, arXiv:1405.0312.
18. Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020; pp. 2917–2927.
19. Zhang, H.; Cissé, M.; Dauphin, Y.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. arXiv 2017, arXiv:1710.09412.
20. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. CoRR 2014, arXiv:1412.6980.
21. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
| Resolution | Number of Videos |
|---|---|
| 1920 × 1080 | 193 |
| 1280 × 720 | 5 |
| 720 × 576 | 1 |
| Class Name | Class ID | Number of Annotated Objects |
|---|---|---|
| Adhesions Dense | 0 | 1,424 |
| Adhesions Filmy | 1 | 537 |
| Deep Endometriosis | 2 | 700 |
| Ovarian Chocolate Fluid | 3 | 223 |
| Ovarian Endometrioma | 4 | 302 |
| Ovarian Endometrioma[B] | 4 | 382 |
| Superficial Black | 5 | 835 |
| Superficial Red | 6 | 642 |
| Superficial Subtle | 7 | 509 |
| Superficial White | 8 | 463 |
| Model / Split | Precision | Recall | mAP50 | mAP50-95 | Fitness |
|---|---|---|---|---|---|
| FasterRCNN / Stratified | 0.9811±0.0084 | 0.7083±0.0807 | 0.8185±0.0562 | 0.7345±0.0554 | 0.7429±0.0555 |
| FasterRCNN / Non-Stratified | 0.9787±0.0107 | 0.7076±0.0957 | 0.8162±0.0647 | 0.7309±0.0612 | 0.7395±0.0615 |
| YOLOv9 / Stratified | 0.5504±0.1864 | 0.3580±0.2701 | 0.4599±0.2503 | 0.2767±0.1877 | 0.2951±0.1939 |
| YOLOv9 / Non-Stratified | 0.6458±0.1662 | 0.4742±0.2193 | 0.5771±0.2113 | 0.3622±0.1656 | 0.3837±0.1701 |
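The Fitness column in the table above can be cross-checked against the two mAP columns. The sketch below assumes an Ultralytics-style weighting, fitness = 0.1 × mAP50 + 0.9 × mAP50-95; that weighting is an assumption, and `weighted_fitness` is an illustrative name. The means are copied from the table, and the standard deviations are ignored.

```python
# (model / split, mean mAP50, mean mAP50-95, reported mean fitness), copied from the table.
ROWS = [
    ("FasterRCNN / Stratified", 0.8185, 0.7345, 0.7429),
    ("FasterRCNN / Non-Stratified", 0.8162, 0.7309, 0.7395),
    ("YOLOv9 / Stratified", 0.4599, 0.2767, 0.2951),
    ("YOLOv9 / Non-Stratified", 0.5771, 0.3622, 0.3837),
]


def weighted_fitness(map50: float, map50_95: float,
                     w50: float = 0.1, w50_95: float = 0.9) -> float:
    """Weighted mAP combination (assumed weights: 0.1 for mAP50, 0.9 for mAP50-95)."""
    return w50 * map50 + w50_95 * map50_95


# Absolute gap between the recomputed and the reported fitness for each row.
gaps = {
    name: abs(weighted_fitness(map50, map50_95) - reported)
    for name, map50, map50_95, reported in ROWS
}
max_gap = max(gaps.values())

for name, gap in gaps.items():
    print(f"{name}: |recomputed - reported| = {gap:.5f}")
```

Under this assumption every recomputed value lies within 0.0002 of the reported mean, which is the size of discrepancy expected from rounding the inputs to four decimals.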
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).