Preprint
Article

This version is not peer-reviewed.

PolyDeep Advance 1: Clinical Validation of a Computer-Aided Detection System for Colorectal Polyp Detection with a Second Observer Design

A peer-reviewed article of this preprint also exists.

Submitted: 23 December 2024

Posted: 25 December 2024

Abstract

Background: PolyDeep is a computer-aided detection and characterization system that has demonstrated a high diagnostic yield for in vitro detection of colorectal polyps. Our objective is to compare the diagnostic performance of expert endoscopists and PolyDeep for colorectal polyp detection. Methods: PolyDeep Advance 1 (NCT05514301) is a single-centre diagnostic test study with a second observer design. Endoscopists performed the colonoscopy blinded to PolyDeep. The main endpoint was the sensitivity for colorectal polyp (adenoma, serrated or hyperplastic lesion) detection. The secondary endpoints were the diagnostic performance for diminutive lesions (≤5 mm), neoplasia (adenoma, serrated lesion) and adenoma detection. Results: We included 205 patients (55.1% male, 63.0 ± 6.2 years of age) referred for colonoscopy (positive faecal immunochemical occult blood test = 60.5%, surveillance colonoscopy = 39.5%). We excluded 8 patients due to incomplete colonoscopy. Endoscopists detected 384 lesions, of which 39 were not detected by PolyDeep. In contrast, PolyDeep predicted 410 possible additional lesions, of which 26 were confirmed by endoscopists as lesions, resulting in a potential 6.8% detection increase with respect to the 384 lesions detected by the endoscopists. In total, 410 lesions were detected: 20 were not retrieved, 5 were colorectal adenocarcinomas, 343 were colorectal polyps (231 adenomas, 39 serrated and 73 hyperplastic polyps) and 42 were normal mucosa; 289 were ≤5 mm. We did not find statistically significant differences between endoscopists and PolyDeep for colorectal polyp detection (Sensitivity = 94.2% vs. 91.5%, p = 0.2; Specificity = 9.5% vs. 14.3%, p = 0.7), diminutive lesions (Sensitivity = 92.3% vs. 89.5%, p = 0.4; Specificity = 9.8% vs. 14.6%, p = 0.7), neoplasia (Sensitivity = 95.2% vs. 92.9%, p = 0.3; Specificity = 9.6% vs. 13.9%, p = 0.4) and adenoma detection (Sensitivity = 94.4% vs. 92.6%, p = 0.5; Specificity = 7.2% vs. 11.8%, p = 0.2).
Conclusion: Expert endoscopists and PolyDeep have similar diagnostic performance for colorectal polyp detection.

1. Introduction

Colorectal cancer (CRC) is the third most frequent cancer and the second leading cause of cancer-related mortality worldwide [1,2]. The development of precursor lesions, including adenomas and serrated lesions, plays a critical role in its progression [3]. In recent years, public health systems have implemented CRC screening programmes for average-risk individuals aged 50–75 years to reduce both its incidence and mortality [1,2,3,4]. However, the quality of colonoscopy - affected by inadequate bowel preparation, insufficient mucosal exposure, or endoscopist fatigue - can lead to missed precursor lesions and the subsequent development of post-colonoscopy CRC [5].
Artificial intelligence (AI) is a research field focused on developing machines capable of performing tasks that typically require human intelligence. One of its most significant subfields is Machine Learning (ML), where AI models learn to execute tasks for which they were not explicitly programmed [2]. Within ML, Deep Learning (DL) represents a specialised subset of techniques that relies on large artificial neural networks, usually with a high number of layers, enabling the automatic learning of complex features [6]. The success of DL techniques in recent years, surpassing the performance of classical ML methods, has triggered a renewed surge of interest in AI. In the medical field, this is reflected in the growing number of AI-based medical devices authorised by the FDA [7]. Notably, more than 770 devices have been approved in the past decade, a stark increase compared to the 27 approvals in previous years [7]. Among the most impactful AI-based medical devices, particularly in colonoscopy, are Computer-Aided Diagnosis (CAD) systems. These systems support medical professionals during procedures to enhance decision-making by extracting relevant features from colorectal lesion images, enabling both detection (CADe) and characterisation (CADx) [2,4,8,9]. The worldwide integration of CAD systems into medical image analysis represents a revolution in routine clinical practice. In colonoscopy, CADe has been shown to improve procedure quality by enhancing key performance indicators such as the adenoma detection rate (ADR), polyp detection rate (PDR), and adenoma miss rate (AMR) [10]. A common study design to evaluate the impact of CADe on polyp detection is based on a second observer, with the endoscopist blinded to CADe outputs [11,12,13].
PolyDeep is a DL-based CADe/x system designed to detect and characterise colorectal lesions [14,15,16,17,18]. The system integrates a YOLOv3 detection network, coupled with an object tracking algorithm, and a ResNet50 classification network [14,17,18]. PolyDeep was developed using transfer learning techniques, fine-tuning models (ResNet50 pretrained on ImageNet and YOLOv3 on PASCAL VOC) with still images of colorectal lesions extracted from colonoscopy videos [14,17,18]. As part of the development of PolyDeep, we created PIBAdb (Polyp Image BAnk database), a comprehensive database of images and videos of colorectal lesions manually annotated by expert endoscopists, which is currently available in the biobank of the Instituto de Investigación Sanitaria Galicia-Sur [19]. Prior to clinical validation, we enhanced the detection and characterisation models by incorporating new datasets from the project and adjusting several configuration parameters to optimise the CADe/x system for a real colonoscopy procedure [16]. Specifically, the detection model was improved through fine-tuning with non-polyp images (included in PIBAdb) to reduce false positives, while the characterisation model was retrained with new polyp images obtained during the clinical validation phase (these new images are not yet included in PIBAdb). The aim of the study is to evaluate whether the diagnostic performance of PolyDeep is better than that of expert endoscopists in detecting colorectal lesions in a population-based colorectal cancer screening program.

2. Materials and Methods

2.1. PolyDeep Advance 1: Study Design

PolyDeep Advance 1 (NCT05514301) is a single-centre, prospective diagnostic test study with a second observer design. The research protocol was approved by the local institutional review board (2022/067).

2.2. Inclusion and Exclusion Criteria

We enrolled patients referred for colonoscopy within the Galician CRC screening program in Ourense, Spain, either after a positive fecal immunochemical test (FIT) or for surveillance after the resection of advanced colorectal lesions. All study participants provided written informed consent. We excluded individuals with a personal history of CRC, colonic resection, or any syndrome predisposing to CRC, such as hereditary syndromes or serrated polyposis syndrome. For the analysis, we excluded participants in whom cecal intubation was not achieved or bowel cleansing was insufficient (defined as Boston Bowel Preparation Scale < 2 in any segment or < 6 overall).

2.3. Clinical Setting

The endoscopists performed the colonoscopy in the conventional clinical setting, blinded to the CADe/x system. A second observer attended the colonoscopy, viewing the PolyDeep images on a separate screen. When PolyDeep identified a potential lesion, it triggered a visual alert in the form of a red border around the screen, which was activated when the CADe consistently tracked one or more potential polyps (also indicated by bounding boxes) across several video frames. If PolyDeep identified a possible lesion that was not detected by the endoscopists, the second observer alerted the endoscopists for confirmation. All detections rejected by the endoscopists (i.e., bubbles, stool, normal mucosa and undefined artefacts) were classified as false positives. Thus, the endoscopist was responsible for confirming the CADe detections. The second observer recorded the detections and characterisations made by both the endoscopists and PolyDeep on a log sheet.
Additionally, the endoscopists classified lesions according to the Narrow Band Imaging International Colorectal Endoscopic (NICE) classification [20], categorising them as adenomas (NICE II) or non-adenomas (NICE I). PolyDeep is trained to detect and characterise colorectal lesions as neoplasms, including adenomas and serrated lesions, or non-neoplasms, such as hyperplastic lesions. During colonoscopy, the CADe/x system identified lesions by bounding boxes around them and characterised them as neoplasia if the probability assigned to the bounding box was greater than 50% [14,17,18].
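The alert rule described in this section (a red border once a potential polyp has been tracked consistently across several frames) can be illustrated with a minimal sketch. This is not PolyDeep's implementation: the class name `AlertState`, the constant `MIN_TRACKED_FRAMES` and its value of 5 are hypothetical, since the text only says "several video frames".

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical value: the text only says "several video frames".
MIN_TRACKED_FRAMES = 5


@dataclass
class AlertState:
    """Counts for how many consecutive frames each tracked candidate has been seen."""

    streaks: Dict[int, int] = field(default_factory=dict)

    def update(self, track_ids_in_frame: Set[int]) -> bool:
        """Register the track IDs visible in the current frame.

        Returns True when at least one candidate has persisted long
        enough to raise the on-screen alert.
        """
        # Extend the streak of candidates that are still visible and
        # forget the ones that disappeared in this frame.
        self.streaks = {
            track_id: self.streaks.get(track_id, 0) + 1
            for track_id in track_ids_in_frame
        }
        return any(count >= MIN_TRACKED_FRAMES for count in self.streaks.values())


state = AlertState()
alerts = [state.update({1}) for _ in range(MIN_TRACKED_FRAMES)]
print(alerts)  # the alert is raised only on the last of the consecutive frames
```

Requiring persistence before alerting is one common way to suppress single-frame detections such as bubbles or stool; the threshold trades alert latency against false alarms.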

2.4. Study Endpoints

The gold standard for the study was histology. Lesions without histology or cases of CRC were excluded from the analysis. The primary endpoint of the trial was the detection of colorectal polyps, including adenomas, serrated lesions, and hyperplastic lesions. The secondary endpoints included the detection of adenomas, serrated lesions (i.e., sessile serrated lesions or traditional serrated adenomas), neoplasia (i.e., adenomas or serrated lesions), diminutive (≤5 mm) polyps and advanced colorectal polyps (i.e., adenomas or serrated lesions ≥ 10 mm, adenomas with villous histology or high-grade dysplasia, or serrated lesions with dysplasia).

2.5. Data Collection and Study Population

We designed the electronic case report form (eCRF) using the RedCap platform at the Galicia-Sur Health Research Institute (IIS Galicia-Sur) (https://redcap.tic1-iisgaliciasur.es/, accessed on 26/04/2024). The eCRF captured data on demographics, colonoscopy, and lesion characteristics (including location, size, and histology). Additionally, we included information on the optical diagnosis and the histology predicted by the endoscopists and the CADx.

2.6. Sample Size Calculation

We calculated the sample size based on the assumption of 90% sensitivity for the endoscopists and 95% sensitivity for PolyDeep, with a beta error of 20%, an alpha error of 5%, and an estimated 5% rate of incomplete colonoscopies [21]. Based on these assumptions, we determined a sample size of 205 patients, with a mean of 2.5 lesions detected per procedure, to evaluate a total of 487 lesions in the study.
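The relation between the 205 patients and the 487 lesions can be checked with simple arithmetic. The sketch below is one reading of the figures in this paragraph (patients reduced by the 5% incomplete-colonoscopy rate, multiplied by 2.5 lesions per procedure); it does not reconstruct the underlying power calculation, and the variable names are illustrative.

```python
import math

planned_patients = 205        # sample size reported in the text
incomplete_rate = 0.05        # estimated rate of incomplete colonoscopies
lesions_per_procedure = 2.5   # mean number of lesions per procedure

# Patients expected to contribute a complete colonoscopy.
evaluable_patients = planned_patients * (1 - incomplete_rate)

# Lesions expected from those procedures, rounded up to a whole lesion.
expected_lesions = math.ceil(evaluable_patients * lesions_per_procedure)
print(expected_lesions)  # 487
```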

2.7. Statistical Analysis

We performed a descriptive analysis of the study population. Qualitative variables are presented as absolute frequencies and percentages, while quantitative variables are presented as means and standard deviations. For the primary and secondary endpoints, we calculated the diagnostic performance metrics (i.e., sensitivity, specificity, positive and negative predictive values and odds ratio) for endoscopists and PolyDeep. The comparison between endoscopists and PolyDeep was performed using the McNemar test. In all categories, we conducted sub-analyses by size (≤5 mm or >5 mm), morphology (protruded or non-protruded), and location (right colon or left colon). Additionally, the precision of endoscopists and PolyDeep in the characterisation of adenomatous and neoplastic lesions, respectively, was determined. We used the statistical software R version 4.3.0 (The R Foundation for Statistical Computing, Institute for Statistics and Mathematics, Vienna, Austria) for statistical analysis.
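Because both observers assess the same lesions, the McNemar test uses only the discordant pairs (lesions found by one observer but not the other). The following is a minimal sketch of the exact (binomial) form of the test in Python; the text does not state whether the exact or the chi-square form was used in R, and the function name `mcnemar_exact` and the example counts are hypothetical.

```python
from math import comb


def mcnemar_exact(only_a: int, only_b: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    # Under the null hypothesis the discordant pairs split 50/50, so the
    # smaller count follows Binomial(n, 0.5). Double the lower tail to
    # obtain a two-sided p-value and cap it at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical example: 12 lesions seen only by observer A, 5 only by observer B.
p_value = mcnemar_exact(12, 5)
print(f"p = {p_value:.3f}")
```

Concordant pairs (detected by both or by neither) do not enter the statistic, which is why two observers with high overall agreement can still be compared on a small number of discordant lesions.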

3. Results

3.1. Population Description and Main Endpoints Results

Between February 2023 and April 2023, we recruited 205 patients (55.1% male, 63 ± 6.2 years of age) who participated in the Galician population-based CRC screening program. We included patients who underwent colonoscopy after a positive FIT (60.5%) or as endoscopic surveillance (39.5%). We excluded 8 patients from the study: 7 due to insufficient bowel cleansing or no cecal intubation, and 1 due to a personal history of CRC and previous colonic resection.
Overall, the CADe system and endoscopists performed 794 detections, including 410 lesions and 384 false positives identified by PolyDeep. The endoscopists detected 384 lesions, of which PolyDeep missed 39 (9.51% of all 410 lesions). PolyDeep detected 371 lesions, 26 of which (6.34% of all 410 lesions) the endoscopists had initially missed. Using the endoscopists’ detections as the reference for lesion identification (without considering histological analysis), the sensitivity for endoscopists was 93.7% ((384/410) × 100), while the sensitivity for PolyDeep was 90.5% ((371/410) × 100).
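The percentages in this paragraph follow directly from the lesion counts. The short check below uses only figures reported in the text and abstract (410 lesions, 384 found by endoscopists, 39 missed by PolyDeep, 26 found only by PolyDeep); the variable and function names are illustrative.

```python
total_lesions = 410         # lesions confirmed by the endoscopists
endoscopist_detected = 384  # lesions found by the endoscopists on their own
polydeep_missed = 39        # endoscopist lesions not flagged by PolyDeep
polydeep_only = 26          # PolyDeep detections later confirmed as lesions

polydeep_detected = total_lesions - polydeep_missed  # 371


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


print(pct(endoscopist_detected, total_lesions))  # sensitivity of endoscopists
print(pct(polydeep_detected, total_lesions))     # sensitivity of PolyDeep
print(pct(polydeep_missed, total_lesions))       # share missed by PolyDeep
print(pct(polydeep_only, total_lesions))         # share initially missed by endoscopists
print(pct(polydeep_only, endoscopist_detected))  # relative detection increase
```

Note that the two "missed" percentages use all 410 lesions as the denominator, whereas the 6.8% detection increase is relative to the 384 lesions found by the endoscopists.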
Figure 1. Flowchart of the study.
Table 1 presents the characteristics of the lesions according to the observer (both, endoscopists or PolyDeep). Overall, the histology of the identified polyps was adenomas (56.34%), serrated lesions (9.51%), hyperplastic lesions (17.80%), normal mucosa (10.24%), non-retrieved polyps (4.88%) and CRC (1.22%). Most of the lesions were small, located in the right colon, and exhibited a protruded morphology. Polyps detected during the study had a mean size of 5.1 ± 4.9 mm.
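As a consistency check, the histology percentages can be recomputed from the lesion counts given in the abstract. The dictionary keys below are illustrative labels; the counts themselves come from the text.

```python
histology_counts = {
    "adenoma": 231,
    "serrated lesion": 39,
    "hyperplastic lesion": 73,
    "normal mucosa": 42,
    "not retrieved": 20,
    "colorectal cancer": 5,
}

total = sum(histology_counts.values())
shares = {
    name: round(100 * count / total, 2)
    for name, count in histology_counts.items()
}

# Lesions with histology that are not CRC form the diagnostic performance set.
analysed = total - histology_counts["not retrieved"] - histology_counts["colorectal cancer"]

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
print(f"total = {total}, analysed = {analysed}")
```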

3.2. Diagnostic Performance for Polyp Detection

We excluded 25 lesions from the diagnostic performance analysis: 5 cases of CRC and 20 lesions without histology. Table 2 shows the diagnostic performance of the endoscopists and PolyDeep for polyp detection. No statistically significant differences were found in sensitivity (94.2%, 91.5%, p = 0.2) or specificity (9.5%, 14.3%, p = 0.7) between endoscopists and PolyDeep. The 91.5% sensitivity achieved by the CADe system in the clinical evaluation surpasses the 89.91% (87.20%–91.94%) sensitivity obtained by the initial detection model in a video-based evaluation [17].
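The metrics named in the statistical analysis can be computed from a 2×2 table per observer. In the sketch below, the cell counts are not reported in the text: they are inferred here, as an assumption, from the reported percentages together with the 343 histology-confirmed polyps and 42 normal-mucosa samples. The class name `Confusion` is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # histology-confirmed polyps the observer detected
    fn: int  # histology-confirmed polyps the observer missed
    tn: int  # normal-mucosa samples the observer did not flag
    fp: int  # normal-mucosa samples the observer flagged

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def diagnostic_odds_ratio(self) -> float:
        return (self.tp * self.tn) / (self.fp * self.fn)


# Inferred counts (assumption): 343 polyps and 42 normal-mucosa samples.
endoscopists = Confusion(tp=323, fn=20, tn=4, fp=38)
polydeep = Confusion(tp=314, fn=29, tn=6, fp=36)

for name, table in (("Endoscopists", endoscopists), ("PolyDeep", polydeep)):
    print(f"{name}: sensitivity {100 * table.sensitivity:.1f}%, "
          f"specificity {100 * table.specificity:.1f}%")
```

The low specificities reflect the study design: the only histology-confirmed negatives are samples that at least one observer flagged as a lesion, so true negatives are scarce by construction.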
In the sub-analysis by size, we found no statistically significant differences between the endoscopists and PolyDeep. Regarding lesion morphology, although diagnostic performance did not differ in protruded lesions, endoscopists demonstrated higher sensitivity than PolyDeep in the detection of non-protruded lesions (94.5% vs. 82.4%, p < 0.05), with no differences in specificity (11.8% vs. 17.6%, p = 1.0).
In the secondary analysis, we compared the diagnostic performance of endoscopists and PolyDeep according to the lesion histology. As shown in Table 3, no differences in sensitivity or specificity were found in any of the categories evaluated. Data on the diagnostic odds ratio, as well as the positive and negative predictive values, are also presented in Table 3.

3.3. Diagnostic Performance for Polyp Characterization

We calculated the diagnostic performance of PolyDeep and endoscopists for the characterization of neoplastic/non-neoplastic and adenomatous/non-adenomatous lesions when histology was available (n = 385). The endoscopists provided an optical diagnosis for all 385 lesions (100%), classifying 149 as NICE I (38.2%), 236 as NICE II (60.5%), and 5 as NICE III (1.3%). In comparison, PolyDeep achieved an optical diagnosis for 329 lesions (85.5%), classifying 259 (78.7%) as neoplastic (with a probability >50%) and 70 (21.3%) as non-neoplastic (with a probability ≤50%).
For adenoma histology prediction (NICE II), the sensitivity and specificity of endoscopists were 76.6% (95% CI 70.5–81.8%) and 61.7% (95% CI 53.5–69.3%), respectively. In contrast, for neoplastic histology prediction, PolyDeep exhibited a sensitivity of 82.6% (95% CI 77.1–87.1%) and a specificity of 32.2% (95% CI 22.8–43.2%).
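Confidence intervals for a proportion such as sensitivity can be obtained in several ways, and the text does not state which method was used. The sketch below shows the Wilson score interval as one common option, so its bounds may differ slightly from the reported ones. The count of 177 correctly classified adenomas out of 231 is inferred from the reported 76.6% and is an assumption.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denominator = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return centre - half_width, centre + half_width


# Inferred count (assumption): 177 of 231 adenomas classified as NICE II.
low, high = wilson_interval(177, 231)
print(f"sensitivity = {100 * 177 / 231:.1f}% "
      f"(95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```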

4. Discussion and Conclusions

PolyDeep is a CADe/x system that has demonstrated its ability to detect and characterize polyps in an in vitro study [16]. In this prospective diagnostic test study, PolyDeep increased the number of lesions detected during colonoscopy. When we compared PolyDeep with blinded expert endoscopists, with histology as the gold standard, we found no significant differences in the detection of polyps, adenomas, neoplasia, or advanced colorectal lesions. These findings suggest the potential of PolyDeep as a clinical support tool for colonoscopy procedures.
For the clinical validation of PolyDeep, we adopted a widely validated study design [11,12,13,22,23], which enabled us to evaluate the impact on endoscopic diagnostic performance. Our results are comparable to other studies that used a second observer design [12,13,23]. The use of CADe systems in colonoscopy improves quality indicators (such as ADR, PDR, or AMR) by increasing lesion detection. In our study, the simultaneous detection of lesions by endoscopists and PolyDeep was lower than in a pilot study, where endoscopists and the CAD-ARTIPOD detected more lesions (84.1% vs. 94.7%) [13]. However, PolyDeep increased the percentage of lesions detected by 6.8%, which is higher than the 4% increase reported in another study with a similar design [12]. This increase in polyp detection can directly improve the quality of colonoscopy. Thus, using second observers (e.g., trainees or experienced nurses) or CADe systems (acting as standardized second observers) could potentially lead to improvements in ADR and PDR [11,22,24,25]. Although some meta-analyses highlight no statistically significant differences between the use of CADe systems and second observers in terms of ADR or adenomas per colonoscopy (APC) [22], evaluating the effect on quality indicators was beyond the scope of our study. Two ongoing randomized controlled trials (PolyDeep Advance 2, NCT05512793, and PolyDeep Advance 3, NCT05513261) aim to evaluate AMR and ADR.
Our study was conducted within the context of CRC screening colonoscopies. This setting may limit some of our findings, as screening endoscopists are highly experienced in detecting small lesions [26]. Despite the similarities with other studies, the sensitivity of endoscopists and PolyDeep in our study (94.2% and 91.5%) was lower than that observed in a pilot study, where expert endoscopists and the CAD-ARTIPOD system achieved sensitivities of 98.2% and 96.5%, respectively [13]. The CAD-ARTIPOD study, a multicentric trial using the same CADe system as the pilot study, reported a sensitivity of 94.6% for endoscopists and 96.1% for CAD-ARTIPOD using the endoscopists' identification as the reference standard and not considering the histological analysis [12,13,23]. In our study, the sensitivities of endoscopists (93.7%) and PolyDeep (90.5%) were comparable to those observed in the CAD-ARTIPOD study when histological analysis was not considered [12,13,23]. When we considered histology as the gold standard, the diagnostic performance (sensitivity) of endoscopists and CAD-ARTIPOD (94.9% vs. 96.0%) was superior to that of endoscopists and PolyDeep (94.2% vs. 91.5%) [12,23]. We evaluated the diagnostic performance of endoscopists and PolyDeep separately; however, in real clinical practice, both work in tandem (i.e., endoscopists assisted by CADe/x systems). Several studies compared the diagnostic performance of conventional colonoscopy to CADe-assisted colonoscopy [27,28], highlighting the importance of optimizing collaboration between endoscopists and AI systems to achieve maximum diagnostic performance [29].
We also evaluated the diagnostic performance of endoscopists and PolyDeep for the optical diagnosis of colorectal lesions during colonoscopy. In the clinical validation of the POLAR system, the optical diagnosis of diminutive colorectal lesions was assessed [30]. Although this was not the primary objective of our study, both endoscopists and PolyDeep performed optical diagnoses of diminutive lesions, with the mean polyp size in our study being 5.1 ± 4.9 mm [30]. In the POLAR study, lesions were classified as neoplastic or non-neoplastic, which did not allow a direct comparison of results since our endoscopists used the NICE classification. Nonetheless, the sensitivity achieved by endoscopists and the POLAR system (92.4% and 89.4%) was higher than that obtained in our study by endoscopists and PolyDeep (76.6% and 82.6%), although the results across both studies remain comparable for endoscopists and CADx systems [30]. In another study employing a similar setting, where endoscopists could not see the AI system’s output and a second observer managed the system for optical diagnosis of neoplastic and non-neoplastic lesions, sensitivities of 95.5% for less-experienced endoscopists and 90.8% for experienced endoscopists were reported [31]. Although we did not evaluate less-experienced endoscopists, the sensitivity of the experienced endoscopists in our study (76.6%) was lower than the 90.8% reported for experienced endoscopists and well below the 95.5% observed for non-experts [31]. Meanwhile, the AI system in that study achieved a sensitivity of 89.7% for characterizing neoplastic and non-neoplastic lesions, while PolyDeep demonstrated a sensitivity of 91.5% in our study, which is slightly higher but comparable [31]. These findings suggest that the performance of less-experienced endoscopists improves with the assistance of AI systems in optical diagnoses, likely driven by a sense of competition with the AI system. This evidence highlights the potential value of such systems as training tools for novice endoscopists. However, this competitive advantage did not appear to apply when the same technology was used by experienced endoscopists.
During the development of the study, we found that the optical diagnosis made by PolyDeep was sometimes suboptimal. In particular, the percentage of neoplasia displayed by PolyDeep on-screen fluctuated over a wide range, making it difficult to determine the actual optical diagnosis of the CADx system. For this reason, we explored different alternatives to, first, improve the optical diagnosis and, second, ease the optical diagnosis task for endoscopists in the next randomized controlled trials (i.e., PolyDeep Advance 2 and PolyDeep Advance 3). This improvement involves the inclusion of specific symbols (i.e., a loupe, waves as two brackets, and an hourglass) within the bounding box that shows the percentage of neoplasia for optical diagnosis. When the percentage is displayed and the loupe icon appears, it means that the endoscopist needs to move closer to the polyp to obtain a more stable diagnosis. The wave symbol is displayed when the polyp is in focus but there is excessive movement, preventing the CADx system from providing a stable diagnosis. Finally, the hourglass symbol tells the endoscopists to wait so that the CADx system can process enough frames to provide a stable diagnosis.
Our study has several strengths: first, it was conducted in a real clinical setting, providing a valuable assessment of PolyDeep’s performance in real colonoscopy procedures. Second, it is innovative because the expert endoscopists were blinded to the CADe/x system’s output. Third, the diagnostic performance obtained for polyp detection is consistent with literature reports. Fourth, PolyDeep was built with a YOLOv3 detection network coupled with an object tracking algorithm, which allows linking the independent predictions made by YOLOv3 on each frame to specific polyps. By doing this, we can present the neoplasia probability of each polyp as the average of the probabilities calculated for that polyp during a pre-specified number of frames (50 frames in the final setting).
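The per-polyp averaging described in the fourth point can be sketched as follows. This is an illustration rather than PolyDeep's code: it assumes a sliding window over the most recent 50 frames (the text does not say whether the window slides), and the class name `NeoplasiaAverager` and the integer track IDs are hypothetical. The 50% cut-off is the one stated in the clinical setting section.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

WINDOW_FRAMES = 50         # number of frames averaged, as stated in the text
NEOPLASIA_THRESHOLD = 0.5  # probabilities above 50% are labelled neoplastic


class NeoplasiaAverager:
    """Averages per-frame neoplasia probabilities for each tracked polyp."""

    def __init__(self, window: int = WINDOW_FRAMES) -> None:
        # One bounded history per track; old frames fall out automatically.
        self._history: Dict[int, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def add(self, track_id: int, probability: float) -> None:
        """Store the classifier output for one frame of one tracked polyp."""
        self._history[track_id].append(probability)

    def mean_probability(self, track_id: int) -> Optional[float]:
        """Mean probability over the stored frames, or None if unseen."""
        values = self._history.get(track_id)
        if not values:
            return None
        return sum(values) / len(values)

    def label(self, track_id: int) -> Optional[str]:
        mean = self.mean_probability(track_id)
        if mean is None:
            return None
        return "neoplastic" if mean > NEOPLASIA_THRESHOLD else "non-neoplastic"


averager = NeoplasiaAverager()
for frame_probability in (0.35, 0.62, 0.71, 0.58):
    averager.add(track_id=1, probability=frame_probability)
print(averager.label(1))
```

Averaging over a window smooths the frame-to-frame fluctuation of the displayed probability, at the cost of a delay before the value stabilises.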
On the other hand, our study has several limitations: first, the endoscopists were not fully blinded, as the PolyDeep image was visible on a separate screen behind them in the room. The screen was not in the endoscopists’ direct line of sight, but this could have introduced cognitive bias due to the presence of a second observer and the CADe/x system monitoring the procedure. This could potentially lead to an artificial improvement in the endoscopists’ diagnostic performance. Second, the study did not evaluate the impact of PolyDeep on colonoscopy quality indicators or relevant clinical outcomes, such as post-colonoscopy CRC, which was beyond our scope. Third, despite updates made prior to clinical validation, PolyDeep exhibited a high false-positive rate, which led to minor disruptions in the colonoscopy procedure. For comparison, previous studies with other CADe systems, such as GI Genius CADe v2, reported only 1–2 false alarms per patient, which did not affect withdrawal time [32]. In our study, we reported 1.95 false alarms per patient. Finally, the endoscopists classified lesions based on the NICE classification (i.e., they classified adenomas), whereas PolyDeep classified them based on the probability of neoplasia. Therefore, we could not directly compare the diagnostic performance of PolyDeep and the endoscopists for optical diagnosis.
In summary, our study demonstrated similar results to those found in other studies. Both endoscopists and PolyDeep showed comparable diagnostic performance for the detection of colorectal lesions. For optical diagnosis, both agents showed moderate performance. These findings highlight the potential of PolyDeep as an effective support tool in colonoscopy procedures.

Author Contributions

Conceptualization, Pedro Davila-Piñón, Astrid Irene Díez-Martín, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Rubén Domínguez-Carbajales, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Data curation, Pedro Davila-Piñón, Teresa Pedrido, Astrid Irene Díez-Martín, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Formal analysis, Pedro Davila-Piñón, Teresa Pedrido and Joaquín Cubiella; Funding acquisition, Daniel Glez-Peña, Miguel Reboiro-Jato and Joaquín Cubiella; Investigation, Pedro Davila-Piñón, Teresa Pedrido, Astrid Irene Díez-Martín, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Sara Zarraquiños, Noel Pin, Pablo Vega, Santiago Soto, David Remedios, Rubén Domínguez-Carbajales, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Methodology, Pedro Davila-Piñón, Teresa Pedrido, Astrid Irene Díez-Martín, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Sara Zarraquiños, Noel Pin, Pablo Vega, Santiago Soto, David Remedios, Rubén Domínguez-Carbajales, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Project administration, Daniel Glez-Peña, Miguel Reboiro-Jato and Joaquín Cubiella; Resources, Pedro Davila-Piñón, Astrid Irene Díez-Martín, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Sara Zarraquiños, Noel Pin, Pablo Vega, Santiago Soto, David Remedios, Rubén Domínguez-Carbajales, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Software, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato and Hugo López-Fernández; Supervision, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Validation, Pedro Davila-Piñón, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Sara Zarraquiños, Noel Pin, Pablo Vega, Santiago Soto, David Remedios, Rubén Domínguez-Carbajales, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Visualization, Pedro Davila-Piñón, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Writing – original draft, Pedro Davila-Piñón, Teresa Pedrido, Astrid Irene Díez-Martín, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Sara Zarraquiños, Noel Pin, Pablo Vega, Santiago Soto, David Remedios, Rubén Domínguez-Carbajales, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella; Writing – review & editing, Pedro Davila-Piñón, Teresa Pedrido, Astrid Irene Díez-Martín, Jesús Herrero, Manuel Puga, Laura Rivas, Eloy Sánchez, Sara Zarraquiños, Noel Pin, Pablo Vega, Santiago Soto, David Remedios, Rubén Domínguez-Carbajales, Florentino Fdez-Riverola, Alba Nogueira-Rodríguez, Daniel Glez-Peña, Miguel Reboiro-Jato, Hugo López-Fernández and Joaquín Cubiella.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This publication is part of the DPI2017-87494-R project, funded by MICIU/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, and part of the PDC2021-121644-I00 project, funded by MICIU/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”. This research also received funding from the Instituto de Salud Carlos III, Madrid, Spain [PI21/01771, CD22/00087 and INT22/00009, FI22/00203], and the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) (ED431G 2019/06, ED431C 2022/03-GRC and ED481B-2023-005). These grants are partially financed by “ERDF A way of making Europe”. The research also obtained the Grant of Oncology-Tamarite 2022 from the Spanish Association of Gastroenterology.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Pontevedra-Ourense-Vigo (code 2022/067).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Lopes, S.R.; Martins, C.; Santos, I.C.; Teixeira, M.; Gamito. ; Alves, A.L. Colorectal cancer screening: A review of current knowledge and progress in research. World J. Gastrointest. Oncol. 2024, 16, 1119–1133. [Google Scholar] [CrossRef]
  2. Kudo, S.E.; Mori, Y.; Abdel-Aal, U.M.; Misawa, M.; Itoh, H.; Oda, M.; Mori, K. Artificial Intelligence and Computer-Aided Diagnosis for Colonoscopy: Where Do We Stand Now? Transl Gastroenterol Hepatol 2021, 6. [Google Scholar] [CrossRef] [PubMed]
  3. Zhao, L.; Wang, N.; Zhu, X.; Wu, Z.; Shen, A.; Zhang, L.; Wang, R.; Wang, D.; Zhang, S. Establishment and validation of an artificial intelligence-based model for real-time detection and classification of colorectal adenoma. Sci. Rep. 2024, 14, 1–13. [Google Scholar] [CrossRef]
  4. Young, E.; Edwards, L.; Singh, R. The Role of Artificial Intelligence in Colorectal Cancer Screening: Lesion Detection and Lesion Characterization. Cancers 2023, 15, 5126. [Google Scholar] [CrossRef] [PubMed]
  5. Maas, M.H.J.; Rath, T.; Spada, C.; Soons, E.; Forbes, N.; Kashin, S.; Cesaro, P.; Eickhoff, A.; Vanbiervliet, G.; Salvi, D.; et al. A computer-aided detection system in the everyday setting of diagnostic, screening, and surveillance colonoscopy: an international, randomized trial. Endoscopy 2024, 56, 843–850. [Google Scholar] [CrossRef]
  6. Lecun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  7. U.S. Food and Drug administration Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices.
  8. Liang, F.; Wang, S.; Zhang, K.; Liu, T.-J.; Li, J.-N. Development of artificial intelligence technology in diagnosis, treatment, and prognosis of colorectal cancer. World J. Gastrointest. Oncol. 2022, 14, 124–152.
  9. Kamitani, Y.; Nonaka, K.; Isomoto, H. Current Status and Future Perspectives of Artificial Intelligence in Colonoscopy. J. Clin. Med. 2022, 11, 2923.
  10. Zhao, S.; Wang, S.; Pan, P.; Xia, T.; Chang, X.; Yang, X.; Guo, L.; Meng, Q.; Yang, F.; Qian, W.; et al. Magnitude, Risk Factors, and Factors Associated With Adenoma Miss Rate of Tandem Colonoscopy: A Systematic Review and Meta-analysis. Gastroenterology 2019, 156, 1661–1674.e11.
  11. Wang, P.; Liu, X.-G.; Kang, M.; Peng, X.; Shu, M.-L.; Zhou, G.-Y.; Liu, P.-X.; Xiong, F.; Deng, M.-M.; Xia, H.-F.; et al. Artificial intelligence empowers the second-observer strategy for colonoscopy: a randomized clinical trial. Gastroenterol. Rep. 2022, 11, goac081.
  12. Sinonquel, P.; Eelbode, T.; Pech, O.; De Wulf, D.; Dewint, P.; Neumann, H.; Antonelli, G.; Iacopini, F.; Tate, D.; Lemmers, A.; et al. Clinical consequences of computer-aided colorectal polyp detection. Gut 2024.
  13. Sinonquel, P.; Eelbode, T.; Hassan, C.; Antonelli, G.; Filosofi, F.; Neumann, H.; Demedts, I.; Roelandt, P.; Maes, F.; Bisschops, R. Real-time unblinding for validation of a new CADe tool for colorectal polyp detection. Gut 2020, 70, 641–643.
  14. Nogueira-Rodríguez, A.; Glez-Peña, D.; Reboiro-Jato, M.; López-Fernández, H. Negative Samples for Improving Object Detection: A Case Study in AI-Assisted Colonoscopy for Polyp Detection. Diagnostics 2023, 13, 966.
  15. Nogueira-Rodríguez, A.; Reboiro-Jato, M.; Glez-Peña, D.; López-Fernández, H. Performance of Convolutional Neural Networks for Polyp Localization on Public Colonoscopy Image Datasets. Diagnostics 2022, 12, 898.
  16. Davila-Piñón, P.; Nogueira-Rodríguez, A.; Díez-Martín, A.I.; Codesido, L.; Herrero, J.; Puga, M.; Rivas, L.; Sánchez, E.; Fdez-Riverola, F.; Glez-Peña, D.; et al. Optical diagnosis in still images of colorectal polyps: comparison between expert endoscopists and PolyDeep, a Computer-Aided Diagnosis system. Front. Oncol. 2024, 14, 1393815.
  17. Nogueira-Rodríguez, A.; Domínguez-Carbajales, R.; Campos-Tato, F.; Herrero, J.; Puga, M.; Remedios, D.; Rivas, L.; Sánchez, E.; Iglesias. ; Cubiella, J.; et al. Real-time polyp detection model using convolutional neural networks. Neural Comput. Appl. 2021, 34, 10375–10396.
  18. Nogueira Rodríguez, A.; Daniel González, D.; Hugo López Fernández, P.D.
  19. PolyDeep Research Consortium. Colorectal Polyp Image Cohort (PIBAdb). Available online at: https://www.iisgaliciasur.es/home/biobanco/cohorte-pibadb/.
  20. Rex, D.K. Narrow-Band Imaging Without Optical Magnification for Histologic Analysis of Colorectal Polyps. Gastroenterology 2009, 136, 1174–1181.
  21. Parsa, N.; Rex, D.K.; Byrne, M.F. Colorectal Polyp Characterization with Standard Endoscopy: Will Artificial Intelligence Succeed Where Human Eyes Failed? Best Pract. Res. Clin. Gastroenterol. 2021, 52–53.
  22. Gangwani, M.K.; Haghbin, H.; Ishtiaq, R.; Hasan, F.; Dillard, J.; Jaber, F.; Dahiya, D.S.; Ali, H.; Salim, S.; Lee-Smith, W.; et al. Single Versus Second Observer vs Artificial Intelligence to Increase the Adenoma Detection Rate of Colonoscopy: A Network Analysis. Dig. Dis. Sci. 2024, 69, 1380–1388.
  23. Sinonquel, P.; Eelbode, T.; Pech, O.; De Wulf, D.; Dewint, P.; Neumann, H.; Antonelli, G.; Tate, D.; Lemmers, A.; Pilonis, N.; et al. Clinical Validation of a Computer-Aided Detection Model for Colorectal Polyp Detection (CAD-ARTIPOD) Trial Using a Second Observer and Real-Time Unblinding. Gastrointest. Endosc. 2023, 97, AB712.
  24. Lee, C.K.; Park, D.I.; Lee, S.-H.; Hwangbo, Y.; Eun, C.S.; Han, D.S.; Cha, J.M.; Lee, B.-I.; Shin, J.E. Participation by experienced endoscopy nurses increases the detection rate of colon polyps during a screening colonoscopy: a multicenter, prospective, randomized study. Gastrointest. Endosc. 2011, 74, 1094–1102.
  25. Buchner, A.M.; Shahid, M.W.; Heckman, M.G.; Diehl, N.N.; McNeil, R.B.; Cleveland, P.; Gill, K.R.; Schore, A.; Ghabril, M.; Raimondo, M.; et al. Trainee participation is associated with increased small adenoma detection. Gastrointest. Endosc. 2011, 73, 1223–1231.
  26. Mangas-Sanjuan, C.; De-Castro, L.; Cubiella, J.; Díez-Redondo, P.; Suárez, A.; Pellisé, M.; Fernández, N.; Zarraquiños, S.; Núñez-Rodríguez, H.; Álvarez-García, V.; et al. Role of Artificial Intelligence in Colonoscopy Detection of Advanced Neoplasias. Ann. Intern. Med. 2023, 176, 1145–1152.
  27. Zhang, H.; Wu, Q.; Sun, J.; Wang, J.; Zhou, L.; Cai, W.; Zou, D. A computer-aided system improves the performance of endoscopists in detecting colorectal polyps: a multi-center, randomized controlled trial. Front. Med. 2024, 10, 1341259.
  28. Maas, M.H.J.; Soons, E.; Lebwohl, B.; Lewis, S.K.; Ngamruengphong, S.; Neumann, H.; Shirin, H.; Katz, L.H.; Benson, A.A.; et al. A Computer-Aided Polyp Detection System in Screening and Surveillance Colonoscopy: An International, Multicentre, Randomised, Tandem Trial; 2024.
  29. Mori, Y.; Jin, E.H.; Lee, D. Enhancing artificial intelligence-doctor collaboration for computer-aided diagnosis in colonoscopy through improved digital literacy. Dig. Liver Dis. 2023, 56, 1140–1143.
  30. Houwen, B.B.S.L.; Hazewinkel, Y.; Giotis, I.; Vleugels, J.L.A.; Mostafavi, N.S.; van Putten, P.; Fockens, P.; Dekker, E.; POLAR Study Group. Computer-aided diagnosis for optical diagnosis of diminutive colorectal polyps including sessile serrated lesions: a real-time comparison with screening endoscopists. Endoscopy 2023, 55, 756–765.
  31. Baumer, S.; Streicher, K.; Alqahtani, S.A.; Brookman-Amissah, D.; Brunner, M.; Federle, C.; Muehlenberg, K.; Pfeifer, L.; Salzberger, A.; Schorr, W.; et al. Accuracy of polyp characterization by artificial intelligence and endoscopists: a prospective, non-randomized study in a tertiary endoscopy center. Endosc. Int. Open 2023, 11, E818–E828.
  32. Cherubini, A.; Dinh, N.N. A Review of the Technology, Training, and Assessment Methods for the First Real-Time AI-Enhanced Medical Device for Endoscopy. Bioengineering 2023, 10, 404.
Table 1. Characteristics of the lesions detected.
| Characteristic | Both¹, N = 345 (%) | Only endoscopists², N = 39 (%) | Only PolyDeep³, N = 26 (%) |
|---|---|---|---|
| Lesion type (N = 410) | | | |
| Polyps (n = 343) | 294 (85.72%) | 29 (8.45%) | 20 (5.83%) |
| Adenoma (n = 231) | 201 (87.01%) | 17 (7.36%) | 13 (5.63%) |
| Serrated lesions (n = 39) | 37 (94.87%) | 2 (5.13%) | 0 (0%) |
| Hyperplastic lesions (n = 73) | 56 (76.71%) | 10 (13.70%) | 7 (9.59%) |
| Normal mucosa (n = 42) | 32 (76.19%) | 6 (14.29%) | 4 (9.52%) |
| Colorectal cancer (n = 5) | 5 (100%) | 0 (0%) | 0 (0%) |
| Not histology (n = 20) | 14 (70.00%) | 4 (20.00%) | 2 (10.00%) |
| Advanced lesions⁴ (N = 385) | | | |
| Yes (n = 73) | 71 (97.26%) | 1 (1.37%) | 1 (1.37%) |
| No (n = 312) | 255 (81.73%) | 34 (10.89%) | 23 (7.37%) |
| Size (N = 385) | | | |
| ≤5 mm (n = 289) | 234 (80.97%) | 32 (11.07%) | 23 (7.96%) |
| >5 mm (n = 96) | 92 (95.83%) | 3 (3.13%) | 1 (1.04%) |
| Location (N = 385) | | | |
| Right colon⁵ (n = 207) | 176 (85.02%) | 20 (9.66%) | 11 (5.32%) |
| Left colon⁶ (n = 178) | 150 (84.27%) | 15 (8.43%) | 13 (7.30%) |
| Morphology (N = 385) | | | |
| Pedunculated (n = 59) | 57 (96.61%) | 1 (1.70%) | 1 (1.69%) |
| Sessile (n = 218) | 187 (85.78%) | 15 (6.88%) | 16 (7.34%) |
| Slightly elevated (n = 84) | 69 (82.14%) | 12 (14.29%) | 3 (3.57%) |
| Flat (n = 24) | 13 (54.17%) | 7 (29.17%) | 4 (16.66%) |
¹Detections made by both the endoscopists and PolyDeep at the same time. ²Detections made only by the endoscopists. ³Detections made only by PolyDeep. ⁴Adenoma ≥ 10 mm, high-grade dysplasia, tubulovillous or villous histology, or serrated lesions with dysplasia or ≥ 10 mm. ⁵Polyps detected between the cecum and the splenic flexure. ⁶Polyps detected between the descending colon and the rectum.
Table 2. Diagnostic performance of endoscopists and PolyDeep for polyp (adenoma, serrated or hyperplastic lesion) detection.
Sensitivity, specificity, PPV and NPV are given as % (95% CI); odds ratios are given as value (95% CI).

| Lesion group | Sensitivity, endoscopists | Sensitivity, PolyDeep | p¹ | Specificity, endoscopists | Specificity, PolyDeep | p¹ | PPV, endoscopists | PPV, PolyDeep | NPV, endoscopists | NPV, PolyDeep | Odds ratio, endoscopists | Odds ratio, PolyDeep |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Polyp (n = 385) | 94.2 (91.0-96.3) | 91.5 (87.9-94.1) | 0.2 | 9.5 (3.1-23.6) | 14.3 (5.9-29.2) | 0.7 | 89.5 (85.7-92.3) | 89.7 (85.9-92.6) | 16.7 (5.5-38.2) | 17.1 (7.2-34.3) | 1.7 (0.4-5.45) | 1.8 (0.6-4.8) |
| Polyp ≤5 mm (n = 289) | 92.3 (88.1-95.2) | 89.5 (84.8-92.9) | 0.4 | 9.8 (3.1-24.1) | 14.6 (6.1-29.9) | 0.7 | 86.1 (81.2-89.9) | 86.4 (81.4-90.2) | 17.4 (5.7-39.5) | 18.8 (7.9-37.1) | 1.3 (0.3-4.2) | 1.5 (0.5-4.0) |
| Polyp >5 mm (n = 96) | 98.9 (93.4-99.9) | 96.8 (90.4-99.2) | 0.6 | 0.0 (10.8-94.5) | 0.0 (10.8-94.5) | ⁶ | 98.9 (93.4-99.9) | 98.9 (93.3-99.9) | 0.0 (10.8-94.5) | 0.0 (3.18-69.0) | 0.0 (0.0-3502.3) | 0.0 (0.0-1186.6) |
| Right colon² (n = 207) | 95.0 (90.4-97.5) | 91.1 (85.7-94.6) | 0.2 | 7.1 (1.3-25.0) | 14.3 (4.7-33.6) | 0.7 | 86.7 (81.0-91.0) | 87.2 (81.3-91.4) | 18.2 (3.2-52.3) | 20.0 (6.6-44.3) | 1.4 (0.1-7.6) | 1.7 (0.4-5.9) |
| Left colon³ (n = 178) | 93.3 (88.0-96.4) | 92.1 (86.5-95.5) | 0.8 | 14.3 (2.5-43.9) | 14.3 (2.5-43.8) | 1.0 | 92.7 (87.4-96.0) | 92.6 (87.2-96.0) | 15.4 (2.7-46.3) | 13.3 (2.3-41.6) | 2.3 (0.2-12.6) | 1.9 (0.2-10.2) |
| Protruded lesions⁴ (n = 277) | 94.0 (90.2-96.5) | 94.8 (91.1-97.1) | 0.9 | 8.0 (1.4-27.5) | 12.0 (3.2-32.3) | 1.0 | 91.2 (86.9-94.2) | 91.6 (87.4-94.5) | 11.8 (2.1-37.8) | 18.8 (5.0-46.3) | 1.4 (0.1-6.5) | 2.5 (0.4-10.1) |
| Non-protruded⁵ (n = 108) | 94.5 (87.1-98.0) | 82.4 (72.7-89.3) | <0.05 | 11.8 (2.1-37.8) | 17.6 (4.7-44.2) | 1.0 | 85.2 (76.4-91.2) | 84.3 (74.7-90.8) | 28.6 (5.1-69.7) | 15.8 (4.2-40.5) | 2.3 (0.2-15.5) | 1.0 (0.2-4.2) |
¹Differences are compared using McNemar's test. A p-value < 0.05 is considered statistically significant. ²The right colon comprises the cecum, ascending colon, hepatic flexure, transverse colon and splenic flexure. ³The left colon comprises the descending colon, sigmoid colon and rectum. ⁴Protruded lesions include all lesions with pedunculated or sessile morphology. ⁵Non-protruded lesions include slightly elevated, flat and depressed lesions. ⁶Cells without a value indicate that the value could not be calculated. Abbreviations: PPV, positive predictive value; NPV, negative predictive value; CI, confidence interval.
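As a cross-check, the polyp row of Table 2 can be recomputed from the lesion counts in Table 1. The sketch below is illustrative only and is not the authors' analysis code. It assumes that, in this second observer design, a lesion flagged only by the other observer counts as a negative call for the observer under evaluation, and that normal mucosa is the negative reference class. The names `Counts`, `confusion` and `metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Lesions flagged by both observers, or by only one of them (Table 1)."""
    both: int
    only_endoscopists: int
    only_polydeep: int


# Counts copied from Table 1.
POLYPS = Counts(both=294, only_endoscopists=29, only_polydeep=20)
NORMAL_MUCOSA = Counts(both=32, only_endoscopists=6, only_polydeep=4)


def confusion(positives: Counts, negatives: Counts, observer: str):
    """Return (tp, fn, fp, tn) for 'endoscopists' or 'polydeep'.

    Assumption: a lesion flagged only by the other observer is treated as
    a negative call by the observer being evaluated.
    """
    if observer == "endoscopists":
        own = lambda c: c.both + c.only_endoscopists
        other = lambda c: c.only_polydeep
    elif observer == "polydeep":
        own = lambda c: c.both + c.only_polydeep
        other = lambda c: c.only_endoscopists
    else:
        raise ValueError(f"unknown observer: {observer}")
    return own(positives), other(positives), own(negatives), other(negatives)


def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures, in percent (odds ratio unitless)."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "odds_ratio": (tp * tn) / (fn * fp),
    }


endoscopists = metrics(*confusion(POLYPS, NORMAL_MUCOSA, "endoscopists"))
polydeep = metrics(*confusion(POLYPS, NORMAL_MUCOSA, "polydeep"))

for name, result in (("Endoscopists", endoscopists), ("PolyDeep", polydeep)):
    print(name, {key: round(value, 1) for key, value in result.items()})
```

Under these assumptions the rounded point estimates agree with the polyp row of Table 2 (for example, sensitivity 94.2% for endoscopists and 91.5% for PolyDeep). This may also explain the low specificities: normal mucosa enters the data set only when at least one observer has flagged it.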
Table 3. Diagnostic performance for polyp detection for PolyDeep and endoscopists according to histology.
Sensitivity, specificity, PPV and NPV are given as % (95% CI); odds ratios are given as value (95% CI).

| Histology | Sensitivity, endoscopists | Sensitivity, PolyDeep | p¹ | Specificity, endoscopists | Specificity, PolyDeep | p¹ | PPV, endoscopists | PPV, PolyDeep | NPV, endoscopists | NPV, PolyDeep | Odds ratio, endoscopists | Odds ratio, PolyDeep |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Neoplasia² (n = 385) | 95.2 (91.7-97.3) | 92.9 (89.0-95.6) | 0.3 | 9.6 (5.1-16.8) | 13.9 (8.4-21.9) | 0.4 | 71.2 (66.1-75.7) | 71.7 (66.6-76.3) | 45.7 (26.2-66.8) | 45.7 (29.2-63.1) | 2.1 (0.8-5.2) | 2.1 (1.0-4.6) |
| Adenoma (n = 385) | 94.4 (94.3-96.8) | 92.6 (88.3-95.5) | 0.5 | 7.1 (3.8-12.7) | 11.7 (7.3-18.1) | 0.2 | 60.4 (55.1-65.4) | 61.1 (55.8-66.2) | 45.8 (26.2-66.8) | 51.4 (34.3-68.3) | 1.3 (0.5-3.2) | 1.7 (0.8-3.6) |
| Serrated lesion (n = 385) | 100.0 (88.8-99.8) | 94.9 (81.4-99.1) | - | 6.9 (4.5-10.3) | 9.5 (6.8-13.3) | 0.3 | 10.8 (7.9-14.6) | 10.6 (7.6-14.4) | 100.0 (82.8-99.6) | 94.3 (79.5-99.0) | - | 2.0 (0.5-17.4) |
| Advanced lesions³ (n = 73) | 98.6 (91.6-99.9) | 98.6 (91.6-99.9) | 1.0 | - | - | - | - | - | - | - | - | - |
¹Differences are compared using McNemar's test. A p-value < 0.05 is considered statistically significant. ²Adenomas and serrated lesions. ³Adenomas or serrated lesions ≥ 10 mm, adenomas with villous histology or high-grade dysplasia, or serrated lesions with dysplasia. Abbreviations: PPV, positive predictive value; NPV, negative predictive value; CI, confidence interval.
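The table footnotes name McNemar's test but do not state which variant was used. A minimal sketch of the exact (binomial) form, applied to the discordant polyp and normal-mucosa counts from Table 1, is given below. The choice of the exact form and the function name `mcnemar_exact` are assumptions made for illustration, so the resulting p-values need not coincide with those reported.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b and c are the two discordant counts of a paired 2x2 table
    (here: lesions flagged by only one of the two observers).
    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Discordant counts from Table 1.
# Polyps: 29 flagged only by endoscopists, 20 only by PolyDeep (sensitivity).
p_sensitivity = mcnemar_exact(29, 20)
# Normal mucosa: 6 flagged only by endoscopists, 4 only by PolyDeep (specificity).
p_specificity = mcnemar_exact(6, 4)

print(f"polyp sensitivity: p = {p_sensitivity:.3f}")
print(f"polyp specificity: p = {p_specificity:.3f}")
```

Only the discordant pairs enter the test; the lesions flagged by both observers do not affect the p-value, which is why small differences in sensitivity remain non-significant despite the large number of concordant detections.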
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.