4. Discussion
This prospective study investigated the feasibility, reliability and accuracy of determining the cardiac LVEF from AI-enhanced ultrasound in an ICU setting. There are two related steps in this process: non-expert users must acquire diagnostic-quality images (with AI assistance) in these challenging patients, and AI must interpret the often-suboptimal images accurately. We had several key findings.
Feasibility of AI-enhanced echo was strong. We found that novice users with only 2 hours of training could generate echo images of quality adequate for AI analysis in ~96% of ICU scans, with a 3-4% scan failure rate similar to that of experts. These results align with previous studies demonstrating that AI-enhanced ultrasound, utilized by both novice and expert users, consistently produces diagnostic-quality images from PLAX and A4C views [13,15]. This result is particularly impressive considering the factors limiting echo in ICU patients: immobility (especially the inability to turn into decubitus position), potentially unstable clinical status, irregular and/or rapid heart rhythms, shadowing from abnormal lungs, large body habitus, uncooperative partially sedated or delirious patients, and artifacts from machinery such as mechanical ventilation.
While scan times were ~3x as long for novices as for experts, nearly all scans could be obtained in less than 5 minutes even by novices. The PLAX view was easier and substantially faster for novices to obtain: their scan times averaged ~1.5 minutes for the PLAX view and ~3 minutes for A4C.
Inter-observer reliability was high. Novices and experts generated images that led to similar AI calculations of LVEF (ICC=0.88-0.94 for the 2 views). This is expected, since a key advantage of AI is that in many applications it ‘levels the playing field’, enabling novices to perform tasks nearly as well as experts.
Accuracy of the AI LVEF calculations in these challenging ICU patients was more mixed. Concordant with frequent real-world practice [16], we used a threshold of 40% to differentiate between a substantially reduced LVEF (counted as a positive result) and a normal/minimally-reduced LVEF. When AI detected an LVEF <40%, it was generally correct, with high specificity (90-94%). The A4C view had more false-positive AI results than PLAX, potentially due to the increased difficulty of acquiring the A4C view and the effects of foreshortening in a suboptimal A4C view. Sensitivity was low at 56% when using the AI “mean LVEF” prediction, rising to 70% when using the AI “lower-bound LVEF” prediction. Overall, in our ICU patients the AI tool could be considered useful to confirm a suspected abnormal LVEF, due to its high specificity and positive predictive value, but a normal LVEF on this test would not exclude decreased cardiac function.
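To make the threshold-based accuracy measures concrete, the following is a minimal Python sketch of how sensitivity and specificity at the 40% cut-off could be computed from paired reference and AI LVEF values. The function name and the example values are hypothetical and purely illustrative; they are not study data, and this is not the analysis code used in the study.

```python
from typing import Sequence, Tuple

LVEF_THRESHOLD = 40.0  # percent; the cut-off described in the text


def sensitivity_specificity(
    reference: Sequence[float],
    predicted: Sequence[float],
    threshold: float = LVEF_THRESHOLD,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for detecting LVEF < threshold."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        ref_reduced = ref < threshold    # reduced LVEF by the reference standard
        pred_reduced = pred < threshold  # reduced LVEF according to the AI value
        if ref_reduced and pred_reduced:
            tp += 1
        elif ref_reduced:
            fn += 1
        elif pred_reduced:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical illustrative values (NOT study data), in percent.
reference_lvef = [25, 35, 38, 30, 55, 60, 45, 50]
ai_mean_lvef = [30, 42, 45, 33, 58, 57, 47, 38]
ai_lower_bound_lvef = [24, 36, 41, 28, 52, 50, 41, 33]

mean_result = sensitivity_specificity(reference_lvef, ai_mean_lvef)
lower_result = sensitivity_specificity(reference_lvef, ai_lower_bound_lvef)
```

In this toy example, classifying on the lower-bound value flags more of the truly reduced cases than classifying on the mean value, which mirrors the direction of the sensitivity change reported above.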
Although AI-derived measurements from the A4C and PLAX views were generally similar, A4C LVEF estimates were significantly lower than those from PLAX in our cohort, leading to an underestimation of LVEF when relying on A4C alone. There is controversy in the literature regarding which view allows the most accurate AI-enhanced LVEF estimation, with some studies reporting that A4C outperforms PLAX [4,17], and other studies supporting our finding that PLAX was superior [14,18]. The differences may relate to patient cohorts and user experience. Since PLAX is more easily obtained by non-experts, these images may be higher-quality in more patients. However, because the PLAX view does not include the cardiac apex, patients with focal pathology affecting the apex (as is frequent in myocardial infarctions) may be best assessed by A4C views obtained by experts.
Given these discrepancies, many studies emphasize the value of integrating multiple echocardiographic views to improve the diagnostic accuracy of LVEF [4,14,17,18]. Consistent with this, we found that sensitivity and specificity for detecting reduced LVEF were highest when combining measurements from both A4C and PLAX. Performing the more easily-obtained PLAX view in all patients and only adding A4C when PLAX was abnormal would have improved specificity slightly. Future larger studies could evaluate the validity of AI-enhanced single-view vs. multi-view approaches for LVEF assessment across different patient populations and clinical settings.
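One way to picture the PLAX-first strategy is as a simple sequential decision rule, sketched below in Python. The text does not specify how the two views are combined, so this sketch assumes that A4C acts as a confirmatory test: a scan is called reduced only if both PLAX and A4C fall below 40%. That rule, the function name and the fallback to a PLAX-only call when no A4C value is available are all assumptions made for illustration.

```python
from typing import Optional

LVEF_THRESHOLD = 40.0  # percent; the cut-off described in the text


def sequential_reduced_lvef(
    plax_lvef: float,
    a4c_lvef: Optional[float] = None,
    threshold: float = LVEF_THRESHOLD,
) -> bool:
    """Hypothetical PLAX-first rule: acquire A4C only when PLAX is abnormal.

    Assumption: A4C is used to confirm an abnormal PLAX result, so the
    final call is 'reduced' only when both views are below the threshold.
    """
    if plax_lvef >= threshold:
        # PLAX is normal: stop here, no A4C acquisition is needed.
        return False
    if a4c_lvef is None:
        # No A4C available: fall back to the PLAX-only result (assumption).
        return True
    # PLAX is abnormal: A4C must agree for the scan to be called reduced.
    return a4c_lvef < threshold
```

Under this assumed rule, A4C can only overturn an abnormal PLAX call and never create a new positive, so specificity cannot fall below that of PLAX alone, while sensitivity may decrease.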
Our study had limitations. Although a strength of our study was the large number of novice scanners in an ICU setting (30 learners), only a small number of patients in Cohort 1 were scanned by these novices (n=10). This is because it was logistically difficult to find ICU patients stable enough to be scanned by many learners at times when the learners were available. We also did not have a concurrent contrast-echo gold standard in all of these patients, again for logistical reasons. Another limitation is that the larger series in Cohort 2 (n=65) assessing LVEF accuracy was scanned only by our expert, without learners; this was again due to logistical constraints in the hospital setting. However, since our results in Cohort 1 showed that the AI LVEF estimates were very similar whether the scan was performed by an expert or a novice, the accuracy of the AI tool in Cohort 2 is likely to be broadly similar to that obtained by less-experienced users.