4. Discussion
Fluorescein fundus angiography (FFA) enables the detection of subtle DR lesions and serves as a pivotal imaging modality for evaluating DR severity. FFA images acquired at different contrast diffusion times vary in lesion manifestation. Theoretically, FFA images with different diffusion durations may significantly impact FFA-based DR staging model performance.
In this paper, two state-of-the-art deep learning architectures, Swin-Transformer and ConvNeXt, were each used to build intelligent DR staging models on FFA images. With the International five-grade and Chinese six-grade DR classification criteria strictly applied, the effects of images with different contrast diffusion durations on model performance were systematically evaluated. The results showed that the impact of different diffusion durations on model performance did not reach statistical significance (corrected P > 0.05), although model performance exhibited a numerical downward trend in late-phase images, plausibly due to decreased contrast and lesion obscuration by hyperfluorescent leakage. To our knowledge, this study is the first to delineate the marginal effect of contrast diffusion duration from both statistical and mechanistic perspectives, providing empirical evidence for standardized FFA image acquisition and robustness optimization of intelligent DR staging models.
In the International five-grade, Chinese six-grade, and binary classification tasks, the ConvNeXt-based model generally outperformed the Swin-Transformer-based model in terms of accuracy, precision, recall, and F1-score across all phase groups, achieving the best staging performance in the venous phase. This advantage may be attributed to the fact that ConvNeXt, while retaining the local feature extraction capabilities of convolutional neural networks, incorporates design concepts from Transformers (e.g., large 7×7 convolutional kernels and inverted bottleneck structures). This may make ConvNeXt more adept at capturing the coupling between local vascular structures (e.g., microaneurysms and IRMA) and global leakage patterns in FFA images. In contrast, although Swin-Transformer possesses powerful global modeling capabilities, in late-phase images with low contrast and blurred lesion boundaries its attention may drift because of the lack of local structural constraints, leading to performance degradation. This finding suggests that hybrid architectures integrating convolutional local perception and Transformer global modeling have greater potential for FFA image analysis.
Generalized linear model analysis with Bonferroni multiple comparison correction yielded no significant differences in any pairwise phase comparison. For example, in the International five-grade task, the accuracy difference between the venous and late phases for the Swin-Transformer-based model was 6.09 percentage points (corrected P = 0.3651); in the Chinese six-grade task, the F1-score difference between the venous phase and the combined group for the ConvNeXt-based model reached 7.60 percentage points (corrected P = 0.8994). The lack of statistical significance may be attributed to the following factors: (1) the relatively large sample size (7,508 images) allowed the models to achieve a certain degree of phase robustness; (2) both the Swin-Transformer and ConvNeXt architectures have a high tolerance for fluctuations in image quality; (3) when annotating recirculation-phase and late-phase images, physicians could refer to the venous-phase images of the same patient, which reduced annotation bias to some extent. Notably, the Chinese six-grade fine-grained task showed pronounced performance degradation: from the venous to the late phase, the F1-score of the Swin-Transformer-based model decreased by 17.84 percentage points and that of the ConvNeXt-based model decreased by 12.13 percentage points. This indicates that phase-related performance attenuation is more prominent in fine-grained DR grading, which warrants further validation in larger cohorts and higher-resolution datasets.
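The Bonferroni step described above can be illustrated with a minimal sketch. It assumes four phase groups (venous, recirculation, late, combined) and therefore six pairwise comparisons; the raw P values are hypothetical placeholders, not results from this study, and the function name `bonferroni` is introduced only for illustration.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each raw P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Assumed phase groups; every unordered pair is one comparison.
groups = ["venous", "recirculation", "late", "combined"]
pairs = list(combinations(groups, 2))

# Hypothetical raw P values, one per pair (NOT taken from the study).
raw_p = [0.06, 0.20, 0.15, 0.40, 0.30, 0.70]
corrected_p = bonferroni(raw_p)

for (a, b), p_raw, p_corr in zip(pairs, raw_p, corrected_p):
    verdict = "significant" if p_corr < 0.05 else "not significant"
    print(f"{a} vs {b}: raw P = {p_raw:.4f}, corrected P = {p_corr:.4f} ({verdict})")
```

Because each raw P value is multiplied by the number of comparisons, a difference of several percentage points can remain non-significant after correction when the raw P value is only moderately small.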
Despite the absence of statistical significance, the gradual performance decline observed in late-phase images has a clear pathophysiological and imaging basis. First, reduced imaging contrast: with gradual vascular perfusion and metabolic clearance of the contrast agent over time, the fluorescence intensity gradient between retinal vessels and the background declines markedly. In the late phase, retinal vasculature is largely cleared of fluorescein [23], blurring the boundaries of key lesions such as microaneurysms and capillary non-perfusion areas, and hindering the extraction of stable discriminative features. In this study, both models presented reduced recall in late-phase images; for Swin-Transformer under the Chinese six-grade standard, recall dropped from 77.37% to 61.40%, indicating an elevated risk of missed lesion detection. Second, lesion masking by hyperfluorescent leakage: in patients with PDR, extensive hyperfluorescent leakage is commonly observed in the late phase. Chronic hyperglycemia disrupts tight junctions of retinal vascular endothelial cells, impairs the inner blood–retinal barrier, and increases vascular permeability [24]. Progressive retinal ischemia and hypoxia further induce neovascularization, whose immature endothelial structure is highly prone to dye leakage. Pathological manifestations include vascular wall staining, capillary dilatation, and focal dye pooling [22]. These hyperfluorescent regions may partially or completely obscure microaneurysms, retinal hemorrhages, and neovascularization, ultimately inducing model misclassification and missed diagnosis. In the Chinese six-grade task, the precision of Swin-Transformer declined sharply from 86.00% to 65.88% in late-phase images, which is consistent with leakage-induced lesion obscuration increasing the risk of false-positive predictions.
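The link between these metrics and the two error types can be made concrete with a short sketch. It treats one DR grade in a one-vs-rest fashion; the counts are hypothetical and chosen only to show the direction of the effect, and the helper name `precision_recall_f1` is an illustrative assumption rather than part of the study pipeline.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from one-vs-rest counts for one grade."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts (NOT from the study).
baseline = precision_recall_f1(tp=80, fp=20, fn=20)

# More false positives (e.g., leakage mistaken for a higher grade) lower precision only.
more_false_positives = precision_recall_f1(tp=80, fp=40, fn=20)

# More missed cases (e.g., lesions lost to low contrast) lower recall only.
more_missed_cases = precision_recall_f1(tp=60, fp=20, fn=40)

print("baseline:", baseline)
print("more false positives:", more_false_positives)
print("more missed cases:", more_missed_cases)
```

Under these assumptions, a fall in precision reflects additional false positives while a fall in recall reflects additional missed cases, which is how the two late-phase declines are read in the text.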
This study adopted both the International five-grade and Chinese six-grade DR classification standards, and the results showed that the performance trend across phases was highly consistent under both standards (venous phase best, late phase or combined group lowest), indicating that the phase effect is robust across classification standards. The Chinese six-grade standard, owing to its further subdivision of the proliferative stage, achieved lower overall accuracy than the International five-grade standard, suggesting that more fine-grained classification tasks impose higher demands on image quality and model discriminative ability.
The present findings provide practical implications for clinical FFA acquisition and the deployment of intelligent DR analysis systems. First, optimization of acquisition timing: venous-phase or recirculation-phase images are recommended as the preferred data source for constructing intelligent DR staging models. Second, phase labeling in clinical deployment: in clinical computer-aided diagnostic systems based on FFA, the contrast diffusion phase of input images should be explicitly recorded. For late-phase images, the system may appropriately reduce the reported prediction confidence and provide clinical risk prompts to facilitate comprehensive clinician judgment. Third, multi-phase mixing in model training: model performance in the combined phase group fell between that of the venous and late phases alone, suggesting that incorporating multi-phase FFA images into training datasets may enhance cross-phase generalization and model robustness.
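The phase-labeling recommendation could be realized along the following lines. This is only a sketch under stated assumptions: the discount factor `LATE_PHASE_DISCOUNT`, the `StagingReport` record, and the `build_report` function are hypothetical names and values introduced for illustration, and a deployed system would need a clinically validated calibration instead of a fixed multiplier.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical discount applied to late-phase predictions (assumed, not validated).
LATE_PHASE_DISCOUNT = 0.8


@dataclass
class StagingReport:
    grade: int                      # predicted DR grade
    confidence: float               # confidence shown to the clinician
    phase: str                      # recorded contrast diffusion phase
    warning: Optional[str] = None   # clinical risk prompt, if any


def build_report(grade: int, confidence: float, phase: str) -> StagingReport:
    """Record the phase explicitly and flag late-phase inputs."""
    if phase == "late":
        return StagingReport(
            grade=grade,
            confidence=confidence * LATE_PHASE_DISCOUNT,
            phase=phase,
            warning="Late-phase image: leakage may obscure lesions; "
                    "review venous-phase images if available.",
        )
    return StagingReport(grade=grade, confidence=confidence, phase=phase)


late_report = build_report(grade=3, confidence=0.9, phase="late")
venous_report = build_report(grade=3, confidence=0.9, phase="venous")
print(late_report)
print(venous_report)
```

Keeping the phase as an explicit field in the output, rather than only adjusting the score, lets the clinician see why the confidence was lowered.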
Several limitations of this study should be acknowledged. First, all data were retrospectively collected from a single medical center. Despite the large sample size, geographical and device homogeneity may limit the external generalizability of the conclusions. Future multicenter prospective cohort studies are required for further validation. Second, this study only qualitatively explored the mechanism of lesion masking by late-phase leakage, without quantitative lesion-level evaluation, such as dynamic curves of microaneurysm detection rate across continuous diffusion time points. Moreover, the continuous dynamic FFA process was discretized into three independent phase subgroups in this work. Advanced strategies including temporal attention mechanisms and video-level sequential modeling were not investigated, which represents a valuable direction for future research to enhance model comprehension of dynamic FFA angiographic sequences.