Submitted: 09 April 2026
Posted: 10 April 2026
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. Data Preprocessing and Augmentation
2.3. Baseline Model: ResNet-50
2.4. Proposed Architecture: RNNet-MST
2.5. Training Configuration
- Optimizer: AdamW (learning rate = 1 × 10⁻⁴, weight decay = 0.01);
- Scheduler: Cosine Annealing Learning-Rate Schedule;
- Loss Function: Weighted Binary Cross-Entropy (WBCE).
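The schedule and loss in this configuration can be sketched in plain Python to make them concrete. This is a minimal sketch under stated assumptions, not the authors' training code: it uses the standard closed-form cosine annealing schedule without restarts, and the schedule length `t_max`, the floor `lr_min = 0`, and the inverse-frequency class weights returned by the hypothetical helper `inverse_frequency_weights` are illustrative choices, since the section does not report them.

```python
import math
from typing import Sequence, Tuple


def cosine_annealing_lr(step: int, t_max: int, lr_max: float = 1e-4, lr_min: float = 0.0) -> float:
    """Closed-form cosine annealing without restarts: lr_max at step 0, lr_min at step t_max."""
    step = min(max(step, 0), t_max)  # clamp so the schedule stays on a single half-cosine
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * step / t_max))


def inverse_frequency_weights(n_pos: int, n_neg: int) -> Tuple[float, float]:
    """Hypothetical class weights N / (2 * N_class): the rarer class receives the larger weight."""
    total = n_pos + n_neg
    return total / (2.0 * n_pos), total / (2.0 * n_neg)


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 w_pos: float, w_neg: float, eps: float = 1e-7) -> float:
    """Mean weighted binary cross-entropy; labels are 1 for nodule, 0 for no nodule."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical schedule length and class counts, for illustration only.
    for step in (0, 50, 100):
        print(f"step {step:3d}: lr = {cosine_annealing_lr(step, t_max=100):.2e}")
    w_pos, w_neg = inverse_frequency_weights(n_pos=200, n_neg=800)
    print(f"w_pos = {w_pos}, w_neg = {w_neg}")
    print(f"WBCE = {weighted_bce([0.9, 0.2, 0.6], [1, 0, 1], w_pos, w_neg):.4f}")
```

If the model was trained in PyTorch (the framework is not stated in this section), a comparable setup would typically combine `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)`, `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=...)` and `torch.nn.BCEWithLogitsLoss(pos_weight=...)`, whose `pos_weight` plays the role of the ratio `w_pos / w_neg` (the two losses then differ only by the constant factor `w_neg`).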
2.6. Evaluation Metrics
- Accuracy: (TP + TN) / (TP + TN + FP + FN)
- Precision: TP / (TP + FP)
- Nodule Recall (Sensitivity): TP / (TP + FN)
- Nodule F1-Score: 2 × (Precision × Nodule Recall) / (Precision + Nodule Recall)
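The four metrics above can be computed directly from confusion-matrix counts. The sketch below is illustrative only: the `ConfusionCounts` type and the counts in the usage block are hypothetical, and nodule is treated as the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with "nodule" as the positive class."""
    tp: int
    tn: int
    fp: int
    fn: int


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def precision(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp)


def nodule_recall(c: ConfusionCounts) -> float:
    # Sensitivity: share of nodule cases that the model labels as nodule.
    return c.tp / (c.tp + c.fn)


def nodule_f1(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and nodule recall; algebraically equal to 2TP / (2TP + FP + FN).
    p, r = precision(c), nodule_recall(c)
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, tn=880, fp=10, fn=20)  # hypothetical counts
    for name, metric in [("accuracy", accuracy), ("precision", precision),
                         ("nodule recall", nodule_recall), ("nodule F1", nodule_f1)]:
        print(f"{name:>13}: {metric(counts):.4f}")
```

With these counts the sketch prints 0.9700, 0.9000, 0.8182 and 0.8571. The example also shows why nodule recall is reported alongside accuracy: when negatives dominate, accuracy can remain high even though about 18% of nodule cases (20 of 110) are missed.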
3. Results
3.1. General Classification Performance
3.2. Comparative Evaluation
3.2.1. Overall Classification Performance
3.2.2. Small-Nodule Detection Performance
4. Discussion
4.1. Interpretation of Performance Gains
4.2. Comparison with Related CXR-Based Nodule Detection Work
4.3. Clinical Significance of the Precision-Recall Trade-off
4.4. Workflow Optimization in Resource-Constrained Settings
4.5. Study Limitations and Future Research Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| CAD | Computer-Aided Detection |
| CBAM | Convolutional Block Attention Module |
| CNN | Convolutional Neural Network |
| CT | Computed Tomography |
| CXR | Chest X-Ray |
| FC | Fully Connected |
| FN | False Negative |
| FP | False Positive |
| GELU | Gaussian Error Linear Unit |
| LCP | Lung Center of the Philippines |
| LN | Layer Normalization |
| MHSA | Multi-Head Self-Attention |
| MLP | Multi-Layer Perceptron |
| MST | Multi-Scale Transformers |
| NODE21 | Nodule Detection 2021 |
| ResNet | Residual Neural Network |
| RNNet-MST | Residual Neural Network with Multi-Scale Transformers |
| TN | True Negative |
| TP | True Positive |
| ViT | Vision Transformer |
| WBCE | Weighted Binary Cross-Entropy |
Appendix A. Quantitative and Qualitative Analysis of Spatial Attention
Appendix A.1. Quantitative Analysis of Baseline vs Enhanced Model
| Metric | Baseline | Enhanced | Improvement |
| BBox Coverage | 28.71% | 39.93% | +11.22 pp |
| Detection Rate | 29.95% | 39.63% | +9.68 pp |
| Peak Proximity | 80.77% | 83.80% | +3.03 pp |
| Attention Focus | 2.1856 | 3.1337 | +0.95 |
Appendix A.2. Visual Comparison of Baseline vs Enhanced Model


Appendix B. Techniques to Address Class Imbalance
- Horizontal Translation: Random shifts of ±2% of image width.
- Vertical Translation: Random shifts of ±2% of image height.
- Rotation: Small random rotations within ±3 degrees.
- Brightness Adjustment: Multiplicative intensity scaling by a random factor between 0.95 and 1.05 (±5%), clipped to the original intensity range.
- Horizontal Flip: Left-right mirroring of the image.
- Combined Transform: Horizontal shift (±1%) combined with rotation (±2°).
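The transforms listed above can be sketched with NumPy alone. Several details are assumptions rather than reported settings: images are treated as 2-D floating-point arrays, pixels uncovered by shifts and rotations are zero-filled, rotation uses nearest-neighbor interpolation, "original intensity range" is read as the image's own minimum and maximum, one transform is drawn uniformly per augmented copy, and the helpers `augment` and `oversample_minority` (which appends augmented copies of minority-class images until a target count is reached) are hypothetical. A production pipeline would more likely rely on an established augmentation library.

```python
import numpy as np


def shift(img: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Translate a 2-D image by (dx, dy) pixels; uncovered pixels are filled with zeros."""
    h, w = img.shape
    dx, dy = int(np.clip(dx, -w, w)), int(np.clip(dy, -h, h))
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def rotate(img: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate about the image center with nearest-neighbor inverse mapping; corners become zero."""
    h, w = img.shape
    theta = np.deg2rad(degrees)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.indices((h, w))
    # For every output pixel, look up the source pixel it came from.
    xs = np.rint(cos_t * (xx - cx) + sin_t * (yy - cy) + cx).astype(int)
    ys = np.rint(-sin_t * (xx - cx) + cos_t * (yy - cy) + cy).astype(int)
    valid = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    out = np.zeros_like(img)
    out[valid] = img[ys[valid], xs[valid]]
    return out


def scale_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    """Multiply intensities by `factor`, then clip back to the image's original [min, max] range."""
    return np.clip(img * factor, img.min(), img.max())


def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply one of the six listed transforms, chosen uniformly at random (assumed policy)."""
    h, w = img.shape
    choice = int(rng.integers(6))
    if choice == 0:  # horizontal translation, ±2% of width
        return shift(img, dx=round(rng.uniform(-0.02, 0.02) * w), dy=0)
    if choice == 1:  # vertical translation, ±2% of height
        return shift(img, dx=0, dy=round(rng.uniform(-0.02, 0.02) * h))
    if choice == 2:  # rotation within ±3 degrees
        return rotate(img, rng.uniform(-3.0, 3.0))
    if choice == 3:  # brightness scaling by a factor in [0.95, 1.05]
        return scale_brightness(img, rng.uniform(0.95, 1.05))
    if choice == 4:  # horizontal flip
        return np.fliplr(img)
    # choice == 5: horizontal shift (±1%) followed by rotation (±2 degrees)
    shifted = shift(img, dx=round(rng.uniform(-0.01, 0.01) * w), dy=0)
    return rotate(shifted, rng.uniform(-2.0, 2.0))


def oversample_minority(images: list, n_target: int, seed: int = 0) -> list:
    """Hypothetical helper: add augmented copies of minority-class images until n_target is reached."""
    rng = np.random.default_rng(seed)
    out = list(images)
    while len(out) < n_target:
        src = images[int(rng.integers(len(images)))]
        out.append(augment(src, rng))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    nodule_images = [rng.random((64, 64)) for _ in range(3)]  # stand-ins for nodule CXRs
    balanced = oversample_minority(nodule_images, n_target=10)
    print(len(balanced), balanced[-1].shape)
```

Every transform preserves the image shape, so augmented copies can be mixed freely with the originals when rebalancing the classes.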
Appendix C. Preliminary Clinical Interface Evaluation and Directions for Future Validation

References
- Malhotra, J.; Malvezzi, M.; Negri, E.; La Vecchia, C.; Boffetta, P. Risk factors for lung cancer worldwide. Eur. Respir. J. 2016, 48, 889–902.
- Rivera Medical Center Inc. Top 5 leading causes of death in the Philippines, 2025. Available online: https://www.rmci.com.ph/top-5-leading-causes-of-death-in-the-philippines-2025/ (accessed on 14 March 2026).
- Çalli, E.; Sogancioglu, E.; van Ginneken, B.; van Leeuwen, K.G.; Murphy, K. Deep learning for chest X-ray analysis: A survey. Med. Image Anal. 2021, 72, 102125.
- Miki, S.; Nomura, Y.; Hayashi, N.; Hanaoka, S.; Maeda, E.; Yoshikawa, T.; Masutani, Y.; Abe, O. Prospective Study of Spatial Distribution of Missed Lung Nodules by Readers in CT Lung Screening Using Computer-assisted Detection. Acad. Radiol. 2021, 28, 647–654.
- Digumarthy, S.R.; Gullo, R.L.; Levesque, M.H.; Sayegh, K.; Rao, S.; Raymond, S.B.; Otrakji, A.; Kalra, M.K. Cause determination of missed lung nodules and impact of reader training and education: Simulation study with nodule insertion software. J. Cancer Res. Ther. 2020, 16, 780–787.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Bush, I. Lung nodule detection and classification. Technical Report, Stanford University, Stanford, CA, USA, 2016.
- Borji, A. Addressing a fundamental limitation in deep vision models: Lack of spatial attention. arXiv 2024, arXiv:2407.01782.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Raghu, M.; Unterthiner, T.; Kornblith, S.; Zhang, C.; Dosovitskiy, A. Do Vision Transformers see like convolutional neural networks? Adv. Neural Inf. Process. Syst. 2021, 34, 12116–12128.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Dai, Y.; Gao, Y.; Liu, F. TransMed: Transformers Advance Multi-Modal Medical Image Classification. Diagnostics 2021, 11, 1384.
- Raghu, M.; Zhang, C.; Kleinberg, J.; Bengio, S. Transfusion: Understanding transfer learning for medical imaging. arXiv 2019, arXiv:1902.07208.
- Luo, D.; Yang, I.; Bae, J.; Woo, Y. Research on Performance Metrics and Augmentation Methods in Lung Nodule Classification. Appl. Sci. 2024, 14, 5726.
- Fu, X.; Lin, R.; Du, W.; Tavares, A.; Liang, Y. Explainable hybrid transformer for multi-classification of lung disease using chest X-rays. Sci. Rep. 2025, 15, 6650.
- Yoo, H.; Kim, K.H.; Singh, R.; Digumarthy, S.R.; Kalra, M.K. Validation of a deep learning algorithm for the detection of malignant pulmonary nodules in chest radiographs. JAMA Netw. Open 2020, 3, e2017135.
- Schultheiss, M.; Schmette, P.; Bodden, J.; Aichele, J.; Müller-Leisse, C.; Gassert, F.G.; Gassert, F.T.; Gawlitza, J.F.; Hofmann, F.C.; Sasse, D.; et al. Lung nodule detection in chest X-rays using synthetic ground-truth data comparing CNN-based diagnosis to human performance. Sci. Rep. 2021, 11, 15857.
- Chiu, H.-Y.; Peng, R.H.-T.; Lin, Y.-C.; Wang, T.-W.; Yang, Y.-X.; Chen, Y.-Y.; Wu, M.-H.; Shiao, T.-H.; Chao, H.-S.; Chen, Y.-M.; Wu, Y.-T. Artificial intelligence for early detection of chest nodules in X-ray images. Biomedicines 2022, 10, 2839.
- Shimazaki, A.; Ueda, D.; Choppin, A.; Yamamoto, A.; Honjo, T.; Shimahara, Y.; Miki, Y. Deep learning-based algorithm for lung cancer detection on chest radiographs using the segmentation method. Sci. Rep. 2022, 12, 727.
- Behrendt, F.; Bengs, M.; Bhattacharya, D.; Krüger, J.; Opfer, R.; Schlaefer, A. A systematic approach to deep learning-based nodule detection in chest radiographs. Sci. Rep. 2023, 13, 10120.



| Metric | Baseline ResNet-50 | RNNet-MST | Improvement (percentage points) |
| Accuracy | 94.65% | 95.16% | +0.51 |
| Nodule Recall | 86.18% | 93.09% | +6.91 |
| Nodule F1-Score | 89.90% | 91.40% | +1.50 |
| Metric | Class | ResNet-50 | RNNet-MST |
| Precision | No Nodule | 0.95 | 0.97 |
| | Nodule | 0.94 | 0.90 |
| | Macro Average | 0.94 | 0.94 |
| | Weighted Average | 0.95 | 0.95 |
| Recall (Sensitivity) | No Nodule | 0.98 | 0.96 |
| | Nodule | 0.86 | 0.93 |
| | Macro Average | 0.92 | 0.95 |
| | Weighted Average | 0.95 | 0.95 |
| F1-Score | No Nodule | 0.96 | 0.97 |
| | Nodule | 0.90 | 0.91 |
| | Macro Average | 0.93 | 0.94 |
| | Weighted Average | 0.95 | 0.95 |
| Model | Correctly Detected | False Negatives | Detection Rate |
| Baseline ResNet-50 | 138 / 171 | 33 | 80.7% |
| RNNet-MST | 159 / 171 | 12 | 93.0% |
| Model | Correctly Detected | False Negatives | Detection Rate |
| Baseline ResNet-50 | 176 / 217 | 41 | 81.1% |
| RNNet-MST | 195 / 217 | 22 | 89.9% |
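The detection rates in the two tables above are recall values over each subset's nodule cases, so they can be recomputed from the counts. The short check below does this; the labels `table_1` and `table_2` are placeholders for the two tables, and the relative reduction in false negatives is derived here from the tabulated counts rather than taken from the source.

```python
from typing import Tuple

# Counts copied from the two detection tables: (correctly detected, total nodule cases).
TABLES = {
    "table_1": {"Baseline ResNet-50": (138, 171), "RNNet-MST": (159, 171)},
    "table_2": {"Baseline ResNet-50": (176, 217), "RNNet-MST": (195, 217)},
}


def detection_summary(detected: int, total: int) -> Tuple[int, float]:
    """Return (false negatives, detection rate in percent) for one model on one subset."""
    false_negatives = total - detected
    return false_negatives, 100.0 * detected / total


for name, rows in TABLES.items():
    fn_base, rate_base = detection_summary(*rows["Baseline ResNet-50"])
    fn_new, rate_new = detection_summary(*rows["RNNet-MST"])
    # Relative reduction in missed nodules when moving from the baseline to RNNet-MST.
    fn_reduction = 100.0 * (fn_base - fn_new) / fn_base
    print(f"{name}: baseline {rate_base:.1f}% (FN={fn_base}), "
          f"RNNet-MST {rate_new:.1f}% (FN={fn_new}), FN reduced by {fn_reduction:.1f}%")
```

Run as-is, it reproduces the tabulated rates (80.7%, 93.0%, 81.1% and 89.9%) and false-negative counts, and shows false negatives falling from 33 to 12 (a 63.6% relative reduction) and from 41 to 22 (46.3%).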
| Study | Year | Method/Model | Dataset | Sensitivity/Recall | Key Notes |
| Yoo et al. [16] | 2020 | Deep learning algorithm (commercial CAD) | NLST | 74.0% | 5,485 participants; specificity 73%; AUC 0.86 |
| Schultheiss et al. [17] | 2021 | RetinaNet / U-Net CNN | Synthetic (from LIDC-IDRI CT) | wAFROC: 0.81 | 201 synthetic radiographs; p = 0.49 vs. radiologists |
| Chiu et al. [18] | 2022 | YOLOv4 + U-Net lung segmentation | TVGH + JSRT | 79.0% | 3.04 FP/image; 254 CXRs tested |
| Shimazaki et al. [19] | 2022 | CNN segmentation-based DL model | In-house (Osaka City Univ.) | 73.0% | 0.13 mFPI; lower sensitivity in blind spots (50–64%) |
| Behrendt et al. [20] | 2023 | Ensemble (Faster R-CNN, RetinaNet, EfficientDet-D2, YOLOv5) | NODE21 | FROC25%: ~0.84 | NODE21 competition winner; AUROC + FROC metric |
| Present study | 2026 | RNNet-MST | NODE21 | 93.09% | Nodule recall exceeds the sensitivities reported above (73.0–79.0%); datasets and evaluation metrics differ across studies |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).