Submitted:
01 November 2025
Posted:
03 November 2025
Abstract

Keywords:
1. Introduction
2. Materials and Methods
3. Results
3.1. CNN + Transformers
3.2. Transformers
3.3. Multihybrid
3.4. CNN + RNN Architectures
3.5. Others
| Year | Ref. | Radiological Domain | Datasets (DS) Used | Deep Learning Method | Architecture | Best BLEU-1 | Best Outcome DS |
|---|---|---|---|---|---|---|---|
| 2020 | [87] | Ultrasound (US) (gallbladder, kidney, liver), Chest X-ray | US image dataset (6,563 images). IU-Xray (7,470 images, 3,955 reports) | CNN, RNN | SFNet (Semantic fusion network). ResNet-50. Faster R-CNN. Diagnostic report generation module using LSTM | 0.65 | US DATASET |
| 2021 | [112] | Chest CT | COVID-19 CT dataset (368 reports, 1,104 CT images). CX-CHR dataset (45,598 images, 28,299 reports), 12 million external medical textbooks | Transformer, CNN | Medical-VLBERT (with DenseNet-121 as backbone) | 0.70 | CX-CHR |
| 2021 | [49] | Chest X-ray | IU-Xray | Transformer | CDGPT2 (Conditioned Distil Generative Pre-trained Transformer) | 0.387 | IU-Xray |
| 2021 | [113] | Chest X-ray | MIMIC-CXR (377,110 images, 227,835 reports). IU-Xray | CNN, Transformer | BERT-base | *0.126 (BLEU-4) | MIMIC-CXR |
| 2021 | [92] | Liver CT | Liver CT annotation dataset from ImageCLEF 2015 (50 patients) | CNN | MobileNet-V2 | 0.65 | ZGT HOSPITAL RX |
| 2021 | [80] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, GAN | Encoder with two branches (CNN based on ResNet-152 and MLC). Hierarchical LSTM decoder with multi-level attention and a reward module with two discriminators. | *0.148 (BLEU-4) | MIMIC-CXR |
| 2022 | [91] | Proximal Femur Fractures (X-ray) | Main dataset: 4,915 cases with 11,606 images and reports. Language model dataset: 28,329 radiological reports | CNN, RNN | DenseNet-169 for classification. Encoder-Decoder for report generation. GloVe for language modeling | 0.65 | MAIN DATASET |
| 2022 | [94] | Chest X-ray | MIMIC-CXR. IU-Xray | GNN, Transformer | Custom framework using Transformer for generation module | 0.505 | IU-Xray |
| 2022 | [70] | Chest X-ray | IU-Xray | CNN, RNN, Transformer | CNN VGG19 network (feature extraction). BERT (language generation). DistilBERT (perform sentiment) | 0.772 | IU-Xray |
| 2022 | [114] | Liver and kidney CT, DBT (Digital Breast Tomosynthesis) | ImageNet (25,000 images). CT abdomen and mammography images (750 images). CT and DBT medical images (150 images) | RNN, CNN | MLTL-LSTM model (Multi-level transfer learning framework with a long short-term memory model) | 0.769 | CT-DBT |
| 2022 | [88] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN | HReMRG-MR (Hybrid reinforced report generation method with m-linear attention and repetition penalty mechanism) | 0.4806 | MIMIC-CXR |
| 2022 | [15] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | CvT2DistilGPT2 | 0.4732 | IU-Xray |
| 2022 | [64] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | DenseNet (encoder). LSTM or Transformer (decoder). ATAG (Attributed abnormality graph) embeddings. GATE (gating mechanism) | **0.323 (BLEU AVG) | IU-Xray |
| 2022 | [84] | Chest X-ray | IU-Xray | CNN, RNN | AMLMA (Adaptive multilevel multi-attention) | 0.471 | IU-Xray |
| 2022 | [22] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | MATNet (Multimodal adaptive transformer) | 0.518 | IU-Xray |
| 2022 | [78] | Chest X-ray | IU-Xray | CNN, RNN, Attention Mechanism | RCLN model (combining CNN, LSTM, and multihead attention mechanism). Pre-trained ResNet-152 (image encoder) | 0.4341 | IU-Xray |
| 2022 | [73] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | VTI (Variational topic inference) with LSTM-based and Transformer-based decoders | 0.503 | IU-Xray |
| 2022 | [14] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | CTN built on Transformer architecture | 0.491 | IU-Xray |
| 2022 | [81] | Chest X-ray | IU-Xray | CNN, RNN, Reinforcement Learning | CADxReport (VGG19, HLSTM with co-attention mechanism and reinforcement learning) | 0.577 | IU-Xray |
| 2022 | [27] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | CAMANet (Class activation map guided attention network) | 0.504 | IU-Xray |
| 2022 | [85] | Chest X-ray | IU-Xray | CNN, RNN, Attention Mechanism | CheXPrune (encoder-decoder architecture with VGG19 and hierarchical LSTM) | 0.5428 | IU-Xray |
| 2022 | [23] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer, VAE | Prior Guided Transformer. ResNet101 (visual feature extractor). Vanilla Transformer (baseline) | 0.482 | IU-Xray |
| 2022 | [44] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer | Pure Transformer-based Framework (custom architecture) | 0.496 | IU-Xray |
| 2022 | [79] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN | VGG16 (visual geometry group CNN network). LSTM with attention | 0.580 | IU-Xray |
| 2022 | [41] | Chest X-ray | MIMIC-CXR. CheXpert | Transformer | Meshed-memory augmented transformer architecture with visual extractor using ImageNet and CheXpert pre-trained weights | 0.348 | MIMIC-CXR |
| 2023 | [54] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer | MFOT (Multi-feature optimization transformer) | 0.517 | IU-Xray |
| 2023 | [42] | Chest X-ray | IU-Xray | Transformer | TrMRG (Transformer Medical Report Generator) using ViT as encoder, MiniLM as decoder | 0.5551 | IU-Xray |
| 2023 | [116] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer, CNN, RNN | ASGMD (Auxiliary signal guidance and memory-driven) network. ResNet-101 and ResNet-152 as visual feature extractors | 0.489 | IU-Xray |
| 2023 | [36] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | Visual Prior-based Cross-modal Alignment Network | 0.489 | IU-Xray |
| 2023 | [55] | Chest X-ray, CT COVID-19 | MIMIC-CXR. IU-Xray. COV-CTR (728 images) | Transformer | ICT (Information calibrated transformer) | 0.768 | COV-CTR |
| 2023 | [16] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer, Self-Supervised Learning | S3-Net (Self-supervised dual-stream network) | 0.499 | IU-Xray |
| 2023 | [83] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN | TriNet (custom architecture) | 0.478 | IU-Xray |
| 2023 | [89] | Chest X-ray | IU-Xray. Chexpert (224,316 images) | CNN, RNN | ResNet50. CVAM+MVSL (Cross-view attention module and Medical visual-semantic LSTMs) | 0.460 | IU-Xray |
| 2023 | [86] | Chest X-ray | IU-Xray | CNN, RNN | Encoder-Decoder framework with UM-VES and UM-TES subnetworks and LSTM decoder | 0.5881 | IU-Xray |
| 2023 | [57] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer | ResNet101 (visual extractor). 3-layer Transformer structure (encoder-decoder framework). BLIP architecture | 0.513 | IU-Xray |
| 2023 | [56] | Chest X-ray | IU-Xray | Transformer, Contrastive Learning | MKCL (Medical knowledge with contrastive learning). ResNet-101. Transformer | 0.490 | IU-Xray |
| 2023 | [62] | Chest X-ray, Dermoscopy | IU-Xray, NCRC-DS (81 entities, 536 triples) | CNN, RNN, Transformer | DenseNet-121. ResNet-101. Memory-driven Transformer | 0.494 | IU-Xray |
| 2023 | [115] | US (gallbladder, fetal heart), Chest X-ray | US dataset (6,563 images and reports). Fetal Heart (FH) dataset (3,300 images and reports). MIMIC-CXR. IU-Xray | CNN, RNN | AERMNet (Attention-Enhanced Relational Memory Network) | 0.890 | US DATASET |
| 2023 | [59] | Chest X-ray | NIH Chest X-ray (112,120 images). MIMIC-CXR. IU-Xray | Transformer | ViT. GNN. Vector Retrieval Library. Multi-label contrastive learning. Multi-task learning | 0.478 | IU-Xray |
| 2024 | [17] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | Swin-Transformer | 0.499 | IU-Xray |
| 2024 | [51] | Chest X-ray | IU-Xray | Transformer | ViGPT2 model | 0.571 | IU-Xray |
| 2024 | [28] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | FMVP (Flexible multi-view paradigm) | 0.499 | IU-Xray |
| 2024 | [43] | Chest X-ray | Proposed Dataset (21,970 images). IU-Xray. NIH Chest X-ray | Transformer | XRaySwinGen (Swin Transformer as image encoder, GPT-2 as textual decoder) | 0.731 | PT BR |
| 2024 | [33] | Chest X-ray | IU-Xray | CNN, Transformer | FDT-Dr2T (custom framework) | 0.531 | IU-Xray |
| 2024 | [118] | Chest X-ray | IU-Xray. XRG-COVID-19 (8,676 scans, 8,676 reports) | CNN, Transformer | DSA-Transformer with ResNet-101 as the backbone | 0.552 | XRG-COVID-19 |
| 2024 | [32] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | DenseNet-121. Transformer encoder. GPT-4 | 0.491 | IU-Xray |
| 2024 | [58] | Oral Panoramic X-ray | Oral panoramic X-ray image-report dataset (562 sets of images and reports). MIMIC-CXR | Transformer | MLAT (Multi-Level objective Alignment Transformer) | 0.5011 | PAN XRAY |
| 2024 | [66] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | MRARGN (Multifocal Region-Assisted Report Generation Network) | 0.502 | IU-Xray |
| 2024 | [21] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | Memory-driven Transformer (based on standard Transformer architecture with relational memory added to the decoder) | 0.508 | IU-Xray |
| 2024 | [68] | Chest X-ray | IU-Xray | CNN, RNN, Transformer | VGG19 (CNN) pre-trained on the ImageNet dataset. GloVe, fastText, ELMo, and BERT (extract textual features from the ground-truth reports). Hierarchical LSTM (generate reports) | 0.612 | IU-Xray |
| 2024 | [93] | Chest X-ray | MIMIC-CXR. IU-Xray | RNN, Transformer, MLP | Transformer (encoder). MIX-MLP multi-label classification network. CAM (Co-attention mechanism) based on POS-SCAN. Hierarchical LSTM (decoder) | 0.521 | IU-Xray |
| 2024 | [45] | Chest X-ray | MIMIC-CXR | Transformer | CheXReport (fully Transformer-based, built on Swin-B) | 0.354 | MIMIC-CXR |
| 2024 | [25] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | RAMT (Relation-Aware Mean Teacher). GHFE (Graph-guided hybrid feature encoding) module. DenseNet121 (visual feature extractor). Standard Transformer (decoder) | 0.482 | IU-Xray |
| 2024 | [31] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | ResNet-101. Transformer (multilayer encoder and decoder) | 0.514 | IU-Xray |
| 2024 | [40] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer | Team Role Interaction Network (TRINet) | 0.445 | MIMIC-CXR |
| 2024 | [26] | Polydactyly X-ray | Custom dataset (16,710 images and reports) | CNN, Transformer | Inception-V3 CNN. Transformer Architecture | 0.516 | PD XRAY BR |
| 2024 | [46] | Chest X-ray | MIMIC-CXR | Transformer | ViT. GPT-2 (with custom positional encoding and beam search) | *0.095 (BLEU-4) | MIMIC-CXR |
| 2024 | [90] | Chest X-ray, Chest CT (COVID-19) | MIMIC-CXR. IU-Xray. COV-CTR | CNN, RNN | HDGAN (Hybrid Discriminator Generative Adversarial Network) | 0.765 | COV-CTR |
| 2024 | [117] | Chest X-ray | IU-Xray. Custom dataset (1,250 images and reports) | CNN, Transformer | CNX-B2 (CNN encoder, BioBERT transformer) | 0.479 | IU-Xray |
| 2024 | [37] | Chest X-ray | NIH ChestX-ray. IU-Xray | CNN, Transformer | CSAMDT (Conditional self-attention memory-driven transformer) | 0.504 | IU-Xray |
| 2024 | [53] | Ultrasound (gallbladder, kidney, liver), Chest X-ray | MIMIC-CXR. IU-Xray. LGK US (6,563 images and reports). | Transformer | CGFTrans (Cross-modal global feature fusion transformer) | 0.684 | US DATASET |
| 2024 | [52] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer | TSGET (Two-stage global enhanced transformer) | 0.500 | IU-Xray |
| 2024 | [35] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | VCIN (Visual-textual cross-modal interaction network). ACIE (Abundant clinical information embedding). Bert-based Decoder-only Generator | 0.508 | IU-Xray |
| 2024 | [76] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | Memory-driven Transformer | 0.539 | IU-Xray |
| 2024 | [30] | Chest X-ray | MIMIC-CXR. Chest ImaGenome (237,853 images). Brown-COVID (1021 images). Penn-COVID (2879 images) | CNN, Transformer | MRANet (Multi-modality regional alignment network) | 0.504 | BROWN-COVID |
| 2024 | [34] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | ResNet-101. Multilayer Transformer (encoder and decoder) | 0.472 | IU-Xray |
| 2024 | [65] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | MeFD-Net (proposed multi expert diagnostic module). ResNet101 (visual encoder). Transformer (text generation module) | 0.505 | IU-Xray |
| 2024 | [19] | Chest X-ray | MIMIC-CXR. Chest ImaGenome (242,072 scene graphs) | CNN, Transformer | Faster R-CNN (object detection). GPT-2 Medium (report generation) | 0.391 | MIMIC-CXR |
| 2025 | [77] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | Denoising multi-level cross-attention. Contrastive learning model (with ViT-B/16 as visual extractor, BERT as text encoder) | 0.507 | IU-Xray |
| 2025 | [75] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | KCAP (Knowledge-guided cross-modal alignment and progressive fusion) | 0.517 | IU-Xray |
| 2025 | [71] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer, ViT | ATL-CA (Adaptive topic learning and fine-grained crossmodal alignment) | 0.487 | IU-Xray |
| 2025 | [63] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | ADCNet (Anomaly-driven cross-modal contrastive network). ResNet-101 and Transformer encoder-decoder architecture | 0.493 | IU-Xray |
| 2025 | [50] | Chest X-ray | IU-Xray | Transformer | ChestX-Transcribe (combines Swin Transformer and DistilGPT) | 0.675 | IU-Xray |
| 2025 | [18] | Brain CT and MRI scans | RSNA-IHDC dataset (674,258 brain CT images, 19,530 patients) | CNN, Transformer | AC-BiFPN (Augmented convolutional bi-directional feature pyramid network). Transformer model | 0.382 | RSNA IHDC CT |
| 2025 | [61] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | DCTMN (Dual-channel transmodal memory network) | 0.506 | IU-Xray |
| 2025 | [74] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer, Graph reasoning network (GRN), Cross-modal Gated Fusion Network (CGFN) | ResNet101 (visual feature extraction). GRN. CGFN (Cross-modal gated fusion network). Transformer (decoder) | 0.514 | IU-Xray |
| 2025 | [47] | Spine CT | VerSe20 (300 MDCT spine images) | Transformer | ViT-Base. BioBERT-Base. MiniLM | 0.7291 | VerSe20 |
| 2025 | [48] | Chest X-ray | IU-Xray | Transformer | ResNet-101 with CBAM (convolutional block attention module). Cross-attention mechanism | 0.456 | IU-Xray |
| 2025 | [39] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer | MMG (Multi-modal granularity feature fusion) | 0.497 | IU-Xray |
| 2025 | [24] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | RCAN (Recalibrated cross-modal alignment network) | 0.521 | IU-Xray |
| 2025 | [82] | Chest X-ray | IU-Xray | CNN, RNN | G-CNX (hybrid encoder–decoder architecture). ConvNeXtBase (encoder side). GRU-based RNN (decoder side) | 0.6544 | IU-Xray |
| 2025 | [67] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, RNN, Transformer | DPN (Dynamics priori networks) with components including DGN (Dynamic graph networks), Contrastive learning, PrKN (Prior knowledge networks). ResNet-152 (image feature extraction). SciBERT (report embedding) | 0.409 | IU-Xray |
| 2025 | [29] | Chest X-ray, Bladder Pathology | MIMIC-CXR. IU-Xray. Bladder pathology dataset (4,253 images) | Transformer, CNN | AHP (Adapter-enhanced hierarchical cross-modal pre-training) | 0.502 | IU-Xray |
| 2025 | [119] | Chest X-ray | COVIDx-CXR-2 (29,986 images). COVID-CXR (more than 900 images). BIMCV-COVID-19 (more than 10,000 images). COV-CTR. MIMIC-CXR. NIH ChestX-ray | CNN, Transformer | ResNet-50 (image encoder). BERT (text encoder). Transformer-based model (with variants using LLAMA-2-7B and Transformer-BASE, decoder) | *0.63 (BLEU-4) | COVID-19 DATASETS |
| 2025 | [20] | Chest X-ray | MIMIC-CXR. IU-Xray | CNN, Transformer | CECL (Clustering enhanced contrastive learning) | 0.485 | IU-Xray |
| 2025 | [69] | Chest X-ray | MIMIC-CXR. IU-Xray | Diffusion Models, RNN, CNN, Transformer | Diffusion Model-based architecture. ResNet34. Transformer structure using cross-attention | 0.422 | IU-Xray |
| 2025 | [60] | Chest X-ray | MIMIC-CXR. IU-Xray | Transformer | STREAM (Spatio-temporal and retrieval-augmented modelling). SwinTransformer (Swin-Base) (encoder). TinyLlama-1.1B (decoder). | 0.506 | IU-Xray |
| 2025 | [72] | Chest X-ray | MIMIC-CXR. ROCO (over 81,000 images) | CNN, RNN, Transformer | CAT (Cross-modal augmented transformer) | 0.491 | IU-Xray |
| 2025 | [38] | Chest X-ray | IU-Xray. COV-CTR | Transformer | MedVAG (Medical vision attention generation) | 0.808 | COV-CTR |
4. Discussion
5. Conclusions
- The field remains heavily concentrated on chest radiography, with more than 87% of studies based on CXR datasets. This reflects public data availability and the acceleration of thoracic imaging research during the COVID-19 pandemic, but also exposes a lack of anatomical diversity that limits generalizability to other diagnostic domains encountered in routine radiological practice.
- Hybrid architectures, particularly CNN–Transformer combinations, represent the dominant methodological trend (73% of included studies). By pairing CNNs for localized visual encoding with Transformer modules for contextual reasoning, these models generate reports with greater coherence and more complete coverage of abnormalities, reducing variability and supporting more consistent documentation (a minimal architectural sketch follows this list).
- The increased use of memory modules, medical knowledge graphs, and cross-modal alignment mechanisms demonstrates a clear shift toward clinically informed modeling. These strategies improve factual grounding by embedding structured domain knowledge into the generation process and aligning outputs more closely with expert reasoning.
- However, current evaluation frameworks remain poorly aligned with clinical decision-making. Metrics such as BLEU and ROUGE capture surface-level n-gram similarity but do not reflect diagnostic adequacy or utility for patient management (the worked example after this list illustrates the failure mode), underscoring the need for evaluation standards that measure whether generated reports truly support radiological interpretation and workflow reliability.
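To make the dominant hybrid pattern concrete, the sketch below pairs a ResNet-101 feature grid with a Transformer decoder that cross-attends over image patches while generating report tokens. This is a simplified, generic composite of the encoder–decoder design recurring across the reviewed studies (e.g., [21,31,34]), not the implementation of any single one; the class name, layer sizes, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101


class CnnTransformerReportGenerator(nn.Module):
    """Illustrative CNN encoder + Transformer decoder for report generation."""

    def __init__(self, vocab_size: int, d_model: int = 512, max_len: int = 128):
        super().__init__()
        backbone = resnet101(weights=None)  # load pretrained weights in practice
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # keep the spatial grid
        self.proj = nn.Linear(2048, d_model)             # ResNet-101 features are 2048-dim
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned positional encoding
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W); tokens: (B, T) ids of the report generated so far
        feats = self.cnn(images)                               # (B, 2048, h, w) grid
        memory = self.proj(feats.flatten(2).transpose(1, 2))   # (B, h*w, d_model) patches
        pos = torch.arange(tokens.size(1), device=tokens.device)
        tgt = self.token_embed(tokens) + self.pos_embed(pos)   # (B, T, d_model)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=causal)       # cross-attend over patches
        return self.lm_head(out)                               # (B, T, vocab) logits


model = CnnTransformerReportGenerator(vocab_size=5000)
logits = model(torch.randn(1, 3, 224, 224), torch.zeros(1, 10, dtype=torch.long))
print(logits.shape)  # torch.Size([1, 10, 5000])
```

In most reviewed systems this skeleton is augmented with relational memory, knowledge graphs, or contrastive alignment objectives; the cross-attention step is where those clinically informed signals are typically injected.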
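The surface-similarity limitation of BLEU is easy to demonstrate. Below is a minimal, self-contained BLEU-1 sketch (clipped unigram precision scaled by a brevity penalty, following Papineni et al.); the report phrasings are invented examples, not drawn from any reviewed dataset.

```python
import math
from collections import Counter


def bleu1(reference: str, candidate: str) -> float:
    """BLEU-1: clipped unigram precision times a brevity penalty."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Credit each candidate unigram at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return bp * precision


ref = "there is no pneumothorax"
print(f"{bleu1(ref, 'there is no pneumothorax'):.3f}")  # 1.000 - exact match
print(f"{bleu1(ref, 'no pneumothorax there is'):.3f}")  # 1.000 - word order ignored
print(f"{bleu1(ref, 'there is pneumothorax'):.3f}")     # 0.717 - opposite finding, high score
```

A candidate that drops a single negation keeps over 70% of the maximum score while reversing the clinical conclusion, which is why n-gram metrics alone cannot certify diagnostic adequacy.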
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ARRG | Automatic Radiology Report Generation |
| DL | Deep learning |
| PRISMA | Preferred reporting items for systematic reviews and meta-analyses |
| COVID-19 | Coronavirus disease 2019 |
| CNNs | Convolutional neural networks |
| RNNs | Recurrent neural networks |
| LSTM | Long short-term memory |
| CLIP | Contrastive language-image pretraining |
| LLMs | Large language models |
| MIMIC | Medical information mart for intensive care database |
| MIMIC-CXR | MIMIC-Chest X-ray |
| IU-Xray | Indiana university chest X-ray collection |
| BLEU | Bilingual evaluation understudy |
| ROUGE | Recall-oriented understudy for gisting evaluation |
| BERT | Bidirectional encoder representations from transformers |
| AUC | Area under the curve |
| IEEE | Institute of electrical and electronics engineers |
| ACM | Association for computing machinery |
| WoS | Web of science |
| GNNs | Graph neural networks |
| GAT | Graph attention network |
| TRIPOD | Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis |
| CXR | Chest X-ray |
| CT | Computed tomography |
| MRI | Magnetic resonance imaging |
| CTN | Contrastive triplet network |
| SOTA | State of the art |
| METEOR | Metric for evaluation of translation with explicit ordering |
| CIDEr | Consensus-based image description evaluation |
| ViT | Vision transformers |
| GPT | Generative pre-trained transformers |
| NLG | Natural language generation |
| GRU | Gated recurrent unit |
| SVMs | Support vector machines |
| KG | Knowledge graph |
| DS | Dataset |
| US | Ultrasound |
| DBT | Digital breast tomosynthesis |
| SFNet | Semantic fusion network |
| CDGPT | Conditioned distil generative pre-trained transformer |
| MLTL | Multi-level transfer learning |
| HReMRG | Hybrid reinforced medical report generation method |
| ATAG | Attributed abnormality graph |
| AMLMA | Adaptive multilevel multi-attention |
| VTI | Variational topic inference |
| CAMANet | Class activation map guided attention network |
| MFOT | Multi-feature optimization transformer |
| TrMRG | Transformer medical report generator |
| ASGMD | Auxiliary signal guidance and memory-driven |
| ICT | Information calibrated transformer |
| CVAM | Cross-view attention module |
| MVSL | Medical visual-semantic LSTMs |
| MKCL | Medical knowledge with contrastive learning |
| AERMNet | Attention-Enhanced Relational Memory Network |
| FMVP | Flexible multi-view paradigm |
| RAMT | Relation-aware mean teacher |
| GHFE | Graph-guided hybrid feature encoding |
| CSAMDT | Conditional self-attention memory-driven transformer |
| CGFTrans | Cross-modal global feature fusion transformer |
| TSGET | Two-stage global enhanced transformer |
| VCIN | Visual-textual cross-modal interaction network |
| ACIE | Abundant clinical information embedding |
| MRANet | Multi-modality regional alignment network |
| KCAP | Knowledge-guided cross-modal alignment and progressive fusion |
| ATL-CA | Adaptive topic learning and fine-grained crossmodal alignment |
| ADCNet | Anomaly-driven cross-modal contrastive network |
| AC-BiFPN | Augmented convolutional bi-directional feature pyramid network |
| DCTMN | Dual-channel transmodal memory network |
| GRN | Graph reasoning network |
| CGFN | Cross-modal gated fusion network |
| CBAM | Convolutional block attention module |
| MMG | Multi-modal granularity feature fusion |
| RCAN | Recalibrated cross-modal alignment network |
| DPN | Dynamics priori networks |
| DGN | Dynamic graph networks |
| PrKN | Prior knowledge networks |
| AHP | Adapter-enhanced hierarchical cross-modal pre-training |
| CECL | Clustering enhanced contrastive learning |
| STREAM | Spatio-temporal and retrieval-augmented modelling |
| CAT | Cross-modal augmented transformer |
| MedVAG | Medical vision attention generation |
| RGRG | Region-guided report generation |
| UAR | Unify, align and refine |
References
- Ramirez-Alonso, G.; Prieto-Ordaz, O.; López-Santillan, R.; Montes-Y-Gómez, M. Medical report generation through radiology images: An overview. IEEE Lat. Am. Trans. 2022, 20(6), 986–999. [Google Scholar] [CrossRef]
- Kaur, N.; Mittal, A.; Singh, G. Methods for automatic generation of radiological reports of chest radiographs: A comprehensive survey. Multimed. Tools Appl. 2022, 81(10), 13409–13439. [Google Scholar] [CrossRef]
- Pang, T.; Li, P.; Zhao, L. A survey on automatic generation of medical imaging reports based on deep learning. Biomed. Eng. Online 2023, 22(1), 48. [Google Scholar] [CrossRef] [PubMed]
- Azad, R.; Kazerouni, A.; Heidari, M.; Khodapanah Aghdam, E.; Molaei, A.; Jia, Y.; et al. Advances in medical image analysis with vision transformers: A comprehensive review. Med. Image Anal. 2024, 91, 103000. [Google Scholar] [CrossRef]
- Sun, Z.; Lin, M.; Zhu, Q.; Xie, Q.; Wang, F.; Lu, Z.; et al. A scoping review on multimodal deep learning in biomedical images and texts. J. Biomed. Inform. 2023, 146, 104482. [Google Scholar] [CrossRef]
- Sloan, P.; Clatworthy, P.L.; Simpson, E.; Mirmehdi, M. Automated radiology report generation: A review of recent advances. IEEE Rev. Biomed. Eng. 2025, 18, 368–387. [Google Scholar] [CrossRef] [PubMed]
- Guo, L.; Tahir, A.M.; Zhang, D.; Wang, Z.J.; Ward, R.K. Automatic medical report generation: Methods and applications. APSIPA Trans. Signal Inf. Process. 2024, 13(1). [Google Scholar] [CrossRef]
- Shen, Y.; Xu, Y.; Ma, J.; Rui, W.; Zhao, C.; Heacock, L.; et al. Multi-modal large language models in radiology: Principles, applications, and potential. Abdom. Radiol. 2024. [Google Scholar] [CrossRef] [PubMed]
- Nakaura, T.; Ito, R.; Ueda, D.; Nozaki, T.; Fushimi, Y.; Matsui, Y.; et al. The impact of large language models on radiology: A guide for radiologists on the latest innovations in AI. Jpn. J. Radiol. 2024, 42(7), 685–696. [Google Scholar] [CrossRef]
- Nerella, S.; Bandyopadhyay, S.; Zhang, J.; Contreras, M.; Siegel, S.; Bumin, A.; et al. Transformers and large language models in healthcare: A review. Artif. Intell. Med. 2024, 154, 102900. [Google Scholar] [CrossRef]
- Ouis, M.Y.; Akhloufi, M.A. Deep learning for report generation on chest X-ray images. Comput. Med. Imaging Graph. 2024, 111, 102320. [Google Scholar] [CrossRef] [PubMed]
- Gallifant, J.; Afshar, M.; Ameen, S.; Aphinyanaphongs, Y.; Chen, S.; Cacciamani, G.; et al. The TRIPOD-LLM reporting guideline for studies using large language models. Nat. Med. 2025, 31, 60–69. [Google Scholar] [CrossRef] [PubMed]
- Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; et al. PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. BMJ 2021, 372, n160. [Google Scholar] [CrossRef] [PubMed]
- Yang, Y.; Yu, J.; Jiang, H.; Han, W.; Zhang, J.; Jiang, W. A contrastive triplet network for automatic chest X-ray reporting. Neurocomputing 2022, 502, 71–83. [Google Scholar] [CrossRef]
- Nicolson, A.; Dowling, J.; Koopman, B. Improving chest X-ray report generation by leveraging warm starting. Artif. Intell. Med. 2023, 144, 102633. [Google Scholar] [CrossRef]
- Pan, R.; Ran, R.; Hu, W.; Zhang, W.; Qin, Q.; Cui, S. S3-Net: A self-supervised dual-stream network for radiology report generation. IEEE J. Biomed. Health Inform. 2024, 28(3), 1448–1459. [Google Scholar] [CrossRef]
- Pan, Y.; Liu, L.J.; Yang, X.B.; Peng, W.; Huang, Q.S. Chest radiology report generation based on cross-modal multi-scale feature fusion. J. Radiat. Res. Appl. Sci. 2024, 17(1), 100823. [Google Scholar] [CrossRef]
- Bouslimi, R.; Trabelsi, H.; Karaa, W.B.A.; Hedhli, H. AI-driven radiology report generation for traumatic brain injuries. J. Imaging Inform. Med. 2025. [Google Scholar] [CrossRef]
- Zhang, K.; Yang, Y.; Yu, J.; Fan, J.; Jiang, H.; Huang, Q.; et al. Attribute prototype-guided iterative scene graph for explainable radiology report generation. IEEE Trans. Med. Imaging 2024, 43(12), 4470–4482. [Google Scholar] [CrossRef]
- Liu, X.; Xin, J.; Shen, Q.; Li, C.; Huang, Z.; Wang, Z. End-to-end clustering enhanced contrastive learning for radiology reports generation. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9(2), 1780–1794. [Google Scholar] [CrossRef]
- Liu, X.; Xin, J.; Dai, B.; Shen, Q.; Huang, Z.; Wang, Z. Label correlated contrastive learning for medical report generation. Comput. Methods Programs Biomed. 2025, 258, 108482. [Google Scholar] [CrossRef]
- Shang, C.; Cui, S.; Li, T.; Wang, X.; Li, Y.; Jiang, J. MATNet: Exploiting multi-modal features for radiology report generation. IEEE Signal Process. Lett. 2022, 29, 2692–2696. [Google Scholar] [CrossRef]
- Yan, B.; Pei, M.; Zhao, M.; Shan, C.; Tian, Z. Prior guided transformer for accurate radiology reports generation. IEEE J. Biomed. Health Inform. 2022, 26(11), 5631–5640. [Google Scholar] [CrossRef]
- Hou, X.; Li, X.; Liu, Z.; Sang, S.; Lu, M.; Zhang, Y. Recalibrated cross-modal alignment network for radiology report generation with weakly supervised contrastive learning. Expert Syst. Appl. 2025, 269, 126394. [Google Scholar] [CrossRef]
- Zhang, K.; Jiang, H.; Zhang, J.; Fan, J.; Yu, J.; et al. Semi-supervised medical report generation via graph-guided hybrid feature consistency. IEEE Trans. Multimedia 2024, 26, 904–915. [Google Scholar] [CrossRef]
- Vieira, P.D.A.; Mathew, M.J.; Santos Neto, P.D.A.D.; Silva, R.R.V.E. The automated generation of medical reports from polydactyly X-ray images using CNNs and transformers. Appl. Sci. 2024, 14(15), 6566. [Google Scholar] [CrossRef]
- Wang, J.; Bhalerao, A.; Yin, T.; See, S.; He, Y. CAMANet: Class activation map guided attention network for radiology report generation. IEEE J. Biomed. Health Inform. 2024, 28(4), 2199–2210. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Zhu, Z.; Zheng, S.; Zhao, Y.; He, K.; Zhao, Y. From observation to concept: A flexible multi-view paradigm for medical report generation. IEEE Trans. Multimedia 2024, 26, 5987–5995. [Google Scholar] [CrossRef]
- Yu, T.; Lu, W.; Yang, Y.; Han, W.; Huang, Q.; Yu, J.; et al. Adapter-enhanced hierarchical cross-modal pre-training for lightweight medical report generation. IEEE J. Biomed. Health Inform. 2025, 1–15. [Google Scholar] [CrossRef]
- Zhong, Z.; Li, J.; Sollee, J.; Collins, S.; Bai, H.; Zhang, P.; et al. Multi-modality regional alignment network for COVID X-ray survival prediction and report generation. IEEE J. Biomed. Health Inform. 2024, 1–11. [Google Scholar] [CrossRef]
- Li, S.; Qiao, P.; Wang, L.; Ning, M.; Yuan, L.; Zheng, Y.; et al. An organ-aware diagnosis framework for radiology report generation. IEEE Trans. Med. Imaging 2024, 43(12), 4253–4265. [Google Scholar] [CrossRef]
- Li, H.; Wang, H.; Sun, X.; He, H.; Feng, J. Context-enhanced framework for medical image report generation using multimodal contexts. Knowl.-Based Syst. 2025, 310, 112913. [Google Scholar] [CrossRef]
- Sharma, D.; Dhiman, C.; Kumar, D. FDT–Dr2T: A unified dense radiology report generation transformer framework for X-ray images. Mach. Vis. Appl. 2024, 35(4), 68. [Google Scholar] [CrossRef]
- Liu, A.; Guo, Y.; Yong, J.H.; Xu, F. Multi-grained radiology report generation with sentence-level image-language contrastive learning. IEEE Trans. Med. Imaging 2024, 43(7), 2657–2669. [Google Scholar] [CrossRef]
- Zhang, W.; Cai, B.; Hu, J.; Qin, Q.; Xie, K. Visual-textual cross-modal interaction network for radiology report generation. IEEE Signal Process. Lett. 2024, 31, 984–988. [Google Scholar] [CrossRef]
- Zhang, S.; Zhou, C.; Chen, L.; Li, Z.; Gao, Y.; Chen, Y. Visual prior-based cross-modal alignment network for radiology report generation. Comput. Biol. Med. 2023, 166, 107522. [Google Scholar] [CrossRef]
- Shahzadi, I.; Madni, T.M.; Janjua, U.I.; Batool, G.; Naz, B.; Ali, M.Q.; et al. CSAMDT: Conditional self-attention memory-driven transformers for radiology report generation from chest X-ray. J. Imaging Inform. Med. 2024, 37(6), 2825–2837. [Google Scholar] [CrossRef] [PubMed]
- Varol Arısoy, M.; Arısoy, A.; Uysal, I. A vision attention driven language framework for medical report generation. Sci. Rep. 2025, 15(1), 10704. [Google Scholar] [CrossRef]
- Fang, J.; Xing, S.; Li, K.; Guo, Z.; Li, G.; Yu, C. Automated generation of chest X-ray imaging diagnostic reports by multimodal and multi-granularity features fusion. Biomed. Signal Process. Control 2025, 105, 107562. [Google Scholar] [CrossRef]
- Zheng, Z.; Zhang, Y.; Liang, E.; Weng, Z.; Chai, J.; Li, J. TRINet: Team role interaction network for automatic radiology report generation. Comput. Biol. Med. 2024, 183, 109275. [Google Scholar] [CrossRef] [PubMed]
- Vendrow, E.; Schonfeld, E. Understanding transfer learning for chest radiograph clinical report generation with modified transformer architectures. Heliyon 2023, 9(7), e17968. [Google Scholar] [CrossRef]
- Mohsan, M.M.; Akram, M.U.; Rasool, G.; Alghamdi, N.S.; Baqai, M.A.A.; Abbas, M. Vision transformer and language model-based radiology report generation. IEEE Access 2023, 11, 1814–1824. [Google Scholar] [CrossRef]
- Veras Magalhaes, G.; De S. Santos, R.L.; Vogado, L.H.S.; Cardoso De Paiva, A.; De Alcantara Dos Santos Neto, P. XRaySwinGen: Automatic medical reporting for X-ray exams with multimodal model. Heliyon 2024, 10(7), e27516. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Han, H.; Wang, L.; Li, X.; Zhou, L. Automated radiographic report generation purely on transformer: A multicriteria supervised approach. IEEE Trans. Med. Imaging 2022, 41(10), 2803–2813. [Google Scholar] [CrossRef] [PubMed]
- Zeiser, F.A.; Da Costa, C.A.; De Oliveira Ramos, G.; Maier, A.; Da Rosa Righi, R. CheXReport: A transformer-based architecture to generate chest X-ray reports suggestions. Expert Syst. Appl. 2024, 255, 124644. [Google Scholar] [CrossRef]
- Leonardi, G.; Portinale, L.; Santomauro, A. Enhancing radiology report generation through pre-trained language models. Prog. Artif. Intell. 2024. [Google Scholar] [CrossRef]
- Batool, H.; Mukhtar, A.; Khawaja, S.G.; Alghamdi, N.S.; Khan, A.M.; Qayyum, A.; et al. Knowledge distillation and transformer-based framework for automatic spine CT report generation. IEEE Access 2025, 13, 42949–42964. [Google Scholar] [CrossRef]
- Zhao, J.; Yao, W.; Sun, L.; Shi, L.; Kuang, Z.; Wu, C.; et al. Automated chest X-ray diagnosis report generation with cross-attention mechanism. Appl. Sci. 2025, 15(1), 343. [Google Scholar] [CrossRef]
- Alfarghaly, O.; Khaled, R.; Elkorany, A.; Helal, M.; Fahmy, A. Automated radiology report generation using conditioned transformers. Informatics Med. Unlocked 2021, 24, 100557. [Google Scholar] [CrossRef]
- Singh, P.; Singh, S. ChestX-Transcribe: A multimodal transformer for automated radiology report generation from chest X-rays. Front. Digit. Health 2025, 7, 1535168. [Google Scholar] [CrossRef]
- Raminedi, S.; Shridevi, S.; Won, D. Multi-modal transformer architecture for medical image analysis and automated report generation. Sci. Rep. 2024, 14(1), 19281. [Google Scholar] [CrossRef]
- Yi, X.; Fu, Y.; Liu, R.; Hu, Y.; Zhang, H.; Hua, R. TSGET: Two-stage global enhanced transformer for automatic radiology report generation. IEEE J. Biomed. Health Inform. 2024, 28(4), 2152–2162. [Google Scholar] [CrossRef]
- Xu, L.; Tang, Q.; Zheng, B.; Lv, J.; Li, W.; Zeng, X. CGFTrans: Cross-modal global feature fusion transformer for medical report generation. IEEE J. Biomed. Health Inform. 2024, 28(9), 5600–5612. [Google Scholar] [CrossRef]
- Wang, R.; Hua, R. Generating radiology reports via multi-feature optimization transformer. KSII Trans. Internet Inf. Syst. 2023, 17(10). [Google Scholar] [CrossRef]
- Zhang, J.; Shen, X.; Wan, S.; Goudos, S.K.; Wu, J.; Cheng, M.; et al. A novel deep learning model for medical report generation by inter-intra information calibration. IEEE J. Biomed. Health Inform. 2023, 27(10), 5110–5121. [Google Scholar] [CrossRef]
- Hou, X.; Liu, Z.; Li, X.; Li, X.; Sang, S.; Zhang, Y. MKCL: Medical knowledge with contrastive learning model for radiology report generation. J. Biomed. Inform. 2023, 146, 104496. [Google Scholar] [CrossRef]
- Zhao, G.; Zhao, Z.; Gong, W.; Li, F. Radiology report generation with medical knowledge and multilevel image-report alignment: A new method and its verification. Artif. Intell. Med. 2023, 146, 102714. [Google Scholar] [CrossRef] [PubMed]
- Gao, N.; Yao, R.; Liang, R.; Chen, P.; Liu, T.; Dang, Y. Multi-level objective alignment transformer for fine-grained oral panoramic X-ray report generation. IEEE Trans. Multimedia 2024, 26, 7462–7474. [Google Scholar] [CrossRef]
- Guo, K.; Zheng, S.; Huang, R.; Gao, R. Multi-task learning for lung disease classification and report generation via prior graph structure and contrastive learning. IEEE Access 2023, 11, 110888–110898. [Google Scholar] [CrossRef]
- Yang, Y.; You, X.; Zhang, K.; Fu, Z.; Wang, X.; Ding, J.; et al. Spatio-temporal and retrieval-augmented modelling for chest X-ray report generation. IEEE Trans. Med. Imaging 2025, 1–11. [Google Scholar] [CrossRef]
- Dong, Z.; Lian, J.; Zhang, X.; Zhang, B.; Liu, J.; Zhang, J.; et al. A chest imaging diagnosis report generation method based on dual-channel transmodal memory network. Biomed. Signal Process. Control 2025, 100, 107021. [Google Scholar] [CrossRef]
- Xu, D.; Zhu, H.; Huang, Y.; Jin, Z.; Ding, W.; Li, H.; et al. Vision-knowledge fusion model for multi-domain medical report generation. Inf. Fusion 2023, 97, 101817. [Google Scholar] [CrossRef]
- Liu, Y.; Zhang, J.; Liu, K.; Tan, L. ADCNet: Anomaly-driven cross-modal contrastive network for medical report generation. Electronics 2025, 14(3), 532. [Google Scholar] [CrossRef]
- Yan, S.; Cheung, W.K.; Chiu, K.; Tong, T.M.; Cheung, K.C.; See, S. Attributed abnormality graph embedding for clinically accurate X-ray report generation. IEEE Trans. Med. Imaging 2023, 42(8), 2211–2222. [Google Scholar] [CrossRef]
- Ran, R.; Pan, R.; Yang, W.; Deng, Y.; Zhang, W.; Hu, W.; et al. MeFD-Net: Multi-expert fusion diagnostic network for generating radiology image reports. Appl. Intell. 2024, 54(22), 11484–11495. [Google Scholar] [CrossRef]
- Yang, B.; Lei, H.; Huang, H.; Han, X.; Cai, Y. DPN: Dynamics priori networks for radiology report generation. Tsinghua Sci. Technol. 2025, 30(2), 600–609. [Google Scholar] [CrossRef]
- Yang, B.; Lei, H.; Huang, H.; Han, X.; Cai, Y. DPN: Dynamics priori networks for radiology report generation. Tsinghua Sci. Technol. 2025, 30(2), 600–609. [Google Scholar] [CrossRef]
- Alotaibi, F.S.; Kaur, N. Radiological report generation from chest X-ray images using pre-trained word embeddings. Wirel. Pers. Commun. 2023, 133(4), 2525–2540. [Google Scholar] [CrossRef]
- Sun, S.; Su, Z.; Meizhou, J.; Feng, Y.; Hu, Q.; Luo, J.; et al. Optimizing medical image report generation through a discrete diffusion framework. J. Supercomput. 2025, 81(5), 637. [Google Scholar] [CrossRef]
- Kaur, N.; Mittal, A. RadioBERT: A deep learning-based system for medical report generation from chest X-ray images using contextual embeddings. J. Biomed. Inform. 2022, 135, 104220. [Google Scholar] [CrossRef]
- Mei, X.; Yang, L.; Gao, D.; Cai, X.; Han, J.; Liu, T. Adaptive medical topic learning for enhanced fine-grained cross-modal alignment in medical report generation. IEEE Trans. Multimedia 2025, 1–12. [Google Scholar] [CrossRef]
- Tang, Y.; Yuan, Y.; Tao, F.; Tang, M. Cross-modal augmented transformer for automated medical report generation. IEEE J. Transl. Eng. Health Med. 2025, 13, 33–48. [Google Scholar] [CrossRef] [PubMed]
- Najdenkoska, I.; Zhen, X.; Worring, M.; Shao, L. Uncertainty-aware report generation for chest X-rays by variational topic inference. Med. Image Anal. 2022, 82, 102603. [Google Scholar] [CrossRef]
- Zhang, J.; Cheng, M.; Li, X.; Shen, X.; Wan, Y.; Zhu, J.; et al. Generating medical report via joint probability graph reasoning. Tsinghua Sci. Technol. 2025, 30(4), 1685–1699. [Google Scholar] [CrossRef]
- Huang, L.; Cao, Y.; Jia, P.; Li, C.; Tang, J.; Li, C. Knowledge-guided cross-modal alignment and progressive fusion for chest X-ray report generation. IEEE Trans. Multimedia 2025, 27, 557–567. [Google Scholar] [CrossRef]
- Yi, X.; Fu, Y.; Yu, J.; Liu, R.; Zhang, H.; Hua, R. LHR-RFL: Linear hybrid-reward-based reinforced focal learning for automatic radiology report generation. IEEE Trans. Med. Imaging 2025, 44(3), 1494–1504. [Google Scholar] [CrossRef]
- Zhu, D.; Liu, L.; Yang, X.; Liu, L.; Peng, W. Denoising multi-level cross-attention and contrastive learning for chest radiology report generation. J. Imaging Inform. Med. 2025. [Google Scholar] [CrossRef]
- Li, H.; Liu, X.; Jia, D.; Chen, Y.; Hou, P.; Li, H. Research on chest radiography recognition model based on deep learning. Math. Biosci. Eng. 2022, 19(11), 11768–11781. [Google Scholar] [CrossRef]
- Sirshar, M.; Paracha, M.F.K.; Akram, M.U.; Alghamdi, N.S.; Zaidi, S.Z.Y.; Fatima, T. Attention-based automated radiology report generation using CNN and LSTM. PLOS ONE 2022, 17(1), e0262209. [Google Scholar] [CrossRef]
- Hou, D.; Zhao, Z.; Liu, Y.; Chang, F.; Hu, S. Automatic report generation for chest X-ray images via adversarial reinforcement learning. IEEE Access 2021, 9, 21236–21250. [Google Scholar] [CrossRef]
- Kaur, N.; Mittal, A. CADxReport: Chest X-ray report generation using co-attention mechanism and reinforcement learning. Comput. Biol. Med. 2022, 145, 105498. [Google Scholar] [CrossRef] [PubMed]
- Ucan, M.; Kaya, B.; Kaya, M. Generating medical reports with a novel deep learning architecture. Int. J. Imaging Syst. Technol. 2025, 35(2), e70062. [Google Scholar] [CrossRef]
- Yang, Y.; Yu, J.; Zhang, J.; Han, W.; Jiang, H.; Huang, Q. Joint embedding of deep visual and semantic features for medical image report generation. IEEE Trans. Multimedia 2023, 25, 167–178. [Google Scholar] [CrossRef]
- Gajbhiye, G.O.; Nandedkar, A.V.; Faye, I. Translating medical image to radiological report: Adaptive multilevel multi-attention approach. Comput. Methods Programs Biomed. 2022, 221, 106853. [Google Scholar] [CrossRef]
- Kaur, N.; Mittal, A. CheXPrune: Sparse chest X-ray report generation model using multi-attention and one-shot global pruning. J. Ambient Intell. Humaniz. Comput. 2023, 14(6), 7485–7497. [Google Scholar] [CrossRef] [PubMed]
- Shetty, S.; Ananthanarayana, V.S.; Mahale, A. Cross-modal deep learning-based clinical recommendation system for radiology report generation from chest X-rays. Int. J. Eng. 2023, 36(8), 1569–1577. [Google Scholar] [CrossRef]
- Zeng, X.; Wen, L.; Xu, Y.; Ji, C. Generating diagnostic report for medical image by high-middle-level visual information incorporation on double deep learning models. Comput. Methods Programs Biomed. 2020, 197, 105700. [Google Scholar] [CrossRef]
- Xu, Z.; Xu, W.; Wang, R.; Chen, J.; Qi, C.; Lukasiewicz, T. Hybrid reinforced medical report generation with M-linear attention and repetition penalty. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36(2), 2206–2222. [Google Scholar] [CrossRef]
- Gu, Y.; Li, R.; Wang, X.; Zhou, Z. Automatic medical report generation based on cross-view attention and visual-semantic long short-term memorys. Bioengineering 2023, 10(8), 966. [Google Scholar] [CrossRef]
- Zhang, J.; Cheng, M.; Cheng, Q.; Shen, X.; Wan, Y.; Zhu, J.; et al. Hierarchical medical image report adversarial generation with hybrid discriminator. Artif. Intell. Med. 2024, 151, 102846. [Google Scholar] [CrossRef]
- Paalvast, O.; Nauta, M.; Koelle, M.; Geerdink, J.; Vijlbrief, O.; Hegeman, J.H.; et al. Radiology report generation for proximal femur fractures using deep classification and language generation models. Artif. Intell. Med. 2022, 128, 102281. [Google Scholar] [CrossRef]
- Loveymi, S.; Dezfoulian, M.H.; Mansoorizadeh, M. Automatic generation of structured radiology reports for volumetric computed tomography images using question-specific deep feature extraction and learning. J. Med. Signals Sens. 2021, 11(3), 194–207. [Google Scholar] [CrossRef]
- Sun, S.; Mei, Z.; Li, X.; Tang, T.; Li, Z.; Wu, Y. A label information fused medical image report generation framework. Artif. Intell. Med. 2024, 150, 102823. [Google Scholar] [CrossRef]
- Zhang, D.; Ren, A.; Liang, J.; Liu, Q.; Wang, H.; Ma, Y. Improving medical X-ray report generation by using knowledge graph. Appl. Sci. 2022, 12(21), 11111. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; et al. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS); 2017.
- Chen, Z.; Song, Y.; Chang, T.H.; Wan, X. Generating radiology reports via memory-driven transformer. arXiv preprint 2022, arXiv:2010.16056. https://github.com/zhjohnchan/R2Gen.
- Jing, B.; Xie, P.; Xing, E.P. On the automatic generation of medical imaging reports. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP); 2018. [Google Scholar]
- Liu, F.; Wu, X.; Ge, S.; Fan, W.; Zou, Y. Exploring and distilling posterior and prior knowledge for radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021. [Google Scholar] [CrossRef]
- Yang, S.; Wu, X.; Ge, S.; Zhou, S.K.; Xiao, L. Knowledge matters: Chest radiology report generation with general and specific knowledge. Med. Image Anal. 2022, 80, 102510. [Google Scholar] [CrossRef] [PubMed]
- Tanida, T.; Müller, P.; Kaissis, G.; Rueckert, D. Interactive and explainable region-guided radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023. https://github.com/ttanida/rgrg.
- Hou, W.; Cheng, Y.; Xu, K.; Li, W.; Liu, J. Recap: Towards precise radiology report generation via dynamic disease progression reasoning. arXiv preprint 2023, arXiv:2310.13864. https://github.com/wjhou/Recap.
- Li, Y.; Yang, B.; Cheng, X.; Zhu, Z.; Li, H.; Zou, Y.; et al. Unify, align and refine: Multi-level semantic alignment for radiology report generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2023; pp. 2851–2861. [Google Scholar] [CrossRef]
- Nooralahzadeh, F.; Perez-Gonzalez, N.; Frauenfelder, T.; Fujimoto, K.; Krauthammer, M. Progressive transformer-based generation of radiology reports. arXiv preprint 2021, arXiv:2102.09777. https://github.com/uzh-dqbm-cmi/ARGON.
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL); 2002; pp. 311–318. [Google Scholar]
- Lin, C.Y. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (ACL), Barcelona, Spain; 2004. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating text generation with BERT. In Proceedings of the International Conference on Learning Representations (ICLR); 2020. https://github.com/Tiiiger/bert_score.
- Vedantam, R.; Zitnick, C.L.; Parikh, D. CIDEr: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015; pp. 4566–4575. [Google Scholar] [CrossRef]
- Yu, F.; Endo, M.; Krishnan, R.; Pan, I.; Tsai, A.; Pontes-Reis, E.; et al. Evaluating progress in automatic chest X-ray radiology report generation. Patterns 2023, 4, 100802. [Google Scholar] [CrossRef] [PubMed]
- Kaur, N.; Mittal, A.; Singh, G. Methods for automatic generation of radiological reports of chest radiographs: A comprehensive survey. Multimed. Tools Appl. 2022, 81, 13409–13439. [Google Scholar] [CrossRef]
- Mahmoud, M.; Monshi, A.; Poon, J.; Chung, V. Deep learning in generating radiology reports: A survey. Artif. Intell. Med. 2020, 106, 101878. [Google Scholar] [CrossRef]
- Liao, Y.; Liu, H.; Spasic, I. Deep learning approaches to automatic radiology report generation: A systematic review. Informatics Med. Unlocked 2023, 39, 101273. [Google Scholar] [CrossRef]





| Database | Field | Search Expression | Results |
|---|---|---|---|
| PubMed | [tiab] | (“Convolutional Neural Network*” OR CNN OR “Recurrent Neural Network*” OR RNN OR LSTM OR GRU OR Transformer OR Transformers OR “Attention Mechanism” OR “Encoder Decoder” OR “Sequence to Sequence” OR “Graph Neural Network*” OR GNN OR GCN OR GAT OR “Deep Learning” OR “Neural Network” OR “Neural Networks”) AND (Radiology OR Radiolog* OR “Medical Imag*” OR “Diagnostic Imag*” OR X-ray OR CT OR MRI OR PET) AND (“Report Generation” OR “Text Generation” OR “Narrative Generation” OR “Automatic Report*” OR “Clinical Report*” OR “Medical Report*”) | 158 |
| Scopus | - | Same expression as PubMed, without specific field restriction | 259 |
| Web of Science | TS= | Same Boolean expression adapted to the TS= field for topic-based search | 217 |
| IEEE Xplore | - | Same Boolean expression adjusted to the syntax requirements of the respective database | 79 |
| ACM DL | - | Same Boolean expression adjusted to the syntax requirements of the respective database | 301 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
