A Proof-of-Concept Multimodal Framework for Transthyretin Amyloidosis Classification from Clinical Narratives and ECG Images

Sofia Pagano; Cristina Perri; Marika Votta; Ugo Lomoio; Pierangelo Veltri; Pietro Hiram Guzzi

doi:10.20944/preprints202605.1601.v1

Submitted: 24 May 2026

Posted: 25 May 2026

Abstract

Transthyretin amyloidosis (ATTR) is a rare, progressive, and frequently under-recognized systemic disease whose early clinical manifestations may overlap with common cardiac and neurological conditions. This work presents a proof-of-concept multimodal deep-learning framework for the binary classification of ATTR and non-ATTR cases by jointly analysing structured clinical narratives and electrocardiogram (ECG) images. A curated dataset of 100 cases was assembled, including 60 literature-derived cases and 40 synthetic cases generated through a multimodal Generative Adversarial Network (GAN) to mitigate data scarcity and class imbalance. The proposed architecture combines a frozen ResNet50 visual encoder with a frozen Italian BERT textual encoder; the resulting modality-specific embeddings are projected into a shared latent space and fused through a multilayer perceptron classifier. In a held-out test subset, the final multimodal configuration achieved an overall accuracy of 73%, an AUC-ROC of 0.78, and an ATTR recall of 0.89, indicating promising sensitivity for a preliminary screening scenario. These results should be interpreted as evidence of methodological feasibility rather than clinical readiness, given the limited cohort size, the inclusion of synthetic data, and the absence of external multicentre validation. Overall, the study supports the feasibility of multimodal AI for rare-disease triage and provides a foundation for larger, fully validated clinical investigations.

Keywords: transthyretin amyloidosis; rare diseases; multimodal deep learning; ECG; BERT; ResNet50; proof of concept; clinical decision support

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Artificial intelligence (AI) and machine learning (ML) are increasingly being explored as tools to support clinical decision-making, particularly when heterogeneous biomedical information must be interpreted jointly [1,2,3]. Recent progress in combining large language models and deep visual encoders has enabled multimodal architectures capable of processing textual and imaging data within a single analytical pipeline [4,5]. Such approaches are especially relevant in complex diagnostic settings, where clinical narratives, instrumental findings, and subtle visual patterns may collectively contribute to disease recognition [6,7].

This study focuses on transthyretin amyloidosis (ATTR), a rare systemic disorder characterized by extracellular deposition of amyloid fibrils derived from misfolded transthyretin [8,9]. ATTR may involve the heart, peripheral nervous system, and other organs, and its early manifestations are often nonspecific. As a result, diagnosis is frequently delayed, limiting the timely use of disease-modifying therapies [10,11]. For this reason, computational tools able to support preliminary triage and guide clinical suspicion may be valuable, provided that their performance is rigorously validated before any real-world deployment.

We propose a proof-of-concept multimodal framework for automated ATTR classification based on clinical text and ECG images. The framework uses a hybrid dataset composed of textual clinical descriptions and electrocardiogram (ECG) images, with additional synthetic cases generated through Generative Adversarial Networks (GANs) to address the data scarcity typical of rare diseases [12]. Architecturally, the model combines a CNN-based visual encoder, ResNet50 [13], and a transformer-based textual encoder, Italian BERT [14], through a neural fusion classifier designed to integrate complementary visual and linguistic representations. Figure 1 summarizes the analytical workflow.

The contribution of this work is twofold. Methodologically, it evaluates whether pretrained textual and visual encoders can be combined to generate a unified representation for rare-disease classification. Clinically, it investigates whether such a multimodal strategy can support early suspicion of ATTR in a controlled experimental setting. Importantly, the study is intentionally framed as a proof of concept: the objective is to test feasibility, identify methodological constraints, and define the requirements for subsequent external validation, rather than to propose a clinically deployable diagnostic system.

2. Methods

2.1. Operational Context and Dataset

For model development and preliminary validation, we constructed an original multimodal dataset consisting of 100 clinical cases designed to integrate structured textual descriptions and ECG tracings. The dataset included 60 real cases extracted from specialized scientific literature [15–58] and 40 synthetic cases generated using GAN-based augmentation. The synthetic cases were used to improve class balance and increase the variability available for this proof-of-concept experiment.

ECG images were standardized by converting the tracings into digital images with a resolution of $224 \times 224$ pixels, matching the input requirements of the visual encoder. Clinical information was organized into structured fields, including case identifier, age, sex, symptoms, and ECG description. Each record preserved a one-to-one correspondence between the tabular description and the corresponding ECG image, thereby enabling paired multimodal learning.

Figure 2. Preview of the multimodal dataset structure. Each case links a structured textual record to the corresponding ECG image.

2.1.1. Data Augmentation

To mitigate the data scarcity that commonly affects rare-disease studies, we augmented the dataset using a multimodal Generative Adversarial Network. The generator received a 100-dimensional latent noise vector $z$ and a semantic condition as input, enabling the generation of paired artificial clinical descriptions and ECG tracings. The GAN was trained for 500 epochs with a batch size of 16 using the Adam optimizer, a learning rate of $2 \times 10^{-4}$, $\beta_1 = 0.5$, and $\beta_2 = 0.999$. A fixed random seed of 42 was used to support reproducibility.

After generation, records were shuffled at the case level to preserve alignment between textual and visual modalities. The resulting dataset provided a balanced experimental setting for comparing unimodal and multimodal classifiers. Because synthetic data may not fully capture the biological and clinical variability of real patients, all results are interpreted as feasibility evidence and require confirmation on larger real-world cohorts.
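Case-level shuffling can be illustrated with a minimal sketch. It assumes each case is stored as a single record holding both modalities; the record values and the helper name `shuffle_cases` are hypothetical and are not taken from the authors' code.

```python
import random

# Hypothetical paired records: field names follow the dataset description,
# values are invented purely for illustration.
records = [
    {"case_id": i, "text": f"clinical note {i}", "image_path": f"ecg_{i}.png"}
    for i in range(6)
]

def shuffle_cases(cases, seed=42):
    """Return a shuffled copy in which each record moves as one unit."""
    shuffled = list(cases)
    # A dedicated Random instance keeps the shuffle reproducible and isolated.
    random.Random(seed).shuffle(shuffled)
    return shuffled

shuffled = shuffle_cases(records)
```

Because the text and the image path travel inside the same record, the order of cases changes while the pairing within each case cannot.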

As a preliminary exploratory step, sample images were inspected and relationship networks were constructed using NetworkX to connect patients by sex and diagnostic class. These graphs, shown in Figure 3 and Figure 4, were used only for descriptive visualization and were not included as model features.

2.2. Pre-Processing

Textual and imaging data underwent harmonized pre-processing to ensure consistency across modalities. Clinical narratives were normalized by removing non-alphabetic characters, standardizing spacing, and lowercasing. Tokenization was followed by Italian stop-word removal and lemmatization. Key attributes, including age, sex, symptoms, and ECG description, were integrated into a single structured text string, while diagnostic labels were binarized as ATTR or NO_ATTR.
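The normalization steps can be sketched in plain Python as follows. This is an illustration under stated assumptions: the stop-word set is a tiny hypothetical subset rather than the list the authors used, lemmatization is omitted because the lemmatizer is not named, and the text template in `build_record` (including keeping age outside the character filter) is an assumption, since the text does not say how numeric fields were handled.

```python
# Tiny illustrative subset of Italian stop words (assumption, not the authors' list).
ITALIAN_STOP_WORDS = {"a", "da", "di", "e", "il", "la", "con", "per", "in", "un", "una"}

def normalize_text(text, stop_words=ITALIAN_STOP_WORDS):
    """Lowercase, drop non-alphabetic characters, collapse spacing, remove stop words."""
    lowered = text.lower()
    # str.isalpha() keeps accented Italian letters such as "è" or "à".
    letters_only = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in lowered)
    tokens = letters_only.split()  # split() also standardizes whitespace
    return " ".join(tok for tok in tokens if tok not in stop_words)

def build_record(case_id, age, sex, symptoms, ecg_description, diagnosis):
    """Merge the structured fields into one text string and binarize the label."""
    text = (
        f"age {age} sex {sex} "
        f"symptoms {normalize_text(symptoms)} "
        f"ecg {normalize_text(ecg_description)}"
    )
    label = 1 if diagnosis == "ATTR" else 0  # ATTR -> 1, NO_ATTR -> 0 (assumed encoding)
    return {"case_id": case_id, "label": label, "text": text}

example = build_record(
    1, 72, "M", "Dispnea da sforzo, edemi declivi!", "Bassi voltaggi periferici", "ATTR"
)
```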

Figure 5. Structure of the consolidated multimodal dataset used as model input. Heterogeneous features were mapped into a standardized format containing the case identifier (case_id), binary diagnostic label (label), structured clinical text (text), and ECG image path (image_path).

Text sequences were processed using the Italian BERT tokenizer with fixed-length padding and truncation. ECG images were converted to RGB and resized to $224 \times 224$ pixels. During training, light image augmentation, including rotation, horizontal flipping, and controlled brightness/contrast variation, was applied to increase visual variability. Validation and test images were only resized and normalized using ImageNet statistics to ensure compatibility with the pretrained ResNet50 encoder.
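ImageNet normalization is a per-channel standardization. The sketch below applies it to a single 8-bit RGB pixel using the channel means and standard deviations conventionally paired with ImageNet-pretrained encoders. The helper name `normalize_pixel` is hypothetical; in practice the same arithmetic would be applied to whole image tensors by the imaging library.

```python
# Channel statistics conventionally used with ImageNet-pretrained models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

white = normalize_pixel((255, 255, 255))  # ECG paper background
black = normalize_pixel((0, 0, 0))        # trace ink
```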

2.3. Multimodal Architecture

The model adopted a multimodal design integrating two pretrained encoders: a visual branch and a textual branch. Both encoders were frozen during training to reduce computational cost, limit overfitting, and focus learning on cross-modal fusion. The visual encoder was a ResNet50 network pretrained on ImageNet1K_V2, with the classification head replaced by an identity layer to output 2,048-dimensional feature vectors describing high-level ECG morphology. The textual encoder was Italian BERT, which produced a 768-dimensional embedding derived from the [CLS] token.

Feature vectors from the two branches were independently projected to 512-dimensional embeddings using linear layers with ReLU activation and dropout ($p = 0.2$). The resulting embeddings were concatenated and passed to a multilayer perceptron composed of a Linear–ReLU–Dropout–Linear sequence, generating logits for binary classification. Softmax transformation was applied during inference to obtain normalized class probabilities.

Data handling was managed using a custom ECGMultimodalDataset class responsible for loading ECG images, tokenizing textual inputs, and returning the corresponding tensors and labels. A dedicated collation function ensured homogeneous batching for efficient parallel computation. Training and evaluation relied on PyTorch DataLoader objects with batch size 8; shuffling was enabled during training and disabled during validation and testing.

Figure 6. Overview of the multimodal architecture for automated ATTR classification. A frozen ResNet50 encoder extracts visual features from ECG images, while a frozen Italian BERT model extracts semantic representations from clinical narratives. Each modality is projected into a 512-dimensional latent space, concatenated, and processed by a multilayer perceptron for binary classification.

2.3.1. Detailed Architecture

The processing pipeline comprises three stages: extraction, projection, and classification. During feature extraction, ResNet50 produces 2,048-dimensional ECG representations and Italian BERT produces 768-dimensional textual embeddings. During projection, both representations are mapped into a common 512-dimensional latent space using Linear–ReLU–Dropout blocks. This step balances the contribution of each modality before fusion.

The concatenation of the two projected embeddings produces a 1,024-dimensional multimodal representation, which is then passed to the final classification module. A sequence of dense layers maps the fused vector to a two-dimensional logit output. A Softmax transformation then provides the predicted class (ATTR or NO_ATTR) and an associated confidence score, supporting transparent interpretation of the model output within this experimental setting.
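The three stages can be written compactly as follows. This is a sketch derived from the dimensions stated in the text: the symbols are introduced here for illustration, and the hidden width $d_h$ and the dropout rate inside the classifier are not reported, so both are left unspecified.

```latex
\begin{aligned}
\mathbf{h}_v &= \mathrm{Dropout}_{0.2}\big(\mathrm{ReLU}(W_v \mathbf{x}_v + \mathbf{b}_v)\big),
  & \mathbf{x}_v &\in \mathbb{R}^{2048},\ \mathbf{h}_v \in \mathbb{R}^{512} \\
\mathbf{h}_t &= \mathrm{Dropout}_{0.2}\big(\mathrm{ReLU}(W_t \mathbf{x}_t + \mathbf{b}_t)\big),
  & \mathbf{x}_t &\in \mathbb{R}^{768},\ \mathbf{h}_t \in \mathbb{R}^{512} \\
\mathbf{h} &= [\mathbf{h}_v ; \mathbf{h}_t],
  & \mathbf{h} &\in \mathbb{R}^{1024} \\
\mathbf{o} &= W_2\,\mathrm{Dropout}\big(\mathrm{ReLU}(W_1 \mathbf{h} + \mathbf{b}_1)\big) + \mathbf{b}_2,
  & W_1 &\in \mathbb{R}^{d_h \times 1024},\ \mathbf{o} \in \mathbb{R}^{2} \\
\hat{p}_k &= \frac{e^{o_k}}{e^{o_1} + e^{o_2}},
  & k &\in \{1, 2\}
\end{aligned}
```

Here $\mathbf{x}_v$ is the ResNet50 feature vector, $\mathbf{x}_t$ is the [CLS] embedding from Italian BERT, and only $W_v, W_t, W_1, W_2$ and their biases are trained, because both encoders are frozen.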

Figure 7. Implementation details of the multimodal architecture. (a) ECG images and structured clinical narratives are processed by two frozen encoders. The visual branch uses ResNet50 pretrained on ImageNet1K_V2; the textual branch uses Italian BERT. (b) Modality-specific feature vectors are independently projected into a shared 512-dimensional latent space. (c) The concatenated 1,024-dimensional representation is passed through a multilayer perceptron that outputs class logits and Softmax probabilities for ATTR and NO_ATTR.

2.4. Experimental Protocol

After pre-processing and consistency checks, incomplete records and records with missing image references were removed. The cleaned dataset was partitioned into a training–validation subset (85%) and a held-out test subset (15%) using a stratified split to preserve the class distribution between ATTR and NO_ATTR cases. All experiments were performed in Python on Google Colab with a fixed random seed of 42.
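A stratified 85/15 split can be sketched with the standard library alone. The function below is an illustrative stand-in, not the authors' implementation, and the 60/40 label mix is hypothetical: it was chosen only because it yields a 15-case test subset with 9 ATTR and 6 NO_ATTR cases, the composition reported in Section 3.3.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.15, seed=42):
    """Split indices so that each class contributes test_fraction of its cases."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical label vector; the true class mix after cleaning is not reported.
labels = ["ATTR"] * 60 + ["NO_ATTR"] * 40
train_idx, test_idx = stratified_split(labels)
```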

2.5. Cross-Validation and Training

Model development followed a five-fold stratified cross-validation design on the training–validation subset. For each fold, a fresh instance of the multimodal classifier was initialized with frozen encoders and randomly initialized fusion layers. Training was conducted for six epochs using the AdamW optimizer with a learning rate of $2 \times 10^{-3}$ and Cross-Entropy Loss. Gradient clipping with a threshold of 1.0 was applied to improve numerical stability, and a linear learning-rate scheduler without warm-up was used.
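The schedule and the clipping rule reduce to two short formulas, sketched here in plain Python rather than through the deep-learning framework. Two assumptions are made: the threshold of 1.0 is taken to be a limit on the global L2 norm of the gradient (the text does not say whether clipping was by norm or by value), and the figure of 9 batches per epoch is hypothetical.

```python
import math

def linear_lr(step, total_steps, base_lr=2e-3):
    """Learning rate after `step` updates: linear decay to zero, no warm-up."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    # The small constant avoids division by zero for an all-zero gradient.
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [g * scale for g in grads]

total_steps = 6 * 9  # six epochs x a hypothetical 9 batches per epoch
schedule = [linear_lr(step, total_steps) for step in range(total_steps + 1)]
clipped = clip_by_global_norm([3.0, 4.0])  # norm 5.0 is rescaled to about 1.0
```

Gradients whose norm is already below the threshold pass through unchanged, so clipping only affects the occasional large update.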

For each fold, the model achieving the highest validation F1-score was checkpointed. After cross-validation, the best-performing model was reloaded and evaluated on the held-out test subset to estimate generalization within the constructed proof-of-concept dataset.
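The fold construction and the model-selection rule can be sketched without external libraries. The round-robin assignment below is one simple way to obtain stratified folds and is not necessarily the procedure the authors used; the 51/34 label mix is hypothetical, and `best_fold` is an illustrative helper.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_indices, val_indices), dealing each class round-robin into folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val

def best_fold(fold_f1_scores):
    """Index of the fold whose checkpoint reached the highest validation F1."""
    return max(range(len(fold_f1_scores)), key=lambda k: fold_f1_scores[k])

# Hypothetical labels for an 85-case training-validation subset.
labels = ["ATTR"] * 51 + ["NO_ATTR"] * 34
splits = list(stratified_kfold(labels))
```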

3. Results

3.1. Unimodal Comparative Analysis and Baseline

The initial modelling phase evaluated baseline classifiers trained on the 60 real cases only. This step was used to quantify the limitations imposed by small sample size and class imbalance before introducing GAN-based augmentation. Three representative algorithms were tested: Random Forest, Multi-Layer Perceptron, and Logistic Regression.

Random Forest (RF). Random Forest was selected because ensemble decision trees can model nonlinear relationships and are relatively robust to noisy features. In this setting, however, the limited number of cases favoured overfitting and reduced generalization to unseen samples.

Figure 8. Classification report for the Random Forest baseline trained on real cases only.

Multi-Layer Perceptron (MLP). The MLP was used to assess whether a neural classifier could capture nonlinear relationships among clinical and ECG-derived features. Despite its flexibility, convergence was unstable because the number of available samples was insufficient to reliably optimize the hidden-layer weights.

Figure 9. Classification report for the MLP baseline trained on real cases only.

Logistic Regression (LR). Logistic Regression was included as an interpretable linear baseline. Although useful for comparison, it showed limited capacity to separate classes when symptom descriptions and ECG patterns overlapped between ATTR and NO_ATTR cases.

Figure 10. Classification report for the Logistic Regression baseline trained on real cases only.

Overall, the baseline models achieved suboptimal F1-score and recall values, indicating that the real-only dataset was insufficient for a reliable screening-oriented classifier. This finding motivated the use of GAN-based augmentation as an experimental strategy for class balancing and proof-of-concept evaluation.

3.2. Impact of Data Augmentation on Classifiers

Expanding the dataset to 100 instances (60 real and 40 synthetic) improved the stability of the tested classifiers and reduced the imbalance between diagnostic classes. The augmented dataset allowed a more controlled comparison of algorithms, while remaining limited in scale and therefore unsuitable for definitive clinical conclusions.

Random Forest with GAN. The augmented dataset reduced overfitting relative to the real-only baseline. The additional variability introduced by synthetic examples acted as a form of regularization and improved the ability of the classifier to identify more stable patterns.

Figure 11. Classification report for the Random Forest classifier after GAN-based augmentation.

Logistic Regression with GAN. Logistic Regression benefited from improved class balance, producing more stable coefficients and better linear separation. Nevertheless, the model remained limited in its ability to capture complex nonlinear interactions between textual and imaging information.

Figure 12. Classification report for the Logistic Regression classifier after GAN-based augmentation.

Multi-Layer Perceptron with GAN. The MLP produced the strongest overall performance among the tested classifiers. The larger and more balanced dataset allowed the neural classifier to optimize its weights more effectively, improving the balance between precision and recall.

The main improvement associated with augmentation was observed in the recall of the ATTR class. While models trained only on real data tended to favour the majority class, the synthetic contribution increased sensitivity to ATTR cases. This characteristic is relevant for screening-oriented applications, where false negatives may carry greater clinical risk than false positives. Based on its ability to fuse features derived from BERT and ResNet50, the MLP with GAN-based augmentation was selected as the final proof-of-concept configuration.

3.3. Performance Evaluation and Metrics

The final multimodal framework was evaluated on a held-out test subset to estimate generalization within the constructed dataset. Performance was quantified using standard binary classification metrics, including accuracy, precision, recall, F1-score, and AUC-ROC.

The model achieved an overall accuracy of 73% and an AUC-ROC of 0.78. These values suggest encouraging discrimination between ATTR and NO_ATTR cases in this preliminary setting, but they should not be interpreted as evidence of clinical validity without external validation.

Consistent with a screening-oriented objective, the model achieved a recall of 0.89 for the ATTR class, corresponding to 8 correctly identified ATTR cases out of 9. Precision was 0.73 for the ATTR class and 0.75 for the NO_ATTR class, indicating a reasonable balance between sensitivity and false-positive management within the small test subset.

Figure 13. Classification report for the final multimodal proof-of-concept classifier.

Figure 14. Receiver Operating Characteristic curve for the final multimodal classifier.

Figure 15. Precision–Recall curve for the final multimodal classifier.

The normalized confusion matrix showed lower recall for the NO_ATTR class (0.50), with three false positives among six negative cases. This pattern indicates a cautious decision profile that favours ATTR detection over specificity. Such behaviour may be acceptable in an early triage scenario, provided that positive predictions are followed by confirmatory diagnostic tests. However, the small number of test cases makes this interpretation preliminary.
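The reported figures can be cross-checked from the confusion-matrix counts implied by the text: 8 of 9 ATTR cases detected, and 3 of 6 NO_ATTR cases flagged as positive. The sketch below recomputes the metrics from those four counts; the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Per-class precision and recall plus accuracy, with ATTR as the positive class."""
    total = tp + fn + fp + tn
    attr_precision = tp / (tp + fp)
    attr_recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "attr_precision": attr_precision,
        "attr_recall": attr_recall,
        "attr_f1": 2 * attr_precision * attr_recall / (attr_precision + attr_recall),
        "no_attr_precision": tn / (tn + fn),
        "no_attr_recall": tn / (tn + fp),
    }

# Counts implied by Section 3.3: TP = 8, FN = 1, FP = 3, TN = 3 (15 test cases).
metrics = binary_metrics(tp=8, fn=1, fp=3, tn=3)
```

These counts give an accuracy of 11/15, an ATTR precision of 8/11, an ATTR recall of 8/9, a NO_ATTR precision of 3/4, and a NO_ATTR recall of 3/6, matching the rounded values reported above.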

Figure 16. Normalized confusion matrix for the final multimodal classifier.

3.4. Inference Pipeline

To emulate a possible single-case workflow, we implemented an inference module that receives an ECG image and the corresponding clinical narrative [59]. The module encodes both modalities, applies multimodal fusion, and outputs the predicted class (ATTR or NO_ATTR) together with a Softmax confidence score. In the illustrative case analysed here, the model returned a confidence score of 0.88. This interface supports qualitative inspection of model behaviour, but it should be considered an experimental prototype rather than a clinically validated tool.
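The final step of such a module, turning two logits into a class label and a confidence score, can be sketched as follows. The class-index order and the example logits are assumptions made for illustration; they are not values from the study.

```python
import math

CLASS_NAMES = ("NO_ATTR", "ATTR")  # index order is an assumption

def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the predicted class name and its Softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Hypothetical logits: a gap of 2 between the two classes.
label, confidence = predict([-1.0, 1.0])
```

With two classes the confidence depends only on the gap between the logits, so a gap of 2 corresponds to a score of roughly 0.88.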

4. Discussion

This proof-of-concept study suggests that integrating clinical narratives and ECG images through a multimodal deep-learning pipeline can improve the preliminary classification of ATTR compared with unimodal or real-only baselines. The final configuration combined frozen pretrained encoders with a lightweight fusion classifier, allowing the model to exploit complementary textual and visual cues while limiting the number of trainable parameters. This design is appropriate for small-data contexts, where full fine-tuning of large encoders would increase the risk of overfitting.

The most relevant result was the high recall observed for the ATTR class. In a screening-oriented scenario, sensitivity is particularly important because missed cases may prolong diagnostic delay. The model’s tendency to produce false positives should therefore be interpreted in relation to its intended role: it is not a replacement for specialist assessment, but a possible triage component that could prompt confirmatory testing when clinical suspicion is high.

The Precision–Recall analysis further supports this interpretation, showing that decision-threshold selection can favour sensitivity while maintaining acceptable precision in a preliminary setting. Nevertheless, these results are based on a very small test subset and must be interpreted cautiously. In particular, the step-like shape of the Precision–Recall curve reflects the limited number of test examples and underscores the instability of performance estimates in small cohorts.

4.1. Limitations and Future Work

Several limitations define the scope of this study. First, the dataset is small and includes only 60 real cases, making it insufficient for robust clinical validation. Second, although GAN-based augmentation improves experimental balance, synthetic samples may introduce artefacts or fail to reproduce the full heterogeneity of real-world ATTR presentations. Third, the test subset contained only 15 cases, so the reported metrics are sensitive to individual classification errors. Fourth, the framework has not yet been evaluated on external multicentre cohorts, prospective data, or data acquired under routine clinical conditions.
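The sensitivity of the metrics to individual errors can be made concrete with a binomial confidence interval. The sketch below uses the Wilson score interval, which is an illustrative choice and not an analysis performed in the study, applied to the ATTR recall (8 of 9) and the overall accuracy (11 of 15).

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

recall_low, recall_high = wilson_interval(8, 9)          # ATTR recall
accuracy_low, accuracy_high = wilson_interval(11, 15)    # overall accuracy
```

Under this approximation the interval for recall spans roughly 0.57 to 0.98 and the interval for accuracy roughly 0.48 to 0.89, which illustrates how little a 15-case test subset constrains the true performance.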

Future work should therefore focus on expanding the dataset with fully real, multicentre cases; separating synthetic data from final clinical testing; evaluating calibration and uncertainty; and integrating additional modalities such as echocardiographic findings, laboratory markers, genetic information, and longitudinal follow-up. Explainability analyses should also be incorporated to identify which textual and ECG patterns drive model predictions, thereby increasing clinical interpretability and supporting expert review.

5. Conclusion

This work presents a proof-of-concept multimodal framework for ATTR classification based on the fusion of clinical narratives and ECG images. By combining frozen ResNet50 and Italian BERT encoders with an MLP fusion classifier, the model achieved promising preliminary sensitivity for ATTR detection in a small held-out test subset. The findings support the methodological feasibility of multimodal AI for rare-disease triage, but they do not establish clinical readiness. Larger real-world cohorts, external validation, and rigorous evaluation of robustness, calibration, and interpretability are necessary before the framework can be considered for clinical decision-support use. Within these limits, the study provides a structured starting point for the development of multimodal diagnostic support systems for ATTR and other rare diseases characterized by heterogeneous clinical and instrumental evidence.

References

1. Rajpurkar, P.; Irvin, J.; Ball, R.L. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018, 15, e1002686.
2. Esteva, A.; Robicquet, A.; Ramsundar, B. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29.
3. Lomoio, U.; Veltri, P.; Guzzi, P.H.; Liò, P. Design and use of a Denoising Convolutional Autoencoder for reconstructing electrocardiogram signals at super resolution. Artif. Intell. Med. 2025, 160, 103058.
4. Brown, T.; Mann, B.; Ryder, N. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
5. Vaswani, A.; Shazeer, N.; Parmar, N. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
6. Huang, Y.; Dong, Z.; Wang, J. Fusion of Visual and Language Model Features for Medical Diagnosis. IEEE Trans. Med. Imaging 2023, 42, 233–244.
7. Hosseinzadeh, M.M.; Cannataro, M.; Guzzi, P.H.; Dondi, R. Temporal networks in biology and medicine: a survey on models, algorithms, and tools. Netw. Model. Anal. Health Inform. Bioinform. 2022, 12, 10.
8. Maurer, M.; Elliott, P.; Merlini, G. Diagnosis, Management, and Clinical Trials in Transthyretin Amyloidosis Cardiomyopathy. JACC Heart Fail. 2020, 8, 197–202.
9. Ruberg, F.; Berk, J. Transthyretin (TTR) Cardiac Amyloidosis. Circulation 2012, 126, 1286–1300.
10. Maurer, M.; Rapezzi, C.; Planté-Bordeneuve, V. How to diagnose cardiac amyloidosis: a practical approach. Eur. Heart J. 2022, 43, 1539–1549.
11. Gillmore, J.; Damy, T.; Fontana, M. Diagnosis of transthyretin amyloidosis in clinical practice. Eur. J. Heart Fail. 2022, 24, 759–771.
12. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 2014, 27.
13. He, K.; Zhang, X.; Ren, S. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition 2016, 770–778.
14. Devlin, J.; Chang, M.; Lee, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
15. Tanaka, K.; et al. Wild-type transthyretin amyloidosis: A case report. World J. Clin. Cases 2019, 7, 742–748.
16. Kittleson, M.M.; et al. Diagnosis and management of transthyretin amyloid cardiomyopathy. Front. Cardiovasc. Med. 2023.
17. Gagliardi, C.; et al. Transthyretin cardiac amyloidosis in the elderly. Eur. Heart J. 2020.
18. Niu, X.; et al. Transthyretin cardiac amyloidosis: a review. J. Geriatr. Cardiol. 2023, 20, 150–162.
19. Staff, C.E. Cardiac Amyloidosis: A Case Report. Cureus J. Med. Sci. 2024.
20. Staff, C.E. A Case of Transthyretin Cardiac Amyloidosis. Cureus J. Med. Sci. 2024.
21. Staff, C.E. Wild-Type Transthyretin Cardiac Amyloidosis. Cureus J. Med. Sci. 2024.
22. Garcia-Pavia, P.; et al. Diagnosis and treatment of transthyretin cardiac amyloidosis. J. Clin. Med. 2024.
23. Zhao, L.; et al. Heart failure as the initial presentation of transthyretin cardiac amyloidosis. Medicine 2019.
24. Martini, N.; et al. Atrial fibrillation and transthyretin cardiac amyloidosis. Ann. Noninvasive Electrocardiol. 2022.
25. Staff, C.E. Transthyretin Cardiac Amyloidosis: A Case Report on Diagnostic Challenges. Cureus J. Med. Sci. 2024.
26. Fujimoto, K.; et al. Case with transthyretin amyloid cardiomyopathy complicated with rapidly progressive aortic stenosis. J. Cardiol. Cases 2021.
27. Abchf, J. Clinical characteristics of transthyretin cardiac amyloidosis. ABC Heart Fail. 2022.
28. Wang, Z.; et al. Diagnostic value of echocardiography in cardiac amyloidosis. Cardiovasc. Innov. Appl. 2020.
29. Reports, C.C. ATTR-CA and renal involvement: a case study. Clin. Case Rep. 2024.
30. Journal, E.H. Multimodality imaging in cardiac amyloidosis. Eur. Heart J.-Case Rep. 2024.
31. Board, M.E. Systemic amyloidosis and cardiac involvement. Medicine (Baltimore) 2021.
32. Reports, O.C. Rare presentations of ATTR-CA. Oxford Medical Case Reports 2024.
33. Journal, I.H. Prevalence of ATTR-CA in South Asian populations. Indian Heart J. 2025.
34. Supplements, E.H.J. Advances in TTR stabilization therapy. Eur. Heart J. Suppl. 2021.
35. Cunha, C.; et al. Transthyretin Cardiac Amyloidosis: A Review of Current and Emerging Treatment Strategies. J. Cardiovasc. Dev. Dis. 2023.
36. Khouri, M.G.; et al. The role of multimodality imaging in the diagnosis of cardiac amyloidosis. In NIHMS/Global Cardiology Science and Practice; 2023.
37. Philippakis, A.; Falk, R.H. Cardiac amyloidosis mimicking hypertrophic cardiomyopathy with obstruction. Eur. Heart J. 2021.
38. of Clinical Medical Case Reports, I.J. Diagnostic challenges in ATTR-CA: A clinical prediction study. IJCMCR 2024.
39. Yousaf, A.; et al. Fabry Disease: A critical differential diagnosis in restrictive cardiomyopathy. Eur. Heart J.-Case Rep. 2024.
40. Main, A.; et al. Hypertrophic Cardiomyopathy: Clinical characteristics and differentiation from amyloidosis. Cardiol. Case Rep. 2024.
41. of the American College of Cardiology, J. ATTR-CM: Contemporary diagnosis and management. JACC: Case Reports 2022.
42. Group, P.R. Pathological findings in transthyretin cardiac amyloidosis. Pathology International 2023.
43. Xie, L.; et al. Chest pain in a patient with transthyretin cardiac amyloidosis: A case report. Clin. Case Rep. 2024.
44. Science, G.C.; Practice. Epidemiology and outcomes of cardiac amyloidosis. GCSP 2018.
45. Staff, C.E. Hypertrophic Cardiomyopathy: A Clinical Case Study. Cureus J. Med. Sci. 2023.
46. of Cardiology, J. Aortic Stenosis mimicking restrictive patterns. J. Cardiol. Cases 2021.
47. Board, H.F. Case reports in restrictive cardiomyopathy. ESC Heart Failure 2022.
48. Group, C.R. Hypertensive heart disease vs. Amyloidosis. J. Clin. Med. 2021.
49. Reports, M.C. Sarcoidosis with cardiac involvement. J. Med. Case Rep. 2024.
50. Journal, E.H. Clinical features of Fabry disease. EHJ Case Rep. 2024.
51. Reports, O.M. Endomyocardial fibrosis: a rare differential. Oxf. Med. Case Rep. 2024.
52. Board, J.R. Hemochromatosis and the heart. J. Integr. Cardiol. 2024.
53. Practice, C. Athletic heart syndrome vs. Pathology. Cardiology in Practice 2021.
54. Investigations, C. Myocarditis and restrictive patterns. J. Clin. Investig. 2021.
55. Zhang, J.; et al. Machine Learning for Cardiac Disease Diagnosis. arXiv 2025, arXiv:2506.06315.
56. Health, B.; Informatics, C. AI and Rare Disease Diagnostics in Clinical Practice. BMJ HCI 2024.
57. Board, C.R. Restricted Cardiomyopathy: Emerging Trends. Heart and Vessels 2023.
58. Experimental.; Medicine, T. Myocardial involvement in systemic diseases. ETM 2020.
59. Sato, T.; et al. Severe cibenzoline toxicity in hypertrophic obstructive cardiomyopathy: A clinical case study. J. Cardiol. Cases 2021.

Figure 1. Overview of the proof-of-concept multimodal deep-learning framework for transthyretin amyloidosis (ATTR) classification. Textual clinical descriptions are encoded using a transformer-based language model, while electrocardiogram (ECG) images are processed through a convolutional neural network. The resulting representations are fused by a neural classifier that estimates the probability of ATTR versus NO_ATTR presentation.

Figure 3. Exploratory relationship network connecting patients by sex and ATTR status.

Figure 4. Exploratory patient network stratified by sex and diagnostic class (ATTR and NO_ATTR).

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.