Preprint Article (this version is not peer-reviewed)

Bone Fracture Classification Using a YOLOv8–ANN Hybrid Model with SHAP and LIME-Based Interpretability

Submitted: 05 September 2025. Posted: 08 September 2025.
Abstract
Bone fractures remain a critical diagnostic challenge in orthopedic medicine, requiring precise and timely interpretation of radiographic images in conjunction with clinical evaluation. This study proposes a multimodal artificial intelligence (AI) framework that integrates a YOLOv8n-based convolutional neural network (CNN) for image analysis with an artificial neural network (ANN) trained on structured clinical data to improve fracture detection and classification. The CNN, trained on annotated X-ray images spanning seven anatomical regions, achieved an overall accuracy of 97.1%, with strong localization and classification performance. Interpretability was enhanced using Gradient-weighted Class Activation Mapping (Grad-CAM) to highlight spatial regions of diagnostic relevance. In parallel, the ANN was trained on clinical profiles from 2,873 patients—including demographic, biochemical, and diagnostic parameters—and achieved 96.13% accuracy in binary fracture prediction. To further ensure transparency, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were employed to quantify the contribution of individual clinical features. Comprehensive evaluation through confusion matrices, per-class performance metrics, and training dynamics confirmed the robustness and generalizability of the proposed system. By combining radiological imaging with clinical data, this framework provides an accurate, interpretable, and scalable solution for AI-assisted fracture diagnosis in orthopedic practice.

1. Introduction

Bone health plays an essential role in maintaining mobility, structural integrity, and overall quality of life in aging populations [1]. Disorders affecting the skeletal system, such as osteoporosis, metabolic bone disease, and degenerative joint conditions, contribute significantly to global morbidity and healthcare burden [2,3]. According to the World Health Organization, musculoskeletal disorders rank among the leading causes of disability worldwide, often resulting in chronic pain, decreased independence, and increased risk of injury [4,5]. The human skeletal system is subject to gradual degeneration over time, influenced by factors such as age, hormonal changes, nutritional deficiencies, and systemic illnesses [6]. These alterations degrade bone density and microarchitecture, rendering bones more susceptible to trauma and fractures. As the incidence of bone-related conditions continues to rise, especially among older adults and medically vulnerable groups, there is an increasing need for accurate and timely evaluation of bone integrity [7].
Bone fractures are structural breaks in bone tissue that occur when mechanical forces exceed the bone’s capacity to absorb stress [8]. These injuries may result from acute trauma, repetitive strain, or underlying pathological conditions that weaken bone integrity, such as osteoporosis, osteogenesis imperfecta, or bone metastases [9,10]. Fractures are commonly categorized by their anatomical location, pattern (e.g., transverse, oblique, spiral), and extent of displacement or fragmentation [11]. In clinical practice, fractures are often classified as either traumatic or pathological, arising from compromised bone strength due to disease [12,13]. Risk factors such as advanced age, low bone mineral density, hormonal imbalance, and comorbid conditions significantly increase fracture susceptibility [14,15]. The consequences of fractures extend beyond localized injury, often leading to reduced mobility, prolonged hospitalization, and diminished quality of life, particularly in older adults [16]. Effective management requires a clear understanding of fracture type, location, and patient-specific risk factors to guide appropriate therapeutic interventions and reduce the likelihood of complications such as malunion, nonunion, or recurrent injury [17].
Traditionally, the diagnosis of bone fractures has relied primarily on clinical examination supported by imaging techniques such as conventional radiography, computed tomography, and magnetic resonance imaging [18,19]. Among these, radiography remains the most widely used modality due to its accessibility, speed, and ability to visualize bone structures with sufficient detail for routine assessments [20]. However, the diagnostic accuracy of radiographic interpretation is highly dependent on the clinician’s experience and may be influenced by factors such as image quality, anatomical complexity, and the subtle presentation of certain fracture types [21,22]. Over time, advanced imaging methods such as computed tomography and magnetic resonance imaging have improved diagnostic precision, especially for detecting complex or hidden fractures, but their use is often limited by cost, equipment availability, and specific patient-related constraints [23,24].
Technological advances in medical imaging and diagnostic computing have opened new pathways for evaluating bone health, providing clinicians with more precise tools for early intervention and personalized treatment planning [25]. Artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has demonstrated considerable potential in transforming the diagnosis and management of bone-related disorders [26]. Machine learning has reshaped numerous sectors, including natural language processing, autonomous systems, healthcare, and visual data analysis [27]. It is particularly effective in processing large and complex datasets, uncovering latent patterns that may not be apparent to human observers, and generating highly accurate predictions [28].
ML techniques are built upon advanced neural network structures, such as Convolutional Neural Networks (CNNs) and Artificial Neural Networks (ANNs), which are specifically designed to model intricate data relationships [29,30]. CNNs are a distinct type of ANN tailored for analyzing image data, including radiographic and diagnostic imaging commonly used in musculoskeletal assessments [31,32]. Among the most well-known CNN-based architectures for object detection is the "You Only Look Once" (YOLO) family, which has been widely applied in real-time bone fracture localization due to its ability to detect multiple anatomical features in a single forward pass [33]. YOLO models, including the latest YOLOv8, apply convolutional filters to extract spatial and hierarchical features from medical images, enabling rapid and accurate identification of fracture sites [34]. These convolutional layers are typically followed by pooling layers that reduce the spatial dimensionality of the data, increasing computational efficiency and mitigating overfitting risks [35,36]. CNNs, including object detection models like YOLO, excel in applications such as image recognition, lesion detection, and medical imaging due to their capability to capture spatial dependencies and subtle structural anomalies within the data [37,38]. ANNs, inspired by the architecture of the human brain, consist of layers of interconnected neurons that process input data through weighted combinations and activation functions, allowing them to learn patterns and produce predictive outputs [39,40].
Table 1 provides an overview of recent studies focused on automated bone fracture detection using deep learning and machine learning techniques applied to X-ray imaging. These studies demonstrate a wide range of methodologies, from traditional image preprocessing and classical machine learning to advanced CNN architectures and hybrid models, achieving accuracies between 88% and 99%. However, no prior work has effectively combined ANN models trained on structured clinical data with YOLO-based object detection models applied to radiographic images, highlighting the novelty and integrative strength of the present study.
This study proposes a comprehensive, multimodal artificial intelligence framework for the automated classification of bone fractures by integrating radiographic imaging with structured clinical data. The framework leverages a YOLO-based deep learning model to detect and localize fractures within X-ray images, while a parallel ANN model is employed to analyze patient-specific clinical variables associated with bone health. To enhance transparency and interpretability, the system incorporates explainable AI techniques, including SHAP and LIME, allowing for both global and case-specific understanding of model predictions. The proposed approach aims to improve diagnostic accuracy, support clinical decision-making, and address current limitations in fracture detection by combining high-resolution imaging analysis with contextual medical information.

2. Methods

The following sections detail the datasets, preprocessing techniques, model architecture, and evaluation metrics employed in this study. By integrating image-based and clinical data-driven AI models, this research presents a comprehensive diagnostic framework for detecting bone fractures and assessing fracture risk. The YOLOv8n model demonstrates strong capabilities in localizing and classifying fractures from bone X-ray images, while the ANN offers predictive insights based on clinical attributes related to bone health. Together, these models emphasize the potential of deep learning to enhance diagnostic precision and support more effective orthopedic decision-making.

2.1. Data Information

We developed two distinct AI models using a combination of image-based and clinical data to detect and evaluate bone fractures. The first model was trained on 4,000 labeled bone X-ray images obtained from a publicly available dataset [45], while the second was trained on clinical data from 2,873 patients collected from the Harvard Dataverse platform [46]. The primary aim was to design an integrated diagnostic framework capable of detecting bone fractures across various anatomical regions and assessing fracture risk based on individual patient health profiles.
The X-ray dataset consisted of 4,000 images, each labeled and annotated according to specific fracture-related categories. Annotations were assigned to seven classes, including:
• 0 – Elbow (positive)
• 1 – Fingers (positive)
• 2 – Forearm (fracture)
• 3 – Humerus (fracture)
• 4 – Humerus (non-fracture)
• 5 – Shoulder (fracture)
• 6 – Wrist (positive)
All images were preprocessed and resized to a uniform resolution of 640×640 pixels to ensure compatibility with the YOLOv8n object detection model. Standardization of image dimensions and annotation format was applied to maintain consistency across the dataset and optimize model training efficiency.
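As a minimal sketch of this resizing step (directory paths are hypothetical), note that YOLO-format annotation files store box coordinates normalized to the image size, so resizing the images does not require rewriting the label files:

```python
# Resize all X-ray images to 640x640 for YOLOv8n; paths are illustrative.
# YOLO .txt labels use normalized coordinates, so they remain valid after resizing.
from pathlib import Path
import cv2

SRC = Path("dataset/images_raw")
DST = Path("dataset/images")
DST.mkdir(parents=True, exist_ok=True)

for img_path in SRC.glob("*.jpg"):
    img = cv2.imread(str(img_path))
    resized = cv2.resize(img, (640, 640), interpolation=cv2.INTER_AREA)
    cv2.imwrite(str(DST / img_path.name), resized)
```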
The second part of the study involved the use of structured clinical data from 2,873 patients, focusing on a wide range of physiological, biochemical, and disease-related variables. The dataset contained the following features:
Demographics and Anthropometrics: Gender, Age, Height, Weight, and Body Mass Index (BMI)
Bone Density Measurements: L1–L4 (lumbar spine BMD), L1.4T (L1–L4 T-score), FN (Femoral Neck BMD), FNT (Femoral Neck T-score), TL (Total Lumbar BMD), TLT (Total Lumbar T-score)
Biochemical Parameters: ALT, AST, BUN, CREA, URIC, FBG, HDL-C, LDL-C, Ca, P, Mg, Calcitriol
Medication History: Use of Bisphosphonate and Calcitonin
Comorbidities and Medical Conditions: HTN (Hypertension), COPD (Chronic Obstructive Pulmonary Disease), DM (Diabetes Mellitus), Hyperlipidemia, Hyperuricemia, AS (Ankylosing Spondylitis), VT (Vertebral Trauma), VD (Vitamin D Deficiency), OP (Osteoporosis), CAD (Coronary Artery Disease), CKD (Chronic Kidney Disease)
All clinical features were preprocessed and standardized prior to training to ensure numerical consistency and facilitate effective learning. This comprehensive dataset allowed the ANN model to analyze multiple risk factors contributing to bone health and fracture susceptibility.
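A minimal sketch of this preprocessing stage is given below, assuming the clinical table is available as a CSV with a binary fracture column (file and column names are placeholders); the 575-sample test set reported in Section 3 is consistent with the 80/20 split assumed here:

```python
# Sketch of clinical-data preprocessing; file and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bone_mineral_density.csv")   # hypothetical file name
X = df.drop(columns=["fracture"]).values       # the 37 clinical features
y = df["fracture"].values                      # 0 = no fracture, 1 = fracture

# Assumed 80/20 split (2,873 patients -> ~575 test samples, matching Section 3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Standardize to zero mean / unit variance, fitting on the training set only
# to avoid information leakage into the held-out test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```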

2.2. Machine Learning Models

A YOLOv8n model was developed to detect and localize bone fractures across 7 annotated classes. The model was implemented using the Ultralytics YOLOv8 framework in Python and trained on 4,000 pre-processed bone X-ray images. All images were resized to 640×640 pixels to comply with model input requirements, and annotations were organized using a structured YAML schema specifying class names and image paths. To improve model generalization and prevent overfitting, a range of augmentation strategies were applied during training, including random horizontal flipping (fliplr=0.5), image translation (translate=0.1), scaling (scale=0.4), and color augmentations in hue, saturation, and brightness domains. These augmentations enriched the dataset by introducing variability in positioning, contrast, and appearance.
The YOLOv8n model (illustrated in Figure 1) was trained for 300 epochs with a batch size of 16. Early stopping was employed with a patience of 10 epochs, and the learning rate was managed carefully, starting at 0.0005 (lr0) and decaying to 0.0001 (lrf). Regularization was applied using a weight decay of 0.0005 and a momentum of 0.937. A warm-up phase spanned the initial 3 epochs, ensuring smoother gradient updates at the start of training. Image caching (cache=True) was used to accelerate training by reducing I/O latency. Post-training, Grad-CAM was employed to generate visual heatmaps over the input images. These visualizations highlight the regions most influential in the model's predictions, thereby enhancing interpretability and clinical trust in AI-assisted fracture detection.
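For reproducibility, the configuration above maps onto the Ultralytics Python API roughly as in the following sketch; the hsv_* values shown are the library defaults (the exact color-augmentation magnitudes are not reported), and data.yaml stands in for the dataset manifest described earlier:

```python
# Training sketch using the Ultralytics API with the reported hyperparameters.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")       # pretrained nano backbone
model.train(
    data="data.yaml",            # YAML manifest: class names and image paths
    imgsz=640,
    epochs=300,
    batch=16,
    patience=10,                 # early stopping
    lr0=0.0005,
    lrf=0.0001,                  # as reported; note Ultralytics interprets lrf
                                 # as a fraction of lr0, not an absolute rate
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3,
    cache=True,                  # cache images in memory to cut I/O latency
    fliplr=0.5,                  # random horizontal flip
    translate=0.1,
    scale=0.4,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,  # assumed library-default color jitter
)
```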
Performance was evaluated using accuracy, precision, recall, and F1-score, calculated per fold and averaged to obtain overall model metrics (Table 2). Confusion matrices were used to visualize the alignment between predicted and true labels. Additionally, training and validation accuracy/loss curves were plotted for each fold to monitor model learning dynamics, detect overfitting or underfitting patterns, and inform potential hyperparameter tuning.
An ANN was developed to classify patients into two categories: Class 0 (no fracture) and Class 1 (fracture), based on 37 clinical and biochemical input features. The ANN architecture was structured as a lightweight, fully connected feedforward model, designed to balance predictive performance with clinical interpretability.
The model consisted of an input layer accepting 37 standardized features, followed by two hidden layers, each comprising four neurons with ReLU activation functions. L2 regularization (l2=0.01) and dropout (rate=0.3) were applied to each hidden layer to mitigate overfitting and improve generalization. The final output layer consisted of two units with sigmoid activation, enabling probabilistic outputs for binary classification. Model compilation was performed using the Adam optimizer and sparse categorical cross-entropy as the loss function, suitable for handling integer-labeled targets in binary classification tasks. Training incorporated early stopping with a patience value of 10 to prevent overtraining and automatically restore the best-performing weights based on validation loss.
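The described architecture corresponds closely to the following Keras sketch, which assumes the standardized arrays from the preprocessing step above (the batch size is left at the framework default, as it is not reported; the 100-epoch budget matches Figure 8):

```python
# Keras sketch of the ANN: 37 inputs, two 4-neuron ReLU hidden layers with
# L2 regularization and dropout, and a two-unit sigmoid output, as described.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(37,)),
    layers.Dense(4, activation="relu", kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.3),
    layers.Dense(4, activation="relu", kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.3),
    layers.Dense(2, activation="sigmoid"),   # per-class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",   # integer labels
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=100, callbacks=[early_stop])
```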
To ensure the ANN’s clinical transparency and reliability, post-hoc model interpretability was enhanced using both SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations). SHAP was employed to analyze global feature importance and assess the contribution of each variable to model predictions. LIME provided case-specific insights, enabling the examination of local feature influence on individual classification outcomes. Together, these techniques allowed for a detailed understanding of the decision boundaries and enhanced the clinical applicability of the model.
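A sketch of how such explanations can be produced with the shap and lime packages follows; the choice of KernelExplainer and the subsample sizes are assumptions, since the paper does not specify them:

```python
# Post-hoc explanation sketch; explainer choice and sample sizes are assumed.
import shap
from lime.lime_tabular import LimeTabularExplainer

feature_names = list(df.drop(columns=["fracture"]).columns)  # 37 variables

# Global attributions: mean |SHAP value| per feature (cf. Figure 10).
background = shap.sample(X_train, 100)        # subsample for tractability
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X_test[:200])
shap.summary_plot(shap_values, X_test[:200], feature_names=feature_names)

# Local explanation for a single patient (cf. Figure 11).
lime_explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names,
    class_names=["No Fracture", "Fracture"], mode="classification")
exp = lime_explainer.explain_instance(X_test[0], model.predict, num_features=10)
exp.save_to_file("lime_case0.html")           # hypothetical output file
```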
Model evaluation was carried out using a set of performance metrics including accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC). Additionally, a confusion matrix was generated to assess classification robustness and discrimination between classes.
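These metrics (defined in Table 2) can be computed directly from the held-out predictions, for example with scikit-learn:

```python
# Evaluation sketch matching the metrics reported in Tables 2 and 4.
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, matthews_corrcoef)

y_pred = np.argmax(model.predict(X_test), axis=1)

print(confusion_matrix(y_test, y_pred))       # cf. Figure 9
print(classification_report(y_test, y_pred,
                            target_names=["No Fracture", "Fracture"],
                            digits=4))        # per-class precision/recall/F1
print("Accuracy:", accuracy_score(y_test, y_pred))
print("MCC:     ", matthews_corrcoef(y_test, y_pred))
```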

3. Results

The results of this study present a thorough evaluation of the proposed AI models, highlighting their performance in bone fracture detection and risk classification tasks. This assessment covers essential aspects such as detection accuracy, class-wise prediction reliability, model interpretability, and consistency across both imaging and clinical datasets. Key findings are illustrated through training curves, confusion matrices, ROC-AUC analyses, and feature attribution visualizations, demonstrating the robustness and diagnostic potential of the YOLOv8n and ANN systems. The combination of quantitative metrics and qualitative explanations offers a comprehensive understanding of each model’s effectiveness and supports their applicability in real-world clinical settings.
The training performance of the YOLOv8n model for bone fracture detection is illustrated in Figure 3, with metrics plotted across 300 epochs. The top left panel shows the training object loss, which begins at approximately 0.85 and consistently declines to near zero, indicating progressive minimization of detection error. The validation object loss (bottom left) follows a similar downward trend, suggesting strong generalization without signs of overfitting. The precision and recall curves (top center and right) exhibit steep initial improvements, stabilizing near 0.95 by epoch 150, demonstrating the model's strong ability to correctly identify and localize fractures.
In the bottom center and right panels, both mAP@0.5 and mAP@0.5:0.95 steadily increase and plateau above 0.92, indicating strong overall performance across different intersection-over-union (IoU) thresholds. The convergence of low loss and high metric values across all curves confirms that the YOLOv8n model has effectively learned to detect and classify bone fractures with high confidence and accuracy.
Figure 4 illustrates the confusion matrix of the proposed deep neural network model in classifying radiographic bone conditions across seven distinct categories. The model demonstrates robust diagnostic performance, with notably high true positive rates for classes such as humerus (108 correctly predicted instances) and fingers positive (93), indicating its precision in recognizing both fracture and non-fracture conditions. Misclassifications are limited and occur predominantly between anatomically or radiographically similar classes, a pattern consistent with their visual overlap in X-ray imagery. Despite these challenges, the matrix reveals a balanced distribution across categories, with minimal confusion among dissimilar classes, attesting to the model's discriminative capacity. These findings suggest that the network effectively captures nuanced skeletal features and spatial cues, enabling accurate multi-class differentiation essential for clinical decision support in fracture detection and orthopedic assessments.
Table 3 presents a comprehensive evaluation of the YOLOv8n model’s classification performance across seven bone condition categories using key statistical metrics derived from the confusion matrix. The model achieved a high overall accuracy of 97.1%, with consistently strong predictive capability observed across all classes. Macro-averaged precision and recall were calculated as 92.0% and 90.1%, respectively, reflecting the model’s ability to limit both false positives and false negatives in a multi-class setting. The F1-score, representing the harmonic mean of precision and recall, was 91.0%, indicating balanced performance across the diverse fracture and anatomical categories. Additionally, the MCC, which provides a balanced measure even in the presence of class imbalance, was computed as 0.902. These results collectively highlight the model’s ability to accurately distinguish between subtle variations in radiographic bone features and maintain generalizability across complex classification tasks.
Figure 7 illustrates the prediction outputs of the YOLOv8n model for three representative radiographic images (A–C), each accompanied by its corresponding Grad-CAM-based class activation map (D–F). In the detection stage, the model successfully identified relevant anatomical regions with high confidence scores: 0.88 for wrist positive (A), 0.86 for elbow positive (B), and 0.76/0.75 for fingers positive (C). The Grad-CAM overlays in panels D, E, and F confirm the model’s spatial focus, with high activation intensities centered around the localized regions of interest. These attention maps provide visual validation of the model’s interpretability, indicating that YOLOv8n captures relevant radiographic features to inform its predictions. The combination of high-confidence detection and localized gradient-based saliency supports the model’s utility in automated fracture screening and facilitates transparent, explainable AI-driven diagnosis in musculoskeletal imaging.
Figure 8 illustrates the training progression of the proposed artificial neural network (ANN) over 100 epochs, capturing model dynamics in terms of accuracy and loss across training and validation datasets. As shown in the left panel, both training and validation accuracy steadily increased throughout the training period, with the validation accuracy surpassing 90% by approximately epoch 20 and continuing to improve gradually until reaching a plateau near 96%. The training accuracy follows a parallel but slightly lower trajectory, stabilizing around 88%, which suggests controlled learning without evidence of overfitting. The right panel displays the corresponding loss curves, where both training and validation loss exhibit a consistent downward trend across epochs. Notably, the validation loss remains lower than the training loss throughout; this pattern is consistent with the use of dropout and L2 regularization, which penalize the network only during training, and indicates effective generalization. The absence of divergence between the loss and accuracy curves confirms that the model maintains convergence integrity while minimizing errors on unseen data.
Figure 9 displays the confusion matrix summarizing the classification performance of the ANN model in distinguishing between fractured and non-fractured cases. Out of the total samples, the model correctly identified 257 non-fracture cases (true negatives) and 295 fracture cases (true positives), while misclassifying 18 non-fracture samples as fractures (false positives) and 5 fracture samples as non-fractures (false negatives). These results correspond to a high classification accuracy, with the model demonstrating strong discriminative capability between the two categories. The minimal number of false negatives indicates a low rate of missed fractures, while the limited false positives suggest that overdiagnosis of fracture is also effectively controlled.
Table 4 summarizes the classification performance of the artificial neural network (ANN) model in binary fracture detection, detailing precision, recall, F1-score, and class support for each category. The model achieved a precision of 0.9813 and a recall of 0.9346 for the no fracture class, while for the fracture class, precision and recall were 0.9425 and 0.9833, respectively. These values indicate that the model performs reliably in minimizing both false positives and false negatives. The F1-scores for both classes were similarly high, 0.9574 for no fracture and 0.9624 for fracture, reflecting balanced precision and recall. The overall accuracy across all 575 samples was calculated as 96.00%. Macro-averaged and weighted-average scores for all metrics remained consistently high (≥0.959), demonstrating that the model maintains stable and equitable performance across both classes, irrespective of class distribution.
Figure 10 presents the SHAP summary plot, which illustrates the average contribution of each input feature to the predictions made by the ANN model distinguishing between individuals with and without bone fractures. The horizontal axis represents the mean absolute SHAP value, indicating the average magnitude of each feature's impact on the model output. Among all features, TL (total lumbar BMD) and FNT (femoral neck T-score) emerged as the most influential predictors, contributing substantially to both Class 0 (no fracture) and Class 1 (fracture) outcomes. Gender, FN (femoral neck BMD), and P (phosphorus) also exhibited high SHAP values, suggesting strong relevance to the model's fracture risk estimation. Features such as L1–L4 BMD, magnesium, and diabetes mellitus (DM) contributed moderately, while a wide range of clinical parameters, including age, creatinine (CREA), calcium, lipid profiles (HDL-C, LDL-C), AST, and BUN, showed relatively lower but still notable influence. Color coding distinguishes the contribution directionality across classes, with blue bars indicating influence on no fracture predictions and pink bars reflecting influence toward fracture classification.
Figure 11 presents the LIME analysis for an individual prediction made by the ANN model, offering a localized interpretation of the model’s decision regarding fracture classification. The prediction probability bar on the left shows that the model assigned a full confidence score of 1.00 for the Fracture class. The central panel illustrates the top contributing features that influenced this prediction, with features such as TL > 0.65, VT ≤ -0.24, FN > 0.64, TLT > 0.59, and Hyperlipidemia > -0.59 contributing positively toward the fracture classification (shown in orange). In contrast, features including FNT > 0.70, Calcium ≤ -0.34, Gender ≤ -0.62, P ≤ -0.55, and Hyperuricemia ≤ -0.34 acted in favor of the No Fracture class (shown in blue). The corresponding feature values are listed in the rightmost panel, highlighting that this instance was characterized by elevated measurements in FNT (1.48), TL (1.16), FN (1.08), TLT (1.63), and Hyperlipidemia (1.69), which were dominant drivers of the fracture prediction.

4. Discussion

This section interprets the performance and implications of the proposed multimodal AI framework for bone fracture detection and classification. The discussion emphasizes the diagnostic utility of the YOLOv8n model for radiographic image analysis, followed by an examination of the ANN model’s interpretability through SHAP and LIME explainability tools. The integration of these results highlights the framework’s clinical potential in supporting transparent, accurate, and data-informed decision-making in musculoskeletal assessment.
The YOLOv8n model demonstrated strong detection capabilities across multiple bone fracture categories, achieving an overall classification accuracy of 97.1% and maintaining high precision, recall, and F1-scores across all classes. The performance metrics, particularly in anatomically complex regions such as the humerus and wrist, suggest that the model effectively distinguishes between subtle radiographic features. The confusion matrix indicates minimal misclassification, with most errors occurring between anatomically similar or visually overlapping regions. Additionally, the high mAP scores across different IoU thresholds confirm the robustness of the model's object localization abilities. These findings underscore the suitability of YOLOv8n for real-time fracture screening tasks, where accurate detection and anatomical localization are essential for clinical triage and intervention planning.
The interpretability of the ANN model was explored using SHAP and LIME analyses, which provided detailed insights into the contribution of individual clinical features to fracture predictions. The SHAP summary plot identified total lumbar BMD (TL) and the femoral neck T-score (FNT) as the most influential predictors, reinforcing their clinical significance in assessing bone strength and fracture susceptibility. Other features such as gender, phosphorus levels, and femoral neck BMD also played substantial roles, reflecting the multifactorial nature of bone health. The LIME analysis further complemented these findings by offering a local explanation for a specific prediction, distinguishing between features that positively and negatively influenced the model's decision. Together, these explainability tools not only validated the ANN's learning behavior but also enhanced the model's transparency, enabling clinicians to interpret predictions within a medically relevant context.
Compared with Ahmed and Hawezi [44], who achieved a maximum accuracy of 92% with machine learning classifiers on X-ray images, our YOLOv8n model outperformed this benchmark with 97.1% overall accuracy, suggesting that a one-stage object detection architecture may offer improved fracture localization and classification in multi-class settings. A CAD system based on CT imaging [51] achieved 95% accuracy in segmenting and labeling bone fractures by incorporating patient-specific anatomy and artifact removal, highlighting the potential of image-based automation, though it differs from our approach in both imaging modality and model architecture. While Yadav et al. [52] reported exceptionally high accuracy (99.12%) using a hybrid two-scale edge-enhanced CNN model, their approach did not integrate clinical variables, limiting interpretability, an aspect addressed in our study using SHAP and LIME alongside the ANN analysis. Compared with the traditional CAD approach of Sahin [53], which achieved 88.67% accuracy using handcrafted features and classical classifiers, our deep learning-based framework demonstrated superior performance and automation in fracture detection. The recently introduced FracNet framework [54] reported 100% accuracy across three datasets by leveraging self-supervised learning and attention mechanisms, offering high adaptability and interpretability; our model approaches this performance while additionally integrating clinical data for enhanced diagnostic insight. Unlike prior studies that focus solely on imaging data or handcrafted features, our work uniquely integrates YOLO-based radiographic detection with ANN-driven clinical data analysis, offering a multimodal, explainable, and highly accurate framework for bone fracture classification. Furthermore, the incorporation of Grad-CAM, SHAP, and LIME enhances model transparency by providing both global and local interpretability, enabling clinicians to understand, validate, and trust the decision-making process.
The primary advantage of the proposed framework lies in its multimodal architecture, which combines radiographic image analysis through YOLOv8n with structured clinical data interpretation via an artificial neural network. This integration allows the model to capture both visual fracture characteristics and patient-specific risk factors, resulting in a more comprehensive and clinically relevant assessment. The use of explainable AI techniques, including Grad-CAM, SHAP, and LIME, further enhances transparency by offering visual and quantitative insights into model decision-making, thereby fostering clinician trust and interpretability. Moreover, the system demonstrates strong generalization performance, achieving high accuracy across multiple fracture types and anatomical regions.
Despite these strengths, there are limitations that should be acknowledged. The reliance on labeled datasets, particularly for radiographic images, may restrict scalability due to the time and expertise required for annotation. Additionally, while the ANN effectively incorporates clinical features, its performance may vary depending on the completeness and quality of electronic health records. Another constraint is the current reliance on retrospective data, which may not fully represent real-time variability seen in clinical environments. Addressing these challenges through larger, prospective, and multicenter studies will be essential for future validation and clinical translation.
While the proposed multimodal AI framework demonstrated strong performance in bone fracture detection and classification, future research could focus on expanding its generalizability and clinical deployment. Incorporating larger and more diverse datasets from multiple institutions would enhance the model's robustness across various imaging protocols and patient demographics. Additionally, extending the framework to handle 3D imaging modalities such as CT or MRI could improve detection accuracy for complex or subtle fractures that are less visible in standard X-rays. Further integration of temporal clinical data, including treatment history and longitudinal bone density trends, may also enhance risk prediction capabilities. Lastly, real-time deployment through user-friendly clinical interfaces, combined with prospective validation in clinical settings, will be essential to translate this AI-assisted diagnostic tool into routine orthopedic workflows.

5. Conclusion

This study presents a comprehensive and explainable AI-based framework for bone fracture detection and classification by integrating a YOLOv8n model for radiographic image analysis with an ANN model trained on structured clinical data. The proposed system demonstrated high accuracy and robustness across multiple fracture types and anatomical regions, supported by detailed interpretability through Grad-CAM, SHAP, and LIME analyses. By combining image-based localization with patient-specific risk profiling, the framework offers a clinically meaningful and transparent approach that advances the current capabilities of automated fracture diagnosis. These findings suggest the potential for real-world implementation in orthopedic settings, with future improvements expected through expanded datasets and prospective clinical validation.
Data Availability
The datasets used in this study are publicly available on the Internet. These datasets were used under their respective open-access licenses for research purposes. The code developed for this study is available upon reasonable request.

Acknowledgements

This work was carried out entirely with the authors' institutional staff and infrastructure; no external resources or assistance were used. Ethical approval is not applicable. The data supporting the study's conclusions are available within the article, and the raw data supporting the findings will be provided by the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no competing commercial interests or personal relationships that could have influenced this work. Declaration of generative AI and AI-assisted technologies in the writing process: the authors used Grammarly and QuillBot solely for grammar correction and stylistic refinement. ChatGPT was employed strictly for language-clarity suggestions and for improving the readability of technical content, without generating original scientific content or interpretations. All intellectual and scientific contributions, including study design, data analysis, interpretation, and manuscript content, were developed entirely by the authors.

References

  1. L. Ferrucci et al., ‘Interaction between bone and muscle in older persons with mobility limitations’, Curr Pharm Des, vol. 20, no. 19, pp. 3178–3197, 2014.
  2. A. Vaish, R. Vaishya, and K. P. Iyengar, ‘Metabolic syndrome and its impact on bone and joint health: a comprehensive review’, Apollo Medicine, vol. 22, no. 3, pp. 237–243, 2025.
  3. A. D. Woolf and B. Pfleger, ‘Burden of major musculoskeletal conditions’, Bull World Health Organ, vol. 81, no. 9, pp. 646–656, 2003.
  4. E. Sebbag, R. Felten, F. Sagez, J. Sibilia, H. Devilliers, and L. Arnaud, ‘The world-wide burden of musculoskeletal diseases: a systematic analysis of the World Health Organization Burden of Diseases Database’, Ann Rheum Dis, vol. 78, no. 6, pp. 844–848, 2019.
  5. M. A. Briggs et al., ‘Musculoskeletal health conditions represent a global threat to healthy aging: a report for the 2015 World Health Organization world report on ageing and health’, Gerontologist, vol. 56, no. suppl_2, pp. S243–S255, 2016.
  6. D. Nandiraju and I. Ahmed, ‘Human skeletal physiology and factors affecting its modeling and remodeling’, Fertil Steril, vol. 112, no. 5, pp. 775–781, 2019.
  7. P. A. Anderson, K. J. Jeray, J. M. Lane, and N. C. Binkley, ‘Bone health optimization: beyond own the bone: AOA critical issues’, JBJS, vol. 101, no. 15, pp. 1413–1419, 2019.
  8. H. S. Gupta and P. Zioupos, ‘Fracture of bone tissue: the “hows” and the “whys”’, Med Eng Phys, vol. 30, no. 10, pp. 1209–1226, 2008.
  9. M. N. Pathria, C. B. Chung, and D. L. Resnick, ‘Acute and stress-related injuries of bone and cartilage: pertinent anatomy, basic biomechanics, and imaging perspective’, Radiology, vol. 280, no. 1, pp. 21–38, 2016.
  10. T. Hoenig et al., ‘Bone stress injuries’, Nat Rev Dis Primers, vol. 8, no. 1, p. 26, 2022.
  11. S. B. Mostofi, Fracture classifications in clinical practice. Springer, 2006.
  12. G. R. Matcuk, S. R. Mahanty, M. R. Skalski, D. B. Patel, E. A. White, and C. J. Gottsegen, ‘Stress fractures: pathophysiology, clinical presentation, imaging features, and treatment options’, Emerg Radiol, vol. 23, pp. 365–375, 2016.
  13. E. A. Zimmermann, B. Busse, and R. O. Ritchie, ‘The fracture mechanics of human bone: influence of disease and treatment’, Bonekey Rep, vol. 4, p. 743, 2015.
  14. P. Pisani et al., ‘Major osteoporotic fragility fractures: Risk factor updates and societal impact’, World J Orthop, vol. 7, no. 3, p. 171, 2016.
  15. S. L. Wilson-Barnes, S. A. Lanham-New, and H. Lambert, ‘Modifiable risk factors for bone health & fragility fractures’, Best Pract Res Clin Rheumatol, vol. 36, no. 3, p. 101758, 2022.
  16. R. Marks, J. P. Allegrante, C. R. MacKenzie, and J. M. Lane, ‘Hip fractures among the elderly: causes, consequences and control’, Ageing Res Rev, vol. 2, no. 1, pp. 57–93, 2003.
  17. K. M. Bowers and D. E. Anderson, ‘Delayed union and nonunion: current concepts, prevention, and correction: a review’, Bioengineering, vol. 11, no. 6, p. 525, 2024.
  18. D. P. Yadav and S. Rathor, ‘Bone Fracture Detection and Classification using Deep Learning Approach’, in 2020 International Conference on Power Electronics & IoT Applications in Renewable Energy and its Control (PARC), 2020, pp. 282–285. [CrossRef]
  19. S. Hussain et al., ‘Modern diagnostic imaging technique applications and risk factors in the medical field: a review’, Biomed Res Int, vol. 2022, no. 1, p. 5164970, 2022.
  20. J. E. Adams, ‘Advances in bone imaging for osteoporosis’, Nat Rev Endocrinol, vol. 9, no. 1, pp. 28–42, 2013.
  21. A. A. Wright, E. J. Hegedus, L. Lenchik, K. J. Kuhn, L. Santiago, and J. M. Smoliga, ‘Diagnostic accuracy of various imaging modalities for suspected lower extremity stress fractures: a systematic review with evidence-based recommendations for clinical practice’, Am J Sports Med, vol. 44, no. 1, pp. 255–263, 2016.
  22. J. E. Adams, ‘Advances in bone imaging for osteoporosis’, Nat Rev Endocrinol, vol. 9, no. 1, pp. 28–42, 2013.
  23. X.-D. Liu, H.-B. Wang, T.-C. Zhang, Y. Wan, and C.-Z. Zhang, ‘Comparison between computed tomography and magnetic resonance imaging in clinical diagnosis and treatment of tibial platform fractures’, World J Clin Cases, vol. 8, no. 18, p. 4067, 2020.
  24. S. Hussain et al., ‘Modern diagnostic imaging technique applications and risk factors in the medical field: a review’, Biomed Res Int, vol. 2022, no. 1, p. 5164970, 2022.
  25. A. Naik, A. Kale, and J. M. Rajwade, ‘Sensing the future: A review on emerging technologies for assessing and monitoring bone health’, Biomaterials Advances, p. 214008, 2024.
  26. A. Naik, A. Kale, and J. M. Rajwade, ‘Sensing the future: A review on emerging technologies for assessing and monitoring bone health’, Biomaterials Advances, p. 214008, 2024.
  27. N. Rane, S. Choudhary, and J. Rane, ‘Machine learning and deep learning: A comprehensive review on methods, techniques, applications, challenges, and future directions’, Techniques, Applications, Challenges, and Future Directions (May 31, 2024), 2024.
  28. S. Maramraju et al., ‘AI-organoid integrated systems for biomedical studies and applications’, Bioeng Transl Med, vol. 9, no. 2, p. e10641, 2024.
  29. A. Goel, A. K. Goel, and A. Kumar, ‘The role of artificial neural network and machine learning in utilizing spatial information’, Spatial Information Research, vol. 31, no. 3, pp. 275–285, 2023.
  30. A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, ‘A survey of the recent architectures of deep convolutional neural networks’, Artif Intell Rev, vol. 53, pp. 5455–5516, 2020.
  31. S. Gitto et al., ‘AI applications in musculoskeletal imaging: a narrative review’, Eur Radiol Exp, vol. 8, no. 1, p. 22, 2024.
  32. P. Chea and J. C. Mandell, ‘Current applications and future directions of deep learning in musculoskeletal radiology’, Skeletal Radiol, vol. 49, no. 2, pp. 183–197, 2020.
  33. M. T. Hosain, A. Zaman, M. R. Abir, S. Akter, S. Mursalin, and S. S. Khan, ‘Synchronizing object detection: applications, advancements and existing challenges’, IEEE Access, 2024.
  34. G. Meza, D. Ganta, and S. Gonzalez Torres, ‘Deep Learning Approach for Arm Fracture Detection Based on an Improved YOLOv8 Algorithm’, Algorithms, vol. 17, no. 11, p. 471, 2024.
  35. A. Zafar et al., ‘A comparison of pooling methods for convolutional neural networks’, Applied Sciences, vol. 12, no. 17, p. 8643, 2022.
  36. C. F. G. Dos Santos and J. P. Papa, ‘Avoiding overfitting: A survey on regularization methods for convolutional neural networks’, ACM Computing Surveys (Csur), vol. 54, no. 10s, pp. 1–25, 2022.
  37. M. Saraei, M. Lalinia, and E.-J. Lee, ‘Deep Learning-Based Medical Object Detection: A Survey’, IEEE Access, 2025.
  38. M. G. Ragab et al., ‘A comprehensive systematic review of YOLO for medical object detection (2018 to 2023)’, IEEE Access, 2024.
  39. T. K. Gupta and K. Raza, ‘Optimization of ANN architecture: a review on nature-inspired techniques’, Machine learning in bio-signal analysis and diagnostic imaging, pp. 159–182, 2019.
  40. O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, and H. Arshad, ‘State-of-the-art in artificial neural network applications: A survey’, Heliyon, vol. 4, no. 11, 2018.
  41. Y. Ma and Y. Luo, ‘Bone fracture detection through the two-stage system of Crack-Sensitive Convolutional Neural Network’, Inform Med Unlocked, vol. 22, p. 100452, 2021. [CrossRef]
  42. D. P. Yadav, A. Sharma, S. Athithan, A. Bhola, B. Sharma, and I. Ben Dhaou, ‘Hybrid SFNet Model for Bone Fracture Detection and Classification Using ML/DL’, Sensors, vol. 22, no. 15, 2022. [CrossRef]
  43. M. E. Sahin, ‘Image processing and machine learning-based bone fracture detection and classification using X-ray images’, Int J Imaging Syst Technol, vol. 33, no. 3, pp. 853–865, 2023. [CrossRef]
  44. K. Dlshad Ahmed and R. Hawezi, ‘Detection of bone fracture based on machine learning techniques’, Measurement: Sensors, vol. 27, p. 100723, 2023. [CrossRef]
  45. P. Darabi, ‘Bone Fracture Detection: Computer Vision Project’, Jun. 2024. [CrossRef]
  46. L. He, ‘Bone mineral density’, Harvard Dataverse, 2022.
  47. A. E. Maxwell, T. A. Warner, and L. A. Guillén, ‘Accuracy assessment in convolutional neural network-based deep learning remote sensing studies—Part 1: Literature review’, Remote Sens (Basel), vol. 13, no. 13, p. 2450, 2021.
  48. G. K. Armah, G. Luo, and K. Qin, ‘A deep analysis of the precision formula for imbalanced class distribution’, Int J Mach Learn Comput, vol. 4, no. 5, pp. 417–422, 2014.
  49. T. Alam, W.-C. Shia, F.-R. Hsu, and T. Hassan, ‘Improving breast cancer detection and diagnosis through semantic segmentation using the Unet3+ deep learning framework’, Biomedicines, vol. 11, no. 6, p. 1536, 2023.
  50. D. Chicco and G. Jurman, ‘An invitation to greater use of Matthews correlation coefficient in robotics and artificial intelligence’, Front Robot AI, vol. 9, p. 876814, 2022.
  51. D. D. Ruikar, K. C. Santosh, and R. S. Hegadi, ‘Segmentation and analysis of CT images for bone fracture detection and labeling’, in Medical Imaging, CRC Press, 2019, pp. 130–154.
  52. D. P. Yadav, A. Sharma, S. Athithan, A. Bhola, B. Sharma, and I. Ben Dhaou, ‘Hybrid SFNet model for bone fracture detection and classification using ML/DL’, Sensors, vol. 22, no. 15, p. 5823, 2022.
  53. M. E. Sahin, ‘Image processing and machine learning-based bone fracture detection and classification using X-ray images’, Int J Imaging Syst Technol, vol. 33, no. 3, pp. 853–865, 2023. [CrossRef]
  54. H. A. Alwzwazy, L. Alzubaidi, Z. Zhao, and Y. Gu, ‘FracNet: An end-to-end deep learning framework for bone fracture detection’, Pattern Recognit Lett, 2025.
Figure 1. Graphical Representation of YOLOv8n Architecture.
Figure 3. Evaluation of YOLOv8n Training Dynamics for Fracture Detection.
Figure 4. Confusion Matrix Illustrating the Performance of YOLOv8n in Radiographic Classification of Bone Fractures.
Figure 7. YOLOv8n Prediction Outputs (A–C) with Corresponding Grad-CAM Class Activation Maps (D–F).
Figure 8. Training and Validation Accuracy of ANN Model.
Figure 9. The Confusion Matrix of ANN Model.
Figure 10. Feature Importance Analysis Using SHAP for ANN Model.
Figure 11. Feature Importance Analysis Using LIME for ANN Model.
Table 1. Summary of the Previous Studies.

[18] Objective: develop an automated system for classifying healthy and fractured bones using deep learning. Dataset: small X-ray image dataset, expanded with data augmentation. Techniques: deep neural network (DNN), data augmentation, softmax activation, Adam optimizer, 5-fold cross-validation. Results: 92.44% overall accuracy; over 95% accuracy on a 10% test split and over 93% on a 20% test split.

[41] Objective: develop a two-stage computer-aided diagnosis system for automated fracture detection using a fracture-sensitive neural network. Dataset: 1,052 X-ray images (526 fractured, 526 non-fractured) from Haikou People's Hospital. Techniques: two-stage system of (1) Faster R-CNN for bone-region detection and (2) Crack-Sensitive CNN (CrackNet) for fracture classification. Results: 90.11% accuracy and 90.14% F-measure, outperforming other two-stage systems.

[42] Objective: develop an efficient and accurate model for bone fracture diagnosis using edge-enhanced deep learning. Dataset: bone image dataset (size not specified) including grayscale and Canny edge images. Techniques: hybrid two-scale model (SFNet), improved Canny edge algorithm, multi-scale feature fusion, CNN. Results: 99.12% accuracy, 99% F1-score, and 100% recall, outperforming state-of-the-art deep CNN models.

[43] Objective: detect and classify bone fractures using classical machine learning methods on preprocessed X-ray images. Dataset: X-ray dataset containing various bone types (normal and fractured). Techniques: image preprocessing, Canny and Sobel edge detection, Hough line detection, Harris corner detection, feature extraction, 12 ML classifiers, grid search, 10-fold cross-validation. Results: Linear Discriminant Analysis (LDA) achieved the highest accuracy (88.67%) and an AUC of 0.89; comparative results are presented for all classifiers.

[44] Objective: develop a machine learning-based system for bone fracture detection from X-ray images to assist surgeons in diagnosis. Dataset: 270 X-ray images. Techniques: image preprocessing, edge detection, feature extraction, ML classifiers. Results: accuracy ranged from 64% to 92%; SVM achieved the highest accuracy (92%), outperforming other classifiers and most prior studies.
Table 2. Summary of Performance Metrics Used in the Study and Their Calculation Methods.

Accuracy [47]: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

Precision [48]: $\mathrm{Precision} = \frac{TP}{TP + FP}$

Recall [49]: $\mathrm{Recall} = \frac{TP}{TP + FN}$

F1-Score [47]: $F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

Matthews Correlation Coefficient [50]: $\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
Table 3. Per-Class and Overall Performance Metrics of the YOLOv8n Model.
Class Accuracy Precision Recall F1-Score MCC
“elbow positive” 0.965 0.956 0.917 0.936 0.931
“fingers positive” 0.973 0.961 0.925 0.943 0.938
“forearm fracture” 0.962 0.844 0.888 0.866 0.850
“humerus fracture” 0.973 0.911 0.890 0.900 0.888
“humerus” 0.979 0.906 0.915 0.910 0.905
“shoulder fracture” 0.974 0.944 0.889 0.916 0.901
“wrist positive” 0.970 0.915 0.883 0.899 0.891
Overall (Avg) 0.971 0.920 0.901 0.910 0.902
Table 4. Performance Metrics of ANN Model.

Class             Precision   Recall    F1-Score   Support
No Fracture (0)   0.9813      0.9346    0.9574     275
Fracture (1)      0.9425      0.9833    0.9624     300
Accuracy                                0.9600     575
Macro Avg.        0.9619      0.9590    0.9599     575
Weighted Avg.     0.9613      0.9600    0.9600     575