3.1. Overview
Developing an accurate medical expert model for diagnosing conditions requires substantial patient data. This dataset should encompass images as well as a variety of relevant factors, such as age, weight, and other medical history data. Using such a comprehensive dataset, our integrated CNN and BN model analyzes the images and the relationships between these factors, expressed as conditional probabilities. Because ANNs and BNs operate on statistical principles, the accuracy of their predictions relies heavily on the size of the dataset. Therefore, the research process involves several stages:
Initial acquisition and preprocessing of data
Compilation of the collected data
Deployment and utilization of the CNN+BN algorithm
The overall methodology consists of the following key steps:
1) Thermal images and medical history data are collected.
2) The thermal images are segmented.
3) The CNN is trained with the thermal images and makes a diagnosis.
4) XAI algorithms identify the critical parts of the images.
5) Statistical and computational factors are evaluated from the critical parts of the thermal images.
6) The BN is trained with these factors and the medical records dataset and makes a diagnosis.
7) If the BN achieves accuracy similar to that of the CNN, the structure of the BN provides a fully interpretable model of the diagnostic decision.
8) Finally, the diagnosis of the CNN is included in the previous BN, which is then trained and run again, yielding a highly accurate expert system for diagnosis.
3.4. Convolutional Neural Network Model
The CNN model takes thermal images in JPEG format as input and produces a binary output (1 for positive, 0 for negative), as elaborated in [33,34].
CNNs process grid-structured data in a manner reminiscent of the LeNet architecture. According to [33,34], the CNN consists of five data processing layers: alignment and two output layers. During the training phase, the CNN adjusts its parameters based on the provided dataset, gradually increasing its accuracy [33,34,35]. Moreover, transfer learning was used to adapt an existing model to the current task, speeding up the learning process. The dataset was divided into separate subsets for training, cross-validation, and testing.
Figure 7. CNN architecture showing parameters at each level [33,34,35].
Figure 8. Scheme of the transfer learning method for binary classification.
To evaluate the models' performance, eight metrics, including accuracy and precision, were used. The confusion matrix helped identify where the model made errors.
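As a minimal sketch of how two of these metrics follow from the confusion matrix, the snippet below counts the four confusion-matrix cells for binary labels and derives accuracy and precision from them. The labels are toy values invented for illustration, and the helper names are hypothetical; this is not the evaluation code used in the study.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy(tp, tn, fp, fn):.2f} precision={precision(tp, fp):.2f}")
```

The off-diagonal cells (false positives and false negatives) are the ones that show where a model makes its errors.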
3.5. Explainable Artificial Intelligence (XAI) Framework
Explainable Artificial Intelligence (XAI) is a subset of machine learning (ML) focused on elucidating the processes by which ML models generate their outputs. Applying XAI to an ML model enhances its reliability, as the reasoning behind the model's inferences becomes traceable [38].
Artificial neural networks (ANNs) usually comprise numerous layers linked through complex, nonlinear relationships. Even if all these layers and their interconnections were examined, it would be nearly impossible to fully understand how an ANN arrived at its decision. Consequently, deep learning is frequently regarded as a 'black box.' Given the high stakes involved in medical decision-making, it is unsurprising that medical professionals have expressed concerns about the opaque nature of deep learning, which currently represents the leading technology in medical image analysis [39,40,41,42].
There has been a demand for methods to demystify the 'black box' nature of deep learning. These methods are typically known as interpretable deep learning or explainable artificial intelligence (XAI) [38,46].
LIME (Local Interpretable Model-Agnostic Explanations) is an algorithm designed to faithfully explain the predictions of any classifier or regressor by locally approximating it with an interpretable model. It provides local explanations by substituting simpler models for the complex model in specific regions of the input space. For instance, to approximate a CNN with a linear model, LIME perturbs the input data and records how the output of the complex model changes. The simpler model is then fitted to map the perturbed inputs to these output changes. The similarity of each perturbed input to the original input is used as a weight, so that heavily altered inputs have less influence on the final explanation [45,46].
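The local-surrogate idea can be sketched in a few lines of NumPy. The code below is an illustrative simplification, not the LIME package itself: the function name `lime_style_weights`, the zero baseline for removed features, the exponential kernel, and the toy `black_box` function are all assumptions made for this example.

```python
import numpy as np


def lime_style_weights(predict_fn, x, num_samples=500, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around x; return one weight per feature."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Random binary masks: 1 keeps a feature, 0 replaces it with a baseline of 0.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1  # include the unperturbed instance itself
    preds = np.array([predict_fn(x * m) for m in masks])
    # Proximity kernel: samples with fewer removed features get a larger weight.
    dist = 1.0 - masks.mean(axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares on the design matrix [masks, intercept].
    design = np.hstack([masks, np.ones((num_samples, 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], preds * sw, rcond=None)
    return coef[:d]


def black_box(v):
    """Stand-in for a classifier's probability output; uses only features 0 and 2."""
    return 1.0 / (1.0 + np.exp(-(3.0 * v[0] - 2.0 * v[2])))


x = np.ones(4)
weights = lime_style_weights(black_box, x)
print(np.round(weights, 3))
```

In this toy setting the surrogate assigns a positive weight to feature 0 and a negative weight to feature 2, while the two unused features receive weights close to zero, which is the kind of attribution LIME reports for image regions.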
First, a digit classifier was built with TensorFlow, specifically using Keras, which ships with TensorFlow. The Keras front end abstracts away the lower-level training machinery, making it well suited to building models quickly.
Keras includes the MNIST dataset in its distribution, which can be accessed using the load_data() method from the MNIST module. This method provides two tuples containing the training and testing data organized for supervised learning. Here, 'x' and 'y' denote the images and their corresponding target labels, respectively.
Each image returned by the method is a uint8 numpy array of 784 pixels arranged as a 28x28 matrix. The images are converted from uint8 to float32 and, if stored flattened, reshaped into 2-D matrices of size 28x28. Since the images are 8-bit grayscale, their pixel values range from 0 to 255. To simplify the training process, the pixel values are normalized by dividing by 255.0, which scales them between 0 and 1. This normalization step is crucial because large values can complicate the training process.
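The preprocessing described above can be sketched as follows. To keep the example self-contained, it uses a small synthetic array in place of the MNIST data and assumes the images are stored flattened (784 values each); the variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for MNIST-like data: 4 flattened 8-bit images of 784 pixels.
rng = np.random.default_rng(0)
x_raw = rng.integers(0, 256, size=(4, 784), dtype=np.uint8)

x = x_raw.astype(np.float32)  # uint8 -> float32
x = x.reshape(-1, 28, 28)     # one 28x28 matrix per image
x /= 255.0                    # scale pixel values into [0, 1]

print(x.shape, x.dtype, float(x.min()), float(x.max()))
```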
Then, a basic CNN model was developed that processes a 3-D image by passing it through Conv2D layers with 16 filters, each sized 3x3 and using the ReLU activation function. These layers learn the weights and biases of the convolution filters, essentially functioning as the "eyes" of the model to generate feature maps. These feature maps are then sent to the MaxPooling2D layer, which uses a default 2x2 max filter to reduce the dimensionality of the feature maps while preserving important features to some extent.
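To make the shapes concrete, the following NumPy sketch reimplements the two operations by hand rather than calling Keras: a 3x3 convolution with 16 filters followed by ReLU, then 2x2 max pooling. It assumes no padding and a stride of 1 for the convolution, uses random (untrained) filters, and the function names are hypothetical.

```python
import numpy as np


def conv2d_relu(image, kernels, bias):
    """Unpadded, stride-1 convolution (cross-correlation) followed by ReLU.

    image: (H, W, C); kernels: (kh, kw, C, F); bias: (F,).
    Returns feature maps of shape (H - kh + 1, W - kw + 1, F).
    """
    h, w, _ = image.shape
    kh, kw, _, f = kernels.shape
    out = np.zeros((h - kh + 1, w - kw + 1, f), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]
            # Sum over the patch's height, width, and channels for every filter.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return np.maximum(out, 0.0)


def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window, per feature map."""
    h, w, f = fmap.shape
    h2, w2 = h // 2, w // 2
    trimmed = fmap[:h2 * 2, :w2 * 2, :]
    return trimmed.reshape(h2, 2, w2, 2, f).max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((28, 28, 3)).astype(np.float32)           # a 3-D image
kernels = rng.normal(size=(3, 3, 3, 16)).astype(np.float32)  # 16 filters of 3x3
bias = np.zeros(16, dtype=np.float32)

features = conv2d_relu(image, kernels, bias)
pooled = max_pool_2x2(features)
print(features.shape, pooled.shape)
```

Pooling halves each spatial dimension, so the number of values per feature map drops by a factor of four while the strongest activation in each window is retained.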
The basic CNN model is trained for 2 epochs with a batch size of 32 through model.fit(), and a validation set, which was set aside earlier while loading MNIST data, is used. In this context, "epochs" denotes the total number of times the model sees the entire training data, whereas "batch_size" refers to the number of records processed to compute the loss for one forward and backward training iteration.
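As a small worked example of how these two settings interact, the snippet below counts the number of forward and backward iterations. It assumes the standard MNIST training split of 60,000 images; the helper name is hypothetical.

```python
import math


def training_iterations(num_samples, batch_size, epochs):
    """Return (iterations per epoch, total iterations) for mini-batch training."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # last batch may be smaller
    return steps_per_epoch, steps_per_epoch * epochs


# Assumption: 60,000 training images, as in the standard MNIST training split.
steps, total = training_iterations(num_samples=60000, batch_size=32, epochs=2)
print(steps, total)
```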
With the model ready, LIME could be applied for explainability. The lime_image module from the LIME package is used to create a LimeImageExplainer object. This object has an explain_instance() method that takes 3-D image data and a predictor function, such as model.predict, and provides an explanation based on the predictions made by the function.
The explanation object features a get_image_and_mask() method, which, given the predicted labels for the 3-D image data, returns a tuple of (image, mask). Here, the image is a 3-D numpy array and the mask is a 2-D numpy array that can be used with skimage.segmentation.mark_boundaries to show the features in the image that influenced the prediction.
The results of the applied algorithm are presented and discussed more thoroughly in the Results and Discussion section of the paper.
3.6. Informational Nodes for the Diagnosis
After the XAI algorithms identify the critical regions, the region of interest (ROI), i.e., each breast separately, is isolated from the image files. At this stage, temperature data of both healthy and tumor-affected breasts, stored in spreadsheet format, are used to calculate various statistical and computational parameters. Each temperature value in a spreadsheet cell corresponds to a single pixel of the thermal image. The healthy and the affected breast are determined from the available medical information. In cases where there is no tumor, the left breast is treated as affected and the right breast as healthy for the purposes of the calculations. Temperature values are measured in degrees Celsius. Below is a complete list of the parameters used in these calculations.
Maximum Temperature
Minimum Temperature
Temperature Range (Maximum minus Minimum Temperature)
Mean
Median
Standard deviation
Variance
Deviation from the Mean (Maximum minus Mean Temperature)
Deviation from the Maximum Temperature of the Healthy Breast (Maximum minus Maximum Temperature of the Healthy Breast)
Deviation from the Minimum Temperature of the Healthy Breast (Maximum minus Minimum Temperature of the Healthy Breast)
Deviation from the Mean Temperature of the Healthy Breast (Maximum minus Mean Temperature of the Healthy Breast)
Deviation between Mean Temperatures (Mean minus Mean Temperature of the Healthy Breast)
Distance between Points of Maximum and Minimum Temperature
A = Number of Pixels near the Maximum with Temperature Greater than
B = Number of Pixels of the Entire Area
C = Number of All Pixels with Temperature Greater than
Ratio of A to B (A/B)
Ratio of C to B (C/B)
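Several of the listed parameters can be computed directly from the per-pixel temperature arrays, as in the sketch below. It assumes each ROI is available as a 2-D NumPy array of temperatures in degrees Celsius, uses the population standard deviation and variance, and measures the maximum-to-minimum distance in pixels; these choices, the function name, and the dictionary keys are assumptions for illustration. The threshold-based counts A and C are omitted because their thresholds are not specified here.

```python
import numpy as np


def roi_statistics(affected, healthy):
    """Temperature statistics of an affected-breast ROI relative to a healthy ROI.

    Both inputs are 2-D arrays of temperatures (degrees Celsius), one value per pixel.
    """
    t_max, t_min = float(affected.max()), float(affected.min())
    mean = float(affected.mean())
    # (row, col) positions of the first maximum and the first minimum.
    p_max = np.unravel_index(np.argmax(affected), affected.shape)
    p_min = np.unravel_index(np.argmin(affected), affected.shape)
    return {
        "max": t_max,
        "min": t_min,
        "range": t_max - t_min,
        "mean": mean,
        "median": float(np.median(affected)),
        "std": float(affected.std()),       # population standard deviation (ddof=0)
        "variance": float(affected.var()),  # population variance (ddof=0)
        "max_minus_mean": t_max - mean,
        "max_minus_healthy_max": t_max - float(healthy.max()),
        "max_minus_healthy_min": t_max - float(healthy.min()),
        "max_minus_healthy_mean": t_max - float(healthy.mean()),
        "mean_minus_healthy_mean": mean - float(healthy.mean()),
        "max_min_distance_px": float(np.hypot(p_max[0] - p_min[0], p_max[1] - p_min[1])),
        "num_pixels": int(affected.size),   # B in the list above
    }


# Toy 2x2 ROIs with invented temperatures, for illustration only.
affected = np.array([[33.0, 34.0], [35.0, 36.5]])
healthy = np.array([[32.0, 33.0], [33.5, 34.0]])
stats = roi_statistics(affected, healthy)
for name, value in stats.items():
    print(f"{name}: {value}")
```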
The calculated factors, along with the initially gathered patient information, are merged into a unified file format compatible with the software requirements.
3.7. Bayesian Network (BN) Model
A Bayesian network (BN), also known as a belief network or probabilistic directed acyclic graphical model, is defined mathematically as a graph structure, more specifically a directed acyclic graph (DAG) $G = (V, E)$, where $V$ is a set of vertices (nodes) representing random variables and $E$ is a set of directed edges (arcs) representing conditional dependencies between the variables. Each node $X_i$ in the network is associated with a conditional probability distribution $P(X_i \mid \mathrm{Pa}(X_i))$, where $\mathrm{Pa}(X_i)$ are the parent nodes of $X_i$ in the graph $G$.
The main property of a BN is a theorem that simplifies the calculation of the joint probability distribution. Let $X = \{X_1, \ldots, X_n\}$ be a set of $n$ random variables represented by the nodes in the DAG $G$. Then the joint probability of the set of random variables $X$ can be factorized as:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$$
Furthermore, each node $X_i$ is conditionally independent of its non-descendants, denoted $\mathrm{ND}(X_i)$, given its parents. This is expressed as:
$$X_i \perp \mathrm{ND}(X_i) \mid \mathrm{Pa}(X_i)$$
Thus, BNs provide an efficient way to represent the joint probability distribution by exploiting conditional independencies. Inference in Bayesian networks involves computing the posterior distribution of a set of query variables given evidence about some other variables. By defining the structure and the conditional probability distributions for each node, we can model complex probabilistic relationships using Bayesian networks.
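The factorization and a simple posterior query can be illustrated on a toy network. The three binary nodes below (D for the diagnosis, T for an elevated-temperature finding, C for the CNN output) and all probability values are hypothetical and chosen only for illustration; they are not the network or parameters learned in this study.

```python
from itertools import product

# Hypothetical network: D -> T and D -> C, all variables binary.
# All probabilities are made up for illustration.
p_d = {1: 0.3, 0: 0.7}                                    # P(D)
p_t_given_d = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # p_t_given_d[d][t] = P(T=t | D=d)
p_c_given_d = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # p_c_given_d[d][c] = P(C=c | D=d)


def joint(d, t, c):
    """P(D, T, C) = P(D) * P(T | D) * P(C | D), following the BN factorization."""
    return p_d[d] * p_t_given_d[d][t] * p_c_given_d[d][c]


def posterior_d(t, c):
    """P(D = 1 | T = t, C = c), obtained by enumerating both values of D."""
    numerator = joint(1, t, c)
    evidence = sum(joint(d, t, c) for d in (0, 1))
    return numerator / evidence


# The factorized joint must sum to 1 over all eight configurations.
total = sum(joint(d, t, c) for d, t, c in product((0, 1), repeat=3))
post = posterior_d(t=1, c=1)
print(f"sum of joint = {total:.3f}, P(D=1 | T=1, C=1) = {post:.3f}")
```

With both findings positive, the posterior probability of a positive diagnosis rises well above the prior of 0.3, which is the kind of evidence propagation that inference in a BN performs.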
The BN models that we have constructed include as informational nodes all the factors presented in Section 3.6 as well as historical medical record data. Optionally, the diagnosis from the pure CNN model can be included as an additional factor.
The compiled data file includes the diagnosis variable, where a value of 1 signifies a positive diagnosis and 0 a negative one. The file is then imported into the software (BayesiaLab 11.2.1 [47]), where the parameters are categorized as either continuous or discrete (Figure 9).
The final diagnosis decision is designated as the target variable of the whole BN. Subsequently, the software computes frequencies between the provided parameters and the target variable, presenting the outcomes as conditional probabilities. Using these, the connections between nodes were established during training, finalizing the directed acyclic graph (DAG).
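The step from frequencies to conditional probabilities can be sketched with plain counting. This is a generic relative-frequency estimate and not BayesiaLab's internal procedure; the records, the discretized factor name `high_temp`, and the function name are invented for illustration.

```python
from collections import Counter


def conditional_probabilities(records, feature, target):
    """Estimate P(feature | target) from co-occurrence counts in a list of dicts."""
    pair_counts = Counter((r[target], r[feature]) for r in records)
    target_counts = Counter(r[target] for r in records)
    # Key (t, f) maps to the relative frequency of feature value f among records with target t.
    return {(t, f): n / target_counts[t] for (t, f), n in pair_counts.items()}


# Toy records; "high_temp" is a hypothetical discretized factor.
records = [
    {"diagnosis": 1, "high_temp": 1},
    {"diagnosis": 1, "high_temp": 1},
    {"diagnosis": 1, "high_temp": 0},
    {"diagnosis": 0, "high_temp": 0},
    {"diagnosis": 0, "high_temp": 0},
    {"diagnosis": 0, "high_temp": 1},
    {"diagnosis": 0, "high_temp": 0},
]
cpt = conditional_probabilities(records, "high_temp", "diagnosis")
for (t, f), p in sorted(cpt.items()):
    print(f"P(high_temp={f} | diagnosis={t}) = {p:.3f}")
```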
Figure 10. The links (relationships) between nodes in the DAG established through training.
Supervised learning, including Augmented Naive Bayes, was used to obtain results, and our findings were validated using K-fold analysis. In BayesiaLab, K-fold cross-validation is a method used to evaluate the performance of Bayesian network models. It involves dividing the dataset into K subsets (or folds), where one subset is used as the test set and the remaining K-1 subsets are used to train the model. The procedure is repeated K times so that each fold serves once as the test set.
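The fold construction itself can be sketched independently of any particular tool. The generator below is a generic illustration of K-fold splitting, not BayesiaLab's implementation; the function name, the shuffling seed, and the sample count are assumptions.

```python
import random


def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(num_samples=10, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train)} test={sorted(test)}")
```

Averaging a metric over the K test folds gives a performance estimate in which every record is used for testing exactly once.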