5. Results and Discussion
This research uses ten-fold testing for the machine learning and CNN classification methodologies. Training uses 85% of the gene expression data, and the remaining 15% is used to test the model. Because the task is binary classification, the performance metrics are chosen accordingly. In binary classification problems, the confusion matrix is a valuable tool for evaluating the performance of a machine learning model. As mentioned earlier, the confusion matrix summarizes the model's predictions against the true labels of the data, counting true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The performance metrics used in this research (Accuracy, F1 score, MCC, Error Rate, and Kappa) are all derived from these values. The metrics are discussed in detail below.
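All of the metrics below are computed from the four confusion-matrix counts, which can be tallied directly from paired true and predicted labels. The following is a minimal Python sketch; the function name `confusion_counts`, the 0/1 label encoding with 1 as the positive class, and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally TP, TN, FP and FN for binary labels (assumes 1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1:
            # Actual positive: correct if predicted positive, otherwise a missed positive.
            counts["TP" if pred == 1 else "FN"] += 1
        else:
            # Actual negative: correct if predicted negative, otherwise a false alarm.
            counts["TN" if pred == 0 else "FP"] += 1
    return counts


# Hypothetical labels for eight test samples.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # {'TP': 4, 'TN': 2, 'FP': 1, 'FN': 1}
```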
Accuracy
The accuracy of a classifier measures how well it identifies the class labels of a dataset. It is calculated by dividing the number of correctly classified instances by the total number of instances in the dataset. The equation for accuracy is given by Wilson et al. [35]:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
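As a minimal sketch, accuracy is (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP and FN are the true-positive, true-negative, false-positive and false-negative counts. The counts used here (TP = 40, TN = 30, FP = 10, FN = 20) are hypothetical and are reused in the sketches for the other metrics so the results can be compared.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of instances whose predicted label matches the true label."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical confusion-matrix counts (not taken from the paper's tables).
print(accuracy(tp=40, tn=30, fp=10, fn=20))  # 0.7
```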
F1 Score
The F1 score is a measure of a classifier's accuracy that combines precision and recall into a single metric. It is calculated as the harmonic mean of precision and recall, with values ranging from 0 to 1, where 1 indicates perfect precision and recall. The equation for the F1 score is given by Koizumi et al. [36]:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Here, precision is the proportion of true positives among all instances classified as positive, and recall is the proportion of true positives among all actual positive instances:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
The F1 score is useful when the classes in the dataset are imbalanced, that is, when one class has more instances than the other. In such cases, accuracy can be misleading: a classifier that always predicts the majority class achieves high accuracy while failing to detect any minority-class instances, so its precision and recall for the minority class are zero. The F1 score provides a more balanced measure of a classifier's performance.
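The sketch below computes F1 as the harmonic mean of precision TP/(TP + FP) and recall TP/(TP + FN), then illustrates the imbalance argument above with a hypothetical 90:10 dataset in which the minority class is treated as positive. The counts and the convention of returning 0 when there are no true positives are assumptions for illustration.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives at all
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 40/50 = 0.8, recall = 40/60 ≈ 0.667.
print(round(f1_score(tp=40, fp=10, fn=20), 4))  # 0.7273

# Imbalance illustration: 90 negatives, 10 positives, and a classifier that
# always predicts "negative". Accuracy is (0 + 90) / 100 = 0.9, yet F1 is 0.
tp, tn, fp, fn = 0, 90, 0, 10
print((tp + tn) / (tp + tn + fp + fn), f1_score(tp, fp, fn))  # 0.9 0.0
```

The second example shows why a high accuracy on imbalanced data says little on its own: the classifier never finds a positive instance, which F1 exposes immediately.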
Matthews Correlation Coefficient (MCC)
The Matthews Correlation Coefficient (MCC) measures the quality of binary (two-class) classification models. It accounts for all four entries of the confusion matrix (true and false positives and negatives) and is particularly useful when the classes are imbalanced.
The MCC is defined by the following equation, as given in Chicco et al. [37]:
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
The MCC takes values between -1 and 1, where a coefficient of 1 represents a perfect prediction, 0 represents a prediction no better than random, and -1 represents a perfectly incorrect prediction.
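A minimal Python sketch of MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)) follows. Returning 0 when any marginal sum is zero (which makes the denominator zero) is an assumed convention, and the counts are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: if any row or column of the matrix is empty, report 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(round(mcc(tp=40, tn=30, fp=10, fn=20), 4))  # 0.4082  (= 1 / sqrt(6))
print(mcc(tp=50, tn=50, fp=0, fn=0), mcc(tp=0, tn=0, fp=50, fn=50))  # 1.0 -1.0
```

The last line reproduces the two extremes described above: a perfect prediction gives 1 and a perfectly inverted one gives -1.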
Error Rate
As mentioned by Duda et al. [38], the error rate of a classifier is the proportion of misclassified instances. It can be calculated using the following equation:
$$\text{Error Rate} = \frac{FP + FN}{TP + TN + FP + FN} = 1 - \text{Accuracy}$$
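The error rate is the complement of accuracy, (FP + FN) / (TP + TN + FP + FN). This is why the accuracy and error-rate pairs reported in the tables sum to about 100%. The sketch below checks the identity on hypothetical counts.

```python
def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of instances that are misclassified (false positives plus false negatives)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (fp + fn) / total


counts = dict(tp=40, tn=30, fp=10, fn=20)  # hypothetical counts
err = error_rate(**counts)
acc = (counts["tp"] + counts["tn"]) / sum(counts.values())
print(err, round(err + acc, 10))  # 0.3 1.0
```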
Kappa
The kappa statistic, also known as Cohen's kappa, measures agreement between two raters or between a rater and a classifier. In the context of classification, it is used to evaluate the performance of a classifier on a binary or multi-class classification task. The kappa statistic measures the agreement between the predicted and true classes while accounting for the possibility of agreement by chance. Cohen et al. [39] defined kappa as follows:
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance. For a binary confusion matrix with $N = TP + TN + FP + FN$ instances, $p_o$ and $p_e$ are calculated as follows:
$$p_o = \frac{TP + TN}{N}, \qquad p_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{N^2}$$
The kappa statistic takes values between -1 and 1: values greater than 0 indicate agreement better than chance, 0 indicates agreement equivalent to chance, and values less than 0 indicate agreement worse than chance. The results are tabulated in the following tables.
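The sketch below computes Cohen's kappa for a 2×2 confusion matrix as (p_o − p_e) / (1 − p_e), where p_o = (TP + TN)/N is the observed agreement and p_e sums, over both classes, the product of the predicted and actual class rates. Raising an error when p_e = 1 is an assumed way of handling the undefined case, and the counts are hypothetical.

```python
def cohen_kappa(tp: int, tn: int, fp: int, fn: int) -> float:
    """Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("the confusion matrix is empty")
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement: (predicted rate) x (actual rate) for each class, summed.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts: p_o = 0.7, p_e = 0.5*0.6 + 0.5*0.4 = 0.5, kappa = 0.4.
print(round(cohen_kappa(tp=40, tn=30, fp=10, fn=20), 4))  # 0.4
```

With the same hypothetical counts, accuracy is 0.7 but kappa is only 0.4, because half of the agreement would be expected by chance alone.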
Table 8 presents the performance analysis of the classifiers in terms of Accuracy, F1 Score, MCC, Error Rate, and Kappa for the STFT Dimensionality Reduction method without feature selection. Table 8 shows that the GMM Classifier is the best-performing classifier, with an accuracy of 80.66%, an F1 Score of 87.71%, and a low error rate of 19.34%. The GMM Classifier also achieves moderate MCC and Kappa values of 0.4419 and 0.4285. The Softmax Discriminant (SD) Classifier performs worst, with a low accuracy of 59.11%, a high Error Rate of 40.89%, and an F1 Score of 69.67%. The MCC and Kappa values of the SD Classifier are 0.2083 and 0.1609, respectively.
Table 9 presents the performance analysis of the classifiers for the STFT Dimensionality Reduction method with PSO feature selection. Table 9 shows that the SVM (RBF) Classifier achieved a high accuracy of 94.47% and an F1 Score of 96.62%, with a low error rate of 5.52%. The SVM (RBF) Classifier also reached high MCC and Kappa values of 0.81709 and 0.81485. The Non-Linear Regression classifier is at the lower end, with a low accuracy of 59.91%, a high Error Rate of 43.09%, and an F1 Score of 68.03%. The MCC and Kappa values of the Non-Linear Regression classifier are 0.1496 and 0.1156, respectively.
Table 10 presents the performance analysis of the classifiers for the STFT Dimensionality Reduction method with Harmonic Search feature selection. As shown in Table 10, the SVM (RBF) Classifier achieved a high accuracy of 90.05% and an F1 Score of 93.75%, with a low error rate of 9.94%. The SVM (RBF) Classifier also maintains high MCC and Kappa values of 0.711 and 0.6963. The SVM (Polynomial) classifier performs worst, with an accuracy of 59.11%, a high Error Rate of 40.89%, and an F1 Score of 70.4%. The MCC and Kappa values of the SVM (Polynomial) classifier are 0.1512 and 0.1217, respectively.
Figure 9 shows the performance of the classifiers with and without feature selection in terms of the MCC and Kappa parameters. As Figure 9 indicates, the SVM (RBF) Classifier with PSO feature selection reaches the highest MCC and Kappa values of 0.81709 and 0.81485, whereas the Non-Linear Regression classifier with PSO feature selection attains low MCC and Kappa values of 0.1496 and 0.1156. Without feature selection, the average MCC and Kappa values across the classifiers are 0.3212 and 0.27403, respectively. With PSO feature selection, these averages are 0.4194 and 0.3878, and with Harmonic Search feature selection, 0.4841 and 0.4667. Feature selection therefore improves the average MCC and Kappa values across the classifiers.
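The improvement from feature selection can be quantified from the reported averages. The Python sketch below computes the absolute and relative gain of each averaged metric. It assumes that the pair 0.3212 / 0.27403 is the no-feature-selection baseline and that the remaining pairs belong to PSO and Harmonic Search in the order they are listed; this mapping is an interpretation of the reported figures.

```python
# Average MCC and Kappa across the classifiers, as quoted in the text.
# Assumed mapping: first pair = no feature selection, then PSO, then Harmonic Search.
BASELINE = {"MCC": 0.3212, "Kappa": 0.27403}
WITH_FS = {
    "PSO": {"MCC": 0.4194, "Kappa": 0.3878},
    "Harmonic Search": {"MCC": 0.4841, "Kappa": 0.4667},
}


def average_gains(baseline, with_fs):
    """Absolute and relative change of each averaged metric versus the baseline."""
    gains = {}
    for method, scores in with_fs.items():
        gains[method] = {}
        for metric, value in scores.items():
            delta = value - baseline[metric]
            gains[method][metric] = (delta, delta / baseline[metric])
    return gains


for method, metrics in average_gains(BASELINE, WITH_FS).items():
    for metric, (delta, relative) in metrics.items():
        print(f"{method:<16} {metric:<5} +{delta:.4f} ({relative:.1%} relative)")
```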
Figure 10 shows the performance of the classifiers with and without feature selection in terms of Accuracy, F1 Score, and Error Rate. As Figure 10 shows, the SVM (RBF) Classifier with the PSO feature selection method achieves a higher Accuracy (94.47%), a higher F1 Score (96.62%), and a lower Error Rate (5.52%) than all other classifier configurations. With the Harmonic Search feature selection method, the SVM (RBF) classifier again attains good values compared with the other six classifiers, with an Accuracy of 90.05%, an F1 Score of 93.75%, and a low Error Rate of 9.944%. The GMM Classifier attains an appreciable Accuracy of 80.66%, an F1 Score of 87.71%, and a moderate Error Rate of 19.33% for the STFT inputs without any feature selection. Feature selection thus improves the classifiers' benchmark parameters and overall performance. PSO feature selection retains the top position, outperforming the Harmonic Search feature selection method for the best classifier (SVM (RBF) accuracy of 94.47% versus 90.05%).
Table 11 presents the performance analysis of the classifiers for raw microarray gene data with CNN methods. As Table 11 shows, the SVM (RBF) Classifier achieved the highest accuracy of 90.607% and an F1 Score of 94.56%, with a low error rate of 9.39%. The SVM (RBF) Classifier also attains moderate MCC and Kappa values of 0.6329 and 0.6031. With the CNN method, all seven classifiers exceed 83% accuracy and 90% F1 Score. The Softmax Discriminant Classifier attains the minimum MCC and Kappa values of 0.5016 and 0.4616, respectively.
Table 12 presents the performance analysis of the classifiers for the STFT Dimensionality Reduction method with CNN methods. Table 12 shows that the Softmax Discriminant Classifier (SDC) achieved a high accuracy of 91.66% and an F1 Score of 95.08%, with a low error rate of 8.33%. The SD Classifier also attains high MCC and Kappa values of 0.6825 and 0.6785. All seven classifiers exceed 86% accuracy and 92% F1 Score. The Naïve Bayesian Classifier also achieved an accuracy of 91.66%, with MCC and Kappa values of 0.6742 and 0.625, respectively. Table 12 further shows that STFT input to the CNN method enhances the classifiers' performance metrics compared with raw input to the CNN method.
Figure 11 depicts the performance of the classifiers in terms of the MCC and Kappa parameters for raw and STFT inputs to the CNN method. As Figure 11 indicates, the SD Classifier with STFT inputs reaches high MCC and Kappa values of 0.6825 and 0.6785, respectively, and the average MCC and Kappa values across the classifiers for STFT inputs are 0.5236 and 0.485. For raw inputs to the CNN method, the SVM (RBF) Classifier attains good MCC and Kappa values of 0.6329 and 0.6031, and the average MCC and Kappa values across the classifiers are 0.4794 and 0.4226, respectively. Feeding STFT scalogram inputs to the CNN therefore raises the average MCC and Kappa values across the classifiers.
Figure 12 shows the performance of the classifiers in terms of Accuracy, F1 Score, and Error Rate for raw and STFT inputs with the CNN method. As Figure 12 shows, the SVM (RBF) Classifier with raw input to the CNN method reaches a high accuracy of 90.6%, an F1 Score of 94.56%, and a low error rate of 9.39%. For STFT inputs with the CNN method, the SD Classifier attains a high accuracy of 91.66%, an F1 Score of 95.08%, and a low Error Rate of 8.33%. STFT inputs to the CNN method thus improve the accuracy, F1 Score, and Error Rate values across the classifiers.
Figure 13 displays the deviation of the classifiers' MCC and Kappa values from their mean values. MCC and Kappa are the benchmark parameters used to compare the classifiers' outcomes for different inputs. In this research, the inputs to the classifiers fall into two categories: STFT features (without feature selection, with PSO feature selection, and with Harmonic Search feature selection) and CNN methods (with raw microarray gene inputs and with STFT scalogram inputs). Classifier performance is assessed through the MCC and Kappa values attained for these inputs. The average MCC and Kappa values across the classifiers are 0.4681 and 0.44518, respectively. To locate each classifier's performance relative to these averages, the deviation of its MCC and Kappa values from the respective mean values is plotted. Figure 13 shows that points in the third quadrant of the graph (both deviations negative) correspond to classifiers with below-average performance metrics, while points in the first quadrant correspond to classifiers whose MCC and Kappa values both exceed the average. This indicates that classifier performance improves for STFT inputs combined with the PSO feature selection method.
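Figure 13 also shows a linear fit to the deviation points, with the equation y = 1.062x + 1E-07 and R² = 0.991. The near-zero intercept is expected: both axes are deviations from their means, and a least-squares line always passes through the mean of the data, which here is the origin.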
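The deviation analysis of Figure 13 can be sketched as follows: subtract the mean MCC and mean Kappa from each classifier's values, assign each point to a quadrant, and fit a least-squares line. The sketch uses only the (MCC, Kappa) pairs quoted in this section, which are a subset of all classifier and input combinations, so its means, slope, and R² will differ from the reported 0.4681, 0.44518, 1.062, and 0.991. Treating the MCC deviation as x and the Kappa deviation as y is also an assumption. Abbreviations: NLR = Non-Linear Regression, SDC = Softmax Discriminant Classifier, NBC = Naïve Bayesian Classifier, HS = Harmonic Search.

```python
from statistics import mean

# (MCC, Kappa) pairs quoted in the text; a subset of all classifier/input combinations.
PAIRS = {
    "GMM, STFT": (0.4419, 0.4285),
    "SDC, STFT": (0.2083, 0.1609),
    "SVM (RBF), STFT + PSO": (0.81709, 0.81485),
    "NLR, STFT + PSO": (0.1496, 0.1156),
    "SVM (RBF), STFT + HS": (0.711, 0.6963),
    "SVM (Poly), STFT + HS": (0.1512, 0.1217),
    "SVM (RBF), raw + CNN": (0.6329, 0.6031),
    "SDC, raw + CNN": (0.5016, 0.4616),
    "SDC, STFT + CNN": (0.6825, 0.6785),
    "NBC, STFT + CNN": (0.6742, 0.625),
}


def deviations(pairs):
    """Deviation of each (MCC, Kappa) pair from the mean MCC and mean Kappa."""
    mcc_mean = mean(m for m, _ in pairs.values())
    kappa_mean = mean(k for _, k in pairs.values())
    return {name: (m - mcc_mean, k - kappa_mean) for name, (m, k) in pairs.items()}


def quadrant(dx, dy):
    """Quadrant of a deviation point; points on an axis count as non-negative."""
    if dx >= 0:
        return "I" if dy >= 0 else "IV"
    return "II" if dy >= 0 else "III"


def linear_fit(points):
    """Ordinary least-squares line y = a*x + b and its coefficient of determination R^2."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # ~0 because the deviations are mean-centred
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


devs = deviations(PAIRS)
for name, (d_mcc, d_kappa) in devs.items():
    print(f"{name:<24} quadrant {quadrant(d_mcc, d_kappa)}")
slope, intercept, r2 = linear_fit(list(devs.values()))
print(f"Kappa dev = {slope:.3f} * MCC dev + {intercept:.1e}, R^2 = {r2:.3f}")
```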
5.1. Computational Complexity
Computational complexity analysis assesses the efficiency of machine learning methods in terms of time and space. In this research, Big O notation represents the computational complexity of the dimensionality reduction, feature selection, and classification methodologies, expressed as a function of n, the number of features; for example, $O(n)$ denotes a cost that grows linearly with n. $O(\log_2 n)$ means that the cost grows in proportion to $\log_2 n$, so doubling n adds only a constant increment, since $\log_2(2n) = \log_2 n + 1$. The classifiers considered in this research are integrated with dimensionality reduction techniques, feature selection techniques, or a combination of both, so their computational complexity is also a combination of these hybrid methods. Overall, the choice of methodology should weigh computational complexity against the desired level of classification performance.
Table 13 shows the computational complexity of the classifiers for the STFT dimensionality reduction method with and without feature selection methods, along with the CNN models. As Table 13 shows, the SVM (Linear), NBC, Nonlinear Regression, Softmax Discriminant, and SVM (RBF) classifiers have low computational complexity for the STFT DR method without feature selection. The GMM Classifier has a moderate complexity of $O(2n^{4}\log_{2} n)$ without feature selection and achieves a good accuracy of 80.66%. For the PSO and Harmonic Search feature selection methods, the SVM (RBF) classifier, with a computational complexity of $O(2n^{5}\log_{4} n)$, achieves high accuracies of 94.47% and 90.05%, respectively. The GMM and SVM (Polynomial) classifiers have a high computational complexity of $O(2n^{7}\log_{2} n)$ for the PSO and Harmonic Search feature selection methods and perform poorly in terms of accuracy. These poor performances are attributed to outliers in the STFT features. To enhance the performance of the classifiers, STFT scalogram features and CNN methods are included in this research, and the SVM (RBF) Classifier attains good performance with moderate complexities of $O(2n^{3}\log_{4} n)$ and $O(2n^{3}\log_{8} n)$ for these methods, respectively. Next, Table 14 compares the accuracy of our research work with various methodologies across diverse microarray gene datasets employed for Adenocarcinoma and mesothelioma lung cancer.
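To compare the complexity expressions from Table 13 more concretely, the sketch below evaluates each expression, as printed, for a few values of n and sorts them. Reading "log4n" as log base 4 of n (and similarly for bases 2 and 8) is an assumption. Big O notation drops constant factors, and logarithms of different bases differ only by a constant factor, so this is a rough comparison of growth rather than a runtime estimate.

```python
import math

# Cost expressions from Table 13 as quoted in the text: (power of n, log base).
# Interpreting "log4n" as log base 4 of n (and likewise for 2 and 8) is an assumption.
COMPLEXITY = {
    "GMM, STFT, no FS: 2n^4 log2 n": (4, 2),
    "SVM (RBF), STFT + PSO/HS: 2n^5 log4 n": (5, 4),
    "GMM / SVM (Poly), STFT + PSO/HS: 2n^7 log2 n": (7, 2),
    "SVM (RBF), 2n^3 log4 n": (3, 4),
    "SVM (RBF), 2n^3 log8 n": (3, 8),
}


def cost(n: int, power: int, base: int) -> float:
    """Evaluate 2 * n**power * log_base(n): a growth indicator, not a runtime estimate."""
    return 2 * n ** power * math.log(n, base)


for n in (16, 64, 256):
    print(f"n = {n}")
    # List the expressions from cheapest to most expensive at this n.
    for label, (power, base) in sorted(COMPLEXITY.items(), key=lambda kv: cost(n, *kv[1])):
        print(f"  {label:<48} {cost(n, power, base):.3e}")
```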
Overall, this research evaluates machine learning and CNN classifiers for detecting lung cancer from microarray gene expressions. The investigation shows that although CNNs are good at learning features, their black-box nature makes it difficult to interpret the physical meaning of the learned features when predicting and categorizing gene expression patterns. Further, the LH2 dataset used in this research is imbalanced, and during the analysis the CNNs were repeatedly prone to overfitting and underfitting. To rectify these issues, a feature extraction step was added and the softmax layer was replaced with suitable classifiers. The trade-off is that computational complexity increases as more stages are added to the CNN classification methodology.
Our research shows that microarray gene expression data is instrumental for disease classification. Such data contributes significantly to understanding gene expression patterns and their association with various biological processes and diseases. However, microarrays have some limitations compared with sequencing-based approaches such as mRNA-Seq. Microarrays measure expression only for a predefined set of genes or tissue regions, so the data may not capture the entire transcriptome or provide comprehensive information about gene expression. Microarray experiments are also prone to background noise and cross-hybridization caused by non-specific binding, which decreases sensitivity and specificity. Finally, microarray experiments can be expensive and sometimes have limited throughput compared with newer high-throughput sequencing technologies such as mRNA-Seq.