A Novel Approach of CT Images Feature Analysis and Prediction to Screen for Corona Virus Disease (COVID-19)

The paper demonstrates the analysis of Coronavirus Disease (COVID-19) based on a probabilistic model. It involves a technique for classification and prediction that recognizes typical and diagnostically important CT image features related to the coronavirus. The main contributions of the research include predicting the probability of recurrence in no-recurrence (first-time detection) cases by applying our proposed feature extraction approach. A combination of conventional statistical and machine learning tools is applied to extract features from CT images through four image filters, in combination with the proposed composite hybrid feature selection (CHFS) model. The selected features were classified by the stack hybrid classification (SHC) system. An experimental study with real data demonstrates the feasibility and potential of the proposed approach for this purpose.


Introduction
Data mining techniques are used in the biomedical sciences to provide predictions that help identify diseases and classify them correctly [1]. Screening large numbers of reported cases for successful isolation and treatment is a priority in controlling the spread of Coronavirus Disease. Pathogenic laboratory testing is the scientific gold standard, but it is time-consuming and yields a significant number of false-negative results. There is an urgent need for quick and accurate diagnostic methods to combat the disease. Based on the radiographic changes that COVID-19 produces in CT scans, we tried to create a deep learning algorithm that could extract the graphical characteristics of COVID-19 to provide a clinical diagnosis ahead of pathogenic testing and thus save critical time for disease control [2][3]. According to a new case report, computed tomography can detect abnormalities in patients with laboratory-confirmed coronavirus even before clinical symptoms emerge [51][52][53]. This is yet another critical piece of evidence for the central role of the modality in stopping the lethal epidemic. The case, reported in Clinical Imaging on February 22, concerns a 61-year-old asymptomatic man admitted to a Chinese hospital 1,000 miles outside Wuhan after reporting close contact with an infected person [60]. In addition to identifying early abnormalities, CT showed a finding that had not been seen in any other COVID-19 case. We obtained 250 CT images of pathogen-confirmed COVID-19 cases, as well as of cases previously diagnosed with standard viral pneumonia and SARS, from the Kaggle database website. Our proposed hybrid feature extraction with four filters (MPEG-7 edge histogram, Gabor filter, pyramid of rotation-invariant local binary pattern histograms, and fuzzy 64-bin histogram) analyzes low-level image features, extracts them, and supports statistical hypothesis testing [40][41][42][43][44][45][46][47][48][49][50].
Our proposed model uses composite hybrid feature selection (CHFS) to achieve high prediction accuracy and improve the feature extraction methods [2][3][4][5][6][7][8][9][10][11][12][13][14][15], together with hybrid classification techniques that combine multiple classifiers to enable a more in-depth investigation. Statistical testing is employed to determine which feature vectors or elements are most informative for differentiating image classes. We also use a convolutional neural network (CNN), which requires relatively little pre-processing compared with other image classification algorithms and traditional classifiers. The article is organized as follows. The next section reviews the work of other authors who have used data mining and related machine learning algorithms to analyze coronavirus. Section 3 describes the proposed technique for feature extraction from CT image datasets with the CHFS model and four image filters. Section 4 describes the stack hybrid classification process and the convolutional neural network (CNN) in comparison with traditional classifiers, whereas Section 5 describes the experiments and evaluation. Section 6 discusses the results. Finally, Section 7 summarizes the paper and presents the conclusions.

Literature Review
Fei Shan et al. developed a deep learning (DL)-based segmentation system that uses the "VB-Net" neural network to segment COVID-19 infection regions in CT scans. The system was trained on 249 patients with COVID-19 and tested on 300 new patients with COVID-19. A human-in-the-loop (HITL) strategy was introduced to help radiologists refine the automatic annotation of each case, speeding up the manual delineation of CT images for analysis. To assess the performance of the DL-based system, the Dice similarity coefficient, volume differences, and percentage of infection (POI) were calculated between automatic and manual segmentations on the validation set.

Xiaowei Xu et al. observed that real-time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swabs has a relatively low positive rate in the early stage of COVID-19 (as named by the World Health Organization). The computed tomography (CT) manifestations of COVID-19 have characteristics that differ from those of other forms of viral pneumonia, such as influenza-A viral pneumonia. Clinicians therefore urgently need additional clear diagnostic criteria for this new type of pneumonia. Yicheng Fang et al. found that the sensitivity of chest CT was higher than that of RT-PCR (98% vs. 71%, respectively, p<.001). Possible explanations for the low efficiency of viral nucleic acid detection include: 1) immature nucleic acid detection technology; 2) variability in detection rates among manufacturers; 3) low viral load in patients; and 4) improper clinical sampling. Xingzhi Xie and Zheng Zhong observed that most cases show similar features on CT images, such as ground-glass opacities (GGOs) or mixed and coalescent GGOs. 2019-nCoV pneumonia tends to spread peripherally, with longitudinal, multifocal involvement of the lower lungs [6][7][8].
Even when RT-PCR results are negative, CT features of viral pneumonia may be highly suspicious for 2019-nCoV infection in patients with a typical clinical presentation and exposure to other persons with 2019-nCoV. In these cases, it is essential to consider repeat swab testing and patient isolation. Widely used image edge detection algorithms include first-order differential operators (e.g., the Roberts, Prewitt, Sobel, and Canny operators) and second-order differential operators (e.g., the Laplacian and LoG operators), and they extend to a wide variety of applications. By combining the Roberts operator with mathematical morphology, Wang and Liu detected vehicle image edges and located vehicle license plates. While the detection algorithms mentioned above are simple, easy to implement, and perform well in real time, they also have obvious shortcomings. Edges extracted by the Roberts operator are relatively coarse, and their positions are imprecise. Edges extracted by the Prewitt operator are wide and contain many discontinuities. Likewise, the Sobel operator does not locate image edges precisely. The Laplacian operator is highly sensitive to noise, and the LoG operator cannot remove salt-and-pepper noise from an image.
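The first- and second-order operators named above can be illustrated with a short sketch. The Python code below assumes NumPy 1.20 or later (for `sliding_window_view`) and applies the standard 3 × 3 Sobel and Laplacian kernels as a "valid" correlation without padding; the helper names `filter2d`, `sobel_magnitude`, and `laplacian` are hypothetical and not part of the authors' pipeline.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Standard 3x3 kernels: Sobel (first-order) and Laplacian (second-order)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)


def filter2d(img, kernel):
    """'Valid' 2-D correlation of an image with a small kernel (no padding)."""
    windows = sliding_window_view(np.asarray(img, dtype=float), kernel.shape)
    return np.einsum("xyij,ij->xy", windows, kernel)


def sobel_magnitude(img):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx, gy = filter2d(img, SOBEL_X), filter2d(img, SOBEL_Y)
    return np.hypot(gx, gy)


def laplacian(img):
    """Second-order response; edges appear as sign changes (zero crossings)."""
    return filter2d(img, LAPLACIAN)
```

On a vertical step edge, the Sobel magnitude gives a response two pixels wide, while the Laplacian changes sign across the step. This illustrates why first-order operators localize edges only coarsely and why second-order operators, which respond to curvature, are more easily disturbed by noise.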

Proposed Work
In the proposed work, the CT image dataset was collected from an openly accessible Kaggle benchmark dataset. The medical dataset contains CT images of older adults with SARS and COVID-19; the proposed layout is shown in Figs. 1 and 2. The following steps explain the mechanism of the proposed work on the CT image dataset:

Data Collection
The CT image dataset contains two classes of images, in both the training and the testing sets, with a total of about 51 images (Kaggle benchmark dataset, openly accessible, 2020): i. COVID-19; ii. SARS.

Data Pre-processing
In the real world, collected data tend to be incomplete, noisy, and inconsistent. Detecting missing data and irregularities, preventing errors, and reducing the volume of data to be analyzed can therefore pay off substantially in decision making (H. Reducing the number of features brings four benefits: i. simpler models that are easier for researchers and users to understand; ii. shorter training periods; iii. avoidance of the curse of dimensionality; iv. improved generalization through reduced overfitting.
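As a minimal illustration of the cleaning step, the sketch below assumes the extracted features are stored in a NumPy array with missing values encoded as NaN. It fills each gap with its column mean and rescales every feature to [0, 1]. Mean imputation and min-max scaling are common choices assumed here; the source does not state which cleaning steps the authors used.

```python
import numpy as np


def preprocess(X):
    """Mean-impute missing values (NaN) and min-max scale each column to [0, 1].

    Assumption: missing entries are NaN and every column has at least one value.
    """
    X = np.array(X, dtype=float)
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]          # fill gaps with the column mean
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return (X - lo) / span
```

Scaling matters here because the four image filters produce features on very different numeric ranges, and distance-based learners would otherwise be dominated by the largest-valued features.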

3.3.Proposed Hybrid Feature Selection Approach on CT Images
A variety of approaches are used to extract features from images. Some of these rely on the mean and difference values obtained with a bi-orthogonal wavelet filter, and others on shape-based image retrieval. Edges are an essential image feature: they lie between an object and the background, and between two targets, two regions, or two primitives. Much of the detail of an image is carried by its edges. An image edge is usually a set of pixels at which the gray-level values show a step change (Jianfang Cao, 2018) [12][13][14][15][16][17][18][19][20][21][22].

3.3.1.MPEG7 Histogram Filter
The histogram is the most widely used way to describe the global composition of an image. It is invariant to translation and rotation of the image, and normalizing the histogram also makes it invariant to image size. According to the edge histogram descriptor in MPEG-7, which consists of a local 5-bin edge histogram for each of the 4 × 4 sub-images, additional histogram bins can easily be derived from these local bins. Statistical hypothesis testing is employed to determine which feature vectors or elements are most informative for differentiating image classes. Because of these properties, the histogram is very useful for indexing and retrieving images.
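The sketch below gives a simplified version of the MPEG-7 edge histogram idea described above. The image is split into 4 × 4 sub-images; each image block is reduced to its 2 × 2 sub-block means; the five MPEG-7 edge filters (vertical, horizontal, 45°, 135°, non-directional) are applied; and the strongest edge type above a threshold is counted. The 8-pixel block size and the threshold of 11 are assumptions (the standard derives the block size from the image size), and only 5 global bins are added as "extra" bins, without the semi-global bins.

```python
import numpy as np

S2 = np.sqrt(2)
# 2x2 edge filters applied to the sub-block means of each image block
EDGE_FILTERS = np.array([
    [[1, -1], [1, -1]],   # vertical
    [[1, 1], [-1, -1]],   # horizontal
    [[S2, 0], [0, -S2]],  # 45-degree diagonal
    [[0, S2], [-S2, 0]],  # 135-degree diagonal
    [[2, -2], [-2, 2]],   # non-directional
])


def edge_histogram(img, block=8, threshold=11.0):
    """80 local bins (16 sub-images x 5 edge types) plus 5 global bins."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    half = block // 2
    local = np.zeros((4, 4, 5))
    for si in range(4):
        for sj in range(4):
            sub = img[si * h // 4:(si + 1) * h // 4, sj * w // 4:(sj + 1) * w // 4]
            for by in range(0, sub.shape[0] - block + 1, block):
                for bx in range(0, sub.shape[1] - block + 1, block):
                    b = sub[by:by + block, bx:bx + block]
                    # Mean intensity of the four sub-blocks of this image block
                    m = np.array([[b[:half, :half].mean(), b[:half, half:].mean()],
                                  [b[half:, :half].mean(), b[half:, half:].mean()]])
                    strengths = np.abs((EDGE_FILTERS * m).sum(axis=(1, 2)))
                    if strengths.max() >= threshold:
                        local[si, sj, strengths.argmax()] += 1
            n_blocks = (sub.shape[0] // block) * (sub.shape[1] // block)
            local[si, sj] /= max(n_blocks, 1)   # fraction of blocks per edge type
    global_bins = local.reshape(16, 5).mean(axis=0)
    return np.concatenate([local.reshape(-1), global_bins])
```

For an image with a single vertical step edge, only the vertical bins of the sub-images crossed by the edge are non-zero, and the global vertical bin collects their average.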

Gabor Image Filter
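The Gabor filter is used for texture analysis: it analyzes whether the image contains specific frequency content in specific directions within a local region around the point or area of analysis. It can be defined as a sinusoidal wave (a plane wave for 2D Gabor filters) multiplied by a Gaussian function. A set of Gabor filters with different frequencies and orientations can help extract useful features from an image. In the discrete domain, the two-dimensional Gabor filters are given by

$$G_c[i,j] = B\,e^{-\frac{i^2+j^2}{2\sigma^2}}\cos\!\big(2\pi f\,(i\cos\theta + j\sin\theta)\big) \qquad (2)$$

$$G_s[i,j] = C\,e^{-\frac{i^2+j^2}{2\sigma^2}}\sin\!\big(2\pi f\,(i\cos\theta + j\sin\theta)\big) \qquad (3)$$

where B and C are normalizing factors, f is the frequency of the sinusoid, θ is the orientation, and σ is the width of the Gaussian envelope.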
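To make the discrete Gabor filters concrete, the sketch below builds the even (cosine) and odd (sine) kernels: a Gaussian envelope exp(−(i² + j²)/(2σ²)) multiplied by the cosine or sine of 2πf(i cos θ + j sin θ), with B = C = 1 (an assumption, since the normalizing factors are left open). Each kernel pair is correlated with the image over a sliding window, and the magnitude response is summarized by its mean and standard deviation for every frequency and orientation. The frequencies, number of orientations, σ, and kernel size are illustrative defaults, not the authors' settings.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_pair(f, theta, sigma, half=7):
    """Even (cos) and odd (sin) discrete Gabor kernels with B = C = 1."""
    i, j = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(i**2 + j**2) / (2 * sigma**2))
    phase = 2 * np.pi * f * (i * np.cos(theta) + j * np.sin(theta))
    return envelope * np.cos(phase), envelope * np.sin(phase)


def gabor_features(img, freqs=(0.1, 0.2, 0.3), n_orient=4, sigma=3.0, half=7):
    """Mean and std of the Gabor magnitude for each (frequency, orientation)."""
    img = np.asarray(img, dtype=float)
    windows = sliding_window_view(img, (2 * half + 1, 2 * half + 1))
    feats = []
    for f in freqs:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            gc, gs = gabor_pair(f, theta, sigma, half)
            rc = np.einsum("xyij,ij->xy", windows, gc)   # even response
            rs = np.einsum("xyij,ij->xy", windows, gs)   # odd response
            mag = np.hypot(rc, rs)
            feats += [mag.mean(), mag.std()]
    return np.array(feats)
```

Combining the even and odd responses into a magnitude makes the feature insensitive to the phase of the texture, so a stripe pattern produces a strong response at its own frequency and orientation regardless of where the stripes start.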

3.3.3.Pyramid of Rotation-Invariant Local Binary Pattern Histograms Image Filter
The local binary pattern (LBP) is commonly used in texture classification. Many existing LBP methods show excellent texture classification performance, but they describe only micro-texture image structures such as edges, corners, and points. This limitation remains even when multi-resolution analysis is applied to local binary pattern methods (Ojala et al., 2002). The LBP texture operator has become a popular approach in various applications. It can be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis. For the LBP operator, the notation $LBP_{P,R}^{u2}$ is used. The subscript indicates that the operator is applied in a (P, R) neighborhood of P sampling points on a circle of radius R. The superscript u2 indicates that only uniform patterns are used, and all other patterns are assigned a single label. After obtaining the LBP-labeled image $f_l(x, y)$, the LBP histogram can be defined as

$$H_i = \sum_{x,y} I\{f_l(x,y) = i\}, \qquad i = 0, \ldots, n-1, \qquad (4)$$

where n is the number of different labels produced by the LBP operator, and I{A} is 1 if A is true and 0 if A is false. When image patches of different sizes are used to compute the histograms, the histograms must be normalized to obtain a coherent description:

$$N_i = \frac{H_i}{\sum_{j=0}^{n-1} H_j}$$
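The following sketch computes the rotation-invariant uniform operator LBP_{8,1}^{riu2} and a spatial pyramid of its normalized histograms, counting labels as in Eq. (4) and dividing each histogram by its total. As simplifying assumptions, the eight neighbours are taken from the 3 × 3 square around each pixel instead of interpolated circle points, and the pyramid uses 1 × 1 and 2 × 2 grids; the exact pyramid layout of the authors' filter is not specified in the text.

```python
import numpy as np

# 8 neighbours in circular order (R = 1, square approximation of the circle)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_riu2(img):
    """Label map of LBP_{8,1}^{riu2} for all interior pixels."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    c = img[1:-1, 1:-1]
    bits = np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= c
                     for dy, dx in OFFSETS]).astype(int)
    ones = bits.sum(axis=0)
    # Number of 0/1 transitions around the circular pattern
    transitions = np.abs(bits - np.roll(bits, 1, axis=0)).sum(axis=0)
    # Uniform patterns (<= 2 transitions) get their count of ones; others get P + 1
    return np.where(transitions <= 2, ones, len(OFFSETS) + 1)


def lbp_pyramid_histogram(img, levels=2):
    """Concatenate normalized riu2 histograms over 1x1, 2x2, ... grids."""
    labels = lbp_riu2(img)
    n_labels = len(OFFSETS) + 2              # labels 0 .. P + 1
    feats = []
    for lvl in range(levels):
        cells = 2 ** lvl
        for rows in np.array_split(labels, cells, axis=0):
            for cell in np.array_split(rows, cells, axis=1):
                hist = np.bincount(cell.ravel(), minlength=n_labels).astype(float)
                feats.append(hist / hist.sum())   # N_i = H_i / sum_j H_j
    return np.concatenate(feats)
```

Because a 90° rotation only shifts each pixel's neighbour ring cyclically, the whole-image (level 0) histogram is unchanged by it, which is the rotation invariance the filter name refers to.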

Fuzzy 64-bin Histogram Image Filter
The fuzzy 64-bin histogram is motivated by human color vision, which is usually not well represented in RGB. A better model of the human visual system (HVS) is the so-called opponent color space, which has three components (one luminance and two chrominance channels); the fuzzy 64-bin histogram is built on these components rather than on RGB.

Consequences of the opponent-channel contrast sensitivity function (CSF):
• Luminance channel: a bandpass function, with a wide bandwidth ⇒ high spatial resolution, and a low-frequency cut-off ⇒ insensitivity to the average luminance level.
• Chrominance channels: lowpass functions, with a lower bandwidth ⇒ low spatial resolution.
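A minimal sketch of a fuzzy 64-bin histogram in opponent color space follows. It uses the common opponent transform O1 = (R − G)/√2, O2 = (R + G − 2B)/√6, O3 = (R + G + B)/√3 (O3 being the luminance channel), splits each channel into 4 fuzzy bins with triangular membership functions, and accumulates 4 × 4 × 4 = 64 bins. The triangular memberships and the per-channel range scaling for 8-bit input are assumptions; the exact fuzzy partition of the authors' filter is not given in the text.

```python
import numpy as np


def rgb_to_opponent(rgb):
    """Opponent color transform; O3 is the luminance channel."""
    r, g, b = (rgb[..., k].astype(float) for k in range(3))
    o1 = (r - g) / np.sqrt(2)
    o2 = (r + g - 2 * b) / np.sqrt(6)
    o3 = (r + g + b) / np.sqrt(3)
    return np.stack([o1, o2, o3], axis=-1)


def triangular_memberships(x, n_bins=4):
    """Fuzzy membership of values in [0, 1] to evenly spaced bin centres."""
    centres = np.linspace(0.0, 1.0, n_bins)
    width = centres[1] - centres[0]
    return np.clip(1.0 - np.abs(x[..., None] - centres) / width, 0.0, 1.0)


def fuzzy_opponent_histogram(rgb):
    """64-bin (4 x 4 x 4) fuzzy histogram in opponent space, summing to 1."""
    opp = rgb_to_opponent(np.asarray(rgb)).reshape(-1, 3)
    # Assumed 8-bit input: rescale each channel from its theoretical range to [0, 1]
    lo = np.array([-255 / np.sqrt(2), -510 / np.sqrt(6), 0.0])
    hi = np.array([255 / np.sqrt(2), 510 / np.sqrt(6), 765 / np.sqrt(3)])
    u = (opp - lo) / (hi - lo)
    m = [triangular_memberships(u[:, k]) for k in range(3)]
    # Each pixel spreads a total weight of 1 over neighbouring bins
    hist = np.einsum("pa,pb,pc->abc", m[0], m[1], m[2]).ravel()
    return hist / hist.sum()
```

The fuzzy assignment is the point of the design: a pixel whose color lies between two bin centres contributes partially to both, so small color shifts change the histogram smoothly instead of flipping a pixel from one hard bin to another.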

3.3.5.Applying the Hybrid Feature Selection Architecture Model on CT Images
The authors combine the four filters in the feature extraction process and then select the optimal feature set from the CT images, as shown in Fig. 3.

Information Gain feature selection
The information gain of a single attribute is calculated according to the algorithm below (Aouatif Amine & ... & Rziza Driss, 2011) [28][29][30][31][32]. This gain measures the effect of each feature, and the algorithm selects the features whose gain is larger than a threshold.
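The information-gain filter can be sketched as follows, assuming the attributes have already been discretized (continuous filter outputs would first need binning, which is not shown). The function names are hypothetical, and the threshold is a free parameter; the gain is computed as IG(C, A) = H(C) − Σ_v |S_v|/|S| · H(C | A = v).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(C, A) for one discrete attribute A given as a list of values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_by_information_gain(columns, labels, threshold):
    """Indices of (discretized) attributes whose gain exceeds the threshold."""
    return [j for j, col in enumerate(columns)
            if information_gain(col, labels) > threshold]
```

An attribute that splits the two classes perfectly has a gain equal to the class entropy (1 bit for balanced binary labels), while an attribute independent of the class has a gain of zero.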

Gain Ratio Feature Selection
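A decision tree can take a simple form in which non-terminal nodes test one or more attributes and the leaves give the decision outcomes (J.R. Quinlan, 1986) [38]. The gain ratio normalizes the information gain of an attribute A by its split information:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}, \qquad \mathrm{SplitInfo}(A) = -\sum_{v} \frac{|S_v|}{|S|}\log_2\frac{|S_v|}{|S|},$$

where $S_v$ is the subset of the training samples $S$ for which A takes the value v.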
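A small sketch of the gain ratio for a discrete attribute, using the standard entropy-based definitions of gain and split information. The example shows how the split-information term penalizes an ID-like attribute with many distinct values, which plain information gain would favor; the function names are hypothetical.

```python
import math
from collections import Counter


def _entropy(items):
    """Shannon entropy (bits) of the value distribution in a sequence."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of attribute A divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    gain = _entropy(labels) - sum(len(g) / n * _entropy(g) for g in groups.values())
    split_info = _entropy(values)   # -sum |S_v|/|S| log2 |S_v|/|S|
    return gain / split_info if split_info > 0 else 0.0
```

Both a two-valued attribute that separates the classes and a four-valued "ID" attribute have an information gain of 1 bit on four balanced samples, but the gain ratio is 1.0 for the former and only 0.5 for the latter.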

Optimized Genetic Algorithm (OGA)
The authors propose a method that modifies a general genetic algorithm: it evaluates the specified attributes on the training data or a separate test set and uses a decision tree (J.R. Quinlan, 1986) [32][33][34][35][36][37][38] to estimate the 'merit' of a set of attributes, producing an optimized feature subset with a genetic search strategy. Every feature selection technique should use an evaluation function together with a search strategy to find the optimal feature set (Huang & C., 2012) [44]. Searching all subsets to find the optimal one is infeasible. To indicate whether a particular feature is present in a chromosome, binary encoding is used: a one in a gene position means that the feature is present, and a zero means that it is absent (Yanan Mao & Dingyuan Fan, 2016). The number of features in a chromosome, and which features they are, are guided by information gain (IG) and gain ratio (GR): the initial population is created using the IG and GR values of the features. After the population is generated, the individuals are evaluated using a fitness function. There is no general approach to choosing the fitness function of a genetic algorithm; it is heuristic and depends on the application. The authors therefore nominate a C4.5 classifier as the fitness function, because C4.5 can handle both continuous and discrete attributes as well as training data with missing attribute values, and it prunes trees after creation: C4.5 goes back through the tree once it has been built and tries to remove branches that do not help by replacing them with leaf nodes (Dash & H. Liu, 1997; J.R. Quinlan, 1986). The following algorithm selects features from the set of features obtained by OGA, gain ratio, and information gain, as shown in Fig. 5.
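The sketch below illustrates the OGA idea under clearly stated simplifications. Chromosomes are binary feature masks, the initial population is biased by prior filter scores (for example, averaged IG and GR values), and the GA uses tournament selection, single-point crossover, bit-flip mutation, and elitism. To keep the example dependency-free, a nearest-centroid hold-out accuracy with a small size penalty stands in for the C4.5 fitness function described above; that substitution and all parameter values are assumptions, not the authors' configuration.

```python
import numpy as np


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Hold-out accuracy of a nearest-centroid classifier (stand-in for C4.5)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_te).mean())


def ga_feature_selection(X, y, prior_scores, pop_size=20, generations=30,
                         p_mut=0.05, seed=0):
    """Return the best binary feature mask and the best fitness per generation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    idx = rng.permutation(n)                    # fixed hold-out split for fitness
    tr, te = idx[: 2 * n // 3], idx[2 * n // 3:]
    s = np.asarray(prior_scores, dtype=float)   # e.g. mean of IG and GR scores
    p_incl = 0.1 + 0.8 * (s - s.min()) / (np.ptp(s) + 1e-12)

    def fitness(mask):
        if not mask.any():
            return 0.0
        acc = nearest_centroid_accuracy(X[tr][:, mask], y[tr], X[te][:, mask], y[te])
        return acc - 0.001 * mask.sum()         # mild penalty on subset size

    # Initial population biased towards features with high prior scores
    pop = rng.random((pop_size, d)) < p_incl
    fit = np.array([fitness(m) for m in pop])
    history = [fit.max()]
    for _ in range(generations):
        children = [pop[fit.argmax()].copy()]   # elitism keeps the best mask
        while len(children) < pop_size:
            # Binary tournament selection for each parent
            a = rng.integers(pop_size, size=2)
            b = rng.integers(pop_size, size=2)
            p1, p2 = pop[a[fit[a].argmax()]], pop[b[fit[b].argmax()]]
            cut = rng.integers(1, d)            # single-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            child ^= rng.random(d) < p_mut      # bit-flip mutation
            children.append(child)
        pop = np.array(children)
        fit = np.array([fitness(m) for m in pop])
        history.append(fit.max())
    return pop[fit.argmax()], history
```

Seeding the population from the filter scores, rather than initializing it uniformly at random, is what lets prior IG/GR information steer the search from the first generation; elitism guarantees that the best fitness never decreases between generations.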

Proposed Stack Hybrid Classification Model
The Weka software tool (Weka, open source, accessed online 2018) [16] provides a list of black-box classifiers. These algorithms are generally used to classify the medical dataset.

Two evaluation options can be used to evaluate the classifiers on the dataset:
• Training set: the dataset is separated into training and test data.
• Cross-validation: in this case, 10-fold cross-validation (Divya Jain, Vijendra Singh, 2018).
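Ten-fold cross-validation can be sketched as below. The folds are stratified, which is an assumption (stratification keeps the COVID-19/SARS proportions similar in each fold); `train_and_score` is a hypothetical callback that trains on the given training indices and returns a score on the test indices.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        for i in idx:                     # deal indices round-robin across folds
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_validate(labels, k, train_and_score):
    """Average score over k folds; each fold is used once as the test set."""
    scores = []
    for fold in stratified_k_folds(labels, k):
        test = set(fold)
        train = [i for i in range(len(labels)) if i not in test]
        scores.append(train_and_score(train, sorted(test)))
    return sum(scores) / len(scores)
```

With only about 51 images, each fold holds five or six images, so every image is used for testing exactly once, which gives a more stable accuracy estimate than a single train/test split.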

Stacking technique
Ensemble methods are learning methods that use a set of classifiers and classify data by taking a weighted vote of their predictions (Leo Breiman, 1996) [25]. The authors combine multiple classifiers to maximize classification accuracy and overcome the weaknesses of individual classifiers when classifying potential patients, as shown in Fig. 6.
Fig. 6. The proposed framework of stack hybrid classification based on the CHFS model.
The authors chose JRip and RF as the fixed base classifiers based on the results in Table 2, which give a good indication for the fitness function problem. The Weka Explorer is used for exploring the data (Weka, 2019) [16]; with the Explorer, the data can be preprocessed, result-oriented attributes can be selected, and the results can be visualized. The Experimenter is used to study the learning curve of the stack hybrid classification based on CHFS and to compare its results with those of individual traditional classifiers without CHFS. The Knowledge Flow interface lets the user arrange learning algorithms and data sources into the required configuration. It enables the user to specify the data flow by connecting components that represent data sources, preprocessing tools, learning algorithms, evaluation methods, and visualization modules [55-75].
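The stacking idea can be sketched without Weka as follows. Base learners are trained with k-fold cross-validation so that the meta-classifier (a Gaussian naive Bayes, mirroring the Naïve Bayes meta-classifier reported later in the paper) learns from out-of-fold predictions rather than from predictions on data the base learners have already seen. A nearest-centroid classifier and a k-nearest-neighbour classifier stand in for JRip and Random Forest, which are not reimplemented here; class labels are assumed to be non-negative integers.

```python
import numpy as np


class NearestCentroid:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids[None]) ** 2).sum(axis=2)
        return self.classes[d.argmin(axis=1)]


class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.X[None]) ** 2).sum(axis=2)
        nearest = self.y[np.argsort(d, axis=1)[:, :self.k]]
        return np.array([np.bincount(row).argmax() for row in nearest])


class GaussianNB:
    """Minimal Gaussian naive Bayes used as the meta-classifier."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.stack([X[y == c].var(axis=0) for c in self.classes]) + 1e-6
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None]
                          + (X[:, None, :] - self.mu[None]) ** 2 / self.var[None]).sum(axis=2)
        return self.classes[(log_lik + self.log_prior).argmax(axis=1)]


def stacking_predict(base_makers, meta_maker, X, y, X_new, n_folds=5, seed=0):
    """Train the meta-classifier on out-of-fold base predictions, then predict."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    meta_X = np.zeros((len(y), len(base_makers)))
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)
        for j, make in enumerate(base_makers):
            meta_X[fold, j] = make().fit(X[train], y[train]).predict(X[fold])
    meta = meta_maker().fit(meta_X, y)
    # Base learners refit on all data produce the meta features for new samples
    new_meta_X = np.column_stack([make().fit(X, y).predict(X_new) for make in base_makers])
    return meta.predict(new_meta_X)
```

Using out-of-fold predictions is the key design choice: if the meta-classifier were trained on base predictions for the same samples the base learners were fitted on, it would learn to trust over-optimistic outputs.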

Metrics used in health check systems for evaluation
Different performance metrics are generally used to evaluate the models, such as sensitivity, accuracy, precision, and F-measure.
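The metrics listed above can all be derived from the counts of a binary confusion matrix, treating COVID-19 as the positive class (an assumption for illustration). The function name is hypothetical; the false-negative rate is included because the paper emphasizes its reduction.

```python
def classification_metrics(tp, fp, fn, tn):
    """Common evaluation metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall / true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

With a 51-image test set, each image changes the accuracy by 1/51, about 1.96 percentage points, which is worth keeping in mind when comparing accuracies on a dataset this small.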

3.4.5.A Stack Hybrid Classification Model Based on Hybrid Feature Selection of (MPEG-7 Edge Histogram, Gabor Filter, Binary Patterns Pyramid, Fuzzy 64-Bin Histogram) on the CT Images Dataset
Figure 7 shows the architecture of the whole process: the output of feature extraction comes from the four image filters, which extract sensitive, hidden information and prepare it for the classification process [35]. The classification process follows two test paths. The first uses the CNN approach to analyze the CT images; its main purpose is to extract features and correctly classify the images with a low error percentage, depending on its algorithms. The second applies data mining techniques and machine learning classifiers to examine in depth the data extracted by the four image filters in the feature extraction step. Finally, stack hybrid classification is compared with CNN classification [59].

4.1.Results from the Proposed (CHFS) Feature Selection Model
The CT image dataset, collected from the Kaggle database website, contains a total of 51 images, each labeled according to its class. First, we apply the four image filters of the proposed architecture (Fig. 6) to the CT images to extract numerical data for the CHFS model, and then apply the CHFS model; the extracted features are shown in Table 1.

4.2.Applying a Convolutional Neural Network (CNN) to CT Images of COVID-19
To compare the proposed model with a convolutional neural network (CNN), we tested a set of COVID-19 CT images from the kaggle.com benchmark dataset website, as shown in Figure 18, to measure the accuracy of early-screening diagnosis by the CNN.
Fig. 9. The types of COVID-19 CT images.

Fig. 10 shows the CNN prediction model for the CT images.
Figure 20 presents the results of CNN classification on the Kaggle dataset of 51 CT images, using the convolutional neural network in the Weka data mining software. The accuracy was 94.11%, evaluated by the ROC curve and F-measure.

Fig. 11. ROC curve of the CT image categories and confusion matrix of the CNN classification.
Figure 12 shows the error curve of the CNN classification.
Fig. 12. Error curve of the CNN classification.
The classification accuracy of the convolutional neural network (CNN) was 94.11% with an F-measure of 94%, which is lower than the classification accuracy of our proposed model.

4.3.Applying a Novel Hybrid Classification Model to CT Images
The authors apply the novel stack hybrid classification described in Section 3.4.5 to the CT image dataset to improve classification accuracy. It uses traditional classifier algorithms, but in a different way: a combination of JRip and Random Forest classifiers with stacked meta-classifiers (Naive Bayes, SVM, JRip, Random Forest). Table 4 compares the accuracy before and after applying our proposed classification model based on the four image filters with the CHFS feature extraction model. Figures 13 and 14 show the ROC curves of all classifiers, and Table 4 reports the accuracy, F-measure, precision, recall, true positive rate, and false positive rate of our proposed model.

Discussion
The authors compare the results of the proposed feature extraction model and stack hybrid classification on the COVID-19/SARS CT image data in several cases: before and after the feature extraction model, and before and after the proposed stack hybrid classification; the results are also compared with the CNN model on the CT image dataset. These comparisons show that the proposed model reduced the false-negative rate and achieved a relatively high overall accuracy. The best result of our proposed model was CHFS-Stacked (JRip, RF) with a Naïve Bayes meta-classifier, with an accuracy of 96.07% compared with a CNN accuracy of 94.11%. To obtain this result, the authors first applied the novel feature extraction to the CT images with the four image filters and then used the CHFS model with the optimized GA to select the most sensitive features from the CT images. This reduced the selected features from 1473 to 15. In the next stage, we applied our proposed stack hybrid classification to the 15 features with four different classifiers to improve the classification process.

6.Conclusion
In this work, the authors aim at early diagnosis of COVID-19 using a benchmark CT image dataset and our proposed combination of four image filters with the composite hybrid feature selection (CHFS) model. CHFS combines the advantages of three filter feature selection approaches with an Optimized Genetic Algorithm (OGA) that improves initial population generation and the genetic operators. It uses the results of the filter approaches as prior information and the J48 decision tree classifier as the fitness function, instead of probabilistic and random selection, to speed up convergence and select the best features. The selected features are then used in the stack hybrid classification, which combines three classifiers to improve prediction and accuracy. The proposed model performs better than traditional classification approaches for optimal feature selection and classification, and it effectively reduced the false-negative rate while achieving high accuracy: 96.07% when Naïve Bayes is used as the meta-classifier in the hybrid classification method, compared with 48.66% when it is used individually, which is considerably better than the previous state-of-the-art result. The results show that the proposed model accurately classifies COVID-19 CT images. These findings provide a proof of principle for using artificial intelligence to derive radiological characteristics for the timely and precise diagnosis of COVID-19.