A CNN Classification Model For Diagnosis Covid19

The paper analyzes Corona Virus Disease using a probabilistic CNN model. It presents a technique for classification and prediction that recognizes the typical and most diagnostically important CT image features related to the coronavirus. The main contribution of the research is predicting the probability of recurrence in no-recurrence (first-time detection) cases by applying our proposed convolution neural network structure. The study is validated on 2002 chest X-ray images: 60 confirmed positive COVID-19 cases and 650 bacterial, 412 viral, and 880 normal X-ray images. The proposed CNN is compared with traditional classifiers that use the proposed CHFS feature extraction model. The experimental study, carried out on real data, demonstrates the feasibility and potential of the proposed approach. The proposed CNN structure achieves 98.20% accuracy on potential COVID-19 cases, compared with the traditional classifiers.


Introduction
Data mining techniques are applied in the biomedical sciences to provide predictions that help identify a disease and classify it correctly [1]. Screening large numbers of reported cases for effective isolation and treatment is a priority for controlling the spread of Corona Virus Disease. Pathogenic laboratory testing is the diagnostic gold standard, but it is time-consuming and produces a significant number of false-negative results. There is an urgent need for quick and accurate diagnostic methods to combat the disease. Based on the radiographic changes that COVID-19 causes in CT scans, we tried to create a deep learning algorithm that could extract the graphical characteristics of COVID-19 to provide a clinical diagnosis ahead of the pathogenic test and thus save critical time for disease control [2][3]. According to a new case report, computed tomography can detect abnormalities in patients with laboratory-confirmed coronavirus even before clinical symptoms emerge [51][52][53]. This is another critical piece of evidence for the central role of the modality in stopping the lethal epidemic. The case, reported in Clinical Imaging on February 22, is that of a 61-year-old asymptomatic man admitted to a Chinese hospital 1,000 miles outside Wuhan after claiming close contact with an infected person [60]. In addition to identifying early abnormalities, CT showed a finding that had not been seen in any other COVID-19 case. We obtained 250 CT images of pathogen-confirmed COVID-19 cases from the Kaggle database, along with images of cases previously diagnosed with typical viral pneumonia and SARS. Our proposed hybrid feature extraction uses four filters (MPEG-7 edge histogram filter, Gabor filter, pyramid of rotation-invariant local binary pattern histograms, and fuzzy 64-bin histogram), which analyze low-level features of an image, to extract the features and support statistical hypothesis testing [40][41][42][43][44][45][46][47][48][49][50].
Our proposed model uses composite hybrid feature selection (CHFS) to achieve high prediction accuracy and to improve on existing feature extraction methods [2][3][4][5][6][7][8][9][10][11][12][13][14][15], together with hybrid classification techniques that combine multiple classifiers to support an in-depth investigation. Statistical testing is employed to see which feature vectors/elements are most informative for differentiating image classes. We also use a CNN, which requires relatively little pre-processing compared with other image classification algorithms and traditional classifiers. The article is organized as follows. The next section reviews the literature by other authors who have used data mining and the related machine learning algorithms to analyze the coronavirus. Section 3 describes the proposed technique used for feature extraction from CT image datasets with the CHFS model and four image filters. Section 4 describes the method used for the stack hybrid classification process and the convolution neural network (CNN) in comparison with traditional classifiers, whereas section 5 describes the experiments and evaluation. Section 6 discusses the results. Finally, section 7 presents the paper summary and conclusions.

Literature Review
(Fei Shan) developed a DL-based segmentation system that uses the "VB-Net" neural network to segment COVID-19 infection regions in CT scans. The system is trained on 249 COVID-19 patients and validated on 300 new COVID-19 patients. A human-in-the-loop (HITL) methodology is introduced to assist radiologists in automatically annotating each case and to speed up the manual delineation of CT images for training. To assess the performance of the DL-based system, the Dice similarity coefficient, volume differences, and percentage of infection (POI) are calculated between the automatic and manual segmentation results on the validation set.
(Xiaowei Xu) observed that, in the early stage of the disease, the real-time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swabs had a relatively low positive rate for COVID-19 (as named by the World Health Organization). The computed tomography (CT) imaging manifestations of COVID-19 have their own characteristics, which differ from those of other forms of viral pneumonia, such as influenza-A viral pneumonia. Clinicians therefore call for another clear diagnostic criterion for this new type of pneumonia at the earliest possible opportunity. The accuracy of chest CT (Yicheng Fang) was higher than that of RT-PCR (98% vs. 71% respectively, p<.001). The explanations for the low efficiency of viral nucleic acid detection may include: 1) immature development of nucleic acid detection technology; 2) variability in the detection rate between different manufacturers; 3) low viral load in patients; or 4) inadequate clinical sampling. (Xingzhi Xie, Zheng Zhong) observed that most cases have similar features on CT images, such as ground-glass opacity (GGO) or combined and merged GGOs. 2019-nCoV pneumonia is likely to show a peripheral distribution with longitudinal, multifocal lower lung involvement [6][7][8].
Even with negative RT-PCR results, CT features of viral pneumonia may be highly suspicious for 2019-nCoV infection when the clinical presentation is typical and the patient has been exposed to other persons with 2019-nCoV. In these cases, it is essential to consider repeat swab testing and patient isolation. Modern image edge detection algorithms include both first-order differential operators (i.e., the Roberts, Prewitt, Sobel, and Canny operators) and second-order differential operators (i.e., the Laplacian and LoG operators) and extend to a wide variety of applications. By integrating it with mathematical mechanics, Wang and Liu used the Roberts operator to identify vehicle image edges and locate vehicle license plates. While the detection algorithms mentioned above are simple, easy to implement, and deliver excellent real-time performance, they also have obvious shortcomings. The image edges extracted by the Roberts operator are relatively rough, and the edge positions are imprecise. The edge features that the Prewitt operator extracts have wide margins and many discontinuities. Likewise, the Sobel operator does not provide precise edge coordinates. The Laplacian operator is highly sensitive to noise, and the LoG operator cannot remove salt-and-pepper noise from an image.

Proposed Work
In the proposed work, the CT images dataset is collected from the online Kaggle benchmark dataset. The medical dataset contains several CT images of older adults with SARS and COVID-19. The proposed layout is shown in Figs. 1 and 2. The following steps explain the mechanism of the proposed work on the CT images dataset:

Data Collection
The CT images dataset has four classes of images in both the training and the testing set, containing a total of about 2002 chest x-ray images segregated into coronavirus, bacterial pneumonia, viral pneumonia, and normal cases (online access Kaggle benchmark dataset, 2020):
i. COVID-19
ii. Bacterial
iii. Viral
iv. Normal

Data Pre-processing
In the real world, collected data tend to be incomplete, noisy, and inconsistent. Detecting missing data and irregularities, preventing errors, and reducing the amount of data to be analyzed can lead to large payoffs for decision making (H. Witten & Eibe Frank, 2008).
In machine learning and analytics, feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model building. Feature selection methods are used for several reasons (H. Witten & Eibe Frank, 2008):
i. simplification of the models to make them easier for researchers/users to understand
ii. shorter training periods
iii. preventing the curse of dimensionality
iv. improved generalization by minimizing overfitting

3.3. Proposed Hybrid Feature Selection Approach on CT-Images
A variety of approaches is used to retrieve images. Some of these are based on the mean and difference of the bi-orthogonal wavelet filter, and others on image retrieval by shape. Edges are an essential image feature that is present between an object and the background and between two objects, two regions, or two primitives. Much of the information in an image is carried by its edges. An image edge is usually a group of pixels whose gray-level values show an abrupt, step-like change (Jianfang Cao, 2018) [12][13][14][15][16][17][18][19][20][21][22].

3.3.1. MPEG-7 Histogram Filter
The histogram is the most widely used form for describing the global feature composition of an image. It is invariant to translation and rotation of the image, and normalizing the histogram leads to scale invariance. Because of this property, the histogram is very useful for indexing and retrieving images. According to the edge histogram description in MPEG-7, additional histogram bins can easily be generated from the local 5-bin edge histogram of each of the 4 × 4 sub-images. Statistical hypothesis testing is employed to see which feature vectors/elements are most informative for differentiating image classes.

Gabor Image Filter
The Gabor filter is used for texture analysis, which means that it analyzes whether there is specific frequency content in the image in specific directions within a localized region around the point or area of analysis. It can be defined as a sinusoidal wave (a plane wave for 2D Gabor filters) multiplied by a Gaussian function. A set of Gabor filters with different frequencies and orientations may help in extracting useful features from an image. In the discrete domain, the two-dimensional Gabor filters are given by (2)
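The description above (a sinusoid multiplied by a Gaussian envelope, sampled on a discrete grid) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the functions `gabor_kernel` and `gabor_bank`, the kernel size, `sigma`, and the frequency and orientation values are all hypothetical choices, and the formula used is the common cosine/sine pair, assumed here because equation (2) is not reproduced in the text.

```python
import math

def gabor_kernel(size, sigma, freq, theta):
    """Return (cosine, sine) Gabor kernels as size x size nested lists.

    Assumed form: exp(-(i^2 + j^2) / (2 sigma^2)) times cos or sin of
    2*pi*freq*(i*cos(theta) + j*sin(theta)). All parameters are illustrative.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    cos_kernel, sin_kernel = [], []
    for i in range(-half, half + 1):
        cos_row, sin_row = [], []
        for j in range(-half, half + 1):
            # Gaussian envelope centered on the kernel origin
            envelope = math.exp(-(i * i + j * j) / (2.0 * sigma ** 2))
            # Plane wave oriented along theta
            phase = 2.0 * math.pi * freq * (i * math.cos(theta) + j * math.sin(theta))
            cos_row.append(envelope * math.cos(phase))
            sin_row.append(envelope * math.sin(phase))
        cos_kernel.append(cos_row)
        sin_kernel.append(sin_row)
    return cos_kernel, sin_kernel

def gabor_bank(size=7, sigma=2.0, freqs=(0.1, 0.2), n_orient=4):
    """Build one kernel pair per (frequency, orientation) combination."""
    return [
        gabor_kernel(size, sigma, f, k * math.pi / n_orient)
        for f in freqs
        for k in range(n_orient)
    ]
```

Convolving an image with each kernel pair in the bank and summarizing the responses (for example by mean and variance) would give one group of texture features per frequency and orientation.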

3.3.3. Pyramid of Rotation-Invariant Local Binary Pattern Histograms Image Filter
The local binary pattern (LBP) is commonly used in texture classification. Current LBP methods describe only micro-structures of image texture, such as edges, corners, and points, although many of them show excellent texture classification performance. This situation has not changed even though multi-resolution analysis is used in local binary pattern methods (Ojala et al., 2002). The LBP texture operator has become a popular approach in different applications. It can be seen as a unifying approach to the historically divergent statistical and structural models of texture analysis. The following notation is used for the LBP operator: $LBP_{P,R}^{u2}$. The subscript indicates that the operator is used in a $(P, R)$ neighborhood. The superscript $u2$ indicates that only uniform patterns are used and all other patterns are marked with a single label. After obtaining the LBP-labeled image $f_l(x, y)$, the LBP histogram can be defined as
$$H_i = \sum_{x,y} I\{f_l(x, y) = i\}, \quad i = 0, \ldots, n-1, \qquad (4)$$
where $n$ is the number of different labels produced by the LBP operator, and $I\{A\}$ is 1 if $A$ is true and 0 if $A$ is false. When the image patches whose histograms are compared have different sizes, the histograms must be normalized to obtain a coherent description:
$$N_i = \frac{H_i}{\sum_{j=0}^{n-1} H_j}$$
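As a rough illustration of the labeling and histogram steps, the sketch below computes a basic 8-neighbor LBP code for each interior pixel and then counts and normalizes the labels. It is a simplified assumption-laden example: it uses the plain 256-label operator rather than the uniform ($u2$), rotation-invariant, or pyramid variants named in this section, and the function names `lbp_labels` and `lbp_histogram` are hypothetical.

```python
def lbp_labels(img):
    """Basic 8-neighbor LBP code for each interior pixel of a 2-D list.

    Simplification: radius 1, no interpolation, no uniform-pattern mapping.
    """
    # Neighbor offsets (dy, dx), visited clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    labels = []
    for y in range(1, len(img) - 1):
        row = []
        for x in range(1, len(img[0]) - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        labels.append(row)
    return labels

def lbp_histogram(labels, n_labels=256):
    """Count each label, then divide by the total so patches of any size compare."""
    hist = [0] * n_labels
    for row in labels:
        for label in row:
            hist[label] += 1
    total = sum(hist)
    normalized = [h / total for h in hist] if total else [0.0] * n_labels
    return hist, normalized
```

The normalized histogram is what makes patches of different sizes comparable, since every histogram then sums to one regardless of how many pixels were labeled.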

Fuzzy 64-bin Histogram Image Filter
The fuzzy 64-bin histogram is based on the observation that color vision is not well represented in RGB. A better model of the human visual system (HVS) is the so-called opponent color model. The opponent color space has three components (C. A. Bouman, 2007):
• O1 is the luminance component
• O2 is the red-green channel
• O3 is the blue-yellow channel

Consequences of Opponent Channel CSF
Luminance channel is
• Bandpass function
• Wide bandwidth ⇒ high spatial resolution.
• Low-frequency cut-off ⇒ insensitive to the average luminance level.
Chrominance channels are
• Lowpass function
• Lower bandwidth ⇒ low spatial resolution.

Applying Hybrid feature selection architecture model on CT-images
The authors combine four filters in the feature extraction process to obtain the optimal feature set from the CT images, as shown in fig 3.

Proposed Composite Hybrid Feature Selection Model (CHFS)
Feature selection is the process of taking a subset of relevant features for use in model construction (Chen & ...& F. Li, 2010). The proposed model combines the advantages of three feature selection approaches: filter (IG, GR), wrapper (improved genetic algorithm), and embedded (C4.5).

Composed Hybrid feature selection architecture
The authors combine three feature selection techniques to obtain the optimal feature set: information gain (IG), gain ratio (GR), and the optimized genetic algorithm.

Information Gain feature selection
The information gain is calculated for one attribute at a time according to the algorithm below (Aouatif Amine & ...& Rziza Driss, 2011) [28][29][30][31][32]. This gain measure gives the effect of each feature, and the following algorithm selects the features whose gain is larger than the threshold.
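A minimal sketch of this threshold-based filter is given below, assuming the standard entropy-based definition of information gain and discrete (already binned) attribute values. The function names, the dictionary layout of `columns`, and the threshold value are hypothetical, and continuous image features would first need to be discretized.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_by_threshold(columns, labels, threshold):
    """Keep the names of the attributes whose gain exceeds the (assumed) threshold."""
    return [
        name for name, values in columns.items()
        if information_gain(values, labels) > threshold
    ]
```

For example, an attribute that splits the classes perfectly has a gain equal to the label entropy, while an attribute that is independent of the class has a gain of zero and is dropped by any positive threshold.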

Gain Ratio Feature Selection
A decision tree is a simple structure in which non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes (J.R. Quinlan, 1986) [38]. The gain ratio is the information gain of an attribute divided by its split information.
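The gain ratio can be sketched as the information gain divided by the split information, which is the entropy of the attribute's own value distribution. The code below is an illustrative example under that standard definition, for discrete attribute values only; the function names are hypothetical, and returning 0.0 for a constant attribute is a choice made here to avoid division by zero.

```python
import math
from collections import Counter

def _entropy(items):
    """Shannon entropy (in bits) of any list of hashable items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of an attribute divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    gain = _entropy(labels) - sum(len(g) / n * _entropy(g) for g in groups.values())
    # Split information penalizes attributes with many distinct values
    split_info = _entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

The normalization matters for identifier-like attributes: an attribute with a distinct value per sample has maximal information gain but also large split information, so its gain ratio is lower than that of an attribute that separates the classes with fewer values.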

Optimized Genetic Algorithm (OGA)
The authors propose a method that modifies a general genetic algorithm so that it evaluates specified attributes on training data or a separate testing set and uses a decision tree (J.R. Quinlan, 1986) [32][33][34][35][36][37][38] to estimate the 'merit' of a set of attributes, producing an optimized feature subset with a genetic search and evaluation strategy to identify the features. Every feature selection technique should use an evaluation function together with a search strategy to obtain the optimal feature set (Huang & C., 2012) [44]. Searching all subsets to find an optimal one is infeasible and requires too much effort. To indicate whether a particular feature is present in the chromosome, one and zero are used: a one in a gene position means the feature is present, and a zero means it is absent (Yanan Mao & Dingyuan Fan, 2016). The number of features, and which features are present in a chromosome, are guided by information gain (IG) and gain ratio (GR). The initial population is created using the IG and GR values of the features as input. After the population is generated, the individuals are evaluated using a fitness function. There is no general approach for finding the fitness function of a genetic algorithm; it is a heuristic choice that depends on the application. The authors therefore nominate a C4.5 classifier as the fitness function, because C4.5 handles both continuous and discrete attributes as well as training data with missing attribute values, and it prunes trees after creation: C4.5 goes back through the tree once it has been created and tries to remove branches that do not help by replacing them with leaf nodes (Dash & H. Liu, 1997; J.R. Quinlan, 1986). The following algorithm selects a feature from the set of features obtained by OGA, gain ratio, and information gain, as shown in fig 5.
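The search described above can be sketched as a bit-string genetic algorithm whose initial population is biased by filter scores. This is an illustrative sketch, not the authors' OGA: the seeding probabilities, truncation selection, one-point crossover, mutation rate, and elitism are all assumptions, and the `fitness` argument is a placeholder for any callable that scores a chromosome (the paper uses a C4.5 classifier for this role, which is not reproduced here).

```python
import random

def seeded_population(scores, pop_size, rng):
    """Create chromosomes whose genes are switched on more often for high filter scores.

    `scores` holds one non-negative value per feature (e.g. IG or GR);
    the 0.1 to 0.9 probability range is an arbitrary illustrative choice.
    """
    top = max(scores) or 1.0
    probs = [0.1 + 0.8 * s / top for s in scores]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]

def evolve(scores, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return the best chromosome found; 1 = feature present, 0 = feature absent."""
    rng = random.Random(seed)
    pop = seeded_population(scores, pop_size, rng)
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best
```

In practice the fitness callable would train and validate a classifier on the columns selected by the chromosome, which is far more expensive than the search loop itself, so caching fitness values per chromosome is a common refinement.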

Proposed Stack Hybrid Classification Model
The Weka software tool (Weka online open-source accessed, 2018) [16] provides a list of black-box classifiers. These algorithms are generally used to classify the medical dataset.

Two learning evaluators can be used to evaluate the dataset
• Training set: the classifier splits the dataset into test and training data.

Metrics used in health check systems for evaluation
The performance metrics generally used to assess the different models include sensitivity, accuracy, precision, and F-measure.
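For a single positive class (for example COVID-19 versus the rest), these metrics can be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, precision, accuracy, and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall, true-positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    # F-measure: harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }
```

With a rare positive class, accuracy can stay high even when many positive cases are missed, which is why sensitivity and F-measure are reported alongside it.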

4. Experimental Results and Evaluation
The CSV file of the 2002-image x-ray dataset of patients from Kaggle.com is loaded into the Weka tool. All experiments are evaluated by the receiver operating characteristic (ROC) curve, accuracy, and F-measure.

4.1. Result from proposed (CHFS) feature selection model
The CT-images dataset was collected from the Kaggle database website, with a total of 51 images segregated by the severity of COVID-19 [74][75][76]. First, we apply the four image filters of the proposed architecture (fig 6) to the x-ray images to extract the numerical data to which the CHFS model is applied; the extracted features are shown in table 1. The four image filters extract 82 features with the MPEG-7 filter, 142 features with the Gabor filter, and 898 features with the binary patterns pyramid, followed by the fuzzy 64-bin histogram, producing a total of 1473 features from the CT-images dataset. We then apply our proposed CHFS with the optimized genetic algorithm to obtain the final 165 most sensitive features [76]. The overall improvement of the traditional classifiers in our experiment is 13.87% after using our proposed feature extraction model on the CT-images [77]. The figure below summarizes all results in table 6.

4.2. Applying proposed Convolution neural network (CNN) on covid-19-normal-bacterial-viral xray images
The authors aim to compare the proposed traditional model with the proposed modified convolution neural network (CNN) model structure. We tested a set of COVID-19, normal, bacterial, and viral x-ray images from the kaggle.com benchmark dataset, as shown in figure 8, to assess the accuracy of early-screening diagnosis by the convolution neural network. Figure 11 presents the result of CNN classification on the Kaggle 2002 CT-images dataset using the convolution neural network in the Weka data mining software; the result is evaluated by ROC curve and F-measure. The classification accuracy of the convolution neural network (CNN) was 98.20% with an F-measure of 98.2%, which is better than the accuracy of the traditional classifiers.

Discussion
The authors compare the results of the proposed feature extraction model on COVID-19, bacterial, viral, and normal x-ray image data before and after applying the feature extraction model, and compare these results with the proposed CNN structure model. According to these comparisons, the model presented in this study reduced the false-negative rate and showed a relatively high overall accuracy with more accurate results. The best result, from our proposed CNN model, was an accuracy of 98.20%, compared with an accuracy of 77.36% achieved by the traditional classifiers. This result shows that convolution neural networks and deep learning can play a vital role in diagnosing a serious global epidemic that has no cure.

6. Conclusion
In this work, the authors aim at early diagnosis of COVID-19 by applying our proposed combination of four image filters with the composed hybrid feature selection (CHFS) model to a benchmark x-ray image dataset. The model combines the advantages of three filter feature selection approaches and optimizes the genetic algorithm (OGA) by improving the generation of the initial population and the genetic operators. It uses the results of the filter approaches as prior information and the J48 decision tree classifier as a fitness function, instead of probability and random selection, to speed up convergence and select the best features. This is compared with a modified, novel CNN structure which contains 2 CNNs of (20-50) filters, to obtain the best way of diagnosis. The proposed CNN model performs better than the traditional classification approaches with optimum feature selection, improves the classification process, and effectively reduces the false-negative rate, with a high accuracy of 98.20% compared to 77.31% for the traditional classifiers used individually after the proposed CHFS feature selection model, which is considerably better than the previous state-of-the-art result.
The results of the proposed CNN model show accurate classification of COVID-19 x-ray images against viral and bacterial pneumonia.