This paper presents the design of a multiscale dilated parallel deep CNN (PDCNN) that extracts multiscale detail characteristics from MRI images. Dilated convolution is used to enlarge the receptive field without adding more parameters to the network. Additionally, batch normalization is used so that the model's accuracy does not degrade as the network depth increases.
In the dilated PDCNN framework, local and global characteristics are acquired through the corresponding local and global paths. Most DCNN-based methods, by contrast, cannot effectively collect both local and global information because of their small receptive fields. Dilated convolution maintains data resolution at the output layer and expands the receptive field without adding computation, but stacking multiple dilated convolutions has the disadvantage of creating a gridding effect. With too small a dilation factor (DF), the model has a small receptive field and misses the coarse features. With an excessive DF, in contrast, the model is unable to pick up the finer details. Suitable DFs for the local and global feature paths are therefore chosen by comparing various DFs. In every path, each convolutional layer uses the ReLU activation function and is followed by a max-pooling layer that down-samples its output. Finally, an average ensemble method carries out the brain tumor categorization after four ML classifiers have been trained on the images.
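The average ensemble step can be sketched as follows, assuming each of the four classifiers outputs a probability vector over the tumor classes. The function name `average_ensemble` and the probability values are hypothetical and serve only as an illustration; the text does not specify the classifiers' outputs.

```python
def average_ensemble(prob_vectors):
    """Average per-class probabilities from several classifiers.

    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    averaged = [
        sum(p[c] for p in prob_vectors) / n_models
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs of four classifiers for one image and three classes.
example_probs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
]
predicted_class, mean_probs = average_ensemble(example_probs)
```

Averaging probabilities lets a confident majority outweigh a single dissenting classifier, as the second classifier in this example shows.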
3.4.1. Multiscale Feature Selection Path
CNNs have been used extensively in the field of medicine and have demonstrated good results in the segmentation and classification of medical images [22, 23]. CNN architectures are built from a variety of building blocks, such as convolution layers, pooling layers, and Fully-Connected (FC) layers. Convolution layers, which combine linear and nonlinear operations (that is, convolution operations and activation functions), are used for feature extraction [24, 25]. The parameters of convolution layers are the kernels, and their hyperparameters include the size, number, stride, padding, and activation function of each kernel [26]. Six convolution layers are used in the two parallel paths, and the convolution operation is computed using equation (1).
$$y_{i,j,k}^{l} = f\left( {w_{k}^{l}}^{T} x_{i,j}^{l} + b_{k}^{l} \right) \tag{1}$$

where, for the $k$-th kernel in layer $l$, $y_{i,j,k}^{l}$ expresses the resultant feature map at position $(i, j)$, $w_{k}^{l}$ represents the weight vector's values, $x_{i,j}^{l}$ indicates the input vector at position $(i, j)$ in the $l$-th layer, and $b_{k}^{l}$ is the bias. In addition, the activation function is $f(\cdot)$ [27]. By down-sampling, pooling layers lower the dimensionality of the feature maps. Pooling layers have hyperparameters, including the stride, padding, and filter size, but they contain no learnable parameters. Two common varieties of pooling layers are max pooling and global average pooling. Max pooling is used in this structure. The output size of the pooling operation in CNN is calculated using equation (2).
$$O = \frac{I - K + 2P}{S} + 1 \tag{2}$$

where $O$ is the output dimension, $I$ stands for the dimension of the input, $K$ is the kernel size, $P$ is the padding size, and $S$ is the stride size [27].
The pooling layers' feature maps are flattened and sent to several one-dimensional (1D) vectors known as FC layers. The most popular activation function for FC layers is the Rectified Linear Unit (ReLU), which is given in (3).

$$f(x) = \max(0, x) \tag{3}$$
The final FC layer's activation function is usually SoftMax for the categorization of multiple classes and Sigmoid for binary classification. The node values in the final FC layer of the proposed model are computed using (4), and the sigmoid activation function for the binary categorization dataset-Ⅰ is calculated using (5) [24].

$$z = w^{T} x + b \tag{4}$$

$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}} \tag{5}$$

where $z$ stands for the neural network layers' internal calculations, $b$ shows the bias, and $w$ stands for the weights used to determine an output node's value. Furthermore, the input vector and output class are denoted by $x$ and $y$, respectively. The SoftMax activation function is calculated using (6) for the multi-class categorization Figshare dataset-Ⅱ and Kaggle dataset-Ⅲ in this proposed structure.
$$P(y = j \mid x) = \frac{e^{z_{j}}}{\sum_{c=1}^{C} e^{z_{c}}} \tag{6}$$

where $x$ stands for the input vector and $j$ for the class in the case of a multi-class categorization problem with $C$ classes. Additionally, the $j$-th component of the class score vector in the final FC layer is denoted by $z_{j}$. The category $j$ with the highest $P(y = j \mid x)$ value is chosen as the output class in the SoftMax activation function [24]. A backpropagation algorithm is used during CNN training to adjust the weights of the FC and convolution layers. The two main elements of backpropagation are the loss function and Gradient Descent (GD), in which GD is used to minimize the loss function. Among the loss functions most frequently employed by CNNs is the Cross-Entropy (CE) loss function. For the binary categorization dataset-Ⅰ with the sigmoid activation function, the CE loss function is computed using (7).
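$$L = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_{n} \log \sigma(z_{n}) + (1 - y_{n}) \log\left(1 - \sigma(z_{n})\right) \right] \tag{7}$$

where $z_{n}$ is computed using formula (4) for the $n$-th training sample, $y_{n} \in \{0, 1\}$ is its label, and $N$ is the number of training samples. For the multi-class categorization Figshare dataset-Ⅱ and Kaggle dataset-Ⅲ with the SoftMax activation function, the CE loss function is calculated using (8) [27, 28].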
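$$L = -\frac{1}{N} \sum_{n=1}^{N} \log \frac{e^{z_{y_{n}}}}{\sum_{c=1}^{C} e^{z_{c}}} \tag{8}$$

where $N$ denotes the number of training samples, the class of the $n$-th input image is indicated by $y_{n}$, the $y_{n}$-th component of the category scores vector in the final FC layer is denoted by $z_{y_{n}}$, and $C$ is the number of classes [27].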
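The activation and loss functions described in this subsection can be sketched in plain Python as follows. This is a minimal illustration under the standard textbook definitions, not the authors' implementation; the function names are hypothetical, and the max-subtraction inside `softmax` is a common numerical-stability step that the text does not mention.

```python
import math


def relu(x):
    """Rectified Linear Unit."""
    return max(0.0, x)


def sigmoid(z):
    """Sigmoid activation for binary categorization."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """SoftMax over a list of class scores."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(z_values, labels):
    """Mean binary CE; z_values are pre-activation outputs, labels are 0 or 1."""
    total = 0.0
    for z, y in zip(z_values, labels):
        p = sigmoid(z)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


def categorical_cross_entropy(score_vectors, labels):
    """Mean multi-class CE; labels are integer class indices."""
    total = 0.0
    for scores, y in zip(score_vectors, labels):
        total += math.log(softmax(scores)[y])
    return -total / len(labels)
```

For example, an uninformative binary output (`z = 0`) gives a loss of `log 2`, and uniform scores over three classes give `log 3`, which are the expected losses of a model that has learned nothing.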
Expanding the receptive field in deep learning usually involves increasing the size of the convolution kernel or the depth of the network, which in turn increases the number of parameters in the network. By inserting zero-valued weights into the conventional convolution kernel, dilated convolution can enlarge the receptive field without adding more network parameters.
Equation (9) defines the convolution operator $*$ as follows: a 1-D dilated convolution with DF $l = 1$ connects the input image $F$ with the kernel $k$.

$$(F * k)(p) = \sum_{s + t = p} F(s)\, k(t) \tag{9}$$

The term "standard CNN" refers to this 1-D convolution. The network is identified as a dilated CNN when $l$ rises above 1. Upon the introduction of a DF denoted as $l$, the dilated operator $*_{l}$ is defined as

$$(F *_{l} k)(p) = \sum_{s + l t = p} F(s)\, k(t) \tag{10}$$

Using equation (10), the dilated convolution operation is calculated in this proposed structure. The fundamental CNN has a value of $l = 1$ [28, 29].
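A 1-D dilated convolution can be illustrated with the short sketch below. It uses the cross-correlation form (no kernel flip) that most deep learning libraries apply, without padding; both choices, along with the function name and the sample data, are assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution in cross-correlation form.

    Kernel taps are spaced `dilation` samples apart, so the kernel spans
    (len(kernel) - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(signal) - span + 1
    return [
        sum(kernel[t] * signal[p + t * dilation] for t in range(len(kernel)))
        for p in range(out_len)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]

standard = dilated_conv1d(signal, kernel, dilation=1)  # DF = 1, standard case
dilated = dilated_conv1d(signal, kernel, dilation=2)   # DF = 2
```

With a DF of 2, the same three weights cover five input samples instead of three, which is how dilation widens the receptive field without adding parameters.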
The main function of the dilated convolution layers is to extract features. MRI images convey fine, high-level feature details as well as coarse, low-level information. As a result, image data must be extracted at several scales. Specifically, the local and global paths are employed to obtain the local and global features. Within the local path, the convolutional layers use a small 5 × 5 pixel window to provide low-level details about the images, whereas a large number of 12 × 12 pixel filters are present in the convolutional layers of the global path. The same 5 × 5 filters are used by the three convolution layers of the local path, and the coarse feature maps are produced solely by the decreasing, high DF of each layer (4, 2, 1). The three convolution layers in the global path employ identical 12 × 12 filters, and the generation of finer feature maps depends exclusively on the small DF of each layer (2, 1, 1). As illustrated in Figure 4, three convolution layers with distinct filter numbers (128, 96, 96) are applied in each feature extraction path to extract image data at various scales.
Conv1, Conv3, and Conv4 provide local and coarse features, while Conv2, Conv5, and Conv6 supply global and fine features. In each path, a max-pooling layer follows every convolutional layer and down-samples its output. By employing a 2 × 2 kernel, the max-pooling layers reduce the dimensions of the resulting feature maps.
Each input tensor in the proposed model has a dimension of (32, 32, 1). To test the impact of the DF on the model's efficiency and to understand the gridding effect brought about by dilation, the internal architecture is kept as simple as possible. In the local path, layer Conv1 applies a 5 × 5 filter with a dilation factor of 4 to generate coarse feature maps (such as shapes and contours); layer Conv3 applies the same filter with a dilation factor of 2 after the preceding convolution to generate coarse feature maps again; and layer Conv4 applies a 5 × 5 filter with a dilation factor of 1 to generate coarse feature maps. In the global path, layer Conv2 applies a 12 × 12 filter with a dilation factor of 2; layer Conv5 applies the same filter with a dilation factor of 1 after the preceding convolution to generate fine feature maps again; and layer Conv6 applies a 12 × 12 filter with a dilation factor of 1 to generate fine feature maps. All six convolutional layers use the ReLU activation function.
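To make the effect of these dilation factors concrete, the sketch below computes the effective kernel extent of each layer, using the standard relation that a kernel of size k with dilation d spans (k - 1) · d + 1 input pixels. The dictionary layout and names are illustrative; the kernel sizes and DFs are those given above.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input pixels spanned by a dilated kernel along one axis."""
    return (kernel_size - 1) * dilation + 1


# Kernel sizes and dilation factors of the two paths, as described in the text.
paths = {
    "local": {"kernel": 5, "dilations": (4, 2, 1)},    # Conv1, Conv3, Conv4
    "global": {"kernel": 12, "dilations": (2, 1, 1)},  # Conv2, Conv5, Conv6
}

effective = {
    name: [effective_kernel_size(cfg["kernel"], d) for d in cfg["dilations"]]
    for name, cfg in paths.items()
}
```

Conv1 therefore spans 17 pixels and Conv2 spans 23 pixels along each axis of the 32 × 32 input, while the number of weights per filter stays at 25 and 144, respectively.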
3.4.3. Hyperparameter Tuning
Hyperparameter tuning is the search for the parameter settings that work best for the proposed dilated PDCNN framework. The dense layer size, optimizer, and dropout rate are among the parameters that must be chosen to tune the PDCNN. Tuning provides the framework with the ideal set of parameters, producing the most effective results.
The training configuration obtained from hyperparameter tuning comprises the Adaptive Moment Estimation (Adam) optimizer, a dropout of 0.3, a dense layer of size 512, and a learning rate of 0.0001. In this work, the layer weights are updated via Adam, an optimizer that computes adaptive learning rates for every parameter. The training setting employs a validation frequency of 20. The highest average accuracy on the test datasets is collected for each run. The framework is trained with a range of epoch counts: it attains 98.67% accuracy for dataset-Ⅰ when the epoch count is 70, and 98.13% and 98.35% accuracy for dataset-Ⅱ and dataset-Ⅲ, respectively, when the epoch count is 60.
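A single Adam update for one scalar weight can be sketched as follows. Only the learning rate of 0.0001 comes from the text; the values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumptions here, as are the function name and the example gradient.

```python
import math


def adam_step(weight, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight.

    m and v are the running first and second moment estimates,
    and t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    weight = weight - lr * m_hat / (math.sqrt(v_hat) + eps)
    return weight, m, v


# Hypothetical first step: weight 1.0 with gradient 0.5.
w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=0.5, m=m, v=v, t=1)
```

Because the step is scaled by the ratio of the two moment estimates, the first update has a magnitude close to the learning rate regardless of the gradient's scale, which is what makes the learning rate adaptive per parameter.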
Table 3. Hyper-parameter Settings for Model Training.

| Hyper-Parameter | Optimized Value |
| --- | --- |
| Optimizer | Adam |
| Dropout | 0.3 |
| Dense Layer | 512 |
| Learning Rate | 0.0001 |
| Maximum Epoch | 50 |
| Validation Frequency | 20 |
| Iteration Per Epoch | 34 |
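The settings in Table 3 can be collected in a small configuration, as in the sketch below, which also derives the length of a training run. The key names are hypothetical, and two interpretations are assumptions: that 512 is the number of units in the dense layer, and that the validation frequency is counted in iterations.

```python
# Values from Table 3; key names are illustrative.
hyperparameters = {
    "optimizer": "adam",
    "dropout": 0.3,
    "dense_units": 512,          # assumed to be the dense layer's unit count
    "learning_rate": 1e-4,
    "max_epochs": 50,
    "validation_frequency": 20,  # assumed to be measured in iterations
    "iterations_per_epoch": 34,
}

total_iterations = (
    hyperparameters["max_epochs"] * hyperparameters["iterations_per_epoch"]
)
validation_runs = total_iterations // hyperparameters["validation_frequency"]
```

Under these assumptions, a full run of 50 epochs consists of 1700 iterations and triggers 85 validation passes.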
3.4.4. Feature Map of Dilated Convolutional Layers
A CNN feature map is the output of a convolutional layer and represents specific attributes of the input image. It is produced by applying filters to the input images or to the feature maps output by the previous layers. The feature maps acquired from every convolutional layer are presented in Figure 5 and Figure 6. Figure 5 displays the low-level, coarse features of the three convolutional layers conv_1, conv_3, and conv_4, which have 128, 96, and 96 filters, respectively. The feature maps in this figure are primarily composed of coarse and local features, which represent the texture in an image. In this local path, the dilated CNN with DFs of (4, 2, 1) is referred to as dilated PDCNN (4, 2, 1). Figure 6 shows the high-level feature maps of the deeper three convolutional layers conv_2, conv_5, and conv_6, which have the same filter counts; these maps include contour representations, shape descriptors, and fine features. DFs of (2, 1, 1) are used in this global path. The multiscale feature maps, which are displayed in Figure 7, are greatly improved when these features are combined using a feature fusion technique. Figure 8 displays the final multiscale features that are extracted after a fully connected layer, which is protected from overfitting by the dropout technique.
Local & Coarse Features (conv_1[DF:4*4], conv_3[DF:2*2], and conv_4[DF:1*1])
Figure 5.
Local and Coarse feature maps of various convolutional layers of the Dilated PDCNN (a) Feature map of conv_1-layer (b) Feature map of conv_3-layer (c) Feature map of conv_4-layer.
Global & Finer Features (conv_2[DF:2*2], conv_5[DF:1*1], and conv_6[DF:1*1])
Figure 6.
Global and finer feature maps of various convolutional layers of the Dilated PDCNN (a) Feature map of conv_2-layer (b) Feature map of conv_5-layer (c) Feature map of conv_6-layer.
Figure 7.
Addition of all features.
Figure 8.
Features extraction after FC_2 layer.