Transfer Learning for Brain Tumor MRI Classification Using VGG16: A Comparative and Explainable AI Approach

Muntahaa Khan; Mohd Tauheed Khan

doi:10.20944/preprints202504.2505.v1

Submitted: 28 April 2025

Posted: 30 April 2025


Abstract

Brain tumor detection through magnetic resonance imaging (MRI) is a complex task, and a fast, reliable clinical decision-making tool is needed. Deep learning techniques, particularly convolutional neural networks (CNNs), have shown great promise in automating the detection of tumor masses from MRI scans. In this study, we train a VGG16-based CNN and, instead of relying on a single-source dataset or black-box predictions, merge two publicly available datasets (Figshare and Kaggle), introducing inter-dataset variability that simulates real-world diagnostic conditions. We preprocess the data, apply stratified splitting into training, validation, and test sets, and use data augmentation techniques. Our model achieves a validation accuracy of 84.4% and demonstrates consistent performance across tumor types. Grad-CAM heatmaps highlight tumor regions with reasonable precision, even in some misclassified cases, thereby enhancing model transparency and trust. This work highlights the effectiveness of a lightweight, generalizable CNN architecture combined with visual interpretability.

Keywords: brain tumor classification; deep learning; VGG16; MRI; transfer learning; medical imaging

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Introduction

The human brain is the central organ of the nervous system, responsible for cognition, sensory information processing, and motor control. Despite its remarkable complexity, it is prone to a variety of neurological disorders, including life-threatening brain tumors that can severely impair cognitive and motor functions. Brain tumors are abnormal accumulations of cells, and they are classified as malignant (cancerous) or benign (non-cancerous). Malignant tumors such as gliomas invade surrounding brain tissue aggressively, but even benign tumors such as meningiomas and pituitary adenomas can lead to debilitating neurological problems if left untreated. Prompt and accurate diagnosis of brain tumors is crucial for minimizing treatment failure and improving patient outcomes and survival.

Magnetic Resonance Imaging (MRI) serves as a standard diagnostic imaging technique for brain tumors, allowing detailed visualization of anatomical brain structures. Although manual interpretation of MRI scans is widely practised, it is time-consuming, subject to interobserver variability, and relies heavily on the radiologist's expertise. Tumor properties such as small size, anatomic location, and similarity to normal tissue also make diagnosis challenging and can lead to misclassification or delayed treatment. Deep learning models, especially CNNs, have been applied to address these challenges in automated tumor detection and classification.

CNNs are state of the art in medical imaging and outperform traditional machine learning techniques in several areas such as tumor segmentation, feature extraction, and classification. Yet, there are multiple key challenges that must be addressed before such models can be reliably used in clinical settings:

Generalization across MRI datasets: Models trained on scans from one or two hospital cohorts tend to lose accuracy when applied to another cohort, because scanner type, acquisition settings, and patient demographics differ between sites, creating a domain shift.
Class imbalance in brain tumor datasets: Many publicly available datasets have imbalanced distributions of tumor types, which biases predictions toward the majority class.
Explainability and trust: Deep learning models operate as “black boxes,” making it difficult for clinicians to interpret AI-based diagnoses and impeding real-world adoption.
Small and early-stage tumor detection: Many models fail to detect small tumors or early-stage abnormalities because these resemble normal tissue in MRI images.

This study aims to address these limitations by exploring transfer learning methods to improve brain tumor classification with pre-trained deep learning models such as VGG16, EfficientNetB0, and ResNet50V2, as well as an ensemble approach. We use data augmentation to mitigate class imbalance and fine-tune the pre-trained networks. This study also explores explainability techniques (e.g., Grad-CAM) that improve transparency in the decision-making of an AI-based clinical tool, addressing a significant roadblock to the application of AI methods in clinical environments.

Literature Review

Brain tumors are among the most challenging medical conditions due to their complexity and high mortality rates. Magnetic Resonance Imaging (MRI) remains the gold standard for brain tumor diagnosis and classification due to its high spatial resolution and ability to differentiate soft tissues (Abdusalomov et al., 2023). However, manual interpretation of MRI scans is time-consuming, prone to inter-observer variability, and demands specialized expertise (Amin et al., 2021). Consequently, deep learning models, particularly Convolutional Neural Networks (CNNs) and Transformer-based architectures, have gained significant traction in automating brain tumor detection and classification. The ability of AI-based models to extract complex features from MRI scans has enabled highly accurate tumor detection, segmentation, and classification, aiding radiologists in clinical decision-making (Dulal & Dulal, 2025).

This literature review explores pre-trained models versus custom AI models developed for brain tumor classification. It compares architectures, performance, and practical implications in clinical settings, highlighting key challenges such as computational efficiency, model interpretability, and real-world robustness.

Pre-Trained Models for Brain Tumor Classification

Researchers often leverage pre-trained models, trained initially on large datasets like ImageNet, to classify brain tumors using MRI scans. Pre-trained CNNs extract general features from natural images, which can be fine-tuned for medical imaging tasks using relatively small datasets (Abdusalomov et al., 2023). Studies have applied models such as VGGNet, GoogLeNet, and ResNet, showing that fine-tuning these architectures improves classification accuracy while reducing training time (Khan et al., 2022).

Among CNN-based pre-trained models, ResNet-50 has been widely favored for brain tumor classification due to its deep residual learning mechanism, which mitigates vanishing gradient issues and enhances feature extraction (Abdusalomov et al., 2023). A study by Khan et al. (2022) reported that ResNet-50 achieved 96.5% accuracy, outperforming GoogLeNet and VGG-16 in MRI tumor classification tasks. Additionally, AlexNet, when fine-tuned for MRI scans, reached 97% accuracy, demonstrating the effectiveness of transfer learning in the medical domain (Zahoor et al., 2024).

For real-time tumor localization, YOLO (You Only Look Once) models have also been adapted to MRI. The latest YOLOv8 architecture was noted for its high detection speed and accuracy, making it suitable for real-time applications such as surgical navigation (Chen et al., 2024). Another emerging approach is the use of Vision Transformers (ViTs), which capture global spatial dependencies in MRI scans using self-attention mechanisms (Parida et al., 2024). These transformers have shown promise in overcoming CNNs’ limitations, particularly in capturing long-range dependencies within MRI images.

Overall, pre-trained models offer a strong baseline for brain tumor classification, providing high feature extraction capacity, transfer learning benefits, and computational efficiency. However, since they are not explicitly designed for medical images, they often fail to accurately delineate tumor boundaries or handle MRI variations, leading researchers to develop custom models for improved performance.

Custom AI Models for Brain Tumor Classification

To address the limitations of off-the-shelf pre-trained models, researchers have proposed custom AI models specifically optimized for brain tumor classification. These models incorporate domain-specific enhancements such as boundary-aware segmentation, rotation-invariant features, and hybrid architectures.

One notable example is Res-BRNet (Residual and Boundary-Region Network), proposed by Zahoor et al. (2024), which integrates spatial and residual blocks to capture tumor heterogeneity and edge features. Unlike standard CNNs (e.g., ResNet), Res-BRNet explicitly models tumor boundaries, improving subtype classification accuracy. Similarly, FTVT (Fine-Tuned Vision Transformers), developed by Reddy et al. (2024), replaces the standard ViT classifier head with custom dense layers, batch normalization, and dropout, optimizing it for MRI-based tumor classification while reducing overfitting.

In addressing MRI orientation variability, Krishnan et al. (2024) introduced RViT (Rotation-Invariant Vision Transformer), which modifies ViT patch embeddings to account for different scan angles. This ensures robust classification across MRI slice orientations, a key limitation of conventional models.

For object detection, Chen et al. (2024) developed YOLO-NeuroBoost, an enhanced version of YOLOv8 incorporating KernelWarehouse dynamic convolution, CBAM attention, and Inner-IoU loss. These modifications improved localization accuracy, particularly for small or overlapping tumors, making the model more adaptable to real-world MRI scans. Similarly, Abdusalomov et al. (2023) proposed a YOLOv7-based model with CBAM attention, a decoupled detection head, and BiFPN feature fusion, significantly improving small tumor detection and multi-scale robustness.

These custom AI models outperform traditional pre-trained architectures by introducing task-specific enhancements, resulting in higher classification accuracy, improved segmentation, and better generalization across MRI datasets.

Comparative Performance Analysis

A structured comparison of pre-trained and custom models is presented in Table 3.1, summarizing the models, datasets, key architectural modifications, and performance metrics.

The results indicate that custom models consistently outperform pre-trained CNNs, with the highest accuracy achieved by YOLO-NeuroBoost (99.5%) and Res-BRNet (98.2%), showcasing significant improvements in boundary-aware classification and small tumor detection.

Methodology

Overview

To achieve high-accuracy brain tumor classification, we developed a structured deep learning approach that integrates data preprocessing, transfer learning, model fine-tuning, and evaluation. This section outlines the dataset selection, preprocessing techniques, model architecture, training process, and evaluation methodology used to develop an efficient classification model.

Dataset Selection

We utilized two publicly available MRI datasets:

1. Kaggle “Brain Tumor Classification (MRI)” Dataset (Bhuvaji et al., 2020) – Comprising 3,264 T1-weighted contrast-enhanced MRI images, categorized into four classes:

1.1. Glioma tumor
1.2. Meningioma tumor
1.3. Pituitary tumor
1.4. No tumor (healthy cases)

2. Figshare “Brain Tumor Dataset” (Cheng et al., 2017) – Containing 7,000+ MRI images, categorized into three classes:

2.1. Glioma tumor
2.2. Meningioma tumor
2.3. Pituitary tumor

The combination of these datasets ensures a diverse distribution of tumor subtypes, improving the generalizability of our model. Each image is labeled according to its respective tumor type, allowing for supervised learning-based classification.

Data Preprocessing

To enhance model performance, all MRI images underwent a standardized preprocessing pipeline:

1. Image Resizing – All images were resized to 224 × 224 pixels, aligning with VGG16’s input size.
2. Normalization – Pixel values were scaled to [0, 1] using min-max scaling to improve model convergence.
3. Image Augmentation – To increase dataset variability and reduce overfitting, we applied:

3.1. Rotation (±25°)
3.2. Horizontal & vertical flipping
3.3. Zoom (±20%)
3.4. Contrast adjustment

4. Class Imbalance Handling – Class weights were computed and applied during training to mitigate bias toward majority tumor classes.
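The normalization step can be illustrated with a minimal sketch. It assumes that min-max scaling is applied per image, which the text does not specify, and it operates on a small nested list rather than a real 224 × 224 scan; the function name `min_max_scale` and the toy values are illustrative only.

```python
def min_max_scale(image):
    """Scale a 2-D grid of pixel intensities to the range [0, 1]."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast; map every pixel to 0.0.
        return [[0.0 for _ in row] for row in image]
    return [[(pixel - lo) / (hi - lo) for pixel in row] for row in image]


# Hypothetical 2 x 2 "slice" with 8-bit intensities (not taken from the datasets).
toy_slice = [[0, 64], [128, 255]]
scaled = min_max_scale(toy_slice)
```

After scaling, the darkest pixel maps to 0 and the brightest to 1, regardless of the original intensity range of the scanner.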

Data Augmentation

To mitigate the challenges of a limited dataset size and improve model generalization, extensive data augmentation techniques were applied. These augmentations artificially increased dataset variability while preserving class labels, ensuring that the model learned robust, invariant features. The following augmentations were incorporated:

Rotation: Random rotations (±30°) to simulate different viewing angles.
Flipping: Horizontal and vertical flips to enhance spatial invariance.
Zooming: Random zoom-in and zoom-out to introduce variations in tumor magnifications.
Brightness Adjustment: Controlled intensity modifications to account for differences across MRI scanners.
Shifting: Minor translations of the image to make the model robust to positional variations.

These augmentations reduced overfitting and ensured better generalization to unseen MRI scans, making the model more suitable for real-world clinical applications.
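As a rough illustration of label-preserving augmentation, the sketch below implements flipping, shifting, and brightness adjustment on a tiny nested-list image. It is not the pipeline used in this study: rotation and zoom are omitted because they require interpolation, and the brightness range (0.8 to 1.2), the one-pixel shift, and all function names are assumptions chosen for the example.

```python
import random


def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vertical_flip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def shift_right(img, pixels, fill=0.0):
    """Translate right by `pixels` columns, padding the left edge with `fill`."""
    width = len(img[0])
    pixels = max(0, min(pixels, width))
    return [[fill] * pixels + row[:width - pixels] for row in img]


def adjust_brightness(img, factor):
    """Multiply intensities by `factor` and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]


def random_augment(img, rng):
    """Apply a random combination of the transforms above (assumed ranges)."""
    out = [row[:] for row in img]
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = vertical_flip(out)
    out = shift_right(out, rng.randint(0, 1))
    return adjust_brightness(out, rng.uniform(0.8, 1.2))


# Hypothetical image already scaled to [0, 1].
toy = [[0.0, 0.5], [1.0, 0.25]]
augmented = random_augment(toy, random.Random(0))
```

Each transform changes the appearance of the image but not its class label, which is the property that makes augmentation safe for supervised training.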

Class Imbalance Handling

An analysis of the dataset revealed imbalanced class distributions, where certain tumor types (e.g., gliomas) were significantly more frequent than others. This imbalance can negatively impact model training, leading to a bias toward majority classes. To mitigate this issue, we employed:

Class Weights: Adjusted loss function penalties to counterbalance the effect of dominant classes.
Oversampling: Replicated minority class images to ensure a more balanced representation.
Targeted Data Augmentation: Applied additional augmentations exclusively to underrepresented classes to synthetically increase their presence.

These strategies ensured that the model learned equally from all tumor classes, reducing bias and improving classification performance across rare tumor types.
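The exact weighting formula is not stated here. One common choice, shown below as an assumption, is the "balanced" heuristic, in which each class weight equals n_samples / (n_classes × class_count). The label counts in the example are hypothetical and do not reflect the merged dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution with glioma as the majority class.
toy_labels = ["glioma"] * 6 + ["meningioma"] * 3 + ["pituitary"] * 3
weights = balanced_class_weights(toy_labels)
```

With this heuristic, the majority class receives a weight below 1 and minority classes receive weights above 1, so every class contributes the same total weight to the loss.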

Transfer Learning Architecture

To leverage prior knowledge from large-scale datasets, we employed VGG16, a pre-trained CNN trained on ImageNet. Instead of training a model from scratch, transfer learning allows the model to adapt pre-learned features while fine-tuning for brain tumor classification.

VGG16 Model Adaptation

Pre-trained Base Model: The VGG16 model was loaded with ImageNet weights, excluding the fully connected layers (include_top=False).
Frozen Layers: All convolutional layers in VGG16 were initially frozen, preventing their weights from being modified.
Custom Classification Head: The fully connected layers were replaced with a trainable classification head consisting of:

◯ Global Average Pooling (GAP) – Reducing feature maps to a 512-dimensional vector.
◯ Batch Normalization – Stabilizing activations for better convergence.
◯ Fully Connected Dense Layers:
    ■ 256 neurons (ReLU activation, dropout = 0.5)
    ■ 128 neurons (ReLU activation, dropout = 0.5)
◯ Softmax Output Layer – Classifying MRI scans into 3 categories (glioma, meningioma, pituitary tumor).

Table 4.1: Architecture and Parameters of the Fine-Tuned VGG16 Model.

Total Parameters: 14,881,347

Trainable Parameters: 165,635

Non-trainable Parameters: 14,715,712
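The parameter totals above can be checked by hand. The sketch below counts the weights of the thirteen 3 × 3 convolutional layers of the standard VGG16 base and of the classification head described earlier. It assumes the usual convention in which the scale and offset of batch normalization are trainable while its moving mean and variance are not; the helper names are illustrative.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Standard VGG16 convolutional blocks: (number of conv layers, output channels).
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_base_params(in_ch=3):
    """Total parameters of the VGG16 convolutional base (no top layers)."""
    total = 0
    for n_layers, out_ch in VGG16_BLOCKS:
        for _ in range(n_layers):
            total += conv_params(in_ch, out_ch)
            in_ch = out_ch
    return total


base = vgg16_base_params()   # frozen convolutional base
bn_trainable = 2 * 512       # batch-norm scale and offset
bn_frozen = 2 * 512          # batch-norm moving mean and variance
head = dense_params(512, 256) + dense_params(256, 128) + dense_params(128, 3)

trainable = bn_trainable + head
non_trainable = base + bn_frozen
total = trainable + non_trainable
print(total, trainable, non_trainable)
```

Under these assumptions the arithmetic reproduces the reported figures, which also confirms that the output layer has three units and that only the head is updated during training.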

Justification for VGG16 Selection

We tested multiple pre-trained models, including Xception, ResNet50V2, and DenseNet201, before selecting VGG16 as the final architecture. VGG16 demonstrated the highest classification accuracy in preliminary experiments while maintaining computational efficiency. The model’s hierarchical feature extraction ability, combined with fine-tuning techniques, resulted in superior tumor differentiation compared to other architectures.

Training Strategy

The training was conducted using the following hyperparameters:

Optimizer: Adam optimizer (learning_rate = 1e-4)
Loss Function: Categorical Cross-Entropy
Batch Size: 32
Epochs: 100 (Early stopping after 10 epochs of no improvement)
Callbacks:

◯ EarlyStopping – Monitors validation loss and stops training if no improvement is detected.
◯ ModelCheckpoint – Saves the best model based on validation performance.
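The logic behind these two callbacks can be sketched in a framework-independent way. The class below is not the Keras implementation; it is a simplified illustration of patience-based stopping in which the epoch with the lowest validation loss is remembered, which is where a checkpoint would be written. The loss values and the shortened patience of 3 are hypothetical; the study used a patience of 10.

```python
class EarlyStopper:
    """Stop training after `patience` epochs without a lower validation loss."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.wait = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a checkpoint would be saved at this point
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


# Hypothetical validation losses, one per epoch.
val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

In this toy run the best loss occurs at epoch 3 and training halts three epochs later, so the retained model is the one from epoch 3 rather than the final one.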

Training Execution:

Evaluation Metrics & Comparative Analysis

To comprehensively assess model performance, we used the following metrics:

Accuracy – Overall classification correctness.
Precision – Proportion of correctly classified tumors per class.
Recall (Sensitivity) – True positive rate, measuring detection ability.
F1-Score – Balancing precision and recall.
Confusion Matrix – Visualizing misclassifications across tumor types.
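All of the metrics listed above can be derived from the confusion matrix. The following sketch shows the computation for a hypothetical three-class matrix (rows are true classes, columns are predicted classes); the counts are invented for illustration and are not the results of this study.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    report = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append({"precision": precision, "recall": recall, "f1": f1})
    return report


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def macro_f1(cm):
    """Unweighted mean of the per-class F1-scores."""
    report = per_class_metrics(cm)
    return sum(entry["f1"] for entry in report) / len(report)


# Hypothetical counts for glioma, meningioma, pituitary (in that order).
toy_cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 0, 10]]
report = per_class_metrics(toy_cm)
overall = accuracy(toy_cm)
```

Macro-averaging gives each tumor class equal weight, which makes it more sensitive to weak performance on a minority class than overall accuracy.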

Additionally, a comparative analysis was conducted to benchmark VGG16 against other models, evaluating computational efficiency, robustness, and generalizability.

Visualization

To better understand the model’s performance and learning behavior, we generated multiple visualizations, including data augmentation previews, a confusion matrix, training curves, and ROC curves. These visualizations provide insights into how the model processes input data, its classification strengths and weaknesses, and its overall generalization capability.

Data Augmentation Previews

To enhance the model’s ability to generalize, we applied extensive data augmentation techniques, including rotation, flipping, zooming, brightness adjustment, and shifting. Figure 4.1 showcases sample augmented images, illustrating the transformations that were applied to the dataset. These augmentations helped mitigate overfitting and improved the robustness of the model.

Figure 4.1: Example of data augmentation applied to MRI images.

Confusion Matrix

To evaluate the classification performance across different tumor types, we generated a confusion matrix (Figure 4.2). The matrix reveals how well the model distinguishes between glioma, meningioma, and pituitary tumors, as well as potential misclassifications.

The diagonal values indicate correct classifications, whereas off-diagonal values highlight misclassifications.
The model performed well in classifying pituitary tumors, but some misclassification occurred between glioma and meningioma, which could be attributed to their structural similarities in MRI scans.

Figure 4.2: Confusion Matrix for Brain Tumor Classification

Training Curves

Figure 4.3 displays the training and validation loss curves, illustrating the model’s learning progression over 13 epochs.

The steady decrease in training loss indicates that the model is effectively learning from the dataset.
The validation loss follows a similar trend, suggesting no significant overfitting. However, slight fluctuations in validation loss after epoch 10 suggest that further fine-tuning or regularization techniques could further improve generalization.

Figure 4.3: Training and Validation Loss Curves of the Model.

ROC Curve for Tumor Classification

To evaluate how well the model distinguishes between tumor classes, we generated Receiver Operating Characteristic (ROC) curves (Figure 4.4). The Area Under the Curve (AUC) values indicate how effectively the model classifies tumors:

Glioma Tumor → AUC = 0.96
Meningioma Tumor → AUC = 0.93
Pituitary Tumor → AUC = 0.99

AUC values closer to 1 suggest a strong classification performance, particularly for pituitary tumors.
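For a multi-class model, per-class AUC values such as these are typically computed one-vs-rest. The sketch below assumes that setup and uses the rank interpretation of AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The probabilities and labels are hypothetical.

```python
def roc_auc(scores, is_positive):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probabilities, labels, n_classes):
    """One AUC per class, treating that class as positive and the rest as negative."""
    return [
        roc_auc([row[k] for row in probabilities], [y == k for y in labels])
        for k in range(n_classes)
    ]


# Hypothetical softmax outputs for six scans (columns: glioma, meningioma, pituitary).
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
]
toy_labels = [0, 0, 1, 1, 2, 2]
aucs = one_vs_rest_auc(toy_probs, toy_labels, n_classes=3)
```

In the toy data the third class is perfectly separated from the other two, while the first two classes overlap, mirroring the kind of glioma and meningioma confusion discussed in this section.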

Figure 4.4: ROC Curves for Glioma, Meningioma, and Pituitary Tumor Classes.

Evaluation Metrics

Model performance was rigorously evaluated using a comprehensive set of metrics. Accuracy was employed as the primary metric to assess overall classification performance. Additionally, precision, recall, and F1-score were calculated to evaluate the model's effectiveness in handling class imbalances. A confusion matrix was generated to provide insights into the model's performance for each class. These metrics were complemented by visualizations of training and validation loss curves to identify potential overfitting or underfitting. By employing a diverse range of evaluation metrics, the study ensured a thorough assessment of the model's performance and reliability.

Results and Discussion

The results from this study highlight the effectiveness of transfer learning for brain tumor classification using MRI scans. Among the models tested, the VGG16 architecture with a custom classification head and class imbalance handling demonstrated the most reliable performance, achieving a test accuracy of 85% and a macro-averaged F1-score of 84%.

The confusion matrix shows that the model performed especially well in detecting pituitary tumors (recall = 0.99), while a moderate number of misclassifications occurred between gliomas and meningiomas. This pattern is consistent with prior studies and is likely due to the overlapping imaging features of these two tumor types.

Compared to existing work, our results are promising. While some studies report higher accuracy (e.g., Younis et al., 2022: 94.82% with VGG19), our model offers a balanced trade-off between performance and computational efficiency. Moreover, the use of Grad-CAM heatmaps enhanced the interpretability of predictions by visualizing the tumor regions that influence the classification, a critical step toward clinical trust.
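For readers unfamiliar with Grad-CAM, its core computation can be sketched as follows. Each feature map of a chosen convolutional layer is weighted by the spatial average of the gradient of the class score with respect to that map; the weighted maps are summed and passed through a ReLU. The sketch assumes the activations and gradients have already been extracted from the network; the layer choice, the final normalization to [0, 1], and the toy values are assumptions, not details reported in this study.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations[k][i][j]: value of feature map k at spatial position (i, j).
    gradients[k][i][j]:   gradient of the class score w.r.t. that activation.
    """
    n_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of the gradients of each feature map.
    alphas = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(n_maps)
    ]

    # Weighted sum over feature maps, followed by ReLU.
    heatmap = [
        [
            max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(n_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Normalize to [0, 1] for display (a common convention, assumed here).
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Hypothetical example: two 2 x 2 feature maps with constant gradients.
toy_acts = [[[1.0, 0.0], [0.0, 2.0]],
            [[0.0, 3.0], [1.0, 0.0]]]
toy_grads = [[[0.5, 0.5], [0.5, 0.5]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(toy_acts, toy_grads)
```

Positions where positively weighted feature maps are active receive high heatmap values, while regions dominated by negatively weighted maps are suppressed by the ReLU.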

Challenges Observed

Class Imbalance: Despite class weighting and augmentation, minority classes (meningioma and pituitary) remained harder to classify in some experiments.
Dataset Variability: Merging datasets introduced real-world diversity but also increased intra-class variability.
Generalization: While validation metrics were strong, some alternate models (EfficientNet, MobileNet) underperformed on the test set, reinforcing the need for careful model selection and tuning.

Table 5.1: Classification Report Showing Precision, Recall, F1-Score, and Support for Brain Tumor Types.

The classification metrics summarized in Table 5.1 indicate that the model performed exceptionally well for pituitary tumors (100% recall, 0.91 F1-score) but had lower recall for meningioma tumors (0.74 recall, 0.76 F1-score), suggesting the need for further improvements in class balance and model robustness.

Conclusion

This study presents a practical and interpretable approach to brain tumor classification using a VGG16-based convolutional neural network. By merging two publicly available datasets (Figshare and Kaggle), we introduced inter-dataset variability to better simulate real-world diagnostic conditions. Through preprocessing, class imbalance handling, and data augmentation, the model was trained to generalize effectively across glioma, meningioma, and pituitary tumors.

The final model achieved 85% accuracy on the test set, with substantial precision and recall across all tumor classes. Particularly notable was its performance in detecting pituitary tumors, where it achieved near-perfect recall. To address the challenge of model interpretability, we used Grad-CAM visualizations, which confirmed that the model was focusing on relevant tumor regions in the MRI scans.

Our findings demonstrate that a lightweight, pretrained architecture like VGG16 — combined with careful training strategies — can serve as a powerful foundation for clinical decision support systems. This work contributes toward building explainable and generalizable AI solutions in medical imaging.

References

Abdusalomov, A. B., Mukhiddinov, M., & Whangbo, T. K. (2023). Brain tumor detection based on deep learning approaches and magnetic resonance imaging. Cancers, 15(16), 4172.
Amin, J., Sharif, M., Haldorai, A., Yasmin, M., & Nayak, R. S. (2021). Brain tumor detection and classification using machine learning: A comprehensive survey. Complex & Intelligent Systems, 8(4), 3161–3183.
Bhuvaji, S., Kanchan, S., Dedge, S., Bhumkar, P., & Kadam, A. (2020, May 24). Brain tumor classification (MRI). Kaggle. https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri.
Chen, R., Zhang, X., Li, P., & Wang, L. (2024). YOLO-NeuroBoost: Enhancing real-time object detection for brain tumor MRI scans. IEEE Transactions on Biomedical Engineering, 71(3), 1125–1137.
Cheng, J. (2017). Brain tumor dataset. Figshare. Dataset.
Dulal, R., & Dulal, R. (2025). Brain tumor identification using improved YOLOv8. arXiv preprint. https://arxiv.org/abs/2502.03746.
Esmaeilzadeh, P. (2020). Use of AI-based tools for healthcare purposes: A survey study from consumers’ perspectives. BMC Medical Informatics and Decision Making, 20(1), 191.
Khan, A. H., Abbas, S., Khan, M. A., Farooq, U., Khan, W. A., Siddiqui, S. Y., & Ahmad, A. (2022). Intelligent model for brain tumor identification using deep learning. Applied Computational Intelligence and Soft Computing, 2022, 1–10.
Krishnan, H., Patel, S., & Gupta, R. (2024). RViT: A rotation-invariant vision transformer for brain tumor MRI classification. Medical Image Analysis, 92, 102313.
Parida, A., Capellán-Martín, D., Jiang, Z., Tapp, A., Liu, X., Anwar, S. M., Ledesma-Carbayo, M. J., & Linguraru, M. G. (2024). Adult glioma segmentation in Sub-Saharan Africa using transfer learning on stratified fine-tuning data. arXiv preprint. https://arxiv.org/abs/2412.04111.
Reddy, S., Kumar, P., & Sharma, N. (2024). Fine-tuned vision transformers for multi-class brain tumor classification. Neural Computing and Applications, 36(2), 517–531.
Secinaro, S., Calandra, D., Secinaro, A., Muthurangu, V., & Biancone, P. (2021). The role of artificial intelligence in healthcare: A structured literature review. BMC Medical Informatics and Decision Making, 21(1), 88.
Talukder, Md. A. (2023). An efficient deep learning model to categorize brain tumors using reconstruction and fine-tuning [Preprint].
Younis, A., Qiang, L., Nyatega, C. O., Adamu, M. J., & Kawuwa, H. B. (2022). Brain tumor analysis using deep learning and VGG-16 ensembling learning approaches. Applied Sciences, 12(14), 7282.
Zahoor, A., Malik, H., & Khan, S. (2024). Res-BRNet: A novel residual and boundary-aware network for brain tumor classification in MRI. Expert Systems with Applications, 221, 119932.
Gulbarga, M. I., Khan, A. L., Cankurt, S., & Shaidullaev, N. (2023, June). Deep learning (DL) dense classifier with long short-term memory encoder detection and classification against network attacks. 2023 20th International Conference on Electronics, Computer and Computation (ICECCO), 1–6.
Nazira, A., Isaev, R., Shambetova, B., Ur Rehman, S., & Osmonaliev, K. (2025). The role of computer technology in monitoring and analysis of hemodialysis patient data: A review. South Eastern European Journal of Public Health, 26.

Table 3.1: Comparative Performance of Deep Learning Models for Brain Tumor Classification.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.