Accurate and timely detection of brain tumors on magnetic resonance imaging (MRI) is a critical prerequisite for effective neuro-oncological management. Convolutional neural networks (CNNs) have been the dominant paradigm for medical image classification over the past decade, but the recent emergence of vision-capable large language models (LLMs) offers a complementary, training-free pathway to image-based decision support. This study presents a controlled, head-to-head comparison of 17 ImageNet-pretrained CNN architectures and 8 state-of-the-art multimodal LLMs on the publicly available Brain MRI Images for Brain Tumor Detection dataset (n = 253; 155 tumor, 98 non-tumor). Following an 80/20 train–test partition (n = 202 / n = 51), the CNNs were fine-tuned via transfer learning, whereas the LLMs were evaluated in a zero-shot configuration using a standardized prompt. Test-set performance was assessed using accuracy, precision, recall, specificity, F1-score, Cohen's kappa, and the area under the receiver operating characteristic curve (AUC). Among the CNNs, six architectures (DenseNet169, DenseNet201, InceptionV3, ResNet101V2, VGG16, Xception) tied at 94.12% test accuracy, while ResNet50 and NASNetMobile exhibited pronounced overfitting, reaching only 45.10% and 49.02% test accuracy, respectively. Among the LLMs, ChatGPT 5.4 Thinking achieved perfect classification (100% on all metrics), while ChatGPT 5.5 Thinking and Gemini 3.1 Thinking attained 98.04% and 94.12% accuracy, respectively. These findings indicate that modern multimodal foundation models can match or exceed task-specific CNNs on a low-data medical imaging task and support their further investigation as components of clinical decision-support pipelines.
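The threshold-based metrics named above can all be derived from a single binary confusion matrix. The following is a minimal sketch, not the study's evaluation code: the `Confusion` class, the `binary_metrics` function, and the example counts are hypothetical, and "tumor" is assumed to be the positive class. AUC is omitted because it requires continuous scores rather than hard labels. The final loop only illustrates that the reported accuracies are arithmetically consistent with 48, 50, 23, and 25 correct predictions out of the 51 test images.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts, assuming 'tumor' is the positive class."""
    tp: int  # tumor images predicted as tumor
    fp: int  # non-tumor images predicted as tumor
    tn: int  # non-tumor images predicted as non-tumor
    fn: int  # tumor images predicted as non-tumor


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> dict[str, float]:
    """Compute the threshold-based metrics from confusion counts."""
    n = c.tp + c.fp + c.tn + c.fn
    accuracy = _ratio(c.tp + c.tn, n)
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)          # sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the row and column marginals.
    p_e = _ratio(
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp),
        n * n,
    )
    kappa = _ratio(accuracy - p_e, 1.0 - p_e)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # HYPOTHETICAL counts summing to 51; not taken from the study.
    example = Confusion(tp=30, fp=2, tn=18, fn=1)
    for name, value in binary_metrics(example).items():
        print(f"{name:12s} {value:.4f}")

    # Correct-prediction counts that reproduce the reported percentages.
    for correct in (48, 50, 23, 25):
        print(f"{correct}/51 = {100 * correct / 51:.2f}%")
```

With a test set of only 51 images, each misclassification shifts accuracy by roughly 1.96 percentage points, which is why several models land on identical values.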