The deployment of Vision-Language Models (VLMs) in Satellite IoT scenarios is critical for real-time disaster assessment but is often hindered by the substantial memory and compute requirements of state-of-the-art models. While parameter-efficient fine-tuning (PEFT) enables adaptation with minimal computational overhead, standard supervised methods often fail to ensure robustness and reliability on resource-constrained edge devices. To address this, we propose EdgeV-SE, a self-reflective fine-tuning framework that significantly enhances VLM performance without introducing any inference-time overhead. Our framework incorporates an uncertainty-aware self-reflection mechanism with asymmetric dual pathways: a generative linguistic pathway and an auxiliary discriminative visual pathway. By estimating uncertainty from the linguistic pathway using a log-likelihood margin between class verbalizers, EdgeV-SE identifies ambiguous samples and refines its decision boundaries via consistency regularization and cross-pathway mutual learning. Experimental results on hurricane damage assessment demonstrate that our approach improves image classification accuracy, enhances image–text semantic alignment, and achieves superior caption quality. Notably, these gains are achieved while maintaining practical deployment on a commercial off-the-shelf edge device such as the NVIDIA Jetson Orin Nano, without increasing inference latency or memory footprint. Overall, our work contributes a unified self-reflective fine-tuning framework that improves the robustness, calibration, and deployability of VLMs on edge devices.
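
As a rough illustration of the uncertainty signal described above, the sketch below scores each class verbalizer under the generative (linguistic) pathway and treats the top-1/top-2 log-likelihood margin as a confidence measure, flagging low-margin samples as ambiguous. The names `loglik_fn`, `verbalizer_margin`, and the threshold `tau` are illustrative assumptions for exposition, not the exact interface of EdgeV-SE.

```python
import torch

def verbalizer_margin(loglik_fn, image, verbalizers, tau=0.5):
    """Estimate uncertainty via the log-likelihood margin between class verbalizers.

    Assumptions (illustrative, not the paper's API):
      loglik_fn(image, text) -> scalar sequence log-likelihood of `text` given
      `image` under the generative linguistic pathway.
      tau: margin threshold below which a sample is flagged as ambiguous.
    """
    # Score every class verbalizer (e.g., "severe damage", "no damage") for this image.
    scores = torch.tensor([loglik_fn(image, v) for v in verbalizers])

    # Margin between the best and second-best verbalizer log-likelihoods.
    top2 = torch.topk(scores, k=2).values
    margin = (top2[0] - top2[1]).item()

    pred = int(torch.argmax(scores))
    return pred, margin, margin < tau  # predicted class, margin, ambiguous flag
```

In a training loop, samples flagged as ambiguous would be the ones routed to the consistency-regularization and cross-pathway mutual-learning objectives, while confident samples follow the standard supervised loss; the specific loss formulations are detailed in the method section.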