Submitted: 29 November 2025
Posted: 02 December 2025
Abstract
Oil spills pose severe ecological and economic threats, making rapid detection and severity assessment essential for effective environmental response and mitigation. Traditional remote-sensing approaches rely heavily on manual interpretation or rule-based algorithms, both of which are limited by variability in weather, illumination, and sea conditions. With the growing availability of satellite imagery and advances in artificial intelligence, deep learning techniques offer powerful alternatives for automated oil spill identification. This study develops and evaluates a two-stage deep learning pipeline designed to (1) detect and segment oil spill regions in satellite images using semantic segmentation, and (2) classify the severity of identified spills using a supervised image-level classifier. The project uses the publicly available altunian/oil_spills dataset, consisting of 1,040 paired satellite images and color-encoded segmentation masks representing four classes: Background, Water, Oil, and Others. Stage 1 of the pipeline employs a U-Net architecture with a ResNet-18 encoder pretrained on ImageNet. The model performs pixel-level segmentation to isolate oil regions from surrounding ocean and environmental structures. Stage 2 uses a modified ResNet-18 classifier that accepts four-channel one-hot encoded segmentation outputs and predicts one of three spill severity levels derived from the proportional area of oil pixels: No Oil (<5%), Minor (5–15%), and Major (>15%). The pipeline was trained using the PyTorch framework with separate training cycles for each stage, enabling modular evaluation and interpretability. A systematic experimental setup, including an 80/10/10 training–validation–test split, cross-entropy loss, Adam optimization, and 20 training epochs per stage, was used to assess model performance.
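The severity-mapping rule described above (No Oil <5%, Minor 5–15%, Major >15% of oil pixels) can be sketched as follows. This is a minimal illustration, not the paper's code: the class index for Oil (2) and the function name are assumptions.

```python
import numpy as np

# Assumed class index for the Oil class in the predicted mask (illustrative).
OIL_CLASS = 2

def severity_from_mask(mask: np.ndarray) -> str:
    """Map a predicted segmentation mask (H, W) of integer class indices
    to one of the three Stage-2 severity labels, based on the proportion
    of pixels labeled as oil."""
    oil_fraction = float(np.mean(mask == OIL_CLASS))
    if oil_fraction < 0.05:      # No Oil: fewer than 5% oil pixels
        return "No Oil"
    elif oil_fraction <= 0.15:   # Minor: 5-15% oil pixels
        return "Minor"
    else:                        # Major: more than 15% oil pixels
        return "Major"
```

In the pipeline as described, labels produced by a rule like this over the ground-truth masks would supply the training targets for the Stage-2 classifier.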
Results show that the U-Net segmentation model achieves a mean Intersection-over-Union (IoU) of 0.8156 on the test set, with particularly strong performance on the Background (0.9123) and Water (0.8567) classes and lower, but still effective, performance on the Oil class (0.7234). These findings reflect the inherent class imbalance in satellite imagery, where oil occupies a small proportion of total pixels. The ResNet classifier achieved an overall accuracy of 88.76%, with F1-scores of 0.90 for No Oil, 0.85 for Minor, and 0.90 for Major severity levels. Classification errors were concentrated around the Minor category, consistent with threshold-based class definitions and segmentation uncertainty. The combined results demonstrate that a two-stage deep learning approach offers substantial improvements in both accuracy and interpretability over single-stage or heuristic-based systems. Segmentation masks provide visual justification for classification outputs, enabling a more transparent workflow for environmental monitoring agencies. Despite strong performance, limitations include dataset size, imbalance across severity classes, and dependency of classification accuracy on segmentation quality. Future work may incorporate data augmentation, advanced architectures such as U-Net++ or DeepLabv3+, temporal satellite imagery, or uncertainty quantification models for risk-aware operational deployment. Overall, our two-stage pipeline provides a robust, interpretable, and scalable framework for real-time oil spill detection and severity assessment in satellite imagery.
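The per-class IoU figures reported above follow the standard definition (intersection over union of predicted and ground-truth pixel sets per class). A minimal sketch, with the four-class layout assumed from the dataset description and a function name of my own choosing:

```python
import numpy as np

def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 4) -> list:
    """Intersection-over-Union for each class between a predicted and a
    ground-truth segmentation mask of integer class indices."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        intersection = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        # A class absent from both masks has undefined IoU; mark it NaN
        # so it can be excluded when averaging into a mean IoU.
        ious.append(float(intersection) / union if union > 0 else float("nan"))
    return ious
```

A mean IoU such as the 0.8156 reported for the test set would then be the average of these per-class values (typically via `np.nanmean` across images and classes).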
