Submitted: 08 April 2025
Posted: 09 April 2025
Abstract
Keywords:
1. Introduction
- KANConv Blocks: replace standard convolutions with learnable B-spline activations to dynamically suppress side-scan sonar (SSS) noise while preserving edge details;
- KANConv-PAN: a deformable feature pyramid network that uses spline-parameterized kernels to correct geometric distortions and fuse multi-scale target features;
- Dual-Task Head: combines CIoU loss for bounding-box regression with Dice loss to refine boundary-sensitive segmentation.
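
The KANConv idea can be illustrated with the generic Kolmogorov-Arnold Network (KAN) parameterization, in which each learnable univariate function is phi(x) = w_b·silu(x) + w_s·Σ c_i·B_i(x) over a grid of B-spline bases B_i. The sketch below is a plain-Python illustration under stated assumptions: cubic splines on a uniform grid over [-1, 1] with 5 intervals, and a "KAN convolution" in which every kernel tap applies its own phi to the pixel beneath it and the tap outputs are summed instead of being multiplied by scalar weights. The names `uniform_knots`, `bspline_basis`, `SplineActivation` and `kan_conv2d` are hypothetical; this is not the authors' implementation, which would operate on multi-channel tensors in a deep-learning framework.

```python
import math
from typing import Callable, List, Sequence


def uniform_knots(grid_min: float, grid_max: float, num_intervals: int, order: int) -> List[float]:
    """Uniform knot vector on [grid_min, grid_max], extended by `order` knots on each side."""
    h = (grid_max - grid_min) / num_intervals
    return [grid_min + (i - order) * h for i in range(num_intervals + 2 * order + 1)]


def bspline_basis(x: float, knots: Sequence[float], order: int) -> List[float]:
    """All B-spline basis values of degree `order` at x, via the Cox-de Boor recursion."""
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    bases = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for k in range(1, order + 1):
        # Each degree-k basis blends two neighbouring degree-(k-1) bases.
        bases = [
            (x - knots[i]) / (knots[i + k] - knots[i]) * bases[i]
            + (knots[i + k + 1] - x) / (knots[i + k + 1] - knots[i + 1]) * bases[i + 1]
            for i in range(len(bases) - 1)
        ]
    return bases  # num_intervals + order values


class SplineActivation:
    """Learnable 1-D function phi(x) = w_b * silu(x) + w_s * sum_i c_i * B_i(x)."""

    def __init__(self, coeffs: Sequence[float], grid_min: float = -1.0, grid_max: float = 1.0,
                 num_intervals: int = 5, order: int = 3, w_b: float = 1.0, w_s: float = 1.0):
        if len(coeffs) != num_intervals + order:
            raise ValueError("expected num_intervals + order spline coefficients")
        self.knots = uniform_knots(grid_min, grid_max, num_intervals, order)
        self.order, self.coeffs, self.w_b, self.w_s = order, list(coeffs), w_b, w_s

    def __call__(self, x: float) -> float:
        silu = x * 0.5 * (1.0 + math.tanh(0.5 * x))  # x * sigmoid(x), overflow-safe form
        basis = bspline_basis(x, self.knots, self.order)
        spline = sum(c * b for c, b in zip(self.coeffs, basis))
        return self.w_b * silu + self.w_s * spline


def kan_conv2d(image: List[List[float]],
               kernel: List[List[Callable[[float], float]]]) -> List[List[float]]:
    """Single-channel 'valid' KAN convolution: each kernel tap applies its own learnable
    function to the pixel beneath it, and the tap outputs are summed."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    return [
        [sum(kernel[a][b](image[r + a][c + b]) for a in range(kh) for b in range(kw))
         for c in range(cols - kw + 1)]
        for r in range(rows - kh + 1)
    ]
```

With all coefficients set to 1 and w_b = 0, the spline term equals 1 anywhere inside the grid (B-splines form a partition of unity), which makes a convenient sanity check; in training, the c_i, w_b and w_s would be learned by backpropagation.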
2. Related Works
2.1. SSS Data Processing and Benchmarking
2.2. YOLOv8
3. Proposed Method
3.1. Structure of the CKAN-YOLOv8 Model
- Input Preprocessing: applies noise-based augmentation, adaptive scaling, and grayscale padding to prepare raw images for diverse detection scenarios;
- Backbone Architecture: stacks convolutional layers, C2f modules, and a spatial pyramid pooling fast (SPPF) module to extract hierarchical features through convolution and multi-scale pooling;
- Neck Network: uses a modified path aggregation network (PANet) topology that fuses multi-level features through top-down upsampling, bottom-up downsampling, and concatenation;
- Detection: employs decoupled prediction heads that handle classification and bounding box regression independently, trained with the following losses:
  - Classification: Binary Cross-Entropy (BCE) on the per-class scores;
  - Localization: Distribution Focal Loss (DFL) for distribution-aware box regression;
  - Bounding Box Refinement: CIoU loss, which penalizes poor overlap, center-point distance, and aspect-ratio discrepancies.
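
The CIoU term above extends plain IoU with a normalized center-distance penalty and an aspect-ratio consistency term: L_CIoU = 1 - IoU + ρ²/c² + αv, where ρ is the distance between box centers, c is the diagonal of the smallest enclosing box, v = (4/π²)(arctan(w_gt/h_gt) - arctan(w/h))², and α = v / ((1 - IoU) + v). A minimal scalar sketch of this standard formulation follows, assuming boxes in (x1, y1, x2, y2) format; the function name and the `eps` guard are illustrative choices, not code from the paper.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def ciou_loss(pred: Box, target: Box, eps: float = 1e-9) -> float:
    """Complete-IoU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: plain IoU.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Distance term: squared center distance over squared diagonal of the enclosing box.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1.0 - iou) + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)
```

Unlike 1 - IoU, this loss still provides a useful signal for non-overlapping boxes: it exceeds 1 and shrinks as the centers move closer.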
3.2. KAN Convolutions
3.3. Cross Stage Partial Fusion with 2 KAN Convolutions
3.4. KANConv-PANet
3.5. Loss Function
4. Results and Discussion
4.1. Experimental Environment
4.2. Indicators
4.3. Experiments and Results
4.3.1. Main Experiment Analysis
4.3.2. Ablation Experiments Analysis
4.3.3. Lightweight Analysis
4.3.4. Robustness Analysis
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

| Strategy | Implementation Detail |
|---|---|
| Optimizer | AdamW (, weight decay ). |
| Loss Function | Adopted the formulation described in Section 3.5, , , , . |
| Initialization | Pre-trained the CKAN-YOLOv8 backbone on the COCO natural-image dataset to exploit its generic feature-extraction capability; during fine-tuning, only the segmentation and detection heads were optimized to prevent overfitting on small datasets. |
| Dynamic Learning Rate Scheduling | Implemented SGDR (Stochastic Gradient Descent with Warm Restarts) to periodically reset the learning rate and help escape local optima; cyclic learning-rate range: base → peak over 10 epochs. |
| Regularization | Drop Path (stochastic depth) with probability 0.2, randomly dropping residual branches during training to improve generalization. |
| Early Stopping | Training terminated if the validation loss did not improve for 10 consecutive epochs; the best checkpoint was selected by AP@0.5. |
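
The scheduling and early-stopping rows can be sketched as follows. The schedule uses the standard SGDR form (cosine decay from the peak to the base learning rate within each cycle, then a restart to the peak) with the 10-epoch cycle from the table; the base and peak values are not given in the text, so the 1e-5 and 1e-3 used below are hypothetical, as are the helper names. Following the table, stopping is driven by validation loss, while the retained checkpoint is the one with the best AP@0.5.

```python
import math
from typing import Optional


def sgdr_lr(epoch: int, lr_min: float, lr_max: float, cycle_len: int = 10) -> float:
    """Cosine annealing with warm restarts (fixed cycle length): decays from lr_max
    towards lr_min within each cycle, then jumps back to lr_max at the restart."""
    t_cur = epoch % cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t_cur / cycle_len))


class EarlyStopping:
    """Stop after `patience` epochs without validation-loss improvement,
    while tracking the epoch with the best AP@0.5 as the checkpoint to keep."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_loss = math.inf
        self.bad_epochs = 0
        self.best_ap = -math.inf
        self.best_epoch: Optional[int] = None

    def step(self, epoch: int, val_loss: float, ap50: float) -> bool:
        if ap50 > self.best_ap:  # checkpoint selection uses AP@0.5
            self.best_ap, self.best_epoch = ap50, epoch
        if val_loss < self.best_loss:  # stopping criterion uses validation loss
            self.best_loss, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Hypothetical base/peak learning rates for illustration only.
    for epoch in range(0, 21, 5):
        print(epoch, f"{sgdr_lr(epoch, lr_min=1e-5, lr_max=1e-3):.2e}")
```

The schedule is optimizer-agnostic, so the same function can drive the AdamW learning rate listed in the table.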

| Model | AP@0.5 | IoU | FPS |
|---|---|---|---|
| Deeplabv3 | 0.716 | 0.63 | 23 |
| U-Net | 0.705 | 0.61 | 27 |
| Mask R-CNN | 0.723 | 0.67 | 19 |
| YOLOv5s_seg | 0.767 | 0.65 | 53 |
| YOLOv8-Baseline | 0.813 | 0.68 | 62 |
| CKAN-YOLOv8 | 0.869 | 0.72 | 66 |
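
Assuming the IoU column measures mask overlap between prediction and ground truth, it is closely related to the Dice loss named for the dual-task head, which is a differentiable overlap measure on soft masks. A minimal sketch of both on small nested-list masks follows; the 0.5 binarization threshold, the smoothing constant, the convention that two empty masks have IoU 1, and the function names are assumptions for illustration, not settings taken from the paper.

```python
from typing import List

Mask = List[List[float]]  # H x W values in [0, 1]


def mask_iou(pred: Mask, target: Mask, threshold: float = 0.5) -> float:
    """IoU of two masks after binarizing both at `threshold`."""
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            pb, tb = p >= threshold, t >= threshold
            inter += int(pb and tb)
            union += int(pb or tb)
    return inter / union if union else 1.0  # two empty masks count as a perfect match


def soft_dice_loss(pred: Mask, target: Mask, smooth: float = 1.0) -> float:
    """1 - Dice computed on soft probabilities, so it stays differentiable."""
    inter = sum(p * t for prow, trow in zip(pred, target) for p, t in zip(prow, trow))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 - (2.0 * inter + smooth) / (total + smooth)
```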

| Model | AP@0.5 | IoU | FPS |
|---|---|---|---|
| YOLOv8-Baseline | 0.813 | 0.68 | 62 |
| CKAN-Backbone | 0.832 | 0.68 | 62 |
| CKAN-Neck | 0.843 | 0.70 | 65 |
| CKAN-YOLOv8 | 0.869 | 0.72 | 66 |

| Model | AP@0.5-Noisy (%) | AP@0.5-Clean (%) | Relative AP Drop (%) | IoU@0.5-Noisy (%) | IoU@0.5-Clean (%) | Relative IoU Drop (%) | FPS |
|---|---|---|---|---|---|---|---|
| CKAN-YOLOv8 | 82.1 | 86.9 | 5.5 | 68.5 | 72.0 | 4.9 | 64 |
| YOLOv8-Baseline | 73.5 | 81.3 | 9.6 | 62.3 | 68.0 | 8.4 | 59 |
| Mask R-CNN | 59.8 | 72.3 | 17.3 | 51.2 | 67.0 | 23.6 | 15 |
| YOLOv5s_seg | 68.2 | 76.7 | 11.1 | 57.8 | 65.0 | 11.1 | 49 |
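
Assuming the drop columns are relative decreases, (clean - noisy) / clean × 100, which is the reading that matches the YOLOv8-Baseline, Mask R-CNN and YOLOv5s_seg rows, they can be recomputed from the clean and noisy scores with a few lines of Python; the helper name `relative_drop` is illustrative.

```python
def relative_drop(clean: float, noisy: float) -> float:
    """Percentage drop of a metric from clean to noisy input, relative to the clean score."""
    return (clean - noisy) / clean * 100.0


# (AP clean, AP noisy, IoU clean, IoU noisy) in %, copied from the robustness table.
rows = {
    "CKAN-YOLOv8": (86.9, 82.1, 72.0, 68.5),
    "YOLOv8-Baseline": (81.3, 73.5, 68.0, 62.3),
    "Mask R-CNN": (72.3, 59.8, 67.0, 51.2),
    "YOLOv5s_seg": (76.7, 68.2, 65.0, 57.8),
}

for name, (ap_c, ap_n, iou_c, iou_n) in rows.items():
    print(f"{name:16s} AP drop {relative_drop(ap_c, ap_n):4.1f}%  "
          f"IoU drop {relative_drop(iou_c, iou_n):4.1f}%")
```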

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).