A Latent Space Diffusion Transformer for High-Quality Video Frame Interpolation
Wei Chen, Jiing Fang
Posted: 17 December 2025
SuperSegmentation: KeyPoint Detection and Description with Semantic Labeling for VSLAM
Rajarshi Karmakar, Ciaran Eising, Rekha Ramachandra, Sahil Zaidi
Posted: 17 December 2025
Effects of Computer Science on the Creative Industries: A Bibliometric Analysis
Lorenzo Alejandro Matadamas-Torres, Juan Regino Maldonado, Idarh Matadamas, Luis Alberto Alonso-Hernandez, Manuel de Jesus Melo-Monterrey, Lorena Juith Ramírez-López, Luis Enrique Rodríguez-Antonio
Posted: 16 December 2025
Hybrid-Frequency-Aware Mixture-of-Experts Method for CT Metal Artifact Reduction
Pengju Liu, Hongzhi Zhang, Chuanhao Zhang, Feng Jiang
Posted: 16 December 2025
Comparative Analysis of YOLOv8 and YOLOv11 Models for Phenotypic Traits of Edible Mushrooms
Doo-Ho Choi, Youn-Lee Oh, Minji Oh, Eun-Ji Lee, Sung-I Woo, Minseek Kim, Ji-Hoon Im
Posted: 16 December 2025
Lightweight Pipeline Defect Detection Algorithm Based on FALW-YOLOv8
Huazhong Wang, Xuetao Wang, Lihua Sun, Qingchao Jiang
Pipelines play a critical role in industrial production and daily life as essential conduits for transportation. However, defects frequently arise because of environmental and manufacturing factors, posing potential safety hazards. To address the limitations of traditional object detection methods, such as inefficient feature extraction and loss of critical information, this paper proposes an improved algorithm named FALW-YOLOv8, based on YOLOv8. The FasterBlock is integrated into the C2f module to replace standard convolutional layers, thereby reducing redundant computations and significantly enhancing the efficiency of feature extraction. Additionally, the ADown module is employed to improve multi-scale feature retention, while the LSKA attention mechanism is incorporated to optimize detection accuracy, particularly for small defects. The Wise-IoU v2 loss function is adopted to refine bounding box precision for complex samples. Experimental results demonstrate that the proposed FALW-YOLOv8 achieves a 5.8% improvement in mAP50, alongside a 34.8% reduction in parameters and a 30.86% decrease in computational cost. This approach effectively balances accuracy and efficiency, making it suitable for real-time industrial inspection applications.
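The Wise-IoU v2 loss mentioned in this abstract combines a plain IoU loss with a distance-based attention term and a monotonic focusing coefficient. Below is a minimal PyTorch sketch of that idea, assuming axis-aligned boxes in (x1, y1, x2, y2) format; the function name and the externally tracked running mean of the IoU loss are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of an IoU loss with Wise-IoU v2's monotonic focusing factor.
# `wiou_v2_loss` and `iou_loss_mean` (a running mean maintained by the
# training loop) are assumptions; see Tong et al. (2023) for the exact form.
import torch

def wiou_v2_loss(pred, target, iou_loss_mean, gamma=0.5):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # Intersection and union for the plain IoU loss
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    l_iou = 1.0 - inter / (area_p + area_t - inter).clamp(min=1e-7)

    # WIoU v1 term: center-distance penalty normalized by the smallest
    # enclosing box (detached, so it only re-weights gradients)
    c_pred = (pred[:, :2] + pred[:, 2:]) / 2
    c_tgt = (target[:, :2] + target[:, 2:]) / 2
    enc = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    dist2 = ((c_pred - c_tgt) ** 2).sum(dim=1)
    r_wiou = torch.exp(dist2 / (enc ** 2).sum(dim=1).detach().clamp(min=1e-7))

    # WIoU v2 monotonic focusing: down-weight easy samples via a detached
    # ratio of each box's IoU loss to its running mean over training
    focus = (l_iou.detach() / iou_loss_mean).pow(gamma)
    return (focus * r_wiou * l_iou).mean()
```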
Posted: 11 December 2025
A Vision-Based Subtitle Generator: Text Reconstruction via Subtle Vibrations from Videos
Yan Wang, Yingchong Wang, Xiuqi Zhang, Xiaoyu Ding
Posted: 11 December 2025
Human Activity Recognition in the Deep Learning Era: Different Modalities, Recent Advances in Applications, and Emerging Techniques
Mohammad Osman Khan, Imran Khan Apu
Posted: 10 December 2025
Bidirectional Translation of ASL and English Using Machine Vision and CNN and Transformer Networks
Stefanie Amiruzzaman, Md. Amiruzzaman, James Dracup, Alexander Pham, Benjamin Crocker, Linh Ngo, and M. Ali Akber Dewan
Posted: 08 December 2025
Adaptive Contextual-Relational Network for Fine-Grained Gastrointestinal Disease Classification
Mingxuan Du, Yutian Zeng
Posted: 08 December 2025
Hierarchical Attention Driven Detection of Small Objects in Remote Sensing Imagery
Xinyu Liu, Xiongwei Sun, Jile Wang
Posted: 08 December 2025
Integrating Frequency-Spatial Features for Energy-Efficient OPGW Target Recognition in UAV-Assisted Mobile Monitoring
Lin Huang, Xubin Ren, Daiming Qu, Lanhua Li, Jing Xu
Posted: 05 December 2025
SIFT-SNN for Traffic-Flow Infrastructure Safety: A Real-Time Context-Aware Anomaly Detection Framework
Munish Rathee, Boris Bačić, Maryam Doborjeh
Posted: 03 December 2025
Linear-Region-Based Contour Tracking for Edge Images
Erick Huitrón-Ramírez, Leonel G. Corona-Ramírez, Diego Jiménez-Badillo
Posted: 02 December 2025
LBA-Net: Lightweight Boundary-Aware Network for Efficient Breast Ultrasound Image Segmentation
Ye Deng, Meng Chen, Jieguang Liu, Qi Cheng, Xiaopeng Xu, Yali Qu
Breast ultrasound segmentation is challenged by strong noise, low contrast, and ambiguous lesion boundaries. Although deep models achieve high accuracy, their heavy computational cost limits deployment on portable ultrasound devices. In contrast, lightweight networks often struggle to preserve fine boundary details. To address this gap, we propose a lightweight boundary-aware network (LBA-Net). A MobileNetV3-based encoder with atrous spatial pyramid pooling is integrated for efficient multi-scale representation learning. The lightweight boundary-aware block adaptively fuses efficient channel attention and depthwise spatial attention to enhance discriminative capability with minimal computational overhead. A boundary-guided dual-head decoding scheme injects explicit boundary priors and enforces boundary consistency to sharpen and stabilize margin delineation. Experiments on curated BUSI* and BUET* datasets demonstrate that the proposed network achieves 82.8% Dice, 38 px HD95, and real-time inference speeds (123 FPS GPU / 19 FPS CPU) using only 1.76M parameters. These results show that the proposed network offers a highly favorable balance between accuracy and efficiency.
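As a rough illustration of the attention block described above, the following PyTorch sketch adaptively fuses an ECA-style channel-attention branch with a depthwise spatial-attention branch via a learned gate. The class name and the sigmoid-gated fusion are assumptions, since the paper's exact block design is not given here.

```python
# Hedged sketch: efficient channel attention (ECA-style) + depthwise spatial
# attention, blended by a learnable scalar gate. Illustrative only.
import torch
import torch.nn as nn

class LightweightBoundaryAwareBlock(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        # ECA: channel weights from a 1D conv over pooled channel statistics
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.eca_conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        # Depthwise spatial attention: per-channel 3x3 conv -> spatial gate
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1,
                                 groups=channels, bias=False)
        # Learnable fusion weight (adaptive balance between the two branches)
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel branch
        w = self.pool(x).view(b, 1, c)
        w = torch.sigmoid(self.eca_conv(w)).view(b, c, 1, 1)
        x_ch = x * w
        # Spatial branch
        s = torch.sigmoid(self.spatial(x))
        x_sp = x * s
        # Adaptive fusion of the two attended feature maps
        a = torch.sigmoid(self.alpha)
        return a * x_ch + (1 - a) * x_sp
```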
Posted: 01 December 2025
UCA-Net: A Transformer-Based U-Shaped Underwater Enhancement Network with Compound Attention Mechanism
Cheng Yu, Jian Zhou, Lin Wang, Guizhen Liu, Zhongjun Ding
Images captured underwater frequently suffer from color casts, blurring, and distortion, which are mainly attributable to the unique optical characteristics of water. Although conventional underwater image enhancement (UIE) methods rooted in physics are available, their effectiveness is often constrained, particularly in challenging aquatic and illumination conditions. More recently, deep learning has become a leading paradigm for UIE, recognized for its superior performance and operational efficiency. This paper proposes UCA-Net, a lightweight CNN-Transformer hybrid network. It incorporates multiple attention mechanisms and utilizes composite attention to effectively enhance textures, reduce blur, and correct color. A novel adaptive sparse self-attention module is introduced to jointly restore global color consistency and fine local details. The model employs a U-shaped encoder-decoder architecture with three-stage up- and down-sampling, facilitating multi-scale feature extraction and global context fusion for high-quality enhancement. Experimental results on multiple public datasets demonstrate UCA-Net’s superior performance, achieved with fewer parameters and lower computational cost. Its effectiveness is further validated by improvements in various downstream image tasks.
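One common way to realize an adaptive sparse self-attention like the module described above is to keep only the strongest keys per query before the softmax. The PyTorch sketch below shows that top-k variant; the function name and the keep_ratio parameter are illustrative assumptions rather than the paper's actual module.

```python
# Hedged sketch of top-k sparse self-attention over flattened image tokens.
# `keep_ratio` controls how many keys each query attends to; assumption only.
import torch
import torch.nn.functional as F

def sparse_self_attention(q, k, v, keep_ratio=0.5):
    # q, k, v: (B, N, D) token embeddings
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (B, N, N)
    # Keep only the strongest keys per query; mask the rest before softmax
    n_keep = max(1, int(scores.shape[-1] * keep_ratio))
    threshold = scores.topk(n_keep, dim=-1).values[..., -1:]  # per-query cutoff
    scores = scores.masked_fill(scores < threshold, float('-inf'))
    return F.softmax(scores, dim=-1) @ v
```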
Posted: 01 December 2025
Machine Vision and Deep Learning for Robotic Harvesting of Shiitake Mushrooms
Thomas Rowland, Mark Hansen, Melvyn Smith, Lyndon Smith
Automation and computer vision are increasingly vital in modern agriculture, yet mushroom harvesting remains largely manual due to complex morphology and occluded growing environments. This study investigates the application of deep learning–based instance segmentation and keypoint detection to enable robotic harvesting of Lentinula edodes (shiitake) mushrooms. A dedicated RGB-D image dataset, the first open-access RGB-D dataset for mushroom harvesting, was created using a Microsoft Azure Kinect DK 3D camera under varied lighting and backgrounds. Two state-of-the-art segmentation models, YOLOv8-seg and Detectron2 Mask R-CNN, were trained and evaluated under identical conditions to compare accuracy, inference speed, and robustness. YOLOv8 achieved higher mean average precision (mAP = 67.9) and significantly faster inference, while Detectron2 offered comparable qualitative performance and greater flexibility for integration into downstream robotic systems. Experiments comparing RGB and RG-D inputs revealed minimal accuracy differences, suggesting that colour cues alone provide sufficient information for reliable segmentation. A proof-of-concept keypoint-detection model demonstrated the feasibility of identifying stem cut-points for robotic manipulation. These findings confirm that deep learning–based vision systems can accurately detect and localise mushrooms in complex environments, forming a foundation for fully automated harvesting. Future work will focus on expanding datasets, incorporating true four-channel RGB-D networks, and integrating perception with robotic actuation for intelligent agricultural automation.
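For readers who want to reproduce the segmentation stage, the ultralytics package exposes YOLOv8-seg through a simple predict API. The sketch below assumes hypothetical weight and image paths and is not the authors' pipeline.

```python
# Minimal YOLOv8-seg inference sketch with the `ultralytics` package.
# 'shiitake-seg.pt' and 'mushroom_bed.jpg' are hypothetical placeholders.
from ultralytics import YOLO

model = YOLO('shiitake-seg.pt')                 # custom-trained seg weights
results = model.predict('mushroom_bed.jpg', conf=0.5)

for r in results:
    if r.masks is not None:
        print(f'{len(r.masks)} mushroom instances segmented')
    # Per-instance boxes are available for downstream grasp planning
    for box in r.boxes:
        print(box.xyxy, float(box.conf))
```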
Posted: 26 November 2025
The Impact of Responsive Design on User Experience
Kylychbek Parpiev
Posted: 26 November 2025
A Lightweight Degradation-Aware Framework for Robust Object Detection in Adverse Weather
Seungun Park, Jiakang Kuai, Hyunsu Kim, Hyunseong Ko, ChanSung Jung, Yunsik Son
Posted: 26 November 2025
Dynamic Contextual Relational Alignment Network for Open-Vocabulary Video Visual Relation Detection
Linyu Lou, Jiarong Mo
Posted: 25 November 2025