REVIEW | doi:10.20944/preprints202007.0506.v1
Subject: Engineering, Automotive Engineering Keywords: neural network; object detection; object classification; Darknet; programming.
Online: 22 July 2020 (09:39:51 CEST)
The article’s goal is to give an overview of the challenges and problems encountered on the way from state-of-the-art CUDA-accelerated neural network code to multi-GPU code. For this purpose, the authors describe the journey of porting the existing, fully featured CUDA-accelerated Darknet engine available on GitHub to OpenCL. The article presents the lessons learned and the techniques that were put in place to make this port happen. There are a few other implementations on GitHub that leverage the OpenCL standard, and a few have tried to port Darknet as well. Darknet is a well-known convolutional neural network (CNN) framework. The authors of this article investigated all aspects of the porting and achieved a fully featured Darknet engine on OpenCL. The effort was focused not only on classification with the YOLO1, YOLO2, and YOLO3 CNN models; it also covered other aspects, such as training neural networks and benchmarking to look for weak points in the implementation. The GPU computing code substantially improves Darknet computing time compared to the standard CPU version by using otherwise underused hardware in existing systems. Because the system is OpenCL-based, it is practically hardware independent. In this article, the authors report comparisons of computation and training performance against the existing CUDA-based Darknet engine on various computers, including single-board computers, and for different CNN use cases. The authors found that the OpenCL version can perform as fast as the CUDA version in the compute aspect, but it is slower in memory transfer between RAM (CPU memory) and VRAM (GPU memory); this depends only on the quality of the OpenCL implementation. Moreover, the looser hardware requirements of the OpenCL Darknet can broaden the applications of DNNs, especially in energy-sensitive applications of Artificial Intelligence (AI) and Machine Learning (ML).
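For readers unfamiliar with the RAM-to-VRAM transfers this review identifies as the bottleneck, the following is a minimal sketch of an OpenCL host-to-device round trip. It assumes pyopencl and a working OpenCL runtime and is only an illustration of the copies being measured, not part of the reviewed Darknet port.

```python
# Minimal sketch (assumes pyopencl and an OpenCL runtime are installed);
# it only illustrates the RAM <-> VRAM copies whose cost the review discusses.
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()                 # pick any available OpenCL device
queue = cl.CommandQueue(ctx)

host = np.random.rand(1 << 22).astype(np.float32)   # ~16 MB of weights/activations
dev = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=host.nbytes)

t0 = time.perf_counter()
cl.enqueue_copy(queue, dev, host)              # RAM (CPU memory) -> VRAM (GPU memory)
out = np.empty_like(host)
cl.enqueue_copy(queue, out, dev)               # VRAM -> RAM
queue.finish()
print(f"round trip: {(time.perf_counter() - t0) * 1e3:.2f} ms")
```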
ARTICLE | doi:10.20944/preprints202203.0172.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: object detection; larger-scale dataset; stacked carton
Online: 11 March 2022 (15:48:23 CET)
Carton detection is an important technique in automatic logistics systems and can be applied to many tasks, such as the stacking and unstacking of cartons and the unloading of cartons from containers. However, no public large-scale carton dataset has so far been available for the research community to train and evaluate carton detection models, which hinders the development of carton detection. In this paper, we present a large-scale carton dataset named Stacked Carton Dataset (SCD) with the goal of advancing the state of the art in carton detection. Images are collected from the Internet and several warehouses, and objects are labeled using per-instance segmentation for precise localization. There are a total of 250,000 instance masks from 16,136 images. Naturally, a suite of benchmarks is established with several popular detectors. In addition, we design a carton detector based on RetinaNet by embedding our proposed Offset Prediction between Classification and Localization module (OPCL) and Boundary Guided Supervision module (BGS). OPCL alleviates the imbalance between classification and localization quality, boosting AP by 3.1%∼4.7% on SCD at the model level, while BGS guides the detector to pay more attention to the boundary information of cartons and to decouple repeated carton textures at the task level. To demonstrate the generalization of OPCL to other datasets, we conduct extensive experiments on MS COCO and PASCAL VOC. The improvements in AP on MS COCO and PASCAL VOC are 1.8%∼2.2% and 3.4%∼4.3%, respectively. The source dataset is available here.
REVIEW | doi:10.20944/preprints202310.0870.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image inpainting; object removal; detection; forensic; forgery
Online: 13 October 2023 (08:25:14 CEST)
In recent years, significant advancements in the field of machine learning have influenced the domain of image restoration. While these technological advancements present prospects for improving image quality, they also present difficulties, particularly the proliferation of manipulated or counterfeit multimedia content on the internet. The objective of this paper is to provide a comprehensive review of existing inpainting algorithms and forgery detection methods, with a specific emphasis on techniques designed for removing objects from digital images. In this study, we examine various techniques encompassing conventional texture synthesis methods as well as those based on neural networks. Furthermore, we explore the identification of modified photos, present the artifacts frequently introduced by the inpainting procedure, and assess the state-of-the-art technology for detecting such modifications. Lastly, we look at the available datasets and how the methods compare with each other. Having covered all of the above, the final outcome of this study is a comprehensive perspective on the abilities and constraints of detecting images to which an inpainting-based object removal method has been applied.
ARTICLE | doi:10.20944/preprints202009.0088.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: YOLOv2; transfer learning; pig farming; object detection
Online: 4 September 2020 (07:59:03 CEST)
Generic object detection is one of the most important and flourishing branches of computer vision and has real-life applications in our day-to-day lives. With the exponential development of deep learning-based techniques for object detection, performance has improved considerably over the last two decades. However, due to the data-hungry nature of deep models, they do not perform well on tasks for which only a very limited labeled dataset is available. To handle this problem, we propose a transfer learning-based deep learning approach for detecting multiple pigs in an indoor farm setting. The approach is based on YOLO-v2, and the initial parameters are used as the optimal starting values for training the network. Compared to the original YOLO-v2, we transformed the detector to detect only one class of objects, i.e., pigs against the background. For training the network, the farm-specific data are annotated with bounding boxes enclosing the pigs in the top view. Experiments are performed on different pen configurations in the farm, and convincing results have been achieved while using only a few hundred annotated frames for fine-tuning the network.
ARTICLE | doi:10.20944/preprints202306.0899.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: lightweight salient object detection; salient object detection; object detection; lightweight neural network; color opponent; cone-opponent; double-opponent; vision sensing
Online: 13 June 2023 (08:33:19 CEST)
Computer vision models of salient object detection attempt to mimic the ability of the human visual system to select relevant objects in images. To this end, the development of deep neural networks on high-end computers has recently made it possible to achieve high performance. However, it remains a challenge to develop deep neural network models with the same performance for devices with much more limited resources. In this work, we propose a new approach to a lightweight salient object detection neural network model, inspired by the cone and spatial opponent processes of the primary visual cortex (V1), which inextricably link color and shape in human color perception. Our proposed model, namely CoSOV1Net, is trained from scratch, without using backbones from image classification or other tasks. Experiments on the most widely used and challenging datasets for salient object detection show that CoSOV1Net achieves competitive performance (i.e., Fβ=0.931 on the ECSSD dataset) with state-of-the-art salient object detection models, while having a low number of parameters (1.14M), low FLOPs (1.4G), and high FPS (211.2) on a GPU (NVIDIA GeForce RTX 3090 Ti) compared to the state of the art in the salient object detection and lightweight salient object detection tasks. Thus, CoSOV1Net turns out to be a lightweight salient object detection model that can be adapted to mobile environments and resource-constrained devices.
COMMUNICATION | doi:10.20944/preprints202105.0679.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: YOLOv5; object detection; mold; food spoilage; deep learning.
Online: 27 May 2021 (14:14:20 CEST)
The study aimed to identify different molds that grow on various food surfaces. To this end, we conducted a case study on the detection of mold on food surfaces based on the “you only look once (YOLO) v5” principle. In this context, a dataset of 2050 food images with mold growing on their surfaces was created. The dataset was trained using the pre-trained YOLOv5 algorithm. In comparison to YOLOv3 and YOLOv4, the YOLOv5 model had better precision, recall, and average precision (AP), which were 98.10%, 100%, and 99.60%, respectively. The YOLOv5 algorithm was used for the first time in this study to detect mold on food surfaces. In conclusion, the proposed model successfully recognizes any kind of mold present on the food surface. Using YOLOv5, we are currently conducting research to identify the specific species of the detected mold.
ARTICLE | doi:10.20944/preprints202207.0070.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: scene recognition; object detection; scene classification; TF-IDF
Online: 5 July 2022 (08:38:17 CEST)
Indoor scene recognition and semantic information can be helpful for social robots. Recently, in the field of indoor scene recognition, researchers have incorporated object-level information and shown improved performance. In line with these advances, this paper demonstrates that scene recognition can be performed using object-level information alone. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. These predicted objects were then used as features to predict room categories. This paper successfully combines an approach conventionally used in computer vision (YOLO) with Term Frequency-Inverse Document Frequency (TF-IDF). These approaches could be further helpful in the fields of embodied research and dynamic scene classification, which we elaborate on.
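The general recipe described here, treating the labels of detected objects in an image as a "document" and classifying the room from TF-IDF features, can be sketched as follows. The object lists, room labels, and the logistic-regression classifier are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: detected-object labels per scene image form a TF-IDF "document";
# the toy data and the classifier choice are assumptions for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One "document" per scene image: the labels emitted by the object detector (e.g. YOLO).
train_docs = ["bed lamp pillow wardrobe", "oven sink refrigerator kettle", "sofa tv lamp bookshelf"]
train_rooms = ["bedroom", "kitchen", "living_room"]

scene_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scene_clf.fit(train_docs, train_rooms)

print(scene_clf.predict(["sink microwave refrigerator"]))  # -> ['kitchen']
```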
ARTICLE | doi:10.20944/preprints202306.1084.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: object detection; DETR; FPN; transformer; attention mechanism
Online: 15 June 2023 (08:32:11 CEST)
In practical applications, the detection of objects of various sizes is a common requirement for most detectors. The feature pyramid network (FPN) is widely adopted as a framework to address this challenge. The field is witnessing an increasing number of transformer-based detectors due to the widespread adoption of transformer technology. This paper first examines the design flaws in FPN and transformer-based detectors, followed by the introduction of a new transformer-based approach called Texturized Instance Guidance (TIG-DETR) to address these issues. Specifically, TIG-DETR comprises a backbone network, a new pyramidal structure known as Texture-Enhanced FPN (TE-FPN), and an enhanced DETR detector. TE-FPN is composed of three components: a bottom-up pathway for enhancing texture information in the feature map, a lightweight attention module to address confounding effects resulting from cross-scale fusion, and a standard attention module to enhance the final output features. The improved DETR detector utilizes Shifted Window based Self-Attention to replace the multi-headed self-attention module in DETR, thereby accelerating model convergence. Moreover, it incorporates an Instance Based Advanced Guidance Module to enhance instance perception in the image by employing a pre-local self-attentive mechanism for recognizing larger instances. By employing TE-FPN instead of FPN in Faster R-CNN with ResNet-50 as the backbone network, we achieve a 1.9% improvement in average accuracy. TIG-DETR achieves an average accuracy of 44.1 with ResNet-50 as the backbone network.
ARTICLE | doi:10.20944/preprints201902.0105.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: fusion; point clouds; images; object detection
Online: 12 February 2019 (16:53:19 CET)
This paper tackles the task of fusing features from images and their corresponding point clouds for 3D object detection in autonomous driving scenarios, based on AVOD, an Aggregate View Object Detection network. The proposed fusion algorithms fuse features extracted from Bird’s Eye View (BEV) LIDAR point clouds and their corresponding RGB images. Unlike existing fusion methods, which simply adopt a concatenation, element-wise sum, or element-wise mean module, our proposed fusion algorithms enhance the interaction between the BEV feature maps and their corresponding image feature maps through a novel structure, where one variant uses single-level feature maps and another utilizes multilevel feature maps. Experiments show that our proposed fusion algorithm produces better results in 3D mAP and AHS with less speed loss compared to the existing fusion methods used on the KITTI 3D object detection benchmark.
ARTICLE | doi:10.20944/preprints202112.0511.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: real sea surface; object detection; performance detection
Online: 31 December 2021 (11:16:15 CET)
Video images captured at long range usually contain low-contrast floating objects of interest on the sea surface. A comparative experimental study of the statistical characteristics of reflections from floating objects and from the agitated sea surface showed differences in the correlation and spectral characteristics of these reflections. The functioning of the recently proposed modified matched subspace detector (MMSD) is based on separating the observed data spectrum into two subspaces: relatively low and relatively high frequencies. In the literature, the MMSD performance has been evaluated only in general terms and, moreover, using only a sea model (additive Gaussian background clutter). This paper extends the performance evaluation methodology to low-contrast object detection using only a real sea dataset. The methodology considers an object low-contrast if the mean and variance of the object and the surrounding background are the same, while assuming that the energy spectra of the object and the sea are different. The paper investigates a scenario in which an artificially created model of a floating object with specified statistical parameters is placed on the surface of a real sea image. The paper compares the efficiency of the classical Matched Subspace Detector (MSD) and the MMSD for detecting low-contrast objects on the sea surface, and analyzes the dependence of the detection probability, at a fixed false alarm probability, on the difference between the statistical means and variances of a floating object and the surrounding sea.
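The low-contrast object definition used here can be illustrated with a small synthetic sketch: a patch whose mean and variance match the surrounding clutter but whose energy spectrum differs. All sizes and statistics below are assumptions, not the paper's parameters.

```python
# Illustrative sketch only (sizes and statistics are assumptions): a patch whose mean
# and variance match the surrounding "sea" but whose spectral content differs.
import numpy as np

rng = np.random.default_rng(0)
sea = rng.normal(loc=0.5, scale=0.1, size=(128, 128))          # background clutter patch

obj = rng.normal(size=(16, 16))
obj = np.cumsum(np.cumsum(obj, axis=0), axis=1)                # shift energy toward low frequencies
obj = (obj - obj.mean()) / obj.std() * sea.std() + sea.mean()  # match mean and variance

scene = sea.copy()
scene[56:72, 56:72] = obj                                      # embed the "floating object"
print(abs(scene[56:72, 56:72].mean() - sea.mean()) < 1e-6)     # same first-order statistics
```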
ARTICLE | doi:10.20944/preprints202310.1729.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: 3D object detection; point cloud; voxel; deep learning
Online: 27 October 2023 (03:52:14 CEST)
In the realm of autonomous vehicle environment perception, the primary objective of point cloud target detection is to swiftly and accurately identify three-dimensional objects from point cloud data. To meet this requirement, the prevalent network architecture employed in the industry is the voxel-based PointPillars model. Nonetheless, this model faces challenges in maintaining detection accuracy when objects are obscured or diminutive in size. In response to this issue, we introduce AGPNet, a novel model that seamlessly integrates four key modules: Data Augmentation, Dynamic Graph CNN, Pillar Feature Net, and Detection Head (SSD). The Data Augmentation module enhances the adaptability of point cloud data to complex and ever-changing real-world environments. The Dynamic Graph CNN module endows the network structure with geometric features, which encapsulate not only the point itself but also its adjacent points. The Pillar Feature Net module translates three-dimensional point cloud data into pseudo-image data through the utilization of voxels. Subsequently, the Detection Head (SSD) module leverages this pseudo-image data to conduct target detection of three-dimensional objects. Our experiments, conducted on the KITTI dataset, demonstrate that our proposed method boosts object detection accuracy by 6-7 percentage points compared to the PointPillars model, while maintaining similar detection times.
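The Pillar Feature Net step mentioned above starts by assigning each point to a BEV pillar before building the pseudo-image. A hedged sketch of that assignment is shown below; the ranges, resolution, and feature layout are common KITTI-style assumptions, not AGPNet's exact configuration.

```python
# Sketch of the pillarization step behind a Pillar Feature Net (ranges and resolution
# are illustrative KITTI-style assumptions).
import numpy as np

def pillarize(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68), res=0.16):
    """points: (N, 4) array of x, y, z, reflectance -> dict mapping pillar index to its points."""
    keep = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
        (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    pts = points[keep]
    ix = ((pts[:, 0] - x_range[0]) / res).astype(np.int32)   # BEV grid column
    iy = ((pts[:, 1] - y_range[0]) / res).astype(np.int32)   # BEV grid row
    pillars = {}
    for p, i, j in zip(pts, ix, iy):
        pillars.setdefault((int(i), int(j)), []).append(p)
    return pillars

cloud = np.random.rand(1000, 4) * [60.0, 70.0, 3.0, 1.0] + [0.0, -35.0, -1.5, 0.0]
print(len(pillarize(cloud)), "non-empty pillars")
```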
ARTICLE | doi:10.20944/preprints202306.0287.v1
Subject: Engineering, Automotive Engineering Keywords: autonomous driving; object detection; Position Adaptive Convolution; FANet
Online: 5 June 2023 (09:15:40 CEST)
3D object detection is essential for an accurate and reliable autonomous driving system. Currently, the methods used by state-of-the-art two-stage detectors are not flexible enough, and their feature extraction capabilities are too limited to cope effectively with the disorder and irregularity of point clouds. In this paper, we combine the advantages of PV-RCNN and PAConv (Position Adaptive Convolution) to create a completely new network, FANet, in order to overcome the irregularity and disorder of point clouds. The convolution in our network builds convolutional kernels from a basic weight matrix, whose combination coefficients are learned adaptively by LearnNet from the relative points. This network allows for flexible modeling of complex spatial variations and geometric structures in the 3D point cloud, enabling better extraction of point cloud features and producing high-quality 3D proposal boxes. Compared to other methods, FANet is superior in terms of 3D object detection accuracy. Extensive experiments on the KITTI dataset have shown a significant improvement from our approach.
ARTICLE | doi:10.20944/preprints201904.0244.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: salient object; local binary pattern; histogram features; conditional random field
Online: 22 April 2019 (11:40:11 CEST)
We propose a novel method for salient object detection in different images. Our method integrates spatial features for efficient and robust representation to capture meaningful information about the salient objects. We then train a conditional random field (CRF) using the integrated features. The trained CRF model is then used to detect salient objects during the online testing stage. We perform experiments on two standard datasets and compare the performance of our method with different reference methods. Our experiments show that our method outperforms the compared methods in terms of precision, recall, and F-Measure.
ARTICLE | doi:10.20944/preprints202206.0384.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep Learning; Smartphone Image; Acne Grading; Acne Object Detection
Online: 28 June 2022 (10:05:25 CEST)
Skin image analysis using artificial intelligence (AI) has recently attracted significant research interest, particularly for analyzing skin images captured by mobile devices. Acne is one of the most common skin conditions, with profound effects in severe cases. In this study, we developed an AI system called AcneDet for automatic acne object detection and acne severity grading using facial images captured by smartphones. AcneDet includes two models for conducting two tasks: (1) a Faster R-CNN-based deep learning model for the detection of acne lesion objects of four types, including blackheads/whiteheads, papules/pustules, nodules/cysts, and acne scars; and (2) a LightGBM machine learning model for grading acne severity using the Investigator’s Global Assessment (IGA) scale. The output of the Faster R-CNN model, i.e., the counts of each acne type, was used as input to the LightGBM model for acne severity grading. A dataset consisting of 1,572 labeled facial images captured by both iOS and Android smartphones was used for training. The results show that the Faster R-CNN model achieves a mAP of 0.54 for acne object detection. The mean accuracy of acne severity grading by the LightGBM model is 0.85. With this study, we hope to contribute to the development of artificial intelligence systems that can help acne patients understand more about their conditions and support doctors in acne diagnosis.
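The second stage, feeding per-type lesion counts into a LightGBM classifier, can be sketched as follows. The toy counts, grades, and hyper-parameters are assumptions for illustration, not AcneDet's trained model.

```python
# Hedged sketch of the second stage (per-type lesion counts -> severity grade);
# the toy data and hyper-parameters are assumptions.
import numpy as np
import lightgbm as lgb

# Columns: blackheads/whiteheads, papules/pustules, nodules/cysts, acne scars (counts per image).
X = np.array([[2, 0, 0, 0], [8, 3, 0, 1], [15, 9, 2, 3], [30, 18, 6, 5],
              [1, 1, 0, 0], [10, 5, 1, 2], [20, 12, 4, 4], [3, 0, 0, 1]])
y = np.array([0, 1, 2, 3, 0, 1, 2, 0])        # IGA-style severity grades

grader = lgb.LGBMClassifier(n_estimators=50, min_child_samples=1)
grader.fit(X, y)
print(grader.predict([[12, 7, 1, 2]]))         # predicted grade for new counts
```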
ARTICLE | doi:10.20944/preprints202309.0499.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: thermal; object detection; conditioning; weather-aware
Online: 7 September 2023 (09:29:00 CEST)
Deployments of real-world object-detection systems often experience a degradation in performance over time due to concept drift. Systems that leverage thermal cameras are especially susceptible because the respective thermal signatures of objects and their surroundings are highly sensitive to environmental changes. In this study, a conditioning method is investigated. The method aims to guide the training loop of thermal object detection systems by leveraging an auxiliary branch to predict the weather while directly or indirectly conditioning the baseline detection system. Leveraging such an approach to train detection networks does not necessarily improve the performance of native architectures; however, it can be observed that conditioned networks manage to extract a signal from thermal images that guides the network to detect objects that baseline models miss. As the extracted signal appears to be quite noisy and very challenging to regress accurately, further work is needed to identify an ideal optimization vector.
ARTICLE | doi:10.20944/preprints202307.2106.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: dense road; object detection; Darknet-53 network; transfer learning
Online: 31 July 2023 (10:40:08 CEST)
Owing to object overlap and undertraining on few samples, dense road object detection suffers from poor object identification performance and an inability to recognize edge objects. To address this, a transfer learning-based YOLOv3 approach for identifying dense objects on the road is proposed. Firstly, the Darknet-53 network structure is adopted to obtain a pre-trained YOLOv3 model, and transfer training is then introduced as the output layer for a special dataset of 2000 images containing vehicles. In the proposed model, a random function is adopted to initialize and optimize the weights of the transfer training model, which is designed separately from the pre-trained YOLOv3, and the object detection classifier replaces the fully connected layer, which further improves the detection effect; the reduced size of the network model further reduces training and detection time and can be better applied to actual scenarios. The experimental results demonstrate that the object detection accuracy of the presented approach is 87.75% on the Pascal VOC 2007 dataset, which is superior to the traditional YOLOv3 and the traditional YOLOv2 by 3.05% and 11.15%, respectively. In addition, tests were carried out on UA-DETRAC, a public road vehicle detection dataset: the object detection accuracy of the presented approach reaches 79.23% on images, which is 4.13% better than the traditional YOLOv3 and 1.36% better than the relatively new object detection algorithm YOLOv5. Moreover, the detection speed of the proposed YOLOv3 method reaches 31.2 FPS on images, which is 7.6 FPS faster than the traditional YOLOv3 and 4.3 FPS faster than YOLOv5; the proposed YOLOv3 performs 79.38 billion floating-point operations per second on video, which clearly surpasses the traditional YOLOv3 and the newer YOLOv5.
ARTICLE | doi:10.20944/preprints202310.1823.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: 3D object detection; distance features; SA layer enhancement
Online: 30 October 2023 (06:34:05 CET)
With increasing demand from unmanned driving and robotics, more attention has been paid to accurate point cloud-based 3D object detection technology. However, due to the sparseness and irregularity of point clouds, the most critical problem is how to utilize the relevant features more efficiently. In this paper, we propose a point-based object detection enhancement network to improve detection accuracy in 3D scene understanding based on distance features. Firstly, the distance features are extracted from the raw point sets and fused with the raw reflectivity features of the point cloud to maximize the use of the information in the point cloud. Secondly, we enhance the distance features and raw features, which we collectively refer to as the self-features of the key points, in the Set Abstraction (SA) layers with a self-attention mechanism, so that the foreground points can be better distinguished from the background points. Finally, we revise the group aggregation module in the SA layers to enhance the feature aggregation effect of the key points. We conducted experiments on the KITTI and nuScenes datasets, and the results show that the enhancement method proposed in this paper has excellent performance.
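A minimal sketch of the distance-feature idea is shown below: each point's Euclidean distance from the sensor is appended to its raw reflectivity feature. The exact formulation used in the paper is assumed, not reproduced.

```python
# Minimal sketch (an assumption of the exact formulation): append each point's distance
# from the sensor origin to its raw features before the Set Abstraction layers.
import numpy as np

def add_distance_feature(points):
    """points: (N, 4) x, y, z, reflectivity -> (N, 5) with the distance appended."""
    dist = np.linalg.norm(points[:, :3], axis=1, keepdims=True)
    return np.concatenate([points, dist], axis=1)

cloud = np.random.randn(2048, 4).astype(np.float32)
print(add_distance_feature(cloud).shape)   # (2048, 5)
```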
ARTICLE | doi:10.20944/preprints202307.1200.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: plant phenotype; soybean leaf; image segmentation; object detection
Online: 18 July 2023 (09:14:41 CEST)
Plant phenotype plays an important role in crop breeding and planting, and leaf phenotype is an important part of plant phenotype. In order to analyze the leaf phenotype, the target leaf must be segmented from a complex background image. In this paper, an automatic soybean leaf segmentation method based on object detection and interactive segmentation models is proposed. Firstly, the Libra R-CNN object detection algorithm is used to detect all soybean leaves in the image. Then, based on the idea that the target soybean leaf is located in the center of the image and has a large area, the detection bounding box of the target leaf is selected. In order not to compromise the segmentation result, the bounding box is optimized to completely enclose the whole leaf. Finally, according to the optimized bounding box, the prior channels of foreground and background are constructed using a Gaussian model. The two channels, together with the original image, serve as the input to the interactive object segmentation with inside-outside guidance model to segment the target soybean leaf. A large number of qualitative and quantitative experimental results show that the method has high segmentation accuracy and strong generalization capacity.
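The "central and large" target-leaf selection rule and the subsequent box enlargement can be sketched as below; the scoring weights and the enlargement margin are illustrative assumptions.

```python
# Hedged sketch of the target-leaf selection rule; scoring and margin are assumptions.
import numpy as np

def select_target_box(boxes, image_size, margin=0.05):
    """boxes: (N, 4) x1, y1, x2, y2; image_size: (W, H). Return the enlarged target box."""
    W, H = image_size
    cx, cy = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    center_dist = np.hypot(cx - W / 2, cy - H / 2) / np.hypot(W / 2, H / 2)
    score = area / area.max() - center_dist          # large area, close to the image center
    x1, y1, x2, y2 = boxes[int(score.argmax())]
    dx, dy = margin * (x2 - x1), margin * (y2 - y1)  # enlarge so the whole leaf is enclosed
    return [max(0, x1 - dx), max(0, y1 - dy), min(W, x2 + dx), min(H, y2 + dy)]

boxes = np.array([[100, 120, 300, 380], [10, 10, 80, 90], [350, 300, 620, 470]], float)
print(select_target_box(boxes, (640, 480)))
```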
ARTICLE | doi:10.20944/preprints202108.0509.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: synthetic aperture radar; deep learning; data augmentation; object detection; ship detection
Online: 26 August 2021 (12:00:22 CEST)
Maritime ship monitoring plays an important role in maritime transportation, and fast and accurate detection of maritime ships is key to such monitoring. The main sources of marine ship images are optical images and synthetic aperture radar (SAR) images. Unlike natural images, SAR images are independent of daylight and weather conditions. Traditional ship detection methods for SAR images mainly depend on the statistical distribution of sea clutter, which leads to poor robustness. As a deep learning detector, RetinaNet can break through this obstacle, and the problem of imbalance at the feature and objective levels can be further addressed by combining it with the Libra R-CNN algorithm. In this paper, we modify the feature fusion part of Libra RetinaNet by adding a bottom-up path augmentation structure to better preserve low-level feature information, and we expand the dataset through style transfer. We evaluate our method on the publicly available SAR ship detection dataset with complex backgrounds. The experimental results show that the improved Libra RetinaNet can effectively detect multi-scale ships with the expanded dataset, achieving an average accuracy of 97.38%.
ARTICLE | doi:10.20944/preprints202305.1592.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: forest fire detection; attention mechanism; staged object detection; deep learning
Online: 23 May 2023 (07:20:56 CEST)
Forest fires are one of the world's deadliest natural disasters, and early detection of forest fires can help minimize the damage to ecosystems and forest life. In this paper, we propose an improved fire detection method, YOLOv5-IFFDM, based on YOLOv5. Firstly, fire and smoke detection accuracy and the network's perception of small targets are improved by adding an attention mechanism to the backbone network. Secondly, the loss function is improved and a SoftPool pyramid pooling structure is used to improve the regression accuracy, detection effect, and robustness of the model. In addition, a random Mosaic augmentation technique is used to enhance the data and increase the generalization ability of the model, and the flame and smoke detection prior (anchor) boxes are re-clustered to improve accuracy and speed. Finally, the parameters of the convolutional and normalization layers of the trained model are merged to further reduce model complexity and improve detection speed. Experimental results on homemade forest fire and smoke datasets show that this algorithm has high detection accuracy and fast detection speed, with average accuracy of up to 90.5% for fire and 84.3% for smoke, and a detection speed of up to 75 FPS (frames per second), which can meet the requirements of real-time and efficient fire detection.
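The merging of convolutional and normalization layer parameters mentioned at the end is the standard inference-time conv-BN fusion; a sketch is given below. The exact details of YOLOv5-IFFDM's merge are an assumption.

```python
# Sketch of standard conv-BN fusion at inference time (assumed to correspond to the
# "merged convolutional and normalization layers" step described above).
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

conv, bn = nn.Conv2d(3, 8, 3, padding=1, bias=False), nn.BatchNorm2d(8)
bn.eval()                                           # fusion uses the running statistics
x = torch.randn(1, 3, 32, 32)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True
```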
ARTICLE | doi:10.20944/preprints202312.0314.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: YOLO; Object tracking and Counting; Pink Bollworm; ROS; UGV; Agricultural Robot
Online: 6 December 2023 (08:01:56 CET)
The agricultural industry places a high priority on crop protection, especially when it comes to protecting cotton crops against pink bollworm infestations. At the same time, rapid disease identification is essential for effective crop management. To overcome these difficulties, this study offers a thorough process for deploying an autonomous robot using the Robot Operating System (ROS). The robot is intended to identify pink bollworm infestations, in addition to counting and tracking diseases that impact cotton harvests. Drones are used to check the condition of cotton fields, and the autonomous robot's path planning component is closely tied to this process. Large fields are photographed by these drones in multispectral mode, and the robot's course planning is performed by calculating the Normalized Difference Vegetation Index (NDVI) from these images. This approach ensures targeted surveillance and intervention, optimizing the use of resources. A customized dataset was created especially for this application to improve the robot's detection abilities. The dataset was used to train a YOLOv8 model, a state-of-the-art object detection architecture. The performance of the trained model is impressive: it has a mean Average Precision (mAP) of 67.1%, a Precision of 67.9%, and a Recall of 61.8%. These metrics highlight how well the model precisely locates and measures occurrences of interest in the cotton fields. By seamlessly integrating ROS, drones, NDVI calculations, and a robust detection model, this research contributes a comprehensive solution for autonomous crop monitoring and protection, addressing the particular challenges posed by pink bollworm infestations and crop diseases in cotton cultivation.
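The NDVI computation underlying the course planning is standard; a hedged sketch follows. The band handling and the threshold for flagging stressed vegetation are assumptions, and reading the multispectral imagery is out of scope here.

```python
# Hedged sketch of the NDVI step used for course planning; the threshold is an assumption.
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + eps)     # NDVI in [-1, 1]; higher = healthier vegetation

nir_band = np.random.randint(0, 255, (512, 512), dtype=np.uint8)
red_band = np.random.randint(0, 255, (512, 512), dtype=np.uint8)
stressed = ndvi(nir_band, red_band) < 0.3      # candidate areas for the robot to inspect
print(stressed.mean())
```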
ARTICLE | doi:10.20944/preprints202306.1543.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: power line inspection; object detection; small targets; attention mechanisms; Loss function
Online: 21 June 2023 (11:46:42 CEST)
The images captured by UAVs during inspection often contain numerous small targets related to transmission lines, which are critical and vulnerable elements for ensuring the safe operation of these lines. However, due to various factors such as the small size of the targets, low resolution, complex backgrounds, and potential line aggregation, achieving accurate and real-time detection is challenging. To address these issues, this paper proposes a detection algorithm called P2-ECA-EIOU-YOLOv5 (P2E-YOLOv5). Firstly, to address the challenges posed by the complex background and environmental interference affecting small targets, an ECA attention module is integrated into the network. This module effectively enhances the network's focus on small targets while mitigating the influence of environmental interference. Secondly, considering the small size and low resolution of the targets, a new high-resolution detection head is introduced, which is more sensitive to small targets. Lastly, the network utilizes the EIOU loss as the regression loss function to improve the positioning accuracy of small targets, as they tend to aggregate. Experimental results demonstrate that the proposed P2E-YOLOv5 detection algorithm achieves a precision (P) of 96.0% and a mean average precision (mAP) of 97.0% for small target detection in transmission lines.
ARTICLE | doi:10.20944/preprints202306.0281.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Infrared dim small targets; Object detection; Adaptive Fusion Attention Module; ISVD
Online: 5 June 2023 (08:47:09 CEST)
Infrared detection plays an important role in the military, aerospace, and other fields, with the advantages of all-weather operation, high stealth, and strong anti-interference capability. However, infrared dim small target detection suffers from complex backgrounds, low signal-to-noise ratio, blurred targets with small area percentages, and other challenges. In this paper, we propose a multiscale YOLOv5-AFAM algorithm to realize high-accuracy and real-time detection. Aiming at the problem of intra-class feature difference and inter-class feature similarity among targets, the Adaptive Fusion Attention Module (AFAM) is proposed to generate feature maps that weight the features in the network and make the network focus on small targets. A multiscale fusion structure is proposed to solve the problem of small and variable detection scales of infrared vehicle targets. In addition, the downsampling layer is improved by combining Maxpool and convolutional downsampling to reduce the number of model parameters and retain texture information. For multiple scenarios, we constructed an infrared dim and small vehicle target detection dataset, ISVD. The multiscale YOLOv5-AFAM was evaluated on the ISVD dataset: compared to YOLOv7, it achieves a small improvement in mAP@0.5 while using only 17.98% of the parameters; compared with the YOLOv5s model, mAP@0.5 is improved by 4.3% with a 6.6% reduction in parameters. Experimental results demonstrate that the multiscale YOLOv5-AFAM has higher detection accuracy and detection speed on infrared dim and small vehicles.
ARTICLE | doi:10.20944/preprints202209.0060.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Autonomous Driving; Deep Learning; LIDAR Data; Wavelets; 3D Object Detection
Online: 5 September 2022 (13:03:00 CEST)
3D object detection is crucial for autonomous driving to understand the driving environment. Since the pooling operation causes information loss in the standard CNN, we have designed a wavelet multiresolution analysis-based 3D object detection network without a pooling operation. Additionally, instead of using a single filter like the standard convolution, we use the lower-frequency and higher-frequency coefficients as filters. These filters capture more relevant parts than a single filter, enlarging the receptive field. The model comprises a discrete wavelet transform (DWT) and an inverse wavelet transform (IWT) with skip connections to encourage feature reuse for the contracting and expanding layers. The IWT enriches the feature representation by fully recovering the details lost during the downsampling operation. Element-wise summation is used for the skip connections to decrease the computational burden. We train the model for the Haar and Daubechies (Db4) wavelets. The two-level wavelet decomposition result shows that we can build a lightweight model without losing significant performance. The experimental results on the KITTI BEV and 3D evaluation benchmarks show that our model outperforms the PointPillars base model by up to 14% while reducing the number of trainable parameters. Code will be released.
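The two-level DWT/IWT pair the network is built around can be illustrated with PyWavelets; this is purely an illustration of the transform's perfect-reconstruction property, not the model's layer implementation.

```python
# Sketch of a two-level DWT / inverse-DWT pair (Haar), illustrating that the IWT fully
# recovers the details lost during downsampling; PyWavelets is used only for illustration.
import numpy as np
import pywt

feature_map = np.random.rand(64, 64).astype(np.float32)

coeffs = pywt.wavedec2(feature_map, 'haar', level=2)   # two-level decomposition
rec = pywt.waverec2(coeffs, 'haar')                    # inverse transform

print(np.allclose(feature_map, rec, atol=1e-5))        # perfect reconstruction
print([c.shape if hasattr(c, 'shape') else tuple(d.shape for d in c) for c in coeffs])
```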
ARTICLE | doi:10.20944/preprints202306.1069.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: wheel surface defect detection; deep learning; YOLO; object detection; machine vision
Online: 15 June 2023 (07:20:42 CEST)
Surface defect detection is a crucial step in the process of automotive wheel production. However, the task poses challenges due to complex backgrounds and a wide range of defect types. In order to detect defects on the wheel surface accurately and quickly, this paper proposes a YOLOv5-based algorithm for automotive wheel surface defect detection. The algorithm trains and tests the YOLOv5s model using a self-created automotive wheel surface defect dataset, which contains four kinds of defects: linear, dotted, sludge, and pinhole. Extensive experimental results demonstrate that the deep learning network trained by our method achieves an average accuracy of 71.7% at 57.14 FPS. Our findings show that this detection algorithm performs better than other common object detection algorithms and meets the real-time requirements of industrial applications.
ARTICLE | doi:10.20944/preprints202309.2024.v2
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: target detection; side-scan sonar images; seabed object; frequency domain
Online: 2 November 2023 (07:13:35 CET)
Side-scan sonar (SSS) detection is a key method in applications such as underwater environmental security and subsea resource development. The use of acoustic images for seabed target detection has gradually become a mainstream underwater detection method. However, many existing detection approaches primarily concentrate on tracking the evolution path of optical image object detection tasks, resulting in complex structures and limited versatility. To tackle this issue, we introduce a pioneering Dual-Domain Multi-Frequency Network (D2MFNet) meticulously crafted to harness the distinct characteristics of SSS image detection. In D2MFNet, aiming at the underwater detection requirements of small scenes, we introduce a novel method to optimize and improve the detection sensitivity across different frequency ranges and propose a Multi-Frequency Combined Attention Mechanism (MFCAM). This mechanism amplifies the relevance of dual-domain features across different channels and spatial locations. Moreover, recognizing that SSS images can provide richer insights after frequency domain conversion, we introduce a Dual-Domain Feature Pyramid Network (D2FPN). By incorporating a frequency domain information representation, D2FPN significantly augments the depth and breadth of feature information in small underwater datasets. Our methods are designed for seamless integration into existing networks, offering plug-and-play functionality with substantial performance enhancements. We have conducted extensive experiments to validate the efficacy of our proposed techniques, and the results showcase their state-of-the-art performance. MFCAM improves the mAP by 16.9% on the KLSG dataset and 15.5% on the SCTD dataset. The mAP of D2FPN is improved by 8.4% on the KLSG dataset and by 9.8% on the SCTD dataset. We will make our code and models publicly available at https://dagshub.com/estrellaww00/D2MFNet.
ARTICLE | doi:10.20944/preprints202307.1534.v1
Subject: Engineering, Automotive Engineering Keywords: Belt conveyor; foreign object detection; YOLOX; image enhancement; rotation detection
Online: 21 July 2023 (13:52:38 CEST)
As one of the main pieces of coal transportation equipment, the belt conveyor with a detection system is an important direction for the development of intelligent mines. Non-coal foreign objects coming into contact with belts are a common phenomenon in complex production environments and under improper human operation. In order to avoid major safety accidents caused by scratching, deviation, and breakage of the belt, a foreign object detection method for belt conveyors is proposed in this work. Firstly, a foreign object image dataset is collected and established, and an IAT image enhancement module and a CBAM attention mechanism are introduced to enhance the image data samples. Moreover, to predict the angle information of foreign objects with large aspect ratios, a rotating decoupling head is designed and a MO-YOLOX network structure is constructed. Experiments are carried out with the belt conveyor in the mine intelligent mining equipment laboratory, and the detection of different foreign objects is analyzed. Experimental results show that the accuracy, recall, and mAP50 of the proposed rotating-frame foreign object detection method reach 93.87%, 93.69%, and 93.72%, respectively, and the average inference time for foreign object detection is 25 ms.
ARTICLE | doi:10.20944/preprints202304.1242.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: visual intelligence; object detection; image processing; action recognition; autonomous vehicles; machine learning
Online: 30 April 2023 (02:50:07 CEST)
In the context of Shared Autonomous Vehicles, the need to monitor the environment inside the car will be crucial. This article focuses on the application of deep learning algorithms to detect objects, namely lost/forgotten items, to inform the passengers, and aggressive items, to monitor whether violent actions may arise between passengers. For object detection, public datasets (COCO and TAO) were used to train state-of-the-art algorithms such as YOLOv5. For violent action detection, the MoLa InCar dataset was used to train state-of-the-art algorithms such as I3D, R(2+1)D, SlowFast, TSN, and TSM. Finally, an embedded automotive solution was used to demonstrate both methods running in real time.
ARTICLE | doi:10.20944/preprints202311.0614.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: autonomous driving; 3D object detection; multiple frame point clouds; feature and data fusion
Online: 9 November 2023 (11:16:01 CET)
Object detection is important in many applications, such as autonomous driving. While 2D images lack depth information and are sensitive to environmental conditions, 3D point clouds can provide accurate depth information and a more descriptive representation of the environment. However, sparsity is always a challenge in single-frame point cloud object detection. This paper introduces a two-stage, proposal-based feature fusion method for object detection using multiple frames. The proposed method, called proposal features fusion (PFF), utilizes a cosine-similarity approach to associate proposals from multiple frames and employs an attention-weighted fusion module to merge features from these proposals. It allows for feature fusion specific to individual objects and offers lower computational complexity while achieving higher precision. The experimental results on the nuScenes dataset demonstrate the effectiveness of our approach, achieving a mAP of 46.7%, which is 1.3% higher than the state-of-the-art 3D object detection method.
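The cosine-similarity association step can be sketched as below; the threshold and greedy matching rule are assumptions, and the attention-weighted fusion is not shown.

```python
# Hedged sketch of cosine-similarity proposal association across frames;
# the threshold and greedy matching are assumptions, not PFF's exact rule.
import numpy as np

def associate_proposals(curr_feats, prev_feats, thr=0.7):
    """curr_feats: (N, D), prev_feats: (M, D) proposal features -> list of (curr_idx, prev_idx)."""
    a = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    b = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    sim = a @ b.T                                  # pairwise cosine similarity
    matches = []
    for i, row in enumerate(sim):
        j = int(row.argmax())
        if row[j] >= thr:
            matches.append((i, j))                 # features of matched proposals get fused later
    return matches

curr = np.random.randn(5, 128)
prev = np.vstack([curr[1] + 0.01 * np.random.randn(128), np.random.randn(3, 128)])
print(associate_proposals(curr, prev))
```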
ARTICLE | doi:10.20944/preprints202209.0025.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: object detection; semi-supervised learning; Mask R-CNN; floor-plan images; computer vision
Online: 1 September 2022 (15:16:43 CEST)
Research on object detection using semi-supervised methods has been growing in the past few years. We examine the intersection of these two areas for floor-plan objects to promote the research objective of detecting objects more accurately with less labelled data. The floor-plan objects include different furniture items with multiple types of the same class, and this high inter-class similarity impacts the performance of prior methods. In this paper, we present a Mask R-CNN-based semi-supervised approach that provides pixel-to-pixel alignment to generate individual annotation masks for each class to mine the inter-class similarity. The semi-supervised approach has a student-teacher network that pulls information from the teacher network and feeds it to the student network. The teacher network uses unlabeled data to form pseudo-boxes, and the student network uses both the unlabeled data with the pseudo-boxes and the labelled data as ground truth for training. It learns representations of furniture items by combining labelled and unlabeled data. On the Mask R-CNN detector with a ResNet-101 backbone network, the proposed approach achieves mAP of 98.8%, 99.7%, and 99.8% with only 1%, 5%, and 10% labelled data, respectively. Our experiments affirm the efficiency of the proposed approach as it outperforms the fully supervised counterpart using only 10% of the labels.
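A minimal sketch of the teacher-to-student hand-off described above follows: only confident teacher detections on unlabeled images become pseudo ground truth. The confidence threshold and record layout are assumptions.

```python
# Minimal sketch of pseudo-box filtering in a student-teacher setup;
# the 0.9 threshold and the (box, class_id, score) layout are assumptions.
def filter_pseudo_boxes(teacher_preds, score_thr=0.9):
    """teacher_preds: iterable of (box, class_id, score) on unlabeled floor-plan images."""
    return [(box, cls) for box, cls, score in teacher_preds if score >= score_thr]

preds = [((10, 20, 60, 80), 3, 0.97), ((5, 5, 30, 25), 1, 0.42), ((100, 40, 160, 90), 3, 0.91)]
pseudo_gt = filter_pseudo_boxes(preds)      # fed to the student together with the labelled data
print(pseudo_gt)
```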
ARTICLE | doi:10.20944/preprints202310.1704.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: unmanned aerial vehicles; multi-object tracking; tracking-by-detection; set-membership filter
Online: 26 October 2023 (11:06:09 CEST)
Multi-Object Tracking (MOT) is a key technology for Unmanned Aerial Vehicles (UAVs). Traditional tracking-by-detection methods first employ an object detector to retrieve targets in each image and then track them based on a matching algorithm. Recently, popular multi-task learning methods have come to dominate this area since they can detect targets and extract Re-Identification (Re-ID) features in a computationally efficient way. However, the detection task and the tracking task have conflicting requirements on image features, leading to poorer performance of the joint learning model compared to separate detection and tracking methods. The problem is more severe for UAV images due to the irregular motion of a large number of small targets. In this paper, we propose a balanced Joint Detection and Re-ID learning (JDR) network to address the MOT problem in UAV vision. To better handle the non-uniform motion of objects in UAV videos, the Set-Membership Filter is applied, which describes the object state as a bounded set. An appearance matching cascade is then proposed based on the target state set. Furthermore, a Motion-Mutation module is designed to address the challenges posed by the abrupt motion of UAVs. Extensive experiments on the VisDrone-MOT2019 dataset demonstrate that our proposed model, termed SMFMOT, outperforms the state of the art by a large margin and achieves superior performance on MOT tasks in UAV videos.
ARTICLE | doi:10.20944/preprints202308.1011.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: remote sensing images; orientated object detection; one-stage; anchor-free; Gaussian kernel
Online: 14 August 2023 (10:39:48 CEST)
Oriented object detection is a challenging task in scene text detection and remote sensing image analysis, and it has attracted extensive attention in recent years with the development of deep learning. Currently, mainstream oriented object detectors are based on preset anchor boxes. This method increases the computational load of the network and causes a large amount of anchor box redundancy. To solve this problem, we propose anchor-free oriented object detection based on Gaussian centerness (AOGC), a single-stage anchor-free detection method. Our method uses a contextual attention FPN (CAFPN) to obtain the contextual information of the target. Then, we design a label assignment method for oriented objects. Finally, we develop a Gaussian kernel-based centerness branch, which can effectively determine the significance of different anchors. AOGC achieves mAPs of 74.30% on the DOTA-1.0 dataset and 89.80% on the HRSC2016 dataset, respectively, and exhibits performance superior to other anchor-free oriented object detection methods.
ARTICLE | doi:10.20944/preprints202207.0377.v1
Subject: Engineering, Control And Systems Engineering Keywords: object detection; contour; polygonal approximation; piecewise split-merge algorithm; Coupled Hidden Markov Model
Online: 26 July 2022 (02:27:17 CEST)
Since the conventional split-merge algorithm is sensitive to object scale variance and to the splitting starting point, a piecewise split-merge polygon approximation method is proposed to extract object contour features. Specifically, the contour corners are used as the starting points for the piecewise contour approximation to reduce the sensitivity of the contour segments to the starting point; then, the split-merge algorithm is used to implement the polygon approximation for each contour segment. Both the distance ratio and the arc length ratio, instead of the distance error, are used as the iterative stopping condition to improve robustness to object scale variance. The angle and the length are two features that describe the shape of the contour polygon and affect each other along the contour ordering, so they have a strong coupling relationship. To improve the descriptive accuracy of the contour, these two features are combined to construct a Coupled Hidden Markov Model that detects the object by calculating the probability of the contour features. The proposed algorithm is validated on the ETHZ Shape Classes and INRIA Horses standard datasets. Compared with other contour-based object detection algorithms, the proposed algorithm reduces the complexity of the contour description, improves the robustness of the contour features to scale variance, and achieves a higher object detection rate.
ARTICLE | doi:10.20944/preprints202305.1132.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: object detection; UAV images; lightweight network; maritime search and rescue
Online: 16 May 2023 (09:03:19 CEST)
Maritime search and rescue is a crucial component of the national emergency response system and currently relies mainly on Unmanned Aerial Vehicles (UAVs) to detect objects. Most traditional object detection methods focus on boosting detection accuracy while neglecting the detection speed of heavy models. However, it is also essential to improve detection speed to enable timely maritime search and rescue. To address these issues, we propose a lightweight object detector named the Shuffle-GhostNet-based detector (SG-Det). First, we construct a lightweight backbone, named Shuffle-GhostNet, which enhances the information flow between channel groups by redesigning the correlation group convolution and introducing the channel shuffle operation. Second, we propose an improved feature pyramid model, namely BiFPN-tiny, which has a lighter structure while being capable of reinforcing small object features. Furthermore, we incorporate the atrous spatial pyramid pooling (ASPP) module into the network, which employs atrous convolution with different sampling rates to obtain multi-scale information. Finally, we generate three sets of bounding boxes at different scales – large, medium, and small – to detect objects of different sizes. Compared with other lightweight detectors, SG-Det achieves better trade-offs across performance metrics and enables real-time detection with an accuracy rate of over 90% for maritime objects, which shows that it can better meet the actual requirements of maritime search and rescue.
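The channel shuffle operation that Shuffle-GhostNet relies on to mix information between channel groups is a standard building block; a sketch is shown below, with the group count chosen for illustration.

```python
# Sketch of the channel shuffle operation (group count here is an illustrative assumption).
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)    # split channels into groups
    x = x.transpose(1, 2).contiguous()          # interleave the groups
    return x.view(n, c, h, w)

feat = torch.randn(2, 8, 16, 16)
print(channel_shuffle(feat, groups=4).shape)    # torch.Size([2, 8, 16, 16])
```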
ARTICLE | doi:10.20944/preprints202307.0206.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: deep learning; remote sensing; arbitrary object detection; convolutional neural network
Online: 5 July 2023 (02:25:59 CEST)
With the continuous progress of remote sensing image object detection in recent years, researchers in this field have gradually shifted their focus from horizontal object detection to object detection in arbitrary directions. It is worth noting that oriented object detection has some properties that differ from horizontal object detection and that researchers have so far paid little attention to. This article presents the design of a straightforward and efficient arbitrary-oriented detection system that leverages inherent properties of the orientation task, including the rotation angle and box aspect ratio. When detecting low-aspect-ratio objects, the angle is of little importance to the oriented bounding box, and it is even difficult to define the angle information for extreme categories. Conversely, when detecting objects with high aspect ratios, the angle information plays a crucial role and can have a decisive impact on the quality of the detection results. By exploiting the aspect ratios of different targets, this paper proposes a ratio-balanced angle loss that allows the model to make a better trade-off between low-aspect-ratio and high-aspect-ratio objects. The rotation angle of each oriented object is naturally embedded into a two-dimensional Euclidean space for regression, thus avoiding an overly redundant design and preserving the topological properties of the circular space. The performance on the UCAS-AOD, HRSC2016, and DLR-3K datasets shows that the proposed model achieves a leading level in terms of both accuracy and speed. The code is released at https://github.com/minghuicode/Periodic-Pseudo-Domain.
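Embedding the rotation angle in a two-dimensional Euclidean space can be sketched with a unit-circle encoding; this specific parameterisation is an assumption about the paper's exact formulation, but it illustrates why periodicity is handled for free.

```python
# Hedged sketch of a 2D Euclidean angle embedding (unit-circle encoding is an assumption).
import numpy as np

def encode_angle(theta):
    """Map an angle (radians) to a point on the unit circle for smooth regression."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def decode_angle(vec):
    """Recover the angle from the regressed 2D vector; the circular topology is preserved."""
    return np.arctan2(vec[..., 1], vec[..., 0])

theta = np.deg2rad(np.array([0.0, 89.0, -45.0, 179.0]))
print(np.rad2deg(decode_angle(encode_angle(theta))))   # [0. 89. -45. 179.]
```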
ARTICLE | doi:10.20944/preprints202310.1631.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: small object detection; remote sensing images; context information; multiscale feature fusion
Online: 26 October 2023 (03:42:19 CEST)
Detecting rotational objects in remote sensing imagery is a significant challenge. These images typically encompass a broad field of view, featuring diverse and intricate backgrounds, with ground objects of various sizes densely scattered. As a result, identifying objects of interest within these images is a daunting task. While the integration of Convolutional Neural Networks (CNN) and Transformer networks leads to some advancements in rotational object detection, there is still room for improvement, particularly in enhancing the extraction and utilization of information related to smaller objects. To address this, our paper presents a multi-scale feature fusion module and a global feature context aggregation module. Initially, we fuse original, shallow, and deep features to reduce the loss of shallow feature information, thereby improving the detection performance of small objects in complex backgrounds. Subsequently, we compute the correlation of contextual information within feature maps to extract valuable insights. We name the newly proposed model the "Multiscale Feature Context Aggregation Module" (MFCA). We evaluate our proposed methodology on three challenging remote sensing datasets: DIOR-R, HRSC, and MAR20. Comprehensive experimental results show that our approach surpasses baseline models by 2.07% mAP, 1.02% mAP, and 1.98% mAP on the DIOR-R, HRSC2016, and MAR20 datasets, respectively.
ARTICLE | doi:10.20944/preprints202206.0390.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Object detection; Feature fusion network; Multiple feature selection; Angle prediction; Pixel Attention Mechanism
Online: 29 June 2022 (03:09:52 CEST)
The object detection task is usually affected by complex backgrounds. In this paper, a new image object detection method is proposed, which can perform multi-feature selection on multi-scale feature maps. With this method, a bidirectional multi-scale feature fusion network is designed to fuse semantic features and shallow features to improve the detection of small objects in complex backgrounds. When the shallow features are transferred to the top layer, a bottom-up path is added to reduce the number of network layers the features pass through in the fusion network, reducing the loss of shallow features. In addition, a multi-feature selection module based on the attention mechanism is used to minimize the interference of useless information on subsequent classification and regression, allowing the network to adaptively focus on appropriate information for classification or regression and so improve detection accuracy. Because the traditional five-parameter regression method has severe boundary problems when predicting objects with large aspect ratios, the proposed network treats angle prediction as a classification task. The experimental results on the DOTA dataset, the self-made DOTA-GF dataset, and the HRSC2016 dataset show that, compared with several popular object detection algorithms, the proposed method has certain advantages in detection accuracy.
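Treating angle prediction as classification, as mentioned above, can be sketched as follows; the bin width and the plain cross-entropy loss are assumptions for illustration, not the paper's exact formulation:

import torch
import torch.nn.functional as F

NUM_BINS = 180  # assume 1-degree bins over [0, 180)

def angle_to_class(theta_deg):
    # Map a continuous angle to a discrete class index, so the boundary
    # discontinuity of five-parameter angle regression is avoided.
    return theta_deg.long() % NUM_BINS

def angle_classification_loss(angle_logits, theta_deg):
    # angle_logits: (N, NUM_BINS) predicted scores; theta_deg: (N,) ground truth
    return F.cross_entropy(angle_logits, angle_to_class(theta_deg))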
ARTICLE | doi:10.20944/preprints202210.0014.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: spectrogram data set; wireless network monitoring; spectrum analysis; frame detection; object detection; deep learning
Online: 4 October 2022 (09:56:48 CEST)
Automated spectrum analysis serves as a troubleshooting tool that helps to diagnose faults in wireless networks, such as difficult signal propagation conditions and coexisting wireless networks. It provides higher monitoring coverage while requiring less expertise compared to manual spectrum analysis. In this paper, we introduce a data set that can be used to train and evaluate deep learning models capable of detecting frames from different wireless standards as well as interference between single frames. Since manually labelling a high variety of frames in different environments is too challenging, an artificial data generation pipeline has been developed. The data set consists of 20,000 augmented signal segments, each containing a random number of different Wi-Fi and Bluetooth frames, their spectral image representations, and labels that describe the position and type of each frame within the spectrogram. The data set also contains the results of intermediate processing steps, which enable the research or teaching community to create new data sets for specific requirements or to provide new, interesting examination examples.
ARTICLE | doi:10.20944/preprints202309.0050.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Autonomous Driving; Harsh Weather; Object Detection; Data Merging; Deep Neural Networks; YOLOv8
Online: 1 September 2023 (09:57:21 CEST)
For autonomous driving, perception is a primary and essential element that fundamentally deals with insight into the ego vehicle’s environment through sensors. Perception is a challenging task that suffers from dynamic objects and continuous environmental changes. The issue is made worse when adverse weather such as snow, rain, fog, night light, sand storms, or strong daylight degrades the quality of perception. In this work, we aim to improve camera-based perception accuracy, namely object detection relevant to autonomous driving, in adverse weather. We propose improving YOLOv8-based object detection in adverse weather through transfer learning using merged data from various harsh weather datasets. Two popular open-source datasets (ACDC and DAWN) and their merged dataset were used to detect primary objects on the road in harsh weather. A set of training weights was collected from training on the individual datasets, their merged version, and several subsets of those datasets according to their characteristics. The training weights were also compared by evaluating detection performance on the above-mentioned datasets and their subsets. The evaluation revealed that using custom datasets for training significantly improves detection performance compared to the YOLOv8 base weights, and that using more images through the feature-related data merging technique steadily increases object detection performance.
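A minimal sketch of such a transfer-learning run with the Ultralytics YOLOv8 API is given below; the data file name is hypothetical and stands in for a merged ACDC + DAWN dataset converted to YOLO format, and attribute names may differ slightly across Ultralytics versions:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")                                      # start from the YOLOv8 base weights
model.train(data="merged_adverse.yaml", epochs=100, imgsz=640)  # fine-tune on the merged data
metrics = model.val()                                           # evaluate the fine-tuned weights
print(metrics.box.map)                                          # mAP50-95 on the validation split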
ARTICLE | doi:10.20944/preprints202202.0204.v1
Subject: Medicine And Pharmacology, Pharmacy Keywords: computer vision; image processing; medication adherence; object detection; pill detection
Online: 17 February 2022 (08:45:14 CET)
Objective tools to track medication adherence are lacking. A tool to monitor pill intake that can be implemented in mHealth apps without the need for additional devices was developed. We propose a pill intake detection tool that uses digital image processing to analyze images of a blister to detect the presence of pills. The tool uses the circular Hough transform as a feature extraction technique and is therefore primarily useful for the detection of pills with a round shape. This pill detection tool comprises two steps: first, the registration of a full blister and the storing of reference values in a local database; second, the detection and classification of taken and remaining pills in similar blisters, to determine the actual number of untaken pills. In the registration of round pills in full blisters, 100% of pills in gray blisters or blisters with a transparent cover were successfully detected. In the counting of untaken pills in partially opened blisters, 95.2% of remaining and 95.1% of taken pills were detected in gray blisters, while 88.2% of remaining and 80.8% of taken pills were detected in blisters with a transparent cover. The proposed tool provides promising results for the detection of round pills. However, the classification of taken and remaining pills needs to be further improved, in particular for the detection of pills with non-oval shapes.
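The circular-Hough step can be illustrated with OpenCV as below; the parameter values are placeholders rather than the tool's actual settings:

import cv2

img = cv2.imread("blister.jpg")
gray = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)
circles = cv2.HoughCircles(
    gray, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
    param1=100, param2=30, minRadius=10, maxRadius=40)
n_detected = 0 if circles is None else circles.shape[1]
# Comparing n_detected against the reference count stored at registration
# time would give the number of pills already taken.
print("round pills detected:", n_detected)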
ARTICLE | doi:10.20944/preprints202306.2050.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Serial dual-channel detection; Faster RCNN; Transfer learning; Object classification; End to end system
Online: 28 June 2023 (15:29:49 CEST)
The phenomenon of seat occupancy in university libraries is a prevalent issue. However, existing solutions, such as software-based seat reservations and sensor-based occupancy detection, have proven inadequate in effectively addressing this problem. In this study, we propose a novel approach: a serial dual-channel object detection model based on Faster RCNN. Furthermore, we develop a user-friendly web interface and mobile APP to create a computer vision-based platform for library seat occupancy detection. To construct our dataset, we combine real-world data collection with UE5 virtual reality. The results of our tests also demonstrate that the utilization of a personalized virtual dataset significantly enhances the performance of the convolutional neural network (CNN) in dedicated scenarios. The serial dual-channel detection model comprises three essential steps. First, we employ the Faster RCNN algorithm to determine whether a seat is occupied by an individual. Subsequently, we utilize an object classification algorithm based on transfer learning to classify and identify images of unoccupied seats. This eliminates the need for manual judgment regarding whether a person is suspected of occupying a seat. Lastly, the web interface and APP provide seat information to librarians and students, respectively, enabling comprehensive services. By leveraging deep learning methodologies, this research effectively addresses the issue of seat occupancy in library systems. It significantly enhances the accuracy of seat occupancy recognition, reduces the computational resources required for training CNNs, and greatly improves the efficiency of library seat management.
ARTICLE | doi:10.20944/preprints202210.0131.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Adversarial examples; Remote sensing images; Universal adversarial patch; Object detection; Joint optimization; Scale factor.
Online: 11 October 2022 (02:34:23 CEST)
Although deep learning has received extensive attention and achieved excellent performance in a variety of scenarios, it suffers from adversarial examples to some extent. In particular, physical attacks pose more threats than digital attacks. However, existing research pays little attention to physical attacks on object detection in remote sensing images (RSIs). In this work, we systematically analyze the universal adversarial patch attack for multi-scale objects in the remote sensing field. There are two challenges for adversarial attacks in RSIs. On the one hand, the number of objects in remote sensing images is larger than in natural images, so it is difficult for an adversarial patch to have an adversarial effect on all objects when attacking a detector for RSIs. On the other hand, the wide range of photography platform altitudes causes object sizes to vary greatly, which makes it challenging to generate a universal adversarial perturbation for multi-scale objects. To this end, we propose an adversarial attack method on object detection for remote sensing data. One of the key ideas of the proposed method is a novel optimization of the adversarial patch: we aim to attack as many objects as possible by formulating a joint optimization problem. Besides, we introduce a scale factor to generate a universal adversarial patch that adapts to multi-scale objects, which ensures the adversarial patch is valid for multi-scale objects in the real world. Extensive experiments demonstrate the superiority of our method against state-of-the-art methods on YOLO-v3 and YOLO-v5. In addition, we also validate the effectiveness of our method in real-world applications.
ARTICLE | doi:10.20944/preprints202204.0279.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: object detection; challenging environments; low-light; image enhancement; complex environments; deep neural networks; computer vision
Online: 28 April 2022 (09:42:37 CEST)
In recent years, due to the advancement of machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvement of deep learning, traditional approaches such as sliding windows and manual feature selection techniques have been replaced with deep learning techniques. However, like any other vision task, object detection algorithms face problems when performing in low light, challenging weather, and crowded scenes. Such an environment is termed a challenging environment. This paper exploits pixel-level information to improve detection under challenging situations. To this end, we exploit the recently proposed hybrid task cascade network. This network works collaboratively with detection and segmentation heads at different cascade levels. We evaluate the proposed methods on three complex datasets, ExDark, CURE-TSD, and RESIDE, and achieve an mAP of 0.71, 0.52, and 0.43, respectively. Our experimental results assert the efficacy of the proposed approach.
ARTICLE | doi:10.20944/preprints202308.0837.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: SAR vehicle detection; rotated object detection; Synthetic dataset; Mix MSTAR; deep learning
Online: 10 August 2023 (10:16:27 CEST)
The application of deep learning in the detection of Synthetic Aperture Radar (SAR) targets has been primarily limited to large objects such as ships and airplanes, with far less attention given to detecting SAR vehicles. The complexities of SAR imaging make it difficult to distinguish small vehicles from the background clutter, creating a barrier to data interpretation and the development of Automatic Target Recognition (ATR) for SAR vehicles. The scarcity of datasets has inhibited progress in SAR vehicle detection in the data-driven era. To address this, we introduce a new synthetic dataset called Mix MSTAR, which mixes target chips and clutter backgrounds with original radar data at the pixel level. Mix MSTAR contains 5,392 objects of 20 fine-grained categories in 100 high-resolution images, predominantly 1478 × 1784 pixels. The dataset includes various landscapes such as woods, grasslands, urban buildings, lakes, and tightly arranged vehicles, each labeled with an Oriented Bounding Box (OBB). Notably, Mix MSTAR presents fine-grained object detection challenges by using the Extended Operating Condition (EOC) as a basis for dividing the dataset. Furthermore, we evaluate 9 benchmark rotated detectors on Mix MSTAR and demonstrate the fidelity and effectiveness of the synthetic dataset. To the best of our knowledge, Mix MSTAR is the first public multi-class SAR vehicle dataset designed for rotated object detection in large-scale scenes with complex backgrounds. Mix MSTAR is available at: https://github.com/TheGreatTreatsby/Mix-MSTAR-mmrotate.
ARTICLE | doi:10.20944/preprints202206.0426.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: event-based vision; object detection and tracking; high-temporal resolution tracking; frame-based vision; hybrid approach
Online: 30 June 2022 (09:54:14 CEST)
Event-based vision is an emerging field of computer vision that offers unique properties such as asynchronous visual output, high temporal resolution, and dependence on brightness changes to generate data. These properties can enable robust high-temporal-resolution object detection and tracking when combined with frame-based vision. In this paper, we present a hybrid, high-temporal-resolution object detection and tracking approach that combines learned and classical methods using synchronized images and event data. Off-the-shelf frame-based object detectors are used for initial object detection and classification. Then, event masks, generated for each detection, are used to enable inter-frame tracking at varying temporal resolutions using the event data. Detections are associated across time using a simple low-cost association metric. Moreover, we collect and label a traffic dataset using the hybrid sensor DAVIS 240c. This dataset is utilized for quantitative evaluation using state-of-the-art detection and tracking metrics. We provide ground truth bounding boxes and object IDs for each vehicle annotation. Further, we generate high-temporal-resolution ground truth data to analyze the tracking performance at different temporal rates. Our approach shows promising results with minimal performance deterioration at higher temporal resolutions (48–384 Hz) when compared with the baseline frame-based performance at 24 Hz.
ARTICLE | doi:10.20944/preprints202305.0857.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Optical Character Recognition; Sticker Pattern; Deep Learning; Object Detection; YOLO; Manufacture Automation; Paddle OCR
Online: 11 May 2023 (13:33:45 CEST)
Recent advancements in Artificial Intelligence (AI), Deep Learning (DL), and computer vision have revolutionized various industrial processes through image classification and object detection. State-of-the-art Optical Character Recognition (OCR) and Object Detection (OD) technologies, such as YOLO and PaddleOCR, have emerged as powerful solutions for addressing challenges in recognizing textual and non-textual information on printed stickers. However, a well-established framework integrating these cutting-edge technologies for industrial applications has yet to be established. In this paper, we propose an innovative framework that combines advanced OCR and OD techniques to automate visual inspection processes in an industrial context. Our primary contribution is a comprehensive framework adept at detecting and recognizing textual and non-textual information on printed stickers within a company, harnessing the latest AI tools and technologies for sticker information recognition. Our experiments reveal an overall macro accuracy of 0.88 for the sticker OCR across three distinct patterns. Furthermore, the proposed system goes beyond traditional Printed Character Recognition (PCR) by extracting supplementary information, such as barcodes and QR codes present in the image, significantly streamlining industrial workflows and minimizing manual labor demands.
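A minimal sketch of the OCR half of such a pipeline with PaddleOCR is shown below; the file name is hypothetical (a sticker crop produced by the YOLO detector is assumed), and the exact result structure varies slightly across PaddleOCR versions:

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")   # text detection + recognition
result = ocr.ocr("sticker_crop.jpg", cls=True)
for box, (text, score) in result[0]:             # one entry per detected text line
    print(text, round(score, 3))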
ARTICLE | doi:10.20944/preprints202305.0708.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Autonomous Driving; Deep learning Methods; LiDAR Sensing Technology; 3D Object Detection
Online: 10 May 2023 (08:21:46 CEST)
The rapid development of Deep Learning has brought novel methodologies for 3D Object Detection using LiDAR sensing technology. These improvements in precision and inference speed lead to notably high performance and real-time inference, which is especially important for self-driving purposes. However, the developments carried by these approaches overwhelm the research process in this area, since new methods, new technologies, and new software versions lead to different project necessities, specifications, and requirements. Moreover, the improvements brought by new methods may be due to improvements in newer versions of deep learning frameworks and not just the novelty and innovation of the model architecture. Thus, it became crucial to create a framework with the same software versions, specifications, and requirements that accommodates all these methodologies and allows the easy introduction of new methods and models. A framework is proposed that abstracts the implementation, reuse, and building of novel methods and models. The main idea is to facilitate the representation of state-of-the-art (SoA) approaches and simultaneously encourage the implementation of new approaches by reusing, improving, and innovating modules in the proposed framework, which has the same software specifications to allow a fair comparison. This makes it possible to determine whether a key innovation outperforms the current SoA by comparing models in a framework with the same software specifications and requirements.
ARTICLE | doi:10.20944/preprints202209.0109.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Kalman filter; median filter; impulse noise; estimate prediction; object distance determination; lidar; value calibration; point cloud.
Online: 7 September 2022 (10:20:49 CEST)
Determining the distance from one object to another is one of the important tasks solved in robotics systems. Conventional algorithms rely on an iterative process of predicting distance estimates, which results in an increased computational burden. Algorithms used in robotic systems should require minimal computation time and be resistant to noise. To address these problems, the paper proposes a Kalman combination filtering algorithm with a Goldschmidt divisor and a median filter. Software simulation showed that the developed algorithm predicts estimates more accurately than the traditional filtering algorithm and also runs faster. The results obtained can be effectively applied in various computer vision systems.
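A rough sketch of the combination is given below, assuming a scalar stream of distance measurements; the noise parameters and window size are illustrative, and the Goldschmidt routine is a generic textbook version rather than the paper's implementation:

import numpy as np

def goldschmidt_div(n, d, iters=4):
    # Goldschmidt division n/d for d > 0: scale d toward 1 and apply the same
    # factors to n, so the quotient is obtained by repeated multiplication.
    k = 2.0 ** np.floor(np.log2(d)) * 2.0      # scale so d/k lies in [0.5, 1)
    n, d = n / k, d / k
    for _ in range(iters):
        f = 2.0 - d
        n, d = n * f, d * f
    return n

def kalman_median_track(z, q=1e-3, r=1e-2, window=3):
    # Median pre-filter suppresses impulse noise; a scalar Kalman filter then
    # smooths the distance estimate, with the gain computed via Goldschmidt.
    z = np.array([np.median(z[max(0, i - window + 1):i + 1]) for i in range(len(z))])
    x, p, out = z[0], 1.0, []
    for m in z:
        p = p + q                               # predict (constant-distance model)
        k_gain = goldschmidt_div(p, p + r)      # Kalman gain without an explicit divide
        x = x + k_gain * (m - x)                # update with the measurement
        p = (1.0 - k_gain) * p
        out.append(x)
    return np.array(out)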
ARTICLE | doi:10.20944/preprints202110.0089.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Object Detection; Cascade Mask R-CNN; Floor Plan Images; Deep Learning; Transfer Learning; Dataset Augmentation; Computer Vision
Online: 5 October 2021 (15:09:26 CEST)
Object detection is one of the most critical tasks in the field of computer vision. This task comprises identifying and localizing an object in the image. Architectural floor plans represent the layout of buildings and apartments. Floor plans consist of walls, windows, stairs, and other furniture objects. While recognizing floor plan objects is straightforward for humans, automatically processing floor plans and recognizing objects is a challenging problem. In this work, we investigate the performance of the recently introduced Cascade Mask R-CNN network to solve object detection in floor plan images. Furthermore, we experimentally establish that deformable convolution works better than conventional convolution in the proposed framework. Identifying objects in floor plan images is also challenging due to the variety of floor plans and different objects. We faced a problem in training our network because of the lack of publicly available datasets; currently available public datasets do not have enough images to train deep neural networks efficiently. We introduce SFPI, a novel synthetic floor plan dataset consisting of 10000 images, to address this issue. Our proposed method conveniently surpasses the previous state-of-the-art results on the SESYD dataset and sets impressive baseline results on the proposed SFPI dataset. The dataset can be downloaded from SFPI Dataset Link. We believe that the novel dataset enables researchers to further enhance the research in this domain.
ARTICLE | doi:10.20944/preprints201810.0524.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: object detection; tomato organ; K-means clustering; Soft-NMS; migration learning; convolutional neural network; deep learning
Online: 23 October 2018 (07:57:44 CEST)
In the current natural environment, due to the complexity of the background and the high color similarity between immature green tomatoes and the plant, the occlusion of the key organs (flower and fruit) by leaves and stems leads to a low recognition rate and poor generalization of the detection model. Therefore, an improved tomato organ detection method based on a convolutional neural network is proposed in this paper. Based on the original Faster R-CNN algorithm, ResNet-50 with residual blocks was used to replace the traditional VGG16 feature extraction network, and the K-means clustering method was used to set more appropriate anchor sizes than manual settings to improve detection accuracy. A variety of data augmentation techniques were used to train the network. The test results showed that, compared with the traditional Faster R-CNN model, the mean average precision (mAP) of the optimal model improved from 85.2% to 90.7%, the memory requirement decreased from 546.9 MB to 115.9 MB, and the average detection time was shortened to 0.073 s per image. As the performance greatly improved, the trained model can be transplanted to an embedded system, which lays a theoretical foundation for the development of precise targeted pesticide application systems and automatic picking devices.
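Anchor adaptation by K-means with an IoU-based distance is a standard recipe; a generic NumPy sketch (not the authors' code) is shown below, clustering ground-truth box widths and heights:

import numpy as np

def iou_wh(boxes, centers):
    # IoU between boxes and anchors given as (w, h) pairs sharing one corner.
    w = np.minimum(boxes[:, None, 0], centers[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centers[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    # Cluster box sizes with 1 - IoU as the distance so anchors match the dataset.
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centers), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
                        for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers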
ARTICLE | doi:10.20944/preprints202309.1174.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Design Right Infringement; Deep Learning; Ensemble Learning; Image Classification; Object Detection; Large-Scale Detection System
Online: 19 September 2023 (03:03:46 CEST)
This paper presents a two-stage hierarchical neural network using image classification and object detection algorithms as key building blocks for a system that automatically detects a potential design right infringement. This neural network is trained to return the Top-N original design right records that most closely resemble the input image of a counterfeit. Design rights specify the unique aesthetic characteristics of a product. Due to the rapid change of trends, new design rights are continuously generated. This work proposes an Ensemble Neural Network (ENN), an artificial neural network model that aims to deal with a large amount of counterfeit data and design right records that are frequently added and deleted. First, we performed image classification and object detection learning per design right using existing models with a proven track record of high accuracy. The distributed models form the backbone of the ENN and yield intermediate results that are aggregated at a master neural network. This master neural network is a deep residual network paired with a fully connected network. This ensemble layer is trained to determine the sub-models that return the best result for a given input image of a product. In the final stage, the ENN model multiplies the inferred similarity coefficients by the weighted input vectors produced by the individual sub-models to assess the similarity between the test input image and the existing product design rights and spot any sign of violation. Given 84 design rights and sample product images taken meticulously under various conditions, our ENN model achieved average Top-1 and Top-3 accuracies of 98.409% and 99.460%, respectively. Upon introducing new design rights data, a partial update of the inference model was done an order of magnitude faster than with a single model. ENN maintained a high level of accuracy as it scaled out to handle more design rights. Therefore, the ENN model is expected to offer practical help to inspectors in the field, such as customs officers at the border who deal with a swarm of products.
ARTICLE | doi:10.20944/preprints202106.0157.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Land use and land cover; Classification; Object-based change detection; Multi-temporal image analysis; Landsat; Tiaoxi
Online: 7 June 2021 (09:27:22 CEST)
Changes in land use and land cover (LULC) are affected by both climate and human activity and in turn affect climate, biological diversity, and human well-being. Accurate and timely information about LULC patterns and change is crucial for land management decision-making, ecosystem monitoring, and urban planning, especially in developing economies undergoing industrialization, urbanization, and globalization. Biodiversity degradation and urban expansion in eastern China are research hot-spots; however, the influence of LULC changes on the region remains largely unexplored. Here, an object-based and multi-temporal image analysis approach was developed to detect how LULC changed during 1985-2015 in the Tiaoxi watershed (Zhejiang province, eastern China) using Landsat TM and OLI data. The main objective of this study is to improve the accuracy of unsupervised change detection from object-based and multi-temporal images. To this end, a total of seven LULC maps were generated from multi-temporal images. A random stratified sample design was used for assessing change detection accuracy. The proposed method achieved an overall accuracy of 91.86%, 92.14%, 92.00%, and 93.86% for 2000, 2005, 2010, and 2015, respectively. The proposed method, in conjunction with object-oriented analysis and multi-temporal satellite images, offers a robust and flexible approach to mapping LULC changes that helps with emergency response and government management. Urbanization and agricultural efficiency are the main drivers of LULC changes in the region. We anticipate that this freely available data will improve the modeling of surface forcing, provide evidence of changes in LULC, and inform water-management decision-making.
ARTICLE | doi:10.20944/preprints202106.0590.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Object detection; challenging environments; low-light; image enhancement; complex environments; state-of-the-art; deep neural networks; computer vision; performance analysis.
Online: 23 June 2021 (16:01:33 CEST)
Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of generic object detection systems decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development of generic object detection algorithms, notable progress has been observed in the field of object detection in challenging environments. However, there is no consolidated reference covering the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: traffic flow; object detection; object tracking; deep learning
Online: 1 June 2021 (14:42:58 CEST)
This paper proposes a neural-network-based system that fuses data received from a camera system on a gantry to detect moving objects and to calculate the relative position and velocity of vehicles traveling on a freeway; this information is then used to estimate the traffic flow. To estimate the traffic flow at both the microscopic and macroscopic level, this paper uses YOLO v4 and DeepSORT for vehicle detection and tracking, counts the number of vehicles passing through the freeway by drawing virtual lines and hot zones, and also estimates the velocity of each vehicle. The information is then passed to the traffic control center in order to monitor and control traffic flow on freeways and to analyze freeway conditions.
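The virtual-line counting logic can be sketched in a few lines; the track IDs and centroid positions are assumed to come from DeepSORT, and the helper names below are hypothetical:

def crossed_line(prev_y, curr_y, line_y):
    # True when a tracked centroid moves across the horizontal virtual line
    # between two consecutive frames (in either direction).
    return (prev_y - line_y) * (curr_y - line_y) < 0

def update_counts(tracks, line_y, counted, count=0):
    # tracks: {track_id: (prev_centroid_y, curr_centroid_y)} from the tracker;
    # each track ID is counted at most once when it crosses the line.
    for tid, (prev_y, curr_y) in tracks.items():
        if tid not in counted and crossed_line(prev_y, curr_y, line_y):
            counted.add(tid)
            count += 1
    return count

def speed_kmh(prev_y, curr_y, fps, metres_per_pixel):
    # Rough per-vehicle speed from centroid displacement (illustrative only).
    return abs(curr_y - prev_y) * metres_per_pixel * fps * 3.6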
ARTICLE | doi:10.20944/preprints202108.0360.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Table detection, table localization, deep learning, Hybrid Task Cascade, Object detection, deformable convolution, deep neural networks, computer vision, scanned document images, document image analysis.
Online: 17 August 2021 (10:26:42 CEST)
Tables in the document image are one of the most important entities since they contain crucial information. Therefore, accurate table detection can significantly improve information extraction from tables. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses the ResNeXt-101 backbone for feature extraction and Hybrid Task Cascade (HTC) to localize the tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network. This enables our network to detect tables of arbitrary layouts precisely. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, our proposed HybridTabNet outperforms earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how the proposed method generalizes to unseen data, we conduct an exhaustive leave-one-out evaluation. In comparison to prior state-of-the-art results, our method reduces the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (Latex), 41.33% on TableBank (Word), 55.73% on TableBank (Latex + Word), 10% on Marmot, and 9.67% on the UNLV dataset. The achieved results reflect the superior performance of the proposed method.
REVIEW | doi:10.20944/preprints202104.0739.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep neural network; survey; document images; review paper; deep learning; performance evaluation; page object detection, graphical page objects; document image analysis; page segmentation
Online: 28 April 2021 (10:17:49 CEST)
In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components; it leads to a high-level conceptual understanding of the documents that makes digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in document images. In particular, we discuss the most relevant deep learning-based approaches and the state of the art in graphical page object detection in document images. This work provides a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss the leading datasets along with quantitative evaluations. Finally, we briefly discuss promising directions that can be utilized for further improvements.
ARTICLE | doi:10.20944/preprints202212.0570.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Drone and Aerial Remote Sensing; Image Deblurring; Generative Adversarial Networks; Multi-Scale; Image blur level; Object Detection; Deep Learning
Online: 30 December 2022 (04:45:12 CET)
Drone and aerial remote sensing images are widely used, but their imaging environment is complex and prone to image blurring. Existing CNN deblurring algorithms usually use multi-scale fusion to extract features in order to make full use of the information in blurred aerial remote sensing images, but images with different degrees of blurring use the same weights, leading to errors that grow layer by layer in the feature fusion process. Based on the physical properties of image blurring, this paper proposes an adaptive multi-scale fusion blind deblurring generative adversarial network (AMD-GAN), which innovatively uses the degree of image blurring to guide the adjustment of the multi-scale fusion weights, effectively suppressing errors in the multi-scale fusion process and enhancing the interpretability of the feature layers. The research in this paper reveals the necessity and effectiveness of prior information on image blurring levels in image deblurring tasks. By studying image blurring levels, the network model focuses more on the basic physical features of image blurring. Meanwhile, this paper proposes an image blurring degree description model, which can effectively represent the blurring degree of aerial remote sensing images. Comparison experiments show that the proposed algorithm can effectively recover images with different degrees of blur, obtain high-quality images with clear texture details, outperform the comparison algorithms in both qualitative and quantitative evaluation, and effectively improve the object detection performance on blurred aerial remote sensing images. Moreover, the average PSNR of the proposed algorithm on the publicly available RealBlur-R dataset reaches 41.02 dB, surpassing the latest SOTA algorithm.
ARTICLE | doi:10.20944/preprints202307.0664.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Human Object Recognition and tracking; Multi-Modal Sensing; EO/IR; Radar; Mobile Platform; Deep Learning; Image Fusion, Autonomous Vehicles
Online: 11 July 2023 (05:26:48 CEST)
In modern security situations, tracking multiple human objects in real-time within challenging urban environments is a critical capability for enhancing situational awareness, minimizing response time, and increasing overall operational effectiveness. Tracking multiple entities enables informed decision-making, risk mitigation, and the safeguarding of civil-military operations to ensure safety and mission success. This paper presents a multi-modal electro-optical/infrared (EO/IR) and radio frequency (RF) fused sensing (MEIRFS) platform for real-time human object detection, recognition, classification, and tracking in challenging environments. By utilizing different sensors in a complementary manner, the robustness of the sensing system is enhanced, enabling reliable detection and recognition results across various situations. Specifically designed radar and thermal tags can be used to discriminate between friendly and non-friendly objects. The system incorporates deep learning-based image fusion and human object recognition and tracking (HORT) algorithms to ensure accurate situation assessment. After integration into an all-terrain robot, multiple ground tests were conducted to verify consistent HORT in various environments. The MEIRFS sensor system has been designed to meet the Size, Weight, Power, and Cost (SWaP-C) requirements for installation on autonomous ground and aerial vehicles.
Subject: Engineering, Control And Systems Engineering Keywords: UAV; Object Detection; Object Tracking; Deep Learning; Kalman Filter; Autonomous Surveillance
Online: 28 September 2021 (11:27:07 CEST)
The ever-burgeoning growth of autonomous unmanned aerial vehicles (UAVs) has demonstrated a promising platform for utilization in real-world applications. In particular, a UAV equipped with a vision system can be leveraged for surveillance applications. This paper proposes a learning-based UAV system for achieving autonomous surveillance, in which the UAV can assist in autonomously detecting, tracking, and following a target object without human intervention. Specifically, we adopted the YOLOv4-Tiny algorithm for semantic object detection and then consolidated it with a 3D object pose estimation method and a Kalman filter to enhance the perception performance. In addition, a back-end UAV path planner for surveillance maneuvers is integrated to complete the fully autonomous system. The perception module is assessed on a quadrotor UAV, while the whole system is validated through flight experiments. The experimental results verified the robustness, effectiveness, and reliability of the autonomous object tracking UAV system in performing surveillance tasks. The source code is released to the research community for future reference.
ARTICLE | doi:10.20944/preprints202003.0313.v3
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: object detection; faster region-based convolutional neural network (FRCNN); single-shot multibox detector (SSD); super-resolution; remote sensing imagery; edge enhancement; satellites
Online: 29 April 2020 (13:33:56 CEST)
The detection performance of small objects in remote sensing images has not been satisfactory compared to large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) showed remarkable image enhancement performance, but reconstructed images usually miss high-frequency edge information. Therefore, object detection performance degrades for small objects on recovered noisy and low-resolution remote sensing images. Inspired by the success of edge enhanced GAN (EEGAN) and ESRGAN, we applied a new edge-enhanced super-resolution GAN (EESRGAN) to improve the quality of remote sensing images and used different detector networks in an end-to-end manner, where the detector loss was backpropagated into the EESRGAN to improve detection performance. We propose an architecture with three components: ESRGAN, an edge-enhancement network (EEN), and a detection network. We used residual-in-residual dense blocks (RRDB) for both the ESRGAN and the EEN, and for the detector network we used a faster region-based convolutional network (FRCNN) (two-stage detector) and a single-shot multibox detector (SSD) (one-stage detector). Extensive experiments on a public (car overhead with context) dataset and another self-assembled (oil and gas storage tank) satellite dataset showed the superior performance of our method compared to standalone state-of-the-art object detectors.
ARTICLE | doi:10.20944/preprints202109.0059.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: table detection; table recognition; cascade Mask R-CNN; atrous convolution; recursive feature pyramid networks; document image analysis; deep neural networks; computer vision, object detection.
Online: 3 September 2021 (11:05:10 CEST)
Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that operates on Cascade Mask R-CNN, including a Recursive Feature Pyramid network and Switchable Atrous Convolution in the existing backbone architecture. By utilizing a comparatively lightweight backbone of ResNet-50, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), or memory-intensive deformable convolutions. We evaluate the proposed approach on five different publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and accomplishes comparable results on ICDAR-17 POD. Compared with previous state-of-the-art results, we obtain a significant relative error reduction of 56.36%, 20%, 4.5%, and 3.5% on the datasets of ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-dataset evaluations to exhibit the generalization capabilities of the proposed method.
ARTICLE | doi:10.20944/preprints202104.0653.v1
Online: 26 April 2021 (10:55:00 CEST)
The aim of this paper is to use deep learning tools to adapt pre-trained object detection models and improve the accuracy of non-destructive testing (NDT) in civil aviation maintenance. First, the paper classifies the object defects relevant to NDT, such as cracks and undercuts, and surveys how innovative deep learning methods can improve defect detection performance and inference capability, increasing the accuracy and efficiency of automatic identification, enhancing the future safety and reliability of aircraft fuselages, marking hidden cracks, and addressing defects that cannot be identified by manual inspection. Second, the mainstream YOLOv4 neural network is run on GPU hardware to speed up the recognition of defect images and is applied to the non-destructive inspection process of A-, C-, and D-level aircraft maintenance, fully validating the powerful defect detection capability of the deep learning model. The YOLOv4 algorithm is further improved by applying a one-stage attention mechanism, thereby improving the accuracy of the model. Finally, the improved attention-based YOLOv4 is proposed for object detection in NDT, effectively improving and shortening anomaly detection for automatic inspection sensor systems.
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep Learning; Reducing Training Annotations per Image; Object Detection; Object Counting; Asymmetric Loss Function
Online: 15 January 2021 (15:44:51 CET)
Annotating training data is a time-consuming and labor-intensive process in deep learning, especially for images with many objects present. In this paper, we propose a method to allow deep networks to be trained on data with a reduced number of annotations per image in heatmap regression tasks (e.g. object detection and counting), by applying an asymmetric loss function. In a real scenario, this reduction of annotations can be imposed by the researchers (e.g. asking the annotators to label only 50% of what they see in each image), or it can potentially counteract unintentionally missing labels from the annotators. To demonstrate the effectiveness of our method, we conduct experiments in two domains, crowd counting and wheat spikelet detection, using different deep network architectures. We drop various percentages of instance annotations per image in training. Results show that an asymmetric loss function is effective across different models and datasets, even in very extreme cases with limited annotations provided (e.g. 90% of the original annotations removed). While tuning of the key parameters is required, we find that setting conservative parameter values can help in more realistic situations where only small amounts of data have been missed by annotators.
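One plausible form of such an asymmetric loss is sketched below; the down-weighting factor and the weighting rule are assumptions for illustration, not necessarily the paper's definition:

import torch

def asymmetric_mse(pred, target, alpha=0.1):
    # Where the prediction exceeds the target heatmap (possibly an unlabeled
    # object), the squared error is down-weighted by alpha; errors at labeled
    # locations keep full weight. alpha is a tunable constant.
    err = pred - target
    weight = torch.where(err > 0, torch.full_like(err, alpha), torch.ones_like(err))
    return (weight * err ** 2).mean()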
ARTICLE | doi:10.20944/preprints202309.2147.v1
Subject: Physical Sciences, Thermodynamics Keywords: AC electro-kinetics; electro-kinetic object manipulation; inhomogeneous object polarization; microchambers; micro-systems; object manipulation; field-cage; μTAS; MatLab® model; thermodynamics; energy dissipation; LMEP
Online: 30 September 2023 (07:40:23 CEST)
In two previous papers, we calculated the dielectrophoresis (DEP) force and corresponding trajectories of high- and low-conductance 200-µm 2D spheres in a square 1x1mm chamber with plane-versus-pointed, plane-versus-plane and pointed-versus-pointed electrode configurations by applying the law of maximum entropy production (LMEP) to the system. Here, we complete these considerations for configurations with four-pointed electrodes centered on the chamber edges. The four electrodes were operated in either object-shift mode (two adjacent electrodes opposite the other two adjacent electrodes), DEP mode (one electrode versus the other three electrodes), or field-cage mode (two electrodes on opposite edges versus the two electrodes on the other two opposite edges). As in previous work, we have assumed DC properties for the object and the external media for simplicity. Nevertheless, every possible polarization ratio of the two media can be modeled this way. The trajectories of the spherical centers and the corresponding DEP forces were calculated from the gradients of the system’s total energy dissipation, described by numerically-derived conductance fields. In each of the three drive modes, very high attractive and repulsive forces were found in front of pointed electrodes for the high and low-conductance spheres, respectively. The conductance fields predict bifurcation points, watersheds, and trajectories with multiple endpoints. The high and low-conductance spheres usually follow similar trajectories, albeit with reversed orientations. In DEP drive mode, the four-point electrode chamber provides a similar area for DEP measurements as the classical plane-versus-pointed electrode chamber.
ARTICLE | doi:10.20944/preprints202306.2254.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Attribute and Object Selection; Fuzzification; Discretization
Online: 30 June 2023 (12:10:31 CEST)
Significant attribute selection in machine learning is one of the key aspects aimed at simplifying the problem and reducing its dimensionality, and consequently speeding up computation. This paper proposes new algorithms for selecting not only relevant features but also for evaluating and selecting a subset of relevant objects in a dataset. Both algorithms are mainly based on the use of a fuzzy approach. The research presented here yielded preliminary results of a new approach to the problem of selecting relevant attributes and objects, and, in fact, to selecting appropriate ranges of their values. Detailed results obtained on the Sonar dataset show the positive effects of this approach. Moreover, the observed results may suggest the effectiveness of the proposed method in terms of identifying a subset of truly relevant attributes from among those identified by traditional feature selection methods.
ARTICLE | doi:10.20944/preprints202301.0030.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: OpenCV; Python; objects; object detection; card
Online: 3 January 2023 (09:40:17 CET)
Computer vision is a rapidly developing field that focuses on highly sophisticated picture analysis, manipulation, and comprehension. Its objective is to analyze what is happening in front of a camera and utilize that understanding to control a computer or robotic system or to present users with fresh visuals that are more enlightening or appealing than the original camera images. Computer vision technologies make it feasible for new user interfaces, augmented reality gaming, biometrics, automobile, photography, movie creation, Web search, and many more applications. This essay seeks to explain how computer vision can be utilized to play blackjack successfully.
ARTICLE | doi:10.20944/preprints202212.0543.v1
Subject: Computer Science And Mathematics, Signal Processing Keywords: OpenCV; Python; objects; object detection; card
Online: 28 December 2022 (12:42:17 CET)
Computer vision is a fast-expanding discipline focusing on analyzing, altering, and comprehending images at a high level. Its goal is to figure out what's going on in front of a camera and use that knowledge to manage a computer or robotic system or to show people new visuals that are more instructive or attractive than the original camera photos. Video surveillance, biometrics, automotive, photography, movie production, Web search, medicine, augmented reality gaming, new user interfaces, and many other applications are all possible with computer vision technologies. This paper aims to describe how computer vision will be used to play a winning game of blackjack.
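A rough OpenCV sketch of the card-localization step such a system might start from is given below; the thresholds and area limits are illustrative only:

import cv2

frame = cv2.imread("table.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(cv2.GaussianBlur(gray, (5, 5), 0), 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cards = []
for c in contours:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    if len(approx) == 4 and cv2.contourArea(c) > 5000:   # card-sized quadrilateral
        cards.append(approx)                             # warp/match rank and suit next
print("cards found:", len(cards))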
ARTICLE | doi:10.20944/preprints201810.0651.v1
Online: 29 October 2018 (04:32:23 CET)
This paper presents the results of research on a wooden box that holds an important historical document: a hand Bible handwritten in the thirteenth century. Tradition connects this Bible to Marco Polo (Venice, 1254 - Venice, 1324), who is said to have owned it and to have carried it on his travels (1262 and 1271) to China. The Bible, of fine workmanship and written on thin parchment, is preserved together with its container and a yellow silk cloth in the ancient and prestigious Laurentian Library in Florence. The manuscript was in very poor condition and was being restored during the study (2011). The aims of the survey were to determine the place and period in which the box was made, in particular whether it is contemporary with or later than the manuscript it contains, and whether it was made in the East or in Europe.
ARTICLE | doi:10.20944/preprints201709.0139.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: Accuracy; Uncertainties; Object-Based; Slums; Jakarta
Online: 27 September 2017 (16:45:25 CEST)
Object-Based Image Analysis (OBIA) has been successfully used to map slums. In general, the occurrence of uncertainties in producing geographic data is inevitable. However, most studies have concentrated solely on assessing classification accuracy while neglecting the inherent uncertainties. Our research analyses the impact of uncertainties in measuring the accuracy of OBIA-based slum detection. We selected Jakarta as our case study area because a national policy of slum eradication is causing rapid changes in slum areas. Our research comprises four parts: slum conceptualization, ruleset development, implementation, and accuracy and uncertainty measurements. Existential and extensional uncertainty arise when producing reference data. The comparison of manual expert delineations of slums with the OBIA slum classification results in four combinations: True Positive, False Positive, True Negative, and False Negative. However, the higher the True Positive rate (which leads to better accuracy), the lower the certainty of the results. This demonstrates the impact of extensional uncertainties. Our study also demonstrates the role of non-observable indicators (i.e., land tenure) in assisting slum detection, particularly in areas where uncertainties exist. In conclusion, uncertainties increase when aiming to achieve a higher classification accuracy by matching manual delineation and OBIA classification.
ARTICLE | doi:10.20944/preprints202306.0262.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Multi-object tracking; DeepSORT; object detection; sensor fusion; deep learning, autonomous vehicles; radars; adverse weather; fog
Online: 5 June 2023 (08:09:38 CEST)
The presence of fog in the background can prevent small and distant objects from being detected, let alone tracked. Under safety-critical conditions, multi-object tracking models require faster tracking speed while maintaining high object-tracking accuracy. The original DeepSORT algorithm used YOLOv4 for the detection phase and a simple neural network as the deep appearance descriptor. Consequently, the generated feature map loses relevant details about the track being matched with a given detection in fog. Targets with a high degree of appearance similarity on the detection frame are more likely to be mismatched, resulting in identity switches or track failures in heavy fog. We propose an improved multi-object tracking model based on the DeepSORT algorithm to improve tracking accuracy and speed under foggy weather conditions. First, we employ our camera-radar fusion network (CR-YOLOnet) in the detection phase for faster and more accurate object detection. We propose an appearance feature network to replace the basic convolutional neural network, incorporating GhostNet in place of the traditional convolutional layers to generate more features while reducing computational complexity and cost. We adopt a segmentation module and feed the semantic labels of the corresponding input frame to add rich semantic information to the low-level appearance feature maps. Our proposed method outperformed YOLOv5 + DeepSORT with a 35.15% increase in multi-object tracking accuracy, a 32.65% increase in multi-object tracking precision, a 37.56% increase in speed, and a 46.81% decrease in identity switches.
ARTICLE | doi:10.20944/preprints201711.0101.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: tactile sensing; artificial robotic skin; active tactile object perception; active tactile object learning; active tactile transfer learning
Online: 16 November 2017 (03:49:49 CET)
Reusing the tactile knowledge of some previously explored objects helps us to easily recognize the tactual properties of new objects. In this paper, we enable a robotic arm equipped with multi-modal artificial skin, like humans, to actively transfer the prior tactile exploratory action experiences when it learns the detailed physical properties of new objects. These experiences, or prior tactile knowledge, are built by the feature observations that the robot perceives from multiple sensory modalities, when it applies the pressing, sliding, and static contact movements on objects with different action parameters. We call our method Active Prior Tactile Knowledge Transfer (APTKT), and systematically evaluated its performance by several experiments. Results show that the robot improved the discrimination accuracy by around 10% when it used only one training sample plus the feature observations of prior objects. By incorporating the auxiliary features, the transfer learning improved the discrimination accuracy by over 20%. The results also show that the proposed method is robust against transferring irrelevant prior tactile knowledge (negative knowledge transfer).
ARTICLE | doi:10.20944/preprints202105.0641.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: foraminifera; instance segmentation; object detection; deep learning
Online: 26 May 2021 (13:33:54 CEST)
Foraminifera are single-celled marine organisms that construct shells that remain as fossils in the marine sediments. Classifying and counting these fossils are important in e.g. paleo-oceanographic and -climatological research. However, the identification and counting process has been performed manually since the 1800s and is laborious and time-consuming. In this work, we present a deep learning-based instance segmentation model for classifying, detecting, and segmenting microscopic foraminifera. Our model is based on the Mask R-CNN architecture, using model weight parameters that have learned on the COCO detection dataset. We use a fine-tuning approach to adapt the parameters on a novel object detection dataset of more than 7000 microscopic foraminifera and sediment grains. The model achieves a (COCO-style) average precision of 0.78±0.00 on the classification and detection task, and 0.80±0.00 on the segmentation task. When the model is evaluated without challenging sediment grain images, the average precision for both tasks increases to 0.84±0.00 and 0.86±0.00, respectively. Prediction results are analyzed both quantitatively and qualitatively and discussed. Based on our findings we propose several directions for future work, and conclude that our proposed model is an important step towards automating the identification and counting of microscopic foraminifera.
ARTICLE | doi:10.20944/preprints202311.0629.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: space object identification; deep learning; YOLO; deformable convolution
Online: 9 November 2023 (11:00:23 CET)
With the rapid development of space programs in various countries, the number of satellites in space is increasing, resulting in an increasingly complex space environment. Therefore, improving space object identification technology has become highly important. We propose a method that applies deep learning to the intelligent detection of space objects. We utilize 49 authentic 3D satellite models across 16 scenarios to generate a dataset comprising 17,942 images, which also contains over 500 actual satellite photos. Additionally, we acquired a substantial amount of annotated data using a semi-automatic labeling method, which resulted in significant labor cost savings and yielded a total of 39,000 labels. We validate the feasibility of the dataset using the YOLOv3 and YOLOv7 models. Moreover, we optimize the YOLOv7 model by integrating deformable convolution (RepPoints) into the YOLOv7 backbone to obtain the YOLOv7-R model. Experimental results show that YOLOv3 achieves an accuracy of 0.927, YOLOv7 reaches an accuracy of 0.964, and YOLOv7-R achieves the highest accuracy at 0.983. This provides an effective solution for intelligent space object detection.
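For orientation, a deformable convolution block of the kind that could be inserted into a detector backbone can be sketched with torchvision's DeformConv2d; this is a generic illustration under assumed channel sizes, not the YOLOv7-R module described above.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableConvBlock(nn.Module):
    """3x3 deformable convolution: a small conv predicts per-location sampling
    offsets, which DeformConv2d uses to sample the input adaptively."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)   # start from regular-grid sampling
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x):
        offsets = self.offset_conv(x)
        return self.deform_conv(x, offsets)

x = torch.randn(1, 256, 40, 40)                       # example backbone feature map
print(DeformableConvBlock(256, 256)(x).shape)         # torch.Size([1, 256, 40, 40])
```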
ARTICLE | doi:10.20944/preprints201806.0279.v2
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: galaxy morphology, machine learning; data analysis; object classification
Online: 22 October 2018 (13:01:42 CEST)
Automated machine classifications of galaxies are necessary because the size of upcoming surveys will overwhelm human volunteers. We improve upon existing machine classification methods by adding the output of SpArcFiRe to the inputs of a machine learning model. We use the human classifications from Galaxy Zoo 1 (GZ1) to train a random forest of decision trees to reproduce the human vote distributions of the Spiral class. We prefer the random forest model over other black-box models like neural networks because it allows us to trace post hoc the precise reasoning behind the classification of each galaxy. We find that, across a sample of 470,000 Sloan galaxies that are large enough that details could be seen if they were there, the combination of SpArcFiRe outputs with existing SDSS features provides a better machine classification than either one alone in comparison to Galaxy Zoo 1. We suggest that adding SpArcFiRe outputs as features to any machine learning algorithm will likely improve its performance.
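A minimal sketch of the kind of setup described (a random forest regressing human vote fractions from combined feature sets), using scikit-learn on placeholder data; the feature matrix, targets, and hyperparameters are illustrative assumptions only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# X: SDSS photometric features concatenated with SpArcFiRe outputs (placeholder values)
# y: Galaxy Zoo 1 spiral-vote fraction per galaxy (placeholder values in [0, 1])
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.random(1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Feature importances give one post hoc view of which inputs drive the predictions
top = sorted(enumerate(forest.feature_importances_), key=lambda t: -t[1])[:5]
for idx, score in top:
    print(f"feature {idx}: importance {score:.3f}")
```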
ARTICLE | doi:10.20944/preprints201711.0153.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: image segmentation; object labeling; color space; fruit counting
Online: 23 November 2017 (10:37:25 CET)
Identifying the total number of fruits on trees has long been of interest in agricultural crop estimation work. Yield prediction of fruits in a practical environment is a hard and significant task for achieving better results in crop management systems and higher productivity at moderate cost. Earlier work utilized color vision in machine vision systems to identify citrus fruits and estimated yield information of a citrus grove in real time, using fruit recognition algorithms based on color features to estimate the number of fruits. In the current research work, a low-complexity and efficient image analysis approach is proposed to count fruit in images of natural scenes. Semi-automatic segmentation and yield calculation of fruit based on shape analysis is presented. Color and shape analysis was utilized to segment images of different fruits, such as apple and pomegranate, obtained under different lighting conditions. First, the input tree-section image was converted from the RGB colour space into an alternative colour space (i.e., YUV, YIQ, or YCbCr). The resulting image was then passed to the fruit segmentation algorithm. Morphological operations were then applied to enhance the image, followed by a blob-counting method that identifies the fruit objects and counts them. The accuracy of the algorithm used in this work is 82.21% for the images that were analysed.
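To make the described pipeline concrete (colour-space conversion, thresholding, morphological operations, blob counting), here is a hedged OpenCV sketch; the colour bounds, minimum blob area, and file name are illustrative assumptions and would need tuning per fruit and lighting condition.

```python
import cv2
import numpy as np

def count_fruit(image_path, lower, upper, min_area=100):
    """Sketch of the counting pipeline: colour-space transform, colour threshold,
    morphological clean-up, and connected-component (blob) counting."""
    bgr = cv2.imread(image_path)
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)          # RGB -> YCbCr-type space
    mask = cv2.inRange(ycrcb, lower, upper)                  # keep fruit-coloured pixels
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes
    n_labels, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    # Label 0 is background; count remaining blobs above a minimum area
    blobs = [i for i in range(1, n_labels) if stats[i, cv2.CC_STAT_AREA] > min_area]
    return len(blobs)

# Example bounds for reddish fruit in YCrCb (illustrative only)
print(count_fruit("tree_section.jpg",
                  np.array([0, 150, 0]), np.array([255, 255, 120])))
```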
ARTICLE | doi:10.20944/preprints202106.0665.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Delft3D; Object Mobility Model; Munitions Mobility and Burial; Object Shields Parameter; Sediment Shields Parameter; Equilibrium Burial Percentage; Sediment Supporting Point
Online: 28 June 2021 (14:24:25 CEST)
A coupled Delft3D-object model has been developed to predict an object's mobility and burial on a sandy seafloor. The Delft3D model is used to predict the seabed environment, such as currents, waves (peak period, significant wave height, wave direction), water level, sediment transport, and seabed change, which are taken as the forcing terms for the object model consisting of three components: (a) the object's physical parameters such as diameter, length, mass, and rolling moment, (b) the dynamics of a cylinder rolling around its major axis, and (c) an empirical sediment scour model with re-exposure parameterization. The model is compared with observational data collected during a field experiment off the coast of Panama City, Florida, from 21 April to 23 May 2013, funded by the Department of Defense Strategic Environmental Research and Development Program. The experimental data contain both the objects' mobility, measured using sector-scanning and pencil-beam sonars, and simultaneous environmental time series of the boundary-layer hydrodynamics and sediment transport conditions. Comparison between modeled and observed data clearly shows the model's capability.
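For readers unfamiliar with the Shields parameters named in the keywords, the standard non-dimensional form is given below for orientation; the paper's object and sediment Shields parameters are assumed to follow this textbook definition, possibly with paper-specific modifications.

```latex
\theta = \frac{\tau_b}{(\rho_s - \rho)\, g\, d}
       = \frac{u_*^{2}}{(s - 1)\, g\, d},
\qquad s = \frac{\rho_s}{\rho}
```

Here τ_b is the bed shear stress, u_* the friction velocity, ρ_s and ρ the object (or sediment) and water densities, g gravity, and d a characteristic diameter; mobility is typically expected once θ exceeds a critical threshold.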
ARTICLE | doi:10.20944/preprints202312.0406.v1
Subject: Computer Science And Mathematics, Robotics Keywords: dynamic SLAM; object detection; Kalman filters; multi-view geometry
Online: 6 December 2023 (11:03:51 CET)
Dynamic factors in the environment violate the static-environment assumption of SLAM algorithms, leading to a decrease in camera positioning accuracy. In recent years, many studies have attempted to address dynamic objects by combining semantic and geometric constraints, but issues such as poor real-time performance, the potential for mistakenly treating humans as rigid objects, and subpar performance in highly dynamic scenes still persist. This paper proposes a dynamic-scene visual SLAM algorithm based on target tracking and multi-view geometry (TKG-SLAM), built on object detection, Kalman filters, and multi-view geometry. The algorithm takes both real-time performance and accuracy into consideration. It combines semantic constraints and multi-view geometric constraints, selectively running them in different scenarios and filtering and preserving static points for camera pose estimation. Experimental results demonstrate that, compared to current state-of-the-art dynamic SLAM methods, our approach achieves optimal performance in some scenarios and exhibits stronger real-time capabilities.
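To illustrate the Kalman-filter component named above, here is a minimal constant-velocity filter for an object's image-plane position, showing the predict/update cycle a tracker uses to bridge frames; the state layout and noise values are generic assumptions, not the paper's parameters.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter for a tracked object's pixel position."""
    def __init__(self, dt=1.0):
        self.x = np.zeros(4)                           # state: [u, v, du, dv]
        self.P = np.eye(4) * 10.0                      # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)       # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)       # we observe position only
        self.Q = np.eye(4) * 0.01                      # process noise
        self.R = np.eye(2) * 1.0                       # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = ConstantVelocityKF()
kf.update([320.0, 240.0])      # detection at frame t
print(kf.predict())            # predicted position for frame t+1
```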
ARTICLE | doi:10.20944/preprints202310.1784.v1
Subject: Engineering, Control And Systems Engineering Keywords: hand-eye calibration; reprojection error analysis; manipulator object grasping
Online: 27 October 2023 (11:38:43 CEST)
During the hand-eye calibration process of a manipulator, the Euclidean distance error of the calibration results cannot be calculated because the true values of the hand-eye transformation matrix cannot be obtained. In this paper, a new method of reprojection error analysis is presented. Error analysis is carried out using the prior knowledge that the location of the AR markers is fixed during the calibration process. The coordinates of the AR marker center point are reprojected into the pixel coordinate system and then compared with the true pixel coordinates of the AR marker center point obtained by corner detection or manual labeling, in order to obtain the Euclidean distance between the two coordinates as the basis for error analysis. Experimental results show that the proposed optimization method can greatly improve the accuracy of hand-eye calibration results.
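The reprojection-error computation described above can be sketched with OpenCV as follows; all numeric values (intrinsics, pose, marker position, observed pixel) are placeholders to show the computation, not calibration results from the paper.

```python
import cv2
import numpy as np

def marker_reprojection_error(marker_world, rvec, tvec, K, dist, observed_px):
    """Reproject a fixed AR-marker centre into the image and measure the
    Euclidean pixel distance to its detected or manually labelled position."""
    projected, _ = cv2.projectPoints(marker_world.reshape(1, 1, 3), rvec, tvec, K, dist)
    projected = projected.reshape(2)
    return float(np.linalg.norm(projected - np.asarray(observed_px, float)))

K = np.array([[800.0, 0, 320.0],
              [0, 800.0, 240.0],
              [0, 0, 1.0]])                       # camera intrinsics (placeholder)
dist = np.zeros(5)                                # distortion coefficients (placeholder)
rvec = np.zeros(3)                                # camera pose from the hand-eye chain
tvec = np.array([0.0, 0.0, 1.0])
marker_world = np.array([0.05, -0.02, 0.0])       # fixed marker centre in metres

print(marker_reprojection_error(marker_world, rvec, tvec, K, dist, (360.0, 225.0)))
```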
ARTICLE | doi:10.20944/preprints202308.2046.v1
Subject: Biology And Life Sciences, Neuroscience And Neurology Keywords: fMRI; MVPA; PPI; dACC; occipital lobe; occluded object recognition
Online: 30 August 2023 (15:28:29 CEST)
Recognizing highly occluded objects is believed to arise from the interaction between the brain's vision and cognitive-control areas, although supporting neuroimaging data are currently limited. To explore the neural mechanism underlying this activity, we conducted an occluded object recognition experiment using functional magnetic resonance imaging (fMRI). During magnetic resonance examinations, 66 subjects engaged in object recognition tasks with three different degrees of occlusion. Generalized linear model (GLM) analysis showed that the degree of activation of the occipital lobe (inferior occipital gyrus, middle occipital gyrus, and occipital fusiform gyrus) and the dorsal anterior cingulate cortex (dACC) was related to the degree of occlusion of the objects. Multivariate pattern analysis (MVPA) further revealed a considerable increase in classification precision when dACC activation was incorporated as a feature, suggesting a combined role of the dACC and occipital lobe in occluded object recognition tasks. Moreover, psychophysiological interaction (PPI) analysis disclosed that functional connectivity (FC) between the dACC and the occipital lobe was enhanced with increased occlusion, highlighting the necessity of FC between these two brain regions in effectively identifying highly occluded objects. In conclusion, these findings contribute to understanding the neural mechanisms of highly occluded object recognition, augmenting our appreciation of how the brain manages incomplete visual data.
ARTICLE | doi:10.20944/preprints202308.0346.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: foals; behavior; handling; haltering; halter training; novel object test
Online: 3 August 2023 (14:32:23 CEST)
It could be helpful if behaviours presented during routine handling could be used as temperament indicators, thus directing tailor-made management, optimising resources, and improving animal welfare. We evaluated whether behaviours presented by 25 foals during their first navel treatment and first haltering session at weaning could serve as such indicators. Their behaviour was assessed during a novel object transposition test when they were one year of age, and the behavioural parameters presented then were correlated with the behaviours presented at navel treatment and haltering. Foals that showed higher exploratory activity during the test were those that showed less reactivity and less ticklishness at navel treatment and that were more relaxed and showed less reactivity during haltering. Confidence to transpose the novel object was correlated with foals that were more relaxed, showed less reactivity, and were less ticklish at navel treatment and were more relaxed during haltering. Transposition style correlated foals less prone to transpose with foals less curious about humans at navel treatment and more vigilant at haltering. The correlations verified between routine management and the behaviour test indicate the possibility of early identification of equine temperament, allowing management techniques and specific training for the individual, enhancing training efficiency, animal welfare, and human-horse interactions.
ARTICLE | doi:10.20944/preprints202306.0102.v1
Subject: Computer Science And Mathematics, Robotics Keywords: virtual displacement principle; deformed object; constitutive relationship; analytical mechanics
Online: 1 June 2023 (14:02:19 CEST)
Analytical mechanics is a fundamental discipline, and its basic principles should also be applicable to general deformed objects. However, the virtual displacement principle of analytical mechanics is only applicable to particle systems and rigid body systems, not to general deformed objects. In this paper, the generalized virtual displacement principle of general deformed objects (such as elastic, plastic, elasto-plastic and flexible objects, etc.) is derived by the methods of analytical mechanics; it is also applicable to particle systems and rigid body systems. First, following the methods of analytical mechanics, the external forces, internal forces, constraint reaction forces and elastic recovery forces of a deformed object system in the equilibrium state are analyzed, the concepts of virtual displacement, ideal constraint and virtual work are introduced, and the generalized virtual displacement principle (also called the generalized virtual work principle) of deformed objects is proposed. Second, the vector form, coordinate-component form and generalized-coordinate form of the generalized virtual displacement principle of deformed objects are given. Third, as an application of the principle, the virtual displacement principle of deformed objects in the plane polar, cylindrical and spherical coordinate systems is given. Finally, a brief conclusion is drawn. This work unifies the virtual displacement principle of elastic, plastic, elasto-plastic and other deformed object systems with that of rigid object systems using the basic methods of analytical mechanics. It is a basic principle for dealing with the static problems of deformed objects and also lays a foundation for the further study of the dynamics of deformed object systems.
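For orientation, the classical virtual work balance for a deformable body in equilibrium can be written in its standard continuum-mechanics form as below; this is the textbook statement, given as context, and is assumed to be a special case of the generalized formulation derived in the paper.

```latex
\int_{V} \boldsymbol{\sigma} : \delta\boldsymbol{\varepsilon}\, \mathrm{d}V
= \int_{V} \mathbf{f} \cdot \delta\mathbf{u}\, \mathrm{d}V
+ \int_{S} \mathbf{t} \cdot \delta\mathbf{u}\, \mathrm{d}S
```

Here σ is the stress, δε the virtual strain, f the body force, t the surface traction, and δu the virtual displacement; for rigid bodies the internal-work term on the left vanishes and the statement reduces to the familiar rigid-body virtual work principle.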
ARTICLE | doi:10.20944/preprints202305.2072.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: stereo SLAM; semantic segmentation; moving object detection; dynamic scenarios
Online: 30 May 2023 (07:54:13 CEST)
Dynamic objects appear frequently and almost continuously in the real world and can severely impair the performance of most vision-based SLAM systems, which rely on the static-world assumption. To improve the robustness and accuracy of visual SLAM in highly dynamic environments, a real-time and robust stereo SLAM system for dynamic scenes is proposed. To weaken the influence of dynamic content, a moving object detection method is introduced into our visual odometry, and a semantic segmentation network is combined into our stereo SLAM to extract pixel-level contours of dynamic objects. The influence of dynamic objects is thereby greatly weakened, and the performance of our system increases markedly in dynamic, complex, and crowded urban spaces. Experiments on both the KITTI Odometry dataset and a real-life scene show that our method can dramatically decrease tracking error and drift and improve the robustness and stability of our stereo SLAM in highly dynamic outdoor scenarios.
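One common way to exploit pixel-level contours of dynamic objects, as described above, is to discard feature points falling inside dynamic-class masks before pose estimation; the sketch below illustrates that filtering step with placeholder class ids and coordinates, not the authors' label set.

```python
import numpy as np

DYNAMIC_CLASSES = {11, 12, 13}   # e.g. person, rider, car; label ids are illustrative

def filter_static_keypoints(keypoints, semantic_mask):
    """Drop feature points that fall inside pixel-level masks of dynamic classes,
    so only presumably static points feed the stereo pose estimation."""
    h, w = semantic_mask.shape
    static = []
    for u, v in keypoints:
        ui, vi = int(round(u)), int(round(v))
        if 0 <= vi < h and 0 <= ui < w and semantic_mask[vi, ui] not in DYNAMIC_CLASSES:
            static.append((u, v))
    return static

mask = np.zeros((480, 640), dtype=np.int32)
mask[100:300, 200:400] = 11                       # a segmented moving object
kps = [(250.0, 150.0), (500.0, 400.0)]
print(filter_static_keypoints(kps, mask))         # only the second point survives
```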
ARTICLE | doi:10.20944/preprints202203.0111.v1
Subject: Engineering, Civil Engineering Keywords: close range photogrammetry; 3D linear control network; object dimensioning
Online: 7 March 2022 (19:57:21 CET)
In surveying engineering tasks, close-range photogrammetry is a leading technology considering different aspects such as achievable accuracy, availability of hardware and software, accessibility to measured objects, and economy. Hence, continued studies on photogrammetric data processing are desirable. Especially in industrial applications, the control points for close-range photogrammetry are usually measured using total stations. In the case of small objects, more precise positions of control points can be obtained by deploying and adjusting a three-dimensional linear network set up on the object. The article analyzes the accuracy of the proposed method, based on measurement of the linear network using a tape with a precision of ±1 mm. The experiment shows that the adjusted positions of the network control points can be determined with high, one-millimeter accuracy. The photogrammetric 3D model derived from such control points and stereo images captured with a non-metric camera is also characterized by the highest possible precision, which qualifies the presented method for accurate measurements used in surveying engineering. The authors prove that the distance between two arbitrarily chosen points derived from the 3D model of a dimensioned object is equal to the actual distance measured directly on the object, with one-millimeter accuracy.
ARTICLE | doi:10.20944/preprints202006.0170.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: object detection; semantic segmentation; computer vision; automatic check-out
Online: 14 June 2020 (12:51:26 CEST)
Automatic check-out has received increasing attention in recent years; such a system automatically generates a shopping bill by identifying an image of the products purchased by the customer. However, the system is challenged by a domain adaptation problem: each image in the training set contains only one commodity, whereas the test set is a collection of multiple commodities. The existing solution to this problem is to synthesize new training images to enhance the training set, with the composite images rendered using CycleGAN to make the image distributions of the training and test sets more similar. However, we find that the ground-truth detection boxes of the common dataset contain a large part of the background area, which acts as noise during training. To solve this problem, we propose a mask data priming method. Specifically, we re-annotate the large-scale Retail Product Checkout (RPC) dataset, adding pixel-level segmentation annotations to each item in the training set images based on the original dataset. Secondly, a new network structure is proposed in which we train the network using joint learning of detectors and counters and fine-tune the detection network by filtering out suitable images from the test set. Experiments on the RPC dataset show that our method yields better results: our approach reaches 81.87%, compared to 56.68% for the baseline, which demonstrates that pixel-level information helps improve the detection results of the network.
ARTICLE | doi:10.20944/preprints201905.0342.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: cadastral boundaries; automation; feature extraction; object based image analysis
Online: 29 May 2019 (04:37:50 CEST)
The objective of fast-tracking the mapping and registration of large numbers of unrecorded land rights globally has led to the experimental application of Artificial Intelligence (AI) in the domain of land administration, and specifically the application of automated visual cognition techniques to cadastral mapping tasks. In this research, we applied and compared the ability of rule-based systems within Object Based Image Analysis (OBIA), as opposed to human analysis, to extract visible cadastral boundaries from a very high resolution (VHR) WorldView-2 image, in both rural and urban settings. In our experiments, machine-based techniques were able to automatically delineate a good proportion of rural parcels with explicit polygons: the correctness of the automatically extracted boundaries was 47.4%, against 74.24% for humans, and their completeness was 45%, against 70.4% for humans. In the urban area, by contrast, the automatic results were counterintuitive: even though urban plots and buildings are clearly marked with visible features such as fences and roads and are readily perceptible to the eye, automation resulted in geometrically and topologically poorly structured data that could be compared geometrically neither with human-digitised data nor with actual cadastral data from the field. These results provide an updated snapshot of the performance of contemporary machine-driven feature extraction techniques compared to conventional manual digitising.
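The correctness and completeness figures quoted above are assumed to follow the usual accuracy-assessment definitions for extracted boundaries, given here for reference:

```latex
\text{correctness} = \frac{TP}{TP + FP},
\qquad
\text{completeness} = \frac{TP}{TP + FN}
```

where TP are extracted boundary segments that match the reference, FP are extracted segments with no reference counterpart, and FN are reference segments that were not extracted.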
ARTICLE | doi:10.20944/preprints201709.0155.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: object-oriented technique; change detection; eCognition® software; landuse
Online: 29 September 2017 (12:51:40 CEST)
This study compared two object-oriented land use change detection methods, detection after classification (DAC) and classification after detection (CAD), based on a digital elevation model, slope data, and multi-temporal Landsat images (a TM image for 2000 and an ETM image for 2010). We noted that the overall accuracy of DAC (86.42%) was much higher than that of CAD (71.71%). However, only a slight difference between the accuracies of the two methods exists for deciduous broadleaf forest, evergreen coniferous forest, mixed wood, upland, paddy, reserved land, and settlement; owing to substantial spectral differences, these land use types can be extracted using spectral indexes. The accuracy of DAC was much higher than that of CAD for industrial land, traffic land, green shrub, reservoir, lake, river, and channel, all of which share similar spectra. The discrepancy was mainly because DAC can fully utilize various forms of information beyond spectral information during its two-stage classification. In addition, the change-area boundary was not limited at first, but was adjustable during classification. DAC can overcome smoothing effects to a great extent by using multi-scale segmentation and multiple characteristics in detection. Although DAC yielded better results, it was more time-consuming (28 days) because it uses a two-stage classification approach, whereas CAD consumed less time (15 days). Thus, a hybrid of the two methods is recommended for application in land use change detection.
ARTICLE | doi:10.20944/preprints201705.0190.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: wireless sensor network; object tracking; dual sink; data collection
Online: 26 May 2017 (04:58:00 CEST)
Continuous object tracking in WSNs, such as monitoring of mud-rock flows and forest fires, is a challenging task due to the characteristic nature of continuous objects: they can appear randomly in the network field, move continuously, and change in size and shape. Monitoring such objects in real time generally requires a tremendous amount of messaging between sensor nodes to synergistically estimate the object's movement and track its location. In this paper, we propose a novel twofold-sink mechanism comprising a mobile and a static sink node. Both sink nodes gather information about boundary sensor nodes, which is then used to distribute energy consumption uniformly across all network nodes, thus helping to conserve the residual energy of the network. Numerous object tracking schemes using a mobile sink have been proposed in the literature; however, existing schemes employing a mobile sink cannot be applied to track continuous objects because of the significant variation in network topology. We therefore present a mechanism, adapted from the K-means algorithm, to find the best sensing location for the mobile sink node. It helps to reduce the transmission load on the intermediate network nodes situated between the static sink node and the ordinary sensing nodes. The simulation results show that the proposed scheme can distinctly improve the lifetime of the network compared to a single-sink protocol for continuous object tracking.
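A minimal sketch of a K-means-style placement of the mobile sink, as referenced above: cluster the coordinates of the boundary nodes and use the centroid(s) as candidate sensing locations. The coordinates and function name are illustrative assumptions, not the paper's protocol details.

```python
import numpy as np
from sklearn.cluster import KMeans

def mobile_sink_positions(boundary_nodes, n_positions=1):
    """Cluster boundary-node coordinates and return the centroid(s) as candidate
    sensing positions for the mobile sink, keeping hop counts from the boundary low."""
    km = KMeans(n_clusters=n_positions, n_init=10, random_state=0)
    km.fit(np.asarray(boundary_nodes, dtype=float))
    return km.cluster_centers_

# Boundary nodes reported around the continuous object's current edge (illustrative)
boundary = [(12.0, 40.5), (14.2, 41.0), (15.8, 43.3), (13.1, 45.0), (11.4, 43.8)]
print(mobile_sink_positions(boundary, n_positions=1))
```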
ARTICLE | doi:10.20944/preprints201703.0159.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: object detection; background subtraction; video surveillance; Kinect sensor fusion
Online: 20 March 2017 (10:21:40 CET)
Depth-sensing technology has led to broad applications of inexpensive depth cameras that can capture human motion and scenes in 3D space. Background subtraction algorithms can be improved by fusing color and depth cues, thereby allowing many issues encountered in classical color segmentation to be solved. In this paper, we propose a new fusion method that combines depth and color information for foreground segmentation based on an advanced color-based algorithm. First, a background model and a depth model are developed. Then, based on these models, we propose a new updating strategy that can eliminate ghosting and black shadows almost completely. Extensive experiments have been performed to compare the proposed algorithm with other, conventional RGB-D algorithms. The experimental results suggest that our method extracts foregrounds with higher effectiveness and efficiency.
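One simple way to fuse colour and depth cues for foreground segmentation, in the spirit of the abstract above, is to gate a colour background subtractor with a per-pixel depth difference; this sketch is an illustrative baseline (first-frame depth background, fixed threshold), not the paper's background and depth models.

```python
import cv2
import numpy as np

color_bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
depth_background = None      # running depth model; here simply the first frame
DEPTH_THRESH = 80            # depth-unit difference counted as foreground (sensor-dependent)

def segment_foreground(color_frame, depth_frame):
    """Combine a colour-based foreground mask with a depth-difference mask so that
    depth confirms colour foreground, suppressing shadows and ghosting."""
    global depth_background
    if depth_background is None:
        depth_background = depth_frame.astype(np.float32)
    color_mask = color_bg.apply(color_frame)
    color_mask = (color_mask == 255).astype(np.uint8) * 255    # drop shadow label (127)
    depth_mask = (np.abs(depth_frame.astype(np.float32) - depth_background)
                  > DEPTH_THRESH).astype(np.uint8) * 255
    return cv2.bitwise_and(color_mask, depth_mask)
```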
REVIEW | doi:10.20944/preprints201909.0233.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: primate hand use; primate grooming; manual grooming; object manipulation; primate evolution; oral grooming; object play; tool use; Machiavellian Intelligence; Bayesian decision theory
Online: 20 September 2019 (06:39:59 CEST)
The evolution of manual grooming and its implications have received little attention in the quest to understand the origins of simian primates and their social and technical intelligence. All simians groom manually, whereas prosimians groom orally despite comparable manual dexterity between some members of the two groups. Simians also exhibit a variable propensity for the manipulation of inanimate, non-food objects, which has culminated in tool making and tool use in some species. However, lemuriform primates also seem capable of tool use with training. Furthermore, lemuriforms appear to understand the concept of a tool and use their own body parts as “tools”, despite not using inanimate objects. This suggests that prosimian primates are pre-adapted for proprioceptive object manipulation and tool use, but do not express these cognitive abilities by default. This essay explores the paleontological, anatomical, cognitive, ethological, and neurological roots of these abilities and attempts to explain this behavioural divide between simians and prosimians. Common misconceptions about early primate evolution and captive behaviours are addressed, and chronological inconsistencies with Machiavellian Intelligence are examined. A “licking to picking” hypothesis is also proposed to explain a potential link between manual grooming and object manipulation, and to reconcile the inconsistencies of Machiavellian Intelligence. Bayesian decision theory, the evolution of the parietal cortex and enhanced proprioception, and analogies with behavioural changes resulting from artificial selection may help provide new insights into the minds of both our primate kin and ourselves.
ARTICLE | doi:10.20944/preprints202305.2154.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: VISLAM; dynamic environments; object detection; geometric constraint; IMU prior constraint
Online: 30 May 2023 (13:29:38 CEST)
Visual-inertial SLAM algorithms enable robots to autonomously explore and navigate unknown scenes. However, most current SLAM systems rely heavily on the static-environment assumption, which fails in the presence of moving objects in real environments. To improve the robustness and localization accuracy of SLAM systems in dynamic scenes, this paper proposes a visual-inertial SLAM framework that fuses semantic and geometric information, called DA-VINS. First, this paper presents a dynamic object classification method based on each feature's current motion state, which identifies temporarily static features in the environment. Secondly, a feature dynamics check module based on an IMU prior and adjacent-frame geometric constraints is designed to calculate dynamic factors; it also verifies the classification results for temporarily static features. Finally, a dynamic adaptive bundle adjustment module based on the features' dynamic factors is designed to adjust the weights of features in the nonlinear optimization. We evaluated our method on public datasets and on our own dataset. The results show that DA-VINS is one of the most real-time, accurate, and robust systems in dynamic scenes.
REVIEW | doi:10.20944/preprints202211.0544.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: pillar-based lake management; object-based lake management; Lake Rawapening
Online: 29 November 2022 (08:49:57 CET)
Lake Rawapening, Semarang Regency, Indonesia, has incorporated a holistic plan into its management practices. However, despite successful target achievements, some limitations remain, such that a review of its management plan is needed. This paper identifies and analyzes existing lake management strategies, specifically for Lake Rawapening, by exploring various literature about lake management in many countries, both legal frameworks and scholarly articles indexed in Google Scholar and published in Water by MDPI. There are two major types of lake management, namely pillar-based and object-based. While the former is the foundation of a conceptual paradigm that does not comprehensively consider the roles of finance and technology in lake management, the latter specifies the objects to be managed so as to create standards or benchmarks for the implementation of various programs. Overall, Lake Rawapening management should include more programs on erosion-sedimentation control and the monitoring of operational performance using information systems.
ARTICLE | doi:10.20944/preprints202107.0358.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Metaheuristic algorithms; Health data analytics; Multi-object simulated annealing; optimization
Online: 15 July 2021 (12:03:44 CEST)
Metaheuristic algorithms have frequently been used to tackle optimization problems; however, they are not commonly used in the analysis of health-related data, because developing metaheuristic algorithms that work well on such data is difficult owing to the complexity of health data, in particular genomics and epigenetics data. One of the important tasks in genomics is to predict genomic elements that work together to regulate disease-related genes. Predicting such elements is important, as they can be used to develop personalized cures. In this study, we present, for the first time, a multi-objective simulated annealing algorithm to identify enhancer-promoter-like interactions from Hi-C (chromosome conformation capture) data. These regulatory elements can potentially play vital roles as promoters and/or enhancers in the appearance and exacerbation of the regulation of genes. To evaluate the efficiency of the proposed method, we applied our proposed method and traditional methods to Hi-C data from mice and compared the results. Our results show that the interacting elements identified by our new method are more likely to be functional. The source code of the method is publicly available.
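For readers unfamiliar with simulated annealing, the generic single-objective skeleton is sketched below; the paper's multi-objective variant would combine or rank several scores rather than a single one, and the toy objective and cooling schedule here are illustrative only.

```python
import math
import random

def simulated_annealing(initial, neighbour, score, t0=1.0, t_min=1e-3, alpha=0.95, iters=100):
    """Generic simulated annealing: accept improvements always, and accept worse
    candidates with a probability that shrinks as the temperature cools."""
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    t = t0
    while t > t_min:
        for _ in range(iters):
            candidate = neighbour(current)
            candidate_score = score(candidate)
            delta = candidate_score - current_score
            if delta > 0 or random.random() < math.exp(delta / t):
                current, current_score = candidate, candidate_score
                if current_score > best_score:
                    best, best_score = current, current_score
        t *= alpha   # cooling schedule
    return best, best_score

# Toy usage: maximise a 1-D function over candidate positions
best, val = simulated_annealing(
    initial=0.0,
    neighbour=lambda x: x + random.uniform(-1, 1),
    score=lambda x: -(x - 3.0) ** 2)
print(best, val)
```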
ARTICLE | doi:10.20944/preprints201911.0035.v1
Subject: Engineering, Control And Systems Engineering Keywords: smart environment; smart sensors; distributed architectures; object detection; information integration
Online: 4 November 2019 (03:45:21 CET)
Object recognition is a necessary task in smart city environments; it can be used in processes such as the reconstruction of the environment map or the intelligent navigation of vehicles. This paper proposes an architecture that integrates heterogeneous distributed information to recognize objects in intelligent environments. The architecture is based on the IoT / Industry 4.0 model to interconnect the devices, called Smart Resources. Smart Resources can process local sensor data and send information to other devices, which can be located in the same operating range (the Edge), in the same intranet (the Fog), or on the Internet (the Cloud). Smart Resources must have an intelligent layer in order to be able to process the information. A system with two Smart Resources equipped with different image sensors has been implemented to validate the architecture. Experiments show that the integration of information increases the certainty in the recognition of objects by between 2% and 4%. Consequently, in the field of intelligent environments, it seems appropriate to provide devices not only with intelligence, but also with capabilities to collaborate closely with other devices.
ARTICLE | doi:10.20944/preprints201807.0238.v1
Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: Multiple object tracking; Airborne video; Tracklet confidence; Hierarchical association framework
Online: 13 July 2018 (14:27:22 CEST)
Multi-object tracking (MOT) in airborne videos is a challenging problem due to uncertain airborne vehicle motion, vibrations of the mounted camera, unreliable detections, the size, appearance and motion of the moving objects, as well as occlusions due to interactions between the moving objects and with other static objects in the scene. To deal with these problems, this work proposes a four-stage Hierarchical Association framework for multiple object Tracking in Airborne video (HATA). The proposed framework combines data association-based tracking (DAT) methods and target tracking using a compressive tracking approach to robustly track objects in complex airborne surveillance scenes. In each association stage, different sets of tracklets and detections are associated to efficiently handle local tracklet generation, local trajectory construction, global drifting-tracklet correction and global fragmented-tracklet linking. Experiments with challenging airborne video datasets show significant tracking improvement compared to existing state-of-the-art methods.
ARTICLE | doi:10.20944/preprints202305.1967.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: UAV images; multi-feature fusion; information aggregation; multi-scale object detection
Online: 29 May 2023 (04:52:15 CEST)
Unmanned Aerial Vehicle (UAV) image object detection has great application value in military and civilian fields. However, objects in images captured from UAVs suffer from large scale variation, complex backgrounds, and a large proportion of small objects. To address these problems, a multi-scale object detector based on coordinate and global information aggregation is proposed, named CGMDet. Firstly, a Coordinate and Global Information Aggregation Module (CGAM) is designed by aggregating local, coordinate, and global information, which can obtain features with richer context information. Secondly, a Multi-Feature Fusion Pyramid Network (MF-FPN) is proposed, which can better fuse features of different scales and obtain features containing more context information through the repeated use of feature maps, in order to better detect multi-scale targets. Moreover, more location information from low-level feature maps is integrated to improve the detection of small targets. Furthermore, we modified the bounding box regression loss of the model so that it regresses bounding boxes more accurately and converges faster. Finally, the proposed CGMDet was tested on the VisDrone and UAVDT datasets, obtaining mAP@0.5 of 50.9% and 48%, respectively. Our detector achieved the best results compared to other detectors.
ARTICLE | doi:10.20944/preprints202101.0423.v1
Subject: Engineering, Automotive Engineering Keywords: autonomous vehicles; SWOT analysis; 3D object detection; artificial intelligence; market dominance
Online: 21 January 2021 (14:50:05 CET)
Scientific and technological advances in telecommunications and onboard electronics, and advances in sustainability standards, dictated major changes to various industrial sectors, including the automotive industry, where hard and soft approaches to manufacturing are vying for market dominance. This work presents a prospective analysis of the autonomous vehicle (AV) market, analyzing three of the main US AV technology firms, Tesla, Waymo and Apple. Their designs and solutions are compared, and prospective scenarios were constructed based on an analysis of their strengths, weaknesses, opportunities and threats (SWOT). The results suggest that Tesla currently exhibits the greatest market leadership in the group studied. However, it was concluded that in the medium term, Waymo would surpass Tesla and assume market leadership. In the long run, it was concluded that Apple will overcome its rivals and dominate this market.
ARTICLE | doi:10.20944/preprints201908.0161.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Navier-Stokes solver; multi-object tracking; collision risk assessment; road scenes
Online: 14 August 2019 (09:26:07 CEST)
Predicting the likely evolution of traffic scenes is a challenging task because of the high uncertainty of sensing technology and the dynamic environment, and this can lead to planning failures for intelligent agents such as autonomous vehicles. In this paper, we propose a fluid-based physical model to represent the influence of surrounding objects' motion on driving safety. In our pipeline, the input sensor can be LiDAR, a camera, or multi-modal data. We use a Kalman filter to estimate the state space of each detected object and adopt the properties of stable fluids to build a risk map based on the density field. The noisy state estimates are then modeled as the boundary conditions in the simulation of the advection and diffusion process. We test our approach on the public KITTI dataset and find that this model can handle short-term prediction in cases of misdetection and tracking failure caused by object occlusion, which shows promise for collision risk assessment in road scenes.
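The advection-diffusion process mentioned above is assumed to follow the standard stable-fluids form, given here for orientation rather than as the paper's exact formulation:

```latex
\frac{\partial \rho}{\partial t} + (\mathbf{u} \cdot \nabla)\rho = \nu \nabla^{2}\rho + S
```

where ρ is the risk (density) field, u the velocity field induced by the tracked objects, ν a diffusion coefficient, and S a source term; the noisy Kalman state estimates enter as boundary conditions and sources of this field.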
Subject: Computer Science And Mathematics, Information Systems Keywords: system-analysis; complex systems; system-theory; biological systems; object oriented programming
Online: 11 April 2019 (11:24:35 CEST)
The following series of articles about a contextual system theory, with a focus on biological systems, is intended to give a guideline for exploring complex systems. It is written not only for scientists, and I will therefore try to describe part of the way of understanding as an intuitive approach. This approach will also help scientists gain a deeper understanding of complexity. Some definitions and rules may at first seem simple and obvious; if practiced in a multidimensional complex system, one will see the difficulties. Even without mathematization or computer modeling, the multidimensional subspace, which we try to describe phenomenologically in order to approach an understanding of this subsystem, will require a lot of training. We easily get lost in the multidimensional world. Some of the chapters here and in the following publications of this series are written for specialists, but it will always come back to the essence of a phenomenological description. Concepts like symbolic programming, object-oriented modeling, simulation and optimization, as well as experimental and mathematical approaches, are the tools described to unpuzzle a complex scenario.
REVIEW | doi:10.20944/preprints202310.0200.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: machine learning; augmented reality; mixed reality; object recognition; action recognition; context analysis
Online: 4 October 2023 (08:03:43 CEST)
A major challenge of augmented and mixed reality applications is identifying the context and semantics of the real environment. Studies on object and action recognition have been developed based on improvements in machine learning techniques, allowing objects and actions to be annotated and recognized. This study aims to characterize current knowledge on the use of machine learning for recognizing objects and actions in augmented and mixed reality environments in order to increase context awareness. Therefore, a systematic literature review of works related to these topics was conducted using the Scopus and Web of Science knowledge bases. We searched for articles and conference reviews or papers published between 2018 and 2022 and selected fifteen studies to review. The results indicate that there is great demand for applying machine learning to immersive technologies in factories, engineering, entertainment, education, health, and other application domains. However, these real-time interactive systems still have challenges and limitations to be solved, involving network communication, prediction time, and the creation of models that recognize objects and actions in broad contexts. Furthermore, additional research is needed to investigate how object and action recognition can increase context awareness in augmented reality applications.
ARTICLE | doi:10.20944/preprints202308.1909.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: mixed reality; object detection; OpenCV; YOLOv8; computer vision; quality inspection; smart manufacturing
Online: 29 August 2023 (08:43:57 CEST)
Quality control is a critical component of industrial manufacturing, directly influencing efficiency, product reliability, and ultimately customer satisfaction. In the dynamic environment of industrial manufacturing, traditional methods of inspection may not adequately meet the evolving complexity, necessitating innovative approaches to bolster precision and productivity. In this study, we explore the application of mixed reality (MR) technology for real-time quality control in the assembly process. Our methodology involved the integration of smart glasses with a server-based image recognition system designed to conduct real-time component analysis. The innovative aspect of our study lies in the harmonization of MR and computer vision algorithms, providing immediate visual feedback to inspectors and thereby improving the speed and accuracy of defect detection. YOLOv8 was adopted in this study as the object detection model. The project implementation occurred in a controlled environment to enable a comprehensive evaluation of the system functionality, the identification of possible problems, and improvements in system performance. The results indicated the viability of mixed reality as a powerful tool for enhancing traditional inspection processes. The fusion of MR and computer vision offers possibilities for future advancements in industrial quality control, paving the way for more efficient and reliable manufacturing ecosystems.
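A minimal sketch of server-side YOLOv8 inference of the kind this abstract describes, using the Ultralytics package; the weights file, image path, and confidence threshold are placeholders, not the study's trained model or settings.

```python
from ultralytics import YOLO

# Load a YOLOv8 detector; "yolov8n.pt" is a generic pretrained checkpoint used as a placeholder
model = YOLO("yolov8n.pt")

def inspect_frame(image_path, conf=0.5):
    """Run detection on one frame captured by the smart glasses and return
    (class_name, confidence, xyxy box) tuples for the inspector's feedback overlay."""
    results = model.predict(image_path, conf=conf, verbose=False)
    detections = []
    for r in results:
        for box in r.boxes:
            cls_id = int(box.cls[0])
            detections.append((model.names[cls_id],
                               float(box.conf[0]),
                               box.xyxy[0].tolist()))
    return detections

print(inspect_frame("assembly_component.jpg"))
```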