ARTICLE | doi:10.20944/preprints202309.1577.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: computer vision; deep learning; multi-modality image registration
Online: 25 September 2023 (04:18:55 CEST)
This study proposes a method for matching infrared (IR) and RGB images using a deep learning network that estimates a deformation field. We propose a deformation field generator (DFG) that estimates the deformation field of the transformation matrix needed to match each pixel of the IR image to the RGB image. The DFG is a network that receives IR and RGB images as input; its output has two channels and the same resolution as the input image. By warping the IR image through a grid sampler, which resamples the image according to the values of the deformation field, we obtain a warped IR image aligned with the RGB image. Additionally, to check whether the warped IR image matches the RGB image, segmentation mask images of the objects were produced for both images. Rather than comparing the IR and RGB images directly, we propose a mask loss that warps the IR mask image through the deformation field and grid sampler and then compares the warped IR mask with the RGB mask. The mask loss sidesteps the problem of comparing spatial similarity across modalities, such as IR and RGB, by comparing mask images, which share the same modality.
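The grid-sampler warping described above can be illustrated with a minimal pure-Python bilinear sampler (in practice this role is typically played by a differentiable operator such as PyTorch's `grid_sample`; the function name, the pixel-offset convention, and the zero padding here are illustrative assumptions, not the paper's implementation):

```python
import math

def warp_with_field(image, field):
    """Warp a single-channel image by a per-pixel deformation field.

    image: H x W nested list of floats.
    field: H x W nested list of (dx, dy) pixel offsets; the output at
    (y, x) is sampled bilinearly from the input at (y + dy, x + dx),
    with zero padding outside the image.
    """
    h, w = len(image), len(image[0])

    def sample(y, x):
        y0, x0 = math.floor(y), math.floor(x)
        wy, wx = y - y0, x - x0
        val = 0.0
        # Blend the four neighbouring pixels by their bilinear weights.
        for yy, ky in ((y0, 1.0 - wy), (y0 + 1, wy)):
            for xx, kx in ((x0, 1.0 - wx), (x0 + 1, wx)):
                if 0 <= yy < h and 0 <= xx < w:
                    val += ky * kx * image[yy][xx]
        return val

    return [[sample(y + field[y][x][1], x + field[y][x][0])
             for x in range(w)] for y in range(h)]
```

A zero field leaves the image unchanged, while a constant (1, 0) field samples each output pixel from one column to its right; a learned DFG output plays the role of `field`.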
ARTICLE | doi:10.20944/preprints202104.0282.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: OpenCV stereo-vision; low-cost stereo-vision; do it yourself stereo-vision; stereoscopic binocular vision; binocular vision; practical guide stereo-vision
Online: 12 April 2021 (12:09:38 CEST)
The paper presents an analysis of the latest developments in the field of stereo vision in the low-cost segment, both for prototypes and for industrial designs. We describe the theory of stereo vision and present information about cameras, data-transfer protocols, and their compatibility with various devices. The theory of image processing for stereo vision is considered and the calibration process is described in detail. We then present the developed stereo vision system and outline the main points that need to be considered when developing such systems. Finally, we present software, written in Python for the Windows operating system, for adjusting stereo vision parameters in real time.
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: Retina; Bird vision; Colour vision
Online: 11 March 2020 (16:00:46 CET)
The avian retina is far less well known than that of mammals such as the mouse and macaque, and detailed study is overdue. The chicken (Gallus gallus) has potential as a model, in part because research can build on developmental studies of the eye and nervous system. One can expect differences between bird and mammal retinas simply because, whereas most mammals have three types of visual photoreceptor, birds normally have six. Spectral pathways and colour vision are of particular interest, because filtering by oil droplets narrows cone spectral sensitivities and birds are probably tetrachromatic. The number of receptor inputs is reflected in the retinal circuitry. The chicken probably has four types of horizontal cell, there are at least 11 types of bipolar cell, often with bi- or tri-stratified axon terminals, and there is a high density of ganglion cells, which make complex connections in the inner plexiform layer. In addition, there is likely to be retinal specialisation; for example, chicken photoreceptors and ganglion cells have separate peaks of cell density in the central and dorsal retina, which probably serve different types of behaviour.
ARTICLE | doi:10.20944/preprints202110.0363.v1
Subject: Engineering, Control And Systems Engineering Keywords: Oil spills; synthetic aperture radar (SAR); deep convolutional neural networks (DCNNs); vision transformers (ViTs); deep learning; semantic segmentation; marine pollution; remote sensing
Online: 25 October 2021 (15:42:36 CEST)
Oil spillage over a sea or ocean’s surface is a threat to marine and coastal ecosystems. Spaceborne synthetic aperture radar (SAR) data has been used efficiently for the detection of oil spills due to its operational capability in all-day, all-weather conditions. The problem is often modeled as a semantic segmentation task: the images need to be segmented into multiple regions of interest such as sea surface, oil spill, look-alikes, ships and land. Training a classifier for this task is particularly challenging since there is an inherent class imbalance. In this work, we train a convolutional neural network (CNN) with multiple feature extractors for pixel-wise classification, and introduce a new loss function, the ‘gradient profile’ (GP) loss, which is in fact a constituent of the more generic Spatial Profile loss proposed for image translation problems. For training, testing and performance evaluation, we use a publicly available dataset with selected oil spill events verified by the European Maritime Safety Agency (EMSA). The results show that the proposed CNN, trained with a combination of GP, Jaccard and focal loss functions, can detect oil spills with an intersection over union (IoU) value of 63.95%. The IoU values for the sea surface, look-alike, ship and land classes are 96.00%, 60.87%, 74.61% and 96.80%, respectively. The mean intersection over union (mIoU) over all classes is 78.45%, a 13% improvement over the state of the art for this dataset. Moreover, we provide extensive ablations on different convolutional neural network (CNN) and vision transformer (ViT) based hybrid models to demonstrate the effectiveness of adding GP loss as an additional loss function for training. Results show that GP loss significantly improves the mIoU and F1 scores for both CNN- and ViT-based hybrid models. GP loss turns out to be a promising loss function in the context of deep learning with SAR images.
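The per-class IoU and mIoU metrics reported above are simple ratios of intersection to union pixel counts; a minimal sketch over flat label lists (the function name and toy labels are illustrative, not from the paper):

```python
def per_class_iou(pred, truth, num_classes):
    """Per-class intersection over union for flat lists of class labels."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, truth):
        if p == t:
            inter[p] += 1
            union[p] += 1       # correct pixel counted once in the union
        else:
            union[p] += 1       # false positive for class p
            union[t] += 1       # false negative for class t
    return [i / u if u else 0.0 for i, u in zip(inter, union)]

# Toy 6-pixel example with three classes:
ious = per_class_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)
miou = sum(ious) / len(ious)   # mean IoU over all classes
```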
ARTICLE | doi:10.20944/preprints201903.0155.v1
Subject: Physical Sciences, Applied Physics Keywords: light pollution, vision, non-vision, DSLRs, ISS
Online: 14 March 2019 (15:44:08 CET)
Night-time lights interact with human physiology through different pathways starting at the retinal layers of the eye, from the signals provided by the rods, the S-, L- and M-cones, and the intrinsically photosensitive retinal ganglion cells (ipRGC). These individual photic channels combine in complex ways to modulate important physiological processes, among them the daily entrainment of the neural master oscillator that regulates circadian rhythms. Evaluating the relative excitation of each type of photoreceptor generally requires full knowledge of the spectral power distribution of the incoming light, information that is not easily available in many practical applications. One such instance is wide area sensing of public outdoor lighting; present-day radiometers onboard Earth-orbiting platforms with sufficient nighttime sensitivity are generally panchromatic and lack the required spectral discrimination capacity. In this paper we show that RGB imagery acquired with off-the-shelf digital single-lens reflex cameras (DSLR) can be a useful tool to evaluate, with reasonable accuracy and high angular resolution, the photoreceptoral inputs associated with a wide range of lamp technologies. The method is based on linear regressions of these inputs against optimum combinations of the associated R, G, and B signals, built for a large set of artificial light sources by means of synthetic photometry. Given the widespread use of RGB imaging devices, this approach is expected to facilitate the monitoring of the physiological effects of light pollution, from ground and space alike, using standard imaging technology.
ARTICLE | doi:10.20944/preprints202105.0119.v1
Subject: Engineering, Automotive Engineering Keywords: Autonomous Driving; Environment Perception; Grid Mapping; Stereo Vision; Monocular Vision
Online: 6 May 2021 (17:24:09 CEST)
Accurately estimating the current state of local traffic scenes is one of the key problems in the development of software components for automated vehicles. In addition to details on free space and drivability, static and dynamic traffic participants, information on semantics may also be included in the desired representation. Multi-layer grid maps allow all this information to be included in a common representation. However, most existing grid mapping approaches only process range-sensor measurements such as LiDAR and radar, and solely model occupancy without semantic states. To add sensor redundancy and diversity, it is desirable to integrate vision-based sensor setups into a common grid map representation. In this work, we present a semantic evidential grid mapping pipeline, including estimates for eight semantic classes, that is designed for straightforward fusion with range-sensor data. Unlike other publications, our representation explicitly models uncertainties in the evidential model. We present results of our grid mapping pipeline based on a monocular vision setup and a stereo vision setup. Our maps are accurate and dense due to the incorporation of a disparity- or depth-based ground-surface estimation in the inverse perspective mapping. We conclude by providing a detailed quantitative evaluation on real traffic scenarios from the KITTI odometry benchmark and demonstrating the advantages over other semantic grid mapping approaches.
ARTICLE | doi:10.20944/preprints202012.0403.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Computer vision; performance metrics; Yolo3; AWS Rekognition; Azure Computer Vision
Online: 16 December 2020 (10:37:47 CET)
Computer vision is considered an ally for solving business problems that require human intervention, intelligence and judgement. This area of research has evolved rapidly in the twenty-first century, delivering many alternatives ranging from open-source tools to commercial platforms. With so many options and a growing market, it is difficult to decide which one to use, or, even worse, one may realize too late that it was not suited to a given scenario. In this paper we analyze five arbitrarily selected options, tested on a dataset of 755 images, for detecting persons in an image using object detectors. We analyze the elapsed time to process an image, error with respect to human observations, number of persons detected, correlation between time and person density, detected object size, and F1 score, considering precision and recall. Because we found score ties and similar behaviors among the available options, we introduce a novel index that takes into consideration the number of persons and their pixel size: the Vision Acuity Index of Computer Vision. The results demonstrate that it is a good indicator for making decisions. The proposed index also has the potential to be extended to different business use cases and to measure newly proposed algorithms in the future, alongside the traditional metrics used previously.
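The F1 score used to compare the detectors above is the harmonic mean of precision and recall; a minimal sketch from raw detection counts (the counts in the usage note are made up for illustration):

```python
def f1_score(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2.0 * precision * recall / (precision + recall)
```

For example, 80 correct detections with 20 false alarms and 20 misses give precision = recall = F1 = 0.8.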
ARTICLE | doi:10.20944/preprints202304.0133.v1
Subject: Engineering, Other Keywords: tactile sensing; vision-based tactile sensing; event-based vision; robotic manufacturing
Online: 10 April 2023 (03:06:15 CEST)
Vision-based tactile sensors (VBTS) have become the de facto method of giving robots the ability to obtain tactile feedback from their environment. Unlike other solutions to tactile sensing, VBTS offers high spatial resolution feedback without compromising on instrumentation costs or incurring additional maintenance expenses. However, conventional cameras used in VBTS have a fixed update rate and output redundant data, leading to computational overhead downstream. In this work, we present a neuromorphic vision-based tactile sensor (N-VBTS) that employs observations from an event-based camera for contact angle prediction. In particular, we design and develop a novel graph neural network, dubbed TactiGraph, that asynchronously operates on graphs constructed from raw N-VBTS streams, exploiting their spatiotemporal correlations to perform predictions. Although conventional VBTS requires an internal illumination source, TactiGraph performs efficiently both with and without one. Rigorous experiments revealed that TactiGraph achieved a mean absolute error of 0.62° in predicting the contact angle and was faster and more efficient than both conventional VBTS and other N-VBTS, with lower instrumentation costs. Specifically, N-VBTS requires only 5.5% of the compute time needed by VBTS when both are tested on the same scenario.
CASE REPORT | doi:10.20944/preprints202011.0397.v1
Online: 16 November 2020 (08:30:08 CET)
A 31-year-old male noticed blurred vision in his right eye for five days, with no obvious predisposing causes, accompanied by mild dizziness. No obvious nodular lesions were found on the body. The patient’s binocular visual acuity was 20/20. Fundus photography showed optic nerve swelling and radial superficial retinal hemorrhage in both eyes. Blood panel, urine routine, and liver and kidney function were all normal. Total cholesterol, triglycerides, high-density lipoprotein and low-density lipoprotein were all within normal limits. Head MRI showed a mass in the right temporal lobe with a clear boundary and multiple separations, which thinned and disappeared closer to the skull. The right temporal lobe and lateral ventricle were compressed, with the midline structure shifted to the left. The patient was then transferred to Neurosurgery. During the operation, we observed that the tumor had invaded the skull. The actual size of the tumor was 5.6 cm × 7.5 cm × 10.1 cm. Histology revealed foam cell accumulation in the mucous connective tissue of the right temporal lobe. Immunohistochemistry showed: CD34 (+), CD99 (+), EMA (−), GFAP (−), IDH-1 (−), Ki-67 (+) index about 10%, Olig-2 (−), PR (−), S-100 (−), Vim (+), β-Catenin (+), CD1a (−), CD68 (+). Three months after removal of the tumor, the visual acuity of both eyes was 20/20; the visual fields were normal, and the optic disc edema and retinal hemorrhages had disappeared. MRI indicated that the midline structure had returned to normal.
ARTICLE | doi:10.20944/preprints201912.0116.v1
Online: 9 December 2019 (04:05:48 CET)
Many accidents, such as those involving collisions or trips, appear to involve failures of vision; but the association between accident risk and vision, as conventionally assessed, is weak or absent. We addressed this conundrum by embracing the distinction, inspired by neuroscientific research, between vision for perception and vision for action. A dual-process perspective predicts that accident vulnerability will be associated more strongly with vision for action than with vision for perception. Older and younger adults, with relatively high and relatively low self-reported accident vulnerability (Accident Proneness Questionnaire), completed three behavioural assessments targeting: vision for perception (Freiburg Visual Acuity Test); vision for action (Vision for Action Test, VAT); and the ability to perform physical actions involving balance, walking and standing (Short Physical Performance Battery). Accident vulnerability was not associated with visual acuity or with performance of physical actions, but was associated with VAT performance. VAT assesses the ability to link visual input with a specific action: launching a saccadic eye movement as rapidly as possible in response to shapes presented in peripheral vision. The predictive relationship between VAT performance and accident vulnerability was independent of age, visual acuity and physical performance scores. Applied implications of these findings are considered.
ARTICLE | doi:10.20944/preprints202212.0221.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Computer vision; Deep learning; Image classification; Loss functions; Vision Transformers; Weather detection
Online: 13 December 2022 (02:30:49 CET)
There is great interest in automatically detecting road weather and understanding its impacts on the overall safety of the transport network. This can, for example, support road condition-based maintenance or even serve in detection systems that assist safe driving during adverse climate conditions. In computer vision, previous work has demonstrated the effectiveness of deep learning in predicting weather conditions from outdoor images. However, training deep learning models to accurately predict weather conditions using real-world road-facing images is difficult due to: (1) the simultaneous occurrence of multiple weather conditions; (2) the imbalanced occurrence of weather conditions throughout the year; and (3) road idiosyncrasies, such as road layouts, illumination, road objects, etc. In this paper, we explore the use of the focal loss function to force the learning process to focus on weather instances that are hard to learn, with the objective of addressing data imbalance. In addition, we explore the attention mechanism for pixel-based dynamic weight adjustment to handle road idiosyncrasies using state-of-the-art vision transformer models. Experiments with a novel multi-label road weather dataset show that focal loss significantly increases the accuracy of computer vision approaches for imbalanced weather conditions. Furthermore, vision transformers outperform current state-of-the-art convolutional neural networks in predicting weather conditions, with a validation accuracy of 92% and an F1-score of 81.22%, which is impressive considering the imbalanced nature of the dataset.
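The focal loss referred to above down-weights well-classified examples by a modulating factor (1 − p_t)^γ on top of cross-entropy; a minimal sketch of the standard formulation (the γ and α defaults are common choices, not values from the paper):

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for the probability the model assigns to the true class.

    Reduces to (alpha-weighted) cross-entropy when gamma == 0.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)
```

A confidently correct prediction (p_t = 0.9) contributes far less loss than a hard one (p_t = 0.1), which is what pushes training toward the rare weather classes.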
ARTICLE | doi:10.20944/preprints202301.0490.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: navigation; behavior; proprioception; pectines; vision
Online: 27 January 2023 (06:24:23 CET)
Many sand scorpions are faithful to the burrows they dig; however, it is unknown how these animals get back home after hunting excursions. Of the many homing mechanisms that exist, path integration (PI) is one of the more common tools used by arachnids. In PI, an animal integrates its distance and direction while leaving its home, enabling it to compute an approximate home-bound vector for the return trip. The objective of our study was to test whether scorpions use PI to return home in absolute darkness in the lab. We first allowed animals to establish burrows in homing arenas. Then, after a scorpion left its burrow, we recorded its location in the homing arena before transferring it to the center of a testing arena. We used overhead IR cameras to record its movements in the testing arena. If scorpions exhibited PI, we predicted they would follow a vector in the test arena approximating the same angle and distance as from the capture point to their burrow in the home arena. However, under the conditions of this experiment, we found no evidence that scorpions moved along such home-bound vectors. We speculate that scorpions may need a reliable reference cue to support path integration.
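Path integration as described above amounts to summing outbound displacement vectors and reversing the result; a minimal sketch (the function name and the (distance, heading) step encoding are illustrative assumptions):

```python
import math

def home_vector(steps):
    """Integrate outbound (distance, heading_radians) steps and return
    the home-bound vector as (distance, heading_radians)."""
    x = sum(d * math.cos(a) for d, a in steps)
    y = sum(d * math.sin(a) for d, a in steps)
    # The home-bound vector points opposite to the net displacement.
    return math.hypot(x, y), math.atan2(-y, -x)
```

For instance, an animal that walked 3 units east and then 4 units north would need to travel 5 units back along the reversed diagonal.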
ARTICLE | doi:10.20944/preprints202207.0021.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: artificial neural networks; biological neural networks; cortical prosthetic vision; machine vision; neuromorphic hardware; neuroprosthesis
Online: 1 July 2022 (17:01:32 CEST)
Sense element engagement theory explains how neural networks produce cortical prosthetic vision. A major prediction of the theory can be tested by developing a device which is expected to enable perception of continuous forms in altered visual geometries. The research reported here completes several essential steps in developing this device: (1) replication of simulations that are consistent with the theory using the NEST simulator, which can also be used for full-scale network emulation by a neuromorphic computer; (2) testing whether results consistent with the theory survive increasing the scale and duration of simulations; (3) establishing a method that uses numbers of spikes produced by network neurons to report the number of phosphenes produced by cortical stimulation; and (4) simulating essential functions of the prosthetic device. NEST simulations replicated early results and increasing their scale and duration produced results consistent with the theory. A decision function created using multinomial logistic regression correctly classified the expected number of phosphenes for 2080 spike number distributions for each of three sets of data, half of which arise from simulations expected to yield continuous visual forms on an altered visual geometry. A process for modulating electrical stimulation amplitude based on intermittent population recordings that is predicted to produce continuous visual forms was successfully simulated. The classification function developed using logistic regression will be used to tune this process as the scale of simulations is further increased.
ARTICLE | doi:10.20944/preprints202308.0068.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine Learning; Computer Vision; Automated measurement
Online: 1 August 2023 (10:20:49 CEST)
Regular inspections during construction work verify that the work completed is consistent with the plans and specifications and ensure that it is within the planned time and budget. This requires frequent physical site observations to independently measure and verify the completion percentage of construction progress over periods of time. Current computer vision-based (CV) techniques for measuring as-built elements predominantly use 3D laser scanning or 3D photogrammetry to determine the geometrical properties of as-built elements on construction sites. Both techniques require data acquisition from several positions and angles to generate sufficient information about an element’s coordinates, making their deployment on dynamic construction sites a challenging task. In this paper, we propose a pipeline for automating the measurement of as-built components using artificial intelligence (AI) and computer vision (CV) techniques. The pipeline requires a single image obtained with a stereo-camera system to measure the size of selected objects or as-built components. We demonstrate our approach by measuring the size of concrete walls and columns. The novelty of this work is a fully automated CV-based method for measuring any given element using only a single image. The proposed solution is suitable for measuring the sizes of as-built components of built assets and has the potential to be further developed and integrated with BIM models for progress monitoring on construction projects.
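The single-image stereo measurement rests on the standard pinhole stereo relations (depth Z = f·B/d, real width W = w_px·Z/f); a minimal sketch under that model (the function and the numbers in the usage note are illustrative, not from the paper's calibration):

```python
def object_width_m(disparity_px, object_width_px, focal_px, baseline_m):
    """Estimate real-world object width from a rectified stereo pair.

    disparity_px:    pixel disparity of the object between the two views
    object_width_px: object width in pixels in one image
    focal_px:        focal length in pixels
    baseline_m:      distance between the two camera centres in metres
    """
    depth_m = focal_px * baseline_m / disparity_px   # Z = f * B / d
    return object_width_px * depth_m / focal_px      # W = w_px * Z / f
```

With a 1000 px focal length, a 0.1 m baseline and a 20 px disparity, the object sits 5 m away, so a 200 px wide wall section measures 1 m.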
REVIEW | doi:10.20944/preprints202306.0360.v1
Subject: Biology And Life Sciences, Neuroscience And Neurology Keywords: eye movements; visual system; mouse vision
Online: 6 June 2023 (02:44:55 CEST)
The mouse visual system has recently become the most popular model for studying the cellular and circuit mechanisms of sensory processing. However, the importance of eye movements in mice has only recently begun to be appreciated. Eye movements provide a basis for active sensing and deliver insights into various brain functions and dysfunctions. A plethora of knowledge on the central control of eye movements and their role in perception and behaviour has arisen from work on primates. However, an overview of the known eye movement types in mice and a comparison with primates is missing. Here, we review the eye movement types described to date in mice and compare them to those observed in primates. We discuss the central neuronal mechanisms for their generation and control. Furthermore, we review the mounting literature on eye movements in mice during head-fixed and freely moving behaviours. Finally, we highlight gaps in our understanding and suggest future directions for research.
BRIEF REPORT | doi:10.20944/preprints202207.0419.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: computer vision; deep learning; CoughNet model
Online: 27 July 2022 (10:01:54 CEST)
This work addresses two key problems in identifying people infected with COVID-19: first, identification accuracy is not high enough; second, existing identification methods such as nucleic acid testing are expensive in many countries. Methods: I therefore designed a fast, deep-learning-based identification method for COVID-19 patients. After the model (CoughNet) learns from more than 6,000 cough spectrograms of both COVID-19 patients and healthy people, its accuracy in distinguishing COVID-19 patients from healthy people is higher than 99% on the test set. Structure: This paper is divided into three parts: the first introduces the background and current state of the research; the second introduces the research methods; and the third describes the experiment in detail.
ARTICLE | doi:10.20944/preprints202010.0167.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: daltonisation; colour vision deficiencies; anisotropic diffusion
Online: 8 October 2020 (09:33:55 CEST)
Daltonisation refers to the recolouring of images such that details normally lost by colour vision deficient observers become visible. This comes at the cost of introducing artificial colours. In a previous work, we presented a gradient-domain colour image daltonisation method that outperformed previously known methods both in behavioural and psychometric experiments. In the present paper, we improve the method by (i) finding a good first estimate of the daltonised image, thus reducing the computational time significantly, and (ii) introducing local linear anisotropic diffusion, thus effectively removing the halo artefacts. The method uses a colour vision deficiency simulation algorithm as an ingredient, and can thus be applied for any colour vision deficiency, and can even be individualised if the exact individual colour vision is known.
REVIEW | doi:10.20944/preprints202003.0076.v2
Subject: Biology And Life Sciences, Insect Science Keywords: retina; vision; ambystoma; salamander; mudpuppy; axolotl
Online: 19 April 2020 (08:06:38 CEST)
Salamanders have been habitual residents of research laboratories for more than a century, and their history in science is tightly interwoven with vision research. Nevertheless, many vision scientists – even those working with salamanders – may be unaware of how much our knowledge about vision, and particularly the retina, has been shaped by studying salamanders. In this review, we take a tour through the salamander history in vision science, highlighting the main contributions of salamanders to our understanding of the vertebrate retina. We further point out specificities of the salamander visual system and discuss the perspectives of this animal system for future vision research.
REVIEW | doi:10.20944/preprints201811.0498.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: color vision; cone photoreceptors; opponency; retina
Online: 20 November 2018 (11:14:49 CET)
Vertebrate colour vision is evolutionarily ancient. Jawless fish evolved four main spectral types of cone photoreceptor, almost certainly complemented by retinal circuits to process chromatic opponent signals. The subsequent evolution of photoreceptors and visual pigments is now documented for many vertebrate lineages and species, giving insight into evolutionary variation and ecological adaptation of colour vision. We look at the organization of the photoreceptor mosaic and the functions of different types of cone in teleost fish, primates, birds and reptiles. By comparison, less is known about the underlying neural processing. Here we outline the diversity of vertebrate colour vision and summarize our understanding of how the spectral information picked up by animal photoreceptor arrays is adapted to natural signals. We then turn to the question of how spectral information is processed in the retina. Here, the comparatively well-known and relatively ‘simple’ system of mammals such as mice and primates reveals some evolutionarily conserved features, such as the mammalian BlueON system, which compares short- and long-wavelength receptor signals. We then survey our current understanding of the more complex circuits of fish, amphibians, birds and reptiles. Together, these clades make up more than 90% of vertebrate species, yet we know disturbingly little about their neural circuits for colour vision beyond the photoreceptors. Long-standing work on goldfish, freshwater turtles and other species is being complemented by new insights gained from the experimentally amenable retina of zebrafish. From this body of work, one thing is clear: the retinal basis of colour vision in non-mammalian vertebrates is substantially richer than in mammals. Diverse and complex spectral tunings are established at the level of the cone output via horizontal cell feedforward circuits. From here, zebrafish use cone-selective wiring in bipolar cells to set up colour-opponent synaptic layers in the inner retina, which in turn lead to a large diversity of colour-opponent channels for transmission to the brain. However, while we are starting to build an understanding of the richness of spectral properties in some of these species’ retinal neurons, little is known about inner retinal connectivity and cell-type identity. To gain an understanding of their actual circuits, and thus to build a more generalised understanding of the vertebrate retinal basis of colour vision, it will be paramount to expand ongoing efforts to decipher the retinal circuits of non-mammalian models.
ARTICLE | doi:10.20944/preprints202206.0426.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: event-based vision; object detection and tracking; high-temporal resolution tracking; frame-based vision; hybrid approach
Online: 30 June 2022 (09:54:14 CEST)
Event-based vision is an emerging field of computer vision that offers unique properties such as asynchronous visual output, high temporal resolution, and dependence on brightness changes to generate data. These properties can enable robust high-temporal-resolution object detection and tracking when combined with frame-based vision. In this paper, we present a hybrid, high-temporal-resolution object detection and tracking approach that combines learned and classical methods using synchronized images and event data. Off-the-shelf frame-based object detectors are used for initial object detection and classification. Then, event masks, generated for each detection, are used to enable inter-frame tracking at varying temporal resolutions using the event data. Detections are associated across time using a simple low-cost association metric. Moreover, we collect and label a traffic dataset using the hybrid DAVIS 240c sensor. This dataset is used for quantitative evaluation with state-of-the-art detection and tracking metrics. We provide ground-truth bounding boxes and object IDs for each vehicle annotation. Further, we generate high-temporal-resolution ground-truth data to analyze tracking performance at different temporal rates. Our approach shows promising results, with minimal performance deterioration at higher temporal resolutions (48–384 Hz) compared with the baseline frame-based performance at 24 Hz.
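The abstract does not specify its "simple low-cost association metric"; one common choice for associating detections across frames is greedy bounding-box IoU matching, sketched here purely as an illustrative stand-in (function names, box format and threshold are assumptions, not the paper's method):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(prev, curr, thresh=0.3):
    """Greedily match current detections to previous ones by IoU.

    Returns a dict mapping current-frame index -> previous-frame index.
    """
    pairs = sorted(((box_iou(p, c), i, j)
                    for i, p in enumerate(prev)
                    for j, c in enumerate(curr)), reverse=True)
    used_p, used_c, match = set(), set(), {}
    for iou, i, j in pairs:
        if iou < thresh:
            break                      # remaining pairs overlap too little
        if i not in used_p and j not in used_c:
            match[j] = i
            used_p.add(i)
            used_c.add(j)
    return match
```

Unmatched current detections (indices absent from the returned dict) would start new tracks.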
ARTICLE | doi:10.20944/preprints202307.1090.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: computer vision; fertilizers; germination; morphometry; wheat; seedlings
Online: 17 July 2023 (10:52:36 CEST)
Image analysis is widely applied in plant science for phenotyping and monitoring botanic and agricultural species. Although a lot of software is available, tools that integrate image analysis with statistical assessment of seedling growth in large groups of plants are limited or absent and do not cover the needs of researchers. In this study, we developed Morley, a free, open-source graphical user interface written in Python. Morley automates the following workflow: (1) group-wise analysis of a few thousand seedlings from multiple images; (2) recognition of seeds, shoots, and roots in seedling images; (3) calculation of shoot and root lengths and surface areas; (4) evaluation of statistically significant differences between plant groups; (5) calculation of germination rates; and (6) visualization and interpretation. Morley is designed for laboratory studies of biotic effects on seedling growth, in which the molecular mechanisms underlying morphometric changes are analyzed. Performance was tested on cultivars of T. aestivum and P. sativum using seedlings up to one week old. The accuracy of the measured morphometric parameters was comparable with that obtained using ImageJ and manual measurements. Suggested applications include dose-dependent laboratory tests of germination affected by new bioactive compounds and fertilizers, assuming extraction of seedlings from a substrate and/or dissection.
ARTICLE | doi:10.20944/preprints202306.0033.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Computer vision; 3D human pose estimation; Transformer
Online: 1 June 2023 (05:00:33 CEST)
Existing methods for 3D human pose estimation mainly divide the task into two stages. The first stage identifies the 2D coordinates of the human joints in the input image. The second stage uses these 2D joint coordinates as input and recovers the depth information of the joints to achieve 3D human pose estimation. However, the recognition accuracy of the two-stage method relies heavily on the results of the first stage and includes many redundant processing steps, which reduces the inference efficiency of the network. To address these issues, we propose EDD, a fully End-to-end 3D human pose estimation method based on a transformer architecture with Dual Decoders. By learning multiple human poses, the model can directly infer all 3D human poses in the image using a pose decoder, and then further optimize the recognition result using a joint decoder based on the kinematic relations between joints. With the attention mechanism, the method can adaptively focus on the features most relevant to the target joint, effectively overcoming the feature misalignment problem in human pose estimation and greatly improving model performance. Complex post-processing steps, such as non-maximum suppression, are eliminated, further improving the efficiency of the model. The results show that the method achieves an accuracy of 87.4% on the MuPoTS-3D dataset, significantly improving the accuracy of end-to-end 3D human pose estimation methods.
REVIEW | doi:10.20944/preprints202305.2164.v1
Subject: Computer Science And Mathematics, Analysis Keywords: machine vision; pose measurement algorithms; accuracy; applications
Online: 31 May 2023 (03:33:34 CEST)
This review paper provides a comprehensive overview of machine vision pose measurement algorithms, focusing on the state of the art and its applications. The paper is structured as follows: Section 1 gives a brief introduction to the field of machine vision pose measurement; Section 2 describes the commonly used algorithms; Section 3 discusses the factors that affect their accuracy and reliability; Section 4 presents applications in various fields, with specific examples of how machine vision pose measurement is used in each; and Section 5 summarizes the paper and outlines future research directions. The review highlights the need for more robust and accurate algorithms that can handle varying lighting conditions and occlusion, and suggests that integrating machine learning techniques may improve the performance of pose measurement algorithms. Overall, it provides a valuable resource for researchers and practitioners working in the field of computer vision.
ARTICLE | doi:10.20944/preprints202303.0345.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: neural architecture search; machine learning; computer vision
Online: 29 March 2023 (02:14:14 CEST)
Existing one-shot neural architecture search (NAS) methods have to conduct a search over a giant super-net, which leads to a huge computational cost. To reduce this cost, we propose in this paper a method, called FTSO, that divides the whole architecture search into two sub-steps. Specifically, in the first step we search only for the topology, and in the second step we search only for the operators. FTSO not only reduces NAS's search time from days to 0.68 seconds, but also significantly improves the accuracy of the found architectures. Our extensive experiments on ImageNet show that within 18 seconds FTSO can achieve 76.4% testing accuracy, 1.5% higher than the SOTA, PC-DARTS. In addition, when searching on CIFAR10, FTSO can reach 97.77% testing accuracy, 0.27% higher than the SOTA, with nearly 100% (99.8%) of the search time saved.
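As a toy illustration of why decomposing the search helps (this is not FTSO's actual algorithm; the edge set, operator list, and scoring function are stand-ins for a trained evaluation), searching topology first and operators second replaces one search over |topologies| × |ops|^|edges| architectures with two much smaller ones:

```python
from itertools import combinations, product

EDGES = list(range(6))                     # candidate edges in a small cell
OPS = ["conv3x3", "conv5x5", "skip", "pool"]

def score(topology, ops):
    # Stand-in for validation accuracy; a real NAS trains/evaluates here.
    return len(topology) - 0.1 * ops.count("pool")

# Step 1: search the topology only, with a fixed default operator on each edge.
best_topo = max((frozenset(c) for k in range(1, 4)
                 for c in combinations(EDGES, k)),
                key=lambda t: score(t, ["conv3x3"] * len(t)))

# Step 2: search the operators only, on the fixed best topology.
best_ops = max(product(OPS, repeat=len(best_topo)),
               key=lambda o: score(best_topo, list(o)))
```

With 41 candidate topologies and 4 operators here, the joint space contains 1,544 architectures, while the two-step search evaluates only 41 + 64 candidates; the same arithmetic is what makes the decomposition attractive at super-net scale.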
ARTICLE | doi:10.20944/preprints202303.0161.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deepfake detection; deep learning; computer vision; generalization
Online: 9 March 2023 (02:13:46 CET)
The increasing use of deep learning techniques to manipulate images and videos, commonly referred to as "deepfakes," is making it more and more challenging to differentiate between real and fake content. While various deepfake detection systems have been developed, they often struggle to detect deepfakes in real-world situations. In particular, these methods are often unable to distinguish images or videos modified using novel techniques that were not represented in the training set. In this study, we analyze different deep learning architectures in an attempt to understand which is more capable of generalizing the concept of a deepfake. According to our results, Convolutional Neural Networks (CNNs) seem to be more capable of storing specific anomalies and thus excel on datasets with a limited number of elements and manipulation methodologies. The Vision Transformer, conversely, is more effective when trained on more varied datasets, achieving greater generalization capability than the other methods analyzed. Finally, the Swin Transformer appears to be a good alternative for an attention-based method in a more limited data regime. Each of the analyzed architectures seems to look at deepfakes in a different way, but since generalization capability is essential in a real-world environment, the experiments carried out suggest that the Vision Transformer provides superior performance.
ARTICLE | doi:10.20944/preprints202302.0097.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Vision loss; Diabetic retinopathy; Image enhancement; APTOS
Online: 6 February 2023 (09:50:58 CET)
Vision loss can be avoided if diabetic retinopathy (DR) is diagnosed and treated promptly. The five main DR stages are: none, mild, moderate, severe, and proliferative. In this study, a deep learning (DL) model is presented that diagnoses all five stages of DR with more accuracy than previous methods. The suggested method considers two scenarios: case 1 with image enhancement, using a contrast-limited adaptive histogram equalization (CLAHE) filtering algorithm in conjunction with an Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), and case 2 without image enhancement; augmentation techniques are then performed to generate a balanced dataset utilizing the same parameters for both cases. Using Inception-V3 applied to the Asia Pacific Tele-Ophthalmology Society (APTOS) dataset, the developed model achieved an accuracy of 98.7% for case 1 and 80.87% for case 2, which is greater than existing methods for detecting the five stages of DR. It was demonstrated that using CLAHE and ESRGAN improves a model's performance and learning ability.
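CLAHE's core idea, clipping the histogram before equalizing so that contrast amplification is bounded, can be sketched for a single tile in plain NumPy. This illustrates the principle only; OpenCV's `cv2.createCLAHE` adds the tile grid and bilinear interpolation that the "adaptive" in the name refers to:

```python
import numpy as np

def clipped_equalize(img, clip_limit=40):
    """Histogram equalization with a clipped histogram (single-tile CLAHE sketch)."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess // 256   # redistribute clipped mass
    cdf = hist.cumsum()
    lut = np.round(255 * cdf / cdf[-1]).astype(np.uint8)  # map intensities via the CDF
    return lut[img]
```

Bounding each histogram bin at `clip_limit` caps the slope of the mapping, which is what keeps CLAHE from over-amplifying noise in near-uniform fundus regions.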
ARTICLE | doi:10.20944/preprints202007.0326.v1
Subject: Engineering, Control And Systems Engineering Keywords: mobile robot; vision-based navigation; cascade classifiers
Online: 15 July 2020 (09:16:44 CEST)
This work presents the development and implementation of a distributed navigation system based on computer vision. The autonomous system consists of a wheeled mobile robot with an integrated colour camera. The robot navigates through a laboratory scenario in which the track and several traffic signals must be detected and recognized using the images acquired with its on-board camera. The images are sent to a computer server that processes them and calculates the corresponding speeds of the robot using a cascade of trained classifiers. These speeds are sent back to the robot, which acts to carry out the corresponding manoeuvre. The classifier cascade must be trained before experimentation with two sets of positive and negative images, and the number of images in each set should be chosen to limit training time and avoid overtraining the system.
ARTICLE | doi:10.3390/sci2010008
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: numismatics; Roman; Rome; deep learning; computer vision
Online: 2 March 2020 (00:00:00 CET)
In recent years, a range of problems under the broad umbrella of computer vision-based analysis of ancient coins have been attracting an increasing amount of attention. Notwithstanding this research effort, the results achieved by the state of the art in the published literature remain poor and far from sufficient for any practical purpose. In the present paper we present a series of contributions which we believe will benefit the interested community. We explain that the approach of visual matching of coins, universally adopted in existing published papers on the topic, is not of practical interest because the number of ancient coin types far exceeds the number of types which have been imaged, be it in digital form (e.g., online) or otherwise (traditional film, in print, etc.). Rather, we argue that the focus should be on understanding the semantic content of coins. Hence, we describe a novel approach: first extract semantic concepts from real-world multimodal input and associate them with their corresponding coin images, and then train a convolutional neural network to learn the appearance of these concepts. On a real-world data set, we demonstrate highly promising results, correctly identifying a range of visual elements on unseen coins with up to 84% accuracy.
CONCEPT PAPER | doi:10.20944/preprints201910.0059.v1
Subject: Medicine And Pharmacology, Ophthalmology Keywords: myopia progression; environmental factors; vision care knowledge
Online: 7 October 2019 (10:55:03 CEST)
Importance: Because of the high prevalence of myopia in Taiwan, understanding the risk factors for its development and progression is important to public health. Background: This study investigated the risk factors for myopia and their influence on the progression of myopia in schoolchildren in Taiwan. Design: Patients’ clinical records were obtained retrospectively from ophthalmologists. Questionnaires were given to collect demographic information, family background, hours spent on daily activities, myopia progression, and treatment methods. Participants: A total of 522 schoolchildren with myopia from a regional medical hospital in northern Taiwan participated in the study. Written informed consent was obtained from the participants of legal age or from the parents or legal guardians. Methods: Multivariable regression analyses were performed. Myopia measured in dioptres was analysed, controlling for patients’ family and demographic information as well as their daily behaviours. Main Outcome Results: Children with highly myopic parents were more myopic. Earlier onset age of myopia was associated with a higher level of myopia and greater annual myopic progression. Children reporting more near-work activities had higher levels of myopia and greater progression of myopia. Lower levels of myopia were associated with more exercise, longer periods of sleep, and better vision care knowledge in children and parents. Intake of food supplements had no effect on myopia. Conclusions and Relevance: In addition to genetics, education, environment, and near-work activity can influence the development of myopia. Health policies for schoolchildren should promote protective activities and vision care knowledge in order to protect the eyesight of schoolchildren.
ARTICLE | doi:10.20944/preprints201906.0105.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: image analysis; machine learning; algorithms; computer vision
Online: 12 June 2019 (12:39:18 CEST)
Spike shape and morphometric characteristics are among the key characteristics of cultivated cereals associated with their productivity. Identification of the genes controlling these traits requires morphometric data at harvest and the analysis of numerous plants, which could be done automatically using digital image analysis technologies. A method for wheat spike morphometry utilizing 2D image analysis is proposed. Digital images are acquired in two variants: a spike on a table (one projection) or fixed with a clip (four projections). The method identifies the spike and awns in the image and estimates their quantitative characteristics (area in image, length, width, circularity, etc.). A section model, quadrilaterals, and a radial model are proposed for describing spike shape. The parameters of these models are used to predict spike shape type (spelt, normal, or compact) by machine learning. The mean error in spike density prediction for images in one projection is 4.61 (~18%), versus 3.33 (~13%) for parameters obtained using four projections.
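Among the quantitative characteristics listed, circularity is conventionally defined as 4πA/P², equal to 1 for a perfect circle and smaller for elongated shapes; a minimal sketch (the example area and perimeter values are hypothetical, not from the paper):

```python
import math

def circularity(area, perimeter):
    """Shape descriptor 4*pi*A/P^2: 1 for a circle, < 1 for anything else."""
    return 4 * math.pi * area / perimeter ** 2

# Circle of radius r: A = pi*r^2, P = 2*pi*r  ->  circularity 1
r = 5.0
print(circularity(math.pi * r ** 2, 2 * math.pi * r))   # ≈ 1.0

# Elongated rectangle (a spelt-like spike outline): much lower circularity
w, h = 2.0, 20.0
print(circularity(w * h, 2 * (w + h)))
```

On binary masks, `area` and `perimeter` would come from pixel counting and boundary tracing respectively; the descriptor itself is independent of how they are measured.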
ARTICLE | doi:10.20944/preprints201812.0232.v1
Subject: Engineering, Control And Systems Engineering Keywords: Computer vision; Data Augmentation; Fine-Tuning; Imagenet
Online: 19 December 2018 (07:57:03 CET)
In this paper, we leverage state-of-the-art models pre-trained on the ImageNet dataset. We use the pre-trained models and their learned weights to extract features from the dog breed identification dataset. Afterwards, we apply fine-tuning and data augmentation to increase test accuracy in the classification of dog breeds. The performance of the proposed approaches is compared across state-of-the-art ImageNet models: ResNet-50, DenseNet-121, DenseNet-169, and GoogLeNet. We achieved 89.66%, 85.37%, 84.01%, and 82.08% test accuracy, respectively, which shows the superior performance of the proposed method over previous works on the Stanford dog breeds dataset.
ARTICLE | doi:10.20944/preprints201806.0449.v1
Subject: Engineering, Control And Systems Engineering Keywords: surface electromyography; computer vision; grasping; assistive robotics
Online: 27 June 2018 (15:01:06 CEST)
This paper presents a system that merges computer vision and surface electromyography techniques to carry out grasping tasks. The vision-driven system computes pre-grasping poses of the robotic system based on the analysis of three-dimensional object features. Then, the human operator can correct the pre-grasping pose of the robot using surface electromyographic signals from the forearm during wrist flexion and extension. Weak wrist flexions and extensions allow fine adjustment of the robotic system to grasp the object; finally, when the operator considers that the grasping position is optimal, a strong flexion is performed to initiate the grasping of the object. The system has been tested with several subjects to check its performance, showing a grasping accuracy of around 95% of attempted grasps, which increases by around 9% the grasping accuracy of previous experiments in which electromyographic control was not implemented.
ARTICLE | doi:10.20944/preprints201805.0297.v1
Subject: Engineering, Civil Engineering Keywords: bridge maintenance and inspection; UAVs; machine vision
Online: 22 May 2018 (10:09:37 CEST)
The economic development and infrastructure of a nation are closely interrelated. In addition, public trust in national infrastructure facilities is closely linked to the preservation of the advantages these facilities provide to the public. Since the 1970s, Korea has achieved exponential economic growth over a short period of time, and the number of infrastructure facilities has increased correspondingly. This compressed economic development has been underpinned by the national infrastructure, whose safety and usability have been excluded from the scope of the development. However, after around 30 years, structural deterioration coupled with general insensitivity to safety in today’s society has considerably reduced public trust in using the infrastructure. Realistically, policies that mainly focus on developing new technologies related to infrastructure construction have led to practical limitations that discourage the development of technologies for maintenance and inspection. Furthermore, current maintenance work faces limitations caused by various factors: insufficient budget, an increasing number of infrastructure facilities requiring maintenance, a shortage of manpower, and a rapidly increasing number of aging facilities. To overcome these limitations, a new approach is required that differs from general inspection methods under the existing rules and regulations. In this context, this study aimed to explore the efficiency of bridge inspection and maintenance by unmanned aerial vehicles (UAVs), which can observe inaccessible areas, can be conveniently and easily controlled, and offer high economic benefits. To this end, various tests were performed on elevated bridges, and suitable UAV images were obtained. The obtained UAV images were inspected using machine vision technology, thereby excluding subjective evaluations by humans. Methods for enhancing the objectivity of the inspection were also discussed. The test results showed that both the efficiency and objectivity of the proposed method were better than those of existing bridge maintenance and inspection methods.
TECHNICAL NOTE | doi:10.20944/preprints202209.0380.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: UAV Technology; Information Processing; Machine Learning; IR Technology; Covid-19; Multi-modal machine learning; Machine vision; Computer vision
Online: 26 September 2022 (05:46:03 CEST)
Tracking and early identification of suspected cases are essential to control and prevent potential COVID-19 outbreaks. One of the most popular techniques used to track this disease is the use of infrared cameras to identify individuals with elevated body temperatures. However, these cameras cannot easily be deployed in open public settings such as public parks or outdoor recreational centers, which limits their ability to track possible COVID-19 patients, since parks, concert venues, and other public venues are hotspots for the spread of the virus. Other technological solutions, such as thermal scanners, require an individual to perform the actual testing, as they are not standalone technologies; this mode of testing can itself cause transmission of the virus between the tester and the individual being tested, so an alternative solution is needed. In this study, we present the system design and potential scope of a non-invasive system (COVIDRONE) that can identify potential COVID-19 patients using thermal and optical images of the individual acquired with drone technology. The proposed system combines multi-modal machine intelligence, computer vision, and real-time monitoring to enable scalable monitoring, and uses machine learning algorithms for better and more accurate diagnosis. We envisage that the development of such technologies may help in building technological solutions to combat infectious disease threats in future pandemics.
ARTICLE | doi:10.20944/preprints202307.1125.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Mexican sign language; Dataset; Hand-gestures; Computer-vision
Online: 17 July 2023 (16:17:50 CEST)
In Mexico, the incorporation of deaf people into education has been lacking: only 14% of the deaf population aged between 3 and 29 years access education with the support of a hearing aid. Additionally, those who have been incorporated frequently face inappropriate educational strategies that poorly develop the use of Mexican Sign Language (MSL); therefore, academic success and opportunities for insertion into the workplace are difficult to attain. This research presents a novel Mexican Sign Language lexicon video dataset containing the dynamic gestures most frequently used in MSL. Each gesture consists of a set of different versions of videos recorded under uncontrolled conditions. The MX-ITESO-100 dataset comprises a lexicon of 100 gestures and 5,000 videos from three participants with different grammatical elements. Additionally, the dataset is evaluated with a two-step neural network model, achieving an accuracy greater than 99%, and thus serves as a benchmark for future training of machine learning models in computer vision systems. Finally, this research promotes a more inclusive environment within society and organizations, in particular for people with hearing impairment.
REVIEW | doi:10.20944/preprints202307.0271.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: Mice Behavior Analysis; Mice Model; AI; Computer Vision
Online: 5 July 2023 (07:19:18 CEST)
Mice are one of the most frequently used animal models in scientific research; their behavioral characteristics can provide much valuable information in biology, neuroscience, and pharmacology. Nowadays, artificial intelligence is widely used in mouse behavior analysis. Integrated AI systems such as ChatGPT and VisualGPT are already available, and we discuss the feasibility of a MiceGPT to help researchers identify and classify mouse behavior more easily. We review the applications of mouse behavior analysis, analyze the deep learning tasks involved in these applications based on an AI pyramid, and finally summarize the AI approaches for solving these tasks. Based on these summaries, we propose three MiceGPT architectures to demonstrate the theoretical feasibility of MiceGPT.
ARTICLE | doi:10.20944/preprints202304.0066.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Waste Classification; Deep Learning; Waste Management; Computer Vision
Online: 5 April 2023 (15:25:05 CEST)
Computer vision methods have been shown to be effective in classifying garbage into recycling categories for waste processing, but existing methods are costly, imprecise, and unclear. To tackle this issue, we introduce MWaste, a mobile application that uses computer vision and deep learning techniques to classify waste materials as trash, plastic, paper, metal, glass, or cardboard. Its effectiveness was tested on various neural network architectures and real-world images, achieving an average precision of 92% on the test set. This app can help combat climate change by enabling efficient waste processing and reducing the generation of greenhouse gases caused by incorrect waste disposal.
ARTICLE | doi:10.20944/preprints202302.0218.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: diabetic retinopathy; Vision loss; Deep learning; CLAHE; ESRGAN
Online: 13 February 2023 (14:33:05 CET)
When diabetic retinopathy (DR) is detected and treated promptly, vision loss can often be prevented. This study deploys a deep learning (DL) model that can detect all five stages of DR more accurately than other methods. The proposed methodology covers two scenarios: case 1 with image enhancement using CLAHE and ESRGAN, and case 2 without image enhancement. Augmentation techniques are then employed to produce a balanced dataset with identical criteria for both scenarios. The model, built using DenseNet-121 on the APTOS dataset, outperformed other approaches for identifying the five stages of DR, with an accuracy of 98.7 percent for case 1 and 81.2 percent for case 2. Using CLAHE and ESRGAN was shown to improve a model's performance and ability to learn.
ARTICLE | doi:10.20944/preprints202206.0148.v1
Subject: Arts And Humanities, Architecture Keywords: false-class inclusions; serendipity; machine vision; creativity; innovativeness
Online: 10 June 2022 (04:35:14 CEST)
In the mid-layers of Deep Learning systems, clustered features tend to fit multiple classifications, which are filtered out during the final stages of object recognition. However, many misclassifications remain and are regarded as errors of the system. This paper claims that tagging an entity incorrectly for reasons of similarity is evidence of spontaneous machine creativity. According to the ratings of 40 design educators and researchers, AI-generated false-class inclusions produced creative design ideas, predicting their level of innovation value. The raters were not laypeople: they came from a design school in Asia ranked among the top worldwide. They took part in an experiment in which 20 classification mistakes were framed as early-design ideas that were either human-made or intentionally suggested by a creative AI. Many examples passed the Feigenbaum variant of the Turing test, with a conceptual preference for creations supposedly made by human hand.
ARTICLE | doi:10.3390/sci2010013
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; computer vision; Cycle-GAN; image reconstruction
Online: 12 March 2020 (00:00:00 CET)
In this paper, our goal is to perform a virtual restoration of an ancient coin from its image. The present work is the first to propose this problem, and it is motivated by two key promising applications. The first emerges from the recently recognised dependence of automatic image-based coin type matching on the condition of the imaged coins; the algorithm introduced herein could be used as a pre-processing step aimed at overcoming this weakness. The second application concerns the utility, both to professional and hobby numismatists, of being able to visualise and study an ancient coin in a state closer to its original (minted) appearance. To address the problem at hand, we introduce a framework comprising a deep learning based method using Generative Adversarial Networks, capable of learning the range of appearance variation of different semantic elements artistically depicted on coins, and a complementary algorithm used to collect, correctly label, and prepare for processing a large number of images (here 100,000) of ancient coins needed to facilitate the training of the aforementioned learning method. Empirical evaluation performed on a withheld subset of the data demonstrates extremely promising performance: our algorithm correctly learns the spectra of appearance variation across different semantic elements and, despite the enormous variability present, reconstructs the missing (damaged) detail while matching the surrounding semantic content and artistic style.
Subject: Social Sciences, Cognitive Science Keywords: eyetracking, eye movements, gaze, memory, retrieval, vision, aging
Online: 20 May 2019 (12:25:44 CEST)
Eye movements support memory encoding by binding distinct elements of the visual world into coherent representations. However, the role of eye movements in memory retrieval is less clear. We propose that eye movements play a functional role in retrieval by reinstating the encoding context. By overtly shifting attention in a manner that broadly recapitulates the spatial locations and temporal order of encoded content, eye movements facilitate access to, and reactivation of, associated details. Such mnemonic gaze reinstatement may be obligatorily recruited when task demands exceed cognitive resources, as is often observed in older adults. We review research linking gaze reinstatement to retrieval, describe the neural integration between the oculomotor and memory systems, and discuss implications for models of oculomotor control, memory, and aging.
ARTICLE | doi:10.20944/preprints201711.0021.v1
Subject: Engineering, Control And Systems Engineering Keywords: calibration; binocular vision sensor; unknown-sized elliptical stripe
Online: 2 November 2017 (17:37:06 CET)
Most of the existing calibration methods for a binocular stereo vision sensor (BSVS) depend on a high-accuracy target with feature points, which is difficult to manufacture and costly. In complex light conditions, optical filters are used for BSVS, but they affect imaging quality; hence, using a high-accuracy target with certain-sized feature points for calibration is not feasible under such conditions. To solve these problems, a calibration method based on unknown-sized elliptical stripe images is proposed. With known intrinsic parameters, the proposed method adopts elliptical stripes located on parallel planes as a medium to calibrate the BSVS online. In comparison with common calibration methods, the proposed method avoids utilizing a high-accuracy target with certain-sized feature points; it is therefore not only easy to implement but also a realistic method for the calibration of a BSVS with an optical filter. Changing the size of the elliptical curves projected on the target overcomes the difficulty of applying the method at different fields of view and distances. Simulation and physical experiments are conducted to validate the efficiency of the proposed method. When the field of view is approximately 400 mm × 300 mm, the proposed method can reach a calibration accuracy of 0.03 mm, which is comparable with that of Zhang’s method.
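Once a BSVS is calibrated, the depth of a matched feature in a rectified pair follows from the standard triangulation relation z = f·b/d, with focal length f, baseline b, and disparity d; a minimal sketch, with all numeric values hypothetical:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Rectified-stereo triangulation: z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_px * baseline_m / disparity_px

# e.g. a 1000 px focal length, 0.1 m baseline, and 20 px disparity give 5 m
```

The relation also shows why calibration accuracy matters: errors in f or b propagate linearly into every reconstructed depth.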
ARTICLE | doi:10.20944/preprints201608.0186.v1
Subject: Computer Science And Mathematics, Geometry And Topology Keywords: active vision; the conformal camera; the Riemann sphere; Möbius geometry; complex projective geometry; projective Fourier transform; retinotopy; binocular vision; horopter
Online: 20 August 2016 (11:24:25 CEST)
Primate vision is an active process that constructs a stable internal representation of the 3D world based on 2D sensory inputs that are inherently unstable due to incessant eye movements. We present here a mathematical framework for processing visual information for a biologically-mediated active vision stereo system with asymmetric conformal cameras. This model utilizes the geometric analysis on the Riemann sphere developed in the group-theoretic framework of the conformal camera, thus far only applicable in modeling monocular vision. The asymmetric conformal camera model constructed here includes the fovea’s asymmetric displacement on the retina and the eye’s natural crystalline lens tilt and decentration, as observed in ophthalmological diagnostics. We extend the group-theoretic framework underlying the conformal camera to the stereo system with asymmetric conformal cameras. Our numerical simulation shows that the theoretical horopter curves in this stereo system are conics that well approximate the empirical longitudinal horopters of the primate vision system.
ARTICLE | doi:10.20944/preprints202309.1150.v1
Subject: Computer Science And Mathematics, Robotics Keywords: Orchard robot; Autonomous navigation; Positional parameters; Machine vision; YOLO
Online: 19 September 2023 (04:00:26 CEST)
The relative position of an orchard robot to the rows of fruit trees is an important parameter for achieving autonomous navigation. Current methods for estimating the inter-row position parameters of orchard robots achieve low accuracy; to address this problem, this paper proposes a machine vision-based method for detecting the relative position of an orchard robot and fruit tree rows. Firstly, fruit tree trunks are identified with an improved YOLOv4 model; secondly, the camera coordinates of each trunk are calculated from the principle of binocular triangulation, and the ground projection coordinates of the trunks are obtained through coordinate conversion; finally, the midpoints of the projection coordinates on opposite sides are combined, the navigation path is obtained by least-squares linear fitting, and the position parameters of the orchard robot are computed. The experimental results show that the average precision and average recall of the improved YOLOv4 model for trunk detection are 97.05% and 95.42%, respectively, which are 5.92 and 7.91 percentage points higher than those of the original YOLOv4 model. The average errors of the heading angle and lateral deviation estimates obtained with this method are 0.57° and 0.02 m. The method can accurately calculate heading angle and lateral deviation at different positions between rows and can serve as a reference for autonomous visual navigation of orchard robots.
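The triangulation and path-fitting steps this abstract outlines can be sketched as follows. This is a minimal illustration assuming a rectified pinhole stereo pair; the function names, parameters, and toy geometry are our own, not taken from the paper:

```python
import numpy as np

def triangulate_ground(u_left, u_right, f, baseline, cx):
    """Binocular triangulation: recover the camera-frame (X, Z) of a trunk
    from its pixel column in the left/right images (rectified pinhole model)."""
    disparity = u_left - u_right
    Z = f * baseline / disparity          # depth from disparity
    X = Z * (u_left - cx) / f             # lateral offset
    return X, Z

def fit_navigation_path(left_row, right_row):
    """Midpoints of opposing trunk projections, then a least-squares line
    X = a*Z + b: heading angle is atan(a), lateral deviation is X at Z = 0."""
    mids = [((xl + xr) / 2, (zl + zr) / 2)
            for (xl, zl), (xr, zr) in zip(left_row, right_row)]
    X = np.array([m[0] for m in mids])
    Z = np.array([m[1] for m in mids])
    a, b = np.polyfit(Z, X, 1)            # least-squares linear fit
    return np.degrees(np.arctan(a)), b
```

A robot centered between two straight rows should recover zero heading angle and zero lateral deviation from the fitted midline.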
REVIEW | doi:10.20944/preprints202308.1210.v1
Subject: Biology And Life Sciences, Neuroscience And Neurology Keywords: Vision; Information Theory; Neural Computation; Drosophila; Cognition; Compound Eye
Online: 16 August 2023 (13:51:01 CEST)
The traditional understanding of brain function has predominantly focused on chemical and electrical processes. However, new research on fruit fly (Drosophila) binocular vision reveals that ultrafast photomechanical photoreceptor movements significantly enhance information processing, thereby impacting a fly's perception of its environment and behaviour. The coding advantages resulting from these mechanical processes suggest that similar physical motion-based coding strategies may affect neural communication ubiquitously. The theory of neural morphodynamics proposes that rapid biomechanical movements and microstructural changes at the level of neurons and synapses enhance the speed and efficiency of sensory information processing, intrinsic thoughts, and actions by regulating neural information in a phasic manner. We propose that morphodynamic information processing evolved to drive predictive coding, synchronising cognitive processes across neural networks to effectively match the behavioural demands at hand.
COMMUNICATION | doi:10.20944/preprints202308.0424.v1
Subject: Medicine And Pharmacology, Otolaryngology Keywords: barbed pharyngoplasty; lighting system; surgical vision; oral cavity; Klaro
Online: 4 August 2023 (10:18:08 CEST)
Obstructive Sleep Apnea (OSA) surgery is now a viable solution in selected patients, and “remodelling” palatopharyngeal surgery is the most common procedure. It has recently become less invasive with the introduction of Barbed Sutures (BS). An optimization of the surgical technique is represented by Barbed Pharyngoplasty (BP), which requires surgical precision and efficient, precise oropharyngeal visualization. Consequently, the lighting system is of pivotal importance in BP. The aim of this work is to describe the first experience with a new lighting system, called Klaro™, in BP for OSA. We evaluated the Klaro™ system in 15 consecutive BPs for OSA in comparison with conventional headlamp illumination. The visualization of the palatopharyngeal muscle at the bottom of the tonsillar fossa, of the needle entry and exit points, and of the needle tip was statistically better with Klaro™ than with headlamp illumination for both surgeon and resident (p<0.05). No significant differences were reported for the visualization of the posterior pharyngeal wall and uvula. The Klaro™ lighting system allows satisfactory illumination of the oral cavity and oropharynx in the majority of cases. We encourage the use of Klaro™ not only in BP for OSA but also in all oral and pharyngeal surgeries, including tonsillectomy and oncological surgery.
ARTICLE | doi:10.20944/preprints202307.0719.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Scene matching; EPnP; Vision navigation; Positioning solution; Measurement models
Online: 11 July 2023 (11:35:38 CEST)
To solve the problem of computing an aircraft’s visual navigation fix from scene matching, this paper adopts spherical EPnP positioning-pose solving, determines a central-angle threshold below which a planar calculation is admissible, and constructs a measurement model. The detailed steps are as follows: firstly, a positioning coordinate model of the Earth’s surface is constructed, the expression for the three-dimensional surface coordinates is established, and the pose is solved with the EPnP algorithm. Secondly, by contrasting and analyzing the pose values obtained from approximate plane coordinates, the critical value below which a plane calculation is acceptable is acquired. Lastly, a theoretical measurement model for the visual height and central angle is constructed using the determined central-angle threshold. Simulation experiments show that the average positioning precision with spherical coordinates as input is 16.42% higher than with plane coordinates as input. When the central angle is less than 0.05 degrees and the surface district is less than 5585 square meters, the positioning precision of the plane coordinates is nearly equal to that of the spherical coordinates; at this scale, the sphere can be treated as flat. These conclusions can theoretically guide further study of pose estimation for scene matching and are of significance for both theoretical research and engineering application.
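The intuition behind the central-angle threshold can be sketched by comparing the spherical (arc) and planar (chord) distances a given central angle subtends on the Earth's surface. This is our own simplification using a mean Earth radius, not the paper's full measurement model:

```python
import math

R_EARTH = 6371000.0  # assumed mean Earth radius in metres

def arc_length(theta_deg):
    """Great-circle (spherical) distance subtended by a central angle."""
    return R_EARTH * math.radians(theta_deg)

def chord_length(theta_deg):
    """Straight-line (planar) distance for the same central angle."""
    return 2.0 * R_EARTH * math.sin(math.radians(theta_deg) / 2.0)

def planar_error(theta_deg):
    """Relative error introduced by treating the surface patch as flat."""
    arc = arc_length(theta_deg)
    return (arc - chord_length(theta_deg)) / arc
```

At a central angle of 0.05 degrees the relative arc-vs-chord discrepancy is on the order of 10⁻⁸, which illustrates why a planar approximation becomes acceptable below a small threshold while the error grows rapidly for larger angles.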
ARTICLE | doi:10.20944/preprints202307.0655.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: global sustainability; sustainable projects; quality of flows; architectural vision
Online: 11 July 2023 (04:41:21 CEST)
Space syntax can potentially be applied to evaluate the quality of user flows, particularly in solving problems of sustainable projects in spaces intended for users to move around. This study aims to analyze the concepts of pedestrian flows (open and closed) through a space syntax-based bibliographical approach on a global scale, demonstrating the capability for improvements in the Sustainable Development Goals (SDGs) as applied to the architecture of sustainable flows. Scopus theoretical reference bases were used, which are directly related to the theme of space syntax in open and closed spaces. Frequency analyses were carried out, applying content analysis, to identify words with a degree of similarity related to “space syntax: flow in urban environments” and “space syntax in closed built systems”, in relation to the SDGs. The results show that the pedestrian flows identified in the literature help to understand the global production on space syntax in open and closed spaces directed at user flows in the built environment, where many of the analyzed flow environments become unsustainable because they do not present full flow efficiency. In our study on space syntax, the following central terms were identified: pedestrian movements (open urban systems) and space (closed built systems), which allows a better understanding of the flows and highlights the importance of the urbanist architect for the functionality of user flows in sustainable architectural projects on a global scale. The most frequent categories were the terms space and form for open environments and space and flow for closed environments.
ARTICLE | doi:10.20944/preprints202305.0466.v1
Subject: Engineering, Mechanical Engineering Keywords: soft actuator; modular design; machine vision; flexible clamping technology
Online: 8 May 2023 (08:21:53 CEST)
Research on soft robots still faces many unresolved problems in material selection, structural design and manufacture, and drive control, so conducting related research is crucial. Soft manipulators, a subset of soft robots composed of flexible materials, are now a popular area of study for many researchers. In comparison with typical manipulators, soft manipulators feature a high degree of gripping flexibility and a simple morphological structure. They have a wide range of potential applications in healthcare, rehabilitation, bionics, and detection, and can compensate for the drawbacks of rigid manipulators in some use scenarios. In this work, a torsional and gripping actuator is conceived and constructed, its performance is examined, and a modular soft-body torsional gripping system is developed. In this system, the torsion actuator and the grasping actuator can be combined in a modular fashion. With the help of RGB-D vision algorithms, this multi-modular setup makes it possible to combine soft actuators with various twisting degrees and achieve exact gripping. Through pneumatic control, the target object is precisely grasped and rotated at various angles, enabling rotation of the target object in three dimensions.
ARTICLE | doi:10.20944/preprints202305.0379.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: point cloud classification; deep learning; computer vision; scene understanding
Online: 6 May 2023 (05:19:09 CEST)
The point cloud is a form of three-dimensional data that comprises various detailed features at multiple scales. Due to this characteristic and its irregularity, point cloud analysis based on deep learning is challenging. While previous works utilize the sampling-grouping operation of PointNet++ for feature description and then explore geometry by means of sophisticated feature extractors or deep networks, such operations fail to describe multi-scale features effectively. Additionally, these techniques have led to performance saturation, and standard MLPs struggle to directly "mine" point cloud geometry. To address these problems, we propose the Detail Activation (DA) module, which encodes data based on the Fourier transform after sampling and grouping. We activate the channels at different frequency levels from low to high in the DA module to gradually recover finer point cloud details. As training progresses, the proposed Point-MDA can progressively uncover local and global geometries of the point cloud. Our experiments show that Point-MDA achieves superior classification accuracy, outperforming PointNet++ by 3.3% and 7.9% in overall accuracy on the ModelNet40 and ScanObjectNN datasets, respectively. Furthermore, it accomplishes this without employing complicated operations, while exploring the full potential of PointNet++.
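The low-to-high frequency activation idea can be sketched with a simple Fourier-domain mask over grouped features. This is our own illustration of the general mechanism, not the paper's DA module; the function name, masking scheme, and shapes are assumptions:

```python
import numpy as np

def detail_activation(features, level, num_levels):
    """Sketch of frequency-level activation: transform each point's feature
    vector with the FFT, zero out frequency bins above the current level,
    and transform back, so finer detail is recovered as `level` grows."""
    spec = np.fft.rfft(features, axis=-1)              # per-point spectrum
    n_bins = spec.shape[-1]
    cutoff = int(np.ceil(n_bins * (level + 1) / num_levels))
    mask = np.zeros(n_bins)
    mask[:cutoff] = 1.0                                # activate low -> high
    return np.fft.irfft(spec * mask, n=features.shape[-1], axis=-1)
```

At the lowest level only coarse structure survives; at the final level the full feature set is reconstructed, mimicking the progressive recovery of detail described above.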
ARTICLE | doi:10.20944/preprints202303.0221.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: polyp segmentation; computer vision; ensemble; transformers; convolutional neural networks
Online: 13 March 2023 (07:31:25 CET)
In the realm of computer vision, semantic segmentation is the task of recognizing objects in images at the pixel level. This is done by performing a classification of each pixel. The task is complex and requires sophisticated skills and knowledge about the context to identify objects’ boundaries. The importance of semantic segmentation in many domains is undisputed. In medical diagnostics, it simplifies the early detection of pathologies, thus mitigating the possible consequences. In this work, we provide a review of the literature on deep ensemble learning models for polyp segmentation and we develop new ensembles based on convolutional neural networks and transformers. The development of an effective ensemble entails ensuring diversity between its components. To this end, we combine different models (HarDNet-MSEG, Polyp-PVT, and HSNet) trained with different data augmentation techniques, optimization methods, and learning rates, which we experimentally demonstrate to be useful to form a better ensemble. Most importantly, we introduce a new method to obtain the segmentation mask which is more suitable for combining transformers in an ensemble. In our extensive experimental evaluation, the proposed ensembles exhibit state-of-the-art performance.
ARTICLE | doi:10.20944/preprints202210.0366.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: skin segmentation; skin detection; computer vision; digital image processing
Online: 24 October 2022 (12:50:24 CEST)
ARTICLE | doi:10.20944/preprints202204.0177.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Plant disease; Machine vision; UAV; Smartphone; Convolutional Neural Network
Online: 19 April 2022 (07:44:29 CEST)
Stripe rust (caused by Puccinia striiformis f. sp. tritici) is one of the most devastating diseases of wheat and causes large-scale epidemics and severe yield loss. Applying fungicides during early epidemic development is crucial to controlling the disease but is often challenged by resource-limited human visual scouting. Deep learning has the potential to process images and videos captured from affordable devices to empower high-throughput phenotyping for early detection of stripe rust, enabling timely application of fungicides and improved control efficiency. Here, we developed RustNet, a neural network-based image classifier, for efficiently monitoring fields for stripe rust. RustNet was built on a ResNet-18 architecture pre-trained on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) dataset using transfer learning. RGB images and videos of multiple wheat fields with different wheat types (winter and spring wheat), conditions (irrigated and non-irrigated), and locations were acquired using smartphones or unmanned aerial vehicles near the canopy. A semi-automated image labeling approach was conducted to improve labeling efficiency by combining automated machine labeling and human correction. Cross-validations across multiple categories (sensor platforms, wheat types, and locations) achieved Area Under the Curve values from 0.72 to 0.87. Independent validation on a published dataset from Germany achieved accuracies ranging from 0.79 to 0.86. The visualization of the last convolutional layer of RustNet demonstrated the identification of pixels with stripe rust. RustNet is freely available at https://zzlab.net/RustNet.
ARTICLE | doi:10.20944/preprints202112.0349.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: yak; semantic segmentation; binocular vision; body size; weight estimation
Online: 9 March 2022 (10:02:00 CET)
To address the labor-intensive and time-consuming process of measuring yak body size and weight in the yak breeding industry of Qinghai Province, this study proposes a non-contact measurement method and investigates key technologies based on semantic segmentation, binocular ranging, and neural network algorithms to boost the development of the industry. The main conclusions are: (1) A yak foreground image extraction model based on the U-net algorithm was implemented; experiments on 2263 yak images verify that the model's accuracy in yak image extraction exceeds 97%. (2) An algorithm for estimating yak body measurements based on binocular vision was developed, which extracts the relevant measurement points and combines them with the depth image. The final test shows that the average estimation error of body height and body oblique length is 2.6%, and the average estimation error of chest depth is 5.94%. (3) A yak weight prediction model was studied that estimates weight from the body height, body oblique length, and chest depth obtained by binocular vision; two algorithms were used to build the prediction model, with verified average weight estimation errors of 10.7% and 13.01%, respectively.
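The final step, predicting weight from the three body measurements, can be sketched with an ordinary least-squares model. This is a simplified stand-in for the paper's two prediction algorithms; the coefficients are learned from synthetic data, not taken from the study:

```python
import numpy as np

def fit_weight_model(measurements, weights):
    """Least-squares linear model:
    weight ≈ w0 + w·[body_height, body_oblique_length, chest_depth]."""
    X = np.hstack([np.ones((len(measurements), 1)), np.asarray(measurements)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(weights, dtype=float), rcond=None)
    return coef

def predict_weight(coef, measurement):
    """Apply the fitted model to one [height, oblique_length, chest_depth]."""
    return coef[0] + float(np.dot(coef[1:], measurement))
```

In practice the paper reports ~10-13% average error, so any such model should be validated against ground-truth scale weights before field use.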
ARTICLE | doi:10.20944/preprints202111.0182.v1
Subject: Computer Science And Mathematics, Robotics Keywords: AHRS; Computer Vision; Dataset Acquisition; Deep Learning; Orientation Estimation.
Online: 9 November 2021 (14:35:21 CET)
The use of Attitude and Heading Reference Systems (AHRS) for orientation estimation is now common practice in a wide range of applications, e.g., robotics and human motion tracking, aerial vehicles and aerospace, gaming and virtual reality, indoor pedestrian navigation and maritime navigation. The integration of the high-rate measurements can provide very accurate estimates, but these can suffer from error accumulation due to sensor drift over longer time scales. To overcome this issue, inertial sensors are typically combined with additional sensors and techniques. For example, camera-based solutions have drawn large attention from the community thanks to their low cost and easy hardware setup; moreover, impressive results have been demonstrated in the context of Deep Learning. This work presents the preliminary results obtained by DOES, a supportive Deep Learning method specifically designed for maritime navigation, which aims at improving the roll and pitch estimations obtained by common AHRS. DOES recovers these estimations through the analysis of frames acquired by a low-cost camera pointed at the sea horizon. The training has been performed on the novel ROPIS dataset, presented in the context of this work and acquired using the FrameWO application developed for the purpose. The promising results encourage testing other network backbones and further expanding the dataset, improving the accuracy of the results and the range of applications of the method.
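The geometric relationship DOES exploits can be sketched without the network: under a pinhole camera pointed at a distant sea horizon, the horizon line's tilt encodes camera roll and its vertical offset from the principal row encodes pitch. This is our own baseline illustration, not the paper's learned estimator:

```python
import math

def horizon_to_roll_pitch(x1, y1, x2, y2, cy, focal_px):
    """Given two image points (x1,y1)-(x2,y2) on the detected horizon,
    the principal row cy, and the focal length in pixels, return roll and
    pitch in degrees (distant-horizon, pinhole-camera assumptions)."""
    roll = math.degrees(math.atan2(y2 - y1, x2 - x1))       # line tilt -> roll
    y_mid = (y1 + y2) / 2.0
    pitch = math.degrees(math.atan2(y_mid - cy, focal_px))  # offset -> pitch
    return roll, pitch
```

A learning-based method like DOES can be seen as replacing the explicit horizon-line detection in this sketch with features learned from the ROPIS frames.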
ARTICLE | doi:10.20944/preprints202010.0009.v1
Subject: Social Sciences, Psychology Keywords: visual search; vision loss; incidental learning; macular degeneration; fovea
Online: 1 October 2020 (09:12:00 CEST)
Foveal vision loss has been shown to reduce efficient visual search guidance due to contextual cueing by incidentally learned contexts. However, previous studies used artificial (T among L-shapes) search paradigms that prevent the memorization of a target in a semantically meaningful scene. Here, we investigated contextual cueing in real-life scenes that allow explicit memory of target locations in semantically rich scenes. In contrast to the contextual cueing deficits in artificial scenes, contextual cueing in patients with age-related macular degeneration (AMD) did not differ from that in age-matched normal-sighted controls. We discuss this in the context of visuospatial working memory demands, for which both eye-movement control in the presence of central vision loss and memory-guided search may compete. Memory-guided search in semantically rich scenes may depend less on visuospatial working memory than search in abstract displays, potentially explaining intact contextual cueing in the former but not the latter. In a practical sense, our findings may indicate that patients with AMD are less deficient than expected from previous lab experiments. This shows the usefulness of realistic stimuli in experimental clinical research.
ARTICLE | doi:10.20944/preprints202009.0022.v1
Subject: Engineering, Control And Systems Engineering Keywords: Artificial neural network; image processing; machine vision; yield monitoring
Online: 2 September 2020 (03:21:02 CEST)
Precision agriculture is a technology used by farmers to help food sustainability amid a growing population. One of its tools is yield monitoring, which helps farmers manage production. Yield monitoring is usually done during harvest; however, it can also be done early in the growing season. Early prediction of yield, specifically for fruit trees, aids farmers in marketing their product and assists in managing production logistics such as labor requirements and storage needs. In this study, a machine vision system is developed to estimate fruit yield early in the season. The system uses a color camera to capture images of fruit trees during the full bloom period. An image segmentation algorithm based on an artificial neural network was developed to recognize and count the blossoms on a tree, using color information and pixel position as input. The resulting correlation between the blossom count and the actual number of fruits on the tree shows the potential of this method for early prediction of fruit yield.
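A pixel classifier on color-plus-position features, as described above, can be sketched with a minimal logistic-regression model. This is a stand-in for the paper's artificial neural network; the feature layout [R, G, B, row, col] (all normalized to [0, 1]) and training constants are our own assumptions:

```python
import numpy as np

def train_pixel_classifier(features, labels, lr=0.5, epochs=500):
    """Logistic-regression pixel classifier on [R, G, B, row, col] inputs
    (blossom = 1, background = 0), trained by batch gradient descent."""
    X = np.hstack([np.ones((len(features), 1)), features])  # add bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))                    # sigmoid
        w -= lr * X.T @ (p - labels) / len(labels)          # gradient step
    return w

def classify_pixels(w, features):
    """Label each pixel as blossom (True) or background (False)."""
    X = np.hstack([np.ones((len(features), 1)), features])
    return (1.0 / (1.0 + np.exp(-X @ w))) > 0.5
```

Counting the `True` pixels (or connected components of them) then yields the blossom count that is correlated with final fruit yield.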
ARTICLE | doi:10.20944/preprints202006.0170.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: object detection; semantic segmentation; computer vision; automatic check-out
Online: 14 June 2020 (12:51:26 CEST)
Automatic check-out has received increasing attention in recent years; such a system automatically generates a shopping bill by identifying a picture of the products purchased by a customer. However, the system is challenged by a domain adaptation problem: each image in the training set contains only one commodity, whereas the test set is a collection of multiple commodities. The existing solution to this problem is to resynthesize the training images to enhance the training set, rendering the composite images with CycleGAN to make the image distributions of the training and test sets more similar. However, we find that the detection boxes given by the ground truth of the common dataset contain a large background area, which affects the training process as noise. To solve this problem, we propose a mask data priming method. Specifically, we rework the large-scale Retail Product Checkout (RPC) dataset, adding pixel-level segmentation annotations to each item in the training set images based on the original dataset. Secondly, a new network structure is proposed in which the detector and counter are trained jointly, and the detection network is fine-tuned by filtering suitable images from the test set. Experiments on the RPC dataset show that our method yields better results: our approach reaches 81.87% compared with 56.68% for the baseline, which demonstrates that pixel-level information helps to improve the detection results of the network.
ARTICLE | doi:10.20944/preprints201905.0243.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Machine Vision; Morphological image filtering; Galvanic Industry; Rear-projection.
Online: 20 May 2019 (11:46:34 CEST)
In the fashion field, the use of electroplated small metal parts such as studs, clips, and buckles is widespread. The plating is often made of a precious metal, such as gold or platinum. Due to the high cost of these materials, it is strategically relevant and of primary importance for manufacturers to avoid any waste by depositing only the strictly necessary amount of material. To this aim, companies need to know the overall number of items to be electroplated so that the parameters driving the galvanic process can be set properly. Accordingly, the present paper describes a Machine Vision-based method able to automatically count small metal parts arranged on a galvanic frame. The devised method relies on the definition of a proper acquisition system and on the development of image processing-based routines. The system is implemented on a counting machine that is meant to be adopted in galvanic industrial practice to define a suitable set of working parameters (such as current, voltage, and deposition time) for the electroplating machine and, thereby, to assure the desired plating thickness on the one hand and to avoid material waste on the other.
ARTICLE | doi:10.20944/preprints201904.0175.v1
Subject: Engineering, Automotive Engineering Keywords: IMU; vision; classification networks; hough transform; lane markings detection
Online: 15 April 2019 (13:13:19 CEST)
Achieving robust lane detection from a single frame is challenging in complicated scenarios. In order to detect more credible lane markings using sequential frames, a novel approach to fusing vision and an Inertial Measurement Unit (IMU) is proposed in this paper. The Hough space is employed as the space in which lane markings are stored, and it is computed in three steps. Firstly, a basic Hough space is extracted by the Hough Transform and primary line segments are extracted from it. Secondly, to measure the likelihood that line segments belong to lane markings, a CNN-based classifier is introduced to transform the basic Hough space into a probabilistic space using the network's outputs. However, this probabilistic Hough space based on a single frame is easily disturbed. In the third step, a filtering process is employed to smooth the probabilistic Hough space using sequential information; pose information provided by the IMU is applied to align Hough spaces extracted at different times with each other. The final Hough space is used to eliminate line segments with low probability and to output those with high confidence as the result. Experiments demonstrate that the proposed approach achieves good performance.
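The basic Hough space used in the first step can be sketched as a (rho, theta) voting accumulator over edge points. This is a generic textbook implementation for illustration, not the paper's code; resolutions and bounds are hand-picked:

```python
import numpy as np

def hough_accumulator(points, rho_res=1.0, theta_bins=180, rho_max=200):
    """Basic Hough space: each edge point (x, y) votes for every line
    x*cos(theta) + y*sin(theta) = rho passing through it."""
    thetas = np.linspace(0, np.pi, theta_bins, endpoint=False)
    n_rho = int(2 * rho_max / rho_res)
    acc = np.zeros((n_rho, theta_bins), dtype=np.int32)
    for x, y in points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = ((rhos + rho_max) / rho_res).astype(int)  # shift into bin range
        valid = (idx >= 0) & (idx < n_rho)
        acc[idx[valid], np.arange(theta_bins)[valid]] += 1
    return acc, thetas

def strongest_line(acc, thetas, rho_res=1.0, rho_max=200):
    """Return (rho, theta) of the accumulator's highest-voted cell."""
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return i * rho_res - rho_max, thetas[j]
```

In the paper's pipeline this raw vote count is then replaced by CNN-derived probabilities and temporally smoothed using IMU-aligned spaces from previous frames.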
ARTICLE | doi:10.20944/preprints201903.0091.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: image analysis; pattern recognition; algorithms; computer vision; wheat spike
Online: 7 March 2019 (12:06:15 CET)
Spike shape and morphometric characteristics are among the key characteristics of cultivated cereals associated with their productivity. Identification of the genes controlling these traits requires morphometric data harvesting and analysis for numerous plants, which is automatable using technologies of digital image analysis. A method for wheat spike morphometry utilizing 2D image analysis is proposed. Digital images are acquired in two variants: a spike on a table (one projection) or fixed with a clip (four projections). The method identifies the spike and awns in the image and estimates their quantitative characteristics (area in image, length, width, circularity, etc.). Section, quadrilateral, and radial models are proposed for describing spike shape. Parameters of these models are used to predict spike shape type (spelt, normal, or compact) by machine learning. The mean error in spike density prediction for the images in one projection is 4.61 (~18%) versus 3.33 (~13%) for the parameters obtained using four projections. The F1 measure in automated spike classification into three types is 0.78 using logistic regression (one projection) and 0.85 using the random forest method (four projections). The proposed method is implemented in Java; examples of images and a user guide are available at http://wheatdb.org/werecognizer.
REVIEW | doi:10.20944/preprints202309.1939.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: HAR; Human Activity Recognition; Feature extraction; Machine Learning; Computer Vision
Online: 28 September 2023 (11:13:40 CEST)
Human Action Recognition (HAR) is widely used in multiple fields to recognize activities and extract spatial and temporal information. This paper analyzes various methods and provides extensive knowledge of the foundational concepts of HAR. Because a dataset is also crucial for any research study, we discuss the popular datasets and their features in this paper.
ARTICLE | doi:10.20944/preprints202308.1330.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Plant recognition; Image Processing; Convolution neural network; Vision transformer; Classification
Online: 18 August 2023 (08:28:28 CEST)
Identification of plants is a challenging task that aims to determine the family, genus, and species level according to morphological features. Automated deep learning-based computer vision algorithms are widely used for identifying plants and can help users to narrow down the possibilities. However, numerous morphological similarities between and within species make the classification difficult. In this paper, we tested custom convolutional neural network (CNN) and vision transformer (ViT) based models using the PyTorch framework to classify plants. We used large datasets of 88K and 16K images for classifying plants at the genus and species levels, respectively. Our results show that for classifying plants at the genus level, ViT models perform better than the CNN-based models ResNet50 and ResNet-RS-420, as well as other state-of-the-art CNN-based models suggested in previous studies on a similar dataset. The ViT model achieved a top accuracy of 83.3% for classifying plants at the genus level. ViT models also perform better for classifying plants at the species level than the CNN-based models ResNet50 and ResNet-RS-420, with a top accuracy of 92.5%. We show that the correct set of augmentation techniques plays an important role in classification success.
ARTICLE | doi:10.20944/preprints202308.0373.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: ReID; Pyramid Vision Transformer; local feature clustering; side information embeddings
Online: 4 August 2023 (07:26:19 CEST)
Due to the influence of background conditions, lighting conditions, occlusion, and image resolution, extracting robust person features is one of the difficulties in ReID research. The Vision Transformer (ViT) has achieved significant results in the field of computer vision. However, slow extraction of person features and difficulty in utilizing people's local features still limit its application in ReID. To solve these problems, we utilize the Pyramid Vision Transformer (PVT) as the feature extraction backbone and propose a PVT-based ReID method in conjunction with other studies. Firstly, some improvements suitable for ReID are applied to the PVT backbone, and we establish a basic model using powerful methods verified on CNN-based ReID. Secondly, to further promote the robustness of the person features extracted by the PVT backbone, two new modules are designed. (1) Local feature clustering (LFC) is proposed to enhance the robustness of person features by calculating the distance between local features and the global feature to select the most discrete local features and cluster them. (2) Side information embeddings (SIE) are used to encode non-visual information and feed it into the network for training to reduce its impact on person features. Finally, the experiments show that PVTReID achieves excellent results on ReID datasets and is on average 20% faster than CNN-based ReID methods.
ARTICLE | doi:10.20944/preprints202306.1733.v1
Subject: Biology And Life Sciences, Insect Science Keywords: Hemiptera; opsin; gene loss; color vision; compensatory neofunctionalization; tuning site
Online: 26 June 2023 (04:26:48 CEST)
Expanding previous efforts to survey the visual opsin repertoires of the Hemiptera, this study confirms that homologs of the UV- and LW-opsin subfamilies are conserved in all Hemiptera, while the B-opsin subfamily is missing from the Heteroptera and subgroups of the Sternorrhyncha and Auchenorrhyncha, i.e. aphids (Aphidoidea) and planthoppers (Fulgoroidea), respectively. Unlike in the Heteroptera, which are characterized by multiple expansions of the LW-opsin subfamily, the lack of B-opsin correlates with the presence of tandem-duplicated UV-opsins in aphids and planthoppers. Available data on organismal wavelength sensitivities and retinal gene expression patterns lead to the conclusion that, in both groups, one UV-opsin paralog shifted from ancestral UV peak sensitivity to derived blue sensitivity, thereby compensating for the lost B-opsin. Two parallel bona fide tuning site substitutions compare to 18 non-corresponding amino acid replacements in the blue-shifted UV-opsin paralogs of aphids and planthoppers. Most notably, while the aphid blue-shifted UV-opsin clade is characterized by a replacement substitution at one of the best-documented UV/blue tuning sites (Rhodopsin site 90), the planthopper blue-shifted UV-opsin paralogs retained the ancestral lysine at this position. The combined findings identify aphid and planthopper UV-opsins as a new valuable data sample for studying adaptive opsin evolution.
ARTICLE | doi:10.20944/preprints202306.0787.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Low-light Image Enhancement; Unsupervised Learning; Physics-inspired Computer Vision
Online: 12 June 2023 (07:31:34 CEST)
With the advent of deep learning, significant progress has been made in low-light image enhancement methods. However, deep learning requires enormous amounts of paired training data, which are challenging to capture in real-world scenarios. To address this limitation, this paper presents a novel unsupervised low-light image enhancement method, which is the first to introduce the frequency-domain features of images into low-light image enhancement tasks. Our work is inspired by imagining a digital image as a spatially varying metaphoric “field of light”, then subjecting the influence of physical processes such as diffraction and coherent detection back onto the original image space via a frequency-domain to spatial-domain transformation (inverse Fourier transform). However, the mathematical model created by this physical process still requires complex manual tuning of its parameters for different scene conditions to achieve the best adjustment. Therefore, we propose a dual-branch convolutional network to estimate pixel-wise and high-order spatial interactions for dynamic range adjustment of the frequency features of a given low-light image. Guided by the frequency features from the “field of light” and the parameter-estimation networks, our method enables dynamic enhancement of low-light images. Extensive experiments show that our method performs well compared to state-of-the-art unsupervised methods, and its performance approaches that of state-of-the-art supervised methods both qualitatively and quantitatively. At the same time, the lightweight network design gives the proposed method an extremely fast inference speed (nearly 150 FPS on an NVIDIA 3090 Ti GPU for an image of size 600×400×3). Furthermore, the potential benefits of our method to object detection in the dark are discussed.
ARTICLE | doi:10.20944/preprints202305.2127.v1
Subject: Social Sciences, Ethnic And Cultural Studies Keywords: Hispanic; Familism; Vision Impairment; Hearing Impairment; Social Isolation; Cognitive Functioning
Online: 30 May 2023 (11:28:27 CEST)
Objectives: Understanding the intersection of age, ethnicity, and disability will become increasingly important as the global population ages and becomes more diverse. By 2060, Hispanics will comprise 28% of the U.S. population. This study examines critical associations between sensory impairment, social isolation, and cognitive functioning among Hispanic older adults. Methods: Our sample consisted of 557 Hispanic older adults who participated in Rounds 1-3 or Rounds 5-7 of the National Health and Aging Trends Study. Longitudinal mediation models across a three-year span were estimated using Mplus, with vision, hearing, and dual sensory impairment predicting cognitive functioning directly and indirectly through social isolation. Results: Findings indicated that cognitive functioning was concurrently and, in certain cases, longitudinally predicted by vision and dual sensory impairment, and by social isolation. Contrary to expectations, vision and hearing impairment were not predictive of social isolation. Dual sensory impairment was associated with social isolation, yet no significant indirect associations were found for sensory impairments predicting cognitive functioning through social isolation. Discussion: The finding that social isolation did not mediate the relationship between sensory impairment and cognitive decline among Hispanic older adults in the U.S. is contrary to findings from other studies that were not specifically focused on this population. This finding may be evidence that culturally motivated family support and intergenerational living buffer the impact of sensory impairments in later life. Findings suggest that Hispanic older adults experiencing dual sensory impairments may benefit from interventions that foster social support and include family members.
ARTICLE | doi:10.20944/preprints202304.0401.v1
Subject: Engineering, Marine Engineering Keywords: traffic safety; offshore wind farms; YOLOv3; stereo vision; deep learning
Online: 17 April 2023 (04:38:08 CEST)
Newly built offshore wind farms (OWFs) introduce a collision risk between ships and installations. This paper proposes a real-time traffic monitoring method based on machine vision and deep learning technology to improve the efficiency and accuracy of traffic monitoring in the vicinity of offshore wind farms. Specifically, the method employs real automatic identification system (AIS) data to train a machine vision model, which is then used to identify passing ships in OWF waters. Furthermore, the system utilizes stereo vision techniques to track and locate the positions of passing ships. The method was tested in offshore waters in China to validate its reliability. The results show that the system sensitively detects the dynamic information of passing ships, such as the distance between ships and OWFs, ship speed, and course. Overall, this study provides a novel approach to enhancing the safety of OWFs, which is increasingly important as the number of such installations continues to grow. By employing advanced machine vision and deep learning techniques, the proposed monitoring system offers an effective means of improving the accuracy and efficiency of ship monitoring in challenging offshore environments.
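The stereo vision step recovers ship distance from image disparity. For a calibrated, rectified stereo rig, depth follows the standard relation Z = f·B/d (focal length in pixels, baseline in metres, disparity in pixels); a minimal sketch with illustrative parameter names, not tied to the paper's actual rig:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

For example, a 700 px focal length and 0.5 m baseline with a 7 px disparity put the target 50 m away; halving the disparity doubles the estimated range, which is why distant ships need a wide baseline.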
ARTICLE | doi:10.20944/preprints202202.0204.v1
Subject: Medicine And Pharmacology, Pharmacy Keywords: computer vision; image processing; medication adherence; object detection; pill detection
Online: 17 February 2022 (08:45:14 CET)
Objective tools to track medication adherence are lacking. We developed a tool to monitor pill intake that can be implemented in mHealth apps without the need for additional devices. The proposed pill intake detection tool uses digital image processing to analyze images of a blister pack and detect the presence of pills. The tool uses the circular Hough transform as a feature extraction technique and is therefore primarily useful for detecting pills with a round shape. The tool operates in two steps: first, the registration of a full blister and storage of reference values in a local database; second, the detection and classification of taken and remaining pills in similar blisters, to determine the actual number of untaken pills. In the registration of round pills in full blisters, 100% of pills in gray blisters or blisters with a transparent cover were successfully detected. In counting untaken pills in partially opened blisters, 95.2% of remaining and 95.1% of taken pills were detected in gray blisters, while 88.2% of remaining and 80.8% of taken pills were detected in blisters with a transparent cover. The proposed tool provides promising results for the detection of round pills. However, the classification of taken and remaining pills needs to be further improved, in particular for pills with non-round shapes.
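The circular Hough transform at the heart of the tool votes for candidate circle centres from edge points. A heavily simplified, fixed-radius sketch of that voting step is shown below; a production tool would use a full multi-radius implementation such as OpenCV's `HoughCircles`, and the function here is only an illustration of the principle.

```python
import math

def hough_circle_votes(edge_points, radius, width, height):
    """Accumulate centre votes for circles of a fixed radius from
    edge points (simplified circular Hough transform). Each edge
    point votes for every centre that would place it on the circle."""
    acc = {}
    for (x, y) in edge_points:
        for deg in range(0, 360, 10):  # coarse angular sampling
            t = math.radians(deg)
            cx = round(x - radius * math.cos(t))
            cy = round(y - radius * math.sin(t))
            if 0 <= cx < width and 0 <= cy < height:
                acc[(cx, cy)] = acc.get((cx, cy), 0) + 1
    # The accumulator cell with the most votes is the best centre.
    return max(acc, key=acc.get)
```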
ARTICLE | doi:10.20944/preprints202110.0020.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: journalism; artificial intelligence; computer science; machine learning; computer vision; NLP.
Online: 21 December 2021 (14:01:16 CET)
In recent years, news media has been greatly disrupted by the potential of technologically driven approaches in the creation, production, and distribution of news products and services. Artificial intelligence (AI) has emerged from the realm of science fiction and has become a very real tool that can aid society in addressing many issues, including the challenges faced by the news industry. The ubiquity of computing has become apparent and has demonstrated the different approaches that can be achieved using AI. We analyzed the news industry’s AI adoption based on the seven subfields of AI: (i) machine learning; (ii) computer vision (CV); (iii) speech recognition; (iv) natural language processing (NLP); (v) planning, scheduling, and optimization; (vi) expert systems; and (vii) robotics. Our findings suggest that three subfields are being developed more in the news media: machine learning, computer vision, and planning, scheduling, and optimization. Other areas have not been fully deployed in the journalistic field. Most AI news projects rely on funds from tech companies such as Google. This limits AI’s potential to a small number of players in the news industry. We conclude by providing examples of how these subfields are being developed in journalism and present an agenda for future research.
ARTICLE | doi:10.20944/preprints202105.0047.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Textual analysis; Media; Correspondence analysis; Wavelet thresholding; KSA-2030 Vision
Online: 5 May 2021 (12:28:53 CEST)
In the present paper, we propose a wavelet method to study the impact of electronic media on economic conditions, applying wavelet techniques alongside classical methods to analyze economic indices in the market. The technique consists of first filtering the data to remove imprecise observations (noise) and constructing a wavelet-denoised contingency table. Next, a thresholding procedure is applied to this table to extract the essential carriers of information. The resulting tables, before and after thresholding, are finally subjected to correspondence analysis. As a case study, we empirically examine the 2030 KSA Vision in electronic and social media, studying the effects of electronic media texts about the 2030 Vision on the Saudi and global economies. Recall that the Saudi market is the most important representative market in the GCC region; it has both regional and worldwide influence on economies and is characterized by many political, economic, and financial initiatives such as the worldwide economic NEOM project. The findings of the present paper may be applied to predict future GCC market conditions and may thus inform investors’ decisions in such markets.
Subject: Medicine And Pharmacology, Ophthalmology Keywords: visual cortical prosthesis; brain-machine interface; electrical stimulation; prosthetic vision
Online: 23 March 2021 (10:42:30 CET)
Electrical stimulation of the visual cortices has the potential to restore vision to blind individuals. Until now, the results of visual cortical prosthetics have been limited, as no prosthesis has restored fully functional vision, but the field has seen renewed interest in recent years thanks to wireless and technological advances. However, several scientific and technical challenges must still be overcome to achieve the therapeutic benefit expected of these new devices. One of the main challenges is the electrical stimulation of the brain itself. In this review, we analyze the results of electrode-based visual cortical prosthetics from the electrical point of view. We first briefly describe what is known about the electrode-tissue interface and the safety of electrical stimulation. We then focus on the psychophysics of prosthetic vision and the state of the art on the interplay between electrical stimulation of the visual cortex and phosphene perception. Lastly, we discuss the challenges and perspectives of visual cortex electrical stimulation and electrode array design for developing the next generation of implantable cortical visual prostheses.
ARTICLE | doi:10.20944/preprints202102.0146.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: SO2 emissions; computer vision; time-averaged dispersion model; CrIS; JPSS
Online: 4 February 2021 (21:53:35 CET)
Long-term continuous time series of SO2 emissions are considered critical elements of both volcano monitoring and basic research into processes within magmatic systems. One highly successful framework for computing these fluxes involves reconstructing a representative time-averaged SO2 plume from which to estimate the SO2 source flux. Previous methods within this framework have used ancillary wind datasets from reanalysis or numerical weather prediction (NWP) to construct the mean plume and then again as a constrained parameter in the fitting. Additionally, traditional SO2 datasets from ultraviolet (UV) sensors lack altitude information, which must be assumed in order to correctly calibrate the SO2 data and to capture the appropriate NWP wind level; this assumption can be a significant source of error. We have made novel modifications to this framework which do not rely on prior knowledge of the winds and therefore do not inherit errors associated with NWP winds. To perform the plume rotation, we modify a rudimentary computer vision algorithm designed for object detection in medical imaging to detect plume-like objects in gridded SO2 data. We then fit a solution to the general time-averaged dispersion of SO2 from a point source. We demonstrate these techniques using SO2 data generated by a newly developed probabilistic layer height and column loading algorithm designed for the Cross-track Infrared Sounder (CrIS), a hyperspectral infrared sensor aboard the Joint Polar Satellite System’s Suomi-NPP and NOAA-20 satellites. This SO2 data source is best suited to flux estimates at high-latitude volcanoes and at low-latitude but high-altitude volcanoes. Of particular importance, IR SO2 data can fill an important data gap in the UV-based record: estimating SO2 emissions from high-latitude volcanoes through the polar winters, when there is insufficient solar backscatter for UV sensors to be used.
ARTICLE | doi:10.20944/preprints202008.0336.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image processing; image classification; computer vision; expert systems; amber gemstones
Online: 15 August 2020 (04:39:11 CEST)
The article describes a classification solution for amber stones. The problem of classifying amber has long been known among jewelers and artisans of amber art. Existing solutions can classify amber pieces according to color, but the need to classify by shape and texture has not been met until now. The proposed solution is capable of classifying the gemstones according to shape. Amber can be considered a specific object, since its form is difficult to define unambiguously. Data for the amber experiments were gathered from amber art craftsmen. In the proposed solution, amber forms can be classified into 10 different classes (7 classes were chosen during the experiment).
ARTICLE | doi:10.3390/sci2010018
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: colour words; hue histogram; colour representation; machine learning; computer vision
Online: 24 March 2020 (00:00:00 CET)
Ancient numismatics, that is, the study of ancient currencies (predominantly coins), is an interesting domain for the application of computer vision and machine learning, and has been receiving an increasing amount of attention in recent years. Notwithstanding the number of articles published on the topic, the variety of different methodological approaches described, and the mounting realisation that the relevant problems in the field are most challenging indeed, all research to date has entirely ignored one specific, readily accessible modality: colour. Invariably, colour is discarded and images of coins treated as being greyscale. The present article is the first one to question this decision (and indeed, it is a decision). We discuss the reasons behind the said choice, present a case why it ought to be reexamined, and in turn investigate the issue for the first time in the published literature. Specifically, we propose two new colour-based representations specifically designed with the aim of being applied to ancient coin analysis, and argue why it is sensible to employ them in the first stages of the classification process as a means of drastically reducing the initially enormous number of classes involved in type matching ancient coins (tens of thousands, just for Ancient Roman Imperial coins). Furthermore, we introduce a new data set collected with the specific aim of denomination-based categorisation of ancient coins, where we hypothesised colour could be of potential use, and evaluate the proposed representations. Lastly, we report surprisingly successful performances, which go further than confirming our hypothesis: rather, they convincingly demonstrate a much higher relevant information content carried by colour than even we expected. Thus we trust that our findings will be noted by others in the field and that more attention and further research will be devoted to the use of colour in automatic ancient coin analysis.
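A natural colour-based representation of the kind discussed here is a hue histogram (one of the paper's keywords). Below is a minimal sketch, assuming RGB pixel values in [0, 1] and a uniform binning of the hue circle; it illustrates the general idea only, not the paper's exact representations.

```python
import colorsys

def hue_histogram(pixels, bins=8):
    """Build a normalised hue histogram from an iterable of
    (r, g, b) pixels with channel values in [0, 1]."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r, g, b)  # h in [0, 1)
        hist[min(int(h * bins), bins - 1)] += 1
    total = len(pixels)
    return [count / total for count in hist]
```

Such a compact descriptor can be compared cheaply across tens of thousands of coin types, which is why the authors advocate using colour in the first, class-pruning stages of classification.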
ARTICLE | doi:10.20944/preprints201911.0168.v1
Subject: Social Sciences, Cognitive Science Keywords: fine motor precision; vision; proprioception; sex differences; individual differences; personality
Online: 15 November 2019 (03:46:22 CET)
Previous studies have reported certain sex differences in motor performance precision. The aim of the present study was to analyse sex differences in fine motor precision performance for both hands under different test conditions. A total of 220 Spanish participants (ages 12-95) performed fine motor tasks (tracing over provided models: lines of 40 mm) with both hands, under two sensory conditions (PV: proprioceptive-visual; P: proprioceptive only) and three movement types (F: frontal; T: transversal; S: sagittal). Differences in line length (the task focused on precision) were examined through MANOVA for all test conditions, both sexes, and different age groups. Sex differences in precision were observed in the F and T movement types (statistical significance and a higher Cohen's d were observed in the condition with vision). No statistically significant differences were observed for either hand or sensory condition in the sagittal type. Sex differences in fine motor precision were observed most frequently in the PV sensory condition and the frontal movement type, and least in the sagittal one.
ARTICLE | doi:10.20944/preprints201801.0195.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: spacecraft; structure from motion; monocular vision; component detection; structure analysis
Online: 22 January 2018 (05:11:39 CET)
A monocular vision pose estimation and identification algorithm for use on a small spacecraft in future orbital servicing is studied in this paper. A tracker spacecraft equipped with a short-range vision system is proposed to recover the 3D structural model of a space target in orbit and automatically identify its solar panels and main body using only visual information from an onboard camera. The proposed reconstruction and identification framework is tested using structure-from-motion and point cloud identification methods. The Efficient Perspective-n-Point (EPnP) algorithm is used for pose estimation. Triangulated points are used for component segmentation by means of orientation histogram descriptors. Experimental results based on laboratory images of a spacecraft model show the effectiveness and robustness of our approach.
ARTICLE | doi:10.20944/preprints201705.0170.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: accuracy; depth data; RMS error; 3D vision sensors; stereo disparity
Online: 23 May 2017 (09:20:27 CEST)
We propose an approach for estimating the error in depth data provided by generic 3D sensors, modern devices capable of generating an image (RGB data) and a depth map (distance) or a similar 2.5D structure (e.g. stereo disparity) of the scene. Our approach starts by capturing images of a checkerboard pattern devised for the method. It then constructs a dense depth map using functions that generally come with the device SDK (based on disparity or depth). 2D processing of the RGB data is performed next to find the checkerboard corners. Clouds of corner points are finally created (in 3D), over which an RMS error estimate is computed. We built a multi-platform system and verified and evaluated it using the development kit of the nVIDIA Jetson TK1 board with the MS Kinect v1/v2 and the Stereolabs ZED camera. The main contribution is an error determination procedure that needs no dataset or benchmark, relying only on data acquired on-the-fly. With a simple checkerboard, our approach is able to determine the error for any such device. The envisioned application is 3D reconstruction for robotic vision, with a series of 3D vision sensors mounted on robots (quadcopter UAVs and terrestrial robots) for high-precision map construction, which can be used for sensing and monitoring.
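The final step computes an RMS error over matched clouds of 3D checkerboard corners. A minimal sketch of that computation, assuming the measured and reference corner lists are already in one-to-one correspondence (the function name is illustrative):

```python
import math

def rms_error(measured, reference):
    """RMS of Euclidean distances between matched 3D corner points.
    Each argument is a list of (x, y, z) tuples in correspondence."""
    squared = [sum((m - r) ** 2 for m, r in zip(p, q))
               for p, q in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))
```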
ARTICLE | doi:10.20944/preprints201611.0034.v1
Subject: Engineering, Automotive Engineering Keywords: blossoms; digital image processing; machine vision; peaches; unmanned aerial system
Online: 7 November 2016 (05:18:19 CET)
One of the tools for optimal crop production is regular monitoring and assessment of crops. During the growing season of fruit trees, the bloom period shows increased photosynthetic rates that correlate with the fruiting process. This paper presents the development of an image processing algorithm to detect peach blossoms on trees. Images of an experimental peach orchard were acquired at the Parma Research and Extension Center of the University of Idaho using an off-the-shelf unmanned aerial system (UAS) equipped with a multispectral camera (near-infrared, green, blue). The orchard has different stone fruit varieties and different plant training systems. Individual tree images (high resolution) and arrays of tree images (low resolution) were acquired to evaluate the detection capability. The image processing algorithm was based on different vegetation indices. Initial results showed that the algorithm could detect peach blossoms and demonstrates good potential as a monitoring tool for orchard management.
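Vegetation indices of the kind used here are typically normalised band differences. Below is a minimal sketch computing a normalised difference index from the camera's NIR and green bands; the paper does not specify its exact indices, so this band combination is only illustrative.

```python
def norm_diff_index(nir, green):
    """Per-pixel normalised difference index (NIR - G) / (NIR + G)
    for two equally-sized bands given as nested lists of reflectances."""
    return [[(n - g) / (n + g) if (n + g) else 0.0  # guard zero pixels
             for n, g in zip(nir_row, green_row)]
            for nir_row, green_row in zip(nir, green)]
```

Thresholding such an index map is a common way to separate blossom or canopy pixels from background before counting.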
ARTICLE | doi:10.20944/preprints202306.1069.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: wheel surface defect detection; deep learning; YOLO; object detection; machine vision
Online: 15 June 2023 (07:20:42 CEST)
Surface defect detection is a crucial step in automotive wheel production. However, the task poses challenges due to complex backgrounds and a wide range of defect types. In order to detect defects on the wheel surface accurately and quickly, this paper proposes a YOLOv5-based algorithm for automotive wheel surface defect detection. The algorithm trains and tests the YOLOv5s model on a self-created automotive wheel surface defect dataset, which contains four kinds of defects: linear, dotted, sludge, and pinhole. Extensive experimental results demonstrate that the deep learning network trained by our method achieves an average accuracy of 71.7% at 57.14 FPS. Our findings show that this detection algorithm performs better than other common object detection algorithms and meets the real-time requirements of industrial applications.
REVIEW | doi:10.20944/preprints202208.0313.v3
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional Neural Network; domain; natural language processing; computer vision; semantic parsing
Online: 18 August 2022 (07:39:33 CEST)
The convolutional neural network (CNN), a class of artificial neural network (ANN), is attracting the interest of researchers across research domains. CNNs were invented for computer vision, but have also proven useful for semantic parsing, sentence modeling, and other natural language processing tasks. In this paper we discuss the basics of CNN models and their scope, to provide a reference and baseline for researchers interested in using CNN models in their research.
ARTICLE | doi:10.20944/preprints202108.0282.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Classification of insulators; Electrical power system; k-Nearest neighbors; Computer vision.
Online: 13 August 2021 (11:45:50 CEST)
Contamination on insulators may increase their surface conductivity; as a consequence, electrical discharges occur more frequently, which can lead to interruptions in the power supply. To maintain reliability in the electrical distribution power system, components that have lost their insulating properties must be replaced. Identifying the components that need maintenance is a difficult task, as there are several levels of contamination that are hardly noticed during inspections. To improve the quality of inspections, this paper proposes using the k-nearest neighbours (k-NN) algorithm to classify the levels of insulator contamination, based on images of insulators at various levels of contamination simulated in the laboratory. Computer vision features such as the mean, variance, asymmetry (skewness), kurtosis, energy, and entropy are used for training the k-NN. To assess the robustness of the proposed approach, a statistical analysis and a comparative assessment against well-consolidated algorithms such as decision tree, ensemble subspace, and support vector machine models are presented. The k-NN achieved up to 85.17% accuracy using the k-fold cross-validation method, with an average accuracy higher than 82% for multi-class classification of insulator contamination, being superior to the compared models.
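The pipeline extracts six statistical features per image and classifies them with k-NN. A minimal stdlib sketch of both steps, treating an image as a flat list of grey levels; the feature definitions below are the standard moment and histogram forms, which may differ in detail from the paper's, and the function names are illustrative.

```python
import math

def texture_features(pixels):
    """Mean, variance, skewness, kurtosis, energy and entropy
    of a flat list of grey-level values."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var) or 1.0  # avoid division by zero on flat images
    skew = sum(((p - mean) / std) ** 3 for p in pixels) / n
    kurt = sum(((p - mean) / std) ** 4 for p in pixels) / n
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    probs = [c / n for c in counts.values()]
    energy = sum(q * q for q in probs)
    entropy = -sum(q * math.log2(q) for q in probs)
    return [mean, var, skew, kurt, energy, entropy]

def knn_predict(train, query, k=1):
    """Majority label among the k training samples (feature_vector,
    label) closest to the query feature vector."""
    def dist(feat):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat, query)))
    nearest = sorted(train, key=lambda sample: dist(sample[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```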
DATASET | doi:10.20944/preprints202005.0345.v2
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: computer vision; deep learning; Earth Engine; remote sensing; renewable energy; Tensorflow
Online: 21 July 2021 (14:53:14 CEST)
We have an unprecedented ability to map the Earth’s surface as deep learning technologies are applied to an abundance of high-frequency Earth observation data. Simple, free, and effective methods are needed to enable a variety of stakeholders to use these tools to improve scientific knowledge and decision making. Here we present a trained U-Net model that can map and delineate ground mounted solar arrays using publicly available Sentinel-2 imagery, and that requires minimal data pre-processing and no feature engineering. By using label overloading and image augmentation during training, the model is robust to temporal and spatial variation in imagery. The trained model achieved a precision and recall of 91.5% each and an intersection over union of 84.3% on independent validation data from two distinct geographies. This generalizability in space and time makes the model useful for repeatedly mapping solar arrays. We use this model to delineate all ground mounted solar arrays in North Carolina and the Chesapeake Bay watershed to illustrate how these methods can be used to quickly and easily produce accurate maps of solar infrastructure.
REVIEW | doi:10.20944/preprints202103.0449.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: vision rehabilitation; review of systems; traumatic brain injury; concussion; patient advocacy.
Online: 17 March 2021 (16:05:12 CET)
Treating a patient with traumatic brain injury requires an interdisciplinary approach because of the pervasive, profound and protean manifestations of this condition. In this review, key aspects of the medical history and review of systems will be described in order to highlight how the role of any provider must evolve to become a better patient advocate. Although this review is written from the vantage point of a vision care provider, it is hoped that patients, caregivers and providers will recognize the need for the team approach; it truly takes a village.
ARTICLE | doi:10.20944/preprints202102.0152.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: color vision deficiency; medical students; ishihara plates; humans; incidence; prevalence; frequency
Online: 5 February 2021 (09:58:31 CET)
Introduction: Color vision deficiency (CVD) constitutes one of the frequently observed eye disorders in all human populations. Color is a prominent sign utilized in the medical profession in studying and identifying histopathological specimens and lab instruments, and during patient examination. Color deficiency affects the medical skills of students, resulting in poor clinical examination and color appreciation. There is no effective screening for CVD at any level of the medical profession. Hence, this study aimed to determine the prevalence of CVD among medical students. Materials and methods: This was a cross-sectional study conducted over a period of six months, from September 2019 to February 2020, in Karachi, Pakistan. All medical students aged 18-21 years of either gender enrolled in the first and second years of medical college were included in this study. The examination was performed during daylight. Ishihara plates were placed at a distance of 75 cm from the subject and tilted so that the plane of the paper lay perpendicular to the line of vision. Students were given five seconds to read each plate, and one examiner was instructed to mark the checklist. A score of less than 12 out of 14 red/green test plates (not including the demonstration plate) was considered a CVD. All statistical analysis was performed using the Statistical Package for Social Sciences version 20.0 (Armonk, NY: IBM Corp). Results: The mean age of the medical students was 19.61 ± 1.22 years. There were 123 (53.0%) females and 111 (47.0%) males. Most of the medical students (n=131, 56.0%) belonged to the upper-middle-class socioeconomic group. CVD was observed in 13 (6.0%) of the medical students. Age (p=0.001) and socioeconomic status (p=0.001) were the only demographic factors significantly associated with color deficiency. Conclusions: Color deficiency, although an unnoticed concern, is fairly common among medical students. Medical students should be screened for CVD, as this will enable them to be aware of limitations in their future observational skills as doctors and to devise ways of overcoming them in clinical practice.
ARTICLE | doi:10.20944/preprints202008.0487.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: Twitter; data reliability; risk communication; data mining; Google Cloud Vision API
Online: 22 August 2020 (02:32:40 CEST)
While Twitter has been touted as providing up-to-date information about hazard events, the reliability of tweets is still a concern. Our previous publication extracted relevant tweets containing information about the 2013 Colorado flood event and its impacts. Using those relevant tweets, this research further examined their reliability (accuracy and trueness) by analyzing the text and image content and comparing them to other publicly available data sources. Both manual identification of text information and automated extraction of image content (via the Google Cloud Vision API) were implemented to balance accurate information verification against processing time. The results showed that both the text and images contained useful information about damaged/flooded roads and street networks. This information can help coordinate emergency response efforts and inform the allocation of resources when enough tweets contain geocoordinates or location/venue names. This research will help identify reliable crowdsourced risk information to enable near-real-time emergency response through better use of crowdsourced risk communication platforms.
ARTICLE | doi:10.20944/preprints201704.0130.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: intelligent robotics; flexibility; reusability; multisensor; state machine; software architecture; computer vision
Online: 20 April 2017 (04:14:33 CEST)
This paper presents a state machine based architecture that enhances the flexibility and reusability of industrial robots, specifically dual-arm multisensor robots. The proposed architecture, in addition to allowing absolute control of the execution, eases the programming of new applications by increasing the reusability of the developed modules. Through an easy-to-use graphical user interface, operators are able to create, modify, reuse, and maintain industrial processes, increasing the flexibility of the cell. Moreover, the proposed approach is applied in a real use case in order to demonstrate its capabilities and feasibility in industrial environments. A comparative analysis is presented, evaluating the presented approach against traditional robot programming techniques.
ARTICLE | doi:10.20944/preprints202309.2179.v1
Subject: Engineering, Metallurgy And Metallurgical Engineering Keywords: Convolutional Neural Networks; Computer Vision; Artificial Intelligence; Iron Ore Pellets; ISO 4698
Online: 1 October 2023 (07:12:52 CEST)
Iron ore processing involves critical steps that affect the quality of the final product. Determining the diameter of the pellets is the necessary initial step for volume measurement and, ultimately, for porosity and bulk density measurement, crucial characteristics for optimizing the burning process in the blast furnace. Traditional measurement methods using mercury present issues related to operator safety and environmental preservation, while the practices described in ISO 4698 standard require lengthy preparation and execution time. In light of environmental needs, operator safety, and the time consumed in tests, implementing a new method based on digital image processing and convolutional neural networks for measuring the diameter of burnt iron ore pellets is proposed. The methodology involves capturing images of the pellets using a high-resolution camera and utilizing digital image processing and neural networks capable of performing pixel-by-pixel object segmentation in the images, providing precise information about the pellets to calculate their volume automatically. The results were compared with those obtained using traditional methods, evaluating their conformity with the ISO 4698 standard. In conclusion, this study offers a new approach to measuring the volume of burnt iron ore pellets, providing accurate and reliable results in compliance with safety and environmental preservation standards.
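The diameter-to-volume step the abstract describes can be sketched under a simplifying assumption of roughly spherical pellets: recover an equivalent diameter from the segmented pixel area of a pellet's silhouette, then convert to a sphere volume. The function and calibration parameter names are ours, not the authors':

```python
import math

def pellet_volume_mm3(mask_pixel_area, mm_per_pixel):
    """Estimate a pellet's volume from its segmented cross-section.

    Assumes a roughly spherical pellet: the equivalent diameter is
    recovered from the segmented silhouette's pixel area, then
    converted to a sphere volume. The calibration factor mm_per_pixel
    would come from the camera setup.
    """
    area_mm2 = mask_pixel_area * mm_per_pixel ** 2
    diameter_mm = 2.0 * math.sqrt(area_mm2 / math.pi)
    radius_mm = diameter_mm / 2.0
    return (4.0 / 3.0) * math.pi * radius_mm ** 3
```

For example, a circular silhouette of radius 10 px at 1 mm/px yields a 20 mm equivalent diameter and hence the volume of a 10 mm radius sphere.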
ARTICLE | doi:10.20944/preprints202308.2180.v1
Subject: Social Sciences, Urban Studies And Planning Keywords: urban green spaces; human activities; Convolutional Neural Networks; computer vision; urban parks
Online: 31 August 2023 (13:18:41 CEST)
Understanding park events and their categorization offers pivotal insights into urban parks and their integral roles in cities. This study utilized images and event category data from the New York City Parks Events Listing database to train a Convolutional Neural Network (CNN) for image-based park event categorization. Different CNN models were tuned to complete this multi-label classification task, and their performances were compared. Preliminary results underscore the efficacy of deep learning in automating the event classification process, revealing the multifaceted activities within urban green spaces. The CNN showcased proficiency in discerning various event nuances, emphasizing the diverse recreational and cultural offerings of urban parks. Such categorization has potential applications in urban planning, aiding decision-making processes related to resource distribution, event coordination, and infrastructure enhancements tailored to specific park activities.
ARTICLE | doi:10.20944/preprints202308.1909.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: mixed reality; object detection; OpenCV; YOLOv8; computer vision; quality inspection; smart manufacturing
Online: 29 August 2023 (08:43:57 CEST)
Quality control is a critical component in industrial manufacturing, directly influencing efficiency, product reliability, and ultimately, customer satisfaction. In the dynamic environment of industrial manufacturing, traditional inspection methods may not adequately meet the evolving complexity, necessitating innovative approaches to bolster precision and productivity. In this study, we explore the application of mixed reality (MR) technology for real-time quality control in the assembly process. Our methodology involved the integration of smart glasses with a server-based image recognition system, designed to conduct real-time component analysis. The innovative aspect of our study lies in the harmonization of MR and computer vision algorithms, providing immediate visual feedback to inspectors and thereby improving the speed and accuracy of defect detection. YOLOv8 was adopted in this study as the object detection model. The project was implemented in a controlled environment to enable a comprehensive evaluation of the system's functionality, the identification of possible problems, and improvements in system performance. The results indicated the viability of mixed reality as a powerful tool for enhancing traditional inspection processes. The fusion of MR and computer vision offers possibilities for future advancements in industrial quality control, paving the way for more efficient and reliable manufacturing ecosystems.
ARTICLE | doi:10.20944/preprints202308.0062.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: road traffic understanding; artificial intelligence; clustering; process mining; automated reasoning; computer vision
Online: 1 August 2023 (11:18:51 CEST)
Studying and understanding the behavior of people and vehicles on public roads can be of utmost importance for supporting the activities of many institutional stakeholders. It may allow automated supervision of the ongoing situation in a given place, with warnings or alarms raised in case of anomalies; it may help these stakeholders plan interventions in road and town organization; and it may provide them with advanced support for decision-making. The number of entities and places involved makes it infeasible to handle all traffic-related tasks manually. Moreover, the complexity of the tasks to be carried out requires the adoption of advanced approaches. Many AI solutions are nowadays mature enough to support these requirements. In some cases, the motivations and objectives of traffic management require the AI outcomes to be understandable, interpretable, and explainable. In this paper, we propose TrAnSIT (TRaffic ANalysis Supervision and Interpretation Tool), an AI-based framework that combines several modules, each aimed at tackling a specific traffic-related task, so as to cover a wide landscape of traffic-related issues, from overall urban or suburban traffic management to surveying specific road segments that fall under the scope of one camera. Most of these modules are based on AI techniques that support a human-level understanding of the outcomes.
ARTICLE | doi:10.20944/preprints202306.1106.v1
Subject: Engineering, Other Keywords: Vision Transformers; white blood cells; explainable AI models; deep learning; Score-CAM
Online: 15 June 2023 (08:42:51 CEST)
Blood cell analysis is a crucial diagnostic process in medical practice. In particular, detecting white blood cells (WBCs) is essential for diagnosing many diseases. The manual screening of blood films is a time-consuming and subjective process, which can lead to inconsistencies and errors; therefore, automated detection of blood cells can improve the accuracy and efficiency of the screening process. In this study, an explainable Vision Transformer (ViT) model is proposed for the automatic detection of WBCs from blood films. The proposed model utilizes the self-attention mechanism to extract relevant features from the input images and leverages transfer learning by incorporating pre-trained model weights to improve its performance. The proposed model achieved a classification accuracy of 99.40% for five distinct types of WBCs and exhibited potential for reducing the time pathologists spend manually screening blood films. Upon examination of the misclassified test samples, it was observed that incorrect predictions were correlated with the presence or absence of granules in the cell samples. To validate this observation, the dataset was divided into two classes, Granulocytes and Agranulocytes, and a secondary training process was conducted. The resulting ViT model trained for binary classification achieved an accuracy of 99.70%, recall of 99.54%, precision of 99.32%, and F1 score of 99.43% during the test phase. To ensure the reliability of the ViT model's multi-class classification of WBCs, the pixel areas that the model focuses on in its predictions are visualized through the Score-CAM algorithm.
ARTICLE | doi:10.20944/preprints202304.1074.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: data analysis; computer vision algorithms; visual data; natural language processing; scientific research
Online: 2 May 2023 (04:13:23 CEST)
The abundance of information in academic articles, reports, and studies can make it challenging for researchers to gain insights from the existing literature. To address this issue, there is a growing demand for tools that can help researchers effectively parse and analyze large volumes of data. One such tool is DataDiscoveryLab, a software system that utilizes computer vision algorithms and NLP techniques to parse academic articles into text and figures, creating three separate databases. These databases allow researchers to quickly identify articles that may be relevant to their research questions, gain a deeper understanding of the research presented, and analyze visual data. The integration of article mining and computer vision in the DataDiscoveryLab software system provides researchers with a powerful tool for navigating the vast amount of scientific literature available today. As we will discuss in later papers, the purpose of these databases is to create a bridge between researchers' data and a practically unlimited body of scientific publications. In this article, we discuss how we plan to do that and describe our efforts to integrate deep learning models. Unlike existing AI models, DataDiscoveryLab can combine them, aiming to be the first generative AI in academia to encompass every part of the natural sciences.
ARTICLE | doi:10.20944/preprints202209.0275.v1
Subject: Engineering, Civil Engineering Keywords: Bicycle Behavior; Naturalistic Cycling Data; Car/Bike Interactions; Computer Vision; Object Detection
Online: 19 September 2022 (10:22:00 CEST)
As machine learning and computer vision techniques continue to advance, the collection of naturalistic traffic data from video feeds is becoming more and more feasible. That is especially true for bicycles, for which the collection of naturalistic data is not achievable with the traditional vehicle-instrumentation approach. This study describes a research effort that aims to extract naturalistic cycling data from a video dataset for use in safety and mobility applications. The videos come from a dataset collected in a previous Virginia Tech Transportation Institute study, in collaboration with SPIN, in which continuous video data were recorded at a non-signalized intersection on the Virginia Tech campus. The research team applied computer vision and machine learning techniques to develop a comprehensive framework for the extraction of naturalistic cycling trajectories. In total, this study resulted in the collection and classification of 619 bicycle trajectories based on their type of interactions with other road users. The results confirm the success of the proposed methodology in extracting the locations, speeds, and accelerations of the bicycles with a high level of precision. Furthermore, preliminary insights into the acceleration and speed behavior of bicyclists around motorists are determined.
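Once image-plane trajectories have been mapped to ground coordinates, speeds and accelerations such as those reported above follow from finite differences over the sampled positions. A minimal sketch (the function name and input format are our assumptions, not the study's code):

```python
def kinematics(positions, dt):
    """Derive speeds and accelerations from a sampled trajectory.

    positions: list of (x, y) ground coordinates in metres, sampled
    every dt seconds. Returns (speeds, accelerations) computed with
    first- and second-order finite differences.
    """
    speeds = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        # Distance covered between consecutive samples, divided by dt.
        speeds.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt)
    # Acceleration as the change in speed between consecutive samples.
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    return speeds, accels
```

For a cyclist moving at a constant 2 m/s along x, sampled at dt = 0.5 s, the speeds are all 2.0 m/s and the accelerations all 0.0.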
ARTICLE | doi:10.20944/preprints202203.0064.v1
Subject: Social Sciences, Behavior Sciences Keywords: Computer vision; Google Street View; Built Environment; Walkability; Micro-scale; Deep learning
Online: 3 March 2022 (13:49:08 CET)
The study purpose was to train and validate a deep-learning approach to detect micro-scale streetscape features related to pedestrian physical activity. This work innovates by combining computer vision techniques with Google Street View (GSV) images to overcome impediments to conducting audits (e.g., time, safety, and expert labor cost). The EfficientNet-B5 architecture was used to build deep-learning models for eight micro-scale features guided by the Microscale Audit of Pedestrian Streetscapes-Mini tool: sidewalks, sidewalk buffers, curb cuts, zebra and line crosswalks, walk signals, bike symbols, and streetlights. We used a train-correct loop, whereby models were trained on a training dataset, evaluated using a separate validation dataset, and trained further until acceptable performance metrics were achieved. We then used the trained models to audit participant (N=512) neighborhoods in the WalkIT Arizona trial. Correlations were explored between micro-scale features and both GIS-measured and participant-reported macro-scale walkability. Classifier precision, recall, and overall accuracy were all >84%. The total micro-scale score was associated with overall macro-scale walkability (r=0.300, p<.001). Positive associations were found between model-detected and self-reported sidewalks (r=0.41, p<.001) and sidewalk buffers (r=0.26, p<.001). The computer vision model results suggest an alternative to trained human raters, allowing for audits of hundreds or thousands of neighborhoods for population surveillance or hypothesis testing.
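The reported associations (e.g., r = 0.41 between model-detected and self-reported sidewalks) are Pearson correlation coefficients. For reference, a minimal stdlib implementation:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation factors.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Perfectly linearly related samples give r = 1.0 (or -1.0 for a decreasing relationship).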
ARTICLE | doi:10.20944/preprints202201.0054.v1
Subject: Engineering, Automotive Engineering Keywords: stylus tip center self-calibration; spherical fitting; pose domain; vision measurement system
Online: 6 January 2022 (09:47:38 CET)
Light pen 3D vision coordinate measurement systems are increasingly widely used due to their advantages, such as small size, convenient carrying, and wide applicability. The posture of the light pen is an important factor affecting accuracy, so the pose domain of the pen needs to be specified to give the measurement system a suitable measurement range and obtain more qualified parameters. The advantage of the self-calibration method is that the entire self-calibration process can be completed at the measurement site without any auxiliary equipment. After the system camera calibration is completed, we take several pictures of the same measurement point with different poses to obtain the conversion matrix of each picture, and then apply spherical fitting, the generalized inverse method of least squares, and the principle of position invariance within the pose domain; the combined stylus tip center self-calibration method then calculates the actual position of the light pen probe. The experimental results show that the absolute error is stable below 0.0737 mm and that the relative error is stable below 0.0025 mm, verifying the effectiveness of the method; the measurement accuracy of the system can meet basic industrial measurement requirements.
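The spherical-fitting step can be illustrated with the standard algebraic least-squares formulation: each point (x, y, z) on a sphere with centre (a, b, c) and radius r satisfies x² + y² + z² = 2ax + 2by + 2cz + (r² − a² − b² − c²), which is linear in the unknowns. This is a generic sketch, not the authors' implementation:

```python
def fit_sphere(points):
    """Least-squares sphere fit to 3D points (algebraic formulation).

    Solves the 4x4 normal equations A^T A w = A^T f, where each row of
    A is [2x, 2y, 2z, 1] and f = x^2 + y^2 + z^2, by Gaussian
    elimination. Returns (centre, radius).
    """
    ata = [[0.0] * 4 for _ in range(4)]
    atf = [0.0] * 4
    for x, y, z in points:
        row = [2 * x, 2 * y, 2 * z, 1.0]
        f = x * x + y * y + z * z
        for i in range(4):
            atf[i] += row[i] * f
            for j in range(4):
                ata[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting.
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atf[col], atf[piv] = atf[piv], atf[col]
        for r in range(col + 1, 4):
            m = ata[r][col] / ata[col][col]
            for j in range(col, 4):
                ata[r][j] -= m * ata[col][j]
            atf[r] -= m * atf[col]
    # Back substitution.
    w = [0.0] * 4
    for r in range(3, -1, -1):
        s = atf[r] - sum(ata[r][j] * w[j] for j in range(r + 1, 4))
        w[r] = s / ata[r][r]
    a, b, c, k = w
    radius = (k + a * a + b * b + c * c) ** 0.5
    return (a, b, c), radius
```

Given points sampled on a sphere of known centre and radius, the fit recovers both exactly up to numerical precision; with noisy stylus-tip observations it returns the least-squares estimate.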
ARTICLE | doi:10.20944/preprints202108.0405.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: animal welfare; pigs; deep learning; computer vision; stress detection; facial expression recognition
Online: 19 August 2021 (13:17:08 CEST)
Animal welfare is not only an ethically important consideration in good animal husbandry, but can also have a significant effect on an animal's productivity. The aim of this paper is to show that a reduction in animal welfare, in the form of increased stress, can be identified in pigs from frontal images of the animals. We train a Convolutional Neural Network (CNN) using a leave-one-out design and show that it is able to discriminate between stressed and unstressed pigs with an accuracy of >90% in unseen animals. Grad-CAM is used to identify the animal regions the network relies on, and these agree with the regions used in manual assessments such as the Pig Grimace Scale. This innovative work paves the way for further work examining both positive and negative welfare states, with a view to developing an automated system that can be used in precision livestock farming to improve animal welfare.
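A leave-one-out design at the animal level means each split tests on a pig whose images were entirely excluded from training, which is what justifies the "unseen animals" accuracy claim. A minimal sketch of the split logic (names and data format are ours):

```python
def leave_one_subject_out(samples):
    """Yield per-animal held-out splits.

    samples: list of (pig_id, image) pairs. Each yielded split holds
    out every image of one pig for testing, so the tested animal is
    never seen during training.
    """
    pig_ids = sorted({pid for pid, _ in samples})
    for held_out in pig_ids:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test
```

With images from three pigs, this yields three splits, each testing on all images of exactly one animal.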
ARTICLE | doi:10.20944/preprints202108.0279.v1
Subject: Medicine And Pharmacology, Ophthalmology Keywords: Glaucoma; Diabetic Retinopathy; Convolution Neural Network (CNN); Vision Loss; Blindness; Machine Learning
Online: 12 August 2021 (15:36:51 CEST)
In the last few decades, glaucoma has become the second leading cause of irreversible vision loss. Because its progression is largely asymptomatic, it is often not diagnosed until a relatively late stage; to prevent severe damage, glaucoma must be detected early. Diabetes is also a major risk factor for glaucoma. In the modern era, artificial intelligence has made great progress in medical image processing, and machine-learning-based image analysis has shown considerable success in diagnosing glaucoma. The aim of this paper is to create an automated process that can detect glaucoma and diabetic retinopathy. Various machine learning models are used, and the results of these methods are presented.
REVIEW | doi:10.20944/preprints202107.0202.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: colorblindness; color vision; myopia; cone photopigment; exon skipping; X-linked cone dysfunction
Online: 8 July 2021 (13:27:17 CEST)
The first step in seeing is light absorption by photopigment molecules expressed in the photoreceptors of the retina. There are two types of photoreceptors in the human retina that are responsible for image formation, rods and cones. Except at very low light levels when rods are active, all vision is based on cones. Cones mediate high-acuity vision and color vision. Furthermore, they are critically important in the visual feedback mechanism that regulates refractive development of the eye during childhood. The human retina contains a mosaic of three cone types, short-wavelength (S), long-wavelength (L) and middle-wavelength (M); however, the vast majority (~94%) are L and M cones. The OPN1LW and OPN1MW genes, located on the X-chromosome at Xq28, encode the protein component of the light-sensitive photopigments. Here we review the mechanisms by which splicing defects in these genes cause vision disorders.
ARTICLE | doi:10.20944/preprints202010.0455.v1
Subject: Engineering, Automotive Engineering Keywords: KINECT; industrial robot; vision system; RobotStudio; Visual Studio; gesture control; voice control
Online: 22 October 2020 (09:57:07 CEST)
The paper presents the possibility of using the KINECT v2 module to control an industrial robot by means of gestures and voice commands. It describes elements of creating software for off-line and on-line robot control. The application for the KINECT module was developed in C# in the Visual Studio environment, while the industrial robot control program was developed in the RAPID language in the RobotStudio environment. Developing a two-threaded application in RAPID allowed two independent tasks to be separated for the IRB120 robot. The main task of the robot is performed in thread no. 1 (responsible for movement), while the simultaneously running thread no. 2 ensures continuous communication with the KINECT system and provides information about gesture and voice commands in real time without any interference with thread no. 1. The applied solution allows the robot to work in industrial conditions without the communication task negatively affecting the robot's work-cycle times. Thanks to the development of a digital twin of the real robot station, tests of proper application functioning were conducted in off-line mode (without using a real robot), and the obtained results were then verified on-line (on the real test station). Tests of the correctness of gesture recognition were carried out; the robot recognized all programmed gestures. Another test was the recognition and execution of voice commands. A difference in task completion time between the actual and virtual stations was noticed; the average difference was 0.67 s. The last test examined the impact of interference on the recognition of voice commands: with a 10 dB difference between the command and noise, voice command recognition reached 91.43%. The developed computer programs have a modular structure, which enables easy adaptation to process requirements.