ARTICLE | doi:10.20944/preprints202104.0282.v1
Subject: Keywords: OpenCV stereo-vision; low-cost stereo-vision; do it yourself stereo-vision; stereoscopic binocular vision; binocular vision; practical guide stereo-vision
Online: 12 April 2021 (12:09:38 CEST)
The paper presents an analysis of the latest developments in low-cost stereo vision, both for prototypes and for industrial designs. We described the theory of stereo vision and presented information about cameras, data-transfer protocols, and their compatibility with various devices. The theory of image processing for stereo vision is considered, and the calibration process is described in detail. Ultimately, we presented the developed stereo-vision system and outlined the main points that need to be considered when developing such systems. Finally, we presented software for adjusting stereo-vision parameters in real time, written in Python for the Windows operating system.
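By way of illustration, a real-time parameter-adjustment tool of the kind described can be sketched with OpenCV trackbars driving a block-matching stereo matcher (a minimal sketch only; the window and parameter names are ours, not taken from the paper):

```python
# Minimal sketch: live tuning of OpenCV block-matching disparity parameters.
# Window/parameter names are illustrative, not taken from the paper.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # rectified left image
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # rectified right image

cv2.namedWindow("disparity")
cv2.createTrackbar("numDisparities/16", "disparity", 4, 16, lambda v: None)
cv2.createTrackbar("blockSize/2", "disparity", 7, 25, lambda v: None)

while True:
    num_disp = max(16, cv2.getTrackbarPos("numDisparities/16", "disparity") * 16)
    block = 2 * cv2.getTrackbarPos("blockSize/2", "disparity") + 5  # odd, >= 5
    stereo = cv2.StereoBM_create(numDisparities=num_disp, blockSize=block)
    disp = stereo.compute(left, right)
    disp = cv2.normalize(disp, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
    cv2.imshow("disparity", disp)
    if cv2.waitKey(50) == 27:  # Esc to quit
        break
cv2.destroyAllWindows()
```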
Online: 11 March 2020 (16:00:46 CET)
The avian retina is far less well known than that of mammals such as the mouse and macaque, and detailed study is overdue. The chicken (Gallus gallus) has potential as a model, in part because research can build on developmental studies of the eye and nervous system. One can expect differences between bird and mammal retinas simply because, whereas most mammals have three types of visual photoreceptor, birds normally have six. Spectral pathways and colour vision are of particular interest, because filtering by oil droplets narrows cone spectral sensitivities and birds are probably tetrachromatic. The number of receptor inputs is reflected in the retinal circuitry. The chicken probably has four types of horizontal cell, there are at least 11 types of bipolar cell, often with bi- or tri-stratified axon terminals, and there is a high density of ganglion cells, which make complex connections in the inner plexiform layer. In addition, there is likely to be retinal specialisation: for example, chicken photoreceptors and ganglion cells have separate peaks of cell density in the central and dorsal retina, which probably serve different types of behaviour.
ARTICLE | doi:10.20944/preprints202110.0363.v1
Subject: Engineering, Other Keywords: Oil spills; synthetic aperture radar (SAR); deep convolutional neural networks (DCNNs); vision transformers (ViTs); deep learning; semantic segmentation; marine pollution; remote sensing
Online: 25 October 2021 (15:42:36 CEST)
Oil spillage over a sea or ocean’s surface is a threat to marine and coastal ecosystems. Spaceborne synthetic aperture radar (SAR) data have been used efficiently for the detection of oil spills due to their operational capability in all-day, all-weather conditions. The problem is often modeled as a semantic segmentation task, in which the images need to be segmented into multiple regions of interest such as sea surface, oil spill, look-alikes, ships, and land. Training a classifier for this task is particularly challenging since there is an inherent class imbalance. In this work, we train a convolutional neural network (CNN) with multiple feature extractors for pixel-wise classification, and introduce a new loss function, namely the ‘gradient profile’ (GP) loss, which is a constituent of the more generic Spatial Profile loss proposed for image translation problems. For the purpose of training, testing, and performance evaluation, we use a publicly available dataset with selected oil spill events verified by the European Maritime Safety Agency (EMSA). The results obtained show that the proposed CNN trained with a combination of GP, Jaccard, and focal loss functions can detect oil spills with an intersection over union (IoU) value of 63.95%. The IoU values for the sea surface, look-alikes, ships, and land classes are 96.00%, 60.87%, 74.61%, and 96.80%, respectively. The mean intersection over union (mIoU) value for all the classes is 78.45%, which accounts for a 13% improvement over the state of the art for this dataset. Moreover, we provide extensive ablation on different Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) based hybrid models to demonstrate the effectiveness of adding GP loss as an additional loss function for training. Results show that GP loss significantly improves the mIoU and F1 scores for CNNs as well as ViTs based hybrid models. GP loss turns out to be a promising loss function in the context of deep learning with SAR images.
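A rough sketch of how such a composite segmentation objective can be assembled (our own simplified reading, not the authors' code: the gradient-profile-style term below merely compares finite-difference gradients of prediction and target, and the Jaccard term is the usual soft-IoU relaxation; a focal term would be added with its own weight):

```python
# Simplified sketch of a composite segmentation loss (not the authors' code):
# soft-Jaccard (IoU) term plus a gradient-profile-style term that compares
# finite-difference gradients of the predicted and target maps.
import torch
import torch.nn.functional as F

def soft_jaccard_loss(probs, target, eps=1e-6):
    # probs, target: (N, C, H, W); target one-hot, float
    inter = (probs * target).sum(dim=(2, 3))
    union = (probs + target - probs * target).sum(dim=(2, 3))
    return 1.0 - ((inter + eps) / (union + eps)).mean()

def gradient_profile_loss(probs, target):
    # Compare horizontal and vertical finite-difference gradients.
    dx_p = probs[..., :, 1:] - probs[..., :, :-1]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_p = probs[..., 1:, :] - probs[..., :-1, :]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    return F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)

def combined_loss(logits, target_onehot, w_jac=1.0, w_gp=1.0):
    # A focal term would be added here with its own weight in the full mix.
    probs = torch.softmax(logits, dim=1)
    return (w_jac * soft_jaccard_loss(probs, target_onehot)
            + w_gp * gradient_profile_loss(probs, target_onehot))
```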
ARTICLE | doi:10.20944/preprints201903.0155.v1
Subject: Physical Sciences, Applied Physics Keywords: light pollution, vision, non-vision, DSLRs, ISS
Online: 14 March 2019 (15:44:08 CET)
Night-time lights interact with human physiology through different pathways starting at the retinal layers of the eye, from the signals provided by the rods, the S-, L- and M-cones, and the intrinsically photosensitive retinal ganglion cells (ipRGC). These individual photic channels combine in complex ways to modulate important physiological processes, among them the daily entrainment of the neural master oscillator that regulates circadian rhythms. Evaluating the relative excitation of each type of photoreceptor generally requires full knowledge of the spectral power distribution of the incoming light, information that is not easily available in many practical applications. One such instance is wide area sensing of public outdoor lighting; present-day radiometers onboard Earth-orbiting platforms with sufficient nighttime sensitivity are generally panchromatic and lack the required spectral discrimination capacity. In this paper we show that RGB imagery acquired with off-the-shelf digital single-lens reflex cameras (DSLR) can be a useful tool to evaluate, with reasonable accuracy and high angular resolution, the photoreceptoral inputs associated with a wide range of lamp technologies. The method is based on linear regressions of these inputs against optimum combinations of the associated R, G, and B signals, built for a large set of artificial light sources by means of synthetic photometry. Given the widespread use of RGB imaging devices, this approach is expected to facilitate the monitoring of the physiological effects of light pollution, from ground and space alike, using standard imaging technology.
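At its core, the method is ordinary least-squares regression of each photoreceptoral input against combinations of the camera's R, G, and B signals; a toy sketch of that fitting step (synthetic placeholder data, not the paper's spectral library):

```python
# Toy sketch of the fitting step: regress one photoreceptor input (e.g. the
# ipRGC signal) against DSLR R, G, B responses computed by synthetic
# photometry over a set of lamp spectra. Data here are random placeholders
# standing in for the real spectral library.
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((500, 3))                  # R, G, B signals for 500 lamp spectra
ipRGC = rgb @ np.array([0.1, 0.7, 0.4]) + 0.01 * rng.standard_normal(500)

# Least-squares fit: ipRGC input ~ a*R + b*G + c*B
coeffs, residuals, rank, _ = np.linalg.lstsq(rgb, ipRGC, rcond=None)
print("fitted coefficients:", coeffs)

# Apply to a new pixel's (R, G, B) to estimate its photoreceptoral input
print("estimate:", np.array([0.2, 0.5, 0.3]) @ coeffs)
```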
ARTICLE | doi:10.20944/preprints202105.0119.v1
Subject: Engineering, Automotive Engineering Keywords: Autonomous Driving; Environment Perception; Grid Mapping; Stereo Vision; Monocular Vision
Online: 6 May 2021 (17:24:09 CEST)
Accurately estimating the current state of local traffic scenes is one of the key problems in the development of software components for automated vehicles. In addition to details on free space and drivability, and on static and dynamic traffic participants, information on semantics may also be included in the desired representation. Multi-layer grid maps allow all this information to be included in a common representation. However, most existing grid mapping approaches only process range-sensor measurements such as LIDAR and radar, and solely model occupancy without semantic states. In order to add sensor redundancy and diversity, it is desirable to incorporate vision-based sensor setups into a common grid map representation. In this work, we present a semantic evidential grid mapping pipeline, including estimates for eight semantic classes, that is designed for straightforward fusion with range-sensor data. Unlike other publications, our representation explicitly models uncertainties in the evidential model. We present results of our grid mapping pipeline based on a monocular vision setup and a stereo vision setup. Our mapping results are accurate and dense due to the incorporation of a disparity- or depth-based ground-surface estimation in the inverse perspective mapping. We conclude this paper with a detailed quantitative evaluation based on real traffic scenarios from the KITTI odometry benchmark, demonstrating the advantages compared to other semantic grid mapping approaches.
ARTICLE | doi:10.20944/preprints202012.0403.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Computer vision; performance metrics; Yolo3; AWS Rekognition; Azure Computer Vision
Online: 16 December 2020 (10:37:47 CET)
Computer vision is considered an ally for solving business problems that require human intervention, intelligence, and judgement. This field has evolved rapidly in the 21st century, delivering various alternatives ranging from open-source tools to commercial platforms. With so many options and a growing market, it is difficult to decide which one to use, or, worse, to discover too late that it was not suited to the scenario at hand. In this paper we analyze five arbitrarily selected options, tested on a dataset of 755 images, for detecting persons in an image using object detectors. We analyze the elapsed time to process an image, the error relative to human observations, the number of persons detected, the correlation of time and person density, detected object size, and the F1 score, considering precision and recall. As we found score ties and similar behaviors among the available options, we introduce a novel index that takes into consideration the number of persons and their pixel size: the Vision Acuity Index of Computer Vision. The results demonstrate that this is a good indicator for making decisions. The proposed index also has the potential to be extended to different business use cases and to measure newly proposed algorithms in the future, alongside the traditional metrics used previously.
CASE REPORT | doi:10.20944/preprints202011.0397.v1
Online: 16 November 2020 (08:30:08 CET)
A 31-year-old male noticed blurred vision in his right eye for five days with no obvious predisposing causes, accompanied by mild dizziness. No obvious nodular lesions were found on the body. The patient’s binocular visual acuity was 20/20. Fundus photography showed optic nerve swelling and radial superficial retinal hemorrhage in both eyes. Blood panel, urine routine, and liver and kidney function were all normal. Total cholesterol, triglycerides, high-density lipoprotein, and low-density lipoprotein were all within normal limits. Head MRI showed a mass in the right temporal lobe with a clear boundary and multiple septations, which thinned and disappeared closer to the skull. The right temporal lobe and lateral ventricle were compressed, with the midline structure shifted to the left. The patient was then transferred to Neurosurgery. During the operation, we observed that the tumor had invaded the skull. The actual size of the tumor was 5.6 cm × 7.5 cm × 10.1 cm. Histology revealed foam cell accumulation in the mucous connective tissue of the right temporal lobe. Immunohistochemistry showed: CD34 (+), CD99 (+), EMA (−), GFAP (−), IDH-1 (−), Ki-67 (+) index about 10%, Olig-2 (−), PR (−), S-100 (−), Vim (+), β-Catenin (+), CD1a (−), CD68 (+). Three months after removal of the tumor, the visual acuity of both eyes was 20/20; the visual fields were normal, and the optic disc edema and retinal hemorrhages had disappeared. MRI indicated that the midline structure was back to normal.
ARTICLE | doi:10.20944/preprints201912.0116.v1
Online: 9 December 2019 (04:05:48 CET)
Many accidents, such as those involving collisions or trips, appear to involve failures of vision; but the association between accident risk and vision, as conventionally assessed, is weak or absent. We addressed this conundrum by embracing a distinction inspired by neuroscientific research: between vision for perception and vision for action. A dual-process perspective predicts that accident vulnerability will be associated more strongly with vision for action than with vision for perception. Older and younger adults, with relatively high and relatively low self-reported accident vulnerability (Accident Proneness Questionnaire), completed three behavioural assessments targeting: vision for perception (Freiburg Visual Acuity Test); vision for action (Vision for Action Test – VAT); and the ability to perform physical actions involving balance, walking, and standing (Short Physical Performance Battery). Accident vulnerability was not associated with visual acuity or with performance of physical actions, but was associated with VAT performance. VAT assesses the ability to link visual input with a specific action – launching a saccadic eye movement as rapidly as possible in response to shapes presented in peripheral vision. The predictive relationship between VAT performance and accident vulnerability was independent of age, visual acuity, and physical performance scores. Applied implications of these findings are considered.
ARTICLE | doi:10.20944/preprints202212.0221.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Computer vision; Deep learning; Image classification; Loss functions; Vision Transformers; Weather detection
Online: 13 December 2022 (02:30:49 CET)
There is great interest in automatically detecting road weather and understanding its impacts on the overall safety of the transport network. This can, for example, support road condition-based maintenance or even serve in detection systems that assist safe driving during adverse climate conditions. In computer vision, previous work has demonstrated the effectiveness of deep learning in predicting weather conditions from outdoor images. However, training deep learning models to accurately predict weather conditions using real-world road-facing images is difficult due to: (1) the simultaneous occurrence of multiple weather conditions; (2) the imbalanced occurrence of weather conditions throughout the year; and (3) road idiosyncrasies, such as road layouts, illumination, and road objects. In this paper, we explore the use of the focal loss function to force the learning process to focus on weather instances that are hard to learn, with the objective of helping to address data imbalance. In addition, we explore the attention mechanism for pixel-based dynamic weight adjustment to handle road idiosyncrasies using state-of-the-art vision transformer models. Experiments with a novel multi-label road weather dataset show that focal loss significantly increases the accuracy of computer vision approaches for imbalanced weather conditions. Furthermore, vision transformers outperform current state-of-the-art convolutional neural networks in predicting weather conditions, with a validation accuracy of 92% and an F1-score of 81.22%, which is impressive considering the imbalanced nature of the dataset.
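The focal loss referred to here is the standard formulation that down-weights well-classified examples; a minimal multi-label sketch in PyTorch (our illustration of the standard form, not the paper's code):

```python
# Minimal multi-label focal loss sketch (standard formulation; not the
# authors' code). Down-weights well-classified examples by (1 - p_t)^gamma
# so training focuses on hard, under-represented weather conditions.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # logits, targets: (N, num_labels); targets are floats in {0, 1}
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)        # prob. of the true label
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```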
ARTICLE | doi:10.20944/preprints202301.0490.v1
Subject: Biology, Animal Sciences & Zoology Keywords: navigation; behavior; proprioception; pectines; vision
Online: 27 January 2023 (06:24:23 CET)
Many sand scorpions are faithful to the burrows they dig; however, it is unknown how these animals get back home after hunting excursions. Of the many homing mechanisms that exist, path integration (PI) is one of the more common tools used by arachnids. In PI, an animal integrates its distance and direction while moving away from its home, enabling it to compute an approximate home-bound vector for the return trip. The objective of our study was to test whether scorpions use PI to return home in absolute darkness in the lab. We first allowed animals to establish burrows in homing arenas. Then, after a scorpion left its burrow, we recorded its location in the homing arena before transferring it to the center of a testing arena. We used overhead IR cameras to record its movements in the testing arena. If scorpions exhibited PI, we predicted they would follow a vector in the test arena that approximated the same angle and distance as from the capture point to their burrow in the home arena. However, under the conditions of this experiment, we found no evidence that scorpions moved along such home-bound vectors. We speculate that scorpions may need a reliable reference cue to perform path integration.
ARTICLE | doi:10.20944/preprints202207.0021.v1
Subject: Life Sciences, Biotechnology Keywords: artificial neural networks; biological neural networks; cortical prosthetic vision; machine vision; neuromorphic hardware; neuroprosthesis
Online: 1 July 2022 (17:01:32 CEST)
Sense element engagement theory explains how neural networks produce cortical prosthetic vision. A major prediction of the theory can be tested by developing a device which is expected to enable perception of continuous forms in altered visual geometries. The research reported here completes several essential steps in developing this device: (1) replication of simulations that are consistent with the theory using the NEST simulator, which can also be used for full-scale network emulation by a neuromorphic computer; (2) testing whether results consistent with the theory survive increasing the scale and duration of simulations; (3) establishing a method that uses numbers of spikes produced by network neurons to report the number of phosphenes produced by cortical stimulation; and (4) simulating essential functions of the prosthetic device. NEST simulations replicated early results and increasing their scale and duration produced results consistent with the theory. A decision function created using multinomial logistic regression correctly classified the expected number of phosphenes for 2080 spike number distributions for each of three sets of data, half of which arise from simulations expected to yield continuous visual forms on an altered visual geometry. A process for modulating electrical stimulation amplitude based on intermittent population recordings that is predicted to produce continuous visual forms was successfully simulated. The classification function developed using logistic regression will be used to tune this process as the scale of simulations is further increased.
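Step (4)'s decision function can be outlined as multinomial logistic regression mapping spike-count feature vectors to a phosphene-count class (a schematic with placeholder data, not the study's code):

```python
# Schematic of the decision function: multinomial logistic regression that
# classifies the number of phosphenes from spike-count feature vectors.
# Data are random placeholders, not the study's recordings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.poisson(lam=20, size=(600, 50)).astype(float)  # spike counts, 50 neurons
y = rng.integers(0, 3, size=600)                       # phosphene-count label

clf = LogisticRegression(max_iter=1000)  # lbfgs solver -> multinomial by default
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("predicted phosphene class:", clf.predict(X[:1]))
```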
BRIEF REPORT | doi:10.20944/preprints202207.0419.v1
Subject: Mathematics & Computer Science, Other Keywords: computer vision; deep learning; CoughNet model
Online: 27 July 2022 (10:01:54 CEST)
This work addresses two key problems in the identification of people infected with COVID-19: first, identification accuracy is not high enough; second, present identification methods, such as nucleic acid testing, are expensive in many countries. Methods: I therefore designed a fast identification method for COVID-19 patients based on deep learning. After the model (CoughNet) learned more than 6,000 cough spectrograms of both COVID-19 patients and normal people, the identification accuracy for COVID-19 patients versus normal people exceeded 99% on the test set. Structure: This paper is divided into three parts: the first part introduces the main background and research status; the second part introduces the research methods; the third part describes the specific process of the experiment.
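The input representation, a cough spectrogram, can be produced along these lines (a generic sketch using librosa; the file name and mel parameters are illustrative assumptions, as the report does not specify them):

```python
# Generic sketch of turning a cough recording into a (mel-)spectrogram for a
# CNN classifier such as the one described. File name and parameters are
# illustrative only.
import librosa
import numpy as np

y, sr = librosa.load("cough.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)  # dB scale, shape (128, frames)
print(log_mel.shape)  # fed to the network as a single-channel image
```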
ARTICLE | doi:10.20944/preprints202010.0167.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: daltonisation; colour vision deficiencies; anisotropic diffusion
Online: 8 October 2020 (09:33:55 CEST)
Daltonisation refers to the recolouring of images such that details normally lost by colour vision deficient observers become visible. This comes at the cost of introducing artificial colours. In a previous work, we presented a gradient-domain colour image daltonisation method that outperformed previously known methods both in behavioural and psychometric experiments. In the present paper, we improve the method by (i) finding a good first estimate of the daltonised image, thus reducing the computational time significantly, and (ii) introducing local linear anisotropic diffusion, thus effectively removing the halo artefacts. The method uses a colour vision deficiency simulation algorithm as an ingredient, and can thus be applied for any colour vision deficiency, and can even be individualised if the exact individual colour vision is known.
REVIEW | doi:10.20944/preprints202003.0076.v2
Online: 19 April 2020 (08:06:38 CEST)
Salamanders have been habitual residents of research laboratories for more than a century, and their history in science is tightly interwoven with vision research. Nevertheless, many vision scientists – even those working with salamanders – may be unaware of how much our knowledge about vision, and particularly the retina, has been shaped by studying salamanders. In this review, we take a tour through the salamander history in vision science, highlighting the main contributions of salamanders to our understanding of the vertebrate retina. We further point out specificities of the salamander visual system and discuss the perspectives of this animal system for future vision research.
REVIEW | doi:10.20944/preprints201811.0498.v1
Subject: Biology, Animal Sciences & Zoology Keywords: color vision; cone photoreceptors; opponency; retina
Online: 20 November 2018 (11:14:49 CET)
Vertebrate color vision is evolutionarily ancient. Jawless fish evolved four main spectral types of cone photoreceptor, almost certainly complemented by retinal circuits to process chromatic opponent signals. The subsequent evolution of photoreceptors and visual pigments is now documented for many vertebrate lineages and species, giving insight into evolutionary variation and ecological adaptation of color vision. We look at the organization of the photoreceptor mosaic and the functions of different types of cone in teleost fish, primates, birds, and reptiles. By comparison, less is known about the underlying neural processing. Here we outline the diversity of vertebrate color vision and summarize our understanding of how spectral information picked up by animal photoreceptor arrays is adapted to natural signals. We then turn to the question of how spectral information is processed in the retina. Here, the quite well known and comparatively ‘simple’ system of mammals such as mice and primates reveals some evolutionarily conserved features, such as the mammalian BlueON system, which compares short- and long-wavelength receptor signals. We then survey our current understanding of the more complex circuits of fish, amphibians, birds, and reptiles. Together, these clades make up more than 90% of vertebrate species, yet we know disturbingly little about their neural circuits for color vision beyond the photoreceptors. Here, long-standing work on goldfish, freshwater turtles, and other species is being complemented by new insights gained from the experimentally amenable retina of zebrafish. From this body of work, one thing is clear: the retinal basis of color vision in non-mammalian vertebrates is substantially richer than that of mammals. Diverse and complex spectral tunings are established at the level of the cone output via horizontal-cell feedforward circuits. From here, zebrafish use cone-selective wiring in bipolar cells to set up color-opponent synaptic layers in the inner retina, which in turn lead to a large diversity of color-opponent channels for transmission to the brain. However, while we are starting to build an understanding of the richness of spectral properties in some of these species’ retinal neurons, little is known about inner retinal connectivity and cell-type identity. To gain an understanding of their actual circuits, and thus to build a more generalised understanding of the vertebrate retinal basis of color vision, it will be paramount to expand ongoing efforts in deciphering the retinal circuits of non-mammalian models.
ARTICLE | doi:10.20944/preprints202206.0426.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: event-based vision; object detection and tracking; high-temporal resolution tracking; frame-based vision; hybrid approach
Online: 30 June 2022 (09:54:14 CEST)
Event-based vision is an emerging field of computer vision that offers unique properties such as asynchronous visual output, high temporal resolutions, and dependence on brightness changes to generate data. These properties can enable robust high-temporal-resolution object detection and tracking when combined with frame-based vision. In this paper, we present a hybrid, high-temporal-resolution, object detection and tracking approach, that combines learned and classical methods using synchronized images and event data. Off-the-shelf frame-based object detectors are used for initial object detection and classification. Then, event masks, generated per each detection, are used to enable inter-frame tracking at varying temporal resolutions using the event data. Detections are associated across time using a simple low-cost association metric. Moreover, we collect and label a traffic dataset using the hybrid sensor DAVIS 240c. This dataset is utilized for quantitative evaluation using state-of-the-art detection and tracking metrics. We provide ground truth bounding boxes and object IDs for each vehicle annotation. Further, we generate high-temporal-resolution ground truth data to analyze the tracking performance at different temporal rates. Our approach shows promising results with minimal performance deterioration at higher temporal resolutions (48 – 384 Hz) when compared with the baseline frame-based performance at 24 Hz.
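The "simple low-cost association metric" is in the spirit of intersection-over-union matching of boxes across time steps; a generic sketch of such matching (not necessarily the authors' exact metric):

```python
# Generic IoU-based association sketch (not necessarily the authors' exact
# metric): greedily match each track to the unused detection with highest IoU.
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, thresh=0.3):
    # tracks: {track_id: box}; returns list of (track_id, detection_index)
    matches, used = [], set()
    for tid, tbox in tracks.items():
        best, best_iou = None, thresh
        for j, dbox in enumerate(detections):
            score = iou(tbox, dbox)
            if j not in used and score > best_iou:
                best, best_iou = j, score
        if best is not None:
            matches.append((tid, best))
            used.add(best)
    return matches
```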
ARTICLE | doi:10.20944/preprints202007.0326.v1
Subject: Engineering, Control & Systems Engineering Keywords: mobile robot; vision-based navigation; cascade classifiers
Online: 15 July 2020 (09:16:44 CEST)
This work presents the development and implementation of a distributed navigation system based on computer vision. The autonomous system consists of a wheeled mobile robot with an integrated colour camera. The robot navigates through a laboratory scenario where the track and several traffic signals must be detected and recognized by using the images acquired with its on-board camera. The images are sent to a computer server that processes them and calculates the corresponding speeds of the robot using a cascade of trained classifiers. These speeds are sent back to the robot, which acts to carry out the corresponding manoeuvre. The classifier cascade should be trained before experimentation with two sets of positive and negative images. The number of images in these sets should be considered to limit the training stage time and avoid overtraining the system.
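On the recognition side, applying a trained cascade to incoming frames follows the usual OpenCV pattern (a generic sketch; the classifier file name is an illustrative placeholder):

```python
# Generic sketch of applying a trained cascade classifier to camera frames;
# the XML file name is an illustrative placeholder for the trained cascades.
import cv2

cascade = cv2.CascadeClassifier("traffic_sign_cascade.xml")
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    signs = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in signs:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
```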
ARTICLE | doi:10.3390/sci2010008
Online: 2 March 2020 (00:00:00 CET)
In recent years, a range of problems under the broad umbrella of computer vision based analysis of ancient coins have been attracting an increasing amount of attention. Notwithstanding this research effort, the results achieved by the state of the art in published literature remain poor and far from sufficiently well performing for any practical purpose. In the present paper we present a series of contributions which we believe will benefit the interested community. We explain that the approach of visual matching of coins, universally adopted in existing published papers on the topic, is not of practical interest because the number of ancient coin types exceeds by far the number of those types which have been imaged, be it in digital form (e.g., online) or otherwise (traditional film, in print, etc.). Rather, we argue that the focus should be on understanding the semantic content of coins. Hence, we describe a novel approach—to first extract semantic concepts from real-world multimodal input and associate them with their corresponding coin images, and then to train a convolutional neural network to learn the appearance of these concepts. On a real-world data set, we demonstrate highly promising results, correctly identifying a range of visual elements on unseen coins with up to 84% accuracy.
CONCEPT PAPER | doi:10.20944/preprints201910.0059.v1
Subject: Medicine & Pharmacology, Ophthalmology Keywords: myopia progression; environmental factors; vision care knowledge
Online: 7 October 2019 (10:55:03 CEST)
Importance: Because of the high prevalence of myopia in Taiwan, understanding the risk factors for its development and progression is important to public health. Background: This study investigated the risk factors for myopia and their influence on the progression of myopia in schoolchildren in Taiwan. Design: Patients’ clinical records were obtained retrospectively from ophthalmologists. Questionnaires were used to collect demographic information, family background, hours spent on daily activities, myopia progression, and treatment methods. Participants: A total of 522 schoolchildren with myopia from a regional medical hospital in northern Taiwan participated in the study. Written informed consent was obtained from participants of legal age or from their parents or legal guardians. Methods: Multivariable regression analyses were performed. Myopia measured in dioptres was analysed, controlling for patients’ family and demographic information as well as their daily behaviours. Main Outcomes and Results: Children with highly myopic parents were more myopic. An earlier age of myopia onset was associated with a higher level of myopia and greater annual myopic progression. Children reporting more near-work activities had higher levels of myopia and greater progression of myopia. Lower levels of myopia were associated with more exercise, longer periods of sleep, and better vision-care knowledge in children and parents. Intake of food supplements had no effect on myopia. Conclusions and Relevance: In addition to genetics, education, environment, and near-work activity can influence the development of myopia. Health policies for schoolchildren should promote protective activities and vision-care knowledge in order to protect the eyesight of schoolchildren.
ARTICLE | doi:10.20944/preprints201906.0105.v1
Subject: Biology, Plant Sciences Keywords: image analysis; machine learning; algorithms; computer vision
Online: 12 June 2019 (12:39:18 CEST)
Spike shape and morphometric characteristics are among the key characteristics of cultivated cereals associated with their productivity. Identification of the genes controlling these traits requires morphometric data at harvesting and the analysis of numerous plants, which could be done automatically using digital image analysis technologies. A method for wheat spike morphometry utilizing 2D image analysis is proposed. Digital images are acquired in two variants: a spike on a table (one projection) or fixed with a clip (four projections). The method identifies the spike and awns in the image and estimates their quantitative characteristics (area in image, length, width, circularity, etc.). A section model, quadrilaterals, and a radial model are proposed for describing spike shape. Parameters of these models are used to predict spike shape type (spelt, normal, or compact) by machine learning. The mean error in spike density prediction for the images in one projection is 4.61 (~18%) versus 3.33 (~13%) for the parameters obtained using four projections.
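The quantitative characteristics listed (area, length, width, circularity) follow directly from a segmented contour; a generic OpenCV sketch of that measurement step (not the paper's implementation, which differs):

```python
# Generic sketch of extracting spike morphometrics from a binary mask:
# area, length/width from a rotated bounding box, and circularity
# (4*pi*area / perimeter^2). Not the paper's implementation.
import cv2
import numpy as np

mask = cv2.imread("spike_mask.png", cv2.IMREAD_GRAYSCALE)  # segmented spike
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
spike = max(contours, key=cv2.contourArea)  # largest contour = the spike

area = cv2.contourArea(spike)
perimeter = cv2.arcLength(spike, closed=True)
(cx, cy), (w, h), angle = cv2.minAreaRect(spike)
length, width = max(w, h), min(w, h)
circularity = 4 * np.pi * area / perimeter ** 2

print(f"area={area:.0f}px length={length:.1f} width={width:.1f} "
      f"circularity={circularity:.3f}")
```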
ARTICLE | doi:10.20944/preprints201812.0232.v1
Subject: Engineering, Other Keywords: Computer vision, Data Augmentation, Fine-Tuning, Imagenet
Online: 19 December 2018 (07:57:03 CET)
In this paper, we leverage state-of-the-art models trained on the ImageNet dataset. We use the pre-trained models and their learned weights to extract features from the dog breed identification dataset. Afterwards, we apply fine-tuning and data augmentation to increase test accuracy in the classification of dog breeds. The performance of the proposed approaches is compared with state-of-the-art ImageNet models such as ResNet-50, DenseNet-121, DenseNet-169, and GoogleNet. We achieved 89.66%, 85.37%, 84.01%, and 82.08% test accuracy, respectively, which shows the superior performance of the proposed method over previous works on the Stanford Dogs dataset.
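The recipe is the standard transfer-learning one: load ImageNet weights, replace the classifier head, and fine-tune; a minimal PyTorch sketch (our illustration, not the paper's code):

```python
# Minimal transfer-learning sketch (illustration, not the paper's code):
# reuse ImageNet-pretrained ResNet-50 features and fine-tune a new head
# for the 120 Stanford Dogs classes.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in model.parameters():          # freeze the pretrained feature extractor
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 120)  # new head for 120 breeds

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    # One training step on a batch (images, labels).
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```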
ARTICLE | doi:10.20944/preprints201806.0449.v1
Subject: Engineering, Control & Systems Engineering Keywords: surface electromyography; computer vision; grasping; assistive robotics
Online: 27 June 2018 (15:01:06 CEST)
This paper presents a system that merges computer vision and surface electromyography techniques to carry out grasping tasks. To perform this, the vision-driven system is used to compute pre-grasping poses of the robotic system based on the analysis of tridimensional object features. Then, the human operator can correct the pre-grasping pose of the robot using surface electromyographic signals from the forearm during wrist flexion and extension. Weak wrist flexions and extensions allow a fine adjustment of the robotic system to grasp the object and, finally, when the operator considers that the grasping position is optimal, a strong flexion is performed to initiate the grasping of the object. The system has been tested with several subjects to check its performance, showing a grasping accuracy of around 95% of the attempted grasps, which increases by around 9% the grasping accuracy of previous experiments in which electromyographic control was not implemented.
ARTICLE | doi:10.20944/preprints201805.0297.v1
Subject: Engineering, Civil Engineering Keywords: bridge maintenance and inspection; UAVs; machine vision
Online: 22 May 2018 (10:09:37 CEST)
The economic development and infrastructure of a nation are closely interrelated. In addition, public trust in national infrastructure facilities is closely linked to the preservation of the advantages provided by these facilities to the public. Since the 1970s, Korea has achieved exponential economic growth over a short period of time and the number of infrastructure facilities has increased correspondingly. This compressed economic development has been underpinned by the national infrastructure, whose safety and usability have been excluded from the scope of the development. However, after around 30 years, structural deterioration coupled with general insensitivity to safety in today’s society has considerably reduced public trust in using the infrastructure. Realistically, policies that mainly focus on developing new technologies related to infrastructure construction have led to practical limitations that discourage the development of technologies for maintenance or inspection. Furthermore, current maintenance works face certain limitations caused by various reasons: insufficient budget, increasing number of infrastructure facilities requiring maintenance, shortage of manpower, and rapidly increasing number of aging infrastructure facilities. To overcome these limitations, a new approach is required that is different from general inspection methods under the existing rules and regulations. In this context, this study aimed to explore the efficiency of bridge inspection and maintenance by unmanned aerial vehicles (UAVs) that could observe inaccessible areas, could be conveniently and easily controlled, and could offer high economic benefits. To this end, various tests were performed on elevated bridges, and suitable UAV images were obtained. The obtained UAV images were inspected by using machine vision technology, thereby excluding subjective evaluations by humans. Methods for enhancing the objectivity of the inspection were also discussed. The test results showed that both the efficiency and objectivity of the proposed method were better than those of the existing bridge maintenance and inspection methods.
TECHNICAL NOTE | doi:10.20944/preprints202209.0380.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: UAV Technology; Information Processing; Machine Learning; IR Technology; Covid-19; Multi-modal machine learning; Machine vision; Computer vision
Online: 26 September 2022 (05:46:03 CEST)
Tracking and early identification of suspected cases are essential to control and prevent potential COVID-19 outbreaks. One of the most popular techniques used to track this disease is the use of infrared cameras to identify individuals with elevated body temperatures. However, these are limited by their inability to be implemented in open public settings such as public parks or outdoor recreational centers. This limits their ability to effectively track possible COVID-19 patients, as open public recreational places such as parks, concert venues, and other public venues are hotspots for the spreading of the virus. Other technological solutions, such as thermal scanners, require an individual to perform the actual testing, as they are not standalone technologies. This method of testing can potentially cause transmission of the virus between the tester and the individual being tested. As can be seen, an alternative solution is essential. In this study, we present the system, design, and potential scope of a non-invasive system that can diagnose and identify potential COVID-19 patients from thermal and optical images of the individual using drone technology. The proposed system (COVIDRONE) combines multi-modal machine intelligence, computer vision, and real-time monitoring to enable scalable monitoring. The system will also involve the use of machine learning algorithms for better and more accurate diagnosis. We envisage that the development of such technologies may help in developing technological solutions to combat infectious disease threats in future pandemics.
ARTICLE | doi:10.20944/preprints202206.0148.v1
Subject: Arts & Humanities, Architecture And Design Keywords: false-class inclusions; serendipity; machine vision; creativity; innovativeness
Online: 10 June 2022 (04:35:14 CEST)
In the mid-layers of deep learning systems, clustered features tend to fit multiple classifications, which are filtered out during the final stages of object recognition. However, many misclassifications remain and are regarded as errors of the system. This paper claims that tagging an entity incorrectly for reasons of similarity is evidence of spontaneous machine creativeness. According to the ratings of 40 design educators and researchers, AI-generated false-class inclusions produced creative design ideas, predicting the level of innovation value. These designers were not just anybody but came from a design school in Asia with a top position on the world ranking lists. They took part in an experiment in which 20 classification mistakes were framed as early-design ideas that were either human-made or intentionally suggested by creative AI. Many examples passed the Feigenbaum variant of the Turing test, with a conceptual preference for creations supposedly done by human hand.
ARTICLE | doi:10.3390/sci2010013
Online: 12 March 2020 (00:00:00 CET)
In this paper, our goal is to perform a virtual restoration of an ancient coin from its image. The present work is the first to propose this problem, and it is motivated by two key promising applications. The first of these emerges from the recently recognised dependence of automatic image-based coin type matching on the condition of the imaged coins; the algorithm introduced herein could be used as a pre-processing step aimed at overcoming the aforementioned weakness. The second application concerns the utility, both to professional and hobby numismatists, of being able to visualise and study an ancient coin in a state closer to its original (minted) appearance. To address the conceptual problem at hand, we introduce a framework which comprises a deep learning based method using Generative Adversarial Networks, capable of learning the range of appearance variation of different semantic elements artistically depicted on coins, and a complementary algorithm used to collect, correctly label, and prepare for processing a large number of images (here 100,000) of ancient coins needed to facilitate the training of the aforementioned learning method. Empirical evaluation performed on a withheld subset of the data demonstrates extremely promising performance of the proposed methodology and shows that our algorithm correctly learns the spectra of appearance variation across different semantic elements and, despite the enormous variability present, reconstructs the missing (damaged) detail while matching the surrounding semantic content and artistic style.
Subject: Behavioral Sciences, Cognitive & Experimental Psychology Keywords: eyetracking, eye movements, gaze, memory, retrieval, vision, aging
Online: 20 May 2019 (12:25:44 CEST)
Eye movements support memory encoding by binding distinct elements of the visual world into coherent representations. However, the role of eye movements in memory retrieval is less clear. We propose that eye movements play a functional role in retrieval by reinstating the encoding context. By overtly shifting attention in a manner that broadly recapitulates the spatial locations and temporal order of encoded content, eye movements facilitate access to, and reactivation of, associated details. Such mnemonic gaze reinstatement may be obligatorily recruited when task demands exceed cognitive resources, as is often observed in older adults. We review research linking gaze reinstatement to retrieval, describe the neural integration between the oculomotor and memory systems, and discuss implications for models of oculomotor control, memory, and aging.
ARTICLE | doi:10.20944/preprints201711.0021.v1
Subject: Engineering, Other Keywords: calibration; binocular vision sensor; unknown-sized elliptical stripe
Online: 2 November 2017 (17:37:06 CET)
Most of the existing calibration methods for a binocular stereo vision sensor (BSVS) depend on a high-accuracy target with feature points that is difficult to manufacture and costly. In complex light conditions, optical filters are used for BSVS, but they affect imaging quality. Hence, the use of a high-accuracy target with certain-sized feature points for calibration is not feasible under such complex conditions. To solve these problems, a calibration method based on unknown-sized elliptical stripe images is proposed. With known intrinsic parameters, the proposed method adopts the elliptical stripes located on parallel planes as a medium to calibrate BSVS online. In comparison with common calibration methods, the proposed method avoids the use of a high-accuracy target with certain-sized feature points. Therefore, the proposed method is not only easy to implement but also realistic for the calibration of BSVS with optical filters. Changing the size of the elliptical curves projected on the target overcomes the difficulty of applying the proposed method at different fields of view and distances. Simulative and physical experiments are conducted to validate the efficiency of the proposed method. When the field of view is approximately 400 mm × 300 mm, the proposed method can reach a calibration accuracy of 0.03 mm, which is comparable with that of Zhang’s method.
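One ingredient of such a method, extracting and fitting the elliptical stripes in each camera image, is a standard contour-and-ellipse-fitting operation; a generic sketch of that step (not the authors' implementation):

```python
# Generic sketch of extracting elliptical stripe contours and fitting
# ellipses (one ingredient of such a calibration; not the authors' code).
import cv2

img = cv2.imread("stripe_view.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)

for c in contours:
    if len(c) >= 5 and cv2.contourArea(c) > 100:  # fitEllipse needs >= 5 points
        (cx, cy), (major, minor), angle = cv2.fitEllipse(c)
        print(f"ellipse centre=({cx:.1f}, {cy:.1f}) axes=({major:.1f}, {minor:.1f})")
```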
ARTICLE | doi:10.20944/preprints201608.0186.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: active vision; the conformal camera; the Riemann sphere; Möbius geometry; complex projective geometry; projective Fourier transform; retinotopy; binocular vision; horopter
Online: 20 August 2016 (11:24:25 CEST)
Primate vision is an active process that constructs a stable internal representation of the 3D world based on 2D sensory inputs that are inherently unstable due to incessant eye movements. We present here a mathematical framework for processing visual information for a biologically-mediated active vision stereo system with asymmetric conformal cameras. This model utilizes the geometric analysis on the Riemann sphere developed in the group-theoretic framework of the conformal camera, thus far only applicable in modeling monocular vision. The asymmetric conformal camera model constructed here includes the fovea’s asymmetric displacement on the retina and the eye’s natural crystalline lens tilt and decentration, as observed in ophthalmological diagnostics. We extend the group-theoretic framework underlying the conformal camera to the stereo system with asymmetric conformal cameras. Our numerical simulation shows that the theoretical horopter curves in this stereo system are conics that well approximate the empirical longitudinal horopters of the primate vision system.
ARTICLE | doi:10.20944/preprints202210.0366.v1
Subject: Mathematics & Computer Science, Other Keywords: skin segmentation; skin detection; computer vision; digital image processing
Online: 24 October 2022 (12:50:24 CEST)
ARTICLE | doi:10.20944/preprints202204.0177.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: Plant disease; Machine vision; UAV; Smartphone; Convolutional Neural Network
Online: 19 April 2022 (07:44:29 CEST)
Stripe rust (caused by Puccinia striiformis f. sp. tritici) is one of the most devastating diseases of wheat and causes large-scale epidemics and severe yield loss. Applying fungicides during early epidemic development is crucial to controlling the disease but is often challenged by resource-limited human visual scouting. Deep learning has the potential to process images and videos captured by affordable devices to empower high-throughput phenotyping for early detection of stripe rust, enabling timely application of fungicides and improved control efficiency. Here, we developed RustNet, a neural network-based image classifier, for efficiently monitoring fields for stripe rust. RustNet was built on a ResNet-18 architecture pre-trained on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) dataset using transfer learning. RGB images and videos of multiple wheat fields with different wheat types (winter and spring wheat), conditions (irrigated and non-irrigated), and locations were acquired using smartphones or unmanned aerial vehicles near the canopy. A semi-automated image labeling approach was used to improve labeling efficiency by combining automated machine labeling and human correction. Cross-validations across multiple categories (sensor platforms, wheat types, and locations) achieved area under the curve (AUC) values from 0.72 to 0.87. Independent validation on a published dataset from Germany achieved accuracies ranging from 0.79 to 0.86. Visualization of the last convolutional layer of RustNet demonstrated the identification of pixels with stripe rust. RustNet is freely available at https://zzlab.net/RustNet.
ARTICLE | doi:10.20944/preprints202112.0349.v2
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: yak; semantic segmentation; binocular vision; body size; weight estimation
Online: 9 March 2022 (10:02:00 CET)
In order to address the labor-intensive and time-consuming measurement of yak body size and weight in the yak breeding industry of Qinghai Province, a non-contact method for measuring yak body size and weight is proposed in this work, and key technologies based on semantic segmentation, binocular ranging, and neural network algorithms are studied to boost the development of the industry. Main conclusions: (1) Yak foreground image extraction was studied, and a yak foreground extraction model based on the U-net algorithm was implemented; 2263 yak images were selected for the experiment, verifying that the accuracy of the model in yak image extraction is over 97%. (2) An algorithm for estimating yak body dimensions based on binocular vision was developed, combining the extraction of body-measurement points with the depth image to estimate body dimensions. The final test shows that the average estimation error of body height and body oblique length is 2.6%, and the average estimation error of chest depth is 5.94%. (3) A yak weight-prediction model was studied; the body height, body oblique length, and chest depth obtained by binocular vision were selected to estimate yak weight; two algorithms were used to establish the weight-prediction model, and the average estimation errors of the model for yak weight were verified to be 10.7% and 13.01%, respectively.
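The step from measurement points plus a depth image to a body dimension is essentially pinhole back-projection followed by a 3D distance; a tiny sketch (intrinsics and pixel values are placeholders, not the study's calibration):

```python
# Sketch of turning two measurement points plus depth into a body dimension:
# back-project pixels with the pinhole model, then take the 3D distance.
# Intrinsics and pixel values are placeholders, not the study's calibration.
import math

def backproject(u, v, z, fx=1200.0, fy=1200.0, cx=640.0, cy=360.0):
    # pinhole back-projection of pixel (u, v) at depth z to camera coordinates
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

withers = backproject(500, 200, z=3.1)  # e.g. top of the shoulder
hoof = backproject(510, 620, z=3.1)     # ground contact point below it
body_height = math.dist(withers, hoof)
print(f"estimated body height: {body_height:.2f} m")
```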
ARTICLE | doi:10.20944/preprints202111.0182.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: AHRS; Computer Vision; Dataset Acquisition; Deep Learning; Orientation Estimation.
Online: 9 November 2021 (14:35:21 CET)
The use of Attitude and Heading Reference Systems (AHRS) for orientation estimation is now common practice in a wide range of applications, e.g., robotics and human motion tracking, aerial vehicles and aerospace, gaming and virtual reality, indoor pedestrian navigation, and maritime navigation. The integration of high-rate measurements can provide very accurate estimates, but these can suffer from error accumulation due to sensor drift over longer time scales. To overcome this issue, inertial sensors are typically combined with additional sensors and techniques. As an example, camera-based solutions have drawn considerable attention from the community, thanks to their low cost and easy hardware setup; moreover, impressive results have been demonstrated in the context of deep learning. This work presents the preliminary results obtained by DOES, a supportive deep learning method specifically designed for maritime navigation, which aims at improving the roll and pitch estimations obtained by common AHRS. DOES recovers these estimations through the analysis of frames acquired by a low-cost camera pointed at the horizon at sea. The training has been performed on the novel ROPIS dataset, presented in this work and acquired using the FrameWO application developed for the purpose. Promising results encourage testing other network backbones and further expanding the dataset, improving the accuracy of the results and the range of applications of the method.
ARTICLE | doi:10.20944/preprints202010.0009.v1
Subject: Behavioral Sciences, Applied Psychology Keywords: visual search; vision loss; incidental learning; macular degeneration; fovea
Online: 1 October 2020 (09:12:00 CEST)
Foveal vision loss has been shown to reduce efficient visual search guidance by contextual cueing from incidentally learned contexts. However, previous studies used artificial (T-among-L) search paradigms that prevent the memorization of a target in a semantically meaningful scene. Here, we investigated contextual cueing in real-life scenes that allow explicit memory of target locations in semantically rich scenes. In contrast to the contextual cueing deficits found in artificial scenes, contextual cueing in patients with age-related macular degeneration (AMD) did not differ from that of age-matched normal-sighted controls. We discuss this in the context of visuospatial working memory demands, for which both eye-movement control in the presence of central vision loss and memory-guided search may compete. Memory-guided search in semantically rich scenes may depend less on visuospatial working memory than search in abstract displays, potentially explaining intact contextual cueing in the former but not the latter. In a practical sense, our findings may indicate that patients with AMD are less impaired than previous laboratory experiments would suggest. This shows the usefulness of realistic stimuli in experimental clinical research.
ARTICLE | doi:10.20944/preprints202009.0022.v1
Subject: Engineering, Control & Systems Engineering Keywords: Artificial neural network; image processing; machine vision; yield monitoring
Online: 2 September 2020 (03:21:02 CEST)
Precision agriculture is a technology used by farmers to help sustain food production amidst a growing population. One of the tools of precision agriculture is yield monitoring, which helps a farmer manage production. Yield monitoring is usually done during harvest; however, it can also be done early in the growing season. Early prediction of yield, specifically for fruit trees, aids the farmer in marketing their product and assists in managing production logistics such as labor requirements and storage needs. In this study, a machine vision system is developed to estimate fruit yield early in the season. The machine vision system uses a color camera to capture images of fruit trees during the full-bloom period. An image segmentation algorithm based on an artificial neural network was developed to recognize and count the blossoms on the tree. The artificial neural network segmentation algorithm uses color information and position as input. The resulting correlation between the blossom count and the actual number of fruits on the tree shows the potential of this method for early prediction of fruit yield.
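A per-pixel classifier fed with color and position, in the spirit of the described segmentation network, can be sketched as follows (illustrative feature layout and placeholder data, not the study's network):

```python
# Illustrative sketch of a per-pixel blossom classifier using color and
# position features, in the spirit of the described ANN segmentation
# (not the study's actual network or data).
import numpy as np
from sklearn.neural_network import MLPClassifier

def pixel_features(img):
    # img: (H, W, 3) RGB array -> features [R, G, B, x_norm, y_norm] per pixel
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.column_stack([img.reshape(-1, 3),
                            (xs / w).ravel(), (ys / h).ravel()])

# X, y would come from labelled pixels of training images via pixel_features;
# random placeholders are used here to keep the sketch self-contained.
rng = np.random.default_rng(2)
X = rng.random((2000, 5))
y = (X[:, 0] > 0.8).astype(int)  # placeholder labels: 1 = blossom, 0 = background

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, y)
# Blossom count ~ number of connected pixel clusters predicted as class 1.
```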
ARTICLE | doi:10.20944/preprints202006.0170.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: object detection; semantic segmentation; computer vision; automatic check-out
Online: 14 June 2020 (12:51:26 CEST)
Automatic checkout has received increasing attention in recent years; such a system automatically generates a shopping bill by identifying a picture of the products purchased by the customer. However, the system is challenged by a domain adaptation problem: each image of the training set contains only one commodity, whereas the test set contains collections of multiple commodities. The existing solution to this problem is to resynthesize the training images to augment the training set; the composite images are then rendered using CycleGAN to make the image distributions of the training set and the test set more similar. However, we find that the detection boxes given by the ground truth of the common dataset contain a large amount of background, and this area affects the training process as noise. To solve this problem, we propose a mask data priming method. Specifically, we rework the large-scale Retail Product Checkout (RPC) dataset, adding segmentation annotations to each item in the training-set images using pixel-level annotation of the original dataset. Secondly, a new network structure is proposed in which we train the network through joint learning of detectors and counters, and fine-tune the detection network by filtering suitable images from the test set. Experiments on the RPC dataset show that our method yields better results: our approach reached 81.87%, compared to 56.68% for the baseline, which demonstrates that pixel-level information helps to improve the detection results of the network.
ARTICLE | doi:10.20944/preprints201905.0243.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: Machine Vision; Morphological image filtering; Galvanic Industry; Rear-projection.
Online: 20 May 2019 (11:46:34 CEST)
In the fashion field, the use of electroplated small metal parts such as studs, clips, and buckles is widespread. The plating is often made of precious metal, such as gold or platinum. Due to the high cost of these materials, it is strategically relevant and of primary importance for manufacturers to avoid any waste by depositing only the strictly necessary amount of material. To this aim, companies need to know the overall number of items to be electroplated so that the parameters driving the galvanic process can be set properly. Accordingly, the present paper describes a Machine Vision-based method able to automatically count small metal parts arranged on a galvanic frame. The devised method relies on the definition of a proper acquisition system and on the development of image processing-based routines. Such a system is then implemented on a counting machine meant to be adopted in galvanic industrial practice to properly define a suitable set of working parameters (such as current, voltage, and deposition time) for the electroplating machine and, thereby, to assure the desired plating thickness on the one hand and to avoid material waste on the other.
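The counting routine itself reduces to thresholding, morphological clean-up, and connected-component labelling; a generic sketch of that chain (not the machine's production code):

```python
# Generic sketch of the counting step: threshold the back-lit frame image,
# clean it up morphologically, and count connected components. Not the
# machine's production code.
import cv2

img = cv2.imread("galvanic_frame.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # remove specks
clean = cv2.morphologyEx(clean, cv2.MORPH_CLOSE, kernel)   # fill small gaps

n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(clean)
min_area = 50  # ignore residual noise blobs
count = sum(1 for i in range(1, n_labels)
            if stats[i, cv2.CC_STAT_AREA] >= min_area)
print("items on frame:", count)
```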
ARTICLE | doi:10.20944/preprints201904.0175.v1
Subject: Engineering, Automotive Engineering Keywords: IMU; vision; classification networks; hough transform; lane markings detection
Online: 15 April 2019 (13:13:19 CEST)
It is challenging to achieve robust lane detection from a single frame in complicated scenarios. In order to detect more credible lane markings using sequential frames, a novel approach to fusing vision and an Inertial Measurement Unit (IMU) is proposed in this paper. The Hough space is employed as the space in which lane markings are stored, and it is calculated in three steps. First, a basic Hough space is extracted by the Hough transform, and primary line segments are extracted from it. In order to measure the likelihood that line segments belong to lane markings, a CNN-based classifier is introduced to transform the basic Hough space into a probabilistic space using the network's outputs. However, this probabilistic Hough space based on a single frame is easily disturbed. In the third step, a filtering process is employed to smooth the probabilistic Hough space using sequential information. Pose information provided by the IMU is applied to align Hough spaces extracted at different times with each other. The final Hough space is used to eliminate line segments with low likelihood and output those with high confidence as the result. Experiments demonstrate that the proposed approach achieves good performance.
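The first step, extracting candidate line segments with the probabilistic Hough transform, looks roughly like this in OpenCV (a generic sketch, not the authors' pipeline):

```python
# Generic sketch of the first step: extract candidate line segments with
# the probabilistic Hough transform (not the authors' pipeline).
import cv2
import numpy as np

frame = cv2.imread("road_frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                           minLineLength=40, maxLineGap=10)
# Each segment is a lane-marking candidate; a classifier would then score
# how likely it is to belong to an actual lane marking.
for x1, y1, x2, y2 in (segments.reshape(-1, 4) if segments is not None else []):
    cv2.line(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
```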
ARTICLE | doi:10.20944/preprints201903.0091.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: image analysis; pattern recognition; algorithms; computer vision; wheat spike
Online: 7 March 2019 (12:06:15 CET)
Spike shape and morphometric characteristics are among the key characteristics of cultivated cereals associated with their productivity. Identification of the genes controlling these traits requires morphometric data harvesting and analysis for numerous plants, which can be automated using digital image analysis. A method for wheat spike morphometry utilizing 2D image analysis is proposed. Digital images are acquired in two variants: a spike on a table (one projection) or fixed with a clip (four projections). The method identifies the spike and awns in the image and estimates their quantitative characteristics (area in image, length, width, circularity, etc.). Section, quadrilateral, and radial models are proposed for describing spike shape. Parameters of these models are used to predict spike shape type (spelt, normal, or compact) by machine learning. The mean error in spike density prediction for the images in one projection is 4.61 (~18%) versus 3.33 (~13%) for the parameters obtained using four projections. The F1 measure in automated spike classification into three types is 0.78 using logistic regression (one projection) and 0.85 using the random forest method (four projections). The proposed method is implemented in Java; example images and a user guide are available at http://wheatdb.org/werecognizer.
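One of the listed descriptors, circularity, can be computed from a segmented spike contour as 4*pi*area/perimeter^2 (1.0 for a perfect circle). The sketch below assumes a binary mask is already available; it is an illustration, not the paper's Java implementation.

```python
import cv2
import numpy as np

mask = cv2.imread("spike_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical binary mask
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
spike = max(contours, key=cv2.contourArea)    # keep the largest object

area = cv2.contourArea(spike)
perimeter = cv2.arcLength(spike, closed=True)
circularity = 4 * np.pi * area / perimeter ** 2
print(f"area={area:.0f} px, circularity={circularity:.3f}")
```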
ARTICLE | doi:10.20944/preprints202202.0204.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: computer vision; image processing; medication adherence; object detection; pill detection
Online: 17 February 2022 (08:45:14 CET)
Objective tools to track medication adherence are lacking. We developed a tool to monitor pill intake that can be implemented in mHealth apps without the need for additional devices. The proposed pill intake detection tool uses digital image processing to analyze images of a blister to detect the presence of pills. The tool uses the circular Hough transform as a feature extraction technique and is therefore primarily useful for the detection of pills with a round shape. The tool is composed of two steps: first, the registration of a full blister and the storing of reference values in a local database; second, the detection and classification of taken and remaining pills in similar blisters, to determine the actual number of untaken pills. In the registration of round pills in full blisters, 100% of pills in gray blisters or blisters with a transparent cover were successfully detected. In counting untaken pills in partially opened blisters, 95.2% of remaining and 95.1% of taken pills were detected in gray blisters, while 88.2% of remaining and 80.8% of taken pills were detected in blisters with a transparent cover. The proposed tool provides promising results for the detection of round pills. However, the classification of taken and remaining pills needs to be further improved, in particular for the detection of pills with non-oval shapes.
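The circular Hough transform step can be sketched with OpenCV's HoughCircles, as below; all parameter values and the input file are illustrative assumptions rather than the tool's actual settings.

```python
import cv2

img = cv2.imread("blister.jpg")   # hypothetical blister photo
gray = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)

# Detect round pills as circles; radii and thresholds are illustrative.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
                           param1=100, param2=30, minRadius=10, maxRadius=40)
count = 0 if circles is None else circles.shape[1]
print(f"pills detected: {count}")
```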
ARTICLE | doi:10.20944/preprints202110.0020.v2
Subject: Arts & Humanities, Media Studies Keywords: journalism; artificial intelligence; computer science; machine learning; computer vision; NLP.
Online: 21 December 2021 (14:01:16 CET)
In recent years, news media has been greatly disrupted by the potential of technology-driven approaches to the creation, production, and distribution of news products and services. Artificial intelligence (AI) has emerged from the realm of science fiction and has become a very real tool that can aid society in addressing many issues, including the challenges faced by the news industry. The ubiquity of computing has become apparent and has demonstrated the different approaches that can be achieved using AI. We analyzed the news industry's AI adoption based on seven subfields of AI: (i) machine learning; (ii) computer vision (CV); (iii) speech recognition; (iv) natural language processing (NLP); (v) planning, scheduling, and optimization; (vi) expert systems; and (vii) robotics. Our findings suggest that three subfields are being developed most in the news media: machine learning, computer vision, and planning, scheduling, and optimization. Other areas have not been fully deployed in the journalistic field. Most AI news projects rely on funds from tech companies such as Google, which limits AI's potential to a small number of players in the news industry. We conclude by providing examples of how these subfields are being developed in journalism and present an agenda for future research.
ARTICLE | doi:10.20944/preprints202105.0047.v1
Subject: Social Sciences, Accounting Keywords: Textual analysis; Media; Correspondence analysis; Wavelet thresholding; KSA-2030 Vision
Online: 5 May 2021 (12:28:53 CEST)
In the present paper, we propose a wavelet method to study the impact of electronic media on economic situations. We apply wavelet techniques, versus classical methods, to analyze economic indices in the market. The technique consists of first filtering the data of imprecise components (noise) to construct a wavelet-denoised contingency table. Next, a thresholding procedure is applied to this table to extract the essential carriers of information. The resulting tables are finally subjected to correspondence analysis before and after thresholding. As a case study, we are empirically concerned with the 2030 KSA vision in electronic and social media. The effects of electronic media texts about the 2030 Vision on the Saudi and global economy have been studied. Recall that the Saudi market is the most important representative market in the GCC region; it has both regional and worldwide influence on economies and, besides, is characterized by many political, economic, and financial movements such as the worldwide economic NEOM project. The findings provided in the present paper may be applied to predict the future situation of GCC markets and thus may serve as a basis for investors' decisions in such markets.
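The denoise-then-threshold idea can be sketched with the PyWavelets package: decompose the contingency table, soft-threshold the detail coefficients, and reconstruct. The wavelet, level, and threshold below are illustrative assumptions, not the paper's choices.

```python
import numpy as np
import pywt

# Hypothetical contingency table of text-category counts.
table = np.random.poisson(5.0, (12, 8)).astype(float)

# 2-D wavelet decomposition, soft thresholding of detail coefficients,
# then reconstruction of the denoised table.
coeffs = pywt.wavedec2(table, "haar", level=2)
thr = 0.5
denoised = [coeffs[0]] + [
    tuple(pywt.threshold(c, thr, mode="soft") for c in level)
    for level in coeffs[1:]
]
clean = pywt.waverec2(denoised, "haar")
```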
Subject: Engineering, Biomedical & Chemical Engineering Keywords: visual cortical prosthesis; brain-machine interface; electrical stimulation; prosthetic vision
Online: 23 March 2021 (10:42:30 CET)
Electrical stimulation of the visual cortices has the potential to restore vision to blind individuals. Until now, the results of visual cortical prosthetics have been limited, as no prosthesis has restored fully functional vision, but the field has shown renewed interest in recent years thanks to wireless and technological advances. However, several scientific and technical challenges remain open before the therapeutic benefit expected from these new devices can be achieved. One of the main challenges is the electrical stimulation of the brain itself. In this review, we analyze the results of electrode-based visual cortical prosthetics from the electrical point of view. We first briefly describe what is known about the electrode-tissue interface and the safety of electrical stimulation. Then we focus on the psychophysics of prosthetic vision and the state of the art on the interplay between electrical stimulation of the visual cortex and phosphene perception. Lastly, we discuss the challenges and perspectives of visual cortex electrical stimulation and electrode array design for developing the new generation of implantable cortical visual prostheses.
ARTICLE | doi:10.20944/preprints202102.0146.v1
Subject: Earth Sciences, Atmospheric Science Keywords: SO2 emissions; computer vision; time-averaged dispersion model; CrIS; JPSS
Online: 4 February 2021 (21:53:35 CET)
Long-term continuous time series of SO2 emissions are considered critical elements of both volcano monitoring and basic research into processes within magmatic systems. One highly successful framework for computing these fluxes involves reconstructing a representative time-averaged SO2 plume from which to estimate the SO2 source flux. Previous methods within this framework have used ancillary wind datasets from reanalysis or numerical weather prediction (NWP) to construct the mean plume, and then again as a constrained parameter in the fitting. Additionally, traditional SO2 datasets from ultraviolet (UV) sensors lack altitude information, which must be assumed in order to correctly calibrate the SO2 data and to capture the appropriate NWP wind level; this can be a significant source of error. We have made novel modifications to this framework which do not rely on prior knowledge of the winds and therefore do not inherit errors associated with NWP winds. To perform the plume rotation, we modify a rudimentary computer vision algorithm designed for object detection in medical imaging to detect plume-like objects in gridded SO2 data. We then fit a solution to the general time-averaged dispersion of SO2 from a point source. We demonstrate these techniques using SO2 data generated by a newly developed probabilistic layer height and column loading algorithm designed for the Cross-track Infrared Sounder (CrIS), a hyperspectral infrared sensor aboard the Joint Polar Satellite System's Suomi-NPP and NOAA-20 satellites. This SO2 data source is best suited to flux estimates at high-latitude volcanoes and at low-latitude but high-altitude volcanoes. Of particular importance, IR SO2 data can fill an important gap in the UV-based record: estimating SO2 emissions from high-latitude volcanoes through the polar winters, when there is insufficient solar backscatter for UV sensors to be used.
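One step of such a workflow, estimating a plume's principal orientation from gridded SO2 data via weighted image moments and then rotating the grid to roughly align the plume with an image axis, can be sketched as below. The synthetic field and threshold are illustrative; this is not the authors' detection algorithm.

```python
import numpy as np
from scipy import ndimage

# Synthetic gridded SO2 column loading with a diagonal plume.
yy, xx = np.mgrid[0:200, 0:200]
so2 = np.exp(-((xx - yy) ** 2) / 500 - ((xx + yy - 200) ** 2) / 8000)

# Weighted second central moments give the plume's major-axis orientation.
ys, xs = np.nonzero(so2 > 0.1)
w = so2[ys, xs]
cx, cy = np.average(xs, weights=w), np.average(ys, weights=w)
mu20 = np.average((xs - cx) ** 2, weights=w)
mu02 = np.average((ys - cy) ** 2, weights=w)
mu11 = np.average((xs - cx) * (ys - cy), weights=w)
theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

aligned = ndimage.rotate(so2, np.degrees(theta), reshape=False)
```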
ARTICLE | doi:10.20944/preprints202008.0336.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: image processing; image classification; computer vision; expert systems; amber gemstones
Online: 15 August 2020 (04:39:11 CEST)
The article describes a classification solution for amber stones. The problem of classifying amber has long been known among jewelers and artisans of amber art. Existing solutions can classify amber pieces according to color, but the need to classify by shape and texture has not been met until now. The proposed solution is capable of classifying the gemstones according to shape. Amber can be considered a challenging object, since its form is difficult to define unambiguously. Data for the amber experiments was gathered from amber craftsmen. In the proposed solution, amber form can be classified into 10 different classes (7 classes were chosen for the experiment).
ARTICLE | doi:10.3390/sci2010018
Subject: Keywords: colour words; hue histogram; colour representation; machine learning; computer vision
Online: 24 March 2020 (00:00:00 CET)
Ancient numismatics, that is, the study of ancient currencies (predominantly coins), is an interesting domain for the application of computer vision and machine learning, and has been receiving an increasing amount of attention in recent years. Notwithstanding the number of articles published on the topic, the variety of different methodological approaches described, and the mounting realisation that the relevant problems in the field are most challenging indeed, all research to date has entirely ignored one specific, readily accessible modality: colour. Invariably, colour is discarded and images of coins are treated as greyscale. The present article is the first to question this decision (and indeed, it is a decision). We discuss the reasons behind the said choice, present a case why it ought to be reexamined, and in turn investigate the issue for the first time in the published literature. Specifically, we propose two new colour-based representations specifically designed to be applied to ancient coin analysis, and argue why it is sensible to employ them in the first stages of the classification process as a means of drastically reducing the initially enormous number of classes involved in type matching ancient coins (tens of thousands, just for Ancient Roman Imperial coins). Furthermore, we introduce a new data set collected with the specific aim of denomination-based categorisation of ancient coins, where we hypothesised colour could be of potential use, and evaluate the proposed representations. Lastly, we report surprisingly successful performances which go further than confirming our hypothesis: they convincingly demonstrate a much higher relevant information content carried by colour than even we expected. We therefore trust that our findings will be noted by others in the field and that more attention and further research will be devoted to the use of colour in automatic ancient coin analysis.
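A hue-histogram representation of the kind proposed can be sketched in a few lines of OpenCV; the bin count, the foreground mask, and the file name below are illustrative assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("coin.jpg")                       # hypothetical coin image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Hypothetical foreground mask; in practice the coin would be segmented first.
mask = np.full(img.shape[:2], 255, dtype=np.uint8)

hist = cv2.calcHist([hsv], [0], mask, [32], [0, 180])  # 32 hue bins
hist = hist.ravel() / hist.sum()                       # normalise to a distribution
```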
ARTICLE | doi:10.20944/preprints201911.0168.v1
Subject: Behavioral Sciences, Cognitive & Experimental Psychology Keywords: fine motor precision; vision; proprioception; sex differences; individual differences; personality
Online: 15 November 2019 (03:46:22 CET)
Previous studies have reported certain sex differences in motor performance precision. The aim of the present study was to analyse sex differences in fine motor precision performance for both hands under different test conditions. 220 Spanish participants (ages 12-95) performed fine motor tasks (tracing 40 mm lines over provided models) with both hands, under two sensory conditions (PV: proprioceptive-visual; P: proprioceptive only) and three movement types (F: frontal, T: transversal, S: sagittal). Differences in line length (the task focused on precision) were examined through MANOVA for all test conditions, both sexes, and different age groups. Sex differences in precision were observed in the F and T movement types (statistical significance and higher Cohen's d were observed in the condition with vision). No statistically significant differences were observed for either hand or sensory condition in the sagittal type. Sex differences in fine motor precision were observed more frequently in the PV sensory condition and in the frontal movement type, and less in the sagittal one.
ARTICLE | doi:10.20944/preprints201801.0195.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: spacecraft; structure from motion; monocular vision; component detection; structure analysis
Online: 22 January 2018 (05:11:39 CET)
A monocular vision pose estimation and identification algorithm for use on a small spacecraft for future orbital servicing is studied in this paper. A tracker spacecraft equipped with a short-range vision system is proposed to recover the 3D structural model of a space target in orbit and automatically identify its solar panels and main body using only visual information from an onboard camera. The proposed reconstruction and identification framework is tested using structure-from-motion and point cloud identification methods. The Efficient Perspective-n-Point (EPnP) algorithm is used for pose estimation. Triangulated points are used for component segmentation by means of orientation histogram descriptors. Experimental results based on laboratory images of a spacecraft model show the effectiveness and robustness of our approach.
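The EPnP pose estimation step is available directly in OpenCV, as sketched below with placeholder correspondences and assumed camera intrinsics; real model points and detected image points would replace the random arrays.

```python
import cv2
import numpy as np

# Placeholder 3D model points and their 2D image projections (>= 4 needed).
object_points = np.random.rand(6, 3).astype(np.float32)
image_points = np.random.rand(6, 2).astype(np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None,
                              flags=cv2.SOLVEPNP_EPNP)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # rotation of the target relative to the camera
    print("rotation:\n", R, "\ntranslation:\n", tvec.ravel())
```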
ARTICLE | doi:10.20944/preprints201705.0170.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: accuracy; depth data; RMS error; 3D vision sensors; stereo disparity
Online: 23 May 2017 (09:20:27 CEST)
We propose an approach for estimating the error in depth data provided by generic 3D sensors, modern devices capable of generating an image (RGB data) and a depth map (distance) or other similar 2.5D structure (e.g. stereo disparity) of the scene. Our approach starts by capturing images of a checkerboard pattern devised for the method. It then proceeds with the construction of a dense depth map using functions that generally come with the device SDK (based on disparity or depth). The 2D processing of the RGB data is performed next to find the checkerboard corners. Clouds of corner points are finally created (in 3D), over which an RMS error estimate is computed. We have built a multi-platform system, and its verification and evaluation have been done using the development kit of the nVIDIA Jetson TK1 board with the MS Kinects v1/v2 and the Stereolabs ZED camera. The main contribution is thus an error determination procedure that does not need any data set or benchmark, relying only on data acquired on-the-fly. With a simple checkerboard, our approach is able to determine the error for any device. The envisioned application is 3D reconstruction for robotic vision, with a series of 3D vision sensors mounted on robots (UAVs of the quadcopter type and terrestrial robots) for high-precision map construction, which can be used for sensing and monitoring.
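One reasonable variant of the error computation can be sketched as follows: detect checkerboard corners in the RGB image, back-project each corner through assumed intrinsics using the sensor's depth, and report the RMS distance of the 3D corner cloud to a fitted plane. The intrinsics, file names, and the plane-based RMS are assumptions for illustration; the paper's exact RMS formulation may differ.

```python
import cv2
import numpy as np

rgb = cv2.imread("board.png")                 # hypothetical capture
depth = np.load("depth.npy")                  # depth in metres, aligned to rgb
fx = fy = 525.0; cx, cy = 319.5, 239.5        # assumed intrinsics

found, corners = cv2.findChessboardCorners(
    cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY), (9, 6))
assert found

# Back-project each corner to 3D using the pinhole model.
pts = []
for u, v in corners.reshape(-1, 2):
    z = depth[int(v), int(u)]
    pts.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
pts = np.array(pts)

# Fit a plane via SVD and report the RMS point-to-plane distance.
centroid = pts.mean(axis=0)
_, _, vt = np.linalg.svd(pts - centroid)
normal = vt[2]
rms = np.sqrt(np.mean(((pts - centroid) @ normal) ** 2))
print(f"RMS planarity error: {rms * 1000:.2f} mm")
```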
ARTICLE | doi:10.20944/preprints201611.0034.v1
Subject: Engineering, Other Keywords: blossoms; digital image processing; machine vision; peaches; unmanned aerial system
Online: 7 November 2016 (05:18:19 CET)
One of the tools for optimal crop production is regular monitoring and assessment of crops. During the growing season of fruit trees, the bloom period shows increased photosynthetic rates that correlate with the fruiting process. This paper presents the development of an image processing algorithm to detect peach blossoms on trees. Images of an experimental peach orchard were acquired from the Parma Research and Extension Center of the University of Idaho using an off-the-shelf unmanned aerial system (UAS) equipped with a multispectral camera (near-infrared, green, blue). The orchard has different stone fruit varieties and different plant training systems. Images of individual trees (high-resolution) and of arrays of trees (low-resolution) were acquired to evaluate the detection capability. The image processing algorithm was based on different vegetation indices. Initial results showed that the algorithm could detect peach blossoms and demonstrated good potential as a monitoring tool for orchard management.
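Since the specific vegetation indices are not spelled out in the abstract, the sketch below shows a generic normalised-difference index computed from the NIR and green bands, with an illustrative threshold for the bright blossom pixels.

```python
import numpy as np

# Placeholder NIR and green bands from the multispectral camera.
nir = np.random.rand(480, 640)
green = np.random.rand(480, 640)

ndi = (nir - green) / (nir + green + 1e-9)  # generic normalised difference index
blossom_mask = ndi < -0.2                   # illustrative threshold: blossoms are
                                            # bright in the visible bands
print("candidate blossom pixels:", int(blossom_mask.sum()))
```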
REVIEW | doi:10.20944/preprints202208.0313.v3
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Convolutional Neural Network; domain; natural language processing; computer vision; semantic parsing
Online: 18 August 2022 (07:39:33 CEST)
Convolutional neural networks (CNNs), a class of artificial neural networks (ANNs), are attracting the interest of researchers across research domains. CNNs were invented for computer vision, and they have also been shown to be useful for semantic parsing, sentence modeling and other natural language processing tasks. In this paper we discuss the basics of CNN models and their scope, to provide a reference/baseline for researchers interested in using CNN models in their research.
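As a companion to such a reference, a minimal CNN with the basic building blocks (convolution, nonlinearity, pooling, and a fully connected classifier) can be written in a few lines of PyTorch; the layer sizes below are arbitrary.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Two conv/pool stages followed by a linear classifier."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # one 32x32 RGB image
```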
ARTICLE | doi:10.20944/preprints202108.0282.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Classification of insulators; Electrical power system; k-Nearest neighbors; Computer vision.
Online: 13 August 2021 (11:45:50 CEST)
Contamination on insulators may increase their surface conductivity and, as a consequence, electrical discharges occur more frequently, which can lead to interruptions in the power supply. To maintain the reliability of the electrical distribution system, components that have lost their insulating properties must be replaced. Identifying the components that need maintenance is a difficult task, as there are several levels of contamination that are hardly noticed during inspections. To improve the quality of inspections, this paper proposes the k-nearest neighbours (k-NN) algorithm to classify the levels of insulator contamination, based on images of insulators at various levels of contamination simulated in the laboratory. Computer vision features such as mean, variance, asymmetry, kurtosis, energy, and entropy are used for training the k-NN. To assess the robustness of the proposed approach, a statistical analysis and a comparative assessment with well-consolidated algorithms such as decision tree, ensemble subspace, and support vector machine models are presented. The k-NN showed results of up to 85.17% accuracy using the k-fold cross-validation method, with an average accuracy higher than 82% for multi-class classification of insulator contamination, superior to the compared models.
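The feature set named above is straightforward to compute; the sketch below derives the six statistics from a grayscale image and its histogram and trains a scikit-learn k-NN on synthetic placeholder data. The bin count and the dummy data are assumptions for illustration.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.neighbors import KNeighborsClassifier

def image_features(gray):
    """Mean, variance, asymmetry (skewness), kurtosis, energy, entropy."""
    x = gray.astype(np.float64).ravel()
    counts, _ = np.histogram(x, bins=256, range=(0, 256))
    p = counts / counts.sum()
    p = p[p > 0]
    return [x.mean(), x.var(), skew(x), kurtosis(x),
            np.sum(p ** 2),           # energy
            -np.sum(p * np.log2(p))]  # entropy

# Dummy training data: 20 random "insulator images", 4 contamination levels.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (64, 64)) for _ in range(20)]
labels = rng.integers(0, 4, 20)
X = np.array([image_features(im) for im in images])

knn = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
print(knn.predict(X[:3]))
```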
DATASET | doi:10.20944/preprints202005.0345.v2
Subject: Earth Sciences, Atmospheric Science Keywords: computer vision; deep learning; Earth Engine; remote sensing; renewable energy; Tensorflow
Online: 21 July 2021 (14:53:14 CEST)
We have an unprecedented ability to map the Earth’s surface as deep learning technologies are applied to an abundance of high-frequency Earth observation data. Simple, free, and effective methods are needed to enable a variety of stakeholders to use these tools to improve scientific knowledge and decision making. Here we present a trained U-Net model that can map and delineate ground mounted solar arrays using publicly available Sentinel-2 imagery, and that requires minimal data pre-processing and no feature engineering. By using label overloading and image augmentation during training, the model is robust to temporal and spatial variation in imagery. The trained model achieved a precision and recall of 91.5% each and an intersection over union of 84.3% on independent validation data from two distinct geographies. This generalizability in space and time makes the model useful for repeatedly mapping solar arrays. We use this model to delineate all ground mounted solar arrays in North Carolina and the Chesapeake Bay watershed to illustrate how these methods can be used to quickly and easily produce accurate maps of solar infrastructure.
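The reported metrics for a binary segmentation map can be computed as below; the random masks stand in for model output and ground truth.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Precision, recall, and intersection over union for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    precision = tp / pred.sum()
    recall = tp / truth.sum()
    iou = tp / np.logical_or(pred, truth).sum()
    return precision, recall, iou

# Hypothetical masks: 1 = solar array pixel, 0 = background.
pred = np.random.rand(256, 256) > 0.5
truth = np.random.rand(256, 256) > 0.5
print(segmentation_metrics(pred, truth))
```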
REVIEW | doi:10.20944/preprints202103.0449.v1
Subject: Medicine & Pharmacology, Allergology Keywords: vision rehabilitation; review of systems; traumatic brain injury; concussion; patient advocacy.
Online: 17 March 2021 (16:05:12 CET)
Treating a patient with traumatic brain injury requires an interdisciplinary approach because of the pervasive, profound and protean manifestations of this condition. In this review, key aspects of the medical history and review of systems will be described in order to highlight how the role of any provider must evolve to become a better patient advocate. Although this review is written from the vantage point of a vision care provider, it is hoped that patients, caregivers and providers will recognize the need for the team approach; it truly takes a village.
ARTICLE | doi:10.20944/preprints202102.0152.v1
Subject: Medicine & Pharmacology, Allergology Keywords: color vision deficiency; medical students; ishihara plates; humans; incidence; prevalence; frequency
Online: 5 February 2021 (09:58:31 CET)
Introduction: Color vision deficiency (CVD) constitutes one of the frequently observed eye disorders in all human populations. Color is a prominent sign utilized in the medical profession to study and identify histopathological specimens and lab instruments and in patient examination. Color deficiency affects the skills of medical students, resulting in poor clinical examination and color appreciation. There is no effective screening for CVD at any level of the medical profession; hence, this study aimed to determine the prevalence of CVD among medical students. Materials and methods: This was a cross-sectional study conducted over a period of six months, from September 2019 to February 2020, in Karachi, Pakistan. All medical students aged 18-21 years of either gender enrolled in the first and second years of medical college were included. The examination was performed during daylight. Ishihara plates were placed at a distance of 75 cm from the subject and tilted so that the plane of the paper lay perpendicular to the line of vision. Students were given five seconds to read each plate and one examiner was instructed to mark the checklist. A score of less than 12 out of 14 red/green test plates (not including the demonstration plate) was considered a CVD. All statistical analysis was performed using Statistical Package for Social Sciences version 20.0 (Armonk, NY: IBM Corp). Results: The mean age of the medical students was 19.61 ± 1.22 years. There were 53.0% females (n=123) and 47.0% males (n=111). Most of the medical students (n=131, 56.0%) belonged to the upper-middle-class socioeconomic group. CVD was observed in 6.0% (n=13) of medical students. Age (p=0.001) and socioeconomic status (p=0.001) were the only demographic factors significantly associated with color deficiency. Conclusions: Color deficiency, although an unnoticed concern, is fairly common among medical students. Medical students must be screened for CVD, as this will enable them to be aware of their limitations in their future observational skills as doctors and to devise ways of overcoming them in clinical practice.
ARTICLE | doi:10.20944/preprints202008.0487.v1
Subject: Social Sciences, Geography Keywords: Twitter; data reliability; risk communication; data mining; Google Cloud Vision API
Online: 22 August 2020 (02:32:40 CEST)
While Twitter has been touted as providing up-to-date information about hazard events, the reliability of tweets is still a concern. Our previous publication extracted relevant tweets containing information about the 2013 Colorado flood event and its impacts. Using the relevant tweets, this research further examined the reliability (accuracy and trueness) of the tweets by examining the text and image content and comparing them to other publicly available data sources. Both manual identification of text information and automated extraction of image information (Google Cloud Vision API) were implemented to balance accurate information verification with efficient processing time. The results showed that both the text and the images contained useful information about damaged/flooded roads and street networks. This information can help emergency response coordination efforts and inform the allocation of resources when enough tweets contain geocoordinates or location/venue names. This research will help identify reliable crowdsourced risk information to enable near-real-time emergency response through better use of crowdsourced risk communication platforms.
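The automated image analysis step can be sketched with the Google Cloud Vision Python client as below; it assumes application credentials are already configured in the environment, and the file name is hypothetical.

```python
from google.cloud import vision

# Label extraction for one tweet image; credentials are assumed to be set
# via GOOGLE_APPLICATION_CREDENTIALS or similar.
client = vision.ImageAnnotatorClient()
with open("tweet_image.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)  # e.g. "Flood" 0.93
```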
ARTICLE | doi:10.20944/preprints201704.0130.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: intelligent robotics; flexibility; reusability; multisensor; state machine; software architecture; computer vision
Online: 20 April 2017 (04:14:33 CEST)
This paper presents a state machine-based architecture which enhances the flexibility and reusability of industrial robots, more concretely dual-arm multisensor robots. The proposed architecture, in addition to allowing absolute control of the execution, eases the programming of new applications by increasing the reusability of the developed modules. Through an easy-to-use graphical user interface, operators are able to create, modify, reuse and maintain industrial processes, increasing the flexibility of the cell. Moreover, the proposed approach is applied in a real use case in order to demonstrate its capabilities and feasibility in industrial environments. A comparative analysis is presented evaluating the presented approach against traditional robot programming techniques.
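A state-machine execution core of the kind described can be sketched in a few lines; the states and transitions below are illustrative, not the architecture's actual modules.

```python
class State:
    def __init__(self, name, action):
        self.name, self.action = name, action

class StateMachine:
    """Executes the current state's action, then follows the transition
    table {(state_name, event): next_state}."""
    def __init__(self, transitions, start):
        self.transitions = transitions
        self.current = start

    def step(self, event):
        self.current.action()
        self.current = self.transitions[(self.current.name, event)]

# Illustrative two-state pick-and-place cycle.
pick = State("pick", lambda: print("picking part"))
place = State("place", lambda: print("placing part"))
sm = StateMachine({("pick", "done"): place, ("place", "done"): pick}, pick)
sm.step("done"); sm.step("done")
```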
ARTICLE | doi:10.20944/preprints202209.0275.v1
Subject: Engineering, Civil Engineering Keywords: Bicycle Behavior; Naturalistic Cycling Data; Car/Bike Interactions; Computer Vision; Object Detection
Online: 19 September 2022 (10:22:00 CEST)
As machine learning and computer vision techniques and methods continue to advance, the collection of naturalistic traffic data from video feeds is becoming more and more feasible. That is especially true for bicycles, for which the collection of naturalistic data is not achievable with the traditional vehicle-based approach. This study describes a research effort that aims to extract naturalistic cycling data from a video dataset for use in safety and mobility applications. The videos come from a dataset collected in a previous Virginia Tech Transportation Institute study, in collaboration with SPIN, in which continuous video data was recorded at a non-signalized intersection on the Virginia Tech campus. The research team applied computer vision and machine learning techniques to develop a comprehensive framework for the extraction of naturalistic cycling trajectories. In total, this study resulted in the collection and classification of 619 bicycle trajectories based on their type of interactions with other road users. The results confirm the success of the proposed methodology in extracting the locations, speeds, and accelerations of the bicycles at a high level of precision. Furthermore, preliminary insights into the acceleration and speed behavior of bicyclists around motorists are determined.
ARTICLE | doi:10.20944/preprints202203.0064.v1
Subject: Behavioral Sciences, Other Keywords: Computer vision; Google Street View; Built Environment; Walkability; Micro-scale; Deep learning
Online: 3 March 2022 (13:49:08 CET)
The study purpose was to train and validate a deep-learning approach to detect micro-scale streetscape features related to pedestrian physical activity. This work innovates by combining computer vision techniques with Google Street View (GSV) images to overcome impediments to conducting audits (e.g., time, safety, and expert labor cost). The EfficientNetB5 architecture was used to build deep-learning models for eight micro-scale features guided by the Microscale Audit of Pedestrian Streetscapes-Mini tool: sidewalks, sidewalk buffers, curb cuts, zebra and line crosswalks, walk signals, bike symbols, and streetlights. We used a train-correct loop, whereby images were trained on a training dataset, evaluated using a separate validation dataset, and trained further until acceptable performance metrics were achieved. Further, we used the trained models to audit participant (N=512) neighborhoods in the WalkIT Arizona trial. Correlations were explored between micro-scale features and GIS-measured and participant-reported macro-scale walkability. Classifier precision, recall, and overall accuracy were all >84%. The total micro-scale score was associated with overall macro-scale walkability (r=0.300, p<.001). Positive associations were found between model-detected and self-reported sidewalks (r=0.41, p<.001) and sidewalk buffers (r=0.26, p<.001). The computer vision model results suggest an alternative to trained human raters, allowing for audits of hundreds or thousands of neighborhoods for population surveillance or hypothesis testing.
ARTICLE | doi:10.20944/preprints202201.0054.v1
Subject: Engineering, Other Keywords: stylus tip center self-calibration; spherical fitting; pose domain; vision measurement system
Online: 6 January 2022 (09:47:38 CET)
Light pen 3D vision coordinate measurement systems are increasingly widely used due to their advantages, such as small size, convenient carrying and wide applicability. The posture of the light pen is an important factor affecting accuracy. The pose domain of the pen needs to be given so that the measurement system has a suitable measurement range for obtaining qualified parameters. The advantage of the self-calibration method is that the entire self-calibration process can be completed at the measurement site without any auxiliary equipment. After the system camera calibration is completed, we take several pictures of the same measurement point under different poses to obtain the conversion matrix of each picture, and then use spherical fitting, the generalized least-squares inverse method, and the principle of position invariance within the pose domain: the combined stylus tip center self-calibration method calculates the actual position of the light pen probe. The experimental results show that the absolute error is stable below 0.0737 mm and that the relative error is stable below 0.0025 mm. The experimental results verify the effectiveness of the method; the measurement accuracy of the system can meet basic industrial measurement requirements.
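The spherical fitting step has a closed-form linear least-squares solution: points at distance r from a fixed center (a, b, c) satisfy x^2+y^2+z^2 = 2ax+2by+2cz+d with d = r^2-a^2-b^2-c^2. The sketch below fits a sphere to synthetic marker positions; in the calibration, the reconstructed reference point traces a sphere around the fixed tip.

```python
import numpy as np

def fit_sphere(points):
    """Linear least-squares sphere fit; returns center (a, b, c) and radius."""
    A = np.column_stack([2 * points, np.ones(len(points))])
    f = np.sum(points ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, f, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

# Synthetic marker positions on a sphere of radius 150 mm around the tip.
rng = np.random.default_rng(1)
dirs = rng.normal(size=(30, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
pts = np.array([10.0, 20.0, 30.0]) + 150.0 * dirs

center, radius = fit_sphere(pts)
print(center, radius)  # approximately [10, 20, 30] and 150
```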
ARTICLE | doi:10.20944/preprints202108.0405.v1
Subject: Biology, Anatomy & Morphology Keywords: animal welfare; pigs; deep learning; computer vision; stress detection; facial expression recognition
Online: 19 August 2021 (13:17:08 CEST)
Animal welfare is not only an ethically important consideration in good animal husbandry, but can also have a significant effect on an animal's productivity. The aim of this paper is to show that a reduction in animal welfare, in the form of increased stress, can be identified in pigs from frontal images of the animals. We train a Convolutional Neural Network (CNN) using a leave-one-out design and show that it is able to discriminate between stressed and unstressed pigs with an accuracy of >90% in unseen animals. Grad-CAM is used to identify the animal regions the network relies on, and these support the regions used in manual assessments such as the Pig Grimace Scale. This innovative work paves the way for further work examining both positive and negative welfare states, with a view to the development of an automated system that can be used in precision livestock farming to improve animal welfare.
ARTICLE | doi:10.20944/preprints202108.0279.v1
Subject: Keywords: Glaucoma; Diabetic Retinopathy; Convolution Neural Network (CNN); Vision Loss; Blindness; Machine Learning
Online: 12 August 2021 (15:36:51 CEST)
In the last few decades, glaucoma has become the second leading cause of irreversible vision loss. Because its progression is largely asymptomatic, it is often not diagnosed until a relatively late stage. To stop the severe damage caused by glaucoma, it needs to be detected in its early stages. Notably, diabetes is also a major cause of glaucoma. In the modern era, artificial intelligence has made great progress in the medical image processing field, and image analysis based on machine learning has achieved great success in diagnosing glaucoma without misdiagnosis. The aim of this paper is to create an automated process that can detect glaucoma and diabetic retinopathy. Various machine learning models are used and the results of these methods are presented.
REVIEW | doi:10.20944/preprints202107.0202.v1
Subject: Life Sciences, Biochemistry Keywords: colorblindness; color vision; myopia; cone photopigment; exon skipping; X-linked cone dysfunction
Online: 8 July 2021 (13:27:17 CEST)
The first step in seeing is light absorption by photopigment molecules expressed in the photoreceptors of the retina. There are two types of photoreceptors in the human retina that are responsible for image formation: rods and cones. Except at very low light levels when rods are active, all vision is based on cones. Cones mediate high-acuity vision and color vision. Furthermore, they are critically important in the visual feedback mechanism that regulates refractive development of the eye during childhood. The human retina contains a mosaic of three cone types, short-wavelength (S), long-wavelength (L) and middle-wavelength (M); however, the vast majority (~94%) are L and M cones. The OPN1LW and OPN1MW genes, located on the X-chromosome at Xq28, encode the protein component of the light-sensitive photopigments. Here we review the mechanisms by which splicing defects in these genes cause vision disorders.
ARTICLE | doi:10.20944/preprints202010.0455.v1
Subject: Engineering, Automotive Engineering Keywords: KINECT; industrial robot; vision system; RobotStudio; Visual Studio; gesture control; voice control
Online: 22 October 2020 (09:57:07 CEST)
The paper presents the possibility of using the KINECT v2 module to control an industrial robot by means of gestures and voice commands. It describes the elements of creating software for off-line and on-line robot control. The application for the KINECT module was developed in C# in the Visual Studio environment, while the industrial robot control program was developed in the RAPID language in the RobotStudio environment. The development of a two-threaded application in RAPID allowed two independent tasks to be separated for the IRB120 robot. The main task of the robot is performed in thread no. 1 (responsible for movement). The simultaneously working thread no. 2 ensures continuous communication with the KINECT system and provides information about gesture and voice commands in real time without any interference in thread no. 1. The applied solution allows the robot to work in industrial conditions without the communication task negatively affecting the robot's cycle times. Thanks to the development of a digital twin of the real robot station, tests of proper application functioning were conducted in off-line mode (without using a real robot). The obtained results were then verified online (on the real test station). Tests of the correctness of gesture recognition were carried out; the robot recognized all programmed gestures. Another test was the recognition and execution of voice commands. A difference in task completion time between the actual and virtual station was noticed; the average difference was 0.67 s. The last test examined the impact of interference on the recognition of voice commands. With a 10 dB difference between the command and the noise, recognition of voice commands reached 91.43%. The developed computer programs have a modular structure, which enables easy adaptation to process requirements.
ARTICLE | doi:10.20944/preprints201908.0282.v1
Subject: Engineering, Other Keywords: intelligent tractor; vision navigation; improved anti-noise morphology; boundary line; Guided Filtering
Online: 27 August 2019 (10:37:59 CEST)
An improved anti-noise morphology vision navigation algorithm is proposed for intelligent tractor tillage in a complex agricultural field environment. First, the two key steps, Guided Filtering and improved anti-noise morphology navigation line extraction, are addressed in detail. Then, experiments were carried out to verify the effectiveness and advancement of the presented algorithm. Finally, the optimal template and its application conditions were studied to improve the image processing speed. The comparison experiments show that the YCbCr color space has the minimum time consumption, 0.094 s, compared with the HSV, HIS and 2R-G-B color spaces. The Guided Filtering method distinguishes the boundary between new and old soil more effectively than competing methods such as Tarel, Multi-scale Retinex, Wavelet-based Retinex and Homomorphic Filtering, while also having the fastest processing speed of 0.113 s. The soil boundary line extracted by the improved anti-noise morphology algorithm has the best precision and speed compared with other operators such as Sobel, Roberts, Prewitt and Log. After comparing image templates of different sizes, the optimal template, of 140×260 pixels, can support high-precision vision navigation when the course deviation angle is not more than 7.5°. The maximum tractor speeds for the optimal template and the global template are 51.41 km/h and 27.47 km/h respectively, which meets the real-time vision navigation requirement of smart tractor tillage operation in the field. The experimental results demonstrate the feasibility of autonomous vision navigation for tractor tillage operations using the new and old soil boundary line extracted by the proposed improved anti-noise morphology algorithm, which has broad application prospects.
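The Guided Filtering step is available in the opencv-contrib package, as sketched below; the radius and regularisation values are illustrative, not the paper's tuned parameters.

```python
import cv2

# Requires the opencv-contrib-python package for cv2.ximgproc.
img = cv2.imread("field.jpg")                    # hypothetical field image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Edge-preserving smoothing with the image as its own guide: soil texture
# is suppressed while the tilled/untilled boundary stays sharp.
smoothed = cv2.ximgproc.guidedFilter(guide=gray, src=gray, radius=8, eps=500)
edges = cv2.Canny(smoothed, 50, 150)
```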
ARTICLE | doi:10.20944/preprints201907.0248.v1
Subject: Engineering, Control & Systems Engineering Keywords: intelligent tractor; vision navigation; improved anti-noise morphology; boundary line; Guided Filtering
Online: 23 July 2019 (04:27:44 CEST)
An improved anti-noise morphology vision navigation algorithm is proposed for intelligent tractor tillage in a complex agricultural field environment. Firstly, the two key steps, Guided Filtering and improved anti-noise morphology navigation line extraction, are addressed in detail. Then, experiments were carried out to verify the effectiveness and advancement of the presented algorithm. Finally, the optimal template and its application conditions were studied to improve the image processing speed. The comparison experiments show that the YCbCr color space has the minimum time consumption, 0.094 s, compared with the HSV, HIS and 2R-G-B color spaces. The Guided Filtering method enhances the new and old soil boundary more effectively than other methods such as Tarel, Multi-scale Retinex, Wavelet-based Retinex and Homomorphic Filtering, while also having the fastest processing speed of 0.113 s. The soil boundary line extracted by the improved anti-noise morphology algorithm has the best precision and speed compared with other operators such as Sobel, Roberts, Prewitt and Log. After comparing image templates of different sizes, the optimal template, of 140×260 pixels, can support high-precision vision navigation when the course deviation angle is not more than 7.5°. The maximum tractor speeds for the optimal template and the global template are 51.41 km/h and 27.47 km/h respectively, which meets the real-time vision navigation requirement of smart tractor tillage operation in the field. The experimental results demonstrate the feasibility of autonomous vision navigation for tractor tillage operations using the new and old soil boundary line extracted by the proposed improved anti-noise morphology algorithm, which has broad application prospects.
ARTICLE | doi:10.20944/preprints201906.0146.v1
Subject: Engineering, Control & Systems Engineering Keywords: unmanned aerial vehicles; dynamic coordinate tracking; computer vision; anti-UAV defense system
Online: 16 June 2019 (10:23:16 CEST)
The rapid development of multicopters has led to many security problems. In order to prevent multicopters from invading restricted areas or famous buildings, Anti-UAV Defense Systems (AUDS) have been developed and have become a research topic of interest. Topics under research in this area include electromagnetic interference guns for unmanned aerial vehicles (UAVs), high-energy laser guns, US military net warheads, and AUDSs with net guns. However, these AUDSs use either manual aiming or expensive radar to track UAVs. This paper proposes a dual-axis rotary platform with automatic UAV tracking. The tracking platform uses visual image processing technology to track and lock onto the dynamic displacement of a UAV. When a target UAV is locked, the system uses a nine-axis attitude meter and laser rangefinders to measure its flight altitude and calculates its longitude and latitude coordinates through spherical coordinates to provide UAV monitoring for further defense or attack missions. Tracking tests were carried out using a DJI MAVIC UAV flying at heights of 30 to 100 meters. UAV image capture and visual recognition for tracking were set up under various weather conditions using a thermal imager and a full-color camera, respectively. With no cloud during the daytime, the images captured by the thermal imager and the full-color camera provided high-quality image recognition results. However, in dark weather, black clouds emit radiant energy and seriously affect image capture by the thermal imager. With no cloud at night, the thermal imager performs well in UAV image capture. When the UAV is tracked and locked, the system can effectively obtain its flight altitude and its longitude and latitude coordinates.
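The conversion from platform-relative measurements (azimuth, elevation, laser range) to target coordinates can be sketched with a local flat-earth approximation, as below; the formula and all values are illustrative assumptions, not the system's actual geodetic computation.

```python
import math

def target_geodetic(lat0, lon0, az_deg, el_deg, range_m):
    """Flat-earth estimate of a target's latitude/longitude and height gain
    from the platform position, azimuth/elevation angles, and laser range."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    horiz = range_m * math.cos(el)                 # horizontal distance
    north, east = horiz * math.cos(az), horiz * math.sin(az)
    dlat = north / 111320.0                        # metres per degree latitude
    dlon = east / (111320.0 * math.cos(math.radians(lat0)))
    return lat0 + dlat, lon0 + dlon, range_m * math.sin(el)

# Illustrative values only: platform at (24.78 N, 120.99 E), target at
# 45 deg azimuth, 30 deg elevation, 80 m laser range.
print(target_geodetic(24.78, 120.99, az_deg=45, el_deg=30, range_m=80))
```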
ARTICLE | doi:10.20944/preprints201610.0040.v1
Subject: Engineering, Other Keywords: agriculture; digital image processing; machine vision; precision agriculture; unmanned aerial vehicle (UAV)
Online: 12 October 2016 (10:28:54 CEST)
Precision agriculture is a farm management technology that involves sensing and then responding to the observed variability in the field. Remote sensing is one of the tools of precision agriculture. The emergence of small unmanned aerial vehicles (sUAVs) has paved the way to accessible remote sensing tools for farmers. This paper describes a comparison of two popular off-the-shelf sUAVs: the 3DR Iris and the DJI Phantom 2. Both units are equipped with a camera gimbal carrying a GoPro camera. The comparison of the two sUAVs involves a hovering test and a rectilinear motion test. In the hovering test, the sUAV was allowed to hover over a known object and images were taken every second for two minutes. The position of the object in the images was measured and used to assess the stability of the sUAV while hovering. In the rectilinear test, the sUAV was made to follow a straight path and images of a lined track were acquired. The lines in the images were then measured to determine how accurately the sUAV followed the path. Results showed that both sUAVs performed well in both tests, demonstrating that both can be used for agricultural monitoring.
ARTICLE | doi:10.20944/preprints202209.0025.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: object detection; semi-supervised learning; Mask R-CNN; floor-plan images; computer vision
Online: 1 September 2022 (15:16:43 CEST)
Research on object detection using semi-supervised methods has been growing in the past few years. We examine the intersection of these two areas for floor-plan objects to promote the research objective of detecting more accurate objects with less labelled data. Floor-plan objects include different furniture items with multiple types of the same class, and this high inter-class similarity impacts the performance of prior methods. In this paper, we present a Mask R-CNN-based semi-supervised approach that provides pixel-to-pixel alignment to generate individual annotation masks for each class to mine the inter-class similarity. The semi-supervised approach uses a student-teacher network that pulls information from the teacher network and feeds it to the student network. The teacher network uses unlabeled data to form pseudo-boxes, and the student network uses both the unlabeled data with the pseudo-boxes and the labelled data as ground truth for training. It learns representations of furniture items by combining labelled and unlabeled data. On the Mask R-CNN detector with a ResNet-101 backbone network, the proposed approach achieves mAP of 98.8%, 99.7%, and 99.8% with only 1%, 5% and 10% labelled data, respectively. Our experiment affirms the efficiency of the proposed approach, as it outperforms the fully supervised counterpart using only 10% of the labels.
ARTICLE | doi:10.20944/preprints202111.0154.v1
Subject: Engineering, Civil Engineering Keywords: Computer Vision; Synthetic Data; Physics-based Graphics Models; Deep Learning; Post-earthquake Inspections
Online: 8 November 2021 (15:06:45 CET)
Manual visual inspections typically conducted after an earthquake are high-risk, subjective, and time-consuming. Delays from inspections often exacerbate the social and economic impact of the disaster on affected communities. Rapid and autonomous inspection using images acquired from unmanned aerial vehicles offers the potential to reduce such delays. Indeed, a vast amount of research has been conducted toward developing automated vision-based methods to assess the health of infrastructure at the component and structure level. Most proposed methods typically rely on images of the damaged structure, but seldom consider how the images were acquired. To achieve autonomous inspections, methods must be evaluated in a comprehensive end-to-end manner, incorporating both data acquisition and data processing. In this paper, we leverage recent advances in computer generated imagery (CGI) to construct a 3D synthetic environment for simulation of post-earthquake inspections that allows for comprehensive evaluation and validation of autonomous inspection strategies. A critical issue is how to simulate and subsequently render the damage in the structure after an earthquake. To this end, a high-fidelity nonlinear finite element model is incorporated in the synthetic environment to provide a representation of earthquake-induced damage; this finite element model, combined with photo-realistic rendering of the damage, is termed herein a physics-based graphics model (PBGM). The 3D synthetic environment with PBGMs provides a comprehensive end-to-end approach for the development and validation of autonomous post-earthquake inspection strategies using UAVs, including: (i) simulation of path planning of virtual UAVs and image capture under different environmental conditions; (ii) automatic labeling of captured images, potentially providing an infinite amount of data for training deep neural networks; (iii) availability of the ground-truth damage state from the results of the finite-element simulation; and (iv) direct comparison of different approaches to autonomous assessments. Moreover, the synthetic data generated have the potential to be used to augment field datasets. To demonstrate the efficacy of PBGMs, models of reinforced concrete moment-frame buildings with masonry infill walls are examined. The 3D synthetic environment employing PBGMs is shown to provide an effective testbed for the development and validation of autonomous vision-based post-earthquake inspections that can serve as an important building block for advancing autonomous data-to-decision frameworks.
ARTICLE | doi:10.20944/preprints202103.0506.v1
Subject: Social Sciences, Accounting Keywords: street view image; subjective and objective perceptions; housing prices; machine learning; computer vision
Online: 22 March 2021 (10:16:03 CET)
The relationship between the street environment and the health, education, mobility, and criminal behavior of its citizens has long been investigated by economists, sociologists and urban planners. Home buyers have been found to pay a premium for better street appearance. Prior studies considering streetscapes mainly focus on objective measures such as the number of nearby trees, the tree canopy area, or the view index of physical features such as greenery, sky or buildings. However, subjective perceptions may have complex or subtle relationships to physical features; individual physical features, or simply summing them up, do not capture people's comprehensive perception. In contrast, this study proposes a new approach for urban-scale application to quantify both subjectively and objectively measured streetscape scores for six important perception qualities, namely Greenness, Walkability, Safety, Imageability, Enclosure, and Complexity. Building on prior quantitative studies of urban design quality and emerging applications of deep learning and open-source street view imagery for urban perception, we integrated existing frameworks to (1) effectively collect and evaluate both subjectively and objectively measured perceptions; (2) investigate the coherence and divergence between ML-predicted subjective scores and formula-derived objective scores; and (3) compare their effects on house prices, taking Shanghai as a case study and using a large-scale dataset of home transactions. The results imply, first, that the percentage increase in sales price attributable to street scores is significant for both subjective and objective measurements. In general, subjective scores explained more variance than structural attributes and objective scores in the hedonic price model. In particular, the objective Greenness score and the subjective Safety and Imageability scores positively affected house prices. Second, for the Greenness and Imageability scores, the subjective and objective measures exhibited opposite signs in affecting house prices, which implies that there may be mechanisms related to the psychological and socio-demographic characteristics of street users that are not fully captured by objective measures based on view indices or recombinations of them. In addition, a given objective measure might outperform its subjective counterpart when the meaning of the perception is self-evident and uncomplicated, for example Greenness; for concepts less familiar to the average person, the subjective framework exhibits better performance. This is the first study to comprehensively expand the hedonic price method with both subjectively and objectively measured streetscape qualities. It suggests that city authorities could levy a street environment tax to replenish the public budget invested in street environments from which developers secured benefits via a price premium. This study enriches our understanding of the economic value of subjective and objective measures of street qualities and sheds light on promising future study areas in which the coherence and divergence of the two measurements should be further examined.
ARTICLE | doi:10.20944/preprints202009.0566.v1
Subject: Engineering, Automotive Engineering Keywords: transportation mode classification; vulnerable road users; recurrence plots; computer vision; image classification system
Online: 24 September 2020 (04:41:32 CEST)
As the Autonomous Vehicle (AV) industry rapidly advances, the classification of non-motorized (vulnerable) road users (VRUs) becomes essential to ensure their safety and the smooth operation of road applications. Typical practice in non-motorized road user classification usually requires considerable training time and ignores the temporal evolution and behavior of the signal. In this research effort, we attempt to detect VRUs with high accuracy by proposing a novel framework that uses Deep Transfer Learning, which saves training time and cost, to classify images constructed from Recurrence Quantification Analysis (RQA) that reflect the temporal dynamics and behavior of the signal. Recurrence Plots (RPs) were constructed from low-power smartphone sensors without using GPS data. The resulting RPs were used as inputs for different pre-trained Convolutional Neural Network (CNN) classifiers: 227×227 images for AlexNet and SqueezeNet, and 224×224 images for VGG16 and VGG19. Results show that the classification accuracy of Convolutional Neural Network Transfer Learning (CNN-TL) reaches 98.70%, 98.62%, 98.71%, and 98.71% for AlexNet, SqueezeNet, VGG16, and VGG19, respectively. The results of the proposed framework outperform others in the literature (to the best of our knowledge) and show that using CNN-TL is promising for VRU classification. Because of its relative straightforwardness, ability to be generalized and transferred, and potentially high accuracy, we anticipate that this framework may be able to solve various problems related to signal classification.
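A recurrence plot for a one-dimensional sensor signal can be built directly from the pairwise distance matrix, as sketched below with a synthetic signal sized to match the 227×227 AlexNet input mentioned above; the threshold is an illustrative assumption.

```python
import numpy as np

def recurrence_plot(signal, eps):
    """Binary recurrence plot: R[i, j] = 255 where |s_i - s_j| < eps."""
    d = np.abs(signal[:, None] - signal[None, :])
    return (d < eps).astype(np.uint8) * 255

# Synthetic 1-D motion signal with 227 samples, so the plot is a 227x227
# image of the size fed to AlexNet/SqueezeNet.
t = np.linspace(0, 10, 227)
signal = np.sin(2 * np.pi * 1.5 * t) + 0.1 * np.random.randn(227)
rp = recurrence_plot(signal, eps=0.25)
```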
ARTICLE | doi:10.20944/preprints202008.0202.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: heat sensors; smart offices; occupancy prediction; machine learning; computer vision; feature engineering; explainability
Online: 8 August 2020 (04:07:31 CEST)
In order to design efficient and sustainable office spaces and to automate lighting, heating and air circulation in these facilities, solving the challenge of occupancy prediction is crucial. In office spaces where large areas need to be observed, multiple sensors must be used for full coverage. In these cases, it is normally important to keep costs low, but also to make sure that the privacy of the people who use such environments is preserved. Low-cost and low-resolution heat (thermal) sensors can be very useful for building solutions that address these concerns. However, they are extremely sensitive to noise artifacts, which might be caused by heat prints of people who have left the space or by other objects that are either using electricity or exposed to sunlight. There are some earlier solutions for occupancy prediction that employ low-resolution heat sensors; however, they have not addressed or compensated for such heat artifacts. Therefore, in this paper, we present a low-cost and low-energy smart space implementation to predict the number of people in the environment based on whether their activity is static or dynamic in time. We use a low-resolution (8×8) and non-intrusive heat sensor to collect data from an actual meeting room. We propose two novel workflows to predict occupancy: one based on computer vision and one based on machine learning. Besides comparing the advantages and disadvantages of these different workflows, we use several state-of-the-art explainability methods in order to provide a detailed analysis of the algorithm parameters and of how the image properties influence the resulting performance. Furthermore, we analyze the noise sources which affect the heat sensor data. We hope that our analysis brings light into understanding how to handle very low-resolution heat images in these environments. The presented workflows could be used in various domains and applications other than smart offices where occupancy prediction is essential, e.g., elderly care.
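A minimal computer-vision style pass over a single 8×8 thermal frame might look like the sketch below: subtract an ambient estimate, threshold, and count warm blobs. All values are illustrative; the paper's workflows are more elaborate.

```python
import numpy as np
from scipy import ndimage

# One 8x8 thermal frame in degrees Celsius (synthetic).
frame = np.random.uniform(20.0, 22.0, (8, 8))
frame[2:4, 3:5] = 27.0            # a warm region standing in for a person

background = 21.0                 # assumed ambient estimate
hot = (frame - background) > 2.0  # threshold in degrees Celsius

labels, n_blobs = ndimage.label(hot)   # connected warm regions
print(f"estimated occupants: {n_blobs}")
```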
ARTICLE | doi:10.20944/preprints202008.0138.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Traffic Light Recognition (TLR); machine learning; Expert Instruction (EI); frequency maps; computer vision
Online: 6 August 2020 (07:56:57 CEST)
Research on Traffic Light Recognition (TLR) has grown in recent years, primarily driven by the growing interest in autonomous vehicle development. Machine Learning (ML) algorithms have been widely used for that purpose. Mainstream approaches, however, require large amounts of data to work properly and, as a consequence, a lot of computational resources. In this paper we propose the use of Expert Instruction (EI) as a mechanism to reduce the amount of data required to provide accurate ML models for TLR. Given an image of the exterior scene taken from inside the vehicle, we hypothesize that the picture of a traffic light is more likely to appear in the central and upper regions of the image. Frequency maps of traffic light location were thus constructed to confirm this hypothesis. The frequency maps are the result of a manual effort by human experts in annotating each image with the coordinates of the region where the traffic light appears. Results show that EI increased the accuracy obtained by the classification algorithm on two different image datasets by at least 15%. Evaluation rates achieved by the inclusion of EI were also higher in further experiments, including traffic light detection followed by classification by the trained algorithm. The inclusion of EI in PCANet achieved a precision of 83% and a recall of 73%, against 75.3% and 51.1%, respectively, for its counterpart. We finally present a prototype of a TLR device with that expert model embedded to assist drivers. The TLR device uses a smartphone as camera and processing unit. To show the feasibility of the apparatus, a dataset was obtained during real-time usage and tested with an Adaptive Background Suppression Filter (AdaBSF) and Support Vector Machine (SVM) algorithm to detect and recognize traffic lights. Results show a precision of 100% and recall of 65%.
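The frequency-map construction described above reduces to accumulating annotated bounding boxes into a 2-D histogram. A short sketch under assumed image dimensions and hypothetical annotation coordinates:

```python
# Build a normalized frequency map of traffic-light locations from manual
# bounding-box annotations; high values mark regions where lights tend to
# appear (central/upper image). Boxes below are hypothetical examples.
import numpy as np

H, W = 480, 640
freq = np.zeros((H, W), dtype=np.float64)

annotations = [(300, 60, 20, 45), (310, 80, 18, 40), (290, 70, 22, 50)]  # (x, y, w, h)
for x, y, w, h in annotations:
    freq[y:y + h, x:x + w] += 1.0

freq /= freq.max()        # normalize to [0, 1]
print(freq[80, 310])      # -> 1.0 here (all three example boxes overlap)
```

The normalized map can then be used to down-weight or discard candidate regions in parts of the image where traffic lights rarely occur, which is the data-reduction effect EI provides.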
ARTICLE | doi:10.20944/preprints201810.0372.v1
Subject: Engineering, Other Keywords: teaching robotics; science teaching; STEM; robotic tool; python; Raspberry Pi; PiCamera; vision system
Online: 17 October 2018 (05:53:30 CEST)
This paper presents PiBot, a robotic platform developed to improve the teaching of Robotics with vision to secondary students. Its computational core is the Raspberry Pi 3 controller board, and the greatest novelty of this prototype is the support developed for the powerful camera mounted on board, the PiCamera. An open software infrastructure written in Python was implemented so that students may use this camera, or even a WebCam, as the main sensor of the robotic platform. Higher-level commands have also been provided to enhance the learning outcomes for beginners. In addition, a 3D-printable PiBot model and its counterpart for the Gazebo simulator were developed and are fully supported. They are publicly available so that students and educational centers that do not have the physical robot, or cannot afford its cost, can nevertheless practice and learn or teach Robotics using these open platforms: DIY-PiBot and/or simulated-PiBot.
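To give a flavor of the kind of high-level vision command such an infrastructure could expose to students, here is a hedged Python sketch: grab a frame from a WebCam via OpenCV (on the robot, the PiCamera would fill the same role) and return the centroid of a colored blob. The function name and HSV range are illustrative, not PiBot's actual API.

```python
# Hypothetical beginner-level vision helper: find the largest blob in a
# color range and return its image centroid.
import cv2
import numpy as np

def get_blob_center(frame_bgr, lower_hsv=(40, 80, 80), upper_hsv=(80, 255, 255)):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    return (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))

cap = cv2.VideoCapture(0)   # WebCam; the PiCamera would be used on the robot
ok, frame = cap.read()
if ok:
    print(get_blob_center(frame))
cap.release()
```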
ARTICLE | doi:10.20944/preprints202103.0780.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Deep learning; Computer vision; Remote sensing; Supervised learning; Semi-supervised learning; Segmentation; Seagrass mapping
Online: 31 March 2021 (15:53:19 CEST)
Intertidal seagrass plays a vital role in estimating the overall health and dynamics of coastal environments due to its interaction with tidal changes. However, most seagrass habitats around the globe have been in steady decline due to human impacts, disturbing the already delicate balance in the environmental conditions that sustain seagrass. Miniaturization of multi-spectral sensors has facilitated very high resolution mapping of seagrass meadows, which significantly improves the potential for ecologists to monitor changes. In this study, two analytical approaches used for classifying intertidal seagrass habitats are compared: Object-based Image Analysis (OBIA) and Fully Convolutional Neural Networks (FCNNs). Both methods produce pixel-wise classifications in order to create segmented maps; however, FCNNs are an emerging set of algorithms within Deep Learning with sparse application to seagrass mapping. Conversely, OBIA has been a prominent solution within this field, with many studies leveraging in-situ data and multiscale segmentation to create habitat maps. This work demonstrates the utility of FCNNs in a semi-supervised setting to map seagrass and other coastal features from an optical drone survey conducted at Budle Bay, Northumberland, England. Semi-supervision is also an emerging field within Deep Learning, with the practical benefit of achieving state-of-the-art results using only subsets of labelled data. This is especially beneficial for remote sensing applications, where in-situ data is an expensive commodity. We show that FCNNs have comparable performance with the standard OBIA method used by ecologists, while also noting an increase in performance for mapping ecological features that are sparsely labelled across the study site.
ARTICLE | doi:10.20944/preprints202011.0009.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Computer Vision; Machine Learning; Colourimetric Test; Pre-trained Model; Point-of-Care System; Diagnosis
Online: 2 November 2020 (09:56:04 CET)
Purpose: The gradual increase in geriatric issues and the global imbalance in the ratio between patients and healthcare professionals have created a demand for intelligent systems with the least error-prone diagnosis results that can be used by less medically trained persons and save clinical time. This paper aims at investigating the development of an image-based colourimetric analysis. The purpose of recognising such tests is to support wider users in running a colourimetric test in homecare settings, telepathology, etc. Design/methodology/approach: The concept of automatic colourimetric assay detection is delivered by utilising two cases. Training Deep Learning (DL) models on thousands of images of these tests using transfer learning, this paper i) classifies the type of the assay, and ii) classifies the colourimetric results. Findings: This paper demonstrated that the assay type can be recognised using DL techniques with 100% accuracy within a fraction of a second. Some of the advantages of the pre-trained model over the calibration-based approach are robustness, readiness and suitability for deployment in similar applications within a shorter period of time. Originality/value: To the best of our knowledge, this is the first attempt to provide Colourimetric Assay Type Classification (CATC) using DL. Humans are capable of learning thousands of visual classifications in their lives. Object recognition may be a trivial task for humans; due to photometric and geometric variability, along with a high degree of intra-class variability, it can be a challenging task for machines. However, transferring visual knowledge into machines, as proposed, can support non-experts to better manage their health and reduce some of the burden on experts.
ARTICLE | doi:10.20944/preprints201705.0117.v1
Subject: Engineering, Control & Systems Engineering Keywords: autonomous aerial refueling; computer vision; probe and drogue; target detection and tracking; ellipse fitting
Online: 16 May 2017 (05:56:11 CEST)
Autonomous aerial refueling technology is an effective solution to extend the flight duration of unmanned aerial vehicles, and it is also a great challenge due to its high risk. For autonomous probe-and-drogue refueling tasks, relative navigation to provide the relative position between the receiver aircraft and the refueling drogue is the first and essential step, and vision-based methods are the most frequently used. In this paper, a new monocular vision navigation sensor is developed for autonomous aerial refueling, with an image processing strategy consisting of a drogue detection method and a tracking method. In the drogue detection method, thresholding and mathematical morphology methods are adopted to eliminate image interference, and a contour extraction method is applied to obtain all contours, which are then checked to identify the target contour of the drogue. In the tracking method, a rectangle of interest (ROI) in the current frame image is determined from the positioning results of the last frame, and then processed by the aforementioned drogue detection method. Finally, the proposed image processing strategy in the monocular vision navigation sensor is validated using real flight images, captured from an autonomous aerial refueling testbed using a micro six-rotor aircraft as the receiver aircraft.
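The detect-then-track loop described above maps naturally onto standard OpenCV primitives. A minimal sketch, assuming Otsu thresholding, an elliptical opening kernel, and a simple area check as the contour filter (the paper's exact thresholds and contour tests are not reproduced here):

```python
# Threshold + morphology + contour extraction for detection; re-detection
# inside an ROI around the previous position for tracking.
import cv2
import numpy as np

def detect_drogue(gray):
    """Return the bounding box (x, y, w, h) of the largest plausible contour."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)  # remove speckle
    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = [c for c in contours if cv2.contourArea(c) > 100]  # assumed size check
    if not candidates:
        return None
    return cv2.boundingRect(max(candidates, key=cv2.contourArea))

def track(gray, last_box, margin=40):
    """Re-run detection inside an ROI around the last frame's position."""
    x, y, w, h = last_box
    x0, y0 = max(0, x - margin), max(0, y - margin)
    roi = gray[y0:y0 + h + 2 * margin, x0:x0 + w + 2 * margin]
    box = detect_drogue(roi)
    return None if box is None else (box[0] + x0, box[1] + y0, box[2], box[3])

frame = np.zeros((200, 200), np.uint8)
cv2.circle(frame, (100, 100), 30, 255, 8)   # bright ring as a stand-in drogue
box = detect_drogue(frame)
print(box, track(frame, box))
```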
ARTICLE | doi:10.20944/preprints202204.0279.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: object detection; challenging environments; low-light; image enhancement; complex environments; deep neural networks; computer vision
Online: 28 April 2022 (09:42:37 CEST)
In recent years, due to the advancement of machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvement of deep learning, traditional approaches such as sliding windows and manual feature selection techniques have been replaced with deep learning techniques. However, like any other vision task, object detection struggles in low light, challenging weather, and crowded scenes. Such environments are termed challenging environments. This paper exploits pixel-level information to improve detection under challenging situations. To this end, we exploit the recently proposed hybrid task cascade network, which works collaboratively with detection and segmentation heads at different cascade levels. We evaluate the proposed methods on three complex datasets, ExDark, CURE-TSD, and RESIDE, and achieve mAP values of 0.71, 0.52, and 0.43, respectively. Our experimental results assert the efficacy of the proposed approach.
REVIEW | doi:10.20944/preprints202102.0048.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: autonomous driving systems; computer vision; neural networks; feature extraction; segmentation; assisted driving; cloud computing; parallelization
Online: 1 February 2021 (14:50:20 CET)
Autonomous driving systems are increasingly becoming a necessary trend towards building the smart cities of the future. Numerous proposals have been presented in recent years to tackle particular aspects of the working pipeline towards creating a functional end-to-end system, such as object detection, tracking, path planning, and sentiment or intent detection. Nevertheless, few efforts have been made to systematically compile all of these systems into a single proposal that effectively considers the real challenges these systems will face on the road, such as real-time computation, hardware capabilities, etc. This paper reviews various techniques towards proposing our own end-to-end autonomous vehicle system, considering the latest state of the art in computer vision, DSs, path planning, and parallelization.
ARTICLE | doi:10.20944/preprints202009.0458.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: machine learning; deep learning; physiological maturity; computer vision; plant breeding; phenology; Glycine max (L.) Merr.
Online: 19 September 2020 (10:08:43 CEST)
Soybean maturity is a trait of critical importance for the development of new soybean cultivars; nevertheless, its characterization based on visual ratings has many challenges. Unmanned aerial vehicle (UAV) imagery-based high-throughput phenotyping methodologies have been proposed as an alternative to the traditional visual ratings of pod senescence. However, the lack of scalable and accurate methods to extract the desired information from the images remains a significant bottleneck in breeding programs. The objective of this study was to develop an image-based high-throughput phenotyping system for evaluating soybean maturity in breeding programs. Images were acquired twice a week, starting when the earlier lines began maturation and ending when the latest ones were mature. Two complementary convolutional neural networks (CNNs) were developed to predict the maturity date: the first uses a single image date, and the second uses the five best image dates identified by the first model. The proposed CNN architecture was validated using more than 15,000 ground truth observations from five trials, including data from three growing seasons and two countries. The trained model showed good generalization capability, with a root mean squared error lower than two days in four out of five trials. Four methods of estimating prediction uncertainty showed potential for identifying different sources of error in the maturity date predictions. The architecture used solves limitations of previous research and can be used at scale in commercial breeding programs.
ARTICLE | doi:10.20944/preprints202108.0389.v1
Subject: Mathematics & Computer Science, Other Keywords: remote-sensing classification; scene classification; few-shot learning; meta-learning; vision transformers; multi-scale feature fusion
Online: 18 August 2021 (14:29:29 CEST)
The central goal of few-shot scene classification is to learn a model that can generalize well to a novel scene category (UNSEEN) from only one or a few labeled examples. Recent works in the remote sensing (RS) community tackle this challenge by developing algorithms in a meta-learning manner. However, most prior approaches have either focused on rapidly optimizing a meta-learner or aimed at finding good similarity metrics while overlooking the embedding power. Here we propose a novel Task-Adaptive Embedding Learning (TAEL) framework that complements the existing methods by giving full play to feature embedding’s dual roles in few-shot scene classification: representing images and constructing classifiers in the embedding space. First, we design a lightweight network that enriches the diversity and expressive capacity of embeddings by dynamically fusing information from multiple kernels. Second, we present a task-adaptive strategy that helps to generate more discriminative representations by transforming the universal embeddings into task-specific embeddings via a self-attention mechanism. We evaluate our model in the standard few-shot learning setting on two challenging datasets: NWPU-RESISC45 and RSD46-WHU. Experimental results demonstrate that, on all tasks, our method achieves state-of-the-art performance by a significant margin.
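The task-adaptive step can be pictured as running the support-set embeddings of one few-shot task through self-attention so that each embedding is re-expressed in the context of the whole task. A PyTorch sketch under assumed dimensions (the embedding size, head count, and residual connection are illustrative choices, not the paper's architecture):

```python
# Universal support embeddings -> task-specific embeddings via self-attention.
import torch
import torch.nn as nn

class TaskAdaptiveEmbedding(nn.Module):
    def __init__(self, dim=640, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, z):                 # z: (1, n_support, dim)
        adapted, _ = self.attn(z, z, z)   # each embedding attends to the task
        return adapted + z                # residual keeps the universal part

z = torch.randn(1, 25, 640)               # e.g. a 5-way 5-shot support set
print(TaskAdaptiveEmbedding()(z).shape)   # torch.Size([1, 25, 640])
```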
ARTICLE | doi:10.20944/preprints202106.0037.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: robot navigation; computer vision; camera calibration; mapping; path planning; communication; NAO robot; educational innovation; higher education
Online: 1 June 2021 (14:49:11 CEST)
Maze navigation using one or more robots has become a recurring challenge in the scientific literature and in real-life practice, with fleets having to find faster and better ways to navigate environments such as travel hubs (e.g., airports) or to evacuate a disaster zone. Many methods have been used to solve this problem, including the implementation of a variety of sensors and other signal-receiving systems. Most interestingly, camera-based techniques have become increasingly popular in this kind of application, given their robustness and scalability. In this paper, we implement an end-to-end strategy to address this scenario, allowing a robot to solve a maze autonomously by using computer vision and path planning. In addition, this robot shares the generated knowledge with another robot by means of communication protocols, and the second robot must adapt the solution to its own mechanical characteristics to solve the same challenge. The paper presents experimental validation of the four components of this solution, namely camera calibration, maze mapping, path planning and robot communication. Finally, we present the integration and functionality of these methods applied to a pair of NAO robots.
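For the camera-calibration component, the standard OpenCV checkerboard routine is the usual starting point. A minimal sketch, with the board size and image filenames as assumptions:

```python
# Estimate camera intrinsics from checkerboard images (standard OpenCV flow).
import glob
import cv2
import numpy as np

pattern = (9, 6)  # assumed inner-corner count of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):   # hypothetical calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

assert obj_points, "no usable calibration images found"
ret, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("intrinsic matrix:\n", K)
```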
ARTICLE | doi:10.20944/preprints201906.0023.v1
Subject: Engineering, Civil Engineering Keywords: structural health monitoring; displacement measurement; non-contact; computer vision; environmental factors; spatio-temporal context; Taylor approximation
Online: 3 June 2019 (12:59:00 CEST)
Currently, the majority of studies on vision-based measurement have been conducted under ideal environments so that adequate measurement performance and accuracy are ensured. However, vision-based systems may face adverse influencing factors such as illumination change and fog interference, which can affect the measurement accuracy. This paper develops a robust vision-based displacement measurement method which can handle the two common and important adverse factors given above and achieve sensitivity at the subpixel level. The proposed method leverages the advantage of high-resolution imaging, incorporating spatial and temporal context. To validate the feasibility, stability and robustness of the proposed method, a series of experiments was conducted on a two-span three-lane bridge in the laboratory, with the illumination change and fog interference simulated experimentally. The results of the proposed method are compared to conventional displacement sensor data and to the results of a current vision-based method. It is demonstrated that the proposed method gives better measurement results than the current ones under illumination change and fog interference.
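To illustrate the subpixel idea behind vision-based displacement measurement generally (not this paper's exact spatio-temporal algorithm), here is a sketch that locates a target template in each frame and refines the integer peak with a parabolic fit, a common Taylor-approximation-style refinement; it assumes the peak is not on the border of the response map.

```python
# Template matching with parabolic subpixel refinement of the correlation peak.
import cv2
import numpy as np

def subpixel_displacement(frame, template):
    res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (px, py) = cv2.minMaxLoc(res)        # integer-pixel peak
    def refine(v_m, v_0, v_p):                    # vertex of a 1-D parabola
        denom = v_m - 2 * v_0 + v_p
        return 0.0 if denom == 0 else 0.5 * (v_m - v_p) / denom
    dx = refine(res[py, px - 1], res[py, px], res[py, px + 1])
    dy = refine(res[py - 1, px], res[py, px], res[py + 1, px])
    return px + dx, py + dy

frame = (np.random.rand(240, 320) * 255).astype(np.uint8)
template = frame[100:130, 150:190].copy()
print(subpixel_displacement(frame, template))     # approx. (150.0, 100.0)
```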
ARTICLE | doi:10.20944/preprints201810.0523.v1
Subject: Biology, Other Keywords: spatiotemporal neural dynamics; vision; dorsal and ventral streams; multivariate pattern analysis; representational similarity analysis; fMRI; MEG
Online: 23 October 2018 (06:41:16 CEST)
To build a representation of what we see, the human brain recruits regions throughout the visual cortex in a cascading sequence. Recently, an approach was proposed to evaluate the dynamics of visual perception at high spatiotemporal resolution at the scale of the whole brain. This method combined functional magnetic resonance imaging (fMRI) data with magnetoencephalography (MEG) data using representational similarity analysis and revealed a hierarchical progression from primary visual cortex through the dorsal and ventral streams. To assess the replicability of this method, here we present the results of a visual recognition neuro-imaging fusion experiment and compare them within and across experimental settings. We evaluated the reliability of this method by assessing the consistency of the results under similar test conditions, showing high agreement within participants. We then generalized these results to a separate group of individuals and different visual input by comparing them to the fMRI-MEG fusion data of Cichy et al. (2016), revealing a highly similar temporal progression recruiting both the dorsal and ventral streams. Together these results are a testament to the reproducibility of the fMRI-MEG fusion approach and allow for the interpretation of these spatiotemporal dynamics in a broader context.
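The core of the fusion method is comparing representational dissimilarity matrices (RDMs): one RDM per MEG time point is correlated with each fMRI region's RDM, and the correlation time course shows when that region's representation emerges. A minimal sketch with random placeholder data standing in for real recordings:

```python
# fMRI-MEG fusion via representational similarity analysis (RSA), sketched
# with synthetic data: condition-by-condition RDMs, Spearman-correlated.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

n_conditions, n_times = 20, 100
meg = np.random.randn(n_times, n_conditions, 50)      # time x condition x sensor
fmri_rdm = pdist(np.random.randn(n_conditions, 200))  # one region's RDM (condensed)

fusion = np.array([
    spearmanr(pdist(meg[t]), fmri_rdm).correlation for t in range(n_times)
])
print("peak fMRI-MEG correspondence at time index", fusion.argmax())
```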
ARTICLE | doi:10.20944/preprints202211.0046.v1
Subject: Biology, Animal Sciences & Zoology Keywords: zebrafish; classical conditioning; operant-conditioning; software; auditory discrimination; learning; spatial working memory; decision making; reward; vision; hearing
Online: 2 November 2022 (06:08:45 CET)
Directed movement towards a target requires spatial working memory, including processing of sensory inputs and motivational drive. In a stimulus-driven, operant conditioning paradigm designed to train zebrafish, we present a pulse of light via LEDs and/or sounds via an underwater transducer. A webcam placed below a glass tank records fish swimming behavior. During operant conditioning, a fish must interrupt an infrared beam at one location to obtain a small food reward at the same or a different location. A timing-gated interrupt activates robotic-arm and feeder stepper motors via custom software controlling a microprocessor (Arduino). “Ardulink”, a Java facility, implements the Arduino-computer communication protocols. In this way, full automation of stimulus-conditioned directional swimming is achieved. Precise multiday scheduling of training, including the timing, location and intensity of stimulus parameters, and feeder control is accomplished via a user-friendly interface. Our training paradigm permits tracking of learning by monitoring turning, location, response times and directional swimming of individual fish. This facilitates comparison of performance within and across a cohort of animals. We demonstrate the ability to train and test zebrafish using visual and auditory stimuli. Current methods used for associative conditioning often involve human intervention, which is labor intensive, stressful to animals, and introduces noise into the data. Our relatively simple yet flexible paradigm requires a simple apparatus and minimal human intervention. Our scheduling and control software and apparatus (NemoTrainer) can be used to screen neurologic drugs and to test the effects of CRISPR-based and optogenetic modifications of neural circuits on sensation, locomotion, learning and memory.
ARTICLE | doi:10.20944/preprints202201.0090.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: formula detection; Hybrid Task Cascade network; mathematical expression detection; document image analysis; deep neural networks; computer vision
Online: 6 January 2022 (12:56:23 CET)
This work presents an end-to-end trainable approach for detecting mathematical formulas in scanned document images. Since many OCR engines cannot reliably work with formulas, it is essential to isolate them to obtain clean text for information extraction from the document. Our proposed pipeline comprises a hybrid task cascade network with deformable convolutions and a ResNeXt-101 backbone; both of these modifications help in better detection. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets, achieving an overall accuracy of 96% on the ICDAR-2017 POD dataset, an overall error reduction of 13%. Furthermore, the results on the Marmot dataset are improved for both isolated and embedded formulas: we achieved an accuracy of 98.78% for isolated formulas and an overall accuracy of 90.21% for embedded formulas, resulting in error reduction rates of 43% for isolated and 17.9% for embedded formulas.
ARTICLE | doi:10.20944/preprints202110.0089.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Object Detection; Cascade Mask R-CNN; Floor Plan Images; Deep Learning; Transfer Learning; Dataset Augmentation; Computer Vision
Online: 5 October 2021 (15:09:26 CEST)
Object detection is one of the most critical tasks in the field of computer vision; it comprises identifying and localizing an object in an image. Architectural floor plans represent the layout of buildings and apartments, and consist of walls, windows, stairs, and other furniture objects. While recognizing floor plan objects is straightforward for humans, automatically processing floor plans and recognizing their objects is a challenging problem. In this work, we investigate the performance of the recently introduced Cascade Mask R-CNN network for object detection in floor plan images. Furthermore, we experimentally establish that deformable convolution works better than conventional convolution in the proposed framework. Identifying objects in floor plan images is also challenging due to the variety of floor plans and different objects. We faced a problem in training our network because of the lack of publicly available datasets: currently available public datasets do not have enough images to train deep neural networks efficiently. To address this issue, we introduce SFPI, a novel synthetic floor plan dataset consisting of 10,000 images. Our proposed method conveniently surpasses the previous state-of-the-art results on the SESYD dataset and sets impressive baseline results on the proposed SFPI dataset. The dataset can be downloaded from SFPI Dataset Link. We believe that the novel dataset enables researchers to further enhance research in this domain.
ARTICLE | doi:10.20944/preprints202107.0165.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Formula detection; Cascade Mask R-CNN; Mathematical expression detection; document image analysis; deep neural networks; computer vision.
Online: 6 July 2021 (17:42:24 CEST)
This paper presents a novel architecture for detecting mathematical formulas in document images, an important step for reliable information extraction in several domains. Recently, Cascade Mask R-CNN networks have been introduced to solve object detection in computer vision. In this paper, we suggest a couple of modifications to the existing Cascade Mask R-CNN architecture: first, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to better spot areas of interest; second, it uses a dual ResNeXt-101 backbone with composite connections at the parallel stages. Finally, our proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold, with an f1-score of 0.917, reducing the relative error by 7.8%. Moreover, we achieved a correct detection accuracy of 81.3% on embedded formulas on the Marmot dataset, which results in a relative error reduction of 30%.
ARTICLE | doi:10.20944/preprints202203.0202.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: machine learning; artificial intelligence; computer vision; cybersecurity; privacy; security; gerontology; social gerontology; internet of medical things; best practices
Online: 15 March 2022 (10:40:36 CET)
Fall prediction using machine learning has become one of the most fruitful and socially relevant applications of computer vision in gerontological research. Since its inception in the early 2000s, this subfield has proliferated into a robust body of research underpinned by various machine learning algorithms (including neural networks, support vector machines, and decision trees) as well as statistical modeling approaches (Markov chains, Gaussian mixture models, and hidden Markov models). Furthermore, some advancements have been translated into commercial and clinical practice, with companies in various stages of development capitalizing on the aging population to develop new commercially available products. Yet despite the marvel of modern machine learning-enabled fall prediction, little research has been conducted to shed light on the security and privacy concerns that such systems pose for older adults. The present study employs an interdisciplinary lens in examining privacy issues associated with machine learning fall prediction and exploring the implications of these models in elderly care and the Internet of Medical Things (IoMT). Ultimately, a justice-informed set of best practices rooted in social geroscience is suggested to help fall prediction researchers and companies continue to advance the field while preserving elderly privacy and autonomy.
ARTICLE | doi:10.20944/preprints202101.0347.v1
Subject: Behavioral Sciences, Applied Psychology Keywords: interocular suppression; consciousness; color vision; visual search; attentional templates; early visual system; awareness; continuous flash suppression; binocular rivalry
Online: 18 January 2021 (14:32:29 CET)
Color can direct visual attention to specific locations through bottom-up and top-down mechanisms. Using Continuous Flash Suppression (CFS) as a way to investigate the factors that gate access to consciousness, the current study investigated whether color also directly affects the timing of conscious perception. Low or high spatial frequency (SF) gratings with different orientations were shown as targets to the non-dominant eye of human participants. CFS patterns were presented at a rate of 10 Hz to the dominant eye to delay conscious perception of the targets, and participants were asked to report the target’s orientation as soon as they could see it. With low-SF targets, two types of color-based effects became evident. First, when the targets and the CFS patterns had different colors, the targets entered consciousness faster than in trials where the targets and CFS patterns had the same color. Second, when participants searched for a specific target color, targets that matched these search settings entered consciousness faster compared to conditions where the target color was irrelevant and could vary from trial to trial. Thus, the current study demonstrates that color is a central feature of human perception and leads to faster conscious perception of visual stimuli through bottom-up and top-down attentional mechanisms.
ARTICLE | doi:10.20944/preprints202004.0387.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: industry 4.0; vision system; image processing; machine learning; pen parts feature identification; illumination variation; fuzzy C-means algorithm
Online: 21 April 2020 (13:48:10 CEST)
The fourth Industrial Revolution, well known as “Industry 4.0” and based on the integration of information and communication technologies, has introduced significant improvements in manufacturing. However, vision systems still experience various impracticalities in dealing with the effect of complex lighting on the system platform. Therefore, a machine vision system for automatic identification of pen parts under varying lighting conditions at a digital learning factory is proposed. The developed vision system presents a straightforward approach that effectively minimizes the effect of environmental lighting on the identification process. First, the information obtained from the designed vision framework is exported to a program, where a reduction of non-uniform illumination is achieved through the implementation of Retinex image enhancement techniques. Then, the color-based Fuzzy C-means (FCM) algorithm, including improved marker-based watershed segmentation, is employed for pen part object classification. Finally, the position features of the selected pen part are reported. The process was applied to a total of 210 upper pen part (cap) images and 241 lower pen part (tube) images under different lighting scenarios. Results indicate that the average part identification precision is 98.64% for caps and 95.26% for tubes. The present methodology provides a promising scheme that can feasibly be adapted to other industrial color-based object recognition applications.
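The FCM step assigns each pixel soft memberships to several color clusters rather than hard labels, which is part of what makes the classification tolerant to illumination variation. A minimal NumPy sketch of the standard FCM update rules (cluster count, fuzzifier m, and iteration budget are illustrative assumptions):

```python
# Standard Fuzzy C-means on RGB pixels: alternate between recomputing
# cluster centers from fuzzy memberships and memberships from distances.
import numpy as np

def fuzzy_cmeans(pixels, c=3, m=2.0, iters=50):
    """pixels: (N, 3) float RGB values. Returns (centers, memberships)."""
    n = len(pixels)
    u = np.random.dirichlet(np.ones(c), size=n)          # (N, C) memberships
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ pixels) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(pixels[:, None] - centers[None], axis=2) + 1e-9
        u = 1.0 / (d ** (2 / (m - 1)))                   # inverse-distance weights
        u /= u.sum(axis=1, keepdims=True)                # normalize per pixel
    return centers, u

pixels = np.random.rand(1000, 3)       # stand-in for flattened image pixels
centers, u = fuzzy_cmeans(pixels)
labels = u.argmax(axis=1)              # hard assignment for the final segmentation
```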
ARTICLE | doi:10.20944/preprints202003.0296.v1
Subject: Life Sciences, Cell & Developmental Biology Keywords: retinol binding protein 4 receptor 2; RBP4; Rbpr2; STRA6; all-trans retinol transport; photoreceptor cell; vision; retinoids; zebrafish
Online: 19 March 2020 (03:16:19 CET)
Dietary vitamin A/all-trans retinol/ROL plays a critical role in human vision. ROL circulates bound to the plasma retinol-binding protein (RBP4) as RBP4-ROL. In the eye, the STRA6 membrane receptor binds circulatory RBP4 and internalizes ROL. STRA6 is, however, not expressed in systemic tissues, where there is nonetheless high-affinity RBP4 binding and ROL uptake. We tested the hypothesis that the second retinol-binding protein 4 receptor 2 (Rbpr2), which is highly expressed in systemic tissues of zebrafish and mouse, contains a functional RBP4-binding domain critical for ROL transport. As for STRA6, modeling and docking studies confirmed three conserved RBP4-binding residues in zebrafish Rbpr2. In cell culture studies, disruption of the RBP4-binding residues on Rbpr2 almost completely abolished uptake of exogenous vitamin A. CRISPR-generated rbpr2-RBP4 domain zebrafish mutants showed microphthalmia, shorter photoreceptor outer segments, and decreased opsins, which were attributed to impaired ocular retinoid content. Injection of WT-Rbpr2 mRNA into rbpr2 mutants or all-trans retinoic acid treatment rescued the mutant eye phenotypes. In conclusion, zebrafish Rbpr2 contains a putative extracellular RBP4-ROL ligand-binding domain, critical for yolk vitamin A transport to the eye for ocular retinoid production and homeostasis and for photoreceptor cell survival.
REVIEW | doi:10.20944/preprints201801.0109.v1
Subject: Behavioral Sciences, Developmental Psychology Keywords: dyslexia; reading; magnocellular neurons; vision; hearing; phonology; sequencing; timing; temporal processing; transient; coloured filters; rhythm; music; omega 3s
Online: 12 January 2018 (07:15:33 CET)
Until the 1950s, developmental dyslexia was defined as a hereditary visual disability, selectively affecting reading without compromising oral or non-verbal reasoning skills. This changed radically after the development of the phonological theory of dyslexia; this not only ruled out any role for visual processing in its aetiology, but also cast doubt on the use of discrepancy between reading and reasoning skills as a criterion for diagnosing it. Here I argue that this theory is set at too high a cognitive level to be explanatory; we need to understand the pathophysiological visual and auditory mechanisms that cause children’s phonological problems. I discuss how the ‘magnocellular theory’ attempts to do this in terms of slowed and error prone temporal processing which leads to dyslexics’ defective visual and auditory sequencing when attempting to read. I attempt to deal with the criticisms of this theory and show how it leads to a number of successful ways of helping dyslexic children to overcome their reading difficulties.
ARTICLE | doi:10.20944/preprints202101.0534.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: fruit occlusion; deep learning; machine vision; yield estimation; fruit count; neural network; CNN; tree crop; Mangifera indica; MLP; canopy
Online: 26 January 2021 (11:29:49 CET)
Imaging systems mounted on ground vehicles are used to image fruit tree canopies for estimation of fruit load, but the counts frequently need correction for fruit occluded by branches, foliage or other fruit. This can be achieved using an orchard ‘occlusion factor’, estimated from a manual count of fruit load on a sample of trees (referred to as the reference method). It was hypothesised that canopy images could hold information related to the number of occluded fruit. Five approaches to correcting for occluded fruit based on canopy images were compared using data from three mango orchards across two seasons. However, no attributes correlated with the number of hidden fruit were identified. Several image features obtained through segmentation of fruit and canopy areas, such as the proportion of fruit that were partly occluded, were used to train Random Forest and multi-layered perceptron (MLP) models for estimation of a correction factor per tree. In another approach, deep learning convolutional neural networks (CNNs) were trained directly against harvest fruit counts on trees. The supervised machine learning methods for direct estimation of fruit load per tree delivered an improved prediction outcome over the reference method for data from the season/orchard from which the training data were acquired. For a set of 2017 season tree images (n = 98 trees), an R2 of 0.98 was achieved for the correlation between the number of fruits predicted by a Random Forest model and the ground truth fruit count on the trees, compared to an R2 of 0.68 for the reference method. The best prediction of whole-orchard (n = 880 trees) fruit load in the season of the training data was achieved by the MLP model, with an error relative to the packhouse count of 1.6%, compared to the reference method error of 13.6%. However, the performance of these models on new season data (test set images) was at best equivalent to and generally poorer than the reference method. This result indicates that training on one season of data was insufficient for the development of a robust model, an outcome attributed to variability in tree architecture and foliage density between seasons and between orchards, such that the characteristics of the canopy visible from the inter-row that relate to the proportion of hidden fruit are not consistent. Training of these models across several seasons and orchards is recommended.
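The Random Forest approach amounts to regressing harvest count on per-tree image features. A short scikit-learn sketch with hypothetical feature names and fully synthetic placeholder data (not the study's measurements):

```python
# Per-tree image features -> Random Forest regression of harvest fruit count.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Assumed features: [visible_fruit_count, frac_partly_occluded, canopy_area]
X = rng.random((98, 3))
# Synthetic "harvest count": visible fruit scaled up by an occlusion effect
y = X[:, 0] * 120 * (1 + 0.5 * X[:, 1]) + rng.normal(0, 3, 98)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("R^2 on training trees:", round(model.score(X, y), 3))
```

In practice the model would be evaluated on held-out trees (and, as the abstract notes, held-out seasons), since within-sample fit overstates robustness.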
ARTICLE | doi:10.20944/preprints202009.0647.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Lung condition; COVID-19; Machine learning; Custom Vision; Core ML; Auto ML; AI; Pneumonia; Smartphone application; Real-time diagnosis
Online: 26 September 2020 (16:14:39 CEST)
AI is transforming all aspects of life, and medical services are no exception, especially in the field of medical image processing and diagnosis. Big IT and biotechnology companies are investing millions of dollars in medical AI research. The recent outbreak of SARS-CoV-2 gave us a unique opportunity to study a non-interventional and sustainable AI solution. Lung disease remains a major healthcare challenge with high morbidity and mortality worldwide; the predominant lung disease has been lung cancer, but recently the world witnessed the global COVID-19 pandemic, the novel coronavirus outbreak, and we have experienced how viral infection of the lungs and heart claimed thousands of lives worldwide. With the unprecedented advancement of Artificial Intelligence in recent years, machine learning can be used to easily detect and classify medical imagery; it is much faster and often more accurate than human radiologists, and once implemented, it is more cost-effective and time-saving. In our study, we evaluated the efficacy of Microsoft Cognitive Services in detecting and classifying COVID-19-induced pneumonia versus other viral/bacterial pneumonia based on X-ray and CT images. We wanted to assess the implications and accuracy of an automated ML-based Rapid Application Development (RAD) environment in the field of medical image diagnosis, to better equip us to respond with an ML-based diagnostic Decision Support System (DSS) in a pandemic situation like COVID-19. After optimization, the trained network achieved 96.8% Average Precision and was deployed as a Web Application. However, the same trained network did not perform as well when ported to a smartphone for real-time inference, which was our main interest of study; the authors believe there is scope for further study on this issue. One of the main goals of this study was to develop and evaluate the performance of AI-powered, smartphone-based real-time applications facilitating primary diagnostic services in less-equipped and understaffed rural healthcare centers with unreliable internet service.
ARTICLE | doi:10.20944/preprints202111.0530.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: robot vision; compound eye; two-dimensional model; distance measurement; azimuth measurement; deep learning; training data set generation; deep neural network
Online: 29 November 2021 (12:28:15 CET)
This paper presents a two-dimensional mathematical model of compound eye vision. Such a model is useful for solving navigation issues for autonomous mobile robots on the ground plane. The model is inspired by the insect compound eye, which consists of ommatidia: tiny independent photoreception units, each combining a cornea, lens, and rhabdom. The model describes planar binocular compound eye vision, focusing on measuring distance and azimuth to a circular feature of arbitrary size. The model provides a necessary and sufficient condition for the visibility of a circular feature by each ommatidium. On this basis, an algorithm is built for generating a training data set to create two deep neural networks (DNNs): the first detects the distance, and the second detects the azimuth to a circular feature. The hyperparameter tuning and the configurations of both networks are described. Experimental results showed that the proposed method could effectively and accurately detect the distance and azimuth to objects.
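A plausible form of the planar visibility test such a model builds on: an ommatidium with viewing direction phi and angular aperture delta sees a circular feature (center c, radius r) when the feature's subtended angular interval overlaps the ommatidium's field of view. This sketch is a geometric reconstruction under stated assumptions, not the paper's exact condition.

```python
# Planar visibility test: does the FOV [phi - delta/2, phi + delta/2] of an
# ommatidium at the origin intersect the angle subtended by a circle?
import numpy as np

def sees_circle(phi, delta, c, r):
    d = np.hypot(*c)
    if d <= r:
        return True                               # sensor is inside the feature
    bearing = np.arctan2(c[1], c[0])              # direction to circle center
    half_width = np.arcsin(r / d)                 # half of the subtended angle
    diff = np.angle(np.exp(1j * (bearing - phi))) # wrapped angular difference
    return abs(diff) <= delta / 2 + half_width

# Ommatidium looking along +x with a 5-degree aperture; circle ahead-right
print(sees_circle(phi=0.0, delta=np.radians(5), c=(10.0, 0.5), r=1.0))  # True
```

Sweeping such a test over all ommatidia for many circle positions is exactly the kind of procedure that can generate labeled (visibility pattern, distance, azimuth) training pairs for the two DNNs.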
ARTICLE | doi:10.20944/preprints202007.0625.v1
Subject: Engineering, General Engineering Keywords: elderly care; hand gesture; computer vision system; Microsoft Kinect depth sensor; Arduino Nano Microcontroller; global system for mobile communication (GSM)
Online: 26 July 2020 (02:07:09 CEST)
Hand gestures may play an important role in medical applications for the health care of elderly people, where a natural interaction for different requests can be provided by making specific gestures. In this study we explored three different scenarios using a Microsoft Kinect V2 depth sensor and then evaluated the effectiveness of the outcomes. The first scenario utilized the default system embedded in the Kinect V2 sensor, whose depth metadata gives 11 parameters related to the tracked body, with five gestures for each hand. The second scenario used joint tracking provided by the Kinect depth metadata together with a depth threshold to enhance hand segmentation and efficiently recognize the number of fingers extended. The third scenario used a simple convolutional neural network with joint tracking by depth metadata to recognize five categories of gestures. In this study, deaf-mute elderly people executed five different hand gestures to indicate a specific request, such as needing water, a meal, the toilet, help or medicine. The requests were then sent to the care provider’s smartphone, since these elderly people could not carry out activities independently. The system transferred the requests as messages through the global system for mobile communication (GSM) using a microcontroller.
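To make the second scenario's idea concrete, here is a hedged OpenCV sketch: segment the hand with a depth band around the tracked hand joint, then estimate extended fingers from convexity defects of the hand contour. Depth values, the band width, and the valley-depth threshold are illustrative assumptions, and real contours may need additional filtering.

```python
# Depth-threshold hand segmentation + convexity-defect finger counting.
import cv2
import numpy as np

def count_fingers(depth_mm, hand_depth, band=80):
    """depth_mm: (H, W) uint16 depth image; band: +/- mm around the hand joint."""
    mask = ((depth_mm > hand_depth - band) &
            (depth_mm < hand_depth + band)).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return 0
    # Deep valleys between fingers -> extended fingers are roughly valleys + 1
    valleys = sum(1 for i in range(defects.shape[0])
                  if defects[i, 0, 3] / 256.0 > 20)   # defect depth in pixels
    return min(valleys + 1, 5)
```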
ARTICLE | doi:10.20944/preprints202106.0590.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Object detection; challenging environments; low-light; image enhancement; complex environments; state-of-the-art; deep neural networks; computer vision; performance analysis.
Online: 23 June 2021 (16:01:33 CEST)
Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of generic object detection systems decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all of these situations as challenging environments. With the recent rapid development of generic object detection algorithms, notable progress has been observed in the field of object detection in challenging environments. However, there is no consolidated reference covering the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.
ARTICLE | doi:10.20944/preprints201801.0235.v1
Subject: Engineering, Civil Engineering Keywords: infrastructure inspection; computer vision; structure from motion; dam inspection; 3D scene reconstruction; aerial robots; remote sensing; structural health monitoring; unmanned aerial vehicles
Online: 25 January 2018 (05:00:51 CET)
Dams are a critical infrastructure system for many communities, but they are also one of the most challenging to inspect. Dams are typically very large and complex structures, and as a result, inspections are often time-intensive and require expensive, specialized equipment and training to provide inspectors with comprehensive access to the structure. The scale and nature of dam inspections also introduce additional safety risks to the inspectors. Unmanned aerial vehicles (UAVs) have the potential to address many of these challenges, particularly when used as a data acquisition platform for photogrammetric three-dimensional (3D) reconstruction and analysis, though the nature of both UAVs and modern photogrammetric methods necessitates careful planning and coordination for integration. This paper presents a case study on one such integration at the Brighton Dam, a large-scale concrete gravity dam in Maryland, USA. A combination of multiple UAV platforms and multi-scale photogrammetry was used to create two comprehensive and high-resolution 3D point clouds of the dam and surrounding environment at intervals. These models were then assessed for their overall quality, as well as their ability to resolve flaws and defects that were artificially applied to the structure between inspection intervals. The results indicate that the integrated process is capable of generating models that accurately render a variety of defect types with sub-millimeter accuracy. Recommendations for mission planning and imaging specifications are provided as well.