ARTICLE | doi:10.20944/preprints202202.0250.v1
Online: 21 February 2022 (10:09:18 CET)
Remote sensing technology, especially satellite imagery, has become essential support for many aspects of decision-making, particularly disaster risk management. It offers shorter data-update cycles and lower cost than conventional field observations and surveys. Yet analyzing satellite imagery through a Geographic Information System (GIS) demands intensive processing and high-powered computing resources. In this paper, we introduce the identification and mapping of natural disaster impacts in Indonesia using an open-source collaborative Google Earth Engine (GEE) application that analyzes the relative temporal difference of the Earth's surface from three major satellite sources: Sentinel-1, Sentinel-2, and Landsat-8. Considering the geographical, geological, and demographic conditions of Indonesia's disaster-prone areas, we analyze the relative difference in the normalized difference vegetation index (NDVI) between the months before and after a natural disaster to measure its impact in the focus study areas. Given the highly vegetated nature of three major disaster-affected areas in Indonesia (Aceh, Palu, and Yogyakarta), we are able to simplify the analysis by highlighting areas with vegetation loss or gain after the event. Using an open-source GEE application, HazMapper, we identify and visualize the aftermath of the tsunami disasters in Aceh and Palu as well as the earthquake in Yogyakarta. Our study is potentially beneficial for governments and decision-makers seeking to utilize publicly available satellite images for disaster recovery and mitigation policy.
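The NDVI-difference screening described above can be sketched in a few lines (a minimal numpy illustration under assumed band values and an illustrative threshold, not HazMapper's actual GEE implementation):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + 1e-12)

def relative_ndvi_change(ndvi_pre, ndvi_post):
    """Relative NDVI difference (%); strongly negative values flag vegetation loss."""
    return 100.0 * (ndvi_post - ndvi_pre) / (np.abs(ndvi_pre) + 1e-12)

# toy 2x2 scene: the pixel at (0, 1) loses its vegetation signal after the event
pre = ndvi(nir=np.array([[0.8, 0.8], [0.7, 0.6]]),
           red=np.array([[0.2, 0.2], [0.3, 0.3]]))
post = ndvi(nir=np.array([[0.8, 0.3], [0.7, 0.6]]),
            red=np.array([[0.2, 0.3], [0.3, 0.3]]))
change = relative_ndvi_change(pre, post)
loss_mask = change < -30.0   # the -30% threshold is illustrative only
```

In GEE the same per-pixel arithmetic is expressed over image collections; the sketch only shows the band algebra.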
ARTICLE | doi:10.20944/preprints201810.0615.v1
Subject: Earth Sciences, Environmental Sciences Keywords: hyperspectral images; multispectral images; spectral diversity; Shannon entropy; tropical forest; marine coral reefs; biodiversity; correlation.
Online: 25 October 2018 (16:30:53 CEST)
Hyperspectral images are an important tool for assessing ecosystem biodiversity in both terrestrial and benthic habitats. To obtain biodiversity indicators from imagery that agree with indicators obtained from field data, spectral diversity calculated from images has to be validated against field-based diversity estimates. Plant species richness is one of the most important indicators of biodiversity. It can be measured in hyperspectral images under the Spectral Variation Hypothesis (SVH), which states that spectral heterogeneity is related to spatial heterogeneity and thus to species richness. The goal of this research is to capture the spectral heterogeneity of a terrestrial neotropical forest site from hyperspectral images using the Vector Quantization (VQ) method and then use the result to predict plant species richness. The results are compared with those of Hierarchical Agglomerative Clustering (HAC). The procedure is validated by calculating the Pearson correlation coefficient between the Shannon entropy obtained from actual field data and the Shannon entropy computed from the images. Terrestrial dry forest and marine coastal hyperspectral images at different resolutions have been used to validate the spectral diversity features.
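The validation quantity is the classic Shannon entropy over class proportions, computed once from field species labels and once from image cluster (codeword) assignments; a minimal sketch (the labels below are invented for illustration):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy H = -sum(p_i * ln p_i) over class proportions."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# field plot with 3 equally abundant species -> H = ln(3)
field_h = shannon_entropy(["A", "B", "C", "A", "B", "C"])
# pixels of the same plot assigned to 3 equally frequent VQ codewords
image_h = shannon_entropy([0, 0, 1, 1, 2, 2])
```

The Pearson correlation is then taken between the two entropy series across plots.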
COMMUNICATION | doi:10.20944/preprints202106.0139.v1
Subject: Biology, Anatomy & Morphology Keywords: Vegetation indices; precision agriculture; RGB images
Online: 4 June 2021 (11:25:04 CEST)
Here, we report the prediction of vegetative-stage variables of a canary bean crop by means of RGB and multispectral images obtained from a UAV during the ripening stage, correlating vegetation indices with biometric variables measured manually in the field. The results indicated a highly significant correlation of plant height with eight RGB-image vegetation indices, which were used to build predictive models, reaching a maximum correlation of R2 = 0.79. In contrast, the indices estimated from multispectral images did not show significant correlations.
ARTICLE | doi:10.20944/preprints202103.0494.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Landmass expansion; India Coast; Landsat Images
Online: 19 March 2021 (08:56:01 CET)
This study explores the changes in the landmass bounded by the coast of India during 1975-2005 using an on-screen visual interpretation technique (at 100 m resolution and 1:50,000 scale) on NASA Landsat imagery from three periods: 1975, 1990, and 2005. The results indicate an overall expansion of 130 sq. km in the landmass bounded by the Indian coast during 1975-2005 (74 sq. km during 1975-1990 and 56 sq. km during 1991-2005). These estimates are based on a preliminary analysis and could be made more accurate by mapping at a finer scale and using higher-resolution images.
ARTICLE | doi:10.20944/preprints201907.0219.v1
Subject: Life Sciences, Molecular Biology Keywords: circRNA; reproducible analysis; pipeline; Docker images
Online: 19 July 2019 (05:05:35 CEST)
Recently, the increased cost-effectiveness of high-throughput technologies has made a large number of RNA sequencing datasets available for identifying circular RNAs (circRNAs). However, although many computational tools have been developed to predict circRNAs, only a limited number of workflows exist to both predict and characterize circRNAs. Moreover, to the best of our knowledge, the available workflows do not ensure computational reproducibility and require advanced bash scripting skills to be correctly installed and used. To cope with these critical aspects we present Docker4Circ, a new computational framework designed for comprehensive circRNA analysis, comprising: circRNA prediction; classification and annotation using public databases; back-splicing sequence reconstruction; analysis of the internal alternative splicing of circularizing exons; alignment-free circRNA quantification from RNA-Seq reads; and, finally, differential expression analysis. Docker4Circ was specifically designed to make circRNA analysis easier and more accessible thanks to the following features: (i) its R interface; (ii) the encapsulation of its computational tasks into a Docker image; (iii) a user-friendly Java GUI. Furthermore, Docker4Circ ensures reproducible analysis because all its tasks are embedded in a Docker image following the guidelines of the Reproducible Bioinformatics Project (RBP, http://reproducible-bioinformatics.org/). The effectiveness of Docker4Circ was demonstrated on a real case study characterizing the circRNAs predicted in colorectal cancer cell lines and quantified in public RNA-Seq experiments performed on primary tumor tissues. In conclusion, we propose Docker4Circ as a framework for reproducible and comprehensive circRNA analyses that efficiently exploit their biological role.
ARTICLE | doi:10.20944/preprints201902.0105.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: fusion; point clouds; images; object detection
Online: 12 February 2019 (16:53:19 CET)
This paper tackles the task of fusing features from images and their corresponding point clouds for 3D object detection in autonomous driving scenarios, based on AVOD, an Aggregate View Object Detection network. The proposed fusion algorithms fuse features extracted from Bird's Eye View (BEV) LIDAR point clouds and their corresponding RGB images. Unlike existing fusion methods, which simply adopt a concatenation, element-wise sum, or element-wise mean module, our proposed fusion algorithms enhance the interaction between BEV feature maps and their corresponding image feature maps through a novel structure in which one variant uses single-level feature maps and another utilizes multi-level feature maps. Experiments show that our proposed fusion algorithm produces better results on 3D mAP and AHS with less speed loss compared to the existing fusion method used on the KITTI 3D object detection benchmark.
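The three baseline fusion modules that the paper compares against can be sketched in a few lines (a numpy illustration of the baselines only, not the proposed interaction structure; shapes and values are assumptions):

```python
import numpy as np

def fuse(bev, rgb, mode="mean"):
    """Baseline fusion of a BEV feature map and an image feature map,
    both of shape (C, H, W) and spatially aligned."""
    if mode == "concat":
        return np.concatenate([bev, rgb], axis=0)   # channel-wise concatenation
    if mode == "sum":
        return bev + rgb                            # element-wise sum
    if mode == "mean":
        return 0.5 * (bev + rgb)                    # element-wise mean
    raise ValueError(f"unknown fusion mode: {mode}")

bev_feat = np.ones((4, 8, 8))
img_feat = 3.0 * np.ones((4, 8, 8))
fused = fuse(bev_feat, img_feat, "concat")          # doubles the channel count
```

Concatenation grows the channel dimension, while sum and mean preserve it; the paper's contribution replaces these fixed operators with learned cross-view interactions.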
ARTICLE | doi:10.20944/preprints202007.0307.v1
Subject: Mathematics & Computer Science, General & Theoretical Computer Science Keywords: Color images segmentation; Particle Motion; Local Color Distance Images; Normal compressive vector field; Edge vector field
Online: 14 July 2020 (12:09:23 CEST)
This paper presents an edge-based color image segmentation derived from the method of Particle Motion in a Vector Image Field (PMVIF), which could previously be applied only to monochrome images. Instead of an edge vector field derived from a gradient vector field and a normal compressive vector field derived from a Laplacian-gradient vector field, two novel orthogonal vector fields, computed directly from a color image, one parallel and one orthogonal to the edges, are used in the model to force a particle to move along object edges. The normal compressive vector field is derived from the center-to-centroid vectors of local color distance images. Next, the edge vector field is derived by multiplying the normal compressive vector field by differences of auxiliary image pixels, yielding a vector field analogous to a Hamiltonian gradient vector field. Using the PASCAL Visual Object Classes Challenge 2012 (VOC2012) and the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500), the benchmark score of the proposed method is compared with those of the traditional PMVIF, Watershed, SLIC, K-means, Mean shift, and JSEG. The proposed method yields better RI, GCE, NVI, BDE, and Dice scores, faster computation, and stronger noise resistance.
ARTICLE | doi:10.20944/preprints201706.0012.v3
Subject: Engineering, Other Keywords: deep convolutional neural networks; road segmentation; conditional random fields; landscape metrics; satellite images; aerial images; THEOS
Online: 5 June 2017 (06:39:54 CEST)
Object segmentation on remotely sensed images, both aerial (very high resolution, VHR) and satellite (high resolution, HR), has been applied in many domains, especially road extraction, where the segmented objects serve as a mandatory layer in geospatial databases. Several attempts to apply deep convolutional neural networks (DCNNs) to extract roads from remote sensing images have been made; however, accuracy is still limited. In this paper, we present an enhanced DCNN framework specifically tailored for road extraction on remote sensing images by applying landscape metrics (LMs) and conditional random fields (CRFs). To improve the DCNN, a modern activation function, the exponential linear unit (ELU), is employed in our network, resulting in more, and more accurate, extracted roads. To further reduce falsely classified road objects, a solution based on the adoption of LMs is proposed. Finally, to sharpen the extracted roads, a CRF method is added to our framework. The experiments were conducted on Massachusetts road aerial imagery as well as THEOS satellite imagery data sets. The results showed that our proposed framework outperformed SegNet, the state-of-the-art object segmentation technique on any kind of remote sensing imagery, in most cases in terms of precision, recall, and F1.
ARTICLE | doi:10.20944/preprints202103.0767.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Images; matrices; angles; subspace; linear algebra applications
Online: 31 March 2021 (12:42:31 CEST)
An image consisting of m × n pixels can be seen as a matrix of size m × n. Based on the formula for angles between two subspaces, a pair of angles can be defined between two matrices by utilizing the column spaces and row spaces of the two matrices. The singular values of each matrix can be used to calculate distances. Thus, both a distance representation and a pair of angles between two matrices of the same size are obtained.
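Under the standard definition, the angles between two subspaces are the arccosines of the singular values of the product of their orthonormal bases; a minimal numpy sketch of the matrix angle pair described above (the function names and the choice of the smallest principal angle are our illustrative assumptions):

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)                        # orthonormal basis of col(A)
    Qb, _ = np.linalg.qr(B)                        # orthonormal basis of col(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def matrix_angle_pair(M1, M2):
    """Pair of angles between two equal-size matrices: the smallest principal
    angle of their column spaces and of their row spaces."""
    col = principal_angles(M1, M2).min()
    row = principal_angles(M1.T, M2.T).min()       # row spaces = column spaces of transposes
    return col, row

# orthogonal single-column "images": the only principal angle is pi/2
e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])
theta = principal_angles(e1, e2)[0]
```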
ARTICLE | doi:10.20944/preprints202210.0140.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: synthesized view; quality enhancement; synthetic images; data augmentation
Online: 11 October 2022 (04:39:16 CEST)
Recently, deep learning-based image quality enhancement models have been proposed to improve the perceptual quality of distorted synthesized views impaired by compression and the Depth Image Based Rendering (DIBR) process in multiview video systems. However, due to the lack of multiview video-plus-depth data, the training data for quality enhancement models are limited, which constrains the performance and progress of these models. Augmenting the training data is a feasible way to enhance Synthesized View Quality Enhancement (SVQE) models. In this paper, we improve deep learning-based SVQE models using more synthetic Synthesized View Images (SVIs). To simulate the irregular geometric displacement of DIBR distortion, a random irregular polygon-based SVI synthesis method is proposed based on existing massive RGB/RGBD data, and a synthetic synthesized-view database is constructed, which includes synthetic SVIs and DIBR distortion masks. Moreover, to guide the SVQE models to focus more precisely on DIBR distortion, a DIBR distortion mask prediction network, which predicts the position and variance of DIBR distortion, is embedded into the SVQE models. The experimental results demonstrate that pretraining on the synthetic SVI database greatly improves the performance of existing SVQE models. In addition, introducing the DIBR distortion mask prediction network further enhances SVI quality.
ARTICLE | doi:10.20944/preprints202208.0192.v1
Subject: Engineering, Automotive Engineering Keywords: Transfer Learning; Generative Adversarial Networks; MRI Brain Images
Online: 10 August 2022 (05:04:02 CEST)
Segmentation is an important step in medical imaging. In particular, machine learning, especially deep learning, has been widely used to improve and speed up the segmentation process in clinical practice. Despite the acceptable segmentation results of multi-stage models, little attention has been paid to the use of deep learning algorithms for brain image segmentation, which may be due to the lack of training data. Therefore, in this paper, we propose a Generative Adversarial Network (GAN) model that performs transfer learning to segment MRI brain images. Our model enables the generation of more labeled brain images from existing labeled and unlabeled images. Our segmentation targets brain tissues, including white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). We evaluate the performance of our GAN model using a commonly used evaluation metric, the Dice Coefficient (DC). Our experimental results reveal that the proposed model significantly improves segmentation results compared to the standard GAN model. We also observe that our model is 2.1–10.83 minutes faster than state-of-the-art models.
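The evaluation metric is the standard Dice Coefficient, computed per tissue class (one binary mask each for WM, GM, and CSF); a minimal sketch with illustrative toy masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """DC = 2 * |P intersect T| / (|P| + |T|) for binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# toy binary masks: 2 overlapping pixels out of 3 predicted and 3 true
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
dc = dice_coefficient(pred, target)   # 2*2 / (3 + 3) = 2/3
```

The small eps keeps the ratio defined when both masks are empty.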
ARTICLE | doi:10.20944/preprints202012.0592.v1
Subject: Engineering, Automotive Engineering Keywords: Super Resolution; Attention-augmented Convolution; Panchromatic Images; WGAN
Online: 23 December 2020 (14:10:31 CET)
Panchromatic (PAN) images contain abundant spatial information that is useful for Earth observation but often suffer from low resolution due to sensor limitations and the large-scale field of view. Current super-resolution (SR) methods based on traditional attention mechanisms have shown remarkable advantages but remain imperfect at reconstructing the edge details of SR images. To address this problem, an improved super-resolution model involving a self-attention-augmented WGAN is designed to mine the reference information among multiple features for detail enhancement. We use an encoder-decoder network followed by a fully convolutional network (FCN) as the backbone to extract multi-scale features and reconstruct the HR results. To exploit the relevance between multi-layer feature maps, we first integrate a convolutional block attention module (CBAM) into each skip connection of the encoder-decoder subnet, generating weighted maps that automatically enhance both channel-wise and spatial-wise feature representation. Furthermore, considering that the HR results and LR inputs are highly similar in structure, yet this similarity cannot be fully reflected by traditional attention mechanisms, we design a self-augmented attention (SAA) module in which the attention weights are produced dynamically via a similarity function between hidden features. This design allows the network to flexibly adjust the relative relevance among multi-layer features and retain long-range information, which helps preserve details. In addition, the pixel-wise loss is combined with perceptual and gradient losses to achieve comprehensive supervision. Experiments on benchmark datasets demonstrate that the proposed method outperforms other SR methods in terms of both objective evaluation and visual effect.
ARTICLE | doi:10.20944/preprints201810.0566.v1
Subject: Engineering, Other Keywords: remote sensing; evapotranspiration; CWSI; thermal images; almond; pistachio
Online: 24 October 2018 (10:45:22 CEST)
In California, water is a perennial concern. As competition for water resources increases due to population growth, California's tree nut farmers are committed to improving the efficiency of water used for food production. There is an imminent need for reliable methods that provide information about the temporal and spatial variability of crop water requirements and allow farmers to make irrigation decisions at the field scale. This study estimates the actual evapotranspiration and crop coefficients of an almond and a pistachio orchard located in the Central Valley (California) during an entire growing season by combining a simple crop evapotranspiration model with remote sensing data. A dataset of the vegetation index NDVI derived from Landsat-8 was used to estimate the basal crop coefficient (Kcb), or potential crop water use. The soil water evaporation coefficient (Ke) was measured with microlysimeters. The water stress coefficient (Ks) was derived from airborne remotely sensed canopy thermal-based methods, using seasonal regressions between the crop water stress index (CWSI) and stem water potential (Ψstem). These regressions were statistically significant for both crops, indicating clear seasonal differences in pistachios but not in almonds. In almonds, the estimated maximum Kcb values ranged from 1.05 to 0.90, while for pistachios they ranged from 0.89 to 0.80. The model indicated a difference of 97 mm in transpiration over the season between the two crops. Soil evaporation accounted for an average of 16% and 13% of the total actual evapotranspiration for almonds and pistachios, respectively. Verification of the model-based daily crop evapotranspiration estimates against eddy-covariance and surface renewal data collected in the same orchards yielded r2 >= 0.7 and average root mean square errors (RMSE) of 0.74 and 0.91 mm day-1 for almond and pistachio, respectively.
It is concluded that combining crop evapotranspiration models with remotely sensed data is helpful for upscaling irrigation information from the plant to the field scale and thus may be used by farmers for day-to-day irrigation management decisions.
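The abstract combines the basal coefficient Kcb (from NDVI), the soil evaporation coefficient Ke (from microlysimeters), and the stress coefficient Ks (from CWSI) in the standard FAO-56 dual-coefficient form; a minimal sketch with illustrative values (the coefficient values below are assumptions, not the paper's measurements, apart from the reported maximum Kcb of about 1.05 for almonds):

```python
# FAO-56-style dual crop coefficient model:
#   ETc_act = (Ks * Kcb + Ke) * ETo
def actual_et(ks, kcb, ke, eto):
    """Actual crop evapotranspiration (mm/day): stress-adjusted transpiration
    (Ks * Kcb) plus soil evaporation (Ke), scaled by reference ET (ETo)."""
    return (ks * kcb + ke) * eto

# illustrative mid-season almond day under mild stress
et_almond = actual_et(ks=0.9, kcb=1.05, ke=0.12, eto=6.0)   # (0.9*1.05 + 0.12) * 6.0
```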
ARTICLE | doi:10.20944/preprints202106.0706.v1
Subject: Keywords: Hyperspectral Images; Classification; K-means; Spectral Matching; Abundance Estimation
Online: 29 June 2021 (12:56:59 CEST)
Hyperspectral image (HSI) classification is a mechanism for analyzing differentiated land cover in remotely sensed hyperspectral images. In the last two decades, many classification algorithms, both supervised and unsupervised, have been proposed for hyperspectral data, each with its own limitations. In this research, three unsupervised classification methods are used on four datasets: Pavia Center, Pavia University, Cuprite, and Moffett Field. The main objective is to assess the performance of the three classifiers, K-means, Spectral Matching, and Abundance Mapping, and to observe their applicability to the different datasets. This research also includes spectral feature extraction for the hyperspectral datasets.
REVIEW | doi:10.20944/preprints201811.0162.v1
Subject: Earth Sciences, Geology Keywords: High-spatial-resolution images; Geology; Deep learning; Remote sensing
Online: 7 November 2018 (13:17:40 CET)
Geologists employ high-spatial-resolution (HR) remote sensing (RS) data for many diverse applications because these data effectively reflect detailed geological information, enabling high-quality and efficient geological surveys. Applications of HR RS data to geological and related fields have grown recently. By analyzing these applications, we can better understand the results of previous studies and more effectively use the latest data and methods to extract key geological information efficiently. HR optical remote sensing data are widely used in geological hazard assessment, seismic monitoring, mineral exploitation, glacier monitoring, and mineral information extraction due to their high accuracy and clear object features. Compared with optical satellite images, synthetic-aperture radar (SAR) images are stereoscopic, exhibit clear relief, and perform well in detecting terrain, landforms, and related information. SAR images have been applied to seismic mechanism research, volcanic monitoring, topographic deformation, and fault analysis. Furthermore, a multi-criteria maturity analysis of the geological applications of HR images, based on literature from the Science Citation Index, reveals that optical remote sensing data are superior to radar data for mining, geological disaster, lithologic, and volcanic applications, but inferior for earthquake, glacial, and fault applications. Therefore, geological remote sensing research needs to be truly multidisciplinary or interdisciplinary, ensuring more detailed and efficient surveys through cross-linking with other disciplines. Moreover, the recent application of deep learning to remote sensing data extraction has improved automatic processing and data analysis capabilities.
ARTICLE | doi:10.20944/preprints201810.0617.v1
Subject: Earth Sciences, Environmental Sciences Keywords: UAV images; mangrove; vegetation indices; Leaf Area Index (LAI)
Online: 26 October 2018 (05:32:52 CEST)
The urban mangrove of Vitória Bay, Espírito Santo, southeastern Brazil suffers from anthropogenic impacts that interfere with the foliar spectral response of its species. Identifying the spectral behavior of these species and creating regression models to indirectly obtain structural data such as the Leaf Area Index (LAI) are powerful environmental monitoring tools. In this study, LAI was obtained in 32 plots distributed across four stations. Regression analysis of in situ LAI against SAVI resulted in a significant positive relationship (r2 = 0.58). Forest variability in degree of maturity, structural heterogeneity, and LAI influenced the fit of the vegetation indices (VIs). The highest regression values were obtained for the homogeneous field data, represented by the R. mangle plots, which also had higher LAI values. The same field data were correlated with the SAVI of a RapidEye image for comparison. The results showed that images obtained by a UAV have higher spatial resolution than the RapidEye image and are therefore more influenced by the background. In addition, the statistical analysis of the field data with the VIs obtained from the RapidEye image did not present a high regression coefficient (r2 = 0.7), suggesting that the use of VIs in studies of urban mangroves needs further evaluation, observing the factors that influence the leaf spectral response.
ARTICLE | doi:10.20944/preprints201810.0354.v1
Subject: Earth Sciences, Geoinformatics Keywords: open LiDAR; terrestrial images; building reconstruction; point cloud registration
Online: 16 October 2018 (11:20:43 CEST)
Recent advances in open data initiatives give us free access to a vast amount of open LiDAR data in many cities. However, most open LiDAR data over cities are acquired by airborne scanning, in which points on façades are sparse or even missing entirely due to the viewpoint and object occlusions in the urban environment. Integrating other sources of data, such as ground images, to complete the missing parts is an effective and practical solution. This paper presents an approach for improving the coverage of open LiDAR data on building façades using point clouds generated from ground images. A coarse-to-fine strategy is proposed to fuse these two sources of data. First, the façade point cloud generated from terrestrial images is initially geolocated by matching the SfM camera positions to their GPS meta-information. Next, an improved Coherent Point Drift algorithm with normal consistency is proposed to accurately align building façades to the open LiDAR data. The significance of the work resides in the use of 2D overlapping points on building outlines, instead of the limited 3D overlap between the two point clouds, and in achieving reliable and precise registration under possibly incomplete coverage and ambiguous correspondences. Experiments show that the proposed approach can significantly improve façade details in open LiDAR data, improving registration accuracy from up to 10 meters to less than half a meter compared with classic registration methods.
ARTICLE | doi:10.20944/preprints201804.0251.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: ATR; ISAR/SAR images; saliency attention; SIFT; multitask-SRC
Online: 19 April 2018 (10:32:02 CEST)
In this paper, we propose a novel approach to recognize radar targets in inverse synthetic aperture radar (ISAR) and synthetic aperture radar (SAR) images. The approach is based on multiple salient keypoint descriptors (MSKD) and multitask sparse representation based classification (MSRC). To characterize the targets in the radar images, we combine the scale-invariant feature transform (SIFT) and a saliency map. The goal of this combination is to reduce the number of SIFT keypoints, and hence their computation time, by retaining only those located in the target area (salient region). We then compute the feature vectors of the resulting salient SIFT keypoints (MSKD). This methodology is applied to both training and test images. The MSKD of the training images is used to construct the dictionary of a sparse convex optimization problem. For recognition, we adopt MSRC, treating each vector in the MSKD as a task. This classifier solves the sparse representation problem for each task over the dictionary and determines the class of the radar image from all sparse reconstruction errors (residuals). The effectiveness of the proposed approach has been demonstrated by a set of extensive empirical results on ISAR and SAR image databases. The results show the ability of our method to recognize aircraft and ground targets adequately.
ARTICLE | doi:10.20944/preprints202212.0455.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: datasets; online; cooking; flutter; videos; images; online recommendation; machine learning
Online: 23 December 2022 (08:19:03 CET)
During the pandemic, home-cooked meals received more attention as people had more time for themselves and their families. It cannot be denied that people had more time to prepare meals, and it was safer to cook at home, since they could avoid direct contact with people who were potentially F0 or F1 cases. For these reasons, the frequency of home cooking increased rapidly. In the article, the authors pointed out that, compared with the preceding period, 42.0% of respondents said they cooked more frequently during the lockdown, while just 7.0% said they cooked less frequently. The increase in cooking frequency had a more significant impact on women than on men (p < 0.05). Those who said they cooked more regularly spent more time preparing meals (78%) and trying new recipes (73.6%), as well as more time baking (67.1%). In particular, the consumption of “comfort” foods (salty snacks, sweets) showed a significant rise, with 23% to 60% of respondents declaring that they snacked more. Moreover, the articles pointed out that, in conjunction with the dramatic reduction in energy expenditure due to the impossibility of going out, this situation could have led to an energy imbalance and thus weight gain. Noticing the massive rise in home cooking, the search for new recipes, and the need for balanced meals, we propose a cooking app called “Cooking Papa”. As far as the research goes, the concept of “cooking” seems familiar yet not easy to define. The Oxford English Dictionary defines cooking as “to prepare food by the action of heat”. However, limited evidence suggests that people interpret the meaning of cooking quite differently. Moreover, the terms ‘homemade’, ‘convenience’, ‘proper cooking’, ‘cook’, ‘basic ingredients’, and ‘ready prepared’ are not uniformly understood.
ARTICLE | doi:10.20944/preprints202212.0423.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: inverse problems; image reconstruction; vague figures; mental images; Artificial Intelligence
Online: 22 December 2022 (07:14:16 CET)
This article proposes the application of a new mathematical model of spots to solving inverse problems using a learning method similar to deep learning. In general, spots represent vague figures in abstract “information spaces”, or crisp figures with a lack of information about their shapes, and are adequate for representing human mental images and reasoning in Artificial Intelligence (AI); crisp figures are regarded as a special, limiting case of spots. A basic mathematical apparatus, based on L4 numbers, has been developed for representing and processing qualitative information about elementary spatial relations between spots. We also define L4 vectors, L4 matrices, and mathematical operations on them. The developed apparatus can be used in AI, in particular for knowledge representation and for modeling qualitative reasoning and learning. Another application area is the solution of inverse problems by learning; for example, image reconstruction from ultrasound, X-ray, magnetic resonance, or radar scan data. The introduced apparatus was verified by solving image reconstruction problems using only qualitative data on an image's elementary relations with some scanning figures. The article also demonstrates the application of a spot-based inverse Radon algorithm to binary image reconstruction.
ARTICLE | doi:10.20944/preprints202210.0070.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: vegetation indices; NDVI; RGB images; Deep Forest; Random Kernel Forests
Online: 7 October 2022 (07:26:52 CEST)
Vegetation indices support precision farming by providing useful information about moisture, nutrient content, and crop health. The primary sources of these indices are satellites and unmanned aerial vehicles equipped with expensive multispectral sensors. Reducing the cost of obtaining such information would make precision farming more accessible to small farms. Several studies have proposed deep neural network methods to estimate the indices from RGB color images. However, these methods report relatively large errors for mature plants, where highly non-linear relationships between images and vegetation indices arise. Multilayer random-forest-based models (Deep Forests) could be applied to this problem, but their discriminative power is limited: they cannot capture complex dependencies between image features. In this paper, we propose a method that combines the ideas of deep forests, random forests of kernel trees, and global pruning of random forests to tackle the problem. As a result, the method accounts for the properties of objects with complex structure: relationships between groups of features, and the displacement and scaling of objects. The experimental results show that the proposed method outperforms neural network-based solutions on several datasets.
ARTICLE | doi:10.20944/preprints202107.0257.v1
Subject: Keywords: hyperspectral images; unsupervised algorithm; clustering; K-means algorithm; spectral signature.
Online: 12 July 2021 (12:14:58 CEST)
Hyperspectral images contain a wide range of bands or wavelengths, which makes them rich in information. These images are acquired by specialized sensors and then investigated with supervised or unsupervised learning algorithms. Because hyperspectral images contain so much information, they can be used in applications where materials must be analyzed closely; even the smallest differences can be detected on the basis of spectral signatures, as in remote sensing applications. To retrieve information about an area of interest, the image must be grouped into segments so that it can be analyzed conveniently. In this way, only the portions of the image with relevant information need to be studied, and the rest can be discarded. Image segmentation assigns all pixels to groups. Many methods can be used for this purpose, but in this paper we apply k-means clustering to the AVIRIS Cuprite, AVIRIS Moffett and ROSIS Pavia images in order to estimate the number of regions in each image and retrieve information from the 1st, 10th and 100th bands. Clustering is done easily and efficiently, as k-means is one of the simplest approaches for retrieving such information.
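To make the grouping step concrete, here is a minimal NumPy-only k-means over pixel spectral signatures, on synthetic data rather than the AVIRIS/ROSIS scenes; it is a sketch of the general algorithm, not the authors' implementation:

```python
import numpy as np

def kmeans(pixels, k, iters=20, seed=0):
    """Plain k-means on an (n_pixels, n_bands) array of spectral signatures."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest spectral center (Euclidean distance).
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean spectrum of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers

# Toy "hyperspectral" data: 16 pixels, 5 bands, two distinct spectral materials.
cube = np.vstack([np.full((8, 5), 0.2), np.full((8, 5), 0.9)])
cube = cube + np.random.default_rng(1).normal(0, 0.01, cube.shape)
labels, centers = kmeans(cube, k=2)
```

Each resulting label corresponds to one spectral region; on real scenes the number of clusters k would be tuned per image.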
ARTICLE | doi:10.20944/preprints202105.0438.v1
Subject: Medicine & Pharmacology, Allergology Keywords: Deep learning, infectious keratitis, cropped corneal image, slit-lamp images
Online: 19 May 2021 (10:09:06 CEST)
In this study, we aimed to develop a deep learning model for identifying bacterial keratitis (BK) and fungal keratitis (FK) by using slit-lamp images. We retrospectively collected slit-lamp images of patients with culture-proven microbial keratitis between January 1, 2010, and December 31, 2019, from two medical centers in Taiwan. We constructed a deep learning algorithm, consisting of a segmentation model for cropping cornea images and a classification model that applies convolutional neural networks to differentiate between FK and BK. The model performance was evaluated and presented as the area under the curve (AUC) of the receiver operating characteristic curves. A gradient-weighted class activation mapping technique was used to plot the heatmap of the model. By using 1330 images from 580 patients, the deep learning algorithm achieved an average diagnostic accuracy of 80.00%. The diagnostic accuracy for BK ranged from 79.59% to 95.91% and that for FK ranged from 26.31% to 63.15%. DenseNet169 showed the best model performance, with an AUC of 0.78 for both BK and FK. The heat maps revealed that the model was able to identify the corneal infiltrations. The model showed better diagnostic accuracy than the previously reported diagnostic performance of both general ophthalmologists and corneal specialists.
ARTICLE | doi:10.20944/preprints202111.0153.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Diabetic Retinopathy; Fundus Images; Retina; Support vector machine; K-Means Clustering.
Online: 8 November 2021 (14:59:13 CET)
A complication of diabetes causes an illness known as Diabetic Retinopathy (DR), which is widespread among middle-aged and older people. As diabetes progresses, patients' vision may deteriorate, and DR can cause them to lose their sight; early detection is therefore needed to cope with it. Otherwise, patients must be checked by doctors regularly, which costs time and effort. DR can be divided into two groups: non-proliferative (NPDR) and proliferative (PDR). In this study, machine learning (ML) techniques are used to diagnose DR early: PNN, SVM, Bayesian classification, and K-means clustering. These techniques are evaluated and compared with each other to choose the best methodology. A total of 300 fundus photographs are processed for training and testing, and features are extracted from these raw images using image-processing techniques. The experiments show that PNN achieves an accuracy of about 89%, Bayesian classification 94%, SVM 97%, and K-means clustering 87%. These preliminary results indicate that SVM is the best technique for early detection of DR.
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Ship detection; image super-resolution; mid-low resolution remote sensing images
Online: 16 August 2021 (12:51:38 CEST)
Existing methods enhance mid-low resolution remote sensing ship detection by feeding super-resolved images to the detectors. Although these methods marginally improve the detection accuracy, the correlation between image super-resolution (SR) and ship detection is under-exploited. In this paper, we propose a simple but effective ship detection method called ShipSR-Det, in which both the output image and the intermediate features of the SR module are fed to the detection module. Using the super-resolved feature representation, the potential benefit introduced by image SR can be fully used for ship detection. We apply our method to the SSD and Faster-RCNN detectors and develop ShipSR-SSD and ShipSR-Faster-RCNN, respectively. Extensive ablation studies validate the effectiveness and generality of our method. Moreover, we compare ShipSR-Faster-RCNN with several state-of-the-art ship detection methods. Comparative results on the HRSC2016, DOTA and NWPU VHR-10 datasets demonstrate the superior performance of our proposed method.
ARTICLE | doi:10.20944/preprints202105.0408.v1
Subject: Engineering, Automotive Engineering Keywords: UAV Images; Monoscopic Mapping; Stereoscopic Plotting; Image Overlap; Optimal Image Selection
Online: 18 May 2021 (10:10:07 CEST)
Recently, the mapping industry has been focusing on the possibility of large-scale mapping from unmanned aerial vehicles (UAVs), owing to advantages such as easy operation and cost reduction. In order to produce large-scale maps from UAV images, it is important to obtain precise orientation parameters; various techniques have been developed for this and are included in most commercial UAV image processing software. For mapping, it is equally important to select images that can cover a region of interest (ROI) with the fewest possible images. Otherwise, one may have to handle too many images to map the ROI, and commercial software neither provides the information needed to select images nor explicitly explains how to select them. For these reasons, stereo mapping of UAV images in particular is time consuming and costly. To solve these problems, this study proposes a method to select images intelligently: a minimum set of image pairs that covers the ROI with the fewest possible images, or an optimal set that covers the ROI with the most accurate stereo pairs. We group images by strips and generate the initial image pairs, then apply an intelligent scheme to iteratively select optimal image pairs from the start to the end of each image strip. According to the experimental results, the number of images selected is greatly reduced by applying the proposed optimal image-composition algorithm. The selected image pairs produce a dense 3D point cloud over the ROI without any holes. For stereoscopic plotting, the selected image pairs mapped the ROI successfully on a digital photogrammetric workstation (DPW), and a digital map covering the ROI was generated. The proposed method should contribute to time and cost reductions in UAV mapping.
ARTICLE | doi:10.20944/preprints202102.0426.v1
Subject: Earth Sciences, Atmospheric Science Keywords: karst wetland mapping; SegNet model; UAV images; fusion model; texture feature
Online: 19 February 2021 (09:44:24 CET)
Karst wetlands are being seriously damaged, and protecting them has become an important matter. Karst vegetation is an essential component of these wetlands and plays an important role in the ecological functions of wetland ecosystems, so classifying it matters for karst wetland protection and management. This paper classifies karst vegetation in Huixian National Wetland Park, China, using an improved SegNet deep-learning algorithm and UAV images. The study proposes a method that fuses single-class SegNet models using a maximum-probability algorithm for karst vegetation classification, and compares it with object-based RF classification and multi-class SegNet classification. It evaluates the performance of the multi-class SegNet model and the fusion of single-class SegNet models with different EPOCH values for mapping karst vegetation, and proposes a new optimized post-classification algorithm to eliminate the stitching traces caused by SegNet model prediction. The specific conclusions are as follows: (1) the fusion of four single-class SegNet models produced better classifications of karst wetland vegetation than the multi-class SegNet model, achieving the highest overall classification accuracy (87.34%); (2) the optimized post-classification algorithm improved the prediction accuracy of the SegNet model and eliminated splicing traces; (3) the karst wetland vegetation classifications produced by single-class SegNet models outperformed the multi-class SegNet model, improving classification accuracy (F1-score) by 10%~25%; (4) the EPOCH values and textural features have an important impact on karst wetland vegetation classification: the SegNet model with EPOCH 15 achieved greater classification accuracy (F1-score) than the models with EPOCH 5 or 10, and the textural features improve the capability of the SegNet model for mapping karst vegetation; (5) the fusion of single-class SegNet models and the object-based RF model both provide high-quality classifications of karst wetland vegetation, each achieving greater than 87% overall accuracy.
ARTICLE | doi:10.20944/preprints201812.0320.v1
Subject: Earth Sciences, Environmental Sciences Keywords: Central Rift Valley, Ethiopia, Landsat images, Lake, land use/land cover
Online: 27 December 2018 (10:49:16 CET)
LULC changes are major environmental challenges in many parts of the world and are adversely affecting ecosystem services. This study aimed to analyze LULC changes in the ecological landscape of Ethiopia's CRV areas from 1985 to 2015. Satellite images were accessed, pre-processed and classified; major LULC types were detected and change analysis was executed. Nine LULC changes were successfully evaluated. The classification results revealed that in 1985, 44.34% of the land was covered by small-scale farming, followed by mixed cultivated/acacia (21.89%), open woodland (11.96%), and water bodies (9.77%), while open grazing land, forest, degraded savannah and settlements accounted for the smallest proportions that year. Though the areas varied among land-use classes, the relative shares occupied by the LULC types in the study area remained the same in 1995 and 2015. An increase in small- and large-scale farming, settlements and mixed cultivation/acacia, and a decrease in water bodies, forest, and open woodlands, were noted. About 86.11% of the land showed major changes in land use/cover. Lastly, a DPSIR framework analysis was done, and integrated land use and development planning and policy reform are suggested for sustainable land-use planning and management.
ARTICLE | doi:10.20944/preprints202107.0200.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: image quality assessment; image quality metrics; NR-IQAs; D-IQA; OCR accuracy; OCR prediction; OCR improvements; visual aids; visually impaired; reading aids; document images; text-based images
Online: 8 July 2021 (13:21:49 CEST)
For Visually impaired People (VIPs), the ability to convert text to sound can mean a new level of independence or the simple joy of a good book. With significant advances in Optical Character Recognition (OCR) in recent years, a number of reading aids are appearing on the market. These reading aids convert images captured by a camera to text which can then be read aloud. However, all of these reading aids suffer from a key issue – the user must be able to visually target the text and capture an image of sufficient quality for the OCR algorithm to function – no small task for VIPs. In this work, a Sound-Emitting Document Image Quality Assessment metric (SEDIQA) is proposed which allows the user to hear the quality of the text image and automatically captures the best image for OCR accuracy. This work also includes testing of OCR performance against image degradations, to identify the most significant contributors to accuracy reduction. The proposed No-Reference Image Quality Assessor (NR-IQA) is validated alongside established NR-IQAs and this work includes insights into the performance of these NR-IQAs on document images.
ARTICLE | doi:10.20944/preprints202110.0362.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: 3D reconstruction; 3D data smoothing; mesh simplification; high resolution micro-CT images
Online: 25 October 2021 (15:34:27 CEST)
Three-dimensional reconstruction plays an important role in assisting doctors and surgeons in diagnosing the healing progress of bone defects. Common three-dimensional reconstruction methods include surface and volume rendering; as the focus here is on the shape of the bone, volume rendering is omitted. Many improvements have been made to surface rendering methods like Marching Cubes and Marching Tetrahedra, but few address real-time or near real-time surface rendering for large medical images or study the effects of different parameter settings on those improvements. Hence, this study attempts near real-time surface rendering for large medical images. Different parameter values are tested to study their effect on reconstruction accuracy, reconstruction and rendering time, and the number of vertices and faces. The proposed improvement, involving three-dimensional data smoothing with a Gaussian convolution kernel of size 0.5 and a mesh simplification reduction factor of 0.1, is the parameter combination that best balances high reconstruction accuracy, low total execution time, and a low number of vertices and faces. It increased reconstruction accuracy by 0.0235%, decreased total execution time by 69.81%, and decreased the number of vertices and faces by 86.57% and 86.61%, respectively.
ARTICLE | doi:10.20944/preprints202007.0052.v1
Subject: Keywords: COVID-19; Breast Cancer; Breast Cancer (Suspected); Mammogram Images and Invasive Cancer
Online: 5 July 2020 (07:40:57 CEST)
Breast cancer develops from cells lining the milk ducts and slowly grows into a lump or a tumour. It may be invasive or non-invasive: invasive cancer spreads from the milk duct or lobule to other tissues in the breast, whereas non-invasive cancer lacks the ability to invade other breast tissues. Non-invasive breast cancer is called in situ and may remain inactive for an entire lifetime. Due to the heterogeneous nature of the breast, density as well as masses vary in size and shape. A dataset of 18,056 patients was collected from 20 government hospitals and 50 private hospitals in West Bengal, before and after the onset of COVID-19. Patients were classified into three classes: Normal, Sign of Abnormality, and Abnormality. MRI reports of patients from January and February 2020 were collected from different hospitals and treated as the dataset before COVID-19, while MRIs of patients from April and May 2020 form the dataset during COVID-19. The datasets were accumulated to test for any change in patients' MRIs after the official announcement of the new COVID-19 virus in March 2020. The aim of the paper is to compare any change in the size and shape of masses in patients' MRIs before and after COVID-19. All collected MRI reports were diagnosed by hospital radiologists.
ARTICLE | doi:10.20944/preprints202002.0442.v1
Subject: Earth Sciences, Geoinformatics Keywords: Digital Elevation Models; ortho-mosaicked images; glacier; remote sensing; Unmanned Aerial Vehicle
Online: 28 February 2020 (13:34:22 CET)
Unmanned Aerial Vehicle (UAV) based remote sensing (RS) studies in glaciology mainly focus on obtaining accurate high-resolution data from UAV images. Work on identifying and minimising the challenges faced during UAV-based RS data acquisition on the inaccessible and harsh terrain of mountain glaciers is limited. This study aims to examine the practical challenges faced during UAV surveys of glaciers and to derive strategies to minimize them. To the authors' knowledge, this is the first study to address such problems over the Himalayan region. The UAV surveys were conducted using a fixed-wing commercial-grade off-the-shelf UAV (eBee Plus, SenseFly) on three glacier sites (East Rathong, Hamtah and Panchinala-A) located in different zones and climate regimes within the Indian part of the Himalayas. From the collected photos, the study generated ultra-high-resolution ortho-mosaicked images and Digital Elevation Models (DEMs) at 0.1 m GSD. The UAV-derived DEMs achieved vertical (horizontal) accuracies of 0.45 and 0.21 m (0.15 and 0.1 m) with 3 and 6 ground control points (GCPs) over areas of 0.75 km2 and 1.38 km2, respectively. Accuracy assessment of UAV DEMs generated with and without GCPs indicates that GCPs are a must for obtaining decimetre-level accurate DEMs, especially on glaciers with steep-valleyed terrain. The utility of the ultra-high-resolution ortho-mosaicked images was demonstrated by generating glacier surface feature maps. Based on the challenges observed during the surveys, the study identifies and recommends the locations on a glacier and its adjacent regions best suited for conducting UAV surveys efficiently in the glaciated terrain of the Himalayas and possibly beyond. The recommendations reported in this article should minimise the challenges and risks involved in data acquisition and thus enable UAVs to cover more glaciated area successfully.
ARTICLE | doi:10.20944/preprints201908.0225.v1
Subject: Earth Sciences, Geoinformatics Keywords: water bodies; satellite images; vector data; SVM; positive and negative buffering; polygons
Online: 21 August 2019 (10:30:16 CEST)
The technique of obtaining information about a feature or object from afar, known in technical parlance as remote sensing, has proven extremely useful in diverse fields. In the ecological sphere especially, remote sensing has enabled the collection of data about large swaths of area or landscape. Even so, identifying and monitoring different water reservoirs in remote sensing has proved a tough task, mainly because correct appraisal of the spread and boundaries of the study area, and of the contours of any water surfaces within it, is of utmost importance. Identification of water reservoirs is rendered even tougher by the presence of clouds in satellite images, which is the largest source of error in identifying water surfaces. To overcome this, we recommend a shape-matching approach that analyses cloudy images against cloud-free reference images of water surfaces with the help of vector data processing. It includes a database of water bodies in vector format, a complex polygon structure. The analysis comprises three steps: first, creation of the vector database for the analysis; second, simplification of multi-scale vector polygon features; and third, matching of the reference and target water-body databases within a defined distance tolerance. This feature-matching approach supports one-to-many and many-to-many matches, and yields corrected images that are free of clouds.
ARTICLE | doi:10.20944/preprints201804.0377.v1
Subject: Earth Sciences, Geoinformatics Keywords: land cover change detection; adaptive contextual information; bi-temporal remote sensing images
Online: 29 April 2018 (10:52:26 CEST)
Land cover change detection (LCCD) based on bi-temporal remote sensing images plays an important role in the inventory of land cover change. Because remote sensing images exhibit spatial dependency within the image space, many contextual-information-based change detection methods have been proposed over the past decades. However, there is still room for improvement in the accuracy and usability of LCCD. In this paper, an LCCD method based on adaptive contextual information is proposed. First, an adaptive region is constructed by gradually detecting the spectral similarity surrounding a central pixel. Second, the Euclidean distance between pairwise extended regions is calculated to measure the change magnitude between the pairwise central pixels of the bi-temporal images. As the bi-temporal images are scanned pixel by pixel, a change magnitude image (CMI) is generated. Then, Otsu's method or a manual threshold is employed to acquire the binary change detection map (BCDM). The detection accuracy of the proposed approach is investigated on two land cover change cases with bi-temporal Landsat images. In comparison with several widely used change detection methods, the proposed approach achieves a land cover change inventory map with competitive accuracy.
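The CMI-plus-threshold pipeline described above can be sketched in a simplified, per-pixel form (omitting the paper's adaptive regions) as follows; the tiny synthetic images are illustrative assumptions:

```python
import numpy as np

def change_magnitude(img1, img2):
    """Per-pixel Euclidean distance between two (H, W, bands) images."""
    return np.sqrt(((img1.astype(float) - img2.astype(float)) ** 2).sum(axis=-1))

def otsu_threshold(values, bins=256):
    """Otsu's threshold over a flat array of change magnitudes."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # class-0 probability at each candidate split
    m = np.cumsum(p * centers)   # cumulative mean
    mt = m[-1]                   # global mean
    # Between-class variance for every candidate split; guard empty classes.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mt * w0 - m) ** 2 / (w0 * (1 - w0))
    sigma_b = np.nan_to_num(sigma_b)
    return centers[sigma_b.argmax()]

t1 = np.zeros((4, 4, 3))
t2 = t1.copy()
t2[:2, :2] = 1.0                               # simulate change in one corner
cmi = change_magnitude(t1, t2)                 # change magnitude image
bcdm = cmi > otsu_threshold(cmi.ravel())       # binary change detection map
```

In the paper's full method, `change_magnitude` would compare adaptively extended regions rather than single pixels, but the thresholding stage is the same.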
ARTICLE | doi:10.20944/preprints201705.0214.v1
Subject: Earth Sciences, Geoinformatics Keywords: multi-spectral analysis; remote sensing images; sparse coding; generalized aggregation; scene recognition
Online: 30 May 2017 (08:54:08 CEST)
Satellite scene classification is challenging because of the high variability inherent in satellite data. Although rapid progress in remote sensing techniques has been witnessed in recent years, the resolution of the available satellite images remains limited compared with the general images acquired using a common camera. On the other hand, a satellite image usually has a greater number of spectral bands than a general image, thereby permitting the multi-spectral analysis of different land materials and promoting low-resolution satellite scene recognition. This study advocates multi-spectral analysis and explores the middle-level statistics of spectral information for satellite scene representation instead of using spatial analysis. This approach is widely utilized in general image and natural scene classification and achieved promising recognition performance for different applications. The proposed multi-spectral analysis firstly learns the multi-spectral prototypes (codebook) for representing any pixel-wise spectral data, and then based on the learned codebook, a sparse coded spectral vector can be obtained with machine learning techniques. Furthermore, in order to combine the set of coded spectral vectors in a satellite scene image, we propose a hybrid aggregation (pooling) approach, instead of conventional averaging and max pooling, which includes the benefits of the two existing methods but avoids extremely noisy coded values. Experiments on three satellite datasets validated that the performance of our proposed approach is much more accurate than even the deep learning framework for spatial analysis.
ARTICLE | doi:10.20944/preprints202209.0025.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: object detection; semi-supervised learning; Mask R-CNN; floor-plan images; computer vision
Online: 1 September 2022 (15:16:43 CEST)
Research on object detection using semi-supervised methods has been growing in the past few years. We examine the intersection of these two areas for floor-plan objects, with the objective of detecting objects more accurately with less labelled data. Floor-plan objects include different furniture items with multiple types of the same class, and this high inter-class similarity impacts the performance of prior methods. In this paper, we present a Mask R-CNN based semi-supervised approach that provides pixel-to-pixel alignment to generate individual annotation masks for each class to mine the inter-class similarity. The semi-supervised approach has a student-teacher network that pulls information from the teacher network and feeds it to the student network. The teacher network uses unlabeled data to form pseudo-boxes, and the student network uses both the unlabeled data with pseudo-boxes and the labelled data as ground truth for training, learning representations of furniture items from labelled and unlabeled data combined. On a Mask R-CNN detector with a ResNet-101 backbone, the proposed approach achieves mAP of 98.8%, 99.7%, and 99.8% with only 1%, 5% and 10% labelled data, respectively. Our experiments affirm the efficiency of the proposed approach, as it outperforms its fully supervised counterpart using only 10% of the labels.
ARTICLE | doi:10.3390/sci2010010
Subject: Keywords: black and white aerial photographs; multispectral satellite images; data fusion; correlation tables; classification
Online: 13 March 2020 (00:00:00 CET)
To date, countless satellite image fusions have been made, mainly with a panchromatic-to-multispectral spatial resolution ratio of 1/4; fewer fusions with lower ratios, and, relatively recently, fusions with much higher spatial resolution ratios have been published. Apart from this, there is a small number of publications studying the fusion of aerial photographs with satellite images, with the year of image acquisition varying and the acquisition dates not mentioned. In addition, in these publications either no quantitative controls are performed on the composite images produced, or the aerial photographs are recent and in color and only the RGB bands of the satellite images are used for data fusion. The objective of this paper is to study the addition of multispectral information from satellite images to black-and-white aerial photographs of the 1980s (1980–1990) with a small difference (just a few days) in acquisition date, within the same year and season. Quantitative tests are performed in two case studies and the results are encouraging: the accuracy of classifying the features and objects of the Earth's surface is improved, and the automatic digital extraction of their form and shape from the archived aerial photographs is now possible. This opens up a new field of use for black-and-white aerial photographs and archived multispectral satellite images of the same period in a variety of applications, such as the temporal changes of cities, forests and archaeological sites.
ARTICLE | doi:10.20944/preprints201908.0104.v1
Subject: Earth Sciences, Environmental Sciences Keywords: GEOBIA; canga ecosystem; Carajás National Forest; mine land revegetation; satellite images; environmental assessment
Online: 8 August 2019 (12:00:50 CEST)
Remote sensing technologies may play a fundamental role in the environmental assessment of open-cast mining and the accurate quantification of mine land rehabilitation efforts. Here, we developed a systematic geographic object-based image analysis (GEOBIA) approach to map the amount of revegetated area and to quantify the land-use changes in open-cast mines in the Carajás region situated in the eastern Amazon. Based on high-resolution satellite images from 2011 to 2015 from different sensors (GeoEye, WorldView-3 and Ikonos), we quantified forests, cangas (natural metalliferous savanna ecosystems), mine land, revegetated areas and water bodies. Based on the GEOBIA approach, threshold values were established to discriminate land cover classes using spectral bands, and the NDVI and NDWI indices and LiDAR digital ground and slope models. The overall accuracy was higher than 90%, and the Kappa indices varied between 0.82 and 0.88. During the observation period, the mining complex expanded; for that, canga and forest vegetation was converted to mine land. At the same time, the amount of revegetated area increased. Thus, we conclude that our approach is capable of providing consistent information regarding land cover changes in mines, with a special focus on the amount of revegetation necessary to fulfill environmental liabilities.
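A rough illustration of the threshold-based discrimination step described above, using NDVI and NDWI rules: the class names and threshold values here are hypothetical placeholders, not the values established in the study:

```python
import numpy as np

def classify(nir, red, green, ndvi_t=0.3, ndwi_t=0.2):
    """Toy threshold rules: water where NDWI is high, vegetation where NDVI is high,
    everything else labelled mine_land. Thresholds are illustrative only."""
    eps = 1e-9
    ndvi = (nir - red) / (nir + red + eps)
    ndwi = (green - nir) / (green + nir + eps)
    out = np.full(nir.shape, "mine_land", dtype=object)
    out[ndvi > ndvi_t] = "vegetation"
    out[ndwi > ndwi_t] = "water"   # water rule applied last, so it wins ties
    return out

# Three sample pixels: vegetated, water, bare mine land.
labels = classify(np.array([0.8, 0.1, 0.3]),   # NIR
                  np.array([0.1, 0.1, 0.3]),   # red
                  np.array([0.2, 0.5, 0.2]))   # green
print(labels)
```

A full GEOBIA workflow would apply such rules to image objects (segments) rather than raw pixels, and would add the LiDAR ground and slope models mentioned in the abstract.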
REVIEW | doi:10.20944/preprints201806.0301.v1
Subject: Medicine & Pharmacology, Ophthalmology Keywords: Glaucoma; Intraocular Pressure(IOP); fundus images; early detection; Cup-to-Disk Ratio(CDR)
Online: 19 June 2018 (13:54:42 CEST)
Glaucoma is a disease of the retina of the eye. Presently, millions of people are suffering from it, and early detection can save them from blindness. Therefore, various methods have been developed for its detection. In this paper, we survey the reported methods and summarize their performance in terms of detection accuracy.
ARTICLE | doi:10.20944/preprints201710.0166.v2
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: satellite images; image analysis; self organizing maps; quantization error; structural change; demographic data
Online: 20 March 2018 (10:38:43 CET)
The quantization error (QE) from Self-Organizing Map (SOM) output after learning is exploited in this study. SOM learning is applied to time series of spatial-contrast images with a variable relative amount of white and dark pixel content, as in monochromatic medical images or satellite images. It is shown that the QE from the SOM output after learning provides a reliable indicator of potentially critical changes in images across time: the QE increases linearly with the variability in the spatial-contrast content of images across time when contrast intensity is kept constant. The hitherto unsuspected capacity of this metric to capture even the smallest changes in large bodies of image time series after ultra-fast SOM learning is illustrated with examples from SOM learning studies on computer-generated images, MRI image time series, and satellite image time series. Linear trend analysis of the changes in QE as a function of the time each image in a series was taken demonstrates the statistical reliability of this metric as an indicator of local change. The QE is shown to be correlated with significant clinical, demographic, and environmental data from the same reference time period during which the test image series were recorded. The findings show that the QE from SOM, which is easily implemented and requires computation times of no more than a few minutes for a series of 20 to 25 images, is useful for fast analysis of whole series of image data when the goal is an instant statistical decision on change/no change between images.
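The QE metric itself is simple to state: the mean distance from each input vector to its best-matching unit in the trained SOM codebook. A minimal sketch, with a hypothetical two-unit codebook standing in for a trained map:

```python
import numpy as np

def quantization_error(samples, weights):
    """Mean distance from each input vector to its best-matching SOM unit.

    samples: (n, d) input vectors; weights: (units, d) trained SOM codebook.
    """
    d = np.linalg.norm(samples[:, None, :] - weights[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Hypothetical 2-unit "SOM" codebook and two small input sets:
weights = np.array([[0.0, 0.0], [1.0, 1.0]])
close = np.array([[0.1, 0.0], [0.9, 1.0]])   # well-represented data -> low QE
far = np.array([[0.5, 0.5], [0.5, 0.4]])     # off-codebook data -> higher QE
print(quantization_error(close, weights), quantization_error(far, weights))
```

A rising QE across an image time series, computed against a map trained on the reference period, is the change signal the study relies on.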
ARTICLE | doi:10.20944/preprints201608.0192.v1
Subject: Biology, Physiology Keywords: trichromacy; opponency; color circularity; spectral images; unique colors; four-color map problem; perception
Online: 23 August 2016 (10:34:13 CEST)
The reasons for the circular sense of human color perception generated by two sorts of color opponent neurons and three cone types are not well understood. Here we use geometrical analysis to examine the hypothesis that opponency, the recursive nature of color perception, and trichromacy arise as the most efficient ways of distinguishing spectrally different points on a plane using a minimum of color classes and receptor types.
ARTICLE | doi:10.20944/preprints202210.0131.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Adversarial examples; Remote sensing images; Universal adversarial patch; Object detection; Joint optimization; Scale factor.
Online: 11 October 2022 (02:34:23 CEST)
Although deep learning has received extensive attention and achieved excellent performance in a variety of scenarios, it suffers from adversarial examples to some extent, and physical attacks pose a greater threat than digital ones. However, existing research pays little attention to physical attacks on object detection in remote sensing images (RSIs). In this work, we systematically analyze the universal adversarial patch attack for multi-scale objects in the remote sensing field. Adversarial attacks in RSIs face two challenges. On the one hand, remote sensing images contain more objects than natural images, so it is difficult for an adversarial patch to affect all objects when attacking an RSI detector. On the other hand, the wide range of platform altitudes means that object sizes vary greatly, which makes it challenging to generate a universal adversarial perturbation for multi-scale objects. To this end, we propose an adversarial attack method on object detection for remote sensing data. A key idea of the proposed method is a novel optimization of the adversarial patch: we aim to attack as many objects as possible by formulating a joint optimization problem. In addition, we introduce a scale factor to generate a universal adversarial patch that adapts to multi-scale objects, ensuring that the patch remains valid for objects of different sizes in the real world. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods on YOLO-v3 and YOLO-v5, and we also validate its effectiveness in real-world applications.
ARTICLE | doi:10.20944/preprints202108.0469.v2
Subject: Mathematics & Computer Science, Other Keywords: U-shape network; fully convolutional networks; deep learning; macula fovea; ultra-widefield Fundus images
Online: 7 September 2021 (11:51:17 CEST)
Macula fovea detection is a crucial prerequisite for screening and diagnosing macular diseases; without early detection and proper treatment, any abnormality involving the macula may lead to blindness. However, given the shortage of ophthalmologists and the time-consuming nature of manual evaluation, neither the accuracy nor the efficiency of the diagnostic process can be guaranteed. In this project, we propose a deep learning approach to macula fovea detection on ultra-widefield fundus (UWF) images. This study collected 2300 UWF images from Shenzhen Aier Eye Hospital in China. Methods based on the U-shape network (Unet) and Fully Convolutional Networks (FCN) were implemented on 1800 training images, 400 validation images (both counts before augmentation), and 100 test images. Three professional ophthalmologists were invited to mark the fovea. A method from the anatomical perspective is also investigated, derived from the spatial relationship between the macula fovea and the optic disc center in UWF images; its parameters were set based on the experience of ophthalmologists and verified to be effective. Results are measured by the Euclidean distance between the proposed approaches and the reference standard detected by ultra-widefield swept-source optical coherence tomography (UWF-OCT). Comparing the proposed methods, we conclude that the Unet deep learning approach outperforms the other methods on macula fovea detection, with outcomes comparable to the reference standard.
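The evaluation metric the abstract describes, the Euclidean distance between a predicted fovea location and the UWF-OCT-derived reference point, can be sketched directly (pixel coordinates and the function name are assumptions of this sketch):

```python
import math

def fovea_error(predicted, reference):
    """Euclidean distance (in pixels) between a predicted fovea
    location (x, y) and the reference point from UWF-OCT."""
    (px, py), (rx, ry) = predicted, reference
    return math.hypot(px - rx, py - ry)

def mean_fovea_error(predictions, references):
    """Mean localization error over a test set of paired points."""
    errors = [fovea_error(p, r) for p, r in zip(predictions, references)]
    return sum(errors) / len(errors)
```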
ARTICLE | doi:10.20944/preprints202009.0524.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: COVID-19; chest X-ray images; deep convolutional neural network; COV-MCNet; deep learning
Online: 23 September 2020 (03:31:30 CEST)
The COVID-19 pandemic has made quick identification and screening of COVID-19 patients even more difficult for medical specialists. A significant study is therefore necessary for detecting COVID-19 cases with an automated diagnosis method, which can aid in controlling the spread of the virus. This paper proposes a Deep Convolutional Neural Network-based multi-classification approach (COV-MCNet) using eight pre-trained architectures (VGG16, VGG19, ResNet50V2, DenseNet201, InceptionV3, MobileNet, InceptionResNetV2, and Xception) trained and tested on X-ray images of COVID-19, Normal, Viral Pneumonia, and Bacterial Pneumonia cases. In the 3-class setting (Normal vs. COVID-19 vs. Viral Pneumonia), the ResNet50V2 model provided the highest classification performance (accuracy: 95.83%, precision: 96.12%, recall: 96.11%, F1-score: 96.11%, specificity: 97.84%) compared to the rest of the models. In the 4-class setting (Normal vs. COVID-19 vs. Viral Pneumonia vs. Bacterial Pneumonia), the pre-trained DenseNet201 provided the highest classification performance (accuracy: 92.54%, precision: 93.05%, recall: 92.81%, F1-score: 92.83%, specificity: 97.47%). Notably, the ResNet50V2 (3-class) and DenseNet201 (4-class) models in the proposed COV-MCNet framework showed higher accuracy than the remaining six models, indicating that the designed system can produce promising results as more data become available. The proposed multi-classification network (COV-MCNet) significantly speeds up the existing radiology-based method, which will be helpful to the medical community and clinical specialists for early diagnosis of COVID-19 cases during the pandemic.
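The accuracy/precision/recall/F1 figures quoted above are standard multi-class metrics derived from a confusion matrix. The sketch below shows one common way to compute them (macro-averaged); the paper does not state its exact averaging convention, so this is an assumption:

```python
def multiclass_metrics(cm):
    """Overall accuracy plus macro-averaged precision, recall, and F1
    from a confusion matrix cm[true_class][predicted_class]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls = [], []
    for k in range(n):
        tp = cm[k][k]
        pred_k = sum(cm[i][k] for i in range(n))   # column sum: predicted as k
        true_k = sum(cm[k])                        # row sum: actually class k
        precisions.append(tp / pred_k if pred_k else 0.0)
        recalls.append(tp / true_k if true_k else 0.0)
    p = sum(precisions) / n
    r = sum(recalls) / n
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"accuracy": correct / total, "precision": p, "recall": r, "f1": f1}
```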
ARTICLE | doi:10.20944/preprints201712.0157.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Coregistration; pansharpening; multi-sensor fusion; multitemporal images; deep learning; normalized difference vegetation index (NDVI)
Online: 21 December 2017 (17:10:23 CET)
Sensitivity to weather conditions, and especially to clouds, is a severe limiting factor in the use of optical remote sensing for Earth monitoring applications. A possible alternative is to resort to weather-insensitive synthetic aperture radar (SAR) images. However, in many real-world applications, critical decisions are made on the basis of informative spectral features, such as water, vegetation, or soil indices, which cannot be extracted from SAR images. In the absence of optical sources, these data must be estimated; the current practice is linear interpolation between data available at temporally close instants. In this work, we propose to estimate missing spectral features through data fusion and deep learning. Several sources of information are taken into account (optical sequences, SAR sequences, and a DEM) so as to exploit both temporal and cross-sensor dependencies. Based on these data and a tiny cloud-free fraction of the target image, a compact convolutional neural network (CNN) is trained to perform the desired estimation. To validate the proposed approach, we focus on estimating the normalized difference vegetation index (NDVI), using coupled Sentinel-1 and Sentinel-2 time series acquired over an agricultural region of Burkina Faso from May to November 2016. Several fusion schemes are considered, causal and non-causal, single-sensor or joint-sensor, corresponding to different operating conditions. Experimental results are very promising, showing a significant gain over baseline methods according to all performance indicators.
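The NDVI target that the CNN estimates is a fixed per-pixel formula. A minimal sketch follows; the Sentinel-2 band mapping (NIR from B8, red from B4) is a conventional choice and an assumption of this sketch, not stated in the abstract:

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), in [-1, 1].
    Assumes nir + red > 0 (i.e. valid reflectances)."""
    return (nir - red) / (nir + red)

def ndvi_map(nir_band, red_band):
    """Pixel-wise NDVI over two equally sized band rasters
    (given here as lists of rows of reflectance values)."""
    return [[ndvi(n, r) for n, r in zip(nrow, rrow)]
            for nrow, rrow in zip(nir_band, red_band)]
```

Healthy vegetation yields NDVI well above zero (high NIR, low red), while bare soil and water sit near or below zero, which is why the index is a useful decision feature.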
CONCEPT PAPER | doi:10.20944/preprints202006.0341.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Hyper spectral Document Images; Non-destructive Analysis; Forensics Document; Ink Mismatch Detection; K-means Clustering
Online: 28 June 2020 (19:26:25 CEST)
Hyperspectral imaging (HSI) is a technique used to obtain a spectrum for each pixel in an image. It helps in finding objects and identifying materials, identifications that are very difficult with other imaging techniques, and it allows researchers to investigate documents without any physical contact. Recently, HSI-based detection of ink mismatch in unequal proportions has shown great improvement in distinguishing inks. Unequal ink-mismatch detection is an unbalanced clustering problem. This paper applies k-means clustering, which finds subgroups in the data based on Euclidean distance, to ink-mismatch detection and demonstrates its performance on hyperspectral document images.
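The clustering step the paper describes, k-means over per-pixel spectra with Euclidean distance, can be sketched in pure Python. The naive first-k initialization and the toy two-band spectra below are assumptions for illustration (the paper's pipeline and data differ):

```python
def sqdist(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(spectra, k=2, iters=20):
    """Plain k-means on per-pixel spectra (lists of band values).
    Naive init: the first k points; a real system would use k-means++."""
    centroids = [list(p) for p in spectra[:k]]
    labels = [0] * len(spectra)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(k), key=lambda c: sqdist(x, centroids[c]))
                  for x in spectra]
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [x for x, lab in zip(spectra, labels) if lab == c]
            if members:
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids, labels
```

For ink-mismatch detection, each cluster would then correspond to one candidate ink; the "unbalanced" difficulty the paper mentions arises because one ink usually covers far more pixels than the other.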
ARTICLE | doi:10.20944/preprints201612.0075.v1
Subject: Earth Sciences, Geoinformatics Keywords: image recognition bases location; indoor positioning; RGB-D images; LiDAR; DataBase; mobile computing; image retrieval
Online: 15 December 2016 (07:17:35 CET)
This paper describes the first results of an Image Recognition Based Location (IRBL) system for mobile applications, focusing on the procedure to generate a database of range images (RGB-D). In an indoor environment, estimating the camera position and orientation requires prior spatial knowledge of the surroundings. To this end, a complete 3D survey of two different environments (the Bangbae metro station in Seoul and the E.T.R.I. building in Daejeon, Republic of Korea) was performed with a LiDAR (Light Detection And Ranging) instrument, and the obtained scans were processed into a spatial model of each environment. From these models, two databases of reference images were generated using dedicated software developed by the Geomatics group of Politecnico di Torino (ScanToRGBDImage). This tool synthetically generates different RGB-D images centered on each scan position in the environment. The external parameters (X, Y, Z, ω, φ, κ) and the range information extracted from the retrieved database images are then used as reference information for pose estimation of a set of pictures acquired by mobile devices in the IRBL procedure. The paper reports the survey operations, the approach for generating the RGB-D images, and the IRBL strategy, and concludes with the analysis of the results and the validation test.
ARTICLE | doi:10.20944/preprints202208.0329.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Synthetic Aperture Radar; Doppler frequencies; multi-chromatic analysis; micro-motion; Pyramid of Khnum-Khufu; sonic images
Online: 18 August 2022 (03:45:58 CEST)
A limitation of synthetic aperture radar (SAR) is that, due to the poor penetration of electromagnetic waves into solid bodies, observation inside distributed targets is precluded; under these conditions, imaging is provided only at the surface of distributed targets. The present work describes an imaging method based on the analysis of micro-movements of the Khnum-Khufu Pyramid, which are typically generated by background seismic waves. The results prove very promising, as high-resolution, full-3D tomographic imaging of the pyramid's interior and subsurface was achieved: Khnum-Khufu becomes transparent like a crystal when observed in the micro-movement domain. On this basis, we have completely reconstructed internal objects, observing and measuring structures never discovered before. The experimental results are obtained by processing series of SAR images from the second-generation Italian COSMO-SkyMed satellite system, demonstrating the effectiveness of the proposed method.
ARTICLE | doi:10.20944/preprints201710.0181.v1
Subject: Mathematics & Computer Science, Analysis Keywords: ultrasound image analysis; speckle noise; synthetic ultrasound images; texture features; local binary patterns; image quality assessment
Online: 30 October 2017 (09:37:59 CET)
Speckle noise reduction is an important area of research in the field of ultrasound image processing. Several algorithms for speckle noise characterization and analysis have been recently proposed in the area. Synthetic ultrasound images can play a key role in noise evaluation methods as they can be used to generate a variety of speckle noise models under different interpolation and sampling schemes, and can also provide valuable ground truth data for estimating the accuracy of the chosen methods. However, not much work has been done in the area of modelling synthetic ultrasound images, and in simulating speckle noise generation to get images that are as close as possible to real ultrasound images. An important aspect of simulated synthetic ultrasound images is the requirement for extensive quality assessment for ensuring that they have the texture characteristics and gray-tone features of real images. This paper presents texture feature analysis of synthetic ultrasound images using local binary patterns (LBP) and demonstrates the usefulness of a set of LBP features for image quality assessment. Experimental results presented in the paper clearly show how these features could provide an accurate quality metric that correlates very well with subjective evaluations performed by clinical experts.
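The LBP texture features used in the paper have a compact standard definition: each pixel is encoded by thresholding its 8 neighbours against the centre value, and the histogram of codes describes the texture. The sketch below uses the basic 3x3, 256-bin variant; the exact LBP variant and radius used in the paper are not stated, so this is an illustrative assumption:

```python
def lbp_code(img, y, x):
    """Basic 8-neighbour local binary pattern code for pixel (y, x):
    each neighbour >= centre contributes one bit, clockwise from top-left."""
    centre = img[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels --
    a simple texture descriptor for comparing speckle patterns."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

Comparing the LBP histograms of a synthetic image and a real ultrasound image (e.g. by histogram distance) then gives a quantitative texture-similarity score of the kind the quality assessment requires.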
ARTICLE | doi:10.20944/preprints202206.0238.v1
Subject: Earth Sciences, Atmospheric Science Keywords: neural networks; satellite images; class imbalance; feature attribution; lightning prediction; nowcasting; short-term forecasts; machine learning; meteorology
Online: 16 June 2022 (10:48:59 CEST)
While thunderstorms can pose severe risks to property and life, forecasting remains challenging, even at short lead times, as storms often arise in meta-stable atmospheric conditions. In this paper, we examine how well short-term forecasts (up to 180 min) can be produced using exclusively multi-spectral satellite images and past lightning events as data. We employ representation learning based on deep convolutional neural networks in an "end-to-end" fashion. A crucial problem here is handling the imbalance between the positive and negative classes appropriately in order to obtain predictive results (an issue not addressed by many previous machine-learning-based approaches). The resulting network outperforms previous methods based on physically-based features and optical flow (similar to operational prediction models) and generalizes across different years. A closer examination of classifier performance over time and under masking of input data indicates that the learned model draws most of its information from structures in the visible spectrum, with infrared imaging sustaining some classification performance during the night.
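One standard way to handle the class imbalance the abstract highlights is to re-weight the loss inversely to class frequency, so that rare positives (lightning) count as much as abundant negatives. The sketch below computes such weights; this is a common technique, not necessarily the exact scheme the paper uses:

```python
def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    normalised so the average weight over all samples is 1. Rare
    classes (e.g. lightning pixels) get proportionally larger weights."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    k = len(counts)
    return {lab: n / (k * c) for lab, c in counts.items()}
```

These weights would typically be passed to a weighted cross-entropy loss during training, making a classifier that ignores the minority class no longer optimal.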
ARTICLE | doi:10.20944/preprints202110.0089.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Object Detection; Cascade Mask R-CNN; Floor Plan Images; Deep Learning; Transfer Learning; Dataset Augmentation; Computer Vision
Online: 5 October 2021 (15:09:26 CEST)
Object detection, which comprises identifying and localizing objects in an image, is one of the most critical tasks in computer vision. Architectural floor plans represent the layout of buildings and apartments and consist of walls, windows, stairs, and furniture objects. While recognizing floor plan objects is straightforward for humans, automatically processing floor plans and recognizing their objects is a challenging problem. In this work, we investigate the performance of the recently introduced Cascade Mask R-CNN network for object detection in floor plan images, and we experimentally establish that deformable convolutions work better than conventional convolutions in the proposed framework. Identifying objects in floor plan images is also challenging due to the variety of floor plans and objects, and training is hampered by the lack of publicly available datasets: currently available public datasets do not contain enough images to train deep neural networks efficiently. To address this issue, we introduce SFPI, a novel synthetic floor plan dataset consisting of 10,000 images. Our proposed method conveniently surpasses the previous state-of-the-art results on the SESYD dataset and sets impressive baseline results on the proposed SFPI dataset. The dataset can be downloaded from the SFPI Dataset Link. We believe the novel dataset will enable researchers to advance research in this domain.
ARTICLE | doi:10.20944/preprints202012.0222.v1
Subject: Medicine & Pharmacology, Allergology Keywords: event-related potentials; visual evoked potentials; component P300; brain-computer interface; speller; oddball paradigm; categorization of images.
Online: 9 December 2020 (11:56:41 CET)
This study examined the sensory processes of the "human-computer interaction" model when classifying visual images with an incomplete set of features, based on the analysis of early, middle, late, and slow components of event-related potentials (ERPs). Twenty-six healthy male subjects aged 20-22 years were investigated. ERPs were recorded at 19 monopolar sites according to the 10/20 system, and discriminant and factor analyses were applied. The N450 component is the most specialized indicator of the perception of unrecognizable (oddball) visual images. The amplitude of the ultra-late components N750 and N900 is also higher when the oddball image is presented, regardless of the location of the recording sites. The results are discussed in light of the application of the P300 wave in brain-computer interface (BCI) systems, as well as its peculiarities in brain pathology. A promising direction for the development of P300-based BCI systems is to increase the throughput of information flows. To extend the application of P300 ERPs to multiple modalities, the underlying physiological mechanisms and brain responses for a particular sensory system and mental function must be carefully examined.
ARTICLE | doi:10.20944/preprints202206.0120.v1
Subject: Earth Sciences, Environmental Sciences Keywords: Miscanthus; remote sensing; UAV; multispectral images; high-throughput phenotyping; machine learning; yield prediction; trait estimation; PROSAIL; multi-sensor interoperability
Online: 8 June 2022 (09:44:59 CEST)
Miscanthus holds great potential within the bioeconomy, and yield prediction can help improve the Miscanthus logistics supply chain. Breeding programs in several countries are attempting to produce high-yielding Miscanthus hybrids better adapted to different climates and end-uses. Multispectral images acquired from unmanned aerial vehicles (UAVs) in Italy and the UK in 2021 and 2022 were used to investigate the feasibility of high-throughput phenotyping (HTP) of novel Miscanthus hybrids for yield prediction and crop trait estimation. An intercalibration procedure based on data simulated with the PROSAIL model was performed to link vegetation indices (VIs) derived from two different multispectral sensors. A random forest algorithm estimated yield traits (light interception, plant height, green leaf biomass, and standing biomass) with good accuracy from VI time series, and predicted yield with an RMSE of 2.3 Mg DM ha-1 using peak descriptors derived from the VI time series. The study demonstrates the potential of UAV multispectral imaging for HTP applications and for yield prediction, providing important information needed to increase sustainable biomass production.
ARTICLE | doi:10.20944/preprints201901.0088.v1
Subject: Engineering, Other Keywords: signal-to-noise ratio; nighttime light imaging; time sequence images; Luojia 1-01; radiative transfer model; radiometric calibration; in-orbit test
Online: 9 January 2019 (15:43:53 CET)
Signal-to-noise ratio (SNR) is an important index for evaluating the radiation performance and image quality of optical imaging systems under low-illumination backgrounds. Under nighttime lighting conditions, the illumination of remotely sensed objects is low and varies greatly, usually ranging from several lux to tens of thousands of lux, so nighttime light remote sensing requires detectors with high sensitivity and a large dynamic range. Luojia 1-01 is the first dedicated nighttime light remote sensing satellite in the world. In this paper, we take the nighttime light remote sensing camera carried on the satellite as the research object and propose an in-orbit SNR test method based on time-sequence images to overcome the problem of low spatial resolution. We first analyze the luminous flux transmission between objects and the satellite and establish a radiative transfer model. Combining the parameters of the large-relative-aperture optical system and the high-sensitivity CMOS device, we establish an SNR model and specifically analyze the effect of exposure time and quantization bits on SNR. Finally, we use the proposed in-orbit test method to calculate the SNR of lighting images acquired by the satellite. The measured result agrees well with the model predictions: under an illumination of 10 lx, the SNR of typical objects reaches 27.02 dB, much better than the 20 dB required for engineering applications.
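A time-sequence SNR estimate of the kind the abstract describes can be sketched per pixel: the temporal mean of the digital numbers serves as the signal and the temporal standard deviation as the noise, with the ratio reported in dB. The 20·log10 amplitude convention below is an assumption of this sketch; the paper's exact estimator is not given in the abstract:

```python
import math

def temporal_snr_db(samples):
    """SNR of one pixel from a time series of digital numbers:
    signal = temporal mean, noise = temporal standard deviation,
    reported in dB as 20 * log10(mean / std)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # population variance
    std = math.sqrt(var)
    return 20 * math.log10(mean / std)
```

On this scale, a mean-to-std ratio of 10 gives 20 dB, so the reported 27.02 dB at 10 lx corresponds to a ratio of roughly 22.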
ARTICLE | doi:10.20944/preprints202110.0433.v1
Subject: Physical Sciences, Fluids & Plasmas Keywords: X-ray Imaging; Plasma Diagnostics; Electron Cyclotron Resonance Ion Sources; High Dynamical Range Analysis; Single-Photon-Counted Images; X-ray Spatially-resolved Spectroscopy
Online: 28 October 2021 (11:39:28 CEST)
At INFN-LNS, in collaboration with the ATOMKI laboratories, an innovative multi-diagnostic system with advanced analytical methods has been designed and implemented. It is based on several detectors and techniques (optical emission spectroscopy, RF systems, interfero-polarimetry, X-ray detectors); here we focus on high-resolution, spatially resolved X-ray spectroscopy, performed by means of an X-ray pin-hole camera operating in the 0.5-20 keV energy domain. The diagnostic system was installed at a 14 GHz Electron Cyclotron Resonance (ECR) ion source (ATOMKI, Debrecen), enabling high-precision, spectrally resolved X-ray imaging of ECR plasmas heated by hundreds of watts. The achieved spatial and energy resolutions were 0.5 mm and 300 eV at 8 keV, respectively. We present the innovative analysis algorithm that we developed to obtain Single-Photon-Counted (SPhC) images providing the locally emitted plasma spectrum in a High-Dynamic-Range (HDR) mode, distinguishing the fluorescence lines of the plasma chamber materials (Ti, Ta) from those of the plasma (Ar). This method allows a quantitative characterization of the warm electron population in the plasma (and its 2D distribution), which is the most important for ionization, and also an estimate of local plasma density and spectral temperatures. The developed post-processing analysis also removes the readout noise that is often observable at very short exposure times (milliseconds). The setup is now being upgraded with fast shutters and trigger systems to allow simultaneous space- and time-resolved plasma spectroscopy during transient, stable, and turbulent regimes.
REVIEW | doi:10.20944/preprints202104.0739.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Deep neural network; survey; document images; review paper; deep learning; performance evaluation; page object detection, graphical page objects; document image analysis; page segmentation
Online: 28 April 2021 (10:17:49 CEST)
In any document, graphical elements such as tables, figures, and formulas contain essential information whose processing and interpretation require specialized algorithms; off-the-shelf OCR components cannot process this information reliably. An essential step in document analysis pipelines is therefore to detect these graphical components, which leads to a high-level conceptual understanding of the documents and makes their digitization viable. Since the advent of deep learning, the performance of deep-learning-based object detection has improved manyfold. In this work, we outline and summarize deep learning approaches for detecting graphical page objects in document images, discussing the most relevant approaches and the state of the art in graphical page object detection. This work provides a comprehensive understanding of the current state of the art and the related challenges. Furthermore, we discuss the leading datasets along with quantitative evaluations, and briefly outline promising directions for further improvement.
ARTICLE | doi:10.20944/preprints202108.0360.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Table detection, table localization, deep learning, Hybrid Task Cascade, Object detection, deformable convolution, deep neural networks, computer vision, scanned document images, document image analysis.
Online: 17 August 2021 (10:26:42 CEST)
Tables are among the most important entities in document images, since they contain crucial information; accurate table detection can therefore significantly improve information extraction from tables. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses a ResNeXt-101 backbone for feature extraction and a Hybrid Task Cascade (HTC) to localize tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network, which enables the network to detect tables of arbitrary layouts precisely. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, the proposed HybridTabNet outperforms earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how the proposed method generalizes to unseen data, we conduct an exhaustive leave-one-out evaluation. Compared with prior state-of-the-art results, our method reduces the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (LaTeX), 41.33% on TableBank (Word), 55.73% on TableBank (LaTeX + Word), 10% on Marmot, and 9.67% on the UNLV dataset. The achieved results reflect the superior performance of the proposed method.
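A plausible reading of the "reduces the relative error by X%" figures is the standard relative error reduction of a new method's error rate over a baseline's, sketched below; the abstract does not define the formula, so this is an assumption:

```python
def relative_error_reduction(baseline_err, new_err):
    """Relative error reduction (in percent) of a new method over a
    baseline: the fraction of the baseline's error that is eliminated."""
    return 100.0 * (baseline_err - new_err) / baseline_err
```

Under this reading, a reported 55.73% reduction means the new detector makes a little less than half as many errors as the prior state of the art on that dataset.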
ARTICLE | doi:10.20944/preprints202107.0385.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Visual Question Generation; Visual Question Answering; Variational Autoencoders; Radiology Images; Domain Knowledge; UMLS; Data Augmentation; Computer Vision; Natural Language Processing; Artificial Intelligence; Medical Domain.
Online: 16 July 2021 (16:18:56 CEST)
Visual Question Generation (VQG) from images is a rising research topic in both fields of natural language processing and computer vision. Although there are some recent efforts towards generating questions from images in the open domain, the VQG task in the medical domain has not been well-studied so far due to the lack of labeled data. In this paper, we introduce a goal-driven VQG approach for radiology images called VQGRaD that generates questions targeting specific image aspects such as modality and abnormality. In particular, we study generating natural language questions based on the visual content of the image and on additional information such as the image caption and the question category. VQGRaD encodes the dense vectors of different inputs into two latent spaces, which allows generating, for a specific question category, relevant questions about the images, with or without their captions. We also explore the impact of domain knowledge incorporation (e.g., medical entities and semantic types) and data augmentation techniques on visual question generation in the medical domain. Experiments performed on the VQA-RAD dataset of clinical visual questions showed that VQGRaD achieves 61.86% BLEU score and outperforms strong baselines. We also performed a blinded human evaluation of the grammaticality, fluency, and relevance of the generated questions. The human evaluation demonstrated the better quality of VQGRaD outputs and showed that incorporating medical entities improves the quality of the generated questions. Using the test data and evaluation process of the ImageCLEF 2020 VQA-Med challenge, we found that relying on the proposed data augmentation technique to generate new training samples by applying different kinds of transformations, can mitigate the lack of data, avoid overfitting, and bring a substantial improvement in medical VQG.
ARTICLE | doi:10.20944/preprints201910.0247.v1
Subject: Behavioral Sciences, Behavioral Neuroscience Keywords: visual contrast; perceived relative object depth; 2D images; sound frequency; two alternative forced-choice; response times; high-probability decision; readiness to respond; probability summation
Online: 22 October 2019 (03:34:45 CEST)
Pieron's and Chocholle’s seminal psychophysical work predicts that human response time to information relative to visual contrast and/or sound frequency decreases when contrast intensity or sound frequency increases. The goal of this study is to bring to the fore the ability of individuals to use visual contrast intensity and sound frequency in combination for faster perceptual decisions of relative depth (“nearer”) in planar (2D) object configurations on the basis of physical variations in luminance contrast. Computer controlled images with two abstract patterns of varying contrast intensity, one on the left and one on the right, preceded or not by a pure tone of varying frequency, were shown to healthy young humans in controlled experimental sequences. Their task (two-alternative forced-choice) was to decide as quickly as possible which of two patterns, the left or the right one, in a given image appeared to “stand out as if it were nearer” in terms of apparent (subjective) visual depth. The results show that the combinations of varying relative visual contrast with sounds of varying frequency exploited here produced an additive effect on choice response times in terms of facilitation, where a stronger visual contrast combined with a higher sound frequency produced shorter forced-choice response times. This new effect is predicted by cross-modal audio-visual probability summation.
ARTICLE | doi:10.20944/preprints201711.0193.v1
Subject: Keywords: computational intelligence; quantum hybrid intelligent systems; quantum machine learning; medical image processing; disease diagnosis; Fuzzy k-NN; Quantum-behaved PSO; cervical smear images; cancer detection
Online: 30 November 2017 (07:21:00 CET)
A quantum hybrid (QH) intelligent approach that blends the adaptive search capability of quantum-behaved particle swarm optimisation (QPSO) with the intuitionistic rationality of the traditional fuzzy k-nearest neighbours (Fuzzy k-NN) algorithm (known simply as the Q-Fuzzy approach) is proposed for efficient feature selection and classification of cells in cervical smear (CS) images. From an initial set of seventeen (17) features describing the geometry, colour, and texture of the CS images, the QPSO stage of the proposed technique selects the best feature subset (i.e. the global best particles), a pruned-down collection of seven (7) features. Using a dataset of almost 1000 images, performance evaluation of the proposed Q-Fuzzy approach assesses the impact of feature selection on classification accuracy through three experimental scenarios compared alongside two other approaches: an all-features approach (classification without prior feature selection) and a hybrid technique combining the standard PSO algorithm with Fuzzy k-NN (the P-Fuzzy approach). In the first and second scenarios, the assessment criteria are further divided into classification accuracy based on the choice of best features and accuracy across the different categories of cervical cells. In the third scenario, new QH hybrid techniques, i.e. QPSO combined with other supervised learning methods, are introduced and their classification accuracy is compared with that of the proposed Q-Fuzzy approach. Furthermore, statistical approaches are employed to establish qualitative agreement with regard to feature selection in scenarios 1 and 3. The synergy between QPSO and Fuzzy k-NN in the proposed Q-Fuzzy approach marginally improves classification accuracy, manifest in the reduction in the number of cell features, which is crucial for effective cervical cancer detection and diagnosis.