ARTICLE | doi:10.20944/preprints202302.0026.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Graph Neural Network; Variational Autoencoder; Pooling; Nearest Neighbours
Online: 2 February 2023 (03:36:17 CET)
We present a Deep Learning generative model specialized to work with graphs with a regular geometry. It is built on a Variational Autoencoder framework and employs graph convolutional layers in both the encoding and decoding phases. We also introduce a pooling technique (ReNN-Pool), used in the encoder, that downsamples graph nodes in a spatially uniform and highly interpretable way. In the decoder, a symmetrical un-pooling technique is used to recover the original dimensionality of the graphs. The model's performance is tested on the standard Sprite benchmark dataset, a set of 2D images of video game characters suitably transformed into graphs, and on the more realistic use case of a dataset of cylindrical graph data that describe the distribution of the energy deposited by a particle beam in a medium.
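The abstract does not spell out how images become graphs or how ReNN-Pool chooses nodes. Purely as an illustration, the sketch below assumes each pixel becomes a node linked to its 4 grid neighbours, and uses a hypothetical stride-2 pooling (keep every second node along each axis and average it with its nearest neighbours) together with a matching copy-back un-pooling. The names `grid_graph`, `pool`, and `unpool` are illustrative and not taken from the paper.

```python
# Illustrative sketch only: pixel grid -> graph, plus a hypothetical uniform
# stride-2 pooling and un-pooling. This is NOT the authors' ReNN-Pool.

def grid_graph(height, width):
    """Adjacency list of a height x width grid; each node links to its 4 neighbours."""
    adj = {}
    for i in range(height):
        for j in range(width):
            adj[(i, j)] = [
                (i + di, j + dj)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < height and 0 <= j + dj < width
            ]
    return adj


def pool(features, adj, stride=2):
    """Keep nodes on a regular stride lattice (spatially uniform downsampling);
    each kept node stores the mean of itself and its nearest neighbours."""
    pooled = {}
    for (i, j), nbrs in adj.items():
        if i % stride == 0 and j % stride == 0:
            group = [(i, j)] + nbrs
            pooled[(i // stride, j // stride)] = sum(features[n] for n in group) / len(group)
    return pooled


def unpool(pooled, height, width, stride=2):
    """Recover the original node set by copying each pooled value to its stride x stride block."""
    return {(i, j): pooled[(i // stride, j // stride)]
            for i in range(height) for j in range(width)}


# Usage on a 4x4 "image" with one scalar feature per pixel.
feats = {(i, j): float(4 * i + j) for i in range(4) for j in range(4)}
adj = grid_graph(4, 4)
pooled = pool(feats, adj)          # 4 nodes remain
restored = unpool(pooled, 4, 4)    # back to 16 nodes
```

In a trained model the plain mean would be replaced by graph-convolution outputs; the point here is only that keeping nodes on a regular lattice gives a uniform, easily interpreted downsampling.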
ARTICLE | doi:10.20944/preprints202108.0282.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Classification of insulators; Electrical power system; k-Nearest neighbors; Computer vision.
Online: 13 August 2021 (11:45:50 CEST)
Contamination on insulators may increase their surface conductivity and, as a consequence, electrical discharges occur more frequently, which can lead to interruptions in the power supply. To maintain reliability in the electrical power distribution system, components that have lost their insulating properties must be replaced. Identifying the components that need maintenance is a difficult task, as there are several levels of contamination that are hardly noticeable during inspections. To improve the quality of inspections, this paper proposes using k-nearest neighbours (k-NN) to classify the level of insulator contamination, based on images of insulators at various levels of contamination simulated in the laboratory. Computer vision features such as mean, variance, asymmetry, kurtosis, energy, and entropy are used to train the k-NN. To assess the robustness of the proposed approach, a statistical analysis and a comparative assessment with well-established algorithms such as decision tree, ensemble subspace, and support vector machine models are presented. The k-NN achieved up to 85.17% accuracy using k-fold cross-validation, with an average accuracy above 82% for multi-class classification of insulator contamination, outperforming the compared models.
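A minimal sketch of the feature-plus-classifier pipeline described above, assuming the six features are first-order statistics of the grey levels (with "asymmetry" read as skewness, and energy and entropy taken from the normalized histogram) and that k-NN uses Euclidean distance with a majority vote. The helper names and toy pixel values are hypothetical, not the paper's data.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """Mean, variance, skewness, kurtosis, energy and entropy of grey levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    if std > 0:
        skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurt = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    else:
        skew, kurt = 0.0, 0.0
    probs = [c / n for c in Counter(pixels).values()]  # normalized histogram
    energy = sum(q * q for q in probs)
    entropy = -sum(q * math.log2(q) for q in probs)
    return [mean, var, skew, kurt, energy, entropy]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    dists = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy grey-level samples (hypothetical): near-uniform vs. heavily mottled surfaces.
clean = [[200, 201, 199, 200, 202, 198], [198, 200, 200, 201, 199, 200]]
heavy = [[120, 180, 90, 200, 60, 150], [100, 170, 80, 210, 70, 140]]
X = [first_order_features(p) for p in clean + heavy]
y = ["clean", "clean", "heavy", "heavy"]
label = knn_predict(X, y, first_order_features([199, 200, 201, 200, 198, 202]), k=1)
```

Because these features live on very different scales (variance dominates energy), standardizing each feature before computing distances is usually advisable; it is omitted here for brevity.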
ARTICLE | doi:10.20944/preprints202307.1658.v1
Subject: Engineering, Bioengineering Keywords: Deep learning; Convolutional Neural Networks (CNN); K-Nearest Neighbors (KNN), Diabetes type II
Online: 25 July 2023 (09:37:09 CEST)
Abstract: The surge of diabetes poses a significant global health challenge, particularly in Oman and the Middle East. Early detection of diabetes is crucial for proactive intervention and improved patient outcomes. This research leverages the power of machine learning, specifically, Convolutional Neural Networks (CNNs), to develop an innovative 4D CNN model dedicated to early diabetes prediction. A region-specific dataset from Oman is utilized to enhance health outcomes for individuals at risk of developing diabetes. The proposed model showcases remarkable accuracy, achieving an average accuracy of 98.49% to 99.17% across various epochs. Additionally, it demonstrates excellent F1 score, recall, and sensitivity, highlighting its ability to identify true positive cases. The findings contribute to the ongoing effort to combat diabetes and pave the way for future research in using deep learning for early disease detection and proactive healthcare.
ARTICLE | doi:10.20944/preprints202306.0364.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Chinese address parsing; low-resource scenarios; In-context learning; GPT; BERT; k-nearest neighbors
Online: 9 June 2023 (04:28:59 CEST)
Address parsing is a crucial task in natural language processing, particularly for Chinese addresses, whose complex structure and semantic features present challenges due to their inherent ambiguity. Additionally, different task scenarios require different levels of granularity in address components, which further complicates parsing. To address these challenges and adapt to low-resource environments, we propose CapICL, a novel Chinese address parsing model based on the In-Context Learning (ICL) framework. CapICL combines a sequence generator, regular expression matching, BERT semantic similarity computation, and GPT modeling to improve parsing accuracy by incorporating contextual information. We construct the sequence generator from a small annotated dataset, capturing the distribution patterns and boundary features of address types to model address structure and semantics while mitigating interference from unnecessary variations. We introduce the REB-KNN algorithm, which selects similar samples for ICL-based parsing using regular expression matching and BERT semantic similarity computation. The selected samples, raw text, and explanatory text are combined into prompts and fed into the GPT model for prediction and address parsing. Experimental results show that CapICL performs well in low-resource environments, reducing dependence on annotated data and computational resources. These results validate the model's effectiveness, adaptability, and broad application potential, showcasing its positive impact in natural language processing and geographical information systems.
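The two-stage selection in REB-KNN can be pictured roughly as follows. This is a hedged sketch: the regular expression (a handful of common Chinese address suffixes), the use of character-bigram Jaccard similarity as a cheap stand-in for BERT embeddings, and the prompt template are all assumptions made for illustration, not CapICL's actual rules.

```python
import re

# Hypothetical structural markers: province, city, district, county, road, street, number.
MARKERS = re.compile(r"[省市区县路街号]")


def signature(address):
    """Ordered tuple of component markers found in the address (regex stage)."""
    return tuple(MARKERS.findall(address))


def bigrams(text):
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a, b):
    """Jaccard similarity of character bigrams; a stand-in for BERT similarity."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def select_examples(query, pool, k=2):
    """Keep examples whose marker signature matches the query, then return the
    k nearest neighbours by similarity. Falls back to the whole pool if none match."""
    same_shape = [ex for ex in pool if signature(ex[0]) == signature(query)]
    candidates = same_shape or pool
    return sorted(candidates, key=lambda ex: similarity(query, ex[0]), reverse=True)[:k]


def build_prompt(query, examples):
    """Few-shot prompt: selected (address, parse) pairs followed by the query."""
    shots = [f"Address: {addr}\nParsed: {parse}" for addr, parse in examples]
    return "\n\n".join(shots + [f"Address: {query}\nParsed:"])


pool = [
    ("浙江省杭州市西湖区文三路90号", "province=浙江省 | city=杭州市 | district=西湖区 | road=文三路 | no=90号"),
    ("江苏省南京市玄武区中山路8号", "province=江苏省 | city=南京市 | district=玄武区 | road=中山路 | no=8号"),
    ("北京市朝阳区建国路", "city=北京市 | district=朝阳区 | road=建国路"),
]
query = "浙江省杭州市滨江区江南路1号"
prompt = build_prompt(query, select_examples(query, pool, k=1))
```

The regex stage cheaply discards examples with a different component layout, so the (expensive, in practice) semantic comparison only runs on structurally compatible candidates.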
ARTICLE | doi:10.20944/preprints202012.0237.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Biometrics; Face Recognition; Single Sample Face Recognition; Binarized Statistical Image Features; K-Nearest Neighbors
Online: 9 December 2020 (18:25:02 CET)
Single sample face recognition (SSFR) is a computer vision challenge. In this scenario, there is only one example of each individual on which to train the system, making it difficult to identify persons in unconstrained environments, particularly when dealing with changes in facial expression, posture, lighting, and occlusion. This paper proposes a method based on a variant of the Binarized Statistical Image Features (BSIF) descriptor, called Multi-Block Color-Binarized Statistical Image Features (MB-C-BSIF), to resolve the SSFR problem. First, the MB-C-BSIF method decomposes a facial image into three channels (red, green, and blue), then divides each channel into equal non-overlapping blocks to select the local facial characteristics that are subsequently employed in the classification phase. Finally, the identity is determined by computing the similarity between feature vectors using the distance measure of a k-nearest neighbors (K-NN) classifier. Extensive experiments on several subsets of the unconstrained Alex & Robert (AR) and Labeled Faces in the Wild (LFW) databases show that MB-C-BSIF achieves superior results in unconstrained situations compared with current state-of-the-art methods, especially when dealing with changes in facial expression, lighting, and occlusion. Furthermore, the proposed method employs algorithms with lower computational cost, making it well suited to real-time applications.
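The block-wise, per-channel descriptor and nearest-neighbour matching can be sketched as below. BSIF itself filters each channel with learned filters and binarizes the responses; that step is not reproduced here, and plain intensity histograms per block serve as a hypothetical stand-in so the example stays self-contained. The block size, bin count, and L1 distance are assumptions.

```python
def block_histograms(channel, block, bins=8, max_value=256):
    """Split one channel into non-overlapping block x block tiles and return the
    concatenated, normalized intensity histogram of every tile."""
    h, w = len(channel), len(channel[0])
    feats = []
    for bi in range(0, h - block + 1, block):
        for bj in range(0, w - block + 1, block):
            hist = [0] * bins
            for i in range(bi, bi + block):
                for j in range(bj, bj + block):
                    hist[channel[i][j] * bins // max_value] += 1
            feats.extend(c / (block * block) for c in hist)
    return feats


def describe(rgb_image, block=2):
    """rgb_image: rows of (r, g, b) tuples. Concatenate the R, G and B block histograms."""
    feats = []
    for c in range(3):
        channel = [[px[c] for px in row] for row in rgb_image]
        feats.extend(block_histograms(channel, block))
    return feats


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def identify(probe, gallery):
    """gallery: identity -> descriptor of its single enrolled image. 1-NN by L1 distance."""
    return min(gallery, key=lambda name: l1(probe, gallery[name]))


def solid(color, size=4):
    return [[color] * size for _ in range(size)]


gallery = {"alice": describe(solid((200, 50, 50))), "bob": describe(solid((30, 30, 200)))}
probe_img = [list(row) for row in solid((200, 50, 50))]
probe_img[0][0] = (190, 60, 40)  # small local change, e.g. lighting
who = identify(describe(probe_img), gallery)
```

Because each block contributes its own histogram, a local change (a shadow, a partial occlusion) only perturbs the entries of the affected blocks, which is the intuition behind multi-block descriptors.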
ARTICLE | doi:10.20944/preprints202010.0616.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Spike-and-wave; Generalized Gaussian distribution; EEG; Morlet wavelet; k-nearest neighbors classifier; Epilepsy
Online: 29 October 2020 (14:05:54 CET)
Spike-and-wave discharge (SWD) pattern detection in electroencephalography (EEG) signals is a key signal processing problem. It is particularly important for overcoming the time-consuming, difficult, and error-prone manual analysis of long-term EEG recordings. This paper presents a new SWD detection method with low computational complexity that can be easily trained with data from standard medical protocols. Specifically, EEG signals are divided into time segments, to each of which a 1-D Morlet wavelet decomposition is applied. The generalized Gaussian distribution (GGD) statistical model is fitted to the resulting wavelet coefficients. A k-nearest neighbors (k-NN) self-supervised classifier is trained using the GGD parameters to detect the spike-and-wave pattern. Experiments were conducted using 106 spike-and-wave signals and 106 non-spike-and-wave signals for training and another 96 annotated EEG segments from six human subjects for testing. The proposed SWD classification methodology achieved 95% sensitivity (true positive rate), 87% specificity (true negative rate), and 92% accuracy. These results pave the way for new research into the causes underlying so-called absence epilepsy in long-term EEG recordings.
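Fitting a GGD to a segment's wavelet coefficients reduces it to two numbers, a scale α and a shape β, which then serve as k-NN features. The abstract does not say which estimator is used; the sketch below assumes simple moment matching for a zero-mean GGD with density proportional to exp(-(|x|/α)^β), solving for β by bisection on the ratio (E|x|)²/E[x²] = Γ(2/β)²/(Γ(1/β)Γ(3/β)), which increases monotonically with β. Maximum-likelihood fitting would be a reasonable alternative.

```python
import math
import random


def ggd_ratio(beta):
    """(E|x|)^2 / E[x^2] for a zero-mean GGD with shape beta; increases with beta."""
    return math.exp(2 * math.lgamma(2 / beta) - math.lgamma(1 / beta) - math.lgamma(3 / beta))


def fit_ggd(coeffs, lo=0.05, hi=20.0, iters=100):
    """Moment-matching estimate of (alpha, beta) for p(x) ~ exp(-(|x| / alpha) ** beta)."""
    m1 = sum(abs(c) for c in coeffs) / len(coeffs)
    m2 = sum(c * c for c in coeffs) / len(coeffs)
    target = m1 * m1 / m2
    for _ in range(iters):  # bisection on the monotone moment ratio
        mid = 0.5 * (lo + hi)
        if ggd_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = m1 * math.exp(math.lgamma(1 / beta) - math.lgamma(2 / beta))
    return alpha, beta


# Gaussian coefficients should give beta close to 2 and alpha close to sqrt(2) * sigma.
rng = random.Random(0)
gauss = [rng.gauss(0.0, 1.0) for _ in range(20000)]
alpha_g, beta_g = fit_ggd(gauss)
```

The resulting (α, β) pairs, one per segment (and per decomposition scale, if several are used), would then form the feature vectors passed to the k-NN classifier.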
ARTICLE | doi:10.20944/preprints202307.2084.v1
Subject: Computer Science And Mathematics, Signal Processing Keywords: cognitive radio; dynamic spectrum access; spectrum sensing; embedding parameters; false nearest neighbours; recurrence quantification analysis
Online: 31 July 2023 (10:08:43 CEST)
This paper addresses the problem of non-cooperative spectrum sensing in very low signal-to-noise ratio (SNR) conditions. In our approach, detecting an unoccupied bandwidth consists of detecting the presence or absence of a communication signal in that bandwidth. Since many well-known communication signals contain hidden periodicities, we use Recurrence Quantification Analysis (RQA) to reveal them. RQA is very sensitive to the reliability of the estimates of the phase space dimension m and the time delay τ. In view of the limitations of the algorithms proposed in the literature, we propose a new algorithm that estimates the optimal values of m and τ simultaneously. These optimal values allow the states of the observed signal to be reconstructed and the distance matrix to be estimated. This distance matrix has particular properties, which we exploit to propose the Recurrence Analysis based Detector (RAD). RAD can detect a communication signal in very low SNR conditions. Using Receiver Operating Characteristic curves, our experimental results corroborate the robustness of the proposed algorithm compared with classical, widely used algorithms.
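The state reconstruction and distance matrix mentioned above follow the standard time-delay embedding. The sketch below shows that embedding, a thresholded recurrence matrix with its recurrence rate (the simplest RQA measure), and the classical false-nearest-neighbours fraction often used to choose m. The threshold `eps`, the tolerance `rtol`, and the function names are assumptions; the paper's joint estimator of m and τ and the RAD decision rule are not reproduced.

```python
import math


def delay_embed(x, m, tau):
    """State vectors (x[i], x[i + tau], ..., x[i + (m - 1) * tau])."""
    n = len(x) - (m - 1) * tau
    return [tuple(x[i + k * tau] for k in range(m)) for i in range(n)]


def recurrence_matrix(states, eps):
    """R[i][j] = 1 when states i and j lie within eps of each other (Euclidean)."""
    return [[1 if math.dist(a, b) <= eps else 0 for b in states] for a in states]


def recurrence_rate(R):
    """Fraction of recurrent pairs in the matrix."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def fnn_fraction(x, m, tau, rtol=10.0):
    """Share of nearest neighbours in dimension m that separate strongly when the
    (m + 1)-th delay coordinate is added (Kennel-style false nearest neighbours)."""
    emb = delay_embed(x, m + 1, tau)
    low = [e[:m] for e in emb]
    false = 0
    for i, a in enumerate(low):
        j = min((k for k in range(len(low)) if k != i), key=lambda k: math.dist(a, low[k]))
        d = math.dist(a, low[j])
        if d > 0 and abs(emb[i][m] - emb[j][m]) / d > rtol:
            false += 1
    return false / len(low)


signal = [math.sin(2 * math.pi * t / 25) for t in range(200)]
states = delay_embed(signal, m=2, tau=6)
rr = recurrence_rate(recurrence_matrix(states, eps=0.1))
```

For noise, recurrences are scattered across R; a hidden periodicity produces diagonal line structures, which is the kind of structure recurrence-based detectors look for.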
ARTICLE | doi:10.20944/preprints202012.0791.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Mint; Plant volatiles; Electronic Nose; Principal Component Analysis; Linear Discriminant Analysis; k-Nearest-Neighbors Analysis
Online: 31 December 2020 (11:45:40 CET)
Mints emit diverse scents that exert specific biological functions and are relevant for applications. The current work strives to develop electronic noses that can discriminate the scents emitted by different species of Mint electronically, as an alternative to conventional profiling by gas chromatography. Here, 12 different sensing materials, including 4 different metal oxide nanoparticle dispersions (AZO, ZnO, SnO2, ITO), one metal-organic framework, Cu(BPDC), and 7 different polymer films (PVA, PEDOT:PSS, PFO, SB, SW, SG, PB), were used to functionalize QCM sensors. The purpose was to discriminate six economically relevant Mint species (Mentha x piperita, Mentha spicata, Mentha spicata ssp. crispa, Mentha longifolia, Agastache rugosa, and Nepeta cataria). The adsorption and desorption datasets obtained from each modified QCM sensor were processed by three different models: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and k-Nearest Neighbor Analysis (k-NN). This allowed the different Mints to be discriminated with classification accuracies of 97.2% (PCA), 100% (LDA), and 99.9% (k-NN), respectively. Prediction accuracies with a repeated test measurement reached up to 90.6% for LDA and 85.6% for k-NN. These data demonstrate that this electronic nose can discriminate different Mint scents reliably and efficiently.
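A common way to estimate how well k-NN separates classes in a small sensor dataset is leave-one-out evaluation: each measurement is classified using all the others. The sketch below assumes each sample is a vector of per-sensor frequency shifts; the numbers are invented for illustration and are not the study's measurements, and the paper's exact validation protocol may differ.

```python
import math
from collections import Counter


def knn_label(train, x, k):
    """Majority label among the k training samples nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """samples: list of (feature_vector, label). Classify each sample with all
    the others as training data and return the fraction labelled correctly."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_label(rest, x, k) == y
    return hits / len(samples)


# Hypothetical frequency shifts (Hz) from three coated QCM sensors.
samples = [
    ((-120, -80, -45), "Mentha spicata"),
    ((-118, -82, -44), "Mentha spicata"),
    ((-122, -79, -46), "Mentha spicata"),
    ((-60, -150, -20), "Nepeta cataria"),
    ((-62, -148, -22), "Nepeta cataria"),
    ((-59, -151, -19), "Nepeta cataria"),
]
acc = leave_one_out_accuracy(samples, k=3)
```

With well-separated species the held-out sample's nearest neighbours are dominated by its own class, so every sample is recovered; overlapping scent profiles would show up directly as a lower score.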
ARTICLE | doi:10.20944/preprints201802.0192.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: LiDAR sensors reliability; Internet of Things, self-tuning parametrization; k-nearest neighbors, driving-assistance simulator
Online: 28 February 2018 (11:26:12 CET)
Nowadays, research and development of on-chip LiDAR sensors for vehicle collision avoidance is growing very fast. Therefore, assessing the reliability of obstacle detection using the information provided by LiDAR sensors has become a key issue for the scientific community to explore. This paper presents the design and implementation of a self-tuning method to maximize the reliability of an Internet-of-Things sensor network and to minimize the number of sensors needed to localize obstacles with the required accuracy, as set by a detection threshold. To achieve this goal, models that predict accuracy (i.e., prediction error) for object localization using data collected by LiDAR sensors are designed and implemented in the Webots Automobile 3D simulation tool. The approach combines different techniques. Firstly, a point-cloud clustering technique and an error prediction model library composed of a multilayer perceptron neural network with backpropagation, k-nearest neighbors, and linear regression are explored. Secondly, these modeling techniques are combined with a supervised and reinforcement machine learning technique, Q-learning, in order to minimize the detection threshold. In addition, an IoT driving-assistance simulated scenario with a LiDAR sensor network is designed to validate the prediction model and the optimal configuration of the sensor network that guarantees reliability in obstacle localization. The results demonstrate that the self-tuning method is appropriate for increasing the reliability of the sensor network while minimizing the detection threshold.
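Two of the building blocks named above, point-cloud clustering and a k-NN error-prediction model, can be sketched in a few lines. The radius-based Euclidean clustering, the choice of features (range to the obstacle and number of points in its cluster), and the training values are assumptions for illustration; the Webots setup and the Q-learning threshold tuning are not modelled.

```python
import math


def euclidean_clusters(points, radius):
    """Group points connected by chains of neighbours closer than radius (BFS)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = [seed], [seed]
        while queue:
            p = queue.pop()
            near = [q for q in unvisited if math.dist(points[p], points[q]) <= radius]
            for q in near:
                unvisited.remove(q)
            queue.extend(near)
            members.extend(near)
        clusters.append(sorted(members))
    return clusters


def knn_regress(train_x, train_y, x, k=3):
    """Predict a target (e.g. localization error in metres) as the mean target
    of the k nearest training feature vectors."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return sum(train_y[i] for i in ranked[:k]) / k


cloud = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.1, 5.2)]
clusters = euclidean_clusters(cloud, radius=0.5)

# Hypothetical training set: (range in m, points in cluster) -> localization error in m.
train_x = [(5, 40), (10, 25), (20, 10), (30, 5)]
train_y = [0.05, 0.10, 0.30, 0.60]
predicted = knn_regress(train_x, train_y, (6, 38), k=2)
```

In practice the features would be scaled before computing distances, and a sensor configuration could be accepted when its predicted error stays below the detection threshold.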
ARTICLE | doi:10.20944/preprints201706.0021.v1
Subject: Environmental And Earth Sciences, Oceanography Keywords: sea surface temperature (SST); radial basis function network (RBFN); improved nearest neighbor cluster (INNC) algorithm
Online: 5 June 2017 (05:01:17 CEST)
A radial basis function network (RBFN) method is proposed to reconstruct daily sea surface temperatures (SSTs) from limited SST samples. To evaluate the method, non-biased SST samples in the Pacific Ocean (10°N–30°N, 115°E–135°E) are selected for the period when Tropical Storm Hagibis arrived in June 2014; these SST samples are obtained from the OISST products according to the distribution of AVHRR L2p SST and in-situ SST data. Furthermore, an improved nearest neighbor cluster (INNC) algorithm is designed to search for the optimal hidden knots of the RBFNs from both the SST samples and the background fields. The SSTs reconstructed by the RBFN method are then compared with the results of the optimum interpolation (OI) method. The statistical results show that the RBFN method reconstructs SSTs better than the OI method in this study: the average RMSE is 0.48°C for the RBFN method, considerably smaller than the 0.69°C for the OI method. Additionally, RBFN methods with different basis functions and clustering algorithms are tested, and we find that the INNC algorithm with the multiquadric function is well suited for reconstructing SSTs with the RBFN method when the SST samples are sparsely distributed.
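The sketch below illustrates the two ingredients: a basic nearest-neighbour clustering that turns samples into hidden-node centres, and a multiquadric RBF network whose output weights are fitted by least squares. It is a simplification under stated assumptions: the clustering is the plain (not the improved) nearest-neighbour scheme and ignores background fields, the shape parameter `c` and radius `r` are arbitrary, coordinates are treated as planar, and the SST values are invented.

```python
import math


def nn_cluster(points, r):
    """Basic nearest-neighbour clustering: join the closest cluster if it is within r,
    otherwise open a new cluster. Returns the cluster centres (running means)."""
    sums, counts = [], []
    for p in points:
        centres = [tuple(s / c for s in sm) for sm, c in zip(sums, counts)]
        best = min(range(len(centres)), key=lambda k: math.dist(p, centres[k]), default=None)
        if best is not None and math.dist(p, centres[best]) <= r:
            sums[best] = [s + v for s, v in zip(sums[best], p)]
            counts[best] += 1
        else:
            sums.append(list(p))
            counts.append(1)
    return [tuple(s / c for s in sm) for sm, c in zip(sums, counts)]


def multiquadric(r, c=1.0):
    return math.sqrt(r * r + c * c)


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_rbf(points, values, centres, c=1.0):
    """Least-squares weights (normal equations) for f(x) = sum_k w_k * phi(||x - centre_k||)."""
    Phi = [[multiquadric(math.dist(p, ck), c) for ck in centres] for p in points]
    K = len(centres)
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(K)] for i in range(K)]
    b = [sum(row[i] * v for row, v in zip(Phi, values)) for i in range(K)]
    return solve(A, b)


def predict(x, centres, weights, c=1.0):
    return sum(w * multiquadric(math.dist(x, ck), c) for w, ck in zip(weights, centres))


samples = [(10, 115), (12, 120), (15, 125), (20, 130), (25, 118)]  # (lat, lon)
sst = [29.1, 29.4, 29.0, 28.6, 28.2]                                 # invented values, deg C
centres = nn_cluster(samples, r=0.1)   # tiny radius: every sample becomes a centre
weights = fit_rbf(samples, sst, centres)
estimate = predict((14, 122), centres, weights)
```

With one centre per sample the least-squares fit reduces to exact interpolation; a larger radius merges nearby samples into fewer centres, trading fidelity at the samples for a smoother field.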
ARTICLE | doi:10.20944/preprints202106.0278.v1
Subject: Chemistry And Materials Science, Biomaterials Keywords: Basil; Mint; Plant volatiles; Electronic Nose; Principal Component Analysis; Linear Discriminant Analysis; k-Nearest-Neighbors Analysis.
Online: 10 June 2021 (08:09:36 CEST)
The Lamiaceae are among the most species-rich families of flowering plants and harbor many species used as herbs or for medicinal applications, such as Basils or Mints. Evolution of this group has been driven by chemical speciation, mainly of Volatile Organic Compounds (VOCs). The commercial use of these plants is characterized by a large extent of adulteration and surrogation. Authenticating and discerning the species is thus relevant for consumer safety, but usually requires cumbersome analytics, such as Gas Chromatography, often coupled with Mass Spectrometry. We demonstrate here that quartz-crystal microbalance (QCM)-based electronic noses provide a very cost-efficient alternative, allowing fast, automated discrimination of scents emitted from the leaves of different plants. To explore the range of this strategy, we used leaf material from four genera of Lamiaceae along with Lemongrass as a similarly scented but unrelated outgroup. To unambiguously differentiate the scents of the different plants, the output of the 6 different SURMOF/QCM sensors was analyzed using machine learning (ML) methods together with a thorough statistical analysis. The exposure and purging datasets (4 cycles) obtained from a low-cost, homemade, portable QCM-based e-Nose were analyzed with a Linear Discriminant Analysis (LDA) classification model. Prediction accuracies with repeated test measurements reached up to 90%. We show that it is possible not only to discern and identify plants at the genus level, but even to discriminate closely related sister clades within a genus (Basil), demonstrating that e-Noses are a powerful technology for safeguarding consumers against the challenges of globalized trade.
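LDA assigns a sample to the class with the largest linear discriminant score. To keep the example dependency-free, the sketch below assumes a shared diagonal covariance (a simplification of full LDA, which uses a shared full covariance matrix) and uses invented two-dimensional sensor features; it is not the paper's model or data.

```python
import math
from collections import defaultdict


def fit_diag_lda(samples):
    """samples: list of (features, label). Returns class means, pooled
    per-feature variances and class priors."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    dim, n, g = len(samples[0][0]), len(samples), len(by_class)
    means = {y: [sum(v[j] for v in xs) / len(xs) for j in range(dim)]
             for y, xs in by_class.items()}
    # Pooled within-class variance per feature (floored to avoid division by zero).
    var = [max(sum((x[j] - means[y][j]) ** 2 for x, y in samples) / (n - g), 1e-12)
           for j in range(dim)]
    priors = {y: len(xs) / n for y, xs in by_class.items()}
    return means, var, priors


def predict_diag_lda(model, x):
    """argmax over classes k of  sum_j (x_j * mu_kj - mu_kj^2 / 2) / var_j + log(prior_k)."""
    means, var, priors = model

    def score(y):
        mu = means[y]
        return sum((x[j] * mu[j] - mu[j] ** 2 / 2) / var[j] for j in range(len(x))) + math.log(priors[y])

    return max(means, key=score)


training = [
    ((1.0, 2.0), "Basil"), ((1.2, 1.8), "Basil"), ((0.9, 2.1), "Basil"),
    ((4.0, 0.5), "Mint"), ((4.2, 0.7), "Mint"), ((3.9, 0.4), "Mint"),
]
model = fit_diag_lda(training)
```

The score is linear in x because the quadratic term is shared by all classes and cancels, which is what makes the decision boundaries between genera straight lines (hyperplanes).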
ARTICLE | doi:10.20944/preprints201612.0078.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: missing value imputation; machine learning; decision tree imputation; k-nearest neighbors imputation; self-organizing map imputation
Online: 15 December 2016 (08:27:13 CET)
Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
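Of the four imputation methods listed, k-nearest-neighbour imputation is the least obvious to implement. A minimal sketch, assuming numeric features, `None` for missing entries, and a distance computed only over jointly observed columns (rescaled by how many columns were compared); the paper's exact distance and choice of k are not given in the abstract, and the values below are invented.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows
    that observe it. Distances use only columns observed in both rows."""

    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Rescale by total/shared columns so rows with few overlaps are not favoured.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) * len(a) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for t, r in enumerate(rows) if t != i and r[j] is not None]
            donors.sort(key=lambda r: dist(row, r))
            chosen = donors[:k]
            if chosen:
                filled[i][j] = sum(r[j] for r in chosen) / len(chosen)
    return filled


# Hypothetical clinical features with one missing measurement (None).
rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 3.4],
]
imputed = knn_impute(rows, k=2)
```

Unlike mean imputation, which would fill the gap with the column average (pulled up by the dissimilar third row), k-NN borrows values only from patients that resemble the incomplete record.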
ARTICLE | doi:10.20944/preprints202309.0313.v1
Subject: Physical Sciences, Condensed Matter Physics Keywords: ising model; free energy; critical exponents; antiferromagnetic; balanced system; effective number of the nearest neighbors; layered media
Online: 6 September 2023 (03:16:00 CEST)
In the framework of the mean field approximation, we consider a spin system consisting of two interacting sub-ensembles: spin interactions within each sub-ensemble are ferromagnetic, while the inter-ensemble interactions are antiferromagnetic. We define the effective number of nearest neighbors and show that if the two sub-ensembles have the same effective number of nearest neighbors, the classical form of the critical exponents gives way to a non-classical form, and the scaling function changes simultaneously. We demonstrate that this system allows for two second-order phase transitions and two first-order phase transitions. We observe that an external magnetic field does not destroy the phase transitions but only shifts their critical points, allowing control of the system's parameters. We discuss the regime in which the magnetization as a function of the magnetic field develops a low-magnetization plateau, and show that the height of this plateau abruptly rises to one when the magnetic field reaches a critical value. Our analytical results are supported by a Monte Carlo simulation of a three-dimensional layered model.
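To make the mean-field setting concrete, a hypothetical version of the self-consistency equations for two sub-ensembles with magnetizations m_a and m_b (with k_B = 1) could read m_a = tanh[(J q_a m_a - K q_ab m_b + h)/T] and m_b = tanh[(J q_b m_b - K q_ba m_a + h)/T], where J > 0 is the intra-ensemble ferromagnetic coupling, K > 0 the inter-ensemble antiferromagnetic coupling, and the q's stand in for neighbour counts. This form and all parameter values are assumptions; the paper's definition of the effective number of nearest neighbors is not reproduced. The sketch solves the equations by damped fixed-point iteration.

```python
import math


def solve_mean_field(T, h=0.0, J=1.0, K=1.0, q=(4, 4), q_ab=(2, 2),
                     start=(0.9, -0.9), iters=5000, damping=0.5):
    """Damped fixed-point iteration of the (assumed) two-sub-ensemble
    mean-field equations; returns (m_a, m_b)."""
    ma, mb = start
    for _ in range(iters):
        fa = math.tanh((J * q[0] * ma - K * q_ab[0] * mb + h) / T)
        fb = math.tanh((J * q[1] * mb - K * q_ab[1] * ma + h) / T)
        ma = (1 - damping) * ma + damping * fa
        mb = (1 - damping) * mb + damping * fb
    return ma, mb


low_T = solve_mean_field(T=0.1)    # ordered: sub-ensembles point in opposite directions
high_T = solve_mean_field(T=20.0)  # disordered: both magnetizations vanish
```

For the symmetric parameters above with h = 0, linearizing the antisymmetric mode (m_a = -m_b) gives an ordering temperature T_c = J q + K q_ab = 6 in these units; this holds only for the assumed model.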
ARTICLE | doi:10.20944/preprints202012.0054.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Patterns recognition; Machine learning; Hereditary Ataxia diseases; K-Nearest Neighbors; Multi Layer Perceptron; Ensemble Classification Trees; SVM.
Online: 2 December 2020 (09:33:15 CET)
Progressive gait impairment in patients with neurological diseases such as Hereditary Ataxias (HA) has been analysed using gait data collected with movement sensors. This research focuses on finding the minimum number of gait features required to recognize HA patients efficiently and less intrusively, based on data collected with iPhone movement sensors placed on the ankles of 14 HA patients and 14 healthy people. A twofold proposal is made: first, a local-minimum prominent-peak criterion to find the starting point of each stride, yielding a 10-stride window from which 56 spatio-temporal features are derived; second, a search strategy based on the Hill Climbing algorithm to reduce the number of gait features and sensors. The main finding was that with two gait patterns, 96% classification accuracy was achieved using the K-Nearest Neighbors (KNN) and Multi-Layer Perceptron (MLP) algorithms; moreover, MLP required only the right-ankle sensor patterns, which further reduces intrusiveness.
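The Hill Climbing reduction of gait features can be sketched as a greedy local search over feature subsets. The scoring function is supplied by the caller (in practice, for example, the cross-validated accuracy of KNN or MLP on the selected features); the toy score and feature names below are hypothetical and only show the search mechanics.

```python
def hill_climb(features, score, max_steps=100):
    """Greedy subset search: from the current subset, try adding or removing one
    feature, move to the best-scoring neighbour, and stop when nothing improves."""
    current = frozenset()
    best = score(current)
    for _ in range(max_steps):
        neighbours = [current ^ {f} for f in features]  # toggle one feature
        candidate = max(neighbours, key=score)
        candidate_score = score(candidate)
        if candidate_score <= best:
            break
        current, best = candidate, candidate_score
    return set(current), best


# Toy score: reward informative features, penalize subset size (fewer sensors, less intrusion).
useful = {"stride_time", "swing_ratio"}
features = ["stride_time", "swing_ratio", "cadence", "step_height"]
subset, value = hill_climb(features, lambda s: len(s & useful) - 0.1 * len(s))
```

Hill Climbing only guarantees a local optimum, but it evaluates far fewer subsets than an exhaustive search over all 2^56 combinations of the gait features.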
ARTICLE | doi:10.20944/preprints202009.0257.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Face Detection; Kohonen Self-Organizing Feature Map(K-SOM); Skin Color Segmentation; K-Nearest Neighbour (KNN) Classifier
Online: 11 September 2020 (12:10:28 CEST)
Maintaining the security of information and managing its risks is very important today, and biometric techniques are very useful for these problems. Among the several kinds of biometric techniques, face detection is one of the most complex and important. A human face changes over time due to ageing and other factors, and a person can show many different expressions. The appearance of a face also changes with lighting conditions or with the angle of the input device. As a result, the face cannot always be detected properly. In this paper, a method is proposed that detects human faces efficiently in both automatic and manual modes. In manual mode, the user can select the input faces according to their choice. In automatic mode, the system detects all possible face areas using the Kohonen Self-Organizing Feature Map technique, which reduces the complex color image to a vector-quantized image with the desired colors. A color segmentation technique is then used to detect the possible skin areas in the vector-quantized image. Next, the Histogram of Oriented Gradients technique is used to extract features from the faces, and a K-Nearest Neighbour classifier is used to compare the face images detected in the two modes. The automatic method provided better accuracy than the manual method.
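The vector-quantization step can be illustrated with a small one-dimensional Kohonen map trained on RGB pixels: each node becomes a palette colour, and every pixel is then replaced by its nearest node. The map size, learning-rate and neighbourhood schedules, and the toy pixels are assumptions, not the paper's settings; skin segmentation, HOG, and the KNN comparison are not shown.

```python
import math
import random


def train_som(pixels, n_nodes=4, epochs=20, lr0=0.5, seed=0):
    """1-D Kohonen SOM over RGB tuples; returns the trained node colours."""
    rng = random.Random(seed)
    # Initialize nodes from evenly spaced input pixels.
    nodes = [list(pixels[i * len(pixels) // n_nodes]) for i in range(n_nodes)]
    radius0 = n_nodes / 2
    total, t = epochs * len(pixels), 0
    for _ in range(epochs):
        for px in rng.sample(pixels, len(pixels)):   # one shuffled pass
            frac = t / total
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(n_nodes), key=lambda k: math.dist(nodes[k], px))
            for k in range(n_nodes):
                influence = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                nodes[k] = [w + lr * influence * (p - w) for w, p in zip(nodes[k], px)]
            t += 1
    return [tuple(w) for w in nodes]


def quantize(pixels, palette):
    """Replace every pixel by its nearest palette colour."""
    return [min(palette, key=lambda c: math.dist(c, px)) for px in pixels]


skin, dark = (230, 180, 150), (20, 20, 30)
pixels = [skin] * 10 + [dark] * 10
palette = train_som(pixels, n_nodes=2)
quantized = quantize(pixels, palette)
```

Reducing an image to a few representative colours in this way makes the subsequent skin-colour segmentation a matter of checking which palette entries fall in a skin-tone range.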
ARTICLE | doi:10.20944/preprints202305.0917.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Machine learning; Geriatric fall detection; Dataset; K Nearest Neighbours; Naive Bayes; Logistic Regression; Random Forest; Support Vector Machine
Online: 12 May 2023 (10:00:16 CEST)
ARTICLE | doi:10.20944/preprints202108.0413.v3
Subject: Medicine And Pharmacology, Dentistry And Oral Surgery Keywords: Dental Age Measurement; Dental Radiography; Orthopantomogram; Convolutional Neural Network; K-Nearest Neighbour; Health Data Analytics; Biomedical Machine Learning
Online: 12 April 2022 (10:12:48 CEST)
Age estimation from dental radiographs (orthopantomography, OPG) is a medical imaging technique that physicians and pathologists use for disease identification and legal matters, for example, estimating the post-mortem interval, detecting child abuse and drug trafficking, and identifying an unknown body. Recent developments in automated image processing models have improved the limited precision of age estimation to approximately ±1 year. While this estimate is often accepted as an accurate measurement, age estimation should be as precise as possible in most serious matters, such as homicide. Current age estimation techniques are highly dependent on manual and time-consuming image processing. Age estimation is often a time-sensitive matter in which image processing time is vital. Recent developments in machine learning-based data processing methods have reduced image processing time; however, the accuracy of these techniques still needs to be improved. We proposed an ensemble of image classifiers and transfer learning techniques to enhance the accuracy of age estimation using OPGs, refining it from one year to a few months (1, 3, and 6 months). This hybrid model is based on convolutional neural networks (CNN) and K nearest neighbours (KNN). The hybrid (HCNN-KNN) model was used to investigate 1,922 panoramic dental radiographs of patients aged 15 to 23. These OPGs were obtained from various teaching institutes and private dental clinics in Malaysia. To minimise the chance of overfitting in our model, we used the principal component analysis (PCA) algorithm and eliminated highly correlated features. To further enhance the performance of our hybrid model, we performed systematic image pre-processing. We applied a series of classifications to train our model. We have demonstrated that combining these approaches improves the classification and segmentation, and thus the age-estimation outcome of the model.
Our findings suggest that our model, for the first time to the best of our knowledge, successfully estimated age in classification studies at one-year, six-month, three-month, and one-month resolution, with accuracies of 99.98%, 99.96%, 99.87%, and 98.78%, respectively.
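The abstract mentions eliminating highly correlated features before classification. A minimal sketch of such a filter, assuming Pearson correlation and a greedy keep-first rule with a hypothetical threshold of 0.9; the feature names are placeholders, and the paper's PCA step is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0


def drop_correlated(columns, threshold=0.9):
    """columns: feature name -> values. Keep a feature only if its |r| with
    every already-kept feature is below threshold (first come, first kept)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


cols = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],   # exact multiple of f1, so it is redundant
    "f3": [5, 3, 4, 1, 2],
}
kept = drop_correlated(cols, threshold=0.9)
```

Removing near-duplicate features shrinks the input without losing much information, which reduces the model's opportunity to overfit on redundant radiographic descriptors.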
ARTICLE | doi:10.20944/preprints202306.2178.v1
Subject: Engineering, Telecommunications Keywords: delay; dimensionality reduction; LTE; VoIP; Neural Networks; Support Vector Machines; k-Nearest Neighbors; Feature Selection; Pareto 80/20 rule
Online: 30 June 2023 (07:38:35 CEST)
Delay in data transmission is one of the key performance indicators (KPIs) of a network. The planned and projected value of delay is of crucial importance in network management for the optimal allocation of network resources and their performance. To create optimal solutions, predictive models, currently most often based on machine learning (ML), are used. This paper investigates the training, testing, and selection of the best predictive delay model for a VoIP service in a Long Term Evolution (LTE) network using three ML techniques: Neural Networks (NN), Support Vector Machines (SVM), and k-Nearest Neighbors (k-NN). The space of model input variables is optimized by dimensionality reduction techniques: the RReliefF algorithm, Backward selection via the recursive feature elimination algorithm, and the Pareto 80/20 rule. A three-segment road in the geo-space between the cities of Banja Luka (BL) and Doboj (Db) in the Republic of Srpska (RS), Bosnia and Herzegovina (BiH), covered by the cellular network (LTE) of the M:tel BL operator, was chosen for the case study. The results show that, in all three optimization approaches, the k-NN model is selected as the best solution. For the RReliefF optimization algorithm, the best model has 6 inputs and a minimum relative error (RE) of RE = 0.109; for Backward selection via the recursive feature elimination algorithm, the best model has 4 inputs and RE = 0.041; and for the Pareto 80/20 rule, the best model has 11 inputs and RE = 0.049. The comparative analysis concludes that, according to the observed criteria for selecting the final model, the best solution is to optimize the number of predictors using Backward selection via the recursive feature elimination algorithm.
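One plausible reading of the Pareto 80/20 rule for predictor selection is to rank inputs by an importance score and keep the smallest top-ranked set that accounts for at least 80% of the total importance. The sketch assumes that reading; the importance values (which could come, for instance, from RReliefF) and the input names are hypothetical.

```python
def pareto_select(importances, share=0.8):
    """Smallest set of top-ranked inputs whose summed importance reaches
    `share` of the total (one reading of the 80/20 rule)."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(v for _, v in ranked)
    chosen, running = [], 0.0
    for name, value in ranked:
        if running >= share * total:
            break
        chosen.append(name)
        running += value
    return chosen


# Hypothetical importance scores for candidate delay predictors.
scores = {"RSRP": 0.45, "SINR": 0.25, "RSRQ": 0.15, "speed": 0.08, "CQI": 0.04, "hour": 0.03}
selected = pareto_select(scores)
```

The selected inputs would then be used to train and compare the NN, SVM, and k-NN delay models.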
ARTICLE | doi:10.20944/preprints201905.0382.v1
Subject: Engineering, Control And Systems Engineering Keywords: supervised machine learning; flood inundation mapping; high-resolution; synthetic aperture radar; height above nearest drainage; sentinel-1; inundated vegetation
Online: 31 May 2019 (08:48:14 CEST)
Floods are among the most widespread, frequent, and devastating natural disasters, and they continue to increase in frequency and intensity. Remote sensing, specifically synthetic aperture radar (SAR), has been widely used to detect surface water inundation and to provide retrospective and near-real-time (NRT) information owing to its high spatial resolution, self-illumination, and low atmospheric attenuation. However, the efficacy of flood inundation mapping with SAR is susceptible to reflections and scattering from a variety of factors, including dense vegetation and urban areas. In this study, the topographic dataset height above nearest drainage (HAND) was investigated as a potential supplement to Sentinel-1A C-band SAR, along with supervised machine learning, to improve the detection of inundation in heterogeneous areas. Three machine learning classifiers were trained on two feature sets, SAR only (VV and VH) and VV, VH, and HAND, to map inundated areas. Three study sites along the Neuse River in North Carolina, USA, during the record flood of Hurricane Matthew in October 2016 were selected. The binary classification analysis (inundated as positive vs. non-inundated as negative) revealed significant improvements when incorporating HAND in several metrics, including classification accuracy (ACC) (+37.1%), true positive rate (TPR) (+51.2%), and negative predictive value (NPV) (+23.7%). A marginal improvement of +1.4% was seen for positive predictive value (PPV), but the true negative rate (TNR) fell by 15.1%. By incorporating HAND, a significant number of areas with high SAR backscatter but low HAND values were detected as inundated, which increased the true positives. This in turn also increased the false positives, but to a lesser extent, as evident in the metrics. This study demonstrates that HAND could be considered a valuable feature for enhancing SAR flood inundation mapping, especially in areas with heterogeneous land cover and dense vegetation that interfere with SAR.
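The five metrics reported above all derive from the binary confusion matrix, with inundated pixels as the positive class. A small helper, shown with invented labels purely for illustration:

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics with 1 (inundated) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "TPR": ratio(tp, tp + fn),  # sensitivity / recall
        "TNR": ratio(tn, tn + fp),  # specificity
        "PPV": ratio(tp, tp + fp),  # precision
        "NPV": ratio(tn, tn + fn),
    }


truth = [1, 1, 1, 0, 0, 0, 0, 1]   # invented reference labels
pred = [1, 1, 0, 0, 1, 1, 0, 1]    # invented classifier output
metrics = binary_metrics(truth, pred)
```

Labelling more pixels as inundated raises TP at the cost of extra FP, which is why TPR and NPV can rise while TNR falls, the trade-off reported for the HAND feature set.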
ARTICLE | doi:10.20944/preprints202308.0063.v1
Subject: Chemistry And Materials Science, Analytical Chemistry Keywords: optical sensing; absorbance; fluorescence; fingerprinting; recognition of motor oils; oxidation of carbocyanine dyes; linear discriminant analysis; k-nearest neighbors algorithm
Online: 1 August 2023 (10:57:32 CEST)
Optical “fingerprints” are widely used in chemometrics-assisted recognition of samples of different nature. An emerging trend in this area is the transition from obtaining "static" spectral data to monitoring reactions that occur over time. Indicator reactions are usually carried out in aqueous solutions; in this study, we developed reactions that occur in an organic solvent, which makes it possible to recognize fat-soluble samples. As samples, we used 5W40, 10W40, and 5W30 motor oils from 4 manufacturers, 6 samples in total. The procedure involved mixing the dye, sample, and reagents (HNO3, HCl, or t-butyl hydroperoxide) in ethanolic solution in a 96-well plate and measuring absorbance or near-IR fluorescence intensity every few minutes over 20–55 min. The obtained photographic images were processed by linear discriminant analysis (LDA) and the k-nearest neighbors algorithm (kNN). The discrimination accuracy was evaluated using a validation procedure. Oxidation of a dye with nitric acid made it possible to recognize all 6 samples with 100% accuracy by LDA. Merging the data of 4 reactions that did not provide complete discrimination yielded an accuracy of 93% with the kNN technique. The developed indicator systems have good prospects for discriminating other fat-soluble samples. Overall, the results confirm the viability of the kinetic-based discrimination strategy.
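One simple way to turn a kinetic measurement into classifier input is to summarize each well's signal by its rate of change over the measurement window, and to merge several reactions by concatenating their features. The sketch assumes a least-squares slope as the kinetic feature, an illustrative choice rather than the paper's preprocessing; the reaction names echo the reagents above, but the readings are invented.

```python
def slope(times, values):
    """Ordinary least-squares slope of the signal against time."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den


def kinetic_fingerprint(readings):
    """readings: reaction name -> (times, signal). Concatenate one slope per
    reaction in a fixed (sorted) order so fingerprints are comparable."""
    return [slope(*readings[name]) for name in sorted(readings)]


minutes = [0, 5, 10, 15]
sample = {
    "HNO3": (minutes, [0.50, 0.45, 0.40, 0.35]),     # invented absorbance readings
    "t-BuOOH": (minutes, [0.50, 0.49, 0.48, 0.47]),  # invented absorbance readings
}
fingerprint = kinetic_fingerprint(sample)
```

Each oil sample then becomes one fingerprint vector, which LDA or kNN can classify; a fixed reaction order matters because position in the vector encodes which reaction a slope came from.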
ARTICLE | doi:10.20944/preprints202307.1043.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Image classification; Land use/land cover mapping; Maximum likelihood; K-nearest neighbors; Random Forest; Support Vector Machine; Landsat-8 OLI; Sentinel-2 MSI
Online: 17 July 2023 (09:23:49 CEST)
Classification performance on satellite data remains a challenge for the research community in the field of land use/land cover mapping. Here we investigated the performance of supervised per-pixel classification under different scenarios, based on single and seasonal multispectral data combinations from different sensors (Landsat-8 OLI and Sentinel-2 MSI). For Landsat, seasonal spectral indices (EVI and NDMI) were included. A typical Mediterranean watershed with a complex landscape of various forest and wetland ecosystems, crops, artificial surfaces, and lake water was selected to test our approach. All available geospatial data from national databases (Forest Map, LPIS, Natura2000 habitats, cadastral parcels, etc.) were used as ancillary data for classification training and validation. We examined and compared the performance of the maximum likelihood (ML), random forest (RF), k-nearest neighbors (KNN), and support vector machine (SVM) classifiers under different scenarios for land use/land cover mapping, according to the Copernicus Land Cover nomenclature. In total, eight land use/land cover classes were identified in Landsat-8 OLI and nine in Sentinel-2 MSI, with an acceptable overall accuracy above 85%. A comparison of overall classification accuracies shows that Sentinel-2 achieved slightly higher overall accuracy than Landsat-8 (96.68% vs. 93.02%). The best-performing algorithm was ML for Sentinel-2 and KNN for Landsat-8. However, the machine learning algorithms produced similar results regardless of sensor type. We conclude that the best classification performance was achieved using seasonal multispectral data. Future research should focus on integrating time-series multispectral data from different sensors with geospatial ancillary data for land use/land cover mapping.
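The two seasonal indices have standard definitions: EVI = 2.5 (NIR − Red) / (NIR + 6 Red − 7.5 Blue + 1) and NDMI = (NIR − SWIR1) / (NIR + SWIR1). The minimal sketch below computes them from surface reflectance (0–1 scale) and stacks them per season into a per-pixel feature vector that a classifier could consume. For Landsat-8 OLI, Blue, Red, NIR and SWIR1 correspond to bands 2, 4, 5 and 6; the dictionary layout, the stacking helper, and the reflectance values are hypothetical, not the authors' code or data.

```python
def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """Enhanced Vegetation Index from surface reflectance."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index from surface reflectance."""
    return (nir - swir1) / (nir + swir1)


def seasonal_index_features(scenes):
    """Stack EVI and NDMI for one pixel across seasonal scenes.

    `scenes` is a list of dicts (hypothetical layout) with keys
    'blue', 'red', 'nir', 'swir1', one dict per season.
    """
    features = []
    for s in scenes:
        features.append(evi(s["nir"], s["red"], s["blue"]))
        features.append(ndmi(s["nir"], s["swir1"]))
    return features


# Hypothetical reflectances for one pixel in two seasons.
spring = {"blue": 0.05, "red": 0.10, "nir": 0.40, "swir1": 0.20}
summer = {"blue": 0.06, "red": 0.12, "nir": 0.30, "swir1": 0.25}
print(seasonal_index_features([spring, summer]))
```

Stacking indices from several dates lets a per-pixel classifier see phenological differences (for example between crops and natural vegetation) that a single scene cannot show.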
ARTICLE | doi:10.20944/preprints202109.0181.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: Classification; stacking ensemble method; heart surgery; unbalanced data problem; hybrid predictive model; machine learning in healthcare; resampling method; Edited-Nearest-Neighbor; nonparametric test.
Online: 10 September 2021 (10:53:35 CEST)
Owing to remarkable improvements in health care and biomedicine, hospitals now record a tremendous amount of data. In addition, the most effective approach to reducing disease mortality is to diagnose the disease as early as possible. As a result, data mining with machine learning in the medical field offers good opportunities to uncover hidden patterns in these data. An accurate forecast of mortality after heart surgery can lead to more successful medical treatment and lower costs. This research proposes a new stacking predictive model that forecasts mortality after heart surgery on a highly unbalanced dataset, using the most practical features selected with the random forest feature importance method. To address the unbalanced data problem, a combination of the SVM-SMOTE over-sampling algorithm and the Edited-Nearest-Neighbor under-sampling algorithm is used. The proposed model is compared with several other machine learning classifiers using shuffle hold-out and 10-fold cross-validation strategies; the results of both strategies indicated that our model had the highest efficiency among the compared models. Furthermore, the Friedman statistical test is applied to assess the differences between models. The results demonstrate that the proposed stacking model reached the most accurate predictive performance after Logistic Regression.
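The Edited-Nearest-Neighbor (ENN) step removes samples whose label disagrees with most of their nearest neighbours, which cleans up class overlap. Below is a minimal pure-Python sketch of Wilson's majority-vote rule, assuming Euclidean distance, k = 3, and that only majority-class samples are eligible for removal; the function name and toy data are hypothetical. It is not the authors' implementation, and library versions (for example imbalanced-learn's `EditedNearestNeighbours`) may use a stricter default selection rule.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3, target_class=None):
    """Wilson-style ENN: drop samples whose label differs from the
    majority label of their k nearest neighbours.

    If `target_class` is given, only samples of that class may be dropped
    (typical when under-sampling the majority class).
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if target_class is not None and yi != target_class:
            keep.append(i)
            continue
        # Distances from sample i to every other sample, nearest first.
        neighbours = sorted(
            (math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i
        )
        labels = [label for _, label in neighbours[:k]]
        majority_label = Counter(labels).most_common(1)[0][0]
        if majority_label == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: class 0 is the majority; one class-0 point sits inside the
# class-1 cluster and should be removed.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
y = [0, 0, 0, 0, 1, 1, 1, 0]
X_kept, y_kept = edited_nearest_neighbours(X, y, k=3, target_class=0)
print(len(X), "->", len(X_kept))
```

In the combined scheme described above, SVM-SMOTE would first synthesize minority samples and an editing pass like this one would then prune ambiguous points; the sketch covers only the editing pass.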
REVIEW | doi:10.20944/preprints202303.0066.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Fourth Industrial Revolution (4IR); Machine Learning (ML); Precision Agriculture; Support Vector Machine (SVM); Artificial Neural Network (ANN); k-Nearest Neighbour (k-NN); Fuzzy Classification; Global Navigation Satellite System (GNSS)
Online: 3 March 2023 (09:28:23 CET)
The world, and particularly its economically developed regions, is currently in the era of the Fourth Industrial Revolution (4IR). Conversely, the economically developing regions, particularly the African continent, have not yet fully passed through the Third Industrial Revolution (3IR), and their economies remain heavily dependent on agriculture. Meanwhile, global food insecurity is worsening every year owing to the exponential growth of the global human population, which continuously raises food demand in both quantity and quality. This underscores the importance of digitizing agricultural practices to improve farm yields, meet the steep food demand, and stabilize the economies of the African continent and of countries such as India that depend mainly on agriculture. The tools available for digitizing farming practices include space technology (in particular the Global Navigation Satellite System, GNSS), machine learning (ML), precision agriculture, and communication systems such as the Internet of Things (IoT) and information and communication technologies (ICT). The most pressing challenges in farming include monitoring diseases, pests, weeds, and nutrient deficiencies in crops, because early detection enables swift, timely corrective action and hence higher yield at the end of a farming cycle. Precision agriculture still offers vast opportunities for further research, for example in addressing the current lack of real-time monitoring and real-time corrective action.
ARTICLE | doi:10.20944/preprints202309.1009.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Electrical power systems; Support vector machines; random Forest; machine learning; wavelet transform; transmission lines fault; Electrical power quality; short circuit; Classification of faults; localization of faults; decision trees; Ensemble learning; K-nearest neighbors
Online: 15 September 2023 (04:54:55 CEST)