BRIEF REPORT | doi:10.20944/preprints202306.1630.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Random Forest; Loan Risk
Online: 22 June 2023 (12:50:50 CEST)
As people's consumption habits change, loans play a crucial role in modern society. They provide individuals who lack sufficient funds with the money to purchase residential property or start a business. However, to avoid costly loan defaults, financial institutions first assess each borrower's risk, predicting the default risk of the borrower in order to decide whether to lend. Machine learning algorithms, including random forest and linear regression, have benefited many real-world applications. Building on these developments, this paper studies loan default risk based on the personal loan history data of an institution and uses a random forest classification model to predict the probability of loan default. The resulting accuracy of 85.62% demonstrates the method's applicability to real-world loan prediction and helps managers assess the degree of risk before granting a loan.
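As a rough sketch of the approach described above, the following trains a random forest default classifier with scikit-learn. The borrower features, the toy default rule, and the data are all synthetic stand-ins, not the institution's actual loan history:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
# Synthetic borrower features: e.g. income, loan amount, credit history length.
X = rng.normal(size=(n, 3))
# Toy rule: default is more likely when the loan amount outweighs income.
y = (X[:, 1] - X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))  # held-out accuracy
```

On real data, `clf.predict_proba` would give the per-borrower default probability that a risk manager could threshold.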
ARTICLE | doi:10.20944/preprints201910.0360.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: Random Forest; Iterative Random Forest; gene expression networks; high performance computing; X-AI-based eQTL
Online: 31 October 2019 (02:33:17 CET)
As time progresses and technology improves, biological data sets are continuously increasing in size. New methods, and new implementations of existing methods, are needed to keep pace with this growth. In this paper, we present a high-performance computing (HPC)-capable implementation of Iterative Random Forest (iRF). This new implementation enables explainable-AI eQTL analysis of SNP sets with over a million SNPs. Using this implementation, we also present a new method, iRF Leave One Out Prediction (iRF-LOOP), for the creation of Predictive Expression Networks on the order of 40,000 genes or more. We compare the new implementation of iRF with the previous R version and analyze its time to completion on two of the world's fastest supercomputers, Summit and Titan. We also show iRF-LOOP's ability to capture biologically significant results when creating Predictive Expression Networks. This new implementation of iRF will enable the analysis of biological data sets at scales that were previously not possible.
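The leave-one-out idea behind iRF-LOOP can be sketched in a few lines: each gene in turn is held out as the target, a forest predicts it from all other genes, and the feature importances become directed edge weights in the expression network. This toy version uses a plain scikit-learn forest in place of iRF and a tiny synthetic expression matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_samples, n_genes = 200, 5
expr = rng.normal(size=(n_samples, n_genes))
expr[:, 0] = 0.8 * expr[:, 1] + 0.2 * rng.normal(size=n_samples)  # gene 0 tracks gene 1

# LOOP: hold each gene out as the target, predict it from all the others,
# and record the forest's feature importances as edge weights into that gene.
network = np.zeros((n_genes, n_genes))
for target in range(n_genes):
    predictors = [g for g in range(n_genes) if g != target]
    rf = RandomForestRegressor(n_estimators=50, random_state=0)
    rf.fit(expr[:, predictors], expr[:, target])
    network[predictors, target] = rf.feature_importances_
```

At the scale of 40,000 genes this outer loop is embarrassingly parallel, which is what makes the HPC implementation natural.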
ARTICLE | doi:10.20944/preprints202306.2075.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: geological hazard; susceptibility; random forests; certainty factor; Huize county
Online: 29 June 2023 (08:27:07 CEST)
The assessment outcomes of regional susceptibility to geological disasters can directly indicate the extent and intensity of risks within the study area, thus providing targeted guidance for disaster management efforts. This study selects eight evaluation indicators, namely elevation, gradient, terrain relief, lithology of strata, normalized difference vegetation index, distance from faults, distance from roads, and distance from rivers. The study focuses on Huize County in Yunnan Province as the research area, utilizing the certainty factor (CF) and random forest (RF) models to evaluate susceptibility to geological disasters. The non-geological disaster points in the study area are determined using the certainty factor prior model, and the certainty factor values for each evaluation factor serve as the classification data for the random forest model. The optimal parameters for the random forest are selected through iterative calculation of the out-of-bag error in PyCharm, while the weight of each evaluation factor is determined based on the random forest model with the optimal parameters. The geological disaster susceptibility zoning of Huize County is obtained by overlaying the weighted certainty factor values of each evaluation factor. The accuracy of the evaluation results is verified using zoning statistics and ROC curves with a test sample of 30% of the points. The results demonstrate the high accuracy of the model in evaluating susceptibility to geological disasters in Huize County. Compared to the single certainty factor model, this approach offers advantages in reliability and accuracy. The evaluation results can serve as a scientific reference for related work in Huize County.
ARTICLE | doi:10.20944/preprints202304.0024.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: random forest; gaussian plume; GEM-AQ; downscaling; PM10
Online: 3 April 2023 (10:40:09 CEST)
High PM10 concentrations are still a significant problem in many parts of the world. In many countries, including Poland, 50μg/m3 is the permissible threshold for a daily averaged PM10 concentration. The number of people affected by exceedances of this threshold is challenging to estimate and requires high-resolution concentration maps. This paper presents an application of random forests for downscaling regional air quality model results. As policymakers and other end users are eager to receive detailed, high-resolution PM10 concentration maps, we propose a technique which utilizes the results of a regional CTM (GEM-AQ, with 2.5 km resolution) and a local Gaussian plume model. As a result, we obtain a detailed, 250-meter resolution PM10 distribution, which reflects the complex emission pattern of a foothill area in southern Poland. The random forest results are highly consistent with the GEM-AQ and observed concentrations. We also discuss different strategies for training the random forest: using additional features and selecting target variables.
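A minimal sketch of the downscaling step, assuming scikit-learn and entirely synthetic inputs: a random forest regressor learns to map per-cell features (the coarse GEM-AQ field, a local Gaussian-plume estimate, terrain) to fine-grid PM10. The feature names and toy target relation are illustrative, not the paper's configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n_cells = 500
# Hypothetical per-250m-cell features: coarse GEM-AQ PM10, Gaussian-plume
# contribution from local sources, and elevation.
gem_aq = rng.uniform(20, 60, n_cells)
plume = rng.uniform(0, 30, n_cells)
elevation = rng.uniform(300, 900, n_cells)
features = np.column_stack([gem_aq, plume, elevation])
# Toy target: fine-scale PM10 blends the regional field with the local plume.
pm10 = 0.7 * gem_aq + 0.8 * plume + rng.normal(scale=2, size=n_cells)

downscaler = RandomForestRegressor(n_estimators=100, random_state=0)
downscaler.fit(features, pm10)
pm10_fine = downscaler.predict(features)  # fine-grid concentration estimate
```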
ARTICLE | doi:10.20944/preprints201609.0053.v1
Subject: Engineering, Control And Systems Engineering Keywords: electricity markets; price forecasting; multi-output models; random forests; conditional inference trees
Online: 18 September 2016 (06:16:19 CEST)
Predicting electricity prices is a very important issue in modern society, because the associated decision process under uncertainty requires accurate forecasts for the economic agents involved. In this paper, we apply the conditional inference tree extension of Random Forests to the prediction of electricity prices in Spain, with the novelty of modeling prices jointly with demand. The purpose is to achieve greater accuracy than with univariate-response Random Forests, particularly in price prediction, as well as to understand the effect of the input variables (lagged values of price and demand, current production levels of available energy sources) on the two outputs jointly. The results are very encouraging, providing a significant increase in price prediction accuracy. Interesting methodological challenges also arise concerning the appropriate choice of the relative weights of price and demand in the joint modeling, and a new procedure for producing the variable importance ranking is proposed. The partykit R package, which allows for multivariate Random Forests, has been used.
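The joint price-and-demand modeling can be illustrated outside R as well: scikit-learn forests accept a two-column target directly, so one model learns both outputs. This sketch uses synthetic lagged features standing in for the Spanish market data (the paper itself uses partykit's conditional inference forests):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 300
# Hypothetical inputs: lagged price, lagged demand, wind output, hydro output.
X = rng.normal(size=(n, 4))
price = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.3, size=n)
demand = 1.5 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.3, size=n)
Y = np.column_stack([price, demand])  # joint two-column target

# A single forest fit on the joint target models price and demand together,
# so each tree's splits must serve both outputs at once.
joint_rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)
pred = joint_rf.predict(X)  # shape (n, 2): price and demand predictions
```

The relative-weighting question raised above corresponds here to rescaling the two target columns before fitting.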
ARTICLE | doi:10.20944/preprints202012.0752.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Random Forest; machine learning; multispectral imagery; deforestation; PFBC landscapes
Online: 30 December 2020 (11:57:18 CET)
The evaluation of deforestation by optical remote sensing remains a challenge in the humid tropics due to high cloud cover. This paper develops a simple and reproducible method for mapping deforestation of old-growth forest using open-access software. A map of old-growth forest depletion was created using composites from three different dates (2003, 2010, 2016). Four models were tested: the first used spectral bands (nir, swir1, swir2, and red); the second combined spectral bands and spectral indices (NDVI, B54R, NDWI, and NBR); the third combined spectral bands and geomorphological indices (DEM, slope, and roughness); and the last combined spectral bands, spectral indices, and geomorphological indices. The optimal random forest ntree and mtry parameters were determined for each model to optimize the mapping. The out-of-bag errors for the four models were 2.15%, 2.05%, 1.86%, and 1.85%, respectively. The fourth model had the lowest error and was hence used to predict deforestation of the old-growth forest. The annual rates of deforestation amounted to 0.26% (69,861 ha) and 0.66% (145,768 ha) for 2003-2010 and 2010-2016, respectively. The area of old-growth forest in 2016 was 3,601,607 ha, with 215,629 ha of forest lost between 2003 and 2016. These results showed that the Random Forest Classification (RFC) model was able to effectively map the reduction of old-growth forests.
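The ntree/mtry tuning by out-of-bag error described above can be sketched as a small grid search: with `oob_score=True`, each forest scores itself on the samples each tree did not see, so no separate validation split is needed. The features and labels here are synthetic stand-ins for the band/index/terrain stacks:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, 6))               # stand-in for band/index features
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # toy forest / non-forest label

# Grid over ntree (n_estimators) and mtry (max_features), scored by the
# out-of-bag error, mirroring the per-model tuning described above.
best = None
for ntree in (50, 100):
    for mtry in (1, 2, 3):
        rf = RandomForestClassifier(n_estimators=ntree, max_features=mtry,
                                    oob_score=True, random_state=0).fit(X, y)
        oob_error = 1.0 - rf.oob_score_
        if best is None or oob_error < best[0]:
            best = (oob_error, ntree, mtry)  # keep the lowest-error setting
```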
ARTICLE | doi:10.20944/preprints202305.1623.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Goertzel algorithm; ITSC fault; traction motor; random forest; fault diagnosis
Online: 23 May 2023 (08:31:43 CEST)
The stator winding insulation system is the most critical and weakest part of the EMU (electric multiple unit) traction motor. Effective diagnosis of stator ITSC (inter-turn short-circuit) faults can prevent the fault from expanding into phase-to-phase or ground short-circuits. The TCU (traction control unit) controls the traction inverter to output SPWM (sine pulse width modulation) excitation voltage while the traction motor is stationary. Three ITSC fault diagnostic conditions are based on different IGBT control logic. The Goertzel algorithm is used to calculate the fundamental current amplitude difference Δi and phase angle difference Δθ of the equivalent parallel windings under the three diagnostic conditions. These six parameters are used as features to establish an ITSC fault diagnostic model based on random forest. The proposed method was validated on a simulation experimental platform for ITSC fault diagnosis of EMU traction motors. The experimental results indicate that the current amplitude features Δi and phase angle features Δθ change markedly as the extent of an ITSC fault in the equivalent parallel windings increases. The accuracy of the random forest model for ITSC fault detection and location is 100% on both training and test samples.
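The Goertzel algorithm mentioned above evaluates a single DFT bin without computing a full FFT, which is exactly what is needed to extract one fundamental amplitude and phase per winding. A minimal stdlib sketch (the 64-sample cosine trace is illustrative, not the paper's current data):

```python
import cmath
import math

def goertzel(samples, k):
    """Single-bin DFT via the Goertzel recurrence: returns the complex
    DFT coefficient X[k], from which amplitude and phase follow."""
    n = len(samples)
    w = 2 * math.pi * k / n
    coeff = 2 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Final combination step recovering X[k] from the last two states.
    return cmath.exp(1j * w) * s_prev - s_prev2

# Fundamental (k=1) amplitude and phase of a synthetic phase-current trace.
N = 64
signal = [3.0 * math.cos(2 * math.pi * n / N + 0.4) for n in range(N)]
X1 = goertzel(signal, 1)
amplitude = 2 * abs(X1) / N    # fundamental amplitude (~3.0 here)
phase = cmath.phase(X1)        # fundamental phase (~0.4 rad here)
```

In the diagnostic scheme above, Δi and Δθ would be differences of such amplitude and phase values between the two equivalent parallel windings.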
ARTICLE | doi:10.20944/preprints201801.0088.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: wheat classification; random forest; spectral gradient difference; vegetation indices
Online: 10 January 2018 (09:13:02 CET)
The early-season area estimation of winter wheat, as a strategic crop, is important for decision makers. Classification of multi-temporal images is an approach affected by many factors, such as appropriate training sample size, proper frequency and acquisition times, vegetation index (VI) type, temporal gradients of spectral bands and VIs, choice of classifier, and missing values caused by cloudy conditions. This paper addresses the impact of appropriate frequency and acquisition times and VI type, along with spectral and VI gradients, on the random forest (RF) classifier when missing values exist in multi-temporal images. To investigate the appropriate temporal resolution for image acquisition, the study area was selected on an overlap between two LDCM paths. In our method, the missing values of cloudy bands for each pixel are retrieved as the mean of the k-nearest ordinary pixels. The multi-temporal image analysis is then performed under different scenarios provided by decision makers in terms of the crop types to be extracted early in the season in the study areas. The classification results obtained by the RF decrease by only 1.6% when temporally missing values are retrieved by the proposed method, which is an acceptable result. Moreover, the experimental results demonstrate that if the temporal resolution of Landsat 8 were increased to one week, the classification could be conducted earlier with slightly better results in terms of OA and kappa. Incorporating VIs along with the temporal gradients of spectral bands and VIs as new features in the RF improved the OA and kappa by 3.1% and 6.6%, respectively. Furthermore, the results show that if only one image of the seasonal crop changes is available, the temporal gradients of VIs and spectral bands play the main role in discriminating wheat from barley.
The experiments also demonstrate that if wheat and barley are merged into a single class, the crop area can be estimated two months earlier, with 97.1 and 93.5 in terms of OA and kappa, respectively.
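The cloud gap-filling rule stated above, replacing each cloudy pixel with the mean of its k nearest clear pixels, can be sketched with numpy on a toy single-band scene (the 5x5 gradient and cloud positions are illustrative):

```python
import numpy as np

def fill_cloudy(values, coords, cloudy_mask, k=4):
    """Replace each cloud-contaminated pixel value with the mean of its
    k nearest clear ('ordinary') pixels, by Euclidean distance on coords."""
    filled = values.astype(float).copy()
    clear_idx = np.where(~cloudy_mask)[0]
    for i in np.where(cloudy_mask)[0]:
        d = np.linalg.norm(coords[clear_idx] - coords[i], axis=1)
        nearest = clear_idx[np.argsort(d)[:k]]
        filled[i] = values[nearest].mean()
    return filled

# Toy 1-band scene: a smooth gradient with two cloudy pixels to recover.
coords = np.array([[r, c] for r in range(5) for c in range(5)], float)
values = coords[:, 0] + coords[:, 1]        # reflectance proxy
cloudy = np.zeros(25, bool)
cloudy[[6, 18]] = True                      # pixels (1,1) and (3,3)
filled = fill_cloudy(values, coords, cloudy, k=4)
```

On the gradient scene both cloudy pixels are recovered exactly, since each one equals the mean of its four axis-neighbours.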
Subject: Engineering, Mechanical Engineering Keywords: diesel engine; fault diagnosis; variational mode decomposition; random forest; feature extraction
Online: 25 December 2019 (11:13:13 CET)
Diesel engines are widely used as power equipment in the automobile, shipping, and power-generation industries. Due to wear or faulty adjustment, abnormal valve train clearance is a typical diesel engine failure, which may result in performance degradation and even valve fracture or cylinder knock. However, the failure signatures lie mainly in the time and angular domains, on which current diagnosis methods are based; these signatures are easily affected by working conditions and hard to extract accurately, since the diesel engine runs in a transient, non-stationary process. This work aims at diagnosing this fault mainly from frequency-band features, which change when a valve clearance fault occurs. To extract a series of frequency-band features adaptively, a decomposition technique based on improved variational mode decomposition is investigated. As the connection between the features and the fault is fuzzy, the random forest algorithm is used to analyze the correspondence between features and faults. In addition, the feature dimension is reduced according to importance scores to improve operational efficiency. The experimental results under variable-speed conditions show that the method based on variational mode decomposition and random forest can detect valve clearance faults effectively.
ARTICLE | doi:10.20944/preprints202201.0138.v1
Subject: Business, Economics And Management, Business And Management Keywords: Smart Grid; Random Forest; Internet of Things; Power management; Machine Learning; Smart Meter; Priority Power Scheduling.
Online: 11 January 2022 (13:01:08 CET)
Power control and management presently play a vital role in information technology and power systems. Organizations prefer renewable over non-renewable power generation for controlling resource consumption, reducing prices, and managing power efficiently. A smart grid satisfies these requirements through the integration of machine learning algorithms, which are used for power requirement prediction, power distribution, failure identification, and so on. The proposed Random Forest-based smart grid system classifies the power grid into zones of high and low power utilization. The power zones are divided into a number of sub-zones, which are mapped to random forest branches. This sub-zone and branch mapping is used to identify the quantity of power utilized and unutilized in each zone. The unutilized power quantity and the locations of available power are identified, and the required quantity of power is distributed to the requester with minimal response time and price. The priority power scheduling algorithm collects requests from consumers and sends them to the producer based on priority. The producer analyzes the requester's existing power utilization and the availability of power to schedule power distribution by priority. Experimental results for the proposed Random Forest-based sustainability and price optimization technique are compared to existing machine learning techniques such as SVM, KNN, and NB. The proposed Random Forest-based identification technique identifies the exact location of available power with minimal processing time and a quick response to the requester. The experimental results also show that the smart meter-based smart grid technique identifies faults in a shorter time than conventional energy management techniques.
ARTICLE | doi:10.20944/preprints202102.0318.v3
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Machine Learning; Artificial Intelligence; Androgen Receptor; Random Forest; Deep Neural Network; Convolutional
Online: 24 February 2021 (13:14:01 CET)
Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and food chain with the proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in human, chimp, and rat have been used to build machine learning classifiers and regressors and to evaluate them on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined, physicochemically relevant features developed for this work outperforming graph convolutional, random forest, and large featurizations. The results show that these user-provided structure-, ligand-, and statistically based features and specific DNNs provided the best results, as determined by AUC (0.87), MCC (0.47), and other metrics, and by the interpretability and chemical meaning of the descriptors/features. In addition, the same features in the DNN method performed better than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work, compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving the assessment and design of compounds. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML
ARTICLE | doi:10.20944/preprints202210.0190.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Above-ground biomass; mangroves; pneumatophores; terrestrial LiDAR; machine learning; random forest
Online: 13 October 2022 (08:14:07 CEST)
Accurately quantifying the above-ground volume (AGV), and thus above-ground biomass (AGB), of forest stands is an important aspect of the conservation of mangrove ecosystems owing to their ecological and economic benefits. However, the number of studies focusing on quantifying mangrove forest biomass has been relatively low due to their marshy terrain, which makes exploratory studies challenging. In recent times, the use of LiDAR technologies in forest inventory studies has become increasingly popular, owing to the reliability of LiDAR as a highly accurate means of 3D spatial data acquisition. In this study, we propose an end-to-end methodology for estimating the AGV of mangrove forest stands from terrestrial LiDAR data. Many recent studies on this topic employ machine learning algorithms such as multi-layer perceptrons and random forests for filtering foliage in the point cloud data of single trees. This study extends that approach by accounting for the class imbalance of forest point cloud data in a weighted random forest classifier. For the task of segmenting wood/foliage points in a single-tree point cloud, this approach yielded an average increase of 2.737% in the balanced accuracy score, 0.007 in the Cohen's kappa score, 2.745% in the ROC AUC score, and 0.857% in the F1 score. For the task of AGV estimation of a single tree, this approach resulted in an average coefficient of determination of 0.93 with respect to the ground-truth volumes. For the task of counting pneumatophores in a plot-level point cloud, the proposed breadth-first searching method yielded an average coefficient of determination of 0.9391. Also, the machine learning classifier and geometric features used in this study were invariant to tree species and hence could be generalised for the classification of point clouds of other tree species as well.
Finally, a breadth-first graph-search segmentation-based approach is also proposed as part of this pipeline to estimate the contribution of pneumatophores to the AGB of mangrove forest stands. Since pneumatophores are a special adaptation of mangrove forests for gaseous exchange in marshy environments, this study aims to incorporate the detection and AGB estimation of pneumatophores in the inventory of mangrove forest stands. Studying the contribution of pneumatophores to the AGB of mangrove forest plots could also aid future mangrove forest inventory studies in modeling the underlying root network and estimating the below-ground biomass of mangrove trees.
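The class-imbalance handling described above can be approximated in scikit-learn with `class_weight="balanced"`, which reweights each class inversely to its frequency, one common form of a weighted random forest. The geometric features and the 90/10 foliage/wood split here are synthetic illustrations:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(5)
n = 1000
# Imbalanced toy point cloud: ~90% foliage (0), ~10% wood (1), separated by
# hypothetical per-point geometric features (e.g. linearity, verticality).
y = (rng.random(n) < 0.1).astype(int)
X = rng.normal(size=(n, 3)) + y[:, None] * np.array([2.0, 1.5, 0.0])

# class_weight='balanced' scales sample weights by the inverse class
# frequency, so the rare wood class is not swamped by foliage points.
wrf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0).fit(X, y)
bal_acc = balanced_accuracy_score(y, wrf.predict(X))
```

Balanced accuracy, the mean of per-class recalls, is the natural score here because plain accuracy would reward ignoring the minority wood class.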
ARTICLE | doi:10.20944/preprints202308.1209.v1
Subject: Medicine And Pharmacology, Medicine And Pharmacology Keywords: Breast cancer; Activity prediction; Random forest; Feature selection; Bayesian hyperparameter optimization; AdaBoosting
Online: 17 August 2023 (03:57:54 CEST)
Breast cancer is the most common malignancy in women worldwide. The pathogenesis of this disease is closely related to the estrogen receptor alpha subtype (ERα). It is therefore of great importance to develop effective inhibitors of ERα activity for the treatment of breast cancer. In this paper, we propose a novel ensemble machine learning model for the quantitative structure-activity relationship of anti-breast cancer drugs, which can effectively predict drug activity from small samples with many characteristic variables. To avoid over-fitting caused by low-correlation independent variables, the scoring mechanism of the random forest was improved by incorporating three relevance indicators, the maximum mutual information number, the Pearson correlation coefficient, and the distance correlation coefficient, and 20 optimal molecular descriptors were selected. The Bayesian hyperparameter optimization method was used to optimize the parameters of multiple linear regression (MLR), support vector regression (SVR), and extreme gradient boosting (XGBoost), respectively. The AdaBoost strong learner was constructed by combining the weak learners through weighted linear addition. The results show that the proposed ensemble learning model has the best prediction performance compared to the three base learner models and a CNN-LSTM combined prediction model: the root mean square error was reduced by 7.60%-26.51%, the mean relative error by 6.46%-30.92%, and the goodness of fit increased by 9.57%-36.94%. Finally, the biological activities of 50 candidate ERα inhibitor compounds were predicted, and 4-[2-benzyl-1-[4-(2-pyrrolidin-1-ylethoxy)phenyl]but-1-enyl]phenol was found to have an excellent biological activity value pIC50, giving it potential as an ERα inhibitor. The model proposed in this paper has good prediction accuracy and can provide an effective reference for the discovery and development of anti-breast cancer drugs.
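The weighted-linear-addition boosting step can be illustrated with scikit-learn's AdaBoost regressor, which combines weak learners exactly that way. Note the base learners here are shallow trees rather than the paper's tuned MLR/SVR/XGBoost models, and the 60-compound, 20-descriptor data set is synthetic:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(6)
# Toy QSAR-style data: 60 compounds x 20 molecular descriptors, with a
# pIC50-like activity depending on the first two descriptors.
X = rng.normal(size=(60, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=60)

# AdaBoost reweights training samples after each round and combines the
# weak learners by a weighted (median-based for regression) addition.
booster = AdaBoostRegressor(DecisionTreeRegressor(max_depth=3),
                            n_estimators=50, random_state=0).fit(X, y)
rmse = mean_squared_error(y, booster.predict(X)) ** 0.5
```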
ARTICLE | doi:10.20944/preprints202108.0024.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Nitrogen Dioxide (NO2); Random Forest; Contribution Rate; Air pollution; COVID-19 lockdown
Online: 2 August 2021 (11:54:10 CEST)
During the COVID-19 lockdown in Wuhan, transportation, industrial production, and other human activities declined significantly, as did the NO2 concentration. To assess the relative contributions of different factors to reductions in air pollutants, sensitivity experiments were implemented using a random forest (RF) model, comparing the contributions of meteorology, road traffic, and emission sources between different periods. In addition, an emulator was run to suggest an appropriate limit for traffic control. The RF models revealed different mechanisms for the air pollutants. The Within-city Migration Index (WMI) was more important in the normal, pre-lockdown, and post-pandemic models, while the Out-Migration Index (OMI) was emphasized in the lockdown model. In the COVID-19 lockdown period, 73.3% of the reduction can be attributed to decreased road traffic, showing the massive impact of road traffic on air quality. In the post-pandemic period, meteorology controlled about 42.2% of the decrease and emissions from industry and households controlled 40.0%, while road traffic contributed only 17.8%. It is suggested that priority for restrictions should be given to road traffic within the city; a limit of less than 40% on road traffic achieves a better effect, especially for cities with severe traffic pollution.
ARTICLE | doi:10.20944/preprints202307.1252.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: spatial multi-information fusion; random forest; metallogenic prediction; Central Kunlun; Xinjiang
Online: 19 July 2023 (03:08:11 CEST)
In recent years, how to combine intelligent prospecting algorithms such as random forest with large volumes of geological and mineral data for quantitative prediction in exploration geochemistry has become an important topic, with the goal of quantitatively improving the accuracy of target delineation. The ore-forming geological conditions in the central Kunlun area of Xinjiang are favorable and the prospecting prospects are good. However, owing to the exhaustion of shallow deposits and the lag in geological prospecting work over the past decade, there has been no expected breakthrough in the search for large and super-large metal deposits for many years, and reserve resources are in serious shortage. The use of new theories, methods, and technologies for mineral resource investigation and evaluation has become an urgent need in current prospecting work. In view of this, based on the existing spatial database of geological and mineral resources in central Kunlun, combined with the geological characteristics, genesis, and metallogenic regularity of the area, this paper carries out a series of studies on gold polymetallic minerals with the help of a geographic information system and a data science programming platform. We integrated geological and regional geochemical data and constructed random forest metallogenic discriminant models based on two different sampling methods (integrated random undersampling and selection of training samples) to predict the mineralization of gold polymetallic minerals in central Kunlun and to delineate metallogenic target areas.
The quantitative predictions of the two random forest models for gold polymetallic mineral resources in central Kunlun are compared and discussed: known ore spots, fault structures, and geochemical information are extracted, and the known gold polymetallic ore spots and geochemical data form the training and prediction sets for the machine-learning random forest models. The prediction evaluation and metallogenic prospect division show that, between the two sampling methods, the performance parameters of the training process indicate higher prediction accuracy for the selected training samples, which are more reliable because they can fully learn the complex information in the original data. In metallogenic prospect prediction and potential division, the random forest model built on selected training samples offers more reference value and greater significance for further exploration when actual exploration cost is considered, because its high-potential prediction area is small and its proportion of ore occurrences per unit area is high. At the same time, this study improves prediction accuracy, reduces exploration risk, and extends the application of machine learning algorithms in mathematical geology to the central Kunlun area of Xinjiang. The delineated metallogenic potential areas provide positive guidance for actual gold polymetallic prospecting work in this area.
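The random-undersampling variant described above, balancing the rare known ore spots against an equal-sized random draw of barren cells before training, can be sketched as follows. The geochemical features, class ratio, and toy anomaly signal are all synthetic illustrations:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
# Toy prospectivity data: few known ore occurrences among many barren cells.
n = 600
y = (rng.random(n) < 0.05).astype(int)           # 1 = known ore spot
X = rng.normal(size=(n, 4)) + y[:, None] * 1.5   # geochemical anomaly proxy

# Random undersampling: keep every positive and an equal-sized random
# draw of negatives, then train the random forest on the balanced subset.
pos = np.where(y == 1)[0]
neg = rng.choice(np.where(y == 0)[0], size=len(pos), replace=False)
balanced = np.concatenate([pos, neg])
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X[balanced], y[balanced])
prospectivity = rf.predict_proba(X)[:, 1]  # per-cell ore probability map
```

Thresholding `prospectivity` over the full grid is what yields the delineated high-potential target areas.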
ARTICLE | doi:10.20944/preprints202307.0841.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: land cover; sentinel-2 images; random forest; boreal forest; alpine tundra
Online: 12 July 2023 (13:39:19 CEST)
A land cover map of two arctic catchments near the Abisko Scientific Research Station was obtained from a classification of a Sentinel-2 satellite image and a ground survey performed in July 2022. The two contiguous catchments, Miellajokka and Stordalen, are covered by various ecotypes, from boreal forest to alpine tundra and peatland. The random forest algorithm correctly identified 88% of the polygon pixels reserved for testing. The developed workflow relied solely on open-source software and the acquired ground observations. Spatial organization was driven by altitude, as demonstrated by intersecting the land cover with the topography. Comparison of this new land cover map with previous ones, based on data acquired between 2008 and 2011, shows some trends in vegetation cover evolution in response to climate change in the area.
ARTICLE | doi:10.20944/preprints202306.1742.v1
Subject: Engineering, Civil Engineering Keywords: Machine learning; ground vibration; on-site experiment; random forest; Bayesian optimization; elevated high-speed railway
Online: 26 June 2023 (05:11:33 CEST)
Aiming at the prediction of environmental vibrations induced by elevated high-speed railways, a machine-learning method is developed by combining the random forest algorithm with Bayesian optimization, using a dataset from on-site experiments. For rapid and effective prediction of environmental vibration, there has been little research comparing and verifying different algorithms, or on parameter tuning and optimization of machine learning algorithms. In this paper, a field experiment is first carried out to measure the ground vibrations caused by high-speed trains running on a bridge, and the environmental vibration characteristics are analyzed in terms of ground accelerations and weighted vibration levels. Subsequently, three machine-learning algorithms, linear regression, support vector machine, and random forest, are developed using the experimental database, and their prediction performance is discussed. Finally, two optimization models for the hyperparameter set of the random forest algorithm are compared. It turns out that the random forest algorithm predicts environmental vibration more accurately than linear regression and the support vector machine; Bayesian optimization achieves efficient, in-depth parameter tuning and can be combined with the RF machine-learning algorithm to effectively predict the environmental vibrations induced by high-speed railways.
ARTICLE | doi:10.20944/preprints202306.0123.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Sentinel-2 multispectral data; Maize lodging; Random Forest classification; Predictive variables; Model generalizability
Online: 2 June 2023 (04:08:42 CEST)
Lodging is a common problem in maize production that seriously impacts yield, quality, and the capacity for mechanical harvesting. Evaluation of site-specific lodging risks requires the establishment of a method for multi-year monitoring. In this study, spectral images collected by the Sentinel-2 satellite were processed to obtain three types of data: gray-level co-occurrence matrix texture (GLCM), vegetation indices (VIs), and spectral reflectance (SR). Lodging classification models were then established with Random Forest (RF) using each of the three data types separately (the GLCM, VI, and SR models) and in combination (the SR+VI, SR+GLCM, VI+GLCM, and SR+VI+GLCM models). By gradually removing features with low importance scores from the SR+VI+GLCM model and analyzing the changes in overall accuracy (OA), the optimal set of predictive variables was identified and used to construct the optimal model. A model built using data from a single timepoint in 2021 was tested on data collected at a similar timepoint in 2019 and vice versa to assess interannual model generalizability. The results demonstrate that, among models constructed with a single feature type, the GLCM model had significantly lower accuracy than the VI and SR models. During certain growth stages, models constructed with combined features had significantly higher accuracy in monitoring maize lodging than models constructed with a single feature type. During the selection of optimal predictive variables, it was found that model accuracy did not increase with the number of predictive variables. The positive and negative validation models had accuracies of 96.55% and 95.18%, with kappa values of 0.93 and 0.83, respectively, indicating that the model has strong generality between years for the same reproductive stage.
This study provides a detailed method for large-scale maize lodging monitoring, allowing for identification of optimal planting practices to reduce the probability of lodging and ultimately improving regional maize yield and quality.
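The importance-based pruning loop described in this abstract can be sketched with scikit-learn. This is a generic illustration, not the authors' code: the data, feature count, and accuracy target are synthetic stand-ins for the Sentinel-2 SR/VI/GLCM features.

```python
# Sketch of backward feature elimination by Random Forest importance score,
# tracking overall accuracy (OA) at each step. Synthetic data: two informative
# "spectral" features plus four noise features (hypothetical stand-ins).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
X_inf = rng.normal(size=(n, 2))
y = (X_inf[:, 0] + X_inf[:, 1] > 0).astype(int)   # lodged vs. not lodged
X = np.hstack([X_inf, rng.normal(size=(n, 4))])
features = list(range(X.shape[1]))

history = []
while len(features) > 1:
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    oa = cross_val_score(rf, X[:, features], y, cv=3).mean()  # overall accuracy
    rf.fit(X[:, features], y)
    history.append((list(features), oa))
    # Drop the feature with the lowest importance score and repeat.
    features.pop(int(np.argmin(rf.feature_importances_)))

best_features, best_oa = max(history, key=lambda t: t[1])
print(best_features, round(best_oa, 3))
```

As in the study, accuracy does not keep rising as more predictors are added; the loop surfaces the subset where OA peaks.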
ARTICLE | doi:10.3390/sci2040061
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: industry4.0; fault detection; fault diagnosis; random forest; diagnostic graph; distributed diagnosis; model-based; data-driven; hybrid approach; hydraulic test rig
Online: 24 September 2020 (00:00:00 CEST)
In this work, a hybrid component Fault Detection and Diagnosis (FDD) approach for industrial sensor systems is established and analyzed, to provide a hybrid schema that combines the advantages and eliminates the drawbacks of both model-based and data-driven methods of diagnosis. Moreover, it sheds light on a new use of Random Forest (RF) together with model-based diagnosis, beyond its ordinary data-driven application. RF is trained and hyperparameter-tuned using three-fold cross-validation over a random grid of parameters using random search, to finally generate diagnostic graphs as the dynamic, data-driven part of this system. This is followed by translating those graphs into model-based rules in the form of if-else statements, SQL queries, or semantic queries such as SPARQL, in order to feed the dynamic rules into a structured model essential for further diagnosis. The RF hyperparameters are consistently updated online using newly generated sensor data to maintain the dynamicity and accuracy of the generated graphs and rules. The architecture of the proposed method is demonstrated in a comprehensive manner, and the dynamic rule extraction phase is applied in a case study on condition monitoring of a hydraulic test rig using time-series multivariate sensor readings.
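The rule-extraction step described above, turning a learned tree structure into human-readable if-else rules, can be sketched as follows. This is a generic illustration on a single decision tree, not the paper's RF diagnostic graphs; the sensor names and toy readings are hypothetical.

```python
# Sketch of translating a trained tree into if-else rules, in the spirit of
# the rule-extraction phase above. Toy data: [pressure, temperature] readings
# with label 1 = faulty component (hypothetical values).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[2.0, 30.0], [2.1, 32.0], [5.0, 31.0], [5.2, 80.0], [5.1, 79.0], [2.2, 81.0]]
y = [0, 0, 1, 1, 1, 0]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text renders each branch as a nested if-else condition chain,
# which could then be rewritten as SQL or SPARQL filters.
rules = export_text(tree, feature_names=["pressure", "temperature"])
print(rules)
```

The printed branches map directly onto structured queries (e.g. `WHERE pressure > …`), which is the bridge from the data-driven to the model-based side of such a hybrid scheme.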
ARTICLE | doi:10.20944/preprints202007.0548.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: industry4.0; fault detection; fault diagnosis; random forest; diagnostic graph; distributed diagnosis; model-based; data-driven; hybrid approach; hydraulic test rig
Online: 23 July 2020 (11:26:41 CEST)
In this work, a hybrid component Fault Detection and Diagnosis (FDD) approach for industrial sensor systems is established and analyzed, to provide a hybrid schema that combines the advantages and eliminates the drawbacks of both model-based and data-driven methods of diagnosis. Moreover, it sheds light on a new use of Random Forest (RF) together with model-based diagnosis, beyond its ordinary data-driven application. RF is trained and hyperparameter-tuned using 3-fold cross-validation over a random grid of parameters using random search, to finally generate diagnostic graphs as the dynamic, data-driven part of this system. This is followed by translating those graphs into model-based rules in the form of if-else statements, SQL queries, or semantic queries such as SPARQL, in order to feed the dynamic rules into a structured model essential for further diagnosis. The RF hyperparameters are consistently updated online using the newly generated sensor data, in order to maintain the dynamicity and accuracy of the generated graphs and rules. The architecture of the proposed method is demonstrated in a comprehensive manner, and the dynamic rule extraction phase is applied in a case study on condition monitoring of a hydraulic test rig using time-series multivariate sensor readings.
ARTICLE | doi:10.20944/preprints201806.0188.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: minimum noise fraction (MNF) transformation; object-based image analysis (OBIA); APEX hyperspectral imagery; Random forest (RF) classifier; multiresolution segmentation (MRS); tree species classification
Online: 12 June 2018 (10:55:07 CEST)
Tree species composition is a key element for biodiversity and sustainable forest management, and hyperspectral data provide detailed spectral information that can be used for tree species classification. There are two main challenges in using hyperspectral imagery: a) the Hughes phenomenon, whereby increasing the number of bands increases the number of required classification samples exponentially, and b) in a more complex environment, such as a riparian mixed forest, spectral variability per pixel may not be adequate to distinguish tree species. Therefore, the focus of this study is to assess spectral-spatial dimensionality reduction of airborne hyperspectral imagery using minimum noise fraction (MNF) transformation and object-based image analysis (OBIA). Airborne Prism Experiment (APEX) hyperspectral imagery was used. The study area was a riparian mixed forest located along the Salzach river, and six tree species, including Picea abies, Populus (canadensis and balsamifera), Fraxinus excelsior, Alnus incana, and Salix alba, were selected. The machine learning algorithm random forest (RF) was used to train and apply a prediction model for classification. Using the spectrally dimensionality-reduced APEX imagery, a pixel-level classification was also performed. According to the confusion matrix, the object-level classification of MNF-derived components achieved an overall accuracy of 85% and a kappa coefficient of 0.805. Producer's accuracy per class varied from 80% for Fraxinus excelsior, Alnus incana, and Populus canadensis to 90% for Salix alba and Picea abies. Comparing the results to the pixel-level classification showed a better performance of object-level classification (the pixel-level classification achieved an overall accuracy of 63% and a kappa coefficient of 0.559). Per-class producer's accuracy for the pixel-based classification varied from 45% for Alnus incana to 80% for Picea abies.
In general, spectral-spatial complexity reduction using MNF transformation and object-level classification yielded statistically satisfactory results.
ARTICLE | doi:10.20944/preprints202306.1169.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Entropy accumulation; Random Number Generator; Quantum random noises
Online: 16 June 2023 (04:32:21 CEST)
The efficient generation of high-quality random numbers is essential to the operation of cryptographic modules. The quality of a random number generator is evaluated by the min-entropy of its entropy source. A typical method for achieving high min-entropy in the output sequence is entropy accumulation based on a hash function, grounded in the famous Leftover Hash Lemma, which guarantees a lower bound on the min-entropy of the output sequence. However, hash-function-based entropy accumulation is generally slow. From a practical perspective, we need a new, efficient entropy accumulation scheme with theoretical backing for the min-entropy of the output sequence. In this work, we obtain a theoretical bound for the min-entropy of the output random sequence under a very efficient entropy accumulation using only bitwise XOR operations, where the input sequences from the entropy source are independent. Moreover, we examine our theoretical results by applying them to a quantum random number generator that uses dark noise arising from image sensor pixels as its entropy source.
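The effect of XOR accumulation on independent inputs can be illustrated numerically. This is a generic sketch of the classical piling-up behavior, not the paper's specific bound: XORing n independent biased bits drives the output bias toward zero, so the min-entropy per output bit approaches the ideal 1 bit.

```python
import math

def xor_min_entropy(p_one: float, n: int) -> float:
    """Min-entropy (bits) of the XOR of n independent bits, each 1 with
    probability p_one. By the piling-up lemma,
    Pr[XOR = 1] = (1 - (1 - 2*p_one)**n) / 2.
    """
    p = (1 - (1 - 2 * p_one) ** n) / 2
    return -math.log2(max(p, 1 - p))

# A biased source (60% ones): accumulating more independent samples via XOR
# pushes the per-bit min-entropy toward 1.
for n in (1, 2, 4, 8):
    print(n, round(xor_min_entropy(0.6, n), 6))
```

Hash-based accumulation gives a comparable guarantee via the Leftover Hash Lemma, but the XOR route costs only one machine instruction per word, which is the efficiency argument the abstract makes.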
ARTICLE | doi:10.20944/preprints201809.0195.v1
Subject: Computer Science And Mathematics, Analysis Keywords: random fixed point, random $\alpha-$admissible with respect to $\eta$, generalized random $\alpha-\psi-$contractive mapping.
Online: 11 September 2018 (11:52:25 CEST)
In this paper, we prove some random fixed point theorems for generalized random $\alpha-\psi-$contractive mappings in a Polish space and, as applications, we show the existence of random solutions of a second-order random differential equation.
ARTICLE | doi:10.20944/preprints201905.0036.v3
Online: 4 June 2019 (11:12:53 CEST)
We speak of randomness when no pattern can be determined in the observed outcomes. A computer follows a sequence of fixed instructions to produce any of its output, hence the difficulty of choosing numbers randomly by algorithmic means. However, some algorithms based on mathematical formulas, like the linear congruential algorithm and the lagged Fibonacci generator, appear to produce "true" random sequences to anyone who does not know the secret initial input. Up to now, we cannot rigorously answer the question of the randomness of prime numbers [2, page 1], and this highlights a connection between random number generators and the distribution of primes. From  and  one sees that it is quite naive to expect good random reproduction with prime numbers. We are, however, interested in the properties underlying the distribution of prime numbers, which emerge as sufficient or insufficient arguments to conclude a proof by contradiction which tends to show that prime numbers are not randomly distributed. To this end, we use the variation of the prime gap sequence. The algorithm that we produce makes it possible to deduce, in the case of a binary choice, uniform behavior in the individual consecutive occurrences of primes, and no trait of uniformity when the occurrences are taken collectively.
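The linear congruential algorithm mentioned above is only a fixed recurrence, so its entire output is determined by the seed; anyone holding the seed can reproduce the "random" sequence exactly. A minimal sketch (the constants are the well-known Numerical Recipes parameters, not tied to this paper):

```python
def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

g = lcg(seed=0)
sample = [next(g) for _ in range(3)]
print(sample)  # fully reproducible from the seed alone
```

This determinism is exactly why such sequences only "appear" random to someone who does not know the initial input.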
Subject: Computer Science And Mathematics, Discrete Mathematics And Combinatorics Keywords: discrete degenerate random variables; degenerate binomial random variable; degenerate Poisson random variable; new type degenerate Bell polynomials
Online: 15 November 2019 (16:43:03 CET)
In this paper, we introduce two discrete degenerate random variables, namely the degenerate binomial and degenerate Poisson random variables. We deduce the expectations of the degenerate binomial random variables. We compute the generating function of the moments of the degenerate Poisson random variables, which leads us to define the new type degenerate Bell polynomials, and hence obtain explicit expressions for the moments of those random variables in terms of such polynomials. We also get the variances of the degenerate Poisson random variables. Finally, we illustrate two examples of the degenerate Poisson random variables.
REVIEW | doi:10.20944/preprints202309.0093.v1
Subject: Medicine And Pharmacology, Other Keywords: blocking; hazard ratios; confidence intervals; generalizability; randomized controlled trials; random allocation; random sampling; random treatment assignment; stratification; transportability
Online: 4 September 2023 (03:22:18 CEST)
This article describes rationales and limitations for making inferences based on data from randomized controlled trials (RCTs). We argue that obtaining a representative random sample from a patient population is impossible for a clinical trial because patients are accrued sequentially over time and thus comprise a convenience sample, subject only to protocol entry criteria. Consequently, the trial’s sample is unlikely to represent a definable patient population. We use causal diagrams to illustrate the difference between random allocation of interventions within a clinical trial sample and true simple or stratified random sampling, as done in surveys. We argue that group-specific statistics, such as a median survival time estimate for a treatment arm in an RCT, have limited meaning as estimates of larger patient population parameters. In contrast, random allocation between interventions facilitates comparative causal inferences about between-treatment effects, such as hazard ratios or differences between probabilities of response. Comparative inferences also require the assumption of transportability from a clinical trial’s convenience sample to a targeted patient population. We focus on the consequences and limitations of randomization procedures in order to clarify the distinctions between pairs of complementary concepts of fundamental importance to data science and RCT interpretation. These include internal and external validity, generalizability and transportability, uncertainty and variability, representativeness and inclusiveness, blocking and stratification, relevance and robustness, forward and reverse causal inference, intention to treat and per protocol analyses, and potential outcomes and counterfactuals.
ARTICLE | doi:10.20944/preprints202002.0350.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Swift-Hohenberg equation; Random-pullback attractor; Non-autonomous random dynamical system
Online: 24 February 2020 (12:30:08 CET)
In this paper, we study the existence of the random pullback attractor of a non-autonomous locally modified stochastic Swift-Hohenberg equation with multiplicative noise in the Stratonovich sense. It is shown that a random pullback attractor exists when the external force has exponential growth. Due to the stochastic term, the estimates are delicate; we overcome this difficulty by using the Ornstein-Uhlenbeck (O-U) transformation and its properties.
ARTICLE | doi:10.20944/preprints202309.0879.v1
Subject: Computer Science And Mathematics, Security Systems Keywords: Random Number Generation; Cryptography
Online: 14 September 2023 (03:37:49 CEST)
In this paper we present approaches for generating random numbers along with potential applications. Rather than trying to provide extensive coverage of the many techniques and algorithms that have appeared in the scientific literature, we focus on some representative approaches, presenting their workings and properties in detail. Our goal is to delineate their strengths and weaknesses as well as their potential application domains, so that the reader can judge what would be the best approach for the application at hand, possibly a combination of the available approaches. For instance, a physical source of randomness can be used for the initial seed; suitable preprocessing can then enhance its randomness, and the output of the preprocessing can feed different types of generators, e.g. a linear congruential generator, a cryptographically secure one, and one based on the combination of one-way hash functions and shared-key cryptoalgorithms in various modes of operation. If desired, the outputs of the different generators can then be combined, giving the final random sequence. Moreover, we present a set of practical randomness tests which can be applied to the outputs of random number generators in order to assess their randomness characteristics. To demonstrate the importance of unpredictable random sequences, we present an application of cryptographically secure generators in domains where unpredictability is one of the major requirements, i.e. eLotteries and cryptographic key generation.
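As an example of the practical randomness tests mentioned above, the frequency (monobit) test from the NIST SP 800-22 suite checks whether ones and zeros are balanced. This is a generic implementation of that published test, not necessarily the paper's:

```python
import math

def monobit_p_value(bits) -> float:
    """NIST SP 800-22 frequency (monobit) test.

    Maps bits to +/-1, sums them, and returns a p-value; a small p-value
    (e.g. < 0.01) means the sequence is too unbalanced to pass as random.
    """
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)          # +1 for each 1, -1 for each 0
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

print(monobit_p_value([0, 1] * 500))   # perfectly balanced sequence
print(monobit_p_value([1] * 1000))     # all ones: fails decisively
```

Passing this test is necessary but far from sufficient; a full assessment combines many such tests, as the paper discusses.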
ARTICLE | doi:10.20944/preprints201905.0165.v2
Online: 18 June 2019 (11:15:56 CEST)
Background: As the opioid epidemic continues, understanding its geospatial, temporal, and demand patterns is important for policymakers to assign resources and interdict individual-, organization-, and country-level bad actors. Methods: GIS geospatial-temporal analysis and extreme-gradient-boosted random forests evaluate ICD-10 F11 opioid-related admissions and admission rates using geospatial analysis, demand analysis, and explanatory models, respectively. The period of analysis was January 2016 through September 2018. Results: The analysis shows existing high opioid admissions in Chicago and New Jersey, with emerging areas in Atlanta, Salt Lake City, Phoenix, and Las Vegas. High rates of admission (claims per 10,000 population) exist in the Appalachian area and on the Northeastern seaboard. Explanatory models suggest that hospital overall workload and financial variables might be used for allocating opioid-related treatment funds effectively. Gradient-boosted random forest models accounted for 87.8% of the variability of claims on blinded 20% test data. Conclusions: Based on the GIS analysis, opioid admissions appear to have spread geographically, while higher admission rates are still found in some regions, and the epidemic is likely to continue to spread or diffuse through the country. Interdiction efforts require demand analysis, such as that provided in this study, to allocate scarce resources for supply-side and demand-side interdiction: prevention, treatment, and enforcement.
ARTICLE | doi:10.20944/preprints202308.1257.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Blockchain; Mobility; Random Selection; Encoding; Token
Online: 17 August 2023 (09:55:13 CEST)
Various factors, such as the evolution of IoT services based on sensing from vehicle parts and advances in small communication devices, have driven the mass adoption of mobility services provided to users with limited resources. In particular, business models are moving away from one-off payments toward recurring payments, as represented by shared services using kick-boards or bicycles and by subscription services for vehicle software. These advances in shared mobility services call for solutions that can enhance the reliability of data aggregated from users of next-generation mobility services. However, the mining process that updates state, ensures continued network communication, and creates blocks demands high performance in a public blockchain. This paper proposes a random certifier-node selection mechanism for a blockchain network in which blocks are created by nodes holding tokens issued for block creation, and in which only specific nodes, selected by encrypting the token information, can acquire a token.
ARTICLE | doi:10.20944/preprints202307.1517.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: random; vibrations; tillage; tools; complex; cultivator
Online: 21 July 2023 (11:39:13 CEST)
The article continues the exposition of results obtained in research on an agricultural soil-processing machine designed for research with applications including exploitation. The MCLS complex cultivator was designed for research into the working processes of soil-processing tools. The MCLS cultivator is a modular machine (it can work at three working widths: 1, 2, and 4 m, with tractors of different powers) designed to use a wide range of working bodies. The experimental data obtained with the 1 m working-width configuration, and the results of processing them within the theory of random vibrations, are presented in this article. The experimental results are analysed as random vibrations of the supports of the active body. As a result, the main characteristics of the random vibrations are exposed: the distribution function, the average value, the autocorrelation, and the frequency spectrum. These general results regarding random vibrations are used for several critical applications in the design, execution, and exploitation of subassemblies and assemblies of agricultural machines of this type. The main applications are: estimating the probability of the occurrence of dangerous load peaks; counting and selecting the load peaks that produce fatigue accumulation in the material of the supports of the working bodies; identifying design deficiencies or defects in the working regime; and estimating the effects of vibrations on the quality of soil processing. All of these outcomes comprise applications in MCLS research and exploitation. The applications pursue well-known objectives in modeling the working processes of agricultural machines: safety at work, increasing the quality of work, optimizing energy consumption, and increasing productivity, all in a broad context aimed at obtaining a workable compromise.
ARTICLE | doi:10.20944/preprints202305.1425.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Haemoglobin; Anaemia; Dietary, Diversity; Random-effect
Online: 19 May 2023 (10:06:34 CEST)
Anaemia is the most prevalent nutritional deficiency in the world and is associated with long-lasting developmental effects in children. Anaemia weakens the immune system and impairs the cognitive development of children. Anaemia is multifactorial; therefore, measures to prevent and control it should be evidence-based. This paper aimed to investigate the prevalence and correlates of anaemia in preschool-aged children at the individual, maternal/household, and community levels in the Democratic Republic of Congo (DRC). Retrospective, nationally representative cross-sectional data from the 2013-2014 DRC Demographic and Health Survey (DHS) were used. Three-level random intercept logistic regression models were fitted to the data, with anaemia in children (defined as a haemoglobin concentration below 11 g/dl) as the outcome and potential risk factors grouped at the individual, maternal/household, and community levels. Anaemia in children is a severe public health issue in the DRC: 63% of preschool-aged children are anaemic. Anaemia is highly prevalent among males, children with infections (fever/malaria and intestinal parasites), children whose mothers are anaemic, children from the poorest households, children whose source of drinking water is unclean, and children who reside in provinces with recent or previous armed attacks. The results of this paper highlight the need for a clean and safe environment for children's growth.
ARTICLE | doi:10.20944/preprints202102.0492.v3
Subject: Social Sciences, Geography, Planning And Development Keywords: LMIC; Global South; indicator; Random Forest
Online: 1 April 2022 (06:22:53 CEST)
Disaggregated population counts are needed to calculate health, economic, and development indicators in Low- and Middle-Income Countries (LMICs), especially in settings of rapid urbanisation. Censuses are often outdated and inaccurate in LMIC settings, and rarely disaggregated at fine geographic scale. Modelled gridded population datasets derived from census data have become widely used by development researchers and practitioners. These datasets are evaluated for accuracy at the spatial scale of the input data, which is often much coarser (e.g. administrative units) than the neighbourhood- or cell-level scale of many applications. We simulate a realistic "true" 2016 population in Khomas, Namibia, a majority urban region, and introduce realistic levels of outdatedness (over 15 years) and inaccuracy in slum, non-slum, and rural areas. We aggregate these simulated realistic populations by census and administrative boundaries (to mimic census data), and generate 32 gridded population datasets that are typical of a LMIC setting using the WorldPop-Global-Unconstrained gridded population approach. We evaluate the cell-level accuracy of these simulated datasets using the original "true" population as a reference. In our simulation, we found large cell-level errors, particularly in slum cells, driven by the use of average population densities in large areal units to determine cell-level population densities. Age, accuracy, and aggregation of the input data also played a role in these errors. We suggest incorporating finer-scale training data into gridded population models generally, and WorldPop-Global-Unconstrained in particular (e.g., from routine household surveys or slum community population counts), and using new building footprint datasets as a covariate to improve cell-level accuracy.
It is important to measure accuracy of gridded population datasets at spatial scales more consistent with how the data are being applied, especially if they are to be used for monitoring key development indicators at neighbourhood scales with relevance to small dense deprived areas within larger administrative units.
ARTICLE | doi:10.20944/preprints201805.0302.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: graph entropy; chromatic classes; random graphs
Online: 22 May 2018 (11:59:26 CEST)
Combinatoric measures of entropy capture the complexity of a graph, but rely upon the calculation of its independent sets, or collections of non-adjacent vertices. This decomposition of the vertex set is a known NP-Complete problem, and for most real-world graphs it is an inaccessible calculation. Recent work by Dehmer et al. and Tee et al. identified a number of alternative vertex-level measures of entropy that do not suffer from this pathological computational complexity, and it can be demonstrated that they are still effective at quantifying graph complexity. It is intriguing to consider whether there is a fundamental link between local and global entropy measures. In this paper, we investigate the existence of correlation between vertex-level and global measures of entropy for a narrow subset of random graphs. We use the greedy algorithm approximation for calculating the chromatic information and therefore Körner entropy. We are able to demonstrate close correlation for this subset of graphs and outline how this may arise theoretically.
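The greedy approximation mentioned above can be sketched as follows: greedily color the graph, then take the entropy of the resulting color-class size distribution. This is a simplified, generic stand-in for the chromatic-information computation, not the authors' implementation.

```python
import math

def greedy_coloring(adj):
    """Assign each vertex the smallest color unused by its already-colored
    neighbors (greedy approximation to a proper coloring)."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        colors[v] = next(c for c in range(len(adj)) if c not in used)
    return colors

def coloring_entropy(colors):
    """Shannon entropy (bits) of the color-class size distribution, an
    upper-bound proxy for chromatic information."""
    n = len(colors)
    sizes = {}
    for c in colors.values():
        sizes[c] = sizes.get(c, 0) + 1
    return -sum((s / n) * math.log2(s / n) for s in sizes.values())

# Path graph 0-1-2-3: greedy 2-coloring with equal classes gives 1 bit.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(coloring_entropy(greedy_coloring(path)))
```

Because greedy coloring may use more colors than the chromatic number, this yields an approximation from above, which is why the paper treats it as an estimate of the Körner entropy rather than an exact value.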
REVIEW | doi:10.20944/preprints202304.0755.v1
Subject: Business, Economics And Management, Finance Keywords: option pricing; fuzzy-random variables; fuzzy numbers; fuzzy-random option pricing; Vasicek’s model of term structure
Online: 23 April 2023 (03:58:33 CEST)
This paper has a twofold objective. The first is to present a comprehensive bibliographical analysis of journal articles and book chapters on fuzzy-random option pricing (FROP) in the WoS and SCOPUS databases. It follows the PRISMA criteria and pays special attention to developments in continuous time. Thus, we describe the principal findings about the research streams, outlets, and authors on this topic, which allows us to suggest further research. The second contribution is motivated by the fact that the bibliographical revision identified a lack of developments in equilibrium models of the yield curve. This motivates extending Vasicek's yield curve equilibrium model by introducing fuzziness in the parameters that govern interest rate movements (speed of reversion, equilibrium short-term interest rate, and volatility). Likewise, this paper develops an empirical application on the term structure of fixed-income public bonds with the highest credit rating in the Euro area.
ARTICLE | doi:10.20944/preprints202309.0607.v1
Subject: Environmental And Earth Sciences, Pollution Keywords: ammonia; emission modelling; emission inventory; random forest
Online: 11 September 2023 (05:26:24 CEST)
Ammonia is an atmospheric pollutant, predominantly emitted from agriculture, leading to acidification and eutrophication of soil and water and contributing to secondary PM2.5. The implementation of accurate emission inventories with high spatial and temporal resolution plays a fundamental role in the development of air quality modelling and in the impact assessment of actions for air quality improvement. The development and release of new algorithms and the increase in data availability are supporting the adoption of machine learning approaches in environmental and air quality data analysis. In this paper we present a methodology that applies the Random Forest algorithm to bottom-up local emission inventories of ammonia to validate annual time series of ammonia emissions and to calculate high-resolution temporal profiles. The model was trained and tested on hourly measurements of ammonia concentrations and atmospheric turbulence parameters, starting from a constant emission scenario. The initial emission values are calculated from a bottom-up emission inventory detailed at the municipal level, considering a circular area of about 4 km radius centered on each measurement site. By comparing predicted and measured concentrations, the emissions are modified and the model's training and testing are repeated, so that the model converges to very high performance in predicting ammonia concentrations and establishes an hourly time-varying emission profile. The site-specific emission profiles estimated by the proposed methodology clearly show a nonlinear relation with measured concentrations and allow us to identify the effect of atmospheric turbulence on pollutant accumulation. The estimated time series agree well with the available emission inventory data, and the monthly emission profiles have been compared with satellite-derived estimates.
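The iterative correction loop described above, starting from a constant emission scenario and rescaling toward the measurements, can be illustrated with a deliberately simplified numerical sketch. A toy linear dispersion rule (concentration = emission x turbulence factor) stands in for the Random Forest model, and all numbers are hypothetical.

```python
# Toy sketch of the calibration loop: begin with a constant hourly emission
# scenario, compare predicted and measured concentrations, and apply a damped
# multiplicative correction until the hourly profile converges.
turbulence = [0.5, 0.8, 1.2, 1.0, 0.6, 0.9]   # hourly dilution factors (toy)
measured   = [1.0, 2.4, 6.0, 4.0, 1.5, 3.6]   # measured NH3 concentrations (toy)

emissions = [1.0] * len(measured)              # constant starting scenario
for _ in range(20):
    predicted = [e * k for e, k in zip(emissions, turbulence)]
    # Damped update: move emissions part-way toward the measured/predicted ratio.
    emissions = [e * (m / p) ** 0.5
                 for e, m, p in zip(emissions, measured, predicted)]

profile = [round(e, 3) for e in emissions]
print(profile)  # converged hourly emission profile
```

In the paper, the update is mediated by retraining the RF model at each pass; the sketch only conveys the converge-by-comparison structure of the procedure.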
ARTICLE | doi:10.20944/preprints202301.0557.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: PDF; Malware; Machine Learning; Python; Random Forest
Online: 30 January 2023 (12:55:47 CET)
Portable Document Format (PDF) is one of the most widely used file types worldwide for data exchange. This has encouraged hackers to use PDF files to spread malicious content through a variety of methods and techniques. Security researchers, in turn, keep trying to improve detection methods to keep up with the rapidly increasing number of new malware samples appearing daily. One commonly used detection technique today applies artificial intelligence and machine learning classification to help detect PDF malware. In this paper, we apply the Random Forest machine learning classifier to a newly released PDF malware dataset, CIC-Evasive-PDFMal2022, with the main goal of detecting malicious PDF documents. The results show a detection accuracy of around 99.5%.
ARTICLE | doi:10.20944/preprints202208.0050.v1
Subject: Physical Sciences, Applied Physics Keywords: quorum sensing; resistance random network; complex networks
Online: 2 August 2022 (08:21:25 CEST)
We propose a model for bacterial Quorum Sensing based on an auxiliary electrostatic-like interaction originating from a fictitious electrical charge that represents bacteria activity. A cooperative mechanism for charge/activity exchange is introduced to implement chemotaxis and replication. The bacteria system is thus represented by means of a complex resistor network where link resistances take into account the allowed activity-flow among individuals. By explicit spatial stochastic simulations, we show that the model exhibits different quasi-realistic behaviors from colony formation to biofilm aggregation. The electrical signal associated with Quorum Sensing is analyzed in space and time and provides useful information about the colony dynamics. In particular, we analyze the transition between the planktonic and the colony phases as the intensity of Quorum Sensing is varied.
ARTICLE | doi:10.20944/preprints202205.0023.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Random Triangle; Quasiorthogonal Dimension; Combinatorics; Computational Problems
Online: 5 May 2022 (07:58:23 CEST)
In this work we study the following problem from a computational point of view: if three points are selected in the unit square at random, what is the probability that the triangle obtained is obtuse, acute, or right? We provide two convergent strategies: the first derived from the ideas introduced in , and the second built on combinatorial theory. The combined use of these two methods allows us to address random triangle theory from a new perspective and, we hope, to work out a general method for dealing with some classes of computational problems.
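The question can also be probed empirically with a plain Monte Carlo sketch, offered here as a baseline rather than either of the paper's two strategies: sample three uniform points and classify the triangle by the sign of the dot product at each vertex.

```python
import random

def classify(a, b, c):
    """Classify triangle abc as 'acute', 'right', or 'obtuse' by checking
    the dot product of the two edge vectors at each vertex."""
    dots = []
    for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
        u = (q[0] - p[0], q[1] - p[1])
        v = (r[0] - p[0], r[1] - p[1])
        dots.append(u[0] * v[0] + u[1] * v[1])
    if any(d < 0 for d in dots):
        return "obtuse"
    if any(d == 0 for d in dots):
        return "right"   # a measure-zero event for continuous samples
    return "acute"

def rand_point():
    return (random.random(), random.random())

random.seed(1)
n = 200_000
counts = {"acute": 0, "right": 0, "obtuse": 0}
for _ in range(n):
    counts[classify(rand_point(), rand_point(), rand_point())] += 1
print({k: v / n for k, v in counts.items()})
```

The obtuse fraction settles near 0.725 (the known value for uniform points in a square is about 0.7252), while exact right triangles essentially never occur in floating point, consistent with their probability being zero.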
ARTICLE | doi:10.20944/preprints202202.0175.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: antimicrobial peptide prediction; sequence analysis; random forest
Online: 14 February 2022 (11:57:01 CET)
Antimicrobial peptides (AMPs) are considered promising alternatives to conventional antibiotics for overcoming the growing problem of antibiotic resistance. Computational prediction approaches are receiving increasing interest for identifying and designing the best candidate AMPs prior to in vitro tests. In this study, we focused on linear cationic peptides with non-hemolytic activity, downloaded from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). Referring to the MIC (minimum inhibitory concentration) values, we assigned a positive label to a peptide if it shows antimicrobial activity; otherwise the peptide is labeled as negative. We focused on peptides showing antimicrobial activity against Gram-negative and Gram-positive bacteria separately, and created two datasets accordingly. Ten different physico-chemical properties of the peptides were calculated and used as features. Following data exploration and preprocessing, a variety of classification algorithms were used with 100-fold Monte Carlo cross-validation to build models and predict the antimicrobial activity of the peptides. Among the generated models, Random Forest gave the best performance metrics for both the Gram-negative dataset (accuracy: 0.98, recall: 0.99, specificity: 0.97, precision: 0.97, AUC: 0.99, F1: 0.98) and the Gram-positive dataset (accuracy: 0.95, recall: 0.95, specificity: 0.95, precision: 0.90, AUC: 0.97, F1: 0.92) after outlier elimination was applied. This prediction approach might be useful for evaluating the antibacterial potential of a candidate peptide sequence before moving to experimental studies.
ARTICLE | doi:10.20944/preprints202102.0498.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: proximal hyperspectral sensing; precision agriculture; random forest
Online: 22 February 2021 (17:20:41 CET)
One strategy to reduce qualitative and quantitative crop-yield losses is early, accurate detection of insect damage in plants. Remote sensing systems such as hyperspectral proximal sensors are a promising tool for managing crops, and machine learning predictions combined with clustering techniques are an interesting approach here, mainly because of their robustness on high-dimensional data. In this paper, we model the spectral response of insect-herbivory damage in maize plants and propose an approach based on machine learning and a clustering method to predict, from leaf reflectance measurements, whether a plant is herbivore-attacked. We differentiate insect-damage types based on the spectral response and indicate the wavelengths that contribute most to this discrimination. For this, we used a maize experiment under semi-field conditions. The maize plants were subjected to three treatments: control (healthy plants), Spodoptera frugiperda herbivory damage, and Dichelops melacanthus herbivory damage. The leaf spectral response of all plants (control and herbivory-exposed) was measured with a FieldSpec 3.0 spectroradiometer from 350 to 2500 nm for eight consecutive days. We evaluated the performance of different learners, namely random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), and neural networks (MLP), and measured the impact of a day-by-day analysis on the predictions. We propose a novel framework with a ranking strategy based on prediction accuracy and a clustering method based on a self-organizing map (SOM) to identify important regions in the reflectance measurements. Our results indicated that the RF-based framework is the overall best learner for this type of data. After the 5th day of analysis, the accuracy of the algorithm improved substantially.
It separated the three treatments into different groups with F-measures of 0.967, 0.917, and 0.881, respectively. We also verified that the most contributive spectral regions are situated in the near-infrared domain. We conclude that the proposed machine learning approach is adequate to monitor herbivory damage of S. frugiperda and stink bugs such as Dichelops melacanthus in maize, differentiating the types of insect attack early on. We also demonstrate that the proposed framework for analyzing the most contributive wavelengths is suitable for highlighting spectral regions of interest.
Subject: Medicine And Pharmacology, Pharmacology And Toxicology Keywords: Cannabis; Metabolite; Principal Component Analysis; Random Forest
Online: 5 September 2020 (07:51:50 CEST)
The many strains of Cannabis spp. are associated with a wide range of effects on users and contain many different potentially psychoactive metabolites, but the links between metabolite profiles and user effects are unclear. Here we take a statistical approach to linking cause (i.e., metabolites) to effect in Cannabis spp. through the prism of strains, using quantitative data on metabolite composition and user effects. We find that species (indica vs. sativa) explains <2% of the variability in metabolite profiles, while strain explains one third, indicating that species is not indicative of metabolite composition while strain is approximately indicative. Using random forests, we generate a table of potential metabolite-effect links. We also find that effect-weighted metabolite composition can effectively be described in terms of four values representing the concentrations of pairs or triplets of particular compounds.
ARTICLE | doi:10.20944/preprints202008.0132.v1
Subject: Environmental And Earth Sciences, Soil Science Keywords: reinforced soil; hexapods; layered inclusion; random inclusion
Online: 5 August 2020 (10:51:29 CEST)
Henry Vidal first introduced the concept of using strips, grids, and sheets to reinforce soil masses. Since then, a wide variety of materials, such as steel bars, tire shreds, polypropylene, polyester, glass fibers, coir, and jute fibers, has been added to soil masses either randomly or in a regular, oriented manner. In this investigation, a new concept of multi-oriented plastic reinforcement (hexapods) is discussed. Systematic and comprehensive laboratory tests were conducted on unreinforced and reinforced soil samples. Direct shear and California bearing ratio (CBR) tests were carried out on samples consisting of soil alone, soil with random inclusion of hexapods, and soil with layered inclusion of hexapods. The direct shear test results show that the cohesion value of both soil samples increased and the angle of internal friction decreased after reinforcement with inclusions in both the random and layered configurations. The CBR test indicates that, for the same compactive effort, both random and layered inclusions of hexapods improve strength and stiffness. Random inclusions of hexapods give better resistance to penetration than layered inclusions. The hexapods also changed the brittle behavior of unreinforced sand samples to ductile behavior.
ARTICLE | doi:10.20944/preprints201812.0250.v3
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Built-settlements; urban features; spatial growth; random forest; dasymetric modelling; population
Online: 9 October 2019 (10:48:20 CEST)
Mapping settlement extents at an annual time step has a wide variety of applications in demography, public health, sustainable development, and many other fields. While more multitemporal urban feature and human settlement datasets have recently become available, issues remain in remotely-sensed imagery due to coverage, adverse atmospheric conditions, and the expense of producing such feature sets. These challenges make it difficult to increase temporal coverage while maintaining high spatial fidelity. Here we demonstrate an interpolative and flexible modeling framework for producing annual built-settlement extents. We combine random forest and spatio-temporal dasymetric modeling with open-source subnational data to produce annual 100 m x 100 m resolution binary settlement maps in four test countries of varying environmental and developmental contexts, for test periods of five-year gaps. We find that in most years, across all study areas, the model correctly identified between 85% and 99% of pixels that transition to built-settlement. Additionally, with few exceptions, the model substantially outperformed a model that gave every pixel an equal chance of transitioning to the category "built" in each year. This modelling framework shows strong promise for filling gaps in cross-sectional urban feature datasets derived from remotely-sensed imagery, provides a base upon which to create future built-settlement extent projections, and enables further exploration of the relationships between built area and population dynamics.
ARTICLE | doi:10.20944/preprints201804.0022.v1
Subject: Business, Economics And Management, Economics Keywords: cooperatives; membership heterogeneity; random forest; collective action
Online: 2 April 2018 (11:01:16 CEST)
The effects of cooperative membership heterogeneity on cooperative and collective action sustainability have been discussed previously. However, despite the importance of membership heterogeneity in recent theoretical frameworks, empirical examinations have been limited. We determine the effect of changes in cooperative member heterogeneity on cooperative sustainability and discuss changes in heterogeneity over time that can advance our understanding of long-term cooperative sustainability. This study uses USDA Agricultural Management Resource Survey data, coupled with USDA Rural Development cooperative financial data at the state level, to quantify the effects of member heterogeneity on the sustainability of U.S. farmer cooperatives. We use random forest regression to interpret the significance of heterogeneity for cooperative sustainability at an aggregate level. The findings of this empirical study narrowly reconcile the theoretical understanding of the emergence of intra-cooperative issues while providing consistent empirical evidence and expectations for the sustainability of cooperatives in the near term.
ARTICLE | doi:10.20944/preprints201611.0028.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: random survival forests; ependymoma; predictors; valproic acid
Online: 3 November 2016 (11:02:12 CET)
Ependymoma accounts for 8-10% of all pediatric brain tumors and is the third most common brain tumor in children. No robust molecular markers are yet in routine clinical use. Surgical resection and adjuvant radiotherapy cure approximately 40-70% of pediatric patients with ependymoma. In our centre, we have been using prophylactic valproic acid treatment for brain tumor patients. Initial observations indicated that valproate could have a beneficial effect on patient survival, and recent observations by other authors have shown that patients with glioblastoma benefited from treatment with valproic acid, a histone deacetylase inhibitor. We have used random survival forests, a novel ensemble survival modelling method, to study a single-center, small cohort of pediatric patients with ependymoma. This analysis confirmed extent of surgical resection and treatment with radiotherapy as independent predictors of overall survival. Treatment with valproic acid was also a predictor of higher survival in this cohort. These results highlight the potential usefulness of the random survival forest model in extracting information from retrospective data. More data are needed on the possible influence of histone deacetylase inhibition by valproic acid on the survival of patients with ependymoma.
ARTICLE | doi:10.20944/preprints202111.0078.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: GRaVN; machine learning; convolutional neural networks; CNN; raman spectroscopy; analogue missions; planetary science; random undersampling; random oversampling; CanMoon
Online: 3 November 2021 (09:24:38 CET)
During planetary exploration mission operations, one of the key responsibilities of the instrument teams is to determine data viability for subsequent analysis. During the 2019 CanMoon Lunar Sample Return Analogue Mission, the Lead Raman Specialist manually examined each spectrum to provide quality assurance/validation. This non-trivial process requires years of experience to complete accurately. Given the proven efficacy of Convolutional Neural Networks (CNNs) in classification tasks, and the increased use of automation and control loops on planetary space platforms for navigation and science targeting, an opportunity presents itself to approach this validation problem using CNNs. We present the Generalised Raman Validation Network (GRaVN), a neural network focused specifically on extracting the generalised structure of Raman spectra for quality assurance/validation. This work demonstrates the viability of using a CNN in validation activities for Raman spectroscopy. Using only two hidden layers, we developed a configuration that provided good accuracy on a manually curated dataset. This indicates that such a system could be useful as part of an autonomous control loop during planetary exploration activities.
ARTICLE | doi:10.20944/preprints202304.1222.v1
Subject: Biology And Life Sciences, Toxicology Keywords: aromatic compounds; CYP2E1; phenylalanine; molecular simulation; random forest
Online: 29 April 2023 (07:32:55 CEST)
The amino acid composition of an enzyme's active site influences its substrate selectivity. For CYP2E1, the role of PHE residues in forming effective orientations for its activity toward aromatic substrates remains unclear. In this study, molecular docking and molecular dynamics analyses were performed on the interactions between PHEs in the active site of human CYP2E1 and various aromatic compounds that are confirmed CYP2E1 substrates. The results indicated that the orientation of 1-methylpyrene (1-MP) in the active site was strongly modulated by the PHEs, with PHE478 contributing most significantly to the binding free energy. Furthermore, by building a random forest model, we investigated the relationship between 19 molecular descriptors of PCB congeners (from molecular docking, quantum mechanics, and physicochemical properties) and the established human CYP2E1-dependent mutagenicity of a series of polychlorinated biphenyls (PCBs), which are proven human carcinogens and endocrine disruptors. The presence of PHEs did not appear to significantly modify the electronic or structural features of each bound ligand (PCB); instead, the flexibility of the PHE conformations contributed substantially to the effective binding energy and orientation. We suppose that PHE residues adjust their own conformation to provide a suitable space for ligand binding and to form an orientation favorable for a biochemical reaction. This study provides insights into the role of PHEs in guiding the interactive adaptation of the active site of human CYP2E1 for the binding and metabolism of aromatic substrates.
ARTICLE | doi:10.20944/preprints202108.0248.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Random fields; warped Gaussian Process; Spatial field reconstruction
Online: 11 August 2021 (10:39:35 CEST)
A class of models for non-Gaussian spatial random fields is explored for spatial field reconstruction in environmental and sensor network monitoring. The family of models explored utilises the Tukey g-and-h transformations to create warped spatial Gaussian process models that support desirable features such as flexible marginal distributions, which can be skewed, leptokurtic, and/or heavy-tailed. The resulting model is widely applicable in a range of spatial field reconstruction applications. To use the model in practice, it is important to carefully characterise the statistical properties of Tukey g-and-h random fields. In this work, we both study the properties of the resulting warped Gaussian processes and use those characterising statistical properties to obtain flexible spatial field reconstructions. In this regard, we derive five different estimators for important quantities often considered in spatial field reconstruction problems: the multi-point Minimum Mean Squared Error (MMSE) estimator; the multi-point Maximum A-Posteriori (MAP) estimator; an efficient class of multi-point linear estimators based on the Spatial Best Linear Unbiased Estimator (S-BLUE); and two multi-point threshold-exceedance-based estimators, namely the Spatial Regional and Level Exceedance estimators. Simulation results and a real data application in environmental monitoring show the benefits of the Tukey g-and-h transformation over standard Gaussian spatial random fields.
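The Tukey g-and-h warp referenced above is commonly written as tau_{g,h}(z) = (1/g)(e^{gz} - 1) * exp(h z^2 / 2) applied to a standard normal z, with g controlling skewness and h controlling tail weight. A minimal sketch under that standard parameterisation (not code from the paper):

```python
import math

def tukey_gh(z, g, h):
    """Tukey g-and-h warp of a standard normal value z:
    g controls skewness, h controls tail heaviness.
    The g -> 0 limit of (e^(gz) - 1)/g is simply z."""
    core = z if abs(g) < 1e-12 else (math.exp(g * z) - 1.0) / g
    return core * math.exp(h * z * z / 2.0)

# g > 0 skews to the right; h > 0 inflates both tails.
print(tukey_gh(1.0, 0.5, 0.0))   # (e^0.5 - 1)/0.5 ≈ 1.2974
print(tukey_gh(-1.0, 0.5, 0.0))  # compressed left tail ≈ -0.7869
print(tukey_gh(2.0, 0.0, 0.1))   # 2 * e^0.2 ≈ 2.4428
```

Applying this transformation pointwise to a Gaussian process yields the warped (non-Gaussian) field; setting g = h = 0 recovers the original Gaussian process.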
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Random walk with resetting; Escape probabilities; Exit times
Online: 7 June 2021 (08:04:12 CEST)
We consider a discrete-time random walk (x_t) which at random times is reset to the starting position and performs a deterministic motion between resets. We show that the quantity lim_{n→∞} Pr(x_{t+1} = n+1 | x_t = n) determines whether the system is averse, neutral, or inclined towards resetting, and it also classifies the stationary distribution. Double-barrier probabilities, first-passage times, and the distribution of the escape time from intervals are determined.
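A minimal simulation illustrates the flavour of such walks; the sketch below assumes the simplest concrete instance (deterministic +1 motion with a constant per-step reset probability, which yields a geometric stationary law), not the paper's general setting:

```python
import random

def simulate_reset_walk(steps, reset_prob, seed=0):
    """Walk that moves +1 deterministically and, at each step,
    resets to 0 with probability reset_prob; returns visit counts."""
    rng = random.Random(seed)
    x, counts = 0, {}
    for _ in range(steps):
        x = 0 if rng.random() < reset_prob else x + 1
        counts[x] = counts.get(x, 0) + 1
    return counts

r, steps = 0.3, 200_000
counts = simulate_reset_walk(steps, r)
# The stationary law should be geometric: P(X = n) = r * (1 - r)^n.
for n in range(4):
    print(n, round(counts.get(n, 0) / steps, 3), round(r * (1 - r) ** n, 3))
```

In this instance Pr(x_{t+1} = n+1 | x_t = n) = 1 - r for every n, so the limit quantity from the abstract is constant and the walk is "neutral" towards resetting.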
ARTICLE | doi:10.20944/preprints202101.0349.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Capital structure; Determinants; Microfinance Institutions; Random effect Model
Online: 18 January 2021 (14:50:08 CET)
The aim of this study was to identify the MFI-specific determinants of capital structure for selected microfinance institutions in Ethiopia. The researcher employed a quantitative research approach with an explanatory research design. The regression analysis showed that growth, firm size, age, and asset tangibility have positive and statistically significant effects on the leverage ratio, whereas profitability has a statistically significant negative effect on capital structure. Based on these findings, the researcher concluded that the firm-specific determinants of the capital structure of microfinance institutions in Ethiopia are growth, profitability, firm size, age, and asset tangibility.
ARTICLE | doi:10.20944/preprints202006.0028.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: genetic algorithm; search techniques; random tests; evolution; applications
Online: 4 June 2020 (07:44:03 CEST)
Nowadays the genetic algorithm (GA) is widely used in engineering pedagogy as an adaptive technique for learning and solving complex problems. It is a meta-heuristic approach used to solve hybrid computational challenges. A GA uses selection, crossover, and mutation operators to manage its search strategy; the algorithm is derived from the concepts of natural selection and genetics. A GA is an intelligent use of random search, supported by historical data, that directs the search toward regions of improved performance within the solution space. Such algorithms are widely used for generating high-quality solutions to optimization and search problems. In this paper, the author explores GAs, their role in engineering pedagogy, the emerging areas in which they are used, and their implementation.
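The selection-crossover-mutation loop described above can be made concrete on the classic OneMax toy problem (maximize the number of 1-bits in a string). The sketch below is a generic stdlib GA with elitism, an illustration rather than any specific application discussed in the paper:

```python
import random

def genetic_onemax(n_bits=40, pop_size=60, generations=100,
                   crossover_p=0.9, mutation_p=0.01, seed=0):
    """Minimal GA maximizing the number of 1-bits (OneMax):
    tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    fitness = sum  # fitness of a bit list = its number of ones
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)           # elitism: carry the best over
        nxt = [elite[:]]
        while len(nxt) < pop_size:
            p1, p2 = tournament()[:], tournament()[:]
            if rng.random() < crossover_p:      # one-point crossover
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for i in range(n_bits):         # per-bit flip mutation
                    if rng.random() < mutation_p:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
    return max(fitness(ind) for ind in pop)

best = genetic_onemax()
print(best)  # close to the optimum of 40
```

Elitism keeps the best individual untouched each generation, so the best fitness never decreases; the other three operators are exactly the selection, crossover, and mutation steps named in the abstract.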
ARTICLE | doi:10.20944/preprints201808.0018.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Nuclear Magnetic Resonance Spectroscopy; Metabolomics; Biomarker; Random Forest
Online: 1 August 2018 (11:30:39 CEST)
Background: Diabetes is among the most prevalent diseases worldwide, yet a significant proportion of affected individuals remains undiagnosed because of the lack of specific symptoms early in the disorder and inadequate diagnostics. Diabetes and its associated sequelae (comorbidities) are associated with microvascular and macrovascular complications. Because diabetes is characterized by an altered metabolism of key metabolites and regulatory pathways, metabolic phenotyping can provide a better understanding of the unique set of regulatory perturbations that predispose to diabetes and its associated comorbidities. Methodology: The present study uses NMR spectroscopy coupled with Random Forest statistical analysis to identify the discriminatory metabolites of diabetes (DB) and diabetes-related comorbidity (DC) relative to healthy control (HC) subjects. Combined and pairwise analyses were performed on serum samples from HC (n=50), DB (n=38), and DC (n=35) individuals to identify the discriminatory metabolites responsible for class separation. The perturbed metabolites were further validated using t-tests and AUROC analysis to examine their statistical significance. Results: The DB and DC patients were well discriminated from HC. Fifteen metabolites were found to be significantly perturbed in DC patients compared to DB; the identified panel comprises TCA-cycle intermediates (succinate, citrate); methylamine-metabolism intermediates (trimethylamine, methylamine, betaine); energy metabolites (glucose, lactate, pyruvate); and amino acids (valine, arginine, glutamate, methionine, proline, and threonine). These metabolites were further used to identify the perturbed metabolic pathways and metabolite correlations in DC patients.
Conclusion: 1H NMR metabolomics may prove a promising technique for differentiating and predicting diabetes and its comorbidities at onset or during progression by determining the altered levels of these metabolites in serum.
ARTICLE | doi:10.20944/preprints201802.0008.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Optimal Bayesian detection; information geometry; minimal error probability; Chernoff/Bhattacharyya upper bound; large random tensor; Fisher information; large random sensing matrix
Online: 1 February 2018 (16:32:04 CET)
The performance in terms of minimal Bayes' error probability for detection of a high-dimensional random tensor is a fundamental, under-studied, difficult problem. In this work, we consider two Signal-to-Noise Ratio (SNR)-based detection problems of interest. Under the alternative hypothesis, i.e., for a non-zero SNR, the observed signals are either a noisy rank-R tensor admitting a Q-order Canonical Polyadic Decomposition (CPD) with large factors of size N_q x R for 1 ≤ q ≤ Q, where R, N_q → ∞ with R^{1/q}/N_q converging towards a finite constant, or a noisy tensor admitting a Tucker Decomposition (TKD) of multilinear (M_1, ..., M_Q)-rank with large factors of size N_q x M_q for 1 ≤ q ≤ Q, where N_q, M_q → ∞ with M_q/N_q converging towards a finite constant. The detection of the random entries (coefficients) of the core tensor in the CPD/TKD is hard to study since the exact derivation of the error probability is mathematically intractable. To circumvent this technical difficulty, the Chernoff Upper Bound (CUB) at larger SNR and the Fisher information at low SNR are derived and studied, based on information geometry theory. The tightest CUB is attained at the value of s that optimizes the error exponent, denoted s*. In general, due to the asymmetry of the s-divergence, the Bhattacharyya Upper Bound (BUB), that is, the Chernoff information calculated at s = 1/2, cannot solve this problem effectively, so one must rely on a costly numerical optimization strategy to find s*. However, thanks to powerful random matrix theory tools, a simple analytical expression of s* is provided with respect to the SNR in the two schemes considered. A main conclusion of this work is that the BUB is the tightest bound at low SNRs; this property, however, no longer holds at higher SNRs.
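The role of s* can be illustrated in a toy scalar setting (a stand-in for the paper's tensor problem): for two densities p0 and p1, the Chernoff bound at parameter s is governed by the error exponent -log ∫ p0^(1-s) p1^s dx, and the tightest bound corresponds to the s maximizing this exponent. The sketch below computes it by quadrature and grid search for univariate Gaussians; it recovers the Bhattacharyya choice s* = 1/2 in the symmetric case and s* ≠ 1/2 when the variances differ:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def chernoff_exponent(s, mu0, s0, mu1, s1, lo=-20.0, hi=20.0, n=2000):
    """Error exponent -log ∫ p0^(1-s) p1^s dx via trapezoidal quadrature."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * gauss_pdf(x, mu0, s0) ** (1 - s) * gauss_pdf(x, mu1, s1) ** s
    return -math.log(total * h)

def best_s(mu0, s0, mu1, s1, grid=49):
    """Grid search for the s maximizing the exponent (tightest bound)."""
    return max((i / (grid + 1) for i in range(1, grid + 1)),
               key=lambda s: chernoff_exponent(s, mu0, s0, mu1, s1))

s_sym = best_s(0.0, 1.0, 2.0, 1.0)   # equal variances: symmetric case
s_asym = best_s(0.0, 1.0, 0.0, 3.0)  # unequal variances: s* moves off 1/2
print(s_sym, s_asym)
```

The asymmetric case shows why the Bhattacharyya shortcut s = 1/2 can be loose: when the two hypotheses are not symmetric, the exponent peaks away from 1/2 and a search (or, as in the paper, an analytical expression) for s* is required.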
Subject: Environmental And Earth Sciences, Soil Science Keywords: land cover changes; soil mapping; random forest; plain areas
Online: 1 August 2023 (10:53:33 CEST)
The flat terrain of plain areas makes the land easily accessible for cultivation and farming, providing vast opportunities for agricultural development; these areas are also crucial for urban construction and economic growth. Soil mapping plays a crucial role in understanding soil characteristics and guiding land management practices. However, accurately mapping soils in plain regions can be challenging due to their low spatial variability and diverse land use types. This study focuses on the impact of land cover changes on the accuracy of soil mapping in plain areas, aiming to support soil mapping through analysis of their coupling relationship. Starting with a 20-year land cover change analysis, the study applies a unified approach that combines expert knowledge, mixed sampling methods, and random forest mapping techniques. It incorporates environmental covariates with minimal temporal influence and synergistically uses NDVI (Normalized Difference Vegetation Index) and land cover data from the same year. The analysis is based on transition matrices, confusion matrices, and their derived indicators. The findings indicate that Tongzhou District has developed rapidly over the past 20 years, with the area of construction land nearly doubling: 29% of arable land has been converted into construction land, and the accuracy of the soil map increased from 58.99% to 66.91% over the 20-year period. The soil change area during this period accounts for 16.5% of the total area, with 51.9% of the changed areas overlapping with land cover change areas. These overlapping regions are predominantly influenced by human activities.
In terms of cultivated land in the study area, the quantity of arable land has decreased by approximately 29% over the 20 years, while the proportion of sandy loam calcareous fluvo-aquic soil and light loam calcareous fluvo-aquic soil, which together constitute nearly half of the soil types, has increased. These data demonstrate the coupling relationship between land cover changes and soil type variations, and particularly the significant influence of human activities on soil structure. On one hand, improving the extent of land use information in plain areas enhances the credibility of soil mapping; on the other hand, human activities alter land cover, which in turn affects and reflects changes in the soil.
ARTICLE | doi:10.20944/preprints202306.1210.v1
Subject: Engineering, Marine Engineering Keywords: offshore wind; parameter inversion; pile-soil interaction; random search
Online: 16 June 2023 (10:13:49 CEST)
To deal with the uncertainties in modeling offshore wind turbines, we propose a parameter inversion method for the pile-soil interaction model based on structural health monitoring results and a numerical model. The method comprises a numerical model, an objective function built from both the numerical and identified results, and an inverse optimization using a random search algorithm over the assumed parameter space. The parameters that minimize the objective function are identified as the in-situ parameters. The proposed method is confirmed to converge after some iterations regardless of the initial parameter values. However, different initial parameter cases may converge to slightly different optimal parameters, implying that the pile results are sensitive to the geological parameters. Moreover, a comparison with the original design results reveals design redundancy or risks. Though the proposed method has several limitations, it sheds some light on the influence of uncertainties, such as soil parameters from geological surveys, on offshore wind turbines.
ARTICLE | doi:10.20944/preprints202306.0705.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Phenology; Tillering; Random Forest; Crop type; Clustering; Unsupervised classification
Online: 9 June 2023 (11:04:40 CEST)
The rising global population amidst growing concerns about climate change will have dire consequences for global food security and socio-economic activities. Wheat is one of the most important staple foods, consumed by more than four billion people worldwide, but climate change impacts already account for a 5.5% decline in wheat yield, and predictions indicate that production could dwindle by nearly 30% by 2050 due to trends in temperature, precipitation, and carbon dioxide. An effective annual crop estimate is necessary not only to inform the government of the status of national food security, but also to determine the benchmark on which agricultural commodities are priced in the market. Thus, annual crop monitoring and yield estimation are paramount for determining the wheat imports required to make up for shortfalls in national wheat production in South Africa, which has been a net importer of wheat since 1998. A joint project between South Africa and Poland investigated satellite-based crop growth monitoring using Sentinel 2 and determined the most distinguishable crop phenology for accurate winter wheat classification during the August-December growing season with the Random Forest (RF) algorithm. Winter wheat was most accurately identified during the crop 'heading' stage in October, yielding the highest user's (75.56%) and producer's (92.52%) accuracies, despite a relatively lower overall accuracy (78.14%) compared to the 83.58% obtained in December during the maturity stage. This study therefore confirms the suitability of Sentinel 2 for effective phenology-based winter wheat classification during the heading stage, reducing the spectral confusion created by surrounding grass and maize crops.
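The user's and producer's accuracies quoted above are the per-class precision and recall derived from a confusion matrix. A minimal sketch with hypothetical counts (the class names and numbers below are illustrative, not the study's data):

```python
def class_accuracies(confusion, labels):
    """User's accuracy (precision) and producer's accuracy (recall)
    per class from a confusion matrix: rows = reference, cols = predicted."""
    n = len(labels)
    col_sums = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    row_sums = [sum(confusion[r]) for r in range(n)]
    total = sum(row_sums)
    overall = sum(confusion[i][i] for i in range(n)) / total
    users = {labels[c]: confusion[c][c] / col_sums[c] for c in range(n)}
    producers = {labels[r]: confusion[r][r] / row_sums[r] for r in range(n)}
    return overall, users, producers

# Hypothetical counts for a wheat / grass / maize classification.
labels = ["wheat", "grass", "maize"]
confusion = [
    [90,  5,  5],   # reference wheat
    [10, 70, 20],   # reference grass
    [ 8, 12, 80],   # reference maize
]
overall, users, producers = class_accuracies(confusion, labels)
print(round(overall, 3), round(users["wheat"], 3), round(producers["wheat"], 3))
```

A class can combine a high producer's accuracy with a modest user's accuracy (as in the study's October result): most true wheat pixels are found, but other classes leak into the wheat prediction.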
ARTICLE | doi:10.20944/preprints202305.0778.v1
Subject: Computer Science And Mathematics, Computational Mathematics Keywords: Ant colony clustering algorithm; Random Forest; Fuzzy number; Classification
Online: 11 May 2023 (03:57:06 CEST)
Commercial banks usually classify customers according to their credit reports when making loans. In this study, we focus on classifying customers based on their credit reports from the People's Bank of China (PBC). Since the PBC credit reports contain no target labels for users, we put forward a fuzzy clustering method for the initial labels and then construct an ant colony search to optimize intelligent recognition. Finally, this study uses SVM, a BP neural network, and random forest to classify users and compares their results. The research results indicate that combining the ant colony clustering algorithm with random forest is the most effective classification method for the PBC credit reports.
ARTICLE | doi:10.20944/preprints202209.0088.v1
Subject: Engineering, Mechanical Engineering Keywords: Short fiber-reinforced composite; Random fields; Plasticity; Numerical simulation
Online: 6 September 2022 (10:11:54 CEST)
For the numerical simulation of components made of short fiber-reinforced composites, correct prediction of the deformation, including the elastic and plastic behavior and its spatial distribution, is essential. Purely deterministic modeling approaches do not include information about the probabilistic microstructure in the simulation process. One possible approach for integrating stochastic information is the use of random fields. In this study, numerical simulations of tensile test specimens are conducted using a finite-deformation elastic-ideally-plastic material model. A selection of material parameters covering the elastic and plastic domains is represented by cross-correlated second-order Gaussian random fields to incorporate the probabilistic nature of the material parameters. To validate the modeling approach, tensile tests until failure were carried out experimentally; they confirm the assumption of spatially distributed material behavior in both the elastic and plastic domains. Since the correlation lengths of the random fields cannot be determined by purely analytic treatment, additional numerical simulations are performed for different values of the correlation length. The numerical simulations confirm the influence of the correlation length on the overall behavior; for a correlation length of 5 mm, good agreement with the experimental results is obtained. It is therefore concluded that the presented modeling approach is suitable for predicting the elastic and plastic deformation of a set of tensile test specimens made of short fiber-reinforced composite.
ARTICLE | doi:10.20944/preprints202208.0058.v1
Subject: Physical Sciences, Theoretical Physics Keywords: complexity; phase transitions; criticality; Ising model; random Boolean networks
Online: 2 August 2022 (09:30:37 CEST)
The dynamics of many complex systems can be classified as ordered, chaotic, or critical. Order offers stability and robustness, while chaos allows for change and adaptability. Criticality, then, is often seen as a balance required by living systems at different scales. In classical models, however, criticality is only found near phase transitions, restricting the parameter space (and thus the likelihood) of critical dynamics, as most parameters yield ``undesirable'' solutions. Here we show that this limitation is due to the homogeneity built into these models, i.e., all elements sharing parameter values. By exploring heterogeneous versions of archetypal models in physics and computer science, we observe critical dynamics in a broader range of parameters, suggesting that criticality could be more common than previously thought.
ARTICLE | doi:10.20944/preprints202207.0462.v1
Subject: Business, Economics And Management, Finance Keywords: Machine Learning; Random Forest; Google Trends; Predictability; Banks; Greece
Online: 29 July 2022 (13:07:42 CEST)
Background/Objectives: Accurate prediction of stock prices is an extremely challenging task because of factors such as political conditions, the global economy, unexpected events, market anomalies, and relevant companies’ features. In this work, a random forest has been used to forecast the prices of the four major Greek systemic banks. Methods/Analysis: We make use of a set of financial variables based on intraday data: (i) open, (ii) high, (iii) low, and (iv) close stock price of a particular Greek systemic bank. Results/Findings: The variables used here are crucial in predicting the systemic banks' closing stock prices and provide a better prediction of the next day's closing price of each bank series. Novelty/Improvement: To our knowledge, this is the first study that employs machine learning techniques on Greek systemic banks.
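The forecasting setup described, day-t OHLC features predicting the next day's close, amounts to a small supervised-learning table. A stdlib Python sketch with hypothetical rows (a random forest regressor, e.g. scikit-learn's, would then be fit on `X` and `y`), plus a naive persistence baseline for comparison:

```python
def make_supervised(ohlc):
    """Day t's (open, high, low, close) row predicts day t+1's close."""
    X = ohlc[:-1]
    y = [row[3] for row in ohlc[1:]]
    return X, y

def mae(pred, true):
    """Mean absolute error of a prediction series."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

# hypothetical intraday rows: (open, high, low, close)
rows = [(10.0, 10.5, 9.8, 10.2),
        (10.2, 10.6, 10.0, 10.4),
        (10.4, 10.9, 10.3, 10.8),
        (10.8, 11.0, 10.5, 10.6)]
X, y = make_supervised(rows)
# persistence baseline: tomorrow's close equals today's close
baseline = [row[3] for row in X]
```

Any learned model should beat this persistence baseline's MAE to justify its use.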
ARTICLE | doi:10.20944/preprints202112.0138.v2
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Yield mapping; vegetation index; Stepwise; SR; Random Forest; KNN
Online: 9 December 2021 (15:39:34 CET)
The use of machine learning techniques to predict yield from remote sensing is a path of no return, and on-farm studies aim to help rural producers in decision-making. Commercial fields equipped with these technologies in Mato Grosso, Brazil, were therefore monitored by satellite images to predict cotton yield using supervised learning techniques. The objective of this research was to identify how early in the growing season, with which vegetation indices, and with which machine learning algorithms cotton yield can best be predicted at the farm level. We proceeded in the following steps: 1) yield was observed in 398 ha (3 fields) and eight vegetation indices (VIs) were calculated on five dates during the growing season; 2) scenarios were created to facilitate the analysis and interpretation of results: Scenario 1, all data (8 indices on 5 dates = 40 inputs), and Scenario 2, the best variable selected by stepwise regression (1 input); 3) in the search for the best algorithm, hyperparameter adjustments, calibrations, and tests using machine learning were performed to predict yield, and the performances were evaluated. Scenario 1 had the best metrics in all fields of study, and the Multilayer Perceptron (MLP) and Random Forest (RF) algorithms showed the best performances, with an adjusted R2 of 47% and an RMSE of only 0.24 t ha-1. However, this scenario requires all the predictive inputs generated throughout the growing season (approx. 180 days). We therefore optimized the prediction by testing only the best VI in each field and found that, among the eight VIs, the Simple Ratio (SR) driven by the K-Nearest Neighbor (KNN) algorithm predicts with 0.26 and 0.28 t ha-1 of RMSE and 5.20% MAPE, anticipating the cotton yield with low error by ±143 days. KNN also requires less computational effort than MLP and RF, making it a practical technique for predicting cotton yield and saving time for planning, whether in marketing or in crop management strategies.
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Microbiome; Diazotroph; Nitrogen fixation bacteria; Random Forest; Network; Trichomonas
Online: 23 August 2021 (12:15:31 CEST)
Biofertilizer, an environment-friendly and renewable plant nutrient source, has been widely applied and studied to reduce dependency on chemical fertilizers. However, most studies focus on the effects of biofertilizer on the bacterial and fungal communities, and we still lack an understanding of its effects on the protistan community. Here, the effects of biofertilizer application on the composition and interactions of the protistan community in the wheat rhizosphere were investigated based on a 4-year field experiment. Biofertilizer application altered soil physicochemical properties and the protistan community composition (ANOSIM, p < 0.001), and induced a significant decline in alpha diversity. Random forest and redundancy analyses demonstrated that nitrogenase activity and available phosphorus were the main drivers. Trichomonas, classified in the phylum Metamonada, was enriched by biofertilizer and showed significantly positive connections with soil nitrogenase activity and some functional genes involved in nitrogen fixation and nitrogen dissimilation. Biofertilization loosened biotic interactions but did not affect the stability of the protistan community. In addition, biofertilizer promoted the connections of protists with fungi, bacteria, and archaea. Combined with the joint biotic network (protists, fungi, bacteria, and archaea) and the interactions between protists and soil physicochemical properties/functional genes, these results suggest that protists may act as keystone taxa potentially driving soil microbiome composition and function.
ARTICLE | doi:10.20944/preprints202012.0152.v1
Subject: Engineering, Automotive Engineering Keywords: Wind farm noise; Amplitude modulation; Random Forest; AM detection
Online: 7 December 2020 (12:51:54 CET)
Amplitude modulation (AM) is a characteristic feature of wind farm noise and has the potential to contribute to annoyance and sleep disturbance. This study aimed to develop an AM detection method using a random forest approach. The method was developed and validated on 6,000 10-second samples of wind farm noise manually classified by a scorer via a listening experiment. Comparison between the random forest method and other widely-used methods showed that the proposed method consistently demonstrated superior performance. This study also found that a combination of low-frequency content features and other unique characteristics of wind farm noise play an important role in enhancing AM detection performance. Taken together, these findings support that using machine learning-based detection of AM is well suited and effective for in-depth exploration of large wind farm noise data sets for potential legislative and research purposes.
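A hand-crafted feature of the kind such a detector might consume can be sketched without any DSP library: a short-window RMS envelope and a peak-to-trough modulation depth. The window length and the synthetic 100 Hz carrier with 1 Hz modulation below are illustrative assumptions, not the study's settings:

```python
import math

def modulation_depth(samples, fs, win=0.1):
    """Crude AM indicator: RMS envelope over `win`-second windows,
    then depth = (max - min) / (max + min) of that envelope."""
    n = max(1, int(win * fs))
    env = []
    for i in range(0, len(samples) - n + 1, n):
        chunk = samples[i:i + n]
        env.append(math.sqrt(sum(s * s for s in chunk) / n))
    lo, hi = min(env), max(env)
    return (hi - lo) / (hi + lo)

fs = 1000
t = [i / fs for i in range(2 * fs)]
steady = [math.sin(2 * math.pi * 100.0 * ti) for ti in t]   # constant-amplitude tone
am = [(1 + 0.8 * math.sin(2 * math.pi * 1.0 * ti)) * s     # 1 Hz amplitude modulation
      for ti, s in zip(t, steady)]
```

The modulated signal scores markedly higher than the steady tone, which is the separation a classifier such as a random forest would then learn from many such features.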
ARTICLE | doi:10.20944/preprints202011.0436.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Space-filling curves; Ergodic Theory; uniform random number generation.
Online: 16 November 2020 (16:51:07 CET)
In this paper the problem of sampling from uniform probability distributions is approached by means of space-filling curves (SFCs), a topological concept that has found a number of important applications in recent years. Departing from the theoretical fact that they are surjective but not necessarily injective, the investigation focuses on the structure of the distributions obtained when their domain is swept in a uniform and discrete manner, with the corresponding values used to build histograms that approximate the true PDFs. This work concentrates on the real interval [0,1], and the Sierpinski space-filling curve was chosen because of its favorable computational properties. In order to validate the results, the Kullback-Leibler distance is used to compare the obtained distributions, at several levels of granularity, with other already established sampling methods. In truth, the generation of uniform random numbers is a deterministic simulation of randomness using numerical operations; sequences resulting from this sort of process are therefore not truly random.
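The validation step, binning the swept outputs and scoring them against a reference with the Kullback-Leibler distance, reduces to a few lines. A minimal stdlib sketch (the deterministic midpoint sweep here merely stands in for the Sierpinski curve's output, which is not reproduced):

```python
import math

def histogram(samples, bins):
    """Normalised histogram of samples assumed to lie in [0, 1)."""
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    return [c / len(samples) for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler distance D(P || Q) between discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

bins = 10
uniform = [1.0 / bins] * bins
# deterministic uniform sweep standing in for the SFC-generated values
sweep = [(i + 0.5) / 1000 for i in range(1000)]
```

A perfectly uniform histogram gives a KL distance of zero against the uniform reference; any departure from uniformity shows up as a positive value.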
ARTICLE | doi:10.20944/preprints202002.0108.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: machine learning; decision tree; random forest; crime data analytics
Online: 9 February 2020 (16:02:03 CET)
Machine learning plays a key role in present-day crime detection, analysis, and prediction. The goal of this work is to propose methods for predicting crimes classified into different categories of severity. We implemented visualization and analysis of crime data statistics from recent years in the city of Boston. We then carried out a comparative study between two supervised learning algorithms, decision tree and random forest, based on the accuracy and processing time of models that make predictions using geographical and temporal information, with the data split into training and test sets. The results show that random forest, as expected, performs better, with 1.54% higher accuracy than decision tree, although this comes at a cost of at least 4.37 times the processing time. The study opens doors to the application of similar supervised methods in crime data analytics and other fields of data science.
Subject: Chemistry And Materials Science, Metals, Alloys And Metallurgy Keywords: particle-hole symmetry; metal-insulator transition; random gap model
Online: 22 October 2019 (15:40:00 CEST)
We use a random gap model to describe a metal-insulator transition in three-dimensional semiconductors due to doping and find a conventional phase transition, where the effective scattering rate is the order parameter. Spontaneous symmetry breaking results in metallic behavior, whereas the insulating regime is characterized by the absence of spontaneous symmetry breaking. The transition is continuous for the average conductivity with critical exponent equal to 1. Away from the critical point the exponent is roughly 0.6, which may explain experimental observations of a crossover of the exponent from 1 to 0.5 by going away from the critical point.
ARTICLE | doi:10.20944/preprints201802.0007.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Chaotic itineracy; random dynamics; computer aided proof; neural networks
Online: 1 February 2018 (10:04:20 CET)
We consider a random dynamical system arising in a model of associative memory. This system can be seen as a small (stochastic and deterministic) perturbation of a deterministic system having two weak attractors, which are destroyed by the perturbation. We show, with a computer-aided proof, that the system has a kind of chaotic itineracy. Typical orbits are globally chaotic, while they spend relatively long times visiting attractor ruins.
REVIEW | doi:10.20944/preprints201705.0111.v1
Subject: Physical Sciences, Optics And Photonics Keywords: random fiber laser; Lévy statistics; photonic spin-glass behavior
Online: 15 May 2017 (11:59:43 CEST)
The interest in random fiber lasers (RFLs), first demonstrated one decade ago, is still growing and their basic characteristics have been studied by several authors. RFLs are open systems that present instabilities in the intensity fluctuations due to the energy exchange among their non-orthogonal quasi-modes. In this work, we present a review of the recent investigations on the output characteristics of a continuous-wave erbium-doped RFL, with emphasis on the statistical behavior of the emitted intensity fluctuations. A progression from the Gaussian to Lévy and back to the Gaussian statistical regime was observed by increasing the excitation laser power from below to above the RFL threshold. By analyzing the RFL output intensity fluctuations, the probability density function of emission intensities was determined, and its correspondence with the experimental results was identified, enabling a clear demonstration of the analogy between the RFL phenomenon and the spin-glass phase transition. A replica-symmetry-breaking phase above the RFL threshold was characterized and the glassy behavior of the emitted light was established. We also discuss perspectives for future investigations on RFL systems.
ARTICLE | doi:10.20944/preprints202309.1255.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: cellular automata; mean-field theory; gliders detection; complexity; random amplification
Online: 19 September 2023 (15:27:10 CEST)
Cellular automata are mathematical models that represent systems with complex behavior through simple interactions between their individual elements. These models can be used to study unconventional computational systems and complexity. One notable aspect of cellular automata is their ability to create structures known as gliders, which move in a regular pattern to represent the manipulation of information. This paper introduces the modification of mean-field theory applied to cellular automata, using random perturbations based on the system’s evolution rule. The original aspect of this approach is that the perturbation factor is tailored to the nature of the rule, altering the behavior of the mean-field polynomials. By combining the properties of both the original and perturbed polynomials, it is possible to detect when a cellular automaton is more likely to generate gliders without having to run evolutions of the system. This methodology is a useful approach to finding more examples of cellular automata that exhibit complex behavior. We start by examining elementary cellular automata, then move on to examples of automata that can generate gliders with more states. To illustrate the results of this methodology, we provide evolution examples of the detected automata.
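The classical (unperturbed) mean-field polynomial for an elementary CA is straightforward to write down: the density at the next step sums, over the eight three-cell neighbourhoods that the rule maps to 1, the probability of that neighbourhood occurring at density p. A stdlib Python sketch (the paper's rule-tailored perturbation is not reproduced here):

```python
def mean_field(rule, p):
    """Mean-field map p' = f(p) for an elementary CA rule number:
    sum over neighbourhoods b with rule(b) = 1 of p^ones(b) * (1 - p)^(3 - ones(b))."""
    total = 0.0
    for b in range(8):              # b encodes the 3-cell neighbourhood (Wolfram order)
        if (rule >> b) & 1:         # the rule maps neighbourhood b to 1
            ones = bin(b).count("1")
            total += p ** ones * (1 - p) ** (3 - ones)
    return total
```

A quick sanity check: rule 204 is the identity rule (output equals the center cell), and indeed its mean-field map collapses to f(p) = p.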
ARTICLE | doi:10.20944/preprints202309.0693.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Health literacy; Bibliometric Analysis; Public Health; Random Forest; Covid-19
Online: 12 September 2023 (05:35:03 CEST)
Introduction: In recent decades, health literacy, in connection with a broad range of public health terms, has become a burgeoning field. This study aims to explore trends and biases in this area through a bibliometric analysis. Methods: A Random Forest model was utilized to identify keywords and other metadata that predict annual citations in the field. To supplement this machine learning analysis, we also implemented a bibliometric review of the corpus. Results: Findings indicate a high positive coefficient for the keywords ’Covid-19’ and ’Male’, whereas a negative coefficient was observed for ’Female’, suggesting potential biases. Evolving themes such as Covid-19, Mental Health, and Social Media were discovered. A significant shift was noted in the main publishing journals, while the major contributing authors remained the same. Discussion: The results hint at the influence of the Covid-19 pandemic and potential gender biases on citation likelihood, as well as changing publication strategies, even though the field continues to be led by the researchers who have studied health literacy since its inception.
ARTICLE | doi:10.20944/preprints202308.1146.v1
Subject: Engineering, Mechanical Engineering Keywords: PHM; CBM; diagnosis; lightGBM; random forest; contextual diagnosis; RUL; forklift
Online: 16 August 2023 (11:33:09 CEST)
This study examined ways to prevent failures in the front-end of forklifts by addressing the center of gravity of heavy objects carried by forklifts, predicting the remaining useful lifetime (RUL), and performing fault diagnosis based on alarm rules. In the research process, acceleration signals were acquired from the outer beam of the front-end of the forklift. A one-second window was applied to extract the time-domain statistical features, which were then set as variables. An exponentially weighted moving average was used to smooth the noise in the feature data set. AWGN and LSTM autoencoders were used for data augmentation. On this basis, random forest and lightGBM models were used to develop classification models for the weight center of heavy objects carried by a forklift. Contextual diagnosis, performed by applying exponentially weighted moving averages to the classification probabilities of the machine learning models, showed that random forest achieved an accuracy of 0.9563 and lightGBM an accuracy of 0.9566. In addition, acceleration data were collected through experiments to predict forklift failure and RUL under repeated use with the center of the carried heavy objects skewed to the right. The time-domain statistical features of the acceleration signals were extracted and set as variables by applying a 20-second window. Subsequently, logistic regression and random forest models were used to classify the failure stages of the forklifts. The f1-scores (macro) obtained were 0.9790 and 0.9220 for logistic regression and random forest, respectively. The random forest probabilities for each stage were then combined and averaged to generate a degradation curve and derive the failure threshold. The coefficient of the exponential function was calculated using the least squares method on the degradation curve, and an RUL prediction model was developed to predict the failure point.
In addition, the SHAP algorithm was used to identify the significant features in classifying the stage. The alarm rule-based fault diagnosis was performed using the threshold of the normal stage distribution of the significant features.
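The RUL step, fitting an exponential to the degradation curve by least squares and extrapolating to a failure threshold, can be sketched as a log-linear regression in plain Python (the synthetic degradation values below are illustrative, not the forklift data):

```python
import math

def fit_exponential(ts, ys):
    """Least-squares fit of y = a * exp(b * t), done as a linear
    regression of log(y) on t."""
    n = len(ts)
    ls = [math.log(y) for y in ys]
    tbar = sum(ts) / n
    lbar = sum(ls) / n
    b = (sum((t - tbar) * (l - lbar) for t, l in zip(ts, ls))
         / sum((t - tbar) ** 2 for t in ts))
    a = math.exp(lbar - b * tbar)
    return a, b

def time_to_threshold(a, b, threshold):
    """Predicted failure point: solve a * exp(b * t) = threshold for t."""
    return math.log(threshold / a) / b

# synthetic degradation curve: a health index growing as 2 * exp(0.1 * t)
ts = list(range(10))
ys = [2.0 * math.exp(0.1 * t) for t in ts]
a, b = fit_exponential(ts, ys)
```

On noise-free data the fit recovers the generating coefficients exactly, and the threshold crossing gives the predicted failure time.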
ARTICLE | doi:10.20944/preprints202308.0841.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: quantum entanglement, negativity, fidelity, Bures-Hall ensemble, random matrix theory
Online: 10 August 2023 (11:09:23 CEST)
To estimate the degree of quantum entanglement, it is important to understand the statistical behavior of functions of the spectrum of density matrices, such as von Neumann entropy, quantum purity, and entanglement capacity. These entanglement metrics over different generic state ensembles have been studied intensively in the literature. As an alternative metric, in this work we study the sum of the square root spectrum of density matrices, which is relevant to negativity and fidelity in quantum information processing. In particular, we derive the exact mean and variance of the sum of the square root spectrum over the Bures-Hall generic state ensemble, extending known results recently obtained for the Hilbert-Schmidt ensemble.
ARTICLE | doi:10.20944/preprints202308.0660.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: FFT-Acoustic Descriptor; timbral variations; Random Forest algorithm; musical acoustics
Online: 8 August 2023 (10:47:10 CEST)
Quantitative evaluation of musical timbre and its variations is important for the analysis of audio recordings and computer-aided music composition. Using the FFT acoustic descriptors and their representation in an abstract timbral space, variations in a sample of monophonic sounds of chordophones (violin, cello) and aerophones (trumpet, transverse flute, and clarinet) are analyzed. It is concluded that the FFT acoustic descriptors allow us to distinguish the timbral variations of the musical dynamics, including crescendo and vibrato. Furthermore, using the Random Forest algorithm, it is shown that the FFT-Acoustic provides a statistically significant classification for distinguishing musical instruments, families of instruments, and dynamics. We observed better behavior of the FFT-Acoustic descriptors when classifying pitch compared to some timbral features of Librosa.
ARTICLE | doi:10.20944/preprints202307.0714.v1
Subject: Social Sciences, Safety Research Keywords: Organizational climate; Safety climate; Multiple mediation; Construction personnel; Random sample
Online: 12 July 2023 (03:06:04 CEST)
Organizational climate refers to the ascribed psychological meanings and significance associated with the procedures, policies, and practices that are recognized and rewarded in the workplace, and it hence mediates the effects of environmental stimuli on individuals’ responses. Safety climate is a specific organizational climate, i.e., the organizational climate for safety. Previous research claimed that organizational climate provides the foundation for safety climate, but without elaborating on the foundational mechanisms. This paper attempts to fill this knowledge gap. As organizational climate is a multi-dimensional phenomenon, this paper chooses two dimensions, perceived organizational support (POS) and participative decision-making (PaDM), for illustrative purposes. Drawing on an interactive approach to forming climate perceptions, this paper introduces two interactive constructs, leader-member exchange (LMX) and team-member exchange (TMX), and establishes a multiple mediation model depicting the foundational effect of organizational climate on safety climate. A random sample of Hong Kong-based construction personnel is used to validate the model. The results show that both POS and PaDM are positively associated with perceived safety climate, that both LMX and TMX fully mediate the effect of PaDM on safety climate, and that only LMX partially mediates the effect of POS on safety climate. This study sheds light on the foundational effects of organizational climate on safety climate. POS can improve the quality of reciprocal exchange about safety matters between construction personnel and their supervisors, and hence raise construction personnel’s awareness of the priority of safety. PaDM can improve the quality of reciprocal exchange about safety matters both vertically and horizontally, and hence make construction personnel aware of the importance of safety. In practice, this paper suggests that project managers promptly recognize and reward construction personnel’s contributions, genuinely care about their well-being, and take their suggestions seriously when making decisions. In this way, the quality of both vertical and horizontal exchange about safety matters improves, and a sound and positive safety climate ensues.
ARTICLE | doi:10.20944/preprints202212.0270.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: image encryption; high pixel density; neural networks; quantum random walk
Online: 15 December 2022 (06:50:34 CET)
This paper proposes an encryption scheme for high pixel density images. Building on the quantum random walk algorithm, a long short-term memory (LSTM) network is applied to overcome the low efficiency of the quantum random walk algorithm in generating large-scale pseudorandom matrices, and to further improve the statistical properties of the pseudorandom matrices required for encryption. The pseudorandom matrix is divided into columns, which are fed into the LSTM in sequence for training. Because of the randomness of the input matrix, the LSTM cannot be trained effectively, so its output prediction matrix is likewise highly random. An LSTM prediction matrix of the same size as the key matrix is generated according to the pixel density of the image to be encrypted, which effectively completes the encryption of the image. In the statistical performance tests, the proposed encryption scheme achieves an average information entropy of 7.9992, an average number of pixels changed rate (NPCR) of 99.6231%, an average uniform average change intensity (UACI) of 33.6029%, and an average correlation of 0.0032. Finally, various noise simulation tests are conducted to verify its robustness in real-world applications where common noise and attack interference are encountered.
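The NPCR and UACI figures quoted above are standard difference metrics between two cipher images and take only a few lines to compute. A stdlib sketch over flat 8-bit pixel lists (the toy pixel values are illustrative, not the paper's test images):

```python
def npcr_uaci(img1, img2):
    """NPCR: percentage of pixel positions that differ between two images.
    UACI: mean absolute pixel difference, normalised by the 8-bit range."""
    n = len(img1)
    changed = sum(1 for p, q in zip(img1, img2) if p != q)
    npcr = 100.0 * changed / n
    uaci = 100.0 * sum(abs(p - q) for p, q in zip(img1, img2)) / (255.0 * n)
    return npcr, uaci
```

For a good cipher, flipping one plaintext pixel should drive NPCR toward ~99.6% and UACI toward ~33.5%, which is what the reported averages reflect.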
ARTICLE | doi:10.20944/preprints202211.0143.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Wave energy converter; Wave overtopping; Shape memory alloy; Random waves
Online: 8 November 2022 (03:17:28 CET)
A Shape Memory Alloy (SMA)-enabled Overtopping Wave Energy Converter (OWEC) that maximizes its overtopping discharge, and thus its energy output, under different wave conditions is presented. Among all the parameters affecting the overtopping discharge rate, the crest freeboard height is the most influential for controlling the discharge rate and the stored overtopping volume behind the crest. Currently, all OWEC crest freeboard heights are fixed by design to maximize the discharge rate for one particular sea state. In the present study, we show that the SMA can adjust the crest freeboard height through a control system based on the sea state and achieve an optimal overtopping discharge rate. A scaled OWEC model is built in the lab with its crest freeboard height controlled by springs made of SMA. The length (and thus tension) of the springs is controlled by temperature, which is changed by varying the current passing through the springs. By adjusting the length of the springs based on the incoming wave condition, we adjust the freeboard to an optimal height known to generate a maximum overtopping discharge rate for energy conversion. This smart-material-enabled design can maximize the overtopping discharge, and thus the output power of the OWEC, under various wave conditions. Furthermore, the simplicity of using SMA springs as the actuator leads to a minimum number of moving mechanical parts, which can remarkably decrease maintenance costs. As a proof of concept, two types of tests are conducted in the laboratory using the same OWEC model under several random wave trains generated from spectra with different significant wave heights: one type with a fixed crest freeboard height and the other featuring an adjustable crest freeboard height controlled by the springs. The substantial increase in harvested output power of the OWEC with the adjustable crest freeboard height may pave the way for more efficient wave energy conversion systems.
ARTICLE | doi:10.20944/preprints202210.0402.v1
Subject: Chemistry And Materials Science, Other Keywords: QSAR; q-RASAR; random forest; machine learning; TiO2-based nanoparticles
Online: 26 October 2022 (07:36:50 CEST)
Read-Across Structure-Activity Relationship (RASAR) is an emerging cheminformatic approach that combines the usefulness of a QSAR model with similarity-based Read-Across predictions. In this work, we have generated a simple, interpretable, and transferable quantitative RASAR (q-RASAR) model which can efficiently predict the cytotoxicity of TiO2-based multi-component nanomaterials. The data set involves 29 TiO2-based nanomaterials which contain specific amounts of noble metal precursors in the form of Ag, Au, Pd, and Pt. The data set was rationally divided into training and test sets, and the Read-Across-based predictions for the test set were generated using the tool Read-Across-v4.1 available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. The hyperparameters were optimized based on the training set data and, using this optimized setting, the Read-Across-based predictions for the test set were obtained. The optimized hyperparameters and the similarity approach yielding the best predictions were used to calculate the similarity- and error-based RASAR descriptors using the tool RASAR-Desc-Calc-v2.0 available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. These RASAR descriptors were then combined with the physicochemical descriptors and subjected to feature selection using the tool Best Subset Selection v2.1 available from https://dtclab.webs.com/software-tools. The final set of selected descriptors was used to develop multiple linear regression based q-RASAR models, which were validated using stringent criteria as per the OECD guidelines. Finally, a random forest model was also developed with the selected descriptors. The final machine learning model can efficiently predict the cytotoxicity of TiO2-based multi-component nanomaterials, superseding previously reported models in prediction quality.
SHORT NOTE | doi:10.20944/preprints202210.0350.v1
Subject: Engineering, Mechanical Engineering Keywords: composite preparation; random fiber design; natural frequency; moderate thick plates
Online: 24 October 2022 (07:09:48 CEST)
Experimental verification of a computational method sometimes varies due to numerous factors, such as the manufacturing process and changes in material properties due to environmental aspects. In this work, we performed a verification of the experimental and computational evaluation of a hybrid composite moderately thick plate. The experiment was performed with simplistic approaches and without advanced tools for preparing composite materials, because most students in many developing countries around the world cannot access such equipment. As such, in this research we present cheap and easy preparation methods, with some details even for equipment calibration, and some tricks to attain a reliable composite structure for educational purposes. Moreover, the software and solvers used in this study are freely provided by the supplier for educational purposes. This study examined two methods for producing carbon and glass/polyester composite plates and discussed which one was best based on mechanical properties for different volume fractions, random stacking sequences, and ply angles (using OCTAVE's random estimation program). It also determined the three natural frequencies experimentally and with the aid of ANSYS. Less than 6% separated the experimentally determined natural frequencies from the calculated results.
ARTICLE | doi:10.20944/preprints202210.0070.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: vegetation indices; NDVI; RGB images; Deep Forest; Random Kernel Forests
Online: 7 October 2022 (07:26:52 CEST)
Vegetation indices help enable precision farming because they provide useful information regarding moisture, nutrient content, and crop health. The primary sources of these indices are satellites and unmanned aerial vehicles equipped with expensive multispectral sensors. Reducing the cost of obtaining such information would increase the availability of precision farming for small farms. Several studies have proposed deep neural network methods to estimate the indices from RGB color images. However, these methods report relatively large errors for mature plants, when highly non-linear relationships between images and vegetation indices arise. One could apply multilayer random-forest-based models (Deep Forests) to this problem, but the discriminative power of such models is limited: they cannot capture complex dependencies between image features. In this paper, we propose a method that combines the ideas of deep forests, random forests of kernel trees, and global pruning of random forests to tackle the problem. As a result, the method accounts for the properties of objects with a complex structure: relationships between groups of features, and the displacement and scaling of objects. The experimental results show that the proposed method outperforms neural-network-based solutions on several datasets.
ARTICLE | doi:10.20944/preprints202206.0356.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: optimization; video segmentation; decision tree; random forest; gradient boost tree
Online: 27 June 2022 (08:56:21 CEST)
Video segmentation is crucial in a variety of practical applications, especially in computer vision. Most recent work on video segmentation focuses on deep learning, leaving room for improvement with respect to evolutionary algorithms. This paper proposes a novel video segmentation method based on the optimization of segmentation parameters using ensemble-based random forests and gradient-boosted decision trees. The experimental results show the Pareto front of the segmentation parameters (hue, brightness, luminance, and saturation). Our optimization model yields an accuracy of 85% +/- 8.85% (micro average: 85.00%), an average class precision of 84.88%, and an average class recall of 85%. We also present video segmentation results based on our optimization method and compare them with Kinect-based video segmentation.
ARTICLE | doi:10.20944/preprints202109.0460.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Space-filling curves; Ergodic Theory; random number generation; Gaussian distribution
Online: 28 September 2021 (09:56:55 CEST)
This work addresses the problem of sampling from Gaussian probability distributions by means of uniform samples obtained deterministically and directly from space-filling curves (SFCs), a purely topological concept. To that end, the well-known inverse cumulative distribution function method is used with the help of the probit function, which is the inverse of the cumulative distribution function of the standard normal distribution. Mainly because of the central limit theorem, the Gaussian distribution plays a fundamental role in probability theory and related areas, which is why it was chosen for study in the present paper. Numerical distributions (histograms) obtained with the proposed method, at several levels of granularity, are compared to the theoretical normal PDF, along with other established sampling methods, all using the cited probit function. The final results are validated with the Kullback-Leibler and two other divergence measures, allowing conclusions to be drawn about the adequacy of the presented paradigm. As is widely known, the generation of uniform random numbers is a deterministic simulation of randomness using numerical operations, so sequences resulting from this kind of procedure are not truly random. Even so, to be consistent with the literature, the expression "random number" is used throughout the text to mean "pseudo-random number".
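The inverse-CDF (probit) step described above can be sketched in a few lines. Here the deterministic uniforms come from a van der Corput low-discrepancy sequence — a simple stand-in chosen for illustration, not the paper's space-filling-curve construction:

```python
from statistics import NormalDist

def van_der_corput(n, base=2):
    """Deterministic, well-spread points on (0, 1) by digit reversal.
    A stand-in here for uniforms derived from a space-filling curve."""
    seq = []
    for i in range(1, n + 1):
        q, denom, x = i, 1.0, 0.0
        while q:
            q, r = divmod(q, base)
            denom *= base
            x += r / denom
        seq.append(x)
    return seq

def gaussian_from_uniforms(us):
    """Inverse-CDF method: map each uniform through the probit function."""
    probit = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return [probit(u) for u in us]

samples = gaussian_from_uniforms(van_der_corput(10000))
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance land close to 0 and 1, the moments of the standard normal, which is the behavior the histogram comparisons in the paper quantify more finely.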
ARTICLE | doi:10.20944/preprints202107.0389.v1
Subject: Engineering, Automotive Engineering Keywords: centralized fusion estimation; random delay systems; tessarine processing; Tk properness
Online: 16 July 2021 (16:30:57 CEST)
The centralized fusion estimation problem for discrete-time vectorial tessarine signals in multiple sensor stochastic systems with random one-step delays and correlated noises is analyzed under different T-properness conditions. Based on Tk, k=1,2, linear processing, new centralized fusion filtering, prediction, and fixed-point smoothing algorithms are devised. These algorithms have the advantage of providing optimal estimators with a significant reduction in computational cost compared to that obtained through a real or widely linear processing approach. Simulation examples illustrate the effectiveness and applicability of the algorithms proposed, in which the superiority of the Tk linear estimators over their counterparts in the quaternion domain is apparent.
ARTICLE | doi:10.20944/preprints202010.0564.v1
Subject: Engineering, Automotive Engineering Keywords: robotics; autonomous obstacle avoidance; path optimization; genetic algorithm; random search
Online: 27 October 2020 (20:44:30 CET)
In rescue operations, the total time of the action plays an important role; it is the sum of the times of the planning, travel, and manipulation (at the action site) phases. This paper considers minimizing the time of the first two phases for an autonomous vehicle on a remote mission. For a map known a priori, path planning consists of local optimal decisions that are then assembled by a general algorithm into the optimal path; this approach significantly reduces path-planning time. The robot's features and the known sparse obstacles limit the allowable robot speeds, and the travel time is calculated from the allowable velocity profile, so it can be used to estimate travel performance. Genetic-algorithm and random-search methods for path finding with travel-time optimization are exploited and compared in the paper. All the proposed time-optimization solutions for the rescue operation are checked in computer simulations, and the simulation results are presented.
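The abstract's core cost function — travel time computed from an allowable velocity profile — can be sketched as a per-segment sum. The segment lengths and speed limits below are hypothetical, purely for illustration:

```python
def travel_time(segment_lengths, allowable_speeds):
    """Travel-time estimate for a candidate path: each segment is traversed
    at the maximum speed the robot and local obstacles allow (meters, m/s)."""
    assert len(segment_lengths) == len(allowable_speeds)
    return sum(s / v for s, v in zip(segment_lengths, allowable_speeds))

# Two hypothetical candidate paths: a short route through dense obstacles
# (low allowable speeds) versus a longer detour taken at full speed.
t_short = travel_time([5.0, 5.0], [0.5, 0.5])
t_detour = travel_time([6.0, 6.0], [2.0, 2.0])
best = min(("short", t_short), ("detour", t_detour), key=lambda p: p[1])
```

This is the quantity a genetic algorithm or random search would minimize over candidate paths: note that the geometrically longer detour wins here because the obstacle-limited speeds dominate the cost.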
ARTICLE | doi:10.20944/preprints202008.0089.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: Deep Neural Network; Extreme Gradient Boosting; Random Forest; Landslide Susceptibility
Online: 4 August 2020 (11:13:02 CEST)
Landslides impact human activities and socio-economic development, especially in mountainous areas. This study compares the prediction capability of advanced machine learning techniques for rainfall-induced shallow landslide susceptibility in the Deokjeokri and Karisanri catchments in South Korea. The influencing factors for landslides, i.e., topographic, hydrologic, soil, forest, and geologic factors, are prepared from various sources based on availability, and a multicollinearity test is performed to select relevant causative factors. The landslide inventory maps of both catchments are obtained from historical information, aerial photographs, and field surveys. In this study, the Deokjeokri catchment is used as the training area and the Karisanri catchment as the testing area. The landslide inventories contain 748 landslide points in the training area and 219 points in the testing area. Three landslide susceptibility maps produced by machine learning models, i.e., Random Forest (RF), Extreme Gradient Boosting (XGBoost), and a Deep Neural Network (DNN), are prepared and compared. The outcomes of the analyses are validated using the landslide inventory data, and the receiver operating characteristic (ROC) curve method is used to verify the results of the models. The results show that the training accuracy of RF is 0.757 and its testing accuracy is 0.740. Similarly, the training accuracy of XGBoost is 0.756 and its testing accuracy is 0.703. The DNN predictions show acceptable agreement between the susceptibility map and the existing landslides, with training and testing accuracies of 0.855 and 0.802, respectively. The DNN model thus achieved a lower prediction error and higher accuracy than the other models for shallow landslide modeling in the study area.
ARTICLE | doi:10.20944/preprints202003.0036.v1
Subject: Medicine And Pharmacology, Other Keywords: ECG feature selection; heartbeat classification; arrhythmia detection; random forest classifier
Online: 3 March 2020 (11:12:20 CET)
Finding an optimal combination of features and classifier is still an open problem in the development of automatic heartbeat classification systems, especially when applications that involve resource-constrained devices are considered. In this paper, a novel study of the selection of informative features and the use of a random forest classifier while following the recommendations of the Association for the Advancement of Medical Instrumentation (AAMI) and an inter-patient division of datasets is presented. Features were selected using a filter method based on the mutual information ranking criterion on the training set. Results showed that normalized R-R intervals and features relative to the width of the QRS complex are the most discriminative among those considered. The best results achieved on the MIT-BIH Arrhythmia Database were an overall accuracy of 96.14% and F1-scores of 97.97%, 73.06%, and 90.85% in the classification of normal beats, supraventricular ectopic beats, and ventricular ectopic beats respectively. In comparison with other state of the art approaches tested under similar constraints, this work represents one of the highest performances reported to date while relying on a very small feature vector.
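The filter step described above — ranking features by mutual information with the class label on the training set — can be sketched on a toy discrete example. The feature names and values below are hypothetical, not data from the MIT-BIH database:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in nats between two discrete sequences of equal length."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint * n * n / (px * py) == p(x,y) / (p(x) * p(y))
        mi += p_joint * math.log(p_joint * n * n / (px[x] * py[y]))
    return mi

# Toy beats: 'qrs_width' (binned) tracks the class label, 'noise' does not.
labels    = [0, 0, 0, 0, 1, 1, 1, 1]
qrs_width = [0, 0, 0, 1, 1, 1, 1, 1]   # informative feature (hypothetical)
noise     = [0, 1, 0, 1, 0, 1, 0, 1]   # uninformative feature

ranking = sorted(
    {"qrs_width": qrs_width, "noise": noise}.items(),
    key=lambda kv: mutual_information(kv[1], labels),
    reverse=True,
)
```

The top-ranked features from such a criterion would then feed the random forest classifier; the paper's finding is that R-R intervals and QRS-width features dominate this ranking.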
ARTICLE | doi:10.20944/preprints201908.0056.v1
Subject: Chemistry And Materials Science, Materials Science And Technology Keywords: CO2 separation; random copolymer; PIM-polyimide; permeability-selectivity; pressure effect
Online: 5 August 2019 (08:07:23 CEST)
Random copolymers made of both (PIM-polyimide) and (6FDA-durene-PI) were prepared for the first time by a facile one-step polycondensation reaction. By combining the highly porous and contorted structure of PIM (polymers with intrinsic microporosity) and high thermomechanical properties of PI (polyimide), the membranes obtained from these random copolymers [(PIM-PI)x-(6FDA-durene-PI)y] showed high CO2 permeability (> 1047 Barrer) with moderate CO2/N2 (> 16.5) and CO2/CH4 (> 18) selectivity, together with excellent thermal and mechanical properties. The membranes prepared from three different compositions of two comonomers (1:4, 1:6 and 1:10 of x:y), all showed similar morphological and physical properties, and gas separation performance, indicating ease of synthesis and practicability for large-scale production. The gas separation performance of these membranes at various pressure ranges (100–1500 torr) was also investigated.
ARTICLE | doi:10.20944/preprints201907.0158.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Cunninghamia lanceolate; UAVs; hyperspectral camera; machine learning; random forests; XGBoost
Online: 11 July 2019 (11:41:33 CEST)
Accurately measuring tree height and diameter at breast height (DBH) in forests to evaluate the growth rate of cultivars remains a significant challenge, even with LiDAR and 3-D modeling. We propose an integrated pipeline methodology for measuring the biomass of different tree cultivars in plantation forests with high crown density that combines unmanned aerial vehicles (UAVs), hyperspectral image sensors, and machine learning data processing algorithms. Using a plantation of Cunninghamia lanceolata, commonly known as Chinese fir, in Fujian, China, images were collected with a hyperspectral camera and orthorectified in HiSpectral Stitcher. Vegetation indices and modeling were processed in Python using decision tree, random forest, support vector machine, and eXtreme Gradient Boosting (XGBoost) third-party libraries. Tree height and DBH of 2880 samples were measured manually and clustered into three groups, "fast growth," "median growth," and "normal growth," and 19 vegetation indices from 12,000 pixels were extracted as input features for the modeling. After modeling and cross-validation, the classifier generated by random forests had the best prediction accuracy (75%) compared with the other algorithms. This framework can be applied to other tree species to support management and business decisions.
ARTICLE | doi:10.20944/preprints201904.0244.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: salient object; local binary pattern; histogram features; conditional random field
Online: 22 April 2019 (11:40:11 CEST)
We propose a novel method for salient object detection in different images. Our method integrates spatial features for efficient and robust representation to capture meaningful information about the salient objects. We then train a conditional random field (CRF) using the integrated features. The trained CRF model is then used to detect salient objects during the online testing stage. We perform experiments on two standard datasets and compare the performance of our method with different reference methods. Our experiments show that our method outperforms the compared methods in terms of precision, recall, and F-Measure.
ARTICLE | doi:10.20944/preprints201710.0086.v2
Subject: Engineering, Electrical And Electronic Engineering Keywords: cluster head; dead node; random; vicinity; modulation; index; survival; overhead
Online: 23 October 2017 (08:06:47 CEST)
Heterogeneous Wireless Sensor Networks (HWSNs) fulfill researchers' requirements in the design of real-life applications that address unattended-operation problems. The main constraint researchers face, however, is the limited energy source available to sensor nodes. To prolong the life of the sensor nodes, and hence of the HWSN, it is necessary to design energy-efficient operational schemes. One of the most suitable routing schemes is the clustering approach, which improves stability and hence enhances the performance parameters of the HWSN. The novel solution proposed in this article is an energy-efficient clustering protocol for HWSNs, EECPEP-HWSN, designed to enhance performance parameters. The proposed protocol is designed with three node levels, namely normal, advance, and super nodes. In the clustering process, cluster-head selection considers three parameters available to the sensor node at run time: initial energy, hop count, and residual energy. The protocol enhances the energy efficiency of the HWSN: it increases the energy remaining in the network, extends the stability period, prolongs the lifetime, and hence yields higher throughput. The proposed protocol is found to outperform LEACH, DEEC, and SEP by about 188, 150, and 141 percent, respectively.
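Cluster-head election from the three run-time parameters named in the abstract can be sketched as a weighted score. The weights and node values below are illustrative assumptions, not the EECPEP-HWSN specification:

```python
def cluster_head_score(initial_energy, residual_energy, hop_count,
                       w_init=0.2, w_res=0.6, w_hop=0.2):
    """Composite fitness for cluster-head election: favor nodes with more
    energy (initial and remaining) and fewer hops to the sink.
    Weights are hypothetical, chosen only to illustrate the idea."""
    return (w_init * initial_energy
            + w_res * residual_energy
            + w_hop / (1.0 + hop_count))

# Hypothetical nodes of the three levels:
# (initial energy J, residual energy J, hops to sink)
nodes = {
    "normal":  (0.5, 0.20, 3),
    "advance": (1.0, 0.60, 2),
    "super":   (1.5, 1.10, 2),
}
head = max(nodes, key=lambda n: cluster_head_score(*nodes[n]))
```

Under any reasonable weighting of this kind, higher-tier nodes with more residual energy win the election more often, which is what spreads the relaying load and extends the stability period.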
ARTICLE | doi:10.20944/preprints202309.0444.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: InSAR; landslide susceptibility; random forest; support vector machine; convolutional neural network
Online: 7 September 2023 (04:03:25 CEST)
Landslides are among the most common geological disasters in China; they are characterized by suddenness and uncertainty, and accurate identification, early warning, and forecasting of landslide disasters are difficult to achieve by conventional means. Despite the development of high-resolution remote sensing satellites and InSAR surface-deformation monitoring technology, traditional landslide-monitoring data sources remain limited, and effective methods for mining the spatial distribution characteristics of landslide hazards and their triggering factors are lacking. In this paper, the area extending 10 km beyond the intensity VII isoseismal line of the Gengma earthquake is taken as the study area, and 13 evaluation factors are screened by integrating InSAR surface deformation, topography, and the geological environment. A Bayesian-Optimized Convolutional Neural Network (BO-CNN) is used for landslide susceptibility evaluation, with BO-RF and PSO-SVM models selected for comparative analysis. Model accuracy is evaluated with three indexes: the ROC curve, the AUC value, and the FR value. The ROC curves of PSO-SVM, BO-RF, and BO-CNN are all close to the upper-left corner, with AUC values of 0.9388, 0.9529, and 0.9535, respectively. The FR value for landslides in the high-susceptibility area of BO-CNN reaches 14.9, which is 4.55 and 3.69 higher than those of the PSO-SVM and BO-RF models, respectively. The experimental results show that the BO-CNN model performs better in landslide susceptibility evaluation, and the research results are of great significance for the local government's disaster prevention and mitigation measures.
ARTICLE | doi:10.20944/preprints202307.0194.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: general taste status; taste loss; supervised learning regression; random forest regressor
Online: 4 July 2023 (10:26:21 CEST)
In healthy humans, taste sensitivity varies widely, influencing food selection and nutritional status. Chemosensory reductions have been associated with numerous pathological disorders and pharmacological interventions. Reliable psychophysical methods are crucial for analyzing taste function during routine clinical assessment; however, in the daily clinical routine they are often considered too time-consuming. We used a Supervised Learning (SL) regression method to analyze with high precision the overall taste status of healthy controls (HC) and patients with chemosensory loss, and to characterize the combination of responses that can best predict the overall taste status of the two groups. A Random Forest regressor allowed us to achieve this objective. Analysis of the order of importance and the impact of each parameter on the prediction of overall taste status in the two groups showed that salty (low-concentration) and sour (high-concentration) stimuli specifically characterized healthy subjects, while bitter (high-concentration) and astringent (high-concentration) stimuli identified patients with chemosensory loss. These distinctions appear to be of interest to the health system, since they may allow the use of specific stimuli during routine clinical assessments of taste function, reducing the time and cost involved.
ARTICLE | doi:10.20944/preprints202305.0373.v2
Subject: Physical Sciences, Theoretical Physics Keywords: Information; Quantum Physics; Matrix Exponential Function; Bernoulli Random Walk; BRW; Statistics
Online: 25 June 2023 (04:49:47 CEST)
Information is physically measurable as a selection from a set of possibilities, the domain of information. This defines the term "information". The domain of the information must be known together reproducibly beforehand. As a practical consequence, digital information exchange can be made globally efficient, interoperable and searchable to a large extent by online definition of application-optimized domains of information. There are even more far-reaching consequences for physics. The purpose of this article is to present prerequisites and possibilities for a physical approach that is consistent with the precise definition of information. This concerns not only the discretization of the sets of possible experimental results, but also the order of their definition along time. The access to or comparison with the domain of information is the more frequent, the earlier it was defined. The geometrical appearance of our space is apparently a delayed statistical consequence of a very frequent connection with the common primary domain of information.
ARTICLE | doi:10.20944/preprints202306.1612.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: light pollution; entropy weight method; Markov random fields; simulated annealing algorithm
Online: 22 June 2023 (11:48:40 CEST)
In this paper, a Markov random field model is proposed to determine a site's light pollution risk level. Data for 12 indicators in five typical Chinese cities are first collected to establish a hierarchical indicator system using an R-type clustering algorithm. The entropy weight method is then used to filter and determine 10 factor indicators, three potential impact indicators, and a light pollution risk index, and an undirected probabilistic graphical model is established. A light pollution measure based on the Markov random field is obtained, and a location-based light pollution risk assessment index (LBLPRAI) is proposed. The LBLPRAI of different types of sites is analyzed, and three possible intervention strategies for the light pollution problem are proposed: road lighting system planning, increasing vegetation coverage, and building system planning. Finally, the simulated annealing algorithm is used to determine the best intervention strategy; it is concluded that applying strategy 1 in urban community 2 is the most effective measure, reducing the light pollution risk level by 17.2%.
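The entropy weight method used to weight the indicators has a standard closed form: indicators whose values vary more across sites carry more information and receive larger weights. A minimal sketch on hypothetical data (not the paper's 12-indicator set):

```python
import math

def entropy_weights(matrix):
    """Entropy weight method. Rows are sites, columns are indicators
    (assumed positive). Returns one weight per indicator, summing to 1."""
    m = len(matrix)
    n = len(matrix[0])
    diversification = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        ps = [c / total for c in col]
        # Normalized Shannon entropy of the column, in [0, 1].
        e = -sum(p * math.log(p) for p in ps if p > 0) / math.log(m)
        diversification.append(1.0 - e)  # more variation -> larger value
    s = sum(diversification)
    return [d / s for d in diversification]

# 3 hypothetical sites x 2 indicators: the first is constant across sites,
# the second varies strongly, so it should absorb nearly all the weight.
w = entropy_weights([[10, 1], [10, 50], [10, 99]])
```

A constant indicator has maximum entropy and hence weight near zero, which is exactly the filtering behavior the abstract relies on to keep 10 of the 12 indicators.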
CONCEPT PAPER | doi:10.20944/preprints202305.1909.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: multivariate Gaussians; correlated random variables; visualization; entropy; relative entropy; mutual information
Online: 26 May 2023 (10:00:47 CEST)
The fundamental objective is to study the application of multivariate data sets under the Gaussian distribution. This paper examines broad measures of structure for both Gaussian and non-Gaussian distributions and shows that they can be described in terms of the information-theoretic divergence (relative entropy) between the given covariance matrix and the correlated random variables. To develop the multivariate Gaussian distribution together with entropy and mutual information, several significant methodologies are presented through discussion supported by illustrations, both technical and statistical. The material allows readers to better perceive the concepts, comprehend the techniques, and properly execute software programs for future study of the topic's science and implementations, and it helps readers grasp the themes' fundamental concepts. Involving relative entropy and mutual information, as well as potential correlated-covariance analysis based on differential equations, a wide range of material is addressed, from basic concepts to application concerns.
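The relative entropy between two multivariate Gaussians has a well-known closed form, 0.5 * (tr(S1^-1 S0) + (mu1-mu0)^T S1^-1 (mu1-mu0) - k + ln(det S1 / det S0)). A minimal sketch for the diagonal-covariance special case, where every term reduces to a per-coordinate sum (illustrative, not the paper's code):

```python
import math

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """Relative entropy D(N0 || N1) in nats between k-dimensional Gaussians
    with diagonal covariances, given as per-coordinate means and variances."""
    kl = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        kl += 0.5 * (v0 / v1                       # trace term
                     + (m1 - m0) ** 2 / v1        # mean-shift term
                     - 1.0                        # -k, one per coordinate
                     + math.log(v1 / v0))         # log-determinant term
    return kl

same = kl_diag_gaussians([0, 0], [1, 1], [0, 0], [1, 1])
shifted = kl_diag_gaussians([0, 0], [1, 1], [1, 0], [1, 1])
```

Identical distributions give zero divergence, and a unit mean shift in one coordinate of a standard Gaussian gives exactly 0.5 nats — the kind of quantity the paper's covariance-based structure measures build on.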
ARTICLE | doi:10.20944/preprints202305.1769.v1
Subject: Engineering, Transportation Science And Technology Keywords: intercity bus; accident; severity; probability model; random parameter; ordered logit; heterogeneity
Online: 25 May 2023 (08:52:26 CEST)
Because intercity buses move large numbers of occupants between regions, they account for a substantial mode share for mid- to long-distance travel in South Korea. However, research on intercity bus safety is not yet extensive, and safety policies are based on traditional probability models that do not consider the data characteristics of bus accidents. In this study, therefore, the Random Parameter Ordered Logit model was applied to derive fixed-parameter factors that have a uniform effect on the severity of intercity bus accidents and random parameters that capture the heterogeneity of the unique attributes of each accident; the marginal effects on intercity bus accident severity were also analyzed. The study found that the influencing factors exhibiting heterogeneity, modeled with random parameters, were driver condition (drowsiness), vehicle size (medium), crash type (vehicle-pedestrian accident), road condition (wet pavement), and the log of AADT. The random parameter ordered logit model was found to be more suitable than the traditional ordered logit model, which reflects only fixed factors, and to give more reliable predictions by considering the heterogeneity of accident characteristics for each observation.
ARTICLE | doi:10.20944/preprints202304.0653.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: land cover; sentinel-2 images; random forest; boreal forest; alpine tundra
Online: 20 April 2023 (10:51:55 CEST)
A land cover map of two arctic catchments near the Abisko Scientific Research Station was obtained from a classification of a Sentinel-2 satellite image and a ground survey performed in July 2022. The two contiguous catchments, Miellajokka and Stordalen, are covered by various ecotypes, from boreal forest to alpine tundra and peatland. The Random Forest algorithm correctly identified 83% of the polygon pixels reserved for testing. The workflow developed relied solely on open-source software and the acquired ground observations. Spatial organization was driven by altitude, as shown by intersecting the land cover with the topography. Comparison between this new land cover map and previous ones based on data acquired between 2008 and 2011 reveals some trends in vegetation cover evolution in response to climate change in the area considered. The potential applications in terms of permafrost modeling (hiperborea.omp.eu) are finally discussed.
ARTICLE | doi:10.20944/preprints202304.0056.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Trophic level index; spectral indices change; spectral signatures; Random Forest algorithm.
Online: 5 April 2023 (11:02:28 CEST)
Continuous water resources monitoring is needed for a sustainable urban water supply. Remote sensing techniques have proven useful for monitoring water quality parameters with optical characteristics. The study area was the Marateca reservoir in central inland Portugal. The aims were the following: (1) to explore the water quality parameters at the monitoring points of the Marateca reservoir that may explain the event; (2) to validate optical water quality parameters with the monitoring-point data; and (3) to model the reservoir water characteristics regarding its depth, trophic state, and turbidity. The parameters total phosphorus, total nitrogen, and chlorophyll-a were used to compute a trophic level index. Sentinel-2 imagery was used to compute spectral indices and band image ratios, to obtain spectral signatures for the monitoring points, and to model water characteristics. The water parameters were above the recommended values at the point where the Ocreza River enters the reservoir. The reservoir trophic level was Hypereutrophic and Eutrophic, and the spectral signatures confirmed a Hypereutrophic pattern at the entry point. Modeling the Marateca reservoir's water characteristics forecast zones prone to contamination problems. The methodological approach developed can easily be applied to other reservoirs and is a key support tool for decision-makers.
ARTICLE | doi:10.20944/preprints202302.0229.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: branching random walks; moments of particle numbers; evolution operator; Green’s function
Online: 14 February 2023 (03:29:07 CET)
We consider a new model of a branching random walk on a multidimensional lattice with continuous time and one source of particle reproduction and death, as well as an infinite number of sources in which, in addition to the walk, only absorption of particles can occur. The asymptotic behavior of the integer moments of both the total number of particles and the number of particles at a lattice point is studied depending on the relationship between the model parameters. In the case of the existence of an isolated positive eigenvalue of the evolution operator of the average number of particles, a limit theorem is obtained on the exponential growth of both the total number of particles and the number of particles at a lattice point.
ARTICLE | doi:10.20944/preprints202208.0481.v1
Subject: Computer Science And Mathematics, Discrete Mathematics And Combinatorics Keywords: Random normalization; thinning operators; Bernstein Theorem; problem of moments; Sibuya distribution
Online: 29 August 2022 (09:43:57 CEST)
Different variants of thinning for discrete random variables are studied. The thinning procedure makes it possible to introduce an analog of a scale parameter for positive integer-valued random variables. Necessary and sufficient conditions for the existence of such a scale are given.
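The classical binomial thinning operator, of which the paper studies variants, replaces scalar multiplication alpha * X by alpha ∘ X: each of the X units survives independently with probability alpha, so the conditional mean is alpha * X. A minimal simulation sketch (illustrative only):

```python
import random

def binomial_thinning(x, alpha, rng):
    """Binomial thinning alpha o X for an observed count x: each unit
    survives independently with probability alpha, giving a Binomial(x, alpha)
    result with conditional mean alpha * x."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

rng = random.Random(42)  # fixed seed for reproducibility
draws = [binomial_thinning(100, 0.3, rng) for _ in range(2000)]
avg = sum(draws) / len(draws)
```

The empirical mean of the thinned draws concentrates near alpha * x = 30, which is the scaling behavior that lets thinning play the role of a scale parameter for integer-valued random variables.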