ARTICLE | doi:10.20944/preprints202208.0179.v1
Subject: Life Sciences, Other Keywords: In-house validation study; reproducibility precision; measurement uncertainty; prediction interval; uncertainty interval
Online: 9 August 2022 (10:56:40 CEST)
Measurement uncertainty is typically expressed in terms of a symmetric interval x ± U, where x denotes the measurement result and U the expanded uncertainty. However, in the case of heteroscedasticity, symmetric uncertainty intervals can be misleading. In this paper, a different approach for the calculation of uncertainty intervals is introduced. This approach is applicable when a validation study has been conducted with samples of known concentration. It will be shown how, under certain circumstances, asymmetric uncertainty intervals arise quite naturally and lead to more reliable uncertainty intervals.
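One way in which an asymmetric interval can arise is sketched below. It is not the paper's procedure: it assumes, purely for illustration, that the standard deviation is proportional to the true concentration (a constant relative standard deviation). The function names and the parameters `rel_sd` and `k` are hypothetical.

```python
def asymmetric_interval(x, rel_sd, k=2.0):
    """Interval for the true concentration mu, given a measurement result x.

    Illustrative assumption: x ~ Normal(mu, (rel_sd * mu)**2), so that
    |x - mu| <= k * rel_sd * mu holds with the coverage implied by k.
    Solving both inequalities for mu gives
        x / (1 + k * rel_sd) <= mu <= x / (1 - k * rel_sd).
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if not 0 <= k * rel_sd < 1:
        raise ValueError("k * rel_sd must lie in [0, 1)")
    lower = x / (1.0 + k * rel_sd)
    upper = x / (1.0 - k * rel_sd)
    return lower, upper


def symmetric_interval(x, rel_sd, k=2.0):
    """Conventional x +/- U with U = k * rel_sd * x, shown for comparison."""
    u = k * rel_sd * x
    return x - u, x + u


if __name__ == "__main__":
    print(asymmetric_interval(100.0, 0.10))
    print(symmetric_interval(100.0, 0.10))
```

Under this assumption the upper half of the interval is wider than the lower half, because larger true concentrations come with larger scatter.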
ARTICLE | doi:10.20944/preprints202112.0391.v1
Subject: Medicine & Pharmacology, Pediatrics Keywords: online prediction; CYP21A2; mutation analysis; pathogenicity prediction
Online: 23 December 2021 (12:00:40 CET)
Context: CYP21A2 deficiency accounts for 95% of congenital adrenal hyperplasia (CAH) cases, a group of genetic disorders that affect steroid biosynthesis. Genetic and functional analysis provides critical tools to elucidate complex CAH cases. One of the most accessible tools to infer the pathogenicity of new variants is in silico prediction. Objective: To analyze the performance of in silico prediction tools in categorizing missense single nucleotide variants (SNVs) of CYP21A2. Methods: SNVs of CYP21A2 characterized in vitro by functional assays were selected to assess the performance of online single and meta predictors. SNVs were tested separately or in combination with the related phenotype (severe or mild CAH form). In total, 103 SNVs of CYP21A2 (90 pathogenic and 13 neutral) were used to test the performance of 13 single predictors and four meta-predictors. Results: SNVs associated with the severe phenotype were well categorized by all tools, with an accuracy between 0.69 (PredictSNP2) and 0.97 (CADD), and a Matthews correlation coefficient (MCC) between 0.49 (PredictSNP2) and 0.90 (CADD). However, SNVs related to the mild phenotype showed more variation, with an accuracy between 0.47 (S3Ds&GO and MAPP) and 0.88 (CADD), and an MCC between 0.18 (MAPP) and 0.71 (CADD). Conclusion: From our analysis, we identified four predictors of CYP21A2 pathogenicity with good performance. These results can be used in future analyses to infer the impact of uncharacterized SNVs in CYP21A2.
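The two scores reported above can be computed from a confusion matrix as in the following minimal sketch. The counts in the example are hypothetical and only reuse the class sizes from the abstract (90 pathogenic, 13 neutral); they are not results from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all variants that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical predictor that labels every variant as pathogenic.
    print(accuracy(tp=90, tn=0, fp=13, fn=0), mcc(tp=90, tn=0, fp=13, fn=0))
    # Hypothetical predictor with a few errors in each class.
    print(accuracy(tp=85, tn=10, fp=3, fn=5), mcc(tp=85, tn=10, fp=3, fn=5))
```

With classes this unbalanced, a predictor that calls everything pathogenic still reaches a high accuracy while its MCC is zero, which is one reason to report both.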
ARTICLE | doi:10.20944/preprints201808.0500.v1
Subject: Earth Sciences, Atmospheric Science Keywords: East Asian summer monsoon, Seasonal prediction, dynamic prediction, summer rainfall prediction, NESM3.0, ENSO teleconnection
Online: 29 August 2018 (13:42:45 CEST)
It has been an outstanding challenge for global climate models to simulate and predict East Asia (EA) summer monsoon (EASM) rainfall. This study evaluates the dynamical hindcast skill of the newly developed Nanjing University of Information Science and Technology Earth System Model version 3.0 (NESM3.0). To improve the poor prediction of an earlier version of NESM3.0, we have modified the convective parameterization schemes to suppress excessive deep convection and enhance insufficient shallow and stratiform clouds. The new version of NESM3.0 with modified parameterizations (MOD hereafter) yields significantly improved rainfall prediction in northern and southern China but not over the Yangtze River Valley. The improved prediction is primarily attributed to improvements in the predicted climatological summer mean rainfall and circulation, the seasonal march of the subtropical rain belt, the Nino 3.4 SST anomaly, and the rainfall anomalies associated with the development and decay of El Nino events. However, MOD still has notable biases in the predicted leading mode of interannual variability of precipitation. The leading mode captures the dry (wet) anomalies over the South China Sea (northern EA) but misplaces the precipitation anomalies over the Yangtze River Valley. The model captures the interannual variation of the circulation indices very well, but the bias in the circulation-rainfall connection causes errors in the predicted rainfall. The results suggest that, over EA land regions, skillful rainfall prediction relies not only on the model's capability to better predict the summer mean and seasonal march of rainfall and the ENSO teleconnection with the EASM, but also on accurate prediction of the leading modes of interannual variability.
ARTICLE | doi:10.20944/preprints202205.0091.v1
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: risk prediction; prediction models; risk of bias; PROBAST; melanoma
Online: 7 May 2022 (03:50:41 CEST)
Rising incidences of cutaneous melanoma have fueled the development of statistical models that predict the individual melanoma risk. Our aim was to assess the validity of published prediction models for incident cutaneous melanoma using a standardized procedure based on PROBAST (Prediction model Risk Of Bias ASsessment Tool). We included studies that were identified by a recent systematic review and updated the literature search to ensure that our PROBAST rating included all relevant studies. Six reviewers assessed the risk of bias (ROB) for each study using the published “PROBAST Assessment Form” that consists of four domains and an overall rating of ROB. We further examined a temporal effect regarding changes in overall and domain-specific ROB rating distributions. Altogether 42 studies were assessed, of which a vast majority (n=34; 81%) was rated as having high ROB. Only one study was judged as having low ROB. The main reasons for high ROB ratings were the use of hospital controls in case-control studies and the omission of any validation of prediction models. However, our results of the temporal analysis showed a significant reduction in the number of studies with high ROB for the domain analysis. Nevertheless, the evidence base of high-quality studies that can be used to draw conclusions on the prediction of incident cutaneous melanoma is currently much weaker than the high number of studies on this topic would suggest.
ARTICLE | doi:10.20944/preprints202105.0669.v1
Subject: Earth Sciences, Atmospheric Science Keywords: porosity prediction; pore-water prediction; gravity; resistivity; combined inversion
Online: 27 May 2021 (13:16:28 CEST)
This work describes a method to carry out 2-D inversion of gravity data in terms of porosity and matrix density distribution, using previous DC resistivity inversion results to constrain the fractional pore-water content in the rocks. The inversion is carried out using a controlled random search (CRS) algorithm for global optimization. The method was tested on synthetic data generated from a model representing a graben, and the results show that it can estimate accurate values of density contrast and porosity. The method was also applied to gravity and DC experimental data collected in NE Portugal, showing results that agree quite well with the known geological information.
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Online Social Media prediction, Covid-19 prediction, Twitter, Google Trends
Online: 3 June 2021 (11:37:56 CEST)
As the coronavirus disease 2019 (COVID-19) continues to rage worldwide, the United States has become the most affected country with more than 34.1 million total confirmed cases up to June 1, 2021. In this work, we investigate correlations between online social media and Internet search for the COVID-19 pandemic among 50 U.S. states. By collecting the state-level daily trends through both Twitter and Google Trends, we observe a high but state-different lag correlation with the number of daily confirmed cases. We further find that the predictive accuracy measured by the correlation coefficient is positively correlated to a state’s demographic, air traffic volume and GDP development. Most importantly, we show that a state’s early infection rate is negatively correlated with the lag to the previous peak in Internet search and tweeting about COVID-19, indicating that earlier collective awareness on Twitter/Google correlates with lower infection rate. Lastly, we demonstrate that correlations between online social media and search trends are sensitive to time, mainly due to the attention shifting of the public.
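A lag correlation of the kind described above can be sketched as follows: an online-activity series is shifted against the daily case series, and the shift with the highest Pearson correlation is kept. The function names and the toy data are illustrative assumptions, not the authors' pipeline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def best_lag(signal, cases, max_lag):
    """Return (lag, r) maximizing the correlation of signal[t] with cases[t + lag]."""
    if len(signal) != len(cases):
        raise ValueError("series must have the same length")
    best = None
    for lag in range(max_lag + 1):
        a = signal[: len(signal) - lag]  # earlier online activity
        b = cases[lag:]                  # later confirmed cases
        r = pearson(a, b)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


if __name__ == "__main__":
    tweets = [0, 1, 4, 9, 3, 2, 7, 5, 1, 0, 6, 8]   # toy daily trend values
    cases = [2, 5, 1] + tweets[:-3]                  # same shape, three days later
    print(best_lag(tweets, cases, max_lag=5))
```

In this toy series the case counts repeat the online signal three days later, so the search recovers a lag of three days.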
ARTICLE | doi:10.20944/preprints202105.0116.v1
Subject: Keywords: Time Series Prediction; ANN forecasting; New Coronavirus; COVID19 prediction cases; COVID19 prediction deaths; COVID19 prediction ICU, COVID19 Vaccination; COVID19 in Europe; COVID19 in Israel; COVID19 use of face mask.
Online: 6 May 2021 (16:58:01 CEST)
The use of Artificial Neural Networks (ANN) is a valuable contribution to medical studies, since forecasting allows the analysis of future disease propagation. In this context, this paper presents a study of the new coronavirus SARS-CoV-2 with a focus on verifying how virus propagation relates to mitigation procedures and mass vaccination campaigns. Two methodologies are proposed to predict, up to 28 days ahead, the number of new cases, deaths, and ICU patients in five European countries (Portugal, France, Italy, United Kingdom, and Germany), together with a case study of the results of mass immunization in Israel. The input data on cases, deaths, and daily ICU patients were normalized to reduce discrepancies due to country size, and the cumulative vaccination values were expressed as the percentage of the population immunized with at least one dose of vaccine. As a comparative criterion, the mean absolute error (MAE) of all predictions is used to identify the best methodology and points to other possible uses of the proposed method. The best architecture achieved an overall MAE for the 1 to 28 days ahead forecast lower than 30 cases, 0.6 deaths and 2.5 ICU patients per million people.
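The normalization and error measure mentioned above can be illustrated with a short sketch. The helper names and the numbers are hypothetical and only show the arithmetic, not the authors' data or network.

```python
def per_million(count, population):
    """Express a raw count per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    observed = [10, 20, 30]   # toy daily values, already per million
    forecast = [12, 18, 33]
    print(mean_absolute_error(observed, forecast))
    print(per_million(103, 10_300_000))
```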
Subject: Behavioral Sciences, Social Psychology Keywords: active inference; digital affordances; patterns of attention; prediction error minimization; prediction error dynamics
Online: 19 September 2022 (04:49:39 CEST)
Culture exploits the acquisition of meaningful content by crafting regimes of shared attention, determining what is relevant, valuable, and salient. Culture changes the field of relevant social affordances worthy of being acted upon in a context-sensitive manner. When relevant affordances are highly weighted, their attentional capture and their salience increase the probability of them being enacted due to the associated expectation for minimizing prediction error. This process is known as active inference. In the digital era, individuals need to infer the action-related attributes of digital cues, here characterized as digital affordances. The digital affordances of digital social platforms are of particular interest here. Digital social affordances are defined as online possibilities of social interactions. By their own nature, these are salient because they are related to social interactions and relevant social cues. However, the problem of digital social platforms is that they are not equivalent to situated social interactions because their structure is built, mediated, and defined by third-parties with diverse interests. The third-parties behind the digital social platforms are using the same mechanism exploited by culture to manipulate the shared patterns of attention. Moreover, digital social platforms are deliberately designed to be hyper-stimulating, making digital social affordances highly rewarding and increasingly salient. This appropriation, for economic purposes, is an issue of great importance, especially as the COVID-19 pandemic brought deep global changes, pushing societies to an online digital way of life. Here, we examined different types of digital social affordances under an active inference view, placing them into two categories, those for self-identity formation, and those for belief-updating. This paper aims to analyze digital social affordances in light of the prediction error dynamics they might elicit to their users. 
Although each of the analyzed digital social affordances allows different epistemic and instrumental digital actions, they all share the characteristic of having an "easy" and fast expected rate of error reduction. Here, we aim to provide a new hypothesis about how the design behind digital social affordances is built on our natural attraction to minimizing prediction error and on the positive embodied feelings that result from doing so. Finally, it is suggested that, because digital social affordances are becoming highly weighted in the field of affordances, this might be putting at risk our context-sensitive grip on a rich, dynamic and varied field of relevant affordances.
ARTICLE | doi:10.20944/preprints202201.0313.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: stochastic clustering; energy prediction; disaggregation
Online: 20 January 2022 (20:40:54 CET)
This paper describes a stochastic clustering architecture that is used in the paper for making predictions over energy data. The design is discrete, localised optimisations based on similarity, followed by a global aggregating layer, which can be compared with the recent random neural network designs, for example. The topic relates to the IDEAS Smart Home Energy Project, where a client-side Artificial Intelligence component can predict energy consumption for appliances. The proposed data model is essentially a look-up table of the key energy bands that each appliance would use. Each band represents a level of consumption by the appliance. This table can replace disaggregation from more complicated methods, usually constructed from probability theory, for example. Results show that the table can accurately disaggregate a single source to a set of appliances, because each appliance has quite a unique energy footprint. As part of predicting energy consumption, the model could possibly reduce costs by 50% and more than that if the proposed schedules are also included. The hyper-grid has been changed to consider rows as single units, making it more tractable. A second case study considers wind power patterns, where the grid optimises over the dataset columns in a self-similar way to the rows, allowing for some level of feature analysis.
ARTICLE | doi:10.20944/preprints201901.0023.v1
Online: 3 January 2019 (13:20:00 CET)
The main objective of this study is to obtain better predictions of rainy-season rainfall (15 June-15 August). The correlation between rainfall in the Bengali rainy season at Rangpur, Dhaka, Barisal and Sylhet and global sea surface temperature (SST) in different areas of the world was studied using data for 1975-2008 with the help of the Climate Predictability Tool (CPT), in order to find the SST most positively correlated with observed rainfall and use it as a predictor for the year 2009. Using the SST of the month before the rainy season as the predictor, the positive deviation of predicted rainfall from observed rainfall was 1.34 mm/day at Sylhet and 0.9 mm/day at Dhaka. The negative deviation of mean rainfall was 1.16 mm/day at Rangpur and 1.10 mm/day at Barisal. Using the SST of the first month of the rainy season as the predictor, the positive deviation of predicted rainfall from observed rainfall was 4.03 mm/day at Sylhet. The positive deviation of daily mean rainfall was 6.58 mm/day at Dhaka and 6.23 mm/day over southern Bangladesh. The study reveals that the SST of the month before the rainy season was a better predictor than the SST of the first month of the rainy season.
REVIEW | doi:10.20944/preprints201810.0098.v2
Subject: Earth Sciences, Environmental Sciences Keywords: flood prediction; machine learning; forecasting
Online: 26 October 2018 (11:56:27 CEST)
Floods are among the most destructive natural disasters and are highly complex to model. Research on the advancement of flood prediction models has contributed to risk reduction, policy suggestion, minimization of the loss of human life, and reduction of the property damage associated with floods. To mimic the complex mathematical expressions of the physical processes of floods, machine learning (ML) methods have, during the past two decades, contributed greatly to the advancement of prediction systems, providing better performance and cost-effective solutions. Due to the vast benefits and potential of ML, its popularity has increased dramatically among hydrologists. By introducing novel ML methods and hybridizing existing ones, researchers aim to discover more accurate and efficient prediction models. The main contribution of this paper is to demonstrate the state of the art of ML models in flood prediction and to give insight into the most suitable models. The literature in which ML models are benchmarked through a qualitative analysis of robustness, accuracy, effectiveness, and speed is investigated in particular, to provide an extensive overview of the use of various ML algorithms in the field. The performance comparison of ML models provides an in-depth understanding of the different techniques within the framework of a comprehensive evaluation and discussion. As a result, the paper introduces the most promising prediction methods for both long-term and short-term floods. Furthermore, the major trends in improving the quality of flood prediction models are investigated. Among them, hybridization, data decomposition, algorithm ensembles, and model optimization are reported as the most effective strategies for improving ML methods. This survey can be used as a guideline for hydrologists as well as climate scientists in choosing the proper ML method according to the prediction task.
ARTICLE | doi:10.20944/preprints202207.0323.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: Algorithmic probability; Kolmogorov complexity; prediction; induction
Online: 21 July 2022 (10:48:26 CEST)
Developing new ways to estimate probabilities can be valuable for science, statistics, and engineering. By considering the information content of different output patterns, recent work invoking algorithmic information theory has shown that a priori probability predictions based on pattern complexities can be made in a broad class of input-output maps. These algorithmic probability predictions do not depend on a detailed knowledge of how output patterns were produced, or historical statistical data. Although quantitatively fairly accurate, a main weakness of these predictions is that they are given as an upper bound on the probability of a pattern, but many low complexity, low probability patterns occur, for which the upper bound has little predictive value. Here we study this low complexity, low probability phenomenon by looking at example maps, namely a finite state transducer, natural time series data, RNA molecule structures, and polynomial curves. Some mechanisms causing low complexity, low probability behaviour are identified, and we argue this behaviour should be assumed as a default in the real world algorithmic probability studies. Additionally, we examine some applications of algorithmic probability and discuss some implications of low complexity, low probability patterns for several research areas including simplicity in physics and biology, a priori probability predictions, Solomonoff induction and Occam's razor, machine learning, and password guessing.
ARTICLE | doi:10.20944/preprints202205.0313.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Failure Prediction; Asynchronous motor; Neural Network
Online: 24 May 2022 (03:37:35 CEST)
Three-phase motors are commonly adopted in several industrial contexts, and their failures can result in costly downtime and undesired service outages; motor diagnostics is therefore of great importance. To prevent such failures and respond promptly to service outages, a non-invasive method to identify electrical and mechanical faults in three-phase asynchronous electric motors is proposed in the paper. In particular, a measurement strategy along with a machine learning algorithm based on an Artificial Neural Network is exploited to properly classify failures. Digitized current samples of each motor phase are first processed by means of FFT and PSD in order to estimate the associated spectrum. Suitable features (in terms of frequency and amplitude of the spectral components) are then singled out to either train or feed a neural network acting as a classifier. The method is preliminarily validated on a set of 28 electric motors, and its performance is compared with common state-of-the-art machine learning techniques. The obtained results show that the proposed methodology is able to reach accuracy levels greater than 98% in identifying anomalous conditions of three-phase asynchronous motors.
ARTICLE | doi:10.20944/preprints202011.0366.v1
Subject: Engineering, Automotive Engineering Keywords: Fault detection; Control Valve; Reliability, Prediction
Online: 13 November 2020 (09:23:39 CET)
Reliability assessment is an important tool for process plants: a facility consists of many loops and attached instruments whose operation depends on each other's availability, so a statistical method is required to visualize the reliability. The paper focuses on reliability assessment and prediction based on available statistical models such as the normal, log-normal, exponential, and Weibull distributions. It also visualizes which model fits best for assessment and prediction, and considers the failure modes that arise during process control operation in simulation mode. A simulation model is designed to observe the control valve failure associated with stiction, in order to visualize the failure modes and predict the best-fit model for reliability assessment.
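Two of the models named above have simple closed-form reliability functions, sketched here for orientation. The parameter values in the example are hypothetical, and fitting the parameters to failure data is not shown.

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape): probability of surviving past time t."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0 and shape, scale must be > 0")
    return math.exp(-((t / scale) ** shape))


def exponential_reliability(t, rate):
    """R(t) = exp(-rate * t): the Weibull model with shape 1 and scale 1 / rate."""
    if t < 0 or rate <= 0:
        raise ValueError("t must be >= 0 and rate must be > 0")
    return math.exp(-rate * t)


if __name__ == "__main__":
    # Hypothetical valve with a characteristic life of 1000 hours.
    for hours in (0, 250, 500, 1000):
        print(hours, round(weibull_reliability(hours, shape=1.5, scale=1000.0), 4))
```

A shape parameter above 1 models wear-out (a failure rate that rises with age), while a shape of exactly 1 reduces to the exponential model with a constant failure rate.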
ARTICLE | doi:10.20944/preprints202008.0139.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: copper price; prediction; support vector regression
Online: 6 August 2020 (08:26:35 CEST)
Predicting the copper price is essential for making decisions that can affect companies and governments dependent on the copper mining industry. Copper prices follow a time series that is non-linear and non-stationary, with periods that change as a result of potential growth, cyclical fluctuation and errors. Sometimes the trend and cyclical components together are referred to as a trend-cycle. In order to make predictions, it is necessary to consider the different characteristics of the trend-cycle. In this paper, we study a copper price prediction method using Support Vector Regression. This work explores the potential of Support Vector Regression with external recurrences to make predictions at 5, 10, 15, 20 and 30 days into the future of the copper closing price at the London Metal Exchange. The best model for each forecast interval is selected using a grid search and balanced cross-validation. In experiments on real data sets, our results indicate that the parameters (C, ε, γ) of the Support Vector Regression model do not differ between the different prediction intervals. Additionally, the number of preceding values used to make the estimates does not vary with the prediction interval. Results show that the support vector regression model has a lower prediction error and is more robust. Our results show that the presented model is able to predict copper price volatilities close to reality, with an RMSE equal to or less than 2.2% for prediction periods of 5 and 10 days.
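The role of the "preceding values" and of the forecast horizon can be made concrete with a sketch of how a price series might be turned into training pairs for a regressor. The function name and arguments are hypothetical, and the regression step itself (for example a Support Vector Regression from a library) is deliberately left out.

```python
def make_windows(prices, n_lags, horizon):
    """Build (inputs, target) pairs for forecasting `horizon` steps ahead.

    Each input is a run of `n_lags` consecutive prices; its target is the
    price observed `horizon` steps after the last price in that run.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be >= 1")
    inputs, targets = [], []
    for end in range(n_lags, len(prices) - horizon + 1):
        inputs.append(list(prices[end - n_lags:end]))
        targets.append(prices[end + horizon - 1])
    return inputs, targets


if __name__ == "__main__":
    closing = [float(p) for p in range(10)]   # toy closing prices
    X, y = make_windows(closing, n_lags=3, horizon=2)
    for window, target in zip(X, y):
        print(window, "->", target)
```

Building the pairs once per horizon (5, 10, 15, 20 or 30 days) keeps the model inputs identical and changes only the target, which makes the tuned parameters comparable across horizons.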
ARTICLE | doi:10.20944/preprints202002.0095.v1
Subject: Earth Sciences, Other Keywords: permafrost; temperature; nonlinear fitting; prediction model
Online: 7 February 2020 (11:31:37 CET)
The pile foundation in the permafrost region is in a negative temperature environment, so the concrete is affected by the negative temperature of the surrounding soil. It not only affects the formation of concrete strength, but also leads to engineering quality accidents in serious cases. Based on the actual measurement of temperature at different strata depths and the comprehensive consideration of surface temperature, terrestrial heat flux and other parameters, the law curve of temperature change along depth in Greater Khingan is established. The calculated results of the curve are consistent with the measured results of ground temperature. The results show that the variation trend of ground temperature along the strata depth at different monitoring sites is basically the same. From June to November, the ground temperature at different depths tends to be constant. From December to May, the ground temperature at any depth within the depth range of 0 to 5.5 m follows the law of the cosine function. Below 5.5 m, the earth temperature no longer varies with depth. The research results can be used as reference for pile foundation construction under negative temperature environment.
ARTICLE | doi:10.20944/preprints201909.0238.v1
Subject: Engineering, Control & Systems Engineering Keywords: Software runtime entropy; failure prediction; indicator
Online: 20 September 2019 (10:49:11 CEST)
With the development of computer science and software engineering, software is becoming more and more complex. Traditional software reliability assurance techniques, including software testing and evaluation, cannot ensure reliable execution of software after deployment. Software failure prediction techniques based on failure indicators can predict software failures from abnormal indicator values, which can be collected using runtime monitoring techniques. An essential part of this method is finding proper indicators that correlate strongly with software failures. In this work we propose a novel type of indicator, named software runtime entropy, which takes both the execution time and the number of calls of software modules into consideration. Three common open-source programs, grep, flex and gzip, are used as case studies for finding the relationships between the indicators and software failures. First, a series of fault injection experiments is conducted on each of the three programs. The decision tree algorithm is then trained on the resulting data to build correlation models between software runtime entropy and software failures. Several common machine learning measures, such as accuracy, recall, and F-measure, are used to evaluate the models. The decision tree models can be used as failure mechanisms to assist failure prediction. One can examine the value of the runtime entropy and issue a warning report when it moves from the normal interval into the abnormal one.
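The abstract does not give the formula for software runtime entropy, so the sketch below uses a stand-in: the Shannon entropy of how execution time (or call counts) is distributed across modules, plus a simple interval check. Both functions, the module figures and the interval bounds are assumptions made for illustration only.

```python
import math


def shannon_entropy(values):
    """Shannon entropy (in bits) of the shares that non-negative values
    contribute to their total. Stand-in for the paper's indicator."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("values must be non-negative with a positive sum")
    shares = [v / total for v in values if v > 0]
    return sum(p * math.log2(1.0 / p) for p in shares)


def is_abnormal(entropy, normal_low, normal_high):
    """Flag an entropy value that falls outside the interval seen in normal runs."""
    return not (normal_low <= entropy <= normal_high)


if __name__ == "__main__":
    # Hypothetical per-module execution times (ms) for one monitored run.
    exec_time_ms = [120.0, 80.0, 40.0, 10.0]
    h = shannon_entropy(exec_time_ms)
    print(round(h, 3), is_abnormal(h, normal_low=1.2, normal_high=1.9))
```

In practice the normal interval would have to be learned from fault-free runs; here it is simply supplied by hand.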
REVIEW | doi:10.20944/preprints202010.0510.v1
Subject: Life Sciences, Biochemistry Keywords: disease-associated mutation; IDR; intrinsically disordered region; LLPS; phase separation; PTM; Ahr; AhRR; SIM1; SIM2; Hif-2α; NPAS4; ARNT2; BMAL1; disorder prediction; LLPS prediction; cancer; HuVarBase; catGranule prediction
Online: 26 October 2020 (10:30:47 CET)
The bHLH-PAS proteins are a family of transcription factors regulating the expression of a wide range of genes involved in different functions, from differentiation and development control, through oxygen and toxin sensing, to circadian clock setting. In addition to the well-preserved DNA-binding bHLH and PAS domains, bHLH-PAS proteins contain long intrinsically disordered C-terminal regions, responsible for the regulation of their activity. Our aim was to analyse the potential connection between the disordered regions of the bHLH-PAS transcription factors, posttranslational modifications and liquid-liquid phase separation in the context of disease-associated missense mutations. Highly flexible disordered regions, enriched in short, more ordered motifs, are responsible for a wide spectrum of interactions with transcriptional co-regulators. Based on our in silico analysis, and taking into account the fact that the functions of transcription factors can be modulated by posttranslational modifications and spontaneous phase separation, we assume that the location of missense mutations inducing disease states is clearly related to sequences directly undergoing these processes or to sequences responsible for the regulation of their activity.
ARTICLE | doi:10.20944/preprints202209.0277.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: link prediction; AUC-ROC; Early retrieval evaluation
Online: 19 September 2022 (10:31:53 CEST)
Link prediction is an unbalanced early retrieval problem, whose goal is to prioritize a small cohort of positive links on top of a list largely populated by unlabelled links. Differently from binary classification, here the evaluation focuses on how the predictor prioritizes the positive class because, in practice, a negative class does not exist. Previous studies explained that AUC-ROC is not apt for unbalanced class problems and is misleading for early retrieval problems, therefore standard AUC-ROC is not appropriate for evaluation of link prediction. However, some scholars argue that an AUC-ROC like evaluation accounting for the relative positioning of the few positive links among the vastness of unlabelled links remains a valid concept to pursue. Here we propose the area under the magnified ROC (AUC-mROC), a new measure that adjusts the standard AUC-ROC to work also for unbalanced early retrieval problems such as link prediction.
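For reference, the standard AUC-ROC that the abstract takes as its starting point can be computed as the probability that a positive item is scored above a non-positive one. The sketch below shows that baseline next to a simple early-retrieval measure (precision at k); it does not implement AUC-mROC, whose definition is not given in the abstract, and the example scores are hypothetical.

```python
def auc_roc(scores, labels):
    """Standard AUC-ROC: chance that a positive outranks a non-positive (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one non-positive item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k):
    """Share of positives among the k highest-scored items."""
    if k < 1:
        raise ValueError("k must be >= 1")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


if __name__ == "__main__":
    # Two positive links hidden among eight unlabelled ones (label 0).
    scores = [0.95, 0.90, 0.85, 0.80, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05]
    labels = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
    print(auc_roc(scores, labels), precision_at_k(scores, labels, k=3))
```

In the toy example the AUC-ROC looks respectable even though none of the top three candidates is a true link, which is the kind of mismatch that motivates an early-retrieval-aware measure.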
ARTICLE | doi:10.20944/preprints202207.0226.v1
Subject: Earth Sciences, Environmental Sciences Keywords: yield prediction; APSIM; optimization; Bayesian; hierarchical; emulation
Online: 15 July 2022 (05:44:05 CEST)
The enormous increase in the volume of Earth Observations (EOs) has provided the scientific community with unprecedented temporal, spatial, and spectral information. However, this increase in the volume of EOs has not yet resulted in proportional progress in our ability to forecast agricultural systems. This study examines the applicability of EOs obtained from Sentinel2 and Landsat8 for constraining the APSIM-Maize model parameters. We leveraged leaf area index (LAI) retrieved from Sentinel2 and Landsat8 NDVI to constrain a series of APSIM-Maize model parameters in three different Bayesian multi-criteria optimization frameworks across 13 different sites in the U.S. Midwest. A time-variant sensitivity analysis was performed to identify the most influential parameters driving the LAI estimates in the APSIM-Maize model. Surrogate models were then developed using random samples taken from the parameter space by Latin hypercube sampling, to emulate APSIM's behavior in simulating NDVI and LAI at all sites. Site-level, global and hierarchical Bayesian optimization models were then developed using the site-level emulators to simultaneously constrain all parameters and estimate the site-to-site variability in crop parameters. For within-sample predictions, site-level optimization showed the largest predictive uncertainty around LAI and crop yield, whereas the global optimization showed the most constrained predictions for these variables. The lowest RMSE for within-sample yield prediction was found for the hierarchical optimization scheme (1423 kg ha−1), while the largest RMSE was found for the site-level scheme (1494 kg ha−1). In out-of-sample predictions within the spatio-temporal extent of the training sites, global optimization showed a lower RMSE (1627 kg ha−1) than the hierarchical approach (1822 kg ha−1) across 90 independent sites in the U.S. Midwest. 
In a comparison between these two optimization schemes across another 242 independent sites outside the spatio-temporal extent of the training sites, global optimization also showed a substantially lower RMSE (1554 kg ha−1) than the hierarchical approach (2532 kg ha−1). Overall, EOs demonstrated their real use case for constraining process-based crop models and showed comparable results to model calibration exercises using only field measurements.
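Latin hypercube sampling, used above to draw the training points for the surrogate models, can be sketched in a few lines. The parameter bounds in the example are hypothetical placeholders rather than actual APSIM-Maize parameter ranges.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that every parameter has exactly one value
    in each of n_samples equal-width strata of its range.

    bounds is a list of (low, high) pairs, one per parameter.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be >= 1")
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = high - low
        # One uniformly placed point inside each stratum of this parameter.
        points = [low + width * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


if __name__ == "__main__":
    # Two hypothetical crop parameters with made-up ranges.
    for sample in latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0)], seed=1):
        print(sample)
```

Compared with plain random sampling, this spreads a small number of expensive crop-model runs evenly over each parameter's range.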
ARTICLE | doi:10.20944/preprints202207.0035.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: QoE; Fairness; SDN; Classification Prediction; DASH; Multimedia
Online: 4 July 2022 (06:08:03 CEST)
Quality of Experience (QoE) metrics can be used to assess user perception and satisfaction in data services applications delivered over the Internet. End-to-end metrics are formed because QoE is dependent on both the users’ perception and the service used. Traditionally, network optimization has focused on improving network properties such as quality of service (QoS). In this paper, we examine adaptive streaming in a software-defined network environment. We aimed to evaluate and study the media streams, the aspects affecting the streams, and the network. This was done to eventually reach a stage of analysing the network’s features and their direct relationship with the perceived QoE. We then use machine learning to build a prediction model based on subjective user experiments. This will help to eliminate future physical experiments and automate the process of predicting QoE.
ARTICLE | doi:10.20944/preprints202202.0175.v1
Subject: Life Sciences, Biotechnology Keywords: antimicrobial peptide prediction; sequence analysis; random forest
Online: 14 February 2022 (11:57:01 CET)
Antimicrobial peptides (AMPs) are considered promising alternatives to conventional antibiotics for overcoming the growing problem of antibiotic resistance. Computational prediction approaches are receiving increasing interest as a way to identify and design the best candidate AMPs prior to in vitro tests. In this study, we focused on linear cationic peptides with non-hemolytic activity, which were downloaded from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). Referring to the MIC (minimum inhibitory concentration) values, we assigned a positive label to a peptide if it shows antimicrobial activity; otherwise the peptide is labeled as negative. We considered peptides showing antimicrobial activity against Gram-negative and against Gram-positive bacteria separately, and created two datasets accordingly. Ten different physico-chemical properties of the peptides were calculated and used as features in our study. Following data exploration and data preprocessing steps, a variety of classification algorithms were used with 100-fold Monte Carlo cross-validation to build models and to predict the antimicrobial activity of the peptides. Among the generated models, Random Forest yielded the best performance metrics for both the Gram-negative dataset (Accuracy: 0.98, Recall: 0.99, Specificity: 0.97, Precision: 0.97, AUC: 0.99, F1: 0.98) and the Gram-positive dataset (Accuracy: 0.95, Recall: 0.95, Specificity: 0.95, Precision: 0.90, AUC: 0.97, F1: 0.92) after outlier elimination was applied. This prediction approach might be useful for evaluating the antibacterial potential of a candidate peptide sequence before moving to experimental studies.
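As a rough illustration of the evaluation scheme named in this abstract, the sketch below generates repeated random hold-out splits (Monte Carlo cross-validation) and computes accuracy, recall, specificity, precision, and F1 from confusion-matrix counts. The 20% test fraction, the item count, and the confusion counts are assumptions chosen for the example, not values from the study, and AUC is omitted because it requires ranked scores.

```python
import random

def monte_carlo_splits(n_items, n_repeats=100, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for repeated random hold-out splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_items * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]

def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical dataset size and confusion counts, for illustration only.
splits = list(monte_carlo_splits(n_items=50))
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

In practice a model would be fitted on each training split and the metrics averaged over the repeats.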
ARTICLE | doi:10.20944/preprints202202.0145.v1
Subject: Mathematics & Computer Science, Other Keywords: Smart grids; Optimization; Prediction methods; Energy exchange
Online: 10 February 2022 (07:55:02 CET)
The concept of distributed generation has made photovoltaics an integral source of energy in smart grid systems, especially in peer-to-peer (P2P) energy trading frameworks that exploit excess power to fulfill the energy requirements of consumers in a cost-efficient and eco-friendly manner. It is believed that P2P energy trading will dominate a significant portion of research in forthcoming power generation systems due to the excessive rise of energy demands across the globe. Despite a plethora of studies on energy optimization solutions in P2P trading, minimizing nanogrid energy trading cost and sharing energy efficiently between consumers and prosumers remain challenging problems. This study addresses essential issues overlooked by contemporary P2P energy trading models by introducing a predictive optimization-oriented nanogrid energy trading model. The proposed study encompasses two stages: (1) a predictive optimization model that harnesses BD-LSTM-based forecasts of energy parameters (energy load, energy consumption, and PV generation), which are later incorporated in a PSO-enabled objective function to reduce nanogrid trading cost; and (2) an optimal energy sharing plan that decides the role of nanogrids as prosumers or consumers by emphasizing the use of PV-produced energy. The proposed model is validated on a case study containing data from nanogrid houses. The simulation provides detailed experiments comparing the energy demand and response using the proposed energy sharing model. The outcomes show that the energy sharing plan holds significant potential to fulfill the maximum energy requirements of nanogrid houses in a P2P cluster and significantly reduces the energy cost compared to the grid.
ARTICLE | doi:10.20944/preprints202111.0030.v1
Subject: Physical Sciences, General & Theoretical Physics Keywords: reservoir computing; time series prediction; performance optimisation
Online: 2 November 2021 (10:09:46 CET)
Reservoir computing is a machine learning method that uses the response of a dynamical system to a certain input in order to solve a task. As the training scheme only involves optimising the weights of the responses of the dynamical system, this method is particularly suited for hardware implementation. Furthermore, the inherent memory of dynamical systems that are suitable for use as reservoirs means that this method has the potential to perform well on time series prediction tasks, as well as other tasks with time dependence. However, reservoir computing still requires extensive task-dependent parameter optimisation in order to achieve good performance. We demonstrate that by including a time-delayed version of the input for various time series prediction tasks, good performance can be achieved with an unoptimised reservoir. Furthermore, we show that by including the appropriate time-delayed input, one unaltered reservoir can perform well on six different time series prediction tasks at a very low computational expense. Our approach is of particular relevance to hardware-implemented reservoirs, as one does not necessarily have access to pertinent optimisation parameters in physical systems, but the inclusion of an additional input is generally possible.
ARTICLE | doi:10.20944/preprints202103.0337.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Graph embedding; Link prediction; Mutual information; Subgraph
Online: 12 March 2021 (08:47:29 CET)
The prediction of drug-target interactions is a key task in the field of drug redirection. However, traditional methods of predicting drug-target interactions are either mediocre or rely heavily on data stacking. In this work, we merged heterogeneous graph information and obtained effective node information and substructure information based on mutual information in graph embeddings. We then learned high-quality representations for downstream tasks, and proposed an end-to-end auto-encoder model to complete the task of link prediction. Experimental results show that our method outperforms several state-of-the-art models. The model achieves an area under the receiver operating characteristic curve (AUROC) of 0.959 and an area under the precision-recall curve (AUPR) of 0.848. We found that the mutual information between the substructure and graph-level representations contributes most to the mutual information index in a relatively sparse network, whereas the mutual information between the node-level and graph-level representations contributes most in a relatively dense network.
Subject: Biology, Animal Sciences & Zoology Keywords: intramuscular fat; prediction; image analysis; Bísaro pork
Online: 13 January 2021 (13:16:19 CET)
This work presents an analytical methodology to predict meat juiciness (discriminant semi-quantitative analysis using groups of intervals of intramuscular fat) and intramuscular fat (regression analysis) in the Longissimus thoracis et lumborum (LTL) muscle of Bísaro pigs, using as independent variables the animal carcass weight and parameters from color and image analysis. These are non-invasive and non-destructive techniques which allow the development of rapid, easy, and inexpensive methodologies to evaluate pork meat quality in a slaughterhouse. The proposed predictive supervised multivariate models were non-linear. Discriminant mixture analysis was used to evaluate meat juiciness by classifying samples into three groups: 0.6 to 1.1%; 1.25 to 1.5%; and greater than 1.5%. The obtained model allowed 100% correct classifications (92% in cross-validation with seven folds and five repetitions). Polynomial support vector machine regression to determine the intramuscular fat presented R2 and RMSE values of 0.88 and 0.12, respectively, in cross-validation with seven folds and five repetitions. This quantitative model (the model’s polynomial kernel optimized to a degree of three with a scale factor of 0.1 and a cost value of one) presented R2 and RSE values of 0.999 and 0.04, respectively. The overall predictive results demonstrated the relevance of photographic image and color measurements of the muscle for evaluating the intramuscular fat, rather than the usual time-consuming and expensive chemical analysis.
ARTICLE | doi:10.20944/preprints202009.0521.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: electroencephalographic; feature selection; machine learning; prediction model
Online: 22 September 2020 (11:27:03 CEST)
In recent years, research has focused on generating mechanisms to assess the levels of subjects' cognitive workload when performing various activities that demand high concentration levels, such as driving a vehicle. These mechanisms have implemented several tools to analyze cognitive workload, of which electroencephalographic (EEG) signals are the most used due to their high precision. However, one of the main challenges in using EEG signals is finding the appropriate information to identify cognitive states. Here we show a new feature selection model for pattern recognition using information from EEG signals based on machine learning techniques, called GALoRIS. GALoRIS combines Genetic Algorithms and Logistic Regression to create a new fitness function that identifies and selects the critical EEG features that contribute to recognizing high and low cognitive workload, and structures a new dataset capable of optimizing the model's predictive process. We found that GALoRIS identifies data related to high and low cognitive workload of subjects while driving a vehicle using information extracted from multiple EEG signals, reducing the original dataset by more than 50% and maximizing the model's predictive capacity, achieving a precision rate greater than 90%.
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Crime prediction; Ensemble Learning; Machine Learning; Regression
Online: 14 September 2020 (00:53:30 CEST)
While the use of crime data has been widely advocated in the literature, its availability is often limited to large urban cities and isolated databases tend not to allow for spatial comparisons. This paper presents an efficient machine learning framework capable of predicting spatial crime occurrences, without using past crime as a predictor, and at a relatively high resolution: the U.S. Census Block Group level. The proposed framework is based on an in-depth multidisciplinary literature review allowing the selection of 188 best-fit crime predictors from socio-economic, demographic, spatial, and environmental data. Such data are published periodically for the entire United States. The selection of the appropriate predictive model was made through a comparative study of different machine learning families of algorithms, including generalized linear models, deep learning, and ensemble learning. The gradient boosting model was found to yield the most accurate predictions for violent crimes, property crimes, motor vehicle thefts, vandalism, and the total count of crimes. Extensive experiments on real-world datasets of crimes reported in 11 U.S. cities demonstrated that the proposed framework achieves an accuracy of 73 and 77% when predicting property crimes and violent crimes, respectively.
ARTICLE | doi:10.20944/preprints202005.0176.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: COVID-19; epidemic diseases; compartmental model; prediction
Online: 10 May 2020 (17:10:15 CEST)
In India, the first case of coronavirus disease 2019 (COVID-19) was reported on 30 January 2020, and cases increased daily after the last week of February 2020. COVID-19 has been identified as a member of the Coronaviridae family, to which Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS) also belong. COVID-19 attacks the respiratory system, with symptoms including fever, cough, and shortness of breath; severe cases may lead to pneumonia, SARS, or sometimes death. The aim of this study is to develop a model that predicts the epidemic peak for COVID-19 in India using real-time data from 30 January to 10 May 2020. Because the population information is uncertain owing to incomplete and inaccurate data, we start from the most popular compartmental model for epidemic prediction, the Susceptible, Exposed, Infectious, and Recovered (SEIR) model. Based on the solution of the state estimation problem for polynomial systems with Poisson noise, we estimate that the epidemic peak may be reached in early to mid-July 2020, initializing recovered R0 to 0 and infected I0 to 1. The outcomes of the model will help epidemiologists to isolate the source of the disease geospatially and analyze deaths. Government authorities will also be able to target their interventions to rapidly check the spread of the epidemic.
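To make the SEIR compartmental structure concrete, here is a minimal deterministic sketch integrated with forward Euler steps. It is not the state-estimation method with Poisson noise described in the abstract; the population size and the rates `beta`, `sigma`, and `gamma` are hypothetical values chosen only so that the example runs. The initial conditions follow the abstract (one infected, zero recovered).

```python
def simulate_seir(population, beta, sigma, gamma, days, i0=1.0, r0=0.0, dt=0.1):
    """Integrate a deterministic SEIR model with forward Euler steps.

    beta: transmission rate, sigma: rate of leaving the exposed state,
    gamma: recovery rate (all per day). Returns one row per day.
    """
    s, e, i, r = population - i0 - r0, 0.0, i0, r0
    steps_per_day = round(1 / dt)
    history = [(0, s, e, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((day, s, e, i, r))
    return history

# Hypothetical parameters, for illustration only.
history = simulate_seir(population=1_000_000, beta=0.5, sigma=0.2, gamma=0.1, days=300)
# Day on which the infectious compartment is largest.
peak_day = max(history, key=lambda row: row[3])[0]
```

The peak day depends entirely on the assumed rates, so the value produced here says nothing about the July 2020 estimate reported above.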
ARTICLE | doi:10.20944/preprints202004.0539.v1
Online: 30 April 2020 (16:50:03 CEST)
With the coronavirus (COVID-19) pandemic, other infectious diseases such as dengue have been neglected in Indonesia. Although the majority of resources, both human and capital, are focused on COVID-19, it remains essential to manage dengue, as it is still a threat to the community. This paper aims to predict the number of dengue cases in Kupang, East Nusa Tenggara, which can help the government to plan dengue program activities. The forecast shows that dengue cases will remain high for the whole year. With the stay-at-home approach to preventing COVID-19, the chance of contracting the dengue virus increases. Maintaining a clean environment, reducing breeding sites, and taking other protective measures against dengue transmission are very important.
ARTICLE | doi:10.20944/preprints202004.0421.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: COVID-19; trend prediction; optimized neural network
Online: 24 April 2020 (02:57:32 CEST)
The recent worldwide outbreak of the novel coronavirus (COVID-19) has opened up new challenges to the research community. Artificial intelligence (AI) driven methods can be useful to predict the parameters, risks, and effects of such an epidemic. Such predictions can be helpful to control and prevent the spread of such diseases. The main challenges in applying AI are the small volume of data and its uncertain nature. Here, we propose a shallow long short-term memory (LSTM) based neural network to predict the risk category of a country. We have used a Bayesian optimization framework to optimize and automatically design country-specific networks. We have combined the trend data and weather data for the prediction. The results show that the proposed pipeline outperforms state-of-the-art methods on data from 170 countries and can be a useful tool for such risk categorization. The tool can be used to predict a long-duration outbreak of such an epidemic so that preventive steps can be taken earlier.
Subject: Chemistry, General & Theoretical Chemistry Keywords: structure prediction; Rosetta; computational modeling; protein design
Online: 16 October 2019 (05:40:52 CEST)
The Rosetta software suite for macromolecular modeling, docking, and design is widely used in pharmaceutical, industrial, academic, non-profit, and government laboratories. Considering its broad modeling capabilities, Rosetta consistently ranks highly when compared to other leading methods created for highly specialized protein modeling and design tasks. Developed for over two decades by a global community of scientists at more than 60 institutions, Rosetta has undergone multiple refactorings, and now comprises over three million lines of code. Here we discuss the methods developed in the last five years, involving the latest protocols for structure prediction, protein–protein and protein–small molecule docking, protein structure and interface design, loop modeling, the incorporation of various types of experimental data, and modeling of peptides, antibodies and other proteins in the immune system, nucleic acids, non-standard amino acids, carbohydrates, and membrane proteins. We briefly discuss improvements to the energy function, user interfaces, and usability of the software. Rosetta is available at www.rosettacommons.org.
ARTICLE | doi:10.20944/preprints201805.0015.v1
Online: 2 May 2018 (08:12:01 CEST)
We examine the use of deep learning (neural networks) to predict the movement of the S&P 500 Index using past returns of all the stocks in the index. Our analysis finds that the future direction of the S&P 500 index can be weakly predicted by the prior movements of the underlying stocks in the index. Decomposition of the prediction error indicates that most of the lack of predictability comes from randomness and only a little from nonstationarity. We believe this is the first test of S&P500 market efficiency that uses a very large information set, and it extends the domain of weak-form market efficiency tests.
ARTICLE | doi:10.20944/preprints201701.0063.v1
Subject: Materials Science, General Materials Science Keywords: HfB4; structure prediction; superhard material; anisotropic properties
Online: 12 January 2017 (10:57:03 CET)
By using the particle swarm optimization algorithm for crystal structure prediction, we reveal a new orthorhombic Cmcm structure of HfB4, which is energetically superior to the previously proposed YB4-, ReP4-, FeB4-, CrB4-, and MnB4-type structures in the considered pressure range. The phonon dispersion and elastic constants calculations confirm that the new phase is dynamically and mechanically stable. The calculated large shear modulus (240 GPa) and high hardness (45.7 GPa) imply that the predicted Cmcm-HfB4 is a potential superhard material. Meanwhile, the directional dependences of the Young's modulus, bulk modulus, and shear modulus for HfB4 are systematically investigated. Further analyses of the density of states and electronic localization function indicate that the strong B-B and B-Hf covalent bonds greatly contribute to its high hardness and stability.
Subject: Mathematics & Computer Science, Numerical Analysis & Optimization Keywords: harmony search; meta-heuristic; parameter optimization; software defect prediction; just-in-time prediction; software quality assurance; maintenance; maritime transportation
Online: 31 December 2020 (09:27:46 CET)
Software is playing the most important role in recent vehicle innovation, and consequently the amount of software has been growing rapidly over the last decades. The safety-critical nature of ships, one type of vehicle, makes Software Quality Assurance (SQA) a fundamental prerequisite. Just-In-Time Software Defect Prediction (JIT-SDP) aims to conduct software defect prediction (SDP) on commit-level code changes to achieve effective SQA resource allocation. The first case study of SDP in the maritime domain reported feasible prediction performance. However, we consider that the prediction model still has room for improvement, since its parameters have not yet been optimized. Harmony Search (HS) is a widely used music-inspired meta-heuristic optimization algorithm. In this article, we demonstrate that JIT-SDP can achieve better prediction performance by applying HS-based parameter optimization with a balanced fitness value. Using two real-world datasets from a maritime software project, we obtained an optimized model that exceeds the baseline performance of the previous case study across various defect to non-defect class imbalance ratios. Experiments with open source software also showed better recall for all datasets, even though we used balance as the performance index. HS-based parameter-optimized JIT-SDP can be applied to maritime domain software with a high class imbalance ratio. Finally, we expect that our research can be extended to improve the performance of JIT-SDP not only in maritime domain software but also in open source software.
ARTICLE | doi:10.20944/preprints202208.0119.v1
Subject: Earth Sciences, Environmental Sciences Keywords: LULC; prediction; artificial neural network; Urmia; CA-Markov
Online: 5 August 2022 (09:32:32 CEST)
A correctly obtained land-use/land-cover (LULC) prediction map is essential to understanding and assessing future patterns. In this study, the LULC map of Urmia, Iran, for 2030 was produced using two different prediction methods, CA-Markov and Artificial Neural Network (ANN). The study followed a methodology consisting of three steps. In the first step, Landsat satellite images acquired in 2000, 2010, and 2020 were classified with the maximum likelihood algorithm, and LULC maps were prepared for each year. In the second step, to validate the LULC prediction methods (CA-Markov and ANN), the LULC prediction map of 2020 was produced using the LULC maps of 2000 and 2010; the predicted LULC map of 2020 and the actual LULC map of 2020 were then compared using correctness, completeness, and quality indexes. Finally, the LULC map for 2030 was prepared using both algorithms and the change map was extracted. The results show that the area of soil and vegetation decreased, and built-up regions increased, during the research period. The validation results show that the two algorithms perform very similarly. Nevertheless, in general, ANN has the highest completeness (96.21%) and quality (93.8%), and CA-Markov the highest correctness (96.47%). This study shows that the CA-Markov algorithm is most successful in predicting classes that cover larger areas and a higher percentage of the region (urban and vegetation cover), and the ANN algorithm in predicting classes that cover smaller areas and lower percentages (soil and rock).
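The abstract does not define its correctness, completeness, and quality indexes. The sketch below assumes the definitions commonly used in map-accuracy assessment, based on true positives (TP), false positives (FP), and false negatives (FN) for one class: correctness = TP / (TP + FP), completeness = TP / (TP + FN), and quality = TP / (TP + FP + FN). The class labels and pixel lists are hypothetical.

```python
def map_agreement(predicted, actual, target_class):
    """Compare a predicted and an actual class map (flattened pixel lists)
    for one class, under the assumed index definitions."""
    tp = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p == target_class and a == target_class:
            tp += 1
        elif p == target_class:
            fp += 1  # predicted as the class, but actually something else
        elif a == target_class:
            fn += 1  # actually the class, but predicted as something else
    return {
        "correctness": tp / (tp + fp),
        "completeness": tp / (tp + fn),
        "quality": tp / (tp + fp + fn),
    }

# Hypothetical six-pixel maps, for illustration only.
predicted = ["urban", "urban", "soil", "urban", "vegetation", "urban"]
actual = ["urban", "soil", "soil", "urban", "urban", "urban"]
indexes = map_agreement(predicted, actual, "urban")
```

For these toy maps the "urban" class has three true positives, one false positive, and one false negative.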
ARTICLE | doi:10.20944/preprints202201.0378.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: staphylococcus aureus; infective endocarditis; clinical prediction rules; echocardiography
Online: 25 January 2022 (10:41:47 CET)
Background. It is unclear whether the use of clinical prediction rules is sufficient to rule out infective endocarditis (IE) in patients with Staphylococcus aureus bacteremia (SAB) without an echocardiogram evaluation, either transthoracic (TTE) and/or transesophageal (TEE). Our primary purpose was to test the usefulness of PREDICT, POSITIVE and VIRSTA scores to rule out IE without echocardiography. Our secondary purpose was to evaluate whether not performing an echocardiogram evaluation is associated with higher mortality. Methods. We conducted a unicentric retrospective cohort including all patients with a first SAB episode from January 2015 to December 2020. IE was defined according to modified Duke criteria. We predefined threshold cut-off points to consider that IE was ruled out by means of the mentioned scores. To assess 30-day mortality, we used a multivariable regression model considering performing an echocardiogram as covariate. Results. Out of 404 patients, IE was diagnosed in 50 (12.4%). Prevalence of IE within patients with negative PREDICT, POSITIVE and VIRSTA scores was: 3.6% (95% CI 0.1-6.9%), 4.9% (95% CI 2.2-7.7%), and 2.2% (95% CI 0.2-4.3%), respectively. Patients with negative VIRSTA and negative TTE had an IE prevalence of 0.9% (95% CI 0-2.8%). Performing an echocardiogram was independently associated with lower 30-day mortality (OR 0.24 95%CI 0.10-0.54, p=0.001). Conclusion. PREDICT and POSITIVE scores were not sufficient to rule out IE without TEE. In patients with negative VIRSTA score, it was doubtful if IE could be discarded with a negative TTE. Not performing an echocardiogram was associated with worse outcomes, which might be related to presence of occult IE. Further studies are needed to assess the usefulness of clinical prediction rules in avoiding echocardiographic evaluation in SAB patients.
ARTICLE | doi:10.20944/preprints202110.0360.v2
Subject: Mathematics & Computer Science, Other Keywords: Household Disaster Preparation; Natural Hazards Mitigation; Prediction Model
Online: 2 November 2021 (12:57:04 CET)
Natural disasters are increasing in magnitude, frequency, and geographic distribution. Studies have shown that individuals’ self-sufficiency, which largely depends on household preparedness, is very important for hazard mitigation in at least the first 72 hours following a disaster. However, although many studies have tried to identify the factors that influence a household’s disaster preparedness from different angles, we still lack an integrative analysis of how these factors contribute to a household’s preparation. This paper aims to build a classification model to predict whether a household has prepared for a potential disaster based on its members’ personal characteristics and the environment in which it is located. We collect data from the Federal Emergency Management Agency’s 2018 National Household Survey and train four classification models (logistic regression, decision trees, support vector machines, and a multi-layer perceptron classifier) to predict the impact of personal characteristics and the household’s environment on its preparation for a potential natural disaster. Results show that the multi-layer perceptron classifier outperforms the others, with the highest scores on both recall (0.8531) and F1 measure (0.7386). In addition, feature selection results show that, among other factors, a household’s access to disaster-related information is the most critical factor affecting household disaster preparation. Though there is still room for further parameter optimization, the model suggests that disaster management could be supported by gathering publicly accessible data.
ARTICLE | doi:10.20944/preprints202106.0533.v1
Online: 22 June 2021 (08:30:30 CEST)
The novel coronavirus disease (COVID-19) has created immense threats to public health on various levels around the globe. The unpredictable outbreak of this disease and the pandemic situation are causing severe depression, anxiety, and other mental as well as physical health problems. To combat this disease, vaccination is essential, as it boosts the immune system of people who come into contact with infected individuals. The vaccination process is thus necessary to confront the outbreak of COVID-19. This deadly disease has put the social and economic condition of the entire world under enormous strain. Worldwide vaccination progress should be tracked to identify how fast economic and social life will be stabilized. To monitor the vaccination progress, a machine learning based regressor model is applied in this study. The tracking process has been applied to data from 14 December 2020 to 24 April 2021. Several ensemble-based machine learning regressor models, namely Random Forest, Extra Trees, Gradient Boosting, AdaBoost, and Extreme Gradient Boosting, are implemented and their predictive performance is compared. The comparative study reveals that the AdaBoostRegressor performs best, with the lowest mean absolute error (MAE) of 9.968 and root mean squared error (RMSE) of 11.133.
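For reference, the two error measures used to compare the regressors can be computed as in the following sketch. The observed and predicted values are hypothetical and unrelated to the vaccination data.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical values, for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
```

RMSE is never smaller than MAE on the same data, which is consistent with the pair of values reported in the abstract.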
ARTICLE | doi:10.20944/preprints202104.0628.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Food production; machine learning; agricultural production; prediction model
Online: 23 April 2021 (10:20:09 CEST)
Advancing models for accurate estimation of food production is essential for policymaking and managing national plans of action for food security. This research proposes two machine learning models for the prediction of food production. The adaptive network-based fuzzy inference system (ANFIS) and multilayer perceptron (MLP) methods are used to advance the prediction models. In the present study, two variables of livestock production and agricultural production were considered as the source of food production. Three variables were used to evaluate livestock production, namely livestock yield, live animals, and animal slaughtered, and two variables were used to assess agricultural production, namely agricultural production yields and losses. Iran was selected as the case study of the current study. Therefore, time-series data related to livestock and agricultural productions in Iran from 1961 to 2017 have been collected from the FAOSTAT database. First, 70% of this data was used to train ANFIS and MLP, and the remaining 30% of the data was used to test the models. The results disclosed that the ANFIS model with Generalized bell-shaped (Gbell) built-in membership functions has the lowest error level in predicting food production. The findings of this study provide a suitable tool for policymakers who can use this model and predict the future of food production to provide a proper plan for the future of food security and food supply for the next generations.
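The 70%/30% train/test division described in this abstract could look like the sketch below. The abstract does not say whether the split was chronological or random; a chronological split is assumed here because the data are a yearly time series, and the function name is hypothetical.

```python
def chronological_split(records, train_fraction=0.7):
    """Split an ordered sequence into an earlier training part and a later test part."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]

# Yearly records from 1961 to 2017, as in the abstract (years stand in for the data).
years = list(range(1961, 2018))
train_years, test_years = chronological_split(years)
```

With 57 yearly records this assumption gives 39 training years and 18 test years.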
Subject: Engineering, Automotive Engineering Keywords: drought; drought indices; South Asia; prediction; projection; teleconnection
Online: 1 March 2021 (17:52:21 CET)
South Asian countries have experienced frequent drought incidents recently, and for this reason many scientific studies have been carried out to explore drought in South Asia. In this context, we review scientific studies related to drought in South Asia. The study initially identifies the importance of drought-related studies and discusses drought types for South Asian regions. Representative examples of drought events, severity, frequency, and duration in South Asian countries are identified. The Standardized Precipitation Index (SPI) has been the index most often adopted in South Asian countries to quantify and monitor droughts. Nevertheless, the absence of drought quantification studies in Bhutan and the Maldives is of great concern. Future studies to generate a combined drought severity map for the South Asian region are required. Moreover, drought prediction and projection in the region are rarely studied. Further, the teleconnection between drought and large-scale atmospheric circulations in the South Asian area has not been discussed in detail in most of the scientific literature. Therefore, as a take-home message, there is an urgent need for scientific studies related to drought quantification for some regions in South Asia, prediction and projection of drought for individual countries (or for the region as a whole), and drought teleconnection to atmospheric circulation.
ARTICLE | doi:10.20944/preprints202008.0676.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Popularity Prediction; Classification; Social Network; Machine Learning; Instagram
Online: 30 August 2020 (15:56:34 CEST)
Predicting the popularity of posts on social networks has taken on significant importance in recent years, and several social media management tools now offer solutions to improve and optimize the quality of published content and to enhance the attractiveness of companies and organizations. Scientific research has recently moved in this direction, with the aim of exploiting advanced techniques such as machine learning, deep learning, natural language processing, etc., to support such tools. In light of the above, in this work we aim to address the challenge of predicting the popularity of a future post on Instagram, by defining the problem as a classification task and by proposing an original approach based on Gradient Boosting and feature engineering, which led us to promising experimental results. The proposed approach exploits big data technologies for scalability and efficiency and is general enough to be applied to other social media as well.
ARTICLE | doi:10.20944/preprints202007.0697.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: RUL prediction; sensors; IOT; aircraft engine; business intelligence
Online: 29 July 2020 (12:34:24 CEST)
The growing number of smart devices in various industries is adding numerous sensors to each piece of equipment, prompting the need for methods and models for sensor data. The current research proposes a systematic approach to analyzing the data generated by sensors attached to industrial equipment. The methodology involves data cleaning, preprocessing, basic statistics, and outlier and anomaly detection. The study presents the prediction of remaining useful life (RUL) using various machine learning models such as Regression, Polynomial Regression, Random Forest, Decision Tree, and XGBoost. Hyperparameter optimization is performed to find the optimal parameters for each variable. For each RUL prediction model, RMSE and MAE are compared. The outcome of the RUL prediction should be useful for decision makers driving business decisions; hence, binary classification and a business case analysis are performed. The business case analysis includes the cost of maintaining and the cost of not maintaining a particular asset. The research aims to integrate machine intelligence and business intelligence so that industrial operations are optimized in both resources and profit.
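The RMSE and MAE comparison mentioned in this abstract can be illustrated with a minimal sketch. The RUL values and the two model names below are hypothetical and are not taken from the study; only the metric definitions are standard.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more strongly."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical RUL values (in cycles) and predictions from two hypothetical models.
actual = [120, 95, 60, 30]
predictions = {
    "model_a": [110, 100, 65, 20],
    "model_b": [118, 90, 80, 31],
}

for name, predicted in predictions.items():
    print(name, round(rmse(actual, predicted), 2), round(mae(actual, predicted), 2))
```

Because RMSE squares each error before averaging, a model with one large miss can rank worse on RMSE while ranking better on MAE, which is one reason to report both.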
ARTICLE | doi:10.20944/preprints202007.0650.v1
Subject: Mathematics & Computer Science, Other Keywords: Myocarditis; Diagnosis; Convolutional Neural Network; Cardiac MRI; prediction
Online: 26 July 2020 (17:44:05 CEST)
Myocarditis is an inflammation of the middle layer of the heart wall that is caused by a viral infection and can affect the heart muscle and its electrical system. It remains one of the most challenging diagnoses in cardiology. Myocarditis is the prime cause of unexpected death in approximately 20% of adults less than 40 years of age. Cardiac MRI (CMR) is considered a noninvasive, gold-standard diagnostic tool for suspected myocarditis and plays an indispensable role in diagnosing various cardiac diseases. However, the performance of CMR is heavily dependent on the clinical presentation and on non-specific features such as chest pain, arrhythmia, and heart failure. In addition, imaging factors such as artifacts, technical errors, pulse sequence, acquisition parameters, contrast agent dose, and, more importantly, qualitative visual interpretation can affect the result of the diagnosis. This paper introduces a new deep learning-based model called Convolutional Neural Network-Clustering (CNN-KCL) to diagnose myocarditis. The hybrid CNN-KCL method performs early and accurate diagnosis of myocarditis. To the best of our knowledge, a convolutional neural network has never been used before for the diagnosis of myocarditis. In this study, we used data from 47 subjects from Tehran's Omid Hospital to diagnose myocarditis. A total of 10,425 data items were examined. Our results demonstrate that CNN-KCL achieves a myocarditis diagnosis accuracy of 92.3%, which is significantly better than the accuracies reported in previous studies.
ARTICLE | doi:10.20944/preprints202004.0257.v2
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: COVID-19; Predictive Analytics; Machine Learning; Prediction; Pandemic
Online: 14 May 2020 (09:03:52 CEST)
Globally, there is a massive uptake and explosion of data, and the challenge is to address issues such as the scale, pace, velocity, variety, volume and complexity of this big data. Considering the recent epidemic in China, modeling the cumulative number of infected cases of the COVID-19 epidemic using data available in the early phase was a big challenge. Because the COVID-19 pandemic unfolded over a very short time span, it is very important to analyze the trend of its spread and of infected cases. This chapter presents a medical perspective on COVID-19 in terms of the epidemiological triad, together with a study of the state of the art. The main aim of this chapter is to present the different predictive analytics techniques available for trend analysis, different models and algorithms, and their comparison. Finally, the chapter concludes with a prediction of COVID-19 using the Prophet algorithm, indicating faster spread in the short term. These predictions will be useful to government and healthcare communities in initiating appropriate measures to control this outbreak in time.
ARTICLE | doi:10.20944/preprints202004.0466.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: COVID-19; coronavirus; ACE2; bioinformatics analysis; drug prediction
Online: 26 April 2020 (03:14:50 CEST)
The recent outbreak of coronavirus disease 2019 (COVID-19) is threatening human health globally. There is a dire need to find potential therapeutic agents. Angiotensin converting enzyme 2 (ACE2), as an entry receptor of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is considered a potential therapeutic target in the COVID-19 pandemic. Here, our bioinformatics analysis revealed that the biological function of ACE2 was correlated with regulation of blood pressure and mediation of SARS-CoV-2 entry into host cells. Ten ACE2 cooperative proteins were identified with a high score by using STRING. ACE2 was highly expressed in the small intestine, testis, and kidney. The level of ACE2 expression in tumor tissues, compared with that in normal tissues, varies across different types of cancer. It is worth noting that the expression level of ACE2 in the tumor has no effect on patient survival. The miRNA hsa-miR-942-5p and three transcription factors (TFs), namely Signal transducer and activator of transcription 4 (STAT4), Estrogen related receptor α (ESRRA), and Signal transducer and activator of transcription 3 (STAT3), were selected as novel ACE2 regulators. Moreover, nine potential therapeutic drugs were predicted by two online databases. Thus, our research may expand the overall view of ACE2 in COVID-19 treatment.
ARTICLE | doi:10.20944/preprints201908.0042.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Africa; rainfall; variability; prediction; multimodel; superensemble; synthetic; skill
Online: 5 August 2019 (04:48:15 CEST)
Improvements that can be attained in seasonal climate predictions in various parts of Africa using the multimodel superensemble scheme are presented in this study. The synthetic superensemble (SSE) used follows the approach originally developed at Florida State University (FSU). The technique takes greater advantage of the skill in the climate forecast data sets from atmosphere-ocean general circulation models running at many centres worldwide, including the WMO global producing centres (GPCs). The module used in this work drew data sets from four versions of the FSU coupled model system, seven models from the DEMETER project, which is the forerunner of the current European Ensembles Forecast System, the NCAR model, and the Predictive Ocean Atmosphere Model for Australia (POAMA), making a set of 13 individual models. An archive consisting of monthly simulations of precipitation was available over all five regions of Africa, namely Eastern, Central, Northern, Southern, and Western Africa. The results showed that the SSE forecast for precipitation carries higher skill than each of the member models and the ensemble mean. Relative to the ensemble mean (EM), the SSE provides an improvement of 18% in the simulation of the seasonal cycle of precipitation climatology. In Eastern Africa, during the December-February season, a north-south gradient of precipitation prevails between Tropical East Africa and the sector of the region towards Southern Africa. This regional-scale climate pattern is a direct influence of the Intertropical Convergence Zone (ITCZ) across the African continent during this time of the year. The SSE emerges with superior skill scores, such as the lowest root mean square error, compared with the EM and the member models, for example in the prediction of the spatial location and precipitation magnitudes that characterize the see-saw precipitation pattern in Eastern Africa. In all parts of Africa, and especially Eastern Africa, where seasonal precipitation variability is a frequent cause of huge human suffering due to droughts and famine, the multimodel superensemble and its subsequent improvements will always provide a forecast that outperforms the best Atmosphere-Ocean Climate Model.
ARTICLE | doi:10.20944/preprints201901.0091.v1
Subject: Engineering, Civil Engineering Keywords: Acoustic emissions, fracture process, failure prediction, q-statistics
Online: 9 January 2019 (16:35:10 CET)
In this paper we present experimental results concerning Acoustic Emission (AE) recorded during cyclic compression tests on two different kinds of brittle building materials, namely concrete and basalt. The AE inter-event times were investigated through a non-extensive statistical mechanics analysis, which shows that their decumulative probability distributions follow q-exponential laws. The entropic index q and the relaxation parameter βq ≡ 1/Tq, obtained by fitting the experimental data, exhibit systematic changes during the various stages of the failure process, namely (q, Tq) linearly align. The Tq = 0 point corresponds to the macroscopic breakdown of the material. The slope, including its sign, of the linear alignment appears to depend on the chemical and mechanical properties of the sample. These results provide insight into the warning signs of the incipient failure of building materials and could therefore be used in monitoring the health of existing structures such as buildings and bridges.
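As a rough illustration of the q-exponential law referred to in this abstract, the sketch below assumes the standard Tsallis form e_q(x) = [1 + (1 − q)x]^(1/(1 − q)) and a decumulative distribution P(>τ) = e_q(−τ/Tq). The function names and the parameter values in the usage loop are illustrative assumptions, not fitted values from the study.

```python
import math

def q_exponential(x, q):
    """Tsallis q-exponential; reduces to exp(x) in the limit q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        if q < 1.0:
            return 0.0  # standard cut-off for q < 1
        raise ValueError("x is outside the domain of the q-exponential for q > 1")
    return base ** (1.0 / (1.0 - q))

def decumulative(tau, q, t_q):
    """P(> tau) for inter-event time tau, assuming P(> tau) = e_q(-tau / t_q)."""
    if t_q <= 0.0:
        raise ValueError("t_q must be positive")
    return q_exponential(-tau / t_q, q)

# Illustrative parameters only (not from the paper).
for tau in (0.0, 1.0, 10.0, 100.0):
    print(tau, decumulative(tau, q=1.5, t_q=2.0))
```

For q > 1 the tail decays as a power law rather than exponentially, which is why such distributions are usually inspected on log-log axes.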
ARTICLE | doi:10.20944/preprints201811.0220.v1
Subject: Social Sciences, Accounting Keywords: bankruptcy prediction; audit report; artificial intelligence; PART algorithm
Online: 8 November 2018 (14:45:12 CET)
Despite the number of studies on bankruptcy prediction using financial ratios, very little is known about how external audit information can contribute to anticipating financial distress. A handful of papers show that a combination of ratios and audit data can provide significant predictive power, but a recent paper by Muñoz-Izquierdo et al. (2018) achieved an 80% predictive accuracy solely by using the disclosures of audit reports. We complement this study. Applying an artificial intelligence method (the PART algorithm), we examine the predictive ability of more easily extracted information from the report and suggest a practical implication for each user. Simply by (1) finding the audit opinion, (2) identifying whether a matter section exists, and (3) counting the number of comments disclosed, any user may predict a bankruptcy situation with the same accuracy as if they had scrutinised the whole report. In addition, we provide an extended literature review of previous studies on the interaction between bankruptcy prediction and external audit information.
ARTICLE | doi:10.20944/preprints201810.0103.v1
Subject: Life Sciences, Virology Keywords: Nipah Virus, outbreak, inhibitors, QSAR, database, prediction algorithm
Online: 5 October 2018 (15:04:23 CEST)
Nipah virus (NiV) has caused various outbreaks in Asian countries, the latest in the Kerala state of India. To date, no drug is available despite the urgent need. In the current study, we provide a computational one-stop solution for NiV inhibitors. We have developed the “anti-Nipah” web resource, which comprises a data repository, a prediction method, and data visualization modules. The database comprises 313 (181 unique) inhibitors from different strains and outbreaks of NiV, extracted from research articles and patents. The quantitative structure–activity relationship (QSAR) based predictors were developed using a classification approach employing 10-fold cross-validation through a support vector machine with 120 (68p + 52n) inhibitors. The overall predictor showed an accuracy and Matthews correlation coefficient of 88.89% and 0.77, respectively, on the training/testing dataset. The independent validation dataset also performed equally well. The data visualization modules, based on chemical clustering and principal component analyses, displayed the diversity in the NiV inhibitors. Therefore, our web platform would be of immense help to researchers working on developing effective inhibitors against NiV. The user-friendly webserver is freely available at: http://bioinfo.imtech.res.in/manojk/antinipah/
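The two reported metrics, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and chosen only to match the 68 positive / 52 negative class sizes, not the study's actual predictions.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0 means chance level."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical confusion matrix for 68 positives and 52 negatives.
tp, fn = 60, 8
tn, fp = 46, 6
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size.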
COMMUNICATION | doi:10.20944/preprints201803.0054.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: data feature selection; data clustering; travel time prediction
Online: 7 March 2018 (13:30:06 CET)
In recent years, governments have applied intelligent transportation system (ITS) techniques to provide convenience services (e.g., garbage truck apps) for residents. This study proposes a garbage truck fleet management system (GTFMS) together with data feature selection and data clustering methods for travel time prediction. A GTFMS includes mobile devices (MDs), on-board units, a fleet management server, and a data analysis server (DAS). When a user uses an MD to request the arrival time of a garbage truck, the DAS performs the data feature selection and data clustering procedures to analyse the travel time of the truck. The proposed methods cluster the travel time records and reduce their variation to improve travel time prediction. After the travel time and arrival time are predicted, the predicted information is sent to the user’s MD. In an experimental environment, the results showed that the accuracies of the previous method and the proposed method were 16.73% and 85.97%, respectively. Therefore, the proposed data feature selection and data clustering methods can be used to predict the stop-to-stop travel time of garbage trucks.
ARTICLE | doi:10.20944/preprints201710.0163.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: link prediction; combination method; theoretical limit; TLF method
Online: 26 October 2017 (05:49:34 CEST)
The theoretical limit of link prediction is a fundamental problem in this field. The mainstream approach studies this problem by taking the network structure as its object. This paper proposes a new viewpoint, namely that link prediction methods can be divided into single methods and combination methods based on the way they derive the similarity matrix, and investigates whether a theoretical limit exists for combination methods. We propose and prove necessary and sufficient conditions for a combination method to reach the theoretical limit. The limit theorem reveals that the essence of a combination method is to estimate the probability density functions of existing links and nonexistent links. Based on the limit theorem, a new combination method, the theoretical limit fusion (TLF) method, is proposed. Simulations and experiments on real networks demonstrate that the TLF method can achieve higher prediction accuracy.
ARTICLE | doi:10.20944/preprints201709.0114.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: WSN; IoT; seawater temperature prediction; marine aquaculture support
Online: 23 September 2017 (11:31:13 CEST)
Aquaculture is growing ever more important due to the decrease in natural marine resources and increase in worldwide demand. To avoid losses due to aging and abnormal weather, it is important to predict seawater temperature in order to maintain a more stable supply, particularly for high value added products, such as pearls and scallops. The increase in species extinction is a prominent societal issue. Furthermore, in order to maintain a stable quality of farmed fishery, water temperature should be measured daily and farming methods altered according to seasonal stresses. In this paper, we propose an algorithm to estimate seawater temperature in marine aquaculture by combining seawater temperature data and actual weather data.
ARTICLE | doi:10.20944/preprints202010.0436.v1
Subject: Keywords: Naïve Bayes Classification; Eulers Strength Formula; Cricket Prediction; Supervised Learning; KNIME Tool; Cricket prediction; sports analytics; multivariate regression; neural network
Online: 21 October 2020 (12:34:00 CEST)
In cricket, the Twenty20 format is the most watched and loved, since no one can guess who will win the match until the last ball of the last over. In India, the Indian Premier League (IPL) started in 2008 and is now the most popular T20 league in the world, so we decided to develop a machine learning model for predicting the outcome of its matches. Winning a cricket match depends on many key factors, such as home ground advantage, past performances on that ground, records at the same venue, the overall experience of the players, the record against a particular opposition, and the overall current form of the team and of individual players. This paper describes the key factors that affect the result of a cricket match and the regression model that best fits this data and gives the best predictions. Cricket is the mainstream, widely played sport across India and has the most noteworthy fan base. The Indian Premier League follows the Twenty20 format, which is very unpredictable. The IPL match predictor is an ML-based prediction approach in which the data sets and previous stats are used for training across all dimensions, covering all important factors such as toss, home ground, captains, favorite players, opposition battle, and previous stats, with each factor having a different strength, with the help of the KNIME tool and with the added intelligence of a Naive Bayes network and the Eulers strength calculation formula.
REVIEW | doi:10.20944/preprints201806.0137.v1
Subject: Medicine & Pharmacology, Other Keywords: opportunity; challenge; perspective; health data; disease prediction; clinical outcome prediction; healthcare process; data quality; quantity and quality analysis; artificial intelligence
Online: 8 June 2018 (13:22:08 CEST)
Health information technology has been widely used in healthcare and has contributed a huge amount of data. Health data has four characteristics: high volume, high velocity, high variety and high value. Thus, it can be leveraged to i) discover associations between genes, diseases and drugs to implement precision medicine; ii) predict diseases and identify their corresponding causal factors to prevent or control the diseases at an earlier time; iii) learn risk factors related to clinical outcomes (e.g., patients’ unplanned readmission), to improve care quality and reduce healthcare expenditure; and iv) discover care coordination patterns representing good practice in the implementation of collaborative patient-centered care. At the same time, there are major challenges in data-driven healthcare research, which include: i) inefficient health data exchanges across different sources; ii) learned knowledge that is biased toward a specific institution; iii) inefficient strategies to evaluate the plausibility of the learned patterns; and iv) incorrect interpretation and translation of the learned patterns. In this paper, we review various types of health data, discuss opportunities and challenges in data-driven healthcare research, provide solutions to the challenges, and state the important role of data-driven healthcare research in the establishment of a smart healthcare system.
ARTICLE | doi:10.20944/preprints202205.0147.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: SARIMA; Artificial Neural Networks (ANN); LSTM; hybrid methodologies; prediction
Online: 11 May 2022 (05:50:41 CEST)
The choice of holiday destinations is highly dependent on climate considerations. Now that the effects of the climate crisis are being increasingly felt, the need for accurate weather and climate services for hotels is crucial. Such a service could be beneficial both for the future planning of tourists’ activities and destinations and for hotel managers, as it could help in decision making about the planning and extension of the tourist season when higher temperatures are predicted for a longer time span, thus increasing revenue for companies in the local tourism sector. The aim of this work is to calculate predictions of climatic variables using statistical techniques as well as Artificial Intelligence (AI) for a specific area of interest, utilising data from an in situ meteorological station, and to produce valuable and reliable localised predictions with the most cost-effective method possible. This investigation addresses the question of the most suitable prediction method for time series data from a single meteorological station deployed in a specific location. As a result, an accurate representation of the microclimate in a specific area is achieved. To achieve this, high-accuracy in situ measurements and prediction techniques are used. The prediction techniques used are the Seasonal Auto Regressive Integrated Moving Average (SARIMA), AI techniques such as the Long-Short-Term-Memory (LSTM) neural network, and hybrid combinations of the two. The variables of interest are divided into temperature and humidity, which are easier to predict because they are more periodic and less chaotic, and wind speed, as an example of a more stochastic variable with no known seasonality or patterns. Our results show that the examined hybrid methodology performs best for the temperature and wind speed forecasts, closely followed by SARIMA, whereas LSTM performs better overall for the humidity forecast, even after the correction of the hybrid to the SARIMA model.
ARTICLE | doi:10.20944/preprints202203.0365.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: SOFA; Impedance ratio; mortality; emergency department; Critical care; prediction
Online: 28 March 2022 (14:01:05 CEST)
Background: The Sequential Organ Failure Assessment (SOFA) is a scoring system used for the evaluation of disease severity and prognosis of critically ill patients. The impedance ratio (Imp-R) is a novel mortality predictor. Aims: This study aimed to evaluate the combination of SOFA + Imp-R in the prediction of mortality in critically ill patients admitted to the emergency department (ED). Methods: A retrospective cohort study was performed in adult patients with acute illness admitted to the ED of a tertiary-care referral center. Baseline SOFA score and bioelectrical impedance analysis to obtain the Imp-R were performed within the first 24 hours after admission to the ED. A Cox regression analysis was performed to evaluate mortality risk of initial SOFA score plus Imp-R. Harrell's C-statistic and decision curve analyses (DCA) were performed. Results: Out of 325 patients, 240 were included for analysis. Overall mortality was 31.3%. Only 21.3% of non-surviving patients died after hospital discharge, and 78.4% died during hospital stay. Of the latter, 40.6% died in the ED. SOFA and Imp-R values were higher in non-survivors and were significantly associated with mortality in all models. The combination of SOFA + Imp-R significantly predicted 30-day mortality, in-hospital mortality, and ED mortality with area under the curve (AUC) of 0.80 (95% CI: 0.74-0.86), 0.79 (95% CI: 0.74-0.86) and 0.75 (95% CI: 0.66-0.84) respectively. The DCA showed that combining SOFA + Imp-R improved the prediction of mortality through the lower risk thresholds. Conclusion: The addition of Imp-R to baseline SOFA score at admission to the ED improves mortality prediction in severely acutely ill patients admitted to the ED.
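Harrell's C-statistic, named in the Methods above, measures how often a model assigns the higher risk score to the patient who dies earlier. The following is a simplified sketch of that idea, assuming right-censored data and ignoring tied survival times; the function name and the example values are hypothetical and are not the study's data or code.

```python
def harrell_c(times, events, risks):
    """Simplified Harrell's C for right-censored survival data.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if censored
    risks  : model risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i had an observed
            # event strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up times (days), event indicators and risk scores.
times = [5, 12, 20, 30]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(harrell_c(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is how AUC-like values such as those reported above are usually read.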
ARTICLE | doi:10.20944/preprints202111.0548.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Failure Prediction; Fault-tolerance; Cloud Computing; Artificial Intelligence; Reliability
Online: 29 November 2021 (15:39:23 CET)
Identifying and anticipating potential failures in the cloud is an effective method for increasing cloud reliability and enabling proactive failure management. Many studies have been conducted to predict potential failures, but none have combined SMART (Self-Monitoring, Analysis, and Reporting Technology) hard drive metrics with other system metrics such as CPU utilisation. Therefore, we propose a combined-metrics approach for failure prediction based on Artificial Intelligence to improve reliability. We tested data from over 100 cloud servers with four AI algorithms: Random Forest, Gradient Boosting, Long-Short-Term Memory, and Gated Recurrent Unit. Our experimental results show the benefits of combining metrics, outperforming the state of the art.
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Query Variations; Query Reformulations; Query Performance Prediction; Systematic Reviews
Online: 13 September 2021 (09:56:16 CEST)
Evidence-based healthcare integrates the best research evidence with clinical expertise in order to make decisions based on the best practices available. In this context, the task of collecting all the relevant information, a recall-oriented task, in order to make the right decision within a reasonable time frame has become an important issue. In this paper, we investigate the problem of building effective Consumer Health Search (CHS) systems that use query variations to achieve high recall and fulfill the information needs of health consumers. In particular, we study an intent-aware gain metric used to estimate the amount of missing information and to predict the achievable recall for each query reformulation during a search session. We evaluate and propose alternative formulations of this metric using standard test collections of the CLEF 2018 eHealth Evaluation Lab CHS.
ARTICLE | doi:10.20944/preprints202107.0073.v1
Subject: Medicine & Pharmacology, Obstetrics & Gynaecology Keywords: second primary cancers (SPCs); endometrial cancer (EC); risk prediction.
Online: 2 July 2021 (15:54:58 CEST)
Due to the high effectiveness of cancer screening and therapies, the diagnosis of second primary cancers (SPCs) has increased in women with endometrial cancer (EC). However, no previous literature provides adequate evidence to support screening for SPCs in endometrial cancer. This study aimed to develop effective risk prediction models of second primary endometrial cancer in women with obesity (body-mass index, BMI > 25), using datasets on the incidence of SPCs and other SPC risks in 4480 primary cancer survivors from a hospital-based cancer registry database. We found that obesity played a key role in SPCs. Ten independent variables correlated with obesity were used as predictors and should be monitored for the early detection of SPCs in endometrial cancer. In conclusion, the approach is promising for SPC prediction. The proposed scheme supports the important influence of obesity and clinical data representations in all cases after primary treatments. Our results suggest that obesity is still a crucial risk factor for SPCs in endometrial cancer.
ARTICLE | doi:10.20944/preprints202104.0588.v2
Subject: Earth Sciences, Atmospheric Science Keywords: Air Pollution; STURLA; Urban Structure; Mobile Monitoring; Spatial Prediction
Online: 5 May 2021 (12:41:17 CEST)
Understanding the relationships between land cover/urban structure patterns and air pollutants is key to sustainable urban planning and development. In this study, we employ a mobile monitoring method to collect PM2.5 and BC data in the city of Philadelphia, PA during the summer of 2019 and apply the Structure of Urban Landscapes (STURLA) methodology to examine relationships between urban structure and atmospheric pollution. We find that, while PM2.5 and BC vary by STURLA class, many of the differences in pollutant concentrations between classes are not significant. However, we also find that the proportions in which STURLA components are present throughout the urban landscape can be used to predict urban air pollution. Among frequently sampled STURLA classes, gpl hosted the highest PM2.5 concentrations on average (16.60 ± 4.29 µg/m³), while tgbwp hosted the highest BC concentrations (2.31 ± 1.94 µg/m³). Furthermore, STURLA combined with machine learning modeling was able to correlate PM2.5 (R² = 0.68, RMSE 2.82 µg/m³) and BC (R² = 0.64, RMSE 0.75 µg/m³) concentrations with the urban landscape and spatially interpolate concentrations where sampling did not take place. These results demonstrate the efficacy of the STURLA methodology in modeling relationships between air pollution and land cover/urban structure patterns.
ARTICLE | doi:10.20944/preprints202101.0411.v1
Subject: Engineering, Automotive Engineering Keywords: Energy performance; Cooling load prediction; Neural network, Metaheuristic optimization.
Online: 21 January 2021 (09:23:04 CET)
Given the high efficiency of metaheuristic techniques in energy performance analysis, this paper scrutinizes and compares five novel optimizers, namely biogeography-based optimization (BBO), invasive weed optimization (IWO), social spider algorithm (SOSA), shuffled frog leaping algorithm (SFLA), and harmony search algorithm (HSA), for the early prediction of cooling load (CL) in residential buildings. The algorithms are coupled with a multi-layer perceptron (MLP) to adjust the neural parameters that connect the CL with the influential factors. The complexity of the models is optimized by trial and error, and it was shown that the BBO and IWO need more crowded spaces to complete the optimization. The results revealed that the internal parameters (i.e., biases and weights) suggested by the BBO generate the most reliable MLP for both analyzing and generalizing the CL pattern (with nearly 93% and 92% correlations, respectively). Next, the IWO emerged as the second most powerful optimizer, with mean absolute errors of 1.8632 and 1.9110 in the training and testing phases. Therefore, the BBO-MLP and IWO-MLP can be reliably used for accurate analysis of the CL in future projects.
ARTICLE | doi:10.20944/preprints202012.0530.v1
Subject: Social Sciences, Accounting Keywords: Economic impact; Population mobility data; Prediction; Assess; Covid-19
Online: 21 December 2020 (14:28:50 CET)
The COVID-19 pandemic caused by SARS-CoV-2 poses a devastating threat to human society in terms of health, economy and lifestyle. Establishing accurate and real-time models to predict and assess the impact of the epidemic on the economy is instructive. We have designed a new model to quantitatively assess the impact of COVID-19 on the economy of China’s mainland. The nominal GDP in Q1 of 2020 that we predicted for China’s mainland with the Baidu Migration Data is RMB 20,785.7 billion, which is 3.59% less than that in 2019. The estimated value is roughly confirmed by the official report released on April 17, 2020 (RMB 20,650 billion, a 6.8% year-on-year decline). Strict control measures during the epidemic have greatly reduced China’s economic activity and had a serious impact on the country’s economy. Orderly promotion of population mobility plays a decisive role in economic recovery.
ARTICLE | doi:10.20944/preprints202011.0248.v1
Subject: Engineering, Automotive Engineering Keywords: Tool life; Machine Learning; Gradient Descent Algorithm; Prediction; Machining
Online: 6 November 2020 (15:49:06 CET)
In automated manufacturing systems, most of the manufacturing processes, including machining, are automated. Automatic tool change is one of the important parameters for reducing manufacturing lead time. Ceramic cutting tools are used to machine hard materials. Ti[C,N] mixed alumina ceramic cutting tools are widely used to machine hardened steel and stainless steel due to their superior mechanical properties. Martensitic stainless steel is widely used in screws, bolts, nuts and other engineering applications. Machining studies on martensitic stainless steel were conducted using a Ti[C,N] mixed alumina ceramic cutting tool. Tool life was evaluated using the flank wear criterion. The tool life obtained from the experimental machining process was taken as the training dataset and test dataset for machine learning. Using the dataset obtained from experimental machining, a tool life model was developed using the Gradient Descent algorithm. The model was validated using the coefficient of determination. The accuracy of the machine learning model was tested using the test data, and 99.83% accuracy was obtained. The tool life model based on the Gradient Descent algorithm was successfully implemented for the tool life of the Ti[C,N] mixed alumina ceramic cutting tool.
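The abstract does not give the form of the tool life model, so the following is only a minimal sketch of the general technique it names: fitting a model by batch gradient descent and validating it with the coefficient of determination (R²). It assumes a single-feature linear model; the data points are synthetic (generated from y = 2x + 1) and are not tool life measurements.

```python
def fit_gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def r_squared(xs, ys, w, b):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic data on the line y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_gradient_descent(xs, ys)
print(w, b, r_squared(xs, ys, w, b))
```

In practice the learning rate has to be small enough for the updates to converge, and features are often rescaled first so that a single learning rate suits all parameters.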
ARTICLE | doi:10.20944/preprints202011.0189.v1
Subject: Engineering, Automotive Engineering Keywords: Milling; Finite element simulation; Tool wear; Tool life prediction
Online: 4 November 2020 (10:43:46 CET)
In the process of metal cutting, the anti-wear performance of the tool determines the life of the tool and affects the surface quality of the workpiece. The finite element simulation method can directly show the tool wear state and morphology, but due to the limitations of the simulation time and complex boundary conditions, it has not been commonly used in tool life prediction. Based on this, a tool wear model was established on the platform of a finite element simulation software for the cutting process of titanium alloy TC4 by end milling. The key technique is to embed different types of tool wear models into the finite element model in combination with the consequent development technology. The effectiveness of the tool wear model is validated by comparing the experimental results with the simulation results. At the same time, in order to quickly predict the tool life, an empirical prediction formula of tool wear was established, and the evolution of tool wear over time was obtained.
Subject: Earth Sciences, Atmospheric Science Keywords: Atacama microbiome; function prediction; extremophiles; osmotic stress; salt amendments
Online: 14 October 2020 (10:26:02 CEST)
Over the past 150 million years, the Chilean Atacama Desert has been transformed into one of the most inhospitable landscapes by geophysical changes, which makes it an ideal Mars analog that has been explored for decades. However, two heavy rainfalls that occurred in the Atacama in 2015 and 2017 provide a unique opportunity to study the response of resident extremophiles to rapid environmental change associated with excessive water and salt shock. Here we combine mineral/salt composition measurements, amendment cell culture experiments, and next-generation sequencing analyses to study the variations in salts and microbial communities along a latitudinal aridity gradient of the Atacama Desert. In addition, we examine the reshuffling of Atacama microbiomes after the two rainfall events by comparing with previous studies. Analysis of microbial community composition revealed that soils within the southern arid desert were consistently dominated by Actinobacteria, Proteobacteria, Acidobacteria, Planctomycetes, Chloroflexi, Bacteroidetes, Gemmatimonadetes, and Verrucomicrobia. Intriguingly, the hyperarid microbial consortia exhibited a similar pattern to the more southern desert. Salts at the shallow subsurface were dissolved and leached down to a deeper layer, challenging indigenous microorganisms with increasing osmotic stress. Microbial viability was found to change with aridity and rainfall events. This study sheds light on the structure of xerophilic, halophilic, and radioresistant microbiomes from the hyperarid northern desert to the less arid southern transition region, as well as their response to changes in water availability. Our findings may help infer similar events that happened on the wetter early Mars.
Subject: Keywords: COVID-19; description; prediction; causal inference; extrapolation; simulation; projection
Online: 10 August 2020 (10:44:46 CEST)
The models used to estimate disease transmission, susceptibility and severity determine what epidemiology can (and cannot) tell us about COVID-19. These include: ‘model organisms’ chosen for their phylogenetic/aetiological similarities; multivariable statistical models to estimate the strength/direction of (potentially causal) relationships between variables (through ‘causal inference’), and the (past/future) value of unmeasured variables (through ‘classification/prediction’); and a range of modelling techniques to predict beyond the available data (through ‘extrapolation’), compare different hypothetical scenarios (through ‘simulation’), and estimate key features of dynamic processes (through ‘projection’). Each of these models: addresses different questions using different techniques; involves assumptions that require careful assessment; and is vulnerable to generic and specific biases that can undermine the validity and interpretation of its findings. It is therefore necessary that the models used: can actually address the questions posed; and have been competently applied. In this regard, it is important to stress that extrapolation, simulation and projection cannot offer accurate predictions of future events when the underlying mechanisms (and the contexts involved) are poorly understood and subject to change. Given the importance of understanding such mechanisms/contexts, and the limited opportunity for experimentation during outbreaks of novel diseases, the use of multivariable statistical models to estimate the strength/direction of potentially causal relationships between two variables (and the biases incurred through their misapplication/misinterpretation) warrants particular attention. 
Such models must be carefully designed to address: ‘selection-collider bias’, ‘unadjusted confounding bias’ and ‘inferential mediator adjustment bias’ – all of which can introduce effects capable of enhancing, masking or reversing the estimated (true) causal relationship between the two variables examined. Selection-collider bias occurs when these two variables independently cause a third (the ‘collider’), and when this collider determines/reflects the basis for selection in the analysis. It is likely to affect all incompletely representative samples, although its effects will be most pronounced wherever selection is constrained (e.g. analyses focusing on infected/hospitalised individuals). Unadjusted confounding bias disrupts the estimated (true) causal relationship between two variables when: these share one (or more) common cause(s); and when the effects of these causes have not been adjusted for in the analyses (e.g. whenever confounders are unknown/unmeasured). Inferentially similar biases can occur when: one (or more) variable(s) (or ‘mediators’) fall on the causal path between the two variables examined (i.e. when such mediators are caused by one of the variables and are causes of the other); and when these mediators are adjusted for in the analysis. Such adjustment is commonplace when: mediators are mistaken for confounders; prediction models are mistakenly repurposed for causal inference; or mediator adjustment is used to estimate direct and indirect causal relationships (in a mistaken attempt at ‘mediation analysis’). These three biases are central to ongoing and unresolved epistemological tensions within epidemiology. All have substantive implications for our understanding of COVID-19, and the future application of artificial intelligence to ‘data-driven’ modelling of similar phenomena. 
Nonetheless, competently applied and carefully interpreted, multivariable statistical models may yet provide sufficient insight into mechanisms and contexts to permit more accurate projections of future disease outbreaks.
ARTICLE | doi:10.20944/preprints202004.0491.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: COVID-19; India; prediction models; statistics; data; Indian states
Online: 28 April 2020 (08:57:57 CEST)
The very first case of coronavirus illness in India was recorded on 30 January 2020, and the number of infected cases, including the death toll, continues to rise. In this paper, we present short-term forecasts of COVID-19 for 28 Indian states and five union territories using real-time data from 30 January to 20 May 2020. Applying Holt’s second-order exponential smoothing method and the autoregressive integrated moving average (ARIMA) model, we generated 10-day ahead forecasts of the likely number of infected cases and deaths in India until 29 May 2020. Our results show that the number of cumulative cases in India will rise to 169109 [PI 95% (14426, 19455)]; concurrently, the number of deaths may increase to 4863 [PI 95% (4221, 5551)] by 29 May 2020. Further, we have marked the states (e.g. Delhi, Uttar Pradesh, Rajasthan, Madhya Pradesh, Maharashtra, Gujarat, and Tamil Nadu) where an outburst is expected by considering the cases above three standard deviations. Under the worst-case scenario, Maharashtra is likely to be the most affected state, with around 62628 [PI 95% (52840, 73555)] cumulative cases by 29 May 2020. However, Kerala and Karnataka are likely to remain in the lesser affected region. The presented results mark the states where lockdown can be loosened by 1 June 2020.
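As a rough illustration of the first method named in the abstract above, the sketch below implements Holt's linear (double) exponential smoothing in plain Python. The smoothing parameters, the initialisation of level and trend, and the sample series are assumptions for illustration only; the abstract does not state which settings the authors used.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear-trend exponential smoothing.

    alpha smooths the level, beta smooths the trend (both in (0, 1]).
    Initialisation (first value, first difference) is one common choice,
    assumed here rather than taken from the paper.
    """
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical cumulative counts, not data from the study.
example = [100, 112, 127, 140, 156, 171]
print(holt_forecast(example, alpha=0.8, beta=0.2, horizon=3))
```

Prediction intervals such as those quoted in the abstract require an additional error model on top of these point forecasts; that step is not shown here.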
ARTICLE | doi:10.20944/preprints202004.0473.v1
Subject: Medicine & Pharmacology, Other Keywords: COVID-19; Egypt; prediction exponential growth rate; hospital preparedness
Online: 27 April 2020 (03:27:47 CEST)
Background: The novel virus COVID-19, also known as SARS-CoV‑2, is currently spreading rapidly around the globe and pushing healthcare systems to the limits of their capacity. One of the functions of predictive models is to enable timely action for epidemic preparedness, including hospital preparedness. In Egypt, as in many other countries in the world, the epidemic situation and forecasting have not yet been sufficiently studied. Objective: The study was carried out to develop a short-term forecast scenario for the COVID-19 epidemic situation in Egypt and to predict the hospital needs to accommodate the growing number of cases. Methods: Secondary data from the COVID-2019 daily reports and the report issued on the 8th of April by the Egyptian Ministry of Health and Population were used. Due to the daily changing level of knowledge and data, the article reflects the status up to 18 April 2020. The prediction was based on the exponential growth rate model. For the depiction of the situation, the full length of the epidemic timeline was analyzed (from February 14th till April 18th). The growth rates and their rates of decline during the period from the 22nd of March till the 18th of April were calculated and extrapolated over the coming 7 weeks. The predicted hospital needs were assessed against the announced allocated resources. Results: The epidemic curve in Egypt is on the ascending arm as of April 18. The active cases showed exponential growth from the start of the epidemic till April 18. At the end of this period, the recovery rate was 23.12% and the case fatality rate (CFR) was 7.39. The median case fatality rate during the last four weeks was 6.64. The active cases are expected to reach more than 20,000 by late May and then start to decline. The allocated regular hospital beds are predicted to show a shortage by the time of the release of the paper. 
The intensive care unit (ICU) beds and ventilators are predicted to show insufficiency on May 6. Conclusions: The COVID-19 epidemic in Egypt is expected to continue to rise for the next few weeks and to start to decline late in May 2020. Our estimates should be useful in preparedness planning. Serious actions should be taken, not later than the end of April, to provide enough ICU beds and ventilators for the predicted number of cases that would need them. Mitigation actions have to continue for the coming 6 weeks or until the epidemic situation is more clearly seen.
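The exponential growth rate model mentioned in the Methods can be illustrated with a short sketch. It assumes the simplest form, N(t) = N(0)·exp(r·t) with a constant rate r, and it does not reproduce the declining growth rates that the authors also extrapolated. The function names and the counts are hypothetical.

```python
import math


def growth_rate(counts):
    """Average per-step exponential growth rate r over the series,
    from N_last = N_first * exp(r * steps)."""
    steps = len(counts) - 1
    return math.log(counts[-1] / counts[0]) / steps


def project(last_count, rate, steps_ahead):
    """Project a count forward assuming the rate stays constant."""
    return last_count * math.exp(rate * steps_ahead)


# Made-up weekly active-case counts, not the Egyptian data.
weekly = [500, 900, 1500, 2300]
r = growth_rate(weekly)
print(r, project(weekly[-1], r, steps_ahead=2))
```

A constant-rate projection overshoots once growth slows, which is presumably why the study also tracked the rate of decline of the growth rate.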
ARTICLE | doi:10.20944/preprints201904.0076.v3
Subject: Physical Sciences, Condensed Matter Physics Keywords: period vectors; external stress; crystal structure prediction; statistical physics
Online: 18 November 2019 (04:39:26 CET)
In crystal periodic structure prediction, a general equation is needed to determine the period vectors (cell edge vectors), especially when crystals are under arbitrary external stress. Such an equation was derived in Newtonian dynamics years ago, and it can be combined with quantum mechanics by further modeling. Here we derive such an equation in statistical physics, applicable by itself to both classical physics and quantum physics.
ARTICLE | doi:10.20944/preprints201907.0338.v1
Subject: Engineering, Automotive Engineering Keywords: prediction; futures studies; complex environment; machine learning data mining
Online: 30 July 2019 (03:48:37 CEST)
Decision-makers are concerned with the inherent complexity of the modern world's markets. Price fluctuations, environmental concerns, technological development, emerging markets, political challenges, and social expectations have made the 21st century's markets more dynamic and complex. From a policy-making perspective, it is vital to uncover future trends. This paper proposes that artificial intelligence can improve interpretations in complex markets, such as financial and energy markets. In a complex environment, it is critical to investigate the maximum available input features to ensure that no valuable informative feature is neglected. Several AI-based models are investigated, and it is shown that they can successfully uncover future trends. From a scenario development perspective, the purified subset of input features refers to the driving forces that shape alternative futures. Results showed that using AI can improve our understanding of how input features influence future behaviors while simultaneously improving prediction accuracy and reliability.
ARTICLE | doi:10.20944/preprints201809.0426.v1
Subject: Engineering, Marine Engineering Keywords: riser; vortex-induced vibration; fatigue damage prediction; empirical method
Online: 21 September 2018 (04:04:01 CEST)
To gain insight into riser motions and associated fatigue damage due to vortex-induced vibration (VIV), data loggers such as strain sensors and/or accelerometers are sometimes deployed on risers to monitor their motion in different current velocity conditions. Accurate reconstruction of the riser response and empirical estimation of fatigue damage rates over the entire riser length using measurements from a limited number of sensors can help in efficient utilization of the costly measurements recorded. Several different empirical procedures are described here for analysis of the VIV response of a long flexible cylinder subjected to uniform and sheared current profiles. The methods include weighted waveform analysis (WWA), proper orthogonal decomposition (POD), modal phase reconstruction (MPR), a modified WWA procedure, and a hybrid method which combines MPR and the modified WWA method. Fatigue damage rates estimated using these different empirical methods are compared and cross-validated against measurements. Detailed formulations for each method are presented and discussed with examples. Results suggest that all the empirical methods, despite different underlying assumptions in each of them, can be employed to estimate fatigue damage rates quite well from limited strain measurements.
ARTICLE | doi:10.20944/preprints201808.0194.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: ANN; continual learning; machine intelligence; prediction; vandalism; security; SDR
Online: 9 August 2018 (15:02:06 CEST)
Oil and gas pipeline vandalism is a recurrent problem in oil-rich zones of Nigeria and its West African neighbors, and it remains a challenge for multinationals to put control measures in place ahead of time to avert possible damage to operations, both in infrastructure and in business profit margins. In this paper, an integrative systems model comprising a machine intelligence technique called Hierarchical Temporal Memory (HTM) and a sequence learning neural network called the Online-Sequential Extreme Learning Machine (OS-ELM) is proposed for monitoring and prediction of pipeline pressure data. The system models the continual prediction of pipeline oil/gas pressure signals useful for secure monitoring and control to avert acts of vandalism in oil and gas installations. The HTM uses a spatial pooler operated in a temporally aggregated fashion and is defined as HTM-SP. The OS-ELM technique uses an explicit hierarchical training scheme so that the best cost estimates may be found after a stipulated number of trial runs. We study the performance of three OS-ELM neural activations: the sigmoid (sig), sinusoidal (sin) and radial basis function (rbf) activations. The results indicate improvement factors of 1.297, 1.297 and 1.300 of the HTM-SP over the OS-ELM sigmoid, sinusoidal and radial basis activations respectively.
ARTICLE | doi:10.20944/preprints201712.0197.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: air pollutant prediction; multi-task learning; regularization; analytical solution
Online: 28 December 2017 (09:09:20 CET)
In this paper, we tackle air quality forecasting by using machine learning approaches to predict the hourly concentration of air pollutants (e.g., Ozone, PM2.5 and Sulfur Dioxide). Machine learning, as one of the most popular techniques, is able to efficiently train a model on big data by using large-scale optimization algorithms. Although there exist some works applying machine learning to air quality prediction, most of the prior studies are restricted to small-scale data and simply train standard regression models (linear or non-linear) to predict the hourly air pollution concentration. In this work, we propose refined models to predict the hourly air pollution concentration based on meteorological data of previous days by formulating the prediction of 24 hours as a multi-task learning problem. This enables us to select a good model with different regularization techniques. We propose a useful regularization that enforces the prediction models of consecutive hours to be close to each other, and compare it with several typical regularizations for multi-task learning, including standard Frobenius norm regularization, nuclear norm regularization, and ℓ2,1 norm regularization. Our experiments show that the proposed formulations and regularization achieve better performance than existing standard regression models and existing regularizations.
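The idea of keeping the prediction models of consecutive hours close to each other can be sketched as a penalty term added to the training loss. The abstract does not give the exact form of the regularizer, so the squared Euclidean distance used below is an assumption, as are the function name and the toy weight vectors.

```python
def consecutive_hour_penalty(weights):
    """Sum of squared differences between the weight vectors of adjacent hours.

    `weights` is a list of per-hour weight vectors (one list of floats per
    hour, e.g. 24 of them). The penalty is zero when all hourly models
    coincide and grows as neighbouring models drift apart.
    """
    total = 0.0
    for w_now, w_next in zip(weights, weights[1:]):
        total += sum((a - b) ** 2 for a, b in zip(w_now, w_next))
    return total


# Toy example with three "hours" and two features per model.
hourly_weights = [[1.0, 2.0], [1.0, 2.0], [2.0, 4.0]]
print(consecutive_hour_penalty(hourly_weights))
```

In a multi-task objective this term would be scaled by a regularization strength and added to the sum of the per-hour regression losses.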
ARTICLE | doi:10.20944/preprints201608.0142.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: churn prediction; incremental principal component analysis; stochastic gradient descent
Online: 13 August 2016 (11:28:39 CEST)
Modern companies accumulate a vast amount of customer data that can be used for creating a personalized experience. Analyzing this data is difficult, and most business intelligence tools cannot cope with the volume of the data. One example is churn prediction, where the cost of retaining existing customers is less than that of acquiring new ones. Several data mining and machine learning approaches can be used, but there is still little information about the different algorithm settings to be used when the dataset does not fit into the memory of a single computer. Because of the difficulties of applying feature selection techniques at a large scale, Incremental Principal Component Analysis (IPCA) is proposed as a data preprocessing technique. We also present a new approach to large-scale churn prediction problems based on the mini-batch Stochastic Gradient Descent (SGD) algorithm. Compared to other techniques, the new method facilitates training with large data volumes using a small memory footprint while achieving good prediction results.
ARTICLE | doi:10.20944/preprints202209.0309.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: machine learning; natural language processing; commit messages; change prediction model
Online: 20 September 2022 (14:52:49 CEST)
Version Control and Source Code Management Systems, such as GitHub, contain a large amount of unstructured historical information about software projects. Recent studies have introduced Natural Language Processing (NLP) to help software engineers retrieve information from very large collections of unstructured data. In this study, we have extended our previous study by increasing our datasets and the number of ML and clustering techniques. Method: We have followed a complex methodology made up of various steps. Starting from the raw commit messages, we have employed NLP techniques to build a structured database. We have extracted their main features and used them as input to different clustering algorithms. Once each entry was labelled, we applied supervised machine learning techniques to build a prediction and classification model. Results: We have developed a machine learning-based model to automatically classify commit messages of a software project. Our model exploits a ground-truth dataset which includes commit messages obtained from various GitHub projects belonging to the HEP context. Conclusions: The contribution of this paper is two-fold: it proposes a ground-truth database, and it provides a machine learning prediction model. Together, they automatically identify the more change-prone areas of code. Our model has obtained a very high average precision, recall and F1-score.
ARTICLE | doi:10.20944/preprints202209.0008.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Glucose Oscillation; Prediction; Multi-agent; Type 1 Diabetes; Personalized; Recommendation
Online: 1 September 2022 (07:13:20 CEST)
The glucose-insulin regulatory system and its glucose oscillations are a recurring theme in the literature because of their impact on human lives, mostly those affected by diabetes mellitus. Several approaches have been proposed, from mathematical to data-based models, with the aim of modeling the glucose oscillation curve. With such a curve, it is possible to predict when to inject insulin in individuals with type 1 diabetes (T1D). However, the literature presents prediction horizons no longer than 6 hours, which could be a problem considering their sleeping time. This work presents Tesseratus, a model that adopts a multi-agent approach to combine machine learning and mathematical modeling to predict the glucose oscillation up to 8 hours ahead. Tesseratus uses the pharmacokinetics of insulins and data collected from T1D individuals. Its outcome can support endocrinologists while prescribing daily treatment for T1D individuals, and provide personalized recommendations for such individuals to keep their glucose concentration in the ideal range. Tesseratus brings pioneering results for prediction horizons of 8 hours for nighttime, in an experiment with seven real T1D individuals. It is our claim that Tesseratus will be a reference for the classification of glucose prediction models, supporting the mitigation of short- and long-term complications in T1D individuals.
ARTICLE | doi:10.20944/preprints202202.0111.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Transformer; human activity recognition; time series; sequence-to-sequence prediction
Online: 8 February 2022 (13:09:21 CET)
This paper describes the successful application of the Transformer model used in the natural language processing and vision tasks as a means of processing the time series of signals from gyroscope and accelerometer sensors for the classification of human activities. The Transformer model is based on deep neural networks with many layers which can generalize well on signals. All measured signals come from a smartphone placed in a waist bag. Activity prediction is sequence-to-sequence, each time step of the signal is assigned a designation of the performed activity. Emphasis is placed on attention mechanisms, which express individual dependencies between signal values within a time series. In comparison with another recent result, the recognition precision was improved from 89.67 percent to 99.2 percent. The transformer model should in the future be included among the top options in machine learning methods for human activity recognition.
ARTICLE | doi:10.20944/preprints202202.0051.v1
Subject: Life Sciences, Other Keywords: Glioblastoma; survival prediction; Machine Learning; biomarkers; HumanPSD™; Long-term survivor
Online: 3 February 2022 (12:00:23 CET)
Glioblastoma (GBM) is a very aggressive malignant brain tumor with the vast majority of patients surviving less than 12 months (Short-term survivors [STS]). Only around 2% of patients survive more than 36 months (Long-term survivors [LTS]). Studying these extreme survival groups might help in better understanding GBM biology. This work aims at exploring the application of machine learning methods in predicting survival groups (STS, LTS). We used age and gene expression profiles belonging to 249 samples from publicly available datasets. Ten machine learning methods have been implemented and compared for their performances. A hyperparameter-tuned random forest model performed best, with an accuracy of 80% (AUC of 74% and F1 score of 85%). The performance of this model is validated on external test data of 16 samples. The model predicted the true survival group for 15 samples, achieving an accuracy of 93.75%. This classification model is deployed as a web tool, GlioSurvML. The top 1500 features which retained classification efficiency (accuracy of 80%, AUC of 74%) were studied for enriched pathways and disease-causal biomarker associations using the HumanPSD™ database. We identified 199 genes as possible biomarkers of GBM and/or similar diseases (like glioma, astrocytoma, and others). 57 of these genes are shown to be differentially expressed across survival groups and/or to have an impact on survival. This work demonstrates the application of machine learning methods in predicting survival groups of GBM.
ARTICLE | doi:10.20944/preprints202102.0172.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Citizen Security; Smart Cities; Crime Prediction; Artificial Intelligence; Safe City
Online: 8 February 2021 (07:44:57 CET)
Smart city infrastructure has a significant impact on improving the quality of human life. However, a substantial increase in the urban population over the last few years is posing challenges related to resource management, safety, and security. In order to ensure safe mobility and security in the smart city environment, this paper proposes a novel Artificial Intelligence (AI) based approach empowering the authorities to better visualize the threats and to help them identify the highly-reported crime zones, yielding greater predictability of crime hot-spots in a smart city. To this end, it first investigates Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to detect the hot-spots that have a higher risk of crimes being committed. Second, for crime prediction, the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model is exploited in each dense crime region to predict the number of crimes in the future with spatial and temporal information. The proposed HDBSCAN and SARIMA based crime prediction model is evaluated on ten years of crime data (2008-2017) for New York City (NYC). The accuracy of the model is measured by considering different time period scenarios, i.e., (a) year-wise, for each year, and (b) for the whole period of ten years, using an 80:20 ratio where 80% of the data was used for training and 20% of the data was used for testing. The proposed approach achieves an average Mean Absolute Error (MAE) of 11.47.
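The evaluation protocol described above (an 80:20 train/test split scored by Mean Absolute Error) can be sketched as follows. The abstract does not say how the split was drawn; the chronological split used here is an assumption that is common for time series, and the function names and numbers are hypothetical.

```python
def split_80_20(series):
    """Chronological split: first 80% for training, last 20% for testing."""
    cut = int(len(series) * 0.8)
    return series[:cut], series[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly crime counts, not the NYC data used in the study.
counts = [30, 32, 35, 31, 40, 42, 38, 45, 47, 50]
train, test = split_80_20(counts)
naive_forecast = [train[-1]] * len(test)  # placeholder for a SARIMA forecast
print(mean_absolute_error(test, naive_forecast))
```

The naive last-value forecast only stands in for the fitted model; any forecaster that returns one prediction per test point can be scored the same way.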
ARTICLE | doi:10.20944/preprints202012.0318.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Interpretable Artificial Intelligence; Cardiovascular disease prediction; Machine Learning in Healthcare
Online: 14 December 2020 (09:49:13 CET)
Learning systems have been very focused on creating models that are capable of obtaining the best results in error metrics. Recently, the focus has shifted towards improving the ability to interpret and explain their results. The need for interpretation is greater when these models are used to support decision making. In some areas, such as medicine, this becomes an indispensable requirement. This paper focuses on the prediction of cardiovascular disease by analyzing the well-known Statlog (Heart) Data Set from the UCI Machine Learning Repository. This study analyzes the cost in accuracy of making predictions easier to interpret by reducing the number of features that explain the classification of health status. The analysis covers a large set of classification techniques and performance metrics, demonstrating that it is possible to build explainable and reliable models with a good trade-off in predictive performance.
REVIEW | doi:10.20944/preprints202009.0757.v1
Online: 30 September 2020 (15:08:34 CEST)
Human civilizations are under enormous threat due to the outbreak of the novel coronavirus (COVID-19), which originated in Wuhan, China. Asymptomatic carriers are potential spreaders of this novel virus. Since guaranteed antiviral treatments have not been available in the market so far, it is really challenging to fight against this contagious disease. To save lives, it is urgent to know more about how the virus transmits itself from one person to another so rapidly and how we can predict future infections. Scientists and researchers are working hard to understand its high infection rate and transmission process. One possible way is to use our existing COVID-19 infection data to prepare a useful model to predict the future trend. Mathematical modelling is very useful for understanding the basic principles of COVID-19 transmission and for providing necessary guidelines for future prediction. Here, we have reviewed 9 distinct, commonly used models based on mathematical implementations for COVID-19 transmission and made a detailed head-to-head comparison of each model. Finally, we have discussed the interesting key behaviour of each model, relevant upcoming important issues, challenges and future directions.
HYPOTHESIS | doi:10.20944/preprints202009.0346.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: Sepsis; SIRS; oxic sulphidic oscillator; risk prediction; multiorgan failure; chemocline
Online: 16 September 2020 (04:07:17 CEST)
Life evolved in a euxinic world with subsequent oxic 'invasion' leading to two parallel but interconnected biospheres; hydrogen sulphide (H2S) and hydrogen peroxide (H2O2) exemplify these worlds, respectively. Their concentration gradients have informational value in meromictic lakes. Similarly, it is posited, there exists a whole-body chemocline in humans in which the two molecules form an inversely coupled oxic/sulphidic oscillator (OSO). The OSO is hormetic and characterised by a range of amplitudes and frequencies in health. Deviations from its baseline profile herald the onset of SIRS before the appearance of clinical signs. Loss of oscillator status and transition to a steady state causes widespread intercellular and inter-organ communication failure presaging multi-organ dysfunction. The salient clinico-pathophysiological features of SIRS of any aetiology are emergent phenomena related to the OSO profile. The extent of recovery of organ function will mirror the recovery of the OSO profile, thereby providing a tool to predict outcomes in SIRS.
ARTICLE | doi:10.20944/preprints202008.0542.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Covid-19 test results; prediction; habits; health records; deep learning
Online: 25 August 2020 (08:58:07 CEST)
A patient visits a physician when he/she feels ill. The illness need not be COVID-19; people generally tend to visit a doctor when an illness cannot be controlled by common drugs. When a patient comes to a doctor, the doctor examines him/her after learning about the problem. The physician always asks some questions related to the patient's daily life. For example, if a young male patient comes to a doctor with symptoms of fever and cough, the doctor first asks whether he has a habit of smoking. The doctor then asks whether this type of symptom has appeared often before. If the answer to both questions is yes, the first indicates a habit and the second indicates that he may be suffering from a serious disease or a disease due to the weather. The aim of this paper is to consider the habits of the patient as well as whether he/she has been affected by a critical disease. This information is used to build a model that predicts whether there is any possibility of his/her being affected by COVID-19. This research work contributes to tackling the pandemic situation caused by Coronavirus Disease 2019 (COVID-19). The occurrence of this disease depends on numerous factors such as the past health records and habits of patients. Health records include diabetes tendency, existence of cardiovascular disease, pregnancy, asthma, hypertension, pneumonia and chronic renal disease, which may contribute to the occurrence of this disease. Past lifestyle habits such as tobacco and alcohol consumption may also be analyzed. A deep learning based framework is investigated to verify the relationship between past health records, habits of patients and COVID-19 occurrence. A stacked Gated Recurrent Unit (GRU) based model is proposed in this paper that identifies whether a patient can be infected by this disease or not. 
The proposed predictive system is compared against existing benchmark Machine Learning classifiers such as Support Vector Machine (SVM) and Decision Tree (DT).
ARTICLE | doi:10.20944/preprints202007.0124.v1
Subject: Keywords: COVID-19; Epidemic Prediction; Clinical Diagnosis; Policy Effectiveness; Contact Tracing
Online: 7 July 2020 (10:06:05 CEST)
The widely spread coronavirus disease (COVID-19) is one of the worst infectious disease outbreaks in history and has become an emergency of primary international concern. As the pandemic evolves, academic communities have been actively involved in various capacities, including accurate epidemic estimation, fast clinical diagnosis, policy effectiveness evaluation and development of contact tracing technologies. There are more than 23,000 academic papers on the COVID-19 outbreak, and this number is doubling every 20 days while the pandemic is still ongoing. The literature, however, at its early stage, lacks a comprehensive survey from a data analytics perspective. In this paper, we review the latest models for analyzing COVID-19 related data, conduct post-publication model evaluations and cross-model comparisons, and collect data sources from different projects.
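As a rough illustration of what "doubling every 20 days" implies, the sketch below converts a doubling time into a projected count and a continuous daily growth rate. The function names are hypothetical and only the two figures quoted in the abstract (23,000 papers, 20 days) come from the text; constant exponential growth is an assumption made for illustration.

```python
import math


def projected_count(initial: float, days: float, doubling_days: float = 20.0) -> float:
    """Count after `days`, assuming constant exponential growth with the given doubling time."""
    return initial * 2.0 ** (days / doubling_days)


def daily_growth_rate(doubling_days: float = 20.0) -> float:
    """Continuous per-day growth rate implied by a doubling time: ln(2) / T."""
    return math.log(2.0) / doubling_days


# Figures quoted in the abstract: 23,000 papers, doubling every 20 days.
after_two_doublings = projected_count(23_000, 40)  # 40 days = two doublings
rate_per_day = daily_growth_rate()                 # about 3.5% per day
```

Such a projection only holds while the doubling time stays constant, which is unlikely over long horizons.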
ARTICLE | doi:10.20944/preprints202006.0214.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: COVID-19; Prediction model; Pandemic bell curve; India; Different scenarios
Online: 17 June 2020 (09:40:23 CEST)
This paper presents a COVID-19 prediction model for India. Lockdown plays an important role in arresting community spread of the disease. This was evident in other countries such as Russia, Belgium and Germany, where peak cases were recorded within a month of the imposition of lockdown, showing an immediate positive effect. However, in India, even after 65 days of lockdown, there is no decrease in the number of daily new cases reported. Many models were prepared for India, and almost all of them were proven wrong by the increase in the number of cases. The model in this paper is prepared using the COVID-19 trend in other countries, population density and the pandemic bell curve. Based on the data available until 24th May 2020, two scenarios are presented. In the first, the peak is reached when the number of daily new cases per million reaches 190, and in the second when it reaches 724. One model predicts the number of cases to reach 1 million by mid-July 2020. The other model predicts the number of cases to peak by mid-July with the total cases reaching 20 million. The predicted cases were compared with the actual cases recorded for the period 25th May to 11th June 2020. The actual values matched the predicted values reasonably well.
COMMUNICATION | doi:10.20944/preprints202005.0401.v1
Subject: Life Sciences, Other Keywords: SARS-CoV-2; Peptide Vaccine; Spike Protein; Vaccinomics; Epitope Prediction
Online: 24 May 2020 (19:11:19 CEST)
SARS-CoV-2 has been the talk of the town since the beginning of 2020. The pandemic has brought the entire world to a halt. Every country is taking all possible steps to combat the disease, ranging from shutting down the entire economy to drug repurposing and vaccine development. Rapid data analysis and widely available tools, software and databases have made bioinformatics capable of giving researchers new insights to deal with the current scenario more efficiently. Vaccinomics, an emerging field of bioinformatics, uses concepts of immunogenetics and immunogenomics with in silico tools to give promising results for wet lab experiments. This approach is well validated for the design and development of potent vaccines. The present in silico study attempted to identify peptide fragments from the spike surface glycoprotein that can be efficiently used for the design and development of an epitope-based vaccine. Both B-cell and T-cell epitopes were predicted using integrated computational tools. The VaxiJen server was used to predict the protective antigenicity of the protein. NetCTL was used to analyze the most potent T-cell epitopes, and their subsequent MHC-I interaction was analyzed with tools provided by IEDB. 3D structure prediction of peptides and MHC-I alleles (HLA-C*03:03) was then performed to carry out docking studies using AutoDock4.0. Various IEDB tools were used to predict B-cell epitopes on the basis of essential parameters such as surface accessibility and beta turns. Based on interpretation of the results, the peptide sequence spanning amino acids 1138-1145 and the sequences WTAGAAAYY and YDPLQPEL were obtained as potential B-cell and T-cell epitopes, respectively. This in silico study will help to identify novel epitope-based peptide vaccine targets in the spike protein of SARS-CoV-2. Further in vitro and in vivo studies are needed to validate the findings.
ARTICLE | doi:10.20944/preprints202005.0247.v1
Online: 15 May 2020 (03:47:36 CEST)
This study presents a prediction model based on the Logistic Growth Curve (LGC) to evaluate the effectiveness of the Movement Control Order (MCO) on the spread of the COVID-19 pandemic. The evaluation assesses and predicts the growth models. The estimated model is a forecast-based model that depends on partial data from the COVID-19 cases in Malaysia. The model is then studied together with the effectiveness of the three phases of MCO implemented in Malaysia. Evidence from this study suggests that the results of the LGC prediction model match the progress and effectiveness of the MCO in flattening the curve, which helped to control the spike in the number of active COVID-19 cases and the growth of COVID-19 infections.
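For readers unfamiliar with logistic growth curves, the following minimal sketch shows the standard three-parameter form and its derivative. It is not the authors' fitted model: the parameter names (`K` for the final epidemic size, `r` for the growth rate, `t0` for the inflection day) and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def logistic(t: float, K: float, r: float, t0: float) -> float:
    """Cumulative cases at day t under logistic growth: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def daily_new_cases(t: float, K: float, r: float, t0: float) -> float:
    """Derivative of the logistic curve; it peaks at t = t0 with value r * K / 4."""
    c = logistic(t, K, r, t0)
    return r * c * (1.0 - c / K)


# Hypothetical parameters, not taken from the study.
K, r, t0 = 6000.0, 0.15, 30.0
half_size = logistic(t0, K, r, t0)           # cumulative cases at the inflection point (K / 2)
peak_daily = daily_new_cases(t0, K, r, t0)   # daily new cases at the peak (r * K / 4)
```

In this form, "flattening the curve" corresponds to a smaller `r` or `K`, which lowers the peak of the daily-case curve.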
ARTICLE | doi:10.20944/preprints202002.0233.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: wind power; machine learning; hybrid model; prediction; whale optimization algorithm
Online: 17 February 2020 (02:22:05 CET)
Wind power, as a renewable source of energy, has numerous economic, environmental and social benefits. To enhance and control renewable wind power, it is vital to use models that predict wind speed with high accuracy. Many traditional models perform poorly in wind speed prediction because they neglect the need for data preprocessing and disregard the inadequacy of using a single predictive model. In the current study, to predict wind speed at target stations in the north of Iran, a multi-layer perceptron (MLP) model was combined with the Whale Optimization Algorithm (WOA) to build a new method (MLP-WOA) with a limited set of data (2004-2014). The MLP-WOA model was then applied at each of the ten target stations, with nine stations used for training and the tenth for testing (namely: Astara, Bandar-E-Anzali, Rasht, Manjil, Jirandeh, Talesh, Kiyashahr, Lahijan, Masuleh and Deylaman), to increase the accuracy of the resulting hybrid model. The capability of the hybrid model in wind speed forecasting at each target station was compared with that of the MLP model without the WOA optimizer. Numerous statistical performance measures were used to evaluate the results. For all ten target stations, the MLP-WOA model produced more precise outcomes than the standalone MLP model. The hybrid model performed acceptably, with lower values of the RMSE, SI and RE parameters and higher values of the NSE, WI and KGE parameters. It was concluded that the WOA optimization algorithm can improve the prediction accuracy of the MLP model and may be recommended for accurate wind speed prediction.
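To make the "higher is better" reading of NSE concrete, here is a minimal sketch that assumes NSE denotes the Nash-Sutcliffe efficiency (the abstract does not expand the acronym). The function name and the sample wind speeds are hypothetical.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of observations from their mean).

    1.0 is a perfect fit; 0.0 is no better than always predicting the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


# Hypothetical wind speeds (m/s), for illustration only.
observed = [2.0, 4.0, 6.0]
perfect_model = nash_sutcliffe(observed, [2.0, 4.0, 6.0])  # 1.0
mean_only = nash_sutcliffe(observed, [4.0, 4.0, 4.0])      # 0.0
```

Under this definition, a hybrid model that improves on a baseline would show an NSE closer to 1.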
ARTICLE | doi:10.20944/preprints201912.0204.v1
Subject: Engineering, Automotive Engineering Keywords: diesel engines; numerical simulation; pollutant emissions prediction; computational fluid dynamics
Online: 16 December 2019 (05:09:55 CET)
In this paper, an integrated methodology for coupling 1D- and 3D-CFD simulation codes is presented, which has been developed to support the design and calibration of new diesel engines. The aim of the proposed methodology is to couple 1D engine models, which may be available in the early-stage engine development phases, with 3D predictive combustion simulations, in order to obtain reliable estimates of engine performance and emissions for newly designed automotive diesel engines. The coupling procedure features simulations performed in 1D-CFD by means of GT-SUITE and in 3D-CFD by means of Converge, executed within a specifically designed calculation methodology. The coupling procedure was assessed by comparing its results with experimental data acquired on an automotive diesel engine, considering different working points including both part-load and full-load conditions. Different multiple injection schedules were evaluated for part-load operation, including pre and post injections. The proposed methodology, featuring detailed 3D chemistry modeling, proved capable of properly assessing pollutant formation, specifically of estimating NOx concentrations. The soot formation trend was also well matched for most of the explored working points. The proposed procedure can therefore be considered a suitable methodology to support the design and calibration of new diesel engines, thanks to its ability to provide reliable estimates of engine performance and emissions from the early stage of new engine development.
Subject: Engineering, Automotive Engineering Keywords: software defect prediction; machine learning approach; integrated approach; Deep Forest
Online: 6 December 2019 (04:25:21 CET)
Accurate prediction of defects in software components plays a vital role in ensuring the quality and efficiency of the system to be developed. We therefore conducted a systematic literature review to evaluate the four main defect prediction techniques. Defect prediction helps testers find and fix bugs in order to achieve input-to-output conformance. In this paper we discuss the open issues in predicting software defects and provide a detailed analysis of different methods, including Machine Learning, Integrated Approach, Cross-Project and the Deep Forest algorithm, for preventing these flaws. However, it is almost impossible to rule that one method is better than another, so each technique can be analyzed separately, and the technique best suited to the problem at hand can be used or altered to create a suitable hybrid technique.
ARTICLE | doi:10.20944/preprints201908.0294.v1
Subject: Physical Sciences, Mathematical Physics Keywords: time series; Colorado River; water supply; cross-validation; decadal prediction
Online: 28 August 2019 (11:32:10 CEST)
The future of the Colorado River water supply (WS) affects millions of people and the U.S. economy. A recent study suggested a cross-basin correlation between the Colorado River and its neighboring Great Salt Lake (GSL). Following that study, the feasibility of using the previously developed multi-year prediction of the GSL water level to forecast the Colorado River WS was tested. Time-series models were developed to predict the changes in WS out to 10 years. Regressive methods and the GSL water level data were used for the depiction of decadal variability of the Colorado River WS. Various time-series models suggest a decline in the 10-year-averaged WS since 2013 before starting to increase around 2020. Comparison between this WS prediction and the WS projection published in a 2012 government report (derived from climate models) reveals a widened imbalance between supply and demand by 2020. Further research to update similar multi-year prediction of the Colorado River WS is needed. Such information could aid in management decision making in the face of future water shortages.
ARTICLE | doi:10.20944/preprints201901.0110.v1
Subject: Chemistry, Analytical Chemistry Keywords: Food authenticity; Toro appellation of origin; Prediction Models; Wine; Aging.
Online: 11 January 2019 (10:50:58 CET)
A combination of physical-chemical analyses was used to monitor the aging of red wines from D.O. Toro (Spain). The changes in chemical composition that occur during aging make it possible to discriminate wine samples collected after one, four, seven and ten months of aging. Different computational models were used to develop a reliable authenticity tool for certifying wines: Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest (RF) models. The ANN model with a sigmoidal function in the output neuron and the RF model can determine the aging time with an average absolute percentage deviation below 1%, and it can be concluded that these two models are valid tools for predicting wine age.
ARTICLE | doi:10.20944/preprints201804.0004.v1
Subject: Engineering, Marine Engineering Keywords: ship’s propeller jet; mean axial velocity of flow; prediction equations
Online: 1 April 2018 (16:07:01 CEST)
The propeller jet from a ship has a significant component directed upwards towards the free surface of the water, which can be used for ice management. This paper describes a comprehensive laboratory experiment where the influences of operational factors affecting a propeller wake velocity field were investigated. The experiment was done on a steady wake field to investigate the characteristics of the axial velocity of the fluid in the wake and the corresponding variability downstream of the propeller. The axial velocities and the variability recorded were time-averaged. Propeller rotational speed was found to be the most significant factor, followed by propeller inclination. The experimental results also provide some idea about the change of the patterns of the mean axial velocity distribution against the factors considered for the test throughout the effective wake field, as well as the relationships to predict the axial velocity for known factors.
ARTICLE | doi:10.20944/preprints201711.0132.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Sensors; Dynamic measurement errors; Prediction; Improved PSO; Support Vector Machine
Online: 20 November 2017 (16:56:20 CET)
Dynamic measurement error correction is an effective method to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and the support vector machine (SVM) is often used to predict the dynamic measurement error of sensors. Traditionally, the parameters of the SVM were set manually, which cannot ensure the model’s performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement error of sensors. Natural selection and simulated annealing are added to PSO to improve its ability to avoid local optima. To verify the performance of NAPSO-SVM, three algorithms are used to optimize the SVM’s parameters: the particle swarm optimization algorithm (PSO), the improved PSO algorithm (NAPSO), and glowworm swarm optimization (GSO). Dynamic measurement error data from two sensors are used as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models’ performance. The experimental results show that NAPSO-SVM has the best prediction precision and the smallest prediction errors among the three algorithms, and that it is an effective method for predicting dynamic measurement errors of sensors.
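The two evaluation metrics named above have standard definitions, sketched below in plain Python. The function names and sample values are hypothetical; the abstract does not give the exact formulas the authors used, so this assumes the conventional ones.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined when an actual value is zero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical measurement errors and predictions, for illustration only.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
error_rmse = rmse(actual, predicted)  # 1.0
error_mape = mape(actual, predicted)  # 37.5
```

RMSE is expressed in the units of the measured quantity, while MAPE is relative, so the two can rank models differently when the actual values vary widely in magnitude.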
ARTICLE | doi:10.20944/preprints202201.0365.v3
Subject: Life Sciences, Biochemistry Keywords: binding affinity prediction; machine learning; data quality; data quantity; deep learning
Online: 23 May 2022 (11:16:49 CEST)
Prediction of protein-ligand binding affinities is crucial for computational drug discovery. A number of deep learning approaches have been developed in recent years to improve the accuracy of such affinity prediction. While the predictive power of these systems has advanced to some degree, depending on the dataset used for model training and testing, the effects of the quality and quantity of the underlying data have not been thoroughly examined. In this study, we employed erroneous datasets and data subsets of different sizes, created from one of the largest databases of experimental binding affinities, to train and evaluate a deep learning system based on convolutional neural networks. Our results show that data quality and quantity do have significant impacts on the prediction performance of trained models. Depending on the variations in data quality and quantity, the performance discrepancies could be comparable to or even larger than those observed among different deep learning approaches. In particular, the presence of proteins during model training leads to a dramatic increase in prediction accuracy. This implies that continued accumulation of high-quality affinity data, especially for new protein targets, is indispensable for improving deep learning models to better predict protein-ligand binding affinities.
ARTICLE | doi:10.20944/preprints202202.0232.v1
Subject: Biology, Entomology Keywords: Small RNA sequencing; miRNAs; Target prediction; Chemosensory-associated genes; Apolygus lucorum
Online: 18 February 2022 (10:01:58 CET)
MicroRNAs (miRNAs) are a class of small non-coding RNAs that function as regulators of gene expression and contribute to numerous physiological processes. However, little is known about miRNA function in insect chemosensation. In the current study, nine small RNA libraries were constructed and sequenced from the antennae of nymphs, adult males and adult females of Apolygus lucorum. In total, 399 miRNAs were identified, including 275 known and 124 novel miRNAs. Known miRNAs were classified into 71 families, of which 23 were insect-specific. Expression profile analysis showed that miR-7-5p_1 was the most abundant miRNA in the antennae of A. lucorum. Altogether, 69708 target genes related to biogenesis, membrane and binding activities were predicted for the 399 miRNAs. In particular, 15 miRNAs were found to target 16 olfactory genes. These miRNAs could be involved in regulating the expression of olfactory-associated genes. Comparing the antennae of nymphs, adult males and adult females, 94 miRNAs were found to be differentially expressed. The expression levels of some differentially expressed miRNAs measured by qPCR were consistent with the sequencing results. This study provides a global miRNA transcriptome of the antennae of A. lucorum and valuable information for further investigation of miRNA-mRNA interactions, especially the functions of miRNAs in regulating chemosensation.
REVIEW | doi:10.20944/preprints202202.0212.v1
Subject: Mathematics & Computer Science, Analysis Keywords: Knowledge Graphs; Link Prediction; Semantic-Based Models; Translation Based Embedded Models
Online: 17 February 2022 (11:49:24 CET)
Link prediction is a popular research area in disciplines such as biological science, security and medicine. Many methods have been proposed for link prediction. This review covers several of them: the TransE, ComplEx, DistMult and DensE models. Each model approaches link prediction from a different perspective. We assess the practical performance potential of these methods under similar parameter values, using fine-tuning to evaluate the reliability and reproducibility of their results. We describe the methods and experiments, and provide theoretical proofs and experimental examples demonstrating how current link prediction methods work in such settings. We use standard evaluation metrics to test each model's ability.
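As a minimal sketch of how two of the listed models score a candidate triple (head, relation, tail), the code below uses the commonly cited formulations: TransE scores by the negative distance between `h + r` and `t`, and DistMult by a three-way element-wise product. The embedding values are hypothetical, and this is not the review's experimental setup.

```python
import math


def transe_score(h, r, t):
    """TransE: negative L2 distance between (h + r) and t; values closer to 0 are more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: sum of element-wise products h_i * r_i * t_i; symmetric in h and t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Hypothetical 2-dimensional embeddings, for illustration only.
head, relation = [1.0, 0.0], [0.0, 1.0]
true_tail, corrupted_tail = [1.0, 1.0], [4.0, 5.0]

plausible = transe_score(head, relation, true_tail)         # distance 0
implausible = transe_score(head, relation, corrupted_tail)  # -5.0
```

Because the DistMult score is unchanged when head and tail are swapped, it cannot distinguish the direction of asymmetric relations, which is one motivation usually given for complex-valued variants such as ComplEx.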
ARTICLE | doi:10.20944/preprints202111.0422.v1
Subject: Earth Sciences, Geophysics Keywords: tephra; ground-based weather radar; Bayesian approach; nowcasting; ensemble prediction system
Online: 23 November 2021 (13:00:31 CET)
Tephra plumes can pose a significant hazard to surrounding towns, infrastructure, and air traffic. The current work presents the use of a small and compact X-band Multi-Parameter (X-MP) radar for the remote detection and tracking of tephra from two eruptive events at Merapi Volcano, Indonesia, in May and June 2018. Tephra detection was done by analysing multiple radar parameters: copolar correlation and reflectivity intensity. These parameters were used to cancel unwanted clutter and retrieve tephra properties, namely grain size and concentration. Real-time spatial and temporal forecasting of tephra dispersal was performed by applying an advection scheme (nowcasting) in the manner of an Ensemble Prediction System (EPS). Cross-validation was done using field-survey data, radar observations, and Himawari-8 imagery. The nowcasting model computed both the displacement and the growth and decay rate of the plume, based on the temporal changes in two-dimensional movement and tephra concentration, respectively. Our results agree with ground-based data: the radar-based estimated grain size distribution fell within the range of the in-situ data. The uncertainty of the real-time forecasted tephra plume depends on the initial condition, which affects the estimation of the growth and decay rate. The EPS improves the predictability rate by reducing the number of missed and false forecasted events. Our findings and the method presented here are suitable for early warning of tephra fall hazard at the local scale.
Subject: Keywords: Trip purpose prediction; Smart card data; POIs, neural networks; machine learning
Online: 3 June 2021 (13:34:52 CEST)
Predicting trip purpose from comprehensive and continuous smart card data is beneficial for transport and city planners in investigating travel behaviours and urban mobility. Here we propose a framework, ActivityNET, using machine learning (ML) algorithms to predict passengers’ trip purpose from smart card data and Points-of-Interest (POIs) data. The feasibility of the framework is demonstrated in two phases. Phase I focuses on extracting activities from individuals’ daily travel patterns from smart card data and combining them with POIs using the proposed ‘activity-POIs consolidation algorithm’. Phase II feeds the extracted features into an artificial neural network (ANN) with multiple scenarios and predicts trip purpose under primary activities (home and work) and secondary activities (entertainment, eating, shopping, child drop-offs/pick-ups and part-time work) with high accuracy. As a case study, the proposed ActivityNET framework is applied in Greater London and illustrates a robust competence to predict trip purpose. The promising outcomes demonstrate that the cost-effective framework offers high predictive accuracy and valuable insights into transport planning.