REVIEW | doi:10.20944/preprints202309.1764.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine-learning; Weather prediction; Climate prediction; Survey; Meteorological Forecasting
Online: 30 October 2023 (17:09:12 CET)
With the rapid development of artificial intelligence, machine learning is gradually becoming popular for prediction in all walks of life. In meteorology, it is gradually competing with traditional climate predictions dominated by physical models. This survey aims to consolidate the current understanding of Machine Learning (ML) applications in weather and climate prediction, a field of growing importance across multiple sectors including agriculture and disaster management. Building upon an exhaustive review of more than 20 methods highlighted in existing literature, this survey pinpoints eight techniques that show particular promise for improving the accuracy of both short-term weather and medium-to-long-term climate forecasts. According to the survey, while ML demonstrates significant capabilities in short-term weather prediction, its application in medium-to-long-term climate forecasting remains limited, constrained by factors such as intricate climate variables and data limitations. Current literature tends to focus narrowly on either short-term weather or medium-to-long-term climate forecasting, often neglecting the relationship between the two as well as modelling structure and recent advances. By providing an integrated analysis of models spanning different time scales, this survey aims to bridge these gaps, thereby serving as a meaningful guide for future interdisciplinary research in this rapidly evolving field.
ARTICLE | doi:10.20944/preprints202112.0391.v1
Subject: Medicine And Pharmacology, Pediatrics, Perinatology And Child Health Keywords: online prediction; CYP21A2; mutation analysis; pathogenicity prediction
Online: 23 December 2021 (12:00:40 CET)
Context: CYP21A2 deficiency represents 95% of congenital adrenal hyperplasia (CAH) cases, a group of genetic disorders that affect steroid biosynthesis. Genetic and functional analysis provides critical tools to elucidate complex CAH cases. One of the most accessible tools to infer the pathogenicity of new variants is in silico prediction. Objective: Analyze the performance of in silico prediction tools to categorize missense single nucleotide variants (SNVs) of CYP21A2. Methods: SNVs of CYP21A2 characterized in vitro by functional assays were selected to assess the performance of online single and meta predictors. SNVs were tested separately or in combination with the related phenotype (severe or mild CAH form). In total, 103 SNVs of CYP21A2 (90 pathogenic and 13 neutral) were used to test the performance of 13 single-predictors and four meta-predictors. Results: SNVs associated with the severe phenotypes were well categorized by all tools, with an accuracy between 0.69 (PredictSNP2) and 0.97 (CADD), and Matthews' correlation coefficient (MCC) between 0.49 (PredictSNP2) and 0.90 (CADD). However, SNVs related to the mild phenotype had more variation, with the accuracy between 0.47 (S3Ds&GO and MAPP) and 0.88 (CADD), and MCC between 0.18 (MAPP) and 0.71 (CADD). Conclusion: From our analysis, we identified four predictors of CYP21A2 pathogenicity with good performance. These results can be used for future analysis to infer the impact of uncharacterized SNVs in CYP21A2.
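The abstract above reports both accuracy and MCC on a set with 90 pathogenic and 13 neutral variants. The following is a minimal sketch of how the two metrics are computed from a confusion matrix, and of why MCC can be noticeably lower than accuracy on such an imbalanced set. The counts are hypothetical and are not taken from the study.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of variants that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for one predictor on 90 pathogenic and 13 neutral SNVs.
tp, fn = 85, 5   # pathogenic variants called pathogenic / called neutral
tn, fp = 10, 3   # neutral variants called neutral / called pathogenic

print("accuracy:", round(accuracy(tp, tn, fp, fn), 3))
print("MCC:", round(mcc(tp, tn, fp, fn), 3))
```

Because MCC uses all four cells of the confusion matrix, a tool that misclassifies a few of the 13 neutral variants is penalized more heavily by MCC than by accuracy.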
ARTICLE | doi:10.20944/preprints201808.0500.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: East Asian summer monsoon, Seasonal prediction, dynamic prediction, summer rainfall prediction, NESM3.0, ENSO teleconnection
Online: 29 August 2018 (13:42:45 CEST)
It has been an outstanding challenge for global climate models to simulate and predict East Asia (EA) summer monsoon (EASM) rainfall. This study evaluates the dynamical hindcast skill of the newly developed Nanjing University of Information Science and Technology Earth System Model version 3.0 (NESM3.0). To improve the poor prediction of an earlier version of NESM3.0, we have modified convective parameterization schemes to suppress excessive deep convection and enhance insufficient shallow and stratiform clouds. The new version of NESM3.0 with modified parameterizations (MOD hereafter) yields significantly improved rainfall prediction in northern and southern China but not over the Yangtze River Valley. The improved prediction is primarily attributed to improvements in the predicted climatological summer mean rainfall and circulations, the seasonal march of the subtropical rain belt, the Nino 3.4 SST anomaly, and the rainfall anomalies associated with the development and decay of El Nino events. However, MOD still has notable biases in the predicted leading mode of interannual variability of precipitation. The leading mode captures the dry (wet) anomalies over the South China Sea (northern EA) but misplaces precipitation anomalies over the Yangtze River Valley. The model can capture the interannual variation of the circulation indices very well, but the bias in the circulation-rainfall connection causes errors in predicted rainfall. The results here suggest that over EA land regions, skillful rainfall prediction relies not only on the model's capability to predict the summer mean and seasonal march of rainfall and the ENSO teleconnection with EASM, but also on accurate prediction of the leading modes of interannual variability.
ARTICLE | doi:10.20944/preprints202205.0091.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: risk prediction; prediction models; risk of bias; PROBAST; melanoma
Online: 7 May 2022 (03:50:41 CEST)
Rising incidences of cutaneous melanoma have fueled the development of statistical models that predict the individual melanoma risk. Our aim was to assess the validity of published prediction models for incident cutaneous melanoma using a standardized procedure based on PROBAST (Prediction model Risk Of Bias ASsessment Tool). We included studies that were identified by a recent systematic review and updated the literature search to ensure that our PROBAST rating included all relevant studies. Six reviewers assessed the risk of bias (ROB) for each study using the published “PROBAST Assessment Form” that consists of four domains and an overall rating of ROB. We further examined a temporal effect regarding changes in overall and domain-specific ROB rating distributions. Altogether 42 studies were assessed, of which a vast majority (n=34; 81%) was rated as having high ROB. Only one study was judged as having low ROB. The main reasons for high ROB ratings were the use of hospital controls in case-control studies and the omission of any validation of prediction models. However, our results of the temporal analysis showed a significant reduction in the number of studies with high ROB for the domain analysis. Nevertheless, the evidence base of high-quality studies that can be used to draw conclusions on the prediction of incident cutaneous melanoma is currently much weaker than the high number of studies on this topic would suggest.
ARTICLE | doi:10.20944/preprints202105.0669.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: porosity prediction; pore-water prediction; gravity; resistivity; combined inversion
Online: 27 May 2021 (13:16:28 CEST)
This work describes a method to carry out 2-D inversion of gravity data in terms of porosity and matrix density distribution, using previous DC resistivity inversion results to constrain the fractional pore-water content in the rocks. The inversion is carried out using a controlled random search (CRS) algorithm for global optimization. The method was tested on synthetic data generated from a model representing a graben, and the results show that it can estimate accurate values of density contrast and porosity. The method was also applied to gravity and DC experimental data collected in NE Portugal, showing results that agree quite well with the known geological information.
ARTICLE | doi:10.20944/preprints202311.1738.v1
Subject: Environmental And Earth Sciences, Geography Keywords: Landslide; Spatial prediction; Vulnerability; Cross-validation; Prediction table; Landslide risk
Online: 28 November 2023 (07:18:26 CET)
This research deals with risk assessment for human life, man-made infrastructure, and agriculture in a landslide-prone area using GIS-based spatial methods. The landslide inventory map of the study area was prepared from previous landslide information, aerial photograph analysis, and several field observations. A total of 550 landslides were included, comprising 182 debris flows and 368 soil slides. All included landslides were divided into two groups by random selection; half were used for model calibration and the rest for cross-validation. Fourteen causative factors were used in the analysis: aspect, slope, curvature, elevation, topographic wetness index, forest timber diameter, forest type, forest crown density, forest age, land use, geology, soil drainage, soil depth, and soil texture. Moreover, to identify the relationship between observed landslides and causative factors, the affected pixels were divided into sub-classes using a frequency ratio method. Based on the total dataset, three landslide susceptibility maps were constructed using Bayesian prediction, likelihood ratio, and fuzzy set methods. To evaluate cross-validation and success rates, the susceptibility results of each model were plotted as a receiver operating characteristic (ROC) curve and the area under the curve (AUC) was estimated. In addition, for risk assessment, each social data layer, such as agriculture, house, industry, business, road, river, population intensity, monetary value, and vulnerability level, was added based on the local standard and incident time and was converted into US dollars. During the analysis, the hazard map from each method was used with a specific group of thematic data layers. Subsequently, to prepare the probability table, the total pixels of the study area and the predicted landslide-affected pixels were considered. Matching with the affected pixels, a standard of 5000 pixels was selected to run the final evaluation.
Based on the results, the agricultural field showed the highest vulnerability and an estimated risk of US$16.3 million. Further, the man-made infrastructure map showed a risk of US$31.3 million. The total estimated population casualties were 6.77, which was relatively similar to the published data.
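The frequency ratio method mentioned in this abstract compares, for each sub-class of a causative factor, the share of landslide pixels falling in that sub-class with the share of the study area it covers. The following is a minimal sketch under that standard definition; the sub-class names and pixel counts are hypothetical and are not taken from the study.

```python
def frequency_ratio(class_pixels, landslide_pixels):
    """Return the frequency ratio of each sub-class of one causative factor.

    class_pixels: dict mapping sub-class -> number of pixels in that sub-class
    landslide_pixels: dict mapping sub-class -> landslide-affected pixels in it
    """
    total_pixels = sum(class_pixels.values())
    total_landslide = sum(landslide_pixels.values())
    ratios = {}
    for name, n_pixels in class_pixels.items():
        share_of_landslides = landslide_pixels.get(name, 0) / total_landslide
        share_of_area = n_pixels / total_pixels
        ratios[name] = share_of_landslides / share_of_area
    return ratios

# Hypothetical slope sub-classes (degrees) and pixel counts, for illustration only.
slope_pixels = {"0-15": 5000, "15-30": 3000, ">30": 2000}
slope_landslides = {"0-15": 50, "15-30": 150, ">30": 300}

for name, ratio in frequency_ratio(slope_pixels, slope_landslides).items():
    print(name, round(ratio, 2))
```

A ratio above 1 indicates that landslides are over-represented in that sub-class relative to its area, and a ratio below 1 indicates the opposite.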
Subject: Computer Science And Mathematics, Information Systems Keywords: Online Social Media prediction, Covid-19 prediction, Twitter, Google Trends
Online: 3 June 2021 (11:37:56 CEST)
As the coronavirus disease 2019 (COVID-19) continues to rage worldwide, the United States has become the most affected country with more than 34.1 million total confirmed cases up to June 1, 2021. In this work, we investigate correlations between online social media and Internet search for the COVID-19 pandemic among 50 U.S. states. By collecting the state-level daily trends through both Twitter and Google Trends, we observe a high but state-different lag correlation with the number of daily confirmed cases. We further find that the predictive accuracy measured by the correlation coefficient is positively correlated to a state’s demographic, air traffic volume and GDP development. Most importantly, we show that a state’s early infection rate is negatively correlated with the lag to the previous peak in Internet search and tweeting about COVID-19, indicating that earlier collective awareness on Twitter/Google correlates with lower infection rate. Lastly, we demonstrate that correlations between online social media and search trends are sensitive to time, mainly due to the attention shifting of the public.
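A lag correlation of the kind described above can be sketched as follows: shift the online-activity series forward by a number of days, compute the Pearson correlation with the case series, and keep the lag with the highest correlation. The function names and the toy series are assumptions made for illustration, not the authors' pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def best_lag(signal, cases, max_lag):
    """Return (lag, r): the lag in days at which signal best predicts later cases."""
    best = (0, pearson(signal, cases))
    for lag in range(1, max_lag + 1):
        # Pair signal on day t with cases on day t + lag.
        r = pearson(signal[:-lag], cases[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Toy daily series: the case curve repeats the search curve three days later.
search_trend = [1, 2, 5, 9, 4, 2, 1, 1, 1, 1]
daily_cases = [1, 1, 1, 1, 2, 5, 9, 4, 2, 1]
print(best_lag(search_trend, daily_cases, max_lag=5))
```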
ARTICLE | doi:10.20944/preprints202304.1235.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Genomic prediction; flavonoid pigmentation; Sorghum bicolor; prediction accuracy; marker-assisted selection
Online: 29 April 2023 (10:11:10 CEST)
Marker-assisted selection (MAS) and genomic selection (GS) have been used to select individuals with desirable traits. MAS uses a few markers associated with a specific trait, identified through a genome-wide association study (GWAS), to select individuals. In contrast, GS uses a large number of markers distributed across the genome to predict genomic breeding values for further selection of individuals. In general, MAS has shown high prediction accuracy but is not suitable for traits that are controlled by multiple genes, and it has another constraint: it requires phenotypic data. GS, in contrast, has not shown prediction accuracy as high as MAS, but it takes into account the effect of multiple genes controlling a target trait and can be used without phenotypic data. Including GWAS-selected markers in GS can improve the lower prediction accuracy that GS shows in comparison with MAS. Thus, the objective of this study was to compare the prediction accuracy of MAS and several genomic prediction models (gBLUP, gBLUP including GWAS-selected markers, and Bayesian models such as Bayes A, Bayes B, Bayes LASSO and Bayesian Ridge Regression) in order to confirm whether incorporating GWAS results in GS increases the prediction accuracy of GS. As a model for this study, data from Sorghum, which has shown population structure, were used to evaluate whether incorporating GWAS-selected markers into GS improves prediction accuracy. A sample of 6000 SNPs out of the 265,487 reported by Morris et al. (2013) was used, and parameters that affect the efficiency of selection were also considered, such as the size of the training population, the heritability, and the number of QTNs. The GWAS-selected SNPs were identified using the BLINK model.
The results showed that incorporating GWAS-selected markers enhanced the performance of genomic selection, giving prediction accuracy similar to MAS. The number of QTNs and the size of the training population affected the accuracy, with higher accuracy for a larger training population and a lower number of QTNs, but heritability does not seem to have any impact in the model where GWAS-selected SNPs were included in gBLUP.
ARTICLE | doi:10.20944/preprints202105.0116.v1
Subject: Medicine And Pharmacology, Pulmonary And Respiratory Medicine Keywords: Time Series Prediction; ANN forecasting; New Coronavirus; COVID19 prediction cases; COVID19 prediction deaths; COVID19 prediction ICU, COVID19 Vaccination; COVID19 in Europe; COVID19 in Israel; COVID19 use of face mask.
Online: 6 May 2021 (16:58:01 CEST)
The use of Artificial Neural Networks (ANN) is a great contribution to medical studies, since the application of forecasting concepts allows the analysis of future disease propagation. In this context, this paper presents a study of the new coronavirus SARS-CoV-2 with a focus on verifying the virus propagation associated with mitigation procedures and massive vaccination campaigns. Two methodologies were proposed to predict, 28 days ahead, the number of new cases, deaths, and ICU patients in five European countries (Portugal, France, Italy, United Kingdom, and Germany), together with a case study of the results of massive immunization in Israel. The input data for cases, deaths, and daily ICU patients were normalized to reduce discrepancies due to country size, and the cumulative vaccination values were expressed as the percentage of the population immunized with at least one dose of vaccine. As a comparative criterion, the mean absolute error (MAE) of all predictions identifies the best methodology and points to other possible uses of the proposed method. The best architecture achieved a general MAE for the 1 to 28 days ahead forecast lower than 30 cases, 0.6 deaths and 2.5 ICU patients per million people.
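The evaluation described above rests on two simple operations: scaling counts to a per-million basis so that countries of different size are comparable, and averaging the absolute forecast errors. The following is a minimal sketch of both; the population and the count series are hypothetical values chosen for illustration.

```python
def per_million(counts, population):
    """Scale raw daily counts to counts per million inhabitants."""
    return [c * 1_000_000 / population for c in counts]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical country with 10 million inhabitants and a 4-day forecast.
population = 10_000_000
observed_cases = [500, 520, 480, 450]
forecast_cases = [510, 500, 470, 460]

mae = mean_absolute_error(per_million(observed_cases, population),
                          per_million(forecast_cases, population))
print("MAE (cases per million):", mae)
```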
ARTICLE | doi:10.20944/preprints202311.1073.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Academic Performance, Progress Prediction, Score Prediction, Learning Behavior, Learning Dataset, Educational Data Mining
Online: 16 November 2023 (11:24:43 CET)
Intelligent Tutoring Systems (ITS) are increasingly popular for online learning. These systems use adaptive algorithms to recommend relevant content based on students' profiles. However, instructors need to periodically assess students' performance to ensure learning outcomes and adjust strategies accordingly. Our objective is to predict students' progress in advance, enabling teachers to make quicker decisions and facilitating the iterative process of adaptive algorithms. For this study, we collected a dataset from ALIN, an online learning platform, consisting of over 5,000 students' learning records and test results. Using this dataset, we conducted experiments employing various machine learning algorithms. The results indicate that learning behavior contributes to improving forecast performance, while students' progress strongly correlates with their previous test results. Additionally, we discovered that students' progress can be indirectly predicted by forecasting their scores. Furthermore, by breaking down overall scores into several distinct components and predicting individual scores for each component, the accuracy of the forecasts can be improved.
ARTICLE | doi:10.20944/preprints202311.0560.v1
Subject: Environmental And Earth Sciences, Oceanography Keywords: sea ice concentration; recurrent neural network; Arctic sea ice prediction; short-term prediction
Online: 8 November 2023 (14:45:25 CET)
Arctic sea ice prediction holds significant importance for facilitating Arctic route planning, optimizing fisheries management, and advancing the field of sea ice dynamics research. While various deep learning models have been developed for sea ice prediction, they predominantly operate at the seasonal or sub-seasonal scale, often focusing on localized areas, and few cater to full-region daily scale prediction. This study introduces the use of spatiotemporal sequence data prediction models, namely, the convolutional LSTM (ConvLSTM) and predictive recurrent neural network (PredRNN), for the prediction of sea ice concentration (SIC). Our analysis reveals that, when solely utilizing SIC historical data as the input, the ConvLSTM model outperforms the PredRNN model in SIC prediction. To enhance the model's capacity to capture spatiotemporal relationships between multiple variables, we expanded the range of input data types to form the ConvLSTM-multi and PredRNN-multi models. Experimental findings demonstrate that the ConvLSTM-multi model excels in assimilating the influence of reanalysis data on sea ice within the sea ice edge region, thus exhibiting superior performance in predicting daily Arctic SIC over the subsequent 10 days. Furthermore, sensitivity tests on various model parameters highlight the substantial impact of sea surface temperature and prediction date on the accuracy of daily sea ice prediction. Additionally, meteorological and oceanographic parameters primarily affect the prediction accuracy of the thin ice region at the edge of the sea ice.
ARTICLE | doi:10.20944/preprints202310.1988.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: academic performance; progress prediction; score prediction; learning behavior; learning dataset; educational data mining
Online: 31 October 2023 (09:40:38 CET)
Data mining techniques have garnered significant attention within the realm of education. However, the procurement of ample student data poses a formidable challenge. In response to this challenge, we present a student dataset characterized by its size and distinctive attributes. This dataset encompasses various task-related topics interconnected through a learning pathway, thereby enabling researchers to delve into the data from novel perspectives. Moreover, it encompasses extensive longitudinal student behavioral data, a rarity that adds substantial value. Spanning the years from 2010 to 2021, our dataset comprises a cohort of 7,933 students, 64,344 test scores, and 183,390 behavior records, solidifying its status as a valuable resource for educational research. In our experiments, we achieved successful predictions of students' test outcomes based on behavioral learning data. The strengths of our dataset render it apt for analyzing the nexus between student conduct and academic performance, crafting personalized learning recommendations, and pursuing various other research pursuits.
Subject: Social Sciences, Psychology Keywords: active inference; digital affordances; patterns of attention; prediction error minimization; prediction error dynamics
Online: 19 September 2022 (04:49:39 CEST)
Culture exploits the acquisition of meaningful content by crafting regimes of shared attention, determining what is relevant, valuable, and salient. Culture changes the field of relevant social affordances worthy of being acted upon in a context-sensitive manner. When relevant affordances are highly weighted, their attentional capture and their salience increase the probability of them being enacted due to the associated expectation for minimizing prediction error. This process is known as active inference. In the digital era, individuals need to infer the action-related attributes of digital cues, here characterized as digital affordances. The digital affordances of digital social platforms are of particular interest here. Digital social affordances are defined as online possibilities of social interactions. By their own nature, these are salient because they are related to social interactions and relevant social cues. However, the problem of digital social platforms is that they are not equivalent to situated social interactions because their structure is built, mediated, and defined by third-parties with diverse interests. The third-parties behind the digital social platforms are using the same mechanism exploited by culture to manipulate the shared patterns of attention. Moreover, digital social platforms are deliberately designed to be hyper-stimulating, making digital social affordances highly rewarding and increasingly salient. This appropriation, for economic purposes, is an issue of great importance, especially as the COVID-19 pandemic brought deep global changes, pushing societies to an online digital way of life. Here, we examined different types of digital social affordances under an active inference view, placing them into two categories, those for self-identity formation, and those for belief-updating. This paper aims to analyze digital social affordances in light of the prediction error dynamics they might elicit to their users. 
Although each of the analyzed digital social affordances allows different epistemic and instrumental digital actions, they all share the characteristic of having an "easy" and a fast expected rate of error reduction. Here, we aim to provide a new hypothesis about how the design behind digital social affordances is built on our natural attractiveness to minimize prediction error and the resulting positive embodied feelings when doing so. Finally, it is suggested that because digital social affordances are becoming highly weighted in the field of affordances, this might be putting at risk our context-sensitive grip on a rich, dynamic and varied field of relevant affordances.
ARTICLE | doi:10.20944/preprints202212.0141.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Soil moisture prediction; LSTM; PSO
Online: 8 December 2022 (02:28:04 CET)
Soil moisture is an important factor affecting plant growth. For a long time, the convenience, timeliness and accuracy of soil moisture monitoring have been limited by outdated observation methods and equipment. Therefore, the quantitative prediction of soil moisture has become a difficult problem. To address the high installation cost, easily damaged sensors and low measurement accuracy of existing fixed-sensor soil moisture monitoring systems, a soil moisture prediction model based on a long short-term memory (LSTM) neural network integrated with particle swarm optimization (PSO), called PSO-LSTM, is designed and implemented. The hyperparameters of the LSTM network are obtained using the global search ability of the PSO algorithm. Using the meteorological data and soil moisture data of Haidian Park in 2019, the LSTM-based prediction model is constructed with input vectors of surface temperature, average temperature, evaporation, sunshine hours, precipitation and average wind speed, and an output vector of soil relative humidity. The results show that, compared with the back propagation (BP) neural network, the Elman neural network and the LSTM neural network, the proposed PSO-LSTM model has higher prediction performance.
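To illustrate how PSO can search a hyperparameter space, the following is a minimal sketch of a generic particle swarm optimizer. It is not the authors' PSO-LSTM: training an LSTM is replaced by a hypothetical stand-in function, `toy_validation_loss`, whose minimum is placed at a log learning rate of -3 and 64 hidden units purely for illustration. The swarm size, inertia and acceleration coefficients are likewise assumed values.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective over box bounds with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def toy_validation_loss(params):
    """Hypothetical stand-in for the validation loss of a trained LSTM."""
    log_lr, hidden_units = params
    return (log_lr + 3.0) ** 2 + ((hidden_units - 64.0) / 32.0) ** 2

best_params, best_loss = pso_minimize(toy_validation_loss,
                                      bounds=[(-5.0, -1.0), (8.0, 256.0)])
print(best_params, best_loss)
```

In a real setting the objective would train the network with the candidate hyperparameters and return a validation error, which makes each evaluation expensive and the swarm size and iteration count a practical trade-off.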
ARTICLE | doi:10.20944/preprints202201.0313.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: stochastic clustering; energy prediction; disaggregation
Online: 20 January 2022 (20:40:54 CET)
This paper describes a stochastic clustering architecture that is used in the paper for making predictions over energy data. The design is discrete, localised optimisations based on similarity, followed by a global aggregating layer, which can be compared with the recent random neural network designs, for example. The topic relates to the IDEAS Smart Home Energy Project, where a client-side Artificial Intelligence component can predict energy consumption for appliances. The proposed data model is essentially a look-up table of the key energy bands that each appliance would use. Each band represents a level of consumption by the appliance. This table can replace disaggregation from more complicated methods, usually constructed from probability theory, for example. Results show that the table can accurately disaggregate a single source to a set of appliances, because each appliance has quite a unique energy footprint. As part of predicting energy consumption, the model could possibly reduce costs by 50% and more than that if the proposed schedules are also included. The hyper-grid has been changed to consider rows as single units, making it more tractable. A second case study considers wind power patterns, where the grid optimises over the dataset columns in a self-similar way to the rows, allowing for some level of feature analysis.
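One way to picture a look-up table of energy bands is as a mapping from each appliance to the few consumption levels it can take, with disaggregation choosing the combination of levels whose sum is closest to the metered total. The sketch below is an illustrative reading of that idea and not the paper's architecture; the appliance names, band values and the brute-force search are all assumptions.

```python
from itertools import product

# Hypothetical look-up table: appliance -> possible consumption bands in watts.
BANDS = {
    "fridge": [0, 100],
    "kettle": [0, 2000],
    "washer": [0, 400, 1800],
}

def disaggregate(total_watts, bands=BANDS):
    """Return the band per appliance whose sum best matches the total reading."""
    names = list(bands)
    best_combo, best_error = None, float("inf")
    # Try every combination of one band per appliance.
    for combo in product(*(bands[name] for name in names)):
        error = abs(total_watts - sum(combo))
        if error < best_error:
            best_combo, best_error = combo, error
    return dict(zip(names, best_combo)), best_error

assignment, error = disaggregate(2510)
print(assignment, "residual:", error)
```

The exhaustive search is only practical for a handful of appliances; it works here because, as the abstract notes, each appliance has a fairly distinctive energy footprint.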
ARTICLE | doi:10.20944/preprints201901.0023.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: CPT, Rainfall, Prediction, Season, SST
Online: 3 January 2019 (13:20:00 CET)
The main objective of this study is to find a better prediction of rainy-season rainfall (15 June-15 August). The correlation between rainfall in the Bengali rainy season at Rangpur, Dhaka, Barisal and Sylhet and global sea surface temperature (SST) in different areas of the world was studied using data for 1975-2008 with the help of the Climate Predictability Tool (CPT), in order to find the SST most positively correlated with observed rainfall and use it as a predictor for the year 2009. Using SST from one month before the rainy season as the predictor, the positive deviation of predicted rainfall from observed rainfall was 1.34 mm/day at Sylhet and 0.9 mm/day at Dhaka. The negative deviation of mean rainfall was 1.16 mm/day at Rangpur and 1.10 mm/day at Barisal. Using SST from the starting month of the rainy season as the predictor, the positive deviation of predicted rainfall from observed rainfall was 4.03 mm/day at Sylhet. The positive deviation of daily mean rainfall was 6.58 mm/day at Dhaka and 6.23 mm/day over southern Bangladesh. The study reveals that SST from one month before the rainy season was a better predictor than SST from the starting month of the rainy season.
REVIEW | doi:10.20944/preprints201810.0098.v2
Subject: Environmental And Earth Sciences, Environmental Science Keywords: flood prediction; machine learning; forecasting
Online: 26 October 2018 (11:56:27 CEST)
Floods are among the most destructive natural disasters and are highly complex to model. Research on the advancement of flood prediction models has contributed to risk reduction, policy suggestion, minimizing loss of human life and reducing the property damage associated with floods. To mimic the complex mathematical expressions of the physical processes of floods, machine learning (ML) methods have contributed greatly over the past two decades to the advancement of prediction systems, providing better performance and cost-effective solutions. Due to the vast benefits and potential of ML, its popularity has dramatically increased among hydrologists. By introducing novel ML methods and hybridizing existing ones, researchers aim to discover more accurate and efficient prediction models. The main contribution of this paper is to demonstrate the state of the art of ML models in flood prediction and to give insight into the most suitable models. The literature in which ML models are benchmarked through a qualitative analysis of robustness, accuracy, effectiveness, and speed is particularly investigated to provide an extensive overview of the use of various ML algorithms in the field. The performance comparison of ML models presents an in-depth understanding of the different techniques within the framework of a comprehensive evaluation and discussion. As a result, the paper introduces the most promising prediction methods for both long-term and short-term floods. Furthermore, the major trends in improving the quality of flood prediction models are investigated. Among them, hybridization, data decomposition, algorithm ensemble, and model optimization are reported as the most effective strategies for improving ML methods. This survey can be used as a guideline for hydrologists as well as climate scientists in choosing the proper ML method for a given prediction task.
ARTICLE | doi:10.20944/preprints202308.1858.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Deep Learning, ELECTRA; Ion binding site prediction; Transformer; Natural Language Processing; Sequence-based prediction
Online: 29 August 2023 (02:50:38 CEST)
Interactions between proteins and ions are essential for various biological functions like structural stability, metabolism, and signal transport. Given that more than half of all proteins bind to ions, it becomes crucial to identify ion-binding sites. Accurate identification of protein-ion binding sites helps us to understand proteins' biological functions and plays a significant role in drug discovery. While several computational approaches have been proposed, this remains a challenging problem due to the small size and high versatility of metals and acid radicals. In this study, we propose IonPred, a sequence-based approach that employs ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) to predict ion binding sites using only raw protein sequences. We successfully fine-tuned our pretrained model to predict the binding sites for nine metal ions (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, and K⁺) and four acid radical ion ligands (CO₃²⁻, SO₄²⁻, PO₄³⁻, NO₂⁻). IonPred surpassed six current state-of-the-art tools by over 44.65% and 28.46% respectively in F1 score and MCC when compared on an independent test dataset. Our method is more computationally efficient than existing tools, producing prediction results for a hundred sequences for a specific ion in under ten minutes.
ARTICLE | doi:10.20944/preprints202311.0345.v1
Subject: Engineering, Architecture, Building And Construction Keywords: similarity method; cooling load prediction; neural network prediction model; entropy weight method; grey correlation method
Online: 7 November 2023 (02:49:30 CET)
Artificial intelligence algorithms have gained widespread adoption in the field of air conditioning load prediction. However, their prediction accuracy is substantially influenced by the quality of training samples. Training samples that lack relevance to the predicted moments can interfere with the neural network's learning process and may cause it to converge to a local optimum during iteration. This, in turn, diminishes the robustness and generalization capabilities of the prediction model. To enhance the accuracy of air conditioning load prediction models based on artificial intelligence algorithms, this study presents a prediction model that screens training samples by sample similarity. Initially, the comprehensive similarity coefficient between samples is computed using the grey correlation analysis method, improved with information entropy weighting. Subsequently, a subset of closely related samples is extracted from the original dataset and employed as the training dataset for the artificial intelligence prediction model. Finally, the trained prediction model is deployed for air conditioning load prediction. The results illustrate that screening training samples based on sample similarity effectively improves the prediction accuracy of BP neural network (BPNN) and extreme learning machine (ELM) prediction models. However, this approach may not be suitable for genetic algorithm BPNN (GABPNN) and support vector regression (SVR) models.
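The similarity screening step described above can be illustrated with a minimal sketch of weighted grey relational analysis. The abstract does not give formulas, so this assumes the textbook grey relational coefficient with distinguishing coefficient 0.5, assumes the features are already normalized, and takes the per-feature weights as given (for example from an entropy weight calculation). The function names are hypothetical.

```python
def grey_relational_grades(reference, candidates, weights, rho=0.5):
    """Weighted grey relational grade of each candidate against the reference.

    reference:  normalized feature values for the moment to be predicted
    candidates: list of normalized feature lists (historical samples)
    weights:    per-feature weights summing to 1 (assumed to be given)
    rho:        distinguishing coefficient, conventionally 0.5
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every candidate is identical to the reference
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades


def select_similar(reference, candidates, weights, top_k):
    """Indices of the top_k candidates with the highest relational grade."""
    grades = grey_relational_grades(reference, candidates, weights)
    order = sorted(range(len(candidates)), key=lambda i: grades[i], reverse=True)
    return order[:top_k]
```

The indices returned by `select_similar` would then pick the rows used to train the load prediction model; how many samples to keep is a tuning choice the abstract does not specify.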
ARTICLE | doi:10.20944/preprints202308.0364.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Medical images; Compression; CNN; LMCDP; Prediction
Online: 4 August 2023 (07:21:52 CEST)
The primary goal of picture compression is to reduce the amount of unused image data while still storing or transmitting it in an appropriate format. The compression of raw binary data is quite different from the compression of a picture, and these differences may be rather substantial. In light of this, compression is often regarded as an essential technique for both data storage and transmission in order to mitigate the excessive amounts of data that are generated by these images. In order to transmit enormous datasets, particularly for the purposes of telemedicine and teleradiology, one needs a significant amount of storage capacity as well as an expansive network. As a result, compression is an important aspect of medical imaging. In addition to the importance of compression, the quality of the images themselves is also an essential factor in the success of analysis. Furthermore, the amount of time necessary to compress the images before sending them should be reduced.
ARTICLE | doi:10.20944/preprints202306.1444.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine Learning; Detection; Prediction; Oversampling; SMOTE
Online: 20 June 2023 (14:35:15 CEST)
Research in brain stroke prediction is very crucial as it can lead to the development of early detection techniques and interventions that can enhance the prognosis for stroke victims. Early detection and intervention can help to minimise the damage caused by a stroke, reduce the risk of long-term complications, and enhance the general quality of life for people who have survived a stroke. Additionally, research in stroke prediction can help to identify risk factors and improve understanding of the underlying causes of stroke, which can lead to the development of better prevention strategies. Research on brain stroke prediction is ongoing and has led to the development of various models and tools for predicting the risk of stroke and detecting it early. However, the implementation and use of these tools in clinical practise vary depending on several factors, such as the availability of resources, the specific healthcare system, and the level of awareness and acceptance of these tools among healthcare providers and patients. In general, risk prediction models may be used to quickly identify individuals at high risk of stroke and target them for preventive interventions, such as lifestyle changes, medication management, and screenings. Early detection tools can be used to quickly identify stroke symptoms and initiate appropriate treatment, which can improve outcomes for stroke patients. However, it is important to note that research and development of these models and tools are ongoing, and their use in clinical practise is constantly evaluated and updated. It may take time for these tools to be widely adopted in clinical practise and to see their real-world impact. 
This research paper focuses on predicting brain stroke occurrence using a range of machine learning algorithms such as Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gaussian Naive Bayes (GNB), Bernoulli Naive Bayes (BNB), and a Voting Classifier. The main emphasis of this research was to compare the effectiveness of oversampling techniques, as the data was significantly imbalanced. We have used evaluation metrics in this research to assess the accuracy, precision, recall, and other key performance indicators of a model's predictions. Area under the Receiver Operating Characteristics Curve (AUC), minority class accuracy, and majority class accuracy were used to assess the approaches.
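As an illustration of the oversampling idea named in the keywords (SMOTE), the following is a minimal standard-library sketch of SMOTE-style interpolation between minority-class samples. It is not the authors' pipeline; the function name, the default `k`, and the seeding are assumptions made for the example, and a real study would more likely rely on an established library implementation.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature lists for the minority class (at least 2 samples)
    k:        number of nearest minority neighbours to choose from
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base sample (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Oversampling of this kind should be applied to the training split only, so that synthetic points do not leak into the data used to report AUC or per-class accuracy.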
ARTICLE | doi:10.20944/preprints202305.0313.v1
Subject: Engineering, Aerospace Engineering Keywords: ANFIS; Decision-making; Failure prediction; Aviation
Online: 5 May 2023 (07:17:49 CEST)
Safety is very important in aviation since a loss of safety frequently results in both fatalities and financially damaging situations that are typically unrecoverable. Thus, achieving safety as much as possible is the primary goal of practically all aviation technology work. In this study, an Adaptive Neuro-Fuzzy Inference System (ANFIS)-based classifier is created to estimate the fault risk factor of airplanes. Five categories of real fleet data belonging to structure, electrical, avionic, motor systems, and incident statistics of the planes have been used for classifier development. A risk factor determination for each plane is the output of the developed intelligent classifier, and it can be used to identify general overhaul candidate planes and stop defects and crashes before they happen. The obtained results show that using ANFIS provides a great capability in processing many inputs and outputs depending on different types and classes in the aviation industry and thus predicting the failure risk of the airplane efficiently.
REVIEW | doi:10.20944/preprints202210.0022.v2
Subject: Biology And Life Sciences, Virology Keywords: monkeypox; risk; elimination; epidemiology; outbreak; prediction
Online: 14 November 2022 (09:43:24 CET)
Human monkeypox, caused by monkeypox virus, has spread unprecedentedly to more than 100 countries since May 2022. Here we summarized the epidemiology of monkeypox through a literature review and elucidated the risks and elimination strategies of this outbreak mainly based on the summarized epidemiology. We demonstrated that monkeypox virus became more contagious and less virulent in 2022, which could result from the fact that the virus entered a special transmission network favoring close contacts (i.e., sexual behaviors of men who have sex with men outside Africa) and the possibility that the virus accumulated a few adaptive mutations. We gave the reasons to investigate whether cattle, goats, sheep, and pigs are susceptible to monkeypox virus and whether infection with monkeypox virus could be latent in some primates. We listed six potential scenarios for the future of the outbreak (e.g., the outbreak could lead to endemicity outside Africa with increased transmissibility or virulence). We also listed multiple factors aiding or impeding the elimination of the outbreak. We showed that the control measures strengthened worldwide after the World Health Organization declared the outbreak a public health emergency of international concern (PHEIC) could eliminate the outbreak in 2022. We clarified eight strategies, i.e., publicity and education, case isolation, vaccine stockpiling, risk-based vaccination or ring vaccination, importation quarantine, international collaboration, and laboratory management, for the elimination of the outbreak.
ARTICLE | doi:10.20944/preprints202210.0434.v1
Subject: Business, Economics And Management, Economics Keywords: Housing Prices; MRT; Prediction; Machine Learning
Online: 27 October 2022 (11:12:44 CEST)
Real estate has the dual characteristics of consumption and investment. Under the traditional belief that "where there is land, there is wealth", real estate has become a principal investment asset. Besides market supply and demand, many general economic factors also affect real estate prices. This paper focuses on the effect of MRT on housing prices and finds that the closer a house is to an MRT station, the higher its price. In addition, because convenience stores are widespread, this study also examines whether the number of convenience stores within walking distance affects house prices. In short, both factors affect housing prices significantly.
ARTICLE | doi:10.20944/preprints202207.0323.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Algorithmic probability; Kolmogorov complexity; prediction; induction
Online: 21 July 2022 (10:48:26 CEST)
Developing new ways to estimate probabilities can be valuable for science, statistics, and engineering. By considering the information content of different output patterns, recent work invoking algorithmic information theory has shown that a priori probability predictions based on pattern complexities can be made in a broad class of input-output maps. These algorithmic probability predictions do not depend on a detailed knowledge of how output patterns were produced, or historical statistical data. Although quantitatively fairly accurate, a main weakness of these predictions is that they are given as an upper bound on the probability of a pattern, but many low complexity, low probability patterns occur, for which the upper bound has little predictive value. Here we study this low complexity, low probability phenomenon by looking at example maps, namely a finite state transducer, natural time series data, RNA molecule structures, and polynomial curves. Some mechanisms causing low complexity, low probability behaviour are identified, and we argue this behaviour should be assumed as a default in the real world algorithmic probability studies. Additionally, we examine some applications of algorithmic probability and discuss some implications of low complexity, low probability patterns for several research areas including simplicity in physics and biology, a priori probability predictions, Solomonoff induction and Occam's razor, machine learning, and password guessing.
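To make the upper-bound idea concrete, here is a small sketch that assumes the bound takes the form 2^(-a·K(x) - b), where K(x) is an estimate of the complexity of pattern x and a, b are map-dependent constants. The abstract does not state the form or any constants, so both are assumptions here, and the zlib-compressed length is only a crude stand-in for a proper complexity estimator.

```python
import random
import zlib


def complexity_estimate(pattern: str) -> int:
    """Crude complexity proxy: size in bits of the zlib-compressed pattern."""
    return 8 * len(zlib.compress(pattern.encode("ascii"), 9))


def probability_upper_bound(pattern: str, a: float, b: float) -> float:
    """Assumed bound 2 ** (-a * K(x) - b); a and b are hypothetical constants."""
    return 2.0 ** (-a * complexity_estimate(pattern) - b)


# Two illustrative binary patterns of equal length.
rng = random.Random(1)
simple = "01" * 128                                         # highly regular
irregular = "".join(rng.choice("01") for _ in range(256))   # pseudo-random
```

The regular pattern compresses better and so receives a larger bound. The low complexity, low probability phenomenon discussed in the abstract corresponds to patterns whose actual probability lies far below this bound, which is why the bound alone can have little predictive value for them.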
ARTICLE | doi:10.20944/preprints202205.0313.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Failure Prediction; Asynchronous motor; Neural Network
Online: 24 May 2022 (03:37:35 CEST)
Three-phase motors are commonly adopted in several industrial contexts, and their failures can result in costly downtime and undesired service outages; motor diagnostics is therefore of great importance. To prevent such failures and respond promptly to the resulting outages, this paper proposes a non-invasive method to identify electrical and mechanical faults in three-phase asynchronous electric motors. In particular, a measurement strategy is combined with a machine learning algorithm based on an Artificial Neural Network to classify failures. Digitized current samples of each motor phase are first processed by means of FFT and PSD in order to estimate the associated spectrum. Suitable features (in terms of frequency and amplitude of the spectral components) are then singled out to either train or feed a neural network acting as a classifier. The method is preliminarily validated on a set of 28 electric motors, and its performance is compared with common state-of-the-art machine learning techniques. The obtained results show that the proposed methodology reaches accuracy levels greater than 98% in identifying anomalous conditions of three-phase asynchronous motors.
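The feature extraction step (frequency and amplitude of spectral components) can be sketched as follows. For self-containment this uses a direct discrete Fourier transform from the standard library rather than an FFT library, which is only practical for short records; the function names, the one-sided amplitude scaling, and the choice of the strongest components as features are assumptions for illustration, not the authors' procedure.

```python
import cmath
import math


def amplitude_spectrum(samples, fs):
    """One-sided amplitude spectrum of a sampled signal via a direct DFT.

    Returns (frequencies_hz, amplitudes). Cost is O(n^2); a real pipeline
    would use an FFT implementation instead.
    """
    n = len(samples)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        # DC and Nyquist bins are not doubled in a one-sided spectrum.
        scale = 1.0 if k == 0 or 2 * k == n else 2.0
        freqs.append(k * fs / n)
        amps.append(scale * abs(acc) / n)
    return freqs, amps


def dominant_components(samples, fs, count=3):
    """The `count` strongest non-DC components as (frequency, amplitude) pairs."""
    freqs, amps = amplitude_spectrum(samples, fs)
    pairs = sorted(zip(freqs[1:], amps[1:]), key=lambda p: p[1], reverse=True)
    return pairs[:count]
```

The resulting (frequency, amplitude) pairs for each phase could be concatenated into a feature vector for a classifier; squaring the amplitudes gives a periodogram-style power estimate if a PSD-like quantity is preferred.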
ARTICLE | doi:10.20944/preprints202011.0366.v1
Subject: Engineering, Automotive Engineering Keywords: Fault detection; Control Valve; Reliability, Prediction
Online: 13 November 2020 (09:23:39 CET)
Reliability assessment is an important tool for process plants: a facility consists of many loops and attached instruments whose operation depends on each other's availability, so a statistical method is required to visualize reliability. The paper focuses on reliability assessment and prediction based on available statistical models, namely the normal, log-normal, exponential, and Weibull distributions. It also shows which model fits best for assessment and prediction and considers the failure modes arising during a simulated process control operation. A simulation model is designed to observe control valve failure associated with stiction, to visualize the failure modes, and to identify the best-fit model for reliability assessment.
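As a reminder of what two of the named models compute, the sketch below evaluates the standard two-parameter Weibull and exponential reliability functions. The parameter values used with them would come from fitting failure data; none are given in the abstract, so any numbers passed in are hypothetical.

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape): probability of surviving beyond t."""
    return math.exp(-((t / scale) ** shape))


def exponential_reliability(t, failure_rate):
    """R(t) = exp(-failure_rate * t), the constant-hazard special case."""
    return math.exp(-failure_rate * t)
```

With `shape` equal to 1 the Weibull model reduces to the exponential model with `failure_rate = 1 / scale`; a shape above 1 describes wear-out behaviour and a shape below 1 describes early-life failures.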
ARTICLE | doi:10.20944/preprints202008.0139.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: copper price; prediction; support vector regression
Online: 6 August 2020 (08:26:35 CEST)
Predicting the copper price is essential for making decisions that can affect companies and governments dependent on the copper mining industry. Copper prices follow a time series that is non-linear and non-stationary, with periods that change as a result of potential growth, cyclical fluctuation, and errors. The trend and cyclical components together are sometimes referred to as a trend-cycle. In order to make predictions, it is necessary to consider the different characteristics of the trend-cycle. In this paper, we study a copper price prediction method using Support Vector Regression. This work explores the potential of Support Vector Regression with external recurrences to predict the copper closing price at the London Metal Exchange 5, 10, 15, 20, and 30 days into the future. The best model for each forecast interval is selected using a grid search and balanced cross-validation. In experiments on real data sets, our results indicate that the parameters (C, ε, γ) of the Support Vector Regression model do not differ between the different prediction intervals. Additionally, the number of preceding values used to make the estimates does not vary with the predicted interval. Results show that the support vector regression model has a lower prediction error and is more robust. Our results show that the presented model predicts copper price volatilities close to reality, with an RMSE of 2.2% or less for prediction periods of 5 and 10 days.
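One plausible reading of "preceding values used to make the estimates" is a lagged-feature design, sketched below: each input row holds the last `n_lags` closing prices and the target is the price `horizon` days later. The function name and this exact windowing are assumptions; the abstract does not describe how the regression inputs were built.

```python
def make_lagged_dataset(prices, n_lags, horizon):
    """Build (X, y) for direct multi-step forecasting.

    Each row of X holds n_lags consecutive prices; the matching y is the
    price `horizon` steps after the last price in that row.
    """
    X, y = [], []
    for end in range(n_lags, len(prices) - horizon + 1):
        X.append(prices[end - n_lags:end])
        y.append(prices[end - 1 + horizon])
    return X, y
```

A separate (X, y) pair would be built for each horizon (5, 10, 15, 20, and 30 days) and passed to a regression model, with C, ε, and γ chosen by grid search on time-ordered validation folds.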
ARTICLE | doi:10.20944/preprints202002.0095.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: permafrost; temperature; nonlinear fitting; prediction model
Online: 7 February 2020 (11:31:37 CET)
The pile foundation in the permafrost region is in a negative temperature environment, so the concrete is affected by the negative temperature of the surrounding soil. This not only affects the formation of concrete strength but can also lead to engineering quality accidents in serious cases. Based on the actual measurement of temperature at different strata depths and the comprehensive consideration of surface temperature, terrestrial heat flux, and other parameters, the law curve of temperature change along depth in Greater Khingan is established. The calculated results of the curve are consistent with the measured ground temperatures. The results show that the variation trend of ground temperature along the strata depth at different monitoring sites is basically the same. From June to November, the ground temperature at different depths tends to be constant. From December to May, the ground temperature at any depth within the range of 0 to 5.5 m follows a cosine function. Below 5.5 m, the ground temperature no longer varies with depth. The research results can be used as a reference for pile foundation construction in negative temperature environments.
ARTICLE | doi:10.20944/preprints201909.0238.v1
Subject: Engineering, Control And Systems Engineering Keywords: Software runtime entropy; failure prediction; indicator
Online: 20 September 2019 (10:49:11 CEST)
With the development of computer science and software engineering, software is becoming more and more complex. Traditional software reliability assurance techniques, including software testing and evaluation, cannot ensure reliable execution after deployment. Software failure prediction techniques based on failure indicators can predict software failures from abnormal indicator values, which can be collected using runtime monitoring techniques. An essential part of this method is finding proper indicators that correlate strongly with software failures. In this work we propose a novel type of indicator named software runtime entropy, which takes both the execution time and the call count of software modules into consideration. Three common open-source programs, grep, flex, and gzip, are used as case studies for finding the relationships between the indicators and software failures. First, a series of fault injection experiments is conducted on each of the three programs. The decision tree algorithm is then trained on the resulting data to build correlation models between software runtime entropy and software failures. Several common machine learning measures, such as accuracy, recall, and F-measure, are used to evaluate the models. The decision tree models can be used as failure mechanisms to assist failure prediction: one can monitor the value of runtime entropy and issue a warning when it moves from the normal interval to the abnormal one.
REVIEW | doi:10.20944/preprints202010.0510.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: disease-associated mutation; IDR; intrinsically disordered region; LLPS; phase separation; PTM; Ahr; AhRR; SIM1; SIM2; Hif-2α; NPAS4; ARNT2; BMAL1; disorder prediction; LLPS prediction; cancer; HuVarBase; catGranule prediction
Online: 26 October 2020 (10:30:47 CET)
The bHLH-PAS proteins are a family of transcription factors regulating the expression of a wide range of genes involved in different functions, from differentiation and development control, through oxygen and toxin sensing, to circadian clock setting. In addition to the well-preserved DNA-binding bHLH and PAS domains, bHLH-PAS proteins contain long intrinsically disordered C-terminal regions, responsible for regulating their activity. Our aim was to analyse the potential connection between disordered regions of the bHLH-PAS transcription factors, posttranslational modifications, and liquid-liquid phase separation in the context of disease-associated missense mutations. Highly flexible disordered regions, enriched in short, more ordered motifs, are responsible for a wide spectrum of interactions with transcriptional co-regulators. Based on our in silico analysis, and taking into account the fact that transcription factor functions can be modulated by posttranslational modifications and spontaneous phase separation, we assume that the location of missense mutations inducing disease states is clearly related to sequences directly undergoing these processes or to sequences responsible for regulating their activity.
ARTICLE | doi:10.20944/preprints202312.0156.v1
Subject: Environmental And Earth Sciences, Other Keywords: cmip5; decadal; precipitation; prediction; catchment; multi-model
Online: 4 December 2023 (07:41:48 CET)
The fidelity of the decadal experiment in the Coupled Model Intercomparison Project Phase-5 (CMIP5) has been examined in many previous studies, over different climate variables and for different temporal and spatial scales. However, most of these studies addressed temperature and temperature-based climate indices. Very few examined the precipitation of the decadal experiment, and none paid attention to the catchment level. This study evaluates the performance of eight GCMs (MIROC4h, EC-EARTH, MRI-CGCM3, MPI-ESM-MR, MPI-ESM-LR, MIROC5, CMCC-CM, and CanCM4) for the monthly hindcast precipitation of the decadal experiment over the Brisbane River catchment in Queensland, Australia. First, the GCM datasets were spatially interpolated onto a spatial resolution of 0.05° × 0.05° (5 km × 5 km), matching the grid of the observed data, and then clipped to the catchment. Next, model outputs were evaluated for temporal skill, dry and wet periods, and total precipitation (over time and space) against the observed values. Skill test results reveal that model performance varied over the initialization years, with comparatively higher scores from the initialization year 1990 onward. Models with finer spatial resolutions performed comparatively better than models with coarse spatial resolutions; MIROC4h performed best, followed by EC-EARTH and MRI-CGCM3. Based on their skills, the models are divided into three categories (Category-I: MIROC4h, EC-EARTH, and MRI-CGCM3; Category-II: MPI-ESM-LR and MPI-ESM-MR; and Category-III: MIROC5, CanCM4, and CMCC-CM). Three multimodel ensemble means (MMEMs) are formed using the arithmetic mean of Category-I (MMEM1), Category-I and II (MMEM2), and all eight models (MMEM3). The performance of the MMEMs is assessed using the same skill tests; MMEM2 performed best, which suggests that models should be evaluated before an MMEM is formed.
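The construction of the three multimodel ensemble means can be sketched as a per-time-step arithmetic mean over the member models. The category membership below is taken from the abstract; the function names and the assumption that each model's hindcast has already been regridded and aligned into a plain list of values are illustrative only.

```python
CATEGORY_I = ["MIROC4h", "EC-EARTH", "MRI-CGCM3"]
CATEGORY_II = ["MPI-ESM-LR", "MPI-ESM-MR"]
CATEGORY_III = ["MIROC5", "CanCM4", "CMCC-CM"]

MMEM_MEMBERS = {
    "MMEM1": CATEGORY_I,
    "MMEM2": CATEGORY_I + CATEGORY_II,
    "MMEM3": CATEGORY_I + CATEGORY_II + CATEGORY_III,
}


def ensemble_mean(model_series):
    """Arithmetic mean across models at each time step.

    model_series: dict of model name -> list of precipitation values,
    all aligned on the same time steps (and the same grid cell or area mean).
    """
    series = list(model_series.values())
    n_steps = len(series[0])
    if any(len(s) != n_steps for s in series):
        raise ValueError("all model series must have the same length")
    return [sum(values) / len(series) for values in zip(*series)]


def build_mmem(name, hindcasts):
    """Ensemble mean for MMEM1, MMEM2 or MMEM3 from a dict of all hindcasts."""
    return ensemble_mean({m: hindcasts[m] for m in MMEM_MEMBERS[name]})
```

Each resulting series would then go through the same skill tests as the individual models, which is how the abstract arrives at MMEM2 as the best performer.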
ARTICLE | doi:10.20944/preprints202311.0044.v1
Subject: Engineering, Energy And Fuel Technology Keywords: solar radiation; prediction; cluster algorithm; neural network
Online: 1 November 2023 (09:44:39 CET)
One of the most important sources of energy is the sun. Taiwan is located at 22-25° north latitude. Due to its proximity to the equator, it experiences only a small angle of sunlight incidence, and its geographical location gives it access to sustainable and stable solar resources. This study investigates solar radiation forecasting to maximize the benefits of solar power generation and develops methods that can predict future solar radiation patterns to help reduce the costs of solar power generation. It builds supervised machine learning models, namely a deep neural network (DNN) and a long short-term memory neural network (LSTM). A hybrid supervised and unsupervised model, namely a cluster-based artificial neural network (k-means clustering and fuzzy C-means clustering-based models), was also developed. After establishing these models, the study evaluated their prediction results. For different prediction periods, the study selected the best-performing model based on the results and proposed combining them to establish a real-time updated solar radiation forecast system capable of predicting the next 12 hours. The study area covered Kaohsiung, Hualien, and Penghu in Taiwan. Data from ground stations of the Central Weather Administration, collected between 1993 and 2021, as well as the solar angle parameters of each station, were used as input data for the models. The results show that different models have their advantages and disadvantages at different forecast lead times. Therefore, the hybrid prediction system can predict future solar radiation more accurately than a single model.
ARTICLE | doi:10.20944/preprints202308.0701.v1
Subject: Medicine And Pharmacology, Ophthalmology Keywords: phacotrabeculectomy; glaucoma; cataract surgery; prediction error; refraction
Online: 9 August 2023 (10:41:02 CEST)
Purpose: To compare refractive prediction errors (PEs) between phacotrabeculectomy and phacoemulsification. Methods: Refractive PE was defined as the difference in spherical equivalent between the value predicted using the Barrett Universal II formula and the actual value obtained one month postoperatively. Forty-eight eyes that had undergone phacotrabeculectomy (19 eyes, open-angle glaucoma; 29 eyes, angle-closure glaucoma) were matched with 48 eyes that had undergone phacoemulsification by age, average keratometry value and axial length (AL), and their PEs were compared. The factors associated with PE were analyzed by multivariable regression analyses. Results: The phacotrabeculectomy group showed a larger absolute PE than the phacoemulsification group (0.51 ± 0.37 diopters vs. 0.38 ± 0.22 diopters, P=0.033). Larger absolute PE was associated with longer AL (P=0.010) and higher intraocular pressure (IOP) difference (P=0.012). Hyperopic shift (PE>0) was associated with shallower preoperative anterior chamber depth (ACD) (P=0.024) and larger IOP difference (P=0.031). In the phacotrabeculectomy group, the PE was inversely correlated with AL: long eyes showed myopic shift and short eyes hyperopic shift (P=0.002). Conclusions: Surgeons should be aware of the possibility of worse refractive outcomes when planning phacotrabeculectomy, especially in eyes with high preoperative IOP, shallow ACD, and/or extreme AL.
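The arithmetic behind the refractive prediction error can be written out as a short sketch. The sign convention (actual minus predicted) is an assumption, chosen because the abstract labels PE>0 as a hyperopic shift; the spherical equivalent is the usual sphere plus half the cylinder. The function names are illustrative.

```python
def spherical_equivalent(sphere, cylinder):
    """Spherical equivalent in diopters: sphere plus half the cylinder."""
    return sphere + cylinder / 2.0


def prediction_error(actual_se, predicted_se):
    """Signed refractive PE in diopters (assumed: actual minus predicted).

    Positive values mean the eye ended up more hyperopic than predicted.
    """
    return actual_se - predicted_se


def mean_absolute_pe(actual, predicted):
    """Mean absolute PE over paired lists of actual and predicted SE values."""
    errors = [abs(prediction_error(a, p)) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)
```

For example, an eye predicted at -0.50 D that measures -0.25 D one month after surgery has a PE of +0.25 D under this convention, that is, a small hyperopic shift.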
ARTICLE | doi:10.20944/preprints202305.0797.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: link prediction; graph neural network; graph embedding
Online: 11 May 2023 (05:24:00 CEST)
Link prediction aims to complete the missing links in a network or to predict the formation of new links from the current network structure. It is very important for mining and analyzing network evolution, for example in the construction and analysis of logical architecture in 6G networks. Link prediction algorithms based on node similarity need predefined similarity functions, which rest on strong assumptions and apply only to specific network structures, so they lack generality. To solve this problem, this paper proposes a link prediction algorithm based on the subgraph of the target node pair. In order to learn the graph structure characteristics automatically, the algorithm first extracts the h-hop subgraph of the target node pair and then predicts from the subgraph whether the target node pair will be linked. Experiments on seven real data sets show that the link prediction algorithm based on the target node pair subgraph is suitable for various network structures and superior to other link prediction algorithms.
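The subgraph extraction step can be sketched with a breadth-first search over an adjacency dictionary. This is an illustrative reading of "h-hop subgraph of the target node pair": the union of nodes within h hops of either endpoint, with the induced edges. Removing the direct edge between the two target nodes is an added assumption, common in subgraph-based link prediction so that the answer does not leak into the input; the abstract does not say whether the authors do this.

```python
from collections import deque


def nodes_within(adj, source, h):
    """Nodes reachable from source in at most h hops (breadth-first search)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == h:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


def enclosing_subgraph(adj, u, v, h):
    """Induced subgraph on all nodes within h hops of u or v.

    The direct edge (u, v), if present, is dropped (an assumption, see above).
    The input adjacency dict is not modified.
    """
    keep = nodes_within(adj, u, h) | nodes_within(adj, v, h)
    sub = {n: {m for m in adj.get(n, ()) if m in keep} for n in keep}
    sub[u].discard(v)
    sub[v].discard(u)
    return sub
```

The extracted subgraph, usually together with some labeling of each node's position relative to the target pair, would then be passed to a graph neural network that outputs the link probability.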
COMMUNICATION | doi:10.20944/preprints202305.0600.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Continuous Learning; Multi-agent System; Prediction; Adaptability
Online: 9 May 2023 (08:23:29 CEST)
In this work, we propose a self-supervised multi-agent system that supports online learning of clustering tasks for spatio-temporal video behavior recognition. Visual behavioral actions are encoded as discrete temporal sequences (DTS). The multi-agent system performs real-time clustering recognition with continuous model building, training, and correction. Finally, we implemented a fully decentralized multi-agent system and verified its feasibility in a surveillance video application scenario on vehicle path clustering.
ARTICLE | doi:10.20944/preprints202301.0580.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Electronic monitoring; hate speech; data leakage; prediction.
Online: 31 January 2023 (08:59:39 CET)
Technological innovations and the expansion of Internet access have produced significant changes in the configurations of organizations and, consequently, in the relationships between employees and employers. This new scenario generates the need for greater monitoring in the workplace in order to control inappropriate behavior or situations that may generate misfortunes. Two important problems faced are the dissemination of hate through networks and data leakage that can have social, psychological, and financial impacts. Thus, monitoring tools can be incorporated to assist in surveillance, and thus ensure the achievement of organizational objectives. This paper presents a workplace computer monitoring solution that integrates Spyware techniques, and text sentiment classification, along with a distributed microservices architecture, which aims to collect a range of information and generate alerts to managers regarding hate speech and vulnerabilities. Preliminary tests have been conducted to evaluate the performance of Spyware integrated with prediction models.
ARTICLE | doi:10.20944/preprints202211.0539.v1
Subject: Medicine And Pharmacology, Obstetrics And Gynaecology Keywords: preeclampsia; prediction; machine learning; pregnancy; first trimester
Online: 29 November 2022 (07:09:22 CET)
(1) Background: Preeclampsia (PE) prediction in the first trimester of pregnancy is a challenge for clinicians. The aim of this study was to evaluate and compare the predictive performance of machine learning-based models for the prediction of preeclampsia and its subtypes; (2) Methods: This prospective case-control study evaluated pregnancies in women who attended a tertiary maternity hospital in Romania between November 2019 and September 2022. The patients’ clinical and paraclinical characteristics were evaluated in the first trimester and were included in four machine learning-based models: decision tree (DT), naïve Bayes (NB), support vector machine (SVM), and random forest (RF), and their predictive performance was assessed; (3) Results: Early-onset PE was best predicted by the DT (accuracy: 94.1%) and SVM (accuracy: 91.2%) models, while the NB (accuracy: 98.6%) and RF (accuracy: 92.8%) models had the highest performance when used to predict all types of PE. The predictive performance of these models was modest for moderate and severe types of PE, with accuracies ranging from 70.6% to 82.4%; (4) Conclusions: Machine learning-based models could be useful tools for PE prediction in the first trimester of pregnancy.
ARTICLE | doi:10.20944/preprints202210.0086.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: retweet prediction; multilayer network; natural language processing
Online: 8 October 2022 (03:00:41 CEST)
Retweet prediction is an important task related to different problems such as information spreading analysis, the automatic detection of fake news, social media monitoring, etc. In this study we explore the possibilities of retweet prediction based on heterogeneous data sources. In order to classify the tweet according to the amount of retweets, we combine features extracted from the multilayer network and the text. More specifically, we introduce a multilayer framework that proposes the multilayer network representation of Twitter. This formalism captures different users' actions and complex relationships as well as other key properties of communication on Twitter. We select a set of local network measures from each layer and construct a set of multilayer network features. In addition, we adopt a BERT-based language model, namely Cro-CoV-cseBERT to capture high-level semantics and structure of tweets as a set of text features. Then, we train six machine learning (ML) algorithms: random forest, multilayer perceptron, light gradient boosting machine, category embedding model, neural oblivious decision ensembles and attentive interpretable tabular learning model in the task of retweet prediction. We compare the performance of all six algorithms in three different setups (i) using only text features, (ii) using only multilayer network features and (iii) using both sets of features. We evaluate all setups in terms of standard evaluation measures i.e. precision, recall, F1-score and accuracy. For this task, we first prepare and use an empirical dataset of 199,431 tweets in the Croatian language posted during the period between January 1, 2020 and May 31, 2021. Our results indicate that by integrating multilayer network features with text features the prediction model would perform better than using just one set of features.
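The evaluation measures named above (precision, recall, F1-score, and accuracy) can be computed from prediction counts as in the sketch below. It treats the task as binary with one designated positive class, which is an assumption; the abstract does not say how many retweet classes were used or how the scores were averaged.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Running the same function on the predictions from the three setups (text features only, network features only, both) gives directly comparable numbers for each of the six algorithms.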
ARTICLE | doi:10.20944/preprints202209.0277.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: link prediction; AUC-ROC; Early retrieval evaluation
Online: 19 September 2022 (10:31:53 CEST)
Link prediction is an unbalanced early retrieval problem, whose goal is to prioritize a small cohort of positive links on top of a list largely populated by unlabelled links. Differently from binary classification, here the evaluation focuses on how the predictor prioritizes the positive class because, in practice, a negative class does not exist. Previous studies explained that AUC-ROC is not apt for unbalanced class problems and is misleading for early retrieval problems, therefore standard AUC-ROC is not appropriate for evaluation of link prediction. However, some scholars argue that an AUC-ROC like evaluation accounting for the relative positioning of the few positive links among the vastness of unlabelled links remains a valid concept to pursue. Here we propose the area under the magnified ROC (AUC-mROC), a new measure that adjusts the standard AUC-ROC to work also for unbalanced early retrieval problems such as link prediction.
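The concern about standard AUC-ROC in early retrieval can be illustrated with a small sketch. It does not implement the proposed AUC-mROC (whose definition is not given in this abstract); it only computes the standard AUC-ROC by pair counting and compares it with precision among the top-k candidates. The toy ranking, the function names and the choice of k are assumptions made for illustration.

```python
def auc_roc(scores, labels):
    """Standard AUC-ROC: fraction of (positive, non-positive) pairs ranked
    correctly, with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def precision_at_k(scores, labels, k):
    """Share of positives among the k highest-scored candidates."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Hypothetical toy ranking: 1000 candidate links, 10 positives at ranks 51..60.
n_links = 1000
scores = [n_links - i for i in range(n_links)]      # index 0 has the top score
labels = [1 if 50 <= i < 60 else 0 for i in range(n_links)]

print(auc_roc(scores, labels))             # high (about 0.95)
print(precision_at_k(scores, labels, 10))  # 0.0: no positive in the top 10
```

In this toy case the standard AUC-ROC looks strong even though none of the positives reaches the top of the list, which is the kind of mismatch an early-retrieval measure is meant to expose.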
ARTICLE | doi:10.20944/preprints202207.0226.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: yield prediction; APSIM; optimization; Bayesian; hierarchical; emulation
Online: 15 July 2022 (05:44:05 CEST)
The enormous increase in the volume of Earth Observations (EOs) has provided the scientific community with unprecedented temporal, spatial, and spectral information. However, this increase in the volume of EOs has not yet resulted in proportional progress in our ability to forecast agricultural systems. This study examines the applicability of EOs obtained from Sentinel2 and Landsat8 for constraining the APSIM-Maize model parameters. We leveraged leaf area index (LAI) retrieved from Sentinel2 and Landsat8 NDVI to constrain a series of APSIM-Maize model parameters in three different Bayesian multi-criteria optimization frameworks across 13 different sites across the U.S. Midwest. A time-variant sensitivity analysis was performed to identify the most influential parameters driving the LAI estimates in the APSIM-Maize model. Then surrogate models were developed using random samples taken from the parameter space using Latin hypercube sampling to emulate APSIM’s behavior in simulating NDVI and LAI at all sites. Site-level, global and hierarchical Bayesian optimization models were then developed using the site-level emulators to simultaneously constrain all parameters and estimate the site-to-site variability in crop parameters. For within-sample predictions, site-level optimization showed the largest predictive uncertainty around LAI and crop yield, whereas the global optimization showed the most constrained predictions for these variables. The lowest RMSE for within-sample yield prediction was found for the hierarchical optimization scheme (1423 Kg ha−1), while the largest RMSE was found for site-level optimization (1494 Kg ha−1). In out-of-sample predictions within the spatio-temporal extent of the training sites, global optimization showed lower RMSE (1627 Kg ha−1) compared to the hierarchical approach (1822 Kg ha−1) across 90 independent sites in the U.S. Midwest. 
On comparison between these two optimization schemes across another 242 independent sites outside the spatio-temporal extent of the training sites, global optimization also showed substantially lower RMSE (1554 Kg ha−1) as compared to the hierarchical approach (2532 Kg ha−1). Overall, EOs demonstrated their real use case for constraining process-based crop models and showed comparable results to model calibration exercises using only field measurements.
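The Latin hypercube sampling step used to build the surrogate models can be sketched as follows. This is a minimal, generic version, not the authors' implementation: the function name, the seed and the two parameter ranges in the usage lines are hypothetical and do not correspond to actual APSIM-Maize parameters.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum [i/n, (i+1)/n) ...
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ... then shuffle so strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append([low + p * (high - low) for p in points])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical ranges for two crop-model parameters.
for sample in latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0)], seed=1):
    print(sample)
```

Compared with plain random sampling, this stratification spreads a small number of model runs more evenly over each parameter range, which is useful when every run of the process-based model is expensive.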
ARTICLE | doi:10.20944/preprints202207.0035.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: QoE; Fairness; SDN; Classification Prediction; DASH; Multimedia
Online: 4 July 2022 (06:08:03 CEST)
Quality of Experience (QoE) metrics can be used to assess user perception and satisfaction in data service applications delivered over the Internet. End-to-end metrics are formed because QoE depends on both the users’ perception and the service used. Traditionally, network optimization has focused on improving network properties such as Quality of Service (QoS). In this paper we examine adaptive streaming over a software-defined network environment. We aim to evaluate and study the media streams, the aspects affecting the streams, and the network, in order to analyse the network’s features and their direct relationship with the perceived QoE. We then use machine learning to build a prediction model based on subjective user experiments. This will help to eliminate future physical experiments and automate the process of predicting QoE.
ARTICLE | doi:10.20944/preprints202202.0175.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: antimicrobial peptide prediction; sequence analysis; random forest
Online: 14 February 2022 (11:57:01 CET)
Antimicrobial peptides (AMPs) are considered promising alternatives to conventional antibiotics for overcoming the growing problem of antibiotic resistance. Computational prediction approaches are receiving increasing interest as a way to identify and design the best candidate AMPs prior to in vitro tests. In this study, we focused on linear cationic peptides with non-hemolytic activity, which were downloaded from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). Referring to the MIC (minimum inhibitory concentration) values, we assigned a positive label to a peptide if it shows antimicrobial activity; otherwise the peptide was labeled as negative. We considered the peptides showing antimicrobial activity against Gram-negative and against Gram-positive bacteria separately, and created two datasets accordingly. Ten different physico-chemical properties of the peptides were calculated and used as features. Following data exploration and data preprocessing steps, a variety of classification algorithms were used with 100-fold Monte Carlo cross-validation to build models and to predict the antimicrobial activity of the peptides. Among the generated models, Random Forest gave the best performance metrics for both the Gram-negative dataset (Accuracy: 0.98, Recall: 0.99, Specificity: 0.97, Precision: 0.97, AUC: 0.99, F1: 0.98) and the Gram-positive dataset (Accuracy: 0.95, Recall: 0.95, Specificity: 0.95, Precision: 0.90, AUC: 0.97, F1: 0.92) after outlier elimination was applied. This prediction approach might be useful to evaluate the antibacterial potential of a candidate peptide sequence before moving to experimental studies.
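The evaluation protocol named above can be sketched in a few lines: Monte Carlo cross-validation repeatedly draws a fresh random train/test split, and the reported metrics follow from the confusion matrix of each split. The 20% test fraction, the seed and the function names are assumptions; the abstract states only the number of repetitions (100), and the classifier itself is left out.

```python
import random


def monte_carlo_splits(n_items, n_repeats=100, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.
    Unlike k-fold, the test sets of different repeats may overlap."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n_items * test_fraction)))
    for _ in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def binary_metrics(y_true, y_pred):
    """Accuracy, recall, specificity, precision and F1 for 0/1 labels.
    Assumes each denominator is non-zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }
```

In use, a model would be fitted on each train split, scored on the matching test split with `binary_metrics`, and the per-split values averaged over the repetitions.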
ARTICLE | doi:10.20944/preprints202202.0145.v1
Subject: Computer Science And Mathematics, Analysis Keywords: Smart grids; Optimization; Prediction methods; Energy exchange
Online: 10 February 2022 (07:55:02 CET)
The concept of distributed generation has made photovoltaics an integral source of energy in smart grid systems, especially in peer-to-peer (P2P) energy trading frameworks that exploit excess power to fulfill the energy requirements of consumers in a cost-efficient and eco-friendly manner. It is believed that P2P energy trading will dominate a significant portion of research in forthcoming power generation systems due to the sharp rise in energy demand across the globe. Despite a plethora of studies on energy optimization solutions in P2P trading, minimizing nanogrid energy trading cost and sharing energy efficiently between consumers and prosumers are deemed among the most challenging problems. This study addresses essential issues overlooked by contemporary P2P energy trading models by introducing a predictive optimization-oriented nanogrid energy trading model. The proposed study encompasses two stages: (1) a predictive optimization model, which harnesses BD-LSTM-based forecasts of energy parameters (energy load, energy consumption, and PV generation) that are later incorporated in a PSO-enabled objective function to reduce nanogrid trading cost, and (2) an optimal energy sharing plan, devised to decide the role of nanogrids as prosumers or consumers by emphasizing the use of PV-produced energy. The proposed model is validated on a case study containing nanogrid house data. The simulation provides detailed experiments comparing the energy demand and response using the proposed energy sharing model. The outcomes show that the energy sharing plan holds significant potential to fulfill the maximum energy requirements of nanogrid houses in a P2P cluster and significantly reduces the energy cost compared to the grid.
ARTICLE | doi:10.20944/preprints202111.0030.v1
Subject: Physical Sciences, Applied Physics Keywords: reservoir computing; time series prediction; performance optimisation
Online: 2 November 2021 (10:09:46 CET)
Reservoir computing is a machine learning method that uses the response of a dynamical system to a certain input in order to solve a task. As the training scheme only involves optimising the weights of the responses of the dynamical system, this method is particularly suited for hardware implementation. Furthermore, the inherent memory of dynamical systems which are suitable for use as reservoirs means that this method has the potential to perform well on time series prediction tasks, as well as other tasks with time dependence. However, reservoir computing still requires extensive task-dependent parameter optimisation in order to achieve good performance. We demonstrate that by including a time-delayed version of the input for various time series prediction tasks, good performance can be achieved with an unoptimised reservoir. Furthermore, we show that by including the appropriate time-delayed input, one unaltered reservoir can perform well on six different time series prediction tasks at a very low computational expense. Our approach is of particular relevance to hardware-implemented reservoirs, as one does not necessarily have access to pertinent optimisation parameters in physical systems but the inclusion of an additional input is generally possible.
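The idea of feeding a delayed copy of the input to a fixed reservoir can be sketched as below. This is an illustrative software stand-in, not the system studied in the paper: the leaky tanh node model, the reservoir size, the weight ranges and the delay of five steps are all assumptions, and the linear readout that would be trained on the collected states is omitted.

```python
import math
import random


def delayed_inputs(u, delay):
    """Pair each sample u[t] with its delayed copy u[t - delay].
    The first `delay` time steps are dropped."""
    return [(u[t], u[t - delay]) for t in range(delay, len(u))]


def run_reservoir(inputs, n_nodes=20, leak=0.5, seed=0):
    """Drive a fixed random reservoir and return its state at each step.
    The reservoir weights are never trained or tuned."""
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
            for _ in range(n_nodes)]
    # Each row's absolute sum is at most 0.9, a crude way to keep the
    # recurrent dynamics contracting.
    w_res = [[rng.uniform(-0.9, 0.9) / n_nodes for _ in range(n_nodes)]
             for _ in range(n_nodes)]
    state = [0.0] * n_nodes
    states = []
    for u in inputs:
        pre = [sum(w_in[j][k] * u[k] for k in range(n_in))
               + sum(w_res[j][m] * state[m] for m in range(n_nodes))
               for j in range(n_nodes)]
        state = [(1.0 - leak) * state[j] + leak * math.tanh(pre[j])
                 for j in range(n_nodes)]
        states.append(state)
    return states


signal = [math.sin(0.3 * t) for t in range(50)]
states = run_reservoir(delayed_inputs(signal, delay=5))
print(len(states), len(states[0]))
```

Only the delay is task-dependent here; the reservoir itself stays untouched, which matches the motivation given for hardware implementations where internal parameters may not be accessible.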
ARTICLE | doi:10.20944/preprints202103.0337.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Graph embedding; Link prediction; Mutual information; Subgraph
Online: 12 March 2021 (08:47:29 CET)
The prediction of drug-target interactions is a key task in the field of drug redirection. However, traditional methods of predicting drug-target interactions are either mediocre or rely heavily on data stacking. In this work, we merged heterogeneous graph information and obtained effective node information and substructure information based on mutual information in graph embeddings. We then learned high-quality representations for downstream tasks, and proposed an end-to-end auto-encoder model to complete the task of link prediction. Experimental results show that our method outperforms several state-of-the-art models. The model achieves an area under the receiver operating characteristic (AUROC) curve of 0.959 and an area under the precision-recall curve (AUPR) of 0.848. We found that the mutual information between the substructure and graph-level representations contributes most to the mutual information index in a relatively sparse network, whereas the mutual information between the node-level and graph-level representations contributes most in a relatively dense network.
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: intramuscular fat; prediction; image analysis; Bísaro pork
Online: 13 January 2021 (13:16:19 CET)
This work presents an analytical methodology to predict meat juiciness (discriminant semi-quantitative analysis using groups of intervals of intramuscular fat) and intramuscular fat (regression analysis) in the Longissimus thoracis et lumborum (LTL) muscle of Bísaro pigs, using as independent variables the animal carcass weight and parameters from color and image analysis. These are non-invasive and non-destructive techniques which allow the development of rapid, easy and inexpensive methodologies to evaluate pork meat quality in a slaughterhouse. The proposed predictive supervised multivariate models were non-linear. Discriminant mixture analysis was used to evaluate meat juiciness by classifying samples into three groups: 0.6 to 1.1%; 1.25 to 1.5%; and greater than 1.5%. The obtained model allowed 100% correct classifications (92% in seven-fold cross-validation with five repetitions). Polynomial support vector machine regression to determine the intramuscular fat presented R2 and RMSE values of 0.88 and 0.12, respectively, in seven-fold cross-validation with five repetitions. This quantitative model (polynomial kernel optimized to a degree of three with a scale factor of 0.1 and a cost value of one) presented R2 and RSE values of 0.999 and 0.04, respectively. The overall predictive results demonstrated the relevance of photographic image and color measurements of the muscle for evaluating the intramuscular fat, rather than the usual time-consuming and expensive chemical analysis.
ARTICLE | doi:10.20944/preprints202009.0521.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: electroencephalographic; feature selection; machine learning; prediction model
Online: 22 September 2020 (11:27:03 CEST)
In recent years, research has focused on generating mechanisms to assess the levels of subjects' cognitive workload when performing various activities that demand high concentration levels, such as driving a vehicle. These mechanisms have implemented several tools to analyze cognitive workload, of which electroencephalographic (EEG) signals are the most used due to their high precision. However, one of the main challenges in using EEG signals is finding the appropriate information to identify cognitive states. Here we show a new feature selection model for pattern recognition using information from EEG signals, based on machine learning techniques, called GALoRIS. GALoRIS combines Genetic Algorithms and Logistic Regression to create a new fitness function that identifies and selects the critical EEG features that contribute to recognizing high and low cognitive workload, and structures a new dataset capable of optimizing the model's predictive process. We found that GALoRIS identifies data related to high and low cognitive workload of subjects while driving a vehicle using information extracted from multiple EEG signals, reducing the original dataset by more than 50% and maximizing the model's predictive capacity, achieving a precision rate greater than 90%.
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Crime prediction; Ensemble Learning; Machine Learning; Regression
Online: 14 September 2020 (00:53:30 CEST)
While the use of crime data has been widely advocated in the literature, its availability is often limited to large urban cities and isolated databases tend not to allow for spatial comparisons. This paper presents an efficient machine learning framework capable of predicting spatial crime occurrences, without using past crime as a predictor, and at a relatively high resolution: the U.S. Census Block Group level. The proposed framework is based on an in-depth multidisciplinary literature review allowing the selection of 188 best-fit crime predictors from socio-economic, demographic, spatial, and environmental data. Such data are published periodically for the entire United States. The selection of the appropriate predictive model was made through a comparative study of different machine learning families of algorithms, including generalized linear models, deep learning, and ensemble learning. The gradient boosting model was found to yield the most accurate predictions for violent crimes, property crimes, motor vehicle thefts, vandalism, and the total count of crimes. Extensive experiments on real-world datasets of crimes reported in 11 U.S. cities demonstrated that the proposed framework achieves an accuracy of 73 and 77% when predicting property crimes and violent crimes, respectively.
ARTICLE | doi:10.20944/preprints202005.0176.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: COVID-19; epidemic diseases; compartmental model; prediction
Online: 10 May 2020 (17:10:15 CEST)
In India, the first case of coronavirus disease 2019 (COVID-19) was reported on 30 January 2020, and cases increased daily after the last week of February 2020. COVID-19 has been identified as belonging to the Coronaviridae family, the same family as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). COVID-19 attacks the respiratory system, presenting with fever, cough and shortness of breath; severe cases may lead to pneumonia, SARS or sometimes death. The aim of this study is to develop a model which predicts the epidemic peak for COVID-19 in India using real-time data from 30 January to 10 May 2020. Because incomplete and inaccurate data introduce uncertainties when identifying population information, we adopt the most popular compartmental model for epidemic prediction, i.e. the Susceptible, Exposed, Infectious and Recovered (SEIR) model. Based on the solution of the state estimation problem for a polynomial system with Poisson noise, we estimate that the epidemic peak may be reached in early to mid July 2020, initializing recovered R0 to 0 and infected I0 to 1. The outcomes of the model will help epidemiologists to isolate the source of the disease geospatially and analyze deaths. Government authorities will also be able to target their interventions to rapidly check the spread of the epidemic.
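For readers unfamiliar with the SEIR compartmental model, a minimal deterministic sketch follows. It integrates the textbook SEIR equations with a simple Euler step and reads off the day on which the infectious compartment peaks. It does not reproduce the state estimation with Poisson noise described in the abstract, and the population size and the rates `beta`, `sigma` and `gamma` are hypothetical values chosen only so that the example runs; the initial conditions (one infected, zero recovered) follow the abstract.

```python
def simulate_seir(population, beta, sigma, gamma, days,
                  infected0=1.0, exposed0=0.0, recovered0=0.0, dt=0.1):
    """Euler integration of the standard SEIR equations:
    dS/dt = -beta*S*I/N, dE/dt = beta*S*I/N - sigma*E,
    dI/dt = sigma*E - gamma*I, dR/dt = gamma*I.
    Returns the number of infectious individuals at the end of each day."""
    s = population - infected0 - exposed0 - recovered0
    e, i, r = exposed0, infected0, recovered0
    infectious_by_day = [i]
    steps_per_day = int(round(1.0 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        infectious_by_day.append(i)
    return infectious_by_day


def peak_day(series):
    """Index (in days) of the largest value in the series."""
    return max(range(len(series)), key=series.__getitem__)


# Hypothetical parameters, not fitted to any data.
curve = simulate_seir(population=1_000_000, beta=0.5, sigma=0.2,
                      gamma=0.1, days=300)
print(peak_day(curve))
```

In a fitted model the rates would be estimated from reported case counts, and the predicted peak date would shift accordingly.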
ARTICLE | doi:10.20944/preprints202004.0539.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: Covid-19; dengue cases; Kupang city prediction
Online: 30 April 2020 (16:50:03 CEST)
With the pandemic of coronavirus disease (Covid-19), other infectious diseases such as dengue have been neglected in Indonesia. Although the majority of resources, both human and capital, are focused on Covid-19, it is still essential to manage dengue as it remains a threat to the community. This paper aims to predict the number of dengue cases in Kupang, East Nusa Tenggara, which can help the government to plan dengue program activities. The forecast shows that dengue will remain high for the whole year. With the stay-at-home approach to preventing Covid-19, the chances of contracting dengue virus increase. Maintaining a clean environment, reducing breeding sites, and taking other protective measures against dengue transmission are very important.
ARTICLE | doi:10.20944/preprints202004.0421.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: COVID-19; trend prediction; optimized neural network
Online: 24 April 2020 (02:57:32 CEST)
The recent worldwide outbreak of the novel coronavirus (COVID-19) has opened up new challenges to the research community. Artificial intelligence (AI) driven methods can be useful to predict the parameters, risks, and effects of such an epidemic. Such predictions can be helpful to control and prevent the spread of such diseases. The main challenges of applying AI are the small volume of data and its uncertain nature. Here, we propose a shallow long short-term memory (LSTM) based neural network to predict the risk category of a country. We have used a Bayesian optimization framework to optimize and automatically design country-specific networks. We have combined the trend data and weather data for the prediction. The results show that the proposed pipeline outperforms state-of-the-art methods on data from 170 countries and can be a useful tool for such risk categorization. The tool can be used to predict a long-duration outbreak of such an epidemic so that preventive steps can be taken earlier.
Subject: Chemistry And Materials Science, Theoretical Chemistry Keywords: structure prediction; Rosetta; computational modeling; protein design
Online: 16 October 2019 (05:40:52 CEST)
The Rosetta software suite for macromolecular modeling, docking, and design is widely used in pharmaceutical, industrial, academic, non-profit, and government laboratories. Considering its broad modeling capabilities, Rosetta consistently ranks highly when compared to other leading methods created for highly specialized protein modeling and design tasks. Developed for over two decades by a global community of scientists at more than 60 institutions, Rosetta has undergone multiple refactorings, and now comprises over three million lines of code. Here we discuss the methods developed in the last five years, involving the latest protocols for structure prediction, protein–protein and protein–small molecule docking, protein structure and interface design, loop modeling, the incorporation of various types of experimental data, and modeling of peptides, antibodies and other proteins in the immune system, nucleic acids, non-standard amino acids, carbohydrates, and membrane proteins. We briefly discuss improvements to the energy function, user interfaces, and usability of the software. Rosetta is available at www.rosettacommons.org.
ARTICLE | doi:10.20944/preprints201805.0015.v1
Subject: Business, Economics And Management, Finance Keywords: deep neural nets; market efficiency; market prediction
Online: 2 May 2018 (08:12:01 CEST)
We examine the use of deep learning (neural networks) to predict the movement of the S&P 500 Index using past returns of all the stocks in the index. Our analysis finds that the future direction of the S&P 500 index can be weakly predicted by the prior movements of the underlying stocks in the index. Decomposition of the prediction error indicates that most of the lack of predictability comes from randomness and only a little from nonstationarity. We believe this is the first test of S&P500 market efficiency that uses a very large information set, and it extends the domain of weak-form market efficiency tests.
ARTICLE | doi:10.20944/preprints201701.0063.v1
Subject: Chemistry And Materials Science, Materials Science And Technology Keywords: HfB4; structure prediction; superhard material; anisotropic properties
Online: 12 January 2017 (10:57:03 CET)
By using the particle swarm optimization algorithm for crystal structure prediction, we reveal a new orthorhombic Cmcm structure of HfB4, which is energetically superior to the previously proposed YB4-, ReP4-, FeB4-, CrB4-, and MnB4-type structures in the considered pressure range. The phonon dispersion and elastic constant calculations confirm that the new phase is dynamically and mechanically stable. The calculated large shear modulus (240 GPa) and high hardness (45.7 GPa) imply that the predicted Cmcm-HfB4 is a potential superhard material. Meanwhile, the directional dependences of the Young's modulus, bulk modulus, and shear modulus for HfB4 are systematically investigated. Further analyses of the density of states and electronic localization function indicate that the strong B-B and B-Hf covalent bonds greatly contribute to its high hardness and stability.
Subject: Computer Science And Mathematics, Mathematics Keywords: harmony search; meta-heuristic; parameter optimization; software defect prediction; just-in-time prediction; software quality assurance; maintenance; maritime transportation
Online: 31 December 2020 (09:27:46 CET)
Software plays the most important role in recent vehicle innovation, and consequently the amount of software has grown rapidly in recent decades. The safety-critical nature of ships, one sort of vehicle, makes Software Quality Assurance (SQA) a fundamental prerequisite. Just-In-Time Software Defect Prediction (JIT-SDP) aims to conduct software defect prediction (SDP) on commit-level code changes to achieve effective SQA resource allocation. The first case study of SDP in the maritime domain reported feasible prediction performance. However, we consider that the prediction model still has room for improvement, since the parameters of the model have not yet been optimized. Harmony Search (HS) is a widely used music-inspired meta-heuristic optimization algorithm. In this article, we demonstrate that JIT-SDP can achieve better prediction performance by applying HS-based parameter optimization with a balanced fitness value. Using two real-world datasets from a maritime software project, we obtained an optimized model that meets the performance criterion and exceeds the baseline of the previous case study across various defect to non-defect class imbalance ratios of the datasets. Experiments with open source software also showed better recall for all datasets, even though we used balance as the performance index. HS-based parameter-optimized JIT-SDP can be applied to maritime domain software with a high class imbalance ratio. Finally, we expect that our research can be extended to improve the performance of JIT-SDP not only in maritime domain software but also in open source software.
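A basic Harmony Search loop can be sketched as follows. This is the generic textbook form of the algorithm, not the authors' configuration: the memory size, the harmony memory considering rate (`hmcr`), the pitch adjusting rate (`par`), the bandwidth and the toy objective are all assumptions. In the JIT-SDP setting the objective would instead score a defect prediction model trained with the candidate parameters.

```python
import random


def harmony_search(objective, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimise `objective` over box `bounds` with basic Harmony Search."""
    rng = random.Random(seed)
    # Harmony memory: a pool of candidate solutions and their costs.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    costs = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a stored harmony,
                # optionally nudged by a small pitch adjustment.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random consideration: sample the whole range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        cost = objective(new)
        worst = max(range(memory_size), key=costs.__getitem__)
        if cost < costs[worst]:
            memory[worst], costs[worst] = new, cost
    best = min(range(memory_size), key=costs.__getitem__)
    return memory[best], costs[best]


def toy_objective(x):
    """Hypothetical stand-in objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, cost = harmony_search(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, cost)
```

Because the worst stored harmony is only ever replaced by a better one, the best cost in memory never gets worse as the iterations proceed.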
ARTICLE | doi:10.20944/preprints202311.1493.v1
Subject: Medicine And Pharmacology, Obstetrics And Gynaecology Keywords: machine learning; preeclampsia; intrauterine growth restriction; prediction; screening
Online: 23 November 2023 (08:45:26 CET)
(1) Background: The screening of preeclampsia (PE) and intrauterine growth restriction (IUGR) represents a constant challenge for obstetricians. The aim of this study was to determine and compare the predictive performance of four machine learning-based algorithms for the prediction of PE, IUGR, and their association in a cohort of singleton pregnancies; (2) Methods: This prospective study was conducted at a tertiary maternity hospital in Romania, and included 210 pregnancies that underwent first trimester screening. We included clinical and paraclinical data in four machine learning-based algorithms, namely decision tree (DT), naïve Bayes (NB), support vector machine (SVM), and random forest (RF), and calculated their predictive performance; (3) Results: RF performed the best when used to predict PE, IUGR, and its subtypes, as well as the association between PE and IUGR. The overall predictive performance of DT for all these disorders was inferior to RF, NB, and SVM. Both SVM and NB had similar accuracy for the prediction of PE, while NB performed better than SVM for the prediction of IUGR; (4) Conclusions: Machine-learning-based algorithms could be useful for the prediction of ischemic placental disease and need to be validated on large cohorts of patients.
ARTICLE | doi:10.20944/preprints202311.1375.v1
Subject: Chemistry And Materials Science, Food Chemistry Keywords: HS-MS e-NOSE; packaging; prediction; shelf life
Online: 22 November 2023 (06:35:46 CET)
A rapid and efficient technique using an electronic nose based on a mass detector combined with headspace sampling (HS-SPME-MS e-nose) and chemometric tools was applied to classify beer samples as fresh or aged and as packaged in aluminium cans or glass bottles, and to predict the shelf life of beer. The mass spectra obtained from the HS-SPME-MS e-nose contain information on the volatile compounds, recorded as the abundance of each ion at different mass-to-charge (m/z) ratios. The analysis was performed on 53 samples aged naturally for eleven months in the absence of light and at a controlled temperature of around 14 °C ± 0.5 °C. Principal Component Analysis (PCA) was performed on the data, showing a grouping of samples into fresh and aged. Partial Least Squares Discriminant Analysis (PLS-DA) allowed discriminating fresh from aged beers but was not able to discriminate samples packaged in aluminium cans from those in glass bottles. Finally, Partial Least Squares Regression (PLSR) was applied to build a prediction model and proved effective in predicting beer shelf life.
ARTICLE | doi:10.20944/preprints202311.0190.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: machine learning; crossfit; sport analytics; weightlifting; performance prediction
Online: 2 November 2023 (17:27:56 CET)
(1) Background: The analysis of athletic performance has always aroused great interest among sport scientists. This study used machine learning methods to build predictive models from a comprehensive CrossFit (CF) dataset, aiming to reveal valuable insights into the factors influencing performance and emerging trends; (2) Methods: The study used Random Forest (RF) and Multiple Linear Regression (MLR) models to predict performance in four key weightlifting exercises within CF: clean & jerk, snatch, back squat, and deadlift. Performance was evaluated using R-squared (R2) values and Mean Squared Error (MSE). Feature importance analysis was conducted using RF, XGBoost, and AdaBoost models; (3) Results: The RF model excelled in deadlift performance prediction (R2 = 0.80), while the MLR model demonstrated remarkable accuracy in clean & jerk (R2 = 0.93). Across exercises, clean & jerk consistently emerged as a crucial predictor. The feature importance analysis revealed intricate relationships among exercises, with gender significantly impacting deadlift performance; (4) Conclusions: This research advances our understanding of performance prediction in CF through machine learning techniques. It provides actionable insights that practitioners can use to optimize performance, and demonstrates the potential for future advancements in data-driven sports analytics.
ARTICLE | doi:10.20944/preprints202309.1987.v1
Subject: Engineering, Chemical Engineering Keywords: asphaltene precipitation; asphaltene prediction; asphaltene machine learning model
Online: 28 September 2023 (10:24:29 CEST)
The precipitation, flocculation, and deposition of asphaltene cause severe formation damage within a reservoir and shorten a well’s productive life. Pressure depletion is one factor that contributes to asphaltene precipitation during production; therefore, the first step in managing asphaltene is to determine the onset pressure of the precipitation. While there are numerous equation of state models that can be used to predict the onset pressure, these models are complex and heavily reliant on tuning parameters. Using multivariate linear regression, this work attempts to develop a simple and accurate thermodynamic model for predicting the upper precipitation onset pressure under pressure depletion above the bubble point pressure (Pb) at various temperatures. A total of 94 experimental data points from 37 published crude oil data sets were compiled from the literature. To develop the model, 59 experimental data points were used as training data and 35 experimental data points as testing data. According to the results of the multicollinearity test, the bubble point pressure, temperature, resins, and saturate-to-aromatic ratio were chosen as predictors. The upper onset pressure data with comparable trends were clustered, and unsupervised recognition of three distinct cluster groups was performed. For each cluster identified, a multivariate linear regression model was developed. The model was chosen based on Mallow’s coefficient of determination (Cp), adjusted R2 (statistical measure of fit), and S (standard error of the regression slope). The developed model was tested using a data set, and the results showed an adjusted R2 of 96.25%, with a mean absolute error of 4.1%. The model was randomly applied to 15 data points to compare it to perturbed-chain statistical associated fluid theory (PC SAFT) and the Peng-Robinson equation of state models and to the multivariate regression models of Fahim (2007) and Ameli et al. (2016). 
The results showed that the mean absolute error for predicting the asphaltene precipitation onset pressure was 2.82% using Peng-Robinson, 2.36% using the PC SAFT equation of state, 23.96% using the Fahim model, 24.80% using the model reported by Ameli et al., and 2.39% using the newly developed multivariate regression model. The developed multivariate model appears to be as accurate as the PC SAFT equation of state modeling with tuning parameters. The primary advantage of multivariate regression is that, unlike the PC SAFT equation of state model, it does not require saturates, aromatics, resins, and asphaltenes (SARA)-based characterization methodologies or rigorous parameter tuning. It is simple to use, quick, and it produces results in a short period of time.
ARTICLE | doi:10.20944/preprints202309.1747.v1
Subject: Environmental And Earth Sciences, Space And Planetary Science Keywords: machine learning; Dst index; LSTM; EMD-LSTM; prediction
Online: 26 September 2023 (11:29:55 CEST)
The Dst index is the geomagnetic storm index used to measure the energy level of geomagnetic storms, and its prediction is of great significance for the study of geomagnetic storms and solar activity. In contrast to traditional numerical modelling techniques, machine learning, which emerged decades ago on the back of rapidly developing computer hardware, software, and artificial intelligence methods, has seen unprecedented development in geophysics, especially in solar-terrestrial space physics. This study uses two machine learning models, the Long Short-Term Memory (LSTM) model and the Empirical Mode Decomposition LSTM (EMD-LSTM) model, to model and predict the Dst index. Both models were built on a Dst index data series covering 2018-2023 and used to fit and predict the data. First, we evaluated the influence of the learning rate and the amount of training data on the prediction accuracy of the LSTM model and found 10^-3 to be the optimal learning rate. Second, the two models were used to predict the Dst index in solar active and quiet periods. For the LSTM model, the root mean square error (RMSE) is 7.34 nT and the correlation coefficient (CC) is 0.96 in the active period, and 2.64 nT and 0.97 in the quiet period; for the EMD-LSTM model, the RMSE and CC are 8.87 nT and 0.93 in the active period and 3.29 nT and 0.95 in the quiet period. Overall, the short-term prediction accuracy of the LSTM model is slightly better than that of the EMD-LSTM model. However, the LSTM model exhibits a prediction lag, which the EMD-LSTM model can resolve, allowing it to better predict geomagnetic storms.
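The RMSE and correlation coefficient used to score the two models are standard quantities. A minimal sketch of both, assuming the correlation coefficient is the Pearson coefficient (the abstract does not say which one is meant) and using hypothetical function names:

```python
import math


def rmse(observed, predicted) -> float:
    """Root mean square error, in the same units as the inputs (e.g. nT)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def pearson_cc(observed, predicted) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)
```

A forecast that simply lags the observations by one step can still score a high correlation, which is one reason the lag issue mentioned in the abstract is worth checking separately from these two metrics.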
ARTICLE | doi:10.20944/preprints202308.1529.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Distributed representation; Knowledge graph; Link prediction; Logical rule
Online: 22 August 2023 (07:32:17 CEST)
Knowledge graphs (KGs) play a crucial role in many applications, such as question answering, but incompleteness is an urgent issue for their broad application. Much research in knowledge graph completion (KGC) has been performed to resolve this issue. The methods of KGC can be classified into two major categories: rule-based reasoning and embedding-based reasoning. The former has high accuracy and good interpretability, but a major challenge is to obtain effective rules on large-scale KGs. The latter has good efficiency and scalability, but it relies heavily on data richness and cannot fully use domain knowledge in the form of logical rules. We propose a novel method that injects rules and learns representations iteratively to take full advantage of rules and embeddings. Specifically, we model the conclusions of rule groundings as 0-1 variables and use a rule confidence regularizer to remove the uncertainty of the conclusions. The proposed approach has the following advantages: 1) It combines the benefits of both rules and knowledge graph embeddings (KGEs) and achieves a good balance between efficiency and scalability. 2) It uses an iterative method to continuously improve KGEs and remove incorrect rule conclusions. Evaluations on two public datasets show that our method outperforms the current state-of-the-art methods, improving performance by 2.7% and 4.3% in mean reciprocal rank (MRR).
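Mean reciprocal rank, the metric quoted for the link-prediction results, averages the reciprocal of the rank at which the correct entity appears for each test query. A minimal sketch, assuming 1-based ranks have already been produced by some scoring model (the function name is hypothetical):

```python
def mean_reciprocal_rank(ranks) -> float:
    """Average of 1/rank over all test queries; ranks are 1-based."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)
```

Because of the reciprocal, moving a correct answer from rank 2 to rank 1 changes the score far more than moving it from rank 20 to rank 10.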
ARTICLE | doi:10.20944/preprints202307.1478.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: fuzzy logic system; pipeline prediction model; multiple sclerosis
Online: 21 July 2023 (09:08:26 CEST)
Interferon-beta is one of the most widely prescribed disease-modifying therapies for multiple sclerosis patients. However, this treatment is only partially effective, and a significant proportion of patients do not respond to this drug. This paper proposes an alternative fuzzy logic system based on the opinion of a neurology expert to classify relapsing-remitting multiple sclerosis patients as high, medium, or low responders to interferon-beta. In addition, a pipeline prediction model trained with biomarkers associated with interferon-beta response is proposed for predicting whether patients are potential candidates for treatment with this drug, in order to avoid ineffective therapies. The classification results show that the fuzzy system achieves 100% efficiency, compared with 52% for an unsupervised hierarchical clustering method. The performance of the prediction model is also evaluated, and a testing accuracy of 0.8 is achieved. Hence, a pipeline model that includes data standardization, data compression, and a learning algorithm can be a useful tool for obtaining reliable predictions of the response to interferon-beta.
ARTICLE | doi:10.20944/preprints202307.1127.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: prediction; adherence; methotrexate; self-report questionnaires; rheumatoid arthritis
Online: 18 July 2023 (02:57:56 CEST)
Objective: This study aimed to estimate adherence to methotrexate in patients with rheumatoid arthritis and identify specific nonadherence risk factors. Methods: A cross-sectional study included 111 patients (mean age 56.2±10.6 years, 78.4% female, and mean disease duration 6 (3-13) years). Three adherence self-assessment questionnaires were used: the Compliance-Questionnaire-Rheumatology (CQR19), the Medication Adherence Report Scale (MARS-5), and the Visual Analogue Scale (VAS). We also collected demographic data, disease and treatment characteristics, and anxiety/depression estimation results (Hospital Anxiety and Depression Scale, HADS). Results: Adherence was identified in 48.6% of patients with CQR19, 70.3% with MARS-5, and 82.9% with the VAS questionnaire. All three questionnaires displayed significant positive mutual correlations: CQR19 with MARS-5 and VAS (r=0.364 and r=0.329, respectively; p<0.001 for both), and VAS with MARS-5 (r=0.496, p<0.001). Significant positive prediction was shown for urban residence (0.347 (0.134-0.901), p=0.030) using the MARS-5 scale, for female sex (0.264 (0.095-0.730), p=0.010) according to CQR19, and for methotrexate dose (0.881 (0.783-0.992), p=0.036) on the VAS scale, while negative prediction was shown for number of comorbidities (3.062 (1.057-8.874), p=0.039) and depression (1.142 (1.010-1.293), p=0.035) using the MARS-5 scale, and for older age (1.041 (1.003-1.081), p=0.034) according to CQR19. The use of steroids was a significant positive predictor in all three questionnaires and remained an independent predictor of methotrexate adherence in multivariate logistic regression. Conclusion: We showed nonadherence to methotrexate in a significant number of patients using all three questionnaires. Concomitant steroid therapy emerged as an independent positive predictor for adherence.
ARTICLE | doi:10.20944/preprints202307.0164.v1
Subject: Medicine And Pharmacology, Dentistry And Oral Surgery Keywords: artificial intelligence; machine learning; mandibular growth; growth prediction
Online: 4 July 2023 (07:55:01 CEST)
The goal was to create a novel machine learning (ML) model which can predict the magnitude and direction of pubertal mandibular growth in males with Class II malocclusion. Lateral cephalometric radiographs of 123 males at three time points (T1: 12, T2: 14, T3: 16 years old) were collected from an online database of longitudinal growth studies. Each radiograph was traced, and 7 different ML models were trained using 38 data points obtained from 92 subjects. 31 subjects were used as a test group, to predict post-pubertal mandibular length and Y-axis using input data from T1 and T2 combined (2-year prediction), and T1 alone (4-year prediction). Mean absolute errors (MAEs) were used to evaluate the accuracy of each model. For all ML methods tested using the 2-year prediction, the MAEs for post-pubertal mandibular length ranged from 2.11-6.07mm and 0.85-2.74° for the Y-axis. For all ML methods tested with 4-year prediction, the MAEs for post-pubertal mandibular length ranged from 2.32-5.28 mm and 1.25-1.72° for the Y-axis. Besides its initial length, the most predictive factors for mandibular length were found to be chronological age, upper and lower face heights, upper and lower incisor positions and inclinations. For the Y-axis, the most predictive factors were found to be Y-axis at earlier time points, SN-MP, SN-Pog, SNB and SNA. Whilst the potential of ML techniques to accurately forecast future mandibular growth in Class II cases is promising, a requirement for more substantial sample sizes exists to further enhance the precision of these predictions.
ARTICLE | doi:10.20944/preprints202306.1808.v1
Subject: Engineering, Architecture, Building And Construction Keywords: Occupancy prediction; Deep learning; Multi-sensor fusion; Transformer
Online: 26 June 2023 (11:50:04 CEST)
Buildings are responsible for approximately 40% of the world’s energy consumption and 36% of the total carbon dioxide emissions. Building occupancy information is essential for enabling Occupant-Centric Control for zero emissions and decarbonization. Although existing machine learning and deep learning methods for building occupancy prediction have achieved remarkable progress, their analyses remain limited when applied to complex real-world scenarios. Besides, there is a high expectation for Transformer algorithms to predict building occupancy accurately. Therefore, this paper presents an Occupancy Prediction Transformer network (OPTnet). We fuse and feed multi-sensor data (building occupancy, indoor environmental conditions, HVAC operations) into a Transformer model to forecast the future occupancy presence in multiple zones. We perform experimental analysis and compare it to different occupancy prediction methods (e.g., Decision Tree, Long Short-Term Memory networks, Multi-layer perceptron) over diverse time horizons (1, 2, 3, 5, 10, 20, and 30 min). Performance metrics (e.g., Accuracy and Mean Squared Error) are employed to evaluate the effectiveness of the prediction algorithms. Our OPTnet method achieves superior performance on our experimental two-week data compared to existing methods. The improved performance signifies its potential to enhance HVAC control systems and energy optimization strategies. We will make the code publicly available at https://github.com/kailaisun/occupancy-prediction-binary to promote transparency and reproducibility.
ARTICLE | doi:10.20944/preprints202306.1705.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: GNSS; Deep Learning; Time Series Prediction; VMD; LSTM
Online: 25 June 2023 (03:34:01 CEST)
GNSS time series prediction plays a significant role in monitoring crustal plate motion, landslide detection, and maintenance of the global coordinate framework. Long Short-Term Memory (LSTM), a deep learning model, has been widely applied in the field of high-precision time series prediction, especially when combined with Variational Mode Decomposition (VMD) to form the VMD-LSTM hybrid model. To further improve the prediction accuracy of the VMD-LSTM model, this paper proposes a dual variational mode decomposition long short-term memory (DVMD-LSTM) model to effectively handle the noise in GNSS time series prediction. This model extracts fluctuation features from the residual terms obtained after VMD decomposition to reduce the prediction errors associated with residual terms in the VMD-LSTM model. Daily E, N, and U coordinate data recorded at multiple GNSS stations between 2000 and 2022 are used to validate the performance of the proposed DVMD-LSTM model. The experimental results demonstrate that compared to the VMD-LSTM model, the DVMD-LSTM model achieves significant improvements in prediction performance across all measurement stations. The average RMSE is reduced by 9.86%, and the average MAE is reduced by 9.44%. Furthermore, the average accuracy of the optimal noise model for the predicted results is improved by 36.50%, and the average speed accuracy of the predicted results is enhanced by 33.02%. These findings collectively attest to the superior predictive capabilities of the DVMD-LSTM model, thereby enhancing the reliability of the predicted results.
ARTICLE | doi:10.20944/preprints202306.0671.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Earthquake; Recurrent neural network; Prediction; Artificial neural network
Online: 9 June 2023 (05:21:52 CEST)
An earthquake is, by its general definition, a natural event. It is also a disaster that causes significant damage, loss of life, and economic losses that harm the state. The ability to predict earthquakes would help minimize these effects. In this study, data collection, data processing, and data evaluation were carried out, and earthquake forecasting was performed on the resulting data using the Recurrent Neural Network (RNN) method. The study used seismic data for events of magnitude 3.0 and above in Düzce Province between 1990 and 2022. To increase the learning potential of the method, the b and d values of the earthquakes were calculated and included in the data set in addition to the earthquake magnitude. The importance of this study lies in determining earthquakes in a specific time interval in regions of Turkey, classifying earthquake-related seismic data using artificial neural networks, and producing predictions for the future.
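The abstract does not state how the b value was computed. One widely used choice is the Aki maximum-likelihood estimator of the Gutenberg-Richter b value, sketched below as an assumption rather than the authors' method; the function name, the completeness magnitude argument, and the optional bin-width correction are all illustrative.

```python
import math


def b_value_aki(magnitudes, completeness_mag, bin_width=0.0) -> float:
    """Aki maximum-likelihood estimate of the Gutenberg-Richter b value.

    b = log10(e) / (mean(M) - (Mc - bin_width / 2)), using only events
    with M >= Mc. Assumption: the catalogue is complete above Mc.
    """
    selected = [m for m in magnitudes if m >= completeness_mag]
    if not selected:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = sum(selected) / len(selected)
    denominator = mean_mag - (completeness_mag - bin_width / 2.0)
    if denominator <= 0:
        raise ValueError("mean magnitude must exceed the corrected threshold")
    return math.log10(math.e) / denominator
```

For a catalogue restricted to magnitude 3.0 and above, as in the study, `completeness_mag` would plausibly be set to 3.0, although the abstract does not confirm that the catalogue is complete at that level.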
ARTICLE | doi:10.20944/preprints202306.0370.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: miRNA; Buffalo; Estrus; Prognostic biomarker; Gene-Target prediction
Online: 6 June 2023 (03:52:39 CEST)
Buffalo are silent breeders; therefore, detecting estrus is a serious challenge. There is a rising need for sensitive and precise biomarkers in this scenario. Recent research on miRNA has demonstrated the importance of these molecules as biomarkers. Though there have been miRNA studies in saliva during the estrous cycle, there have been few miRNA studies in blood samples. The current study was designed to look at blood miRNAs during the estrous cycle in heifers (n=5) to address the issue of silent estrus. On the day of estrus and diestrus, blood samples from 60 heifers were obtained and pooled into (n=5) separate samples. Ultrasonography and progesterone assay were performed to confirm estrus. Then, employing particular miRNA adapters, small RNA sequencing of miRNA was performed using the Illumina Miseq 2500. The UEA sRNA bioinformatics workbench identified 94 substantially differentially expressed miRNAs (p>0.05) from these data. In estrus, 63 miRNAs were upregulated and 31 miRNAs were downregulated. When the fold-change threshold was raised (log2 fold change >1; q value less than 0.05), 25 miRNAs were elevated during estrus. miR-497, miR-582, miR-10174, miR-23, miR-223, and miR-1296 were upregulated, whereas miR-10167, miR-671, miR-1246, and miR-122 were downregulated. miR-497 is unusually elevated (log2 fold change >5) when compared to another miRNA (log2 fold change >5). miRNet 2.0, Cytoscape, and MIENTURNET network software found that miR-497 has more degree centrality, above 60; it is associated with more than 60 nodes, followed by miR-93
ARTICLE | doi:10.20944/preprints202306.0044.v1
Subject: Engineering, Mechanical Engineering Keywords: RUL prediction; spatiotemporal information, aero-engine, deep learning
Online: 1 June 2023 (07:17:18 CEST)
The ability to handle spatiotemporal information contributes to improving the prediction of machine remaining useful life (RUL). However, most existing models for spatiotemporal information processing are not only complex in structure but also lack adaptive feature extraction capabilities. Therefore, a lightweight operator with adaptive spatiotemporal information extraction ability, named Involution GRU (Inv-GRU), is proposed for aero-engine RUL prediction. Involution, an adaptive feature extraction operator, replaces the information connections in the gated recurrent unit, which provides adaptive spatiotemporal information extraction and reduces the number of parameters. Thus, Inv-GRU can effectively extract the degradation information of the aero-engine. For the RUL prediction task, an Inv-GRU-based deep learning (DL) framework is first constructed, in which features extracted by Inv-GRU and several handcrafted features are processed separately to generate health indicators (HIs) from the multiple raw data of aero-engines. Finally, fully connected layers are adopted to reduce dimensionality and regress RUL from the generated HIs. Applying the Inv-GRU-based DL framework to the Commercial Modular Aero Propulsion System Simulation (C-MAPSS) datasets yields successful predictions of aero-engine RUL. Comparative analysis reveals that the proposed model exhibits superior overall prediction performance compared with recently published methods.
ARTICLE | doi:10.20944/preprints202305.0229.v1
Subject: Engineering, Energy And Fuel Technology Keywords: PV Power Prediction; Mode Decomposition; NARX; LSTM; LightGBM
Online: 4 May 2023 (08:16:05 CEST)
Photovoltaic (PV) power generation is highly nonlinear and stochastic. Accurate prediction of PV power generation plays a crucial role in grid connection as well as in the operation and scheduling of power plants. This paper proposes a combined PV power prediction model based on Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), Nonlinear Auto-Regressive Neural Networks with Exogenous Input (NARXNN), Long Short Term Memory (LSTM) Neural Network, and Light Gradient Boosting Machine (LightGBM) algorithms. To reduce the non-smoothness of PV power, the weather variables with the greatest effect on PV power are first identified by correlation analysis. Next, the PV power is decomposed into modes, which are reorganized into a new feature matrix. Finally, a NARX network is used to obtain preliminary PV power components and residual vector features; the PV power is then predicted by the LightGBM, LSTM, and NARX models, and the final prediction is obtained by combining the three results with weights optimized by the error inverse method. The results demonstrate that the proposed model outperforms other models, and its validity is confirmed using real measurement data from Andre Agassi College in the United States.
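The abstract gives no formula for the error inverse weighting. A common reading, taken here as an assumption, is that each model's weight is proportional to the reciprocal of its validation error, so more accurate models contribute more to the combined forecast. The function names below are hypothetical.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalised to sum to 1."""
    if not errors or any(e <= 0 for e in errors):
        raise ValueError("errors must be non-empty and strictly positive")
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine_forecasts(forecasts, weights):
    """Weighted sum of per-model forecast series of equal length."""
    if len(forecasts) != len(weights) or not forecasts:
        raise ValueError("need one weight per forecast series")
    length = len(forecasts[0])
    if any(len(f) != length for f in forecasts):
        raise ValueError("all forecast series must have the same length")
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts))
        for t in range(length)
    ]
```

In use, the errors would come from a held-out validation period for each of the three models, and the resulting weights would then be applied to their forecasts for the test period.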
REVIEW | doi:10.20944/preprints202304.0075.v1
Subject: Social Sciences, Demography Keywords: human migration; prediction; methods; artificial intelligence; data; uncertainty
Online: 6 April 2023 (07:12:19 CEST)
As a fundamental, overall, and strategic issue facing human society, human migration is a key factor affecting the development of countries and cities given constantly changing population numbers. The fuzziness of the spatiotemporal attributes of human migration limits the pool of open-source data for human migration prediction, leading to a relative lag in human migration prediction algorithm research. This study expands the definition of human migration research, reviews the progress of research into human migration prediction, and classifies and compares human migration algorithms based on open-source data. It also explores the critical uncertainty factors restricting the development of human migration prediction. Given the effect of human migration prediction, in combination with artificial intelligence and big data technology, the paper concludes with specific suggestions and countermeasures aimed at enhancing human migration prediction research results to serve economic and social development and national strategy.
ARTICLE | doi:10.20944/preprints202212.0350.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: miRNA target prediction; CLASH; deep learning; interpretation; visualization
Online: 20 December 2022 (03:28:52 CET)
MicroRNAs (miRNAs) are small non-coding RNAs that play a central role in the post-transcriptional regulation of biological processes. miRNAs regulate transcripts by direct binding involving the Argonaute protein family. The exact rules of binding are not known, and several in silico miRNA target prediction methods have been developed to date. Deep Learning has recently revolutionized miRNA target prediction. However, the higher predictive power comes with decreased ability to interpret increasingly complex models. Here, we present a novel interpretation technique, called attribution sequence alignment, for miRNA target site prediction models that can interpret such Deep Learning models on a two-dimensional representation of miRNA and putative target sequence. Our method produces a human readable visual representation of miRNA:target interactions and can be used as a proxy for further interpretation of biological concepts learned by the neural network. We demonstrate applications of this method in clustering of experimental data into binding classes, as well as using the method to narrow down predicted miRNA binding sites on long transcript sequences. Importantly, the presented method works with any neural network model trained on a two-dimensional representation of interactions and can be easily extended to further domains such as protein-protein interactions.
ARTICLE | doi:10.20944/preprints202212.0195.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: radiomics; machine learning; radiation therapy; bone metastases; prediction
Online: 12 December 2022 (08:22:54 CET)
Background: Patients with painful spinal bone metastases (PSBMs) regularly receive palliative radiation therapy (RT), with a response in about 2 of 3 patients. In this exploratory study, we evaluated the value of machine learning (ML) models based on radiomic, semantic and clinical features to predict complete pain response. Methods: Gross tumour volumes (GTV) and clinical target volumes (CTV) of 261 PSBMs were segmented on planning computed tomography (CT) scans. Radiomic, semantic and clinical features were collected for all patients. Random forest (RFC) and support vector machine (SVM) classifiers were compared using repeated nested cross-validation. Results: The best radiomic classifier was trained on CTV with an area under the receiver-operator curve (AUROC) of 0.62 ± 0.01 (RFC; 95% confidence interval). The semantic model achieved a comparable AUROC of 0.63 ± 0.01 (RFC), significantly below the clinical model (SVM, AUROC: 0.80 ± 0.01) and slightly lower than the spinal instability neoplastic score (SINS; LR, AUROC: 0.65 ± 0.01). A combined model did not improve performance (AUROC: 0.74 ± 0.01). Conclusions: We could demonstrate that radiomic and semantic analyses of planning CTs allowed for limited prediction of therapy response to palliative RT. ML predictions based on established clinical parameters achieved the best results.
ARTICLE | doi:10.20944/preprints202212.0188.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Bitcoin price movement; machine learning; crypto price prediction
Online: 12 December 2022 (03:36:10 CET)
With the rise of Blockchain technology, the cryptocurrency market has been gaining significant interest. In particular, the number of cryptocurrency traders and the market capitalization have grown tremendously. However, predicting cryptocurrency prices is very challenging due to their high volatility. In this paper, we propose a machine learning classification approach to predict the direction of the market (i.e., whether the market is going up or down). We identify key features such as the Relative Strength Index (RSI) and Moving Average Convergence Divergence (MACD) to feed the machine learning model. We illustrate our approach through the analysis of the Bitcoin close price. We evaluate the proposed approach via different simulations. In particular, we provide a backtesting strategy. The evaluation results show that the proposed machine learning approach provides buy and sell signals with more than 86% accuracy.
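RSI and MACD, the two features named in the abstract, can both be derived from a series of closing prices. The sketch below uses common default periods (14 for RSI; 12, 26, and 9 for MACD) and a simple-average RSI; these parameter choices and the function names are assumptions, as the abstract does not specify them.

```python
def ema(values, span):
    """Exponential moving average seeded with the first value."""
    if not values:
        raise ValueError("values must be non-empty")
    alpha = 2.0 / (span + 1)
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def macd(closes, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line): fast EMA minus slow EMA, and its EMA."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    return macd_line, ema(macd_line, signal)


def rsi(closes, period=14) -> float:
    """Simple-average RSI over the last `period` price changes (0 to 100)."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent[:-1], recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no down moves in the window
    relative_strength = gains / losses
    return 100.0 - 100.0 / (1.0 + relative_strength)
```

A direction classifier of the kind described would take values like these as inputs and a label such as "next close higher than current close" as the target; the labelling rule is likewise not given in the abstract.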
ARTICLE | doi:10.20944/preprints202211.0439.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: coauthorship; coauthorship Network; Link Prediction; Graph Database; Nodes
Online: 23 November 2022 (07:38:22 CET)
As research expands rapidly, collaboration networks between authors are also growing, increasing the probability that different authors will work together on the same project or research paper and thus become co-authors. In coauthorship networks, link prediction is used to anticipate new interactions between members that are likely to occur in the future. Researchers have concentrated their efforts on studying and suggesting methods for providing effective reviews for authors who can collaborate on a scientific endeavor. To provide precise link prediction, this paper proposes a graph database approach that uses nodes to determine the most likely future co-authors. To forecast the connections, we preprocessed the data set for the maximum relative contents. The solution is implemented with a supervised learning approach that includes a random forest classifier and logistic regression. The first findings of our technique reveal that the sum of the two author nodes’ research collaboration indices has a greater influence on the performance of supervised link prediction than the traditional approach, which motivates further study of this kind of forecast.
ARTICLE | doi:10.20944/preprints202210.0301.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Emotion prediction; music; music emotion dataset; affective computing
Online: 20 October 2022 (08:33:49 CEST)
Music is capable of conveying many emotions. The level and type of emotion of the music perceived by a listener, however, is highly subjective. In this study, we present the Music Emotion Recognition with Profile information dataset (MERP). This database was collected through Amazon Mechanical Turk (MTurk) and features dynamical valence and arousal ratings of 54 selected full-length songs. The dataset contains music features, as well as user profile information of the annotators. The songs were selected from the Free Music Archive using an innovative method (a Triple Neural Network with the OpenSmile toolkit) to identify 50 songs with the most distinctive emotions. Specifically, the songs were chosen to fully cover the four quadrants of the valence arousal space. Four additional songs were selected from DEAM to act as a benchmark in this study and filter out low quality ratings. A total of 277 participants participated in annotating the dataset, and their demographic information, listening preferences, and musical background were recorded. We offer an extensive analysis of the resulting dataset, together with a baseline emotion prediction model based on a fully connected model and an LSTM model, for our newly proposed MERP dataset.
ARTICLE | doi:10.20944/preprints202208.0119.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: LULC; prediction; artificial neural network; Urmia; CA-Markov
Online: 5 August 2022 (09:32:32 CEST)
An accurate land-use/land-cover (LULC) prediction map is essential to understanding and assessing future patterns. In this study, the LULC map of Urmia, Iran, for 2030 was produced using two prediction methods: CA-Markov and Artificial Neural Network (ANN). The study followed a three-step methodology. In the first step, Landsat satellite images acquired in 2000, 2010, and 2020 were classified with the maximum likelihood algorithm, and LULC maps were prepared for each year. In the second step, to validate the two prediction methods (CA-Markov and ANN), a predicted LULC map for 2020 was produced from the LULC maps of 2000 and 2010, and the predicted and actual 2020 maps were compared using correctness, completeness, and quality indices. Finally, the LULC map for 2030 was prepared using both algorithms and the change map was extracted. The results show that the area of soil and vegetation decreased and built-up regions increased during the research period. The validation results show that the two algorithms perform very similarly. Nevertheless, ANN has the highest completeness (96.21%) and quality (93.8%), and CA-Markov the highest correctness (96.47%). This study shows that the CA-Markov algorithm is most successful in predicting classes that cover larger areas and a higher percentage of the region (urban and vegetation cover), and the ANN algorithm in predicting classes that cover smaller areas and lower percentages (soil and rock).
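The abstract does not define the correctness, completeness, and quality indices. In remote sensing they are commonly computed per class from true positives (TP), false positives (FP), and false negatives (FN) as TP/(TP+FP), TP/(TP+FN), and TP/(TP+FP+FN); the sketch below assumes those definitions and uses a hypothetical function name and flat lists of per-pixel class labels.

```python
def map_agreement_indices(predicted, actual, target_class):
    """Return (correctness, completeness, quality) for one LULC class.

    Assumed definitions: correctness = TP / (TP + FP),
    completeness = TP / (TP + FN), quality = TP / (TP + FP + FN).
    """
    if len(predicted) != len(actual):
        raise ValueError("maps must contain the same number of pixels")
    tp = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p == target_class and a == target_class:
            tp += 1
        elif p == target_class:
            fp += 1
        elif a == target_class:
            fn += 1
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("class is absent from the predicted or actual map")
    correctness = tp / (tp + fp)
    completeness = tp / (tp + fn)
    quality = tp / (tp + fp + fn)
    return correctness, completeness, quality
```

Quality can never exceed either of the other two indices, which is consistent with the ANN figures reported in the abstract (quality below completeness).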
ARTICLE | doi:10.20944/preprints202201.0378.v1
Subject: Medicine And Pharmacology, Clinical Medicine Keywords: staphylococcus aureus; infective endocarditis; clinical prediction rules; echocardiography
Online: 25 January 2022 (10:41:47 CET)
Background. It is unclear whether the use of clinical prediction rules is sufficient to rule out infective endocarditis (IE) in patients with Staphylococcus aureus bacteremia (SAB) without an echocardiogram evaluation, either transthoracic (TTE) and/or transesophageal (TEE). Our primary purpose was to test the usefulness of PREDICT, POSITIVE and VIRSTA scores to rule out IE without echocardiography. Our secondary purpose was to evaluate whether not performing an echocardiogram evaluation is associated with higher mortality. Methods. We conducted a unicentric retrospective cohort including all patients with a first SAB episode from January 2015 to December 2020. IE was defined according to modified Duke criteria. We predefined threshold cut-off points to consider that IE was ruled out by means of the mentioned scores. To assess 30-day mortality, we used a multivariable regression model considering performing an echocardiogram as covariate. Results. Out of 404 patients, IE was diagnosed in 50 (12.4%). Prevalence of IE within patients with negative PREDICT, POSITIVE and VIRSTA scores was: 3.6% (95% CI 0.1-6.9%), 4.9% (95% CI 2.2-7.7%), and 2.2% (95% CI 0.2-4.3%), respectively. Patients with negative VIRSTA and negative TTE had an IE prevalence of 0.9% (95% CI 0-2.8%). Performing an echocardiogram was independently associated with lower 30-day mortality (OR 0.24 95%CI 0.10-0.54, p=0.001). Conclusion. PREDICT and POSITIVE scores were not sufficient to rule out IE without TEE. In patients with negative VIRSTA score, it was doubtful if IE could be discarded with a negative TTE. Not performing an echocardiogram was associated with worse outcomes, which might be related to presence of occult IE. Further studies are needed to assess the usefulness of clinical prediction rules in avoiding echocardiographic evaluation in SAB patients.
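The prevalence figures in this abstract are proportions with 95% confidence intervals. A minimal sketch of one way to compute such an interval follows, assuming a normal-approximation (Wald) interval clipped to the range 0 to 1; the abstract does not say which interval method the authors used, and the function name is hypothetical.

```python
import math


def prevalence_with_ci(cases, total, z=1.96):
    """Return (proportion, lower, upper) using a Wald interval clipped to [0, 1]."""
    if total <= 0 or not 0 <= cases <= total:
        raise ValueError("need 0 <= cases <= total and total > 0")
    proportion = cases / total
    half_width = z * math.sqrt(proportion * (1 - proportion) / total)
    lower = max(0.0, proportion - half_width)
    upper = min(1.0, proportion + half_width)
    return proportion, lower, upper
```

Calling `prevalence_with_ci(50, 404)` gives a proportion of about 12.4%, the overall IE prevalence stated in the abstract. For very small counts a Wald interval is known to behave poorly, so other interval methods may be preferable there.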
ARTICLE | doi:10.20944/preprints202110.0360.v2
Subject: Computer Science And Mathematics, Computer Science Keywords: Household Disaster Preparation; Natural Hazards Mitigation; Prediction Model
Online: 2 November 2021 (12:57:04 CET)
Natural disasters are increasing in magnitude, frequency, and geographic distribution. Studies have shown that individuals’ self-sufficiency, which largely depends on household preparedness, is very important for hazard mitigation in at least the first 72 hours following a disaster. However, although many studies have tried to identify the factors that influence a household’s disaster preparedness from different aspects, we still lack an integrative analysis of how these factors contribute to a household’s preparation. This paper aims to build a classification model to predict whether a household has prepared for a potential disaster based on its personal characteristics and the environment in which it is located. We collect data from the Federal Emergency Management Agency’s National Household Survey in 2018 and train four classification models (logistic regression, decision trees, support vector machines, and a multi-layer perceptron classifier) to predict the impact of personal characteristics and the environment on household preparation for a potential natural disaster. Results show that the multi-layer perceptron classifier outperforms the others, with the highest scores on both recall (0.8531) and F1 measure (0.7386). In addition, feature selection results show that, among other factors, a household’s access to disaster-related information is the most critical factor affecting household disaster preparation. Though there is still room for further parameter optimization, the model suggests that disaster management could be supported by gathering publicly accessible data.
ARTICLE | doi:10.20944/preprints202106.0533.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: COVID-19; Vaccine; Prediction; Regression; Ensemble learning; AdaBoost
Online: 22 June 2021 (08:30:30 CEST)
The novel coronavirus disease (COVID-19) has created immense threats to public health on various levels around the globe. The unpredictable outbreak of this disease and the pandemic situation are causing severe depression, anxiety and other mental as well as physical health problems. To combat this disease, vaccination is essential, as it boosts the immune system of people who come into contact with infected individuals. The vaccination process is thus necessary to confront the outbreak of COVID-19. This deadly disease has posed an enormous challenge to the social and economic condition of the entire world. Worldwide vaccination progress should be tracked to identify how fast economic as well as social life will be stabilized. To monitor vaccination progress, a machine learning based regressor model is adopted in this study. This tracking process has been applied to data from 14th December, 2020 to 24th April, 2021. Several ensemble based machine learning regressor models, namely Random Forest, Extra Trees, Gradient Boosting, AdaBoost and Extreme Gradient Boosting, are implemented and their predictive performance is compared. The comparative study reveals that the AdaBoostRegressor performs best, with the lowest mean absolute error (MAE) of 9.968 and root mean squared error (RMSE) of 11.133.
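The two error measures used to compare the regressors can be sketched as follows. This is a minimal illustration of the standard MAE and RMSE definitions, not the study's code; the function names and the sample values are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 6.0]
print(mean_absolute_error(observed, predicted))
print(root_mean_squared_error(observed, predicted))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, which is consistent with the pair of values reported above.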
ARTICLE | doi:10.20944/preprints202104.0628.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Food production; machine learning; agricultural production; prediction model
Online: 23 April 2021 (10:20:09 CEST)
Advancing models for accurate estimation of food production is essential for policymaking and managing national plans of action for food security. This research proposes two machine learning models for the prediction of food production. The adaptive network-based fuzzy inference system (ANFIS) and multilayer perceptron (MLP) methods are used to develop the prediction models. In the present study, two variables, livestock production and agricultural production, were considered as the sources of food production. Three variables were used to evaluate livestock production, namely livestock yield, live animals, and animals slaughtered, and two variables were used to assess agricultural production, namely agricultural production yields and losses. Iran was selected as the case study. Time-series data on livestock and agricultural production in Iran from 1961 to 2017 were collected from the FAOSTAT database. First, 70% of the data was used to train ANFIS and MLP, and the remaining 30% was used to test the models. The results showed that the ANFIS model with generalized bell-shaped (Gbell) built-in membership functions has the lowest error level in predicting food production. The findings of this study provide a suitable tool for policymakers, who can use this model to predict future food production and plan for the food security and food supply of the next generations.
Subject: Engineering, Automotive Engineering Keywords: drought; drought indices; South Asia; prediction; projection; teleconnection
Online: 1 March 2021 (17:52:21 CET)
South Asian countries have recently experienced frequent drought incidents, and for this reason many scientific studies have been carried out to explore drought in South Asia. In this context, we review scientific studies related to drought in South Asia. The study first identifies the importance of drought-related studies and discusses drought types for South Asian regions. Representative examples of drought events, severity, frequency, and duration in South Asian countries are identified. The Standardized Precipitation Index (SPI) was the index most commonly adopted in South Asian countries to quantify and monitor droughts. Nevertheless, the absence of drought quantification studies in Bhutan and the Maldives is of great concern. Future studies are required to generate a combined drought severity map for the South Asian region. Moreover, drought prediction and projection in the region are rarely studied. Further, the teleconnection between drought and large-scale atmospheric circulations in the South Asian area has not been discussed in detail in most of the scientific literature. Therefore, as a take-home message, there is an urgent need for scientific studies on drought quantification for some regions in South Asia, prediction and projection of drought for individual countries (or for the region as a whole), and drought teleconnection to atmospheric circulation.
ARTICLE | doi:10.20944/preprints202008.0676.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Popularity Prediction; Classification; Social Network; Machine Learning; Instagram
Online: 30 August 2020 (15:56:34 CEST)
Predicting the popularity of posts on social networks has taken on significant importance in recent years, and several social media management tools now offer solutions to improve and optimize the quality of published content and to enhance the attractiveness of companies and organizations. Scientific research has recently moved in this direction, with the aim of exploiting advanced techniques such as machine learning, deep learning, natural language processing, etc., to support such tools. In light of the above, in this work we aim to address the challenge of predicting the popularity of a future post on Instagram, by defining the problem as a classification task and by proposing an original approach based on Gradient Boosting and feature engineering, which led us to promising experimental results. The proposed approach exploits big data technologies for scalability and efficiency and is general enough to be applied to other social media as well.
ARTICLE | doi:10.20944/preprints202007.0697.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: RUL prediction; sensors; IOT; aircraft engine; business intelligence
Online: 29 July 2020 (12:34:24 CEST)
The growing use of smart devices in various industries is adding numerous sensors to each piece of equipment, prompting the need for methods and models for sensor data. The current research proposes a systematic approach to analyzing the data generated by sensors attached to industrial equipment. The methodology involves data cleaning, preprocessing, basic statistics, and outlier and anomaly detection. The present study predicts remaining useful life (RUL) using various machine learning models, including Regression, Polynomial Regression, Random Forest, Decision Tree, and XGBoost. Hyperparameter optimization is performed to find the optimal parameters for each variable. RMSE and MAE are compared across the RUL prediction models. The outcome of the RUL prediction should help decision makers drive business decisions; hence binary classification and a business case analysis are performed. The business case analysis includes the cost of maintaining and the cost of not maintaining a particular asset. The current research aims to integrate machine intelligence and business intelligence so that industrial operations are optimized in both resources and profit.
ARTICLE | doi:10.20944/preprints202007.0650.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Myocarditis; Diagnosis; Convolutional Neural Network; Cardiac MRI; prediction
Online: 26 July 2020 (17:44:05 CEST)
Myocarditis is an inflammation of the middle layer of the heart wall that is caused by a viral infection and can affect the heart muscle and its electrical system. It remains one of the most challenging diagnoses in cardiology. Myocarditis is the prime cause of unexpected death in approximately 20% of adults less than 40 years of age. Cardiac MRI (CMR) is considered a noninvasive, gold standard diagnostic tool for suspected myocarditis and plays an indispensable role in diagnosing various cardiac diseases. However, the performance of CMR depends heavily on the clinical presentation and on non-specific features such as chest pain, arrhythmia, and heart failure. In addition, imaging factors such as artifacts, technical errors, pulse sequence, acquisition parameters, contrast agent dose, and, more importantly, qualitative visual interpretation can affect the result of the diagnosis. This paper introduces a new deep learning-based model called Convolutional Neural Network-Clustering (CNN-KCL) to diagnose myocarditis. The hybrid CNN-KCL method performs early and accurate diagnosis of myocarditis. To the best of our knowledge, a convolutional neural network has never been used before for the diagnosis of myocarditis. In this study, we used data from 47 subjects at Tehran's Omid Hospital to diagnose myocarditis. The total number of data examined is 10425. Our results demonstrate that CNN-KCL achieves a myocarditis diagnosis accuracy of 92.3%, which is significantly better than the accuracies reported in previous studies.
ARTICLE | doi:10.20944/preprints202004.0257.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: COVID-19; Predictive Analytics; Machine Learning; Prediction; Pandemic
Online: 14 May 2020 (09:03:52 CEST)
Globally, there is a massive uptake and explosion of data, and the challenge is to address issues such as the scale, pace, velocity, variety, volume and complexity of this big data. Considering the recent epidemic in China, modeling the cumulative number of COVID-19 infected cases using the data available in the early phase was a big challenge. Because COVID-19 became a pandemic within a very short time span, it is very important to analyze the trend of its spread and of infected cases. This chapter presents a medical perspective on COVID-19 in terms of the epidemiological triad, along with a study of the state of the art. The main aim of this chapter is to present the different predictive analytics techniques available for trend analysis, the different models and algorithms, and their comparison. Finally, the chapter concludes with a prediction of COVID-19 using the Prophet algorithm, indicating faster spread in the short term. These predictions will be useful to governments and healthcare communities in initiating appropriate measures to control this outbreak in time.
ARTICLE | doi:10.20944/preprints202004.0466.v1
Subject: Biology And Life Sciences, Endocrinology And Metabolism Keywords: COVID-19; coronavirus; ACE2; bioinformatics analysis; drug prediction
Online: 26 April 2020 (03:14:50 CEST)
The recent outbreak of coronavirus disease 2019 (COVID-19) is threatening human health globally. There is a dire need to find potential therapeutic agents. Angiotensin converting enzyme 2 (ACE2), an entry receptor of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is considered a potential therapeutic target in the COVID-19 pandemic. Here, our bioinformatics analysis revealed that the biological function of ACE2 was correlated with the regulation of blood pressure and the mediation of SARS-CoV-2 entry into host cells. Ten ACE2 cooperative proteins were identified with high scores using STRING. ACE2 was highly expressed in the small intestine, testis, and kidney. The level of ACE2 expression in tumor tissues, compared with that in normal tissues, varied across cancer types. Notably, the expression level of ACE2 in tumors had no effect on patient survival. The miRNA hsa-miR-942-5p and three transcription factors (TFs), namely signal transducer and activator of transcription 4 (STAT4), estrogen related receptor α (ESRRA), and signal transducer and activator of transcription 3 (STAT3), were selected as novel ACE2 regulators. Moreover, nine potential therapeutic drugs were predicted using two online databases. Thus, our research may broaden the overall view of ACE2 in COVID-19 treatment.
ARTICLE | doi:10.20944/preprints201908.0042.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Africa; rainfall; variability; prediction; multimodel; superensemble; synthetic; skill
Online: 5 August 2019 (04:48:15 CEST)
This study presents the improvements that can be attained in seasonal climate predictions in various parts of Africa using the multimodel superensemble scheme. The synthetic superensemble (SSE) used follows the approach originally developed at Florida State University (FSU). The technique takes greater advantage of the skill in the climate forecast data sets from atmosphere-ocean general circulation models running at many centres worldwide, including the WMO global producing centres (GPCs). The module used in this work drew data sets from four versions of the FSU coupled model system, seven models from the DEMETER project (the forerunner of the current European Ensembles Forecast System), the NCAR model, and the Predictive Ocean Atmosphere Model for Australia (POAMA), making a set of 13 individual models. An archive of monthly precipitation simulations was available over all 5 regions of Africa, namely Eastern, Central, Northern, Southern, and Western Africa. The results showed that the SSE forecast for precipitation carries higher skill than each of the member models and the ensemble mean. Relative to the ensemble mean (EM), the SSE provides an improvement of 18% in the simulation of the seasonal cycle of precipitation climatology. In Eastern Africa, during the December-February season, a north-south gradient of precipitation prevails between Tropical East Africa and the sector of the region towards Southern Africa. This regional scale climate pattern is a direct influence of the Intertropical Convergence Zone (ITCZ) across the African continent during this time of the year. The SSE emerges with better skill scores than the EM and the member models, such as the lowest root mean square error, for example in predicting the spatial location and precipitation magnitudes that characterize the see-saw precipitation pattern in Eastern Africa. In all parts of Africa, and especially Eastern Africa, where seasonal precipitation variability is a frequent cause of huge human suffering due to droughts and famine, the multimodel superensemble and its subsequent improvements will always provide a forecast that outweighs the best Atmosphere-Ocean Climate Model.
ARTICLE | doi:10.20944/preprints201901.0091.v1
Subject: Engineering, Civil Engineering Keywords: Acoustic emissions, fracture process, failure prediction, q-statistics
Online: 9 January 2019 (16:35:10 CET)
In this paper we present experimental results concerning Acoustic Emission (AE) recorded during cyclic compression tests on two different kinds of brittle building materials, namely concrete and basalt. The AE inter-event times were investigated through a non-extensive statistical mechanics analysis, which shows that their decumulative probability distributions follow q-exponential laws. The entropic index $q$ and the relaxation parameter $\beta_q \equiv 1/T_q$, obtained by fitting the experimental data, exhibit systematic changes during the various stages of the failure process; namely, the $(q, T_q)$ points align linearly. The $T_q = 0$ point corresponds to the macroscopic breakdown of the material. The slope of the linear alignment, including its sign, appears to depend on the chemical and mechanical properties of the sample. These results provide insight into the warning signs of incipient failure in building materials and could therefore be used in monitoring the health of existing structures such as buildings and bridges.
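For reference, the q-exponential law mentioned in this abstract is usually written as below. This is the standard form from non-extensive statistical mechanics, given here as an assumption about the fitted function rather than as the authors' exact expression; $\tau$ (the inter-event time) and $\beta_q$ (the relaxation parameter, the reciprocal of $T_q$) are notation chosen for this sketch.

```latex
% Standard q-exponential decumulative (survival) law; tau and beta_q are assumed notation.
P(>\tau) \;=\; \exp_q\!\left(-\beta_q \tau\right)
        \;=\; \left[\,1 - (1-q)\,\beta_q\,\tau\,\right]^{\frac{1}{1-q}},
\qquad \beta_q = \frac{1}{T_q},
\qquad \lim_{q \to 1} \exp_q\!\left(-\beta_q \tau\right) = e^{-\beta_q \tau}.
```

In the limit $q \to 1$ the ordinary exponential decay is recovered, so the departure of the fitted $q$ from 1 measures how far the inter-event statistics deviate from a memoryless process.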
ARTICLE | doi:10.20944/preprints201811.0220.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: bankruptcy prediction; audit report; artificial intelligence; PART algorithm
Online: 8 November 2018 (14:45:12 CET)
Despite the number of studies on bankruptcy prediction using financial ratios, very little is known about how external audit information can contribute to anticipating financial distress. A handful of papers show that a combination of ratios and audit data can provide significant predictive power, but a recent paper by Muñoz-Izquierdo et al. (2018) achieved 80% predictive accuracy solely by using the disclosures of audit reports. We complement this study. Applying an artificial intelligence method (the PART algorithm), we examine the predictive ability of more easily extracted information from the report and suggest a practical implication for each user. Simply by (1) finding the audit opinion, (2) identifying whether a matter section exists, and (3) counting the number of comments disclosed, any user may predict a bankruptcy situation with the same accuracy as if they had scrutinised the whole report. In addition, we provide an extended literature review of previous studies on the interaction between bankruptcy prediction and external audit information.
ARTICLE | doi:10.20944/preprints201810.0103.v1
Subject: Biology And Life Sciences, Virology Keywords: Nipah Virus, outbreak, inhibitors, QSAR, database, prediction algorithm
Online: 5 October 2018 (15:04:23 CEST)
Nipah virus (NiV) has caused various outbreaks in Asian countries, the latest in the Kerala state of India. To date, no drug is available despite the urgent need. In the current study, we provide a computational one-stop solution for NiV inhibitors. We have developed the “anti-Nipah” web resource, which comprises a data repository, a prediction method, and data visualization modules. The database comprises 313 (181 unique) inhibitors from different strains and outbreaks of NiV, extracted from research articles and patents. The quantitative structure–activity relationship (QSAR) based predictors were built using a classification approach, employing a support vector machine with 10-fold cross validation on 120 (68p + 52n) inhibitors. The overall predictor showed an accuracy and Matthews correlation coefficient of 88.89% and 0.77, respectively, on the training/testing dataset. The independent validation dataset also performed equally well. The data visualization modules, based on chemical clustering and principal component analyses, displayed the diversity in the NiV inhibitors. Therefore, our web platform should be of great help to researchers working on developing effective inhibitors against NiV. The user-friendly webserver is freely available at http://bioinfo.imtech.res.in/manojk/antinipah/
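The two figures of merit quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both derived from a binary confusion matrix. The sketch below shows the standard definitions only. The counts are hypothetical, chosen to sum to 68 positives and 52 negatives on the assumption that "68p + 52n" denotes positive and negative compounds; they are not the study's confusion matrix.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 by convention when any marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for illustration only (68 positives, 52 negatives).
tp, fn = 60, 8
tn, fp = 47, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

MCC is often reported alongside accuracy on imbalanced sets such as this one because it uses all four cells of the confusion matrix, so a classifier cannot score well simply by favoring the majority class.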
COMMUNICATION | doi:10.20944/preprints201803.0054.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: data feature selection; data clustering; travel time prediction
Online: 7 March 2018 (13:30:06 CET)
In recent years, governments have applied intelligent transportation system (ITS) techniques to provide several convenience services (e.g., garbage truck apps) for residents. This study proposes a garbage truck fleet management system (GTFMS), together with data feature selection and data clustering methods for travel time prediction. The GTFMS includes mobile devices (MDs), on-board units, a fleet management server, and a data analysis server (DAS). When a user uses an MD to request the arrival time of a garbage truck, the DAS performs the data feature selection and data clustering procedures to analyse the travel time of the garbage truck. The proposed methods cluster the travel time records and reduce their variation to improve travel time prediction. After the travel time and arrival time are predicted, the predicted information is sent to the user’s MD. In the experimental environment, the results showed that the accuracies of the previous method and the proposed method were 16.73% and 85.97%, respectively. Therefore, the proposed data feature selection and data clustering methods can be used to predict the stop-to-stop travel time of garbage trucks.
ARTICLE | doi:10.20944/preprints201710.0163.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: link prediction; combination method; theoretical limit; TLF method
Online: 26 October 2017 (05:49:34 CEST)
The theoretical limit of link prediction is a fundamental problem in this field. The mainstream approach studies this problem by taking the network structure as the object of research. This paper proposes a new viewpoint: link prediction methods can be divided into single and combination methods, based on the way they derive the similarity matrix. It then investigates whether a theoretical limit exists for combination methods. We propose and prove necessary and sufficient conditions for a combination method to reach the theoretical limit. The limit theorem reveals that the essence of a combination method is to estimate the probability density functions of existing links and nonexistent links. Based on the limit theorem, a new combination method, the theoretical limit fusion (TLF) method, is proposed. Simulations and experiments on real networks demonstrate that the TLF method can achieve higher prediction accuracy.
ARTICLE | doi:10.20944/preprints201709.0114.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: WSN; IoT; seawater temperature prediction; marine aquaculture support
Online: 23 September 2017 (11:31:13 CEST)
Aquaculture is growing ever more important due to the decrease in natural marine resources and the increase in worldwide demand. To avoid losses due to aging and abnormal weather, it is important to predict seawater temperature in order to maintain a more stable supply, particularly for high value added products such as pearls and scallops. The increase in species extinction is a prominent societal issue. Furthermore, in order to maintain a stable quality of farmed fishery products, water temperature should be measured daily and farming methods altered according to seasonal stresses. In this paper, we propose an algorithm to estimate seawater temperature in marine aquaculture by combining seawater temperature data and actual weather data.
ARTICLE | doi:10.20944/preprints202010.0436.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: Naïve Bayes Classification; Eulers Strength Formula; Cricket Prediction; Supervised Learning; KNIME Tool; Cricket prediction; sports analytics; multivariate regression; neural network
Online: 21 October 2020 (12:34:00 CEST)
In cricket, the Twenty20 format is the most watched and loved, as no one can guess who will win the match until the last ball of the last over. In India, the Indian Premier League (IPL) started in 2008 and is now the most popular T20 league in the world. We therefore decided to develop a machine learning model for predicting the outcome of its matches. Winning a cricket match depends on many key factors, such as home ground advantage, past performances on that ground, records at the same venue, the overall experience of the players, the record against a particular opposition, and the current form of the team and of individual players. This paper describes the key factors that affect the result of a cricket match and the regression model that best fits this data and gives the best predictions. Cricket is a mainstream and widely played sport across India and has the most noteworthy fan base, and the IPL follows the Twenty20 format, which is very unpredictable. The IPL match predictor is an ML-based prediction approach in which the data sets and previous statistics are used for training across all dimensions, covering important factors such as Toss, Home Ground, Captains, Favorite Players, Opposition Battle, and Previous Stats, with each factor having a different strength. It is built with the KNIME tool, with the added intelligence of a Naive Bayes network and Euler's strength calculation formula.
REVIEW | doi:10.20944/preprints201806.0137.v1
Subject: Medicine And Pharmacology, Other Keywords: opportunity; challenge; perspective; health data; disease prediction; clinical outcome prediction; healthcare process; data quality; quantity and quality analysis; artificial intelligence
Online: 8 June 2018 (13:22:08 CEST)
Health information technology has been widely used in healthcare and has generated a huge amount of data. Health data has four characteristics: high volume, high velocity, high variety and high value. Thus, it can be leveraged to i) discover associations between genes, diseases and drugs to implement precision medicine; ii) predict diseases and identify their corresponding causal factors to prevent or control the diseases at an earlier time; iii) learn risk factors related to clinical outcomes (e.g., patients’ unplanned readmission) to improve care quality and reduce healthcare expenditure; and iv) discover care coordination patterns representing good practice in the implementation of collaborative patient-centered care. At the same time, there are major challenges in data-driven healthcare research, which include: i) inefficient health data exchange across different sources; ii) learned knowledge that is biased toward specific institutions; iii) inefficient strategies for evaluating the plausibility of learned patterns; and iv) incorrect interpretation and translation of learned patterns. In this paper, we review various types of health data, discuss the opportunities and challenges in data-driven healthcare research, propose solutions to these challenges, and state the important role of data-driven healthcare research in the establishment of a smart healthcare system.
ARTICLE | doi:10.20944/preprints202311.0788.v1
Subject: Chemistry And Materials Science, Electrochemistry Keywords: lead-acid batteries; Electrochemical Impedance Spectroscopy; battery lifetime prediction
Online: 13 November 2023 (10:48:47 CET)
Electrochemical Impedance Spectroscopy techniques were applied in this work to 9 industrially fabricated lead-acid battery prototypes, divided into three type/technology packages. Frequency-dependent impedance changes were interpreted during successive charge/discharge cycles at two distinct stages: 1) immediately after fabrication; and 2) after a controlled aging procedure to 50% Depth of Discharge, following industrial standards. To investigate State of Health behaviour versus electrical response, three methods were employed, namely the (Q-Q0) total charge analysis, the decay values of the Constant Phase Element in the equivalent Randles circuits, and the resonance frequency of the circuit. A direct correlation was found that predicts the best-performing batteries in each package, thus allowing a qualitative analysis capable of indicating the decay of the battery State of Health. We identified which parameters are directly connected with lifetime performance in both stages and, consequently, which type/technology battery prototype shows the best performance. Based on this methodology, industrial producers can further establish the quality of novel batteries in terms of performance versus lifespan, allowing them to validate the technological innovations implemented in the current prototypes.
ARTICLE | doi:10.20944/preprints202310.0992.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: tropical cyclones; tropical cyclogenesis, ensemble prediction system; data assimilation
Online: 19 October 2023 (06:19:43 CEST)
In this study, we conducted experiments to assess the capability to forecast tropical cyclone (TC) genesis over the South China Sea using an ensemble-based data assimilation system (EPS-DA) built on WRF-LETKF. These experiments covered forecast lead times of up to 5 days and spanned the period from 2012 to 2019, involving a total of 45 TC formation events. The evaluation involved forecast probability assessments and analysis of positional and timing errors. Results indicated that successful forecasting depends on lead time and initial condition quality. For TC formation from an embryo vortex to tropical depression intensity, the EPS-DA system demonstrated improved accuracy as the forecast cycle approached the actual formation time. TC centers converged toward observed locations, highlighting the potential of assimilation up to 5 days before formation. We examined statistical variations in dynamic and thermodynamic variables relevant to TC processes, offering an objective assessment of the system. Our study emphasized that early warnings of TC development appear to be linked to environmental conditions at formation time, particularly strong vorticity and enhanced moisture processes.
ARTICLE | doi:10.20944/preprints202310.0816.v1
Subject: Public Health And Healthcare, Other Keywords: Framingham Risk Score; cardiovascular disease; prediction; risk factors; recalibration
Online: 12 October 2023 (16:24:30 CEST)
1. Background: Cardiovascular diseases (CVDs) are India’s leading cause of mortality. This study aimed to recalibrate the original Framingham Risk Score (FRS) equations among adults in Kerala state. 2. Methods: Baseline survey data from the Kerala Diabetes Prevention Program were analyzed: 921 males and 567 females for lipid-based FRS scores and 1042 males and 646 females for BMI-based FRS scores. Recalibration of the original FRS scores was performed using local data on CVD risk factors and CVD mortality. 3. Results: Among males, the median 10-year CVD risk with the recalibrated lipid-based FRS score was 7.34 (IQR 4.33-12.42), compared with the original score of 8.88 (5.23-14.87) (p<0.001). For BMI-based FRS scores, the median 10-year CVD risk was 7.40 (4.27-11.83) with the recalibrated score, compared to 9.32 (5.40-14.80) for the original score (p<0.001). In females, the median 10-year CVD risk was 4.83 (2.90-8.36) with the recalibrated score, compared to 2.85 (IQR 1.71-4.98) with the original score (p<0.001). Similarly, the median 10-year CVD risk was 4.66 (2.74-8.81) with the recalibrated BMI-based FRS score, compared to 2.95 (1.72-5.61) with the original score (p<0.001). 4. Conclusions: Recalibrated FRS scores estimated a significantly lower 10-year CVD risk in males and a higher risk in females than the original FRS scores.