Computer Science and Mathematics


Article
Computer Science and Mathematics
Probability and Statistics

Adrian Velasco

Abstract: Reliable crop statistics are foundational to food-security planning, yet the literature often treats forecasting accuracy and data-quality assessment as separate tasks. This paper develops an integrated evidence synthesis around three Philippine studies that together illuminate both problems for rice and corn. The first compared Seasonal Autoregressive Integrated Moving Average and Holt-Winters models for quarterly rice and corn production and found that Holt-Winters with additive seasonality yielded lower forecast errors. The second extended the forecasting problem to machine-learning models and reported that Random Forest produced the strongest predictive performance among the tested algorithms, while performance varied across other nonlinear approaches. The third applied the Newcomb-Benford law to official crop production statistics and identified deviations in rice and corn digit patterns that warrant further validation. Drawing on official Philippine Statistics Authority documentation and broader methodological literature on forecast evaluation, survey reliability, and crop-yield prediction, the paper argues that forecastability and statistical integrity should be studied together rather than in isolation. A series can be forecastable yet still contain reporting irregularities, while a numerically plausible series can remain difficult to forecast because of structural breaks, weather shocks, or shifting production conditions. For agricultural planning, the strongest evidence base comes from combining temporal modeling with routine statistical-quality screening, transparent revision practices, and follow-up diagnostics when anomalies appear. The paper concludes by proposing a practical framework for Philippine agricultural analytics in which data integrity checks precede and accompany forecasting, thereby improving the credibility of crop outlooks used for procurement, import planning, early warning, and resource allocation.
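
The Newcomb-Benford screening step mentioned above is straightforward to sketch. The following is a minimal illustration (not the paper's code): it compares leading-digit frequencies against the Benford proportions log10(1 + 1/d) with a chi-square test, using synthetic lognormal values as a stand-in for the PSA production series.

```python
import numpy as np
from scipy.stats import chisquare

def benford_screen(values):
    """Compare leading-digit frequencies against the Newcomb-Benford law.

    values : array of positive production figures (e.g., quarterly tonnage).
    Returns the chi-square statistic and p-value for the first-digit test.
    """
    values = np.asarray(values, dtype=float)
    values = values[values > 0]
    # Leading digit: shift each value into [1, 10) and take the integer part.
    first_digits = (values / 10 ** np.floor(np.log10(values))).astype(int)
    observed = np.bincount(first_digits, minlength=10)[1:10]
    expected = np.log10(1 + 1 / np.arange(1, 10)) * observed.sum()
    return chisquare(observed, expected)

# Toy usage with synthetic lognormal "production" figures, which tend to
# conform to Benford; real PSA series would be substituted here.
rng = np.random.default_rng(0)
stat, p = benford_screen(rng.lognormal(mean=8, sigma=1.5, size=2000))
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```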

Article
Computer Science and Mathematics
Probability and Statistics

Felix Reichel

Abstract: Skyjo is a simple stochastic card game with partial information, local replacement decisions, and score-reducing column removal events. This paper develops a formal mathematical model of the game, derives expected-score rules for turn-level actions, proves several dominance and threshold results, and evaluates a family of heuristic strategies through Monte Carlo simulation. The focus here lies on local optimality under explicit belief assumptions rather than a full equilibrium solution of the multiplayer game. Finally, simulation code is provided for reproducibility.
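
As a flavor of the turn-level expected-score reasoning the abstract describes, here is a minimal sketch (not the paper's model or code). It assumes the standard Skyjo deck composition (five -2s, ten -1s, fifteen 0s, ten each of 1 through 12) and evaluates a simple threshold rule by Monte Carlo, drawing cards i.i.d. and ignoring deck depletion and column-removal effects.

```python
import random

# Standard Skyjo deck composition (assumption: five -2s, ten -1s,
# fifteen 0s, ten each of 1..12; 150 cards total).
DECK = [-2] * 5 + [-1] * 10 + [0] * 15 + [v for v in range(1, 13) for _ in range(10)]
EV_UNKNOWN = sum(DECK) / len(DECK)  # expected value of a face-down card, ~5.07

def replace_if_below(drawn, threshold=EV_UNKNOWN):
    """Toy turn-level rule: swap a drawn card for a face-down card only if
    the drawn card scores below the expected value of the unknown card."""
    return drawn < threshold

def simulate(n_trials=100_000, seed=1):
    """Monte Carlo estimate of the per-turn score saving under the rule,
    relative to always keeping the face-down card (i.i.d. draws; deck
    depletion and column removals are ignored in this toy)."""
    rng = random.Random(seed)
    gain = 0.0
    for _ in range(n_trials):
        drawn, hidden = rng.choice(DECK), rng.choice(DECK)
        if replace_if_below(drawn):
            gain += hidden - drawn  # points avoided by swapping
    return gain / n_trials

print(f"E[unknown card] = {EV_UNKNOWN:.2f}")
print(f"mean per-turn saving under threshold rule: {simulate():.2f} points")
```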

Article
Computer Science and Mathematics
Probability and Statistics

Ntebogang Dinah Moroke

Abstract: The attribution of systemic financial stress to specific market sectors requires metrics that are simultaneously faithful to the model’s internal computations, statistically consistent as the sample size grows, and connected to a physically meaningful measure of directed information flow. This paper addresses all three requirements through the lens of information geometry. We present and empirically verify the Entropy-Saliency Equivalence Theorem: the Metabolic Saliency \( S_{\mathrm{ms}}(i, t) \) introduced in the companion paper (Paper 1 of this series) is an asymptotically unbiased estimator of the local Kullback-Leibler divergence \( \mathrm{KL}(q_t^{(i)} \,\|\, q_0^{(i)}) \) between the stressed and resting sector-level return distributions, where the convergence is governed by the Fisher information matrix of the Power Mapping Network (PMNet) output distribution. We also derive the finite-sample bias-variance decomposition of the Kraskov-Stögbauer-Grassberger (KSG) transfer entropy estimator used to construct the saliency weights, establishing a minimax-optimal convergence rate of \( O(T^{-2/(d+2)}) \) for a d-dimensional density support. A novel evaluation metric, the Spatio-Temporal Information Flux (STIF), is proposed to quantify the directed flow of stress-relevant information between JSE sectors in bits per trading day, providing a sector-level causal audit trail that satisfies the interpretability requirements of the South African Financial Sector Regulation Act (FSRA, 2017) and MiFID II. Empirical validation on the JSE canonical panel (N = 87 securities, T = 2,731 trading days, January 2015 to December 2025) with Eskom load-shedding stages as exogenous stress injectors confirms that \( S_{\mathrm{ms}} \) tracks \( \mathrm{KL}(q_t \| q_0) \) with a Pearson correlation of \( \hat{\rho} = 0.81 \) (p < 0.001) and that the STIF metric identifies the energy sector as the primary information source during Stage 4+ events, with information flux to the financial sector peaking at 0.43 bits/day—a 3.1× increase above the resting baseline of 0.14 bits/day. These results complete the information-theoretic glass-box characterisation of the GWS-STNet architecture and bridge topological stability theory with a fully information-theoretic characterisation of financial stress attribution.
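
The saliency constructs above (\( S_{\mathrm{ms}} \), PMNet, STIF, the KSG estimator) are specific to this paper series and are not reproduced here. As a generic illustration of the target quantity only, the sketch below computes a plug-in histogram estimate of \( \mathrm{KL}(q_t \| q_0) \) between a "stressed" and a "resting" sample of returns, using invented Gaussian regimes.

```python
import numpy as np

def plugin_kl(stressed, resting, bins=30):
    """Plug-in estimate of KL(q_t || q_0) from two return samples.

    Histograms share a common support; a small epsilon avoids log(0).
    This is the generic divergence the saliency is said to track, not
    the paper's S_ms estimator itself.
    """
    lo = min(stressed.min(), resting.min())
    hi = max(stressed.max(), resting.max())
    edges = np.linspace(lo, hi, bins + 1)
    eps = 1e-12
    q_t, _ = np.histogram(stressed, bins=edges)
    q_0, _ = np.histogram(resting, bins=edges)
    p = q_t / q_t.sum() + eps
    q = q_0 / q_0.sum() + eps
    return float(np.sum(p * np.log(p / q)))

# Toy check: a variance shock (a crude stand-in for a stress episode)
# produces a visibly positive divergence.
rng = np.random.default_rng(42)
resting = rng.normal(0.0, 0.01, size=2000)      # calm daily returns
stressed = rng.normal(-0.002, 0.03, size=2000)  # stressed regime
print(f"KL(q_t || q_0) ≈ {plugin_kl(stressed, resting):.3f} nats")
```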

Article
Computer Science and Mathematics
Probability and Statistics

Aris Spanos

Abstract: The Two-Envelope Problem (TEP) is revisited to argue that the standard evaluation of expected returns relies on spurious probabilities arising from a misuse of formal probability theory. The source of the problem is the ex post framing of two identical envelopes, X and Y, one containing twice as much money as the other, after one envelope, say X, has been selected and its content X=x observed. The value x is then used to define Y in terms of the values y=x/2 and y=2x, each assigned probability 0.5, with an analogous derivation when Y is selected. This renders X and Y ill-defined random variables because the relevant probabilistic framing must instead be based on the original experimental setup, prior to any selection or observation, where the envelope contents are unknown, say $θ and $2θ. Framing the original setup using axiomatic probability, the dependence between X and Y is accounted for when x=θ, y=2θ, and when x=2θ, y=θ. The ensuing joint distribution of X and Y determines that the expected returns imply indifference between keeping the chosen envelope and switching, explaining away the ‘paradox’ as a misapplication of probability theory.
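
The joint-distribution argument summarized above can be written compactly. With the envelope contents fixed at $θ and $2θ before selection, the correct framing is

\[ P(X=\theta,\, Y=2\theta) \;=\; P(X=2\theta,\, Y=\theta) \;=\; \tfrac{1}{2}, \]

so that

\[ E[X] \;=\; \tfrac{1}{2}\,\theta + \tfrac{1}{2}\,(2\theta) \;=\; \tfrac{3\theta}{2} \;=\; E[Y], \]

and the expected gain from switching is \( E[Y - X] = 0 \): indifference, with no paradox.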

Article
Computer Science and Mathematics
Probability and Statistics

Moriba Kemessia Jah

Abstract: Probability is often treated as the default representation of uncertainty in statistical inference and machine learning. This paper asks a more fundamental question: under what conditions is a probability distribution a valid representation of uncertainty, and what is the information cost of assuming one when those conditions are not met? We show that inference is governed by two joint constraints: maximizing information capacity by preserving the geometric degrees of freedom through which contrast can register, and minimizing false information by asserting nothing the evidence has not forced. These constraints, expressed through Jaynes’s principle of maximum entropy and Popper’s criterion of falsification, determine the structure of inference without remainder. Bayesian inference emerges not as a competing framework, but as the limiting geometry obtained when epistemic width has contracted sufficiently to justify probabilistic closure. In this sense, probability is not assumed—it is earned. We trace the origin of these ideas through two decades of operational experience in spacecraft navigation, space situational awareness, and orbit determination, where standard probabilistic filters performed well in nominal regimes but failed systematically when uncertainty was driven by genuine ignorance rather than statistical variability. Across problems including debris tracking, attitude estimation, and multi-target inference, the consistent failure mode was premature probabilistic commitment in regimes where observation geometry could not support distinguishability. The central result is that information exists only in the presence of contrast, and that structure destroyed without evidence justification is information permanently lost. We formalize this principle through an epistemic geometry of inference and show that probabilistic representations are valid only when distinguishability, parameterization, and likelihood structure are all earned by the data. When these conditions fail, probabilistic closure incurs a measurable and avoidable information capacity cost.

Article
Computer Science and Mathematics
Probability and Statistics

Joseph Njuki, Thomas Gilbert

Abstract: In this article, we develop a goodness-of-fit test for the Kumaraswamy distribution based on energy statistics. Due to the availability of its quantile (inverse) function in closed form, the Kumaraswamy distribution has been shown to be a preferred alternative to the beta distribution, since both have bounded support on the (0,1) interval. The proposed test procedure is simple and powerful against general alternatives. Under different settings, simulations show that the proposed test maintains any given significance (nominal) level. In terms of power comparisons, the proposed test outperforms other existing methods in different settings. We then apply the proposed test to real datasets (underground economy index, food expenditure, and Shasta water reservoir) to demonstrate its competitiveness and usefulness.
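
A minimal Monte Carlo sketch of an energy-distance goodness-of-fit check of this kind is shown below (the paper's exact test statistic, critical values, and parameter estimation are not reproduced; parameters are treated as known here, which simplifies the null distribution). It uses the closed-form Kumaraswamy quantile Q(u) = (1 − (1−u)^(1/b))^(1/a) for sampling and scipy's one-dimensional energy distance.

```python
import numpy as np
from scipy.stats import energy_distance

def kumaraswamy_rvs(a, b, size, rng):
    """Sample via the closed-form quantile Q(u) = (1 - (1-u)^(1/b))^(1/a)."""
    u = rng.uniform(size=size)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def energy_gof_pvalue(x, a, b, n_boot=500, mc_size=2000, seed=0):
    """Monte Carlo p-value: how often a genuine Kumaraswamy(a, b) sample
    lies at least as far (in energy distance) from the reference sample
    as the observed data do. Parameters are taken as given here; the
    paper estimates them, which changes the null distribution."""
    rng = np.random.default_rng(seed)
    ref = kumaraswamy_rvs(a, b, mc_size, rng)
    observed = energy_distance(x, ref)
    null = [energy_distance(kumaraswamy_rvs(a, b, len(x), rng), ref)
            for _ in range(n_boot)]
    return float(np.mean(np.asarray(null) >= observed))

rng = np.random.default_rng(1)
x_good = kumaraswamy_rvs(2.0, 3.0, 200, rng)  # data from the null
x_bad = rng.beta(0.5, 0.5, size=200)          # a poor fit
print("p (null true):   ", energy_gof_pvalue(x_good, 2.0, 3.0))
print("p (misspecified):", energy_gof_pvalue(x_bad, 2.0, 3.0))
```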

Article
Computer Science and Mathematics
Probability and Statistics

Muddassiru Abubakar, Umar Usman

Abstract: Road traffic accidents remain a critical public safety challenge in rapidly urbanizing regions of sub-Saharan Africa, where heterogeneous road infrastructure and high population density exacerbate risk. This study applies Kernel Density Estimation (KDE) and Geographically Weighted Regression (GWR) to analyze spatial patterns of road traffic accidents across Jega Local Government Area, Kebbi State, Nigeria, using fifty georeferenced primary data points collected through Global Positioning System surveys and manual traffic counts. The KDE analysis identified an optimal bandwidth of 175 meters with a Prediction Accuracy Index (PAI) of 3.50 at the 85th percentile threshold, indicating strong spatial clustering of accidents. Spatial autocorrelation analysis revealed significant clustering (Moran's I = 0.312, p < 0.05). The GWR model demonstrated strong explanatory power with a global R² of 0.72 and AICc of 420.35. Local R² values exhibited substantial spatial variation (range: 0.20–0.95), highlighting the importance of localized analysis. Cross-validation results (RMSE = 3.45, MAE = 2.12, R² = 0.65) confirmed predictive robustness. The integrated geospatial framework identified distinct high-risk corridors, with Gada (8 accidents), Garkar Ando (5 accidents), and Gobirawa (5 accidents) emerging as critical hotspots requiring immediate intervention. This research provides a validated geostatistical framework for micro-scale road safety planning in Nigerian cities.
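
The hotspot-scoring step can be sketched in a few lines. The following toy example (synthetic coordinates, not the Jega survey data) computes a KDE surface and the Prediction Accuracy Index at the 85th-percentile threshold, where PAI is the share of accidents inside the hotspot divided by the share of the study area it covers; scipy's default bandwidth is used rather than the paper's tuned 175 m.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic accident coordinates (metres): two clusters plus scatter,
# standing in for the fifty GPS-surveyed points.
pts = np.vstack([rng.normal([500, 500], 120, (20, 2)),
                 rng.normal([1500, 800], 150, (15, 2)),
                 rng.uniform(0, 2000, (15, 2))])

kde = gaussian_kde(pts.T)  # scipy picks its own bandwidth; the paper tunes it
xx, yy = np.meshgrid(np.linspace(0, 2000, 100), np.linspace(0, 2000, 100))
dens = kde(np.vstack([xx.ravel(), yy.ravel()]))

# Hotspot = cells above the 85th-percentile density, as in the abstract.
thresh = np.percentile(dens, 85)
area_share = np.mean(dens >= thresh)       # fraction of the study area
hit_share = np.mean(kde(pts.T) >= thresh)  # fraction of accidents captured
pai = hit_share / area_share
print(f"PAI at 85th percentile: {pai:.2f}")
```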

Article
Computer Science and Mathematics
Probability and Statistics

Xingwei Hu, Caihong Hu, Cheng-Kuang Wu

Abstract: This paper derives closed-form expressions for the asymptotic covariance matrices of unrotated factor loading and uniqueness estimators for several widely used non-maximum-likelihood factor extraction methods. These include least squares, principal factor, iterative principal component, alpha factor, and image factor analysis. By expressing these results explicitly in terms of the asymptotic covariance of the sample covariance or correlation matrix, the proposed formulas facilitate straightforward computation of standard errors. When combined with the delta method applied to rotation criteria, they further yield analytically tractable standard errors for rotated factor loadings. Monte Carlo simulations demonstrate accurate finite-sample performance, and an empirical application illustrates practical implementation of the proposed approach.
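
The closed-form covariance expressions are the paper's contribution and are not reproduced here, but the delta-method mechanics they feed into can be sketched generically: given the asymptotic covariance of vech(S) (written below under a normality assumption), the standard error of any smooth scalar functional of the covariance matrix follows from a numerical Jacobian. The largest-eigenvalue functional is an illustrative choice, not one of the paper's estimators.

```python
import numpy as np

def avar_vech_S(Sigma, n):
    """Asymptotic covariance of vech(S) under normality:
    Cov(s_ij, s_kl) = (sigma_ik*sigma_jl + sigma_il*sigma_jk) / n."""
    p = Sigma.shape[0]
    idx = [(i, j) for i in range(p) for j in range(i, p)]
    V = np.empty((len(idx), len(idx)))
    for a, (i, j) in enumerate(idx):
        for b, (k, l) in enumerate(idx):
            V[a, b] = (Sigma[i, k] * Sigma[j, l] + Sigma[i, l] * Sigma[j, k]) / n
    return V, idx

def delta_se(g, Sigma, n, h=1e-6):
    """SE of a smooth scalar functional g(S) via a numerical Jacobian,
    perturbing each vech(S) element symmetrically."""
    V, idx = avar_vech_S(Sigma, n)
    J = np.empty(len(idx))
    for a, (i, j) in enumerate(idx):
        E = np.zeros_like(Sigma)
        E[i, j] = E[j, i] = h
        J[a] = (g(Sigma + E) - g(Sigma - E)) / (2 * h)
    return float(np.sqrt(J @ V @ J))

# Example functional: the largest eigenvalue of S (a building block of
# principal-factor-type extractions).
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
print("SE of lambda_1:",
      delta_se(lambda S: np.linalg.eigvalsh(S)[-1], Sigma, n=500))
```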

Article
Computer Science and Mathematics
Probability and Statistics

Yiwen Yuan, Junfeng Shang, Chao Gu

Abstract: Sequential linear models can be adopted to describe data where the response variable depends on lagged outcomes and fixed-effects variables. For estimation, variable selection, and prediction of the response variable with high accuracy, we propose a penalized method based on the Smoothly Clipped Absolute Deviation (SCAD) penalty in sequential linear models. We conduct simulations in which the SCAD-penalized method is compared with other methods, including ordinary least squares (OLS), the Lasso, and the adaptive Lasso, in sequential linear models. The simulation results demonstrate that the SCAD-penalized method excels in estimation, with better accuracy and precision, and in variable selection, with better prediction. We apply the proposed method to two real data sets to further illustrate the performance of the SCAD-penalized method in sequential linear modeling.
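
For readers unfamiliar with SCAD, the penalty and its univariate thresholding rule (Fan and Li, 2001, with the conventional a = 3.7) are sketched below; the paper's sequential-model estimator builds on this penalty but is not reproduced here.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001); a = 3.7 is the usual default."""
    t = np.abs(t)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                 lam**2 * (a + 1) / 2))

def scad_threshold(z, lam, a=3.7):
    """Univariate SCAD thresholding rule (orthonormal-design solution):
    soft-threshold near zero, blended shrinkage in the middle band, and
    no shrinkage for large |z| -- the near-unbiasedness property."""
    az = np.abs(z)
    soft = np.sign(z) * np.maximum(az - lam, 0.0)
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return np.where(az <= 2 * lam, soft, np.where(az <= a * lam, mid, z))

z = np.linspace(-6, 6, 7)
print(scad_threshold(z, lam=1.0))  # large inputs pass through unshrunk
```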

Article
Computer Science and Mathematics
Probability and Statistics

Alexander Robitzsch

Abstract: Item response theory (IRT) models are widely used in the social sciences to analyze multivariate discrete data that include cognitive test items. In many applications, the performance of two groups is compared using IRT modeling. The assessment of differential item functioning (DIF) plays a central role in this context, as it evaluates whether specific items function differently across groups; that is, whether their item parameters differ between groups. DIF detection is commonly based on statistical inference using item fit statistics. The mean deviation (MD) and root mean square deviation (RMSD) statistics are two widely used item fit measures. However, in the literature and in empirical research, these statistics are typically treated only as effect size measures (i.e., point estimates), and formal statistical inference for them is largely lacking. To address this gap, this article proposes confidence interval (CI) estimation for the MD and RMSD statistics based on asymptotic theory and a computationally efficient parametric bootstrap method. A simulation study was conducted to evaluate the proposed CI estimation approaches and demonstrated their validity. Across both item fit statistics, for DIF and non-DIF items, and across all simulation conditions, the results indicate that CI estimation based on the parametric bootstrap using empirical percentiles performed best and outperformed both the parametric bootstrap with normal distribution-based CIs and the asymptotic theory-based approach. It is therefore recommended that CI estimation for MD and RMSD statistics be routinely reported in addition to point estimates in empirical research.
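
The parametric-bootstrap percentile interval recommended above follows a generic recipe: simulation from the fitted model, recomputation of the statistic, and empirical percentiles. A minimal sketch with a placeholder statistic (a group difference in proportion-correct standing in for MD; the actual MD/RMSD computations require a fitted IRT model) is:

```python
import numpy as np

def percentile_ci(stat_fn, simulate_fn, theta_hat, n_boot=2000,
                  level=0.95, seed=0):
    """Parametric-bootstrap percentile CI for a scalar statistic.

    simulate_fn(theta_hat, rng) draws one dataset from the fitted model;
    stat_fn(data) recomputes the statistic. In the article the statistic
    is MD or RMSD under a fitted IRT model; here it is a placeholder.
    """
    rng = np.random.default_rng(seed)
    boot = np.array([stat_fn(simulate_fn(theta_hat, rng))
                     for _ in range(n_boot)])
    alpha = 1 - level
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

# Toy stand-in: "MD" = difference in an item's proportion-correct between
# a focal and a reference group, each of size 300.
n = 300
def simulate(p, rng):
    return rng.binomial(1, p[0], n), rng.binomial(1, p[1], n)
def md(data):
    return data[0].mean() - data[1].mean()

p_hat = (0.62, 0.55)  # fitted group-level success probabilities
lo, hi = percentile_ci(md, simulate, p_hat)
print(f"95% percentile CI for MD: [{lo:.3f}, {hi:.3f}]")
```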

Article
Computer Science and Mathematics
Probability and Statistics

Bissilimou Rachidatou Orounla, Ouanan Nicolas Tuo, Kolawolé Valère Salako, Justice Moses K. Aheto, Romain Glèlè Kakaï

Abstract: The COVID-19 pandemic spread rapidly across the world and caused several economic, social, and demographic impacts, even though there were strong geographical disparities. This study aims to assess the effect of socio-demographic factors and the use of non-conventional medicines on COVID-19 risk perception in West Africa using a Structural Equation Modeling (SEM) approach. A quantitative survey was conducted in four countries (Benin, Togo, Ghana and Côte d’Ivoire). Data were collected on demographic characteristics, COVID-19 risk perception (risk feeling and risk analysis), affective attitude, trust predictors and non-conventional medicine. Nominal polychotomous logistic regression, binary logistic regression and partial least squares were used for the data analysis. Among the respondents, 59.11% came from the in-person survey; 28.08% were from Benin, 32.84% from Côte d’Ivoire, 24.96% from Togo and 14.12% from Ghana. The results showed a very high level of risk perception within the countries. Participants aged between 18 and 40 used less non-conventional medicine. Also, people with a low level of education or no formal education often perceive a higher risk associated with COVID-19 and use more non-conventional medicine than others. The PLS-SEM model’s loadings were higher than those of the Consistent PLS (PLSc-SEM), but the Consistent PLS showed robust values in the structural model, with a lower RMSE than the linear model. Our results also indicated that non-conventional medicine has a positive relationship with COVID-19 risk perception. For decision-makers and health workers, this research underscores the importance of non-conventional medicine and the emotional state of local populations in managing epidemics.

Article
Computer Science and Mathematics
Probability and Statistics

Peter Gács

Abstract: In the context of the dynamical systems of classical mechanics, we introduce two new notions called “algorithmic fine-grain and coarse-grain entropy”. The fine-grain algorithmic entropy is, on the one hand, a simple variant of the randomness tests of Martin-Löf (and others) and is, on the other hand, a connecting link between description (Kolmogorov) complexity, Gibbs entropy and Boltzmann entropy. The coarse-grain entropy is a slight correction to Boltzmann’s coarse-grain entropy. Its main advantage is that it is less partition-dependent, because algorithmic entropies for different coarse-grainings are approximations of one and the same fine-grain entropy. It has the desirable properties of Boltzmann entropy in a wider range of systems, including those of interest in the “thermodynamics of computation”. It also helps explain the behavior of some unusual spin systems arising from cellular automata.

Article
Computer Science and Mathematics
Probability and Statistics

Rui Gonçalves

Abstract: The Box–Cox transformation is widely used to induce approximate normality and linearity in statistical modelling. Within the Power Normal framework, it embeds non-Gaussian variables into a latent Gaussian structure where conditional relationships become linear. However, the inverse transformation does not generally preserve these functional relationships when returning to the original scale. In this paper, we formally analyze the discrepancy between the inverse image of the linear regression function in the transformed domain and the true conditional expectation in the original scale. We derive an explicit second-order decomposition showing that the conditional mean in the original scale consists of the inverse-transformed linear predictor plus a curvature-induced correction term proportional to the conditional variance. This distortion term depends explicitly on the transformation parameter and the local geometry of the inverse Box–Cox function. The analysis reveals that the loss of structural preservation under inversion is an intrinsic consequence of the nonlinear transformation and can be interpreted as a second-order Jensen-type correction. Numerical illustrations based on simulated bivariate Power Normal models confirm the theoretical findings. These results clarify a structural limitation of transformation-based Gaussian modelling and provide insight into its implications for statistical inference and applied modelling.
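
The curvature correction described above can be made explicit. Writing \( g_\lambda(y) = (y^\lambda - 1)/\lambda \) for the Box–Cox transform and assuming \( Z = g_\lambda(Y) \mid X = x \) has conditional mean \( m(x) \) and variance \( \sigma^2(x) \), a second-order expansion of the inverse \( g_\lambda^{-1}(z) = (1 + \lambda z)^{1/\lambda} \) gives

\[ E[Y \mid X = x] \;\approx\; g_\lambda^{-1}(m(x)) \;+\; \tfrac{1}{2}\,(1-\lambda)\,\bigl(1 + \lambda\, m(x)\bigr)^{\frac{1}{\lambda}-2}\,\sigma^2(x), \]

since \( (g_\lambda^{-1})''(z) = (1-\lambda)(1+\lambda z)^{1/\lambda - 2} \). This is one standard way to write the Jensen-type correction (the paper's exact decomposition may differ in detail): it vanishes at \( \lambda = 1 \), where the inverse is linear, and grows with the conditional variance.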

Article
Computer Science and Mathematics
Probability and Statistics

Kazuharu Misawa

Abstract: Accurate evaluation of extremely small Gaussian tail probabilities is essential in statistical meta-analyses, in which large z-scores (often exceeding 8 or 9) must be converted into p-values. However, direct numerical evaluation of the complementary error function erfc(a) suffers from severe underflow in floating-point arithmetic. In this paper, a simple and robust approximation scheme for log[erfc(a)] is proposed based on a geometric tangent construction. This approach yields explicit lower and upper bounds, closed-form asymptotic expansions up to order \( a^{-8} \), and numerically stable formulas suitable for implementation in statistical software. Numerical comparisons demonstrate that the lower and upper bounds become extremely tight for \( a \ge 6 \), making the proposed method practical for large-scale meta-analytic computations.
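
The paper's tangent construction is not reproduced here, but the classical asymptotic expansion of log[erfc(a)] through order \( a^{-8} \) can be checked directly against an exact, underflow-free reference via erfc(a) = 2Φ(−a√2), computed with scipy.special.log_ndtr:

```python
import numpy as np
from scipy.special import log_ndtr

def log_erfc_asymptotic(a):
    """Classical tail expansion: erfc(a) ~ exp(-a^2)/(a*sqrt(pi)) *
    (1 - 1/(2a^2) + 3/(4a^4) - 15/(8a^6) + 105/(16a^8)).
    This is the standard expansion, not the article's tangent bounds."""
    s = (1 - 1 / (2 * a**2) + 3 / (4 * a**4)
         - 15 / (8 * a**6) + 105 / (16 * a**8))
    return -a**2 - np.log(a * np.sqrt(np.pi)) + np.log(s)

def log_erfc_exact(a):
    """Stable reference: erfc(a) = 2*Phi(-a*sqrt(2))."""
    return np.log(2.0) + log_ndtr(-a * np.sqrt(2.0))

for a in [6.0, 8.0, 10.0]:
    approx, exact = log_erfc_asymptotic(a), log_erfc_exact(a)
    print(f"a={a:4.1f}  approx={approx:.10f}  exact={exact:.10f}  "
          f"abs err={abs(approx - exact):.2e}")
```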

Article
Computer Science and Mathematics
Probability and Statistics

Gultac Eroglu Inan

Abstract: The geometric process (GP) is one of the important and widely used stochastic models in reliability theory. Although it is used in various areas of application, it has some limitations that cause difficulties. The doubly geometric process (DGP) has been proposed to overcome these limitations. The parameter estimation problem plays an important role for both the GP and the DGP. In this study, the parameter estimation problem for the DGP is considered when the distribution of the first interarrival time is assumed to be a gamma distribution with parameters α and β. First, the maximum likelihood (ML) method is used to estimate the model parameters. The asymptotic joint distribution of the estimators and their asymptotic unbiasedness and consistency properties are obtained. Then the small-sample performance of the estimators is evaluated by a simulation study. Finally, the applicability of the method is illustrated using two real-life data examples. It is shown that these data sets can be modeled by the DGP. Additionally, the nonparametric estimators, called modified moment (MM) estimators, are compared with the ML estimators. The results indicate that the ML estimators are more efficient than the MM estimators.
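
Only the gamma ML step is easy to sketch generically. Assuming, for illustration, a plain geometric process with known ratio a (so that \( a^{k-1} X_k \) are i.i.d. Gamma(α, β); the DGP adds a further positional deformation whose form follows the paper and is not reproduced), the likelihood can be maximized numerically:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

def neg_loglik(params, x):
    """Negative log-likelihood for i.i.d. Gamma(alpha, scale=beta) data
    (log-parameterized to keep the optimizer in the positive orthant)."""
    alpha, beta = np.exp(params)
    return -np.sum(gamma.logpdf(x, a=alpha, scale=beta))

rng = np.random.default_rng(0)
a_ratio, alpha_true, beta_true = 1.05, 2.0, 3.0
k = np.arange(1, 101)
# GP data: X_k = Y_k / a^(k-1) with Y_k ~ Gamma; a DGP would insert a
# further positional deformation here (see the article).
x = gamma.rvs(alpha_true, scale=beta_true, size=k.size,
              random_state=rng) / a_ratio ** (k - 1)

y = x * a_ratio ** (k - 1)  # back-transform; in practice a is estimated too
res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(y,),
               method="Nelder-Mead")
alpha_hat, beta_hat = np.exp(res.x)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
```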

Article
Computer Science and Mathematics
Probability and Statistics

Gurami Tsitsiashvili

Abstract: In this paper, we construct a probabilistic model of a sliding mode. This model is based on the moment at which a random walk with positive jumps crosses a certain critical level. It is assumed that the jump magnitude has a geometric distribution. If the initial state is negative and the critical level is zero, then after crossing this level, a random walk begins in the opposite direction until it crosses zero again. As a result, motion orthogonal to the slip line is defined as a regenerative process, in which the moments of regeneration are the moments of zero crossings from right to left. An estimate in the Ky Fan metric of the maximum deviation of this random walk over a certain time interval is constructed under the assumption that the time and magnitude of the jumps are reduced by a factor of m. This estimate is found to be of the order of ln m/m as m→∞ and characterizes the deviation of a random trajectory orthogonal to the slip line. In the model of motion along a slip line, its velocity is assumed to take fixed values when the trajectory of motion orthogonal to the slip line is above or below zero. Using the central limit theorem for the integral of a regenerative process, an estimate of the non-uniformity of motion of a random trajectory along the slip line is constructed. It is found that the characteristic magnitude of this non-uniformity is of the order of 1/m as m→∞. This indicates that the accumulation of random errors during motion along the slip line is significantly faster than during motion orthogonal to the slip line.
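
A toy rendering of the orthogonal motion is easy to simulate (an illustrative approximation, not the paper's construction, and no Ky Fan metric is computed): a walk with jumps Geometric(p)/m, one step per 1/m time units, that reverses direction at each zero crossing. The observed maximum deviation can then be compared with the ln m/m order reported above.

```python
import numpy as np

def max_deviation(m, horizon=1.0, p=0.5, x0=-0.5, seed=0):
    """Walk with jumps Geometric(p)/m, one step per 1/m time unit,
    reversing direction after each zero crossing; returns the maximum
    |deviation| over [0, horizon]. A toy rendering, not the article's
    regenerative-process construction."""
    rng = np.random.default_rng(seed)
    n_steps = int(horizon * m)
    x, direction, peak = x0, +1, abs(x0)
    for _ in range(n_steps):
        x += direction * rng.geometric(p) / m
        peak = max(peak, abs(x))
        if (direction > 0 and x >= 0) or (direction < 0 and x <= 0):
            direction = -direction  # crossed zero: reverse
    return peak

for m in [10, 100, 1000, 5000]:
    devs = [max_deviation(m, x0=-2.0 / m, seed=s) for s in range(100)]
    print(f"m={m:5d}  mean max deviation = {np.mean(devs):.4f}  "
          f"ln(m)/m = {np.log(m) / m:.4f}")
```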

Article
Computer Science and Mathematics
Probability and Statistics

Gonçalo Melo de Magalhães

Abstract: Machine learning's dominant paradigm—whether model-centric or data-centric—treats intelligence as the extraction of statistical patterns from behavioral records. This approach has delivered remarkable engineering feats. Yet something foundational is missing. Data is not reality: it is a finite record of trajectories through reality. A photograph of a river is not the river's law. This paper argues that the data paradigm conflates measurement with mechanism, capturing where systems have been rather than why they go there. We propose an alternative grounded in the Architecture of Freedom Intelligence (AFI), which identifies navigability—the structural availability of paths—as the primary organizing principle of all complex systems. The Law of Freedom, F = P/D, states that navigational capacity equals differentiation capacity (Perception, P) divided by structural resistance (Distortion, D). Under this framework, intelligence is not pattern memorization but distortion navigation: all systems move according to dx/dt = −P(x)·∇D(x), following gradients of resistance scaled by perceptual capacity. We demonstrate that this gradient law is structurally identical to Fick's diffusion, Berg–Brown chemotaxis, Ohm's law, and gradient descent—revealing a deep structural unity that the data paradigm treats as coincidental analogy. Nature does not train on labeled datasets: ants, neurons, immune cells, and ecological populations navigate through calibrated heuristics on Perception and Distortion fields, not through backpropagation over historical trajectories. This observation motivates a fundamental reconceptualization of what training should accomplish. We propose Freedom Intelligence Training (FIT): a learning paradigm oriented toward learning P and D fields directly, rather than fitting statistical correlations over behavioral snapshots. FIT rests on five predictions: (i) models trained on P–D fields require exponentially less data than pattern-extraction models; (ii) generalization improves because P–D fields encode causal structure; (iii) out-of-distribution performance improves because navigability laws transfer across domains; (iv) interpretability is natural since every prediction decomposes into ΔP and ΔD contributions; (v) the exploration–exploitation transition is quantifiable as the coefficient of variation of the Freedom field crossing 1.0. We provide ten falsification criteria and position FIT within the emerging landscape of world models, physics-informed learning, and causal inference. This is a theoretical proposal; a complete experimental roadmap is provided.
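
The stated gradient law is concrete enough to integrate numerically. The sketch below applies Euler steps to dx/dt = −P(x)·∇D(x) on invented toy fields P and D (illustrative choices only, not fields prescribed by AFI):

```python
import numpy as np

TARGET = np.array([2.0, -1.0])

def D(x):
    """Toy distortion field: a quadratic bowl, minimal at TARGET."""
    return 0.5 * np.sum((x - TARGET) ** 2)

def grad_D(x):
    return x - TARGET

def P(x):
    """Toy perception field: differentiation capacity decays with
    distance from the origin (an illustrative choice, not AFI's)."""
    return 1.0 / (1.0 + 0.1 * np.dot(x, x))

def navigate(x0, dt=0.05, n_steps=400):
    """Euler integration of dx/dt = -P(x) * grad D(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x -= dt * P(x) * grad_D(x)
    return x

print("terminal state:", navigate([-4.0, 3.0]))  # converges near TARGET
```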

Article
Computer Science and Mathematics
Probability and Statistics

Bojan Baškot, Andrej Ševa, Vesna Lešević, Bogdan Ubiparipović

Abstract: Structural Equation Modeling (SEM) is a key framework for analyzing complex economic relationships involving latent variables, mediation effects, and endogeneity, yet the choice between frequentist and Bayesian estimation remains theoretically and practically contested, especially in settings with non-stationary data and small samples. This study provides a formal comparison of the two approaches by formulating SEM as a probabilistic graphical model and deriving the corresponding estimation procedures, identifiability conditions, and uncertainty measures. We examine asymptotic properties of frequentist estimators and posterior consistency in Bayesian SEM, with particular attention to integrated time-series SEM applications such as shadow economy estimation. The analysis shows that while both approaches converge under large-sample conditions, important differences arise in finite samples. Bayesian methods exhibit more stable inference through coherent uncertainty quantification and greater robustness to model misspecification, especially when prior theoretical information is available. In contrast, frequentist estimators rely more heavily on asymptotic assumptions that may be violated in typical economic datasets. These findings suggest that Bayesian SEM offers practical advantages for empirical economic modeling under realistic data constraints, without rejecting the theoretical validity of frequentist methods in large-sample settings.

Article
Computer Science and Mathematics
Probability and Statistics

Zlatko Pangarić

Abstract: This paper presents a new methodological approach to the analysis of numerical sequences that are commonly considered random. This includes the decimal expansion of the number π, stock market indices (e.g., Belex15), pseudorandom numbers (PRNG), cryptographically secure pseudorandom numbers (CSPRNG), physical random number generators (RNG), and quantum random numbers (QRNG). The core method is based on hierarchical computation of higher-order differences and symbolic transformation of signs, enabling structural encoding of each sequence into a symbolic space. The primary objective is to determine whether the decimal expansion of π and related sequences exhibit the same distribution of symbolic patterns as the theoretical model of variations with repetition. The analysis is extended to sequences of 4 and 5 digits, including higher-order differences such as third and fourth order. The results show that empirical distributions of these multilayer structures in the digits of π closely correspond to theoretical distributions derived from all possible variations with repetition. This method opens new possibilities for applications in number theory, cryptography, statistics, and classification of algorithmically generated sequences.
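
The core symbolization step is easy to reproduce in miniature. The sketch below (an illustration, not the paper's pipeline) takes decimal digits of π via mpmath, forms higher-order differences, maps their signs to the symbols {+, 0, −}, and tallies fixed-width patterns; the paper compares such empirical counts against the distribution implied by all variations with repetition.

```python
from collections import Counter
import numpy as np
from mpmath import mp

mp.dps = 10_002
digits = np.array([int(c) for c in mp.nstr(mp.pi, 10_001)[2:]])  # drop "3."

order = 2  # order of the difference hierarchy
width = 4  # symbolic pattern length

d = np.diff(digits, n=order)  # higher-order differences
signs = np.sign(d)            # symbols: -1, 0, +1
patterns = ["".join("+0-"[1 - s] for s in signs[i:i + width])
            for i in range(len(signs) - width + 1)]

counts = Counter(patterns)
total = sum(counts.values())
# 3**width is only the naive pattern-space size; the article's theoretical
# baseline weights patterns by the exact digit-difference probabilities.
print(f"{len(counts)} distinct patterns observed "
      f"(3^{width} = {3**width} possible with repetition)")
for pat, c in counts.most_common(5):
    print(f"{pat}: observed frequency {c / total:.4f}")
```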

Article
Computer Science and Mathematics
Probability and Statistics

Syafi’ Bariq’ Syihabuddin Hidayatullah, Muhammad Ahsan, Wibawati Wibawati

Abstract: One of the main tools in Statistical Process Control (SPC) for monitoring quality is the control chart. Simultaneous multivariate control charts are widely used to monitor shifts in the process mean and variability at the same time. One Shewhart-type simultaneous multivariate chart is the Max-Half-M chart, which can detect both small and large shifts in the mean and variability. However, outliers can distort the estimation of process parameters used to set control limits. In addition, outliers can cause two related problems, namely the masking effect and the swamping effect. Recent studies have highlighted the importance of cellwise outliers. Previous studies have shown that cellwise contamination can trigger outlier propagation. Therefore, casewise-based robust estimators become less relevant under such conditions. CellMCD is a robust method for estimating location and covariance by integrating cellwise outlier detection into a single objective function. This study aims to develop a robust Max-Half-M chart based on cellMCD. Based on simulation studies under different correlation levels and contamination proportions, the proposed chart shows more stable performance than the conventional chart and the robust Fast-MCD–based version, as indicated by higher AUC values and lower FN rates. The ARL analysis also suggests that the cellMCD-based chart tends to detect small to moderate shifts faster. In the real-data application, the cellMCD-based chart successfully detects seven out-of-control signals, which is more than the comparison charts.
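
CellMCD itself is implemented in the R package cellWise and is not reproduced here. As a minimal sketch of the chart mechanics, the code below builds the casewise Fast-MCD comparator mentioned in the abstract (via scikit-learn's MinCovDet) and flags points with a robust Hotelling-type T² statistic against a chi-square limit, under cellwise contamination of the kind the paper targets.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
p, n = 3, 200
clean = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
# Cellwise contamination: corrupt scattered single cells, the setting in
# which casewise MCD loses relevance (the article's motivation).
X = clean.copy()
mask = rng.random(X.shape) < 0.05
X[mask] += 8.0

mcd = MinCovDet(random_state=0).fit(X)  # casewise Fast-MCD comparator
center, cov_inv = mcd.location_, np.linalg.inv(mcd.covariance_)

# Robust Hotelling-type T^2 with a chi-square control limit
# (alpha = 0.0027, the usual 3-sigma-equivalent false-alarm rate).
t2 = np.einsum("ij,jk,ik->i", X - center, cov_inv, X - center)
ucl = chi2.ppf(1 - 0.0027, df=p)
print(f"UCL = {ucl:.2f}; out-of-control points: {(t2 > ucl).sum()} of {n}")
```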
