Computer Science and Mathematics

Article
Computer Science and Mathematics
Probability and Statistics

Maksym Luz, Mikhail Moklyachuk

Abstract: We consider the problem of optimal linear estimation of the functional \( A_N \xi = \sum_{k=0}^{N} a(k)\,\xi(k) \), which depends on the unknown values ξ(k), k = 0, 1, ..., N, of a stochastic sequence with harmonizable symmetric α-stable nth increments. The derived estimates are based on observations at points m ∈ Z \ {0, 1, 2, ..., N}. The cases of observations without noise, with harmonizable symmetric α-stable noise, and with noise having harmonizable symmetric α-stable increments are studied. The classical solutions as well as the minimax robust ones are obtained.

Article
Computer Science and Mathematics
Probability and Statistics

Francisco Novoa-Muñoz

Abstract: The Poisson–Three-Parameter Lindley (PTPL) distribution constitutes a flexible Poisson mixture model for overdispersed count data, encompassing several classical count distributions as special or limiting cases. Despite its growing use in applied contexts, no formal goodness-of-fit test specifically designed for this distribution is currently available. In this paper, we propose and study a new goodness-of-fit test for the PTPL model based on a Cramér-von Mises type distance between the empirical and theoretical probability generating functions (PGFs). For polynomial weight functions, the test statistic admits an explicit closed-form representation; in practice, it is computed efficiently via numerical quadrature. The null distribution of the statistic is approximated via parametric bootstrap. We establish theoretical properties of the proposed procedure, including consistency against fixed alternatives and the validity of the bootstrap approximation. Monte Carlo simulations with sample sizes n ∈ {50,100,150,200,500} for size evaluation and n = 250 for power comparisons, and weight exponents a ∈ {0,1,2}, show that the empirical size is well controlled at both the 5% and 10% nominal levels, and that the test exhibits competitive power against Poisson, Negative Binomial, COM-Poisson, and Zero-Inflated Poisson alternatives. A real data application to five overdispersed count datasets further illustrates the practical utility of the method.

Article
Computer Science and Mathematics
Probability and Statistics

Megang Nkamga Junile Staures, Audrius Kabašinskas

Abstract: We document a network-level vulnerability in pension fund systems: lifecycle allocation regulation, designed to protect individual participants, compresses cross-fund return dynamics to the point where provider choice offers little diversification. Using daily net asset value data from second-pillar pension funds in Lithuania over 2019–2025, we find that a single common factor explains at least 73% of total return variance even in the calmest observed periods. We develop an unsupervised regime-detection framework combining a PCA-based absorption ratio, DTW hierarchical clustering, and a Gaussian hidden Markov model with a data-driven crisis threshold. The HMM specification is supported by a dual empirical calibration of the stickiness prior and cross-validated against a fully Bayesian sticky HMM. The framework identifies three latent regimes in which elevated systemic co-movement is the structural norm rather than an exceptional state and shows that funds group primarily by age cohort rather than by provider. The absorption ratio has no significant relationship with global equity benchmarks in either calm or high-risk regimes, indicating that systemic stress is network-internal rather than imported. Cluster-level tracking-error amplification of 1.09× to 1.23× during high-risk episodes confirms that even conservative funds serving retirement-age participants are not insulated.

Essay
Computer Science and Mathematics
Probability and Statistics

George Ellison

Abstract: This article critically examines Jack Duffield’s proposition that (all) intelligence analysts should become competent in “a core set of statistical analytical techniques” so as to address the “total information overload” they have experienced following the proliferation of ‘Big Data’. It summarises the generic (technique-agnostic) and particular (technique-specific) analytical choices and decisions that analysts need to be competent to make when using each of the 11 techniques proposed; together with the parametric and non-parametric assumptions on which each of these techniques relies; the sources of non-systematic and systematic error that analysts using these techniques need to address; and the diagnostic measures that providers and consumers of findings generated by these techniques should use to assess any flaws or contingencies associated therewith, and thereby temper any associated inferential certainty and importance. When compared to baseline statistical competencies of non-specialist intelligence analysts, these summaries demonstrate the substantial additional training that Duffield’s proposition would require. The article concludes that this may prove a big ask without an extended period of additional training, work-based experience and specialist supervision. In the absence thereof, underqualified intelligence analysts using such techniques would risk undermining intelligence analyses with multiple analytical and inferential mistakes.

Article
Computer Science and Mathematics
Probability and Statistics

Myroslav Strynadko

Abstract: Stochastic computing represents numerical values as probabilities encoded in Bernoulli bitstreams, enabling simple logic-based operations but also introducing practical challenges related to bitstream length, correlation, synchronization, and scalability. These challenges become especially important when heterogeneous physical signals are processed directly by mapping each low-level feature into a separate stochastic stream. This work proposes a hierarchical event-oriented stochastic processor architecture for continuous monitoring of event probabilities. Instead of directly encoding all physical signal features as independent bitstreams, the proposed architecture introduces local event-probability formation blocks. Each local block converts task-relevant features of a physical channel into a compact set of calibrated or model-defined event probabilities, which are then represented by Bernoulli bitstreams and processed by a stochastic event-fusion core. The processor output is not a single static decision value but a time-resolved global event-probability signal, \( P_{\text{event}}(t) \), enabling temporal analysis of event persistence, repetition, trend, and accumulated exposure. As a proof of principle, the architecture is demonstrated using a synthetic acoustic monitoring scenario. Acoustic features associated with warning-like, repeated, prolonged, impact-like, and background sound patterns are converted into local event probabilities and subsequently into stochastic bitstreams. The results show that the proposed hierarchical representation can reduce the number of required stochastic streams while preserving event-level interpretability. The numerical demonstration also illustrates the expected compression–accuracy trade-off: direct feature-to-bitstream mapping may provide lower reconstruction error, whereas hierarchical event mapping improves architectural compactness and supports continuous event-level monitoring. The proposed framework provides a basis for future optical, photonic, or hybrid implementations of event-oriented stochastic processors for probabilistic sensing and decision-support systems.
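
The basic stochastic-computing idea mentioned in this abstract (probabilities encoded as Bernoulli bitstreams and combined with simple logic) can be sketched as follows. This is only an illustration, not the architecture proposed in the preprint: the function names, the stream length, and the choice of AND and OR gates as the fusion operations are all assumptions made for the example.

```python
import random


def bitstream(p, length, rng):
    """Encode probability p as a Bernoulli bitstream of the given length."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(bits):
    """Estimate the encoded probability as the fraction of ones."""
    return sum(bits) / len(bits)


def and_gate(a, b):
    """Bitwise AND: for independent streams this encodes p_a * p_b."""
    return [x & y for x, y in zip(a, b)]


def or_gate(a, b):
    """Bitwise OR: for independent streams this encodes 1 - (1 - p_a)(1 - p_b)."""
    return [x | y for x, y in zip(a, b)]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed for repeatability
    n = 200_000                     # hypothetical stream length
    a = bitstream(0.6, n, rng)      # hypothetical local event probability
    b = bitstream(0.5, n, rng)      # hypothetical local event probability
    print("AND estimate:", decode(and_gate(a, b)), "exact:", 0.6 * 0.5)
    print("OR estimate: ", decode(or_gate(a, b)), "exact:", 1 - 0.4 * 0.5)
```

The estimates converge to the exact values only at a rate of roughly one over the square root of the stream length, which is one way to see why bitstream length and stream count matter in such designs.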

Article
Computer Science and Mathematics
Probability and Statistics

Wilfried Kuissi-Kamdem, Marcel Ndengo

Abstract: This paper studies an optimal consumption-investment problem in a multi-asset financial market where risky asset returns incorporate return history. Preferences are modelled using Epstein-Zin recursive utility, allowing a separation between risk aversion and intertemporal substitution. Using the well-known martingale optimality principle and forward-backward stochastic differential equations (FBSDEs), we obtain explicit closed-form solutions for the optimal strategy and value function. A sensitivity analysis illustrates the dependence of optimal policies and value function on key parameters, including risk aversion, elasticity of intertemporal substitution (EIS), memory horizon, learning intensity, and wealth-history parameters. The findings provide new insights into the interaction between behavioural features and dynamic portfolio choice in a multi-asset setting.

Article
Computer Science and Mathematics
Probability and Statistics

Katerine M. Sadie, Johan A. du Preez, Willie Brink

Abstract: Probabilistic graphical models (PGMs) provide a powerful framework for modelling complex systems, but inference over loopy graphs requires approximate methods whose accuracy depends on how factors are clustered in the graphical representation. Existing factor clustering methods rely on the number of variables in a cluster as a proxy for memory cost and informational content, a loose upper bound that leads to suboptimal merging decisions. We address this limitation by proposing an efficient algorithm for estimating the joint entropy of a group of clusters without explicitly multiplying out the constituent factors, thereby avoiding the exponential computational cost that makes exact computation prohibitive. The algorithm integrates naturally with both static and dynamic graph restructuring methods, and reduces to the Kikuchi entropy approximation when applied to the complete graph. Experiments on models with up to 24 variables demonstrate that the algorithm produces accurate entropy estimates (relative to ideal junction tree performance) across diverse model types, with errors remaining within tight bounds. Scalability is further validated on a substantially larger model defined over 2640 random variables. These results confirm that accurate entropy estimation is achievable wherever reliable probabilistic inference is possible, and that the proposed estimation algorithm yields objectively close approximations, thereby supporting improved clustering decisions in PGM structuring algorithms.

Article
Computer Science and Mathematics
Probability and Statistics

Saisai Hou, Yunzhi Zhu, Sen Zhang

Abstract: Randomized reward mechanisms are often described as repeated trials with a fixed success probability. Pity and guarantee rules depart from this symmetric baseline by making the hazard depend on the current state. This paper studies that state-dependent asymmetry for a finite soft-pity waiting-time model. The waiting time for one rare item is represented as an absorption time of a Markov chain whose transient state is the pity counter. For a specified piecewise-linear success schedule, backward recurrences are derived for the expectation and variance, and a dynamic-programming recursion gives the full probability mass function. Repeated convolution then yields the distribution for multiple independent stages. The numerical section reports quantiles, tail probabilities, VaR/CVaR-type summaries, expected excess values, sensitivity analyses, normal-approximation diagnostics, and distributional asymmetry indicators. A featured-target variant with a binary guarantee state is also included. Throughout, the reported quantities are consequences of the stated transition rule; Monte Carlo simulation is used only as a numerical check.
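
The recurrences described in this abstract can be illustrated with a small sketch. The schedule below (base rate, soft-pity start, hard-pity cap) is a hypothetical piecewise-linear example, not the one specified in the preprint, and all function names are assumptions introduced for illustration. Exact rational arithmetic is used so that the backward recurrences and the probability mass function can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical schedule parameters (assumptions, not taken from the preprint).
BASE = Fraction(1, 100)   # success probability before soft pity starts
SOFT_START = 5            # first pull at which the hazard begins to rise
HARD_PITY = 10            # pull at which success is guaranteed


def success_prob(k):
    """Success probability on pull k (k = 1, 2, ...) since the last success."""
    if k < SOFT_START:
        return BASE
    if k >= HARD_PITY:
        return Fraction(1)
    ramp = Fraction(k - SOFT_START + 1, HARD_PITY - SOFT_START + 1)
    return BASE + (1 - BASE) * ramp


def waiting_time_pmf():
    """Forward recursion: pmf[k] = P(first success occurs on pull k)."""
    pmf = [Fraction(0)] * (HARD_PITY + 1)
    survive = Fraction(1)           # probability of no success so far
    for k in range(1, HARD_PITY + 1):
        p = success_prob(k)
        pmf[k] = survive * p
        survive *= 1 - p
    return pmf


def backward_moments():
    """Backward recurrences for the mean and variance of the waiting time."""
    mean, second = Fraction(0), Fraction(0)
    for k in range(HARD_PITY, 0, -1):
        q = 1 - success_prob(k)     # probability of moving to state k + 1
        mean, second = 1 + q * mean, 1 + q * (2 * mean + second)
    return mean, second - mean ** 2


def convolve(a, b):
    """PMF of the sum of two independent waiting times."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    pmf = waiting_time_pmf()
    mean, var = backward_moments()
    print("mean:", float(mean), "variance:", float(var))
    print("P(two items within 12 pulls):", float(sum(convolve(pmf, pmf)[:13])))
```

Repeated calls to `convolve` give the distribution for any number of independent stages, from which quantiles and tail probabilities can be read off directly.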

Article
Computer Science and Mathematics
Probability and Statistics

G. Archana Alias Gurulakshmi, Aliakbar Montazer Haghighi, G. Ayyappan, N. Arulmozhi, Natarajan Aishwarya

Abstract: A single-server queueing framework with infinite queueing capacity is formulated and analyzed, where customer arrivals are governed by a Markovian arrival process and both service and repair times are characterized through phase-type distributions. The service structure is two-tier: each incoming customer undergoes a mandatory primary service under a first-come, first-served discipline; then a secondary service becomes available on an optional basis, provided only at the customer’s request after the primary service is completed. When the system empties, the server initiates a shutdown process before entering a vacation period. Upon return from vacation, the server resumes service if customers are present; otherwise, successive vacations will be taken until demand arises. Random server failures can occur during either service mode, after which the server undergoes repair before restarting. The steady-state behavior of the system is analyzed using the matrix analytical method, from which the stability conditions, stationary probability vectors, and key performance metrics are derived. A cost analysis framework is also formulated to evaluate the economic implications of system operation. To substantiate these analytical findings and illustrate the practical applicability of the proposed model, a series of numerical experiments are performed.

Article
Computer Science and Mathematics
Probability and Statistics

Dzulani Mashavhela, Thakhani Ravele, Caston Sigauke

Abstract: Currency instability in emerging markets has become increasingly consequential for trade flows, investment allocation, and macroeconomic management. This study examines the volatility dynamics of the South African rand against the US dollar (ZAR/USD) using two advanced econometric frameworks: the Family GARCH (fGARCH) model and the first-order Beta-Skew-T Exponential Generalised Autoregressive Conditional Heteroskedasticity (Beta-Skew-T-EGARCH) model. As one of the most heavily traded emerging-market currency pairs, the ZAR/USD serves as a barometer of South Africa’s economic health and vulnerability to external shocks. Standard GARCH specifications, however, impose symmetry constraints that fail to accommodate the long-memory effects, distributional skewness, and leverage dynamics consistently observed in emerging-market currency returns. This study addresses these limitations by deploying the fGARCH and Beta-Skew-T-EGARCH frameworks on daily ZAR/USD returns spanning 5 January 2000 to 1 October 2024. The sGARCH and fGARCH specifications were assessed across five innovation distributions: Student’s t, skewed Student’s t (SSTD), generalised error (GED), skewed generalised error (SGED), and generalised hyperbolic (GH). Model fitness was evaluated using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Hannan-Quinn criterion (HQ), and Shibata criterion (SIC), selecting the specification with the lowest combined penalty. The fGARCH(1,1) model fitted to return-frequency data under the SSTD achieves the lowest AIC, outperforming the sGARCH benchmark. Among the covariates examined (day, month, trend, oil, platinum), the trend variable is the sole statistically significant predictor (p = 0.007), exerting a positive influence on ZAR/USD volatility. The two-component Beta-Skew-T-EGARCH model, by decomposing volatility into long-run structural and short-run transient components, delivers a superior fit over the one-component variant, evidenced by a lower BIC (3.068435) and a higher log-likelihood (-748.464826). Seven-day-ahead forecasts confirm that the two-component model captures declining conditional volatility, whereas the one-component model sustains persistently elevated estimates.

Article
Computer Science and Mathematics
Probability and Statistics

Cynthia A.V. Tojeiro, Vera D. Tomazella, Agatha S. Rodrigues, Pedro R. Marinho

Abstract: In this paper, we propose two novel defective survival models within the Gamma–G family: the Defective Gamma–Gompertz and the Defective Gamma–Dagum distributions. Unlike classical mixture cure models, our formulation incorporates the cure fraction directly into the survival function through the defective property of the baseline distribution, avoiding the need for an explicit mixing parameter. The motivation for these new models lies in the limited set of defective distributions currently available, despite the increasing demand for flexible and parsimonious cure rate models in biomedical applications. By extending the defective property to the Gamma–G construction, our approach fills this methodological gap while providing models that are both interpretable and computationally efficient. We show that the Gamma–G construction preserves defectiveness whenever the baseline distribution is defective, thus establishing a coherent theoretical foundation. Both models allow covariate effects through regression structures on shape, scale, and, in the case of the Gamma–Dagum distribution, on the cure-fraction parameter, resulting in parsimonious and interpretable specifications. Parameters are estimated via maximum likelihood, and an extensive Monte Carlo study confirms estimator consistency and accurate coverage in finite samples. The practical relevance of the models is illustrated with two large clinical datasets on melanoma and cervical cancer from the São Paulo Cancer Registry. Results reveal that the proposed models not only provide superior goodness-of-fit but also offer clearer insights into long-term survival compared to traditional cure-rate approaches. Overall, this work introduces a unifying and flexible framework for defective survival models, extending their applicability and delivering practical improvements over existing cure models.

Article
Computer Science and Mathematics
Probability and Statistics

Rachid Jaafar, Ahmed Hfa, Ahmed Sani

Abstract: Copulas, as a new tool for statistical analysis, are studied in depth. One of the most notable aspects of this study is the geometric perspective, particularly the concept of regeneration via extreme points and the well-known result in functional analysis: the Krein-Milman theorem. The practical value of such a theoretical study is highlighted by a very interesting application in biomedical analysis. Computer implementations have demonstrated the effectiveness of the adopted approach.

Article
Computer Science and Mathematics
Probability and Statistics

Adrian Velasco

Abstract: Reliable crop statistics are foundational to food-security planning, yet the literature often treats forecasting accuracy and data-quality assessment as separate tasks. This paper develops an integrated evidence synthesis around three Philippine studies that together illuminate both problems for rice and corn. The first compared Seasonal Autoregressive Integrated Moving Average and Holt-Winters models for quarterly rice and corn production and found that Holt-Winters with additive seasonality yielded lower forecast errors. The second extended the forecasting problem to machine-learning models and reported that Random Forest produced the strongest predictive performance among the tested algorithms, while performance varied across other nonlinear approaches. The third applied the Newcomb-Benford law to official crop production statistics and identified deviations in rice and corn digit patterns that warrant further validation. Drawing on official Philippine Statistics Authority documentation and broader methodological literature on forecast evaluation, survey reliability, and crop-yield prediction, the paper argues that forecastability and statistical integrity should be studied together rather than in isolation. A series can be forecastable yet still contain reporting irregularities, while a numerically plausible series can remain difficult to forecast because of structural breaks, weather shocks, or shifting production conditions. For agricultural planning, the strongest evidence base comes from combining temporal modeling with routine statistical-quality screening, transparent revision practices, and follow-up diagnostics when anomalies appear. The paper concludes by proposing a practical framework for Philippine agricultural analytics in which data integrity checks precede and accompany forecasting, thereby improving the credibility of crop outlooks used for procurement, import planning, early warning, and resource allocation.
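
The Newcomb-Benford screening mentioned in this abstract compares observed leading-digit frequencies with the expected probabilities log10(1 + 1/d) for d = 1, ..., 9. The sketch below is only an illustration of that comparison: the function names are hypothetical, the chi-square statistic is an assumed choice of deviation measure (the abstract does not state which test was used), and the sample data are powers of two rather than crop statistics.

```python
import math
from collections import Counter
from decimal import Decimal


def benford_expected():
    """Expected first-digit probabilities under the Newcomb-Benford law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a non-zero number."""
    return Decimal(str(abs(x))).as_tuple().digits[0]


def first_digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def chi_square_statistic(values):
    """Pearson chi-square distance between observed and expected counts."""
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    observed = first_digit_frequencies(nonzero)
    expected = benford_expected()
    return sum(
        (n * observed[d] - n * expected[d]) ** 2 / (n * expected[d])
        for d in range(1, 10)
    )


if __name__ == "__main__":
    sample = [2 ** k for k in range(100)]   # illustrative data only
    print(first_digit_frequencies(sample))
    print("chi-square:", chi_square_statistic(sample))
```

A large statistic flags a series for follow-up validation; it does not by itself establish that the data are unreliable.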

Article
Computer Science and Mathematics
Probability and Statistics

Felix Reichel

Abstract: Skyjo is a simple stochastic card game with partial information, local replacement decisions, and score-reducing column removal events. This paper develops a formal mathematical model of the game, derives expected-score rules for turn-level actions, proves several dominance and threshold results, and evaluates a family of heuristic strategies through Monte Carlo simulation. The focus here lies on local optimality under explicit belief assumptions rather than a full equilibrium solution of the multiplayer game. Finally, simulation code is provided for reproducibility.

Article
Computer Science and Mathematics
Probability and Statistics

Ntebogang Dinah Moroke

Abstract: The attribution of systemic financial stress to specific market sectors requires metrics that are simultaneously faithful to the model’s internal computations, statistically consistent as the sample size grows, and connected to a physically meaningful measure of directed information flow. This paper addresses all three requirements through the lens of information geometry. We present and empirically verify the Entropy-Saliency Equivalence Theorem: the Metabolic Saliency \( S_{ms}(i, t) \) introduced in the companion paper (Paper 1 of this series) is an asymptotically unbiased estimator of the local Kullback-Leibler divergence \( \mathrm{KL}(q_t^{(i)} \,\|\, q_0^{(i)}) \) between the stressed and resting sector-level return distributions, where the convergence is governed by the Fisher information matrix of the Power Mapping Network (PMNet) output distribution. We also derive the finite-sample bias-variance decomposition of the Kraskov-Stögbauer-Grassberger (KSG) transfer entropy estimator used to construct the saliency weights, establishing a minimax-optimal convergence rate of \( O(T^{-2/(d+2)}) \) for a d-dimensional density support. A novel evaluation metric, the Spatio-Temporal Information Flux (STIF), is proposed to quantify the directed flow of stress-relevant information between JSE sectors in bits per trading day, providing a sector-level causal audit trail that satisfies the interpretability requirements of the South African Financial Sector Regulation Act (FSRA, 2017) and MiFID II. Empirical validation on the JSE canonical panel (N = 87 securities, T = 2,731 trading days, January 2015 to December 2025) with Eskom load-shedding stages as exogenous stress injectors confirms that \( S_{ms} \) tracks \( \mathrm{KL}(q_t \,\|\, q_0) \) with a Pearson correlation of \( \hat{\rho} = 0.81 \) (p < 0.001) and that the STIF metric identifies the energy sector as the primary information source during Stage 4+ events, with information flux to the financial sector peaking at 0.43 bits/day, a 3.1× increase above the resting baseline of 0.14 bits/day. These results complete the information-theoretic glass-box characterisation of the GWS-STNet architecture and bridge topological stability theory with a fully information-theoretic characterisation of financial stress attribution.

Article
Computer Science and Mathematics
Probability and Statistics

Aris Spanos

Abstract: The Two-Envelope Problem (TEP) is revisited to argue that the standard evaluation of expected returns relies on spurious probabilities arising from a misuse of formal probability theory. The source of the problem is the ex post framing of two identical envelopes, X and Y, one containing twice as much money as the other, after one envelope, say X, has been selected and its content X=x observed. The value x is then used to define Y in terms of the values y=x/2 and y=2x, each assigned probability .5, with an analogous derivation when Y is selected. This renders X and Y ill-defined random variables because the relevant probabilistic framing must instead be based on the original experimental setup, prior to any selection or observation, where the envelope contents are unknown, say $θ and $2θ. Framing the original setup using axiomatic probability, the dependence between X and Y is accounted for when x=θ, y=2θ, and when x=2θ, y=θ. The ensuing joint distribution of X and Y determines that the expected returns imply indifference between keeping the chosen envelope and switching, explaining away the ‘paradox’ as a misapplication of probability theory.
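
The ex ante framing described in this abstract can be checked by direct enumeration. The sketch below assumes that each envelope is equally likely to be the one selected, so the joint distribution of (X, Y) puts probability 1/2 on (θ, 2θ) and 1/2 on (2θ, θ); the function names and the numerical value of θ are illustrative assumptions, not taken from the preprint.

```python
from fractions import Fraction


def joint_distribution(theta):
    """Joint distribution of (X, Y): chosen envelope X, other envelope Y."""
    half = Fraction(1, 2)
    return {(theta, 2 * theta): half, (2 * theta, theta): half}


def expectation(joint, f):
    """Expected value of f(X, Y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


def keep_and_switch_returns(theta):
    """Expected return from keeping X and from switching to Y."""
    joint = joint_distribution(theta)
    keep = expectation(joint, lambda x, y: x)
    switch = expectation(joint, lambda x, y: y)
    return keep, switch


if __name__ == "__main__":
    keep, switch = keep_and_switch_returns(Fraction(10))  # hypothetical theta
    print("E[keep] =", keep, " E[switch] =", switch)
```

Under this framing both expectations equal 3θ/2, so the expected gain from switching is zero, consistent with the indifference conclusion stated in the abstract.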

Article
Computer Science and Mathematics
Probability and Statistics

Moriba Kemessia Jah

Abstract: Probability is often treated as the default representation of uncertainty in statistical inference and machine learning. This paper asks a more fundamental question: under what conditions is a probability distribution a valid representation of uncertainty, and what is the information cost of assuming one when those conditions are not met? We show that inference is governed by two joint constraints: maximizing information capacity by preserving the geometric degrees of freedom through which contrast can register, and minimizing false information by asserting nothing the evidence has not forced. These constraints, expressed through Jaynes’s principle of maximum entropy and Popper’s criterion of falsification, determine the structure of inference without remainder. Bayesian inference emerges not as a competing framework, but as the limiting geometry obtained when epistemic width has contracted sufficiently to justify probabilistic closure. In this sense, probability is not assumed; it is earned. We trace the origin of these ideas through two decades of operational experience in spacecraft navigation, space situational awareness, and orbit determination, where standard probabilistic filters performed well in nominal regimes but failed systematically when uncertainty was driven by genuine ignorance rather than statistical variability. Across problems including debris tracking, attitude estimation, and multi-target inference, the consistent failure mode was premature probabilistic commitment in regimes where observation geometry could not support distinguishability. The central result is that information exists only in the presence of contrast, and that structure destroyed without evidence justification is information permanently lost. We formalize this principle through an epistemic geometry of inference and show that probabilistic representations are valid only when distinguishability, parameterization, and likelihood structure are all earned by the data. When these conditions fail, probabilistic closure incurs a measurable and avoidable information capacity cost.

Article
Computer Science and Mathematics
Probability and Statistics

Joseph Njuki, Thomas Gilbert

Abstract: In this article, we develop a goodness-of-fit test for the Kumaraswamy distribution based on energy statistics. Owing to the availability of its quantile (inverse) function, the Kumaraswamy distribution has been shown to be the preferred alternative to the beta distribution; both have bounded support on the interval (0,1). The proposed test procedure is simple and powerful against general alternatives. Simulations under different settings show that the size of the proposed test is well controlled at any given significance (nominal) level. In power comparisons, the proposed test outperforms other existing methods in different settings. We then apply the proposed test to real datasets (underground economy index, food expenditure, and Shasta water reservoir) to demonstrate its competitiveness and usefulness.

Article
Computer Science and Mathematics
Probability and Statistics

Muddassiru Abubakar, Umar Usman

Abstract: Road traffic accidents remain a critical public safety challenge in rapidly urbanizing regions of sub-Saharan Africa, where heterogeneous road infrastructure and high population density exacerbate risk. This study applies Kernel Density Estimation (KDE) and Geographically Weighted Regression (GWR) to analyze spatial patterns of road traffic accidents across Jega Local Government Area, Kebbi State, Nigeria, using fifty georeferenced primary data points collected through Global Positioning System surveys and manual traffic counts. The KDE analysis identified an optimal bandwidth of 175 meters with a Prediction Accuracy Index (PAI) of 3.50 at the 85th percentile threshold, indicating strong spatial clustering of accidents. Spatial autocorrelation analysis revealed significant clustering (Moran's I = 0.312, p < 0.05). The GWR model demonstrated strong explanatory power, with a global R² of 0.72 and an AICc of 420.35. Local R² values exhibited substantial spatial variation (range: 0.20–0.95), highlighting the importance of localized analysis. Cross-validation results (RMSE = 3.45, MAE = 2.12, R² = 0.65) confirmed predictive robustness. The integrated geospatial framework identified distinct high-risk corridors, with Gada (8 accidents), Garkar Ando (5 accidents), and Gobirawa (5 accidents) emerging as critical hotspots requiring immediate intervention. This research provides a validated geostatistical framework for micro-scale road safety planning in Nigerian cities.

Article
Computer Science and Mathematics
Probability and Statistics

Xingwei Hu, Caihong Hu, Cheng-Kuang Wu

Abstract: This paper derives closed-form expressions for the asymptotic covariance matrices of unrotated factor loading and uniqueness estimators for several widely used non-maximum-likelihood factor extraction methods. These include least squares, principal factor, iterative principal component, alpha factor, and image factor analysis. By expressing these results explicitly in terms of the asymptotic covariance of the sample covariance or correlation matrix, the proposed formulas facilitate straightforward computation of standard errors. When combined with the delta method from rotation criteria, they further yield analytically tractable standard errors for rotated factor loadings. Monte Carlo simulations demonstrate accurate finite-sample performance, and an empirical application illustrates practical implementation of the proposed approach.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated