Computer Science and Mathematics

Article
Computer Science and Mathematics
Probability and Statistics

Hening Huang

Abstract: In many scientific and engineering fields (e.g., measurement science), a probability density function often models a system comprising a signal embedded in noise. Conventional measures, such as the mean, variance, entropy, and informity, characterize signal strength and uncertainty (or noise level) separately. However, the true performance of a system depends on the interaction between signal and noise. In this paper, we propose a novel measure, called "inforpower", which quantifies a system's informational power by explicitly capturing the interaction between signal and noise. We also propose a new measure of central tendency, called "information-energy center". Closed-form expressions for inforpower and the information-energy center are provided for ten well-known continuous distributions. Moreover, we propose a maximum inforpower criterion, which can complement the Akaike information criterion (AIC), the minimum entropy criterion, and the maximum informity criterion for selecting the best distribution from a set of candidate distributions. Two examples (synthetic Weibull distribution data and Tana River annual maximum streamflow) are presented to demonstrate the effectiveness of the proposed maximum inforpower criterion and to compare it with existing goodness-of-fit criteria.
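
As context for the model-selection comparison described above, here is a minimal sketch of the AIC baseline on synthetic Weibull data, one of the criteria the proposed maximum inforpower criterion is meant to complement. The inforpower measure itself is defined only in the paper and is not reproduced here; the candidate families and parameters below are illustrative.

```python
# Baseline goodness-of-fit comparison by AIC on synthetic Weibull data
# (the paper's inforpower criterion is not reproduced here).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = stats.weibull_min.rvs(c=1.8, scale=2.0, size=500, random_state=rng)

candidates = {
    "weibull": stats.weibull_min,
    "gamma": stats.gamma,
    "lognorm": stats.lognorm,
}

for name, dist in candidates.items():
    params = dist.fit(data, floc=0)          # fix location at 0 for positive data
    loglik = np.sum(dist.logpdf(data, *params))
    k = len(params) - 1                      # number of free parameters (location fixed)
    aic = 2 * k - 2 * loglik
    print(f"{name:8s} AIC = {aic:.1f}")
```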
Article
Computer Science and Mathematics
Probability and Statistics

Kotchaporn Karoon

,

Yupaporn Areepong

Abstract: Among various statistical process control (SPC) methods, control charts are widely employed as essential instruments for monitoring and improving process quality. This study focuses on a new modified exponentially weighted moving average (New Modified EWMA) control chart that enhances detection capability under integrated and fractionally integrated time series processes. Special attention is given to the effect of symmetry on the chart structure and performance. The proposed chart preserves a symmetric monitoring configuration, in which the two-sided design (LCL>0) establishes control limits that are equally spaced around the center line, enabling balanced detection of both upward and downward shifts. Conversely, the one-sided version (LCL=0) introduces a deliberate asymmetry to increase sensitivity to upward mean shifts, which is particularly useful when downward deviations are physically implausible or less critical. The efficacy of the control chart under both models is assessed through the Average Run Length (ARL). The explicit formula for the ARL is derived and compared with the ARL obtained from the numerical integral equation (NIE) method in terms of both accuracy and computational time. The accuracy of the analytical ARL expression is validated by its negligible percentage difference (%diff) from the NIE results, with processing times not exceeding 3 seconds. To confirm its superior capability, the proposed chart is compared with both the classic EWMA and the modified EWMA charts using evaluation metrics such as the ARL, the standard deviation of the run length (SDRL), the relative mean index (RMI), and the performance comparison index (PCI). Finally, an application to US stock prices illustrates the performance of the symmetric two-sided new modified EWMA chart, which detects changes more rapidly than the standard EWMA and modified EWMA charts.
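
For readers unfamiliar with run-length evaluation, the following is a minimal sketch of the classic two-sided EWMA chart with symmetric limits and its ARL estimated by simulation. It is an illustrative baseline only, not the paper's new modified EWMA or its explicit ARL formula; the smoothing constant and limit width are placeholder values.

```python
# Classic two-sided EWMA chart with symmetric limits; ARL estimated by simulation
# (illustrative baseline, not the paper's new modified EWMA chart).
import numpy as np

def ewma_run_length(shift=0.0, lam=0.1, L=2.7, mu0=0.0, sigma=1.0,
                    max_t=100_000, rng=None):
    rng = rng or np.random.default_rng()
    half_width = L * sigma * np.sqrt(lam / (2 - lam))   # asymptotic control-limit width
    z = mu0
    for t in range(1, max_t + 1):
        x = rng.normal(mu0 + shift, sigma)
        z = lam * x + (1 - lam) * z                     # EWMA statistic
        if abs(z - mu0) > half_width:
            return t
    return max_t

rng = np.random.default_rng(1)
arl0 = np.mean([ewma_run_length(0.0, rng=rng) for _ in range(2000)])   # in-control ARL
arl1 = np.mean([ewma_run_length(0.5, rng=rng) for _ in range(2000)])   # ARL after a 0.5-sigma shift
print(f"ARL0 ~ {arl0:.0f}, ARL1 ~ {arl1:.1f}")
```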
Article
Computer Science and Mathematics
Probability and Statistics

Sello Dalton Pitso

,

Taryn Michael

Abstract: Background: Banks often assume that higher credit limits increase customer default risk because greater exposure appears to imply greater vulnerability. This reasoning, however, conflates correlation with causation. Whether increasing a customer's credit limit truly raises the likelihood of default remains an open empirical question which this work aims to answer. Methods: We applied Bayesian causal inference to estimate the causal effect of credit limits on default probability. The analysis incorporated Directed Acyclic Graphs (DAGs) for causal structure, d-separation for identification, and Bayesian logistic regression using a dataset of 30,000 credit card holders in Taiwan (April--September 2005). Twenty-two confounding variables were adjusted for, covering demographics, repayment history, and billing and payment behavior. Continuous covariates were standardized, and posterior inference was performed using NUTS sampling with posterior predictive simulations to compute Average Treatment Effects (ATEs). Results: We found that a one standard deviation increase in credit limit reduces default probability by 1.44 percentage points (95% HDI: [-2.0%, -1.0%]), corresponding to a 6.3% relative decline. The effect was consistent across demographic subgroups and remained robust under sensitivity analysis addressing potential unmeasured confounding. Conclusion: The findings suggest that increasing credit limits can causally reduce default risk, likely by enhancing financial flexibility and lowering utilization ratios. These results have practical implications for credit policy design and motivate further investigation into mechanisms and applicability across broader lending environments.
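
As a rough illustration of the posterior-predictive ATE computation described above, here is a minimal sketch that assumes posterior draws of logistic-regression coefficients are already available (e.g., from NUTS). All array names and the fake data are hypothetical; the paper's DAG-based adjustment set and HDI computation are not reproduced.

```python
# Posterior-predictive ATE for a one-SD increase in the (standardized) credit limit,
# given posterior draws of logistic-regression coefficients (hypothetical arrays).
import numpy as np

def ate_from_draws(beta_draws, intercept_draws, X, treat_col):
    """beta_draws: (S, p) posterior draws; X: (n, p) standardized covariates."""
    def p_default(shift):
        Xs = X.copy()
        Xs[:, treat_col] += shift                               # counterfactual credit limit
        logits = intercept_draws[:, None] + beta_draws @ Xs.T   # (S, n)
        return 1.0 / (1.0 + np.exp(-logits))
    ate_draws = (p_default(1.0) - p_default(0.0)).mean(axis=1)  # average over units
    return ate_draws.mean(), np.percentile(ate_draws, [2.5, 97.5])  # central 95% interval

# Example with fake draws/data just to show the shapes:
rng = np.random.default_rng(0)
S, n, p = 1000, 500, 5
ate, ci = ate_from_draws(rng.normal(0, 0.1, (S, p)), rng.normal(-1.5, 0.1, S),
                         rng.normal(size=(n, p)), treat_col=0)
print(ate, ci)
```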
Article
Computer Science and Mathematics
Probability and Statistics

Christopher Stroude Withers

Abstract: Suppose that we have a statistical model with $q=q_n$ unknown parameters, $w=(w_1,\dots,w_q)'$, estimated by $\hat{w}$, based on a sample of size $n$. Suppose also that we have Edgeworth expansions for the density and distribution of $X_n=n^{1/2} (\hat{w}-w)$. We ask: how fast can $q=q_n$ increase with $n$ for the three main Edgeworth expansions to remain valid? We show that it is sufficient that $q_n=o(n^{1/6})$, if the estimate $\hat{w}$ is a standard estimate. That is, $E\,\hat{w}\rightarrow w$ as $n\rightarrow\infty$, and for $r\geq 1$, its $r$th-order cumulants have magnitude $n^{1-r}$ and can be expanded in powers of $n^{-1}$. This very large class of estimates has a huge range of potential applications. When $\hat{w}=t(\bar{X})$ for $t:R^q\rightarrow R^p$ a smooth function of a sample mean $\bar{X}$ from a distribution on $R^q$, and $p_nq_n=pq\rightarrow\infty$ as $n\rightarrow\infty$, we show that the Edgeworth expansions for $\hat{w}$ remain valid if $q_n^8 p_n^6=o(n)$. For example, this holds for fixed $p=p_n$ if $q_n=o(n^{1/8})$. We also give a method that greatly reduces the number of terms needed for the 2nd- and 3rd-order terms in the Edgeworth expansions, that is, for the 1st- and 2nd-order corrections to the Central Limit Theorems (CLTs).
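
For readers who want a concrete feel for the expansions involved, the sketch below checks the familiar one-term Edgeworth correction to the CLT for a standardized mean of Exp(1) variables against the exact Gamma distribution of the sum. It illustrates only the fixed-dimension case, not the paper's growing-dimension validity results.

```python
# One-term Edgeworth correction F_n(x) ~ Phi(x) - phi(x)*gamma1*(x^2-1)/(6*sqrt(n))
# for a standardized mean of Exp(1) variables, checked against the exact Gamma CDF.
import numpy as np
from scipy import stats

n, gamma1 = 20, 2.0                      # sample size; skewness of Exp(1)
x = np.linspace(-2.5, 2.5, 11)

clt = stats.norm.cdf(x)
edgeworth = clt - stats.norm.pdf(x) * gamma1 * (x**2 - 1) / (6 * np.sqrt(n))
exact = stats.gamma.cdf(n + np.sqrt(n) * x, a=n)   # P(sqrt(n)*(Xbar - 1) <= x)

print(np.max(np.abs(clt - exact)))        # error of the plain CLT
print(np.max(np.abs(edgeworth - exact)))  # noticeably smaller after the correction
```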
Article
Computer Science and Mathematics
Probability and Statistics

Iman Attia

Abstract: In the present paper, the probability weighted moments (PWMs) method for parameter estimation of the median-based unit Weibull (MBUW) distribution is discussed. The most widely used first-order PWMs are compared with higher-order PWMs for parameter estimation of the MBUW distribution. The asymptotic distribution of the PWM estimator is derived. The comparison is illustrated using a real data analysis.
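
As background for the method above, here is a minimal sketch of the standard unbiased sample probability weighted moments b_r = E[X F(X)^r]. The MBUW-specific estimating equations and the asymptotic results are in the paper and are not reproduced; the Beta-distributed data below is only a stand-in for unit-interval observations.

```python
# Unbiased sample probability weighted moments b_r (Landwehr-type estimator).
import numpy as np
from scipy.special import comb

def sample_pwm(x, r):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    weights = comb(j - 1, r) / comb(n - 1, r)   # equals 0 for j <= r
    return np.mean(weights * x)

rng = np.random.default_rng(0)
u = rng.beta(2.0, 5.0, size=200)                # stand-in for unit-interval data
print([sample_pwm(u, r) for r in (0, 1, 2, 3)]) # b0 is simply the sample mean
```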
Review
Computer Science and Mathematics
Probability and Statistics

Eunji Lim

Abstract: Shape-restricted regression provides a framework for estimating an unknown regression function $f_0: \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}$ from noisy observations \((\boldsymbol{X}_1, Y_1), \ldots, (\boldsymbol{X}_n, Y_n)\) when no explicit functional relationship between $\boldsymbol{X}$ and $Y$ is known, but $f_0$ is assumed to satisfy structural constraints such as monotonicity or convexity. In this work, we focus on these two shape constraints (monotonicity and convexity) and provide a review of the isotonic regression estimator, which is a least squares estimator under monotonicity, and the convex regression estimator, which is a least squares estimator under convexity. We review existing literature with an emphasis on the following key aspects: quadratic programming formulations of isotonic and convex regression, statistical properties of these estimators, efficient algorithms for computing them, their practical applications, and current challenges. Finally, we conclude with a discussion of open challenges and possible directions for future research.
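
To make the least-squares-under-monotonicity formulation concrete, here is a minimal one-dimensional sketch using scikit-learn's pool-adjacent-violators implementation of isotonic regression; the data-generating function is an arbitrary nondecreasing example, and the multivariate and convex cases reviewed in the paper are not covered.

```python
# Least-squares regression under a monotonicity constraint (isotonic regression),
# i.e. min ||y - f||^2 subject to f nondecreasing, solved by PAVA via scikit-learn.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.log1p(x) + rng.normal(0, 0.2, 200)       # true f0 is nondecreasing

iso = IsotonicRegression(increasing=True)
f_hat = iso.fit_transform(x, y)                 # fitted values at the design points
print(np.all(np.diff(f_hat) >= 0))              # the fit is indeed monotone
```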
Article
Computer Science and Mathematics
Probability and Statistics

Humphrey Takunda Muchapireyi

Abstract: In this study, we propose a mechanism for rotating savings and credit associations (ROSCAs) that matches players into pools based on an anonymity rating, preserves privacy and regulatory auditability, and uses fees and penalties to guarantee collateral. We replace the conventional local trust, reputation, and social enforcement of these games with actuarially manufactured trust. We also generalize the cycle length from the usual lunar cadence to arbitrary, variable periods. In fact, in Zimbabwe, ‘Rounds’ now vary the payout avenue itself, from regular cash contributions to formal bank transfers, mobile money, and the dispensing of goods and groceries. We explore ex-ante solvency via concentration bounds, budget non-deficit under simple collateral schedules, and individual rationality. Our study suggests that actuarially mediated trust enables scalability, anonymity, and resilience to default.
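
As a loose illustration of an ex-ante solvency check via a concentration bound, the sketch below applies Hoeffding's inequality to a pool of members who are assumed to default independently with a common probability; the assumption, the pool size, and the reserve level are all hypothetical and are not the paper's rating or collateral model.

```python
# Illustrative ex-ante solvency check via Hoeffding's inequality, assuming n pool
# members who each default independently with probability p (not the paper's model).
import math

def default_tail_bound(n, p, k):
    """Upper bound on P(number of defaults >= k) for k > n*p."""
    t = k - n * p
    return math.exp(-2 * t * t / n) if t > 0 else 1.0

n, p = 50, 0.05
reserve_covers = 8                    # defaults the collateral/fee reserve can absorb
print(default_tail_bound(n, p, reserve_covers + 1))   # bound on the insolvency probability
```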
Article
Computer Science and Mathematics
Probability and Statistics

Zdeněk Kala

Abstract: A Hermite-based framework for reliability assessment within the limit state method is developed in this paper. Closed-form design quantiles under a four-moment Hermite density are derived by inserting the Gaussian design quantile into a calibrated cubic translation. Admissibility and implementation criteria are established, including a monotonicity bound, a positivity condition for the platykurtic branch, and a balanced Jacobian for the leptokurtic branch. Material data for the yield strength and ductility of structural steel are fitted using moment-matched Hermite models and validated through goodness-of-fit tests. A truss structure is then analysed to quantify how non-Gaussian input geometry influences structural resistance and its corresponding design value. Variance-based Sobol sensitivity analysis demonstrates that departures of the radius distribution towards negative skewness and higher kurtosis increase the first-order contribution of geometric variables and thicken the lower tail of the resistance distribution. Closed-form Hermite design resistances are shown to agree with numerical integration results and reveal systematic deviations from FORM estimates, which rely solely on the mean and standard deviation. Monte Carlo simulation studies confirm these trends and highlight the slow convergence of tail quantiles and higher-order moments. The proposed approach remains fully compatible in the Gaussian limit and offers a practical complement to EN 1990 verification procedures when skewness and kurtosis have a significant influence on design quantiles.
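
To illustrate the kind of closed-form quantile the abstract refers to, here is a minimal sketch of a cubic Hermite translation evaluated at a Gaussian design quantile. The shape coefficients h3 and h4 would come from the paper's moment calibration and admissibility checks; the values below (and the yield-strength numbers) are placeholders.

```python
# Design quantile from a cubic Hermite translation
# x = mu + kappa*sigma*(z + h3*(z^2 - 1) + h4*(z^3 - 3z)), with z the Gaussian design quantile.
import numpy as np
from scipy import stats

def hermite_quantile(alpha, mu, sigma, h3, h4):
    z = stats.norm.ppf(alpha)
    kappa = 1.0 / np.sqrt(1.0 + 2.0 * h3**2 + 6.0 * h4**2)   # preserves the variance
    return mu + kappa * sigma * (z + h3 * (z**2 - 1) + h4 * (z**3 - 3 * z))

# Lower design quantile of a yield strength with slight positive skew (illustrative numbers):
print(hermite_quantile(0.001, mu=420.0, sigma=25.0, h3=0.05, h4=0.01))
```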
Article
Computer Science and Mathematics
Probability and Statistics

Anna V. Aleshina

,

Andrey L. Bulgakov

,

Yanliang Xin

,

Larisa S. Skrebkova

Abstract: A mathematical model of sustainable resource allocation in a competitive economy is developed and studied, taking into account transaction costs and technological constraints. The model describes the interaction of producers and consumers, introduces a technological set, and specifies price dynamics driven by the demand–supply imbalance. Using the theory of covering mappings and variational methods, the existence of equilibrium prices is proven. Issues of stability, numerical algorithms, and the macroeconomic interpretation of the obtained results are considered.
Article
Computer Science and Mathematics
Probability and Statistics

Anna V. Aleshina

,

Andrey L. Bulgakov

,

Yanliang Xin

,

Igor Y. Panarin

Abstract: The paper develops a mathematical approach to the analysis of the stability of economic equilibria in nonsmooth models. The λ-Hölder apparatus of subdifferentials is used, which extends the class of systems under study beyond traditional smooth optimization and linear approximations. Stability conditions are obtained for solutions to intertemporal choice problems and capital accumulation models in the presence of nonsmooth dependencies, threshold effects, and discontinuities in elasticities. For λ-Hölder production and utility functions, estimates of the sensitivity of equilibria to parameters are obtained, and indicators of the convergence rate of trajectories to the stationary state are derived for λ > 1. The methodology is tested on a multisectoral model of economic growth with technological shocks and stochastic disturbances in capital dynamics. Numerical experiments confirm the theoretical results: a power-law dependence of equilibrium sensitivity on the magnitude of parametric disturbances is revealed, as well as consistency between the analytical λ-Hölder convergence rate and the results of numerical integration. Stochastic disturbances of small variance do not violate stability. The results obtained provide a rigorous mathematical foundation for the analysis of complex economic systems with nonsmooth structures, which are increasingly used in macroeconomics, decision theory, and regulation models.
Article
Computer Science and Mathematics
Probability and Statistics

Takashi Hayakawa

,

Satoshi Asai

Abstract: Hierarchical Bayesian models based on Gaussian processes are considered useful for describing complex nonlinear statistical dependencies among variables in real-world data. However, effective Monte Carlo algorithms for inference with these models have not yet been established, except for several simple cases. In this study, we show that, compared with the slow inference achieved with existing program libraries, the performance of Riemannian-manifold Hamiltonian Monte Carlo (RMHMC) can be drastically improved by optimising the computation order according to the model structure and dynamically programming the eigendecomposition. This improvement cannot be achieved when using an existing library based on a naive automatic differentiator. We numerically demonstrate that RMHMC effectively samples from the posterior, allowing the calculation of model evidence, in a Bayesian logistic regression on simulated data and in the estimation of propensity functions for the American national medical expenditure data using several Bayesian multiple-kernel models. These results lay a foundation for implementing effective Monte Carlo algorithms for analysing real-world data with Gaussian processes, and highlight the need to develop a customisable library set that allows users to incorporate dynamically programmed objects and finely optimises the mode of automatic differentiation depending on the model structure.
Article
Computer Science and Mathematics
Probability and Statistics

Jiangcui Ge

,

Xiaoshuang Zhou

,

Cuiping Wang

Abstract: To address the loss of estimation efficiency caused by multicollinearity and the within-subject correlation of longitudinal data in varying-coefficient partially nonlinear models, a method based on QR decomposition and the quadratic inference function (QIF) is proposed to obtain orthogonal estimates of the parametric components and the varying-coefficient functions. The QR decomposition removes the ill-conditioning of the design matrix, while the QIF adaptively weights the within-group correlation structures to effectively capture the complex correlation of longitudinal data. The asymptotic properties of the estimators are established theoretically, and the efficiency of the proposed estimation method is verified by simulation experiments.
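
To illustrate the QR-orthogonalization step in isolation, here is a minimal sketch on an artificially collinear design matrix; the QIF weighting and the varying-coefficient estimation are specific to the paper and are not reproduced.

```python
# Orthogonalizing an ill-conditioned design matrix with a thin QR decomposition
# before estimation (the QIF step of the paper is not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n), rng.normal(size=n)])  # collinear columns
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

Q, R = np.linalg.qr(X)                  # columns of Q are orthonormal
gamma = Q.T @ y                         # well-conditioned regression on Q
beta = np.linalg.solve(R, gamma)        # map back to the original parameterization
print(np.linalg.cond(X), np.linalg.cond(Q), beta)
```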
Article
Computer Science and Mathematics
Probability and Statistics

Fausto Galetto

Abstract: We start from the ideas in the papers "Chakraborti et al., Properties and performance of the c-chart for attributes data, Journal of Applied Statistics, January 2008" and "Bayesian Control Chart for Number of Defects in Production Quality Control. Mathematics 2024, 12, 1903. https://doi.org/10.3390/math12121903"; we then use the Jarrett (1979) data from "A Note on the Intervals Between Coal-Mining Disasters" and the analyses by Kumar et al. and by Zhang et al. From the analysis of the data in all these papers we obtain different results. The cause is that those authors use the probability limits of the probability interval (PI) as if they were the control limits of the control charts (CCs), which is how they name them; as a result, they do not extract the complete information from CC data that are not normally distributed. The control limits of the Shewhart CCs are based on the normal distribution (via the Central Limit Theorem, CLT) and are not valid for non-normally distributed data; consequently, the decisions about the "In Control" (IC) and "Out Of Control" (OOC) states of the process are wrong. The control limits of the CCs are computed incorrectly, owing to an unsound understanding of the fundamental concept of the confidence interval. Minitab and other software (e.g., JMP, SAS) use "T Charts", claimed to be a good method for dealing with "rare events", but their computed control limits are likewise wrong. We show that the Reliability Integral Theory (RIT) is able to solve these problems.
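
To make the normal-versus-exact distinction raised above tangible, the sketch below compares the usual 3-sigma c-chart limits with exact Poisson probability limits for a small mean count; it illustrates only that standard background point, not the author's RIT approach, and the mean count used is a placeholder.

```python
# Shewhart c-chart 3-sigma limits versus exact Poisson probability limits
# (background illustration; the RIT method is not reproduced here).
import numpy as np
from scipy import stats

c_bar = 4.0                                   # average defect count per unit (placeholder)
lcl_3s = max(0.0, c_bar - 3 * np.sqrt(c_bar)) # normal-approximation limits
ucl_3s = c_bar + 3 * np.sqrt(c_bar)

alpha = 0.0027                                # nominal two-sided false-alarm rate
lcl_p = stats.poisson.ppf(alpha / 2, c_bar)
ucl_p = stats.poisson.ppf(1 - alpha / 2, c_bar)

print((lcl_3s, ucl_3s), (lcl_p, ucl_p))
# Actual tail mass outside the 3-sigma limits under the Poisson model:
print(stats.poisson.cdf(lcl_3s, c_bar) + stats.poisson.sf(ucl_3s, c_bar))
```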
Review
Computer Science and Mathematics
Probability and Statistics

Sourangshu Ghosh

Abstract: The Feynman–Kac formula stands among the rare mathematical results that elegantly bridge distinct conceptual worlds — analysis, probability, and physics — by asserting that the evolution of analytic structures such as parabolic partial differential equations can be represented through the expectations of random paths. It is a synthesis that makes rigorous Feynman’s intuitive vision of quantum propagation via path integrals and Kac’s probabilistic representation of parabolic equations. The purpose of this monograph is to present the Feynman–Kac formula and its far-reaching generalizations within a rigorous measure-theoretic and functional-analytic framework. While many excellent expositions introduce the formula as a computational tool or as a heuristic bridge between stochastic processes and partial differential equations, few texts aim to unify its analytical, probabilistic, and physical interpretations within a single, fully self-contained narrative.
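
For a concrete, elementary instance of the representation discussed above, the sketch below verifies the Feynman–Kac formula in its simplest potential-free form, u(T, x) = E[f(x + W_T)] for the heat equation u_t = ½ u_xx with u(0, x) = f(x); with f(x) = x², the exact solution is x² + T. The monograph's measure-theoretic generalizations are, of course, not captured by this toy check.

```python
# Monte Carlo check of the Feynman-Kac representation u(T, x) = E[f(x + W_T)]
# for the heat equation u_t = (1/2) u_xx with u(0, x) = x^2 (exact solution x^2 + T).
import numpy as np

rng = np.random.default_rng(0)
x0, T, n_paths = 1.5, 0.8, 200_000

W_T = rng.normal(0.0, np.sqrt(T), n_paths)     # Brownian increment over [0, T]
u_mc = np.mean((x0 + W_T) ** 2)                # expectation over random paths

print(u_mc, x0**2 + T)                         # Monte Carlo estimate vs exact value
```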
Article
Computer Science and Mathematics
Probability and Statistics

Demetris Koutsoyiannis

,

G.-Fivos Sargentis

Abstract: We investigate the fundamental tradeoff between entropy and Gini index within income distributions, employing a stochastic framework to expose deficiencies in conventional inequality metrics. Anchored in the principle of maximum entropy (ME), we position entropy as a key marker of societal robustness, while the Gini index, identical to the (second-order) K-spread coefficient, captures spread but neglects dynamics in distribution tails. We recommend supplanting Lorenz profiles with simpler graphs such as the odds and probability density functions, and a core set of numerical indicators (K-spread K₂/μ, standardized entropy Φμ, and upper and lower tail indices, ξ, ζ) for deeper diagnostics. This approach fuses ME into disparity evaluation, highlighting a path to harmonize fairness with structural endurance. Drawing from percentile records in the World Income Inequality Database over 1947–2023, we fit flexible models (Pareto–Burr–Feller, Dagum) and extract K-moments and tail indices. Results unveil a convex frontier: moderate Gini reductions have little effect on entropy, but aggressive equalization incurs steep stability costs. Country-level analyses (Argentina, Brazil, South Africa, Bulgaria) link entropy declines to political ruptures, positioning low entropy as a precursor to instability. On the other hand, analyses based on the core set of indicators for present-day geopolitical powers (China, India, USA and Russia) show that they are positioned in a high stability area.
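
As a small companion to the inequality metrics discussed above, here is a minimal sketch of the sample Gini index computed from the mean absolute difference, checked against the known closed form for lognormal incomes, G = 2Φ(σ/√2) − 1; the paper's K-moments, standardized entropy, and tail indices are not reproduced.

```python
# Sample Gini index, G = sum_{i,j} |x_i - x_j| / (2 n^2 mu), via the sorted-data form.
import numpy as np

def gini(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n**2 * x.mean())

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10.0, sigma=0.8, size=50_000)
print(gini(incomes))          # lognormal with sigma=0.8 gives G = 2*Phi(0.8/sqrt(2)) - 1 ~ 0.43
```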
Article
Computer Science and Mathematics
Probability and Statistics

Yuchen Quan

,

Yaru Xue

,

Haisu Zhu

Abstract: Reconstructing medical images from partial measurements is a critical inverse problem in Computed Tomography (CT), essential for reducing radiation exposure while maintaining diagnostic accuracy and for addressing the challenges of small size and poor resolution in CT data. Existing machine-learning solutions typically train a model to map measurements directly to medical images, relying on a training dataset of paired images and measurements synthesized with a fixed physical model of the measurement process; this approach, however, greatly hinders generalization to unknown measurement processes. To address this issue, we propose a fully unsupervised technique for solving the inverse problem, leveraging score-based generative models to eliminate the need for paired data. Specifically, we first train a score-based generative model on clean conventional-dose medical images to capture their prior distribution. Then, given measurements and a physical model of the measurement process, we introduce a sampling method to reconstruct an image consistent with both the prior and the measurements. Empirically, we observe performance comparable to or better than other sampling techniques in several CT imaging tasks, while demonstrating considerably better generalization to unknown measurement processes. The code is available.
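
The sketch below conveys the general idea of prior-plus-measurement sampling in a toy setting: unadjusted Langevin dynamics driven by a prior score and a data-consistency (likelihood) gradient for a linear measurement model. A trained score network would replace the toy Gaussian prior score used here, and the paper's specific sampling method is not reproduced.

```python
# Toy posterior-sampling sketch: Langevin dynamics with a prior score plus a
# data-consistency gradient for a linear measurement model y = A x + noise.
import numpy as np

rng = np.random.default_rng(0)
d, m = 16, 32
A = rng.normal(size=(m, d)) / np.sqrt(m)        # known measurement operator
x_true = rng.normal(size=d)
sigma_noise = 0.05
y = A @ x_true + sigma_noise * rng.normal(size=m)

prior_score = lambda x: -x                      # score of N(0, I); stand-in for a trained model
lik_score = lambda x: A.T @ (y - A @ x) / sigma_noise**2

x = rng.normal(size=d)
eps = 1e-4
for _ in range(20_000):                         # unadjusted Langevin iterations
    grad = prior_score(x) + lik_score(x)
    x = x + eps * grad + np.sqrt(2 * eps) * rng.normal(size=d)

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))   # relative reconstruction error
```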
Article
Computer Science and Mathematics
Probability and Statistics

Karson Hodge

,

Weiqiang Dong

,

Emmanuel Tamakloe

,

Jie Zhou

Abstract: We propose a distribution-aware framework for unsupervised outlier detection that transforms multivariate data into one-dimensional neighborhood statistics and identifies anomalies through fitted parametric distributions. Supported by the CDF Superiority Theorem, validated through Monte Carlo simulations, the method connects distributional modeling with ROC-AUC consistency and produces interpretable, probabilistically calibrated scores. Across 23 real-world datasets, the proposed parametric models demonstrate competitive or superior detection accuracy with strong stability and minimal tuning compared with baseline non-parametric approaches. The framework is computationally lightweight and robust across diverse domains, offering clear probabilistic interpretability and substantially lower computational cost than conventional non-parametric detectors. These findings establish a principled and scalable approach to outlier detection, showing that statistical modeling of neighborhood distances can achieve high accuracy, transparency, and efficiency within a unified parametric framework.
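
The following is a minimal sketch of the general pipeline described above: reduce each point to a one-dimensional neighborhood statistic (here, the mean k-NN distance), fit a parametric distribution to it, and use the fitted CDF as a probability-scaled outlier score. The specific statistic, distribution family, and calibration used in the paper may differ.

```python
# Neighborhood-statistic outlier scoring: mean k-NN distance -> gamma fit -> CDF score.
import numpy as np
from scipy import stats
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(500, 3)),        # inliers
               rng.uniform(-6, 6, size=(10, 3))])      # a few scattered outliers

k = 10
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, _ = nbrs.kneighbors(X)
stat = dist[:, 1:].mean(axis=1)                        # drop the self-distance, average the rest

a, loc, scale = stats.gamma.fit(stat, floc=0)          # parametric model of the statistic
score = stats.gamma.cdf(stat, a, loc=loc, scale=scale) # probability-scaled outlier score
print(np.argsort(score)[-10:])                         # indices of the most anomalous points
```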
Article
Computer Science and Mathematics
Probability and Statistics

Albert Antwi

,

Alexander Boateng

,

Daniel Maposa

Abstract: Classical binomial interval methods often exhibit poor performance when applied to extreme conditions, such as rare-event scenarios or small-sample estimations. Recent frequentist and Bayesian approaches have improved coverage for small samples and rare events but typically rely on fixed error margins that do not scale with the magnitude of the proportion, thus distorting uncertainty quantification at the extremes. As an alternative method to reduce these boundary distortions, we propose a novel hybrid approach that blends Bayesian, frequentist, and approximation-based techniques to estimate robust and adaptive intervals. The variance incorporates sampling variability, the Wilson score margin of error, a tuned credible level, and a gamma regularisation term that is inversely proportional to sample size. Extensive simulation studies and real-data applications demonstrate that the proposed method consistently achieves competitive or superior coverage proportions with narrower or more conservative interval widths compared to the Jeffreys and Wilson score intervals, especially for rare and extreme events. Geometric analysis of the tuning curves reveals convex-to-linear transitions and mirrored symmetry across the rare-extreme spectrum, which underscores its boundary sensitivity and adaptivity. Our method offers a theoretically grounded, computationally efficient, and practically robust estimation of rare-event intervals, with applications in safety-critical reliability, epidemiology, and early-phase clinical trials.
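
For reference, here are the two baseline intervals the proposed hybrid is compared against, the Wilson score and Jeffreys intervals, in a minimal sketch; the hybrid construction itself follows the paper and is not reproduced, and the rare-event example (1 success in 200 trials) is illustrative.

```python
# Baseline binomial intervals: Wilson score and Jeffreys.
import numpy as np
from scipy import stats

def wilson(x, n, conf=0.95):
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    p = x / n
    center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return center - half, center + half

def jeffreys(x, n, conf=0.95):
    a = (1 - conf) / 2
    return (stats.beta.ppf(a, x + 0.5, n - x + 0.5),
            stats.beta.ppf(1 - a, x + 0.5, n - x + 0.5))

print(wilson(1, 200), jeffreys(1, 200))   # rare-event example: 1 success in 200 trials
```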
Article
Computer Science and Mathematics
Probability and Statistics

Nipaporn Chutiman

,

Supawadee Wichitchan

,

Chawalit Boonpok

,

Monchaya Chiangpradit

,

Pannarat Guayjarernpanishk

Abstract: Adaptive cluster sampling (ACS) is a sampling method commonly employed when the population is rare and exhibits clustering. However, the initial sample selection may include units that do not satisfy the specified condition. To address this, general inverse sampling is incorporated into ACS, where the initial units are selected sequentially and termination criteria are applied to regulate the number of rare elements drawn from the population. The objective of this study is to develop an estimator of the population mean by utilizing auxiliary information within the framework of general inverse adaptive cluster sampling. The proposed estimator, constructed on the basis of a regression-type estimator, is analytically examined. A simulation study was conducted to validate the theoretical results. In this study, the region of interest was divided into 400 square units (20 rows by 20 columns). The results demonstrate that the proposed estimator, which incorporates auxiliary variables, consistently yields a lower variance than the conventional mean estimator without auxiliary information. This superiority holds across all scenarios considered, specifically when the predetermined number of rare units r ranges from two to ten. Therefore, the proposed estimator is shown to be more efficient than the estimator that does not employ auxiliary information.
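
To show the basic idea of borrowing strength from an auxiliary variable, here is a minimal sketch of the classical regression-type estimator of a population mean under simple random sampling; the paper embeds this idea in general inverse adaptive cluster sampling, which is not reproduced, and the simulated population is hypothetical.

```python
# Classical regression-type estimator of a population mean using an auxiliary
# variable with known population mean (simple random sampling shown for clarity).
import numpy as np

rng = np.random.default_rng(0)
N = 400                                    # e.g., a 20 x 20 grid of units
x_pop = rng.gamma(2.0, 2.0, N)             # auxiliary variable with known population mean
y_pop = 3.0 + 1.5 * x_pop + rng.normal(0, 1.0, N)
X_bar_pop = x_pop.mean()

idx = rng.choice(N, size=40, replace=False)
x_s, y_s = x_pop[idx], y_pop[idx]
b = np.cov(x_s, y_s, ddof=1)[0, 1] / np.var(x_s, ddof=1)
y_reg = y_s.mean() + b * (X_bar_pop - x_s.mean())   # regression estimator of the mean
print(y_reg, y_s.mean(), y_pop.mean())
```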
Article
Computer Science and Mathematics
Probability and Statistics

Lwando Dlembula

,

Chioneso S. Marange

,

Lwando Kondlo

Abstract: Multicollinearity and outliers are common challenges in multiple linear regression, often adversely affecting the properties of least squares estimators. To address these issues, several robust estimators have been developed to handle multicollinearity and outliers individually or simultaneously. More recently, [35] introduced the robust Stein estimator (RSE), which integrates shrinkage and robustness to effectively mitigate the impact of both multicollinearity and outliers. Despite its theoretical advantages, the finite-sample performance of this approach under multicollinearity and outliers remains underexplored. First, earlier research on the RSE has focused mainly on outliers in the y direction, yet neglecting outliers in the x direction can substantially affect regression results. This study addresses this gap by considering outliers in both the y and x directions, providing a more thorough assessment of the robustness of the RSE. Second, we compare and evaluate the performance of the RSE against a wide range of robust and classical estimators, extending the limited benchmarking available in the current literature. Several Monte Carlo (MC) simulations were conducted, considering both normal and heavy-tailed error distributions, with sample sizes, multicollinearity levels, and outlier proportions varied. Performance was evaluated using bootstrap estimates of the root mean squared error (RMSE) and bias. The MC simulation results indicated that the RSE outperformed the other estimators in several scenarios where both multicollinearity and outliers are present. Finally, real data studies confirm the MC simulation results.
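
As a rough sketch of the kind of Monte Carlo benchmarking described above, the snippet below compares OLS with one robust alternative (Huber M-estimation) under collinearity plus y- and x-direction contamination; the robust Stein estimator itself and the paper's full estimator set are not reproduced, and all simulation settings are placeholders.

```python
# Minimal Monte Carlo benchmark: OLS vs Huber regression under collinearity
# and y-/x-direction outliers (the robust Stein estimator is not reproduced).
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0, -1.0])
rmse = {"ols": [], "huber": []}
for _ in range(200):
    z = rng.normal(size=100)
    X = np.column_stack([z, z + 0.05 * rng.normal(size=100), rng.normal(size=100)])
    y = X @ beta + rng.normal(0, 0.5, 100)
    y[:5] += 10.0                              # y-direction outliers
    X[:5, 2] += 8.0                            # x-direction outliers (leverage points)
    for name, model in [("ols", LinearRegression()), ("huber", HuberRegressor())]:
        b_hat = model.fit(X, y).coef_
        rmse[name].append(np.sqrt(np.mean((b_hat - beta) ** 2)))
print({k: float(np.mean(v)) for k, v in rmse.items()})
```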
