Computer Science and Mathematics


Article
Computer Science and Mathematics
Probability and Statistics

Jonas Asplund,

Arkady Shemyakin

Abstract: COVID-19’s effects on mortality are hard to quantify, and problems with attribution can undermine the resulting conclusions. Analyzing excess mortality addresses this concern and allows the broader effects of the pandemic to be examined. We propose separate ARIMA models of excess mortality for several countries and, for the model of joint excess mortality, suggest vine copulas with Bayesian pair-copula selection. The present study examines weekly mortality data from 2019-2022 in the USA, Canada, France, Germany, Norway, and Sweden. The proposed ARIMA models have low orders and no residual autocorrelation. Only Norway’s residuals exhibited normality; for the remaining countries, skewed Student-t distributions are a plausible fit. A vine copula model was then developed to capture the association between the ARIMA residuals of different countries, with countries farther apart geographically exhibiting weak or no association. The validity of the fitted distributions and the resulting vine copula was checked using 2023 data. Goodness-of-fit tests suggest that the fitted distributions were suitable, except for the USA, and that the vine copula was also valid. We conclude that time series models of COVID-19 excess mortality are viable. Overall, the suggested methodology appears suitable for creating joint forecasts of pandemic mortality for several countries or geographical regions.
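
As a rough illustration of the first modelling stage described above, the sketch below fits a low-order ARIMA model to a synthetic weekly series with statsmodels and extracts the residuals that would subsequently be linked across countries by the vine copula. The series, the (1, 0, 1) order, and the library choice are assumptions made only for illustration, not the paper's actual specification.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
weekly_excess = rng.normal(size=200).cumsum()     # placeholder for one country's weekly excess mortality

model = ARIMA(weekly_excess, order=(1, 0, 1))     # low-order specification, chosen only for illustration
result = model.fit()
residuals = result.resid                          # per-country residuals like these would feed the vine copula stage
print(result.params)
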
Review
Computer Science and Mathematics
Probability and Statistics

Theodor-Nicolae Carp

Abstract: The Southeastern European country of Romania is among the most seismically active on the continent and currently faces a substantial seismic risk, particularly in the Vrancea region, which accounts for approximately 90% of its earthquakes. The present study estimates the statistical probability that a Richter magnitude 8+ earthquake will occur in the area using the Poisson method, and introduces a novel statistical approach for approximating recurrence intervals. Using the heuristic formula N[M(x+0.5)] = 2 * N(Mx) + 14, in which N denotes the mean recurrence interval in years, it may be deduced that M8+ seismic events occur once every 174 years on average, given that M7.5+ earthquakes generally occur once every 80 years. Historical records indicate that the most recent M8+ seismic event occurred no later than 1802, on Saint Paraskevi’s Day, meaning that at least 222 years have passed since the last event of such a magnitude and that the mean recurrence interval has been exceeded by at least ~27.586%. If that event had a Richter magnitude of only 7.9, the current statistical risk of an M8+ event in Vrancea is likely even higher. Artificial intelligence tools and simulation models, including ChatGPT (Reason) and Grok 3 beta (DeepSearch), were used to provisionally assess the data. Following this assessment, the probability of an M8+ earthquake in the Vrancea region is projected to rise by 43.73% per century after the last M8+ earthquake in the area. Such an event would likely entail dam failures, numerous casualties, and widespread damage and economic loss throughout Europe. The present study also considers tectonic influences from plates located both nearby and far away, a key potential example being the Himalayan Mountains, which were formed by the violent collision of the Indian and Eurasian continental plates. It is important to mention that the Carpathian Mountains are part of the Himalayan-Alpine ring of mountains, alongside the Taurus, Caucasus, Alborz and Sulaiman Mountains, and the fact that two major earthquakes in Vrancea (November 1940 and March 1977) occurred within 6-12 months of two major earthquakes in Turkey (December 1939 and November 1976) during the 20th century may indicate a transfer of tectonic stress originating from the Himalayas. Given that the Himalayas have continued to grow in altitude over the past millennia, that Mount Everest gained roughly 4 metres of altitude in the 20th century (8848 m -> 8852 m) and that, following a magnitude 7.8 earthquake in Nepal in 2015, it lost about 3 metres (8852 m -> 8849 m), it may be important to consider a potential association between major tectonic changes within the Himalayas and rising probabilities of seismic events in both proximal and distant areas of the Himalayan-Alpine ring, including the Carpathian Mountains, whose steep curvature in the Southeast may explain the existence of a significant fault underneath Vrancea.
An analogy from biological systems may apply to geology: tectonic stress can be transferred over long distances, just as muscular contractions can significantly affect distant parts of the human body, particularly those with a lower degree of stability. In the same manner, areas of pronounced curvature in the mountain ring, which likely include the Vrancea region, may be particularly affected by any transmitted wave of seismic stress, regardless of the distance from its source. As a result, even subtle increases in underlying stress may substantially raise the risk of a major earthquake in Southeastern Romania, even for fault lines located several thousand kilometres away, especially if such stresses originate from a central point of tectonic interaction between two major continental plates. Furthermore, the continuous movement of the African plate toward the Eurasian plate, at an average speed of 2.15 cm per year, could also have been gradually amplifying existing transfers of tectonic stress across the African, Asian and European plates, with potential additional, indirect implications for both Asia Minor and Southeastern Europe; long-term tectonic changes around the Eastern Mediterranean would therefore be particularly important to consider. For the purposes of international health and safety guidelines, local, national and international authorities should not rule out an earthquake with a magnitude of up to M9, given the lack of historical records of an M8.5+ earthquake within the past 400 years [N(M8.5) = 2 * 174 + 14 = 362 years], let alone since records first began around 800 years ago. Any seismic event exceeding M8 in Vrancea would likely be felt across widespread areas of Europe, and the intermediate-to-deep average focal depth of Vrancea earthquakes (generally around 70-200 km below sea level) would be a major factor in such a phenomenon.
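
The arithmetic behind the quoted figures can be reproduced in a few lines. The sketch below applies the heuristic recurrence formula and a homogeneous Poisson assumption (an assumption of this illustration, consistent with the abstract's use of the Poisson method), recovering the 174-year and 362-year intervals, a roughly 43.7% per-century probability, and the ~27.586% exceedance.

import math

n_m75 = 80                                # mean recurrence of M7.5+ events in years (from the abstract)
n_m8 = 2 * n_m75 + 14                     # heuristic N[M(x+0.5)] = 2*N(Mx) + 14  ->  174 years
n_m85 = 2 * n_m8 + 14                     # one more half-magnitude step           ->  362 years

p_century = 1 - math.exp(-100 / n_m8)     # Poisson probability of at least one M8+ event in any 100-year window
exceedance = 222 / n_m8 - 1               # elapsed time since 1802 relative to the 174-year mean interval

print(n_m8, n_m85, round(p_century, 4), round(exceedance, 5))
# prints 174, 362, ~0.4371 (close to the quoted 43.73% per-century figure) and 0.27586
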
Article
Computer Science and Mathematics
Probability and Statistics

Guillermo Martínez Flórez,

Rafael Bráz Azevedo Farias,

Carlos Javier Barrera Causil

Abstract: This paper introduces a flexible family of segmented proportional hazard distributions designed to model abrupt changes in hazard rates, which are often observed in medical and engineering applications. The proposed framework generalizes the proportional hazard transformation to segmented distributions, including new forms of the Rayleigh, log-logistic, Lindley, and Laplace PH models. We develop a maximum likelihood estimation procedure incorporating right censoring, a key feature of real-world survival data. The segmented hazard models effectively capture structural breaks in the hazard function, providing a robust alternative to traditional survival models that assume constant hazard dynamics. A case study based on IQ score data illustrates the improved flexibility and interpretability of the segmented Laplace PH model in detecting latent change points. The proposed models enhance the capacity to model complex survival patterns with abrupt changes in risk, contributing to a deeper understanding of dynamic hazard processes.
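
For orientation, the proportional hazard (PH) transformation that the segmented families generalize is, in its standard form, obtained by raising a baseline survival function to a positive power (a standard textbook statement; the notation below is assumed rather than taken from the paper):

$$ S(t;\lambda) = [S_0(t)]^{\lambda}, \qquad h(t;\lambda) = -\frac{d}{dt}\log S(t;\lambda) = \lambda\, h_0(t), \qquad \lambda > 0, $$

so every member of the family has a hazard proportional to the baseline hazard $h_0$; the segmented versions in the paper apply such transformations piecewise across change points.
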
Article
Computer Science and Mathematics
Probability and Statistics

Wei Yang,

Bochen Zhang,

Jun Wang

Abstract: This paper constructs an intelligent prediction model for economic cycles that integrates a Bi-LSTM, an attention mechanism, and a Transformer architecture on top of a big-data-driven economic cycle feature extraction method; model performance is further enhanced through data preprocessing, feature engineering, and hyperparameter optimization. The results show that the method outperforms traditional approaches in trend identification, prediction accuracy, and robustness, and captures economic-cycle inflection points more effectively. Error analysis and model stability tests verify the applicability and reliability of the proposed method.
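
As a loose sketch of the kind of architecture described (a Bi-LSTM with learned attention pooling; the Transformer component, preprocessing, feature engineering, and hyperparameter optimization are omitted), the PyTorch fragment below is an assumption-laden illustration rather than the authors' model. The hidden size, the single scoring head, and the one-step-ahead output are arbitrary choices.

import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Minimal Bi-LSTM with learned attention pooling over time (illustrative sketch)."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)    # one attention score per time step
        self.head = nn.Linear(2 * hidden, 1)     # one-step-ahead forecast

    def forward(self, x):                        # x: (batch, time, n_features)
        h, _ = self.lstm(x)                      # h: (batch, time, 2 * hidden)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over the time axis
        context = (w * h).sum(dim=1)             # weighted summary of the whole sequence
        return self.head(context).squeeze(-1)    # predicted next value of the cycle indicator

model = BiLSTMAttention(n_features=4)            # e.g. four macroeconomic indicators (assumed)
forecast = model(torch.randn(8, 24, 4))          # batch of 8 series, 24 time steps each
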
Article
Computer Science and Mathematics
Probability and Statistics

Christopher Stroude Withers

Abstract: Normal moments are the building blocks of the Hermite polynomials, which in turn are the building blocks of the Edgeworth expansions for the distribution of parameter estimates. Isserlis (1918) gave the bivariate normal moments and two special cases of trivariate moments. Beyond that, convenient expressions for multivariate normal moments $\mu_n$ are still not available. We compare three methods for obtaining them, the most powerful being the differential method. We give simpler formulas for the bivariate moment than those of Isserlis, and explicit expressions for the general moments of dimensions 3 and 4.
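
For reference, Isserlis's classical result expresses the even-order moments of a centred Gaussian vector as sums over pair partitions; the fourth-order case, which the paper's dimension-3 and dimension-4 formulas extend, reads (standard statement, not the paper's new formulas):

$$ \mathrm{E}[X_1X_2X_3X_4] = \sigma_{12}\sigma_{34} + \sigma_{13}\sigma_{24} + \sigma_{14}\sigma_{23}, \qquad \sigma_{ij} = \mathrm{E}[X_iX_j], $$

while all odd-order moments of a centred Gaussian vector vanish.
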
Article
Computer Science and Mathematics
Probability and Statistics

Christopher Withers

Abstract: We give the Edgeworth-Cornish-Fisher expansions for the distribution, density and quantiles of the sample mean of a stationary process.
Short Note
Computer Science and Mathematics
Probability and Statistics

Yudong Tang

Abstract: This article studies the terminal distribution of multivariate Brownian motion in which the correlations are not constant. In particular, under the assumption that the correlation function is driven by one factor, the article develops PDEs that quantify the moments of the conditional distribution of the other factors. Using normal distributions and moment matching, we obtain a good approximation to the true Fokker-Planck solution, and the method provides good analytic tractability and fast performance owing to the low dimensionality of the PDEs to be solved. The method can be applied to model the correlation-skew effect in quantitative finance, or to other settings where a non-constant correlation is desired when modelling a multivariate distribution.

Article
Computer Science and Mathematics
Probability and Statistics

Moritz Sohns

Abstract: The development of stochastic integration theory throughout the 20th century has led to several definitions of and approaches to the stochastic integral, particularly for predictable integrands and semimartingale integrators. This survey provides an overview of the two prominent approaches to defining the stochastic integral: the classical approach attributed to Itô, Meyer and Jacod, and the more contemporary functional-analytic approach mainly developed by Bichteler and Protter. It also reviews the historical milestones and achievements in this area and analyzes them from a modern perspective. Drawing inspiration from the similarities between existing approaches, this survey introduces a new topology-based approach to the general vector-valued stochastic integral for predictable integrands and semimartingale integrators. This new approach provides a faster, simpler way to define the general integral and offers a self-contained derivation of its key properties without depending on semimartingale decomposition theorems.
Article
Computer Science and Mathematics
Probability and Statistics

S. Ejaz Ahmed,

Ersin Yilmaz,

Dursun Aydın

Abstract: This paper introduces ridge-type kernel smoothing estimators for partially linear time-series models that employ shrinkage estimation to handle autoregressive errors and severe multicollinearity in the parametric component. By combining a generalized ridge penalty with kernel smoothing, the proposed estimators address the inflated variances arising from linear dependencies among predictors, while also accounting for autocorrelation. Four well-known selection criteria, namely Generalized Cross Validation (GCV), the Improved Akaike Information Criterion (AICc), the Bayesian Information Criterion (BIC), and Risk Estimation via Classical Pilots (RECP), are used to optimally choose both the bandwidth and shrinkage parameters. We provide closed-form expressions for these estimators, establish their asymptotic properties, and present a risk-based analysis that highlights the benefits of ordinary and positive-part shrinkage extensions. Simulation studies confirm that the introduced shrinkage approaches outperform standard methods when predictors are strongly correlated, with this advantage growing as sample sizes increase. An application to airline delay time-series data further illustrates the efficacy and practical interpretability of the introduced methodology.
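
To make the ridge component concrete, the numpy sketch below computes the closed-form generalized ridge estimate for a parametric part on made-up, nearly collinear data. The kernel smoothing of the nonparametric component, the autoregressive error structure, and the selection criteria discussed above are not reproduced here, and the penalty value is arbitrary.

import numpy as np

def generalized_ridge(X, y, k):
    """Closed-form ridge estimate (X'X + k I)^{-1} X'y for the parametric component."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # two nearly collinear predictors
y = X @ np.array([1.0, 0.5, 0.0, -1.0, 2.0]) + rng.normal(size=200)

print(generalized_ridge(X, y, k=0.0))             # nearly singular system: collinear coefficients are unstable
print(generalized_ridge(X, y, k=5.0))             # the ridge penalty stabilizes the estimates
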
Article
Computer Science and Mathematics
Probability and Statistics

Martin Tunnicliffe,

Gordon Hunter

Abstract: We compare the “classical” equations of type-token systems, namely Zipf’s laws, Heaps’ law and the relationships between their indices, with data selected from the Standardized Project Gutenberg Corpus (SPGC). Selected items all exceed 100,000 word-tokens and are trimmed to 100,000 word-tokens each. With the most egregious anomalies removed, a dataset of 8,432 items is examined in terms of the relationships between the Zipf and Heaps indices computed using the maximum likelihood algorithm. Zipf’s second (size) law indices suggest that the type vs. frequency distribution is log-log convex, with the high- and low-frequency indices showing weak but significant negative correlation. Under certain circumstances the classical equations work tolerably well, though the level of agreement depends heavily on the type of literature and the language (Finnish being notably anomalous). The frequency vs. rank characteristics exhibit log-log linearity in the “middle range” (ranks 100-1000), as characterized by Kolmogorov-Smirnov significance. For most items, the Heaps index correlates strongly with the low-frequency Zipf index in a manner consistent with classical theory, while the high-frequency indices are largely uncorrelated. This is consistent with a simple simulation.
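
A minimal sketch of the two empirical quantities involved: the rank-frequency table used for Zipf's law and the vocabulary-growth curve used for Heaps' law. The toy token list and the closing comment about index estimation are illustrative assumptions, not the SPGC pipeline or the maximum likelihood estimator used in the paper.

from collections import Counter

import numpy as np

def zipf_heaps(tokens):
    """Rank-frequency table (Zipf) and vocabulary-growth curve (Heaps) from a token list."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True))
    ranks = np.arange(1, len(freqs) + 1)
    seen, growth = set(), []
    for t in tokens:
        seen.add(t)
        growth.append(len(seen))                  # V(N): number of types after N tokens
    return ranks, freqs, np.array(growth)

tokens = "the cat sat on the mat and the dog sat on the log".split()
ranks, freqs, growth = zipf_heaps(tokens)
# Log-log regressions of freqs on ranks (middle range) and of growth on token count
# give rough Zipf and Heaps indices, respectively.
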
Article
Computer Science and Mathematics
Probability and Statistics

Cécile Barbachoux,

Joseph Kouneiher

Abstract: We develop a unified analytical framework linking kinetic theory, optimal transport, and entropy dissipation through the lens of hypocoercivity. Centered on the Boltzmann and Fokker–Planck equations, we analyze the emergence of macroscopic irreversibility from time-reversible dynamics via entropy methods, functional inequalities, and commutator estimates. The hypocoercivity approach provides sharp exponential convergence rates under minimal regularity, resolving degeneracies in kinetic operators through geometric control. We extend this framework to the study of hydrodynamic limits, collisional relaxation in magnetized plasmas, and the Vlasov–Poisson system for self-gravitating matter. Additionally, we explore connections with high-dimensional data analysis, where Wasserstein gradient flows, entropic regularization, and kinetic Langevin dynamics underpin modern generative and sampling algorithms. Our results highlight entropy as a structural and variational tool across both physical and algorithmic domains.
Article
Computer Science and Mathematics
Probability and Statistics

Julio Rives

Abstract: We assume that the probability mass function Pr(Z) = (2Z)^-2 is at the root of the Newcomb-Benford Law and the origin of positional notation. Under its tail, we find that the harmonic (global) Q-NBL for bijective numeration is Pr(b, q) = (q H_b)^-1, where q is a quantum (1 ≤ q ≤ b), H_n is the nth harmonic number, and b is the bijective base. Under its tail, the logarithmic (local) R-NBL for bijective numeration is Pr(r, d) = log_(r+1)(1 + 1/d), where d ≤ r ≪ b, d being a digit of a local complex system’s bijective radix r. We generalize both laws to calculate the probability mass of the leading quantum/digit of a chain/numeral of a given length and the probability mass of a quantum/digit at a given position, verifying that the global and local NBL are length- and position-invariant in addition to scale-invariant. In the framework of bijective numeration, we also prove that the sums of Kempner’s series conform to the global Newcomb-Benford Law, and we suggest a natural resolution for the precision of a universal positional notation system.
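
The two stated laws can be checked numerically to be proper probability mass functions. The sketch below sums Pr(b, q) = (q H_b)^-1 over q = 1..b and Pr(r, d) = log_(r+1)(1 + 1/d) over d = 1..r, with the base and radix values chosen arbitrarily for illustration.

import math

b = 10                                            # bijective base, chosen only for illustration
H_b = sum(1.0 / q for q in range(1, b + 1))       # b-th harmonic number
global_pmf = [1.0 / (q * H_b) for q in range(1, b + 1)]              # Pr(b, q) = (q * H_b)^-1

r = 9                                             # local bijective radix, also illustrative
local_pmf = [math.log(1 + 1.0 / d, r + 1) for d in range(1, r + 1)]  # Pr(r, d) = log_(r+1)(1 + 1/d)

print(round(sum(global_pmf), 12), round(sum(local_pmf), 12))         # both sums equal 1
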
Article
Computer Science and Mathematics
Probability and Statistics

Priyantha Wijayatunga

Abstract: The Jeffreys–Lindley paradox is a case where the frequentist and Bayesian hypothesis testing methodologies contradict each other. This has caused confusion among data analysts when selecting a methodology for their statistical inference tasks. Although the paradox goes back to the mid-1930s, no satisfactory resolution has been given for it so far. In this paper we show that it arises mainly from a simple fact: in the frequentist approach, the difference between the hypothesized parameter value and the observed estimate is assessed in terms of the standard error of the estimate, no matter how small the actual numerical difference or the standard error is, whereas in the Bayesian methodology such an assessment, though present, has no comparable effect because of how the Bayes factor is defined in this context. In fact, the paradox is an instance of the conflict between statistical and practical significance, and a result of using a sharp null hypothesis to approximate an acceptable small range of values for the parameter. The type-I error allowed in the frequentist methodology plays an important role in the paradox. Therefore, the paradox is not a conflict between two inference methodologies but an instance of their conclusions failing to agree.
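
A standard numerical illustration of the paradox (not taken from the paper): keep the z-score fixed while the sample size grows, so the frequentist p-value stays constant while the Bayes factor for the sharp null against a fixed normal prior grows without bound. The unit data scale and unit prior scale are assumptions of this sketch.

import numpy as np
from scipy.stats import norm

z, sigma, tau = 2.0, 1.0, 1.0                 # fixed z-score; unit data and prior scales (assumed)
for n in (10, 100, 10_000, 1_000_000):
    se = sigma / np.sqrt(n)                   # standard error of the sample mean
    xbar = z * se                             # observed mean chosen to give the same z-score every time
    m0 = norm.pdf(xbar, loc=0.0, scale=se)                          # marginal density under H0: mu = 0
    m1 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(tau**2 + se**2))     # marginal under H1: mu ~ N(0, tau^2)
    print(n, round(2 * (1 - norm.cdf(z)), 3), round(m0 / m1, 1))    # p-value stays ~0.046, BF01 keeps growing
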
Article
Computer Science and Mathematics
Probability and Statistics

Madhavan Balasubramaniam

Abstract: This article develops a Bayesian Auto-Regressive forecasting model for predicting global surface temperatures and compares its performance with a frequentist approach. Using the NASA GISS Surface Temperature dataset, we first implement an AR(4) model and then incorporate trend and seasonality components. Results show that the Bayesian approach improves generalization and provides probabilistic parameter estimates, making it more robust for long-term forecasting.
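
As a hedged sketch of the comparison described, the numpy fragment below fits an AR(4) model to a synthetic series both by ordinary least squares and by a conjugate Gaussian posterior with known noise variance. The synthetic series, the prior variance, and the known-variance simplification are all assumptions, and the paper's trend and seasonality components are omitted.

import numpy as np

rng = np.random.default_rng(0)
y = np.sin(np.arange(300) / 8.0) + 0.1 * rng.normal(size=300)  # synthetic stand-in for a temperature anomaly series

p = 4
target = y[p:]
X = np.column_stack([y[p - k: len(y) - k] for k in range(1, p + 1)])   # columns = lags 1..4

beta_ols = np.linalg.lstsq(X, target, rcond=None)[0]                   # frequentist AR(4) coefficients

sigma2, tau2 = 0.01, 10.0                          # assumed known noise variance and prior variance
precision = X.T @ X / sigma2 + np.eye(p) / tau2    # posterior precision under a N(0, tau2 I) prior
post_mean = np.linalg.solve(precision, X.T @ target / sigma2)
post_cov = np.linalg.inv(precision)                # probabilistic parameter estimates, not just point values

print(np.round(beta_ols, 3), np.round(post_mean, 3))
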
Article
Computer Science and Mathematics
Probability and Statistics

Iman Attia

Abstract: In the present paper, the author discusses the Generalized Odd Median Based Unit Rayleigh (GOMBUR) distribution in relation to the Median Based Unit Rayleigh (MBUR) distribution, to evaluate the added value of the new shape parameter in the estimation process with respect to validity indices, goodness-of-fit statistics, and the estimated variances and standard errors of the estimated parameters. This evaluation is conducted on real datasets. Each dataset is analyzed by fitting different competitor distributions in addition to the MBUR and GOMBUR distributions. Parameter estimation is carried out by maximum likelihood estimation (MLE) using the Nelder-Mead optimizer.
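
The estimation workflow (maximum likelihood via the Nelder-Mead optimizer) can be sketched generically. Since the MBUR and GOMBUR densities are not given in the abstract, the fragment below fits a Beta distribution to unit-interval data as a stand-in, purely to illustrate the optimizer call and an AIC-type goodness-of-fit summary.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

rng = np.random.default_rng(1)
x = rng.beta(2.0, 5.0, size=200)                  # unit-interval data, as in the MBUR/GOMBUR applications

def negloglik(params):
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf                             # keep the simplex inside the valid parameter region
    return -np.sum(beta.logpdf(x, a, b))

fit = minimize(negloglik, x0=[1.0, 1.0], method="Nelder-Mead")
aic = 2 * len(fit.x) + 2 * fit.fun                # AIC-type goodness-of-fit summary
print(fit.x, aic)
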
Article
Computer Science and Mathematics
Probability and Statistics

Wenzheng Tao,

Sarang Joshi,

Ross Whitaker

Abstract: Functional data, including one-dimensional curves and higher-dimensional surfaces, have become increasingly prominent across scientific disciplines. They offer a continuous perspective that captures subtle dynamics and richer structures compared to discrete representations, thereby preserving essential information and facilitating more natural modeling of real-world phenomena, especially in sparse or irregularly sampled settings. A key challenge lies in identifying low-dimensional representations and estimating covariance structures that capture population statistics effectively. We propose a novel Bayesian framework with a nonparametric kernel expansion and a sparse prior, enabling direct modeling of measured data and avoiding the artificial biases from regridding. Our method, Bayesian scalable functional data analysis (BSFDA), automatically selects both subspace dimensionalities and basis functions, reducing computational overhead through an efficient variational optimization strategy. We further propose a faster approximate variant that maintains comparable accuracy but accelerates computations significantly on large-scale datasets. Extensive simulation studies demonstrate that our framework outperforms conventional techniques in covariance estimation and dimensionality selection, showing resilience to high dimensionality and irregular sampling. The proposed methodology proves effective for multidimensional functional data and showcases practical applicability in biomedical and meteorological datasets. Overall, BSFDA offers an adaptive, continuous, and scalable solution for modern functional data analysis across diverse scientific domains.
Article
Computer Science and Mathematics
Probability and Statistics

Moritz Sohns

Abstract: We introduce a new coherent risk measure called the minimal entropy risk measure. This measure is based on the minimal entropy σ-martingale measure, which itself is inspired by the minimal entropy martingale measure well-known in option pricing. While the minimal entropy martingale measure is commonly used for pricing and hedging, the minimal entropy σ-martingale measure has not previously been studied, nor has it been analyzed as a traditional risk measure. We address this gap by clearly defining this new risk measure and examining its fundamental properties. In addition, we revisit the entropic risk measure, typically expressed through an exponential formula. We provide an alternative definition using a supremum over Kullback–Leibler divergences, making its connection to entropy clearer. We verify important properties of both risk measures, such as convexity and coherence, and extend these concepts to dynamic situations. We also illustrate their behavior in scenarios involving optimal risk transfer. Our results link entropic concepts with incomplete-market pricing and demonstrate how both risk measures share a unified entropy-based foundation once market constraints are considered.
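
For orientation, the entropic risk measure revisited in the abstract is usually written in exponential form, and the supremum over Kullback-Leibler divergences mentioned there is its standard dual representation (textbook formulas, with risk-aversion parameter $\gamma > 0$; notation assumed):

$$ \rho_{\gamma}(X) = \frac{1}{\gamma}\log \mathrm{E}_{P}\!\left[e^{-\gamma X}\right] = \sup_{Q \ll P}\left\{ \mathrm{E}_{Q}[-X] - \frac{1}{\gamma}\, D_{\mathrm{KL}}(Q\,\|\,P) \right\}. $$
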
Article
Computer Science and Mathematics
Probability and Statistics

Mihai Covaci,

Brindusa Covaci

Abstract: The paper is a plea in favor of quantitative statistical analyses in modeling mountain economic development. The argument for using statistics in modeling mountain businesses lies in the need to map the present in order to establish future trends. The future is built through constructs, and abstractions made only conceptually and intuitively, without quantitative data validation, may determine the success or failure of predictions about the trends of contemporary society. Science and reality in the 21st century are changing more rapidly and chaotically than ever before, especially against the background of the insertion of new technologies into contemporary society, and the dynamics of various phenomena are difficult to predict without applied substantiation. As a model, the paper presents statistical analyses and econometric models carried out at the European and national level for mountain entrepreneurship in the EU, relating to the sectors supporting mountain agriculture, namely industry, services, and the quaternary sector. The authors carried out the analysis for the indicators Population of active enterprises in t (number), Establishments of enterprises in t (number), Dissolutions of enterprises in t (number), and Newly established enterprises in t-3 that survived until t (number). The indicators, analyzed for the period 2008-2018 and forecasted for the period 2019-2026, place the mountain business environment within the considerable growth of European business demography. The results of the paper demonstrate the importance of using statistical analyses and econometric models in describing present conditions and forecasting mountain economic phenomena. Specific to the European business environment, the results present a picture generally favorable to mountain entrepreneurship.
Article
Computer Science and Mathematics
Probability and Statistics

Jonas Šiaulys,

Aistė Elijio,

Remigijus Leipus,

Neda Nakliuda

Abstract: The paper investigates randomly stopped sums. The primary random variables are supposed to be nonnegative, independent, and identically distributed, whereas the stopping moment is supposed to be a nonnegative, integer-valued, nondegenerate-at-zero random variable, independent of the primary random variables. We find conditions under which dominated variation or extended regularity of the randomly stopped sum implies that the stopping moment belongs to the class of dominatedly varying distributions. In the case of extended regularity, we derive asymptotic inequalities for the ratio of the tails of the distributions of the randomly stopped sum and the stopping moment. The obtained results generalize analogous statements recently obtained for regularly varying distributions. Compared with previous studies, we apply new methods in the proofs of the main statements. At the end, we provide an example illustrating the theoretical results.
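
For context, the class of dominatedly varying distributions into which the stopping moment is shown to fall is usually defined via the distribution tail $\bar F = 1 - F$ (standard definition, not specific to this paper):

$$ F \in \mathcal{D} \iff \limsup_{x\to\infty} \frac{\bar F(xy)}{\bar F(x)} < \infty \quad \text{for every (equivalently, some) } y \in (0,1). $$
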
Article
Computer Science and Mathematics
Probability and Statistics

Ayuba Jack Alhassan,

S. Ejaz Ahmed,

Dursun Aydin,

Ersin Yilmaz

Abstract: This study provides a comprehensive evaluation of six penalty estimation strategies for partially linear regression models (PLRMs), focusing on their performance in the presence of multicollinearity and their ability to handle both parametric and nonparametric components. The methods under consideration are ridge regression, the Lasso, the adaptive Lasso (aLasso), the smoothly clipped absolute deviation (SCAD) penalty, the ElasticNet, and the minimax concave penalty (MCP). In addition to these established methods, we incorporate Stein-type shrinkage estimation techniques, namely standard and positive shrinkage, and assess their effectiveness in this context. To estimate the PLRMs, we consider a kernel smoothing technique grounded in penalized least squares. Our investigation involves a theoretical analysis of the estimators' asymptotic properties and a detailed simulation study designed to compare their performance under a variety of conditions, including different sample sizes, numbers of predictors, and levels of multicollinearity. The simulation results reveal that the aLasso and the shrinkage estimators, particularly the positive shrinkage estimator, consistently outperform the other methods in terms of mean squared error (MSE) relative efficiencies (RE), especially when the sample size is small and multicollinearity is high. Furthermore, we present a real data analysis using the Hitters dataset to demonstrate the applicability of these methods in a practical setting. The results of the real data analysis align with the simulation findings, highlighting the superior predictive accuracy of the aLasso and the shrinkage estimators in the presence of multicollinearity. The findings of this study offer valuable insights into the strengths and limitations of these penalty and shrinkage strategies, guiding their application in future research and practice involving semiparametric regression.
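
For intuition about how such penalties behave under collinearity, the scikit-learn sketch below fits three of the six penalties (ridge, Lasso, ElasticNet) to made-up data with two nearly identical predictors. aLasso, SCAD, MCP, and the Stein-type shrinkage estimators are not available in scikit-learn and are not shown, and the penalty levels are arbitrary.

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)      # two nearly identical predictors -> multicollinearity
coef = np.zeros(p); coef[:3] = [3.0, -2.0, 1.0]
y = X @ coef + rng.normal(size=n)

for est in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    print(type(est).__name__, np.round(est.fit(X, y).coef_, 2))
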
