Introduction
Fitting data to statistical distributions is crucial for understanding the underlying processes that generate the data. Researchers have successfully developed a variety of distributions to describe complex real-world phenomena effectively. Before 1980, the primary techniques for generating distributions included solving systems of differential equations, using transformations, and applying quantile function strategies. Since 1980, the focus has shifted to adding new parameters to existing distributions or combining known distributions. These methods have resulted in a wide range of adaptable and flexible distributions that can accommodate various types of asymmetrical data and outliers in data sets. Fitting distributions to data improves modeling in analyses related to regression, survival analysis, reliability analysis, and time series analysis.
Many real-world phenomena are expressed as proportions, ratios, or fractions on the bounded interval (0,1). Such data arise in biology, finance, mortality and recovery rates in medical science, economics, engineering, hydrology, and the measurement sciences. Modeling them requires continuous distributions with support on the unit interval.
Some of these distributions include: the Johnson SB distribution (1), Beta distribution (2), Unit Johnson distribution (3), Topp-Leone distribution (4), Unit Gamma distribution (5), (6), (7), (8), Unit Logistic distribution (9), Kumaraswamy distribution (10), Unit Burr-III distribution (11), Unit Modified Burr-III distribution (12), Unit Burr-XII distribution (13), Unit-Gompertz distribution (14), Unit-Lindley distribution (15), Unit-Weibull distribution (16), Unit-Birnbaum-Saunders distribution (17), and Unit Muth distribution (18).
The paper is structured as follows. Section 1 discusses the methodology used to derive the new distribution. Section 2 presents its fundamental properties, including the probability density function (PDF), cumulative distribution function (CDF), survival function (S), hazard function (HF), reversed hazard function (RHF), and quantile function. Section 3 analyzes real data and discusses the findings. Section 4 summarizes the findings and offers recommendations for future research.
Section 2: Some Properties of the Generalized Odd MBUR Distribution
Theorem 2: the cumulative distribution function (CDF) of the Generalized Odd MBUR is given separately for version 1 and for version 2.
Applying the transformation in equation 12 and substituting into equation 13 yields equation 14.
Applying the transformation in equation 12 and substituting into equation 15 yields equation 16.
Lemma 1: the survival function (S) for version 1 is shown in equation 17
Lemma 2: the survival function (S) for version 2 is shown in equation 18
Lemma 3: the hazard function or hazard rate (HF or hr) for version 1 is shown in equation 19
Lemma 4: the hazard function or hazard rate (HF or hr) for version 2 is shown in equation 20
Lemma 5: the reversed hazard function (RHF) or reversed hazard rate (rhr) for version 1 is shown in equation 21
Lemma 6: the reversed hazard function (RHF) or reversed hazard rate (rhr) for version 2 is shown in equation 22
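The survival, hazard, and reversed hazard functions in Lemmas 1 to 6 all follow from the PDF f and CDF F through the standard identities S(y) = 1 - F(y), h(y) = f(y)/S(y), and r(y) = f(y)/F(y). The sketch below illustrates these identities only. Because the closed forms of the Generalized Odd MBUR are not reproduced here, it uses a Kumaraswamy(a, b) distribution as a stand-in; the function names and the parameter values a=2, b=3 are assumptions made for illustration.

```python
import math

# Stand-in unit distribution (NOT the Generalized Odd MBUR):
# Kumaraswamy(a, b) with hypothetical parameters a=2, b=3.
def kuma_pdf(y, a=2.0, b=3.0):
    return a * b * y ** (a - 1) * (1.0 - y ** a) ** (b - 1)

def kuma_cdf(y, a=2.0, b=3.0):
    return 1.0 - (1.0 - y ** a) ** b

def survival(cdf, y):
    """S(y) = 1 - F(y)."""
    return 1.0 - cdf(y)

def hazard(pdf, cdf, y):
    """h(y) = f(y) / S(y); infinite where S(y) is zero."""
    s = survival(cdf, y)
    return math.inf if s == 0.0 else pdf(y) / s

def reversed_hazard(pdf, cdf, y):
    """r(y) = f(y) / F(y); infinite where F(y) is zero."""
    big_f = cdf(y)
    return math.inf if big_f == 0.0 else pdf(y) / big_f

if __name__ == "__main__":
    for y in (0.1, 0.5, 0.9):
        print(y, survival(kuma_cdf, y),
              hazard(kuma_pdf, kuma_cdf, y),
              reversed_hazard(kuma_pdf, kuma_cdf, y))
```

Any PDF and CDF pair on (0,1) can be passed in place of the stand-in, including the two versions of the proposed distribution once their formulas are coded.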
The quantile function of the distribution has no explicit closed form.
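When a quantile function has no explicit closed form, quantiles can still be obtained numerically by solving F(y) = p for y on (0,1). The following is a minimal sketch using bisection, which only requires the CDF to be continuous and increasing. The Kumaraswamy CDF is used as a stand-in because its quantile is known in closed form and can serve as a check; it is not the CDF of the proposed distribution, and all names are illustrative.

```python
def quantile_bisect(cdf, p, lo=0.0, hi=1.0, tol=1e-12, max_iter=200):
    """Solve cdf(y) = p on [lo, hi] by bisection (cdf must be increasing)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid          # the root lies to the right of mid
        else:
            hi = mid          # the root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Stand-in CDF with a known quantile: Kumaraswamy(a=2, b=3), hypothetical values.
def kuma_cdf(y, a=2.0, b=3.0):
    return 1.0 - (1.0 - y ** a) ** b

def kuma_quantile_exact(p, a=2.0, b=3.0):
    return (1.0 - (1.0 - p) ** (1.0 / b)) ** (1.0 / a)

if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, quantile_bisect(kuma_cdf, p), kuma_quantile_exact(p))
```

The same routine supports inverse-transform sampling: drawing p uniformly on (0,1) and returning the numerical quantile yields a variate from the fitted distribution.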
Figures 1 to 9 illustrate the PDF of the first version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 84} and different values of alpha {0.452, 1, 1.5, 1.8, 2.2, 4.5, 10.5}.
Figure 1. PDF of the first version for different levels of alpha and n=2. For alpha less than 1, the PDF is left skewed. For alpha equal to 1, the PDF is symmetric around y=0.5. For alpha larger than 1, the PDF is right skewed or decreasing.
Figure 2. PDF of the first version for different levels of alpha and n=3.
Figure 3. PDF of the first version for different levels of alpha and n=5.
Figure 4. PDF of the first version for different levels of alpha and n=10.
Figure 5. PDF of the first version for different levels of alpha and n=20.
Figure 6. PDF of the first version for different levels of alpha and n=30.
Figure 7. PDF of the first version for different levels of alpha and n=40.
Figure 8. PDF of the first version for different levels of alpha and n=50.
Figure 9. PDF of the first version for different levels of alpha and n=84.
The PDF figures for the first version illustrate that for values of n larger than 30 but not exceeding 84, combined with values of alpha not exceeding 4, the PDFs are more or less symmetric around different values of the variable spanning the whole unit range. This is an advantage of the newly added parameter that generalizes the distribution.
Figures 10 to 19 illustrate the PDF of the second version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 100, 150, 170} and different values of alpha {0.452, 1, 1.5, 1.8, 2.2, 4.5, 10.5}.
Figure 10. PDF of the second version for different levels of alpha and n=2.
Figure 11. PDF of the second version for different levels of alpha and n=3.
Figure 12. PDF of the second version for different levels of alpha and n=5.
Figure 13. PDF of the second version for different levels of alpha and n=10.
Figure 14. PDF of the second version for different levels of alpha and n=20.
Figure 15. PDF of the second version for different levels of alpha and n=30.
Figure 15. PDF of the second version for different levels of alpha and n=40.
Figure 16. PDF of the second version for different levels of alpha and n=50.
Figure 17. PDF of the second version for different levels of alpha and n=100.
Figure 18. PDF of the second version for different levels of alpha and n=150.
Figure 19. PDF of the second version for different levels of alpha and n=170.
The PDF figures for the second version illustrate that for values of n larger than 50 but not exceeding 170, combined with values of alpha not exceeding 4, the PDFs are more or less symmetric around different values of the variable spanning the whole unit range. This is an advantage of the newly added parameter that generalizes the distribution.
Figures 20 and 21 illustrate the CDF of the first version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 100, 150, 170} and different values of alpha {0.272, 0.614, 1, 1.3, 1.8, 2.2, 3.5, 10.5}.
Figure 20. CDF of the first version for different levels of alpha and n.
Figure 21. CDF of the first version for different levels of alpha and n.
Figures 22 and 23 illustrate the CDF of the second version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 100, 150, 170} and different values of alpha {0.272, 0.614, 1, 1.3, 1.8, 2.2, 3.5, 10.5}.
Figure 22. CDF of the second version for different levels of alpha and n.
Figure 23. CDF of the second version for different levels of alpha and n.
Figures 24 and 25 illustrate the survival function of the first version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 100, 150, 170} and different values of alpha {0.272, 0.614, 1, 1.3, 1.8, 2.2, 3.5, 10.5}.
Figure 24. Survival function of the first version for different levels of alpha and n.
Figure 25. Survival function of the first version for different levels of alpha and n.
Figures 26 and 27 illustrate the survival function of the second version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 100, 150, 170} and different values of alpha {0.272, 0.614, 1, 1.3, 1.8, 2.2, 3.5, 10.5}.
Figure 26. Survival function of the second version for different levels of alpha and n.
Figure 27. Survival function of the second version for different levels of alpha and n.
Figures 28 to 36 illustrate the hazard rate function (hr) of the first version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 84} and different values of alpha {0.272, 0.614, 1, 1.3, 1.8, 2.2, 3.5, 10.5}.
Figure 28. Hazard rate function of the first version for different levels of alpha and n=2.
Figure 29. Hazard rate function of the first version for different levels of alpha and n=3.
Figure 30. Hazard rate function of the first version for different levels of alpha and n=5.
Figure 31. Hazard rate function of the first version for different levels of alpha and n=10.
Figure 32. Hazard rate function of the first version for different levels of alpha and n=20.
Figure 33. Hazard rate function of the first version for different levels of alpha and n=30.
Figure 34. Hazard rate function of the first version for different levels of alpha and n=40.
Figure 35. Hazard rate function of the first version for different levels of alpha and n=50.
Figure 36. Hazard rate function of the first version for different levels of alpha and n=84.
Figures 37 to 47 illustrate the hazard rate function (hr) of the second version for different values of n {2, 3, 5, 10, 20, 30, 40, 50, 100, 150, 170} and different values of alpha {0.272, 0.614, 1, 1.3, 1.8, 2.2, 3.5, 10.5}.
Figure 37. Hazard rate function of the second version for different levels of alpha and n=2.
Figure 38. Hazard rate function of the second version for different levels of alpha and n=3.
Figure 39. Hazard rate function of the second version for different levels of alpha and n=5.
Figure 40. Hazard rate function of the second version for different levels of alpha and n=10.
Figure 41. Hazard rate function of the second version for different levels of alpha and n=20.
Figure 42. Hazard rate function of the second version for different levels of alpha and n=30.
Figure 43. Hazard rate function of the second version for different levels of alpha and n=40.
Figure 44. Hazard rate function of the second version for different levels of alpha and n=50.
Figure 45. Hazard rate function of the second version for different levels of alpha and n=100.
Figure 46. Hazard rate function of the second version for different levels of alpha and n=150.
Figure 47. Hazard rate function of the second version for different levels of alpha and n=170.
The hazard function figures depict the different shapes the hazard function can attain. These shapes range from increasing to bathtub and J-shaped. A new finding is that when alpha is greater than or equal to one and n increases, the hazard rate exhibits an oscillating pattern before it approaches infinity at the upper end of the unit interval. For example, at a fixed alpha of 1.8, this oscillating pattern appears at lower values of the random variable as n increases: at y=0.8 when n=50, at y=0.6 when n=100, at y=0.5 when n=150, and at y values between 0.4 and 0.5 when n=170. This pattern is present at almost all of the examined values of n and alpha. The oscillating pattern in the graphs ends abruptly because the hazard rate values are infinite at these break points.
Another new finding is that for alpha values above two combined with n values between 20 and 30, the hazard rate function is concave: it starts at zero, increases to a peak, then falls until it reaches an oscillating pattern, after which it approaches infinity. This is evident at alpha=2.2 for n between 20 and 30, and it persists up to n=150.
Theorem 3: the rth raw moment of the first version of the distribution is given by equation (23).
Proof: the expectation defining the rth moment in equation (23) is obtained with the help of the transformation in equation (12).
Theorem 4: the rth raw moment of the second version of the distribution is given by equation (24).
Proof: the expectation defining the rth moment in equation (24) is obtained with the help of the transformation in equation (12).
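A closed-form moment expression can be checked numerically, because the rth raw moment is the integral of y^r f(y) over (0,1). The sketch below approximates that integral with composite Simpson's rule. Since the PDF formulas are not reproduced in this section, a Beta(2,3) density is used as a stand-in with known moments E[Y]=0.4 and E[Y^2]=0.2; the function names and the number of intervals are assumptions, and a density that is unbounded at 0 or 1 would need an open quadrature rule instead.

```python
def raw_moment(pdf, r, n_intervals=2000):
    """Approximate E[Y^r] = integral over (0,1) of y^r * pdf(y) by Simpson's rule."""
    if n_intervals <= 0 or n_intervals % 2 != 0:
        raise ValueError("n_intervals must be a positive even integer")
    h = 1.0 / n_intervals

    def integrand(y):
        return (y ** r) * pdf(y)

    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n_intervals):
        weight = 4.0 if i % 2 == 1 else 2.0   # Simpson weights 4, 2, 4, ...
        total += weight * integrand(i * h)
    return total * h / 3.0

# Stand-in density (NOT the Generalized Odd MBUR): Beta(2, 3) on (0, 1).
def beta23_pdf(y):
    return 12.0 * y * (1.0 - y) ** 2

if __name__ == "__main__":
    mean = raw_moment(beta23_pdf, 1)
    second = raw_moment(beta23_pdf, 2)
    print("mean:", mean, "variance:", second - mean ** 2)
```

Central moments, and from them the skewness and kurtosis of the distribution, follow from the raw moments in the usual way.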
Section 3: Real Data Analysis
Table 1 shows the flood data: 20 observations of the maximum flood level of the Susquehanna River at Harrisburg, Pennsylvania (19).
Table 2 shows the descriptive statistics of the data, which exhibit right skewness and mild positive excess kurtosis (leptokurtic).
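Skewness and excess kurtosis can be computed from the sample central moments. The sketch below uses the simple moment-based (population) estimators g1 = m3/m2^1.5 and g2 = m4/m2^2 - 3; statistical packages often apply bias corrections, so values may differ slightly from those in Table 2. The sample in the usage lines is a made-up placeholder, not the flood data.

```python
def skewness_and_excess_kurtosis(data):
    """Return (mean, skewness, excess kurtosis) using moment estimators."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    if m2 == 0.0:
        raise ValueError("data have zero variance")
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return mean, skew, excess_kurt

if __name__ == "__main__":
    placeholder = [0.2, 0.3, 0.3, 0.4, 0.8]   # hypothetical values on (0, 1)
    print(skewness_and_excess_kurtosis(placeholder))
```

A positive skewness indicates a longer right tail, and a positive excess kurtosis indicates tails heavier than those of the normal distribution.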
Table 3 shows the statistical analysis for the first version of the Generalized Odd MBUR distribution and its competitors. It outperforms all of them, as it has the highest log-likelihood and the most negative values of AIC, CAIC, and BIC. The K-S test fails to reject the null hypothesis (p-value 0.3297), which supports the conclusion that the distribution fits the data well. Both the AD and CVM statistics are the lowest among the fitted distributions, which also indicates a better fit.
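The information criteria used for this comparison are simple functions of the maximized log-likelihood, the number of estimated parameters k, and the sample size n. The sketch below is a minimal illustration under stated assumptions: AIC = 2k - 2 logL and BIC = k ln(n) - 2 logL are standard, while CAIC is taken here to be the small-sample corrected form -2 logL + 2kn/(n - k - 1). Some authors instead use CAIC for the consistent AIC, -2 logL + k(ln(n) + 1), so the definition should be confirmed before comparing numbers. The input values in the usage lines are placeholders, not results from Table 3.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, CAIC (corrected form, an assumption), and BIC."""
    if n - k - 1 <= 0:
        raise ValueError("sample size too small for the corrected AIC")
    aic = 2.0 * k - 2.0 * loglik
    caic = -2.0 * loglik + 2.0 * k * n / (n - k - 1)
    bic = k * math.log(n) - 2.0 * loglik
    return {"AIC": aic, "CAIC": caic, "BIC": bic}

if __name__ == "__main__":
    # Hypothetical log-likelihood for a two-parameter model and 20 observations.
    print(information_criteria(loglik=10.0, k=2, n=20))
```

Because each criterion subtracts twice the log-likelihood, a model with positive log-likelihood on unit-interval data can have negative criteria, and the smallest (most negative) value is preferred.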
The estimators of the alpha and beta parameters of the Beta and Kumaraswamy distributions have significant P-values.
The estimator of alpha of the MBUR distribution has a significant P-value.
The estimator of theta of the Unit Lindley distribution has a significant P-value.
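One common way to attach a P-value to a maximum likelihood estimate is a Wald z-test, which divides the estimate by its standard error and refers the ratio to the standard normal distribution. The text does not state which test produced the reported P-values, so the sketch below is an assumption offered for illustration; the function name and the input numbers are hypothetical.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Two-sided Wald z-test; returns (z statistic, p-value)."""
    if std_error <= 0.0:
        raise ValueError("standard error must be positive")
    z = (estimate - null_value) / std_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical estimate and standard error, not values from the tables.
    z, p = wald_test(estimate=1.2, std_error=0.3)
    print("z =", z, "p =", p)
```

The standard error is the square root of the estimated variance, typically taken from the inverse of the observed information matrix at the maximum likelihood estimate.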
The generalization of the MBUR improves the statistical analysis of data previously analyzed with the MBUR. It improves the fit indices, and the generalized distribution outperforms the Beta and Kumaraswamy distributions.
Figure 48 shows the histogram and the fitted Generalized Odd MBUR, which fits the data markedly better than the MBUR shown in Figure 49. The same holds when the theoretical CDF of the Generalized Odd MBUR in Figure 50 is compared with the one in Figure 51.
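The comparison between the empirical and theoretical CDF can be quantified with the K-S, CVM, and AD statistics. The sketch below computes all three from a sorted sample and a fitted CDF using the standard computing formulas. The uniform CDF and the three-point sample in the usage lines are placeholders chosen so the result is easy to verify by hand; they are not the flood data or the fitted model.

```python
import math

def edf_statistics(data, cdf):
    """Return (K-S D, Cramer-von Mises W^2, Anderson-Darling A^2)."""
    x = sorted(data)
    n = len(x)
    u = [cdf(v) for v in x]   # fitted CDF at the order statistics
    # Kolmogorov-Smirnov: largest gap between the eCDF steps and the fitted CDF.
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    ks = max(d_plus, d_minus)
    # Cramer-von Mises: squared distance to the plotting positions (2i-1)/(2n).
    cvm = 1.0 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2
                               for i in range(n))
    # Anderson-Darling: weights the tails more heavily through the logarithms.
    ad = -n - sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
                  for i in range(n)) / n
    return ks, cvm, ad

if __name__ == "__main__":
    def uniform_cdf(y):
        return y   # placeholder CDF on (0, 1)

    print(edf_statistics([0.5, 0.1, 0.9], uniform_cdf))
```

Smaller values of all three statistics indicate closer agreement between the fitted CDF and the data; the AD statistic requires fitted CDF values strictly between 0 and 1.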
Figure 48. Histogram of the flood data and the fitted Generalized Odd MBUR distribution (first version).
Figure 49. Histogram of the flood data and the fitted MBUR distribution and other competitors.
Figure 50. Empirical and theoretical CDF after fitting the Generalized Odd MBUR distribution (first version).
Figure 51. Empirical CDF (eCDF) vs. theoretical CDF of the distributions for the flood data set. The purple curve is the MBUR. The generalization of the MBUR markedly improves the fit, as shown in Figure 50.
Table 4 shows the statistical analysis for the second version of the Generalized Odd MBUR distribution and its competitors. The results match those of the first version apart from the estimated n parameter and its associated variance and standard error, so this version outperforms the other distributions for the same reasons. The estimated n is larger (17.2087, compared with 8.0302 for the first version), while the estimated alpha is the same (1.1168).
The estimators of the alpha and beta parameters of the Beta and Kumaraswamy distributions have significant P-values.
The estimator of alpha of the MBUR distribution has a significant P-value.
The estimator of theta of the Unit Lindley distribution has a significant P-value.
Figure 52 shows the fitted PDF for the second version, which differs slightly from Figure 48: the peak is higher. Figure 53 shows the CDF for this version.
Figure 52. Histogram of the flood data and the fitted Generalized Odd MBUR distribution (second version).
Figure 53. Empirical and theoretical CDF after fitting the Generalized Odd MBUR distribution (second version).