Preprint
Technical Note

This version is not peer-reviewed. A peer-reviewed article of this preprint also exists.

A New Index for Measuring the Non-Uniformity of a Probability Distribution

Submitted: 24 June 2025
Posted: 26 June 2025


Abstract
This technical note proposes a new index, the “distribution non-uniformity index (DNUI)”, for quantitatively measuring the non-uniformity or unevenness of a probability distribution relative to a baseline uniform distribution. The proposed DNUI is a standardized, distance-based metric ranging between 0 and 1, with 0 indicating perfect uniformity and 1 indicating extreme non-uniformity. It is applicable to both discrete and continuous probability distributions. Several examples are presented to demonstrate its application and to compare it with two classical evenness measures: Simpson’s evenness and Buzas & Gibson’s evenness.

1. Introduction

Non-uniformity, or unevenness, is an inherent characteristic of probability distributions, as outcomes or values from a probability system are typically not distributed uniformly or evenly. Although the shape of a distribution can offer an intuitive sense of its non-uniformity, researchers often require a quantitative measure to assess this property. Such a measure is valuable for constructing distribution models and for comparing the non-uniformity across different distributions in a consistent and interpretable way.
A probability distribution is considered uniform when all outcomes have equal probability in the discrete case, or when the probability density is constant in the continuous case. The uniform distribution therefore serves as the natural baseline for assessing the non-uniformity of any given distribution, and non-uniformity refers to the degree to which a distribution deviates from this uniform benchmark. It is essential that the distribution being evaluated and the baseline uniform distribution share the same support. This requirement is especially important in the continuous case, where a fixed and clearly defined support is crucial for meaningful comparison.
The Kullback–Leibler (KL) divergence can be employed to measure the non-uniformity of a given distribution by quantifying how different the distribution is from a baseline uniform distribution. A small KL divergence indicates that the distribution is close to uniform. The KL divergence is applicable in both the discrete and continuous cases, provided that the support is fixed. However, a significant drawback of the KL divergence in this context is that it is unbounded. While a KL divergence of zero represents perfect uniformity, there is no natural upper limit against which to contextualize how "non-uniform" a distribution is. This lack of an upper bound makes interpretation challenging, especially when comparing different distributions or when the scale of the divergence matters.
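For a discrete distribution on $n$ outcomes, the KL divergence to the uniform baseline reduces to $\ln(n) - H(P)$, so its ceiling depends on $n$ and there is no common scale across distributions. The following sketch (the function name and example values are ours, not from any cited source) illustrates this:

```python
import math

def kl_to_uniform(p):
    """KL divergence D(P || U) from a discrete distribution p to the
    uniform baseline on the same n outcomes; equals ln(n) - H(P)."""
    n = len(p)
    return sum(pi * math.log(pi * n) for pi in p if pi > 0)

print(kl_to_uniform([0.5, 0.5]))   # 0.0 for a perfectly uniform PMF
print(kl_to_uniform([1.0, 0.0]))   # ln(2), the maximum attainable for n = 2
print(kl_to_uniform([0.1] * 10))   # ~0.0 again, but here the maximum would be ln(10)
```

The degenerate case reaches $\ln(n)$, which grows without bound as $n$ increases, so no single value universally signals "extreme non-uniformity."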
In recent work, Rajaram et al. (2024a, b) proposed a measure called the "degree of inequality (DOI)" to quantify how evenly the probability mass or density is distributed across available outcomes or support. Specifically, they defined the DOI for a partial distribution on a fixed interval as the ratio of the exponential of the Shannon entropy to the coverage probability of that interval (Rajaram et al., 2024a, b):
$$\mathrm{DOI} = \frac{D_P}{c_P} = \frac{1}{c_P}\exp(H_P),$$
where the subscript $P$ denotes "part", referring to the partial distribution on the fixed interval, $c_P$ is the coverage probability of the interval, $H_P$ is the entropy of the partial distribution, and $D_P = \exp(H_P)$ is the entropy-based diversity of the partial distribution. When the entire distribution is considered, $c_P = 1$, and thus the DOI equals the entropy-based diversity $\exp(H)$. It should be noted that the DOI is neither standardized nor normalized and does not explicitly measure the deviation of the given distribution relative to a uniform benchmark.
Classical evenness measures, such as Simpson's evenness and Buzas & Gibson's evenness, are essentially diversity ratios. For a discrete random variable $X$ with probability mass function (PMF) $P(x)$ and $n$ possible outcomes, Simpson's evenness is defined as (e.g., Roy & Bhattacharya, 2024)
$$E_{S2} = \frac{1 / \sum_{i=1}^{n} [P(x_i)]^2}{n},$$
where $1 / \sum_{i=1}^{n} [P(x_i)]^2$ is Simpson's diversity, representing the effective number of distinct elements in the probability system $\{X, P(x)\}$, and $n$ is the maximum diversity, which corresponds to the uniform distribution with PMF $1/n$. The concept of an effective number is the core of diversity measures in biology (Jost, 2006).
Buzas & Gibson's evenness is defined as (Buzas & Gibson, 1969)
$$E_{BG} = \frac{\exp[H(X)]}{\exp[\ln(n)]} = \frac{\exp[H(X)]}{n},$$
where $H(X)$ is the Shannon entropy of $X$, $H(X) = -\sum_{i=1}^{n} P(x_i)\ln P(x_i)$, and $\ln(n)$ is the entropy of the uniform distribution with PMF $1/n$. The exponential of the Shannon entropy, $\exp[H(X)]$, is the entropy-based diversity; it is also considered an effective number of elements in the probability system $\{X, P(x)\}$.
Unlike the DOI, which is not normalized, both $E_{S2}$ and $E_{BG}$ are normalized by $n$, the maximum diversity corresponding to the baseline uniform distribution. Therefore, these indices range between 0 and 1, with 0 indicating extreme unevenness and 1 indicating perfect evenness.
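The two classical evenness measures above can be computed directly from a probability vector; the following is a minimal sketch (function names are ours):

```python
import math

def simpson_evenness(p):
    # E_S2 = (1 / sum_i P(x_i)^2) / n: Simpson's diversity over the maximum diversity n
    return (1.0 / sum(pi ** 2 for pi in p)) / len(p)

def buzas_gibson_evenness(p):
    # E_BG = exp(H) / n: entropy-based diversity over the maximum diversity n
    H = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.exp(H) / len(p)

print(simpson_evenness([0.25] * 4))        # ~1.0 for a uniform PMF
print(buzas_gibson_evenness([0.25] * 4))   # ~1.0 for a uniform PMF
```

Both functions return values near their minimum (here 1/n = 0.25 for Simpson's evenness on a degenerate PMF) rather than 0 in states of complete unevenness, which is the limitation discussed next.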
However, as Gregorius and Gillet (2021) pointed out, "Diversity-based methods of assessing evenness cannot provide information on unevenness, since measures of diversity generally do not produce characteristic values that are associated with states of complete unevenness." This limitation arises because diversity measures are primarily designed to capture internal distribution characteristics, such as concentration and relative abundance within the distribution. For example, the quantity $\sum_{i=1}^{n} [P(x_i)]^2$ is often called the "repeat rate" or Simpson index (Rousseau, 2018), or Simpson concentration (Jost, 2006); it has historically been used as a measure of concentration (Rousseau, 2018). Moreover, since diversity metrics are not constructed within a comparative distance framework, they inherently lack the ability to quantify deviations from uniformity in an interpretable way. This limitation significantly diminishes their effectiveness when the goal is specifically to detect or describe high degrees of non-uniformity.
It is important to emphasize that the non-uniformity or unevenness of a distribution should be quantified by explicitly measuring its distance from the ideal of perfect uniformity. However, neither the DOI nor the evenness indices $E_{S2}$ and $E_{BG}$ measures an explicit distance relative to a uniform benchmark.
The aim of this study is to develop a new standardized, distance-based index that effectively quantifies the non-uniformity or unevenness of a probability distribution. The remainder of this note is organized as follows: Section 2 describes the proposed distribution non-uniformity index (DNUI); Section 3 presents several examples; Section 4 provides discussion and conclusions.

2. The Proposed Distribution Non-Uniformity Index (DNUI)

The mathematical formulation of the proposed distribution non-uniformity index (DNUI) differs for discrete and continuous random variables.

2.1. Discrete Cases

Consider a discrete random variable $X$ with probability mass function (PMF) $P(x)$ and $n$ possible outcomes. Let $X_U$ denote the uniform distribution with the same possible outcomes, so that $P_U(x) = 1/n$ for all $x$. We use this uniform distribution as the baseline for measuring the non-uniformity of the distribution of $X$.
The difference between the two PMFs $P(x)$ and $P_U(x)$ is given by
$$\Delta P(x) = P(x) - P_U(x) = P(x) - \frac{1}{n}.$$
Thus, $P(x)$ can be written as
$$P(x) = \Delta P(x) + \frac{1}{n}.$$
Squaring both sides yields
$$P(x)^2 = \Delta P(x)^2 + \frac{2}{n}\Delta P(x) + \frac{1}{n^2}.$$
Then, taking the expectation of both sides yields
$$E[P(x)^2] = E(\Delta P(x)^2) + \frac{2}{n}E(\Delta P(x)) + \frac{1}{n^2} = \omega_{P(x)}^2 + \frac{1}{n^2},$$
where $\omega_{P(x)}^2$ is called the total variance and $\omega_{P(x)}$ is called the total deviation:
$$\omega_{P(x)} = \sqrt{E(\Delta P(x)^2) + \frac{2}{n}E(\Delta P(x))},$$
where $E(\Delta P(x)^2)$ is the variance of $P(x)$ relative to $P_U(x)$, given by
$$E(\Delta P(x)^2) = E\{[P(x) - P_U(x)]^2\} = \sum_{i=1}^{n} P(x_i)\left[P(x_i) - \frac{1}{n}\right]^2,$$
and $E(\Delta P(x))$ is the bias of $P(x)$ relative to $P_U(x)$, given by
$$E(\Delta P(x)) = E[P(x) - P_U(x)] = \sum_{i=1}^{n} P(x_i)^2 - \frac{1}{n} = \beta_X - \frac{1}{n},$$
where $\beta_X$ is called the (discrete) informity of $X$ in the theory of informity proposed by Huang (2025), defined as the expectation of the PMF. The informity of the baseline uniform distribution $X_U$ is $\beta_{X_U} = P_U(x) = 1/n$.
Definition 1. The proposed DNUI (denoted by $\rho(X)$) for the distribution of $X$ is given by
$$\rho(X) = \frac{\omega_{P(x)}}{\sqrt{E[P(x)^2]}} = \sqrt{\frac{E[P(x)^2] - \frac{1}{n^2}}{E[P(x)^2]}} = \sqrt{\frac{E(\Delta P(x)^2) + \frac{2}{n}E(\Delta P(x))}{E(\Delta P(x)^2) + \frac{2}{n}E(\Delta P(x)) + \frac{1}{n^2}}},$$
where $\sqrt{E[P(x)^2]}$ is the root mean square (RMS) of $P(x)$ and $E[P(x)^2]$ is the second moment of the probability $P(x)$, given by
$$E[P(x)^2] = \sum_{i=1}^{n} P(x_i)\,P(x_i)^2 = \sum_{i=1}^{n} P(x_i)^3.$$
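Definition 1 can be evaluated directly from a probability vector. Below is a minimal sketch (the function name is ours, not from the note); the `max(..., 0.0)` guard only absorbs tiny floating-point rounding in the exactly uniform case:

```python
import math

def dnui_discrete(p):
    """DNUI of Definition 1 for a discrete PMF given as a probability vector p."""
    n = len(p)
    m2 = sum(pi ** 3 for pi in p)   # second moment E[P(x)^2] = sum_i P(x_i)^3
    return math.sqrt(max(m2 - 1.0 / n ** 2, 0.0) / m2)

print(dnui_discrete([0.25] * 4))   # 0.0 for the uniform PMF
print(dnui_discrete([1.0, 0.0]))   # sqrt(3)/2 ~ 0.866 in the degenerate case
```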

2.2. Continuous Cases

Consider a continuous random variable $Y$ with probability density function (PDF) $p(y)$ defined on an unbounded support, such as $(-\infty, \infty)$. Since there is no baseline uniform distribution defined over an unbounded support, we cannot measure the non-uniformity of the entire distribution. Instead, we examine parts of the distribution on a fixed interval $[y_1, y_2]$, which allows us to assess local non-uniformity.
According to Rajaram et al. (2024a), the PDF of a partial distribution on $[y_1, y_2]$ is obtained by renormalizing the original PDF:
$$p'(y) = \frac{p(y)}{P(y_1, y_2)},$$
where $P(y_1, y_2) = \int_{y_1}^{y_2} p(y)\,dy$ is the coverage probability of the interval $[y_1, y_2]$.
Let $Y_U$ denote the uniform distribution on $[y_1, y_2]$ with PDF $p_U(y) = 1/(y_2 - y_1)$. We use this uniform distribution as the baseline for measuring the non-uniformity of the partial distribution.
Similar to the discrete case, the difference between the two PDFs $p'(y)$ and $p_U(y)$ is given by
$$\Delta p'(y) = p'(y) - p_U(y) = p'(y) - \frac{1}{y_2 - y_1}.$$
Thus, $p'(y)$ can be written as
$$p'(y) = \Delta p'(y) + \frac{1}{y_2 - y_1}.$$
Squaring both sides yields
$$p'(y)^2 = \Delta p'(y)^2 + \frac{2}{y_2 - y_1}\Delta p'(y) + \frac{1}{(y_2 - y_1)^2}.$$
Then, taking the expectation of both sides yields
$$E[p'(y)^2] = E(\Delta p'(y)^2) + \frac{2}{y_2 - y_1}E(\Delta p'(y)) + \frac{1}{(y_2 - y_1)^2} = \omega_{p'(y)}^2 + \frac{1}{(y_2 - y_1)^2}.$$
The total deviation $\omega_{p'(y)}$ is given by
$$\omega_{p'(y)} = \sqrt{E(\Delta p'(y)^2) + \frac{2}{y_2 - y_1}E(\Delta p'(y))},$$
where $E(\Delta p'(y)^2)$ is the variance of $p'(y)$ relative to $p_U(y)$ on $[y_1, y_2]$, given by
$$E(\Delta p'(y)^2) = E\{[p'(y) - p_U(y)]^2\} = \int_{y_1}^{y_2} \frac{p(y)}{P(y_1, y_2)}\left[\frac{p(y)}{P(y_1, y_2)} - \frac{1}{y_2 - y_1}\right]^2 dy,$$
and $E(\Delta p'(y))$ is the bias of $p'(y)$ relative to $p_U(y)$, given by
$$E(\Delta p'(y)) = E[p'(y) - p_U(y)] = \int_{y_1}^{y_2} \frac{p(y)}{P(y_1, y_2)}\left[\frac{p(y)}{P(y_1, y_2)} - \frac{1}{y_2 - y_1}\right] dy.$$
Definition 2. The proposed DNUI for the partial distribution on $[y_1, y_2]$ (denoted by $\rho(y_1, y_2)$) is given by
$$\rho(y_1, y_2) = \frac{\omega_{p'(y)}}{\sqrt{E[p'(y)^2]}} = \sqrt{\frac{E[p'(y)^2] - \frac{1}{(y_2 - y_1)^2}}{E[p'(y)^2]}} = \sqrt{\frac{E(\Delta p'(y)^2) + \frac{2}{y_2 - y_1}E(\Delta p'(y))}{E(\Delta p'(y)^2) + \frac{2}{y_2 - y_1}E(\Delta p'(y)) + \frac{1}{(y_2 - y_1)^2}}},$$
where $E[p'(y)^2]$ is the second moment of the PDF $p'(y)$, given by
$$E[p'(y)^2] = \int_{y_1}^{y_2} \left[\frac{p(y)}{P(y_1, y_2)}\right]^3 dy.$$
Definition 3. If the continuous distribution is defined on the fixed support $[-a, a]$, then $P(-a, a) = 1$ and $y_2 - y_1 = 2a$, and the proposed DNUI for the entire distribution of $Y$ (denoted by $\rho(Y)$) is given by
$$\rho(Y) = \frac{\omega_{p(y)}}{\sqrt{E[p(y)^2]}} = \sqrt{\frac{E[p(y)^2] - \frac{1}{4a^2}}{E[p(y)^2]}} = \sqrt{\frac{E(\Delta p(y)^2) + \frac{1}{a}E(\Delta p(y))}{E(\Delta p(y)^2) + \frac{1}{a}E(\Delta p(y)) + \frac{1}{4a^2}}},$$
where $E[p(y)^2]$ is the second moment of the PDF $p(y)$, given by
$$E[p(y)^2] = \int_{-a}^{a} p(y)^3\,dy,$$
the variance $E(\Delta p(y)^2)$ is given by
$$E(\Delta p(y)^2) = \int_{-a}^{a} p(y)\left[p(y) - \frac{1}{2a}\right]^2 dy = \int_{-a}^{a} p(y)^3\,dy - \frac{1}{a}\int_{-a}^{a} p(y)^2\,dy + \frac{1}{4a^2},$$
and the bias $E(\Delta p(y))$ is given by
$$E(\Delta p(y)) = \int_{-a}^{a} p(y)\left[p(y) - \frac{1}{2a}\right] dy = \int_{-a}^{a} p(y)^2\,dy - \frac{1}{2a}.$$
The quantity $\int_{-a}^{a} p(y)^2\,dy$ is denoted by $\beta_Y$ and is called the continuous informity of $Y$ in the theory of informity (Huang, 2025).
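For densities without a convenient closed form, the integrals in Definition 2 can be evaluated numerically. The sketch below (the function name and the choice of a midpoint rule are ours) approximates $\rho(y_1, y_2)$ for an arbitrary density:

```python
import math

def dnui_partial(pdf, y1, y2, m=100000):
    """Numerical DNUI of the partial distribution of `pdf` on [y1, y2]
    (Definition 2), using a simple midpoint rule for the integrals."""
    h = (y2 - y1) / m
    ys = [y1 + (k + 0.5) * h for k in range(m)]
    cover = sum(pdf(y) for y in ys) * h                # coverage probability P(y1, y2)
    m2 = sum((pdf(y) / cover) ** 3 for y in ys) * h    # second moment E[p'(y)^2]
    base = 1.0 / (y2 - y1) ** 2
    return math.sqrt(max(m2 - base, 0.0) / m2)         # clamp tiny rounding error at 0

# Sanity check: a density that is flat on the interval gives DNUI ~ 0.
print(dnui_partial(lambda y: 1.0, 0.0, 1.0))
```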

3. Examples

3.1. Coin-Tossing

Consider tossing a coin, the simplest two-state probability system: $\{X; P(x)\} = \{\text{head, tail}; P(\text{head}), P(\text{tail})\}$, where $P(\text{tail}) = 1 - P(\text{head})$. The DNUI for the distribution of $X$ is given by
$$\rho(X) = \sqrt{\frac{E[P(x)^2] - \frac{1}{2^2}}{E[P(x)^2]}},$$
where the second moment $E[P(x)^2]$ can be calculated as
$$E[P(x)^2] = [P(\text{head})]^3 + [P(\text{tail})]^3.$$
Figure 1 shows the DNUI for the distribution of $X$ as a function of the bias represented by $P(\text{head})$. The two evenness measures, Simpson's evenness $E_{S2}$ and Buzas & Gibson's evenness $E_{BG}$, are also shown in Figure 1 for comparison.
As shown in Figure 1, when the coin is fair (i.e., $P(\text{head}) = P(\text{tail}) = 0.5$), the DNUI is 0, and both Simpson's evenness $E_{S2}$ and Buzas & Gibson's evenness $E_{BG}$ equal 1, indicating perfect uniformity or evenness. As the coin becomes increasingly biased toward either heads or tails, the DNUI increases, while $E_{S2}$ and $E_{BG}$ decrease. In the extreme case where $P(\text{tail}) = 1$ or $P(\text{head}) = 1$, the DNUI reaches its maximum value of $\rho(X) = 0.866$, reflecting a high degree of non-uniformity. However, in this case, both $E_{S2}$ and $E_{BG}$ reach their minimum value of 0.5, which fails to capture the true extent of unevenness. This supports the argument made by Gregorius and Gillet (2021): "… measures of diversity generally do not produce characteristic values that are associated with states of complete unevenness."
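The endpoints of the Figure 1 curves can be reproduced numerically; this sketch (the function name is ours) evaluates all three indices at a given $P(\text{head})$:

```python
import math

def coin_indices(p_head):
    """DNUI, Simpson's evenness, and Buzas & Gibson's evenness for a coin."""
    p = [p_head, 1.0 - p_head]
    m2 = sum(pi ** 3 for pi in p)                # second moment E[P(x)^2]
    dnui = math.sqrt(max(m2 - 0.25, 0.0) / m2)   # Definition 1 with n = 2
    es2 = (1.0 / sum(pi ** 2 for pi in p)) / 2
    H = -sum(pi * math.log(pi) for pi in p if pi > 0)
    ebg = math.exp(H) / 2
    return dnui, es2, ebg

print(coin_indices(0.5))   # ~(0.0, 1.0, 1.0): fair coin
print(coin_indices(1.0))   # ~(0.866, 0.5, 0.5): completely biased coin
```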

3.2. Three Frequency Data Series

JJC (2024) posted a question on Cross Validated about quantifying distribution non-uniformity. He supplied three frequency datasets (Series A, B, and C), each containing 10 values (Table 1). Visually, Series A is almost perfectly uniform, Series B is nearly uniform, and Series C is heavily skewed by a single outlier (0.6). Table 1 lists these datasets alongside the corresponding DNUI, E S 2 , and E B G values.
From Table 1, we can see that the DNUI value for Series A is 0.1864, confirming its high uniformity, while the DNUI value for Series B is 0.2499, indicating near-uniformity. In contrast, the DNUI value for Series C is 0.9767 (close to 1), signaling extreme non-uniformity. These results align well with intuitive expectations. The $E_{S2}$ and $E_{BG}$ values for Series A and Series B are all close to 1, also indicating high uniformity. The $E_{S2}$ value for Series C is 0.2625, capturing its pronounced unevenness. However, the $E_{BG}$ value for Series C remains relatively high at 0.4545, failing to adequately reflect the severity of non-uniformity.
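The Table 1 values can be verified directly from the three series; in this sketch the helper `indices` is ours:

```python
import math

series = {
    "A": [0.1, 0.11, 0.1, 0.09, 0.09, 0.11, 0.1, 0.1, 0.12, 0.08],
    "B": [0.1, 0.1, 0.1, 0.08, 0.12, 0.12, 0.09, 0.09, 0.12, 0.08],
    "C": [0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07],
}

def indices(p):
    """Return (DNUI, E_S2, E_BG) for a discrete probability vector."""
    n = len(p)
    m2 = sum(pi ** 3 for pi in p)
    dnui = math.sqrt((m2 - 1.0 / n ** 2) / m2)
    es2 = (1.0 / sum(pi ** 2 for pi in p)) / n
    ebg = math.exp(-sum(pi * math.log(pi) for pi in p)) / n
    return dnui, es2, ebg

for name, p in series.items():
    dnui, es2, ebg = indices(p)
    print(f"{name}: DNUI={dnui:.4f}  E_S2={es2:.4f}  E_BG={ebg:.4f}")
```

Running this reproduces the four-decimal values in Table 1.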

3.3. Five Continuous Distributions with Fixed Support $[-a, a]$

Consider five continuous distributions with fixed support $[-a, a]$: uniform, triangular, quadratic, raised cosine, and half-cosine. Table 2 summarizes their PDFs, variances, biases, second moments, and DNUIs.
As shown in Table 2, the DNUI is independent of the scale parameter $a$, which is a desirable property for a measure of distribution non-uniformity. By definition, the DNUI for the uniform distribution is 0. In contrast, the DNUI values for the other four distributions range from 0.5932 to 0.7746, indicating moderate to high non-uniformity. These results align well with intuitive expectations. Notably, the raised cosine distribution has the highest DNUI value among the five distributions, suggesting it exhibits the greatest non-uniformity.
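The closed-form DNUI values in Table 2 can be checked by numerical integration; this sketch (the function name and the midpoint rule are ours) evaluates them with $a = 1$:

```python
import math

def dnui_support(pdf, a, m=100000):
    """Numerical DNUI on the fixed support [-a, a] (Definition 3), midpoint rule."""
    h = 2.0 * a / m
    ys = [-a + (k + 0.5) * h for k in range(m)]
    m2 = sum(pdf(y) ** 3 for y in ys) * h   # second moment E[p(y)^2]
    return math.sqrt(max(m2 - 1.0 / (4 * a * a), 0.0) / m2)

a = 1.0
pdfs = {
    "uniform":       lambda y: 1 / (2 * a),
    "triangular":    lambda y: (a - abs(y)) / a ** 2,
    "quadratic":     lambda y: 3 / (4 * a) * (1 - (y / a) ** 2),
    "raised cosine": lambda y: (1 + math.cos(math.pi * y / a)) / (2 * a),
    "half-cosine":   lambda y: math.pi / (4 * a) * math.cos(math.pi * y / (2 * a)),
}
for name, pdf in pdfs.items():
    print(f"{name}: {dnui_support(pdf, a):.4f}")
```

The printed values match the last column of Table 2 to four decimals.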

3.4. Exponential Distribution

The PDF of the exponential distribution with support $[0, \infty)$ is
$$p(y) = \lambda e^{-\lambda y},$$
where $\lambda$ is the rate parameter.
We consider a partial exponential distribution on the interval $[0, b]$ (i.e., $y_1 = 0$ and $y_2 = b$), where $b$ is the length of the interval. Thus, the DNUI for the partial exponential distribution is given by
$$\rho(0, b) = \sqrt{\frac{E[p'(y)^2] - \frac{1}{b^2}}{E[p'(y)^2]}},$$
where the second moment $E[p'(y)^2]$ is given by
$$E[p'(y)^2] = \frac{1}{[P(0, b)]^3}\int_0^b p(y)^3\,dy.$$
The coverage probability of the interval $[0, b]$ is given by
$$P(0, b) = \int_0^b \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda b}.$$
The integral $\int_0^b p(y)^3\,dy$ can be evaluated as
$$\int_0^b p(y)^3\,dy = \int_0^b \left[\lambda e^{-\lambda y}\right]^3 dy = \lambda^3 \int_0^b e^{-3\lambda y}\,dy = \frac{\lambda^2}{3}\left(1 - e^{-3\lambda b}\right).$$
Figure 2 shows the plot of the DNUI for the partial exponential distribution with $\lambda = 1$ as a function of the interval length $b$. It also shows the PDF of the original exponential distribution with $\lambda = 1$ as a function of $y$.
As shown in Figure 2, when the interval length b is very small (approaching 0), the DNUI is close to 0, reflecting the high local uniformity within small intervals. As the interval length b increases, the DNUI also increases, indicating the growing local non-uniformity with larger intervals. When the interval length b becomes very large, the DNUI approaches 1, indicating that the distribution over a large interval is extremely non-uniform. These observations align well with intuitive expectations.
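Combining the expressions above gives a closed form for $\rho(0, b)$ that is easy to evaluate; a sketch (the function name is ours):

```python
import math

def dnui_exp_partial(lam, b):
    """Closed-form DNUI of the exponential(rate=lam) density restricted to
    [0, b], combining the coverage probability and the integral of p(y)^3."""
    cover = 1.0 - math.exp(-lam * b)   # coverage probability P(0, b)
    m2 = (lam ** 2 / 3.0) * (1.0 - math.exp(-3.0 * lam * b)) / cover ** 3
    return math.sqrt(max(m2 - 1.0 / b ** 2, 0.0) / m2)

for b in (0.1, 1.0, 5.0, 20.0):
    print(f"b={b}: DNUI={dnui_exp_partial(1.0, b):.4f}")
```

The printed values increase monotonically from near 0 toward 1, matching the behavior described for Figure 2.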

4. Discussion and Conclusions

Unlike the degree of inequality (DOI) or the existing evenness measures $E_{S2}$ and $E_{BG}$, the proposed distribution non-uniformity index (DNUI) is a standardized, distance-based metric derived from the total deviation defined in Section 2. Importantly, this total deviation incorporates two components, variance and bias, both measured relative to the baseline uniform distribution. In contrast, the DOI, $E_{S2}$, and $E_{BG}$ are not distance-based metrics and therefore cannot effectively quantify unevenness. As noted by Gregorius and Gillet (2021), diversity-based evenness measures do not capture deviations from uniformity in a meaningful way and fail to provide characteristic values that represent complete unevenness.
The proposed DNUI ranges between 0 and 1, with 0 indicating perfect uniformity and 1 indicating extreme non-uniformity. Lower DNUI values (close to 0) suggest a more uniform or flatter distribution, while higher values (close to 1) suggest a greater degree of non-uniformity or unevenness. Although there are no universally accepted benchmarks for defining levels of non-uniformity, we tentatively propose DNUI values of 0.25, 0.5, and 0.75 to represent low, moderate, and high non-uniformity, respectively, based on the examples presented in this study.
It is important to note that the DNUI depends solely on the probability values and not on the associated outcomes (or scores) or their specific order. This property can be illustrated using the frequency data from Series C in Subsection 3.2: {0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07}. If, for example, the second and third values are swapped, the DNUI value remains unchanged. This invariance implies that different distributions can yield the same DNUI value; in other words, the DNUI is not a one-to-one function of the distribution and can "collapse" different distributions into the same value. This property is analogous to how different distributions can share the same mean or variance.
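The permutation invariance can be demonstrated directly; a sketch (the function name and the fixed random seed are ours):

```python
import math
import random

def dnui_discrete(p):
    # Definition 1: depends only on the multiset of probability values
    n = len(p)
    m2 = sum(pi ** 3 for pi in p)
    return math.sqrt((m2 - 1.0 / n ** 2) / m2)

c = [0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07]
random.seed(0)                 # fixed seed so the run is reproducible
shuffled = c[:]
random.shuffle(shuffled)
print(round(dnui_discrete(c), 4), round(dnui_discrete(shuffled), 4))  # 0.9767 0.9767
```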
In summary, the proposed DNUI provides an effective metric for quantifying the non-uniformity or unevenness of probability distributions. It is applicable to any distributions, discrete or continuous, defined on a fixed support. It can also be applied to partial distributions on fixed intervals to examine local non-uniformity, even when the overall distribution has unbounded support. The presented examples have demonstrated the effectiveness of the proposed DNUI in capturing and quantifying distribution non-uniformity.

Disclosure statement: The author declares no conflicts of interest.

References

  1. Buzas, M. A., & Gibson, T. G. (1969). Species diversity: benthonic foraminifera in western North Atlantic. Science, 163(3862), 72–75.
  2. Gregorius, H. R., & Gillet, E. M. (2021). The concept of evenness/unevenness: less evenness or more unevenness? Acta Biotheoretica, 70(1), 3.
  3. Huang, H. (2025). The theory of informity: a novel probability framework. To be published in Bulletin of Taras Shevchenko National University of Kyiv.
  4. JJC (https://stats.stackexchange.com/users/10358/jjc). How does one measure the non-uniformity of a distribution? Cross Validated. URL (version: 2024-10-12): https://stats.stackexchange. 2582.
  5. Jost, L. (2006). Entropy and diversity. Oikos, 113, 363–375.
  6. Rajaram, R., Ritchey, N., & Castellani, B. (2024a). On the mathematical quantification of inequality in probability distributions. Journal of Physics Communications, 8(8), 085002.
  7. Rajaram, R., Ritchey, N., & Castellani, B. (2024b). On the degree of uniformity measure for probability distributions. Journal of Physics Communications, 8(11), 115003.
  8. Rousseau, R. (2018). The repeat rate: from Hirschman to Stirling. Scientometrics, 116, 645–65.
  9. Roy, S., & Bhattacharya, K. R. (2024). A theoretical study to introduce an index of biodiversity and its corresponding index of evenness based on mean deviation. World Journal of Advanced Research and Reviews, 21(2), 022–032.
Figure 1. The DNUI for the distribution of X as a function of the bias represented by the probability of heads, compared with Simpson's evenness $E_{S2}$ and Buzas & Gibson's evenness $E_{BG}$.
Figure 2. Plots of the DNUI for the partial exponential distribution with $\lambda = 1$ and the PDF of the original exponential distribution.
Table 1. Three frequency data series and the corresponding DNUI ($\rho(X)$), $E_{S2}$, and $E_{BG}$ values.

Series                                                         | ρ(X)   | E_S2   | E_BG
A: {0.1, 0.11, 0.1, 0.09, 0.09, 0.11, 0.1, 0.1, 0.12, 0.08}    | 0.1864 | 0.9881 | 0.9940
B: {0.1, 0.1, 0.1, 0.08, 0.12, 0.12, 0.09, 0.09, 0.12, 0.08}   | 0.2499 | 0.9785 | 0.9891
C: {0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07} | 0.9767 | 0.2625 | 0.4545
Table 2. The PDF $p(y)$, variance $E(\Delta p(y)^2)$, bias $E(\Delta p(y))$, second moment $E[p(y)^2]$, and DNUI $\rho(Y)$ for five continuous distributions with fixed support $[-a, a]$.

Distribution  | p(y)                                             | E(Δp(y)²)        | E(Δp(y))        | E[p(y)²]  | ρ(Y)
Uniform       | 1/(2a)                                           | 0                | 0               | 1/(4a²)   | 0
Triangular    | (y+a)/a² for −a ≤ y ≤ 0; (a−y)/a² for 0 ≤ y ≤ a  | 1/(12a²)         | 1/(6a)          | 1/(2a²)   | 0.7071
Quadratic     | (3/(4a))[1 − (y/a)²]                             | 1/(28a²)         | 1/(10a)         | 27/(70a²) | 0.5932
Raised cosine | (1/(2a))[1 + cos(πy/a)]                          | 1/(8a²)          | 1/(4a)          | 5/(8a²)   | 0.7746
Half-cosine   | (π/(4a))cos(πy/(2a))                             | (1/4 − π²/48)/a² | (π²/16 − 1/2)/a | π²/(24a²) | 0.6262
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.