1. Introduction
Non-uniformity, or unevenness, is an inherent characteristic of probability distributions, as outcomes or values from a probability system are typically not distributed uniformly or evenly. Although the shape of a distribution can offer an intuitive sense of its non-uniformity, researchers often require a quantitative measure to assess this property. Such a measure is valuable for constructing distribution models and for comparing the non-uniformity across different distributions in a consistent and interpretable way.
A probability distribution is considered uniform when all outcomes have equal probability in the discrete case, or when the probability density is constant in the continuous case. Therefore, the uniform distribution serves as the natural baseline for assessing the non-uniformity of any given distribution, and non-uniformity can be defined as the degree to which a distribution deviates from this uniform benchmark. It is essential to ensure that the distribution being evaluated and the baseline uniform distribution share the same support. This requirement is especially important in the continuous case, where a fixed and clearly defined support is crucial for meaningful comparison.
The Kullback–Leibler (KL) divergence can be employed as a measure of the non-uniformity of a given distribution by quantifying how much the distribution differs from a baseline uniform distribution on the same support. A small KL divergence indicates that the distribution is close to uniform. The KL divergence is applicable in both the discrete and the continuous case, provided that the support is fixed. However, one significant drawback of using the KL divergence in this context is that it has no fixed upper limit. A value of zero represents perfect uniformity, but in the discrete case the maximum value, log n for n outcomes, grows with n, and in the continuous case the divergence is unbounded. There is therefore no natural scale that allows us to contextualize how “non-uniform” a distribution is. This lack of a fixed upper bound can make interpretation challenging, especially when comparing different distributions or when the scale of the divergence matters.
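For a discrete distribution with n outcomes, the KL divergence from the uniform PMF 1/n simplifies to log n minus the Shannon entropy, which makes the n-dependent upper limit explicit. The sketch below checks that identity numerically; it assumes natural logarithms and represents a PMF as a plain Python list, and the function names `entropy` and `kl_from_uniform` are illustrative only.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def kl_from_uniform(p):
    """D_KL(P || U) for a PMF p and the uniform PMF 1/n on the same n outcomes."""
    n = len(p)
    # Each term is p_i * ln(p_i / (1/n)) = p_i * ln(n * p_i).
    return sum(pk * math.log(n * pk) for pk in p if pk > 0)


p = [0.7, 0.1, 0.1, 0.1]
# Identity check: D_KL(P || U) = ln n - H(P); the two printed values agree.
print(kl_from_uniform(p), math.log(len(p)) - entropy(p))

# The maximum, ln n, is reached by a degenerate PMF and keeps growing with n.
for n in (2, 10, 1000):
    degenerate = [1.0] + [0.0] * (n - 1)
    print(n, kl_from_uniform(degenerate), math.log(n))
```

The loop shows why a raw KL value is hard to interpret on its own: the same value can indicate strong non-uniformity for small n and mild non-uniformity for large n.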
In recent work, Rajaram et al. (2024a, b) proposed a measure called the “degree of inequality (DOI)” to quantify how evenly the probability mass or density is distributed across the available outcomes or support. Specifically, they defined the DOI for a partial distribution on a fixed interval as the ratio of the exponential of the Shannon entropy to the coverage probability of that interval (Rajaram et al., 2024a, b):
$$\mathrm{DOI}_P = \frac{e^{H_P}}{c_P} = \frac{D_P}{c_P},$$
where the subscript “P” denotes “part”, referring to the partial distribution on the fixed interval, $c_P$ is the coverage probability of the interval, $H_P$ is the entropy of the partial distribution, and $D_P = e^{H_P}$ is the entropy-based diversity of the partial distribution. When the entire distribution is considered, $c_P = 1$, and thus the DOI equals the entropy-based diversity $D = e^{H}$. It should be noted that the DOI is neither standardized nor normalized and does not explicitly measure the deviation of the given distribution relative to a uniform benchmark.
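The following minimal sketch evaluates the DOI formula as stated, taking the entropy of the partial distribution and its coverage probability as given inputs; how that entropy is computed for a partial distribution is left to Rajaram et al. (2024a, b). The function names are hypothetical. For the whole distribution (coverage 1), the values range from 1 to n, which illustrates that the DOI is not normalized.

```python
import math


def entropy(p):
    """Shannon entropy in nats of a discrete PMF (zero terms skipped)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def doi(h_part, coverage):
    """DOI_P = exp(H_P) / c_P, given the partial entropy H_P and coverage c_P."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage probability must lie in (0, 1]")
    return math.exp(h_part) / coverage


# Whole distribution: c_P = 1, so the DOI reduces to the diversity exp(H).
for p in ([1.0, 0.0, 0.0, 0.0], [0.7, 0.1, 0.1, 0.1], [0.25] * 4):
    print(p, doi(entropy(p), 1.0))
```

The degenerate PMF gives 1 and the uniform PMF gives n = 4, so the scale of the DOI changes with the number of outcomes.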
Classical evenness measures, such as Simpson’s evenness and Buzas & Gibson’s evenness, are essentially diversity ratios. For a discrete random variable $X$ with probability mass function (PMF) $p_i = P(X = x_i)$, $i = 1, 2, \ldots, n$, where $\sum_{i=1}^{n} p_i = 1$ and $n$ is the number of possible outcomes, Simpson’s evenness is defined as (e.g., Roy & Bhattacharya, 2024)
$$E_S = \frac{D_S}{n} = \frac{1}{n\sum_{i=1}^{n} p_i^2},$$
where $D_S = 1/\sum_{i=1}^{n} p_i^2$ is Simpson’s diversity, representing the effective number of distinct elements in the probability system $\{x_i, p_i\}_{i=1}^{n}$, and $n$ is the maximum diversity, which corresponds to a uniform distribution with PMF $1/n$. The concept of effective number is the core of diversity measures in biology (Jost, 2006).
Buzas & Gibson’s evenness is defined as (Buzas & Gibson, 1969)
$$E_{BG} = \frac{e^{H(X)}}{n} = \frac{e^{H(X)}}{e^{H_U}},$$
where $H(X) = -\sum_{i=1}^{n} p_i \ln p_i$ is the Shannon entropy of $X$, and $H_U = \ln n$ is the entropy of the uniform distribution with PMF $1/n$. The exponential of the Shannon entropy, $e^{H(X)}$, is the entropy-based diversity, and it is also considered to be an effective number of elements in the probability system $\{x_i, p_i\}_{i=1}^{n}$.
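A corresponding sketch for Buzas & Gibson’s evenness follows. It assumes natural logarithms; any base works as long as the exponential uses the same base, because only $e^{H(X)}$ enters the ratio. The function name is illustrative.

```python
import math


def buzas_gibson_evenness(p):
    """E_BG = exp(H(X)) / n, with the Shannon entropy H(X) in nats."""
    n = len(p)
    h = -sum(pk * math.log(pk) for pk in p if pk > 0)
    return math.exp(h) / n  # entropy-based diversity divided by its maximum n


print(buzas_gibson_evenness([0.25] * 4))
print(buzas_gibson_evenness([0.5, 0.5, 0.0, 0.0]))
print(buzas_gibson_evenness([0.7, 0.1, 0.1, 0.1]))
```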
Unlike the DOI, which is not normalized, both $E_S$ and $E_{BG}$ are normalized by $n$, the maximum diversity corresponding to the baseline uniform distribution. Therefore, these indices are bounded above by 1, which indicates perfect evenness. Their lower limit, $1/n$, is attained by a degenerate distribution (all probability on a single outcome) and approaches 0 only as $n$ becomes large.
However, as Gregorius and Gillet (2021) pointed out, “Diversity-based methods of assessing evenness cannot provide information on unevenness, since measures of diversity generally do not produce characteristic values that are associated with states of complete unevenness.” This limitation arises because diversity measures are primarily designed to capture internal distribution characteristics, such as concentration and relative abundance within the distribution. For example, the quantity $\sum_{i=1}^{n} p_i^2$ is often called the “repeat rate” or Simpson index (Rousseau, 2018), or Simpson concentration (Jost, 2006); it has historically been used as a measure of concentration (Rousseau, 2018). Moreover, since diversity metrics are not constructed within a comparative distance framework, they inherently lack the ability to quantify deviations from uniformity in a meaningful or interpretable way. This limitation significantly diminishes their effectiveness when the goal is specifically to detect or describe high degrees of non-uniformity.
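The quoted limitation can be made concrete with a degenerate PMF, which places all probability on one outcome and is therefore as uneven as possible. The sketch below (function names illustrative, natural logarithms assumed) shows that both evenness indices return 1/n for such a distribution, so the value associated with complete unevenness changes with the number of outcomes rather than being a fixed characteristic value.

```python
import math


def simpson_evenness(p):
    """E_S = 1 / (n * sum(p_i^2))."""
    return 1.0 / (len(p) * sum(pk * pk for pk in p))


def buzas_gibson_evenness(p):
    """E_BG = exp(H(X)) / n, with H(X) in nats."""
    h = -sum(pk * math.log(pk) for pk in p if pk > 0)
    return math.exp(h) / len(p)


# Complete unevenness: all mass on a single outcome out of n.
for n in (2, 5, 100):
    degenerate = [1.0] + [0.0] * (n - 1)
    print(n, simpson_evenness(degenerate), buzas_gibson_evenness(degenerate), 1 / n)
```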
It is important to emphasize that the non-uniformity or unevenness of a distribution should be quantified by explicitly measuring its distance from the ideal of perfect uniformity. However, neither the DOI nor the evenness indices $E_S$ and $E_{BG}$ calculate an explicit distance relative to a uniform benchmark.
The aim of this study is to develop a new standardized, distance-based index that can effectively quantify the non-uniformity or unevenness of a probability distribution. The remainder of this paper is organized as follows: Section 2 describes the proposed distribution non-uniformity index (DNUI), Section 3 presents several examples, and Section 4 provides the discussion and conclusion.