Preprint Article. This version is not peer-reviewed.
Discretization of Dynamical Systems Based on Observations

Submitted: 11 December 2024. Posted: 12 December 2024.

Abstract
Based on observations of a dynamical system, we define a partition $\xi_\varepsilon$ that represents the system as well as possible in the sense of its entropy. The suggested method utilizes $\varepsilon$-entropy, $\varepsilon$-capacity, and the introduced $\varepsilon$-information. The resulting algorithm is also useful for defining the bin lengths of histograms, especially for multimodal distributions.

1. Introduction

Let $S = (M, \mathcal{M}, \mu, \nu, T)$ be a dynamical system on a compact subset $M$ of a metric space $R$ with metric $\nu$ and additive Lebesgue measure $\mu$, where $\mathcal{M}$ is a $\sigma$-algebra of subsets of $R$ and $T : M \to M$ is an endomorphism.
Assume that in the $\sigma$-algebra $\mathcal{M}$ there are distinguished $m$ sets $X_1, X_2, \ldots, X_m \in \mathcal{M}$, which represent observations of the system, and assume that these observations form a partition $\xi = \{X_i\}$ of the set $M$.
Then, using the observations' partition $\xi$, it is required to define a real number $\varepsilon > 0$ and a partition $\xi_\varepsilon = \{X_i \mid i \in \mathbb{N}\}$ of the set $M$ such that the diameters of the sets $X_i \in \xi_\varepsilon$ are $d(X_i) = \sup_{x,y \in X_i} \nu(x,y) \le 2\varepsilon$, and which represents the system "as well as possible".
Before formulating a strict criterion, let us clarify the problem with the following examples.
Assume that $M = [x_0, x_m] \subset R$ is an interval of real numbers and assume that the observations of the interval are represented by the partition $\xi = \{X_i\}$, which includes the intervals $X_1 = [x_0, x_1]$, $X_2 = [x_1, x_2]$, $X_3 = [x_2, x_3]$, ..., $X_m = [x_{m-1}, x_m]$ such that the points $x_j \in M$, $j = 0, 1, 2, \ldots, m$, are drawn with respect to a certain unknown distribution. Using the numbers $x_j$, $j = 0, 1, 2, \ldots, m$, it is needed to plot a histogram which represents this distribution.
This problem requires defining the number of bins or the bin length such that the histogram represents the distribution of the observed values "as well as possible". Despite the existence of several formulas for calculating bin lengths, there is no strictly proven criterion for estimating the goodness of a histogram.
From another point of view, the problem can be considered in terms of the discretization of stochastic processes [5]. In this case, given an observed realization of the process, it is required to define a sequence $t_i$, $i = 0, 1, 2, \ldots$, of times with $2\varepsilon = t_{i+1} - t_i$, $t_i = i\Delta$, such that the intervals $\Delta = 2\varepsilon$ between the values $x_i = x(t_i)$ represent the process "as well as possible".
To formulate the criterion of consistency between the observations' partition $\xi$ and the partition $\xi_\varepsilon$, denote $\xi_\varepsilon^n = \bigvee_{i=0}^{n} T^i \xi_\varepsilon$ and $\xi^n = \bigvee_{i=0}^{n} T^i \xi$, where $\vee$ stands for multiplication of partitions.
Given the observations' partition $\xi$, we say that the partition $\xi_\varepsilon$ represents the dynamical system $S$ if the distance $\mathrm{dist}(\xi_\varepsilon^n, \xi^n)$ between the partitions $\xi_\varepsilon^n$ and $\xi^n$, as a function of $\varepsilon$, is bounded, that is,
$$a \le \mathrm{dist}(\xi_\varepsilon^n, \xi^n) \le b,$$
for some constant numbers $a$ and $b$, $a \le b$, and that the partition $\xi_\varepsilon$ represents the dynamical system $S$ "as well as possible" if
$$\mathrm{dist}(\xi_\varepsilon^n, \xi^n) = \mathrm{const}.$$
In the paper, we suggest a method for defining the partition $\xi_\varepsilon$. We assume that $M$ is a compact set with a metric $\nu$, and together with the Kolmogorov-Sinai entropy $h(T)$ of the dynamical system we consider the Kolmogorov $\varepsilon$-entropy and $\varepsilon$-capacity [6]. Then, we define a diameter $2\varepsilon$ of the sets $X_i$ that results in the required partition $\xi_\varepsilon$.
The suggested method is also useful for defining the bin lengths of histograms; in the paper, we apply the method to different datasets and compare the results with the bin lengths obtained by known methods.

2. Methods

We follow the line of the Schwarz information criterion [10] and apply the Kolmogorov $\varepsilon$-entropy and $\varepsilon$-capacity [6] (see also [2,15]). Here we recall the required definitions and facts that are used in the solution of the problem.
Let $S = (M, \mathcal{M}, \mu, \nu, T)$ be a dynamical system. We assume that $M$ is a compact subset of a metric space $R$ with metric $\nu$ and additive Lebesgue measure $\mu$ such that $\mu(M) = 1$; $\mathcal{M}$ is a $\sigma$-algebra of subsets of $R$, and $T : M \to M$ is an endomorphism.
Let $\beta = \{B_i\}$, $B_i \in \mathcal{M}$, $\bigcup_i B_i = M$, $B_i \cap B_j = \emptyset$ for $i \ne j$, $i, j \in \mathbb{N}$, be a partition of the set $M$.
Definition 1 [12,13]. A value
$$H_\mu(\beta) = -\sum_i \mu(B_i) \log_2 \mu(B_i)$$
is called the entropy of the partition $\beta$.
Let $\beta = \{B_i \mid B_i \in \mathcal{M}\}$ and $\gamma = \{C_j \mid C_j \in \mathcal{M}\}$ be two partitions of the set $M$.
Definition 2 [12,13]. A value
$$H_\mu(\beta \mid \gamma) = -\sum_j \mu(C_j) \sum_i \mu(B_i \mid C_j) \log_2 \mu(B_i \mid C_j)$$
is called the conditional entropy of the partition $\beta$ relative to the partition $\gamma$.
Definition 3 [9,12,13]. A value
$$\mathrm{dist}(\beta, \gamma) = H_\mu(\beta \mid \gamma) + H_\mu(\gamma \mid \beta)$$
is called the Rokhlin distance between the partitions $\beta$ and $\gamma$.
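To make Definitions 1-3 concrete, the following minimal Python sketch computes these quantities for finite partitions represented as cell labels assigned to sample points. The function names (partition_entropy, conditional_entropy, rokhlin_distance) and the frequency-based estimate of the measures $\mu(B_i)$ are our illustrative assumptions; the conditional entropy is computed through the identity $H_\mu(\beta \mid \gamma) = H_\mu(\beta \vee \gamma) - H_\mu(\gamma)$ stated as Lemma 6 below.

from collections import Counter
from math import log2

def partition_entropy(labels):
    # H_mu(beta) = -sum_i mu(B_i) log2 mu(B_i); mu(B_i) estimated by the cell frequency
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(labels_beta, labels_gamma):
    # H_mu(beta | gamma) = H_mu(beta v gamma) - H_mu(gamma)   (Lemma 6 below)
    joint = list(zip(labels_beta, labels_gamma))
    return partition_entropy(joint) - partition_entropy(labels_gamma)

def rokhlin_distance(labels_beta, labels_gamma):
    # dist(beta, gamma) = H_mu(beta | gamma) + H_mu(gamma | beta)   (Definition 3)
    return (conditional_entropy(labels_beta, labels_gamma)
            + conditional_entropy(labels_gamma, labels_beta))

# Example: two partitions of ten sample points into cells labelled 0, 1, 2, ...
beta  = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
gamma = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
print(partition_entropy(beta), rokhlin_distance(beta, gamma))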
For any endomorphism $T$ and any partition $\beta$, denote by $T\beta$ the partition of $M$ into the sets $TB_i$, that is, $T\beta = \{TB_i\}$.
Let $\beta \vee \gamma = \{B_i \cap C_j \mid B_i \in \beta, C_j \in \gamma, i, j \in \mathbb{N}\}$ be the multiplication of the partitions $\beta = \{B_i \mid B_i \in \mathcal{M}\}$ and $\gamma = \{C_j \mid C_j \in \mathcal{M}\}$.
Definition 4 [12,13]. A value
$$h(\beta, T) = \lim_{n \to \infty} \frac{1}{n} H_\mu(\beta \vee T\beta \vee T^2\beta \vee \cdots \vee T^n\beta)$$
is called the entropy per symbol of the partition, or the entropy per unit of time.
Definition 5 [12,13]. A value
$$h(T) = \sup_\beta h(\beta, T),$$
where the supremum is taken over all finite or countable measurable partitions $\beta$ of $M$ such that $H_\mu(\beta) < \infty$, is called the entropy of the dynamical system.
The entropy $h(T)$ is an invariant of the dynamical system, but direct calculation of the entropy $h(T)$ is possible only for certain endomorphisms; for examples see [12, chapter 15] and [13, chapter 7].
Theorem 1 [1,12,13]. If $T$ is invertible and $\mathcal{M} = \bigvee_{n=-\infty}^{\infty} T^n \xi^*$, then $h(\xi^*, T) = h(T)$.
Corollary 1 [1]. If $\mathcal{M} = \bigvee_{n=0}^{\infty} T^n \xi^*$, then $h(\xi^*, T) = h(T)$.
Let $\varepsilon > 0$ be a real number.
A set $\alpha = \{A : A \subset R\}$ is called an $\varepsilon$-covering of the set $M$ if $M \subset \bigcup_{A \in \alpha} A$ and the diameter of any $A \in \alpha$ is not greater than $2\varepsilon$.
A set $M$ is said to be $\varepsilon$-distinguishable if any two of its distinct points are located at a distance greater than $\varepsilon$.
Lemma 1 [6]. Given a compact set $M$, for any $\varepsilon > 0$ there exists a finite $\varepsilon$-covering of $M$. Also, for any $\varepsilon > 0$, any $\varepsilon$-distinguishable subset of $M \subset R$ is finite.
Denote by $N_\varepsilon(M)$ the minimal number of sets in an $\varepsilon$-covering $\alpha$ of the set $M$, and by $M_\varepsilon(M)$ the maximal number of points in an $\varepsilon$-distinguishable subset of the set $M$.
Definition 6 [6]. A value
$$H_\varepsilon(M) = \log_2 N_\varepsilon(M)$$
is called the $\varepsilon$-entropy of the set $M$, and a value
$$E_\varepsilon(M) = \log_2 M_\varepsilon(M)$$
is called the $\varepsilon$-capacity of the set $M$.
These values are interpreted as follows: the $\varepsilon$-entropy $H_\varepsilon(M)$ is the minimal number of bits required to transmit the set $M$ with precision $\varepsilon$, and the $\varepsilon$-capacity $E_\varepsilon(M)$ is the maximal number of bits which can be memorized by $M$ with precision $\varepsilon$.
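As a simple illustration, for an interval of length $L = x_m - x_0$ these quantities can be evaluated from the closed forms used later in Section 4: $N_\varepsilon(M) = \lceil L/2\varepsilon \rceil$ and, under the even-distribution assumption of Section 4, $M_\varepsilon(M) = \lceil L/\varepsilon \rceil$. The Python sketch below, with hypothetical helper names, simply converts these counts into numbers of bits.

from math import ceil, log2

def interval_eps_entropy(length, eps):
    # H_eps(M) = log2 N_eps(M), N_eps(M) = ceil(length / (2*eps))   (Section 4)
    return log2(ceil(length / (2 * eps)))

def interval_eps_capacity(length, eps):
    # E_eps(M) = log2 M_eps(M), M_eps(M) = ceil(length / eps)       (Section 4, even spread)
    return log2(ceil(length / eps))

# Interval of length 100, as in the example of Figure 1:
for eps in (1.0, 5.0, 7.1, 25.0):
    print(eps, interval_eps_entropy(100.0, eps), interval_eps_capacity(100.0, eps))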
We will use the following property of the $\varepsilon$-entropy $H_\varepsilon(M)$ and the $\varepsilon$-capacity $E_\varepsilon(M)$.
Theorem 2 [6]. Given a compact set $M$, both the $\varepsilon$-entropy and the $\varepsilon$-capacity are non-increasing functions of $\varepsilon$.
Finally, we will use the following property of the entropy of partitions.
Let $\beta = \{B : B \in \mathcal{M}\}$ and $\gamma = \{C : C \in \mathcal{M}\}$ be two partitions of the set $M$. If each set $C \in \gamma$ is a subset of some set $B \in \beta$, then it is said that the partition $\gamma$ is a refinement of the partition $\beta$, which is written as $\gamma \ge \beta$.
Lemma 2 [12]. If $\beta \le \gamma$, then $H_\mu(\beta) \le H_\mu(\gamma)$.
Let $\beta \vee \gamma$ be the multiplication of the partitions $\beta$ and $\gamma$. Since each set $D \in \beta \vee \gamma$ is a subset of some set $B \in \beta$ and of some set $C \in \gamma$, the partition $\beta \vee \gamma$ is a refinement of both $\beta$ and $\gamma$, that is, $\beta \vee \gamma \ge \beta$ and $\beta \vee \gamma \ge \gamma$.
Thus, following Lemma 2, $H_\mu(\beta) \le H_\mu(\beta \vee \gamma)$ and $H_\mu(\gamma) \le H_\mu(\beta \vee \gamma)$.
For other properties of the entropy $H_\mu$ and its applications to the analysis of dynamical systems, see the paper [9] and the books [7,13,15].

3. Suggested Solution

Let $M$ be a compact subset of a metric space $R$ with metric $\nu$ and additive Lebesgue measure $\mu$ such that $\mu(M) = 1$; $\mathcal{M}$ is a $\sigma$-algebra of subsets of $R$.
Let $\beta = \{B_i \mid B_i \in \mathcal{M}\}$, $\bigcup_i B_i = M$, $B_i \cap B_j = \emptyset$ for $i \ne j$, $i, j \in \mathbb{N}$, be a partition of $M$. If the diameter $d(B_i) = \sup_{x,y \in B_i} \nu(x,y)$ of each $B_i \in \beta$ is not greater than $2\varepsilon$, then the partition $\beta$ is an $\varepsilon$-covering and is called an $\varepsilon$-partition.
Lemma 3. Let $\beta$ be an $\varepsilon$-partition of $M$. If the diameter $d(M) = \sup_{x,y \in M} \nu(x,y) = 1$ and the measure is $\mu(B) = \sup_{x,y \in B} \nu(x,y)$, $B \in \beta$, then $H_\mu(\beta) \ge H_\varepsilon(M)$.
Proof. Consider the entropy
$$H_\varepsilon(M) = \log_2 N_\varepsilon(M) = -\sum_{i=1}^{N_\varepsilon(M)} \frac{1}{N_\varepsilon(M)} \log_2 \frac{1}{N_\varepsilon(M)}.$$
Given $\varepsilon$, the minimal number $N_\varepsilon(M)$ of sets in $\beta$ is reached if the diameters $d(B)$ of its elements $B \in \beta$ are maximal, that is, $d(B) = 2\varepsilon$. This is possible either if $\frac{d(M)}{d(B)} = \frac{1}{2\varepsilon}$ subsets from $\beta$ completely cover $M$, or if $\left\lfloor \frac{d(M)}{d(B)} \right\rfloor$ subsets cover a part $M'$ of $M$ such that $d(M \setminus M') < 2\varepsilon$, and covering this part requires one additional subset; here $\lfloor x \rfloor$ denotes the maximal integer number smaller than or equal to $x$. In the latter case, the number of elements in $\beta$ is $\left\lfloor \frac{d(M)}{d(B)} \right\rfloor + 1$.
Then, if $\frac{1}{2\varepsilon}$ is an integer,
$$N_\varepsilon(M) = \frac{d(M)}{d(B)} = \frac{1}{2\varepsilon};$$
otherwise
$$N_\varepsilon(M) = \left\lfloor \frac{d(M)}{d(B)} \right\rfloor + 1 = \left\lfloor \frac{1}{2\varepsilon} \right\rfloor + 1.$$
Substituting these values into the definition of entropy, in the first case one obtains
$$H_\varepsilon(M) = -\sum_{i=1}^{1/2\varepsilon} \frac{1}{1/2\varepsilon} \log_2 \frac{1}{1/2\varepsilon},$$
and in the second case
$$H_\varepsilon(M) = -\sum_{i=1}^{\lfloor 1/2\varepsilon \rfloor + 1} \frac{1}{\lfloor 1/2\varepsilon \rfloor + 1} \log_2 \frac{1}{\lfloor 1/2\varepsilon \rfloor + 1} = -\sum_{i=1}^{\lfloor 1/2\varepsilon \rfloor} \frac{1}{\lfloor 1/2\varepsilon \rfloor + 1} \log_2 \frac{1}{\lfloor 1/2\varepsilon \rfloor + 1} - \frac{1}{\lfloor 1/2\varepsilon \rfloor + 1} \log_2 \frac{1}{\lfloor 1/2\varepsilon \rfloor + 1}.$$
Now consider the entropy
$$H_\mu(\beta) = -\sum_{B \in \beta} \mu(B) \log_2 \mu(B).$$
If the number $\frac{1}{2\varepsilon}$ is an integer, then
$$H_\mu(\beta) = -\sum_{i=1}^{1/2\varepsilon} 2\varepsilon \log_2 2\varepsilon = -\sum_{i=1}^{1/2\varepsilon} \frac{1}{1/2\varepsilon} \log_2 \frac{1}{1/2\varepsilon};$$
otherwise
$$H_\mu(\beta) = -\sum_{i=1}^{\lfloor 1/2\varepsilon \rfloor + 1} 2\varepsilon \log_2 2\varepsilon = -\sum_{i=1}^{\lfloor 1/2\varepsilon \rfloor} \frac{1}{\lfloor 1/2\varepsilon \rfloor} \log_2 \frac{1}{\lfloor 1/2\varepsilon \rfloor} - \frac{1}{\lfloor 1/2\varepsilon \rfloor} \log_2 \frac{1}{\lfloor 1/2\varepsilon \rfloor}.$$
Thus, if the number $\frac{1}{2\varepsilon}$ is an integer, then $H_\varepsilon(M) = H_\mu(\beta)$.
Consider the case of a non-integer $\frac{1}{2\varepsilon}$. For any $\varepsilon$ such that the number $\frac{1}{2\varepsilon}$ is not an integer, it holds that
$$\frac{1}{\lfloor 1/2\varepsilon \rfloor + 1} < \frac{1}{\lfloor 1/2\varepsilon \rfloor}.$$
Thus, because of the monotonicity of the $\log$ function, $H_\mu(\beta) \ge H_\varepsilon(M)$. ■
Note that Lemma 3 is also true for a measure $\mu$ and a metric $\nu$ such that $\mu(M) = d(M)$.
The next fact follows from the proof of Lemma 3.
Corollary 2. $H_\mu(\beta) = H_\varepsilon(M) + c(\varepsilon)$, where $c(\varepsilon) = -2\varepsilon \log_2 2\varepsilon$ monotonously increases with $\varepsilon$ up to a single maximum and then monotonously decreases.
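For completeness, the location of the single maximum of $c(\varepsilon)$ mentioned in Corollary 2 can be verified by elementary calculus (our remark, not part of the original statement):
$$\frac{d}{d\varepsilon}\bigl(-2\varepsilon \log_2 2\varepsilon\bigr) = -2\log_2 2\varepsilon - \frac{2}{\ln 2} = 0 \;\Longrightarrow\; \log_2 2\varepsilon = -\frac{1}{\ln 2} \;\Longrightarrow\; 2\varepsilon = e^{-1},$$
so $c(\varepsilon)$ increases for $2\varepsilon < 1/e \approx 0.37$ and decreases for $2\varepsilon > 1/e$ (recall that $d(M) = 1$, so $2\varepsilon \le 1$).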
In addition, let us consider the $\varepsilon$-entropy of the multiplication of partitions.
Let $\beta = \{B_i \mid B_i \in \mathcal{M}\}$ and $\gamma = \{C_j \mid C_j \in \mathcal{M}\}$ be two $\varepsilon$-partitions of the set $M$ and let $\beta \vee \gamma = \{D_k \mid D_k \in \mathcal{M}\}$ be the multiplication of the partitions $\beta$ and $\gamma$.
Lemma 4. Let $d(B_i) \le 2\varepsilon_B$, $B_i \in \beta$, and $d(C_j) \le 2\varepsilon_C$, $C_j \in \gamma$, and let $\varepsilon = \max(\varepsilon_B, \varepsilon_C)$. Then $\beta \vee \gamma$ is an $\varepsilon$-partition with $d(D_k) \le 2\varepsilon_D \le 2\max(\varepsilon_B, \varepsilon_C)$, $D_k \in \beta \vee \gamma$, and
$$H_{\varepsilon_D}(M) \ge H_{\min(\varepsilon_B, \varepsilon_C)}(M).$$
Proof. Since $\beta \vee \gamma \ge \beta$, it holds that $N_{\varepsilon_D}(M) \ge N_{\varepsilon_B}(M)$ and $H_{\varepsilon_D}(M) \ge H_{\varepsilon_B}(M)$; and since $\beta \vee \gamma \ge \gamma$, it holds that $N_{\varepsilon_D}(M) \ge N_{\varepsilon_C}(M)$ and $H_{\varepsilon_D}(M) \ge H_{\varepsilon_C}(M)$.
Assume that $\varepsilon_B \ge \varepsilon_C$. Then, since $N_{\varepsilon_B}(M) \le N_{\varepsilon_C}(M)$, it holds that $H_{\varepsilon_B}(M) \le H_{\varepsilon_C}(M)$, and consequently $H_{\varepsilon_D}(M) \ge H_{\varepsilon_C}(M) \ge H_{\varepsilon_B}(M)$.
On the other hand, if $\varepsilon_C \ge \varepsilon_B$, then $N_{\varepsilon_C}(M) \le N_{\varepsilon_B}(M)$ and $H_{\varepsilon_C}(M) \le H_{\varepsilon_B}(M)$. Thus $H_{\varepsilon_D}(M) \ge H_{\varepsilon_B}(M) \ge H_{\varepsilon_C}(M)$. ■
The suggested method is based on the concept of $\varepsilon$-information, which is defined as follows.
Let $\xi = \{X_i \mid i = 1, 2, \ldots, m\}$ be a partition of the set $M$ obtained by observations and let $\varepsilon_m > 0$ be a radius that provides the maximal $\varepsilon$-entropy $H_{\varepsilon_m}(M)$ with respect to $\xi$.
The radius $\varepsilon_m$ depends on the diameter of the set $M$ and on the structure of the partition $\xi$, and cannot be specified directly. Nevertheless, we can prove the following fact.
Lemma 5.
$$H_{\varepsilon_m}(M) = \log_2 m.$$
Proof. Denote by $\xi_{\varepsilon_m} = \{Y_j \mid j = 1, 2, \ldots, m_{\varepsilon_m}\}$ a partition of $M$ such that $d(Y_j) = \sup_{x,y \in Y_j} \nu(x,y) = 2\varepsilon_m$ for each $Y_j \in \xi_{\varepsilon_m}$. In other words, the partition $\xi_{\varepsilon_m}$ splits the set $M$ into subsets with equal diameters $2\varepsilon_m$.
Then $\varepsilon_m$ is a radius such that the average number of bits in the description of the elements of $\xi$ using the elements of $\xi_{\varepsilon_m}$ is maximal. Formally, it means that for the radius $\varepsilon_m$ it holds that
$$\frac{1}{n} H_\mu(\xi_{\varepsilon_m} \vee \xi) \to \max,$$
where $n = |\xi_{\varepsilon_m} \vee \xi|$ is the number of subsets in the partition $\xi_{\varepsilon_m} \vee \xi$.
Note that the number of elements in the partition $\xi_{\varepsilon_m} \vee \xi$ is $n \ge \max(m, m_{\varepsilon_m})$, and that the function $f(x) = \frac{1}{x} \log_2 x$ has a single maximum at the point $x = e$.
Hence, given the partition $\xi$ of size $m$, the maximum of the entropy $\frac{1}{n} H_\mu(\xi_{\varepsilon_m} \vee \xi)$ is reached for a partition $\xi_{\varepsilon_m}$ of size $m_{\varepsilon_m} = m$. The diameters of the elements $Y_j \in \xi_{\varepsilon_m}$ of this partition are
$$d(Y_j) = 2\varepsilon_m = \frac{d(M)}{m}, \quad j = 1, 2, \ldots, m.$$
Recall that the measure of the set $M$ is $\mu(M) = 1$. Since the partition $\xi_{\varepsilon_m}$ includes $m$ elements with equal diameters, the measures of these elements are $\mu(Y_j) = \frac{1}{m}$. Thus, the entropy of the partition $\xi_{\varepsilon_m}$ is
$$H_\mu(\xi_{\varepsilon_m}) = -\sum_{j=1}^{m} \mu(Y_j) \log_2 \mu(Y_j) = -\sum_{j=1}^{m} \frac{1}{m} \log_2 \frac{1}{m} = \log_2 m.$$
Finally, since the partition $\xi_{\varepsilon_m}$ includes elements with equal diameters $2\varepsilon_m$, this partition is an $\varepsilon$-partition of $M$, and for the given $\varepsilon_m$ its $\varepsilon$-entropy is
$$H_{\varepsilon_m}(M) = \log_2 m,$$
which is required. ■
Let $\beta = \{B : B \in \mathcal{M}\}$ be a partition of the set $M$. Similarly to Definition 6, denote by $N_\varepsilon(\beta)$ the number of sets $B \in \beta$ such that $d(B) \le 2\varepsilon$.
Definition 7. A value
$$H_\varepsilon(\beta) = \log_2 N_\varepsilon(\beta)$$
is called the $\varepsilon$-entropy of the partition $\beta$.
Let $H_{\varepsilon_m}(M)$ be the maximal $\varepsilon$-entropy of the set $M$ with respect to the observations $\xi = \{X_i \mid i = 1, 2, \ldots, m\}$ and let $\varepsilon > 0$ be a radius; recall that the radius $\varepsilon$ represents a certain precision of coding the set $M$ (see Definition 6 and the remarks after it). Then
$$H_{\varepsilon_m}(M) - H_\varepsilon(M)$$
is the difference between the minimal number of bits required to transmit the set $M$ with the precision $\varepsilon_m$ provided by the observations and the minimal number of bits required to transmit the set $M$ with the desired precision $\varepsilon$.
This difference can be negative, which indicates that the observations do not allow transmitting the set $M$ with the desired precision.
Similarly to the proof of Lemma 5, denote by $\xi_\varepsilon = \{Y \mid Y \in \mathcal{M}\}$ a partition of $M$ such that $d(Y) = \sup_{x,y \in Y} \nu(x,y) = 2\varepsilon$ for each $Y \in \xi_\varepsilon$. Then
$$H_{\varepsilon_m}(M) - H_\varepsilon(\xi \vee \xi_\varepsilon)$$
is the difference between the minimal number of bits required to transmit the set $M$ with the precision $\varepsilon_m$ provided by the observations and the number of bits required to transmit the observations $\xi$ of the set $M$ using the partition $\xi_\varepsilon$.
The sum of these differences results in the following definition.
Definition 8. A value
$$I_\varepsilon(M) = 2H_{\varepsilon_m}(M) - H_\varepsilon(M) - H_\varepsilon(\xi \vee \xi_\varepsilon)$$
is called the $\varepsilon$-information of the set $M$.
In the formula for the $\varepsilon$-information $I_\varepsilon(M)$, the value $H_{\varepsilon_m}(M)$ represents the minimal number of bits required to transmit the set $M$ with the maximal precision allowed by the observations $\xi$, the second term $H_\varepsilon(M)$ represents the minimal number of bits required to transmit the set $M$ with precision $\varepsilon$, and the last term $H_\varepsilon(\xi \vee \xi_\varepsilon)$ represents the number of bits required to transmit the results of the observations of $M$ measured using $\xi_\varepsilon$.
Thus, the value $I_\varepsilon(M)$ is the number of bits remaining after transmitting the observed part of the set $M$ with the precision $\varepsilon$.
Theorem 3. Let $\xi$ be the observations' partition of size $m$. Then the $\varepsilon$-information $I_\varepsilon(M)$, as a function of $\varepsilon$, increases and its upper bound is $\log_2 m$.
Proof. In the definition of the $\varepsilon$-information, the first term $2H_{\varepsilon_m}(M)$ is constant and the second term $H_\varepsilon(M)$, by Theorem 2, does not increase with increasing $\varepsilon$. Thus, it remains to prove that the last term $H_\varepsilon(\xi \vee \xi_\varepsilon)$ decreases with increasing $\varepsilon$.
Since the diameter of each $Z \in \xi \vee \xi_\varepsilon$ is $d(Z) \le 2\varepsilon$, the number $N_\varepsilon(\xi \vee \xi_\varepsilon)$ is equal to the number $n = |\xi \vee \xi_\varepsilon|$ of elements in the partition $\xi \vee \xi_\varepsilon$, and so the $\varepsilon$-entropy is
$$H_\varepsilon(\xi \vee \xi_\varepsilon) = \log_2 n.$$
The number $n$ of elements in the partition $\xi \vee \xi_\varepsilon$ decreases (not necessarily monotonously) with increasing $\varepsilon$ down to the number $m = |\xi|$ of elements in the partition $\xi$. Hence, the entropy $H_\varepsilon(\xi \vee \xi_\varepsilon)$ also decreases down to the value $H_\varepsilon(\xi \vee \xi_\varepsilon) = \log_2 m$, which results in an increase of the $\varepsilon$-information $I_\varepsilon(M)$ at maximum up to the value $2\log_2 m - \log_2 m = \log_2 m$. ■
Let us return to the initial problem.
Let $S = (M, \mathcal{M}, \mu, \nu, T)$ be a dynamical system, let $\xi = \{X_i \mid X_i \in \mathcal{M},\ i = 1, 2, \ldots, m\}$ be an observations' partition, and let $\xi_\varepsilon^n = \bigvee_{i=0}^{n} T^i \xi_\varepsilon$ and $\xi^n = \bigvee_{i=0}^{n} T^i \xi$.
Theorem 4. If $T$ is invertible and
$$I_\varepsilon(M) - E_\varepsilon(M) = 0,$$
then the Rokhlin distance
$$\mathrm{dist}(\xi_\varepsilon^n, \xi^n) = H_\mu(\xi_\varepsilon^n \mid \xi^n) + H_\mu(\xi^n \mid \xi_\varepsilon^n)$$
as a function of $\varepsilon$ is bounded.
Before proving the theorem, let us recall the following facts. Let $\beta = \{B_i\}$ and $\gamma = \{C_j\}$ be partitions of the set $M$.
Lemma 6 [12,13].
$$H_\mu(\beta \vee \gamma) = H_\mu(\beta \mid \gamma) + H_\mu(\gamma).$$
Lemma 7 [12]. If the endomorphism $T$ is invertible, then
$$H_\mu(T\beta \mid T\gamma) = H_\mu(\beta \mid \gamma).$$
Proof of Theorem 4. Following Lemma 7,
$$\mathrm{dist}(\xi_\varepsilon^n, \xi^n) = \mathrm{dist}(\xi_\varepsilon, \xi) = H_\mu(\xi_\varepsilon \mid \xi) + H_\mu(\xi \mid \xi_\varepsilon).$$
Applying Lemma 6 to each term, we obtain
$$\mathrm{dist}(\xi_\varepsilon, \xi) = 2H_\mu(\xi_\varepsilon \vee \xi) - H_\mu(\xi_\varepsilon) - H_\mu(\xi).$$
The last term $H_\mu(\xi)$ does not depend on $\varepsilon$; hence we are interested in the difference
$$2H_\mu(\xi_\varepsilon \vee \xi) - H_\mu(\xi_\varepsilon).$$
Consider the $\varepsilon$-information $I_\varepsilon(M)$. From Corollary 2 it follows that
$$H_{\varepsilon_m}(M) = H_\mu(\xi_{\varepsilon_m}) + c_1, \qquad H_\varepsilon(M) = H_\mu(\xi_\varepsilon) + c_2(\varepsilon), \qquad H_\varepsilon(\xi \vee \xi_\varepsilon) = H_\mu(\xi \vee \xi_\varepsilon) + c_3(\varepsilon),$$
where $c_1$ is constant, and $c_2$ and $c_3$ monotonously increase with $\varepsilon$ up to a single maximum and then monotonously decrease. Then
$$I_\varepsilon(M) = 2H_\mu(\xi_{\varepsilon_m}) - H_\mu(\xi_\varepsilon) - H_\mu(\xi \vee \xi_\varepsilon) + c(\varepsilon),$$
where $c(\varepsilon) = 2c_1 - c_2(\varepsilon) - c_3(\varepsilon)$ monotonously decreases down to a single minimum and then monotonously increases.
Similarly, for the $\varepsilon$-capacity it holds that
$$E_\varepsilon(M) = H_\mu(\xi_\varepsilon) + c_4(\varepsilon),$$
where $c_4(\varepsilon) = c_2(\varepsilon) + \mathrm{const}$ monotonously increases with $\varepsilon$ up to a single maximum and then monotonously decreases.
A zero difference between the $\varepsilon$-information and the $\varepsilon$-capacity leads to
$$2H_\mu(\xi_{\varepsilon_m}) + c_5(\varepsilon) = H_\mu(\xi \vee \xi_\varepsilon) + 2H_\mu(\xi_\varepsilon),$$
where $c_5(\varepsilon) = c(\varepsilon) - c_2(\varepsilon) - \mathrm{const}$ is bounded.
Since $2H_\mu(\xi_{\varepsilon_m})$ does not depend on $\varepsilon$,
$$H_\mu(\xi \vee \xi_\varepsilon) + 2H_\mu(\xi_\varepsilon) = c_5(\varepsilon) + \mathrm{const}.$$
Consequently, the difference
$$2H_\mu(\xi_\varepsilon \vee \xi) - H_\mu(\xi_\varepsilon)$$
is bounded and Theorem 4 is proven. ■
This theorem motivates formulating the problem of finding a value $\Delta = 2\varepsilon$ in terms of the $\varepsilon$-information and the $\varepsilon$-capacity of the set $M$.
Problem. Find a radius $\varepsilon$ such that
$$I_\varepsilon(M) - E_\varepsilon(M) \to \min.$$
Calculation of the $\varepsilon$-information and the $\varepsilon$-capacity is possible only for certain sets $M$. An example of such calculations for evenly distributed observations over an interval of real numbers is presented in Section 4. However, an algorithmic solution of the problem is rather simple.
Below we present an algorithm for finding $\varepsilon$ for an interval $M = [x_0, x_m] \subset R$ of real numbers, which can be directly extended to a subset of any lattice with an appropriate $\sigma$-algebra.
Algorithm 1. Computing the radius $\varepsilon$
Input:  Set $M = [x_0, x_m] \subset R$;
     observations' partition $\xi = \{X_1, X_2, \ldots, X_m\}$, $X_{i+1} = [x_i, x_{i+1}]$, $x_i < x_{i+1}$, $i = 0, 1, 2, \ldots, m-1$;
     step $s > 0$.
Output: Radius $\varepsilon$.
1. Calculate $\varepsilon_m = (x_m - x_0)/2m$.
2. Calculate $H_{\varepsilon_m}(M) = \log_2 m$.
3. For $\varepsilon = \varepsilon_m$ to $(x_m - x_0)/2$ with step $s$ do:
4.   Calculate $H_\varepsilon(M) = \log_2 \lceil (x_m - x_0)/2\varepsilon \rceil$.
5.   Create a partition $\xi_\varepsilon = \{Y_1, Y_2, \ldots, Y_{m_\varepsilon}\}$ of $M$ such that
       $Y_{j+1} = [y_j, y_{j+1}]$, $y_j < y_{j+1}$, $y_0 = x_0$, $y_{m_\varepsilon} = x_m$, and $y_{j+1} - y_j = 2\varepsilon$, $j = 0, 1, 2, \ldots, m_\varepsilon - 1$.
6.   Compute $H_\varepsilon(\xi \vee \xi_\varepsilon) = \mathrm{eps\_entropy}(\xi, \xi_\varepsilon)$.
7.   Calculate $I_\varepsilon(M) = 2H_{\varepsilon_m}(M) - H_\varepsilon(M) - H_\varepsilon(\xi \vee \xi_\varepsilon)$.
8.   Compute $E_\varepsilon(M) = \mathrm{eps\_capacity}(\xi, \varepsilon)$.
9.   If $I_\varepsilon(M) > E_\varepsilon(M)$ then
10.     Break.
11.   End if.
12. End for.
13. Return $\varepsilon$.
The algorithm includes two functions, $\mathrm{eps\_entropy}(\xi, \xi_\varepsilon)$ and $\mathrm{eps\_capacity}(\xi, \varepsilon)$, which are defined as follows.
Function 1.  $\mathrm{eps\_entropy}(\xi, \xi_\varepsilon)$
Input:  Partition $\xi = \{X_1, X_2, \ldots, X_m\}$, $X_{i+1} = [x_i, x_{i+1}]$, $x_i < x_{i+1}$, $i = 0, 1, 2, \ldots, m-1$.
      Partition $\xi_\varepsilon = \{Y_1, Y_2, \ldots, Y_{m_\varepsilon}\}$, $Y_{j+1} = [y_j, y_{j+1}]$, $y_j < y_{j+1}$, $j = 0, 1, 2, \ldots, m_\varepsilon - 1$,
       $y_0 = x_0$, $y_{m_\varepsilon} = x_m$, and $y_{j+1} - y_j = 2\varepsilon$, $j = 0, 1, 2, \ldots, m_\varepsilon - 2$.
Output: $\varepsilon$-entropy $H_\varepsilon(\xi \vee \xi_\varepsilon)$ of the partition $\xi \vee \xi_\varepsilon$.
1. From the partitions $\xi$ and $\xi_\varepsilon$ create, respectively, the sets of points
       $X = \{x_0, x_1, x_2, \ldots, x_m\}$ and $Y = \{y_0, y_1, y_2, \ldots, y_{m_\varepsilon}\}$.
2. Join the sets $X$ and $Y$: $X_{\mathrm{joint}} = X \cup Y$.
3. Find the number $N(X_{\mathrm{joint}})$ of elements in the set $X_{\mathrm{joint}}$.
4. Set $N_\varepsilon(X_{\mathrm{joint}}) = N(X_{\mathrm{joint}}) - 1$.
5. Set $H_\varepsilon(\xi \vee \xi_\varepsilon) = \log_2 N_\varepsilon(X_{\mathrm{joint}})$.
6. Return $H_\varepsilon(\xi \vee \xi_\varepsilon)$.
The function $\mathrm{eps\_entropy}$ can be implemented by concatenating the two sets of points and then removing duplicate elements.
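A minimal Python sketch of Function 1 under the stated assumptions (the partitions are passed as their sorted breakpoints $x_0, \ldots, x_m$ and $y_0, \ldots, y_{m_\varepsilon}$; the name eps_entropy follows the pseudocode above):

from math import log2

def eps_entropy(x_points, y_points):
    # epsilon-entropy H_eps(xi v xi_eps) of the multiplication of two interval
    # partitions given by their breakpoints (steps 1-5 of Function 1)
    joint = sorted(set(x_points) | set(y_points))   # X_joint = X union Y
    n_eps = len(joint) - 1                          # number of cells in xi v xi_eps
    return log2(n_eps)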
Function 2.  $\mathrm{eps\_capacity}(\xi, \varepsilon)$
Input:  Partition $\xi = \{X_1, X_2, \ldots, X_m\}$, $X_{i+1} = [x_i, x_{i+1}]$, $x_i < x_{i+1}$, $i = 0, 1, 2, \ldots, m-1$.
        Radius $\varepsilon > 0$.
Output:   $\varepsilon$-capacity $E_\varepsilon(M)$.
1. From the partition $\xi$ create the set of points $X = \{x_0, x_1, x_2, \ldots, x_m\}$.
2. If $x_m - x_0 \le \varepsilon$ then
3.   Set $M_\varepsilon(M) = 1$.
4. Else
5.   Set $M_\varepsilon(M) = 2$.
6.   Set $j = 0$.
7.   For $i = 1$ to $m - 1$ do:
8.     If $x_i - x_j \le \varepsilon$ or $x_m - x_i \le \varepsilon$ then
9.       Continue.
10.     Else
11.       Set $M_\varepsilon(M) = M_\varepsilon(M) + 1$.
12.       Set $j = i$.
13.     End if.
14.   End for.
15. End if.
16. Set $E_\varepsilon(M) = \log_2 M_\varepsilon(M)$.
17. Return $E_\varepsilon(M)$.
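A minimal Python sketch of Function 2 under the same assumptions (observed points $x_0 < x_1 < \cdots < x_m$; the name eps_capacity follows the pseudocode above):

from math import log2

def eps_capacity(x_points, eps):
    # epsilon-capacity E_eps(M): greedily count a maximal eps-distinguishable
    # subset of the observed points, always keeping both endpoints (steps 2-16)
    x = sorted(x_points)
    if x[-1] - x[0] <= eps:
        m_eps = 1                      # no eps-distinguishable pair exists
    else:
        m_eps = 2                      # both endpoints x_0 and x_m are kept
        last = x[0]
        for xi in x[1:-1]:
            if xi - last <= eps or x[-1] - xi <= eps:
                continue               # too close to the last kept point or to x_m
            m_eps += 1
            last = xi
    return log2(m_eps)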
The time complexity $C$ of Algorithm 1 includes the following terms: $O(1)$ for lines 1-4; $O(m)$ for line 5; $O(m \log m)$ for line 6; $O(1)$ for line 7; $O(m)$ for line 8; and $O(1)$ for lines 9-13. Then, the time complexity of each iteration of the algorithm is $O(m \log m)$. The maximal number of iterations is $n = (x_m - x_0)/2s$; hence the complexity of Algorithm 1 is
$$C = n \times O(m \log m).$$
Convergence of Algorithm 1 is guaranteed by the fact, indicated above, that the $\varepsilon$-information $I_\varepsilon(M)$ increases with increasing $\varepsilon$ while the $\varepsilon$-capacity $E_\varepsilon(M)$ decreases with increasing $\varepsilon$. Since the interval $M = [x_0, x_m]$ is bounded, the difference between the increasing $\varepsilon$-information $I_\varepsilon(M)$ and the decreasing $\varepsilon$-capacity $E_\varepsilon(M)$ attains its minimum in $M$, which is the terminating point of the algorithm.
Algorithm 1 returns the radius $\varepsilon$ for which the difference $I_\varepsilon(M) - E_\varepsilon(M)$ between the $\varepsilon$-information and the $\varepsilon$-capacity is minimal. Consequently, the partition $\xi_\varepsilon = \{Y \mid Y \in \mathcal{M}\}$, in which the diameters of the sets $Y \in \xi_\varepsilon$ are $d(Y) = 2\varepsilon$, is the required partition.
The dependence of the functions $I_\varepsilon(M)$ and $E_\varepsilon(M)$ on the diameter $2\varepsilon$ is illustrated in Figure 1. In this example, $M = [x_0, x_m]$, $x_0 = 0$, $x_m = 100$, and in the observations' partition $\xi = \{X_1, X_2, \ldots, X_m\}$, $X_{i+1} = [x_i, x_{i+1}]$, $x_i < x_{i+1}$, $i = 0, 1, 2, \ldots, m-1$, the points $x_i$ are evenly distributed on the interval $M$.
The computed radius is $\varepsilon \approx 7.1$ and the diameter is $d = 2\varepsilon \approx 14.2$. The values of the $\varepsilon$-information and $\varepsilon$-capacity are $I_\varepsilon(M) = E_\varepsilon(M) \approx 3.7$ bits.
Note that the accuracy of computing the radius $\varepsilon$ increases as the step $s$ decreases. A Python sketch combining Algorithm 1 with Functions 1 and 2 is given below.
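Combining the two functions, Algorithm 1 itself can be sketched in a few lines of Python. This is our illustrative implementation of the pseudocode above: it relies on the eps_entropy and eps_capacity sketches given after Functions 1 and 2, the helper make_xi_eps builds the equal-length partition of step 5, and the loop returns the first $\varepsilon$ at which the $\varepsilon$-information exceeds the $\varepsilon$-capacity.

from math import ceil, log2

def make_xi_eps(x0, xm, eps):
    # breakpoints y_0 = x0, y_1, ..., y_{m_eps} = xm of cells of length 2*eps (step 5)
    y, step = [x0], 2 * eps
    while y[-1] + step < xm:
        y.append(y[-1] + step)
    y.append(xm)
    return y

def compute_radius(x_points, s):
    # Algorithm 1: scan eps from eps_m with step s until I_eps(M) > E_eps(M)
    x = sorted(x_points)
    x0, xm, m = x[0], x[-1], len(x) - 1
    eps_m = (xm - x0) / (2 * m)                         # step 1
    h_eps_m = log2(m)                                   # step 2
    eps = eps_m
    while eps <= (xm - x0) / 2:                         # step 3
        h_eps = log2(ceil((xm - x0) / (2 * eps)))       # step 4
        y = make_xi_eps(x0, xm, eps)                    # step 5
        i_eps = 2 * h_eps_m - h_eps - eps_entropy(x, y) # steps 6-7
        e_eps = eps_capacity(x, eps)                    # step 8
        if i_eps > e_eps:                               # steps 9-10
            break
        eps += s
    return eps

# Roughly the setup of Figure 1: evenly spread points on [0, 100], step s = 1
points = [float(i) for i in range(101)]
print(compute_radius(points, 1.0))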

4. Numerical Examples

Let $M = [x_0, x_m] \subset R$ be an interval of real numbers, $x_m > x_0$. Given $\varepsilon > 0$, the minimal number of sets in an $\varepsilon$-covering $\alpha$ of the interval $M$ is
$$N_\varepsilon(M) = \left\lceil \frac{x_m - x_0}{2\varepsilon} \right\rceil.$$
Then the $\varepsilon$-entropy of the set $M$ is
$$H_\varepsilon(M) = \log_2 N_\varepsilon(M) = \log_2 \left\lceil \frac{x_m - x_0}{2\varepsilon} \right\rceil.$$
Let $\varepsilon_m$ be the minimal value of $\varepsilon$ which provides the maximal $\varepsilon$-entropy
$$H_{\varepsilon_m}(M) = \log_2 \left\lceil \frac{x_m - x_0}{2\varepsilon_m} \right\rceil$$
of the interval $M$. Following Lemma 5,
$$H_{\varepsilon_m}(M) = \log_2 m;$$
thus
$$\varepsilon_m = \frac{1}{2} \cdot \frac{x_m - x_0}{m}.$$
Let
$$\xi = \{X_1, X_2, \ldots, X_m\}, \quad X_{i+1} = [x_i, x_{i+1}],\ x_i < x_{i+1},\ i = 0, 1, 2, \ldots, m-1,$$
be an observations' partition of the interval $M$ and let
$$\xi_\varepsilon = \{Y_1, Y_2, \ldots, Y_{m_\varepsilon}\}, \quad Y_{j+1} = [y_j, y_{j+1}],\ y_j < y_{j+1},\ y_0 = x_0,\ y_{m_\varepsilon} = x_m,$$
$$y_{j+1} - y_j = 2\varepsilon,\ j = 0, 1, 2, \ldots, m_\varepsilon - 1,$$
be a partition of the interval $M$ such that all its elements are of length $2\varepsilon$.
Then the $\varepsilon$-information of the interval $M$ is
$$I_\varepsilon(M) = 2\log_2 m - \log_2 \left\lceil \frac{x_m - x_0}{2\varepsilon} \right\rceil - H_\varepsilon(\xi \vee \xi_\varepsilon).$$
The entropy $H_\varepsilon(\xi \vee \xi_\varepsilon)$ depends on the distribution of the values $x_i$, $i = 0, 1, 2, \ldots, m$, over the interval $M$.
Assume that these values are distributed evenly and $\xi \le \xi_\varepsilon$. Then $\xi \vee \xi_\varepsilon = \xi_\varepsilon$ and
$$H_\varepsilon(\xi \vee \xi_\varepsilon) = H_\varepsilon(M) = \log_2 \left\lceil \frac{x_m - x_0}{2\varepsilon} \right\rceil.$$
Note that in general this equivalence does not hold, and the calculation of the entropy of the multiplication of partitions is processed according to Function 1 (see Section 3).
Similarly, the $\varepsilon$-capacity $E_\varepsilon(M)$ depends on the distribution of the values $x_i$, $i = 0, 1, 2, \ldots, m$, over the interval $M$. If these values are distributed evenly such that $x_{j+1} - x_j = x_{j+2} - x_{j+1}$ and $x_{j+1} - x_j > \varepsilon$ for any $j = 0, 1, 2, \ldots, m-2$, then
$$M_\varepsilon(M) = \left\lceil \frac{x_m - x_0}{\varepsilon} \right\rceil$$
and
$$E_\varepsilon(M) = \log_2 M_\varepsilon(M) = \log_2 \left\lceil \frac{x_m - x_0}{\varepsilon} \right\rceil.$$
If the distribution of the values $x_i$ is such that $x_{m-1} - x_0 \le \varepsilon$, which means that all the values except $x_m$ are located between $x_0$ and $x_{m-1}$, and $x_m - x_0 > \varepsilon$, then
$$M_\varepsilon(M) = 2$$
and
$$E_\varepsilon(M) = \log_2 2 = 1.$$
If $x_m - x_0 \le \varepsilon$, then the interval $M$ does not contain $\varepsilon$-distinguishable subsets, and we assume that
$$M_\varepsilon(M) = 1$$
and
$$E_\varepsilon(M) = \log_2 1 = 0.$$
In the general case, the $\varepsilon$-capacity is calculated using Function 2 (see Section 3).
Finally, using the formulated criterion (see the Problem in Section 3), for evenly distributed values $x_i$ we obtain (for simplicity we omit the ceiling)
$$I_\varepsilon(M) - E_\varepsilon(M) = 2\log_2 m - 2\log_2 \frac{x_m - x_0}{2\varepsilon} - \log_2 \frac{x_m - x_0}{\varepsilon} = \log_2 m^2 - \log_2 \frac{(x_m - x_0)^3}{4\varepsilon^3}.$$
Thus, it is required to find a value $\varepsilon$ such that
$$\frac{(x_m - x_0)^3}{4\varepsilon^3} = m^2,$$
which gives
$$\varepsilon = \frac{x_m - x_0}{\sqrt[3]{4m^2}}.$$
Consequently, the diameter $d(Y_j)$ of the intervals $Y_j$, $j = 1, 2, \ldots, m_\varepsilon$, in the required partition $\xi_\varepsilon = \{Y_1, Y_2, \ldots, Y_{m_\varepsilon}\}$ is
$$d(Y_j) = \frac{2(x_m - x_0)}{\sqrt[3]{4m^2}}.$$
To illustrate the suggested method, let us calculate the diameters $d(Y_j)$, $j = 1, 2, \ldots, m_\varepsilon$, of the intervals in the partition $\xi_\varepsilon$ and compare the obtained results with the bin lengths provided by the known methods appearing in the literature.
Let $M = [0, 10]$ and $\xi = \{X_1, X_2, \ldots, X_{10}\}$, $X_1 = [0, 1]$, $X_2 = [1, 2]$, ..., $X_{10} = [9, 10]$. Then
$$d(Y_j) = \frac{2(10 - 0)}{\sqrt[3]{4 \times 11^2}} = 2.55$$
for all $j = 1, \ldots, m_\varepsilon = 4$. For this interval $M$, the bin lengths calculated by the known methods used for plotting histograms are the following:
- the simplest rule: $d_1 = \frac{\max A - \min A}{\sqrt{m}} = \frac{10 - 0}{\sqrt{11}} = 3.01$;
- the Sturges rule [14]: $d_2 = \frac{\max A - \min A}{\log_2 m + 1} = \frac{10 - 0}{\log_2 11 + 1} = 2.24$;
- the Scott rule [11]: $d_3 = \frac{3.49\, s}{\sqrt[3]{m}} = \frac{3.49 \times 3.32}{\sqrt[3]{11}} = 5.20$;
- the Freedman-Diaconis rule [3]: $d_4 = \frac{2(Q_3 - Q_1)}{\sqrt[3]{m}} = \frac{2 \times (8 - 3)}{\sqrt[3]{11}} = 4.94$.
It is seen that in this example with evenly distributed observations, the diameter calculated by the suggested method is compatible with the bin lengths calculated using the known rules; a short numerical sketch of this comparison is given below.
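The comparison above can be reproduced numerically. The following Python sketch (using numpy, with our own helper names; note that numpy's default quantile convention may give a slightly different Freedman-Diaconis value than the one quoted above) evaluates the closed-form diameter of the suggested method for evenly distributed data together with the standard bin-width rules:

import numpy as np

def suggested_diameter(x):
    # d(Y_j) = 2 (x_m - x_0) / (4 m^2)^(1/3), closed form for evenly distributed data
    m = len(x)                     # number of observed points, as in the example above
    return 2 * (x.max() - x.min()) / (4 * m ** 2) ** (1 / 3)

def known_rules(x):
    m, rng, s = len(x), x.max() - x.min(), x.std(ddof=1)
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "simplest":          rng / np.sqrt(m),
        "Sturges":           rng / (np.log2(m) + 1),
        "Scott":             3.49 * s / m ** (1 / 3),
        "Freedman-Diaconis": 2 * (q3 - q1) / m ** (1 / 3),
    }

x = np.arange(0.0, 11.0)           # the points 0, 1, ..., 10 of the example above
print(suggested_diameter(x))       # ~2.55
print(known_rules(x))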
Now let us consider examples with different distributions of the observed values $x_i$, $i = 0, 1, 2, \ldots, m$, and thus different observations' partitions $\xi = \{X_i \mid X_i \in \mathcal{M}\}$. In all examples we assume that $M = [x_0, x_m]$, $x_0 = 0$, $x_m = 100$, and $m = 100$.
In these examples, we used the uniform distribution with $a = 0$ and $b = x_m$, the normal distribution with $\mu = x_m/2$ and $\sigma = x_m/6$, and the exponential distribution with $\mu = 2$. The diameters $d$ calculated by the suggested method and the bin lengths $d_1, \ldots, d_4$ calculated by the known rules are the following:
- evenly distributed data: $d = 14.21$, $d_1 = 9.90$, $d_2 = 12.95$, $d_3 = 21.81$, and $d_4 = 21.54$;
- uniform distribution with $a = 0$ and $b = x_m$: $d = 14.11$, $d_1 = 9.83$, $d_2 = 12.87$, $d_3 = 22.69$, and $d_4 = 22.65$;
- normal distribution with $\mu = x_m/2$ and $\sigma = x_m/6$: $d = 11.91$, $d_1 = 8.27$, $d_2 = 10.82$, $d_3 = 13.11$, and $d_4 = 10.38$;
- exponential distribution with $\mu = 2$: $d = 1.20$, $d_1 = 1.07$, $d_2 = 1.41$, $d_3 = 1.35$, and $d_4 = 0.91$.
It is seen that the suggested method results in diameters $d$ that are close to the bin lengths $d_1, \ldots, d_4$ provided by the conventional methods, with respect to the distribution of the data. For evenly and uniformly distributed data the diameter $d$ is close to the bin length $d_2$ given by the Sturges rule; for the normal distribution $d_2 < d < d_3$, and for the exponential distribution $d_1 < d < d_2$.
Histograms of these data with the bin lengths $d_3$ calculated using the Scott rule and with the bin lengths $d$ calculated by the suggested method are shown in Figure 2.
Finally, let us consider a specific distribution which appears in the analysis of queues [4].
Consider a clerk serving the clients of some office during a day. The arrival rate $\lambda$ and the departure rate $\mu$ measured per day describe the state of the system at the end of the day, but do not provide any information about the system during the day.
On the other hand, the rates $\lambda$ and $\mu$ measured per minute are also useless, since the clients do not arrive and the clerk does not serve at such rates.
Thus, it is required to define the correct time units per which the arrival and service rates should be specified. In this form the problem was formulated by Yaakov Reis [8].
Assume that the office where the above-mentioned clerk works serves 480 clients during an 8-hour day. Also, assume that the clients arrive in three "waves": in the morning, at midday, and in the evening.
For this data, the diameter $d$ calculated by the suggested method and the bin lengths $d_1, \ldots, d_4$ calculated by the known rules are the following:
$d = 22.0$, $d_1 = 21.91$, $d_2 = 48.45$, $d_3 = 67.99$, and $d_4 = 66.41$.
The histogram of this data with the bin length $d_3$ calculated using the Scott rule is shown in Figure 3a, and the histogram with the bin length $d$ calculated by the suggested method is shown in Figure 3b.
The dependence of the functions $I_\varepsilon(M)$ and $E_\varepsilon(M)$ on $\varepsilon$ for this distribution is shown in Figure 3c.
It is seen that for this data the suggested method results in a diameter $d$ close to the bin length $d_1$ provided by the simplest rule. Following this result, the arrival rates during a day should be calculated every 22 minutes, while by the Sturges rule they would be calculated every 48 minutes, by the Scott rule every 68 minutes, and by the Freedman-Diaconis rule every 66 minutes.

5. Conclusion

In the paper, we suggested a method for the discretization of a dynamical system based on its observations. The method results in the partition $\xi_\varepsilon$, which represents the system and is as close as possible to the observations' partition $\xi$.
The method utilizes the Kolmogorov and Tikhomirov $\varepsilon$-entropy and $\varepsilon$-capacity and the Rokhlin distance between partitions.
Along with the theoretical results, the method is presented in the form of a ready-to-use algorithm.
The suggested method is also useful for defining the bin lengths of histograms, especially for datasets with multimodal distributions.

Funding

This research has not received any grant from funding agencies in the public, commercial, or non-profit sectors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Billingsley, P. Ergodic Theory and Information. John Wiley and Sons: New York, 1965.
  2. Dinaburg, E. I. On the relations among various entropy characteristics of dynamical systems. Math. USSR Izvestija, 1971, 5(2), 337-378.
  3. Freedman, D.; Diaconis, P. On the histogram as a density estimator: L2 theory. Zeit. Wahrscheinlichkeitstheorie und Verwandte Gebiete, 1981, 57, 453-476.
  4. Gross, D.; Shortle, J. F.; Thompson, J. M.; Harris, C. M. Fundamentals of Queueing Theory, 4th ed. John Wiley & Sons: Hoboken, NJ, 2008.
  5. Jacod, J.; Protter, P. Discretization of Processes. Springer: Berlin, 2012.
  6. Kolmogorov, A. N.; Tikhomirov, V. M. ε-entropy and ε-capacity of sets in functional spaces. Amer. Mathematical Society Translations, Ser. 2, 1961, 17, 277-364.
  7. Martin, N.; England, J. Mathematical Theory of Entropy. Cambridge University Press: Cambridge, UK, 1984.
  8. Reis, Y. Private conversation. Ariel University, Ariel, March 2024.
  9. Rokhlin, V. A. New progress in the theory of transformations with invariant measure. Russian Mathematical Surveys, 1960, 15(4), 1-22.
  10. Schwarz, G. Estimating the dimension of a model. Annals of Statistics, 1978, 6(2), 461-464.
  11. Scott, D. W. On optimal and data-based histograms. Biometrika, 1979, 66, 605-610.
  12. Sinai, Y. G. Introduction to Ergodic Theory. Princeton University Press: Princeton, 1976.
  13. Sinai, Y. G. Topics in Ergodic Theory. Princeton University Press: Princeton, 1993.
  14. Sturges, H. The choice of a class-interval. J. Amer. Statistics Association, 1926, 21, 65-66.
  15. Vitushkin, A. G. Theory of Transmission and Processing of Information. Pergamon Press: New York, 1961.
Figure 1. Dependence of the $\varepsilon$-information $I_\varepsilon(M)$ and the $\varepsilon$-capacity $E_\varepsilon(M)$ on the diameter $d = 2\varepsilon$ of the sets $Y \in \xi_\varepsilon$ for the partition $\xi = \{X_1, X_2, \ldots, X_m\}$, $X_{i+1} = [x_i, x_{i+1}]$, $x_i < x_{i+1}$, $i = 0, 1, 2, \ldots, m-1$, where $m = 100$ points are evenly distributed on the interval $M = [x_0, x_m]$, $x_0 = 0$, $x_m = 100$. The step is $s = 1$.
Figure 2. Histograms plotted using the bin lengths $d_3$ calculated using the Scott rule (panels (a) for each distribution) and using the bin lengths $d$ calculated using the suggested method (panels (b) for each distribution).
Figure 3. Arrivals of the clients during a day: (a) histogram with the bin length computed by the Scott rule; (b) histogram with the bin length computed by the suggested algorithm; (c) dependence of the $\varepsilon$-information $I_\varepsilon(M)$ and the $\varepsilon$-capacity $E_\varepsilon(M)$ on the diameter $d = 2\varepsilon$.