Preprint
Article

This version is not peer-reviewed.

Fitting Copulas with Maximal Entropy

A peer-reviewed article of this preprint also exists.

Submitted: 14 January 2025
Posted: 14 January 2025

Abstract
We deal with two-dimensional copulas from the perspective of their entropy. We formulate the problem of finding a copula with maximum entropy when some of its values are given. As expected, the solution is a copula with a piecewise constant density (a checkerboard copula). This allows us to reduce the optimisation of a continuous objective function, the entropy, to the optimisation of finitely many density values. We present several ideas that simplify this problem; it admits a feasible numerical solution. We also present several instances which admit closed-form solutions.

1. Formulation of the task

1.1. Motivation

Copulas are a successful tool for the description of any dependence of continuous random variables. Based on empirical data, we may try to find a copula best describing the underlying distribution.
The standard approach to fitting the parametric copula classes is widely used across various copula applications. The hidden assumption behind this is the knowledge of the dependence model for which we try to estimate the parameters. However, there are many applications where this approach is not desirable because of the lack of knowledge, insufficient sample size, or a high dimension of the task. Then we want to find a model based on general copulas. When there is no other information than the sample, the empirical or discrete copulas and their continuous extensions can be used. Because of the incomplete information of copula values in between the known values, the maximum entropy principle can also be applied.
We study the problem of finding a joint probability distribution from partial information, when we are given
(I1)
the marginals,
(I2)
the values of the joint cumulative distribution function (joint cdf) at finitely many points.
This refers to the situation when we have (or assume to have) complete information about the marginals, but we do not know their dependence. The sample is discrete, and it can possibly be discretized to a coarser scale. Thus we know the desired values of the copula only on a 2D rectangular grid of points resulting from the discretization. This task has a standard solution (a checkerboard copula, see below). However, some of these constraints may be missing for various reasons. In particular, we may have too few elements of the sample (possibly none) in some 2D intervals, so we do not consider it a reliable support for modelling the whole copula. We investigate this case.

1.2. Criteria of Optimality

Suppose that we have two continuous random variables, $X, Y$, with known cumulative distribution functions $F_X, F_Y$. They need not be independent, and their dependence can be specified, e.g., by their joint cdf, $F_{X,Y}$. According to Sklar's theorem (see [1,2]), it can be determined also by the respective copula, $C_{X,Y}$. This is the joint cumulative distribution function of the transformed random variables $F_X(X), F_Y(Y)$, which have the uniform distribution on the interval $[0,1]$,
$$C_{X,Y}(x,y) = F_{F_X(X),F_Y(Y)}(x,y) = F_{X,Y}(s,t), \tag{1}$$
where $s, t$ are such that $F_X(s) = x$, $F_Y(t) = y$. The copula $C_{X,Y}$ gives "pure" information about the dependence of $X, Y$, independently of the marginal distributions. It allows the expression of an arbitrary dependence between two continuous random variables.
Definition 1. 
If there is a function $c_{X,Y}$ such that the copula can be expressed as its integral,
$$C_{X,Y}(\bar{x},\bar{y}) = \int_0^{\bar{x}} \int_0^{\bar{y}} c_{X,Y}(x,y)\,dx\,dy,$$
it is called a density of the copula $C_{X,Y}$. The differential entropy of the copula $C_{X,Y}$ is the (Shannon) entropy [3] of its density, given by
$$\mathrm{Ent}(c_{X,Y}) = \int_0^1 \int_0^1 \eta\bigl(c_{X,Y}(x,y)\bigr)\,dx\,dy,$$
where
$$\eta(w) := \begin{cases} -w \ln w & \text{if } w > 0, \\ 0 & \text{if } w = 0. \end{cases}$$
Remark 1. 
The differential entropy of a copula does not exist if the copula is not absolutely continuous, i.e., it does not have a density. Our proposed solution avoids this problem.

1.3. Related Work and History of the Problem

One of the earliest pioneers in studying multivariate distributions was Claude E. Shannon, who introduced his groundbreaking work on information theory in [3]. Later, in 1959, Abe Sklar introduced copulas in his seminal work, which laid the foundation for the study of dependency structures in multivariate distributions [2].
The principle of maximum entropy in the choice of a distribution was well defended by Jaynes [4]. Several approaches to this aim have been proposed.
The study of copulas in relation to entropy began to gain attention around 2010, with foundational work by Pougaza et al. [5,6]. They give a nice overview and motivation of the maximum entropy principle. They maximize the differential entropy of the original joint cdf, while we propose to maximize the differential entropy of the corresponding copula.
The early contributions were primarily theoretical. Later on, Ma and Sun [7] introduced the use of the maximum entropy principle to estimate mutual information via copulas. Subsequently, Singh and Zhang [8] expanded the application of copulas connected to entropy for multivariate stochastic modeling in water engineering.
Piantadosi et al. [9] (resp. [10]) found bivariate (resp. multivariate) copulas with maximal differential entropy under the knowledge of
(I1)
the marginals,
(I3)
the grade correlation coefficients.
Their solutions are checkerboard copulas.
Recently Lin et al. [11] applied the principle of maximum entropy and found checkerboard copulas as solutions. Their approach differs in the aspect that they start from distributions whose marginals are not continuous; the discontinuities determine the checkerboard structure. The difficulties of extension of the copula approach to distributions with discontinuous marginals are presented by Genest et al. [12].
While the literature on this topic remains relatively limited, this leaves considerable room for further theoretical advancements and potential new applications.
None of the preceding approaches restricts solutions by prescribed values at some points (I2). Sometimes the values of the joint cdf (and hence also of the copula) at some points are known. Such constraints can originate, e.g., from a discretization of a continuous scale. We want to choose a distribution fitting to these constraints.
Let us choose the distribution whose copula has maximal differential entropy (among those fitting to the constraints). The principle of maximal entropy was successfully used in many other tasks. It expresses our intention “not to include more information than that contained in the constraints.” Choosing this option, we try not to introduce any other dependence than that which follows from the given restrictions. This seems to be a natural criterion for the choice of the model fitting to constraints.
The result can be equivalently described by the joint cdf or by the copula. The differential entropy of the joint cdf suffers from some drawbacks; e.g., it is not invariant under a change of scale: the entropies of the random variables $X$ and $cX$, $c \neq 1$, are different. In contrast, the copula remains the same even if the random variables are transformed by arbitrary increasing functions $f, g$:
$$C_{X,Y}(x,y) = C_{f(X),g(Y)}(x,y).$$
Thus the differential entropy of a copula is a well-defined notion which can serve as a criterion for the choice of the model.
In this paper, we formulate the task of finding a copula with maximal differential entropy fitting the constraints. We convert it to a finite-dimensional optimization problem. As one of the main contributions, we prove that the result is independent of some inputs. This further simplifies the task. It leads to a system of higher-order polynomial equations which allows for a numerical solution using convex optimization. We show that in the simplest cases, an analytical solution is also feasible.

1.4. Notation and Formulation of the Problem

Before formulating the task solved in this paper, let us introduce our notation. We restrict attention to 2-copulas, i.e., binary functions describing the dependence of two random variables. We are trying to find a copula, $C_{X,Y}\colon [0,1]^2 \to [0,1]$. As we shall not deal with another copula in the sequel, we shall denote it briefly by $C$. For readers not familiar with the notion of a copula, it can be any joint cdf with marginals uniformly distributed on $[0,1]$. Copulas are characterized by the following necessary and sufficient conditions:
$$C(w,0) = C(0,w) = 0, \tag{2}$$
$$C(w,1) = C(1,w) = w, \tag{3}$$
$$C(x_i,y_j) - C(x_i,y_\ell) - C(x_k,y_j) + C(x_k,y_\ell) \ge 0 \quad \text{for } x_k < x_i,\ y_\ell < y_j. \tag{4}$$
Inequality (4) means that the probability
$$P(x_k < X < x_i,\ y_\ell < Y < y_j) = C(x_i,y_j) - C(x_i,y_\ell) - C(x_k,y_j) + C(x_k,y_\ell) \tag{5}$$
is non-negative.
The constraints are finitely many points in $[0,1]^2$ where the value of the copula is given. Their first (resp. second) coordinates can be organized in an increasing sequence $(x_1,\dots,x_{I-1})$, resp. $(y_1,\dots,y_{J-1})$. These values determine a rectangular grid. At some of the grid points, $(x_i,y_j)$, the values $z_{i,j}$ are given,
$$C(x_i,y_j) = z_{i,j}. \tag{6}$$
We denote by $G_0$ the set of all indices $(i,j) \in \{1,\dots,I-1\} \times \{1,\dots,J-1\}$ for which the values $z_{i,j}$ of the copula are given by (6). To simplify the notation, we define additionally $x_0 = y_0 = 0$, $x_I = y_J = 1$. The corresponding new grid points are at the boundary of the domain of the copula, and their indices form the set
$$B_{I,J} = \{(i,0),\,(i,J) \mid i = 0,\dots,I\} \cup \{(0,j),\,(I,j) \mid j = 0,\dots,J\}.$$
The values of the copula at the boundary are given by (2), (3), so we may define $z_{i,j}$ for all $(i,j) \in B_{I,J}$ by
$$z_{i,0} = C(x_i,0) = 0, \qquad z_{0,j} = C(0,y_j) = 0, \tag{7}$$
$$z_{i,J} = C(x_i,1) = x_i, \qquad z_{I,j} = C(1,y_j) = y_j. \tag{8}$$
In total, (6) is required for all $(i,j)$ from the set
$$G = G_0 \cup B_{I,J}$$
and $G_0$ can be expressed as
$$G_0 = G \setminus B_{I,J},$$
i.e., $G_0$ corresponds to the grid points lying in the open square $(0,1)^2$.
In figures, we use a filled disk to denote a grid point at which the value is given (an element of G) and an empty circle to denote a grid point where the value is not restricted. Figure 1 and Figure 2 demonstrate our notation.
Problem 1. 
Let $0 = x_0 < x_1 < \dots < x_I = 1$, $0 = y_0 < y_1 < \dots < y_J = 1$, and let there be given values $z_{i,j} \in [0,1]$ for all $(i,j)$ in some set $G$ such that $B_{I,J} \subseteq G \subseteq \{0,\dots,I\} \times \{0,\dots,J\}$. Find a copula $C$ satisfying (6) for all $(i,j) \in G$. Moreover, among all such copulas, we want to find one with maximum differential entropy.
The necessary and sufficient conditions for a copula, in particular (4), imply necessary conditions for the solvability of Problem 1, which will later turn out to be also sufficient.
Proposition 1. 
A necessary condition for the existence of a solution to Problem 1 is the conjunction of (7), (8), and

(N) If $(i,j), (i,\ell), (k,j), (k,\ell) \in G$, $k < i$, $\ell < j$, then $z_{i,j} - z_{i,\ell} - z_{k,j} + z_{k,\ell} \ge 0$.
In the sequel, we assume that the constraints satisfy these conditions.
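The conditions of Proposition 1 are easy to verify mechanically. A minimal sketch in Python (the function name and the dictionary encoding of the given values are our own illustrative choices, not taken from the paper):

```python
def satisfies_N(z):
    """Check condition (N): for all (i, j), (i, l), (k, j), (k, l) in G with
    k < i and l < j, the rectangle inequality
    z[i,j] - z[i,l] - z[k,j] + z[k,l] >= 0 must hold.
    `z` maps index pairs (i, j) in G to the given copula values."""
    for (k, l) in z:
        for (i, j) in z:
            if i > k and j > l and (i, l) in z and (k, j) in z:
                if z[(i, j)] - z[(i, l)] - z[(k, j)] + z[(k, l)] < 0:
                    return False
    return True

# Values of the independence copula C(x, y) = x*y on a 3x3 grid satisfy (N);
# a quadruple encoding a "negative probability" rectangle violates it.
ok = satisfies_N({(i, j): (i / 2) * (j / 2) for i in range(3) for j in range(3)})
bad = satisfies_N({(1, 1): 0.4, (1, 2): 0.4, (2, 1): 0.4, (2, 2): 0.3})
```

The second example fails because $0.3 - 0.4 - 0.4 + 0.4 = -0.1 < 0$.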

2. General Results

2.1. Reformulation to a Finite-Dimensional Optimization

As formulated, Problem 1 looks like a task from variational calculus. We shall transform it into a finite-dimensional optimization problem.
It is well-known that among all absolutely continuous distributions on a bounded set, the uniform distribution has the highest differential entropy (if the integral of the density over the whole domain is fixed). We shall derive a consequence of this principle. To simplify its formulation, we denote the lengths of the intervals between neighboring grid points:
$$a_i = x_i - x_{i-1}, \qquad b_j = y_j - y_{j-1}$$
for all $i \in \{1,\dots,I\}$, $j \in \{1,\dots,J\}$ (see Figure 1).
Proposition 2. 
Let $i \in \{1,\dots,I\}$, $j \in \{1,\dots,J\}$. The restriction of the copula $C$ to the two-dimensional open interval $(x_{i-1},x_i) \times (y_{j-1},y_j)$ contributes to the copula differential entropy by
$$\int_{x_{i-1}}^{x_i} \int_{y_{j-1}}^{y_j} \eta\bigl(c_{X,Y}(x,y)\bigr)\,dx\,dy. \tag{9}$$
Among all copulas with a given value
$$n_{i,j} := C(x_i,y_j) - C(x_i,y_{j-1}) - C(x_{i-1},y_j) + C(x_{i-1},y_{j-1}) = P(x_{i-1} < X < x_i,\ y_{j-1} < Y < y_j), \tag{10}$$
the contribution (9) is maximal iff the density is constant,
$$c_{X,Y}(x,y) = \frac{n_{i,j}}{(x_i - x_{i-1})(y_j - y_{j-1})} = \frac{n_{i,j}}{a_i b_j}, \tag{11}$$
almost everywhere on $(x_{i-1},x_i) \times (y_{j-1},y_j)$. Without loss of generality, we may choose the copula with density (11) for all $(x,y) \in (x_{i-1},x_i) \times (y_{j-1},y_j)$.
Proposition 2 determines the solution to Problem 1 (almost everywhere) when all the values
$$C(x_i,y_j),\ C(x_i,y_{j-1}),\ C(x_{i-1},y_j),\ C(x_{i-1},y_{j-1})$$
are given, i.e., $(i,j), (i,j-1), (i-1,j), (i-1,j-1) \in G$. Even when this is not the case, there are some optimal values $n_{i,j}$, $(i,j) \in \{1,\dots,I\} \times \{1,\dots,J\}$, and the copula density on $(x_{i-1},x_i) \times (y_{j-1},y_j)$ can be chosen according to (11) because there are no other restrictions on its values. This observation allows us to restrict attention to copulas with piecewise constant densities, so-called checkerboard copulas. Lin et al. [11] also found checkerboard copulas by maximization of the differential entropy, although they solved a different task.
It remains to find the optimal values of finitely many densities on the intervals (rectangles) $(x_{i-1},x_i) \times (y_{j-1},y_j)$ for $(i,j) \in \{1,\dots,I\} \times \{1,\dots,J\}$. Equivalently, we look for the values $n_{i,j}$ defined by (10). From them, the values of the copula at the grid points can be computed as the sums
$$C(x_i,y_j) = \sum_{u=1}^{i} \sum_{v=1}^{j} n_{u,v}.$$
In particular, for $i = I$, resp. $j = J$, we obtain
$$\sum_{u=1}^{I} \sum_{v=1}^{j} n_{u,v} = C(x_I,y_j) = C(1,y_j) = y_j, \qquad \sum_{u=1}^{i} \sum_{v=1}^{J} n_{u,v} = C(x_i,y_J) = C(x_i,1) = x_i.$$
These formulas can be equivalently expressed using row and column sums:
$$\sum_{u=1}^{I} n_{u,j} = y_j - y_{j-1}, \qquad \sum_{v=1}^{J} n_{i,v} = x_i - x_{i-1}.$$
Remark 2. 
We do not deal with the values of density on the boundaries of these intervals. The boundaries are of measure zero, so they do not influence the results. In fact, the densities at these boundaries are not uniquely defined.
Copulas that do not have densities or entropies (see Remark 1) do not seem to be good candidates for approximation and cannot compete with our proposed solution.
The differential entropy of the checkerboard copula is
$$\mathrm{Ent}(c_{X,Y}) = \int_0^1 \int_0^1 \eta\bigl(c_{X,Y}(x,y)\bigr)\,dx\,dy = \sum_{i=1}^{I} \sum_{j=1}^{J} a_i b_j\, \eta\!\left(\frac{n_{i,j}}{a_i b_j}\right) = -\sum_{i=1}^{I} \sum_{j=1}^{J} n_{i,j} \ln \frac{n_{i,j}}{a_i b_j},$$
provided that all $n_{i,j} \neq 0$ (otherwise, the corresponding summand of the differential entropy is 0, i.e., minimal). We shall deal with the following reformulation:
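For a concrete checkerboard copula, this sum is straightforward to evaluate. A small sketch (a helper of our own, not from the paper), with the convention that zero cells contribute 0:

```python
import numpy as np

def checkerboard_entropy(n, a, b):
    """Differential entropy -sum_{i,j} n_ij * ln(n_ij / (a_i b_j)) of a
    checkerboard copula with cell probabilities n (K x L matrix) and
    interval lengths a (length K), b (length L); zero cells contribute 0."""
    n = np.asarray(n, dtype=float)
    area = np.outer(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    mask = n > 0
    return -float(np.sum(n[mask] * np.log(n[mask] / area[mask])))

# The independence copula (n_ij = a_i * b_j) has the uniform density c = 1,
# hence entropy 0; any other checkerboard copula has negative entropy.
ent0 = checkerboard_entropy(np.outer([0.3, 0.7], [0.5, 0.5]), [0.3, 0.7], [0.5, 0.5])
ent1 = checkerboard_entropy([[0.4, 0.1], [0.1, 0.4]], [0.5, 0.5], [0.5, 0.5])
```

Here `ent0` is 0 and `ent1` is negative, in accordance with the maximality of the uniform density.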
Problem 2. 
Let $0 = x_0 < x_1 < \dots < x_I = 1$, $0 = y_0 < y_1 < \dots < y_J = 1$, and let there be given values $z_{i,j} \in [0,1]$ for all $(i,j)$ in some set $G$ such that $B_{I,J} \subseteq G \subseteq \{0,\dots,I\} \times \{0,\dots,J\}$. Suppose that the values $z_{i,j}$, $(i,j) \in G$, satisfy the necessary conditions from Proposition 1. Find values $n_{i,j} \in [0,1]$, $(i,j) \in \{1,\dots,I\} \times \{1,\dots,J\}$, satisfying
$$\sum_{u=1}^{I} n_{u,j} = y_j - y_{j-1},$$
$$\sum_{v=1}^{J} n_{i,v} = x_i - x_{i-1},$$
$$\sum_{u=1}^{i} \sum_{v=1}^{j} n_{u,v} = z_{i,j} \quad \text{for all } (i,j) \in G,$$
and such that the differential entropy
$$\mathrm{Ent}(c_{X,Y}) = -\sum_{i=1}^{I} \sum_{j=1}^{J} n_{i,j} \ln \frac{n_{i,j}}{(x_i - x_{i-1})(y_j - y_{j-1})}$$
is maximal.
Remark 3. 
If $G = \{0,\dots,I\} \times \{0,\dots,J\}$, i.e., if all the values $z_{i,j}$, $(i,j) \in \{1,\dots,I\} \times \{1,\dots,J\}$, are given, then (10) and (6) determine all $n_{i,j}$ and hence the solution.

2.2. Decomposition of the Problem

We concentrate on rectangles with the following property:
(B)
The values of the copula at all grid points at the boundary of the rectangle [ x k , x K ] × [ y , y L ] are given, i.e.,
{ ( x i , y ) , ( x i , y L ) i = k , , K } { ( x k , y j ) , ( x K , y j ) j = , , L } G .
Then the optimization of the copula inside such a rectangle is independent of the rest of the domain and can be solved separately. E.g., in Figure 2, rectangles satisfying (B) are drawn by thick lines. The whole domain [ 0 , 1 ] × [ 0 , 1 ] has the values on the entire boundary given, so it satisfies (B).
Let us consider a rectangle satisfying (B). If all its grid points in some row or column (not at the boundary) are in $G$, then we can split it into two disjoint rectangles satisfying (B). (The choice need not be unique.) If this is not the case (i.e., there is a missing value in each row and each column), we call the rectangle irreducible. Irreducible rectangles can be equivalently characterized as minimal rectangles satisfying (B). The whole domain can be covered by disjoint irreducible rectangles. This may allow us to decompose the problem into simpler ones, dealing with each irreducible rectangle separately. Figure 2 shows eight irreducible rectangles. Each component of the decomposed task has the following formulation. The active restrictions refer to grid points with indices from the set $G \cap (\{k,\dots,K\} \times \{\ell,\dots,L\})$. Notice that the monotonicity of the values $z_{i,j}$ needs to be formulated explicitly, while in Problem 2 it followed from (7).
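The reducibility test just described is purely combinatorial. A sketch (function and encoding are our own): a rectangle with index grid $\{0,\dots,K\} \times \{0,\dots,L\}$ satisfying (B) is reducible iff some interior row or column has all of its grid points in $G$.

```python
def is_irreducible(G, K, L):
    """G is a set of index pairs (i, j) of grid points with given values.
    The rectangle is reducible iff some interior column i (0 < i < K) or
    some interior row j (0 < j < L) lies entirely in G."""
    col_full = any(all((i, j) in G for j in range(L + 1)) for i in range(1, K))
    row_full = any(all((i, j) in G for i in range(K + 1)) for j in range(1, L))
    return not (col_full or row_full)

# Boundary indices of a 3x3 grid plus two interior points give an
# irreducible rectangle; completing a full interior column makes it reducible.
B = ({(i, 0) for i in range(4)} | {(i, 3) for i in range(4)}
     | {(0, j) for j in range(4)} | {(3, j) for j in range(4)})
```

E.g., `is_irreducible(B | {(1, 1), (2, 2)}, 3, 3)` is true, while adding $(1,2)$ completes the column $i = 1$ and makes the rectangle reducible.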
Problem 3. 
Let $0 \le x_k < \dots < x_K \le 1$, $0 \le y_\ell < \dots < y_L \le 1$, and let $G \subseteq \{k,\dots,K\} \times \{\ell,\dots,L\}$ be a set such that
$$\{(i,\ell),\,(i,L) \mid i = k,\dots,K\} \cup \{(k,j),\,(K,j) \mid j = \ell,\dots,L\} \subseteq G.$$
Suppose that values $z_{i,j} \in [0,1]$ are given for all $(i,j) \in G$ so that the sequences $(z_{i,\ell})_{i=k,\dots,K}$, $(z_{k,j})_{j=\ell,\dots,L}$ are nondecreasing and (N) is satisfied. Find values $n_{i,j} \in [0,1]$, $(i,j) \in \{k+1,\dots,K\} \times \{\ell+1,\dots,L\}$, satisfying
$$\sum_{u=k+1}^{i} \sum_{v=\ell+1}^{j} n_{u,v} = z_{i,j} - z_{i,\ell} - z_{k,j} + z_{k,\ell}$$
for all $(i,j) \in G$ and such that the contribution to the differential entropy
$$-\sum_{i=k+1}^{K} \sum_{j=\ell+1}^{L} n_{i,j} \ln \frac{n_{i,j}}{(x_i - x_{i-1})(y_j - y_{j-1})}$$
is maximal.
In the sequel, we analyze special cases of irreducible rectangles. Before that, we simplify indexing by shifting the rectangle to the origin.

2.3. Shifted Indexing

If a rectangle $[x_k,x_K] \times [y_\ell,y_L]$ satisfies (B), the corresponding values $n_{i,j}$, $(i,j) \in \{k+1,\dots,K\} \times \{\ell+1,\dots,L\}$, are bounded only by the row and column sums
$$r_j = \sum_{i=k+1}^{K} n_{i,j} = z_{K,j} - z_{K,j-1} - z_{k,j} + z_{k,j-1}, \qquad j \in \{\ell+1,\dots,L\},$$
$$s_i = \sum_{j=\ell+1}^{L} n_{i,j} = z_{i,L} - z_{i-1,L} - z_{i,\ell} + z_{i-1,\ell}, \qquad i \in \{k+1,\dots,K\},$$
which are non-negative values satisfying the equality
$$\sum_{j=\ell+1}^{L} r_j = \sum_{i=k+1}^{K} s_i = \sum_{j=\ell+1}^{L} \sum_{i=k+1}^{K} n_{i,j} = z_{K,L} - z_{k,L} - z_{K,\ell} + z_{k,\ell}.$$
The same task is obtained if we take a rectangle of the same size with its lower left vertex at the origin and impose the same requirements on the row and column sums. We shall apply this simplification, and instead of the original task for a general rectangle, we shall, without loss of generality, solve the case $k = \ell = 0$ in the above notation (with the new upper bounds still denoted by $K, L$; the above general indexing will not be used anymore). The boundary grid points form the set
$$B_{K,L} = \{(i,0),\,(i,L) \mid i = 0,\dots,K\} \cup \{(0,j),\,(K,j) \mid j = 0,\dots,L\}.$$
The situation simplifies not only in the indexing (starting from 0, resp. 1); the values at the bottom and left edges become zero. The values at the top and right boundary must be modified to general values $z_{i,L}, z_{K,j}$, which are not determined by $x_i, y_j$ as they were in (8). The modified task is as follows.
Problem 4. 
Let $0 = x_0 < \dots < x_K \le 1$, $0 = y_0 < \dots < y_L \le 1$, and let there be given values $z_{i,j} \in [0,1]$ for all $(i,j)$ in some set $G$ such that $B_{K,L} \subseteq G \subseteq \{0,\dots,K\} \times \{0,\dots,L\}$. Suppose that the values $z_{i,j}$, $(i,j) \in G$, satisfy the necessary condition (N) and (7) for $i \in \{0,\dots,K\}$ and $j \in \{0,\dots,L\}$. Find values $n_{i,j} \in [0,1]$, $(i,j) \in \{1,\dots,K\} \times \{1,\dots,L\}$, satisfying
$$\sum_{u=1}^{i} \sum_{v=1}^{j} n_{u,v} = z_{i,j} \quad \text{for all } (i,j) \in G \tag{22}$$
and such that the contribution to the differential entropy
$$-\sum_{i=1}^{K} \sum_{j=1}^{L} n_{i,j} \ln \frac{n_{i,j}}{(x_i - x_{i-1})(y_j - y_{j-1})}$$
is maximal.
As in [11], this is an optimization of a concave function, whose maximum can be found numerically (applying standard convex optimization with the opposite sign). For numerical solutions of convex tasks, standard references are, e.g., [15,16,17,18]. We shall show that analytical solutions exist in some cases, and we demonstrate them to support the intuition about the task.
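A direct numerical treatment of Problem 4 is a small constrained concave maximization. A sketch using SciPy's SLSQP solver (an illustrative setup of our own, not the authors' code); the dictionary `constraints` maps index pairs $(i,j)$ to prescribed cumulative values $z_{i,j}$ and must include the boundary values:

```python
import numpy as np
from scipy.optimize import minimize

def solve_problem4(K, L, a, b, constraints):
    """Maximize -sum n_ij ln(n_ij/(a_i b_j)) over n >= 0 subject to the
    cumulative-sum constraints sum_{u<=i, v<=j} n_uv = z_ij for each
    ((i, j), z_ij) in `constraints`."""
    area = np.outer(a, b)

    def neg_entropy(x):
        n = np.maximum(x.reshape(K, L), 1e-12)   # guard against log(0)
        return float(np.sum(n * np.log(n / area)))

    cons = [{'type': 'eq',
             'fun': lambda x, i=i, j=j, z=z: x.reshape(K, L)[:i, :j].sum() - z}
            for (i, j), z in constraints.items()]
    x0 = area.ravel()                            # start at the independence copula
    res = minimize(neg_entropy, x0, method='SLSQP',
                   bounds=[(0.0, 1.0)] * (K * L), constraints=cons)
    return res.x.reshape(K, L)

# With boundary values of the independence copula and no interior
# constraints, the optimum is the uniform checkerboard.
n = solve_problem4(2, 2, np.array([0.5, 0.5]), np.array([0.5, 0.5]),
                   {(2, 1): 0.5, (1, 2): 0.5, (2, 2): 1.0})
```

The starting point already satisfies the constraints here, so the solver only confirms stationarity; for general instances, a feasible start can be obtained from any solution of the linear system.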

3. Less General Tasks

In this section, we collect results related to Problem 4. Throughout this section, we consider an irreducible rectangle [ 0 , x K ] × [ 0 , y L ] .

3.1. Rectangle with no Given Values Inside

Suppose that $G = B_{K,L}$, i.e., $G$ has an empty intersection with the interior of the rectangle $(0,x_K) \times (0,y_L)$. This is an instance of Problem 4; equations (22) reduce to monotonicity.
Problem 5. 
Let $0 = x_0 < \dots < x_K \le 1$, $0 = y_0 < \dots < y_L \le 1$, and let
$$G = \{(i,0),\,(i,L) \mid i = 0,\dots,K\} \cup \{(0,j),\,(K,j) \mid j = 0,\dots,L\}.$$
Suppose that values $z_{i,j} \in [0,1]$ are given for all $(i,j) \in G$ so that (7) is satisfied for all $i \in \{0,\dots,K\}$ and $j \in \{0,\dots,L\}$ and the sequences $(z_{i,L})_{i=0}^{K}$, $(z_{K,j})_{j=0}^{L}$ are nondecreasing. Find values $n_{i,j} \in [0,1]$, $(i,j) \in \{1,\dots,K\} \times \{1,\dots,L\}$, satisfying
$$\sum_{u=1}^{K} n_{u,j} = r_j = z_{K,j} - z_{K,j-1},$$
$$\sum_{v=1}^{L} n_{i,v} = s_i = z_{i,L} - z_{i-1,L},$$
and such that the contribution to the differential entropy
$$-\sum_{i=1}^{K} \sum_{j=1}^{L} n_{i,j} \ln \frac{n_{i,j}}{(x_i - x_{i-1})(y_j - y_{j-1})}$$
is maximal.
The solution need not be a copula with a constant density on $(0,x_K) \times (0,y_L)$ because we must keep the correct row and column sums.
We use Lagrange multipliers $\lambda_1,\dots,\lambda_K$, $\mu_1,\dots,\mu_L$; the Lagrange function is
$$\mathcal{L}(n_{i,j},\lambda_i,\mu_j) = -\sum_{i=1}^{K} \sum_{j=1}^{L} n_{i,j} \ln \frac{n_{i,j}}{a_i b_j} + \sum_{i=1}^{K} \lambda_i \Bigl(\sum_{j=1}^{L} n_{i,j} - s_i\Bigr) + \sum_{j=1}^{L} \mu_j \Bigl(\sum_{i=1}^{K} n_{i,j} - r_j\Bigr).$$
We put all partial derivatives equal to zero and obtain the system of equations
$$\frac{\partial \mathcal{L}}{\partial n_{i,j}} = -1 - \ln \frac{n_{i,j}}{a_i b_j} + \lambda_i + \mu_j = 0,$$
where $i \in \{1,\dots,K\}$, $j \in \{1,\dots,L\}$. Equivalently,
$$\ln \frac{n_{i,j}}{a_i b_j} = -1 + \lambda_i + \mu_j.$$
Subtracting these equations for the index pairs $(i,j)$ and $(K,j)$, resp. $(i,j)$ and $(i,L)$, we obtain
$$\ln \frac{n_{i,j}}{a_i b_j} - \ln \frac{n_{K,j}}{a_K b_j} = \lambda_i - \lambda_K, \qquad \ln \frac{n_{i,j}}{a_i b_j} - \ln \frac{n_{i,L}}{a_i b_L} = \mu_j - \mu_L,$$
and express $n_{i,j}$ using $n_{K,j}$ or $n_{i,L}$ as
$$n_{i,j} = n_{K,j}\, \frac{a_i}{a_K}\, \exp(\lambda_i - \lambda_K), \qquad n_{i,j} = n_{i,L}\, \frac{b_j}{b_L}\, \exp(\mu_j - \mu_L).$$
We see that all rows (resp. columns) are multiples of the last one, and the matrix composed of all $n_{i,j}$ is a dyad (a product of two vectors),
$$\begin{pmatrix} n_{1,1} & \cdots & n_{1,L} \\ \vdots & & \vdots \\ n_{K,1} & \cdots & n_{K,L} \end{pmatrix} = c \begin{pmatrix} s_1 \\ \vdots \\ s_K \end{pmatrix} \begin{pmatrix} r_1 & \cdots & r_L \end{pmatrix}, \qquad \text{i.e., } n_{i,j} = c\, s_i r_j,$$
where $r_j$ are the row sums, $s_i$ the column sums, and the constant $c$ can be determined from the known total sum,
$$c \sum_{i=1}^{K} s_i \sum_{j=1}^{L} r_j = \sum_{j=1}^{L} r_j = \sum_{i=1}^{K} s_i, \qquad c = \frac{1}{\sum_{i=1}^{K} s_i}.$$
This is an expected result: the probability of a result in an elementary rectangle is the product of the probabilities given by the marginal distributions.
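The closed-form solution above is just a scaled outer product of the marginal cell probabilities. A one-line sketch (the helper name is ours):

```python
import numpy as np

def rectangle_independence(r, s):
    """Cell probabilities n_ij = c * s_i * r_j with c = 1 / sum(s): the
    maximum-entropy solution for a rectangle with no interior constraints.
    r: row sums (length L), s: column sums (length K); totals must agree."""
    r, s = np.asarray(r, dtype=float), np.asarray(s, dtype=float)
    return np.outer(s, r) / s.sum()

# Row and column sums of the result reproduce r and s.
n = rectangle_independence([0.2, 0.3], [0.1, 0.4])
```

By construction, `n.sum(axis=0)` returns the row sums `r` and `n.sum(axis=1)` the column sums `s`.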

3.2. Methodology of Solving Particular Cases

Here we collect the ideas used to solve the particular cases described in the appendices. We consider Problem 4. This leads to a system of linear equations; after Gauss–Jordan elimination, it could look like
$$M = \left(\begin{array}{ccccccc|c} 1 & 0 & 1 & 0 & 0 & 0 & 1 & C_1 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & C_2 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & C_3 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & C_4 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & C_N \end{array}\right),$$
where $C_1,\dots,C_N$ are some constants computed from the values $z_{i,j}$. Until we consider the maximized differential entropy, some variables $n_{i,j}$ can be chosen freely and the rest can be computed from them. This describes all possible solutions of the system of linear equations (ignoring their bounds). To maximize the differential entropy, we compute the partial derivatives of the contribution to the differential entropy and put them equal to zero. We obtain
$$N = KL - K - L + 1 - o = (K-1)(L-1) - o$$
equations, where $o = \operatorname{card}\{(i,j) \in G \mid (x_i,y_j) \in (0,x_K) \times (0,y_L)\}$ is the number of given values inside the rectangle.
Theorem 1. 
The optimal solution to Problem 4 is independent of the values $a_1,\dots,a_K$, $b_1,\dots,b_L$.
Proof. 
Recall that
$$n_{i,j} = z_{i,j} - z_{i-1,j} - z_{i,j-1} + z_{i-1,j-1}. \tag{28}$$
This leads to an optimization in those variables $z_{i,j} = C(x_i,y_j)$ for which the copula values are not given ($(i,j) \notin G$).
In the criterion, each variable $z_{i,j}$, $(i,j) \notin G \cup B_{K,L}$, occurs in four values: in $n_{i,j}$ and $n_{i+1,j+1}$ with coefficient $1$, and in $n_{i+1,j}$ and $n_{i,j+1}$ with coefficient $-1$.
We solve this task by computing the partial derivatives w.r.t. the unknown values $z_{i,j}$ and putting them equal to zero,
$$\ln \frac{z_{i,j} - z_{i-1,j} - z_{i,j-1} + z_{i-1,j-1}}{a_i b_j} + 1 - \ln \frac{z_{i,j+1} - z_{i-1,j+1} - z_{i,j} + z_{i-1,j}}{a_i b_{j+1}} - 1 - \ln \frac{z_{i+1,j} - z_{i,j} - z_{i+1,j-1} + z_{i,j-1}}{a_{i+1} b_j} - 1 + \ln \frac{z_{i+1,j+1} - z_{i+1,j} - z_{i,j+1} + z_{i,j}}{a_{i+1} b_{j+1}} + 1 = 0.$$
This can be expressed as
$$\frac{z_{i,j} - z_{i-1,j} - z_{i,j-1} + z_{i-1,j-1}}{a_i b_j} \cdot \frac{z_{i+1,j+1} - z_{i+1,j} - z_{i,j+1} + z_{i,j}}{a_{i+1} b_{j+1}} = \frac{z_{i,j+1} - z_{i-1,j+1} - z_{i,j} + z_{i-1,j}}{a_i b_{j+1}} \cdot \frac{z_{i+1,j} - z_{i,j} - z_{i+1,j-1} + z_{i,j-1}}{a_{i+1} b_j}.$$
Multiplying this by $a_i b_j a_{i+1} b_{j+1}$, we obtain an equation independent of $a_i$, $b_j$, $a_{i+1}$, and $b_{j+1}$. This holds for all $z_{i,j}$ and $n_{i,j}$, $(i,j) \in (\{1,\dots,K-1\} \times \{1,\dots,L-1\}) \setminus G$. Due to (28), also the values $n_{i,j}$ are independent of $a_1,\dots,a_K$, $b_1,\dots,b_L$. □
Besides the independence of the result of the interval lengths $a_1,\dots,a_K$, $b_1,\dots,b_L$, we proved that each equation in the system describing the entropy maximization is of the form
$$\prod_{\kappa_1 \in Z^+} (n_{i,j} + B_{\kappa_1}) = \prod_{\kappa_2 \in Z^-} (n_{i,j} + B_{\kappa_2}), \tag{29}$$
where $Z^+$ is the set of indices $(i,j)$ for which $n_{i,j}$ occurs with the positive sign and $Z^-$ is the set of indices for which $n_{i,j}$ occurs with the negative sign. Apparently $\operatorname{card} Z^+ = \operatorname{card} Z^-$; therefore the units in (29) cancelled.
The only problem is that $n_{i,j}$ may depend also on other variables. Thus the summands $B_*$ contain not only the constants $C_*$, but also other independent variables.
In the appendices, we present explicit solutions to some specific instances of Problem 4. They lead to higher-order algebraic equations.

4. Conclusion

We formulated the task of fitting a continuous copula to finitely many given values in such a way that the entropy of the copula density is maximal. This is motivated by situations when some of the values are prescribed and the rest of the copula should be "as independent as possible", with the intention not to include any other dependence than that contained in the constraints. We simplified the task in several ways, showed that it has a unique solution (because it is equivalent to a convex optimization problem), and demonstrated that a closed-form solution can be found analytically in some simple cases, although it may require solving higher-order polynomial equations. We propose this concept as an alternative to current approaches, which also maximize the entropy but use different restrictions on the admitted copulas.

Author Contributions

Writing – original draft, Milan Bubák and Mirko Navara.

Funding

This research was supported by the CTU institutional support (Future Fund).

Data Availability Statement

This research is not accompanied by any additional data.

Acknowledgments

The authors are grateful to Michal Dibala who inspired and initiated this research. They thank the anonymous reviewers for their comments which helped to improve the original manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

In Problem 4, we put $K = L = 3$, $G = B_{K,L} \cup \{(1,1), (2,2)\}$ (see Figure A1).
Figure A1. Example 1
The task can be formulated as follows.
$$\max_{n_{i,j}} \Bigl(-\sum_{i=1}^{3} \sum_{j=1}^{3} n_{i,j} \ln \frac{n_{i,j}}{a_i b_j}\Bigr), \quad i,j \in \{1,2,3\},$$
under the conditions $n_{i,j} \ge 0$ and
$$n_{1,1} = z_{1,1},$$
$$n_{1,1} + n_{2,1} + n_{3,1} = z_{3,1},$$
$$n_{1,1} + n_{2,1} + n_{1,2} + n_{2,2} = z_{2,2},$$
$$n_{1,1} + n_{2,1} + n_{3,1} + n_{1,2} + n_{2,2} + n_{3,2} = z_{3,2},$$
$$n_{1,1} + n_{1,2} + n_{1,3} = z_{1,3},$$
$$n_{1,1} + n_{2,1} + n_{1,2} + n_{2,2} + n_{1,3} + n_{2,3} = z_{2,3},$$
$$n_{1,1} + n_{2,1} + n_{3,1} + n_{1,2} + n_{2,2} + n_{3,2} + n_{1,3} + n_{2,3} + n_{3,3} = z_{3,3}.$$
We take $n_{2,2}, n_{2,3}$ as the variables allowing us to express the general solution of the system of linear equations; we introduce new constants $C_*$ as abbreviations.
$$\begin{aligned}
n_{1,1} &= z_{1,1} =: C_1, \\
n_{2,1} &= -n_{2,2} - n_{2,3} \underbrace{- z_{1,3} + z_{2,3}}_{C_2} = -n_{2,2} - n_{2,3} + C_2, \\
n_{3,1} &= n_{2,2} + n_{2,3} \underbrace{- z_{1,1} + z_{3,1} + z_{1,3} - z_{2,3}}_{C_3} = n_{2,2} + n_{2,3} + C_3, \\
n_{1,2} &= n_{2,3} \underbrace{- z_{1,1} + z_{2,2} + z_{1,3} - z_{2,3}}_{C_4} = n_{2,3} + C_4, \\
n_{3,2} &= -n_{2,2} - n_{2,3} \underbrace{+ z_{1,1} - z_{3,1} - z_{2,2} + z_{3,2} - z_{1,3} + z_{2,3}}_{C_5} = -n_{2,2} - n_{2,3} + C_5, \\
n_{1,3} &= -n_{2,3} \underbrace{- z_{2,2} + z_{2,3}}_{C_6} = -n_{2,3} + C_6, \\
n_{3,3} &= \underbrace{z_{2,2} - z_{3,2} - z_{2,3} + z_{3,3}}_{C_7} = C_7.
\end{aligned}$$
Substituting this in the optimality criterion and putting its partial derivatives equal to zero, we obtain
$$\ln \frac{C_2 - n_{2,2} - n_{2,3}}{a_2 b_1} + 1 - \ln \frac{C_3 + n_{2,2} + n_{2,3}}{a_3 b_1} - 1 - \ln \frac{n_{2,2}}{a_2 b_2} - 1 + \ln \frac{C_5 - n_{2,2} - n_{2,3}}{a_3 b_2} + 1 = 0,$$
$$\ln \frac{C_2 - n_{2,2} - n_{2,3}}{a_2 b_1} + 1 - \ln \frac{C_3 + n_{2,2} + n_{2,3}}{a_3 b_1} - 1 - \ln \frac{C_4 + n_{2,3}}{a_1 b_2} - 1 + \ln \frac{C_5 - n_{2,2} - n_{2,3}}{a_3 b_2} + 1 + \ln \frac{C_6 - n_{2,3}}{a_1 b_3} + 1 - \ln \frac{n_{2,3}}{a_2 b_3} - 1 = 0.$$
These equations can be simplified to
$$\ln \frac{(C_2 - n_{2,2} - n_{2,3})(C_5 - n_{2,2} - n_{2,3})}{a_2 b_1\, a_3 b_2} = \ln \frac{(C_3 + n_{2,2} + n_{2,3})\, n_{2,2}}{a_3 b_1\, a_2 b_2},$$
$$\ln \frac{(C_2 - n_{2,2} - n_{2,3})(C_5 - n_{2,2} - n_{2,3})(C_6 - n_{2,3})}{a_2 b_1\, a_3 b_2\, a_1 b_3} = \ln \frac{(C_3 + n_{2,2} + n_{2,3})(C_4 + n_{2,3})\, n_{2,3}}{a_3 b_1\, a_1 b_2\, a_2 b_3}.$$
The divisors are equal and thus cancel out, as shown in the proof of Theorem 1, and this leads to the expression
$$\begin{aligned}
(C_2 - n_{2,2} - n_{2,3})(C_5 - n_{2,2} - n_{2,3}) &= (C_3 + n_{2,2} + n_{2,3})\, n_{2,2}, \\
(C_2 - n_{2,2} - n_{2,3})(C_5 - n_{2,2} - n_{2,3})(C_6 - n_{2,3}) &= (C_3 + n_{2,2} + n_{2,3})(C_4 + n_{2,3})\, n_{2,3}.
\end{aligned} \tag{A1}$$
We divide the second equation by the first,
$$C_6 - n_{2,3} = \frac{(C_4 + n_{2,3})\, n_{2,3}}{n_{2,2}},$$
and $n_{2,2}$ can be expressed as
$$n_{2,2} = \frac{(C_4 + n_{2,3})\, n_{2,3}}{C_6 - n_{2,3}}. \tag{A2}$$
Substituting into (A1) and expanding, we obtain an equation for $n_{2,3}$,
$$\begin{aligned}
&(C_3 - C_4 - C_6)\, n_{2,3}^3 \\
&\quad + (C_2 C_6 + C_2 C_4 + C_2 C_5 + C_6^2 + C_4 C_6 + C_5 C_6 - C_3 C_6 + C_4 C_5 + C_3 C_4)\, n_{2,3}^2 \\
&\quad + (-C_2 C_6^2 - C_2 C_4 C_6 - 2 C_2 C_5 C_6 - C_5 C_6^2 - C_4 C_5 C_6 - C_3 C_4 C_6)\, n_{2,3} \\
&\quad + C_2 C_5 C_6^2 = 0.
\end{aligned}$$
This is an algebraic equation of order three. It can be solved analytically using Cardano's formulas. A problem remains that it can have up to three roots; we want one that is real. The cubic coefficient is
$$C_3 - C_4 - C_6 = z_{3,1} - z_{2,3}.$$
If it is nonzero, there is at least one real root. Moreover, we need it in the interval $[0,1]$. Substituting into (A2), we obtain $n_{2,2}$, which should also be in the interval $[0,1]$, as well as all the other unknowns. However, we know (from the convexity of the task, resp. the concavity of the maximized criterion) that a solution with this property exists.
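Instead of Cardano's formulas, the cubic can also be solved numerically and the admissible root selected by the bounds. A sketch (our own code, not the authors'); the sample constants are computed from the independence instance $z_{i,j} = x_i y_j$ with $x_i = y_i = i/3$, for which all $n_{i,j} = 1/9$:

```python
import numpy as np

def solve_example1(C1, C2, C3, C4, C5, C6, C7):
    """Solve the cubic for n_23, recover n_22 from (A2), and return a pair
    with both values in [0, 1] (such a pair exists by the concavity of the
    maximized criterion)."""
    coeffs = [C3 - C4 - C6,
              C2*C6 + C2*C4 + C2*C5 + C6**2 + C4*C6 + C5*C6
              - C3*C6 + C4*C5 + C3*C4,
              -C2*C6**2 - C2*C4*C6 - 2*C2*C5*C6 - C5*C6**2
              - C4*C5*C6 - C3*C4*C6,
              C2*C5*C6**2]
    for t in np.roots(coeffs):
        if abs(t.imag) > 1e-9:
            continue                       # keep real roots only
        n23 = t.real
        if abs(C6 - n23) < 1e-12:
            continue                       # (A2) would divide by zero
        n22 = (C4 + n23) * n23 / (C6 - n23)
        if 0 <= n23 <= 1 and 0 <= n22 <= 1:
            return n22, n23

# Constants of the independence instance z_ij = (i/3)*(j/3):
n22, n23 = solve_example1(1/9, 1/3, -1/9, 0.0, 1/3, 2/9, 1/9)
```

For this instance, the cubic has the roots $1/9$, $2/9$, $2/3$; only $n_{2,3} = 1/9$ yields an admissible $n_{2,2}$, reproducing the independence solution $n_{2,2} = n_{2,3} = 1/9$.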

Appendix B

In Problem 4, we put $K = 3$, $L = 4$, $G = B_{K,L} \cup \{(1,2), (2,1), (2,3)\}$ (see Figure A2).
Figure A2. Example 3
The task can be formulated as follows.
$$\max_{n_{i,j}} \Bigl(-\sum_{i=1}^{3} \sum_{j=1}^{4} n_{i,j} \ln \frac{n_{i,j}}{a_i b_j}\Bigr), \quad i \in \{1,2,3\},\ j \in \{1,2,3,4\},$$
under the conditions $n_{i,j} \ge 0$ and
$$\begin{aligned}
n_{2,1} &= C_1 - n_{1,1}, \\
n_{1,2} &= C_2 - n_{1,1}, \\
n_{3,2} &= C_3 + n_{1,1} - n_{2,2}, \\
n_{2,3} &= C_4 + n_{1,1} - n_{2,2} - n_{1,3}, \\
n_{3,3} &= C_5 - n_{1,1} + n_{2,2}, \\
n_{1,4} &= C_6 - n_{1,3}, \\
n_{2,4} &= C_7 + n_{1,3},
\end{aligned}$$
where the constants $C_*$ are some sums of $\pm z_{i,j}$, as in the previous example. We take $n_{1,1}, n_{2,2}, n_{1,3}$ as the variables allowing us to express the general solution of the system of linear equations.
We put the partial derivatives equal to zero and eliminate the logarithms. We obtain products of factors, each in the form of a sum of some $\pm n_{i,j}$ and a constant $C_*$. We also know that the result is independent of the values $a_i, b_j$. In particular, we obtain
\[
\begin{aligned}
n_{1,1}:\quad & (C_1 - n_{1,1})(C_2 - n_{1,1})(C_5 - n_{1,1} + n_{2,2}) = n_{1,1}\,(C_3 + n_{1,1} - n_{2,2})(C_4 + n_{1,1} - n_{2,2} - n_{1,3}),\\
n_{2,2}:\quad & (C_3 + n_{1,1} - n_{2,2})(C_4 + n_{1,1} - n_{2,2} - n_{1,3}) = (C_5 - n_{1,1} + n_{2,2})\,n_{2,2},\\
n_{1,3}:\quad & (C_4 + n_{1,1} - n_{2,2} - n_{1,3})(C_6 - n_{1,3}) = n_{1,3}\,(C_7 + n_{1,3}).
\end{aligned}
\tag{A3}
\]
From (A3) we separate the factors containing only the variable n 1 , 1 ,
\[
\frac{(C_1 - n_{1,1})(C_2 - n_{1,1})}{n_{1,1}} = \frac{(C_3 - n_{2,2} + n_{1,1})(C_4 + n_{1,1} - n_{2,2} - n_{1,3})}{C_5 - n_{1,1} + n_{2,2}}. \tag{A4}
\]
The second equation in (A3) can be written as
\[
\frac{(C_3 - n_{2,2} + n_{1,1})(C_4 + n_{1,1} - n_{2,2} - n_{1,3})}{C_5 - n_{1,1} + n_{2,2}} = n_{2,2}. \tag{A5}
\]
We substitute (A4) into the left-hand side and express $n_{2,2}$ as
\[
n_{2,2} = \frac{(C_1 - n_{1,1})(C_2 - n_{1,1})}{n_{1,1}}. \tag{A6}
\]
The third equation in (A3) can be rewritten as
\[
\begin{aligned}
C_4 + n_{1,1} - n_{2,2} - n_{1,3} &= \frac{n_{1,3}\,(C_7 + n_{1,3})}{C_6 - n_{1,3}},\\
C_4 + n_{1,1} - n_{2,2} &= \frac{n_{1,3}\,(C_7 + n_{1,3}) + n_{1,3}\,(C_6 - n_{1,3})}{C_6 - n_{1,3}},\\
C_4 + n_{1,1} - n_{2,2} &= \frac{n_{1,3}\,(C_7 + C_6)}{C_6 - n_{1,3}},\\
C_4 + n_{1,1} - n_{2,2} &= -(C_7 + C_6) + \frac{(C_7 + C_6)\,C_6}{C_6 - n_{1,3}}.
\end{aligned}
\tag{A7}
\]
In the last equation of (A7), we substitute $n_{2,2}$ from (A6) and obtain $n_{1,3}$:
\[
\begin{aligned}
C_4 + n_{1,1} - \frac{(C_1 - n_{1,1})(C_2 - n_{1,1})}{n_{1,1}} &= -(C_7 + C_6) + \frac{(C_7 + C_6)\,C_6}{C_6 - n_{1,3}},\\
\frac{(C_4 + n_{1,1})\,n_{1,1} - (C_1 - n_{1,1})(C_2 - n_{1,1}) + (C_7 + C_6)\,n_{1,1}}{n_{1,1}} &= \frac{(C_7 + C_6)\,C_6}{C_6 - n_{1,3}},\\
\frac{n_{1,1}\,(C_1 + C_2 + C_4 + C_6 + C_7) - C_1 C_2}{n_{1,1}} &= \frac{(C_7 + C_6)\,C_6}{C_6 - n_{1,3}},\\
\frac{n_{1,1}}{n_{1,1}\,(C_1 + C_2 + C_4 + C_6 + C_7) - C_1 C_2} &= \frac{C_6 - n_{1,3}}{(C_7 + C_6)\,C_6},\\
\frac{n_{1,1}\,(C_7 + C_6)\,C_6}{n_{1,1}\,(C_1 + C_2 + C_4 + C_6 + C_7) - C_1 C_2} &= C_6 - n_{1,3},\\
\frac{C_6\,\bigl(n_{1,1}\,(C_1 + C_2 + C_4 + C_6 + C_7) - C_1 C_2\bigr) - n_{1,1}\,(C_7 + C_6)\,C_6}{n_{1,1}\,(C_1 + C_2 + C_4 + C_6 + C_7) - C_1 C_2} &= n_{1,3},\\
\frac{C_6\,\bigl(n_{1,1}\,(C_1 + C_2 + C_4) - C_1 C_2\bigr)}{n_{1,1}\,(C_1 + C_2 + C_4 + C_6 + C_7) - C_1 C_2} &= n_{1,3}.
\end{aligned}
\tag{A8}
\]
We have both $n_{2,2}$ in (A6) and $n_{1,3}$ in (A8) expressed as functions of $n_{1,1}$. This can be substituted into the first equation in (A3), which yields
\[
\begin{aligned}
L:\;& (C_1 - n_{1,1})(C_2 - n_{1,1})\left(C_5 - n_{1,1} + \frac{(C_1 - n_{1,1})(C_2 - n_{1,1})}{n_{1,1}}\right),\\
R:\;& n_{1,1}\left(C_3 - \frac{(C_1 - n_{1,1})(C_2 - n_{1,1})}{n_{1,1}} + n_{1,1}\right)\cdot\\
&\quad \left(C_4 + n_{1,1} - \frac{(C_1 - n_{1,1})(C_2 - n_{1,1})}{n_{1,1}} - \frac{C_6\,\bigl(n_{1,1}\,(C_1 + C_2 + C_4) - C_1 C_2\bigr)}{n_{1,1}\,(C_1 + C_2 + C_4 + C_6 + C_7) - C_1 C_2}\right),
\end{aligned}
\]
where L (resp. R) denotes the left-hand (resp. right-hand) side of the equation. We modify this expression to discuss its analytical solvability:
\[
\begin{aligned}
L:\;& \frac{\bigl(n_{1,1}^2 - (C_1 + C_2)\,n_{1,1} + C_1 C_2\bigr)\bigl((C_5 - C_1 - C_2)\,n_{1,1} + C_1 C_2\bigr)}{n_{1,1}},\\
R:\;& \bigl((C_1 + C_2 + C_3)\,n_{1,1} - C_1 C_2\bigr)\cdot\\
&\quad \frac{(C_1 + C_2 + C_4)(C_1 + C_2 + C_4 + C_7)\,n_{1,1}^2 - C_1 C_2\,(2C_1 + 2C_2 + 2C_4 + C_7)\,n_{1,1} + C_1^2 C_2^2}{n_{1,1}\,\bigl((C_1 + C_2 + C_4 + C_6 + C_7)\,n_{1,1} - C_1 C_2\bigr)}.
\end{aligned}
\]
Multiplying both sides by the common denominator gives
\[
\begin{aligned}
L:\;& \bigl(n_{1,1}^2 - (C_1 + C_2)\,n_{1,1} + C_1 C_2\bigr)\bigl((C_5 - C_1 - C_2)\,n_{1,1} + C_1 C_2\bigr)\cdot\\
&\quad \bigl((C_1 + C_2 + C_4 + C_6 + C_7)\,n_{1,1} - C_1 C_2\bigr),\\
R:\;& \bigl((C_1 + C_2 + C_3)\,n_{1,1} - C_1 C_2\bigr)\cdot\\
&\quad \bigl((C_1 + C_2 + C_4)(C_1 + C_2 + C_4 + C_7)\,n_{1,1}^2 - C_1 C_2\,(2C_1 + 2C_2 + 2C_4 + C_7)\,n_{1,1} + C_1^2 C_2^2\bigr).
\end{aligned}
\]
After expansion, the left-hand side is a polynomial of order 4 and the right-hand side is a polynomial of order 3. Their difference is a polynomial of order 4, which is solvable analytically.
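The elimination steps (A6)–(A8) and the resulting degree-4 identity can be verified symbolically. A sketch of such an independent check using sympy (not part of the paper's derivation):

```python
import sympy as sp

# symbols C1..C7 and the free variable n11 (= n_{1,1})
C1, C2, C3, C4, C5, C6, C7, n11 = sp.symbols('C1:8 n11', positive=True)

n22 = (C1 - n11) * (C2 - n11) / n11                  # (A6)
S = C1 + C2 + C4
T = C1 + C2 + C4 + C6 + C7
P = C1 * C2
n13 = C6 * (S * n11 - P) / (T * n11 - P)             # (A8)

# first stationarity equation of (A3), with n22 and n13 substituted
lhs = (C1 - n11) * (C2 - n11) * (C5 - n11 + n22)
rhs = n11 * (C3 + n11 - n22) * (C4 + n11 - n22 - n13)

# the factored degree-4 identity obtained after clearing denominators
Lpoly = ((n11**2 - (C1 + C2) * n11 + P)
         * ((C5 - C1 - C2) * n11 + P)
         * (T * n11 - P))
Rpoly = (((C1 + C2 + C3) * n11 - P)
         * (S * (S + C7) * n11**2 - P * (2 * S + C7) * n11 + P**2))

# clearing the common denominator n11*(T*n11 - P) must reproduce Lpoly - Rpoly
diff = sp.cancel((lhs - rhs) * n11 * (T * n11 - P)) - (Lpoly - Rpoly)
```

The check confirms both the expanded coefficients of the quartic and that its degree is indeed four.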
The remaining variables are obtained by substitution into (A6) and (A8). This yields the set of all solutions; substituting the feasible solutions into the formula for the differential entropy allows us to determine the maximum.
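This final selection step can be sketched as follows — evaluate the differential entropy of each candidate and keep the best one. The cell sizes and candidate mass matrices below are illustrative placeholders; in practice the candidates come from the feasible roots of the quartic:

```python
import numpy as np

a = np.array([0.3, 0.3, 0.4])            # hypothetical cell widths a_i
b = np.array([0.25, 0.25, 0.25, 0.25])   # hypothetical cell heights b_j

def checkerboard_entropy(n):
    """Differential entropy -sum n_ij ln(n_ij / (a_i b_j)) of a checkerboard density."""
    cell = np.outer(a, b)
    mask = n > 0                          # cells with zero mass contribute 0
    return -np.sum(n[mask] * np.log(n[mask] / cell[mask]))

# placeholder candidates; real ones would satisfy the polynomial system
candidates = [
    np.full((3, 4), 1/12),                # uniform cell masses
    np.outer(a, b),                       # masses of the independence copula
]
best = max(candidates, key=checkerboard_entropy)
```

Here the independence masses win, since $-\sum n_{i,j}\ln\frac{n_{i,j}}{a_i b_j}$ is a negated Kullback–Leibler divergence and thus attains its maximum, zero, exactly at $n_{i,j} = a_i b_j$.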

Notes

1. The bounds, 0 and 1, are omitted for simplification of formulas.
2. More generally, an n-copula describes the dependence of n random variables. Here we deal only with 2-copulas.
3. The paper [5] treats several definitions of entropy on an equal footing; here we consider only the original Shannon entropy, as the best-motivated notion.
4. We acknowledge that Michal Dibala came up with the idea of using maximum entropy as a criterion for the choice of a copula [13]. The present paper explores this idea in detail; it is based on the bachelor thesis [14].
5. Strictly speaking, the joint cdf is defined on the whole plane, but its restriction to the unit square, even to its interior $(0,1)^2$, determines the copula uniquely.
6. The boundary rows and columns are not considered.
7. From now on, we denote this set again by G, although G was previously used for the set of all given values from $\{0, \dots, I\} \times \{0, \dots, J\}$.

References

1. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Lecture Notes in Statistics, Vol. 139; Springer: New York, NY, 2006.
2. Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231.
3. Shannon, C.E. A Mathematical Theory of Communication. The Bell System Technical Journal 1948, 27, 379–423.
4. Jaynes, E.T. Information theory and statistical mechanics. Physical Review 1957, 106, 620–628.
5. Pougaza, D.B.; Mohammad-Djafari, A. Maximum Entropies Copulas. In Proceedings of the 30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, France, 2010; pp. 329–336. [https://pubs.aip.org/aip/acp/article-pdf/1305/1/329/11567694/329_1_online.pdf]
6. Pougaza, D.B.; Mohammad-Djafari, A.; Bercher, J.-F. Link between copula and tomography. Pattern Recognition Letters 2010, 31, 2258–2264.
7. Ma, J.; Sun, Z. Mutual information is copula entropy. Tsinghua Science & Technology 2011, 16, 51–54.
8. Singh, V.P.; Zhang, L. Copula–entropy theory for multivariate stochastic modeling in water engineering. Geoscience Letters 2018, 5.
9. Piantadosi, J.; Howlett, P.; Boland, J.W. Matching the grade correlation coefficient using a copula with maximum disorder. Journal of Industrial and Management Optimization 2007, 3, 305–312.
10. Piantadosi, J.; Howlett, P.; Borwein, J. Copulas with Maximum Entropy. Optimization Letters 2012, 6, 99–125.
11. Lin, L.; Wang, R.; Zhang, R.; Zhao, C. The checkerboard copula and dependence concepts. ArXiv e-prints. [arXiv:2404.15023]
12. Genest, C.; Nešlehová, J.G.; Rémillard, B. Asymptotic behavior of the empirical multilinear copula process under broad conditions. Journal of Multivariate Analysis 2017, 159, 82–110.
13. Dibala, M.; Navara, M. Discrete Copulas and Maximal Entropy Principle. In Proceedings of the Copulas and Their Applications, Almeria, Spain, 2017; p. 24.
14. Bubák, M. Copulas with Maximal Entropy (in Czech). BSc. Thesis, Czech Technical University in Prague, 2024. [http://hdl.handle.net/10467/115430]
15. Bertsekas, D.P.; Nedić, A.; Ozdaglar, A.E. Convex Analysis and Optimization; Athena Scientific, 2003.
16. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press, 2004.
17. Hiriart-Urruty, J.-B.; Lemaréchal, C. Fundamentals of Convex Analysis; Grundlehren Text Editions; Springer, 2004.
18. Rockafellar, R.T. Convex Analysis; Princeton Mathematical Series; Princeton University Press, 1970.
Figure 1. Sample of our notation
Figure 2. Possible set G of points in which the values of a copula are given and a covering of the grid by disjoint irreducible rectangles (bold lines)