Truncated Mixture R-vine Copulas

Uncovering hidden mixture correlation among variables has been investigated in the literature using mixture R-vine copula models. These models are hierarchical in nature and provide great flexibility for modelling multivariate data. As the dimension increases, the number of model parameters that need to be estimated grows dramatically, which entails large computational times and efforts. This situation becomes even harder in mixture Regular vine models. Incorporating a truncation method into mixture Regular vine models reduces the computational difficulty of these mixture-based models. In this paper, tree-by-tree estimation of the mixture model is combined with a truncation method, in order to reduce the computational time and the number of parameters that need to be estimated in mixture vine copula models. A simulation study and real data applications illustrate the performance of the method. In addition, the real data applications show the effect of the mixture components on the truncation level.


Introduction
A copula is a statistical tool used to model the dependence structure among variables independently of their margins. Several forms of copula functions exist, which can deal with a wide range of dependency shapes, from independence to non-Gaussian distributions. Elliptical copulas are the most commonly used multivariate models, due to their ease of computation. The Archimedean copulas are another famous class of copula functions. These families are able to capture a wide class of dependency structures, including heavy tails. For example, the Clayton copula can capture lower tail dependence, while the Gumbel copula captures upper tail dependence. For more copula families, interested readers are referred to Nelsen [1] and Joe [2].
Copulas have received considerable attention in many applications; see, for example, Bárdossy [3] and Kazianka and Pilz [4] (geostatistics), and Patton [5] (a review of copula models in economics). As each copula family corresponds to a specific shape of dependency, a copula imposes the same type of dependence structure among all variables, which may in fact exhibit different shapes of dependency. Assuming the same relationship among all variables is not realistic for most real-life datasets. The Gaussian and Student-t copulas are the most commonly used families in high-dimensional cases, while other families are almost restricted to bivariate cases. Parameter restrictions and the limited choice of multivariate copulas are two main reasons why copula-based models can be inappropriate for modelling high-dimensional datasets that exhibit multiple dependency types among variables. Even though mixture copula models show significant improvements compared to non-copula mixture models (see, for example, Hu [6] and Zhang and Shi [7]), they still suffer from the same limitations as copula-based models. Therefore, the pair-copula, or Regular vine (R-vine) copula, model was established in the literature to address the drawbacks of copula models. Pair-copula constructions are hierarchical models, which model only two variables at a time using bivariate copula functions (pair-copulas). In vine copula models, the types of bivariate copulas do not need to be identical for all pairs of variables. Therefore, the multivariate distribution remains valid even if, for each pair of variables, we choose the copula that best fits the data (Aas et al. [8], Aas and Berg [9]). This forms the main strength of vine copula models, as the dependence shape may vary from one pair of variables to another. Since 2009, vine copula models have received growing interest in the literature (see, for example, Czado [10], Czado et al. [11], Erhardt et al. [12], Gruber and Czado [13]).
Although the individual choice of the best-fitting bivariate copula is one main strength of vine copula models, identifying the type of each bivariate copula can be a very difficult challenge. For this reason, several interesting works have been introduced in the literature. Mixture pair-copula models are one of the main solutions for identifying the best-fitting bivariate copula types for each copula term. Reducing the misspecification of pair-copula types and uncovering complex hidden dependencies among variables are the two main advantages of mixture pair-copula models (see, for example, Kim et al. [14], Roy and Parui [15], Weiß and Scheffer [16]). Unfortunately, besides the identification problem, R-vine copula models also suffer from a dramatic increase in the number of model parameters in high dimensions. For an n-dimensional R-vine copula, one needs to estimate n(n − 1)/2 pair-copulas, which becomes huge for large datasets. Mixture Regular vine copulas increase the difficulties of pair-copula models even further, in two respects. First, estimating the mixture components for each pair of variables is not straightforward. Second, the number of parameters to be estimated increases. Truncation methods have already been investigated in the literature by Brechmann et al. [17] and Brechmann and Joe [18]. In the truncation method, the pair-copulas beyond a specific level are replaced by the independence copula. Therefore, the parameters at these levels do not need to be estimated; hence, the computational complexity and the required effort are reduced significantly.
In mixture vine copula models, Roy and Parui [15] used fixed truncation levels (at the second tree) based on fixed types of mixture pair-copula components. However, the truncation level should be estimated, as fixing it may result in losing important information about the dependence among the variables. That is, beyond the truncation level, all pairs of variables must (almost) show independence structures; otherwise, the model should not be truncated. In mixture vine copula models, the mixture components affect the truncation level, as shown in the real data application in section (6). Hence, in truncated models, the modellers try to reduce the number of trees estimated in the model, and thus need to estimate the optimal tree at which the model should be truncated. For mixture models, to the best of my knowledge, estimating the truncation level using statistical selection methods has not yet been investigated in the literature, which is the main aim of this work.
The rest of the paper is structured as follows: section (2) briefly discusses the theoretical background of copula and pair-copula models. Section (3) introduces the mixture R-vine model and the Expectation Maximization (EM) algorithm, which is used to estimate the model parameters. A truncation method is introduced in section (4). The truncation method for the mixture R-vine model is illustrated with a simulation study and real data applications in section (5) and section (6), respectively.

Theoretical Background
The aim of this section is to provide a general summary of the theoretical background of copula and pair-copula models. For more details, interested readers are referred to the given references.
A copula is a multivariate function that couples a joint distribution function to its one-dimensional standard uniform margins (Nelsen [1]).

Definition 2.1 (Schweizer and Sklar [19]) A copula is a multivariate distribution function with standard uniform margins.

Theorem 2.1 Let F be an n-dimensional distribution function with marginal distributions F_1, ..., F_n. Then there exists an n-dimensional copula function C such that, ∀ (x_1, ..., x_n) ∈ R^n,

F(x_1, ..., x_n) = C(F_1(x_1), ..., F_n(x_n)).

If F_1, ..., F_n are continuous, then C is unique.
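As a small numerical illustration of Theorem 2.1 (not part of the paper's own material), the sketch below builds a bivariate joint distribution F(x_1, x_2) = C(F_1(x_1), F_2(x_2)) from exponential margins and a Clayton copula; the copula choice, θ = 2, and the rate parameters are arbitrary assumptions:

```python
import math

def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def exp_cdf(x, rate):
    """Exponential margin F(x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)

def joint_cdf(x1, x2, theta=2.0, rate1=1.0, rate2=0.5):
    """Sklar's theorem: F(x1, x2) = C(F1(x1), F2(x2))."""
    return clayton_cdf(exp_cdf(x1, rate1), exp_cdf(x2, rate2), theta)

# The resulting joint CDF behaves as a distribution function should:
# it is bounded above by each margin and increasing in each argument.
print(joint_cdf(1.0, 2.0))
```

Any other continuous margins could be plugged in without touching the copula, which is exactly the separation of margins and dependence that the theorem provides.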
One main advantage of copula models is that the modeller is able to model the margins independently of the dependence structure, which is captured via the copula function. Another advantage is the ability of copula families to deal with a wide range of dependency forms, including non-Gaussian, Gaussian, and heavy-tailed dependence. However, a copula function imposes the same type of dependency shape among all the variables, even in high-dimensional cases, regardless of the strength or the type of these dependencies. This forms one main limitation of copula models. In addition, identifying the form of the copula function that best fits the data is not an easy step, as each copula function is associated with a specific shape of dependency. Therefore, most copula models are limited to bivariate cases. Multivariate copulas are almost limited to the Gaussian and Student-t families. However, these families are inadequate for non-elliptical dependency.
In 2009, Aas et al. [8] established a much more promising method, based on the work of Bedford and Cooke [20], Bedford and Cooke [21], Joe [22], and Kurowicka and Cooke [23], to address the problems of copula models in high dimensions. Their method is known as the vine copula, pair-copula construction (PCC), or Regular vine (R-vine) copula. The PCC method builds a multivariate model using only bivariate copulas (pair-copulas); therefore, only two variables are modelled at a time. Hence, a PCC-based model provides even more flexibility and capability than a copula-based model.

Definition 2.2 (Tree) (Bedford and Cooke [20]) T = {N, E} is a tree (an acyclic connected graph) with a set of nodes N and a set of edges E (each edge connects a pair of nodes in N). The degree of a node is the total number of edges connected to that node.

Definition 2.3 (Vine, Regular vine) (Kurowicka and Cooke [23, Ch. 4]) V is a vine on d elements if:

• V = (T_1, ..., T_{d−1}), where T_1 indicates the first tree of the vine and so on.
• T_1 is a tree with nodes N_1 = {1, ..., d} and edges E_1, and, for j = 2, ..., d − 1, T_j is a tree with nodes N_j = E_{j−1}.

In addition, V becomes a Regular vine on d elements if:

• For j = 2, ..., d − 1, if l = {l_1, l_2} and m = {m_1, m_2} are two nodes connected by an edge in T_j, then exactly one of the elements of l is equal to an element of m. This condition is known as the proximity condition.

Under the proximity condition, two nodes in tree T_{j+1} are only connected by an edge if they share a common node in the previous tree T_j. Kurowicka and Cooke [23] defined the D-vine and C-vine models as follows:

• If every node in the first tree of a regular vine is connected to at most two other nodes, then the regular vine is called a D-vine.
• If in each tree T_j of a regular vine there is one particular node that is connected to all other nodes, then the regular vine is called a C-vine. In the first tree, this node is called the root node.

Definition 2.4 (Regular vine copula specification) (Bedford and Cooke [21]) (F, V, B) is a Regular vine copula (R-vine copula) specification if:

• F = (F_1, ..., F_n) is a vector of continuous invertible distribution functions.
• V is an n-dimensional regular vine (R-vine).
• B = {B_e | e ∈ E_i; i = 1, ..., n − 1} is a set of bivariate copulas, where B_e is the bivariate copula assigned to edge e.
Let x = (x_1, ..., x_n) be a vector of random variables, e = {l, m} an edge with conditioning set D_e, and i = 1, ..., n. Bedford and Cooke [21] defined regular vine dependence as follows:

Definition 2.5 (Regular vine (R-vine) dependence) A joint distribution function F on x is said to realize a regular vine copula specification (F, V, B), or exhibit regular vine dependence, if for each e ∈ E_i the bivariate copula of X_{C_{e,l}} and X_{C_{e,m}} given X_{D_e} is B_e. The bivariate copula of X_{C_{e,l}} and X_{C_{e,m}} given X_{D_e} is a conditional bivariate copula, which is assumed to be independent of the values of the conditioning variables (see Aas et al. [8] and Haff et al. [24]).

Theorem 2.2 (Dißmann et al. [25]) Let (F, V, B) be an n-dimensional regular vine specification. Then there is a unique distribution function F that realizes (F, V, B). Its density is:

f(x_1, ..., x_n) = ∏_{i=1}^{n} f_i(x_i) ∏_{i=1}^{n−1} ∏_{e∈E_i} c_{C_{e,l},C_{e,m}|D_e}( F(x_{C_{e,l}} | x_{D_e}), F(x_{C_{e,m}} | x_{D_e}) ),

where f_i denotes the marginal density of x_i, and c_{C_{e,l},C_{e,m}|D_e} stands for the density of the bivariate copula assigned to the edge e = {l, m}.
Continuing from the last theorem, let e ∈ E_i, e = {l, m}, l = {l_1, l_2}, m = {m_1, m_2} be the edge that joins C_{e,l} and C_{e,m}. Joe [22] showed that the conditional marginal distributions, F(x_{C_{e,l}} | x_{D_e}) and F(x_{C_{e,m}} | x_{D_e}), can be obtained recursively as

F(x | v) = ∂C_{x,v_j|v_{−j}}( F(x | v_{−j}), F(v_j | v_{−j}) ) / ∂F(v_j | v_{−j}),

where v is a vector of variables, v_j is one component of v, and v_{−j} denotes v with v_j removed. The arguments F(x | v_{−j}) and F(v_j | v_{−j}) are then called transformed variables (see Aas et al. [8] and Dißmann et al. [25]).
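To make the transformed variables concrete, the following sketch (an illustration, not code from the paper) evaluates this conditional distribution, often written as an h-function, for the Clayton copula, whose partial derivative has a closed form; θ = 2 is an arbitrary choice:

```python
def clayton_h(u, v, theta):
    """h-function of the Clayton copula: F(u | v) = dC(u, v)/dv."""
    return v ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 - 1.0 / theta)

# The transformed variable F(u | v) is itself a value in (0, 1),
# ready to serve as an argument of a pair-copula in the next tree.
u_given_v = clayton_h(0.5, 0.5, 2.0)
print(u_given_v)
```

In a vine, such h-functions are applied recursively: the outputs from tree T_j become the inputs of the pair-copulas in tree T_{j+1}.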
Both PCC and copula models share the same identification problem, which is even harder for PCC than for copula models. Furthermore, for an n-dimensional R-vine copula there are n(n − 1)/2 pair-copulas to be estimated, which becomes huge for high-dimensional datasets. This number is, however, much larger for mixture models. For example, for mixture models of single-parameter components one needs to estimate (2m − 1) · n(n − 1)/2 parameters, where m is the number of mixture components, while the number of estimated parameters of a mixture PCC model can reach (3m − 1) · n(n − 1)/2. Hence, the number of model parameters strongly depends on the number and the type of the mixture components. For example, for a 31-dimensional dataset with 2 mixture components, one needs to estimate 2790 parameters. This number increases rapidly with the dimension and the number of mixture components. Therefore, model reduction is necessary to reduce the complexity of mixture PCC models. This can be achieved by modelling only the first k trees instead of the full model, where the pair-copulas in the higher-order trees are set to the independence copula (see Dißmann et al. [26]). In this case, the full mixture R-vine model (FMRDM) becomes a k-truncated mixture R-vine model.

Mixture R-vine models and EM algorithm
Mixture models facilitate modelling complex hidden correlations among variables by fitting a sum of weighted density functions to the underlying problem. A finite mixture pair-copula construction combines the benefits of both mixture models and vine copula models, in order to provide great flexibility and capability for modelling high-dimensional datasets. By doing so, mixture pair-copula models allow fitting a different mixture of bivariate copulas to each pair of variables. That is, a mixture vine copula may be defined as a building block of mixture pair-copulas.

Finite mixture model
Suppose that a d-dimensional random vector x = (x_1, ..., x_d) is generated from an R-vine mixture model with a K-component mixture of bivariate copulas for each pair of variables. Suppose further that U_i = F_i(x_i), i = 1, ..., d, are the probability integral transforms of x. Since in vine copula models only one pair of variables is modelled at a time, the focus here is on the density of the mixture of bivariate copulas for each copula term.
Assume that the interest is in modelling the dependence structure between two random variables, X_1 and X_2. Assume further that these variables are transformed to copula data (u_1, u_2) using the empirical cumulative distribution function. Then the density of the mixture of bivariate copulas, which models the bivariate dependence structure between X_1 and X_2, is given by:

c(u_1, u_2; Θ) = Σ_{k=1}^{K} π_k c_k(u_1, u_2; θ_k),

where π_k is an unknown parameter (known as the mixture coefficient or weight) of the k-th component, satisfying Σ_{k=1}^{K} π_k = 1 and 0 ≤ π_k ≤ 1. Θ is the set of all model parameters, while θ_k is the set of parameters of the k-th component. In mixture models, the Expectation Maximization (EM) algorithm is a commonly used method for estimating the model parameters. Further details of this method are given in the next section.
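A minimal sketch of such a mixture density, assuming a two-component mixture of Clayton and Frank copulas (the component choices, weights, and parameter values are illustrative, not taken from the paper):

```python
import math

def clayton_density(u, v, theta):
    """Clayton pair-copula density, theta > 0 (lower tail dependence)."""
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * \
           (u ** -theta + v ** -theta - 1.0) ** (-2.0 - 1.0 / theta)

def frank_density(u, v, theta):
    """Frank pair-copula density, theta != 0 (no tail dependence)."""
    a = 1.0 - math.exp(-theta)
    num = theta * a * math.exp(-theta * (u + v))
    den = (a - (1.0 - math.exp(-theta * u)) * (1.0 - math.exp(-theta * v))) ** 2
    return num / den

def mixture_density(u, v, weights, densities, thetas):
    """c(u, v; Theta) = sum_k pi_k * c_k(u, v; theta_k)."""
    return sum(p * c(u, v, t) for p, c, t in zip(weights, densities, thetas))

# A 60/40 mixture of Clayton(theta = 2) and Frank(theta = 5):
c_mix = mixture_density(0.3, 0.4, [0.6, 0.4],
                        [clayton_density, frank_density], [2.0, 5.0])
print(c_mix)
```

Because each pair of variables gets its own such mixture, the construction can combine, say, lower-tail-dependent and tail-free components in a single copula term.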

EM-algorithm
The Expectation Maximization (EM) algorithm (Dempster et al. [27]) is an iterative estimation method with two steps, the so-called Expectation step (E-step) and Maximization step (M-step).
Suppose that a dataset of size N, {(u_{n1}, u_{n2})}_{n=1}^{N}, is drawn independently from a K-component mixture of bivariate copulas. Suppose further that the data have been converted to uniform margins using the empirical cumulative distribution function. Then the log-pseudo-likelihood function of Θ is given by:

ℓ(Θ) = Σ_{n=1}^{N} ln ( Σ_{k=1}^{K} π_k c_k(u_{n1}, u_{n2}; θ_k) ),

where Θ is the set of all model parameters, while θ_k is the set of parameters of the k-th component. The EM algorithm introduces latent variables z_n = (z_{n1}, z_{n2}, ..., z_{nK}), where z_{nk} = 1 if the n-th observation is drawn from the k-th component and z_{nk} = 0 otherwise. In other words, z_{nk} indicates from which mixture component each observation was drawn. These latent variables are assumed to be independent and identically distributed from the multinomial distribution, such that p(z_n) = ∏_{k=1}^{K} π_k^{z_{nk}}. Consequently, we now have the complete data {(u_{n1}, u_{n2}, z_n)}_{n=1}^{N}, and the complete-data log-likelihood function, ℓ_c(Θ), is given by:

ℓ_c(Θ) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_{nk} [ ln π_k + ln c_k(u_{n1}, u_{n2}; θ_k) ].

The EM algorithm starts with initial values of the unknown parameters, Θ^{(0)}, and the two steps (E and M) are repeated until convergence (i.e., ℓ(Θ^{(m+1)}) − ℓ(Θ^{(m)}) is smaller than a pre-specified tolerance). E-step: calculate the conditional expectation of the complete-data log-likelihood ℓ_c(Θ) above, given the observed data and using the current estimate of the parameters Θ^{(m)}.
Suppose that we are at iteration m + 1. Then the conditional expectation (responsibility) of z_{nk} is calculated as follows:

γ(z_{nk}) = π_k^{(m)} c_k(u_{n1}, u_{n2}; θ_k^{(m)}) / Σ_{j=1}^{K} π_j^{(m)} c_j(u_{n1}, u_{n2}; θ_j^{(m)}).
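The E-step responsibilities above, together with the M-step updates described next, can be sketched as follows. This is a simplified illustration (not the paper's implementation): a two-component mixture of Clayton copulas, with each θ_k updated by a crude grid search rather than a proper numerical optimizer, and toy data drawn by the conditional sampling method:

```python
import math, random

def clayton_density(u, v, theta):
    """Clayton pair-copula density, theta > 0."""
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * \
           (u ** -theta + v ** -theta - 1.0) ** (-2.0 - 1.0 / theta)

def em_mixture(data, theta_init, n_iter=20):
    """EM for a 2-component Clayton mixture: returns (weights, thetas)."""
    pi = [0.5, 0.5]
    thetas = list(theta_init)
    grid = [0.2 * t for t in range(1, 51)]           # candidate theta values
    for _ in range(n_iter):
        # E-step: responsibilities gamma(z_nk).
        gammas = []
        for (u, v) in data:
            w = [pi[k] * clayton_density(u, v, thetas[k]) for k in range(2)]
            s = sum(w)
            gammas.append([wk / s for wk in w])
        # M-step: update the weights pi_k ...
        n = len(data)
        pi = [sum(g[k] for g in gammas) / n for k in range(2)]
        # ... and each theta_k by maximizing the weighted log-density.
        for k in range(2):
            thetas[k] = max(grid, key=lambda t: sum(
                g[k] * math.log(clayton_density(u, v, t))
                for (u, v), g in zip(data, gammas)))
    return pi, thetas

def clayton_sample(theta, rng):
    """Draw (u, v) from a Clayton copula by the conditional method."""
    u, w = rng.random(), rng.random()
    v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

rng = random.Random(1)
data = [clayton_sample(0.5, rng) for _ in range(200)] + \
       [clayton_sample(6.0, rng) for _ in range(200)]
pi, thetas = em_mixture(data, theta_init=[1.0, 4.0])
print(pi, thetas)
```

In the mixture R-vine setting this routine would be run once per copula term, tree by tree, with the transformed variables of the previous tree as input data.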

M-step:
Maximize the expected complete-data log-likelihood from the E-step with respect to Θ in order to produce a new estimate of the model parameters, Θ^{(m+1)}. In this step the estimate of each component's parameters is computed independently; i.e., π_k^{(m+1)} can be obtained as follows:

π_k^{(m+1)} = (1/N) Σ_{n=1}^{N} γ(z_{nk}),

while the update θ_k^{(m+1)} can be obtained by maximizing the following function using a numerical maximization method:

Σ_{n=1}^{N} γ(z_{nk}) ln c_k(u_{n1}, u_{n2}; θ_k).

Truncated mixture R-vine copulas
The flexibility of pair-copula models reduces as the dimension increases. Truncating the R-vine model is one main solution that plays a key role in addressing this problem of pair-copula models. Truncating an R-vine refers to replacing all the pair-copulas in the higher-order trees with the independence copula. The main idea of the truncated mixture R-vine model can be presented in the following example. Example 4.1: consider a 7-dimensional mixture R-vine model in which a mixture of two single-parameter bivariate copulas is fitted to each pair. Therefore, there are 63 parameters to be estimated for the full model. Assume that this model is truncated at tree 3 (k = 3), so that we have a 3-truncated mixture R-vine model. By doing so, the conditional mixture bivariate copulas in trees 4, 5, and 6 are set to the independence copula. Hence, in this case, only 45 parameters need to be estimated, instead of 63 as with the full model. This is because the 3-truncated mixture R-vine model has only 15 edges, while the full mixture R-vine model has 21 edges. For very high-dimensional datasets with a large number of mixture components (say 5), truncation at the first trees will be very valuable.
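The parameter counts in the example above (63 vs. 45) can be checked with a few lines. The sketch below assumes, as in the example, single-parameter pair-copulas, so each pair contributes m copula parameters plus m − 1 mixture weights, i.e. 2m − 1 parameters:

```python
def n_edges(d, k):
    """Number of edges in the first k trees of a d-dimensional R-vine."""
    return sum(d - j for j in range(1, k + 1))

def n_params(d, k, m):
    """Parameters of a k-truncated mixture R-vine with m single-parameter components per pair."""
    return (2 * m - 1) * n_edges(d, k)

# d = 7, two components; the full model uses all d - 1 = 6 trees.
print(n_edges(7, 6), n_params(7, 6, 2))   # 21 edges, 63 parameters
print(n_edges(7, 3), n_params(7, 3, 2))   # 15 edges, 45 parameters
```

The same counter shows how quickly the savings grow with the dimension: the gain from truncation is the entire tail sum of edges beyond tree k, multiplied by the per-pair parameter count.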

Methodology
Brechmann et al. [17] developed the most widely used truncation method, which truncates the R-vine model sequentially using goodness-of-fit criteria, including the Akaike information criterion (AIC) of Akaike [28] and the Bayesian information criterion (BIC) of Schwarz et al. [29]. In this section, the sequential truncation method of Brechmann et al. [17] (Algorithm 1) (see also Algorithm 7 of Brechmann [30]) is incorporated into mixture R-vine models using well-known selection criteria. In this paper, the AIC, the BIC, and the consistent Akaike information criterion (CAIC) of Bozdogan [31] are employed. The formulas of these criteria are given as follows:

AIC = −2 ln L(θ̂) + 2P,
BIC = −2 ln L(θ̂) + P ln(N),
CAIC = −2 ln L(θ̂) + P (ln(N) + 1),

where θ̂ is the vector of estimated parameter values, P is the number of model parameters, and N is the sample size. At each candidate level t, if BIC_1 < BIC_2 then the mixture R-vine is truncated at level t.

The truncation of the mixture R-vine model can be summarized in the following steps: 1. Select a specific number of trees, say the first two trees, and fit the corresponding mixture R-vine model. 2. Compute the selection criterion (e.g., BIC) of this model. 3. Add a new tree to the model and re-fit. 4. Compute the selection criterion of the extended model. 5. If the new model shows no significant contribution over the previous model, truncate the mixture R-vine at the previous model. 6. If the new model shows a significant contribution over the previous model, iterate steps 3–6.
For example, consider the mixture R-vine model shown in Example (4.1). In the first step, a small model (only the first two trees) is constructed (the first model). Then the mixture of two components of bivariate copulas is fitted to each pair of variables of this model, and the model parameters are estimated. After that, in the second step, BIC_1 is computed, where BIC_1 refers to the BIC of the first model. Then a new tree is added to the model, so that the model is constructed using the first three trees (the second model), and BIC_2 is computed for the second model. If BIC_1 < BIC_2, the model is truncated at t = 2 and the first model is returned. Otherwise, a new model is constructed by adding a new tree, and the steps are iterated until the optimal truncation level is reached.
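The selection loop just described can be sketched generically as follows; `fit_model(t)` is a hypothetical callback that fits a t-truncated mixture R-vine and returns its maximized log-likelihood and parameter count (the toy stub below only mimics such a fit, it performs no actual estimation):

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC = -2 ln L + P ln N (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def sequential_truncation(fit_model, max_trees, n_obs):
    """Add trees one at a time; stop when BIC no longer improves."""
    t = 2                                      # start with the first two trees
    loglik, p = fit_model(t)
    best = bic(loglik, p, n_obs)
    while t < max_trees:
        loglik, p = fit_model(t + 1)           # fit with one more tree
        candidate = bic(loglik, p, n_obs)
        if best < candidate:                   # no improvement: truncate here
            return t
        best, t = candidate, t + 1
    return t

# Toy stub: log-likelihood gains shrink after tree 2, while the
# parameter penalty keeps growing, so BIC favours early truncation.
def fit_model(t):
    logliks = {2: -500.0, 3: -498.0, 4: -497.0}
    return logliks[t], 9 * t                   # 9 hypothetical parameters per tree

print(sequential_truncation(fit_model, max_trees=4, n_obs=1000))   # -> 2
```

Swapping `bic` for an AIC or CAIC function changes only the penalty term, which is why the three criteria can disagree on the selected level when the log-likelihood gains per tree are small.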
As mentioned above, the truncation process with mixture dependencies is complex and not straightforward, as it is affected by the combination of the bivariate copulas. For example, one type of mixture of bivariate copulas may cause the model to be truncated at one level, while the same model may be truncated at a different level when fitting different mixture components. This potential result is illustrated in section (6).

Simulation study
To illustrate the performance of the sequential mixture truncation method, a simulated dataset is generated from a two-component mixture R-vine model with only two levels (see Figure (4)). After that, the true model, a three-level model, and a full two-component 5-dimensional mixture R-vine model are fitted to the data, respectively. Then the AIC, BIC, and CAIC are computed for each model. Since the test aims to show the performance of the truncation method, and for comparison, the results of all the fitted mixture R-vine models are reported.
Before reporting the final results, the idea of the simulation study is presented in more detail using a graph representation. Consider a 5-dimensional, two-component mixture R-vine model. Figures (2, 3, 4) present three different mixture R-vine models: the full mixture R-vine model, and the 3-level and 2-level truncated mixture R-vine models, respectively. The main difference between these models is the number of trees to be modelled. For example, in the full mixture R-vine model there are 4 trees and one needs to estimate the whole model. However, in the case of the truncated models, the conditional mixture bivariate copulas beyond the truncation level are replaced by the independence copula. Hence, instead of modelling the whole model, one only needs to estimate the mixture bivariate copulas up to the truncation level. For very large datasets, say 100 dimensions, this results in a very large reduction of the model complexity and of the number of parameters that need to be estimated. Tables (1, 2) summarize the information of the three fitted models; the summary includes the mixture type of the bivariate copulas (for each pair) and the mixture weights, while Tables (3, 4) report the results of the three models. From Table (3) and Table (4), the estimated values of the model parameters in the first trees of all the models are very close to the true values. Hence, the dependence structures are described well and the performance of the EM algorithm is satisfactory. In addition, for the 3-level truncated model, the corresponding parameters of the mixture bivariate copulas in trees 3 and 4 are very close to the independence boundary of each bivariate copula. For example, in tree 3 of the 3-level truncated mixture R-vine model, the parameters of the Frank and Gaussian copulas are −0.5 and 0.091, respectively.
In addition, the corresponding Kendall's tau values of these copulas are −0.040 and 0.061, which are very small, indicating that the corresponding variables are almost independent. Again, this illustrates the ability of the EM algorithm to accurately estimate the model parameters. After estimating the model parameters and testing the model performance, the three model selection criteria are computed for each model, in order to illustrate the ability of the truncation method to select the optimal truncation level of the mixture R-vine model. The values of the selection criteria are shown in Table (5). From Table (5), the 2-level truncated mixture R-vine model shows the best model fit, while the full model shows the worst. In addition, all the selection criteria selected the true model (the model from which the simulated data were generated). Comparing the selection criteria values of the truncated mixture R-vine model with those of the 3-level truncated model, one can clearly see that the model is truncated correctly. That is, let AIC_1, BIC_1, and CAIC_1 correspond to the 2-level truncated mixture R-vine model, and AIC_2, BIC_2, and CAIC_2 correspond to the 3-level truncated model. Then, from the table, AIC_1 < AIC_2, BIC_1 < BIC_2, and CAIC_1 < CAIC_2. The same result holds when comparing the true model with the full one. Therefore, the results can be interpreted as evidence of the ability of the truncation method to select the optimal truncation level of the mixture R-vine model. Hence, the performance of the truncation method with the mixture R-vine is illustrated.

Real data applications
This section aims to demonstrate the performance of the sequential truncation method for mixture R-vine models when applied to real datasets. For this reason, two high-dimensional real datasets are tested, namely the Vowel and Ionosphere datasets, which were obtained from the [32] repository. They consist of 990 and 351 observations, respectively. As the aim of this paper is to incorporate the truncation method into mixture R-vine models, the focus will be on fixed mixture R-vine models, in order to avoid extra complexity and computation. For each dataset, different fixed mixture R-vine models are fitted.
Before illustrating the performance of the truncation method on the mixture R-vine models, full information on the fitted mixture bivariate copulas, for each dataset and each model, is given in Table (6). The dimensions of these datasets are 10 and 32, respectively. Hence, there are two different full mixture R-vine models: one 10-dimensional, with 9 trees and 45 edges, and a second 32-dimensional mixture R-vine with 31 trees and 496 edges. For these models, and unlike the non-mixture R-vine model, the number of parameters to be estimated strongly depends on the type and the number of the mixture components. For example, for a 4-component mixture of single-parameter bivariate copulas, the second model contains 3,472 parameters. One can imagine how significant the reduction of the model complexity will be if the truncation level is reached in the first levels. Another important point, as mentioned above, is the influence of the mixture components on the truncation level. These two points are illustrated in Tables (7, 8). From Tables (7, 8), the two main points mentioned above are illustrated. First, from Table (7), the truncation level is strongly influenced by the type of the mixture components. For the first and second mixture models there is no possible truncation level, while the third mixture model is truncated at level 7. Hence, the truncation level should not be fixed and needs to be estimated, in order to avoid ignoring any possible information. Furthermore, for the third mixture model, the truncation method leaves 27 parameters that do not need to be estimated in comparison with the full model (the third model without truncation). For the second dataset, both mixture models are truncated at the third level.
Therefore, there are only 609 parameters to be estimated out of 3,472, which provides a very significant reduction of the computational complexity and effort, illustrating the second point mentioned above.

Conclusion
Modelling only two variables at a time using (mixture) bivariate copulas is one of the main benefits of (mixture) pair-copula models. However, this flexibility reduces as the dimension grows, due to the large number of model parameters to be estimated. In this paper, the truncation method was incorporated into mixture R-vine models. Estimating the truncation level for the mixture R-vine model is not a straightforward task, because of the effect of the mixture components on the resulting models. The performance of the truncation method with the EM algorithm was illustrated. The simulation study showed the ability of the method to accurately estimate the truncation level and the model parameters. The real data study showed a significant reduction of the model computation. In addition, the real data study illustrated the effect of the mixture components on the truncation level.
The remaining questions are: how would estimating the mixture components, for each pair of variables, affect the optimal truncation level? In addition, how could ordering the variables, based on the mixture components, provide a new way to estimate the mixture components of each pair of variables, and how would it affect the truncation level? These questions are left as future work.