Learning dyadic data and predicting unaccomplished co-occurrent values by mixture model

Dyadic data, which is also called co-occurrence data (COD), contains co-occurrences of objects. Searching for statistical models to represent dyadic data is necessary. Fortunately, the finite mixture model is a solid statistical model for learning and making inferences on dyadic data, because a mixture model is built smoothly and reliably by the expectation maximization (EM) algorithm, which suits the inherent sparseness of dyadic data. This research summarizes mixture models for dyadic data. When each co-occurrence in dyadic data is associated with a value, there are many unaccomplished (missing) values because a lot of co-occurrences are inexistent. In this research, these unaccomplished values are estimated as the mean (expectation) of a random variable given the partial probabilistic distributions inside the dyadic mixture model.


Introduction
Suppose data has two parts, a hidden part X and an observed part Y, and we only know Y. The relationship between random variable X and random variable Y is specified by their joint probabilistic density function (PDF), denoted f(X, Y | Θ), where Θ is the parameter. Given a sample $\mathcal{D} = \{Y_1, Y_2, \dots, Y_N\}$ whose $Y_i$ are mutually independent and identically distributed (iid), it is required to estimate Θ based on $\mathcal{D}$ whereas X is unknown. The expectation maximization (EM) algorithm is applied to solve this problem when only $\mathcal{D}$ is observed. EM has many iterations, and each iteration has two steps: the expectation step (E-step) and the maximization step (M-step). At some t-th iteration, given the current parameter $\Theta^{(t)}$, the two steps are described as follows.

E-step: The expectation $Q(\Theta \mid \Theta^{(t)})$ is determined based on the current parameter $\Theta^{(t)}$, according to equation 1.1 (Nguyen, 2020, p. 50):

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{N} E\big[\log f(X, Y_i \mid \Theta) \,\big|\, Y_i, \Theta^{(t)}\big] = \sum_{i=1}^{N} \int f(X \mid Y_i, \Theta^{(t)}) \log f(X, Y_i \mid \Theta) \, dX \quad (1.1)$$

M-step: The next parameter $\Theta^{(t+1)}$ is a maximizer of $Q(\Theta \mid \Theta^{(t)})$ with respect to Θ. Note that $\Theta^{(t+1)}$ becomes the current parameter at the next iteration (the (t+1)-th iteration).

Table 1.1. E-step and M-step of EM algorithm
  E-step: determine $Q(\Theta \mid \Theta^{(t)})$ based on $\Theta^{(t)}$ according to equation 1.1.
  M-step: set $\Theta^{(t+1)} = \operatorname{argmax}_{\Theta} Q(\Theta \mid \Theta^{(t)})$.

The EM algorithm converges after some iterations; at that time we have the estimate $\Theta^{(t)} = \Theta^{(t+1)} = \Theta^*$. Note, the estimate $\Theta^*$ is the result of EM. The EM algorithm shown in table 1.1 is also called general EM or GEM; a minimal sketch of its loop is given below.
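The following is a minimal sketch of the GEM loop in Python. The callables `e_step` and `m_step`, the tolerance `tol`, and all names are hypothetical placeholders of mine, standing for whatever model-specific computations instantiate equation 1.1 and its maximizer; this is not the paper's own code.

```python
import numpy as np

def gem(y, theta0, e_step, m_step, tol=1e-6, max_iter=1000):
    """Generic GEM loop: alternate E-step and M-step until the
    parameter estimate stabilizes (a sketch, not a definitive method).

    y       : observed sample (Y1, ..., YN)
    theta0  : initial parameter vector Theta^(0)
    e_step  : callable(y, theta) -> sufficient statistics of Q(. | theta)
    m_step  : callable(y, stats) -> maximizer Theta^(t+1) of Q(. | theta)
    """
    theta = theta0
    for t in range(max_iter):
        stats = e_step(y, theta)        # E-step: build Q(Theta | Theta^(t))
        theta_next = m_step(y, stats)   # M-step: maximize Q with respect to Theta
        # stop when successive estimates coincide: Theta^(t) ~ Theta^(t+1) = Theta*
        if np.max(np.abs(np.asarray(theta_next) - np.asarray(theta))) < tol:
            return theta_next
        theta = theta_next
    return theta
```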
Especially, the random variable X can represent a latent class or latent component of the random variable Y. Suppose X is discrete and ranges in {1, 2,…, K}. As a convention, let k = X. Note, because all $Y_i$ are iid, let random variable Y represent every $Y_i$. The so-called probabilistic finite mixture model is represented by the PDF of Y, as follows:

$$f(Y \mid \Theta) = \sum_{k=1}^{K} \alpha_k f_k(Y \mid \theta_k)$$

Where,

$$\Theta = (\alpha_1, \alpha_2, \dots, \alpha_K, \theta_1, \theta_2, \dots, \theta_K)^T, \quad \sum_{k=1}^{K} \alpha_k = 1$$

Note, the superscript "T" denotes the transpose operator for vectors and matrices. The $Q(\Theta \mid \Theta^{(t)})$ is re-defined for the finite mixture model as follows (Nguyen, 2020, p. 79):

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{N} \sum_{k=1}^{K} P(k \mid Y_i, \Theta^{(t)}) \log\big(\alpha_k f_k(Y_i \mid \theta_k)\big), \quad P(k \mid Y_i, \Theta^{(t)}) = \frac{\alpha_k^{(t)} f_k(Y_i \mid \theta_k^{(t)})}{\sum_{l=1}^{K} \alpha_l^{(t)} f_l(Y_i \mid \theta_l^{(t)})}$$

An interesting application of the finite mixture model is soft clustering. Traditional clustering methods assign a fixed cluster to every data point in the sample, which means that every data point belongs to exactly one cluster. Soft clustering is more flexible: every data point may belong to more than one cluster, and the degree of assignment is represented by a probability. It is easy to recognize that when the mixture model is applied to soft clustering, the latent class k represents a cluster and the membership degree of $Y_i$ in cluster k is the posterior $P(k \mid Y_i, \Theta)$; a small sketch of such soft assignments follows.
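As an illustration of soft clustering with a mixture model, the sketch below computes the posterior membership probabilities $P(k \mid Y_i, \Theta)$ for a univariate Gaussian mixture. All parameter values and data points are invented for the example; they are not from the original text.

```python
import numpy as np
from scipy.stats import norm

# hypothetical 2-component univariate Gaussian mixture: alpha_k, mean_k, std_k
alphas = np.array([0.4, 0.6])
means  = np.array([0.0, 5.0])
stds   = np.array([1.0, 2.0])

y = np.array([0.3, 2.5, 6.1])  # toy data points

# weighted component densities alpha_k * f_k(Y_i | theta_k), shape (N, K)
weighted = alphas * norm.pdf(y[:, None], loc=means, scale=stds)

# posterior P(k | Y_i, Theta): normalize each row over the K components
responsibilities = weighted / weighted.sum(axis=1, keepdims=True)
print(responsibilities)  # each row sums to 1: soft cluster memberships
```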
Every observation in an ordinary sample is univariate or multivariate, but there is a case where the ordinary sample becomes a dyadic sample related to two sets of objects, which requires some modifications of the mixture model. Dyadic data, which is also called co-occurrence data (COD), contains co-occurrent events of objects. It is necessary to obtain statistical models to represent dyadic data, and fortunately, the finite mixture model is such a model. Recall that EM is applied to learn the mixture model. The next section focuses on the mixture model for dyadic data.
Recall that dyadic data consists of two object sets $\mathcal{X} = \{x_1, x_2, \dots\}$ and $\mathcal{Y} = \{y_1, y_2, \dots\}$ together with a sample $\mathcal{S}$ of observed co-occurrences $(x_i, y_j)$. Given a fixed $x_i$, let $\mathcal{S}_i$ be the $\mathcal{X}$-partitioned subset of $\mathcal{S}$ which contains the co-occurrences whose $\mathcal{X}$-objects are fixed at $x_i$ (Hofmann & Puzicha, Statistical Models for Co-occurrence Data, 1998, p. 1). Note, $\mathcal{S}_i$ can be empty. The size of $\mathcal{S}_i$ is $|\mathcal{S}_i|$:

$$\mathcal{S}_i = \{(x, y_j) \in \mathcal{S} : x = x_i\} \quad (2.3)$$

Dyadic data $\mathcal{S}$ is thus partitioned into $|\mathcal{X}|$ subsets $\mathcal{S}_i$; a short sketch of this partition is given below.
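The following sketch partitions a toy co-occurrence sample into the subsets $\mathcal{S}_i$ of equation 2.3. The object names are invented for the example.

```python
from collections import defaultdict

# toy dyadic sample S: each pair (x, y) is one observed co-occurrence
sample = [("x1", "y1"), ("x1", "y2"), ("x2", "y1"), ("x1", "y1")]

# S_i = all co-occurrences whose X-object is fixed at x_i (may be empty for unseen x)
partition = defaultdict(list)
for x, y in sample:
    partition[x].append((x, y))

for x, subset in partition.items():
    print(x, subset, "|S_i| =", len(subset))
```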
The mixture model of dyadic data is called the symmetric mixture model (SMM) if the $\alpha_k$ are independent of both $x_i$ and $y_j$. SMM is defined as follows (Hofmann & Puzicha, Statistical Models for Co-occurrence Data, 1998, p. 2):

$$P(x_i, y_j \mid \Theta) = \sum_{k=1}^{K} \alpha_k p_{i|k} q_{j|k}$$

Where $\alpha_k$ is the probability of aspect k. Note, P(.) denotes probability.

$$\alpha_k = P(k)$$

The $p_{i|k}$ is the probability of $x_i$ given aspect k.

$$p_{i|k} = P(x_i \mid k)$$

The $q_{j|k}$ is the probability of $y_j$ given aspect k.

$$q_{j|k} = P(y_j \mid k)$$

By applying GEM, given the dyadic sample $\mathcal{S}$, at the t-th iteration of GEM, given the current parameter $\Theta^{(t)} = (\alpha_k^{(t)}, p_{i|k}^{(t)}, q_{j|k}^{(t)})^T$, the conditional expectation $Q(\Theta \mid \Theta^{(t)})$ is:

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{(x_i, y_j) \in \mathcal{S}} \sum_{k=1}^{K} P(k \mid x_i, y_j, \Theta^{(t)}) \log\big(\alpha_k p_{i|k} q_{j|k}\big), \quad P(k \mid x_i, y_j, \Theta^{(t)}) = \frac{\alpha_k^{(t)} p_{i|k}^{(t)} q_{j|k}^{(t)}}{\sum_{l=1}^{K} \alpha_l^{(t)} p_{i|l}^{(t)} q_{j|l}^{(t)}}$$

Maximizing $Q(\Theta \mid \Theta^{(t)})$ subject to $\sum_k \alpha_k = 1$, $\sum_i p_{i|k} = 1$, and $\sum_j q_{j|k} = 1$ yields the next parameters:

$$\alpha_k^{(t+1)} = \frac{1}{|\mathcal{S}|} \sum_{(x_i, y_j) \in \mathcal{S}} P(k \mid x_i, y_j, \Theta^{(t)}), \quad p_{i|k}^{(t+1)} = \frac{\sum_{(x_i, y_j) \in \mathcal{S}_i} P(k \mid x_i, y_j, \Theta^{(t)})}{\sum_{(x_u, y_v) \in \mathcal{S}} P(k \mid x_u, y_v, \Theta^{(t)})}$$

Similarly, the next parameter $q_{j|k}^{(t+1)}$ is:

$$q_{j|k}^{(t+1)} = \frac{\sum_{(x_u, y_v) \in \mathcal{S},\, y_v = y_j} P(k \mid x_u, y_j, \Theta^{(t)})}{\sum_{(x_u, y_v) \in \mathcal{S}} P(k \mid x_u, y_v, \Theta^{(t)})}$$

The two steps of the GEM algorithm for SMM at some t-th iteration are shown in table 2.1, followed by a short sketch.

Table 2.1. E-step and M-step of GEM algorithm for SMM
  E-step: compute $P(k \mid x_i, y_j, \Theta^{(t)})$ for every $(x_i, y_j) \in \mathcal{S}$ and every k, based on $\Theta^{(t)}$.
  M-step: update $\alpha_k^{(t+1)}$, $p_{i|k}^{(t+1)}$, and $q_{j|k}^{(t+1)}$ by the formulas above.
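To make table 2.1 concrete, here is a minimal GEM sketch for SMM on a toy count matrix, where summing over the sample $\mathcal{S}$ is expressed through co-occurrence counts n(x_i, y_j). The variable names and toy data are mine, not from the original text; this is a sketch of the updates above, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 4, 5, 2                      # |X|, |Y|, number of aspects
n = rng.integers(0, 5, size=(N, M))    # toy co-occurrence counts n(x_i, y_j)

# random initial parameters, normalized to satisfy the probability constraints
alpha = np.full(K, 1.0 / K)
p = rng.random((N, K)); p /= p.sum(axis=0)   # p_{i|k}, columns sum to 1
q = rng.random((M, K)); q /= q.sum(axis=0)   # q_{j|k}, columns sum to 1

for t in range(200):
    # E-step: posterior P(k | x_i, y_j) proportional to alpha_k p_{i|k} q_{j|k}
    post = alpha[None, None, :] * p[:, None, :] * q[None, :, :]  # (N, M, K)
    post /= post.sum(axis=2, keepdims=True)

    # M-step: accumulate posteriors weighted by counts, then renormalize
    w = n[:, :, None] * post                   # n(x_i, y_j) P(k | x_i, y_j)
    alpha = w.sum(axis=(0, 1)) / n.sum()       # alpha_k^(t+1)
    p = w.sum(axis=1) / w.sum(axis=(0, 1))     # p_{i|k}^(t+1)
    q = w.sum(axis=0) / w.sum(axis=(0, 1))     # q_{j|k}^(t+1)

print(alpha, p.sum(axis=0), q.sum(axis=0))  # mixing weights; columns sum to 1
```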
The mixture model of dyadic data is called the asymmetric mixture model (AMM) if the mixing coefficients are independent of only one of $x_i$ and $y_j$. Without loss of generality, suppose the mixing coefficients are independent of $y_j$ (of course, they depend on $x_i$); AMM is then defined as follows (Hofmann & Puzicha, Statistical Models for Co-occurrence Data, 1998, p. 3):

$$P(x_i, y_j \mid \Theta) = p_i \sum_{k=1}^{K} \alpha_{k|i} q_{j|k}$$

The $\alpha_{k|i}$ is the probability of aspect k given $x_i$.

$$\alpha_{k|i} = P(k \mid x_i)$$

Where $p_i$ is the probability of $x_i$.

$$p_i = P(x_i)$$

The $q_{j|k}$ is the conditional probability of $y_j$ given aspect k.

$$q_{j|k} = P(y_j \mid k)$$

A short numeric sketch of this definition is given below.
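The following minimal sketch evaluates the AMM joint probability $P(x_i, y_j \mid \Theta)$ for toy parameter values; all numbers are invented for illustration.

```python
import numpy as np

# toy AMM parameters: |X| = 3 objects, |Y| = 4 objects, K = 2 aspects
p     = np.array([0.5, 0.3, 0.2])                 # p_i = P(x_i)
alpha = np.array([[0.7, 0.3],                     # alpha_{k|i} = P(k | x_i),
                  [0.2, 0.8],                     # each row sums to 1
                  [0.5, 0.5]])
q     = np.array([[0.1, 0.4],                     # q_{j|k} = P(y_j | k),
                  [0.2, 0.3],                     # each column sums to 1
                  [0.3, 0.2],
                  [0.4, 0.1]])

# P(x_i, y_j) = p_i * sum_k alpha_{k|i} q_{j|k}, computed for all pairs at once
joint = p[:, None] * (alpha @ q.T)                # shape (|X|, |Y|)
print(joint.sum())                                # sums to 1 over all pairs
```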
Suppose $y_j$ is conditionally independent of $x_i$ given aspect k; we have:

$$P(y_j \mid x_i, k) = P(y_j \mid k) = q_{j|k}$$

Note, $q_{j|i}$ is the conditional probability of $y_j$ given $x_i$, which is defined as follows:

$$q_{j|i} = P(y_j \mid x_i) = \sum_{k=1}^{K} \alpha_{k|i} q_{j|k}$$

The joint probability of $x_i$, $y_j$, and k is:

$$P(x_i, y_j, k \mid \Theta) = p_i \, \alpha_{k|i} \, q_{j|k}$$

The parameter of AMM is $\Theta = (\alpha_{k|i}, p_i, q_{j|k})^T$, in which there are $K(|\mathcal{X}| + |\mathcal{Y}|) + |\mathcal{X}|$ partial parameters $\alpha_{k|i}$, $p_i$, and $q_{j|k}$. Note,

$$\sum_{k=1}^{K} \alpha_{k|i} = 1, \quad \sum_{i} p_i = 1, \quad \sum_{j} q_{j|k} = 1$$

By applying GEM, given the dyadic sample $\mathcal{S}$, at the t-th iteration of GEM, given the current parameter $\Theta^{(t)} = (\alpha_{k|i}^{(t)}, p_i^{(t)}, q_{j|k}^{(t)})^T$, the conditional expectation $Q(\Theta \mid \Theta^{(t)})$ is:

$$Q(\Theta \mid \Theta^{(t)}) = \sum_{(x_i, y_j) \in \mathcal{S}} \sum_{k=1}^{K} P(k \mid x_i, y_j, \Theta^{(t)}) \log\big(p_i \, \alpha_{k|i} \, q_{j|k}\big)$$

We use the Lagrange duality method to maximize $Q(\Theta \mid \Theta^{(t)})$. The Lagrange function $la(\Theta, \lambda \mid \Theta^{(t)})$ is the sum of $Q(\Theta \mid \Theta^{(t)})$ and these constraints, as follows:

$$la(\Theta, \lambda \mid \Theta^{(t)}) = Q(\Theta \mid \Theta^{(t)}) + \lambda_1\Big(1 - \sum_{k=1}^{K} \alpha_{k|i}\Big) + \lambda_2\Big(1 - \sum_{i} p_i\Big) + \lambda_3\Big(1 - \sum_{j} q_{j|k}\Big)$$

Note, $\lambda = (\lambda_1, \lambda_2, \lambda_3)^T$ where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are called Lagrange multipliers. Of course, $la(\Theta, \lambda \mid \Theta^{(t)})$ is a function of Θ and λ. The next parameter $\Theta^{(t+1)}$ that maximizes $Q(\Theta \mid \Theta^{(t)})$ at the M-step of some t-th iteration is the solution of the equation formed by setting the first-order partial derivatives of the Lagrange function regarding Θ and λ to zero. The first-order partial derivative of the Lagrange function regarding $\alpha_{k|i}$ is:

$$\frac{\partial la}{\partial \alpha_{k|i}} = \frac{1}{\alpha_{k|i}} \sum_{(x_i, y_j) \in \mathcal{S}_i} P(k \mid x_i, y_j, \Theta^{(t)}) - \lambda_1$$

Setting this partial derivative to zero, summing over k, and using the constraint $\sum_k \alpha_{k|i} = 1$ gives $\lambda_1 = |\mathcal{S}_i|$; hence we obtain:

$$\alpha_{k|i}^{(t+1)} = \frac{\sum_{(x_i, y_j) \in \mathcal{S}_i} P(k \mid x_i, y_j, \Theta^{(t)})}{|\mathcal{S}_i|}$$

The next parameters $p_i^{(t+1)}$ and $q_{j|k}^{(t+1)}$ are derived in the same manner from the remaining partial derivatives.

As usual, $\alpha_k$ is the probability of aspect $c_k$, whereas $p_{i|k}$ is the probability of $x_i$ given $c_k$ and $q_{j|k}$ is the probability of $y_j$ given $c_k$. Learning PMM is like learning SMM, so it is not necessary to duplicate the expansion of $Q(\Theta \mid \Theta^{(t)})$. The two steps of the GEM algorithm for PMM at some t-th iteration are shown in table 2.3.

Table 2.3. E-step and M-step of GEM algorithm for PMM
  E-step: the conditional probabilities of the latent aspects given $(x_i, y_j)$ are calculated based on the current parameter $\Theta^{(t)}$.
  M-step: the next parameter $\Theta^{(t+1)}$, which is the maximizer of $Q(\Theta \mid \Theta^{(t)})$, is computed from these conditional probabilities.

Finally, recall that the mean-based estimate of an unaccomplished value depends on only two objects (the $\mathcal{X}$-object and the $\mathcal{Y}$-object). As a result, an estimate is fixed if the two objects are fixed. In future work, I will try to find another method that takes advantage of more than two existent objects with a set of values. A combination of the dyadic mixture model and a regression model is a candidate method, but how to prove and explain it is still a fuzzy problem.
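As one plausible reading of the mean-based estimation described in the abstract, the sketch below predicts an unaccomplished value for a fixed pair $(x_i, y_j)$ as the expectation over aspects. The per-aspect mean values `mu` and all parameter numbers are my own illustrative assumptions, not quantities given in the original text.

```python
import numpy as np

# hypothetical learned SMM parameters (K = 2 aspects) and per-aspect mean values
alpha = np.array([0.6, 0.4])         # alpha_k
p_ik  = np.array([0.3, 0.5])         # p_{i|k} for the fixed x_i
q_jk  = np.array([0.2, 0.7])         # q_{j|k} for the fixed y_j
mu    = np.array([1.0, 4.0])         # assumed mean value carried by each aspect k

# posterior over aspects for the fixed pair (x_i, y_j)
post = alpha * p_ik * q_jk
post /= post.sum()

# estimate the missing value as the expectation over aspects;
# it is fixed once the two objects x_i and y_j are fixed
value_hat = float(post @ mu)
print(value_hat)
```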