Conditional mixture model for modeling attributed dyadic data

Dyadic data consists of co-occurrences of objects and is often modeled by a finite mixture model, which in turn is learned by the expectation maximization (EM) algorithm. Objects in traditional dyadic data are identified only by their names, which has the drawback that valuable knowledge implicit in the objects cannot be extracted. In this research, I propose so-called attributed dyadic data (ADD), in which each object has an informative attribute and each co-occurrence of two objects is associated with a value. ADD is flexible and covers most structures and forms of dyadic data. The conditional mixture model (CMM), a variant of the finite mixture model, is applied to learning ADD. Moreover, a significant feature of CMM is that every co-occurrence of two objects is conditioned on some conditional variable. As a result, CMM can predict or estimate co-occurrence values with a regression model, which extends the applications of both ADD and CMM.


Introduction to dyadic data and mixture model
Suppose the data has two parts, a hidden part X and an observed part Y, and only Y is known. The relationship between the random variable X and the random variable Y is specified by the joint probability density function (PDF) f(X, Y | Θ), where Θ is the parameter. Given a sample {Y1, Y2,…, YN} whose members Yi are mutually independent and identically distributed (iid), it is required to estimate Θ from this sample while X remains unknown. The expectation maximization (EM) algorithm solves this problem when only the Yi are observed. EM runs many iterations, and each iteration has two steps: an expectation step (E-step) and a maximization step (M-step). At some t-th iteration, given the current parameter Θ(t), the two steps are described as follows:

E-step: The expectation Q(Θ | Θ(t)) is determined from the current parameter Θ(t), according to equation 1.1 (Nguyen, Tutorial on EM algorithm, 2020, p. 50):

Q(Θ | Θ(t)) = Σi ∫ f(X | Yi, Θ(t)) log f(X, Yi | Θ) dX (1.1)

M-step: The next parameter Θ(t+1) is a maximizer of Q(Θ | Θ(t)) with respect to Θ. Note that Θ(t+1) becomes the current parameter at the next, (t+1)-th, iteration. The EM algorithm converges after some iterations, at which point Θ(t) = Θ(t+1) = Θ*. The estimate Θ* is the result of EM.
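As a concrete illustration of the E-step/M-step loop above, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture, where the hidden part X is the component label and the observed part Y is the sample. The function name and the quantile-based initialization are illustrative choices, not from the cited tutorial:

```python
import numpy as np

def em_gmm_1d(y, K=2, iters=50):
    """EM for a 1D Gaussian mixture: the hidden part X is the component
    label, the observed part Y is the sample, and
    Theta = (weights w, means mu, variances var)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.full(K, 1.0 / K)                        # mixing weights P(k)
    mu = np.quantile(y, (np.arange(K) + 0.5) / K)  # spread-out initial means
    var = np.full(K, np.var(y))                    # initial variances
    for _ in range(iters):
        # E-step: responsibilities P(k | y_i, Theta^(t))
        dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximize Q(Theta | Theta^(t)) in closed form
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * y[:, None]).sum(axis=0) / nk
        var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var
```

Each pass of the loop is one EM iteration: the E-step fills in the hidden labels probabilistically, and the M-step re-estimates Θ(t+1) from those soft labels.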
Given two finite sets X = {x1, x2,…, xN} and Y = {y1, y2,…, yM}, where the xi and yj are the names of x-objects and y-objects, respectively, an observational pair (xi, yj) ∈ X × Y is called a co-occurrence of xi and yj. Dyadic data, or co-occurrence data, consists of such co-occurrences, and a given co-occurrence (xi, yj) can occur more than once, so each occurrence is indexed by an index r and denoted by the triple (xi, yj, r). The symmetric mixture model (SMM) represents dyadic data with K aspects {1, 2,…, K} (Hofmann & Puzicha, Statistical Models for Co-occurrence Data, 1998):

P(xi, yj | Θ) = Σk P(k) P(xi | k) P(yj | k) (1.11)

The asymmetric mixture model (AMM) conditions the aspect on the x-object, P(xi, yj | Θ) = P(xi) Σk P(k | xi) P(yj | k), and the conditional probability P(k | xi, yj, Θ(t)) of AMM is calculated at the E-step as follows:

P(k | xi, yj, Θ(t)) = P(k | xi) P(yj | k) / Σl P(l | xi) P(yj | l) (1.12)

The product-space mixture model (PMM) is derived from SMM with a minor change: the aspect set {1, 2,…, K} is the Cartesian product of an x-aspect set {1, 2,…, Kx} and a y-aspect set {1, 2,…, Ky}. In other words, the aspect space is still symmetric but is structured (striped) along the two directions x and y:

k ∼ (kx, ky) (1.14)

The sign "∼" denotes correspondence. PMM is defined as follows (Hofmann & Puzicha, Statistical Models for Co-occurrence Data, 1998, p. 4):

P(xi, yj | Θ) = Σkx Σky P(kx, ky) P(xi | kx) P(yj | ky) (1.15)

By applying EM to a dyadic sample, at some t-th iteration with current parameter Θ(t), the parameters are re-estimated at the M-step as follows:

P(kx, ky) ∝ Σr P(kx, ky | x(r), y(r), Θ(t))
P(xi | kx) ∝ Σ{r: x(r)=xi} Σky P(kx, ky | x(r), y(r), Θ(t))
P(yj | ky) ∝ Σ{r: y(r)=yj} Σkx P(kx, ky | x(r), y(r), Θ(t)) (1.18)

Learning attributed dyadic data by conditional mixture model
In dyadic data, if each co-occurrence of xi and yj is associated with a value z (Hofmann, Puzicha, & Jordan, Learning from Dyadic Data, 1998, p. 1), the triple (xi, yj, r) becomes the quadruplet (xi, yj, z, r), which is called a valued co-occurrence of xi and yj. The value z is called the associative value or co-occurrence value. If z is the value of a variable Z, then Z is called the associative variable. The resulting sample is called valued dyadic data. Note that Z can be univariate or multivariate (a vector).
The notation Z(r) = z indicates that the associative value at the r-th co-occurrence is z. Thus, the quadruplet (xi, yj, z, r) can be denoted as (x(r), y(r), Z(r), r).
An extension of valued dyadic data is attributed dyadic data, in which every xi has an attribute Xi and every yj has an attribute Yj, with the constraint that all Xi are iid and all Yj are iid. Of course, these attributes are considered random variables. Let X and Y be the random variables representing every Xi and every Yj, respectively; they are called attribute variables and can be univariate or multivariate (vectors). The resulting sample is called attributed dyadic data (ADD).
As a convention, Xr and Yr denote the x-object attribute and y-object attribute at the r-th co-occurrence, respectively, whereas Zr denotes the associative variable at the r-th co-occurrence, so each record has the full form (x(r), y(r), Xr, Yr, Zr, r). The attributed dyadic data D is represented as follows:

D = {(Xr, Yr, Zr): 1 ≤ r ≤ |D|} (2.2)

Thus, each co-occurrence in attributed dyadic data is denoted by the triplet (Xr, Yr, Zr). The x-object and y-object of Xr and Yr are denoted x(r) and y(r), which are some xi and yj, respectively. It is now required to extend SMM, AMM, and PMM to represent ADD.
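The record structure of ADD can be made concrete with a small sketch. The class and the movie-rating values below are hypothetical examples, not data from the paper; they only illustrate the triplet (Xr, Yr, Zr) together with the object names x(r) and y(r):

```python
from dataclasses import dataclass

# One record per co-occurrence r in attributed dyadic data (ADD):
# the x-object and y-object names, their attribute vectors X_r and Y_r,
# and the associative value Z_r.
@dataclass
class AddRecord:
    x_name: str      # x-object name x(r)
    y_name: str      # y-object name y(r)
    x_attr: tuple    # attribute X_r (may be multivariate)
    y_attr: tuple    # attribute Y_r (may be multivariate)
    z: float         # associative value Z_r

# Hypothetical rating sample: users and films are objects, their profiles
# (e.g., age/gender, runtime/genre flag) are attributes, the rating is Z_r.
add = [
    AddRecord("user1", "film1", (25.0, 1.0), (120.0, 0.0), 4.0),
    AddRecord("user1", "film2", (25.0, 1.0), (95.0, 1.0), 2.0),
    AddRecord("user2", "film1", (40.0, 0.0), (120.0, 0.0), 5.0),
]
```

Each `AddRecord` is one element of D, so |D| is simply the length of the list.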

f(X, Y, k, Z | W, Θ) = f(k | W) f(X | k) f(Y, Z | k, W) = fk(W | αk) gk(X | βk) hk(Y | γk) vk(Z | W, θk)
Thus, the joint PDF of the x-object attribute X, the y-object attribute Y, the aspect k, and the associative variable Z given the conditional variable W is defined as follows:

f(X, Y, k, Z | W, Θ) = fk(W | αk) gk(X | βk) hk(Y | γk) vk(Z | W, θk)

Of course, αk, βk, γk, and θk are the partial parameters of fk(W|αk), gk(X|βk), hk(Y|γk), and vk(Z | W, θk), respectively, and each of these functions is a PDF. The whole parameter is Θ = (αk, βk, γk, θk)T. The PDF fk(W|αk) reflects the distribution of aspect k given the conditional variable W. The two PDFs gk(X|βk) and hk(Y|γk) specify the distributions of the x-object and y-object attributes with regard to aspect k. The PDF vk(Z | W, θk) is the conditional PDF of Z given W with regard to aspect k; as shown later, it is more useful to consider it a regression model.
According to Bayes' rule, the conditional probability of k given the x-object attribute X, the y-object attribute Y, the associative variable Z, and the conditional variable W is:

P(k | X, Y, Z, W, Θ) = fk(W | αk) gk(X | βk) hk(Y | γk) vk(Z | W, θk) / Σl fl(W | αl) gl(X | βl) hl(Y | γl) vl(Z | W, θl) (2.4)

The symmetric mixture model for attributed dyadic data is called the symmetric attributed mixture model (SAMM), which is defined from the joint PDF f(X, Y, k, Z | W, Θ) over the K aspects {1, 2,…, K} as follows:

f(X, Y, Z | W, Θ) = Σk fk(W | αk) gk(X | βk) hk(Y | γk) vk(Z | W, θk) (2.5)

When vk(Z | W, θk) is multivariate normal with θk = (ωk, Σk), it takes the form:

vk(Z | W, θk) = (2π)^(–a/2) |Σk|^(–1/2) exp(–(1/2)(Z – ωkW)T Σk^(–1) (Z – ωkW)) (2.10)

Note, a and b are the dimensions of Z and W, respectively. The mean and covariance matrix of Z given W are ωkW and Σk, respectively. The partial parameter ωk is called the regressive coefficient matrix, an a×b matrix having a rows and b columns. Note, the product ωkW is:

ωkW = (Σj ω1j Wj, Σj ω2j Wj,…, Σj ωaj Wj)T (2.11)

Equation 2.11 also specifies the multivariate regression function; of course, it implies that E(Z | W, k) = ωkW. By applying the EM algorithm to an attributed dyadic sample, at the t-th iteration of GEM, given the current parameter Θ(t) = (αk(t), βk(t), γk(t), θk(t))T, the conditional expectation Q(Θ|Θ(t)) of the SAMM specified by equation 2.5 is:

Q(Θ|Θ(t)) = Σk Σr P(k | Xr, Yr, Zr, Wr, Θ(t)) log(fk(Wr | αk) gk(Xr | βk) hk(Yr | γk) vk(Zr | Wr, θk))

Note, all Xr are iid and represented by X, all Yr are iid and represented by Y, all Zr are iid and represented by Z, and all Wr are iid and represented by W. The x-object and y-object of Xr and Yr are denoted x(r) and y(r), which are some xi and yj, respectively. In short, we obtain:

Q(Θ|Θ(t)) = Σk Σr P(k | Xr, Yr, Zr, Wr, Θ(t)) (log fk(Wr | αk) + log gk(Xr | βk) + log hk(Yr | γk) + log vk(Zr | Wr, θk)) (2.12)

Following equation 2.4, the conditional probability P(k | Xr, Yr, Zr, Wr, Θ(t)) is calculated at the E-step by evaluating equation 2.4 at the current parameter Θ(t). In a recommendation system, the attributes along with gk(X|βk) and hk(Y|γk) are responsible for content-based filtering, whereas the associative variable along with vk(Z | W, θk) and fk(W|αk) are responsible for collaborative filtering and context-aware filtering. The attributes X and Y represent information about users and items in rating data, with the note that users and items are known as objects. The associative variable Z represents the rating values in rating data.
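One EM iteration of this model can be sketched in code. The following is a minimal one-dimensional sketch, assuming scalar X, Y, Z, W, Gaussian component PDFs, and a single regression coefficient omega_k per aspect (the scalar analogue of the regressive coefficient matrix ωk); the function names and the quantile-based initialization are illustrative choices, not from the paper:

```python
import numpy as np

def norm_pdf(v, mean, var):
    """Univariate normal density, vectorized over points and aspects."""
    return np.exp(-(v - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def cmm_em(X, Y, Z, W, K=2, iters=30):
    """1D sketch of EM for the conditional mixture model: aspect k has
    Gaussian f_k(W), g_k(X), h_k(Y) and a linear-Gaussian regression
    v_k(Z | W) with mean omega_k * W and residual variance s_k."""
    X, Y, Z, W = (np.asarray(v, dtype=float) for v in (X, Y, Z, W))
    q = (np.arange(K) + 0.5) / K
    muW, muX, muY = np.quantile(W, q), np.quantile(X, q), np.quantile(Y, q)
    varW, varX, varY = (np.full(K, np.var(v)) for v in (W, X, Y))
    omega = np.zeros(K)          # regression coefficients per aspect
    s = np.full(K, np.var(Z))    # residual variances of v_k(Z | W)
    for _ in range(iters):
        # E-step: posterior P(k | X_r, Y_r, Z_r, W_r, Theta^(t)) by Bayes' rule
        lik = (norm_pdf(W[:, None], muW, varW)
               * norm_pdf(X[:, None], muX, varX)
               * norm_pdf(Y[:, None], muY, varY)
               * norm_pdf(Z[:, None], omega * W[:, None], s))
        r = lik / lik.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)
        # M-step: weighted Gaussian updates and weighted least squares for omega_k
        muW = (r * W[:, None]).sum(axis=0) / nk
        varW = (r * (W[:, None] - muW) ** 2).sum(axis=0) / nk
        muX = (r * X[:, None]).sum(axis=0) / nk
        varX = (r * (X[:, None] - muX) ** 2).sum(axis=0) / nk
        muY = (r * Y[:, None]).sum(axis=0) / nk
        varY = (r * (Y[:, None] - muY) ** 2).sum(axis=0) / nk
        omega = ((r * W[:, None] * Z[:, None]).sum(axis=0)
                 / (r * W[:, None] ** 2).sum(axis=0))
        s = (r * (Z[:, None] - omega * W[:, None]) ** 2).sum(axis=0) / nk
    return omega, s, r
```

After fitting, predicting an associative value at a new (X, Y, W) amounts to mixing the per-aspect regression means omega_k * W with the posterior aspect weights, which is the regression-based estimation described above.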
The conditional variable W represents contexts. Equation 2.30 is the ultimate formula of the unified estimation model. I hope that researchers will take an interest in the proposed model because, to my knowledge, it has not yet been realized at the time of writing this paper.