Tensor Properties of Joint Probability Densities on Riemannian Parametric Manifold

We show the tensor properties of joint probability densities on a Riemannian parametric manifold. We first construct a binary data matrix that records parameter measurements for a large number of particles confined in a closed system, in order to retrieve the joint probability densities of the related parameters. By introducing a new generalized inner product as a multilinear operation on the dual basis of the parametric space, we extract the set of joint probabilities and prove that they satisfy the transformation law of contravariant tensors in a general Riemannian parametric space. We show that these contravariant tensors reduce to the classical definition by ordinary partial derivatives in a Euclidean parametric space. Finally, we prove a theorem stating that the symmetrized iterated contravariant derivative of the cumulative probability function on a Riemannian manifold gives the set of joint probabilities on that manifold. We give some examples of compatibility with physical tensors.


Introduction
The current classical theory of probability distributions is based on a presumed Euclidean space of parameters and random variables. In this setting the joint probability densities are derived from successive partial derivatives of the corresponding cumulative distribution function [1,2]. Non-Euclidean manifolds endowed with Riemannian metrics have been introduced into probability theory by many authors for phase and parameter spaces [3,9], [4]. Moreover, using differential geometry, several variants of information theory have been devised on non-Euclidean (Riemannian) metric spaces by Fisher, Rao, Amari and others [3,5]. In these approaches the probability distributions of various models appear as points on Riemannian manifolds. Against this background, differential-geometric techniques can be applied to analyze manifolds of probability distributions. The most widely used metric tensor on these spaces, the so-called Fisher information metric, was first introduced by Fisher and Rao [6].
In this short article we introduce a new model based on binary data matrices that record the parameters of a system of particles. Each particle has its own properties and occupies just one infinitesimal interval on each parameter coordinate. In Section 2 we develop the concept of particle-oriented coordinates, which span a flat Euclidean space in which we embed Riemannian parametric manifolds. Iterating the measurements over all particles and parameters yields a binary matrix containing all the information of the particle system. In Section 3 we introduce a manifold whose basis vectors are the rows of the data matrix; we then extract the joint probabilities of events from the dual space basis vectors and a new definition of inner product, and prove their tensor properties. In Section 4 we prove that the general joint probabilities on Riemannian parametric manifolds are the iterated contravariant derivatives of a scalar, namely the cumulative distribution function, and proceed to higher-order joint probabilities through symmetrized covariant derivatives. In Section 5 we give some examples of manifolds whose metrics are generated by the covariant derivative of a scalar.

Definition 1.1
The parametric space $\mathcal{M}$ is specified by coordinates $x^\mu$, with $\mu$ running from 1 to the dimension $d$ of the parametric space: $\mu = 1, 2, \ldots, d$. So the number of independent parameters involved (degrees of freedom) is $d$. In the present article the parametric space $\mathcal{M}$ is generally presumed to be a Riemannian manifold with local coordinates $x^\mu$ and basis vectors $e_\mu = \partial/\partial x^\mu$ at any point $p \in \mathcal{M}$. If we divide the coordinate $x^\mu$ into small intervals $\Delta x^\mu(i)$, then the integer $i$ refers to the $i$-th interval of this parameter. This means that $i$ replaces the coordinate value of the point and ranges between 0 and an integer $N_\mu$: $0 \le i \le N_\mu$. The whole parametric manifold includes all coordinates $x^\mu$ and the associated basis vectors. The overall number of intervals reads: $N = \sum_\mu N_\mu$ (1.3)

Introducing particle-oriented coordinates
Assume that a system consisting of a large number of particles, confined to an interval of space-time, undergoes measurements. Considering such a system of particles gives us the advantage of choosing an arbitrarily large number of particles; alternatively, we could replace the particles by any kind of system defined by its own arbitrary parametric space.
Suppose a set of related parameters labelled by $\mu$ is to be measured in a time interval $\Delta t$. We label the particles by integers up to $n$, and divide the possible range of each parameter into intervals small enough to satisfy the accuracy of the experiment. These intervals, defined as in Definition 1.1, are denoted by $\Delta x^\mu(i)$, where $i$ stands for the ordered location number of an interval on the parameter coordinate $x^\mu$: $0 \le i \le N_\mu$. So the parameter value of each particle falls in just one of these intervals. For each interval $\Delta x^\mu(i)$ there are some particles whose parameter with label $\mu$ lies in this interval. Let us record these outcomes in an array whose entries are 0 or 1, in such a way that for particles with parameter value in the interval $\Delta x^\mu(i)$ the corresponding entry is 1, and otherwise 0. For instance, for the first particle, if its value of the parameter $x^\mu$ falls in the range of $\Delta x^\mu(i)$ we record 1, and otherwise 0. If this process is iterated over all particles, we obtain an array of $n$ entries for this interval, which can be arranged as a vector $e^{*\mu}(i)$. This vector is a binary array that carries the information of this interval for the system of particles, e.g. $e^{*\mu}(i) = (1,0,0,1,1,0,1,1,\ldots)$ (2.2). Each of these 0s and 1s is connected to a specific particle in the system, and the total number of entries equals $n$. One may attribute to each particle an independent coordinate and consider an $n$-dimensional manifold spanned by these coordinates, in such a way that the projection of $e^{*\mu}(i)$ on these coordinates yields the array in equation (2.2).
Here each particle specifies an independent coordinate with two possible values, 0 and 1. These coordinates are orthogonal because, in the initial setting, the parameter values of every particle (such as position and momentum) are considered independent of those of all other particles. We call this set of coordinates the particle-oriented coordinates; as a coordinate chart it is homeomorphic to a subset of the flat Euclidean space $\mathbb{R}^n$, and it spans a manifold. Moreover, the parametric space $\mathcal{M}$ of the system includes all coordinates $x^\mu$ and their dual basis vectors $e^{*\mu}$, where the latter span a dual tangent (cotangent) vector space $T_p^*\mathcal{M}$ defined at a point $p$ of the parametric space $\mathcal{M}$, i.e. $\mathcal{M}^* = T_p^*\mathcal{M} \subset \mathbb{R}^n$. The set $\{e^{*\mu}(i) \mid 0 \le i \le N_\mu\}$ for a fixed $\mu$, as dual basis vectors for the parameter $x^\mu$, spans a cotangent space $T_p^{*\mu}\mathcal{M}$ that is a subspace of $\mathcal{M}^*$: $T_p^*\mathcal{M} = \bigcup_\mu T_p^{*\mu}\mathcal{M}$ (2.4). The manifold $\mathcal{M}$ is not necessarily flat and, as shown in the next sections, will be treated as a Riemannian manifold with non-zero curvature.

Definition 2.2
For a fixed $\mu$, we define the matrix $D^\mu_{N_\mu \times n}$ with $e^{*\mu}(i)$ as its $i$-th row. Each column of this matrix contains exactly one entry 1, because, as described before, each column belongs to a particle, which occupies just one value (interval) $\Delta x^\mu(i)$.
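The construction in Definition 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `binary_matrix`, the sample measurements and the interval edges are hypothetical, and half-open intervals are assumed so that every value falls in exactly one interval.

```python
def binary_matrix(values, edges):
    """Return rows (one per interval) of 0/1 entries (one per particle).

    Entry [i][p] is 1 when the value of particle p lies in the half-open
    interval [edges[i], edges[i + 1]); this convention is an assumption.
    """
    n_intervals = len(edges) - 1
    rows = [[0] * len(values) for _ in range(n_intervals)]
    for particle, v in enumerate(values):
        for i in range(n_intervals):
            if edges[i] <= v < edges[i + 1]:
                rows[i][particle] = 1
                break  # a particle occupies just one interval
    return rows


# Hypothetical measurements of one parameter for six particles
values = [0.1, 0.7, 0.35, 0.9, 0.15, 0.55]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]  # four intervals

D = binary_matrix(values, edges)
column_sums = [sum(row[p] for row in D) for p in range(len(values))]
```

Every entry of `column_sums` is 1 here, which is the column property stated in Definition 2.2.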

Lemma 2.2
The basis vectors $e^{*\mu}(i)$ for a fixed $\mu$ are mutually orthogonal. Proof: any particle takes just one value, and consequently one interval, on the $x^\mu$ coordinate. So, as shown in Definition 2.2, each column of $D^\mu_{N_\mu \times n}$ carries just one entry 1 while the other entries are 0. If the $l$-th component of $e^{*\mu}(i)$ is denoted by $[e^{*\mu}(i)]_l$, then the inner product $\langle e^{*\mu}(i), e^{*\mu}(j) \rangle$ reads: $\langle e^{*\mu}(i), e^{*\mu}(j) \rangle = \sum_{l=1}^{n} [e^{*\mu}(i)]_l \, [e^{*\mu}(j)]_l$. Since $[e^{*\mu}(i)]_l$ and $[e^{*\mu}(j)]_l$ never take the value 1 simultaneously for $i \ne j$, the inner product $\langle e^{*\mu}(i), e^{*\mu}(j) \rangle$ vanishes for $i \ne j$, and the orthogonality of these basis vectors is proved.
In what follows the binary vectors $e^{*\mu}(i)$ are identified with the dual basis of the parametric space; this is the notation (2.6). The notation (2.6) therefore exhibits the basis of a cotangent bundle (space), which is the dual of the tangent bundle of the parametric space. As described before, these basis vectors specify the interval $i$ on a specific parameter $x^\mu$. The relative abundance of particles within this interval is proportional to the ratio of the total number of 1s in $e^{*\mu}(i)$, denoted by $n^\mu(i)$, to the total number of particles $n$: $P^\mu(i) = n^\mu(i)/n$. It is straightforward to conclude that $n^\mu(i)$ equals the inner product $\langle e^{*\mu}(i), e^{*\mu}(i) \rangle$. In the infinitesimal limit of $\Delta x^\mu(i)$ this dual basis vector approaches $dx^\mu(i)$, which belongs to the basis of the cotangent (dual) space $T_p^*\mathcal{M}$ while carrying the information about that interval in the system. Under a coordinate transformation $x^\mu \to \bar{x}^\mu$ in the parametric space, $e^{*\mu}(i)$ transforms, with summation over $\nu$, as: $\bar{e}^{*\mu}(\bar{i}) = \frac{\partial \bar{x}^\mu}{\partial x^\nu} e^{*\nu}(i)$, and the same rule holds in the notation of the identity (2.6). Here $\bar{i}$ stands for the transformed location of $i$ in the new coordinate $\bar{x}^\mu$. Since the $\Delta x^\mu(i)$ are considered infinitesimal, the particles confined to this interval always constitute a binary array, even in the transformed basis vectors $\bar{e}^{*\mu}$, and therefore these basis vectors contain a binary array similar to $e^{*\mu}(i)$ in all coordinates.
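The orthogonality of Lemma 2.2 and the occupation counts can be checked numerically on a small example. The sketch below is only illustrative: the binary rows are hypothetical data for one parameter with four intervals and six particles, and `inner` is an assumed helper name for the ordinary inner product.

```python
def inner(u, v):
    """Ordinary inner product: sum over particles of componentwise products."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical binary rows for one parameter: 4 intervals, 6 particles
rows = [
    [1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
n = len(rows[0])  # total number of particles

# Gram matrix of the rows: off-diagonal entries vanish (orthogonality),
# diagonal entries count the particles in each interval.
gram = [[inner(u, v) for v in rows] for u in rows]

# Relative abundance of particles in each interval
abundance = [gram[i][i] / n for i in range(len(rows))]
```

The diagonal of `gram` gives the number of particles per interval, and the relative abundances sum to one because every particle occupies exactly one interval.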

Binary Data matrix:
Combining all the row vectors $e^{*\mu}(i)$ for all parameters gives a matrix $D_{N \times n}$ with $N$ rows and $n$ columns, where $N$ is defined in (1.3) and $n$ is the total number of particles. This matrix is also the concatenation of $d$ matrices, one for each parameter. For example, if for a parameter with coordinate $x^\mu$ there are $N_\mu$ intervals, then a matrix $D^\mu_{N_\mu \times n}$ with $N_\mu$ rows and $n$ columns can be identified as a part of $D_{N \times n}$, whose rows $e^{*\mu}(i)$ (with fixed $\mu$) constitute the set of dual basis vectors. Each column of $D^\mu_{N_\mu \times n}$ carries just one entry 1, because each column corresponds to one particle whose parameter value falls in just one of the intervals $\Delta x^\mu$, namely $\Delta x^\mu(i)$. Combining all the $D^\mu_{N_\mu \times n}$ results in the data matrix $D_{N \times n}$. The set of $e^{*\mu}(i)$ with fixed $\mu$ spans a cotangent space at a point $p \in \mathcal{M}^*$ (or $\mathcal{M}$).
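As a small illustration of this concatenation, the sketch below stacks the binary matrices of two parameters measured on the same six particles. The matrices `D_x` and `D_y` are hypothetical data, and plain Python lists are used in place of a matrix library.

```python
# Hypothetical binary matrices for two parameters on the same six particles
D_x = [
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]  # 2 intervals
D_y = [
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 0],
]  # 3 intervals

# Row-wise concatenation gives the full data matrix
D = D_x + D_y
n_rows, n_cols = len(D), len(D[0])

# Each particle (column) contributes one 1 per parameter
ones_per_column = [sum(row[p] for row in D) for p in range(n_cols)]
```

The number of rows is the sum of the interval counts, as in (1.3), and each column holds as many 1s as there are parameters, here two.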

Definition 3.1
In this model an event $E^\mu(i)$ refers to the value of $x^\mu$ lying in the interval $i$ of that parameter for a particle.

Definition 3.2
The joint probability density of two events $E^\mu(i)$ and $E^\nu(j)$ is proportional to the number of particles that simultaneously take the parameter values $x^\mu(i)$ and $x^\nu(j)$, where $i$ and $j$ indicate the values taken by the $x^\mu$ and $x^\nu$ parameters (i.e. coordinate values). If this number is denoted by $n^{\mu\nu}(i,j)$, the exact form of the joint probability density reads: $P^{\mu\nu}(i,j) = n^{\mu\nu}(i,j)/n$. Note that for $\mu = \nu$ and $i = j$ the quantities $n^{\mu\nu}$ and $P^{\mu\nu}$ reduce to $n^\mu(i)$ and $P^\mu(i)$ respectively.

Definition 3.3
We define the generalized inner product $\odot$ for vectors $A, B, C, \ldots$ in a vector space with orthogonal local coordinates as the multilinear map: $A \odot B \odot C \odot \cdots = \sum_l A_l B_l C_l \cdots$ (3.3). This is simply a generalization of the usual inner product, i.e. a functional on the dual vector space, which is in turn the double dual of the coordinate space.

Lemma 3.4
The generalized scalar operation $\odot$ is linear.
By definition (3.3) it is straightforward to see that, for scalars $a$ and $b$: $(aA + bA') \odot B \odot C \odot \cdots = a\,(A \odot B \odot C \odot \cdots) + b\,(A' \odot B \odot C \odot \cdots)$, and similarly in every other argument. Note that this map lacks the associative property.

Lemma 3.5
Regarding Definition 3.3 of the generalized inner product, for any subset of the cotangent space basis vectors $e^{*\mu}(i), e^{*\nu}(j), e^{*\lambda}(k), \ldots$, the joint probability of the events $E^\mu(i)$, $E^\nu(j)$, $E^\lambda(k), \ldots$ is given by the normalized generalized inner product, where: $e^{*\mu}(i) \odot e^{*\nu}(j) \odot e^{*\lambda}(k) \odot \cdots = \sum_{l=1}^{n} [e^{*\mu}(i)]_l \, [e^{*\nu}(j)]_l \, [e^{*\lambda}(k)]_l \cdots$ (3.5). The summation is carried out over all particles. Since the components of $e^{*}$ take the two values 0 or 1, the non-vanishing terms are those whose components simultaneously take the value 1, and therefore the sum on the right side of equation (3.5) reduces to the number of particles whose parameter values simultaneously lie in the intervals $i, j, k, \ldots$ on the particle coordinates. Normalizing (3.5) by $n$ yields the ratio of this number to the total number of particles and consequently gives the joint probability of the simultaneous occurrence of the events $E^\mu(i)$, $E^\nu(j)$, $E^\lambda(k), \ldots$. For instance, in our model the inner product $e^{*\mu}(i) \odot e^{*\nu}(j) = \langle e^{*\mu}(i), e^{*\nu}(j) \rangle$ stands for the number of particles that have their parameters at the locations $i$ and $j$, and after normalization by $n$ it gives the joint probability of $i$ and $j$ in the related system of particles.
For the general case, let $n^{\mu\nu\lambda\ldots}(i,j,k,\ldots)$ represent the number of particles with common parameters at the locations $i, j, k, \ldots$. Then, with the normalization factor $1/n$ ($n$ is the total number of particles), it gives the joint probability of occurrence of the intervals $i, j, k, \ldots$ in the above-mentioned system: $P^{\mu\nu\lambda\ldots}(i,j,k,\ldots) = \frac{n^{\mu\nu\lambda\ldots}(i,j,k,\ldots)}{n}$ (3.6). This equation reveals a specific configuration and state of the system, for which there is a specific set of joint probabilities $P^{\mu\nu\lambda\ldots}$ that represents its exact state. Any configuration (state) of this system is represented by such a specific set, and vice versa. Evidently the order of the indices $\mu, \nu, \lambda, \ldots$ does not affect the related joint probability.
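The counting argument above can be sketched directly in Python. This is a minimal sketch, assuming hypothetical binary rows for two parameters (two intervals each, six particles); the function name `generalized_inner` is an illustrative choice and not notation from the text.

```python
from math import prod


def generalized_inner(*vectors):
    """Generalized inner product: sum over particles of the componentwise
    product of any number of vectors."""
    return sum(prod(components) for components in zip(*vectors))


# Hypothetical binary rows: e_x[i][l] is 1 when particle l has parameter x
# in interval i, and likewise for e_y.
e_x = [[1, 0, 0, 1, 1, 0], [0, 1, 1, 0, 0, 1]]
e_y = [[1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1]]
n = 6  # total number of particles

# Joint probabilities of the interval pairs (i, j), normalized by n
P = [[generalized_inner(e_x[i], e_y[j]) / n for j in range(2)]
     for i in range(2)]

# The value does not depend on the order of the arguments
same_either_order = (
    generalized_inner(e_x[0], e_y[1]) == generalized_inner(e_y[1], e_x[0])
)
```

With these rows the table `P` sums to one, and the product with a repeated argument, such as `generalized_inner(e_x[0], e_x[0])`, returns the occupation count of that interval.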
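For a continuous density, $n^{\mu\nu\ldots}$ is proportional to $P^{\mu\nu\ldots}$ and to the volume element $dV$, and (3.6) should be modified as: $P^{\mu\nu\ldots}(i,j,\ldots)\,dV = \frac{n^{\mu\nu\ldots}(i,j,\ldots)}{n}$ (3.7). As described in equation (3.3), we can attribute a quantity $P^{\mu\nu\lambda\ldots}$ to any multilinear form of the basis vectors $e^{*\mu}(i)$ defined in (2.5), as follows: $P^{\mu\nu\lambda\ldots} = e^{*\mu} \odot e^{*\nu} \odot e^{*\lambda} \odot \cdots$ (3.8). This holds true for any transformed basis $\bar{e}^{*\mu}$: $\bar{P}^{\mu\nu\lambda\ldots} = \bar{e}^{*\mu} \odot \bar{e}^{*\nu} \odot \bar{e}^{*\lambda} \odot \cdots$ (3.9), where $e^{*\mu}, e^{*\nu}, \ldots$ are taken at the ordered locations $i, j, \ldots$ and $\bar{e}^{*\mu}, \bar{e}^{*\nu}, \ldots$ at the transformed ordered locations $\bar{i}, \bar{j}, \ldots$. Joint probabilities are, generally speaking, invariant under permutation of the indices $\mu, \nu, \ldots$, because a joint probability axiomatically does not depend on the order of the simultaneous events and their indices.

Lemma 3.6
$P^{\mu\nu\lambda\ldots}$ defined in (3.8) is a contravariant tensor.
Proof: under a coordinate transformation $x^\mu \to \bar{x}^\mu$ we have: $\bar{e}^{*\mu} = \frac{\partial \bar{x}^\mu}{\partial x^\alpha} e^{*\alpha}$. Substitution into (3.9) gives: $\bar{P}^{\mu\nu\ldots} = \left(\frac{\partial \bar{x}^\mu}{\partial x^\alpha} e^{*\alpha}\right) \odot \left(\frac{\partial \bar{x}^\nu}{\partial x^\beta} e^{*\beta}\right) \odot \cdots$. By the linearity of $\odot$ and the definition $P^{\alpha\beta\gamma\ldots} = e^{*\alpha} \odot e^{*\beta} \odot e^{*\gamma} \odot \cdots$ we obtain: $\bar{P}^{\mu\nu\ldots} = \frac{\partial \bar{x}^\mu}{\partial x^\alpha} \frac{\partial \bar{x}^\nu}{\partial x^\beta} \cdots P^{\alpha\beta\ldots}$. This is the transformation law of a contravariant tensor, where we use the Einstein summation convention. In fact $P^{\mu\nu\ldots}(i,j,k,\ldots)$ comprises a large number of distributions, one for every choice of the locations $i, j, \ldots$ of the parameters.

Corollary 3.7
While the indices $i, j, \ldots$ refer to the location numbers of intervals belonging to the ordered parameters $x^\mu(i)$, $x^\nu(j)$, $x^\lambda(k)$, the rank of the tensor $P^{\mu\nu\lambda\ldots}$ equals the number of these indices. Then, for any fixed indices $\mu\nu\ldots$, relation (3.13) holds.
Proof: we have shown that the tensors $P^{\mu\nu\ldots}(i,j,k,\ldots)$ are the joint probabilities of the events $E^\mu(i)$, $E^\nu(j)$, $E^\lambda(k), \ldots$. For these joint probabilities equation (3.13) is valid as a general property [2,10].

Definition 3.8
Following the definition of the cumulative distribution function or joint distribution function (CDF) [2,10], we may define a function $F$ at any point $(x^1, x^2, \ldots) \in \mathcal{M}$ as follows: $F(x^1, x^2, \ldots) = \frac{n(x^1, x^2, \ldots)}{n}$, where $n(x^1, x^2, \ldots)$ stands for the total number of particles whose parameters take values below $x^1, x^2, \ldots$. In the limit $n \to \infty$ the function $F$ behaves as a smooth and differentiable function on the parametric space $\mathcal{M}$.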
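The counting definition of the cumulative distribution function in Definition 3.8 can be illustrated as follows. This is a hedged sketch: the function name `empirical_cdf` and the sample data are hypothetical, and "below" is taken to mean less than or equal to, which is an assumption about the convention.

```python
def empirical_cdf(samples, point):
    """Fraction of particles whose every parameter value is less than or
    equal to the corresponding coordinate of `point` (assumed convention)."""
    below = sum(
        all(value <= bound for value, bound in zip(sample, point))
        for sample in samples
    )
    return below / len(samples)


# Hypothetical (x, y) parameter values of four particles
samples = [(0.1, 0.2), (0.4, 0.9), (0.6, 0.3), (0.8, 0.7)]

F_mid = empirical_cdf(samples, (0.5, 0.5))    # only the first particle counts
F_wide = empirical_cdf(samples, (0.7, 1.0))   # first three particles count
F_all = empirical_cdf(samples, (1.0, 1.0))    # every particle counts
```

For a finite number of particles this function is a step function; the smooth behaviour mentioned in the definition only emerges in the limit of a very large number of particles.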

Joint probability as contravariant derivatives
It is usual in probability texts to express the multivariate joint probability density in a Euclidean parameter space with Cartesian coordinates as a partial derivative of the cumulative distribution function [1,2,10]: $P(x^1, x^2, \ldots, x^k) = \frac{\partial^k F}{\partial x^1 \partial x^2 \cdots \partial x^k}$ (4.1). Equation (4.1) works for Cartesian coordinates of the parameters in a Euclidean manifold, where $F$ is the cumulative probability or joint distribution function [1,2,10]. In such a space the reciprocal (dual) coordinates coincide with the original coordinates. Nevertheless, equation (4.1) is not valid for the coordinates of a Riemannian space. It is well known that this partial derivative is in general not a tensor, except for the first-order derivative or in a Euclidean space with Cartesian coordinates. Naturally, the first candidate to replace equation (4.1) in a Riemannian space is the covariant derivative. However, because the tensors $P^{\mu\nu\ldots}$ of Lemma 3.6 are contravariant, we propose contracting the covariant derivatives with the metric tensor to obtain contravariant derivatives for the general form of the joint probability in a parametric space with the properties of a Riemannian manifold. These contravariant derivatives of any order remain tensors: $P^{\mu\nu\lambda\ldots} = \nabla^\mu \nabla^\nu \nabla^\lambda \cdots F$ (4.2). Here we use the contraction of the covariant derivative $\nabla_\nu$ with the metric tensor, i.e. $\nabla^\mu = g^{\mu\nu} \nabla_\nu$, and call it the contravariant derivative $\nabla^\mu$. Recall that, although a torsion-free connection is symmetric in its lower indices, successive covariant derivatives do not commute in general, except for the double covariant derivative of a scalar. Consequently the contravariant derivative has the same commutativity property.

Lemma 4.2
The commutative property of the contravariant derivative holds for the second derivative of a scalar: $\nabla^\mu \nabla^\nu F = \nabla^\nu \nabla^\mu F$.
Proof: after contraction with the metric tensor we obtain, in terms of the covariant derivative: $\nabla^\mu \nabla^\nu F = g^{\mu\alpha} \nabla_\alpha \left( g^{\nu\beta} \nabla_\beta F \right)$. Because the covariant derivative of the metric tensor vanishes, the metric tensor acts as a constant with respect to the covariant derivative: $\nabla^\mu \nabla^\nu F = g^{\mu\alpha} g^{\nu\beta} \nabla_\alpha \nabla_\beta F$. Using the commutative property of the covariant derivative on a scalar we have: $g^{\mu\alpha} g^{\nu\beta} \nabla_\alpha \nabla_\beta F = g^{\nu\beta} g^{\mu\alpha} \nabla_\beta \nabla_\alpha F = \nabla^\nu \nabla^\mu F$.
Naturally, the order of events is supposed not to matter for a joint probability. Although the commutativity above guarantees this for measurements of two events, for joint probabilities of more than two measurements (events) the contravariant derivatives do not commute. On the other hand, by Lemma 3.6 the joint probability $P^{\mu\nu\ldots}$ as a tensor is fully symmetric in its contravariant indices. Consequently the derivatives should be symmetrized to fulfil this property of joint probabilities. The symmetrization of multiple contravariant derivatives (symmetrized covariant derivatives) can be accomplished by the usual symmetric form: $\nabla^{(\mu} \nabla^{\nu} \cdots \nabla^{\lambda)} = \frac{1}{k!} \sum_{\sigma} \nabla^{\sigma(\mu)} \nabla^{\sigma(\nu)} \cdots \nabla^{\sigma(\lambda)}$, which stands for the sum over all permutations $\sigma$ of the $k$ indices.
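The symmetrization over all index permutations can be illustrated on a small array of components. The sketch below is only an illustration under stated assumptions: the helper name `symmetrize` is hypothetical, and the non-symmetric components `T` are made-up numbers standing in for the components of an iterated derivative, not a computation of any actual derivative.

```python
from itertools import permutations, product
from math import factorial


def symmetrize(T, dim, rank):
    """Average the components T[idx] over all permutations of the indices."""
    S = {}
    for idx in product(range(dim), repeat=rank):
        total = sum(
            T[tuple(idx[p] for p in perm)]
            for perm in permutations(range(rank))
        )
        S[idx] = total / factorial(rank)
    return S


dim, rank = 2, 3
# Hypothetical non-symmetric rank-3 components (stand-ins for iterated
# derivatives): the value depends on the position of each index.
T = {
    idx: float(idx[0] + 10 * idx[1] + 100 * idx[2])
    for idx in product(range(dim), repeat=rank)
}

S = symmetrize(T, dim, rank)
```

After symmetrization the components with permuted index tuples, such as `S[(0, 0, 1)]`, `S[(0, 1, 0)]` and `S[(1, 0, 0)]`, coincide, which is the full symmetry required of a joint probability.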

Corollary 4.2
The set of all paired joint probabilities constitutes the metric tensor of the dual space $\mathcal{M}^*$.
Proof: by equation (3.3), for paired joint probabilities we have: $P^{\mu\nu} = e^{*\mu} \odot e^{*\nu} = \langle e^{*\mu}, e^{*\nu} \rangle$. The right side is the ordinary inner product of the dual space basis vectors. By the definition of the metric tensor, these inner products are proportional to the metric tensor of the related manifold: $P^{\mu\nu} \sim g^{*\mu\nu}$. Knowing the metric of the dual space $\mathcal{M}^*$ reveals all properties of the manifold (including curvatures).
Taking into account the identity $P^{\mu\nu} \sim g^{*\mu\nu}$, we conclude that the $P^{\mu\nu}$ are sufficient to estimate most of the information about the other joint probabilities and the distribution of the related system. We use the notation $g$ and $g^*$ for the determinants of the metric tensors $g_{\mu\nu}$ and $g^{*\mu\nu}$ respectively.

Theorem 4.3
$P^{\mu\nu\ldots}$ in the coordinates of a general Riemannian parametric manifold can be represented as an iterated symmetrized contravariant derivative of a scalar, namely the cumulative probability distribution $F$ as a continuous variable, which replaces the function in (4.1). We have shown that the joint probability $P^{\mu\nu\ldots}$ is a tensor that in Euclidean space with Cartesian coordinates reduces to the partial derivative of equation (4.1); therefore, for a general Riemannian manifold the best choice is a symmetrized contravariant derivative of the cumulative probability instead of partial derivatives, as follows: $P^{\mu\nu\ldots\lambda} = \nabla^{(\mu} \nabla^{\nu} \cdots \nabla^{\lambda)} F$.
Proof: by the covariant Taylor expansion of a scalar $\varphi$ on a Riemannian manifold we have [11,12]: $\varphi(x) = \varphi(x_0) + \nabla_\mu \varphi \, \Delta x^\mu + \frac{1}{2!} \nabla_{(\mu} \nabla_{\nu)} \varphi \, \Delta x^\mu \Delta x^\nu + \cdots$, where symmetrization is imposed on the iterated covariant derivatives. With the definition $dx^\mu = \Delta x^\mu$ on an infinitesimal interval $x^\mu - x_0^\mu = \Delta x^\mu \to 0$, the increments in the above equation become differentials. Replacing $\varphi$ with the cumulative distribution function $F$ and the product $(dx^\mu dx^\nu \cdots)$ with the volume element, which carries the factor $\sqrt{g}$ where $g$ is the determinant of the metric tensor, yields equation (4.11). The last term in this sum stands for the probability of being in the volume element. Therefore, with respect to (3.6), and after replacing the covariant derivatives in (4.11) by contravariant derivatives through contraction with the metric tensor, $\nabla_\mu = g_{\mu\nu} \nabla^\nu$ (4.12), equation (4.11) is converted to its contravariant form. Here we used the well-known property of the covariant derivative, $\nabla_\lambda g_{\mu\nu} = 0$, which makes the metric tensor behave under covariant differentiation like a constant under ordinary differentiation. So $g_{\mu\nu}$ can be factored out of a covariant derivative term.

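Transforming the coordinate differentials to contravariant form and performing the Einstein summation over the repeated indices, the products of the metric tensor with its inverse reduce to unit tensors ($g^{\mu\nu}$ is the inverse of $g_{\mu\nu}$). The last term in the resulting summation, as shown in Lemma 4.1, equals the joint probability, i.e. the tensor $P^{\mu\nu\ldots}$.
Proof of the reduction to the Euclidean case: from the definition of the contravariant derivative, $\nabla^\mu = g^{\mu\nu} \nabla_\nu$, and writing the covariant derivative in terms of the connection $\Gamma$ (Christoffel symbols), we obtain equation (4.19); at second order, for example, $\nabla^\mu \nabla^\nu F = g^{\mu\alpha} g^{\nu\beta} \left( \partial_\alpha \partial_\beta F - \Gamma^\gamma_{\alpha\beta} \partial_\gamma F \right)$. In a flat space with Cartesian coordinates all components of $\Gamma$ vanish and the metric reduces to the unit tensor, so (4.19) reduces to the iterated partial derivatives of equation (4.1).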
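The flat-space limit can be checked numerically on a simple distribution. The sketch below is an illustration under stated assumptions: it uses two independent unit-rate exponential variables (a choice made here, not taken from the text), for which the mixed partial derivative of the cumulative distribution function should reproduce the joint density in Cartesian coordinates.

```python
from math import exp


def cdf(x, y):
    """Joint CDF of two independent unit-rate exponential variables
    (an illustrative choice of distribution)."""
    return (1.0 - exp(-x)) * (1.0 - exp(-y))


def mixed_partial(f, x, y, h=1e-4):
    """Central finite-difference estimate of the mixed second derivative
    of f with respect to x and y."""
    return (
        f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)
    ) / (4.0 * h * h)


x0, y0 = 0.7, 1.3
density_estimate = mixed_partial(cdf, x0, y0)
density_exact = exp(-x0 - y0)  # known joint density of this distribution
```

The two values agree to within the finite-difference error, which is the Cartesian statement of equation (4.1) for this particular distribution.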

Results
Corollary 5.1
Almost all physical tensors of rank 2 or more can be interpreted as a joint probability density, or simply as a variable density. This proposition is the converse of Lemma 3.6. The tensors $P^{\mu\nu\ldots}$, as probability densities, recall physical tensors such as the stress-energy tensor, which is defined through an energy density or energy flow density on a Riemannian curved space-time manifold. Therefore this type of tensor can exemplify a physical realization of paired joint probability densities on a Riemannian space.

Generalized Lagrange metric
As an example of a scalar in physics whose second partial derivatives lead to a metric tensor $g_{\mu\nu}$, we recall the so-called generalized Lagrange metric [7], in which the metric is built from the second partial derivatives of a Lagrangian $\mathcal{L}$. Such manifolds are well known as Hessian manifolds and in the general case can be characterized by: $g_{\mu\nu} = \nabla_\mu \nabla_\nu \varphi$ (5.4), where the partial derivatives have been replaced by covariant derivatives, following the generally accepted definition of the Hessian. Here $\varphi$ stands for a scalar usually known as the potential.
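A concrete instance of (5.4) can be checked numerically. The sketch below is a minimal example under stated assumptions: it takes the flat plane in polar coordinates $(r, \theta)$ and the hypothetical potential $\varphi = r^2/2$, uses the standard Christoffel symbols of polar coordinates, and evaluates the covariant Hessian $\partial_i \partial_j \varphi - \Gamma^k_{ij} \partial_k \varphi$ by central differences. The function names are illustrative.

```python
def phi(r, theta):
    """Hypothetical potential phi = r^2 / 2 in polar coordinates."""
    return 0.5 * r * r


def christoffel(r):
    """Christoffel symbols G[k][i][j] of the flat plane in polar
    coordinates (index 0 is r, index 1 is theta)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G


def covariant_hessian(f, r, theta, h=1e-3):
    """Covariant Hessian d_i d_j f - Gamma^k_ij d_k f by central differences."""
    point = [r, theta]

    def at(shift):
        return f(point[0] + shift[0], point[1] + shift[1])

    def step(i, sign):
        e = [0.0, 0.0]
        e[i] = sign * h
        return e

    def add(a, b):
        return [a[0] + b[0], a[1] + b[1]]

    grad = [(at(step(k, 1)) - at(step(k, -1))) / (2.0 * h) for k in range(2)]
    G = christoffel(r)
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            second = (
                at(add(step(i, 1), step(j, 1)))
                - at(add(step(i, 1), step(j, -1)))
                - at(add(step(i, -1), step(j, 1)))
                + at(add(step(i, -1), step(j, -1)))
            ) / (4.0 * h * h)
            H[i][j] = second - sum(G[k][i][j] * grad[k] for k in range(2))
    return H


r0, theta0 = 2.0, 0.4
H = covariant_hessian(phi, r0, theta0)
```

For this potential the covariant Hessian comes out as the polar metric with components 1 and $r^2$ on the diagonal, whereas the plain second partial derivatives alone would give a vanishing $\theta\theta$ component.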

Discussion
In this short article a general form of the joint probability density compatible with a Riemannian parametric manifold has been introduced. We proved the contravariant tensor properties of this general form. These results were obtained by introducing a binary data matrix that collects the parametric information of a system of particles on Euclidean particle-oriented coordinates, where the joint probability densities are identified with a newly defined generalized inner product.
Extending the result to Riemannian sub-manifolds yields the general form of probability densities on Riemannian manifolds. It has been shown that this generalized joint probability density reduces to the usual form, the iterated partial derivative of the cumulative probability function. The compatibility of the results with physical tensors such as the stress-energy tensor has been illustrated.

Lemma 4.1
Let $F$ be a differentiable cumulative distribution function as defined in Definition 3.8. In the Taylor expansion of this function about a point $(x^1, x^2, \ldots)$, regarding (4.1) in Cartesian coordinates, the probability densities $P^{\mu\nu\ldots}$ appear as the coefficients of the powers of the coordinate increments.

Therefore the Lagrangian $\mathcal{L}$ resembles the function $F$ in equation (4.1). Moreover, for a locally flat manifold there always exists a metric that is by definition of the form: $g_{\mu\nu} = \partial_\mu \partial_\nu \varphi$