3. Artificial Intelligence, Group Theory, Symmetry and Invariance
We conclude by reviewing how the principles of group theory, symmetry and invariance provide a foundational framework for understanding how machine learning algorithms function, and the classifying power of ANNs in relation to statistical variance, transformations and non-homogeneity in the input data. In broad terms, symmetry is the study of geometric and algebraic mathematical structures, with applications to data in physics, molecular biology and machine learning. A core concept in symmetry is invariance, which in our context means that changing the data coordinates, such as shifting a drug molecule in space or a cancer histology tissue sample, leaves the shape of the object unchanged [4]. Following such a change, which will be formally defined later in the text as an invariant transformation, machine learning algorithms and ANNs must still be able to recognize a drug molecule after rotation, or classify a tissue as cancerous from a shifted histology image.
In order to link the abstract symmetry concepts with data classification in machine learning, following the terminology of Bronstein et al., we consider the input data $X$ to originate from a symmetry domain $\Omega$. The domain $\Omega$ is the structure upon which the data are based, and upon this domain structure we train the artificial neural networks to perform classification through the label prediction function $f$, as mentioned in the earlier section. For example, microscopy images are essentially 2-dimensional numerical grids of $n \times n$ pixels (Figure 2a), with each pixel having a value for the light intensity captured when the image was taken. In this case the data domain is a grid of integers ($\mathbb{Z}$), represented as $\Omega = \mathbb{Z}_n \times \mathbb{Z}_n$. Similarly, for color images the data domain consists of three overlaid integer grids, each representing the green, blue and red layers composing the color image. In either case, the domain $\Omega$ contains all possible combinations of pixel intensities, while the specific pixel value combinations of the images in the input data $X$ are a "signal" $\mathcal{X}(\Omega)$ from the domain. The ANN data classification and label prediction function $f$ is applied on the signal $\mathcal{X}(\Omega)$, which is essentially a subset of the domain $\Omega$.
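As a minimal sketch of the image example above, assuming Python with NumPy, the grid domain and the signals defined on it could be written as follows. The names `domain`, `gray_signal`, `color_signal` and `predict_label` are illustrative assumptions, and the thresholding function is only a toy stand-in for a trained ANN label prediction function.

```python
import numpy as np

n = 4  # hypothetical image side length in pixels

# The grid domain: all (row, column) coordinates of an n x n image.
domain = [(r, c) for r in range(n) for c in range(n)]

# A grayscale signal: one integer light intensity per grid coordinate.
rng = np.random.default_rng(seed=0)
gray_signal = rng.integers(0, 256, size=(n, n))

# A color signal: three overlaid grids (one per color layer) on the same domain.
color_signal = rng.integers(0, 256, size=(n, n, 3))


def predict_label(signal):
    """Toy stand-in for a trained classifier: thresholds the mean intensity."""
    return int(signal.mean() > 127)


print(len(domain), gray_signal.shape, color_signal.shape)
print(predict_label(gray_signal))
```

The point of the sketch is only that the domain is fixed by the grid, while each image is one particular assignment of intensity values to that grid.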
A symmetry group contains all possible transformations of the input signal, called symmetries or group actions. A symmetry transformation preserves the properties of the data, for example by not distorting the objects in an image during rotation. The members of the symmetry group are associations between two or more coordinate points on the data domain (the grid in our image example). Between these coordinates, the image can be rotated, shifted or otherwise transformed without any distortion. Therefore, the key aspect of the formal mathematical definition of the group is that the data attributes are preserved during the object transformations that are common in the experimental acquisition of bioinformatics data. The concept of symmetry groups is important for modeling the performance of machine learning algorithms in classifying data patterns correctly, despite the variability found in the input data.
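To make the group properties concrete, the following sketch checks them for one small symmetry group, the four rotations of an image by multiples of 90 degrees. It assumes Python with NumPy and uses `np.rot90` as the group action; the 3 x 3 test image and all variable names are hypothetical.

```python
import numpy as np

image = np.arange(9).reshape(3, 3)  # hypothetical 3 x 3 image

# The four rotations by multiples of 90 degrees, each as a function on images.
rotations = {k: (lambda img, k=k: np.rot90(img, k)) for k in range(4)}

# Identity: rotating by 0 degrees leaves the image unchanged.
identity_ok = np.array_equal(rotations[0](image), image)

# Closure: composing two rotations gives another rotation from the same set.
closure_ok = all(
    np.array_equal(rotations[a](rotations[b](image)),
                   rotations[(a + b) % 4](image))
    for a in range(4) for b in range(4)
)

# Inverse: every rotation is undone by another rotation from the same set.
inverse_ok = all(
    np.array_equal(rotations[(4 - k) % 4](rotations[k](image)), image)
    for k in range(4)
)

# No distortion: each rotation only moves pixels, it never alters their values.
values_ok = all(
    np.array_equal(np.sort(rotations[k](image), axis=None),
                   np.sort(image, axis=None))
    for k in range(4)
)

print(identity_ok, closure_ok, inverse_ok, values_ok)
```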
[Figure 2b caption: A graph is composed of nodes shown as circles, and edges connecting the nodes shown as arrows. It can represent a protein, where the amino acids are the nodes and the peptide bonds between amino acids are the edges.]
Another important data structure for bioinformatics is a graph $G = (V, E)$, composed of nodes $V$ representing biological entities, and edges $E$ which are the connections between pairs of nodes (Figure 2b). In a specific instance of a graph for a real-world object, the edges are a subset of all possible links between nodes. An example graph data structure for a biological molecule such as a protein or a drug would represent the amino acids or atoms as node entities, and the chemical bonds between these entities as edges. The edges can correspond either to the carbonyl-amino (C-N) peptide bonds between amino acids and the molecular interactions across the peptide chain in the protein structure, or to the chemical bonds between atoms in a drug molecule. Furthermore, attributes in the molecular data, such as polarity and amino acid weight or drug binding properties, can be represented as $s$-dimensional node attributes, where $s$ is the number of attributes assigned to each node. Similarly, the edges or even entire graphs can have attributes, for experimental data measured on the molecular interactions represented by the edges, and for measurements of the properties of the complete protein or drug. Finally, from an algorithmic perspective, images are a special case of graphs where the nodes are the pixels, connected by edges in a structured pattern that forms a grid (Figure 2a) representing the adjacent positions of the pixels.
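The graph data structure described above can be sketched in a few lines, assuming Python with NumPy. The tripeptide, its attribute values and the helper names `adjacency_matrix` and `grid_edges` are illustrative assumptions rather than measured data or an established API.

```python
import numpy as np

# Hypothetical tripeptide: three amino acids as nodes.
nodes = ["Gly", "Ala", "Ser"]

# Edges: the peptide bonds, a subset of all possible node pairs.
edges = [(0, 1), (1, 2)]

# s-dimensional node attributes (here s = 2; values are illustrative only:
# a polarity flag and an approximate molecular weight).
node_attributes = np.array([
    [0.0, 75.1],
    [0.0, 89.1],
    [1.0, 105.1],
])


def adjacency_matrix(num_nodes, edge_list):
    """Build a symmetric 0/1 matrix with a 1 for every connected node pair."""
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    for u, v in edge_list:
        adj[u, v] = adj[v, u] = 1
    return adj


def grid_edges(n):
    """Edges linking each pixel of an n x n image to its right and lower neighbor."""
    edge_list = []
    for r in range(n):
        for c in range(n):
            idx = r * n + c  # pixel (r, c) becomes node number idx
            if c + 1 < n:
                edge_list.append((idx, idx + 1))
            if r + 1 < n:
                edge_list.append((idx, idx + n))
    return edge_list


protein_adj = adjacency_matrix(len(nodes), edges)
image_adj = adjacency_matrix(9, grid_edges(3))  # a 3 x 3 image as a grid graph
print(protein_adj)
print(len(grid_edges(3)))
```

The same `adjacency_matrix` helper serves both the molecule and the image, which is the sense in which an image is a graph with a regular grid pattern of edges.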
Having established the mathematical and algorithmic parallels between graphs and images, we will now use the principles of the symmetry group to examine the analytical and classification power of machine learning ANNs in relation to variability and transformations in the data. For both data types, namely input images or molecules represented as graphs that are shifted or rotated, we establish the concept of invariance through the principles of group theory and symmetry. These are the foundational mathematical and algorithmic formalisms that can be used to model the performance and output of machine learning algorithms and ANNs in relation to the variability in the dataset. Consequently, these principles can then be extrapolated and generalized to other types of data beyond graphs and images, for which ANNs are trained for prediction and classification. While we present the group and symmetry definitions following a data-centric approach, we will still follow the mathematical formalism when describing how the group operations can transform the input data. Furthermore, different types of data can have the same symmetry group, and different transformations can be performed by the same group operation. For example, an image of a triangle, which is essentially a graph with three nodes, can have the same rotational symmetry group as a graph of three nodes or a numerical sequence of three elements.
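The triangle example can be illustrated with a short sketch, assuming Python with NumPy: rotating an equilateral triangle by 120 degrees reorders its vertices in exactly the way a cyclic shift reorders a sequence of three elements. The coordinates and the sample sequence are hypothetical.

```python
import numpy as np

# Vertices of an equilateral triangle centered at the origin, one per row.
angles = np.deg2rad([90, 210, 330])
triangle = np.column_stack([np.cos(angles), np.sin(angles)])

# Rotation by 120 degrees counter-clockwise as a 2 x 2 matrix.
theta = 2 * np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotate every vertex: each vertex lands on the position of the next one.
rotated = triangle @ R.T
same_as_shift = np.allclose(rotated, np.roll(triangle, -1, axis=0))

# The same group element acting on a numerical sequence of three elements.
sequence = np.array([10, 20, 30])
shifted = np.roll(sequence, -1)

# Applying the action three times returns both objects to their start.
back_to_start = np.array_equal(np.roll(sequence, -3), sequence)

print(same_as_shift, shifted, back_to_start)
```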
When chemical and biological molecules are represented as graphs as described earlier, the nodes can be in any order depending on how the data were measured during the experiment. This does not change the meaning of the data: as long as the edges $E$ representing the connections between the nodes are not modified, we have a proper representation of the molecular entity independently of the ordering of $V$. In this case, where two graphs for the same molecule have the same edges but a different ordering of nodes, they are called isomorphic. Any machine learning algorithm performing pattern recognition on graphs should not depend on the ordering of nodes, so that classification with ANNs and artificial intelligence is not affected by measurement variations in real-world experimental data. This is something that is taken for granted with human intelligence, where for example we can recognize an object even when a photograph is rotated at an angle. Returning to our formal definitions, in order for ANN algorithms to recognize isomorphic graphs as equivalent, the functions of the ANN acting on graph data, and the network overall, should be permutation invariant. This means that for any permutation of the input dataset, the output values of these functions are identical, independently of the ordering of the nodes $V$ in the case of graphs. This concept can be similarly applied to images, which as mentioned previously are special cases of graphs with a grid structure, and furthermore these principles can also be generalized to other data types beyond images or graphs.
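Permutation invariance can be checked numerically with a small sketch, assuming Python with NumPy. The node features, the adjacency matrix and the two toy functions `readout` and `first_node` are hypothetical; `readout` stands in for a permutation-invariant network function, while `first_node` is a deliberate counterexample that depends on node ordering.

```python
import numpy as np

# Hypothetical graph with three nodes and two attributes per node.
node_features = np.array([[1.0, 0.5],
                          [0.2, 2.0],
                          [3.0, 1.0]])
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

# A reordering of the nodes written as a permutation matrix P.
perm = [2, 0, 1]
P = np.eye(3, dtype=int)[perm]

# The isomorphic graph: same edges, different node ordering.
permuted_features = P @ node_features
permuted_adjacency = P @ adjacency @ P.T


def readout(features, adj):
    """Sum over nodes of own features plus neighbor features (order-independent)."""
    messages = adj @ features
    return (features + messages).sum(axis=0)


def first_node(features, adj):
    """Counterexample: returns the first node's features, so ordering matters."""
    return features[0]


invariant_ok = np.allclose(readout(node_features, adjacency),
                           readout(permuted_features, permuted_adjacency))
order_dependent = not np.allclose(first_node(node_features, adjacency),
                                  first_node(permuted_features, permuted_adjacency))
print(invariant_ok, order_dependent)
```

Summation is used for the aggregation because its result does not depend on the order of the terms; other order-independent aggregations such as the mean or maximum would serve the same purpose.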
In order to further formalize the concept of invariance, and since both the image and graph examples are points on a grid on a two-dimensional plane, we can use linear algebra. Specifically, by using a matrix we can represent the data transformations as group actions $g$ within the symmetry group $\mathfrak{G}$. The use of matrices enables us to connect the group symmetries with the actual data, through matrix multiplications that modify the coordinates of the object and consequently represent the data transformations. The dimensions of the matrix are usually the same as those of the signal space for the data (for example, images). The matrix dimensions do not depend on the size of the group, i.e. the number of possible symmetries, or on the dimensionality of the underlying data domain $\Omega$. With this definition in place, we can formalize symmetries and group actions for modifying data objects, and use matrices and linear transformations as the basis for connecting invariance to the variability in the data.
We will now conclude by establishing the mathematical and linear algebra formalisms for the resilience of ANN and machine learning pattern recognition in relation to transformations in the data. While our framework is based on a two-dimensional grid data domain $\Omega$, the formalisms developed here can also be extrapolated without loss of generality to any number of dimensions or data formats. We will first connect matrices to group actions $g$ (rotations, shifts etc.) in the symmetry group $\mathfrak{G}$, by defining a function $\rho$ that maps the group to a matrix as $\rho : \mathfrak{G} \to \mathbb{R}^{d \times d}$, where $d$ is the dimension of the signal space. As mentioned previously, when a matrix of numerical values (integers, fractions, positive and negative) is multiplied with the coordinate values of an object on the plane $\Omega$, it rotates or shifts the object coordinates by the exact amount corresponding to the group action within the symmetry group.
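The mapping from group actions to matrices can be sketched for the simplest case of rotations by multiples of 90 degrees on the plane, assuming Python with NumPy. The function name `rho` and the sample point are assumptions made for illustration; the check at the end confirms that composing two group actions corresponds to multiplying their matrices.

```python
import numpy as np


def rho(k):
    """Map the group element 'rotate by k * 90 degrees' to a 2 x 2 integer matrix."""
    theta = k * np.pi / 2
    matrix = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    # Entries are exactly 0, 1 or -1 up to floating point error, so round them.
    return np.round(matrix).astype(int)


# Applying a group action to a coordinate is a matrix multiplication.
point = np.array([2, 1])  # hypothetical object coordinate on the plane
rotated_point = rho(1) @ point

# Composition of actions corresponds to the product of their matrices.
composition_ok = all(
    np.array_equal(rho(a) @ rho(b), rho((a + b) % 4))
    for a in range(4) for b in range(4)
)

print(rho(1))
print(rotated_point, composition_ok)
```

Note that the matrices are 2 x 2 because the coordinates are two-dimensional, regardless of the group having four elements.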
With these definitions in place, we will now connect the matrix formalisms with the neural network estimator function $f$, which is identified by adjusting the neuron connection weights during multiple training cycles with the input data. Our goal is to leverage the mathematical formalisms of group symmetry and invariance in order to establish the resilience of the ANN when classifying and assigning labels to new data points. These data points originate from real-world data that might contain transformations and distortions. We first define the estimator function $f$ of the ANN to be invariant if the condition $f(\rho(g)x) = f(x)$ holds for the input data $x$ and for all matrices $\rho(g)$ representing the actions $g \in \mathfrak{G}$ within the symmetry group. This formula presents the condition required for the neural network function to be invariant: its output value is the same whether or not the input data are transformed (e.g. whether or not an image or graph is rotated on the plane), with the transformation represented by the matrix multiplication $\rho(g)x$. Therefore, the output values of the ANN, which are essentially the predicted output labels based on the input data (e.g. potent drug / not potent), are resilient to noisy and deformed real-world data when the network estimator function is invariant. In a different case, the estimator function approximated by the ANN can be equivariant, defined as $f(\rho(g)x) = \rho(g)f(x)$. This means that the output of the ANN will be modified, but the label prediction result will be shifted equally along with the shift in the input data.
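The two conditions can be verified numerically on toy functions, assuming Python with NumPy. Neither `f_invariant` nor `f_equivariant` is a neural network; they are hypothetical stand-ins chosen because their behavior under rotation is easy to confirm, and the rotation angle and point coordinates are arbitrary.

```python
import numpy as np


def rotation(theta):
    """2 x 2 matrix rotating plane coordinates by the angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])


# Hypothetical input data: three points on the plane, one per row.
points = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [-1.0, -1.0]])


def f_invariant(x):
    """Sum of the distances of all points from the origin."""
    return np.linalg.norm(x, axis=1).sum()


def f_equivariant(x):
    """Centroid of the points."""
    return x.mean(axis=0)


R = rotation(0.7)
transformed = points @ R.T  # apply the group action to every point

# Invariance: the output is the same for transformed and original input.
invariance_ok = np.isclose(f_invariant(transformed), f_invariant(points))

# Equivariance: the output is transformed in the same way as the input.
equivariance_ok = np.allclose(f_equivariant(transformed),
                              R @ f_equivariant(points))

print(invariance_ok, equivariance_ok)
```

An invariant output suits a label such as "potent / not potent", whereas an equivariant output suits predictions that live on the same coordinates as the input, such as a position within an image.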
Up to this point, we have discussed only discrete transformations in linear algebra terms, with matrix multiplications that result in a shift of coordinates and rigid transformations of the data, such as a rotation of the image or the graph by a specific angle on the grid $\Omega$. However, we can also have continuous, more fine-grained shifts, which are common with real-world data. In this case, the ANN algorithms should still be able to recognize patterns, and to classify and label the data without any loss of performance. Mathematically, continuous transformations are handled in the same way by the invariant and equivariant functions described earlier. If, for example, the domain $\Omega$ contains data that have smooth transformations and shifts, such as moving images (video) or shifts of molecules and graphs that preserve continuity in a topological sense [11], then we have a homeomorphism instead of invariance.
Finally, if the rate of continuous transformation of the data is quantifiable, meaning that the function $\rho$ that maps the group to a matrix is differentiable, then the members of the symmetry group will be part of a diffeomorphism. As follows from the principles of calculus, in this case infinitely many matrices will need to be produced by $\rho$ to describe the continuous change of the data coordinates at every point. These differentiable data structures are common with manifolds, which could for example be used to represent proteins in fine detail. In this case the molecule would be represented as a cloud with all atomic forces around the structure, instead of the discrete data structure of nodes and edges of a graph. Lastly, if the manifold structure also includes a metric of distance between its points to further quantify the data transformations, then a group action from the symmetry group results in an isometry.
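A small numerical sketch, assuming Python with NumPy, can illustrate the continuous case with rotations of the plane by an arbitrary angle. It checks that the matrix-valued map is differentiable in the angle and that distances between points are preserved for every sampled angle, which is the isometry property. The names `rho`, `d_rho` and `pairwise_distances`, and the atom coordinates, are hypothetical.

```python
import numpy as np


def rho(theta):
    """Rotation matrix for a continuous angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])


def d_rho(theta):
    """Analytic derivative of rho with respect to the angle."""
    return np.array([[-np.sin(theta), -np.cos(theta)],
                     [np.cos(theta), -np.sin(theta)]])


def pairwise_distances(x):
    """Matrix of Euclidean distances between all pairs of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Hypothetical atom coordinates on the plane, one atom per row.
atoms = np.array([[0.0, 0.0],
                  [1.5, 0.0],
                  [1.5, 1.2]])

# Isometry: distances between atoms are unchanged for every sampled angle.
thetas = np.linspace(0.0, 2 * np.pi, 50)
isometry_ok = all(
    np.allclose(pairwise_distances(atoms @ rho(t).T), pairwise_distances(atoms))
    for t in thetas
)

# Differentiability: a central finite difference matches the analytic derivative.
h = 1e-6
finite_diff = (rho(0.3 + h) - rho(0.3 - h)) / (2 * h)
derivative_ok = np.allclose(finite_diff, d_rho(0.3), atol=1e-6)

print(isometry_ok, derivative_ok)
```

Only a finite sample of angles is tested here; the continuous group contains one matrix for every real angle, which is why the text speaks of infinitely many matrices.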