The genetic coding system and unitary matrices

Information molecules of DNA and RNA should obey the principles of quantum mechanics, where unitary operators in the form of unitary matrices play a key role. Unitary matrices are the basis of calculations in quantum computers. This article presents some of the author's results, which show that matrix representations of structured systems of molecular-genetic alphabets can be considered as sets of sparse unitary matrices related to phenomenological features of the degeneracy of the genetic code. These sparse unitary matrices have orthogonal systems of functions in their rows and columns. Some unitary genetic matrices are complementary to each other. Decompositions of numeric genetic matrices into sets of sparse unitary matrices are connected with the logical operation of modulo-2 addition, which is also used in quantum computers. Tensor (or Kronecker) families of unitary genetic matrices with their fractal-like properties are also considered. The described results are discussed in the framework of developing quantum-information approaches for modeling genetic systems.


Introduction.
The information molecules of DNA and RNA of the genetic coding system belong to the world of molecules, which is governed by the principles of quantum mechanics. This article presents results of the author's study of the possibility of using the formalisms of quantum mechanics and quantum informatics to model regular structures of molecular-genetic systems. First of all, we search for correspondences between unitary operators and structured alphabets of DNA and RNA in their matrix forms of representation. The article provides additional material for the development of quantum information modeling of structured molecular-genetic ensembles; elements of such modeling have been described in the author's work on tetra-group symmetries in long DNA texts [Petoukhov, 2017a].
In line with one of the postulates of quantum mechanics, the evolution of a closed quantum system is described by unitary transformations. Computational processes in quantum computer science are based on unitary operators that serve as quantum gates. "Any unitary matrix specifies a valid quantum gate" [Nielsen, Chuang, 2010, p. 18]. Any physical impact on a qubit in quantum mechanics is described by a linear unitary operator.
In quantum mechanics and quantum computer science, an important role is played by unitary operators in the form of Hadamard matrices, which contain complete orthogonal systems of Walsh functions. Hadamard operators are also widely used for spectral representations of signal vectors in noise-immune communication [Seberry, Wysocki, Wysocki, 2005], the sequency analysis of Harmuth [Harmuth, 1977, 1989], digital logical holography [Derzhypolskyy, Melenevskyy, Gnatovskyy, 2007; Morita, Sakurai, 1973; Soroko, 1974] and algorithms of quantum informatics [Nielsen, Chuang, 2010]. But Hadamard matrices are not the only unitary matrices that contain complete systems of orthogonal functions.
In this paper, we present other unitary matrices with other complete systems of orthogonal functions, which were discovered by the author in the course of the algebraic modeling of molecular alphabets of DNA and RNA. These unitary matrices are sparse, and they form sets of mutually complementary matrices (in a certain algebraic sense). We conditionally call them unitary genetic matrices (or briefly, unitary geno-matrices). This recalls the well-known proposition that different natural systems may need their own systems of orthogonal functions for their spectral analysis: "after Fourier it was found that for some problems, harmonic sinusoids rather than other systems of orthogonal functions, for example, the Legendre polynomials, are better suited. In fact, any particular problem needs its own system of orthogonal functions. This was most clearly manifested in the course of the development of quantum mechanics" [Soroko, 1973]. These unitary sparse geno-matrices contain complete systems of orthogonal functions and have special algebraic properties. They can serve as the basis for a new class of spectral representations of vectors in biology and other fields of science, as well as a new class of bio-mathematical models and algorithms in classical and quantum computer sciences.
One should add that quantum-information aspects of life are actively discussed in modern science, for example, in the book [Quantum aspects of life, 2008]; in articles on the biology of quantum information [Matsuno, 1999, 2003, 2015; Matsuno, Paton, 2000]; in articles on the possible role of Grover's quantum algorithm in genetic information [Patel, 2001 a,b,c], etc.

Matrix representations of DNA-alphabets and genetic binary oppositions
Science does not know why the basic alphabet of DNA has been created by Nature from just four letters (adenine A, thymine T, cytosine C and guanine G), and why just these very simple molecules were chosen for the DNA-alphabet (out of millions of possible molecules). But science knows [Fimmel, Danielli, Strüngmann, 2013; Petoukhov, 2008; Petoukhov, He, 2009; Stambuk, 1999] that these four molecules are interrelated, through their symmetrical peculiarities, into a united molecular ensemble with three pairs of binary-oppositional traits or indicators (Fig. 1). These three pairs of binary-oppositional traits or indicators are the following:

№   Binary symbols (0 / 1)                      C  A  G  T/U
1   pyrimidine / purine                         0  1  1  0
2   amino / keto                                0  0  1  1
3   three hydrogen bonds / two hydrogen bonds   0  1  0  1

(1) Two letters are purines (A and G), and the other two are pyrimidines (C and T). From the standpoint of these binary-oppositional traits one can denote C = T = 0, A = G = 1. From the standpoint of these traits, any DNA-sequence is represented by a corresponding binary sequence. For example, GCATGAAGT is represented by 101011110;
(2) Two letters are amino-molecules (A and C) and the other two are keto-molecules (G and T). From the standpoint of these traits one can designate A = C = 0, G = T = 1. Correspondingly, the same sequence GCATGAAGT is represented by another binary sequence, 100110011;
(3) The pairs of complementary letters, A-T and C-G, are linked by 2 and 3 hydrogen bonds, respectively. From the standpoint of these binary traits, one can designate C = G = 0, A = T = 1. Correspondingly, the same sequence GCATGAAGT is read as 001101101.
Accordingly, each DNA-sequence of nucleotides is the carrier of three parallel messages in three different binary languages. At the same time, these three types of binary representations form a common logic set on the basis of the logic operation of modulo-2 addition, denoted by the symbol ⊕: modulo-2 addition of any two such binary representations of a DNA-sequence gives the third binary representation of the same DNA-sequence. For example, 101011110 ⊕ 100110011 = 001101101. Recall the rules of bitwise modulo-2 addition: 0 ⊕ 0 = 0; 0 ⊕ 1 = 1; 1 ⊕ 0 = 1; 1 ⊕ 1 = 0. The logic operation of modulo-2 addition is actively used in classical and quantum computers. Below we use the operation of modulo-2 addition for those decompositions of genetic matrices that lead to interesting sets of inter-complementary sparse unitary matrices for special kinds of spectral representations of vectors.
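The three binary readings and their modulo-2 relation can be checked with a minimal Python sketch. The dictionary and function names below are illustrative assumptions, not notation from the article; only the 0/1 assignments come from the three trait pairs described above.

```python
# 0/1 codes of the DNA letters under the three binary-oppositional traits.
PURINE = {"C": 0, "T": 0, "A": 1, "G": 1}     # trait 1: pyrimidine = 0, purine = 1
KETO = {"A": 0, "C": 0, "G": 1, "T": 1}       # trait 2: amino = 0, keto = 1
WEAK_BOND = {"C": 0, "G": 0, "A": 1, "T": 1}  # trait 3: 3 H-bonds = 0, 2 H-bonds = 1


def encode(seq, table):
    """Read a DNA sequence as a binary string under one trait."""
    return "".join(str(table[ch]) for ch in seq)


def xor_bits(a, b):
    """Bitwise modulo-2 addition of two equal-length binary strings."""
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))


seq = "GCATGAAGT"
m1 = encode(seq, PURINE)
m2 = encode(seq, KETO)
m3 = encode(seq, WEAK_BOND)
print(m1, m2, m3)
# Any two readings added modulo 2 give the third one.
print(xor_bits(m1, m2) == m3, xor_bits(m1, m3) == m2, xor_bits(m2, m3) == m1)
```

The relation holds letter by letter, so it holds for any sequence over C, A, G, T and not only for this example.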
Taking into account the phenomenological fact that each of the DNA-letters C, A, T and G is uniquely defined by any two kinds of the mentioned binary-oppositional indicators (Fig. 1), these genetic letters can be represented by corresponding pairs of binary symbols, for example, from the standpoint of the first two binary-oppositional indicators. For the further description, it is convenient to use at the first position of each letter its binary symbol from the second pair of binary-oppositional indicators (the indicator "amino or keto": C = A = 0, T = G = 1) and at the second position its binary symbol from the first pair of binary-oppositional indicators (the indicator "pyrimidine or purine": C = T = 0, A = G = 1). In this case the letter C is represented by the binary symbol 0₂0₁ (that is, as a 2-bit binary number), A by the symbol 0₂1₁, T by the symbol 1₂0₁, and G by the symbol 1₂1₁. Using these representations of separate letters, each of the 16 doublets is represented as the concatenation of the binary symbols of its letters (that is, as a 4-bit binary number): for example, the doublet CC is represented as the 4-bit binary number 0₂0₁0₂0₁, the doublet CA as the 4-bit binary number 0₂0₁0₂1₁, etc. By analogy, each of the 64 triplets is represented as the concatenation of the binary symbols of its letters (that is, as a 6-bit binary number): for example, the triplet CCC is represented as the 6-bit binary number 0₂0₁0₂0₁0₂0₁, the triplet CCA as the 6-bit binary number 0₂0₁0₂0₁0₂1₁, etc. In general, each n-plet is represented as the concatenation of the binary symbols of its letters (below we will not show the indexes 2 and 1 of separate letters in binary representations of n-plets, but will remember that each position corresponds to its own kind of indicator from the first or from the second set of indicators in Fig. 1).
It is convenient to represent the DNA-alphabets of 4 nucleotides, 16 doublets, 64 triplets, …, 4^n n-plets in the form of appropriate square tables (Fig. 2), whose rows and columns are numbered by binary symbols according to the following principle. Entries of each column are numbered by binary symbols in line with the first set of binary-oppositional indicators in Fig. 1 (for example, the triplet CAG and all other triplets in the same column are the combination "pyrimidine-purine-purine", and so this column is numbered 011). By contrast, entries of each row are numbered by binary numbers in line with the second set of indicators (for example, the same triplet CAG and all other triplets in the same row are the combination "amino-amino-keto", and so this row is numbered 001). In such tables (Fig. 2), each of the 4 letters, 16 doublets, 64 triplets, … automatically takes its own individual place, and all components of the alphabets are arranged in a strict order. It is essential that these 3 separate genetic tables form a joint tensor family of matrices, since they are interrelated by the known operation of the tensor (or Kronecker) product of matrices (Fig. 3). So they are not simple tables but matrices. By definition, under tensor multiplication of two matrices, each entry of the first matrix is multiplied by the whole second matrix [Bellman, 1960]. The second tensor power of the (2*2)-matrix [C, A; T, G] of 4 DNA-letters automatically gives the (4*4)-matrix of 16 doublets; the third tensor power of the same (2*2)-matrix gives the (8*8)-matrix of 64 triplets with the same strict arrangement of entries as in Fig. 2. In this tensor construction of the family of genetic matrices, data about the binary-oppositional traits of the genetic letters C, A, T and G are not used at all. So, the structural organization of the system of DNA-alphabets is connected with the algebraic operation of the tensor product (Fig. 3). This is important since the operation of the tensor product is well known in mathematics, physics and informatics, where it gives a way of putting vector spaces together to form larger vector spaces. The following quotation speaks about the crucial meaning of the tensor product: «This construction is crucial to understanding the quantum mechanics of multiparticle systems» [Nielsen, Chuang, 2010, p. 71]. For us the most interesting point is that the tensor product is one of the basic instruments of quantum informatics.
As is known, the degeneracy of the genetic code has an important specificity: the entire set of 64 triplets is divided by Nature into 2 equal binary-opposition subsets [Rumer, 1968]:
• 32 triplets with "strong roots" (black colors in Fig. 4), i.e., with the 8 "strong" doublets AC, CC, CG, CT, GC, GG, GT, TC in their first positions;
• 32 triplets with "weak roots" (white colors in Fig. 4), i.e., with the 8 "weak" doublets CA, AA, AG, AT, GA, TA, TG, TT in their first positions.
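The tensor construction of the alphabet matrices can be sketched in Python as a symbolic Kronecker product in which "multiplying" two entries means concatenating their letters. The helper names (`kron_str`, `labels_match`) are hypothetical; the sketch only assumes the (2*2)-matrix [C, A; T, G] and the two binary indicators described above.

```python
# Symbolic Kronecker (tensor) powers of the alphabet matrix [C, A; T, G].
LETTERS = [["C", "A"], ["T", "G"]]
KETO = {"C": 0, "A": 0, "T": 1, "G": 1}    # row indicator: amino = 0, keto = 1
PURINE = {"C": 0, "T": 0, "A": 1, "G": 1}  # column indicator: pyrimidine = 0, purine = 1


def kron_str(a, b):
    """Kronecker product of symbolic matrices; entries are combined by concatenation."""
    return [[x + y for x in row_a for y in row_b] for row_a in a for row_b in b]


def labels_match(matrix):
    """True if every n-plet sits in the row/column whose binary number its letters spell."""
    for i, row in enumerate(matrix):
        for j, word in enumerate(row):
            row_bits = "".join(str(KETO[ch]) for ch in word)
            col_bits = "".join(str(PURINE[ch]) for ch in word)
            if int(row_bits, 2) != i or int(col_bits, 2) != j:
                return False
    return True


doublets = kron_str(LETTERS, LETTERS)   # (4*4)-matrix of 16 doublets
triplets = kron_str(doublets, LETTERS)  # (8*8)-matrix of 64 triplets

for row in doublets:
    print(" ".join(row))
print(triplets[0b001][0b011])  # the triplet in row 001, column 011
print(labels_match(doublets), labels_match(triplets))
```

The tensor powers themselves use no information about the traits; `labels_match` only confirms afterwards that the resulting positions agree with the binary numbering of rows and columns.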
Fig. 4. Black-and-white mosaics represent the distribution of strong and weak doublets in the matrix of 16 doublets (on the left) and the distribution of triplets with strong and weak roots in the matrix of 64 triplets (on the right). The binary number in brackets in each matrix cell is the modulo-2 sum of the binary numberings of the row and the column of the cell.
Code meanings of triplets with strong roots do not depend on the letter in their third position; code meanings of triplets with weak roots depend on their third letter. What are the locations of these two subsets of triplets in the genetic matrix? The unexpected phenomenological fact is the symmetrical location (Fig. 4) of all black and white entries in the genetic matrices of 16 doublets and 64 triplets, which were constructed very formally without any mention of amino acids and the degeneracy of the genetic code.
The symmetrical properties of the mosaics in the genetic matrices in Fig. 4 are the following: 1) the left and right halves of the matrix mosaic are mirror-anti-symmetric to each other in their colors: any pair of cells located mirror-symmetrically in the two halves has opposite colors; 2) the block mosaic of the matrix has a cruciform character: the two quadrants along each diagonal are identical to each other in their mosaics; 3) the mosaic of each row has a meander character identical to the known Rademacher functions r_n(t) = sign(sin(2^n πt)), n = 1, 2, 3, …, which are particular cases of Walsh functions and contain only the values +1 and -1 (https://www.encyclopediaofmath.org/index.php/Rademacher_system).
Using this analogy with Rademacher-Walsh functions, one can represent the symbolic genetic matrices in Fig. 4 in the form of numeric matrices R4 and R8 with entries +1 and -1 (Fig. 5), where the numbers +1 (-1) represent black (white) doublets and triplets correspondingly. Taking into account that the meander-like mosaics of the rows of the matrices R4 and R8 correspond to Rademacher functions, we have conditionally called these matrices "Rademacher matrices" in all our publications beginning from the book [Petoukhov, 2008] (although Hans Rademacher himself never worked with such matrices).
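For illustration, the meander patterns of Rademacher functions can be generated with a short Python sketch. Sampling at the midpoints of equal subintervals of [0, 1) is an assumption made here so that no sample falls on a zero of the sine; the function name is hypothetical.

```python
import math


def rademacher(n, samples):
    """Sample r_n(t) = sign(sin(2^n * pi * t)) at midpoints of `samples` equal parts of [0, 1)."""
    values = []
    for k in range(samples):
        t = (k + 0.5) / samples
        values.append(1 if math.sin(2 ** n * math.pi * t) > 0 else -1)
    return values


# Meanders of length 4 (compare with rows of a 4*4 mosaic) and of length 8.
print(rademacher(1, 4), rademacher(2, 4))
print(rademacher(1, 8), rademacher(2, 8), rademacher(3, 8))
```

A row of a mosaic matrix then corresponds to one of these meanders, possibly taken with the opposite sign.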

Fig. 5. Numeric representations R4 and R8 of the genetic matrices of 16 doublets and 64 triplets from Fig. 4. Matrix cells with the number +1 (-1) correspond to cells with black (white) doublets and triplets in Fig. 4. Each row of the numeric matrices represents one of the Rademacher-Walsh functions.
It should be noted that a huge number, 64! ≈ 10^89, of variants exists for arranging 64 triplets in a separate (8*8)-matrix. For comparison, modern physics estimates the age of the Universe at 10^17 seconds. It is obvious that an accidental arrangement of black and white triplets in a separate (8*8)-matrix will almost never give any symmetry. But in our approach, this matrix of 64 triplets is not a separate matrix: it is one of the members of the family of matrices of genetic alphabets interrelated by means of the binary-oppositional traits of the nitrogenous bases A, T, C, G (and additionally it is one of the members of the tensor family of matrices [C, A; T, G]^(n) of the interrelated alphabets of DNA).
These numeric matrices R4 and R8 with their mosaics (Fig. 5) represent the phenomenological peculiarities of the degeneracy of the genetic code. Raising these genetic matrices to the second power leads to their doubling and quadrupling: R4^2 = 2*R4 and R8^2 = 4*R8. This resembles the doubling and quadrupling of the genetic material under mitosis and meiosis of biological cells. Let us analyze the algebraic properties of these genetic matrices R4 and R8 more deeply.
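The numeric matrices and their squaring property can be rebuilt from the stated rules with the following Python sketch. It assumes that R4 and R8 are obtained by writing +1 for strong doublets (strong roots) and -1 for weak ones in the Kronecker-power arrangement of [C, A; T, G]; the matrices printed by the article in Fig. 5 are not consulted, so the result should be read as a reconstruction. Helper names are hypothetical.

```python
# Rebuild R4 and R8 from the strong/weak doublet rule and test R4^2 = 2*R4, R8^2 = 4*R8.
STRONG = {"AC", "CC", "CG", "CT", "GC", "GG", "GT", "TC"}  # the 8 "strong" doublets
LETTERS = [["C", "A"], ["T", "G"]]


def kron_str(a, b):
    """Symbolic Kronecker product: entries are combined by string concatenation."""
    return [[x + y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(m, c):
    """Multiply every entry of m by the number c."""
    return [[c * x for x in row] for row in m]


doublets = kron_str(LETTERS, LETTERS)
triplets = kron_str(doublets, LETTERS)

# +1 for a strong doublet (or a triplet with a strong root), -1 otherwise.
R4 = [[1 if w in STRONG else -1 for w in row] for row in doublets]
R8 = [[1 if w[:2] in STRONG else -1 for w in row] for row in triplets]

for row in R4:
    print(row)
print(matmul(R4, R4) == scale(R4, 2))
print(matmul(R8, R8) == scale(R8, 4))
```

Because the root of a triplet is its first two letters, adjacent rows 0 and 1, 2 and 3, 4 and 5, 6 and 7 of the reconstructed R8 coincide.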

The genetic matrix R4 and sparse unitary matrices
We begin the algebraic analysis of the (4*4)-matrix R4 in Fig. 5. Fig. 6 shows the decomposition of this matrix into a sum of 4 sparse matrices: R4 = R04 + R14 + R24 + R34 (below we will explain that this decomposition is not arbitrary but constructed on the principle of dyadic-shift decompositions known in technology of digital signal processing).
By definition, a complex square matrix U is unitary if its conjugate transpose U† is also its inverse U^-1: U*U† = I, where I is the identity matrix. The real analogue of a unitary matrix is an orthogonal matrix, for which the conjugate transpose U† is identical to the ordinary transpose: U*U^T = I. In this article we consider only the case of real square matrices. Unitary matrices have significant importance in quantum mechanics because they preserve norms, and thus, probability amplitudes (https://en.wikipedia.org/wiki/Unitary_matrix). The tensor product of two unitary matrices always generates a unitary matrix [Rumer, Fet, 1970, p. 38].
It is interesting that each of the sparse matrices R04, R14, R24 and R34 is unitary (or orthogonal, since their entries are real):
R04*R04^T = I, R14*R14^T = I, R24*R24^T = I, R34*R34^T = I (1)
In molecular-genetic systems, relations of complementarity play a very important role. The book [Chapeville, Haenni, 1974, Chapter 1] notes that the proof of the complementary structure of the bases in DNA has led to the most fundamental discoveries in modern biology: this complementarity provides the most important properties of DNA as a carrier of genetic information, including DNA replication in the course of cell division and also all mechanisms of manifestation of genetic information. But one can note that the set of unitary genetic matrices R04, R14, R24 and R34 (Fig. 6) contains the following algebraic complementarities in their pairs: the unitary matrices R04 and R24 form the first pair of algebraic complementarity since they are transformed into each other by mirror reflection relative to the vertical midline with simultaneous inversion of the signs of their non-zero entries (the mirror-anti-symmetry). The same is true for the unitary matrices R14 and R34, which form the second pair with a similar algebraic complementarity of the mirror-anti-symmetric type. The degeneracy of the genetic code is connected with such algebraic complementarities in the set of unitary genetic matrices.
The determinants of all the unitary matrices R04, R14, R24 and R34 are equal to 1; for this reason these matrices belong to the type of so-called special unitary matrices. The special unitary matrices are closed under multiplication and the inverse operation, and therefore form a matrix group called the special unitary group (http://mathworld.wolfram.com/SpecialUnitaryMatrix.html).
The multiplication table of the closed set of genetic unitary matrices R04, R14, R24 and R34 is shown in Fig. 7. From this fact, one can conclude that the division of the set of 16 doublets in line with the degeneracy of the genetic code is connected with the set of sparse unitary matrices R04, R14, R24 and R34.
The rows of each of the unitary genetic matrices R04, R14, R24 and R34 form a complete orthogonal system of functions. The action of each of these matrices (except for the identity matrix R04) on an arbitrary 4-dimensional vector X = [x0, x1, x2, x3] transforms it into a new vector Y, which can be considered as a spectral representation of the vector X on the basis of the orthogonal system of functions in the rows of the given matrix. The action of the same unitary matrix, taken in its transposed form, on this vector Y restores the original vector X. Raising each of the matrices R04, R14, R24 and R34 to a tensor power generates a new unitary matrix with an orthogonal system of functions in its rows and columns.
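The dyadic-shift decomposition and the spectral use of its parts can be sketched in Python as follows. Two assumptions are made: R4 is entered as reconstructed from the strong/weak doublet rule (it is not copied from Fig. 5), and part number k collects the cells whose row and column numbers give k under modulo-2 addition, so the list index k is only assumed to match the article's labels. The test vector is an arbitrary example.

```python
# Dyadic-shift decomposition of R4 into four sparse matrices and a spectral round trip.
R4 = [[1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, 1, -1],
      [-1, -1, 1, 1]]  # assumption: reconstructed from the strong/weak doublet rule


def dyadic_parts(m):
    """Part k keeps the entries of m in cells (i, j) with i XOR j == k, zeros elsewhere."""
    n = len(m)
    return [[[m[i][j] if i ^ j == k else 0 for j in range(n)] for i in range(n)]
            for k in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
parts = dyadic_parts(R4)

# Each sparse part is orthogonal (real unitary): P * P^T = I.
print([matmul(p, transpose(p)) == identity for p in parts])

# Spectral representation of a vector and its restoration by the transposed matrix.
x = [5, -3, 7, 9]
y = apply(parts[1], x)
x_back = apply(transpose(parts[1]), y)
print(y, x_back)
```

The restoration step works for any of the four parts because the transpose of an orthogonal matrix is its inverse.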
Unitary matrices are used in quantum informatics as quantum logic elements (quantum gates) for performing quantum computations. In the case of multi-qubit systems, the operation of the tensor product of matrices is of key importance in connection with the postulate of quantum mechanics: the state space of a composite system is the tensor product of the state spaces of its components. In the light of this, it is especially interesting that the entire genetic (4*4)-matrix R4 (Fig. 5) is constructed as a sum of tensor products of four unitary (2*2)-matrices, that is, of four quantum gates U0, U1, U2 and U3 (Fig. 8), in line with expression (2), where the matrices U0, U1, U2 and U3 are shown in Fig. 8. These matrices are unitary: U0*U0^T = I2, U1*U1^T = I2, U2*U2^T = I2, U3*U3^T = I2, where I2 is the identity matrix. The set of these 4 matrices is also closed under multiplication. Fig. 9 shows their multiplication table, which coincides with the multiplication table of the split-quaternions of J. Cockle, by analogy with the case of the unitary (4*4)-matrices in Figs. 6 and 7. It should be noted that the unitary matrices U0, U1, U2 and U3 (Fig. 8) are related to quantum gates widely used in quantum computing [Nielsen, Chuang, 2010, p. XXX].
Raising the unitary matrices U1, U2 and U3 to ordinary integer powers n = 2, 3, 4, … gives cyclic groups of matrices with the following periods: U1^n = U1^(n+4), U2^n = U2^(n+2), U3^n = U3^(n+2). In this article we specially note the connection of cyclic groups with algebraic properties of genetic unitary matrices, since such cyclic groups can be useful for modeling many inherited cyclic processes in the physiology of organisms.

The complementarity of sparse unitary matrices in genetics and the cruciform principle in inherited sensory informatics
This Section considers the cruciform character of the block black-and-white mosaic of the (4*4)-matrix R4 of 16 doublets, which reflects essential peculiarities of the degeneracy of the genetic code (Figs. 4 and 5). One can note that genetically inherited constructions of physiological sensory-motor systems demonstrate similar cruciform structures. For example, the connection between the hemispheres of the human brain and the halves of the human body has a similar cruciform character: the left hemisphere serves the right half of the body and the right hemisphere serves the left half (Fig. 10) [Annett, 1985, 1992; Gazzaniga, 1995; Hellige, 1993]. The system of optic cranial nerves from the two eyes has a cruciform structure as well: the optic nerves transfer information about the right half of the field of vision into the left hemisphere of the brain, and information about the left half of the field of vision into the right hemisphere. The same holds true for the hearing system [Penrose, 1989, Chapter 9]. In particular, due to the existence of such inter-complementary right and left parts in the genetically inherited visual and hearing systems, a person has a stereoscopic perception of the environment. Now we show that a similar cruciform character, which is represented in the mosaic matrix of 16 doublets (Figs. 4, 5, 11), is connected with the following fact: this mosaic matrix is a sum of two sparse unitary matrices that are algebraically complementary to each other (they are mirror-anti-symmetric to each other, by analogy with the left and right halves of a human body) and that can be considered as the right and left parts of the cruciform matrix R4.
Fig. 10. The cruciform schemes of some morpho-functional structures of informational systems in the human organism. On the left: the cruciform connections of the brain hemispheres with the left and right halves of the human body. On the right: the cruciform structure of the optic nerves running from the eyes to the brain.
One can suppose that this inherited cruciform character of sensory-motor systems is connected with genetic cruciform structures that include, in particular, the genetic matrices R4 and R8 [Petoukhov, 2008; Petoukhov, He, 2009]. Taking into account the quantum-informational character of molecular-genetic systems and also the important role of unitary matrices in quantum mechanics, it is interesting that, as we have discovered, these genetic matrices R4 and R8 are connected with the inter-complementary sparse unitary matrices described below.
Let us begin with a consideration of the genetic matrix R4 of 16 doublets. We reveal that its cruciform character is connected with a pair of sparse unitary matrices that are mirror-anti-symmetric to each other. Fig. 11 shows that the genetic cruciform matrix R4 is the sum of two sparse matrices R4R and R4L: R4 = R4R + R4L. These two sparse matrices are inter-complementary in an algebraic sense since they are mirror-anti-symmetric to each other and jointly form the non-sparse matrix R4. Using the analogy with our stereoscopic vision by means of two eyes, left and right, we conditionally call the pair of complementary matrices R4R and R4L the stereoscopic pair (or briefly, the stereo-pair), where the matrix R4R is called the right stereo-matrix and the matrix R4L is called the left stereo-matrix.
Fig. 11. The cruciform matrix R4 is the sum of two sparse matrices R4R and R4L, whose non-zero entries coincide with the non-zero entries in the corresponding quadrants along the main diagonal and the secondary diagonal of the matrix R4.
The multiplication of the spectral vector X̄_R^T by the transposed unitary matrix 2^-0.5*R4R^T automatically restores the initial vector X̄: X̄_R^T*(2^-0.5*R4R^T) = [5, -3, 7, 9]. Turning to the left stereo-matrix R4L, one can check that the action of the unitary matrix 2^-0.5*R4L on the same vector X̄ = [5, -3, 7, 9] leads to a spectral representation X̄_L of the vector X̄ on the basis of the orthogonal functions in its rows. The multiplication of the spectral vector X̄_L^T by the transposed unitary matrix 2^-0.5*R4L^T automatically restores the initial vector X̄: X̄_L^T*(2^-0.5*R4L^T) = [5, -3, 7, 9]. Such spectral representations (5, 6) of vector-signals can be used for noise-immune coding of information, by analogy with the known noise-immune coding on the basis of Hadamard matrices in digital signal processing.
Another important difference between the unitary matrices 2^-0.5*R4R and 2^-0.5*R4L is the following: the matrix 2^-0.5*R4L is symmetric and, correspondingly, all its eigenvalues are real. By contrast, the matrix 2^-0.5*R4R is asymmetric and has complex eigenvalues.
The multiplication of the stereo-matrices R4R and R4L is non-commutative: R4R*R4L ≠ R4L*R4R. Their commutator R4R*R4L - R4L*R4R, taken with the factor 8^-0.5, is a unitary matrix. In relation to each other, the right and left stereo-matrices R4R and R4L are amicable and disjoint (these algebraic notions are used in different applications of matrices). In linear algebra, by definition, two square matrices M and N of order n are said to be amicable if M*N^T = N*M^T (see, for example, [Seberry, Wysocki, Wysocki, 2005]). Also by definition, two {0, ±1} matrices M and N of the same size are said to be disjoint if the following rule holds for all of their positions: if M has a non-zero entry at the (i, j)-th position then N has a zero entry at the same position and vice versa, i.e., the entrywise product of M and N is the zero matrix.
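The listed properties of the stereo-pair can be checked with a Python sketch. It assumes that R4 is the matrix reconstructed from the strong/weak doublet rule, that R4R keeps the two quadrants along the main diagonal and R4L the two quadrants along the secondary diagonal (which agrees with R4L being the symmetric one), and that a matrix acts on a column vector, which may differ from the article's row-vector notation. Helper names are hypothetical.

```python
# Stereo-pair R4 = R4R + R4L: unitarity, spectral round trip, commutator, amicability.
R4 = [[1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, 1, -1],
      [-1, -1, 1, 1]]  # assumption: reconstructed from the strong/weak doublet rule


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def quadrant_part(m, main):
    """Keep the quadrants along the main (main=True) or secondary diagonal of m."""
    n, h = len(m), len(m) // 2
    return [[m[i][j] if (i // h == j // h) == main else 0 for j in range(n)]
            for i in range(n)]


def apply(m, v, c):
    """The matrix c*m acting on the column vector v."""
    return [c * sum(a * b for a, b in zip(row, v)) for row in m]


R4R = quadrant_part(R4, True)   # assumed "right" stereo-matrix
R4L = quadrant_part(R4, False)  # assumed "left" stereo-matrix

s = 2 ** -0.5
x = [5, -3, 7, 9]
y_right = apply(R4R, x, s)                  # spectrum with respect to 2^-0.5*R4R
x_back = apply(transpose(R4R), y_right, s)  # restored by the transposed matrix
print([round(v, 9) for v in x_back])

commutator = [[p - q for p, q in zip(r1, r2)]
              for r1, r2 in zip(matmul(R4R, R4L), matmul(R4L, R4R))]
amicable = matmul(R4R, transpose(R4L)) == matmul(R4L, transpose(R4R))
disjoint = all(a * b == 0 for ra, rb in zip(R4R, R4L) for a, b in zip(ra, rb))
print(matmul(commutator, transpose(commutator)))  # 8 times the identity matrix
print(amicable, disjoint, R4L == transpose(R4L), R4R == transpose(R4R))
```

The commutator multiplied by its transpose gives 8*I, which is why the factor 8^-0.5 makes it unitary.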
Stereo-matrices R4R and R4L taken in tensor powers k = 1, 2, 3, … define tensor families of matrices R4R^(k) and R4L^(k) of order 4^k, members of which satisfy the condition (6):
R4R^(k)*(R4R^(k))^T = 2^k*Is, R4L^(k)*(R4L^(k))^T = 2^k*Is, (6)
where Is is the identity matrix of order 4^k.
Fig. 13.The algorithmic construction of the second tensor power of genetic stereo-matrices R4R and R4L from Fig. 11.
Genetic stereo-matrices R4R^(k) and R4L^(k) have some analogies with Hadamard matrices. By definition, a Hadamard matrix Hn of order n is a square matrix with entries +1 or -1 that satisfies the condition (8):
Hn*Hn^T = n*In, (8)
where In is the n*n identity matrix. The mentioned analogies allow one to consider applications of the stereo-matrices R4R^(k) and R4L^(k) in quantum informatics, in noise-immune coding and the recovery of information in the presence of noise and interference, and in some other fields, in parallel with the traditional use of Hadamard matrices there (a web search of the bibliography on different applications of Hadamard matrices gives 44 thousand publications for the period 1978-2005 [Seberry, Wysocki, Wysocki, 2005]).
As far as we can judge, the stereo-matrices R4R and R4L have not previously appeared in the mathematical natural sciences. They were discovered in the analysis of molecular-genetic structures, and they seem to be new and interesting mathematical tools for genetic research, quantum informatics and some other areas. The pair of complementary stereo-matrices R4R and R4L gives new material for discussions, existing since ancient times, about the role of the binary principles "Yin-Yang", "left-right", "male-female", "odd-even" in the organization of Nature (see, for example, the collection of facts in the book [Ivanov, 1978]).
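The analogy with Hadamard matrices can be illustrated with a Python sketch that raises matrices to Kronecker powers and tests whether M*M^T is a multiple of the identity. The stereo-matrices are entered as assumed reconstructions (quadrants of R4 along the main and secondary diagonals), and the Hadamard matrices are built by the Sylvester (Kronecker) construction from the (2*2)-matrix [1, 1; 1, -1]; helper names are hypothetical.

```python
# Tensor powers of the stereo-matrices compared with Sylvester-type Hadamard matrices.
R4R = [[1, -1, 0, 0], [1, 1, 0, 0], [0, 0, 1, -1], [0, 0, 1, 1]]      # assumed reconstruction
R4L = [[0, 0, 1, -1], [0, 0, -1, -1], [1, -1, 0, 0], [-1, -1, 0, 0]]  # assumed reconstruction
H2 = [[1, 1], [1, -1]]  # the Hadamard matrix of order 2


def kron(a, b):
    """Kronecker (tensor) product of two numeric matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def tensor_power(m, k):
    """k-th Kronecker power of m for k >= 1."""
    out = m
    for _ in range(k - 1):
        out = kron(out, m)
    return out


def gram_scale(m):
    """Return c if m * m^T equals c times the identity matrix, otherwise None."""
    n = len(m)
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in m] for r1 in m]
    c = gram[0][0]
    ok = all(gram[i][j] == (c if i == j else 0) for i in range(n) for j in range(n))
    return c if ok else None


for k in (1, 2):
    right = tensor_power(R4R, k)
    left = tensor_power(R4L, k)
    print(k, len(right), gram_scale(right), gram_scale(left))

# For a Hadamard matrix of order n the same test returns n.
print(gram_scale(tensor_power(H2, 2)), gram_scale(tensor_power(H2, 3)))
```

For the stereo-matrices of order 4^k the returned factor is 2^k, whereas for a Hadamard matrix of order n it is n; this difference reflects the zero entries of the sparse matrices.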
Let us note additionally that the described decompositions of the matrix R4 of 16 doublets (Figs. 4, 6, 11) into sparse unitary matrices were not arbitrary: they were based on the objective binary-oppositional indicators of the nitrogenous bases C, A, G and T/U (Fig. 1) and on the logical operation of modulo-2 addition. One can check that these decompositions were constructed in line with the binary numbers in brackets inside the matrix cells (Fig. 4). In each cell, this binary number is equal to the modulo-2 sum of the binary numberings of the row and the column of the cell. For example, the cell with the triplet CAG is located in row 001 and column 011. The operation of modulo-2 addition gives their sum: 001 ⊕ 011 = 010; this binary number is shown in brackets in the cell with the triplet CAG. Such a numbering of cells in matrices whose rows and columns are numbered by means of dyadic groups of binary numbers is known in the theory of digital signal processing as "dyadic-shift numeration" [Ahmed, Rao, 1975; Harmuth, 1989; Petoukhov, He, 2009]. Matrices with such a numbering of their cells are called dyadic-shift matrices. One can see that the dyadic-shift numeration of the cells of the (4*4)-matrix R4 of 16 doublets (Fig. 4, left) divides the set of 16 doublets into 4 subsets with 4 doublets in each: the subset of cells with binary numbering 00 contains the doublets CC, CG, GC and GG; the subset of cells with numbering 01 contains CA, CT, GA and GT; the subset of cells with numbering 10 contains AC, AG, TC and TG; the subset of cells with numbering 11 contains AA, AT, TA and TT. Below we use similar dyadic-shift decompositions for the (8*8)-matrix R8 of 64 triplets from Fig. 5, with non-trivial results. As is known, if a system of elements demonstrates a connection with dyadic shifts, this indicates that its structural organization is related to the logic of modulo-2 addition. Correspondingly, the structural organization of the genetic system is related to the logic of modulo-2 addition.
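The dyadic-shift numeration of the doublet matrix can be reproduced with a small Python sketch. It assumes the Kronecker-power arrangement of [C, A; T, G] for the 16 doublets; the function names are hypothetical.

```python
# Dyadic-shift numeration: each cell (i, j) gets the number i XOR j (modulo-2 sum).
LETTERS = [["C", "A"], ["T", "G"]]


def kron_str(a, b):
    """Symbolic Kronecker product: entries are combined by string concatenation."""
    return [[x + y for x in row_a for y in row_b] for row_a in a for row_b in b]


def dyadic_classes(matrix):
    """Group the entries of a square matrix by the dyadic-shift number of their cell."""
    classes = {}
    for i, row in enumerate(matrix):
        for j, word in enumerate(row):
            classes.setdefault(i ^ j, []).append(word)
    return classes


doublets = kron_str(LETTERS, LETTERS)
classes = dyadic_classes(doublets)
for k in sorted(classes):
    print(format(k, "02b"), classes[k])

# The cell of the triplet CAG: row 001, column 011.
print(format(0b001 ^ 0b011, "03b"))
```

Each of the four classes contains exactly one cell in every row and every column, which is why the corresponding numeric parts of R4 are sparse matrices with a single non-zero entry per row and column.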
The rows and columns of the sparse matrices K0, K1, K2 and K3 correspond to complete orthogonal systems of functions. Below we will also meet other examples in which genetic unitary matrices with block structures are constructed as sums of simpler unitary matrices.
Fig. 16. Initial members of the tensor families of unitary matrices K0^(n), K1^(n), K2^(n) and K3^(n), which have fractal structures and complete orthogonal systems of functions in their rows and columns. Entries +1, -1 and 0 are marked by yellow, blue and green correspondingly.
One can note that the set of 8 unitary genetic matrices R80, R81, R82, R83, R84, R85, R86, R87 (Fig. 17) also contains the following algebraic complementarities in corresponding pairs of these matrices: the unitary matrices R80 and R87 form the first pair of algebraic complementarity since they are transformed into each other by mirror reflection relative to the vertical midline with inversion of the signs of their non-zero entries (the mirror-anti-symmetry). The same is true for the pairs of unitary matrices R81 and R86, R82 and R85, R83 and R84, which form the other pairs with a similar algebraic complementarity of the mirror-anti-symmetric type. The determinants of all these 8 unitary matrices are equal to 1; for this reason they belong to the type of so-called special unitary matrices. The special unitary matrices are closed under multiplication and the inverse operation. They form the special unitary group (https://en.wikipedia.org/wiki/Special_unitary_group). The multiplication table (Fig. 18) of this closed set of 8 unitary matrices coincides with the multiplication table of the algebra of bi-split-quaternions of Cockle.
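These properties of the eight sparse matrices can be checked with a Python sketch. It assumes that R8 equals the Kronecker product of the reconstructed R4 with the (2*2)-matrix of ones (the root of a triplet is its first two letters) and that part number k collects the cells whose row and column numbers give k under modulo-2 addition; the list index k is only assumed to match the article's labels R80, …, R87. Closure is tested up to sign.

```python
# Dyadic-shift decomposition of R8 into eight sparse matrices and their properties.
R4 = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, 1, -1], [-1, -1, 1, 1]]  # assumed reconstruction


def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def negate(m):
    return [[-x for x in row] for row in m]


def dyadic_parts(m):
    """Part k keeps the entries of m in cells (i, j) with i XOR j == k."""
    n = len(m)
    return [[[m[i][j] if i ^ j == k else 0 for j in range(n)] for i in range(n)]
            for k in range(n)]


def det_signed_perm(m):
    """Determinant of a matrix with exactly one entry +1 or -1 in each row and column."""
    perm = [next(j for j, x in enumerate(row) if x != 0) for row in m]
    sign = 1
    for i, row in enumerate(m):
        sign *= row[perm[i]]
    seen = [False] * len(m)
    for start in range(len(m)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length and length % 2 == 0:  # a cycle of even length is an odd permutation
            sign = -sign
    return sign


R8 = kron(R4, [[1, 1], [1, 1]])
parts = dyadic_parts(R8)
I8 = [[1 if i == j else 0 for j in range(8)] for i in range(8)]

orthogonal = all(matmul(p, transpose(p)) == I8 for p in parts)
mirror_pairs = all(negate([row[::-1] for row in parts[k]]) == parts[7 - k] for k in range(8))
dets = [det_signed_perm(p) for p in parts]
closed = all(matmul(parts[a], parts[b]) in (parts[a ^ b], negate(parts[a ^ b]))
             for a in range(8) for b in range(8))
print(orthogonal, mirror_pairs, dets, closed)
```

In this indexing the mirror-anti-symmetric partner of part k is part 7 - k, and the product of parts a and b is part a XOR b up to sign.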
One should note that these two subsets of 8 unitary genetic matrices have some relations to evolutionary changes of the genetic code. Modern science knows more than 20 variants (or dialects) of the genetic code, which are listed on the website http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi. As this website shows, these dialects differ from the Standard Code only by the code meanings of a small number of triplets. For example, in the Standard Code the triplet TAG serves as a stop-codon, but in the Chlorophycean Mitochondrial Code it encodes the amino acid Leu. One can check that in all dialects without exception all stop-codons have the first letter A or T; practically all initiation codons also begin with the letters A or T. Besides this, practically all triplets with a changed coding meaning are triplets with the first letter A or T; such triplets belong to the two quadrants along the second diagonal of the matrix in Fig. 4. All entries of these two quadrants in the corresponding numeric matrix in Fig. 5 belong to the second subset of symmetric unitary matrices R84, R85, R86 and R87 (Fig. 17). A small exception is the case of two genetic codes of yeast (the Yeast Mitochondrial Code and the Alternative Yeast Nuclear Code), in which the triplets CUU, CUC, CUA and CUG, having the first letter C, change their coding meaning: in the Standard Code, all these 4 triplets encode the amino acid Leu, but in the Yeast Mitochondrial Code they encode the amino acid Thr. These 4 triplets having strong roots are located in the upper quadrant of the matrix of triplets in Fig. 4, and their representations by entries +1 in the numeric matrix R8 belong to the unitary matrices R82 and R83 (Fig. 17).

Connections among amino acids and triplets from the standpoint of unitary genetic matrices
Our results about connections of the genetic (8*8)-matrix R8 (Fig. 5) with unitary matrices give additional approaches to studying symmetric relations between the set of 64 triplets and the set of 20 amino acids and stop-codons encoded by triplets. Fig. 19 reproduces the symbolic matrix of 64 triplets from Fig. 4 but with the additional indication of amino acids and stop-codons in the Vertebrate Mitochondrial Code; this dialect is the most symmetrical among all dialects of the genetic code in line with known data of the website http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi. One can see in Figs. 4, 5 and 18 that the genetic symbolic matrix of 64 triplets and its corresponding numeric representation R8 (Fig. 5) have pairs of adjacent rows 0 and 1, 2 and 3, 4 and 5, 6 and 7, whose mosaics are identical inside each of the pairs. Moreover, both rows inside each of these pairs have the identical list of amino acids and stop-codons marked by identical colors (Fig. 19).

Fig. 19. The correspondence among 64 triplets and 20 amino acids and stop-codons in the case of the Vertebrate Mitochondrial Code. Both rows inside each of the pairs of adjacent rows 0-1, 2-3, 4-5, 6-7 with their even-odd numberings have identical black-and-white mosaics (reproduced in the numeric matrix R8 in Fig. 5) and also identical lists of amino acids and stop-codons (marked with the same colors).
The described decomposition (Fig. 17) of the numeric matrix R8 of triplets into sums of unitary matrices is accompanied by a separation of the complete set of 64 triplets into corresponding subsets of triplets. This separation of the set of 64 triplets is accompanied by a relevant separation of the complete set of 20 amino acids and stop-codons into appropriate subsets, taking into account the code meaning of each of the triplets. For example, the matrix R8 can be represented in the following form using the decomposition in Fig. 17:
$R8 = (R80 + R82) + (R81 + R83) + (R84 + R86) + (R85 + R87)$
Here each of the 4 expressions in brackets, taken with the factor $2^{-0.5}$, is a unitary matrix:
$[2^{-0.5}(R80 + R82)] \cdot [2^{-0.5}(R80 + R82)]^T = I8$, and similarly for $(R81 + R83)$, $(R84 + R86)$ and $(R85 + R87)$,
where I8 is the identity matrix of order 8. Fig. 20 shows these 4 inter-complementary matrices (R80+R82), (R81+R83), (R84+R86) and (R85+R87), whose sum is equal to the matrix R8 from Fig. 5.
Rows and columns of each of the 4 unitary matrices $2^{-0.5}$*(R80+R82), $2^{-0.5}$*(R81+R83), $2^{-0.5}$*(R84+R86) and $2^{-0.5}$*(R85+R87) represent complete systems of orthogonal functions. For this reason, these 4 matrices can be used for a decomposition of an arbitrary 8-dimensional vector $\bar{x}$ = [x0, x1, x2, x3, x4, x5, x6, x7] on the basis of the orthogonal functions of each of these orthogonal systems. In this way, an 8-dimensional vector $\bar{x}$ receives, in the general case, different spectral representations. The multiplication of any of these spectral representations of the vector $\bar{x}$ by the corresponding transposed unitary matrix restores the initial vector $\bar{x}$. Such spectral representations of vector-signals can be used for noise-immune coding of information by analogy with the known noise-immune coding on the basis of Hadamard matrices in digital signal processing.
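The restoration of a vector from its spectral representation can be sketched in Python on a small stand-in example. The matrices A and B below are an assumption made only for illustration: they are two sparse (2*2) orthogonal matrices whose sum, taken with the factor 2^(-0.5), is again orthogonal; they are not the matrices R80, ..., R87 of Fig. 17, and the helper names are hypothetical.

```python
import math

# Stand-in example (an assumption): two sparse 2x2 orthogonal matrices whose
# sum, scaled by 2**-0.5, is again orthogonal. These are NOT R80...R87.
A = [[1, 0], [0, -1]]
B = [[0, 1], [1, 0]]
SCALE = 2 ** -0.5
M = [[SCALE * (A[i][j] + B[i][j]) for j in range(2)] for i in range(2)]


def transpose(m):
    """Transposed copy of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def vec_times_matrix(x, m):
    """Row vector x multiplied by matrix m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


x = [3.0, 5.0]
spectrum = vec_times_matrix(x, M)                     # spectral representation of x
restored = vec_times_matrix(spectrum, transpose(M))   # multiply by the transposed matrix

print(all(math.isclose(a, b) for a, b in zip(x, restored)))  # prints True
```

The same two function calls apply unchanged to an 8-dimensional vector and any of the four (8*8) unitary matrices once their entries are typed in from Fig. 20.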
The 32 triplets in both matrices R80+R82 and R81+R83 begin with the letters C or G and define the content of the two quadrants along the main diagonal of the symbolic matrix in Figs. 2, 19. The other 32 triplets in both matrices R84+R86 and R85+R87 begin with the letters A or T and define the content of the two quadrants along the secondary diagonal of the same matrix. It was mentioned above that in all dialects of the genetic code without exception all stop-codons have the first letter A or T; practically all initiation codons also begin with the letters A or T (so they are connected with the unitary matrices $2^{-0.5}$*(R84+R86) and $2^{-0.5}$*(R85+R87)). Besides this, practically all triplets with a changed coding meaning in different dialects are also triplets with the first letter A or T, and they are connected with the same unitary matrices.

On unitary genetic matrices and the law of frequencies of hydrogen bonds in long DNA sequences of complete sets of nuclear chromosomes
In this Section we will study hidden regularities in long sequences of single-stranded DNA, which are usually studied by some authors in relation to the second Chargaff's parity rule (see, for example, [Chargaff, 1971, 1975; Prahbu, 1993; Rapoport, Trifonov, 2012; https://en.wikipedia.org/wiki/Chargaff%27s_rules]). The term "long DNA sequences" means that each of the sequences contains no fewer than 100000 nucleotides. We will consider any long DNA sequence as a chain of the numbers 2 and 3 of hydrogen bonds, which connect the complementary nucleotides A-T and C-G correspondingly (one can denote conditionally A=T=2 and C=G=3). We call such chains of numbers of hydrogen bonds "hydrogen bonds sequences" or briefly "H-sequences".
Below we describe our discovery of hidden regularities in H-sequences of long DNA and in H-sequences of complete sets of chromosomes of different organisms, which are connected with members of H-n-plets. These regularities concern approximate invariants of frequencies (or probabilities) of members of the alphabets of H-n-plets in long DNA H-sequences. It is obvious that each member of the alphabets of H-n-plets combines a certain set of n-plets of nucleotides: for example, the H-doublet 33 combines the 4 doublets of nucleotides CC, CG, GC, GG; the H-triplet 332 combines the 8 triplets of nucleotides CCA, CCT, CGA, CGT, GCA, GCT, GGA, GGT (see Fig. 22); etc. Fig. 22 shows the genetic matrices from Fig. 4 with a representation of the 16 doublets and 64 triplets of nucleotides in the form of the appropriate H-doublets and H-triplets. By a comparison of Fig. 22 with Figs. 5, 6 and 17, one can see the following strong correspondence between the genetic unitary matrices (Figs. 6 and 17) and the genetic matrices showing dispositions of H-doublets and H-triplets (Fig. 22): the disposition of each kind of H-doublets and H-triplets in cells of the matrices in Fig. 22 coincides precisely with the disposition of non-zero entries in cells of the corresponding sparse unitary matrices in Figs. 6 and 17. For example, the disposition of the H-doublet 32 in cells of the matrix of doublets in Fig. 22 coincides with the disposition of non-zero entries in cells of the sparse unitary matrix R14 in Fig. 6; the disposition of the H-triplet 222 in cells of the matrix of triplets in Fig. 22 coincides with the disposition of non-zero entries in cells of the sparse unitary matrix R87 in Fig. 17, etc. Correspondingly, each of the sparse unitary matrices in Figs. 6 and 17 can be defined by dispositions of the appropriate H-doublets or H-triplets in cells of the matrices in Fig. 22 (taking into account additionally the colors of doublets and triplets in Fig. 22 to define the signs "plus" or "minus" of entries of these unitary matrices). This connection between the genetic unitary matrices and matrix dispositions of H-doublets and H-triplets in Fig. 22 stimulated us to study frequencies (or probabilities) of H-n-plets in long DNA sequences. Results of the study are described below. As far as we can judge, the frequencies of n-plets of hydrogen bonds 3 and 2 in long DNA sequences have not been studied by anyone before. So, it is a new field for research.
We have studied frequencies of all members of each of the H-n-alphabets (for the cases n = 1, 2, 3, 4, 5) in different H-n-plet representations of many long DNA sequences, including DNA sequences of complete sets of chromosomes of some organisms. This study has led to interesting evidence in favor of the existence of general regularities (principles or laws) concerning frequencies of members of H-n-alphabets in long DNA sequences.
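The counting of frequencies of H-n-plets can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's program: the function names are hypothetical, the sequence is read as non-overlapping n-plets starting from its first position, an incomplete tail is dropped, and symbols other than A, C, G, T are skipped.

```python
from collections import Counter

# Numbers of hydrogen bonds: A=T=2, C=G=3, as stated in the text.
H_BONDS = {"A": "2", "T": "2", "C": "3", "G": "3"}


def h_sequence(dna):
    """Convert a nucleotide string into its H-sequence of digits 3 and 2.

    Assumption: symbols other than A, C, G, T (for example N) are skipped.
    """
    return "".join(H_BONDS[base] for base in dna.upper() if base in H_BONDS)


def h_nplet_frequencies(dna, n):
    """Frequencies of H-n-plets, reading non-overlapping n-plets from the start.

    Assumption: the reading frame and the non-overlapping split are choices
    made for this sketch; an incomplete tail shorter than n is dropped.
    """
    h = h_sequence(dna)
    usable = len(h) - len(h) % n
    plets = [h[i:i + n] for i in range(0, usable, n)]
    counts = Counter(plets)
    return {plet: count / len(plets) for plet, count in counts.items()}


print(h_sequence("CCATGGAT"))              # prints 33223322
print(h_nplet_frequencies("CCATGGAT", 2))  # prints {'33': 0.5, '22': 0.5}
```

For a real chromosome the same functions would be applied to the nucleotide string read from a GenBank or FASTA file; reading such files is outside this sketch.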
As one of our results, Fig. 23 shows frequencies of members of the H-n-alphabets (n = 1, 2, 3, 4, 5) in DNA sequences of all 5 chromosomes of the plant Arabidopsis thaliana; initial data about the DNA sequences of these chromosomes were taken from the GenBank: https://www.ncbi.nlm.nih.gov/genome/4. The lengths of these DNA sequences are the following: 30427671 bp in the chromosome #1; 19698289 bp in the chromosome #2; 23459830 bp in the chromosome #3; 18585056 bp in the chromosome #4; 26975502 bp in the chromosome #5. Members of all these H-n-alphabets in Fig. 23 are given in a certain arrangement corresponding to the arrangement of coordinates in the tensor family of vectors [3, 2]^(n), where (n) means the tensor power. For example, the vector [3, 2]^(2) = [33, 32, 23, 22] has the arrangement of its coordinates 33, 32, 23, 22, which is used in Fig. 23 for the appropriate graph in the case of the alphabet of H-doublets. Similar correspondences hold true for the vectors [3, 2]^(3), [3, 2]^(4), [3, 2]^(5) and the appropriate graphs of frequencies of members of the other H-n-alphabets in Fig. 23.
Fig. 23. Frequencies of members of the H-n-alphabets (n = 1, 2, 3, 4, 5) in DNA sequences of all 5 chromosomes of the plant Arabidopsis thaliana. The left column contains the sequence numbers of the chromosomes. Each of the points in the graphs shows the frequency of an appropriate member of the H-n-alphabets. For each of the chromosomes, its frequencies q and p of the H-monoplets 3 and 2 correspondingly are shown in the second column. See explanations in the text.
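The arrangement of members of an H-n-alphabet as coordinates of the tensor power of the vector [3, 2] can be generated with a short Python sketch. The function name is hypothetical; the sketch relies on the fact that the Cartesian product of the symbols "3" and "2", with the last position varying fastest, gives the same order as the tensor power.

```python
from itertools import product


def h_alphabet(n):
    """Members of the H-n-alphabet in the order of coordinates of the n-th tensor power of [3, 2]."""
    return ["".join(member) for member in product("32", repeat=n)]


print(h_alphabet(2))       # prints ['33', '32', '23', '22']
print(len(h_alphabet(5)))  # prints 32
```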
In addition to the graphs in Fig. 23, Table 1 shows numeric values of frequencies of all members of these H-n-alphabets for all 5 chromosomes and for n = 1, 2, 3, 4 (the case n = 5 is not shown because its numeric table is too large). For statistical analysis of these phenomenological data from all chromosomes we use Excel with denotations and formulas from the website "Real statistics using Excel" (http://www.real-statistics.com/descriptive-statistics/measures-variability/).
Table 1. Frequencies of all members of the H-n-alphabets (n = 1, 2, 3, 4) in DNA sequences of all 5 chromosomes of Arabidopsis thaliana. Symbols q and p denote frequencies of the H-monoplets 3 and 2 correspondingly and are shown in the left column separately for each chromosome. The frequency values are rounded to the sixth decimal place. Traditional statistical symbols are also shown: $\bar{X}$ denotes the mean of the sample S = {x1, x2, …, xn}; s is the sample standard deviation; V is the coefficient of variation (see their formulas in http://www.real-statistics.com/descriptive-statistics/measures-variability/).
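The three statistical measures named in the caption of Table 1 can be computed with the Python standard library, as in the following sketch. As sample data it uses the three values of q that are quoted later in this text alongside Fig. 24, apparently for the chromosomes 3, 4 and 5 (decimal commas are replaced by points); this choice of data and the variable names are assumptions of the sketch, and the values for the chromosomes 1 and 2 are not used.

```python
import statistics

# Values of q quoted later in this text, apparently for chromosomes 3, 4 and 5
# (an assumption about which rows they belong to).
q_values = [0.36331075, 0.36204022, 0.35938926]

mean = statistics.mean(q_values)
s = statistics.stdev(q_values)   # sample standard deviation (n - 1 in the denominator)
v = s / mean                     # coefficient of variation

print(f"mean={mean:.6f}, s={s:.6f}, V={v:.6f}")
```

A small coefficient of variation across chromosomes is what the statement "approximately the same value in all chromosomes" means in numeric terms.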
The data from Table 1, in which all 5 chromosomes have approximately the same value q of the frequency of the H-monoplet 3 and also approximately the same value p of the frequency of the H-monoplet 2, recall the following known data: the article [Okamura, Wei, Scherer, 2007] shows an approximate equality of the frequencies of each of the nucleotides A, T, C and G in all human nuclear chromosomes to within a few percent. Correspondingly, the summary frequency of the nucleotides A and T (and also C and G), which is directly connected with the frequency of the H-monoplet 2 (the H-monoplet 3), has approximately the same value in all human nuclear chromosomes.
Similar results (with some other levels of accuracy) have been obtained in our analysis of the following complete sets of nuclear chromosomes on the basis of initial data about their DNA sequences from the GenBank: a nematode Caenorhabditis elegans, fruit fly Drosophila melanogaster, house mouse Mus musculus, Homo sapiens.
These results allow assuming the existence of the following general rule for eukaryotes about frequencies of hydrogen bonds in complete sets of nuclear chromosomes:
• in complete sets of nuclear chromosomes of different organisms, the frequency of any member of the H-n-alphabets has approximately the same value in all chromosomes.
Of course, further research is needed to define the degree of universality and precision of this rule. This rule has mathematical analogies with the main law of population genetics, the Hardy-Weinberg law, as described in the next Section.

The mathematical model of frequencies of hydrogen bonds in long DNA sequences
Our conception of the quantum-algorithmic genetics (or the quantum-computing genetics) [Petoukhov, 2017a;Petoukhov, Svirin, 2018] has led the author to the useful mathematical model not only for phenomenologic data presented in Fig. 23 but also for similar data in many other cases of long DNA sequences studied in our researches.In the end of this Section we describe this model using the notion of qubits and also some classical formalisms from quantum informatics [Nielsen, Chuang, 2010].But firstly we describe our model approach by means of a simpler language.
In our proposed approach, phenomenologic frequencies of all members of the H-n-alphabets [3, 2]^(n) in a long DNA sequence under fixed n are modeled by numeric values of the appropriate coordinates of the 2^n-dimensional vector from the tensor family of frequency vectors [q, p]^(n), where q and p mean frequencies of the H-monoplet 3 and the H-monoplet 2 correspondingly (these frequencies q and p are shown for each of the chromosomes in the second column in Fig. 23). Correspondingly, in this model approach, to get model values of frequencies of all 2^n members of the H-n-alphabet in a considered long DNA, one should calculate the following 2^n coordinates of frequency vectors from their tensor family:
- in the case of the H-doublets-alphabet, 4 coordinates of the vector [q, p]^(2) = [qq, qp, pq, pp];
- in the case of the H-triplets-alphabet, 8 coordinates of the vector [q, p]^(3) = [qqq, qqp, qpq, qpp, pqq, pqp, ppq, ppp];
- in the case of the H-tetraplets-alphabet, 16 coordinates of the vector [q, p]^(4) = [qqqq, qqqp, qqpq, qqpp, qpqq, qpqp, qppq, qppp, pqqq, pqqp, pqpq, pqpp, ppqq, ppqp, pppq, pppp];
- in the case of the H-pentaplets-alphabet, 32 coordinates of the vector [q, p]^(5) = [qqqqq, qqqqp, … , ppppp], etc.
Briefly speaking, in line with the proposed model approach, for a prediction of approximate values of frequencies «f» of many members of H-n-alphabets in a long DNA sequence on the basis of knowledge about only two frequencies q and p, it is enough to do the following:
• calculate the frequencies q and p of the H-monoplets 3 and 2 in the sequence;
• replace the digits 3 and 2 in the H-symbols of these H-alphabetic members by their frequencies q and p as factors;
• calculate the product of these factors.
The expression (14) shows an example of such a calculation of the frequency f(332) for the member 332 of the H-triplets-alphabet under q=0,36 and p=0,64 (q+p=1):
f(332) = qqp = 0,36*0,36*0,64 = 0,0829 (14)
Fig. 24 shows close correspondences between these model values (red points) of components of the frequency vectors [q, p]^(n) and the phenomenological values (blue points) of frequencies of all members of the H-n-alphabets (n = 2, 3, 4, 5). One can see in these graphs that the model points of red color turned out to be almost exactly superimposed on the phenomenological points of blue color. This gives evidence that, knowing only the two probabilities q and p of the H-monoplets 3 and 2 in a long DNA sequence, one can predict dozens of frequency values of all members of the H-n-alphabets for this DNA sequence with a good level of accuracy. We presume that a similar model correspondence holds true also for n = 6, 7, 8, ... (if n is not too large), but this should be studied in future research.
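The three-step recipe and the expression (14) can be sketched in Python as follows. The function name is hypothetical; the sketch assumes q + p = 1 and uses a decimal point where the text uses a decimal comma.

```python
from itertools import product
from math import prod


def model_frequencies(q, n):
    """Coordinates of the n-th tensor power of [q, p], keyed by members of the H-n-alphabet."""
    p = 1.0 - q
    return {
        "".join(member): prod(q if digit == "3" else p for digit in member)
        for member in product("32", repeat=n)
    }


model = model_frequencies(0.36, 3)
print(round(model["332"], 4))           # prints 0.0829
print(round(sum(model.values()), 10))   # prints 1.0
```

Comparing such a dictionary with frequencies counted in a real sequence is then a key-by-key subtraction; which measure of closeness to use is left open here.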
[Fig. 24: only part of the panel labels survives: chromosome 3, q=0,36331075, p=0,63668925; chromosome 4, q=0,36204022, p=0,63795978; chromosome 5, q=0,35938926, p=0,64061074.]
In addition to the graphs in Fig. 24, Table 2 shows numeric values of coordinates of the vectors [q, p]^(n) for all 5 chromosomes and for n = 1, 2, 3, 4 (the case n = 5 is not shown because its numeric table takes much space and can be created by analogy). By a comparison of Tables 1 and 2, one can see that the proposed model approach reproduces the phenomenological frequencies closely.
Table 2. Values of coordinates of the vectors [q, p]^(n), which are used as models of values of frequencies of all members of the H-n-alphabets (n = 1, 2, 3, 4) in DNA sequences of all 5 chromosomes of Arabidopsis thaliana. Symbols q and p denote frequencies of the H-monoplets 3 and 2 correspondingly and are shown in the left column separately for each chromosome. The model values are rounded to the sixth decimal place. Traditional statistical symbols are also shown: $\bar{X}$ denotes the mean of the sample S = {x1, x2, …, xn}; s is the sample standard deviation; V is the coefficient of variation (see their formulas in http://www.real-statistics.com/descriptive-statistics/measures-variability/).

Amounts of identical values of coordinates in vectors
The Hardy-Weinberg law uses the Hardy-Weinberg equation (15) to examine a simple genetic locus at which there are two alleles, A and a:
$q^2 + 2qp + p^2 = 1$ (15)
where q is the frequency of the «a» allele and p is the frequency of the «A» allele in the population. In the equation (15), q^2 represents the frequency of the homozygous genotype aa, p^2 represents the frequency of the homozygous genotype AA, and 2qp represents the frequency of the heterozygous genotype Aa. The sum of the allele frequencies for all the alleles at the locus must be 1, so p + q = 1. If the p and q allele frequencies are known, then the frequencies of the three genotypes may be calculated using the Hardy-Weinberg equation (https://www.nature.com/scitable/definition/hardy-weinbergequation-299). In population genetics studies, the Hardy-Weinberg equation (15) is used in particular to measure whether the observed genotype frequencies in a population differ from the frequencies predicted by the equation. Now we should describe that in our analysis of the tensor families of H-n-alphabets [3, 2]^(n) and frequency vectors [q, p]^(n), we have obtained, in some sense, a similar equation (16) and a similar rule for frequencies of members of H-n-alphabets, although an obvious great difference exists between genotype frequencies in population genetics and frequencies of members of H-n-alphabets in long DNA sequences. As was mentioned above, triplets with strong and weak roots have different properties concerning the degeneracy of the genetic code (Fig. 4). We will also distinguish in each of the H-n-alphabets [3, 2]^(n) its members on the basis of their roots, that is, on the basis of their first two H-monoplets:
- members with a root composed of identical H-monoplets will be called homo-rooted members of the H-n-alphabet by analogy with the term "homozygous genotype" (for example, the H-tetraplets 3323, 3322, 3332, 3333 belong to homo-rooted members relative to the H-monoplet 3; the H-tetraplets 2233, 2232, 2223, 2222 belong to homo-rooted members relative to the H-monoplet 2);
- members with a root composed of different H-monoplets will be called hetero-rooted members of the H-n-alphabet by analogy with the term "heterozygous genotype" (for example, the H-tetraplets 3233, 3222, 2333, 2322 belong to hetero-rooted members of the H-4-alphabet represented by components of the vector [3, 2]^(4)).
Let us calculate for each of the H-n-alphabetic vectors [3, 2]^(n) (n = 2, 3, 4, …) the sums of frequencies of its homo-rooted members and separately the sum of frequencies of its hetero-rooted members, taking into account that frequencies of all these members are represented by the appropriate components of the frequency vectors [q, p]^(n), where q+p=1. Here q means the frequency of the H-monoplet 3 and p means the frequency of the H-monoplet 2. The calculations give the following results.
In the case of the H-doublets-alphabet, its 4 members are components of the vector [3, 2]^(2) = [33, 32, 23, 22] and their frequencies are represented by the appropriate components of the frequency vector [q, p]^(2) = [qq, qp, pq, pp]. Correspondingly, the frequency of the homo-rooted member 33 is equal to q^2; the frequency of the homo-rooted member 22 is equal to p^2; and the sum of frequencies of the hetero-rooted members 32 and 23 is equal to 2qp. The total sum of all these frequencies should be equal to 1. This is expressed by the equation (16), which is completely analogous to the Hardy-Weinberg equation (15):
$q^2 + 2qp + p^2 = 1$ (16)
In the case of the H-triplets-alphabet, its 8 members are components of the vector [3, 2]^(3) = [333, 332, 323, 322, 233, 232, 223, 222] and their frequencies are represented by the appropriate components of the frequency vector [q, p]^(3) = [qqq, qqp, qpq, qpp, pqq, pqp, ppq, ppp]. In this case, the sum of frequencies of the homo-rooted members 333 and 332 is equal to q^2 again, since (qqq + qqp) = qq*(q+p) = q^2; the sum of frequencies of the homo-rooted members 223 and 222 is equal to p^2 again, since (ppq + ppp) = pp*(q+p) = p^2; and the sum of frequencies of all hetero-rooted members 323, 322, 233, 232 is equal to 2qp again, since (qpq + qpp + pqq + pqp) = qp*(q+p) + pq*(q+p) = 2qp (taking into account that q+p=1). So, in the case of the H-triplets-alphabet the equation (16) is also true.
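The grouping by roots can also be checked numerically for larger n. The following Python sketch (with hypothetical names, and with q = 0.36 taken only as an example value) sums the model frequencies of homo-rooted and hetero-rooted members of the vector [q, p]^(n) and compares the three sums with q^2, p^2 and 2qp.

```python
from itertools import product
from math import prod, isclose


def rooted_sums(q, n):
    """Summed model frequencies of members of the H-n-alphabet, grouped by their root.

    The root is the first two digits; frequencies are products of q (digit 3)
    and p = 1 - q (digit 2), as in the model of the text.
    """
    p = 1.0 - q
    sums = {"homo-3": 0.0, "homo-2": 0.0, "hetero": 0.0}
    for member in product("32", repeat=n):
        freq = prod(q if digit == "3" else p for digit in member)
        root = member[:2]
        if root == ("3", "3"):
            sums["homo-3"] += freq
        elif root == ("2", "2"):
            sums["homo-2"] += freq
        else:
            sums["hetero"] += freq
    return sums


q = 0.36
p = 1.0 - q
for n in range(2, 7):
    s = rooted_sums(q, n)
    assert isclose(s["homo-3"], q * q)
    assert isclose(s["homo-2"], p * p)
    assert isclose(s["hetero"], 2 * q * p)
print("the three sums equal q^2, p^2 and 2qp for n = 2..6")
```

The check succeeds for every n because the factors after the root always sum to a power of (q+p), which is 1.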
By analogy one can check that in the other cases of H-n-alphabets with n = 4, 5, 6, … the equation (16) is also true. For all these cases, in the equation (16) the summand q^2 means the total frequency of all homo-rooted members relative to the H-monoplet 3; the summand p^2 means the total frequency of all homo-rooted members relative to the H-monoplet 2; the summand 2qp means the total frequency of all hetero-rooted members in any H-n-alphabet (n = 2, 3, 4, …). On this basis, we propose the following general and idealized rule in the frame of our model approach (by analogy with the formulation of the Hardy-Weinberg law):
• Frequencies of homo-rooted and hetero-rooted members of H-n-alphabets (n = 2, 3, 4, …) in any long DNA sequence will remain constant from one H-n-alphabet to another H-n-alphabet in the absence of special evolutionary influences.
But what can one say about these "special evolutionary influences"? Briefly speaking, from the standpoint of quantum informatics (see the last part of this Section), these are the influences that disturb the so-called separable pure states of long DNA sequences and transform them into entangled states. As known, "the state space of a composite physical system is the tensor product of the state spaces of the component physical systems. Moreover, if we have systems numbered 1 through n, and system number i is prepared in the state |ψi>, then the joint state of the total system is |ψ1> ⨂ |ψ2> ⨂ … ⨂ |ψn>" [Nielsen, Chuang, 2010, p. 102]. If a quantum state can be represented as a vector of a Hilbert space Hi, such a state is called a pure quantum state. If a pure state |ψ> ∈ H1⨂H2 can be written in the form |ψ> = |ψ1> ⨂ |ψ2>, where |ψi> is a pure state of the i-th subsystem, it is said to be separable. Otherwise it is called entangled. In our model approach, we represent long DNA sequences as quantum systems in separable pure states. Any discrepancies between real and model values of frequencies (or probabilities) of members of H-n-alphabets in long DNA sequences can be attributed to the violation of separable pure states of the considered model systems due to their transition to entangled states under different evolutionary influences such as mutations, genetic drift, etc.
We believe that expressions of the sort (14) and the equation (16) could play a significant role in the structural analysis of long DNA sequences and their evolution, by some analogy with the important role of the Hardy-Weinberg equation (15) in population genetics and evolutionary biology. Of course, the degree of universality of this rule about frequencies of members of H-n-alphabets in long DNA sequences should be tested and defined in future research. By some analogy with the Hardy-Weinberg equation (15), expressions of the sort (14) and the equation (16) can be used in particular to measure whether the observed frequencies of members of H-n-alphabets in long DNA sequences differ from the frequencies predicted by them. One can suppose that the genetic phenomena described by the Hardy-Weinberg law have an evolutionary connection with the described phenomena of long DNA sequences of hydrogen bonds 3 and 2.
We obtained the same results initially on the basis of applications of notions and formalisms of quantum computers to the analysis of long DNA sequences. First of all, we use the notion of n-qubit systems and the algebraic operation of the tensor product of matrices and vectors. As we already noted above, the tensor product is the crucial operation for understanding the quantum mechanics of multiparticle systems [Nielsen, Chuang, 2010, p. 71] and is one of the basic instruments in quantum informatics.
As known, a quantum bit (or qubit) is a unit of quantum information. For two-level quantum systems used as qubits, the state |0> is identified with the vector (1, 0), and similarly the state |1> with the vector (0, 1). Two possible states for a qubit are the states |0> and |1>, which correspond to the states 0 and 1 of a classical bit. The difference between bits and qubits is that a qubit can be in a state other than |0> or |1>. It is possible to form linear combinations of states, often called superpositions (17):
$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ (17)
Here the symbol |ψ> means a state of a qubit. The numbers α and β, whose squared moduli are the probabilities q = |α|^2 and p = |β|^2, can be complex numbers, but in our case it is enough to think of them as the real numbers α = √q and β = √p. Put another way, the state of a qubit is a vector in a two-dimensional vector space. The standard notation for states in quantum mechanics is the Dirac notation "| >". The special states |0> and |1> are known as computational basis states, and form an orthonormal basis for this vector space [Nielsen, Chuang, 2010, p. 13]. As known, we cannot examine a qubit to determine its quantum state, that is, the values of α and β. Instead, quantum mechanics tells us that we can only acquire much more restricted information about the quantum state. When we measure a qubit we get either the result 0, with probability q, or the result 1, with probability p. Naturally, the probabilities must sum to one (18):
$q + p = |\alpha|^2 + |\beta|^2 = 1$ (18)
Geometrically, we can interpret this as the condition that the qubit's state be normalized to length 1. The values α and β are called amplitudes of the probabilities q and p (the terms "probabilities" and "frequencies" are synonyms in this article). Thus, in general a qubit's state is a unit vector in a two-dimensional complex vector space. Let us emphasize again that when a qubit is measured, it only ever gives "0" or "1" as the measurement result, probabilistically.
In a more general case, a system of "n" qubits is considered in quantum informatics. The computational basis states of this system are written in the form |x1x2…xn>; a quantum state of such a system is specified by 2^n amplitudes [Nielsen, Chuang, 2010, p. 17]. In our model approach we interpret different H-n-representations (described below) of DNA texts as quantum systems of n qubits.
In technical devices of quantum informatics, a qubit can be represented by many ways on the basis of different pairs of binary-oppositional indicators: for example, by two electronic levels in an atom; by two kinds of polarization of a single photon (vertical polarization and horizontal polarization), etc.
In our model approach for genetic informatics, we represent a qubit as a two-level quantum system: in this quantum system of a long DNA sequence one level corresponds to the indicator «3 hydrogen bonds» and the second level corresponds to the oppositional indicator «2 hydrogen bonds». In other words, such a «genetic» qubit is represented by these oppositional indicators, and the state of such a qubit is a vector in its appropriate two-dimensional Hilbert space H. One can assume that the state |0> corresponds to the state «3 hydrogen bonds», and the state |1> to the state «2 hydrogen bonds». In this case, the state of a long DNA sequence as a quantum system in the form of a sequence of the numbers 3 and 2 of hydrogen bonds is described by the expression (17), where q and p are the probabilities (or frequencies) of its nucleotides with 3 or 2 hydrogen bonds correspondingly.
The tensor product H⨂H of the two-dimensional Hilbert space H with itself gives a 4-dimensional Hilbert space with the following separable pure state of a quantum 2-qubit system (19), written here for real amplitudes √q and √p:
$|\psi\rangle^{(2)} = (\sqrt{q}|0\rangle + \sqrt{p}|1\rangle) \otimes (\sqrt{q}|0\rangle + \sqrt{p}|1\rangle) = \sqrt{qq}|00\rangle + \sqrt{qp}|01\rangle + \sqrt{pq}|10\rangle + \sqrt{pp}|11\rangle$ (19)
Such a 2-qubit system (19), which is generated by the second tensor power |ψ>^(2) of a state |ψ> of a qubit (17), has 4 computational basis states denoted |00>, |01>, |10>, |11>. The probabilities (or frequencies) of these states satisfy the normalization condition (20):
$qq + qp + pq + pp = 1$ (20)
One can see the following strong correspondence between the denotations of the computational basis states |00>, |01>, |10>, |11> and the appropriate values of their probabilities qq, qp, pq, pp: in any of the denotations of these computational basis states, the replacement of the symbols 0 and 1 by the values q and p correspondingly gives the value of the probability of this state. For example, such a replacement in the denotation of the state |01> gives its probability qp. This strong correspondence between the denotations of computational basis states and the values of their probabilities as products of the probabilities q and p holds true in the other cases of the tensor powers |ψ>^(n) of a state |ψ> of a qubit (17).
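The second tensor power of a qubit state and the probabilities of its computational basis states can be sketched in Python. The sketch assumes real amplitudes √q and √p and takes q = 0.36, p = 0.64 only as example values; the helper name `kron_vec` is hypothetical.

```python
import math


def kron_vec(a, b):
    """Tensor (Kronecker) product of two coordinate vectors."""
    return [x * y for x in a for y in b]


q, p = 0.36, 0.64
psi = [math.sqrt(q), math.sqrt(p)]     # amplitudes of |0> and |1> (real, by assumption)
psi2 = kron_vec(psi, psi)              # amplitudes of |00>, |01>, |10>, |11>

probabilities = [a * a for a in psi2]  # squared amplitudes
labels = ["00", "01", "10", "11"]
for label, prob in zip(labels, probabilities):
    print(label, round(prob, 4))
```

Replacing 0 by q and 1 by p in each label reproduces the printed probabilities, and applying `kron_vec` repeatedly gives the amplitudes of higher tensor powers.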

Unitary genetic matrices, frequencies of hydrogen bonds and hypercomplex numbers
Above we have described some evidences in favor that structurisation of long DNA texts is connected with frequencies of members of H-n-alphabets in the texts.Now let us pay our attention initially to the H-doublets-alphabet with its 4 members 33, 32, 23, 22. Frequencies of these members in a real DNA sequence will be denoted by symbols 0, 1, 2, and 3 correspondingly.Fig. 22 shows locations of the members in the genetic (4*4)-matrix of 16 doublets.If in this matrix the members 33, 32, 23, 22 are replaced by their frequencies 0, 1, 2, and 3, the numeric matrix F4 of the frequencies arises.Fig. 26 shows this matrix F4 and its decomposition into the sum of sparse matrices 0*R04+1*R14+2*R24+3*R34 where R04, R14, R24 and R34 are unitary matrices from Fig. 6.
But the set of unitary matrices R04, R14, R24 and R34 is closed relative to multiplication in correspondence with the multiplication table of the algebra of split-quaternions by J. Cockle (Fig. 7).By this reason the frequency matrix F4 represents a split-quaternion by J. Cockle having the following linear form of its representation by means of the expression ( 22): where E is the real unit and i, j, k -imaginary units of split-quaternions of J. Cockle.Each long DNA sequence can be considered as a sequence of 4 H-doublets 33, 32, 23, 22 with certain values of their frequencies  0,  1,  2, and  3. It allows using the expression (22) of splitquaternions of J.Cockle to characterize and compare different long DNA sequences on the language of the algebra of these 4-dimensional hypercomplex numbers.This gives a new algebraic tool for comparison analysis of different DNA and RNA sequences in the field of evolutionary biology on the basis of properties of this algebra (https://en.wikipedia.org/wiki/Split-quaternion ).
Similarly, the set of 8 unitary matrices R80, R81, R82, R83, R84, R85, R86, R87 is closed under multiplication, in correspondence with the multiplication table of the algebra of bi-split-quaternions (Fig. 18). For this reason the analogous frequency matrix F8 for the H-triplets-alphabet represents a bi-split-quaternion with the following linear form (23): F8 = φ0*E + φ1*i1 + φ2*i2 + φ3*i3 + φ4*i4 + φ5*i5 + φ6*i6 + φ7*i7, where E is the real unit and i1, i2, i3, i4, i5, i6, i7 are the imaginary units of bi-split-quaternions. Each long DNA sequence can be considered as a sequence of the 8 H-triplets 333, 332, 323, 322, 233, 232, 223, 222 with certain values of their frequencies φ0, φ1, φ2, φ3, φ4, φ5, φ6, φ7. This allows using the expression (23) of bi-split-quaternions to characterize and compare different long DNA sequences in the language of the algebra of these 8-dimensional hypercomplex numbers, in addition to the abovementioned algebra of 4-dimensional split-quaternions. The algebra of these 8-dimensional hypercomplex numbers can be extended further to 2^n-dimensional hypercomplex numbers of a relevant type for a comparative characterization of different long DNA sequences. One can suppose that such a method of comparative characterization and analysis by means of hypercomplex numbers can be used not only for long DNA sequences but also for genes and other relatively short sequences of DNA and RNA.
Sets of members of the H-monoplets-alphabet, the H-doublets-alphabet, the H-triplets-alphabet, etc. can also be characterized in connection with matrix representations of the algebra of hyperbolic numbers (also called split-complex numbers or double numbers) (https://en.wikipedia.org/wiki/Split-complex_number ) and its 2^n-dimensional extensions. Fig. 28 shows the known matrix form of the representation of these hyperbolic numbers d=a*1+b*j and its decomposition into 2 sparse matrices, together with the multiplication table for these matrices or basic units (here j^2 = +1; j is the imaginary unit; a, b are real coefficients). Both of these sparse matrices are unitary matrices.
[a, b; b, a] = a*[1, 0; 0, 1] + b*[0, 1; 1, 0]

 * | 1 | j
 1 | 1 | j
 j | j | 1

Fig. 28. On the left: the matrix representation of hyperbolic numbers d=a*1+b*j and its decomposition into two sparse matrices, which represent the real unit 1 (the first sparse matrix) and the imaginary unit j (the second sparse matrix). Here j^2 = +1, a and b are real numbers. On the right: the multiplication table of the basic elements of hyperbolic numbers.
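The properties stated for the two sparse matrices can be verified directly. The sketch below is a minimal check under the assumption that the real unit is the (2*2) identity matrix and the imaginary unit j is the matrix [0, 1; 1, 0]; the helper names and the sample coefficients are hypothetical.

```python
import numpy as np

unit_one = np.array([[1, 0], [0, 1]])  # real unit 1
unit_j = np.array([[0, 1], [1, 0]])    # imaginary unit j with j*j = +1

def hyperbolic(a, b):
    """Matrix [a, b; b, a] of the hyperbolic number a*1 + b*j."""
    return a * unit_one + b * unit_j

def is_unitary(matrix):
    """True if the matrix times its conjugate transpose is the identity."""
    return np.allclose(matrix @ matrix.conj().T, np.eye(matrix.shape[0]))

# Both sparse matrices are unitary, and j squares to the real unit.
assert is_unitary(unit_one) and is_unitary(unit_j)
assert np.array_equal(unit_j @ unit_j, unit_one)

# Matrix multiplication reproduces the product of hyperbolic numbers:
# (a + b*j)*(c + d*j) = (a*c + b*d) + (a*d + b*c)*j.
a, b, c, d = 2.0, 3.0, 5.0, 7.0
product = hyperbolic(a, b) @ hyperbolic(c, d)
assert np.allclose(product, hyperbolic(a * c + b * d, a * d + b * c))
```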
Taking this into account, let us return to the symbolic matrices in Fig. 2 and begin with the (2*2)-matrix [C, A; T, G]. If each of these nucleotides is represented by its number of hydrogen bonds (C=G=3 and A=T=2), then this matrix is transformed into the numeric matrix [3, 2; 2, 3]. If each of these numbers 3 and 2 of hydrogen bonds is replaced by its frequency q or p in the considered long DNA sequence, the frequency matrix [q, p; p, q] arises. In line with the expression in Fig. 28, this frequency matrix is the matrix representation of the hyperbolic number q+p*j, where j^2 = +1. This frequency (2*2)-matrix in its different tensor powers [q, p; p, q](n) represents an appropriate 2^n-dimensional hyperbolic matrion. In the proposed model approach on the basis of hyperbolic matrions, entries of the matrix [q, p; p, q](n) give model sets of approximate frequencies for all members of a corresponding H-n-alphabet, members of which are components of the matrix [3, 2](n). From this point of view, each long DNA sequence, with respect to the frequencies of its members of H-n-alphabets, can be modeled on the basis of hyperbolic matrions for a comparative analysis with other DNA sequences.
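The construction of model frequencies from tensor powers can be sketched as follows. This is an illustration under stated assumptions: the first row of [q, p; p, q](n) is paired with labels built from the digits 3 and 2 in the same Kronecker order, and the values of q and the function names are hypothetical.

```python
import itertools
import numpy as np

def kron_power(matrix, n):
    """n-th Kronecker (tensor) power of a square matrix."""
    result = np.array([[1.0]])
    for _ in range(n):
        result = np.kron(result, matrix)
    return result

def model_frequencies(q, n):
    """Model frequencies of the members of the H-n-alphabet.

    Uses the first row of [q, p; p, q](n) with p = 1 - q and labels it with
    the members 3...3, ..., 2...2 listed in the same Kronecker order.
    """
    p = 1.0 - q
    first_row = kron_power(np.array([[q, p], [p, q]]), n)[0]
    labels = ["".join(digits) for digits in itertools.product("32", repeat=n)]
    return dict(zip(labels, first_row))

q = 0.36  # hypothetical frequency of the H-monoplet 3
doublets = model_frequencies(q, 2)
assert list(doublets) == ["33", "32", "23", "22"]
assert np.isclose(sum(doublets.values()), 1.0)

# Tensor powers built from two different frequency matrices commute,
# as expected for matrices representing numbers of one commutative algebra.
first = kron_power(np.array([[0.36, 0.64], [0.64, 0.36]]), 2)
second = kron_power(np.array([[0.45, 0.55], [0.55, 0.45]]), 2)
assert np.allclose(first @ second, second @ first)
```

Calling model_frequencies(q, 3) returns the eight model values for the H-triplets 333, 332, ..., 222 in the same way.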
The study of these questions is continued and its results will be published later.

Some concluding remarks
The article describes the author's results on connections of genetic matrices with unitary matrices, the logical operation of modulo-2 addition, and complete orthogonal systems of functions. These genetic matrices represent genetic alphabets jointly with known features of the degeneracy of the genetic code. These results are interesting for the following main reasons.
Firstly, they give new approaches to modeling some genetic structures and phenomena on the basis of mathematical formalisms of quantum mechanics and quantum informatics, where unitary operators have a key meaning. Correspondingly, from this standpoint, a hidden logical organization of the genetic system should be considered in the light of notions of quantum logic. Our results show that, from this modeling standpoint, the genetic system is a whole hierarchical system of interconnected unitary matrices of different orders, woven together and forming tensor families of unitary matrices. Some of these unitary genetic matrices coincide with well-known quantum gates of quantum informatics; all other unitary genetic matrices can also be considered as special quantum gates for hidden quantum-information calculations in the genetic system. Complementary relations exist among some unitary genetic matrices.
We suppose that unitary genetic operators (unitary matrices) are the basis for calculations in genetics, by some analogy with calculations in quantum informatics. In the frame of our model approach we put forward the working hypothesis that DNA- and RNA-sequences of n-plets (doublets, triplets, etc.) serve to define unitary operators for quantum calculations in genetics, by analogy with quantum-logical calculations in quantum computing. From this standpoint, DNA- and RNA-sequences are instruments to define systems of interconnected unitary operators for quantum calculations by means of quantum logic (in particular, this is reflected in the special mosaic organisation of genetic matrices in Figs. 4 and 5). The presented materials about connections of genetic systems with quantum informatics (see additionally [Petoukhov, 2017a]) have led us to reveal hidden regularities (or rules) in long DNA sequences and complete sets of chromosomes of some organisms. These supposed and described rules concern frequencies (or probabilities) of members of H-n-alphabets of n-plets of DNA hydrogen bonds. The model of these phenomenological rules was proposed on the basis of the new notion of genetic qubits and formalisms of quantum informatics. Some analogies of these rules and of their model with the Hardy-Weinberg law and its equation were noted. Here one should note that the Hungarian scientist Gyorgy Darvas was the first who, in his study of quantum electrodynamics, drew attention to connections of the genetic numeric matrices with Pauli's matrices [Darvas, Petoukhov, 2017]. It is additionally interesting that cyclic shifts of positions in doublets and triplets transform the mosaic matrices in Figs. 4 and 5 into new mosaic matrices [Petoukhov, 2008; Petoukhov, He, 2009], which are connected with new systems of unitary genetic matrices.
Secondly, the described unitary genetic matrices contain complete orthogonal systems of functions in their rows or columns. But the following is known: "after Fourier it was found that for some problems, harmonic sinusoids rather than other systems of orthogonal functions, for example, the Legendre polynomials, are better suited. In fact, any particular problem needs its own system of orthogonal functions. This was most clearly manifested in the course of the development of quantum mechanics" [Soroko, 1973]. Correspondingly, one can think that genetic systems have their own orthogonal systems of functions, which should be used in physiology for appropriate spectral decompositions to study genetically inherited processes and structures (including genetic sequences, information processes in neuronal systems, cardio-vascular processes, etc.).
Thirdly, the described fractal features of the mentioned tensor families of unitary genetic matrices give additional materials to the wide topic of inherited fractal-like structures in biological bodies, including symmetries in long texts of single-stranded DNA [Petoukhov, 2017a] and facts about connections of fractals with cancer [Baish, Jain, 2000; Bizzarri et al, 2011; Dokukin et al, 2015; Lennon et al, 2015; Perez, 2017]. Fractal patterns are related to the theory of dynamic chaos, which has many applications in science and technology (see, for example, [Dmitriev, 2002; Potapov, 2015]). The specificity of fractal patterns in tensor families of unitary genetic matrices can be used for a further development of the theory of dynamic chaos and its applications. A bridge between knowledge about fractals in information technologies and in bio-information systems can lead to a mutual enrichment of both these fields.
Fourthly, we show the connection of the probabilities of members of alphabets of n-plets of hydrogen bonds in DNA sequences with some algebras of 2^n-dimensional hypercomplex numbers. This connection allows a comparison of different DNA sequences by means of their personal hypercomplex numbers («hypercomplex passports» of DNA sequences) for studies in the fields of evolutionary biology, medicine and biotechnology.
The author hopes that the further use in genetics of the concepts and formalisms of quantum informatics, which was undertaken in this article in connection with unitary genetic matrices, will lead to the development of a substantial quantum-information genetics. This will promote the inclusion of genetics and all biology in the field of profound mathematical natural science. Consideration of biological phenomena, including the phenomena of inheritance of the intellectual abilities of biological bodies, from the standpoint of the theory of quantum computers gives many valuable opportunities for their comprehension and also for the development of artificial intelligence systems [Petoukhov, 2016a,b; Petoukhov, Petukhova, 2017a; Petoukhov et al., 2017] (the work [Biamonte et al, 2017] contains a review of quantum computing and the problems of artificial intelligence). For example, an adult human organism has around 100 trillion (10^14) human cells, and each of the cells contains an identical set of DNA, whose genetic information is used for the physiological functioning of the organism as a holistic system of cells. How can such a huge number of cells reliably function as a cooperative whole? Quantum informatics and associations with quantum computers can help to model and understand such holistic biological systems with their ability to compute complex tasks and to transfer genetic information from one generation to another. Quantum-information approaches allow modeling complex biological systems without using data and hypotheses about interactions between adjacent molecules or between separate biological cells; all such separate elements are parts of a holistic organism as a quantum-information entity. The fundamental question of quantum computing was first touched upon in the book [Manin, 1980].
Acknowledgments: Some results of this paper have been possible due to a long-term cooperation between Russian and Hungarian Academies of Sciences on the topic "Non-linear models and symmetrologic analysis in biomechanics, bioinformatics, and the theory of self-organizing systems", where S.V. Petoukhov was a scientific chief from the Russian Academy of Sciences. The author is grateful to G. Darvas, E. Fimmel, M. He, Z.B. Hu, I. Stepanyan and V. Svirin for their collaboration. Special thanks

Fig. 1. Left: the four nitrogenous bases of DNA: adenine A, guanine G, cytosine C, and thymine T. Right: three binary sub-alphabets of the genetic alphabet on the basis of three pairs of binary-oppositional traits or indicators.
Fig. 2. The square tables of DNA-alphabets of 4 nucleotides, 16 doublets and 64 triplets with a strict arrangement of all components. Each of the tables is constructed in line with the principle of binary numeration of its columns and rows on the basis of binary-oppositional traits of the nitrogenous bases (see explanations in the text).
) and weak (white) members of DNA-alphabets in the genetic matrices shown in Figs. 2 and 3?

Fig. 23. Frequencies of members of H-n-alphabets (n = 1, 2, 3, 4, 5) in DNA sequences of all 5 chromosomes of the plant Arabidopsis thaliana. The left column contains the sequence numbers of the chromosomes. Each of the points in the graphs shows the frequency of an appropriate member of H-n-alphabets. For each of the chromosomes, its frequencies q and p of the H-monoplets 3 and 2 correspondingly are shown in the second column. See explanations in the text.

Fig. 24. The nice correspondence between coordinate values (red points) of model vectors [q, p](n) and phenomenological values (blue points taken from Fig. 23) of frequencies of all members of H-n-alphabets (n = 2, 3, 4, 5) for DNA sequences of all 5 chromosomes of the plant Arabidopsis thaliana. The left column contains the sequence numbers of the chromosomes. See explanations in the text.

Fig. 25. Pascal's triangle for the numbers of identical coordinate values in vectors of the tensor family [q, p](n).