Structural connections between long genetic and literary texts

Recent discoveries in genetics have borrowed terminology from linguistics and the theory of communications. Leading experts in structural linguistics postulate that human languages were shaped by a combination of genetic language and environment, which suggests that all organisms may use their genetic code in communication mechanisms. The article explores this connection between linguistic languages and genetic language. Long genetic and literary texts are considered in special binary forms, their "n-plet representations", where each text is interpreted as a concatenation of fragments of identical length n. Probabilities of such n-plets are compared in two types of texts: 1) long DNA sequences of different organisms, represented as sequences of the numbers 2 and 3 of hydrogen bonds between complementary nitrogenous bases; 2) long Russian texts, represented as sequences of two phonetic groups of Russian letters, in novels by L. Tolstoy, F. Dostoevsky, A. Pushkin, etc. Formalisms of quantum informatics are used to model some of the results. The author's research is in general alignment with P. Jordan's conjecture, published in 1932, that life's missing laws were the rules of chance and probability of the quantum world. These results can be used for further development of quantum biology (a term proposed by P. Jordan).

1. Introduction
2. Some properties of long literary texts in Russian
3. Regarding similar properties of long DNA-texts
4. Quantum biology and life's rules of probability postulated by P. Jordan
5. Some concluding remarks

Introduction
Impressive recent discoveries in genetics have borrowed terminology from linguistics and the theory of communications. As experts in molecular genetics note, "the more we understand laws of coding of the genetic information, the more strongly we are surprised by their similarity to principles of linguistics of human and computer languages" [Ratner, 2002, p. 203].
Leading experts in structural linguistics have long believed that the languages of human dialogue were formed not by random processes but as a continuation of genetic language or, at least, in close connection with it, which suggests that all organisms may use their genetic code in communication mechanisms. Analogies between systems of genetic and linguistic information are of wide scientific interest, which this article briefly illustrates. We refer to some relevant ideas of R. Jakobson [1987, 1999], one of the most famous linguists and the author of an in-depth theory of binary linguistic oppositions. Jointly with F. Jacob, Nobel Prize winner in molecular genetics, and with other linguists holding the same views, Jakobson proposed that genetic language is the structural basis of linguistic languages [Jacob et al., 1968; Jakobson, 1985]. Among all systems of information transfer, the genetic and linguistic codes are based on discrete components that carry no meaning in themselves but serve to construct the minimal meaningful units; the sense emerges only from special groupings of these units. This similarity between the two information systems is an important fact, yet it is only one aspect of this relatively new field of study. According to Jakobson, all relations among linguistic phonemes decompose into a series of binary oppositions of elementary differential attributes (or traits). By analogy, the set of four letters of the genetic alphabet contains three binary sub-alphabets, which allow new mathematical models to be created in molecular genetics [Petoukhov, 2017, 2018a]. As Jakobson wrote, the genetic code system is the basic simulator that underlies all verbal codes of human languages. "The heredity in itself is the fundamental form of communications … Perhaps, the bases of language structures, which are imposed on molecular communications, have been constructed by its structural principles directly" [Jakobson, 1985, p. 396]. These questions arose for Jakobson as a consequence of his long-term research into the connections between linguistics, biology and physics. Such connections were considered at a joint seminar of physicists and linguists organized by Niels Bohr and Roman Jakobson at the Massachusetts Institute of Technology.
"Jakobson reveals distinctly a binary opposition of sound attributes as underlying each system of phonemes... The subject of phonology has changed by him: the phonology considered phonemes (as the main subject) earlier, but now Jakobson has offered that distinctive attributes should be considered as "quantums" (or elementary units of language)… Jakobson was interested especially in the general analogies of language structures with the genetic code, and he considered these analogies as indubitable" [Ivanov, 1985]. We also recall the title of the monograph "On the Yin and Yang nature of language" [Bailey, 1982], which is characteristic of the theme of binary oppositions in linguistics.
F. Jacob, Nobel Prize winner in molecular genetics, also considered the relationship between genetics and linguistic languages in connection with the principle of binary oppositions, systematically described in the Ancient Chinese book "I Ching". He wrote: «C'est peut-être I Ching qu'il faudrait étudier pour saisir les relations entre hérédité et langage» (in English: To understand the relationship between genetics and language, perhaps it would be necessary to study the Ancient Chinese "I Ching") [Jacob, 1974, p. 205].
This connection between linguistics and the genetic code interests many researchers, and some even perceive linguistic language as a living organism. In his book "Linguistic Genetics", Makovsky says: "A look at language as a living organism, subject to the natural laws of nature, ascends to a deep antiquity … Research of a nature, of disposition and of reasons of isomorphism between genetic and linguistic regularities is one of the most important fundamental problems for linguistics of our time" [Makovsky, 1992].
The study of linguistic languages can have important implications for a deeper understanding and modeling of mental functions and for an appropriate expansion of physics. B. Josephson, Nobel laureate in physics, writes [Josephson, 2018]: "Regular physics is unsatisfactory in that it fails to take into consideration phenomena relating to mind and meaning, whereas on the other side of the cultural divide such constructs have been studied in detail. This paper discusses a possible synthesis of the two perspectives. Crucial is the way systems realising mental function can develop step by step on the basis of the scaffolding mechanisms of Hoffmeyer, in a way that can be clarified by consideration of the phenomenon of language".
The next section describes structural analogies between long DNA "texts" and long literary works in Russian. The analysis of long Russian literary texts (by L.N. Tolstoy, F.M. Dostoevsky, A.S. Pushkin, etc.) is based on binary peculiarities of the Russian alphabet. In addition, some applications of the mathematics of quantum informatics to modeling bio-informational structures are considered. The results described derive from the author's long-term research in the field of quantum biology (for the history of quantum biology see [McFadden, Al-Khalili, 2018]). P. Jordan, credited with authoring the first work on quantum biology, postulated that the mechanisms of living organisms are associated with what he called his 'amplifier theory', based on Bohr's notion of the 'irreversible act of amplification' required to bring the fuzzy quantum reality into sharp focus by 'observing' it [Jordan, 1932]. Jordan claimed that «life's missing laws were the rules of chance and probability (the indeterminism) of the quantum world that were somehow scaled up inside living organisms» [McFadden, Al-Khalili, 2018]. It is these laws of chance and probability, postulated by Jordan, that we are looking for in our studies of the probabilistic characteristics of long DNA sequences of hydrogen bonds and nitrogenous bases [Petoukhov, 2018a,b; Darvas, 2018]. The article is connected with these results.

Some properties of long literary texts in Russian
The Russian alphabet has a binary-oppositional structure, since it has two binary-oppositional sub-alphabets: the sub-alphabet of vowels and the sub-alphabet of consonants. Each of these sub-alphabets also has its own binary-oppositional structure: the sub-alphabet of vowels consists of the sub-sub-alphabet of long vowels and the sub-sub-alphabet of short (or iotated) vowels; the sub-alphabet of consonants consists of the sub-sub-alphabet of voiced consonants and the sub-sub-alphabet of voiceless (deaf) consonants (Fig. 1). The soft sign ь and the hard sign ъ of the Russian alphabet do not convey any sound and are therefore not taken into account in its phonological structure.
The Russian alphabet can be considered to consist of the following two equivalence classes (in Fig. 1, the first class is marked in yellow and the second in green):
1. The first equivalence class combines all short (iotated) vowels and all voiceless (deaf) consonants: е, ё, ю, я, п, ф, к, т, ш, с, х, ц, ч, щ. We denote the 14 members of this class by the general symbol 0.
2. The second equivalence class combines all long vowels and all voiced consonants: а, и, о, у, ы, э, б, в, г, д, ж, з, й, л, м, н, р. We denote the 17 members of this class by the general symbol 1.
With this convention, any Russian text can be converted into the corresponding binary sequence of the two symbols 0 and 1 (for example, 100110110...) by the following procedure: 1) all punctuation marks and spaces between words, as well as all soft signs «ь» and hard signs «ъ», are deleted from the text; 2) each of the remaining letters is replaced by the symbol 0 or 1 according to the equivalence class to which it belongs.
For example, as a result of this procedure, the Russian text "Лев Толстой - великий русский писатель" (in English: «Leo Tolstoy - a great Russian writer») turns into the binary sequence 1010110011101101111000110101001. The computer program for the analysis of literary texts was created by graduate student V.I. Svirin under the guidance of the author.
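The conversion procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program written by V.I. Svirin; the function name `russian_to_binary` and the choice to silently skip every character outside the two classes (including non-Russian letters) are assumptions made here.

```python
# Equivalence classes as listed in the text (the soft and hard signs are dropped).
CLASS_0 = set("еёюяпфктшсхцчщ")      # short (iotated) vowels + voiceless consonants
CLASS_1 = set("аиоуыэбвгджзйлмнр")   # long vowels + voiced consonants


def russian_to_binary(text: str) -> str:
    """Map a Russian text to a string of '0' and '1' symbols."""
    bits = []
    for ch in text.lower():
        if ch in CLASS_0:
            bits.append("0")
        elif ch in CLASS_1:
            bits.append("1")
        # Anything else (spaces, punctuation, ь, ъ, non-Russian characters)
        # is skipped; skipping non-Russian characters is an assumption.
    return "".join(bits)


print(russian_to_binary("Лев Толстой - великий русский писатель"))
```

Applied to the example sentence, the sketch reproduces the 31-symbol sequence quoted in the text.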
It is important to mention the role played by the mathematical operation of the tensor product. This operation, well known in mathematics, physics and informatics, provides a method of joining vector spaces to form larger vector spaces, and it is one of the basic instruments of quantum informatics: «This construction is crucial to understanding the quantum mechanics of multiparticle systems» [Nielsen, Chuang, 2010, p. 71], since a postulate of quantum mechanics holds that the state space of a composite system is the tensor product of the state spaces of its components. By definition, in the tensor product of two vectors each component of the first vector is multiplied by all components of the second vector. Expression (1) shows an example of the tensor product (denoted by the symbol $\otimes$) of two 2-dimensional vectors [x, y] and [v, w], which gives one 4-dimensional vector [xv, xw, yv, yw]:

$$[x, y] \otimes [v, w] = [xv, xw, yv, yw] \qquad (1)$$

These binary representations of long Russian literary texts were analyzed by a method analogous to the analysis of long DNA sequences of the numbers 3 and 2 of hydrogen bonds [Petoukhov, 2018a]. In more detail, each long binary sequence of the numbers 0 and 1 (for example, the sequence 1-0-1-0-1-1-0-0-1-1-1-0-…) can be represented as a sequence of binary doublets (10-10-11-00-11-10-…), as a sequence of binary triplets (101-011-001-110-…) or, in the general case, as a sequence of binary n-plets (n = 1, 2, 3, 4, 5, …). Such representations of a long literary text in the form of sequences of binary n-plets are termed its "binary n-plet representations". In this article only long literary texts in Russian are considered, though similar approaches are available for long literary texts in some other languages.
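Expression (1) can be checked numerically with a short sketch. The function name `tensor_product` and the sample values x = 2, y = 3, v = 5, w = 7 are illustrative choices made here, not taken from the source.

```python
from typing import List, Sequence


def tensor_product(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Multiply each component of `a` by every component of `b`, in order."""
    return [x * y for x in a for y in b]


# [x, y] (x) [v, w] with x = 2, y = 3, v = 5, w = 7 gives [xv, xw, yv, yw].
print(tensor_product([2, 3], [5, 7]))  # [10, 14, 15, 21]
```

The ordering of the result matters: the first index varies slowest, which is what later lets the coordinates be matched to the n-plets 00, 01, 10, 11 in that order.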
For each fixed n, we can analyze the probabilities (or frequencies, or percentages) of each kind of n-plet in the n-plet representation of any long literary text in Russian (or of a literary text in any other language translated into Russian). For a fixed value of n, each of these probabilities is equal to the ratio of the number of occurrences of the corresponding member of the alphabet of binary n-plets to the total number of binary n-plets in the text. For example, the binary doublet representation of the novel "Anna Karenina" by Leo Tolstoy contains 654523 doublets in total. This number includes 75895 doublets 00, 142504 doublets 01, 142547 doublets 10 and 293577 doublets 11. Correspondingly, the probability of the doublet 00 is equal to 75895/654523 = 0,115954672; the probability of the doublet 01 is equal to 142504/654523 = 0,217721914; the probability of the doublet 10 is equal to 142547/654523 = 0,21778761; the probability of the doublet 11 is equal to 293577/654523 = 0,448535804 (these values are shown in blue, rounded to four decimal places, in the corresponding row of Fig. 3).
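The counting step described above can be sketched as follows. The function name `nplet_probabilities` is hypothetical, and dropping an incomplete trailing fragment is an assumption, since the text does not say how a last fragment shorter than n is handled.

```python
from collections import Counter
from typing import Dict


def nplet_probabilities(bits: str, n: int) -> Dict[str, float]:
    """Split `bits` into consecutive non-overlapping n-plets and return
    the relative frequency of each n-plet that occurs.

    A trailing fragment shorter than n is dropped (an assumption).
    """
    usable = len(bits) - len(bits) % n
    nplets = [bits[i:i + n] for i in range(0, usable, n)]
    counts = Counter(nplets)
    total = len(nplets)
    return {nplet: count / total for nplet, count in counts.items()}


example = "101011001110"  # the sequence 1-0-1-0-1-1-0-0-1-1-1-0 used in the text
print(nplet_probabilities(example, 2))  # doublets 10-10-11-00-11-10
print(nplet_probabilities(example, 3))  # triplets 101-011-001-110
```

For a text as long as a novel, the one dropped fragment has no visible effect on the frequencies.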
In addition to calculating the probabilities of n-plets (n = 1, 2, 3, 4) in n-plet representations of long Russian literary texts, the author has modeled these probabilities by the same quantum-algorithmic method that was used for modeling probabilities in long DNA sequences of the numbers 3 and 2 of hydrogen bonds in the article [Petoukhov, 2018a]. This method uses standard formalisms of quantum informatics [Nielsen, Chuang, 2010].
In the author's model approach, the phenomenological probabilities of all members of the binary n-plet alphabets $[0, 1]^{(n)}$ (for fixed n, where n = 1, 2, 3, 4, … is much less than the length of the considered literary text) are modeled by the numeric values of the corresponding coordinates of the $2^n$-dimensional vector from the tensor family of probability vectors $[0P, 1P]^{(n)}$, where 0P and 1P denote the probabilities of the symbol 0 and of the symbol 1 in the binary monoplet representation of the analyzed long Russian text. These probabilities for the novel «Anna Karenina» are shown in the first column of Fig. 2 and in the first rows of the table in Fig. 3.
Correspondingly, in this model approach, the model values of the probabilities of all $2^n$ members of the binary n-plet alphabet, in a given binary n-plet representation of the analyzed literary text, are the $2^n$ coordinates of the corresponding vector from the tensor family:
1) for the alphabet of binary doublets [00, 01, 10, 11], the 4 coordinates of the vector $[0P, 1P]^{(2)}$ = [0P0P, 0P1P, 1P0P, 1P1P];
2) for the alphabet of binary triplets [000, 001, 010, 011, 100, 101, 110, 111], the 8 coordinates of the vector $[0P, 1P]^{(3)}$ = [0P0P0P, 0P0P1P, 0P1P0P, 0P1P1P, 1P0P0P, 1P0P1P, 1P1P0P, 1P1P1P];
3) for the alphabet of binary tetraplets [0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111], the 16 coordinates of the vector $[0P, 1P]^{(4)}$ = [0P0P0P0P, 0P0P0P1P, 0P0P1P0P, 0P0P1P1P, 0P1P0P0P, 0P1P0P1P, 0P1P1P0P, 0P1P1P1P, 1P0P0P0P, 1P0P0P1P, 1P0P1P0P, 1P0P1P1P, 1P1P0P0P, 1P1P0P1P, 1P1P1P0P, 1P1P1P1P]; etc.
In each case the coordinates of the probability vector correspond, position by position, to the members of the n-plet alphabet. It is important to emphasise that all coordinates of the vectors $[0P, 1P]^{(n)}$, where n = 2, 3, 4, …, are expressed through only two values, 0P and 1P. In summary, in accordance with the proposed model approach, the following procedure has been found to be sufficient for an approximate prediction of the probabilities of all members of the binary n-plet alphabets in a binary n-plet representation of a long Russian literary text, on the basis of only the two probabilities 0P and 1P:
1) calculate the probabilities 0P and 1P of the binary monoplets 0 and 1 in the binary sequence;
2) calculate the products of these probabilities 0P and 1P that form the coordinates of the vectors $[0P, 1P]^{(n)}$, where n = 2, 3, 4, … .
Expression (2) shows an example of such a calculation of the probability 0P0P1P for the member 001 of the alphabet of binary triplets for 0P = 0,36 and 1P = 0,64 (0P + 1P = 1):

$$0P\,0P\,1P = 0{,}36 \cdot 0{,}36 \cdot 0{,}64 \approx 0{,}0829 \qquad (2)$$

Figs. 2 and 3 present, in graphical and tabular form, the results of the author's analysis of the novel «Anna Karenina» by Leo Tolstoy by the approach described above. Close correspondence can be seen between the phenomenological values (blue points in the graphs of Fig. 2) of the probabilities of all members of the alphabets of binary n-plets and the model values (red points in the graphs), given by the components of the probability vectors $[0P, 1P]^{(n)}$, n = 2, 3, 4. The graphs show that the red model points are almost exactly superimposed on the blue phenomenological points. Fig. 3 shows the proximity of the numerical phenomenological and model values of the studied probabilities. All values are rounded to four decimal places. Therefore, knowing only the two probabilities 0P and 1P of the binary monoplets 0 and 1 in the binary representation of this well-known novel, one can predict the probabilities of all members of the alphabets of binary n-plets. We presume that a similar model correspondence also holds for n = 5, 6, ... (if n is much less than the length of the considered literary text), but this should be studied in future research.
Analogous analyses were made for other long Russian texts, among them […] (Figs. 12, 13) and the Russian Bible (Figs. 14, 15). All these results are similar to those described for the novel «Anna Karenina» (Figs. 2, 3): they confirm that the probabilities of the members of the different alphabets of binary n-plets (n = 1, 2, 3, 4) are, to some degree, interrelated and that this interrelation can be modeled on the basis of the tensor family of vectors $[0P, 1P]^{(n)}$ (n = 1, 2, 3, 4).
[Figure caption fragment: red points correspond to model values of the probabilities calculated as components of the vectors $[0P, 1P]^{(n)}$, where 0P and 1P are the probabilities of the binary monoplets 0 and 1; (n) refers to tensor powers.]
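The two-step model procedure described above can be sketched as follows and compared with the doublet counts quoted for "Anna Karenina". The monoplet probabilities 0P and 1P are not printed in the text (they appear only in the figures), so this sketch recovers them from the doublet counts; that step, the function name `tensor_power` and the variable names are assumptions made for illustration.

```python
from typing import Dict, List

# Doublet counts for "Anna Karenina" as quoted in the text.
COUNTS: Dict[str, int] = {"00": 75895, "01": 142504, "10": 142547, "11": 293577}
LABELS = ("00", "01", "10", "11")


def tensor_power(vec: List[float], n: int) -> List[float]:
    """n-th tensor power of `vec`, components in lexicographic order."""
    result = [1.0]
    for _ in range(n):
        result = [a * b for a in result for b in vec]
    return result


total = sum(COUNTS.values())
observed = [COUNTS[k] / total for k in LABELS]

# Assumption: every letter (except possibly one trailing letter) belongs to
# exactly one doublet, so the share of the symbol 0 follows from the counts.
zeros = 2 * COUNTS["00"] + COUNTS["01"] + COUNTS["10"]
p0 = zeros / (2 * total)
p1 = 1.0 - p0

model = tensor_power([p0, p1], 2)  # [0P0P, 0P1P, 1P0P, 1P1P]
max_abs_diff = max(abs(o - m) for o, m in zip(observed, model))

for label, o, m in zip(LABELS, observed, model):
    print(f"{label}: observed {o:.4f}, model {m:.4f}")
print(f"largest absolute difference: {max_abs_diff:.4f}")
```

The same `tensor_power` call with n = 3 or n = 4 gives the model vectors for triplets and tetraplets; a single summary number such as the largest absolute difference is one possible way to quantify the visual agreement between red and blue points, not a measure used in the source.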
[Table fragment. Probabilities of members in the alphabet of binary triplets, Reality: 0P0P0P = 0,0318; 0P0P1P = 0,0789; 0P1P0P = 0,0678; 0P1P1P = 0,1484; 1P0P0P = 0,0794; 1P0P1P = 0,1386; 1P1P0P = 0,1493; 1P1P1P = 0,3058.]
Our results show that the described properties of long Russian literary texts reflect the deep specifics of the language and not the literary style of a particular writer. It can be assumed that any long literary text in a foreign language, translated into Russian, will demonstrate similar properties.
It will be important to discover whether there are similar patterns in the texts of other languages with differing alphabets and differing phonetic features. The author has begun to conduct these studies for literary texts in English, German, French and many other languages. It is hoped to publish the results of these studies in the future.
The next section shows that analogous properties exist in many long DNA-texts analyzed by the author, and that the described model approach, which uses the tensor product of probability vectors, is effective not only for long Russian literary texts but also for long DNA-texts. The rules for these probabilities in long DNA-texts and in long Russian literary texts have mathematical analogies with the main law of population genetics, the Hardy-Weinberg law [Petoukhov, 2018a].

Regarding similar properties of long DNA-texts
In DNA molecules, genetic information is written in very long sequential texts using only 4 letters: adenine A, cytosine C, guanine G, thymine T. For example, the human genome consists of several billion such letters. In the double helix of DNA, the complementary letters C-G and A-T are always located opposite each other and form complementary pairs by means of 3 hydrogen bonds for the pair C-G and 2 hydrogen bonds for the pair A-T (this can be denoted as C=G=3 and A=T=2). Correspondingly, any DNA sequence contains a chain of the numbers 2 and 3 of hydrogen bonds, for example, 33223223233… . We term such number chains "hydrogen bond sequences" (briefly, "H-sequences" or "H-texts"). The author has analysed the properties of such long hydrogen bond sequences for many different organisms (here the term "long" means DNA sequences containing at least 100000 letters). The results of this study, briefly presented below, confirm the existence of structural connections between long genetic texts and long literary texts.
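The mapping from a DNA text to its H-sequence can be sketched as below. The function name `h_sequence`, the sample DNA string and the decision to skip symbols other than A, C, G, T (for example N for an unknown base in database records) are assumptions made for illustration.

```python
# Number of hydrogen bonds of the complementary pair each base belongs to.
H_BONDS = {"C": "3", "G": "3", "A": "2", "T": "2"}


def h_sequence(dna: str) -> str:
    """Convert a DNA string into its hydrogen-bond sequence of 3s and 2s.

    Characters other than A, C, G, T are skipped (an assumption; the source
    does not say how ambiguous bases are treated).
    """
    return "".join(H_BONDS[base] for base in dna.upper() if base in H_BONDS)


print(h_sequence("CGATGATCAGC"))  # 33223223233
```

The resulting string over the two symbols 3 and 2 can then be split into H-n-plets in exactly the same way as the binary representation of a Russian text.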
The article [Petoukhov, 2018a] describes in detail the hidden regularities in long DNA H-sequences that are connected with members of the alphabets of H-n-plets in many different organisms. These regularities concern the probabilities (or frequencies, or percentages) of each of the hydrogen n-plets (briefly, H-n-plets) in long DNA H-sequences of the complete set of chromosomes. The author has found that, for a given long DNA sequence, the complete sets of individual probabilities of H-n-plets (for different values of n) are interrelated. This interrelation can be modeled by the same method described above for the analysis of literary texts: if q and p are the probabilities of the numbers 3 and 2 of hydrogen bonds in the considered DNA, then the tensor power (n) of the 2-dimensional vector [q, p] gives the $2^n$-dimensional vector $[q, p]^{(n)}$, whose components are the model values of the probabilities of the members of the corresponding alphabet of H-n-plets in this DNA (n = 1, 2, 3, 4, …, with n much less than the length of such sequences).
To illustrate this statement, Fig. 16 shows in graphical form an example of the phenomenological values of the probabilities of all members of the alphabets of H-n-plets (n = 1, 2, 3, 4, 5) for the DNA sequence of the first chromosome of the plant Arabidopsis thaliana, which contains 30427671 nucleotide pairs. Fig. 16 also shows the model values of these probabilities as components of the $2^n$-dimensional vectors $[q, p]^{(n)}$, where q = 0,35873552 and p = 0,64126448.
Fig. 16. The graphic representation of the probabilities of members of the hydrogen n-plet alphabets (n = 1, 2, 3, 4, 5) in the DNA sequence of the first chromosome of the plant Arabidopsis thaliana (initial data on this chromosome were accessed from https://www.ncbi.nlm.nih.gov/nuccore/NC_003070.9). Blue points in the graphs show the phenomenological probabilities of n-plets of numbers of hydrogen bonds, while red points show the model values of these probabilities as components of the $2^n$-dimensional vectors $[q, p]^{(n)}$, where q and p are the probabilities of the numbers 3 and 2 of hydrogen bonds in this DNA.
Fig. 17 lists the phenomenological and model values of the probabilities of all members of the hydrogen n-plet alphabets (n = 1, 2, 3, 4, 5) for the same DNA sequence as in Fig. 16. The closeness of the model values to the phenomenological ones can be judged directly from the figure.
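As a worked illustration of how the model vector is obtained from the two monoplet probabilities, the case n = 2 can be written out for the values q and p quoted for the first chromosome of Arabidopsis thaliana. The rounding to four decimal places and the use of decimal points inside the vector (to avoid confusion with the separating commas) are choices made here.

$$[q, p]^{(2)} = [q, p] \otimes [q, p] = [q^2,\; qp,\; pq,\; p^2] \approx [0.1287,\; 0.2300,\; 0.2300,\; 0.4112], \qquad q = 0.35873552,\; p = 0.64126448$$

The four components are the model probabilities of the H-doublets 33, 32, 23 and 22, in that order; note that the model always assigns equal probabilities to the doublets 32 and 23.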
Fig. 17. Probabilities of members in the alphabet of H-monoplets (numbers 3 and 2) and in the further H-n-plet alphabets: reality and model values.
In complete sets of nuclear chromosomes of different organisms, the probability of any member of the H-n-alphabets has approximately the same value in all chromosomes.
However, further research is needed to determine the degree of universality and precision of this rule. The author has also calculated the probabilities of members of the H-n-alphabets (n = 1, 2, 3, 4, 5) in 19 genomes of bacteria and archaea from the full list in the article [Rapoport, Trifonov, 2012, p. 2]: "Aquifex aeolicus, Acidobacteria bacterium, Bradyrhizobium japonicum, Bacillus subtilis, Chlamydia trachomatis, Chromobacterium violaceum, Dehalococcoides ethenogenes, Escherichia coli, Flavobacterium psychrophilum, Gloeobacter violaceus, Helicobacter pylori, Methanosarcina acetivorans, Nanoarchaeum equitans, Syntrophus aciditrophicus, Streptomyces coelicolor, Sulfolobus solfataricus, Treponema denticola, Thermotoga maritima and Thermus thermophilus". The calculated sets of these probabilities were also modelled on the basis of the coordinates of the corresponding vectors $[q, p]^{(n)}$. These results confirm that the proposed model approach based on the vectors $[q, p]^{(n)}$ can be used to obtain idealized models of the probabilities of all members of the H-n-alphabets in actual long DNA sequences (n = 1, 2, 3, 4, …, with n much less than the length of such sequences).
The proposed model allows the probabilities of many members of the H-n-alphabets in long DNA sequences to be predicted with a high level of accuracy from the probabilities of only the two numbers of hydrogen bonds, 3 and 2, in the DNA.
It is important to remember that if a quantum state can be represented as a vector of a Hilbert space, such a state is termed a pure quantum state. If a pure state |ψ> can be written in the form |ψ> = |ψ1> ⊗ |ψ2>, where |ψi> is a pure state of the i-th subsystem, it is said to be separable [Nielsen, Chuang, 2010]. Otherwise it is termed entangled. Correspondingly, the present model approach, which uses the tensor product, represents long DNA sequences of n-plets of hydrogen bonds (n = 2, 3, 4, …) as quantum systems in a separable pure state. The observed difference between the real and model values of the probabilities under study can be interpreted as a violation of the separable pure state due to the presence of an entangled state. In the case of human chromosomes, this difference between the real and model values of the probabilities is more significant than in the case of the plant Arabidopsis thaliana shown in Fig. 16. This implies that, from the point of view of the model, entangled states are more pronounced in human chromosomes. On the basis of this model approach it is interesting to compare the characteristics of such entangled states in the genomes of many organisms having different levels of consciousness. Does the human genome possess the highest level of such entanglement?

Quantum biology and life's rules of probability postulated by P. Jordan
Let us return to the historically first work on quantum biology [Jordan, 1932], written by one of the creators of quantum mechanics. He asked: are the laws of atomic and quantum physics of essential importance for life? In fact, Jordan had been thinking about this question for over a decade and had been using the term «Quantenbiologie» since the late 1930s. Jordan believed that the specifics of living organisms are based on special laws that should be discovered in the future. Moreover, he claimed that «life's missing laws were the rules of chance and probability (the indeterminism) of the quantum world that were somehow scaled up inside living organisms» [McFadden, Al-Khalili, 2018]. Here, the difference between biological and inanimate objects should be explained. Jordan correctly pointed out that inanimate objects are governed by the average random motion of millions of particles, such that the motion of a single molecule has no influence whatsoever on the whole object. This insight is usually credited to Erwin Schrödinger, who later claimed that life differs from inorganic chemistry because of its dependence on the dynamics of a small number of molecules. Jordan similarly argued that the few molecules that control the dynamics of living cells within the control center have a dictatorial influence, such that the quantum-level events that govern their motion, such as Heisenberg's uncertainty principle, are amplified to influence the entire organism. Jordan called this his 'amplifier theory' and based it on Bohr's notion of the 'irreversible act of amplification' that is required in order to bring the fuzzy quantum reality into sharp focus by 'observing' it. Jordan believed that living organisms were uniquely able to carry out this amplification in a way that was conspicuously different from inanimate matter, such as a Geiger counter. Jordan was convinced he could extend quantum indeterminism from the subatomic world to macroscopic biology. He even made a connection with free will by suggesting a link between quantum mechanics and psychology. Jordan's insistence that living organisms have a unique ability to amplify the quantum into the macroscopic world has a lot of resonance with modern views of quantum biology [McFadden, Al-Khalili, 2018].
In accordance with these pioneering ideas of Jordan on the hidden laws of probability in living organisms, the author conducted a systematic study of the rules of probability in long DNA sequences, as described in this article and in [Petoukhov, 2018b]. DNA sequences are a very important and convenient object for such computerized research, which was impossible in Jordan's day.
In addition, the author pays special attention to DNA sequences of hydrogen bonds, taking into account the well-known data on the exceptional role of hydrogen bonds in biological organization and in water. For example, L. Pauling wrote that «the significance of the hydrogen bond for physiology is greater than that of any other single structural feature» [Pauling, 1940, p. 450]. The author's book [Petoukhov, 2001] describes the regular proton groupings in the molecular genetic system and presents phenomenological data on the structural and "arithmetic" role of protons (nuclei of hydrogen atoms) in genetics; these phenomenological data are connected in particular with the Yin-Yang dichotomous schemes, digrams, trigrams and hexagrams of the Ancient Chinese book "I Ching", written a few thousand years ago and including the famous table of 64 hexagrams in Fu-Xi's order. The "I Ching" declares the universality of a cyclic principle of organization in nature; traditional Oriental medicine is based on the positions of this book. The famous expert in molecular genetics G. Stent was the first to publish a hypothesis about a possible connection between the structures of the genetic code and the symbolic system of the "I Ching" [Stent, 1969, p. 64]. A few authors later supported him and his hypothesis.
The ancient Chinese claimed that this table of 64 hexagrams is a universal archetype of nature, a universal classification system. They knew nothing about the genetic code, but the genetic code with its 64 codons and other important features is strikingly similar to this ancient table and to the system of the "I Ching". It should also be noted that the numbers 2 and 3, the numbers of hydrogen bonds in DNA, played a basic role in Ancient Chinese arithmetic. According to the creator of analytical psychology, Carl Jung [Fordham, 1978; Jung, 1947; Samuels, 2005], the trigrams and hexagrams of the "I Ching" fix a universal set of archetypes (innate psychic structures). Jung was a connoisseur of this book and called it a "great and unique" work. Using the concept of archetypes together with schemes of the "I Ching", he developed his amplification method, which he successfully applied to the treatment of patients. Wolfgang Pauli made a prominent contribution to the development of the concept of archetypes as a result of his many years of cooperation with Jung on this topic [Pauli, Jung, 2001]. Pauli introduced Jung to the ideas and concepts of quantum mechanics and quantum biology. It seems not to be accidental that Jung used in his method the same term «amplification» that Jordan used in his work on quantum biology. Materials on structural connections between the genetic code and schemes of the "I Ching" are described in detail in [Petoukhov, 1999, 2001, 2008; Petoukhov, He, 2009; Hu, Petoukhov, Petukhova, 2017b].
Taking into account the structural connection of the genetic molecular systems and of Jung's archetypes of the unconscious with the schemes of the "I Ching", the concept of "biological archetypes" in an expanded sense, including universal biological properties and phenomena, should be borne in mind. One such biological archetype, or universal biological property, is the main psychophysiological law of Weber-Fechner (http://en.wikipedia.org/wiki/Weber-Fechner_law): the intensity of perception is proportional to the logarithm of stimulus intensity. It is known that different types of inherited sensory perception obey this law: sight, hearing, smell, touch, taste, etc. Because of this law, the power of sound in music and technology is measured on a logarithmic scale in decibels. One might suppose that the innate Weber-Fechner law (WF-law) is a law pertaining to the human nervous system. However, its meaning is actually much wider because it holds true in many kinds of lower organisms without a nervous system: "this law is applicable to chemo-tropical, helio-tropical and geo-tropical movements of bacteria, fungi and antherozoids of ferns, mosses and phanerogams ... The Weber-Fechner law, therefore, is not the law of the nervous system and its centers, but the law of protoplasm in general and its ability to respond to stimuli" [Shultz, 1916, p. 126]. The conception of multi-resonant genetics proposes a mathematical model of the Weber-Fechner law on the basis of the natural resonant frequencies of a particular class of vibrational systems with 2 degrees of freedom [Petoukhov, 2015a, 2016a]. It seems that the structural connections described in this article between long genetic and literary texts represent one of the biological archetypes in bioinformatics.
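In its usual textbook form, the Weber-Fechner law stated above can be written as in the first formula below. The symbols p (perceived intensity), S (stimulus intensity), S_0 (threshold stimulus) and k (a constant depending on the sense modality) are notation introduced here for illustration and do not appear in the original text. The second formula is the standard definition of a sound power level L in decibels, with P_0 a reference power.

```latex
p = k \,\ln\frac{S}{S_0}, \qquad
L = 10 \,\log_{10}\frac{P}{P_0}\ \text{dB}
```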

Some concluding remarks
It is worth noting that the probabilities of alternation of vowels and consonants in Russian literary texts have been studied by professional mathematicians. Their results are important for many scientific problems, including problems of artificial intelligence [Domingos, 2015]. For example, the Russian mathematician A.A. Markov, the creator of the famous "Markov chains", developed his theory of chains through an analysis of the poem "Evgeniy Onegin", for which he manually calculated probabilities of vowels and consonants more than 100 years ago [Markov, 1924]. An analysis of probabilities of different n-plets of vowels and consonants in the poem "Evgeniy Onegin" was also made in a work devoted to Markov's research [Petrenko, 2018]. The great Russian mathematician A.N. Kolmogorov studied rhythms of verses to reveal the mechanism of developing internal goals of self-organizing systems [Kolmogorov, 1997; Nikolaev, 1999]. In contrast to all known research concerning vowels and consonants and their concatenations in doublets, triplets, etc., in Russian literary texts, the author studies the following:
First, the total probabilities of all representatives of the two described equivalence classes of Russian letters (in Fig. 1 the first class is denoted by yellow and the second class by green), including the probabilities of different n-plets of these representatives (n = 1, 2, 3, 4, …), as distinct from the probabilities of alternation of separate vowels and consonants;
Second, from this standpoint, analogies between long literary texts and hydrogen-bond sequences of DNA, to test the concept that linguistic languages are continuations of genetic languages, which are basic for all biological organisms;
Third, the possibilities of applying formalisms of quantum informatics to model long literary texts and DNA texts, in order to develop the known hypotheses of some authors concerning the quantum-informational organization of living bodies.
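The n-plet probabilities referred to above can be illustrated with a minimal Python sketch. It is not the author's software: the function name `nplet_probabilities` and the toy sequence are assumptions made purely for illustration. The sketch assumes that a text has already been converted into a string of 0s and 1s (one symbol per letter class) and that the string is read as a concatenation of non-overlapping fragments of length n.

```python
from collections import Counter
from itertools import product


def nplet_probabilities(bits: str, n: int) -> dict[str, float]:
    """Estimate probabilities of binary n-plets in a 0/1 string.

    The string is read as a concatenation of non-overlapping fragments
    of length n; an incomplete trailing fragment is discarded.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    usable = len(bits) - len(bits) % n
    if usable == 0:
        raise ValueError("sequence is shorter than n")
    fragments = [bits[i:i + n] for i in range(0, usable, n)]
    counts = Counter(fragments)
    total = len(fragments)
    # Report every possible n-plet, including those that never occur.
    labels = ["".join(symbols) for symbols in product("01", repeat=n)]
    return {label: counts[label] / total for label in labels}


# Toy sequence (hypothetical data, not taken from any real text).
toy = "0110100111010010"
print(nplet_probabilities(toy, 1))
print(nplet_probabilities(toy, 2))
```

For a real text, the input string would first have to be produced by mapping each letter to its class, a step that depends on the classification shown in Fig. 1 and is therefore left out of this sketch.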
Science has led to a new understanding of life itself: "Life is a partnership between genes and mathematics" [Stewart I., 1999]. The author's results, published in various editions, testify that the mathematics of quantum informatics and of the theory of resonances of oscillatory systems with 2^n degrees of freedom could be such a partner of the genetic system [Petoukhov, 2015a,b,c; Petoukhov, 2016a; Petoukhov, Petukhova, 2017b; Hu, Petoukhov, Petukhova, 2017a, 2018].
The results described above concerning the parallels between long genetic and literary texts were obtained by the author on the basis of his ideas about the deep connections between molecular genetics and quantum informatics, and of his hypothesis that living organisms are quantum-algorithmic entities [Petoukhov, 2018a,b; Petoukhov, Petukhova, Svirin, 2018; Petoukhov, Svirin, 2018].
Quantum computers use mainly the following mathematical formalisms: (1) unitary (or orthogonal) operators; (2) the tensor (or Kronecker) multiplication of matrices; (3) the logical operation of modulo-2 addition. The author's research has revealed that different families of structured DNA alphabets of n-plets of the nucleotides C, G, A, T, and also of hydrogen bonds, correspond to these formalisms to a high degree [Petoukhov, 2018a,b; Petoukhov, Petukhova, Svirin, 2018]. The research described in this paper also points to the possibility that applying the mathematics of quantum informatics to the modeling of biological phenomena can help in understanding many of them. For example, an adult human organism has around 100 trillion (10^14) cells, and each cell contains an identical copy of DNA, the genetic information used for the physiological functioning of the organism as a holistic system of cells. It is important to ask how such a huge number of cells reliably function as a cooperative whole. Quantum informatics and associations with quantum computers can help to model and understand holistic biological systems and can assist in creating new and effective systems of artificial intelligence on the basis of biological prototypes. (For more information on the prospects of using quantum informatics for artificial intelligence, see [Biamonte et al., 2017].)
It should be noted that in his research the author is purposefully looking for the intersection of bioinformational structures with the formalisms of quantum informatics. For example, the data given in this article concerning the probabilities of n-plets in long literary texts and long DNA sequences can also be interpreted on the basis of the known theorem: the probability of the joint occurrence of two independent events is equal to the product of the probabilities of these events. But the consideration of DNA n-plets as combinations of independent events corresponds poorly to the phenomenology of the genetic coding system, in which, for example, in single-stranded DNA the probabilities of n-plets combined from the DNA letters C, G, A, T depend on the letter order in n-plets of the same letter composition. The individual probability of the doublet CG can differ by a factor of 6 from the individual probability of the doublet GC in a long DNA sequence (see such an example in [Petoukhov, 2018b]). Since the product of the probabilities of the monoplets C and G is the same for both orders, the probabilities of the doublet CG and the doublet GC in a long single-stranded DNA cannot both be represented as this product; in other words, the probabilities of the doublets CG and GC cannot be considered as the product of probabilities of independent events C and G. In the case of DNA chains of the numbers 3 and 2 of hydrogen bonds, we are referring to the total probability for the set of doublets CG + GC, because both these doublets of DNA letters belong to one equivalence class "33" of doublets of hydrogen bonds. Since the members CG and GC of this equivalence class of hydrogen bonds are composed of dependent events, it seems inappropriate to model the probabilities of hydrogen n-plets as the product of independent events; therefore the use of quantum algorithmic modeling is preferred, with the tensor product of the probability vectors and with other formalisms of quantum informatics, which have proven themselves in other genetic models, including the concepts of multi-resonance genetics and of geno-logic coding [Petoukhov, 2008, 2011, 2016b, 2017, 2018b; Petoukhov, He, 2009; Hu, Petoukhov, Petukhova, 2017b; Petoukhov et al., 2017; Petoukhov, Petukhova, Svirin, 2018].
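The tensor-product model of n-plet probabilities discussed above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the author's implementation: the names `tensor_power` and `model_nplet_probabilities` are hypothetical, and the arguments `p0` and `p1` stand for the monoplet probabilities denoted 0P and 1P in the figure captions. Numerically, each component of the n-th tensor power is a product of monoplet probabilities, so the model assigns equal values to n-plets of the same composition, such as 01 and 10.

```python
from itertools import product


def tensor_power(p: list[float], n: int) -> list[float]:
    """Return the n-th Kronecker (tensor) power of a probability vector."""
    result = [1.0]
    for _ in range(n):
        # Kronecker product of the current vector with p.
        result = [a * b for a in result for b in p]
    return result


def model_nplet_probabilities(p0: float, p1: float, n: int) -> dict[str, float]:
    """Label the components of [p0, p1]^(n) with binary n-plets (00, 01, ...)."""
    labels = ["".join(symbols) for symbols in product("01", repeat=n)]
    return dict(zip(labels, tensor_power([p0, p1], n)))


# Hypothetical monoplet probabilities, chosen only for illustration.
model = model_nplet_probabilities(0.25, 0.75, 2)
print(model)
# The model cannot distinguish n-plets that differ only in symbol order.
print(model["01"] == model["10"])
```

Comparing such model values with the observed n-plet frequencies of a sequence is the kind of check summarized by the red and blue values in the figures.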
Referring to the connection between quantum informatics and resonances, E. Schrodinger emphasised the basic meaning of resonances in quantum mechanics: "The one thing which one has to accept and which is the inalienable consequence of the wave-equation as it is used in every problem, under the most various forms, is this: that the interaction between two microscopic physical systems is controlled by a peculiar law of resonance" [Schrodinger, 1952, p. 115]. In considering an exact balance in nature between bundles of energy lost by one system and gained by another, he noted: "I maintain that it can in all cases be understood as a resonance phenomenon" [ibid, p. 114].
L. Pauling used ideas of resonances in quantum mechanical systems in his theory of resonance in structural chemistry. His book [Pauling, 1940] about this theory is one of the most cited scientific books of the 20th century. The actual molecule, as Pauling proposed, is a sort of hybrid, a structure that resonates between two alternative extremes; and whenever there is a resonance between the two forms, the structure is stabilized. He wrote: "Among the most interesting problems of science are those of the structure and properties of substances of biological importance. I have little doubt that in this field resonance and the hydrogen bond are of great significance, and that these two structural features will be found to play an important part in such physiological phenomena as the contraction of muscle and the transmission of impulses along nerves and in the brain" [Pauling, 1940, p. 570].
In the 1940s Pauling was unaware of the existence of the double helix of DNA, but the author's results concerning the regularities of hydrogen bonds in long DNA sequences confirm to some degree his prediction regarding the important role of hydrogen bonds in the transmission of biological information. These results also seem to be related to research into the unusual properties of water in living bodies (see, for example, [Pollack, 2013] and the materials of the annual conferences on the physics, chemistry and biology of water, http://www.waterconf.org/). Many authors believe that hydrogen bonds play an important role in the transmission of information through water and other water-based liquids. Biological matter contains mainly structured water. For example, jellyfish consist of 99% water, yet are vibrant, energetic creatures. Hydrogen is a component of almost all organic matter and is present in all living cells, where hydrogen accounts for almost 63% of the atoms [Hornak, 1996]. Taking into account the existing hypotheses about the cosmic origins of life, it should be recalled that hydrogen is the main component of stars and interstellar gas. A hydrogen atom devoid of its electron is a proton; the participation of numbers of protons in the structural organization of the molecular genetic system is described in the book [Petoukhov, 2001]. It may be reasonable to consider that living bodies are, in essence, a construction of hydrogen atoms and hydrogen bonds. Future science should determine to what degree this idea is true.
Pauling's theory uses the fundamental principle of minimal energy because, when parts combine resonantly into a single unit, each member of the ensemble requires less energy to perform its own work than when working individually. Of course, this fundamental principle can be used in many other cases of resonances in different systems. In particular, in the article [Petoukhov, Petukhova, 2017b], we used the theory of oscillators with many degrees of freedom to model some phenomena of Mendelian genetics, to analyze the structures of genetic-molecular alphabets, and to explain the phenomena of segregation in these molecular alphabets as well as the existence of dominant and recessive resonances in the nitrogenous bases of DNA and RNA, which were postulated by analogy with dominant and recessive alleles in Mendelian genetics. A hypothesis concerning resonance in genetic systems as binary computers was formulated in [Petoukhov, 2016a; Petoukhov, Petukhova, 2017b]. The concept of multi-resonance genetics was put forward taking into account deep analogies between molecular-genetic structures and the mathematical theory of resonances of oscillatory systems with many degrees of freedom [Petoukhov, 2016a].
In this paper on the structural interconnections between genetic and literary texts, it is important to recall that our speech communication and singing are also based on the inherited ability to use resonances when generating and perceiving sounds. Complex structured systems of resonances of vibrational biosystems with many degrees of freedom can be assumed to be a deep reason for the described structural analogies. Resonances participate in our perception of music, which has some inherited aspects. The hypothesis concerning the close parallels between genes and music was formulated in [Josephson, Carpenter, 1996]. In our works [Petoukhov, 2008, 2015c; Koblyakov, Petoukhov, Stepanyan, 2015; Petoukhov, He, 2009; Hu, Petoukhov, Petukhova, 2017a] we described a connection between the molecular structure of DNA, including its sequences of hydrogen bonds, and the ratios of musical harmony in the Pythagorean musical scale and in a new musical system of so-called "Fibonacci-stages musical scales", which resemble the known phyllotaxis laws of inherited biological morphogenesis. Figuratively speaking, taking into account this connection, living bodies can be considered as complex musical instruments that are developed in phylogenesis and ontogenesis.
Another example of the theoretical approaches in biology relating to the principle of resonances is given by works devoted to cell language theory [Ji, 2015, 2017]. Ji postulated an analogy between enzymic catalysis and blackbody radiation, which Planck modeled on the basis of his ideas about the resonances of many oscillators. Ji has noted that some important biological phenomena are described by histograms that are analogous to the histograms of blackbody radiation. He has proposed a generalization of the Planck equation for modeling many biological phenomena that exhibit long-tailed histograms. By analogy with the principle of quantization of energy in quantum mechanics, Ji has postulated a quantization of free energy levels in enzymes.
E. Schrodinger (1944) noted: "from all we have learnt about the structure of living matter, we must be prepared to find it working in a manner that cannot be reduced to the ordinary laws of physics… because the construction is different from anything we have yet tested in the physical laboratory". For comparison, the enzymes in biological organisms work a million times more effectively than catalysts in the laboratory [Varfolomeev, 2005]. We believe that such ultra-efficiency of enzymes in biological bodies is defined not only by the laws of physics, but also by the mathematics of quantum informatics and quantum logic, and therefore, in accordance with Schrodinger, this ultra-efficiency cannot be reduced to the ordinary laws of physics.

Fig. 2. Graphical analysis results of the novel "Anna Karenina" by Leo Tolstoy (the original literary text was accessed from http://samolit.com/books/62/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. Blue points correspond to phenomenological values of the probabilities of n-plets, while red points correspond to model values of the probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P are probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

Fig. 3. The numeric representation of the analysis of the novel "Anna Karenina" by Leo Tolstoy (the original literary text was accessed from http://samolit.com/books/62/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. All values are rounded to four decimal places. Blue numbers correspond to

Fig. 4. Graphical analysis results of the novel "War and Peace" (Book 1) by Leo Tolstoy (the original literary text was accessed from http://samolit.com/books/64/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. Blue points correspond to phenomenological values of the probabilities of n-plets, while red points correspond to model values of the probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P are probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

Fig. 5. Numeric analysis results of the novel "War and Peace" (Book 1) by Leo Tolstoy (the original literary text was accessed from http://samolit.com/books/64/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. All values are rounded to four decimal places. Blue numbers correspond to phenomenological values of the probabilities for cases of alphabets named in tabular sections, while red numbers correspond to model values of these probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P denote phenomenological values of probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

Fig. 6. Graphical analysis results of the novel "Crime and Punishment" by F.M. Dostoevsky (the original literary text was accessed from http://samolit.com/books/57/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. Blue points correspond to phenomenological values of the probabilities of n-plets, while red points correspond to model values of the probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P are probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

Fig. 7. Numeric analysis results of the novel "Crime and Punishment" by F.M. Dostoevsky (the original literary text was accessed from http://samolit.com/books/57/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. All values are rounded to four decimal places. Blue numbers correspond to phenomenological values of the probabilities for cases of alphabets named in tabular sections, while red numbers correspond to model values of these probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P denote phenomenological values of probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

Fig. 8. Graphical analysis results of the novel "Idiot" by F.M. Dostoevsky (the original literary text was accessed from http://samolit.com/books/56/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. Blue points correspond to phenomenological values of the probabilities of n-plets, while red points correspond to model values of the probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P are probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

Fig. 9. Numeric analysis results of the novel "Idiot" by F.M. Dostoevsky (the original literary text was accessed from http://samolit.com/books/56/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. All values are rounded to four decimal places. Blue numbers correspond to phenomenological values of the probabilities of the named n-plets, while red numbers correspond to model values of these probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P denote phenomenological values of probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

[Table heading from a figure: A.S. Pushkin, "Evgenij Onegin" (107146 letters). Probabilities of 2 monoplets (0, 1); probabilities of 4 doublets (00, 01, 10, 11).]

Fig. 13. Numeric analysis results of the novel "Dubrovsky" by A.S. Pushkin (the original literary text was accessed from http://samolit.com/books/61/). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this novel are shown. All values are rounded to four decimal places. Blue numbers correspond to phenomenological values of the probabilities of the named n-plets, while red numbers correspond to model values of these probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P denote phenomenological values of probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.

[Table heading from a figure: Russian Bible (3122489 letters). Probabilities of 2 monoplets (0, 1); probabilities of 4 doublets (00, 01, 10, 11).]

Fig. 15. Numeric analysis results of the Russian Bible (the original literary text was accessed from http://petoukhov.com/bible.zip). Probabilities of members in alphabets of binary n-plets (n = 1, 2, 3, 4) from the binary n-plet representation of this text are shown. All values are rounded to four decimal places. Blue numbers correspond to phenomenological values of the probabilities of the named n-plets, while red numbers correspond to model values of these probabilities calculated as components of the vectors [0P, 1P]^(n), where 0P and 1P denote phenomenological values of probabilities of binary monoplets 0 and 1; (n) refers to tensor powers.