Phase Transitions in Virology

Viruses have established symbiotic relationships with almost every other living organism on Earth and at all levels of biological organization, from other viruses up to entire ecosystems. In most cases, peacefully coexisting with their hosts, but in most relevant cases, parasitizing them and inducing diseases. Viruses are playing an essential role in shaping the eco-evolutionary dynamics of their hosts, and also have been involved in some of the major evolutionary innovations either by working as vectors of genetic information or by being themselves coopted by the host into their genomes. Viruses can be studied at different levels of biological organization, from the molecular mechanisms of genome replication, gene expression and encapsidation to global pandemics. All these levels are different and yet connected through the presence of threshold conditions allowing for the formation of a capsid, the loss of genetic information or epidemic spreading. These thresholds, as it occurs with temperatures separating phases in a liquid, define sharp qualitative types of behaviour. These {\em phase transitions} are very well known in physics. They have been studied by means of simple, but powerful models able to capture their essential properties, allowing to understand them. Can the physics of phase transitions be an inspiration for our understanding of viral dynamics at different scales? Here we review the best-known examples of transition phenomena in virology and their simplest mathematical modelling approaches. We suggest that the advantages of abstract, simplified pictures used in physics are also the key to properly understand the origins and evolution of complexity in viruses. By means of several examples, we explore this multilevel landscape and how minimal models provide deep insights into a diverse array of problems. The relevance of these transitions in connecting dynamical patterns across levels and their evolutionary and clinical implications are outlined.


I. INTRODUCTION
Researchers from different disciplines tend to study viruses at their favourite level of biological organisation ( Fig. 1a-e). At the lowest scale of biological complexity, molecular virologists carefully characterise the mechanisms of viral replication and pathogenesis, the molecular interactions between viral proteins and among viral and host proteins, and how cells respond to infection by triggering diverse defense mechanisms. At the highest scale, ecologists and epidemiologists describe the incidence of viruses in their reservoir and potential novel hosts, and their spread among host populations and at the global level, including their impact on nutrient cycling. At each scale, what is considered as the 'host' is different: single cells or tissues for molecular virologists, individuals for clinical virologists and plant pathologists, populations of hosts or even entire ecosystems for ecologists and epi-ing the infected host and thus effectively reducing the rate of transmission (Doumayrou et al. 2012). Another remarkable example of seemingly opposed observations: population-level human immunodeficiency virus type 1 (HIV-1) phylogenies are mainly shaped by selective neutral epidemiological processes, implying that genealogybased population genetic inferences can be useful to study the HIV-1 epidemic history and dating events (Lemey et al. 2006). In sharp contrast, HIV-1 phylogenies reconstructed from within-host sequences indicate the action of strong selective pressures imposed by the heterogeneity of cell types in which the virus replicated (Lemey et al. 2006). Indeed, the existence of within-host reservoirs of latently infected CD4 + T cells produces a delay in the evolutionary dynamics within single hosts. These delays can fundamentally change the dynamics of the virus transmission between individuals and, hence, have an impact at the epidemiological scale (Doekes et al. 2017). Therefore, while the general connection between the infection dynamics within a host and the populationlevel transmission dynamics of viruses is widely acknowledged, a comprehensive and quantitative understanding of the integration of the two scales is still lacking.
There have been attempts to model multi-scale selection for viruses. At most, studies have tried to model between-host transmissions as a function of within-host replication parameters (Coombs et Kumberger et al. 2016). Unfortunately, the former models usually ignore the inherent within-host complexity, while the latter never extend beyond the individual tissue or single host. Gong et al. (2015) identified seven challenges for developing multi-scale models of virus evolution: (1) Lack of models and data to elucidate the processes underlying transmission probabilities and bottlenecks. From the donor host side, how does infectiousness depend on virus load? From the recipient host side, initial infection depends on the dose (bottleneck), route of transmission, time of exposure, etc. Stochastic and spatial invasion models will offer insights if fed with the appropriate empirical data.
(2) Heterogeneity within a single host. Many of the existing within-host models describe the host as a single population of target cells without any structure. This is obviously not the case and heterogeneity in cell type, spatial structure, susceptibility and immune response will all play important roles in shaping infection dynamics.
(3) Fitness landscapes may be highly dynamic and variable between host types. Very little is known about the topography of the fitness landscapes into which viral populations evolve. Evidences suggest they may be rugged but also contain some degree of neutrality (Kouyos et al. 2012; Lalić and Elena 2015). Land-scape's topography may be strongly affected by the cell types and immune responses, with epistasis (Elena et al. 2010) and antagonistic pleiotropy being essential components (Cervera et al. 2016). Indeed, it has been recently suggested that the dominant view among biologists of a static fitness landscape as a succession of valleys and peaks may be misleading to explain many observations in virus evolution. Instead, it has been suggested that an adaptive multiscape (Catalán et al. 2017), or a time-fluctuating adaptive seascape (Mustonen and Lässig 2009), may provide a much better representation.
(4) Current models do not easily incorporate highthroughput next-generation sequencing (NGS) data. Empirical studies have demonstrated that during acute and chronic infections, RNA viruses generate massive amounts of genetic variability (Domingo et al. 2012). In some cases, this genetic diversity is transmitted, in some others not. NGS provides valuable information to assess the size of bottlenecks and the spread of resistance variants. Furthermore, quantitative methods have been established at the epidemiological level, or using global level phylogeography analyses, typically based on consensus sequences, but there seem to be no well-established methods for analysis of NGS data at this level, thus missing the opportunity to link within-and betweenindividual diversity with epidemiological processes. (5) Ignoring superinfection a has greatly simplified modelling efforts. However, it is not clear when this approximation should be valid. Superinfection is known to be important in many viruses, e.g. HIV-1, increasing viral load and hastening progression to AIDS (Korenromp et al. 2009). (6) The distinction between within-and between-host dynamics can be easily made. However, the fact that both of these scales involve further nested levels has been often neglected. While multi-scale models at the population level are common, these models ignore the withinhost components. This lack of integration of within-host levels of complexity, so far, resulted from the lack of information in vivo. Fortunately, this limitation is being overcome with NGS and the development of models that take into account cellular processes (e.g., Loverdo et al. 2012).
(7) What approaches should be used to link processes across scales? Modelling multi-scale processes in full mechanistic detail, or even simulating such models, is unrealistic. One possibility is to come up with ways of extracting the essential features of lower-scale models to embed them into higher-scale models efficiently (Mideo et al. 2008). An approach that has been successfully taken is to separate timescales, which essentially separates models that may be used at different scales. Fola Superinfection refers to the process in which an already infected host becomes secondarily infected by a different virus or a different strain of the same virus. (e) contact network of hosts, represented as a graph with links indicating potential transmissions due to social behaviour or to contact with other vectors. To explore the nature of all these scales, different theoretical models can be used (right panel) each one involving a threshold separating two different regimes. Here the scale increases, from molecular self-assembly (bottom) to pandemics on geographic scales (top). Different sets of equations and the nature of the phases change as we move from one scale to the next. In grey we indicate some crucial components required to move from one level to another, from the presence of a mutational landscape to ecological interactions (allowing for arms races if mutation is included) or whether transmission between hosts is possible and the kind of interaction pattern (well mixed or heterogeneous) at work.
lowing this philosophy, Park et al. (2013) used Markov chain modelling at the within-host level embedded into a stochastic branching process for between-host transmission. The challenge is to develop better methods for incorporating multiple scales into a single framework.
As complex adaptive dynamical systems, viruses experience critical phase transitions at different levels of organisation (Solé and Elena, 2019). These critical transitions involve a sudden change in the dynamical behaviour or in the internal structure of the system. Differ-ent types of models (mathematical and computational) can be defined at each scale ( Fig.1, right panel). At the lowest molecular level, the concept of critical points at phase transitions has been used to describe process such as the assembly and disassembly of viral particles (Dharmavaram et al. 2017), the error threshold associated to the highly mutagenic replication of viral genomes (Eigen 1971), the spread of perturbations across host protein-protein and regulatory networks induced by viral factors ( In this article, we promote the idea that phase transitions should be considered as a unifying principle across the different scales of virus complexity. In the following sections, we will illustrate this idea by presenting a number of well-characterised examples of phase transitions in virology. All selected examples involve transitions between different, well-defined phases and involving distinct thresholds separating them. Some of the mathematical forms used below are summarised in Fig. 1 (right), and in some cases they have a beautiful simplicity and yet a great explanatory power. These include: self-assembly of viral particles, intra-host infection dynamic, the existence of epidemic thresholds, the triggering of large-scale pandemics on highly connected networks, and the evolution of multipartite viruses as a case of mutualistic interactions between molecular replicators. In addition, we will describe three more examples that fit within the more general class of information and diversity thresholds: the error catastrophe at increasing mutation rates, the twosides of recombination in purging deleterious variation or in driving towards extinction, and the antigenic diversity threshold in the set point of HIV-1 progression.
These models have, to some extent, an independent nature with respect to others in the hierarchy. The reason of that is that a qualitative component (grey labels, fig. 1) is required to move towards different levels. Importantly, moving to another scale does not mean that the model inherits all the features of the previous scale. On the contrary: at the new scale, many details from the previous become irrelevant. In all examples, we will provide a simplified mathematical description of the system and show the existence of these critical organisation thresholds that can be clearly identified by means of the so-called bifurcations. Readers less interested in mathematical details can jump over the next section without losing the main messages.

II. PHASE TRANSITIONS: SIMPLE MODELS EXPLAIN COMPLEX CHANGES
Phase transitions are introduced in this paper with the aim of providing a proper framework to understand the nature of change in virology. By change we refer to the qualitative dynamical and structural shifts of organisation of complex systems. The term phase transition was coined to describe the sudden shift between states of matter, but has been since then generalised to many other domains (Solé, 2011). An example is the phase change from ice to liquid water or from water to steam, as shown in Fig. 2a . Each phase is defined by some state that is uniform within its domain in the phase diagram, which essentially tells us which phases are stable under what conditions. Such properties can rapidly change close to the boundaries between phases. Figure 2b shows a classical example, namely the transition from liquid water to steam. Within a given phase, no major differences can be seen beyond smooth quantitative changes. As an example, when water is heated from 2 o C to 25 o C the same state (liquid) is observed and its density only changes slightly (by just 2 %) while it decreases 1600fold as boiling temperature is reached. In other words, a slight change close to the transition point drives a very large, abrupt change in density. Such abrupt transition is termed "first-order" and is also observed in the melting of ice. This occurs also in other diverse systems, such as polymers and other materials experiencing qualitative structural re-arrangements (Dill and Bromberg 2012), abrupt climate change (see Solé 2011 and references therein) or ant colonies exploring their environments (Beekman et al. 2001;Piñero and Solé 2019). In all these systems it is possible to define a phase diagram showing the parameter values at which the different states (phases) are found. Now, instead of temperature and pressure, the axes can include, for example, molecular concentrations or strength of ionic forces, which affect the types of capsid assemblies for the tobacco mosaic virus (TMV) (Fig. 2c-d). The process of virus assembly requires the formation of rods (Fig. 2c) which is affected by key external parameters defining a space of shapes (Fig. 2d). The structures shown here are the dominant ones in each phase (Klugg 2010).
A different class of transition, known as a second-order (or continuous) phase transition, involves a qualitative change as well but this change occurs in a smooth manner as parameters vary. This is the case for example in the behaviour of ferromagnetic materials: when a magnet is heated, it initially maintains its magnetisation, which decays until it vanishes altogether at a critical temperature T c . In these transitions, remarkably rich behaviour can be observed, including both very high variance in both structural and dynamical traits. wide fluctuations in the measured macroscopic properties (such as magnetisation) are observed as we approach criticality (the phase transition point) where the variance diverges. For this example, we have a microscopic description that allowed to formulate a simple model capable of capturing most relevant features of the real transition. This is the so called Ising model or, as physicist Nigel Goldenfeld calls it, "the Drosophila model of statistical mechanics".
A specially important consequence of the Ising model was the realisation that it was able to explain not just the presence of phase transitions. When analysing the quantitative properties of this model close to critical points, it turned out that they predicted with enormous precision the experimental observations. In other words, in order to accurately explain the experimental data, extremely simple models with the minimal description of units and their interactions turned to be enough. Actually, very different systems undergoing phase transitions were shown to respond to exactly the same class of minimal model. Such universality pervades the theory of critical phenomena (Stanley et al. 1996, Kadanoff 2000, Goldenfeld 2018) and has been used in multiple scenarios, from cosmology to social and economic systems. A full analysis of these Ising-like models would require going deeper into statistical physics methods (Goldenfeld 2018) but it can be shown that, on a coarse-grained approximation, the so called mean field b Ising-like models b Mean field models are a very powerful tool in statistical physics.
They ignore local, spatial correlations among the interacting en-6 still provide a very accurate approximation to the problem (Christensen and Moloney, 2005;Solé 2011). An additional message that emerges from the success of simple models is that even apparently too complicated systems can be properly approached by highly simplified dynamical pictures. This is in fact at the root of the success of mathematical models and, to a large extent, the success of well established disciplines like the physics of collective phenomena and, in a more general sense, the theory of complex systems. Intuition tells us that a complicated system (almost any real system we would handle) would require a proportional degree of detail and complication. If that were true, models would be seldom useful to understand reality, since they would become too complicated to be understood. The good news is that, for most complex nonlinear systems, there are a few key variables that play a major role in controlling the dynamics of different phases of a system as well as the presence of transitions between them (Haken 1975;Nicolis 1995).
In this paper we present several models for a diverse range of problems in virology. Although models incorporating microscopic dynamics have been developed, our approach is based on mean field approximations where only the population-level dynamics is taken into account. This means that we consider the average features of interactions and use them to construct population-level descriptions. This allows to show the general relevance of the concept of phase transition, its conditions for occurrence and relevance, avoiding a more technical and lengthy presentation of each case study.
A. Self-assembly of viral particles As a first example of phase transition at the molecular scale, let us consider the self-assembly of viral capsids. This process depends on energy-driven physical forces. Self-assembly results from cooperative interactions and is a crucial component of emergent dynamics ( Zlotnick (2004), understanding the physics of viral self-assembly requires a proper knowledge of the requirements of viral stability and how it is regulated. In this section we consider the problem of how key parameters such as the concentration of capsomers can act as a control parameter for a phase transition separating a set of disconnected units from a fully formed capsid. How such model can be formulated and what are its consequences? tities assuming the system is perfectly mixed. This assumption usually allows deriving dynamical equations.
Several studies have used physical models of capsomere structure and interactions, usually based on different extensions of well-known reaction kinetic models including self-assembly and polymerization (Kushner 1969)  Let us now explore the simplest picture of self-assembly processes based in a kinetic model leading to a dynamical pattern of aggregation characterised by the presence of a second-order phase transition behavior in micelle aggregation (Dill and Bromberg 2011) or the assembly kinetics of spherical capsids (Endres and Zlotnick 2002). In both cases, the same minimal model can be sued, where a set of n objects (capsomers) aggregate to form a closed system (the capsid). If we indicate by A 1 single building capsomeres and as A n the full assembled capsid, the reaction for the whole process could be easily described by If we indicate by x the concentration of capsomers, the parameter K allows to write Which gives ν ≈ 1 for small x and ν ≈ n for large x. This solution predicts the existence of a critical capsomer c This is a simple approximation that assumes that at equilibrium only capsomers and full capsids are present at significant concentrations. A more realistic approach to the process would require a set of single-step reactions (Zoltnick 1994) but the observed nonlinear effects described here remain essentially valid.

Preprints
and it can be shown that the shape of the ν(x) follows a marked stepwise behaviour. This cooperative behaviour indicates that, once a critical capsomeres concentration x c is reached, the system experiences a rapid and irreversible transition into large structures with a characteristic size. This occurs in a thermodynamically favoured direction, and thus the resulting assemblies are highly stable structures. In a chemical system formed by inert molecules, self-assembly takes place by an energy-minimization process (that is captured in the irreversible reaction described above) eventually ending up in stable assemblies with a characteristic size. Continuous translation of the virus' capsid protein gene(s) into capsomeres creates the conditions for this transition as the concentration of capsomeres in the cell cytoplasm increases until the x c required for selfassembly is reached.
The specific application of this approach to the self-assembly of viral capsids can be used to obtain an expression for the fraction of capsids f (ρ) present in a given system (such as the cytoplasm) as a function of the capsomere concentration (ρ) assuming that (as before) we neglect all molecular intermediates except free capsomeres (Hagan 2014). Once again, the total concentration of capsid units ρ T is given by: where N is the number of capsomeres necessary to build a capsid, and ρ 1 and ρ N the densities of single capsomeres and whole capsids, respectively, one can define the fraction of capsomeres already forming part of capsids using The critical concentration ρ c reads (Hagan 2014) for ρ T ρ c , from a second phase in which virus capsids form, and f reads when ρ T ρ c . The phase transition curves predicted from the model are displayed in Fig. 3a, using three different values of N . The results of an experimental test of this model are shown in Fig. 3b, where different concentrations of capsomeres have been used under variable salt concentrations enhancing the self-assembling process (see Hagan (2014) for details).
The study of the microscopic processes underlying viral assembly has been a very active area where the use of both kinetic and thermodynamic (physical) models has been very successful. In particular, molecular dynamic models provided the basis for describing the details of such self-assembly using energy functions that capture the nature of the underlying potentials (Rapaport 2010(Rapaport , 2014. In Fig. 3c we display an example of the simulation outcome of this process, where the units have a three-dimensional geometry and assembly follows several intermediate steps, involving for example the formation of shells. More sophisticated models allow to study the interaction between genomes and capsomeres as the full viral particle is formed (Permuter et al 2013).

B. Intrahost infection dynamics
When a model of cell-virus interactions is to be build, population biologists usually consider the number of individuals in each two species (say X and Y for cells and virus, respectively) as the key variables. Interaction, growth, and mortality parameters are then introduced and the main problem is how to properly express the functional relations associated to all these processes (Case 2000). The reactions required to represent this Lotka-Volterra type model (May 1973; Gotelli 1988) are represented schematically in the inset of Fig. 4. One could say that this is too simple to represent the true complexity of a natural virus-cell system, but the truth is that this toy model accounts for one particularly relevant property of these systems, namely the presence of cycles, which are a consequence of the internal dynamics of the system. Instead of being driven by some external driver, the nonlinearities of the model fully account for the emergence of oscillatory behaviour.
Consider now the problem of a minimal model of viral infection ( Nowak et al. 1996) and assumes that a population of cells is generated at some constant rate λ. These cells are the target of a virus that infects them at some rate k and this process generates infected cells, which produce virus particles at some rate p (the burst size). All the three populations decay with given rates (indicated as ε, δ and γ in the inset of Fig. 4).
The set of interactions also involves pairwise 'reactions', but in this case viruses transform their host (turn- and the equilibrium value for the virus population given by The last expression gives a critical condition for the virus to persist: since we need V s > 0 to meet this condition, we have p/ > "/k or, by rearranging terms, if R 0 is the so called basic reproductive number and gives a threshold condition for infection success. This model gives intuition concerning the potential outcomes of the infection dynamics. It has two alternative steady states, obtained from dT /dt = dI/dt = dV /dt = 0. The first corresponds to the virus-free system, and is given by S 0 = ( /", 0, 0), whereas the second point provides the steady state S 1 = (T s , I s , V s ) where we have a steady infected population and the equilibrium value for the virus population given by The last expression gives a critical condition for the virus to persist: since we need V s > 0 to meet this condition, we have p/ > "/k or, by rearranging terms, if R 0 is the so called basic reproductive number and gives a threshold condition for infection success.
This model gives intuition concerning the potentia outcomes of the infection dynamics. It has two alternative steady states, obtained from dT /dt = dI/dt = dV /dt = 0. The first corresponds to the virus-free system, and is given by S 0 = ( /", 0, 0), whereas the second point provides the steady state S 1 = (T s , I s , V s ) where we have a steady infected population and the equilibrium value for the virus population given by The last expression gives a critical condition for the virus to persist: since we need V s > 0 to meet this condition we have p/ > "/k or, by rearranging terms, if R 0 is the so called basic reproductive number and gives a threshold condition for infection success. R 0 > 1. R 0 infection rate model assumes that we have a well mixed ergodic system where encounters between viruses and cells occur at random with homogeneous rates. This is of course another simplification: susceptible cells are usually strongly structure in space forming tissues, the model considers no evolution of the viral component, which of course is also a very strong assumption.
The following set of equations describe the basic dynamics outlined in Fig. 3: This model gives intuition concerning the potential outcomes of the infection dynamics. It has two alternative steady states, obtained from dT /dt = dI/dt = dV /dt = 0. The first corresponds to the virus-free system, and is given by S 0 = ( /", 0, 0), whereas the second point provides the steady state S 1 = (T s , I s , V s ) where we have a steady infected population and the equilibrium value for the virus population given by The last expression gives a critical condition for the virus to persist: since we need V s > 0 to meet this condition, we have p/ > "/k or, by rearranging terms, if to the a m clu as int tio wh rem gen we we scr cov an sys wh po of con dim act model assumes that we have a well mixed ergodi tem where encounters between viruses and cells oc random with homogeneous rates. This is of cour other simplification: susceptible cells are usually st structure in space forming tissues, the model consid evolution of the viral component, which of course a very strong assumption.
The following set of equations describe the bas namics outlined in Fig. 3: This model gives intuition concerning the pot outcomes of the infection dynamics. It has two native steady states, obtained from dT /dt = dI dV /dt = 0. The first corresponds to the virus-fre tem, and is given by S 0 = ( /", 0, 0), whereas the s point provides the steady state S 1 = (T s , I s , V s ) we have a steady infected population ◆ and the equilibrium value for the virus population by The last expression gives a critical condition for the to persist: since we need V s > 0 to meet this cond we have p/ > "/k or, by rearranging terms, if model assumes that we have a well mixed ergodic system where encounters between viruses and cells occur at random with homogeneous rates. This is of course another simplification: susceptible cells are usually strongly structure in space forming tissues, the model considers no evolution of the viral component, which of course is also a very strong assumption.
The following set of equations describe the basic dynamics outlined in Fig. 3: This model gives intuition concerning the potential outcomes of the infection dynamics. It has two alternative steady states, obtained from dT /dt = dI/dt = dV /dt = 0. The first corresponds to the virus-free system, and is given by S 0 = ( /", 0, 0), whereas the second point provides the steady state S 1 = (T s , I s , V s ) where we have a steady infected population and the equilibrium value for the virus population given by The last expression gives a critical condition for the virus to persist: since we need V s > 0 to meet this condition, we have p/ > "/k or, by rearranging terms, if ing T cells into infected I cells) into a different type of 'particle' and moreover they are produced by the infected cell population I at a constant rate. Additionally, the model assumes a completely mixed system where encounters between viruses and cells occur at random with homogeneous rates. This is of course another simplification: susceptible cells are usually strongly structured in space forming tissues, the model considers no evolution of the viral component, which of course is also a very strong assumption.
The following set of equations describes the basic dynamics outlined in the inset of Fig. 4: This model gives intuition concerning the potential outcomes of the infection dynamics. It has two alternative equilibrium points (steady states), obtained from dT /dt = dI/dt = dV /dt = 0. The first corresponds to the virus-free system, and is given by S 0 = (λ/ε, 0, 0), whereas the second equilibrium point provides the steady state S 1 = (T s , I s , V s ) where we have a steady infected population and the equilibrium value for the virus population given by The last expression gives a critical condition for the virus to persist: since we need V s > 0 to meet this condition, we have pλ/δγ > ε/k or, by rearranging terms, if R 0 is known as the basic reproductive number, representing a threshold condition for infection success occurring when R 0 > 1. R 0 thus defines the critical value for a second-order transition, as shown in Fig. 4 Perelson and Weisbuch 1997 for a more physics-oriented view). Despite the potential richness and high-dimensional character of extended models, it is often feasible to find the conditions for successful propagation by means of simple models. This has also important consequences for defining the critical conditions for successful therapies (Perelson 2002).

C. Epidemic thresholds
Epidemiological models are diverse in their structure and complexity, depending on how many categories of individuals are included and whether they are deterministic or incorporate stochastic components. The simplest possible model is the susceptible-infected-susceptible (SIS) one, that only includes susceptible (S) and infected (I) individuals. This is a toy model, and some basic assumptions are required. The first one being that I + S = N , which means that, at some scale, the total population remains constant. Secondly, there is no heterogeneity and thus all interactions between individuals are weighted with exactly the same parameter values. In a well-mixed system, the rules outlined above can be described with two reactions associated to infection and recovery, given by: It is easy to show that the equations describing our system are: and since the total population I + S is conserved, we can use a normalized set of densities, with ρ = I/N and S/N = 1 − ρ and write a one-dimensional model: consistently with the previous result. We now have a one-variable model that can be solved analytically. This equation is actually the well-known logistic model with density-independent decay rate. This simple model is known to suffer a smooth transition as µ decreases below a given threshold (Fig. 5). Indeed, the mathematical picture of this smooth continuous phase transition is given by the so called transcritical bifurcation (Strogatz 1994). Equation (17) can actually be re-written in a logisticlike form by defining ρ * = 1 − α/µ, i.e.
The solution of this differential equation is: ρ 0 being the initial condition. After long periods of time, i.e., in the limit ρ ∞ = lim t→∞ ρ(t) two solutions are possible, namely ρ ∞ = ρ * when µ > α and ρ ∞ = 0 when µ < α. The first point involves a stable epidemic event that would infect a fraction (1−α/µ) of individuals of the population, whereas the second represents the extinction of the virus. The critical point µ c = α separates the subcritical phase, where the epidemics dies out from the supercritical phase, where the epidemics is self-maintained. An important result from the previous model definition is that, at the supercritical phase, the initial growth of the epidemics is exponential. This can be shown by assuming Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 29 October 2020 doi:10.20944/preprints202002.0261.v2 that the current relative frequency of infected individuals is very small, i.e., ρ 1. In this case, it is possible to make the approximation 1 − ρ ≈ 1 and thus the equation for epidemic spreading now becomes: where we define R 0 = µ/α, as the basic reproductive number. Using this definition, an exponential growth is obtained which is positive (and epidemic spreading occurs) provided that R 0 > 1 and the epidemic would die out otherwise. R 0 can also be seen as a the critical point at which the system transitions from no epidemic to epidemic success. Values of R 0 are quite variable among viruses. For example, it was estimated in the range 1.5 -2.5 for the Ebola virus. It ranges from 2 to 5 for the sexual transmission of HIV-1, while the measles virus, which is airborne transmitted, has R 0 values ranging from 12 to 18. It is interesting to note that R 0 involves several components, including the infectivity of the pathogen µ, the recovery rate α and the population size N . Therefore, the behaviour of the critical point can be explored as a function of any of these three parameters, e.g., µ as shown in Fig. 5. This very basic SIS model can be improved by adding specific spatial structure: hosts live in a lattice and the probabilities of transmission depend on the physical distances between individuals in the lattice. Obviously, this system makes computations slightly more complicated, but the conclusion is that a critical R 0 can still be recovered as a function of the rate arrival of infected individuals into the population, β, and the rate of recovery, α (Anderson and May 1998): R 0 =β/α, which still defines a second-order phase transition. The exact location of the transition point has moved and the shape of the curve on the right hand side (Fig. 5) is slightly different, but the phenomenon itself remains preserved.

D. Large-scale pandemics on networks
In the previous section we have considered the very simple SIS model of epidemic spreading that occurs in a well mixed (mean field) context. Similar results would be found by assuming that individuals interact at random with a given probability. However, the networks of interactions among humans can depart from the mean field approach. Similarly, transportation networks connecting humans on large scales strongly depart from these simplified pictures If we look at the probability P (k) of a node (e.g., an airport) being connected to k other nodes, they follow heavy-tailed distributions described as scaling laws, i.e. P (k) ∼ k −γ , where γ is known as the scaling exponent. Here we will explore the problem of epidemic spreading in scale-free networks by means of a SIS model. Each node in the graph of interactions represents an individual and each link a potential transmission event. The average density of infected individuals ρ(t) (prevalence) at the mean-field level is where µ is the infectivity of the virus and k the average degree of the graph (i.e. the number of connections per node). As we can see, we have just recovered the mean field equation for the SIS model, given by Eq. (17). By defining an effective spreading rate λ = µ/α (this implies rescaling time i.e. t → αt) we can write: The benefit of this mean-field equation stems from the fact that density correlations are ignored. On random graphs and related graphs, one can assume that k k . It is easy to demonstrate that a non-zero epidemic threshold exists at λ c = k −1 such that So far, everything seems pretty much the same, but a crucial property of scale-free networks changes everything The fluctuations k 2 in scale-free networks diverge for any value of the critical exponent 2 < γ < 3 and thus highly connected nodes are statistically significant: the mean field approximation breaks down. In order to take into account these fluctuations, the relative density ρ k (t) of infected nodes with given connectivity k must be taken into account. The mean-field equations can thus be written as for k = 1, ...N . A new term Θ(ρ(t)) indicates the probability that any given link points to an infected node. The probability that a link points to a node with k links is proportional to kP (k). A randomly chosen link is thus more likely to be connected to an infected node with high connectivity, yielding where k kP (k) = k by definition. At the steady state dρ k (t)/dt = 0, and hence and the following relation follows: where Θ is now a function of λ alone. The solution Θ = 0 is always satisfying the previous equation. A non-zero stationary prevalence (ρ k = 0) is obtained when the right hand and the left hand sides of Eq. (34), expressed as function of Θ, cross in the interval 0 < Θ ≤ 1 allowing a nontrivial solution. This leads to the following inequality d dΘ (See Pastor-Satorras and Vespignani 2001 for details) be-ing satisfied, defining the critical epidemic threshold by: or rewritten as which is nothing but the inverse of the coefficient of variation of the network's degree (Anderson and May 1991). This result means that in scale-free networks with γ ∈ (2, 3) λ c = 0 (see dashed line in Fig. 6b)  previous equation and calculate the prevalence in the endemic state as follows: In the particular case of the Barabási-Albert network with γ = 3, we have ρ ∼ exp(−C/λ) where C is a constant.
The absence of any epidemic threshold in this network can be understood by noticing that in heterogeneous systems R 0 contains a correction term linearly dependent on the standard deviation of the connectivity distribution. In scale-free networks the divergence of the connectivity fluctuations always leads to an R 0 > 1 at any rate λ. This ensures that epidemics always have a finite probability to survive indefinitely. This of course makes more difficult to properly exploit a propagation threshold and define containment strategies. However, it is possible to "cure" the hubs (by immunizing them) and generate an effective threshold ( The key idea here is that, once these highly connected nodes are protected from infection, the remaining graph is not fat-tailed anymore, as shown by the inset graph in Fig. 6d. The presence or absence of epidemic thresholds in the real world has thus considerable implications, both in the context of disease spreading and the propagation of computer viruses (see Solé and Elena 2019 and references therein).

III. THE EVOLUTIONARY DIMENSION: MULTIPARTITE VIRUSES AND MUTUALISM
The previous examples illustrate how simple dynamical models can display marked changes in the behaviour of a diverse range of problems in virology. Except for the self-assembly problem, all these case studies share a unifying feature: in a way or another, the transition requires achieving a critical threshold level of infection beyond which propagation occurs. There is, however, another layer of complexity beyond these two-phase systems that has to do with the nature itself of ecological interactions and their impact on virus evolution. Crossing thresholds in these cases has consequences for the presence or absence of cooperative interactions among different viral components.
Transitions between different kinds of population level dynamics, such as competition versus mutualism or parasitism, are particularly important for evolutionary biology. What kind of transitions can be found in models of virus dynamics? A very interesting example is given by a particular class of RNA viruses known as multipartite viruses . In a multipartite virus, no single viral particle contains all the genomic segments, which appears segregated in two or more particles. As a consequence, the information required for a full infection cycle is not linked into a single molecule but distributed.
An example of multipartite virus (Fig. 7a) is tobacco rattle virus (TRV), which infects, along with tobacco, many other plant species. It consists of two particles, a long and a short one. The first contains the RNA1 segment encoding for the replicase, while the second particles encapsulates RNA2 segment that encodes for the coat protein. As a consequence, the long particle can infect and trigger the disease on its own, but the RNA1 will be trapped in the plant unless the coat protein is present. A full cycle of infection thus requires the two particles to be present. This kind of constraint creates somewhat of a puzzle: viruses are typically seen as very efficient, optimal replicators, but the picture of viruses split into different particles seem to contradict this picture. What are the consequences of this genome splitting? A model brought forward by Nee (2000) illustrates the richness of population models of multipartite viruses and the presence of different types of ecological interactions. The model includes three variables and the transitions between states, as described in Fig. 7b. Here a given habitat is considered, which can be visualised as a grid of patches, as usual in many models of metapopulation dynamics (Mollanen and Hanski 1998; Hanski 1999). Patches can be empty, occupied by a plant carrying only RNA1 or plants carrying the two components of the multipartite virus. These populations (normalised) are indicated as x, y, z respectively. A fraction 0 ≤ h ≤ 1 of this habitat is occupied.
Two main events are considered here: colonisation, at a given rate c and extinction, at a rate e. Both are probabilities and thus 0 ≤ c, e ≤ 1. The basic model defining the dynamics of each component is given by (Nee 2000): with a normalization condition defined by: This constraint allows to reduce our three-equation model to a two-dimensional system, i. e.
The equilibrium points obtained from dy/dt = dz/dt = 0, are (after some simple algebra) A domain involving three possible equilibrium points (two of them stable) is given by the condition i.e. the argument within the square root needs to be positive. This will occur provided that This is illustrated in the bifurcation diagram in Fig.  7c, where the possible equilibrium values of z are plotted against the available population size h. The discontinuous jump separating the phases of mutualism and extinction is given by the so-called saddle-node bifurcation, which often arises in systems with positive feedbacks such as facilitating or cooperation. The diagram thus indicates that a minimal accessible number of individuals (plants in our metapopulation context) is required to allow for the propagation of the multipartite viruses. This is indicated by the extinction phase (left), separated by a mutualistic, persistence phase (grey domain). The positive solution starts at a value z * = Σ defining the minimal (critical) population size. This result is related to those found within the context of extinction thresholds in metapopulation models due to habitat loss and fragmentation including continuous (Bascompte and Solé 1996)

IV. INFORMATION AND DIVERSITY TRANSITIONS
In this section we expand our previous case studies of phase transitions with three more examples. These cases are not related to transitions at different levels of virus organisation, as we have discussed so far, but with another type of phenomena: phase transitions involving changes in information and genetic diversity. The first example is the well known phenomenon of the meltdown of genetic information at increasing mutation rates, with viral populations entering into the so-called error catastrophe regime. This is a nice example of second-order phase transition. The second example has to do with the dual role of homologous recombination in the rescue of mutationally compromised populations if recombination is at some critical value, or driving them to extinction if recombination is far too high. Interestingly, the first situation corresponds to a second-order phase transition, while the second situation corresponds to a first-order one. The third example deals with the role of withinhost HIV-1 diversity in the progression towards AIDS, which is also a second-order phase transition.

A. The error catastrophe
Because of their intrinsic simplicity and their strong dependence upon the host molecular machinery to complete their cellular infection cycle, viruses are unique dynamical systems. One particularly important trait of RNA viruses is their high mutation rates, much higher than any other rates exhibited by cellular systems and a consequence of the lack of repair mechanisms associated to their RNA-dependent RNA polymerases (the inset in Fig. 8 shows the structure of hepatitis C virus replicase). This enzyme catalyses the replication of RNA templates producing new RNAs and mutation rates per nucleotide and replication cycle are in the range 10 −4 −10 −5 (Sanjuán et al. 2010). Recent research has also determined mutation rates of about 10 −3 per base per cell in the revers transcriptase of HIV-1 (an RNA-dependent DNA polymerase) in peripheral blood mononuclear cells (Cuevas et al. 2015). In DNA-based systems, such as the cellular hosts, the process of DNA polymerisation is usually associated to a proofreading and highly efficient and redundant repair mechanisms that effectively reduces mutation rates to a range 10 −8 − 10 −11 ensuring a controlled replication cycle (Drake et al. 1998). Since high mutation carries a burden of genetic errors, this implies that many resulting viral genomes can contain deleterious changes leading to non-viable viral particles.
Mutation is a crucial component of evolution, as genetic variability is the fuel on which natural selection operates to adapt populations to their environment. In this sense, an error-prone polymerase can be seen as useful to keep pace with the always changing environmental conditions in which RNA viruses live (Domingo 2000). However, keeping in mind that mutation is a random process independent on the value that mutations may have in the future generations, mutation itself is a double-edge sword: too many mutations per genome may simply drive fitness levels to such a low values that would not be compatible anymore with a successful replication. Therefore, mutation rates, as any other trait, have evolved and have been optimised for the lifestyle of RNA viruses: just high enough but not more (Elena and Sanjuán 2005). For RNA viruses, a heterogeneous population results in a socalled viral quasispecies (Eigen 1971;Eigen et al. 1987;Schuster 1994;Domingo and Holland 1994;Domingo et al. 1995;Domingo et al. 2012). A viral mutant swarm can be seen as a group of genomes dominated by a master sequence of high fitness that may, or may not, coincide with the average sequence of the population, the consensus sequence.
The quasispecies population structure has many implications for the biology of RNA viruses. The most important is that the mutant swarm stores a reservoir of phenotypes crucial to cope with environmental uncertainties: within the context of the virus infection and pathogenesis, that includes the host responses tied to immunity but also others such as tissue specificity or resistance to drugs ( One particularly unexpected consequence of the quasispecies nature of viral populations is deeply connected to the informational nature of RNA viruses. This is known as the error catastrophe problem (Eigen 1971;Eigen et al. 1987;Schuster 1994;Domingo and Holland 1994) and is tightly related with second-order phase transitions. It was originally defined within the context of an abstract population of mutating molecular replicators competing for limited resources. More precisely, Eigen and Schuster considered a large population of genomes where each sequence could replicate at some fix rate. Replication rate will be sequence-dependent and the relation between sequence and growth rate should be expected to be complex. Additionally, it is assumed that every time a string replicates, mutations can occur at a given rate µ. Eigen (1971) predicted that there is a critical mutation rate, µ c , that decays as µ c ∼ 1/ν beyond which no Darwinian selection can occur, and thus no viable sequences would be observable for mutations higher than µ > µ c . In that case, random drift would be observed. Instead, below the threshold, information can be maintained in stable ways. Experimental data confirms this inverse relationship. and thus mutation rates decrease as an inverse power law of genome length (Fig. 8). (if any) of mutation rates in RNA viruses? Eigen-Schuster quasispecies model (Eigen 1971) considers a set of populations {x i } representing the abundance of different genomes, changing in time by the following set of dynamical equations: where x i indicates the fraction of the population associated to the i−th mutant genome equipped with a Mletter alphabet (here i = 1, ...n, where n = M ν is very large, ν being the length of the genome) so that a normalization condition applies, namely: n j=1 x j = 1. Here f j is the growth rate of the j-th mutant, µ(j → i) is the probability of having a mutation from sequence j to sequence i and Φ(x) d is the average fitness associated to the population vector x = (x 1 , ...x n ), i.e., This model can sometimes be treated analytically under a number of well defined set of conditions, showing that the population structure corresponds to a cloud of sequences (Eigen 1971;Eigen et al. 1988Eigen et al. , 1989Schuster 1994).
In this section we consider a specific case that will illustrate how mutation can sharply limit the length of genomes and thus the amount of information stored in a quasispecies. The problem considered here can be mapped into a high-dimensional sequence hypercube (Fig. 9) , where each string is a digital genome, connected to nearest neighbors in the cube that differ by one bit (single mutation), considering binary genomes i.e. M = 2. Without loss of generality, the general model described above can be collapsed into only two fitness classes, namely f m for the master and f for any other sequence of the mutant spectrum (i.e., f 1 = f 2 = ... = f n = f ) where n is very large (n 1). Hereafter it is assumed that f m > f , i.e., the master sequence replicates more efficiently than any other sequence (Swetina and Schuster 1982). Assuming that i.e., µ(i → j) = µ, we can split our system of equations into two sets: the master sequence and the mutant sequences. The system presented below is only d Φ(x) is usually termed as the dilution outflow, which ensures a constant population ( n i=1 x i = constant and n i=1ẋ i = 0) also introducing competition between replicating genomes a simplified approximation to the space connecting different genomes. A more accurate picture is provided by Fig. 9(b): from a given population x j , mutation will not lead back to the master sequence or will be difficult to occur. In this case, we will have x m + x = 1 where we use x = j x j . For the master sequence we get: whereas the set of equations for the mutant sequences reads: It is easy to show, after some algebra, that the first equation can be simplified to where we have used µ/n 1. Since the ultimate goal is to describe the dynamics of all these populations together, so we made the sum over all possible mutant genomes, i.e., Given the homogeneous mutation rates, a stationary distribution will lead to x i ≈ x/n and thus the approxima- and as a result of these approximations, the twocompartment model ( Fig. 9(a)) can be collapsed into a one-dimensional system (Swetina and Schuster 1982): As we did above for the replicator model, the condition x m + x = 1 leads to Φ(t) = f = f m x m + f x. Using this function, and since x = 1 − x m , it can be obtained after some algebra an equation for the master sequence: Two alternative equilibria are possible. The first one being x m = 0, that corresponds to the extinction of the a b mains dominant regardless the replication m strong-effect mutations accumulate less th mild effects, irrespective of the mode of rep GR is more sensitive to the accumulation than SMR (Fig. 4a), as indicated by the ste positive master strands. A similar situation o diate mutation rates ( ϭ 0.25) (Fig. 4b) modes accumulate more mild-than strong-ef the GR accumulates proportionally more m higher mutation rates ( ϭ 0.6) (Fig. 4c) r SMR in an important way, that is, positive m not dominant anymore for the entire range o ities and, instead, the mutant ones become t class, although it is still possible to recover th along all the range of mutational severities Mutation rate

Per-bit mutation rate
Using a string description of the population structure (Sole´et al., 1998), we can measure the expected effects of changing replication rates on the distribution of mutants and their frequencies. Here each string S k ðk ¼ 1; . . . ; NÞ is a small genome of size n, i.e.
where S i k 2 f0; 1g represents a vertex S k 2 H n of a ndimensional hypercube (see Fig. 3). The algorithm starts with an initial population of master sequences and repeats, at each generation, N times the following set of rules: 1. We take a string S i at random from the population and replicate it with probability f ðS i Þ. Here two replication probabilities are also defined, one for the master sequence (where S k ¼ 1 for all k ¼ 1; . . . ; n) and the other for the rest of strings, to be indicated as f 1 and f 2 , respectively. 2. Replication takes place by replacing one of the strings in the population (also chosen at random) say S j aS i by a copy of S i . The copy mechanisms present error (mutation rate m b ) per bit and replication cycle, respectively.
Using digital genomes of size n ¼ 32 we represent, in Fig.  4, the evolution of the master sequence frequency at decreasing fitness values for such sequence. For this particular run, we have used m b ¼ 0:04 and f 2 ¼ 0:05 in a population of N ¼ 500 strings. Using Eq. (8), we have a theoretical estimate for a replication threshold of f c 1 ¼ 0:184, which fits well the simulation results. As expected, beyond the critical rate f c 1 the master sequence (thick line) is not found at all in the population, whereas the other sets drift at random through sequence space. In Fig. 4, we also display the frequencies for the sequences with 1; 2; 3; . . . ; n different bits from the master one (narrow lines). Beyond such replication threshold the whole population consists of a wide spectrum of mutants with different sequences and thus a larger population diversity, as seen in the HCV cases with low viral load. The time evolution of the master sequence is represented in Fig. 5.

MSeq MSeq
(a) (b) Fig. 3. Sequence space for a small-sized, four-bit ðn ¼ 4Þ strings. All but one of the sequences have the same fitness f 2 . The bottom left node has a fitness f 1 . The replication rates thus define a fitness landscape. Two situations are indicated here, namely: low (a) and high (b) master sequence replication rates. At low replication of the master sequence, other sequences are equally able to occupy the space, whereas for higher master replication it shows a localized quasispecies centred around the master.
The lower drawings indicate what should be expected to observe in terms of the abundances of each string. Here the master sequence frequency is indicated with a black bar. This model can sometimes be treated analytically under a number of well defined set of conditions, showing that the population structure corresponds to a cloud of sequences (Eigen 1971;Eigen et al. 1987;Schuster 1994). Let us consider a specific case that will illustrate how mutation can sharply limit the length of genomes and thus the amount of information stored in a quasispecies.
The general problem considered here can be maped into a high-dimensional sequence space, with the structure of a hyper-cube (fig, 5) where each string is a digital genome, connected to nearest neighbours in the cube that di↵er by one bit (single mutation). Without loss of generality, the general model described above can be collapsed into only two fitness classes, namely f m for the master and f for any other sequence of the mutant spectrum (i.e., f 1 = f 2 = ... = f n = f ) where n is very large (n 1). Hereafter it is assumed that f m > f, i.e., the master sequence replicates more e ciently than any other member of the mutant cloud. Let's asume for now that all mutation rates are the same, i.e., µ(i ! j) = µ. In this case we can split our system of equations into two sets: the master sequence and the rest. The system presented below is only a simplified approximation to the space connecting di↵erent genomes. A more accurate picture is provided by Fig. 7b: from a given population x j , mutation will not lead back to the master sequence or will be di cult to occur. In this case, we will have For the master sequence we get: This model can sometimes be treated analytically under a number of well defined set of conditions, showing that the population structure corresponds to a cloud of sequences (Eigen 1971;Eigen et al. 1987;Schuster 1994). Let us consider a specific case that will illustrate how mutation can sharply limit the length of genomes and thus the amount of information stored in a quasispecies. The general problem considered here can be maped into a high-dimensional sequence space, with the structure of a hyper-cube (fig, 5) where each string is a digital genome, connected to nearest neighbours in the cube that di↵er by one bit (single mutation). Without loss of generality, the general model described above can be collapsed into only two fitness classes, namely f m for the master and f for any other sequence of the mutant spectrum (i.e., f 1 = f 2 = ... = f n = f ) where n is very large (n 1). Hereafter it is assumed that f m > f, i.e., the master sequence replicates more e ciently than any other member of the mutant cloud. Let's asume for now that all mutation rates are the same, i.e., µ(i ! j) = µ. In this case we can split our system of equations into two sets: the master sequence and the rest. The system presented below is only a simplified approximation to the space connecting di↵erent genomes. A more accurate picture is provided by Fig. 7b: from a given population x j , mutation will not lead back to the master sequence or will be di cult to occur. In this case, we will have x m + x = 1 where we use x = P j x j . For the master sequence we get: whereas the set of equations for the mutant sequences reads: It is easy to show that the first equation can be simplified to where we have used µ/n ⌧ 1. For the equation describing the mutant classs, we have: master sequence and the full dominance of the pool of mutants, and a nontrivial equilibrium allowing the coexistence of both master and mutant sequences. The later solution will be positive (and the master sequence will be present) provided that x m > 0 and this will occur if the mutation rate is lower than the critical value: This expression defines the boundary separating two well defined phases, shown in Fig. 10(a). Within the quasispecies phase, master genomes will be present, along with a tail of mutants. Once the boundary is crossed, all master sequences disappear, despite their higher replicating efficiency, and only mutant genotypes remain in the population (grey area in in Fig. 10a-c). This boundary actually corresponds to a transcritical bifurcation that causes a smooth transition (Bull et al. 2005;Sardanyés 2009 Fig. 10 display results using bit-string simulations, respectively (see the next paragraph for the meaning of mutation in bit-string systems). Specifically, they show how the stationary distribution of sequences change at increasing mutation rate (solid arrow in Fig. 10a) and decreasing the replication rate of the master sequence (dashed arrow in Fig. 10a, see also Solé et al. (2006)).
As discussed above, a scaling law seems to connect the mutation rate of a given genome and its length (Fig. 8).
We are now in the position of mathematically deriving this remarkable dependence. In the previous calculations, µ defines the mutation rate per genome, but the accurate definition of mutation is to express it as changes per nucleotide site (i.e., relative to the string of length ν). If µ b is the mutation rate per site (per-bit mutation), it is not difficult to see that it is related to µ by: Here p = (1−µ b ) ν is the probability that none of the units are mutated, the difference 1 − p is just the probability that some unit (and thus the genome) does mutate. Since µ b is typically very small, we have: If we return to the previous critical condition for mutation rates and write it down as a function of µ b it turns out of where α > 0 is a scaling factor. The last expression actually corresponds to the observed inverse decay of mutation rates as an inverse of their genome size, as shown in Fig. 8.
This work has inspired the suggestion that deep connections exist between the critical threshold in RNA viruses and other transition phenomena such as cancer relapse when genomic instability grows beyond a (still to be tested) critical threshold (Solé 2003 As a final point, it is worth mentioning that the similarity between the quasispecies as described by Eigen's theory and the physics of phase transitions goes beyond the analogy. It is actually possible to show that there is actually a mapping between this theory and the dynamics of the two-dimensional Ising model (Leuthäusser 1986(Leuthäusser , 1987Tarazona 1992). Specifically, it is possible to establish a parametrisation connecting mutation rate with a temperature parameter and the existence of a universal theory that pervades both classes of models. These The quasispecies model presented above defines a minimal approach to a largely complex problem, where other phenomena can play an important role. An important phenomenon, pervasive among RNA viruses, and that contributes to generate genetic variability along with mutation is recombination. To explore whether recombination also generates informational phase transitions, let us consider, once again, our simplified model where a twospecies (master and non-master) are used, but now we also introduce the possibility of recombination between individual genomes. This process implies an additional form of diversity generation as genomes not only display mutations but also get scrambled in many ways through recombination. How does the existence of genome recombination affect the presence of error thresholds?
One approximation involves the same minimal model just described in the previous section. Consider the previous landscape and a new phenomenon that affects the generation of new sequences. Now two strings can experience cross-over thus generating, at a given rate r, new strings from the old ones. The likelihood that such event takes place grows with the amount of sequences. It can be shown (Boerlijst et al. 1996) that the two-dimensional model with a single peak with recombination now reads: The second term in the right-hand side of the first equation accounts for recombination by means of interactions between master and non-master strings taking place at rate r. The additional term ψ(x m ) weights recombination in terms of the frequency x m . The simplest choice here is to assume a linear function, namely To see the logic of this choice, consider the following argument. For small values of x m , most sequences will be non-master, and thus recombination probability 1 − x m will be high. This makes sense since a whole repertoire of strings can recombine. This is the case shown schematically in Fig. 10a, where a three-dimensional landscape considering the recombination of the sequences 111 (master) and 000 (non-master) is shown. The event is indicated by the open circle with letter ψ.
A simple recombination event can generate several non-master strings (grey arrows). Instead, when the master sequence dominates (and 1 − x m will be small) recombination events will take place with the one-bit (single mutation) distance strings, essentially returning the same strings as a result.
Using our previous definitions, the equation describing the master sequence dynamics now reads: where x = 1 − x m and, for simplicity, we have set f m = 1 (and thus 0 < f < 1). With these choices, one obtains (after some algebra) These one-dimensional equation has a trivial equilibrium point x * m = 0 and two potential extra equilibria x * m ∈ {Γ − , Γ + } given by: where we use φ = 1−f −r. In order to have real, positive solutions, it is not difficult to show that the following inequality for recombination rate is required: The critical value marks the domain where extinction is expected to occur (when r ≥ r c ). However, the nature of the transition is different for smaller mutation rates, as summarised in Fig. 10b. For higher mutation rates, the decay of the master sequence involves a continuous phase transition. By contrast, for smaller mutation rates the dominant role of recombination changes the nature of the transition, that becomes of first-order. This is a typical outcome (a rather universal one) that is observed in a wide class of dynamical systems exhibiting cubic terms ( . The introduction of a higher-order term that requires a density-dependent pairwise exchange is responsible for this phenomenon.

C. Diversity thresholds in AIDS progression
A final example of our list of viral transitions involves a theoretical model that was used to explain a paradoxical behaviour displayed by the long-term progression of HIV-1 infected patients. An intriguing question that emerged in the early times of the pandemic was the presence of an asymptomatic period: before the symptoms associated The solution for this conundrum was provided, to a large extent, by a mathematical model including the heterogenous nature of these populations and the arms race between the immune system. Such arms race creates the selection pressure required for the generation of HIV-1 antigenic escape mutants. During the asymptomatic phase of infection, error-prone replication of HIV-1 generates increasing numbers of antigenic variants. The immune system would keep fighting with novel variants as it gets undermined by the interaction and more strains accumulate. The theory predicts that there is a threshold value in antigenic diversity above which the immune system cannot control the viral population, triggering immune system's collapse Nowak et al. 1996).
To understand this antigenic diversity threshold, instead of lumping together all viral strains into a single phenotypic class, viral diversity was introduced by modelling the underlying fitness landscape of HIV-1: different viral strains lead to a different phenotype. The model here developed will be narrowed to a very simple interaction scheme where specific HIV-1 strains interact only with some specific lineage of CD4 + T cells capable of clearing the virus but also being damaged by its infection. The specific nature of the response is introduced in a minimal model with 2n equations  defined as: Here x i and v i indicate populations associated to the i−th virus strain and its corresponding CD4 + T cell partner. For simplicity it is assumed that all viruses replicate with the same rate r and the immune response given by the term p x i v i is also homogeneous. Similarly, the growth and clearance terms in the second equation use identical parameters for all strains. The interactions outlined above are graphically summarised in Fig. 12a-b. Figure 12a displays the network of interactions between the two compartments, including both specific and nonspecific interactions. This diagram is made more explicit in Fig. 12b by using one of the pairs of strains (x i and v i ) with all the assumed interactions. Summing over all the previous equations, we obtain the following expressions for the total viral population, and the corresponding immune cell response: The virus population will be under control of the immune response, i.e. dv/dt < 0 provided that the following inequality holds: From the equation for x i , we can see that the immune response leads to a stationary state, i.e. x i = kv i /(uv).
Using this result and applying it to the total viral population, we obtain an equation for the overall virus popu- where D s stands for Simpson's diversity index used in theoretical ecology as a measure of diversity. This measure is maximal for a totally homogeneous population with all strains equally represented, i.e. D s = 1 − 1/n and minimal when all viruses belong to the same strain, and D s = 0. The previous equation for the viral load can be rewritten as follows with D c s = 1 − ru/(pk). From this definition, we have a threshold condition separating a phase with no (or very low) viral growth from the escape phase where viral load grows. This is summarised in Fig. 12c, where we schematically indicate the qualitative change from no increasing diversity to the second phase (here there is an evolutionary dynamics over time) highlighted in grey where antigenic diversity starts growing. Here the model has been run under a given set of parameters (Nowak et al. 1991) along with a simple probabilistic rule where new strains are introduced with some (small) probability. As can be appreciated, population peaks occur in the first phase, associated with a single variant, to be eventually followed by a set of overlapped populations in the supercritical phase.
This model thus provides an elegant explanation of what is taking place throughout the latency phase in terms of a continuous phase transition. It also allows to understand this transition in terms of a virus-immune system arms race and makes a well-defined prediction. The model suggests (as it seems to be the case) that the long and apparently calm interval of low viremia and slow decay of CD4 + T cells hides a rapid turnover of constantly emerging viral escape mutants while a progressive damage to immune responses takes place. This would explain the apparent dormancy of the virus: in reality, there is a constant battle between host defenses and the new variants being created through mutation. This battle keeps the viral load at low levels, but also involves a high cost due to the constant turnover of CD4 + T cells.
The virus-host dynamics described above has been developed under an extremely simplified model, which lacks several layers of realism. However, a proper implementation using a quasispecies description including mutation rates and other relevant features essentially confirms the presence of the diversity threshold (Nowak and May 1991). Other authors have also addressed the nonlinear dynamics of HIV, showing how simple models can help understanding the tempo and mode of the process (Perelson and Nelson 1999; dos Santos 2001; Funk et al. 2005) and some of them inspired optimisation models to maximize the immune response against the virus (Culshaw et al. 2004). Other studies connect the phase transition phenomena described for quasispecies within the context of viruses infecting cells (Alonso and Fort 2010). Finally, some potential connections with phase transitions in percolation theory (Kamp and Bornholdt 2002) have also been put forward.

V. DISCUSSION
To build a viral capsid, a concentration threshold of capsomeres needs to be surpassed (Hagan 2014). Afterwards, self-assembly processes take control and allow building a new supramolecular structure required to complete the virus life cycle. The difference between the two regimes (sub-critical and super-critical) is not of quantitative nature. There are system-level processes that are unleashed in the second phase that have a deep biological meaning that cannot occur in the first phase, where the outcome is disconnected from biological function. The same can be said in a much larger scale, when infection by already well-formed viruses is mediated by intra-host, cell-virus interactions or (at the epidemic spreading level) to the organisms carrying them.
At this levels of description, a novel phenomenon occurs: epidemic spreading. On this scale, the selfassembly example becomes irrelevant as part of the description on the higher scale. However, we have again an interesting mathematical similarity associated to the fundamental nature of the critical condition: success and failure of infection are separated by a marked threshold. Against our intuition, there is no linear increase in the number of infected agents (either cells or individuals) as infection efficiently increases. Instead, non-linearities allow or suppress infection. In other examples, such as the multipartite virus dynamics case study or the recombination model, the transition can be sharp, described as a discontinuity. A very small parameter change translates into a catastrophic shift. Either way, a minor changes in key features can end up in major changes in the qualitative behaviour of the entire system. This is a very important point, well known within epidemiology but perhaps no so well appreciated in other areas.
The consequences of transition thresholds are considerable, since they affect the persistence of multipartite viruses in metapopulation landscapes or allow designing ways of enhancing immune responses against retroviruses. The implications are not less relevant. One in particular needs to be highlighted: because of the inevitable occurrence of transition phenomena, punctuated change should be expected to occur frequently. Small changes in parameters, if they cross transition points, trigger qualitative changes with potentially major eco-Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 29 October 2020 doi:10.20944/preprints202002.0261.v2 logical or functional impacts. Since RNA viruses have high mutation rates (Domingo et al. 1995;Cuevas et al. 2015), such events are likely to provide a source of innovation. In that case, it would be possible to approach major evolutionary transitions under the umbrella of critical phenomena.
As we have seen along this review, phase transitions typically involve a qualitative change by a small variation of some parameter/s. From a more applied point of view, such transitions may be identified in laboratory experiments since they have universal properties (e.g., increase in the size of the fluctuations and in the transient times close to phase transitions). Some of these properties can be directly quantified from observable measures in time series. These universal features have been recently discussed within the so-called early warning signals (EWS). These EWS are generic and their presence can allow to predict and anticipate critical transitions The progression of infectious diseases are nonlinear processes that can be divided into three stages (Chen et al. 2012): (i) A normal state, in which the host is not infected and its entire physiology is in a homeostatic state, which is considered to have high resilience and robustness to perturbation due to its dynamical structure. (ii) A pre-disease state, in which the host has been already infected and responds to infection; it is a latent period that can be seen as the limit of the normal state, which occurs before the imminent phase transition tipping point is reached. In this state, the system is sensitive to the virus perturbations but still reversible to the normal state when it properly controls virus? Replication and spread, but a small change in the parameters of the system may suffice to drive it into collapse through bifurcation, which often implies a critical phase transition to the disease state. (iii) The disease state represents a seriously damaged stage, possibly with high resilience and robustness, where the system usually finds it hard to recover and return to the normal state, even with antiviral interventions, which contrasts with the pre-disease state. Therefore, it is crucial to detect the pre-disease state to prevent qualitative deterioration and to further elucidate its molecular mechanisms. The recently developed dynamical network biomarker (DNB) theory (Chen et al. 2012), may help in identifying EWS indicating an imminent phase transition towards disease. Such DNBs should range from individual molecules (genes, proteins, metabolites) or more complex structures (subnetworks, modules, or pathways) in protein-protein in-teraction networks, transcriptional regulatory networks, metabolic networks, or noncoding RNA-mediated regulation of DNA epigenetic marks that are leading towards the critical transitions. In this context, Tarazona et al.
(2020) had applied the DNB theory to identify EWS of the progression of the etch disease in tobacco plants infected with tobacco etch virus. The results were promising: several genes related to plant response to stresses and metabolism were identified. Interestingly, these candidate genes were highly connected elements of the transcriptional regulatory network (hubs). These predictions suggest new experiments that may contribute to develop in a future a systems medicine approach to treat viral diseases.
Two important venues will need further theoretical exploration in the future. One is the development of evolutionary theories of change where phase transition points are reached through coevolutionary dynamics. The error catastrophe (Eigen 1971) is a potential example: RNA viruses have coevolved with immune selection barriers to a critical mutation rate. In this view, evolution pushed viral quasispecies to criticality in a self-organised fashion. The evolutionary as well as the ecological dimensions should be integrated to show how particular life forms, such as multipartite viruses, can emerge. Such studies are likely to require the introduction of genomes in some explicit form. Finally, the diverse dynamical and spatial scales considered in each example naturally suggest the minimal requirements for each level and why parameters and features relevant at the lower scale can be ignored. Does it mean they are disconnected? Probably not, and the right answers will need an explicit description of complexity that includes several simultaneous complexity layers.