Quantitative Network Organization of Interactions Emerging from the Evolution of Sequence and Structure Space of Antibodies : the RADARS Model

Based on the key molecule of humoral adaptive immunity, the antibody, evolution of the system comprises molecular genetic, cell biologic and immunologic mechanisms, and as a network the system is likely governed and can be characterized by physical rules as well. While deep sequencing can provide vast amounts of data related primarily to clonal relationships, functional interpretation of such data is hampered by the inherent limitations of converting sequence to structure to function. In this paper a novel model of structural interaction space, termed radial adjustment of system resolution, or RADARS, is proposed. The model is based on the radial growth of resolution of structural recognition, corresponding to increasing affinity of immune reactivity, and the virtual infinity of directions of growth, corresponding to the ability to respond to almost any molecular structure. Levels of interaction strength appear as shells of the sphere representing the system. B-cell development and immune responses can be readily interpreted in the model and quantitative properties of the antibody network can be inferred from the physical properties of a quasi-spherical system growing multi-radially. The system is described by double-Pareto distribution, sampling the lognormally distributed equilibrium constants at a rate of phi square. Finally, general strategies for merging antibody sequence space into structural space are outlined.


Introduction
Appearance of complex multicellular life was accompanied by the evolution of a system that maintains cellular and molecular integrity in the host organism [1].The adaptive immune system is a complex system in the physical sense, being composed of a vast number of cells that engage in interactions, self-organize and -most impressively -adapt to the molecular and cellular environment.Technological advances now allow us to measure and characterize this complexity in ever growing details, at the gene, transcript, protein and cellular levels, driving the field of systems immunology [2].
The vast amount of data generated requires not only data storage and analysis capacity, but also theoretical frameworks, models that simplify data organization and systems level interpretation.
Humoral adaptive immunity comprises the cells and mechanisms that lead to the production of antibodies.In human adults B-cells develop in the bone marrow throughout life and build up a system of effector and memory cells, which accumulate as a lifetime of immunological experiences.Continuously emerging naive B cells only differentiate further if selected for immunological actions based on their B-cell antigen receptor (BCR) specificity.Because this specificity is genetically coded in the individually rearranged immunoglobulin heavy and light chain sequences, it is possible to capture the antibody repertoire in a given sample of B cells.Deep sequencing or next generation sequencing is capable of generating sequence data of antibody repertoires with varying resolution and length [3][4][5][6].
It is also possible to profile the antibody repertoire functionally, based on the identification of antibodies binding to huge sets of potential targets [7].This approach is biased by the fact that a priori knowledge of targets is not always possible and only those antibodies that bind to the tested antigens are identified.Antigen microarray assays are useful for the focused analysis of antibodies related to allergy, autoimmunity, infection or cancer.Such functional analyses provide a more meaningful profile in the immunological sense and if carried out from blood it is less prone to sampling error than cell-based sequencing approaches.
In this paper, following a brief introduction to the sequence space of antibodies, a model for the molecular organization of antibody structure space or interaction space is proposed.The model builds on the generalized quantitative model of antibody homeostasis [8][9][10] and provides a quantitative network description of the humoral immune system.

Antibody clonal network representation in sequence space
Sequence space in the biological sense is a theoretical space comprising collections of nucleic acid or protein sequences of interest.We usually talk about protein sequence space and define what protein sets are involved (proteome of a given species, cells, etc.) and whether any restrictions hold (fully random, functional, identified, etc.).An amino acid sequence with a given length 'd' and full randomization with 20 amino acids occupies a sequence space 20^d (Fig 1A).An exact sequence with no ambiguity defines an exact position in sequence space; moves in this space are discrete steps along a given dimension.As the figure suggests, it is impossible to visualize high-dimensional protein space in 2D.Exponential growth is incredibly fast, leading to the generation of vast amounts of space in high dimensions.
It is accepted that only a fraction of all theoretically possible sequences are thermodynamically stable and protein evolution can be interpreted as a search for acceptable and functional structures in sequence and structure space [11].Thinking along these lines, the evolution of antibody binding surface, the paratope, is a search for the thermodynamically stable sequences and the selection from among these the ones meeting immunological criteria for B-cell survival.The set of viable antibody sequences, functional antibody sequence space, lies much below the theoretically possible and close to the already observed and annotated antibody sequence space [12].
Collections of antibody protein sequences obtained by translating DNA or RNA of deep sequencing data ideally span the whole variable domain of heavy (VH) and light chains (VL) and can also pair these two.In such a case the gene segments contributing to the rearrangement of VH and VL can be predicted and visualized in 3D and 2D respectively, as shown (Figure 1B).A repertoire can be represented by identifying coordinates of rearrangements identified, and symbol size or color can represent segment frequencies [13].While the use of gene segments for classification allows tremendous reduction in dimensionality, it is not best suited for functional network analysis, where the use of complete rearranged and mutated sequences is preferable [14].
In a much simpler approach, heavy chain CDR3 regions only are used as an estimate of diversity.Though this region is often regarded as being most important for determining specificity, it has been found to be present in functionally unrelated cells and therefore seems insufficient for functional classification [15].Selection of the pre-BCR bearing cells depends on signals that may be triggered by ubiquitous ligands present in the bone marrow microenvironment.The presence of uniform reactivity against such common public self-antigens may lead to the positive selection of CDR3 with similar binding properties, and thereby similar sequences.
Whatever the depth and methodology, the clonal relationships of sequences can be used for the construction of family trees, often displayed in circular forms.Assuming that heavy chain V segment-derived protein regions (H-CDR1, H-CDR2) have a dominant role in target recognition, these trees usually start classification with V, clustering clones with common V use [16].This practice again may be misleading in a functional analysis, since the noted assumption may not necessarily hold.The use of the complete VDJ-H sequence as a first stage classifier, followed by VJ-L use could better reflect the natural development scheme of B cells (Fig 1C).

Antibody interaction network representation in structural space
In contrast to this graded qualitative scheme, which may well serve the purpose of tracking peripheral clonal expansions accompanied by affinity maturation, a quantitative scheme should place genetic changes into structural rather than sequence space.
While sequence can be defined with various levels of certainty of an amino acid occupying a given position in the sequence, molecular structure can be defined at various levels of resolution.As we are talking about antibody molecules structural resolution is on the atomic scale, crystal structures define atomic coordinates on the Angstrom scale.The binding site of an antibody can also be characterized by the surface area that comes into close contact with the antigen [23,24].Water molecules are displaced from this area as a function of the goodness of fit.The so-called buried surface area (BSA) is therefore a good predictor of binding energy of protein interactions [25].Another measure of goodness of fit is the decrease of free energy of the antibody molecule upon binding.All these approaches are correlated: higher resolution "description" of a structure by the antibody corresponds to greater BSA and to a higher binding energy.The advantage of using thermodynamic description for the characterization of structural resolution is that it conveys the sense of function: higher binding energy means higher affinity of antibody to target.Besides defining resolution, which is a general descriptor, the identification of a given structure requires the description of shape.The higher the resolution the more information is required for defining shape.
Going in the opposite direction, by decreasing structural resolution we can reach a theoretical minimum.At this minimum all structures are defined by a single quantity of interaction strength.Let this quantity be 1, so that an equilibrium constant with the value 1 can represent the lowest possible resolution.At this minimal resolution binding to any target is possible and can be interpreted as minimal random contact upon collision of molecules.By starting to increase resolution we shall be able to distinguish between different shapes, the higher the resolution the more shapes becoming distinct.Because of the immense complexity of a binding area the distinction between all possible shapes at high resolution would require a multidimensional space.Let us gradually generate a multidimensional space by considering a point of origin from which a particular direction represents a particular shape.In this representation the extent by which we leave the point of origin corresponds to the extent of precision by which we can define the direction.Thus, going away from the minimal resolution of 1 we can define shape at gradually higher resolutions.(Fig. 2A,B).Different levels of resolution, that is different levels of binding energies appear in our scheme as shells of a sphere.Theoretically the number of directions originating from a single point is infinite, so the shapes available in this representation are also infinite if we go far enough.Practically, considering a reversible interaction, the resolution is limited by the binding energy of reversible interactions.Beyond this limit structure of the binding partners becomes fixed, rigid, corresponding to covalent binding.This model of the evolution of interactions of a system we shall call 'RAdial ADjustment of System Resolution' or RADARS in short.The abbreviation intentionally reminds of radiolocation, where emitted electromagnetic waves interact with objects in their way and are reflected to provide a virtual image of the surrounding.The RADARS model implies that elements of the growing system interact with the surroundings, gaining information and adjusting system growth accordingly.
Immunological interpretation of the model requires us to fit B-cell development and antibody evolution into this interaction space.We shall assume that a common lymphoid progenitor has the potential to generate any and all functional VDJ-VJ sequences and therefore to produce via sequential differentiation and maturations steps antibody against any and all targets.By functional VDJ-VJ sequences we mean all sequences that are physically and biologically viable.This means thermodynamic stability (able to fold into a structure compatible with the Ig domain), ability to pair, forming a VH-VL functional binding unit, and ability to sustain a B-cell via delivering survival, differentiation and proliferation signals.
A differentiation step that reduces this total potential introduces restrictions in structural space.This will appear as an expansion along a radius that point towards the cognate target.Expression of the surrogate light chain (SLC) marks the first step towards BCR formation.These pro-B cells represent the founders of all B cells (Fig. 2C).While signaling via the SLC may be required, it is uncertain whether binding of SLC is required for further differentiation, therefore we assume that these cells seed the complete antibody interaction space.Rearrangement of the heavy chain introduces a structural restriction: a particular functional heavy chain variable domain (VH) sequence has a limited range of targets.Pre-B cells displaying the pre-BCR composed of VH-SLC pairs will divide until and as long as ligands (antigens) are available.Cells with different VH sequences will populate the structural space and share this space according to the availability of target.Cells with more abundant targets expand more, cells with less frequent targets remain in lower numbers, until optimal BCR engagement is achieved [8].As a result, structural space as represented by this resolution will be filled with different pre-B cell clones according to the availability of the common self-antigens.
The next level of structural resolution, as well as the next level of restriction of binding space, comes with the rearrangement of the light chain.Individual pre-B cells rearrange their light chains independently and randomly.Therefore, all pre-B cells reserve a particular area on the next level of structural resolution.The size of this area again will correspond to the nature of the rearranged VL domain, with those finding more available targets expanding more.The pool of immature B cells fills thus the outer level of resolution in the bone marrow (Fig. 2C). Figure 2. Quantitative interaction space and visualization of antibody network relationships A) The system of interactions has a center as a reference point.Structural diversity appears as directions from this center towards the target.Distinct directions can be defined with a precision dependent on the distance from the center, equivalent to the radius of the sphere representing the system.The longer the radius of a vector defining a direction, the higher the resolution of the system in that direction.Multidimensionality is theoretically infinite in this representation, practical limits being introduced by resolution.B) Structural diversity appears as we leave the center, spherical representing various levels of structural resolution.Structural resolution corresponds to goodness of fit or binding affinity.Thus, the further we go outwards the more detailed structural recognition becomes.The model allows the theoretical quantitative organization of antibody evolution or molecular evolution in general.C) Systemic organization of antibody evolution.The evolution of the system of antibodies can be interpreted as clones filling the interaction space at various levels of resolution.Along this pathway cells continually increase their specificity and affinity towards their target direction.A common lymphoid progenitor has the potential to develop antibody against any target, equivalent to a radius of 1. Differentiation proceeds outwards from this point of origin, the center of the network.At every level of differentiation asymmetric divisions fill the level with a clone, expansion being dictated by antigen driven selection of clones with ideal receptor engagement.Accordingly, pre-B cells expand based on the ability of rearranged HVD to pair with SLC and signal for clonal expansion.The generated cells rearrange their LVD and exit bone marrow.Primary TD responses and TI responses expand B2 or B1 cells without further evolution of antibody binding.TD responses allow further directed differentiation in germinal centers.B2 cells are continuously generated and only survive if recruited for specific immune responses.B1 cells, on the contrary, survive in an activated state producing antibodies and dividing rapidly upon activation.Effector cells from clonal expansions can establish long lived plasma cells if they arrive in the required niche.
Different colors stand for structural differences and relationships.
Taking a somewhat unique route of differentiation are the B1 cells.These cells seem to generate antibodies that keep B1 cells in a continuous state of low-level activation.This may reflect their ability to respond to soluble, highly abundant antigens [8], or an intrinsic ability of the BCR of sustained signaling, perhaps due to structural properties [26].In any case, B1 cells represent a stable population with the highest affinity towards self and non-self, which is achieved without affinity maturation.Meanwhile B2 cells are continuously generated but die in a few days unless recruited as effector cells for further differentiation (Fig. 2C).
To allow for an even balance between differentiation and divisions we can employ a Fibonacci branching scheme in our model: every division yields a cell that can divide further and another one that differentiates (Fig. 2C).Assuming equal time and energy investments for division and differentiation this scheme results in a Fibonacci series number of cells after every time unit.In our model dividing cells remain on the same level of resolution, differentiating cells move outwards if interaction affinity grows.The scheme is modified by the selection steps though, dictated by the continuous selection of BCR reactivity against target.As a result, the distribution of cells in interaction space will be determined by the rate of division and differentiation along with random sequence generation pruned by interaction-based selection events.
As implied above, selection in the bone marrow is a passive process: randomly generated sequences find their positions in the structural space, expanding according to the availability of fitting target structures.As a result, the emerging population of immature B cells will bear the antigenic signature of the bone marrow environment.This can be interpreted both as deletion of highly selfreactive clones to prevent autoimmunity [27], and as selection for mildly autoreactive clones to help homeostatic antibody functions and setting a reference for recognition of non-self [28].
The development of cells reacting with antigens that are only temporarily present in the host presents the risk of investing energy into clones that will become useless once that antigen disappears.Therefore, such clonal expansions beyond the border of self only takes place when 2nd signals inform the host of danger.This is the development of an immune response, aided by various molecular and cellular help signals.Thymus independent and primary thymus dependent responses expand populations of B cells without improving their affinity, thus keeping them on the same level in the interaction space.Thymus dependent responses aided by helper T cells lead to the formation of germinal centers where affinity maturation takes place.This adjustment of affinity focuses interactions into a particular direction, shape in our model, leading to increased resolution of interaction space only in that direction.In this region of interaction space, affinity and corresponding resolution of target is high, but once target is cleared cells go into dormancy.These are the memory cells that conserve the genetic information acquired during affinity maturation but minimize their activity: no divisions, no antibody secretion; remaining guards against future encounter with the same or similar target (Fig. 2C).Another set of cells that remains after target has been cleared are plasma cells, which presumably arise from effector cells finding appropriate niche for survival.

Characterization of the antibody interaction network
What is the use of devising a theoretical space for the description of antibody evolution in the host?Functional antibody sequence space might be small enough to be handled by today's databases and tools.With the growth of interaction databases and structural biological data we might simply merge these sets of data and be able to predict binding specificity and affinity from sequence, so why introduce another model?
The main reason could be the rational organization of all those data.By following the natural flow of events of antibody evolution, by understanding the natural organization of sequences we should be able to simplify and rationalize our databases and tools.The model presented here can be developed into a quantitative network and mathematical tools applied for the analysis of the system.
The biological aspects of the RADARS model are summarized in Figure 3A.Expansion and differentiation of cells originating from CLPs create a cellular repertoire in the bone marrow, then further expansions and differentiation steps increase this repertoire and supplement it with antibody secretion.Events in the bone marrow represent self-organization because of control mechanisms involved, events in the periphery are more stochastic because interactions with the outside molecular world takes control.To describe the network of antibody interactions we follow convention and generate a diagram with the most central node to the rightmost position on an axis representing connectivity of nodes (Fig. 3B).Thus, CLPs, cells with the potential to generate all further B-cell developmental forms are regarded as the central node of the system.The radius of the network grows leftwards in this representation, corresponding to decreasing KD and free energy changes, that is increasing affinity and interaction strength.A) Biological network structure of the humoral immune system, shown as a cumulative distribution function of evolving antibody clones.Self-organizing system in the bone marrow meets environment in the periphery to maintain immunological self and generate immunological antibody profile.‚r' represents the radius of the system.B) Complementary cumulative distribution of antibody clones as a function of equilibrium distribution, affinity, free energy of binding and network centrality.Based on the radial growth interaction network model the network structure of the humoral immune system can be inferred.Network nodes are functionally identical B cells, identically extending into the direction of cognate target structure but may comprise distinct sequences, as long as binding is identical.*somatic hypermutations; CLP, common lymphoid progenitor; immat., immature; B2eff, effector B2 cell; Bmem, memory B cell; PC, plasma cell Deduction of network properties requires the knowledge of distributions of nodes and their connectivity.We shall make two assumptions here regarding these parameters.
Universal laws governing the distribution of binding energies for biomolecular recognition have been uncovered by Zheng and Wang [29].Based on energy landscape theory they suggest that binding free energy follows a Gaussian distribution, while the equilibrium constant KA follows a log-normal distribution, accordingly.In our model directions correspond to theoretical targets: molecular shapes.If all these targets have binding energy distributions as predicted by the universal distribution model, then our interaction space will represent a collection of these statistical distributions.Mean binding energy, the energy gap between native configuration and the average unbinding states, corresponds to the average binding energy of antibodies.This value is around -30kJ/mol and is generally regarded as the lower limit of BCR sensitivity.
The RADARS model suggests that the greater the resolution of structural recognition the more restricted the number of shapes recognized.The further the system grows in a given direction the more focused is the recognition.We proposed above that a balanced growth of the system is only possible if cellular division and differentiation are in perfect harmony.Here it is further proposed that the generation of a network with such radially adjusted resolution is possible if rate of node generation is defined by the golden ratio, phi.By increasing radius length with phi, the surface area of the sphere increases proportionally to phi square.Since phi square is equal to phi plus one (one of the special properties of phi), it is possible to promote cells from the previous interaction level (r x 1) and add new cells in proportion to the rate of increase (r x phi).A perfect balance between exponential growth of cells (nodes) and of radius is established, at the same time the number of links (clonal relationships) is determined.
The combination of a lognormal process and sampling at an exponential rate is defined by the double-Pareto distribution, as suggested by Reed [30] and further developed by Mitzenmacher [31].
Based on these mathematical approaches, probability distribution function of the antibody interaction network according to the RADARS model is given by: where r is the radius of the system, the equilibrium dissociation constant normalized to system size and binding energy distribution.Here we shall call this the system equilibrium distribution constant Ksys=  − DG°D ° Ksys describes the molecular behavior of a component of the system based on the relative affinity of the component compared to the mean affinity of the system, and stands for the probability of the component to be free or bound in the system (Fig 3B).Further detailed explanation and analysis of the network and physical characteristics of the RADARS model are given in an accompanying paper (manuscript in preparation).
The immunological implication of the RADARS model is that the naive repertoire generates a low affinity image of the human antigenome available in the bone marrow environment.Such an image has been called the immunological humunculus, as an analogy to the representation of human body in the brain cortex [32].While this property of the repertoire often leads investigators to conclude that autoreactive antibodies are enriched in the naive and B1 repertoire, the practical use is quite obvious, as has been suggested by Irun Cohen [28].Telling the difference between molecular structural self and non-self can be quite easy is the difference is striking.For identifying small differences, it is useful to have a starting repertoire that represents self and then increase resolution only in regions where encountered dangerous structures show up in structural space.The importance of self-recognition has also been recognized in antibody-mediated physiological functions [10,33].

Merging sequence space to interaction space
Considering that our technological capacity is ripe for the high-resolution determination and comprehensive analysis of antibody sequence space, the question is how we could convert a dataset in sequence space into a dataset in structural, functional interaction space.
The model presented here suggests that the following key points are important steps towards fully functional models of the adaptive immune system: sequence-based clonal structure of B-cell developmental stages need to be determined comprehensive analysis of the naive repertoire and the B1 repertoire is to be determined immune-response analysis data are to be interconnected sequence-to-structure algorithms be improved and applied to comprehensive datasets [34,35] predicted structural repertoire data should be combined with HTS interaction assay and epitope database data These developments should help understanding how functional sequence space fits into structural space and interaction network (Table 1), potentially leading to the ability to model whole immune systems and simulate their function.

Figure 1 .
Figure 1.Sequence space and visualization of antibody clonal relationships A) Theoretical diversity of a sequence is determined by its length and the number of values a particular position in the sequence can take.More than 3 dimensions are difficult to visualize in 2D.An antibody Fv region of 250 amino acids has an astronomical sequence diversity if full randomization is allowed.If the sequences are exact the positions in sequence space are discrete, but there are no structurally meaningful directions or distances in the multidimensional space.B) Antibody sequences are frequently interpreted as recombined germline sequences.This approach allows the simplified display of repertoires obtained by NGS, preferably with paired heavy and light chain VD identification.Such a display of combinatorial diversity may allow the tracking of specific clonal expansions and further diversification by SHM but reveals little about the overall functional network of interactions.C) Conventionally dimensions grow outwards in dendrograms of clones.Here the potential development scheme of a given antibody clone is shown along with B-cell differentiation steps.

Figure 3 .
Figure 3. Predicted network properties of the humoral immune system.