How Biological Concepts and Evolutionary Theories Are Inspiring Advances in Machine Intelligence

Since its advent in the mid-twentieth century, the field of artificial intelligence (AI) has been heavily influenced by biology. From the structure of the brain to evolution by natural selection, core biological concepts underpin many of the fundamental breakthroughs in modern AI. Here, focusing specifically on the artificial neural networks (ANNs) that have become commonplace in machine learning, we trace the numerous connections between theories of coevolution, multi-level selection, modularity and competition and related developments in ANNs. Our aim is to illuminate the valuable but often overlooked inspiration biologists have provided to AI research and to spark future contributions at this intersection of biology and computer science. Although recent advances in AI have been swift, many significant challenges remain that require innovative solutions. Thankfully, biology in all its forms still has a lot to teach us, especially when trying to create truly intelligent machines.


Introduction
In 1950, Alan Turing posed the question: "Can machines think?" 1 Since then, the field of Artificial Intelligence (AI) has strived to find ways to create more intelligent machines 2 . While many theories have been developed over the past seventy years, it is only during the last few decades that many practical systems have been realised. This recent resurgence stems less from new methodologies than from the availability of suitably vast data sets that can be used to accurately train AI models, and of sufficiently powerful computers able to perform this task. Together these capabilities have led to a revolution where machine learning and AI techniques are now being applied far and wide across the sciences with growing success [3][4][5][6][7][8][9][10] .
The most ambitious goal in AI is the creation of Artificial General Intelligence (AGI), that is, AI which can learn to perform tasks as well as humans [11][12][13] . An evolutionary approach to this problem is a promising direction due to the open-ended nature of evolution 14 . Unlike other methods that rely heavily on human intervention and result in systems with inherently limited capabilities (e.g. the manual reverse engineering of brain circuits to understand key behaviours 15 ), systems that can evolve are able to adapt and complexify themselves as needed for the task at hand. Even though these biologically inspired perspectives hold great value, biologists often do not see the links between their research and AI, or fully appreciate the contributions they could make to this emerging field.
Complex information processing and problem solving occur at all levels of biological organisation, from signal transduction 16 and non-neural cell communication 17 to neural cognition and collective brains 18 . In this work, we focus on how biological concepts and evolutionary principles have been applied to Artificial Neural Networks (ANNs) as they are commonly used in AI today. However, it should be noted that these ideas could be applied to all forms of machine learning, including those inspired by non-neural biological systems. We demonstrate the inspiration that biology and evolution have provided to AI research, some of the key breakthroughs they underpin, and future directions that will require bringing these fields even closer together. We begin by briefly introducing ANNs and methods from evolutionary computing that are central to many recent advances in machine learning. These offer a path towards the creation of open-ended and self-evolving systems that are likely to be essential for AGI. We then use examples to demonstrate the core biological concepts that support recent advances in the design of ANNs, discuss specific techniques inspired by evolutionary phenomena that can address some of their inherent limitations, and look towards the possibility of using engineered biological substrates for future AI systems. We end by highlighting some of the next steps and the important role that biologists can play in developing the intelligent machines of tomorrow.

Artificial neural networks
Human brains consist of an interconnected network of billions of neurons from which our intelligence emerges (Figure 1a). ANNs are a highly simplified model of these real neural networks 19 . They consist of layers of nodes used to represent neurons and directed edges of varying weights to capture the strength of the synaptic connections between them (Figure 1b).
In general, they are structured so that an input layer connects to numerous hidden layers (that are not observed by a user) before an output layer is reached. Each node transmits its current state (i.e. activity level) to its neighbours and these transmissions are modulated according to the edge weight between the nodes. Each node then integrates this information (generally by summing all input states) and updates its state according to an activation function. This commonly takes the form of a sigmoid-shaped function with an output between 0 and 1, capturing whether the node is inactive (state = 0) or firing (state = 1). This state is then transmitted to the next layer by the same process until the output layer is reached.
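The weighted-sum-plus-activation scheme described above can be sketched in a few lines of code. The following is a minimal illustration, not taken from any cited system; all names, the network size and the hand-picked weights are invented for this example.

```python
# Minimal sketch of a feed-forward pass: each node sums its weighted
# inputs and applies a sigmoid activation squashing the state into (0, 1).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate input states through a list of weight matrices.

    `layers` is a list of weight matrices; layers[k][j][i] is the edge
    weight from node i of one layer to node j of the next.
    """
    state = inputs
    for weights in layers:
        state = [sigmoid(sum(w_ji * s_i for w_ji, s_i in zip(row, state)))
                 for row in weights]
    return state

# A 2-input, 2-hidden, 1-output network with hand-picked weights.
hidden = [[1.0, -1.0], [-1.0, 1.0]]
output = [[2.0, 2.0]]
result = forward([0.5, 0.9], [hidden, output])
```

Real ANN libraries implement exactly this computation with matrix operations, but the nested loop makes the node-by-node description above explicit.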
There are many ways that information can be encoded as input to an ANN and the types of output it can produce. When analysing images, it is common for input nodes to correspond to the intensities of individual pixels and for output nodes to represent various classifications such as 'cat' or 'dog' that are active depending upon the contents of the image.
In this scenario, the ANN's ability to correctly recognise which animal is present in an image is acquired through learning and, as in real brains, involves the adjustment of the edge weights. Correct outputs result in a strengthening of specific connections, similar to how intrinsic reward systems modify synaptic weights in animal brains 20 . With sufficient training, an ANN can reliably label pictures of dogs and cats it has never seen before.
The precise mechanism of adjusting synaptic strengths in brains is still an open research question but the dominant method in ANNs is through the supervised method of back propagation of errors, often referred to simply as backpropagation or gradient descent 21 .
Backpropagation works by comparing the output of an ANN to its expected value to calculate what is termed the loss function, which is fundamentally a measurement of error. The backpropagation algorithm moves backwards from the output layer to the input layer, adjusting the weights at each step such that the loss function (error) is reduced. Crucially, this process can be performed efficiently by a computer, allowing an ANN's weights to rapidly converge to a configuration that accurately captures the desired input-output relationship.
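The backward pass can be made concrete with a toy example. The sketch below is a hedged illustration, not the setup of any system cited in the text: a small two-layer network is trained on the XOR problem, with the error propagated backwards through the layers and each weight stepped down the loss gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def loss():
    return float(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - y) ** 2))

initial_loss = loss()
for _ in range(5000):
    h = sigmoid(X @ W1)                  # forward pass
    out = sigmoid(h @ W2)
    d_out = (out - y) * out * (1 - out)  # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)   # error propagated backwards
    W2 -= h.T @ d_out                    # gradient-descent weight updates
    W1 -= X.T @ d_h
final_loss = loss()
```

After training, `final_loss` is lower than `initial_loss`: the weights have descended the loss surface exactly as the paragraph above describes.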
In addition to the feed-forward multi-layer architectures described above, other forms of ANN are possible. A prominent example is Recurrent Neural Networks (RNNs), which allow for additional connections between layers and individual neurons (e.g. feedback) 22 . These more complex architectures provide additional capabilities, such as the ability to use internal states as memory for processing variable-length inputs, which has led to RNNs seeing extensive use in text and sequence processing tasks. Unfortunately, backpropagation does not always work so well for RNNs as they lack a single feed-forward flow of information through the network 23 . Therefore, alternative approaches to learning are often used, such as the evolutionary methods discussed later.
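The recurrent idea of an internal state acting as memory can be shown with a deliberately tiny sketch; the weights here are hand-set (a linear recurrence rather than a trained RNN) purely to make the mechanism visible.

```python
# The hidden state carries a memory of everything seen so far, so
# sequences of any length can be processed one element at a time.
def rnn_step(state, x, w_in=1.0, w_rec=1.0):
    # with these hand-set weights the recurrence computes a running sum
    return w_rec * state + w_in * x

def process(sequence):
    state = 0.0
    for x in sequence:        # the same weights are reused at every step
        state = rnn_step(state, x)
    return state

short = process([1, 2])
long = process([1, 2, 3, 4])
```

The same two weights handle inputs of any length, which is why recurrent architectures suit variable-length text and sequence data.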
The development of ANNs caused a lot of initial excitement but it was soon realised that they would have limited success in producing the complexity of human cognition. Brains have been refined over millions of years of evolution to have architectures that form an ideal basis for learning. The product of this evolutionary optimisation is the diverse range of abilities animals are born with such as walking at birth or basic capabilities we take for granted such as 'objectness' (i.e. being able to distinguish objects from their environment), whereas conventional ANNs rely on learning alone, resulting in a number of deficiencies 24 .
ANNs are also susceptible to stagnation at local optima in their fitness landscape. Each ANN has an associated fitness landscape for the task it is confronted with (Figure 1c). Each dimension corresponds to a different parameter which can be adjusted (e.g. an edge weight), resulting in changes in fitness (e.g. improved prediction accuracy). Often, the landscape has many dimensions and a complex topology with many peaks and troughs, making an exhaustive search impossible or impractical. Training methods often involve incremental movement towards higher fitness, allowing them to converge to local optima but hindering their ability to discover global optima that would require exploring initially counterintuitive (i.e. low fitness) regions, known as deceptive domains. For example, a biped controlled by an ANN where fitness is the total distance travelled will initially fall in the direction of the finish; balance and coordination appear as a deceptive domain. This particular deceptive domain can be avoided by using an easing period 25 . However, this relies on human intervention and detection of the deceptive domain, which is not always as intuitive as in this locomotion example. Recently, it has been shown that local minima may not pose a problem when the ANN is over-parameterised (i.e. is very deep with many intermediate layers) [26][27][28] , demonstrating that the core limitations in some learning models are still not fully understood.
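The local-optimum problem can be demonstrated on a one-dimensional toy landscape; the fitness function and step size below are invented for illustration. Greedy hill climbing converges to whichever peak's basin it starts in and cannot cross the low-fitness valley between them.

```python
# A 1-D fitness landscape with a deceptive local peak and a global peak.
def fitness(p):
    # local peak at p = 2 (height 1), global peak at p = 8 (height 3)
    return max(1 - abs(p - 2), 3 - abs(p - 8), 0)

def hill_climb(p, step=0.1, iters=1000):
    for _ in range(iters):
        best = max((p - step, p, p + step), key=fitness)
        if best == p:   # no neighbour improves: a (possibly local) optimum
            return p
        p = best
    return p

local = hill_climb(1.5)   # starts on the slope of the deceptive peak
globl = hill_climb(6.0)   # starts within the global peak's basin
```

Starting at 1.5 the climber stalls on the low local peak near p = 2 even though a far better optimum exists at p = 8; only a start within the global basin finds it.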

Genetic algorithms
Various computer-based evolutionary techniques were developed independently in the 1960s, which today fall under the umbrella of evolutionary computation 29 . Genetic Algorithms (GAs) 30 are one of the most commonly used and work by encoding a potential solution to a problem as a bit string (e.g. 01010100011…) where each bit corresponds to the presence/absence (1/0) of a particular characteristic (e.g. a link between two nodes in an ANN). A population of solutions (i.e. strings) is generated; the highest performing are chosen to mate, intermediate-performing variants continue to the next generation, and low-performing strings perish before the next cycle. Mating works similarly to crossing over in meiosis (Figure 1). During the prophase of meiosis, maternal and paternal chromosomes align to form chiasmata at random homologous loci, and the sequences of DNA between these chiasmata are exchanged.
In GAs, a random point along the string is selected and bits are swapped to create two new offspring. Mutations occur rarely (e.g. for 1 in every 10,000 bits in classical GAs), switching a 1 to a 0 or vice versa. Closely adjacent bits are less likely to be separated by crossing over (similar to genetic linkage), so they tend to be synergistic and form partial solutions, similar to genes in biology. Recombination of good partial solutions (i.e. sub-strings) enables the search to 'jump' between optima, focusing on the most promising parts of the search space while purging poor partial solutions that would otherwise accumulate 31 . Because mutations are stochastic rather than guided by a fitness gradient, and populations of solutions need to be tracked over time, GAs are computationally expensive to run. However, they are able to avoid local optima, making them particularly well suited to solving problems with complex fitness landscapes.
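The full selection, crossover and mutation cycle described above fits in a short sketch. The fitness function (maximise the number of 1-bits), population size and rates below are illustrative choices, not from the cited work; mutation rates in practice are often far lower.

```python
import random

random.seed(1)
L, POP, GENS = 20, 30, 60

def fitness(s):            # toy task: maximise the number of 1-bits
    return sum(s)

def crossover(a, b):       # single random cut point, tails swapped
    cut = random.randrange(1, L)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(s, rate=0.01):  # rare 0 <-> 1 flips
    return [bit ^ 1 if random.random() < rate else bit for bit in s]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]             # highest performers mate
    children = []
    while len(children) < POP // 2:
        c1, c2 = crossover(*random.sample(parents, 2))
        children += [mutate(c1), mutate(c2)]
    pop = parents + children[:POP // 2]  # low performers perish

best = max(pop, key=fitness)
```

Crossover here recombines good sub-strings from different parents, which is exactly the 'jumping between optima' mechanism the text attributes to partial solutions.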
It should be noted, that while bit strings were originally the basis of most GAs, many modern systems that now fall under the more general umbrella of Evolutionary Algorithms (EAs) 32 make use of more complex data structures to represent a possible solution (i.e. specific ANN) and provide tailored functions that allow for mutations and the combination (i.e. crossover) of multiple solutions.

Neuroevolution
Neuroevolution is the artificial evolution of ANNs, where edge weights are determined by EAs instead of backpropagation. Specifically, an EA is used to evolve a population of bit strings which represent the connection weight matrix of an ANN. The fitness of each bit string is determined by the performance of its resulting ANN at a given task. The field emerged in the 1990s 33 and EAs are now argued to be a competitive alternative to backpropagation 34 .
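A minimal neuroevolution sketch makes the contrast with backpropagation clear: below, an evolutionary loop (selection plus Gaussian mutation), rather than any gradient computation, searches for the weights of a one-neuron network computing logical OR. The task, population size and mutation scale are invented for illustration and are far simpler than the systems cited.

```python
import math, random

random.seed(2)
CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # logical OR

def predict(genome, x):
    w1, w2, bias = genome               # genome directly encodes weights
    return 1 / (1 + math.exp(-(w1 * x[0] + w2 * x[1] + bias)))

def fitness(genome):                    # higher is better
    return -sum((predict(genome, x) - t) ** 2 for x, t in CASES)

pop = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:10]                # selection: best half survives
    pop = survivors + [[w + random.gauss(0, 0.3) for w in p]  # mutation
                       for p in random.choices(survivors, k=10)]

champion = max(pop, key=fitness)
```

No fitness gradient is ever followed directly; variation is random and selection does the rest, which is both the cost and the strength of the approach discussed above.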
Neuroevolution became popular due to the development of the NeuroEvolution of Augmenting Topologies (NEAT) system 35 . In NEAT, ANNs evolve from a minimal architecture and, in addition to adjusting their weights, mutations can affect the topology of the network by adding and removing nodes and connections. This allows optimisation to occur simultaneously with complexification, mirroring how our brains have evolved from smaller primitive ones or how a mature brain develops from the limited capacity of an infant brain. Complexification enables the elaboration of existing optimised features, so the search enters larger spaces only in more promising domains than would be reached with a large, fixed topology. Structural elaboration is often initially detrimental to ANNs and takes multiple generations to optimise.
So, to prevent new innovations being lost due to this initial loss in fitness, competition is limited to structures which emerge at similar stages via a process called 'speciation'. This enables the preservation of innovations with future potential, further enabling exploration of deceptive domains. NEAT significantly outperforms neuroevolved ANNs with a fixed topology at pole balancing 35 , a standard benchmark learning task. In addition, the evolution of topology eliminates reliance on human design prior to evolution, which requires insight that may not always exist.
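One of NEAT's structural mutations can be sketched in isolation. The fragment below (illustrative data structures, not the full algorithm) shows an "add node" mutation that splits an existing connection, growing the topology while disturbing the network's behaviour as little as possible: in NEAT the new incoming edge gets weight 1.0 and the outgoing edge inherits the old weight.

```python
import random

random.seed(7)
# a network represented as a list of connections: (source, target, weight)
genome = [("in0", "out", 0.7), ("in1", "out", -0.3)]

def add_node_mutation(genome, new_id):
    src, dst, w = random.choice(genome)       # pick a connection to split
    rest = [c for c in genome if c != (src, dst, w)]
    # incoming weight 1.0, outgoing keeps the old weight, so the new
    # two-edge path initially approximates the old single edge
    return rest + [(src, new_id, 1.0), (new_id, dst, w)]

grown = add_node_mutation(genome, "h0")
```

Repeated mutations of this kind, protected by speciation as described above, are what let NEAT complexify from a minimal starting architecture.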
Just as pure learning has its deficiencies, so does pure evolution. As discussed previously, using hill-climbing techniques such as backpropagation to adjust ANN weights enables a local optimum to be reached efficiently, but the global optimum may never be discovered. In contrast, pure evolution is more likely to locate the region of a global optimum but will not necessarily locate its peak, because it does not directly follow a local fitness gradient 36 . As such, these approaches are complementary, just as in nature. Evolution shapes the fundamental innate behaviours of an organism based on the experiences of ancestors, yet the precise conditions and events experienced by each individual in a lineage will differ subtly.
Learning operates on the innate framework of behaviour to optimise it specifically to the individual's unique life experience. In other words: evolution discovers the fittest innate behaviour, and learning optimises it so it can be best used. Coupling evolutionary computing with pure learning, such as via neuroevolution combined with backpropagation, strengthens this analogy with nature and exploits the best qualities of each approach. This may be done by using neuroevolution to evolve an ANN structure and training with backpropagation to optimise the edge weights, or by alternating between the two or combining them, as in nature 37 .
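The complementary split can be shown on a toy problem (a 1-D two-peak landscape invented for illustration, not a real neuroevolution system): a crude evolutionary search first locates the basin of the global optimum, then fine-grained local search, standing in for gradient-based learning, refines it to the peak.

```python
import random

random.seed(3)

def fitness(p):   # deceptive local peak at p = 2, global peak at p = 8
    return max(1 - abs(p - 2), 3 - abs(p - 8), 0)

# Stage 1: "evolution" -- broad, population-based exploration.
pop = [random.uniform(0, 10) for _ in range(30)]
for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:15] + [p + random.gauss(0, 1.0) for p in pop[:15]]
coarse = max(pop, key=fitness)

# Stage 2: "learning" -- fine local steps up the fitness gradient.
p, step = coarse, 0.01
for _ in range(2000):
    p = max((p - step, p, p + step), key=fitness)
refined = p
```

Evolution alone leaves `coarse` merely near the global peak; local refinement alone, started at random, would often stall on the deceptive peak. The combination reaches the global summit.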

Evo devo and indirect encoding
Modern ANNs can contain billions of connections. If using neuroevolution, each ANN would need a string of at least equal length, and as a large population of these ANNs is also required, the space needed to store each iteration of the evolutionary process would become prohibitive. In contrast, the human genome sequence holds approximately 1.5 GB of information, which is not enough to directly encode each of the trillions of synaptic weights in the brain, let alone the entire organism. Despite this, it is possible for the genome to encode the phenome because it does so indirectly: the full connective architecture of the brain is not specified connection by connection, but emerges from a compact developmental program that reuses the same genetic information many times. Stanley and D'Ambrosio incorporated this property of indirect encoding for ANN parameters in a system they called Hypercube-based NEAT (HyperNEAT) 40 . Unlike earlier work where NEAT is used to evolve the ANN directly, in HyperNEAT it is used to evolve compositional pattern producing networks (CPPNs) that generate highly regular connection topologies 41 . These patterns are then used to encode the parameters of the functional ANN (Figure 3b). The result is an ANN that can efficiently exploit regularities in a problem. For example, HyperNEAT can automate the evolution of highly coordinated four-legged gaits, which requires genetic reuse for each leg 42 . Previous work has demonstrated that indirect encoding can be used to produce coordinated movement 25 , whereas when limited to direct encoding this is only possible by simplifying the task through manual decomposition 43,44 . Additionally, as HyperNEAT represents the problem geometrically, solutions are scalable without further evolution.
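The compression achieved by indirect encoding can be illustrated in miniature. Below, a tiny hand-written "pattern" function maps node coordinates to connection weights; in HyperNEAT this pattern function would itself be an evolved CPPN, so everything here is a simplified stand-in.

```python
import math

GENOME = (1.5, 0.5)   # two numbers encode an entire connectivity pattern

def pattern(x1, x2, genome=GENOME):
    # depends only on the coordinate difference, so the pattern is
    # symmetric and repeats across the network (genetic reuse)
    a, b = genome
    return a * math.cos(b * (x1 - x2))

N = 100               # a 100 x 100 weight matrix from just 2 parameters
weights = [[pattern(i / N, j / N) for j in range(N)] for i in range(N)]
```

Ten thousand weights arise from a two-number genome, and the resulting matrix is symmetric and translation-invariant, the kind of regularity (e.g. one pattern reused for each of four legs) that direct encoding cannot exploit.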
Therefore, high-dimensional tasks can be tackled using a low-resolution/complexity system, making them computationally cheaper to train.

Coevolution can also drive progress. In a classic experiment, Hillis evolved sorting programs alongside a population of 'parasites' whose role was to generate challenging test cases. As the parasites evolved, the sorting problems they produced gradually increased in difficulty. This gradual increase in task difficulty enabled more efficient learning and the discovery of better sorting strategies. Increasing the difficulty of a task over time has also proven a useful approach in training ANNs, helping to avoid deceptive domains in a technique known as 'incremental evolution' 55 . Often this involves manually designing progressively harder tasks for the ANN, which is laborious. However, the same process occurs automatically in simulated arms races due to the simultaneous evolution of opponents.
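The benefit of staged difficulty can be sketched with a toy evolutionary loop (the task, targets and schedule below are all invented for illustration): an evolving value must reach a distant final target, and scoring against a series of progressively harder intermediate targets keeps selection informative at every stage, mimicking the automatic escalation of an arms race.

```python
import random

random.seed(4)
FINAL_TARGET = 50.0

def evolve(targets, gens_per_stage=40):
    pop = [0.0] * 10
    for target in targets:           # task difficulty rises stage by stage
        for _ in range(gens_per_stage):
            pop.sort(key=lambda p: abs(p - target))   # closer is fitter
            pop = pop[:5] + [p + random.gauss(0, 2) for p in pop[:5]]
    return min(pop, key=lambda p: abs(p - FINAL_TARGET))

incremental = evolve([10.0, 25.0, 50.0])  # progressively harder targets
```

Here the curriculum is designed by hand, exactly the laborious step the text notes; in a coevolutionary arms race the opponents generate the escalating targets for free.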

Complexification through development, competition and evolution
Multiple selection pressures enable exaptation, where features which were selected for under one pressure can be co-opted by another 56 . For example, feathers initially evolved for warmth, then for gliding after jumping from a height, and eventually for flight 57,58 . In experiments of this kind, artefacts of increased complexity arose when evolution was exposed to selection pressures besides that for task performance, again highlighting the deficiency of using only a single selection pressure to guide evolution.
The evolution of the brain, from smaller primitive brains to complex ones, inspired the idea of allowing network architectures themselves to evolve and complexify; this opens up a vast space of possible phenotypes to be explored compared to using a single fixed morphology 59 . As mentioned earlier, one way to capture this feature in ANNs is by employing compositional pattern producing networks (CPPNs), which enable the structure of an ANN to dynamically evolve and explore potentially useful topologies for the task at hand 60 .
Complexification occurs not only gradually with evolution but also due to learning within the lifetime of an individual. It is well understood that humans and other animals learn most efficiently if they begin with simpler tasks and gradually expand their knowledge and skills 61 .
For example, music students begin practising at grade one and slowly improve their skills before attempting a grade-eight piece. This is so effective in producing complex cognition that it is effectively 'built in' to cognitive development: the initial cognitive limitations and slow development of some animals, particularly humans, are thought to be adaptive, facilitating incremental learning during early life rather than being an artefact of incomplete development 62 . For example, infants are only able to focus on objects ~10 cm away, allowing them to learn from simpler sensory information without the complication of size inconsistency and distance. This form of gradual learning can be applied to AI too. For example, when two pure-learning ANNs play Go against one another, both begin as novices and gradually become more skilled, learning better strategies faster than if they were trained against master opponents from the start. The major benefit of such self-play is that it eliminates the need to obtain human game data across a range of skill levels, or to spend time playing against humans in real time, which is slow and limited in scope.

In biology, division of labour between specialised components is another route to complex capability, and its equivalent in computing is the functional modularity that computer software and hardware often exhibit. From an engineering perspective, modularity is useful because components can be replaced, upgraded, or reused more easily without having to redesign or construct an entire system from scratch. In AI, modularity is also important because it can simplify a problem by decomposing a complex task into smaller ones that are easier to solve. Many types of AI exhibit levels of modularity analogous to biological ones, such as GAs, ANNs and multi-agent systems.

Modularity, hierarchy and division of labour
At the lowest levels of biology, genetic information is encoded in linear DNA sequences composed of four bases. Despite this simple linear structure, sub-sequences encode higher-level functional elements such as genes, and these are further grouped together into chromosomes that ultimately make up the final genome of an organism. Each component in this hierarchy has a distinct role in synthesis, and the proliferation of each component depends on the success of the complete phenome 74 . A similar phenomenon occurs during the evolution of GAs, where bit strings are used to encode the genome of a potential solution. Crossing-over operations allow partial solutions to become localised within a string so that they can propagate effectively. These sub-sequences can then be further combined at larger length scales, establishing a hierarchy of modularity even within this simple system. By viewing a complex task as a collection of smaller interacting subtasks, it is possible to tackle larger problems in smaller chunks, facilitating search in even highly deceptive spaces. Because of this, genetic algorithms have been found to tackle tasks with complex fitness landscapes better than ANN-based deep learning approaches 34 .
At the neural level, brains are also modular, being composed of many distinct regions dedicated to different cognitive processes. For example, Broca's area is involved in the articulation of language. Upon injury to Broca's area, a patient may struggle significantly to speak their native language but have no problem understanding it, because language comprehension is located separately in Wernicke's area. ANNs tend to be non-modular, or 'monolithic', so changes made to one part may have unpredictable knock-on effects elsewhere due to their highly integrated structure. Thus, learning a new skill can cause catastrophic forgetting of an ANN's previous function. In contrast, in modular biological brains, the acquisition of a new skill such as piano playing does not tend to alter existing skills such as mental arithmetic. Modular ANNs can be evolved by including selection based on a human-designed metric of modularity 58 or by including connection costs as a selective pressure 75 .
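The connection-cost idea can be reduced to its essentials: subtract a small penalty per edge from the fitness, and evolution favours sparse wiring. In the hedged sketch below, the "task score" is a deliberately trivial stand-in (only the first four edges are useful) rather than a real ANN evaluation, and the penalty weight is arbitrary.

```python
import random

random.seed(5)
N_EDGES, LAMBDA = 16, 0.05

def task_score(genome):
    # illustrative stand-in: only the first 4 edges help performance
    return sum(genome[:4])

def fitness(genome):
    return task_score(genome) - LAMBDA * sum(genome)   # connection cost

pop = [[random.randint(0, 1) for _ in range(N_EDGES)] for _ in range(20)]
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [[b ^ (random.random() < 0.05) for b in g]
                      for g in pop[:10]]

best = max(pop, key=fitness)
```

Selection keeps all the useful edges while pruning superfluous ones, the same energy-saving pressure proposed for biological brains.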
The latter was based on the suggestion that modularity in the brain evolved initially to reduce the number of synaptic connections and so save energy 76 . Modular ANNs outperform monolithic ANNs and are more evolvable to new tasks because they are able to reuse modules corresponding to regularities in the task. They also suffer less from catastrophic forgetting 77 .
At higher levels of organisation, social groupings are common between organisms (e.g. parental care, herds and flocks), and there are many benefits to group living. An extreme example of this is the eusocial insects, whose colonies form what is termed a 'superorganism'. These colonies are composed of distinct castes, such as queens, workers, and soldiers, which are morphologically and behaviourally specialised to their task. In AI, a problem can be tackled using multiple cooperating agents if fitness can be measured by the sum of their behaviours. For example, OpenAI created a human-relevant example of how this works using a game of hide and seek 78 . Pairs of seekers played against pairs of hiders, where each individual is controlled by its own ANN. The hiders worked together to build a shared wall around themselves before the seekers finished counting. Once the seekers discovered they could use ramps to breach the wall, the hiders exhibited a division of labour in which one began constructing the wall while the other confiscated the ramps, a strategy that would have been impossible for either alone due to the short countdown.
Despite the three levels of modularity described above, most of the present AI literature involves implementing only a single level within a system, which may not always enable sufficient decomposition of the task at hand. The performance of GAs, for instance, is known to deteriorate as the dimensionality of the search space increases 79 . There are some cases that demonstrate that combining GAs and multi-agent modularity increases performance: Potter and De Jong coevolved multiple cooperating bit strings 79 , and Miikkulainen et al. used neuroevolution to coevolve predators for a prey-capture task which caught more prey together than separately 80 . However, the hierarchy of modularity in current artificial systems is modest compared to that found in biology. In nature, group selection occurs on numerous levels simultaneously, establishing an extensive hierarchy of task decomposition which scales with complexity, covering genomes 81 , endosymbiosis, cell compartmentalisation 82 , multicellularity 51 , tissues and organs, organisms, sex 83 , social groups 74 , and inter-specific mutualisms 84 , to name but a few. In effect, a complete ecosystem also exhibits modularity through niche differentiation, which operates to reduce competition 85 .
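A minimal cooperative-coevolution sketch in the spirit of Potter and De Jong's approach (all details below are illustrative, not their actual benchmark) shows the decomposition: two subpopulations each evolve half of a solution, and an individual's fitness is its joint performance with the best current collaborator from the other subpopulation.

```python
import random

random.seed(6)
TARGET = [1, 0, 1, 1, 0, 1, 0, 0]   # each half is solved by one population

def joint_fitness(left, right):
    return sum(a == b for a, b in zip(left + right, TARGET))

def step(pop, partner_best, own_side):
    def fit(g):   # evaluate jointly with the partner population's best
        return (joint_fitness(g, partner_best) if own_side == "left"
                else joint_fitness(partner_best, g))
    pop.sort(key=fit, reverse=True)
    return pop[:5] + [[b ^ (random.random() < 0.1) for b in g]
                      for g in pop[:5]]

left_pop = [[random.randint(0, 1) for _ in range(4)] for _ in range(10)]
right_pop = [[random.randint(0, 1) for _ in range(4)] for _ in range(10)]
for _ in range(50):
    left_pop = step(left_pop, right_pop[0], "left")
    right_pop = step(right_pop, left_pop[0], "right")

best_score = joint_fitness(left_pop[0], right_pop[0])
```

Neither subpopulation ever sees the whole problem, yet together they assemble the full solution, a two-level version of the deep hierarchies of task decomposition found in nature.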

Engineering biologically based AI systems
Beyond inspiration, future developments in bioengineering may allow for a radical rethink as to AI's connection to biology, allowing it to become more closely integrated within the computer science domain. Rather than running simulations of artificial neural networks on electronic computers, which are often ill-suited to the task, why not grow living neural networks into desired interconnection patterns and exploit the innate capabilities of these cells for learning and computation? Some progress has been made on this front 89 and while we are still far from understanding the molecular complexity of living neural networks, this does not prevent us using some of their capabilities in new ways. The field of synthetic biology is a great example of this, where simple biologically based systems are built using biological parts (not always fully understood) as a means to repurpose them in new ways 90 . This has led to many breakthroughs and the creation of modified living systems able to convert waste into valuable chemicals and materials 91,92 , act as new forms of therapeutic to treat disease 93 , and even perform novel distributed computations [94][95][96] .
There have already been several attempts at creating systems that can learn by implementing AI models using simple biomolecular parts. Examples include the creation of a four-input perceptron based on metabolic circuits as a basis for more advanced sensing applications 97 and the use of DNA strand displacement to implement basic artificial neural networks, including a Hopfield associative memory consisting of four fully connected artificial neurons 98 .
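To see what a four-neuron Hopfield associative memory like the one referenced above computes, here is a conventional in-silico sketch (not the DNA implementation): Hebbian weights store a pattern, and repeated network updates recover it from a corrupted cue.

```python
import numpy as np

stored = np.array([1, -1, 1, -1])       # the memory, in +/-1 form
W = np.outer(stored, stored).astype(float)   # Hebbian weight matrix
np.fill_diagonal(W, 0)                  # no self-connections

state = np.array([1, 1, 1, -1])         # cue with one bit corrupted
for _ in range(5):                      # synchronous state updates
    state = np.sign(W @ state)

recalled = state
```

The dynamics fall into the stored pattern's basin of attraction, so the full memory is recalled from a partial or noisy cue, the content-addressable behaviour that makes such networks attractive targets for molecular implementation.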
In addition to using existing models of AI as a basis for developing new biologically based systems, the biological substrates themselves might also offer novel information encoding and processing capabilities that go beyond what is easily achievable using electronic computers 94,[99][100][101] . This would provide a chance to rethink how AI is implemented and spark creative approaches that harness the unique qualities of biology.
There has even been some speculation as to the 'cellular supremacy' of biological systems in terms of their potential computational power compared to their classical electronic counterparts 99 . For example, key features in biology such as the use of continuous and noisy signals, multiple levels of organisation/hierarchy, and the innate ability of biology to adapt and evolve may be key ingredients for implementing intelligence, but which are difficult or potentially impossible to implement efficiently using classical computers.
Using biology as a substrate for AI may also offer new ways to tackle other challenges.
Biological systems are nearly always highly robust to the failure of individual parts, are energy efficient, and are self-replicating and self-repairing. Existing electronic AI systems can be difficult to deploy because of their computational power and energy requirements, and they are generally fragile to hardware failure, raising concerns over their safe deployment. Using engineered biology to implement AI systems could open avenues for systems that can be deployed in virtually any environment, able to exploit local resources to generate energy and including processes to continually monitor and repair faulty biomolecular components. They could even be designed to exploit evolution itself 102 , creating truly autonomous and self-adaptive systems.

Conclusion
The prospect of building AGI depends upon our ability to create systems which are able to work in an autonomous, innovative and open-ended way -much like biology. Here, we have shown how some of the core characteristics, concepts and theories underpinning biological systems have greatly influenced the design and evolution of ANNs to date (Figures 2 and 3).
But this is only a fraction of what might be possible. Biological systems evolve, function, and learn across many levels of organisation simultaneously and in concert with their changing environment. Seeing an AI system not as a single isolated entity, but instead as a computational ecosystem of specialised modules working across scales in a dynamic and multi-faceted environment, could be an important step towards realising some of the additional capabilities we crave.
While our focus here has been on ANNs as a basis for intelligent machines, the ability of single cells and vast collectives to display intelligent behaviours means that future inspiration for AI could come from our emerging understanding of computation in developmental biology 103 , swarm cognition 18 , as well as information processing in chemical reaction networks 100 and evolving populations 104 . In fact, the more we investigate biological systems, the more we realise the diverse and unintuitive ways in which information is encoded and processed by biology, going beyond the capabilities of modern electronic computers. Perhaps concepts such as 'liquid brains' 105 and knowledge regarding distributed biological cognition need to find their way into computer science for progress to accelerate?
Discovering the ingredients needed to create a true AGI is one of the major challenges facing science today 11 . Even though AI research is often perceived as computer science, biology has provided important inspiration for many modern approaches and may form an important substrate for building the possibly "living" AI systems of the future.

Figure 1 legend (fragment): Horizontal axes correspond to two adjustable parameters (P1 and P2) and the vertical axis is the associated fitness. This landscape has multiple fitness peaks, meaning that incremental improvement from some starting point (i.e. gradient ascent) may lead to a local optimum rather than the global one (red arrow). Fitness landscapes tend to have many more dimensions; for example, the ANN in panel b has 12 adjustable parameters (edge weights), so its fitness landscape would have 13 dimensions.