Non-Markov Stateful Evolutionary Games

An evolutionary game is introduced which explicitly models states and actions in the strategies of the organisms of an evolving population. The game principally features actions that result in demographic flow between states, flow which need not conserve organism numbers. The game's formalism is expounded and the nature of the game's equilibrium is discussed. This discussion leads to an algorithm for numerically determining the stable equilibrium points, which is exemplified in the context of a modified Hawk-Dove game. The game's flexibility for modeling population dynamics is discussed in the context of other evolutionary games. Data Set License: CC-BY


Introduction
Biological species are well recognized as being engaged in an evolutionary fight for survival, and game theory has been used to analyze the strategies in such a fight. This analysis is the defining feature of Evolutionary Game Theory, whose many features and concepts are often credited to John Maynard Smith and George R. Price [1,2]. Evolutionary games can take different forms, but one of the most standard game-forms concerns the continuous growth/decay of organism types; the organism types are defined by the strategy they play as they are continuously randomly paired to participate in a simultaneous symmetric two-player game, where the expected payoff determines each participant's growth [3]. Evolutionary game theory has been a valuable tool for characterizing the dynamics and interactions among biological species, and it has also proven useful in other fields which feature evolutionary dynamics. Some examples of phenomena which have been modeled via Evolutionary Game Theory include: altruism, empathy, human culture, moral behavior, private property, proto-linguistic behavior, social learning, societal norms, personality and mating dynamics [4][5][6][7].
However, many organisms exhibit behaviors which are coupled with the state that they are in, and this coupling is not often modeled using the most standard evolutionary game theory. Some canonical examples include perspiring as body temperature rises, causing dehydration; foraging in response to hunger signals, causing food shortage; sleeping in response to ambient light levels, causing vulnerability to predation; or hibernating with the change of season, causing hunger.
Within this paper we will discuss the nature of 'state' and develop an evolutionary game for modeling this dynamic. We also discuss a notion of equilibrium in the context of the game and give an algorithm which solves for the game's equilibria. We give an example of the game as an extension of the classic Hawk-Dove game [1], and we also briefly compare our game's dynamics with the games of others.

Motivation
A concept that is sometimes referred to in the computer modeling of evolution is that of 'digital organisms', defined as interacting computer programs designed to test or verify hypotheses about evolution [8][9][10]. These computer programs may feature a range of different qualities such as simulated position, communication, movement, competition and sexual reproduction, and can be self-replicating and/or mutating in-memory. There exist well-developed software platforms to simulate evolving populations of these digital organisms (such as Avida and Darwinbots, or historically Tierra [11]). The use of digital organisms as an avenue for exploring the dynamics of life is associated with the field named 'artificial life', sometimes abbreviated as 'A-Life' [12][13]. Within the configuration of these software platforms the parameters of the simulation are specified. These parameters can be grouped as follows:
• The potential States that the organisms can be in, such as their position on a grid or graph, internal conditions such as body temperature, the states of their sensory inputs such as heat or light, their social relationships such as being single or conjoined, as well as the potential states of their own memory.
• The Actions which the organisms can execute, such as 'reproduce', 'move', 'eat' and 'sleep'. These actions may only be available in some states and not others; for instance, an organism may not be able to execute action 'eat' when it is in state 'full', or 'sleep' when in state 'in daylight'.
• The Code or Strategy, which is private to each organism in the simulation, and simply determines what action/s the organism takes depending on the state it is in.
• The Consequences of the actions of the organisms. This can be as simple as a transition between states; for instance the action 'move-left' might change the state of the executing organism to 'position -1'. And this may be contingent on the states and actions of the other organisms, as 'move-left' might only succeed if there is not another organism in 'position -1' doing action 'move-right'. Alternately, actions such as 'reproduce' may add or remove organisms from the simulation entirely and also introduce random changes into the code of the organisms.
As these simulations progress it is possible to analyze the changes in the population and in the code of the evolving organisms. Indeed it is possible to watch time-lapse videos of swarms of these digital organisms undergo the evolution of interesting behaviors in silico.
Unfortunately the results of these simulations are seldom deterministic, and a badly tuned simulation can easily be devoid of any interesting results. However, recent work has been conducted to incorporate some of this potentially complex state-action dynamics into the mathematical framework of evolutionary game theory. It is the objective of this paper to attempt to extend that work further still.

Related Work
A relatively simple example in which state can be seen in the game theory literature is the famous Iterated Prisoner's Dilemma, where successful strategies such as 'tit-for-tat' make the player's choice of action a direct function of the memory of previous interactions. It is possible to conceive of the tit-for-tat program (as it existed in the computer cores of Axelrod's famous tournaments [14]) as a particularly simple digital organism. The tit-for-tat organism might be modeled as having a single binary state (storing the opponent's previous play) which determined what action would be executed (cooperate or defect) and, together with the executed action of the opponent, determined the consequences: immediate payoffs and state transitions.
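To make this concrete, tit-for-tat can be sketched as a one-bit digital organism; the class name, state encoding and action labels below are our own illustrative choices:

```python
# Tit-for-tat as a one-bit digital organism: its single state stores the
# opponent's previous play, and its strategy maps state -> action.
COOPERATE, DEFECT = "C", "D"

class TitForTat:
    def __init__(self):
        self.state = COOPERATE  # start by assuming a cooperative opponent

    def act(self):
        # The strategy: simply replay the remembered opponent action.
        return self.state

    def observe(self, opponent_action):
        # The consequence: a state transition recording the opponent's play.
        self.state = opponent_action

player = TitForTat()
history = []
for opponent_action in [COOPERATE, DEFECT, DEFECT, COOPERATE]:
    history.append(player.act())
    player.observe(opponent_action)
print(history)  # -> ['C', 'C', 'D', 'D']
```

The organism cooperates first and thereafter echoes its opponent, exactly the behavior recovered from its one-bit state.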
Another line of literature in which state can be seen is the discussion of Markov Decision Processes (MDPs). In an MDP there is a single player which has a set of possible states and a set of actions. The strategy (or 'policy') determines what actions the player will execute as a function of what state it is in. The actions that are executed result in probabilistic transitions between the states of the player, and also determine the player's immediate payoffs (or 'rewards').
MDPs bear a strong conceptual similarity with Lloyd Shapley's multiple-player Stochastic Games (SGs) [15][16], in which the game itself has a set of possible states or 'positions'. Within this context each of the finite number of players has actions which they can execute. The actions which the players execute determine the transitions between the states of the game and also the immediate payoffs to each of them.
It is important to note that in these games the strategies of the participants are evaluated by the expected summations of the payoffs.
Another example in which state can be seen in the literature is in the structures of 'spatial evolutionary games' and, more generally, 'evolutionary games on graphs', where the organisms have the state of belonging to nodes of a grid or graph structure. In these games the organisms at a node play actions against their nearest neighbors and/or themselves, with a 'who-plays-with-whom' relation specified in the game structure. It can be seen that these games capture a general sense of location as a state for the organisms, and also introduce unique and dynamic behavior [17][18][19][20]. In the structure of these games the organisms at the nodes have a strategy which determines what action they execute (such as cooperate or defect). And in some of these games the success of a strategy is evaluated by the expected payoff that the action receives against a weighted combination of the organism's neighbors.
Specifically we note that Szabó and Fáth [14] faithfully detail a large collection of these games, and the reader is encouraged to compare the structure of our game with them.
Closest to our work are Altman et al's Markov Decision Evolutionary Games (MDEG), which combine MDP-style state dynamics with evolutionary games. In MDEG games each organism can occupy one of a finite set of states, and has actions available to it depending on what state it is in. Within the population, the organisms are paired randomly and each of the participants chooses one of their available actions (as determined by its strategy) to execute. Within this interaction the actions that the two organisms execute determine the immediate payoff to both and also the probable transitions in state that the organisms will make. In these games the expected long-term sum of payoffs that an organism receives for its strategy determines the growth rate of the presence of the strategy in the population. This growth then changes the composition of the population in which the pairings occur.
Several example MDEG games are introduced in Altman's literature, including modifications and extensions of the classic Hawk-Dove game, from which we take inspiration [22,24]. MDEG includes many features for modeling state-action interactions within evolutionary game theory and serves as the primary contrast for our game.
Within MDEG games, the transitions between states bear close similarity with Markov transitions between states (such as are embedded in MDPs), and we hope to show in this paper that by relaxing this Markov property we get a game that is quite different and hopefully more powerful in its potential for representing evolutionary phenomena.
Before truly beginning the body of the document, it is worth stating that we will sometimes make cartoon illustrations. This is done in an attempt to soften the presentation of mathematical content, to highlight core concepts and to encourage the imagination of the reader; it is not intended to belittle or make light of any content or concepts whatsoever.

Structure
The remainder of this paper is organised as follows: section 2 presents the core concept of non-Markovian transmission of organisms between states, section 3 gives formalism to the non-Markovian game and its algorithm, section 4 discusses the game's equilibria and gives confinement for the algorithm's equilibria search, section 5 details a Hawk-Dove game as an example of the working algorithm, and section 6 concludes the paper with some general comments about the features and limits of our game.

Non-Markov transmission
Suppose for a moment that we wanted to make an evolutionary model of the behavior of an organism, such as a monkey. We might describe the monkey as having states and actions, such as 'being in a tree' or 'throwing a banana'. And these actions might have consequences on the state of the monkey and perhaps of other monkeys in our population. All of this can be described by existing models in the literature, some of which are described in section 1.2.
There is a question that motivates the model and content of this paper: "What do the payoff values mean?" Suppose that our hypothetical monkey 'in a tree' did action 'throw banana' at an opponent and received an immediate payoff of -1.7. What could this correspond to? If the value were very positive we might think of it as relating to how many offspring the monkey is likely to have, or if very negative then as relating to the monkey's death. Indeed it is seen that in some evolutionary games the relative 'success' or 'fitness' of a particular strategy is related to the sum of these values. It is also worth noting that fitness is not a very simple concept [27]. We might think of a strategy as being fitter if the organism using it is more functional in its environment (conceived statically) and/or fitter if it assists the survival of the genes of its family. But if the family members are also competitors for resources then relative fitness might not be particularly easy to determine, especially if many other factors are considered and included.
In any case, the primary drive of this paper is to attempt a game-theoretic model of evolution that bypasses these questions. At no point in our model do we specify the payoffs for actions. For instance, if a payoff of +2 is to be interpreted as 'having two offspring' we attempt to have the model facilitate that outcome directly: by actually generating two additional monkeys, perhaps in state 'healthy baby', who might then proceed to do action 'feed', which then might change daddy monkey's 'resource level' state, and so on.
In order to model this effect at a population level it is unfortunate that some of the details of the relationships between the organisms have to be lost, such as the detail of which baby monkeys belong to which daddy. Such individual relationships would seem difficult to model without considering the particular organisms on an individual level. Instead what can be modeled is the total number of organisms in each of the potential states, considering that the organisms that execute a particular action such as 'reproduce' might increase the total number of organisms in other states. Another instance of an action that affects the total number of organisms between states is movement. If an individual monkey executes action 'move north' it could be seen to decrease the number of organisms in its original state and increase it in another. In this context the monkey is generally said to 'transition' between the two states. In these examples of reproduction and movement there can be seen to be demographic flow from one state into others. And since it seems odd to say that a daddy monkey 'transitions' into baby monkeys, we use the term transmission to capture the broader notion.
The demographic flow of a species' individuals between states is sometimes described in ecological studies by a matrix that is not necessarily Markov [28]. The simplest examples of such matrices are Leslie matrices, used for studying the structure of populations of individuals transitioning between evenly spaced age-states. Leslie matrices are square, with per-age-class fecundities along the first row and survival probabilities along the subdiagonal [29]. If n is the vector of the numbers n_i of individuals in each age-state, then Mn gives the population after one age bracket, M^2 n after two age brackets, M^3 n after three, and so on. Successive applications eventually yield a steady population profile between the n_i, and a constant exponential growth rate λ given by the Euler-Lotka equation. The λ is the dominant and only real-positive eigenvalue of the matrix, with the steady distribution n as its corresponding eigenvector, that is Mn = λn.
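As a concrete sketch of this machinery (the fecundity and survival numbers below are invented for illustration), power-iterating a small Leslie matrix recovers the steady growth rate λ:

```python
# A 3-age-class Leslie matrix: fecundities (0, 6, 8) along the top row,
# survival probabilities (0.5, 0.5) along the subdiagonal.
M = [[0.0, 6.0, 8.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0]]

def apply(M, n):
    # Plain matrix-vector product Mn.
    return [sum(M[l][j] * n[j] for j in range(len(n))) for l in range(len(M))]

n = [100.0, 0.0, 0.0]               # an initial cohort, all in the youngest class
for _ in range(60):
    n_next = apply(M, n)
    growth = sum(n_next) / sum(n)   # converges to the dominant eigenvalue
    n = n_next

# For this matrix the Euler-Lotka equation 6*0.5/lam**2 + 8*0.25/lam**3 = 1
# is solved by lam = 2, and the iterated growth ratio converges to it.
print(round(growth, 6))  # -> 2.0
```

Successive applications also settle the proportions between the age classes, giving the steady population profile the text describes.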
Although the elements of a Leslie matrix are positive and represent the states of organisms in the population and the transitions between them, the matrix isn't Markov because its columns (or alternatively rows) don't necessarily sum to one. The informal difference is that whereas in a Markov-chain matrix the elements represent the expectation of transition between states, Leslie matrix elements represent the expectation of transmission between states, inclusive of such possible factors as births and deaths. We term this class of matrices 'transmission matrices' in this article, and assert that the only things defining such matrices are that they are real, square and have non-negative elements. We do this because such matrices can be built more broadly than the simple Leslie-matrix form [31,32]; consider the rich interaction between organism-states captured by the matrix of transmissions for the 'Nodding Thistle' in figure 1. We might imagine that if we had a cohort of 100 monkeys in a state 'with banana' whose strategy dictated that they do action 'throw banana', this might result in 75 monkeys moving to a state 'without banana' and 25 remaining in state 'with banana'. In this case the monkeys of this particular strategy might be described by such a matrix with 0.75 and 0.25 as elements. We might equally imagine that a cohort of 100 monkeys in a state 'south' doing action 'attempt migrate north' might result in 120 monkeys in state 'north' and 20 monkeys remaining in state 'south'. In this case the monkeys of the strategy might be described by a matrix with 1.2 and 0.2 as elements.
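The migration example can be checked numerically; a minimal sketch, with the matrix laid out column-per-source-state as in the text:

```python
# Transmission matrix for monkeys doing 'attempt migrate north':
# columns are source states (south, north), rows are destination states.
# 100 monkeys in 'south' yield 20 remaining south and 120 arriving north,
# so the 'south' column sums to 1.4 -- the matrix is not Markov.
T = [[0.2, 0.0],   # -> south
     [1.2, 1.0]]   # -> north

population = [100.0, 0.0]   # (south, north)
new_population = [sum(T[l][j] * population[j] for j in range(2))
                  for l in range(2)]
print([round(x, 6) for x in new_population])  # -> [20.0, 120.0]
```

The total grows from 100 to 140 in one step: demographic flow between states without conservation of organism numbers.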
It is by these transmission matrices that we are able to highlight the notion of transmission between two states as being the demographic flow of the population from one to the other.This notion forms a core concept in the next section as we formalize states and actions for the organisms and thence proceed to compare strategies in game-theory analysis for equilibria.

Description of the Game
In this section we specify the mathematical elements of our game and then give examples in order to provide more intuition into what the elements mean; afterwards we give an algorithm that simulates the organisms.
Consider an ecosystem of different species of organisms, where the organisms of each species have a distinct set of states which they can occupy. Further imagine that each state has a set of actions which an organism in the state can execute. Let:
• K be a finite set of species
• S be the finite set of all states
• S_k be nonempty disjoint subsets of the states S available to species k ∈ K
• A be the finite set of all actions
• A_{k,s} be the subset of the actions A available to an organism of species k ∈ K in state s ∈ S_k
Further imagine that each individual organism has a strategy (or a 'genetic code'), which dictates the probabilities of what action it will execute depending on the state it is in. Let:
• W_k be the set of possible strategies for species k ∈ K, such that for any strategy w^k ∈ W_k, w^k_{a,s} denotes the probability that an organism with strategy w^k will execute action a ∈ A_{k,s} when in state s ∈ S_k. The elements of W_k are all those that satisfy the basic rules of probability: the probabilities of taking actions from any state must sum to one, ∀s ∈ S_k: Σ_{a ∈ A_{k,s}} w^k_{a,s} = 1, and be non-negative, ∀a, s: w^k_{a,s} ≥ 0.
The remaining elements are:
• P_{t,k,s,w}, the number of organisms at time t of species k ∈ K in state s ∈ S_k having strategy w ∈ W_k; and P*_{t,k,s,a}, the number of organisms at time t of species k ∈ K in state s ∈ S_k which are going to take action a ∈ A_{k,s}. P_t defines a distribution of the population at time t, which may be normalized (and hence represent a probability distribution) or left unnormalized (representing actual numbers of organisms). The only constraint is that it be non-negative: ∀t, k, s, w: P_{t,k,s,w} ≥ 0. If the distribution is to be normalized then the normalization can either be 'built-in' to the transmission T terms or included as a separate step in Algorithm 1.
• T_{k,s,a}(P*_t), non-negative functions of argument P*_t, giving the transmission of organisms (of a strategy w^k) to state s ∈ S_k when action a ∈ A_{k,s} is executed by an organism (of strategy w^k).
• α as the proportion of the population that will take an action (or actualize it) at a time step t → t + 1; 0 < α < 1.
Once the above bullet-point elements K, S, A, T, α and an initial population P_0 are given, the game is fully specified.
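A minimal sketch of how such a specification might be encoded (all names and numbers are illustrative, not from the paper's implementation):

```python
# A toy specification of the game's elements for a single species.
K = {"monkey"}
S = {"with_banana", "without_banana"}
S_k = {"monkey": S}
A = {"throw", "forage"}
A_ks = {("monkey", "with_banana"): {"throw"},
        ("monkey", "without_banana"): {"forage"}}

# A strategy maps (action, state) -> probability of taking that action there.
w = {("throw", "with_banana"): 1.0,
     ("forage", "without_banana"): 1.0}

def is_valid_strategy(w, A_ks):
    # The basic rules of probability: per-state sums equal one,
    # and every term is non-negative.
    ok = all(v >= 0.0 for v in w.values())
    for (k, s), actions in A_ks.items():
        ok = ok and abs(sum(w.get((a, s), 0.0) for a in actions) - 1.0) < 1e-12
    return ok

alpha = 0.1                                     # fraction acting per time-step
P0 = {("monkey", "with_banana", "w"): 100.0}    # initial population counts
print(is_valid_strategy(w, A_ks))  # -> True
```

Only the T functions remain to be supplied; any non-negative functions of the state-action counts will do, which is where the model's flexibility lies.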

The meaning of the Game's elements
Some parts of this game should be relatively intuitive, such as the set of species K, which might be 'catfish' or 'dog', but may also be 'tit-for-tat players', or 'players at a node X'. Equally the states S could be construed to be any number of things, such as 'in Europe', 'sensing light', 'remembers being cheated-on', 'shy', 'shy in Europe and sensing light', or 'in a pH of 9.3'. The actions A can equally be construed any which way, such as 'travel to Japan', 'twitch left leg', 'forgive husband', 'move into sunlight', 'produce Aldosterone', or 'try to travel to Japan otherwise forgive'. Indeed it is one of the virtues of modeling in this way that a mathematical specification could be used to represent so many things. The set of strategies W_k are the possible ways the organisms will choose actions based on their state. This could be interpreted as being part of an organism's programming, or part of its personality, perhaps as the instincts encoded in a genome, or the sequences of proteins as imprinted on a bacterial plasmid.
At any point in time the population must have a basic specification, and P_{t,k,s,w} is that specification. It is the number of organisms of a species k in a state s who have strategy w at a time t. The primary thing worth noting here is that the actual strategies in the population need to be kept track of. For instance, a population at a time t of passive monkeys 'in a tree' doing action 'throw banana' would be a very different thing from aggressive monkeys 'in a tree' doing action 'throw banana': the difference between the passive and aggressive strategies might show at a later time when the monkeys are in a different state. Another note is that the number of different strategies in a population could be very large, as there is a continuum of ways that the probabilities which define the strategies can be set.
However the strategies in the population should probably not be the things which in themselves determine the immediate actions and reactions among the organisms. It is more natural to think that the actions which are executed should determine the immediate consequences for the population, and this specification is the P*_{t,k,s,a}. P* is the specification of organisms of a species k in a state s doing action a at a time t; it is determined by P and it contains much less information than P.
The T_{k,s,a}(P*_t) functions are the primary source of flexibility in the model. The concept of transmission was discussed in section 2 and describes the 'demographic flow' of individuals from one state to another. These functions give the numbers that might otherwise appear in the entries of the transmission matrices, such as the Leslie matrices. For intuition: if 100 monkeys in state 'on the ground' did action 'reproduce', which would result in an expected 75 monkeys in state 'baby' in the next time-step, then the number 0.75 would be the value of the respective T function. The T_{k,s,a}(P*_t) are a set of functions giving the transmission to a state s by the organisms doing action a, and within the model these can be any non-negative functions of the numbers P*_t. For instance: the number of baby monkeys produced per time-step might depend linearly, quadratically, exponentially or even sinusoidally on the number of alligators specifically 'in the lake' doing action 'snap teeth' in that same time-step. Or as another instance: the population of monkeys in state 'blind' doing action 'go home' might transmit to a number of states dependent on any number of such factors.
Finally the term α captures the consideration that we generally don't want the entire ecosystem taking an action at every single time-step.It is perceived that such a thing would probably lead to enduring (perhaps unrealistic) oscillations in the population, and having the actions staggered in this way would be expected to smooth the dynamics out -as might be thought to occur in real-world evolution.

An algorithm for the Game's process
The update of the game can be seen to consist of stages: the organisms of a strategy w^k have a population distribution across states given by P_t. A proportion α of those individuals act, with their strategy w^k determining the distribution of actions taken by them. The total actions taken by all strategies determine the total transmissions among the states, thus updating P_t to P_{t+1}. The process is embedded as Algorithm 1. This process may or may not settle into any kind of stable equilibrium.

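The stages described above can be sketched in code; this is a minimal illustration with our own data layout and function names, not the paper's implementation:

```python
from collections import defaultdict

def game_step(P, W, T, states_of, alpha):
    """One update P_t -> P_{t+1}.

    P: {(species, state, strategy): count}
    W: {strategy: {(action, state): probability}}
    T(k, s_to, a, P_star): per-organism transmission into s_to when a is taken.
    """
    # Stage 1: the acting fraction alpha yields the state-action profile P*.
    P_star = defaultdict(float)
    for (k, s, w), n in P.items():
        for (a, s2), prob in W[w].items():
            if s2 == s:
                P_star[(k, s, a)] += alpha * n * prob
    # Stage 2: the non-acting fraction stays put...
    P_next = defaultdict(float)
    for key, n in P.items():
        P_next[key] += (1.0 - alpha) * n
    # ...while the actions cause transmissions among the states.
    for (k, s, w), n in P.items():
        for (a, s2), prob in W[w].items():
            if s2 != s:
                continue
            for s_to in states_of[k]:
                P_next[(k, s_to, w)] += alpha * n * prob * T(k, s_to, a, P_star)
    return dict(P_next)

# Toy run: one species, a pure 'go' action that moves x -> y, conserving numbers.
W = {"w0": {("go", "x"): 1.0, ("stay", "y"): 1.0}}
T = lambda k, s_to, a, P_star: {("go", "y"): 1.0,
                                ("stay", "y"): 1.0}.get((a, s_to), 0.0)
P1 = game_step({("m", "x", "w0"): 100.0}, W, T, {"m": ["x", "y"]}, alpha=0.5)
print(P1)  # -> {('m', 'x', 'w0'): 50.0, ('m', 'y', 'w0'): 50.0}
```

Because the T values here sum to one per action, numbers are conserved; replacing them with values summing above or below one would produce the growth or decay discussed in section 2.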
From Algorithm 1 it is noticed that if the states were indexed S_k = {s_{k,0}, s_{k,1}, ...}, every strategy w^k would have its own transmission matrix, with elements m_{l,j}, analogous to those given in section 2 (such as the Leslie matrix), such that m_{l,j} is the net transmission from the jth state to the lth state for the individuals that take actions. We term such a matrix the "strategy's transmission matrix".

Some bookkeeping
Before we proceed further it is necessary to make an odd distinction between a strategy and the probability numbers by which it is defined. This distinction is important because it is possible to perform operations on those numbers such that they may represent a different strategy or possibly no strategy at all. If the actions were indexed A_{k,s_{k,j}} = {a_{k,j,0}, a_{k,j,1}, ...} then the probabilities of any strategy w^k ∈ W_k would form an indexed set of numbers, which we define to be the strategy's "terms".
Here we say that an appropriately dimensioned indexed set of numbers q_{j,i} can be the terms of a strategy if it is "implementable", which is iff the numbers could be taken to be probabilities of a strategy, i.e. ∀j: Σ_i q_{j,i} = 1 and ∀j, i: q_{j,i} ∈ R⁺ ∪ {0}.
The terms of a strategy have the same size and dimensions as the terms of all the strategies of the same species. Because of this it is possible to add, subtract and multiply them together element-wise to form linear combinations of them. Although it may take some thought to realize, it is the case that the result of a linear combination of strategy terms is implementable if the coefficients of the linear combination are positive and sum to unity. In any case, in the next section we will speak of linear combinations of strategies in this manner.
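This implementability claim can be checked numerically; a small sketch with arbitrary strategy terms:

```python
# Two strategies over two states with two actions each, as term arrays q[j][i]:
# row j is a state, and each row sums to one.
q_a = [[1.0, 0.0], [0.3, 0.7]]
q_b = [[0.5, 0.5], [0.0, 1.0]]

def implementable(q):
    # Terms are implementable iff each row sums to one and all are non-negative.
    return all(abs(sum(row) - 1.0) < 1e-12 and all(x >= 0.0 for x in row)
               for row in q)

# A convex combination (coefficients positive, summing to one) of the terms:
c = 0.25
mix = [[c * x + (1 - c) * y for x, y in zip(ra, rb)]
       for ra, rb in zip(q_a, q_b)]
print(implementable(mix))  # -> True

# But a combination with coefficients 2 and -1 (summing to one, not positive)
# need not be implementable: it can produce negative 'probabilities'.
bad = [[2 * x - 1 * y for x, y in zip(ra, rb)]
       for ra, rb in zip(q_a, q_b)]
print(implementable(bad))  # -> False
```

The rows of a convex combination still sum to one automatically; positivity of the coefficients is what keeps every term non-negative.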
As a side-note, in anticipation of Appendix B, we present here that any strategy's transmission matrix has a form where its columns are linear combinations of column vectors weighted by the strategy's terms. Consider that if we index the T functions as y_{j,i,l} = T_{k,s_{k,l},a_{k,j,i}}, and if y_{j,i} denotes the column vector of such terms over l, then the matrix has elements m_{l,j} = Σ_i y_{j,i,l} q_{j,i}, that is, its jth column is the combination y_{j,0} q_{j,0} + y_{j,1} q_{j,1} + y_{j,2} q_{j,2} + ...

Searching for Stable Equilibria
In direct correspondence with common game-theory language [3], it is possible to define basic relationships between the strategies. Each organism's strategy w^k encodes the probabilities of what actions it will take across its states. A strategy is 'pure' if these probabilities encode certainty of taking a single action per state; otherwise it is 'mixed'. Any mixed strategy can be decomposed (perhaps not uniquely) into a linear combination of pure strategies. And any set of pure strategies defines a span of mixed strategies which can be linearly composed from them. The set of pure strategies which could feature in a linear decomposition of a mixed strategy is defined as the 'support' of the mixed strategy.
If we define an 'equilibrium' as the condition where all the m_{l,j} transmission matrices remain constant, and an 'equilibrium point' as being defined by those values, then it is necessarily the case that an equilibrium leads to a condition where all the species and strategies that are significantly present in the population grow at the same steady-state growth rate (see appendix A for discussion and a limited proof). For if any organisms of a strategy existed in the population with a lesser steady-state growth rate then they would proportionally die out, and if any organisms of a strategy existed with a greater steady-state growth rate then they would lead the others to proportionally die out.
We further define an equilibrium as being 'stable' in a similar way to Maynard Smith [1][2][3]: specifically, if it cannot be disturbed from equilibrium by the incorporation of a small amount of (or 'invasion by') any possible 'mutant' strategy. We note that this is at least the case where no 'mutant' strategy has a greater steady-state growth rate in the context of the population.
There is a proof in appendix B that for any stable equilibrium established with a population of mixed strategies it is possible to establish the same equilibrium point without the mixed strategies at all. Informally the reasoning is: because any mixed strategy is a stochastic mix of its supporting pure strategies, it can only perform as well as the best of them; and when it performs equal to the best then they must all perform equally; and in this case there is an equivalent combination of the supporting pure strategies which has the same state-action profile P*_{t,k,s,a} as the mixed strategy, the same profile which defines the transmission matrices and thus the equilibrium point itself. From these considerations it is thus unnecessary to consider mixed strategies in the search for stable equilibria, because every stable equilibrium can be established by combinations of pure strategies alone (although there may be zero or multiple such stable equilibria between them). In this way, multiple runs of Algorithm 1 with different initial combinations of pure strategies are sufficient to determine all possible stable equilibria of the game. It is unfortunate that the demonstration of these claims seems to need to be so exceedingly mathematical.
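Enumerating the initial pure strategies for such runs is a cartesian product over states; a sketch with illustrative state and action names:

```python
from itertools import product

# Actions available per state for one species.
A_ks = {"hungry": ["eat", "forage"],
        "full": ["sleep", "play", "groom"]}

# A pure strategy picks exactly one action per state, so the pure strategies
# are the cartesian product of the per-state action sets.
states = sorted(A_ks)
pure_strategies = [dict(zip(states, choice))
                   for choice in product(*(A_ks[s] for s in states))]

print(len(pure_strategies))  # -> 6, i.e. 3 * 2
```

Each dict maps a state to its single chosen action; runs of Algorithm 1 can then be seeded with different initial mixtures over this finite set.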

Software Implementation
An implementation of Algorithm 1 for arbitrary configurations of species/states/actions using pure strategies was written in the Python programming language, using the Scoop and SymPy libraries for parallelisation and for mathematical expression parsing respectively. The source code is available at https://github.com/Markopolo141/FSM-evolve/ and at the time of publication consists of ∼400 lines.

An Example
One of the most famous evolutionary games is Hawk-Dove [1], which has been extended to multiple states by Eitan Altman et al [22,24]. A simplified version of Altman's game (as presented in [24]) is as follows. Imagine a population of simple organisms that can occupy three distinct states: Young, Aggressive Adult and Passive Adult, and that the organism's genome encodes a single probability γ that the Young will (if given the opportunity) mature into an Aggressive Adult as opposed to a Passive one. Imagine that in a distinct unit of time the Young of the population mature into a type of Adult, and also that the Adults die leaving Young offspring. A young organism suffers a static probability C of encountering an adult before maturing; if it encounters an Aggressive Adult then it has a static probability D of surviving and matures into a Passive Adult, otherwise it matures into an Aggressive or Passive Adult depending on its genome. The Adults of both types have offspring numbers directly dependent on the proportion p of Adults that are Aggressive.
Aggressive Adults in a population of Passive ones take all resources and have an expectation of two offspring, whereas in a population of Aggressive ones they expend significant energy fighting for resources and have no offspring; Aggressive Adults therefore have expected offspring 2(1 − p). Passive Adults in a population of Passive Adults will generally share resources and save energy by not fighting, giving them 1 + A expected offspring (where A is a static number; 0 < A < 1), but in a population of Aggressive ones will lose resources to the Aggressive Adults and have A expected offspring; Passive Adults therefore have expected offspring 1 − p + A. Various observations can be made about this game. Where the population consists entirely of Passive Adults (p = 0), any organism with γ > 0 might have greater reproductive success and growth rate than the population and thus increase the number of Aggressive Adults. Where the population consists entirely of Aggressive Adults (p = 1), any organism with γ < 1 might have greater reproductive success and growth rate than the population and thus increase the number of Passive Adults. So there might be an equilibrium in the game somewhere in between.
The demographic flow of organisms between states can be visualized as per Figure 2. In any case the game can be formalized as follows:
• K = {b}, a singular species of bird
• S = S_b = {y, a, p}, the states of: young, aggressive adult, passive adult
• A = {R_a, R_p, G_a, G_p}, the actions of reproducing as an aggressive adult, reproducing as a passive adult, and of a young organism growing up aggressive or passive respectively
where A, D, C are parameters of the game, all between 0 and 1, and p is the proportion of adults that are aggressive. It is noted that the only state which has multiple actions available to it is y, with G_a and G_p; therefore any strategy w_b is totally specified once γ = w^b_{G_a,y} is specified, and thus all strategies of the game can be parameterized by a single number γ, with 0 ≤ γ ≤ 1. If the states are indexed in the order y, a, p and the actions in the order R_a, R_p, G_a, G_p, then a strategy w_b with γ = w^b_{G_a,y} has a transmission matrix m_{i,j} whose columns give the transmissions out of states y, a and p respectively: the young transmit into the adult states at rates γ(1 − Cp) (aggressive) and (1 − γ)(1 − Cp) + CpD (passive), the aggressive adults transmit into the young state at rate 2(1 − p), and the passive adults transmit into the young state at rate 1 − p + A. We compared the results of the Python software (of section 4.1) on the Hawk-Dove game with those obtained by mathematical analysis (as given in Appendix D) and also via stochastic simulation.
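The expected-offspring expressions can be checked directly; a minimal sketch (note that equalizing offspring at p = 1 − A ignores the young stage and the parameters C and D, so it is only suggestive of the full game's equilibrium):

```python
def aggressive_offspring(p):
    # 2 offspring against passives, 0 against aggressives: 2*(1-p) + 0*p.
    return 2 * (1 - p)

def passive_offspring(p, A):
    # 1 + A against passives, A against aggressives: (1-p)*(1+A) + p*A,
    # which simplifies to 1 - p + A.
    return (1 - p) * (1 + A) + p * A

A = 0.4
# All-passive population (p = 0): an aggressive mutant does strictly better...
assert aggressive_offspring(0) > passive_offspring(0, A)
# ...and in an all-aggressive population (p = 1) a passive mutant does better.
assert passive_offspring(1, A) > aggressive_offspring(1)

# Considering the adult stage alone, offspring numbers equalize at p = 1 - A:
p_eq = 1 - A
print(round(passive_offspring(p_eq, A) - aggressive_offspring(p_eq), 9))  # -> 0.0
```

The two boundary inequalities are exactly the observations made in the text: neither all-aggressive nor all-passive populations are stable, so any stable equilibrium lies in between.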
The Moran process is a very simple stochastic model of the evolution of finite populations, wherein each 'turn' a random individual is chosen for reproduction proportional to its fitness and a corresponding random individual is chosen for death; the Moran process is generally regarded as a cornerstone technique of stochastic evolutionary game dynamics [34]. A Moran process for the above game was programmed (with source code shown in Appendix E) and its results, against the python implementation and the mathematical analysis, are shown in Figure 3. The figure shows the value p (the proportion of adults that are aggressive) and the proportion of young (%Y) at equilibrium against the parameter A, for fixed C and D. The three methods of analysis show strong agreement on a non-trivial result.
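To make the Moran step concrete, here is a minimal generic sketch (this is not the paper's Appendix E implementation; the strategy labels and fitness values are chosen purely for illustration). Each step picks a reproducer with probability proportional to fitness and replaces a uniformly random individual with its offspring, so the population size stays fixed:

```python
import random

def moran_step(pop, fitness):
    """One Moran step: choose a reproducer with probability proportional
    to its fitness, then replace a uniformly random individual with a copy."""
    weights = [fitness[s] for s in pop]
    parent = random.choices(pop, weights=weights, k=1)[0]
    victim = random.randrange(len(pop))
    pop[victim] = parent

# Run to fixation: with constant fitnesses the finite chain fixates
# on a single strategy almost surely.
random.seed(0)
pop = ["Aggressive"] * 10 + ["Passive"] * 10
fit = {"Aggressive": 1.1, "Passive": 1.0}
while len(set(pop)) > 1:
    moran_step(pop, fit)
```

In the actual Hawk-Dove experiment the fitnesses would instead be the state-dependent expected offspring numbers of section 4.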

Discussion
The game (as defined in section 3) is designed with broad features in an attempt to encapsulate a large number of potential applications. The game's elements consist of one or more populations of entities that are stateful and stochastically transmit themselves between states, based on their present state and on the states and actions of others, in accordance with a conserved strategy for choosing actions.
In a former version of this paper there was an objecting concern: how could the organisms of the simulation ever be responsive to the actions of others? If the strategy encoded into an organism determines what actions it will take based on its state alone, then how could that action ever change? And hence, how could the organisms have even the most basic intelligence?
It is a good question, and a tentative answer is: by allowing organisms to execute actions that are responsive to the actions of others. Probably one of the most basic examples is the tit-for-tat strategy in the iterated prisoner's dilemma, where the strategy changes the action played (cooperate or defect) based on the last play of the opponent. In Appendix F we flesh out another example where a tit-for-tat-like strategy can be played by the organisms.
The approach involves the use of a binary state which can be interpreted as holding the 'memory' of the previous play, together with a set of actions which play cooperate/defect while changing the state of the memory depending on the play of the opponent. At this point it is good to note that the organisms in this example bear an uncanny resemblance to finite-state machines.
The primary breadth of the game's flexibility comes from allowing the transmission terms T_{k,i,j} to be any function of the population state P*_t. For instance, the T terms can be non-linear and represent non-linear dynamics between individuals, such as might be encoded in a classical evolutionary game with a 3-player symmetric payoff matrix. The T terms might keep the total population size under a maximum, or keep a particular state under a maximum or minimum, and they can encode dynamics similar to various evolutionary models, e.g. replicator dynamics, best-response dynamics or payoff-comparison dynamics [35,36]. In this way the game's representation can be built to capture some of the dynamics of other evolutionary games.
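As one illustration of a population-state-dependent transmission term, the following sketch caps growth as the total population approaches a maximum. The function name, the logistic form, and all parameter values are our assumptions for illustration, not definitions from the paper:

```python
def birth_term(pop_state, carrying_capacity=1000.0, r=0.5):
    """A non-linear transmission term T(P*): a per-capita inflow rate into
    the 'young' state that shrinks as total population nears a maximum.
    (Illustrative only; the names and functional form are assumptions.)"""
    total = sum(pop_state.values())
    return r * max(0.0, 1.0 - total / carrying_capacity)

# A hypothetical population state keyed by state name, holding counts.
P = {"young": 200.0, "aggressive": 300.0, "passive": 100.0}
rate = birth_term(P)   # per-capita birth rate, damped by crowding
```

Because the term is an arbitrary function of P*_t, the same hook could equally implement per-state caps or payoff-comparison dynamics.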
Consideration must be given, in running the game's simulation (per Algorithm 1), to the fact that there is no guarantee it will fall into a stable equilibrium, or that it will do so in a timely manner. This is particularly true if instability is intrinsically part of the model (as in the game of paper-scissors-rock [37]). It is also worth noting that setting α too high can introduce instability into borderline-stable models.
A limitation of our game is that it is intentionally designed to 'wash out' periodic transients between the states as the simulation progresses (as α < 1 acts as a dampener on such transients), so it cannot be used to model populations in which long-term periodic behavior between states is desired. Another limitation is that there is currently no facility for the transmission of organisms from one strategy into another, such as might be used to explicitly model the effects of significant mutation on the population (see Nowak [18] for example analysis), or perhaps to model the sexual combination of strategies.
Ultimately, however, the features of evolutionary games and frameworks, with their virtues and shortcomings, have to be evaluated and compared for some purpose. And it is on this note that we must leave the game, with its potential uses, limitations, and concepts, unto the reader's imagination.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A That all proportionally present strategies have the same growth-rate at equilibrium
At an equilibrium, let m^{w_k}_{l,j} be the constant transmission matrix for strategy w_k, and let v^{w_k,t}_j = P_{t,k,s_{k,j},w_k} be the vector of the number of organisms at time t of strategy w_k in the jth state. Application of the transmission matrix to this vector gives the vector at the next time index t + 1 via Algorithm 1 as:

v^{w_k,t+1} = (α m^{w_k} + (1 − α)I) v^{w_k,t}

Thus the strategy's population vector v^{w_k,t} can be stepped forward in time by repeated multiplication by the non-negative matrix Z = (α m^{w_k} + (1 − α)I), where I is the identity matrix. It is generally observed that repeated matrix powers yield exponential growth, and if we assume Z is an irreducible and diagonalizable matrix then the proof is comparatively simple and Theorem 1 gives the desired result:

v^{w_k,t} → (α λ^{w_k} + 1 − α)^t b^{w_k} v̄^{w_k}

where λ^{w_k} and v̄^{w_k} are the largest eigenvalue and corresponding eigenvector of m^{w_k}, and b^{w_k} is a constant. If we assume the same is true for all strategies in the population, then each has an asymptotic exponential growth-rate γ^{w_k} = α λ^{w_k} + 1 − α. So between any two strategies w_k and w_p, if λ^{w_k} > λ^{w_p} then γ^{w_k} > γ^{w_p} and strategy w_k dominates strategy w_p. Thus the only strategies that will ultimately be undominated are those of maximum growth-rate, and so at equilibrium the only strategies of significant proportion in the population have the same growth-rate γ.
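This asymptotic growth-rate can be checked numerically. The sketch below (the matrix values are illustrative, not drawn from the Hawk-Dove game) confirms that repeated application of Z = αm + (1 − α)I produces a per-step growth factor converging to γ = αλ + 1 − α, where λ is the Perron root of m:

```python
import numpy as np

# A hypothetical irreducible, non-negative transmission matrix m
# (values illustrative only).
m = np.array([[0.0, 1.2, 0.9],
              [0.5, 0.3, 0.0],
              [0.4, 0.0, 0.6]])
alpha = 0.5
Z = alpha * m + (1 - alpha) * np.eye(3)

lam = max(abs(np.linalg.eigvals(m)))     # Perron root (spectral radius) of m
predicted = alpha * lam + (1 - alpha)    # predicted growth-rate gamma of Z

# Power iteration: the realized per-step growth factor converges to
# 'predicted'; renormalizing each step avoids numerical overflow.
v = np.ones(3)
for _ in range(1000):
    w = Z @ v
    growth = np.linalg.norm(w) / np.linalg.norm(v)
    v = w / np.linalg.norm(w)
```

After convergence `growth` agrees with `predicted` to machine precision, matching the statement of Theorem 1.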
The set of possible matrices m^{w_k} (and Z) is obviously much larger than those which are diagonalizable and irreducible. And while it is generally observed that most matrices yield the same exponential-growth character, there are some which don't, specifically defective matrices. In this paper we assume such matrices are the exception and our analysis does not treat their case. Although we believe that all conclusions of the paper hold with their case included, it will be left to future (and much more mathematically involved) work to demonstrate such. Until then we note that the major theorem of this paper (the statement of Appendix B) is potentially vulnerable in cases where the transmission matrices are defective.

Appendix B That any stable equilibrium point can always be among pure strategies
In this section we attempt to prove that any stable equilibrium point can be established among pure strategies alone. The full demonstration of this claim is formulated in matrix mathematics, to avoid any possible vagueness, as Theorem 5 (which builds on Theorems 2, 3, 4 and Lemma 2). But because of this, a step of interpretation is needed between accepting the theorem and understanding its connection and relevance to our game. It is this interpretation that we address in this section.
To make this connection we begin with a definition of a strategy's being 'replaceable' by other strategies: a strategy is replaceable if there exists a possible replacement of its organisms by the others' in a population which would not disturb the equilibrium point. We then re-frame this condition in terms of matrices so as to relate directly to the theorems. The theorems are then shown to demonstrate that all mixed strategies are replaceable by sets of pure strategies, which demonstrates the claim of this appendix.

Appendix B.1 on 'replaceable' strategy
Suppose that there are two populations P1 and P2 consisting of the same set of strategies W_k, except that P2 has an additional mixed strategy w_Σ. Suppose that both have the same values P1*_{t,k,s,a} = P2*_{t,k,s,a} = P*_{t,k,s,a}, i.e. the same numbers of organisms at time t, of species k, in state s, taking action a (as per the definition in section 3). As P*_t defines the transmission matrices of the strategies (via equation 1), the strategies in both populations have the same transmission matrices. Therefore if P1 is in stable equilibrium then so too is P2, and they both have the same equilibrium point. At a stable equilibrium each strategy w_k in the population has the same maximal exponential growth-rate γ (as per equation 5 in Appendix A), and per the definition of P*_t (in section 3), each b^{w_k} is interpreted as the relative 'amount' of strategy w_k (especially if v̄^{w_k} is normalized). If we let d^{w_k} = (b^{w_k} − c^{w_k}) / c^{w_Σ}, then it follows that if there exist positive 'amounts' d^{w_k} of the strategies w_k ∈ W_k satisfying this condition, then P1 and P2 have identical equilibrium points. And indeed any 'amount' of strategy w_Σ can be exchanged for the others while keeping equilibrium. This leads naturally to our informal definition:

Definition 1. A strategy w_Σ ∈ W_k in the population, of growth-rate γ, is 'replaceable at stable equilibrium' by a set of other strategies W_k if all strategies w ∈ W_k have growth-rate γ and there exist positive coefficients c_w such that:

∀j, v̄^{w_Σ}_j = ∑_{w ∈ W_k} c_w v̄^w_j

with v̄^w denoting an eigenvector corresponding to growth-rate γ of w's transmission matrix, per Appendix A.

Appendix B.2 'replaceable' strategy terms
The span of strategy transmission matrices is defined by their columns as weighted sums of column vectors (see equation 3). Each set of weights is a subset of the strategy's terms, and is constrained to be non-negative and to sum to unity. The pure strategies have terms which are entirely 0s and 1s, and their matrices are the extreme poles of this span. The condition of replaceability (as per Definition 1 above) is a relationship between the eigenvectors v̄^w of several transmission matrices and the same probabilities w_{a,s} which define them. Thus replaceability is actually a very specific relationship between the eigenvectors of sets of matrices whose columns are weighted sums of column vectors, and the weights themselves.
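The structure just described, with transmission-matrix columns as weighted sums of per-action column vectors and pure strategies at the 0/1 extremes, can be made concrete in a small sketch. The per-action vectors y, the state/action indices, and all numeric values here are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

# Hypothetical per-action column vectors y[j][i]: the demographic flow
# out of state j when action i is taken (all values illustrative).
y = {
    0: {0: np.array([0.2, 0.8]),   # state 0, action 0
        1: np.array([0.9, 0.1])},  # state 0, action 1
    1: {0: np.array([0.0, 1.1])},  # state 1 has a single action
}

def transmission_matrix(q):
    """Build m[l, j] = sum_i y[j][i][l] * q[j][i] from strategy terms q,
    where each q[j] is non-negative and sums to one."""
    n = len(y)
    m = np.zeros((n, n))
    for j, actions in y.items():
        for i, col in actions.items():
            m[:, j] += q[j][i] * col
    return m

mixed = transmission_matrix({0: {0: 0.3, 1: 0.7}, 1: {0: 1.0}})
pure_a = transmission_matrix({0: {0: 1.0, 1: 0.0}, 1: {0: 1.0}})
pure_b = transmission_matrix({0: {0: 0.0, 1: 1.0}, 1: {0: 1.0}})

# By linearity in the terms, the mixed strategy's matrix is the same
# convex combination of the pure strategies' matrices.
assert np.allclose(mixed, 0.3 * pure_a + 0.7 * pure_b)
```

The matrices are linear in the terms; replaceability concerns the harder, non-linear question of how their eigenvectors relate.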
We conclude this appendix by giving a definition of 'replaceability', as it applies to the terms of strategies, in precise mathematical language. It is in relation to this definition that Theorem 5 applies, with the conclusion that any strategy is replaceable by pure strategies.

Definition 2. A 'strategy's terms' q_{j,i} is a set of numbers indexed by j, i such that ∀j ∑_i q_{j,i} = 1 and ∀j,i q_{j,i} ∈ R^+ ∪ {0}.

Definition 3. A 'pure' strategy's terms is a strategy's terms q_{j,i} such that ∀j,i q_{j,i} ∈ {0, 1}.

Definition 4. For sets of element-wise non-negative column vectors y_{j,i}, a strategy's terms q_{j,i} is 'replaceable at stable equilibrium' by other strategy terms q0, q1, . . . iff there exist positive real coefficients c_0, c_1, . . . such that the eigenvector condition holds, where m(q) denotes the matrix m(q)_{l,j} = ∑_i y_{j,i,l} q_{j,i}, where ρ(•) denotes spectral radius, and where V(•, λ) denotes an eigenvector of a matrix with an eigenvalue of magnitude λ.

Hence γ_0 is an upper bound for the set |γ_m| and is also identical to an element of the set, hence is a maximum, satisfying the first part of the proof. For a γ_m = γ_0 we have λ_m = λ_0, and we break λ_m into real and imaginary components λ_m = r_m + i·i_m (for i the imaginary unit). If . . . , and also that r_m < λ_0: . . . , and the proof is complete.
Theorem 1. For an irreducible, diagonalizable, non-negative real n × n matrix M with spectral radius λ, a non-negative non-zero vector v, and an α such that 0 < α < 1, let Z = (αM + (1 − α)I). Then λ is an eigenvalue of M with corresponding positive eigenvector z, and there is a b ∈ R^+ such that:

(1/γ_0^m) Z^m v → b z as m → ∞

Proof. Since M is irreducible and non-negative, it has a non-negative real eigenvalue λ_0 equal to its spectral radius λ, and a corresponding positive eigenvector z_0, via the Perron–Frobenius theorem. Let z_0, z_1, . . . and λ_0, λ_1, . . . be the set of complex eigenvectors/values for M (vectors scaled to magnitude 1), in which case z_0, z_1, . . . and γ_0, γ_1, . . . are eigenvectors/values for Z with γ_i = αλ_i + (1 − α). Since λ_0 = λ, γ_0 = αλ_0 + (1 − α) is the unique largest-magnitude eigenvalue of Z, via Lemma 1. Since z_0, z_1, . . . span C^n, v can be decomposed into a linear combination of them:

v = c_0 z_0 + c_1 z_1 + . . .

with c_0, c_1, . . . being the complex coefficients (obtainable via the Hermitian inner product). Because v is non-negative and non-zero and z_0 is positive, c_0 is real and positive. Taking m repeated applications of (1/γ_0)Z gives:

(1/γ_0^m) Z^m v = c_0 z_0 + (γ_1/γ_0)^m c_1 z_1 + (γ_2/γ_0)^m c_2 z_2 + . . .

For large m, all terms with γ_i magnitudes less than that of γ_0 tend to zero, leaving the single term c_0 z_0.

Lemma 2. For an n × n matrix A and an n-column vector b, with A_{b,k} denoting the matrix with its kth column as b: if λ is an eigenvalue for both A and A_{b,k}, then it is also an eigenvalue of αA + (1 − α)A_{b,k} for any α ∈ R.

Proof. Consider the characteristic polynomials of λ for A and A_{b,k}: det(A − λI) = det(A_{b,k} − λI) = 0. If we let C(•)_{i,j} denote the (i, j)th cofactor of a matrix, then these determinants can be expanded along the kth column to give: . . .

Theorem 2.
For a real n × n element-wise non-negative matrix A, and a real element-wise non-negative column vector b, with A_{b,k} denoting the matrix with its kth column as b, consider the matrix mapping B(α) = αA + (1 − α)A_{b,k} defined on the range 0 ≤ α ≤ 1, and let ρ(B(α)) denote the spectral radius of B(α). Then ρ(B(α)) is continuous, and either constant or strictly monotonic in α.
Therefore ρ(B(α)) is monotonic; if there does not exist any such α then ρ(B(α)) is constant, via Lemma 2. Which completes the proof.

Theorem 3. For an n × n matrix A, and column vectors b and c, with A_{b,k} and A_{c,k} denoting the matrix with its kth column as b and c respectively, consider the matrix mapping B(α) = αA_{b,k} + (1 − α)A_{c,k} for α ∈ R, and let λ(α) and v(α) be an eigenvalue/eigenvector pairing of B(α). If there exist different α_1 and α_2 such that λ(α_1) = λ(α_2) = λ, then λ, v(α) is an eigenvalue/vector pairing for all α, with the kth value of the eigenvector v(α)_k being constant.

. . . ∂v(α)/∂α + (b − c)v(α)_k = 0. Now, there are two cases: if there is an α_3 such that v(α_3)_k = 0 then:

Proof
. . . is a solution which fulfills the proof. If there is not an α_3 such that v(α_3)_k = 0, then: since it is possible to rescale, we may set v(α)_k = d, a non-zero constant, and therefore ∂v(α)_k/∂α = 0. Thus there is a linear solution, and it is the only linear solution that adjoins v(α_1) and v(α_2). Which completes the proof.

Theorem 4. For a real n × n element-wise non-negative matrix A; for m element-wise non-negative column vectors b_0, b_1, . . .; for A_{b_i,k} denoting the matrix with its kth column as b_i; for the matrix mapping B(c_0, c_1, . . .) = ∑_i c_i A_{b_i,k} defined on inputs where ∀i c_i ≥ 0 and ∑_{i=0}^{m−1} c_i = 1; for ρ(B) denoting the spectral radius of B; for V(•, λ)_k denoting the kth element of an eigenvector of a matrix corresponding to eigenvalue λ; and for a set of reals d_0, d_1, . . .:

they all have the same kth value, equal to C.

Proof. We begin by introducing the following mapping Q on the coordinates c_0, c_1, . . ., c_{m−1} by parameter α, valid for 0 ≤ α ≤ 1 and c_{m−1} = 1. Q is a valid mapping under the input constraints, i.e. the inputs to B satisfy the constraints if the c-inputs of Q (the c_0, c_1, . . ., c_{m−1}) satisfy the constraints (that they are non-negative and sum to one). We also notice, by inspection of equation 6, that Q(. . ., α) satisfies the criteria of Theorem 2.

• Considering the case m = 1: there is a singular vector b_0 and the only permissible value of B(d_0) under the constraints is B(1); therefore V(B(d_0), λ) = V(A_{b_0,k}, λ), and the theorem is satisfied for m = 1.

• Considering the case m = j + 1 under the assumption that the theorem is satisfied for m = j: . . . and the theorem is satisfied. Otherwise: ρ(Q(d_0, d_1, . . ., d_j, α)) exists and is constant or strictly monotonic in α, as per Theorem 2 on equation 6; ρ(Q(d_0, d_1, . . ., d_j, α)) passes through ρ(B(d_0, d_1, . . ., d_j)), as per equation 9; and we assume ρ(B(d_0, d_1, . . ., d_j)) ≥ max(ρ(Q(c_0, c_1, . . ., c_j, 1)), ρ(Q(c_0, c_1, . . ., c_j, 0))), which is a condition of the theorem. If ρ(Q(d_0, d_1, . . ., d_j, α)) is strictly monotonic then it must be maximal at either end, α = 0 or α = 1; otherwise it must be constant. Thus there are three cases: . . . Which completes the proof. (This line is technically redundant, but is included for an attempt at logical flow.)
• In the first case, there is the action which keeps the current state irrespective of the outcome of the pairing. For instance, an organism in "C1" may choose this action, which keeps it in state "C1" irrespective of what state the opponent was from. This action is called 'hold', or "H" for short.
• In the second case, there is the action which switches the state if the opponent in the pairing played differently. For instance, an organism in "C1" may choose to transition to state "D1" only if the opponent in the pairing was from state "D2". This action is called 'mirror', or "M" for short.
• In the third case, there is the action which changes the state irrespective of the outcome of the pairing. For instance, an organism in "C1" may choose to transition to state "D1" irrespective of what state the opponent was from. This action is called 'switch', or "S" for short.
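The three actions can be sketched as a finite-state machine. This is our own illustrative rendering, collapsing the paper's C1/D1 and C2/D2 pairs into a single organism's binary cooperate/defect state; it is not the paper's formal specification:

```python
def next_state(state, action, opponent_play):
    """Transition a binary-memory organism given its chosen action
    and the opponent's play ('C' or 'D')."""
    if action == "H":                        # hold: keep current state
        return state
    if action == "M":                        # mirror: adopt the opponent's
        return opponent_play                 # play (switch only if it differed)
    if action == "S":                        # switch: flip state regardless
        return "D" if state == "C" else "C"
    raise ValueError(f"unknown action {action!r}")

# A strategy maps each state to the action chosen there.
tit_for_tat = {"C": "M", "D": "M"}   # always mirror
grim = {"C": "M", "D": "H"}          # mirror while cooperating, hold once defecting

def play(strategy, state, opponent_play):
    return next_state(state, strategy[state], opponent_play)
```

Under this rendering `tit_for_tat` always copies the opponent's last play, while `grim` never leaves "D" once it has defected, matching the grim-trigger description below.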
It is possible to draw a table of the combinations of offspring, and also to draw a diagram of the states with actions showing transmissions between them; these two are shown in the two parts of Figure 4. The formal specification of the game is as follows: A simulation of such organisms also needs an initial population of the strategies to be specified. For instance, the species k_1 might be entirely comprised of organisms with the strategy of executing only 'mirror' actions, which would be very analogous to the tit-for-tat strategy. Alternatively, species k_2 might be entirely comprised of organisms with the strategy of executing 'mirror' if they are in state C2 and 'hold' if they are in state D2, which would be very analogous to the unforgiving 'grim-trigger' strategy. It is also possible to run the simulation with combinations of these strategies and/or others across both populations.
It is also worth noting that the two species k_1 and k_2 could be interpreted as playing against each other between adjacent nodes on a very simple graph or grid, and it is thus also possible to extend the game to much larger grids where perhaps the cooperation/defection offspring numbers change.
There is only so much modeling capacity in a binary state, but there is nothing prohibiting an extension to more states for even more complex strategies, such as might be used to model the 'tit-for-two-tats' strategy. Indeed the purpose of this Appendix F is to illustrate some further possibilities.

Figure 2. A diagram of the demographic flow between states of the example Hawk-Dove game; γ being the strategy parameter between G_a and G_p, p being the proportion of adults that are Aggressive, and A, D, C being game parameters.

Figure 3. Dynamics of the Hawk-Dove game across parameter A, with D = 0.75 and C = 0.70. Shown are results for p as well as the proportion of Young (%Y), for the Moran stochastic simulation, the analytical solution, and our software solver.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 18 May 2017 | doi:10.20944/preprints201703.0234.v2

Acknowledgments: A great thankyou to Paul Scott & Sylvie Thiebaux for essential proof-reading, academic advice and support! This study was conducted under grant of the Australian Postgraduate Award (APA) for participation in a PhD program through the Research School of Computer Science (RSCS) at the Australian National University (ANU). No specific funds were received for costs associated with open-access publication.

Figure 4. Where P_i represents the probability of an individual in the ith age bracket successfully living into the (i + 1)th age bracket, and F_i is the average number of offspring for an individual in the ith age bracket within the duration of the age bracket. For a column vector n = [n_0, n_1, n_2, . . ., n_m], with each n_i representing the number of individuals in each age bracket, Mn gives the expected number of individuals in the population after the duration of one age bracket of time, and M²n the expected individuals after two.
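The projection described in this caption can be reproduced with a small sketch of a Leslie matrix (the number of age brackets and all rates are illustrative values of our choosing):

```python
import numpy as np

# A Leslie matrix for three age brackets: the top row holds the
# fecundities F_i, the sub-diagonal holds the survival probabilities P_i.
F = [0.0, 1.5, 1.0]   # average offspring per individual, per bracket
P = [0.8, 0.5]        # probability of surviving into the next bracket
M = np.array([[F[0], F[1], F[2]],
              [P[0], 0.0,  0.0 ],
              [0.0,  P[1], 0.0 ]])

n = np.array([100.0, 50.0, 20.0])   # individuals per age bracket
n1 = M @ n        # expected population after one bracket of time
n2 = M @ M @ n    # expected population after two
```

As in the transmission matrices of our game, repeated application of M drives the age distribution toward the dominant eigenvector with exponential growth at the Perron root.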

Algorithm 1:
procedure SIMULATE(K, S, W, T, P_0, α, t_max)
    P*_{t,k,s,a} ← ∑_{w_k ∈ W_k} P_{t,k,s,w_k} w^k_{a,s}    ▷ calculate reduced population distribution
    for k ∈ K do    ▷ for each species
        for w_k ∈ W_k do    ▷ for each strategy
            . . .
            P_{t+1,k,s,w_k} ← α z_s + (1 − α) P_{t,k,s,w_k}    ▷ incorporate new population by α
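The damped update at the core of the procedure, blending the transmitted population into the old by α, can be sketched for a single strategy as follows (the transmission matrix and initial population are illustrative; the full algorithm's per-species and per-strategy bookkeeping is omitted):

```python
import numpy as np

def damped_step(P_t, m, alpha):
    """One inner update of Algorithm 1 for a single strategy: apply the
    transmission matrix, then blend with the old population by alpha
    (alpha < 1 damps transients between states)."""
    z = m @ P_t                       # transmitted population z_s
    return alpha * z + (1 - alpha) * P_t

# Illustrative 2-state transmission matrix and starting population.
m = np.array([[0.5, 1.0],
              [0.6, 0.4]])
P = np.array([10.0, 10.0])
for _ in range(50):
    P = damped_step(P, m, alpha=0.5)
```

After the transients wash out, the per-step growth factor settles at γ = αλ + 1 − α for λ the Perron root of m, consistent with Appendix A.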