From allostatic agents to counterfactual cognisers: active inference, biological regulation, and the origins of cognition

What is the function of cognition? On one influential account, cognition evolved to co-ordinate behaviour with environmental change or complexity (Godfrey-Smith in Complexity and the function of mind in nature, Cambridge Studies in Philosophy and Biology, Cambridge University Press, Cambridge, 1996). Liberal interpretations of this view ascribe cognition to an extraordinarily broad set of biological systems—even bacteria, which modulate their activity in response to salient external cues, would seem to qualify as cognitive agents. However, equating cognition with adaptive flexibility per se glosses over important distinctions in the way biological organisms deal with environmental complexity. Drawing on contemporary advances in theoretical biology and computational neuroscience, we cash these distinctions out in terms of different kinds of generative models, and the representational and uncertainty-resolving capacities they afford. This analysis leads us to propose a formal criterion for delineating cognition from other, more pervasive forms of adaptive plasticity. On this view, biological cognition is rooted in a particular kind of functional organisation; namely, that which enables the agent to detach from the present and engage in counterfactual (active) inference.


Introduction
What is cognition? What is it for? While the former question is a perennial source of philosophical dispute, the latter seems to attract rather less controversy. Cognition, whatever it consists in and however realised, is ultimately functional to adaptive success. It enables the organism to register information about the state of its environment, and to exploit such information in the service of adaptive behaviour. Cognition, in short, is for action.
As benign as this characterisation might appear at first blush, a host of thornier questions lie in wait: Are all varieties of adaptive behaviour mediated by cognition, or only a select few? If the former, does this notion of behaviour extend to artificial and multi-agent systems, or is it limited to individual organisms? If the latter, what properties distinguish cognitive from non-cognitive modes of behaviour (assuming there is a clear distinction to be made)? And what of those cognitive processes that seem entirely encapsulated from one's present transactions with the world: how do they fit into the picture?
This paper attempts to approach some of these difficult questions indirectly, via an analysis of the principles by which cognition might have evolved. This broadly teleonomic strategy, whereby cognitive processes are understood in terms of their fitness-enhancing properties, draws inspiration from Peter Godfrey-Smith's (1996) environmental complexity thesis. On this view, cognition evolved to coordinate organismic behaviour with certain complex (i.e. heterogeneous or variable) properties of the eco-niche. Thus construed, cognition functions to generate flexible patterns of behaviour in response to fluctuating environmental conditions. We shall not dwell on the details of the environmental complexity thesis here. What interests us, rather, is how the general shape of Godfrey-Smith's explanatory framework, taken in conjunction with more recent advances in theoretical biology, computational neuroscience, and related disciplines, can inform contemporary philosophical debates about the nature of (biological) cognition. Drawing on insights afforded by these fields, we interpret complexity in terms of uncertainty, and suggest that distinctive profiles of adaptive plasticity emerge as the capacity to represent and anticipate various sources of uncertainty becomes increasingly sophisticated. This analysis suggests that behavioural flexibility per se is not sufficient to determine the cognitive status of an adaptive organism. Rather, we propose a narrower conception of cognition as a process rooted in a particular kind of functional organisation; namely, one that affords the capacity to model and interrogate counterfactual possibilities.
This paper is structured as follows: "Homeostasis and the free energy principle" begins by considering the homeostatic challenges posed by uncertain environments. We approach this topic from the perspective of the free energy principle (Friston 2010), a formal account of the autopoietic processes by which biological systems organise and sustain themselves as adaptive agents. "Beyond homeostasis: Allostasis and hierarchical generative models" outlines how the theoretical resources of the free energy principle extend to predictive (i.e. allostatic) forms of biological regulation. We focus on two complementary formulations of allostasis, highlighting how these hierarchical control schemes inform fundamental questions about learning, planning, and adaptive behaviour. "Biological regulation in an uncertain world" examines the relation between environmental and biological complexity via an analysis of generative models. We sketch out three scenarios designed to illustrate how different kinds of model architecture endow distinctive capacities for the representation and resolution of uncertainty. Finally, "Two options for cognition" elaborates some of the key implications of this analysis for the concept of biological cognition. We argue that cognition does not simply coincide with adaptive biological activity (allostatic or otherwise), but inheres rather in the agent's capacity to disengage from the present and entertain counterfactual states of affairs.

Homeostasis and the free energy principle
The free energy principle provides a mathematical framework explaining how adaptive organisms come to exist, persist, and thrive (at least for a while) by resisting what Schrödinger described as "the natural tendency of things to go over into disorder" (1992, p. 68). In this section, we sketch a relatively non-technical overview of this perspective, and show how it relates to familiar notions of homeostasis and adaptive behaviour. 1

Life, formalised: thermodynamics, attracting sets, and (un)certainty
The free energy principle starts with the simple (but fundamental) premise that organisms must maintain the stability of their internal dynamics in order to survive (Bernard 1974; Cannon 1929; Friston 2012a). This is to say that living systems must act to preserve their structural and functional integrity in the face of environmental perturbation (cf. autopoiesis; Maturana and Varela 1980), thereby resisting the tendency to disorder, dispersal, or thermodynamic entropy alluded to by Schrödinger (Friston 2013; Nicolis and Prigogine 1977). 2 Reformulated in the language of statistical mechanics: Living systems live in virtue of their capacity to keep themselves within some (nonequilibrium) thermodynamic steady-state. In other words, they maintain invariant (steady-state) characteristics far from equilibrium, as open systems in exchange with their environment. 3 It follows from this postulate that any entity qua adaptive biological system can be expected to frequent a relatively small number of attracting states; namely, those which compose its attracting set (Friston 2012a, 2013). In dynamical systems theoretic terms, this set of states corresponds to a random dynamical attractor, the invariant set towards which the system inevitably evolves over time (Crauel and Flandoli 1994). The existence of this invariant set means that the probability of finding the system in any given state can be summarised by a distribution (technically, an ergodic density), which can be interpreted in terms of its information-theoretic entropy or uncertainty (Shannon 1948).
The upshot of this picture is that any biotic (random dynamical) system which endures over time must do so in virtue of maintaining a low-entropy distribution over its attracting set (Friston 2012a; Friston and Ao 2012). This is tantamount to saying there is a high degree of certainty concerning the state of the system at any given moment in its lifetime, and that such attracting states will correspond to the conditions of the organism's homeostatic integrity. Conversely, there is a low probability of finding the system occupying a state outside of its attracting set, since such states are incompatible with the system's (long-term) existence. It follows that the repertoire of attracting states in which the system is typically located is constitutive of that agent's phenotype (Friston et al. 2010a), insofar as the phenotype is simply a description of the organism's characteristic (i.e. typically-observed) states.

Surprise and free energy minimisation
According to this framework, then, homeostasis amounts to the task of keeping the organism within the bounds of its attracting set (or, equivalently, of maintaining a low conditional entropy over its internal states). How might biological agents realise this outcome?
To answer this question, we must invoke another information-theoretic term: surprise (Shannon 1948). Surprise (i.e. 'surprisal' or self-information) quantifies the improbability (i.e. negative log-probability) of some outcome. In the present context, the outcome in question refers to some sensory state induced in any part of the system receptive to perturbation. Obvious realisers of sensory states include the sensory epithelia (e.g., retinal photoreceptor cells), but also extend to ion channel receptors in cell membranes, photosensitive receptors in plants, and so on. These receptive surfaces can be construed as states embedded within a (statistical) boundary or interface (technically, a Markov blanket; Pearl 1988) separating (i.e. 'shielding' or 'screening-off') system-internal from system-external conditions (see Friston 2013; Friston and Ao 2012; Hohwy 2017a). 4 Importantly, the quantity of surprise associated with any given sensory state is not absolute, but depends rather on the kind of system the organism embodies (i.e. its phenotype or internal configuration; Friston and Stephan 2007). The fish that finds itself on dry land (i.e. well beyond the bounds of its attracting set) experiences a high degree of surprise, and will perish unless something is done (quickly!) to reinstate its usual milieu. Conversely, this very same state will elicit relatively little surprise in land-dwelling creatures. It turns out that by minimising or suppressing the surprise evoked by sensory states (that is, by avoiding surprising states and favouring unsurprising ones), the agent will tend to keep the (conditional) entropy of its states low, since entropy (almost certainly) converges with the long-term time average of surprise (Birkhoff 1931; Friston and Ao 2012).
In other words, by avoiding surprising interactions with their environment, biological systems keep themselves within the neighbourhood of attracting states that are conducive to their ongoing existence. Indeed, as a random dynamical system that repeatedly revisits its attracting set over time, the agent thereby realises itself as its own random dynamical attractor and, by extension, its own 'existence proof' (Friston 2018; more on which shortly).
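The relation between surprise and entropy described above can be made concrete with a small numerical sketch. The following is an illustrative toy only, not part of the formal treatment: the three-state "fish" density and its probabilities are hypothetical, and independent sampling is assumed as a stand-in for the ergodicity of the system's trajectory.

```python
import math
import random

def surprise(p: float) -> float:
    """Surprisal (self-information) in nats: the negative log-probability."""
    return -math.log(p)

def entropy(density: dict) -> float:
    """Shannon entropy: the expected surprise under the density."""
    return sum(p * surprise(p) for p in density.values() if p > 0)

# Hypothetical density over the states a fish occupies (assumed values).
fish_density = {"deep_water": 0.60, "shallow_water": 0.39, "dry_land": 0.01}

def average_surprise(density: dict, n_samples: int, seed: int = 0) -> float:
    """Time-averaged surprise over a long run of sampled states.

    States are drawn independently here, which is a simplifying assumption
    standing in for an ergodic trajectory through the attracting set.
    """
    rng = random.Random(seed)
    states = list(density)
    weights = [density[s] for s in states]
    visited = rng.choices(states, weights=weights, k=n_samples)
    return sum(surprise(density[s]) for s in visited) / n_samples

if __name__ == "__main__":
    for state, p in fish_density.items():
        print(f"{state}: surprise = {surprise(p):.3f} nats")
    print(f"entropy = {entropy(fish_density):.3f} nats")
    print(f"average surprise = {average_surprise(fish_density, 200_000):.3f} nats")
```

On this toy density, the rarely visited state (dry land) carries far more surprise than the frequently visited ones, and the long-run average of surprise approaches the entropy of the density. A system that concentrates its visits on a few states therefore keeps both quantities low.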
There is, however, an important complication to this story: Surprise is computationally intractable, since its direct evaluation would require the agent to possess exhaustive knowledge of the external dynamics responsible for its sensory experiences (Friston 2009). This is where the concept of free energy minimisation comes in.
Variational free energy is an information-theoretic quantity developed to finesse difficult integration problems in quantum statistical mechanics (Feynman 1972). 5 In the present context, free energy serves as a proxy for the amount of surprise elicited by sensory inputs (Friston 2010, 2011). As free energy is a function of the agent's sensory and internal states (i.e. two sources of information available to the agent), and can be minimised to form a tight (upper) bound on sensory surprise, free energy minimisation enables the agent to indirectly evaluate the surprise associated with its sensory states (Friston and Stephan 2007). Moreover, since the agent is also capable of evaluating how free energy is likely to change in response to state transitions (Friston et al. 2012d), it will appear to select (or 'sample') actions that reduce surprise (Friston et al. 2015b). 6 The free energy principle thus implies that biological systems will tend to avoid (or suppress) surprising observations over the long-run, thereby restricting themselves within the neighbourhood of their invariant (attracting) set.
Naturally, this explanation raises yet further questions: How does the agent minimise free energy to a 'tight bound' on surprise? How can simple organisms 'expect' to occupy certain states, or be said to 'prefer' these states over others? In order to address such questions, we first need to elaborate a notion of the agent as a generative model.
5 Variational inference techniques are also widely used in machine learning to approximate density functions through optimisation (see Blei et al. 2017).
6 Of course, just because a system can be described as behaving in a way that minimises variational free energy (maximises Bayesian model evidence, approximates Bayesian inference, etc.) does not guarantee that it actually implements any such computation. The extent to which the free energy principle should be construed as a useful heuristic for describing and predicting adaptive behaviour (a kind of intentional stance; Dennett 1987), versus a more substantive ontological claim, remains an open question. That said, recent progress has been made towards casting the free energy principle as a process theory of considerable explanatory ambition (Friston et al. 2017a).
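The claim that free energy forms an upper bound on surprise can be checked numerically in a minimal discrete setting. The sketch below is an assumption-laden illustration rather than the formulation used in the cited work: the two hidden states, two outcomes, and all probabilities are hypothetical, and the belief distribution q plays the role that the agent's internal states play in the text.

```python
import math

# Hypothetical generative model: prior over hidden causes and a likelihood
# mapping each cause to sensory outcomes (all values assumed).
prior = {"light": 0.5, "dark": 0.5}
likelihood = {
    "light": {"bright": 0.9, "dim": 0.1},
    "dark": {"bright": 0.2, "dim": 0.8},
}

def evidence(outcome: str) -> float:
    """Model evidence p(o), obtained by summing over hidden causes."""
    return sum(likelihood[s][outcome] * prior[s] for s in prior)

def posterior(outcome: str) -> dict:
    """Exact Bayesian posterior p(s | o)."""
    z = evidence(outcome)
    return {s: likelihood[s][outcome] * prior[s] / z for s in prior}

def free_energy(q: dict, outcome: str) -> float:
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)].

    Equivalently, F = KL[q || p(s | o)] - ln p(o), so F >= -ln p(o).
    """
    return sum(
        q[s] * (math.log(q[s]) - math.log(likelihood[s][outcome] * prior[s]))
        for s in q
        if q[s] > 0
    )

if __name__ == "__main__":
    o = "bright"
    print(f"surprise         = {-math.log(evidence(o)):.4f}")
    print(f"F (prior belief) = {free_energy(prior, o):.4f}")
    print(f"F (posterior)    = {free_energy(posterior(o), o):.4f}")
```

Any belief q yields a free energy at least as large as the surprise of the outcome, and the bound becomes tight when q equals the exact posterior. Note that evaluating F requires only the outcome and the belief, whereas evaluating surprise directly requires marginalising over every hidden cause, which is trivial here but intractable in realistic models.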

Existence implies inference: agents as generative, self-evidencing models
According to the free energy principle, adaptive biological agents embody a probabilistic, generative model of their environment (Calvo and Friston 2017; Friston 2008, 2011, 2012a; Kirchhoff et al. 2018; Ramstead et al. 2018). As we shall see, this is a rather bold claim that moves us far beyond conventional accounts of homeostatic regulation 7 and their reformulation in the language of statistical mechanics and dynamical systems theory. Roughly, the system's form and internal configuration are said to parameterise a probabilistic mapping between the agent's sensory states and the external (hidden) causes of such states. This is to say that organisms interact with their eco-niche in ways that distill and recapitulate its causal structure, meaning that biological agents constitute (embody) a statistical model encoding conditional expectations about environmental dynamics (Friston 2011; Kirchhoff et al. 2018). 8 Indeed, according to the free energy principle, the very existence of the organism over time implies that it must optimise a generative model of the external causes of its sensory flows. This follows from the observation that optimising a model of the hidden dynamics impinging on one's sensory surfaces will give rise to (free-energy minimising) exchanges with the environment, which manifest as adaptive responses to evolving external conditions (Friston et al. 2006; Friston and Stephan 2007).
Under this account, then, even such simple biological agents as unicellular organisms will 'expect' (abstractly and nonconsciously) to find themselves in certain (unsurprising) states, according to the model they embody. Moreover, such agents will strive to sample (i.e. bring about) those attracting, free energy minimising states they expect to occupy, or risk perishing (Friston et al. 2006; Friston and Stephan 2007).
In Bayesian terms, this activity of expectation-fulfilment (or maximisation), where expectations correspond to prior probability distributions parameterised by the agent's internal states, is tantamount to maximising the evidence for the agent's model (and by extension, their own existence; Friston 2010, 2013), a process known as self-evidencing (Hohwy 2016). Hence, under the free energy principle, adaptive biological systems conserve their own integrity through free energy minimising interactions which, over the long-term time average, minimise entropy (i.e. resolve uncertainty) and maximise self-evidence. 9 The process by which they accomplish this feat is active inference.
7 Note that we interpret the notion of regulation rather broadly here. For philosophical arguments distinguishing regulation from related concepts such as feedback control and homeostasis, see Bich et al. (2016). On this view, regulatory control consists in a special kind of functional organisation characterised in terms of second-order control. This formulation seems broadly in line with our understanding of allostasis (see "Beyond homeostasis: Allostasis and hierarchical generative models").
8 Note that the organism's morphology and internal organisation impose constraints on the way it models and represents environmental dynamics (e.g., Parr and Friston 2018a), a point we shall elaborate in "Biological regulation in an uncertain world".
9 See Parr and Friston (2018b) for a mathematical explanation of the (bound) relationship between variational free energy and model evidence.

Active inference: closing the perception-action loop
The scheme outlined above implies that biological agents conserve their morphology and internal dynamics (and in turn, the generative model these characteristics embody) by acting to offset the dispersive effects of random environmental fluctuations. But why should the agent sustain its model through such adaptive exchanges, rather than allowing its model to change in line with evolving environmental dynamics? As it turns out, the free energy principle supports both of these possibilities: agent and environment are locked in a perpetual cycle of reciprocal influence. This dialectical interplay, which emphasises the inherent circular causality at the heart of adaptive behaviour, is formalised under the active inference process theory (Friston et al. 2017a).
Active inference comprises two basic processes that play out at the agent-environment interface: perception and action. 10 Here, perception is construed as the process of changing (i.e. 'updating') one's internal states in response to external perturbations, and over longer timescales corresponds to learning (i.e. Bayesian updating of time-invariant model parameters; Fitzgerald et al. 2015; Friston et al. 2016, 2017a). 11 In other words, perceptual (state) inference describes how the agent updates its representation of environmental dynamics to resolve uncertainty about the hidden causes of its sensory fluctuations. A prevalent neurocomputational implementation of this scheme is predictive coding (Elias 1955; Lee and Mumford 2003; Rao and Ballard 1999; Srinivasan et al. 1982; Huang and Rao 2011; Spratling 2017; for some variational free energy treatments, see Barrett and Simmons 2015; Bastos et al. 2012; Pezzulo 2014; Seth et al. 2012; Shipp et al. 2013; Shipp 2016).
Action, on the other hand, involves the activation of effector mechanisms (e.g., motor reflexes, cell migration; Friston et al. 2015a) in order to bring about new sensory states (Friston et al. 2010a). Different states can be sampled through actions that either directly intervene on the environment (e.g., turning off a bright light), or alter the relationship between the agent's sensory surfaces and external states (e.g., turning away from a bright light). In either case, free energy is affected by the sensory consequences of the agent's actions, where expectations about the modifiability of sensory flows are conditioned on a model of hidden states and their time-evolving trajectories (Friston and Ao 2012). 12 Active inference thus recalls the cybernetic adage that organisms "control what they sense, not… what they do" (Powers 1973, p. 355, emphasis in original).
10 While active inference is sometimes narrowly construed as the active or behavioural component of the perception-action loop, the term was originally introduced to characterise the reciprocal interplay between perception and action (e.g., Friston et al. 2009, p. 4). This broader interpretation emphasises the deep continuity of the (Bayesian inferential) processes underwriting perception, learning, planning, and action under the free energy principle (Friston et al. 2017a).
Although we shall have more to say about the role of action under active inference in later sections, these cursory remarks are sufficient to motivate the basic claim that adaptive agents recruit effector systems in order to propel themselves towards the sensory states they expect to inhabit.
Superficially at least, the inferential dynamics underwriting perception and action seem to pull in opposing directions (i.e. change the model to reflect the world vs. change the world to reflect the model). Under the active inference scheme, however, these two processes are complementary and deeply interwoven. This is because perception can only minimise free energy (or, under certain simplifying assumptions, prediction error; Friston 2009) to a tight (upper) bound on surprise, whereas action suppresses surprise by invoking new sensory states that conform to (expectations prescribed by) the agent's phenotype. Consequently, perception serves to optimise the agent's model of environmental conditions, such that the agent has adequate information to choose actions that engender low sensory entropy (Friston et al. 2010a). 13 Although perceptual inference might seem to imply that agents ought to adapt their internal organisation to reflect environmental fluctuations as accurately as possible, unrestricted acquiescence to such dynamics would result in a precarious (and in many cases, rather brief) existence. Rather, the exigencies of homeostatic control dictate that biological systems preserve the conditional independence of their internal and external states (Ramstead et al. 2018). This is to say that the biological agent must maintain a boundary (i.e. Markov blanket) that separates (and insulates) its internal dynamics from external conditions. 14 Consequently, the free energy minimising agent must exploit inferences about the state of the world beyond its Markov blanket in order to act in ways that keep it within the neighbourhood of its attracting states (Friston 2013).
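The conditional independence that a Markov blanket secures can be illustrated with a toy joint distribution. This is a minimal sketch under loudly stated assumptions: each of the external, blanket, and internal states is collapsed to a single binary variable, the blanket is not split into sensory and active states, and every probability is hypothetical.

```python
import itertools
import math

# Assumed factorisation: p(e, b, i) = p(b) * p(e | b) * p(i | b),
# with e = external, b = blanket, i = internal (all binary, values assumed).
p_blanket = {0: 0.5, 1: 0.5}
p_external_given_blanket = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_internal_given_blanket = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}

joint = {
    (e, b, i): p_blanket[b]
    * p_external_given_blanket[b][e]
    * p_internal_given_blanket[b][i]
    for e, b, i in itertools.product((0, 1), repeat=3)
}

def marginal(dist: dict, keep: tuple) -> dict:
    """Sum the joint over every variable whose index is not in `keep`."""
    out = {}
    for key, p in dist.items():
        sub = tuple(key[j] for j in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def mutual_information(dist: dict) -> float:
    """I(external; internal): dependence when the blanket is ignored."""
    p_ei = marginal(dist, (0, 2))
    p_e = marginal(dist, (0,))
    p_i = marginal(dist, (2,))
    return sum(
        p * math.log(p / (p_e[(e,)] * p_i[(i,)]))
        for (e, i), p in p_ei.items()
        if p > 0
    )

def conditional_mutual_information(dist: dict) -> float:
    """I(external; internal | blanket): dependence once the blanket is known."""
    p_b = marginal(dist, (1,))
    p_eb = marginal(dist, (0, 1))
    p_bi = marginal(dist, (1, 2))
    return sum(
        p * math.log(p * p_b[(b,)] / (p_eb[(e, b)] * p_bi[(b, i)]))
        for (e, b, i), p in dist.items()
        if p > 0
    )

if __name__ == "__main__":
    print(f"I(external; internal)           = {mutual_information(joint):.4f}")
    print(f"I(external; internal | blanket) = {conditional_mutual_information(joint):.2e}")
```

Internal and external states are statistically coupled when the blanket is marginalised out, which is why internal states can carry information about hidden causes at all. Once the blanket state is known, however, the residual dependence vanishes.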
The agent's capacity to maintain the integrity of its Markov blanket is aided by prior beliefs about the sorts of conditions it expects to encounter. Many such expectations are directly functional to homeostasis (Pezzulo et al. 2015), having been shaped and refined through generations of natural selection (de Vries and Friston 2017; Friston 2010). Pushing this logic one step further, we can say that the agent embodies a deeply-engrained expectation to survive (i.e. to remain within the confines of its attracting set, and thus to maintain its homeostatic integrity over time); this is simply the expectation to minimise average surprise over the long-run (Allen and Tsakiris 2018; Seth 2015). This remark highlights the point that not all beliefs are equally amenable to model updating. Rather, certain strongly-held or high-precision beliefs (e.g., those pertaining to homeostatic stability) will be stubbornly defended through actions that seek to substitute conflicting sensory evidence with input that conforms more closely to prior expectations (Yon et al. 2019).
12 Technically, actions are physical, real-world states that are not represented within the agent's generative model (Attias 2003). Rather, the agent infers (fictive) 'control' states that explain the (sensory) consequences of its actions (Friston et al. 2012a, d). Action selection (or decision-making) thus amounts to the optimisation of posterior beliefs about the control states that determine hidden state transitions (Friston et al. 2015b).
13 Although one might be tempted to subordinate perceptual inference to free energy minimising action, we interpret perception and action as mutually dependent moments within a unified dynamical loop (cf. the perception-action cycle; Fuster 2001, 2004). Ultimately, both modes of active inference are in the service of uncertainty reduction: Percepts without actions are idle; actions without percepts are blind.
14 Formally speaking, the sensory and active states that compose the Markov blanket render the probability distributions over internal and external states statistically independent of one another (see Pearl 1988). In other words, internal and external states provide no additional information about one another once the Markov blanket's active and sensory states are known.
In sum, perception and action work in concert to achieve free energy minimisation, ensuring that the biological system maintains itself in an invariant relationship with its environment over time. Critically, this formulation explains how apparently teleological or purposive behaviours emerge as a consequence of free energy minimising sensory sampling, without resorting to additional concepts such as 'value' or 'reward' (Friston et al. 2010a). Rather, value and reward simply fall out of the active inference process, as what is inherently valuable or rewarding for any particular organism is prescribed by the attracting states that compose its phenotype (i.e. those states the agent expects itself to occupy; Friston and Ao 2012). Simply put, unsurprising (i.e. expected) states are valuable; hence, minimising free energy corresponds to maximising value. 15
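The complementary roles of perception and action summarised above can be sketched with a one-dimensional Gaussian toy. This is a hedged illustration under strong simplifying assumptions, not the scheme of the cited papers: the belief is a point estimate `mu`, free energy reduces to a sum of precision-weighted squared prediction errors, the observation `o` is noise-free, and action is assumed to change the sensed quantity directly. All names and numerical values are hypothetical.

```python
def free_energy(mu, o, prior_mean, pi_o=1.0, pi_p=4.0):
    """Gaussian free energy (up to constants) for a point-estimate belief mu.

    pi_o weights the sensory prediction error and pi_p the deviation of the
    belief from the prior expectation (both are precisions, assumed values).
    """
    return 0.5 * pi_o * (o - mu) ** 2 + 0.5 * pi_p * (mu - prior_mean) ** 2

def simulate(act, steps=500, o=30.0, prior_mean=37.0, pi_o=1.0, pi_p=4.0, rate=0.1):
    """Gradient descent on free energy with or without action.

    Perception updates the belief mu; action (if enabled) changes the sensed
    state o itself. Returns the final belief, observation, and free energy.
    """
    mu = prior_mean
    for _ in range(steps):
        sensory_error = o - mu
        prior_error = mu - prior_mean
        d_mu = pi_o * sensory_error - pi_p * prior_error  # -dF/dmu (perception)
        d_o = -pi_o * sensory_error                       # -dF/do  (action)
        mu += rate * d_mu
        if act:
            o += rate * d_o
    return mu, o, free_energy(mu, o, prior_mean, pi_o, pi_p)

if __name__ == "__main__":
    for act in (False, True):
        mu, o, f = simulate(act)
        label = "perception + action" if act else "perception only   "
        print(f"{label}: belief = {mu:.2f}, sensed = {o:.2f}, F = {f:.4f}")
```

With perception alone, the belief settles on a precision-weighted compromise between the prior expectation and the sensed value, and a residual free energy remains. When action is enabled, the sensed state is drawn towards the expected one, and free energy falls to approximately zero. The high prior precision chosen here also illustrates why strongly-held expectations are defended through action rather than revised.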

Beyond homeostasis: allostasis and hierarchical generative models
The free energy principle is founded on the premise that biological systems act to maintain their homeostatic equilibrium in the face of random environmental perturbations. Until recently, however, the question of how adaptive organisms secure their homeostatic integrity had attracted relatively little theoretical attention from within this perspective. A growing number of researchers are now leveraging predictive coding and active inference to explain how complex nervous systems monitor internal bodily states (i.e. perceptual inference in the interoceptive domain) and regulate physiological conditions (Allen et al. 2019; Barrett and Simmons 2015; Iodice et al. 2019; Seth 2013; Pezzulo 2014; for recent reviews, see Khalsa et al. 2018; Owens et al. 2018; Quadt et al. 2018).
An important conceptual development within this line of work was the move beyond traditional notions of homeostatic stability to more modern accounts of allostatic variability. The concept of allostasis ("stability through change") was first introduced by Sterling and Eyer (1988), who criticised conventional homeostatic control theory as overly restrictive and reactive in character. 16 By contrast, allostasis was intended to replace setpoint defence with a more flexible scheme of parameter variation, and to supersede local feedback loops with centrally co-ordinated feedforward mechanisms (e.g., central command; Dampney 2016; Goodwin et al. 1972; Krogh and Lindhard 1913). Allostasis was thus posited to account for a wide variety of anticipatory physiological activity that resisted explanation in terms of closed-loop control.
15 Note that value here is not equivalent to expected utility, but rather a composite of utility (extrinsic value) and information gain (epistemic value; see Friston et al. 2015b; Schwartenbeck et al. 2015).
16 Although we focus here on allostasis, numerous other concepts emphasising the dynamic nature of biological regulation have been proposed in an effort to extend (or transcend) classical notions of homeostatic setpoint control (see e.g., Bauman 2000; Berntson and Cacioppo 2000, and references therein).
Despite controversy over the theoretical merits and conceptual scope of allostasis (see , for a recent overview), there is ample evidence that biological regulation consists in both anticipatory and reactive modes of compensation (see e.g., Burdakov 2019; Ramsay and Woods 2016; Schulkin and Sterling 2019). 17 These complementary mechanisms are easily accommodated within the active inference framework, mapping neatly onto the hierarchically-stratified models posited under the free energy principle (Friston 2008). Moreover, we believe that mature versions of allostatic theory are enriched and invigorated by active inference, insofar as the latter furnishes precisely the kind of inferential machinery required to underwrite effective forms of prospective control across various timescales (Corcoran and Hohwy 2018; Kiebel et al. 2008; Friston et al. 2017d; Pezzulo et al. 2018).
The remainder of this section briefly outlines two recent attempts to integrate homeostatic and allostatic mechanisms within the broader scheme of active inference. Although these perspectives assume a rather complex, neurally-implemented control architecture, we shall argue in "Biological regulation in an uncertain world" that the basic principles underwriting such schemes can be generalised to much simpler biological systems with relative ease.

Allostasis under active inference
Stephan and colleagues (Stephan et al. 2016; see also Petzschner et al. 2017) developed an active inference-based account of allostasis that maps interoception and physiological regulation onto a three-layer neural hierarchy. At the lowest level of this hierarchy are homeostatic reflex arcs, which operate much like classical feedback loops (i.e. deviation of an essential variable beyond certain limits elicits an error signal, which in turn triggers a countervailing effector response; see Ashby 1956, Ch. 12; Wiener 1961, Ch. 4). Critically, however, the range of states an essential variable may occupy is prescribed by intermediate-level allostatic circuits. This formulation thus recasts essential variable setpoints as (probabilistic) prior expectations (or equivalently, top-down model-based predictions) about the likely states of interoceptors (cf. Penny and Stephan 2014), with deviations from expected states provoking interoceptive prediction error. 18 Two important features of this account are that (1) prior expectations about essential variables encode a distribution over states (rather than a singular ideal reference value), and that (2) the sufficient statistics which specify this distribution, namely its mean and precision (inverse variance), are free to vary (cf. Ainley et al. 2016). On this view, such classic allostatic phenomena as diurnal patterns of body temperature (Kräuchi and Wirz-Justice 1994) and blood pressure variation (Degaute et al. 1991) emerge as a consequence of the cyclical modulation of the priors over these physiological states (cf. Sterling 2004, 2012). Likewise, phasic increases or decreases in the stability of such variables correspond to periodic shifts between more- or less-precise distributions, respectively. 19
Subordinating homeostatic reflex arcs to allostatic circuits transforms the traditional conception of physiological control as setpoint defence into a far more dynamic and context-sensitive process. Access to perceptual and cognitive representations (e.g., via the anterior insular and cingulate cortices; Barrett and Simmons 2015; Craig 2009; Gu et al. 2013; Menon and Uddin 2010; Paulus and Stein 2006) enables allostatic circuitry to harness multiple streams of information such that homeostatic parameters may be deftly altered in preparation for expected environmental changes (Ginty et al. 2017; Peters et al. 2017). Not only does this arrangement enable the system to anticipate periodic nonstationarities in essential variable dynamics (such as the circadian oscillations in body temperature and blood pressure mentioned above), it also confers potentially vital adaptive advantages under unexpected and uncertain conditions.
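The two-level arrangement described above, in which an allostatic circuit sets the prior that a homeostatic reflex arc then defends, can be caricatured in a few lines. The sketch below is a hypothetical toy rather than Stephan and colleagues' model: the sinusoidal 24-hour modulation, the gain, and all parameter values are assumptions chosen purely for illustration.

```python
import math

def allostatic_prior(hour, base_mean=37.0, amplitude=0.5, precision=2.0):
    """Allostatic layer: returns the (mean, precision) of the prior over an
    essential variable, with the mean modulated on an assumed 24-hour cycle."""
    mean = base_mean + amplitude * math.sin(2 * math.pi * hour / 24.0)
    return mean, precision

def reflex_step(x, prior_mean, precision, gain=0.2):
    """Homeostatic reflex arc: the effector response is proportional to the
    precision-weighted interoceptive prediction error."""
    error = prior_mean - x
    return x + gain * precision * error

def simulate(hours, prior_fn, x0=37.0, steps_per_hour=10):
    """Run the reflex arc under a given prior-setting function."""
    x = x0
    trace = []
    for k in range(hours * steps_per_hour):
        mean, precision = prior_fn(k / steps_per_hour)
        x = reflex_step(x, mean, precision)
        trace.append(x)
    return trace

def recovery_error(precision, steps=5, perturbation=1.0, setpoint=37.0):
    """Residual deviation a few steps after a perturbation, for a fixed prior."""
    x = setpoint + perturbation
    for _ in range(steps):
        x = reflex_step(x, setpoint, precision)
    return abs(x - setpoint)

if __name__ == "__main__":
    fixed = simulate(48, lambda hour: (37.0, 2.0))
    allo = simulate(48, allostatic_prior)
    print(f"fixed prior range:      {max(fixed) - min(fixed):.3f}")
    print(f"allostatic prior range: {max(allo) - min(allo):.3f}")
    print(f"residual error, precise prior:   {recovery_error(2.0):.3f}")
    print(f"residual error, imprecise prior: {recovery_error(0.5):.3f}")
```

Under a fixed prior, the regulated variable stays put, as in classical setpoint defence. When the prior mean is cyclically modulated, the same reflex arc produces a diurnal rhythm without any external disturbance. Lowering the precision of the prior weakens the corrective response, so the variable recovers from perturbation more slowly and is correspondingly less stable.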
As a brief illustration, consider the case of an animal that detects the presence of a nearby predator. Registering its perilous situation, the animal's brain triggers a cascade of autonomic activity: the 'fight-or-flight' response famously characterised by Cannon (1914, 1915). On Stephan and colleagues' (2016) account, these rapid physiological alterations are mediated via the allostatic enslavement of homeostatic reflex loops. This generative model-based scheme explains why physiological parameters should change so dramatically in the absence of any immediate homeostatic disturbance: Predictions (or 'forecasts'; Petzschner et al. 2017) about the likely evolution of external conditions mandate the adoption of atypical, metabolically expensive states in preparation for evasive action (cf. Requin et al. 1991).
Notice that the physiological states realised via allostatic modulation of homeostatic loops might themselves constitute surprising departures from the organism's typically-expected states. Since these deviations cannot be resolved locally on account of the higher-order imperative to mobilise metabolic resources for impending action, interoceptive prediction error propagates up the neural hierarchy, possibly manifesting as the suite of sensations associated with acute stress (Peters et al. 2017). Such prediction error is tolerated to the extent that these emergency measures are expected to expedite a more hospitable environment (namely, one in which there is no immediate threat of predation). In other words, allostatic regimes of interoceptive active inference are functional to the agent's deeply-held expectation to survive, insofar as they serve to minimise uncertainty and maximise self-evidence over the long-run. 20
Stephan and colleagues (2016) crown their hierarchical framework with a metacognitive layer that monitors the efficacy of one's control systems. This processing level is posited to explain the emergence of higher-order beliefs about one's ability to adaptively respond to homeostatic perturbation. Persistent failure to suppress interoceptive surprise-either as a consequence of harbouring inaccurate allostatic expectations, or one's inability to realise free energy minimising actions-results in a state of dyshomeostasis (cf. allostatic load; McEwen and Stellar 1993; Peters et al. 2017), the experience of which may erode confidence in one's capacity for self-regulation. Stephan and colleagues (2016) speculate that the affective and intentional states engendered by chronic dyshomeostasis contribute to the development of major depressive disorder (cf. Badcock et al. 2017; Barrett et al. 2016; Seth and Friston 2016). 
Although such psychopathological implications are beyond the scope of this paper, the basic idea that the brain's homeostatic/allostatic architecture is reciprocally coupled with higher-order inferential processing will be explored further in "Biological regulation in an uncertain world".
In sum, the hierarchical regulatory scheme proposed by Stephan and colleagues (2016) provides a promising formal description of the inferential loops underwriting both reactive (homeostatic) and prospective (allostatic) modes of biological regulation, and their interaction with higher-order beliefs. This framework accommodates a rich variety of allostatic phenomena spanning multiple timescales, ranging from deeply-entrenched, slowly-unfolding regularities (e.g., circadian and circannual rhythms) to highly unpredictable, transient events (e.g., predator-prey encounters), and everything in between (e.g., meal consumption; Morville et al. 2018; Teff 2011).

Broadening the inferential horizon: preferences, policies, and plans
A second, complementary perspective focuses on the ways organisms can develop complex behavioural repertoires that optimise physiological regulation in an anticipatory manner (e.g., buying food and preparing a meal before one is hungry).
Active inference agents can acquire such skills by leveraging information about evolving state transitions, or policies. Policies are (beliefs about) sequences of actions (or more precisely, control states; see Footnote 12) required to minimise free energy in the future, thereby realising some preferred (i.e. expected, self-evidencing, and thus valuable) outcome (Attias 2003; Friston et al. 2012a, 2013; Pezzulo et al. 2018). In active inference, policies are explicitly evaluated (and therefore selected) depending on their expected free energy, i.e. the amount of free energy they are expected to minimise in the future. It is important not to conflate this notion of expected free energy with that of variational free energy (as introduced in "Surprise and free energy minimisation"). The former only arises during policy evaluation and uses expectations about future states of affairs that may arise from selecting a particular policy; whereas the latter uses (available) information about past and present states of affairs.
20 One might protest that all we have done here is pivot from one sort of reactive homeostatic mechanism to another; albeit, one involving responses to an external (rather than internal) threat. Nevertheless, we consider this simple scenario as exemplary of the fundamental principle of allostatic regulation; namely, the modulation of physiological states in anticipation of future conditions, and in the absence of any immediate homeostatic perturbation. This example can easily be extended to capture a rich assortment of allostatic dynamics that play out across increasing levels of abstraction and spatiotemporal scale.
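The contrast between variational and expected free energy can be written compactly. The expressions below follow one common discrete-state formulation from the active inference literature and are offered as a sketch. The symbols are assumptions of this illustration: q denotes approximate posterior beliefs, p the generative model, s hidden states, o outcomes, \pi a policy, t the present, and \tau future time points.

```latex
% Variational free energy: scores beliefs against outcomes already observed.
\[
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
\]

% Expected free energy of a policy: the expectation also runs over
% outcomes that have not yet been observed.
\[
G(\pi) = \sum_{\tau > t} \mathbb{E}_{q(o_\tau, s_\tau \mid \pi)}
         \big[\ln q(s_\tau \mid \pi) - \ln p(o_\tau, s_\tau \mid \pi)\big]
\]
```

The key difference is where the outcomes sit: in F they are fixed data, whereas in G they are random variables predicted under the policy being evaluated.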
Policy selection is important for allostatic control, because by explicitly considering future states of affairs in addition to one's immediate needs, agents can (learn how to) engage in relatively complex courses of action that minimise more free energy over the long-run. Consider for instance the decision to purchase ingredients from a local supermarket and return home to cook a meal, versus ordering a meal from a neighbouring fast food restaurant. In both cases, the underlying homeostatic motivation driving behaviour (i.e. increasing prediction error manifesting as intensifying hunger) is identical; the interesting question is why one does not always opt for the policy that is most likely to resolve prediction error (hunger) most rapidly. Selecting the Cook policy, which postpones the resolution of interoceptive prediction errors (and thus engenders greater free energy in the short-term), might appear on first blush to contradict the free energy principle. Such choices can however be explained by recourse to the agent's superordinate expectation to minimise expected free energy over longer timescales (e.g., prior beliefs about the health, financial, and/or social benefits associated with domestic meal preparation; cf. Friston et al. 2015b; Pezzulo 2017; Pezzulo et al. 2018). 21
Pezzulo and colleagues (2015) offer an account of allostasis that seeks to explain the gamut of behavioural control schemes acquired via associative learning from a unified active inference perspective. 22 Specifically, this account grounds the emergence of progressively more flexible and sophisticated patterns of adaptive behaviour on evolutionarily primitive control architectures (e.g., low-level circuitry akin to Stephan and colleagues' (2016) homeostatic reflex arc). 
From a broader ethological perspective, this scheme implies a deep continuity between the homeostatic loops underpinning simple, stereotypical response behaviour on the one hand, and the complex processes supporting goal-directed decision-making and planning on the other.
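To make the earlier contrast between cooking and ordering fast food concrete, the following toy sketch scores each policy by a simplified expected free energy. It is a hedged illustration rather than a published model: only the 'risk' term (divergence between predicted and preferred outcomes) is computed, the ambiguity term is omitted, selection is a plain argmin rather than a softmax, and the outcome categories, the policy name FastFood, and all probabilities are hypothetical.

```python
import math

# Hypothetical outcome space: 0 = hungry, 1 = fed at a cost, 2 = fed well.
# Prior preferences encode what the agent expects (and so 'wants') to observe.
PREFERENCES = [0.05, 0.25, 0.70]

def kl(q, p):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def predicted_outcomes(policy, step):
    """Assumed outcome distribution at a future time step under each policy."""
    if policy == "FastFood":
        # Hunger is resolved at once, but mostly via the costly outcome.
        return [0.10, 0.80, 0.10]
    if policy == "Cook":
        # Hunger persists while shopping and cooking (first two steps).
        return [0.90, 0.05, 0.05] if step < 2 else [0.10, 0.10, 0.80]
    raise ValueError(f"unknown policy: {policy}")

def expected_free_energy(policy, horizon):
    """Risk-only expected free energy, summed over the planning horizon."""
    return sum(kl(predicted_outcomes(policy, t), PREFERENCES)
               for t in range(horizon))

def select_policy(horizon, policies=("FastFood", "Cook")):
    """Pick the policy with the lowest expected free energy."""
    return min(policies, key=lambda pol: expected_free_energy(pol, horizon))

for horizon in (2, 8):
    print(horizon, select_policy(horizon))
```

With these made-up numbers, a short planning horizon favours the quick fix, whereas a longer horizon favours Cook despite its early prediction error. This mirrors the point that deferring the resolution of hunger need not contradict free energy minimisation.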
According to this view, all associative learning-based control schemes fall out of the same uncertainty-reducing dynamics prescribed by the free energy principle. What distinguishes these schemes under the active inference framework is their place in the model hierarchy: While rudimentary adaptive behaviours (e.g., approach/avoidance reflexes) are availed by 'shallow' architectures, more sophisticated modes of control require greater degrees of hierarchical depth. Goal-directed actions require generative models that are capable of representing the prospective evolution of hidden states over sufficiently long intervals (cf. Botvinick and Toussaint 2012; Penny et al. 2013; Solway and Botvinick 2012), while simultaneously predicting how these projected trajectories are likely to impact upon the internal states of the organism (cf. Keramati and Gutkin 2014). On this account, activity at higher (or deeper) hierarchical layers (e.g., prefrontal cortical networks) contextualises that of more primitive control schemes operating at lower levels of the hierarchy (see also Pezzulo and Cisek 2016; Pezzulo et al. 2018). This means that higher-level inferences about distal or remote states (and the policies most likely to realise them) inform lower-level mechanisms governing action over shorter timescales (see also Attias 2003; Badre 2008; Friston et al. 2016; Kaplan and Friston 2018; Pezzulo et al. 2018).
A distinctive feature of Pezzulo and colleagues' (2015) scheme is the crucial role played by the (cross- or multimodal) integration of interoceptive, proprioceptive, and exteroceptive information over time. This is required if one wants to translate inferences on time-varying internal states (e.g., declining blood glucose concentration) into complex behavioural strategies (e.g., preparing a meal) that anticipate or prevent homeostatic disturbance. This is to say that the emergence of nervous systems which enable their owners to envisage and pursue certain future states at the expense of others depends upon the (allostatic) capacity to track and anticipate coevolving internal/sensory and external/active state trajectories. 23 In short, Pezzulo and colleagues (2015) posit that hierarchical generative models harness prior experience to map sensorimotor events to interoceptive fluctuations. This mapping enables the agent to learn how their interoceptive/affective states are likely to change both endogenously (e.g., I am likely to become irritable if I forgo my morning coffee), and in the context of external conditions (e.g., I am likely to dehydrate if I exercise in this heat without consuming fluids). 24 With this (hierarchical) inferential architecture in place, it is relatively easy to see how allostatic policies may take root. As alluded to above, interoceptive/homeostatic dynamics often exhibit (quasi)periodic cycles, thus facilitating the modelling and prediction of time-evolving changes in internal sensory states. Given a model of how interoceptive states typically oscillate, the agent learns how particular external perturbations (including those caused by its own actions) modulate this trajectory (cf. Allen and Tsakiris 2018). 
As the agent accrues experience, it progressively refines its model of the contingent relations that obtain between sensorimotor occurrences and physiological fluctuations, engendering the ability to extrapolate from sensations experienced in the past and present to those expected in the future (Friston et al. 2017a). This capacity is not only crucial for finessing the fundamental control problems posed by homeostasis (i.e. inferring the optimal policy for securing future survival and reproductive success), but also for its vital contribution in establishing the agent's understanding of itself qua autonomous agent (cf. Fotopoulou and Tsakiris 2017; Friston 2017). It is a relatively small step from here to the emergence of goal-directed behaviours that are ostensibly independent of (i.e. detached or decoupled from) current stimuli, hence permitting anticipatory forms of biological regulation (e.g., purchasing food when one is not hungry; see Pezzulo and Castelfranchi 2009; Pezzulo 2017).

Interim summary
In this section, we have presented two closely-related computational perspectives on biological regulation that cast homeostasis and allostasis within the broader scheme of active inference. We believe these accounts can be productively synthesised into a comprehensive framework that explains the emergence of increasingly versatile, context-sensitive, and temporally-extended forms of allostatic regulation. This framework provides a formal account of biological regulation that eschews the conceptual limitations of setpoint invariance (see Cabanac 2006; Ramsay and Woods 2014), unifies habitual ('model-free') and goal-directed ('model-based') behaviour (Dolan and Dayan 2013) under a single hierarchical architecture (see Fitzgerald et al. 2014), and converges with neurophysiologically-informed perspectives on mind-body integration (e.g., Critchley and Harrison 2013; Smith et al. 2017). We have also introduced the important notion of policy selection, which explains how adaptive behaviour emerges through (active) inference of beliefs about the future (cf. 'planning as inference'; Attias 2003; Botvinick and Toussaint 2012; Solway and Botvinick 2012).
From a broader perspective, the capacity of higher model levels to track the evolution of increasingly distal, temporally-extended, and abstract hidden dynamics, and to infer the likely consequences of such dynamics for the agent's own integrity and wellbeing, provides a compelling explanation of how allostatic control schemes could have established themselves over ontogenetic and phylogenetic timescales. Not only does this perspective provide a principled account of how allostatic mechanisms should 'know' when to initiate adaptive compensations in the absence of physiological disturbance (i.e. how the body 'acquires its wisdom'; Dworkin 1993), the embedding of such processes within an overarching hierarchical model also explains how agents are able to effectively arbitrate and trade-off multiple competing demands (a core feature of many allostatic frameworks; e.g., Sanchez-Fibla et al. 2010; Sterling 2012; Schulkin and Sterling 2019; Verschure et al. 2014). 25 In the next section of this paper, we consider why such allostatic regimes should have evolved.

Biological regulation in an uncertain world
We have argued that adaptive biological activity is underwritten by active inference, where more sophisticated (predictive or prospective) forms of biological regulation are supported by increasingly more sophisticated generative models that extract and exploit long-term, patterned regularities in internal and external conditions. In this section, we take a closer look at how the functional organisation of the inferential architecture constrains the organism's capacity to represent time-evolving state trajectories, and the impact this has upon its ability to deal with uncertainty.
Our analysis draws inspiration from Peter Godfrey-Smith's influential environmental complexity thesis (1996), which casts cognition as an adaptation to certain complex (i.e. heterogeneous or variable) properties of the organism's eco-niche. On this view, cognition evolved to mitigate or 'neutralise' environmental complexity by means of behavioural complexity-"the ability to do a lot of different things, in different conditions" (Godfrey-Smith 1996, p. 26). 26 The concept of complexity at the core of Godfrey-Smith's analysis is deliberately broad and abstract. Environments may comprise manifold dimensions of complexity, many of which may be of no ecological relevance to their inhabitants. Patterns of variation only become biologically salient once the capacity to track and co-ordinate with them confers a selective advantage (i.e. when sensitivity to environmental variation helps the organism to solve problems-or exploit opportunities-that bear on its fitness; Godfrey-Smith 2002). Much like the notion of surprise (conditional entropy) introduced in "Surprise and free energy minimisation", then, the implications of environmental complexity for any given organism are determined by the latter's constitution and relation to its niche.
In what follows, we analyse the connection between environmental and behavioural complexity as mediated by increasingly elaborate schemes of active inference. Following Godfrey-Smith's observation that complexity can be cast as "disorder, in the sense of uncertainty" (1996, p. 24; see also pp. 153-154), we consider how the exigencies of biological regulation under conditions of uncertainty may have promoted the evolution of increasingly more complex inferential architectures, and how such architectures enable organisms to navigate complex environments with increasing adroitness.
To this end, we will consider three successive forms of generative model that may underwrite different sorts of creatures. First, we take a simple generative model-and implicit architecture for active inference-that may be suitable for explaining single-celled organisms that show elemental homeostasis and reflexive behaviour. We then consider hierarchical generative models that have parametric depth, in the sense that they afford inference at multiple timescales (where faster dynamics at lower levels are contextualised by slower dynamics at higher levels). This produces adaptive systems that evince a deep temporal structure in their exchange with the environment by simply minimising free energy. An illustrative example of this in the active inference literature is birdsong; namely, the generation and recognition of songs that have an elemental narrative with separation of temporal scales (Kiebel et al. 2008). We will use this hierarchical scheme to explain certain aspects of allostasis such as circadian regulation, which permits the agent to implicitly track and adapt metabolic operations to slow temporal dynamics (i.e. cycles of night and day).
The third kind of generative model supplements parametric depth with temporal depth, or the ability to engage in counterfactual active inference. It is important to note that agents that are endowed with parametrically (but not temporally) deep models are quite limited; they can infer and adapt to future circumstances, but cannot actively select which one to attend to. For example, although birds can recognise particular songs of conspecifics, this form of perceptual inference does not entail actively attending to one bird or another. In other words, it does not entail a selection among ways in which to engage with the sensorium. To bring this kind of selection into the picture, one needs to evaluate the expected free energy following one or another action (e.g., attending to one bird or another). However, in order to evaluate expected free energy, one has to have a generative model of the future-that is, the consequences of action. This in turn calls for generative models with temporal or counterfactual depth that are necessary to evaluate the expected free energy of a given policy. It is this minimisation of expected free energy-that converts sentient systems into agents that reflect and plan, in the sense of entertaining the counterfactual outcomes of their actions-that we associate with cognition.

Model 1: Minimal active inference
First, let us consider a simple example of homeostatic conservation through a 'minimal' active inference architecture. 27 We model this 'creature' on simplified aspects of Escherichia coli (E. coli) bacteria to emphasise the generality of such schemes beyond neurally-implemented control systems.
Our E. coli-like creature is a unicellular organism equipped with a cell membrane (i.e. a Markov blanket separating internal from external states), a metabolic pathway (i.e. an autopoietic network that harnesses thermodynamic flows to realise and replenish the organism's constitutive components), and a sensorimotor pathway; but at the outset nothing approximating a nervous system (actual E. coli is of course much more complicated than this). Cellular metabolism depends on the agent's ability to absorb sufficient amounts of nutrient (e.g., glucose) from its immediate environment. However, the distribution of nutrient varies across the environment, meaning the agent must seek out nutrient-rich patches in order to survive. Like real E. coli, our creature attempts to realise this goal by alternating between two chemotactic policies: Run (i.e. swim along the present course) versus Tumble (i.e. randomly reorient to a new course, commence swimming; see Fig. 1).
Our simplified E. coli-like creature embodies a model that encodes an expectation to inhabit a nutrient-rich milieu. Variation in the environment's chemical profile means that this expectation is not always satisfied-sometimes the agent finds itself in regions where chemical attractant is relatively scarce. Crucially, however, the organism can infer its progress along the nutrient gradient through periodic sampling of its chemosensory states, and acts on this information such that it tends to swim up the gradient over time. 28 This rudimentary sensorimotor control architecture affords the agent a very primitive picture of the world-one that picks out a single, salient dimension of environmental complexity (i.e. attractant rate of change). The capacity to estimate or infer this property implies a model that prescribes a fixed expectation about the kind of milieu the agent will inhabit, while also admitting some degree of uncertainty as to whether this expectation will be satisfied at any given moment. The task of the agent is to accumulate evidence in favour of its model by sampling from its policies in such a way that it ascends the nutrient gradient, thereby realising its expected sensory states (cf. Tschantz et al. 2019).
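The minimal scheme described above can be sketched in a few lines. This is an illustrative toy, not the authors' simulation: the one-dimensional environment, the exponential nutrient profile, the step size, and the helper names (nutrient, prediction_error, choose_policy, simulate) are all assumptions made for the example. The agent holds a fixed expectation of a nutrient-rich state and compares only its current prediction error with the immediately preceding one.

```python
import math
import random

EXPECTED_NUTRIENT = 1.0  # hypothetical fixed prior: receptors fully occupied

def nutrient(x, peak=0.0, width=5.0):
    """Assumed nutrient concentration, highest at `peak`."""
    return math.exp(-abs(x - peak) / width)

def prediction_error(sensed):
    """Mismatch between expected and sensed nutrient."""
    return EXPECTED_NUTRIENT - sensed

def choose_policy(prev_error, curr_error):
    """Run if error fell since the last sample (gradient ascent), else Tumble."""
    return "Run" if curr_error < prev_error else "Tumble"

def simulate(x0, heading0, steps, step_size=0.5, reorient=None):
    """Swim for `steps` samples in one dimension and return the final position.

    `reorient` maps the current heading to a new one after a Tumble;
    by default the new heading is drawn at random from {-1, +1}.
    """
    if reorient is None:
        rng = random.Random(0)
        reorient = lambda heading: rng.choice([-1, 1])
    x, heading = x0, heading0
    prev_error = prediction_error(nutrient(x))
    for _ in range(steps):
        x += heading * step_size
        curr_error = prediction_error(nutrient(x))
        if choose_policy(prev_error, curr_error) == "Tumble":
            heading = reorient(heading)
        prev_error = curr_error  # the agent remembers only one past sample
    return x

print(simulate(x0=-10.0, heading0=-1, steps=200))
```

Because the agent retains a single previous sample, any transient dip in sensed nutrient triggers a Tumble; the sketch has no means of distinguishing noise from a genuine change in the gradient.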
Although severely limited in terms of the perceptual or representational capacities at its disposal, this need not imply suboptimality per se. Consider the case in which various kinds of attractant are compatible with the organism's chemoreceptors. The agent cannot discriminate amongst these chemical substances; all it can do is infer the presence (or absence) of 'nutrient' at its various receptor sites. Assuming all forms of chemical attractant are equally nutritious (i.e. equally 'preferable' or 'valuable' given the agent's phenotype), this source of environmental heterogeneity turns out to be entirely irrelevant to the system's ongoing viability. Consequently, the extra structural and functional complexity required to distinguish these substances would afford the organism no adaptive benefit-on the contrary, the additional metabolic costs incurred by such apparatus might pose a hindrance. 29 Our E. coli-like creature thus trades in a rather coarse representational currency, thereby minimising the costs associated with unwarranted degrees of organisational complexity. This is an example of optimising the trade-off between model accuracy and complexity (Fitzgerald et al. 2014; Hobson and Friston 2012; Moran et al. 2014), where the simplest model to satisfactorily explain observed data (i.e. the presence/absence of nutrient) defeats more complex competitors (or on an evolutionary timescale, where natural selection favours the simplest model that satisfices for survival and reproductive success; Campbell 2016; Friston 2018). This also explains why some creatures might have evolved simpler phenotypes from more complicated progenitors-natural selection 'rewards efficiency' over the long-run (McCoy 1977).
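The accuracy-complexity trade-off mentioned above corresponds to a standard decomposition of variational free energy. It is stated here as a reminder rather than as a new result; the notation is an assumption of this sketch, with q(s) the agent's beliefs about hidden states s, p(s) its prior, and p(o | s) the likelihood of outcomes o.

```latex
\[
F \;=\;
\underbrace{D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s)\,\big]}_{\text{complexity}}
\;-\;
\underbrace{\mathbb{E}_{q(s)}\big[\ln p(o \mid s)\big]}_{\text{accuracy}}
\]
```

A model that distinguishes between equally nutritious attractants would incur extra complexity without any gain in accuracy about what matters to the organism, and would therefore score a higher free energy.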
This caveat notwithstanding, there remain a great many aspects of the environment that the minimal active inference agent fails to model despite their potential bearing on its wellbeing. One such omission is the system's incapacity to represent the evolution of its states over multiple sensory samples. This limitation is significant, since it prevents the organism from discerning patterns of variation over time, which in turn renders it overly sensitive to minor fluctuations in prediction error. For instance, the organism might trigger its Tumble policy at the first sign of gradient descent, even though this decrement might stem from a trivial divergence in the quantity of attractant detected across sensory samples. Unable to contextualise incoming sensory information with respect to the broader trajectory of its sensory flows, the agent risks tumbling out of a nutrient-rich stream due to innocuous or transient instability of the gradient, or due to the random error introduced by inherently noisy signalling pathways.
Relatedly, the agent's inability to retain and integrate over past experiences precludes the construction of map-like representations of previously-explored territory. The organism thus loses valuable information about the various conditions encountered on previous foraging runs-information that a more sophisticated creature could potentially exploit in order to extrapolate the most promising prospects for future forays. Moreover, it also lacks the necessary model parameters to track various distal properties that modulate or covary with the distribution of attractant (e.g., weather conditions, conspecifics, etc.). The agent is thus unable to exploit the patterned regularities that obtain between proximal and distal hidden states, and that afford predictive cues about the likely consequences of pursuing a particular policy (cf. fish species whose swim policies are informed by predictions about distal feeding conditions and temperature gradients; Fernö et al. 1998; Neill 1979). Unable to 'see' beyond the present state of its sensory interface with the world, the organism has no option but to tumble randomly towards a new, unknown territory each time prediction error accrues.
29 The story changes if the organism's receptors are compatible with molecules it cannot metabolise, or that afford low nutritional value (assuming such molecules are prevalent enough to significantly interfere with chemotaxis). See Sterelny (2003, pp. 20-26) for discussion of the challenges posed by 'informationally translucent environments' that confront organisms with ambiguous (or misleading) cues. Environmental translucence calls for greater model complexity; e.g., the capacity to integrate information harvested across multiple sensory channels (cf. robust tracking; Sterelny 2003, pp. 27-29).
Fig. 1 A simple active inference model of bacterial chemotaxis. This figure depicts a simple active inference agent that must sample from its sensory states in order to infer the best course of (chemotactic) action. Since the organism expects its transmembrane chemoreceptors to be occupied by attractant molecules, absence of attractant at these sites evokes prediction error (red triangles). These signals are projected (e.g., via protein pathways; red arrows) to the agent's motor control network, where they are summed and compared to the expectation induced by the previous wave of sensory input (black circle). If the prediction error generated by current sensory input is reduced relative to that of the preceding cycle of perceptual inference, this constitutes evidence that the agent is ascending the nutrient gradient; i.e. evidence favouring the Run policy (1). Conversely, increased prediction error furnishes evidence of gradient descent, thus compelling the agent to sample from its Tumble policy (2). Here, policies are enacted via prediction errors that induce clockwise (Tumble) or anti-clockwise (Run) flagellar motion. Note that the organism's metabolic system has been omitted from this schematic.
In sum, the agent we have described here embodies a very simple active inference scheme; one which supports adaptive responses to an ecologically-relevant dimension of environmental complexity. While the agent does not always succeed in inferring the best chemotactic policy in a given situation, its strategy of alternating between active states in accordance with local nutrient conditions is cheap and efficient, and tends to prevent it from drifting too far beyond its attracting set. But the severe epistemic constraints enforced by the agent's extremely narrow representational repertoire-both in the sense of its highly constricted spatiotemporal horizon, and in the poverty of its content-render this organism a creature of hazard. 
Unable to profit from past experience or future beliefs, it is locked in a perpetual present. This creature is thus thoroughly homeostatic in nature, activating its effector mechanisms whenever error signals indicate deviation beyond setpoint bounds. 30
Before moving on to our next model, let us briefly consider whether a creature could exist by simply maintaining its homeostatic stability in the absence of exteroceptive modelling and action. 31 When a creature of this sort encounters surprising deviations from its homeostatic expectations it only ever adjusts its internal states, never its active states. It may for instance change its metabolic rate (e.g., slow respiration, inhibit protein synthesis) in response to altered nutrient conditions, rather than acting on the environment in order to reinstate homeostatic equilibrium. 32 It is difficult to see how such a creature could actually exist in anything but a transitory, serendipitous manner. Changing its internal states in response to interoceptive prediction error is tantamount to yielding entirely to uncertainty. For example, as the nutrient gradient declines the organism's metabolic rate keeps decreasing, until it eventually starves to death-its states disperse throughout all possible states. An organism that fails to act upon its environment is ill-placed to avoid surprise and resist entropy. Only by happening to occupy a perfectly welcoming niche could it survive, but this is just to assume an environment devoid of uncertainty-not our world. 33
30 Indeed, one might construe the minimal model as a simplified analogue of Ashby's (1960) 'Homeostat'.
31 See Godfrey-Smith (2016b) for a complementary discussion of this topic in relation to microbial proto-cognition and metabolic regulation.
32 One might call this entity a Spencerian creature; i.e. an organism that responds to environmental change through "the continuous adjustment of internal relations to external relations" (Spencer 1867, p. 82; see discussion in Godfrey-Smith 1996, pp. 70-71). From an active inference perspective, this creature is the embodiment of pure perception; i.e. an organism that reconfigures its internal states (updates its model) in accordance with external conditions, without ever seeking to alter such conditions (cf. Bruineberg et al. 2018; Corcoran 2019).
33 One might play with the idea of entities that could exist like this quite happily once the ideal, invariant niche is discovered-perhaps deep within rocky crevices or underwater (one is reminded of the sea squirt that consumes its own brain after settling upon a permanent home, but the anecdote turns out to be an exaggeration; see Mackie and Burighel 2005). However, entities of this sort would surely fail to qualify as adaptive biological systems-at least insofar as the notion of adaptability implies some capacity to maintain one's viability in the face of time-varying environmental dynamics (cf. 'mere' vs. 'adaptive' active inference; Kirchhoff et al. 2018). Moreover, such entities would also fail to qualify as agents in any biologically relevant sense (see e.g., Moreno and Etxeberria 2005).
Interestingly, this scenario is reminiscent of a common criticism levelled against the free energy principle: the so-called dark-room problem (Friston et al. 2012e). The thrust of this argument is that free energy minimisation should compel agents to seek out the least-surprising environments possible (e.g., a room devoid of stimulation) and stay there until perishing. Various rejoinders to this charge have been made (see e.g., Clark 2018; Hohwy 2013; Schwartenbeck et al. 2013), including the observation that this strategy will inevitably lead to increasing free energy on account of accumulating interoceptive prediction error (Corcoran 2019; Pezzulo et al. 2015). More technically, "itinerant dynamics in the environment preclude simple solutions to avoiding surprise" (Friston et al. 2009, p. 2), where the environment referred to here includes the biophysical conditions that obtain within the organism, as well as without.

Model 2: Hierarchical active inference
Next, let us consider a more elaborate version of our creature, now equipped with a more sophisticated, hierarchical generative model of its environment-one which captures how environmental dynamics unfold over multiple timescales. Because higher levels of the generative model subtend increasingly broad temporal scales (Friston et al. 2017d; Kiebel et al. 2008), we shall see that this creature is capable of inferring the causes of slower fluctuations in the nutrient gradient. An implication of this arrangement is the emergence of parameters encoding higher-order expectations about the content and variability of sensory flows over time (cf. the fixed expectation of a high-nutrient state in Model 1).
In the interests of tractability, we limit ourselves to a fairly schematic illustration of hierarchical active inference in the context of circadian regulation. Circadian processes are near ubiquitous features of biological systems (even bacteria like E. coli show evidence of circadian rhythmicity; Wen et al. 2015), and provide a useful example of how internal dynamics can be harnessed to anticipate environmental variability. Circadian clocks are endogenous, self-sustaining timing mechanisms that enable organisms to co-ordinate a host of metabolic processes over an approximately 24 h period (Bailey et al. 2014;Dyar et al. 2018). From an allostatic perspective, circadian oscillations furnish a temporal frame of reference enabling the organism to anticipate (and efficiently prepare for) patterned changes in ecologically-relevant variables (e.g., diurnal cycles of light and temperature variation). 34 We can incorporate a molecular clock within our active inference agent by installing oscillatory protein pathways within its metabolic network (Nakajima et al. 2005;Rust et al. 2007;Zwicker et al. 2010). With this timing mechanism in place, our creature may begin to track systematic variations in the temporal dynamics of its internal and sensory states.
Suppose our organism exists in a medium that becomes increasingly viscous as temperature declines overnight. The impact of these environmental fluctuations is two-fold: Colder ambient temperatures cool the organism, slowing its metabolic rate; greater viscosity increases the medium's resistance, making chemotaxis more energy-intensive. Initially, the agent might interpret unexpectedly high rates of energy expenditure as indicative of suboptimal chemotaxis, thus compelling it to sample its Tumble policy more frequently in an effort to discover a nutrient-rich patch. Over time, however, the agent may come to associate a particular phase of its circadian cycle with higher average energy expenditure irrespective of policy selection. Our creature can capitalise on this information by scheduling its more expensive metabolic operations to coincide with warmer times of day, while restricting its nocturnal activity to a few essential chemical reactions. In other words, the agent can reorganise its behaviour (i.e. develop a rudimentary sleep/wake cycle) in order to improve its fit with its environment. 35 This scenario is indicative of how a relatively simple hierarchical agent may come to model time-varying hidden states in the distal environment. Like its minimal active inference counterpart, the hierarchical agent registers fluctuations in its sensory and internal states, and responds to them appropriately given its available policies. Unlike the minimal agent, however, these rapid fluctuations are themselves subject to second-order processing, in which successive sensory samples are integrated under a probabilistic representation of first-order variation (see Fig. 2).
The ability to contextualise faster fluctuations in relation to the slower oscillatory dynamics of the circadian timekeeper enables the agent to infer that it is subject to periodic environmental perturbations, the origin of which can be parsimoniously ascribed to some unitary external process. 36 This example hints at a central tenet of the active inference scheme; namely, that the hierarchical organisation of the generative model implies a hierarchy of temporal scales, where causal dynamics subtending larger timeframes are encoded at higher levels of the model (Friston 2008; Friston et al. 2017d; Kiebel et al. 2008).
Footnote 33 (continued) This is to say that the attractors around which adaptive biological systems self-organise are inherently unstable-both autopoietic ('self-creating') and autovitiating ('self-destroying')-thus inducing itinerant trajectories (heteroclinic cycles) through state-space (Friston 2011, 2012b; Friston and Ao 2012; Friston et al. 2012c). In other words, dark rooms may very well appeal to creatures like us (e.g., as homeostatic sleep pressure peaks towards the end of the day), but the value such environments afford will inevitably decay as alternative possibilities (e.g., leaving the room to find breakfast after a good night's sleep) become more salient and attractive (cf. alliesthesia, the modulation of affective and motivational states according to (time-evolving) physiological conditions; Berridge 2004; Cabanac 1971).
34 Note that the allostatic treatment of circadian regulation may in principle be extended to periodic phenomena spanning shorter or longer timescales; e.g., ultradian and circannual rhythms.
The hierarchical picture we have sketched here speaks to two complementary aspects of representational detachment (cf. Gärdenfors 1995; Pezzulo and Castelfranchi 2007; Pezzulo 2008) engendered by allostatic architectures. First, the separation of processing layers within the model hierarchy gives rise to a kind of temporal decoupling, in which higher layers construct extended representations of low-level sensory states. Although it might be tempting to think of these representations as aggregates of successive sensory samples, this does not do justice to the sophisticated nature of perception under active inference. Rather, higher layers of the hierarchy are perpetually engaged in modelling the evolution of the organism's sensory and internal states, and thus inferring the probable motion of the distal causes of its sensory flows. Consequently, higher-order representations 'reach out' beyond the limits of each sensory moment, extrapolating forwards and backwards in time to synthesise an expanded temporal horizon (see Fig. 2a).
35 This scenario is not meant to imply that circadian rhythms are actually acquired in this fashion (although they are clearly susceptible to modulation through external cues). Rather, the idea we are trying to illustrate here is the way hierarchical architectures ground adaptive regulation over longer timescales by dint of their capacity to capture recurrent, slowly evolving patterns of environmental variation.
36 Notice that the agent forms a representation of a hidden cause corresponding to diurnal patterns of temperature variation despite its lack of exteroceptive sensitivity to such variables as temperature, viscosity, light, etc. Rather, it detects regular changes in its dynamics that cannot be ascribed to its own actions (which average out across the 24 h period), and infers some hidden external process as being responsible for these changes. It might not be right to say the agent represents ambient temperature per se, nor indeed the higher-order causes of the latter's oscillation (sun exposure, planetary rotation, etc.). Our agent lacks sufficient hierarchical depth to arrive at such conclusions, collapsing these fine-grained distinctions into a fairly 'flat', undifferentiated representation of diurnal variation.
Second, there is a related sense in which higher-level processing within the hierarchy realises a more negative or reductive kind of detachment from low-level sensory input. Higher-level representations do not merely recapitulate (and predict) the bare contents of sensory experience, but seek instead to extract patterned continuities amidst the flux of sensory stimulation. This is to say that higher levels of the model attempt to carve out biologically-relevant signals within the agent's environment, while dampening or discarding the remaining content of sensory flows. This again speaks to the tension between model accuracy and complexity: Good models capture real patterns of environmental complexity, without being overly sensitive to the data at hand (and thus at risk of accruing prediction error over the long-run; Hohwy 2017b).
If this account is on the right track, the generative model can be construed as a kind of (Bayesian) filter (Friston et al. 2010b) that strips sensory signals of their higher-frequency components as they are passed up the hierarchy. In conjunction with the 'horizontal' temporal processing described above (which can likewise be understood in terms of noncausal filtering or smoothing, where past and future state estimates are updated in light of novel sensory data; Friston et al. 2017a), this 'vertical' filtering scheme enables the organism to form reliable higher-order representations of the slowly-evolving statistical regularities underlying rapid sensory fluctuations. The organism is thus able to model the slow oscillatory dynamics embedded within the distal structure of its eco-niche (e.g., the diurnal temperature cycle), even though the particular sensory states through which these dynamics are accessed may vary considerably over time (e.g., temperature variation may be modulated by multiple interacting factors subtending multiple timescales-momentary occlusion of the sun, daily and seasonal weather cycles, climate change, etc.).
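The 'vertical' filtering idea can be illustrated with a deliberately crude sketch. It assumes two stacked exponential filters rather than the Bayesian filtering and smoothing schemes cited above: the first level follows each sensory sample closely, while the second level updates slowly on the first level's estimate and so retains mostly the low-frequency (here, 'diurnal') component. The function names, gains, and simulated signal are all hypothetical choices made for illustration.

```python
import math
import random


def two_level_filter(samples, fast_gain=0.5, slow_gain=0.02):
    """Track a signal at two timescales with simple exponential filters.

    The gains are hypothetical: a high gain follows each sample closely,
    a low gain retains only slowly varying structure.
    """
    fast = slow = samples[0]
    fast_trace, slow_trace = [], []
    for y in samples:
        fast += fast_gain * (y - fast)     # first level: follow sensory samples
        slow += slow_gain * (fast - slow)  # second level: smooth the first level
        fast_trace.append(fast)
        slow_trace.append(slow)
    return fast_trace, slow_trace


def mean_abs_step(trace):
    """Average size of sample-to-sample change (a crude roughness index)."""
    return sum(abs(b - a) for a, b in zip(trace, trace[1:])) / (len(trace) - 1)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical input: a slow cycle (period 240 steps) plus fast noise.
    samples = [math.sin(2 * math.pi * t / 240) + random.gauss(0, 0.5)
               for t in range(960)]
    fast_trace, slow_trace = two_level_filter(samples)
    # Roughness should fall as signals are passed up the two levels.
    print(mean_abs_step(samples), mean_abs_step(fast_trace), mean_abs_step(slow_trace))
```

Note that this sketch is purely causal (each estimate depends only on past samples); the noncausal smoothing mentioned in the text, in which past estimates are revised in light of new data, is not implemented here.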
These dual facets of representational detachment help to explain not only how the hierarchical agent learns about invariant properties of an ever-changing environment, but also how it can exploit such regularities to its advantage. Circadian rhythms offer a particularly good example of how abstract representations of oscillatory dynamics foster adaptive behaviour in the context of environmental uncertainty. 37 Given a reliable model of how certain environmental properties are likely to evolve, the agent can form allostatic predictions that enable it to act in preparation for impending conditions, even if such expectations run contrary to current sensory evidence.
An interesting corollary of this view is the role of allostatic representations (e.g., circadian templates or programmes of activity) in compelling the agent to act 'as if' particular states of affairs obtain. Under certain conditions, such allostatic predictions amount to a kind of false inference about the hidden states that are currently in play. Although such predictions might be expected to engender actions that accumulate prediction error, the agent persists with them on account of their prior precision, which causes conflicting sensory evidence to be downweighted or attenuated (Brown et al. 2013; Wiese 2017).
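The effect of prior precision on conflicting sensory evidence can be sketched with the textbook Gaussian case, in which a belief is updated in proportion to the relative precision of the evidence. This is an illustrative assumption (a single scalar state with Gaussian prior and likelihood), not the scheme used in the works cited; the function name and numerical values are hypothetical.

```python
def precision_weighted_update(prior_mean, prior_precision,
                              observation, sensory_precision):
    """Combine a Gaussian prior with one Gaussian observation.

    Returns the posterior mean and precision. The prediction error
    (observation - prior_mean) is weighted by the relative precision
    of the sensory evidence.
    """
    posterior_precision = prior_precision + sensory_precision
    gain = sensory_precision / posterior_precision  # weight on prediction error
    posterior_mean = prior_mean + gain * (observation - prior_mean)
    return posterior_mean, posterior_precision


if __name__ == "__main__":
    # Hypothetical scale: 0.0 = 'conditions as the circadian template predicts',
    # 1.0 = 'conditions as the current sensory sample suggests'.
    stubborn, _ = precision_weighted_update(0.0, 9.0, 1.0, 1.0)
    labile, _ = precision_weighted_update(0.0, 1.0, 1.0, 9.0)
    print(stubborn, labile)  # the high-precision prior barely moves
```

With a precise prior the posterior stays close to the allostatic expectation despite contrary evidence; with a precise sensory channel the same evidence dominates.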
Returning to our earlier example, let us imagine that the hierarchical agent leverages its internal representation of diurnal temperature variation to schedule its activities to coincide with favourable environmental conditions. For instance, the organism might preemptively downregulate metabolic activity in preparation for nocturnal quiescence, irrespective of whether the ambient temperature has declined to an extent that would impair its metabolic efficiency. Likewise, the agent might begin to upregulate its activity around its usual time of 'awakening', despite the fact that this routine provokes an elevated rate of energy expenditure on an unusually chilly morning.
Fig. 2 Perceptual dynamics under hierarchical active inference. a In this illustration, the minimal active inference scheme has been augmented with a second-order perceptual inference level that tracks changes in the nutrient gradient over time. The purple function in the top panel indicates the agent's time-evolving estimate of ambient nutrient levels, which is derived from first-order sensory inferences (middle panels) on successive chemosensory receptor states (bottom panel raster plots; black cells indicate occupied receptor sites at time t). This function oscillates slowly as detected nutrient levels remain more or less stable over time, each incoming 'packet' of sensory information smoothly integrated within the broader temporal horizon of predicted and postdicted sensory states. The function begins to oscillate more rapidly when the organism experiences marked deviations from its expected states (right panels). This sudden volley of prediction error precipitates an increase in the precision on first-order prediction errors, enhancing the agent's perceptual sensitivity to environmental fluctuations. Increasing variability of sensory input also induces greater uncertainty about the trajectory of sensory states (as reflected in the broadening blue shading). b Schematic of a possible implementation of the hierarchical active inference scheme depicted in a. Sensory input from chemoreceptors (green hexagons) is received at the first processing level and compared to sensory expectations (black circles). Discrepancies between expected and actual input generate prediction errors (red triangles), which are passed up the hierarchy to the second processing level. Crucially, these prediction errors are modulated by precision estimates (blue square), which determine the 'gain' or influence ascribed to error signals (where high gain compels expectation units to conform with prevailing sensory evidence). Expected precision over first-order prediction errors is modulated in turn by second-order prediction error, which increases the gain on first-order errors. See Kanai et al. (2015), Parr and Friston (2018a), and Shipp (2016) for more detailed discussion of how such hierarchical schemes might be implemented in the brain. Figure reproduced from Corcoran et al. 2019 (CC BY 4.0)
On first blush, this arrangement might seem suboptimal-surely the agent would be better off tuning its behaviour to actual environmental conditions, rather than relying on error-prone predictions? However, this would simply return us to the kind of closed-loop architecture of the minimal active inference agent; a creature incapable of distinguishing a genuine change in distal conditions from a transient deviation in its sensory states. In this sense our agent's circadian gambit constitutes a more intelligent mode of regulation-armed with implicit knowledge of how state trajectories tend to evolve, the organism acts on the assumption that the future will roughly approximate the past, and treats transient deviations from this prescribed pattern as mere noise (i.e. the inherent uncertainty associated with stochastic processes).
Hence, although circadian rhythms might not guarantee ideal behaviour on shorter timescales, their adaptive value inheres in their ability to approximate the trajectory of homeostatically-relevant states over time. Such allostatic representations provide useful heuristics for guiding action-behaving in accordance with circadian predictions keeps the agent within the vicinity of its attracting set, thus affording a highly efficient means of reducing average uncertainty. Representations of this sort are insensitive to short-term fluctuations precisely because such transient dynamics (e.g., an unseasonably cold morning) are unlikely to afford information that improves its capacity to accurately predict future states. Circadian rhythms are therefore 'robust' to outlying or stochastic fluctuations in sensory data, thus constituting a reliable model of the underlying generative process. 38 In contrast to the minimal active inference agent, the hierarchical organism can exploit regularities in its environment to predict when and where it will be best placed to act, rather than responding reflexively to online sensory updates. Yet, while deep hierarchical architectures afford substantial advantages over the minimal scheme of Model 1, their capacity to reduce uncertainty through parameter estimation is most effective in a relatively stable world. Sudden alterations in environmental conditions (e.g., exchanging the European winter for the Australasian summer) require relatively long periods of reparameterisation, and may engender suboptimal, surprise-accruing behaviour in the interim. Flexible adaptation to novel (or rapidly-changing) situations requires generative models endowed with a temporal depth that transcends the hierarchical separation of fast and slow dynamics. We discuss such models next.
38 The remarkable robustness of circadian oscillations is thrown into relief whenever one traverses several time-zones-a good example of how strongly-held (i.e. high-precision or 'stubborn'; see Yon et al. 2019) allostatic expectations persist in the face of contradictory sensory evidence (i.e. the phase-shifted photoperiod and feeding schedule, to which the system eventually recalibrates; Asher and Sassone-Corsi 2015; Menaker et al. 2013).

Model 3: Counterfactual active inference
Our final model describes a biological agent equipped with a temporally deep model, which furnishes the ability to explicitly predict and evaluate the consequences of its policies. While this kind of generative model is undoubtedly the most complex and sophisticated of our three active inference schemes, it is also the most powerful, insofar as it allows the agent to perform counterfactual active inference. 39 Counterfactual active inference adds to the hierarchical processing of progressively deeper models through subjunctive processing: The agent can evaluate the expected free energy of alternative policies under a variety of different contexts before alighting on the best course of action (Friston 2018; Limanowski and Friston 2018). Our understanding of subjunctive processing draws on the Stalnaker-Lewis analysis of counterfactual conditionals, where the truth-conditions of a consequent are determined in relation to the possible world invoked by its antecedent (Lewis 1973b; Stalnaker 1968; see also Nute 1975; Sprigge 1970; Todd 1964). 40 In the context of active inference, counterfactual processing translates to the simulation of those sensory states that the organism would observe if it were to enact a certain policy under a particular set of model parameters (i.e. a possible world).
Our formulation of counterfactual inference implies two complementary processes, which we briefly introduce here. The first of these involves counterfactual inference on policies under spatiotemporally distal conditions. For example, the agent could reflect on a previous decision that precipitated a negative outcome, and consider how events might have unfolded differently (for better or worse) had it selected an alternative course of action (i.e. 'retrospective' inference). Similarly, the agent could envisage a scenario that it might encounter in the future, and imagine how various policies might play out under these circumstances (i.e. 'prospective' inference). This kind of counterfactual processing is useful for resolving uncertainty over the outcomes expected under various policies, and is integral to many sophisticated forms of cognitive processing (e.g., causal induction, mental time travel, mindreading, etc.; Buckner and Carroll 2007; Pezzulo et al. 2017; Schacter and Addis 2007; Corballis 1997, 2007).
The second kind of uncertainty reduction mediated by counterfactual processing pertains to the arbitration of policies when the state of the world is itself ambiguous. This situation may arise due to uncertainty about the context that currently obtains (or relatedly, uncertainty over the consequences of policies within a particular context), or because the inhabited niche is inherently volatile (i.e. prone to fluctuate in ways that are relevant for the organism's wellbeing, yet difficult to anticipate). Under such circumstances, counterfactual hypotheses may prove useful in two ways: (1) they may enable the agent to infer the policy that minimises (average) uncertainty across a variety of possible worlds; (2) they may point towards 'epistemic' actions that help to disambiguate the actual state of the world (i.e. disclose the likelihood mapping that currently obtains), thus improving precision over policies.
As a brief illustration of counterfactual inference, let us consider an iteration of our E. coli-like creature that can evaluate the outcomes of its policies across several possible worlds. An organism sensitive to incident light could for instance run a counterfactual simulation for a possible world in which there is much scattered sunlight, and compare this to an alternative world featuring relatively little sunlight. If sunlight poses a threat to the bacterium (perhaps sun exposure causes the nutrient patch to dry up), tumbling constitutes a riskier strategy in the sun-dappled world. If it can order these possible worlds on the basis of their similarity to the actual world, then these counterfactual simulations could prove informative about the best action to take in a particular situation. 41 Should the sun-dappled world turn out more similar to the actual world, then the organism would do well to confine its foraging activity to shady regions of the environment. The agent might consequently adapt its policies such that it tolerates gradient descent in the context of low incident light, only risking the Tumble policy when the nutrient supply is critically depleted.
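A toy version of this illustration can be written down directly. The sketch below assumes that each possible world carries a plausibility weight (standing in for its similarity to the actual world) and that each policy has a simulated cost in each world; the agent then selects the policy with the lowest plausibility-weighted cost. All names and numbers are hypothetical, and the weighted average is only one simple way of combining counterfactual simulations.

```python
def best_policy(world_weights, simulated_costs):
    """Pick the policy with the lowest plausibility-weighted cost.

    world_weights: dict mapping world name -> plausibility (summing to 1).
    simulated_costs: dict mapping policy name -> {world name: cost}.
    Returns the chosen policy and the averaged cost of every policy.
    """
    averaged = {
        policy: sum(world_weights[w] * costs[w] for w in world_weights)
        for policy, costs in simulated_costs.items()
    }
    return min(averaged, key=averaged.get), averaged


if __name__ == "__main__":
    # Hypothetical costs: tumbling is risky when sunlight is scattered about.
    simulated_costs = {
        "Tumble": {"sun_dappled": 3.0, "shaded": 1.0},
        "Run": {"sun_dappled": 1.5, "shaded": 1.5},
    }
    # The sun-dappled world is judged more similar to the actual world.
    choice, averaged = best_policy({"sun_dappled": 0.7, "shaded": 0.3},
                                   simulated_costs)
    print(choice, averaged)
```

Reversing the weights (so that the shaded world is judged closer to the actual one) makes Tumble the cheaper option under these made-up costs, which mirrors the context-sensitivity described in the text.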
Counterfactual processing enriches the generative model greatly, relative to the hierarchical organisation described in the previous section. Now there is wholly detached generative modelling of fine-grained elements of the prediction error landscape through simulated action; there is (Bayesian) model selection in terms of the best policy (i.e. minimising the free energy between the nutrient gradient simulated under a policy and the organism's expected nutrient gradient; cf. Fitzgerald et al. 2014; Friston et al. 2016, 2017b; Parr and Friston 2018b); and there is processing that orders possible worlds (i.e. hypotheses entailed under competing model parameterisations) according to their comparative similarity to the actual world (where similarity may be cashed out in terms of representations of law-like relations (e.g., between nutrient gradient and sunlight) and particular matters of fact (e.g., amount of nutrient and sunlight); cf. Lewis 1973a, b, 1979). This contrasts sharply with the hierarchical agent, whose representational states are never completely detached from the content of its sensory flows, and whose active states are modulated gradually in response to reliable patterns of covariation.
More formally, counterfactual active inference rests on the ability to calculate the expected free energy of one's policies. This is important for our analysis because the expected free energy of a policy can be decomposed into two terms-expected complexity and expected accuracy-which can be regarded as two kinds of uncertainty: risk and ambiguity (Friston et al. 2017a, b, d). 42 Technically, risk constitutes a relative uncertainty (i.e. entropy) about predicted outcomes, relative to preferred outcomes, whereas ambiguity is a conditional uncertainty (i.e. entropy) about outcomes given their causes. More intuitively, risk can be understood as the probability of gaining some reward (e.g., finding a cookie) as a consequence of some action (e.g., reaching into a cookie jar), while ambiguity pertains to the fact that an observation might have come about in a variety of different ways (e.g., the cookie in my hand might have been given to me, stolen from the jar, etc.). 43 Counterfactual active inference agents need to consider both of these sources of uncertainty during policy selection. This is because resolving ambiguity will increase the agent's confidence about the process(es) responsible for generating observations, enabling it to calculate the risk (i.e. expected cost) associated with alternative courses of action.
41 Interestingly, recent psychological evidence suggests that counterfactual scenarios deemed more similar to previously experienced events are perceived as more plausible and easier to envisage (i.e. simulate) than more distant alternatives (Stanley et al. 2017). This observation lends weight to the idea that humans evaluate competing counterfactual predictions in accordance with their proximity to actual states of affairs, where proximity or similarity might be cashed out in terms of (Bayesian) model evidence (see Fitzgerald et al. 2014).
42 Risk and ambiguity are also known as irreducible uncertainty and (parameter) estimation uncertainty, respectively (de Berker et al. 2016; Payzan-LeNestour and Bossaerts 2011). Note that uncertainty can be
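The decomposition of expected free energy into risk and ambiguity can be made concrete for a small discrete model. The sketch below assumes the form commonly given in the discrete-state active inference literature: risk is the relative entropy (KL divergence) between the outcomes predicted under a policy and the preferred outcomes, and ambiguity is the expected entropy of the likelihood mapping from hidden states to outcomes. The variable names and the two-state nutrient example are hypothetical, and preferred outcomes are assumed to be strictly positive.

```python
import math


def risk_and_ambiguity(state_probs, likelihood, preferred_outcomes):
    """Return (risk, ambiguity) for one policy in a discrete model.

    state_probs[s]        : probability of hidden state s under the policy
    likelihood[s][o]      : probability of outcome o given state s
    preferred_outcomes[o] : prior preference over outcomes (all > 0)
    """
    n_outcomes = len(preferred_outcomes)
    # Outcomes predicted under the policy: sum_s Q(s) P(o|s)
    predicted = [
        sum(q * likelihood[s][o] for s, q in enumerate(state_probs))
        for o in range(n_outcomes)
    ]
    # Risk: KL divergence between predicted and preferred outcomes
    risk = sum(p * math.log(p / c)
               for p, c in zip(predicted, preferred_outcomes) if p > 0)
    # Ambiguity: expected entropy of outcomes given their causes
    ambiguity = sum(
        q * -sum(l * math.log(l) for l in likelihood[s] if l > 0)
        for s, q in enumerate(state_probs)
    )
    return risk, ambiguity


if __name__ == "__main__":
    # Hypothetical model: states = (nutrient-rich, nutrient-poor),
    # outcomes = (high receptor occupancy, low receptor occupancy).
    likelihood = [[0.9, 0.1], [0.1, 0.9]]
    preferred = [0.9, 0.1]
    for name, state_probs in {"Run": [0.8, 0.2], "Tumble": [0.5, 0.5]}.items():
        risk, ambiguity = risk_and_ambiguity(state_probs, likelihood, preferred)
        print(name, risk + ambiguity)  # expected free energy under this sketch
```

In this toy case both policies carry the same ambiguity (the likelihood mapping is equally noisy in both states), so they differ only in risk; policies that lead to states with sharper state-outcome mappings would score lower on ambiguity.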
With counterfactual inference at its disposal, the organism is potentially even better equipped to meet the demands of a complex and capricious environment. 44 Rather than engaging 'hard-wired' responses to current states (cf. Model 1), or 'soft-wired' responses to anticipated states (cf. Model 2), it can exploit offline computation of the likely consequences of different policies under various hypothetical conditions (Gärdenfors 1995; Grush 2004; Pezzulo 2008). This affords the opportunity to generate and test a wide variety of policies in the safety of its imagination, where actions that turn out to be too risky (or downright stupid) can be safely trialed and (hopefully) rejected (cf. Craik 1943, p. 61; Dennett 1995, pp. 375-376; Godfrey-Smith 1996, pp. 105-106). This capacity (or competence, see Williams 2018) to disengage from the present and undertake such 'thought experiments' confers a powerful mechanism for innovation, problem-solving, and (vicarious) learning-major advantages in complex environments (Buzsáki et al. 2014; Mugan and MacIver 2019; Redish 2016).
43 This characterisation of risk and ambiguity is broadly consistent with descriptions in economics (e.g., Camerer and Weber 1992; Ellsberg 1961; Kahneman and Tversky 1979; Knight 1921) and neuroscience (e.g., Daw et al. 2005; Hsu et al. 2005; Huettel et al. 2006; Levy et al. 2010; Payzan-LeNestour and Bossaerts 2011; Preuschoff et al. 2008; for a review, see Bach and Dolan 2012). Importantly, these two sorts of uncertainty rest upon the precision (inverse variability) of the likelihood mapping between outcomes and hidden states-and transitions amongst hidden states that may or may not be under the creature's control. Technically, the first sort of precision relates to observation noise, while the second relates to system or state noise, i.e. volatility. Formally, volatility can be construed as the (inverse) precision over transition probabilities (i.e. confidence about the way hidden states evolve over time; Parr and Friston 2017; Parr et al. 2019; Sales et al. 2019; Vincent et al. 2019). This formulation suggests that volatile environments will tend to generate more surprising outcomes than stable environments, insofar as their states are apt to change in ways that are difficult to anticipate. Note that the term volatility is used differently in various contexts (see for e.g., Behrens et al. 2007; Bland and Schaefer 2012; Mathys et al. 2014).
44 One caveat to this claim is that the (neuro)physiological mechanisms and cognitive operations required to enrich and exploit counterfactual predictive models may themselves engender additional costs (e.g., planning a new course of action requires time, energy, and effort; see Zénon et al. 2018). We assume that the costs incurred by such processes 'pay for themselves' over the long-run (or at least tend to on average), insofar as they enable the agent to exploit prior experience in ways that are conducive to adaptive behaviour (see Buzsáki et al. 2014; Pezzulo 2014; Pezzulo et al. 2017; Suddendorf et al. 2018). It is also worth pointing out that some of the costs engendered by counterfactual inference-supporting architectures may be mitigated by a variety of adaptive strategies (e.g., model updating during sleep, habitisation of behaviour under stable and predictable conditions; see Fitzgerald et al. 2014; Friston et al. 2017b; Hobson and Friston 2012).
The counterfactual active inference scheme described here implies additional degrees of organismic complexity that can be exploited to mitigate the impact of environmental uncertainty. The counterfactual agent is not only capable of 'expecting the unexpected' (inasmuch as it can countenance states of affairs that are unlikely under its current model of reality), but can prepare for it too-exploiting counterfactual hypotheses to formulate strategies for solving novel problems that might arise in the future (e.g., deciding what one should do in the event of sustaining a puncture while cycling to work). Moreover, the agent may organise its policy sets in ways that are sensitive to outcome contingencies, such that it can choose a backup policy if its initial plan is thwarted (e.g., being prepared to order the apple pie if the tiramisu has sold out). This ability to deftly switch between a subset of low-risk policies may confer a huge advantage under changing (or volatile) environmental conditions, where the time and effort required to re-evaluate a large array of policies from scratch could prove extremely costly.
Counterfactual processing is also valuable when the system is confronted with a sudden or sustained volley of prediction error. The counterfactual agent is able to interpret such signals as evidence that the hidden dynamics underwriting its sensory flows may have changed in some significant way (e.g., finding oneself confronted by oncoming traffic), and can draw on alternative possible models to evaluate which parameterisation affords the best explanation for the data at hand (cf. parameter exploration; Schwartenbeck et al. 2019). If the contingent relations structuring relevant environmental properties have indeed altered (e.g., realising one is visiting a country where people drive on the opposite side of the road), the agent will need to update its model so as to capture these novel conditions (see Sales et al. 2019). Failure to do so runs the risk of accruing further prediction error, since persisting with policies predicated on inaccurate (i.e. 'out-of-date') likelihood mappings may yield highly surprising outcomes.
One way to assess whether conditions or contexts have indeed changed is to engage in epistemic action, the final feature of counterfactual active inference we address here. Epistemic actions are active states that are sampled in order to acquire information about environmental contingencies (Friston et al. 2015b, 2017a). 45 When faced with the problem of identifying which model best captures the causal structure of the world, the agent can run simulations to infer the sensory flows each model predicts under a certain policy. The agent can then put these hypotheses to the test by sampling actions designed to arbitrate amongst competing predictions (Seth 2015). If the agent selects actions that are high in epistemic value, it will observe outcomes that afford decisive evidence in favour of the model that best captures the current environmental regime.
The possibility of resolving ambiguity over the parameterisation of state-outcome contingencies through counterfactually-guided epistemic action also extends to ambiguity over policies. Here, the agent may run counterfactual simulations to infer actions that are likely to harvest information that clarifies the best policy to pursue. 46 These epistemic capabilities recapitulate the point that the policies of the counterfactual agent are not only scored with respect to risk-reduction or expected value (i.e. the extent to which they are expected to realise a preferred outcome), but also with respect to ambiguity-reduction or epistemic value (i.e. the extent to which they are expected to produce an informative outcome). Such epistemic actions are unavailable to the (merely) hierarchical agent, who can only reduce uncertainty over model parameters by slowly tuning its estimates to capture stable, enduring patterns of variation. 47
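One simple way to quantify how informative an action is expected to be is the expected information gain about the current context, i.e. the mutual information between context and outcome under that action. The sketch below uses this quantity as a stand-in for epistemic value; it is an illustrative assumption rather than the formulation in the works cited, and the contexts, actions, and probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_information_gain(context_prior, likelihood):
    """Mutual information between context and outcome for one action.

    context_prior[c] : current belief that context c obtains
    likelihood[c][o] : probability of outcome o in context c under the action
    """
    n_outcomes = len(likelihood[0])
    # Outcome distribution expected before the context is known
    predictive = [
        sum(context_prior[c] * likelihood[c][o] for c in range(len(context_prior)))
        for o in range(n_outcomes)
    ]
    # Average outcome uncertainty that remains once the context is known
    expected_conditional = sum(
        context_prior[c] * entropy(likelihood[c]) for c in range(len(context_prior))
    )
    return entropy(predictive) - expected_conditional


if __name__ == "__main__":
    prior = [0.5, 0.5]  # sun-dappled vs. shaded world, equally plausible
    # Hypothetical actions: probing yields context-dependent outcomes,
    # staying put yields the same outcomes whatever the context.
    probe = [[0.9, 0.1], [0.1, 0.9]]
    stay = [[0.5, 0.5], [0.5, 0.5]]
    print(expected_information_gain(prior, probe))
    print(expected_information_gain(prior, stay))
```

An action whose outcomes do not depend on context has zero expected information gain, however it scores on expected value; an agent that also weighs this term will sometimes prefer the probing action even when it offers no immediate reward.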

Two options for cognition
We began this paper with the lofty ambition of learning something about the nature and function of cognition, but have for the most part been careful to eschew talk of the cognitive or the mental. In this final section, we sketch out some of the broader implications of our analysis for the concept of biological cognition, and how the latter might be delimited from more general notions of life and adaptive plasticity.
As a precursory step, let us begin by considering how the three schematic models described in "Biological regulation in an uncertain world" might relate to real biological agents. One obvious strategy would be to map these architectures onto different taxonomic classes. For instance, one might construe the difference between these models as approximating the difference between relatively primitive organisms (like E. coli and other unicellular organisms), creatures with some degree of hierarchical depth (like reptiles or fish), and animals that demonstrate evidence of counterfactual sensitivity (like rodents; e.g., Redish 2016;Steiner and Redish 2014;Sweis et al. 2018;corvids;e.g., Bugnyar et al. 2016;Kabadayi and Osvath 2017;Raby et al. 2007; and primates; e.g., Abe and Lee 2011;Krupenye et al. 2016;Lee et al. 2005). 46 Such activity is sometimes referred to as epistemic foraging, where the agent seeks out information about the way state transitions are likely to unfold (Friston et al. 2017d;Mirza et al. 2016;Parr and Friston 2017). For a nice example of epistemic foraging in wild dolphins, see Arranz et al. (2018). 47 It is interesting to remark how epistmic action contributes to the practical utility of cognition as understood under the environmental complexity thesis. Following Dewey (1929), Godfrey-Smith (1996, pp. 116-120) notes that cognition is most likely to be useful in environments that comprise a mixture of regularity and unpredictability. Specifically, distal states should vary in ways that are a priori unpredictable (but worth knowing about), while maintaining a stable relationship with proximal states (see also Dunlap and Stephens 2016). The capacity to engage in epistemic action enhances the potential utility of cognition precisely insofar as it helps the agent to reduce uncertainty over this mapping, thus affording more precise knowledge (or novel insight; Friston et al. 2017b) about the state of the world and its possible alternatives. 
This approach is immediately undermined, however, by the remarkable complexity evinced by (at least some) unicellular organisms. Bacteria like E. coli integrate information over a variety of sensory channels, modulate their metabolic and chemotactic activity in response to reliable environmental contingencies, and alternate policy preferences in a context-sensitive fashion (Ben-Jacob 2009; Freddolino and Tavazoie 2012; Hennessey et al. 1979; Mitchell et al. 2009; Salman and Libchaber 2007; Tagkopoulos et al. 2008; Tang and Marshall 2018; see also van de Cruys 2017, for discussion from a predictive processing perspective). Although this does not rule out the possible existence of minimal active inference agents, it might suggest that all extant lifeforms instantiate some form of allostatic architecture. This raises the question of whether meaningful distinctions can be drawn in terms of hierarchical organisation (e.g., shallow vs. deep hierarchies), and whether such distinctions can be systematically mapped to particular functional profiles (e.g., capacities for learning and adaptive flexibility).
It might also be tempting to think of our model organisms as exemplifying creatures that are more or less evolved or adapted to their environment. Undoubtedly, the counterfactual agent comprises a more complex information-processing architecture than its minimal active inference counterpart, one equipped with a much greater capacity for flexible, selective adaptation to the vicissitudes wrought by uncertainty. However, we must be careful not to conflate adaptation to a specific set of environmental properties with adaptation to environmental complexity per se. On both the environmental complexity thesis and the free energy principle, organisms are adapted to their environments to the extent that they successfully track and neutralise ecologically-relevant sources of uncertainty (cf. 'frugal' generative models; Baltieri Clark 2015). This means that organisms comprising radically divergent degrees of functional complexity can in principle constitute equally good models of the same environment, assuming they are equally capable of acting in ways that minimise the conditional entropy over their sensory states.
Finally, given that the free energy principle conceives of all biological agents as being engaged in the same essential activity (i.e. the singular project of minimising free energy, maximising self-evidence, and thus conserving self-organisation over time), one might question whether there really are any substantive differences to be found between the levels of our three-tiered scheme. In conjunction with the argument presented in the previous paragraph, it might seem that these architectures differ from one another in a fairly superficial way: They simply illustrate alternative solutions to the fundamental problem of uncertainty reduction over time.
This point notwithstanding, we believe that the distinct functional capacities we have ascribed to these models carry important implications about the origins and limits of cognition. The fact that all three architectures are afforded equal footing by the free energy principle does not speak against this view-despite its neuroscientific origins (Friston 2002, 2003, 2005), the free energy principle makes no explanatory commitments to cognition per se; it simply imposes certain formal constraints on the sort of functional organisation a cognitive system must realise in order to resist entropy. This marks a significant distinction from the environmental complexity thesis, which on Godfrey-Smith's telling limits its explanatory scope to the subset of living organisms that count as cognitive agents. Put differently, the free energy principle is neutral on the ontological relation between life and cognition (pace Kirchhoff and Froese 2017). The environmental complexity thesis, on the other hand, endorses a weak continuity ("Anything that has a mind is alive, although not everything that is alive has a mind"; Godfrey-Smith 1996, p. 72) without specifying a principled way of demarcating the boundary between the cognitive and the non-cognitive. 48 We suggest this boundary can be located at the nexus between hierarchical and counterfactual forms of active inference. This would mean that only those biological systems capable of engaging in fully detached modes of representation, and of exploiting such representations for the purposes of uncertainty reduction, count as cognitive agents. 49 Associating cognition with counterfactual active inference might strike some as unduly restrictive, limiting category membership to humans and only the most intelligent of mammals and birds (for instance).
It is important to bear in mind, however, that our construal of counterfactual processing is a formal one; many kinds of animals are likely to exploit counterfactual inferences in ways that enable them to learn about the world and make sensible (uncertainty-reducing) decisions. Some of these processing architectures might turn out to be highly impoverished compared to the rich counterfactual capacities at our own disposal (cf. Carruthers 2004), but we consider this difference a matter of degree, not kind.
Notably, our counterfactual criterion does not exclude such organisms as bacteria, protists, and plants from the cognitive domain by fiat. If clever empirical studies were to reveal that E. coli (for example) proactively solicit ambiguity-reducing information to plan their future chemotactic forays, this would afford compelling evidence they constitute cognitive agents. However, as pointed out in recent debates about future-oriented cognition in non-human animals, seemingly complex patterns of behaviour do not always license the attribution of complex representational or inferential capacities (Redshaw and Bulley 2018; Suddendorf and Redshaw 2017; see Mikhalevich et al. 2017, for an environmental complexity-inflected counterargument). If empirical observations can be parsimoniously explained by appeal to such allostatic mechanisms as information integration (Read et al. 2015) and elemental learning (Giurfa 2013; Perry et al. 2013), admittance to the cognitive domain ought to be withheld.
48 Godfrey-Smith thus rejects strong continuity, the view that "[l]ife and mind have a common abstract pattern or set of basic organizational properties. […] Mind is literally life-like" (1995, p. 320, emphasis in original). Evan Thompson (2007) has defended a position similar to this ('deep continuity'), albeit with the addition of an existential-phenomenological supplement (for discussion, see Wheeler 2011). This view inherits from Maturana's canonical account of autopoiesis, where one finds the strongest expression of life-mind continuity: "Living systems are cognitive systems, and living as a process is a process of cognition" (Maturana and Varela 1980, p. 13, emphasis added; see also Heschl 1990).
49 It is perhaps worth noting that other scholars have used the criterion of "detachment" (or "decouplability") to distinguish representational versus non-representational agents, rather than cognitive versus non-cognitive agents (cf. Clark and Grush 1999; Grush 2004). Without digressing into a discussion of the relationship between representational and cognitive systems, we remark that our view conceives of cognition as a computational architecture that engages in a particular subset of representational operations-i.e. the generation, manipulation, and evaluation of counterfactual model predictions. These operations are situated within a broader class of uncertainty-resolving processes, including the homeostatic and allostatic representational schemes outlined in "Biological regulation in an uncertain world".
An alternative (and increasingly popular) approach would be to ascribe some form of 'minimal' or 'proto-cognitive' status to bacteria, plants, and other aneural organisms (Ben-Jacob 2009; Calvo Garzón and Keijzer 2011; Gagliano 2015; Godfrey-Smith 2016a, b; Lyon 2015, 2019; Segundo-Ortin and Calvo 2019; Smith-Ferguson and Beekman 2019; van Duijn et al. 2006; for a dissenting view, see Adams 2018). Such terms might seem appealing in light of the mounting body of research claiming that many 'simple' organisms engage in primitive or precursory forms of cognitive activity (Baluška and Levin 2016; Levin et al. 2017; Tang and Marshall 2018). Granting such cases do indeed demonstrate genuine instances of learning, memory, decision-making, and so on, it seems only the staunchest of neuro-chauvinists would persist in denying the cognitive status of such organisms.
While we cannot do justice to this complex topic here, a few remarks are in order. First, we should acknowledge that there may be few substantive differences between the kinds of organisms we designate as hierarchical or allostatic agents, and the biological systems Godfrey-Smith and others would identify as exhibiting 'minimal' or 'proto-cognitive' capacities (e.g., Godfrey-Smith 2002, 2016b). 50 Both categories imply systems that track relevant states in their (internal and external) environments, and exploit this information to adaptively regulate their activity. Both categories also imply some form of evolutionary precedence over 'fully-fledged' cognitive agents-cognition 'proper' builds on the foundations laid by allostatic/proto-cognitive architectures.
The problem with such terminology is that it implies the ascription of some form of cognitive capacity, while remaining opaque as to its precise relation to 'full-blown' cognition-including the reason for its segregation from the latter (see Lyon 2019, for an extended critique). Is there some fundamental cognitive ingredient that proto-cognition lacks, or is it simply a scaled-down, severely degraded version of (say) animal cognition? If the latter, is the distinction between proto- and 'genuine' cognition marked by a critical boundary, or is the difference gradual and indeterminate? Godfrey-Smith explicitly endorses some variety of the latter view, frequently remarking that cognition 'shades-off' into other biological processes. But if proto-cognitive organisms ultimately fail to qualify as cognitive agents, 51 such talk may obscure a fundamental discontinuity.
We take it that the capacity for counterfactual processing marks the subtle but significant functional boundary hinted at in Godfrey-Smith's analysis. This proposal is-in most cases-stricter than other criteria often mentioned in the debate about minimal cognition: it implies that organisms that only engage in allostatic regulation (sometimes requiring forms of learning, memory, and decision-making) would not necessarily qualify as cognitive agents. Of course, testing which organisms meet this counterfactual criterion remains an important conceptual and empirical challenge.
In this respect, our proposed definition is not neuro-chauvinistic, but is focussed rather on a functional (computationally-grounded) definition of cognition that can be met-at least in principle-by many different kinds of organisms. On this view, a minimally cognitive agent is a minimally counterfactual agent-an organism that not only learns about itself and its environment, but imagines them anew. If we are wrong, and sophisticated forms of cognitive activity simply emerge as allostatic processing schemes become increasingly powerful and hierarchically elaborate, then a single dimension along which cognition 'shades off' into primitive forms of sensorimotor control and metabolic regulation would seem the better option.