Optimising 'positive' findings from judgement bias tests: a consideration of inherent confounding factors associated with test design and biology

The assessment of positive emotional states in animals has been advanced considerably through the use of judgement bias testing. Judgement bias test (JBT) methods have now been reported in a range of species. Generally, these tests show good validity as ascertained through use of corroborating methods of affective state determination. However, published reports of judgement bias task findings can be counter-intuitive and show high inter-individual variability. It is proposed that these outcomes may arise from inherent inter- and intra-individual differences in biology. This review discusses the potential impact of sex and reproductive cycles, social status, genetics, early life experience and personality on judgement bias test outcomes. We also discuss some aspects of test design that may interact with these factors to further confound test interpretation.

There is some evidence that a range of biological factors affect judgement bias test outcomes, but in many cases this evidence is limited and needs further characterisation to reproduce the findings and confirm directions of effect. It is our proposition that researchers should consider dedicated study of these factors and their impact on judgement bias. This is needed to confirm effects and investigate mechanisms. Alternatively, consideration and reporting of these factors in JBT studies through incorporation in statistical analyses will provide much-needed additional data on their impact. These actions will enhance the validity and practical applicability of the JBT for welfare assessment.


Introduction
Recently, scientists have tended to favour the school of animal welfare science that considers how animals 'feel', in preference to assessment of biological function or 'natural-like' state (Boissy et al., 2007). Concomitantly, researchers have concentrated their efforts on assessment of positive affective states (Mellor, 2012). Affective states are defined according to their two main dimensions: 1) arousal, or magnitude of activation, and 2) valence, or direction of effect. Positive welfare is understood to be a complex equilibrium between providing the animal with conditions that encourage positive emotional (affective) valence and an evaluation of the maintenance requirements of that animal (Yeates and Main, 2008). As a result, there is a general desire to design housing and husbandry systems that not only minimise suffering and pain, but also maximise factors likely to induce positive affect (Harding

It is generally accepted that the valence of an animal's affective state cannot be measured directly (Mellor, 2012). In spite of this, there are known linkages between an animal's affective state and the functional states that generate it. These functional states include four components: behavioural, neurophysiological, cognitive and subjective (Mellor, 2012). The last is the most controversial, with disagreement over what animals' emotional experiences consist of and their level of conscious awareness of emotional state (Boissy et al., 2007). The most widely investigated components in relation to animal welfare are neurophysiological and behavioural markers (Mendl et al., 2009; Novak et al., 2015). However, there are some well-described challenges associated with the use of existing measures for welfare assessment. As a representative example, an increase in plasma glucocorticoids may occur, but there is no means of determining the original source of stimulus in the brain.
This increased synthesis may have been triggered by a positive event such as reward, or a negative event such as fear (Ralph and Tilbrook, 2016). Glucocorticoids, and many other biomarkers, are

Positive reward is always associated with stimulus 1. Exposure to stimulus 1 results in expression of behaviour 1 to obtain the reward. The negative outcome is always associated with stimulus 2. Exposure to stimulus 2 results in expression of behaviour 2 to avoid punishment. A stimulus directly intermediate between the two previously learned stimuli is then introduced. Depending on the behaviour expressed in response to this ambiguous stimulus, a cognitive bias can be discerned.
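The logic described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each ambiguous-probe trial has been labelled with the behaviour the animal expressed; the labels `behaviour_1` and `behaviour_2`, the function name `optimism_index` and the toy values are hypothetical and are not taken from any published protocol.

```python
from collections import Counter

# Hypothetical response labels (assumptions for this sketch only).
GO = "behaviour_1"      # behaviour trained to stimulus 1 (obtains the reward)
NO_GO = "behaviour_2"   # behaviour trained to stimulus 2 (avoids punishment)


def optimism_index(probe_responses):
    """Proportion of ambiguous-probe trials answered with the reward-associated behaviour."""
    counts = Counter(probe_responses)
    scored = counts[GO] + counts[NO_GO]
    if scored == 0:
        raise ValueError("no scorable probe responses")
    return counts[GO] / scored


# Toy values for illustration only; these are not real data.
responses = [GO, GO, NO_GO, GO]
print(f"optimism index: {optimism_index(responses):.2f}")
```

A value near 1 would be read as a relatively 'optimistic' interpretation of the ambiguous stimulus and a value near 0 as a relatively 'pessimistic' one. Trials with any other label (for example, omissions) are ignored in this sketch, which is itself a design choice that a real study would need to justify.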

The practical application of these tests for animal welfare assessment rests on the assumption that an intervention, for example a housing change or treatment of illness, will alter animal affect, which can be determined by task performance. This provides information on the impact of the change on welfare, and can be used for ongoing monitoring of welfare state. As already discussed, the task has advantages over traditional assessment methods since it can establish valence of emotional response. Additionally, it has been suggested that the task is better suited to measuring long-term mood, rather than shorter-term emotion (Bethell, 2015). Since welfare generally refers to a lasting state made up of an individual's cumulative experiences, and their relative weighting over time (Mellor and Beausoleil, 2015), the outputs of the judgement bias task align neatly with our current welfare definition.

There are two current impediments to widespread implementation of these tasks in animal enterprises for practical welfare assessment. The first is that significant training of animals in the task is required before it can be used to gauge affective state. This issue is not insurmountable, with some progress already made towards automating a judgement bias task in rats, using their natural inquisitive behaviour to drive self-direction of the task. The second issue arises due to the relative novelty of the test. At the current time, potential confounders of the test results are not well documented or characterised. These confounders may result from test design (reviewed extensively by Mendl et al., 2009; Bethell, 2015), but also from inherent biological variation between animals, or from the imposition of social structures. This review examines the literature on the impact of these factors on judgement bias task (JBT) interpretation, posing questions for further research.
We consider the influence of sex and reproductive cycles, social status, genetics, and personality on judgement bias test outcomes. We then discuss specific facets of test design that may influence JBT outcomes and provide suggestions for mitigating these impacts. Throughout, we refer to state and trait effects. Traits are defined as behaviours or emotions that are enduring and resistant to change, whereas states are usually temporary and influenced by the immediate situation (Steyer et al., 2015).

Judgement Bias Task (JBT) Types
Published JBTs vary in design in accordance with the learning capability of the animal, the ethological relevance of the required behaviours to the species in question, and the parameters of interest (Mendl et al., 2009; Bethell, 2015). These tests are commonly categorised as either go/no-go or active choice tests, and are described briefly below (see Bethell (2015) for review).

A go/no-go test involves presenting two stimuli to an animal, with one stimulus encouraging the animal to perform an action, such as pressing a lever (go response). Alternatively, the animal learns to not perform an action in response to the second stimulus (no-go response). When the unlearned,

There are well-documented concerns associated with the go/no-go methodology. Depression and anxiety have been associated with a decrease in activity level and a decrease in overall food consumption (Willner et al., 1998; Mendl et al., 2009). Therefore, a reduction in the number of "go" responses could be attributed to this reduced activity, or reduced motivation for food, as opposed to a negative judgement bias (Matheson et al., 2008). In recognition of this concern, judgement bias tasks have been developed where the animal is required to respond actively to both the positive and negative stimuli (Matheson et al., 2008); so-called 'active choice' tests.

The active choice test differs from the go/no-go test in that an active response is required to each of the presented stimuli, and the response required is the same in nature and therefore should be unaffected by motivation (Bethell, 2015). An active response is defined as being a deliberate, quantifiable action

regimes may also be reduced (Vögeli et al., 2014). Therefore, a positive versus less-positive, or neutral, reward paradigm is suggested to optimise judgement bias task interpretation and reliability of results.

Sex Differences
It is well established from human studies that there are trait differences in affective state between males and females. Negative affectivity, or the tendency to experience negative affect, is

with female Japanese pygmy squid (Idiosepius paradoxus) being more likely than males to make pessimistic decisions after non-reward for behaviours that would normally be rewarded (Takeshita and Sato, 2016).
There is some evidence from Barker et al. (2017a) that variability in between-sex JBT comparisons may actually have arisen because of greater intra-individual variability in the female rats. The most likely cause of this is the cyclic release of hormones as part of the oestrous cycle. The psychiatric literature supports this with evidence that some rare disorders appear in synchrony with ovarian cycle phases, for example premenstrual dysphoric disorder (Einstein et al., 2013). Further, more compelling evidence for the link is that the increased prevalence of mood disorders in women only commences at puberty, and subsides after menopause (Kessler et al., 1993).

increased optimism as might be expected given that rats generally show greater sociability and exploration at this time to increase the chance of securing a mate (Frye et al., 2000). Considerations other than a change in affective state may have contributed to this finding. The increased progesterone of dioestrus may reduce spatial memory (Sutcliffe et al., 2007). This has implications for JBTs that utilise spatial reference locations with no other associative cues.

Whilst there is no consensus as to the nature of inherent sex differences in response to the JBT, such differences are likely given the complex nature of the task, which requires prior conditioning, retained memory of the associations and possibly a degree of risk taking. Further investigation of this will improve the reliability of JBTs in animal welfare assessment and assist in tailoring husbandry conditions based on sex if required. Whilst the effects of sex cannot be controlled for, sex should be included as a predictor variable in data analysis, and reproductive cycle effects should be considered when using animals as their own controls.
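As a minimal illustration of what accounting for sex in the analysis can look like, the Python sketch below summarises a treatment effect separately for each sex rather than pooling all animals. The record fields (`treatment`, `sex`, `score`), the treatment labels and the values are all hypothetical and used only for illustration; a full analysis would more usually fit a statistical model with sex, and its interaction with treatment, as predictors.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records):
    """Mean probe score for each (treatment, sex) cell."""
    cells = defaultdict(list)
    for r in records:
        cells[(r["treatment"], r["sex"])].append(r["score"])
    return {cell: mean(scores) for cell, scores in cells.items()}


def treatment_effect_by_sex(records, treated="enriched", control="standard"):
    """Difference in mean score (treated minus control), computed within each sex.

    Raises KeyError if any treatment-by-sex cell has no animals.
    """
    means = mean_by_group(records)
    sexes = sorted({r["sex"] for r in records})
    return {s: means[(treated, s)] - means[(control, s)] for s in sexes}


# Toy records for illustration only; these are not real data.
records = [
    {"treatment": "enriched", "sex": "F", "score": 0.75},
    {"treatment": "enriched", "sex": "F", "score": 0.75},
    {"treatment": "standard", "sex": "F", "score": 0.5},
    {"treatment": "standard", "sex": "F", "score": 0.5},
    {"treatment": "enriched", "sex": "M", "score": 0.5},
    {"treatment": "enriched", "sex": "M", "score": 1.0},
    {"treatment": "standard", "sex": "M", "score": 0.25},
    {"treatment": "standard", "sex": "M", "score": 0.25},
]
print(treatment_effect_by_sex(records))
```

Reporting the within-sex differences alongside any pooled estimate makes it visible when an apparent treatment effect is driven mainly by one sex.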

Early Life Experience
There is extensive evidence of an association between adversity in early life, peri- and postnatally, and risk of later neuropsychiatric disease (Heim et al., 2008; Kessler et al., 2010; Pechtel and Pizzagalli, 2011). This link has been supported by findings from animal research, largely using rodent models, which have elucidated various contributing mechanisms. During early life the brain shows considerable plasticity, and it is postulated that adversity enhances or inhibits maturation of brain regions responsible for emotional function, and the hormonal responses to stress, such as the cortico-limbic

state. Impacts on response inhibition and food motivation may also arise, confounding test interpretation.

Social Status
Most animal species have social structures, which may be highly complex. Some animals are highly social and live in groups throughout life, whilst others may only group periodically (Parreira and Chikhi, 2015). In many group-living animals, dominance hierarchies emerge as a means for

standardised criteria faster than their subordinate counterparts. However, rats did go on to learn the required associations. Training times may need to be extended to account for this, rather than simply excluding animals as non-learners. Only one study investigating the social status of animals found no difference in JBT response as a result. This study involved laying hens performing a spatial judgement bias task. The author proposed that a possible explanation for the null finding was that lower ranking birds, free from usual competition, seized the opportunity to gain the food reward in the JBT (Lindström, 2010).

Given the limited number of studies, it is impossible to say whether breed affects JBT performance. Whilst the limited literature suggests there may be differences, it has also been suggested that differences may be rare, but that studies finding no differences remain unreported due to publication bias (Bushby et al., 2018). A further consideration is that current evidence on breed differences relates to learning of the test, rather than actual differences in affective state. This should be considered in experimental design by allowing longer for training, rather than excluding animals that do not meet test training criteria. A second consideration is that breed may not be the correct factor to be

In a recent canine study the assumed link was supported (Barnard et al., 2018). Dogs that scored higher on sociability, excitability and non-social fear in standardised personality assessment protocols showed reduced latency to probes compared with those with traits for separation-related anxiety and dog-directed fear.
However, some predicted associations between personality types and the JBT did not eventuate, for example an effect of curiosity and playfulness. In red jungle fowl chicks, personality was assessed through traditional behavioural tests, including the novel object/arena and tonic immobility tests. The chicks then performed an active choice JBT, with the finding that less nervous chicks were more optimistic than their nervous counterparts (Jansson, 2015).

valence hypothesis proposes that negative emotions are controlled by the right cerebral hemisphere, and vice versa for positive emotions (Davidson, 1995), although there is evidence from humans disputing this neat separation (Rogers, 2010). Since the laterality link with emotion is focussed on valence of response, the JBT would seem to be the ideal candidate for evaluating direction of effect further. However, whilst studies to date have demonstrated effects of handedness on judgement bias, too few studies have examined this to draw any firm conclusions on specific linkages.

to learn that not every trial is rewarded, reducing the chances of extinction when it comes to the ambiguous probe (Barker et al., 2016). It is also important to note that these unrewarded training trials occurred for a limited period (5 days), and took place after the animals had already demonstrated their ability to perform the task to the researchers' established criteria. This is important, as unrewarded trials in the early stages of the study could hamper animals' learning, and subsequent performance of the JBT.
The possibility of extinction suggests that the judgement bias test has relatively low repeatability, and will be of particular concern when there is repeated exposure to the ambiguous probe

A novel, recently reported method to overcome this concern has been to intentionally train animals to recognise the cues presented in the task and make associations between these cues and the probability of reward or punishment (Lecorps et al., 2019b). The task then no longer relies on ambiguity interpretation. This test was successfully validated in calves expected to be in pain because of

Parameters measured in the tests
Studies using JBTs typically report either latency to perform the required learned

Time taken to achieve a task (latency) has traditionally been used in behavioural tests conducted on animals. There is good evidence that latency to approach a novel object indicates an animal's preference, and therefore decreased latency to approach an object can indicate an increased desire for that object (Bateson and Kacelnik, 1995). However, the use of latency as an outcome measure may suffer from similar confounds to go/no-go test designs (Hernandez et al., 2015). Decreased

Other Methodological Issues
There are a number of other methodological issues to consider in JBT testing. Some of these may be eliminated through choice of an alternative test or careful experimental design. Food is commonly used as a reinforcer in JBTs since animals are generally highly motivated to acquire it. The use of food rewards has been discussed extensively elsewhere (Mendl et al., 2009). However, there are a few salient issues based on the factors presented here. There is evidence to suggest that elevated glucocorticoids, released during a stress response, motivate animals to consume food (Dallman, 2010; Willner et al., 1998). This increases the incentive value of the food rewards used, and animals in a negative state may therefore respond with seemingly optimistic biases (Hernandez et al., 2015). A similar concern exists when considering animals that have suffered early life adversity since they may be underweight, and

Related to laterality, many animals have side biases in motor behaviour or perception of stimuli on opposite sides (Rogers, 2000). To our knowledge this has not been investigated specifically in the context of JBTs, but it is postulated that these biases may influence spatial task performance, especially in prey species. In the authors' experience, there may also be more practical reasons for side biases; in creating a JBT arena for larger animals, usually situated in a room or open space, it is easy to overlook facets of the environment that may be imperceptible to human observers but of importance to animals.

Conclusion
Whilst the JBT is now well established as a tool for measuring valence of emotional response across a range of species, results are sometimes contrary to those expected based on physiological principles and corroborating behavioural data. Some of these unexpected results may arise due to significant inter- and intra-animal variation, as well as facets of test design. This review has proposed that animal-based factors including sex, social status, early life experience, genetics, personality and laterality may all lead to unanticipated responses that do not necessarily relate to current state effects. Test design features, for example the use of food reinforcers when animals have suffered early life adversity, may also exacerbate some of these factors. Whilst test design can be modified to minimise these concerns, animal biology cannot be changed.

It is our proposition that 1) researchers consider undertaking dedicated study to investigate the impacts of these inherent biological factors on various types of JBT across a range of species, and 2) the impact of these factors is considered in experimental design and analysis, by documenting their presence and including them, for example, as covariates in statistical analysis of results. These refinements will increase the reliability of JBTs for animal welfare assessment and as a reference for developing new welfare biomarkers, particularly those for positive affect.