Improved cognitive performances in trace amine-associated receptor 5 (TAAR5) knock-out mice

Trace amine-associated receptors (TAARs) are a family of G protein-coupled receptors present in mammals in the brain and in several peripheral organs. Apart from its olfactory role, TAAR5 is expressed in the major limbic brain areas and regulates brain serotonin functions and emotional behaviors. However, most of its functions remain undiscovered. Given the role of serotonin and limbic regions in some aspects of cognition, we used a temporal decision-making task to unveil a possible role of TAAR5 in cognitive processes. We found that TAAR5 knock-out (KO) mice performed better overall than wild-type (WT) littermates, owing to a reduced number of errors, and displayed a greater rate of improvement at the task. However, task-related parameters, such as timing accuracy and uncertainty, did not change significantly. Overall, we show that TAAR5 modulates specific domains of cognition, highlighting a new role in brain physiology.


Introduction
Trace amine-associated receptors (TAARs) are a family of G protein-coupled receptors (GPCRs) discovered in 2001 [1,2]. TAAR1 is the first member of this family and the most studied one [3,4]. It is expressed in distinct brain regions, and several studies demonstrated its ability to modulate the dopaminergic, serotonergic and glutamatergic systems [5][6][7][8][9][10][11]. Initial studies described the other TAAR members as exclusively olfactory receptors sensing innate odors [12]; however, recent evidence shows that most of them are also expressed in the central nervous system (CNS) as well as in the periphery [13][14][15]. TAAR5 was first discovered in human samples and named putative neurotransmitter receptor (PNR) [16]. Northern blot experiments demonstrated its expression in different brain regions, including the amygdala, caudate nucleus, hippocampus, hypothalamus and thalamus [16]. In mice, in situ hybridization studies confirmed the presence of the TAAR5 transcript in the amygdala, the arcuate nucleus and the ventromedial hypothalamus [17]. By using a TAAR5 knock-out line (TAAR5-KO) expressing the LacZ reporter instead of the TAAR5 gene, a distinct expression was observed not only in the olfactory bulb but also in limbic regions such as the amygdala, the entorhinal cortex, the hippocampus, the nucleus accumbens and the thalamic and hypothalamic nuclei [15].
Olfactory TAAR5 functions are related to its ability to sense trimethylamine (TMA) [12], a chemical present in mouse urine that is attractive to mice and repulsive to rats and humans [18]. TMA is a product of microbial fermentation of choline and is present in human bodily fluids and certain foodstuffs such as spoiled fish. Interestingly, a recent population study found a rare TAAR5 polymorphism that decreases the aversion to fish odor in carriers [19]. In the brain, TAAR5 modulates emotional behavior and the serotonergic system [15]. Indeed, TAAR5-KO mice display an anxiolytic and antidepressant-like phenotype. 5-HT1A receptors are more sensitive to agonists, and the levels of serotonin and its metabolites are altered in the striatum, the hippocampus and the hypothalamus of TAAR5-KO mice [15]. Conversely, the injection of α-NETA, a TAAR5 agonist, causes a sensorimotor gating deficit in rats and increases the mismatch negativity-like response in both rats and mice [20,21], thus indicating psychosis-related cognitive deficits like those seen in humans and experimental animals [22].
Cognition is a higher-order function that depends on the proper functioning of several brain areas and involves many neurotransmitter systems. Among others, serotonin is an important player in decision-making and behavioral flexibility [23,24]. Several psychiatric disorders display problems in cognitive processes. In depression, cognitive impairments are considered a core feature of the symptomatology and often persist after remission of the depressive state [25]. Cognitive domains often affected in depression are executive functions, working memory and processing speed. Anxiety and stress have a strong impact on cognitive functions, affecting performance (e.g. students facing an exam) and influencing decision-making, especially when emotional information is involved [26]. Both antidepressant and anxiolytic drugs may be beneficial in treating these types of cognitive impairments in psychiatric disorders, although results are not always positive, given the complexity and the heterogeneous clinical manifestations of these conditions [27][28][29].
This study aimed to evaluate cognitive functions in TAAR5-KO mice, given the role of this receptor in emotional behavior and in the serotonin system. To this end, we used a timing task, the switch task, to unveil a possible role of TAAR5 in decision-making and behavioural flexibility. We found that TAAR5-KO mice performed better than WT littermates, making fewer errors in the task and showing a greater rate of improvement over days, suggesting that TAAR5-KO mice are more engaged in the task and adapt more flexibly to changes in the environment.

The switch task: a temporal decision-making task in home-cage
To investigate the impact of TAAR5 on higher cognitive functions, we tested mice on a temporal discrimination task, called the switch task [30] (Supplementary Figure 1a). This task requires a fine judgment of two different temporal signals that last in the range of seconds [31]. The training lasted two weeks after an initial week of pre-training in which the animals familiarised themselves with the operant wall in the home-cage. The operant wall consisted of three holes/hoppers (central, left and right), each equipped with an infrared beam to detect nose-poking (NP) activity. Each hole was additionally supplied with a light bulb on the top to signal the start, the duration and the end of every trial [32]. During the first week of training, the mice had to learn to discriminate between two intervals: one of 3 seconds (short signal) and the other of 9 seconds (long signal). Each time interval was associated with one of the two lateral hoppers, and its duration was signalled by the light above the hopper. Short and long signals were randomly intermixed. During the second week, probe trials were introduced with a probability of 20% on each side. Probe trials consisted of a light signal of the same duration, but no reward was delivered in case of a correct response. Probe trials were randomly intermixed with regular trials in order to assess the accuracy of time perception and the perseverance of nose-poking activity around the learned target time.
The main advantage of this task is that it allows the animals' behavior to be monitored continuously in their home-cage (24 hours a day, for several weeks), and it can highlight subtle differences even between mouse substrains [32]. We first checked whether TAAR5-KO mice showed alterations in their circadian parameters compared to WT mice. We did not observe any differences between the groups in terms of the circadian period (Supplemental Figure S1b-e) or activity distribution along the 24-hour cycle (Supplemental Figure S1d-g) in either week of training. A significant effect of time was present in the first week of training (Supplemental Figure S1c), likely due to a learning dynamic that required higher activity at the beginning of the training to obtain enough food. This activity decreased over the training as performance improved, and it increased again in the second week, where it remained constant (Supplemental Figure S1f). The increase during the second week was due to the addition of unrewarded probe trials, which required higher engagement in the task to obtain the same amount of food. The continuous monitoring of mouse performance allowed us to track the evolution of fine cognitive functions over time and over learning. With this task, we expected to identify when and how a change in the cognitive performance of TAAR5-KO mice emerged during training.

TAAR5-KO mice show better performance over training
In a previous study, TAAR5-KO mice were found to show reduced anxiety and an antidepressant-like phenotype [15]. Here, we asked whether this alteration in emotional behavior affected fine cognitive functions and potentially contributed to a better performance in temporal decision-making tasks. Indeed, we observed that KO mice performed better on average throughout the entire training (Figure 1). This improved performance was evident from the first day of training, with KO mice reaching a steady performance by day 2 (Figure 1a). In contrast, WT mice showed a significantly lower performance (Figure 1a-f). We further explored whether this effect was specific to the light or the dark phase. We found that in both phases and for the entire duration of the training, KO mice showed a better performance compared to WT (Figure 1b,g). A more temporally refined analysis revealed that this difference was maintained hourly over the circadian cycle (Figure 1c,h). We had already excluded that the better performance was related to altered activity patterns (hyper- or hypo-activity), as shown in Supplementary Figure S1. Furthermore, this difference could not be related to a different engagement in the task, as both groups performed a comparable number of trials on a daily and hourly basis (Figure 1d, e, i, l).
The better performance of TAAR5-KO mice and the comparable number of trials between the two groups suggest that the difference should lie in the number of rewarded trials (note that performance also includes probe trials, which are correct but unrewarded). However, the absolute number of rewarded trials did not differ between the two groups along the circadian cycle (Supplementary Figure S2a-f), suggesting that TAAR5-KO mice were proportionally more efficient at the task even though the overall number of trials and rewards received was similar between the two groups. Furthermore, we found proportionally more probe trials for TAAR5-KO mice compared to their littermate controls (Supplementary Figure S2g,h,i), confirming that the better performance of TAAR5-KO mice was due to a combination of successfully rewarded trials and accurately performed trials. The efficiency of TAAR5-KO mice could depend on two factors. One is that they make fewer mistakes and potentially learn faster, suggesting that they are more engaged in the task and more attentive/focused. The other factor is task-related and assumes that TAAR5-KO mice are more accurate in temporal decision-making tasks. We explored both possibilities here.
[Figure 1 caption, beginning missing] [...].88, no effect of group p=0.09, F=2.94). e. Total number of trials per hour along the circadian cycle grouped by intervals of 3 hours (2-way ANOVA, effect of time p<0.005, F=103.26, no effect of group p=0.1, F=2.69). f. Similar to panel a for week 2. 2-way ANOVA, no effect of time p=0.9, F=0.19, effect of group p<0.005, F=11.54. g. Similar to panel b for week 2. 2-way ANOVA, light phase: effect of time p<0.005, F=9.83, no effect of group p=0.06, F=3.5; dark phase: no effect of time p=0.8, F=0.29, effect of group p<0.005, F=13.7. h. Similar to panel c for week 2. 2-way ANOVA, effect of time p<0.005, F=34.42, effect of group p=0.008, F=7.32. i. Similar to panel d for week 2. 2-way ANOVA, no effect of time p=0.8, F=0.29, no effect of group p=0.07, F=3.22. l. Similar to panel e for week 2. 2-way ANOVA, effect of time p<0.005, F=71.4, no effect of group p=0.08, F=3.1.

TAAR5-KO mice make fewer mistakes and have higher rate of improvement
We hypothesised that TAAR5-KO mice were performing better due to a general, non-task-related cognitive ability to learn better and remain engaged in the task for a longer time. Therefore, we first evaluated the error trials to see where these mice were performing better. Then, we examined the learning phase and rate of improvement to investigate whether they were learning earlier or were consistently improving more than WT.
Indeed, TAAR5-KO mice made fewer errors over both training days and the circadian cycle (Figure 2, Supplementary Figure S3a, b, e, f). To further explore the type of errors most frequently made by WT mice, we separated the analysis into time-out and timing error trials. Time-out trials ended without any response, as the maximum time window allowed for nose-poking in one of the locations elapsed before a choice was made. Timing error trials are instead those in which the animal made the incorrect choice. We found fewer time-out trials over the circadian cycle (Figure 2b, f) and over the two weeks of training (Figure 2a, e) for TAAR5-KO mice, suggesting that these mice are overall more engaged in the task. Indeed, KO mice completed their trials by making choices much more often than WT, and this was also true during the light phase, when animals are sleepier. Similarly, the timing error trials were significantly different between groups over days of training and over the circadian cycle (Figure 2c, d, g, h, Supplementary Figure S3d, h). Interestingly, we found TAAR5-KO mice to be more accurate (fewer timing error trials) than WT, especially during the dark phase, when animals are more active (Figure 2d, h).
To further investigate when and how the improvement in performance emerged in TAAR5-KO mice, we analyzed learning during the first week of training to identify its occurrence. Multiple factors could determine the higher performance over time: earlier and better learning, a better rate of improvement over time, or a combination of both. We defined learning as the time point (or trial) at which the cumulative number of correct trials (Supplementary Figure S3i) showed the maximum inflexion (Figure 3a, see Methods). We found that both groups learned early in training: KO mice within the first day and all WT mice within the second day (Figure 3b). They also learned at the beginning of the dark phase, when mice become more active (Figure 3b, right panel). The learning trial and time of learning were not significantly different between groups. Neither was the learning rate, defined as the change in the slopes of the regressions before and after the learning point (Figure 3c, see Methods), despite a marginal trend suggesting better improvement by TAAR5-KO mice (Kolmogorov-Smirnov test, p=0.07).
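As a rough illustration of the learning-point idea described above, the sketch below locates a bend in a cumulative correct-trial curve and reports the change in regression slope across it. The exact criterion used in the study is given in its Methods, which are not reproduced here, so the two-segment least-squares fit, the function names `fit_line` and `learning_point`, and the `min_seg` parameter are all assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept, sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def learning_point(correct_flags, min_seg=3):
    """correct_flags: one 0/1 value per trial, in chronological order.

    Assumed criterion (not taken from the paper): the learning point is the
    split that minimises the total squared error of two separate regression
    lines fitted to the cumulative number of correct trials.
    Returns (trial index of the split, slope after minus slope before).
    """
    if len(correct_flags) < 2 * min_seg:
        raise ValueError("not enough trials for a two-segment fit")
    cum, total = [], 0
    for flag in correct_flags:
        total += flag
        cum.append(total)
    xs = list(range(len(cum)))
    best = None
    for k in range(min_seg, len(cum) - min_seg + 1):
        s1, _, e1 = fit_line(xs[:k], cum[:k])
        s2, _, e2 = fit_line(xs[k:], cum[k:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, k, s2 - s1)
    return best[1], best[2]


# Toy session: ten incorrect trials followed by ten correct ones.
split, slope_change = learning_point([0] * 10 + [1] * 10)
```

In this toy session the slope of the cumulative curve goes from 0 to 1 correct trial per trial, so the slope change is 1; real data would be noisier and the split less sharply defined.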
Since TAAR5-KO mice learned as well as WT mice and at the same time, we hypothesised that TAAR5-KO mice might show a higher rate of improvement than WT over time. Therefore, we computed the correct rate (see Methods) per hour over the training (Supplementary Figure S3j). We observed a clear circadian effect resulting in a steeper correct rate during the dark phases compared to the light phases. We compared the average correct rates around the time the light is turned off (hour 0 in Figure 3d) over days, and we observed an increasingly steep correct rate for both groups over days. However, TAAR5-KO mice approached the optimal correct rate (the diagonal dashed line, Figure 3d) faster and earlier. To further compare the distributions of correct rates across subjects and over time, we quantified the slope of the cumulative correct rate curve after the light switch for each subject on each day of training. We found a significant effect of group and time, suggesting that TAAR5-KO mice reached a better performance sooner and that their rate of improvement was also higher. These results support the hypothesis that the performance improvement is not task-related but instead is due to higher engagement (fewer time-out trials) and better cognitive flexibility (higher rate of improvement). However, the significantly lower timing error rate compared to the time-out rate (Supplementary Figure S3c,d,g,h) did not exclude the possibility that TAAR5-KO mice might show significant alterations in timing parameters. We explored this option below.

Interval-timing is preserved in TAAR5-KO mice
To fully explore the efficiency of KO mice in this behavioural paradigm, we checked the accuracy in task-related parameters. In particular, probe trials were introduced during the second week of training to investigate the persistence of nose-poke activity around the target time when the reward was not delivered. Two patterns of poking behaviour are possible. Prolonged poking might suggest perseveration in reward-seeking, whereas reduced poking activity suggests inattention or inactivity, the latter of which we excluded (Supplementary Figure S1). To explore the former, we evaluated the distribution of nose-pokes during probe trials for short (Figure 4a,b) and long duration (Figure 4b) trials. No difference was found in the distributions of nose-poke activity between TAAR5-KO and WT mice (Figure 4c) for either short or long probes, suggesting that TAAR5-KO mice have intact interval-timing estimation.
Figure 4. Preserved interval timing in TAAR5-KO mice. a. Example distribution of NP during short probe trials. Each row is a probe; the solid grey line identifies the interval in which the animal was poking into the short hopper location. Green and orange vertical lines identify the short and long time, respectively. b. Normalised NP distribution for short and long probe trials for the example subject in a. Green and orange vertical dashed lines identify the short and long time, respectively. c. Cumulative distribution of NP activity for each subject and each group. Solid lines identify the group average. Each lighter line is a single subject. Blue curves are for WT mice and red curves for TAAR5-KO mice, as before.

TAAR5-KO mice have optimal temporal accuracy
Relevant task-related parameters of the switch task include estimates of temporal accuracy and uncertainty (see Methods). Typically, control animals develop an optimal strategy to solve this task: they move to the short location soon after self-initiating the trial and wait until the short time elapses. If the light signal is short, the animal pokes in the short location; otherwise, it switches to the long location and waits for the long duration to elapse. The time of the switch from the short to the long location is called the 'switch latency'. By knowing the distribution of the switch latencies, we can estimate how close the behaviour is to an optimal one ([33] and see Methods). For each subject, we assessed the switch latency distribution parameters from the fitted normal distribution [41]. The average switch latency reflects the subject's target switch latency, also called timing accuracy (μ), while the dispersion around the mean (the coefficient of variation, CV = σ/μ) reflects the endogenous timing uncertainty. We estimated the accuracy and uncertainty of every subject during the first (Figure 5 top panels) and second (Figure 5 bottom panels) week of training. Both groups performed nearly optimally (Figure 5a, d), and no difference was found in the distributions of the parameters between TAAR5-KO and WT mice (Figure 5b, c, e, f). These results suggest that the performance of TAAR5-KO mice is not related to the specific task demands but instead is a general feature of this mouse line, which is likely to show consistent performance improvement across a wide range of behavioural and cognitive tasks.
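The two timing parameters described above, the mean switch latency (timing accuracy) and the coefficient of variation (timing uncertainty), can be illustrated with a minimal sketch. It assumes switch latencies are available as a plain list of seconds and uses Python's `statistics.NormalDist.from_samples`, which estimates the mean and the sample standard deviation; the fitting procedure actually used in the study may differ, and the function name `timing_parameters` and the example values are hypothetical.

```python
from statistics import NormalDist


def timing_parameters(switch_latencies_s):
    """Return (mean, standard deviation, coefficient of variation).

    The mean stands for timing accuracy and the CV (sd / mean) for
    timing uncertainty, as in the text. The normal fit here is the simple
    moment estimate provided by the standard library.
    """
    fit = NormalDist.from_samples(switch_latencies_s)
    mu, sigma = fit.mean, fit.stdev
    return mu, sigma, sigma / mu


# Hypothetical switch latencies (seconds) for one subject.
mu, sigma, cv = timing_parameters([5.0, 6.0, 7.0])
```

For these three made-up latencies the mean is 6 s and the sample standard deviation is 1 s, giving a CV of 1/6.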

Discussion
TAAR5 expression in the CNS was demonstrated in recent studies, and some of its putative functions in brain physiology were characterised [13][14][15]34]. TAAR5 is involved in regulating emotional behavior, and TAAR5-KO mice show an anxiolytic and antidepressant-like phenotype [15]. In this study, we evaluated the role of TAAR5 in cognitive processes and showed that TAAR5-KO mice performed better, making fewer errors and displaying a higher rate of improvement in performance. In particular, we found that TAAR5-KO mice were more engaged in the task (fewer time-out trials), especially during the light phase, when mice are typically sleepier, and were more accurate (fewer timing errors), especially during the dark phase. This improvement in performance was not due to earlier learning; instead, we found a consistently higher rate of improvement throughout the training.
Apart from its role in olfaction, the understanding of TAAR5 functions in brain physiology is still in its infancy. Because of its low expression, initial reports did not find TAAR5 outside the olfactory epithelium [12]. However, independent reports using different techniques show a discrete TAAR5 expression in several brain areas, demonstrating its presence in limbic regions such as the amygdala, entorhinal cortex, nucleus accumbens, and thalamic and hypothalamic nuclei [13][14][15]17]. Recently, analysis of transcriptomic datasets derived from human samples revealed a low but ubiquitous expression of TAAR5 in limbic and cortical areas [13,14]. A similar situation was initially encountered in TAAR1 studies, since TAAR1 was found at low levels in discrete brain regions [3,35]. However, TAAR1-KO mice display a clear phenotype, and many reports demonstrated its relevant role in dopamine, serotonin and glutamate homeostasis [8,9]. TAAR1 selective agonists are now in late-stage clinical trials as potential antipsychotic agents [36].
TAAR5-KO mice did not show gross abnormalities or an overt neurological phenotype [15]. However, a series of behavioral tests assessing emotional behavior highlighted that TAAR5-KO mice are less anxious and show an antidepressant-like phenotype compared to WT littermates [15]. The levels of serotonin and its metabolites are also altered in this mouse line, and the hypothermic effect of the 5-HT1A agonist 8-OH-DPAT is increased. Another report shows that striatal dopamine levels and the number of dopamine neurons in the substantia nigra are increased and, interestingly, that neurogenesis in the subventricular and subgranular zones is increased in mutants [37]. Recently, an altered sensorimotor function in TAAR5-KO mice was also demonstrated (31).
To unveil possible roles of TAAR5 in cognition, we used a home-cage behavioural paradigm that tested temporal decision-making in mice. Home-cage behavioral testing makes it possible to collect large amounts of data and to reveal subtle differences between substrains of animals [32]. In this study, this paradigm highlighted several interesting aspects in the analysis of the cumulative correct rate (Figure 3d) over the days of training at the time the light switched off. First, both genotypes show a clear step-change in the average hourly activity between the light and dark phases. Second, before the light switches off (before 0 in Figure 3d), KO and WT mice have a 50% or lower probability of success, respectively, over the days of training, with no effect of time. This phenomenon clearly suggests a sleepiness effect unrelated to the level of training. Third, as soon as the light switches off (after trial 0 in Figure 3d), we can observe a clear improvement in performance, which increases over the training days. Finally, we showed that TAAR5-KO mice had, on average, a better rate of improvement across all training days (Figure 3e).
In this test, animals have to learn the task, in particular to discriminate between two time intervals, in order to obtain a food pellet. Both WT and TAAR5-KO mice learned the task quickly (Figure 3). The speed of learning is due to three factors: the continuous exposure to the task (24/7), which forces the animal to work to obtain food; the home-cage environment, which removes the stress caused by moving from one cage to another; and the possibility for mice to engage in the task at their own rhythm, which typically follows a circadian oscillation across the 24 hours.
Interestingly, KO mice performed better in the test, indicating an increased accuracy in the decision-making process visible from the lower timing error trials (Figure 2). The better performance was also evident in the light phase, a period of the day where animals are usually sleepier and make more errors. Indeed, WT mice showed higher time-out trials. These trials occur when the animal self-initiates the trial but is not keen to complete it. The elevated number of time-out trials during the light phase suggests that these trials do not happen due to a momentary inattention to the task but are more likely due to disengagement and sleepiness.
Another interesting distinction between time-out trials and timing error trials is their temporal evolution. In particular, the latter reflects the learning dynamic; the former is constant over the entire training. Figure 2c shows the decrease in timing error trials over the first week of training, which is then maintained constant throughout the second week (Figure 2g). This dynamic nicely resembles the improvement and maintenance of the performance seen in Figure 1a, f. On the contrary, time-out trials remain constant over the entire training, supporting our previous claim that these trials are a reflection of sleepiness.
If we look at task-related parameters, both groups of mice displayed a correct interval-timing estimation and an optimal combination of temporal accuracy and uncertainty. Overall, our results suggest that TAAR5-KO mice are better learners, more engaged in the task, and adapt more flexibly to changes in the environment. This overall better performance is not related to the specific task demand but may be a general feature of these mice. To confirm these data, more behavioral tests specific to each cognitive domain are needed to understand the precise role of TAAR5 in cognition. It should be noted that these subtle differences may be difficult to unveil using standard tests done during a few hours in the daylight phase and that cognitive assays usually reveal deficits more easily than a pro-cognitive effect in WT animals. Another option would be to inject a TAAR5 antagonist to mimic these actions in vivo, similarly to what has been done in TAAR1 studies. Although the first TAAR5 antagonists were described some years ago [38], no other selective and potent compounds have been reported so far. In principle, a selective TAAR5 antagonist may be a potential new drug with a new mechanism of action and several therapeutic indications. Apart from the endogenous agonist TMA, which has a clear role in olfaction, only one other putative agonist has been found and tested in animal models, namely α-NETA. Interestingly, in mice and rats, this compound was able to induce psychotic-like behavioral abnormalities, including features related to the cognitive deficits present in psychotic patients [20,21].
How TAAR5 influences cognitive domains is still under investigation. Serotonin, a neurotransmitter whose levels are altered in TAAR5-KO mice, plays an interesting role in cognition [23,24]. Although the serotonergic system is very complex and serotonergic receptors are a large family of GPCRs with a myriad of functions, there is a general consensus that serotonin is important in decision making. Moreover, compounds that increase serotonergic transmission, such as antidepressants, are used in neuropsychiatric diseases where cognitive impairments, and in particular impaired decision making, are present [23]. 5-HT1A agonists, especially ones acting mostly on post-synaptic receptors, increase behavioral flexibility and facilitate performance [39]. Similarly, anxiolytics may be beneficial in decision making, particularly when the decision is influenced by an emotional component [26]. An anxiolytic effect may also facilitate flexible choice behavior, increasing the speed at which the optimal strategy is found. A recent study showed that TAAR5-KO mice have increased adult neurogenesis in both the subventricular zone and the subgranular zone [37]. Adult neurogenesis is linked to many aspects of brain physiology, including cognition and the behavioral effects of stress and antidepressants [40]. In particular, several pieces of evidence suggest a role of adult neurogenesis in cognitive flexibility and that this action may reduce anxiety and depressive-like behavior [40].
In conclusion, we showed that TAAR5 might be considered as a new player in cognitive processes and a potential new drug target for various neuropsychiatric disorders involving deficiencies in emotional states and cognition.

Mice and Husbandry
Groups of 8-12-week-old male mice were studied (TAAR5-KO and their WT littermates). Each group included 6 mice, which were generated as described previously [15]. All mice were group-housed for two weeks before the experiment with food and water ad libitum under a 12:12 light-dark cycle (lights on from 7:00 to 19:00). The week before the experiment, 20 mg food pellets were gradually mixed into regular food for habituation. Then mice were singly housed in type III TSE PhenoMaster cages (TSE Systems, Bad Homburg, Germany) and subjected to the experimental phases. All procedures involving animals and their care were carried out in accordance with the guidelines established by the European Community Council (Directive 2010/63/EU of September 22, 2010) and were approved by the Italian Ministry of Health. During the experimental phases, animal wellbeing was monitored daily. If the weight loss was between 10% and 20% (referring to the free-feeding weight taken on day one), one or two additional standard food pellets (approx. 1.3 g) were given, respectively. If weight loss exceeded 20% of the free-feeding weight, animals had to be culled. All the animals in this study completed the experiment.

Apparatus and Procedure
In this study, we used an automated operant wall (Cognition & Welfare, COWE), developed by TSE Systems (Germany) based on its PhenoMaster System. The device consists of three holes/hoppers over a metal wall inserted in type III cages. Each hole is equipped with infrared beams that detect nose poking. An LED with 4 mcd (millicandela) of luminous intensity is mounted in each hopper to serve as a stimulus. The two lateral hoppers are attached to independent hidden feeders that dispense 20 mg dustless precision pellets (BioServ, USA). The sensors (LED and infrared beams) and the actuator (feeder) were remotely controlled via computer to design trial-by-trial protocols for individual and/or group cages. Each COWE cage (n=12) was maintained in an individually ventilated, sound-proof, light-controlled cubicle. The house light (approximately 100-110 lux) was timed on a 12:12 light-dark schedule as described above.

Experimental Design
The whole experiment consisted of two experimental phases following a pre-training phase. The experiment included 12 animals. During the pre-training phase, all mice familiarised themselves with the COWE cage to obtain food pellets from both lateral hoppers. This pre-training phase consisted of self-initiating trials by nose-poking in the central hopper triggering the switch-on of the lights in the three hoppers. Nose-poking in the lateral hoppers gave access to food rewards. No temporal limitations were imposed during this phase, and the goal of the pre-training phase was to develop the association between hopper location and food pellet. The trial ended when the animal received a pellet from each side and concomitantly lights switched off. Each trial was followed by an intertrial interval (ITI). The ITI was set as a 30 s fixed delay plus a random interval drawn from a geometric distribution with a mean of 60 s. The mice could not initiate a new trial during the ITI.
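The ITI rule above (a 30 s fixed delay plus a geometrically distributed random interval with a mean of 60 s) can be sketched as follows. The sketch assumes the geometric variable counts whole seconds on the support {1, 2, ...} with success probability p = 1/60, which gives the stated mean; the actual step size and sampling routine used by the COWE software are not specified in the text, and the names below are hypothetical.

```python
import math
import random

FIXED_DELAY_S = 30.0   # fixed part of the ITI (from the text)
MEAN_RANDOM_S = 60.0   # mean of the geometric part (from the text)


def sample_iti(rng):
    """Draw one intertrial interval in seconds.

    Assumption: the random part is a geometric variable on {1, 2, ...}
    measured in seconds, so its mean is 1 / p with p = 1 / 60.
    """
    p = 1.0 / MEAN_RANDOM_S
    u = rng.random()  # uniform in [0, 1)
    # Inverse-transform sampling of the geometric distribution.
    k = max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
    return FIXED_DELAY_S + k


rng = random.Random(0)  # seeded for reproducibility of the example
itis = [sample_iti(rng) for _ in range(20000)]
mean_iti = sum(itis) / len(itis)
```

Under these assumptions the expected ITI is about 90 s, and no ITI is shorter than 31 s.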
After four days of pre-training, all mice were introduced to the two consecutive experimental phases, each lasting about a week. During the first week, mice were trained in the switch task [32]. In this task, animals had to discriminate the duration of two light signals (i.e., short- vs. long-latency signals, called here short and long trials) to obtain a food pellet in a trial. The duration of the light signal determined the location of the pellet availability. Short (TS) and long (TL) trials were randomly intermixed with the same probability (P(TS)=P(TL)=0.5).
The left hopper was associated with the short trials, whereas the right hopper was associated with the long trials. The first nose poke after the short or long duration at the corresponding lateral hopper was reinforced with a food pellet, after which the trial was declared finished and the ITI started. Any nose-poke at the long location after a short signal, or vice versa, was not reinforced and triggered the start of the ITI. These trials were classified as timing error trials. If the animal self-initiated the trial but did not engage in the task by not poking in the hoppers, the trial ended after 30 seconds with no reward. These trials were classified as time-out trials. Short-latency signals lasted 3 seconds and long-latency signals lasted 9 seconds.
In the second week of training, we introduced 20% probe trials for both short-latency and long-latency trials. This means that short probes (Sp) and long probes (Lp) were introduced with the same conditional probabilities P(Sp|TS) = P(Lp|TL) = 0.2. During probe trials, the signal was presented as for regular trials but the correct responses of the animals were not reinforced. Probe trials lasted 30 seconds each, followed by an ITI as described earlier in this section. This manipulation allowed us to further characterise mouse timed behaviour in its full complexity.
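The trial schedule described in this section can be summarised with a short Python sketch. It only uses the probabilities, durations and hopper assignments given in the text; the function name `draw_trial` and the dictionary layout are hypothetical and chosen for illustration.

```python
import random

SHORT_S, LONG_S = 3.0, 9.0  # signal durations (from the text)
P_SHORT = 0.5               # P(TS) = P(TL) = 0.5
P_PROBE = 0.2               # P(Sp|TS) = P(Lp|TL) = 0.2, second week only

def draw_trial(rng: random.Random, week: int) -> dict:
    """Draw one trial for the given training week (1 or 2)."""
    is_short = rng.random() < P_SHORT
    # Probe trials exist only in the second week; correct responses
    # on probe trials are not reinforced.
    is_probe = week == 2 and rng.random() < P_PROBE
    return {
        "type": "short" if is_short else "long",
        "signal_s": SHORT_S if is_short else LONG_S,
        "correct_side": "left" if is_short else "right",
        "probe": is_probe,
    }
```

For example, `draw_trial(random.Random(1), week=2)` returns one trial descriptor; over many draws about half of the trials are short and about one fifth of the second-week trials are probes.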

Data Analysis
We recorded all events in the COWE cages with millisecond resolution, and each event was timestamped. Each timestamp was paired with an event code that identifies a specific type of event (i.e., light on/off, nose in/out, etc.). This strategy allowed us to standardise specific codes for data analysis across laboratories [32]. Every analysis was performed using MATLAB (www.mathworks.it) software.
All the analyses were performed on the first and second weeks of training separately to highlight the impact of probe trials. The analyses along the circadian cycles were computed for each subject by quantifying the parameter (e.g. performance, time-out trials, etc.) for each hour and then averaging across three-hour intervals. From the resulting dataset, we computed a group average. The performance for each mouse was computed as the count of correct trials over the total number of trials performed. Correct trials included all rewarded trials during the first week of training; during the second week of training, correct trials also included probe trials.
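The performance measure and the three-hour averaging described above can be sketched in Python as follows. The input layout (a list of hour-of-day and outcome pairs) and the handling of hours without trials (they are skipped) are assumptions, as the text does not specify them; the original analysis was carried out in MATLAB.

```python
def performance(outcomes):
    """Fraction of correct trials; NaN if no trial was performed."""
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")

def three_hour_profile(trials):
    """trials: list of (hour_of_day in 0..23, is_correct).

    Returns eight values: hourly performance averaged over
    consecutive three-hour intervals.
    """
    by_hour = {h: [] for h in range(24)}
    for hour, is_correct in trials:
        by_hour[hour].append(is_correct)
    hourly = [performance(by_hour[h]) for h in range(24)]
    profile = []
    for start in range(0, 24, 3):
        # ASSUMPTION: hours without trials (NaN) are left out of the average.
        vals = [v for v in hourly[start:start + 3] if v == v]
        profile.append(sum(vals) / len(vals) if vals else float("nan"))
    return profile
```

The group average is then obtained by averaging the per-subject profiles interval by interval.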
We computed the cumulative number of correct trials to identify the learning point for each subject. This curve is the cumulative sum of correct (+1) and error (-1) trials over time. To identify the learning trial, we fitted a piecewise linear regression model to each trial allowing a minimum of five trials from the beginning of training. The model was applied to a moving window of length twenty trials, moving every five trials. The model comprised a robust regression line fitted to the cumulative curve before each trial and another line fitted to the cumulative curve after the trial. The learning trial was then identified as the first trial having the maximum increase in slope between the fitted regression before and after the trial. The day and time of learning were reconstructed from the timestamps of the identified learning trial. The learning rate is the difference between the slope of the regression line before and after the learning trial.
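A minimal Python sketch of the learning-point search is given below. It is one possible reading of the procedure, not the original MATLAB code: ordinary least squares is used in place of the robust regression mentioned in the text, the twenty-trial window is applied on each side of every candidate trial, and candidates are evaluated every five trials. The function names are hypothetical.

```python
from itertools import accumulate

def ols_slope(ys):
    """Ordinary least-squares slope of ys against 0..len(ys)-1."""
    n = len(ys)
    mx = (n - 1) / 2
    my = sum(ys) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    sxy = sum((i - mx) * (y - my) for i, y in enumerate(ys))
    return sxy / sxx

def learning_trial(correct, window=20, step=5, min_trials=5):
    """Return (trial index, slope increase) of the learning point.

    correct: list of booleans, one per trial, in chronological order.
    """
    # Cumulative curve: +1 for a correct trial, -1 for an error.
    curve = list(accumulate(1 if c else -1 for c in correct))
    best_k, best_gain = None, float("-inf")
    for k in range(min_trials, len(curve) - min_trials + 1, step):
        before = curve[max(0, k - window):k]
        after = curve[k:k + window]
        gain = ols_slope(after) - ols_slope(before)
        if gain > best_gain:  # strict '>' keeps the first maximum
            best_k, best_gain = k, gain
    return best_k, best_gain
```

The second returned value corresponds to the learning rate as defined above, i.e. the difference between the slopes after and before the learning trial.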
To quantify the rate of improvement across days during the transition between light and dark phase, we quantified the cumulative correct rate curve for each subject. This curve looked at an interval of 22 hours around the time of the switch-off of lights (10 hours before and 12 hours after the switch-off of the house light). For each hour, each subject and each day, we computed the rate of correct trials.
To assess the circadian rhythmicity, we quantified the circadian period with a nonlinear curve-fitting to the number of nose-pokes over the recording hour. The periodic function that was fit to these data is defined by equation (1).
where T = (t1, ..., tn) are the time points (15 min) during the recording and K = (A, P, Φ) are, respectively, the amplitude, period, and phase of the sinusoidal function. Best-fit coefficients (A, P, Φ) were determined by minimising the mean-square difference between F and the data. The fit was repeated for multiple values of the parameter P (from 21 to 27 hours every 0.5 hours). The goodness of the fit was quantified by the Pearson Correlation Coefficient (CC) between the data of each subject and the corresponding fit function F. The best fit for P converged to the Subjective Period (Supplementary Figure 1b) for every initialisation of the parameter P between 22 and 26 hours for every subject.
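The period scan can be illustrated with the following Python sketch. Several points are assumptions: the sinusoid is taken to be A sin(2πt/P + Φ) fitted to mean-centred counts, and for each fixed P the amplitude and phase are obtained by linear least squares rather than by the nonlinear curve fitting used in the study. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_sinusoid(times_h, counts, period_h):
    """Fit a*sin(wt) + b*cos(wt) to mean-centred counts at a fixed period.

    Returns (amplitude, phase, CC), with a*sin + b*cos = A*sin(wt + phase).
    """
    w = 2 * math.pi / period_h
    mean = sum(counts) / len(counts)
    y = [c - mean for c in counts]
    s = [math.sin(w * t) for t in times_h]
    c = [math.cos(w * t) for t in times_h]
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(u * v for u, v in zip(s, c))
    sy = sum(u * v for u, v in zip(s, y))
    cy = sum(u * v for u, v in zip(c, y))
    det = ss * cc - sc * sc  # 2x2 normal equations
    a = (sy * cc - cy * sc) / det
    b = (cy * ss - sy * sc) / det
    fit = [a * u + b * v for u, v in zip(s, c)]
    return math.hypot(a, b), math.atan2(b, a), pearson(y, fit)

def best_period(times_h, counts, lo=21.0, hi=27.0, step=0.5):
    """Scan periods from lo to hi and return (P, A, phase, CC) with highest CC."""
    n = int(round((hi - lo) / step)) + 1
    fits = [(lo + i * step,) + fit_sinusoid(times_h, counts, lo + i * step)
            for i in range(n)]
    return max(fits, key=lambda f: f[3])
```

Here `times_h` would hold the 15-min time points expressed in hours and `counts` the number of nose-pokes in each bin.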
Probe trials were introduced to assess how timing behaviour changed when correct responses were reinforced probabilistically (note that these probabilities were equal between the two trial types). The raster plot of NP activity (Figure 4a) for each subject and each trial type was analysed. We showed the empirical normalised NP distribution for an example subject (Figure 4b) and compared the cumulative distribution of NP between groups (Figure 4c).
We assessed the temporal decision-making performance by analysing the switch latency or accuracy (Figure 5). The switch latency is defined only for long trials and is the trial time at which the mouse leaves the short-latency location for the long-latency location [41]. For each subject, the distribution of the switch latencies was fit with a Gaussian function. From each Gaussian fit, we estimated the mean (μ) and standard deviation (σ). We considered μ as the accuracy in timing estimation and the coefficient of variation (CV = σ/μ), which expresses the dispersion of switch latencies relative to the mean, as the timing uncertainty [42]. The dependence of optimal target-switch latency on the level of timing uncertainty was formulated in [41] and then expanded in [32]. This formulation, called the Normalized Expected Gain function (equation 2), allows the evaluation of timed behaviours within the framework of optimality based on experienced probabilistic reinforcement, endogenous timing uncertainty, and the payoff matrix.
The optimal target switch latency for a given mouse was defined as the target switch latency that maximises the output of equation 2, where Φ is the normal cumulative distribution function.
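The switch-latency summary described above (Gaussian mean as timing accuracy, coefficient of variation as timing uncertainty) can be sketched in Python as follows. As a simplification, the Gaussian parameters are estimated by the sample mean and sample standard deviation rather than by curve fitting, and the coefficient of variation follows its standard definition, σ/μ. The function name `timing_summary` is hypothetical.

```python
import statistics

def timing_summary(switch_latencies_s):
    """Summarise switch latencies (seconds, long trials only)."""
    # ASSUMPTION: moment estimates stand in for the Gaussian fit.
    mu = statistics.fmean(switch_latencies_s)
    sigma = statistics.stdev(switch_latencies_s)
    return {"mu": mu, "sigma": sigma, "cv": sigma / mu}
```

For instance, latencies of 5, 6 and 7 s give μ = 6 s, σ = 1 s and CV = 1/6.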

Statistical Analysis
Data were analysed with a one-way or repeated-measures ANOVA test using MATLAB. The significance level and the F-statistic, which is the ratio of the mean squares, are reported for every test in the Figure legends.
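For reference, the F-statistic of a one-way ANOVA can be computed as in the following Python sketch, which follows the textbook definition (between-group mean square over within-group mean square). It is an illustration only; the study used MATLAB, and the repeated-measures case is not covered here. The function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Sum of squares between groups and within groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With two groups, such as KO and WT values of a single parameter, `df_between` equals 1 and the F-statistic equals the square of the corresponding two-sample t-statistic.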