Choice Under Uncertainty and Ambiguity: An Empirical Inquiry of a Behavioral Economic Experiment Applied to COVID-19

Results from a behavioral economic laboratory experiment are used to enhance our understanding of public health decisions made during the COVID-19 pandemic. Identifying, in controlled experiments, systematic biases relative to optimal decision theory could help inform public policy design for future public health crises. Both the laboratory decisions and the shelter-in-place decisions made during COVID-19 included elements of risk, uncertainty and ambiguity. The laboratory findings show that individuals adopt different decision rules depending on both their personal attributes and the context and environment in which the decision task is conducted. Key observations to consider in the COVID-19 decision environment include the importance of past experience, the ability to understand and calculate the odds of each action, the size of and differences between economic payoffs given the choice, the value placed on information received, and how past statistically independent outcomes influence future decisions. The academic space encompassing both public health and behavioral economics is small, yet important, particularly in the current crisis. Continued research in this area would aim to develop a more representative model of decision-making processes, particularly during a crisis, to enhance future public health policy design.


Introduction
A pandemic is classified as a Public Health Emergency of International Concern (PHEIC). The International Health Regulations define a PHEIC as "an extraordinary event which is determined to constitute a public health risk to other States through the international spread of disease and to potentially require a coordinated international response" [1]. This designation for COVID-19 signaled a need for immediate action worldwide. The statistical probabilities provided on the spread of the disease and its consequences, in terms of health care resources and deaths, were informative but imperfect. This uncertainty (i.e., a lack of evidence or confidence in the information) and ambiguity (i.e., an inability to resolve the issue from past experience, as the situation was unprecedented in living memory) led government officials to make various decisions with varying economic consequences. Fontanet & Cauchemez, [2] aspects of the COVID-19 pandemic [17,32,20,23]. During COVID-19, more research has been published in this area; however, much work is still needed to address the systematic cognitive biases that could be leveraged to improve public policy effectiveness [32,33].
The pandemic raises numerous questions for researchers, and these vary with the research lens applied.
This study investigates the following research questions: What systematic biases are observed in a closed binary decision environment with uncertainty and ambiguity? How might these systematic biases influence the binary choice decisions of public officials and politicians in the current pandemic environment? To answer these questions, we: 1. conduct an empirical inquiry of observed data from a closed, binary-choice laboratory experiment to identify systematic decision inconsistencies in environments of uncertainty and ambiguity; 2. corroborate the findings with other studies; 3. discuss how the results from this simple environment may apply to more complex decision environments such as the COVID-19 crisis; and 4. set out the limitations of our study and potential areas for future research. The overall objective of this investigation is to provide insights that could shape future experimental designs layering in additional situational contexts not considered here (e.g., multidimensional decision environments, time restrictions). The ultimate goal of this research is to create a perfectly reproducible, randomized field study to better understand how crisis decisions could be nudged towards those providing optimal benefits for society.

Empirical Inquiry Rationale
The following empirical inquiry of a closed laboratory experiment involving elements of uncertainty and ambiguity demonstrates systematic inconsistencies in subjects' decision-making relative to an optimal statistical inferencing rule. The experiment examined subjects' behavior in an individual decision task involving a choice between two actions. Subjects received additional information before reaching their final decision (terminal choice). The decision environment involved message signals that were informative but imperfect. Subjects were presented with incentives meant to capture features of environments in which in-field decisions are made. Subjects' terminal decisions were benchmarked against an optimal statistical inferencing rule (Bayesian Expected Utility) and a WIN-STAY, LOSE-SHIFT reinforcement heuristic adapted from Charness & Levin [34].
Although the decisions made in this experiment take place in a much simpler context and involve different agents, subjects were required to maximize their expected outcomes in a binary choice decision in an environment of uncertainty and ambiguity, a subset of the features faced by decision-makers during COVID-19. Behavioral economic research helps identify principles of generalized economic behavior that deviate from the predictions of economic theory and that could eventually be externally validated [35]. At a minimum, the qualitative observations from this experiment provide directional insight for future laboratory and field research that could sharpen our understanding of systematic human decision behaviors; this understanding could in turn be used to develop public health policies that 'nudge' decision-makers toward best responses during subsequent PHEICs.

Laboratory Experimental Design
A total of 180 students were recruited by e-mail from the undergraduate Bachelor of Commerce student population of a Canadian public university. Participants completed 24 rounds of an individual task consisting of two binary-choice decisions per round, where the second decision occurred after observing an imperfect but statistically relevant information signal. Subjects were told that the objective of the experiment was to maximize their personal earnings. At the beginning of each round, subjects were shown two opaque bags (representing two states), each containing a combination of red and blue poker chips. The distribution of red to blue chips was asymmetric across the two bags: one bag contained a greater proportion of red chips and the other a greater proportion of blue chips. A random draw determined, with equal probability, which of the two bags was used in each round. Participants did not learn until the end of the round which bag had been chosen. In each round, subjects were asked to choose one of two actions (action A or action B), where each action was associated with two different payoff amounts depending on the bag (state) that had been randomly selected. One of the payoff lotteries, conditional on the action choice, always first-order stochastically dominated the other. For example, the lotteries associated with each action for a subset of the rounds were as follows:

Pick Action A:
If the bag chosen for the round was bag 1 you receive $2.00
If the bag chosen for the round was bag 2 you receive $0.75

Pick Action B:
If the bag chosen for the round was bag 1 you receive $0.50
If the bag chosen for the round was bag 2 you receive $1.75

The first binary decision (action choice) occurred before any message signal was received, so the round was equally likely to be played with bag 1 or bag 2. The second binary decision occurred after observing an informative but imperfect message signal: a chip was drawn, its colour revealed to the subject, and the chip replaced. Observing the colour helps predict the state (bag) and so helps subjects maximize their payoffs. After both binary decisions were complete, the bag selected for the round was revealed, and the resulting payoff, given the second action choice (made after observing the colour chip), was recorded by the participant. At the end of the experiment, subjects received the earnings accumulated over the 24 rounds played. Figure 1 illustrates the decision tree faced in each round and the decisions made by the subject. Further details of the experiment can be found in Appendix A. The subjects' observed behaviors are benchmarked against the decision rules of (1) a risk-neutral Bayesian Expected Utility (BEU) maximizer², which is the optimal decision rule³, and (2) a reinforcement learning (RL) algorithm adapted from Charness and Levin [34]. The description of the decision rules associated with each benchmark can be found in Appendix B.

¹ EP = expected payoff.
² A subject with a risk-neutral utility function maximizes expected utility by maximizing expected payoffs; in this case, expected utility equals expected payoff.
³ A decision that yields an expected outcome at least as good as that of the other available decision option.
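The BEU benchmark for the example lotteries can be made concrete with a short calculation. The Python sketch below computes the expected payoff (EP) of each action for a given belief that bag 1 is in use and returns the action a risk-neutral BEU maximizer would pick. The payoffs are the example values for action A and action B; the function names and the tie-breaking convention are our own assumptions, and the authoritative decision rules are those described in Appendix B.

```python
from fractions import Fraction

# Example payoff table: PAYOFFS[action][bag] = dollars received if that bag is in use.
PAYOFFS = {
    "A": {1: Fraction("2.00"), 2: Fraction("0.75")},
    "B": {1: Fraction("0.50"), 2: Fraction("1.75")},
}


def expected_payoff(action: str, p_bag1: Fraction) -> Fraction:
    """EP of an action when bag 1 is believed to be in use with probability p_bag1."""
    pay = PAYOFFS[action]
    return p_bag1 * pay[1] + (1 - p_bag1) * pay[2]


def beu_action(p_bag1: Fraction) -> str:
    """Choice of a risk-neutral BEU maximizer (ties go to A, an assumption of this sketch)."""
    return "A" if expected_payoff("A", p_bag1) >= expected_payoff("B", p_bag1) else "B"


if __name__ == "__main__":
    prior = Fraction(1, 2)  # first decision: both bags equally likely
    print("EP(A) =", float(expected_payoff("A", prior)))  # EP(A) = 1.375
    print("EP(B) =", float(expected_payoff("B", prior)))  # EP(B) = 1.125
    print("first BEU choice:", beu_action(prior))          # first BEU choice: A
```

With the 1/2 prior, the BEU first choice is action A. Solving EP(A) ≥ EP(B) for these payoffs gives a probability of bag 1 of at least 2/5, so A stays optimal unless the message pushes that probability below 0.4. Exact fractions are used so that ties at the threshold are not distorted by floating-point rounding.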
This experiment tested the following research hypotheses:
1. Subjects' decision choices will become more reflective of optimal decision theory (BEU) with experience; specifically, the more rounds of the experiment played, the more their responses will converge toward BEU.
2. When the higher-payoff lottery (the first-order stochastically dominant lottery) is aligned with the optimal choice, subjects' behavior will be more reflective of BEU choices.
3. Subjects will be better able to apply optimal decision theory when they are provided with the calculated odds of being in either state, given the message received.
4. The more informative the imperfect message source (i.e., the greater the confidence, as set through the experimental design, that it predicts one state over the other), the more behavior will reflect BEU optimal choices.
5. When the reinforcement learning heuristic is aligned with the BEU decision choice, subjects will make fewer errors.

Empirical Inquiry Analysis
The data were analyzed using two benchmarks: subject behavior was compared with the action choices of a risk-neutral BEU maximizer and with those of a Reinforcement Learner (RL). Inconsistency rates measure deviations from these two behavior types. Accordingly, for each subject in the experiment, a BEU first-choice inconsistency rate, a BEU second-choice inconsistency rate, and an RL second-choice inconsistency rate were calculated.
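As an illustration of how per-subject inconsistency rates could be computed, the Python sketch below assumes one record per subject and round holding the subject's two choices and each benchmark's prescribed action. The record layout, the field names, and the simplified WIN-STAY, LOSE-SHIFT rule (including what counts as a "win") are hypothetical; the RL rule actually used is described in Appendix B. Computing the RL rate only over rounds in which the rule prescribes an action is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One subject-round record (hypothetical field names)."""
    subject: int
    round_no: int
    first_choice: str          # action chosen before the message
    second_choice: str         # terminal action chosen after the message
    beu_first: str             # BEU-optimal action under the 1/2 prior
    beu_second: str            # BEU-optimal action given the message
    rl_second: Optional[str]   # RL prescription; None when the rule prescribes nothing


def wsls(prev_action: Optional[str], prev_won: Optional[bool]) -> Optional[str]:
    """Hypothetical WIN-STAY, LOSE-SHIFT over actions 'A' and 'B':
    repeat the previous terminal action after a win, switch after a loss,
    and prescribe nothing when there is no history (e.g. round 1)."""
    if prev_action is None or prev_won is None:
        return None
    if prev_won:
        return prev_action
    return "B" if prev_action == "A" else "A"


def inconsistency_rates(obs: list[Observation]) -> dict[int, dict[str, float]]:
    """Share of each subject's rounds that deviate from each benchmark."""
    by_subject: dict[int, list[Observation]] = {}
    for o in obs:
        by_subject.setdefault(o.subject, []).append(o)

    rates: dict[int, dict[str, float]] = {}
    for subject, rows in by_subject.items():
        n = len(rows)
        rl_rows = [r for r in rows if r.rl_second is not None]
        rates[subject] = {
            "beu_first": sum(r.first_choice != r.beu_first for r in rows) / n,
            "beu_second": sum(r.second_choice != r.beu_second for r in rows) / n,
            # Only rounds where the RL rule prescribes an action are counted.
            "rl_second": (
                sum(r.second_choice != r.rl_second for r in rl_rows) / len(rl_rows)
                if rl_rows else float("nan")
            ),
        }
    return rates


if __name__ == "__main__":
    obs = [
        Observation(1, 1, "A", "A", "A", "B", wsls(None, None)),
        Observation(1, 2, "A", "B", "A", "B", wsls("A", False)),
    ]
    print(inconsistency_rates(obs))
    # {1: {'beu_first': 0.0, 'beu_second': 0.5, 'rl_second': 0.0}}
```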
To understand the causes of the observed BEU and RL inconsistency rates, random- and fixed-effects logit regressions⁴ were run with the 1st- and 2nd-choice BEU inconsistency and the 2nd-choice RL inconsistency as dependent variables, to determine the marginal effects⁵ of the independent variables on these three outcomes. The dependent variable in equation (1) represented a 1st-choice inconsistency with the risk-neutral BEU decision, by round and subject (Table A, Appendix C). The dependent variables in equations (2) and (3) represented a 2nd-choice inconsistency with the risk-neutral BEU decision and a 2nd-choice inconsistency with the RL decision, respectively, by round and subject (Tables B and C, Appendix C). In all three equations, the dependent variable was dichotomous. Three types of explanatory variables were used. The first group changed over the rounds but was the same for all individuals in a given round. The second varied both over the rounds and between subjects and sessions. The third varied between individuals but not over the rounds.
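To clarify what a marginal effect from these logit models means, the sketch below evaluates it for a single observation. For a continuous regressor, the marginal effect is the coefficient times Λ(x'β)(1 - Λ(x'β)), where Λ is the logistic function; for a 0/1 regressor (such as an informed/uninformed indicator), it is the discrete change in predicted probability. The coefficient values and variable names are hypothetical, and for a random-effects logit this corresponds to evaluating predictions with the individual effect set to zero, which is one common convention and not necessarily the one used in this study.

```python
import math


def logistic(z: float) -> float:
    """Logistic CDF: Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_index(beta: dict[str, float], x: dict[str, float]) -> float:
    """x'beta, with the intercept stored under the key 'const'."""
    return beta["const"] + sum(b * x[name] for name, b in beta.items() if name != "const")


def me_continuous(beta: dict[str, float], x: dict[str, float], var: str) -> float:
    """dP/dx_var = beta_var * Lambda(x'beta) * (1 - Lambda(x'beta))."""
    p = logistic(linear_index(beta, x))
    return beta[var] * p * (1.0 - p)


def me_dummy(beta: dict[str, float], x: dict[str, float], var: str) -> float:
    """Change in P(y = 1) when the 0/1 regressor `var` moves from 0 to 1."""
    on, off = dict(x), dict(x)
    on[var], off[var] = 1.0, 0.0
    return logistic(linear_index(beta, on)) - logistic(linear_index(beta, off))


if __name__ == "__main__":
    # Hypothetical coefficients for a 2nd-choice BEU-inconsistency logit.
    beta = {"const": -1.0, "round_no": -0.02, "informed": 0.3}
    x = {"round_no": 12.0, "informed": 0.0}
    print("ME of one more round:", me_continuous(beta, x, "round_no"))
    print("ME of being informed:", me_dummy(beta, x, "informed"))
```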

Results
In total, 4320 observations of subjects' first and second action choices were collected from 180 undergraduate students (180 subjects × 24 rounds).

⁴ The Breusch and Pagan Lagrangian multiplier test for all three equations established that individual effects are present in the data. The Hausman test could not reject the null hypothesis that the coefficients of the fixed- and random-effects models are the same, implying that the individual effects are not correlated with the explanatory variables, so the random-effects estimates are consistent. As an additional check, we estimated fixed- and random-effects GLS regressions, performed the Hausman test, and obtained the same result. Comparisons of the same coefficients across all models show the differences to be minimal; the signs and the statistical significance of the coefficients remain the same.
⁵ The change in the probability of observing the dependent variable when the independent variable changes by one unit.
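The Hausman comparison described in footnote 4 can be illustrated for a single coefficient. The sketch below computes H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), which is chi-square distributed with one degree of freedom under the null that the random-effects estimator is consistent; the multi-coefficient version replaces the division with the inverse of the covariance-difference matrix. The input values in the usage line are hypothetical, not estimates from this study.

```python
import math


def hausman_scalar(b_fe: float, b_re: float, var_fe: float, var_re: float) -> tuple[float, float]:
    """Hausman statistic and p-value for one coefficient.

    Under the null (random effects consistent and efficient), Var(b_FE) exceeds
    Var(b_RE), and H follows a chi-square distribution with 1 degree of freedom."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # Chi-square(1) survival function: P(X > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


if __name__ == "__main__":
    # Hypothetical fixed- and random-effects estimates of the same coefficient.
    print(hausman_scalar(b_fe=0.5, b_re=0.3, var_fe=0.05, var_re=0.01))
```

A large p-value, as in this hypothetical case, means the null cannot be rejected, which is the situation footnote 4 reports for all three equations.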

Hypothesis 1 Result:
Subjects' first decisions, made before observing an informative but imperfect message signal, became more reflective of optimal decision theory (BEU) with experience. This was not the case for the second decision, made after observing the message signal. Figure 2 illustrates the subjects' first-decision inconsistency rate relative to the BEU benchmark over the 24 rounds. In early rounds, subjects violated BEU decision rules, but they converged on optimal decisions with practice and when the difference between the higher-state lottery and the alternative-state lottery was exaggerated (-0.609, p < 0.001) (see Table A, Appendix C). However, for the second decision, when the BEU decision rule required subjects to update their initial beliefs given the newly observed information (applying Bayes' law in conjunction with expected utility theory) and the BEU response was not aligned with the higher-payoff lottery choice, subjects had a higher BEU inconsistency rate, and experience with the decision task had no significant effect (-0.115, p > 0.1) (Table A, Appendix C).

Hypothesis 2 Result:
When the higher-payoff lottery (the first-order stochastically dominant lottery) was aligned with the optimal choice, subjects' behavior was more reflective of BEU choices.
The 2nd-choice BEU inconsistency rate varied significantly depending on the message received. On average, subjects' behavior reflected the BEU sequence of decisions more often when a higher-payoff lottery was associated with the BEU action given the message (Table B, Appendix C). Figure 3 highlights this result.

Hypothesis 3 Result:
Subjects' deviations from optimal decision theory were not explained by a lack of the 'ability to do the math' (i.e., to update prior beliefs given new statistical evidence).
Half the subjects in this experiment were informed (provided with the Bayes' law calculations), and their responses were compared with those of uninformed subjects (left to calculate Bayes' law on their own). For example, informed subjects were told before receiving a message that there was an equal chance that the round was being played in state 1 or state 2. After the message was received, they were told the updated probability of being in either state (e.g., given the message observed, there are now 70 chances out of 100 that we are playing in state 1). Overall, subjects who were provided with the Bayes' law calculation once a message signal was received (informed) did not have statistically different inconsistency rates from subjects who were left to calculate Bayes' law on their own (uninformed) (0.29, p > 0.1) (see Table B, Appendix C).
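The probabilities given to informed subjects follow from Bayes' law. The sketch below reproduces the "70 chances out of 100" example under a hypothetical bag composition (bag 1 holding 7 red and 3 blue chips, bag 2 holding 3 red and 7 blue); the compositions actually used in each round are given in Appendix A and may differ.

```python
from fractions import Fraction


def posterior_bag1(prior_bag1: Fraction, p_red_bag1: Fraction,
                   p_red_bag2: Fraction, colour: str) -> Fraction:
    """P(bag 1 | colour of the drawn chip) by Bayes' law."""
    if colour == "red":
        like1, like2 = p_red_bag1, p_red_bag2
    elif colour == "blue":
        like1, like2 = 1 - p_red_bag1, 1 - p_red_bag2
    else:
        raise ValueError("colour must be 'red' or 'blue'")
    joint1 = prior_bag1 * like1          # P(bag 1 and this colour)
    joint2 = (1 - prior_bag1) * like2    # P(bag 2 and this colour)
    return joint1 / (joint1 + joint2)


if __name__ == "__main__":
    prior = Fraction(1, 2)
    # Hypothetical compositions: bag 1 = 7 red / 3 blue, bag 2 = 3 red / 7 blue.
    for colour in ("red", "blue"):
        post = posterior_bag1(prior, Fraction(7, 10), Fraction(3, 10), colour)
        print(f"{colour}: P(bag 1) = {float(post):.2f}")
```

Under these assumed compositions, a red chip moves the probability of bag 1 from 1/2 to 7/10 and a blue chip moves it to 3/10; an uninformed subject would have to carry out this update unaided.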

Hypothesis 4 Result:
The more informative the imperfect message, the more behavior was reflective of BEU optimal decisions.
Deliberate experimental design changes across the 24 rounds altered the degree of informativeness of the imperfect message observed. These design changes allowed us to observe two systematic decision patterns that deviated from optimal BEU decision theory and are not fully explained by the Reinforcement Learning model: (1) behavior reflecting overweighting of the informational value of the message received (a.k.a. Over-weigh); and (2) behavior reflecting underweighting of the informational value of the message received (a.k.a. Status Quo). Importantly, the subjects who followed the non-optimal rule of underweighting the message were not the same subjects who followed the non-optimal rule of overweighting it. Subject behavior was most likely to reflect the 'status quo' decision rule when subjects were not math or economics students (3.7 percentage-point increase, p ≤ 0.05) and when they were classified as RL based on the post-experiment survey (4.2 percentage-point increase, p ≤ 0.01). Although a proportion of subjects behaved in a way that reflected overweighting of the message's informational value, no characteristic contributed significantly to this behavior type.
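One possible way to operationalize the two deviation types is to compare the BEU action under the prior with the BEU action after the message. In the hypothetical Python sketch below, a 2nd-choice BEU inconsistency is labelled "status quo" when the message should have changed the optimal action but the subject kept the prior-optimal action, and "over-weigh" when the message should not have changed the optimal action but the subject switched anyway. These definitions are our assumption for illustration, not necessarily the classification used in the study.

```python
from collections import Counter


def classify(beu_prior: str, beu_post: str, second_choice: str) -> str:
    """Label one 2nd-choice decision relative to the BEU benchmark (binary actions).

    beu_prior: BEU action before the message (prior of 1/2)
    beu_post: BEU action after observing the message
    second_choice: the subject's terminal action
    """
    if second_choice == beu_post:
        return "consistent"
    if beu_prior != beu_post:
        # The message should have flipped the optimal action, but the subject
        # stayed with the prior-optimal action: the message was underweighted.
        return "status quo"
    # The message should not have flipped the optimal action, but the subject
    # switched anyway: the message was overweighted.
    return "over-weigh"


def deviation_profile(rows: list[tuple[str, str, str]]) -> Counter:
    """Count labels for one subject; rows are (beu_prior, beu_post, second_choice)."""
    return Counter(classify(*row) for row in rows)


if __name__ == "__main__":
    subject_rows = [("A", "B", "A"), ("A", "A", "A"), ("A", "A", "B")]
    print(deviation_profile(subject_rows))
```

For instance, with the example payoffs for actions A and B (A is optimal whenever the probability of bag 1 is at least 0.4) and a hypothetical weakly informative composition of 11 red and 9 blue chips in bag 1 (the reverse in bag 2), a blue chip lowers the probability of bag 1 only to 0.45. A subject who switches to B after such a chip would be labelled "over-weigh" under this sketch.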

Hypothesis 5 Result:
There is evidence that the RL and BEU heuristics are complementary behaviors: when both are present, they can either enhance or diminish optimal decisions. The 2nd-choice RL heuristic was aligned with the 2nd-choice BEU heuristic in 39.8% of the observations. The BEU inconsistency rate was 6.6% [95% CI: 5.6% to 7.5%] when the RL and BEU heuristics were aligned and 35.2% [95% CI: 32.6% to 37.8%] when the heuristics clashed. The inconsistency rate was 13.8% [95% CI: 12.8% to 14.8%] when no RL heuristic existed. These results indicate that when a past BEU decision was rewarded (i.e., a WIN), a subject had a greater propensity to apply the BEU decision rule in the future, resulting in fewer BEU inconsistencies (the RL and BEU heuristics are aligned). Conversely, if the BEU decision was not rewarded (i.e., a LOSE), potentially creating a future decision environment in which the subject's RL and BEU heuristics clash, optimal decision behavior was compromised. This result implies that subjects may treat statistically independent events as interdependent (see Table C, Appendix C).
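The aligned/clashed comparison can be sketched as follows. The Python example groups 2nd-choice observations by whether an RL prescription exists and whether it matches the BEU action, then reports the BEU inconsistency rate with a normal-approximation (Wald) 95% interval. The tuple layout is hypothetical, and the Wald interval is an assumption: the text does not state how its confidence intervals were computed, and this sketch ignores the clustering of observations within subjects, so it is not expected to reproduce the reported intervals.

```python
import math
from statistics import NormalDist
from typing import Iterable, Optional


def wald_ci(errors: int, n: int, level: float = 0.95) -> tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = errors / n
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def alignment(beu: str, rl: Optional[str]) -> str:
    """Group label for one observation."""
    if rl is None:
        return "no RL"
    return "aligned" if rl == beu else "clashed"


def rates_by_alignment(
    rows: Iterable[tuple[str, Optional[str], str]],
) -> dict[str, tuple[float, float, float]]:
    """rows: (beu_second, rl_second, second_choice) for each observation."""
    counts: dict[str, list[int]] = {}
    for beu, rl, choice in rows:
        tally = counts.setdefault(alignment(beu, rl), [0, 0])
        tally[0] += int(choice != beu)   # BEU inconsistency
        tally[1] += 1
    return {group: wald_ci(e, n) for group, (e, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    rows = [("A", "A", "A")] * 9 + [("A", "A", "B"), ("A", "B", "B"), ("A", "B", "A"), ("B", None, "B")]
    for group, (rate, lo, hi) in rates_by_alignment(rows).items():
        print(f"{group}: {rate:.3f} [{lo:.3f}, {hi:.3f}]")
```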

Discussion
The generalizability of behavioral economics (BE) laboratory experiments to real-world environments has long been contested [36]. However, even experiments that do not match the external environment offer an opportunity to begin developing and testing scientific hypotheses [37]. We acknowledge that the decision environment within the laboratory is simplistic compared with that of a public health crisis. It does not include various contextual factors (e.g., politics, ego, decisions made by teams of experts). Nevertheless, in both situations, decision-makers must take action with economic consequences in an environment of uncertainty, i.e., a lack of evidence or confidence in the information provided, and, in many cases, without prior experience (ambiguity).
Although laboratory experiments cannot provide perfect external validity, Herbst & Mas [35] concluded that they may have more external validity than previously recognized. It is therefore reasonable to posit that systematic biases relative to optimal statistical inference found in a stripped-down decision environment could also be observed in more complex environments. Accordingly, the results discussed below in the context of COVID-19 should be considered possible explanations for the different decisions observed, particularly in closely related decision environments with similar constraints.
Ideally, benchmarking the actual decisions made during COVID-19 against an optimal statistical inferencing decision rule to identify systematic biases would be the most useful way to understand the observed behavior. Since this extreme public health event cannot be re-lived, researchers need to build models from available information, inferences and prior research that better explain outcomes, and use these models to predict future behavior in order to develop optimal public policy or best practices [38].
The binary COVID-19 decision of whether to impose social distancing orders is simplified here to reflect the common elements of ambiguity and uncertainty also found in the binary choice laboratory experiment, providing context for the discussion that follows. The model is hypothetical and assumes that the decision-maker has been stripped of all outside influences. It is presented here as a base model for future enhancements.
A public leader responsible for enforcing local rulings for their organization or jurisdiction faces a binary choice with known consequences depending on the end state. The two potential end states, good or bad, have evidence-based probabilities calculated by experts, where the good state predicts fewer COVID-19 deaths than the bad state. The leader naturally hopes for the low COVID-19 state but must also prepare properly for the high COVID-19 state based on the evidence presented. The leader must take one of two actions.
Each action is associated with a known lottery (i.e., economic payoffs that depend on the actual end state). The first action is to impose no social distancing orders, leaving businesses, schools and sporting events open to the public. If the end state is indeed the low COVID-19 state, this action yields the highest economic payoff. However, if the end state is the high COVID-19 state, the economic payoff will be even lower than if the leader had decided to close these activities initially. The second action imposes social distancing orders, closing businesses, schools and sporting events. If the end state in this case is the high COVID-19 state, the economic payoff from closing will be greater than if the decision had been made to keep these activities open. Before taking a terminal action, decision-makers receive message signals, in the form of testing rates, that are informative yet imperfect. Although imperfect, the message provides statistically relevant information on the probability of being in either the low or the high COVID-19 state. Assume the lotteries associated with each action are as follows:

Pick No Social Distancing Orders:
If the actual end state is low COVID-19, the net benefit is $$$$
If the actual end state is high COVID-19, the net benefit is $$

Pick Social Distancing Orders:
If the actual end state is low COVID-19, the net benefit is $
If the actual end state is high COVID-19, the net benefit is $$$

Five systematic deviations from optimal statistical inferencing behavior observed in the laboratory study are discussed below in the context of COVID-19 decisions: (1) the importance of experience, (2) the importance of understanding the odds when presented with additional relevant statistical information, (3) the size of and difference in economic payoffs given the choice, (4) how a decision-maker values new information, and (5) past outcomes conditional on past choices made.
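To see how the payoff sizes shape the optimal policy in this stylized model, the sketch below replaces the "$" symbols with hypothetical ordinal values ($ = 1, $$ = 2, $$$ = 3, $$$$ = 4) and computes the probability of the high COVID-19 state above which a risk-neutral BEU decision-maker would impose social distancing orders. The numbers are illustrative only: they preserve the ordering in the lotteries but not any real economic magnitudes.

```python
from fractions import Fraction

# Hypothetical ordinal stand-ins for the "$" symbols ($ = 1 ... $$$$ = 4).
NET_BENEFIT = {
    "no social distancing": {"low": 4, "high": 2},
    "social distancing": {"low": 1, "high": 3},
}


def expected_benefit(action: str, p_high: Fraction) -> Fraction:
    """Expected net benefit given the probability of the high COVID-19 state."""
    b = NET_BENEFIT[action]
    return (1 - p_high) * b["low"] + p_high * b["high"]


def beu_policy(p_high: Fraction) -> str:
    """Risk-neutral BEU choice; ties keep activities open (an assumption of this sketch)."""
    stay_open = expected_benefit("no social distancing", p_high)
    close = expected_benefit("social distancing", p_high)
    return "social distancing" if close > stay_open else "no social distancing"


def closing_threshold() -> Fraction:
    """P(high) at which both actions have equal expected net benefit."""
    o, c = NET_BENEFIT["no social distancing"], NET_BENEFIT["social distancing"]
    cost_if_low = o["low"] - c["low"]     # benefit forgone by closing in the low state
    gain_if_high = c["high"] - o["high"]  # benefit gained by closing in the high state
    return Fraction(cost_if_low, cost_if_low + gain_if_high)


if __name__ == "__main__":
    print(closing_threshold())  # 3/4
    for p in (Fraction(1, 2), Fraction(4, 5)):
        print(p, beu_policy(p))
```

Under these values, social distancing has the higher expected net benefit only once P(high) exceeds 3/4: the large gap between $$$$ and $ in the low state means the evidence for the high state must be strong before the lower-payoff action becomes optimal.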

Hypothesis #1: Experience Matters…To a Point.
This experiment showed that experience plays a role in achieving optimal choices in certain decision environments. Specifically, for a subset of decisions where only the expected payoffs of the two alternatives had to be compared, subjects' behavior converged over time toward optimal decision choices (BEU). However, in more complex environments, where subjects were required to update their prior beliefs about the likelihood of a certain event and combine them with the expected payoffs of each alternative, experience did not lead to more optimal choices. To understand how this finding might apply to decisions made during COVID-19, consider the early stages of the pandemic, when the spread of COVID-19 was largely restricted to China and information regarding the health of its population was suppressed. With no statistically relevant information available (similar to the subset of decisions described above), nations chose actions that maximized expected payoffs (the higher-payoff lottery) and elected to continue business as usual. With no reported cases or deaths within their borders, the optimal BEU decision was to stay open. This could be described as a business-as-usual strategy, the experienced choice. Many countries also elevated health initiatives while maintaining their economic prosperity [39]. Examples included questionnaires and screening procedures for individuals arriving from abroad and, eventually, quarantine procedures for these same travelers. Some countries behaved as outliers, instituting early travel bans and heavy restrictions; however, most converged on the optimal action (BEU) given the available information [39].
As meaningful information was disseminated from reputable sources, public officials were required to update their prior beliefs about the probability that their country or region would be in a low or high COVID-19 state and, given this information, to re-calculate their expected payoffs based on their new predictions of the state of the nation. This represented a new and more complex decision environment, more representative of the second action choice in the laboratory experiment. A balance needed to be struck between economic costs, such as job and GDP losses, and health costs, such as screening, cost of treatment and mortality. With the novelty of the virus and the volley of incoming information, leaders needed to update their beliefs at nearly every decision. The resulting inconsistencies are evident in both publicly available data and the media [40,41]. Countries, states, provinces and regions made many different decisions, even though the information published was largely the same [42]. For example, some countries implemented full travel bans, others imposed full shutdowns with high levels of testing, while others remained completely open with little to no restrictions [42]. The observed erratic decision-making could be attributed to the complex nature of the decisions, the lack of lived experience and/or competing political and personal interests. Ren et al. [43] found that as experience with a decision task increased, business managers relied on past decisions when making future choices. They further noted that when the decision environment became more complicated, this over-reliance on experience resulted in stronger biases than among decision-makers with less experience.
The need for immediate action, given the rapid spread of the disease, may also have contributed to the varying responses. According to the dual cognition processing theory [44], decisions are subject to two cognitive processes: an implicit (automatic), unconscious process and an explicit (controlled), conscious process. Decisions could therefore vary depending on which process is activated and by whom, and decision choices in environments of uncertainty could improve with additional time and education [45]. Specifically, time and persuasion would be needed to reshape the implicit, automatic responses that decision-makers rely on when immediate action is required, while time and education would improve explicit, conscious reasoning. These findings and the observations above point to the importance of keeping a historical log of the data provided, the decisions made, and the consequences of those decisions during COVID-19, to serve as a proxy for experience in future complex public health crises.
Hypothesis #2: Optimal decisions are more likely to be made when they are associated with a higher economic payoff.
When the optimal decision choice aligned with the higher-payoff state, inconsistency rates were significantly lower. With no available information regarding rates of spread or mortality, nations and regions could easily observe the larger economic payoffs associated with keeping businesses open and chose accordingly, aligning their choice with the optimal statistical inferencing rule. However, as statistically relevant information became available from reputable sources, countries were required to update their beliefs and consider the possibility of being in a high COVID-19 state. Even when the information indicated a high state, acknowledging this and choosing the action associated with the lower-payoff lottery was, as for subjects in the experiment, more difficult for many decision-makers [46]. Samuelson & Zeckhauser [47] and Charness & Levin [34] identified regret avoidance and a taste for consistency, respectively, as possible reasons for this status quo behavior; both explanations of status quo bias continue to be supported by research [48][49][50]. Deviation from optimal choices was common throughout all stages of the pandemic. For instance, within nations where COVID-19 data indicated a high COVID-19 state, many regions still opted to keep schools open [51,52]. Additionally, given the same data, many businesses considered essential in some regions were not considered essential in others. Ontario updated its list of essential businesses several times [53]. This led to public outrage, as medical professionals and the general public did not deem several of the services on the list to be essential [54].
The observed behavior indicated a bias toward decisions associated with a higher economic payoff, regardless of accurate additional information suggesting otherwise. The challenge this presents for public health officials is that decision-makers appeared willing to gamble on a higher-payoff lottery choice, with lower odds of being in a low COVID-19 state and the potential for more lives lost, rather than on a lower-payoff lottery choice, with better odds of being in a high COVID-19 state and the potential for more lives saved. Providing additional information to the public officials who make these high-stakes decisions could be beneficial if it demonstrates how a decision aligned with the lower-payoff lottery in the short term may be optimal and lead to higher economic payoffs in the longer term.

Hypothesis #3: Giving the Odds Won't Change Much.
Historically, deviations from optimality in decision-making tasks similar to the experiment presented in this study have been attributed to an inability to do complicated math [55]. To test this explanation, the probability of being in either state given the message received was provided to half the subjects in the experiment; their deviations from optimal decision-making were not statistically different from those of subjects who were not provided with this calculation. As the pandemic spread, many countries shifted away from the economically focused decision choice toward a more health-focused choice [53,42]. With the evidence and the calculated odds for the predicted state (low vs. high COVID-19 state) from public health officials, many countries shut down and issued stay-at-home orders, bringing the world's economy to a standstill. Other countries, provided with the same calculated odds regarding the state of the nation, chose to keep businesses and schools open [42,56]. The decisions observed in the USA to close or open businesses and/or public areas (such as beaches and parks) appeared random [42]. Given the results of the experiment, the observed behavior could suggest that decision-makers are unable to properly assess the impact of the data, which in turn inhibits their ability to translate the odds of being in either state into optimal choices. Public health officials must consider that even if perfect testing and health metrics were provided, inconsistencies relative to the optimal decision rule could still occur. Boundedly rational decision-makers may have difficulty separating values from objective scientific evidence [57] or ranking policy aims in a logical manner [58]. Efforts to improve the comprehensive rationality of decision-makers are therefore important, e.g., guidance on how to separate values from facts, how to properly rank policy aims, and how to proceed with a decision through a linear, step-by-step process.

Hypothesis #4: Over-valuing or Under-valuing information? BOTH.
Findings highlighted that individual characteristics of the decision-maker mattered when making decisions. Some individuals consistently overweighted the informational value of the statistics, while others underweighted it. In the context of the pandemic, the potential for this type of biased behavior was observed when information on the severity of the virus and its potential economic impacts was released. As new COVID-19 cases, recoveries, deaths, and symptoms were reported daily from Wuhan, China, leaders, health officials, and the general public reached different conclusions. Some compared COVID-19 to the seasonal flu or SARS-CoV, others to a novel, more deadly virus, while still others concluded that the economic hardships far outweighed any potential health consequences [59][60][61]. Within the experiment, the demographic and socio-cultural characteristics of the individual decision-maker statistically significantly explained the observed undervaluing or overvaluing of the statistical information provided by the message signal. This suggests that a major consideration when developing optimal policy is the proper identification and understanding of the personal and socio-cultural characteristics that may influence decision choices.

Hypothesis #5: Optimal decision once, optimal decision again?
The reinforcement of a previous choice as an optimal decision led to lower inconsistency rates for future decisions. Countries labeled as models for handling the COVID-19 pandemic may have benefitted from their history [42]. South Korea's widespread testing and New Zealand's early and swift closure were reaffirmed as optimal decisions when COVID-19 cases continued to rise in other countries not employing similar control measures [42]. Given these positive outcomes, both countries continued to implement successful COVID-19 control strategies [42,62]. Results from this experiment also suggested that subjects treated statistically independent events as interdependent, and this led to higher inconsistency rates. Balancing learning from experience against recognizing the independent nature of a situation can be difficult during a crisis. For example, after witnessing New Zealand's approach, other countries such as India and Argentina followed, applying the same swift shutdown protocols [42]. For these countries, the same decision did not produce the same outcomes, given differences in their social and economic environments and healthcare capacities.
Two concepts in the experimental literature affirm the observed bias of treating statistically independent events as statistically interdependent. The 'hot hand' fallacy, associated with basketball, describes the belief that a player who has scored several baskets in a row is more likely to score again because they have a 'hot hand' [63]. The second concept is the 'gambler's fallacy', which has been observed in casinos and card games. Opposite to the 'hot hand' fallacy, it describes an individual's tendency to underestimate the odds of winning after winning several times in a row [64]. However, in both cases, when the statistical dependencies between events (e.g., each basketball shot or hand played) were computed, there was either no dependency or, where a dependency existed, the actual trend (whether negative or positive) was the opposite of the one believed [64]. As such, it is pertinent that public health officials recognize the unrelated nature of decisions made across time frames, in different contexts, and by different people.
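The independence argument can be checked by exact enumeration. The sketch below is illustrative only: it assumes a hypothetical run of independent trials with a success probability of 1/2 (not a parameter from the experiment) and computes the probability of a success on the last trial given that every earlier trial was a success. Under independence the streak carries no information, so the answer equals the unconditional probability; a 'hot hand' belief would place it above 1/2 and a gambler's-fallacy belief below it.

```python
from fractions import Fraction
from itertools import product


def conditional_after_streak(n_trials: int, p_success: Fraction) -> Fraction:
    """P(success on the last trial | all earlier trials were successes),
    computed exactly by enumerating every outcome of independent trials."""
    joint = Fraction(0)   # P(all n_trials are successes)
    streak = Fraction(0)  # P(the first n_trials - 1 are successes)
    for outcome in product((0, 1), repeat=n_trials):
        # Independence: a sequence's probability is the product of per-trial probabilities.
        prob = Fraction(1)
        for x in outcome:
            prob *= p_success if x == 1 else 1 - p_success
        if all(outcome[:-1]):
            streak += prob
            if outcome[-1] == 1:
                joint += prob
    return joint / streak


print(conditional_after_streak(4, Fraction(1, 2)))  # 1/2, same as the unconditional probability
```

Exact fractions are used so that the equality with the unconditional probability holds exactly rather than only approximately.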
Overall, subjects performing a relatively simple binary-decision task are adept at selecting optimal choices over time, prior to observing additional statistically relevant information. Although this may provide evidence that decision-makers can maximize expected payoffs, it is also possible, given the lottery choices associated with each action, that decision-makers choose optimally simply by properly ranking the action associated with the first-order stochastically dominant lottery (picking the action associated with the higher-payoff lottery). Although the results are not sufficient evidence to confirm or refute the existence of a threshold at which subjects no longer apply BEU decision rules due to task complexity [34], they do lend further support to the notion that learning behavior depends in part on the context and environment in which the decision-making is conducted.
Unlike in the pandemic decision environment, in the laboratory setting the economic payoffs for each state were known, as was the informational value of the message, and yet mistakes were still made relative to optimal decision theory. Given the potential emergence of superbugs in the future, it would seem prudent for nations/jurisdictions to conduct cost-benefit analyses of the various outcomes (states of the nation) in order to provide decision-makers with timely and accurate payoff calculations. This may provide an opportunity to eliminate significant inconsistencies in decision-making. Furthermore, uncovering systematic biases in experiments such as the one shown in this study could inform future field studies aimed at developing policies that nudge decision-makers toward the choices that result in greater well-being for all stakeholders.
A major limitation of this research is that the laboratory experiment conditions do not represent the real context in which these pandemic decisions are made. Although we are able to observe systematic biases relative to optimal decision-making within a simple closed experiment, and can apply the findings to help explain what may have happened, many other factors may have influenced the varied decisions to open or close businesses and public services that were made given the same information. Such factors may include egos, risk tolerance, stress, political agendas, and personal characteristics [65][66][67].
Future research that applies BE to decisions made in the public health space has tremendous potential to enhance public policy. First, the lab results from this simple experiment, discussed in the context of COVID-19, should be considered the first phase of discovery toward a better understanding of the divergent responses by public officials given the same statistical information. In predicting outcomes, optimal statistical inference yields the most accurate predictions in environments of uncertainty and ambiguity. The second phase of discovery would be to layer on additional situational factors to build a more representative model of the decision environment and once again observe systematic biases relative to the optimal decision rule in both laboratory and field settings. This iterative discovery process assists in designing decision-making models with greater predictive value that can be used to develop future public health policies in a time of crisis. Specifically, how can we use the observed systematic biases to nudge decision-makers toward the choices that public health defines as optimal?

Conclusion
This study conducted an empirical inquiry to understand the systematic biases that are present in binary choice decisions and to explore how they might be applicable in times of crisis. Specifically, we asked: What are the systematic biases observed in a controlled binary choice experiment with an uncertain and ambiguous decision environment? How might these systematic biases have influenced the binary choice decisions of public officials and politicians in the uncertain and ambiguous COVID-19 pandemic environment?
The findings from this study suggest that individuals adopt different decision rules depending on both personal attributes (e.g., skill set, sex, experience) and the context and environment in which the decision task is conducted. A number of important behaviors emerged from this simple experiment that may have influenced, and could help explain, the contradictory binary COVID-19-related decisions, including: (1) the importance of experience, (2) the importance of understanding the odds, (3) the size and difference in economic pay-off given the choice, (4) how a decision-maker values information, and (5) past outcomes conditional on choices made.
This cross-disciplinary research between BE and public health provides an opportunity to gain a better understanding of both the situational factors and the systematic biases that influence decision-making within the public health environment. The objective of continued research in this area would be to develop a more representative model of decision-making processes, particularly during crisis, that would serve to enhance future public health policy design.
Author Contributions: Conceptualization, K.R., E.R., A.P.; methodology, K.R., E.R., A.P.; validation, K.R., A.B. and E.R.; formal analysis, K.R., A.B.; investigation, K.R., E.R., and A.B.; resources, A.P.; data curation, K.R.; writing-original draft preparation, K.R., E.R., and A.B.; writing-review and editing, K.R., A.P., E.R., A.B.; visualization, A.B.; supervision, K.R., E.R.; project administration, A.P.; funding acquisition, K.R. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: The members of the University of Guelph, Ontario, Canada Research Ethics Board have examined the protocol which describes the participation of the human participants in the above-named research project and consider the procedures, as described by the applicant, to conform to the University's ethical standards and the Tri-Council Policy Statement, 2nd Edition.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to REB specifications.
Acknowledgments: We would like to acknowledge our Undergraduate Research Assistant, Vinuli Da Silva.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Experimental Design
We conducted six different treatments during 12 classroom sessions on the University of Guelph campus, Guelph, Ontario, with 180 students recruited by e-mail from the undergraduate Bachelor of Commerce student population. On average, subjects earned $33.60 for a 90-minute session. Each classroom session consisted of approximately 15 students who participated in 24 rounds of an individual task consisting of two binary-choice decisions per round, where the second binary-choice decision occurred after observing an imperfect, statistically relevant information signal.
Upon arrival, participants were given a handout explaining the experiment set-up and detailed instructions. The facilitator read the instructions aloud and demonstrated the experiment. The subjects were told that the amount of money that they would earn depends both on their individual choices and on random chance. In addition, they were told that the objective of the experiment is to maximize their earnings. Each subject participated in a practice round prior to commencing the rounds designated for payment.
At the beginning of each round, subjects are shown two opaque bags, each containing a combination of red and blue poker chips. The distribution of red to blue chips in the two bags is symmetric: one bag contains a greater proportion of red chips and the other a greater proportion of blue chips. For example, if bag 1 contains 35 red and 15 blue chips, bag 2 will contain 15 red and 35 blue chips. Subjects are told and shown the precise number and combination of red and blue chips contained within each bag.
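The informational value of a single chip can be made concrete with Bayes' rule. The sketch below uses the 35/15 example bags and assumes each bag is equally likely to be selected (as in step 1 of the procedure); the function name `posterior_bag1` is hypothetical and not part of the experimental materials.

```python
from fractions import Fraction


def posterior_bag1(red_in_bag1: int, blue_in_bag1: int, chip: str,
                   prior_bag1: Fraction = Fraction(1, 2)) -> Fraction:
    """Posterior probability that bag 1 is in use after one chip is drawn.

    The bags are symmetric: bag 2 holds bag 1's red and blue counts swapped.
    """
    total = red_in_bag1 + blue_in_bag1
    # Likelihood of the observed chip colour under each bag.
    if chip == "red":
        like1, like2 = Fraction(red_in_bag1, total), Fraction(blue_in_bag1, total)
    elif chip == "blue":
        like1, like2 = Fraction(blue_in_bag1, total), Fraction(red_in_bag1, total)
    else:
        raise ValueError("chip must be 'red' or 'blue'")
    # Bayes' rule: prior times likelihood, normalised over both bags.
    numerator = prior_bag1 * like1
    return numerator / (numerator + (1 - prior_bag1) * like2)


print(posterior_bag1(35, 15, "red"))   # 7/10
print(posterior_bag1(35, 15, "blue"))  # 3/10
```

With a 50/50 prior the posterior simply equals the share of the drawn colour in bag 1, which is why the chip is informative but imperfect: a red chip raises the probability of bag 1 from 0.5 to 0.7, not to certainty.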
The step-by-step procedure is outlined in Table 1 and described below.
In step 1, a random draw determines with equal probability which of the two bags described above is selected for use during the round. Participants do not learn which bag has been chosen until the end of the round. In step 2, subjects are asked to choose one of two actions (action A or action B), where each action is associated with two different payoff amounts depending on the bag that was randomly selected in step 1. In step 3, subjects are shown a sample draw of a poker chip (an imperfect message) from the selected bag. In step 4, subjects can either maintain the action choice selected in step 2, BEFORE observing the sample draw, or change their action choice AFTER observing the sample draw. Table 2 provides the information that is shown and communicated to the subjects prior to their first and second action choice decisions for rounds 1-4 when performing the FREE message decision task.
In step 5, a random draw determines with equal chance whether the subject's first or second action choice is used to calculate earnings. This payment mechanism incentivizes participants to apply effort to both action choices. In step 6, the bag that was used during the round is revealed. The action that was selected (1st or 2nd) based on the random draw in step 5 determines the size of the payment received by the participant, as outlined in Table 3. From Table 3, for rounds 1-4, if bag 1 is revealed as the bag selected in step 1 of the experiment, the participant will receive $2.00 if they selected action A and $0.50 if they selected action B. However, if bag 2 is revealed as the bag selected in step 1, the participant will receive $0.75 if they selected action A and $1.75 if they selected action B. Subjects are informed of their earnings each round. Subjects are asked to record their first and second action choices, the results of each of the random draws, whether they received payment for their first or second action choice, and their actual earnings for each round on the provided tracking sheet. The objective of the tracking sheet is to keep an account of each subject's history of events from past rounds to allow for the potential manifestation of reinforcement learning behaviour.
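The six steps can be sketched as a small simulation of one round. This is a minimal illustration under stated assumptions, not the software used in the sessions: it pairs the rounds 1-4 payoffs from Table 3 with the 35/15 example bags (the text does not state that rounds 1-4 used those chip counts), and the second-action rule `follow_the_chip` is a hypothetical strategy chosen for illustration.

```python
import random

# Rounds 1-4 payoffs from Table 3, in dollars, keyed by (action, bag).
PAYOFF = {("A", 1): 2.00, ("B", 1): 0.50, ("A", 2): 0.75, ("B", 2): 1.75}
# Assumption: chip counts borrowed from the 35/15 example in the text.
RED_CHIPS = {1: 35, 2: 15}
TOTAL_CHIPS = 50


def play_round(first_action, second_rule, rng):
    """Simulate steps 1-6 of one round; return (bag, chip, paid_choice, earnings)."""
    bag = rng.choice([1, 2])                          # step 1: bag selected with equal probability
    # step 2: first_action is fixed before any information is observed
    p_red = RED_CHIPS[bag] / TOTAL_CHIPS
    chip = "red" if rng.random() < p_red else "blue"  # step 3: one sample chip from the bag
    second_action = second_rule(chip)                 # step 4: keep or switch after the draw
    paid_choice = rng.choice([1, 2])                  # step 5: first or second choice pays, 50/50
    action = first_action if paid_choice == 1 else second_action
    earnings = PAYOFF[(action, bag)]                  # step 6: bag revealed, payment looked up
    return bag, chip, paid_choice, earnings


def follow_the_chip(chip):
    """Hypothetical second-action rule: A after a red chip, B after a blue chip."""
    return "A" if chip == "red" else "B"


rng = random.Random(2021)
rounds = [play_round("A", follow_the_chip, rng) for _ in range(100_000)]
mean_earnings = sum(r[3] for r in rounds) / len(rounds)
print(round(mean_earnings, 3))  # should be near 1.4375, the exact expectation under these assumptions
```

Under these assumptions the first action A is worth $1.375 in expectation and the chip-following second action $1.50, so the step 5 coin flip gives an expected $1.4375 per round. The seeded generator makes the run reproducible.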
The exogenous parameters (the distribution of red to blue chips in each bag and the payoffs associated with each action choice) remain constant for four consecutive rounds and then change. Given the exogenous parameters for this experiment, the risk-neutral (RN) optimal action taken prior to receiving an imperfect message is associated with a lottery that first-order stochastically dominates the alternative action's lottery in all rounds. Therefore, any expected utility maximizer with monotonic preferences should select the optimal first action regardless of risk preferences. The rationale for this design is to make the optimal first choice easy for subjects, allowing for a cleaner assessment of subject behaviour when selecting a second action conditional on an imperfect information signal.
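The dominance claim can be verified for rounds 1-4. The sketch below uses the Table 3 payoffs (in cents, to keep the arithmetic exact) and the equal prior from step 1; the helper names are hypothetical. It checks first-order stochastic dominance by comparing the two cumulative distribution functions at every payoff level: lottery F dominates G when F(x) <= G(x) everywhere, with strict inequality somewhere.

```python
from fractions import Fraction

# Rounds 1-4 payoffs (Table 3) in cents, so all arithmetic stays exact.
PAYOFF = {"A": {1: 200, 2: 75}, "B": {1: 50, 2: 175}}


def lottery(action, p_bag1):
    """The action's lottery as {payoff_in_cents: probability}."""
    lot = {}
    for bag, p in ((1, p_bag1), (2, 1 - p_bag1)):
        x = PAYOFF[action][bag]
        lot[x] = lot.get(x, 0) + p
    return lot


def expected_value(lot):
    return sum(x * p for x, p in lot.items())


def cdf(lot, x):
    return sum(p for v, p in lot.items() if v <= x)


def fosd(f, g):
    """True if lottery f first-order stochastically dominates lottery g.
    Both CDFs are step functions, so checking the pooled support is sufficient."""
    points = sorted(set(f) | set(g))
    diffs = [cdf(f, x) - cdf(g, x) for x in points]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


prior = Fraction(1, 2)  # step 1: each bag equally likely
a, b = lottery("A", prior), lottery("B", prior)
print(float(expected_value(a)) / 100, float(expected_value(b)) / 100)  # 1.375 1.125
print(fosd(a, b), fosd(b, a))  # True False
```

Because A's worst outcome ($0.75) beats B's worst ($0.50) and A's best ($2.00) beats B's best ($1.75) with equal probabilities, any decision-maker who prefers more money to less should pick A first, whatever their attitude to risk.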
Similarly, the 2nd RN optimal action conditional on a red chip message is also associated with the lottery that first-order stochastically dominates the alternative action's lottery in all rounds. Again, any expected utility maximizer with monotonic preferences should select the optimal action regardless of risk preferences. By contrast, conditional on a blue chip message, neither action choice is associated with a first- or second-order stochastically dominant lottery. Although risk preferences can now influence choice, the optimal second choice for the risk-neutral BEU maximizer remains the optimal choice over a wide range of constant relative and absolute risk aversion utility curves. Therefore, given this experimental design, the action choice following a blue chip message is more indicative than one following a red chip message of a subject's ability to follow the BEU decision rules.
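The robustness of the blue-chip choice to risk preferences can be probed with a constant relative risk aversion (CRRA) utility. The sketch below is illustrative only: it assumes the rounds 1-4 payoffs from Table 3 together with the 35/15 example bags (so the posterior probability of bag 1 is 0.7 after a red chip and 0.3 after a blue chip), and the range of CRRA coefficients tested is our own choice.

```python
import math

# Rounds 1-4 payoffs (Table 3), dollars: (payoff if bag 1, payoff if bag 2).
PAYOFF = {"A": (2.00, 0.75), "B": (0.50, 1.75)}
# Posterior P(bag 1 | chip) under a 50/50 prior, assuming the 35/15 example bags.
POSTERIOR_BAG1 = {"red": 0.7, "blue": 0.3}


def crra(x, r):
    """Constant relative risk aversion utility; r = 0 is risk neutral."""
    if r == 1:
        return math.log(x)
    return x ** (1 - r) / (1 - r)


def expected_utility(action, chip, r):
    p1 = POSTERIOR_BAG1[chip]
    pay1, pay2 = PAYOFF[action]
    return p1 * crra(pay1, r) + (1 - p1) * crra(pay2, r)


def best_action(chip, r):
    return max(("A", "B"), key=lambda a: expected_utility(a, chip, r))


for r in (0, 0.5, 1, 1.5, 2):
    print(r, best_action("red", r), best_action("blue", r))  # A after red, B after blue
```

For every coefficient tested, A remains best after a red chip (as first-order dominance guarantees) and B remains best after a blue chip. The range is wide but not unlimited: in this sketch, a coefficient of 3 flips the blue-chip choice to A, because B's $0.50 floor becomes too costly for a highly risk-averse decision-maker.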
There is one final note on the choice of the risk-neutrality assumption when establishing the BEU benchmark for comparison with subject behaviour. Arrow [68] demonstrates in his Essays in the Theory of Risk-Bearing that expected utility maximizers are (almost everywhere) arbitrarily close to risk-neutral behaviour when stakes are arbitrarily small. Rabin's calibration theorem [69] reinforces this point: within expected utility theory, anything but near-risk-neutrality over modest stakes implies implausibly high risk aversion over large stakes.
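Arrow's small-stakes argument can be illustrated numerically. The sketch below swaps in a constant absolute risk aversion (CARA) utility, u(x) = -exp(-a x), with a hypothetical coefficient a = 1 (neither the functional form nor the coefficient comes from the study), and shrinks the action A lottery for rounds 1-4 under the 50/50 prior. As the stake shrinks, the certainty equivalent approaches the expected value, i.e., the decision-maker behaves almost as if risk neutral.

```python
import math

# Action A lottery for rounds 1-4 under the 50/50 prior: (payoff in dollars, probability).
LOTTERY_A = [(2.00, 0.5), (0.75, 0.5)]


def cara_certainty_equivalent(lottery, a, scale):
    """Certainty equivalent of the scaled lottery for u(x) = -exp(-a x)."""
    mean_exp = sum(p * math.exp(-a * scale * x) for x, p in lottery)
    return -math.log(mean_exp) / a


def ce_to_ev_ratio(lottery, a, scale):
    """Certainty equivalent divided by expected value; 1 means risk-neutral behaviour."""
    ev = sum(p * scale * x for x, p in lottery)
    return cara_certainty_equivalent(lottery, a, scale) / ev


for scale in (1.0, 0.1, 0.01):
    print(scale, round(ce_to_ev_ratio(LOTTERY_A, a=1.0, scale=scale), 4))
```

Under these assumptions the ratio rises from roughly 0.87 at full stakes toward 1 at one hundredth of the stakes: the risk premium shrinks faster than the stake itself, which is the intuition behind treating risk neutrality as the benchmark for small payoffs.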

Appendix B. The BEU & RL Heuristics used as Benchmarks
The BEU Benchmark
A risk-neutral BEU participant takes an initial action given the unconditional (prior) probability of either state, with the objective of maximizing her expected earnings. In this experiment there are two possible states, represented by $S_j$, $j \in \{1,2\}$. Let the unconditional probability (initial belief) of playing in state $j$ be $P(S_j)$, where $\sum_{j} P(S_j) = 1$. Let $C(a,S_j)$ be the payoff if action $a$ is chosen conditional on the state $S_j$, where $a \in \{A,B\}$. The initial decision to choose action A or B is based on the prior probabilities of being in either state, $P(S_j)$, and the state-contingent payoffs associated with each action, $C(a,S_j)$. Specifically, prior to an informative but imperfect message signal, the risk-neutral BEU maximizer will choose action A over action B when:
$$E[C(A)] = P(S_1)\,C(A,S_1) + P(S_2)\,C(A,S_2) \;\geq\; E[C(B)] = P(S_1)\,C(B,S_1) + P(S_2)\,C(B,S_2)$$
The risk-neutral BEU maximizer is then provided with one of two possible randomly selected message signals. Let the two possible messages be $M_k$, $k \in \{1,2\}$, where $M_1$ is message 1 and $M_2$ is message 2. The participant is then required to propose a second action choice conditional on the message received. To do this, the BEU maximizer will first update her prior probabilities of being in either state to a new set of (posterior) probabilities using Bayes' theorem. Second, she will combine these updated probabilities with the payoffs to determine the expected payoff from taking either action, and then choose the action with the highest expected payoff.
Bayes' theorem states that the posterior probability that a risk-neutral BEU maximizer should attach to state $S_j$ after receiving message $M_k$, $P(S_j \mid M_k)$, is:
$$P(S_j \mid M_k) = \frac{P(M_k \mid S_j)\,P(S_j)}{P(M_k \mid S_1)\,P(S_1) + P(M_k \mid S_2)\,P(S_2)}$$
where $P(M_k \mid S_j)$ represents the likelihood of message $M_k$ conditional on state $S_j$. Note that regardless of the message received, one of the two states must hold. Therefore, $P(S_1 \mid M_k) + P(S_2 \mid M_k) = 1$.
The optimal choices of the risk-neutral BEU maximizer remain optimal over a wide range of constant relative and absolute risk aversion utility curves. Arrow [68] demonstrated that expected utility maximizers are (almost everywhere) arbitrarily close to risk-neutral behavior when stakes are arbitrarily small, and Rabin's calibration theorem [69] shows that, within expected utility theory, anything but near-risk-neutrality over modest stakes implies implausibly high risk aversion over large stakes. These findings and the exogenous parameter choices for this experiment provide a good rationale for the risk-neutral assumption.
The Reinforcement Learner (RL) Decision Rule
The Reinforcement Learner (RL) decision rule is based on the simple WIN-STAY, LOSE-SHIFT heuristic used by Charness and Levin (2005). If a subject is successful in the first round of the experiment, she will STAY with the same action choice in the second round (WIN-STAY), and if the subject is unsuccessful in the first round, she will SHIFT to the alternative action choice in the second round (LOSE-SHIFT), where both RL actions are predicated on the subject experiencing the same past history dictated by both the fixed and random exogenous parameters set by the experiment.
It is assumed that the subject will apply the WIN-STAY heuristic in a current round when the prior round's choice correctly identified the state associated with the higher payoffs (WIN: guessed the right state), and will apply the LOSE-SHIFT heuristic when the prior round's choice incorrectly identified the state associated with the higher payoffs (LOSE: guessed the wrong state). The WIN-STAY or LOSE-SHIFT heuristic is only applied in a current round if the exogenous parameter values experienced by the subject are the same as those experienced in a prior round. As such, there are fewer RL inconsistency observations than BEU inconsistency observations.
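The RL benchmark can be sketched as a small prediction function. This is a hypothetical illustration, not the authors' coding scheme: it assumes the rounds 1-4 payoffs from Table 3, treats a WIN as the prior choice being the higher-paying action in the revealed bag, and represents the exogenous parameters as an opaque label compared for equality (which prior-round choice and which parameters enter the comparison are assumptions).

```python
from typing import Optional

# Rounds 1-4 payoffs (Table 3), dollars, keyed by (action, bag).
PAYOFF = {("A", 1): 2.00, ("B", 1): 0.50, ("A", 2): 0.75, ("B", 2): 1.75}


def other(action: str) -> str:
    return "B" if action == "A" else "A"


def guessed_right_state(action: str, revealed_bag: int) -> bool:
    """WIN if the chosen action pays more than the alternative in the revealed bag."""
    return PAYOFF[(action, revealed_bag)] > PAYOFF[(other(action), revealed_bag)]


def rl_prediction(prev_action: str, prev_bag: int,
                  prev_params: object, curr_params: object) -> Optional[str]:
    """WIN-STAY, LOSE-SHIFT prediction for the current round.

    Returns None when the exogenous parameters differ from the prior round,
    because the heuristic is only applied when they match.
    """
    if prev_params != curr_params:
        return None
    if guessed_right_state(prev_action, prev_bag):
        return prev_action          # WIN-STAY
    return other(prev_action)       # LOSE-SHIFT


block = "rounds 1-4"  # hypothetical label for one set of exogenous parameters
print(rl_prediction("A", 1, block, block))         # A (won, so stay)
print(rl_prediction("A", 2, block, block))         # B (lost, so shift)
print(rl_prediction("A", 1, block, "rounds 5-8"))  # None (parameters changed)
```

Returning None rather than a guess mirrors the restriction in the text: rounds whose parameters differ from the prior round yield no RL prediction, which is why there are fewer RL observations than BEU observations.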