(C) PLOS One
This story was originally published by PLOS One and is unaltered.

Developmental shifts in computations used to detect environmental controllability [1]

Hillary A. Raab, Department of Psychology, New York University, New York, United States of America; Careen Foord, Center for Neural Science; Romain Ligneul, Champalimaud Research, Champalimaud Center for the Unknown

Date: 2022-07

Accurate assessment of environmental controllability enables individuals to adaptively adjust their behavior: exploiting rewards when desirable outcomes are contingent upon their actions and minimizing costly deliberation when their actions are inconsequential. However, it remains unclear how estimation of environmental controllability changes from childhood to adulthood. Ninety participants (ages 8–25) completed a task that covertly alternated between controllable and uncontrollable conditions, requiring them to explore different actions to discover the current degree of environmental controllability. We found that while children were able to distinguish controllable and uncontrollable conditions, accuracy of controllability assessments improved with age. Computational modeling revealed that whereas younger participants’ controllability assessments relied on evidence gleaned through random exploration, older participants more effectively recruited their task structure knowledge to make highly informative interventions. Age-related improvements in working memory mediated this qualitative shift toward increased use of an inferential strategy. Collectively, these findings reveal an age-related shift in the cognitive processes engaged to assess environmental controllability. Improved detection of environmental controllability may foster increasingly adaptive behavior over development by revealing when actions can be leveraged for one’s benefit.
The ability to determine when one’s actions are consequential organizes learning and decision making across the lifespan. However, few studies have examined how the ability to detect control over our environment changes from childhood to adulthood. Here, we leveraged a computational modeling framework to characterize the component learning processes underlying controllability assessment in children, adolescents, and adults. We observed age-related improvements in controllability assessment that stemmed from an increasing ability to represent contingencies between states and actions and to use that knowledge to make informative interventions that yield diagnostic evidence of the current degree of control. Increasing ability to accurately assess environmental controllability may confer greater recognition of opportunities to adaptively pursue rewards through goal-directed action across development.

Funding: This work was supported by a Jacobs Foundation Early Career Research Fellowship (to C.A.H.), a Klingenstein-Simons Fellowship in Neuroscience (to C.A.H.), a National Science Foundation CAREER grant 1654393 (to C.A.H.), a Fyssen Foundation postdoctoral award (to R.L.), a National Science Foundation Graduate Research Fellowship DGE1839302 (to H.A.R.), the NYU Vulnerable Brain Project, and the NIH-funded R90DA043849 NYU Training Program in Computational Neuroscience (to C.F.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Here, we asked whether children, adolescents, and adults differed in their ability to detect changes in environmental controllability and to use informative interventions to reveal these shifts in causal structure.
Ninety individuals, aged 8–25, performed a child-friendly adaptation of the ‘Explore-and-Predict’ task, a novel task designed to assess individuals’ ability to estimate the degree of environmental controllability in a dynamic, yet predictable, context and how their controllability beliefs are shaped by the informativeness of their interventions [ 54 ]. Throughout the task, participants flew with one of two pilots on different colored planes around a set of islands. A key feature of the task was that participants’ choices determined where one of the pilots would fly (i.e., the controllable condition), but critically had no influence on the flight path of the other pilot (i.e., the uncontrollable condition). The two pilot conditions alternated covertly throughout the task, and participants were instructed that they could earn more points if they accurately tracked the current condition. On exploratory trials, participants could make active interventions to infer whether their actions were consequential. On prediction trials, participants were asked to report the likely subsequent state, thereby revealing their current controllability beliefs. We used computational modeling to assess both the complexity of participants’ task structure representations and the degree to which their estimation process reflected the use of an inferential strategy. We hypothesized that the detection of environmental controllability would improve with age, reflecting the acquisition and use of mental models of environmental structure in controllability estimation across development. Whereas adults’ greater ability to maintain complex task structure representations in working memory may facilitate the use of informative interventions to infer controllability, younger individuals may, instead, rely on random exploration to generate evidence that can reveal the contingency structure of the environment. An environment can be considered controllable to the extent that actions bring about specific state transitions.
Such a causal coupling between the actions of an agent and the states of the environment can be estimated by comparing the predictability of upcoming states (e.g., dessert) when only considering previous states (e.g., dinner) versus when considering both previous states and actions (e.g., dinner at which one showed good manners) [ 52 , 53 ]. Incorporating actions into the predictive process will only improve one’s forecasts in controllable environments, where actions causally influence state transitions. Thus, computational models that compare the accuracy of state predictions based on states alone, versus those based on states and actions, can provide a formal account of the controllability estimation process [ 2 , 54 ]. Importantly, predictions about upcoming states can be made in different ways depending on the task and the amount of knowledge one has about its underlying structure [ 2 , 5 , 54 ]. By comparing how distinct computational models fit participants’ behavior at different stages of development, we can characterize the nature of the learning processes through which participants detect environmental controllability. As children may not be as adept as adults at making informative causal interventions, they may instead rely on other cognitive processes that can effectively support the detection of environmental controllability from an early age. Learning of statistical regularities present in the environment emerges early in development [ 42 – 44 ]. Younger individuals also tend to exhibit increased stochasticity in action selection [ 45 – 49 ]. In lieu of making highly informative interventions, younger individuals may discern the contingency structure of the environment by applying their robust statistical learning abilities to the observations of actions and outcomes generated through random exploration [ 50 , 51 ]. 
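The predictability comparison described at the start of this paragraph can be made concrete with a small sketch. The Python below is illustrative only and is not the authors’ model: it scores a list of (state, action, next state) observations by the mean log-probability of each next state under in-sample empirical frequencies, conditioned either on the state alone or on the state and action. The data are hypothetical and follow the dinner-and-dessert example; note that the friend’s-house data include the occasional poor-manners trials that serve as the informative intervention.

```python
import math
from collections import Counter, defaultdict


def mean_log_predictability(trials, use_action):
    """Mean log P(next_state | context) under in-sample empirical frequencies.

    trials: list of (state, action, next_state) tuples.
    use_action: condition on (state, action) if True, on state alone if False.
    """
    counts = defaultdict(Counter)
    for s, a, s_next in trials:
        counts[(s, a) if use_action else s][s_next] += 1
    total = 0.0
    for s, a, s_next in trials:
        c = counts[(s, a) if use_action else s]
        total += math.log(c[s_next] / sum(c.values()))
    return total / len(trials)


def controllability_gain(trials):
    """How much conditioning on actions improves predictability (0 = no gain)."""
    return (mean_log_predictability(trials, use_action=True)
            - mean_log_predictability(trials, use_action=False))


# Hypothetical data: at home, manners decide dessert;
# at the friend's house, dessert follows every meal regardless of manners.
home = [("home", "good manners", "dessert"), ("home", "poor manners", "no dessert")] * 5
friend = [("friend", "good manners", "dessert"), ("friend", "poor manners", "dessert")] * 5

print(round(controllability_gain(home), 4))    # actions fully predictive
print(round(controllability_gain(friend), 4))  # actions add nothing
```

Because both estimates are computed in-sample, the action-conditioned score can never be lower than the state-only score, so only a clearly positive gap is evidence of control; the models described in the Methods instead predict each transition before updating on it.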
As children inherently have less experience than adults across many situations, the coupling of greater intrinsic behavioral variability with robust statistical learning ability may be particularly advantageous, as it provides a generalizable strategy for detecting control across diverse environments [ 45 ]. An extensive literature has investigated the development of causal inference more broadly [ 28 , 29 ]. Preschoolers, and even infants as early as 8 weeks of age, can perform actions to reveal the causal structure of their environments [ 30 – 32 ]. However, younger children often choose less informative interventions in more complex tasks [ 30 , 33 , 34 ]. The ability to identify and select informative interventions to reveal the causal structure of the environment continues to develop across childhood and into adolescence [ 34 , 35 ]. While making causal interventions to test hypotheses about action-outcome contingencies is an efficient way to learn, it may require the use of a mental model of the environment, an ability that undergoes continued refinement from childhood into young adulthood [ 36 – 38 ]. Working memory, a cognitive process critical for maintaining and manipulating these mental representations of environmental structure [ 39 ], shows similar age-related improvements across adolescence [ 40 , 41 ] and may underpin both the ability to make informative interventions and the ability to use the resulting evidence to infer causal structures. Thus, developmental improvements in the cognitive processes that support effective intervention strategies and causal inference suggest that accurate assessment of the degree of environmental controllability may also exhibit marked age-related changes from childhood to adulthood.
However, given that the ability to control the environment is typically a premise in causal learning studies, it remains unclear how the ability to test the hypothesis of whether or not one’s own actions actually have causal efficacy might change with age. Experiences of environmental control have a profound effect on learning and behavior, shaping developmental trajectories from an early age [ 3 , 11 , 12 ]. Infants as young as two months old are sensitive to when outcomes are contingent upon their own actions [ 13 ], and early contingent social interaction influences diverse aspects of social and cognitive development, including language learning [ 14 , 15 ] and caregiver attachment [ 16 ]. Perceptions of control, whether actual or illusory, are often experienced as subjectively rewarding, a phenomenon proposed to underpin an intrinsic motivation for controllability [ 17 , 18 ]. Motivation to exert control is proposed to be a key driver of development [ 19 , 20 ]. Control facilitates learning and memory [ 21 , 22 ], even from a young age [ 23 , 24 ], and artificial agents equipped with a drive to exert control develop more complex action repertoires [ 25 , 26 ]. During adolescence, greater parental independence provides increased opportunity to make autonomous decisions, which may make accurate recognition of the contexts in which one’s actions are most consequential particularly beneficial [ 27 ]. Collectively, these findings suggest that detection of environmental controllability provides foundational knowledge about the structure of the environment that supports the development of an individual’s behavioral repertoire. Over the course of our lives, we are faced with the challenge of determining when our actions are consequential. In environments that are highly controllable, our actions can reliably produce a particular outcome, whereas in uncontrollable environments, our actions have no causal influence. 
By estimating the degree of contingency between actions and their resulting outcomes, individuals can assess the extent of control they have over their environment and adapt their behavior accordingly [ 1 – 6 ]. For example, imagine a child whose parents typically reward her good behavior at the dinner table with dessert. This child might be on her best behavior when eating with her parents because she has learned that her actions directly influence the likelihood of her getting a treat. When at her friend’s house, this same girl might assume that, just like at home, her behavior at the table will determine whether she can have dessert. Thus, she might expend energy minding her manners even if, in actuality, meals at her friend’s home always finish with dessert. Rather than simply generalizing prior beliefs about the controllability of the environment, individuals can make informative interventions to reveal the actual degree of contingency between actions and outcomes [ 7 – 10 ]. Returning to the girl in the example, always being on her best behavior at her friend’s house cannot test her assumption that her table manners are consequential. To disambiguate whether good behavior, or simply an indulgent parent, is responsible for her getting dessert, a more informative intervention would be for her to occasionally behave poorly. By varying her behavior (that is, by exploring), she can obtain better evidence about whether her actions influence the outcomes she receives when the true causal structure of the environment is unknown.

Methods

Ethics statement

This study was approved by the New York University Committee on Activities Involving Human Subjects (IRB #2016–1194). All participants, or a parent in the case of a minor, provided written consent prior to participation. Participants were compensated $15/hour and were instructed that they would receive a bonus payment based on their performance.
In reality, all participants earned a $5 bonus regardless of their performance.

Participants

As we did not know the size of our hypothesized effect, we targeted a sample size of 90 participants, based on previous developmental studies using computational modeling of choice behavior to characterize age-related cognitive changes [35,55–57]. Ninety-three participants, recruited from the New York City metropolitan area, completed questionnaires and a computer-based learning task. We excluded three participants due to technical errors during the task. Our final sample of 90 participants included thirty children (8–12 years old, mean age = 10.46, s.d. = 1.55, female n = 15), thirty adolescents (13–17 years old, mean age = 15.44, s.d. = 1.44, female n = 15), and thirty adults (18–25 years old, mean age = 22.06, s.d. = 2.30, female n = 15). The breakdown of participants’ self-identified race was as follows: 35.56% Caucasian/White, 16.67% African American/Black, 25.56% Asian, and 22.22% Mixed Race. In addition, 13.33% reported identifying as Hispanic/Latinx. Participants’ total combined family income for the previous twelve months ranged from less than $5,000 to $100,000 or greater. Exclusion criteria for the study included colorblindness, a diagnosis of psychiatric or learning disorders or disabilities, or the current use of psychoactive medications. All participants had normal or corrected-to-normal vision.

Assessment of control task

We assessed participants’ ability to detect the causal structure of their environment using a task that covertly switched between a controllable and an uncontrollable condition. This task was adapted from a previous version of the paradigm created for adults [54]. To make the study suitable for children, we decreased the complexity of the task by removing a second set of controllable and uncontrollable task transitions and by including a child-friendly narrative.
Participants acted as a travel guide for other passengers by providing them information about flights to three destinations (i.e., the volcano island, the palm tree island, and the lighthouse island). There were two pilots, representing the controllable and uncontrollable conditions, who could fly passengers around on three different colored planes (i.e., pink, green, orange). One of the pilots flew to the destination based on a specific route, whereas the other pilot flew to the destination depending on the color of the plane chosen (Fig 1A). Participants were never told which pilot was flying the plane, encouraging them to discover the current condition through their choices on exploratory trials. The participant’s goal was to learn where the plane would fly and which pilot was currently flying, in order to help other passengers find their way (Fig 1B). Participants were never told that the two pilots reflected controllability conditions, and the concept of controllability was not introduced in the instructions. To provide a general incentive to perform well throughout the task, participants were told that they would earn treasure for helping other passengers successfully reach their destinations, and that the more treasure they earned, the more bonus money they would receive. To avoid any potential age differences in the subjective value of the bonus, participants were not told the amount of bonus money that they could possibly earn or the conversion rate of treasure into money. The computerized task was coded in Psychtoolbox v3.0.14 with Matlab v2016b.

Fig 1. Task Design. (a) Two pilots fly participants around a set of islands. In the controllable condition, the pilot flies to an island based on the color of the plane that the participant selected, allowing participants’ choices to influence the subsequent state.
In the uncontrollable condition, the pilot flies along a particular route without considering the plane that the participant selected. (b) A schematic of the task is shown. The controllable and uncontrollable conditions of the task alternate covertly. After every six exploratory trials, participants are probed about the subsequent state (“Where is the plane most likely to fly next?”) and the condition (“Which pilot was flying the plane?”). (c) On every exploratory trial, participants see the current island and two planes. They can choose one plane to see where it will fly. (d) Participants can use their knowledge of the transition structure during the exploratory trials to select choices that reveal which pilot is flying the plane. Only discriminatory choices can be used to disentangle the current condition. https://doi.org/10.1371/journal.pcbi.1010120.g001

Exploratory trials

On each exploratory trial, participants saw the current island and two planes (Fig 1C). Participants chose one of the planes to see where it would fly. Notably, each pilot flew according to a particular rule. In the controllable condition, the color of the plane that the participant selected determined the destination. Thus, participants’ choices determined where the “color” pilot would fly (e.g., selecting the pink plane led to the palm tree, whereas selecting the green plane led to the volcano). In the uncontrollable condition, the pilot flew around the islands according to a particular route. The “route” pilot flew to the next destination in the route irrespective of the color of the plane that the participant had selected (e.g., from the volcano to the lighthouse to the palm tree). Prior to playing the game, participants were explicitly instructed about the flight routes that governed each pilot’s flying patterns. As the pilots alternated covertly, participants could use their knowledge of each pilot’s transition structure to discover the current condition.
Critically, only one of the two planes displayed on each exploratory trial could be diagnostic as to the current condition. Choosing the other plane was not informative, as it flew to the same island in both the controllable and the uncontrollable conditions. In the example shown in Fig 1C, selecting the pink plane would lead to the palm tree in both the controllable and the uncontrollable condition. Thus, from that departure island, selecting the pink plane contributes to learning transition probabilities but is uninformative as to the current condition. Instead, the green plane is the informative choice (Fig 1D). In the uncontrollable condition, selecting the green plane would lead to the palm tree, because the “route” pilot visits the palm tree after the lighthouse. However, in the controllable condition, the green plane would lead to the island with the volcano. Only by selecting this diagnostic choice (i.e., the green plane) could participants obtain information that enables discrimination between conditions. The controllable and uncontrollable conditions alternated eleven times during the 360 exploratory trials of the task. Although the total number of exploratory trials remained the same for all participants, the number of exploratory trials prior to each shift in condition was variable (between 18 and 42 trials), and the order of these condition intervals was shuffled between participants (i.e., all participants experienced the same interval lengths, just in a different order). Participants were prompted to take a short break three times during the task, resulting in four runs. The length of each run also varied between participants and consisted of no fewer than 72 and no more than 108 exploratory trials. Following each break, participants were reminded of the transition structures for both conditions.
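The logic of diagnostic choices can be checked mechanically. The sketch below is a hypothetical encoding, not the task code (which was written in Psychtoolbox/Matlab). The pink → palm tree and green → volcano mappings and the volcano → lighthouse → palm tree route come from the text; orange → lighthouse and palm tree → volcano (closing the route into a loop) are assumptions made only to complete the tables. A plane is treated as diagnostic from a given island when the two pilots would send it to different destinations.

```python
# Hypothetical encoding of the two pilots' deterministic rules.
# From the text: pink -> palm tree, green -> volcano; orange -> lighthouse is ASSUMED.
COLOR_PILOT = {"pink": "palm tree", "green": "volcano", "orange": "lighthouse"}

# From the text: volcano -> lighthouse -> palm tree; palm tree -> volcano is ASSUMED.
ROUTE_PILOT = {"volcano": "lighthouse", "lighthouse": "palm tree", "palm tree": "volcano"}


def destination(condition, island, plane):
    """Rule-following destination: plane color matters only in the controllable condition."""
    if condition == "controllable":
        return COLOR_PILOT[plane]
    return ROUTE_PILOT[island]


def is_diagnostic(island, plane):
    """A choice is diagnostic if the two conditions predict different destinations."""
    return (destination("controllable", island, plane)
            != destination("uncontrollable", island, plane))


def diagnostic_planes(island, offered):
    """Return the offered planes whose outcome can discriminate between the conditions."""
    return [plane for plane in offered if is_diagnostic(island, plane)]


# Fig 1C example: departing the lighthouse with the pink and green planes on offer.
print(diagnostic_planes("lighthouse", ["pink", "green"]))
```

Under these assumed tables, every island has exactly one color whose destination matches the route pilot’s next stop, which is consistent with the design in which one of the two offered planes is always uninformative.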
To make learning more challenging, the transitions were probabilistic throughout the four runs of the task. Participants were told that sometimes air traffic control instructed the pilot where to fly, but that most of the time pilots followed the usual pattern. During the first and third runs, the planes transitioned as described above 90% of the time, and went to each of the other two islands (i.e., the current island and the remaining “off-rule” island) 5% of the time. During the second and fourth runs, we changed the transition probabilities to make the task more difficult: the planes transitioned as described above 80% of the time, and to each of the other two islands 10% of the time. Participants were not told about these shifts in transition probabilities. Whereas previous studies have manipulated controllability by changing the probability that an action will yield a specific reward outcome [2,58], this task equates state-transition probabilities across the controllable and uncontrollable conditions, eliminating the potential confound of coupling controllability with predictability [53].

State predictions

After every six exploratory trials, participants were asked to predict where a particular plane would fly next. Participants were sequentially asked about the two different colored planes that could depart from a given island. In the uncontrollable condition, correct predictions required selecting the same island for both planes because the color of the plane (i.e., the choice the participant made) did not influence which island appeared next. Conversely, accurate responses in the controllable condition required making divergent predictions about where the different colored planes would fly because the next island was contingent upon the color of the plane. In this way, state predictions could reveal the participant’s current beliefs about control.
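Two mechanics described in this paragraph, the probabilistic transitions and the logic of state-prediction pairs, can be sketched together. The Python below is a hypothetical illustration, not the task code. The run-wise probabilities (90/5/5 and 80/10/10) come from the text; the pilot tables include two assumed entries (orange → lighthouse and palm tree → volcano) added only to complete them; and predictions are assumed to be scored against the rule-following destination. Identical predictions for both planes imply a belief that the environment is uncontrollable, and divergent predictions imply a belief that it is controllable.

```python
import random

ISLANDS = ("volcano", "palm tree", "lighthouse")
P_RULE = {1: 0.90, 2: 0.80, 3: 0.90, 4: 0.80}  # rule-following probability per run (from the text)

# Pilot rules; entries marked ASSUMED are not given in the text.
COLOR_PILOT = {"pink": "palm tree", "green": "volcano", "orange": "lighthouse"}  # orange ASSUMED
ROUTE_PILOT = {"volcano": "lighthouse", "lighthouse": "palm tree",
               "palm tree": "volcano"}  # palm tree -> volcano ASSUMED


def rule_destination(condition, island, plane):
    """Where the plane goes if the current pilot follows the usual pattern."""
    return COLOR_PILOT[plane] if condition == "controllable" else ROUTE_PILOT[island]


def transition_probs(rule_dest, run):
    """The rule destination gets P_RULE[run]; the two off-rule islands split the rest."""
    p_rule = P_RULE[run]
    p_off = (1.0 - p_rule) / 2
    return {isl: (p_rule if isl == rule_dest else p_off) for isl in ISLANDS}


def sample_destination(condition, island, plane, run, rng=random):
    """Simulate the island a plane actually lands on during one exploratory trial."""
    probs = transition_probs(rule_destination(condition, island, plane), run)
    islands = list(probs)
    return rng.choices(islands, weights=[probs[i] for i in islands], k=1)[0]


def score_prediction_pair(condition, island, planes, predictions):
    """Correctness of each state prediction against the rule-following destination."""
    return [pred == rule_destination(condition, island, plane)
            for plane, pred in zip(planes, predictions)]


def implied_belief(predictions):
    """Same island for both planes implies 'uncontrollable'; divergent implies 'controllable'."""
    return "uncontrollable" if predictions[0] == predictions[1] else "controllable"


rng = random.Random(0)
draws = [sample_destination("uncontrollable", "lighthouse", "green", 2, rng)
         for _ in range(10_000)]
print(draws.count("palm tree") / len(draws))  # close to 0.80 in runs 2 and 4

pair = ("palm tree", "palm tree")
print(score_prediction_pair("uncontrollable", "lighthouse", ("pink", "green"), pair),
      implied_belief(pair))
```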
Feedback was given on only one of the two state predictions, chosen at random, to incentivize learning without revealing the underlying condition. On reinforced predictions, correct responses yielded an image of a chest full of treasure and incorrect responses led to an image of an empty treasure chest. On the unreinforced half of predictions, no treasure chest was displayed. In total, there were 60 prediction pairs (120 state predictions).

Condition predictions

Following each pair of state predictions, participants were asked which pilot was flying the plane, as a direct index of control beliefs that did not depend on accurate knowledge of the task structure. Participants saw a picture of the two pilots side by side and then selected the pilot they believed to have been flying the plane, reflecting their beliefs about the current condition. No feedback was given on any of these 60 trials.

Training phase

Prior to the task, participants were given explicit instructions about where each pilot would fly and had practice playing the game. The training proceeded in the same manner as the task except that all state transitions for the exploratory trials were deterministic and participants were told which pilot was flying the plane. As in the task, participants were asked to predict where the plane would fly and which pilot was flying the plane. In the training, unlike during the task, feedback was given for every prediction. The same flight paths for the “color” and “route” pilots were used in both the training and the task. Thus, participants became quite knowledgeable about the structure of the task by the end of the practice. The practice consisted of 48 exploratory trials and eight sets of predictions (each set consisting of two state predictions and one condition prediction), split evenly between the controllable and uncontrollable conditions.

Post-task questions

After completing the game, participants answered six questions about the structure of the task.
For each pilot, participants were asked where the plane would fly next based on either the color of the plane (controllable condition only) or the current location (uncontrollable condition only). Participants were shown all three destinations and asked to select the correct one for each pilot.

Secondary measures

As we were interested in the underlying role of working memory in the ability to assess control, participants completed a short list-sorting working memory task from the NIH Toolbox Cognition Battery, which has previously been demonstrated to have high reliability and good construct validity in child and adolescent samples [59]. Participants were shown up to seven items from the same category (e.g., food or animals) displayed one at a time on an iPad. As each image was displayed, participants heard the name of the item. Participants were instructed to repeat the items back to the experimenter in order of increasing size. In the next part of the task, participants were presented with up to seven items from two distinct categories and asked to sort all the items from one category by size before sorting the items from the other category by size. Thus, performing well on this task required maintaining and manipulating items in mind. In all analyses except the mediation analysis, which used raw scores, we used age-corrected working memory scores to assess whether differences in working memory, independent of age, were related to behavioral performance. Because the working memory task was added to the experimental protocol after data collection had begun, data from one male and five female adolescents are not included in any of the working memory analyses, resulting in a sample size of 84 participants. To ensure that age was not confounded with differences in age-normed reasoning ability, we also administered the Vocabulary and Matrix Reasoning subtests of the Wechsler Abbreviated Scale of Intelligence (WASI).
We observed no significant age differences in age-normed WASI scores [60] (see S1 Appendix). We also administered the internal locus of control questionnaire to assess subjective sense of control [61] and the MacArthur socioeconomic status questionnaire for exploratory analyses; these results are not reported here.

Statistical analyses

To examine both linear and quadratic effects of age, we conducted likelihood ratio tests for logistic models and ANOVAs for linear models to determine whether including age alone, or both age and age-squared, as predictors provided a significantly better fit. We report which model provided a better fit, along with the corresponding statistics from the winning model. Continuous and interval predictors in the regression models were z-scored for interpretability. Age was treated as a continuous variable unless otherwise noted; categorical age bins were used only for visualization. Age-squared was calculated by squaring the z-scored age. Behavioral analyses were performed using R version 3.5.2 [62] and Matlab 2016a (Mathworks). All p-values reflect a two-tailed alpha threshold of p < .05. Mixed-effects models were fit in R using the afex package, version 0.22-1 [63]. We used the “bobyqa” optimizer and set the number of model iterations to one million. The maximal model was specified to minimize Type I error [64], except where noted. If the maximal model did not converge or resulted in a singular fit, we reduced the random-effects structure until the model converged. Details on model specification and full results can be found in S1 Appendix. Mediation analyses were performed using the mediation package in R [65]. Confidence intervals were estimated using 10,000 bootstrapped samples to test the significance of the mediation effects.
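The linear-versus-quadratic age comparison for a continuous outcome can be illustrated with a nested-model F test. The sketch below is a dependency-free Python stand-in for the R workflow described above, not the analysis code: it z-scores age using the sample standard deviation (as R’s `scale()` does), squares the z-scored age, fits both ordinary least-squares models through their normal equations, and returns the F statistic for adding the age-squared term. The ages and outcome in the usage lines are hypothetical.

```python
import math


def zscore(xs):
    """Standardize using the sample standard deviation (n - 1)."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]


def ols_rss(columns, y):
    """Residual sum of squares for y ~ intercept + columns, via the normal equations."""
    X = [[1.0, *row] for row in zip(*columns)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))


def age_model_f_test(age, y):
    """F statistic (and degrees of freedom) for adding age-squared to a linear age model."""
    z = zscore(age)
    z_sq = [v * v for v in z]  # age-squared = squared z-scored age
    rss_linear = ols_rss([z], y)
    rss_quadratic = ols_rss([z, z_sq], y)
    df_resid = len(y) - 3  # intercept, age, age-squared
    f_stat = (rss_linear - rss_quadratic) / (rss_quadratic / df_resid)
    return f_stat, 1, df_resid


# Hypothetical U-shaped outcome across ages 8-25 with small alternating noise.
ages = list(range(8, 26))
outcome = [(a - 16.5) ** 2 + (0.1 if i % 2 else -0.1) for i, a in enumerate(ages)]
print(age_model_f_test(ages, outcome))
```

A p-value would come from the upper tail of an F(1, n − 3) distribution, for example via `scipy.stats.f.sf(f_stat, 1, n - 3)`; for the logistic models, the text instead describes likelihood ratio tests.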
Computational modeling

To gain insight into the cognitive mechanisms underlying participants’ choices and the source of potential age-related differences, we fit four models (the Spectator, Actor, Learned Transition Structure, and Task Set models) inspired by a previous study in adults using a more complex version of the task [54]. These models formalize different ways of making predictions about the state that will be encountered next (s′): by estimating transition probabilities based on states alone (Spectator model) or on states and actions (Actor model); by estimating controllability based on the difference between the Spectator and Actor predictions (Learned Transition Structure model); or by inferring controllability based on perfect knowledge of the task transition structure (Task Set model), which affords the ability to make diagnostic interventions from the start of the task. The Task Set model is the only model that formalizes an inferential strategy and requires representation of the complete task structure in working memory. Across the set, the models require tracking an increasingly large set of transition probabilities, and inference additionally requires the maintenance and manipulation of these complex structures in working memory. For all but the Task Set model, predictions for transitions that were experienced are updated using an error-driven process. The prediction error captures the difference between the experienced transition (which was coded as 1) and the predicted transition probability. The extent to which learned transition probabilities are updated by the most recent prediction error is governed by the learning rate (α). Predictions for the transitions that did not occur are decremented, ensuring that transition probabilities sum to 1 (see S2 Appendix for details). The Spectator model makes predictions about the subsequent state (P(s′|s)) based solely on states (s), without taking actions into account (thus assuming environmental uncontrollability; Eq 1).
$$P(s' \mid s) \leftarrow P(s' \mid s) + \alpha \left[ 1 - P(s' \mid s) \right] \qquad (1)$$

The Actor model makes these predictions (P(s′|s, a)) based on states (s) and actions (a) (thus assuming environmental controllability; Eq 2).

$$P(s' \mid s, a) \leftarrow P(s' \mid s, a) + \alpha \left[ 1 - P(s' \mid s, a) \right] \qquad (2)$$

The remaining two models, the Learned Transition Structure model and the Task Set model, both dynamically estimate the causal influence of actions over state transitions by comparing the predictions of the Actor and Spectator models about subsequent transitions (P(s′|s, a) − P(s′|s)). This expected difference, Ω, represents an online estimate of the degree of controllability of the environment. In a controllable environment, actions contribute to predictions about the upcoming state; therefore, there will be an action for which P(s′|s, a) > P(s′|s). Higher Ω values provide evidence that the environment is more controllable. Unlike the Spectator and Actor models, both controllability models have a second-order learning rate (α_Ω), which governs the updating of the expected difference in predictive capability between the Actor and the Spectator models (i.e., P(s′|s, a) − P(s′|s); Eq 3).

$$\Omega \leftarrow \Omega + \alpha_{\Omega} \left[ \left( P(s' \mid s, a) - P(s' \mid s) \right) - \Omega \right] \qquad (3)$$

To transform Ω into a probability between 0 and 1 that can be used for prediction, and to capture distinct ways in which estimates of control may be biased, Ω is transformed into an “arbitrator”, ω, using a two-parameter sigmoidal function (Eq 4).

$$\omega = \frac{1}{1 + \exp\left( -\beta_{\Omega} \left( \Omega - \mathrm{bias}_{\Omega} \right) \right)} \qquad (4)$$

The bias parameter (bias_Ω) in the sigmoidal function acts as a threshold above which Ω is interpreted as evidence in favor of a controllable environment, and can thus capture persistent biases toward estimates of controllability or uncontrollability. The slope parameter of the sigmoidal transformation (β_Ω) determines the extent to which the more likely first-order model (i.e., the Actor or the Spectator, depending on whether Ω is above or below the bias estimate) is given priority when making predictions about a future state.
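Equations 1 to 4 can be sketched as a few update functions. This Python is an illustration under stated assumptions, not the authors’ fitting code (see S2 Appendix): unobserved transitions are assumed to be decremented multiplicatively by (1 − α), which is one simple way to keep each distribution summing to 1; transition tables are assumed to start uniform and Ω to start at 0; and Ω is assumed to be updated with the predictions made before the current transition is learned. The two toy sequences are hypothetical.

```python
import math

ISLANDS = ("volcano", "palm tree", "lighthouse")


def uniform():
    return {s: 1.0 / len(ISLANDS) for s in ISLANDS}


def delta_update(probs, observed, alpha):
    """Eqs 1-2: move P(observed) toward 1; scale the others by (1 - alpha) (ASSUMED decrement)."""
    return {s: p + alpha * ((1.0 if s == observed else 0.0) - p) for s, p in probs.items()}


def update_omega(omega, p_actor, p_spectator, alpha_omega):
    """Eq 3: track the expected difference P(s'|s,a) - P(s'|s) for the observed s'."""
    return omega + alpha_omega * ((p_actor - p_spectator) - omega)


def arbitrator(omega, bias_omega, beta_omega):
    """Eq 4: sigmoid weight on the Actor model; equals 0.5 when omega == bias_omega."""
    return 1.0 / (1.0 + math.exp(-beta_omega * (omega - bias_omega)))


def run_learner(trials, alpha=0.3, alpha_omega=0.2):
    """Learned Transition Structure updates over (state, action, next_state) trials."""
    spectator, actor, omega = {}, {}, 0.0  # initial values ASSUMED
    for s, a, s_next in trials:
        p_spec = spectator.setdefault(s, uniform())
        p_act = actor.setdefault((s, a), uniform())
        # Omega uses pre-update predictions of the observed transition (ASSUMED ordering).
        omega = update_omega(omega, p_act[s_next], p_spec[s_next], alpha_omega)
        spectator[s] = delta_update(p_spec, s_next, alpha)
        actor[(s, a)] = delta_update(p_act, s_next, alpha)
    return omega


# Hypothetical sequences departing the lighthouse, alternating pink and green planes.
controllable = [("lighthouse", "pink", "palm tree"), ("lighthouse", "green", "volcano")] * 100
uncontrollable = [("lighthouse", "pink", "palm tree"), ("lighthouse", "green", "palm tree")] * 100

for name, trials in (("controllable", controllable), ("uncontrollable", uncontrollable)):
    omega = run_learner(trials)
    print(name, round(omega, 3), round(arbitrator(omega, bias_omega=0.1, beta_omega=10.0), 3))
```

In the controllable sequence, the Actor tables converge toward certainty while the Spectator keeps splitting its predictions between two islands, so Ω settles well above 0; in the uncontrollable sequence, both models converge on the palm tree and Ω decays toward 0.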
For the Learned Transition Structure and Task Set models, state predictions are made by weighting the Spectator and Actor models using the arbitrator ω (Eq 5).

$$P_{\mathrm{arb}}(s' \mid s, a) = \omega \, P(s' \mid s, a) + (1 - \omega) \, P(s' \mid s) \qquad (5)$$

The Learned Transition Structure model and the Task Set model differ in how the transition structure is learned. The Learned Transition Structure model updates the state-state and state-action-state transition probabilities from experience, whereas the Task Set model uses prior knowledge about the rules governing the task transition structure to infer the degree of controllability of the environment and make state predictions. Therefore, in the Task Set model, transition probabilities are fully pre-learned and set to either 1 or 0 based on the rules governing state-state and state-action-state transitions that were explicitly instructed during the training phase. As transition probabilities for this model are not updated, the first-order learning rate is set to 0. Within the Task Set model, the Spectator and Actor models explicitly represent the two possible task sets that alternate covertly during the experiment, and the arbitrator ω, derived from Ω, represents the arbitration between these task sets when making a prediction. Thus, the Task Set model reflects the use of an inferential (or hypothesis-testing) strategy, as opposed to the Learned Transition Structure model, which reflects continuous updating based on experience. For all four models, the probability that the participant predicts a given next state is determined by a softmax equation (Eq 6). An inverse temperature parameter (β_choice) controls choice consistency with respect to predicted transitions. Higher values of β_choice imply that the participant systematically selected the most likely transition, whereas values close to 0 imply that the participant guessed randomly on prediction trials.
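The arbitration-weighted prediction (Eq 5) and the softmax choice rule (Eq 6) described above can be sketched for the Task Set model, whose Spectator and Actor tables are fixed at 0 or 1 by the instructed rules. The Python below is hypothetical: the example uses the Fig 1C situation (departing the lighthouse on the green plane, where the color rule says volcano and the route rule says palm tree) with made-up values of ω and β_choice. The BIC helper uses the standard formula k ln n + 2 × (negative log-likelihood); treating that as the exact computation used in the study is an assumption, since the fitting details are in S2 Appendix.

```python
import math

ISLANDS = ("volcano", "palm tree", "lighthouse")


def one_hot(island):
    """Task Set model: instructed rules fix transition probabilities at 0 or 1."""
    return {s: (1.0 if s == island else 0.0) for s in ISLANDS}


def mixed_prediction(p_actor, p_spectator, w):
    """Eq 5: arbitrator-weighted mixture of Actor and Spectator predictions."""
    return {s: w * p_actor[s] + (1.0 - w) * p_spectator[s] for s in ISLANDS}


def softmax(predicted, beta_choice):
    """Eq 6: probability of reporting each island as the next state."""
    top = max(predicted.values())  # subtract the max for numerical stability
    weights = {s: math.exp(beta_choice * (p - top)) for s, p in predicted.items()}
    total = sum(weights.values())
    return {s: v / total for s, v in weights.items()}


def negative_log_likelihood(choice_probs, reported):
    """Sum of -log P(reported island) over prediction trials."""
    return -sum(math.log(p[r]) for p, r in zip(choice_probs, reported))


def bic(nll, n_params, n_trials):
    """Standard BIC: k ln(n) + 2 * negative log-likelihood."""
    return n_params * math.log(n_trials) + 2.0 * nll


# Departing the lighthouse on the green plane (Fig 1C): Actor says volcano, Spectator says palm tree.
p_actor, p_spectator = one_hot("volcano"), one_hot("palm tree")
pred = mixed_prediction(p_actor, p_spectator, w=0.8)  # hypothetical arbitrator value
probs = softmax(pred, beta_choice=5.0)                # hypothetical inverse temperature
print({s: round(p, 3) for s, p in probs.items()})
```

Setting w to 1 or 0 recovers the pure Actor or Spectator prediction, and with β_choice = 0 every island is equally likely, matching the random-guessing interpretation in the text. For a participant’s 120 state predictions, `bic(nll, n_params=4, n_trials=120)` would score the four-parameter Task Set model.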
$$P(\text{predict } s') = \frac{\exp\left( \beta_{\mathrm{choice}} \, P(s') \right)}{\sum_{s''} \exp\left( \beta_{\mathrm{choice}} \, P(s'') \right)} \qquad (6)$$

where P(s′) is the model’s predicted probability of a transition to s′.

In total, the Learned Transition Structure model has five free parameters: two learning rates, two parameters that transform Ω into ω, and an inverse temperature. The Task Set model has only four free parameters, as the first-order learning rate is set to 0. The Spectator model and the Actor model both have two free parameters: a learning rate and an inverse temperature. Model variables were updated in the same manner following every exploratory trial and on state prediction trials that ended with feedback. However, only state predictions were used to constrain model fits, as there were no correct choices during exploration (even though there were informative and uninformative choices). The Bayesian Information Criterion (BIC) was used for model comparison. A full description of the model space, fitting, and model comparison procedures is available in the S2 Appendix (see also S2 Fig).

[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010120

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/