(C) PLOS One. This story was originally published by PLOS One and is unaltered.

The computational relationship between reinforcement learning, social inference, and paranoia [1]

Authors and affiliations (as listed): Joseph M. Barnby; Department of Psychology, Royal Holloway, University of London, London, United Kingdom; Cultural and Social Neuroscience Group, Department of Neuroimaging, Institute of Psychiatry

Date: 2022-08

Theoretical accounts suggest that heightened uncertainty about the state of the world underpins aberrant belief updates, which in turn increase the risk of developing a persecutory delusion. However, this raises the question as to how an agent’s uncertainty may relate to the precise phenomenology of paranoia, as opposed to other qualitatively different forms of belief. We tested whether the same population (n = 693) responded similarly to non-social and social contingency changes in a probabilistic reversal learning task and a modified repeated reversal Dictator game, and the impact of paranoia on both. We fitted computational models that included closely related parameters that quantified the rigidity across contingency reversals and the uncertainty about the environment/partner. Consistent with prior work, we show that paranoia was associated with uncertainty around a partner’s behavioural policy and rigidity in harmful intent attributions in the social task. In the non-social task we found that pre-existing paranoia was associated with larger decision temperatures and commitment to suboptimal cards. We show relationships between decision temperature in the non-social task and priors over harmful intent attributions and uncertainty over beliefs about partners in the social task.
Our results converge across both classes of model, suggesting paranoia is associated with a general uncertainty over the state of the world (and agents within it) that takes longer to resolve, although we demonstrate that this uncertainty is expressed asymmetrically in social contexts. Our model and data allow the representation of sociocognitive mechanisms that explain persecutory delusions and provide testable, phenomenologically relevant predictions for causal experiments. Responding to shifts in inanimate and social environments is important for adaptation and appropriate communication. Studies have demonstrated generic cognitive distortions to the processing of information in shifting contexts to underpin or accompany the development of symptoms of severe mental disorders, such as persecutory delusions. However, given the clear social phenomenology and clinical needs regarding social function which accompany persecutory delusions, explanations that detail how changes in generic cognition dovetail with social cognition are urgently needed. We addressed this gap by measuring the relationship between computational mechanisms governing non-social decision making and social inferences upon reversal of task contingencies, and the impact of pre-existing paranoia. We found that paranoia was related to uncertainty in both non-social and social contexts, and crucially, increased non-social uncertainty was related to changes in sociocognitive parameters. Paranoia was related to context-dependent, asymmetric biases in prior beliefs and belief-updating in social contexts. Importantly, paranoia increased the propensity to explain behaviour shifting away from beliefs about harm intent through alternative attributions. Our model and data bridges non-social and social theory explaining persecutory delusions and provides a mechanistic, phenomenologically relevant framework for causal experiments. 
Funding: JMB was supported by the UK Medical Research Council (MR/N013700/1) and King's College London member of the MRC Doctoral Training Partnership in Biomedical Sciences. MM is supported by the Wellcome Trust as a member of the ‘Neuroscience in Psychiatry Project’ (NSPN) which is funded by a Wellcome Strategic Award (ref 095844/7/11/Z). The Max Planck – UCL Centre for Computational Psychiatry and Ageing is a joint initiative of the Max Planck Society and UCL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. In line with prior evidence, we predicted that during the probabilistic reversal learning task paranoia would be associated with lower decision consistency, greater win-switch rates, and greater perseveration errors following the reversal. In the modified repeated reversal Dictator game, we hypothesised that higher paranoia would lead to rigidity in harmful intent attributions formed about a partner when a partner’s behaviour changes, regardless of whether they were fair or unfair pre-reversal. In an exploratory analysis we tested the relationship of individual parameter values in the non-social task with parameters derived from the social model to understand how biases in probabilistic learning may be expressed in social contexts. In this set of experiments, we build bridges between formal, domain-general accounts of probabilistic reasoning and changes to social-cognitive representations central to paranoia. We tested whether participants varying in paranoid ideation displayed differences and/or commonalities in social and non-social reversal learning, inference, and decision consistency. 
If paranoia is simply an example of a dysfunctional but general reinforcement learning mechanism applied to social interaction, we should expect all types of motivational attributions to be influenced in similar ways, irrespective of content: harmful intent and self-interest judgements should both be affected in parallel by higher pre-existing paranoid beliefs when changes in a partner’s behaviour could be due to either motive. Alternatively, if intention attributions are not affected in the same way by a partner’s behavioural changes, it is likely that domain-general neurocomputational changes are subject to differentiated interactions with the specifics of social cognition. This makes it important to understand the mechanisms giving rise to social asymmetries. We used conceptually similar probabilistic social and non-social tasks in the same large population to detect such key cognitive differences. Building on previous work [ 18 ], we built separate computational models to capture behavioural (choice) and inferential differences within each task. Each model quantified decision/inferential uncertainty as precision in the agent’s decision making, or precision of an agent’s beliefs about how closely their partner’s decisions reflected their true intent, respectively. Each model also quantified participants’ response to contingency reversals. Experimentally demonstrating the phenomenological relevance of reinforcement learning in paranoia is important as we move as a field to develop more precise formal models of persecutory delusions. Current neurocognitive theories of persecutory delusions suggest associative learning mechanisms underpin the development of positive symptoms in psychosis [ 11 – 12 ], particularly through poor integration of lower perceptual information leading to uncertainty over beliefs about the world [ 13 ]. 
However, theories that implicate the role of reinforcement learning biases in persecutory delusions need to explain how learning biases lead to phenomenologically relevant experiences that form the basis for current cognitive models of persecutory delusion formation and maintenance in the clinic [ 14 – 16 ]. Indeed, formalised models that can accommodate the rich state space of social contexts have been called for more broadly [ 17 ]; formal explanations of social interaction must ensure learning is outlined explicitly in relation to how we probabilistically represent beliefs about ourselves and others.

Psychiatric disorders are characterised by difficulties in social interaction and poor adaptation to new environments. In the case of persecutory delusions, individuals hold unwarranted beliefs that others intend to harm them, even in the absence of tangible evidence. Formal modelling of choice behaviour has suggested paranoia is characterised by increased perseveration and greater non-deterministic action preferences, which are attributed to higher expectations of volatility in the environment [ 1 – 4 ]. These studies used probabilistic learning tasks with changing reward probabilities over time, in the absence of a discernible agent controlling the contingency shifts (e.g., [ 5 – 6 ]). To examine reinforcement learning observations within social contexts relevant to paranoia, experimenters have also framed probabilistic tasks in terms of interaction with social agents, demonstrating that those with higher paranoia are slower learners and more sensitive to changes in the social environment [ 7 ], more rigid in their beliefs about partners [ 8 ], and less likely to take advice from partners [ 9 – 10 ].

(A) Spearman correlations between decision temperature and mean attributions observed summed across 20 trials for each participant.
(B) Permutation analysis of the relationship between decision temperature, and computational model-based parameters from the winning model and pre-existing paranoia. The grey distribution represents the null distribution following random sampling of the population for each Spearman pairwise correlation. The true Spearman correlations of each social parameter against tau are depicted for each parameter. Only the strength of prior beliefs over harmful intent (pHI 0 ; ρ = 0.16, p permuted ~ 0), uncertainty over partner policies (uπ; ρ = 0.09, p permuted = 0.015), and paranoia (ρ = 0.16, p permuted ~ 0) were associated with decision temperature. Red lines denote that the observed correlation with tau is very unlikely to be due to chance (p < 0.05). Black lines denote that the observed correlation is more likely to be due to chance (p > 0.05).

We then tested the associations of all social parameters with decision temperature. Independent Spearman correlations suggested that decision temperature was associated with greater strength of priors over harmful intent (ρ = 0.16, p permuted ~ 0), uncertainty over partner policies (ρ = 0.09, p permuted = 0.015), and paranoia (ρ = 0.16, p permuted ~ 0; see Fig 3 ). We then regressed all social parameters together against decision temperature. In this model (model J2a), decision temperature was only associated with the strength of priors over harmful intent (0.17, 95%CI: 0.09, 0.24). After including statistical controls (model J2b), decision temperature was still associated with the strength of priors over harmful intent (0.10, 95%CI: 0.02, 0.18). After introducing paranoia (model J2c), decision temperature was associated with both paranoia (0.11, 95%CI: 0.03, 0.18) and the strength of priors over harmful intent (0.09, 95%CI: 0.01, 0.16; see S4 Table for all estimates and 95%CIs).
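The permutation analysis of Spearman correlations described above can be illustrated with a minimal sketch. It assumes the null distribution is built by shuffling one variable relative to the other; the text does not specify the exact resampling scheme, the number of permutations, or the software used, so the function names (`ranks`, `spearman`, `permutation_p`), the default of 2000 permutations, and the two-sided test are all illustrative choices rather than the authors' procedure.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=2000, seed=0):
    """Return (observed rho, two-sided permutation p-value).

    Assumption: the null is generated by shuffling y relative to x.
    """
    rng = random.Random(seed)
    observed = spearman(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

In use, `x` would hold one value per participant for decision temperature and `y` the matching values for a social parameter such as pHI 0 or uπ. The add-one correction means the smallest reportable p-value depends on the number of permutations.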
We initially tested the relationship between decision temperature from the probabilistic reversal learning task and observed attributions in the modified repeated reversal Dictator game. In unadjusted analysis, we found that decision temperature was positively associated with HI (0.14, 95%CI: 0.08, 0.19; model J1a), and negatively associated with SI (-0.07, 95%CI: -0.13, -0.01; model J1a; see Fig 3 for Spearman correlations). Adjusting for statistical controls did not influence the effect of HI (0.08, 95%CI: 0.02, 0.13; model J1b) but attenuated the effect of SI (-0.02, 95%CI: -0.09, 0.02; model J1b).

White nodes represent free parameters of the model. Grey shaded nodes represent numerical probability matrices built from free parameters. Thick solid and thick dotted lines represent transitions between trials. Thin solid lines represent the causal influence of a node on another node or variable. The agent or participant updates their initial beliefs (starting prior) about the partner’s intentions (p(HI, SI) t = 0 ) each trial using their policy matrix of the partner (π gen ), which maps the likelihood between a partner’s return to the participant and the partner’s true intentions, weighted by three free parameters: a policy-map intercept (w 0 ), sensitivity to update self-interest attributions (w SI ), and sensitivity to update harmful intent attributions (w HI ). The integration between the likelihood and prior belief from the previous trial is also subject to another free parameter, uncertainty over partner policies (uπ). We assume that upon detecting a change (in this task, a reversal), participants re-set their beliefs, using their priors about people in general (thin dotted line), biased by what they have learnt already about their present partner (reset-at-reversal, η dg ). Both the policy matrix and initial beliefs about the partner are numerical matrices that assign probabilities to each grid point of values of harmful intent (0–1) and self-interest (0–1).
The model can be used to simulate observed attributions of intent given a series of returns, or inverted to infer the parameter values for participants, using experimentally observed attributions.

Following the generative and replication analysis, we asked how parameters might be associated with paranoia, controlling for age, sex, general cognitive ability, and initial partner behaviour. As expected from our previous study [ 18 ], we found that paranoia was associated with higher strength of priors over harmful intent and uncertainty over a partner’s policy ( Table 2 ). In contrast to our preregistered predictions, we did not find that the reset-at-reversal parameter (which might account for general, non-specific fixity) was associated with paranoia. Instead, we found that paranoia was associated with the policy, i.e., the propensity to give unfair returns, being more sensitive to adjustments in self-interest (w SI ). While this may sound counterintuitive, greater sensitivity to adjusting self-interest means that those who are more paranoid are more likely to explain changes in behaviour through SI, rather than changing their beliefs about HI (see S11 Fig for a simulation and illustration of this change with a range of w SI values).

We also replicate prior results [ 18 ]: using bootstrapped network analysis we observed positive associations between the strength (pHI 0 ) and uncertainty (uHI 0 ) of the prior over a partner’s harmful intent (0.19, 95%CI: 0.11, 0.26), the strength of priors over harmful intent and paranoia (0.13, 95%CI: 0.05, 0.20), and paranoia and uncertainty over a partner’s policy (uπ; 0.12, 95%CI: 0.04, 0.20), and a negative association between strength (pSI 0 ) and uncertainty (uSI 0 ) of the prior over a partner’s self-interest (-0.11, 95%CI: -0.20, -0.03).
We also found a positive relationship between uncertainty over a partner’s policy and how much participants reset their beliefs following a reversal (η dg ; 0.09, 95%CI: 0.01, 0.16; see S12A Fig and S3 Table ). An unexpected negative relationship between the strength of priors over harmful intent and uncertainty over a partner’s policy (-0.13, 95%CI: -0.21, -0.05) may also exist, suggesting that it is normative to have a more consistent map of a partner if priors over harmful intent are larger. However, this relationship may be a result of collider bias due to their independent positive relationships with paranoia ( S13 Fig ) and therefore needs to be interpreted with caution.

After comparing original belief-based [ 18 ], extended belief-based ( Fig 2 ), and associative social attribution models (see methods and S1 Text ), we found the extended belief-based social attribution model best fitted the data. This model allowed participants to weight their explanations of behavioural change through independent adjustments of HI and SI, unlike prior iterations that fixed these parameters. We were able to recapitulate observed data with our winning model (see S7 Fig ) and recovered our parameters very well ( S11 Fig ).

To outline, data were best explained by a Bayesian-Belief model that hypothesised that participants separately weight changes to harmful intent and self-interest attributions following changes to a partner’s behaviour. After adjusting for confounders, paranoia was associated with greater uncertainties over a partner’s policy (uπ) and stronger priors over harmful intent (pHI 0 ; but not self-interest, pSI 0 ). We found that paranoia was not associated with general, non-specific fixity in attributions (η dg ), but rather was associated with a higher sensitivity to explain changes in behaviour by adjusting SI (w SI ), but not by adjusting HI (w HI ).
After controlling for general cognitive ability, age, and sex, we found that only decision temperature was associated with paranoia, with all other parameters sharing non-significant relationships (see Table 1 ; model P7b). As decision temperature can be conflated with model fit, we additionally regressed paranoia against decision temperature, statistical controls, and included the sum loglikelihood score for each participant as an extra regressor (model P8). Decision temperature was still associated with paranoia in this adjusted model (0.11, 95%CI: 0.04, 0.19). We tested how well several models captured choice behaviour across all participants. These models were variants of the Q-learning model [ 22 – 23 ] with a Softmax response function, so that all models included a decision temperature (higher values mean noisier choice behaviour), and a learning rate (λ), although some included additional parameters (see Methods ). We found that a modified Pearce-Hall model including a ‘reset-at-reversal’ parameter (η pr ) best accounted for the data while retaining rich enough a parametrization to allow straightforward comparisons across individuals (see methods for full model comparison statistics, equations, and model fitting procedure; S1 Table ). We were able to recover all model parameters very well and generate simulated data that closely matched the real data observed ( S4 Fig ). We then examined adjusted effects. There was an influence of initial partner behaviour on both attributions, with partners who were initially more unfair inducing higher attributions compared to partners who were initially fairer (HI: 0.43, 95%CI: 0.31, 0.55; model S1a; SI: 0.82, 95%CI: 0.72, 0.91; model S1b). 
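To make the role of decision temperature in the non-social models concrete, the following is a minimal sketch of a Softmax response function paired with a basic Q-learning update. It is not the winning modified Pearce-Hall model and omits the reset-at-reversal parameter (η pr ); the full equations are in the Methods, which are not reproduced here. Dividing values by the temperature `tau`, and the function names `softmax` and `q_update`, are assumptions made for illustration.

```python
import math


def softmax(q_values, tau):
    """Choice probabilities from action values.

    Assumed parameterisation: values are divided by tau, so a higher
    tau flattens the distribution (noisier choice behaviour).
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q_values, choice, reward, lam):
    """Basic Q-learning step: move the chosen value toward the reward.

    lam is the learning rate; unchosen values are left unchanged.
    """
    updated = list(q_values)
    updated[choice] += lam * (reward - updated[choice])
    return updated
```

With the same action values, a low temperature concentrates choice probability on the highest-valued card, while a high temperature pushes the probabilities toward uniform. This is the sense in which larger decision temperatures correspond to noisier choices.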
There was still also an interaction between initial partner behaviour and attributions before and after the reversal (HI: -0.93, 95%CI: -0.98, -0.89; SI: -1.20, 95%CI: -1.25, -1.15), such that both HI and SI changed less after an initially unfair dictator became fair, compared to when an initially fair dictator became unfair. Paranoia was associated with higher HI (0.10, 95%CI: 0.04, 0.16; model S1a) but not SI (-0.01, 95%CI: -0.07, 0.03; model S1b) across the board. Paranoia interacted with reversals, such that HI changed less after reversal as paranoia increased (-0.05, 95%CI: -0.08, -0.03). There was no interaction between paranoia and trials after reversal for SI (-0.02, 95%CI: -0.07, 0.03). We additionally allowed paranoia and initial partner behaviour to interact. There was no meaningful interaction between paranoia and initial partner behaviour for either attribution (HI: 0.07, 95%CI: -0.04, 0.18; model S3a; SI: -0.01, 95%CI: -0.07, 0.03; model S2b).

Again, we first report raw associations with paranoia, and then account for key covariates. Across all trials there was an influence of initial partner behaviour on HI (0.44, 95%CI: 0.32, 0.55) and SI (0.81, 95%CI: 0.71, 0.91), such that initially unfair partners were associated with greater HI and SI. There was also an interaction between initial partner behaviour and attributions before and after the reversal (HI: -0.93, 95%CI: -0.98, -0.89; SI: -1.20, 95%CI: -1.25, -1.15), such that both HI and SI changed less after an initially unfair dictator became fair, compared to when an initially fair dictator became unfair. Paranoia was associated with HI (0.12, 95%CI: 0.06, 0.17), but not SI (-0.03, 95%CI: -0.07, 0.02) across all trials. Paranoia interacted with reversals, such that HI changed less after reversal as paranoia increased (-0.05, 95%CI: -0.08, -0.03). There was no interaction between paranoia and trials after reversal concerning SI (-0.01, 95%CI: -0.04, 0.02).
ICAR scores were associated with both lower win-switch (-0.15, 95%CI: -0.22, -0.08; model P5a) and greater lose-stay rates (0.19, 95%CI: 0.12, 0.26; model P5b) across all trials in the same adjusted models where it was included as a covariate. In exploratory analysis we also allowed paranoia and ICAR scores to interact in separate auxiliary models. Paranoia and ICAR scores did not interact to predict win-switch rates (0.04, 95%CI: -0.01, 0.15; model P5a-Aux), nor interacted to predict lose-stay rates across all trials (interaction not included in final top model; model P5b-Aux). Accounting for covariates abolished win-switch rates across all trials (0.06, 95%CI: -0.01, 0.13; model P5a), as well as lose-stay associations after reversal (-0.06, 95%CI: -0.14, 0.02; model P5b). Paranoia was still not associated with the probability of choosing the optimal card before the reversal (0.03, 95%CI: -0.06, 0.11; model P1), nor with lose-stay rates (-0.01, 95%CI: -0.09, 0.04; model P5b), and nor with fewer self-reported correct answers before the reversal (0.04, 95%CI: -0.15, 0.24; model P4a) or after the reversal (-0.01, 95%CI: -0.29, 0.11; model P4b). We first report raw associations between paranoia and cognition, and then account for key covariates, as per pre-registration. Paranoia was not associated with the trial-by-trial probability of choosing the optimal card (80/20 card) before the reversal (-0.01, 95%CI: -0.06, 0.11), but was after the reversal (-0.12, 95%CI: -0.22, -0.02; S1 Fig ). The worst card (with a 20/80 chance of reward) was chosen significantly more on a trial-by-trial basis in those with higher paranoia after the reversal (0.06, 95%CI: 0.02, 0.09; S2 Fig ), but there was no relationship between paranoia and the probability of choosing the card with 50/50 probability of reward after reversals. Paranoia was not associated with fewer rewards prior to reversal (0.05, 95%CI: -0.02, 0.13) but was after reversal (-0.12, 95%CI: -0.20, -0.05). 
Paranoia was associated with win-switch rates after reversals (the probability that after receiving a reward, participants selected a different card on the next turn; 0.12, 95%CI: 0.05, 0.19) and lower lose-stay rates after reversal (after not receiving a reward, participants stick with the card they last selected; -0.08, 95%CI: -0.15, -0.00). Calculating rates across all trials as previously analysed [ 21 ] showed paranoia was associated with win-switch rates (0.10, 95%CI: 0.03, 0.17) but not lose-stay rates (-0.05, 95%CI: -0.12, 0.02). Finally, when participants self-reported which card gave the most rewards at the end of the task, paranoia was not associated with fewer correct answers before the reversal (0.00, 95%CI: -0.03, 0.03), nor after reversal (-0.02, 95%CI: -0.05, 0.01) (A) Experimental design and analysis plan for each paradigm. (B) An example of a trial from the probabilistic reversal paradigm. There were 60 trials in total, and after 30 trials, the contingency of the rewarding card changed unknown to the participant. (C) Example trial from the modified repeated reversal Dictator Game, where participants had to infer their partner’s intent. There were 20 trials in total, and after 10 trials, the contingency of the Dictator changed unknown to the participant. Participants were paired with a partner who was either at first more likely to be fair or unfair, and then changed their policy after the reversal. (D) Model space. Reversal learning was assessed across both non-social decision making and social attributions, using a probabilistic reversal learning task and modified repeated reversal Dictator game as measurement tools, respectively. All models were assessed using MAP estimation with weak priors. The winning models across both Bayesian-belief and associative classes within the repeated reversal Dictator Game were further assessed using Concurrent Bayesian Modelling (Piray et al., 2019). 
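The win-switch and lose-stay rates defined above can be computed from a participant's trial sequence as in the short sketch below. It assumes one chosen card and one binary reward per trial; the function name `switch_stay_rates` and the input format are hypothetical, and the split into pre- and post-reversal trials is left to the caller.

```python
def switch_stay_rates(choices, rewards):
    """Return (win_switch_rate, lose_stay_rate) for one trial sequence.

    win-switch: after a rewarded trial, a different card is chosen next.
    lose-stay:  after an unrewarded trial, the same card is chosen next.
    The final trial has no following choice and is not scored.
    """
    wins = losses = win_switches = lose_stays = 0
    for t in range(len(choices) - 1):
        switched = choices[t + 1] != choices[t]
        if rewards[t]:
            wins += 1
            win_switches += switched
        else:
            losses += 1
            lose_stays += not switched
    # Rates are undefined (NaN) if there were no wins or no losses.
    win_switch = win_switches / wins if wins else float("nan")
    lose_stay = lose_stays / losses if losses else float("nan")
    return win_switch, lose_stay
```

To obtain post-reversal rates for the 60-trial task described here, the function could be applied to trials 31 to 60 only, for example `switch_stay_rates(choices[30:], rewards[30:])`.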
R-GPTS scores were low and highly positively skewed (mean [sd] = 3.88 [6.18], skew = 2.22, range = [0, 33]). Compared to previously reported norms on the R-GPTS subscale B (mean = 2.53; [ 19 ]), our population had significantly higher scores on average (t (692) = 5.72, p < 0.001), but lower than the typically reported clinical cut-off mean (mean discriminatory of clinical populations = 11; t (692) = -30.29, p < 0.001). ICAR scores were normally distributed (mean [sd] = 4.96 [2.42], skew = 0.08) and not significantly different to previously reported means ([ 20 ]; mean = 4.97; t (692) = -0.16, p = 0.87).

We administered a non-social probabilistic reversal learning task and a modified repeated reversal Dictator Game to 693 participants, in addition to collecting data on participants’ persecutory ideation (hereafter termed ‘paranoia’; measured via subscale B of the Revised Green Paranoid Thoughts Scale; R-GPTS [ 19 ]), general cognitive ability (using the International Cognitive Ability Resource–Progressive Matrices, ICAR [ 20 ]), age, sex, and task comprehension. We conducted computational model-agnostic and model-based analyses; in model-based analyses, we tested a range of associative models for the non-social task (k = 8), and a range of associative (k = 7) and Bayesian-belief (k = 6) models in the social task to account for participant choice and attributional behaviour, respectively. In addition to reporting model-based and model-agnostic outcomes for each paradigm, we report the relationship between key parameters across winning non-social and social computational models (see Fig 1 and Methods for more details).

Discussion

We assessed the association between social and non-social reversal learning, and the impact of paranoia on both, in a large sample of non-clinical individuals. In the non-social task, paranoia was associated with suboptimal choices following a reversal, and greater decision temperature.
In the social task, attributional model comparison uncovered that a Bayesian-Belief model that used separate weights on harmful intent and self-interest attributions to explain a partner’s behavioural change best fit the data. From this we found that paranoia was associated with policy uncertainty, larger strength of priors over beliefs about a partner’s harmful intent (but not self-interest), and greater sensitivity to explain a partner’s behavioural change through self-interest rather than harmful intent. Finally, we observed that decision temperature in the non-social task was associated with larger strength of priors over a partner’s harmful intent (but not self-interest), harmful intent attributions over all trials, and uncertainty over partner policies in the social task, and with pre-existing paranoid beliefs. Our model and data raise hypotheses that may bridge general reinforcement learning and specific phenomenological explanations of paranoia, and allow experimental testing of predictions with formalised computational targets.

In line with predictions, we found elevated decision temperature in the non-social task in those with higher paranoia, although the interpretation of this is not straightforward. Higher decision temperature can be indicative of different causes: it could be a sign of information-seeking behaviours (e.g., strategic or directed), or instead of random stochastic exploration without any reward or information gain [24–25]. The former would be reflected in lower-valued options being selected less frequently over time, and the latter in frequent switching from trial to trial with repetitions of the same actions regardless of reward.
Prior work has found noisier decision making is associated with high risk and clinical participants after initial reversals [1–2], in those reporting psychotic experiences [26], and in healthy populations with higher paranoia [3,24]; these latter studies in particular found larger win-switch rates across all trials in addition to larger decision noise. This would suggest decision temperature in paranoia might be related to more random behaviour. However, in one study, global impairment was found to confound random trial-by-trial switching behaviour: those with a schizophrenia diagnosis but higher in verbal and working memory showed win-stay behaviour no different to healthy controls [3]. Converging with this finding, and using a larger sample than previously employed, we found no increased win-switch or lose-stay rates when examined across all trials after statistical adjustment for fluid intelligence. Instead, we found increased win-switch rates and more suboptimal choices in the more paranoid only after reversals. Along with prior work, we suggest: 1) paranoia is related to directed exploratory behaviour when the environment changes, with the overestimation of previously optimal cards, and 2) optimal choices are not ignored in those who are more paranoid but may instead take longer on average to become exploited, leaving more room for ambiguity.

We replicated key parameter relationships from the social model [18]. We found that larger priors over beliefs about a partner’s harmful intent conferred greater prior uncertainty over harmful intent, whereas the opposite was true for self-interest: larger prior beliefs concerning a partner’s self-interest were held with more certainty. We also replicated the relationship between paranoia and uncertainty regarding how strongly a partner’s actions relate to their true intentions. Unexpectedly, we found that uncertainty over partner policies was positively, rather than negatively, associated with the switch parameter.
This means that as individuals become more uncertain over partner behaviour, they become more rigid in their attributional changes after the reversal. This disparity may have been due to our different task design and our extended model: the original task was used to explain between-partner adaptation [18] whereas in this task we model within-partner adaptation. Therefore, we are estimating qualitatively different changes in behaviour. This suggests that believing the same partner to be inconsistent with their actions is linked to less inferential flexibility when a partner’s behaviour changes.

Unexpectedly, we found that paranoia was associated with a greater weight being placed on a partner’s policy of self-interest, rather than a general fixity in attributional dynamics. Our winning model allowed participants to hold asymmetric sensitivities to whether fluctuations in a partner’s behaviour were attributed to changes in their underlying harmful intent or self-interest. This won over and above our previous model [18], which held the partner’s policy map with fixed parameters. Contrary to our prior hypothesis, rigidity over harmful intent was not due to a lack of sensitivity to changing partner behaviour, but rather a hypersensitivity to explain changes in behaviour with counterfactual reasoning. Specifically, simulations using a range of w SI values demonstrated that this led to greater flexibility over self-interest attributions but not harmful intent attributions following a change in behaviour from a partner. Our results are congenial with models of general belief fixity (cf. [27]) that explain delusional maintenance through a desire to dismiss incongruent, counterfactual evidence with alternative hypotheses, although our model allows for the measurement of clinically relevant phenomena.
Decision temperature in a non-social task was associated with larger priors over harmful intent, uncertainty over beliefs about a partner in unadjusted analyses, and pre-existing paranoia, but not parameters that control self-interest attributions. Given the empirical relationship between pre-existing paranoid beliefs and psychosis on uncertainty over environments [2, 3, 7, 21, 28–30] it is unsurprising that both non-social and social uncertainties are jointly related to paranoia in this present experiment, although we demonstrate this explicitly in relation to pre-existing paranoia and attributions in the moment. There may be several reasons for these associations. First, there may be a common biological mechanism responsible for the expression of uncertainty in both non-social and social contexts. Prior theoretical work explains the relationship between dopamine (dys)regulation, psychosis, and probabilistic reasoning [11,13], and empirical evidence has supported the common role of dopamine (dys)regulation in influencing uncertainty about the world [3, 31], the learning of information from primary vs secondary sources [32], adjusting harmful intent and externalising attributions [33–34], and increasing psychotic experiences [35–36]. While we do not use psychopharmacological manipulations in this paper, evidence to date is consistent with dopaminergic signalling being causally implicated in the basic computational processes underlying decision making (e.g., decision temperature) and should also be tested to assess whether changes to dopamine signalling also underlies uncertainty about a social partner, and whether this added uncertainty mediates increases in harmful intent attributions. A second, non-mutually exclusive explanation may be that increases in non-social decision temperature is a response to second-order social uncertainty made about the experimenters. 
In one study, paranoia was found to increase the belief that a cards task was intentionally sabotaging the participant [21], which may have been responsible for that study’s reported increase in overall win-switch behaviour. This raises the question: to what extent can ‘non-social’ task designs be considered to measure non-social behaviour uncorrupted by agentive attributions? Not only is this question important for the psychological measurement of behaviour, but the attribution of agency also has implications when associating neural activity with task performance: prior work has demonstrated differential temporal-parietal junction activity as part of the ‘mentalising network’, dependent on whether a participant believes they are playing against a computer, robot, or human social partner [37]. One remedy would be to control for first- and second-order agency attributions, i.e., whether a partner was perceived to be ‘real’, or the inference that experimenters were intentionally trying to mislead the participant, respectively.

Our belief-based model explicitly defines parameters that capture sociocognitive processes outlined in prior descriptive theory that explain the formation and maintenance of persecutory ideation. Rich state space models are required to capture the added complexity of a social interaction over and above those which quantify leaner learning processes [17, 38]; our belief-based model contributes to this theoretical requirement. First, uncertainty over others or over the self as a prerequisite for persecutory ideation has been theoretically [13–16] and empirically [7, 39–40] supported. Our model identifies the consistency with which we hold our internal statistical map of social others (uπ), which, when elevated, causes greater uncertainty in a participant’s beliefs about a partner. Secondly, persecutory ideation has been robustly associated with externalised attributions of harmful intent [15, 34, 41–42].
The degree to which one holds strong beliefs of harmful intent at the start of an interaction is formalised in our model (pHI0), which, when increased, leads to higher initial expectations of harmful intent from a partner before interaction. Importantly, this parameter can be dissociated from priors over other, qualitatively different attributions (pSI0). Finally, cognitive models of persecutory delusions [16] and in silico demonstrations [27, 43] suggest that disconfirmatory evidence is explained away with alternatives when evidence deviates from a delusional belief. In our model, two parameters (wHI, wSI) quantify attributional flexibility, which may be used to probe how pre-existing beliefs bias asymmetric interpretations of behavioural change.

We offer several predictions: 1) As demonstrated in our non-social task, it may be that healthy participants with higher paranoia need longer to gauge a social partner’s intentions, but over longer periods may eventually reach the same conclusions as the group. We predict that when partners become more consistent in their social behaviours, a high-paranoia participant’s map of an interaction partner will become more precise (uπ will reduce). 2) In line with prior work examining the influence of cannabis on paranoia [44] and the specific role of dopamine modulation on attributions of harmful intent [45], we predict that dopamine potentiation will increase uncertainty over partner policies (uπ) and the strength of priors over harmful intent (pHI0), but not the strength of priors over self-interest (pSI0). 3) On a neural level, there is evidence that social context may be biologically realised through the engagement of different structures [46], including the dorsomedial prefrontal cortex, where social computations may be implemented [9].
We predict that dopaminergic changes that underlie learning in multiple contexts may lead to context-specific effects (e.g., social vs non-social learning), such as on a participant’s uncertainty over their partner (uπ). 4) In clinical populations with a history of aversive or traumatic social environments during childhood and adolescence, belief maps will be more uncertain (uπ will remain high), and harmful intent attributions will remain higher (higher initial priors, pHI0) and less flexible (lower wHI or higher wSI) than those of healthy controls.

We note three limitations. First, while the similarity of constructs across different, ecologically valid tasks is a strength of our study, it also means we cannot directly compare behaviour in one task to another, as they require different models and task content. An alternative would be to create a ‘social’ version of a non-social task (e.g., [21]). Suthaharan and colleagues [21] aimed to assess whether probabilistic reversal learning in those with higher paranoia differed between card decks that were and were not putatively controlled by a social agent, finding no difference in parameter estimates between the two tasks in those who were more paranoid. However, tasks such as that used by Suthaharan and colleagues may be measuring social observation more than social interaction; the latter requires an interaction partner’s behaviour to be ‘online’ (i.e., the decisions of the partner result in outcomes for both the partner and the participant; [47]). Secondly, we use a non-clinical population, and it is unclear whether the parameter estimates derived from our models in those with higher pre-existing paranoia would hold in clinical populations, although, as mentioned above, we make some predictions about how the transition to clinical populations may unfold. Finally, we did not use varying volatility in our non-social task, keeping the same probabilistic environment with a single reversal.
It may be that our single reversal meant participants had less time to build up expectations of contingency changes, despite not being told when the reversal might occur.

[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010326 Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.