Humans can infer social preferences from decision speed alone [1]

Sophie Bavard (Department of Psychology, University of Hamburg, Hamburg), Erik Stuchlý, Arkady Konovalov (Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham). Date: 2024-06

Humans are known to be capable of inferring hidden preferences and beliefs of their conspecifics when observing their decisions. While observational learning based on choices has been explored extensively, the question of how response times (RT) impact our learning of others' social preferences has received little attention. Yet, while observed choices alone can inform us about the direction of a preference, they reveal little about its strength. In contrast, RT provides a continuous measure of strength of preference, with faster responses indicating stronger preferences and slower responses signaling hesitation or uncertainty. Here, we outline a preregistered orthogonal design to investigate the involvement of both choices and RT in learning and inferring others' social preferences. Participants observed other people's behavior in a social preferences task (Dictator Game), seeing either their choices, RT, both, or no information. By coupling behavioral analyses with computational modeling, we show that RT is predictive of social preferences and that observers were able to infer those preferences even when receiving only RT information. Based on these findings, we propose a novel observational reinforcement learning model that closely matches participants' inferences in all relevant conditions. In contrast to previous literature suggesting that, from a Bayesian perspective, people should be able to learn equally well from choices and RT, we show that observers' behavior substantially deviates from this prediction. Our study elucidates a hitherto unknown sophistication in human observational learning but also identifies important limitations to this ability.

To answer this question, we propose a preregistered orthogonal design to investigate the role of both choices and RT in learning and inferring others' social preferences. In our lab study, participants (N = 46, here referred to as observers) observed other people's decision process in a Dictator Game [41, 42], in which the decision makers (N = 16, here referred to as dictators) were asked to choose between different monetary allocations between themselves and another person. Based on their behavior in the Dictator Game, participants can be ranked on a scale from selfish (choosing the allocation with the higher number of points for themselves) to prosocial (choosing the allocation with the lower number of points for themselves). Consequently, we assume that the dictators' position on this scale reflects their preferred allocation: their ideal ratio of points for themselves versus the other person. Therefore, a decision problem with 2 options equally distant from the preferred allocation represents a choice between 2 equally liked allocations, resulting in high decision difficulty and the expectation that RT should be very long. Conversely, if the options' distances to the preferred allocation are unequal (in other words, one option is much closer to the preference), this results in low decision difficulty and the expectation that RT should be very short.
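To make this mapping concrete, the following minimal Python sketch (hypothetical names, not the authors' analysis code) expresses the assumed relationship between a dictator's preferred allocation, the difficulty of a decision problem, and the expected RT, under the simplifying assumption that each option is summarized by the share of points kept for oneself:

    def subjective_value(option: float, preferred: float) -> float:
        """Higher for options closer to the dictator's preferred allocation."""
        return -abs(option - preferred)

    def expected_difficulty(left: float, right: float, preferred: float) -> float:
        """Small subjective-value gap = hard choice = long expected RT."""
        gap = abs(subjective_value(left, preferred) - subjective_value(right, preferred))
        return 1.0 - gap  # illustrative monotone mapping onto [0, 1]

    # A dictator who prefers keeping 70% of the points:
    print(expected_difficulty(0.60, 0.80, preferred=0.70))  # 1.0: equidistant options, slow RT expected
    print(expected_difficulty(0.70, 0.20, preferred=0.70))  # 0.5: one option much closer, fast RT expected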
In this framework, we varied the amount of information provided to the observers: choice and RT information was either hidden or revealed to observers in a 2-by-2 within-subject design. Behavioral analyses confirmed our hypothesis, as observers were able to learn the dictators' social preferences when they could observe their choices, but also when they could only observe their RT. To gain mechanistic insights into these observational learning processes, we developed a reinforcement learning (RL) model that takes both choices and RT into account to infer the dictator's social preference. This model closely captured the performance and learning curves of observers in the different conditions.

On the other hand, recent studies have proposed (inverted) Bayesian inference as the optimal framework underlying the cognitive process of social learning [24, 43–47], and (quasi-)optimal Bayesian learning has been reported in various fields such as reward-based learning [48, 49] and multisensory integration [50]. Motivated by this work, we designed a benchmark Bayes-optimal (BO) model in which the observer's belief about the dictator's social preferences and choice processes is updated using Bayes' rule on prior and current observations. By comparing this BO model to the RL model, we show that, while observers' learning is close to optimal when they can observe choices, they substantially deviate from optimality when they can only observe RT, suggesting that the underlying mechanisms are better captured by our approximate reinforcement learning model. Overall, our study proposes an innovative approach to investigating the role of RT in learning and inferring preferences, identifies a new sophistication in human social inferences, and highlights the importance of considering a greater extent of decision processes when investigating observational learning.

On theoretical grounds, this notion was taken even further, and it has been proposed that RT alone (i.e., without observed choices) could be used to infer preferences and predict future choices. For example, Chabris and colleagues argued that RTs reveal key attributes of the cognitive processes that implement preferences in an intertemporal choice setting [15]. Konovalov and Krajbich used RT to infer an indifference point in risky choices, social decision-making, and intertemporal settings [13]. Schotter and Trevino showed that the most informative trial-based RT has out-of-sample predictive power for determining someone's decision threshold in a social decision-making setting [29]. So, in principle, it should be possible to infer latent information or processes from RT alone, including preferences in value-based decisions. Yet, none of these studies have tested empirically whether individuals are capable of using RT information effectively to learn someone else's preferences by observing their RT alone.

Despite the critical information on strength of preference provided by RT, its impact on learning about others' preferences has received limited attention compared to the extensive study of learning from choices. It has recently been proposed that taking RT into account can improve predictions of future, unseen choices, even when choices alone would fail to make correct out-of-sample predictions [14, 22, 29, 33–36].
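As a toy illustration of why RT alone can carry this information (synthetic data and hypothetical names; not the procedure used in the cited studies): if hard decisions are slow, the slowest observed trials should be those closest to indifference, so their option midpoints cluster around the decision maker's preferred allocation.

    import numpy as np

    rng = np.random.default_rng(2)
    true_pref = 0.65                                    # preferred self-share
    left, right = rng.uniform(0, 1, (2, 500))           # option pairs
    keep = np.abs(left - right) > 0.3                   # clearly distinct options
    left, right = left[keep], right[keep]
    gap = np.abs(np.abs(left - true_pref) - np.abs(right - true_pref))
    rt = 1.0 + 1.0 / (gap + 0.2) + rng.normal(0, 0.2, left.size)  # small gap -> slow

    slowest = np.argsort(rt)[-20:]                      # the 20 slowest trials
    print((left[slowest] + right[slowest]).mean() / 2)  # ~0.65, close to true_pref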
Most of these studies, however, do not use RT as information for humans to make inferences about someone else's decision-making process, but rather as a tool to improve model fitting or model simulation in predicting future choices. On the other hand, recent literature suggests that human adults [24, 31, 37–39] and children [40] do take RT into account when estimating someone else's hidden preference or competence, in paradigms where observers were informed about both the decision-maker's choices and RT. Importantly though, all these studies only use RT as a supplementary measure to choices, not as the sole piece of information available to the observer.

Each person's unique set of preferences shapes the decisions they make: by closely observing these decisions, one can gain valuable insights into their likes, dislikes, and priorities. Whether and how one can learn and understand the preferences of others from observing their choices has been well documented in the social and reinforcement learning literatures [1–7]. Yet, focusing solely on choices is often not sufficient to determine the strength of a person's preference (i.e., the confidence with which the person has made their choice or how likely they are to make the same choice again). That is, a person would choose option A if they found it twice as valuable as option B, just as they would if they found option A 10 times as valuable. From the observer's perspective, this leads to a many-to-one relationship between strengths of preference and choice, making it impossible to narrow down the strength of preference from choices alone (unless they can extrapolate from observing multiple choices).

Fortunately, the decision-making process offers more than just choices as an output. It also generates response times (RTs), which have been found to decrease as the strength of preference increases. In other words, when faced with equally liked options, individuals tend to take more time to make their decisions. This negative relationship between RT and utility difference has been established in many value-based domains, including decisions under risk [8–14], intertemporal choices [13, 15–17], food choices [18–27], happiness measurements [28], and social decision-making [13, 16, 29–32].

Fig 4. (A, D) Simulated data (colored dots) superimposed on behavioral data (colored curves) representing the accuracy in the estimation phase for the RL model (A) and the BO model (D) in each condition. Shaded areas represent SEM. N = 46. (B, E) RL model (B) and BO model (E) accuracy predictions as a function of behavioral accuracy in the estimation phase, for the last estimation of each participant in each condition. Dashed line represents identity. N = 184. (C, F) Estimated difficulty extracted from RL model (C) and BO model (F) predictions as a function of the estimated difficulty from behavioral data from observers and dictators, after the estimation phase, for trials from the prediction phase. Each point represents 1 average trial difficulty for each duration (fast/slow) for each condition for each observer. N = 368. ***p < 0.001. Data and analysis scripts underlying this figure are available at https://github.com/sophiebavard/beyond-choices.

The model closely captures several key aspects of observers' behavior. In particular, it matches observers' accuracy in all conditions (Fig 4A and 4B), as well as their last estimation per dictator (S4C Fig).
Besides matching accuracy, our model was able to reproduce the difficulty patterns (our best proxy for RT, which is not simulated by our model; Figs 4C and S4A). In addition, the RL model also captured observers' choices in the prediction phase (S4B Fig).

To compare the model and the empirical data of our study to an optimal benchmark, we designed a Bayes-optimal inference model (BO model) that learns the social preference by updating the posterior probability of the model's parameters, given the available information (see Materials and methods). We found that, while observers' learning is close to optimal when they can observe choices, they substantially deviate from optimality when they can only observe RT (BO model predictions versus behavioral last estimation: Spearman's ρ(44) = 0.10, p = 0.52; RL model predictions versus behavioral last estimation: Spearman's ρ(44) = 0.47, p = 0.0011; Fisher's z = 1.89, p = 0.029; Fig 4D–4F). Indeed, while the BO model is able to match observers' behavior when they predict the dictators' decisions, it was unable to match observers' accuracy in all conditions, unlike the RL model (S5 Fig). Together, these modeling results suggest that the computational mechanisms underlying RT-based observational learning are better captured by our approximate RL model.

The outcome O_t depends on the type and the amount of information provided to the observer (RL model, see Materials and methods). Intuitively, when only the choice information is available, the outcome is computed according to whether or not the chosen option was the more selfish one. When the RT information is available, it is used to categorize the decision as fast or slow. In the case of a slow decision, the outcome is always computed as the midpoint between the objective values of both options (see S1 Text for more details). In the case of a fast decision, the outcome depends on whether or not the choice information was displayed. If yes, it is computed according to whether or not the chosen option was the more selfish one. If not, the observer is assumed to believe that the option with the higher subjective value was chosen. Finally, when no information was displayed, the outcome was computed as the midpoint between the objective values of both options, implying that, in the case of receiving no information, the observer is assumed to believe that the dictator was asked to make very difficult decisions (and thus decisions that would be diagnostic of their social preference).

Behavioral analyses confirmed our hypothesis: trial-by-trial, observers were able to learn the dictators' social preferences when they could observe their choices, but also when they could only observe their RT. To gain a more thorough understanding of the mechanisms underlying social preference learning on the basis of observing different features of the decision process, we developed a modified version of a well-established reinforcement learning model [51, 52]. To infer the dictator's social preference, the model takes both choice and RT information (if available) into account, as well as features of the choice options.
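A minimal sketch of this outcome computation and update (our illustrative reading of the verbal description above; coding each option, and the outcome, on a single scale of the share of points kept for oneself is our simplifying assumption):

    def outcome(left, right, pref_est, choice=None, fast=None):
        """Outcome O_t for one observed trial.
        left/right: each option's self-share; choice: the chosen option's
        self-share if choice information is visible; fast: True/False if RT
        information is visible (fast vs. slow decision)."""
        midpoint = (left + right) / 2.0
        if choice is None and fast is None:   # "none": assume the dictator faced
            return midpoint                   # maximally diagnostic, hard trials
        if fast is False:                     # slow decision: options ~equally liked
            return midpoint
        if choice is not None:                # choice visible (fast or no RT shown)
            return choice
        # fast decision with choice hidden: assume the option with the higher
        # subjective value under the current estimate was chosen
        def sv(x):
            return -abs(x - pref_est)
        return left if sv(left) > sv(right) else right

    def update(pref_est, o_t, alpha=0.3):
        """Delta-rule update of the estimated preference (see the equation below)."""
        return pref_est + alpha * (o_t - pref_est)

    pref = 0.5                                # neutral starting estimate
    pref = update(pref, outcome(0.6, 0.9, pref, choice=0.9, fast=True))
    print(pref)                               # 0.62: estimate moves toward the selfish end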
At each trial t, the estimated preference P is updated with a delta rule:

P_{t+1} = P_t + α · δ_t

where α is the learning rate and δ_t is a prediction error term, calculated as the difference between the outcome O_t (defined above) and the current estimation:

δ_t = O_t − P_t

Together, these results all converge to suggest that (1) observers were able to extrapolate the learned social preference to predict decisions for someone else, even if this person had dissimilar social preferences; (2) if a trial was difficult for the dictator, it was also difficult to predict for the observer; and (3) whether or not the decision problem was difficult for the observers themselves did not impact how difficult it was for them to predict the dictator. To conclude, in our task, observers were not only able to learn other people's social preference, but they also applied this information to make decisions for this individual that matched the person's preferences and decision dynamics, even though they would behave differently when choosing for themselves.

To further substantiate this interpretation, we also applied the regression model to the "self" RT on the trials shared with both types of dictators. Here, we would expect the duration effect (short versus long) to be present for trials shared with the similar dictator in all conditions, but to be entirely absent for trials shared with the dissimilar dictator. Indeed, the duration effect was significant for the trials shared with similar dictators (estimate = 0.40, SE = 0.079, t = 5.01, p < 0.0001, Table 3 and Fig 3D, middle) but not for those shared with the dissimilar dictators (estimate = 0.066, SE = 0.037, t = 1.79, p = 0.074, Table 3 and Fig 3D, bottom).

To test this hypothesis, we leveraged the fact that option sets in the "prediction" stage were a subset of option sets in the "self" stage. We then categorized all dictators based on how similar their social preferences were to each of the observers and performed the same regression as reported above on "prediction" RTs for the similar and dissimilar groups of dictators. As expected, we found that observers' prediction RTs were longer for the dictators' long-RT trials, for similar as well as dissimilar dictators, in the "both," "choice only," and "RT only" conditions (Fig 3C, middle and bottom). In the "none" condition, however, this effect was only seen for similar dictators. In line with these patterns, the regression analyses revealed a significant main effect of duration for similar dictators (as the effect was present in all 4 conditions) but significant interactions of duration with both choice and RT visibility for dissimilar dictators (as the effect was not present in the "none" condition) (Table 2). These results are consistent with the notion that observers put themselves in the shoes of the dictator whenever they could learn the dictator's individual social preferences. In the "none" condition, however, observers most likely relied on their own preferences to make predictions (in line with the findings of the estimation phase; S2B Fig). Thus, because of the high match of easy versus difficult choice sets with similar but not dissimilar dictators, the duration effect on prediction RT was seen in the former but not the latter case.

The GLMM (generalized linear mixed model with Gamma distribution and identity link function) was fitted on the observers' RT, with choice visibility in the estimation phase, RT visibility in the estimation phase, and trial duration (i.e., whether the dictator's RT was short or long) as independent variables. Denotation: Du = duration (fast or slow), Ch = choice visibility (displayed or not), RT = RT visibility (displayed or not), ***p < 0.001, **p < 0.01, *p < 0.05. Data and analysis scripts underlying this table are available at https://github.com/sophiebavard/beyond-choices.
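For readers who want to see the shape of such a regression, here is a hedged sketch. The paper fits a mixed-effects Gamma GLMM; statsmodels offers no Gamma GLMM, so this is a simplified fixed-effects analogue on synthetic data (all variable names hypothetical):

    import numpy as np, pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 400
    df = pd.DataFrame({
        "duration":   rng.integers(0, 2, n),  # dictator's RT: 0 = fast, 1 = slow
        "choice_vis": rng.integers(0, 2, n),  # choice shown in estimation phase
        "rt_vis":     rng.integers(0, 2, n),  # RT shown in estimation phase
    })
    mu = 1.5 + 0.3 * df.duration - 0.25 * df.choice_vis + 0.2 * df.rt_vis
    df["rt"] = rng.gamma(shape=10.0, scale=mu / 10.0)  # Gamma-distributed observer RTs

    res = smf.glm("rt ~ duration * choice_vis * rt_vis", data=df,
                  family=sm.families.Gamma(link=sm.families.links.Identity())).fit()
    print(res.summary())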
Critically, we also found a main effect of trial duration (estimate = 0.32, SE = 0.086, t = 3.71, p = 0.00021, Table 2), suggesting that a choice set that elicited a long RT for the dictator also elicited a long RT for the observer (Fig 3C, top). This main effect of trial duration is particularly interesting, as it suggests that observers put themselves in the shoes of the dictator and predicted the decision in line with the dictator's perceived difficulty. Under this assumption, one would expect observers to show a long RT when predicting decisions that were hard for the dictator, even if the observers themselves found the decision to be easy (and vice versa). Importantly, however, this pattern should only emerge in the 3 conditions in which observers could learn inter-individual differences in social preferences, that is, in the "both," "choice only," and "RT only" conditions, but not in the "none" condition.

The analyses of observers' accuracy when predicting decisions confirm that they efficiently learned the dictators' social preferences and were able to use this information to infer which future decisions might be made in previously unseen contexts. Yet, the inspection of their choices alone does not provide much information about the underlying mechanisms and dynamics of how observers predict others' decisions. To dig deeper into these mechanisms, we analyzed observers' RT during the prediction phase. Unbeknownst to the observers, they always predicted 2 easy decisions (where the dictator's RT was fast) and 2 hard decisions (where the dictator's RT was slow; S1 Fig). We ran a generalized linear mixed model (GLMM), regressing the observers' RT onto the independent variables: choice visibility (in the estimation phase), RT visibility (in the estimation phase), and trial duration (i.e., whether the dictator's RT was short or long). We found a significant main effect of choice visibility (estimate = −0.26, SE = 0.086, t = −3.04, p = 0.0024, Table 2), suggesting that observers made overall faster predictions when the choice information had been available in the estimation phase. The main effect of RT visibility was also significant (estimate = 0.19, SE = 0.091, t = 2.05, p = 0.041, Table 2), suggesting observers were overall slower to predict when the RT information had been available in the estimation phase. For interaction effects, please refer to Table 2.

Fig 3. (A) Observers' accuracy (correct choice rate, i.e., whether they chose the same allocation as the dictator) as a function of the condition (choice and RT visibility). (B) Observers' consistency (choice rate, i.e., whether or not the chosen allocation is consistent with their last estimation for each particular dictator) as a function of the condition (choice and RT visibility). (C) Observers' RT when predicting the dictators' decision, as a function of the dictator's RT for each condition. Top: average for all 16 dictators; middle: average for 8 similar dictators only; bottom: average for 8 dissimilar dictators only. (D) Observers' RT when choosing for themselves, only in trials corresponding to the decisions they had to predict.
Top: average over the corresponding trials of all 16 dictators; middle: average over the corresponding trials of the 8 similar dictators; bottom: average over the corresponding trials of the 8 dissimilar dictators. In all panels, points indicate individual averages; shaded areas indicate probability density function, 95% confidence interval, and SEM. N = 46. ns: p > 0.05, ***p < 0.001, Bonferroni-corrected for pairwise comparisons. Data and analysis scripts underlying this figure are available at https://github.com/sophiebavard/beyond-choices.

After having observed all 12 trials of a dictator, we asked the observers to predict what the dictator's choices would be in a series of 4 previously unseen trials (see Materials and methods for more details on the trial selection). From here on, in contrast with the "choice only" and "RT only" conditions, we define "choice visibility" and "RT visibility" as orthogonal factors in our design representing whether or not the choice (resp. RT) information was displayed in each condition; for example, choice visibility is set to 1 in the "choice only" and "both" conditions, and to 0 in the "RT only" and "none" conditions.

We first looked at the accuracy, i.e., whether the observer chose the same option as the dictator. In line with the estimation phase results, we found a main effect of choice visibility on prediction accuracy (F(1,45) = 91.52, p < 0.0001, ηp² = 0.67), but no effect of RT visibility (F(1,45) = 2.07, p = 0.16, ηp² = 0.04) and no interaction (F(1,45) = 2.20, p = 0.15, ηp² = 0.05, Fig 3A). We then looked at the consistency, i.e., whether the observer's choice was consistent with their last preference estimation of the dictator. We found a small main effect of choice visibility (F(1,45) = 5.61, p = 0.022, ηp² = 0.11), but no effect of RT visibility (F(1,45) = 0.041, p = 0.84, ηp² = 0.00) and no interaction (F(1,45) = 0.12, p = 0.73, ηp² = 0.00, Fig 3B). Overall, both the average accuracy and consistency were higher than the chance level of 0.5 (accuracy: t(45) = 23.11, p < 0.0001, d = 3.41; consistency: t(45) = 36.01, p < 0.0001, d = 5.31), suggesting that observers were able to extrapolate their learning of the dictators' social preference to previously unseen decision problems, and that they did so in accordance with their last estimation.

Fig 2. (A) Observers' accuracy for each estimation as a function of the condition (choice and RT visibility). Left: learning curves; right: average across all trials. Points indicate individual averages; shaded areas indicate probability density function, 95% confidence interval, and SEM. N = 46. (B) Reported fourth and last estimation per observer per observed dictator, as a function of the true preference of each dictator, for each condition. N = 184. ρ: Spearman's coefficient. In all panels, ns: p > 0.05, **p < 0.01, ***p < 0.001, Bonferroni-corrected for pairwise comparisons. Data and analysis scripts underlying this figure are available at https://github.com/sophiebavard/beyond-choices.

After showing that social preference was a good indicator of how long it takes one to make a decision in our task, we turned to the main experiment. The main goal of this study is to investigate whether observers can effectively learn someone else's social preference by observing their decisions, and more specifically either their RT alone, choices alone, or both (Fig 1D and 1E). To this end, we selected 12 trials per dictator to be observed by the observers (see Materials and methods and S2 Fig for more details on the trial selection).
To assess learning during the task, observers were asked to estimate the dictator's preference on several occasions: once before any observation, then after every 4 trials. First, in accordance with our preregistered analyses, we found significant correlations between the observers' own preference and (1) their first estimation (before any observation; Spearman's ρ(44) = 0.38, p = 0.0099, S2A Fig), as well as (2) their average estimation, depending on the amount of information provided to them (average estimation per condition; none: Spearman's ρ(44) = 0.56, p < 0.0001; RT only: Spearman's ρ(44) = 0.48, p = 0.0018; choice only: Spearman's ρ(44) = 0.31, p = 0.038; both: Spearman's ρ(44) = 0.27, p = 0.073, S2B Fig).

Then, according to our main preregistered hypothesis, we analyzed observers' accuracy in estimating the dictators' preference (note that, for readability, the statistical tests of this paragraph are summarized in Table 1 rather than reported in the text). On average, observers were able to learn above the empirical chance level (see Materials and methods; t(45) = 22.59, p < 0.0001, d = 3.33), even in the "RT only" condition (Fig 2A). Surprisingly, observers' accuracy was above the empirical chance level in the "none" condition as well (Fig 2A). However, the correlation between the dictators' true preference and the observers' last estimation was not significant in this condition (Spearman's ρ(182) = 0.13, p = 0.071), whereas it was significant in all conditions where some information was provided (RT only: Spearman's ρ(182) = 0.41, p < 0.0001; choice only: Spearman's ρ(182) = 0.84, p < 0.0001; both: Spearman's ρ(182) = 0.84, p < 0.0001). This suggests that observers learned to distinguish more prosocial from more selfish dictators in conditions with information but not in the "none" condition, where they mostly used their own preference (Figs 2B and S3). Furthermore, while accuracy was higher in the "both" condition than in the "RT only" condition, observers seemed to learn equally well in the "choice only" and "both" conditions (Fig 2A). The latter result was in contrast with our predictions. All statistical analyses across conditions are reported in Table 1.

Finally, to get a more fine-grained understanding of learning dynamics, we amended the preregistered analysis and performed a 4 × 4 ANOVA with factors condition ("none," "RT only," "choice only," "both") × estimation number (1st, 2nd, 3rd, 4th). Consistent with our results so far, we found significant main effects of both condition (F(3,135) = 36.84, p < 0.0001, η² = 0.45, Huynh–Feldt corrected) and estimation number (F(3,135) = 80.71, p < 0.0001, η² = 0.64, Huynh–Feldt corrected), and, more interestingly, a significant interaction (F(9,405) = 13.58, p < 0.0001, η² = 0.23, Huynh–Feldt corrected), suggesting that observers learned faster in the "choice only" and "both" conditions (Fig 2A).

We first ascertained that the social preference could, in principle, be learned in all conditions, i.e., that the dictator's choices and RT would be good predictors of their social preference. In our task, their social preference refers to the same construct as their preferred allocation, defined earlier as their ideal ratio of points for themselves versus the other person. On each trial, we calculated subjective values s(·) for each option (left and right), using the social preference estimated as a free parameter (see Materials and methods):

s(left) = −|left − Pref|,  s(right) = −|right − Pref|  (Eq 2)

where left (resp. right) is the objective value of the left (resp. right) option (i.e., the number of points allocated to "self") and Pref is the fitted social preference (see S1 Text for more details). Therefore, an option close to the social preference will have a higher subjective value.
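To illustrate this logic (a toy estimator, not the authors' softmax or DDM fits): given options coded by their self-share, one can recover a preference from RTs alone by searching for the candidate preference whose implied trial easiness correlates most negatively with the observed RTs.

    import numpy as np

    rng = np.random.default_rng(0)
    true_pref = 0.7
    left, right = rng.uniform(0, 1, (2, 200))           # options: self-share
    def sv(x, pref): return -np.abs(x - pref)           # Eq 2: distance-based value
    ease = np.abs(sv(left, true_pref) - sv(right, true_pref))
    rt = 2.0 - 1.5 * ease + rng.normal(0, 0.1, 200)     # harder (small ease) -> slower

    grid = np.linspace(0, 1, 101)                       # candidate preferences
    scores = [np.corrcoef(np.abs(sv(left, p) - sv(right, p)), rt)[0, 1] for p in grid]
    print(grid[int(np.argmin(scores))])                 # ~0.7: most negative correlation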
We regressed the RT onto the difference in subjective values between both options and found a negative effect of this difference in all conditions for all 16 dictators, suggesting that decision problems with options of similar subjective values produce longer RT (Fig 1D). In addition, we found a significant positive correlation between the preference estimated from choices alone (fitting a softmax rule) and the preference estimated from RT alone (fitting a DDM) (Spearman's ρ(14) = 0.83, p < 0.0001, Fig 1E), replicating previous results [13]. This suggests that both information types are not only sufficient on their own to make inferences about someone else's social preference, but also lead to inferring the same preference. Together, these results show that, in our Dictator task, RT is a good predictor of social preference as captured by the subjective values (Eq 2).

Fig 1. (A) Trial sequence of the Dictator Game part. (B) Orthogonal design of the observation part. Observers were presented with 4 successive conditions varying in the visibility of the choice information (with or without, represented with a black square around the chosen option) and the RT information (with or without, represented with a time interval between allocation onset and choice onset). Both allocations were displayed in all conditions. (C) Task design of the observation part. Observers were explicitly informed that they were about to observe a new dictator's decisions, and in which condition. After observing all trials of a given dictator, they were asked to predict what this person would choose in previously unseen decision problems. (D) Regression coefficients with RT per trial as the dependent variable and trial difficulty as the independent variable, for the complete Dictator Game task performed by the dictators. The difficulty was estimated as the difference in subjective values between both allocations (Eq 2), using the preference fitted as a free parameter from RTs only (left), choices only (middle), or both (right). Int.: intercept. Diff.: difficulty. Points indicate individual averages; shaded areas indicate probability density function, 95% confidence interval, and SEM. N = 16. (E) Dictators' social preference fitted from the RTs alone as a function of their preference fitted from the choices alone, and their preference extracted from behavioral data in the Dictator Game task (Eq 2). ρ: Spearman's coefficient. N = 16. ***p < 0.001. Data and analysis scripts underlying this figure are available at https://github.com/sophiebavard/beyond-choices.

To test whether people learn someone else's social preference when observing only their RTs, we designed a two-task experiment involving a variant of the Dictator Game [41, 42]. In this variant, participants were asked to choose between 2 two-color circles, each representing a proportion of points allocated to themselves ("self") and to another person ("other," Fig 1A). For the "Dictator task," we recruited a sample of 16 participants, who will be referred to as dictators, and we recorded both their choices and RT. For the "Observer task," we recruited a sample of 46 participants.
These participants, who will be referred to as observers, were asked to first complete a shortened version of the Dictator task before observing the (previously recorded) decisions of all 16 dictators. For the observation phase, we used a 2 × 2 within-subject orthogonal design, manipulating the amount of information provided to the observers: the dictator's choices revealed or hidden, and their RT revealed or hidden (Fig 1B). Before observing the decisions of a dictator, observers were informed that they were about to observe a new person's decisions and whether they would see their choices, RT, both, or no information. They were asked to estimate the social preference of this person (their most preferred allocation), once before observing any decision and then after every 4 trials, for a total of 4 estimations over the 12 observed trials per dictator (estimation trials, Fig 1C). After observing the 12 trials, observers were asked to predict what this person would choose in 4 previously unseen decision problems (prediction trials, Fig 1C). After these 4 prediction trials, the instruction screen for a new dictator was presented. Crucially, all observed and predicted trials were decision problems that observers had completed for themselves in the Dictator Game task before observing the dictators.

Discussion

Humans and other animals are known to learn not only by experiencing rewards and punishments themselves, but also by observing others' actions and outcomes. On the one hand, this allows learning from punishments and losses without incurring these negative outcomes directly, which comes with obvious evolutionary benefits [7]. On the other hand, observing others can reveal information about their beliefs and preferences, which may be critical for future interactions [53]. So far, research on observational learning has focused on testing whether and how people learn from others' choices but has largely ignored other sources of information. Here, we set out to fill this gap by studying the computational mechanisms of learning from observing (only) the speed with which decisions are made. We find that people are, indeed, capable of learning from observing RTs only, but that, contrary to previous assertions [24,31,37], this ability falls short of an optimal Bayesian learner and is instead better described by an RL model.

In the Dictator Game, where one participant has the power to allocate money to another participant, the RT of the dictator can provide insights into their underlying social preferences. When individuals have a clear and strong preference for a particular allocation, they tend to respond quickly and assertively. However, when faced with a decision where their preferences are less well-defined, or when considering 2 options with similar appeal, individuals often exhibit longer RT, indicating hesitation or conflict in their decision-making process. This illustrates how RT can serve as a window into the underlying social preferences of individuals: RTs can be used as cues to infer other people's social preferences. Their influence extends beyond the choices individuals make, as RTs are intimately related to cognitive decision-making processes and reflect the complex interplay between preferences, beliefs, and social context.
Building on previous studies, which either suggested RT to be an important source of information theoretically [13,15,29] or showed that humans do use RT to improve their predictions [24,31,37], we designed a task where participants observed someone else's decisions and had to estimate their underlying preference and predict their future decisions. Combining a factorial design that systematically varied the available sources of information with asking participants to observe, estimate, and predict individual dictators over repeated trials allowed us to go beyond existing work in characterizing the computational mechanisms of observational learning from different decision processes in great detail.

First, we showed that the dictators' RT negatively correlated with the difficulty of the trial, i.e., the subjective value difference between the 2 options. In other words, difficult decisions tend to take more time in the social decision domain as well. Second, we found that observers were able to learn the dictators' preference in all conditions where they had relevant information, even when they could only observe the dictators' RT. Interestingly, compared to RT only, participants learned faster and better when they could only observe the dictators' choices or when they could observe both choices and RT, but their accuracy did not differ between the latter 2 conditions. These results suggest that, in our task, participants used the RT information when no other piece of information was available, but they seemed to disregard RT when the choice information was available. We cannot rule out a ceiling effect due to aleatoric uncertainty [54,55] that was already reached in the choice only condition, meaning that natural constraints (such as some noise in the dictator's responses, or some sort of representational noise on the observer's end) prevented the addition of RT information on top of choice from improving participants' performance beyond this limit. In any case, since this latter result is not in line with recent literature, which suggests that people sometimes use RT on top of choice-only information to improve their inferences and predictions [24,31,37], further research is needed to dig deeper into these mechanisms. For example, contrasting choice and RT as conflicting pieces of information would be more informative for answering this specific question, which was not the main goal of the current study.

Third, we found that participants were able to predict the dictators' future decisions after having learned their preferences, reaching a prediction accuracy that was higher than chance level. However, the arguably most interesting finding with respect to these predictions was that participants' RT patterns when predicting someone else's decisions matched the other person's more than their own (Fig 3C versus Fig 3D). This result strongly suggests that people are able to put themselves into someone else's shoes when predicting their decisions. Another interesting finding is that participants showed improvement in their social preference estimation when no information (neither choice nor RT) was displayed, apart from the 2 options available to the other person. We believe that this unanticipated behavioral pattern might reflect a form of higher-order inference, where participants were able to extract information from observing the given options alone.
Therefore, when no choice or RT information was given, we assume that the participant believes that the other person was asked to make very difficult decisions (and thus decisions that would be diagnostic of their social preference). Although we implemented this idea of higher-order inference in our specification of the RL model for the "none" condition and obtained support for it in our modeling results (Fig 4A), future research should investigate this further.

Over the past decades, many cognitive neuroscience studies in the field of learning and decision-making have used computational modeling to shed light on how people learn and make decisions in social contexts. Current theories suggest that 3 strategies are at play in this process [56]: vicarious RL, action imitation, and inference about others' beliefs and intentions (see [57] for a review). Of note, this distinction has been extensively discussed in developmental and comparative psychology, where it is also referred to as the "imitation versus emulation" distinction (see [1] for a review). In contrast to vicarious RL, where observers learn from others' experienced outcomes, and action imitation, where observers learn from others' actions, our task involves a more complex inference process about someone else's hidden preferences. This framework usually assumes that observers update their beliefs about others' goals and intentions in a Bayesian manner [24,43,44,47,58,59], combining their prior beliefs with the evidence they get from observing others' actions, both choices [45,46,57] and RTs [24,31,37]. To gain mechanistic insights into these observational learning processes, we compared such a Bayesian inference model against an RL model that takes both choices and RT into account to infer the dictator's social preference.

Instead of learning the value of options or actions, as in more conventional learning scenarios, the RL model seeks to learn the social preferences of others, in our case the preferred allocation of money in the Dictator Game. When only choices are available, this allocation is updated in accordance with the choice (selfish versus prosocial). When only RTs are available, the updating rule depends on the speed of the decision. In the case of slow decisions, the midpoint of the 2 options is used for updating. In the case of fast decisions, the observed agent is assumed to have chosen the higher-valued option, which strengthens any existing belief about the agent's preferred allocation. In our view, this implementation offers a cognitively plausible approximation that allows inferring social preferences through repeated observations of choices or RT. Accordingly, our model closely captured the performance and learning curves of observers in all the different conditions.

When comparing the RL model to a Bayesian inference model adapted to learn the social preference by updating the posterior distribution, qualitative model comparison suggests that, while our participants' learning is close to optimal when they can observe choices, they substantially deviate from optimality when they can only observe RT. A potential reason why humans fall short of learning from RT in a Bayes-optimal way is its high computational complexity. The complete Bayesian solution requires one to possess an accurate generative model of the decisions and decision speed, such as a drift-diffusion model (DDM) that takes the preferred allocation, as well as the choice options, into account to inform the drift rate. Furthermore, the belief distributions of a total of 5 parameters from this generative model must be updated after each observation in an accurate manner. It is conceivable that humans simplify the learning process (akin to our proposed RL model) to reduce the computational complexity and avoid getting lost in a curse of (parameter) dimensionality.
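To give a feel for this bookkeeping, even a radically simplified grid approximation of such a learner (one belief dimension instead of 5, and a crude fast/slow likelihood instead of a full DDM; every name and number here is an illustrative assumption) still requires an explicit posterior update after each observation:

    import numpy as np

    grid = np.linspace(0, 1, 101)                 # candidate preferred allocations
    posterior = np.ones_like(grid) / grid.size    # flat prior belief

    def p_fast(left, right, prefs, theta=0.15):
        """P(fast RT | preference): fast responses are likely when one option
        is much closer to the preference than the other."""
        gap = np.abs(np.abs(left - prefs) - np.abs(right - prefs))
        return 1.0 / (1.0 + np.exp(-(gap - theta) / 0.05))   # smooth threshold

    # One observed trial in the "RT only" condition: options 0.6 vs 0.9, slow RT.
    likelihood = 1.0 - p_fast(0.6, 0.9, grid)     # P(slow | preference)
    posterior *= likelihood                       # Bayes' rule (unnormalized)
    posterior /= posterior.sum()                  # renormalize
    print(grid[np.argmax(posterior)])             # 0.75: the options' midpoint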
A second potential reason for suboptimal performance in the RT only condition is the need to perceive the decision speed accurately in order to classify an observed decision as either fast or slow. Making incorrect classifications, or being uncertain in this regard, will slow down learning substantially, as it is likely to produce a substantial number of erroneous inferences. Taken together, our work deviates from previous literature by challenging the expectation that, from a Bayesian perspective, people should be able to learn equally well from choices and RTs. While our empirical results are in line with the Bayesian prediction on a qualitative level, they diverge substantially from it on a quantitative level.

Our present work builds on a growing literature suggesting that RT alone should be sufficient to produce an accurate estimation of someone's preference [13,15,29]. Konovalov and Krajbich recently used a DDM without the choice data to estimate individual preferences using subjective value functions in 3 different settings: risky choice, intertemporal choice, and social preferences. Our study replicates their findings, as we were able to accurately estimate the DDM-based preference parameter from RT alone in the first sample of participants. We then took this idea a step further and showed that a second sample of participants was able to provide an accurate estimation of others' social preference when they observed their RT alone. To the best of our knowledge, this is the first time that this has been empirically tested and validated. Notably, other studies have attempted to increase out-of-sample predictive power with other indices of information processing, such as eye movements [18,60–63] or computer mouse movements [64–68]. In the neuroimaging literature, attempts have been made to move beyond brain–behavior correlations and to predict behavior from brain activity without choices [69–76] (see [77] for a review). Nevertheless, unlike eye movements or neural data, RTs are easy to collect from the experimenter's point of view and have the benefit of being directly accessible to the actual observer, making them a stronger candidate than many of the other variables mentioned above. Altogether, these and our findings point toward the richness of process data in helping to better understand and predict behavior.

An open question for future research is to elucidate the neural mechanisms that underlie the remarkable ability to learn from observing decision speed and to use this information for making predictions. Historically, brain activity tracking social inference computations has been found in regions that are known to be part of the Theory of Mind network, such as the dorsomedial prefrontal cortex, temporoparietal junction, and posterior superior temporal sulcus [6,45,78–80]. Nonetheless, as stated above, taking decision speed into account requires an accurate estimation of time passage, suggesting that brain regions related to time perception, such as the pre-supplementary motor area and the intraparietal sulcus [81], should play a critical role.
Furthermore, our modeling indicates that a prediction error signal, which quantifies the degree of mismatch (i.e., surprise) between the anticipated and observed decision speed, should play a critical role in the RT-based updating process. Interestingly, a recent EEG study identified such a surprise signal when participants categorized stimulus durations as either fast or slow, and modeled this EEG signal as reflecting the distance of a diffusion particle from the anticipated threshold in a DDM-like model [82]. It is tempting to speculate that people compare the observed decision speed with their own expectations in a similar way, and that the ensuing (neural) surprise signal drives the social observational learning process. Future research will need to test these predictions to further promote our understanding of how people make sense of other people's behavior.

To conclude, by investigating the relationship between RT and social preferences in the Dictator Game, we aim to contribute to the existing literature on decision-making, social cognition, and economic behavior. Our findings shed light on the intricate interplay between RT, learning, and social preferences, expanding our understanding of the mechanisms underlying human decision-making in social contexts.

[1] https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002686

Published by PLOS under a Creative Commons Attribution (CC BY 4.0) license.