(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright

------------

Causal inference regulates audiovisual spatial recalibration via its influence on audiovisual perception

['Fangfang Hong', 'Department Of Psychology', 'New York University', 'New York City', 'New York', 'United States Of America', 'Stephanie Badde', 'Tufts University', 'Medford', 'Massachusetts']

Date: 2022-01

In this section, we lay out the definition of recalibration underlying all models reported here and then describe the recalibration process during the audiovisual recalibration phase according to each of the models. Finally, we provide a formalization of each of the tasks used to constrain the model parameters, followed by the details of how the models were fit to the data.

Each stimulus at location $s$ in the world leads to a sensory measurement $m'$ in an observer's brain. This measurement is corrupted by Gaussian-distributed sensory noise. Thus, with repeated presentations of a stimulus at location $s$, the sensory measurements are scattered across spatial locations, $m' \sim \mathcal{N}(s', \sigma'^2)$. The variability of the measurements is determined by the stimulus reliability $1/\sigma'^2$. To allow integration of information from different modalities, measurements are remapped into a common internal reference frame; hence, the measurement distribution is centered on $s'$, the remapped location of $s$. As part of the remapping process, spatial discrepancies between the senses are accounted for by shifting the measurements by a modality-specific amount $\Delta$. We model recalibration as the process of updating these shifts following each encounter with a cross-modal stimulus pair [32, 63], and we probed this updating process by misaligning the physical visual and auditory stimuli to create an artificial sensory discrepancy.

We assumed that observers were calibrated as far as possible at the beginning of each experimental session. Thus, the remapped stimulus locations in internal space were modeled as linear functions of the physical stimulus locations $s_A$ and $s_V$, that is, $s'_A = a_A s_A + b_A$ and $s'_V = a_V s_V + b_V$. However, because we can only measure relative biases, there is no way to empirically isolate the remapping of one modality. Consequently, and without loss of generality, we set $s'_V$ equal to $s_V$ (i.e., $a_V = 1$ and $b_V = 0$).

We further assumed that stimulus reliability differed between unimodal ($\sigma'^2_A$, $\sigma'^2_{V_i}$) and bimodal ($\sigma'^2_{A,\mathrm{AV}}$, $\sigma'^2_{V_i,\mathrm{AV}}$) stimulus presentations, where $i \in \{1, 2, 3\}$ indexes the visual-reliability condition. Note that we add the visual-reliability condition to variables associated with the auditory modality (i.e., a subscript $A_i$) only when the value of that variable can be affected by visual measurements (e.g., shifts or, below, sensory estimates), but not otherwise (e.g., auditory measurements or auditory measurement variances).

We use separate auditory and visual shift-update variables to capture exclusively the change in measurement shifts after encounters with the spatially discrepant audiovisual stimulus pairs with visual reliability $i$ during the audiovisual recalibration phase. We assumed that both shift updates change in every trial of the phase; thus, the location-independent shifts at the end of the task can be written as the sum of the initial shifts and the shift updates accumulated over the 120 trials. These final shifts were assumed to be maintained throughout the subsequent post-recalibration task, as observers were not exposed to spatially aligned audiovisual pairs after the recalibration phase (Section S12 in S1 Appendix).
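The measurement model above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes Gaussian noise centered on the linearly remapped location plus a modality-specific shift, and the function names and numeric values (including the stimulus at 5 units with $a_A = 1.1$, $b_A = -0.5$) are hypothetical.

```python
import numpy as np

def remap(s, a=1.0, b=0.0):
    """Linear remapping from physical to internal space, s' = a * s + b."""
    return a * s + b

def sample_measurements(s, sigma, shift=0.0, a=1.0, b=0.0, n=1, rng=None):
    """Draw n measurements m' ~ N(a * s + b + shift, sigma**2).

    sigma is the measurement noise SD (reliability = 1 / sigma**2) and shift
    stands in for the modality-specific shift Delta that recalibration updates.
    For vision, a = 1 and b = 0 by convention.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(remap(s, a, b) + shift, sigma, size=n)

rng = np.random.default_rng(0)
# Hypothetical auditory stimulus at 5 with a_A = 1.1 and b_A = -0.5,
# so the remapped location is 1.1 * 5 - 0.5 = 5.0.
m_A = sample_measurements(5.0, sigma=4.0, a=1.1, b=-0.5, n=10_000, rng=rng)
print(m_A.mean(), m_A.std())  # approximately 5.0 and 4.0
```

Because only relative biases are measurable, fixing the visual remapping to the identity, as the text does, removes the redundant pair of parameters.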
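Reliability-based model of cross-modal recalibration. According to this model, each modality should be recalibrated in the direction of the other modality by an amount proportional to the other modality's relative reliability [53]. In other words, after every trial, the measurement shifts, $\Delta_{A_i}$ and $\Delta_{V_i}$, are updated in the direction of the discrepancy between the visual and auditory measurements by an amount proportional to the two modalities' relative reliabilities:

$$\Delta_{A_i,t+1} = \Delta_{A_i,t} + \alpha\, c_{A_i} \left(m'_{V_i,l_V,t} - m'_{A,l_A,t}\right) \quad (1)$$

where

$$c_{A_i} = \frac{\sigma'^2_{A,\mathrm{AV}}}{\sigma'^2_{A,\mathrm{AV}} + \sigma'^2_{V_i,\mathrm{AV}}} \quad (2)$$

is the relative reliability of the visual measurement, and analogously

$$\Delta_{V_i,t+1} = \Delta_{V_i,t} + \alpha \left(1 - c_{A_i}\right) \left(m'_{A,l_A,t} - m'_{V_i,l_V,t}\right) \quad (3)$$

where $l_A$ and $l_V$ index the auditory and visual locations ($l_A, l_V \in \{1, 2, 3, 4\}$), α denotes a supra-modal learning rate, t denotes trial number, and $\sigma'^2_{A,\mathrm{AV}}$ and $\sigma'^2_{V_i,\mathrm{AV}}$ are the auditory and visual measurement variances under bimodal presentation.

Fixed-ratio model of cross-modal recalibration. According to this model, after every trial, $\Delta_{A_i}$ and $\Delta_{V_i}$ are updated in the direction of the discrepancy between the visual and auditory measurements by a fixed proportion of this discrepancy. The size of the update depends solely on the identity of the modality and thus is independent of stimulus reliability [62]. The shifts are updated according to

$$\Delta_{A_i,t+1} = \Delta_{A_i,t} + \alpha_A \left(m'_{V_i,l_V,t} - m'_{A,l_A,t}\right) \quad (4)$$

and

$$\Delta_{V_i,t+1} = \Delta_{V_i,t} + \alpha_V \left(m'_{A,l_A,t} - m'_{V_i,l_V,t}\right) \quad (5)$$

where $\alpha_A$ and $\alpha_V$ are modality-specific learning rates.

Causal-inference model of cross-modal recalibration. In this model, the shift updates are determined, for each modality, by the discrepancy between a measurement and the corresponding perceptual estimate, $\hat{s}'_{A_i} - m'_A$ and $\hat{s}'_{V_i} - m'_{V_i}$ [63]. The spatial discrepancy between the auditory and visual measurements and the relative reliabilities of both stimuli influence the shift updates only indirectly, through their influence on the location estimates $\hat{s}'_{A_i}$ and $\hat{s}'_{V_i}$. Additionally, the location estimates, and thus the shift updates, depend on the degree to which the brain infers a common cause or separate causes for the two measurements [30, 63].

The location estimates are a mixture of two conditional location estimates, one for each causal scenario (a common audiovisual source, C = 1, or different auditory and visual sources, C = 2). In the case of a common source, the location estimate of the audiovisual stimulus pair, $\hat{s}'_{AV_i,C=1}$, equals the reliability-weighted average of the measurements $m'_A$ and $m'_{V_i}$ and the mean $\mu_P$ of an internal, Gaussian-shaped, supra-modal prior across stimulus locations with variance $\sigma_P^2$:

$$\hat{s}'_{AV_i,C=1} = \frac{m'_A/\sigma'^2_{A,\mathrm{AV}} + m'_{V_i}/\sigma'^2_{V_i,\mathrm{AV}} + \mu_P/\sigma_P^2}{1/\sigma'^2_{A,\mathrm{AV}} + 1/\sigma'^2_{V_i,\mathrm{AV}} + 1/\sigma_P^2} \quad (6)$$

In the case of two separate sources, the location estimates of the auditory and the visual stimulus, $\hat{s}'_{A,C=2}$ and $\hat{s}'_{V_i,C=2}$, equal the reliability-weighted averages of $m'_A$ and $\mu_P$ for the auditory estimate, and of $m'_{V_i}$ and $\mu_P$ for the visual estimate:

$$\hat{s}'_{A,C=2} = \frac{m'_A/\sigma'^2_{A,\mathrm{AV}} + \mu_P/\sigma_P^2}{1/\sigma'^2_{A,\mathrm{AV}} + 1/\sigma_P^2} \quad (7)$$

and

$$\hat{s}'_{V_i,C=2} = \frac{m'_{V_i}/\sigma'^2_{V_i,\mathrm{AV}} + \mu_P/\sigma_P^2}{1/\sigma'^2_{V_i,\mathrm{AV}} + 1/\sigma_P^2} \quad (8)$$

The final location estimates are derived by model averaging (see Section S3 in S1 Appendix for an alternative decision strategy). Specifically, the final location estimate is the average of the conditional location estimates, each weighted by the posterior probability of its causal structure:

$$\hat{s}'_{A_i} = P(C=1 \mid m'_A, m'_{V_i})\,\hat{s}'_{AV_i,C=1} + \left[1 - P(C=1 \mid m'_A, m'_{V_i})\right]\hat{s}'_{A,C=2} \quad (9)$$

and analogously for the visual location estimate:

$$\hat{s}'_{V_i} = P(C=1 \mid m'_A, m'_{V_i})\,\hat{s}'_{AV_i,C=1} + \left[1 - P(C=1 \mid m'_A, m'_{V_i})\right]\hat{s}'_{V_i,C=2} \quad (10)$$

The posterior probability of a common source for the auditory and visual measurements in trial t is proportional to the product of the likelihood of a common source for these measurements and the prior probability of a common source for visual and auditory measurements in general, P(C = 1); normalized over the two causal structures, this gives

$$P(C=1 \mid m'_A, m'_{V_i}) = \frac{P(m'_A, m'_{V_i} \mid C=1)\,P(C=1)}{P(m'_A, m'_{V_i} \mid C=1)\,P(C=1) + P(m'_A, m'_{V_i} \mid C=2)\left[1 - P(C=1)\right]} \quad (11)$$

The likelihood of a common source of the visual and auditory measurements in trial t is the product of the likelihoods of the internally represented audiovisual stimulus location given the auditory measurement $m'_A$ and the visual measurement $m'_{V_i}$, and the supra-modal prior, integrated over all possible remapped audiovisual stimulus locations [30]:

$$P(m'_A, m'_{V_i} \mid C=1) = \int P(m'_A \mid s'_{AV})\,P(m'_{V_i} \mid s'_{AV})\,P(s'_{AV})\,ds'_{AV} \quad (12)$$

The likelihood of different sources for the visual and auditory measurements in trial t is the product of the likelihoods of the internally represented auditory and visual stimulus locations, $s'_A$ and $s'_V$, given the auditory and the visual measurement, respectively, and the supra-modal prior. Because the measurements in this causal scenario stem from different sources, the product is integrated over all possible remapped auditory and visual stimulus locations:

$$P(m'_A, m'_{V_i} \mid C=2) = \int P(m'_A \mid s'_A)\,P(s'_A)\,ds'_A \int P(m'_{V_i} \mid s'_V)\,P(s'_V)\,ds'_V \quad (13)$$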
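The single-trial update rules of the three models can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the integrals in Eqs 12 and 13 are replaced by their closed forms for Gaussian likelihoods and prior (the standard causal-inference result), and the causal-inference update is assumed to be the learning rate α times (estimate − measurement), one reading of "determined by the discrepancy between a measurement and the corresponding perceptual estimate". All function names are hypothetical.

```python
import math

def reliability_based_update(dA, dV, mA, mV, var_A, var_V, alpha):
    """One trial of Eqs 1-3; var_A, var_V are bimodal measurement variances."""
    c_A = var_A / (var_A + var_V)   # relative reliability of vision
    c_V = var_V / (var_A + var_V)   # relative reliability of audition
    return dA + alpha * c_A * (mV - mA), dV + alpha * c_V * (mA - mV)

def fixed_ratio_update(dA, dV, mA, mV, alpha_A, alpha_V):
    """One trial of Eqs 4-5 with modality-specific learning rates."""
    return dA + alpha_A * (mV - mA), dV + alpha_V * (mA - mV)

def causal_inference_estimates(mA, mV, var_A, var_V, mu_P, var_P, p_common):
    """Model-averaged estimates (Eqs 6-13); returns (sA, sV, P(C=1 | mA, mV))."""
    JA, JV, JP = 1.0 / var_A, 1.0 / var_V, 1.0 / var_P
    s_c1 = (JA * mA + JV * mV + JP * mu_P) / (JA + JV + JP)   # Eq 6
    sA_c2 = (JA * mA + JP * mu_P) / (JA + JP)                 # Eq 7
    sV_c2 = (JV * mV + JP * mu_P) / (JV + JP)                 # Eq 8
    # Eq 12 in closed form (Gaussian likelihoods and Gaussian prior)
    k = var_A * var_V + var_A * var_P + var_V * var_P
    like_c1 = math.exp(-0.5 * ((mA - mV) ** 2 * var_P + (mA - mu_P) ** 2 * var_V
                               + (mV - mu_P) ** 2 * var_A) / k) / (2 * math.pi * math.sqrt(k))
    # Eq 13 in closed form: product of the two independent marginals
    like_c2 = math.exp(-0.5 * ((mA - mu_P) ** 2 / (var_A + var_P)
                               + (mV - mu_P) ** 2 / (var_V + var_P))) / (
        2 * math.pi * math.sqrt((var_A + var_P) * (var_V + var_P)))
    # Eq 11: posterior probability of a common source
    post = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    sA = post * s_c1 + (1 - post) * sA_c2                     # Eq 9
    sV = post * s_c1 + (1 - post) * sV_c2                     # Eq 10
    return sA, sV, post

def causal_inference_update(dA, dV, mA, mV, sA, sV, alpha):
    """Assumed update: each shift moves by alpha * (estimate - measurement)."""
    return dA + alpha * (sA - mA), dV + alpha * (sV - mV)

# Hypothetical trial: auditory measurement at 0, visual at 2, broad prior
sA, sV, post = causal_inference_estimates(0.0, 2.0, 4.0, 1.0, 0.0, 1e4, 0.5)
dA, dV = causal_inference_update(0.0, 0.0, 0.0, 2.0, sA, sV, alpha=0.1)
```

Note how the reliability-based and fixed-ratio rules use the raw measurement discrepancy, whereas in the causal-inference rule a large discrepancy lowers the posterior probability of a common source and thereby shrinks the update.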
Unimodal spatial-discrimination task. The unimodal spatial-discrimination task was conducted to estimate the one auditory and three visual stimulus reliabilities under unimodal presentation conditions, as well as to constrain the estimate of the bias parameter $a_A$ introduced by the remapping process. We begin by describing the auditory version. The standard stimulus was presented straight ahead, at location $s_{A,0}$, and the test stimulus was presented at one of $N_A$ locations, $s_{A,n}$, determined by an adaptive procedure. For each pair, the probability $p_{A,n}$ of estimating the test stimulus to be located to the right of the standard stimulus is a function of the physical distance between the two stimuli.

To further specify $p_{A,n}$, we have to derive the probability distributions of the internal location estimates. Each physical stimulus at location $s_{A,n}$ results in an internal measurement, $m'_{A,n}$. The measurement distribution is Gaussian, $m'_{A,n} \sim \mathcal{N}(s'_{A,n}, \sigma'^2_A)$, and for a given measurement the estimate of the remapped location of the stimulus is the average of the measurement and the mean of the spatial prior, $\mu_P$, each weighted by its relative reliability. Thus, the probability distribution of the location estimate of a test stimulus is

$$\hat{s}'_{A,n} \sim \mathcal{N}\left(w_A s'_{A,n} + (1 - w_A)\,\mu_P,\ w_A^2 \sigma'^2_A\right) \quad (17)$$

where

$$w_A = \frac{\sigma_P^2}{\sigma'^2_A + \sigma_P^2} \quad (18)$$

The probability distribution of the difference between the two location estimates $\hat{s}'_{A,n}$ and $\hat{s}'_{A,0}$ is

$$\hat{s}'_{A,n} - \hat{s}'_{A,0} \sim \mathcal{N}\left(w_A\,(s'_{A,n} - s'_{A,0}),\ 2 w_A^2 \sigma'^2_A\right) \quad (19)$$

Taken together, the probability of perceiving an auditory test stimulus at location $s_{A,n}$ to the right of an auditory standard stimulus at location $s_{A,0}$ is

$$p_{A,n} = 1 - \Phi\left(0;\ w_A\,(s'_{A,n} - s'_{A,0}),\ 2 w_A^2 \sigma'^2_A\right) \quad (20)$$

where Φ(x; μ, σ²) is the cumulative Gaussian distribution. However, as experimenters we only have access to response probabilities as a function of the stimulus locations in physical space. Given that the remapped location $s'_{A,n}$ is a function of the physical stimulus location $s_{A,n}$, we can rewrite Eq 20 as

$$p_{A,n} = 1 - \Phi\left(0;\ w_A a_A\,(s_{A,n} - s_{A,0}),\ 2 w_A^2 \sigma'^2_A\right) \quad (21)$$

and analogously, for the visual stimuli of reliability $i$,

$$p_{V_i,n} = 1 - \Phi\left(0;\ w_{V_i}\,(s_{V,n} - s_{V,0}),\ 2 w_{V_i}^2 \sigma'^2_{V_i}\right) \quad (22)$$

Finally, the model includes occasional response lapses (i.e., random button presses) at rate λ, so that the probability of reporting the test stimulus as located farther to the right than the standard ($r_{A,n} = 1$) is

$$P(r_{A,n} = 1) = \frac{\lambda}{2} + (1 - \lambda)\,p_{A,n} \quad (23)$$

Bimodal spatial-discrimination task. The bimodal spatial-discrimination task was conducted to estimate the relative bias of auditory compared to visual spatial perception, i.e., to estimate $a_A$ and $b_A$. Auditory stimuli were presented at four different locations in physical space, $s_{A,l}$, where $l$ indexes the auditory location. Guided by a staircase procedure, on each trial t an auditory stimulus at location $s_{A,l}$ was paired with a visual stimulus of high spatial reliability ($i = 1$) at one of N test locations $s_{V,n}$, where $n$ indexes the finer grid of visual-stimulus locations presented during the task. For each pair, the model predicts $p_{l,n}$, the probability of judging the visual stimulus at location $s_{V,n}$ as to the right of the auditory stimulus at location $s_{A,l}$.

As in the unimodal spatial-discrimination task, the model includes occasional lapses, here at rate $\lambda_{AV}$. Therefore, the probability of reporting a visual stimulus at $s_{V,n}$ as located to the right of an auditory stimulus at location $s_{A,l}$ ($r_{l,n} = 1$) is

$$P(r_{l,n} = 1) = \frac{\lambda_{AV}}{2} + (1 - \lambda_{AV})\,p_{l,n} \quad (31)$$

Pointing practice task. The pointing practice task was used to estimate localization response variability, $\sigma_r^2$, due to sources unrelated to the spatial perception of the stimuli. Visual stimuli were presented at eight different locations $s_{V,o}$, where $o$ indexes the stimulus location ($o \in \{1, 2, \ldots, 8\}$). Localization responses (i.e., confirmed cursor positions) in each trial were modeled as perturbed by Gaussian-distributed noise and centered on the physical stimulus location:

$$r_{V,o} \sim \mathcal{N}\left(s_{V,o},\ \sigma_r^2\right) \quad (32)$$

By doing so, we assume that 1) the location of the visual cursor in physical space maps directly to its location in perceptual space and 2) the stimulus location estimate is unbiased. This is based on our general assumption of identity remapping for visual stimuli, as well as on the high spatial reliability of the visual cursor and the visual stimulus, which should safeguard their estimates against the influence of spatial priors. See Section S2 in S1 Appendix for a model that does not make these assumptions.

Unimodal localization task. The unimodal localization task was conducted before and after the recalibration phase to measure shifts in auditory and visual localization responses as a consequence of exposure to spatially discrepant audiovisual stimuli during the recalibration phase. Stimuli were presented at four locations for each modality, $s_{A,l}$ and $s_{V,l}$. As in the spatial-discrimination tasks (and unlike in the pointing practice task, which used a different, maximally reliable stimulus), the stimulus location estimates are assumed to be biased due to the remapping process and the incorporation of the supra-modal spatial prior. As for the pointing practice task, we assumed that the probability distributions of the localization responses in physical space, $r_{A,l}$ and $r_{V_i,l}$, are centered on the means of the location estimates $\hat{s}'_{A,l}$ and $\hat{s}'_{V_i,l}$ in perceptual space and corrupted by additional unbiased noise. It follows that the probability distributions of the localization responses are

$$r_{A,l} \sim \mathcal{N}\left(\mu_{\hat{s}'_{A,l}},\ \sigma^2_{\hat{s}'_A} + \sigma_r^2\right), \qquad r_{V_i,l} \sim \mathcal{N}\left(\mu_{\hat{s}'_{V_i,l}},\ \sigma^2_{\hat{s}'_{V_i}} + \sigma_r^2\right) \quad (33)$$

where the means and variances of the location estimates are defined in Eqs 24–27.

In the post-recalibration phase, the remapping from physical to perceptual space had been updated so that the additional shift updates $\Delta_{A_i,j}$ and $\Delta_{V_i,j}$, accumulated after 120 exposures to discrepant audiovisual stimuli in session $(i, j)$, were incorporated: $s'_{A,l} + \Delta_{A_i,j}$ and $s_{V,l} + \Delta_{V_i,j}$. This change in the measurement distributions affects the centers of the location estimates' probability distributions as follows:

$$\mu_{\hat{s}'_{A_i,j,l}} = w_A\left(s'_{A,l} + \Delta_{A_i,j}\right) + (1 - w_A)\,\mu_P, \qquad \mu_{\hat{s}'_{V_i,j,l}} = w_{V_i}\left(s_{V,l} + \Delta_{V_i,j}\right) + (1 - w_{V_i})\,\mu_P \quad (34)$$

We assumed that the probability distributions of the localization responses are centered on these updated values; that is, we did not apply the updated remapping to the location estimates of the cursor. In sum, localization responses to unimodally presented visual and auditory stimuli have a Gaussian probability distribution that, after the audiovisual recalibration task, additionally depends on the final shift updates $\Delta_{A_i,j}$ and $\Delta_{V_i,j}$.

Model fitting

All models were fit using a maximum-likelihood procedure. That is, a set of free parameters Θ was chosen to maximize the log likelihood of the data given a model M. Our model-fitting strategy aimed to reduce the number of free parameters estimated at once. We split the set of free parameters into three subsets $\Theta_i$, i = 1, 2, 3, each fit to a subset of the data $X_i$, and maximized the log likelihood of each subset separately,

$$\hat{\Theta}_i = \arg\max_{\Theta_i} \log L(\Theta_i \mid X_i, M), \quad i = 1, 2, 3 \quad (35)$$

where $X_1, X_2, X_3 \subset X$ and $\Theta_1, \Theta_2, \Theta_3 \subset \Theta$. The first dataset, $X_1$, comprises the unimodal spatial-discrimination task, which was used to constrain the parameter subset $\Theta_1$: the unimodal stimulus reliabilities as well as the lapse rates in the different sessions of this task. $X_2$ comprises the pointing practice task, used to estimate the parameter subset $\Theta_2$, which contained only the variability in localization responses due to factors other than spatial perception. The third subset, $X_3$, comprised data from three tasks: the bimodal spatial-discrimination task and the unimodal localization task in the pre- and post-recalibration phases. These three datasets constrained overlapping sets of parameters, so they were fit jointly. The parameter sets constrained by the bimodal spatial-discrimination data and the pre-recalibration localization data comprised only localization bias parameters and task-specific lapse rates; stimulus reliabilities and response-noise estimates were taken from $\Theta_1$ and $\Theta_2$. Only the parameter set constrained by participants' post-recalibration localization responses included parameters specific to each of the three models of cross-modal recalibration, such as the learning rate and the common-cause prior.
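The localization model of Eqs 33 and 34, which feeds the parameters in $\Theta_3$, can be sketched as below. Because Eqs 24–27 are not reproduced in this text, the sketch assumes the location estimate takes the same prior-weighted form as Eqs 17–18, with unbiased response noise added on top; the function name, the broad prior ($\sigma_P = 100$), and the stimulus values are illustrative assumptions.

```python
import numpy as np

def localization_response_params(s, sigma_m, sigma_r, a=1.0, b=0.0,
                                 shift_update=0.0, mu_P=0.0, sigma_P=100.0):
    """Mean and SD of a unimodal localization response (cf. Eqs 33-34).

    Assumes the estimate shrinks the remapped (and, after recalibration,
    shifted) location toward the prior mean with the weight of Eq 18, and
    that unbiased response noise sigma_r is added to the estimate.
    """
    w = sigma_P**2 / (sigma_m**2 + sigma_P**2)      # weight on the measurement
    center = a * s + b + shift_update               # remapped location + shift update
    mean = w * center + (1.0 - w) * mu_P
    sd = np.sqrt((w * sigma_m) ** 2 + sigma_r**2)   # estimate spread + response noise
    return mean, sd

# Hypothetical auditory stimulus at 5, before and after recalibration
pre = localization_response_params(5.0, sigma_m=2.0, sigma_r=1.0)
post = localization_response_params(5.0, sigma_m=2.0, sigma_r=1.0, shift_update=1.5)
print(post[0] - pre[0])  # w * 1.5, slightly less than 1.5
```

Under these assumptions, the accumulated shift update moves only the center of the response distribution, scaled by the prior weight, and leaves its width unchanged.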
As outlined before, we did not fit the localization responses from the audiovisual recalibration task, i.e., the build-up of the recalibration effect (Section S13 in S1 Appendix), because the auditory and visual shifts are serially dependent: the size of the shift in trial t depends on the size of the shift in trial t − 1. Given that there is no closed-form solution for the causal-inference model, we would have needed Monte Carlo simulations to approximate the probability distribution of the location estimates. Yet the location estimates depend on the serially dependent shifts, and consequently the number of necessary samples would have grown exponentially from trial to trial. Thus, it was computationally challenging to estimate the likelihood of the parameters and the model given the data from the audiovisual recalibration task. Instead, we used Monte Carlo simulations to approximate the probability distribution of the shift updates accumulated at the end of the audiovisual recalibration task; i.e., we fitted the final recalibration effect rather than its build-up.

Model log-likelihood—Unimodal spatial-discrimination task. In the unimodal spatial-discrimination task (auditory session), participants indicated whether the test stimulus was located to the left, $r_{A,n(t)} = 0$, or to the right, $r_{A,n(t)} = 1$, of the standard stimulus. For each such trial, the likelihood of the model parameters given the response $r_{A,n(t)}$ is

$$L(\Theta_{1,A} \mid r_{A,n(t)}) = P(r_{A,n(t)} = 1)^{r_{A,n(t)}}\left[1 - P(r_{A,n(t)} = 1)\right]^{1 - r_{A,n(t)}} \quad (36)$$

where $P(r_{A,n(t)} = 1)$ is defined in Eq 23. Thus, the log likelihood given the responses across all $T_{1,A}$ trials is

$$\log L(\Theta_{1,A} \mid X_{1,A}) = \sum_{t=1}^{T_{1,A}} \log L(\Theta_{1,A} \mid r_{A,n(t)}) \quad (37)$$

The log likelihoods $\log L(\Theta_{1,V_i} \mid X_{1,V_i})$ of the three visual sessions are defined analogously (38). The log likelihood across all four sessions is

$$\log L(\Theta_1 \mid X_1) = \log L(\Theta_{1,A} \mid X_{1,A}) + \sum_{i=1}^{3} \log L(\Theta_{1,V_i} \mid X_{1,V_i}) \quad (39)$$

The set of free parameters constrained by the binary responses in this task is $\Theta_1$, the unimodal measurement variances and the session-specific lapse rates.

Model log-likelihood—Pointing practice task. For each trial, the likelihood of the model parameters given a visual stimulus at location $s_{V,o(t)}$ and a subsequent response (cursor setting) $r_{V,o(t)}$ is $L(\sigma_r \mid r_{V,o(t)}) = \phi\left(r_{V,o(t)};\ s_{V,o(t)},\ \sigma_r^2\right)$, where φ refers to the Gaussian probability density.
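The discrimination model (Eqs 17–23) and its Bernoulli log likelihood (Eqs 36–37) can be sketched together. This is a hedged illustration, not the authors' code: the function names and keyword defaults are hypothetical, and the prior is assumed broad ($\sigma_P = 100$). The prior mean and $b_A$ cancel in the difference of the two estimates, so they do not appear.

```python
import numpy as np
from scipy.stats import norm

def p_right(s_test, s_std, sigma_m, a=1.0, sigma_P=100.0):
    """P(test estimate > standard estimate) without lapses (cf. Eqs 17-21).

    Both estimates shrink toward the prior with w = sigma_P^2 / (sigma_m^2 + sigma_P^2);
    their difference is Gaussian with mean w * a * (s_test - s_std) and
    variance 2 * (w * sigma_m)^2.
    """
    w = sigma_P**2 / (sigma_m**2 + sigma_P**2)
    mean = w * a * (np.asarray(s_test, dtype=float) - s_std)
    return 1.0 - norm.cdf(0.0, loc=mean, scale=np.sqrt(2.0) * w * sigma_m)

def p_report_right(p, lapse):
    """Eq 23: with probability `lapse` the response is a random button press."""
    return lapse / 2.0 + (1.0 - lapse) * p

def discrimination_loglik(responses, s_test, s_std, sigma_m, lapse, **kw):
    """Eqs 36-37: Bernoulli log likelihood summed over trials.

    Requires lapse > 0 (or predictions strictly inside (0, 1)) to stay finite.
    """
    q = p_report_right(p_right(s_test, s_std, sigma_m, **kw), lapse)
    r = np.asarray(responses, dtype=float)
    return float(np.sum(r * np.log(q) + (1.0 - r) * np.log1p(-q)))

# Hypothetical session: four trials at four test offsets from a standard at 0
ll = discrimination_loglik([0, 0, 1, 1], [-4.0, -1.0, 1.0, 4.0], 0.0, sigma_m=2.0, lapse=0.02)
```

Since $w_A$ cancels in the ratio of mean to standard deviation, such data constrain only $\sigma'_A / a_A$, which is why the text speaks of scaled measurement variances.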
The only free parameter constrained by this task is $\Theta_2 = \{\sigma_r\}$. The maximum-likelihood estimate of $\sigma_r$ is

$$\hat{\sigma}_r = \sqrt{\frac{1}{T_2}\sum_{t=1}^{T_2}\left(r_{V,o(t)} - s_{V,o(t)}\right)^2} \quad (40)$$

where $T_2$ and the sum do not include outlier trials.

Model log-likelihood—Bimodal spatial-discrimination task. In the bimodal spatial-discrimination task, on each trial t participants indicated whether the visual test stimulus at location $s_{V,n(t)}$ was located to the left, $r_{l(t),n(t)} = 0$, or to the right, $r_{l(t),n(t)} = 1$, of the auditory standard stimulus presented at $s_{A,l(t)}$. For each such trial, the likelihood of the model parameters given the response $r_{l(t),n(t)}$ is

$$L(\Theta \mid r_{l(t),n(t)}) = P(r_{l(t),n(t)} = 1)^{r_{l(t),n(t)}}\left[1 - P(r_{l(t),n(t)} = 1)\right]^{1 - r_{l(t),n(t)}} \quad (41)$$

where $P(r_{l(t),n(t)} = 1)$ is defined in Eq 31. Thus, the log likelihood given the responses across all trials is

$$\log L(\Theta \mid X) = \sum_{t} \log L(\Theta \mid r_{l(t),n(t)}) \quad (42)$$

This log likelihood is a function of $p_{l(t),n(t)}$, which in turn depends on the bias parameters $a_A$ and $b_A$, the parameters of the supra-modal prior over locations, $\mu_P$ and $\sigma_P$, as well as the auditory and visual measurement variances (see Eqs 24–27). Fitting both the bias parameters and the supra-modal prior at once was impossible, as they effectively traded off. Thus, we implemented a non-informative supra-modal prior over stimulus locations by setting $\sigma_P$ to 100 and $\mu_P$ to 0. The measurement variances were estimated from the forced-choice responses in the unimodal spatial-discrimination task. The final set of free parameters constrained by the binary responses in this task was $\{a_A, b_A, \lambda_{AV}\}$. The bias parameters, $a_A$ and $b_A$, were jointly estimated using the data from this task as well as the pre- and post-recalibration responses from the unimodal localization task.

Model log-likelihood—Unimodal localization task—Pre-recalibration phase. In this task, each localization results in cursor settings $r_{A,i,j,l(t)}$ and $r_{V,i,j,l(t)}$ on trial t of session (i, j), where i indicates the visual-reliability condition and j the recalibration direction in the subsequent recalibration phase. The localization responses from this task were modeled as Gaussian-distributed.
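Eq 40 is the usual closed-form maximum-likelihood estimate of a Gaussian standard deviation with a known mean (here, the physical target location). A minimal sketch, assuming outliers are passed in as a boolean mask; the outlier criterion itself is not specified in this section, and the function name and the eight target values in the example are hypothetical.

```python
import numpy as np

def sigma_r_mle(responses, targets, exclude=None):
    """Closed-form ML estimate of response noise (Eq 40).

    Root-mean-square deviation of cursor settings from the physical target
    locations, dividing by the number of retained trials T_2 (the ML
    estimator, not the unbiased T_2 - 1 version). `exclude` is an optional
    boolean mask marking outlier trials.
    """
    r = np.asarray(responses, dtype=float)
    s = np.asarray(targets, dtype=float)
    keep = np.ones(r.shape, dtype=bool) if exclude is None else ~np.asarray(exclude, dtype=bool)
    resid = r[keep] - s[keep]
    return float(np.sqrt(np.mean(resid**2)))

# Hypothetical practice session: eight target locations, response noise SD 2
rng = np.random.default_rng(0)
targets = rng.choice(np.linspace(-21.0, 21.0, 8), size=20_000)
responses = targets + rng.normal(0.0, 2.0, size=targets.size)
print(sigma_r_mle(responses, targets))  # close to 2.0
```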
From these distributions, we can compute the likelihood of a model M and the parameter set Θ as the Gaussian probability density in Eq 33 evaluated at the observed localization responses $r_{A,i,j,l(t)}$ and $r_{V,i,j,l(t)}$:

$$L(M, \Theta \mid r_{A,i,j,l(t)}) = \phi\left(r_{A,i,j,l(t)};\ \mu_{\hat{s}'_{A,l(t)}},\ \sigma^2_{\hat{s}'_A} + \sigma_r^2\right) \quad (43)$$

and analogously for the visual responses, where μ and σ² denote the mean and variance of the corresponding location estimate. The log likelihood is the sum of the log likelihoods across the trials of all six sessions:

$$\log L(M, \Theta \mid X) = \sum_{i,j}\left[\sum_{t=1}^{T_{A,i,j}} \log L(M, \Theta \mid r_{A,i,j,l(t)}) + \sum_{t=1}^{T_{V,i,j}} \log L(M, \Theta \mid r_{V,i,j,l(t)})\right] \quad (44)$$

The log likelihood depends on the means and variances of the location estimates, which in turn depend on the bias parameters $a_A$ and $b_A$, the parameters of the supra-modal prior, $\mu_P$ and $\sigma_P$, the measurement variances (see Eq 34), and the response noise $\sigma_r$. We chose a flat prior over stimulus locations, the (scaled) measurement variances were estimated based on the unimodal spatial-discrimination task, and $\sigma_r$ was estimated based on the pointing practice task. Consequently, the parameters actually constrained by the localization responses in the pre-recalibration task were the bias parameters $a_A$ and $b_A$. Here, the values of the T variables and the sums do not include outlier trials.

Model log-likelihood—Unimodal localization task—Post-recalibration phase. Localization responses in the post-recalibration phase additionally depend on the auditory and visual shift updates accumulated over the 120 trials of the recalibration phase, $\Delta_{A_i,j}$ and $\Delta_{V_i,j}$ (Eq 34). Because these accumulated shift updates are not accessible to the experimenter, we marginalized over them to calculate the log likelihood. For each of the six experimental sessions (i, j), the likelihood of a model M and its parameter set Θ is the integral over $\Delta_{A_i,j}$ and $\Delta_{V_i,j}$ of the likelihood of the final shift updates given the observed data $X_{i,j}$, the model M, and the parameter set Θ, multiplied by the joint probability of the auditory and visual shift updates. Summed across all six sessions, the log likelihood is

$$\log L(M, \Theta \mid X) = \sum_{i,j} \log \iint L(\Delta_{A_i,j}, \Delta_{V_i,j} \mid X_{i,j}, M, \Theta)\,P(\Delta_{A_i,j}, \Delta_{V_i,j} \mid M, \Theta)\,d\Delta_{A_i,j}\,d\Delta_{V_i,j} \quad (45)$$

We describe in the following sections how the joint probability and the log likelihood were derived for each of the three models of cross-modal recalibration.

Reliability-based model of cross-modal recalibration.
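The pre-recalibration log likelihood (Eqs 43–44) is a sum of Gaussian log densities over trials, modalities, and the six sessions. A minimal sketch, assuming the predicted means and SDs have already been computed and outlier trials removed; the nested dictionary layout and function names are hypothetical conveniences.

```python
import numpy as np
from scipy.stats import norm

def session_loglik(responses, means, sds):
    """Sum over trials of the Gaussian log density of each response (cf. Eq 43)."""
    return float(np.sum(norm.logpdf(responses, loc=means, scale=sds)))

def pre_recalibration_loglik(sessions):
    """Sum over the six (i, j) sessions and both modalities (cf. Eq 44).

    `sessions` maps (i, j) to a dict with keys "A" and "V", each holding a
    tuple (responses, predicted_means, predicted_sds) with outliers removed.
    """
    total = 0.0
    for session in sessions.values():
        for modality in ("A", "V"):
            r, mu, sd = session[modality]
            total += session_loglik(r, mu, sd)
    return total

# Hypothetical data: three visual reliabilities x two recalibration directions
sessions = {
    (i, j): {"A": (np.array([0.0, 1.0]), np.array([0.0, 1.0]), 1.0),
             "V": (np.array([0.0, 1.0]), np.array([0.0, 1.0]), 1.0)}
    for i in (1, 2, 3) for j in (1, 2)
}
print(pre_recalibration_loglik(sessions))  # 24 trials, each at the peak of N(0, 1)
```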
In this model, the auditory and visual shift updates have a constant ratio equal to the ratio of the bimodal measurement noise variances, $\sigma'^2_{A,\mathrm{AV}}$ and $\sigma'^2_{V_i,\mathrm{AV}}$ (with opposite signs, because the two shifts move toward each other). Therefore, $\Delta_{A_i,j}$ can be rewritten as $-\left(\sigma'^2_{A,\mathrm{AV}}/\sigma'^2_{V_i,\mathrm{AV}}\right)\Delta_{V_i,j}$, and we can express the likelihood given a single auditory localization response as

$$L(\Delta_{V_i,j} \mid r_{A,i,j,l(t)}, M, \Theta) = \phi\left(r_{A,i,j,l(t)};\ \mu_{\hat{s}'_{A_i,j,l(t)}}(\Delta_{V_i,j}),\ \sigma^2_{\hat{s}'_A} + \sigma_r^2\right) \quad (46)$$

where

$$\mu_{\hat{s}'_{A_i,j,l}}(\Delta_{V_i,j}) = w_A\left(s'_{A,l} - \frac{\sigma'^2_{A,\mathrm{AV}}}{\sigma'^2_{V_i,\mathrm{AV}}}\,\Delta_{V_i,j}\right) + (1 - w_A)\,\mu_P \quad (47)$$

and $w_A$ is the weight of the auditory measurement relative to the spatial prior. The visual response likelihoods and means are defined analogously (Eq 34). Thus, the joint likelihood of a session's data is a function of $\Delta_{V_i,j}$ alone. Given that the likelihood depends only on $\Delta_{V_i,j}$, we only need to integrate over $\Delta_{V_i,j}$, and the log likelihood simplifies to

$$\log L(M, \Theta \mid X) = \sum_{i,j} \log \int L(\Delta_{V_i,j} \mid X_{i,j}, M, \Theta)\,P(\Delta_{V_i,j} \mid M, \Theta)\,d\Delta_{V_i,j} \quad (48)$$

The shift updates are stochastic because the visual and auditory measurements in each trial of the audiovisual recalibration task are stochastic. We cannot derive their probability distribution in closed form; instead, we used Monte Carlo simulation to approximate it. Given the reliability-based model, for each candidate parameter set, visual-reliability condition i, and recalibration direction j, we simulated 120 recalibration trials analogous to the audiovisual recalibration task. We repeated this simulation 1,000 times, resulting in a sample of 1,000 shift updates, and checked whether the distribution of these samples was well fit by a Gaussian whose mean and standard deviation equal those of the sample. To do so, we binned the simulated shift updates into 100 bins of equal width and computed the correlation between the observed and predicted number of samples per bin. The resulting value of R² was greater than 0.925 in all cases (Section S8 in S1 Appendix). We denote the approximated probability distribution of the shift updates as $\hat{P}(\Delta_{V_i,j} \mid M, \Theta)$. We approximated the integral in Eq 48 by numerical integration over a region discretized into 100 bins. To include enough of the tails of the probability distribution of the shift updates, we set the integration region to be three times larger than the range of the samples and centered it on that range.
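The Monte Carlo step can be sketched as follows for the reliability-based rule. This is an illustration under stated assumptions, not the authors' simulation: measurements are drawn around each trial's remapped location plus the current shift, the example session uses a hypothetical constant discrepancy of 10 units, and the function names are made up. The goodness-of-fit check mirrors the text (100 equal-width bins, squared correlation of observed and predicted counts).

```python
import numpy as np
from scipy.stats import norm

def simulate_shift_updates(s_A, s_V, var_A, var_V, alpha, n_sims=1000, rng=None):
    """Accumulated shift updates under the reliability-based rule (Eqs 1-3).

    s_A, s_V: remapped auditory and visual locations on each recalibration
    trial (120 entries in the experiment). Measurements are assumed to be
    centered on the location plus the current shift. Returns (dA, dV).
    """
    rng = np.random.default_rng() if rng is None else rng
    c_A = var_A / (var_A + var_V)
    dA, dV = np.zeros(n_sims), np.zeros(n_sims)
    for sa, sv in zip(s_A, s_V):
        mA = rng.normal(sa + dA, np.sqrt(var_A))
        mV = rng.normal(sv + dV, np.sqrt(var_V))
        disc = mV - mA
        dA = dA + alpha * c_A * disc
        dV = dV - alpha * (1.0 - c_A) * disc
    return dA, dV

def gaussian_fit_r2(samples, n_bins=100):
    """R^2 between binned counts and counts predicted by a Gaussian with
    the sample mean and SD (equal-width bins over the sample range)."""
    counts, edges = np.histogram(samples, bins=n_bins)
    predicted = len(samples) * np.diff(norm.cdf(edges, samples.mean(), samples.std()))
    return np.corrcoef(counts, predicted)[0, 1] ** 2

rng = np.random.default_rng(0)
s_A, s_V = np.zeros(120), np.full(120, 10.0)   # hypothetical 10-unit discrepancy
dA, dV = simulate_shift_updates(s_A, s_V, var_A=16.0, var_V=4.0, alpha=0.02, rng=rng)
print(gaussian_fit_r2(dA))
```

Because both shifts accumulate the same per-trial discrepancy, the simulated pairs satisfy the constant ratio $\Delta_A = -(\sigma^2_A/\sigma^2_V)\,\Delta_V$ exactly, which is what lets the reliability-based likelihood collapse to one dimension.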
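Thus, the lower bound is $lb = \Delta_{\min} - (\Delta_{\max} - \Delta_{\min})$ and the upper bound is $ub = \Delta_{\max} + (\Delta_{\max} - \Delta_{\min})$, where $\Delta_{\min}$ and $\Delta_{\max}$ are the smallest and largest simulated shift updates. The numerical integration region was derived separately for each session. The log likelihood is

$$\log L(M, \Theta \mid X) \approx \sum_{i,j} \log \left[\sum_{k=1}^{100} L(\Delta_k \mid X_{i,j}, M, \Theta)\,\hat{P}(\Delta_k \mid M, \Theta)\,h\right] \quad (49)$$

where $\Delta_k$ are the bin centers, $h = (ub - lb)/100$ is the bin width, and

$$L(\Delta_k \mid X_{i,j}, M, \Theta) = \prod_{t} \phi\left(r_{A,i,j,l(t)};\ \mu_{\hat{s}'_{A_i,j,l(t)}}(\Delta_k),\ \sigma^2_{\hat{s}'_A} + \sigma_r^2\right) \prod_{t} \phi\left(r_{V,i,j,l(t)};\ \mu_{\hat{s}'_{V_i,j,l(t)}}(\Delta_k),\ \sigma^2_{\hat{s}'_{V_i}} + \sigma_r^2\right) \quad (50)$$

with the auditory means defined in Eq 47 and the visual means defined in Eq 34. These terms depend on the bias parameters $a_A$ and $b_A$, as well as on $\hat{P}(\Delta_{V_i,j} \mid M, \Theta)$, which depends on the measurement variances under bimodal presentation, $\sigma'^2_{A,\mathrm{AV}}$ and $\sigma'^2_{V_i,\mathrm{AV}}$, and the common learning rate α. Note that the bimodal measurement variances are not directly constrained by data from bimodal trials (because these trials were not included in the model fitting) but were estimated through their influence on the shift updates. Specifically, they affect the spread of the measurements and, as a consequence, the width of the predicted probability distribution of measurement-shift updates, which in turn affects the log likelihood of the model. The set of free parameters for this model is summarized in Table 1. The bimodal visual measurement variance was constrained to be a non-decreasing function of visual-reliability condition i, and the bimodal measurement variances were constrained to be no greater than five times the across-participant averages of the corresponding variance estimates (Section S14 in S1 Appendix). The values of the T variables and the sums do not include outlier trials.

Log-likelihood—fixed-ratio model of cross-modal recalibration. In this model, the auditory and visual shift updates have a fixed ratio set by the two learning rates, $\Delta_{A_i,j} = -(\alpha_A/\alpha_V)\,\Delta_{V_i,j}$ (Section S15 in S1 Appendix). Thus, we can express the likelihood for the fixed-ratio model and parameter set given an auditory localization response in a form similar to Eq 47, with the mean

$$\mu_{\hat{s}'_{A_i,j,l}}(\Delta_{V_i,j}) = w_A\left(s'_{A,l} - \frac{\alpha_A}{\alpha_V}\,\Delta_{V_i,j}\right) + (1 - w_A)\,\mu_P \quad (51)$$

The approximation $\hat{P}(\Delta_{V_i,j} \mid M, \Theta)$ was generated in the same way as for the reliability-based model. The set of free parameters for this model is summarized in Table 1.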
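The one-dimensional marginalization (Eqs 48–50) can be sketched as a Riemann sum over 100 bins spanning three times the sample range. This is a hedged sketch: the function names are hypothetical, the density $\hat{P}$ is the Gaussian fitted to the Monte Carlo samples as described above, and the sum is computed in log space with `scipy.special.logsumexp` (a numerical-stability choice of this sketch, not something stated in the text). The toy likelihood in the example is illustrative only.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def integration_grid(samples, n_bins=100):
    """Bin centers and width for [lb, ub], three times the sample range."""
    lo, hi = samples.min(), samples.max()
    lb, ub = lo - (hi - lo), hi + (hi - lo)
    edges = np.linspace(lb, ub, n_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:]), edges[1] - edges[0]

def marginal_loglik(samples, loglik_given_shift, n_bins=100):
    """log sum_k L(data | Delta_k) P(Delta_k) h, with P approximated by a
    Gaussian fitted to the Monte Carlo samples (cf. Eqs 48-50)."""
    centers, h = integration_grid(samples, n_bins)
    log_p = norm.logpdf(centers, samples.mean(), samples.std())
    log_l = np.array([loglik_given_shift(d) for d in centers])
    return logsumexp(log_l + log_p + np.log(h))   # stays finite for long sessions

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 1000)   # stand-in for simulated shift updates
# Toy likelihood: a single response r = 0.5 with r ~ N(Delta, 1)
ll = marginal_loglik(samples, lambda d: norm.logpdf(0.5, d, 1.0))
```

For this toy case the exact answer is the log density of 0.5 under a Gaussian whose variance is the sample variance plus 1, so the grid approximation can be checked directly.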
Note that even though the shift updates in the fixed-ratio model do not depend on the stimulus reliabilities, the log likelihood does, because stimulus reliability influences the estimates in the localization task (Eq 51) and because the spread of the simulated measurements influences the spread of the estimated distribution of shift updates. As in the reliability-based model, the bimodal visual measurement variance was constrained to be a non-decreasing function of visual-reliability condition i, and the bimodal measurement variances were constrained to be no greater than five times the across-participant averages of the corresponding variance estimates.

Log-likelihood—causal-inference model of cross-modal recalibration. For this model, the joint likelihood was truly two-dimensional. Thus, we approximated the joint probability of the auditory and visual shift updates by drawing 1,000 samples of shift-update pairs and comparing the set of sample pairs to a bivariate Gaussian with the sample mean and covariance as parameters. We again tested whether the two-dimensional Gaussian distribution provided a good fit to the simulated density (defined as R² > 0.925). If the Gaussian fit was poor, we used a kernel density estimate of the distribution (Gaussian kernel smoother with an automatically chosen bandwidth σ) based on the two-dimensional density of the samples [77, 78]. Overall, the simulated auditory and visual shift updates were very well fit by a bivariate Gaussian, and we rarely used a kernel density estimate (Section S8 in S1 Appendix). We additionally used simulations to verify that our estimates of the partial model log likelihood had reasonably small bias (Section S8 in S1 Appendix). For the causal-inference model, we approximated the log likelihood by numerical integration over a two-dimensional region of $(\Delta_A, \Delta_V)$ space discretized into 100 × 100 bins. The upper and lower bounds were determined for both dimensions in the same way as before.
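The two-dimensional case can be sketched as below. Several details are assumptions of this sketch rather than statements of the text: the goodness-of-fit check uses a 20 × 20 histogram (the binning for the 2-D check is not given here), and the kernel density estimate is `scipy.stats.gaussian_kde` with its default Scott's-rule bandwidth, standing in for the automatically chosen smoother cited in the text. Function names are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import gaussian_kde, multivariate_normal

def shift_update_density(dA, dV, n_bins=20, r2_threshold=0.925):
    """Density of (dA, dV): bivariate Gaussian, or a Gaussian KDE if the fit is poor.

    Fit quality is the R^2 between 2-D histogram counts and Gaussian-predicted
    counts on an n_bins x n_bins grid (the binning is an assumption).
    """
    X = np.vstack([dA, dV])
    mvn = multivariate_normal(mean=X.mean(axis=1), cov=np.cov(X))
    counts, xe, ye = np.histogram2d(dA, dV, bins=n_bins)
    gx, gy = np.meshgrid(0.5 * (xe[:-1] + xe[1:]), 0.5 * (ye[:-1] + ye[1:]), indexing="ij")
    cell = (xe[1] - xe[0]) * (ye[1] - ye[0])
    predicted = len(dA) * cell * mvn.pdf(np.stack([gx, gy], axis=-1))
    r2 = np.corrcoef(counts.ravel(), predicted.ravel())[0, 1] ** 2
    if r2 > r2_threshold:
        return (lambda a, v: mvn.pdf(np.stack([a, v], axis=-1))), r2
    kde = gaussian_kde(X)   # Scott's-rule bandwidth by default
    return (lambda a, v: kde(np.vstack([a.ravel(), v.ravel()])).reshape(a.shape)), r2

def marginal_loglik_2d(dA, dV, loglik_given_shifts, n_bins=100):
    """log of the double Riemann sum over a 100 x 100 grid (cf. Eq 52)."""
    density, _ = shift_update_density(dA, dV)
    def axis(x):
        lo, hi = x.min(), x.max()
        edges = np.linspace(lo - (hi - lo), hi + (hi - lo), n_bins + 1)
        return 0.5 * (edges[:-1] + edges[1:]), edges[1] - edges[0]
    (a, ha), (v, hv) = axis(dA), axis(dV)
    A, V = np.meshgrid(a, v, indexing="ij")
    log_p = np.log(np.maximum(density(A, V), 1e-300))   # avoid log(0) far in the tails
    return logsumexp(loglik_given_shifts(A, V) + log_p + np.log(ha * hv))

rng = np.random.default_rng(0)
S = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=1000)
dA, dV = S[:, 0], S[:, 1]   # stand-in for simulated shift-update pairs
```

`loglik_given_shifts` is expected to be vectorized over the grid arrays; a constant-zero likelihood returns approximately log(1) = 0, which is a convenient check that the density integrates to one over the region.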
The log likelihood is

$$\log L(M, \Theta \mid X) \approx \sum_{i,j} \log \left[\sum_{k=1}^{100}\sum_{k'=1}^{100} L(\Delta_{A,k}, \Delta_{V,k'} \mid X_{i,j}, M, \Theta)\,\hat{P}(\Delta_{A,k}, \Delta_{V,k'} \mid M, \Theta)\,h_A h_V\right] \quad (52)$$

where $h_A$ and $h_V$ are the bin widths along the two dimensions and the likelihood is defined analogously to the reliability-based model (see Eq 50), with the exception that the auditory and visual means are both defined in Eq 34. The set of free parameters used to fit the causal-inference model to the localization responses in the post-recalibration task is summarized in Table 1.

Parameter estimation. For each model, we approximated the sets of parameters $\Theta_1$ and $\Theta_2$ that maximized the likelihood using the MATLAB function fmincon and Python SciPy.optimize [79], and approximated $\Theta_3$ using the BADS toolbox [80]. To deal with the possibility that the returned parameter values might correspond to a local minimum of the negative log likelihood, we ran BADS multiple times with different starting points, randomly chosen from a D-dimensional grid with three evenly spaced values per dimension, where D is the number of free parameters in $\Theta_3$ (see Table 1 for a summary of the free parameters for each model). The final parameter estimates were those with the maximum likelihood across all runs of the fitting procedure.

Table 1. Summary of model parameters in Θ3. https://doi.org/10.1371/journal.pcbi.1008877.t001

[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008877 (C) Plos One. "Accelerating the publication of peer-reviewed science." Licensed under Creative Commons Attribution (CC BY 4.0) URL: https://creativecommons.org/licenses/by/4.0/ via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/
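The multi-start strategy can be sketched as follows. BADS is not reproduced here: as a plainly swapped-in local optimizer, this sketch uses `scipy.optimize.minimize` with L-BFGS-B and box bounds. The placement of the three grid values at 1/4, 1/2, and 3/4 of each parameter's range, and the optional random subset of grid points, are assumptions; `multistart_fit` is a hypothetical name.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def multistart_fit(neg_loglik, lower, upper, n_starts=None, rng=None):
    """Minimize neg_loglik from starting points on a 3-per-dimension grid.

    Grid values sit at 1/4, 1/2 and 3/4 of each [lower, upper] range. If
    n_starts is given, a random subset of the 3**D grid points is used.
    Returns the run with the lowest negative log likelihood.
    """
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    levels = [np.linspace(lo, hi, 5)[1:4] for lo, hi in zip(lower, upper)]
    grid = np.array(list(itertools.product(*levels)))
    if n_starts is not None and n_starts < len(grid):
        rng = np.random.default_rng() if rng is None else rng
        grid = grid[rng.choice(len(grid), size=n_starts, replace=False)]
    bounds = list(zip(lower, upper))
    best = None
    for x0 in grid:
        res = minimize(neg_loglik, x0, method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

# Toy objective with two local minima (near -2 and at 2); the global one is at 2
best = multistart_fit(lambda x: (x[0] ** 2 - 4.0) ** 2 + 0.5 * (x[0] - 2.0) ** 2, [-5.0], [5.0])
print(best.x)  # close to [2.0]
```

A run started at -2.5 settles in the inferior local minimum near -2; keeping the best run across starts is what protects the final estimate against such cases.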