(C) PLOS One This story was originally published by PLOS One and is unaltered.
A mechanism for punctuating equilibria during mammalian vocal development [1]
['Thiago T. Varella', 'Department Of Psychology', 'Princeton University', 'Princeton', 'New Jersey', 'United States Of America', 'Yisi S. Zhang', 'Princeton Neuroscience Institute', 'Daniel Y. Takahashi', 'Asif A. Ghazanfar'] Date: 2022-08
Evolution and development are typically characterized as the outcomes of gradual changes, but sometimes states of equilibrium can be punctuated by sudden change. Here, we studied the early vocal development of three different mammals: common marmoset monkeys, Egyptian fruit bats, and humans. Consistent with the notion of punctuated equilibria, we found that all three species undergo at least one sudden transition in the acoustics of their developing vocalizations. To understand the mechanism, we modeled different developmental landscapes. We found that the transition was best described as a shift in the balance of two vocalization landscapes. We show that the natural dynamics of these two landscapes are consistent with the dynamics of energy expenditure and information transmission. By using them as constraints for each species, we predicted the differences in transition timing from immature to mature vocalizations. Using marmoset monkeys, we were able to manipulate both infant energy expenditure (vocalizing in an environment with lighter air) and information transmission (closed-loop contingent parental vocal playback). These experiments support the importance of energy and information in leading to punctuated equilibrium states of vocal development.
Species can sometimes evolve suddenly; their appearance is preceded and followed by long periods of stability. This process is known as “punctuated equilibrium”.
Our data show that for three mammalian species—marmoset monkeys, fruit bats, and humans—early vocal development trajectories can also be characterized as different equilibrium states punctuated by sharp transitions; transitions indicate the advent of a new vocal behavior. To better understand the putative mechanism behind such transitions, we show that a balance model, in which variables trade-off in their importance over time, captured this change by accurately simulating the shape of the developmental trajectory and predicting the timing of the transition between immature and mature vocal states for all three species. Two variables—energy and information—were hypothesized to trade-off during development. We tested and found support for this hypothesis in analyses of two marmoset monkey experiments, one which manipulated energy metabolic costs and another which manipulated information transmission. We find that all three species’ trajectories are best fit by the balance model, the one consistent with punctuated equilibria. We then show that energy and information—and how their importance varies over time for individuals—are good candidates for a contextual change that leads to the sharp transitions observed in vocal behavior. Finally, we test our predictions using new analyses of previously published data from experiments with marmoset monkeys [ 10 , 11 ]. We focus on three mammalian species—marmoset monkeys, fruit bats, and humans. In all of them, vocal development is influenced by postnatal experience during infancy. Using densely sampled longitudinal vocal recordings, we first demonstrate that gradual vocal development was punctuated by rapid change on the timescale of days. We ask two basic questions: 1) how can we mathematically describe the different species’ vocal developmental trajectories? And 2) are there commonalities among them? We consider three different models of development—linear, recurrent, and balance. 
Linear and recurrent models are standards in the behavioral development literature. In the linear model, the trajectory is changing at a constant rate. The recurrent model consists of a trajectory that changes until it reaches a stable state and then there is no further change. It is generated by a factor that changes iteratively depending upon a previous time point until it achieves a stable state [ 8 ]. Finally, the balance model is one in which change occurs between two stable states and would best represent punctuated equilibria. The balance model can be generated by the weighted sum of two factors whose weights slowly change during development; it is like the double-well potential model used in statistical physics [ 9 ]. Since evolution and development are in many ways similar, contingency-based processes (just on different timescales [ 5 – 7 ]), the punctuated equilibrium framework may also be applicable to behavioral development. By analogy with the rapid formation of a new animal species or languages, contextual changes for a developing individual could suddenly lead to a new behavior (i.e., a new locomotion pattern) while co-existing with other behaviors previously established. In these cases, “context” is any new state the individual may occupy; this could be a new body state (following, for example, the growth of one or more parts of the vocal anatomy) and/ or it could be a new environmental state (for example, changing interactions with caregivers or other members of the social group). Here, we investigate this possibility in the development of vocal behavior and explore putative mechanisms. The sudden appearance of a new species preceded and followed by periods of relative stability is known as “punctuated equilibria” (i.e., periods of equilibrium punctuated by abrupt changes) [ 1 , 2 ]. This theory accounts for the appearance of new species in a manner that is different from the gradual change that we normally associate with evolution [ 1 ]. 
More recently, the theory has been applied to the evolution of communication in humans. In both natural and artificial language evolution, long periods of gradual divergence are interrupted by periods of rapid change [ 3 , 4 ]. The source of this non-linearity in language evolution is thought to be the balance between optimization and flexibility [ 4 ].
Results
Marmoset monkeys, fruit bats, and humans all use mechanisms for vocal production similar to most mammals [12]. Vocal production results from the interactions among a large number of components: the vocal apparatus (larynx, vocal tract, lungs, etc.), the muscles that move them, the neural circuit activity that leads to muscular contraction, and the organism’s experience that modifies those neural circuits [13]. Monitoring and manipulating all these components is not possible, especially during development. Moreover, at any age, there are an exponentially large number of possible configurations they could take. Measurement of a few key parameters that can account for the shape of a particular vocal developmental trajectory, and that work across species, would be ideal. With this in mind, we analyzed longitudinal datasets freely available to the public [14–17] and measured the changes in the distribution of vocal acoustic features common to our three species’ vocalizations: duration, Wiener entropy, dominant frequency, and dominant frequency of amplitude modulation [17–19]. To reduce the dimensionality of these four vocal parameters, we used principal component analysis to compute their collective first principal component. In our subsequent analyses, we used only this first principal component (“Principal Acoustic Component” or PAC) as it was the component that captured most of the variance of the vocal development dynamics (Fig 1A).
Fig 1. Different models could explain a transition between two developmental states. a. 
Diagram illustrating one hypothetical transition from immature vocalizations to mature vocalizations. The y-axis is the first component of the PCA performed on the vocalizations and labeled as the Principal Acoustic Component (PAC). b. Sample spectrograms of vocalization for an infant (immature, left) and an adult (mature, right). From top to bottom, we show an example from common marmoset (Callithrix jacchus), Egyptian fruit bats (Rousettus aegyptiacus), and humans. c. Comparison between the probability distribution of the PACs of immature calls (gray) and mature calls (black). From top to bottom: common marmoset, Egyptian fruit bats, and humans. *** means p < 0.001. d-f. Dynamics of the vocalization landscape during the development. The vocalization landscape depends on the probability distribution of the PAC. Lighter colors (light blue, yellow, and pink) represent the immature stage of development, darker colors (dark blue, orange, and red) the mature stage. For an explanation of equations on top of each figure, see Methods - Development models. g-i. Transition predicted by each mechanism. The transition is shown by the PAC associated with maximum height in the vocalization landscape throughout development. From left to right: d,g. linear model, e,h. recurrence model for an asymptotic transition (inspired by [8]) and f,i. balance model for a phase transition, based on a non-equilibrium dynamical balance given by the weighted sum of two constraints. https://doi.org/10.1371/journal.pcbi.1010173.g001
For all three species, the distribution of PACs in earlier versus later periods of development was different (p < 0.001, Kolmogorov-Smirnov test for the equality of probability distribution) (Fig 1B and 1C). We associated a thermodynamic cost function for each probability distribution using the maximum entropy principle [13,20]. The thermodynamic cost is proportional to the negative logarithm of the probability distribution. 
Then, we defined a two-dimensional landscape as the negative of the thermodynamic cost. In other words, changes in the probability distribution of vocal acoustics can be interpreted as modifications in the landscape of the vocal production. Thus, we can ask: what kind of change in the vocalization landscape best describes the trajectory of vocal development? The fit of three different models was tested: linear (Fig 1D and 1G), recurrent (Fig 1E and 1H), and balance (Fig 1F and 1I). The models were chosen based on the behavioral development literature and on the possible behavioral landscape dynamics that could be observed (see Methods - Development models). The linear model is the simplest possible trajectory between two points; it represents a range of psychological and developmental models that (like most evolutionary accounts) involve gradual changes. For example, Piaget [21] and J.J. & E.J. Gibson [22] argued that sensory perception gradually emerges in the human infant. The recurrence model focuses on a curvilinear building-up to a single stable state. The development of mature song in zebra finch via tutoring is captured by this model [8]. Finally, the balance model could account for a nonlinear shift between two stable states; we identified this possibility via its application in the phase-transition literature of physics [9]. We fit each of these models to the vocal developmental trajectories of each species to determine which one best captures their shape. Fig 2A–2C show exemplar trajectories, while Fig 2D–2F show the shape of change across the population (marmosets: n = 10 and 105,904 vocalizations; bats: n = 13 and 1878 vocalizations; humans: n = 8 and 1055 vocalizations) (Fig 2A–2F). 
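The mapping from a PAC probability distribution to a vocalization landscape, as described earlier in this section, can be illustrated with a minimal sketch. It assumes a simple fixed-width histogram as the probability estimate and assumes that the "temperature" β enters as a plain scaling of the negative log-probability; the function names, the bin width, and the β scaling are illustrative choices, not the estimation procedure given in the Methods.

```python
import math
from collections import Counter


def landscape_from_samples(pac_values, bin_width=0.5, beta=1.0):
    """Estimate a vocalization landscape from a list of PAC values.

    Assumptions (not from the source): a fixed-width histogram estimates
    the probability p of each PAC bin, the cost is -log(p) / beta, and the
    landscape height is the negative of that cost.
    Returns a dict mapping bin center -> landscape height.
    """
    counts = Counter(math.floor(v / bin_width) for v in pac_values)
    total = sum(counts.values())
    landscape = {}
    for bin_index, n in sorted(counts.items()):
        p = n / total
        cost = -math.log(p) / beta          # cost grows as probability shrinks
        landscape[(bin_index + 0.5) * bin_width] = -cost
    return landscape


def peak(landscape):
    """Return the PAC bin center with the greatest landscape height."""
    return max(landscape, key=landscape.get)


# Toy example: most calls cluster near PAC = 0.25, so the peak sits there.
toy_landscape = landscape_from_samples([0.1, 0.2, 0.3, 1.1])
toy_peak = peak(toy_landscape)
```

Under these assumptions, the peak of the landscape is simply the most probable PAC bin, which is the quantity the text tracks across development.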
For each model, we fit the initial (immature) and the final (mature) landscapes through 3 parameters: the last immature day, the first mature day, and the thermodynamic “temperature” β that is important for the relationship between the landscape and the probability distribution (see Methods - Estimation of the vocalization thermodynamic cost and landscape). In this manner, the success of the model would be measured by whether the shape of the trajectory could be predicted using only the extreme data points—the data between the last immature day and the first mature day were not used as inputs. We compared the goodness of fit (adjusted R2) of each model to the full data. For all species, the balance model best captured the trajectory of vocal development (adjusted R2 respectively for linear, recurrent and balance model: marmosets: 0.54, 0.60, and 0.86; bats: 0.80, 0.63, and 0.96; humans: 0.70, 0.71, and 0.89) (Fig 2G–2I). We performed statistical tests to assess whether the adjusted R2 of the balance model was significantly higher than that of the linear and recurrence models for the three species. All were significant (p < 0.05) except for the balance model compared to the linear model in humans (p = 0.093).
Fig 2. The energy-information balance model most successfully reproduces the transition day between two states for different species. Analysis made using common marmoset in the left column, Egyptian fruit bat in the middle column, and human in the right column. a-c. Best model fit for individuals of the three species. Notice that for the bat dataset, the data collection was from different recording periods instead of a single longitudinal experiment; this led to some gaps in the data. The black dots are the typical PAC, i.e., a moving average calculated from the experimental data per day of recording (see Methods - Estimation of typical PAC per day). 
Blue lines are the best fit for the linear model, orange lines are the best fit for the recurrence model and red lines are the best fit for the balance model. d-f. Model fitting for the population for the three different species. g-i. Comparison between the best R2 for each model. j-l. Comparison of transition date predicted by the balance model and calculated from experimental data. Both transition dates were calculated by fitting a sigmoid to the values, and distributions were obtained via bootstrap. https://doi.org/10.1371/journal.pcbi.1010173.g002 If the balance model is accurate, it should also be able to predict when the transition between the two stable states occurs. That transition day is the time when the two landscapes are balanced, i.e., have the same maximum height. To estimate the transition day during vocal development, we fit the experimental data with an S-shaped curve (a sigmoid function). We then tested whether the balance model could correctly estimate the timing of developmental transitions in the vocalization data. For all species, the transition day estimated by the balance model is within the confidence interval of the transition seen in the data (Fig 2J–2L). For marmoset monkeys, the model transition day was 20.07 whereas the experimental transition day was 20.85 (p = 0.528; 95% CI = [17.55, 20.87]). For bats, the model and experimental transition days were 54.22 and 55.88, respectively (p = 0.604, 95% CI = [35.67, 60.55]). For humans, the model and experimental transition days were 191.83 and 173.41 (p = 0.331, 95% CI = [104.30, 238.47]). These data indicate that early vocal development in all three species exhibits punctuated equilibria—equilibrium states separated by a sharp transition. The balance model provides an account for why there are sudden transitions during vocal development. Knowing how a behavioral landscape changes helps us understand the underlying causes of those transitions. 
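The sigmoid-based estimate of the transition day mentioned above can be sketched as follows. This is a deliberately simple illustration, not the fitting routine used in the study: it assumes the two plateaus can be approximated by the mean of the first and last few daily PAC values, and it uses a coarse grid search over the midpoint and slope in place of a proper optimizer. All function and parameter names are hypothetical, and the bootstrap step is omitted.

```python
import math


def sigmoid(t, low, high, t0, k):
    """S-shaped curve rising from `low` to `high` with midpoint t0, slope k."""
    z = -k * (t - t0)
    if z > 700:                 # avoid overflow in math.exp for very early days
        return low
    return low + (high - low) / (1.0 + math.exp(z))


def fit_transition_day(days, pac, n_edge=3, steps=200,
                       slopes=(0.1, 0.2, 0.5, 1.0, 2.0, 5.0)):
    """Grid-search least-squares fit; returns (transition_day, slope).

    Assumption: plateaus are fixed to the mean of the first and last
    `n_edge` observations instead of being fitted.
    """
    low = sum(pac[:n_edge]) / n_edge
    high = sum(pac[-n_edge:]) / n_edge
    t_min, t_max = min(days), max(days)
    best = None
    for i in range(steps + 1):
        t0 = t_min + (t_max - t_min) * i / steps
        for k in slopes:
            err = sum((sigmoid(t, low, high, t0, k) - y) ** 2
                      for t, y in zip(days, pac))
            if best is None or err < best[0]:
                best = (err, t0, k)
    return best[1], best[2]


# Synthetic trajectory with a known midpoint at day 20.
days = list(range(41))
pac = [sigmoid(t, 0.0, 1.0, 20.0, 1.0) for t in days]
est_day, est_slope = fit_transition_day(days, pac)
```

The midpoint of the fitted curve plays the role of the transition day; confidence intervals like those reported in the text would require resampling the data and refitting, which is left out here.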
The balance model assumes that the vocalization landscape—a changing context—consists of two components (that are each landscapes as well). Furthermore, it predicts that these two components will trade-off, one increasing and the other decreasing throughout development. What could those components be? One of the most important trade-offs in animal behavior is between metabolic energy and information [23], so that was a logical possibility. In many animals, vocal output is linked to increases in metabolic energy expenditure [24,25]. For example, louder versus softer vocalizations in zebra finches and humans require greater energy expenditures [26,27]. In marmoset monkeys, infant vocal output is tightly correlated with fluctuations of arousal (a marker of energy allocation) [28,29]. With regard to information and vocalizations, there are also many accounts. For example, the crying rate is highest in human infants during the first two months of life during which they have poor control over phonation [30]. As they begin to cry less and produce more steady (tonal) vocal sounds, they more reliably elicit vocal responses from caregivers [31]. The same is true for marmoset monkey infants and the elicitation of responses from caregivers [17,32]. It is important to keep in mind the way we defined the landscape: the peaks in the landscape are associated with the behavior that is produced with higher probability. Moreover, we assume some vocalizations are more energetically costly (as in metabolic cost) than others, and some are more efficient in transmitting information than others. When the energy landscape is at its peak, the assumption in our model is that more energy is being expended on vocalizations as opposed to other behaviors because to do so is less costly. Likewise, when the information landscape is at its peak, we are assuming that the vocalizations are more efficient in transmitting information. 
If the final landscape is a weighted sum of two landscapes, as described by the balance model, one landscape will start high and decrease, and the other will start low and increase. Thus, for the case of vocal development, we hypothesized that very young animals produce immature vocalizations at high rates (higher energetic landscape since vocalizations will be less energetically costly to produce); these vocalizations have less information content as they less reliably elicit responses from conspecifics (lower information landscape). As they get older, more mature-sounding vocalizations are produced at lower rates (lower energetic landscape) but contain more information content with greater likelihood of eliciting a conspecific response (higher information landscape). Thus, according to our hypothesis, the information component of the vocalization landscape, C 1 (x), is initially low (near 0) and then increases due to the changing weight λ. A higher information component of the landscape leads to an increase in the information transmission efficiency (Fig 3A and 3B). We tested our hypothesis directly in developing marmoset monkeys by measuring information transmission efficiency via the change in probability of parental responses following an infant vocalization. Using Granger causality, we found that, as marmoset infants get older, their vocalizations elicit parental responses with greater reliability (i.e., information component of the landscape increases and the information transmission efficiency increases; Fig 3C). If information plays a causal role in shaping the vocal developmental trajectory in the manner predicted by the balance model, then changing the transmission of information should alter the timing of the transition between equilibrium states. That is, the punctuation described by the transition day should shift. 
For example, in a situation where one individual produces vocalizations with higher information transmission efficiency (i.e., the landscape of the information component has a higher maximum (Fig 3D)), the transition from immature to mature vocalization should occur earlier (Fig 3E). We used data from a published experiment that manipulated the degree of parental contingency of vocal feedback that an infant marmoset receives during development to test this “information” hypothesis [10].
Fig 3. Information transmission efficiency is related to the mature component of the vocalization landscape. a. Schematic of how the predicted information component of the vocalization landscape, λC 1 (x), varies from immature (smaller λ) to mature phases (larger λ) of development, being higher during the mature stage. λ is the parameter that controls the balance between landscapes. C 1 (x) is the component of the landscape whose relevance increases during development. b. Expected increase of efficiency in information transmission given the increase in information landscape. c. Observed information transmission of marmoset infant calls throughout development. The shaded region represents a 95% confidence interval. d. Schematic of two situations with different information components of the landscape C 1 (x): the plot on the top represents a higher information landscape, the plot on the bottom represents a lower information landscape. The black line represents the energy component, C 0 (x), and is assumed to be constant in the two scenarios. e. Predicted vocal dynamics, measured by the optimal PAC, for the two scenarios, showing that higher information landscape predicts faster transition. f. Observed vocal dynamics from the feedback contingency manipulation setup in marmosets. Dashed lines represent transition day T high for high contingency data (dark green line) and T low for low contingency data (light blue line). 
T high < T low with p < 0.001. https://doi.org/10.1371/journal.pcbi.1010173.g003 In the experiment, there were three pairs of dizygotic twins (6 infants from 3 different sets of parents). Starting at postnatal day 1 (P1), one randomly selected twin was provided the best possible simulated “parent” who gave 100% vocal feedback via a computer-controlled closed-loop playback system when the infant produced an immature contact call. The other twin received vocal feedback to only 10% of the contact calls it produced [10]. This contingency experiment was performed approximately every other day for less than 1 hour after which the infants were returned to their families. In the context of the current study, a higher level of simulated parental responsivity to an infant’s vocalizations is effectively increasing the information landscape and thus should shift the transition day in a manner predicted by the balance model: an earlier transition. The opposite should be true for infants whose vocalizations elicited simulated parent calls with a low probability. As predicted, we observed that the low contingency marmosets do have a significantly later (p < 0.001) transition day than the high contingency marmosets, both estimated by fitting a sigmoid (high contingency data transition day = 9.0, 95% CI = [7.1, 10.2]; low contingency transition day = 25.5, 95% CI = [13.5, 33.7]) (Fig 3F). Statistical tests were performed to check whether the balance model would predict the transition day similarly to what we observed in Fig 2J–2L. The statistical test performed after bootstrapping the transition day given by the balance model and the sigmoid fit showed that they were not significantly different for either the high contingency data (p = 0.284) or low contingency data (p = 0.168). Therefore, using the balance model and different information landscapes, we could predict the qualitative changes in the transition day without any fitting to the data. 
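The qualitative prediction tested here, that a higher information landscape moves the transition earlier, follows directly from the weighted-sum form of the balance model, (1−λ)C 0 (x) + λC 1 (x). The sketch below illustrates this under loudly stated assumptions: both components are modeled as Gaussian bumps at hypothetical PAC positions −2 (immature) and +2 (mature), λ rises linearly with age over a hypothetical 60-day window, and the transition day is taken as the first day on which the optimal PAC lies on the mature side. None of these shapes or values come from the study.

```python
import math


def bump(x, center, height, width=0.5):
    """Gaussian-shaped landscape component (an illustrative assumption)."""
    return height * math.exp(-((x - center) ** 2) / (2 * width ** 2))


def optimal_pac(lam, h0, h1, grid):
    """PAC value maximizing (1 - lam) * C0(x) + lam * C1(x)."""
    def landscape(x):
        energy = bump(x, -2.0, h0)        # C0: immature, energy component
        information = bump(x, 2.0, h1)    # C1: mature, information component
        return (1 - lam) * energy + lam * information
    return max(grid, key=landscape)


def transition_day(h0, h1, n_days=60):
    """First day on which the optimal PAC is on the mature side (x > 0).

    Assumption: lam increases linearly from 0 to 1 over n_days.
    Returns None if no transition occurs within the window.
    """
    grid = [i / 10 for i in range(-40, 41)]
    for day in range(n_days + 1):
        lam = day / n_days
        if optimal_pac(lam, h0, h1, grid) > 0:
            return day
    return None


baseline = transition_day(h0=1.0, h1=1.0)
high_information = transition_day(h0=1.0, h1=2.0)   # taller C1 peak
high_energy = transition_day(h0=2.0, h1=1.0)        # taller C0 peak
```

In this toy setting the optimal PAC jumps abruptly from one bump to the other once the weighted peak heights cross, so raising the information peak makes the jump happen at a smaller λ (earlier), while raising the energy peak delays it.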
Likewise, according to our hypothesis, the energetic component of the landscape, C_0 (x), is initially high then decreases as the weight (1-λ) changes (Fig 4A), which could result from changes in the energetic costs of producing vocalizations. One consequence of that change could be a decrease in call rate (Fig 4B), given that the more the marmoset vocalizes, the more energy it spends. We tested the hypothesis indirectly by measuring the vocalization rate over time. We found that, as the marmoset gets older, the number of vocalizations decreases (Fig 4C). Similar to the manipulation of information landscapes, the manipulation of energy landscapes should also affect the timing of the transition day. If we increase the energy landscape, then it should take longer for both landscapes to balance out: the transition day is predicted to be later (Fig 4D and 4E). For both infant and adult marmoset monkeys, vocal production is dependent upon respiration [17,29,33], as it is for all terrestrial mammals [12,34]. We can manipulate the energy landscape by reducing the effort it takes to respire by placing individuals in a helium-oxygen (heliox) environment. (Indeed, for this reason, it is used by clinicians to treat children with respiratory ailments [35].) The lighter air reduces the energy expenditure for respiration and, logically, for vocalizations as well.
Fig 4. Energy metabolic cost is related to the immature component of the vocalization landscape. a. Schematic of how the predicted energy component of the vocalization landscape, (1−λ)C 0 (x), varies from immature (λ closer to 0) to mature (λ closer to 1) phases of development, being lower during the mature stage. λ is the parameter that controls the balance between landscapes. C 0 (x) is the component of the landscape whose relevance decreases during development. b. Expected decrease in call rate. c. Observed decrease in call rate. 
The shaded region represents a 95% confidence interval. d. Schematic of two situations with different energy components of the landscape C 0 (x): the plot on the top represents a lower energy landscape; the plot on the bottom represents a higher energy landscape. The black line represents the information landscape, C 1 (x), and is assumed to be constant in the two scenarios. e. Predicted vocal dynamics, measured by the optimal PAC, for the two scenarios, showing that lower energy landscapes predict faster transition. f. Observed vocal dynamics from the heliox setup in marmosets. Dashed lines represent transition day T air for regular air data and T heliox for heliox data. T air < T heliox with p < 0.001. https://doi.org/10.1371/journal.pcbi.1010173.g004
We again used data collected from a published study wherein, for 10 minutes per recording session (every other day for 2 months), infant marmosets were placed in an 80% helium and 20% oxygen environment; the mix is lighter than regular air but has the same concentration of oxygen [11]. Thus, in the brief period in which the air is lighter, the vocalization metabolic cost is reduced. A lowered metabolic cost of producing vocalizations translates to greater frequency of vocal output and an increase in its representation in the landscape (see Methods – Estimation of the vocalization thermodynamic cost and landscape). As such, we predicted that the transition day would be later for those vocalizations of infant marmosets recorded while in the heliox condition versus those recorded while in regular air. [To be clear, we are comparing the vocal developmental trajectory as measured in heliox versus measured in regular air. We are not assessing the long-term influence of heliox on vocal production in regular air.] 
Indeed, this is what we observed: the transition day is significantly later for heliox compared to air (p < 0.001; heliox transition day = 30.0, 95% CI = [29.8, 32.6]; regular air transition day = 10.9, 95% CI = [8.5, 13.6]) (Fig 4F). Likewise, the statistical test performed after bootstrapping the transition day given by the balance model and the sigmoid fit revealed that they were not significantly different for vocalizations produced in either regular air (p = 0.61) or heliox (p = 0.092).
[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010173 Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.