(C) PLOS One [1]. This unaltered content originally appeared in journals.plos.org. Licensed under the Creative Commons Attribution (CC BY) license: https://journals.plos.org/plosone/s/licenses-and-copyright

Imperfect language learning reduces morphological overspecification: Experimental evidence

Aleksandrs Berdicevskis (Department of Language and Linguistics, UiT The Arctic University of Norway, Langnes, Tromsø); Arturs Semenuks (Department of Cognitive Science, University of California, San Diego)

Date: 2022-02

In Section 3.1, we show that the assumption our experiment is based upon is valid and that reduced learning time did lead to imperfect learning. In Section 3.2, we show that imperfect learning led to a decrease in overspecification. In Section 3.3, we investigate this decrease more closely and show that it affected verbs, but not nouns, and that within verbs the endings (agent markers) were affected much more strongly than the stems (lexical meanings).

We start by testing the assumption that reduced learning time actually leads to imperfect learning (see Section 2.2). The differences in transmission fidelity at generation 2 between the normal condition (only long-time learners) and both interrupted conditions (only short-time learners) are shown in Fig 3, and a two-tailed t-test yields the following results: t(42.9) = 2.84, p = 0.007, 95% CI for difference in means [0.01, 0.06], Cohen’s d = 0.73. We do not include later generations in the analysis, since their learner type is confounded with the complexity of the input, which depends on the output of previous generations. See S3 Fig for more detailed results.
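The shape of this comparison (a Welch two-sample t-test plus Cohen's d as the effect size) can be sketched as follows. This is a minimal illustration with invented per-chain fidelity values, not the experimental data, and `welch_t`/`cohens_d` are hypothetical helper names, not the authors' code:

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

# Hypothetical per-chain transmission fidelities at generation 2
normal = [0.97, 0.95, 0.96, 0.98, 0.94]        # long-time learners
interrupted = [0.92, 0.93, 0.90, 0.94, 0.91]   # short-time learners
t, df = welch_t(normal, interrupted)           # t ≈ 4.0, df ≈ 8.0
d = cohens_d(normal, interrupted)              # d ≈ 2.53
print(round(t, 2), round(df, 1), round(d, 2))
```

The actual analysis was done in R (see S1 Appendix); this sketch only makes the reported quantities concrete.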
In order to explore the role of condition and generation, we fit a linear mixed-effects regression model (LMM). We largely follow the recommendations for applying regression models outlined in [65]. We do all calculations in R [66], using the packages lme4 [67] for constructing mixed-effects models and lmerTest [68] for calculating the significance of estimated parameters by REML t-tests with the Satterthwaite approximation to the degrees of freedom. We also use ggplot2 [69] for creating plots and effsize [70] for measuring effect sizes for t-tests. R scripts with comments are available in S1 Appendix.

The LMM includes fixed effects of generation, condition and their interaction, and by-chain random intercepts and random slopes for generation (the lme4 notation is provided in Eq 1). We use treatment coding (a.k.a. dummy coding) for condition, with condition T as the reference level. Since TTR is on a bounded scale (0, 1], we log-transform the TTR values before fitting the model. See S1 Appendix for the R implementation and tests of the assumptions.

log(TTR) ~ generation * condition + (1 + generation | chain) (1)

From Table 1 we can conclude that there was a reduction of TTR over generations in condition T (since the negative slope for generation is significantly different from zero), and a similar reduction in condition P (since the interaction of generation and condition P is small and not significant). In condition N, however, the reduction was smaller, since the interaction between condition N and generation is of the same magnitude as the main effect of generation.

Interestingly, if we compare the TTR of long-time and short-time learners at generation 2 (see Fig 5), as we did in Section 3.1 with transmission fidelity, we observe no difference in means, though the variance visibly differs between the conditions (t(32.3) = 0.86, p = 0.395, 95% CI for difference in means [-0.009, 0.023], Cohen’s d = 0.196). In other words, imperfect learning does not necessarily cause simplification immediately, within one generation.
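The TTR measure and the log transform can be sketched as follows. The word forms here are invented for illustration; in the experiment, the TTR is computed over each participant's miniature-language output:

```python
import math

def ttr(tokens):
    """Type-token ratio: number of distinct forms over number of tokens."""
    return len(set(tokens)) / len(tokens)

# Hypothetical output corpus: six word tokens, three distinct forms
corpus = ["nepi", "wuta", "nepi", "sogi", "wuta", "nepi"]
raw = ttr(corpus)        # 3 / 6 = 0.5
log_ttr = math.log(raw)  # TTR lies in (0, 1], so log(TTR) is at most 0
print(raw, log_ttr)
```

The log-transformed values, not the raw ratios, serve as the response variable in the mixed model.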
3.3. Simplification primarily affects agent-marking on verbs

Fig 6 shows TTR calculated separately for nouns and verbs. For verbs, the pattern of change is similar to the overall trend, cf. Fig 4. For nouns, no decrease is observed (there is a very small increase, but it is not significant). We fit an LMM with a specification similar to Eq 1, but add part of speech (reference level: noun) as a fixed effect. The model includes all possible interactions (that is, three two-way interactions and one three-way interaction). With the maximal random-effects structure, the model does not converge. We deal with that by removing the correlation parameter (cf. [71]). The resulting specification in lme4 notation is shown in Eq 2.

log(TTR) ~ generation * condition * pos + (1 + generation || chain) (2)

The summary of the model is given in Table 2. The most interesting coefficients in Table 2 are those that involve the effect of generation. For nouns in condition T, there is a minor increase in complexity (though the p-value is higher than any conventional significance threshold, and thus we do not have strong evidence to claim that the true value of the coefficient differs from zero); the same holds for the other conditions. For verbs, however, the effect of generation is reversed and clearly negative (as was the case for the TTR in general). The slope is less steep in condition N. To sum up, verbs got simpler, while nouns did not, and there was a clear difference between the normal condition and the interrupted ones.

We resort to manual analysis in order to qualitatively explore how exactly languages may be simplified and complexified. Here and below we refer to languages by means of one letter (N for normal chains, T for temporarily interrupted, P for permanently interrupted) and two numbers (a–b), where a is the number of the chain (ranging from 1 to 45) and b is the number of the generation (ranging from 0 to 10). Two examples of complexification of the nominal system can be found in languages N9–10 (Table 3) and T18–10 (Table 4).
N9–10 has two patterns of marking nominal number: -p (the main one) and -s. The -s ending originally emerged as a random mutation at generation 3 in a single sentence (‘round animals fall apart’) and was preserved unchanged (which is possible due to high transmission fidelity) until generation 10, where it also spread to the sentence ‘round animals’, thus developing from a single exception into a minor pattern.

Language T18–10 lost all double agent-marking and had its nominal system reorganized, with an emergent pattern where the number distinction is marked through non-concatenative morphological processes: metathesis for one noun (senz, sezn) and consonant mutation for another (sign, dign). These changes, however, are not instances of complexification according to our definition and will not be captured as such by the TTR measure. The mutated plural form digm (instead of dign; a random change that first appeared at generation 8), however, would be. This language deserves further attention. Its unique development emerged through several stages (see chain T18 in S1 Appendix). First, a poor learner in generation 3 drastically reorganized the system, introducing numerous inconsistencies. Through generations 4–7, these inconsistencies were either eliminated or underwent exaptation (cf. [72]), which resulted in a stable system at generation 8 (identical to that in generation 10).

For verbs, the manual analysis shows that the decrease in diversity occurred primarily due to the loss of double agent-marking, either partial or full. T25–10 (Table 5) is an example of a language where double agent-marking has completely disappeared. Interestingly, this language did not simply abandon one of the agent markers -e and -u in favour of the other, but instead kept both, reanalyzing them as parts of the stems (out of 14 languages that shed double agent-marking completely, only three abandon one of the markers; the other 11 reanalyze them).
Thus, verbs fu and fe both originate from the generation-zero stem f-, while the stem b- did not survive.

To test the aforementioned claim that the complexity loss mostly affects agent-marking (expressed by the last letter of the verb, when present), but not the lexical meaning (usually expressed only by the first letter), we calculate the TTR of verb “stems” (first letters) and verb “endings” (last letters). To make the measurement more adequate, we perform an additional manipulation.

For endings, we calculate the TTR within every verb and then average the values. The reason for this step is that we want to focus on agent-marking and thus eliminate other semantic factors that could inflate the TTR. If there is no agent-marking, the same verb should always look the same, and the TTR should be 0.25 (each verb occurs four times in the corpus, so one type over four tokens). For example, for language T25–0 (Table 5) that means averaging the TTR over the three subcorpora that all look like {u, u, e, e}, resulting in the value 0.5. For language T25–10, the subcorpora look like {u, u, u, u}, {e, e, e, e}, {e, e, e, e}, and the resulting average TTR is 0.25. We should note that in some languages the ending gets reanalyzed and denotes not the type of agent, but the number of agents. We consider this phenomenon to be a type of agreement with the subject, equally complex to the double agent-marking present in the initial languages, and thus our TTR measure reflects it correctly.

For stems, we calculate the TTR within two subcorpora: verbs that occur with the noun denoting the round animal and verbs that occur with the noun denoting the square animal. The rationale is the same as for endings: we want to eliminate all differences between verbs apart from lexical meaning. The drawback of this method is that languages like T25–10, where two verbs have the same first letter (but still have different stems, since the vowel has been reanalyzed as part of the stem), receive a lower TTR than they should.
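The ending-averaging procedure just described can be sketched as follows, with toy data mirroring the T25 examples from the text (a minimal illustration, not the authors' implementation; `averaged_ending_ttr` is a hypothetical helper name):

```python
def ttr(tokens):
    """Type-token ratio: number of distinct forms over number of tokens."""
    return len(set(tokens)) / len(tokens)

def averaged_ending_ttr(verb_subcorpora):
    """Mean within-verb TTR of verb-final letters.

    0.25 (one type over four tokens) means no agent-marking variation;
    0.5 matches the double agent-marking of the initial languages.
    """
    return sum(ttr(endings) for endings in verb_subcorpora) / len(verb_subcorpora)

# Final letters of each verb's four occurrences, per verb
t25_gen0 = [["u", "u", "e", "e"]] * 3                # initial language: double marking
t25_gen10 = [["u"] * 4, ["e"] * 4, ["e"] * 4]        # markers reanalyzed into stems
print(averaged_ending_ttr(t25_gen0))   # 0.5
print(averaged_ending_ttr(t25_gen10))  # 0.25
```

The stem TTR is computed analogously, but over the two by-animal subcorpora of verb-initial letters.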
Both subcorpora look like {t, t, f, f, f, f}, and the TTR is 0.33, while 0.5 would have been a more adequate value. Such cases, however, are rare. For further details of the TTR calculation, see S7 Text.

The change of the TTR of stems and endings over time is shown in Fig 7. We fit an LMM with the same specification as in Eq 2, but instead of part of speech we add morpheme type (stem or affix, with stem as the reference level) as a fixed effect. The model is applied to the verb data only. The summary of the model is given in Table 6. The most important pattern is that complexity decreased over time in condition T, and that this trend was much more pronounced for affixes than for stems. In condition N, the trend was weaker (absent for stems).

[1] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0262876