Meta-analysis reveals an extreme "decline effect" in the impacts of ocean acidification on fish behavior [1]

Jeff C. Clements (Department of Biology, Norwegian University of Science and Technology, Trondheim), Josefin Sundin (Department of Aquatic Resources, Swedish University of Agricultural Sciences, Drottningholm), and Timothy D. Clark

Date: 2022-11

While we were able to test and exclude 3 biological factors, other potential factors that could drive the decline are not readily testable from our database. For example, although we were able to partially test for the influence of background CO2 variability by comparing cold- and warm-water species, most studies do not report the actual background CO2 levels that the experimental animals (and their ancestors) have historically experienced. As such, we are unable to account for the historic CO2 acclimation conditions of animals used in experiments. This could contribute to the observed decline effect if an increasing proportion of studies used captive-bred fish from recirculating aquarium systems with high CO2 levels, rather than fish from wild populations experiencing natural CO2 levels. This is an unlikely explanation, however, given that the earliest studies from 2009 to 2010 reporting high effect sizes used both captive-bred and wild-caught fish [8-10, 17]. Furthermore, recent replication attempts of those initial studies using wild-caught fish have failed to replicate the large effect sizes [7]. Nonetheless, we recommend that future studies provide better background CO2 information for the fish used in their experiments and use best practices for measuring and reporting carbonate chemistry [18].

Fig 2. Mean effect size magnitude (absolute lnRR ± upper and lower confidence bounds) as a function of time for datasets that only included experiments with (a) warm-water species, (b) olfactory-associated behaviors, and (c) larval life stages. Mean effect size magnitudes and confidence bounds were estimated using Bayesian simulations and a folded normal distribution. Note: Colors are aesthetic in nature and follow a gradient according to year of publication online. Source data for each figure panel can be found in S1 Data.

The large effect size magnitudes from early studies on acidification and fish behavior are not present in the majority of studies from the last 5 years (Fig 1b, S1 Table). This decline effect could be explained by a number of factors, including biological ones. For example, cold-water fish in temperate regions experience a higher degree of temporal variability in carbonate chemistry parameters over large spatial areas [15] and may therefore be less sensitive to changes in seawater CO2, as per the Ocean Variability Hypothesis [16]. As such, if an increasing number of studies on cold-water species over time were responsible for the decline effect, removing cold-water species from the dataset (i.e., only including warm-water species) should make the decline effect trend disappear. This was not the case: the decline effect persisted when only warm-water species were considered (Fig 2a).
In the same vein, the strongest ocean acidification effects on fish behavior have undoubtedly been reported for chemical cue (herein "olfactory") responses, and an increasing number of studies on nonolfactory behaviors could explain the decline effect. If this were true, removing nonolfactory behaviors from the dataset should negate the decline effect trend. Again, this was not the case (Fig 2b). Finally, early studies of ocean acidification and fish behavior used larval fish, which are typically considered more sensitive to environmental perturbations than juveniles and adults. If a greater proportion of studies used less sensitive life stages through time, then removing those life stages and focusing exclusively on larvae should abolish the decline effect. Once again, this was not the case (Fig 2c). These analyses show that ocean acidification studies on fish behavior exhibit a decline effect that is not explainable by 3 biological processes commonly considered important drivers of acidification effects (Fig 2a-2c, S1 Table).

Fig 1. (a) Trend in raw effect size magnitudes (absolute lnRR) for each experiment in our dataset plotted as a function of year of publication online and color coded according to study. Data are fit with a Loess curve with 95% confidence bounds. (b) Mean effect size magnitude (absolute lnRR ± upper and lower confidence bounds) for each year of publication (online) in our dataset. Mean effect size magnitudes and confidence bounds were estimated using Bayesian simulations and a folded normal distribution. Note: Colors for (b) are aesthetic in nature and follow a gradient according to year of publication. Source data for each figure panel can be found in S1 Data. ES, effect size.

Based on a systematic literature review and meta-analysis (n = 91 studies), we found evidence for a decline effect in ocean acidification studies on fish behavior (Fig 1a and 1b). Effect size magnitudes (absolute lnRR) in this field have decreased by an order of magnitude over the past decade, from mean magnitudes >5 in 2009 to 2010 to magnitudes <0.5 after 2015 (Fig 1a and 1b, S1 Table). Mean effect size magnitude was disproportionately large in early studies, hovered at moderate values from 2012 to 2014, and has all but disappeared in recent years (Fig 1a and 1b).

Fig 5. (a) Trend in raw effect size magnitudes (absolute lnRR) for each experiment in our dataset excluding all studies authored (or coauthored) by lead investigators of the 3 initial studies [8-10], plotted as a function of year of publication online and color coded according to study. Data are fit with a Loess curve with 95% confidence bounds. (b) Mean effect size magnitude (absolute lnRR ± upper and lower confidence bounds) for each year of publication online in our dataset excluding all studies authored (or coauthored) by lead investigators of the 3 initial studies. Mean effect size magnitudes and confidence bounds were estimated using Bayesian simulations and a folded normal distribution. Note: Colors in (b) are aesthetic in nature and follow a gradient according to year of publication. Also note that data begin in 2012 since all publications prior to 2012 included initial lead investigators in the author list. Vertical axes are scaled to enable direct comparison with Fig 1. Source data for each figure panel can be found in S1 Data.
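For readers less familiar with the metric running through these figures, the key quantities can be written out explicitly. This is a standard formulation of the log response ratio and the folded normal mean; the notation is ours, not an excerpt from the paper:

```latex
% Log response ratio comparing treatment (elevated CO2) and control means:
\mathrm{lnRR} = \ln\!\left(\frac{\bar{x}_{\mathrm{CO_2}}}{\bar{x}_{\mathrm{ctrl}}}\right),
\qquad \text{effect size magnitude} = \lvert \mathrm{lnRR} \rvert .

% If lnRR is modeled as Normal(mu, sigma^2), its magnitude follows a
% folded normal distribution, whose mean is
\mathbb{E}\,\lvert X \rvert
  = \sigma \sqrt{\frac{2}{\pi}}\, e^{-\mu^{2}/(2\sigma^{2})}
  + \mu \left[ 1 - 2\,\Phi\!\left(-\frac{\mu}{\sigma}\right) \right],
```

where Φ is the standard normal cumulative distribution function. Modeling the folded distribution matters because taking absolute values converts sampling noise into positive magnitudes, so naively averaging |lnRR| is biased upward when true effects sit near zero; this is presumably why the yearly means and confidence bounds here were estimated via Bayesian simulations of a folded normal rather than by simple averaging.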
It is important to note that the early studies published in 2009 to 2010 [8-10], and some subsequent papers from the same authors, have recently been questioned for their scientific validity [31]. Indeed, these early studies have a large influence on the observed decline effect in our analysis. At the request of the editors, we thus explored the potential for investigator effects, as such effects have previously been reported to drive decline effects in ecology and evolution (e.g., fluctuating asymmetry [32]). When all papers authored or coauthored by at least one of the lead investigators of those early studies were removed from the dataset (n = 41 studies, 45%), the decline effect was no longer apparent from 2012 to 2019 (Fig 5). While conclusions regarding the potential role of invalid data await further investigation [31], our results do suggest that investigator or lab group effects have contributed to the decline effect reported here. We suggest that future studies documenting the presence or absence of decline effects—and indeed meta-analyses in general—should carefully consider and evaluate whether investigator effects may be at play in a given field of study.

Together, our results suggest that studies reporting large effects of acidification on fish behavior generally have low sample sizes, yet tend to be published in high-impact journals and are cited more. Consequently, the one-two punch of low sample sizes and the preference for publishing large effects has seemingly led to an incorrect interpretation that ocean acidification will broadly impact fish behavior and thus have wide-ranging ecological consequences—an interpretation that persists in studies published today (S2 Table).

Fig 4. (a, b) Google Scholar citation metrics as of September 10, 2021 for each of the studies included in our meta-analysis, including average citations per year (a) and total citations since 2020 (b). The initial 3 studies spearheading this field are denoted by the gray background, and the red dashed line represents the lowest citation metric among those 3 studies. Studies are ordered chronologically along the x-axis and color coded by year published online. (c) Mean effect size magnitude for each individual study as a function of journal impact factor (at time of online publication). (d) The number of citations per year for each study as a function of journal impact factor (at time of online publication). (e) The number of citations per year for each study as a function of mean effect size magnitude for that study. Note that, for panels (c) and (e), mean effect size magnitude for a given study is not a weighted effect size magnitude, but is simply computed as the mean of individual effect size magnitudes for that study. Data are fit with linear curves and 95% confidence bounds, and points are color coded by study; the size of data points represents the relative mean sample size of the study. Source data for each figure panel can be found in S1 Data.

Another prominent explanation for the decline effect is selective publication bias, as results showing strong effects are often published more readily, and in higher-impact journals, than studies showing weak or null results. Indeed, publication bias has been suggested as perhaps the most parsimonious explanation for the decline effect in ecology and evolution, as studies showing no effect can be difficult to publish [2].
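For readers wishing to screen a meta-analytic dataset for this kind of bias themselves, a common first check is an Egger-style regression for funnel-plot asymmetry, in which effect sizes are regressed against their standard errors. The sketch below is purely illustrative and uses simulated data: it is not an analysis from this paper, and the technique is offered as a generic screen, assuming only numpy and scipy.

```python
# Illustrative Egger-style screen for small-study effects (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies = 60

# Hypothetical per-study standard errors (imprecise studies = high SE).
se = rng.uniform(0.05, 0.6, size=n_studies)

# Under selective publication, imprecise studies tend to be published only
# when they report large effects, inducing a positive association between
# effect size and standard error. Simulate that pattern here.
lnrr = 0.05 + 1.5 * se + rng.normal(0.0, 0.15, size=n_studies)

# Regress effect size on standard error; a slope clearly different from
# zero suggests funnel asymmetry consistent with publication bias.
fit = stats.linregress(se, lnrr)
print(f"slope = {fit.slope:.2f} (p = {fit.pvalue:.2g}), "
      f"intercept (precision-adjusted effect) = {fit.intercept:.2f}")
```

A slope near zero does not prove the absence of bias, but a strong positive slope, as in this simulation, is the signature one would expect in a field dominated by small studies reporting inflated effects.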
Such selective publication can be attributed to authors publishing impressive results in prestigious journals (while leaving less exciting results unpublished) and also to journals—particularly high-impact journals—selectively publishing strong effects. This biased publishing can result in the proliferation of studies reporting strong effects, even when those effects are not real [26], and can fuel citation bias [27]. Indeed, a recent analysis suggested that field studies in global change biology suffer from publication bias, which has fueled the proliferation of underpowered studies reporting overestimated effect sizes [28]. To determine whether studies testing for effects of ocean acidification on fish behavior exhibit signs of publication bias and citation bias, we assessed relationships between effect size magnitude, journal impact factor, and Google Scholar citations (Fig 4). Examining average citations per year and the total number of citations since 2020, 4 papers stood above the rest: the initial 3 studies in this field [8-10] and the sentinel paper proposing GABA-A neurotransmitter interference as the physiological mechanism for the observed behavioral effects [29] (Fig 4a and 4b). While it is difficult to quantify whether authors selectively published only their strongest effects early in this field, we were able to quantify effect size magnitudes as a function of journal impact factor. We found that the most striking effects of ocean acidification on fish behavior have been published in journals with high impact factors (Fig 4c). In addition, these studies have had a stronger influence on the field to date (i.e., higher citation frequency) than lower-impact studies with weaker effect sizes (Fig 4d and 4e). Similar results have been reported in other areas of ecology and evolution, perhaps most notably in studies of terrestrial plant responses to high CO2 [30].

Experimenter/observer bias during data collection is known to seriously skew results in behavioral research [21]. For example, nonblinded observations are common in the life sciences but are known to result in higher reported effect sizes and more significant p-values than blinded observations [22]. Most publications assessing ocean acidification effects on fish behavior, including the initial studies reporting large effect sizes, do not include statements of blinding for behavioral observations. Given that statements of blinding can themselves be misleading [23], there have also been calls for video evidence in animal behavior research [24]. Moreover, the persistence of inflated effects beyond initial studies can be perpetuated by confirmation bias, as follow-up studies attempt to confirm initial inflated effects and capitalize on the receptivity of high-profile journals to new (apparent) phenomena [25]. While our analysis does not empirically demonstrate that experimenter bias contributed to the decline effect, conscious and unconscious experimenter biases may have contributed to the large effect sizes in this field.

Fig 3. Mean effect size magnitude (absolute lnRR) for each study as a function of the mean sample size of that study (i.e., sample size per experimental treatment). Note that mean effect size for a given study is not a weighted effect size magnitude, but is simply computed as the mean of individual effect size magnitudes for that study. The vertical red dashed line denotes a sample size of 30 fish, while the horizontal red dashed line represents a lnRR magnitude of 1.
Source data for each figure panel can be found in S1 Data.

Experimental designs and protocols can introduce unwanted biases during an experiment whether or not the researchers realize it. For example, experiments with small sample sizes are more prone to statistical errors (i.e., Type I and Type II errors), and studies with larger sample sizes should be trusted more than those with smaller sample sizes [19]. Although we did not directly test this in our analysis, studies with small sample sizes are also more susceptible to statistical malpractice, such as p-hacking and the selective exclusion of data that do not conform to a predetermined experimental outcome, which can contribute to inflated effects [20]. In our analysis, we found that almost all of the studies with the largest effect size magnitudes had mean sample sizes (per experimental treatment) below 30 fish. Indeed, 87% of the studies (13 of 15) with a mean effect size magnitude >1.0 had a mean sample size below 30 fish (Fig 3). Likewise, the number of studies reporting an effect size magnitude >0.5 dropped sharply once mean sample size exceeded 30 fish (Fig 3). Sample size is of course not the only attribute that describes the quality of a study, but the patterns detected here certainly suggest that studies with n < 30 fish per treatment may yield spurious effects and should be weighted accordingly.

It is clear that the ocean acidification field, and indeed science in general, is prone to many biases, including methodological and publication biases [6]. The key point is that if science had been operating properly from the onset, and the early effects of ocean acidification on fish behavior were true, the relationships presented in Figs 1 and 2 would be flat lines showing consistent effect sizes over time. It is also evident that the decline effect discovered here is not explainable by the 3 likely biological culprits outlined above. The data presented here thus provide a textbook example of a new and emerging "hot topic" field likely being prone to biases. Below, we underscore and assess the roles of 3 potential biases: (1) methodological biases; (2) selective publication bias; and (3) citation bias. We then explore the potential influence of authors/investigators in driving the decline effect.

Being on our best behavior

Our results suggest that the large effects of ocean acidification on fish behavior were at least in part due to methodological factors in early studies (e.g., low sample sizes). Furthermore, the proliferation and persistence of this idea have likely been aided by the selective publication of large effect sizes by authors and journals, particularly at the onset of the field, and by the continued high citation rates of those papers. It is important to note, however, that low sample size and selective publication cannot fully explain the strong decline effect detected here, and other biases and processes may be at play [7,31]. Nonetheless, we call on journals, journal editors, peer reviewers, and researchers to proactively address the issues of low sample size and selective publication, not only in the ocean acidification field but also more broadly across scientific disciplines. To this end, we strongly argue that future ocean acidification studies on fish behavior should employ a sample size greater than 30 fish per treatment to be considered reliable; the power-analysis sketch below illustrates why a threshold of this order is plausible.
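As a rough illustration of where such a threshold sits, consider a conventional prestudy power analysis for a two-sample comparison. This sketch is ours, not the authors' (their 30-fish figure is an empirical observation from Fig 3), and it assumes the Python statsmodels package:

```python
# Sample sizes needed to detect standardized effects (Cohen's d) in a
# two-sample t-test at alpha = 0.05 with 80% power.
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):  # conventional small, medium, large effects
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"Cohen's d = {d}: ~{n:.0f} fish per treatment")

# Approximate output:
#   Cohen's d = 0.2: ~394 fish per treatment
#   Cohen's d = 0.5: ~64 fish per treatment
#   Cohen's d = 0.8: ~26 fish per treatment
```

In other words, roughly 30 fish per treatment provides adequate power only for large effects; genuinely small effects require far larger samples. This is consistent with the pattern above, whereby small early studies could only have detected (or inflated) large effects.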
It is the combined responsibility of researchers, journal editors, and peer reviewers to ensure that submitted manuscripts abide by this guideline. To achieve this, authors should report exact sample sizes clearly in the text of their manuscripts; in our analysis, however, 34% of studies did not do this adequately (see raw data in S2 Data). In addition, for other fields, we suggest that studies with higher sample sizes be published alongside, if not very soon after, an original novel finding to ensure that the finding is robust. Ideally, researchers would conduct pilot studies with varying sample sizes to determine an adequate sample size threshold and conduct appropriate prestudy power analyses; however, time and financial constraints can make this difficult. While adequate sample sizes will vary across topics and fields, ensuring that studies with large sample sizes are published early, alongside those with smaller sample sizes, can reduce the time it takes to truly understand a phenomenon.

Journals, researchers, editors, and reviewers can take additional steps to limit biases in published research. First and foremost, we suggest that journals adopt the practice of registered reports to ensure that studies not detecting an effect are published in a timely manner. Under this model, journals allow authors to submit proposed methodologies for formal peer review before a study is conducted. If the methodology is deemed sound (or revised to be so) and "accepted" by reviewers, the journal commits to publishing the results regardless of their outcome, so long as the accepted methods are followed. Although registered reports may not be sufficient to avoid some issues, such as poor data, they may reduce the risk of inflated results driving decline effects—and prolonged incorrect understanding—for other phenomena in the future. While not a silver bullet, this practice could help reduce selective publication bias and the risk of early, flawed studies being disproportionately influential in a given field [33].

Researchers should also seek, develop, and adhere to best practice guidelines for experimental setups [34] to minimize the potential for experimental artifacts to influence results. Properly blinded observations [22] and the use of technologies such as automated tracking [35] and biosensors [36] can also reduce observer bias and increase trust in reported findings [37]. When automated methods are not possible, video recordings of experiments from start to finish can greatly increase transparency [24]. Editors and the selected peer reviewers should closely consider and evaluate the relevance and rigor of methodological approaches, which can help increase accuracy and repeatability [38]. When selecting peer reviewers, editors should also be aware that researchers who published initial strong effects may be biased in their reviews (i.e., selectively favoring manuscripts that support their earlier publications) and should ensure a diverse body of reviewers for any given manuscript when possible. While we do not empirically demonstrate this bias in our analyses, it is important to recognize and mitigate its potential to prolong inaccurate scientific findings. Finally, being critical and skeptical of early findings with large effects can help avoid many of the real-world problems associated with inflated effects.
Interestingly, a recent study showed that experienced scientists are highly accurate at predicting which studies will stand up to independent replication and which will not [39], lending support to the idea that if something seems too good to be true, it probably is. Nonetheless, the citation analysis provided herein suggests that researchers have been slow to take up studies reporting negative and null results in this field, as the early studies with large effect sizes remain the most highly cited among all articles in our dataset. The earlier that healthy skepticism is applied, the less impact inflated results will have on the scientific process and the public perception of scientists. Ultimately, independent replication should be established before new results are trusted and promoted broadly.

[1] https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001511 (Creative Commons Attribution, CC BY 4.0)