Systematic assessment of the replicability and generalizability of preclinical findings: Impact of protocol harmonization across laboratory sites

María Arroyo-Araujo (Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands); Bernhard Voelkl (Animal Welfare Division, Vetsuisse Faculty, University of Bern, Bern)

Date: 2022-11

Abstract

The influence of protocol standardization between laboratories on the replicability of preclinical results has not been addressed in a systematic way. While standardization is considered good research practice as a means to control for undesired external noise (i.e., highly variable results), some reports suggest that standardized protocols may lead to idiosyncratic results, thus undermining replicability. Through the EQIPD consortium, a multi-lab collaboration between academic and industry partners, we aimed to elucidate parameters that impact the replicability of preclinical animal studies. To this end, 3 experimental protocols were implemented across 7 laboratories. The replicability of results was determined using the distance travelled in an open field after administration of pharmacological compounds known to modulate locomotor activity (MK-801, diazepam, and clozapine) in C57BL/6 mice as a worked example. The goal was to determine whether harmonization of study protocols across laboratories improves the replicability of the results and whether replicability can be further improved by systematic variation (heterogenization) of 2 environmental factors (time of testing and light intensity during testing) within laboratories. Protocols were tested in 3 consecutive stages and differed in the extent of harmonization across laboratories and standardization within laboratories: stage 1, minimally aligned across sites (local protocol); stage 2, fully aligned across sites (harmonized protocol) with and without systematic variation (standardized and heterogenized cohorts); and stage 3, fully aligned across sites (standardized protocol) with a different compound. All protocols resulted in consistent treatment effects across laboratories, which were also replicated within laboratories across the different stages. Harmonization of protocols across laboratories reduced between-lab variability substantially compared to each lab using its own local protocol. In contrast, the environmental factors chosen to introduce systematic variation within laboratories did not affect the behavioral outcome. Therefore, heterogenization did not reduce between-lab variability beyond what harmonization of the standardized protocol achieved. Altogether, these findings demonstrate that subtle variations between lab-specific study protocols may introduce variation across independent replicate studies even after protocol harmonization and that systematic heterogenization of environmental factors may not be sufficient to account for such between-lab variation. Differences in the replicability of results within and between laboratories highlight the ubiquity of study-specific between-lab variation and underscore the importance of transparent and fine-grained reporting of methodologies and research protocols, as well as of independent study replication.

Citation: Arroyo-Araujo M, Voelkl B, Laloux C, Novak J, Koopmans B, Waldron A-M, et al.
(2022) Systematic assessment of the replicability and generalizability of preclinical findings: Impact of protocol harmonization across laboratory sites. PLoS Biol 20(11): e3001886. https://doi.org/10.1371/journal.pbio.3001886

Academic Editor: Cilene Lino de Oliveira, Universidade Federal de Santa Catarina, BRAZIL

Received: June 27, 2022; Accepted: October 24, 2022; Published: November 23, 2022

Copyright: © 2022 Arroyo-Araujo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All raw data files and the code for the statistical analysis are available from the OSF database (DOI: 10.17605/OSF.IO/8F6YR). All other summaries and raw data are within the paper and its Supplementary files.

Funding: This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 777364. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and the European Federation of Pharmaceutical Industries and Associations. All authors in this publication were granted this funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In recent years, the scientific community has raised concerns about the replicability of results, particularly in the preclinical biomedical sciences. Results replicability is defined here as the ability to duplicate the results of a previous scientific claim with new data [1,2]. Various causes of poor replicability have been proposed, including the diverse methodologies used in the field and the lack of rigorous research practices (e.g., underpowered studies, risks of biases, inadequate statistics) [3–7]. Although these causes can certainly explain part of the problem, they permeate different science subfields differently [8] and cannot account for the poor replicability of results on their own. To our knowledge, no systematic studies have investigated how protocol standardization within laboratories and protocol harmonization across laboratories affect between-laboratory variation, and hence the replicability and generalizability of results. The current and most common research practice of conducting single-laboratory studies under standardized conditions has recently been proposed as a source of the high variability of results between laboratories [9,10]. Whenever rigorous standardization of environmental conditions within a study leads to homogeneous study populations, the study results may become idiosyncratic, as the study population is only representative of the narrow set of conditions in which it was tested. This increases the risk of replication failure even under conditions that differ only slightly from the standardized ones; moreover, such single-site study designs do not allow prediction of how the expression of the phenotype changes in response to different environmental influences. The change in the expression of the phenotype is caused by biological variation [11], which describes how genetic variation interacts with the environmental factors to which experimental animals are exposed throughout development (gene–environment interactions), thereby shaping their phenotype [12].
Another approach to dealing with the variability of results across laboratories is to harmonize the same standardized protocol across studies [13]. If harmonization includes those environmental and experimental factors that may influence phenotype expression, it should yield replicable results. However, current evidence is ambiguous. Whereas in one study a rigorously standardized protocol that was harmonized across 3 laboratories resulted in many nonreplicable findings [14], another study that also followed protocol standardization and harmonization across 3 sites found similar phenotypic and pharmacological effects, although the proportion of variation explained by lab was not formally assessed [15]. This suggests that this experimental approach may fail to address some unknown source of variability between sites. Certainly, there are inherent differences between laboratory environments that are not addressed in multi-laboratory protocols, either because harmonizing them is not feasible or simply because these differences are not known (e.g., different ways of handling the animals, diversity in equipment). Some of these differences likely interact with phenotype expression; this interaction may be accentuated when other sources of variability are minimized (i.e., standardized). Thus, although the same standardized protocol is implemented at different sites, it may still produce different results [16]. Still, there are no systematic accounts evaluating the impact of protocol harmonization across sites on between-lab variability. Furthermore, it has recently been suggested that if between-lab variation can be incorporated within a single lab, the replicability of results between studies would increase [17–19]. Such an approach has been implemented previously [17,19–21]; yet, it has not been compared to a nonharmonized study across laboratories to assess its effect on between-lab variation. To shed light on the effects of protocol harmonization across laboratories, we first studied whether harmonization of a standardized protocol reduces between-lab variation compared to a nonharmonized local protocol. We then tested whether systematic within-lab heterogenization can reduce between-lab variation further compared to the standardized protocol. The experiments performed in this paper are defined as knowledge-claiming research according to Bespalov and colleagues [22].

Discussion

Overall, our study shows that harmonization of experimental protocols across sites reduced the outcome variability across laboratories compared to site-specific versions of the protocol (i.e., the local protocol). Moreover, we found that sex did not affect the results and that illumination of the test arena and time of testing relative to the light–dark cycle were not suitable factors to systematically introduce variation in the results of an open field test in C57BL/6 mice. Regarding the time of testing, we could speculate that the drug treatment had such a strong effect on the outcome variable that there was no room for the time variable to affect it further. Another possible explanation is that this environmental factor does not have a strong influence on the particular outcome tested with the current experimental setup (e.g., the drug and dose used).
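As a rough illustration of how the share of outcome variance attributable to the laboratory factor can be estimated, the minimal sketch below fits a linear mixed-effects model with drug treatment as a fixed effect and laboratory as a random intercept and derives the intraclass correlation. This is an illustrative Python/statsmodels example, not the authors’ analysis code (which is available on OSF, DOI: 10.17605/OSF.IO/8F6YR); the file name and the column names "distance", "treatment", and "lab" are assumptions.

```python
# Illustrative sketch (not the study's analysis code): estimate how much of the
# variance in distance travelled is attributable to the laboratory factor.
# Assumed columns: "distance" (outcome), "treatment" (drug group), "lab" (site).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("open_field_results.csv")  # hypothetical file, one row per animal

# Fixed effect of drug treatment; random intercept for laboratory.
model = smf.mixedlm("distance ~ treatment", data=df, groups=df["lab"])
fit = model.fit()

between_lab_var = fit.cov_re.iloc[0, 0]  # variance of the lab random intercept
within_lab_var = fit.scale               # residual (within-lab) variance

# Intraclass correlation: proportion of total variance attributable to laboratory.
icc = between_lab_var / (between_lab_var + within_lab_var)
print(fit.summary())
print(f"Proportion of variance attributable to lab (ICC): {icc:.2f}")
```

In such a model, an ICC close to 0 indicates that laboratories behave essentially interchangeably, whereas a large ICC indicates that most of the residual variation reflects differences between laboratories rather than between animals within a laboratory.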
The present study showed that between-lab variation is rather large when lab-specific protocols are followed (i.e., the local protocol), and, although it was reduced by protocol harmonization, it remained considerable. This corroborates earlier findings [14] that site-specific variation in conditions produces between-lab variability that cannot be neutralized by protocol harmonization across sites. This in turn affects the replicability of study outcomes. Although the standardized protocol successfully produced replicable results across laboratories, the sensitivity to detect drug treatment effects can still be improved, as not all sites found a significant drug treatment effect for the lower dose in stage 3 (Fig 4, right panel). The choice of the 2 doses tested in stage 3 was based on a literature review performed by one of the partners, in which the higher dose had a robust effect while the lower dose showed conflicting results. It seems possible that the discrepancy between the sites is due to inherent differences between laboratories that were heightened by the stringent local standardization. It has been suggested that a way around this would be to introduce systematic variation within sites, in the hope that it accounts for the variance between sites while the same drug treatments are tested [17,23]. To test this hypothesis, we introduced systematic variation into the standardized protocol. Contrary to our expectation, this heterogenized cohort did not increase the overall variability, nor did it decrease the between-laboratory variability in outcomes compared to the standardized protocol alone. The overall outcome did not change (i.e., similar drug treatment effects were obtained in the heterogenized and standardized cohorts). Therefore, we could not confirm that diversifying the environmental conditions further reduces the variability across laboratories. The current selection of “heterogenizing” factors was rather limited by the feasibility of diversifying them across all labs. Further factors, for example, genotypic variation of the study sample, should be considered for future studies, as they may be more powerful at introducing within-study variability than environmental factors, as seen in other disciplines [24]. A recent initiative that could prove helpful for identifying heterogenization factors is the Platform for the Exchange of Experimental Research Standards (PEERS), developed to rate the factors and variables most likely to influence experimental outcomes [25]. Moreover, the standardized protocol proved to be robust to the introduction of animals of both sexes in stage 3. Sex did not increase the variability of results across sites compared to the standardized protocol (Table C in S3 Supplementary Stage) and did not account for the variance in the data. In this case, sex may be included without a need to increase the sample size (an illustrative allocation of this kind is sketched below). However, sex should always be included as a biological variable in biomedical research for reasons of inclusion, regardless of its effect on the results [23].
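To make concrete how within-lab heterogenization (or, analogously, a blocking factor such as sex) can be introduced without increasing the overall sample size, the sketch below spreads a fixed number of animals per treatment group evenly over blocks defined by time of testing and light intensity. This is a hypothetical allocation for illustration only, not the protocol used in this study; the treatment labels, factor levels, and group size are assumptions.

```python
# Hypothetical allocation sketch: spread each treatment group evenly across
# environmental blocks (time of testing x light intensity) within one laboratory,
# keeping the total number of animals identical to a fully standardized cohort.
import itertools
import random

random.seed(42)  # reproducible allocation

treatments = ["vehicle", "drug_low", "drug_high"]
times = ["morning", "afternoon"]   # time of testing relative to the light-dark cycle
lights = ["low_lux", "high_lux"]   # illumination of the test arena
n_per_treatment = 12               # same group size as in a standardized design

blocks = list(itertools.product(times, lights))  # 4 environmental blocks

allocation = []
for treatment in treatments:
    # 12 animals / 4 blocks = 3 animals of this treatment per block.
    per_block = n_per_treatment // len(blocks)
    cells = [block for block in blocks for _ in range(per_block)]
    random.shuffle(cells)  # randomize which animal ends up in which block
    for animal_idx, (time_of_day, light) in enumerate(cells, start=1):
        allocation.append({
            "animal": f"{treatment}_{animal_idx}",
            "treatment": treatment,
            "time_of_testing": time_of_day,
            "light_intensity": light,
        })

for row in allocation:
    print(row)
```

Because each treatment group is only distributed over the blocks rather than fully replicated within each of them, the total number of animals stays the same; the block factors can then be included in the statistical model alongside treatment.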
While the harmonization of a standardized protocol across laboratories decreased the overall variability of results compared to when each laboratory followed its own local protocol, the question arises whether these results, although replicable across the participating laboratories, could be further generalized to other laboratories outside the present study. Assuming that the participating laboratories are a representative random sample of laboratories doing phenotyping studies, we could say that our results can be extrapolated to other laboratories; however, caution is warranted, as the participating labs were all highly interested in data quality and results replicability, which might have biased the current sample. To be able to extrapolate an experimental result to other conditions or populations (i.e., to have a broad inference space), the study population has to be representative of the desired target population. Our finding that systematically introducing additional factors (illumination and time of testing in stage 2 and sex in stage 3) did not affect the overall variation shows that diversifying a study population and its environment does not necessarily lead to more “noisy” experimental outcomes but instead broadens the inference space and increases the external validity of the results, and thus their generalizability [26]. This supports diversifying environmental factors that (i) are not tightly linked with the outcome measure or (ii) are not directly involved in the research question as a means to increase the robustness of results. On the other hand, it is necessary to continue exploring the effects of protocol harmonization on results variability, since our results suggest that, although harmonizing protocols across laboratories reduced between-lab variation, the laboratory factor still explains most of the variance, meaning that standardization alone is not enough.

Conclusions

Altogether, we can say that both harmonized (i.e., standardized and heterogenized) open field protocols consistently and significantly reduced the between-lab variability of the behavioral outcome. In addition, the protocols resulted in consistent treatment effects across laboratories that were also replicable within laboratories across the different stages. The replicability of results within and between laboratories in the present study highlights the impact of study-specific variation on between-lab variability and the importance of transparent and fine-grained reporting of methodologies and research protocols. It also shows that it is possible to diversify the study sample by incorporating blocking factors like sex or introducing systematic heterogenization of conditions without the need to increase the overall sample size.

Acknowledgments

We would like to thank Eva-Lotta von Rüden and Sarah Glisic for their contribution to the Muenchen site. This publication reflects only the authors’ view and the Innovative Medicines Initiative 2 Joint Undertaking is not responsible for any use that may be made of the information it contains.