(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright ------------ Genetic architecture of variation in Arabidopsis thaliana rosettes ['Odín Morón-García', 'The National Plant Phenomics Centre', 'Institute Of Biological', 'Rural', 'Environmental Sciences', 'Ibers', 'Aberystwyth University', 'Aberystwyth', 'United Kingdom', 'Gina A. Garzón-Martínez'] Date: 2022-02 Abstract Rosette morphology across Arabidopsis accessions exhibits considerable variation. Here we report a high-throughput phenotyping approach based on automatic image analysis to quantify rosette shape and dissect the underlying genetic architecture. Shape measurements of the rosettes in a core set of Recombinant Inbred Lines from an advanced mapping population (Multiparent Advanced Generation Inter-Cross or MAGIC) derived from inter-crossing 19 natural accessions. Image acquisition and analysis was scaled to extract geometric descriptors from time stamped images of growing rosettes. Shape analyses revealed heritable morphological variation at early juvenile stages and QTL mapping resulted in over 116 chromosomal regions associated with trait variation within the population. Many QTL linked to variation in shape were located near genes related to hormonal signalling and signal transduction pathways while others are involved in shade avoidance and transition to flowering. Our results suggest rosette shape arises from modular integration of sub-organ morphologies and can be considered a functional trait subjected to selective pressures of subsequent morphological traits. On an applied aspect, QTLs found will be candidates for further research on plant architecture. Citation: Morón-García O, Garzón-Martínez GA, Martínez-Martín MJP, Brook J, Corke FMK, Doonan JH, et al. (2022) Genetic architecture of variation in Arabidopsis thaliana rosettes. PLoS ONE 17(2): e0263985. https://doi.org/10.1371/journal.pone.0263985 Editor: Karthikeyan Adhimoolam, Jeju National University, REPUBLIC OF KOREA Received: July 23, 2021; Accepted: February 1, 2022; Published: February 16, 2022 Copyright: © 2022 Morón-García et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All relevant data are within the paper and its Supporting Information files. Funding: JHD and AC acknowledge support from Biotechnology and Biological Sciences Research Council for grants BB/J004464/1 and BBS/E/W0012844A (https://www.ukri.org/councils/bbsrc/). JHD acknowledges Horizon2020 support via project EPPN2020 Grant agreement ID: 731013. OM-G and GG-M acknowledge receipt of an AberDoc scholarship from Aberystwyth University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. Introduction Understanding the role of environmental conditions on plant development is of increasing importance to counterbalance climate change effects in agricultural species [1]. Natural variation found in nature can reveal features that are subject to selective pressure, including that exerted by the local environment [2–4]. For example, shoot architecture integrates the interaction between genetic determinants, developmental history, the environment and adaptation to particular lifestyles [5, 6]. The rosette forms during the vegetative phase of many species including several major crops in the Brassicaceae family. The rosette is an attractive model to understand the genetic architecture of variation in plant form thanks to the quantity and quality of genetic and natural resources available in the model species, Arabidopsis [2, 7, 8]. Rosettes display limited internode extension and generally are comprised of a spiral of leaves, overlapping to a greater or lesser extent [9]. This leaf assemblage has recurrently appeared along angiosperm phylogeny [10], yet its adaptive significance remains unclear. Rosettes probably occupy the space for photosynthesis, excluding nearby plants by establishing ground cover [11] while remaining cryptic to larger herbivores [12]. Leaf distribution and size may respond to environmental factors such as light interception [13], herbivore grazing [14], abiotic stress [4, 15]. We rationalise that, over evolutionary time, small changes in leaf size and shape, internode extension, and other developmental processes could have been selected and therefore, shape variation could give insight into the genetic control of the rosette habit. Shape is a ubiquitous concept in biological research with context dependant morphometrical methodologies [16, 17]. Outline shape descriptors [18–21] define specific and quantitative [22] aspects of form, e.g. roundness [23]. They can be combined with computer vision [24, 25] for high-throughput phenotyping [26] of plants organs, assemblages, full plants and field plots. These are global descriptors [27, 28] with some degree of overlapping [29–31] and dependency [32]. For example, a cogwheel, a starfish-like shape or Arabidopsis rosettes are visually different but with similar roundness scores and different circularity ones, due to their different definition [33, 34]. High dimensional vectors of descriptors need multivariate statistics to discriminate shapes numerically. Particularly, Principal Component Analysis (PCA) [35, 36] build latent variables [37, 38] (or latent shapes [39]) generating a “morphospace” [40–42], with specific mathematical properties [43, 44] e.g. relative warps and synthetic shape axis [45–47]. PCA-based morphospace cleaves univariate morphological features [47] that can be analysed as individual morphological phenotypes [43] in quantitative trait loci (QTL) mapping experiments [48–50]. Also, PCA is used to separate size from shape [51, 52], study allometric growth [41, 53] or when sample size is much smaller than the number of variables measured. To interrogate shape and size variation over populations large enough to reveal associations with the underlying genetics requires measurement techniques that are scalable. Imaging techniques are readily scalable and have been widely used to measure growth rate [3, 54, 55], leaf number and size [56], leaf hyponasty [57] and hypocotyl angle [58], and can be used to estimate morphological parameters. We have previously characterized rosette morphology in 19 Arabidopsis ecotypes using image based approaches during growth and development [22] and other studies have used similar descriptors for screening large Arabidopsis populations and mutants, tracking morphological changes over time and allowing a more precise dissection of developmental timing of plant growth and development [59–63]. Image analysis can quantify size and shape variation due to defined genetic lesions in rosette plants [64] and used to identify QTL for variation in rosette area, revealing a number of candidate genes for growth and size traits [3, 4, 62, 63, 65–67]. More recently, a multi-scale approach was used with the purpose of linking genes to plant shape at several scales, from the whole plant to cells and tissues attending to quantitative measurements extracted from digital images and models [50]. It can be argued that research on plant morphology and gene mapping would benefit advanced automatic phenotyping methods scalable to large populations [68, 69]. For that reason, it is important to find and describe measurements that ensure an accurate capture of morphological variability with biological meaning. QTL mapping experiments leverages the phenotypic variation associated with the standing genetic variation. Biparental crosses estimate linkage between markers [70, 71] and between markers and phenotypes e.g. Composite Interval Mapping and Haley-Knot regression [72, 73]. Mapping resolution depends on recombination rate, marker distribution [74, 75] and trait complexity [76]. In contrast, natural populations can harbour higher genetic and phenotypic variation [77], have shorter haplotypic regions due to higher crossover number [78] and may present some population structure that confound analysis. Genome-Wide Association Mapping (GWAS) exploits linkage disequilibrium [79] to associate statistically traits to genotyped markers [80] by ANOVA-like tests [81, 82]. Linear mixed models correct for experimental factors [83–85] like population structure, replication levels, treatments, etc. Multiple testing, for many markers, need False Discovery Rate correcting procedures like Bonferroni or Benjamini-Hochberg. Advanced mapping populations, like Multiparent Advanced Generation Inter-Cross (MAGIC) [86], trade the advantages and disadvantages of classical mapping populations in terms of resolution and efficiency [87]. These are crosses of up to 20 genetically and phenotypically diverse parental types [88] and several generations of selfing. The strategy reduces population structure and generates small haplotypic mosaics, providing finer resolution mapping than bi-parental with less false positives than natural populations. MAGIC allows haplotype reconstruction for markers alleles as founder-of-origin [89–91] tracking genetic variation back to parental sequence level [92]. Tailored bayesian models [93, 94] improves parameter estimation and permutation-based procedures reduce False Discovery Rates [95]. The use of highly diverse natural accessions increases the opportunity to find trait-associated markers that may not be available in laboratory experimental populations. In particular, the wild-type ecotypes phenotyped in [22] are the parentals of the MAGIC [96]. This population consists of a large genetically-unstructured population of Recombinant Inbred lines (RILs) generated by inter-crossing 19 Arabidopsis parental lines and several generations of single seed descent [96]. The Arabidopsis MAGIC population displays a high degree of phenotypic variation in terms of rosette shape and size, making it suitable to dissect the genetics of complex phenotypic traits. Here, we report machine-assisted acquisition of time-stamped images from MAGIC RILs during their rosette development and we extract a range of morphology descriptors for rosette size and shape variation. We continue the work of [22] and extend their approach to quantify and explore the underlying genetic basis of size and shape variation by a QTL mapping approach. Combined phenomics and genomics analyses identify 116 loci linked to the shape in the early developmental stages of Arabidopsis rosette growth. Material and methods Plant material Phenotyping was performed on a core set of 485 RILs (3 replicates of each) from the MAGIC population [96, 97]. Seeds were obtained from the Nottingham Arabidopsis Stock Centre (NASC). Plants were previously genotyped by [96], using 1260 single nucleotide polymorphisms (SNPs) at the Illumina GoldenGate assay. Growth conditions Experiments were performed in greenhouse chambers at the National Plant Phenomics Centre (NPPC) at IBERS, Aberystwyth University, UK. Seedlings were vernalized for 28 days at 5°C and 8h light/16h darkness cycle. This ensured that all genotypes germinated and flowered within the time course of the experiment. Single seedlings were pricked out into 6 cm diameter pots (half filled with vermiculite and the upper half with 30% grit sand/70% Levington F1 peat based compost) and were transferred after 7 days to PlantScreen Phenotyping System (Photon Systems Instrument, PSI, Brno, Czech Republic) and grown under controlled conditions (18°C, 14/10h photoperiod, white light ~ 400 μmol m-2 s-1). Plants were imaged, weighed and watered to a predefined target weight of 65% of field capacity, daily, until most plants had flowered. Image acquisition, processing and morphology descriptors Top view images were processed using an internal automatic workflow provided by the manufacturer. It performed the tasks of image processing, which calculated size and shape descriptors and stored them on the platform database. A detailed description of all descriptors can be found in [61] and Table 1. Rosette size was described by Projected Rosette Area (PRA) and Perimeter Length (PL). Rosette ground coverage comprised Compactness and Rotational Mass Symmetry (RMS) descriptors. Rosette deviation from a circle was measured with Slenderness of Leaves (SOL), Roundness (RND), Convex Hull Roundness (RCH), Isotropy (ISO) and Eccentricity (ECC). PPT PowerPoint slide PNG larger image TIFF original image Download: Table 1. Morphology descriptors used in this study. Modified from PlantScreen User Manual, v1.5, 2017. https://doi.org/10.1371/journal.pone.0263985.t001 Phenotypic data analysis All summaries and plots were performed using the R statistical computing environment [98]. Replicate values by Days After Sowing (DAS) were averaged and all calculations, including QTL mapping are performed using the mean value as representative of the RIL at given DAS. Principal components analysis (PCA, function prcomp from the package stats [98]) was calculated to generate an uncorrelated shape space, i.e. to eliminate remaining size-effects and correlations among descriptors. PCA was built with the correlation matrix and all 9 Principal Components were retained so that RILs distances remain constant, given overall mean and variance scaling. Therefore, this PCA does not reduce dimensionality but constructs an uncorrelated morphospace with common aspects of rosettes grouped in each principal component. PCA was performed on RIL- averaged values rather than individual values. Pairwise Pearson’s correlation across descriptors’ averages were calculated at each DAS (function cor and cor.test from the package stats [98]). Broad-sense heritability (H2) was calculated independently at each DAS for all shape descriptors to estimate the proportion of the phenotypic variance explained by genetic variation. Variance decomposition random-effect models were fitted (function lme from the package nlme [99]) with phenotype (P ij ) as dependent variable, RIL (R i with I as line number) as a random factor and random residuals (ε ij ) for each replicate plant j as P ij = μ+R i +ϵ ij [100]. Variance component estimates were extracted from the model with the function varcomp from the ape package [101]. Broad-sense heritability (H2) for each descriptor was estimated as , where V g is the variance among the RILs and V e is the environmental variance. The general workflow for the data analysis is summarized in Fig 1. PPT PowerPoint slide PNG larger image TIFF original image Download: Fig 1. Workflow used in this study for phenotyping and QTL mapping. 485 RILs (3 replicates) were grown in 5x4 trays and imaged daily. Images were automatically segmented, rosettes were extracted, and analysed by device built-in pipelines. Shape descriptors and other metadata are recorded into a database. Shape descriptors and their PCA-derived morphospace were used for QTL mapping with happy.hbrem followed by gene search at ARAPORT11. https://doi.org/10.1371/journal.pone.0263985.g001 QTL mapping and candidate genes Each shape descriptor (9 variables) and principal component (9 PCs) were split by DAS and used as input for QTL mapping with the happy.hbrem R package. This software was specifically designed for multi-parental population analysis [89] and previously used for Arabidopsis MAGIC population [96]. RILs genomes are reconstructed as parental haplotype mosaic (happy) working out the Identity By Descent (IBD) using a Hidden Markov Model [89]. For each phenotype, a genomic scan fits a Hierarchical Bayes random effects (hbrem) model [92] with 19 random factors, corresponding to founder alleles, weighed by the IBD probabilities. A permutation test randomizing phenotypes 500 times, established a genome-wide threshold for statistically significant QTL and corrects for multiple testing according to [96, 102]. Finally, a QTL location is defined as the peak marker with largest logP-value (from hbrem procedure) within an interval where other SNPs pass the genome-wide P-value (from the resampling procedure). A boxplot with allelic phenotypic effect for each founder ecotype was calculated using the hbrem procedure. The amount of phenotypic variation explained by each marker was estimated using the package MagicHelpR (https://github.com/tavareshugo/MagicHelpR). The closest gene to the QTL marker was identified and gene annotations were retrieved from ARAPORT11 [103] using custom scripts. The total number of traits was 180 comprising 9 shape descriptors and 9 PCs multiplied by 10 days. Some QTLs were found at several days and traits. To make the analysis tractable, a procedure to select non redundant QTLs was developed as follows. An R script goes through all QTLs, in order of date, from 35 to 44 DAS, and then by shape descriptor, following the order in Table 1 and PC1, PC2 … PC9. At each day and descriptor, the R script saved any QTL that was not reported before and removes redundant ones. Therefore, the significant QTL list was sorted, first by day and then by descriptor. Then, the relevance of each shape descriptor and the length of the phenotyping experiment can be evaluated by the number of QTLs found each day and per variable. Discussion Plant development occurs throughout the individual lifetime [108–110] and the production of new organs, i.e. stems, roots, leafs or flowers, is constantly influenced by the environment [111–113]. Ecotypes adapted to local environments often differ in many traits, particularly in the arrangement, size and shape of such organs. Research on phenotypic variation often refers to “morphology”, “form” and “shape” by implicit and informal definitions. As a consequence, the concept of shape can be reduced to adjectives like long-short, round-elongated, sparse-dense or into categories without explicit parameterization. Computer vision based shape descriptors are precisely defined, their measurement is objective, repeatable and interpretable as compared to visual human experience [22, 114, 115]. Extending the strategy previously used to describe shape variation of the 19 parental accessions in [22], we have quantified the rosette shape of 485 RILs. We used these measurements to associate genetic and phenotypic variation. Arabidopsis MAGIC population captures a reasonable range of natural variation and the inter-cross results in highly recombinant lines with higher mapping resolution to dissect quantitative traits genetic architecture [116] than biparental populations [117]. The lack of structure in MAGIC populations reduces false positives rates, which is an important drawback in association mapping [96]. The MAGIC population has been previously characterised for flowering time, height and fitness [96, 118] with a specific advanced statistical method for QTL mapping [89, 91]. This method takes advantage of multiple parents to assign genetic variants as multi-allelic markers rather than bi-allelic. In summary, MAGIC populations represent the best of both worlds, in the sense of high variation, low structure, high resolution and precision in the statistical methods. As a potential disadvantage the set of 485 RILs, with three replicates per RIL, became 1455 plants to keep under strict environmental control and daily phenotyping. To overcome this difficulty, a mechanised phenomics approach was necessary. Our results support the idea that variation in rosette growth and shape involves multiple genes in a hierarchical control. This genetic structure should be able to exploit variable environmental conditions [119, 120], regulate heterochrony [109, 121], and enable a phenotypic response to variation in vernalization and photoperiod, e.g., flowering time or branching pattern. Shape descriptors were correlated, suggesting ecologically related trait syndromes [122]. Therefore, they can be grouped into functional traits. PRA, PL, and SOL form a cluster capturing information on size and length. ECC and RMS form another cluster describing rosette elongation. RND and ISO describe the pattern of leaf arrangement as in a circle or a star-like shape. A singleton comprising only RCH captures accurately the closeness of rosettes to a perfect cycle, regardless of the gaps between leaves. Compactness would be another singleton describing rosette coverage. Significantly associated markers were found using a dynamic QTL approach with a full combination of shape traits and DAS. This method yields up to 180 variables to test, thus increase the analytic effort with respect to single time point and single variable approaches, yet also increase the number of QTLs that would otherwise not be found using these other common strategies, e.g. phenotyping rosette size at the six leaves stage. The markers found at chromosome 2 form a region with several potential genes related to rosette morphology. A first example is the GPA-1 gene. In yeast, the GPA gene is related to signal transduction in pheromone response pathway [123]. In Arabidopsis, amongst other functions, it is related to blue light induction of phenylalanine production [124], abscisic acid responses [125] and modulation of hypocotyl elongation and leaf formation (recessive mutants show round leaves and elongated petioles associated to sugar signalling and response-associated cell death [126]. GWAS association with environmental variables in the 1001 genomes population found SNPs markers related with the γ-subunit of a heterotrimeric G-protein, AGG3, related with cold tolerance [8]. The AGG3 protein is related to seed and organ growth [127] and shape [128], connecting this activity with the single G-protein alpha subunit found in Arabidopsis, GPA-1 and their orthologs in rice. GPA-1 also regulates germination, seedling development, reaction to environmental changes and stomata opening by means of ABA signalling. Mutants for GPA-1 are sensitive to ABA signalling [129]. ABA, together with ethylene and gibberellins, affect phenotypic plasticity related variation in leaf architecture [130]. This candidate alone would support a highly significant QTL in this region but there were other QTLs close to this marker that may be novel. For example, the gene AT2G26240 encoding for the transmembrane protein 14C is suspected to be related with fatty acid transport, FAX7 fatty acid export 7 [131]. The gene ERECTA (AT2G26330) is involved in shade avoidance responses (SAS) and the general morphology [132]. The gene PHYB is a well-known photoreceptor involved in the shade avoidance syndrome (reviewed in [133]), playing an important role in canopy development and morphology [134]. The gene CPR-5 (AT5G64930) is not directly related with morphology but with plant defences, yet, changes in rosette shape have been reported in cpr5 mutants in response to light and altered salicylic acid levels [135, 136]. Another gene close to a marker QTL and involved in plant defence is ATG4G22300 (AtTIPSY1) [137]. Overall, from the 116 markers significantly associated with the rosette shape descriptors, most are located within loci with known or suspected regulatory functions (S5 Table). These were several membrane proteins and receptors, including a G-protein subunit like GPA-1 (AT2G26300), a protein kinase receptor like ERECTA (AT2G26330), transcription factors like, PHYTOCHROME RAPIDLY REGULATED1 (PAR1, AT2G42870), or chromo proteins like PHYB (AT2G18790) and hemoproteins, like HO2 (AT2G26550), participating on photomorphology and shade avoidance responses, by regulation of an auxin-responsive gene [138] and similar environment response phenotypes. This study focuses on the global shape of an organ assemblage, the A. thaliana rosette. Rosette leaf shape varies along the ontogenetic development according to their local environment [67, 139], e.g., in a single plant some leaves are longer than others, usually towards light sources, with crenate, undulate or entire borders. Yet, rosette appearance is distinguishable among ecotypes and similar within individuals of the same ecotype, especially when grown in homogenous conditions [22]. Thus, rosette shape variation is visible and genetically controlled within some range as predicted by the ‘continuum and process morphology’ [140–142]. Our results on shape descriptors heritability are consistent with these observations and build up on the concept that morphological traits act as functional ones [63, 143]. The QTLs found here agree with the theory that finely tuned genetic regulatory networks, linking and integrating environmental clues during ontogenetic development, are among the major contributions to plant local adaptations [144–147]. In this sense, our study introduces the automated plant phenomics as a relevant tool for the so called eco-evo-devo [148] with particular emphasis on morphology at a subspecies taxa level [149, 150]. From an applied biology perspective, the QTLs reported may be useful for further research either in their role on phenotypic regulation or the type of genetic variants they bear. For example, it can be argued that genetic manipulation of phytochromes or kinase receptors may potentiate crop adaptability to extreme environments or reduce undesirable variation due to early stage disturbances like short-term frost, drought or salt stress [15]. Acknowledgments The authors thank Candida Nibau for her thoughtful comments towards improving our manuscript and Karen Askew for technical assistance with plant care. [END] [1] Url: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0263985 (C) Plos One. "Accelerating the publication of peer-reviewed science." Licensed under Creative Commons Attribution (CC BY 4.0) URL: https://creativecommons.org/licenses/by/4.0/ via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/