(C) PLOS One. This story was originally published by PLOS One and is unaltered.

Inferring a spatial code of cell-cell interactions across a whole animal body [1]

Authors include Erick Armingol and Abbas Ghaddar. Affiliations listed: Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, California, United States of America; Department of Pediatrics.

Date: 2022-12

Cell-cell interactions shape cellular function and ultimately organismal phenotype. Interacting cells can sense their mutual distance using combinations of ligand-receptor pairs, suggesting the existence of a spatial code, i.e., signals encoding spatial properties of cellular organization. However, the code that drives and sustains the spatial organization of cells remains to be elucidated. Here we present a computational framework to infer the spatial code underlying cell-cell interactions from the transcriptomes of the cell types across the whole body of a multicellular organism. As the core of this framework, we introduce our tool cell2cell, which uses the coexpression of ligand-receptor pairs to compute the potential for intercellular interactions, and we test it across the body of Caenorhabditis elegans. Leveraging a 3D atlas of C. elegans cells, we also implement a genetic algorithm to identify the ligand-receptor pairs most informative of the spatial organization of cells across the whole body. Validating the spatial code extracted with this strategy, we find that intercellular distances are negatively correlated with the inferred cell-cell interactions. Furthermore, for selected cell-cell and ligand-receptor pairs, we experimentally confirm the communicatory behavior inferred with cell2cell and the genetic algorithm. Thus, our framework helps identify a code that predicts the spatial organization of cells across a whole-animal body.
Neighboring cells coordinate gene expression through cell-cell interactions, enabling proper functioning in multicellular organisms. Hence, intercellular interactions can be inferred from gene expression. We use this strategy to define a molecular code that carries spatial information about cell-cell interactions across a whole animal body. We develop a computational framework to infer the first cell-cell interaction network in Caenorhabditis elegans from its single-cell transcriptome, and show a negative correlation between interactions and intercellular distances. This correlation is driven by a combination of ligand-receptor pairs that follow spatial patterns across the body of C. elegans, i.e., the spatial code. Thus, our framework uncovers molecular features crucial to defining spatial cell-cell interactions across a whole body, and the same strategy can be readily applied to higher organisms.

Funding: EA is supported by the Chilean Agencia Nacional de Investigación y Desarrollo (ANID) through its scholarship program DOCTORADO BECAS CHILE/2018 - 72190270, the Fulbright Chile Commission, and the Siebel Scholar Foundation. This work was further supported by NIGMS grant R35 GM119850 to NEL, a Lilly Innovation Fellows Award to CJJ, a Jefferson Foundation Award to AG, a J Yang Foundation Fellowship to HLH, and a PEW Charitable Trust Award and generous funding from the W. M. Keck Foundation to EJO. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: The single-cell RNA-seq dataset (GEO accession code GSE98561), the 3D digital atlas of C. elegans including cell annotations based on the cell types in the scRNA-seq dataset (S6 Table), the manually curated list containing 245 ligand-receptor interactions (S1 Table), and the consensus list from the GA-selection containing 37 interactions (S3 Table) are available in a public Code Ocean capsule (https://doi.org/10.24433/CO.4688840.v2).
All analyses performed in this work, their code (implemented in Python and Jupyter Notebooks), all data, and instructions for using them are available in a public repository (https://github.com/LewisLabUCSD/Celegans-cell2cell). Reproducible runs of our analyses can be performed in a public Code Ocean capsule (https://doi.org/10.24433/CO.4688840.v2). Our open-source suite, cell2cell, infers cell-cell interactions from bulk or single-cell RNA-seq data, with or without spatial information, and is available in a GitHub repository (https://github.com/earmingol/cell2cell).

Caenorhabditis elegans is an excellent model for studying CCIs in a spatial context across a whole body [14]. This animal has fewer than 1,000 somatic cells stereotypically arranged across the body, whose locations have been described in a 3D atlas [15]. Despite the small number of cells, the intercellular organization in C. elegans shows complexity comparable to that of higher-order organisms. Taking advantage of these features, here we use scRNA-seq data from C. elegans to compute CCIs and assess which ligand-receptor pairs could govern an intercellular spatial code across the body. For this purpose, we integrated single-cell transcriptome data [16] with a 3D atlas of C. elegans cells [15] and built the most comprehensive list of ligand-receptor interactions in C. elegans for CCI analyses. Next, we compared our CCI predictions with the literature and found them consistent with previous studies that independently reported roles for the identified LR interactions as encoders of spatial information. Additionally, we experimentally tested uncharacterized CCIs and validated in situ that adjacent cells co-express the LR pairs computationally inferred to contribute to the spatial code. Together, these results demonstrate that single-cell RNA-seq data can be used to define a genotype-spatial phenotype link for the whole body of a multicellular organism.
CCIs can be inferred from the gene expression levels of ligands and receptors [9]. Although spatial information is lost during tissue dissociation in conventional bulk and single-cell RNA-sequencing technologies (scRNA-seq) [10], inferring CCIs from transcriptomics can help elucidate how multicellular functions are coordinated by both the molecules mediating CCIs and their spatial context. Indeed, previous studies have shown that gene expression levels still encode spatial information that can be recovered by adding information such as protein-protein interactions and/or microscopy data [10–13]. For example, RNA-Magnet inferred cellular contacts in the bone marrow by considering the coexpression of adhesion molecules present on cell surfaces [12], while ProximID used gene expression coupled with microscopy of cells to construct a spatial map of cell-cell contacts in bone marrow [11]. Thus, we propose extending transcriptomics-based CCI inference to assess whether RNA carries a spatial code of intercellular messages that defines spatial organization and cellular functions across the whole body of a multicellular organism.

Cell-cell interactions (CCIs) are fundamental to all facets of multicellular life. They shape cellular differentiation and the functions of tissues and organs, which ultimately influence organismal physiology and behavior. CCIs often take the form of secreted or surface proteins produced by a sender cell (ligands) interacting with their cognate surface proteins on a receiver cell (receptors). The nature of CCIs is constrained by the distance between interacting cells [1–3], and, in turn, CCIs follow spatial patterns of interaction [4]. These patterns are important since they allow CCIs to define cell location and community spatial structure [3,5]. For instance, some molecules mediating CCIs form gradients that serve as a spatial cue for other cells to migrate [6,7].
In addition, the co-occurrence of ligands and receptors is strongly determined by their spatial neighborhoods [8], and cells can use these signals to sense spatial proximity to other cells [3]. Thus, it is reasonable to speculate that there is a spatial code embedded in ligand-receptor (LR) interactions across the body of multicellular organisms; a code that encodes spatial information and defines the distribution of cells in tissues and organs.

Results

Computing cell-cell interactions

A first step in studying cell-cell interactions can be to identify active intercellular communication pathways from the coexpression of the corresponding LR pairs in a given pair of cells. Communication scores can be assigned to each LR pair based on the RNA expression levels of their encoding genes in a given pair of sender and receiver cells [17–22]. Communication scores are then aggregated into an overall CCI score for each pair of cells, often represented by the number of active (expressed) LR pairs (LR Count score [18]), and in other cases by the sum of the products of ligand and receptor expression (ICELLNET score [23]). Higher numbers of active LR pairs and higher sums of LR expression can represent stronger cell-cell interactions [9]. However, these scores ignore that a high CCI score could arise simply because one of the interacting cells promiscuously expresses many different ligands and/or receptors (LR Count), or because a few LR pairs are very highly expressed (ICELLNET). In contrast, we propose a novel CCI score based on the idea that high CCI scores should represent a high but also specific complementarity in the production of ligands and receptors between the interacting cells (Fig 1).

Fig 1. Calculation of the modified Bray-Curtis CCI score.
(A) To represent the overall interaction potential between cell A and cell B, our CCI score is computed from two vectors representing the ligands and receptors independently expressed in each cell. If only the ligands from one cell and the cognate receptors on the other are considered ("Cell A to Cell B" half or "Cell B to Cell A" half, independently), the score would be directed (one cell is the sender and the other is the receiver). However, our score is undirected because it considers both the ligands and the receptors of each cell to build the vector (both halves simultaneously, indicated with the yellow rectangle on the left). Thus, the vector of each cell is built with both directed halves of molecule production (e.g., the top half holds the ligands of cell A while the bottom half holds its receptors, generating a single vector with both the ligands and the receptors of cell A). (B) Toy examples for computing our score for the interaction of Cell A and Cell B. Both possible directions of interaction are shown to illustrate that they result in the same (undirected) score.
https://doi.org/10.1371/journal.pcbi.1010715.g001

The specific complementarity captured by our Bray-Curtis score is also intended to represent a cell-cell potential of interaction that may respond to or drive intercellular proximity. Cells can sense the number of receptors that are occupied by signals from surrounding cells [3,24], and higher occupancy can indicate greater proximity of communicating cells [25]. Thus, our score is computed from the mRNA expression of ligands and receptors in pairs of interacting cells in a way that accounts for the fraction of all expressed ligands and receptors that is used in the interaction (Fig 1A). The main assumption of our CCI score is that cells in closer proximity express more complementary ligands and receptors.
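The vector construction in Fig 1 can be illustrated with a minimal NumPy sketch. It assumes binary (presence/absence) expression per LR pair and the standard Bray-Curtis similarity form, 2·Σmin(a, b)/(Σa + Σb), applied to the concatenated vectors. The function names `bray_curtis_cci` and `lr_count_both_directions` are hypothetical; this is not the cell2cell implementation, whose exact definition should be checked in its source.

```python
import numpy as np


def bray_curtis_cci(lig_a, rec_a, lig_b, rec_b):
    """Undirected Bray-Curtis-like CCI score between cells A and B (sketch).

    Each argument is a 0/1 vector with one entry per LR pair i:
    lig_x[i] = 1 if cell X expresses the ligand of pair i,
    rec_x[i] = 1 if cell X expresses the receptor of pair i.
    """
    lig_a, rec_a, lig_b, rec_b = (
        np.asarray(v, dtype=float) for v in (lig_a, rec_a, lig_b, rec_b)
    )
    # Cell A's vector: its ligands (A -> B half), then its receptors (B -> A half).
    vec_a = np.concatenate([lig_a, rec_a])
    # Cell B's vector, aligned position by position with A's: receptors facing
    # A's ligands, then ligands facing A's receptors.
    vec_b = np.concatenate([rec_b, lig_b])
    denominator = vec_a.sum() + vec_b.sum()
    if denominator == 0:
        return 0.0
    # For 0/1 values the element-wise minimum marks LR pairs active in either direction.
    shared = np.minimum(vec_a, vec_b).sum()
    return float(2.0 * shared / denominator)


def lr_count_both_directions(lig_a, rec_a, lig_b, rec_b):
    """Active LR pairs in both directions (A -> B plus B -> A), for comparison."""
    a_to_b = np.logical_and(lig_a, rec_b).sum()
    b_to_a = np.logical_and(lig_b, rec_a).sum()
    return int(a_to_b + b_to_a)


# Three hypothetical LR pairs; each cell is (ligands, receptors).
cell_a = ([1, 1, 0], [0, 0, 1])
cell_b = ([0, 0, 1], [1, 0, 0])  # complementary to cell A
cell_c = ([1, 1, 1], [1, 1, 1])  # promiscuous: expresses every ligand and receptor

print(bray_curtis_cci(*cell_a, *cell_b), lr_count_both_directions(*cell_a, *cell_b))  # 0.8 2
print(bray_curtis_cci(*cell_a, *cell_c), lr_count_both_directions(*cell_a, *cell_c))  # 0.6666666666666666 3
```

In this toy case, counting active LR pairs ranks the promiscuous cell C above cell B (3 vs 2), while the Bray-Curtis-like score ranks B higher (0.8 vs about 0.67), because C's many unused ligands and receptors enlarge the denominator. Swapping the argument order, `bray_curtis_cci(*cell_b, *cell_a)`, returns the same 0.8, which is the undirected property shown in Fig 1B.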
Under this assumption, for any given pair of cells, the two cells are considered closer when a greater fraction of the ligands produced by one cell interacts with cognate receptors on the other cell, and vice versa, since this increases their potential of interaction in an undirected manner (Fig 1B).

To facilitate the implementation of our computational framework to predict a spatial code of CCIs, and to perform other general CCI analyses that do not rely on spatial information, we developed cell2cell. This open-source tool infers intercellular interactions and communication from any gene expression matrix and list of LR pairs given as inputs (https://github.com/earmingol/cell2cell). Depending on the purpose of a study, cell2cell can also compute other CCI scores besides our Bray-Curtis score (e.g., LR Count and ICELLNET scores).

Cell-type roles and spatial properties are captured by computed cell-cell interactions

To assess whether our Bray-Curtis score captures spatial properties associated with intercellular distances, we used C. elegans data because, among other relevant characteristics, this model organism has a stereotyped distribution of cells across its whole body that has been extensively studied through microscopy and reported for 357 of its individual cells in a 3D atlas [15]. To compute the complementarity of interaction between C. elegans cells, an extensive list of functional LR interactions is needed. However, while much is known about C. elegans, knowledge of its LR interactions remains dispersed across the literature or contained in protein-protein interaction (PPI) networks that include other categories of proteins. Thus, we first generated a list of 245 ligand-receptor interactions in C. elegans (S1 Table). Next, we used this list to determine the presence or absence of mRNAs encoding ligands and receptors in each cell identified in the single-cell transcriptome of C. elegans [16].
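The presence/absence step, together with the two baseline scores mentioned above, can be sketched on toy data. The LR pairs, gene names, expression values, and the expression threshold below are all hypothetical. The helpers `lr_count` and `icellnet_like` only follow the one-line descriptions in the text (number of active LR pairs; sum of ligand-receptor expression products) and do not reproduce any normalization applied by the original LR Count or ICELLNET implementations.

```python
import numpy as np

# Hypothetical LR list: (ligand gene, receptor gene).
LR_PAIRS = [("lig1", "rec1"), ("lig2", "rec2"), ("lig3", "rec3")]

# Toy aggregated expression per cell type (gene -> value); unlisted genes are 0.
EXPRESSION = {
    "typeA": {"lig1": 5.0, "lig2": 0.0, "rec3": 2.0},
    "typeB": {"rec1": 3.0, "rec2": 4.0, "lig3": 1.0},
}


def expr(cell, gene):
    """Expression of a gene in a cell type, 0.0 when absent."""
    return EXPRESSION[cell].get(gene, 0.0)


def binary_comm_scores(sender, receiver, threshold=0.0):
    """Per LR pair: 1 if the sender expresses the ligand and the receiver the receptor."""
    return np.array(
        [
            int(expr(sender, lig) > threshold and expr(receiver, rec) > threshold)
            for lig, rec in LR_PAIRS
        ]
    )


def lr_count(sender, receiver, threshold=0.0):
    """Number of active (expressed) LR pairs in the sender -> receiver direction."""
    return int(binary_comm_scores(sender, receiver, threshold).sum())


def icellnet_like(sender, receiver):
    """Sum over LR pairs of ligand expression times receptor expression."""
    return float(sum(expr(sender, lig) * expr(receiver, rec) for lig, rec in LR_PAIRS))


print(binary_comm_scores("typeA", "typeB"))  # [1 0 0]
print(lr_count("typeA", "typeB"), icellnet_like("typeA", "typeB"))  # 1 15.0
print(lr_count("typeB", "typeA"), icellnet_like("typeB", "typeA"))  # 1 2.0
```

Both directions have exactly one active LR pair, so the count cannot tell them apart, while the expression-weighted sum is dominated by the single highly expressed pair in the typeA -> typeB direction. This is the kind of effect the text attributes to scores driven by a few very highly expressed LR pairs.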
Briefly, this dataset consists of a gene expression matrix whose values are aggregated across all individual cells sharing the same annotation. Of the 27 cell types identified across the body of C. elegans, we considered only the 22 cell types to which we could assign a spatial location in a previously published 3D atlas [15]. After integrating this aggregated single-cell transcriptomic data with the list of LR pairs, we inferred the active (expressed) LR pairs in all pairs of cell types by using a binary communication score (S2 Table). Next, we aggregated the respective communication scores for each cell pair with our Bray-Curtis metric, generating the first predicted network of CCIs in C. elegans that measures the complementarity of interacting cell types given their active LR pairs (Fig 2A).

Fig 2. Cell-cell interactions and communication in C. elegans.
(A) Heatmap of CCI scores obtained for each pair of cell types using the curated list of LR pairs. An agglomerative hierarchical clustering was performed on a dissimilarity-like metric obtained by taking the complement (1-score) of CCI scores, disregarding autocrine interactions. Cell types are colored by their lineages as indicated in the legend. Lineages and colors were assigned previously [16]. (B) UMAP visualization of CCIs. Dots represent pairs of interacting cells, projected based on their Jaccard distances, which were computed from the LR pairs expressed in the directed interactions between cells (one cell produces the ligands and the other the receptors). Dots are colored by either the sender cell (left) or the receiver cell (right), according to their lineages as indicated in the legend of (A). A readable version of the data used for this projection is available in S2 Table, where the names of LR pairs and their communication scores are specified for each cell pair.
S1 Fig shows another UMAP visualization based on a more appropriate similarity metric, the Rand index, which accounts for both active and inactive LR pairs; with the Rand index, sender cells still drive the similarities. (C) Receiver operating characteristic (ROC) curves of random forest models that classify cell-cell pairs from their CCI scores computed with the different approaches indicated in the legend. These models predict the intercellular distance range (short-, mid-, or long-range distance, as defined in N1 Fig in S1 Text). For each classifier, the mean (solid line) ± standard deviation (transparent area) of the ROCs was computed with 3-fold stratified cross-validation. The area under the curve (AUC) for the ROC curves is shown in the legend as the mean ± standard deviation across all distance-range classifications. Separate evaluations for each distance range are provided in S3 Fig.
https://doi.org/10.1371/journal.pcbi.1010715.g002

After determining the potential for interaction between every pair of cell types from the single-cell transcriptome of C. elegans, we grouped the cell types based on their interactions with other cells through agglomerative hierarchical clustering (Fig 2A). This analysis generated clusters that appear to reflect the known roles of the defined cell types in their tissues. For instance, neurons have the largest potential for interactions with other cell types, especially with themselves and muscle cells. This suggests that these cell types use a higher fraction of all possible communication pathways, which is consistent with the extensive molecular exchange that occurs at neuronal synapses and neuromuscular junctions [26]. Also, germline cells have the lowest CCI potential with other cell types, seemingly in line with the basement membranes that surround germline cells and physically constrain their ability to communicate with other cell types [27,28].
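The clustering used for Fig 2A (the complement of the CCI score as a dissimilarity, with autocrine interactions disregarded) can be sketched with SciPy. The cell-type labels and scores below are invented, and both the average-linkage method and the cut into two clusters are assumptions; the text does not state which linkage was used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

cell_types = ["type1", "type2", "type3", "type4"]  # hypothetical labels
# Toy symmetric matrix of undirected CCI scores (values invented).
cci = np.array(
    [
        [0.90, 0.80, 0.20, 0.50],
        [0.80, 0.70, 0.15, 0.45],
        [0.20, 0.15, 0.60, 0.25],
        [0.50, 0.45, 0.25, 0.65],
    ]
)

# Dissimilarity-like metric: the complement (1 - score) of the CCI score.
dissimilarity = 1.0 - cci
# Disregard autocrine interactions: zero the diagonal, which is also what
# squareform requires of a valid distance matrix.
np.fill_diagonal(dissimilarity, 0.0)

condensed = squareform(dissimilarity)  # upper triangle as a condensed vector
tree = linkage(condensed, method="average")  # assumed linkage method
labels = fcluster(tree, t=2, criterion="maxclust")
print(dict(zip(cell_types, labels)))
```

In this toy matrix, type3 has low scores with every other type and ends up alone, while type1 and type2 merge first and are then joined by type4.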
Thus, the results suggest that our method may properly capture the nature of the interactions between vastly different cell pairs. We further observed that pairs of interacting cells tend to be grouped by the sender cells (i.e., those expressing the ligands), but not by the receiver cells (i.e., those expressing the receptors) (Figs 2B and S1). Remarkably, this result is consistent with previous findings that human cells produce ligands in a cell type-specific manner, whereas they produce receptors promiscuously [29]. While that study used network-based clustering of ligand-receiver cell connections, we used UMAP [30,31] to visually summarize the Jaccard similarity [32] between pairs of interacting cell types; obtaining a similar result from two different approaches suggests that it could be biologically meaningful. Correspondingly, the coexpression of ligands and their cognate receptors is more similar among cell pairs whose sender cells are of the same type, regardless of the receiver cell type (S2 Fig). Using the overall CCI scores computed for the cell-cell pairs in C. elegans, we next evaluated the ability of our Bray-Curtis score to separate distinct ranges of intercellular distances (short, mid, and long range, as defined in N1 Fig in S1 Text). To measure this ability, a classifier was trained using the CCI scores as inputs and the intercellular-distance categories as outputs, and its performance was evaluated through a Receiver Operating Characteristic (ROC) curve and its area under the curve (AUC). The Bray-Curtis score performed better than a random model (avg. AUC of 0.65, Figs 2C and S3).
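How well a CCI score separates distance ranges can be illustrated with a rank-based ROC AUC. This is a simplified stand-in for the paper's approach (random forest classifiers evaluated with 3-fold stratified cross-validation): it computes a one-vs-rest AUC of the raw score for the short-range class through the Mann-Whitney U statistic. The helper `roc_auc` is hypothetical, and the scores and distance labels are invented.

```python
import numpy as np


def roc_auc(scores, is_positive):
    """ROC AUC via the Mann-Whitney U statistic; ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score against every negative score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


# Invented CCI scores for six cell pairs and their distance ranges.
cci_scores = [0.80, 0.35, 0.40, 0.60, 0.30, 0.20]
distance_range = np.array(["short", "short", "mid", "short", "long", "mid"])

# One-vs-rest: can the score alone separate short-range pairs from the rest?
auc_short = roc_auc(cci_scores, distance_range == "short")
print(round(auc_short, 3))  # 0.889
```

An AUC of 0.5 corresponds to a random model, which is the baseline the Bray-Curtis score is compared against above; here 8 of the 9 short-versus-other comparisons rank the short-range pair higher.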
In addition, we compared our score with other overall CCI scores, including those aggregated from binary communication scores, such as the number of active LR pairs (LR Count) and the cell-type specific probability (Smillie) [33], and continuous scores, such as the sum of the LR expression products (ICELLNET) [23] and the weight of significant LR pairs (CellChat) [34]. Under comparable conditions, our Bray-Curtis score best separated intercellular-distance ranges, slightly outperforming the ICELLNET score (avg. AUC of 0.63), followed by the LR Count, the most widely used overall CCI score (avg. AUC of 0.57). Interestingly, CCI scores based on permutations (CellChat and Smillie) had the lowest performance in separating intercellular distance ranges (avg. AUC ~0.5). However, the strength of permutation-based scores is that they better identify cell-type specific LR pairs and reduce the number of false positives in this regard, while disregarding LR pairs that are shared across multiple cell types. Thus, spatial proximity seems to be encoded by the activation or inactivation of signaling mechanisms shared across multiple cell types rather than in very specific cell-type pairs.

[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010715
Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/