(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright ------------ Revealing mechanisms of infectious disease spread through empirical contact networks ['Pratha Sah', 'Department Of Biology', 'Georgetown University', 'Washington', 'District Of Columbia', 'United States Of America', 'Michael Otterstatter', 'British Columbia Centre For Disease Control', 'Vancouver', 'School Of Population'] Date: 2022-01 Abstract The spread of pathogens fundamentally depends on the underlying contacts between individuals. Modeling the dynamics of infectious disease spread through contact networks, however, can be challenging due to limited knowledge of how an infectious disease spreads and its transmission rate. We developed a novel statistical tool, INoDS (Identifying contact Networks of infectious Disease Spread) that estimates the transmission rate of an infectious disease outbreak, establishes epidemiological relevance of a contact network in explaining the observed pattern of infectious disease spread and enables model comparison between different contact network hypotheses. We show that our tool is robust to incomplete data and can be easily applied to datasets where infection timings of individuals are unknown. We tested the reliability of INoDS using simulation experiments of disease spread on a synthetic contact network and find that it is robust to incomplete data and is reliable under different settings of network dynamics and disease contagiousness compared with previous approaches. We demonstrate the applicability of our method in two host-pathogen systems: Crithidia bombi in bumblebee colonies and Salmonella in wild Australian sleepy lizard populations. INoDS thus provides a novel and reliable statistical tool for identifying transmission pathways of infectious disease spread. In addition, application of INoDS extends to understanding the spread of novel or emerging infectious disease, an alternative approach to laboratory transmission experiments, and overcoming common data-collection constraints. Author summary Network models are widely used to understand and predict infectious disease spread in human and animal populations. However, the choice of network model often relies on subjective expert knowledge or disease transmission experiments that are time-consuming and difficult to perform. We developed a novel tool, called INoDS (Identifying contact Networks of infectious Disease Spread), that uses robust statistical approach to establish relevance of a network model in explaining transmission pathways of an infectious disease outbreak. We used computer simulations and real-world dataset to test the accuracy of our tool and robustness to missing data. We found that INoDS is robust to common data-collection constraints, broadly applicable and accurate compared to current approaches. The tool that we have developed can therefore provide immediate epidemiological insights in the event of an epidemic outbreak, and can be used to improve targeted disease control. Citation: Sah P, Otterstatter M, Leu ST, Leviyang S, Bansal S (2021) Revealing mechanisms of infectious disease spread through empirical contact networks. PLoS Comput Biol 17(12): e1009604. https://doi.org/10.1371/journal.pcbi.1009604 Editor: Benjamin Althouse, University of Washington, UNITED STATES Received: November 8, 2020; Accepted: October 31, 2021; Published: December 20, 2021 Copyright: © 2021 Sah et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All empirical datasets are available at: https://doi.org/10.7910/DVN/YAHRDJ The code for the INoDs software is available at: https://github.com/bansallab/INoDS-model. Funding: SB and PS was supported by the National Science Foundation Ecology and Evolution of Infectious Diseases Grant 1216054 (https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=5269). STL was supported by an Australian Research Council DECRA fellowship (DE170101132, https://www.arc.gov.au/grants/discovery-program/discovery-early-career-researcher-award-decra), and an ARC grant to CM Bull (DP130100145), which funded the sleepy lizard project (https://www.arc.gov.au/grants). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. Introduction Host contacts, whether direct or indirect, play a fundamental role in the spread of infectious diseases [1–4]. Traditional epidemiological models make assumptions of homogeneous social structure and mixing among hosts which can yield unreliable predictions of infectious disease spread [3, 5–7]. Network approaches provide an alternative to modeling infection transmission by explicitly incorporating host interactions that mediate pathogen transmission. Formally, in a contact network model, individuals are represented as nodes, and an edge between two nodes represents an interaction that has the potential to transmit infection. A dynamic contact network model tracks interactions evolving over time due to social, demographic or environmental processes as well as perturbations [8]. Constructing a complete contact network model requires (i) knowledge about the transmission route(s) of a pathogen, (ii) a sampling of all individuals in a population, and (iii) a sampling of all interactions among the sampled individuals that may lead to infection transfer. In addition, accuracy of disease predictions depends on the precise epidemiological knowledge about the pathogen, including the rate of pathogen transfer given a contact between two individuals, and the rate of recovery of infected individuals. The use of modern technology in recent years, including RFID, GPS, radio tags, proximity loggers and automated video tracking has enabled the collection of detailed movement and contact data, making network modeling feasible. Despite the technology, logistical and financial constraints still prevent data collection on all individuals and their social contacts [9–14]. More importantly, limited knowledge about a host-pathogen system makes it challenging to identify the mode of infection transmission, define the relevant contacts between individuals that may lead to infection transfer, and measure the per-contact rate of infection transmission [15–17]. Laboratory techniques of unraveling transmission mechanisms usually take years to resolve [18–20]. Defining accurate contact networks underlying infection transmission in human infectious disease has been far from trivial [3, 21, 22]. For animal infectious disease, limited information on host behavior and the epidemiological characteristics of the spreading pathogen makes it particularly difficult to define a precise contact network, which has severely limited the scope of network modeling in animal and wildlife epidemiology [15, 23]. Lack of knowledge about disease transmission mechanisms has prompted the use of several indirect approaches to identify the link between social structure and disease spread. A popular approach has been to explore the association between social network position (usually quantified as network degree) of an individual and its risk of acquiring infection [24–27]. Another approach is to use proxy behaviors, such as movement, spatial proximity or home-range overlap, to measure direct and indirect contact networks occurring between individuals [28–30]. A recent approach, called the k-test procedure, explores a direct association between infectious disease spread and a contact network by comparing the number of infectious contacts of infected cases to that of uninfected cases [31]. However, several challenges remain in identifying the underlying contact networks of infection spread that are not addressed by these approaches. First, it is often unclear how contact intensity (e.g. duration, frequency, distance) relates to the risk of infection transfer unless validated by transmission experiments [19]. Furthermore, the role of weak ties (i.e., low-intensity contacts) in pathogen transfer is ambiguous [21, 32]. The interaction network of any social group will appear as a fully connected network if monitored for a long period of time. As fully-connected contact networks rarely reflect the dynamics of infectious disease spread through a host population, one may ask whether weak ties can be ignored, or what constitutes an appropriate intensity threshold below which interactions are epidemiologically irrelevant? Second, many previous approaches ignore the dynamic nature of host contacts. The formation and dissolution of contacts over time is crucial in determining the order in which contacts occur, which in turn regulates the spread of infectious diseases through host networks [8, 33, 34]. Finally, none of the existing approaches allow direct comparison of competing hypotheses about disease transmission mechanisms which may generate distinct contact patterns and consequently different contact network models. All of these challenges demand an approach that can allow direct comparison between competing hypothesis on transmission pathways while taking into account the dynamics of host interactions and constraints of data sampling. We introduce a statistical tool called INoDS (Identifying contact Networks of infectious Disease Spread) that establishes the epidemiological relevance of observed contact networks in explaining the patterns of infectious disease spread. INoDS also allows testing competing hypotheses on the mode of disease transmission by performing model comparison between different contact networks. The tool can estimate the per-contact transmission rate and recovery rate for various disease progressions (e.g. SI, SIS and SIR) and can be extended to incorporate complexities in transmission (e.g. individual-level heterogeneity in susceptibility; latent period following infection). INoDS provides inference on static and dynamic contact networks, and is robust to common forms of missing data. Using two empirical datasets, we highlight the two-fold application of INoDS—(i) to identify whether observed patterns of infectious disease spread are likely given an empirical contact network, and (ii) to identify transmission routes, the role of the contact intensity, and the per contact transmission rate of a host-pathogen system. The epidemiological insights into infectious disease provided by INoDS can be invaluable in implementing immediate disease control measures in the event of an emerging epidemic outbreak. Discussion In this study we present INoDS as a tool that performs network model selection and establishes the statistical significance of a contact network model to describe the spread of infectious diseases. Our method also provides epidemiological insights about the host-pathogen system by enabling hypothesis testing on different transmission mechanisms, and estimating pathogen transmission rates. Unlike previous approaches that rely on social network position [24–27], proxy behaviors [28–30] or connectivity [31], we show that our method is robust to missing network data, imperfect disease surveillance, can provide inference for dynamic networks and a range of disease progression models. Additionally, our tool overcomes a common challenge of imperfect knowledge of infection acquisition by assuming infection times to be unobserved and using data on infection diagnosis instead to provide inference on contact networks. In principle, the background transmission rate parameter, ϵ, in INoDS is similar to the asocial learning rate used in the network based diffusion analysis approach in the behavior learning literature [38, 39]. The background transmission parameter in our model serves to approximately assess transmission which is unexplained by the edge connections of the network hypothesis. Under the scenario of a network hypothesis with no edges, the best fit ϵ would indicate that all disease data is generated by unobserved transmission. Any transmission events that can be better explained by a network edge reduce the model’s expectation from this maximum ϵ estimate. Relative deviation of ϵ from the maximum possible value is therefore highest for a network hypothesis with no missing data. The relative deviation of ϵ declines with increasing missing data, although it is more sensitive to missing network information compared with missing information in infection time-series data (S4 Fig). Our work thus addresses a growing subfield in network epidemiology that leverages statistical tools to infer contact networks using all available host and disease data [9, 31, 40–42]. Our approach can be used to tackle several fundamental challenges in the field of infectious disease modeling [21, 22]. First, INoDS can be used to perform model selection on contact network models that quantify different transmission modes; this approach facilitates the identification of infection-transmitting contacts and does not rely on laboratory experimentation (or subjective expert knowledge). Second, INoDS can be used to establish the statistical significance of proxy measures of contact (such as spatial proximity, home-range overlap or asynchronous refuge use) in cases where data on direct interactions between hosts are limited. Third, INoDS can establish the epidemiological role of edge weights in a contact network by performing model selection of contact networks with similar edge connections but different edge weighting criteria. In the first empirical example involving the spread of the Crithidia gut protozoan in bumble bee colonies, we demonstrate that contact networks weighted with respect to frequency, rather than duration, explain the observed patterns of transmission. Our results therefore support the original finding of the study [36], where individual risk of infection was found to be correlated with contact rate with infected nest-mates. We further found that weak ties below a certain edge weight threshold do not play an important role in infection transfer for this empirical system. In the second empirical dataset, we found that frequency of contacts between closely located lizards allows better, i.e. more consistent, predictions on Salmonella transmission. Our results supports results from a previous study which suggests that the bacterial transmission in Australian sleepy lizards occur between closely located animals [37]. As with all models, the results from INoDS should be considered within the context of model assumptions. First, we assumed that the infection process has no latent period. In the future, disease latency can be incorporated into the model by using a data augmentation technique, similar to what we use for inferring infection times. Second, we assumed that the infectiousness of infected hosts and susceptibility of naive hosts is equal for all individuals in the population. Heterogeneity in infectiousness of infected hosts and the susceptibility of naive hosts can be incorporated as random effects in the model. Third, in the current version of our model, we do not consider re-infection of hosts. Re-infectivity of hosts can be easily incorporated by allowing multiple time-points of infection acquisition in Eq (2). Our results show that the data-collection efforts should aim to sample as many individuals in the population as possible, since missing nodes have the greatest impact (rather than missing edges) on the statistical significance of network models. Since data-collection for network analysis can be labor-intensive and time-consuming, our approach can be used to make essential decisions on how limited data collection resources should be deployed. For example, under a limited capability of recording real-time interactions between hosts, INoDS can identify the minimum time-resolution required during data collection for a network model with sufficient statistical ability to establish epidemiological relevance. Our approach can also be used to improve targeted disease management and control by identifying high-risk behaviors and super-spreaders of a novel pathogen without relying on intensive transmission experiments that take years to resolve. Materials & methods Here we describe INoDs (Identifying contact Networks of infectious Disease Spread), a computational tool that (i) estimates per contact transmission rate (β) of infectious disease for empirical contact networks, (ii) establishes the epidemiological relevance of a contact network by performing Bayesian hypothesis testing, and (iii) enables discrimination of competing contact network hypotheses, including those based on pathogen transmission mode, edge weight criteria and data collection techniques. Two types of data are required as input for INoDS—infection time-series data, which include infection diagnoses (coded as 0 = not infected and 1 = infected), and time-step of diagnosis for all available individuals in the population; and an edge-list of a dynamic (or static) contact network. An edge-list format is a list of node pairs (each node pair represents an edge of the network), along with the weight assigned to the interaction, and time-step of interaction, with one node pair per line. The tool can be used for unweighted contact networks—an edge weight of one is assigned to all edges in this case. Time-steps of interactions are not required when analysis is performed on static contact networks. The software, implemented in Python, is platform independent, and is freely available at https://github.com/bansallab/INoDS-model. Empirical datasets used in this study are available at https://doi.org/10.7910/DVN/YAHRDJ. INoDS formulation For a susceptible individual, the potential of acquiring infection at each time-step depends on the per contact transmission rate, the total strength of interactions with its infected neighbors at the previous time-step, and a parameter ϵ that captures the force of infection that is not explained by the individual’s social connections. The probability of receiving infection, λ i (t), of a susceptible individual i at time t is thus calculated as: (1) where both β and ϵ parameters are > 0; w i (t − 1) denotes the total strength of association between the focal individual i and its infected associates at the previous time-step (t − 1). For binary (unweighted) contact network models w i = k i , where k i is the total infected connections of the focal individual. The log-likelihood of the observed infection time-series data given the contact network hypothesis (H A ) can therefore be estimated as: (2) where D is the infection time-series data, the set of unknown parameters is Θ = {Θ 1 , Θ 2 …..Θ N }, t n is the time of infection of individual n, and t are all time-steps at which individual n is naive and has the ability to contract infection. The first part of Eq 2 estimates the log likelihood of all observed infection acquisition events. The second part of the equation represents the log-likelihood of susceptible individuals m remaining uninfected at time t. Following Bayes’ theorem, the posterior distribution of the set of parameters is given as: (3) where D is the infection time-series data, H is the contact network hypothesis, and are the shorthands for the posterior, the likelihood, the prior and the marginal likelihood, respectively. Parameter estimation and data augmentation of infection timings. We used DynamicNestedSampler of dynesty package implemented in Python to estimate Bayesian posteriors and evidences [43]. Nested sampling is a numerical method of simultaneously estimating both the posterior and evidence by maintaining a set of samples from the prior, and iteratively updating them subject to the constraint that new samples have higher likelihoods. Dynamic nested sampling allows the samples to be allocated adaptively, maximizing both accuracy and efficiency [43]. We assumed a uniform prior distribution for β and ϵ parameters with range [0, 10]. Calculation of the likelihood in Eq 2 requires knowledge of exact timing of infection, t 1 , …t n , for n infected individuals in the population. However in many cases, the only data available are the timings of when individuals in a populations were diagnosed to be infected, d 1 , …d n . We therefore employ a Bayesian data augmentation approach to estimate the actual infection timings in the disease dataset [44]. Since in this case the infection time t i for an individual i is unobserved, we only know that t i lies between the interval (L i , d i ], where L i is the last negative diagnosis of individual i before infection acquisition. Within this interval, the individual could have potentially acquired infection at any time-step where it was in contact with other individuals in the network. Assuming incubation period to be one time-step, the potential set of infection timing can be represented as t i ∈ {g i (t i − 1) > 0, L i < t i ≤ d i }, where g i (t i − 1) is the degree (number of contacts) of individual i at time t i − 1. For infections that follow a SIS or SIR disease model, it is also essential to impute the recovery time of infected individuals for accurate estimation of infected degree. To do so, we adopt a similar data augmentation approach as described to sample from the set of possible recovery time-points. The data augmentation proceeds in two steps. In the first step, the missing infection times are imputed conditional on the possible set of infection times. In the next step the posterior distributions of the unknown parameters are sampled based on the imputed data. We performed data imputation using inverse transform sampling method, which is a technique of drawing random samples from any probability distribution given its cumulative distribution function [45]. Interpretation of the εparameter. In principle, inclusion of the ϵ parameter in Eq 1 is similar to the asocial learning rate used in the network based diffusion analysis approach in the behavior learning literature [38, 39]. The background transmission parameter ϵ in INoDS formulation serves to approximately assess transmission which is unexplained by the edge connections of the network hypothesis. Estimate value of ϵ parameter increases with missing data, although it is more sensitive to missing network information compared with missing information in infection time-series data (S3 Fig). Epidemiological relevance of a contact network. We performed Bayesian hypothesis testing to establish the epidemiological relevance of a contact network in explaining the observed pattern of infectious disease spread. Null hypothesis is expressed as a uniform prior distribution over networks with permuted edge connections of varying degree (10% to 100%). Prior distribution of alternative hypothesis puts all credibility in an infinitely dense spike at 0% permutation level (viz, the observed contact network). Bayes factor of alternate hypothesis vs null was defined as: (4) Log Bayes factor was calculated as the difference between log marginal likelihoods of the alternate and null hypothesis. A log Bayes factor of 0.5 –1, 1–2 and >2.44 is considered as substantial, strong and decisive evidence, respectively, towards the epidemiological relevance of the observed contact network [46, 47]. Model selection of competing network hypotheses. To facilitate model selection in cases where there are more than one network hypothesis, we compute marginal likelihood of the infection data given each contact network model. The marginal likelihood, also called the Bayesian evidence, measures the model fit, i.e,. to what extent the infection time-series data can be simulated by a network hypothesis (H 1 ). Bayesian evidence is based on the average model fit, and calculated by integrating the model fit over the entire parameter space. (5) Dynamic nested sampling calculates the evidence by integrating the prior within nested contours of constant likelihood. Model selection can be then performed by computing pair-wise Bayes factor, i.e. the ratio of the marginal likelihoods of two network hypotheses. The log Bayes factor to assess the performance of network hypothesis H A over network hypothesis H B , is expressed as: (6) The contact network with a higher marginal likelihood is considered to be more plausible, and a log Bayes’ factor of more than 2.44 is considered to be a decisive support in favor of the alternative network model (H B ) [46, 47]. Validating INoDS performance We validated the performance of INoDS by evaluating its accuracy in estimating the unknown transmission parameter β, robustness to missing data, and comparing the performance of tool with previous approaches. To do so we first constructed a dynamic synthetic network using the following procedure. At time-step t = 0, a static network of 100 nodes, mean degree 4, and Poisson degree distribution was generated using the configuration model [35]. At each subsequent time-step, 10% of edge-connections present in the previous time-step were permuted, for a total of 100 time-steps. That is, the dynamic synthetic network (called true synthetic network in the results) consists of contacts (i, j, t) between node i and j at time t, lasting from a duration of 1 to 100 time steps; each contact is capable of disease transmission [48]. Next, through the synthetic dynamic network, we performed 10 independent SI disease simulations with per contact rate of infection transmission (β*) 0.01 to 0.1. Model accuracy was evaluating by comparing INoDS estimation of the transmission parameter, β, with the true transmission rate β* that was used to perform disease simulations. Since the synthetic network completely described the disease simulations, model accuracy was also tested by evaluating the deviation of the estimated error parameter ϵ, from the expected value of zero. We next tested robustness of the tool against two types of missing network data: (a) incomplete sampling of individuals in a population (missing nodes), and (b) incomplete sampling of interactions between individuals (missing edges). To do so, we simulated two independent SI disease simulations each for ten β* values ranging from 0.01 to 0.1 with increment of 0.01 through the synthetic dynamic network. The two scenarios of missing data were created by randomly removing 25–75% of nodes and edges from the true synthetic network that were not a part of the path of simulated infection spread. This approach allowed us to investigate INoDS performance for various levels of incompletely sampled networks with the infection path preserved. If INoDS is robust to missing network data, we expect to recover similar parameter estimate, epidemiological relevance and model evidence for all networks. Comparisons with previous approaches We compared INoDS performance with two previous approaches that are popularly used to establish epidemiological significance of contact networks—k-test [31] and network position test. k − test estimates the average number of infected connections for each infected individual in the network. To establish the epidemiological significance of a contact network this metric, called the k-statistic, is compared to a distribution of null k-statistics obtained by randomly swapping the node labels. A p-value is calculated as the number of permutations that produce k-statistics more extreme than the observed k-statistic. Network position test compares the average degree (i.e., the average number of connections) of infected cases with the degree of uninfected individuals. The difference in average degree in the observed network is compared to the degree difference in networks where random edge connections. A p-value is calculated as the number of random networks where the degree between infected and uninfected individuals is higher than the observed network. Both of these previous approaches only provide epidemiological evidence of the observed contact network and do not estimate transmission parameters or enable model comparisons. We therefore performed comparisons with step 2 of INoDS where epidemiological relevance of the observed network is evaluated. We also note that the previous approaches are limited to static and unweighted networks, we therefore performed model comparisons on a “true” synthetic static network with 100 nodes, Poisson degree distribution and a mean network degree of 3. Simulations of disease spread were performed with a broad range of per contact transmission rate (β). Null expectation in INoDS and network position test was generated by permuting the edge connections of the observed networks, creating an ensemble of null networks. In k-test, the location of infection cases within the observed network are permuted, creating a permuted distribution of k-statistic [31]. Applications to empirical datasets We demonstrate the applications of our approach using two datasets from the empirical literature. The first dataset comprises of dynamic networks of bee colonies (N = 5–7 individuals), where edges represent direct physical contacts that were recorded using a color-based video tracking software. A bumble bee colony consists of a single queen bee and infertile workers. Here, we focus on the infection experiments in two colonies (colony QC6 and UN1). Infection progression through the colonies was tracked by daily screening of individual feces, and the infection timing was determined using the knowledge of the rate of replication of C. bombi within its host intestine. The second dataset monitors the spread of the commensal bacterium Salmonella enterica in two separate wild populations of the Australian sleepy lizard Tiliqua rugosa. The two sites consisted of 43 and 44 individuals respectively, and these represented the vast majority of all resident individuals at the two sites (i.e., no other individuals were encountered during the study period). Individuals were fitted with GPS loggers and their locations were recorded every 10 minutes for 70 days. Salmonella infections were monitored using cloacal swabs on each animal once every 14 days. Consequently, the disease data in this system do not identify the onset of each individual’s infection. We used a SIS (susceptible-infected-susceptible) disease model to reflect the fact that sleepy lizards can be reinfected with salmonella infections. Proximity networks were constructed by assuming a contact between individuals whenever the location of two lizards was recorded to be within 14m distance of each other [26]. The dynamic networks at both sites consisted of 70 static snapshots, with each snapshot summarizing a day of interactions between the lizards. We constructed two contact network hypotheses to explain the spread of salmonella. The first contact network hypothesis placed binary edges between lizards if they were ever within 14m distance from each other during a day. The second contact network assigned edge weights proportional to the number of times two lizards were recorded within 14m distance of each other during a day. Specifically, edge weights between two lizards were equal to their frequency of contacts during a day normalized by the maximum edge weight observed in the dynamic network. Supporting information S1 Fig. Corner plot of the posterior sample showing relationship between estimates of β and ϵ. True value of the parameters are indicated with red lines. Contours contain 25%, 50% and 75% of the sample points respectively. https://doi.org/10.1371/journal.pcbi.1009604.s001 (EPS) S2 Fig. Estimate of ϵ with increasing percentages of nodes/edges removal that were not involved in infection spread path. https://doi.org/10.1371/journal.pcbi.1009604.s002 (EPS) S3 Fig. Robustness of INoDS to random missing data. Networks with missing data were created by randomly removing 10–60% of nodes, edges and cases. (a) Step 1: Δβ is the deviation of estimated transmission parameter β from the true transmission rate β*. (b) Step 2: Epidemiological relevance of observed network with missing data. Each box summarizes log Bayes factor of observed network with missing data compared to null hypothesis (viz a prior of networks with 10% to 100% permuted edges). (c) Evidence for the true synthetic network over datasets with missing data. Log Bayes factor of more than 2.44 (dashed line) is considered to be a strong support in favor of the observed contact network. The middle black line in each box plot is the median, the boxed area extends from the 25th to 75th quartile, and whiskers extended from the hinge to the largest/smallest value no further than 1.5 times the inter-quartile range. https://doi.org/10.1371/journal.pcbi.1009604.s003 (EPS) S4 Fig. Relative deviation of estimated ϵ from the maximum possible value with increasing percentage of random missing data. Maximum ϵ was determined by maximizing the likelihood function in Eq (2) assuming network with no edges. Each boxplot summarizes the results of 20 independent SI disease simulations through a synthetic dynamic network—two each for ten β* values ranging from 0.01 to 0.1 with increment of 0.01. Missing data were created by randomly removing 10–60% of nodes, edges or cases from the dataset. The middle black line in each box plot is the median, the boxed area extends from the 25th to 75th quartile, and whiskers extended from the hinge to the largest/smallest value no further than 1.5 times the inter-quartile range. https://doi.org/10.1371/journal.pcbi.1009604.s004 (EPS) S5 Fig. Disease prevalence corresponding to different values of β used in our simulations for Fig 4. https://doi.org/10.1371/journal.pcbi.1009604.s005 (EPS) Acknowledgments We thank Peter Majoros, Tom Haley, Alienor Quiblier and Ben Westwood for their assistance with lizard fieldwork, and David Gordon for his lab work. [END] [1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009604 (C) Plos One. "Accelerating the publication of peer-reviewed science." Licensed under Creative Commons Attribution (CC BY 4.0) URL: https://creativecommons.org/licenses/by/4.0/ via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/