(C) PLOS One This story was originally published by PLOS One and is unaltered.

Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges [1]

Authors: Rebecca K. Nash, MRC Centre for Global Infectious Disease Analysis, Jameel Institute, School of Public Health, Imperial College London; Pierre Nouvellet, School of Life Sciences, University of Sussex; Anne Cori

Date: 2022-08

Abstract
The time-varying reproduction number (R_t) is an important measure of transmissibility during outbreaks. Estimating whether and how rapidly an outbreak is growing (R_t > 1) or declining (R_t < 1) can inform the design, monitoring and adjustment of control measures in real-time. We use a popular R package for R_t estimation, EpiEstim, as a case study to evaluate the contexts in which R_t estimation methods have been used and identify unmet needs which would enable broader applicability of these methods in real-time. A scoping review, complemented by a small EpiEstim user survey, highlights issues with the current approaches, including the quality of input incidence data, the inability to account for geographical factors, and other methodological issues. We summarise the methods and software developed to tackle the problems identified, but conclude that significant gaps remain which should be addressed to enable easier, more robust and applicable estimation of R_t during epidemics.

Author summary
Many software tools have been developed to support real-time outbreak response, allowing epidemiologists and other health professionals to characterise new and persistent pathogens and their threat to populations. Some of these tools allow the user to estimate how rapidly a pathogen is spreading through a population in real-time. This knowledge is crucial for a timely and effective response, helping to inform the optimal choice of control strategies.
However, these tools are not perfect, and may need to be adapted to make them applicable to more pathogens and a variety of contexts. Our study summarises the different tools available in the literature and identifies the remaining issues that are yet to be addressed. We combine these findings with feedback from users of a popular tool, EpiEstim, to better understand the key priorities for development going forward.

Citation: Nash RK, Nouvellet P, Cori A (2022) Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. PLOS Digit Health 1(6): e0000052. https://doi.org/10.1371/journal.pdig.0000052

Editor: Michele Tizzoni, ISI Foundation: Fondazione ISI - Istituto per l’Interscambio Scientifico, ITALY

Received: December 16, 2021; Accepted: April 27, 2022; Published: June 27, 2022

Copyright: © 2022 Nash et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The authors confirm that the data supporting the findings of this study are available within the article and its supplementary materials.

Funding: RKN acknowledges funding from the Medical Research Council (MRC) Doctoral Training Partnership (grant reference MR/N014103/1).
AC acknowledges funding by the National Institute for Health Research (NIHR) Health Protection Research Unit in Modelling and Health Economics, a partnership between Public Health England, Imperial College London and LSHTM (grant code NIHR200908); and acknowledges funding from the MRC Centre for Global Infectious Disease Analysis (reference MR/R015600/1), jointly funded by the UK Medical Research Council (MRC) and the UK Foreign, Commonwealth & Development Office (FCDO), under the MRC/FCDO Concordat agreement and is also part of the EDCTP2 programme supported by the European Union; and acknowledges funding by Community Jameel. The views expressed are those of the author(s) and not necessarily those of the NIHR, Public Health England or the Department of Health and Social Care. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: AC has received payment from Pfizer for lecturing on a course for mathematical modelling of infectious disease transmission and vaccination.

Introduction
Transmissibility quantifies how easily a pathogen can spread through a population and can depend on numerous factors, such as the pathogenicity of the infectious agent, the current level of immunity, demographics, and connectivity in the population. Transmissibility is typically measured by the time-varying (or effective) reproduction number (R_t), which represents the average number of secondary infections generated by a case at time t of an outbreak.[1–4] R_t is an important element of outbreak analysis as it indicates whether case numbers are rising (R_t > 1) or falling (R_t < 1) and by how much. This can guide intervention planning and help to determine whether current control measures are effective, and if not, to what extent they need to be intensified.
R_t estimates can also feed into real-time incidence forecasts, which can assist logistics and resource planning, for instance, by identifying whether hospital bed capacity is likely to be exceeded or if resources need to be allocated to specific areas.[5–8]

Many methods have been developed to estimate R_t in real-time; one of the most commonly used is through the renewal equation (Eq 1), which relies on a branching process model. The model assumes that the incidence of new cases on day t (I_t) can be represented by a Poisson process:

$$ I_t \sim \mathrm{Poisson}\left( R_t \sum_{s=1}^{t} I_{t-s}\, \omega_s \right) \qquad (1) $$

where R_t is the time-varying reproduction number (i.e. the average number of cases caused by a primary case infected at time t, assuming that conditions remain the same after time t), and the past incidence (I_{t-s}) is weighted by ω_s, the probability mass function of the generation time (the time between infection in a case and their infector).[3, 9] In practice, as infection itself is difficult to observe, the incidence of symptomatic cases can be used instead and ω_s can be approximated by the serial interval (SI, the time between symptom onset in a case and their infector; see Britton and Scalia Tomba for the implications of this approximation).[10]

Cori et al. developed a method to estimate R_t from the renewal equation that is suitable for real-time application.[11] The method is implemented in the R package EpiEstim, where estimation is performed over user-defined time windows within which R_t is assumed constant. Longer windows typically lead to smoother estimates, but may artificially hide some of the temporal variability.[12] R_t estimates can then be used to project forward in time and forecast future epidemic trajectories (using Eq 1, see Fig 1), as implemented in the R package projections.[13]

Fig 1. Schematic of the forecasting process.
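The windowed estimation and forward projection described in the Introduction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EpiEstim or projections implementation: it assumes a Gamma prior on R_t (which gives a closed-form Gamma posterior under the Poisson renewal model), uses illustrative prior values and a made-up serial interval, and projects only the expected incidence rather than sampling Poisson trajectories. All function names are hypothetical.

```python
def total_infectiousness(incidence, si_pmf, t):
    """Weighted past incidence at day t: sum over s >= 1 of I_{t-s} * w_s.

    si_pmf[s] is the serial interval probability for a lag of s days,
    with si_pmf[0] == 0 (no same-day transmission, an assumption here).
    """
    max_lag = min(t, len(si_pmf) - 1)
    return sum(incidence[t - s] * si_pmf[s] for s in range(1, max_lag + 1))


def estimate_rt(incidence, si_pmf, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R_t for each window ending on day t_end.

    R_t is assumed constant within the window. With a Gamma(shape, scale)
    prior, the posterior is Gamma with
      shape = prior_shape + (cases in window)
      rate  = 1 / prior_scale + (total infectiousness in window).
    The prior values are illustrative assumptions.
    """
    estimates = {}
    for t_end in range(window, len(incidence)):
        days = range(t_end - window + 1, t_end + 1)
        cases = sum(incidence[t] for t in days)
        infectiousness = sum(total_infectiousness(incidence, si_pmf, t) for t in days)
        shape = prior_shape + cases
        rate = 1.0 / prior_scale + infectiousness
        estimates[t_end] = shape / rate  # posterior mean
    return estimates


def project_expected(incidence, si_pmf, rt, horizon):
    """Expected future incidence under the renewal equation with R_t held fixed."""
    series = list(incidence)
    for _ in range(horizon):
        t = len(series)
        series.append(rt * total_infectiousness(series, si_pmf, t))
    return series[len(incidence):]


# Illustrative inputs (not data from the article).
si_pmf = [0.0, 0.2, 0.5, 0.3]
simulated = [10.0] + project_expected([10.0], si_pmf, rt=1.5, horizon=20)
rt_by_day = estimate_rt(simulated, si_pmf, window=7)
```

Because the simulated series is generated with a fixed reproduction number of 1.5, the later windows in `rt_by_day` should sit close to that value, with the small remaining gap coming from the prior. A longer `window` pools more days and smooths the estimates, at the cost of hiding short-term changes, as noted above.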
In the renewal equation, the incidence at time t (I_t) is expressed as a function of the serial interval distribution (ω_s), the time-varying reproduction number (R_t) and the past incidence (I_{t-s}). https://doi.org/10.1371/journal.pdig.0000052.g001

The COVID-19 pandemic has generated a surge in methodological and tool developments to estimate R_t, as well as numerous applications of these methods. We aimed to describe the landscape of methods and real-world applications of R_t estimation, and to identify potential gaps that could be addressed to make those methods more easily applicable and more useful in practice in the future. EpiEstim has been widely used for R_t estimation during the COVID-19 pandemic and has been shown to perform better than many other methods in terms of estimation accuracy.[12] Therefore, we use EpiEstim as a starting point to identify ways in which renewal equation methods have been used and modified.[12] We reviewed research articles that either used EpiEstim (the method or the software) or described a modified approach to address unmet needs. Alongside this scoping literature review, a questionnaire was distributed to known users of EpiEstim and advertised on social media to gather information on issues that were unlikely to be described in publications, e.g. computational speed or usability. By collating the findings, we aim to reveal some of the key challenges when estimating R_t in real-time and those that are yet to be addressed.

Methods

Scoping review: Search strategy and selection criteria
Google Scholar was used to identify all articles or reports up to 10th December 2020 that cited one of the two papers describing EpiEstim (Cori et al. 2013 or Thompson et al.
2019).[3, 14] After full text screening, two databases were compiled using the inclusion and exclusion criteria outlined in Table 1, collating papers or reports that used: i) an unmodified version of the EpiEstim method or software and ii) a modified version of the EpiEstim method or software.

Table 1. Inclusion and exclusion criteria for the scoping review. https://doi.org/10.1371/journal.pdig.0000052.t001

Evaluation of R package or tool usability
Each R package or tool identified in the scoping review was appraised in terms of its ease of installation, available documentation (e.g. vignette or tutorials) and speed of estimation of the reproduction number (see Table B and section 3 in S1 Text for more detail).

Questionnaire
We designed an online questionnaire including 16 questions aiming to reveal the key challenges users encountered when using EpiEstim to estimate R_t. It was shared via Twitter and distributed to a list of 41 known EpiEstim users.[15] The questionnaire was live between 11th and 31st January 2021.

Ethics statement
Ethical approval was not required as the questionnaire was evaluative and solely to understand the current use and limitations of the EpiEstim R package. Respondents were informed that their survey responses were being collected for academic purposes.

Discussion
To improve methods of R_t estimation, it is crucial to first gain an understanding of the current key challenges preventing, or restricting, practical applications of these methods to quantify pathogen transmissibility in real-time. To get an overview of such issues, we chose EpiEstim, one of the most popular and best-performing methods for estimating R_t in real-time, as a case study.[12] We analysed the literature citing EpiEstim as well as direct feedback from EpiEstim users collected through a questionnaire. We have also summarised the R packages identified that use alternative methodologies.
The most common challenge identified was to deal with factors influencing the quality of incidence data. Addressing frequently encountered issues, such as delays in reporting, underreporting, and administrative noise, appeared to be a priority in the field. Delays in reporting of cases or time-varying reporting rates have been a particular issue throughout the COVID-19 pandemic, linked to a high proportion of asymptomatic cases, fluctuations in testing, and health systems being under strain.[79–81] To counter some of these issues, R packages such as bayEStim and EpiNow2 explicitly account for delay distributions enabling “nowcasting”, which aims to eliminate any time-lag when estimating the impact of control measures or factors such as changes in behaviour.[68, 69]

When incidence by date of infection or symptom onset was unavailable or unreliable, some approaches used incidence by date of death. For instance, COVID-19 case data was initially very sporadic, whilst death data was more reliably reported. The R packages Epidemia and EpiNow2 allow the use of death data to maximise the use of available information in real-time.[69, 70] In many epidemics, including COVID-19, the reliability of case data improves over time and can provide more up-to-date insights into changing transmission dynamics. The ability to use either dataset means they can complement each other during different phases of the outbreak. However, the suitability of the type of data used will depend on the pathogen under investigation and the local context.

An issue that, to our knowledge, has yet to be addressed by any R package is the ability to estimate R_t using weekly-aggregated incidence data, which was highlighted in questionnaire responses and is a frequent query in correspondence with users of EpiEstim.
This is a key ongoing issue, particularly for diseases such as influenza and Zika, where incidence is typically reported on a weekly basis,[45, 82] but also for COVID-19, with several US states having recently moved from reporting daily to weekly cases only.[83] While it is possible to supply EpiEstim and EpiNow2 with weekly-aggregated data if the SI is provided on the same time scale, aggregating a short SI to a weekly distribution may affect the quality of the R_t estimates and is not possible when the SI is shorter than a week.[45]

The scoping review identified additional methodological or data-related issues. These led to the development of methods that are more appropriate for use during periods of low incidence,[21, 24, 43, 49, 50, 63] or offer alternatives to allow less subjective and more flexible inputs into EpiEstim, e.g. the prior for R_t, the SI distribution, or an alternative way of temporally smoothing R_t estimates.[18, 19, 21, 24–28, 30, 31, 33, 37, 43, 45–47, 49–52, 54, 60, 61, 63, 64, 67]

Modifications concerning geographical factors were also common in the literature, although only mentioned by one questionnaire respondent. Beyond accounting for imported cases [14, 63] and estimating R_t across different regions at the same time,[18, 25, 27, 31, 37, 46, 47, 52, 54] we found no R packages that explicitly addressed spatial interactions. This means that, aside from bespoke studies,[36, 40, 53, 56, 58, 59, 64, 66] important aspects of the spatial structure in transmission may have been overlooked. For instance, to our knowledge, there is no ready-to-use tool to analyse how easing restrictions in an area may influence transmission in neighbouring locations.

Another theme highlighted in both the scoping review and questionnaire responses was practical or logistical issues, such as evaluating or accounting for the impact of interventions, assessing when elimination has been reached, or extending the framework for logistical planning in hospitals.
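The weekly-aggregation problem discussed earlier in this section can be made concrete with a small sketch. It assumes a naive binning of a daily SI distribution into 7-day bins; this is a hypothetical illustration of why information is lost, not a description of how EpiEstim or EpiNow2 handle weekly data, and the function names and SI values are made up.

```python
import math


def aggregate_weekly(daily_counts):
    """Sum daily counts into consecutive 7-day totals (a trailing partial week is dropped)."""
    n_weeks = len(daily_counts) // 7
    return [sum(daily_counts[7 * w: 7 * (w + 1)]) for w in range(n_weeks)]


def si_to_weekly(si_pmf_daily):
    """Naively bin a daily SI pmf into weekly lags.

    si_pmf_daily[d] is the probability of a lag of d days (index 0 unused).
    Days 1-7 map to a lag of 1 week, days 8-14 to 2 weeks, and so on.
    """
    n_weeks = math.ceil((len(si_pmf_daily) - 1) / 7)
    weekly = [0.0] * (n_weeks + 1)  # index 0 kept at zero, mirroring the daily convention
    for day, prob in enumerate(si_pmf_daily):
        if day == 0:
            continue
        weekly[(day + 6) // 7] += prob
    return weekly


# A short SI (all mass within 3 days) and a longer one (uniform over 10 days).
short_si = [0.0, 0.2, 0.5, 0.3]
long_si = [0.0] + [0.1] * 10

weekly_short = si_to_weekly(short_si)
weekly_long = si_to_weekly(long_si)
weekly_cases = aggregate_weekly(list(range(1, 15)))
```

With the short SI, every lag falls in the first weekly bin, so the weekly distribution no longer carries any information about the timing of transmission within the week. The longer SI retains some structure across two weekly bins, but its within-week shape is still lost.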
Transmissibility estimates ultimately inform intervention design, e.g. to select interventions that appear most effective and identify locations which should be targeted for implementation. The ability to estimate the impact of interventions on R_t across different countries is a useful feature of Epidemia, which uses a logistic regression framework to infer the impact of non-pharmaceutical interventions (e.g. social distancing) and reductions in mobility on transmission.[70] Additionally, an important practical consideration is when one can confidently estimate that elimination has been reached, which has been addressed by EpiFilter.[24, 63, 72] Developing more R packages that directly address a variety of recurring logistical issues is a promising avenue for future research in the field.

Despite the small sample size (n = 17) and the imperfect nature of the questionnaire (see section 4 of S1 Text), it reinforced many of the findings from the scoping review and revealed additional challenges with computational speed, usability, and compatibility. We evaluated each of the identified R packages and tools based on these criteria and found that, of the packages that estimate the reproduction number, only earlyR scored full marks in the usability category (Table 2, Table B in S1 Text). These themes emphasise that package developers (of both new and existing methods) should ensure that increased functionality does not come at the expense of speed, and that additional resources are available to aid understanding of the package and how it can be used in conjunction with others in an outbreak analysis workflow.
In response, we created a new EpiEstim vignette, which is intended to provide a greater range of examples of using EpiEstim in practice.[84] A suggestion to improve usability by providing a library of pre-built configurations for different pathogens has already begun to be addressed to some extent by the R package EpiNow2.[69]

This is a fast-moving field and novel extensions or applications of EpiEstim-like methods are continuously emerging. It is therefore likely that we have missed valuable modifications not covered by articles available at the time of our literature search. This includes an extension to the method by Johnson et al. to account for superspreading, and work by the authors of this study, for example, the development of multi-variant EpiEstim (MV-EpiEstim) to estimate the transmission advantage of new pathogen variants or strains in real-time.[85, 86] It is also possible that through using EpiEstim as a starting point for our literature search, we may have missed relevant papers that did not cite the method. However, given that EpiEstim has been recognised as the most reliable currently existing approach for real-time R_t estimation,[12] we believe it is unlikely that we would have missed important contributions that would alter our conclusions.

To our knowledge, there are still numerous issues regarding real-time estimation of the reproduction number that remain unaddressed by R packages or open-source software. For instance, we found no readily available tools that consider non-spatial population heterogeneities or the ability to include time-varying generation times.[87, 88]

The intention of this review was to provide a broad overview of the current landscape of renewal-equation based R_t estimation methods and tools. Therefore, beyond providing an appraisal of the usability of each identified R package/tool in terms of ease of installation, available documentation, and speed, we have not tested or compared their performance.
Future work should focus on a more systematic and critical evaluation of the estimation accuracy of R packages intending to tackle the same issues. Moreover, it would be interesting to characterise the added value of additional features in relation to their cost, such as longer computational time.

Conclusion
The quality of incidence data continues to pose significant challenges to real-time estimation of R_t. Numerous methods and R packages have been developed to address some of these issues, but significant gaps remain, such as the inability to directly use temporally aggregated incidence data. Despite the importance of spatial factors in transmission, we identified no R package that accounts for movement or interactions between locations when estimating R_t. It is also clear that extensions to these methods could allow for rapid translation of R_t estimates into logistically relevant outputs, such as predicting how the demand for hospital resources may change in real-time. Addressing these recurring issues and extending the methodology to directly answer important practical and logistical questions are key priorities for widening the applicability of R_t estimation methods during epidemics. However, package developers should keep in mind that speed, ease of use, and access to sufficient resources will be key to the uptake of these new or improved tools.

Supporting information
S1 Text.
Figure A. The disease or pathogen under investigation in A) the papers that used an unmodified version of the EpiEstim package or the method and B) the papers that used a modified version of the approach or package. The category “multiple” refers to papers where more than one disease or pathogen were investigated. Note the diseases are different in both panels.
Table A. Summary table of the papers identified that used an unmodified version of EpiEstim (n = 242).
Table B. Usability of each R package or tool* identified in the scoping review.
This table shows a full breakdown of how the classifications (very good = ✓✓, good = ✓, poor = ✗) were determined for the “additional exploration” section of Table 2 within the main text. For the ‘ease of installation’ and ‘documentation and tutorials’ sections, each criterion was allocated a score, shown in square brackets, and the overall classification was determined by the sum of the scores. For the ‘speed’ section, each author used the system.time() function in R to determine the run time of the main function of the package available in the provided examples. ** The classification (<10s = ✓✓, >10s – 5min = ✓, >5min = ✗) was decided based on the time category agreed on by at least 2 out of the 3 computers.
Figure B. Map showing the country in which each questionnaire respondent is based. The majority of responses were from the USA (n = 4), followed by Canada (n = 2), France (n = 2) and Indonesia (n = 2). There was one response from each of Austria, Bermuda (circled), Germany, India, Peru, Uruguay, and the UK.
Figure C. A) The profession of each questionnaire respondent and B) the purpose of their analysis. Respondents could select more than one answer for both questions.
Figure D. A) Disease(s) investigated by each respondent. B) Categories of input data. Respondents could select more than one answer for both questions.
Figure E. Broad reason for the use of EpiEstim. Respondents could select more than one answer for this question.
Figure F. Questionnaire responses to A) how well the package met the needs of each respondent on a scale from 1 to 5 (1: “badly”, 5: “very well”), and B) “Which features do you think could be improved?”.
Table C. Summary of the issues and suggestions reported in questionnaire feedback categorised by broad theme. n is the number of respondents and % is the percentage of the 17 respondents who reported or made a suggestion regarding the issue.
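The speed classification described for Table B can be sketched as follows. The authors timed each package with system.time() in R; the Python version below is only an illustrative analogue using time.perf_counter(). The handling of a run time of exactly 10 seconds, and of the case where all three computers disagree, is not specified in the text and is an assumption here, as are the function names.

```python
import time
from collections import Counter


def speed_category(seconds):
    """Map a run time to the three speed classes used for Table B.

    Assumption: a run time of exactly 10 s is treated as "good".
    """
    if seconds < 10:
        return "very good"  # shown as a double tick
    if seconds <= 300:
        return "good"       # shown as a single tick
    return "poor"           # shown as a cross


def agreed_category(run_times):
    """Return the class agreed on by at least 2 of the measured run times.

    Assumption: returns None when no class reaches 2 votes.
    """
    counts = Counter(speed_category(s) for s in run_times)
    label, votes = counts.most_common(1)[0]
    return label if votes >= 2 else None


def time_call(func, *args, **kwargs):
    """Wall-clock run time of a single call, in seconds."""
    start = time.perf_counter()
    func(*args, **kwargs)
    return time.perf_counter() - start


# Hypothetical run times (seconds) from three computers for one package.
example_times = [3.2, 12.0, 8.9]
example_label = agreed_category(example_times)
```

Taking the class agreed on by at least two machines, rather than averaging the raw times, keeps one unusually slow or fast computer from changing the result.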
https://doi.org/10.1371/journal.pdig.0000052.s001 (PDF)

Acknowledgments
We thank the 17 respondents to the questionnaire for their valuable insight.

[1] Url: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000052
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.