(C) PLOS One This story was originally published by PLOS One and is unaltered. . . . . . . . . . . An investigation of spatial-temporal patterns and predictions of the coronavirus 2019 pandemic in Colombia, 2020–2021 [1] ['Amna Tariq', 'Department Of Population Health Sciences', 'School Of Public Health', 'Georgia State University', 'Atlanta', 'Georgia', 'United States Of America', 'Tsira Chakhaia', 'Sushma Dahal', 'Alexander Ewing'] Date: 2022-08 Abstract Colombia announced the first case of severe acute respiratory syndrome coronavirus 2 on March 6, 2020. Since then, the country has reported a total of 5,002,387 cases and 127,258 deaths as of October 31, 2021. The aggressive transmission dynamics of SARS-CoV-2 motivate an investigation of COVID-19 at the national and regional levels in Colombia. We utilize the case incidence and mortality data to estimate the transmission potential and generate short-term forecasts of the COVID-19 pandemic to inform the public health policies using previously validated mathematical models. The analysis is augmented by the examination of geographic heterogeneity of COVID-19 at the departmental level along with the investigation of mobility and social media trends. Overall, the national and regional reproduction numbers show sustained disease transmission during the early phase of the pandemic, exhibiting sub-exponential growth dynamics. Whereas the most recent estimates of reproduction number indicate disease containment, with R t <1.0 as of October 31, 2021. On the forecasting front, the sub-epidemic model performs best at capturing the 30-day ahead COVID-19 trajectory compared to the Richards and generalized logistic growth model. Nevertheless, the spatial variability in the incidence rate patterns across different departments can be grouped into four distinct clusters. As the case incidence surged in July 2020, an increase in mobility patterns was also observed. On the contrary, a spike in the number of tweets indicating the stay-at-home orders was observed in November 2020 when the case incidence had already plateaued, indicating the pandemic fatigue in the country. Author summary As the COVID-19 pandemic continues to spread across Colombia, studies highlighting the intensity of the pandemic have become imperative for appropriate resource allocation and informing public health policies. In this study we utilize mathematical models to infer the transmission dynamics of SARS-CoV-2 at the regional and national level as well as short-term forecasting of the COVID-19 epidemic trajectory. Moreover, we examine the geographic heterogeneity of the COVID-19 case incidence in Colombia along with the analysis of mobility and social media trends in relation to the observed COVID-19 case incidence in the country. The estimates of reproduction numbers at the national and regional level show a decline in disease transmission as of October 31, 2021. Moreover, the 30-day ahead short-term forecasts for the most recent time-period (October 2-October 31, 2021) generated using the mathematical models needs to be interpreted with caution as the single peak models point towards a sustained decline in case incidence contrary to the sub-epidemic wave model. Nevertheless, the spatial analysis in Colombia shows distinct variations in incidence rate patterns across different departments which can be grouped into four distinct clusters. Moreover, the social media and mobility trends also explain the occurrence of case resurgences over time. Citation: Tariq A, Chakhaia T, Dahal S, Ewing A, Hua X, Ofori SK, et al. (2022) An investigation of spatial-temporal patterns and predictions of the coronavirus 2019 pandemic in Colombia, 2020–2021. PLoS Negl Trop Dis 16(3): e0010228. https://doi.org/10.1371/journal.pntd.0010228 Editor: Joseph T. Wu, University of Hong Kong, HONG KONG Received: July 20, 2021; Accepted: February 1, 2022; Published: March 4, 2022 Copyright: © 2022 Tariq et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: Case incidence data by the dates of symptom onset and mortality data are available from the Colombian Ministry of Health. url: https://coronaviruscolombia.gov.co/Covid19/index.html. Genomic data is available from the "global initiative on sharing avian influenza data” (GISAID) repository. url: https://www.gisaid.org/ Apple mobility data is available from the Apple's mobility trends reports. url: https://covid19.apple.com/mobility. Google mobility data is available from the Google mobility page (COVID-19 Community Mobility Reports Google 2020): https://www.google.com/covid19/mobility/ Twitter data is available from the Twitter data set of the COVID-19 chatter: https://github.com/thepanacealab/covid19_twitter. Funding: G.C. is partially supported from NSF grants 1610429 and 1633381, R01 GM 130900 and project ANID/MEC/80170119. A.T. is supported by a 2CI fellowship from Georgia State University. P.S. is supported by the National Institutes of Health grant 1R01EB025022 and by the National Science Foundation grant 2047828. R.B. is supported by ANID (Chile) through Fondecyt project 1210610; Centro de Modelamiento Matemático (CMM) projects ACE210010 and FB210005 of BASAL funds for Centers of Excellence; and CRHIAM, project ANID/FONDAP/15130015. L.Y.L-D is supported by ANID scholarship ANID-PCHA/Doctorado Nacional/2019-21190640 and Anillo project ANID/PIA/ACT210030. E.S. is supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT, No. 2018R1C1B6001723) and by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1A6A1A10044154). S.K.O. is supported by Ionis Pharmaceuticals. J.M.B was supported by a grant by the National Institute of Aging (3P30AG059307-02S1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. Introduction Ever since the detection of the first cluster of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, the coronavirus disease 2019 (COVID-19) pandemic continues to threaten the world [1]. As this pandemic gears to continue in the third year, it has already resulted in devastating global socioeconomic and political consequences [2]. The virus’s novelty and complex transmission dynamics have resulted in non-pharmaceutical public health measures including social distancing mandates and intermittent lockdowns as the primary weapons to fight COVID-19 [3]. Moreover, as new variants of SARS-CoV-2 continue to emerge amidst vaccination campaigns globally, it remains unclear how the COVID-19 pandemic will unfold [4,5]. Despite the uncertainty in the development of the pandemic, SARS-CoV-2 continues to exert substantial morbidity and mortality burden surpassing 246 million confirmed cases and nearly 5 million associated deaths worldwide as of October 31, 2021 [6]. The mortality and morbidity of the COVID-19 pandemic continues to overwhelm the health care systems of many nations including the United States, Colombia, Mexico, Brazil and Argentina in the Americas [7]. As the third COVID-19 wave strikes globally, it remains imperative to understand the dynamics of disease transmission and the potential role of mitigation strategies at different spatial scales including the municipality, regional or departmental level in order to contain the virus [8]. Assessing the geographical patterns of the cases and identifying the COVID-19 impact at the regional level can provide valuable insights regarding the exposure to the virus and the distribution of public health resources required to mitigate its spread [9,10]. Pandemic emergency response which includes the implementation and lifting of dynamic lockdowns [11], social distancing, mask mandates, population testing, and provision of vaccinations has varied across countries in the same region and in different cities within the same country [12]. As with Latin America, one of the epicenters of the pandemic [13], the evolution of the pandemic and the quality of public health responses have varied across nations. While Chile aimed to mitigate the virus to slow down the disease and reduce peak in health care demand, Brazil, Ecuador, and Colombia’s objective was to contain the spread of the virus by minimizing the risk of transmission from infected to non-infected individuals to stop the outbreak [14,15]. Despite aiming to contain virus transmission in the country, Colombia became the second country in Latin America and eighth globally to reach one million cases by the end of October 2020 [16]. Colombia has also seen the fastest increase in total COVID-19 associated deaths compared to other Latin American countries as reported by the end of the year 2020 [14]. In Latin America, where the first confirmed COVID-19 case was reported in Brazil on February 26, 2020 [17], the intensity and diversity of social distancing efforts have varied widely. Since the detection of the first imported confirmed case on March 6, 2020, Colombia has observed three epidemic peaks [13,18]. As the virus disseminated throughout the country, initial prevention recommendations provided between March 9–11, 2020 from the local government authorities conflicted with those at the national level [19], creating confusion amongst the public about the preventive measures required to protect themselves [19]. Therefore, the case incidence continued to rise leading to the declaration of the national emergency on March 12, 2020 [19]. As a part of the prompt escalation of measures to prevent a full-blown outbreak most educational institutions were closed by March 16, 2020, followed by the closure of the country borders the next day [19]. As cases continued to increase rapidly, domestic and international flights were suspended on March 23, 2020 [19]. Subsequently, to limit virus transmission within the country, a nationwide mandatory quarantine was implemented on March 24, 2020. Meanwhile a phased reopening of the economy was initiated as early as April 27, 2020, under strict protocols, to support the country’s dwindling economy [19]. The evolution of the pandemic in Colombia has justified the five extensions of the mandatory quarantine that lasted until August 31, 2020 [20]. At this point, the country transitioned into a period of “selective isolation with responsible individual distancing” as the daily incidence in the country’s main cities including Bogotá, Medellín, Cali, Bucaramanga and Pasto levelled off and eventually leaned towards a downward trend [21,22]. Moreover, Barranquilla, Cartagena, Leticia and Quibdó had overcome the worst part of the first wave by August 25, 2020 [21]. The chosen selective isolation strategy prioritized tracing of suspected cases, those with infection, and their contacts, while reactivating the social and economic life of the country. As the cases continued to increase in the subsequent months, the government imposed and lifted dynamic lockdowns in multiple cities across the country. By the end of year 2020, COVID-19 cases were mainly concentrated in Bogotá (314,745), the capital city of Colombia followed by Antioquia (163,952), and Valle del Cauca (81,101) [23]. As the pandemic entered its second year, a second wave of COVID-19 was detected, primarily fueled by the P.1 (Gamma) variant of SARS-CoV-2 initially circulating in Brazil [24]. Soon after, the government announced a mass vaccination strategy that began on February 20, 2021 [25]. Around the same time, on February 25, 2021, the president of Colombia extended the duration of the national health emergency to last until May 31, 2021 [26]. Amidst delayed vaccine acquisition and slow vaccine distribution in the country, the cases continued to rise in some cities like Bogota and Barranquilla [27,28]. This resulted in the implementation of curfews in the cities and municipalities followed by the opening of businesses based on the occupancy rates of the intensive care units (ICU) in the hospitals beginning March 25, 2021 [27]. From March to June 2021, Colombia experienced a massive third wave of the COVID-19 pandemic, which evolved into a two stage peak-within-a-peak surge [28]. The month of May 2021 was reported to be the deadliest month resulting in an average of ~20,000 cases and 500 deaths per day [29]. While the impact of COVID-19 pandemic was not uniform across the entire country, Bogotá, Antioquia and Valle del Cauca have been the hardest hit areas in Colombia [30]. The Colombian government’s response to contain the COVID-19 pandemic has been declared the third worst out of 116 countries evaluated by the Lowy Institute in Australia, only ahead of Mexico and Peru [31]. Despite being one of the first countries in Latin America to offer diagnostic tests for COVID-19 [32], the testing and vaccination rate for Colombians remains low [33], with 0.79 tests per 1000 people per day and 0.39 vaccine doses administered per 100 people as of October 31, 2021. [34]. Hence, the country stands with a total of 5,002,387 cases and 127,258 deaths as of October 31, 2021 [30]. The factors contributing to the current COVID-19 outbreak are myriad. However, three distinguishable events have particularly interacted synergistically to add to the complexity of the pandemic in Colombia. These include the COVID-19 outbreaks in prisons and nursing homes that affected the vulnerable communities of the society [35,36] and the April 2021 Colombian protests provoked by the government policy proposals [28]. The Colombian prisons in Cali, Villavicencio, and Bogotá were severely impacted by the COVID-19 pandemic due to over-crowding, inadequate medical supplies and unhygienic conditions of facilities, which led to many infected inmates [36]. The COVID-19 pandemic in Colombia presents complex risk dynamics of SARS-CoV-2 transmission with a simultaneous interplay of epidemiological, behavioral, and political factors. Hence assessing the temporal variation of transmission in near real time becomes vital to evaluate the effectiveness of intervention strategies and possible reopening of the economy. Moreover, forecasting the COVID-19 trajectory can help understand the trends of the disease and estimate its potential burden. As the epidemic trajectory of the COVID-19 pandemic continues to unfold we forecast the COVID-19 trajectory in near-real time utilizing the mathematical models that have been validated for previous infectious disease outbreaks such as Ebola, Zika and the COVID-19 pandemic [37–40]. Moreover, we specifically investigate the transmission dynamics of SARS-CoV-2 at the national and regional levels. We augment the analysis by investigating the spatial epidemic curves that allow us to analyze geographic heterogeneity that is lost while evaluating the aggregated epidemic curves and examine the social media and mobility trends concerning the implementation of lockdowns. Methods Data Six sources of data are analyzed in this study. A brief description of the data sets is provided as follows: (i) Case incidence data Case incidence data based on the dates of symptom onset is used to generate the epidemic curve, scaled incidence curve (see S1 Text), short-term forecasts of the COVID-19 pandemic, assess the spatial heterogeneity in Colombia and estimate the national and regional reproduction numbers. Specifically, we utilize publicly available time series of laboratory-confirmed cases based on the dates of symptom onset for the national and regional COVID-19 case incidence. These data are retrieved from the Colombian Ministry of Health as of October 31, 2021 [23]. Colombian departments and their four main districts; Barranquilla, Bogotá, Cartagena, and Santa Marta can be grouped into five regions as follows: the Amazon, the Andean, the Orinoquía, the Caribbean and the Pacific [41]. To evaluate the spatial heterogeneity and estimate the regional reproduction numbers, for simplicity we assume a regional level distribution as follows (S1 Fig): Amazon region : Amazonas, Caquetá, Guainía, Guaviare, Putumayo and Vaupés. Andean region : Antioquia, Boyacá, Bogotá, Caldas, Cauca, Cundinamarca, Huila, Norte De Santander, Quindío, Risaralda, Santander, and Tolima. Orinoqu í a region : Arauca, Casanare, Meta, and Vichada Caribbean region : Atlántico, Bolívar, Cesar, Córdoba, La Guajira, Magdalena, San Andrés, Sucre, Barranquilla, Cartagena, and Santa Marta Pacific region : Chocó, Nariño, and Valle Del Cauca Based on the geographical distribution of the departments, Nariño, Valle Del Cauca, Guainía, and Cauca appear in two regions each. However, we include them in one region as stated above, to avoid the duplication of data for our analysis. (ii) Mortality data COVID-19 mortality data based on the date of death is retrieved from the Colombian Ministry of Health as of October 31, 2021 [23]. Mortality data is used to generate the national short-term forecast of the COVID-19 pandemic and estimate the national reproduction number. (iii) Genomic data To estimate the reproduction number from the genomic data, from the 1855 samples available as of June 30, 2021, 136 SARS-CoV-2 genome samples for Colombia are obtained from the “Global Initiative on Sharing Avian Influenza Data” (GISAID) repository [42] between February 27 and April 5, 2020 to estimate the reproduction number. (iv) Mobility trends data Two sources of mobility data are analyzed at the national level from March 6, 2020, to October 31, 2021. These data were retrieved as of October 31, 2021. Apple’s mobility data is a set of data that summarizes the number of requests made to Apple Maps for directions. The data are categorized by countries/regions, sub-regions, and cities. Apple has released the data for the three modes of human mobility: driving, walking and public transit. These data are published publicly by Apple’s mobility trends reports [43]. These aggregated and anonymized data are updated daily and includes the relative volume of directions requests per country compared to the baseline volume on January 13, 2020. To emphasize on the change rather than the gross number of requests, Apple has normalized all the data. The number of direction requests for each country or city and the type of direction request (driving, walking and public transit) has been set to 100 for January 13, 2020. All subsequent data are relative to the starting point. Google’s mobility data [44] is retrieved from users who have turned on the location history in their Google account. Google’s mobility data shows the changes in mobility in each geographic and administrative region for visits to different places, such as grocery stores, parks, and recreation spots. The baseline day represents the normal value on the day of the week. The baseline day is the median value from the 5-week period from January 3, 2020, to February 6, 2020. (v) Twitter data We retrieved data from the publicly available Twitter data set of the COVID-19 chatter from March 12, 2020 to October 31, 2021 [45]. A detailed description of Twitter data acquisition and processing is provided in a prior study [45]. Modeling framework for forecast generation We utilize and compare three dynamic phenomenological growth models to generate short-term (i.e., 30-day ahead) forecasts for Colombia. These models have been applied to various infectious diseases including SARS, foot and mouth disease, Ebola [46–48] and the current COVID-19 outbreak [49–51]. The phenomenological growth models applied in this study include the two models defined by one scalar differential equation namely the generalized logistic growth model (GLM) [47] and the Richards growth model [52]. Additionally we apply the sub-epidemic wave model [46] which captures diverse epidemic trajectories such as the multimodal outbreaks. The forecasts obtained from these dynamic growth models can assess the potential scope of the pandemic in near real-time, provide insights on the contribution of disease transmission pathways, predict the impact of control interventions, and evaluate optimal resource allocation to inform public health policies. A detailed description of these models is provided in S2 Text. Model calibration and forecasting approach We utilize the national and regional level case incidence data retrieved from the Colombian Ministry of Health based on the dates of symptom onset as of October 31, 2021. We also utilize the national level mortality data based on the date of death as of October 31, 2021. Each forecast is fitted to the daily case counts based on the dates of symptom onset and daily death counts based on the date of death between July 4, 2021, and October 1, 2021 (90 days calibration period) to conduct a 30-day ahead short-term forecast for each model. The data from October 2, 2020, to October 31, 2021, is utilized to assess the performance of our 30-day ahead short-term forecasts. Details of the model calibration and forecasting approach are provided in S3 Text. Performance metrics The following five performance metrics are used to assess the quality of our model fit and the 30-day ahead short-term forecasts: the mean absolute error (MAE) [53]; the root mean squared error (RMSE) [54]; the coverage of the 95% prediction intervals (95% PI) [54]; the mean interval score (MIS) [54] and the weighted interval score (WIS) [55]; for each of the three models: GLM; Richards model; and the sub-epidemic wave model. A detailed description of the performance metrics is provided in S4 Text. Reproduction number In order to understand the transmission dynamics of COVID-19, we estimate the effective reproduction number, R t , for the early ascending phase from February 27, 2020, to March 27, 2020, and the effective reproduction number R t throughout the pandemic for the national and regional COVID-19 epidemic curves. The effective reproduction number, R t , is the key parameter that characterizes the average number of secondary cases generated by a primary case at calendar time (t) during the outbreak. This quantity is crucial for identifying the magnitude of public health interventions required to contain an epidemic [56–58]. The estimates of R t indicate whether widespread disease transmission continues (R t >1) or disease transmission declines (R t <1). Therefore, to contain an outbreak, it is vital to maintain R t <1. We utilize the case incidence, mortality, and the genomic data to estimate the reproduction numbers using three rigorous methods to compare the estimates of R t obtained from each method and data source. Effective reproduction number R t , using the generalized growth model (GGM) We estimate the national and regional reproduction numbers by calibrating GGM to the early growth phase of the pandemic [59], from February 27, 2020 to March 27, 2020. The description of the GGM is provided in S2 Text. The generation interval of SARS-CoV-2 is modeled with the assumption of a gamma distribution with a mean of 5.2 days and a standard deviation of 1.72 days [60]. We first characterize the daily incidence of local cases using the GGM. The progression of local incidence cases by dates of symptom onset, I i , is simulated using the calibrated GGM model and account for the daily series of imported cases by dates of symptom onset, J i , into the renewal equation to estimate the effective reproduction, as The factor J i represents the imported cases at time t i , I i denotes the local case incidence at calendar time t i and ρ j represents the discretized probability distribution of the generation interval. The factor 0≤α≤1 represents the relative contribution of imported cases to secondary disease transmission. We perform sensitivity analysis by setting α = 0.15, α = 1.00 and α = 0.00 to assess the relative contribution of the imported cases to the secondary disease transmission [40]. The numerator represents the total new cases I i , and the denominator represents the total number of primary cases that contributes to the generation the new cases I i (i.e., as secondary cases) at time t i . Hence, R t , represents the average number of secondary cases generated by a single case at calendar time t. The uncertainty bounds around the curve of are derived directly from the 300 bootstrap samples generated from the GGM model with estimated parameters ( ) where i = 1,2, …S. We assume a negative binomial error structure where the variance is assumed to be three times the mean based on the noise of the data [61]. This method is utilized to derive the early estimates of reproduction number and has been employed in several prior studies [50,61,62]. Since the national and regional epidemic curves have a distinct date of onset according to the seeding of the first local case, therefore, R t for the national and regional level is estimated based on the date of onset of the first local case. For the national and regional epidemic curves, we estimate R t for the first 30 days (Table 1). PPT PowerPoint slide PNG larger image TIFF original image Download: Table 1. Dates for R t estimation for the national and regional epidemic curves for the first 30 epidemic days. https://doi.org/10.1371/journal.pntd.0010228.t001 Effective reproduction number R t , using the Cori et al. method We estimate the regional and national effective reproduction numbers (also known as the time dependent instantaneous reproduction numbers) using the local and imported case incidence data based on the date of symptoms onset as of October 31, 2021. The total number of incident cases arising at time step t, I t , is the sum of the number of local incident cases ( ) and imported incident cases ( ). Hence, We assume that if the imported cases exist, they can be distinguished from the local cases through epidemiological investigations so that and can be observed at each time step [63,64]. The time dependent (instantaneous) R t is defined as the ratio of the number of new locally infected cases generated at calendar time t ( ), and the total infectiousness across all infected individuals at time t given by [65,66]. Hence R t can be written as In this equation, the factor w s represents the infectivity function, which is the infectivity profile of the infected individual. The infectivity function is dependent on the time since infection (s), but is independent of the calendar time (t) [67,68]. The term describes the sum of infection incidence up to time step t − 1, weighted by the values of infectivity function w s . The distribution of the generation time can be applied to approximate w s , however, since the time of infection is a rarely observed event, measuring the distribution of generation time becomes difficult [65]. Therefore, the time of symptom onset is usually used to estimate the distribution of the serial interval (SI), which is defined as the time interval between the dates of symptom onset among two successive cases in a disease transmission chain [69]. The infectiousness of a case is a function of the time since infection, which is proportional to w s if the timing of infection in the primary case is set as time zero of w s and we assume that the generation interval equals the serial interval (SI). The SI is assumed to follow a gamma distribution with a mean of 5.2 days and a standard deviation of 1.72 days [60]. We also estimate R t for the mortality curve by the date of death as of October 31, 2021. The analytical estimates of R t are obtained within a Bayesian framework using the Cori et al. method, employing the EpiEstim package in R version 4.0.3. [69]. The values for R t are estimated using 1-weekly sliding window. We report the median and 95% credible interval (CrI). Reproduction number, R, from the genomic analysis To estimate the reproduction number for SARS-CoV-2 between February 27, 2020 and April 5, 2020 from the genomic data, 136 SARS-CoV-2 genome samples from Colombia and their sampling times are obtained from the GISAID repository [42,70]. Short sequences and sequences with significant number of gaps and non-identified nucleotides are removed. For clustering, they are complemented by the sequences from other geographical regions across the globe, downsampled to n = 1470 representative sequences using a country specific sub-sampling algorithm of Nextstrain. Briefly, the algorithm first chooses the sequences from the focal country, and then complements them with the sequences obtained from other geographical regions, prioritizing sequences from the geographical neighbors and the sequences that are genetically close to the focal sequences. This decreases the size of the genomic data set to allow for feasible phylogenetic inference. We use the sequence subsample from Nextstrain (www.nextstrain.org) global analysis as of July 4, 2021. These sequences are aligned to the reference genome obtained from the literature [71] using Muscle [72] and trimmed using Molecular Evolutionary Genetics Analysis version: 7 (Mega 7) [73]. The largest Colombian cluster that possibly corresponds to within-country transmissions has been identified using hierarchical clustering of sequences. Phylodynamics analysis of that cluster is carried out using Bayesian Evolutionary Analysis by Sampling Trees (BEAST) [74] version 2. We use a strict molecular clock and a tree prior with exponential growth coalescent. Markov Chain Monte Carlo (MCMC) sampling is run for 10,000,000 iterations, and the parameters are sampled every 1000 iterations. The results are acceptable if the effective sample sizes are above 200 for all parameters. The exponential growth rate, f estimated by BEAST is used to calculate the basic reproduction number, R 0 . We utilize the standard assumption that SARS-CoV-2 generation intervals (times between infection and onward transmission) are gamma-distributed [75]. In this case, R 0 can be estimated as where μ and σ are the mean and standard deviation of the gamma distribution, respectively. The values of μ = 5.2 days and σ = 1.72 days are taken from a prior study [60]. The formula for R 0 is a strictly monotone transformation of , therefore it also transforms the 95% highest posterior density (HPD) intervals for f into those for R 0 . Spatial analysis We quantify and statistically analyze the shapes of the incidence rate curves. Shape analysis of the national and regional incidence rate curves is performed following the analytical methodology as described in previous literature [76] including steps to pre-process the daily cumulative COVID-19 case data based on the dates of symptom onset. Then, we analyze the shapes of these rate curves to compare, cluster and summarize the incidence rates. The pre-processing steps applied to the COVID-19 daily count data are as follows: Smoothing: Cumulative case curves are smoothed over 10 days using the smooth function in the MATLAB. The smooth function returns a moving average of the elements of a vector using a fixed window length which is determined heuristically, and in the case of spatial analysis, it is 10 days. The window slides down the length of the vector, computing an average over the elements within each window. Time differencing: If f i (t) denotes the given cumulative number of confirmed cases for department i on day t, then the growth rate per day at time t is given by g i (t) = f i (t)−f i (t−1). Rescaling: Each curve is rescaled by dividing each g i by the total number of confirmed cases. Smoothing: We then smooth the normalized curves again over a 5-day span using the smooth function in the MATLAB. To identify the clusters by comparing the curves, we used a simple metric. For any two rate curves, h i and h j , we compute the norm ||h i −h j ||, where the double bars denote the L2 norm of the difference function, that is, To perform clustering of thirty six curves into smaller groups, we apply the dendrogram function in the MATLAB using the “ward” linkage as explained in reference [76]. The number of clusters is determined empirically based on the display of overall clustering results. After clustering the departments and administrative divisions into different groups, we derive the average curve for each cluster after using a time wrapping algorithm as performed in previous studies [76,77]. All analysis is conducted in MATLAB R2020b. Twitter data analysis To observe any relationship between the COVID-19 cases based on the dates of symptom onset and the frequency of tweets indicating stay-at-home orders we used a public dataset of 9,233,913,414 unique tweets of the COVID-19 chatter [45]. The frequency of tweets indicating stay-at-home order is used to infer people’s compliance with the stay-at-home-orders implemented to avoid spread of the virus by practicing social distancing. Tweets indicate the number of the people in favor of lockdowns and show how these numbers have varied throughout the pandemic. We removed all retweets and tweets not in the Spanish language to obtain the plotted data. We also filtered the tweets using the following hashtags: #quedateencasa (stay-at-home), and #trabajardesdecasa (work-from-home), which are two of the most used hashtags when users refer to the COVID-19 pandemic and their engagement with health measures. Lastly, our analysis was restricted to the tweets that originated from Colombia. A set of 493,276 unique tweets were gathered between March 12, 2020, and October 31, 2021. The usage of Twitter in Colombia is less compared to other Latin American countries like Mexico and Chile [78], hence we observe a lower proportion of filtered tweets. We overlay the curve of filtered tweets and the epidemic curve in Colombia to observe any relation between the shape of the epidemic trajectory and the shape of curve for the frequency of filtered tweets during the established time-period. The correlation coefficient between the daily cases based on the dates of symptom onset and frequency of filtered tweets is also estimated using the "corrcoef” function in MATLAB version R2020b. This coefficient explains the linear dependency of the daily cases based on the dates of symptom onset and frequency of filtered tweets. Mobility data analysis We utilize the R code developed by Healy [79] to analyze the time series data for Colombia from March 6, 2020 to October 31, 2021, for two modes of mobility: driving and walking derived from Apple’s mobility reports. We analyze the mobility trends to discover any commonality in pattern of the mobility curves with the epidemic curve of COVID-19. The time series for mobility requests is decomposed into trends, weekly and remainder components. The trend is obtained by applying locally weighted regression fitted to the data and the remainder is any residual leftover on any given day after accounting for the underlying trend and normal daily fluctuations. The analysis is performed using ‘covmobility’ package in R software, version 4.0.3. We analyze the Google mobility trends from March 6, 2020, to October 31, 2021 for visits to the grocery stores and pharmacy, parks, transit stations, workplaces, and residential areas compared to the baseline which is the median value, for the corresponding day of the week, during the five weeks from January 3, 2020 to February 6, 2020. For each region category, the baseline is not a single value; it comprises of seven individual values. The same number of visitors on two different days of the week results in different percentage changes. The analysis is performed using ‘tidycovid19’ package and R software, version 4.0.3. Discussion In this study, we characterize the forecasting efforts and spatio-temporal transmission dynamics of the COVID-19 pandemic in Colombia by fitting mathematical models to the national and spatially distributed time series data. Our results point towards the sub-epidemic model as the most accurate model in terms of the calibration and forecasting performances. More importantly, the regional and national level GLM and Richards forecasts point towards a continuous declining trend in the epidemic trajectory in comparison to the sub-epidemic model that is able to reproduce the sustained growth pattern particularly distinguishable for the national, Caribbean, Andean and the Pacific region. Overall, the transmission dynamics show sustained disease transmission during the early phase of the COVID-19 pandemic exhibiting sub-exponential growth dynamics at the regional and national levels. These findings are consistent with the estimates of reproduction number obtained from the genomic data. As the epidemic progressed, fluctuations in R t were observed with the most recent estimates of R t <1.00 indicating disease containment. Regarding the geographical distribution in the national territory, Colombian departments can be segregated into four distinct clusters based on the growth rate patterns in each cluster. Moreover, while the increase in mobility patterns in June and July 2020 correspond to the observed peak in case incidence in July 2020, a spike in the number of tweets indicating the stay-at-home orders was observed in November 2020 when the case incidence had levelled off. Appropriate short-term forecasts at the national and regional levels can help guide the intensity and magnitude of public health interventions required to contain the epidemic. In general, the national and regional short-term forecasts from the Richards model and the GLM point towards a sustained decline in the over-all case counts similar to the forecast produced by the sub-epidemic wave model for the Amazon and Orinoquía region. However, the sub-epidemic model predicts the stabilization in case incidence for the national, Andean, Caribbean, and the Pacific region. On the contrary, the mortality curve forecast predicted by the sub-epidemic wave model shows an increase in the death counts as is projected by the IHME model [82]. In general, the forecasts generated by the GLM, and Richards model predict that national and regional outbreaks have nearly reached extinction (Figs 3 and 4) in comparison to the forecasts generated by the sub-epidemic wave model. Therefore, the projections made using these models should be interpreted with caution given the instability in the reporting patterns and reporting delays. It can be appreciated that the sub-epidemic wave model is able to capture the underlying structure of the sub-epidemic building blocks comprising the coarse scale epidemic [46]. In our analysis the sub-epidemic wave model performs better than the GLM and Richards model in short term forecasts based on the performance metrics (Tables 3 and S2) that account for uncertainty in predictions similar to the findings in Mexico where the sub-epidemic wave model outperformed the GLM and Richards model in short term forecasting the COVID-19 pandemic [49]. The sub-epidemic wave model is also a better fit to the epidemic trajectories during the calibration period compared to the other two models (Tables 2 and S1). The scaled COVID-19 case incidence based on the dates of symptom onset for all regions and the national data shows a coherent pattern with fluctuations in the percentage of cases around ~0.001 between April 2020 and October 2021 (Fig 8). The epidemic curve of the COVID-19 pandemic displays multimodal peaks as of October 31, 2021. As observed the COVID-19 epidemic peaked in mid-July 2020 followed by a decline in case incidence in mid-September 2020, only to rise again. The national COVID-19 curve declined at the beginning of year 2021. However, by the end of April 2021 another surge in the COVID-19 cases was observed that continued until the end of June 2021, with estimates of R t ~1.00, indicating continued virus transmission in the country. The early transmission dynamics of SARS-CoV-2 exhibit similarity at the national and regional levels. The COVID-19 pandemic in Colombia exhibited sub-exponential growth dynamics (01.00. The results of our genomic analysis for Colombia agree with the sustained disease transmission in the country during the early ascending phase of the pandemic with R>1.00. These estimates indicate that although containment strategies were implemented during the first thirty days to mitigate the impact of the pandemic (Fig 1), additional interventions should be prioritized such as compulsory social distancing and intensified case surveillance. The results of our analysis are compatible with the estimates of early reproduction numbers retrieved from other countries, including Peru, Chile, Brazil and Mexico which followed similar COVID-19 outbreaks around the same time period [39,49,50,85]. We also employ the Cori et al. method to estimate the overall transmission potential (effective reproduction number) throughout the COVID-19 pandemic in Colombia to observe the spatiotemporal variation in R t across different time points. Indeed multiple factors can be attributed to influence the reproduction number including the transmissibility of an infectious agent, individual contact rates, individual susceptibility and mitigation strategies [57,86]. Our results indicate that the national reproduction number remained substantially high (R t >1) until the end of July 2020, after which it declined and fluctuated around ~1.0 with no major deviations. The most recent estimates of R t indicate a decrease in the transmission potential of the virus with R t estimated at ~0.8, indicating disease containment. At the regional level, Orinoquía, Pacific and Caribbean regions displayed the greatest fluctuation in the transmission potential. Moreover, the estimates of R t remained between 3.02–3.46 until March 6, 2020, for all regions. As the pandemic progressed and control interventions were put in place, the estimates of R t declined to fluctuate at around ~1.0. The estimates of R t remained consistently above 1.00 in the Caribbean, Andean and the Pacific regions until July 2020, after which R t fluctuated at ~1.00. The most recent estimates of transmission potential, as of October 31, 2021, at the regional level show that the control interventions have a positive impact on the control of the spread of the virus. In general, Colombia has observed sustained disease transmission of SARS-CoV-2 despite the implementation of social distancing interventions. As can also be seen from Twitter analysis, the number of cases based on the dates of onset are positively correlated with the tweets indicating the stay-at-home orders. A plausible explanation of this phenomenon could be that the people might have stopped following the government’s preventive orders to stay at home as a result of pandemic fatigue or due to confusion regarding the national and regional policies. There was a decline in the mobility profile in Colombia after the implementation of social distancing mandates and declaration of national emergency. However, as the pandemic progressed, the mobility of the public reached a normal level with fluctuations around the baseline. A study reported that the average mobility reduction in Colombia fluctuated during the pandemic, starting at ~80% in April 2020, and reaching ~30% by the end of September 2020 [87]. In comparison Argentina and Chile have reported the most steady mobility reductions of ~80% and 70% respectively during the course of the pandemic [87]. The most recent data shows an increase in mobility, although the case counts seem to be stabilizing at ~1307 cases per day for the month of October 2021. Analyzing the data at a finer scale provides an opportunity to characterize the growth patterns and appreciate the spatially clustered pattern of COVID-19 that emerges at the departmental level. Our classification of epidemic patterns in Colombia at the department level, including the 4 main districts, shows distinct variations in growth rates across different departments. For instance, cluster 1 including Antioquia, Caquetá, Cesar, Choco, Magdalena, Norte Santander, Putumayo, Quindío, Risaralda, Sta Marta D.E., and Valle del Cauca shows a fluctuating yet multipeak rising pattern in growth rate. In contrast, cluster 3 that includes Vichada and cluster 4 that includes Amazonas, Guainía, and Vaupés show stable growth at lower rates. This information can be utilized by the departments, districts, and regions to guide their decision regarding the implementation of public health measures. For example, departments in cluster 1 show fluctuating increasing growth pattern during the first three months and may need strict public health measures to contain the epidemic compared to cluster 3 which displays slow growth during the first three months. Moreover, as the mobility increased in June and July 2020, an increase in the growth rate patterns can also be observed in cluster 1, 2 and 4 (Figs 12 and 13). This study has some limitations. First, we excluded the last 11-day case counts from our study in to overcome the influence of reporting delays on our analysis. Specifically, we utilize the case counts based on the dates of onset in this study, because delays in case reporting, testing rates and factors related to the surveillance systems can influence our epidemic projections. Secondly, we relied on the daily updates of cases in the official surveillance system of Colombia, which can sometimes underreport the cases. Third, the phenomenological models applied in this study do not explicitly account for behavioral changes, and thus the results such as the predicted decline or stability in the epidemic trajectory should be interpreted with caution. Fourth, some authors [80] shifted the dates of symptom onset backwards to approximate the date of infection when applying the Cori et al. method to estimate the instantaneous reproduction number. However, in this study we stick to the date of symptoms onset for the sake of comparison across methods to estimate the reproduction number. Lastly, the unpredictable social component of the epidemic on ground was also a limiting factor for the study as we did not know the ground truth epidemic pattern when the forecasts were generated. In conclusion, Colombia observed a surge in case counts as the mobility increased in July 2020; however, since the end of 2020, the national and regional reproduction numbers have fluctuated around ~1.00 indicating sustained virus transmission in the region. The most recent estimates of R t <1.00 indicate disease containment as of October 31, 2021. Moreover, the population of Colombia has had mixed reactions towards the stay-at-home orders as can be inferred from the Twitter analysis, contributing to the virus transmission in the country. Importantly, the spatial analysis indicates that departments like Antioquia, Caquetá, Cesar, and Choco need stronger public health strategies to contain the rising patterns in growth rates. Although Richards model and GLM point towards a sustained decline in the case and death counts in general, the sub-epidemic model indicates an upward trend in death counts and a stable case incidence pattern for the national, Pacific, Caribbean, and Andean regions. Hence, the forecasts need to be interpreted with caution given the spatial heterogeneity in transmission rates as well as dynamic implementation and lifting of the social distancing measures. The phenomenological growth models employed in this study to forecast and estimate reproduction numbers are particularly valuable for providing rapid predictions of the epidemics in complex scenarios that can be used for real-time preparedness because these models do not require specific disease transmission processes to account for the interventions. [END] --- [1] Url: https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0010228 Published and (C) by PLOS One Content appears here under this condition or license: Creative Commons - Attribution BY 4.0. via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/