(C) PLOS One. This story was originally published by PLOS One and is unaltered.

Global microbial water quality data and predictive analytics: Key to health and meeting SDG 6 [1]

Authors: Joan B. Rose (Department of Fisheries, Wildlife, Plant Soil, Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America); Nynke Hofstra

Date: 2023-08

Abstract

Microbial water quality is integral to water security and is directly linked to human health, food safety, and ecosystem services. However, pathogen data in particular, and even faecal indicator data (e.g., E. coli), are sparse and scattered, and their availability across water bodies (e.g., groundwater) and socio-economic contexts (e.g., low- and middle-income countries) is inequitable. There is an urgent need to assess and collate microbial data across the world to evaluate the global state of ambient water quality, water treatment, and health risk, as time is running out to meet Sustainable Development Goal (SDG) 6 by 2030. The overall goal of this paper is to illustrate the need and advocate for building a robust and useful microbial water quality database and consortium worldwide that will help achieve SDG 6. We summarize available data and existing databases on microbial water quality, discuss methods for producing new data on microbial water quality, and identify models and analytical tools that utilize microbial data to support decision making. This review identified global datasets (7 databases), and regional datasets for Africa (3 databases), Australia/New Zealand (6 databases), Asia (3 databases), Europe (7 databases), North America (12 databases) and South America (1 database). Data are missing for low- and middle-income countries. Increased laboratory capacity (due to the COVID-19 pandemic) and molecular tools can identify potential pollution sources and monitor directly for pathogens.
Models and analytical tools can support microbial water quality assessment by making geospatial and temporal inferences where data are lacking. A genomics, information technology (IT), and data revolution is upon us and presents unprecedented opportunities to develop software and devices for real-time logging, automated analysis, standardization, and modelling of microbial data to strengthen knowledge of global water quality. These opportunities should be leveraged for achieving SDG 6 around the world. Citation: Rose JB, Hofstra N, Hollmann E, Katsivelis P, Medema GJ, Murphy HM, et al. (2023) Global microbial water quality data and predictive analytics: Key to health and meeting SDG 6. PLOS Water 2(8): e0000166. https://doi.org/10.1371/journal.pwat.0000166 Editor: Jennifer Davis, Stanford University, UNITED STATES Published: August 23, 2023 Copyright: © 2023 Rose et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: The authors received no specific funding for this work. Competing interests: The authors have declared that no competing interests exist. 1. Introduction In March 2023, the world gathered to review the successes and continued needs for implementing the objectives of the International Decade for Action (2018–2028): Water for Sustainable Development. Greater attention was given to pathogen pollution and disease monitoring through wastewater surveillance. Poor microbial water quality affects the entire water cycle and the global indicators tell the tale, where only 74%, and 54%, of the world’s population has access to safely managed drinking water and safely managed sanitation, respectively (as of 2020, https://unstats.un.org/sdgs/report/2022/goal-06/). This report also calls for greater monitoring. 
In addition, only 56% of the household wastewater flows are safely treated (https://www.unwater.org/publications/progress-wastewater-treatment-2021-update). Drinking water and wastewater also intersect with fresh surface waters, where only 72% of the world’s monitored waters have good ambient water quality (UN global indicators https://sdg6data.org/en). This does not account for all the waters that are unmonitored. High levels of faecal microorganisms in water sources affect designated uses, such as drinking water supply, irrigation, recreation, and fisheries, thus impacting not only water security but food security, well-being, and economic development. Severe faecal pollution (defined as faecal coliform/Escherichia coli concentrations > 1000 cfu/100ml) affects about one-third of river stretches in Latin America, Africa and Asia [1]. The majority of the faecal pollution comes from domestic sewage and inadequately treated wastewater, but animal faecal wastes are a growing source of zoonotic pathogens [1]. While increased human and animal populations, land use changes, and poor infrastructure contribute to faecal pollution, it is well known that climate change and weather extremes are a driver of water quality degradation and are linked to endemic waterborne disease and outbreaks [2, 3]. A discussion of water quality is integral to the themes of the UN 2023 Water conference which was the first one in over 30 years, organized to “mobilize Member States, the UN system and stakeholders alike to take action and bring successful solutions to a global scale.” https://sdgs.un.org/conferences/water2023. The themes included 1) Water for Health; 2) Water for Development: 3) Water for Climate; 4) Water for Cooperation; and 5) Water Action Decade. 
The slow progress and interruption of efforts due to the COVID-19 pandemic have made the need for the SDG 6 Global Acceleration Framework (https://www.unwater.org/publications/sdg-6-global-acceleration-framework) more essential than ever before (Box 1). This acceleration framework includes data and information as one of the key accelerators to dramatically improve knowledge about and support for attaining the international goals. Box 1. Improved data and information–Data generation, validation, standardization and information exchange will build trust so leaders can make informed decisions and increase accountability. Success looks like: High-quality information on SDG 6 indicators is shared and easily accessible by any decision maker. (https://www.unwater.org/sites/default/files/app/uploads/2021/03/Global-Acceleration-Framework-Brief.pdf) Global data sets on microbial water quality are underdeveloped, providing little knowledge on spatial and temporal hotspots (areas or time periods with high concentrations), trends and impacts. In particular, there are only a few national datasets and little monitoring and knowledge on microbial water quality for low- and middle-income countries. The available data are mainly on faecal indicator bacteria (FIB), which are not adequate to address health concerns associated with viruses, protozoa or helminths [4, 5]. Moreover, microbial water quality analyses are currently hindered by limitations of the older culture-based approaches for FIB: results are delayed for 24 hours, faecal source identification (e.g., association with humans or animals) is not available, and environmental sources of FIB interfere with the interpretation. Yet we now have powerful new molecular methods at our fingertips, including microbial source tracking and direct pathogen monitoring, that can improve our understanding of microbial water quality around the world [4].
The SDGs themselves have inspired the world to collect and share data from each country to track progress on ambient water quality as well as the safety of drinking water through the Joint Monitoring Program (JMP). Interestingly, the COVID-19 pandemic increased laboratory capacity across the globe (https://arcg.is/1aummW) and demonstrated that monitoring sewage and polluted waters for viruses directly is feasible and valuable to support the pandemic response. Pathogen and source tracking monitoring data are critical for evidence-based decisions on wastewater treatment, waterbody protection, and restoration of polluted waters to control pathogens and will provide long lasting approaches to protect health, water quality and the environment. The overall goal of this paper is to illustrate the need and advocate for building a robust and useful microbial water quality database and consortium worldwide that will help achieve SDG 6. We 1) summarize available data and existing databases on microbial water quality, supported by the United Nations, governments, and private entities, 2) discuss the use of innovative molecular methods for microbial water quality, and 3) identify and highlight the existing water quality models and analytical tools that can utilize microbial data to support water quality assessment and decision making. In this paper we refer to the microbiological water quality of different water bodies. Our main focus is the quality of ambient water (surface and groundwater), as this closely links to SDG 6.3.2 on the proportion of water bodies with good ambient water quality. However, to better understand ambient water quality, wastewater (as a source) and irrigation, recreational and drinking water (related to the impacts, i.e., health risks) are also relevant.
These data can be used in assessments with molecular methods, models and tools from the sources of pathogens to the exposures and can inspire use of relevant tools for addressing the One Water concept [6] (a water management approach adopted from integrated water resources management as described by the Water Research Foundation and US Water Alliance, among others).

2. Available water quality databases

Table 1 presents a summary of microbial databases that exist around the world. The table is not exhaustive but helps to illustrate the scale and fragmentation of microbial water quality data globally. Databases range in scale from global to local and mostly cover surface waters, although some databases cover groundwater and drinking water. Many databases are open and some of the data can be viewed online or downloaded in the form of a report or machine-readable formats.

Table 1. Summary of key microbial water quality databases. https://doi.org/10.1371/journal.pwat.0000166.t001

There are two global databases that are open and most accessible for downloading of microbial water quality data for ambient waters: 1) The Global Freshwater Quality Database (GEMStat) by the United Nations Environment Program (https://gemstat.org/about/data-availability/). The GEMStat database provides data on the state of global inland water quality. At the time of writing, the database contains more than 15 million entries from more than 80 countries. However, most of these entries are for chemical and physical parameters with microbiological parameters representing only a minority of entries (6874 entries). The GEMS site allows for data download, visualisation and data upload.
2) The Water Quality Portal is a data portal assembled by the United States Geological Survey (USGS), the Environmental Protection Agency (EPA), and over 400 state, federal, tribal, and local agencies in the United States (https://www.waterqualitydata.us). The US-based Water Quality Portal has microbial indicator data from surface water and groundwater, mostly from across the United States. The database also contains some data from other countries and is set up to compile additional global data. However, it is currently unclear how many countries have microbial data in the database and how many records are available (Box 2 is an example of one such database). Box 2. Example of a National Database New Zealand has all microbial water quality data available at one website which covers groundwater, lake water, river water and recreational water. The data are up to date, easy to download and available as Excel sheets. This serves as an excellent model that others could replicate. https://www.lawa.org.nz/download-data/ Ambient water databases should include both surface and groundwaters; this type of data could be linked to both drinking and recreational designated uses. Microbial water quality data on wastewater are found primarily in the published literature or in records at facilities on discharge limits for FIB. However, an example of a global database with virus concentration data is the Wastewater-SPHERE, a global data and use case repository of wastewater surveillance for SARS-CoV-2 (https://sphere.waterpathogens.org/) currently containing almost 200,000 wastewater sample records from 21 countries (See Box 3). Box 3. The Wastewater SARS Public Health Environmental REsponse (WSPHERE) now includes 27 open access datasets from around the world, 1456 sites and 200,000 data entries for SARS-CoV-2 and now highlights six public health use cases from Canada, Catalonia, Ghana, the Netherlands, South Africa, and Switzerland.
The global dataset is available upon request: https://sphere.waterpathogens.org/form/contribute?subject=global. There are several existing regional datasets. North America, Europe and Australia seem to dominate in the number of databases that exist for monitoring microbial water quality of surface waters for recreation, drinking water and groundwater. However, many of the databases are local, site specific, or region specific. Data in Asia and Africa exist, but most of the data are closed and not accessible. The situation is similar in South America where only one, closed, database was found in Brazil. Nearly all microbial data available in these databases are restricted to indicator bacteria such as total coliforms, thermotolerant (faecal) coliforms, E. coli, Enterococci, Streptococci and H2S forming bacteria. The CEDEN database (from California) includes some data on Giardia, Cryptosporidium, Salmonella, and human-associated microbial source tracking (MST) markers, and the GEMStat and SURVAL (from France) databases include some Salmonella data, but otherwise pathogen data are absent. Spatially and temporally, the available microbial indicator data are quite vast and collectively cover millions of monitoring sites across the globe. In some cases, the data date back to as early as the 1970s. However, the data are fragmented across local, regional, and national databases and it is impossible to easily visualize the data either spatially or temporally. The GEMStat database, although limited in the number of microbial data records, does allow users to visualize on a global map which countries have microbial water quality data, and then to select certain regions, drill down and temporally visualize datasets by waterbody. The GEMStat database could serve as a global repository for all microbial datasets for ambient waters which then could be linked to wastewater and drinking water data (both infrastructure and monitoring information).
If data “keepers” from around the world would be willing to submit their data and appropriate metadata to this centralized repository in the necessary formats then a very valuable dataset would be available for making decisions on protection of watersheds and large waterbasins for a variety of ecosystem services. Several databases represent “big data”. Generally, these are generated using the data from large monitoring programs to evaluate compliance with drinking water or recreational water standards. The WHO-UNICEF JMP water quality testing household surveys or even the SDG and The United Nations Economic Commission for Europe (UNECE) datasets are based on large amounts of data, but only provide the summary statistics (% compliance in different categories). Such datasets are also communicated through published reports in non-reusable formats (PDF). The large databases on recreational water quality data from Europe and the United Kingdom (https://water.europa.eu/freshwater; https://environment.data.gov.uk/bwq/profiles/) allow access to the data by site, but do not allow users to query the data. Although it is understandable that these databases, which were created to address bathing water guideline compliance monitoring, are condensed into a format that allows evaluation of regulatory compliance, this also means there is significant loss of information and lack of reusability. Microbial water quality is often quite variable, and condensing data does not allow evaluation of the true spatial and temporal variability of the data and understanding of its underlying processes. So, even though the original (source) data are available and contain much more information, they are often omitted from public access. We strongly suggest that microbial databases provide access to all source data for their records, allowing users to validate, query and observe their geo-temporal variability. 
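The loss of information from condensing data can be illustrated with a minimal sketch. It assumes two hypothetical sites with made-up E. coli counts (not drawn from any database discussed here) and uses the severe faecal pollution threshold of 1000 cfu/100 ml quoted in the Introduction as the cut-off for the condensed figure. The site names and the `summarize` helper are illustrative only.

```python
from statistics import geometric_mean

# Severe faecal pollution threshold quoted in the Introduction (cfu/100 ml).
SEVERE_THRESHOLD = 1000

# Hypothetical E. coli counts (cfu/100 ml) for two made-up sites; not real data.
samples = {
    "site_A": [120, 150, 90, 1100, 130, 110, 140, 100],
    "site_B": [10, 900, 5, 250000, 30, 800, 15, 950],
}

def summarize(counts):
    """Return the condensed figure plus statistics that require the raw data."""
    at_or_below = sum(1 for c in counts if c <= SEVERE_THRESHOLD)
    return {
        "percent_at_or_below_threshold": 100 * at_or_below / len(counts),
        "geometric_mean": geometric_mean(counts),
        "maximum": max(counts),
    }

summaries = {site: summarize(counts) for site, counts in samples.items()}
for site, summary in summaries.items():
    print(site, summary)
```

In this sketch both sites report the same percentage of samples at or below the threshold, yet their maxima differ by more than two orders of magnitude. Only access to the source records reveals that difference, which is the argument made above for publishing them.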
Lastly, to ensure the quality of a global microbial dataset, it is important for its metadata to comply with or even cross-reference other databases and standardized resources. For example, when capturing regional distribution of data, one could make use of existing administrative division standards such as ISO-3166-2 (https://www.iso.org/iso-3166-country-codes.html#2012_iso3166-2) or GADM (https://gadm.org/). Similarly, characterization of progress towards drinking water goals could follow the JMP classification. The same practice could prove particularly useful when referencing standard microbial methods employed such as the Standard Methods for the Examination of Water and Wastewater (https://www.standardmethods.org/) or the recently developed Environmental Microbiology Minimum Information (EMMI) guidelines for qPCR and dPCR [7]. Although ensuring high quality data is of the utmost importance, it may take many years to enforce data quality standards and this may be a barrier to researchers in publishing their data. At the starting point of a global dataset, it seems most practical to include data that has been provided by reputable sources (i.e. accredited labs, government agencies) and published datasets from researchers where one assumes that the peer review process will have ensured some data quality. Some journals such as Environmental Science and Technology are already requiring that molecular environmental data published in their journal report the necessary elements outlined in the EMMI guidelines prior to publication. However, there are also situations where urgent public health response is mandated. The COVID-19 pandemic has shown that many labs worldwide can provide useful data in the absence of standard methods, with the goal to inform decisions in real time. In such cases, efforts in compiling global datasets could focus on standardizing data that becomes available. 
Data standards such as the Public Health Environmental Surveillance Open Data Model (https://github.com/Big-Life-Lab/PHES-ODM) and initiatives that focus on global data harmonisation such as the W-SPHERE could prove to be valuable tools in determining the data and metadata gaps of published data, while ensuring their compliance with FAIR principles [8] and overall data quality. 3. Use of innovative molecular methods While most of the datasets in Table 1 contain data on FIB, the introduction of innovative molecular methods/instruments such as quantitative and digital PCR and next generation sequencing are opening up opportunities to monitor more specific targets, such as pathogens and markers for specific faecal pollution sources (See Box 4). The COVID-19 pandemic has boosted the use of molecular tools such as PCR in water laboratories around the globe [9, 10]. National and local wastewater monitoring programs were developed in many countries and continue to provide valuable, unbiased information about trends in SARS-CoV-2 infections and early warning of new virus variants-of-concern. For example, the COVIDPoops19 dashboard lists 166 dashboards of SARS-CoV-2 in wastewater, with 288 universities, 72 countries, and 4,107 sites (https://arcg.is/1aummW). Although wastewater monitoring is primarily occurring in high-income countries, many low- and middle-income countries have also implemented these methods [11]. Some countries and sites have even started to include other targets beyond COVID-19 including poliovirus, influenza A, Respiratory Syncytial Virus (RSV) and Monkeypox (Mpox). Box 4. Use of Microbial Source Tracking Genetic Targets for Water Quality [13] The impact on faecal pollution analysis in health-related water quality research by nucleic acid-based methods, such as PCR analysis and sequencing, was assessed by rigorous literature analysis. 
A wide range of application areas and study designs was identified, since the first application more than 30 years ago (>1,000 publications). This comprehensive meta-analysis provides the scientific status quo of this field, including trend analyses and literature statistics, outlining identified application areas, and discussing benefits and challenges of nucleic-acid-based analysis in water. Given the consistency of methods and assessment types, this emerging science is a new discipline: genetic faecal pollution diagnostics (GFPD) in health-related microbial water quality analysis. Without any doubt, GFPD has already revolutionised faecal pollution detection and microbial source tracking, the current core applications. The widespread application of GFPD in various studies on the impacts of pathogen pollution on ambient waters means that the data could easily be included in a global database. The presence of these global wastewater monitoring programs at labs with new molecular capabilities demonstrates the global capacity to fill some of the gaps in the existing microbial water quality datasets. The presence of such a large number of wastewater monitoring programs using molecular targets demonstrates that such monitoring efforts are possible for other matrices, such as surface waters, and for other targets. For wastewater surveillance and indeed for the monitoring of pathogens in ambient waters in general, this is just the beginning. A global consortium to support capacity development and data sharing would catalyse this evolution. One positive lesson that can be learned from the global wastewater surveillance efforts is the need to use standardized formats and data dictionaries. One way global wastewater monitoring labs can support ambient microbial water quality monitoring is through the utilization of new molecular tools, such as microbial source tracking. 
This method uses key molecular targets to identify more specific information about the sources of faecal pollution. This makes the monitoring data more useful to support measures to mitigate unsafe waters. Molecular microbial source tracking assays have moved from the academic realm to applications in environmental monitoring, for instance of bathing waters and transboundary waters (https://ijc.org/en/hpab/great-lakes-water-quality-centennial-study-phase-i-report) [12]. The great value of these new molecular methods is that all targets can be assayed with one platform (PCR). So once the method is established for one target, it can be expanded to other targets. We strongly support the use of new molecular methods for source tracking and targeted pathogen monitoring; resources should be focused on harnessing the laboratory infrastructure that was developed during the pandemic to monitor wastewater for SARS-CoV-2 to more targets and water bodies. This is particularly valuable in low- and middle-income countries and underserved areas in high-income countries, where waterborne diseases are most prominent. 5. Conclusions: Approaches to improve microbial water quality data worldwide There are great challenges along the road towards meeting the goals of SDG 6, but we also believe there are great opportunities for the digital transformation of the microbial water quality sector to support SDG 6. The time has come to invest in a global data initiative to accelerate the assessment and control of water pathogen pollution. Data and information are needed to accelerate the slow progress for meeting the SDG 6 goals and address microbial water quality to improve health, food safety, ecosystem services, and economic vitality of communities. First, we have shown that existing databases are fragmented and most do not have adequate spatial and temporal coverage, and mostly deal with faecal indicator data, which are inadequate for providing information on pathogen risks. 
Nevertheless, it is encouraging to see the evolution of large datasets, which generally originate from water quality directives. Global or supranational datasets (see EU WISE, WHO JMP, etc.) make use of these data, with global objectives such as the SDGs as the driving force. And despite limited resources and capacity, low- and middle-income countries also generate data that feed these global data platforms and inform governments about their national WASH priorities. The GEMS database is a promising platform for a global microbial water quality data repository and we recommend that this platform be further utilized to house microbial datasets generated by government agencies and researchers across the globe. Second, we have highlighted that powerful new molecular methods and boosted laboratory capacity as a result of the COVID-19 pandemic mean that there is more global capacity and models to generate high quality datasets, including pathogen data that are lacking. Third, we have identified models and analytical tools that can help fill gaps in data availability and support decision making and water quality assessments at various scales. Even with limited data, the use of water quality models and analytical tools will advance the understanding of risks and strategies for management. Ideally, databases, models, and tools should be developed together with contributors and users of the data. Databases together with the models and tools provide first steps towards a global knowledge repository on pathogen pollution. These data with appropriate metadata will be actionable to improve microbial water quality assessment and understanding and ultimately inform management decisions. As the COVID-19 pandemic has shown, microbial data can be rapidly collected, processed, and shared using FAIR principles [8]. There is also ample ground for new digital tools that provide access to data with an eye toward equity, especially focusing on low- and middle-income countries.
As discussed, data in many existing databases are often presented in non-actionable formats. However, moving toward state-of-the-art data science along with improvement of databases as presented in Section 4 will ensure that data exchange becomes much easier and more accessible. We provide the following recommendations for consideration.

1) Form an international microbiological water quality consortium through the International Water Association to a) establish metadata and data quality criteria for microbial data (learning from the GEMStat and New Zealand government databases); b) develop mechanisms for data submission to a global database (e.g., governments submit data directly; researchers submit data as a requirement for publication in IWA journals).

2) Advocate that the Global Freshwater Quality Database (GEMStat) by the United Nations Environment Program take the lead in compiling global microbial data and expand their current dataset, starting with including the data from the surface water databases listed in Table 1 using open data sharing principles.

3) Start with IWA journals to develop a standard for data submission by researchers to the central global repository. Making submission of microbial data a requirement for publishing will contribute to a FAIR microbial water quality data repository.

4) Continue to invest in the laboratory capacity built during the pandemic to evaluate microbial water quality using genetic faecal pollution diagnostics and targeted pathogen monitoring in addition to faecal indicators.

5) Improve and utilize models to fill the gaps in observational datasets. Their outputs should also be incorporated in the global database. Additionally, whenever a new model or analytical tool is developed, this should be added to a global repository of models and tools that is closely linked to the global database.

6) Expand and co-develop training programs with stakeholders in low- and middle-income countries for increased capacity development in water quality monitoring and modelling. Training programs need to be developed in collaboration to address current inequalities (data colonialism) [71] and to ensure indigenous knowledge is utilized to improve the development and use of models and tools. Ideally, scientists and practitioners from low- and middle-income countries lead the in-country collection and analysis of samples and the related publications. Training programs should be co-developed with stakeholders and training course development should become standard practice in the projects in which the models and tools are developed.

Acknowledgments

The authors thank Mr. Tom Slaymaker (Senior Statistics and Monitoring Specialist (WASH), UNICEF) and Dr Rick Johnston (Technical Officer, Joint Monitoring Programme (JMP) at World Health Organization) for providing thoughtful comments and suggestions regarding the various databases. We also recognize the hundreds of authors who have contributed to the Global Water Pathogens Project and W-SPHERE.

[1] Url: https://journals.plos.org/water/article?id=10.1371/journal.pwat.0000166
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.