(C) PLOS One
This story was originally published by PLOS One and is unaltered.

Exome-wide association study to identify rare variants influencing COVID-19 outcomes: Results from the Host Genetics Initiative [1]

Guillaume Butler-Laporte (Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec; Lady Davis Institute, Jewish General Hospital), Gundula Povysil

Date: 2022-12

Abstract
Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Hence, studying rare variants may provide additional insights into disease susceptibility and pathogenesis, thereby informing therapeutics development. Here, we combined whole-exome and whole-genome sequencing from 21 cohorts across 12 countries and performed rare variant exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,085 severe disease cases and 571,737 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor TLR7 (on chromosome X) was associated with a 5.3-fold increase in the odds of severe disease (95% CI: 2.75–10.05, p = 5.41 × 10^-7). This association was consistent across sexes. These results further support TLR7 as a genetic determinant of severe disease and suggest that larger studies on rare variants influencing COVID-19 outcomes could provide additional insights.

Author summary
COVID-19 clinical outcomes vary immensely, but a patient's genetic make-up is an important determinant of how they will fare against the virus.
While the COVID-19 Host Genetics Initiative previously found that many genetic variants common in the population contribute to more severe disease, it isn't clear whether rarer variants, carried by fewer individuals, could also play a role. This is important because the genetic variants with the largest impact on COVID-19 severity are expected to be rare in the population, and studying these rare variants requires different technologies (usually whole-exome or whole-genome sequencing). Here, we combined sequencing results from 21 cohorts across 12 countries to perform a rare variant association study. In an analysis comprising 5,085 participants with severe COVID-19 and 571,737 controls, we found that the gene for toll-like receptor 7 (TLR7) on chromosome X was an important determinant of severe COVID-19. Importantly, despite being found on a sex chromosome, this observation was consistent across both sexes.

Citation: Butler-Laporte G, Povysil G, Kosmicki JA, Cirulli ET, Drivas T, Furini S, et al. (2022) Exome-wide association study to identify rare variants influencing COVID-19 outcomes: Results from the Host Genetics Initiative. PLoS Genet 18(11): e1010367. https://doi.org/10.1371/journal.pgen.1010367

Editor: Gregory M. Cooper, HudsonAlpha Institute for Biotechnology, UNITED STATES

Received: April 6, 2022; Accepted: July 29, 2022; Published: November 3, 2022

Copyright: © 2022 Butler-Laporte et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability:
Code availability: Code guidance is available at https://github.com/DrGBL/WES.WGS.
Data availability: The exome-wide burden test summary statistics are available in the manuscript's supporting information files.
The single variant association study summary statistics are openly available on the GWAS Catalog [62] (study numbers GCST90132193, GCST90132194, and GCST90132195). Participant-level data from each cohort may be accessed according to that cohort's data sharing policies. Specifically, we refer readers to the following resources:
- Biobanque Québécoise de la COVID-19 (BQC-19): https://www.bqc19.ca/
- Swedish Biobank: https://swecovid.org/
- Columbia Biobank: https://www.vagelos.columbia.edu/research/researchers/core-and-shared-facilities/new-instruments-and-facilities/columbia-university-biobank
- Geisinger Health Systems: https://www.geisinger.edu/research
- Helix Exome+ and Healthy Nevada Project: https://healthynv.org/
- Penn Medicine Biobank: https://pmbb.med.upenn.edu/
- GEN-COVID Multicenter Study: https://sites.google.com/dbm.unisi.it/gen-covid
- Qatar Genome Program: https://www.qatargenome.org.qa/
- Deutsche COVID-19 OMICS Initiative (DeCOI): https://decoi.eu/
- POLCOVID-Genomika: Medical University of Bialystok ethics board.
- FHoGID: Commission cantonale d'éthique de la recherche sur l'être humain (CER-VD, https://www.cer-vd.ch/)
- Interval: https://www.intervalstudy.org.uk/
- Saudi Human Genome Program: https://shgp.kacst.edu.sa/index.en.html
- Genentech: https://www.gene.com/
- Mount Sinai Clinical Intelligence Center: https://labs.icahn.mssm.edu/minervalab/resources/data-ark/mscic-covid-19-biobank/
- Vanda COVID-19: https://www.vandapharma.com/
- University of California Los Angeles: https://www.uclahealth.org/precision-health/research
- Japan COVID-19 Taskforce: https://www.covid19-taskforce.jp/en/home/
- Thai Biobank: Institutional Review Board of the Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand (COA No. 691/2021)
- MNM Bioscience Polish COVID WGS: https://mnmbioscience.com/
- UK Biobank: https://www.ukbiobank.ac.uk/
Funding: Genome sequencing of Biobanque Québec COVID-19 was funded by the CanCOGeN HostSeq project, with contribution from Fonds de Recherche Québec Santé (FRQS), Génome Québec, and the Public Health Agency of Canada. The Richards group is supported by the Canadian Institutes of Health Research (CIHR), the Lady Davis Institute of the Jewish General Hospital, the Canadian Foundation for Innovation, the NIH, Cancer Research UK, and FRQS. The Richards research group is supported by the Canadian Institutes of Health Research (CIHR: 365825; 409511, 100558, 169303), the McGill Interdisciplinary Initiative in Infection and Immunity (MI4), the Lady Davis Institute of the Jewish General Hospital, the Jewish General Hospital Foundation, the Canadian Foundation for Innovation, the NIH Foundation, Cancer Research UK, Genome Québec, the Public Health Agency of Canada, McGill University, Cancer Research UK [grant number C18281/A29019] and the Fonds de Recherche Québec Santé (FRQS). JBR is supported by a FRQS Mérite Clinical Research Scholarship. Support from Calcul Québec and Compute Canada is acknowledged. GBL is supported by FRQS and CIHR fellowships. The Columbia COVID-19 Biobank is supported by the Vagelos College of Physicians & Surgeons Office for Research, Precision Medicine Resource, and Biomedical Informatics Resource of the Columbia University Irving Institute for Clinical and Translational Research (CTSA). Columbia CTSA is funded by the National Center for Advancing Translational Sciences (UL1TR001873). The Columbia University COVID-19 Biobank was supported by Columbia University and the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), through grant no. UL1TR001873. 
DeCOI NGS was supported by funding from the German Research Foundation (DFG; INST 217/1011-1) and was performed at the DFG-funded NGS Competence Center Tübingen (INST 37/1049-1) – Project-ID 286/2020B01 – 428994620 and the DFG-funded West German Genome Center (INST 216/981-1) - Project Number 407493903. Funding for this work was further received from the Stiftung Universitätsmedizin Essen, Germany. The COMRI cohort is funded through in-house institutional funding of the Technical University of Munich, Munich, Germany. Individual grants are as follows: AS is supported by the BONFOR program of the Medical Faculty, University of Bonn (O-149.0134). KUL is supported by the Emmy Noether Programme of the German Research Foundation (DFG; LU 1944/3-1). RB was supported by the State of Saarland and the Dr. Rolf M. Schwiete Foundation. ECS is supported by the Munich Clinician Scientist Program (MCSP) and the DFG (SCHU2419/2-1). JR received funding from DFG, RY159/3-1; DFG SFB1403; BMBF COVIM 01KX2021. MA and PS were supported by Netzwerk-Universitaetsmedizin-COVIM (NaFoUniMedCovid19, FKZ: 01KX2021) and the BMBF (Idepico). For FHoGID: P-YB is supported by the Swiss National Science Foundation (31CA30_196036, 33IC30_179636 and 314730_192616), the Leenaards Foundation, the Santos-Suarez Foundation as well as grants allocated by Carigest. CR is supported by the Swiss National Science Foundation (31CA30_196036, 31003A_176097, and 310030_204285). The GEN-COVID Multicenter Study (Italy) was funded by the MIUR project "Dipartimenti di Eccellenza 2018-2020" to Department of Medical Biotechnologies University of Siena, Italy (Italian D.L. n.18 March 17, 2020), private donors for COVID-19 research, "Bando Ricerca COVID-19 Toscana" project to Azienda Ospedaliero-Universitaria Senese, charity fund 2020 from Intesa San Paolo dedicated to the project N. B/2020/0119 "Identificazione delle basi genetiche determinanti la variabilità clinica della risposta a COVID-19 nella popolazione italiana", the Italian Ministry of University and Research for funding within the "Bando FISR 2020" in COVID-19 and the Istituto Buddista Italiano Soka Gakkai for funding the project "PAT-COVID: Host genetics and pathogenetic mechanisms of COVID-19" (ID n. 2020-2016_RIC_3). The GEN-COVID (Spain) study received support from Instituto de Salud Carlos III (ISCIII): GePEM (PI16/01478/Cofinanciado FEDER; A.S.), DIAVIR (DTS19/00049/Cofinanciado FEDER, A.S.), Resvi-Omics (PI19/01039/Cofinanciado FEDER, A.S.), ReSVinext (PI16/01569/Cofinanciado FEDER, F.M.T.), Enterogen (PI19/01090/Cofinanciado FEDER, F.M.T.); Agencia Gallega para la Gestión del Conocimiento en Salud (ACIS): BI-BACVIR (PRIS-3, A.S.), and CovidPhy (SA 304 C, A.S.); Agencia Gallega de Innovación (GAIN): Grupos con Potencial de Crecimiento (IN607B 2020/08, A.S.), GEN-COVID (IN845D 2020/23, F.M.T.); Framework Partnership Agreement between the Consellería de Sanidad de la XUNTA de Galicia and GENVIP-IDIS - 2021-2024 (SERGAS-IDIS March 2021); and consorcio Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CB21/06/00103; F.M.T.). For Genentech: The COVACTA study was supported by F. Hoffmann-La Roche Ltd and, in part, by federal funds received from the U.S. Department of Health and Human Services, Office of the Assistant Secretary for Preparedness and Response, and Biomedical Advanced Research and Development Authority, under grant number HHSO100201800036C. For Helix+ and Healthy Nevada Project: Funding was provided to Desert Research Institute (DRI) by the Nevada Governor's Office of Economic Development. Funding was provided to the Renown Institute for Health Innovation by Renown Health and the Renown Health Foundation.
Thai Biobank (Host genetic factors in COVID-19 patients in relation to disease susceptibility, disease severity and pharmacogenomics) funding was obtained from the following sources:
1. Ratchadapiseksompotch Fund, Faculty of Medicine, Chulalongkorn University (RA(PO) 003/63 and 764002-HE01).
2. Grant for the Healthcare-associated Infection Research Group STAR (Special Task Force for Activating Research), Chulalongkorn University (STF 6100430002-1).
3. Grant for Development of New Faculty Staff, Ratchadaphiseksomphot Endowment Fund (DNS 64_002_30_001_2).
4. The e-ASIA Joint Research Program (e-ASIA JRP) as administered by the National Science and Technology Development Agency.
5. Health Systems Research Institute, TSRI Fund (CU_FRB640001_01_30_10) and Thailand Research Fund (DPG6180001).
Further, PC is supported by Ratchadapiseksompotch Fund, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand, Grant number RA(PO) 003/63. VN is supported by Ratchadapiseksompotch Fund, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand, Grant number RA(PO) 001/63. NH is supported by The e-ASIA Joint Research Program (e-ASIA JRP) as administered by the National Science and Technology Development Agency. VS is supported by Health Systems Research Institute (64-132) and the Ratchadapisek Sompoch Endowment Fund, Chulalongkorn University (764002-HE01), Bangkok, Thailand. Interval was funded by the NHS Blood and Transplant, the National Institute for Health Research, the UK Medical Research Council, and the British Heart Foundation. Japan COVID-19 Task Force acknowledges the contribution of the Japan Agency for Medical Research and Development (AMED) and the Japan Science and Technology Agency (JST). The Japan NCGM-COVID-19 study was supported in part by Grants-in-Aid for Research from the National Center for Global Health and Medicine (20A2009) and the Agency for Medical Research and Development (AMED) (JP20fk0108416 and JP20fk0108104).
MNM Diagnostics (Polish COVID WGS) was partially supported by the Polish National Science Centre grant No. SZPITALE-JEDNOIMIENNE/2/2020 and by the Medical Research Agency grant No 2020/ABM/COVID19/0022. The Penn Medicine Biobank is supported by Perelman School of Medicine at University of Pennsylvania, a gift from the Smilow family, and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA award number UL1TR001878. The POLCOVID-Genomika study was financially supported by the Polish Medical Research Agency (ABM) grant no. 2020/ABM/COVID19/0001. The Qatar Genome Program and Qatar Biobank are both Research, Development & Innovation entities within Qatar Foundation for Education, Science and Community Development. The Saudi COVID-19 study acknowledges the Saudi Ministry of Health and King Abdulaziz City for Science and Technology (KACST). The Swedish Biobank received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 824110 (EASI-Genomics). Sequencing was performed by the National Genomics Infrastructure SNP&SEQ facility, which is supported by Science for Life Laboratory, the Swedish Research Council, and the Knut and Alice Wallenberg Foundation. UCLA acknowledges OCRC, Microsoft COVID Compute Funding, and an Illumina in-kind donation. We thank the UCLA COVID-19 Oversight Research Committee, Microsoft COVID Compute Funding, the Illumina in-kind donation, and the UCLA David Geffen School of Medicine - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research Award Program for funding this project under award #20-10 ("COVID-19 Host Genomics Registry at UCLA", PI: Pasaniuc). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: Biobanque Québécoise de la Covid-19: Brent Richards's institution has received investigator-initiated grant funding from Eli Lilly, GlaxoSmithKline and Biogen for projects unrelated to this research. He is the CEO of 5 Prime Sciences Inc (www.5primesciences.com). DeCOI: Oliver Witzke has received research grants for clinical studies, speaker's fees, honoraria and travel expenses from Amgen, Alexion, Astellas, Basilea, Biotest, Bristol-Myers Squibb, Correvio, Chiesi, Gilead, Hexal, Janssen, Dr. F. Köhler Chemie, MSD, Novartis, Roche, Pfizer, Sanofi, Takeda, TEVA and UCB. Kerstin U. Ludwig is a co-founder of and holds equity in the LAMPseq Diagnostics GmbH. UP serves as an ad hoc advisor for Sanofi-Pasteur, BioNtech and Sobi and is a member of the SAB of Leukocare. Christoph D. Spinner reports grants, personal fees from AstraZeneca, personal fees and non-financial support from BBraun Melsungen, grants, personal fees and non-financial support from Gilead Sciences, grants and personal fees from Janssen-Cilag, personal fees from Eli Lilly, personal fees from Formycon, personal fees from Roche, other from Apeiron, grants and personal fees from MSD, grants from Cepheid, personal fees from GSK, personal fees from Molecular partners, other from Eli Lilly, personal fees from SOBI during the conduct of the study; personal fees from AbbVie, personal fees from MSD, grants and personal fees from ViiV Healthcare, outside the submitted work. Jochen Schneider received grants and/or personal fees from Gilead Sciences, Janssen-Cilag, and from AbbVie outside the submitted work.
Philipp Koehler reports grants or contracts from German Federal Ministry of Research and Education (BMBF) B-FAST (Bundesweites Forschungsnetz Angewandte Surveillance und Testung) and NAPKON (Nationales Pandemie Kohorten Netz, German National Pandemic Cohort Network) of the Network University Medicine (NUM) and the State of North Rhine-Westphalia; Consulting fees Ambu GmbH, Gilead Sciences, Mundipharma Research Limited, Noxxon N.V. and Pfizer Pharma; Honoraria for lectures from Akademie für Infektionsmedizin e.V., Ambu GmbH, Astellas Pharma, BioRad Laboratories Inc., European Confederation of Medical Mycology, Gilead Sciences, GPR Academy Ruesselsheim, medupdate GmbH, MedMedia, MSD Sharp & Dohme GmbH, Pfizer Pharma GmbH, Scilink Comunicación Científica SC and University Hospital and LMU Munich; Participation on an Advisory Board from Ambu GmbH, Gilead Sciences, Mundipharma Research Limited and Pfizer Pharma; A pending patent currently reviewed at the German Patent and Trade Mark Office; Other non-financial interests from Elsevier, Wiley and Taylor & Francis online outside the submitted work. Oliver A. Cornely reports grants or contracts from Amplyx, Basilea, BMBF, Cidara, DZIF, EU-DG RTD (101037867), F2G, Gilead, Matinas, MedPace, MSD, Mundipharma, Octapharma, Pfizer, Scynexis; Consulting fees from Amplyx, Biocon, Biosys, Cidara, Da Volterra, Gilead, Matinas, MedPace, Menarini, Molecular Partners, MSG-ERC, Noxxon, Octapharma, PSI, Scynexis, Seres; Honoraria for lectures from Abbott, Al-Jazeera Pharmaceuticals, Astellas, Grupo Biotoscana/United Medical/Knight, Hikma, MedScape, MedUpdate, Merck/MSD, Mylan, Pfizer; Payment for expert testimony from Cidara; Participation on a Data Safety Monitoring Board or Advisory Board from Actelion, Allecra, Cidara, Entasis, IQVIA, Janssen, MedPace, Paratek, PSI, Shionogi; A patent at the German Patent and Trade Mark Office (DE 10 2021 113 007.7); Other interests from DGHO, DGI, ECMM, ISHAM, MSG-ERC, Wiley.
Genentech: Amy D Stockwell, Fang Cai, and Brian L Yaspan are, or were at the execution of the study, full time employees of Genentech with stock and stock options in Roche. Helix Exome+ and Healthy Nevada Project COVID-19 Phenotypes: Alexandre Bolze, Kelly M Schiabor Barrett, Simon White, Nicole L Washington, Francisco Tanudjaja, Stephen Riffle, Efren Sandoval, and Elizabeth T Cirulli are employees of Helix. Regeneron: Jack A Kosmicki and Manuel AR Ferreira are current employees and/or stockholders of Regeneron Genetics Center or Regeneron Pharmaceuticals. Vanda CALYPSO COVID-19: Bartlomiej Przychodzen and Sandra Smieszek are employees of Vanda Pharmaceuticals Inc.

Introduction
Despite successful vaccine programs, SARS-CoV-2 is still a major cause of mortality and widespread societal disruption [1,2]. While disease severity correlates with well-established epidemiological and clinical risk factors (e.g., advanced age, obesity, immunosuppression), these do not explain the wide range of COVID-19 presentations [3]. Hence, some of this unexplained variation may be genetic: individuals without any of these known risk factors may still have a genetic predisposition to severe COVID-19 [4]. These genetic determinants of severe disease can, in turn, inform our understanding of the pathophysiology underlying COVID-19 severity and accelerate therapeutics development [5,6]. Previous work on COVID-19 host genetics using genome-wide association studies (GWASs) revealed 23 statistically robust genetic loci associated with either COVID-19 severity or susceptibility [7–11]. Because most GWASs measure the association between a phenotype and genetic variation using data obtained from genome-wide genotyping followed by imputation, their reliability and statistical power decline as a variant's frequency decreases, especially at allele frequencies below 1% [12]. Ascertainment of rare genetic variation can be improved with sequencing technology [13].
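A rough calculation shows why power falls off for rare variants. The sketch below approximates the power of an unadjusted, two-sided two-proportion z-test comparing carrier frequencies in cases and controls. It is illustrative only and rests on assumptions that are not taken from this study: a true odds ratio of 5, a significance threshold of 0.05/20,000 ≈ 2.5 × 10^-6 (a common Bonferroni convention for roughly 20,000 genes), and no covariate adjustment. The sample sizes are the severe-disease case and control counts reported in the abstract.

```python
from math import sqrt
from statistics import NormalDist


def carrier_freq_in_cases(p_ctrl: float, odds_ratio: float) -> float:
    """Carrier frequency in cases implied by a control frequency and an odds ratio."""
    odds_case = odds_ratio * p_ctrl / (1.0 - p_ctrl)
    return odds_case / (1.0 + odds_case)


def approx_power(p_ctrl: float, odds_ratio: float, n_cases: int,
                 n_controls: int, alpha: float) -> float:
    """Normal-approximation power of a two-sided two-proportion z-test.

    Illustrative sketch: no covariates, no ancestry adjustment.
    """
    std_normal = NormalDist()
    p_case = carrier_freq_in_cases(p_ctrl, odds_ratio)
    # Pooled proportion and standard error under the null of no difference.
    p_bar = (n_cases * p_case + n_controls * p_ctrl) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    # Standard error under the alternative hypothesis.
    se_alt = sqrt(p_case * (1 - p_case) / n_cases
                  + p_ctrl * (1 - p_ctrl) / n_controls)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return std_normal.cdf((abs(p_case - p_ctrl) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    N_CASES, N_CONTROLS = 5_085, 571_737  # severe-disease analysis sizes (abstract)
    ALPHA = 0.05 / 20_000                 # assumed Bonferroni threshold, ~20,000 genes
    for freq in (1e-4, 5e-4, 1e-3, 1e-2):
        power = approx_power(freq, 5.0, N_CASES, N_CONTROLS, ALPHA)
        print(f"control carrier frequency {freq:.0e}: power ~ {power:.2f}")
```

Under these assumptions, power rises steeply as the control carrier frequency increases from 10^-4 to 10^-3. Collapsing several rare qualifying variants in a gene into a single carrier indicator raises the carrier frequency, which is one reason gene burden tests can recover power that single-variant tests lose.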
Rare variants are expected to be enriched for larger effect sizes, due to evolutionary pressure against highly deleterious variants, and may therefore provide unique insights into genetic predisposition to COVID-19 severity. Identifying such genes may highlight critical control points in the host response to SARS-CoV-2 infection. Measuring the effect of rare genetic variants on a given phenotype (here COVID-19) is difficult. Specifically, while variants of large effect on COVID-19 are more likely to be rare, the converse is not true: most rare variants are not expected to strongly impact COVID-19 severity [14]. Therefore, unless large sample sizes and careful statistical adjustments are used, most rare variant genetic association studies risk being underpowered, and any significant associations they do find between COVID-19 and genetic loci are at higher risk of being false or of having inflated effect estimates. This is exemplified by the fact that several rare variant associations reported for COVID-19 have not been replicated in independent cohorts [15–17]. Here, we investigated the association of rare genetic variants with the risk of COVID-19 by combining gene burden test results from whole exome and whole genome sequencing. We build on recent work on exome-wide analyses [17] and include close to 5 times the number of severe cases, with a more genetically diverse cohort, to better study the effect of rare variants on COVID-19. To our knowledge, this is the first rare genetic variant burden test meta-analysis performed on a worldwide scale, including 21 cohorts in 12 countries and all main continental genetic ancestries.

Discussion
Whole genome and whole exome sequencing can provide unique insights into the genetic determinants of COVID-19 by uncovering associations between rare genetic variants and COVID-19.
Specifically, gene burden tests can be particularly helpful because they test for coding variants, thereby pointing directly to a causal gene and often suggesting a direction of effect. However, such studies require careful control for population stratification and an adapted analysis method such as burden testing in order to have enough statistical power to find those associations. In our study, we observed that individuals with rare deleterious variants at TLR7 are at increased risk of severe COVID-19 (up to a 13.1-fold increase in odds in those with predicted loss-of-function variants, or pLoFs). Although this association was suggested by previous studies [28–30], our study provides the most definitive evidence for the role of TLR7 in COVID-19 pathogenesis, with exome-wide significance for this gene in the discovery phase followed by strong replication in a large independent cohort. TLR7 is a well-studied part of the antiviral immune cascade and stimulates the interferon pathway after recognizing viral pathogen-associated molecular patterns. Given its location on the X chromosome, it has been hypothesized that it could partly explain the observed differences in COVID-19 outcomes between sexes [40–42], and to our knowledge, this is the first study to show that this gene can potentially play a role in severe disease even in heterozygous females. Further, our results suggest that TLR7-mediated genetic predisposition to severe COVID-19 may be a dominant or co-dominant trait, an observation that cannot be made in cohorts limited to male participants [28,30]. We also uncovered a potential role for cellular microtubule disruption in the pathogenesis of COVID-19; the microtubule network is known to be exploited by other viruses during infection [43]. Indeed, the MARK1 protein has been shown to interact with SARS-CoV-2 in previous in vitro experiments [33].
Nevertheless, these findings at MARK1 were not replicated in the GenOMICC cohort and will need to be tested in larger cohorts, especially given the small number of highly deleterious variants that we found in our consortium. Lastly, we found single variant associations at IL6R, SRRM1, and FRMD5. While IL6R is already a therapeutic target for COVID-19 [44,45], and SRRM1 has been reported in a previous preprint [46], these associations were found in smaller cohorts and will require replication. To our knowledge, this is the first time a rare variant burden test meta-analysis has been attempted on such a large scale. Our framework produced easily interpretable summary statistics while preventing participant re-identification or any breach of confidentiality that could stem from sharing results of rare genetic variant analyses [47]. It also provides important insights into how such endeavours should be planned in the future. First, our burden test assumed that any of the deleterious variants in a gene would affect the phenotype in the same direction, and it did not account for compound heterozygosity for deleterious variants. This allowed for easier meta-analysis across cohorts but may have decreased statistical power. Other methods may be needed in future analyses to relax this assumption, though some of these (e.g., SKAT-O [48]) cannot easily be meta-analyzed across multiple cohorts directly from summary statistics. Similarly, methods that combine both rare and common variants might also provide additional insights into disease outcomes [31,49]. Second, our results highlight the importance of looking at different categories of variants through different masks to increase the sensitivity and specificity of our burden tests. Third, while the largest biobanks contributed the most to the signal observed at TLR7 and MARK1, many of our smaller prospective COVID-19-specific cohorts also contributed to the signal.
This further highlights the importance of robust study design to improve statistical power, especially for rare variant associations. Lastly, work remains to be done to standardize sequencing and annotation pipelines so that results can easily be compared across studies and cohorts. Here, we provided a pipeline framework to every participating cohort, but there remains room for process harmonization. While the decentralized approach to genetic sequencing, quality control, and analyses allowed for more rapid generation of results, it may come at the cost of larger variance in our estimates. In the future, more sophisticated approaches may be required to increase the statistical power of exome-wide rare variant association studies [50].

Our study had limitations. First, even though ours is one of the world's largest consortia using sequencing technologies to study rare variants, we remain limited by a relatively small sample size. For example, in a recent analysis of UK Biobank exomes, many of the phenotypes for which burden tests identified multiple genes had a much higher number of cases than our analyses (e.g., blonde hair colour, with 48,595 cases) [22]. Further, rare variant signals were commonly found in regions that also harbour common variant GWAS signals. The fact that ABO and NSF were the only genes from the COVID-19 HGI GWAS that were also identified in our burden test (albeit using a more liberal significance threshold) also suggests a lack of statistical power. Similarly, GenOMICC, a cohort of similar size, was also unable to find rare variant associations using burden tests [11]. However, their analysis methods were different from ours, making further comparisons difficult. Nevertheless, this provides clear guidance that smaller studies of the effect of rare variants across the genome are at considerable risk of both false positive and false negative associations.
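The burden-test assumption described earlier (every qualifying variant counts the same, and carrying several is treated like carrying one) and the pooling of cohort-level summary statistics can be sketched in a few lines. This is a simplified illustration, not the consortium's pipeline: the genotypes, the variant mask and the cohort estimates are hypothetical; the per-cohort estimate is an unadjusted 2×2-table log odds ratio with a Haldane correction, whereas a real analysis must adjust for covariates and population stratification, as noted above; and the pooling step is a standard fixed-effect inverse-variance meta-analysis, chosen here as one common way to combine summary statistics.

```python
from math import erfc, exp, log, sqrt


def carrier_status(genotypes, qualifying):
    """Collapse one gene into a 0/1 carrier indicator per individual.

    genotypes: one list per individual of alternate-allele counts (0, 1 or 2),
               with one entry per variant in the gene.
    qualifying: one boolean per variant; True if the (hypothetical) mask keeps it.
    Carrying one or several qualifying variants both give 1, so compound
    heterozygotes are not distinguished from single-variant carriers.
    """
    return [int(any(g > 0 for g, keep in zip(row, qualifying) if keep))
            for row in genotypes]


def burden_log_or(carriers, is_case):
    """Unadjusted log odds ratio and its standard error from a 2x2 table.

    A Haldane correction (+0.5 per cell) avoids division by zero when a
    rare-variant cell is empty. No covariates or ancestry adjustment.
    """
    a = b = c = d = 0.5
    for carrier, case in zip(carriers, is_case):
        if case and carrier:
            a += 1  # carrier cases
        elif case:
            b += 1  # non-carrier cases
        elif carrier:
            c += 1  # carrier controls
        else:
            d += 1  # non-carrier controls
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance meta-analysis of (log_or, se) pairs.

    Returns the pooled log odds ratio, its standard error and a two-sided
    Wald p-value.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = sqrt(1 / total)
    p_value = erfc(abs(pooled / pooled_se) / sqrt(2))
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Toy gene with three variants; the hypothetical mask keeps variants 0 and 2.
    mask = [True, False, True]
    genotypes = [[0, 1, 0], [1, 0, 1], [0, 0, 1], [0, 0, 0]]
    print(carrier_status(genotypes, mask))  # [0, 1, 1, 0]

    # Hypothetical per-cohort (log OR, SE) summary statistics.
    cohorts = [(log(4.0), 0.6), (log(6.0), 0.5), (log(3.0), 0.9)]
    beta, se, p = inverse_variance_meta(cohorts)
    low, high = exp(beta - 1.96 * se), exp(beta + 1.96 * se)
    print(f"pooled OR {exp(beta):.2f} (95% CI {low:.2f}-{high:.2f}), p = {p:.1e}")
```

Because each cohort's weight is the inverse of its squared standard error, cohorts with more qualifying-variant carriers dominate the pooled estimate, while smaller cohorts still add information, which mirrors the earlier point that both large biobanks and smaller COVID-19-specific cohorts contributed to the signal.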
Second, many cohorts used population controls, which may have decreased statistical power given that some controls may have been misclassified. However, given that COVID-19 critical illness remains a rare phenomenon [51], our severe disease phenotype results are unlikely to be strongly affected by this. Finally, the use of population controls is a long-established strategy in GWASs and burden tests [7,8,11,22,52], and the statistical power gained by increasing our sample size is likely to have counterbalanced the misclassification bias.

In summary, we reproduced an exome-wide significant association with severe COVID-19 outcomes in carriers of rare deleterious variants at TLR7, for both sexes. Our results also suggest an association between the cellular microtubule network and severe disease, which requires further validation. More importantly, our results underline the fact that future genome-wide studies of rare variants will require considerably larger sample sizes, but our work provides a roadmap for such collaborative efforts.

Methods
Ethics statement
Each cohort provided the following ethics statement:
BQC-19. Each participant or their legal representative (if the participant was incapable of consenting) provided informed consent to the biobank. If a participant regained the capacity to consent, informed consent was obtained again directly from the participant. The study was approved by the Jewish General Hospital and Centre Hospitalier de l'Université de Montréal institutional review boards.
Columbia Biobank: Recruitment and sequencing of participants from the Columbia COVID-19 Biobank were approved by the Columbia University Institutional Review Board (IRB) protocol AAAS7370 and the genetic analyses were approved under protocol AAAS7948.
A subset of patients was included under a public health crisis IRB waiver of consent specifically for COVID-19 studies if patients were deceased, were unable to consent, or could not be contacted by the study team, as per IRB protocol AAAS7370.
DeCOI. Informed consent was obtained from each participant or the legal representative. DeCOI received ethical approval by the Ethical Review Board (ERB) of the participating hospitals/centres (Technical University Munich, Munich, Germany; Medical Faculty Bonn, Bonn, Germany; Medical Board of the Saarland, Germany; University Duisburg-Essen, Germany; Medical Faculty Duesseldorf, Duesseldorf, Germany).
FHoGID. Each participant or their legal representative provided informed consent to the biobank. FHoGID received ethical approval by the Commission cantonale d'éthique de la recherche sur l'être humain.
GEN-COVID multicenter study: The patients were informed of this research and agreed to it through the informed consent process. The GEN-COVID is a multicentre academic observational study that was approved by the Internal Review Boards (IRB) of each participating centre (protocol code 16917, dated March 16, 2020 for GEN-COVID at the University Hospital of Siena).
Genentech. The protocol was reviewed by the institutional review board or ethics committee at each site. Written informed consent was obtained from all the patients or, if written consent could not be provided, the patient's legally authorized representative could provide oral consent with appropriate documentation by the investigator. Details on institutional review boards are provided in S9 Table.
GenOMICC. GenOMICC was approved by the appropriate research ethics committees (Scotland, 15/SS/0110; England, Wales and Northern Ireland, 19/WM/0247). Informed consent was obtained for all participants.
Geisinger Health Systems: All subjects consented to participation and the analysis was approved by the Geisinger Institutional Review Board under project number 2006–0258.
Helix Exome+ and Healthy Nevada Project COVID-19 Phenotypes: Informed consent was obtained for all participants. The Healthy Nevada Project study was reviewed and approved by the University of Nevada, Reno Institutional Review Board (IRB, project 956068–12).
Thai Biobank. Informed consent was obtained for each participant via the biobank. The study was approved by the Institutional Review Board of the Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand (COA No. 691/2021).
Japan COVID-19 Task Force: Each participant or their legal representative (if the participant was incapable of consenting) provided informed consent to the biobank. The study was approved by the ethical committees of Keio University School of Medicine, Osaka University Graduate School of Medicine, and affiliated institutes.
Interval WGS. After reading study leaflets and participating in a discussion with donor carer staff, eligible donors were asked to complete the trial consent form before giving a blood donation. The National Research Ethics Service (United Kingdom) approved this study.
MNM Diagnostics (Polish Covid WGS). All participants (or their guardians/parents, for participants under 18) provided their informed consent before their blood samples were collected. The study was approved by the Institutional Ethics Committee of the Central Clinical Hospital of the Ministry of Interior and Administration in Warsaw, Poland (decision nr: 41/2020 from 03.04.2020 and 125/2020 from 1.07.2020).
MSCIC. This research protocol was reviewed and approved by the Icahn School of Medicine at Mount Sinai Institutional Review Board (IRB) (STUDY-20-00341). During the height of the SARS-CoV-2 pandemic in New York City, all patients admitted to the Mount Sinai Health System were made aware of the research study by a notice included in their admission paperwork. The notice outlined details of the planned research, potential specimen collection and the opportunity to opt out of research.
Flyers announcing the study were also posted throughout the health system. Given the monumental hurdles of consenting sick and infectious patients in isolation rooms, the IRB allowed specimen collection to occur at the time of clinical blood collection, before research consent was obtained. Patients and/or their legally authorized representatives provided consent to the research study, including genetic profiling for research and data sharing at the individual level. For a subset of individuals who were unreachable following hospital discharge, we were unable to obtain written informed consent; in these cases, data cannot be shared further. All data used in these analyses were anonymized, as above.
Penn Medicine. Recruitment of PMBB participants was approved under IRB protocol 813913 and supported by the Perelman School of Medicine at the University of Pennsylvania.
POLCOVID-Genomika. All study participants provided written informed consent and received detailed information on the study and its associated risks before enrollment. The study was approved by the Bioethics Committee of the Medical University of Bialystok.
Qatar Genome Program. All QBB participants signed an Informed Consent Form prior to their participation. Ethical approval for the QBB study protocol was obtained from the Hamad Medical Corporation Ethics Committee in 2011 and continued with the QBB Institutional Review Board (IRB) from 2017 onwards; it is renewed on an annual basis.
Saudi Human Genome Program. Informed consent was obtained from each participant or their legal guardian (if the participant could not consent) by the corresponding institute. This study was approved by the IRB of each participating hospital and by the IRB at King Abdullah International Medical Research Centre, Ministry of National Guard–Health Affairs, Riyadh, Ministry of Health, and King Fahad Medical City.
Swedish Biobank. Informed consent was obtained for all study participants.
The study was approved by the National Ethical Review Agency (Sweden) (No. 2020–01623).
UK Biobank. All subjects consented to participation. The UK Biobank was approved by the North West Multi-centre Research Ethics Committee (United Kingdom) (11/NW/0382). The work described herein was approved by the UK Biobank under application no. 26041.
University of California, Los Angeles biobank. Each participant or their legal representative (if the participant was unable to consent) provided informed consent to the biobank. If a participant regained the capacity to consent, informed consent was obtained again directly from the participant. This study was considered exempt human subjects research because all genetic data and electronic health records were de-identified. This study was approved by the UCLA Health Institutional Review Board.
Vanda COVID-19. All participants consented to WGS. The study was reviewed and approved by Advarra IRB (Pro00043096).
COVID-19 outcome phenotypes
For all analyses, we used three case-control definitions:
A) Severe COVID-19: cases were those who died or who required mechanical ventilation (including extracorporeal membrane oxygenation), high-flow oxygen supplementation, new continuous positive airway pressure ventilation, or new bilevel positive airway pressure ventilation.
B) Hospitalized COVID-19: cases were all those who died or were admitted with COVID-19.
C) Susceptibility to COVID-19: cases were anyone who tested positive for COVID-19, self-reported an infection with SARS-CoV-2, or had a mention of COVID-19 in their medical record.
For all three, controls were individuals who did not meet the case definition, including population controls whose case status was unknown (given that most patients are neither admitted with COVID-19 nor develop severe disease [53]). These three analyses are referred to as analyses A2, B2, and C2, respectively, by the COVID-19 Host Genetics Initiative [8].
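As an illustration only, the three case definitions can be written as a small Python function. This is a minimal sketch, not the HGI phenotyping code: the `Participant` fields and the `is_case` helper are hypothetical names, and each cohort derives these flags from its own records. Anyone who does not meet a definition, including population controls with unknown status, is treated as a control.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical per-participant flags; real cohorts derive these from their own records.
    covid_documented: bool = False            # positive test, self-report, or COVID-19 in medical record
    hospitalized_with_covid: bool = False     # admitted with COVID-19
    died: bool = False                        # died with COVID-19
    severe_respiratory_support: bool = False  # ventilation/ECMO, high-flow O2, new CPAP or new BiPAP


def is_case(p: Participant, analysis: str) -> bool:
    """Return True if p is a case for analysis "A2", "B2" or "C2"; False means control."""
    severe = p.died or p.severe_respiratory_support     # A) severe COVID-19
    hospitalized = p.died or p.hospitalized_with_covid  # B) hospitalized COVID-19
    if analysis == "A2":
        return severe
    if analysis == "B2":
        return hospitalized
    if analysis == "C2":
        # Assumption: severe or hospitalized COVID-19 also implies a documented infection.
        return p.covid_documented or hospitalized or severe
    raise ValueError(f"unknown analysis: {analysis!r}")
```

Because controls are simply "not a case", a population-biobank participant with no COVID-19 information contributes as a control to all three analyses.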
Cohort inclusion criteria and genetic sequencing
Any cohort with access to genetic sequencing data and the associated patient-level phenotypes was eligible for this study. Both whole-genome and whole-exome sequencing were allowed, with no restriction on the sequencing platform. There was no minimum number of cases or controls required for inclusion. However, the first step of Regenie, which was used to perform all tests (see below), uses a polygenic risk score, which implicitly requires a minimum sample size (depending on the phenotype and the observed genetic variation). Hence, cohorts were included if they were able to complete this step. All cohorts obtained approval from their respective institutional review boards, and informed consent was obtained from all participants. More details on each cohort’s study design and ethics approval can be found in S3 and S1 Tables.
Variant calling and quality control
Variant calling was performed locally by each cohort, with the prerequisite that cases and controls must not be joint-called separately. Quality control was also performed by each cohort according to its own needs. However, a general quality control framework was made available using the Hail software [54]. This included variant normalization and left alignment to a reference genome, and removal of samples with a call rate below 97% or a mean depth below 20. Genotypes were set to unknown if they had a genotype quality below 20, a depth below 10, or poor allele balance (more than 0.1 for homozygous reference calls, less than 0.9 for homozygous alternative calls, and either below 0.25 or above 0.75 for heterozygous calls).
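The sample- and genotype-level thresholds above can be sketched in plain Python for a single sample and a single call. This is a hypothetical illustration of the stated cut-offs, not the Hail framework that was distributed to cohorts; it assumes allele balance is the fraction of alternate-allele reads among reference plus alternate reads, and the function names are made up for the example.

```python
from typing import Optional

MIN_SAMPLE_CALL_RATE = 0.97   # samples below a 97% call rate are removed
MIN_SAMPLE_MEAN_DEPTH = 20    # samples with mean depth below 20 are removed
MIN_GQ = 20                   # genotypes with GQ < 20 are set to unknown
MIN_DP = 10                   # genotypes with depth < 10 are set to unknown


def sample_passes(call_rate: float, mean_depth: float) -> bool:
    """Keep a sample only if both its call rate and mean depth meet the thresholds."""
    return call_rate >= MIN_SAMPLE_CALL_RATE and mean_depth >= MIN_SAMPLE_MEAN_DEPTH


def allele_balance(ref_reads: int, alt_reads: int) -> Optional[float]:
    """Assumed definition: alternate reads / (reference + alternate reads)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total > 0 else None


def filter_genotype(alt_count: Optional[int], gq: int, dp: int,
                    ref_reads: int, alt_reads: int) -> Optional[int]:
    """Return the alternate-allele count (0, 1 or 2), or None if the call is set to unknown."""
    if alt_count is None or gq < MIN_GQ or dp < MIN_DP:
        return None
    ab = allele_balance(ref_reads, alt_reads)
    if ab is None:
        return None  # no informative reads: treated as unknown (an assumption)
    if alt_count == 0 and ab > 0.1:                # homozygous reference
        return None
    if alt_count == 2 and ab < 0.9:                # homozygous alternative
        return None
    if alt_count == 1 and not 0.25 <= ab <= 0.75:  # heterozygous
        return None
    return alt_count
```

Setting a failing call to unknown (rather than dropping the variant) keeps the site for other samples; variant-level removal is a separate, later step.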
Finally, variants were removed if their mean genotype quality was less than 11, their mean depth was less than 6, their call rate was less than or equal to 0.8, or their Hardy-Weinberg equilibrium p-value was less than or equal to 5×10⁻⁸ (10⁻¹⁶ for single variant association tests). Details on variant calling and quality control for each cohort are described in S3 Table.
Single variant association tests
We performed single variant association tests using an additive GWAS model with the following covariates: age, age², sex, age×sex, age²×sex, and 10 genetic principal components obtained from common genetic variants (MAF > 1%). Each cohort performed its analyses separately for each genetic ancestry and restricted its variants to those with MAF > 0.1% and MAC > 6. Summary statistics were then meta-analyzed using a fixed-effect model within each ancestry and a DerSimonian-Laird random-effects model across ancestries, with the Metal package [55] and its random-effects extension [56]. Lastly, because multiple sequencing technologies were used, and whole-exome sequencing can yield lower-quality variant calls in its off-target regions [57], we used the UKB, GHS, and Penn Medicine whole-exome sequencing variants as our “reference panel” for whole-exome sequencing. Hence, only variants reported in at least one of these biobanks were used in the final single variant analyses.
Variant exclusion list
For the burden tests, we also compiled lists of variants that had a MAF above 1% or above 0.1% in any of the participating cohorts. These lists were used to filter out variants that were less likely to have a true deleterious effect on COVID-19, even if they were considered rare in other cohorts or in reference panels [25]. We created two such variant exclusion lists: one for the burden tests with variants of MAF below 1%, and the other for the analysis with MAF below 0.1%.
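Building the two exclusion lists can be sketched as follows. This is a minimal illustration assuming each cohort shares, per variant, its minor allele count and MAF; the data layout and the function names are hypothetical, and the per-cohort rule mirrors the minor allele count and MAF criteria given in this section.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical per-cohort summary: variant ID -> (minor allele count, minor allele frequency)
CohortCounts = Dict[str, Tuple[int, float]]

MIN_MAC = 6  # a variant must be seen at least this many times in a cohort to be excluded


def build_exclusion_list(cohorts: Iterable[CohortCounts], maf_threshold: float) -> Set[str]:
    """Union over cohorts of variants with MAC >= MIN_MAC and MAF > maf_threshold."""
    excluded: Set[str] = set()
    for cohort in cohorts:
        for variant_id, (mac, maf) in cohort.items():
            if mac >= MIN_MAC and maf > maf_threshold:
                excluded.add(variant_id)
    return excluded


def build_both_lists(cohorts: Iterable[CohortCounts]) -> Dict[float, Set[str]]:
    """One list for the MAF < 1% burden tests and one for the MAF < 0.1% tests."""
    cohort_list = list(cohorts)  # materialize so the input can be scanned twice
    return {0.01: build_exclusion_list(cohort_list, 0.01),
            0.001: build_exclusion_list(cohort_list, 0.001)}


def drop_excluded(variant_ids: List[str], excluded: Set[str]) -> List[str]:
    """Remove shared-list variants before building burden masks."""
    return [v for v in variant_ids if v not in excluded]
```

Taking the union across cohorts means a single cohort in which a variant is common is enough to remove it everywhere, even where it looks rare.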
In any cohort, if a variant had a minor allele count of 6 or more and a MAF above 1% (or 0.1%), it was added to the corresponding exclusion list. These lists were then shared with all participating cohorts, and all variants they contained were removed from the burden tests.
Gene burden tests
The following analyses generally followed the methods used in recent large-scale whole-exome sequencing studies [22] and by the COVID-19 HGI [8]. The burden tests were performed by pooling variants into three variant sets (called masks), as described in recent UK Biobank whole-exome sequencing papers by Backman et al. [22] and Kosmicki et al. [17]: “M1”, which included loss-of-function variants, defined as high-impact variants in the Ensembl database [23] (i.e., transcript ablation, splice acceptor variant, splice donor variant, stop gained, frameshift variant, stop lost, start lost, and transcript amplification); “M3”, which included all variants in M1 as well as moderate-impact indels and any missense variants predicted to be deleterious by all of the in-silico pathogenicity prediction scores used; and “M4”, which included all variants in M3 as well as all missense variants predicted to be deleterious by at least one of the in-silico pathogenicity prediction scores used. For in-silico prediction, we used the following five tools: SIFT [58], LRT [59], MutationTaster [60], PolyPhen2 [61] with the HDIV database, and PolyPhen2 with the HVAR database. Protein-coding variants were collapsed onto canonical gene transcripts. Once variants were collapsed into genes for each participant, each gene was given, for each mask, a score of 0 if the participant carried no variants in the mask, 1 if the participant carried one or more heterozygous variants (and no homozygous variants) in the mask, and 2 if the participant carried one or more homozygous variants in the mask. These scores were used as regressors in logistic regression models for the three COVID-19 outcomes above.
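The nested mask definitions and the 0/1/2 gene score can be illustrated with the sketch below. It is not Regenie's implementation (Regenie builds masks from user-supplied annotation and mask-definition files); the `Variant` fields are hypothetical, consequence names follow Ensembl Sequence Ontology terms, variants are assumed to be annotated on the canonical transcript and already filtered for MAF and the exclusion lists, and "deleterious by all scores" is taken to mean all five tools listed above.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Ensembl high-impact consequences used to define loss-of-function (M1) variants.
LOF_CONSEQUENCES = {
    "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
    "transcript_amplification",
}
N_PREDICTORS = 5  # SIFT, LRT, MutationTaster, PolyPhen2-HDIV, PolyPhen2-HVAR


@dataclass
class Variant:
    gene: str               # canonical-transcript gene symbol
    consequence: str        # Ensembl consequence term
    impact: str             # Ensembl impact class, e.g. "HIGH" or "MODERATE"
    n_deleterious: int = 0  # how many of the five predictors call it deleterious
    is_indel: bool = False


def masks_for(v: Variant) -> Set[str]:
    """Return the masks a variant qualifies for; M1 is a subset of M3, M3 of M4."""
    masks: Set[str] = set()
    is_missense = v.consequence == "missense_variant"
    if v.consequence in LOF_CONSEQUENCES:
        masks |= {"M1", "M3", "M4"}
    if (v.is_indel and v.impact == "MODERATE") or (is_missense and v.n_deleterious == N_PREDICTORS):
        masks |= {"M3", "M4"}
    if is_missense and v.n_deleterious >= 1:
        masks.add("M4")
    return masks


def gene_score(genotypes: Dict[str, int], variants: Dict[str, Variant], gene: str, mask: str) -> int:
    """0 = no qualifying variant, 1 = heterozygous only, 2 = at least one homozygous variant.

    `genotypes` maps variant ID -> alternate-allele count (0, 1 or 2) for one participant.
    """
    score = 0
    for variant_id, alt_count in genotypes.items():
        v = variants[variant_id]
        if v.gene == gene and mask in masks_for(v):
            score = max(score, alt_count)  # the highest alternate-allele count wins
    return score
```

Taking the maximum alternate-allele count across qualifying variants reproduces the 0/1/2 scoring described above; a participant with two different heterozygous variants in the same gene still scores 1.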
The burden test regressions were also adjusted for age, age², sex, age×sex, age²×sex, 10 genetic principal components obtained from common genetic variants (MAF > 1%), and 20 genetic principal components obtained from rare genetic variants (MAF < 1%). The Regenie software [18] was used to perform all burden tests and to generate the scores above. Regenie uses Firth penalized likelihood to handle rare or unbalanced events, reducing bias in the effect estimates. All analyses were performed separately for each of six genetic ancestries (African, Admixed American, East Asian, European, Middle Eastern, and South Asian), and summary statistics were meta-analyzed as for the single variant analysis. Participant assignment to genetic ancestry was done locally by each cohort; more details on the methods can be found in S3 Table. Lastly, we used ACAT [35] to combine p-values across masks, within each phenotype separately. ACAT is robust to correlation between the tests being combined. These combined p-values were used to draw the Manhattan and QQ plots in Fig 2.
Acknowledgments
We thank the patients who volunteered for all participating cohorts, and the researchers and clinicians who enrolled them in the respective studies. A full list of acknowledgments can be found in S1 and S2 Tables.
[1] Url: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010367 Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.