(DIR) Home
        
        
       Redefining the treponemal history through pre-Columbian genomes from
       Brazil
        
 (HTM) Source
        
       ----------------------------------------------------------------------
        
       ### Inclusion and ethics
        
       Genetic studies of ancient human diseases shed light on how past
       populations thrived and dealt with health problems, which may trigger
       concerns such as stigmatization due to diseases or rights and legal
       issues among people living today. Historical injustices, colonization,
       and dispossession have often complicated indigenous communities'
       ability to assert and maintain their territorial rights in a legal or
       administrative framework. It is therefore crucial to consider, besides
       the scientific aspects, also the perspectives of living (indigenous)
       communities and people when carrying out this work55.
        
       Here we study human remains of fully anonymous individuals who died
       more than 1,000 years ago and were buried in the archaeological site
       Jabuticabeira II, in the municipality of Tubarao in Santa Catarina
       state, Brazil. This site was excavated by P. de Blasis and team56,
       funded by Fundação de Amparo à Pesquisa do Estado de São Paulo
       (FAPESP). and a research permit was obtained from the Brazilian
       National Institute of Historical and Artistic Heritage (IPHAN),
       according to the correspondence 1793/2019 GAB PRESI-IPHAN of the
       process 01506.000720/2019-65 by K. Santos Bogia. The use of the
       samples of the remains for this study has been also approved by P. de
       Blasis, custodian of the Jabuticabeira II collections at the Museum of
       Archeology and Ethnology of the University of Sao Paulo. The human
       remains have been curated, studied and sampled by S.E. and team at the
       University of Sao Paulo until 2016 and thereafter at the Natural
       History Museum of Vienna.
        
       The territories and sites spanning across Rio Grande do Sul, Santa
       Catarina and Paraná are inherent to the ancestral heritage of the
       Kaingang, Guarani and Xokleng communities (also referred to as the
       'Sun people' or 'coastal group'), who are still living in the region
       today. These societies have not only utilized the region for
       mobilization and migration in search of food supply, but also
       traditionally traversed significant distances, leaving a trail of
       cultural imprints, particularly in the domain of funerary practices.
       In a previous study57, samples of five of the individuals exhumed at
       Jabuticabeira II were studied, revealing some genetic affinity with
       the Kaingang (a Ge-speaking group of Southern Brazil). However, to the
       best of our knowledge, the Kaingang are not seen as direct descendants
       of sambaqui societies, nor do they identify with the people who once
       dwelled at Jabuticabeira II or request their remains. Finally,
       research in the Instituto Socioambiental
       (https://www.socioambiental.org/; for the defence of Brazilian socio-
       environmental diversity, including Indigenous Rights) states that the
       region around Jabuticabeira II is not part of any Indigenous reserve,
       nor are there claims of groups for territorial rights of this region
       or for the archaeological remains of this site (P. de Blasis, personal
       communication).
        
       Degenerative processes, often resulting from contexts of
       marginalization, conflict and displacement, bear witness to the impact
       of historical relationships of the indigenous groups with colonizers
       and invaders. The afflictions and diseases experienced by these groups
       carry historical and environmental ramifications of notable
       significance, warranting explicit acknowledgment and examination.
       Regarding possible stigmatization of local (indigenous) communities
       and persons affected with bejel, it must be stressed that this
       contagious disease is an endemic, mostly non-sexually transmitted
       disease, common in hot regions where people live in close contact to
       each other, have no need for especially covering clothing, and share
       utensils. Today bejel, which can lead to stigmatization owing to
       disfiguring wounds, occurs especially in east-mediterranean and west-
       African communities with limited access to modern medical care.
       Although the World Health Organization realizes the importance of
       actions taken to eradicate bejel worldwide since 1949 (WHA2.36 Bejel
       and other Treponematoses
       (https://www.who.int/publications/i/item/wha2.36)), the disease is not
       seen as a current public health issue in Brazil, as it is for some
       other countries58,59. This stands in contrast to the high prevalences
       of sexually transmitted diseases such as HIV and venereal syphilis,
       which affect the indigenous communities in Brazil (Em São Paulo, ação
       em aldeias promove debate e testagem rápida de HIV e Sífilis —
       Fundação Nacional dos Povos Indígenas (http://www.gov.br/funai/pt-
       br/assuntos/noticias/2019/em-sao-paulo-acao-em-aldeias-promove-debate-
       e-testagem-rapida-de-hiv-e-sifilis)). It is, however, notable that, in
       the archaeological context, nothing implied that those prehistoric
       people of Jabuticabeira II with the local treponematosis would have
       been discriminated against in their time and culture.
        
       Furthermore, the culturally insensitive descriptions in palaeogenomic
       research articles are an ethical issue of concern60. To ensure
       discretion, we curated for potentially insensitive or discriminatory
       expressions within the manuscript. Importantly, we had the invaluable
       help of E. Krenak, Cultural Survival's lead on Brazil, Indigenous
       activist and PhD candidate at the University of Vienna, to critically
       analyse our texts and provide advice in ethically correct and fair use
       of terminology.
        
       ### Archaeological information
        
       #### The sambaquis of the Laguna region
        
       A sambaqui is the prevalent type of archaeological site on the
       Brazilian coast: a human-built shell midden or shell mound of varying
       dimensions, located in rich resource areas such as lagoons, mangroves
       or estuaries. Sambaquis consist of inorganic sediment, mollusc shells,
       food debris and organic matter mixed in intricate stratigraphies
       associated with domestic and/or funerary functions61. More than 1,000
       sambaquis are mapped along the 7,500 km long Brazilian coast and are
       dated to between 7,500 and 1,000 yr bp61,62. Recent archaeological
       research suggests that these shell mound-building populations were
       sedentary, with an abundant and stable marine-based subsistence,
       horticulture, high population growth61,63, elaborate funerary
       rituals64 and landscape appropriation56.
        
       #### Jabuticabeira II excavation site
        
       Jabuticabeira II (UTM 22 J − 0699479E; 6835488 S) is a medium-sized
       shell mound (400 × 250 × 10 m in height), settled on a palaeodune and
       located in the Laguna region, the highest density area of sambaquis
       from the southern Brazilian coast, 3 km from Laguna do Camacho, one of
       several water sources associated with a barrier-lagoon geological
       system formed during the Holocene (Fig. 1). Jabuticabeira II, built
       during a nearly 1,000-year period, is one of 65 sambaquis mapped
       around the lagoon system. This large number of settlements and their
       chronologically overlapping occupation history attest to a fairly
       dense occupation and intense interactions of the sambaqui builders
       between 7,500 and 900 calibrated years (cal yr) bp56. According to
       stratigraphic studies, Jabuticabeira II is the result of incremental
       funerary rituals accumulated over centuries. Although Jabuticabeira II
       was not completely excavated, 204 burials containing the remains of
       282 individuals were exhumed from a 373 m2 area64. Radiocarbon dates
       of Jabuticabeira II stratigraphy56,64 suggest a long occupation period
       between 1214-830 cal bc and 118-413 cal ad or 3137-2794 to 1860-1524
       cal yr bp (2σ), roughly in line with the radiocarbon datings from bone
       material of the four individuals in this study, ranging from 350 cal
       bc to 573 cal ad.
        
       The human remains from the Jabuticabeira II sambaqui were found in
       single, double, and multiple burials, dispersed in clusters. The
       skeletons recovered were mostly incomplete, avoiding categorical
       estimations of age and sex or other osteological findings. The burial
       pattern was tightly flexed and suggested intentional treatment of the
       body prior to the internment. The small size of the graves suggested
       that the bodies suffered previous desiccation or decomposition of soft
       tissues, but not enough to produce complete disarticulation (hand and
       feet bones were found articulated). Many burials come from profiles
       and are incomplete. The bones of several individuals are stained with
       red ochre65, a common practice in archaeological sites of the Santa
       Catarina state66,67. Offerings are common in sambaqui burial contexts
       and include adornments made with faunal material and lithic tools in a
       wide range of forms, from debris to polished tools and zooliths, with
       differences in frequency of occurrence among different sites and
       strata68. The most common offering in Jabuticabeira II was fish.
        
       Altogether, 99 Jabuticabeira II individuals, with and without bone
       alterations suggestive of infection, were screened for pathogen DNA
       content. 37 samples deemed positive for treponemal DNA in the initial
       screening and four samples yielded sufficient data for _T. pallidum_
       genome reconstruction (Supplementary Table 1).
        
       #### Palaeopathological analysis of treponematoses
        
       Bioarchaeological analyses showed results compatible with increasing
       population growth and high population density in Jabuticabeira II,
       including high frequencies of nonspecific stress markers69 and
       occasional infant stress70, but no evidence of trauma associated with
       interpersonal conflicts over resources or territory69.
        
       There is, however, evidence of communicable systemic diseases in
       Jabuticabeira II and other local Brazilian sambaquis30. Eleven 14C
       accelerator mass spectrometry dates obtained directly from the
       presumably treponematosis-affected individuals suggest that these
       diseases are very old on the east coast of South America, with a time-
       range between 6,300 and 500 yr BP. Among the possible treponemal cases
       based on osteological analysis, three came from Jabuticabeira II.
       However, these did not overlap with the individuals yielding the
       detected genetic evidence in this study.
        
       #### Information on individuals
        
       ### Individual 41A-L2.05-E4, sample ZH1390
        
       The individual is an adult male of robust build, with an estimated
       stature of 150.49 ± 2.6 cm (ref. 70). Although fragmented, the bones
       of this individual comprised an almost complete skeleton (80%),
       articulated and buried in an oval shell-rich matrix in a hyper-flexed
       position. The bones of the individual showed signs of systemic
       infectious disease in the lower limbs. Femurs, tibias, and fibulas all
       show discrete generalized periostitis and osteoarthrosis. A widening
       in the lateral portion of clavicles was also observed. According to
       Filippini et al.30, applying the SPIRAL method71, this individual's
       disease could be classified non-conclusively as syphilis, yaws or
       bejel. The sampling was performed on an active lesion on the tibia
       fragment.
        
       ### Individual FS9-L3-T2, sample ZH1540
        
       The sample comes from an assemblage of commingled bones, of probably
       more than one individual. The bones assigned to this individual
       consist of several skeletal elements, some with pathological
       alterations, such as severe osteomyelitis in the distal third of the
       right humerus, severe periostitis in the left ulna, periostitis in a
       fibula diaphysis, and two vertebral bodies with osteophytosis. The
       sample was taken from the fibula fragment, in the area with
       periostitis.
        
       ### Individual FS3B-L3-T4, sample ZH1541
        
       The sample comes from one of three separate individuals, found
       commingled. The skeletal elements belonging to this robust adult of
       unknown age and sex include a left radius with arthritis, a fragment
       of the left ulna (very robust), a fragment of the left humerus,
       fragments of a femur, a tibia, and a fibula and a first metatarsal.
       The sample was taken from a femur fragment, under the immediate
       surface of the bone, to best avoid the possible introduction of
       external contaminants.
        
       ### Individual 2B-L6-E3, ZH1557
        
       The sample comes from a probably adult male individual. The individual
       was articulated and in a flexed position with another, adult female
       individual buried on top. Osteopathological findings on the bones of
       the sampled individual included signs of degenerative joint disease,
       severe lumbar intervertebral osteoarthritis, scoliosis, and possible
       injuries to the patellae. However, no typical lesions suggestive of
       treponemal infection were observed. The sample was taken from a small
       piece of long bone, under the immediate surface of the bone, to best
       avoid the possible introduction of external contaminants.
        
       #### Marine reservoir effect correction for 14C dating
        
       Radiocarbon dating was performed by the Laboratory of Ion Beam Physics
       at ETH Zurich (laboratory number: ETH-127328) using bone collagen
       purified by a modified ultrafiltration method72. Data calibration was
       done with OxCal v4.4.4. The diet of the Jabuticabeira II inhabitants,
       substantially consisting of marine food sources, produces a reservoir
       effect in the radiocarbon dates calculated as mean age of 247.8 ( _σ_
       = 103.7) years73. Considering the high contribution of marine carbon
       to bone collagen of individuals in Jabuticabeira II, the radiocarbon
       dates were modelled with Calib Rev 8.2074
       (http://calib.org/calib/calib.htm) using the Mixed Marine SHCal20
       calibration curve75,76 and applying the estimated average local marine
       radiocarbon reservoir correction value (Δ _R_ ) of −126 ± 29 for the
       South coast of Brazil (Marine Reservoir Correction database)73,77. We
       considered the average relative contribution of marine carbon to
       collagen derived from Bayesian Mixing Models for Jabuticabeira II
       individuals, calculated at a mean value of 42.5%78,79. For the
       individual estimates for the samples, see Supplementary Table 2.
        
       #### Sample processing
        
       Samples were documented and carried through sampling, DNA extraction,
       library preparation and library indexing in facilities dedicated to
       ancient DNA work at the University of Zurich, including
       decontamination of samples, laboratory equipment and reagents with UV
       irradiation and using protective clothing and minimum contamination-
       risk working methods.
        
       All post-amplification steps were performed in the regular laboratory
       facility available for the Paleogenetics Group at the Institute of
       Evolutionary Medicine (IEM), University of Zurich (UZH). DNA
       sequencing was performed at the Next Generation Sequencing facility of
       the Vienna BioCenter Core Facilities (VBCF) or at the Functional
       Genomics Center at the University of Zurich (FGCZ).
        
       #### Ancient DNA extraction
        
       All sample surfaces were irradiated with ultraviolet light to minimize
       potential contamination from modern DNA. The bone powder was obtained
       using a dental drill and diamond head drill bits. DNA extraction was
       performed on around 50-100 mg of bone powder, according to a well-
       established extraction protocol for ancient DNA80. Negative controls
       for extraction and library processes were processed in parallel
       through all experiments, one control per ten samples, sequenced and
       bioinformatically compared to their corresponding sample batches, as
       precaution against possible contamination.
        
       #### Library preparation
        
       Double stranded DNA libraries were produced for initial screening with
       shotgun sequencing, without UDG treatment (that is, chemical treatment
       aiming to limit age-related damage in the DNA). Two additional
       libraries for each of the potentially positive samples from the first
       round of capture were produced to maximize the DNA complexity. For the
       preparation of DNA libraries, 20 µl of DNA extract was converted into
       double stranded DNA libraries31. Sample-specific barcodes (indexes)
       were added to both ends of the DNA fragments in the libraries81. The
       indexed libraries were then amplified to reach a minimum DNA
       concentration of approximately 90 ng ml−1. The amplification was
       performed using 1× Herculase II buffer, 0.4 mM IS5 and 0.4 mM IS6
       primer81, Herculase II Fusion DNA Polymerase (Agilent Technologies),
       0.25 mM dNTPs (100 mM; 25 mM each dNTP) and 5 ml indexed library as
       DNA template. Four reactions per library were prepared and the total
       amplification reaction volume was 100 ml. The thermal profile included
       an initial denaturation for 2 min at 95 °C and 3-18 cycles, depending
       on DNA concentration after indexing of the libraries, denaturation for
       30 s at 95 °C, 30 s annealing at 60 °C and a 30 s elongation at 72 °C,
       followed by a final elongation step for 5 min at 72 °C. All splits of
       one indexed library were pooled and purified using the QIAGEN MinElute
       PCR purification kit. DNA libraries were then quantified with D1000
       ScreenTape on an Agilent 2200 TapeStation (Agilent Technologies) and
       combined in equimolar pools for sequencing.
        
       #### Pathogen screening
        
       Shotgun data were used for an initial screening of the 99 candidate
       samples, with Kraken2 software82, and 41 samples that had more than 7
       hits to _T. pallidum_ were selected for target enrichment. The samples
       selected were subjected to a target enrichment process and
       subsequently processed by FastQ Screen v0.15.183 to check the number
       of mapped reads against three representative high-quality reference
       genomes of _T. pallidum_ subspecies (CDC2, BosniaA and Nichols). The
       nine most promising samples (>5,000 Kraken hits to _T. pallidum_ after
       first round of in-solution capture), were turned into two extra
       libraries and re-captured as explained in detail in the following
       sections.
        
       #### Target enrichment for _T. pallidum_ DNA
        
       Genome-wide enrichment of double stranded libraries was performed
       through custom target enrichment kits (Arbor Bioscience). RNA baits
       with a length of 60 nucleotides and a 4 bp tiling density were
       designed based on three reference genomes: Nichols (CP004010.2), SS14
       (CP000805.1), Fribourg-Blanc (CP003902). 500 ng library pools were
       enriched according to the manufacturer's instructions. Captured
       libraries were amplified in 100 µl reactions containing 1 unit
       Herculase II Fusion DNA polymerase (Agilent), 1× Herculase II reaction
       buffer, 0.25 mM dNTPs, 0.4 mM primers IS5 and IS681 and 15 µl library
       template, with the following thermal profile: initial denaturation at
       95 °C for 2 min, 14 cycles of denaturation at 95 °C for 30 s,
       annealing at 60 °C for 30 s, and elongation at 72 °C for 30 s,
       followed by a final elongation at 72 °C for 5 min. Captured libraries
       were purified with MinElute spin columns (QIAGEN) and quantified with
       a D1000 High Sensitivity ScreenTape on an Agilent 2200 TapeStation.
        
       #### Sequencing
        
       For both shotgun data retrieval and after the capture processing, the
       samples were pooled in unimolar quantity (for SG sequencing up to 50
       samples per pool, and for the capture process 2-8 samples per pool),
       and sequenced on an Illumina NextSeq500 with 2 × 75 + 8 + 8 cycles
       using the manufacturer's protocols for multiplex sequencing at the
       Functional Genomics Center in Zurich or at the Vienna BioCenter Core
       Facilities.
        
       ### Statistical analyses
        
       #### Dataset selection
        
       We assembled a genomic dataset comprising 98 publicly available _T.
       pallidum_ genomes (8 TEN, 30 TPE and 60 TPA) from previously published
       studies (including 8 ancient genomes), and the newly generated ZH1540
       genome. The genomes represent the genetic variation of the three known
       subspecies of _T. pallidum_ (TPA, TPE and TEN) available by December
       2022, and were selected with a focus on TEN and TPE, because of their
       proximity to the new ancient genome classified as TEN.
        
       Published data for the modern genome dataset in this study are
       available at the European Nucleotide Archive (ENA) database:
       PRJNA313497 (accession numbers: SRR3268682, SRR3268724, SRR3268715,
       SRR3268694, SRR3268696, SRR3268709, SRR3268710), PRJEB11481 (accession
       numbers: ERR1470343, ERR3596780, ERR3596747, ERR3596783), PRJEB28546
       (accession numbers: ERR4045394, ERR3684452, ERR3684456, ERR3684465,
       SRR13721290, ERR4853530, ERR4993349, ERR4853587, ERR4899206,
       ERR5207017, ERR5207018, ERR5207019, ERR4899215, ERR4853623,
       ERR4853625), PRJNA508872 (accession numbers: SRR8501165, SRR8501164,
       SRR8501167, SRR8501166, SRR8501168, SRR8501171), PRJNA723099
       (accession numbers: SRR14277267, SRR14277266, SRR14277458,
       SRR14277444), PRJEB11481 (accession number: ERR1470331), PRJDB9408
       (accession numbers: DRR213712, DRR213718), PRJNA588802 (accession
       numbers: SRR10430858, SRS5636328), PRJNA322283 (accession number:
       SRR3584843), PRJNA754263 (accession numbers: SRR15440297, SRR15440150,
       SRR15440451, SRR15440240), PRJEB40752 (accession numbers: ERR4690809,
       ERR4690806, ERR4690810, ERR4690812, ERR4690811). Assembly files were
       used for 9 genomes from National Center for Biotechnology Information
       (NCBI) database: CP002375.1, CP002376.1, NC_016842.1, NC_017268.1,
       NC_018722.1, NC_021490.2, NC_021508.1, GCA_000813285.1, CP035193.1 and
       for 24 modern genomes from the European Nucleotide Archive (ENA):
       CP021113.1, CP073572.1, CP073557.1, CP073553.1, CP073536.1,
       CP073526.1, CP073490.1, CP073487.1, CP073470.1, CP073447.1,
       CP073446.1, CP073399.1, CP040555.1, LT986433.1, LT986434.1,
       CP032303.1, CP020366.1, CP024088.1, CP024089.1, CP078121.1,
       CP078090.1, CP081507.1, CP051889.1 and CP003902 .1. Raw sequence data
       (fastq files) used for 6 modern genomes is available at the NCBI
       database: PRJEB20795 (accession numbers: ERS1724928, ERS1724930,
       ERS1884567) and PRJNA343706 (accession numbers: SRR4308604,
       SRR4308606, SRR4308597). Previously published ancient treponemal
       genomes here used are available at the ENA: PRJEB37490 (accession
       number: ERR4065503), PRJEB37633 (accession number: ERR4000645),
       PRJEB35855, PRJEB21276 (accession numbers: ERS2470995, ERS2470994) and
       PRJEB62102. Detailed source information for the reference dataset is
       documented in Supplementary Table 3.
        
       We selected all eight publicly available TEN genomes, all of which
       have more than 99.4% genome coverage, with the exception of C7717
       (81.4%). We selected 30 TPE genomes (Supplementary Table 3). To
       represent each lineage or sublineage, we selected at least one genome,
       preferring the ones with the highest sequencing depth and genome
       coverage. All included TPE genomes have more than 95.3% genome
       coverage, except the four ancient TPE genomes: SJN003, AGU007, 133 and
       CHS119, displaying 97.4%, 92.7%, 57% and 62% genome coverage,
       respectively. Furthermore, 60 TPA genomes from the major lineages and
       sublineages described in previous studies were included (Supplementary
       Table 3). All of these genomes had more than 90% coverage, except the
       four ancient genomes, PD28, W86, SJ219 and 94B, all of which have
       genome coverage of 30% or more. All genomes in the dataset are
       separated from each other by at least 5 SNPs. The TPA strain
       Seattle-81 was excluded from the final dataset owing to mutations
       probably accumulated during extensive passaging in rabbits that can
       cause ambiguous placement in phylogenies4,16,36.
        
       The raw data and/or assembly files for each genome in our dataset were
       downloaded from the public databases: European Nucleotide Archive
       (ENA)84 and National Center for Biotechnology Information (NCBI)85.
       Accession numbers are given in Supplementary Table 3.
        
       #### Read processing and multiple reference-based genome alignment
       generation
        
       To reconstruct the individual genomes from the raw data, we carried
       out raw read quality control and preprocessing, removing duplicates,
       variant calling and filtering using the default parameters when not
       otherwise specified. After processing the de-multiplexed sequencing
       reads, sample sequencing quality was analysed with FastQC version
       0.11.983, filtering reads with a QC value < 25\. Following processing
       by cutadapt version 4.186 to remove the sequencing adapters, in order
       to reduce the reference bias, and improve the posterior phylogenetic
       inference and assignment87, the genome reference selection for mapping
       each sample was determined according to the results from the original
       manuscript where the genomes were published (see Supplementary Table
       3). The mapping was carried out by BWA mem88 (using parameters: -k 19,
       -r 2.5). Four reference genomes were used; the well-studied TEN and
       TPE genomes BosniaA (NZ_CP007548.1) and CDC2 (NC_016848.1), as well as
       the Nichols (NC_021490.2) and SS14 (NC_010741.1) genomes, representing
       the two main lineages within TPA. However, for the new ancient samples
       obtained here, genomes for each sample were reconstructed by mapping
       to three high-quality reference genomes, representing the three _T.
       pallidum_ subspecies (CDC2, BosniaA and Nichols).
        
       CleanSam, from Picard Toolkit version 2.18.29
       (http://broadinstitute.github.io/picard), was used to clean the
       provided SAM or BAM files. Duplicate reads were removed using
       MarkDuplicates, from Picard toolkit version 2.18.29.
       AddOrReplaceReadGroups, from Picard Toolkit version 2.18.29, was used
       to assign all the reads in a file to a single new read-group before
       using mapDamage version 2.2.0-86-g81d0aca89 to estimate the DNA damage
       parameters and rescale quality scores of probably damaged positions in
       the reads (using parameter: --rescale).
        
       After generating a text pileup output for the BAM files with the
       mpileup tool from Samtools version 1.790, SNPs were called using
       VarScan version 2.4.391 (using parameters: -p-value 0.01, -min-reads2
       1, -min-coverage 1, -min-freq-for-hom, 0.4 -min-var-freq 0.05,
       -output-vcf 1). Next, a SNP filtering was also carried out with
       VarScan (using for the modern samples parameters: -p-value 0.01, -min-
       reads2, 5 -min-coverage 10, -min-avg-qual 30 -min-freq-for-hom 0.4,
       -min-var-freq 0.9, -output-vcf 1; and modifying some parameters for
       the ancient samples because of their lower read coverage and quality:
       -p-value 0.01 -min-reads2 3, -min-coverage 5, -min-avg-qual 30, -min-
       freq-for-hom 0.4, -min-var-freq 0.9 -output-vcf 1). Additionally, all
       positions with less than 3 mapped reads were masked with Genomecov
       from Bedtools version 2.26.092 for modern and ancient samples. All
       steps of genome generation were visualized and manually confirmed with
       Tablet version 1.21.02.0893, checking each SNP one by one and
       discarding the possible spurious SNPs from the new ancient genome
       ZH1540. The resulting final sequences were obtained by maskfasta from
       Bedtools v2.26.0.
        
       Additionally, we used tested sequencing and posterior analysis
       methodologies17,42 to obtain higher coverage and more reliable modern
       _T. pallidum_ genomes. Where possible, assembly files were obtained
       rather than raw data (Supplementary Table 3). A multiple reference-
       based genome alignment for all sequences was generated in MAFFT
       v7.46794 (using parameters: --adjustdirection --auto --fastaout
       --reorder). However, due to the use of different genomic references,
       regions with low coverage for some genomes, corresponding mostly to
       _tpr_ and _arp_ genes, were reviewed and manually aligned with Aliview
       version 1.2595.
        
       The samples ZH1390, ZH1541, and ZH1557 had sufficient data to attempt
       a genome reconstruction and were determined to have the most SNPs in
       common with the TEN reference but they were excluded from downstream
       analyses due to the limited coverage acquired for each of them, which
       made the obtained SNPs less reliable. The sample ZH1540, however,
       yielded a remarkable 33.6× genomic coverage and was selected for
       subsequent in-depth analyses.
        
       Proteinortho version 6.0b96 (using parameters: -p=blastn -singles
       -keep) was used to conduct an orthology study in order to find
       orthologous genes in the four reference genomes used96. Each gene
       present in at least one of the four reference genomes had its genomic
       coordinates determined based on its location in the final merged
       alignment (see Supplementary Table 3).
        
       To verify the accuracy of the final multiple genome alignment, and
       that no protein-coding gene was inadvertently truncated, the protein
       translations for every gene present in at least one reference genome
       were compared to the original gff3 files of each of the four
       references (Supplementary Table 3). The reconstructed ZH1540 genome
       and its main features were represented graphically using BRIG version
       0.95-dev.000397.
        
       #### Recombination analysis using PIM
        
       As previously noted36, the presence of recombination in the genomes of
       _T. pallidum_ may interfere with the topologies of phylogenetic trees
       inferred. In order to look into potential gene recombination, we used
       the PIM pipeline36 to detect recombination gene by gene. In brief, the
       process involved the following steps:
        
         1. (1)
        
       Using IQ-TREE version 1.6.10, a maximum-likelihood tree was created
       for the multiple genome alignment98. All maximum-likelihood trees for
       the remaining steps were obtained using GTR99 \+ G100 \+ I101 as an
       evolutionary model and 1,000 bootstraps replications.
        
         2. (2)
        
       The 1,161 genes found in at least one of the reference genomes were
       extracted, and the number of SNPs for each gene was calculated. Genes
       with less than three SNPs were excluded.
        
         3. (3)
        
       The phylogenetic signal in each gene alignment for each of the
       remaining genes was evaluated by likelihood mapping102 in IQ-TREE
       (using parameters: -lmap 10000 -n 0), retaining only those genes that
       showed a phylogenetic signal.
        
         4. (4)
        
       A maximum-likelihood tree was generated for each of the remaining
       genes using IQ-TREE.
        
         5. (5)
        
       For each included gene, we tested the phylogenetic congruence between
       trees using IQ-TREE (using parameters: -m GTR + G8 -zb 10000 -zw),
       comparing the maximum-likelihood tree obtained from the gene alignment
       and the maximum-likelihood tree obtained from the whole-genome
       alignment using two different methods: Shimodaira-Hasegawa103 and
       expected likelihood weights (ELW)104. Genes for which at least one
       test rejected the reference tree topology with the gene alignment
       adopting a conservative approach ( _P_ < 0.2, weight value close to 0,
       for Shimodaira-Hasegawa and ELW tests, respectively) and the complete
       genome alignment rejected the topology of the tree built using the
       gene alignment (reciprocal incongruence, _P_ < 0.2 and weight value
       close to 0) in at least one of them were selected and examined more
       closely in the next step.
        
         6. (6)
        
       Using MEGAX105, the selected genes that displayed reciprocal
       incongruence were subsequently examined to assess and describe
       potential recombination events. A gene has to have at least three
       nearby homoplastic SNPs—SNPs that are shared by several groups (TPE,
       TEN, TPA-Nichols or TPA-SS14) and produce a polyphyletic
       distribution—in order to be labelled as recombinant. The homoplastic
       SNPs found in the gene alignment served as the boundaries of the
       recombinant areas.
        
         7. (7)
        
       Using a parsimony criterion on the distribution of alternative states
       of the homoplastic SNPs, the potential donor and recipient clades or
       strains of each recombination event were inferred.
        
       DNA sections, a number of genes have a high percentage of sites with
       missing data. The majority of these genes are members of the _tpr_ and
       _arp_ families, which include collections of paralogous genes. In
       order to continue analysing these intriguing genes with the PIM
       pipeline, strains that had a high percentage of missing data in each
       of these genes were eliminated. Following previous findings35,36, the
       hypervariable gene _tprK_ ( _tp0897_ ), with seven hypervariable
       regions that undergo intrastrain gene conversion17,37,106,107,108,109,
       and the _tp0316_ and _tp0317_ genes, also under gene conversion, were
       completely excluded from the recombination analysis.
        
       #### PIM procedure for likelihood mapping and topology tests
        
       A likelihood mapping test was performed using IQ-TREE to determine
       which genes (Supplementary Table 4) showed a phylogenetic signal (out
       of the 382 genes for which >3 SNPs were found in pairwise comparison
       with at least one reference genome). For each quartet (subset of four
       sequences) in the data, the test creates unrooted phylogenetic trees.
       The quartet likelihoods are then plotted within a triangle, where the
       position denotes the 'tree-likeness' of the quartet in question.
       Corner quartets are completely resolved, quartets on the sides are
       partially resolved, and quartets in the centre are unresolved. Of the
       382 genes, 29 had too many missing values to be tested using the
       likelihood mapping method. In order to include these genes in the next
       steps of the PIM pipeline and topology comparisons, the problematic
       sequences with more than 50% of positions with missing data were
       removed.
        
       Following the likelihood mapping test, 9 genes falling within the
       central zone of the triangle were discarded (Supplementary Table 4).
       Then, using the Shimodaira-Hasegawa and ELW topology tests, we
       compared the gene trees of the remaining genes to the preliminary
       reference tree of the whole-genome alignment to assess their
       phylogenetic congruence (Supplementary Table 4). Of the 373 genes that
       tested positive for phylogenetic incongruence, 27 contained at least
       three consecutive SNPs, supporting a recombination event. To these we
       added _tp0859_ , which was detected as recombinant in a previous
       study35, resulting in a total of 27 recombinant genes.
        
       #### Recombination analysis using Gubbins and ClonalFrameML
        
       Gubbins version 2.3.1110 and ClonalFrameML version 1.11-1111 are
       frequently used tools for the genome-wide identification of
       recombinant positions in bacterial genomes. To test the robustness of
       our recombination analysis using PIM, we also ran these two programs,
       with default parameters and the same whole-genome alignment used with
       PIM. Gubbins identified 301 distinct recombination events associated
       with 103 genes, ranging in size from 5 bp to 13,866 bp. Similarly,
       ClonalFrameML detected 656 events, with 32 of them being 1 or 2 bp
       long, and the longest event spanning 782 bp. Notably, all the genes
       identified by PIM as having a recombinant region were also detected by
       both ClonalFrameML and Gubbins, except for gene _tp0558_ , which was
       missed by ClonalFrameML but detected by Gubbins. Additionally, genes
       _tp0164_ and _tp0179_ were detected by ClonalFrameML but missed by
       Gubbins.
        
       #### Phylogenetic analysis
        
       A maximum-likelihood tree based on the alignment including all genes
       was constructed with IQ-TREE, using GTR + G + I as the evolutionary
       model and 1,000 bootstrap replications (Extended Data Fig. 2a). Next,
       genes identified as recombinant by PIM were removed from the multiple
       genome alignment. Three additional genes ( _tp0897_ , _tp0316_ , and
       _tp0317_ ), which contain repetitive regions and have been identified
       as hypervariable and/or under gene conversion in the past, were also
       removed to prevent the introduction of a potential bias. Because the
       _tp0317_ gene is nested inside the _tp0316_ gene and the coordinates
       from the BosniaA reference genome for _tp0316_ covered a larger area
       than those of the other reference genomes, _tp0316_ and _tp0317_ were
       removed according to the _tp0316_ coordinates from the BosniaA
       reference genome. A reference phylogenetic tree was then constructed
       employing the new vertical-inheritance genome alignment, also with IQ-
       TREE using GTR + G + I as the evolutionary model and 1,000 bootstrap
       replications (Extended Data Fig. 2b). Both trees obtained were
       compared and are shown in Extended Data Fig. 2.
        
       The SS14 lineage was previously described as a largely epidemic,
       macrolide-resistant cluster that emerged after, and was possibly
       prompted by, the clinical use of antibiotics following its
       discovery12,16. Based on our phylogenetic analysis results and
       expanding on earlier phylogenetic classifications and nomenclature of
       the SS14 lineage12,16, we defined the clade that contains almost all
       SS14 genomes from clinical and contemporary samples as the SS14-Ω
       sublineage. However, two contemporary clinical samples (MD18Be and
       MD06B), were not classified as SS14-Ω sublineage, because these
       samples cluster together with the MexicoA genome, in line with
       previously published results42.
        
       To compare the PIM-based analysis with other widely used recombination
       detection methods, Gubbins and ClonalFrameML, we followed a similar
       procedure of removing the recombinant positions detected by these
       tools and inferred maximum-likelihood trees with the retained
       positions in the corresponding multiple genome alignments. All the
       phylogenetic trees with recombination events removed exhibit general
       congruence with each other, whether the events were identified by PIM,
       Gubbins or ClonalFrameML. Furthermore, the placement of the ZH1540
       genome remained consistent in the phylogenetic trees, regardless of
       the recombination detection method employed, and despite the
       elimination of recombinant genes to generate the vertically inherited
       alignment.
        
       #### Exploratory characterization of the 16S-23S genes
        
        _T. pallidum_ contains two rRNA ( _rrn_ ) operons, each of which
       encodes the 16S-23S-5S rRNA genes and intergenic spacer regions
       (ISRs). There is evidence that the random distribution of _rrn_ spacer
       patterns in _T. pallidum_ may be generated by reciprocal translocation
       of _rrn_ operons mediated by a recBCD-like system found in the
       intergenic spacer regions (ISRs)112. In concordance with previous
       studies112,113,114,115, we found that the 16S-23 S ISRs of the TPA
       strains contain the tRNA-Ile (tRNA-Ile-1; _tp0012_ ) and tRNA-Ala
       (tRNA-Ala-3; _tp00t15_ ) genes within the _rrn1_ and _rrn2_ operons,
       respectively. By contrast, the TPE genomes show an Ala/Ile spacer
       pattern, where the _tp0012_ and _tp00t15_ orthologues are located
       within the _rrn2_ and _rrn1_ operons, respectively.
        
       We identified 68 SNPs in genes _r0001_ , _r0002_ , _r0004_ and _r0005_
       , encoding the 16S-23S rRNA genes of the new ancient genome ZH1540,
       placing them among the most variable genes in our alignment and
       raising the potential that including them in the alignment could
       result in a biased phylogenetic reconstruction. Although the SNPs
       found appear to be well supported by the reads obtained from the
       sequence mapping (Supplementary Table 3), their origin from possible
       contamination cannot be completely ruled out and further analyses
       would be necessary to confirm them.
        
       Excluding these genes from the alignment, in addition to the
       recombinant genes and _tp0316, tp0317_ and _tp0897_ , did not result
       in any changes to the topology (Extended Data Figs. 2b and 3),
       although branch lengths were altered. As these genes are known to have
       conserved regions in addition to variable regions used to explore the
       evolutionary relationships among pathogenic bacteria116,117,118, we
       decided to retain them in the alignment for all subsequent analyses.
       Finally, we note that the ZH1540 genome did not possess either of the
       two _T. pallidum_ 23 S ribosomal RNA gene mutations known to confer
       macrolide resistance (A2058G and A2069G). In contrast, four modern TEN
       strains from Japan possess the A2048G mutation, suggesting recent
       selection pressure for antibiotic resistance mutations.
        
       #### Molecular clock dating
        
       We used the Bayesian phylogenetics package BEAST2 v2.6.7119 to
       estimate a time-calibrated phylogeny of the context dataset of 98 _T.
       pallidum_ genomes along with our new ancient genome, ZH1540. We
       removed hypervariable and recombining genes from the alignment, as
       described above, reduced it to variable sites and used an
       ascertainment bias correction to account for constant sites.
        
       Root-to-tip regression analyses (Extended Data Fig. 4) show that while
       there is a positive correlation between sampling year and root-to-tip
       divergence among all modern clinical strains, indicating clock-like
       evolution, the correlation is very weak when also including passaged
       strains and negative when including ancient strains. Within the TPE,
       TEN and SS14 clades there exists a positive correlation among all
       modern clinical and passaged strains. On the other hand, the
       correlation is negative for Nichols strains, even when looking only at
       clinical strains. In order to account for rate variation and the long
       terminal branches on some strains (likely due to a multitude of
       effects, including sequencing errors, contamination and mutations
       introduced during rabbit passaging) we used a UCLD and a UCED clock
       model for the molecular clock dating analysis120. For both models we
       placed a narrow lognormal prior with a mean (in real space) of 1 ×
       10−7 substitutions per site per year and standard deviation 0.25 on
       the mean clock rate. This strong prior was used to compensate for the
       poor temporal signal among _T. pallidum_ genomes and was calibrated on
       previous estimates of the substitution rate4,35. We further used a GTR
       + G + I substitution model118 and a Bayesian skyline plot121
       demographic model (tree-prior) with 10 groups. For all genomes where
       the sampling dates are not known exactly, we used uniform priors
       across the date ranges reported in the original studies to account for
       the uncertainty4,5,6,16,122. For ZH1540 we set the date range to
       364-573 ad, in accordance with the marine reservoir effect corrected
       radiocarbon dating results above. Default priors were used for all
       other model parameters. The same analysis was repeated without ZH1540
       in order to assess the effect of our new ancient genome on the
       divergence dates. We further repeated the analysis using a wide
       lognormal prior with a mean (in real space) of 1 × 10−7 substitutions
       per site per year and standard deviation 1 on the mean clock rate and
       using both constant-size and exponential growth coalescent models to
       assess the impacts of the mean clock rate prior and demographic models
       on divergence time estimates.
        
       For each analysis we ran four Markov chain Monte Carlo (MCMC) chains
       of 5 × 108 steps each, sampling parameters and trees every 10,000
       steps. After assessing convergence in Tracer v1.7123 and confirming
       that all four chains converged to the same posterior distribution, we
       combined the chains after discarding the first 10% of samples as burn-
       in. In the resulting combined chains all parameters have effective
       sample size (ESS) values > 150\. TreeAnnotator v2.6.7 was used to
       compute MCC trees and the results were visualized using ggplot2124,
       ggtree125 and custom scripts. The 95% HPD of the coefficient of
       variation estimated under the UCLD model excluded 0 (median = 1.46,
       95% HPD 1.08-1.9), indicating that a strict clock model is not
       appropriate for our dataset. Robustness analyses show that under a
       narrow mean clock rate prior both the UCED and UCLD clock models
       result in similar divergence time estimates (Extended Data Fig. 5a-f),
       with the UCED model estimates tending to be more recent and the UCLD
       model estimates usually having longer tails. Under a wide mean clock
       rate prior, estimates with the UCED are broadly similar, albeit wider,
       while the UCLD model estimates very wide posterior distributions for
       divergence times, indicating little information under this model.
       Divergence time estimates were not sensitive to the demographic model
       used. The MCC trees under the UCED model with a narrow prior, both
       with and without ZH1540 included in the analysis are shown in Extended
       Data Figs. 6 and 7, respectively.
        
       Finally, we performed a Bayesian date randomization test126,127,128
       (DRT) to further assess the strength of the temporal signal in our
       dataset, by permuting sampling dates among genomes and performing 50
       replicate analyses. For the analyses, the full dataset, a UCED clock
       model with a narrow prior and the Bayesian skyline plot demographic
       model were used, while fixing the sampling dates of ancient strains to
       the means of the radiocarbon date ranges for simplicity. MCMC chains
       were run for 1 × 108 steps, sampling parameters every 10,000 steps.
       Convergence was assessed using the coda129 package to ensure that all
       parameters in all chains have ESS values > 150\. The DRT results show
       that the 95% HPD intervals of the mean clock rate on replicates with
       permuted sampling dates are much smaller than expected if all
       information came from the mean clock rate prior (Extended Data Fig.
       5g). In general, the HPD intervals do not overlap with the 95% HPD
       interval of the mean clock rate estimated with the true sampling
       dates.
        
       ### Reporting summary
        
       Further information on research design is available in the Nature
       Portfolio Reporting Summary linked to this article.
        
        
        
        
       ______________________________________________________________________
                                                 Served by Flask-Gopher/2.2.1