(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright ------------ Transcriptome-wide mapping reveals a diverse dihydrouridine landscape including mRNA ['Austin S. Draycott', 'Yale School Of Medicine', 'Department Of Molecular Biophysics', 'Biochemistry', 'New Haven', 'Connecticut', 'United States Of America', 'Cassandra Schaening-Burgos', 'Massachusetts Institute Of Technology', 'Department Of Biology'] Date: 2022-06 Dihydrouridine is a modified nucleotide universally present in tRNAs, but the complete dihydrouridine landscape is unknown in any organism. We introduce dihydrouridine sequencing (D-seq) for transcriptome-wide mapping of D with single-nucleotide resolution and use it to uncover novel classes of dihydrouridine-containing RNA in yeast which include mRNA and small nucleolar RNA (snoRNA). The novel D sites are concentrated in conserved stem-loop regions consistent with a role for D in folding many functional RNA structures. We demonstrate dihydrouridine synthase (DUS)-dependent changes in splicing of a D-containing pre-mRNA in cells and show that D-modified mRNAs can be efficiently translated by eukaryotic ribosomes in vitro. This work establishes D as a new functional component of the mRNA epitranscriptome and paves the way for identifying the RNA targets of multiple DUS enzymes that are dysregulated in human disease. Funding: Development of D-seq was supported by National Institute of Environmental Health Sciences ( https://www.niehs.nih.gov/ ) grant 1R21ES031525 to WG, National Cancer Institute ( https://www.cancer.gov/ ) grant 5R21CA246118 to WG, National Institute of General Medical Sciences ( https://www.nigms.nih.gov/ ) grant 5R01GM112766 to KMN, William Raveis Charitable Fund Dale F. 
Frey Breakthrough Scientist Award DFS-34-19 from the Damon Runyon Foundation ( https://www.damonrunyon.org/ ) to S.N. LS was supported by American Heart Association ( https://www.heart.org/ ) grant 908949, AD was supported by National Cancer Institute ( https://www.cancer.gov/ ) grant 1F31CA254339 and a Gruber Foundation Fellowship ( https://gruber.yale.edu/ ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Copyright: © 2022 Draycott et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this paper, we report the development of a novel method to map D residues in RNA in high-throughput. Our method takes advantage of known D-selective chemistry [ 19 – 22 ] to reduce D and induce reverse transcriptase (RT) stops 1nt 3′ of Ds. We combine this D-selective chemistry with next-generation sequencing to determine the location of Ds across the yeast transcriptome. D-seq identifies known tRNA D sites and uncovers novel D sites in small nucleolar RNA (snoRNA) and mRNA. These novel D sites occur in conserved stem-loop regions of mRNAs and snoRNAs—and are consistent with a broad function for D in folding functional RNA structures. In support of the potential for dihydrouridine to affect mRNA biogenesis, we demonstrate DUS-dependent changes in splicing of a naturally dihydrouridylated pre-mRNA in cells. Our results establish D as a new component of the mRNA epitranscriptome and show that the D-seq method is broadly applicable to identifying and studying the functions of D. Profound alteration of RNA conformation and structure by D would be expected to affect multiple steps in mRNA metabolism depending on the location of the D nucleotide. 
For example, D antagonizes formation of RNA duplexes [ 13 , 14 ], which are required for pre-mRNA splicing (due to base-pairing between splice sites and U1, U2, and U6 small nuclear RNAs (snRNAs)) and for regulation by micro RNAs (miRNAs) (due to base-pairing between target mRNA and miRNA). Intramolecular RNA secondary structures have been found to affect the efficiency and regulation of translation initiation, alternative splicing, RNA localization, and RNA stability (reviewed in [ 16 , 17 ]). D is also expected to stabilize binding of numerous regulatory RNA-binding proteins by favoring the C2′-endo conformation that is preferentially bound by K homology (KH) domains and RNA recognition motifs (RRMs) [ 18 ]. KH and RRM domains are responsible for sequence-specific binding by proteins that regulate all aspects of mRNA processing and function. The D modification is a reduction of the C5-C6 double bond in uridine that has multiple effects on RNA structure. First, D subtly distorts the pyrimidine ring [ 11 ] causing destacking of bases in oligonucleotides [ 12 ]. D also disrupts the orientation of N3 and O4 in the pyrimidine ring, weakening Watson–Crick base pairing, which likely contributes to the 3 to 5°C reduction in melting temperature of RNA duplexes containing a D [ 13 ]. More significantly, D substantially destabilizes the typical C3′-endo conformation of the ribose thereby favoring the C2′-endo conformation in a D nucleotide by 5.3 kcal/mol and in the nucleotide 5′ of D by 3.6 kcal/mol [ 12 ]. These changes to the RNA backbone conformation strongly disfavor RNA helical geometry [ 14 ] and allow for greater flexibility in RNAs. NMR studies of modified and unmodified versions of the tRNA D loop illustrate the consequences of this effect for RNA folding: The unmodified D loop adopts several conformations that rapidly interconvert, whereas the modified RNA folds into a hairpin with a stable stem and the D in a flexible loop region [ 15 ]. 
Thus, dihydrouridylation of RNA is expected to have large effects on RNA structure.

Dihydrouridine (D) is a modified version of uridine that is installed by dihydrouridine synthase (DUS) enzymes in all domains of life. It is of great interest to determine the locations of D modifications because elevated expression of DUS and elevated D levels in tumors are associated with worse outcomes for patients in lung [1], liver [2], and kidney [3,4] cancer. DUS target sites in tRNAs are best characterized in budding yeast [5,6] and include multiple positions within the eponymous D loop as well as sites in the variable loops of some tRNAs. D has also been detected in the genomic RNA of Dengue, Zika, Hepatitis C, and Polio viruses [7], but the specific locations are unknown. It is likely that DUS modify additional classes of cellular RNA as recently discovered for other tRNA-modifying enzymes [8]. Notably, DUS1 and DUS3 cross-link to mRNA in both yeast and human cells [9,10], suggesting their potential to modify mRNA target sites.

Results and discussion

In light of previous work showing that DUS1 and DUS3 cross-link to mRNA in both yeast and human cells [9,10], we performed bulk nucleotide analysis on RNA from budding yeast. We purified polyA+ mRNA from a dus1Δ dus2Δ dus3Δ dus4Δ quadruple mutant strain lacking all DUS activity [6] and a matched wild-type (WT) strain. We detected D in the polyA+ mRNA fraction from WT but not DUS KO (Fig 1A), confirming the hypothesis that DUS enzymes install D in mRNA. We therefore developed a method to map D at single-nucleotide resolution by identifying chemical treatments that stall RT at D.

Fig 1. Dihydrouridine-specific chemistry to map dihydrouridine sites in RNA with single-nucleotide resolution. (A) Bulk nucleoside analysis detects D in mRNA from WT but not DUS KO yeast.
mRNA was purified by selecting for poly(A)+ and tRNAs were removed by size selection. (B) Structures of uridine, dihydrouridine, and tetrahydrouridine. (C) Primer extension analysis of synthetic 4D and 4U RNAs treated with NaBH4 and reverse transcribed with SuperScript III RT. D-dependent RT stop positions are highlighted. (D) Schematic of D-seq library preparation. The data underlying this figure can be found in S3 Table. D, dihydrouridine; D-seq, dihydrouridine sequencing; DUS, dihydrouridine synthase; RT, reverse transcriptase; WT, wild-type. https://doi.org/10.1371/journal.pbio.3001622.g001

To identify RT stopping conditions for D, we tested different chemistries for selective RT stopping at D compared to U. Strong OH- treatment conditions used previously to map D in tRNA by primer extension [6] proved too harsh to use for mRNA due to substantial RNA degradation (S1A Fig). In contrast, milder sodium borohydride treatment conditions do not damage mRNA-like molecules (S1A Fig). D is selectively reduced to tetrahydrouridine by sodium borohydride to remove a hydrogen bond donor on the Watson–Crick face (Fig 1B) [22]. We prepared 194-nt synthetic RNAs with 4 Us or Ds positioned at approximately 30-nt intervals for easy characterization by primer extension (Methods). Using these RNAs, we found that reduced dihydrouridine blocks several RT enzymes 1 nucleotide 3′ to the D site while having no effect on RT processivity on an identical U-containing template (Figs 1C and S1B). We note that modified nucleosides at positions other than U can also react with sodium borohydride [23]. We combined this D-specific chemistry with strand-specific cDNA sequencing to map the locations of D transcriptome-wide using high-throughput sequencing (Fig 1D). We tested the D-seq approach in budding yeast where positive control D sites in cytoplasmic tRNAs have been extensively although not exhaustively characterized [5,6].
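The relationship between an RT stop and the D site that causes it can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each read has already been reduced to the 1-based transcript position where its cDNA ended, and the function names, the `min_reads` parameter, and the toy data are hypothetical. Because the stop falls 1 nt 3′ of the reduced D, the candidate site is the position immediately 5′ of the stop, and only uridines are kept.

```python
from collections import Counter


def stop_counts(cdna_end_positions):
    """Count RT stops (cDNA ends) at each 1-based transcript position."""
    return Counter(cdna_end_positions)


def candidate_d_positions(stops, sequence, min_reads=1):
    """Map each RT stop to the position 1 nt 5' of it and keep uridines only.

    stops: mapping of 1-based stop position -> read count
    sequence: transcript sequence in 5'->3' orientation (RNA or DNA alphabet)
    min_reads: hypothetical minimum number of stops needed to report a site
    """
    candidates = {}
    for stop_pos, n_reads in stops.items():
        d_pos = stop_pos - 1  # the RT halts 1 nt 3' of the reduced D
        if d_pos < 1 or d_pos > len(sequence) or n_reads < min_reads:
            continue
        if sequence[d_pos - 1] in "UT":
            candidates[d_pos] = n_reads
    return candidates


# Toy data, made up purely for illustration.
toy_sequence = "GGAUCCAUGGC"
toy_ends = [5, 5, 5, 9, 9, 3]
toy_candidates = candidate_d_positions(
    stop_counts(toy_ends), toy_sequence, min_reads=2
)
print(toy_candidates)
```

A real analysis would also compare WT against DUS KO libraries at every position, since a stop that persists without DUS activity cannot be attributed to D.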
We observed strong DUS-dependent pileups of cDNA ends 1 nt 3′ of many known tRNA D sites (Fig 2A and S1 Table). Given these encouraging findings, we developed a quantitative approach to evaluate D-seq signal by calculating a modified Z-score (MAD score) as a measure of the strength of the RT stop signal at every nucleotide. We used the difference between the distributions of MAD scores at known tRNA D sites (based on previous analysis by microarray and primer extension [6]) in WT and DUS KO libraries (S2A Fig) to set cutoffs for defining a D site in abundant RNAs (Methods). Using these cutoffs, we identified previously reported target sites of 3 of the 4 DUS as well as previously unannotated D sites in 9 tRNAs at positions in the D loop that are known to be modified by DUS1 and DUS4 in other tRNAs (Figs 2A, 2B, and S2B and S1 Table, which compares these sites to previous annotations). We identified a single unanticipated site at U32 in tRNA IleAAT (S1 Table).

Fig 2. D-seq identifies known and novel dihydrouridine sites in structured ncRNAs. (A) Plots of cDNA end positions in Dus2 target tRNA ProAGG and Dus2, Dus4 target tRNA ArgCCG. D peaks are highlighted. X scale in RPM and Y scale in bp. (B) Summary of known tRNA D positions and corresponding DUS. (C) Plots of cDNA end positions in snR5, snR13, and snR46 snoRNAs. D peaks are highlighted. TSS (transcription start site) of snR5. X scale in RPM and Y scale in bp. (D) snoRNA Ds occur primarily in stem-loop structures that resemble tRNA D loops. Plot of median DMS-induced mutation rate in 25 nt window flanking D site. Red trace is median DMS reactivity flanking D positions. Black dots are median DMS reactivity for randomly selected set of background positions. Blue trace is p-value for difference in DMS reactivity for sequences flanking D or background sites. (E) D sites occur in stem-loop structures of 16 H/ACA and 7 C/D box snoRNAs.
The data underlying this figure can be found in S1 and S2 Tables. DMS, dimethyl sulfate; D-seq, dihydrouridine sequencing; ncRNA, non-coding RNA; snoRNA, small nucleolar RNA; TSS, transcription start site. https://doi.org/10.1371/journal.pbio.3001622.g002

As implemented here, D-seq has specific “blind spots” in tRNAs. First, the cDNA size selection step precluded detection of DUS3-dependent Ds at position 47 because they are too close to the 3′ end of the transcript. In addition, several known target sites of DUS1, DUS2, and DUS4 were not detected because they are shadowed by another D 3′ of them (S2B Fig and S1 Table). Other known tRNA D sites that were not visible occur 3′ of a penetrant RT stop at position 26 in some tRNAs (S2C Fig and S1 Table). We suspect this RT stop is caused by N2,N2-dimethylguanosine (m2,2G) [24]. Pretreatment of RNA samples with AlkB demethylases to remove m2,2G as well as 1-methyladenosine (m1A) and 1-methylguanosine (m1G) [25–27] should overcome this limitation. Advantages of the D-seq method are that it inherently offers single-nucleotide resolution and can, in principle, be used to detect D sites in any type of RNA present in the sample.

We then examined other classes of non-coding RNAs (ncRNAs) with sufficient coverage (Methods). We identified 48 novel D sites in 23 different snoRNAs, uncovering snoRNAs as a substantial new class of RNA targeted by DUS enzymes (Fig 2C and S2 Table). We considered the possibility that DUS might modify ribosomal RNA given that dihydrouridine has been reported in the bacterial ribosome at U2449 of the large subunit RNA [28]. However, inspection of the cytoplasmic rRNAs did not reveal any DUS-dependent modification at the orthologous position (S2D Fig).

Like tRNAs, snoRNAs must fold to perform their cellular function [29,30]. Given the importance of D for tRNA folding [12,15], we analyzed chemical probing data to determine if D occurs within structurally stereotyped regions in snoRNAs.
Dimethyl sulfate (DMS) methylates the Watson–Crick face of unpaired As and Cs, which can be detected as sites of misincorporation by RT. The observed mutation rate at each A and C indicates the extent of pairing [31], with paired nucleotides having low DMS reactivity and low mutation rates and unpaired loop regions having high reactivity and high mutation rates. Comparing snoRNA D sites with DMS probing data from WT yeast cells [31] revealed a propensity for D to occur in unpaired regions (Fig 2D). Intriguingly, most of the 48 snoRNA D sites are located in 4–8 bp stem-loop regions (schematized in Fig 2E). These compact stem loops are structurally similar to the D loops of tRNAs, suggesting a common mechanism of recognition by DUS and/or a similar role for D within the loop region to promote stable folding of the adjacent stem by causing changes to the RNA backbone conformation [12,15]. Our results establish that DUS modify additional ncRNAs beyond tRNAs and suggest a broad role for DUS in the biogenesis and function of many structured RNAs. We next analyzed yeast mRNA for D. We used a simple statistical metric, a modified Z-score, to distinguish robust DUS-dependent RT stops from noise in these less abundant RNAs. (See Methods for the advantages and limitations of the MAD score and Z-score metrics). As for tRNAs, we defined empirical thresholds for site calling based on differences in the distributions of scores in WT and DUS KO samples (S3A Fig). Applying conservative cutoffs to the mRNA mapping reads (Methods), we identified 130 high-confidence D sites in mRNAs (S2 Table). To estimate the number of false positives, we inverted the analysis (required high Z-scores in the DUS KO replicates and low Z-scores in WT replicates), which identified 5 false positives for an estimated false discovery rate for D sites in mRNA of 3.8%. Two false positives are understandable as “shadow” peaks downstream of a D (S3B Fig). 
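The scoring and false-discovery logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from the Methods: the 0.6745 scaling factor is the common convention for a median/MAD-based modified Z-score and may differ from what the authors used, each position carries a single WT score and a single KO score rather than per-replicate scores, and the function names and `high`/`low` cutoffs are hypothetical.

```python
from statistics import median


def modified_z_scores(values):
    """Median/MAD-based modified Z-scores (assumed convention: 0.6745 scaling)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # no spread, so no position stands out
    return [0.6745 * (v - med) / mad for v in values]


def call_sites(signal_scores, control_scores, high, low):
    """Indices scoring at least `high` in signal and at most `low` in control."""
    return [
        i
        for i, (s, c) in enumerate(zip(signal_scores, control_scores))
        if s >= high and c <= low
    ]


def inverted_fdr(wt_scores, ko_scores, high, low):
    """Estimate the FDR by swapping the roles of the WT and KO libraries."""
    forward = call_sites(wt_scores, ko_scores, high, low)
    inverted = call_sites(ko_scores, wt_scores, high, low)
    return len(inverted) / len(forward) if forward else float("nan")


# Toy stop counts (made up): one position stands far above the rest.
toy_scores = modified_z_scores([2, 4, 4, 4, 5, 5, 7, 40])
strongest = toy_scores.index(max(toy_scores))

# The arithmetic reported in the text: 5 inverted calls against 130 sites.
reported_fdr_percent = round(5 / 130 * 100, 1)
print(strongest, reported_fdr_percent)
```

Swapping WT and KO asks how often the noise alone produces a "site"; any call that survives the swap cannot depend on DUS activity, which is why the count of inverted calls serves as the false-positive estimate.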
The number of D sites we identified (130) represents a lower bound for the total number of D sites in yeast mRNA, as we surveyed only the approximately 1% of the yeast transcriptome that met the coverage threshold in all 6 libraries. These results show that interactions between DUS and mRNA [9,10] result in substantial modification and uncover dihydrouridine as a component of the mRNA epitranscriptome.

The 130 D sites were distributed throughout mRNA features including the 5′-UTR, CDS, introns, and 3′-UTR (Fig 3A and 3B and S2 Table). The prevalence of D in coding sequences, including of essential genes, raised the question of how the presence of D in mRNA impacts translation. We generated model mRNAs encoding a short (12 kD) protein, Top7 [32], that can be produced with few uridines: 2 or 3, including the start/AUG, stop/UAG and an internal test codon (Fig 3C). We synthesized mRNAs with no internal U/D test codon, or 1 of 3 different internal codons that we detected as frequently D-modified in endogenous yeast mRNAs: ADC, AGD, and GAD. We translated the D or U versions of these mRNAs in rabbit reticulocyte lysate (RRL) and quantified protein production by measuring 35S-Met incorporation into full-length Top7 protein by SDS-PAGE and autoradiography (Figs 3C and S3C). All 8 mRNAs were efficiently translated in RRL with no significant differences in the amount of protein produced from any D or U containing mRNA (n = 6 replicates, Figs 3C and S3C). Thus, eukaryotic ribosomes can efficiently traverse D sites in mRNAs. While our results show that the translational output is not impaired by these D-containing codons, other codons may behave differently. It is also possible that D could impact translational fidelity, as has been reported for pseudouridine [33].

Fig 3. D-seq identifies dihydrouridine sites in mRNAs. (A) Plots of cDNA end positions in ALD6 and SEC63 mRNAs. D peaks are highlighted.
Scale in RPM and bp. (B) Distribution of D sites among mRNA features, and background distribution of features for all sites interrogated for D. (C) SDS-PAGE gels showing Top7 protein produced from U and D containing mRNAs with 4 different test codons. Denaturing glyoxal agarose gel showing mRNA integrity. All 4 test constructs showed no significant difference in protein produced per mRNA +/‒ D. Schematic of U/D mRNAs with U/D positions highlighted in red. (D) Plots of cDNA end positions for intronic D in RPL30 mRNA. D peak is highlighted. Scale in RPM and bp. (E) DUS KO strain has increased ratio of RPL30 intron mapping reads to exon mapping reads (p < 0.05, Student’s t test). Model of regulation of RPL30 pre-mRNA splicing by RPL30 protein. (F) mRNA sequences flanking Ds have higher DMS reactivity indicating greater flexibility. Plot of median DMS-induced mutation rate in 25 nt window flanking D site. Red trace is median DMS reactivity surrounding D positions. Black dots are median DMS reactivity for randomly selected set of background positions. Blue is p-value for difference in DMS reactivity for sequences flanking D or background sites. (G) D has multiple impacts on RNA structure. D both promotes loop formation and antagonizes duplex formation. The data underlying this figure can be found in S3 Table. D-seq, dihydrouridine sequencing; DMS, dimethyl sulfate; DUS, dihydrouridine synthase. https://doi.org/10.1371/journal.pbio.3001622.g003 In light of the impacts of D on RNA structure [12,13,15], the location of D in the intron of RPL30 (Fig 3D) is notable; this intronic D is adjacent to an RNA structure that is important for the autoregulation of pre-mRNA splicing by free Rpl30 protein [34]. To investigate the potential consequences of this D site for splicing, we performed RNA-seq on WT and DUS KO. 
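The intron-to-exon read ratio used to assess RPL30 splicing (Fig 3E) can be sketched as below. This is an illustrative outline only: the read counts are made up, the number of replicates is an assumption, and the helper names are hypothetical. The sketch computes the equal-variance (Student's) t statistic by hand and leaves the p-value lookup to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def intron_exon_ratios(counts):
    """Per-replicate ratio of intron-mapping to exon-mapping reads."""
    return [intron / exon for intron, exon in counts]


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Made-up (intron_reads, exon_reads) pairs for three replicates per strain.
toy_wt = [(10, 100), (12, 100), (11, 100)]
toy_ko = [(20, 100), (22, 100), (21, 100)]

t_stat, dof = student_t(intron_exon_ratios(toy_ko), intron_exon_ratios(toy_wt))
print(round(t_stat, 2), dof)
```

A positive statistic here means a higher intron fraction in the KO sample, the direction expected if loss of D reduces splicing efficiency of the pre-mRNA.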
The absence of DUS activity caused a reproducible accumulation of unspliced RPL30 transcripts in DUS KO cells that is consistent with a positive effect of D on splicing of this pre-mRNA (Fig 3E). Other D-containing introns (RPL16B and COF1) were not affected, indicating that splicing is not generally impaired in the absence of DUS activity (S3D Fig). It is interesting that several additional mRNA D sites occur in regions where secondary structure potential is evolutionarily conserved [35], suggesting biological function for these structures. Although the predicted structures of D sites in mRNA are more diverse than in snoRNAs, 19 of the 130 identified mRNA D sites occurred in structures very similar to the tRNA D-loop, which is consistent with modification of mRNAs at structurally stereotyped positions analogous to previously known D sites in tRNAs. Globally, our analysis of DMS structure-probing data [31] found that mRNA regions flanking D sites were significantly more likely to be unpaired in cells than a background set of sites (p < 0.05, Fig 3F). This might be a consequence of modification because D antagonizes RNA duplex formation and promotes the formation of stem-loop structures [12,13,15] (Fig 3G). Alternatively, accessibility could be important for modification by DUS.

While our manuscript was in review, Finet and colleagues [36] reported the development of a method similar to D-seq, Rho-seq (so named for the coupling of rhodamine to reduced dihydrouridine). They identified sparse D modification of mRNAs from human cells and Schizosaccharomyces pombe similar to the frequency of mRNA D sites that we uncovered in Saccharomyces cerevisiae. One notable difference between the studies is that Finet and colleagues report modest reductions in translation of D-containing mRNAs in vitro for several D-containing codons, including GAD. Our results do not confirm this reported translational defect (Fig 3C).
Conceivably, the source of translation components (rabbit reticulocytes versus wheat germ) and/or differences in the mRNA context, including sequences flanking the GAD codons, affect the amount of protein produced.

Our results establish D-seq as a high-throughput method to map dihydrouridine sites with single-nucleotide resolution and reveal new classes of RNA targets for conserved DUS enzymes, which we now show include mRNA. The discovery of D in mRNA validates the function of DUS–mRNA interactions that have been observed from yeast to human cells [9,10]. The D-seq method is broadly applicable to reveal the specific locations of D, including in pathogenic RNA viruses where dihydrouridine has been detected by MS (mass spectrometry) [7] and in tumors where elevated DUS expression is linked to worse patient outcomes [1–4].

[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001622