R-loops are RNA-DNA hybrid sequences that are emerging players in various biological processes, occurring in both prokaryotic and eukaryotic cells. In viruses, R-loop investigation is limited and the functional importance of R-loops is poorly understood. Here, we used a computational approach to investigate the prevalence, distribution, and location of R-loop forming sequences (RLFS) across more than 6000 viral genomes. A total of 14637 RLFS loci were identified in 1586 viral genomes. Over 70% of RLFS-positive genomes are dsDNA viruses. In the order Herpesvirales, RLFS were present in all members, whereas no RLFS was predicted in the order Ligamenvirales. Analysis of RLFS density in all RLFS-positive genomes revealed unusually high RLFS densities in herpesvirus genomes, with RLFS particularly enriched within repeat regions such as the terminal repeats (TRs). RLFS in TRs are positionally conserved between herpesviruses. To validate the computationally identified RLFS, R-loop formation was experimentally confirmed in the TR and viral Bcl-2 promoter of Kaposi sarcoma-associated herpesvirus (KSHV). These predictions and validations support future analysis of RLFS in regulating the replication, transcription, and genome maintenance of herpesviruses.
Recent advances in the field of genomics have revealed the widespread occurrence of non-canonical nucleic acid structures, such as Z-DNA 1, hairpin loops 2, 3, and G-quadruplexes (guanine-rich sequences that attain specific four-stranded conformations) 4, in various genomes, including those of viruses 5. These structures likely play important roles in the replication strategies used by a particular virus. For instance, Z-DNA structures in the simian virus 40 enhancer regions activate transcription 1. Hairpin structures at the termini of the adeno-associated virus genome promote persistent DNA circles and concatemers during recombination processes that occur in the infected host cell 2, 3. G-quadruplexes are present in several viral genomes, e.g., human immunodeficiency virus (HIV-1), Epstein–Barr virus (EBV), and human papillomavirus (HPV) (reviewed in 4), and function in various aspects of the replication cycles of these viruses.
Another non-canonical nucleic acid structure, called an R-loop or RNA:DNA structure, is preferentially formed within G-rich sequences and possesses greater thermodynamic stability than the original DNA:DNA duplex 6. R-loops have been experimentally observed in a wide range of organisms, from bacteria to mammals 6, 7, where they function in transcription 8, telomere maintenance 9, genome instability 10, 11, and epigenetic regulation 12. They are also associated with certain diseases, such as Prader–Willi syndrome 13, ataxia with oculomotor apraxia 14, amyotrophic lateral sclerosis 15, spinal muscular atrophy 16, 17, motor neuron disorders 18, cancers 19 and many others 20. R-loops have been predicted by computational approaches in various organisms 21, 22, 23 and demonstrated by direct experimental evidence; however, genome-scale identification of R-loops in viral species has not been explored.
Knowledge regarding roles for R-loops in viruses is limited. One study demonstrated the persistence of R-loops at the origin of replication of Epstein–Barr virus (EBV) 24. Removal of R-loops by RNase H, an RNA-degrading enzyme specific for RNA-DNA duplex molecules, eliminated the generation of ssDNA at the viral origin, thereby inhibiting viral replication by preventing the recruitment of the ssDNA binding protein BALF2 to the origin of viral replication 24. A second study demonstrated that R-loops formed during infection by Kaposi sarcoma-associated herpesvirus (KSHV) correlated with a host-cell DNA damage response and genome instability 25. The mechanism for this likely involves KSHV ORF57, a protein that hijacks the host hTREX complex, an RNA binding protein complex that normally prevents R-loop formation, leading to enhanced R-loop formation and DNA damage in KSHV-infected cells. Although this study did not report R-loop forming loci in the KSHV genome, it provides important evidence that R-loop formation in virus-infected cells might be impacted by the virus life cycle. Notably, both studies investigated R-loop formation in herpesviruses.
In this study, we employed our previously developed computer algorithm, QmRLFS-finder 26, to identify possible R-loop forming sequences (RLFSs) in a reference set of over 6000 viral genomes. The genome set was retrieved from the National Center for Biotechnology Information (NCBI) database and exhibits remarkable diversity in length and information content 27 across a wide range of viral families. We found that approximately 25% of the genomes in the dataset contain at least one RLFS, with an expected overrepresentation in dsDNA viruses compared to other virus types, especially in the family Herpesviridae.
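
The summary quantities used in the text (the fraction of RLFS-positive genomes, the RLFS density of a genome, and the loci that fall inside a region such as a terminal repeat) can be illustrated with a minimal Python sketch. This is not the QmRLFS-finder implementation or its output format. The `Genome` record, the function names, the per-kilobase normalization, and the toy coordinates are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Locus = Tuple[int, int]  # (start, end), 0-based half-open; an assumed convention


@dataclass(frozen=True)
class Genome:
    """Hypothetical record: one genome plus its predicted RLFS loci."""
    name: str
    length_bp: int
    rlfs_loci: Tuple[Locus, ...]


def rlfs_density_per_kb(genome: Genome) -> float:
    """Number of RLFS loci per kilobase of genome (assumed normalization)."""
    return len(genome.rlfs_loci) / (genome.length_bp / 1000)


def fraction_positive(genomes: List[Genome]) -> float:
    """Fraction of genomes that carry at least one RLFS locus."""
    positive = sum(1 for g in genomes if g.rlfs_loci)
    return positive / len(genomes)


def loci_in_region(genome: Genome, region_start: int, region_end: int) -> List[Locus]:
    """RLFS loci overlapping a region, e.g. an annotated terminal repeat."""
    return [(s, e) for s, e in genome.rlfs_loci
            if s < region_end and e > region_start]


# Toy data, invented for illustration; not taken from the study.
toy_genomes = [
    Genome("toy_A", 10_000, ((100, 400), (9_500, 9_900))),
    Genome("toy_B", 5_000, ()),
]

if __name__ == "__main__":
    print(fraction_positive(toy_genomes))
    print(rlfs_density_per_kb(toy_genomes[0]))
    print(loci_in_region(toy_genomes[0], 9_000, 10_000))
```

Normalizing by genome length matters when comparing genomes, because viral genomes differ widely in size and raw locus counts would favor the larger ones.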