Long inverted repeat, long hpRNA and siRNA
An inverted repeat is a single-stranded nucleotide sequence followed downstream by its reverse complement. The intervening sequence between the initial sequence and its reverse complement can be of any length, including zero. When transcribed, long inverted repeats (LIRs) can form long hairpin RNAs (hpRNAs), which are much longer than typical animal or plant pre-miRNAs.
Henderson et al. first reported the biogenesis of small interfering RNAs (siRNAs) from a long inverted repeat in Arabidopsis thaliana (Henderson et al. 2006 Nature Genetics). This siRNA biogenesis pathway was soon verified in Drosophila (Czech et al. 2008 Nature). In 2008, Okamura et al. systematically characterized the genes and mechanisms underlying the biogenesis of 21-22-nucleotide siRNAs from long hpRNAs encoded by LIRs in Drosophila (Okamura et al. 2008 Nature). They found that Dicer-2, Hen1 and Argonaute 2 play vital roles in this siRNA biogenesis pathway. The pathway was subsequently further characterized in Arabidopsis (Dunoyer et al. 2010 EMBO J).
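The definition above can be illustrated with a minimal Python sketch. It only checks for a perfect inverted repeat, which is a simplification: repeats detected in real genomes usually tolerate mismatches and indels between the two arms. The function names `reverse_complement` and `is_perfect_inverted_repeat` are hypothetical and are not part of LIRBase.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_inverted_repeat(seq: str, arm_len: int) -> bool:
    """Check whether seq has the form: arm + spacer + reverse_complement(arm).

    The spacer is whatever lies between the two arms and may be empty.
    Assumption: the arms match perfectly (no mismatches or indels).
    """
    if arm_len <= 0 or 2 * arm_len > len(seq):
        return False
    left_arm = seq[:arm_len]
    right_arm = seq[-arm_len:]
    return right_arm.upper() == reverse_complement(left_arm).upper()


if __name__ == "__main__":
    arm = "GATTACA"
    spacer = "TTT"  # intervening sequence; any length, including zero
    candidate = arm + spacer + reverse_complement(arm)
    print(candidate, is_perfect_inverted_repeat(candidate, len(arm)))
```

Because the right arm is the reverse complement of the left arm, the two arms of the transcript can base-pair with each other, and the spacer becomes the loop of the hairpin.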
LIRs can act as functional genomic elements in eukaryotic genomes.
The following image shows a typical long inverted repeat and the small RNAs derived from it, as analyzed with LIRBase.
siRNAs derived from long inverted repeats play important biological roles
In 2018, Lin et al. identified two long hpRNAs in Drosophila simulans, which could be processed into 21-nt siRNAs (Tao et al. 2007a PLOS Biology; Tao et al. 2007b PLOS Biology; Lin et al. 2018 Developmental Cell). These siRNAs repress the expression of the Dox and MDox genes, which promote X chromosome transmission by suppressing Y-bearing sperm. As a result, the two long hpRNAs and their derived siRNAs are critical for maintaining a balanced sex ratio in the offspring of Drosophila simulans.
Biological functions of siRNAs derived from long inverted repeats have also been reported in other animals and in plants in recent years.
In mouse, siRNAs derived from LIRs were reported to regulate gene expression in oocytes (Tam et al. 2008 Nature; Watanabe et al. 2008 Nature).
In Drosophila, another hpRNA and the derived siRNAs were reported to regulate testis gene expression and control male fertility (Wen et al. 2015 Molecular Cell).
In apple, a long hpRNA and the siRNAs generated from it contribute to resistance to leaf spot disease (Zhang et al. 2018 Plant Cell).
In soybean, a long hpRNA and the derived 22-nt siRNAs regulate seed coat color (Tuteja et al. 2009 Plant Cell; Cho et al. 2013 PLOS ONE; Jia et al. 2020 Plant Cell).
In rice, we previously found that several LIRs were present in one parental genome of an elite hybrid but absent from the other parental genome (Yao et al. 2020 Computational and Structural Biotechnology Journal). As a result, siRNAs derived from these LIRs were detected in only one of the two parents. The association between the LIRs and the siRNAs was further detected and verified in an F2 population derived from a self-cross of the elite hybrid.
Comprehensive genome-wide identification of LIRs and long hpRNAs in eukaryotic genomes is urgently needed
In 2013, Axtell called for the comprehensive genome-wide identification and annotation of long inverted repeats and long hpRNAs (Axtell et al. 2013 Annual Review of Plant Biology). However, genome-wide identification and annotation of long inverted repeats has been conducted in only a few organisms. To date, no database or web server exists for the annotation and analysis of long inverted repeats and long hpRNAs.
Using Inverted Repeats Finder (IRF) (Warburton et al. 2004 Genome Research), we identified a total of 6,789,791 long inverted repeats in the whole genomes of 424 eukaryotes, including 297,317 LIRs in 77 invertebrate metazoa genomes, 1,902,296 LIRs in 142 plant genomes and 4,590,178 LIRs in 208 vertebrate genomes. We required a minimum length of 400 nt for both arms of each long inverted repeat identified by IRF, in order to remove potential miniature inverted-repeat transposable elements (MITEs) and Alu elements from the IRF results.
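The arm-length requirement can be sketched as a simple filter. The `InvertedRepeat` record, its field names, and the 1-based inclusive coordinate convention are assumptions made for illustration; they do not reproduce the IRF output format or the actual LIRBase pipeline. Only the 400-nt threshold comes from the text.

```python
from dataclasses import dataclass

MIN_ARM_LEN = 400  # minimum arm length in nt, as stated in the text


@dataclass(frozen=True)
class InvertedRepeat:
    """Hypothetical record; coordinates are assumed 1-based and inclusive."""
    chrom: str
    left_start: int
    left_end: int
    right_start: int
    right_end: int

    @property
    def left_len(self) -> int:
        return self.left_end - self.left_start + 1

    @property
    def right_len(self) -> int:
        return self.right_end - self.right_start + 1


def keep_long_inverted_repeats(records, min_arm_len=MIN_ARM_LEN):
    """Keep only records whose two arms BOTH reach the minimum length."""
    return [
        r for r in records
        if r.left_len >= min_arm_len and r.right_len >= min_arm_len
    ]


if __name__ == "__main__":
    candidates = [
        InvertedRepeat("chr1", 1, 150, 200, 349),          # short arms, dropped
        InvertedRepeat("chr1", 1000, 1450, 2000, 2452),    # both arms long, kept
    ]
    for lir in keep_long_inverted_repeats(candidates):
        print(lir)
```

Requiring the threshold on both arms, rather than on their mean or sum, means a repeat with one long and one truncated arm is also discarded.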
Nomenclature of a long inverted repeat in LIRBase
Each long inverted repeat has a unique identifier in LIRBase, determined by the species name and several features of the LIR: the chromosome ID, the start and end coordinates of the left arm, and the start and end coordinates of the right arm.
Please note that the sequence of an LIR in LIRBase is composed of the left arm sequence, the loop sequence and the right arm sequence, as well as two 200-bp sequences flanking the LIR (the left flanking sequence and the right flanking sequence). The genomic coordinates of both arms are reflected in the identifier of the LIR, whereas the flanking sequences are not.
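As a rough illustration of how the stored sequence relates to the arm coordinates, the sketch below derives the length of each component from the four arm coordinates. The function `lir_sequence_layout` is hypothetical, the coordinates are assumed to be 1-based and inclusive, and clamping the flanks at the chromosome ends is an assumption; LIRBase may handle these details differently. The exact identifier format is not reproduced here.

```python
FLANK_LEN = 200  # flanking length in bp on each side, as stated in the text


def lir_sequence_layout(left_start, left_end, right_start, right_end,
                        chrom_len, flank_len=FLANK_LEN):
    """Return component lengths and the extracted genomic interval.

    Assumptions: 1-based inclusive coordinates, and flanks that are
    shortened when the LIR lies close to a chromosome end.
    """
    if not (1 <= left_start <= left_end < right_start <= right_end <= chrom_len):
        raise ValueError("arm coordinates are unordered or outside the chromosome")
    seq_start = max(1, left_start - flank_len)
    seq_end = min(chrom_len, right_end + flank_len)
    return {
        "left_flank": left_start - seq_start,
        "left_arm": left_end - left_start + 1,
        "loop": right_start - left_end - 1,  # zero when the arms are adjacent
        "right_arm": right_end - right_start + 1,
        "right_flank": seq_end - right_end,
        "sequence_start": seq_start,
        "sequence_end": seq_end,
    }


if __name__ == "__main__":
    layout = lir_sequence_layout(1001, 1500, 1601, 2100, chrom_len=10_000)
    for name, value in layout.items():
        print(f"{name}: {value}")
```

The five component lengths always sum to the length of the extracted interval, which is a convenient sanity check when mapping positions in the stored sequence back to the genome.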
References
- Axtell et al. (2013), Classification and Comparison of Small RNAs from Plants, Annual Review of Plant Biology
- Cho et al. (2013), The Transition from Primary siRNAs to Amplified Secondary siRNAs That Regulate Chalcone Synthase During Development of Glycine max Seed Coats, PLOS ONE
- Czech et al. (2008), An endogenous small interfering RNA pathway in Drosophila, Nature
- Dunoyer et al. (2010), An endogenous, systemic RNAi pathway in plants, EMBO J (Note: this article has been retracted because of image irregularities; the authors consider that its core conclusions remain valid.)
- Henderson et al. (2006), Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning, Nature Genetics
- Jia et al. (2020), Soybean DICER-LIKE2 Regulates Seed Coat Color via Production of Primary 22-Nucleotide Small Interfering RNAs from Long Inverted Repeats, Plant Cell
- Lin et al. (2018), The hpRNA/RNAi Pathway Is Essential to Resolve Intragenomic Conflict in the Drosophila Male Germline, Developmental Cell
- Okamura et al. (2008), The Drosophila hairpin RNA pathway generates endogenous short interfering RNAs, Nature
- Tam et al. (2008), Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes, Nature
- Tao et al. (2007a), A sex-ratio Meiotic Drive System in Drosophila simulans. I: An Autosomal Suppressor, PLOS Biology
- Tao et al. (2007b), A sex-ratio Meiotic Drive System in Drosophila simulans. II: An X-linked Distorter, PLOS Biology
- Tuteja et al. (2009), Endogenous, Tissue-Specific Short Interfering RNAs Silence the Chalcone Synthase Gene Family in Glycine max Seed Coats, Plant Cell
- Warburton et al. (2004), Inverted Repeat Structure of the Human Genome: The X-Chromosome Contains a Preponderance of Large, Highly Homologous Inverted Repeats That Contain Testes Genes, Genome Research
- Watanabe et al. (2008), Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes, Nature
- Wen et al. (2015), Adaptive Regulation of Testis Gene Expression and Control of Male Fertility by the Drosophila Hairpin RNA Pathway, Molecular Cell
- Yao et al. (2020), Features of sRNA biogenesis in rice revealed by genetic dissection of sRNA expression level, Computational and Structural Biotechnology Journal
- Zhang et al. (2018), A Single-Nucleotide Polymorphism in the Promoter of a Hairpin RNA Contributes to Alternaria alternata Leaf Spot Resistance in Apple (Malus × domestica), Plant Cell