This repository contains information on performing age analysis on transposable elements found in sorghum genomes (NAM panel).
Transposable element (TE) age is estimated by comparing the 5' and 3' terminal repeat regions of each element: the two repeats are identical at insertion, so the sequence divergence between them reflects the time since insertion.
To annotate the TE structure, the TE-greedy-nester tool was used. Information on this tool is available at https://gitlab.fi.muni.cz/lexa/nested (the tool was installed within a conda environment in the local directory).
The input and output files can be accessed on the HPC cluster at /nobackup/cooper_research/krittikak/nester_output
-
Run TE-greedy-nester on the scaffold data. The output is generated per chromosome. This data was concatenated into a single file before proceeding to the next step.
-
Extract only the right and left ends of each TE ID from the TE-greedy-nester output.
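A minimal sketch of this filtering step in Python is shown here for illustration. It assumes the nester output is GFF3 with nine tab-separated columns, that the attribute column carries a `name` value of `ltr left` or `ltr right`, and that `Parent` identifies the TE; these attribute names, the `TE_id|side` label and the function names are assumptions, so check them against the actual output before reuse. The result is written as BED records, which is the coordinate format bedtools expects.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute string 'k1=v1;k2=v2' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def ltr_ends_to_bed(gff_lines, left_label="ltr left", right_label="ltr right"):
    """Keep only left/right end features and return them as BED lines.

    The labels and the 'Parent' attribute are assumptions about the
    annotation, not confirmed properties of the nester output.
    """
    sides = {left_label: "left", right_label: "right"}
    bed = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        attrs = parse_attributes(cols[8])
        side = sides.get(attrs.get("name", "").lower())
        if side is None:
            continue  # some other feature (domain, TSD, ...)
        te_id = attrs.get("Parent", attrs.get("ID", "unknown"))
        start = int(cols[3]) - 1  # GFF is 1-based inclusive, BED is 0-based half-open
        end = int(cols[4])
        bed.append(f"{cols[0]}\t{start}\t{end}\t{te_id}|{side}")
    return bed
```

Putting the TE ID and the side into the BED name column keeps the pairing information attached to each interval through the later steps.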
-
Get the FASTA sequences for the right and left ends using bedtools getfasta. The FASTA headers were then renamed to contain the TE pair IDs using the custom Python script header_tags.py.
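The renaming can be sketched as follows. This is not header_tags.py itself, only an illustration that assumes the FASTA headers have the default bedtools getfasta form `chrom:start-end` and that a four-column BED file supplies a label for each interval; the `TE_1|left` label format and the function names are hypothetical.

```python
def build_header_map(bed_lines):
    """Map 'chrom:start-end' (default getfasta header) to the BED name column."""
    mapping = {}
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        mapping[f"{chrom}:{start}-{end}"] = name
    return mapping


def rename_headers(fasta_lines, mapping):
    """Replace each FASTA header with its mapped label; unknown headers are kept."""
    renamed = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            key = parts[0] if parts else ""
            line = ">" + mapping.get(key, key)
        renamed.append(line)
    return renamed
```

Headers that have no entry in the mapping are left unchanged, so a mismatch between the BED and FASTA files is visible in the output rather than silently dropped.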
-
Each right and left pair was split into its own FASTA file for pairwise alignment using the custom Perl script get_LTR_pairs.pl.
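For illustration, the pairing logic might look like the Python sketch below. It is not a port of get_LTR_pairs.pl; it assumes headers of the hypothetical form `TE_id|left` and `TE_id|right`, and it skips any TE for which only one end is present.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA lines into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def group_pairs(records, sep="|"):
    """Group sequences by TE ID and keep only TEs with both a left and a right end."""
    grouped = defaultdict(dict)
    for header, seq in records.items():
        te_id, _, side = header.rpartition(sep)
        if te_id and side in ("left", "right"):
            grouped[te_id][side] = seq
    return {te: ends for te, ends in grouped.items() if len(ends) == 2}


def pair_to_fasta(te_id, ends, sep="|"):
    """Format one left/right pair as a two-record FASTA string."""
    return (f">{te_id}{sep}left\n{ends['left']}\n"
            f">{te_id}{sep}right\n{ends['right']}\n")
```

Each string returned by `pair_to_fasta` can then be written to a file named after the TE ID, giving one two-sequence FASTA file per element as input for the aligner.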
-
Pairwise sequence alignment and calculation of TE age were performed using the custom R script TEage.R.
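The distance model and substitution rate used by TEage.R are not stated here, so the Python sketch below only illustrates one common approach. It assumes the two ends are already aligned (equal length, `-` for gaps), applies the Kimura two-parameter correction to get a distance K, and converts it to an age with T = K / (2r), where the factor 2 accounts for both repeats accumulating substitutions independently. The rate r is a required argument; the value in the usage note is an assumption, not the project's setting.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(aligned_a, aligned_b):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    sites = transitions = transversions = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


def insertion_age(distance, rate):
    """Age in years from distance K and rate r (substitutions/site/year): K / (2r)."""
    return distance / (2 * rate)
```

As a usage example, `insertion_age(k2p_distance(left, right), rate=1.3e-8)` would give an age in years under an assumed rate of 1.3e-8 substitutions per site per year; substitute whatever rate is appropriate for the analysis.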
-
TE positions were added to the output of the age calculation so that the age distribution can be visualized, using the custom Perl script saveTEpos2.pl.
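A rough Python sketch of such a join is given below. It is not saveTEpos2.pl; it assumes the age results are available as `(te_id, age)` pairs, that positions come from four-column BED lines labelled with the hypothetical `TE_id|side` format, and that the position of a TE is taken as the span from the start of its left end to the end of its right end.

```python
def add_positions(age_rows, bed_lines, sep="|"):
    """Attach chrom/start/end to each (te_id, age) row.

    Returns tuples (te_id, chrom, start, end, age); TEs without a
    position record are dropped.
    """
    spans = {}
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        te_id = name.rpartition(sep)[0] or name  # strip the '|left' / '|right' suffix
        start, end = int(start), int(end)
        if te_id in spans:
            # widen the span so it covers both ends of the element
            _, old_start, old_end = spans[te_id]
            spans[te_id] = (chrom, min(old_start, start), max(old_end, end))
        else:
            spans[te_id] = (chrom, start, end)
    return [(te, *spans[te], age) for te, age in age_rows if te in spans]
```

The resulting rows carry both a genomic coordinate and an age, which is the shape needed to plot age along each chromosome.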