TIP (Transposable-element Insertion site Polymorphism) identification in sorghum

This pipeline was adapted from the one presented by Xu Cai et al. in their paper 'Transposable element insertion: a hidden major source of domesticated phenotypic variation in Brassica rapa'. Most of the scripts were used without major alterations but the ones that were modified are indicated as such with comments in the script. Their GitHub page can be accessed at https://github.com/caixu0518/ITIPs

Reference: Xu Cai, Runmao Lin, Jianli Liang, Graham J. King, Jian Wu, Xiaowu Wang. (2022). Transposable element insertion: a hidden major source of domesticated phenotypic variation in Brassica rapa. Plant Biotechnology Journal. https://doi.org/10.1111/pbi.13807

The files used and created by this pipeline can be found at /nobackup/cooper_research/krittikak/TIP_pipeline/smartie-sv

Step 1: Identification of insertions and deletions in the 11 sorghum genotypes (NAM lines)

This step involved running the smartie-sv pipeline. The input were 11 sorghum genomes from the NAM panel being compared against a single reference BTx623 genome. The output were BED files that also contained the insertion/deletion segments (with respect to the reference BTx623). Unlike in the paper being referred to, only a single reference genome was used to run this pipeline.

Local installation of Blasr and smartie-sv was required for running the program successfully on the UNC Charlotte HPC cluster.

Installing smartie-sv

Use the scripts in smartie-sv folder to locally install and run smartie-sv. This folder also contains a slurm script to get the required version of blasr.

To install unzip the file and type:

bash smartie-sv-sgb.sh

Step 2: Construction of the TE insertion dataset

The insertions and deletions were mapped on the TE library. To create the TE library, first EDTA was run on all the genomes followed by clustering by CD-HIT to create a non-redundant TE library. CD-HIT parameters:

The insertions and deletions were merged to create BTx623.merged.DEL and BTx623.merged.INS (corresponding .fa files were also created). BLAST (blastn) was run using the TE library as the database and the merged files as the query. This created files with extension .withMorethan80_TE_cov.The output in these files in based on the insertion/deletion having 80% similarity and 80% coverage.

Next step was to annotate the insertions and deletions using the reference gff3 file, which created two .anno files.

With the annotation files,the reference genome was used to identify the flanking sequence around the TEs. This was followed by using the BTx623 reference genome to identify the flanking+reference based TE insertions and flanking+non-reference based TE insertions

Step3: Determining TIPs at the population level (wild sorghum)

Using the output files in the last stage of Step 2, bwa-mem (with the same parameters as the authors had set) was used to map the wild sorghum genotypes to the TE insertions.

bwa-mem provided in the referred GitHub page was used for alignment.

Files used in each step:

The slurm scripts can also be found in the SVprocessing folder on the cluster (within the folder path mentioned above)

run-smartie.pl (smartie-sv.slurm)
merge_insertions_deletions.pl (mergeSVs.slurm)
getFASTA.pl replaced blast_INS_DEL_seq_to_TE_lib.pl (maptoTElib.slurm)
deletion_annotation.pl (TEsingenes_ref.slurm)
insertion_annotation.pl (TEsingenes_nonref.slurm)
get_referenceTEinsertions_and_flanking_Seqs.pl (getFlanking.slurm)
get_nonReferenceTEinsertions_and_flanking_Seqs.pl (getFlanking.slurm)
TE_insertions_genotype.pl (mapPopSeq_ref.slurm and mapPopSeq_nonred.slurm) - this script calls on code_genotype.pl

The final outputs for all the wild genotypes are in the folder genotype_TIPs. These outputs were merged using TIPGeneOverlaps_DEL.py script.

NOTE: While interpreting the Genotype results keep in mind that 'CC' gneotype indicates homozygous to the reference and 'GG' is heterozygous to the reference.

For TEs called on deletion data, 'CC' genotype would indicate that the TE is also present in the query genome and 'GG' would indicate deletion of the TE in the query.

For TEs called on insertion data, 'CC' genotype would indicate TE is not present in the query or the reference genomes and 'GG' would indicate insertion of the TE in the query.

Counts of BTx623 genes with TE insertions:

Genotype ID	Genes with TEs (ref based)	Genes with DEL TEs (ref based)	Genes with TEs (non-ref based)	Genes with INS TEs (non-ref based)
Grif16309	443	1034	1417	406
pi302118	462	1192	1671	491
pi302267	507	782	1249	303
pi329250	536	892	1720	442
pi329251	447	869	1720	442
pi329252	540	868	1696	471
pi369484	514	843	1667	536
pi369487	423	829	1628	489
pi524718	556	928	1644	463
pi532564	521	965	1667	433
pi532565	508	865	1654	441
pi532566	504	835	1651	416
pi532568	483	827	1658	422
pi535995	548	915	1677	503

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
pipeline		pipeline
slurm-scripts		slurm-scripts
smartie-sv		smartie-sv
README.md		README.md
TIPGeneOverlaps_DEL.py		TIPGeneOverlaps_DEL.py
TIPGeneOverlaps_INS.py		TIPGeneOverlaps_INS.py
TIPOverlaps_DEL.out		TIPOverlaps_DEL.out
TIPOverlaps_INS.out		TIPOverlaps_INS.out

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TIP (Transposable-element Insertion site Polymorphism) identification in sorghum

Step 1: Identification of insertions and deletions in the 11 sorghum genotypes (NAM lines)

Step 2: Construction of the TE insertion dataset

Step3: Determining TIPs at the population level (wild sorghum)

Files used in each step:

About

Releases

Packages

Languages

Krittika25/TIPs_sorghum

Folders and files

Latest commit

History

Repository files navigation

TIP (Transposable-element Insertion site Polymorphism) identification in sorghum

Step 1: Identification of insertions and deletions in the 11 sorghum genotypes (NAM lines)

Step 2: Construction of the TE insertion dataset

Step3: Determining TIPs at the population level (wild sorghum)

Files used in each step:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages