natallah/methylRad

methylRad

1_genomeCountSites.py

  • Usage: python3 1_genomeCountSites.py -i genome_ref.fasta
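
The README does not say how the genome scan is implemented. Here is a minimal sketch of finding every occurrence of a set of DNA patterns in a reference FASTA. It assumes patterns are plain uppercase strings matched on the forward strand only. That is sufficient for palindromic sites such as the hypothetical example "CCGG", but not in general. The function names are illustrative, not taken from the script.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_name, uppercase_sequence) pairs from FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            # Keep only the first word of the header, e.g. ">1 dna:chromosome" -> "1".
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def find_sites(lines: Iterable[str], patterns: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Return {pattern: [(chrom, 1-based start), ...]}, including overlapping hits."""
    sites: Dict[str, List[Tuple[str, int]]] = {p: [] for p in patterns}
    for chrom, seq in read_fasta(lines):
        for p in patterns:
            # A zero-width lookahead lets re.finditer report overlapping matches.
            for m in re.finditer(f"(?={re.escape(p)})", seq):
                sites[p].append((chrom, m.start() + 1))
    return sites
```

With a real genome, pass an open file handle, e.g. `find_sites(open("genome_ref.fasta"), ["CCGG"])`.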

1b_CountTotSites.sh

  • Count the number of sites found in the reference genome for each pattern
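
The per-pattern totals above amount to a group-by count. The following sketch shows that in Python. It assumes a hypothetical tab-separated sites file with the columns chrom, position, pattern. The actual output of 1_genomeCountSites.py may be laid out differently.

```python
import csv
from collections import Counter
from typing import Iterable


def count_sites_per_pattern(lines: Iterable[str]) -> Counter:
    """Count reference-genome sites per pattern.

    Assumes (hypothetically) tab-separated rows of: chrom, position, pattern.
    Blank lines and lines starting with '#' are skipped.
    """
    counts: Counter = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        counts[row[2]] += 1
    return counts
```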

2_extractFields.sh

  • Extract the necessary fields from BAM files and save them to a .tsv file
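
The README does not list which fields are extracted. The following sketch assumes read name, chromosome, position, strand and CIGAR string. It reads SAM text, for example the output of `samtools view sample.bam`, and skips unmapped reads (FLAG bit 0x4). It also skips alignments below a MAPQ cutoff. The cutoff of 10 is a guess based on the example file name UD_MAPQ10_coord_sorted.tsv.

```python
from typing import Iterable, Iterator, Tuple


def extract_fields(sam_lines: Iterable[str], min_mapq: int = 10) -> Iterator[Tuple[str, str, int, str, str]]:
    """Yield (read_name, chrom, pos, strand, cigar) for mapped reads with MAPQ >= min_mapq."""
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip SAM header lines
        f = line.rstrip("\n").split("\t")
        qname, flag, rname = f[0], int(f[1]), f[2]
        pos, mapq, cigar = int(f[3]), int(f[4]), f[5]
        if flag & 0x4 or mapq < min_mapq:
            continue  # unmapped read or low mapping quality
        strand = "-" if flag & 0x10 else "+"  # 0x10 = read reverse-complemented
        yield qname, rname, pos, strand, cigar
```

Each record can then be written as one TSV line with `"\t".join(map(str, rec))`.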

2b_indexBamFile.sh

  • Index sorted BAM files

3_conv2CountMat.py

  • Convert the .tsv file to a count matrix. Results are stored in the output folder given with -o.
  • Output:
      (pattern_name)_sitesLoc.tsv - contains sites and counts, regardless of CIGAR strings
      (pattern_name)_sitesLocRaw.tsv - contains all reads with insertions or deletions in their CIGAR strings; use it to edit the sites accordingly
  • Usage: python3 3_conv2CountMat.py -i (name_of_file.tsv) -o (name_of_folder_for_results)
      ex: python3 3_conv2CountMat.py -i UD_MAPQ10_coord_sorted.tsv -o UD_readsCatalogue
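
The two outputs described above can be sketched as follows. Reads are counted per site while ignoring the CIGAR string, and reads whose CIGAR contains an insertion (I) or deletion (D) are collected separately. Keying a site on (chrom, pos, strand) is an assumption. The script may define sites per pattern or without strand.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[str, str, int, str, str]  # (read_name, chrom, pos, strand, cigar)


def build_counts(records: Iterable[Record]) -> Tuple[Counter, List[Record]]:
    """Count reads per (chrom, pos, strand) and collect reads with indels."""
    counts: Counter = Counter()
    with_indels: List[Record] = []
    for rec in records:
        _, chrom, pos, strand, cigar = rec
        counts[(chrom, pos, strand)] += 1  # count regardless of CIGAR
        if re.search(r"\d+[ID]", cigar):
            with_indels.append(rec)  # site may need correcting later
    return counts, with_indels
```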

3_conv2CountMat.sh

  • Script to run 3_conv2CountMat.py on all samples

4_CompareSites.py

  • Compare the sites found by 3_conv2CountMat.py with those found by 1_genomeCountSites.py to identify mismatches.
  • Print the number of sites that are in and out of the reference genome.
  • Edit sites according to the insertions, deletions, and substitutions in the CIGAR strings.
  • Output: (pattern_name)_finalSites.tsv - contains edited sites and counts
  • Usage: python3 4_CompareSites.py -i (folder_name_of_samples_with_tsv_files)
      ex: python3 4_CompareSites.py -i UD_readsCatalogue
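
The exact editing rules are not documented. One common reason to correct positions with the CIGAR string is the following. For a reverse-strand read, the 5' end sits at the rightmost reference coordinate of the alignment. That coordinate moves with deletions but not with insertions or soft clips. This sketch assumes the site is the read's 5' end, which is a hypothetical definition. It also counts how many sample sites fall in and out of the reference site set.

```python
import re
from typing import Iterable, List, Set, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '20M2I28M' into [(20, 'M'), (2, 'I'), (28, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Reference bases covered by the alignment (insertions and clips excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def site_position(pos: int, strand: str, cigar: str) -> int:
    """1-based reference coordinate of the read's 5' end (assumed site definition)."""
    if strand == "+":
        return pos
    return pos + reference_span(cigar) - 1


def compare_to_reference(sample_sites: Iterable[Tuple[str, int]],
                         ref_sites: Set[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (sites found in the reference set, sites not found in it)."""
    sites = set(sample_sites)
    inside = len(sites & ref_sites)
    return inside, len(sites) - inside
```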

4_CompareSites.sh

  • Script to run 4_CompareSites.py on all samples

4a_RemoveDupSites.py

  • Collect all sites from every sample into one dictionary. This collection still contains duplicates between patterns. Results are saved to SaveData/SampleSites.pkl and (sample_name)_readsCatalogue/(sample_name)_allSites.tsv

  • Find duplicates between patterns, combine their counts, and save the results to (sample_folder)/(sample_name)_allSites_noDups.tsv

  • Add RPM (reads per million) values and save to (sample_folder)/(sample_name)_allSites_noDups_final.tsv

  • Usage: python3 4a_RemoveDupSites.py
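
The following sketches the de-duplication and RPM steps above. Summing the counts of a site reported under several patterns is an assumption about what "combine counts" means. So is using the total of all site counts as the library size when no total read count is supplied.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (chrom, position)


def merge_patterns(per_pattern: Dict[str, Dict[Site, int]]) -> Dict[Site, int]:
    """Combine (here: sum) counts for sites reported under more than one pattern."""
    merged: Dict[Site, int] = defaultdict(int)
    for counts in per_pattern.values():
        for site, n in counts.items():
            merged[site] += n
    return dict(merged)


def add_rpm(counts: Dict[Site, int], total_reads: Optional[int] = None) -> Dict[Site, Tuple[int, float]]:
    """Attach reads per million, RPM = count * 1e6 / total_reads, to each site."""
    total = total_reads if total_reads is not None else sum(counts.values())
    return {site: (n, n * 1e6 / total) for site, n in counts.items()}
```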

4b_VizCounts.py

  • Count the number of sites in each sample that overlap the sites found in the reference genome
  • Note: this still uses unedited sites. It needs to be redone with the edited, de-duplicated sites.
  • Usage: python3 4b_VizCounts.py

4b_runVizCounts.py

  • Script to run 4b_VizCounts.py

5_CompareSamples.py

  • Find overlapping sites and differences for every pair of samples, as well as for RA4_TCP + RA4_PG - UD. Results are saved to SaveData/SampleSitesCompare.pkl and SaveData/SampleSitesCompareList.pkl. BED files for each comparison are saved in their own folder under Results/.
    Output example (Results/UD_vs_RA4):
      RA4_diff_sites.bed - sites that are in RA4 and not in UD
      UD_diff_sites.bed - sites that are in UD and not in RA4
      Intersection_sites.bed - sites that are in both UD and RA4
    The naming convention is the same for all comparisons.

  • Find overlapping sites and differences among RA4_TCP - UD, RA4_PG - UD, and RA4 - UD. A Venn diagram of all sets is saved to Figures/RA4_UD_diffVenn.pdf, and the results are saved as BED files under Results/.
    Output example (Results/RA4_D_vs_PG_D_vs_TCP_D/):
      RA4_UD_only - sites found only in RA4 - UD
      RA4_UD_vs_TCP_intersection - sites found in both RA4 - UD and RA4_TCP - UD
      RA4_PG_UD_only - sites found only in RA4_PG - UD
      RA4_PG_vs_TCP_intersection - sites found in both RA4_PG - UD and RA4_TCP - UD
      RA4_TCP_UD_only - sites found only in RA4_TCP - UD
      RA4_UD_vs_PG_intersection - sites found in both RA4 - UD and RA4_PG - UD
      RA4_vs_PG_vs_TCP_intersection - sites found in all of RA4 - UD, RA4_TCP - UD, and RA4_PG - UD

  • Usage: run line by line from a Python console. The script still needs some cleanup.
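
The comparisons above reduce to set operations on (chrom, position) sites. Here "RA4 - UD" is read as the RA4 sites that are absent from UD. This sketch shows the pairwise split into two differences and an intersection, and the seven disjoint regions of a three-set Venn diagram. The generic names a, b, c are placeholders, e.g. a = RA4 - UD, b = RA4_PG - UD, c = RA4_TCP - UD.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (chrom, position)


def pairwise(a: Set[Site], b: Set[Site]) -> Tuple[Set[Site], Set[Site], Set[Site]]:
    """Return (only in a, only in b, in both), as in *_diff_sites / Intersection_sites."""
    return a - b, b - a, a & b


def venn3_regions(a: Set[Site], b: Set[Site], c: Set[Site]) -> Dict[str, Set[Site]]:
    """Split three site sets into the seven disjoint regions of a 3-set Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "a_b_c": a & b & c,
    }
```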

5a_VizCompareSamples.py

  • Create Venn diagrams for site differences between samples

  • Usage: python3 5a_VizCompareSamples.py

6_AnnotateWithGenome.sh

  • Convert GTF files to BED files
  • Convert (sample_name)_allSites.tsv into BED files and change chromosome names from "1" to "chr1" style
  • Use BEDTools intersect to annotate (sample_comparisons)_allSites.bed to genes using Mus_musculus.GRCm38.93.chr.bed
  • Use BEDTools closest to annotate (sample_comparisons)_allSites.bed to the nearest gene using Mus_musculus.GRCm38.93.chr.bed
  • Use BEDTools intersect to annotate the sets in Results/RA4_D_vs_PG_D_vs_TCP_D using Mus_musculus.GRCm38.93.chr.bed
  • Convert promoters.txt to a BED file
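
Two details in the steps above are easy to get wrong. BED coordinates are 0-based and half-open, and a gene overlaps a site only if the two intervals actually intersect. This sketch assumes the positions in (sample_name)_allSites.tsv are 1-based. A naive prefix turns Ensembl "MT" into "chrMT" rather than UCSC-style "chrM", so that case would need special handling. overlapping_genes is a plain-Python illustration of the overlap test, not a replacement for BEDTools.

```python
from typing import Iterable, List, Tuple


def site_to_bed(chrom: str, pos: int, name: str = ".") -> str:
    """Convert a 1-based site to a BED line (0-based, half-open), adding the 'chr' prefix."""
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return f"{chrom}\t{pos - 1}\t{pos}\t{name}"


def overlapping_genes(chrom: str, start: int, end: int,
                      genes: Iterable[Tuple[str, int, int, str]]) -> List[str]:
    """Names of genes whose BED interval [s, e) overlaps [start, end)."""
    return [g for c, s, e, g in genes if c == chrom and s < end and start < e]
```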

7_AnnotateSites.py

  • Work in progress: annotate sites using pybedtools instead of the BEDTools command line. This may be useful for creating a package later.

7_AnnotateSites.sh

  • Add 1 kb to the regions in LSD1occupied_enhancers.mm10.use.bed and save as LSD1occupied_enhancers.mm10.use.edit.bed
  • Add 1 kb to the regions in ESC_J1.enhancers.use.bed and save as ESC_J1.enhancers.use.edit.bed
  • Order of annotations: LSD1, ESC, promoters.
  • Annotate the sites in RA4 - UD, RA4_PG - UD, RA4_TCP - UD, and the sites found only in RA4 - UD when compared with the rest (RA4_UD_only_sites.bed) to LSD1 sites. Store the remaining sites in FinalAnnotation/(sample_name_diff)_sites_noLSD1*
  • Annotate FinalAnnotation/(sample_name_diff)_sites_noLSD1* to ESC1
  • Annotate to promoters
  • Annotate (sample_name)_allSites_noDups_final.tsv to LSD1, ESC, and promoters.
  • Annotate with count data to obtain fold changes
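
The ordered annotation above (LSD1, then ESC, then promoters, with unmatched sites passed to the next step) can be sketched as a first-match assignment. Reading "add 1 kb" as padding each region by 1000 bp on both sides is an assumption. The source does not say whether one or both ends are extended. Intervals are BED-style (0-based, half-open).

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open
Site = Tuple[str, int]           # (chrom, 0-based position)


def extend(intervals: List[Interval], flank: int = 1000) -> List[Interval]:
    """Pad each interval by `flank` bp on both sides, clamping the start at 0."""
    return [(c, max(0, s - flank), e + flank) for c, s, e in intervals]


def hits(site: Site, intervals: List[Interval]) -> bool:
    """True if the site falls inside any interval on the same chromosome."""
    chrom, pos = site
    return any(c == chrom and s <= pos < e for c, s, e in intervals)


def annotate_in_order(sites: List[Site],
                      layers: List[Tuple[str, List[Interval]]]) -> Dict[str, List[Site]]:
    """Assign each site to the first layer it overlaps; unmatched sites go to 'none'."""
    result: Dict[str, List[Site]] = {name: [] for name, _ in layers}
    result["none"] = []
    for site in sites:
        for name, intervals in layers:
            if hits(site, intervals):
                result[name].append(site)
                break  # earlier layers take priority over later ones
        else:
            result["none"].append(site)
    return result
```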

8_makePieCharts.R

  • Produce pie charts from the annotations using ChIPseeker

9_FoldChange.py

  • Compute log2 fold changes and save the results under Results/FoldChanges
  • Usage: run line by line from a Python console
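
The README does not give the fold-change formula. A common choice is log2 of the ratio of a treated value to a control value, with a pseudocount so that sites absent from one sample do not produce a division by zero or log2(0). This sketch uses that choice. The pseudocount of 1 and treating missing sites as 0 are assumptions. The inputs would typically be per-site RPM values.

```python
import math
from typing import Dict, Hashable


def log2_fold_changes(treated: Dict[Hashable, float], control: Dict[Hashable, float],
                      pseudocount: float = 1.0) -> Dict[Hashable, float]:
    """log2((treated + pc) / (control + pc)) for every site seen in either sample."""
    sites = set(treated) | set(control)
    return {
        s: math.log2((treated.get(s, 0.0) + pseudocount) / (control.get(s, 0.0) + pseudocount))
        for s in sites
    }
```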

9a_FoldChange.sub

  • Script to submit 9_FoldChange.py

10_AdditionalAnalysis.py

  • Find genes shared between RA4-only sites annotated to promoters and RA4-only sites annotated to enhancers
  • Analyze sites between RA4_PG - UD and RA4_TCP - UD
  • Count the number of patterns in each sample
