Welcome to the SAGs repo README. This document outlines the folders and files included in the SAGs repo.
As of the latest commit, SAGs contains the following folders and files:
- bubble_plots: a collection of bubble plots by SAG
- dN_dS_scatter_images: scatterplots of dN/dS by ITEP cluster presence/absence
- KO_SNV_misc: miscellaneous files for Seq Object data structures, awaiting sorting or deletion.
- misc_images: miscellaneous/one-time images
- misc_trash: miscellaneous formatting files awaiting deletion.
- pa_files: text files related to ITEP/clusterDbAnalysis presence/absence tables.
- python_files: scripts, pipelines, and custom classes written in Python 3. Awaiting further organization and documentation.
- SAG_data_files: .gff, .ko, .fa assembly files, .tsv contig names (from anvi-script), and .names_map files for each SAG. This is all the data that needs to be integrated for non-ITEP/non-PAML analysis.
- variability.zip: zipped anvi'o outputs for SNV and SAAV variability profiles for all 5 SAGs.
- papers: the papers I've been reading and pre-existing info on the research
The scripts in python_files are written in Python 3. The following modules are used:
- pandas
- numpy
- matplotlib (pyplot)
- sklearn (commented out)
- scipy
- pickle
Note: Most of the Python modules and functions in this directory are intended for single-purpose use on specific file types.
This directory also contains code for custom Python SNV, SAAV, Contig, and ORF objects. They are designed to store all of the information in the SAG_data_files folder so that it can be easily sorted, parsed, and analyzed. The Python object modules are:
- SNV
- SAAV
- Contig
- ORF
View the module files for a list of data attributes stored in these objects.
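As a rough illustration of how such objects can fit together, here is a minimal sketch. It is not the code in the module files: every attribute name below (contig_name, position, start, stop, and so on) and the assign_snvs helper are hypothetical, and the real SNV, ORF, and Contig classes may store different fields. SAAV is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SNV:
    # Hypothetical attributes; check the real SNV module for the actual ones.
    contig_name: str
    position: int   # assumed 0-based position on the contig
    reference: str  # base in the reference assembly
    consensus: str  # most common base among the mapped reads


@dataclass
class ORF:
    # Hypothetical attributes; coordinates assumed half-open: [start, stop).
    contig_name: str
    start: int
    stop: int
    snvs: List[SNV] = field(default_factory=list)

    def contains(self, snv: SNV) -> bool:
        """Return True if the SNV lies on this ORF's contig and inside its span."""
        return (snv.contig_name == self.contig_name
                and self.start <= snv.position < self.stop)


@dataclass
class Contig:
    name: str
    orfs: List[ORF] = field(default_factory=list)


def assign_snvs(orfs: List[ORF], snvs: List[SNV]) -> None:
    """Attach each SNV to every ORF whose span contains it."""
    for snv in snvs:
        for orf in orfs:
            if orf.contains(snv):
                orf.snvs.append(snv)
```

Keeping each SNV attached to its ORF, and each ORF attached to its Contig, is one way to make sorting and filtering a matter of walking plain Python lists.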
Several modules contain functions for reading the input files and constructing lists of Sequence objects:
- make_contig_ORF_and_SNV_lists.py
- make_SNVs_and_Contigs.py
- makeSequenceObjects.py
- mergeORFpaData.py
- organizeKO.py
- populateContigLists.py
These functions must be called in order to properly initialize the Sequence objects.
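For readers unfamiliar with the input formats, the sketch below shows one way a .gff file could be read into a list of records before building objects from it. It is not taken from the modules listed above: the function name read_gff_features and the returned dictionary keys are hypothetical, and it assumes the standard nine-column, tab-separated GFF layout with comment lines starting with "#".

```python
import csv
import io


def read_gff_features(handle, feature_type="CDS"):
    """Return one dict per feature of the requested type from a GFF file handle."""
    features = []
    # QUOTE_NONE keeps quote characters in the attributes column untouched.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank lines, comment/directive lines, and malformed rows.
        if not row or row[0].startswith("#") or len(row) != 9:
            continue
        seqid, _source, ftype, start, end, _score, strand, _phase, _attrs = row
        if ftype == feature_type:
            features.append({
                "seqid": seqid,
                "start": int(start),  # GFF coordinates are 1-based, inclusive
                "end": int(end),
                "strand": strand,
            })
    return features


# Small in-memory example standing in for a real .gff file.
example = io.StringIO(
    "##gff-version 3\n"
    "contig_1\tprodigal\tCDS\t10\t99\t.\t+\t0\tID=orf_1\n"
    "contig_1\tprodigal\tgene\t10\t99\t.\t+\t.\tID=gene_1\n"
    "contig_2\tprodigal\tCDS\t5\t64\t.\t-\t0\tID=orf_2\n"
)
orfs = read_gff_features(example)
```

With a real file, the same function could be called as `read_gff_features(open(path))`, where `path` points to one of the .gff files in SAG_data_files.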
Michael Hoffert - Undergrad
Rika Anderson - PI, Space Hogs