"Surf" the biological network, from genome to transcriptome to proteome and back, to gain insights into human disease biology.
- Python 3.9 or higher
- Python packages (numpy, more-itertools, intervaltree, biopython, attrs, tqdm)
- Database (sqlalchemy >=1.4)
- Visualization (matplotlib, brokenaxes)
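As a quick sanity check of the requirements above, the following minimal sketch reports which of the listed packages cannot be found in the current environment. The helper name `missing_modules` is hypothetical and not part of Biosurfer. The import names are assumed to be the usual ones for each package (for example `Bio` for biopython and `attr` for attrs). The sketch does not check the sqlalchemy version.

```python
import importlib.util
import sys

# Import names for the packages listed above. Some differ from the package
# names used at install time (biopython -> Bio, more-itertools ->
# more_itertools, attrs -> attr).
REQUIRED_MODULES = [
    "numpy", "more_itertools", "intervaltree", "Bio", "attr", "tqdm",
    "sqlalchemy", "matplotlib", "brokenaxes",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print("Python 3.9 or higher is required; found", sys.version.split()[0])
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```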
Clone the project repository (over SSH if necessary) and, optionally, create a new conda environment.
# Clone the repository
git clone https://github.com/sheynkman-lab/biosurfer
# Move to the folder
cd biosurfer
# Run setup
pip install --editable .
Usage: biosurfer [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
Options:
--help Show this message and exit.
Commands:
hybrid_alignment This script runs hybrid alignment on the provided...
load_db Loads transcript and protein isoform information from...
plot Plot isoforms from a single gene, specified by...
- Download the toy GENCODE data from Zenodo into the project directory.
Usage: biosurfer load_db [OPTIONS]
Loads transcript and protein isoform information from provided files into a
Biosurfer database. A new database is created if the target database does
not exist.
Options:
-v, --verbose Will print verbose messages
-d, --db_name TEXT Database name [required]
--source [GENCODE|PacBio] Source of input data [required]
--gtf PATH Path to gtf file [required]
--tx_fasta PATH Path to transcript sequence fasta file
[required]
--tl_fasta PATH Path to protein sequence fasta file [required]
--sqanti PATH Path to SQANTI classification tsv file (only for
PacBio isoforms)
--help Show this message and exit.
biosurfer load_db --source=GENCODE --gtf biosurfer_gencode_toy_data/gencode.v38.toy.gtf --tx_fasta biosurfer_gencode_toy_data/gencode.v38.toy.transcripts.fa --tl_fasta biosurfer_gencode_toy_data/gencode.v38.toy.translations.fa --db_name gencode_toy
Running GENCODE files without --ref
will
biosurfer load_db --source=PacBio --gtf biosurfer_wtc11_data/wtc11_with_cds.gtf --tx_fasta biosurfer_wtc11_data/wtc11_corrected.fasta --tl_fasta biosurfer_wtc11_data/wtc11_orf_refined.fasta --sqanti biosurfer_wtc11_data/wtc11_classification.txt --db_name wtc11_db
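When loading several datasets, it can help to assemble the `load_db` command programmatically. The sketch below is one way to do that, using only the options shown in the help text above. The function name `build_load_db_command` is hypothetical. Rejecting `--sqanti` for GENCODE input is an assumption based on the "only for PacBio isoforms" note, not confirmed behavior of the CLI.

```python
import shlex


def build_load_db_command(source, gtf, tx_fasta, tl_fasta, db_name, sqanti=None):
    """Build the argument list for `biosurfer load_db` from its documented options."""
    if source not in ("GENCODE", "PacBio"):
        raise ValueError("source must be 'GENCODE' or 'PacBio'")
    # Assumption of this sketch: the help text says --sqanti is only for
    # PacBio isoforms, so it is rejected for GENCODE input here.
    if sqanti is not None and source != "PacBio":
        raise ValueError("--sqanti applies only to PacBio isoforms")
    args = [
        "biosurfer", "load_db",
        f"--source={source}",
        "--gtf", gtf,
        "--tx_fasta", tx_fasta,
        "--tl_fasta", tl_fasta,
    ]
    if sqanti is not None:
        args += ["--sqanti", sqanti]
    args += ["--db_name", db_name]
    return args


if __name__ == "__main__":
    command = build_load_db_command(
        source="GENCODE",
        gtf="biosurfer_gencode_toy_data/gencode.v38.toy.gtf",
        tx_fasta="biosurfer_gencode_toy_data/gencode.v38.toy.transcripts.fa",
        tl_fasta="biosurfer_gencode_toy_data/gencode.v38.toy.translations.fa",
        db_name="gencode_toy",
    )
    # Print a shell-ready command line; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```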
- Run the hybrid alignment script on the created database. Create a directory to store the output files.
biosurfer hybrid_alignment -d gencode_toy -o output/gencode_toy --gencode
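The output directory has to be created before the command is run. A minimal sketch of doing that from Python follows. The helper name `ensure_output_dir` is hypothetical, and the path matches the one passed to `-o` in the example.

```python
from pathlib import Path


def ensure_output_dir(path):
    """Create the directory (and any missing parents) if it does not exist yet."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out


if __name__ == "__main__":
    print(ensure_output_dir("output/gencode_toy"))
```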
Usage: biosurfer hybrid_alignment [OPTIONS]
This script runs hybrid alignment on the provided database.
Options:
-v, --verbose Print verbose messages
-d, --db_name TEXT Database name [required]
-o, --output DIRECTORY Directory for output files
--gencode Also compare all GENCODE isoforms of a gene against
its anchor isoform
--anchors FILE TSV file with gene names in column 1 and anchor
isoform IDs in column 2
--help Show this message and exit.
Please note that in the code, the terms "anchor" and "other" correspond to the "reference" and "alternative" isoforms mentioned in the manuscript.
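The `--anchors` option expects a TSV file with gene names in column 1 and anchor isoform IDs in column 2. The sketch below writes such a file with Python's standard `csv` module. It assumes the file has no header row, which the help text does not specify. The function names are hypothetical, and the transcript ID is a placeholder to replace with a real isoform ID from your database.

```python
import csv


def write_anchors_tsv(anchors, path):
    """Write a {gene_name: anchor_isoform_id} mapping as two tab-separated columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for gene, isoform_id in anchors.items():
            writer.writerow([gene, isoform_id])


def read_anchors_tsv(path):
    """Read the file back into a dict, skipping blank lines."""
    with open(path, newline="") as handle:
        return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if row}


if __name__ == "__main__":
    # Placeholder ID for illustration only; not a real transcript identifier.
    write_anchors_tsv({"CRYBG2": "TRANSCRIPT_ID_PLACEHOLDER"}, "anchors.tsv")
    print(read_anchors_tsv("anchors.tsv"))
```

The resulting file can then be passed to the alignment step, for example: biosurfer hybrid_alignment -d gencode_toy -o output/gencode_toy --anchors anchors.tsv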
- To visualize isoforms of the CRYBG2 gene, run the following command.
biosurfer plot -d gencode_toy --gene CRYBG2
Usage: biosurfer plot [OPTIONS] [TRANSCRIPT_IDS]...
Plot isoforms from a single gene, specified by TRANSCRIPT_IDS.
Options:
-v, --verbose Print verbose messages
-o, --output DIRECTORY Directory in which to save plots
-d, --db_name TEXT Database name [required]
--gene TEXT Name of gene for which to plot all isoforms;
overrides TRANSCRIPT_IDS
--help Show this message and exit.