GitHub - ilibarra/gimme_motif_bias: Workflow to annotate cell identity TFs based on pairwise gene program comparisons between cell types.

Transcription factor motif biases calculator

Motivation

Comparing gene programs between cell types is necessary for deciding which Transcription Factors (TFs) are useful for cell type conversions.
The design and selection of gene sets for this purpose can be automated using heuristics, such as taking the top-N up-regulated genes.
Comparisons of gene groups should account for differences between cell lineages and for redundancy between gene sets.

Solution

This Python workflow:

Calculates motif biases (log2FC; Z-scores are upcoming) between two cell types of interest, or between one cell type and a group of cell types, using top-N genes and TF motifs for TF-gene associations.
Summarizes the values as a table for downstream analyses.

Workflow steps

Expression values are obtained from an expression resource (e.g. TabulaMuris) and normalized into Z-scores.
The Z-scores are also used to prepare sets of the top N genes (e.g. 1000) for each cell type.
Using pre-annotated motifs (from CIS-BP), cell type pairs are compared based on this metric.
For each comparison, a log2FC is reported.
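
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the code in gimme_motif_bias.py: the expression values, the gene names, the helper names (`zscores_by_gene`, `top_n_genes`, `motif_log2fc`) and the definition of the log2FC as a ratio of motif-hit fractions with a pseudocount are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev

# Toy expression matrix: gene -> expression per cell type (illustrative values only).
expression = {
    "GeneA": {"neuron": 9.0, "hepatocyte": 1.0, "T cell": 2.0},
    "GeneB": {"neuron": 8.0, "hepatocyte": 2.0, "T cell": 2.0},
    "GeneC": {"neuron": 1.0, "hepatocyte": 9.0, "T cell": 2.0},
    "GeneD": {"neuron": 2.0, "hepatocyte": 8.0, "T cell": 2.0},
}
# Toy motif annotation: genes with at least one hit for the motif of interest.
genes_with_motif = {"GeneA", "GeneB", "GeneC"}


def zscores_by_gene(expr):
    """Normalize each gene's expression across cell types into Z-scores."""
    out = {}
    for gene, by_cell in expr.items():
        mu = mean(by_cell.values())
        sd = pstdev(by_cell.values())
        out[gene] = {c: (v - mu) / sd if sd > 0 else 0.0 for c, v in by_cell.items()}
    return out


def top_n_genes(z, cell_type, n):
    """Return the n genes with the highest Z-score in the given cell type."""
    ranked = sorted(z, key=lambda g: z[g][cell_type], reverse=True)
    return ranked[:n]


def motif_log2fc(genes_a, genes_b, hits, pseudocount=1.0):
    """log2 ratio of motif-hit fractions between two gene sets (assumed definition)."""
    frac_a = (sum(g in hits for g in genes_a) + pseudocount) / (len(genes_a) + pseudocount)
    frac_b = (sum(g in hits for g in genes_b) + pseudocount) / (len(genes_b) + pseudocount)
    return math.log2(frac_a / frac_b)


z = zscores_by_gene(expression)
top_neuron = top_n_genes(z, "neuron", n=2)
top_hepatocyte = top_n_genes(z, "hepatocyte", n=2)
log2fc = motif_log2fc(top_neuron, top_hepatocyte, genes_with_motif)
print(top_neuron, top_hepatocyte, round(log2fc, 3))
```

A positive value here means the motif is found in a larger fraction of the top genes of the first cell type than of the second.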

Installation and running (typical time: less than 5 minutes)

Clone repository.

git clone gimme_motif_bias.git
cd gimme_motif_bias

#### Create an environment suitable to run this (gmb can be replaced by any name you want).
conda create --name gmb python=3.6 --file requirements.txt
source activate gmb

Motif hits CIS-BP (mouse genome) (~1GB).
- Download the following file and uncompress it in input: Motif hits mm10 (Dropbox)

Dependencies (they must be installed before running).

(If you created an environment with requirements.txt, these are already installed.)

Python 3 https://www.python.org/
Data Science packages for Python: pandas, numpy
MyGene (for ENSEMBL IDs conversion steps).

Execution examples

python gimme_motif_bias.py --help # print help and exit
python gimme_motif_bias.py --listont # list all available ontologies and finish
python gimme_motif_bias.py --listmotifs ASCL1 # list all motifs related to ASCL1
# 1 versus 1
python gimme_motif_bias.py -a neuron -b hepatocyte --motifid M08474_1.94d 
# force rewriting
python gimme_motif_bias.py -a neuron -b hepatocyte --motifid M08474_1.94d --overwrite
# 1 versus many
python gimme_motif_bias.py -a neuron -b shortlist1 --motifid M08474_1.94d --overwrite

Output

A TSV table with the respective effect sizes and p-values, in long format (see output/motif_ensemblid.tsv.gz).
An Excel table in a similar format (if --xlsx is given).
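
As a hedged sketch of how a gzip-compressed, long-format TSV like this can be loaded for downstream analyses, the snippet below writes a small stand-in file and reads it back with the standard library. The column names (`a`, `b`, `motif_id`, `log2fc`, `p_value`) and all values are hypothetical placeholders, not the real header of output/motif_ensemblid.tsv.gz; check the actual file before filtering on a column.

```python
import csv
import gzip
import os
import tempfile


def read_long_table(path):
    """Read a gzip-compressed TSV into a list of dict rows (one row per comparison)."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Build a small stand-in file so the sketch runs without the real output.
# Column names and values are hypothetical placeholders.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tsv.gz")
with gzip.open(demo_path, "wt", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["a", "b", "motif_id", "log2fc", "p_value"])
    writer.writerow(["neuron", "hepatocyte", "motif_1", "0.58", "0.01"])
    writer.writerow(["neuron", "T cell", "motif_1", "-0.10", "0.60"])

rows = read_long_table(demo_path)
# Keep only comparisons below an illustrative p-value cutoff.
significant = [r for r in rows if float(r["p_value"]) < 0.05]
print(len(rows), len(significant))
```

Since pandas is already a dependency, `pandas.read_csv(path, sep="\t")` would load the same file into a DataFrame; the standard-library version is shown only to keep the sketch dependency-free.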

Multiple runs

If running sequentially for several pairs, run with the option --overwrite to update the current table.
Values for repeated queries will be reused unless the output directory is cleaned. This is designed to save CPU time in the long run.
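
Sequential runs over several pairs can be scripted. The sketch below only assembles the command lines, using the flags shown in the execution examples (`-a`, `-b`, `--motifid`, `--overwrite`); the helper names `build_command` and `run_pairs` are hypothetical and not part of the repository.

```python
import subprocess
import sys


def build_command(cell_a, cell_b, motif_id, overwrite=True):
    """Assemble the CLI call using only flags shown in the execution examples."""
    cmd = [sys.executable, "gimme_motif_bias.py",
           "-a", cell_a, "-b", cell_b, "--motifid", motif_id]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def run_pairs(pairs, motif_id):
    """Run each pair sequentially; stop at the first failing run."""
    for cell_a, cell_b in pairs:
        subprocess.run(build_command(cell_a, cell_b, motif_id), check=True)


pairs = [("neuron", "hepatocyte"), ("neuron", "shortlist1")]
commands = [build_command(a, b, "M08474_1.94d") for a, b in pairs]
for cmd in commands:
    print(" ".join(cmd[1:]))
```

Calling `run_pairs(pairs, "M08474_1.94d")` from the repository root, inside the activated environment, would execute the runs one after another.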

Running time

Around 1-2 minutes for one pair.
About 10 minutes for a full execution of one cell type versus all others for one TF (one CPU, default parameters, verification against other genomes and RNA secondary structure assessment).
Adding more cell types increases the running time quadratically.
Increasing the number of motifs increases the running time linearly.
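
The scaling can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that every unordered pair of cell types is compared once per motif and takes 1.5 minutes per pair, the midpoint of the 1-2 minutes quoted above; both are assumptions for illustration, not measurements.

```python
def n_pairwise_comparisons(n_cell_types):
    """Number of unordered cell type pairs: n * (n - 1) / 2."""
    return n_cell_types * (n_cell_types - 1) // 2


def estimated_minutes(n_cell_types, n_motifs, minutes_per_pair=1.5):
    """Rough total time: quadratic in cell types, linear in motifs (assumed model)."""
    return n_pairwise_comparisons(n_cell_types) * n_motifs * minutes_per_pair


for n in (5, 10, 20):
    print(n, n_pairwise_comparisons(n), estimated_minutes(n, n_motifs=1))
```

Doubling the number of cell types from 10 to 20 raises the pair count from 45 to 190, roughly a fourfold increase, while doubling the number of motifs only doubles the estimate.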

Misc

Custom cell types and gene sets can be added manually in input/genes_by_ont as text files. Mind the format for labels to be recognized.
You can add any custom set of hits of your interest in motif_hits_cisbp_build_1.94d_mm10, following the motif hit format.

Feedback, errors, or further questions

Report in Issues.
E-mail Ignacio Ibarra ([email protected]).

