pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.

The pioneering work was done in R and the results were published in Nature Methods [1]. A new and comprehensive description of this Python implementation of the SCENIC pipeline is available in Nature Protocols [5].

pySCENIC can be run on a single desktop machine but scales easily to multi-core clusters, so analyses of thousands of cells finish quickly. This scaling is achieved via the dask framework for distributed computing [2].
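GRN inference parallelizes well because it decomposes into one independent problem per target gene. The sketch below illustrates only that fan-out pattern, under two loud assumptions: the standard library's ThreadPoolExecutor stands in for dask, and the absolute Pearson correlation stands in for the GRNBoost2/GENIE3 importance scores that arboreto actually computes. The function names (`pearson`, `score_target`, `infer_adjacencies`) are hypothetical and are not part of the pySCENIC or arboreto API.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def score_target(target, expression, tf_names):
    """One independent task: score every candidate TF against a single target gene.
    abs(Pearson r) is a stand-in for a GRNBoost2/GENIE3 importance score."""
    return [
        (tf, target, abs(pearson(expression[tf], expression[target])))
        for tf in tf_names
        if tf != target
    ]


def infer_adjacencies(expression, tf_names, max_workers=4):
    """Fan the per-target tasks out to a worker pool, then merge the TF-target links."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(lambda t: score_target(t, expression, tf_names), expression))
    links = [link for chunk in chunks for link in chunk]
    # Strongest links first, like a ranked adjacency list.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Toy data: gene -> expression values across the same four cells.
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "TF2": [4.0, 3.0, 2.0, 1.0],
    "GeneA": [2.0, 4.0, 6.0, 8.0],
    "GeneB": [1.0, 1.0, 2.0, 1.0],
}
adjacencies = infer_adjacencies(expression, tf_names=["TF1", "TF2"])
```

Because each target is scored independently, swapping the thread pool for a distributed scheduler changes where the tasks run but not what they compute.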

Full documentation is available on Read the Docs.

News and releases

0.10.3 | 2020-07-15

Integrate arboreto multiprocessing script into pySCENIC CLI
Skip modules with zero database overlap in cisTarget step
Additional error message if the regulons file is empty
Additional error if there is a mismatch between the genes present in the GRN and the expression matrix
Fixed bug in motif URL construction when running without pruning
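The two input checks above amount to simple set operations. This is a hypothetical sketch: the function names and error text are illustrative and are not pySCENIC's, and it only shows the kind of validation the release notes describe.

```python
def check_gene_overlap(grn_genes, matrix_genes):
    """Raise ValueError if any GRN gene is absent from the expression matrix."""
    missing = sorted(set(grn_genes) - set(matrix_genes))
    if missing:
        raise ValueError(
            f"{len(missing)} GRN gene(s) not found in the expression matrix, e.g. {missing[:5]}"
        )


def drop_modules_without_db_overlap(modules, db_genes):
    """Keep only modules that share at least one gene with the ranking database."""
    db = set(db_genes)
    return {name: genes for name, genes in modules.items() if db.intersection(genes)}
```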

0.10.2 | 2020-06-05

Bugfix for CLI grn step

0.10.1 | 2020-05-17

CLI: optional file compression for the intermediate files of the major steps: grn (adjacency matrix), ctx (regulons), and aucell (AUC matrix). Compression is used when the file name argument ends in .gz.
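The .gz convention can be sketched with Python's standard gzip module: pick the opener from the file suffix and leave the rest of the reading and writing code unchanged. The helper names and the TF/target/importance header below are illustrative assumptions, not necessarily pySCENIC's exact file layout.

```python
import csv
import gzip
import os
import tempfile


def open_text(path, mode="r"):
    """Open a text file, transparently using gzip when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", newline="")
    return open(path, mode, newline="")


def write_adjacencies(path, links):
    """Write (TF, target, importance) rows as tab-separated text."""
    with open_text(path, "w") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["TF", "target", "importance"])
        writer.writerows(links)


def read_adjacencies(path):
    """Read the rows back, skipping the header line."""
    with open_text(path) as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader)
        return [(tf, target, float(w)) for tf, target, w in reader]


if __name__ == "__main__":
    links = [("TF1", "GeneA", 12.5), ("TF2", "GeneB", 3.0)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "adjacencies.tsv.gz")  # compressed because of .gz
        write_adjacencies(path, links)
        print(read_adjacencies(path) == links)  # True
```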

Overview

The pipeline has three steps:

1. First, transcription factors (TFs) and their target genes, together defining a regulon, are derived using gene inference methods that rely solely on correlations between the expression of genes across cells. The arboreto package is used for this step.
2. These regulons are refined by pruning targets that lack enrichment for a corresponding motif of the TF, effectively separating direct from indirect targets based on the presence of cis-regulatory footprints.
3. Finally, the original cells are differentiated and clustered based on the activity of these discovered regulons.
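The final step scores every cell for every regulon; the aucell step of the CLI writes these scores as the AUC matrix. A minimal sketch of the idea, assuming an AUCell-style score: rank genes within each cell by expression, then measure how early the regulon's genes appear in that ranking. The function names, the `max_rank` cutoff argument, and the normalization are illustrative assumptions; pySCENIC's own AUCell implementation differs in its details.

```python
def rank_genes(cell_expression):
    """Gene names ordered from highest to lowest expression within one cell."""
    return sorted(cell_expression, key=cell_expression.get, reverse=True)


def regulon_auc(ranking, regulon, max_rank):
    """Area under the regulon's recovery curve within the top max_rank genes,
    divided by the best area achievable for a regulon of this size (assumed normalization)."""
    regulon = set(regulon)
    hits, area = 0, 0
    for gene in ranking[:max_rank]:
        hits += gene in regulon  # a bool counts as 0 or 1
        area += hits             # curve height at this rank
    n = min(len(regulon), max_rank)
    best = sum(min(rank, n) for rank in range(1, max_rank + 1))
    return area / best if best else 0.0


def activity_matrix(cells, regulons, max_rank):
    """cell -> {regulon name -> activity score}; the input for clustering cells."""
    return {
        cell: {name: regulon_auc(rank_genes(expr), genes, max_rank)
               for name, genes in regulons.items()}
        for cell, expr in cells.items()
    }
```

Working on within-cell rankings rather than raw values makes the score insensitive to differences in sequencing depth between cells.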

The largest speed improvement comes from the arboreto package in step 1, which provides GRNBoost2, a faster alternative to GENIE3 [3]. arboreto can be controlled from within pySCENIC.

All the functionality of the original R implementation is available and in addition:

You can leverage multi-core and multi-node clusters using dask and its distributed scheduler.
We implemented a version of the recovery of input genes that takes into account the weights associated with these genes.
Regulons (i.e., the regulatory network connecting a TF with its target genes) whose targets are repressed are now also derived and used for cell enrichment analysis.
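The weighted recovery can be illustrated with a recovery curve: walk down a motif's gene ranking and accumulate how much of the input gene set has been seen so far. In this sketch each recovered gene contributes its weight (for instance a TF-target importance score) instead of a fixed count of 1. The helper names and the exact accumulation scheme are assumptions, not pySCENIC's internal implementation.

```python
def recovery_curve(ranking, gene_weights, max_rank, weighted=True):
    """Cumulative recovery of input genes along one motif's gene ranking.

    ranking: genes ordered by motif score (best first).
    gene_weights: input (module) genes mapped to a weight.
    Unweighted, each recovered gene adds 1; weighted, it adds its weight.
    """
    curve, recovered = [], 0.0
    for gene in ranking[:max_rank]:
        if gene in gene_weights:
            recovered += gene_weights[gene] if weighted else 1.0
        curve.append(recovered)
    return curve


def area_under(curve):
    """Area under a step-wise recovery curve (one unit per rank)."""
    return sum(curve)


ranking = ["G1", "G2", "G3", "G4", "G5"]
weights = {"G1": 0.5, "G3": 2.0}
weighted = recovery_curve(ranking, weights, max_rank=4)
unweighted = recovery_curve(ranking, weights, max_rank=4, weighted=False)
```

With weights, a strongly supported target near the top of the ranking raises the curve more than a weakly supported one at the same rank.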

Additional resources

For more information, please visit LCB or SCENIC (R version). The pySCENIC CLI has also been streamlined into a pipeline that can be run with a single command, using the Nextflow workflow manager. There are two Nextflow implementations available:

SCENICprotocol: A Nextflow DSL1 implementation of pySCENIC alongside a basic "best practices" expression analysis. Includes details on pySCENIC installation, usage, and downstream analysis, along with detailed tutorials.
VSNPipelines: A Nextflow DSL2 implementation of pySCENIC with a comprehensive and customizable pipeline for expression analysis. Includes additional pySCENIC features (multi-runs, integrated motif- and track-based regulon pruning, loom file generation).

Acknowledgments

We are grateful to all providers of TF-annotated position weight matrices, in particular Martha Bulyk (UNIPROBE), Wyeth Wasserman and Albin Sandelin (JASPAR), BioBase (TRANSFAC), Scot Wolfe and Michael Brodsky (FlyFactorSurvey) and Timothy Hughes (cisBP).

References

[1] Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

[2] Rocklin, M. Dask: parallel computation with blocked algorithms and task scheduling. Proceedings of the 14th Python in Science Conference (2015).

[3] Huynh-Thu, V. A. et al. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).

[4] Zeisel, A. et al. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).

[5] Van de Sande, B., Flerin, C. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. (2020). doi:10.1038/s41596-020-0336-2
