Analysis Framework for Pooled CRISPR Genome Editing Screens

pinellolab/perturb-tools

perturb-tools is an analysis framework for pooled CRISPR genome-editing screens. So far, development has focused on local (i.e., not genome-wide) tiling screens with specific phenotypic readouts, though expanding this scope is of interest.

Data Structure and Analysis Framework

import perturb_tools as pt

screen = pt.Screen(X)
Genome Editing Screen composed of: n_guides x n_conditions = 946 x 12

   guides:    'barcode', 'barcode_id', 'experiment', 'sequence', 'target_id', 'pred_ABE_edit', 'pred_CBE_edit'
   samples:    'condition', 'replicate'
   samples_m:  'barcode_counts', 'unexpected_sequences'
   samples_p:  'correlation'
   layers:    'X_lognorm'
   uns:       'run_info', 'poolq3', 'metadata', 'SampleBarcodeReadCounts', 'CommonSampleBarcodeReadCounts'

This way of organizing the data and metadata of a multidimensional experiment is inspired by AnnData, which solves the analogous problem for single-cell data.

The three main components of this data structure:

  • screen.X (NumPy array) of shape: [n_guides x n_conditions]

  • screen.samples (pandas DataFrame) of shape: [n_samples x sample_annotation]

  • screen.guides (pandas DataFrame) of shape: [n_guides x guide_annotation]

See the tutorial for more information.
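
As a rough illustration of how the three components line up, the sketch below builds a toy count matrix with matching guide and sample tables using only NumPy and pandas. It does not use perturb_tools. The column names are borrowed from the printout above, and the dimensions and values are made up, so treat it as an assumption about the layout rather than the package's implementation.

```python
import numpy as np
import pandas as pd

# Toy dimensions (hypothetical; the example above is 946 x 12).
n_guides, n_samples = 4, 6

rng = np.random.default_rng(0)
# Count matrix: one row per guide, one column per sample.
X = rng.integers(low=0, high=500, size=(n_guides, n_samples))

# Guide annotations: one row per row of X.
guides = pd.DataFrame(
    {
        "barcode_id": [f"guide_{i}" for i in range(n_guides)],
        "target_id": ["site_A", "site_A", "site_B", "site_B"],
    }
)

# Sample annotations: one row per column of X.
samples = pd.DataFrame(
    {
        "condition": ["low", "low", "low", "high", "high", "high"],
        "replicate": [1, 2, 3, 1, 2, 3],
    }
)

# The annotation tables must stay aligned with the matrix axes.
assert X.shape == (len(guides), len(samples))
```

Keeping the annotations in separate tables indexed by the matrix axes means a subset of guides or samples can be selected from the table and applied to the matrix with the same positions.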

Installation

Install the development package:

# (1) clone this repository
git clone https://github.com/pinellolab/perturb-tools.git

# (2) install the local project in editable mode
cd ./perturb-tools; pip install -e .

General Analysis Steps

  • See the tutorial, which includes:
    • API tutorial
    • Normalization
    • Arithmetic
      • Calculating the mean, standard deviation, and log-fold change between/across replicates
      • Correlation calculation
  • Hit discovery (under development)
  • Visualization (under development)
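
The normalization and arithmetic steps listed above can be sketched with plain NumPy. This is a minimal illustration on made-up counts, not the perturb_tools API: the reads-per-million scaling, the pseudocount of 1, and the two-condition layout are all assumptions chosen for the example.

```python
import numpy as np

# Toy raw counts: 4 guides x 4 samples (hypothetical values).
# Columns: low_rep1, low_rep2, high_rep1, high_rep2.
counts = np.array(
    [
        [100, 120, 400, 380],
        [200, 180, 210, 190],
        [50, 60, 10, 20],
        [300, 310, 290, 305],
    ],
    dtype=float,
)

# Log-normalize: scale each sample to reads per million,
# add a pseudocount of 1, then take log2.
per_million = counts / counts.sum(axis=0, keepdims=True) * 1e6
lognorm = np.log2(per_million + 1.0)

low, high = lognorm[:, :2], lognorm[:, 2:]

# Mean and sample standard deviation across replicates of each condition.
low_mean, high_mean = low.mean(axis=1), high.mean(axis=1)
low_sd, high_sd = low.std(axis=1, ddof=1), high.std(axis=1, ddof=1)

# Log-fold change per guide: difference of log2 means (high vs. low).
lfc = high_mean - low_mean

# Pearson correlation between the two replicates of the "low" condition.
rep_corr = np.corrcoef(low[:, 0], low[:, 1])[0, 1]
```

In this toy data the first guide is enriched in the "high" samples (positive log-fold change) and the third is depleted (negative log-fold change). The replicate correlation is the kind of per-sample quality metric a screen analysis would typically inspect before hit discovery.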

Items under consideration:

  1. Sequence prediction of targeted base edits

  2. TF motif annotation

    a. Occupancy of Cas9 for CRISPRi (and how this may disrupt a TF motif)

    b. Putative creation / destruction of TF motifs upon predicted base-editing outcome
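
To make item 1 concrete, here is a deliberately simplified sketch of base-edit outcome prediction. It assumes that an adenine base editor (ABE) converts A to G and a cytosine base editor (CBE) converts C to T, and that every editable base within protospacer positions 4 to 8 (1-based, counted from the 5' end) is converted. The function name, the window, and the all-or-none conversion are assumptions for illustration. Real editors differ in their windows and efficiencies, and this is not how perturb-tools computes its predictions.

```python
from typing import Tuple


def predict_base_edit(
    protospacer: str, editor: str, window: Tuple[int, int] = (4, 8)
) -> str:
    """Return the protospacer with every editable base in the window converted.

    Hypothetical helper: assumes complete conversion inside the window.
    """
    conversions = {"ABE": ("A", "G"), "CBE": ("C", "T")}
    source, product = conversions[editor]
    start, end = window  # 1-based, inclusive positions from the 5' end
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if start <= pos <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)


# Made-up 20-nt protospacer used only to exercise the function.
guide = "GACCATGCAATTCGGACTAG"
abe_outcome = predict_base_edit(guide, "ABE")
cbe_outcome = predict_base_edit(guide, "CBE")
```

A predicted outcome sequence like this could then be scanned for TF motifs and compared against the unedited sequence to flag putative motif creation or destruction, as described in item 2b.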