Shantanu Singh edited this page May 5, 2016 · 3 revisions

Instructions: Here is an example of the morphological profiling data analysis workflow followed by the Carpenter lab. Please write down the workflow that you use in your own group, adding new steps if needed. Also, please provide references where relevant. During the hackathon, each group will be allotted 8 minutes to present their workflow.

Biological motivations

Biological problems addressed by your group that use morphological profiling and types of perturbations that are profiled.

(What we do)

Image analysis and feature extraction

Use image analysis software to extract features from images. This results in a data matrix where the rows correspond to cells in the experiment and the columns are the extracted image features.

(How we do it)

Image quality control

Flag/remove images that are affected by technical artifacts or segmentation errors.

(How we do it)

Data cleaning

Filter out or impute missing values in the data matrix.
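
As an illustration only, the sketch below drops features with too many missing values and imputes the remaining gaps with the per-feature median. The function name `clean_matrix` and the `max_nan_fraction` threshold are hypothetical choices, not part of any particular lab's pipeline, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def clean_matrix(X, max_nan_fraction=0.05):
    """Drop features with too many NaNs, then median-impute what is left.

    X is a (cells x features) array. Returns the cleaned matrix and a
    boolean mask marking which original features were kept.
    """
    X = np.asarray(X, dtype=float)
    # Fraction of missing values per feature (column).
    nan_fraction = np.isnan(X).mean(axis=0)
    keep = nan_fraction <= max_nan_fraction
    X = X[:, keep]  # boolean indexing returns a copy, so the input is untouched
    # Replace each remaining NaN with the median of its column.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X, keep
```

Whether to filter or impute, and at what threshold, depends on why values are missing; the 5% default here is arbitrary.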

(How we do it)

Normalize features

Normalize cell features with respect to a reference distribution (e.g. by z-scoring against all DMSO cells on the plate).
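
A minimal sketch of z-scoring against control cells, assuming the data for a single plate is held in a (cells x features) NumPy array and that a boolean vector marks the DMSO cells. The names `zscore_against_controls`, `is_control`, and `eps` are illustrative.

```python
import numpy as np

def zscore_against_controls(X, is_control, eps=1e-12):
    """Z-score every cell on one plate against that plate's control cells."""
    X = np.asarray(X, dtype=float)
    controls = X[np.asarray(is_control, dtype=bool)]
    mu = controls.mean(axis=0)
    sigma = controls.std(axis=0, ddof=1)  # sample standard deviation
    # Guard against division by zero for features that are constant in controls.
    return (X - mu) / np.maximum(sigma, eps)
```

The function would be applied once per plate, so that each plate is normalized to its own controls.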

(How we do it)

Transform features

Transform features as appropriate, e.g. log transform.
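
For example, a log transform of selected features might look like the sketch below. It assumes the chosen features are non-negative and uses log(1 + x) so that zeros stay finite; `log_transform` and `columns` are hypothetical names.

```python
import numpy as np

def log_transform(X, columns):
    """Apply log(1 + x) to the given feature columns; others are left unchanged."""
    X = np.array(X, dtype=float)  # np.array copies, so the input is not modified
    X[:, columns] = np.log1p(X[:, columns])
    return X
```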

(How we do it)

Correct for systematic effects

Systematic noise, such as plate effects, needs to be handled.
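
One simple option, shown here only as a sketch, is to subtract the per-plate median of each feature. This assumes the plates have comparable treatment layouts, so that differences between plate medians mostly reflect technical variation. `center_by_plate` and `plate_ids` are illustrative names.

```python
import numpy as np

def center_by_plate(X, plate_ids):
    """Subtract each plate's per-feature median from the rows on that plate."""
    X = np.array(X, dtype=float)  # work on a copy
    plate_ids = np.asarray(plate_ids)
    for plate in np.unique(plate_ids):
        mask = plate_ids == plate
        X[mask] -= np.median(X[mask], axis=0)
    return X
```

More elaborate corrections (for example, modelling row and column position within a plate) follow the same pattern of estimating a technical component and removing it.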

(How we do it)

Select features / reduce dimensionality

Select the most informative features based on an appropriate criterion, or perform dimensionality reduction.
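
As one possible criterion, the sketch below removes near-constant features and then greedily drops any feature that is highly correlated with one already kept. The thresholds and the name `select_features` are assumptions made for illustration.

```python
import numpy as np

def select_features(X, min_variance=1e-6, max_abs_corr=0.9):
    """Return indices of features that vary and are not redundant with earlier ones."""
    X = np.asarray(X, dtype=float)
    # Step 1: discard features with (almost) no variance.
    candidates = np.flatnonzero(X.var(axis=0) > min_variance)
    if candidates.size == 0:
        return candidates
    # Step 2: greedy filter on absolute Pearson correlation between features.
    corr = np.abs(np.corrcoef(X[:, candidates], rowvar=False))
    kept = []
    for i in range(len(candidates)):
        if all(corr[i, j] <= max_abs_corr for j in kept):
            kept.append(i)
    return candidates[kept]
```

The greedy pass depends on feature order: of two correlated features, the one with the lower column index is retained.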

(How we do it)

Create per-well profiles

Aggregate single-cell data from each well to create a per-well morphological profile. This is typically done by computing the median across all cells in the well, per feature. Other approaches include methods to first identify sub-populations, then construct a profile by counting the number of cells in each sub-population.
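
The median-based aggregation described above can be sketched as follows, assuming each cell carries a well identifier. `aggregate_by_well` and `well_ids` are hypothetical names.

```python
import numpy as np

def aggregate_by_well(X, well_ids):
    """Collapse a (cells x features) matrix to one median profile per well."""
    X = np.asarray(X, dtype=float)
    well_ids = np.asarray(well_ids)
    wells = np.unique(well_ids)  # sorted unique well labels
    profiles = np.vstack([np.median(X[well_ids == w], axis=0) for w in wells])
    return wells, profiles
```

Row `i` of `profiles` is the profile for `wells[i]`.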

(How we do it)

Measure similarity between profiles

An appropriate similarity metric is crucial to the downstream analysis. Pearson correlation and Euclidean distance are the most common metrics used.
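
Both metrics can be computed for all pairs of profiles at once. The sketch below assumes a (profiles x features) array `P`; the function names are illustrative.

```python
import numpy as np

def pearson_similarity(P):
    """Pairwise Pearson correlation between profiles (rows of P)."""
    return np.corrcoef(np.asarray(P, dtype=float))

def euclidean_distance(P):
    """Pairwise Euclidean distance between profiles (rows of P)."""
    P = np.asarray(P, dtype=float)
    diff = P[:, None, :] - P[None, :, :]  # shape: (n, n, features)
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Note that correlation is a similarity (higher means more alike) while Euclidean distance is a dissimilarity, and that correlation ignores the overall magnitude of a profile whereas Euclidean distance does not.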

(How we do it)

Downstream analysis / visualization

Any analysis or visualization performed after creating profiles, e.g. clustering, classification, or visualization using 2D embeddings.
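
As one example of a 2D embedding, the sketch below projects profiles onto their first two principal components using a plain SVD. PCA is used here only because it is simple to write down; `pca_embed` is a hypothetical name.

```python
import numpy as np

def pca_embed(P, n_components=2):
    """Project profiles (rows of P) onto their leading principal components."""
    P = np.asarray(P, dtype=float)
    centered = P - P.mean(axis=0)
    # Rows of Vt are the principal axes; singular values come sorted descending.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    # U * S equals centered @ Vt.T, i.e. the coordinates along each axis.
    return U[:, :n_components] * S[:n_components]
```

The resulting coordinates can be passed to any scatter-plot routine, with points colored by treatment or plate.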


References

  • Ref 1
  • Ref 2
  • ...