Workflow: Linington lab

Instructions: Here is an example of the morphological profiling data analysis workflow followed by the Carpenter lab. Please write down the workflow that you use in your own group, adding new steps if needed. Also, please provide references where relevant. During the hackathon, each group will be allotted 8 minutes to present their workflow.

Biological motivations

Biological problems addressed by your group that use morphological profiling and types of perturbations that are profiled.

We are interested in using mammalian cytological profiling data to provide phenotypic descriptions of the effects of natural products and natural product extracts on cell development. We then use these results in conjunction with high-resolution metabolomics analysis to relate chemical and biological features for the de novo prediction of natural product modes of action (MOAs) directly from primary screening data [Schulze, 2013; Woehrmann, 2013; Kurita, 2015]. The objective is to move natural products discovery from the existing 'grind and find' paradigm, where little is known about structure or function until late in the discovery process, to an ordered, hypothesis-driven discovery platform.

Separately, we have also developed tools for bacterial cytological profiling, where the objective is to use phenotypic transitions to predict the MOAs of antibacterial compounds. Unlike the mammalian cell analyses, the bacterial work was not possible using standard software packages, so we developed our own code for this analysis [Peach, 2013].

Image analysis and feature extraction

Use image analysis software to extract features from images. This results in a data matrix where the rows correspond to cells in the experiment and the columns are the extracted image features.

We currently use a suite of five stains (nuclear, tubulin, actin, DNA synthesis, and mitosis) and the commercial MetaXpress software from Molecular Devices for segmentation, feature extraction, and cell-by-cell quantification.

For the bacterial work, we use a cell line that has a chromosomal GFP tag, and we extract a smaller set of direct size and shape features.

Image quality control

Flag/remove images that are affected by technical artifacts or segmentation errors.

This is largely handled by MetaXpress for mammalian cells. Out-of-focus images and other technical artifacts are identified by statistically abnormal variations in the grayscale gradient of the final images.
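As a rough illustration of a gradient-based check (this is not MetaXpress's own algorithm, which we do not reproduce here), the sketch below scores each image by its mean grayscale gradient magnitude and flags images whose score is a robust outlier relative to the other images. The function names, the median/MAD "modified z-score", and the 3.5 cutoff are assumptions chosen for illustration.

```python
import numpy as np

def gradient_score(image: np.ndarray) -> float:
    """Mean grayscale gradient magnitude; defocused images tend to score low."""
    gy, gx = np.gradient(image.astype(float))
    return float(np.mean(np.hypot(gx, gy)))


def flag_abnormal_images(images, cutoff: float = 3.5):
    """Indices of images whose gradient score is a robust (median/MAD) outlier.

    The modified z-score and the 3.5 cutoff are illustrative assumptions.
    """
    scores = np.array([gradient_score(img) for img in images])
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    if mad == 0:
        return []  # MAD is zero (many identical scores), so no robust scale exists
    modified_z = 0.6745 * (scores - median) / mad
    return [i for i, z in enumerate(modified_z) if abs(z) > cutoff]


# Synthetic check: nine intensity ramps of varying steepness plus one flat,
# featureless image standing in for a badly defocused field.
ramps = [np.tile((2.0 + 0.1 * k) * np.arange(64), (64, 1)) for k in range(9)]
flat = np.full((64, 64), 128.0)
print(flag_abnormal_images(ramps + [flat]))  # → [9]
```

Flagging in both directions (abs(z)) also catches images that are abnormally "busy", such as those dominated by debris or saturated speckle.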

Data cleaning

Filter out or impute missing values in the data matrix.

Missing or null values usually result from poor staining or cell culture work. Although the pipeline will run with null values, re-acquiring the primary imaging data is our main solution to absent values in the data matrix.
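Because re-acquisition rather than imputation is the main remedy, a small helper that lists which wells to re-image can be useful. The sketch below assumes a cell-level pandas table with hypothetical `plate` and `well` key columns, treats every other column as a feature, and uses an arbitrary 5% tolerance for missing values.

```python
import numpy as np
import pandas as pd

def wells_to_reacquire(cells: pd.DataFrame, keys=("plate", "well"),
                       max_missing: float = 0.05):
    """Return sorted (plate, well) keys whose mean missing fraction exceeds max_missing."""
    feature_cols = [c for c in cells.columns if c not in keys]
    # Fraction of null feature values in each cell (row) ...
    missing_per_cell = cells[feature_cols].isna().mean(axis=1)
    # ... averaged over all cells in the same well.
    missing_per_well = missing_per_cell.groupby([cells[k] for k in keys]).mean()
    return sorted(missing_per_well[missing_per_well > max_missing].index.tolist())


# Hypothetical example table: well B02 has poorly stained cells.
cells = pd.DataFrame({
    "plate": ["P1", "P1", "P1", "P1"],
    "well": ["A01", "A01", "B02", "B02"],
    "nuclear_area": [120.0, 118.0, np.nan, 130.0],
    "tubulin_intensity": [0.8, 0.9, np.nan, np.nan],
})
print(wells_to_reacquire(cells))  # → [('P1', 'B02')]
```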

Normalize features

Normalize cell features with respect to a reference distribution (e.g. by z-scoring against all DMSO cells on the plate).

For mammalian cells, export of the raw per-cell data is followed by 'his-diff' evaluation, which compares the distribution of feature values in the treated cell population against that of the negative controls and scores the peak-to-peak difference between the distribution maxima as a positive or negative value. Difference values are normalized to a -1 to +1 scale and then correlated using any of a number of standard correlation measures (e.g. Pearson).
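The his-diff scoring described above can be sketched as follows. This is an illustrative reconstruction from the description, not the published implementation: the shared binning over the pooled values, the bin count of 50, and the normalization by the pooled value range are all assumptions. For each feature, treated and control values are histogrammed on common bins, the most populated bin (the peak) is located in each, and the signed peak-to-peak difference is divided by the pooled range so that the score falls between -1 and +1.

```python
import numpy as np

def his_diff(treated: np.ndarray, control: np.ndarray, bins: int = 50) -> float:
    """Signed peak-to-peak histogram difference, scaled to lie within [-1, 1].

    Illustrative reconstruction: shared bins over the pooled range, peak =
    center of the most populated bin, scaled by the pooled range (assumptions).
    """
    pooled = np.concatenate([treated, control])
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        return 0.0  # feature has no spread, so no peak shift can be measured
    edges = np.linspace(lo, hi, bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    peak_treated = centers[np.argmax(np.histogram(treated, bins=edges)[0])]
    peak_control = centers[np.argmax(np.histogram(control, bins=edges)[0])]
    # Positive if the treated peak sits above the control peak, negative if below.
    return float((peak_treated - peak_control) / (hi - lo))


def his_diff_profile(treated_cells: np.ndarray, control_cells: np.ndarray) -> np.ndarray:
    """Score every feature column (cells x features) to build one profile vector."""
    return np.array([
        his_diff(treated_cells[:, j], control_cells[:, j])
        for j in range(treated_cells.shape[1])
    ])


# Synthetic example: feature 0 is shifted upward by treatment, feature 1 is not.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(20000, 2))
treated = np.column_stack([
    rng.normal(5.0, 1.0, 20000),
    rng.normal(0.0, 1.0, 20000),
])
profile = his_diff_profile(treated, control)
print(profile.round(2))
```

Because bin centers can never be farther apart than the pooled range, the score stays strictly inside [-1, 1] without any clipping.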

Transform features

Transform features as appropriate, e.g. log transform.

Not done

Correct for systematic effects

Systematic noise, such as plate effects, needs to be handled.

Strategy under development

Select features / reduce dimensionality

Select the most informative features based on an appropriate criterion, or perform dimensionality reduction.

This is largely done by eliminating variables from sets of features with co-linear responses to chemical perturbations. We could do better here.
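A minimal sketch of this kind of redundancy filter, assuming features are compared by the absolute Pearson correlation of their responses across a set of perturbation profiles (rows = compounds, columns = features). The 0.9 threshold, the greedy keep-the-first-seen order, and the feature names are illustrative assumptions, not a documented procedure.

```python
import numpy as np

def drop_colinear_features(profiles: np.ndarray, names, threshold: float = 0.9):
    """Greedily keep features whose |Pearson r| with every kept feature is <= threshold.

    Constant features produce NaN correlations and are therefore dropped too.
    """
    corr = np.abs(np.corrcoef(profiles, rowvar=False))
    kept = []
    for j in range(profiles.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Hypothetical responses of three features to four compounds:
# nuclear_perimeter tracks nuclear_area almost perfectly and is removed.
profiles = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 4.1, -0.3],
    [3.0, 5.9, 0.8],
    [4.0, 8.0, 0.1],
])
names = ["nuclear_area", "nuclear_perimeter", "actin_intensity"]
print(drop_colinear_features(profiles, names))  # → ['nuclear_area', 'actin_intensity']
```

The greedy order means the first feature of each co-linear group survives; ordering columns by interpretability or data quality beforehand controls which representative is kept.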

Create per-well profiles

Aggregate single-cell data from each well to create a per-well morphological profile. This is typically done by computing the median across all cells in the well, per feature. Other approaches include methods to first identify sub-populations, then construct a profile by counting the number of cells in each sub-population.

Medians are computed through the his-diff analysis.
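The generic per-well median aggregation described in the prompt can be written in a few lines of pandas. It is shown only to make that step concrete, since our own medians are computed through his-diff; the `plate`, `well`, and feature column names are hypothetical placeholders for whatever the exported per-cell table contains.

```python
import pandas as pd

def per_well_medians(cells: pd.DataFrame, keys=("plate", "well")) -> pd.DataFrame:
    """Collapse a cells x features table to one median profile per well."""
    return cells.groupby(list(keys)).median(numeric_only=True)


# Hypothetical per-cell export: three cells in A01, two in B02.
cells = pd.DataFrame({
    "plate": ["P1", "P1", "P1", "P1", "P1"],
    "well": ["A01", "A01", "A01", "B02", "B02"],
    "nuclear_area": [100.0, 120.0, 110.0, 90.0, 94.0],
    "actin_intensity": [0.5, 0.7, 0.9, 0.2, 0.4],
})
print(per_well_medians(cells))
```

The median is preferred over the mean here because a few mis-segmented or dying cells shift it much less.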

Measure similarity between profiles

An appropriate similarity metric is crucial to the downstream analysis. Pearson correlation and Euclidean distance are the most common metrics used.

Pearson correlation.
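A minimal sketch of Pearson-based comparison, assuming profiles are stored one per row as NumPy arrays. `nearest_reference` is a hypothetical helper showing one simple use of the similarity: assigning a query profile the label of its most correlated reference profile. The reference profiles and labels below are made up.

```python
import numpy as np

def pearson_matrix(profiles: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlations between profiles stored one per row."""
    return np.corrcoef(profiles)


def nearest_reference(query: np.ndarray, references: np.ndarray, labels):
    """Label and Pearson r of the reference profile most correlated with query."""
    # Row 0 of the stacked matrix is the query; columns 1.. are the references.
    r = pearson_matrix(np.vstack([query, references]))[0, 1:]
    best = int(np.argmax(r))
    return labels[best], float(r[best])


references = np.array([
    [0.8, -0.6, 0.1, 0.4],   # made-up reference profile A
    [-0.2, 0.3, 0.9, -0.7],  # made-up reference profile B
])
labels = ["reference A", "reference B"]
query = np.array([0.7, -0.5, 0.2, 0.3])
print(nearest_reference(query, references, labels))
```

Pearson correlation ignores overall offset and scale, so two compounds that shift the same features in the same directions score as similar even if one is more potent.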

Downstream analysis / visualization

Analysis/visualization performed after creating profiles. E.g. clustering, classification, visualization using 2D embeddings, etc.

We have invested considerable time in this step, particularly in incorporating metabolomic profiling data and then predicting compound MOAs from complex mixtures. We have used Cytoscape and Gephi extensively for this, as well as a number of in-house display methods. This is a key challenge for our team: visualization approaches have a strong impact on lead selection and prioritization, particularly among non-specialists in informatics.
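As one example of how profile similarities can be handed to Cytoscape or Gephi, the sketch below writes a thresholded Pearson similarity network as a CSV edge list. The `Source`, `Target`, and `Weight` headers follow the convention that Gephi's spreadsheet importer recognizes; in Cytoscape the same file can be imported as a network table by selecting the source and target columns. The 0.8 cutoff and the extract names are illustrative assumptions.

```python
import csv
import io

import numpy as np

def similarity_edges(profiles: np.ndarray, names, threshold: float = 0.8):
    """Yield (source, target, r) for every profile pair with Pearson r >= threshold."""
    corr = np.corrcoef(profiles)
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: each pair once, no self-loops
            if corr[i, j] >= threshold:
                yield names[i], names[j], round(float(corr[i, j]), 3)


def write_edge_list(edges, handle) -> None:
    """Write edges as CSV with Source/Target/Weight headers."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)


# Hypothetical profiles for three extracts; only the first two are similar.
profiles = np.array([
    [0.8, -0.6, 0.1, 0.4],
    [0.7, -0.5, 0.2, 0.3],
    [-0.2, 0.3, 0.9, -0.7],
])
names = ["extract_1", "extract_2", "extract_3"]
buffer = io.StringIO()
write_edge_list(similarity_edges(profiles, names), buffer)
print(buffer.getvalue())
```

To write a real file, open it with `newline=""` (for example `open("edges.csv", "w", newline="")`), as the `csv` module documentation recommends.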

References

Kurita, K. L.; Glassey, E.; Linington, R. G.* "Integration of High-Content Screening and Untargeted Metabolomics for Comprehensive Functional Annotation of Natural Product Libraries." Proceedings of the National Academy of Sciences, USA, 2015, 112, 11999-12004. PMID: 26371303

Peach, K. C.; Bray, W. M.; Winslow, D.; Linington, P. F.; Linington, R. G.* "Mechanism of Action-Based Classification of Antibiotics using High-Content Bacterial Image Analysis." Molecular Biosystems 2013, 9, 1837-1848. PMCID: PMC3674180

Schulze, C. J.; Bray, W. M.; Woehrmann, M. H.; Stuart, J.; Lokey, R. S.; Linington, R. G.* "'Function-First' Lead Discovery: Mode of Action Profiling of Natural Product Libraries Using Image-Based Screening." Chemistry and Biology 2013, 20, 285-295. PMCID: PMC3584419

Woehrmann, M. H.; Bray, W. M.; Durbin, J. K.; Nisam, S. C.; Michael, A. K.; Glassey, E.; Stuart, J. M.; Lokey, R. S. "Large-scale cytological profiling for functional analysis of bioactive compounds." Molecular Biosystems 2013, 9 (11), 2604-2617. doi: 10.1039/c3mb70245f. PMID: 24056581