Boutros lab workflow

Biological motivations

In our lab we are broadly interested in signaling and functional genomics, centered mostly on colorectal cancer. Therein, we develop and apply high-throughput gene perturbation methods to study, for example, the dynamic properties of genetic interactions, resistance mechanisms and cancer vulnerabilities. To this end, we perform arrayed high-content RNAi or drug screens in Drosophila and human cells with various phenotypic readouts, e.g. fluorescent reporters, high-content imaging and luciferase reporter assays.

More details can be found at www.boutroslab.org.

Image analysis and feature extraction

In our arrayed screens we stain with three channels and image 20x tiles/well on an IN Cell Analyzer 2200 (GE). Acquired images (16-bit, greyscale, 2024x2024 px) are piped directly via glass fiber onto a server cluster, which performs all processing steps inline using the R/Bioconductor packages EBImage, CRimage and FNN. The feature extraction follows a pipeline roughly adapted from Horn et al. and Fischer et al.: images are read in, log-transformed, normalized and Gaussian-blurred. Binary masks giving the nuclear and cell body outlines are extracted from the smoothed images. Objects are then expanded from the nuclear outlines into the cell body mask. Shape features are extracted for each cellular compartment from the outlines, while texture and intensity features are extracted from the raw, unprocessed images of the corresponding channel.
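
The preprocessing and segmentation steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the lab's EBImage/CRimage code. The global thresholds (mean plus one standard deviation for nuclei, mean for cell bodies), the min-max normalization, and the simple 4-neighbour label growth are all assumptions. The label growth stands in for a propagation-based expansion such as EBImage's `propagate`. `segment`, `expand_into` and `object_features` are hypothetical names.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D Gaussian kernel truncated at 3 sigma, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def preprocess(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Log-transform, min-max normalize and Gaussian-blur a greyscale image."""
    x = np.log1p(img.astype(float))                   # log transform
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)   # rescale to [0, 1]
    k = gaussian_kernel(sigma)
    # separable blur: rows first, then columns (zero padding at the border)
    x = np.apply_along_axis(np.convolve, 1, x, k, mode="same")
    x = np.apply_along_axis(np.convolve, 0, x, k, mode="same")
    return x


def label(mask: np.ndarray) -> np.ndarray:
    """Label 4-connected foreground components with 1..n (0 = background)."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        stack = [start]
        while stack:  # iterative flood fill
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and labels[nr, nc] == 0):
                    labels[nr, nc] = n
                    stack.append((nr, nc))
    return labels


def expand_into(seeds: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Grow seed labels into unlabeled mask pixels, one 4-neighbour step per pass."""
    labels = seeds.copy()
    h, w = labels.shape
    while True:
        padded = np.pad(labels, 1)  # zero border so labels cannot wrap around
        grown = labels.copy()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbour = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            free = (grown == 0) & mask & (neighbour > 0)
            grown[free] = neighbour[free]
        if np.array_equal(grown, labels):
            return labels
        labels = grown


def segment(nuc_img, cell_img, sigma=1.0):
    """Return (nuclei, cells) label images from a nuclear and a cell-body channel."""
    nuc = preprocess(nuc_img, sigma)
    body = preprocess(cell_img, sigma)
    nuc_mask = nuc > nuc.mean() + nuc.std()   # assumed global threshold
    cell_mask = body > body.mean()            # assumed global threshold
    nuclei = label(nuc_mask)
    cells = expand_into(nuclei, cell_mask | nuc_mask)
    return nuclei, cells


def object_features(labels, raw):
    """Per-object area (shape) from the outline and mean intensity from the raw image."""
    n = labels.max()
    area = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    total = np.bincount(labels.ravel(), weights=raw.ravel().astype(float),
                        minlength=n + 1)[1:]
    return {"area": area, "mean_intensity": total / np.maximum(area, 1)}
```

Intensity features are computed from `raw`, not from the blurred image, which mirrors the split between outlines and untouched images described above. On full-size images the pure-Python flood fill is slow, so a compiled labeling routine would be preferable in practice.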

Single cell data are stored for each feature in each well.

Image quality control

Additional boolean features marking artifact-prone images are added to the feature vector. Segmentation errors on single cells are excluded by summarizing the feature vectors per well using the mean, trimmed at the 5% quantile, over all cells of a well. All fields of a well contribute equally to that well's feature vector, assuming that they show similar numbers of cells; because cells are pooled across fields, a field with more cells carries more weight.
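
A minimal sketch of the per-well summary with an appended artifact flag is shown below. Two details are assumptions, since the source does not specify them. First, trimming is taken to be symmetric: cells below the 5% and above the 95% quantile are dropped independently for each feature. Second, the artifact criterion (`saturation_flag`, more than 1% of pixels at the 16-bit maximum) is purely hypothetical. `well_vector` is a hypothetical helper as well.

```python
import numpy as np


def saturation_flag(img: np.ndarray, max_value: int = 65535,
                    max_fraction: float = 0.01) -> bool:
    """Hypothetical artifact flag: True if too many pixels sit at the 16-bit ceiling."""
    return bool(np.mean(img >= max_value) > max_fraction)


def trimmed_mean(cells: np.ndarray, q: float = 0.05) -> np.ndarray:
    """Per-feature mean over cells lying between the q and 1 - q quantiles.

    cells: (n_cells, n_features) matrix pooled over all fields of one well.
    """
    lo = np.quantile(cells, q, axis=0)
    hi = np.quantile(cells, 1 - q, axis=0)
    keep = (cells >= lo) & (cells <= hi)   # trimmed independently per feature
    return np.where(keep, cells, 0.0).sum(axis=0) / keep.sum(axis=0)


def well_vector(cells: np.ndarray, images: list) -> np.ndarray:
    """Trimmed-mean features plus one boolean flag: any image of the well flagged."""
    flag = float(any(saturation_flag(img) for img in images))
    return np.concatenate([trimmed_mean(cells), [flag]])
```

A handful of missegmented "cells" with extreme values shifts the plain mean substantially but leaves the trimmed mean unchanged, as long as they make up less than 5% of the well's cells.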

Normalize features

Features are normalized per plate, either with the B-score approach if wells contain randomized perturbations, or on the basis of the distribution of the negative controls on that plate.
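
Both normalization options can be sketched for a single feature on one plate (an 8x12 matrix of well values). The B-score here is the residual of a two-way median polish divided by the MAD of the residuals. It uses a fixed number of polish iterations and the 1.4826 MAD scaling constant, both of which are choices made for this sketch. The control-based variant is shown as a robust z-score relative to the negative controls; the exact control statistic the lab uses is not stated, so this is an assumption.

```python
import numpy as np


def mad(x: np.ndarray) -> float:
    """Median absolute deviation, scaled by 1.4826 as R's mad() does by default."""
    return 1.4826 * float(np.median(np.abs(x - np.median(x))))


def b_score(plate: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """B-score: residuals of a two-way median polish divided by their MAD."""
    resid = plate.astype(float).copy()
    for _ in range(n_iter):
        resid -= np.median(resid, axis=1, keepdims=True)   # remove row effects
        resid -= np.median(resid, axis=0, keepdims=True)   # remove column effects
    return resid / mad(resid)


def control_normalize(plate: np.ndarray, neg_mask: np.ndarray) -> np.ndarray:
    """Robust z-score of every well relative to the plate's negative controls."""
    neg = plate[neg_mask]
    return (plate - np.median(neg)) / mad(neg)
```

The B-score only makes sense when perturbations are randomized across the plate. A row or column that genuinely contains many hits would otherwise be "polished" away as if it were a positional artifact.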

Transform features

Features are transformed onto a logarithmic scale using the generalized logarithm, as outlined in Huber et al. and Fischer et al., and scaled with a Z-score-like scaling based on the overall distribution of the entire dataset.
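
The transformation step can be sketched with one common parameterization of the generalized logarithm, glog_c(x) = log((x + sqrt(x^2 + c^2)) / 2). This form equals arsinh(x / c) + log(c / 2), so it matches the arsinh-type transform of Huber et al. up to a shift and scaling. It behaves like log(x) for x much larger than c, stays finite at and below zero, and is roughly linear near zero. The choice of `c` and the use of median/MAD rather than mean/SD for the dataset-wide scaling are assumptions of this sketch.

```python
import numpy as np


def glog(x, c: float = 1.0):
    """Generalized logarithm log((x + sqrt(x^2 + c^2)) / 2).

    ~log(x) for x >> c, roughly linear around 0, and finite for x <= 0.
    """
    x = np.asarray(x, dtype=float)
    return np.log((x + np.sqrt(x ** 2 + c ** 2)) / 2.0)


def robust_scale(X: np.ndarray) -> np.ndarray:
    """Z-score-like scaling per feature (column) using dataset-wide median and MAD."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    return (X - med) / mad
```

In practice `c` would be chosen per feature, for example relative to the noise level near zero, so that low-intensity wells are not spread out artificially.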

Correct for systematic effects

Plate effects, such as those resulting from cell seeding errors, should already have been removed by the per-plate normalization to the negative controls.
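
The reasoning above can be checked numerically. Suppose a seeding error scales every well on a plate, controls included, by the same factor. A robust z-score relative to that plate's own negative controls then cancels the factor exactly. The sketch below demonstrates this and adds a simple diagnostic: the median normalized sample-well value per plate, where outliers would hint at residual plate effects. The multiplicative-seeding model and `plate_offsets` are assumptions for illustration. Effects that hit only part of a plate, such as edge effects, are not removed this way.

```python
import numpy as np


def control_normalize(plate: np.ndarray, neg_mask: np.ndarray) -> np.ndarray:
    """Robust z-score of every well relative to the plate's negative controls."""
    neg = plate[neg_mask]
    mad = 1.4826 * np.median(np.abs(neg - np.median(neg)))
    return (plate - np.median(neg)) / mad


def plate_offsets(plates: list, neg_masks: list) -> np.ndarray:
    """Median normalized sample-well value per plate; a plate far from the
    others hints at a plate effect the control normalization did not remove."""
    return np.array([np.median(control_normalize(p, m)[~m])
                     for p, m in zip(plates, neg_masks)])
```

Because the median and MAD both scale linearly with the plate, any factor shared by all wells cancels in the ratio. The same holds for an additive offset shared by all wells.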

Data cleaning

In our latest project we flagged plates showing a Pearson correlation between replicates smaller than 0.6 and/or a Z'-factor between positive and negative controls smaller than 0.3. Flagging was performed by setting the plate's entire data vector to NA.
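
A minimal sketch of this plate-level QC is shown below for one feature, with each replicate given as a vector of well values. It uses the standard Z'-factor, 1 - 3 (sd_pos + sd_neg) / |mean_pos - mean_neg|. The replicate correlation is computed over all wells, and Z' is required to pass on both replicates. Both are assumptions, since the source does not say which wells enter the correlation or whether Z' is computed per replicate. `qc_plate` is a hypothetical name, and NaN plays the role of R's NA.

```python
import numpy as np


def z_prime(pos: np.ndarray, neg: np.ndarray) -> float:
    """Z'-factor = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = np.std(pos, ddof=1) + np.std(neg, ddof=1)
    return float(1.0 - 3.0 * spread / abs(np.mean(pos) - np.mean(neg)))


def qc_plate(rep1, rep2, pos_mask, neg_mask, min_r=0.6, min_zprime=0.3):
    """Return (rep1, rep2, flagged); flagged plates have all values set to NaN."""
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    r = np.corrcoef(rep1, rep2)[0, 1]   # replicate agreement over all wells
    zps = [z_prime(rep[pos_mask], rep[neg_mask]) for rep in (rep1, rep2)]
    flagged = bool(r < min_r or min(zps) < min_zprime)
    if flagged:
        rep1 = np.full_like(rep1, np.nan)
        rep2 = np.full_like(rep2, np.nan)
    return rep1, rep2, flagged
```

Including the controls in the correlation inflates it whenever positive and negative controls separate well. Restricting the correlation to sample wells is a stricter alternative.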

Select features / reduce dimensionality

Features are selected on a case-by-case basis according to reproducibility and information content. We take care to exclude biologically uninformative or redundant features while ensuring reproducible measurements, and to retain individual observations where globally redundant features diverge. This way we aim to obtain a feature vector that is as diverse and complete as possible, giving relevant information on every phenotype.
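
One way to combine the reproducibility and redundancy criteria is a greedy filter, sketched below for two replicate matrices (wells x features). The thresholds (replicate correlation of at least 0.5, pairwise |correlation| below 0.9) and the most-reproducible-first ordering are hypothetical choices. The sketch does not capture the case-by-case judgment or the retention of features that diverge for individual observations.

```python
import numpy as np


def select_features(rep1, rep2, min_repro=0.5, max_redundancy=0.9):
    """Greedy feature selection on two (wells x features) replicate matrices.

    1. Reproducibility: keep features whose replicate correlation >= min_repro.
    2. Redundancy: walk features from most to least reproducible and drop any
       whose |correlation| with an already kept feature is >= max_redundancy.
    """
    n_features = rep1.shape[1]
    repro = np.array([np.corrcoef(rep1[:, j], rep2[:, j])[0, 1]
                      for j in range(n_features)])
    profile = (rep1 + rep2) / 2.0                 # replicate-averaged profiles
    corr = np.corrcoef(profile, rowvar=False)     # feature x feature correlation
    kept = []
    for j in np.argsort(-repro):
        if not repro[j] >= min_repro:             # also skips NaN (constant features)
            continue
        if all(abs(corr[j, k]) < max_redundancy for k in kept):
            kept.append(int(j))
    return kept
```

Visiting features in order of reproducibility means that, of two redundant features, the more reproducible one survives.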

Create per-well profiles

Per-well profiles are obtained by summarizing, with the trimmed mean, the feature vectors of all cells in every field of view belonging to the same well.
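
The aggregation from cells to wells can be sketched as a group-by over well IDs followed by a trimmed mean. This variant trims by count rather than by quantile: floor(5% of n) cells are dropped at each end of each sorted feature column. That is the convention of `scipy.stats.trim_mean(x, 0.05, axis=0)`. Whether the lab trims by count or by quantile is not stated, and `well_profiles` is a hypothetical name.

```python
import numpy as np
from collections import defaultdict


def trim_mean_by_count(x, prop=0.05):
    """Sort each feature column, drop floor(prop * n) cells at each end, average the rest."""
    x = np.sort(np.asarray(x, dtype=float), axis=0)
    k = int(prop * x.shape[0])
    return x[k:x.shape[0] - k].mean(axis=0)


def well_profiles(cell_features, well_ids):
    """Pool the cells of all fields sharing a well ID, then summarize per feature.

    cell_features: (n_cells, n_features) array over all fields of a plate.
    well_ids: one well label per cell (row).
    """
    rows = defaultdict(list)
    for i, well in enumerate(well_ids):
        rows[well].append(i)
    return {well: trim_mean_by_count(cell_features[idx]) for well, idx in rows.items()}
```

Because cells are pooled before summarizing, each field's influence on its well profile is proportional to its cell count.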

Measure similarity between profiles

The metric should be chosen according to the resulting set of features (more or less redundant, strong or weak). Pearson correlation and Euclidean distance tend to be dominated by covarying or very "information-rich" features, whereas Spearman correlation or Mahalanobis distance may be less biased.
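
The four metrics mentioned above can be sketched for two per-well profiles `a` and `b`. Spearman correlation is computed as the Pearson correlation of average ranks. The Mahalanobis distance uses the pseudo-inverse of a feature covariance matrix, so redundant features do not dominate the distance. Estimating that covariance from all well profiles, for example `np.cov(profiles, rowvar=False)`, is an assumption of this sketch.

```python
import numpy as np


def rank(x) -> np.ndarray:
    """Ranks 1..n with ties replaced by their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="stable")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        ties = x == v
        ranks[ties] = ranks[ties].mean()
    return ranks


def pearson(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b) -> float:
    """Pearson correlation of the ranks; insensitive to monotone rescaling."""
    return pearson(rank(a), rank(b))


def euclidean(a, b) -> float:
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))


def mahalanobis(a, b, cov: np.ndarray) -> float:
    """Distance that down-weights covarying features via the inverse covariance
    (pseudo-inverse, since feature covariance matrices are often singular)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(np.sqrt(d @ np.linalg.pinv(cov) @ d))
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance. For a monotone but nonlinear relationship such as `b = a ** 3`, Spearman correlation is 1 while Pearson correlation is lower.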

Downstream analysis / visualization

Our latest approaches center on the idea of interactivity. Since every person approaches complex data differently, we believe that interactive analysis tools (e.g. R/shiny) offer a very intuitive, easy-to-grasp entry point. From these visualizations (networks, interactive heatmaps or scatter plots) one can derive conclusions as well as print-ready graphs and plots for publication layouts. With this we try to escape the hairball trap, where the biggest problem with dense network plots is that they are no longer useful once printed.