Simplify, restructure, and clarify
gwaybio authored Sep 26, 2024
1 parent 17a4520 commit 9d8f223
Showing 1 changed file (`README.md`) with 11 additions and 46 deletions.
@@ -66,23 +66,25 @@ Pycytominer is primarily built on top of [pandas](https://pandas.pydata.org/docs

Pycytominer currently supports [parquet](https://parquet.apache.org/) and compressed text files (e.g., `.csv.gz`) for input and output.

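Both formats can be read and written with plain pandas. The following is a minimal sketch that uses a small, made-up profile table and a temporary file path. It only exercises the gzip-compressed CSV path, because `DataFrame.to_parquet` additionally requires an engine such as pyarrow or fastparquet to be installed.

```python
import os
import tempfile

import pandas as pd

# A tiny, made-up profile table (hypothetical values for illustration)
profiles = pd.DataFrame(
    {
        "Metadata_Plate": ["P1", "P1"],
        "Metadata_Well": ["A01", "A02"],
        "Cells_AreaShape_Area": [512.0, 498.5],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "profiles.csv.gz")
    # pandas infers gzip compression from the ".csv.gz" extension
    profiles.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# The reloaded table matches the original
pd.testing.assert_frame_equal(profiles, reloaded)
```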
## Data structure

Pycytominer processes all data using [Pandas](https://pandas.pydata.org/) DataFrames.

### CellProfiler support

Currently, Pycytominer fully supports data generated by [CellProfiler](https://cellprofiler.org/) and, by default, adheres to its specific data structure and naming conventions.

CellProfiler-generated image-based profiles typically consist of two main components:

- **Metadata features:** These columns contain information about the experiment, such as plate ID, well position, incubation time, perturbation type, and other relevant experimental details. Their names are prefixed with `Metadata_`, indicating that the columns hold metadata.
- **Morphology features:** These are the quantified morphological features captured from microscopy images. Their names are prefixed with the default compartments (`Cells_`, `Cytoplasm_`, and `Nuclei_`). Pycytominer also supports non-default compartment names.
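
This naming convention makes it possible to infer each column's role from its prefix alone. The sketch below is an illustration only and is not Pycytominer's own inference code. It splits a DataFrame's columns into metadata and morphology features. `split_feature_columns` is a hypothetical helper, and `Mito` is a made-up, non-default compartment. The `compartments` argument reflects the idea that non-default compartment names must be declared explicitly.

```python
import pandas as pd

DEFAULT_COMPARTMENTS = ("Cells", "Cytoplasm", "Nuclei")


def split_feature_columns(df, compartments=DEFAULT_COMPARTMENTS):
    """Hypothetical helper: split columns by CellProfiler-style prefixes."""
    # Metadata columns carry the "Metadata_" prefix
    metadata = [col for col in df.columns if col.startswith("Metadata_")]
    # Morphology columns start with "<Compartment>_"
    prefixes = tuple(f"{name}_" for name in compartments)
    morphology = [col for col in df.columns if col.startswith(prefixes)]
    return metadata, morphology


df = pd.DataFrame(
    columns=[
        "Metadata_Plate",
        "Metadata_Well",
        "Cells_AreaShape_Area",
        "Nuclei_Intensity_MeanIntensity_DNA",
        "Mito_Texture_Contrast",  # hypothetical non-default compartment
    ]
)

meta, morph = split_feature_columns(df)
print(meta)   # ['Metadata_Plate', 'Metadata_Well']
print(morph)  # ['Cells_AreaShape_Area', 'Nuclei_Intensity_MeanIntensity_DNA']

# Declaring the extra compartment picks up its features as well
_, morph_all = split_feature_columns(df, compartments=("Cells", "Nuclei", "Mito"))
```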
### Handling inputs from other image analysis tools

Pycytominer also supports processing of raw morphological features from image analysis tools beyond [CellProfiler](https://cellprofiler.org/).
These tools include In Carta, Harmony, and others.
Using Pycytominer with these tools requires minor modifications to function arguments, so we encourage these users to pay particularly close attention to individual function documentation.

For example, by default the `normalize()` function infers morphology features from CellProfiler compartment prefixes, so for data from other tools you must specify the morphological features manually using the `features` [parameter](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.normalize.normalize).
This parameter is also available in other key steps, such as [`aggregate`](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.aggregate.aggregate) and [`feature_select`](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.feature_select.feature_select).
Similarly, if your dataset uses compartments other than the CellProfiler defaults (`Cells`, `Cytoplasm`, and `Nuclei`), specify them with the `compartments` [parameter](https://pycytominer.readthedocs.io/en/stable/pycytominer.cyto_utils.html#pycytominer.cyto_utils.cells.SingleCells.compartments) of `SingleCells`.

If you are using Pycytominer with these other tools, we'd love to hear from you so that we can learn how best to support a broad range of use cases.

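When column names carry no CellProfiler prefixes, an explicit `features` list is what tells each step which columns to operate on. The following pandas sketch illustrates this for aggregation by taking median per-well profiles over a declared feature list. It is not Pycytominer's `aggregate` implementation. The tool, column names, and `Plate`/`Well` grouping columns are all made up for the example.

```python
import pandas as pd

# Single-cell measurements from a hypothetical non-CellProfiler tool:
# no "Metadata_" or compartment prefixes, so nothing can be inferred
cells = pd.DataFrame(
    {
        "Plate": ["P1", "P1", "P1", "P1"],
        "Well": ["A01", "A01", "A02", "A02"],
        "NucleusArea": [100.0, 120.0, 90.0, 110.0],
        "CellIntensity": [0.2, 0.4, 0.5, 0.7],
    }
)

# Declare column roles explicitly, as the `features` parameter requires
strata = ["Plate", "Well"]
features = ["NucleusArea", "CellIntensity"]

# Median of each declared feature within each plate/well group
well_profiles = cells.groupby(strata, as_index=False)[features].median()
print(well_profiles)
```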
## API

@@ -134,43 +136,6 @@ normalized_df = pycytominer.normalize(
)
```

### Handling Non-CellProfiler Morphological Features in Pycytominer

Below is an example that treats a dataset as if it had not been generated by [CellProfiler](https://cellprofiler.org/), declaring the metadata and morphology features explicitly instead of relying on inference.

```python
# Real-world example using a different dataset
import pandas as pd
import pycytominer

commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98"
url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/SQ00014812/SQ00014812_augmented.csv.gz"

# assume this dataset was not generated by CellProfiler
df = pd.read_csv(url)

# split the metadata and morphology feature columns
# (in this file the first 26 columns hold metadata; adjust the split for your data)
metadata_features = df.columns[:26].tolist()
morphology_features = df.columns[26:].tolist()

# use the 'features' and 'meta_features' parameters to declare which columns
# pycytominer should treat as morphology features and as metadata
normalized_df = pycytominer.normalize(
    profiles=df,
    features=morphology_features,
    meta_features=metadata_features,
    method="standardize",
    samples="Metadata_broad_sample == 'DMSO'",
)
```

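The following pandas-only sketch builds intuition for the `method="standardize"` call above. It estimates means and standard deviations on the control rows selected by a query, such as the `samples` string, and then applies that scaling to every row. This is a simplified illustration under that assumption, not Pycytominer's implementation. The toy columns and values are made up.

```python
import pandas as pd

profiles = pd.DataFrame(
    {
        "Metadata_broad_sample": ["DMSO", "DMSO", "drug_a", "drug_b"],
        "feature_1": [1.0, 3.0, 5.0, 0.0],
        "feature_2": [10.0, 14.0, 12.0, 8.0],
    }
)
features = ["feature_1", "feature_2"]

# Reference statistics come only from the control (DMSO) rows
controls = profiles.query("Metadata_broad_sample == 'DMSO'")
mean = controls[features].mean()
# ddof=0 gives the population standard deviation
std = controls[features].std(ddof=0)

# Apply the control-derived scaling to every row; metadata is left untouched
normalized = profiles.copy()
normalized[features] = (profiles[features] - mean) / std
print(normalized["feature_1"].tolist())  # [-1.0, 1.0, 3.0, -2.0]
```

A feature that is constant across the control rows would have zero standard deviation here and yield infinite or missing values. A real pipeline needs to guard against that.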
### Pipeline orchestration

Pycytominer is a collection of different functions with no explicit link between steps.