Improve Documentation for Non-CellProfiler Datasets in Pycytominer #430

Merged: 33 commits merged on Sep 26, 2024. The diff below shows changes from 5 commits.
Commits (33):

- `d8e7760` Updated docs on handling non-CellProfiler features (axiomcura, Sep 11, 2024)
- `5aebe2c` formatting (axiomcura, Sep 11, 2024)
- `2f7b971` update docs indicating CellProfiler data was used (axiomcura, Sep 11, 2024)
- `9c65c81` typos and formatting (axiomcura, Sep 11, 2024)
- `81497c6` added data structure documentation (axiomcura, Sep 11, 2024)
- `e88cfdd` Update README.md (axiomcura, Sep 22, 2024)
- `66e2c16` Update README.md (axiomcura, Sep 22, 2024)
- `7a7ee8f` Update README.md (axiomcura, Sep 22, 2024)
- `576ec94` Update README.md (axiomcura, Sep 22, 2024)
- `3ef0a2f` Update README.md (axiomcura, Sep 22, 2024)
- `b31d8d5` added reviewer comments (axiomcura, Sep 22, 2024)
- `17a4520` formatting (axiomcura, Sep 23, 2024)
- `9d8f223` Simplify, restructure, and clarify (gwaybio, Sep 26, 2024)
- `308fd0f` Move singlecells note to CellProfiler support section (gwaybio, Sep 26, 2024)
- `936b272` Update README.md (gwaybio, Sep 26, 2024)
- `4cb70ec` to make precommit prettier hook happy (gwaybio, Sep 26, 2024)
- `073de23` Prettier hook catching something it should not (gwaybio, Sep 26, 2024)
- `6ba3502` Update README.md (axiomcura, Sep 26, 2024)
- `bc5de22` Update README.md (axiomcura, Sep 26, 2024)
- `7b22711` Update README.md (axiomcura, Sep 26, 2024)
- `0c3d622` Update README.md (axiomcura, Sep 26, 2024)
- `dc8467b` Update README.md (axiomcura, Sep 26, 2024)
- `494edf4` Minor fixes to README after review (gwaybio, Sep 26, 2024)
- `da83c28` added new figure for the pycytominer pipeline, renamed old pipeline a… (axiomcura, Sep 26, 2024)
- `770de4b` centering figure (axiomcura, Sep 26, 2024)
- `956c9df` added figure caption (axiomcura, Sep 26, 2024)
- `61f52d0` reduced main figure height (axiomcura, Sep 26, 2024)
- `78d2d14` added references and updated docs (axiomcura, Sep 26, 2024)
- `be6621e` added references (axiomcura, Sep 26, 2024)
- `f85a2a2` Update README.md (axiomcura, Sep 26, 2024)
- `7820fc2` added full links (axiomcura, Sep 26, 2024)
- `f2caf87` Update README.md (axiomcura, Sep 26, 2024)
- `6caab11` Update README.md (axiomcura, Sep 26, 2024)
58 changes: 58 additions & 0 deletions README.md
@@ -66,6 +66,24 @@ Pycytominer is primarily built on top of [pandas](https://pandas.pydata.org/docs

Pycytominer currently supports [parquet](https://parquet.apache.org/) and compressed text file (e.g. `.csv.gz`) i/o.
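
As a minimal sketch of compressed-text i/o using plain pandas (not Pycytominer's own I/O helpers), a profile table can be written to and read back from `.csv.gz`; pandas infers gzip compression from the file suffix. The column names and values below are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy profile table; column names and values are hypothetical
profiles = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "Cells_AreaShape_Area": [512.0, 498.5],
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "profiles.csv.gz")
    # compression="infer" (the default) selects gzip from the ".csv.gz" suffix
    profiles.to_csv(path, index=False)
    roundtrip = pd.read_csv(path)

print(roundtrip.equals(profiles))
```

Parquet files work analogously via `pd.read_parquet` and `DataFrame.to_parquet`, which require a parquet engine such as pyarrow.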

## Data structure

Pycytominer processes all data using [pandas](https://pandas.pydata.org/) DataFrames.

Currently, Pycytominer fully supports data generated by [CellProfiler](https://cellprofiler.org/), adhering to its data structure and naming conventions.

CellProfiler-generated image-based profiles typically consist of two main components:

- **Metadata features:** Information about the experiment, such as plate ID, well position, incubation time, perturbation type, and other experimental details.

- **Morphology features:** The morphological features quantified from microscopy images.

Pycytominer expects feature names to follow this structure:

- **Metadata features:** Column names are prefixed with `Metadata_` (e.g., `Metadata_Plate`, `Metadata_Well`), marking them as experimental metadata rather than measurements.

- **Morphological features:** These follow CellProfiler’s naming conventions, in which each feature name starts with the compartment it was measured in (e.g., `Cells_AreaShape_Area`, `Nuclei_Intensity_MeanIntensity_DNA`). The default compartments are "cells," "cytoplasm," and "nuclei." If your dataset contains other compartments, specify them manually using the `compartments` [parameter](https://pycytominer.readthedocs.io/en/stable/pycytominer.cyto_utils.html#pycytominer.cyto_utils.cells.SingleCells.compartments) of `SingleCells`.
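
To make this naming convention concrete, here is a minimal sketch that splits columns into metadata and morphology lists purely by name prefix. The helper `split_cellprofiler_columns` and all column names are hypothetical; Pycytominer performs its own prefix-based inference (for example, `pycytominer.cyto_utils.infer_cp_features`), and this is only an illustration of the idea, not that implementation.

```python
import pandas as pd

# Default CellProfiler compartments, written as feature-name prefixes
COMPARTMENTS = ("Cells", "Cytoplasm", "Nuclei")


def split_cellprofiler_columns(df, compartments=COMPARTMENTS):
    """Hypothetical helper: split columns into metadata and morphology by prefix."""
    metadata = [c for c in df.columns if c.startswith("Metadata_")]
    # str.startswith accepts a tuple, so one call checks every compartment
    prefixes = tuple(f"{c}_" for c in compartments)
    morphology = [c for c in df.columns if c.startswith(prefixes)]
    return metadata, morphology


# Empty frame with hypothetical CellProfiler-style column names
df = pd.DataFrame(
    columns=[
        "Metadata_Plate",
        "Metadata_Well",
        "Cells_AreaShape_Area",
        "Cytoplasm_Intensity_MeanIntensity_DNA",
        "Nuclei_Texture_Contrast_DNA_3_00",
    ]
)
metadata, morphology = split_cellprofiler_columns(df)
print(metadata)  # ['Metadata_Plate', 'Metadata_Well']
print(morphology)
```

Passing a different `compartments` tuple (e.g., one that includes a custom compartment name) changes which prefixes count as morphology, which mirrors why non-default compartments must be declared explicitly.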

## API

Pycytominer has five major processing functions:
@@ -97,6 +115,8 @@ Each processing function has unique arguments, see our [documentation](https://p

The default way to use pycytominer is within python scripts, and using pycytominer is simple and fun.

The example below demonstrates how to perform normalization with a dataset generated by [CellProfiler](https://cellprofiler.org/).

```python
# Real world example
import pandas as pd
@@ -114,6 +134,42 @@ normalized_df = pycytominer.normalize(
)
```

### Handling Non-CellProfiler Morphological Features in Pycytominer

In some cases, morphological features are extracted with tools other than [CellProfiler](https://cellprofiler.org/).
While Pycytominer fully supports features extracted by [`CellProfiler`](https://cellprofiler.org/), it infers feature columns from CellProfiler's naming conventions by default, so features from other tools may go undetected and cause errors.

To resolve this, you can manually specify the morphological features using the `features` [parameter](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.normalize.normalize).
This parameter is also available in other key steps, such as [`aggregate`](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.aggregate.aggregate), [`feature_select`](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.feature_select.feature_select), and [`consensus`](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.consensus.consensus).
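
One way to build the list passed to `features` is to select columns by name rather than by position. The sketch below assumes, hypothetically, that metadata columns still carry the `Metadata_` prefix and that every other numeric column is a morphological feature; the helper name and the toy columns are made up and do not come from Pycytominer.

```python
import pandas as pd


def split_non_cellprofiler_columns(df, metadata_prefix="Metadata_"):
    """Hypothetical helper: metadata by prefix, features = remaining numeric columns."""
    metadata = [c for c in df.columns if c.startswith(metadata_prefix)]
    features = [
        c
        for c in df.columns
        if c not in metadata and pd.api.types.is_numeric_dtype(df[c])
    ]
    return metadata, features


# Toy output from some other tool; all column names and values are made up
df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "Metadata_Treatment": ["DMSO", "drug_x"],
        "image_path": ["a01.tif", "a02.tif"],  # non-numeric, so excluded
        "area": [510.0, 620.0],
        "eccentricity": [0.41, 0.55],
    }
)
metadata, features = split_non_cellprofiler_columns(df)
print(metadata)  # ['Metadata_Well', 'Metadata_Treatment']
print(features)  # ['area', 'eccentricity']
```

The resulting `features` list is what you would pass as `features=features` to `normalize` or the other steps listed above. Selecting by name also survives column reordering, which positional slicing does not.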

The example below loads a CellProfiler dataset but treats it as if it came from another tool, declaring the feature columns explicitly instead of relying on CellProfiler naming conventions.

```python
# Real world example using a different dataset
import pandas as pd
import pycytominer

commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98"
url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/SQ00014812/SQ00014812_augmented.csv.gz"

# for demonstration, treat this dataset as if it were not generated by CellProfiler
df = pd.read_csv(url)

# split the metadata and morphology feature columns
# (this split assumes the first 26 columns hold metadata)
metadata_features = df.columns[:26].tolist()
morphology_features = df.columns[26:].tolist()

# use 'features' (and 'meta_features') to declare which columns pycytominer should use
normalized_df = pycytominer.normalize(
    profiles=df,
    features=morphology_features,
    meta_features=metadata_features,
    method="standardize",
    samples="Metadata_broad_sample == 'DMSO'",
)
```
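
To make `method="standardize"` with `samples="Metadata_broad_sample == 'DMSO'"` more concrete, here is a minimal pure-pandas sketch of what we understand it to compute: each feature is z-scored using the mean and standard deviation of the control (DMSO) rows only, and that transform is then applied to every row. This approximates scikit-learn's `StandardScaler` behavior (population standard deviation, zero spread replaced by 1); it is an illustration, not Pycytominer's implementation, and the helper name, columns, and values are made up.

```python
import pandas as pd


def standardize_to_controls(df, features, control_query):
    """Hypothetical helper: z-score features using statistics from control rows."""
    controls = df.query(control_query)[features]
    mean = controls.mean()
    # population std (ddof=0), with zero spread replaced by 1 to avoid division by zero
    std = controls.std(ddof=0).replace(0.0, 1.0)
    out = df.copy()
    out[features] = (df[features] - mean) / std
    return out


df = pd.DataFrame(
    {
        "Metadata_broad_sample": ["DMSO", "DMSO", "cmpd_A"],
        "f1": [1.0, 3.0, 5.0],
        "f2": [2.0, 2.0, 4.0],
    }
)
out = standardize_to_controls(df, ["f1", "f2"], "Metadata_broad_sample == 'DMSO'")
print(out["f1"].tolist())  # [-1.0, 1.0, 3.0]
print(out["f2"].tolist())  # [0.0, 0.0, 2.0]
```

Fitting on controls rather than on all wells expresses each perturbation's features in units of control variability, which is why the `samples` query matters.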

**Note:** We are actively working on enhancing `pycytominer` to support morphological features extracted from a variety of software tools!

### Pipeline orchestration

Pycytominer is a collection of different functions with no explicit link between steps.
@@ -138,8 +194,10 @@ Therefore, we have included some custom tools in `pycytominer/cyto_utils` that p
- [Data processing for image-based profiling](#data-processing-for-image-based-profiling)
- [Installation](#installation)
- [Frameworks](#frameworks)
- [Data structure](#data-structure)
- [API](#api)
- [Usage](#usage)
- [Handling Non-CellProfiler Morphological Features in Pycytominer](#handling-non-cellprofiler-morphological-features-in-pycytominer)
- [Pipeline orchestration](#pipeline-orchestration)
- [Other functionality](#other-functionality)
- [CellProfiler CSV collation](#cellprofiler-csv-collation)