Improve Documentation for Non-CellProfiler Datasets in Pycytominer #430

Merged
merged 33 commits into from
Sep 26, 2024
Changes from 17 commits
d8e7760
Updated docs on handling non-CellProfiler features
axiomcura Sep 11, 2024
5aebe2c
formatting
axiomcura Sep 11, 2024
2f7b971
update docs indicating CellProfiler data was used
axiomcura Sep 11, 2024
9c65c81
typos and formatting
axiomcura Sep 11, 2024
81497c6
added data structure documentation
axiomcura Sep 11, 2024
e88cfdd
Update README.md
axiomcura Sep 22, 2024
66e2c16
Update README.md
axiomcura Sep 22, 2024
7a7ee8f
Update README.md
axiomcura Sep 22, 2024
576ec94
Update README.md
axiomcura Sep 22, 2024
3ef0a2f
Update README.md
axiomcura Sep 22, 2024
b31d8d5
added reviewer comments
axiomcura Sep 22, 2024
17a4520
formatting
axiomcura Sep 23, 2024
9d8f223
Simplify, restructure, and clarify
gwaybio Sep 26, 2024
308fd0f
Move singlecells note to CellProfiler support section
gwaybio Sep 26, 2024
936b272
Update README.md
gwaybio Sep 26, 2024
4cb70ec
to make precommit prettier hook happy
gwaybio Sep 26, 2024
073de23
Prettier hook catching something it should not
gwaybio Sep 26, 2024
6ba3502
Update README.md
axiomcura Sep 26, 2024
bc5de22
Update README.md
axiomcura Sep 26, 2024
7b22711
Update README.md
axiomcura Sep 26, 2024
0c3d622
Update README.md
axiomcura Sep 26, 2024
dc8467b
Update README.md
axiomcura Sep 26, 2024
494edf4
Minor fixes to README after review
gwaybio Sep 26, 2024
da83c28
added new figure for the pycytominer pipeline, renamed old pipeline a…
axiomcura Sep 26, 2024
770de4b
centering figure
axiomcura Sep 26, 2024
956c9df
added figure caption
axiomcura Sep 26, 2024
61f52d0
reduced main figure height
axiomcura Sep 26, 2024
78d2d14
added references and updated docs
axiomcura Sep 26, 2024
be6621e
added references
axiomcura Sep 26, 2024
f85a2a2
Update README.md
axiomcura Sep 26, 2024
7820fc2
added full links
axiomcura Sep 26, 2024
f2caf87
Update README.md
axiomcura Sep 26, 2024
6caab11
Update README.md
axiomcura Sep 26, 2024
40 changes: 25 additions & 15 deletions README.md
@@ -66,6 +66,29 @@ Pycytominer is primarily built on top of [pandas](https://pandas.pydata.org/docs

Pycytominer currently supports [parquet](https://parquet.apache.org/) and compressed text file (e.g. `.csv.gz`) i/o.

### CellProfiler support

Currently, Pycytominer fully supports data generated by [CellProfiler](https://cellprofiler.org/), and its defaults adhere to CellProfiler's data structure and naming conventions.

CellProfiler-generated image-based profiles typically consist of two main components:

- **Metadata features:** This section contains information about the experiment, such as plate ID, well position, incubation time, perturbation type, and other relevant experimental details. These feature names are prefixed with `Metadata_`, indicating that the data in these columns contain metadata information.
- **Morphology features:** These are the quantified morphological features prefixed with the default compartments (`Cells_`, `Cytoplasm_`, and `Nuclei_`). Pycytominer also supports non-default compartment names.
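
As a minimal sketch of the naming convention described above, the snippet below separates metadata columns from morphology feature columns using only their prefixes. The column names and the `split_columns` helper are illustrative assumptions, not part of the Pycytominer API.

```python
# Hypothetical column names that follow the CellProfiler prefix convention.
columns = [
    "Metadata_Plate",
    "Metadata_Well",
    "Cells_AreaShape_Area",
    "Cytoplasm_Intensity_MeanIntensity_DNA",
    "Nuclei_AreaShape_Eccentricity",
]

DEFAULT_COMPARTMENTS = ("Cells", "Cytoplasm", "Nuclei")


def split_columns(columns, compartments=DEFAULT_COMPARTMENTS):
    """Separate metadata columns from morphology feature columns by prefix."""
    # Build the "<Compartment>_" prefixes, e.g. "Cells_".
    feature_prefixes = tuple(f"{name}_" for name in compartments)
    metadata = [c for c in columns if c.startswith("Metadata_")]
    features = [c for c in columns if c.startswith(feature_prefixes)]
    return metadata, features


metadata_columns, feature_columns = split_columns(columns)
print(metadata_columns)
print(feature_columns)
```

Passing a different `compartments` tuple (for example `("Mito",)`) mirrors the non-default compartment case mentioned above.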

Note, [`pycytominer.cyto_utils.cells.SingleCells()`](pycytominer/cyto_utils/cells.py) contains code to interact with single-cell SQLite files, which are output from CellProfiler.
Processing capabilities for SQLite files depend on the SQLite file size and your available computational resources (e.g., memory and cores).

### Handling inputs from other image analysis tools (other than CellProfiler)

Pycytominer also supports processing of raw morphological features from image analysis tools beyond [CellProfiler](https://cellprofiler.org/).
These tools include In Carta, Harmony, and others.
Using Pycytominer with these tools requires minor modifications to function arguments, and we encourage these users to pay particularly close attention to individual function documentation.

For example, because the defaults of the `normalize()` function follow CellProfiler naming conventions, you must manually specify the morphological features using the `features` [parameter](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.normalize.normalize) when your feature names do not follow those conventions.
This parameter is also available in other key steps, such as [`aggregate`](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.aggregate.aggregate) and [`feature_select`](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#pycytominer.feature_select.feature_select).
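
The following is a hedged sketch of how this might look for a table exported from another tool. The column names and both helper functions are hypothetical. The `normalize()` call passes `features`, `meta_features`, and `method` as described in the function documentation linked above, so please confirm the arguments against the documentation for your installed version.

```python
def select_feature_columns(columns, metadata_columns):
    """Return every column not listed as metadata, preserving order."""
    metadata = set(metadata_columns)
    return [c for c in columns if c not in metadata]


def normalize_non_cellprofiler(profiles, metadata_columns):
    """Normalize profiles whose columns lack CellProfiler prefixes.

    `profiles` is assumed to be a pandas DataFrame with one row per sample.
    """
    # Imported lazily so the helper above works without pycytominer installed.
    from pycytominer import normalize

    features = select_feature_columns(profiles.columns, metadata_columns)
    return normalize(
        profiles=profiles,
        features=features,  # explicit list instead of inferred features
        meta_features=list(metadata_columns),  # explicit metadata columns
        method="standardize",
    )


# Hypothetical column names from a non-CellProfiler export.
example_columns = ["plate", "well", "nucleus_area", "cell_mean_intensity"]
print(select_feature_columns(example_columns, ["plate", "well"]))
```

Listing the metadata columns explicitly and treating everything else as a feature is only one possible design; an explicit allow-list of feature names is safer when the export also contains non-numeric or bookkeeping columns.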

If you are using Pycytominer with these other tools, we'd love to hear from you so that we can learn how best to support a broad range of use cases.

## API

Pycytominer has five major processing functions:
@@ -97,6 +120,8 @@ Each processing function has unique arguments, see our [documentation](https://p

The default way to use pycytominer is within Python scripts, and using pycytominer is simple and fun.

The example below demonstrates how to perform normalization with a dataset generated by [CellProfiler](https://cellprofiler.org/).

```python
# Real world example
import pandas as pd
@@ -135,21 +160,6 @@ And, more specifically than that, image-based profiling readouts from [CellProfi

Therefore, we have included some custom tools in `pycytominer/cyto_utils` that provide other functionality:

- [Data processing for image-based profiling](#data-processing-for-image-based-profiling)
- [Installation](#installation)
- [Frameworks](#frameworks)
- [API](#api)
- [Usage](#usage)
- [Pipeline orchestration](#pipeline-orchestration)
- [Other functionality](#other-functionality)
- [CellProfiler CSV collation](#cellprofiler-csv-collation)
- [Creating a cell locations lookup table](#creating-a-cell-locations-lookup-table)
- [Generating a GCT file for morpheus](#generating-a-gct-file-for-morpheus)
- [Citing pycytominer](#citing-pycytominer)

Note, [`pycytominer.cyto_utils.cells.SingleCells()`](pycytominer/cyto_utils/cells.py) contains code to interact with single-cell SQLite files, which are output from CellProfiler.
Processing capabilities for SQLite files depend on the SQLite file size and your available computational resources (e.g., memory and cores).

### CellProfiler CSV collation

If you run your images on a cluster without a MySQL or similar large database set up, you will likely end up with many folders from the different cluster runs (often one per well or one per site), each containing an `Image.csv`, `Nuclei.csv`, etc.