Refactor pipeline to use grain crop dictionaries #1022

Draft
wants to merge 36 commits into
base: main

Conversation

SylviaWhittle
Collaborator

This is a draft PR; documentation and tests will be added before the full PR is made.

TopoStats Pull Requests

Please provide a descriptive summary of the changes your Pull Request introduces.

The Software Development section of
the Contributing Guidelines may be useful if you are unfamiliar with linting, pre-commit, docstrings and testing.

NB - This header should be replaced with the description but please complete the below checklist or a short
description of why a particular item is not relevant.


Before submitting a Pull Request please check the following.

  • Existing tests pass.
  • Documentation has been updated and builds. Remember to update as required...
    • docs/configuration.md
    • docs/usage.md
    • docs/data_dictionary.md
    • docs/advanced.md and new pages it should link to.
  • Pre-commit checks pass.
  • New functions/methods have typehints and docstrings.
  • New functions/methods have tests which check the intended behaviour is correct.

Optional

topostats/default_config.yaml

If adding options to topostats/default_config.yaml please ensure.

  • There is a comment adjacent to the option explaining what it is and the valid values.
  • A check is made in topostats/validation.py to ensure entries are valid.
  • Add the option to the relevant sub-parser in topostats/entry_point.py.

SylviaWhittle and others added 21 commits November 20, 2024 14:35
@SylviaWhittle
Collaborator Author

Proposed solution to the data frame issue

|----------|-----------|---------------|-------|----------|----------------------|---------------------------|
| image    | direction | class         | grain | molecule | ... <grainstats> ... | ... <dnatracingstats> ... |
|----------|-----------|---------------|-------|----------|----------------------|---------------------------|
| mini.spm | above     | dna_only      | 0     | 0        | ... <stats> ...      | NONE                      |
| mini.spm | above     | dna_only      | 0     | 1        | ... <stats> ...      | NONE                      |
| mini.spm | above     | dna_only      | 1     | 0        | ... <stats> ...      | NONE                      |
| mini.spm | above     | dna_only      | 1     | 1        | ... <stats> ...      | NONE                      |
| mini.spm | above     | protein_only  | 0     | 0        | ... <stats> ...      | NONE                      |
| mini.spm | above     | protein_only  | 0     | 1        | ... <stats> ...      | NONE                      |
| mini.spm | above     | protein_only  | 1     | 0        | ... <stats> ...      | NONE                      |
| mini.spm | above     | protein_only  | 1     | 1        | ... <stats> ...      | NONE                      |
| mini.spm | above     | combined_mask | 0     | 0        | ... <stats> ...      | ... <stats> ...           |
|----------|-----------|---------------|-------|----------|----------------------|---------------------------|
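
For illustration, a minimal sketch of how the layout above might look in pandas, assuming one row per (image, direction, class, grain, molecule) combination. The stat columns `area` and `contour_length` and all values are hypothetical placeholders, not actual TopoStats output.

```python
import pandas as pd

# Key columns taken from the proposed table above.
KEYS = ["image", "direction", "class", "grain", "molecule"]

rows = []
for class_name in ("dna_only", "protein_only"):
    for grain in (0, 1):
        for molecule in (0, 1):
            rows.append(
                {
                    "image": "mini.spm",
                    "direction": "above",
                    "class": class_name,
                    "grain": grain,
                    "molecule": molecule,
                    "area": 1.0,  # placeholder grain statistic
                    "contour_length": None,  # no tracing stats for single-class rows
                }
            )
# Only the combined mask row carries (placeholder) tracing statistics.
rows.append(
    {
        "image": "mini.spm",
        "direction": "above",
        "class": "combined_mask",
        "grain": 0,
        "molecule": 0,
        "area": 2.0,
        "contour_length": 35.0,
    }
)

df = pd.DataFrame(rows).set_index(KEYS)

# The composite index keeps repeated grain/molecule numbers from clashing.
assert df.index.is_unique

# End users can pull out a single class without touching the other rows.
dna_only = df.xs("dna_only", level="class")
print(dna_only)
```

A single long-format frame like this keeps everything in one file; the trade-off is that end users need to filter on `class` before summarising.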

@ns-rse
Collaborator

ns-rse commented Dec 3, 2024

It's that or split into separate files.

I'm ambivalent about the preferred solution as I don't use the output, but consideration should be given to end users. Whilst data management, manipulation, summarisation and plotting are, in my view, core skills for researchers these days, experience levels vary widely and I don't know what would be easiest.

@ns-rse
Collaborator

ns-rse commented Dec 10, 2024

Are we aiming to include this refactoring in v2.3.0 release?

@@ -65,6 +65,7 @@ grainstats:
  extract_height_profile: true # Extract height profiles along maximum feret of molecules
disordered_tracing:
  run: true # Options : true, false
  class_index: 1 # The class index to trace. This is the class index of the grains.
  min_skeleton_size: 10 # Minimum number of pixels in a skeleton for it to be retained.
  pad_width: 1 # Pixels to pad grains by when tracing
Collaborator

The pad width could be added into grains? Or (see later disordered tracing comment)

Collaborator Author

Done!

cropped_images, cropped_masks, bboxs = prep_arrays(image, grains_mask, pad_width)
n_grains = len(cropped_images)
img_base = np.zeros_like(image)
# cropped_images, cropped_masks, bboxs = prep_arrays(image, grains_mask, pad_width)
Collaborator

(continued from default config) or the grains need to be padded here, otherwise the 3x3 convolutions will not work.

@SylviaWhittle
Collaborator Author

> Are we aiming to include this refactoring in v2.3.0 release?

I don't think so, necessarily. If this is ready before we do v2.3.0 then sure, but don't hold the release back for this PR.

@SylviaWhittle
Collaborator Author

We will need to make grain padding an option in grains and require it to be at least 1. We probably want to take this out of the unet config and make it general, because Max's code looks at adjacent pixels (so grains can't touch the edge of the crop).
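
To illustrate why a padding of at least 1 is needed, here is a minimal NumPy sketch. `pad_grain_mask` and `count_neighbours` are hypothetical names used only for this example, not TopoStats functions, and the neighbour count stands in for whatever the tracing code actually does with adjacent pixels.

```python
import numpy as np


def pad_grain_mask(mask: np.ndarray, pad_width: int = 1) -> np.ndarray:
    """Pad a 2D binary grain mask with background, requiring at least one pixel."""
    if pad_width < 1:
        raise ValueError(f"pad_width must be >= 1, got {pad_width}")
    return np.pad(mask, pad_width, mode="constant", constant_values=0)


def count_neighbours(padded: np.ndarray) -> np.ndarray:
    """Count the 8-connected foreground neighbours of every interior pixel."""
    padded = padded.astype(int)
    rows, cols = padded.shape
    counts = np.zeros((rows - 2, cols - 2), dtype=int)
    # Sum the eight shifted views of the 3x3 window around each interior pixel.
    for d_row in (0, 1, 2):
        for d_col in (0, 1, 2):
            if (d_row, d_col) != (1, 1):  # skip the centre pixel itself
                counts += padded[d_row : d_row + rows - 2, d_col : d_col + cols - 2]
    return counts


# A grain that touches every edge of its (unpadded) crop.
mask = np.array([[1, 1], [0, 1]], dtype=bool)
padded = pad_grain_mask(mask, pad_width=1)

# With one pixel of padding, every original pixel has a full 3x3 neighbourhood.
neighbours = count_neighbours(padded)
print(neighbours)
```

Without the padding, the 3x3 window around an edge pixel would run off the array, which is the situation the minimum of 1 is meant to rule out.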

@SylviaWhittle
Collaborator Author

Notes for self for after the break:

This PR focuses on using data classes to hold grains in a per-grain data structure rather than a per-processing-step, whole image + whole image tensor mask structure.

This PR's scope takes it from grains initially generated, to disordered tracing and no further, to keep this PR more manageable.

This PR will enable downstream processing of multi-class grain tensor masks and allow Max's code to trace them, while also getting grain stats out for sub-grains, with results stored neatly in disordered tracing and without any index clashes in the grainstats CSV.

In subsequent PRs, it would be good to try to align the stats for the sub-grains with sections of the molecules identified by Max's tracing code, though they may produce vastly different structures in some circumstances. Use positions potentially to do the mapping (WIP proposal).

Traditional height thresholding to produce multi-class masks will also be required. This will produce "halos", but these can be ignored for now and dealt with later. It allows the lab members to at least check the feasibility of their pipelines.
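
A rough sketch of what height thresholding into a multi-class tensor mask could look like, and where the "halos" come from. The function name `threshold_to_tensor_mask`, the threshold values, and the convention that channel 0 holds background are all assumptions for this example.

```python
import numpy as np


def threshold_to_tensor_mask(image: np.ndarray, thresholds: list[float]) -> np.ndarray:
    """Return a (rows, cols, n_classes + 1) boolean tensor; channel 0 is background."""
    # np.digitize gives 0 below the first threshold, 1 between the first and
    # second thresholds, and so on. Thresholds must be in increasing order.
    labels = np.digitize(image, bins=np.asarray(thresholds))
    n_channels = len(thresholds) + 1
    # One-hot encode the labels along a new last axis.
    return labels[..., np.newaxis] == np.arange(n_channels)


# Toy height image: a tall feature (5.0) whose sloping edge (2.0) sits at an
# intermediate height.
image = np.zeros((5, 5))
image[1:4, 1:4] = 2.0
image[2, 2] = 5.0

tensor = threshold_to_tensor_mask(image, thresholds=[1.0, 4.0])

# The ring of class-1 pixels around the class-2 centre is the "halo": the edge
# of the tall feature falls into the lower height band.
print(tensor[..., 1].astype(int))
```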

Pixel to nanometre scaling and filenames should be kept in the grain crops too.

@SylviaWhittle
Collaborator Author

Things to add to grain crop objects:

  • pixel to nm
  • filename
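
As a rough illustration of the per-grain structure with those two fields included, here is a sketch using a dataclass and a dictionary keyed by grain index. `GrainCropSketch` and all of its field names are hypothetical and are not the classes introduced in this PR.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GrainCropSketch:
    """Hypothetical per-grain container; not the class introduced by this PR."""

    image: np.ndarray  # cropped height data, shape (rows, cols)
    mask: np.ndarray  # cropped tensor mask, shape (rows, cols, classes)
    padding: int  # pixels of padding around the grain
    pixel_to_nm_scaling: float  # nanometres per pixel
    filename: str  # image the grain was cropped from

    def __post_init__(self) -> None:
        if self.padding < 1:
            raise ValueError("padding must be at least 1")
        if self.image.shape != self.mask.shape[:2]:
            raise ValueError("image and mask must share rows and columns")

    def class_area_nm2(self, class_index: int) -> float:
        """Area of one mask class in square nanometres."""
        n_pixels = float(self.mask[..., class_index].sum())
        return n_pixels * self.pixel_to_nm_scaling**2


# Toy crop: a 2x2 grain of class 1 with one pixel of padding on each side.
mask = np.zeros((4, 4, 2), dtype=bool)
mask[1:3, 1:3, 1] = True
mask[..., 0] = ~mask[..., 1]  # assumed background channel

# One dictionary per image, keyed by grain index.
grain_crops: dict[int, GrainCropSketch] = {
    0: GrainCropSketch(
        image=np.ones((4, 4)),
        mask=mask,
        padding=1,
        pixel_to_nm_scaling=0.5,
        filename="mini.spm",
    )
}

area = grain_crops[0].class_area_nm2(class_index=1)
print(area)
```

Carrying the scaling and filename on each crop means per-grain statistics can be computed and labelled without reaching back to the whole-image objects.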

@SylviaWhittle
Collaborator Author

SylviaWhittle commented Dec 20, 2024

How grains are handled now
image

Note that disordered tracing still produces the same output. This will need updating later on.

3 participants