Refactor pipeline to use grain crop dictionaries #1022

SylviaWhittle · 2024-11-22T17:19:21Z

This is a draft PR; documentation and tests will be added before the full PR is made.

TopoStats Pull Requests

Please provide a descriptive summary of the changes your Pull Request introduces.

The Software Development section of
the Contributing Guidelines may be useful if you are unfamiliar with linting, pre-commit, docstrings and testing.

NB - This header should be replaced with the description but please complete the below checklist or a short
description of why a particular item is not relevant.

Before submitting a Pull Request please check the following.

Existing tests pass.
Documentation has been updated and builds. Remember to update as required...
- docs/configuration.md
- docs/usage.md
- docs/data_dictionary.md
- docs/advanced.md and new pages it should link to.
Pre-commit checks pass.
New functions/methods have typehints and docstrings.
New functions/methods have tests which check the intended behaviour is correct.

Optional

`topostats/default_config.yaml`

If adding options to topostats/default_config.yaml please ensure.

There is a comment adjacent to the option explaining what it is and the valid values.
A check is made in topostats/validation.py to ensure entries are valid.
Add the option to the relevant sub-parser in topostats/entry_point.py.

SylviaWhittle · 2024-12-03T17:19:26Z

Proposed solution to the data frame issue

| image | direction | class | grain | molecule | ... `<grainstats>` ... | ... `<dnatracingstats>` ... |
| --- | --- | --- | --- | --- | --- | --- |
| mini.spm | above | dna_only | 0 | 0 | ... `<stats>` ... | NONE |
| mini.spm | above | dna_only | 0 | 1 | ... `<stats>` ... | NONE |
| mini.spm | above | dna_only | 1 | 0 | ... `<stats>` ... | NONE |
| mini.spm | above | dna_only | 1 | 1 | ... `<stats>` ... | NONE |
| mini.spm | above | protein_only | 0 | 0 | ... `<stats>` ... | NONE |
| mini.spm | above | protein_only | 0 | 1 | ... `<stats>` ... | NONE |
| mini.spm | above | protein_only | 1 | 0 | ... `<stats>` ... | NONE |
| mini.spm | above | protein_only | 1 | 1 | ... `<stats>` ... | NONE |
| mini.spm | above | combined_mask | 0 | 0 | ... `<stats>` ... | ... `<stats>` ... |
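To make the indexing idea concrete, here is a minimal sketch of rows keyed by image, direction, class, grain and molecule, built as a list of dicts. The column names follow the table above; the helper names `build_rows` and `duplicated_keys` are hypothetical and are not part of TopoStats. It only illustrates why the grain number alone clashes while the full key does not.

```python
from collections import Counter

# Column names taken from the proposed table; treated here as the row key.
INDEX_COLUMNS = ("image", "direction", "class", "grain", "molecule")


def build_rows(image, direction, grain_counts, molecules_per_grain):
    """Build one row (dict) per class / grain / molecule combination."""
    rows = []
    for class_name, n_grains in grain_counts.items():
        for grain in range(n_grains):
            for molecule in range(molecules_per_grain):
                rows.append(
                    {
                        "image": image,
                        "direction": direction,
                        "class": class_name,
                        "grain": grain,
                        "molecule": molecule,
                    }
                )
    return rows


def duplicated_keys(rows, columns=INDEX_COLUMNS):
    """Return key tuples that occur more than once (empty list means no clashes)."""
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = build_rows(
    "mini.spm", "above", {"dna_only": 2, "protein_only": 2}, molecules_per_grain=2
)
full_key_clashes = duplicated_keys(rows)  # full key: unique per row
grain_only_clashes = duplicated_keys(rows, columns=("grain",))  # grain alone: clashes
```

A list of dicts like this can be handed to `pandas.DataFrame(rows)` if a data frame is wanted downstream.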

ns-rse · 2024-12-03T17:26:42Z

It's that or split into separate files.

I'm ambivalent as to the preferred solution as I don't use the output, but consideration should be given to end users. Whilst data management, manipulation, summarisation and plotting are, in my view, core skills for researchers these days, experience levels vary widely and I don't know what would be easiest.

ns-rse · 2024-12-10T12:02:21Z

Are we aiming to include this refactoring in v2.3.0 release?

topostats/processing.py

MaxGamill-Sheffield · 2024-12-10T16:47:40Z

topostats/default_config.yaml

@@ -65,6 +65,7 @@ grainstats:
  extract_height_profile: true # Extract height profiles along maximum feret of molecules
 disordered_tracing:
  run: true # Options : true, false
+  class_index: 1 # The class index to trace. This is the class index of the grains.
  min_skeleton_size: 10 # Minimum number of pixels in a skeleton for it to be retained.
  pad_width: 1 # Pixels to pad grains by when tracing


The pad width could be added into grains? Or (see the later disordered tracing comment)

MaxGamill-Sheffield · 2024-12-10T16:48:26Z

topostats/tracing/disordered_tracing.py

-    cropped_images, cropped_masks, bboxs = prep_arrays(image, grains_mask, pad_width)
-    n_grains = len(cropped_images)
-    img_base = np.zeros_like(image)
+    # cropped_images, cropped_masks, bboxs = prep_arrays(image, grains_mask, pad_width)


(continued from default config) or the grains need to be padded here, otherwise the 3x3 convolutions will not work

SylviaWhittle · 2024-12-11T16:25:44Z

> Are we aiming to include this refactoring in v2.3.0 release?

I don't think so, necessarily. If this comes before we do v2.3.0 then sure, but don't hold the release off for this PR.

SylviaWhittle · 2024-12-11T16:49:09Z

We will need to make grain padding an option in grains and require at least 1. We probably want to take this out of the unet config and make it general. This is because Max's code looks at adjacent pixels (so grains can't touch the edge).
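As a rough illustration of why a padding of at least 1 matters, the sketch below pads a small boolean grain mask and counts 8-connected neighbours with a 3x3 window. The function names `pad_grain_mask` and `count_neighbours` are hypothetical, and this is not the tracing code itself; it only shows that every original pixel gets a full 3x3 neighbourhood once the crop is padded.

```python
import numpy as np


def pad_grain_mask(mask: np.ndarray, padding: int) -> np.ndarray:
    """Pad a 2D boolean grain mask with background on every side."""
    if padding < 1:
        raise ValueError(f"padding must be at least 1, got {padding}")
    return np.pad(mask, pad_width=padding, mode="constant", constant_values=False)


def count_neighbours(padded: np.ndarray) -> np.ndarray:
    """Count the 8-connected foreground neighbours of each interior pixel.

    This is a 3x3 sum minus the centre pixel. Only pixels with a complete
    3x3 window get a value, which is why the border is needed.
    """
    p = padded.astype(int)
    n_rows, n_cols = p.shape
    total = np.zeros((n_rows - 2, n_cols - 2), dtype=int)
    for d_row in range(3):
        for d_col in range(3):
            total += p[d_row : d_row + n_rows - 2, d_col : d_col + n_cols - 2]
    return total - p[1:-1, 1:-1]


# A grain that touches every edge of its unpadded crop.
mask = np.array([[True, True], [False, True]])
padded = pad_grain_mask(mask, padding=1)
neighbours = count_neighbours(padded)  # same shape as the original mask
```

With a padding of exactly 1 the neighbour counts line up with the original, unpadded mask; a larger padding would simply add background rows and columns around it.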

SylviaWhittle · 2024-12-20T15:25:20Z

Notes for self for after the break:

This PR focuses on using data classes to hold grains in a per-grain data structure, rather than a per-processing-step structure of whole image plus whole-image tensor mask.

To keep this PR manageable, its scope runs from the initial generation of grains to disordered tracing and no further.

This PR will enable downstream processing of multi-class grain tensor masks and allow Max's code to trace them, while also getting grain stats out for sub-grains. Results are stored neatly in disordered tracing, avoiding any clashes of index in the grainstats CSV.

In subsequent PRs, it would be good to try to align the stats for the sub-grains with sections of the molecules identified by Max's tracing code, though they may produce vastly different structures in some circumstances. Positions could potentially be used to do the mapping (WIP proposal).

Traditional height thresholding to produce multi-class masks will also be required. This will produce "halos", but these are to be ignored or dealt with later and not worried about for now. It allows the lab members to at least check the feasibility of their pipelines.
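A minimal sketch of what height thresholding into a multi-class tensor mask could look like, assuming ascending thresholds and channel 0 as background. The function name `threshold_to_class_tensor` and the (H, W, C) layout are assumptions for illustration, not the TopoStats implementation. The example profile also shows one plausible source of the "halos": the flanks of a tall feature pass through the lower height band.

```python
import numpy as np


def threshold_to_class_tensor(image: np.ndarray, thresholds: list) -> np.ndarray:
    """Build an (H, W, C) boolean mask tensor from ascending height thresholds.

    Channel 0 is background (below the first threshold). Channel i holds
    pixels at or above thresholds[i - 1] and below thresholds[i], with the
    last channel unbounded above.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # np.digitize gives, per pixel, how many thresholds the height meets or exceeds.
    class_map = np.digitize(image, bins=np.asarray(thresholds))
    n_classes = len(thresholds) + 1
    return np.stack([class_map == c for c in range(n_classes)], axis=-1)


# One-row "image": a tall feature (4.0) with lower flanks (2.0) on background (0.0).
profile = np.array([[0.0, 2.0, 4.0, 2.0, 0.0]])
tensor = threshold_to_class_tensor(profile, thresholds=[1.0, 3.0])
class_map = tensor.argmax(axis=-1)  # the flanks land in class 1 around the class 2 peak
```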

Pixel to nanometre scaling and filenames should be kept in the grain crops too.

SylviaWhittle · 2024-12-20T15:26:08Z

Things to add to grain crop objects:

- pixel to nm
- filename
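For reference, a per-grain container carrying these two extra fields might be sketched as below. The class name `GrainCropSketch`, its field names and its validation rules (square crop, at least two mask classes, padding of at least 1) are assumptions made for illustration and may differ from the `GrainCrop` dataclass in this PR.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GrainCropSketch:
    """Hypothetical per-grain container; names and checks are illustrative only."""

    image: np.ndarray  # (N, N) height crop
    mask: np.ndarray  # (N, N, C) boolean tensor, channel 0 assumed to be background
    padding: int
    bbox: tuple  # (min_row, min_col, max_row, max_col) in the full image
    pixel_to_nm_scaling: float
    filename: str

    def __post_init__(self) -> None:
        # Validate on construction so bad crops fail early.
        if self.image.ndim != 2 or self.image.shape[0] != self.image.shape[1]:
            raise ValueError(f"image must be a square 2D array, got {self.image.shape}")
        if self.mask.ndim != 3 or self.mask.shape[:2] != self.image.shape:
            raise ValueError(f"mask must be (N, N, C) matching image, got {self.mask.shape}")
        if self.mask.shape[2] < 2:
            raise ValueError("mask needs at least a background and one grain class")
        if self.padding < 1:
            raise ValueError("padding must be at least 1")
        if self.pixel_to_nm_scaling <= 0:
            raise ValueError("pixel_to_nm_scaling must be positive")

    def class_area_nm2(self, class_index: int) -> float:
        """Area of one mask class in square nanometres."""
        n_pixels = int(self.mask[:, :, class_index].sum())
        return n_pixels * self.pixel_to_nm_scaling**2


image = np.zeros((4, 4))
mask = np.zeros((4, 4, 2), dtype=bool)
mask[1:3, 1:3, 1] = True  # a 2x2 grain in class 1
mask[:, :, 0] = ~mask[:, :, 1]  # background is everything else
crop = GrainCropSketch(image, mask, 1, (0, 0, 4, 4), 0.5, "mini.spm")
area = crop.class_area_nm2(1)
```

Keeping the scaling and filename on each crop means per-grain statistics can be computed and labelled without passing the whole image's metadata alongside.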

SylviaWhittle · 2024-12-20T15:38:47Z

How grains are handled now

Note that disordered tracing still produces the same output. This will need updating later on.

SylviaWhittle and others added 21 commits November 20, 2024 14:35

WIP: Scope out refactor for grains.py

58757ad

WIP: Begin grains > grainstats pipeline overhaul. Outline data structures.

f49b1e6

WIP: Scope out changes to GrainStats.calculate_stats to allow for multi class & subgrains

4334027

WIP: Initial proposal for grainstats using grain dictionary refactor

d106b0f

Add function: graincrops_merge_classes

1acee67

Add function: graincrops_update_background_class

a7b9075

Update: extract_grains_from_full_image now works in theory, untested

913e3b5

Fix: extract_grains_from_full_image_mask: allocating region to empty mask required bool. Tested in debugger.

843a606

WIP: Switch vetting, merging and update background to work on dicts of GrainCrops

535bea8

WIP: Update vet_grains to take / return dicts of GrainCrops

36c07e2

WIP: Update grainstats handling of dataframe to use list of dicts for each row

d1e4c1d

Fix: validate_full_mask_tensor_shape: Require len(shape) == 3, shape[2] >= 2, shape[1]==shape[2]

ace46ea

Edit: find_grains now stores grains in self.image_grain_crops: ImageGrainCrops. Locally debugged working

e2824f9

WIP: Handle ImageGrainCrops between run_grains and run_grainstats

f21eccd

WIP: Graintstats handles ImageGrainCrops

63d0003

Fix: grainstats: process scan no longer needing grain plots returned

02ecf43

WIP: Begin grains > disordered_tracing pipeline overhaul

651e89c

Merge branch 'main' into SylviaWhittle/grain_restructure

a0370cc

[pre-commit.ci] Fixing issues with pre-commit

a03524e

WIP: grains > disorderd_tracing pipeline | fix typing and remove whole image plotting

8d215fd

[pre-commit.ci] Fixing issues with pre-commit

3a18880

Add: class index to disordered tracing config

7781c86

[WIP] Fix: Attempt to fix grain_number double index issue

80b711c

MaxGamill-Sheffield reviewed Dec 10, 2024


remove raising error on empty direction

4c6f0f3

Co-authored-by: Neil Shephard

SylviaWhittle and others added 11 commits December 17, 2024 10:11

Add padding to the grains section of config and remove from unet section.

e061001

[WIP]: Attempt to fix data passing errors between processing and disordered tracing

8855707

Attempt merge with main (grains.py taken from branch, disorderd.py taken from main)

9f59678

[pre-commit.ci] Fixing issues with pre-commit

6547c59

Add post init script to validate input data to GrainCrop dataclass

8030c8e

Fix bug: wrong (old) graincrops returned from unet masking

b2ec80d

Fix bug: wrong bbox used in construction of GrainCrop dataclass causing non square bboxes

be9f663

Fix: padding wrongly subtracted in nodestats and ordered_tracing

fc938fb

[pre-commit.ci] Fixing issues with pre-commit

ba08b36

Add: image grain crops to topostats object and fix process scan both regtest

8da84e2

Add more rigorous GrainCrop setters and getters

cec5301

[pre-commit.ci] Fixing issues with pre-commit

7f75723
