Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifies the squaring bbox and tweaks PIL input shapes #1030

Merged
merged 7 commits into from
Dec 11, 2024

Conversation

MaxGamill-Sheffield
Copy link
Collaborator

Closes #1029

Very quick one as I had to do this for the topology paper which modifies how grains are cropped upon entry to the unet - turns out if these were too large, they weren't properly handled by the cropping and thus forced a grain failure.

Also quickly modified the PIL input shapes as sequential steps in the unet processing gives weird values (x, y) vs (y, x) and so I swapped these and now my images work well. I assume this wasn't a problem before as all images were square anyway.

Have updated tests too :)

Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have started looking at this but not finished, will complete review tomorrow.

pytest.param(1, 6, 7, 8, (10, 10), (1, 3, 7, 9), id="constrained right"),
pytest.param(1, 1, 2, 6, (10, 10), (0, 1, 5, 6), id="constrained top"),
pytest.param(6, 1, 8, 7, (10, 10), (3, 1, 9, 7), id="constrained bottom"),
pytest.param(117, 20, 521, 603, (608, 608), (24, 20, 607, 603), id="constrained top and bottom"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

@ns-rse ns-rse Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I've tried looking at this in detail I find the dimensions quite hard to think about, are there smaller image_shape and co-ordinates that would be constrained top and bottom?

I think if image_shape is always square, which isn't necessarily the case since I recall @SylviaWhittle writing code to handle none-square images, then we'd only ever see a constraint on one side. Something that would be constrained top and bottom would be a wide rectangular bounding box within a wide rectangular image.

EDIT: I added just such a set of test parameters to the above mentioned commit and marked it as pytest.mark.xfail() with a reason explaining why. If cherry picking it will be included.

@ns-rse ns-rse changed the title Modifies the squaring bbox and Modifies the squaring bbox and tweaks PIL input shapes Dec 3, 2024
Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some broad comments as I don't think I saw/reviewed this unet code when it was originally added.

Test ID's

These are useful they give us (and our future selves) a marker/indication of what is going on.

Maybe I'm misunderstanding the intent/meaning but I think the id's are perhaps misleading. Taking the second parameterised test (which I can't comment on directly in this PR) it is...

pytest.param(3, 4, 8, 8, (10, 10), (3, 4, 8, 9), id="free space single min col decrease"),

The minimum column though remains at 4 and its the last co-ordinate which is the max_col which has gone from 8 > 9 here.

Across these I think they should perhaps be...

        pytest.param(3, 4, 8, 8, (10, 10), (3, 4, 8, 9), id="free space single max col increase"),
        pytest.param(4, 3, 8, 8, (10, 10), (4, 3, 9, 8), id="free space single max row increase"),
        pytest.param(4, 4, 7, 8, (10, 10), (4, 4, 8, 8), id="free space single max row increase"),
        pytest.param(4, 4, 8, 7, (10, 10), (4, 4, 8, 8), id="free space single max col increase"),
        pytest.param(4, 2, 8, 8, (10, 10), (3, 2, 9, 8), id="free space double min row decrease, max row increase"),
        pytest.param(2, 4, 8, 8, (10, 10), (2, 3, 8, 9), id="free space double min col decrease, max col increase"),
        pytest.param(4, 4, 8, 6, (10, 10), (4, 3, 8, 7), id="free space double min col decrease, max col increase"),
        pytest.param(4, 4, 6, 8, (10, 10), (3, 4, 7, 8), id="free space double min row decrease, max row increase"),

Additional assertions

The tests currently assert that the returned co-ordinates match pre-defined variables, but I think we can be a bit more savvy here and add in the following checks (can't make a direct suggestion on the code as its outside the scope of changes for the PR). The first of actually checks that the returned co-ordinates give a square...

    assert (result[2] - result[0]) == (result[3] - result[1])

We can then check that each of the co-ordinates is within the image_shape with...

    assert result[0] >= 0
    assert result[1] >= 0
    assert result[2] <= image_shape[0]
    assert result[3] <= image_shape[1]

This almost makes the test as to whether the result == expected_indices redundant but no harm in keeping it in.

Is it always needed?

This function is called on line 644 of grains.py and it is called unconditionally...

            # Make the bounding box square within the confines of the image
            bounding_box = make_bounding_box_square(
                crop_min_row=bounding_box[0],
                crop_min_col=bounding_box[1],
                crop_max_row=bounding_box[2],
                crop_max_col=bounding_box[3],
                image_shape=(image.shape[0], image.shape[1]),
            )

...and therefore on every single bounding box. I wouldn't be surprised if it is rare to see a square bounding box and so they almost always need squaring but the logic of testing whether it is square could be used to check whether all of the grains need to be checked by this function. That can be done with...

            if (bounding_box[2] - bounding_box[0]) != (bounding_box[3] - bounding_box[1]):
                bounding_box = make_bounding_box_square(
                    crop_min_row=bounding_box[0],
                    crop_min_col=bounding_box[1],
                    crop_max_row=bounding_box[2],
                    crop_max_col=bounding_box[3],
                    image_shape=(image.shape[0], image.shape[1]),

This led me to think about the logical checks on how the box is squared because there are two checks made the first to see if the height > width and then if the width > height. So if we only call this when we have non-square boxes the elif width > height can simply be else as the existing else (which returns a square bounding box) isn't required as we wouldn't be calling this function with a square bounding box (removes five lines of code!).

But there are still a lot of if: ... else:... statements being checked and lots of calculations so I wondered if there might be another way of doing this and I realised there is.

The broad algorithm is to...

  1. Find the mid-point of the existing bounding box and its largest radius.
  2. Expand in each direction from the mid-point by the radius (taking floor for minimum and ceilings for maximum).
  3. Adjust to the image boundary if this extends beyond the edges.

This doesn't always return a square bounding box though (the tests failed, which is why tests are so useful to have in place and the proposed additional assert to check the height == width would capture this nicely).

These changes are on branch ns-rse/unet-crop-alternative at commit d2187cb3215ca52fd5439de8ad8eae9c1cb5c02e. You can check them out at...

git switch main
git pull
git checkout d2187cb3215ca52fd5439de8ad8eae9c1cb5c02e

A quick search threw up this thread which has some simpler logic and I've had a got at coding it up on the same branch ns-rse/unet-crop-alternative (it's at HEAD so you can just git switch ns-rse/unet-crop-alternative to look at the changes and test). It too isn't quite right though as some of the tests still fail. and I've spent too long on this.

Why go through all of this? Its the essence of reviewing code.I like to keep code simple and avoid unnecessary calls if possible as at some point the problem will arise that "Its not fast enough" so if we can at the very least avoid making a function call if we don't need to that is useful (although I'm not sure perfectly square boxes will turn up that often). The proposed change is I think a bit easier to reason about there are less conditional clauses to work through in the various stages, its just the maths that need reasoning about. This might make the code easier to understand in the future.

Don't need to switch to these alternative solutions (especially as neither are complete and are still works in progress!) but I would suggest making a couple of changes at the very least...

  1. Only calling make_bounding_box_square() from grains.py if it isn't already and in turn remove the current else: and make the elif: the else:.
  2. Adding the suggested assert additions to the tests.

You can, I think, get the change to just the tests/test_unet_masking.py and/or topostats/grains.py by using...

git show d2187cb3215ca52fd5439de8ad8eae9c1cb5c02e -- tests/test_unet_masking.py | git apply --index -

Which shows the changes made to that file then pipes them into git apply --index - which will apply it to the current branch.

Repeat for grains.py.

You then need to make a git commit to commit those changes on your branch.

pytest.param(1, 6, 7, 8, (10, 10), (1, 3, 7, 9), id="constrained right"),
pytest.param(1, 1, 2, 6, (10, 10), (0, 1, 5, 6), id="constrained top"),
pytest.param(6, 1, 8, 7, (10, 10), (3, 1, 9, 7), id="constrained bottom"),
pytest.param(117, 20, 521, 603, (608, 608), (24, 20, 607, 603), id="constrained top and bottom"),
Copy link
Collaborator

@ns-rse ns-rse Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I've tried looking at this in detail I find the dimensions quite hard to think about, are there smaller image_shape and co-ordinates that would be constrained top and bottom?

I think if image_shape is always square, which isn't necessarily the case since I recall @SylviaWhittle writing code to handle none-square images, then we'd only ever see a constraint on one side. Something that would be constrained top and bottom would be a wide rectangular bounding box within a wide rectangular image.

EDIT: I added just such a set of test parameters to the above mentioned commit and marked it as pytest.mark.xfail() with a reason explaining why. If cherry picking it will be included.

@ns-rse ns-rse added this to the v2.3.0 milestone Dec 10, 2024
@ns-rse ns-rse added the v2.3.0 label Dec 10, 2024
MaxGamill-Sheffield and others added 4 commits December 10, 2024 12:32
- Corrects the `id` of the parameterised tests for `test_make_bounding_box_square()` to reflect the `row`/`col`
  and whether they `increase`/`decrease`.
- Adds a rectangular image with a large rectangular `bbox` that can not be squared and marks the test as a fail.

I recall, but can't find, that we added functionality to handle non-square scans (I think @SylviaWhittle wrote the code
to handle it) and so its possible that instances may arise where the `bbox` can't be squared.

Ideally we should add a `try: ... except:` to handle such cases but for now adding a test which is marked as expected to
fail should serve as an _aide memoire_ that sometimes things won't work.
tests(unet_masking): Correct ids and add expected fail
Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We went through this earlier and agreed its good to go. The test that has been introduced that is expected to fail if a rectangular image with a large bounding box which can not be squared is considered sufficient for now.

Thanks @MaxGamill-Sheffield and @SylviaWhittle

@MaxGamill-Sheffield MaxGamill-Sheffield added this pull request to the merge queue Dec 11, 2024
Merged via the queue into main with commit 9e1fde6 Dec 11, 2024
11 checks passed
@MaxGamill-Sheffield MaxGamill-Sheffield deleted the maxgamill-sheffield/unet-crop-fix branch December 11, 2024 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Large crops using a U-Net aren't cropped correctly
2 participants