
[Bug Report] Signed/unsigned arithmetic bugs in reshape ops #16104

Open
mrakitaTT opened this issue Dec 17, 2024 · 0 comments · May be fixed by #16253
mrakitaTT commented Dec 17, 2024

Describe the bug
Shape and LegacyShape use `std::size_t` as the return type of `rank()`. In several places in tt-metal/ttnn/cpp/ttnn/operations/data_movement/reshape_view/reshape.cpp there are statements like `for (int i = 0; i < tensor_shape.rank() - 2; i++)`, where the mixed signed/unsigned arithmetic can underflow for small ranks (here, when the rank is 1, `rank() - 2` wraps around to a huge unsigned value instead of evaluating to -1).

To Reproduce
Steps to reproduce the behavior:

  1. Run the reshape op in TILE layout for this shape: `tensor<128x1xf32> -> tensor<128xf32>`
  2. The underflow causes the for loop to continue past the shape rank, which leads to the error: `Index is out of bounds for the rank, should be between 0 and 0 however is 1`
@mrakitaTT mrakitaTT added the bug Something isn't working label Dec 17, 2024
@mrakitaTT mrakitaTT self-assigned this Dec 17, 2024
mrakitaTT added a commit to tenstorrent/tt-mlir that referenced this issue Dec 20, 2024
This PR adds TTNN workarounds for these Metal issues:
- tenstorrent/tt-metal#13361 - By decomposing
`reduce(keepDim=false)` into `reduce(keepDim=true) + reshape`
- tenstorrent/tt-metal#16118 - By dropping the
dimensions argument when all dims are being reduced

As part of this work I've also:
- Enabled conversion of `stablehlo.reduce` op with multiple reduce
dimensions
- Added reduce ops verifiers in TTIR
- Added a separate function in TTNNWorkarounds to run rewrite patterns
for decomposition and layout workarounds
- Added lots of unit tests for reduce ops to cover conversions and
verifiers
- Added lots of silicon tests for reduce ops

Opened issue #1624 on
myself to revert these workarounds once the Metal issues are fixed.

Closes #805, #848

After implementing these workarounds and running tests, I encountered
[another Metal
issue](tenstorrent/tt-metal#16104), this time
in the `reshape` op. I've debugged it and have a local fix; I will send
a PR to fix it in the Metal repo, as confirmed with the reshape op
owners. I've opened issue
#1640 to enable Reduce ops
silicon tests after this fix is uplifted.

Another issue I encountered while working on this: after the workaround
pass decompositions, if we change the shapes of the ops' tensors, their
layouts need to change too, but the layout pass runs before the
workaround pass. I've managed to solve this by reusing the layout of the
input tensor, but I'm not sure that's a good solution; we may need to
repeat some of the layout logic after the workaround decompositions. FYI
@sdjordjevicTT

Here is the example TTNN IR before the workarounds:
```
%3 = "ttnn.sum"(%2) <{dim_arg = [0: i32, 1 : i32, 2: i32], keep_dim = false}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout2>
```

and after the workarounds:
```
%3 = "ttnn.sum"(%2) <{keep_dim = true}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1x1x1xf32, #ttnn_layout2>
%4 = "ttnn.reshape"(%3) <{shape = [1 : i32]}> : (tensor<1x1x1xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout3>
```