Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cb allocation errors for halo and conv2d #16190

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

pavlejosipovic
Copy link
Contributor

Halo op has a problem where manipulating
tt::tt_metal::LegacyShape in compute_output_specs
is not doing the expected thing (value is unchanged).

Conv2d has issue when act_block_h_override is used. Assert that act_block_h_override in tiles needs
be a divisor of per_core_out_matrix_height_ntiles
was wrongly disabled.
In turn each core would produce more output than
than it was allocated and override random data.
which is not visible in op unit tests.

This change regress unet shallow device perf.
1122 -> 1053.
Most of the regressino comes from a workaround
to make model work determinically which converts
one sharded tensor to l1 interleaved tensor.
Root cause is unknow but we did hit this issue
before and it went away, not it's back.

Checklist

@@ -462,6 +462,7 @@ def __call__(self, x, move_input_tensor_to_device=True):
ttnn.deallocate(c4_residual)
x = self.upblock2(x, c3_residual)
ttnn.deallocate(c3_residual)
c2_residual = ttnn.to_memory_config(c2_residual, ttnn.L1_MEMORY_CONFIG)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the reason for regression on unet perf?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to figure out why this is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is, I don't have bandwidth to dig into why is this helping

Copy link
Contributor

@esmalTT esmalTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but maybe we should open some follow on issues for the trace assert and the added s2i call.

@pavlejosipovic pavlejosipovic force-pushed the pjosipovic/fix_cb_asserts branch 2 times, most recently from c6b8fb8 to a9f2805 Compare January 7, 2025 17:34
Halo op has a problem where manipulating
tt::tt_metal::LegacyShape in compute_output_specs
is not doing the expected thing (value is unchanged).

Conv2d has issue when act_block_h_override is used.
Assert that act_block_h_override in tiles needs
be a divisor of per_core_out_matrix_height_ntiles
was wrongly disabled.
In turn each core would produce more output than
than it was allocated and override random data.
which is not visible in op unit tests.

This change regress unet shallow device perf.
1122 -> 1053.
Most of the regressino comes from a workaround
to make model work determinically which converts
one sharded tensor to l1 interleaved tensor.
Root cause is unknow but we did hit this issue
before and it went away, not it's back.
@pavlejosipovic pavlejosipovic force-pushed the pjosipovic/fix_cb_asserts branch from a9f2805 to aa0d30e Compare January 7, 2025 17:38
@pavlejosipovic pavlejosipovic merged commit 282a7b2 into main Jan 7, 2025
9 checks passed
@pavlejosipovic pavlejosipovic deleted the pjosipovic/fix_cb_asserts branch January 7, 2025 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants