-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cb allocation errors for halo and conv2d #16190
Conversation
@@ -462,6 +462,7 @@ def __call__(self, x, move_input_tensor_to_device=True): | |||
ttnn.deallocate(c4_residual) | |||
x = self.upblock2(x, c3_residual) | |||
ttnn.deallocate(c3_residual) | |||
c2_residual = ttnn.to_memory_config(c2_residual, ttnn.L1_MEMORY_CONFIG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the reason for regression on unet perf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to figure out why this is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is, I don't have bandwidth to dig into why is this helping
ttnn/cpp/ttnn/operations/conv/conv2d/prepare_conv2d_weights.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but maybe we should open some follow on issues for the trace assert and the added s2i call.
c6b8fb8
to
a9f2805
Compare
Halo op has a problem where manipulating tt::tt_metal::LegacyShape in compute_output_specs is not doing the expected thing (value is unchanged). Conv2d has issue when act_block_h_override is used. Assert that act_block_h_override in tiles needs be a divisor of per_core_out_matrix_height_ntiles was wrongly disabled. In turn each core would produce more output than than it was allocated and override random data. which is not visible in op unit tests. This change regress unet shallow device perf. 1122 -> 1053. Most of the regressino comes from a workaround to make model work determinically which converts one sharded tensor to l1 interleaved tensor. Root cause is unknow but we did hit this issue before and it went away, not it's back.
a9f2805
to
aa0d30e
Compare
Halo op has a problem where manipulating
tt::tt_metal::LegacyShape in compute_output_specs
is not doing the expected thing (value is unchanged).
Conv2d has issue when act_block_h_override is used. Assert that act_block_h_override in tiles needs
be a divisor of per_core_out_matrix_height_ntiles
was wrongly disabled.
In turn each core would produce more output than
than it was allocated and override random data.
which is not visible in op unit tests.
This change regress unet shallow device perf.
1122 -> 1053.
Most of the regressino comes from a workaround
to make model work determinically which converts
one sharded tensor to l1 interleaved tensor.
Root cause is unknow but we did hit this issue
before and it went away, not it's back.
Checklist