Add subdevice support to multicore untilize #16193

Open
wants to merge 11 commits into base: main

Conversation

@sraizada-tt sraizada-tt (Contributor) commented Dec 19, 2024

What's changed

Added sub_core_grids as an argument so the multicore version of the op can be run on a specific set of cores.
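
As a quick illustration (not part of this PR's diff), a call through the updated entry point might look like the sketch below. The CoreRange coordinates, the input_tensor variable, and the surrounding setup are made-up placeholders; only the final sub_core_grids argument is new in this change:

// Hypothetical usage sketch: run the multicore untilize on a restricted core grid.
// The 8-core grid below is illustrative; use whichever cores your workload reserves.
CoreRangeSet sub_core_grids(std::set<CoreRange>{CoreRange(CoreCoord(1, 0), CoreCoord(2, 3))});
ttnn::Tensor output = ttnn::untilize(
    input_tensor,
    /*memory_config=*/std::nullopt,
    /*use_multicore=*/true,
    /*use_pack_untilize=*/true,
    /*sub_core_grids=*/sub_core_grids);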

@github-actions github-actions bot (Contributor) left a comment

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

uint32_t block_size_nbytes = input_single_tile_size;

auto cores = corerange_to_cores(sub_core_grids, ncores, true);
auto all_cores = num_cores_to_corerangeset_in_subcoregrids(cores[0], ncores, sub_core_grids, true);
⚠️ clang-diagnostic-error ⚠️
use of undeclared identifier num_cores_to_corerangeset_in_subcoregrids

auto src_buffer = input_tensors.at(0).buffer();
auto dst_buffer = output_tensors.at(0).buffer();
{
auto& runtime_args_by_core = GetRuntimeArgs(program, reader_kernel_id);
⚠️ clang-diagnostic-error ⚠️
use of undeclared identifier reader_kernel_id

}

{
auto& runtime_args_by_core = GetRuntimeArgs(program, writer_kernel_id);
⚠️ clang-diagnostic-error ⚠️
use of undeclared identifier writer_kernel_id

@@ -40,7 +40,8 @@ ttnn::Tensor ExecuteUntilize::invoke(
    const ttnn::Tensor& input_tensor,
    const std::optional<MemoryConfig>& memory_config,
    bool use_multicore,
-   bool use_pack_untilize) {
+   bool use_pack_untilize,
+   const std::optional<CoreRangeSet> sub_core_grids) {
⚠️ performance-unnecessary-value-param ⚠️
the const qualified parameter sub_core_grids is copied for each invocation; consider making it a reference

Suggested change
const std::optional<CoreRangeSet> sub_core_grids) {
const std::optional<CoreRangeSet>& sub_core_grids) {

@@ -16,13 +16,15 @@ struct ExecuteUntilize {
    const ttnn::Tensor& input_tensor,
    const std::optional<MemoryConfig>& memory_config = std::nullopt,
    bool use_multicore = true,
-   bool use_pack_untilize = true);
+   bool use_pack_untilize = true,
+   const std::optional<CoreRangeSet> sub_core_grids = std::nullopt);
⚠️ performance-unnecessary-value-param ⚠️
the const qualified parameter sub_core_grids is copied for each invocation; consider making it a reference

Suggested change
const std::optional<CoreRangeSet> sub_core_grids = std::nullopt);
const std::optional<CoreRangeSet>& sub_core_grids = std::nullopt);

-   bool use_pack_untilize) {
-   return invoke(DefaultQueueId, input_tensor, memory_config, use_multicore, use_pack_untilize);
+   bool use_pack_untilize,
+   const std::optional<CoreRangeSet> sub_core_grids) {
⚠️ performance-unnecessary-value-param ⚠️
the const qualified parameter sub_core_grids is copied for each invocation; consider making it a reference

Suggested change
const std::optional<CoreRangeSet> sub_core_grids) {
const std::optional<CoreRangeSet>& sub_core_grids) {


static ttnn::Tensor invoke(
    const ttnn::Tensor& input_tensor,
    const std::optional<MemoryConfig>& memory_config = std::nullopt,
    bool use_multicore = true,
-   bool use_pack_untilize = true);
+   bool use_pack_untilize = true,
+   const std::optional<CoreRangeSet> sub_core_grids = std::nullopt);
⚠️ performance-unnecessary-value-param ⚠️
the const qualified parameter sub_core_grids is copied for each invocation; consider making it a reference

Suggested change
const std::optional<CoreRangeSet> sub_core_grids = std::nullopt);
const std::optional<CoreRangeSet>& sub_core_grids = std::nullopt);

@@ -29,6 +30,190 @@ uint32_t get_largest_divisor(uint32_t dividend, uint32_t starting_divisor, uint3
return 1;
}

operation::ProgramWithCallbacks untilize_multi_core_parallelize_column_subgrid(
const Tensor& a,
Contributor
is the support here 1:1 with the version without subgrids?

Asking because I'm curious if the validate needs updating.

Contributor Author

The regular version splits the cores into core_range and core_range_cliff, but when I ran it for my shapes, core_range_cliff was empty, so I think it is 1:1.
I think validate is fine.

uint32_t block_size_nbytes = input_single_tile_size;

auto cores = corerange_to_cores(sub_core_grids, ncores, true);
auto all_cores = num_cores_to_corerangeset_in_subcoregrids(cores[0], ncores, sub_core_grids, true);
Contributor

is the only difference here that we're using a different work_split and num_cores_to_corerangeset_in_subcoregrids to get the cores?

Contributor

If so, could you just have one function that if-elses when it's nullopt?

Contributor Author

The multicore version does some extra computation for the number of x and y cores and splits the cores into core_range and core_range_cliff:

auto [ncores, all_cores, core_range, core_range_cliff, nblocks_per_core, nblocks_per_core_cliff] =
    ttnn::split_blocks_for_tilize(CoreCoord(ncores_x, ncores_y), nblocks);
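
For reference, a rough sketch of the single-function approach suggested above, branching on whether sub_core_grids was provided. This is not code from the PR; it reuses the helpers quoted in the snippets, and it assumes ncores, nblocks, ncores_x, and ncores_y are computed earlier in the factory as they are today:

// Hypothetical merge of the two paths: pick the core set based on sub_core_grids.
if (sub_core_grids.has_value()) {
    // Sub-grid path: enumerate the requested cores and build the matching core range set.
    auto cores = corerange_to_cores(sub_core_grids.value(), ncores, true);
    auto all_cores = num_cores_to_corerangeset_in_subcoregrids(cores[0], ncores, sub_core_grids.value(), true);
    // ... create circular buffers and kernels on all_cores; no cliff range in this path
} else {
    // Existing path: split blocks over the full grid, which may leave a cliff range.
    auto [ncores_split, all_cores, core_range, core_range_cliff, nblocks_per_core, nblocks_per_core_cliff] =
        ttnn::split_blocks_for_tilize(CoreCoord(ncores_x, ncores_y), nblocks);
    // ... create circular buffers and kernels on core_range, handling core_range_cliff separately
}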

"nb, nc, nh, nw",
(
# llama shapes
(1, 1, 32, 128 * 1024),
Contributor

No other shapes are needed?

Contributor Author

For our model use case, this is the only shape we need.

@llongTT llongTT (Contributor) left a comment

Could you add a note describing the scenario in which the sub_core_grid version should be used?

@sraizada-tt (Contributor Author)

> Could you add a note describing the scenario in which the sub_core_grid version should be used?

The scenario is very specific to a model use case: we are working on DRAM prefetching on specific cores, so the ops need to be moved to a specific core grid. I don't expect anyone other than us to use this right now.
