
[ GPU/OpenCL ] Split kernel registration from forwarding method #2785

Open · wants to merge 4 commits into base: main
Conversation

@EunjuYang (Contributor) commented Nov 4, 2024

  • This draft is a suggestion for [ GPU ] GPU Kernel creation time #2723.
  • This draft splits kernel registration from the forwarding function.
  • This draft makes kernelPtr a static member of the layer to avoid redundant kernel registration.
  • This draft contains example updates for concat_cl, reshape_cl, and fc_layer_cl only.
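The described split can be sketched roughly as follows. This is an illustrative C++ sketch only; `ConcatLayerClSketch`, `ClKernel`, and `kernelCount()` are made-up names, not the actual nntrainer API.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for nntrainer's OpenCL kernel wrapper (hypothetical).
struct ClKernel {};
using ClKernelPtr = std::shared_ptr<ClKernel>;

class ConcatLayerClSketch {
public:
  // Registered once when the layer type is added to the context,
  // instead of compiling kernels inside forwarding().
  static bool registerClKernels() {
    if (!layer_kernel_ptrs.empty())
      return true; // already registered; skip redundant registration
    // The real code would compile/register each OpenCL kernel here.
    layer_kernel_ptrs.emplace_back(std::make_shared<ClKernel>());
    return true;
  }

  static size_t kernelCount() { return layer_kernel_ptrs.size(); }

  void forwarding() {
    // forwarding() only uses the pre-registered kernels
    assert(!layer_kernel_ptrs.empty());
  }

private:
  // static member: shared by all instances, so kernels are created once
  inline static std::vector<ClKernelPtr> layer_kernel_ptrs;
};
```

With this shape, calling registerClKernels() a second time returns true without re-registering anything.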

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

@taos-ci (Collaborator) commented Nov 4, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2785. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci (Collaborator) commented Nov 4, 2024

:octocat: cibot: @EunjuYang, nntrainer/layers/cl_layers/concat_cl.h does not include Doxygen tags such as @file @brief @author @bug. You must include the Doxygen tags in the source code. Please refer to a Doxygen manual at http://github.com/nnstreamer/TAOS-CI/blob/main/ci/doc/doxygen-documentation.md

@EunjuYang force-pushed the gpu_layer_refactor branch 2 times, most recently from f43b253 to d48da33, on November 4, 2024 at 11:07
@taos-ci (Collaborator) commented Nov 4, 2024

:octocat: cibot: @EunjuYang, nntrainer/layers/cl_layers/layer_impl_cl.h does not include Doxygen tags such as @file @brief @author @bug. You must include the Doxygen tags in the source code. Please refer to a Doxygen manual at http://github.com/nnstreamer/TAOS-CI/blob/main/ci/doc/doxygen-documentation.md

@taos-ci (Collaborator) left a comment

@EunjuYang, 💯 All CI checkers are successfully verified. Thanks.


@EunjuYang changed the title from [WIP/Draft] [ GPU/OpenCL ] Split kernel registration from forwarding method to [ GPU/OpenCL ] Split kernel registration from forwarding method on Nov 6, 2024

@baek2sm (Contributor) left a comment
Nice work. LGTM!

nntrainer/cl_context.cpp (resolved)
<< "OpenCL Error: Fail to register concat_cl_axis3_fp16 kernel";
layer_kernel_ptrs.emplace_back(kernel_concat_ptr);

return true;
A Contributor left a comment:
Quick question! Can't ConcatLayerCl::registerClKernels() be called twice?
Assuming it is called a second time, would it throw a runtime error or return true?

@EunjuYang (Contributor, Author) commented Nov 18, 2024

I assumed it is only called once in add_default_object, which is called by registerer; the registerer is called once. However, it seems better to check this. I will update it.

ClContext &ClContext::Global() {
  static ClContext instance;

  // initializing commandqueue and context
  bool result = instance.clInit();

  if (!result) {
    ml_loge("cl_context: opencl command queue creation failed");
  }

  /// in g++ there is a bug that hangs up if caller throws,
  /// so registerer is noexcept although it'd better not
  /// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70298
  std::call_once(global_cl_context_init_flag, registerer, std::ref(instance));
  return instance;
}
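The once-only guarantee in the quoted Global() comes from std::call_once. A minimal, self-contained illustration of the same pattern (demo names only, not nntrainer code):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Demo of the std::call_once pattern used in ClContext::Global():
// the callable runs exactly once, even under concurrent entry.
static std::once_flag demo_once_flag;
static int register_count = 0;

void global_like_entry() {
  // Every caller passes through here, but the lambda body executes once.
  std::call_once(demo_once_flag, [] { ++register_count; });
}
```

This is why the registerer, and hence add_default_object, is only reached once per process, regardless of how many threads call Global().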

@EunjuYang (Contributor, Author):

I added a condition to check this in registerClKernels() as well:

if (!layer_kernel_ptrs.empty())

nntrainer/layers/cl_layers/fc_layer_cl.h (resolved)
nntrainer/layers/cl_layers/fc_layer_cl.h (outdated, resolved)
nntrainer/layers/cl_layers/concat_cl.cpp (outdated, resolved)
@EunjuYang added the WIP label Nov 18, 2024
- This commit is a draft.
- This commit splits kernel registration from the forwarding function.
- This is WIP. This commit contains example updates for concat_cl and
fc_layer_cl.

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
- This commit updates reshape_cl.cpp/.h to inherit LayerImplCl.
- This commit implements registerClKernels(), which is called in
context_cl.cpp.
- update fc_layer_cl.h (removing a redundant variable)
- update register_kernels to return true only when all kernels are
successfully registered
- add conditional code to check whether a kernel is already registered

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
- This commit fixes an fp16-related bug in concat_cl.cpp.
- add the `ENABLE_FP16` condition
- update __fp16 to _FP16

Signed-off-by: Eunju Yang <[email protected]>
@EunjuYang (Contributor, Author)

📢 An additional commit to fix the fp16-related issue in concat_cl is included: da18596
🩹 @djeong20's recommendation is applied. Please review it and give me feedback. Thanks.


- clang-format is applied.
- revert Android.mk
- fix a bug in registerClKernels

Signed-off-by: Eunju Yang <[email protected]>

@djeong20 (Contributor) left a comment

Thank you for the hard work! 👍

int dim = int(input1_batch_size * input1_width * input1_height *
              (input1_channels + input2_channels));
int dim = int(input1_batch_size * input1_channels * input1_width *
              (input1_height + input2_height));
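For reference, the two quoted expressions compute the output element count when concatenating along the channel axis versus the height axis. A small sanity-check sketch (helper names are illustrative, not the nntrainer API):

```cpp
// Element count of the result of concatenating two NCHW tensors whose
// non-concat dimensions match; these mirror the quoted dim expressions.
int concat_count_channel(int batch, int c1, int c2, int height, int width) {
  return batch * (c1 + c2) * height * width; // axis 1: channel concat
}

int concat_count_height(int batch, int channels, int h1, int h2, int width) {
  return batch * channels * (h1 + h2) * width; // axis 2: height concat
}
```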

opencl::Buffer inputA(cl_context_ref.context_inst_,
A Collaborator left a comment:
How about changing opencl::Buffer to take the Tensor itself? Then we can set clCreateBuffer depending on the type, and we do not need to consider the type here.

Buffer::Buffer(ContextManager &context_manager, int size_in_bytes,
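The suggestion is to let the buffer derive its byte size (and eventually its clCreateBuffer parameters) from the tensor's data type, instead of each caller computing size_in_bytes. A rough sketch under that reading; `TensorLike`, `BufferSketch`, and the `Tdatatype` values are placeholders, not the real nntrainer types:

```cpp
#include <cstddef>

// Placeholder for the tensor's element type enum (hypothetical values).
enum class Tdatatype { FP16, FP32 };

struct TensorLike {
  Tdatatype dtype;
  std::size_t element_count;
  std::size_t bytes() const {
    // 2 bytes for fp16, 4 bytes for fp32
    return element_count * (dtype == Tdatatype::FP16 ? 2 : 4);
  }
};

struct BufferSketch {
  std::size_t size_in_bytes;
  // existing-style constructor: the caller supplies the byte size
  explicit BufferSketch(std::size_t size) : size_in_bytes(size) {}
  // suggested overload: derive the size from the tensor's data type,
  // so per-type branching disappears from the call sites
  explicit BufferSketch(const TensorLike &t) : size_in_bytes(t.bytes()) {}
};
```

The per-type branching then lives in one place inside the buffer wrapper rather than being repeated at every allocation site.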

5 participants