
[ GPU/OpenCL ] Split kernel registration from forwarding method #2785

Open · wants to merge 4 commits into base: main
Conversation

@EunjuYang (Contributor) commented Nov 4, 2024

  • This draft is a suggestion for [ GPU ] GPU Kernel creation time #2723.
  • This draft splits kernel registration from the forwarding function.
  • This draft makes kernelPtr a static member of the layer to avoid redundant kernel registration.
  • This draft contains example updates for concat_cl, reshape_cl, and fc_layer_cl only.
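The described split can be sketched roughly as follows. This is an illustrative C++ sketch only; `ConcatLayerClSketch`, `ClKernel`, and `kernelCount()` are made-up names, not the actual nntrainer API.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for nntrainer's OpenCL kernel wrapper (hypothetical).
struct ClKernel {};
using ClKernelPtr = std::shared_ptr<ClKernel>;

class ConcatLayerClSketch {
public:
  // Registered once when the layer type is added to the context,
  // instead of compiling kernels inside forwarding().
  static bool registerClKernels() {
    if (!layer_kernel_ptrs.empty())
      return true; // already registered; skip redundant registration
    // The real code would compile/register each OpenCL kernel here.
    layer_kernel_ptrs.emplace_back(std::make_shared<ClKernel>());
    return true;
  }

  static size_t kernelCount() { return layer_kernel_ptrs.size(); }

  void forwarding() {
    // forwarding() only uses the pre-registered kernels
    assert(!layer_kernel_ptrs.empty());
  }

private:
  // static member: shared by all instances, so kernels are created once
  inline static std::vector<ClKernelPtr> layer_kernel_ptrs;
};
```

With this shape, calling registerClKernels() a second time returns true without re-registering anything.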

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

@taos-ci (Collaborator) commented Nov 4, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2785. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci (Collaborator) commented Nov 4, 2024

:octocat: cibot: @EunjuYang, nntrainer/layers/cl_layers/concat_cl.h does not include Doxygen tags such as @file @brief @author @bug. You must include the Doxygen tags in the source code. Please refer to a Doxygen manual at http://github.com/nnstreamer/TAOS-CI/blob/main/ci/doc/doxygen-documentation.md

@EunjuYang force-pushed the gpu_layer_refactor branch 2 times, most recently from f43b253 to d48da33, on November 4, 2024 at 11:07
@taos-ci (Collaborator) commented Nov 4, 2024

:octocat: cibot: @EunjuYang, nntrainer/layers/cl_layers/layer_impl_cl.h does not include Doxygen tags such as @file @brief @author @bug. You must include the Doxygen tags in the source code. Please refer to a Doxygen manual at http://github.com/nnstreamer/TAOS-CI/blob/main/ci/doc/doxygen-documentation.md

@taos-ci (Collaborator) left a comment

@EunjuYang, 💯 All CI checkers are successfully verified. Thanks.


@EunjuYang changed the title from [WIP/Draft] [ GPU/OpenCL ] Split kernel registration from forwarding method to [ GPU/OpenCL ] Split kernel registration from forwarding method on Nov 6, 2024

@baek2sm (Contributor) left a comment
Nice work. LGTM!

nntrainer/cl_context.cpp (resolved)
<< "OpenCL Error: Fail to register concat_cl_axis3_fp16 kernel";
layer_kernel_ptrs.emplace_back(kernel_concat_ptr);

return true;
A Contributor left a comment:
Quick question! Can't ConcatLayerCl::registerClKernels() be called twice?
Assuming it is called a second time, would it throw a runtime error or return true?

@EunjuYang (Contributor, Author) commented Nov 18, 2024

I assumed it is only called once in add_default_object, which is called by registerer; the registerer is called once. However, it seems better to check this. I will update it.

ClContext &ClContext::Global() {
  static ClContext instance;

  // initializing commandqueue and context
  bool result = instance.clInit();

  if (!result) {
    ml_loge("cl_context: opencl command queue creation failed");
  }

  /// in g++ there is a bug that hangs up if caller throws,
  /// so registerer is noexcept although it'd better not
  /// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70298
  std::call_once(global_cl_context_init_flag, registerer, std::ref(instance));
  return instance;
}
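The once-only guarantee in the quoted Global() comes from std::call_once. A minimal, self-contained illustration of the same pattern (demo names only, not nntrainer code):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Demo of the std::call_once pattern used in ClContext::Global():
// the callable runs exactly once, even under concurrent entry.
static std::once_flag demo_once_flag;
static int register_count = 0;

void global_like_entry() {
  // Every caller passes through here, but the lambda body executes once.
  std::call_once(demo_once_flag, [] { ++register_count; });
}
```

This is why the registerer, and hence add_default_object, is only reached once per process, regardless of how many threads call Global().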

@EunjuYang (Contributor, Author):

I added a condition to check this in registerClKernels() as well:

if (!layer_kernel_ptrs.empty())

nntrainer/layers/cl_layers/fc_layer_cl.h (resolved)
nntrainer/layers/cl_layers/fc_layer_cl.h (outdated, resolved)
nntrainer/layers/cl_layers/concat_cl.cpp (outdated, resolved)
@EunjuYang added the WIP label Nov 18, 2024
- This commit is a draft.
- This commit splits kernel registration from the forwarding function.
- This is WIP. This commit contains example updates for concat_cl and
fc_layer_cl.

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
- This commit updates reshape_cl.cpp/.h to inherit LayerImplCl.
- This commit implements registerClKernels(), which is called in
context_cl.cpp.
- update fc_layer_cl.h (removing a redundant variable)
- update register_kernels to return true only when all kernels are
successfully registered
- add conditional code to check whether a kernel is already registered

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
- This commit fixes an fp16-related bug in concat_cl.cpp.
- add the `ENABLE_FP16` condition
- update __fp16 to _FP16

Signed-off-by: Eunju Yang <[email protected]>
@EunjuYang (Contributor, Author)

📢 An additional commit to fix the fp16-related issue in concat_cl is included: da18596
🩹 @djeong20's recommendation is applied. Please review it and give me feedback. Thanks.


- clang-format is applied.
- revert Android.mk
- fix a bug in registerClKernels

Signed-off-by: Eunju Yang <[email protected]>

@djeong20 (Contributor) left a comment

Thank you for the hard work! 👍

int dim = int(input1_batch_size * input1_width * input1_height *
              (input1_channels + input2_channels));
int dim = int(input1_batch_size * input1_channels * input1_width *
              (input1_height + input2_height));
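For reference, the two quoted expressions compute the output element count when concatenating along the channel axis versus the height axis. A small sanity-check sketch (helper names are illustrative, not the nntrainer API):

```cpp
// Element count of the result of concatenating two NCHW tensors whose
// non-concat dimensions match; these mirror the quoted dim expressions.
int concat_count_channel(int batch, int c1, int c2, int height, int width) {
  return batch * (c1 + c2) * height * width; // axis 1: channel concat
}

int concat_count_height(int batch, int channels, int h1, int h2, int width) {
  return batch * channels * (h1 + h2) * width; // axis 2: height concat
}
```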

opencl::Buffer inputA(cl_context_ref.context_inst_,
A Collaborator left a comment:
How about changing opencl::Buffer to take the Tensor itself? Then we can set clCreateBuffer depending on the type, and we do not need to consider the type here.

Buffer::Buffer(ContextManager &context_manager, int size_in_bytes,
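The suggestion is to let the buffer derive its byte size (and eventually its clCreateBuffer parameters) from the tensor's data type, instead of each caller computing size_in_bytes. A rough sketch under that reading; `TensorLike`, `BufferSketch`, and the `Tdatatype` values are placeholders, not the real nntrainer types:

```cpp
#include <cstddef>

// Placeholder for the tensor's element type enum (hypothetical values).
enum class Tdatatype { FP16, FP32 };

struct TensorLike {
  Tdatatype dtype;
  std::size_t element_count;
  std::size_t bytes() const {
    // 2 bytes for fp16, 4 bytes for fp32
    return element_count * (dtype == Tdatatype::FP16 ? 2 : 4);
  }
};

struct BufferSketch {
  std::size_t size_in_bytes;
  // existing-style constructor: the caller supplies the byte size
  explicit BufferSketch(std::size_t size) : size_in_bytes(size) {}
  // suggested overload: derive the size from the tensor's data type,
  // so per-type branching disappears from the call sites
  explicit BufferSketch(const TensorLike &t) : size_in_bytes(t.bytes()) {}
};
```

The per-type branching then lives in one place inside the buffer wrapper rather than being repeated at every allocation site.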

5 participants