[ GPU ] GPU Kernel creation time #2723

EunjuYang · 2024-08-29T09:14:39Z

For now, it seems clCreateKernel is called whenever any type of _cl wrapper function is invoked.

For example,

FullyConnectedLayer::forwarding -> dotCl -> dot_cl -> clCreateKernel

Given how often the _cl functions are called, this could slow down performance. (Of course, duplicate registration is already avoided, but that may not be enough to speed things up.)
Since NNTrainer already has a compilation phase, what about moving the kernel registration process into the compilation stage?
During the compilation step, we can identify which computational units each layer uses and generate the corresponding kernels accordingly.
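To make the proposal concrete, here is a minimal sketch of compile-time registration. It is not NNTrainer code: `FakeKernel`, `KernelRegistry`, `Layer::requiredKernels`, `FullyConnectedLike`, and `compileModel` are hypothetical names, the kernel name strings are only illustrative, and a counter stands in for the real `clCreateKernel` call so the example runs without an OpenCL device.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an OpenCL kernel handle; a real implementation
// would wrap the cl_kernel returned by clCreateKernel.
struct FakeKernel {
  std::string name;
};

// Hypothetical registry that counts how many kernels were actually created.
class KernelRegistry {
public:
  // Creates the kernel only if it has not been registered yet.
  void registerKernel(const std::string &name) {
    if (kernels_.find(name) == kernels_.end()) {
      kernels_[name] = std::make_shared<FakeKernel>(FakeKernel{name});
      ++created_;
    }
  }

  // Lookup only: never creates a kernel, returns nullptr if it is missing.
  std::shared_ptr<FakeKernel> get(const std::string &name) const {
    auto it = kernels_.find(name);
    return it == kernels_.end() ? nullptr : it->second;
  }

  int created() const { return created_; }

private:
  std::unordered_map<std::string, std::shared_ptr<FakeKernel>> kernels_;
  int created_ = 0;
};

// Hypothetical layer interface: each layer reports the kernels it needs.
struct Layer {
  virtual ~Layer() = default;
  virtual std::vector<std::string> requiredKernels() const = 0;
  virtual bool forwarding(const KernelRegistry &reg) const = 0;
};

struct FullyConnectedLike : Layer {
  std::vector<std::string> requiredKernels() const override {
    return {"dot_cl", "sgemv_cl"}; // illustrative kernel names
  }
  // The registry is const here, so forwarding can only look kernels up.
  bool forwarding(const KernelRegistry &reg) const override {
    return reg.get("dot_cl") != nullptr && reg.get("sgemv_cl") != nullptr;
  }
};

// Compile step: walk the layers once and register every kernel up front.
void compileModel(const std::vector<std::unique_ptr<Layer>> &layers,
                  KernelRegistry &reg) {
  for (const auto &layer : layers)
    for (const auto &name : layer->requiredKernels())
      reg.registerKernel(name);
}
```

With this split, the forwarding path receives a read-only registry, so any kernel creation left in the hot path would show up as a compile error rather than a runtime cost.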


taos-ci · 2024-08-29T09:14:42Z

cibot: Thank you for posting issue #2723. The person in charge will reply soon.

jijoongmoon · 2024-09-04T01:08:13Z

I'm thinking of changing the structure of the current cl context implementation.
I think we need a cl kernel context, held by cl_context. When a cl layer is created, the cl kernel context is set on it.
Then it might be possible to register custom cl kernels in the finalize function of the cl layer.
Also, the Tensor kernels developed by us (like the gemm cl kernels) could be initialized when the kernel context is created, as the default kernels we provide.

s-debadri · 2024-09-12T04:44:23Z

PR #2732 has been created to address this issue. The plan is as follows:

Added a registerClKernel function to cl_context to register custom OpenCL kernels as well as in-house kernels.
Used a hash map to track created Kernel objects; they are stored in the map as shared_ptr.
Modified sscal to use the above Kernel creation flow, removing its dependency on layer_context.

In progress: Removing layer_context dependency for existing kernels.
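The registration flow in the plan above can be sketched roughly as follows. This is an assumption-laden illustration, not the code from PR #2732: `Kernel`, `ClContextSketch`, `prepareSscal`, and the signature used for `registerClKernel` are hypothetical, the kernel source string is only an example, and no OpenCL program is actually built.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled kernel object.
struct Kernel {
  std::string name;
  std::string source;
};

using SharedKernel = std::shared_ptr<Kernel>;

// Hypothetical context that owns the kernel map.
class ClContextSketch {
public:
  // Assumed signature: returns the existing kernel when the name is already
  // registered, otherwise creates, stores, and returns a new one.
  SharedKernel registerClKernel(const std::string &source,
                                const std::string &name) {
    auto it = kernels_.find(name);
    if (it != kernels_.end())
      return it->second; // already created: reuse the stored object

    // A real implementation would build the program and call clCreateKernel.
    auto kernel = std::make_shared<Kernel>(Kernel{name, source});
    kernels_.emplace(name, kernel);
    return kernel;
  }

  std::size_t size() const { return kernels_.size(); }

private:
  // Hash map keyed by kernel name; shared_ptr lets callers keep a handle
  // while the context remains an owner.
  std::unordered_map<std::string, SharedKernel> kernels_;
};

// Sketch of an sscal-style call site that depends only on the context.
SharedKernel prepareSscal(ClContextSketch &ctx) {
  static const std::string src = R"(
    __kernel void sscal_cl(__global float *X, const float alpha) {
      unsigned int i = get_global_id(0);
      X[i] *= alpha;
    })";
  return ctx.registerClKernel(src, "sscal_cl");
}
```

Calling `prepareSscal` twice on the same context returns the same `shared_ptr`, so repeated calls cost only a hash lookup.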

EunjuYang · 2024-10-21T01:56:53Z

[Suggestion / Need Discussion]

As I understand it, the GPU's ClContext is currently expected to be created together with LayerNode.
What about moving its first creation to AppContext creation time by adding it as a member variable of AppContext?
By doing so, we can further abstract the user API so that a Layer can be created without directly referring to the Cl Layer by name.
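One way to picture this suggestion is the sketch below, in which the application context owns the GPU context and chooses the implementation from an engine argument. All names here (`Engine`, `AppContextSketch`, `ClContextSketch`, `FullyConnected`, `FullyConnectedCl`, `createLayer`) are hypothetical and do not reflect the actual NNTrainer API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

enum class Engine { CPU, GPU };

struct Layer {
  virtual ~Layer() = default;
  virtual std::string impl() const = 0;
};
struct FullyConnected : Layer {
  std::string impl() const override { return "cpu"; }
};
struct FullyConnectedCl : Layer {
  std::string impl() const override { return "cl"; }
};

using Factory = std::function<std::unique_ptr<Layer>()>;

// Looks up a factory by layer type and throws if the type is unknown.
inline std::unique_ptr<Layer> createFrom(const std::map<std::string, Factory> &f,
                                         const std::string &type) {
  auto it = f.find(type);
  if (it == f.end())
    throw std::invalid_argument("unknown layer type: " + type);
  return it->second();
}

// Hypothetical GPU context holding the factories for CL layers.
class ClContextSketch {
public:
  ClContextSketch() {
    factories_["fully_connected"] = []() -> std::unique_ptr<Layer> {
      return std::make_unique<FullyConnectedCl>();
    };
  }
  std::unique_ptr<Layer> create(const std::string &type) const {
    return createFrom(factories_, type);
  }

private:
  std::map<std::string, Factory> factories_;
};

// Hypothetical application context that owns the GPU context as a member,
// so the GPU context exists as soon as the application context does.
class AppContextSketch {
public:
  AppContextSketch() {
    factories_["fully_connected"] = []() -> std::unique_ptr<Layer> {
      return std::make_unique<FullyConnected>();
    };
  }

  // The user names only the layer type; the engine selects the implementation.
  std::unique_ptr<Layer> createLayer(const std::string &type,
                                     Engine engine = Engine::CPU) const {
    return engine == Engine::GPU ? cl_context_.create(type)
                                 : createFrom(factories_, type);
  }

private:
  std::map<std::string, Factory> factories_;
  ClContextSketch cl_context_;
};
```

Under this arrangement a caller would write `createLayer("fully_connected", Engine::GPU)` and never mention the CL layer class directly.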

EunjuYang added the DISCUSSION label Aug 29, 2024

EunjuYang mentioned this issue Sep 12, 2024

[gpu/enhance] Registering OpenCL kernels at cl_context #2732

Merged

EunjuYang mentioned this issue Oct 7, 2024

[GPU/OpenCL] Updated the SwiGLU, Reshape and Concat Layers with latest GPU pipeline changes @open sesame 10/04 17:23 #2745

Closed

EunjuYang mentioned this issue Oct 21, 2024

[ GPU ] move initBlaseClKernels() to registerer #2761

Merged

EunjuYang mentioned this issue Nov 4, 2024

[ GPU/OpenCL ] Split kernel registration from forwarding method #2785

Open

EunjuYang mentioned this issue Nov 25, 2024

[ Wait for #2785 ] [ GPU / RMSNorm layer ] split kernel register from forwarding function #2804

Open
