This feature of PoCL makes it possible to offload to multiple servers; see pocl/pocl#1621 (comment). That could be an interesting approach to distributing code, although I fear that our current SPMD-style kernels are written with local GPUs in mind and thus won't scale well to a distributed execution environment.
(I know OpenCL.jl isn't the best place to keep track of this, but it's close enough, and issues on pocl_jll.jl have no visibility.)
There's also a sub-buffer migration manager in PoCL-R. Data decomposition could be done with sub-buffers of the main buffer, and the runtime then manages the sub-buffer transfers directly between nodes (this is a rather fresh feature that I'd like to see tested more). So distributing the workload using only OpenCL would minimally require generating multiple SPMD kernel commands, one for each local/remote GPU/CPU, instead of a single one, and splitting the data with sub-buffer offsets. And of course launching a few more kernel commands than strictly necessary, just to hide the data transfer latencies.
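To make the decomposition above a bit more concrete, here is a minimal sketch of just the bookkeeping: how one 1-D launch might be cut into per-device chunks with matching sub-buffer regions. It does not call OpenCL or PoCL-R at all. The names `Chunk` and `split_work`, the round-robin device assignment, and the default of two chunks per device are assumptions made for illustration, as is the 128-byte default alignment. In real code the alignment would come from querying `CL_DEVICE_MEM_BASE_ADDR_ALIGN` (reported in bits), each region would presumably be passed to `clCreateSubBuffer` with `CL_BUFFER_CREATE_TYPE_REGION`, and each chunk would become its own `clEnqueueNDRangeKernel` command.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class Chunk:
    device: int         # index of the local/remote device that runs this chunk
    origin_bytes: int   # sub-buffer origin inside the parent buffer
    size_bytes: int     # sub-buffer size
    global_offset: int  # first work-item index covered by this chunk
    global_size: int    # number of work-items in this chunk


def split_work(n_items, item_bytes, n_devices,
               chunks_per_device=2, align_bytes=128):
    """Split a 1-D range of n_items elements into contiguous chunks.

    Hypothetical helper: chunk boundaries are rounded so that every
    sub-buffer origin is a multiple of align_bytes (an assumed value here).
    Using chunks_per_device > 1 yields more kernel commands than devices,
    which gives the runtime room to overlap transfers with computation.
    """
    if min(n_items + 1, item_bytes, n_devices, chunks_per_device, align_bytes) < 1:
        raise ValueError("sizes and counts must be positive (n_items may be 0)")

    # Smallest item count whose byte size is a multiple of align_bytes.
    step = align_bytes // gcd(align_bytes, item_bytes)

    n_chunks = n_devices * chunks_per_device
    per_chunk = -(-n_items // n_chunks)        # ceiling division
    per_chunk = -(-per_chunk // step) * step   # round up to an aligned count

    chunks = []
    start = 0
    while start < n_items:
        count = min(per_chunk, n_items - start)
        chunks.append(Chunk(
            device=len(chunks) % n_devices,    # round-robin over devices
            origin_bytes=start * item_bytes,
            size_bytes=count * item_bytes,
            global_offset=start,
            global_size=count,
        ))
        start += count
    return chunks


if __name__ == "__main__":
    # 1000 float32 elements over two devices, two chunks each.
    for chunk in split_work(1000, 4, 2):
        print(chunk)
```

Because of the alignment rounding, the last chunk can be smaller than the others, and very small inputs may produce fewer chunks than requested. Whether round-robin is a sensible assignment depends on how different the devices are; a mixed local/remote setup would likely want chunk sizes weighted by device throughput instead.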