Try out PoCL-R #258

maleadt · 2024-10-16T12:58:21Z

This feature of PoCL makes it possible to offload to multiple servers, see pocl/pocl#1621 (comment). That could be an interesting approach to distributing code, although I fear that our current SPMD-style kernels are written with local GPUs in mind and thus won't scale well to a distributed execution environment.

(I know OpenCL.jl isn't the best place to keep track of this, but it's close enough, and issues on pocl_jll.jl have no visibility.)

pjaaskel · 2024-10-16T14:44:38Z

There's also a sub-buffer migration manager in PoCL-R. That is, data decomposition could be done using sub-buffers of the main buffer, and the runtime manages the sub-buffer transfers directly between nodes as well (this is a rather fresh feature for which I'd like to see more testing). Thus, distributing the workload using only OpenCL would minimally require generating multiple SPMD kernel commands (one for each local/remote GPU/CPU) instead of a single one, and splitting the data with sub-buffer offsets. And of course launching a few more kernel commands than that, just to hide the data transfer latencies.
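To make the decomposition described above a bit more concrete, here is a rough sketch of how the sub-buffer regions could be computed on the host. It is plain Python with no OpenCL calls: each resulting `(origin, size)` pair is what would be passed as the `cl_buffer_region` to `clCreateSubBuffer` with `CL_BUFFER_CREATE_TYPE_REGION`. The names `Region` and `split_buffer`, the `chunks_per_device` parameter, and the round-robin assignment are illustrative assumptions, not anything PoCL-R or OpenCL.jl provides. The alignment is assumed to come from `CL_DEVICE_MEM_BASE_ADDR_ALIGN` (which is reported in bits, so it needs converting to bytes first).

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """Hypothetical helper type: one sub-buffer of the parent buffer."""
    device: int  # index of the local/remote device that gets this chunk
    origin: int  # byte offset into the parent buffer
    size: int    # byte length of the sub-buffer


def split_buffer(total_bytes: int, n_devices: int, align_bytes: int,
                 chunks_per_device: int = 2) -> List[Region]:
    """Split a parent buffer into aligned regions, assigned round-robin.

    Using more than one chunk per device is the (assumed) way to get
    several kernel commands per device, so that transfers for one chunk
    can overlap with computation on another.
    """
    if min(total_bytes, n_devices, align_bytes, chunks_per_device) <= 0:
        raise ValueError("all arguments must be positive")

    n_chunks = n_devices * chunks_per_device
    # Ceiling division, then round the chunk size up to the alignment so
    # that every origin is a multiple of align_bytes.
    raw = -(-total_bytes // n_chunks)
    chunk = -(-raw // align_bytes) * align_bytes

    regions = []
    origin = 0
    index = 0
    while origin < total_bytes:
        # The last region may be shorter than the others.
        size = min(chunk, total_bytes - origin)
        regions.append(Region(index % n_devices, origin, size))
        origin += size
        index += 1
    return regions


if __name__ == "__main__":
    # Example: 1000 bytes, 2 devices, 128-byte alignment, 2 chunks each.
    for region in split_buffer(1000, 2, 128):
        print(region)
```

Because the chunk size is rounded up to the alignment, small buffers can end up with fewer regions than `n_devices * chunks_per_device`. One kernel command would then be enqueued per region on the queue of the device it is assigned to, with the global offset or kernel arguments adjusted to match the region.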
