An enduring issue with the model right now is the inability to build efficiently at scale. Every stencil takes a significant amount of time to build because of the well-known poor performance of `nvcc`. Combined with the fact that, on the cube sphere, there are up to 9 different code paths (depending on where a rank sits on its tile), this leads to build times of 3+ hours.
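To make the "up to 9 code paths" count concrete: along each axis, a rank's subdomain either touches the low edge of its tile, the high edge, or neither, which gives 3 × 3 = 9 combinations. The sketch below classifies a rank accordingly. It assumes ranks are numbered tile by tile and row-major within each tile, and that every tile has at least 3 ranks per axis. `EdgePosition` and `code_path` are hypothetical names, not the model's actual partitioner API.

```python
from enum import Enum


class EdgePosition(Enum):
    """Where a rank's subdomain sits along one axis of a cube-sphere tile."""
    LOW = "low"            # touches the west (x) or south (y) tile edge
    INTERIOR = "interior"  # touches neither edge along this axis
    HIGH = "high"          # touches the east (x) or north (y) tile edge


def axis_position(index: int, n_ranks: int) -> EdgePosition:
    """Classify a rank's position along one axis (assumes n_ranks >= 3)."""
    if n_ranks < 3:
        raise ValueError("sketch assumes at least 3 ranks per axis")
    if index == 0:
        return EdgePosition.LOW
    if index == n_ranks - 1:
        return EdgePosition.HIGH
    return EdgePosition.INTERIOR


def code_path(rank: int, layout: tuple[int, int]) -> tuple[EdgePosition, EdgePosition]:
    """Map a global rank to its (x, y) code path.

    Assumes (hypothetically) that ranks are numbered tile by tile,
    row-major within each tile, for a per-tile layout of (nx, ny).
    """
    nx, ny = layout
    local = rank % (nx * ny)  # rank index within its own tile
    j, i = divmod(local, nx)  # j: row (y), i: column (x) on the tile
    return axis_position(i, nx), axis_position(j, ny)
```

Larger layouts only add more interior ranks, so the number of distinct code paths stays at 9 regardless of layout size.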
A solution is to use distributed compilation across multiple nodes*. Using the new code path identification technique, which guarantees relocatability, we should be able to compile with 54 ranks and scale up to any layout.
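If a built stencil depends only on its code path (the relocatability guarantee), then any layout can reuse the builds of a smaller reference layout. One plausible reading of the 54-rank figure, assumed here, is 6 tiles × a 3×3 layout per tile, the smallest layout in which every tile contains all 9 code paths. Assuming ranks are numbered tile by tile and row-major within each tile, the sketch maps a rank in any layout to the reference rank whose build it would load. `representative_rank` is a hypothetical helper.

```python
def _squash(index: int, n_ranks: int) -> int:
    """Collapse a position along one axis to 0 (low edge), 1 (interior) or 2 (high edge)."""
    if n_ranks < 3:
        raise ValueError("sketch assumes at least 3 ranks per axis")
    if index == 0:
        return 0
    if index == n_ranks - 1:
        return 2
    return 1


def representative_rank(rank: int, layout: tuple[int, int]) -> int:
    """Return the rank of the assumed 54-rank reference build (6 tiles x 3x3)
    that shares this rank's tile and code path."""
    nx, ny = layout
    tile, local = divmod(rank, nx * ny)  # which tile, and index within it
    j, i = divmod(local, nx)             # row-major position on the tile
    return tile * 9 + _squash(j, ny) * 3 + _squash(i, nx)
```

For example, in a 4×4 layout (96 ranks) the four interior ranks of each tile all map to the center rank of that tile in the reference layout, so no new build is required.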
Here's an outline of a solution:
- Rank 0 spins up a file socket server, acting as a scheduler for every other rank
- When hitting `FrozenStencil`, the rank queries the server for the stencil's state:
  - Build: the stencil is not built; build it
  - Stub: the stencil is being built; stub it for now and come back when execution is needed
  - Load: the stencil is ready; load it
- When a stencil needs to be executed, the rank queries the server until it receives the "Load" answer
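The Build / Stub / Load protocol is essentially a small per-stencil state machine. The sketch below shows the bookkeeping the scheduler on rank 0 might keep, plus the rank-side polling loop run before execution. It leaves out the file socket transport (for instance, Python's `socketserver.UnixStreamServer` handles requests one at a time by default, so the scheduler would need no locking). `BuildScheduler`, `Action`, `wait_for_load` and the key format are hypothetical; in practice the key would presumably include the code path, so that ranks on different code paths build different stencils concurrently.

```python
import time
from enum import Enum
from typing import Callable


class Action(Enum):
    BUILD = "build"  # nobody has claimed this stencil: the caller must build it
    STUB = "stub"    # another rank is building it: stub now, come back later
    LOAD = "load"    # the build is finished: the caller can load it


class BuildScheduler:
    """Hypothetical bookkeeping that rank 0 would run behind its socket server."""

    def __init__(self) -> None:
        self._builder: dict[str, int] = {}  # stencil key -> rank building it
        self._ready: set[str] = set()       # stencil keys whose build finished

    def query(self, key: str, rank: int) -> Action:
        if key in self._ready:
            return Action.LOAD
        if key not in self._builder:
            self._builder[key] = rank  # the first rank to ask claims the build
            return Action.BUILD
        return Action.STUB

    def mark_built(self, key: str, rank: int) -> None:
        if self._builder.get(key) != rank:
            raise RuntimeError(f"rank {rank} did not claim {key!r}")
        self._ready.add(key)


def wait_for_load(query: Callable[[], Action],
                  poll_interval: float = 1.0, max_polls: int = 3600) -> None:
    """Block a stubbed rank until the scheduler answers LOAD."""
    for _ in range(max_polls):
        if query() is Action.LOAD:
            return
        time.sleep(poll_interval)
    raise TimeoutError("stencil never became loadable")


# Usage: two ranks hitting the same (hypothetical) stencil key.
scheduler = BuildScheduler()
key = "stencil_a/LOW_LOW"
print(scheduler.query(key, rank=0))  # Action.BUILD
print(scheduler.query(key, rank=1))  # Action.STUB
scheduler.mark_built(key, rank=0)
wait_for_load(lambda: scheduler.query(key, rank=1), poll_interval=0.0)
```

Granting the build to the first rank that asks keeps the assignment trivial; a smarter scheduler could balance builds across nodes, but that is outside this sketch.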
Why not multithreading? Because Python + GIL = sad developer.
This issue should also serve as the basis for fully uprooting and refactoring the `CompileConfig`, the distributed caches, etc., and the whole build/load system that has grown across multiple files in both the orchestrated and the stencil-based systems.
The build system should be the same for all execution modes, presenting a unified API that users can use to create workflows.