
Load Balancing

Kengo TOMIDA edited this page Apr 12, 2019 · 17 revisions

By default, Athena++ assumes that every MeshBlock has the same cost and distributes MeshBlocks as evenly as possible. However, some additional physics may make the computational cost uneven. Athena++ provides two ways to adjust the load balance. Note that this feature can lower performance if it is not used correctly, and it has some limitations, so please read the following description carefully.

Manual load balancing

If the cost of each MeshBlock is predictable and stable, one can call MeshBlock::SetCostForLoadBalancing in MeshBlock::UserWorkInLoop to assign the cost manually. Athena++ then tries to redistribute MeshBlocks so that the total cost per node becomes as even as possible. This function must be called from all the MeshBlocks.
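A minimal sketch of manual cost assignment follows. It uses a stand-in MeshBlock struct so that it compiles on its own; in a real problem generator the call would be made inside MeshBlock::UserWorkInLoop. The cost model (EstimateCost, a base cost of 1 plus a weight for cells that run extra physics), the helper AssignCost, and the double argument type are assumptions for illustration, not part of Athena++.

```cpp
#include <cassert>

// Stand-in for the Athena++ MeshBlock, reduced to what this sketch needs.
struct MeshBlock {
  double cost = 1.0;
  void SetCostForLoadBalancing(double c) { cost = c; }
};

// Hypothetical cost model: every block costs 1, plus extra_weight times the
// fraction of its cells that run the additional physics.
double EstimateCost(int ncells, int ncells_extra, double extra_weight) {
  assert(ncells > 0 && ncells_extra >= 0 && ncells_extra <= ncells);
  double fraction = static_cast<double>(ncells_extra) / ncells;
  return 1.0 + extra_weight * fraction;
}

// Roughly what the body of UserWorkInLoop might do. Every MeshBlock must
// set its cost, including blocks without any extra physics.
void AssignCost(MeshBlock *pmb, int ncells, int ncells_extra) {
  pmb->SetCostForLoadBalancing(EstimateCost(ncells, ncells_extra, 3.0));
}
```

Only the relative costs matter for the redistribution, so the absolute scale of the model is arbitrary.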

Two control parameters for load balancing can be set in the input file.

    <loadbalancing>
    interval   = 10        # interval between load balancing (default = 10)
    tolerance  = 0.5       # acceptable load imbalance (default = 0.5 = 50%)

The interval parameter sets how frequently load balancing is performed, in units of timesteps. Since load balancing is somewhat costly because it involves global communication, we recommend performing it only occasionally. Note that when AMR is in use and one or more MeshBlocks are refined or derefined, the code always performs load balancing regardless of the interval parameter.

The tolerance parameter specifies the maximum permissible load imbalance. When the total cost on a node exceeds the average cost by this fraction, the load balancing process is triggered. Because of the limited flexibility of the load balancing (see the notes below), this value should not be set too low. Also, for AMR, this parameter is ignored and set automatically because the adjustability changes with the number of MeshBlocks (this implementation may be subject to change).
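As a rough illustration of how the tolerance can be read, the sketch below flags an imbalance when the most loaded node exceeds the mean cost by more than the tolerance fraction. The function name NeedsRebalance and the exact form of the criterion are assumptions for illustration; the test actually used in Athena++ may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Athena++): returns true when the most
// heavily loaded node exceeds the average cost by more than `tolerance`
// (0.5 means 50% above the average).
bool NeedsRebalance(const std::vector<double> &node_cost, double tolerance) {
  assert(tolerance >= 0.0);
  if (node_cost.empty()) return false;
  double total = std::accumulate(node_cost.begin(), node_cost.end(), 0.0);
  double mean = total / static_cast<double>(node_cost.size());
  double max_cost = *std::max_element(node_cost.begin(), node_cost.end());
  return max_cost > (1.0 + tolerance) * mean;
}
```

With tolerance = 0.5 and node costs of 4, 1, 1 and 2, the mean is 2 and the threshold is 3, so the node with cost 4 would trigger a rebalance under this criterion.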

Automatic load balancing

The other option is automatic load balancing based on timing. With this feature enabled, the code measures the computation time of each MeshBlock and redistributes the load accordingly. To enable this feature, set the automatic flag to true in the input file.

    <loadbalancing>
    automatic  = true      # enable automatic load balancing (default = false)
    interval   = 10        # interval between load balancing (default = 10)
    tolerance  = 0.5       # acceptable load imbalance (default = 0.5 = 50%)

The other parameters remain the same as in the manual load balancing.

Notes on load balancing

First of all, the use of these load balancing features is recommended only when there is physics causing substantially uneven load. The default load balancing is optimal when the cost is proportional to the number of MeshBlocks.

The load balancing works well only when there are sufficiently many MeshBlocks per node to allow flexible load redistribution. If there are only one or two MeshBlocks per node, the new load distribution is unlikely to improve the performance significantly, and it is even possible that no better distribution can be found. Roughly speaking, the number of MeshBlocks per node needed for good load balancing should be at least as large as the ratio between the costs of the most and least expensive MeshBlocks. These features should be used only when the load imbalance is not negligible: reaching that number of MeshBlocks per node may require smaller MeshBlocks, which can themselves cause poor performance, so you may achieve better performance by ignoring a minor load imbalance.

Also, it is not always possible to improve the performance, because the code redistributes MeshBlocks only along the Z-ordering. If the high-cost MeshBlocks are badly localized on the Z-ordering, the code may not be able to adjust the load sufficiently. This is a limitation of the current implementation.
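This constraint can be illustrated with a small sketch. When MeshBlocks are assigned in contiguous runs along a one-dimensional ordering, each node receives a consecutive slice of the cost list. The greedy splitter below is an assumption for illustration only: the name PartitionAlongOrdering is hypothetical and this is not the algorithm Athena++ uses. It only shows why a cluster of expensive blocks can be hard to spread out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: assigns blocks, listed in their 1D ordering, to
// nodes in contiguous runs. A block goes to the node whose ideal cost
// interval contains the midpoint of the block's cumulative cost range.
std::vector<int> PartitionAlongOrdering(const std::vector<double> &cost,
                                        int nnodes) {
  assert(nnodes > 0);
  std::vector<int> owner(cost.size(), 0);
  double total = std::accumulate(cost.begin(), cost.end(), 0.0);
  if (total <= 0.0) return owner;   // nothing to balance
  double target = total / nnodes;   // ideal cost per node
  double prefix = 0.0;              // cost of all earlier blocks
  for (std::size_t i = 0; i < cost.size(); ++i) {
    double mid = prefix + 0.5 * cost[i];
    owner[i] = std::min(static_cast<int>(mid / target), nnodes - 1);
    prefix += cost[i];
  }
  return owner;
}
```

For costs of 4, 4, 1, 1, 1, 1 on two nodes, every contiguous split leaves one node with at least 8 of the 12 cost units, whereas an unconstrained assignment could give 6 to each.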

Adding a new task to load balancing

When a new task is added, it should be registered with the load balancer if the cost of that task should be counted. This is controlled by the lb_time boolean variable in the Task structure. If this flag is true, the code measures the time spent in this task; if it is false, the task is ignored. It should be set in the AddTask function of the TaskList as follows:

    case (MY_PHYSICS):
      task_list_[ntasks].TaskFunc=
          static_cast<TaskStatus (TaskList::*)(MeshBlock*,int)>
          (&TimeIntegratorTaskList::MyPhysicsTask);
      // count the time spent in this task for automatic load balancing
      task_list_[ntasks].lb_time = true;
      break;

Tasks that involve receiving MPI messages should not be counted, because their time depends on the state of the network and of other nodes.
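The effect of the flag can be sketched as follows. This is a standalone illustration under stated assumptions, not the Athena++ implementation: BlockTimer and its Run method are hypothetical names, and std::chrono::steady_clock stands in for whatever timer the code actually uses.

```cpp
#include <chrono>
#include <functional>

// Hypothetical per-MeshBlock accumulator for the measured cost.
struct BlockTimer {
  double cost = 0.0;  // accumulated wall-clock seconds of counted tasks

  // Runs `task`; its elapsed time is added to `cost` only when lb_time is
  // true, so tasks such as MPI receives can be left out of the measurement.
  void Run(const std::function<void()> &task, bool lb_time) {
    if (!lb_time) {
      task();
      return;
    }
    auto t0 = std::chrono::steady_clock::now();
    task();
    auto t1 = std::chrono::steady_clock::now();
    cost += std::chrono::duration<double>(t1 - t0).count();
  }
};
```

Excluding communication-bound tasks keeps the accumulated cost tied to the work a MeshBlock does itself rather than to how long it waited on other nodes.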
