Algorithm comparison between MVAPICH and SST-Macro #684

Open
sunsirui opened this issue Aug 13, 2022 · 0 comments

sunsirui commented Aug 13, 2022

Hi~
Question: Is there a reasonable code path in MVAPICH that uses the same algorithms as SST-Macro? In other words, if both benchmarks use the same parameter file (the same hardware information), can SST-Macro produce results similar to MVAPICH?
Take MPI_Allreduce and MPI_Barrier as examples:

Algorithms in SST-Macro:

  1. MPI_Barrier: Bruck algorithm
  2. MPI_Allreduce: Wilke-halving (the Wilke algorithm is a variation of the binary blocks algorithm)
    2.1 First, reduce rounds (similar to the recursive-halving algorithm)
    2.2 Second, receive rounds (similar to the Bruck algorithm)
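The Bruck (dissemination) barrier listed for SST-Macro can be modeled as a per-rank schedule. This is a minimal Python sketch, not SST-Macro's source: it assumes the textbook form in which, in round k, rank r signals rank (r + 2^k) mod p and waits for rank (r - 2^k) mod p, for ceil(log2 p) rounds. The function names are hypothetical.

```python
def dissemination_barrier_schedule(rank: int, nprocs: int) -> list[tuple[int, int]]:
    """(send_to, recv_from) peers for each round of a dissemination barrier."""
    rounds = (nprocs - 1).bit_length()  # ceil(log2(nprocs)) without float error
    schedule = []
    for k in range(rounds):
        dist = 1 << k
        schedule.append(((rank + dist) % nprocs, (rank - dist) % nprocs))
    return schedule


def simulate_dissemination_barrier(nprocs: int) -> bool:
    """Check that every rank has (transitively) heard from every other rank."""
    known = [{r} for r in range(nprocs)]
    for k in range((nprocs - 1).bit_length()):
        dist = 1 << k
        # Each rank merges what its round-k sender knew before this round.
        known = [known[r] | known[(r - dist) % nprocs] for r in range(nprocs)]
    return all(len(s) == nprocs for s in known)


# With the 4-process launch in parameters.ini, rank 0 needs two rounds.
print(dissemination_barrier_schedule(0, 4))  # [(1, 3), (2, 2)]
```

The same schedule works for any process count, which is why dissemination needs no special handling for non-power-of-two p.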

Algorithms in MVAPICH:

  1. MPI_Barrier:
    1.1 If mv2_use_osu_collectives is set (default): pairwise exchange with the recursive-doubling algorithm
    1.2 Otherwise: dissemination algorithm (the Bruck algorithm)
  2. MPI_Allreduce:
    2.1 If mv2_use_osu_collectives is set (default): algorithm not yet analyzed
    2.2 Otherwise, depending on message size:
      short messages: size <= MPIR_CVAR_ALLREDUCE_LONG_MSG_SIZE
      long messages: size > MPIR_CVAR_ALLREDUCE_LONG_MSG_SIZE
      2.2.1 For long messages, Rabenseifner's algorithm is used:
        first, the recursive-halving algorithm (reduce-scatter);
        second, the recursive-doubling algorithm (allgather).
      2.2.2 For short messages, a recursive-doubling algorithm is used.
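The two MVAPICH allreduce branches in 2.2 can be compared with a small Python model that simulates all ranks in one process and sums vectors. It is a sketch under stated assumptions, not MVAPICH's code: power-of-two process counts, a vector length divisible by p, and sum as the reduction. The function names are hypothetical, and `long_msg_size` stands in for the MPIR_CVAR_ALLREDUCE_LONG_MSG_SIZE threshold, whose value is not given here.

```python
from typing import List

Vector = List[float]


def recursive_doubling_allreduce(data: List[Vector]) -> List[Vector]:
    """Sum-allreduce by recursive doubling; data[r] is rank r's input vector."""
    p = len(data)
    assert (p & (p - 1)) == 0, "sketch assumes a power-of-two process count"
    bufs = [list(v) for v in data]
    mask = 1
    while mask < p:
        # Every rank swaps its whole vector with rank ^ mask and adds it in.
        bufs = [[a + b for a, b in zip(bufs[r], bufs[r ^ mask])] for r in range(p)]
        mask <<= 1
    return bufs


def rabenseifner_allreduce(data: List[Vector]) -> List[Vector]:
    """Reduce-scatter by recursive halving, then allgather by recursive doubling."""
    p, n = len(data), len(data[0])
    assert (p & (p - 1)) == 0 and n % p == 0, "power-of-two p, n divisible by p"
    bufs = [list(v) for v in data]
    lo, hi = [0] * p, [n] * p  # window of indices each rank is responsible for
    # Phase 1: recursive halving. Each step halves the window and reduces it.
    mask = p >> 1
    while mask:
        new_bufs = [list(b) for b in bufs]
        new_lo, new_hi = list(lo), list(hi)
        for r in range(p):
            mid = (lo[r] + hi[r]) // 2
            if r & mask:
                new_lo[r] = mid  # upper partner keeps the upper half
            else:
                new_hi[r] = mid  # lower partner keeps the lower half
            for i in range(new_lo[r], new_hi[r]):
                new_bufs[r][i] = bufs[r][i] + bufs[r ^ mask][i]
        bufs, lo, hi = new_bufs, new_lo, new_hi
        mask >>= 1
    # Phase 2: recursive doubling. Partners swap their fully reduced windows.
    mask = 1
    while mask < p:
        new_bufs = [list(b) for b in bufs]
        new_lo, new_hi = list(lo), list(hi)
        for r in range(p):
            partner = r ^ mask
            for i in range(lo[partner], hi[partner]):
                new_bufs[r][i] = bufs[partner][i]
            new_lo[r] = min(lo[r], lo[partner])
            new_hi[r] = max(hi[r], hi[partner])
        bufs, lo, hi = new_bufs, new_lo, new_hi
        mask <<= 1
    return bufs


def allreduce(data: List[Vector], msg_bytes: int, long_msg_size: int) -> List[Vector]:
    """Pick the algorithm by message size, mirroring the 2.2 branch."""
    if msg_bytes <= long_msg_size:
        return recursive_doubling_allreduce(data)
    return rabenseifner_allreduce(data)
```

Both functions leave every rank with the same reduced vector; what differs is the traffic. Recursive doubling sends the full vector in each of the log2 p rounds, while Rabenseifner's variant sends halves, then quarters, and so on, which is why it is preferred for long messages.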

Based on how MPI_Allreduce and MPI_Barrier are implemented, SST-Macro and MVAPICH do not use the same algorithms by default.
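To make the barrier difference concrete: the default MVAPICH path is described as a pairwise exchange with recursive doubling. A minimal Python model, assuming the textbook power-of-two form in which, in round k, rank r does a combined send/receive with rank r XOR 2^k, follows. The function names are hypothetical, and non-power-of-two handling is omitted.

```python
def pairwise_exchange_partners(rank: int, nprocs: int) -> list[int]:
    """Peer for each round of a recursive-doubling (pairwise-exchange) barrier."""
    assert (nprocs & (nprocs - 1)) == 0, "sketch covers power-of-two counts only"
    partners = []
    mask = 1
    while mask < nprocs:
        partners.append(rank ^ mask)  # bidirectional exchange with this peer
        mask <<= 1
    return partners


def simulate_pairwise_barrier(nprocs: int) -> bool:
    """Check that every rank has (transitively) heard from every other rank."""
    assert (nprocs & (nprocs - 1)) == 0, "sketch covers power-of-two counts only"
    known = [{r} for r in range(nprocs)]
    mask = 1
    while mask < nprocs:
        # Both sides of each exchange end up with the union of what they knew.
        known = [known[r] | known[r ^ mask] for r in range(nprocs)]
        mask <<= 1
    return all(len(s) == nprocs for s in known)


print(pairwise_exchange_partners(0, 4))  # [1, 2]
```

With the 4 processes launched in parameters.ini, rank 0 exchanges with rank 1 and then rank 2. The Bruck (dissemination) barrier also takes two rounds for p = 4, but rank 0 sends to ranks 1 and 2 while receiving from ranks 3 and 2, so the two algorithms place different traffic on the topology even with identical hardware parameters.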
Running the osu_allreduce and osu_barrier benchmarks in both SST-Macro and MVAPICH gives quite different results, as shown in the figure below. The configuration (matching the hardware information) is given in parameters.ini.
parameters.ini (all benchmarks use the same file):
```
node {
  name = simple
  app1 {
    launch_cmd = aprun -n 4 -N 1
    exe=./osu_allreduce_sst
    allocation = node_id
    node_id_allocation_file = andy-node_id_allocation_topo1_4.txt
    mpi {
      max_vshort_msg_size = 16384
      max_eager_msg_size = 16384
      post_header_delay = 0.81us
      post_rdma_delay = 0.13us
      rdma_pin_latency = 0.9us
      rdma_page_delay = 1ns
      eager_cutoff = 524288
      allgather = ring
    }
  }
  proc {
    frequency = 2.6 GHz
    ncores = 8
    parallelism = 16
  }
  memory {
    name = pisces
    total_bandwidth = 12.8GB/s
    latency = 12.5ns
    arbitrator = cut_through
  }
  nic {
    name = pisces
    negligible_size = 0
    injection {
      mtu = 4096
      arbitrator = cut_through
      bandwidth = 100Gb/s
      latency = 300ns
      credits = 64KB
    }
    ejection {
      mtu = 4096
      arbitrator = cut_through
      bandwidth = 100Gb/s
      latency = 300ns
      credits = 64KB
    }
  }
  os {
    compute_scheduler = simple
    stack_size = 128KB
    stack_chunk_size = 2MB
  }
}
switch {
  router {
    name = table
  }
  name = pisces
  arbitrator = cut_through
  mtu = 512
  link {
    bandwidth = 200Gb/s
    latency = 130ns
    credits = 64KB
  }
  xbar {
    bandwidth = 16Tb/s
  }
  logp {
    bandwidth = 200Gb/s
    hop_latency = 116ns
    out_in_latency = 60ns
  }
}

topology {
  name = file
  filename = topology.json
  routing_tables = routing-table.json
}
```
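A rough way to see how much the algorithm choice alone can move the numbers under identical hardware parameters is the textbook alpha-beta-gamma cost model (α = per-message latency, β = seconds per byte transferred, γ = seconds per byte reduced). The Python sketch uses the standard estimates for power-of-two p; neither SST-Macro nor MVAPICH is claimed to use this model. β is taken from the 100Gb/s NIC bandwidth in the parameter file, while `ALPHA` is an arbitrary placeholder, and all function names are hypothetical.

```python
import math


def recursive_doubling_cost(p: int, n: float, alpha: float, beta: float,
                            gamma: float = 0.0) -> float:
    """lg p rounds, each sending and reducing the full n-byte vector."""
    return math.log2(p) * (alpha + n * beta + n * gamma)


def rabenseifner_cost(p: int, n: float, alpha: float, beta: float,
                      gamma: float = 0.0) -> float:
    """Reduce-scatter plus allgather: 2 lg p messages, 2(p-1)/p * n bytes per rank."""
    frac = (p - 1) / p
    return 2 * math.log2(p) * alpha + 2 * frac * n * beta + frac * n * gamma


def dissemination_barrier_cost(p: int, alpha: float) -> float:
    """ceil(lg p) rounds of zero-byte messages."""
    return (p - 1).bit_length() * alpha


# Illustrative inputs: 100Gb/s = 12.5e9 bytes/s gives BETA;
# ALPHA = 1 us is a placeholder, not derived from parameters.ini.
ALPHA = 1e-6
BETA = 1 / 12.5e9
for n in (8, 1 << 20):
    print(n, recursive_doubling_cost(4, n, ALPHA, BETA),
          rabenseifner_cost(4, n, ALPHA, BETA))
```

With these placeholder values, recursive doubling is cheaper for an 8-byte allreduce and Rabenseifner's algorithm is cheaper for 1 MiB; for p = 4 and γ = 0 the two estimates cross at n = 4α/β.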

Using a performance KPI to compare the osu_allreduce and osu_barrier results from MVAPICH and SST-Macro, the similarity reaches only 60% and 70%.

Hence the question: is there a reasonable code path in MVAPICH that uses the same algorithms as SST-Macro, so that, when both benchmarks use the same parameter file (the same hardware information), SST-Macro can produce results similar to MVAPICH?
Thanks a lot,
