
Cold Flow Runs

Karl W. Schulz edited this page Aug 17, 2021 · 21 revisions

Cold Flow Validation Run Startups (July 14)

Note: starting tet meshes are in Box under icp-torch/meshes/cold-flow-mod2-outlet (I uploaded new versions with higher-order elements [order=3] for the 50K and 117K sizes). Until the precision discrepancy between CPU and GPU results is better understood, the recommendation is to start these runs on Quartz. Suggest focusing on the following two run configs; make sure to save HDF5 restart files to backup directories periodically.
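One way to script the periodic restart backups mentioned above (a minimal sketch: the RUNDIR variable and the *.h5 glob are placeholders, adjust them to your actual run directory and restart filenames):

```shell
#!/bin/sh
# Snapshot HDF5 restart files into a timestamped backup directory.
# RUNDIR and the *.h5 glob are assumptions -- match your run layout.
RUNDIR=${RUNDIR:-.}
BACKUP="$RUNDIR/backup/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP"
for f in "$RUNDIR"/*.h5; do
  [ -e "$f" ] && cp -p "$f" "$BACKUP/"
done
echo "backed up $(ls "$BACKUP" | wc -l) file(s) to $BACKUP"
```

Run it from cron or between job submissions so a corrupted restart never costs more than one backup interval.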

Config A: 50K mesh

  • Flow rate - 40slpm

    • mesh -> cold-flow-mod2-outlet.order3.50K.msh
    • pressure-based outlet
    • Example Quartz timing(s), Git Version: 4fde4fa
      • 9 nodes/320 MPI tasks
        • 0.074 secs/iteration (p1), Initial time-step: 1.7903731e-07s (CFL=0.5)
        • 0.152 secs/iteration (p2), Initial time-step: 1.0742239e-07s (CFL=0.3)
        • 0.302 secs/iteration (p3), Initial time-step: 4.2968955e-08s (CFL=0.12)
      • 20 nodes/320 tasks
        • 0.037 secs/iteration (p1) [~3.1 wall clock days/flow thru]
        • 0.072 secs/iteration (p2) [~10.1 wall clock days/flow thru]
        • 0.141 secs/iteration (p3) [~49.4 wall clock days/flow thru]
      • 40 nodes/1440 MPI tasks <--- let's start here
        • 0.021 secs/iteration (p1) [~1.76 wall clock days/flow thru]
        • 0.039 secs/iteration (p2) [~5.5 wall clock days/flow thru]
        • 0.078 secs/iteration (p3) [~27 wall clock days/flow thru]
          • reduced to 0.06418 secs/iteration (p3) with be4bbfb version
  • [Startup Run 1] - start here, can restart with .h5 files from karl [COMPLETE]

    • Start with CFL=0.5, p=1 on 40 nodes/1440 MPI tasks
    • With p=1, have 823,060 total unknowns
    • Run ~1-2 flow thrus
    • [Update 7/26/21]:
      • Ran ~4.4M iterations until the solution NaN'd using the non-reflecting pressure outlet
      • Restarted from ~4.2M iterations using the reflecting pressure outlet BC and ran to 12M iterations (t=1.77 secs)
  • [Continuation Run 2]

    • Pick up where Run 1 completed and raise the order; will need to add the RESTART_FROM_AUX option to the runfile
    • CFL=0.3, p=2
    • With p=2, have 2,057,650 total unknowns
    • Run this for an additional flow thru
    • Enable averaging and gather some results; decide on whether to then run p=3 or alternate mesh
    • [Update 7/26/21]:
      • Restarting from p=1 at t=1.77 secs
      • 0.039 secs/iteration; dt=8.716927e-08
    • [Update 8/6/21]:
      • Completed 28.5M iterations (0.033287 secs/iter with two sets of perf mods)
      • t=3.193, so have a full flow thru at p=2 -> 1.42 secs
    • [Update 8/17/21]:
      • Enabled averaging at i=28500010 (every 25th timestep), t=3.19 secs
      • Ran till i=47500000 iters, t=4.82 secs (1.63 secs with averaging)
  • [Continuation Run 3]

    • Pick up where Run 2 completed and raise order
    • CFL=0.12, p=3
    • With p=3, have 4,115,300 total unknowns
    • [Update 8/4/21]:
      • Restarting from p=2 at t=2.89306
        • On 60 nodes (2160 cores) -> 0.0488 secs/iteration, dt=3.4272543e-08
        • [~16.5 wall clock days/flow thru]
    • enabled averaging at i=32000000 (every 25th timestep), t=3.13 secs
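The unknown counts and wall-clock estimates above hang together arithmetically, which is a useful sanity check before launching a long run. A tet element carries 4, 10, and 20 nodes at orders 1, 2, and 3, so the p2 and p3 unknown counts should be 2.5x and 5x the p1 count; and days-per-flow-thru follows from secs/iteration and the initial dt. A quick sketch (the ~1.3 s flow-through time is back-solved from the tabulated estimates, not taken from any run input, and dt grows as the flow develops, so treat these as lower bounds):

```shell
#!/bin/sh
# Check that unknown counts scale as tet node counts (4:10:20 from p1).
awk 'BEGIN {
  p1 = 823060
  printf "p2 expected: %d\n", p1 * 10 / 4   # matches 2,057,650
  printf "p3 expected: %d\n", p1 * 20 / 4   # matches 4,115,300
}'

# Estimate wall-clock days per flow-thru on 40 nodes / 1440 tasks:
#   days = (T_flow / dt) * secs_per_iter / 86400
# T_flow ~ 1.3 s is an assumption back-solved from the table above.
awk 'BEGIN {
  T = 1.3
  printf "p1: %.1f days\n", (T / 1.7903731e-07) * 0.021 / 86400
  printf "p3: %.1f days\n", (T / 4.2968955e-08) * 0.078 / 86400
}'
```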

Config B: 117K mesh

  • [Run 1] - 40slpm
    • mesh -> cold-flow-mod2-outlet.order3.117K.msh
    • pressure-based outlet

TPS build options I use with container on Quartz:

To build, I just run singularity on the login node. I have a convenience script to launch the container with the following contents:

CONTAINER=mfem-mv2-psm:4.2.tps.r1

singularity exec -B /p/lustre2/$USER:/home/ohpc \
	    -B /var/run/munge/:/var/run/munge/ \
	    -B /etc/slurm:/etc/slurm \
	    --env SLURM_JOB_ID=$SLURM_JOB_ID \
	    --cleanenv /usr/workspace/utaustin/containers/$CONTAINER /bin/bash --login

Once in the container, I change to the directory in my Quartz $HOME where tps is cloned and configure similarly to Marvin as follows:

./configure CXXFLAGS="-g -O3 -I$PETSC_INC" LDFLAGS="-L$PETSC_LIB -lpetsc"
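After configure succeeds inside the container, the build is the usual autotools cycle (a sketch: the parallel job count is arbitrary, and `make check` assumes the project wires tests into automake; plain `make` suffices if unsure):

```shell
# Typical autotools build/verify cycle once ./configure has succeeded.
make -j 8          # parallel build
make check         # run the test suite, if the project defines one
```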