-
For reference, the full specification that runs on Lassen:
description:
    name: lulesh_sample1
    description: A sample LULESH study that downloads, builds, and runs a parameter study of varying problem sizes and iterations on SLURM.
env:
    variables:
        OUTPUT_PATH: ./sample_output/lulesh
    labels:
        outfile: $(SIZE.label).$(ITERATIONS.label).log
    dependencies:
        git:
            - name: LULESH
              path: $(OUTPUT_PATH)
              url: https://github.com/LLNL/LULESH.git
batch:
    type : flux
    host : quartz
    bank : baasic
    queue : pbatch
study:
    - name: make-lulesh
      description: Build the MPI enabled version of LULESH.
      run:
          cmd: |
            cd $(LULESH)
            mkdir build
            cd build
            cmake -WITH_MPI=Off -WITH_OPENMP=Off ..
            make
          depends: []
    - name: run-lulesh
      description: Run LULESH.
      run:
          cmd: |
            $(LAUNCHER) $(LULESH)/build/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p > $(outfile)
            #$(LULESH)/build/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p > $(outfile)
          depends: [make-lulesh]
          nodes: 1
          procs: 1
          cores per task: 1
          use_broker: True
          walltime: "00:10:00"
global.parameters:
    SIZE:
        values : [100, 100, 100, 200, 200, 200, 300, 300, 300]
        label : SIZE.%%
    ITERATIONS:
        values : [10, 20, 30, 10, 20, 30, 10, 20, 30]
        label : ITER.%%
-
Now that the |
-
When working with @Jmast on GMD-related Maestro configuration, we dove fairly deep into how the Maestro/Flux adapter behaves. We have a few things to discuss and sort out; see below.
Flux Broker (use_broker) Behavior
In the case of use_broker, it can be set to either True or False. This setting is meant to control whether or not a nested instance is used to execute a Maestro step. If we use the following step as an example, we see the following behavior.
Using a nested instance
The YAML specification for setting the nested instance for running LULESH is as follows:
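As a sketch of what such a step looks like, the fragment below is taken from the run-lulesh step of the full specification posted in this thread. It may differ in detail from the snippet originally shown at this point. The only key that selects the nested instance is use_broker.

```yaml
study:
    - name: run-lulesh
      description: Run LULESH.
      run:
          cmd: |
            $(LAUNCHER) $(LULESH)/build/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p > $(outfile)
          depends: [make-lulesh]
          nodes: 1
          procs: 1
          cores per task: 1
          # True: execute the step inside a nested Flux instance
          use_broker: True
          walltime: "00:10:00"
```

The global-instance case discussed further down uses the same step with use_broker set to False.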
The resulting script is run under a nested instance, where $(LAUNCHER) is replaced by a flux mini -c 1 -n 1 -N 1 call, which bubbles up to the newly created sub-instance of Flux. This code path in the adapter utilizes the from_batch_command code API and results in a job listing that looks like:
Forcing steps to use the global instance
When running a step under the global instance, the current version of the specification sets use_broker to False and the step looks as follows:
The expectation is that this would result in run-lulesh jobs running in the top level, but we see duplicate entries. These duplicates arise because the from_command API is called, which allocates resources, and then the $(LAUNCHER) in the step is additionally replaced with a flux mini call that calls into the global instance, with the end result looking like:
We end up with two jobs per Maestro step, since flux mini is a blocking call that sequesters a new set of resources on top of keeping the original job mapped to the run-lulesh step alive.
What we expect to see
For the two cases, we would expect to see:
use_broker
True
False
Questions
@Jmast -- This is meant to tee up our conversation next week. Feel free to chime in.
@SteVwonder @dongahn -- Just pointing you here since we've been discussing interfacing between Maestro and Flux.