This is intended to collect possible workflow differences between the operation part and the develop part, so that we have a better picture of how to add the MPAS+JEDI components and make them NCO-compliant while remaining flexible enough to run on the develop machines (WCOSS2, Hera, Jet, Orion, Hercules).
To clarify, the operation part refers to all directories/files excluding the `rocoto/` and `util/` directories.
The develop part refers to the `rocoto/` and `util/` directories, which help generate develop experiments and the associated rocoto workflow XML files.
In operation:

- operation has dedicated computing resources, so jobs will run without waiting;
- operation will purge job run directories immediately after completion;
- operation will only run one fix configuration on one machine;
- operation will use the ecflow workflow management software;
- the archive and graphics tasks will run offline, outside of the rrfs-workflow.
In develop:

- jobs usually wait a certain amount of time to run;
- job run directories need to be kept for a certain amount of time to facilitate debugging;
- there is no data purge on the disks on Hera/Jet/Orion/Hercules/Gaea;
- online clean and archive tasks will be needed to help clean up disk space and archive develop experiments;
- develop will run different configurations (such as `conus 3km`, `conus 12km`, `atlantic 12km`, `atlantic 4km`, `North American 3km`, etc.) on different computer platforms (such as Hera/Jet/Orion/Hercules/WCOSS2/Gaea);
- develop will use the rocoto workflow management software.
To accommodate those differences, the following measures are considered:
Side-load the non-NCO tasks, such as clean, archive, and graphics. They do not need J-jobs/ex-scripts and will be put under `util/sideload`.
The rocoto workflow management software does not provide some of the job card variables that ecflow does. To compensate for this, a `util/sideload/launch.sh` script is added to mimic the ecflow behavior and to provide a switch that routes a task either to the J-jobs or to the non-NCO sideload tasks; a minimal sketch is shown below.
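The following is only a rough sketch of what such a launcher could look like, assuming the task name is passed as the first argument and that `HOMErrfs` points to the workflow checkout; the J-job naming pattern and the exported variables here are illustrative assumptions, not the actual implementation:

```bash
#!/usr/bin/env bash
# util/sideload/launch.sh -- illustrative sketch only
# Mimic a few job card variables that ecflow would provide, then route the
# task either to an NCO J-job or to a sideloaded (non-NCO) script.
set -eu

task=${1:?usage: launch.sh <task_name>}

# Variables ecflow would normally set; provide rocoto-friendly defaults.
export job=${job:-"rrfs_${task}"}
export jobid=${jobid:-"${job}.$$"}

# At the develop stage, soft links usually work better than hard copies.
export cpreq="ln -snf"

case "${task}" in
  clean|archive|graphics)
    # Non-NCO tasks: no J-jobs/ex-scripts, they live under util/sideload.
    exec "${HOMErrfs}/util/sideload/${task}.sh"
    ;;
  *)
    # NCO-compliant tasks go through the standard J-job entry point
    # (J-job name pattern assumed here for illustration).
    exec "${HOMErrfs}/jobs/JRRFS_$(echo "${task}" | tr '[:lower:]' '[:upper:]')"
    ;;
esac
```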
Use `${cpreq}` to copy files/directories that are required for a job to function. In most situations, soft links work better at the develop stage, so the following line is added in `util/sideload/launch.sh` to tweak the `cpreq` command for develop: `export cpreq="ln -snf"`. A short usage illustration follows.
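As a hedged illustration of how the same ex-script line behaves under the two settings (the fix file and target names below are made up for the example):

```bash
# Default to a real copy (operation); develop's launch.sh overrides this
# with "ln -snf" so the same line produces a symlink instead.
cpreq=${cpreq:-"cp -p"}

# Stage a required input into the run directory (example file names).
${cpreq} "${FIXrrfs}/grid/conus_3km.nc" "${DATA}/grid_spec.nc"
# operation: cp -p   .../fix/grid/conus_3km.nc  ${DATA}/grid_spec.nc
# develop:   ln -snf .../fix/grid/conus_3km.nc  ${DATA}/grid_spec.nc
```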
Use links to manage fix files (more detailed information here). For the NCO implementation, doing something like the following will make a hard copy of the fix files needed for operation: `cp -rpL fix fix2; rm -rf fix; mv fix2 fix`
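Spelled out, the link-then-dereference pattern might look like this (the staging path and file names are placeholders):

```bash
# develop: populate fix/ with symlinks into a shared, read-only staging area.
mkdir -p fix/grid
ln -snf /path/to/staged_fix/grid/conus_3km.nc fix/grid/conus_3km.nc

# Hand-off to operation: replace every symlink with a real file.
# cp -rpL dereferences (follows) the links while copying.
cp -rpL fix fix2
rm -rf fix
mv fix2 fix
```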
In order to separate concerns and export only the environment variables required by a task at runtime, a cascade config structure will be adopted. Resource configuration (such as ACCOUNT, QUEUE, PARTITION, RESERVATION, NODES, WALLTIME, NATIVE, MEMORY, etc.) is only needed in the experiment setup process and will be separated from the runtime configuration. More detailed information here; a rough sketch of the split is given below.
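One way this separation could look at runtime, purely as an assumption about the cascade (the file names `config.base` and `config.<task>` are illustrative, not the actual layout):

```bash
# Runtime side of the cascade (illustrative file names):
# a task sources only the configuration it needs, from general to specific.
source "${HOMErrfs}/parm/config/config.base"      # NET, VERSION, directories, ...
source "${HOMErrfs}/parm/config/config.${task}"   # task-specific runtime variables

# Resource settings (ACCOUNT, QUEUE, PARTITION, RESERVATION, NODES, WALLTIME,
# NATIVE, MEMORY, ...) are NOT sourced here; they are consumed only at
# experiment setup time, when the rocoto XML is generated.
```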
`exp.setup` or similar files under `parm/config/exp` will be used to set up top-level variables for an experiment, such as directories, NET, VERSION, TAG, the days for a realtime run or the retro period for a retro run. Users can also override some environment variables here. These files are meant to facilitate quickly setting up a develop experiment (retro or realtime, on different machines, and with different grids/resolutions); they are not needed in operation. A hypothetical example is sketched below.
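A hypothetical `exp.setup` along these lines; every value and path below is an example placeholder, not a prescribed default:

```bash
# parm/config/exp/exp.setup -- hypothetical example values
export NET=rrfs
export VERSION=v1.0.0
export TAG=conus3km_retro1

export EXPDIR=/path/to/expdir/${TAG}   # experiment directory (placeholder)
export COMROOT=/path/to/com            # output root (placeholder)

# Realtime run: how many days to keep cycling; retro run: a fixed period.
export REALTIME_DAYS=3
# export RETRO_PERIOD=YYYYMMDDHH-YYYYMMDDHH

# Optional user overrides that take precedence over runtime defaults.
export cpreq="ln -snf"
```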
The core of the workflow will only consider the NCO naming convention for all existing operational products (such as GFS grib2 files). However, the workflow will provide example link utilities under `util` that use hard or soft links to convert users' specific naming conventions to match the NCO standard; a sketch of such a utility is given below.
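As a sketch of what such a link utility might do, assuming a user keeps GFS grib2 files under their own naming scheme (the script name, directory layout, and filename patterns here are all illustrative):

```bash
#!/usr/bin/env bash
# util/link_gfs_example.sh -- illustrative example, not an existing utility.
# Link user-named GFS grib2 files into an NCO-style directory/name layout,
# e.g. gfs.2024052700.f003.grib2 -> gfs.t00z.pgrb2.0p25.f003
set -eu

user_dir=${1:?user GFS directory}         # e.g. /path/to/user_gfs/20240527
nco_dir=${2:?NCO-style target directory}  # e.g. .../gfs.20240527/00/atmos
cyc=${3:?cycle hour, e.g. 00}

mkdir -p "${nco_dir}"
for f in "${user_dir}"/gfs.*.f???.grib2; do
  fhr=$(basename "${f}" | sed -E 's/.*\.f([0-9]{3})\.grib2$/\1/')
  ln -snf "${f}" "${nco_dir}/gfs.t${cyc}z.pgrb2.0p25.f${fhr}"
done
```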