workflow considerations on coordinating the operation part and the develop part #437

Open
guoqing-noaa opened this issue Aug 15, 2024 · 0 comments

This issue collects workflow differences between the operation part and the develop part, so that we have a clearer picture of how to add the MPAS+JEDI components in a way that is NCO-compliant yet flexible enough to run on the develop machines (WCOSS2, Hera, Jet, Orion, Hercules).

To clarify, the operation part refers to all directories/files excluding the rocoto/ and util/ directories.
The develop part refers to the rocoto/ and util/ directories, which help generate develop experiments and the associated rocoto workflow XML files.

operation has dedicated computing resources, so jobs run without waiting.
operation will purge job run directories immediately after completion.
operation will only run one fixed configuration on one machine.
operation will use the ecflow workflow management software.
In operation, the archive and graphics tasks will run offline, outside of the rrfs-workflow.

In develop,
jobs usually wait a certain amount of time in the queue before they run,
job run directories need to be kept for a certain amount of time to facilitate debugging,
disks are not purged on Hera/Jet/Orion/Hercules/Gaea,
so online clean and archive tasks will be needed to free up disk space and archive develop experiments.
develop will run different configurations (such as conus 3km, conus 12km, atlantic 12km, atlantic 4km, North American 3km) on different computer platforms (such as Hera/Jet/Orion/Hercules/WCOSS2/Gaea).
develop will use the rocoto workflow management software.

To accommodate those differences, the following measures are considered:

  1. Side-load non-NCO tasks, such as clean, archive, and graphics. They don't need J-jobs/ex-scripts and will be put under util/sideload.

  2. The rocoto workflow management software does not provide some of the job card variables that ecflow does. To compensate for this, a util/sideload/launch.sh script is added to mimic the ecflow behavior and to provide a switch that routes a task to either the J-jobs or the non-NCO tasks.

  3. Use ${cpreq} to copy the files/directories that a job requires in order to function. In most situations, soft links work better at the develop stage, so the following line is added in util/sideload/launch.sh to override the cpreq command for develop:
    export cpreq="ln -snf"

  4. Use links to manage fix files (more detailed information here). In the NCO implementation, something like the following:
    cp -rpL fix fix2; rm -rf fix; mv fix2 fix
    will replace the links with a hard copy of the fix files needed for operation (cp -L dereferences the links).

  5. In order to separate concerns and export only the environment variables a task requires at runtime, a cascade config structure will be adopted. Resource settings (such as ACCOUNT, QUEUE, PARTITION, RESERVATION, NODES, WALLTIME, NATIVE, MEMORY, etc.) are only needed in the experiment setup process and will be separated from the runtime configuration. More detailed information here.

  6. exp.setup or similar files under parm/config/exp will be used to set up top-level variables for an experiment, such as directories, NET, VERSION, TAG, and either the days for a realtime run or the retro period for a retro run. Users can also preempt some environment variables here. These files make it quick to set up a develop experiment (retro or realtime, different machines, and different grids/resolutions). They are not needed in operation.

  7. The core of the workflow will only consider the NCO naming convention for all existing operational products (such as gfs grib2 files). However, the workflow will provide example link utilities under util that use hard or soft links to map user-specific naming conventions onto the NCO standard.
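
As a rough illustration of measures 2 and 3, a launcher in the spirit of util/sideload/launch.sh might look something like the sketch below. This is not the actual script: the function names route_task and stage_input, the return labels "sideload" and "jjob", and the assumption that the non-NCO tasks are exactly clean, archive, and graphics are all hypothetical. Only the cpreq override line comes from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real util/sideload/launch.sh may differ.

# Measure 3: in develop, stage required inputs as soft links instead of copies.
export cpreq="ln -snf"

# Measure 2: a switch that decides where a task is routed.
# Assumption: the non-NCO (side-loaded) tasks are clean, archive and graphics.
route_task() {
  case "$1" in
    clean|archive|graphics) echo "sideload" ;;  # handled under util/sideload
    *)                      echo "jjob" ;;      # handled by a J-job/ex-script
  esac
}

# Stage one required input using whatever ${cpreq} is set to.
# ${cpreq} is deliberately unquoted so "ln -snf" splits into command + flags.
stage_input() {
  ${cpreq} "$1" "$2"
}
```

With this arrangement, the same call to stage_input would link in develop and copy in operation, depending only on how cpreq is exported; the task scripts themselves would not need to change.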
