Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix job runner stdout missing newline #6081

Merged
merged 8 commits into from
Nov 25, 2024

Conversation

jennan
Copy link
Contributor

@jennan jennan commented Apr 23, 2024

Kia ora,

While implementing a new job runner, I realised that the if the job runner submit method returns its stdout text without a newline, then a workflow with more than one task will fail. After further investigations with @hjoliver , we discovered that this is due to text handling in JobRunnerManager.jobs_submit, that doesn't properly add a newline to the text message if it is missing. This PR implements a fix.

You can reproduce the bug by removing the newline of the second output of the background job runner submit method (see https://github.com/cylc/cylc-flow/blob/master/cylc/flow/job_runner_handlers/background.py#L92) and try to run a workflow with at least 2 tasks.

I am not sure how to turn this into a test to reproduce the bug. Any guidance on this would be welcome 🙂.


[UPDATE (HO)] Short explanation: without this fix, if a job-runner submit method returns job-ID without a newline, it causes the cylc jobs-submit stdout to be misformatted (two lines concatenated) such that the scheduler can't parse it correctly - for one out of the multiple jobs submitted.


Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • No dependency changes
  • Tests are included (or explain why tests are not needed).
  • Changelog fragment included if this is a change that can affect users
  • No cylc-doc PR needed
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@jennan jennan changed the base branch from 8.2.x to master April 23, 2024 23:30
@hjoliver
Copy link
Member

hjoliver commented Apr 24, 2024

Thanks @jennan !

@oliver-sanders (UK team) noted overnight that he ran into this some time ago, but wasn't sure it was a bug so added a note on the need for newline in the handler API documentation: https://github.com/cylc/cylc-flow/blob/master/cylc/flow/job_runner_handlers/documentation.py#L426-L428

However, now that you've tracked down the offending code, by inspection (and also by comparison with similar in jobs_kill(...), it is definitely a bug.

Most of the current job-runners don't provide an explicit submit method, they just use the command template as a shortcut. However, the background handler does, and it deliberately adds a newline to its return value. https://github.com/cylc/cylc-flow/blob/master/cylc/flow/job_runner_handlers/background.py#L92

@jennan - can you tweak this PR to:

  • not add the newline in the background handler (above), as it's not necessary now
  • and remove the newline advice in the docs (above)

@hjoliver
Copy link
Member

@jennan - no cylc-doc PR is needed for this. It could do with a change log entry though, just in case anyone else is working on job runners at the moment (unlikely but not impossible!). Just add a file changes.d/6081.fix.md with a short description (see other md files in there for example).

@MetRonnie MetRonnie added this to the 8.3.1 milestone Apr 30, 2024
@MetRonnie MetRonnie changed the base branch from master to 8.3.x June 19, 2024 12:34
@oliver-sanders oliver-sanders modified the milestones: 8.3.1, 8.3.x Jun 27, 2024
@hjoliver hjoliver self-assigned this Oct 23, 2024
@hjoliver hjoliver added the bug Something is wrong :( label Oct 23, 2024
jennan and others added 5 commits October 23, 2024 14:40
When the stdout returned by the job runner submit method doesn't end with
a newline, the `sys.stdout.write` call in `jobs_submit` do not add it and
this cases submission to fails (at least to be recorded as failed by the
calling process when running a workflow).

Looking at how newlines are handled in the JobRunnerManager.job_kill method,
it looks like this a typo in the jobs_submit method.
@hjoliver hjoliver force-pushed the fix_job_runner_newline branch from 28cfe66 to 32aed6a Compare October 23, 2024 01:43
@@ -89,7 +89,7 @@ def submit(cls, job_file_path, submit_opts):
exc.filename = "nohup"
return (1, None, str(exc))
else:
return (0, "%d\n" % (proc.pid), None)
return (0, str(proc.pid), None)
Copy link
Member

@hjoliver hjoliver Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This newline was compensating for the bug, pre-fix).

@@ -424,8 +424,7 @@ def submit(
ret_code:
Subprocess return code.
out:
Subprocess standard output, note this should be newline
terminated.
Subprocess standard output.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We now add the newline automatically)

@hjoliver
Copy link
Member

@oliver-sanders - I've finished this off, and simplified the affected code a bit.

Not sure how to go about testing it.

@hjoliver hjoliver marked this pull request as ready for review October 23, 2024 01:53
@oliver-sanders
Copy link
Member

Not sure how to go about testing it.

Have had a go: hjoliver#59

….add_tests

Fix job runner newline.add tests
@hjoliver
Copy link
Member

Merged and pushed your new test @oliver-sanders - this is good to go now.

@oliver-sanders
Copy link
Member

Great, the other coverage hole was indeed a false negative, Codecov now says 100% of diff hit.

@oliver-sanders oliver-sanders modified the milestones: 8.3.x, 8.3.7 Nov 25, 2024
@oliver-sanders oliver-sanders merged commit c266fd8 into cylc:8.3.x Nov 25, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants