fq lint module update: exit on failed validation #7000
Closed
We shouldn't change the module; generally we try to let the tools work as they do, without overlaying additional logic.
We can add some logic to the subworkflow + workflow that would filter out any libraries failing linting. You may already have seen the logic in rnaseq that we use for trimming failures, strand failures etc., and we can use the same mechanism.
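As a rough sketch of that idea (the channel names and the boolean pass flag are hypothetical illustrations, not the actual rnaseq implementation), such filtering at the subworkflow level might look like:

```groovy
// Hypothetical sketch: join reads with a per-sample lint status channel and
// keep only the samples that passed. Channel shapes and the `passed` flag
// are illustrative assumptions, not code from rnaseq.
ch_passed_reads = ch_reads
    .join(ch_lint_status)                       // [ meta, reads, passed ]
    .filter { meta, reads, passed -> passed }
    .map { meta, reads, passed -> [ meta, reads ] }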
But my understanding is that fq_lint will return a non-zero error code on a failure? That should trigger a stop without this. Am I mistaken?
@pinin4fjords Yeah, kind of. fq lint has different validators that are each assigned a different code, and if a FastQ file fails the linting, the log will show which code it failed on. You can't grep for the codes themselves in the log, because the log states upfront which validators are enabled by code. That's why I had to grep for "fq-lint end" and check whether it's missing. That's the last line you'll see in the linting log if the linting was successful, so failed FastQ files won't have it in their log.
So, fq lint doesn't actually put out an error code itself. If we left the current module as is, that process would complete successfully on any FastQ file, even corrupt ones, which defeats the purpose of adding the linting to the pipeline.
I am definitely open to trying out some different logic to filter out samples that failed linting. Are you thinking those samples would just be filtered out and the pipeline would keep going with the good samples, or that the pipeline would exit if it finds a sample that failed linting? We find that our users often want the pipeline to stop so that they can try reuploading (or redownloading from source and then reuploading) the offending FastQ file, which has worked in a lot of cases. Maybe we could add a conditional so that a user can choose whether the pipeline exits when a sample fails linting or continues with the successful samples?
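For reference, the marker check described above might look something like this (the log filename is illustrative, not taken from the actual module; the sample log is fabricated to stand in for a failed lint run):

```shell
# Write a sample log that lacks the success marker, as a stand-in for a
# failed lint run ("fq-lint end" only appears when validation succeeds).
# The filename lint.log is an illustrative assumption.
printf 'fq-lint start\nvalidator failed\n' > lint.log

# The check described above: flag the sample if the marker is absent.
if ! grep -q "fq-lint end" lint.log; then
    echo "lint did not finish cleanly for this FastQ file"
fi
```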
What I mean is the following. Assume a bad fastq file, foo.fq. Running fq lint foo.fq on it produces a non-zero error code. Because Nextflow runs processes with set -e turned on, the task will fail when it encounters that issue, so your addition will have no effect. I see that fq has its own codes, but that wasn't what I meant.
But the other point I mentioned is important. Modules in nf-core should reflect the native behaviour of the tool as much as possible; otherwise users don't know what to expect. We shouldn't layer in additional errors as you were doing here (even if it worked). If this WAS a problem (and again, I don't think it is, because of set -e), we would need to introduce any custom logic at the pipeline level, perhaps as a 'local' module that parses the error logs.
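A minimal standalone illustration of that set -e behaviour, using false as a stand-in for a failing fq lint invocation:

```shell
# Without set -e, a failing command is ignored and the script carries on,
# so the overall exit code is 0.
bash -c 'false; echo "kept going"'
echo "exit without set -e: $?"

# With set -e (how Nextflow wraps task scripts), the first non-zero exit
# aborts the script immediately, so the task fails.
bash -c 'set -e; false; echo "kept going"'
echo "exit with set -e: $?"
```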
Oh, ok, I follow now! Let me take out the custom code from the module.
I think I got mixed up: I noticed the pipeline was trying to re-run failed linting jobs, which I didn't want unless the linting job was failing due to a pipeline resource error. How are those two scenarios handled differently in Nextflow? That is, if the linting completes but the FastQ file was bad and produces a non-zero error code, I would want the pipeline to fail; however, if the linting doesn't complete because of a resource issue or something, I would want the pipeline to retry that task.
For the specific rnaseq example, we'd just make sure to set the appropriate label so the process gets enough resources to start.
But to the general point, you can use dynamic retry strategies to catch exit codes reflecting e.g. OOM.
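A sketch of such a dynamic strategy in a pipeline config (the process name and the exact exit-code list are assumptions; which codes indicate an out-of-memory kill depends on the executor):

```groovy
// Hypothetical process configuration: retry on exit codes that typically
// indicate the task was killed for exceeding resources (e.g. 137 = SIGKILL,
// often from the OOM killer), and stop the run on any other non-zero code.
process {
    withName: 'FQ_LINT' {
        errorStrategy = { task.exitStatus in [137, 138, 139, 140] ? 'retry' : 'finish' }
        maxRetries    = 2
    }
}
```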
Actually you could PR against this, if you feel the current value isn't appropriate.