Allow collect()'ing when all processing is complete #112

Open
unode opened this issue Jun 19, 2019 · 2 comments
Comments

@unode
Member

unode commented Jun 19, 2019

Scenario:

  1. 12 samples are being processed using the parallel machinery lock1() and collect().
  2. 10 samples complete and 2 fail.
  3. The 2 failing samples are considered bad and are excluded from the sample file.

At this point, re-running ngless has no effect because all work is complete. However, the merged output from collect() was never generated.

collect() can also fail to run in rare cases, for example when the last two samples finish almost simultaneously, or when filesystem lag prevents the last two processes from seeing all samples as complete.
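To make the stuck state concrete, here is a minimal Python sketch of a check for "all listed samples are done but the merged output is missing". It is only an illustration: the function name `collect_is_pending`, the `.part` file suffix, and the directory layout are assumptions made for this example and do not describe how NGLess actually stores partial results.

```python
from pathlib import Path


def collect_is_pending(samples, partial_dir, merged_output):
    """Return True when a merge is still owed.

    Hypothetical layout: each finished sample leaves a file named
    '<sample>.part' in partial_dir. A merge is pending when every sample
    currently listed has such a file and the merged output does not exist.
    """
    partial_dir = Path(partial_dir)
    # Every sample still in the sample list must have a partial result.
    all_done = all((partial_dir / f"{s}.part").exists() for s in samples)
    # An empty sample list never triggers a merge.
    return bool(samples) and all_done and not Path(merged_output).exists()
```

In the scenario above, the check is False while the 2 failed samples are still listed, and becomes True once they are removed from the sample file, which is exactly the point where a re-run currently does nothing.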

@unode
Member Author

unode commented Sep 24, 2019

In order to keep compatibility with the current behavior (no action when finished), I'm wondering if this should be implemented through a --only-collect command-line option.

Effectively, we have to skip all actions (preprocess, map, fastq, paired, ...) except collect, but we still need a sample name for collect to act upon.
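A rough Python sketch of that idea, under stated assumptions: the step names are taken from the list above, while `plan_steps`, `run`, and the `only_collect` parameter are hypothetical names standing in for the proposed --only-collect option. Nothing here reflects existing NGLess code.

```python
def plan_steps(steps, only_collect=False):
    """Return the steps to execute; in only-collect mode keep just 'collect'."""
    if not only_collect:
        return list(steps)
    return [step for step in steps if step == "collect"]


def run(sample, steps, only_collect=False):
    """Pair each planned step with the sample it acts on.

    A sample name is required even in only-collect mode, because collect
    still needs something to act upon.
    """
    if not sample:
        raise ValueError("collect still needs a sample name to act on")
    return [(step, sample) for step in plan_steps(steps, only_collect)]
```

For example, `run("sampleA", ["preprocess", "map", "collect"], only_collect=True)` plans only the collect step for `sampleA`, while the default call keeps all three steps, so the current behavior is unchanged unless the option is given.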

@luispedro
Member

Over the long term, I would prefer an approach where, whenever ngless runs, it will create any missing outputs. The whole lock1/collect business is a bit of a hack now. This is probably for NGLess 2, though.
