
make main.nf process multiple datasets #4

Open
tischi opened this issue Nov 10, 2023 · 6 comments
Labels
question Further information is requested

Comments

@tischi
Contributor

tischi commented Nov 10, 2023

Hi @BioinfoTongLI,

@sebgoti and I were wondering how to process multiple images with the current workflow.

I guess this line of code would need some modification?

@tischi tischi added the question Further information is requested label Nov 10, 2023
@BioinfoTongLI
Contributor

Yes, but that is not the ultimate solution.
The ultimate solution would be to write all the metas and their corresponding paths into a JSON or YAML file and pass it with -params-file.
Then split them into channels using channel.from.
I can commit an example file later.
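A minimal sketch of the params-file approach, assuming DSL2 and a hypothetical `datasets` list in the YAML file. The field names `id`, `image`, `sigma` and `out_dir` are illustrative only, not the layout of this repository's actual params file. Note that `channel.from` is deprecated in current Nextflow; `Channel.fromList` is the equivalent for a list read from params.

```nextflow
// Hypothetical params file, passed with: nextflow run . -params-file my_params.yaml
//
// datasets:
//   - id: sample_A
//     image: data/sample_A
//     sigma: 1.0
//     out_dir: output/sample_A
//   - id: sample_B
//     image: data/sample_B
//     sigma: 2.5
//     out_dir: output/sample_B

nextflow.enable.dsl = 2

workflow {
    // One channel element per entry of the (hypothetical) params.datasets list
    datasets = Channel.fromList(params.datasets)
        .map { d ->
            // Keep the per-dataset settings together with the input path
            def meta = [id: d.id, sigma: d.sigma, out_dir: d.out_dir]
            tuple(meta, file(d.image))
        }

    datasets.view()
}
```

Each element is a `[meta, path]` tuple, the same shape nf-core modules use, so downstream processes can read `meta.sigma` or `meta.out_dir` per dataset.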

@tischi
Contributor Author

tischi commented Nov 10, 2023

I often use wildcards:

nextflow run main.nf --inputPattern "IJ-TIFF/*_s01.tif"

And then inside main.nf:

images = Channel.fromPath("${params.inputPattern}")

Is that bad practice?
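For reference, a minimal self-contained `main.nf` built around that line might look like the sketch below, assuming DSL2. The default pattern and the `id` derived from the file name are illustrative choices, not part of this repository. The pattern is quoted on the command line so that Nextflow, not the shell, expands the glob: unquoted, if the pattern matched several files, the shell would expand it and only the first match would reach `--inputPattern`.

```nextflow
nextflow.enable.dsl = 2

// Default pattern; override with --inputPattern on the command line
params.inputPattern = "IJ-TIFF/*_s01.tif"

workflow {
    // Nextflow expands the glob itself and emits one element per matching file
    images = Channel.fromPath(params.inputPattern)

    // Attach a minimal meta map; the id comes from the file name (hypothetical convention)
    images
        .map { img -> tuple([id: img.baseName], img) }
        .view()
}
```

Here each image only carries what can be derived from its file name; per-image settings and per-image output folders need an explicit source of meta, such as a params file or samplesheet.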

@BioinfoTongLI
Contributor

Not at all, that's a clever way of doing it. But in our case, each input image folder also has its own meta and output directory tightly attached to it. You will run into issues if, say, you want to apply different blurring to different sets of images. The outputs are folders as well, which makes this more complex.
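To illustrate why the meta matters, here is a sketch of a process that reads a per-dataset blur setting from `meta` and publishes each result into its own folder. The process name `BLUR`, the `sigma` field, `params.outdir` and the `data/set_A` / `data/set_B` inputs are all hypothetical, and the `script` only writes a placeholder file instead of calling a real blurring tool.

```nextflow
nextflow.enable.dsl = 2

params.outdir = "output"

process BLUR {
    // One output folder per dataset, named after its id
    publishDir "${params.outdir}/${meta.id}", mode: 'copy'

    input:
    tuple val(meta), path(image)

    output:
    tuple val(meta), path("${meta.id}_blurred")

    script:
    // Placeholder: a real pipeline would run the blurring tool here using meta.sigma
    """
    mkdir ${meta.id}_blurred
    echo "blur ${image} with sigma=${meta.sigma}" > ${meta.id}_blurred/log.txt
    """
}

workflow {
    // Two hypothetical datasets that need different amounts of blurring;
    // the input paths must exist for the tasks to be staged
    datasets = Channel.of(
        tuple([id: 'set_A', sigma: 1.0], file('data/set_A')),
        tuple([id: 'set_B', sigma: 2.5], file('data/set_B'))
    )

    BLUR(datasets)
}
```

Because both the blur setting and the output folder travel with each dataset in `meta`, they can differ per dataset without adding global parameters.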

@BioinfoTongLI
Contributor

This commit should allow multi-dataset processing:
2d7434e
The command line to run:
nextflow run . -params-file data/input_params.yaml
It should run fine the first time, but the second run will throw an error saying the folder already exists (or similar). In that case, just delete the output folder.
Basically, we need to find a better way to handle -resume and reruns.

@krokicki
Contributor

krokicki commented Nov 11, 2023

I think the nf-core way is to use a samplesheet CSV file. This is even tied into nf-core parameter validation now (see the recent Nextflow Summit talk), so you can validate the contents of the samplesheet. Kind of off topic, but I think parameter validation would be really useful. It's very easy to create a schema using the web-based schema builder.

Here's an example of using samplesheets to process multiple CZI/MVL files. Note that it still uses the old-style validation with a custom Python script.

When I get time I can work on adding this, if you all agree it's the way to go.
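A minimal sketch of the samplesheet approach using Nextflow's built-in `splitCsv` operator, assuming DSL2. The `--input` parameter and the column names `id`, `image` and `sigma` are assumptions for illustration; the schema-based validation mentioned above is left out.

```nextflow
// Hypothetical samplesheet.csv:
//
// id,image,sigma
// sample_A,data/sample_A,1.0
// sample_B,data/sample_B,2.5

nextflow.enable.dsl = 2

params.input = "samplesheet.csv"

workflow {
    samples = Channel.fromPath(params.input, checkIfExists: true)
        // Emit one map per row, keyed by the header line
        .splitCsv(header: true)
        .map { row ->
            // CSV values arrive as strings, so convert numeric fields explicitly
            def meta = [id: row.id, sigma: row.sigma.toDouble()]
            tuple(meta, file(row.image, checkIfExists: true))
        }

    samples.view()
}
```

With this layout, a run would look like `nextflow run . --input samplesheet.csv`, and each row yields the same `[meta, path]` tuple shape regardless of how many datasets the sheet lists.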

@BioinfoTongLI
Contributor

I wasn't able to attend the summit. Bummer!
Yes, this looks great to me!
