Simplify pipeline parameters #202

WackerO · 2023-11-22T10:37:33Z

Description of feature

From some of the recent issues (and also from my own experience) I think the multitude of semi-optional parameters in the pipeline is a problem.

Something like exploratory_assay_names is not marked as a required param in the Nextflow schema, but effectively it is: if the param is not set correctly, the pipeline will fail. Other params, like those for a GTF file (e.g. features_id_col), are expected even when the user does not input a GTF file.
The param differential_file_suffix should either be removed completely (as it depends on the input type of the dataset) or be made optional; the suffix should be determined automatically by the pipeline.
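
The automatic suffix detection proposed here could look roughly like the following Python sketch. It is only an illustration: the `input_type` key, the `resolve_differential_suffix` helper, and the suffix values in the table are hypothetical placeholders, not the pipeline's actual parameters or defaults. An explicitly supplied differential_file_suffix still takes precedence, so the param remains available as an optional override.

```python
# Hypothetical mapping from input type to results-file suffix.
# These values are placeholders for illustration only.
SUFFIX_BY_INPUT_TYPE = {
    "rnaseq": ".rnaseq.results.tsv",
    "affy_array": ".array.results.tsv",
}


def resolve_differential_suffix(params):
    """Return the suffix to use for differential results files.

    An explicit 'differential_file_suffix' wins; otherwise the suffix is
    derived from the (hypothetical) 'input_type' entry in params.
    """
    explicit = params.get("differential_file_suffix")
    if explicit:
        return explicit

    input_type = params.get("input_type")
    if input_type not in SUFFIX_BY_INPUT_TYPE:
        raise ValueError(
            f"Cannot derive a suffix for input type {input_type!r}; "
            "set differential_file_suffix explicitly."
        )
    return SUFFIX_BY_INPUT_TYPE[input_type]
```

With this shape, users who never touch the suffix get a sensible value, while a clear error message replaces a late pipeline failure when the input type is unknown.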

I'll see which parts I can simplify. I think this would make the pipeline easier to run, especially for users who don't know much about coding. If anyone has additional ideas, please post them here!


olgabot · 2023-11-27T17:50:54Z

Hi @WackerO -- could this issue be caused by exploratory_assay_names? #196

I don't see --exploratory_assay_names listed here in the documentation:

But I do see it in the main branch:

differentialabundance/nextflow_schema.json, line 385 in a3d664c:

"exploratory_assay_names": {

pinin4fjords · 2023-12-14T13:14:46Z

> Something like exploratory_assay_names is not marked as a required param in the Nextflow schema, but effectively it is: if the param is not set correctly, the pipeline will fail.

That's just an oversight in the schema definition, please feel free to fix. But we should probably put this option and the suffix one (and similar) in a new 'Advanced options' section. This IS a parameter, even if the users don't change it, and it's nice for advanced users to be able to tweak the suffixes if they want.

You'll notice that exploratory_assay_names is a hidden parameter to discourage users from changing it (which is why you can't see it in the UI @olgabot - you shouldn't need to interact with it). We could do the same with others.
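
To see which parameters are hidden in this way, one could scan the schema for properties flagged as hidden. The Python sketch below assumes the schema nests parameters under "properties" keys and marks hidden ones with "hidden": true; the `example_schema` dict, its group name, and the second parameter are made up for illustration and are not copied from nextflow_schema.json.

```python
def hidden_params(schema):
    """Collect names of all properties marked with "hidden": true.

    Walks the schema recursively, so it does not depend on how the
    parameter groups are named or nested.
    """
    found = []

    def walk(node):
        if not isinstance(node, dict):
            return
        for name, spec in node.get("properties", {}).items():
            if isinstance(spec, dict) and spec.get("hidden") is True:
                found.append(name)
        for value in node.values():
            walk(value)

    walk(schema)
    return sorted(found)


# Illustrative stand-in for a schema; group and second param are hypothetical.
example_schema = {
    "definitions": {
        "example_group": {
            "properties": {
                "exploratory_assay_names": {"type": "string", "hidden": True},
                "some_visible_param": {"type": "string"},
            }
        }
    }
}

print(hidden_params(example_schema))
```

For a real schema file, the dict could be loaded with `json.load` and passed to the same function.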

The alternative is hard-coding all these things into the workflow code, which I don't think would be an improvement.

> Other params, like those for a GTF file (e.g. features_id_col), are expected even when the user does not input a GTF file.

That's because it's required even in the absence of a GTF file: it's needed everywhere we cross-reference matrix rows with feature annotation (which is lots of places). If there is a context in which the pipeline can work without this parameter, feel free to highlight it, and we can make the parameter optional (and add checks everywhere it's needed but not supplied).

WackerO · 2024-01-10T12:57:55Z

I'm so sorry, for some reason I was not notified of the activity in this issue and only now saw your responses!

> Hi @WackerO -- could this issue be caused by exploratory_assay_names? #196

Hmm, I'm not entirely sure, but I don't think that issue is related to the names...

> You'll notice that exploratory_assay_names is a hidden parameter to discourage users from changing it (which is why you can't see it in the UI @olgabot - you shouldn't need to interact with it). We could do the same with others.

Aah, fair enough!

> That's because it's required even in the absence of a GTF file: it's needed everywhere we cross-reference matrix rows with feature annotation (which is lots of places). If there is a context in which the pipeline can work without this parameter, feel free to highlight it, and we can make the parameter optional (and add checks everywhere it's needed but not supplied).

You are indeed right! I think when I wrote that point, I was looking at the description of the param in the Nextflow schema, which is "Feature ID attribute in the GTF file (e.g. the gene_id field)". I think this should be explained in a bit more detail; I'll think of some text and you can tell me what you think of it.

I'll start working on a PR to at least make the name_col params optional (so that by default, the respective id_col will be used instead of some hard-coded value like gene_name) and we can see what else can be simplified in that PR :)
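
The fallback described here could be sketched in Python as follows. This is a minimal illustration, not the code from the eventual PR: `resolve_name_col` is a hypothetical helper, and `features_name_col` is assumed by analogy with features_id_col.

```python
def resolve_name_col(params, prefix):
    """Pick the name column for a given param prefix (e.g. 'features').

    Falls back to the matching id column when no name column is set,
    instead of relying on a hard-coded value such as 'gene_name'.
    """
    id_col = params.get(f"{prefix}_id_col")
    if not id_col:
        # The id column is needed wherever matrix rows are matched to
        # annotation, so fail early with a clear message.
        raise ValueError(f"{prefix}_id_col must be set")

    name_col = params.get(f"{prefix}_name_col")
    return name_col or id_col


# Name column omitted: the id column is used instead.
print(resolve_name_col({"features_id_col": "gene_id"}, "features"))

# Name column supplied: it takes precedence.
print(resolve_name_col(
    {"features_id_col": "gene_id", "features_name_col": "gene_name"},
    "features",
))
```

Keeping the id column mandatory while making the name column optional matches the point made earlier in the thread that the id column is required whether or not a GTF file is supplied.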

WackerO · 2024-07-04T12:59:03Z

Closed by #254

WackerO added the enhancement (New feature or request) label Nov 22, 2023

WackerO self-assigned this Nov 22, 2023

WackerO added this to Hackathon: May 2024 Mar 5, 2024

WackerO mentioned this issue Mar 21, 2024 in "Some parameter changes, added qbic credits" #254 (merged)

WackerO closed this as completed Jul 4, 2024
