Update biobox_add_taxid wrapper #6344

SantaMcCloud · 2024-09-20T09:53:29Z

FOR CONTRIBUTOR:

I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
License permits unrestricted use (educational + commercial)
This PR adds a new tool or tool collection
This PR updates an existing tool or tool collection
This PR does something else (explain below)

SantaMcCloud · 2024-09-20T09:53:35Z

I changed the column selector from data_column to integer, because there can be multiple input files that all share the same column. With data_column this does not work: my workflow generates a collection, and the parameter then has no dataset to use as the reference for the column, or at least it still throws an error if you try to use it this way.
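To illustrate the reasoning behind an integer parameter, here is a minimal Python sketch. It is not the wrapper's code: the function name `extract_column`, the three-column layout, and the sample rows are made up for illustration. It only shows that a single 1-based column index can be applied to any number of tab-separated files, provided they share the same layout.

```python
import csv
import io


def extract_column(tsv_text: str, column: int) -> list[str]:
    """Return the values of a 1-based column from tab-separated text."""
    if column < 1:
        raise ValueError("column is 1-based and must be >= 1")
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip completely empty lines; every other row must have the column.
    return [row[column - 1] for row in reader if row]


# Two hypothetical files with the same (assumed) layout: id, name, taxid.
bacteria = "bin_1\tGenus one\t101\nbin_2\tGenus two\t102\n"
archaea = "bin_3\tGenus three\t201\n"

# One integer selects the same column in every file,
# so no single reference dataset is required.
taxids = [extract_column(text, 3) for text in (bacteria, archaea)]
print(taxids)
```

The trade-off is that an integer is not validated against any file, so a wrong index only shows up when the tool runs.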

SantaMcCloud · 2024-09-20T12:31:13Z

@bgruening could you merge this quickly so that it is in scope for the bot update? I really need it this weekend!

bgruening · 2024-09-22T07:10:30Z

Can you please explain this in more detail?

This change seems backwards and maybe there is a Galaxy bug to fix.
Don't worry about the updates, I can install tools during the week if urgent.

SantaMcCloud · 2024-09-22T10:45:12Z

Yes, I can explain it more.

In this workflow https://usegalaxy.eu/u/santinof/w/gtdb-tk-subworkflow-1 GTDB-Tk may output 2 summary files. These 2 files then run through 2 other tools. The last of these is Names2taxID, whose output is needed as input for this tool. The problem is that you have to set the column in which the names are listed in the Names2taxID output, but since there are 2 files, Galaxy cannot refer to a single file with the data_column type. So I changed it to an integer as a workaround: both files have the same format, so they share the column that has to be specified.

Here is the error message:

parameter 'column': Dataset 'None' for data_ref attribute 'taxonkit' of parameter 'column' is not a DatasetInstance

And here is a history as an example:
https://usegalaxy.eu/u/santinof/h/mag-benchmark-workflow-without-batcami-low-1

I hope this explains the change. If not, I can give more details about it!

SantaMcCloud · 2024-09-22T21:26:56Z

Okay, maybe this change isn't needed. I find it strange that when I got only 1 summary file from GTDB-Tk, I ended up with a list of lists instead of 2 files in a list. I only noticed this now, because I let my workflow run up to the error so that I could work with the data manually to get some results. There was one history where this error did not appear, since GTDB-Tk yielded a list with 2 files and not a list of lists.

I will now try the workflow with the flatten tool to see whether that removes the error, and I will then either close the PR or give more details here.

SantaMcCloud · 2024-09-23T10:10:33Z

The workflow still hit the error, this time in both runs:

https://usegalaxy.eu/u/santinof/h/mag-benchmark-workflow-without-batcami-low-3
https://usegalaxy.eu/u/santinof/h/mag-benchmark-workflow-without-batmarine-sample-0-3

Here is the history where the tool did work. The only difference was that the output from Names2taxID was not flattened before being passed to biobox add taxid.

https://usegalaxy.eu/u/santinof/h/mag-benchmark-workflow-without-batmarine-sample-0-2

SantaMcCloud · 2024-09-23T10:14:50Z

It should not happen that the flatten tool runs on a collection that has not been created yet, should it?
Could this be the bug? Names2taxID creates a list of lists, and after flattening it should be a plain list, but since flatten ran early it carried over the list of lists. Really strange.

In the linked history where the error did not appear, the flatten tool seems to have worked, but only there.

After Names2taxID had run, I tried flatten again, and there you can see that it works, so I think there is a bug in Galaxy with flatten?

For my workflow, I am trying a workaround: putting flatten into a subworkflow to see whether it works there, since it is then forced to wait for the result.

paulzierep · 2024-09-23T13:17:56Z

Mhh, I can only assume this, but the input in the history you provided is list(samples):list(summary files). I assume that, as such, the tool tries to get the column from the first level (which is a collection, not a file). Maybe we could just merge the summary files (one is for archaea and one for bacteria, right?) to get around the hard-to-handle collection structure?
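As a rough model of the structure described here, the nested collection can be pictured as a mapping of sample identifiers to mappings of file identifiers. The sketch below is plain Python, not Galaxy code; `flatten_collection`, the identifiers, and the file names are hypothetical, and joining identifiers with an underscore is an assumption about how a flattened element would be named.

```python
def flatten_collection(
    nested: dict[str, dict[str, str]], sep: str = "_"
) -> dict[str, str]:
    """Turn a list:list-like mapping into a single-level mapping."""
    flat: dict[str, str] = {}
    for outer_id, inner in nested.items():
        for inner_id, dataset in inner.items():
            # Combine both identifiers so that elements stay unique.
            flat[f"{outer_id}{sep}{inner_id}"] = dataset
    return flat


# list(samples):list(summary files), with made-up identifiers.
nested = {
    "sample_1": {"bacteria": "s1_bac.tsv", "archaea": "s1_arc.tsv"},
    "sample_2": {"bacteria": "s2_bac.tsv"},
}

flat = flatten_collection(nested)
print(list(flat))  # ['sample_1_bacteria', 'sample_1_archaea', 'sample_2_bacteria']
```

In the nested form, the first level holds collections rather than files, which matches the assumption above about where the column lookup fails. After flattening, every element is a file.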

paulzierep · 2024-09-23T13:27:36Z

In general, I am wondering how the logic of multiple datasets chosen for taxonkit works together with data_column, since data_column can only choose from one file. Maybe using an integer is a good workaround in this case.

paulzierep · 2024-09-23T13:37:38Z

Can you also explain why there can be multiple inputs here: https://github.com/galaxyproject/tools-iuc/blob/303002db06287fb25306020c4391626842f52162/tools/cami_amber/biobox_add_taxid.xml#L86C23-L86C115

SantaMcCloud · 2024-09-23T13:45:34Z

Mhh, I can only assume this, but the input in the history you provided is list(samples):list(summary files). I assume that, as such, the tool tries to get the column from the first level (which is a collection, not a file). Maybe we could just merge the summary files (one is for archaea and one for bacteria, right?) to get around the hard-to-handle collection structure?

Correct, that is why I want to use the flatten tool to get all datasets onto one level, but the explanation above shows that this tool runs without waiting for the needed outputs. Even if we merge them, in the list:list situation it will still yield this error. You can see this in the cami error workflow linked above: there Names2taxID has only 1 file as output, but it is still of the list:list type, which means it does not work.

SantaMcCloud · 2024-09-23T13:49:25Z

In general, I am wondering how the logic of multiple datasets chosen for taxonkit works together with data_column, since data_column can only choose from one file. Maybe using an integer is a good workaround in this case.

From what I tested, the data_column param type can still be used when there are multiple files. The only problem that can occur is that the files do not share a specific format, so the chosen column is not the same across all files.

The error is still shown when trying this manually, but Galaxy still runs the tool. You can see this in the history without the error (marine-sample-0-2) linked above. There you can try to run biobox add taxid to see the "error" message in the GUI for the column parameter.

SantaMcCloud · 2024-09-23T13:50:04Z

Can you also explain why there can be multiple inputs here: https://github.com/galaxyproject/tools-iuc/blob/303002db06287fb25306020c4391626842f52162/tools/cami_amber/biobox_add_taxid.xml#L86C23-L86C115

Can you name the input that I should explain further? :)

bernt-matthias · 2024-09-23T14:36:53Z

So the main problem here is that you have nested lists. Is this expected or a potential problem of the tools running upstream in the workflow? I do not understand yet: does flattening the collection not help?

SantaMcCloud · 2024-09-23T14:52:38Z

So the main problem here is that you have nested lists. Is this expected or a potential problem of the tools running upstream in the workflow? I do not understand yet: does flattening the collection not help?

Correct, and it is not expected, since I only want a list as input. I cannot explain how this happens, but this is why I built in the flatten tool to eliminate the nested list.

Now to the real problem: It seems that flatten is executed right after the job is created, which does not follow the workflow logic, since it should have waited for Names2taxID to finish, because that is its input.

I am now trying to work around that: I split my subworkflow into 2 workflows so that flattening happens in the second one and is (hopefully) forced to wait until all outputs from the first subworkflow are created.

I hope this helps in understanding the problem a bit better?

bernt-matthias · 2024-09-23T14:57:27Z

Now to the real problem:

That might also be a problem, but I think your primary problem is that an upstream tool produces a nested list, and you/we need to understand why.

SantaMcCloud · 2024-09-23T15:06:17Z

Okay, now I know how the nested list is generated. It comes from the batch mode of a different tool, which is expected, since GTDB-Tk, which runs upstream, can produce 2 files.
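The batch-mode effect can be pictured with a small Python model. This is a sketch under stated assumptions, not Galaxy's scheduler: `summarize` stands in for any tool whose single job emits a list of 1 or 2 files, `run_in_batch` stands in for running that tool once per sample, and the `_mixed` suffix is an invented way to decide which samples get a second file.

```python
def summarize(sample: str) -> dict[str, str]:
    """Stand-in for a tool that emits a list of 1 or 2 summary files."""
    outputs = {"bacteria": f"{sample}.bacteria.tsv"}
    if sample.endswith("_mixed"):  # hypothetical rule for a second file
        outputs["archaea"] = f"{sample}.archaea.tsv"
    return outputs


def run_in_batch(tool, samples: list[str]) -> dict[str, dict[str, str]]:
    """Run the tool once per sample and keep each job's list as one element."""
    return {sample: tool(sample) for sample in samples}


result = run_in_batch(summarize, ["s1", "s2_mixed"])

# Every outer element is itself a list, even when it holds a single file.
for sample, files in result.items():
    print(sample, list(files))
```

Under this model the nesting does not depend on how many files a job produces, which would be consistent with seeing a list of lists even when only 1 summary file exists.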

Now I have a question, since I could not find one: is there a tool to merge 2 TSV files into one file, where the content is merged by row and not by column? This might solve the problem. The alternative is to change this tool so that the param is an integer and not data_column.

bernt-matthias · 2024-09-23T15:41:13Z

There are quite a few tools to concatenate files (one below the other), e.g. https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Ftext_processing%2Ftp_cat%2F9.3%2Bgalaxy1&version=latest

For pasting (adding new columns) https://usegalaxy.eu/?tool_id=Paste1&version=latest
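The difference between the two kinds of tools can be sketched in plain Python. The functions below are illustrative stand-ins and not the implementation of the linked Galaxy tools, and the sample rows are made up.

```python
def concatenate_rows(*tables: str) -> str:
    """Stack tables one below the other, like `cat`."""
    # Make sure each non-empty table ends with a newline before joining.
    parts = [t if t.endswith("\n") else t + "\n" for t in tables if t]
    return "".join(parts)


def paste_columns(left: str, right: str, sep: str = "\t") -> str:
    """Join two tables side by side, line by line.

    Note: zip stops at the shorter table, a simplification of this sketch.
    """
    pairs = zip(left.splitlines(), right.splitlines())
    return "".join(f"{a}{sep}{b}\n" for a, b in pairs)


bacteria = "bin_1\tGenus one\nbin_2\tGenus two\n"
archaea = "bin_3\tGenus three\nbin_4\tGenus four\n"

stacked = concatenate_rows(bacteria, archaea)
pasted = paste_columns(bacteria, archaea)

# Concatenation adds rows and keeps the column count.
print(len(stacked.splitlines()), len(stacked.splitlines()[0].split("\t")))  # 4 2
# Pasting keeps the row count and adds columns.
print(len(pasted.splitlines()), len(pasted.splitlines()[0].split("\t")))  # 2 4
```

Merging two summary files by row corresponds to the first function. If each file carries its own header line, plain concatenation would repeat it, which this sketch does not handle.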

Will close this here. Feel free to reopen if you still think it's a bug. Otherwise we can continue the discussion on Gitter:

https://matrix.to/#/#galaxyproject_iwc:gitter.im for workflow questions
https://matrix.to/#/#galaxy-iuc_iuc:gitter.im for tool questions

or of course https://help.galaxyproject.org/

bgruening · 2024-09-28T12:14:54Z

@SantaMcCloud there was also a fix, maybe related, in Galaxy, so check out latest EU.

fix tool

9c55c25

bernt-matthias closed this Sep 23, 2024
