Question about using crossMapParallel #102
Comments
Dear Young-Ho, Indeed, in that tutorial I only showcase the simpler
Please let me know if you have further questions and thank you for trying out our pipeline! 💎 Best wishes, |
Francisco, thank you for your answer. I've tried There was no error until
I think there is some error in line 20 of my Snakefile about rule all.
Could you give me some advice on solving this problem? Best regards, |
Hi Young-Ho, Thank you for reporting back. Here are some of my ideas:
Lines 598 to 623 in b11dd7c
The line is in the output field is silenced by default to avoid conflicting/identical output rule definitions. E.g. to tell the Lines 390 to 406 in b11dd7c
e.g.
Please let me know if this helps with your problems. |
Hi, Francisco. Thanks to your advice, the problem could be solved. I modified Snakefile as you said:
After that, I could run But, when I tried to run
The required files that were shown in error messages appear to be in the
Any comment to resolve this problem would be appreciated. Best regards, |
Hi Young-Ho, The fact that the error is pointing towards Just to make sure, could you also double check the following?
Please let me know if this helps! |
Dear Francisco, Sorry for the late reply. I was busy with other 16S data processing using QIIME2 and Bioconductor. For suggestion 1, I commented out rule
I could solve the above problem by uncommenting line 683 of the Snakefile (
For suggestion 2, I believe my coverage_table.tsv files have the correct dimensions, as follows:
channelbiome:~/metaGEM/concoct$ wc -l `find . -name coverage_table.tsv`
7718 ./sample1/cov/coverage_table.tsv
10051 ./sample2/cov/coverage_table.tsv
7843 ./sample3/cov/coverage_table.tsv
25612 total
channelbiome:~/metaGEM/concoct$ head -3 ./sample1/cov/coverage_table.tsv
target_id kallisto_coverage_sample1 kallisto_coverage_sample2 kallisto_coverage_sample3
k119_5532-flag=1-multi=5.0000-len=1352.concoct_part_0 6.508876 0.000000 0.000000
k119_22129-flag=1-multi=4.0000-len=1356.concoct_part_0 6.047198 0.000000 0.000000
channelbiome:~/metaGEM/concoct$ head -3 ./sample2/cov/coverage_table.tsv
target_id kallisto_coverage_sample1 kallisto_coverage_sample2 kallisto_coverage_sample3
k119_59674-flag=1-multi=2.0000-len=1040.concoct_part_0 0.769231 1.153846 4.038462
k119_44204-flag=1-multi=3.0000-len=1069.concoct_part_0 2.245089 14.593078 4.677268
channelbiome:~/metaGEM/concoct$ head -3 ./sample3/cov/coverage_table.tsv
target_id kallisto_coverage_sample1 kallisto_coverage_sample2 kallisto_coverage_sample3
k119_39765-flag=1-multi=5.0000-len=1125.concoct_part_0 7.111111 0.000000 0.711111
k119_20930-flag=1-multi=2.0000-len=1196.concoct_part_0 3.344482 0.000000 1.337793
I always appreciate your detailed and kind response. Best regards, |
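The dimension check above can also be scripted. The following is a minimal Python sketch, assuming each table is tab-separated with a `target_id` column followed by one `kallisto_coverage_<sample>` column per sample, as in the `head` output above. The function name `check_coverage_table` is hypothetical and not part of metaGEM.

```python
import csv
from pathlib import Path


def check_coverage_table(path, samples):
    """Return (n_rows, problems) for one coverage table.

    Assumes a tab-separated file whose header is `target_id` followed by
    one `kallisto_coverage_<sample>` column per sample (an assumption
    based on the `head` output shown in the thread).
    """
    expected = ["target_id"] + [f"kallisto_coverage_{s}" for s in samples]
    problems = []
    n_rows = 0
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)
        if header != expected:
            problems.append(f"unexpected header: {header}")
        # Data rows start on line 2 of the file.
        for line_no, row in enumerate(reader, start=2):
            n_rows += 1
            if len(row) != len(expected):
                problems.append(f"line {line_no}: {len(row)} fields")
    return n_rows, problems


if __name__ == "__main__":
    samples = ["sample1", "sample2", "sample3"]
    for sample in samples:
        table = Path(sample) / "cov" / "coverage_table.tsv"
        if table.exists():
            print(table, *check_coverage_table(table, samples))
```

Run from the `concoct` folder, this prints the row count and any mismatched lines for each sample's table; an empty problem list means every row has one value per sample.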
Dear Young-Ho, Apologies for my late response.
There is no need to re-execute the rules from fastp to concoct, since you have already generated the individual binner results. The next step is to refine them, as you have attempted above. I believe that Snakemake is getting mixed up with rule dependencies between the two mapping workflows, and that is the reasoning behind the suggestion to remove potentially problematic rules that have already finished running or are not needed for the desired mapping mode (i.e. series or parallel). Sorry for these issues, the Snakefile is configured by default for the mapping in-series workflow. I will consider how to improve this aspect; perhaps it makes sense to have separate Snakefiles for the different mapping modes. I have found that Snakemake sometimes gets confused and complains about incomplete or missing files when it shouldn't, or it tries to re-generate files that are already present. To avoid this, it can sometimes help to delete the hidden snakemake folder that is automatically generated in your metaGEM directory e.g. Also, I am curious whether these problems are specific to a version of Snakemake. Could you tell me what version you are using? e.g. Thanks and best wishes, |
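The cleanup suggested above amounts to removing the `.snakemake` folder from the metaGEM directory. Here is a minimal Python sketch, assuming the folder sits directly under the working directory; `clear_snakemake_metadata` is a hypothetical helper, not part of metaGEM or Snakemake.

```python
import shutil
from pathlib import Path


def clear_snakemake_metadata(workdir="."):
    """Delete the hidden .snakemake folder under workdir, if present.

    Returns True when a folder was removed, False when none existed.
    """
    meta = Path(workdir) / ".snakemake"
    if meta.is_dir():
        shutil.rmtree(meta)
        return True
    return False


if __name__ == "__main__":
    removed = clear_snakemake_metadata(".")
    print("removed .snakemake" if removed else "no .snakemake folder found")
```

Note that this also discards whatever else Snakemake keeps in that folder, such as job metadata and logs. The installed Snakemake version can be printed separately with `snakemake --version`.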
Dear Francisco, My Snakemake version was 7.2.1
rm -rf assemblies/ benchmarks/ concoct/ kallisto* qfiltered/ stats/* tmp/* .snakemake/*
And executed all metaGEM tasks from
Running metabat2 ...
terminate called after throwing an instance of 'boost::wrapexcept<boost::program_options::too_many_positional_options_error>'
what(): too many positional options have been specified on the command line
/bin/bash: line 57: 2648 Aborted (core dumped) metabat2 -i $fsampleID.fa $id.sort -s 50000 -v --seed 420 -t 0 -m 1500 -o $(basename $(dirname /home2/channelbiome/metaGEM/metabat/sample3/sample3.metabat-bins))
I think I've found a clue to solving this problem. The version of
When I try to run metabat on the bash command line as follows,
metabat2 -i sample2.fa sample2.metabat-bins.sort -s 50000 -v --seed 420 -t 0 -m 1500 -o sample2
So, I changed line 697 of the Snakefile from After that, But,
Regards, |
Dear Young-Ho, Thank you very much for reporting back with this versioning issue. Indeed, I just checked the tool versions specified in the metaGEM paper (see Table 1 in methods section), and it shows
Just to clarify, when I mention intermediate files I am referring to the files in your main metaGEM folder under subfolder names like
It should be noted that the error here is not thrown by maxbin, but rather it is Snakemake that cannot resolve the file dependencies between the different tasks. I think I see what the problem is here, and apologies for not catching it earlier. As I quoted in my first comment:
Essentially, for the parallel cross map workflow, you should be using the It was quite tricky dealing with this issue when we were developing/implementing. You can read a bit more about it in this issue. How many metagenomic samples do you have? And what type of microbiomes are you looking at? Perhaps you may want to try the series mapping mode? This could be another potential solution: split your dataset into chunks for cross mapping and binning. For example, if you had 200 samples you could split them into four 50-sample sub-datasets, cross map within those, and then generate bins. I hope this helps! |
Dear Francisco, Sorry for the late reply. Our team is trying to collect around 400 samples. Regards, |
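The chunking idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample IDs are placeholders, `split_into_chunks` is a hypothetical helper that is not part of metaGEM, and samples are grouped in list order rather than by any biological criterion.

```python
def split_into_chunks(sample_ids, chunk_size):
    """Split sample IDs into consecutive sub-datasets of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    return [sample_ids[i:i + chunk_size]
            for i in range(0, len(sample_ids), chunk_size)]


if __name__ == "__main__":
    # Placeholder IDs standing in for 200 real sample names.
    samples = [f"sample{i}" for i in range(1, 201)]
    chunks = split_into_chunks(samples, 50)
    # Each chunk would then be cross mapped and binned on its own.
    for number, chunk in enumerate(chunks, start=1):
        print(f"chunk {number}: {len(chunk)} samples ({chunk[0]} to {chunk[-1]})")
```

In practice it may make more sense to group samples that are expected to share organisms (for example by site or time point) instead of splitting by list position.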
Dear Mr. Zorrilla,
Thank you for developing such a wonderful tool.
I could go through all tasks described in https://github.com/franciscozorrilla/unseenbio_metaGEM/README.md,
using crossMapSerial.
But, after using crossMapParallel, the concoct task fails with the following error messages:
Are there other tasks that should be done to use crossMapParallel, such as kalistoIndex or kalisto2concoct?
I've heard that crossMapSerial becomes impractical for large datasets (#60 (comment)). So, I hope to go through all metaGEM tasks using crossMapParallel.
Best regards,
Young-Ho