strainscan_build failing due to hclsMap_95.txt file having extra lines #24

Open
kheber opened this issue Sep 21, 2024 · 3 comments

kheber commented Sep 21, 2024

In creating a custom database using strainscan_build version 1.0.14 from bioconda, I get the following error:

2024-09-21 18:22:29,037 - Constructing matrix with dashing (jaccard index)
2024-09-21 18:22:33,708 - Hierarchical clustering
Traceback (most recent call last):
  File "/data/shared_resources/conda_local/envs/strainscan/bin/strainscan_build", line 10, in <module>
    sys.exit(main())
  File "/data/shared_resources/conda_local/envs/strainscan/lib/python3.7/site-packages/StrainScan/StrainScan_build.py", line 117, in main
    cls_file, cls_res)
  File "/data/shared_resources/conda_local/envs/strainscan/lib/python3.7/site-packages/StrainScan/library/select_rep.py", line 44, in pick_rep
    clsa.append(int(ele[0]))
ValueError: invalid literal for int() with base 10: 'WARNING:'

Looking at the tail of hclsMap_95.txt, I see the following:

1	1	MIKI-NS13
2	1	MIKI-NS15
WARNING:	0	
ignoring	0	
environment	0	
value	0	
of	0	
R_HOME	0

I think these extra lines are what is causing the problem: the parser expects an integer in the first column, and 'WARNING:' is not one.

@liaoherui
Owner

Hi, thanks for using StrainScan!

This issue might be related to a problematic filename. Could you share the filename list with me? Alternatively, you can send some of your input genomes for debugging, and I'll test the code to find a solution.

@kheber
Author

kheber commented Sep 22, 2024

I have attached the list of genome filenames. They come from the CAMI challenge "strain-madness" dataset, which I downloaded from here.

I did manage to find a temporary fix: I deleted the problematic lines from hclsMap_95.txt and passed the cleaned file with the -c option. The second column still added up to the number of genomes I had provided, so I felt this was safe to do. Do you think results from a database built this way are valid?
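For anyone hitting the same error, the manual cleanup can be sketched in Python. This is only an illustration, not StrainScan code: the function names are hypothetical, and the row layout (integer in column 1, a count in column 2, tab-separated) is an assumption based on the tail of the file shown above.

```python
def _is_int(text):
    """Return True if int() can parse the text."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def split_cluster_map(lines):
    """Separate rows that look valid from stray rows.

    Assumed layout (from the file tail in this issue, not from StrainScan docs):
    integer <TAB> member count <TAB> member name(s).
    """
    good, bad = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and _is_int(fields[0]) and _is_int(fields[1]):
            good.append(line)
        else:
            bad.append(line)
    return good, bad


def total_members(rows):
    """Sum the second column, to compare against the number of input genomes."""
    return sum(int(row.split("\t")[1]) for row in rows)


# The tail of hclsMap_95.txt as posted above.
sample = [
    "1\t1\tMIKI-NS13\n",
    "2\t1\tMIKI-NS15\n",
    "WARNING:\t0\t\n",
    "ignoring\t0\t\n",
    "environment\t0\t\n",
    "value\t0\t\n",
    "of\t0\t\n",
    "R_HOME\t0\n",
]
good, bad = split_cluster_map(sample)
```

Writing the `good` rows back to a new file and checking that `total_members(good)` matches the genome count is the same sanity check described above, before passing the cleaned file with -c.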

genome_filenames.txt

@liaoherui
Copy link
Owner

I think you can try. If the build completes without errors, the result should be valid. I'm still wondering why these lines appear in your hclsMap_95.txt file:

WARNING:	0
ignoring	0	
environment	0	
value	0	
of	0	
R_HOME	0
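One possible explanation, offered as an unconfirmed guess: read top to bottom, the stray tokens spell "WARNING: ignoring environment value of R_HOME", which is the message R's launcher prints when the R_HOME environment variable points somewhere other than the R installation being started. If some step of the build runs R and its output is parsed, that message could end up in the cluster file. Whether StrainScan actually does this is an assumption here. A minimal sketch for testing the idea is to rerun the build from an environment with R_HOME removed (the helper name below is hypothetical):

```python
import os


def env_without_r_home(env=None):
    """Return a copy of the environment with R_HOME removed.

    Assumption: the stray lines come from R's launcher warning,
    which should not appear when R_HOME is unset.
    """
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key != "R_HOME"}


# Report whether R_HOME is set in the current shell environment.
r_home_is_set = "R_HOME" in os.environ
clean_env = env_without_r_home()
```

`clean_env` can be passed as the `env` argument of `subprocess.run` when launching the build command. Running `unset R_HOME` in the shell before the build has the same effect.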
