add more species to pointfinder analysis list #144

Closed
wants to merge 1 commit into from

@pimarin pimarin commented Apr 21, 2022

Hi,
I would like to increase the number of species from the PointFinder DB that staramr can use. I simply modified the list of species in
staramr/blast/pointfinder/PointfinderBlastDatabase.py and compared the output with that of the PointFinder web service; the results are identical for all tested species.
Is there more to do as a first step to support more species?
Then, I would like to:

  • Implement the automatic detection of new species in the database when it is updated, with a test for the available file format
  • Add an option to build a database from raw files, which could be translated into the PointFinder format to be analyzed

    @classmethod
    def get_available_organisms(cls):
        """
        A Class Method to get a list of organisms that are currently supported by staramr.
        :return: The list of organisms currently supported by staramr.
        """
        return ['campylobacter', 'enterococcus_faecalis', 'enterococcus_faecium', 'escherichia_coli',
                'helicobacter_pylori', 'klebsiella', 'mycobacterium_tuberculosis', 'neisseria_gonorrhoeae',
                'plasmodium_falciparum', 'staphylococcus_aureus', 'salmonella']
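
The first follow-up idea (detecting species automatically instead of hard-coding the list) could be sketched roughly as below. This is only an illustration, not staramr code: the function name `discover_organisms` is hypothetical, and it assumes that every organism in the PointFinder database is a subdirectory containing a `resistens-overview.txt` file.

```python
from pathlib import Path
from typing import List

# Assumption: each organism directory holds an overview file with this name.
OVERVIEW_FILE = "resistens-overview.txt"


def discover_organisms(pointfinder_db_dir: str) -> List[str]:
    """Return the sorted names of subdirectories that look like organisms.

    A subdirectory counts as an organism only if it contains the overview
    file, which acts as a very basic file-format check.
    """
    root = Path(pointfinder_db_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / OVERVIEW_FILE).is_file()
    )
```

Directories without the overview file (documentation folders, for example) are skipped, so a stricter format test could later be added in the same place.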

Member

apetkau commented Apr 26, 2022

Hello @pimarin,

Thanks so much for this PR. I really appreciate it 😄

Which dataset did you use to test out on the resfinder/pointfinder web service? Is it something anybody can download?

I describe a bit about why I hadn't added support for other PointFinder species here: phac-nml/galaxy_tools#218 (comment)

In general, though, it's because there were some mutations in promoter regions (with negative coordinates) and deletions, which I had never explicitly added support for in staramr (though I have always intended to): https://bitbucket.org/genomicepidemiology/pointfinder_db/src/8706a6363bb29e47e0e398c53043b037c24b99a7/e.coli/resistens-overview.txt#lines-63:68

I'm not sure if the test dataset you used would include mutations in these regions, which is why I am wondering where it came from.
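
To make the concern concrete, a check like the following could report whether an organism's mutation table contains entries of the two unsupported kinds. This is a sketch under stated assumptions: `MutationEntry`, the `"-"` gap marker, and the gene names are hypothetical simplifications, not PointFinder's actual file format or staramr's data model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

GAP = "-"  # hypothetical marker for a missing base (insertion or deletion)


@dataclass(frozen=True)
class MutationEntry:
    """Simplified, hypothetical view of one row of a mutation table."""
    gene: str
    position: int  # negative values mean upstream of the gene (promoter)
    ref: str
    alt: str


def unsupported_reason(entry: MutationEntry) -> Optional[str]:
    """Return why an entry would need extra support, or None if it is a plain substitution."""
    if entry.position < 0:
        return "promoter"
    if GAP in (entry.ref, entry.alt):
        return "indel"
    return None


def find_unsupported(entries: Iterable[MutationEntry]) -> List[Tuple[str, int, str]]:
    """Collect (gene, position, reason) for every entry that is not a plain substitution."""
    flagged = []
    for entry in entries:
        reason = unsupported_reason(entry)
        if reason is not None:
            flagged.append((entry.gene, entry.position, reason))
    return flagged


# Made-up example rows, for illustration only.
example = [
    MutationEntry("geneA", 83, "S", "L"),
    MutationEntry("geneB_promoter", -42, "C", "T"),
    MutationEntry("geneC", 120, "A", GAP),
]
flagged = find_unsupported(example)
```

An organism whose table yields an empty list would only contain plain substitutions, so identical results on a test dataset would be more meaningful for it than for an organism with flagged entries.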

Implement the automatic detection of new species in the database when updated, with test for available file format

This is a great idea :). Do you have a particular method in mind? I had a small issue about this (#84), and thought of trying to just re-use the results of the mlst software (which auto-detects an MLST scheme that often corresponds to an organism). But there might be better ways than this.

Add an option to build a database from raw files, which could be translated into the PointFinder format to be analyzed

Yes, this is another great idea. Another option would be to do a bit of refactoring/abstractions to provide support for these raw files to be directly loaded up in staramr (instead of converting them to the pointfinder directory structure). This could possibly be done by making abstractions of the classes in here https://github.com/phac-nml/staramr/tree/master/staramr/blast/pointfinder

Member

apetkau commented May 18, 2022

I apologize, @pimarin, I think I misunderstood your first suggestion. You were referring to detecting when new species are added to the PointFinder database, whereas I thought you meant automatically detecting which organism a particular genome belongs to, so that you no longer have to set --pointfinder-organism when running staramr. I think your suggestion is good as well.

However, I think I have a better solution. I was mostly using the list returned by get_available_organisms() to make sure that organisms/species from the PointFinder database aren't selected until I have validated that they work in staramr. However, maybe this is a bit too strict. I am thinking of switching this over so that you can pass any acceptable value to --pointfinder-organism that exists in the PointFinder database, but if it's not in the get_available_organisms() list, you will get a warning that the results produced by staramr for this PointFinder organism haven't been validated.

I think this is a better solution as it will let people run staramr with any new PointFinder organisms that are available (but still provide some feedback about which organisms have been validated). I have made an issue for this: #147
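
The behaviour proposed above could look roughly like the following. It is a minimal sketch, not the implementation planned in #147: the function `check_organism` and its parameters are hypothetical, and the set of organisms present in the database is assumed to be obtained elsewhere.

```python
import logging
from typing import Iterable

logger = logging.getLogger(__name__)


def check_organism(organism: str,
                   database_organisms: Iterable[str],
                   validated_organisms: Iterable[str]) -> bool:
    """Accept any organism present in the database, warning when it is unvalidated.

    Returns True if the organism has been validated, False if it is usable
    but unvalidated. Raises ValueError if it is not in the database at all.
    """
    available = set(database_organisms)
    if organism not in available:
        raise ValueError(
            f"'{organism}' is not in the PointFinder database; "
            f"available: {sorted(available)}"
        )
    if organism not in set(validated_organisms):
        logger.warning(
            "Results for PointFinder organism '%s' have not been validated.",
            organism,
        )
        return False
    return True
```

With this split, the validated list (what `get_available_organisms()` returns today) only controls the warning, while the database contents control what is accepted.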

I do still plan to implement support for the indels that are keeping staramr from fully supporting other PointFinder organisms.

I hope this would still work for you? I am going to close this PR.

@apetkau apetkau closed this May 18, 2022