Disallow registration of entries for new versions of a tool but with trivial differences only #160

Open
joncison opened this issue Apr 5, 2017 · 4 comments
Assignees
Labels
API Concerns the bio.tools API. complex feature request We expect this will be hard to do. content Concerns bio.tools content. GUI Concerns the bio.tools GUI. wontfixsoon We won't get to this soon (not to say it's not important!)

Comments

@joncison
Member

joncison commented Apr 5, 2017

The API and GUI should discourage or (ideally) disallow registration of new tool versions with trivial differences only. The intention was for bio.tools to include tool versions with major differences from an end-user / usage perspective, not merely to record the fact that the version number has changed (for whatever, possibly trivial, reason).

It boils down to settling what constitutes a valid new version. I'd suggest:

  • a change in the available modes of operation, i.e. number of functions
  • a change in a given function
    • a change in what EDAM Operation(s) a given tool function performs
    • a change in which EDAM Data is supported for input or output

Ignore EDAM formats for now.
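
To make the proposed criteria concrete, here is a minimal sketch of how a check for "trivial" versions could look. It is only an illustration: the `ToolFunction` class, its field names and `is_trivial_version` are hypothetical and are not part of the bio.tools API or biotoolsSchema, and the EDAM terms are plain label strings chosen for the example. EDAM formats are deliberately left out, as suggested above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolFunction:
    """One mode of operation of a tool (hypothetical model, not biotoolsSchema)."""
    operations: frozenset  # EDAM Operation labels performed by this function
    inputs: frozenset      # EDAM Data labels accepted as input
    outputs: frozenset     # EDAM Data labels produced as output


def is_trivial_version(old_functions, new_functions):
    """Return True if the new version differs only trivially from the old one.

    Two versions count as trivially different when they have the same number
    of functions and every function has the same EDAM Operation and EDAM Data
    (input/output) annotations. Comparing as multisets covers both criteria.
    """
    return Counter(old_functions) == Counter(new_functions)


# Illustrative annotations only.
align = ToolFunction(
    operations=frozenset({"Sequence alignment"}),
    inputs=frozenset({"Sequence"}),
    outputs=frozenset({"Sequence alignment"}),
)
search = ToolFunction(
    operations=frozenset({"Sequence similarity search"}),
    inputs=frozenset({"Sequence"}),
    outputs=frozenset({"Sequence alignment"}),
)

v1 = [align]
v1_bugfix = [align]       # same annotations: a trivial new version
v2 = [align, search]      # an added mode of operation: a valid new version
```

With these inputs, `is_trivial_version(v1, v1_bugfix)` is true and `is_trivial_version(v1, v2)` is false. A real check would have to read the annotations from registered entries; how that is done is left open here.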

Once settled, the above should be captured in:

  • on-line documentation (http://biotools.readthedocs.io/en/latest)
  • tool tip in bio.tools registration interface
  • QA / QC system (checking & reporting of trivial versions)
  • registration mechanism (ideally disallowing trivial versions)
@joncison joncison added API Concerns the bio.tools API. complex feature request We expect this will be hard to do. content Concerns bio.tools content. docs GUI Concerns the bio.tools GUI. labels Apr 5, 2017
@joncison joncison added the high priority We get to these once "critical priority" issues are done. label Apr 10, 2017
@bug1303

bug1303 commented May 3, 2017

I see some problems in regard to maintenance here (again depending on what bio.tools should and should not be...):

Precise version numbers are required to ensure reproducibility of any bioinformatics project.

If bio.tools actually becomes the authority in handling IDs that can then be referenced in publications as well, and that includes a version number, those should not suddenly disappear. At the same time developers would want to keep their version numbers updated.

With a view to integrating with bioconda: one of its key features (the selling point for me, really) is environments where you can ensure that specific versions of certain packages are installed, and share those environments with your co-workers.

Of course, having all kinds of minor versions creates a lot of redundant information. Ideally, upon a search in bio.tools, I would only see the latest version in the results page (or any earlier major version where there have been significant changes as described above). But it should still be possible to reference a certain version.

Just some examples, why a version number might change...

  • a bug has been discovered and fixed (in this case you wouldn't want to use the older version anymore, but it's still relevant to know if it has been used in a project and how that might have screwed the analysis)
  • the output format might have changed for some reason or the way a certain statistic is computed (this is of course bad for a user in terms of backwards compatibility, but it happens and won't change the associated EDAM terms, it's a prime example for the points mentioned above...)
  • additional features have been implemented (in some cases this won't change main input/operation/output, but just allows an additional parameter to be tweaked; which is beyond the detail of annotation of most tools)

@joncison
Member Author

joncison commented May 3, 2017

Thanks, and I agree: the version information is crucial for all the reasons given.

The current position (from discussions with @ekry et al) is that we should associate version information with (at least) a publication ID and (maybe) other fields, e.g. download, such that it is at least clear what version number is associated with what publication and downloads such as binary or source package.

In this scenario, we'd retain versionIDs, but these would not be part of the tool URL; a profound change, i.e. abandoning (for now at least) our ambition of providing:
https://bio.tools/toolID/versionID

and simply supporting:
https://bio.tools/toolID/

providing a unique and (once clean-up of toolIDs is complete, cc @hans) persistent reference to the tool. This, we're thinking, is a more realistic aim, at least in the first instance, given the available resources.

This is a critical issue, so discussion is good here.
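
As a rough illustration of the scenario described above (version information attached to publications and downloads, with a version-free tool URL), here is a small sketch. The entry layout, the field names, the `exampletool` ID and the `PUB-1` publication ID are all placeholders made up for the example; they are not taken from the actual bio.tools data model.

```python
BASE_URL = "https://bio.tools"

# Hypothetical entry: field names and values are illustrative placeholders.
entry = {
    "toolID": "exampletool",
    "publication": [{"id": "PUB-1", "version": "1.0"}],
    "download": [
        {"type": "source", "version": "1.0"},
        {"type": "binary", "version": "1.1"},
    ],
}


def tool_url(entry):
    """Build the version-free tool URL of the form https://bio.tools/toolID/."""
    return f"{BASE_URL}/{entry['toolID']}/"


def known_versions(entry):
    """Collect every version string attached to a publication or download."""
    versions = set()
    for field in ("publication", "download"):
        for item in entry.get(field, []):
            if "version" in item:
                versions.add(item["version"])
    # Plain string sort; good enough for this small illustration.
    return sorted(versions)
```

Here the versionIDs stay in the record and remain queryable through `known_versions(entry)`, while `tool_url(entry)` never includes a version.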

@joncison
Member Author

joncison commented Sep 6, 2018

Just a note, cc @bug1303: in biotoolsSchema 3.0.0 (supported in the next release of bio.tools) you can assign version information to a publication, download and otherID of a tool. The entry itself can also receive version information in a flexible way. But this version information isn't (and won't be) baked into the tool identifiers themselves (see https://biotools.readthedocs.io/en/latest/what_is_biotools.html#bio-tools-tool-identifiers).

The version number is a precise thing - provided by the tool developer, but distinct from the tool identifier (provided by bio.tools, based on supplied tool names).

What constitutes a unique version of a tool (and what's registered) is thus down to the provider (the entries being subject to bio.tools admin curation) - we're aiming for bio.tools records that capture major functional differences (not all versions of a tool have such differences). With more time and resources, we could go further. But for now, I close the issue (feel free to re-open and comment!).

cc @hansioan

@joncison joncison assigned joncison and hansioan and unassigned ekry Sep 6, 2018
@joncison joncison added the wontfixsoon We won't get to this soon (not to say it's not important!) label Sep 6, 2018
@joncison joncison removed the docs label Nov 9, 2018
@joncison joncison removed the high priority We get to these once "critical priority" issues are done. label Dec 14, 2018
@joncison
Member Author

see to what extent biotoolsLint can detect suspected duplicate entries
