💌 Files from Debian 🐞 #57

Open
5 tasks
matuskalas opened this issue Feb 26, 2020 · 3 comments

@matuskalas
Contributor

matuskalas commented Feb 26, 2020

In the awesome YAMLs from Debian, despite the highly appreciated perfectionism, we found a couple of 🐞🐛. @smoe if/when you have a bit of time for hacking the "yamlDump" again, it'd be super lovely if you could test the following records/phenomena with edam.sh and edamJson2biotools.py. e^6 thanks!! 🚀🙏🏽

  • beast looks like a cluttered record from beast-mcmc and beast-mcmc2 deb src packages, resulting in invalid YAML
description: >
  BEAST is a cross-platform program for Bayesian MCMC analysis of molecular
  sequences. It is entirely orientated towards rooted, time-measured
  phylogenies inferred using strict or relaxed molecular clock models. It
  can be used as a method of reconstructing phylogenies but is also a
  framework for testing evolutionary hypotheses without conditioning on a
  single tree topology. BEAST uses MCMC to average over tree space, so that
  each tree is weighted proportional to its posterior probability. Included
  is a simple to use user-interface program for setting up standard
  analyses and a suit of programs for analysing the results.
version: 1.10.4
no new upstream version of beast-mcmc (1.x) but rather a rewritten
  version.
version: 2.6.0
  • The same problem for soapdenovo, deb src pkgs soapdenovo and soapdenovo2

Note 1: In these 2 cases, there is probably a reason to maintain both major versions in Debian (or isn't there?), and therefore we should take that into account in bio.tools too: consider whether they should have different descriptions, and maybe also different EDAM annotations (if not, keep just 1 record), plus credits, pubs, ...

  • Are there any more pairs of src pkgs that point to the same bio.tools record? What should be the general solution, or options, here? Any additional ideas on this issue @hmenager @bgruening? (e.g. having 2 debian.yaml files in 1 bio-tools/content directory for a start?)

  • dnacopy: some YAML validators are happy, but some dislike the colon+space in R package: DNA copy number data analysis

  • bowtie: funny "punctuation" of function Genome indexing (Burrow-Wheeler). Ok in bowtie2. It looks like the only occurrence of this phenomenon.

Note 2: Btw., @hmenager @bgruening @OlegZharkov have had the best experience using the ruamel.yaml Python lib for creating pretty YAML files.
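
A rough sketch of a pre-check for the two kinds of problems listed above (a cluttered record with a repeated key and stray unindented text, and an unquoted value containing colon+space). It uses only the Python standard library and is a heuristic line scan, not a YAML parser. The function name `lint_record` and the `summary` key in the dnacopy-like example are made up for illustration; they are not taken from the Debian files or from edam.sh.

```python
import re

# Matches an unindented "key: value" (or bare "key:") line.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):(?:\s+(.*))?$")


def lint_record(text):
    """Return (line_number, message) pairs for suspicious lines.

    Heuristic only: it looks at top-level lines and ignores indented ones.
    """
    problems = []
    seen = {}  # key -> line number of first occurrence
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        match = TOP_LEVEL_KEY.match(line)
        if match is None:
            # Indented lines are assumed to continue the previous value.
            if not line[0].isspace():
                problems.append((number, "unindented line is not 'key: value'"))
            continue
        key, value = match.group(1), match.group(2) or ""
        if key in seen:
            problems.append(
                (number, f"duplicate key '{key}' (first on line {seen[key]})")
            )
        else:
            seen[key] = number
        # A plain (unquoted) scalar should not contain colon+space.
        if value and value[0] not in "\"'>|" and ": " in value:
            problems.append((number, f"unquoted value of '{key}' contains ': '"))
    return problems


# Shortened version of the beast record quoted above.
BEAST_SAMPLE = """description: >
  BEAST is a cross-platform program for Bayesian MCMC analysis.
version: 1.10.4
no new upstream version of beast-mcmc (1.x) but rather a rewritten
  version.
version: 2.6.0
"""

for number, message in lint_record(BEAST_SAMPLE):
    print(number, message)

# Hypothetical key name; only the value comes from the dnacopy record.
print(lint_record("summary: R package: DNA copy number data analysis"))
```

Quoting the value (e.g. wrapping it in double quotes) makes the last check pass, which is also what a YAML emitter should do when dumping such strings.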

@hmenager
Collaborator

Excellent summary @matuskalas, thanks. Now, regarding the problem of multiple packages related to one bio.tools entry, I would strongly argue that:
1- we do not necessarily try to solve this by having one bio.tools entry for each Debian package or similar. This won't scale or apply to all situations, because e.g. one tool might actually be rightfully packaged as multiple Debian packages (one for the library, one for the graphical interface, etc.)
2- instead we come up with a naming mechanism whereby we can have multiple Debian files in a "toolid" repository.
e.g.:
in the beast folder we would have two Debian Med files, beast-mcmc.debian.yaml and beast-mcmc2.debian.yaml. This way we keep all the information from Debian, and all the links!
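
The naming idea could be sketched roughly like this. The `content` root directory and both helper names are assumptions made for illustration; only the `<package>.debian.yaml` pattern and the beast example come from the comment above.

```python
from pathlib import PurePosixPath

SUFFIX = ".debian.yaml"


def debian_file_path(content_root, tool_id, source_package):
    """Build <content_root>/<tool_id>/<source_package>.debian.yaml."""
    return PurePosixPath(content_root) / tool_id / f"{source_package}{SUFFIX}"


def source_package_of(file_name):
    """Recover the Debian source package name from such a file name."""
    if not file_name.endswith(SUFFIX):
        raise ValueError(f"not a Debian file: {file_name}")
    return file_name[: -len(SUFFIX)]


# Two Debian source packages living side by side in one tool folder.
for package in ("beast-mcmc", "beast-mcmc2"):
    print(debian_file_path("content", "beast", package))
```

Because the package name is part of the file name, the link back to each Debian package stays recoverable without opening the file.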

@joncison
Contributor

Just to chip in here with a reminder of bio.tools scope vis-à-vis tools and packages - bio.tools aims to be a registry of unique tool functionality, which means the default is for different interfaces providing essentially the same functionality to go in one entry (see the guidelines written for IFB recently). cc @hansioan

@matuskalas
Contributor Author

Exactly @joncison 👍

Therefore, as examples:

  • Deb src packages clustalw and clustalx are different interfaces with more-or-less the same functionality, same versioning, and are distributed together (upstream outside of Debian)
    ==>>
    1 record in bio.tools/clustal2

  • clustalw and clustalo are 2 diverged versions also with more-or-less the same functionality, and more-or-less the same interface. But they behave differently and are applicable to different sets of input data
    ==>>
    2 separate records in bio.tools. In addition, the 2 "versions" are so different that they ARE NOT treated as 2 versions of the same tool. They are just 2 separate tools (both in Debian and bio.tools)
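
As a small illustration of the two examples, here is a sketch of a package-to-record mapping and its inverse. The record ID `clustal2` is from the example above; the ID `clustalo` for the second record is an assumption, and `group_by_record` is a made-up helper name.

```python
from collections import defaultdict

# Debian source package -> bio.tools record ID.
PACKAGE_TO_RECORD = {
    "clustalw": "clustal2",
    "clustalx": "clustal2",
    "clustalo": "clustalo",  # assumed ID, not confirmed in this thread
}


def group_by_record(mapping):
    """Invert the mapping: record ID -> sorted list of source packages."""
    grouped = defaultdict(list)
    for package, record in sorted(mapping.items()):
        grouped[record].append(package)
    return dict(grouped)


print(group_by_record(PACKAGE_TO_RECORD))
```

A record with more than one package in its list is exactly the case where several Debian files would need to sit next to each other.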
