Curation work to ensure all entries give "canonical" tool descriptions #10

Open
joncison opened this issue Feb 1, 2019 · 10 comments

@joncison
Contributor

joncison commented Feb 1, 2019

One of many issues around GitHub-based content management for bio.tools.

@joncison
Contributor Author

joncison commented Feb 1, 2019

Curation work to remove the remaining redundant entries, ensuring a non-redundant set of “canonical” tool descriptions. This is mostly done, but see e.g. bio-tools/biotoolsRegistry#282

@joncison
Contributor Author

joncison commented Feb 5, 2019

@hansioan, can we make a definitive list of actions here? To my mind it's this:

@joncison
Contributor Author

joncison commented Feb 7, 2019

@bgruening @piotrgithub1 @matuskalas - Hans, Herve and I have been making a major push on content clean-up (mostly ID verification, tool names and redundancy removal) in preparation for the data dump (#2).

Bearing in mind that the vision for bio.tools is to provide "canonical" descriptions of unique tools, may I ask that, if you have a view on clean-ups that need doing in this regard, you let us know very soon? For example, do we satisfy the requirements for integrating data from bioconda etc.?

We hope to get the clean-up complete by end of next week.

@bgruening
Contributor

@joncison what do you need? IMHO we can deal with this after the push. Bioconda will deal with whatever bio.tools drops. Bioconda has already started to annotate packages with bio.tools IDs, so ideally the IDs should stay stable and the content should be YAML from our side. But otherwise, we will know more once we start working on it :)

@joncison
Contributor Author

joncison commented Feb 7, 2019

I was wondering whether any of you already know of content issues that would make the integration hard, duplicates (which I think are now nearly all resolved) being an obvious case. We also need to do this clean-up for a paper that will be submitted soon (we're all co-authors), which is the main reason for doing it now. Rest assured the dump will go ahead ASAP.

@bgruening
Contributor

Thanks @joncison! My take on this is, we create the bot and create the content-validation scripts and if things fail, because of duplicates or such, we will know and can fix it.
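
A minimal sketch of what such a duplicate check might look like, assuming the entries are available as a list of dicts carrying a `biotoolsID` field. The `find_duplicates` helper and the sample entries are hypothetical, made up for illustration, and are not the actual bot or validation scripts:

```python
from collections import defaultdict


def find_duplicates(entries, key="biotoolsID"):
    """Return {normalised value: [entry indexes]} for values of `key` used more than once."""
    groups = defaultdict(list)
    for index, entry in enumerate(entries):
        value = entry.get(key)
        if value is None:
            continue  # a missing key is a separate problem, not a duplicate
        # Compare case-insensitively and ignore surrounding whitespace (an assumption).
        groups[str(value).strip().lower()].append(index)
    return {value: idxs for value, idxs in groups.items() if len(idxs) > 1}


# Hypothetical entries, for illustration only.
entries = [
    {"biotoolsID": "toolA", "name": "Tool A"},
    {"biotoolsID": "toola", "name": "Tool A (copy)"},
    {"biotoolsID": "toolB", "name": "Tool B"},
]

duplicates = find_duplicates(entries)
if duplicates:
    print("Duplicate IDs found:", duplicates)
```

A validation run could fail whenever the returned dict is non-empty, which would surface any duplicates that the manual clean-up missed.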

@joncison
Contributor Author

joncison commented Feb 7, 2019

Very good - that would trap any currently unknown issues (and we'll soon have fixed all the known ones). PS: for validation we already have biotoolsLint (currently just harvesting ideas).

@joncison
Contributor Author

Quick update @bgruening and @hmenager: @hansioan and I are making sweeping progress on the above, but it's a huge job ... will keep you posted. The (clean) content dump will follow once we're done.

@joncison
Contributor Author

joncison commented Mar 19, 2019

Quick update @bgruening @piotrgithub1: @hansioan and I are done with the clean-ups (huge job). The only thing left is a final verification of IDs (for things added in the last few weeks). Once that's done I'll close this issue. I'm not claiming all the content is now perfect, but it's a lot better than it was a couple of months ago in terms of redundancy, sensible IDs, ownership etc. cc @hmenager

@joncison
Contributor Author

UPDATE
All things mooted on Feb 5 have been done, but keep this open because there will be further improvements to make, no doubt.

@joncison changed the title from 'Curation work to assure all entries give "canonical" tool descriptions' to 'Curation work to ensure all entries give "canonical" tool descriptions' on May 10, 2019
4 participants