GO Error Reports: Deprecated rules in report; Warnings vs. errors; Counting #387

Open
suzialeksander opened this issue Apr 18, 2022 · 8 comments

Comments

@suzialeksander
Collaborator

This was emailed through helpdesk, but there appear to be at least a few different things to be addressed so this ticket can be moved/duplicated as appropriate.

Hello,

I’m responsible for submissions of GAF files to GO from RGD. However, I’m confused: where should I look to see the result from processing of our latest submission?

My doc says that I should look at http://current.geneontology.org/reports/rgd-summary.txt and then look for details in the other files, e.g. http://current.geneontology.org/reports/rgd-report.html.

Let’s see: http://current.geneontology.org/reports/rgd-summary.txt displays, among others:

For rule GO_AR:0000014 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000014)
Valid GO term ID, there are 1590 violations with type Error.

For rule GO_AR:0000008 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000008)
No annotations should be made to uninformative high level terms, there are 447 violations with type Warning.

For rule GO_AR:0000011 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000011)
ND annotations to root nodes only, there are 2 violations with type Warning.

For rule GO_AR:0000014 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000014)
Valid GO term ID, there is one violation with type Warning.

As you can see, for the same rule GO_AR:0000014 it first reports 1590 violations of type Error, and then 1 violation of type Warning.

When you go to GitHub to see what rule 14 is (https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000014.md), you read: this rule has been deprecated and merged with rule 20. (!!! Surprise !!)

When I go to the full report http://current.geneontology.org/reports/rgd-report.html#gorule-0000020 I see that there are 9 violations of this rule. (Not 1 or 1590, but 9!)

Total confusion and mix-up of rules in the reports.
So, my question is: what is the correct procedure to see the QC report for the RGD submissions that I am responsible for?
Are the reports available at current.geneontology.org/reports meant to be used by GAF file submitters, or are they only to be used internally by the GO Consortium, with GAF submitters submitting their files in the hope that the files are correct?

@kltm
Member

kltm commented Apr 18, 2022

Tagging @tutajm . Looking to also loop in @pgaudet .

@suzialeksander
Collaborator Author

I looked through the wiki for user-oriented SOP docs, but I'm not sure if there's anything other than https://wiki.geneontology.org/Release_Pipeline#Annotation_QC_checks ?

@pgaudet
Contributor

pgaudet commented Apr 19, 2022

It looks like http://current.geneontology.org/reports/rgd-summary.txt is not behaving correctly? I think there were two kinds of scripts, one with owl-tools and one with ontobio, and some rules were implemented in both (although the older owl-tools implementation should have been removed once the newer ontobio implementation was done).

In any case, the correct file should be: http://current.geneontology.org/reports/rgd-report.html#rmd
(ie, the page that is linked from the overall report page, http://current.geneontology.org/reports/gorule-report.html)

@tutajm, where is the doc stating that you should look at http://current.geneontology.org/reports/rgd-summary.txt and then look for details in the other files, e.g. http://current.geneontology.org/reports/rgd-report.html ? We should update that documentation.

Thanks, Pascale

@pgaudet
Contributor

pgaudet commented Apr 19, 2022

It's odd because these .txt files are not available in our files:
http://current.geneontology.org/reports/index.html

@kltm Should they even be generated? Indeed gorule-0000014 is deprecated, and not shown in other reports.

Thanks, Pascale

@kltm
Member

kltm commented Apr 19, 2022

@pgaudet The *-summary.txt files are the last remainder of the owltools reports, currently suppressed from getting to the html file, but not fully removed. (geneontology/pipeline#184, from geneontology/go-site#1095). If there is truly no longer any use for these *-summary.txt files, we can either try to prevent their creation or filter them out and prevent them from getting into the final product.

(Also geneontology/go-site#1816)
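
For illustration only, the "filter them out" option could look roughly like the sketch below. This is not how the pipeline is actually implemented; the function name `drop_legacy_summaries` and the idea of filtering a plain list of file names are assumptions made for the example. The only thing taken from this thread is the `*-summary.txt` naming pattern and the example file names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def drop_legacy_summaries(filenames: Iterable[str],
                          pattern: str = "*-summary.txt") -> List[str]:
    """Return the file names that do NOT match the legacy summary pattern.

    Hypothetical helper: the real pipeline may select files differently.
    """
    return [name for name in filenames if not fnmatchcase(name, pattern)]


# File names as they appear in this thread.
report_files = [
    "index.html",
    "gorule-report.html",
    "rgd-report.html",
    "rgd-summary.txt",
]

kept = drop_legacy_summaries(report_files)
print(kept)
```

With this kind of filter the html reports are untouched and only the owltools-era summaries are held back from the published directory.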

@tutajm
Collaborator

tutajm commented Apr 19, 2022

Regarding the question: "where is the doc stating that you should look at http://current.geneontology.org/reports/rgd-summary.txt and then look for details in the other files, e.g. http://current.geneontology.org/reports/rgd-report.html ? "

It is my personal workflow, not from any particular docs: I simply look at the summary file first, to see if there are any issues classified as 'Error', and then I look at the detailed report to see the actual problematic lines.

Issues tagged as 'Error' make me look into the problems promptly. However, when I see 'Warning' issues (often supplemented by a comment that they were autocorrected), I usually move on to my other tasks. That's why I liked the summary file: it let me quickly see what's going on with the RGD GAF submission.
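
A minimal sketch of this "Errors first" triage, assuming the summary text keeps the two-line layout quoted at the top of this issue (a `For rule GO_AR:...` line followed by a `..., there are N violations with type X.` line). The names `RuleCount`, `parse_summary` and `SAMPLE` are made up for the example, and the regular expressions are inferred only from the four entries quoted above, so other wordings in the real file may not match.

```python
import re
from typing import List, NamedTuple


class RuleCount(NamedTuple):
    rule_id: str
    description: str
    count: int
    severity: str


# Patterns inferred from the summary lines quoted in this issue (an assumption).
RULE_LINE = re.compile(r"^For rule (GO_AR:\d+)")
COUNT_LINE = re.compile(
    r"^(?P<desc>.+), there (?:is|are) (?P<count>\d+|one) "
    r"violations? with type (?P<sev>\w+)\.$"
)


def parse_summary(text: str) -> List[RuleCount]:
    """Pair each 'For rule ...' line with the count line that follows it."""
    entries = []
    current_rule = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        rule_match = RULE_LINE.match(line)
        if rule_match:
            current_rule = rule_match.group(1)
            continue
        count_match = COUNT_LINE.match(line)
        if count_match and current_rule:
            raw_count = count_match.group("count")
            # The quoted file spells a single violation as "one".
            count = 1 if raw_count == "one" else int(raw_count)
            entries.append(RuleCount(current_rule, count_match.group("desc"),
                                     count, count_match.group("sev")))
            current_rule = None
    return entries


SAMPLE = """\
For rule GO_AR:0000014 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000014)
Valid GO term ID, there are 1590 violations with type Error.

For rule GO_AR:0000008 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000008)
No annotations should be made to uninformative high level terms, there are 447 violations with type Warning.

For rule GO_AR:0000011 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000011)
ND annotations to root nodes only, there are 2 violations with type Warning.

For rule GO_AR:0000014 (http://www.geneontology.org/GO.annotation_qc.shtml#GO_AR:0000014)
Valid GO term ID, there is one violation with type Warning.
"""

entries = parse_summary(SAMPLE)
errors = [e for e in entries if e.severity == "Error"]

# Rules reported under more than one severity, like GO_AR:0000014 above.
severities_by_rule = {}
for e in entries:
    severities_by_rule.setdefault(e.rule_id, set()).add(e.severity)
mixed = sorted(r for r, sevs in severities_by_rule.items() if len(sevs) > 1)

print(errors)
print(mixed)
```

The same Error/Warning split could be applied to whichever report ends up being the recommended one; only the parsing step would need to change.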

@pgaudet
Contributor

pgaudet commented Apr 19, 2022

Thanks for the clarification. Looks like these files are out of date compared with the current workflow (since this error does not appear in other reports).

@tutajm Can you use http://current.geneontology.org/reports/rgd-report.html#rmd in your workflow?

@kltm can you stop producing these .txt files?

Thanks, Pascale

@kltm
Member

kltm commented Apr 19, 2022

@pgaudet That ticket is now at geneontology/pipeline#281 and in project


4 participants