
Transforms of agro, envo, and gaz fail during post-processing #186

Open
caufieldjh opened this issue Sep 19, 2022 · 9 comments
Labels
bug Something isn't working


@caufieldjh
Collaborator

Describe the bug

The agro transform appears to proceed as expected until it reaches post-processing:

Transforming agro to tsv...
[KGX][cli_utils.py][    transform_source] INFO: Processing source 'agro.json'
INFO:kg-obo:No errors in parsing ['data/agro/2021-11-05/agro.json'].
Post-processing agro...
INFO:kg-obo:Post-processing agro...
Failed to remap node IDs - could not find corresponding nodes.
Failed post-processing agro...
INFO:kg-obo:Failed post-processing agro...
WARNING:kg-obo:Failed to transform agro

To Reproduce

python run.py --bucket kg-hub-public-data --save_local --get_only agro

Expected behavior

Post-processing for this OBO should update 4 CURIEs and write out the updated nodes file.

Version

efc2324

@caufieldjh caufieldjh added the bug Something isn't working label Sep 19, 2022
@caufieldjh
Collaborator Author

caufieldjh commented Sep 19, 2022

A clue: the CURIEs to be updated are all Wikidata URLs and should get the prefix WIKIDATA:, but they get the prefix WD_Entity: instead. Bioregistry knows about that alternate prefix, but it isn't in the imported maps.
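To make the mismatch concrete, here is a minimal Python sketch of a prefix-alias step that would rewrite WD_Entity: to WIKIDATA: before any remapping. The alias table and the function name `normalize_curie` are hypothetical and not part of KG-OBO or Bioregistry; the Q-number is a placeholder.

```python
from typing import Dict

# Hypothetical alias table: prefixes that may appear in kgx output, mapped to
# the prefixes KG-OBO expects. Only WD_Entity -> WIKIDATA comes from this issue.
PREFIX_ALIASES: Dict[str, str] = {"WD_Entity": "WIKIDATA"}


def normalize_curie(curie: str, aliases: Dict[str, str] = PREFIX_ALIASES) -> str:
    """Rewrite the prefix of a CURIE if it is a known alias; otherwise return it unchanged."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        # No colon, so not a CURIE; leave it alone.
        return curie
    return f"{aliases.get(prefix, prefix)}:{local_id}"


if __name__ == "__main__":
    print(normalize_curie("WD_Entity:Q12345"))  # WIKIDATA:Q12345
    print(normalize_curie("AGRO:00000001"))     # AGRO:00000001
```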

@caufieldjh
Collaborator Author

Post-processing fails because KG-OBO finds prefixes it wants to rewrite and writes them to update_id_maps.tsv, but then finds that the nodes file contains none of those nodes, since they have already been converted to WD_Entity:.
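A minimal sketch of why the remap finds nothing, assuming (hypothetically) that the ID map is keyed on the original Wikidata IRIs while the KGX nodes file already holds WD_Entity: CURIEs. `remap_node_ids` is an illustrative helper, not KG-OBO's actual implementation.

```python
import csv
import io
from typing import Dict, List, Tuple


def remap_node_ids(nodes_tsv: str, id_map: Dict[str, str]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Apply id_map to the 'id' column of a KGX-style nodes TSV.

    Returns the (possibly updated) rows and the map keys that matched no node.
    """
    rows = list(csv.DictReader(io.StringIO(nodes_tsv), delimiter="\t"))
    matched = set()
    for row in rows:
        new_id = id_map.get(row["id"])
        if new_id is not None:
            matched.add(row["id"])  # record the old ID before overwriting it
            row["id"] = new_id
    unmatched = [old for old in id_map if old not in matched]
    return rows, unmatched


if __name__ == "__main__":
    # The map expects the IRI, but kgx has already contracted it to WD_Entity:.
    id_map = {"http://www.wikidata.org/entity/Q1": "WIKIDATA:Q1"}
    rows, unmatched = remap_node_ids("id\tname\nWD_Entity:Q1\tfoo\n", id_map)
    print(unmatched)
```

Every map key ends up in `unmatched`, which corresponds to the "could not find corresponding nodes" case in the log above.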

@caufieldjh
Copy link
Collaborator Author

This conversion is done by kgx: transforming the obojson version directly also yields WD_Entity nodes:

kgx transform -i obojson -f tsv -o agro_test agro.json

This is true for both kgx 1.5.9 and 1.7.0.
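One quick way to confirm what kgx emitted is to tally the prefixes in the id column of the output nodes file. This sketch assumes the TSV has a header row with an `id` column, as KGX nodes files do, and that the output file is named agro_test_nodes.tsv; check the actual filename kgx writes.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_prefixes(nodes_file: TextIO) -> Counter:
    """Tally CURIE prefixes in the 'id' column of a KGX-style nodes TSV."""
    counts: Counter = Counter()
    for row in csv.DictReader(nodes_file, delimiter="\t"):
        prefix, sep, _ = row["id"].partition(":")
        # IDs without a colon are counted separately rather than dropped.
        counts[prefix if sep else "<no prefix>"] += 1
    return counts


if __name__ == "__main__":
    # In practice: with open("agro_test_nodes.tsv") as fh (assumed kgx output name).
    sample = "id\tname\nWD_Entity:Q1\tfoo\nAGRO:00000001\tbar\n"
    print(count_prefixes(io.StringIO(sample)))
```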

@caufieldjh
Copy link
Collaborator Author

Essentially, we need to deactivate the prefix maps handled by the kgx prefix manager (https://kgx.readthedocs.io/en/latest/reference/prefix_manager.html).
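The same idea can be sketched without kgx's prefix manager: contract IRIs only against a small, explicit prefix map, so that no alternate prefix such as WD_Entity can ever be chosen. This is a standalone illustration; it does not use or reproduce the kgx PrefixManager API, and `PREFIX_MAP` and `contract_iri` are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical, deliberately small prefix map containing only the expansions we want.
# Because WIKIDATA is the only entry for the Wikidata namespace, WD_Entity: cannot appear.
PREFIX_MAP: Dict[str, str] = {
    "WIKIDATA": "http://www.wikidata.org/entity/",
    "AGRO": "http://purl.obolibrary.org/obo/AGRO_",
}


def contract_iri(iri: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> Optional[str]:
    """Contract an IRI to a CURIE using the longest matching namespace, or return None."""
    best: Optional[str] = None
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace):
            # Prefer the longest namespace when several match.
            if best is None or len(namespace) > len(prefix_map[best]):
                best = prefix
    if best is None:
        return None
    return f"{best}:{iri[len(prefix_map[best]):]}"


if __name__ == "__main__":
    print(contract_iri("http://www.wikidata.org/entity/Q42"))  # WIKIDATA:Q42
    print(contract_iri("http://example.org/thing"))            # None
```

Returning None for unknown namespaces, rather than guessing a prefix, leaves the decision about unmapped IRIs to the caller.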

@caufieldjh
Copy link
Collaborator Author

The transform of envo has a nearly identical issue.

@caufieldjh caufieldjh changed the title Transform of agro fails during post-processing Transforms of agro and envo fail during post-processing Sep 19, 2022
@caufieldjh
Copy link
Collaborator Author

Same with gaz.

@caufieldjh caufieldjh changed the title Transforms of agro and envo fail during post-processing Transforms of agro, envo, and gaz fail during post-processing Sep 20, 2022
@caufieldjh
Copy link
Collaborator Author

xco has a potentially related issue, though with MESH.

@caufieldjh
Copy link
Collaborator Author

The main workaround here is to be less stringent about incomplete mappings. Right now, if we attempt to remap 2 nodes and both fail, we consider the whole transform failed, but if only 1 fails, it passes. The priority should be on producing a transform; there may be 10,000 times as many correctly prefixed nodes in it.
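A sketch of the less stringent check, assuming a hypothetical threshold on the fraction of all nodes whose IDs could not be remapped: failures are logged as warnings, and the transform is rejected only when they exceed that fraction. `post_processing_ok` and `max_failed_fraction` are illustrative names, not existing KG-OBO options.

```python
import logging

logger = logging.getLogger("kg-obo-sketch")


def post_processing_ok(total_nodes: int, attempted: int, failed: int,
                       max_failed_fraction: float = 0.01) -> bool:
    """Decide whether an incomplete remap should still count as a successful transform.

    Failed remaps are only warned about; the transform is rejected when the failures
    exceed max_failed_fraction of all nodes, or when there are no nodes at all.
    """
    if failed:
        logger.warning("Could not remap %d of %d node IDs.", failed, attempted)
    if total_nodes == 0:
        return False
    return failed / total_nodes <= max_failed_fraction


if __name__ == "__main__":
    # 2 failed remaps among 20,000 nodes: keep the transform.
    print(post_processing_ok(total_nodes=20000, attempted=2, failed=2))  # True
```

Scaling the threshold by the total node count, rather than by the number of attempted remaps, reflects the point that a handful of unmapped CURIEs should not discard an otherwise complete graph.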
