Investigate/fix "taxa without parents" in the pbdb hierarchy #151

Open
KatjaSchulz opened this issue May 27, 2016 · 4 comments

Comments

@KatjaSchulz

@eliagbayani: @JRice alerted us to the fact that the pbdb hierarchy has a lot of "root nodes" (370), i.e., hierarchy entries that don't have any parents. Since this is a continually updated, community-curated hierarchy, it's expected that there might be a few species, genera, families, etc. dangling around the root until somebody takes mercy on them and gives them a parent. But I don't remember there being quite so many. A spot check of some of the affected taxa reveals that they do have parents at the source, e.g.:
https://paleobiodb.org/data1.1/taxa/single.json?id=170202&show=attr
https://paleobiodb.org/data1.1/taxa/single.json?id=282936&show=attr
https://paleobiodb.org/data1.1/taxa/single.json?id=14054&show=attr

Is this just a matter of our data being old, and it will fix itself once we re-run the connector? Or is there something about these taxa that makes it difficult for our connector to catch their parents?

@eliagbayani
Contributor

Hi @KatjaSchulz, this one fell through the cracks.
I'm investigating now.

@eliagbayani
Contributor

eliagbayani commented Jul 11, 2016

Hi @KatjaSchulz,
I found the problem. Many of the parent IDs (e.g. 100759, 282934, 170201) in their CSV dump don't have their own taxon entries in the CSV.
I have now fixed this by using the API call (e.g. https://paleobiodb.org/data1.1/taxa/single.json?id=100759&show=attr) to get the info for those taxa
and including them in the EOL-generated taxon.tab.
The latest EOL DWC-A for this resource has now been generated.
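
For anyone who wants to reproduce the check, here is a minimal sketch of the idea in Python. It is not the actual connector code. The column names `taxon_no` and `parent_no` and the sample rows are assumptions made up for illustration; only the IDs and the API URL pattern come from this thread. It lists parent IDs that are referenced in the dump but have no row of their own, and builds the single-taxon API URL for each of them.

```python
import csv
import io


def find_missing_parents(rows, id_field="taxon_no", parent_field="parent_no"):
    """Return parent IDs that are referenced but have no row of their own.

    The field names are hypothetical; adjust them to the real CSV header.
    """
    known_ids = {row[id_field] for row in rows}
    # Rows with an empty parent field are treated as genuine roots.
    referenced = {row[parent_field] for row in rows if row.get(parent_field)}
    return sorted(referenced - known_ids, key=int)


def single_taxon_url(taxon_id):
    """Build the single-taxon API URL using the pattern quoted in this thread."""
    return (
        "https://paleobiodb.org/data1.1/taxa/single.json"
        f"?id={taxon_id}&show=attr"
    )


# Made-up sample standing in for the CSV dump (names are placeholders).
sample_csv = """taxon_no,parent_no,taxon_name
170202,170201,placeholder-a
282936,282934,placeholder-b
282934,,placeholder-c
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
missing = find_missing_parents(rows)

for taxon_id in missing:
    # Each of these would need a separate API lookup to fill in the taxon.
    print(taxon_id, single_taxon_url(taxon_id))
```

In this sample only 170201 is reported, because 282934 has its own row and an empty parent field, so it counts as a real root rather than a missing parent.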

This resource has now been uploaded to the server and is ready for harvesting.
I have not set the resource to force-harvest.
I'm cc'ing Jen (@jhammock) because I know there is a queue where resources are prioritized for harvesting.

Latest stats:
taxon: 263815
vernacularname:en: 3911
measurementorfact: 1103650
occurrence: 1103650

A copy of the archive file is here.

@jhammock

Thanks, Eli! Queued up.

@KatjaSchulz
Author

It's a big one, but it would be great to get it reharvested soon if we can.
