records returned with lat lon in varying CRS but CRS or datum/geoid not provided #38
@tphilippi, thank you very much for bringing this to our attention and for the excellent examples. You are correct that this is less than ideal, and there are a few different things wrapped up in all of this.
Regarding points 1 and 3: I've been wanting to improve the documentation and examples for the ridigbio package/repo and will add your suggestions to the list. I've created a new issue to track documentation changes and suggestions: #39. In related news, some of us at iDigBio are forming a community working group focused on consuming iDigBio and other specimen-based API data via R, and I hope that improved documentation and additional examples will also come out of that work.
This issue also has two distinct components: adjustments that iDigBio makes to data at ingest for data-quality purposes, and what we make available when searching and downloading. Regarding the former, I haven't been on the project very long, so there may have been discussions I'm not aware of about adjustments of this type (reprojection, etc.) at ingest: whether it was considered and then shelved, or has been on a TODO list. That will be a much deeper issue. Regarding the latter, this is thankfully much easier for us to deal with. One thing that can help with API search requests is that you can specify fields within the raw provider data to be returned by including them in the
This isn't ideal from a usability standpoint, since you then have to specify every field you're interested in, but it at least makes the data available. In ridigbio, fields = "all" isn't clear about what it actually does under the hood, and unfortunately it doesn't follow the principle of least surprise. I also note that, when requesting this field, the underlying provider data lists coordinate systems in both abbreviated and expanded forms, e.g., "NAD27" and "North American Datum 1927", but that's a relatively minor issue. Regarding availability of this information in downloads: since iDigBio does not apply adjustments, it does not appear in the
I hope this helps you get the information you need from your search results.
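A minimal sketch of the approach described above: request an explicit field list that includes a raw provider field. It assumes that the iDigBio v2 search endpoint https://search.idigbio.org/v2/search/records/ accepts JSON-encoded rq and fields query parameters, and that raw provider values are addressed with a "data." prefix (e.g. data.dwc:geodeticDatum); neither detail is confirmed in this thread. In ridigbio the equivalent would be passing a character vector to the fields argument of idig_search_records(). The query is hypothetical, and the function only builds the URL, so nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the iDigBio v2 record search API.
SEARCH_URL = "https://search.idigbio.org/v2/search/records/"


def build_search_url(rq, fields, limit=100):
    """Build a GET URL for a record search that returns only `fields`.

    rq is the record query as a dict. fields is an explicit list of field
    names; raw provider values are assumed to use a "data." prefix,
    e.g. "data.dwc:geodeticDatum".
    """
    params = {
        "rq": json.dumps(rq),          # query serialized as JSON
        "fields": json.dumps(fields),  # field list serialized as a JSON array
        "limit": limit,
    }
    return SEARCH_URL + "?" + urlencode(params)


# Requesting the same explicit fields on every call also keeps the columns
# identical across batches (e.g. different bounding boxes).
FIELDS = ["uuid", "geopoint", "data.dwc:geodeticDatum"]
url = build_search_url({"stateprovince": "hawaii"}, FIELDS)  # hypothetical query
print(url)
```

Using a fixed field list rather than fields = "all" trades convenience for predictable columns, which is what makes results from separate calls combine cleanly.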
@roncanepa Your example of exactly how to specify that field name gets me over this hurdle. [Documentation in point 3 above would be great.] I already specify a set of field names because fields = "all" returns only fields with at least one non-missing value, so repeated calls with different bounding boxes give data frames that don't simply rbind. I agree that the different values for the geodeticDatum field are a minor problem. As a DwC term, the recommended controlled value is an EPSG code. My opinion is that if new iDigBio records use EPSG codes, iDigBio need not work to clean the old values: GBIF does that when it ingests from iDigBio or the same museums. I am completely satisfied with being able to take my returned iDigBio data and translate the geodeticDatum values to a full CRS/EPSG code myself. Some values will be ambiguous, and I will need to go back to the contributing dataset and look at the full information if I really need that record; iDigBio has data provenance built in, so I can do that. Exposing geodeticDatum lets me handle the easy cases in bulk and track down only a small handful of important, ambiguous records. Your note that occurrence_raw.csv has dwc:geodeticDatum will help me with my DwC-A files from the portal (we have national parks in AS, GU, MP, PR, US, and VI, and adjacent to CA and MX). At the least, that gives me additional information beyond the parsed flag "geopoint_datum_error". Again, I won't be able to resolve all coordinates, but I will be able to be confident in a larger fraction of records.
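The bulk translation described above could look roughly like this: map the easy, unambiguous geodeticDatum spellings to the EPSG code of the matching geographic CRS (EPSG:4326 for WGS84, EPSG:4269 for NAD83, EPSG:4267 for NAD27) and set everything else aside for manual follow-up. The lookup table and the dwc:geodeticDatum record key are assumptions for illustration; real provider data will contain more spellings than shown here.

```python
import re

# Hypothetical lookup: normalized geodeticDatum spellings -> EPSG code of the
# matching geographic CRS. Extend it as new spellings show up in provider data.
DATUM_TO_EPSG = {
    "wgs84": 4326,
    "worldgeodeticsystem1984": 4326,
    "nad83": 4269,
    "northamericandatum1983": 4269,
    "nad27": 4267,
    "northamericandatum1927": 4267,
}


def datum_to_epsg(value):
    """Return an EPSG code for a recognized geodeticDatum value, else None."""
    if not value:
        return None
    text = value.strip().lower()
    # Values already given as EPSG codes, e.g. "EPSG:4326".
    match = re.fullmatch(r"epsg:\s*(\d+)", text)
    if match:
        return int(match.group(1))
    # Drop spaces and punctuation so "WGS 84" and "WGS-84" both become "wgs84".
    return DATUM_TO_EPSG.get(re.sub(r"[^a-z0-9]", "", text))


def split_by_datum(records, field="dwc:geodeticDatum"):
    """Partition records into (resolved, unresolved); resolved ones gain "epsg"."""
    resolved, unresolved = [], []
    for rec in records:
        code = datum_to_epsg(rec.get(field))
        if code is None:
            # Ambiguous or missing: check the contributing dataset by hand.
            unresolved.append(rec)
        else:
            resolved.append({**rec, "epsg": code})
    return resolved, unresolved
```

Returning None instead of guessing keeps ambiguous values out of the bulk pass, so only the unresolved list needs tracing back through record provenance.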
Glad to hear that this will allow you to proceed. Please let us know if you have any other questions or run into other problems. I'll also add these documentation improvements to the list.
Records returned from calls to ridigbio::idig_search_records() can have the lat/lon components of geopoint in mixed coordinate systems. However, the geodeticDatum variable is not included in fields = "all", so the only indication of a problem is that the flags list includes "geopoint_datum_error".
And the utility function What is needed is the value of geodeticDatum, an EPSG code, or a CRS so that users can make valid use of the lat/lon coordinates.
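Until the datum itself is returned, the flag is the only signal, so a first pass might simply separate flagged records from the rest. This sketch assumes each record is a dict with a uuid and a list-valued flags entry, roughly the shape of search results requested with fields = "all"; the sample records are hypothetical.

```python
DATUM_FLAG = "geopoint_datum_error"


def split_on_flag(records, flag=DATUM_FLAG):
    """Split records into (clean, flagged) by whether `flag` is in their flags."""
    clean, flagged = [], []
    for rec in records:
        # Records with no flags entry are treated as unflagged.
        if flag in rec.get("flags", []):
            flagged.append(rec)
        else:
            clean.append(rec)
    return clean, flagged


# Hypothetical records; real results carry many more fields.
records = [
    {"uuid": "rec-1", "flags": ["geopoint_datum_error"]},
    {"uuid": "rec-2", "flags": []},
    {"uuid": "rec-3"},
]
clean, flagged = split_on_flag(records)
print([r["uuid"] for r in flagged])  # ['rec-1']
```

The flag only says that something is wrong with the datum, not which datum was used, which is why the geodeticDatum value itself is still needed.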
Simple example with a mix of WGS84 (EPSG:4326) and NAD27 lat/lon:
https://www.idigbio.org/portal/records/58999bd0-d35a-4bbf-9695-a51732807867
https://www.idigbio.org/portal/records/74bec6f7-a4d8-4f8a-805a-ed63fa23fd38
idig_meta_fields() shows that the geodeticDatum field seen on that portal page exists:
but I cannot translate that fieldName to one that works in the fields = parameter for idig_search_records().
Note that this is also an issue for the Darwin Core Archives returned from the portal, where the geodeticDatum field is not returned, and thus the values for idigbio:geoPoint in occurrence.csv cannot be properly interpreted.
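One workaround for the download case, based on the note in this thread that occurrence_raw.csv carries dwc:geodeticDatum: join the two archive files on their shared record identifier so each parsed geopoint travels with its provider-supplied datum. The identifier column name (coreid here) and the sample rows are assumptions; check the archive's meta.xml for the actual column names.

```python
import csv
import io


def attach_datum(occurrence_file, raw_file, key="coreid",
                 datum_col="dwc:geodeticDatum"):
    """Return occurrence rows (as dicts) with the raw geodeticDatum added.

    Rows are matched on `key`; occurrences with no raw match get "".
    """
    datum_by_id = {row[key]: row.get(datum_col, "")
                   for row in csv.DictReader(raw_file)}
    return [{**row, datum_col: datum_by_id.get(row[key], "")}
            for row in csv.DictReader(occurrence_file)]


# Hypothetical in-memory stand-ins for occurrence.csv and occurrence_raw.csv;
# with real files, open each with newline="" and encoding="utf-8".
occurrence = io.StringIO(
    'coreid,idigbio:geoPoint\n'
    'r1,"{""lat"": 21.3, ""lon"": -157.8}"\n'
    'r2,"{""lat"": 18.2, ""lon"": -66.5}"\n'
)
raw = io.StringIO("coreid,dwc:geodeticDatum\nr1,NAD27\n")

merged = attach_datum(occurrence, raw)
print([(row["coreid"], row["dwc:geodeticDatum"]) for row in merged])
# [('r1', 'NAD27'), ('r2', '')]
```

Building a dictionary from the raw file first keeps the join to a single pass over each file, which matters for large archives.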
So, my ask is:
Also, see ropensci/spocc#223 for a simple use case where GBIF and iDigBio return different lat/lon for the same occurrence.