Abstractions to extend catalog to other data models #57
What we have:
Desiderata:
I agree.
I believe this is something that should be handled in the extensions. Each extension should define enums, as appropriate, for well-known CV properties so that they validate the same way through JSON schemas. Those schemas could be generated from
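For illustration, a sketch of generating such enum-bearing schemas from a CV mapping. The `schema_from_cv` helper and the CV values are hypothetical stand-ins, not the real CMIP6 controlled vocabulary:

```python
# Hypothetical sketch: turn a controlled-vocabulary mapping
# {property: [allowed values]} into a JSON Schema fragment whose
# properties are constrained with "enum". Illustrative only.

def schema_from_cv(cv: dict) -> dict:
    """Build a JSON Schema object that restricts each CV property to its allowed values."""
    return {
        "type": "object",
        "properties": {
            name: {"type": "string", "enum": sorted(values)}
            for name, values in cv.items()
        },
        "required": sorted(cv),
    }

# Toy CV, not the real one:
cv = {"activity_id": ["CMIP", "ScenarioMIP"], "experiment_id": ["historical"]}
schema = schema_from_cv(cv)
```

A generator like this would keep the JSON schemas in sync with the CV source instead of hand-maintaining enums in two places.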
I added this a while back: stac-utils/pystac#1320. Can you provide an example of a problematic case where errors were easier to analyze with
I have two issues with pystac validation:
I think it should be possible to address both points fairly easily. Thanks for detailing them. I have faced (2) as well. Iterating over the nested errors and printing them one by one, instead of reporting only the generic top-level error, would solve this; it produces error steps similar to a traceback. For (1), I think those fields could simply be omitted, since iterating over the errors can print the nested JSON fields along the way, giving enough context to interpret the full JSON.
I'm working on a new CMIP6-CORDEX extension and implementation for the catalog, so we can deploy it at Ouranos.
I'm hitting a few issues that I wanted to discuss.
Abstraction
Adding a new extension and implementation requires copy-pasting a fair amount of boilerplate code, changing only the class names and one attribute name (CMIP6 -> CORDEX6). I don't mind doing this once, but we'll want to add other data models in the future, and I can see the mess that multiple copies of the same logic would generate. I think this calls for a higher-level abstraction than what we have now.
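As a rough illustration of what such an abstraction could look like (all names here, `DataModelExtension` included, are hypothetical, not the current catalog API): a single generic class parametrized by the property prefix, so each new data model is a few lines rather than a copied module.

```python
# Hypothetical sketch: factor the per-data-model boilerplate into one
# generic class; subclasses only set what actually differs (the prefix).
from dataclasses import dataclass

@dataclass
class DataModelExtension:
    prefix: str  # e.g. "cmip6" -> item properties like "cmip6:activity_id"

    def qualify(self, attr: str) -> str:
        """Map a bare CV attribute name to its prefixed STAC property name."""
        return f"{self.prefix}:{attr}"

class CMIP6Extension(DataModelExtension):
    def __init__(self):
        super().__init__(prefix="cmip6")

class CORDEX6Extension(DataModelExtension):
    def __init__(self):
        super().__init__(prefix="cordex6")
```

The shared logic (validation hooks, property mapping, serialization) would live once on the base class instead of being duplicated per data model.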
Validation
The current CMIP6 extension does validation in two places: in the Pydantic base model, and on the pystac.Item through jsonschema. Deepak created a custom jsonschema for the CMIP data that we currently use. There's now an official CMIP6 jsonschema, but I don't think it would validate against the UoT catalog. It looks to be made for ESGF-held datasets, which apparently carry additional attributes.
Neither of these schemas actually checks that fields comply with the Controlled Vocabulary (CV); they only make sure a value is present. So we use a pydantic.BaseModel to check that metadata attributes conform to the CV.
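For the CV check, a minimal Pydantic sketch using `Literal` fields; the model name, attribute names, and allowed values are illustrative stand-ins for the real CV, not the actual extension code:

```python
# Sketch: enforce CV membership with Literal types, which the jsonschemas
# mentioned above do not do. Values here are illustrative only.
from typing import Literal
from pydantic import BaseModel, ValidationError

class CMIP6Attrs(BaseModel):
    activity_id: Literal["CMIP", "ScenarioMIP"]
    experiment_id: Literal["historical", "ssp585"]

ok = CMIP6Attrs(activity_id="CMIP", experiment_id="historical")

try:
    CMIP6Attrs(activity_id="not-a-mip", experiment_id="historical")
    n_errors = 0
except ValidationError as exc:
    # One error per non-conforming field, each with a readable location.
    n_errors = len(exc.errors())
```

The per-field error locations in `ValidationError` are what make Pydantic tracebacks easier to act on than a generic schema-mismatch message.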
Then we call item.validate(), which compares STAC Item attributes against Deepak's jsonschema.
My experience so far is that it's much easier to debug errors with a Pydantic traceback than with a pystac.validation traceback. However, if a jsonschema already exists, I don't want to duplicate that logic in Pydantic. Also, our Pydantic data model only checks metadata attributes, not the STAC attributes.
Customization
My experience is that catalog admins want some flexibility regarding the metadata included in catalog entries. If a jsonschema exists, you might not want to include all of its properties in your catalog, and you may want to include additional properties not found in the schema. My understanding is that you cannot "unrequire" a property by subclassing or extending a jsonschema, meaning we'd have to maintain modified versions of "official" schemas to achieve the required flexibility. So in the Abstraction discussed above, I think we need to account for the need to decouple official external jsonschemas for metadata attributes from our STAC records.
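One way to get that decoupling, sketched under the assumption that the official schema is fetched once and rewritten locally rather than referenced via `$ref` (the `unrequire` helper and the schema content are illustrative, not the real ESGF schema):

```python
# Sketch: JSON Schema composition ($ref/allOf) can only add constraints,
# never remove them, so relaxing "required" means rewriting a local copy
# of the official schema. Toy schema for illustration.
import copy

def unrequire(schema: dict, optional: set) -> dict:
    """Return a copy of `schema` with the given fields dropped from "required"."""
    relaxed = copy.deepcopy(schema)
    relaxed["required"] = [
        field for field in schema.get("required", []) if field not in optional
    ]
    return relaxed

official = {"type": "object", "required": ["activity_id", "further_info_url"]}
local = unrequire(official, {"further_info_url"})
```

Keeping the rewrite as a small, declarative transformation of the upstream schema (rather than a hand-edited fork) would make it easier to track upstream changes.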