Improve validate processor #171
Comments
Currently known issues:
|
IIRC all of the validations are done in the underlying libraries, so we might
need to fix that there.
There's a long-standing issue about moving dataflows to use frictionless
instead of tabulator/datapackage etc., so this might also be a good
motivator.
…On Sun, Sep 19, 2021 at 10:58 AM Paul Walsh ***@***.***> wrote:
Currently known issues:
1. Does not validate primary keys
2. Does not validate foreign keys
3. If field format is None (which is an invalid value according to the
spec), it validates, but fails in dump_to_sql
4. Does not validate field.constraints (e.g.: unique)
|
Looks like the only data validation is done via tableschema.Field.cast_value:
As that only checks individual field values, points (1) and (2) in #171 (comment) are not checked. For point (3), I'm not sure what is going on and will need to create a failing test. For point (4), see https://github.com/frictionlessdata/tableschema-py/blob/main/tableschema/field.py#L138. These are all easily addressed, but I agree it may be a good motivator to explore moving this area to frictionless.
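To illustrate why a per-value check cannot cover point (2), here is a minimal sketch of a foreign key check that needs the referenced rows as well as the current row. The function name `check_foreign_keys` and its parameters are hypothetical and not part of dataflows or tableschema; skipping keys that contain a null is an assumption, not confirmed behaviour of either library.

```python
def check_foreign_keys(rows, fields, reference_rows, reference_fields):
    """Yield rows unchanged; raise ValueError on a reference that does not exist.

    Hypothetical helper for illustration only.
    """
    # Collect every key tuple present in the referenced resource.
    known = {tuple(ref[name] for name in reference_fields) for ref in reference_rows}
    for number, row in enumerate(rows, start=1):
        key = tuple(row[name] for name in fields)
        # Assumption: a key with a missing (None) part is not checked.
        if None in key:
            yield row
            continue
        if key not in known:
            raise ValueError(f"row {number}: foreign key {key!r} not found")
        yield row
```

Because the function is a generator over rows, it could in principle sit in a row-by-row pipeline, but the referenced key set has to be built before the first row is checked.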
DF.validate() does some basic checks but doesn't validate everything that is possible based on Table Schema. In particular, it does not validate primary keys, and we have noted that this creates other, currently untraced bugs (e.g. load from a package with invalid primary keys and try to dump again, and the package will be incomplete). We need to explore one of:
The problem with adopting Frictionless is that it can't be incrementally adopted AFAIK: the validation is built into the Resource class, and I can't tell just from reading the code where that leads (whether and how it complicates our code when we use different libraries for managing Frictionless Data specs). Also, it keeps state in memory (seen data for primary keys and foreign keys), and I guess that, based on other patterns in Dataflows, we would want to store that data outside of the running Python process (e.g. using https://github.com/akariv/kvfile).
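As a rough sketch of the state involved, a primary key check over a stream of rows might look like the following. The name `check_primary_key` and the `seen` parameter are hypothetical. The default store is an in-memory `set`; an out-of-process store would need a small adapter exposing `in` and `add`, which is assumed here and not shown.

```python
def check_primary_key(rows, primary_key, seen=None):
    """Yield rows unchanged; raise ValueError on a null or duplicate primary key.

    Hypothetical helper for illustration only. `seen` is any object that
    supports `key in seen` and `seen.add(key)`.
    """
    if seen is None:
        seen = set()  # in-memory state; grows with the number of rows
    for number, row in enumerate(rows, start=1):
        key = tuple(row.get(name) for name in primary_key)
        if None in key:
            raise ValueError(f"row {number}: primary key {key!r} has a missing value")
        if key in seen:
            raise ValueError(f"row {number}: duplicate primary key {key!r}")
        seen.add(key)
        yield row
```

Passing the store in as a parameter keeps the check itself independent of where the seen keys live, which is the part that would change if the data were kept outside the running process.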