[0.3.6] (2019-07-12)
Added
- Categories rule with a plot showing unique values and count per field. By default, report_all() only includes fields that have 10 or fewer unique values. See https://arche.readthedocs.io/en/latest/nbs/Rules.html#Category-fields, #100
- Category documentation
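
The default cut-off for category fields can be illustrated with a minimal sketch. It is not Arche's implementation: `category_counts` and `MAX_UNIQUE_VALUES` are hypothetical names, and items are assumed to be flat dicts with hashable values.

```python
from collections import Counter

# Hypothetical constant mirroring the "10 or fewer unique values" default.
MAX_UNIQUE_VALUES = 10


def category_counts(items, max_unique=MAX_UNIQUE_VALUES):
    """Count values per field, keeping only fields with few unique values."""
    per_field = {}
    for item in items:
        for field, value in item.items():
            per_field.setdefault(field, Counter())[value] += 1
    # Drop fields whose number of unique values exceeds the threshold.
    return {f: c for f, c in per_field.items() if len(c) <= max_unique}


# 13 items: "color" has 2 unique values, "sku" has 13.
items = [{"color": "red", "sku": i} for i in range(12)]
items.append({"color": "blue", "sku": 12})

counts = category_counts(items)
print(counts)
```

Here only `color` is kept, since `sku` has more unique values than the threshold allows.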
Changed
- Arche.report_all() does not shorten report by default, added short parameter.
- Data is consistent with Dash and Spidermon: _type, _key fields are dropped from dataframe, raw data, basic schema, #104, #106
- df.index now stores _key instead
- basic_json_schema() works with deleted jobs
- start is supported for Collections, #112
- enum is counted as a category tag, #18
- Garbage Symbols searches in str representation of nested fields instead of expanded df, #130
- Show real coverage difference (negative\positive) instead of absolute, #114
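
To show why a signed coverage difference carries more information than an absolute one, here is a small sketch. It assumes coverage means the fraction of items in which a field is present and not None. The `coverage` and `coverage_difference` helpers are hypothetical and do not reflect Arche's API.

```python
def coverage(items, fields):
    """Fraction of items in which each field is present and not None."""
    total = len(items)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for item in items if item.get(f) is not None) / total
        for f in fields
    }


def coverage_difference(source, target, fields):
    """Signed difference: positive when source covers a field better than target."""
    src, tgt = coverage(source, fields), coverage(target, fields)
    return {f: round(src[f] - tgt[f], 4) for f in fields}


source = [
    {"name": "a", "price": 1},
    {"name": "b", "price": 2},
    {"name": "c", "price": 3},
    {"name": "d"},
]
target = [
    {"name": "a", "price": 1},
    {"name": "b", "price": 2},
    {"price": 3},
    {"price": 4},
]

diff = coverage_difference(source, target, ["name", "price"])
print(diff)
```

With an absolute difference, the drop in `price` coverage and the gain in `name` coverage would both show up as positive numbers. The sign tells which direction the change went.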
Fixed
- Arche.glance(), #88
- Item links in Schema validation errors, #89
- Empty NAN bars on category graphs, #93
- data_quality_report(), #95
- Wrong number of Collection Items if it contains item 0, #112
Removed
- Responses Per Item Ratio rule
- Deprecated expand parameter and removed flat_df, since Garbage Rule deals with nested data itself, #133
Thanks - @ejulio @victor-torres @Gallaecio @alexander-matsievsky @ivankivanov @raphapassini @alexandr1988