[BUG] Two series in an import can differ only by `dcAggregate/` #862

pradh · 2022-06-07T03:50:32Z

When two series in an import differ only by `dcAggregate/`, it seems the Mixer might pick only one of them, because the metadata hash does not include `is_dc_aggregate`.
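To illustrate the suspected collision, here is a minimal Python sketch. It is not the Mixer's actual code: the `SeriesMetadata` fields, the `metadata_hash` and `merge_by_hash` helpers, and the `CensusPEPSurvey` and `P1Y` values are all assumptions made for illustration. It only shows that if a series is keyed by a hash that omits the aggregate flag, two series that differ only in that flag overwrite each other.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesMetadata:
    # Hypothetical metadata fields; the real Mixer metadata may differ.
    import_name: str
    measurement_method: str  # assumed to be stored without the dcAggregate/ prefix
    observation_period: str
    is_dc_aggregate: bool


def metadata_hash(meta: SeriesMetadata, include_aggregate: bool) -> str:
    """Hash the metadata fields, optionally including the aggregate flag."""
    parts = [meta.import_name, meta.measurement_method, meta.observation_period]
    if include_aggregate:
        parts.append(str(meta.is_dc_aggregate))
    return hashlib.sha256("^".join(parts).encode("utf-8")).hexdigest()[:8]


def merge_by_hash(series_list, include_aggregate: bool):
    """Key each series by its metadata hash; a later series replaces an earlier one."""
    merged = {}
    for meta in series_list:
        merged[metadata_hash(meta, include_aggregate)] = meta
    return list(merged.values())


# Two series from the same import that differ only in the aggregate flag.
raw = SeriesMetadata("USCensusPEP_By_Sex_Race", "CensusPEPSurvey", "P1Y", False)
agg = SeriesMetadata("USCensusPEP_By_Sex_Race", "CensusPEPSurvey", "P1Y", True)

without_flag = merge_by_hash([raw, agg], include_aggregate=False)
with_flag = merge_by_hash([raw, agg], include_aggregate=True)
print(len(without_flag), len(with_flag))
```

Under these assumptions, leaving the flag out of the hash collapses the two series into one, while including it keeps both.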

This happens with the Census PEP imports because they stitch together multiple historical CSVs into one import, and for some year ranges the data isn't available and needs to be aggregated (`dcAggregate/`). Currently, they set `dcAggregate/` only for those aggregated years.

Validation:

`./check_bt d/3/country/USA^Count_Person_Male frequent | ./cache_parse` returns two series with import name USCensusPEP_By_Sex_Race.

`curl -X POST 'https://autopush.api.datacommons.org/stat/all' -d '{ "places": ["country/USA"], "stat_vars": ["Count_Person_Male"]}' | jq` returns only one series.


shifucun · 2022-06-07T16:36:23Z

A few general questions and thoughts on aggregated data:

When do we want to expose "is_aggregate" for an observation?
If we claim an observation is aggregated, should it be in the import_name or in the measurement_method?
If one import contains both aggregated and non-aggregated data, should we present them uniformly, or do users need to be able to tell them apart?

Right now, we handle aggregated data as a separate import (series). In some cases this is not user-friendly: for example, city-level data (raw) and county-level data (aggregated) are presented as two distinct series, which is unnecessary from the user's perspective.

In the case of this bug, the aggregation mechanism is even more subtle, and I doubt we should expose that complexity in the final data presentation.

A non-intrusive way would be to add a metadata property to an import indicating which place types and variables are aggregated. Users who need to understand the subtlety can look it up in this metadata.
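As a rough sketch of what such import-level metadata could look like, here is a small Python example. The `ImportAggregationInfo` class, its field names, and the rule that a series counts as aggregated only when both its place type and its variable are listed are all hypothetical; nothing like this exists in the code base today as far as this thread shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportAggregationInfo:
    # Hypothetical import-level metadata describing what was aggregated.
    import_name: str
    aggregated_place_types: frozenset = frozenset()
    aggregated_stat_vars: frozenset = frozenset()

    def is_aggregated(self, place_type: str, stat_var: str) -> bool:
        """Assumed rule: both the place type and the variable must be listed."""
        return (
            place_type in self.aggregated_place_types
            and stat_var in self.aggregated_stat_vars
        )


# Illustrative values only; which place types are really aggregated is not stated here.
info = ImportAggregationInfo(
    import_name="USCensusPEP_By_Sex_Race",
    aggregated_place_types=frozenset({"Country"}),
    aggregated_stat_vars=frozenset({"Count_Person_Male"}),
)

print(info.is_aggregated("Country", "Count_Person_Male"))
print(info.is_aggregated("City", "Count_Person_Male"))
```

With this shape the series presented to users stays uniform, and the aggregation detail is available only to those who look it up. Note that it would not capture aggregation that applies to certain year ranges only, as in the Census PEP case.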
