When two series in an import differ only by dcAggregate/, the Mixer seems to pick only one of them, because the metadata hash does not include is_dc_aggregate.
This happens with the Census PEP imports because they stitch together multiple historical CSVs into one import, and for some year ranges the data isn't available and needs to be aggregated (dcAggregate/). Currently, they set dcAggregate/ only for those aggregated years.
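To illustrate the suspected cause, here is a minimal Python sketch of how a metadata hash that omits is_dc_aggregate would collapse two such series into one. This is not the Mixer's code: `metadata_hash`, `dedupe`, and the field lists are hypothetical names chosen for illustration, and the real hash likely covers more fields.

```python
import hashlib


def metadata_hash(meta, fields):
    """Hash only the selected metadata fields, in a fixed order (hypothetical helper)."""
    key = "|".join(f"{name}={meta.get(name)}" for name in fields)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]


def dedupe(series_list, fields):
    """Keep one series per metadata hash; a later series overwrites an earlier one."""
    by_hash = {}
    for series in series_list:
        by_hash[metadata_hash(series, fields)] = series
    return list(by_hash.values())


# Two illustrative series from the same import that differ only in the aggregate flag.
raw = {"import_name": "USCensusPEP_By_Sex_Race", "is_dc_aggregate": False}
agg = {"import_name": "USCensusPEP_By_Sex_Race", "is_dc_aggregate": True}

# Assumed field lists: one without the flag (the suspected current behavior), one with it.
fields_without_flag = ("import_name",)
fields_with_flag = ("import_name", "is_dc_aggregate")

collapsed = dedupe([raw, agg], fields_without_flag)
distinct = dedupe([raw, agg], fields_with_flag)
```

Under these assumptions, `collapsed` holds a single series while `distinct` keeps both, which matches the symptom described in this issue.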
Validation:
./check_bt d/3/country/USA^Count_Person_Male frequent | ./cache_parse returns two series with import name USCensusPEP_By_Sex_Race
curl -X POST 'https://autopush.api.datacommons.org/stat/all' -d '{ "places": ["country/USA"], "stat_vars": ["Count_Person_Male"]}' | jq returns only one series
A few general questions and thoughts on aggregated data:
When do we want to expose "is_aggregate" for an observation?
If we claim an observation is aggregated, should it be in the import_name or in the measurement_method?
If one import contains both aggregated and non-aggregated data, should we present them uniformly, or is it necessary to differentiate them for users?
Right now, we handle aggregated data as a separate import (series). In some cases this is not user-friendly: for example, City-level data (raw) and County-level data (aggregated) are presented as two distinct series, which is unnecessary from the user's perspective.
In the case of this bug, the aggregation mechanism is even more subtle, and I doubt we should expose that complexity in the final data presentation.
A non-intrusive way would be to add a metadata property to an import indicating which place types and variables are aggregated. Users who do need to understand the subtlety can look it up in this metadata.
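As a rough sketch of what such an import-level metadata property could look like, the Python below records the aggregated place types and variables for one import. The class name, field names, and the rule that an observation counts as aggregated only when both its place type and its variable are listed are all assumptions made for illustration, not an existing Data Commons schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportAggregationInfo:
    """Hypothetical per-import record of which data is aggregated."""

    import_name: str
    aggregated_place_types: frozenset = frozenset()
    aggregated_stat_vars: frozenset = frozenset()

    def is_aggregated(self, place_type, stat_var):
        # Assumed rule: aggregated only if both the place type and the variable are listed.
        return (
            place_type in self.aggregated_place_types
            and stat_var in self.aggregated_stat_vars
        )


# Illustrative values only: County data aggregated, City data raw.
info = ImportAggregationInfo(
    import_name="USCensusPEP_By_Sex_Race",
    aggregated_place_types=frozenset({"County"}),
    aggregated_stat_vars=frozenset({"Count_Person_Male"}),
)

county_is_aggregated = info.is_aggregated("County", "Count_Person_Male")
city_is_aggregated = info.is_aggregated("City", "Count_Person_Male")
```

With a record like this, the series itself could stay unified in the API response, and only users who care about provenance would consult the import metadata.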