'universe' target should not be an option for normalization #326
We have removed universe completely in the last dump and are testing right now to see if we find any problems.
Discovered a problem with the removal of 'universe' metadata today! Enriching polygons with US median household income returns the following error: `Analysis B2 failed: {"Median or average aggregation for polygons requires a denominator to provide weights. Please review the provided options"}`. Previously, I thought universe targets were not used in the DO query, but I was proven wrong today. I believe its use comes up here, with predenominated polygon interpolation weighted by universe:
TODO:
It is done using a formula like this:
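A plausible shape for that formula, assuming area-weighted interpolation with the universe count as the weight (the symbols and the exact weighting here are assumptions, not taken from the codebase):

$$
\hat{v}_{\mathrm{poly}} = \frac{\sum_i v_i \, w_i}{\sum_i w_i},
\qquad
w_i = u_i \cdot \frac{\mathrm{area}(g_i \cap g_{\mathrm{poly}})}{\mathrm{area}(g_i)}
$$

where $v_i$ is the predenominated value (e.g. median household income) of intersecting census geometry $i$, $u_i$ is its universe count (e.g. total households), and $g_i$ is its geometry. Without the universe count there is no weight $w_i$, which would explain the error above.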
Question time.
Answering the questions:
Wanted to add my two cents on the big-picture customer perspective, plus some resources I've found while researching this problem.

The big-picture objective is that the DO analysis should be able to enrich arbitrary polygons with measures that are medians and measures that are averages. It's a common customer use case to draw an AOI and ask "what is the median income in this area?" or "what's the average age?"

I saw this issue on another project with discussion that sounds very helpful in terms of performing "medians of medians": censusreporter/census-aggregate#1. It seems there's a technique called Pareto interpolation: https://en.wikipedia.org/wiki/Pareto_interpolation. And also: http://mumford.albany.edu/census/CityProfiles/Profiles/MHHINote.htm

While reading those links and their approaches, I got another related idea that's a bit broader but would also be great to address somehow. It's been a common customer request (one we haven't been able to deliver on) to ask "how many people making more than $50,000 annually live in this arbitrary AOI?", or "how many people between the ages of 20 and 45 live in this AOI?" But we've never been able to do "population above $X" / "population between $X and $Y" or "population above age A" / "population between ages A and B" queries, because the Census data (at least for the USA) has those values binned in $5,000 and 5-year increments: we have "population aged 25 to 29", 30-34, 35-39, and so on, and similar $5k bins for income counts (and probably other measures too). Someone would have to enrich a separate column for each bin overlapping their requested range, then use SQL arithmetic / field calculation to add up those column values and come up with a sum total for their range, or at least an estimate rounded to the nearest bin boundary.

If we think about brand demographic profiles like "affluent millennials", they're defined as ranges: 20-35 y/o, $50-$100k, etc. There's no easy way to use the DO as it is to analyze for profiles like that, and we might be able to get there using some of the same approaches as doing median income for polygons.
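Since Pareto interpolation came up, here is a minimal sketch of how it can estimate a median from binned counts, following the formulas on the Wikipedia page above. The function name, bin edges, and counts are hypothetical, not from bigmetadata or censusreporter:

```python
import math


def pareto_median(bin_edges, counts):
    """Estimate the median of binned data via Pareto interpolation.

    bin_edges: ascending bin boundaries, len(counts) + 1 entries
    counts:    number of observations in each bin

    Assumes the median bin has a positive lower edge and a finite
    upper edge with some data above it.
    """
    total = float(sum(counts))
    # Find the bin that contains the 50th percentile.
    cumulative = 0
    for i, count in enumerate(counts):
        if cumulative + count >= total / 2.0:
            break
        cumulative += count

    a, b = bin_edges[i], bin_edges[i + 1]   # bounds of the median bin
    if a <= 0:
        raise ValueError('Pareto interpolation needs a positive lower bound')

    p_a = cumulative / total                # proportion of data below a
    p_b = (cumulative + counts[i]) / total  # proportion of data below b

    # Fit a Pareto CDF F(x) = 1 - (kappa / x)**theta through the two
    # cumulative points (a, p_a) and (b, p_b), then solve F(m) = 0.5.
    theta = ((math.log(1 - p_a) - math.log(1 - p_b))
             / (math.log(b) - math.log(a)))
    kappa = a * (1 - p_a) ** (1 / theta)
    return kappa * 2 ** (1 / theta)


# Hypothetical $5k household-income bins (lower edges plus a final
# upper edge) and per-bin household counts, purely for illustration.
edges = [0, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000]
counts = [12, 35, 60, 80, 75, 50, 30, 20]
print(round(pareto_median(edges, counts)))  # lands inside the 15k-20k bin
```

The same machinery would also help with the range queries described above: once a distribution is fit to the bins, you can estimate "population above $X" for arbitrary X instead of rounding to $5k boundaries.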
Answering @michellemho
The calculation of each
There are two types of "targets" in bigmetadata: universe and denominator. Denominator targets indicate how a variable can be normalized. Universe targets capture the variable used to calculate a median or per-capita measurement. We should remove universe targets from the normalization options. We might also remove the universe target entirely; is there any known use for it? I'm not sure.
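For context, a sketch of how the two target types might be declared on columns, assuming bigmetadata's `OBSColumn`/`targets` pattern. The import path, constants, and column definitions are illustrative and may not match the actual module layout:

```python
# Hypothetical column definitions illustrating the two target types.
from tasks.meta import OBSColumn, DENOMINATOR, UNIVERSE  # assumed import path

total_pop = OBSColumn(id='total_pop', type='Numeric',
                      name='Total Population', aggregate='sum')

households = OBSColumn(id='households', type='Numeric',
                       name='Households', aggregate='sum')

# A count: total_pop is a *denominator* target, so the measure can be
# normalized (e.g. shown as a share of population).
employed_pop = OBSColumn(
    id='employed_pop',
    type='Numeric',
    name='Employed Population',
    aggregate='sum',
    targets={total_pop: DENOMINATOR},
)

# A median: households is a *universe* target, i.e. the population the
# median was computed over, not something to normalize by. This is why
# it should not appear among the normalization options.
median_income = OBSColumn(
    id='median_income',
    type='Numeric',
    name='Median Household Income',
    aggregate='median',
    targets={households: UNIVERSE},
)
```

Per the error above, though, the universe target still matters for polygon interpolation, so dropping it entirely would require the weighting logic to find its weights some other way.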