Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gce pop loader fix tract #9

Merged
merged 5 commits into from
Nov 25, 2024
Merged

Conversation

confunguido
Copy link
Collaborator

Suggesting some changes to how we are reading the population and for the test. I think we should try census tract as a person property, at least for now.

Copy link
Collaborator

@ChiragKumar9 ChiragKumar9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly small things, my main question is about the census tract not being a derived property. I feel like that at least deserves a justification. Thanks!


fn create_person_from_record(
context: &mut Context,
person_record: &PeopleRecord,
) -> Result<PersonId, IxaError> {
let person_id =
context.add_person(((Age, person_record.age), (HomeId, person_record.homeId)))?;
let tract: usize = String::from_utf8(person_record.homeId[..11].to_owned())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why tract should be an independent property? I am not fully convinced. I think Jason did it this way in rust-nucleus because he didn't have derived person properties.

If we choose to stick with this, I wonder if it's worth pulling the 11 out into a config.json or something like that so there isn't this hardcoding in case the synth pop ever changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was to have something minimal in main without derived properties. My perception is that tract should be a group and not necessarily a person property. I suggest we let it be here and think about it as an issue since I think the 11 is very standard for fips codes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this justification makes a ton of sense and sounds good. Look forward, I'd be curious what you think, then, about also having a more general group where people fit into households (a one-to-one group), and households fit into census tracts (a meta group that is also one-to-one)?

let person_id =
context.add_person(((Age, person_record.age), (HomeId, person_record.homeId)))?;
let tract: usize = String::from_utf8(person_record.homeId[..11].to_owned())
.expect("Home id should have 11 digits for tract + home id")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add Utf8Error to the list of potential IxaErrors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should. To clarify, you mean ixa and not this repo. Right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yessir!

fn load_synth_population(context: &mut Context, synth_input_file: PathBuf) -> Result<(), IxaError> {
let mut reader = csv::Reader::from_path(synth_input_file).expect("Failed to open file.");
let mut raw_record = csv::ByteRecord::new();
let headers = reader.byte_headers().unwrap().clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least expect with No headers found or something if we're not going to error propagate here. The documentation tends to error propagation here: https://docs.rs/csv/latest/csv/struct.Reader.html#method.byte_headers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't error propagate because IxaError doesn't have a way to convert from csv to IxaError. I was wondering whether we should do that here or in Ixa. But didn't think was appropriate for this skeleton.

let mut raw_record = csv::ByteRecord::new();
let headers = reader.byte_headers().unwrap().clone();

while reader.read_byte_record(&mut raw_record).unwrap() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while reader.read_byte_record(&mut raw_record).unwrap() {
let record: PeopleRecord = raw_record
.deserialize(Some(&headers))
.expect("Failed to parse record.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, would also error propagate here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about IxaError conversion from csv. Do you have a suggestion of how to deal with that here without incorporating it on ixa? As you said above, I think we should include csv and utf8 errors on ixa but don't know how to deal with that as right now.

);
assert!(context.get_person_property(person_id, Alive));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the check that the person is alive?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually removed both tests so this test is only to check that I can extract the age and census tract, not the person creation. Also, this is a PR to will's branch, not to main. So, was expecting Will to continue on the tests.

load_synth_population(&mut context, synth_file).unwrap();
let age: u8 = 43;
let tract: usize = 36_093_033_102;

assert_eq!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about also testing whether there are only two people in the population as we expect and that the properties of the other person also match?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. I think that would be good to merge WIll's branch to main. I wasn't trying to do that in this PR.

@ChiragKumar9 ChiragKumar9 self-requested a review November 24, 2024 23:02
@confunguido confunguido merged commit f95e3a0 into wtk_pop_loader Nov 25, 2024
3 checks passed
@ChiragKumar9 ChiragKumar9 deleted the gce_pop_loader_fix_tract branch November 26, 2024 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants