-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gce pop loader fix tract #9
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,112 +1,110 @@ | ||
use crate::parameters_loader::Parameters; | ||
use ixa::{ | ||
context::Context, | ||
define_derived_property, define_person_property, define_person_property_with_default, | ||
define_person_property, define_person_property_with_default, | ||
error::IxaError, | ||
global_properties::ContextGlobalPropertiesExt, | ||
people::{ContextPeopleExt, PersonId}, | ||
}; | ||
|
||
use serde::Deserialize; | ||
use std::path::PathBuf; | ||
|
||
#[derive(Deserialize, Debug)] | ||
#[allow(non_snake_case)] | ||
pub struct PeopleRecord { | ||
pub struct PeopleRecord<'a> { | ||
age: u8, | ||
homeId: usize, | ||
homeId: &'a [u8], | ||
} | ||
|
||
define_person_property!(Age, u8); | ||
define_person_property!(HomeId, usize); | ||
define_person_property_with_default!(Alive, bool, true); | ||
|
||
#[allow(clippy::cast_possible_truncation)] | ||
#[allow(clippy::cast_sign_loss)] | ||
const CENSUS_MAX: usize = 1e15 as usize; | ||
|
||
#[allow(clippy::cast_possible_truncation)] | ||
#[allow(clippy::cast_sign_loss)] | ||
const CENSUS_MIN: usize = 1e14 as usize; | ||
|
||
define_derived_property!(CensusTract, usize, [HomeId], |home_id| { | ||
if (CENSUS_MIN..CENSUS_MAX).contains(&home_id) { | ||
home_id / 10_000 | ||
} else { | ||
0 //Err(IxaError::IxaError(String::from("Census tract invalid from homeId"))) | ||
} | ||
}); | ||
define_person_property!(CensusTract, usize); | ||
|
||
fn create_person_from_record( | ||
context: &mut Context, | ||
person_record: &PeopleRecord, | ||
) -> Result<PersonId, IxaError> { | ||
let person_id = | ||
context.add_person(((Age, person_record.age), (HomeId, person_record.homeId)))?; | ||
let tract: usize = String::from_utf8(person_record.homeId[..11].to_owned()) | ||
.expect("Home id should have 11 digits for tract + home id") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we should. To clarify, you mean ixa and not this repo. Right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yessir! |
||
.parse() | ||
.expect("Could not parse census tract"); | ||
let home_id: usize = String::from_utf8(person_record.homeId.to_owned()) | ||
.expect("Could not read home id") | ||
.parse() | ||
.expect("Could not read home id"); | ||
|
||
let person_id = context.add_person(( | ||
(Age, person_record.age), | ||
(HomeId, home_id), | ||
(CensusTract, tract), | ||
))?; | ||
Ok(person_id) | ||
} | ||
|
||
fn load_synth_population(context: &mut Context, synth_input_file: PathBuf) -> Result<(), IxaError> { | ||
let mut reader = | ||
csv::Reader::from_path(synth_input_file).expect("Failed to open file. No headers found."); | ||
let mut raw_record = csv::ByteRecord::new(); | ||
let headers = reader.byte_headers().unwrap().clone(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should at least expect with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't error propagate because IxaError doesn't have a way to convert from csv to IxaError. I was wondering whether we should do that here or in Ixa. But didn't think was appropriate for this skeleton. |
||
|
||
while reader.read_byte_record(&mut raw_record).unwrap() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment about error propagation: https://docs.rs/csv/latest/csv/struct.Reader.html#method.read_byte_record. |
||
let record: PeopleRecord = raw_record | ||
.deserialize(Some(&headers)) | ||
.expect("Failed to parse record."); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, would also error propagate here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment about IxaError conversion from csv. Do you have a suggestion of how to deal with that here without incorporating it on ixa? As you said above, I think we should include csv and utf8 errors on ixa but don't know how to deal with that as right now. |
||
create_person_from_record(context, &record)?; | ||
} | ||
Ok(()) | ||
} | ||
|
||
pub fn init(context: &mut Context) -> Result<(), IxaError> { | ||
let parameters = context | ||
.get_global_property_value(Parameters) | ||
.unwrap() | ||
.clone(); | ||
|
||
let mut reader = | ||
csv::Reader::from_path(parameters.synth_population_file).expect("Failed to open file."); | ||
|
||
for result in reader.deserialize() { | ||
let record: PeopleRecord = result.expect("Failed to parse record."); | ||
create_person_from_record(context, &record)?; | ||
} | ||
load_synth_population(context, PathBuf::from(parameters.synth_population_file))?; | ||
context.index_property(Age); | ||
context.index_property(CensusTract); | ||
Ok(()) | ||
} | ||
|
||
#[cfg(test)] | ||
|
||
mod tests { | ||
mod test { | ||
use super::*; | ||
use ixa::{context::Context, people::ContextPeopleExt, random::ContextRandomExt}; | ||
use std::fs::File; | ||
use std::io::Write; | ||
use std::path::PathBuf; | ||
use tempfile::tempdir; | ||
|
||
#[test] | ||
#[allow(clippy::inconsistent_digit_grouping)] | ||
fn test_create_person_from_record() { | ||
fn check_synth_file_tract() { | ||
let mut context = Context::new(); | ||
context.init_random(0); | ||
let record = PeopleRecord { | ||
age: 42, | ||
homeId: 36_09_30_33102_0005, | ||
}; | ||
let person_id = create_person_from_record(&mut context, &record).unwrap(); | ||
assert_eq!(context.get_person_property(person_id, Age), 42); | ||
assert_eq!( | ||
context.get_person_property(person_id, HomeId), | ||
36_09_30_33102_0005 | ||
); | ||
assert!(context.get_person_property(person_id, Alive)); | ||
let temp_dir = tempdir().unwrap(); | ||
let path = PathBuf::from(&temp_dir.path()); | ||
let persisted_file = path.join("synth_pop_test.csv"); | ||
let mut file = File::create(persisted_file).unwrap(); | ||
file.write_all( | ||
b" | ||
age,homeId | ||
43,360930331020001 | ||
42,360930331020002", | ||
) | ||
.unwrap(); | ||
let synth_file = path.join("synth_pop_test.csv"); | ||
load_synth_population(&mut context, synth_file).unwrap(); | ||
let age: u8 = 43; | ||
let tract: usize = 36_093_033_102; | ||
|
||
assert_eq!( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about also testing whether there are only two people in the population as we expect and that the properties of the other person also match? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could. I think that would be good to merge WIll's branch to main. I wasn't trying to do that in this PR. |
||
context.get_person_property(person_id, CensusTract), | ||
36_09_30_33102 | ||
age, | ||
context.get_person_property(context.get_person_id(0), Age) | ||
); | ||
} | ||
|
||
#[test] | ||
#[allow(clippy::inconsistent_digit_grouping)] | ||
fn test_create_person_from_record_invalid_census_tract() { | ||
let mut context = Context::new(); | ||
context.init_random(0); | ||
let record = PeopleRecord { | ||
age: 42, | ||
homeId: 36_09_30_33102_0005_0005, | ||
}; | ||
let person_id = create_person_from_record(&mut context, &record).unwrap(); | ||
assert_eq!(context.get_person_property(person_id, Age), 42); | ||
assert_eq!( | ||
context.get_person_property(person_id, HomeId), | ||
36_09_30_33102_0005_0005 | ||
tract, | ||
context.get_person_property(context.get_person_id(0), CensusTract) | ||
); | ||
assert!(context.get_person_property(person_id, Alive)); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why remove the check that the person is alive? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I actually removed both tests so this test is only to check that I can extract the age and census tract, not the person creation. Also, this is a PR to will's branch, not to main. So, was expecting Will to continue on the tests. |
||
assert_eq!(context.get_person_property(person_id, CensusTract), 0); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand why tract should be an independent property? I am not fully convinced. I think Jason did it this way in rust-nucleus because he didn't have derived person properties.
If we choose to stick with this, I wonder if it's worth pulling the 11 out into a
config.json
or something like that so there isn't this hardcoding in case the synth pop ever changes?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought was to have something minimal in main without derived properties. My perception is that tract should be a group and not necessarily a person property. I suggest we let it be here and think about it as an issue since I think the 11 is very standard for fips codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this justification makes a ton of sense and sounds good. Look forward, I'd be curious what you think, then, about also having a more general group where people fit into households (a one-to-one group), and households fit into census tracts (a meta group that is also one-to-one)?