Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gce pop loader fix tract #9

Merged
merged 5 commits into from
Nov 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,5 @@ csv = "1.3.1"
ixa = { git = "https://github.com/cdcgov/ixa", version = "0.0.1" }
ixa-derive = { git = "https://github.com/cdcgov/ixa", version = "0.0.0" }
serde = "1.0.215"
serde_derive = "1.0.215"
tempfile = "3.14.0"
1 change: 0 additions & 1 deletion src/parameters_loader.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@ use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct ParametersValues {
pub population: usize,
pub max_time: f64,
pub seed: u64,
pub r_0: f64,
Expand Down
124 changes: 61 additions & 63 deletions src/population_loader.rs
Original file line number Diff line number Diff line change
@@ -1,112 +1,110 @@
use crate::parameters_loader::Parameters;
use ixa::{
context::Context,
define_derived_property, define_person_property, define_person_property_with_default,
define_person_property, define_person_property_with_default,
error::IxaError,
global_properties::ContextGlobalPropertiesExt,
people::{ContextPeopleExt, PersonId},
};

use serde::Deserialize;
use std::path::PathBuf;

#[derive(Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct PeopleRecord {
pub struct PeopleRecord<'a> {
age: u8,
homeId: usize,
homeId: &'a [u8],
}

define_person_property!(Age, u8);
define_person_property!(HomeId, usize);
define_person_property_with_default!(Alive, bool, true);

#[allow(clippy::cast_possible_truncation)]
#[allow(clippy::cast_sign_loss)]
const CENSUS_MAX: usize = 1e15 as usize;

#[allow(clippy::cast_possible_truncation)]
#[allow(clippy::cast_sign_loss)]
const CENSUS_MIN: usize = 1e14 as usize;

define_derived_property!(CensusTract, usize, [HomeId], |home_id| {
if (CENSUS_MIN..CENSUS_MAX).contains(&home_id) {
home_id / 10_000
} else {
0 //Err(IxaError::IxaError(String::from("Census tract invalid from homeId")))
}
});
define_person_property!(CensusTract, usize);

fn create_person_from_record(
context: &mut Context,
person_record: &PeopleRecord,
) -> Result<PersonId, IxaError> {
let person_id =
context.add_person(((Age, person_record.age), (HomeId, person_record.homeId)))?;
let tract: usize = String::from_utf8(person_record.homeId[..11].to_owned())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why tract should be an independent property? I am not fully convinced. I think Jason did it this way in rust-nucleus because he didn't have derived person properties.

If we choose to stick with this, I wonder if it's worth pulling the 11 out into a config.json or something like that so there isn't this hardcoding in case the synth pop ever changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was to have something minimal in main without derived properties. My perception is that tract should be a group and not necessarily a person property. I suggest we let it be here and think about it as an issue since I think the 11 is very standard for fips codes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this justification makes a ton of sense and sounds good. Look forward, I'd be curious what you think, then, about also having a more general group where people fit into households (a one-to-one group), and households fit into census tracts (a meta group that is also one-to-one)?

.expect("Home id should have 11 digits for tract + home id")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add Utf8Error to the list of potential IxaErrors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should. To clarify, you mean ixa and not this repo. Right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yessir!

.parse()
.expect("Could not parse census tract");
let home_id: usize = String::from_utf8(person_record.homeId.to_owned())
.expect("Could not read home id")
.parse()
.expect("Could not read home id");

let person_id = context.add_person((
(Age, person_record.age),
(HomeId, home_id),
(CensusTract, tract),
))?;
Ok(person_id)
}

fn load_synth_population(context: &mut Context, synth_input_file: PathBuf) -> Result<(), IxaError> {
let mut reader =
csv::Reader::from_path(synth_input_file).expect("Failed to open file. No headers found.");
let mut raw_record = csv::ByteRecord::new();
let headers = reader.byte_headers().unwrap().clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least expect with No headers found or something if we're not going to error propagate here. The documentation tends to error propagation here: https://docs.rs/csv/latest/csv/struct.Reader.html#method.byte_headers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't error propagate because IxaError doesn't have a way to convert from csv to IxaError. I was wondering whether we should do that here or in Ixa. But didn't think was appropriate for this skeleton.


while reader.read_byte_record(&mut raw_record).unwrap() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let record: PeopleRecord = raw_record
.deserialize(Some(&headers))
.expect("Failed to parse record.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, would also error propagate here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about IxaError conversion from csv. Do you have a suggestion of how to deal with that here without incorporating it on ixa? As you said above, I think we should include csv and utf8 errors on ixa but don't know how to deal with that as right now.

create_person_from_record(context, &record)?;
}
Ok(())
}

pub fn init(context: &mut Context) -> Result<(), IxaError> {
let parameters = context
.get_global_property_value(Parameters)
.unwrap()
.clone();

let mut reader =
csv::Reader::from_path(parameters.synth_population_file).expect("Failed to open file.");

for result in reader.deserialize() {
let record: PeopleRecord = result.expect("Failed to parse record.");
create_person_from_record(context, &record)?;
}
load_synth_population(context, PathBuf::from(parameters.synth_population_file))?;
context.index_property(Age);
context.index_property(CensusTract);
Ok(())
}

#[cfg(test)]

mod tests {
mod test {
use super::*;
use ixa::{context::Context, people::ContextPeopleExt, random::ContextRandomExt};
use std::fs::File;
use std::io::Write;
use std::path::PathBuf;
use tempfile::tempdir;

#[test]
#[allow(clippy::inconsistent_digit_grouping)]
fn test_create_person_from_record() {
fn check_synth_file_tract() {
let mut context = Context::new();
context.init_random(0);
let record = PeopleRecord {
age: 42,
homeId: 36_09_30_33102_0005,
};
let person_id = create_person_from_record(&mut context, &record).unwrap();
assert_eq!(context.get_person_property(person_id, Age), 42);
assert_eq!(
context.get_person_property(person_id, HomeId),
36_09_30_33102_0005
);
assert!(context.get_person_property(person_id, Alive));
let temp_dir = tempdir().unwrap();
let path = PathBuf::from(&temp_dir.path());
let persisted_file = path.join("synth_pop_test.csv");
let mut file = File::create(persisted_file).unwrap();
file.write_all(
b"
age,homeId
43,360930331020001
42,360930331020002",
)
.unwrap();
let synth_file = path.join("synth_pop_test.csv");
load_synth_population(&mut context, synth_file).unwrap();
let age: u8 = 43;
let tract: usize = 36_093_033_102;

assert_eq!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about also testing whether there are only two people in the population as we expect and that the properties of the other person also match?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. I think that would be good to merge WIll's branch to main. I wasn't trying to do that in this PR.

context.get_person_property(person_id, CensusTract),
36_09_30_33102
age,
context.get_person_property(context.get_person_id(0), Age)
);
}

#[test]
#[allow(clippy::inconsistent_digit_grouping)]
fn test_create_person_from_record_invalid_census_tract() {
let mut context = Context::new();
context.init_random(0);
let record = PeopleRecord {
age: 42,
homeId: 36_09_30_33102_0005_0005,
};
let person_id = create_person_from_record(&mut context, &record).unwrap();
assert_eq!(context.get_person_property(person_id, Age), 42);
assert_eq!(
context.get_person_property(person_id, HomeId),
36_09_30_33102_0005_0005
tract,
context.get_person_property(context.get_person_id(0), CensusTract)
);
assert!(context.get_person_property(person_id, Alive));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the check that the person is alive?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually removed both tests so this test is only to check that I can extract the age and census tract, not the person creation. Also, this is a PR to will's branch, not to main. So, was expecting Will to continue on the tests.

assert_eq!(context.get_person_property(person_id, CensusTract), 0);
}
}
Loading