Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added random module #18

Merged
merged 15 commits into from
Aug 27, 2024
Merged

Added random module #18

merged 15 commits into from
Aug 27, 2024

Conversation

k88hudson-cfa
Copy link
Collaborator

This PR adds a module to support creating and using random generators. It's implemented as a trait on context, with a DataPlugin to store a base random seed and a hashmap of rng instances. The design supports any rng implementation that implements rand::SeedableRandom.

The basic idea is you initialize the utility with a base random seed when setting up a context:

context.set_base_random_seed(12345678);

In a module that needs an independent random number generator, you use a macro to define an independent type, and then call context.get_rng::<YourType>() to retrieve it. For example:

define_rng!(FooRng)

let mut rng = context.get_rng::<FooRng>();
let value = rng.next_u64();

Note that an rng needs a mutable reference because it has to store internal state about previous calls in order to be reproducible.

One difference from the eosim implementation here is that I destroy existing rngs if base_seed is changed rather than using a flag on the holder.

@k88hudson-cfa
Copy link
Collaborator Author

So I gave explicit initialization a try (implementation here if you're curious: k88hudson_rng_explcit_init), which ends up looking something like this:

In your init function of the module, you call create_rng:

init() {
  // ...
  context.create_rng::<FooRng>();
}

And then later you call get_rng as normal:

some_other_method() {
  let mut foo_rng = context.get_rng::<FooRng>();

  foo_rng.next_u64();
  // ...
  foo_rng.next_u64();
}

The thing I don't like about this is that if you want to reset the base random seed, you have to deal with reinitialization (like maybe iterating through existing ones and replacing them?), which seems kind of not great

@k88hudson-cfa
Copy link
Collaborator Author

k88hudson-cfa commented Aug 15, 2024

I spent a bunch of time but I couldn't figure out how to get this to work with interior mutability for the RngHolders which also having the whole HashMap in a RefCell, so I think I'm just gonna leave it as is for now

@k88hudson-cfa k88hudson-cfa requested a review from ekr-cfa August 15, 2024 21:46
@ekr-cfa
Copy link
Collaborator

ekr-cfa commented Aug 16, 2024

I spent a bunch of time but I couldn't figure out how to get this to work with interior mutability for the RngHolders which also having the whole HashMap in a RefCell, so I think I'm just gonna leave it as is for now

Should this say "without also"?

src/random.rs Outdated
struct $random_id {}

impl $crate::random::RngId for $random_id {
// TODO: This is hardcoded to StdRng; we should replace this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: This is hardcoded to StdRng; we should replace this
// TODO([email protected]): This is hardcoded to StdRng; we should replace this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/random.rs Outdated
}
pub use define_rng;

pub trait RngId: Any {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to implement Any

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some discussion, turns out problem here was about the lifetime of R (the type id) in get_rng<R: RngId> , which apparently is scoped to the function whereas the closure where it is used has a static lifetime. The right fix here was to change the signature to fn get_rng<R: RngId + 'static>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could get away with just RngId: 'static so that it has a TypeId, or move the static lifetime requirement into the get_rng / sample method signatures.

src/random.rs Show resolved Hide resolved
src/random.rs Outdated
}
);

#[allow(clippy::module_name_repetitions)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment that this is going to be a trait extension on Context.

I also wonder of we should have the convention be Context<Thingy> rather than <Thingy>Context. Or maybe ContextThingyExtn

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is conceptually like the eosim code, but we also talked about a version where you didn't get the RNG but just told it what distribution to draw from. @jasonasher did you ever prototype that.

src/random.rs Outdated
.expect("You must initialize the random number generator with a base seed");

let rng_holders = data_container.rng_holders.try_borrow_mut().unwrap();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line.

src/random.rs Outdated
let mut foo_rng = context.get_rng::<FooRng>();
assert_eq!(foo_rng.next_u64(), 5113542052170610017);
assert_eq!(foo_rng.next_u64(), 8640506012583485895);
assert_eq!(foo_rng.next_u64(), 16699691489468094833);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to hardcode these values? This will break if Rust changes the algorithm, which they say they might.

image

src/random.rs Outdated

#[test]
#[should_panic]
fn get_rng_one_ref_per_rng_id() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this test misnamed because the rule it is testing is that you can't have two references to any two Rngs, as you are getting BarRng.

src/random.rs Show resolved Hide resolved
src/random.rs Outdated

let mut foo_rng = context.get_rng::<FooRng>();
foo_rng.next_u64();
drop(foo_rng);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe comment this drop to explain why it's different from the previous test.

@k88hudson-cfa
Copy link
Collaborator Author

To summarize some discussion: one thing we should consider here is that we will mostly be using rngs in combination with some kind of distribution/sample function. These may be created in place by a module (like the Exp example below), and sometimes they might need to be stored as a global variable / in a data container (because they are pretty large / used in multiple places or whatever).

The design in this PR returns a RefMut you have to deal with that when you pass it to a sample function, which is maybe not ideal:

let mut foo_rng = context.get_rng::<FooRng>();
let dist = Exp::new(1 / bar).unwrap();
let value = dist.sample(&mut *foo_rng);

@jasonasher did some prototyping of an alternative design where instead of the context extension providing a get_rng method, it implements a sample that retrieves the rng internally:

let dist = context.get_data_container::<SamplerData>().unwrap();
let value = context.sample::<FooRng, usize>(|rng| dist.sample(rng));
// Aternatively, pass in the whole distribution
let value2 = context.sample_distr::<FooRng, usize>(dist);

I don't this hate honestly, if you're pretty much always using rngs this way

@jasonasher
Copy link
Collaborator

jasonasher commented Aug 18, 2024

In my experience the most common way we use random sampling is either with distributions like this or by grabbing random f64/f32s to make future decisions, such as setting individual propensities to engage in various behaviors. So having a general sample method and then sample_distr and sample_xyz for various primitive xyz would cover all of the common use cases. I'd be interested to know if @confunguido agrees.

@confunguido
Copy link
Collaborator

In my experience the most common way we use random sampling is either with distributions like this or by grabbing random f64/f32s to make future decisions, such as setting individual propensities to engage in various behaviors. So having a general sample method and then sample_distr and sample_xyz for various primitive xyz would cover all of the common use cases. I'd be interested to know if @confunguido agrees.

@jasonasher. I think I agree. Sampling from distribution, making decisions, and grabbing a random individual from a group. As a side note, I think it would be nice to have the ability to specify and sample from custom probability distributions.

@k88hudson-cfa k88hudson-cfa requested a review from ekr-cfa August 20, 2024 00:40
@k88hudson-cfa
Copy link
Collaborator Author

Alright, I think this is ready: updated this to expose sample and sample_distr – also added something to the macro to prevent name collisions.

@jasonasher
Copy link
Collaborator

jasonasher commented Aug 20, 2024

@jasonasher. I think I agree. Sampling from distribution, making decisions, and grabbing a random individual from a group. As a side note, I think it would be nice to have the ability to specify and sample from custom probability distributions.

@confunguido These would be supported right out of the box with sample_distr: rand_distr. And, by implementing the Distribution<T> trait we can support custom distributions.

@k88hudson-cfa
Copy link
Collaborator Author

So @jasonasher came up with a solution in https://github.com/CDCgov/ixa/tree/jma_rand_runtime_collision for runtime collision checking – the thing I don't like about it is that it's easy to forget to update the field that caches name collisions given the current architecture of data containers, so I think i'd still leave this as is (unless we really don't want to require paste)

@jasonasher
Copy link
Collaborator

I don't think requiring paste is much of an issue - I was more reacting to handling people potentially not using the macro. My instinct is we don't have much of an update problem here for the name field caching, as it's handled in one place and not obviously likely to change much in the future, but I could be missing something and defer to your and @ekr-cfa's judgment here.

@ekr-cfa
Copy link
Collaborator

ekr-cfa commented Aug 21, 2024

I would prefer a compile time check even if it's imperfect. I think if people color this far outside the lines, then they can suffer.

Copy link
Collaborator

@ekr-cfa ekr-cfa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/random.rs Outdated

let mut foo_rng = context.get_rng::<FooRng>();
foo_rng.next_u64();
// If you drop the first reference, you should be able to get a reference to a different rng
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If you drop the first reference, you should be able to get a reference to a different rng
// If you drop the first reference, you should be able to get another reference to an rng

The same or different, right?


#[test]
fn multiple_references_with_drop() {
let mut context = Context::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't seem to do what it says.

@ekr-cfa ekr-cfa merged commit eac036c into main Aug 27, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants