Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report termination information #28

Merged
merged 7 commits into from
Dec 19, 2024
Merged

Report termination information #28

merged 7 commits into from
Dec 19, 2024

Conversation

afmagee42
Copy link
Contributor

@afmagee42 afmagee42 commented Dec 17, 2024

Revised per synchronous discussion, this PR now:

  • Tracks simulation termination
  • Excludes simulations from computation when they stop due to max_infections
    • It also checks for 0 successful simulations and throws an st.error instead of a runtime error.
  • Warns users if max_infections is being hit and requires them to acknowledge the potentially-fraught results.
  • Adds to the spin wheel text to say that simulations being very slow is probably bad news.

Known issues:

  • I'm pretty sure if you say "show me anyways" it re-runs the simulations. We could maybe wrap the entire display code into a function and decorate it with an st.fragment but since this is already ill-advised usage, that seems like overkill.

@afmagee42 afmagee42 requested a review from swo December 17, 2024 18:56
@afmagee42 afmagee42 linked an issue Dec 17, 2024 that may be closed by this pull request
Copy link
Contributor

@swo swo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's a good idea to indicate whether you terminated because of hitting max # of infections

But I think the stuff about "last complete generation" can be retrieved from the simulation results. I'd rather compute all the summary values (e.g., total # of infections, last complete generation, etc.) in the summary output widget/postprocessing.

We can incorporate the "hit max infections" flag into summary outputs. Like, we can define "successful control" as <max infections AND zero infections in CoCoCs.

Thoughts?

@afmagee42
Copy link
Contributor Author

But I think the stuff about "last complete generation" can be retrieved from the simulation results. I'd rather compute all the summary values (e.g., total # of infections, last complete generation, etc.) in the summary output widget/postprocessing.

I don't see how? Once the queue is gone, how do I know if a generation was in-progress or not?

Example:

set max_infections = 5
infection `0` draws infections at time 1, 2, 3
at time 1, new infection `1` draws times 1.1, 1.2, 1.3, 1.4
len(queue) > max_infections
stop

We never hit the infection at time 2, so no generations are complete (or, I guess technically generation 0 is), so Simulation.infections just contains ["0", "1"], and would appear to contain people from the first generation, but that generation is not complete.

We can incorporate the "hit max infections" flag into summary outputs. Like, we can define "successful control" as <max infections AND zero infections in CoCoCs.

I don't follow

@swo
Copy link
Contributor

swo commented Dec 17, 2024

I don't see how? Once the queue is gone, how do I know if a generation was in-progress or not?

I see your point: if infection N_max occurred in generation G, you don't know if the N+1-th would have been in G or G+1

But my hypothesis is that we don't need to know in precisely which ring the simulation finished for any of the relevant outputs? Like, I'm seeing hitting max # of infections as a sign that you're asking this widget to perform outside its specs

We can incorporate the "hit max infections" flag into summary outputs. Like, we can define "successful control" as <max infections AND zero infections in CoCoCs.

I don't follow

If we hit max # of infections, I'd like the main output to be: hey you either need to change the params or you can kiss control goodbye, probably

Like, maybe there's some world where you want some % of chains to go extinct, but you're also OK with some chains growing to >100, but that just doesn't feel epidemiologically relevant

I think the outputs I want are:

  • Did you achieve control? where "control" means you had fewer than the max # of infections AND no infected contacts-of-contacts-of-contacts
  • How many infections total were there? Show a histogram(?) truncated at the max # of infections

If we really do need more rigorous stats, I'd rather just demand that we push the max # of infections higher. Again, this is just a widget to give you an idea of outcomes.

@afmagee42
Copy link
Contributor Author

I agree that if you don't set max_infections to something ridiculous (like 5) then hitting it is probably a sign you've blown past control.

I suppose whether we discard those simulations or incorrectly use them, we're still stuck with the fact that conditional summaries are biased towards things looking rosier.

@afmagee42
Copy link
Contributor Author

In other words:

  • To properly compute Pr(containment | # infections < M) or Pr(# infections = m | # infections < M) we need this information that we're not tracking, but I believe Pr(containment | # infections < M) > Pr(containment) and E[# infections = m | # infections < M] < E[# infections = m].
  • If we take the event of maxing-out the infecteds to be non-containment
    • We might have a decent approximation to Pr(containment)
    • I don't know what the probability distribution on sizes is here, it's not a well-specified quantity

@swo
Copy link
Contributor

swo commented Dec 17, 2024

Right, I'm saying that the constant M should be large enough that Pr(containment | # infections >= M) is negligible, so that Pr(containment) ~= P(cont | #inf < M) * P(#inf < M).

But for now, let's just have a big warning sign that says "you hit max # of infections in x% of simulations; interpret results with care" and see if this is actually a problem before we try to repair the statistics

@afmagee42
Copy link
Contributor Author

afmagee42 commented Dec 17, 2024

This is going to bug me big time if we don't get it satisfactorily right-ish the first time.

But on reflection, I was being a bit pessimistic and I do think we can get something which is both interpretable and sensible if we:

Track early termination
Force max_infections to not be tiny (say, a minimum of 100)
Then we can

Make the "maxing out means failed containment" assumption to appropriately compute Pr(containment)

1 - Pr(hit max) - Pr(no containmnent | did not hit max)
Provide your suggested conditional summary of sizes, the probability distribution Pr(size | did not hit max) ≡ Pr(size | size < max), this is just the histogram of final sizes of all non-early-terminated simulations (including extinct ones)

(sorry, @swo tried to quote this comment but instead edited it and now I can't revert it)

@afmagee42
Copy link
Contributor Author

We can also add warnings about things blowing up impacting results, and a "if your simulations are taking too long, you've probably got an absurdly large $R_e$, maybe don't do that" warning too

@swo
Copy link
Contributor

swo commented Dec 18, 2024

  1. Force max_infections to not be tiny (say, a minimum of 100)

Yes, this was my thought; this is mostly an escape valve so that you're not sitting for a long time and waiting for a simulation to finish

But we should also maybe make this less possible via more sensible slider rangers

@afmagee42 afmagee42 force-pushed the afm-report-termination branch from 62b6290 to 117a0ad Compare December 18, 2024 19:22
@afmagee42 afmagee42 requested a review from swo December 18, 2024 20:29
Copy link
Contributor

@swo swo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; some things to maybe change downstream; I'll open issues

ringvax/__init__.py Show resolved Hide resolved
ringvax/__init__.py Show resolved Hide resolved
ringvax/app.py Outdated Show resolved Hide resolved
st.header(
f"Probability of control: {pr_control:.0%}",
help=f"The probability that there are no infections in the {format_control_gens(control_generations)}, or equivalently that the {format_control_gens(control_generations - 1)} do not produce any further infections.",
show = True if n_at_max == 0 else False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some way to refactor all this for mild cleanliness; I'll open a PR if I have something useful to say

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that logic is definitely not the prettiest, happy if there's a cleaner way to express "plot the results unless the user pushes the big red override button"

ringvax/summary.py Show resolved Hide resolved
@swo swo force-pushed the afm-report-termination branch from 6a6ba27 to ee038a4 Compare December 18, 2024 22:44
* add progress bar

* add progress text
@afmagee42
Copy link
Contributor Author

@swo I well and truly appreciate all the improvements but I will not approve any more PRs into this PR above #45. Anything else can be handled as a new PR into main later

swo added 2 commits December 19, 2024 10:05
* refactoring

* use polars schema

* refactor lists
@afmagee42 afmagee42 merged commit e82f2e5 into main Dec 19, 2024
2 checks passed
@afmagee42 afmagee42 deleted the afm-report-termination branch December 19, 2024 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report information on simulation termination
2 participants