+
+
+
+
+
+
+
+Getting stated with EpiAware
This tutorial introduces the basic functionality of EpiAware
. EpiAware
is a package for making inferences on epidemiological case/determined infection data using a model-based approach.
EpiAware
models
The models we consider are discrete-time $t = 1,\dots, T$ with a latent random process, $Z_t$ generating stochasticity in the number of new infections $I_t$ at each time step. Observations are treated as downstream random variables determined by the actual infections and a model of infection to observation delay.
Mathematical definition
$$\begin{align}
+Z_\cdot &\sim \mathcal{P}(\mathbb{R}^T) | \theta_Z, \\
+I_0 &\sim f_0(\theta_I), \\
+I_t &\sim g_I(\{I_s, Z_s\}_{s < t}, \theta_{I}), \\
+y_t &\sim f_O(\{I_s\}_{s \leq t}, \theta_{O}).
+\end{align}$$
Where, $\mathcal{P}(\mathbb{R}^T) | \theta_Z$ is a parametric process on $\mathbb{R}^T$. $f_0$ and $f_O$ are parametric distributions on, respectively, the initial number of infections and the observed case data conditional on underlying infections. $g_I$ is distribution of new infections conditional on infections and latent process in the past. Note that we assume that new infections are conditional on the strict past, whereas new observations can depend on infections on the same time step.
Code structure outline
An EpiAware
model in code is created from three modular components:
A LatentModel
: This defines the model for $Z_\cdot$. This chooses $\mathcal{P}(\mathbb{R}^T) | \theta_Z$.
An EpiModel
: This defines a generative process for infections conditional on the latent process. This chooses $f_0(\theta_I)$, and $g_I(\{I_s, Z_s\}_{s < t}, \theta_{I})$.
An ObservationModel
: This defines the observation model. This chooses $f_O({I_s}_{s \leq t}, \theta_{O})$
Reproductive number
EpiAware
models do not need to specify a time-varying reproductive number $\mathcal{R}_t$ to generate $I_\cdot$, however, this is often a quantity of interest. When not directly used we will typically back-calculate $\mathcal{R}_t$ from the generated infections:
$$\mathcal{R}_t = {I_t \over \sum_{s \geq 1} g_s I_{t-s} }.$$
Where $g_s$ is a discrete generation interval. For this reason, even when not using a reproductive number approach directly, we ask for a generation interval.
+
+begin
+ using EpiAware
+ using Turing
+ using Distributions
+ using StatsPlots
+ using Random
+ using DynamicPPL
+ using Statistics
+ using DataFramesMeta
+ using LinearAlgebra
+end
+
+
+
+Random walk LatentModel
As an example, we choose the latent process as a random walk with parameters $\theta_Z$:
Conditional on the parameters the random walk is then generated by white noise:
$$\begin{align}
+Z_t &= Z_0 + \sigma_{RW} \sum_{t= 1}^T \epsilon_t, \\
+\epsilon_t &\sim \mathcal{N}(0,1).
+\end{align}$$
In EpiAware
we provide a constructor for random walk latent models with priors for $\theta_Z$. We choose priors,
$$\begin{align}
+Z_0 &\sim \mathcal{N}(0,1),\\
+\sigma^2_Z &\sim \text{HalfNormal}(0.01).
+\end{align}$$
+
+rwp = EpiAware.RandomWalk(Normal(),
+ truncated(Normal(0.0, 0.02), 0.0, Inf))
+RandomWalk{Normal{Float64}, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}(Distributions.Normal{Float64}(μ=0.0, σ=1.0), Truncated(Distributions.Normal{Float64}(μ=0.0, σ=0.02); lower=0.0, upper=Inf))
+
+
+Direct infection EpiModel
This is a simple model where the unobserved log-infections are directly generated by the latent process $Z$.
$$\log I_t = \log I_0 + Z_t.$$
As discussed above, we still ask for a defined generation interval, which can be used to calculate $\mathcal{R}_t$.
+
+truth_GI = Gamma(2, 5)
+Distributions.Gamma{Float64}(α=2.0, θ=5.0)
+
+
+The EpiData
constructor performs double interval censoring to convert our continuous estimate of the generation interval into a discretized version. We also implement right truncation using the keyword D_gen
.
+
+model_data = EpiData(truth_GI, D_gen = 10.0)
+EpiData{Float64, typeof(exp)}([0.056722565637507084, 0.0944813112842689, 0.11668723746973605, 0.12773814268679387, 0.130948497124998, 0.1287976856448402, 0.12312384970427602, 0.11527489380975135, 0.10622581663782862], 9, exp)
+
+
+We can supply a prior for the initial log_infections.
+
+log_I0_prior = Normal(log(100.0), 1.0)
+Distributions.Normal{Float64}(μ=4.605170185988092, σ=1.0)
+
+
+And construct the EpiModel
.
+
+epi_model = DirectInfections(model_data, log_I0_prior)
+DirectInfections{Normal{Float64}}(EpiData{Float64, typeof(exp)}([0.056722565637507084, 0.0944813112842689, 0.11668723746973605, 0.12773814268679387, 0.130948497124998, 0.1287976856448402, 0.12312384970427602, 0.11527489380975135, 0.10622581663782862], 9, exp), Distributions.Normal{Float64}(μ=4.605170185988092, σ=1.0))
+
+
+Delayed Observations ObservationModel
The observation model is a negative binomial distribution with mean μ
and cluster factor 1 / r
. Delays are implemented as the action of a sparse kernel on the infections $I(t)$.
$$\begin{align}
+y_t &\sim \text{NegBinomial}(\mu = \sum_{s\geq 0} K[t, t-s] I(s), r), \\
+1 / r &\sim \text{Gamma}(3, 0.05/3).
+\end{align}$$
+
+
+We also set up the inference to occur over 100 days.
+
+time_horizon = 100
+100
+
+
+We choose a simple observation model where infections are observed 0, 1, 2, 3 days later with equal probability.
+
+obs_model = EpiAware.DelayObservations(
+ fill(0.25, 4),
+ time_horizon,
+ truncated(Gamma(5, 0.05 / 5), 1e-3, 1.0)
+)
+DelayObservations{Float64, Truncated{Gamma{Float64}, Continuous, Float64, Float64, Float64}}(sparse(Integer[1, 2, 3, 4, 2, 3, 4, 5, 3, 4 … 97, 98, 99, 100, 98, 99, 100, 99, 100, 100], Integer[1, 1, 1, 1, 2, 2, 2, 2, 3, 3 … 97, 97, 97, 97, 98, 98, 98, 99, 99, 100], [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25 … 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25], 100, 100), Truncated(Distributions.Gamma{Float64}(α=5.0, θ=0.01); lower=0.001, upper=1.0))
+
+
+Generate cases from the EpiAware
model
Having chosen an EpiModel
, LatentModel
and ObservationModel
, we can implement the model as a Turing
model using make_epi_aware
.
By giving missing
to the first argument, we indicate that case data will be generated from the model rather than treated as fixed.
+
+full_epi_aware_mdl = make_epi_aware(missing, time_horizon;
+ epi_model = epi_model,
+ latent_model = rwp,
+ observation_model = obs_model)
+Model{typeof(make_epi_aware), (:y_t, :time_steps), (:epi_model, :latent_model, :observation_model, :pos_shift), (:y_t,), Tuple{Missing, Int64}, Tuple{DirectInfections{Normal{Float64}}, RandomWalk{Normal{Float64}, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}, DelayObservations{Float64, Truncated{Gamma{Float64}, Continuous, Float64, Float64, Float64}}, Float64}, DefaultContext}(EpiAware.make_epi_aware, (y_t = missing, time_steps = 100), (epi_model = DirectInfections{Normal{Float64}}(EpiData{Float64, typeof(exp)}([0.056722565637507084, 0.0944813112842689, 0.11668723746973605, 0.12773814268679387, 0.130948497124998, 0.1287976856448402, 0.12312384970427602, 0.11527489380975135, 0.10622581663782862], 9, exp), Distributions.Normal{Float64}(μ=4.605170185988092, σ=1.0)), latent_model = RandomWalk{Normal{Float64}, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}(Distributions.Normal{Float64}(μ=0.0, σ=1.0), Truncated(Distributions.Normal{Float64}(μ=0.0, σ=0.02); lower=0.0, upper=Inf)), observation_model = DelayObservations{Float64, Truncated{Gamma{Float64}, Continuous, Float64, Float64, Float64}}(sparse(Integer[1, 2, 3, 4, 2, 3, 4, 5, 3, 4 … 97, 98, 99, 100, 98, 99, 100, 99, 100, 100], Integer[1, 1, 1, 1, 2, 2, 2, 2, 3, 3 … 97, 97, 97, 97, 98, 98, 98, 99, 99, 100], [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25 … 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25], 100, 100), Truncated(Distributions.Gamma{Float64}(α=5.0, θ=0.01); lower=0.001, upper=1.0)), pos_shift = 1.0e-6), DefaultContext())
+
+
+We choose some fixed parameters:
Initial incidence is 100.
In the direct infection model, the initial incidence and in the initial value of the random walk form a non-identifiable pair. Therefore, we fix $Z_0 = 0$.
+
+fixed_parameters = (rw_init = 0.0, init_incidence = log(100.0))
+(rw_init = 0.0, init_incidence = 4.605170185988092)
+
+
+We fix these parameters using fix
, and generate a random epidemic.
+
+cond_generative_model = fix(full_epi_aware_mdl, fixed_parameters)
+Model{typeof(make_epi_aware), (:y_t, :time_steps), (:epi_model, :latent_model, :observation_model, :pos_shift), (:y_t,), Tuple{Missing, Int64}, Tuple{DirectInfections{Normal{Float64}}, RandomWalk{Normal{Float64}, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}, DelayObservations{Float64, Truncated{Gamma{Float64}, Continuous, Float64, Float64, Float64}}, Float64}, DynamicPPL.FixedContext{@NamedTuple{rw_init::Float64, init_incidence::Float64}, DefaultContext}}(EpiAware.make_epi_aware, (y_t = missing, time_steps = 100), (epi_model = DirectInfections{Normal{Float64}}(EpiData{Float64, typeof(exp)}([0.056722565637507084, 0.0944813112842689, 0.11668723746973605, 0.12773814268679387, 0.130948497124998, 0.1287976856448402, 0.12312384970427602, 0.11527489380975135, 0.10622581663782862], 9, exp), Distributions.Normal{Float64}(μ=4.605170185988092, σ=1.0)), latent_model = RandomWalk{Normal{Float64}, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}(Distributions.Normal{Float64}(μ=0.0, σ=1.0), Truncated(Distributions.Normal{Float64}(μ=0.0, σ=0.02); lower=0.0, upper=Inf)), observation_model = DelayObservations{Float64, Truncated{Gamma{Float64}, Continuous, Float64, Float64, Float64}}(sparse(Integer[1, 2, 3, 4, 2, 3, 4, 5, 3, 4 … 97, 98, 99, 100, 98, 99, 100, 99, 100, 100], Integer[1, 1, 1, 1, 2, 2, 2, 2, 3, 3 … 97, 97, 97, 97, 98, 98, 98, 99, 99, 100], [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25 … 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25], 100, 100), Truncated(Distributions.Gamma{Float64}(α=5.0, θ=0.01); lower=0.001, upper=1.0)), pos_shift = 1.0e-6), FixedContext((rw_init = 0.0, init_incidence = 4.605170185988092), DynamicPPL.DefaultContext()))
+
+random_epidemic = rand(cond_generative_model)
+(ϵ_t = [-0.5583184135402424, -0.11335818151049779, -0.41840610994469457, -1.547563151811605, 0.6999363447850959, -0.3056255658036757, -1.535672236352086, 1.774399549687218, -1.4892356494555308, -1.5762432361371324 … 0.8092430706433104, 0.25054983992169017, -0.22299867432234016, 0.9535550868628223, 0.6972338651279294, -0.8825600469470198, 1.8024873865011717, -1.60481179576158, 0.3586762822392592, -0.3759702879979565], σ²_RW = 0.03212848549550727, neg_bin_cluster_factor = 0.07280717990510553, y_t = [26, 49, 90, 131, 78, 80, 74, 61, 47, 71 … 247, 188, 251, 148, 147, 181, 271, 194, 345, 258])
+
+true_infections = generated_quantities(cond_generative_model, random_epidemic).I_t
+100-element Vector{Float64}:
+ 90.47692493358163
+ 88.65709363937565
+ 82.25129849708927
+ 62.32654682907016
+ 70.65769231857523
+ 66.89106298063069
+ 50.795360380484595
+ ⋮
+ 313.83867330836426
+ 267.91916167543263
+ 370.0990677393993
+ 277.5823105609579
+ 296.01441408488313
+ 276.7231792195055
+
+
+
+
+
+
Fixing $Z_0 = 0$ for the random walk was based on inference principles; in this model $Z_0$ and $\log I_0$ are non-identifiable.
However, we now treat the generated data as truth_data
and make inference without fixing any other parameters.
We do the inference by MCMC/NUTS using the Turing
NUTS sampler with default warm-up steps.
+
+truth_data = random_epidemic.y_t
+100-element Vector{Int64}:
+ 26
+ 49
+ 90
+ 131
+ 78
+ 80
+ 74
+ ⋮
+ 147
+ 181
+ 271
+ 194
+ 345
+ 258
+
+inference_mdl = fix(
+ make_epi_aware(truth_data, time_horizon; epi_model = epi_model,
+ latent_model = rwp, observation_model = obs_model),
+ (rw_init = 0.0,)
+)
+Model{typeof(make_epi_aware), (:y_t, :time_steps), (:epi_model, :latent_model, :observation_model, :pos_shift), (), Tuple{Vector{Int64}, Int64}, Tuple{DirectInfections{Normal{Float64}}, RandomWalk{Normal{Float64}, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}, DelayObservations{Float64, Truncated{Gamma{Float64}, Continuous, Float64, Float64, Float64}}, Float64}, DynamicPPL.FixedContext{@NamedTuple{rw_init::Float64}, DefaultContext}}(EpiAware.make_epi_aware, (y_t = [26, 49, 90, 131, 78, 80, 74, 61, 47, 71 … 247, 188, 251, 148, 147, 181, 271, 194, 345, 258], time_steps = 100), (epi_model = DirectInfections{Normal{Float64}}(EpiData{Float64, typeof(exp)}([0.056722565637507084, 0.0944813112842689, 0.11668723746973605, 0.12773814268679387, 0.130948497124998, 0.1287976856448402, 0.12312384970427602, 0.11527489380975135, 0.10622581663782862], 9, exp), Distributions.Normal{Float64}(μ=4.605170185988092, σ=1.0)), latent_model = RandomWalk{Normal{Float64}, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}(Distributions.Normal{Float64}(μ=0.0, σ=1.0), Truncated(Distributions.Normal{Float64}(μ=0.0, σ=0.02); lower=0.0, upper=Inf)), observation_model = DelayObservations{Float64, Truncated{Gamma{Float64}, Continuous, Float64, Float64, Float64}}(sparse(Integer[1, 2, 3, 4, 2, 3, 4, 5, 3, 4 … 97, 98, 99, 100, 98, 99, 100, 99, 100, 100], Integer[1, 1, 1, 1, 2, 2, 2, 2, 3, 3 … 97, 97, 97, 97, 98, 98, 98, 99, 99, 100], [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25 … 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25], 100, 100), Truncated(Distributions.Gamma{Float64}(α=5.0, θ=0.01); lower=0.001, upper=1.0)), pos_shift = 1.0e-6), FixedContext((rw_init = 0.0,), DynamicPPL.DefaultContext()))
+
+chn = sample(inference_mdl,
+ NUTS(; adtype = AutoReverseDiff(true)),
+ MCMCThreads(),
+ 250,
+ 4;
+ drop_warmup = true)
+ | iteration | chain | ϵ_t[1] | ϵ_t[2] | ϵ_t[3] | ϵ_t[4] | ϵ_t[5] | ϵ_t[6] | ... |
---|
1 | 126 | 1 | 0.803301 | 0.450201 | -0.551514 | -0.299899 | -0.808481 | -1.17731 | |
2 | 127 | 1 | 0.811487 | 0.27051 | 1.01452 | -0.280454 | -0.314382 | -0.820463 | |
3 | 128 | 1 | 0.450336 | 0.24293 | -0.782281 | 0.0228791 | -1.24283 | -0.952719 | |
4 | 129 | 1 | -1.10222 | -0.7561 | -0.41742 | -0.629301 | 0.16572 | -0.905448 | |
5 | 130 | 1 | -0.85121 | -1.2367 | -1.08595 | -0.842347 | -0.0570876 | -0.646837 | |
6 | 131 | 1 | 0.247534 | -0.929693 | -1.084 | 0.697031 | -0.913332 | -0.967544 | |
7 | 132 | 1 | -0.66134 | -0.392204 | -0.719505 | 0.662858 | -1.91246 | 0.557281 | |
8 | 133 | 1 | -0.28067 | -0.332762 | -1.61992 | -0.134065 | -1.42008 | 1.68518 | |
9 | 134 | 1 | -0.444202 | 0.574384 | -0.304614 | -0.919158 | -1.92502 | 0.511353 | |
10 | 135 | 1 | -0.459689 | 0.712142 | 0.000772433 | -1.11338 | -2.41843 | 0.703778 | |
... |
+
+
+Predictive plotting
We can spaghetti plot generated case data from the version of the model which hasn't conditioned on case data using posterior parameters inferred from the version conditioned on observed data. This is known as posterior predictive checking, and is a useful diagnostic tool for Bayesian inference (see here).
Because we are using synthetic data we can also plot the model predictions for the unobserved infections and check that (at least in this example) we were able to capture some unobserved/latent variables in the process accurate.
+
+let
+ post_check_mdl = fix(full_epi_aware_mdl, (rw_init = 0.0,))
+ post_check_y_t = mapreduce(hcat, generated_quantities(post_check_mdl, chn)) do gen
+ gen.generated_y_t
+ end
+
+ predicted_I_t = mapreduce(hcat, generated_quantities(inference_mdl, chn)) do gen
+ gen.I_t
+ end
+
+ p1 = plot(post_check_y_t, c = :grey, alpha = 0.05, lab = "")
+ scatter!(p1, truth_data,
+ lab = "Observed cases",
+ xlabel = "Time",
+ ylabel = "Cases",
+ title = "Post. predictive checking: cases",
+ ylims = (-0.5, maximum(truth_data) * 1.5),
+ c = :green)
+
+ p2 = plot(predicted_I_t, c = :grey, alpha = 0.05, lab = "")
+ scatter!(p2, true_infections,
+ lab = "Actual infections",
+ xlabel = "Time",
+ ylabel = "Unobserved Infections",
+ title = "Post. predictions: infections",
+ ylims = (-0.5, maximum(true_infections) * 1.5),
+ c = :red)
+
+ plot(p1, p2,
+ layout = (1, 2),
+ size = (700, 400))
+end
+
+
+
+As well as checking the posterior predictions for latent infections, we can also check how well inference recovered unknown parameters, such as the random walk variance or the cluster factor of the negative binomial observations.
+
+let
+ parameters_to_plot = (:σ²_RW, :neg_bin_cluster_factor)
+
+ plts = map(parameters_to_plot) do name
+ var_samples = chn[name] |> vec
+ histogram(var_samples,
+ bins = 50,
+ norm = :pdf,
+ lw = 0,
+ fillalpha = 0.5,
+ lab = "MCMC")
+ vline!([getfield(random_epidemic, name)], lab = "True value")
+ title!(string(name))
+ end
+ plot(plts..., layout = (2, 1))
+end
+
+
+
+
As mentioned at the top, we don't directly use the concept of reproductive numbers in this note. However, we can back-calculate the implied $\mathcal{R}(t)$ values, conditional on the specified generation interval being correct.
Here we spaghetti plot posterior sampled time-varying reproductive numbers against the actual.
+
+let
+ n = epi_model.data.len_gen_int
+ Rt_denom = [dot(reverse(epi_model.data.gen_int), true_infections[(t - n):(t - 1)])
+ for t in (n + 1):length(true_infections)]
+ true_Rt = true_infections[(n + 1):end] ./ Rt_denom
+
+ predicted_Rt = mapreduce(hcat, generated_quantities(inference_mdl, chn)) do gen
+ _It = gen.I_t
+ _Rt_denom = [dot(reverse(epi_model.data.gen_int), _It[(t - n):(t - 1)])
+ for t in (n + 1):length(_It)]
+ Rt = _It[(n + 1):end] ./ _Rt_denom
+ end
+
+ plt = plot((n + 1):time_horizon, predicted_Rt, c = :grey, alpha = 0.05, lab = "")
+ plot!(plt, (n + 1):time_horizon, true_Rt,
+ lab = "true Rt",
+ xlabel = "Time",
+ ylabel = "Rt",
+ title = "Post. predictions: reproductive number",
+ c = :red,
+ lw = 2)
+end
+
+
+