diff --git a/NEWS.md b/NEWS.md
index c0ebcea6..eec35d5c 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,11 +1,11 @@
 # wwinference 0.1.0.99 (dev)
 
 ## User-visible changes
-
-- `wwinference` now checks whether `site_pop` is unique per site (see issue [#223](https://github.com/CDCgov/ww-inference-model/issues/226) and reported by [@akeyel](https://github.com/akeyel)).
+- Add wastewater data extending into the forecast period (`ww_data_eval`) to the output of `generate_simulated_data()` and as package data. Also add subpopulation-level
+hospital admissions (`subpop_hosp_data`, `subpop_hosp_data_eval`) to the function output and package data. ([#184](https://github.com/CDCgov/ww-inference-model/issues/184))
+- `wwinference` now checks whether `site_pop` is fixed per site (see issue [#223](https://github.com/CDCgov/ww-inference-model/issues/226) reported by [@akeyel](https://github.com/akeyel)).
 
 ## Internal changes
-
 - Updated the workflow for posting the pages artifact to PRs (issue [#229](https://github.com/CDCgov/ww-inference-model/issues/229)).
 - Modify `plot_forecasted_counts()` so that it does not require an evaluation dataset ([#218](https://github.com/CDCgov/ww-inference-model/pull/218))
 
diff --git a/R/data.R b/R/data.R
index b196dcf6..3baeeff7 100644
--- a/R/data.R
+++ b/R/data.R
@@ -39,6 +39,47 @@
 #' @source vignette_data.R
 "ww_data"
 
+#' Example evaluation wastewater dataset.
+#'
+#' A dataset containing the simulated retrospective wastewater concentrations
+#' (labeled here as `log_genome_copies_per_ml_eval`) by sample collection date
+#' (`date`), the site where the sample was collected (`site`) and the lab
+#' where the samples were processed (`lab`). Additional columns that are
+#' required attributes needed for the model are the limit of detection for
+#' that lab on each day (labeled here as `log_lod`) and the population size of
+#' the wastewater catchment area represented by the wastewater concentrations
+#' in each `site`.
+#'
+#' This data is generated via the default values in the
+#'  `generate_simulated_data()` function. They represent the bare minimum
+#'  required fields needed to pass to the model, and we recommend that users
+#'  try to format their own data to match this format.
+#'
+#' The variables are as follows:
+#'
+#' @format ## ww_data_eval
+#' A tibble with 126 rows and 6 columns
+#' \describe{
+#'   \item{date}{Sample collection date, formatted in ISO8601 standards as
+#'   YYYY-MM-DD}
+#'   \item{site}{The wastewater treatment plant where the sample was collected}
+#'   \item{lab}{The lab where the sample was processed}
+#'   \item{log_genome_copies_per_ml_eval}{The natural log of the wastewater
+#'   concentration measured on the date specified, collected in the site
+#'   specified, and processed in the lab specified. The package expects
+#'   this quantity in units of log estimated genome copies per mL.}
+#'   \item{log_lod}{The log of the limit of detection in the site and lab on a
+#'   particular day of the quantification device (e.g. PCR).  This should be in
+#'    units of log estimated genome copies per mL.}
+#'   \item{site_pop}{The population size of the wastewater catchment area
+#'   represented by the site variable}
+#'   \item{location}{ A string indicating the location that all of the
+#'   data is coming from. This is not a necessary column, but instead is
+#'   included to more realistically mirror a typical workflow}
+#'   }
+#' @source vignette_data.R
+"ww_data_eval"
+
 
 
 
@@ -57,9 +98,9 @@
 #'  to match this format.
 #'
 #' This data is generated via the default values in the
-#'  `generate_simulated_data()` function. They represent the bare minumum
+#'  `generate_simulated_data()` function. They represent the bare minimum
 #'  required fields needed to pass to the model, and we recommend that users
-#'  try to format their own data to match this formate.
+#'  try to format their own data to match this format.
 #'
 #' The variables are as follows:
 #' \describe{
@@ -132,6 +173,77 @@
 #' @source vignette_data.R
 "hosp_data_eval"
 
+
+
+
+#' Example subpopulation level hospital admissions dataset
+#'
+#'  A dataset containing the simulated daily hospital admissions
+#' (labeled here as `daily_hosp_admits`) by date of admission (`date`) in
+#'  each subpopulation.
+#'  Additional columns that are required are the population size of the
+#'  population contributing to the hospital admissions. In this instance,
+#'  the subpopulations here are each of the wastewater catchment areas plus
+#'  an additional subpopulation for the portion of the population not captured
+#'  by wastewater surveillance. The data generated are daily hospital
+#'  admissions but they could be any other epidemiological count dataset e.g.
+#'  cases. This data should only contain hospital admissions that would have
+#'  been available as of the date that the forecast was made.
+#'
+#' This data is generated via the default values in the
+#'  `generate_simulated_data()` function.
+#'
+#' The variables are as follows:
+#' \describe{
+#'   \item{date}{Date the hospital admissions occurred, formatted in ISO8601
+#'   standards as YYYY-MM-DD}
+#'   \item{subpop_name}{A string indicating the subpopulation the hospital
+#'   admissions correspond to. This is either a wastewater site, or the
+#'   remainder of the population}
+#'   \item{daily_hosp_admits}{The number of individuals admitted to the
+#'   hospital on that date, available as of the forecast date}
+#'   \item{subpop_pop}{The number of people contributing to the daily hospital
+#'   admissions in each subpopulation}
+#'   }
+#' @source vignette_data.R
+"subpop_hosp_data"
+
+
+#' Example subpopulation level retrospective hospital admissions dataset
+#'
+#'  A dataset containing the simulated daily hospital admissions (labeled
+#'  here as `daily_hosp_admits_for_eval`) by date of admission (`date`) in
+#'  each subpopulation observed retrospectively.
+#'  Additional columns that are required are the population size of the
+#'  population contributing to the hospital admissions. In this instance,
+#'  the subpopulations here are each of the wastewater catchment areas plus
+#'  an additional subpopulation for the portion of the population not captured
+#'  by wastewater surveillance. The data generated are daily hospital
+#'  admissions but they could be any other epidemiological count dataset e.g.
+#'  cases. This data should contain hospital admissions retrospectively beyond
+#'  the forecast date in order to evaluate the forecasts.
+#'
+#'  This data is generated via the default values in the
+#'  `generate_simulated_data()` function. They represent the bare minimum
+#'  required fields needed to pass to the model, and we recommend that users
+#'  try to format their own data to match this format.
+#'
+#' The variables are as follows:
+#' \describe{
+#'   \item{date}{Date the hospital admissions occurred, formatted in ISO8601
+#'   standards as YYYY-MM-DD}
+#'   \item{subpop_name}{A string indicating the subpopulation the hospital
+#'   admissions correspond to. This is either a wastewater site, or the
+#'   remainder of the population}
+#'   \item{daily_hosp_admits_for_eval}{The number of individuals admitted to the
+#'   hospital on that date, observed retrospectively for forecast evaluation}
+#'   \item{subpop_pop}{The number of people contributing to the daily hospital
+#'   admissions in each subpopulation}
+#'   }
+#' @source vignette_data.R
+"subpop_hosp_data_eval"
+
+
 #' COVID-19 post-Omicron generation interval probability mass function
 #'
 #' \describe{
diff --git a/R/figures.R b/R/figures.R
index ce40b45e..03ee740d 100644
--- a/R/figures.R
+++ b/R/figures.R
@@ -57,7 +57,6 @@ get_plot_forecasted_counts <- function(draws,
       aes(x = .data$date, y = .data$pred_value, group = .data$draw),
       color = "red4", alpha = 0.1, linewidth = 0.2
     ) +
-    geom_point(aes(x = .data$date, y = .data$observed_value)) +
     geom_vline(
       xintercept = lubridate::ymd(forecast_date),
       linetype = "dashed"
@@ -91,7 +90,11 @@ get_plot_forecasted_counts <- function(draws,
         shape = 21, color = "black", fill = "white"
       )
   }
-  return(p)
+  # Add calibration data as the final step so that it is plotted on top of
+  # the eval data (if present) and the draws
+  p_final <- p + geom_point(aes(x = .data$date, y = .data$observed_value))
+
+  return(p_final)
 }
 
 #' Get plot of fit and forecasted wastewater concentrations
diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R
index c3582bd5..df74cfd5 100644
--- a/R/generate_simulated_data.R
+++ b/R/generate_simulated_data.R
@@ -59,6 +59,9 @@
 #' infection feedback into the infection process, default is `FALSE`, which
 #' sets the strength of the infection feedback to 0.
 #' If `TRUE`, this will apply an infection feedback drawn from the prior.
+#' @param subpop_phi Numeric vector of negative binomial overdispersion
+#' parameters (phi) for the hospital admissions observation process, one per
+#' subpopulation
 #' @param input_params_path path to the toml file with the parameters to use
 #' to generate the simulated data
 #'
@@ -121,6 +124,7 @@ generate_simulated_data <- function(r_in_weeks = # nolint
                                     sigma_eps = 0.05,
                                     sd_i0_over_n = 0.5,
                                     if_feedback = FALSE,
+                                    subpop_phi = c(25, 50, 70, 40, 100),
                                     input_params_path =
                                       fs::path_package("extdata",
                                         "example_params.toml",
@@ -322,12 +326,35 @@ generate_simulated_data <- function(r_in_weeks = # nolint
   )
 
   ## Latent per capita admissions--------------------------------------------
+  # This won't be used other than for the unit test
   model_hosp_over_n <- model$functions$convolve_dot_product(
     p_hosp_days * new_i_over_n, # individuals who will be hospitalized
     rev(inf_to_hosp),
     uot + ot + ht
   )[(uot + 1):(uot + ot + ht)]
 
+  # Also compute per capita hosps for each subpopulation
+  model_hosp_subpop_over_n <- matrix(
+    nrow = n_subpops,
+    ncol = (ot + ht)
+  )
+  for (i in 1:n_subpops) {
+    model_hosp_subpop_over_n[i, ] <- model$functions$convolve_dot_product(
+      p_hosp_days * new_i_over_n_site[i, ],
+      rev(inf_to_hosp),
+      uot + ot + ht
+    )[(uot + 1):(uot + ot + ht)]
+  }
+
+  # Check these match; all.equal() returns text on mismatch, hence isTRUE()
+  if (!isTRUE(all.equal(
+    colSums(model_hosp_subpop_over_n * pop_fraction),
+    model_hosp_over_n,
+    tolerance = 1e-8
+  ))) {
+    cli::cli_abort("Sum of convolutions not equal to convolution of sums")
+  }
+
 
   ## Add weekday effect on hospital admissions-------------------------------
   pred_hosp <- pop_size * model$functions$day_of_week_effect(
@@ -335,12 +362,36 @@ generate_simulated_data <- function(r_in_weeks = # nolint
     day_of_week_vector,
     hosp_wday_effect
   )
+
+  pred_hosp_subpop <- matrix(
+    nrow = n_subpops,
+    ncol = (ot + ht)
+  )
+  for (i in 1:n_subpops) {
+    pred_hosp_subpop[i, ] <- pop_fraction[i] * pop_size *
+      model$functions$day_of_week_effect(
+        model_hosp_subpop_over_n[i, ],
+        day_of_week_vector,
+        hosp_wday_effect
+      )
+  }
+
+
   ## Add observation error---------------------------------------------------
-  # This is negative binomial but could swap out for a different obs error
-  pred_obs_hosp <- rnbinom(
-    n = length(pred_hosp), mu = pred_hosp,
-    size = 1 / ((params$inv_sqrt_phi_prior_mean)^2)
+  # Use negative binomial but could swap out for a different obs error.
+  # Each subpopulation has its own dispersion parameter, then we sum
+  # the observations to get the population total
+  pred_obs_hosp_subpop <- matrix(
+    nrow = n_subpops,
+    ncol = (ot + ht)
   )
+  for (i in 1:n_subpops) {
+    pred_obs_hosp_subpop[i, ] <- rnbinom(
+      n = length(pred_hosp_subpop[i, ]), mu = pred_hosp_subpop[i, ],
+      size = subpop_phi[i]
+    )
+  }
+  pred_obs_hosp <- colSums(pred_obs_hosp_subpop)
 
 
 
@@ -381,6 +432,18 @@ generate_simulated_data <- function(r_in_weeks = # nolint
     lab_site_reporting_latency = lab_site_reporting_latency
   )
 
+  # Create evaluation data with same reporting freq but go through the entire
+  # time period
+  log_obs_conc_lab_site_eval <- downsample_ww_obs(
+    log_conc_lab_site = log_conc_lab_site,
+    n_lab_sites = n_lab_sites,
+    ot = ot + ht,
+    ht = 0,
+    nt = 0,
+    lab_site_reporting_freq = lab_site_reporting_freq,
+    lab_site_reporting_latency = rep(0, n_lab_sites)
+  )
+
 
 
   # Global adjusted R(t) --------------------------------------------------
@@ -406,6 +469,18 @@ generate_simulated_data <- function(r_in_weeks = # nolint
     lod_lab_site = lod_lab_site
   )
 
+  ww_data_eval <- format_ww_data(
+    log_obs_conc_lab_site = log_obs_conc_lab_site_eval,
+    ot = ot + ht,
+    ht = 0,
+    date_df = date_df,
+    site_lab_map = site_lab_map,
+    lod_lab_site = lod_lab_site
+  ) |>
+    dplyr::rename(
+      "log_genome_copies_per_ml_eval" = "log_genome_copies_per_ml"
+    )
+
   # Artificially add values below the LOD----------------------------------
   # Replace it with an NA, will be used as an example of how to format data
   # properly.
@@ -419,16 +494,27 @@ generate_simulated_data <- function(r_in_weeks = # nolint
           TRUE ~ .data$log_genome_copies_per_ml
         )
     )
+  ww_data_eval <- ww_data_eval |>
+    dplyr::mutate(
+      "log_genome_copies_per_ml_eval" =
+        dplyr::case_when(
+          .data$log_genome_copies_per_ml_eval ==
+            !!min_ww_val ~ 0.5 * .data$log_lod,
+          TRUE ~ .data$log_genome_copies_per_ml_eval
+        )
+    )
 
 
   # Make a hospital admissions dataframe for model calibration
-  hosp_data <- format_hosp_data(pred_obs_hosp,
+  hosp_data <- format_hosp_data(
+    pred_obs_hosp = pred_obs_hosp,
     dur_obs = ot,
     pop_size = pop_size,
     date_df = date_df
   )
 
-  hosp_data_eval <- format_hosp_data(pred_obs_hosp,
+  hosp_data_eval <- format_hosp_data(
+    pred_obs_hosp = pred_obs_hosp,
     dur_obs = (ot + ht),
     pop_size = pop_size,
     date_df = date_df
@@ -437,6 +523,36 @@ generate_simulated_data <- function(r_in_weeks = # nolint
       "daily_hosp_admits_for_eval" = "daily_hosp_admits"
     )
 
+  # Make subpopulation-level hospital admissions data
+  # For now this will only be used for evaluation; eventually, we can add
+  # a feature to use this in calibration
+  subpop_map <- tibble::tibble(
+    subpop_index = as.character(1:n_subpops),
+    subpop_pop = pop_size * pop_fraction,
+    subpop_name = c(1:n_sites, NA)
+  ) |>
+    dplyr::mutate(subpop_name = ifelse(!is.na(.data$subpop_name),
+      glue::glue("Site: {subpop_name}"),
+      "remainder of population"
+    ))
+
+  subpop_hosp_data <- format_subpop_hosp_data(
+    pred_obs_hosp_subpop = pred_obs_hosp_subpop,
+    dur_obs = ot,
+    subpop_map = subpop_map,
+    date_df = date_df
+  )
+
+  subpop_hosp_data_eval <- format_subpop_hosp_data(
+    pred_obs_hosp_subpop = pred_obs_hosp_subpop,
+    dur_obs = (ot + ht),
+    subpop_map = subpop_map,
+    date_df = date_df
+  ) |>
+    dplyr::rename(
+      "daily_hosp_admits_for_eval" = "daily_hosp_admits"
+    )
+
   # Global R(t)
   true_rt <- tibble::tibble(
     unadj_rt_daily = as.numeric(unadj_r_daily),
@@ -453,8 +569,11 @@ generate_simulated_data <- function(r_in_weeks = # nolint
 
   example_data <- list(
     ww_data = ww_data,
+    ww_data_eval = ww_data_eval,
     hosp_data = hosp_data,
     hosp_data_eval = hosp_data_eval,
+    subpop_hosp_data = subpop_hosp_data,
+    subpop_hosp_data_eval = subpop_hosp_data_eval,
     true_global_rt = true_rt
   )
 
diff --git a/R/model_component_fwd_sim.R b/R/model_component_fwd_sim.R
index b5449646..956e574d 100644
--- a/R/model_component_fwd_sim.R
+++ b/R/model_component_fwd_sim.R
@@ -422,6 +422,52 @@ format_hosp_data <- function(pred_obs_hosp,
   return(hosp_data)
 }
 
+
+#' Format the subpopulation-level hospital admissions data into a tidy
+#' dataframe
+#'
+#' @param pred_obs_hosp_subpop matrix of non-negative integers indicating the
+#' number of hospital admissions on each day in each subpopulation. Rows are
+#' subpopulations, columns are time points
+#' @param dur_obs integer indicating the number of days we want the
+#' observations for
+#' @param subpop_map tibble mapping numbered subpopulations to wastewater
+#' sites, with columns "subpop_index", "subpop_name", and "subpop_pop"
+#' @param date_df tibble of columns `date` and `t` that map time in days to
+#' dates
+#'
+#' @return a tidy dataframe containing counts of admissions by date alongside
+#' population size for each subpopulation
+format_subpop_hosp_data <- function(pred_obs_hosp_subpop,
+                                    dur_obs,
+                                    subpop_map,
+                                    date_df) {
+  subpop_hosp_data <- as.data.frame(t(pred_obs_hosp_subpop)) |>
+    dplyr::mutate(t = seq_len(ncol(pred_obs_hosp_subpop))) |>
+    dplyr::filter(.data$t <= dur_obs) |>
+    tidyr::pivot_longer(!t,
+      names_to = "subpop_index",
+      names_prefix = "V",
+      values_to = "daily_hosp_admits"
+    ) |>
+    dplyr::left_join(
+      date_df,
+      by = "t"
+    ) |>
+    dplyr::left_join(
+      subpop_map,
+      by = "subpop_index"
+    ) |>
+    dplyr::select(
+      "date",
+      "subpop_name",
+      "daily_hosp_admits",
+      "subpop_pop"
+    )
+  return(subpop_hosp_data)
+}
+
+
 #' Back- calculate R(t) from incident infections and the generation interval
 #'
 #' @description
diff --git a/data-raw/vignette_data.R b/data-raw/vignette_data.R
index 38c61081..8f4c2dbd 100644
--- a/data-raw/vignette_data.R
+++ b/data-raw/vignette_data.R
@@ -1,22 +1,19 @@
 set.seed(1)
 simulated_data <- wwinference::generate_simulated_data()
 hosp_data_from_sim <- simulated_data$hosp_data
-ww_data_from_sim <- simulated_data$ww_data
-# Add some columns and reorder sites to ensure package works as expected
-# even if sites are not in order
-ww_data <- ww_data_from_sim |>
-  dplyr::mutate(
-    "location" = "example state",
-    "site" = .data$site + 1
-  ) |>
-  dplyr::ungroup() |>
-  dplyr::arrange(desc(.data$site))
+ww_data <- simulated_data$ww_data
+ww_data_eval <- simulated_data$ww_data_eval
 hosp_data <- hosp_data_from_sim |>
   dplyr::mutate("location" = "example state")
 hosp_data_eval <- simulated_data$hosp_data_eval
+subpop_hosp_data <- simulated_data$subpop_hosp_data
+subpop_hosp_data_eval <- simulated_data$subpop_hosp_data_eval
 true_global_rt <- simulated_data$true_global_rt
 
 usethis::use_data(hosp_data, overwrite = TRUE)
 usethis::use_data(hosp_data_eval, overwrite = TRUE)
 usethis::use_data(ww_data, overwrite = TRUE)
+usethis::use_data(ww_data_eval, overwrite = TRUE)
+usethis::use_data(subpop_hosp_data, overwrite = TRUE)
+usethis::use_data(subpop_hosp_data_eval, overwrite = TRUE)
 usethis::use_data(true_global_rt, overwrite = TRUE)
diff --git a/data/hosp_data.rda b/data/hosp_data.rda
index 7595c3bb..83e0eeb8 100644
Binary files a/data/hosp_data.rda and b/data/hosp_data.rda differ
diff --git a/data/hosp_data_eval.rda b/data/hosp_data_eval.rda
index 559fb6e0..4ec7bf76 100644
Binary files a/data/hosp_data_eval.rda and b/data/hosp_data_eval.rda differ
diff --git a/data/subpop_hosp_data.rda b/data/subpop_hosp_data.rda
new file mode 100644
index 00000000..29de9168
Binary files /dev/null and b/data/subpop_hosp_data.rda differ
diff --git a/data/subpop_hosp_data_eval.rda b/data/subpop_hosp_data_eval.rda
new file mode 100644
index 00000000..66dda2dd
Binary files /dev/null and b/data/subpop_hosp_data_eval.rda differ
diff --git a/data/true_global_rt.rda b/data/true_global_rt.rda
index 39952038..c1a6d882 100644
Binary files a/data/true_global_rt.rda and b/data/true_global_rt.rda differ
diff --git a/data/ww_data.rda b/data/ww_data.rda
index 77c8e284..c58ab9dd 100644
Binary files a/data/ww_data.rda and b/data/ww_data.rda differ
diff --git a/data/ww_data_eval.rda b/data/ww_data_eval.rda
new file mode 100644
index 00000000..176a52b4
Binary files /dev/null and b/data/ww_data_eval.rda differ
diff --git a/man/format_subpop_hosp_data.Rd b/man/format_subpop_hosp_data.Rd
new file mode 100644
index 00000000..ed97afba
--- /dev/null
+++ b/man/format_subpop_hosp_data.Rd
@@ -0,0 +1,31 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/model_component_fwd_sim.R
+\name{format_subpop_hosp_data}
+\alias{format_subpop_hosp_data}
+\title{Format the subpopulation-level hospital admissions data into a tidy
+dataframe}
+\usage{
+format_subpop_hosp_data(pred_obs_hosp_subpop, dur_obs, subpop_map, date_df)
+}
+\arguments{
+\item{pred_obs_hosp_subpop}{matrix of non-negative integers indicating the
+number of hospital admissions on each day in each subpopulation. Rows are
+subpopulations, columns are time points}
+
+\item{dur_obs}{integer indicating the number of days we want the
+observations for}
+
+\item{subpop_map}{tibble mapping numbered subpopulations to wastewater
+sites, with columns "subpop_index", "subpop_name", and "subpop_pop"}
+
+\item{date_df}{tibble of columns \code{date} and \code{t} that map time in days to
+dates}
+}
+\value{
+a tidy dataframe containing counts of admissions by date alongside
+population size for each subpopulation
+}
+\description{
+Format the subpopulation-level hospital admissions data into a tidy
+dataframe
+}
diff --git a/man/generate_simulated_data.Rd b/man/generate_simulated_data.Rd
index 802b77e7..da353779 100644
--- a/man/generate_simulated_data.Rd
+++ b/man/generate_simulated_data.Rd
@@ -30,6 +30,7 @@ generate_simulated_data(
   sigma_eps = 0.05,
   sd_i0_over_n = 0.5,
   if_feedback = FALSE,
+  subpop_phi = c(25, 50, 70, 40, 100),
   input_params_path = fs::path_package("extdata", "example_params.toml", package =
     "wwinference")
 )
@@ -115,6 +116,10 @@ infection feedback into the infection process, default is \code{FALSE}, which
 sets the strength of the infection feedback to 0.
 If \code{TRUE}, this will apply an infection feedback drawn from the prior.}
 
+\item{subpop_phi}{Numeric vector of negative binomial overdispersion
+parameters (phi) for the hospital admissions observation process, one per
+subpopulation}
+
 \item{input_params_path}{path to the toml file with the parameters to use
 to generate the simulated data}
 }
diff --git a/man/hosp_data.Rd b/man/hosp_data.Rd
index 1393f270..10811a61 100644
--- a/man/hosp_data.Rd
+++ b/man/hosp_data.Rd
@@ -28,9 +28,9 @@ to match this format.
 }
 \details{
 This data is generated via the default values in the
-\code{generate_simulated_data()} function. They represent the bare minumum
+\code{generate_simulated_data()} function. They represent the bare minimum
 required fields needed to pass to the model, and we recommend that users
-try to format their own data to match this formate.
+try to format their own data to match this format.
 
 The variables are as follows:
 \describe{
diff --git a/man/subpop_hosp_data.Rd b/man/subpop_hosp_data.Rd
new file mode 100644
index 00000000..4267b5a0
--- /dev/null
+++ b/man/subpop_hosp_data.Rd
@@ -0,0 +1,46 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{subpop_hosp_data}
+\alias{subpop_hosp_data}
+\title{Example subpopulation level hospital admissions dataset}
+\format{
+An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 450 rows and 4 columns.
+}
+\source{
+vignette_data.R
+}
+\usage{
+subpop_hosp_data
+}
+\description{
+A dataset containing the simulated daily hospital admissions
+(labeled here as \code{daily_hosp_admits}) by date of admission (\code{date}) in
+each subpopulation.
+Additional columns that are required are the population size of the
+population contributing to the hospital admissions. In this instance,
+the subpopulations here are each of the wastewater catchment areas plus
+an additional subpopulation for the portion of the population not captured
+by wastewater surveillance. The data generated are daily hospital
+admissions but they could be any other epidemiological count dataset e.g.
+cases. This data should only contain hospital admissions that would have
+been available as of the date that the forecast was made.
+}
+\details{
+This data is generated via the default values in the
+\code{generate_simulated_data()} function.
+
+The variables are as follows:
+\describe{
+\item{date}{Date the hospital admissions occurred, formatted in ISO8601
+standards as YYYY-MM-DD}
+\item{subpop_name}{A string indicating the subpopulation the hospital
+admissions correspond to. This is either a wastewater site, or the
+remainder of the population}
+\item{daily_hosp_admits}{The number of individuals admitted to the
+hospital on that date, available as of the forecast date}
+\item{subpop_pop}{The number of people contributing to the daily hospital
+admissions in each subpopulation}
+}
+}
+\keyword{datasets}
diff --git a/man/subpop_hosp_data_eval.Rd b/man/subpop_hosp_data_eval.Rd
new file mode 100644
index 00000000..9da0cc9d
--- /dev/null
+++ b/man/subpop_hosp_data_eval.Rd
@@ -0,0 +1,48 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{subpop_hosp_data_eval}
+\alias{subpop_hosp_data_eval}
+\title{Example subpopulation level retrospective hospital admissions dataset}
+\format{
+An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 635 rows and 4 columns.
+}
+\source{
+vignette_data.R
+}
+\usage{
+subpop_hosp_data_eval
+}
+\description{
+A dataset containing the simulated daily hospital admissions (labeled
+here as \code{daily_hosp_admits_for_eval}) by date of admission (\code{date}) in
+each subpopulation observed retrospectively.
+Additional columns that are required are the population size of the
+population contributing to the hospital admissions. In this instance,
+the subpopulations here are each of the wastewater catchment areas plus
+an additional subpopulation for the portion of the population not captured
+by wastewater surveillance. The data generated are daily hospital
+admissions but they could be any other epidemiological count dataset e.g.
+cases. This data should contain hospital admissions retrospectively beyond
+the forecast date in order to evaluate the forecasts.
+}
+\details{
+This data is generated via the default values in the
+\code{generate_simulated_data()} function. They represent the bare minimum
+required fields needed to pass to the model, and we recommend that users
+try to format their own data to match this format.
+
+The variables are as follows:
+\describe{
+\item{date}{Date the hospital admissions occurred, formatted in ISO8601
+standards as YYYY-MM-DD}
+\item{subpop_name}{A string indicating the subpopulation the hospital
+admissions correspond to. This is either a wastewater site, or the
+remainder of the population}
+\item{daily_hosp_admits_for_eval}{The number of individuals admitted to the
+hospital on that date, observed retrospectively for forecast evaluation}
+\item{subpop_pop}{The number of people contributing to the daily hospital
+admissions in each subpopulation}
+}
+}
+\keyword{datasets}
diff --git a/man/ww_data_eval.Rd b/man/ww_data_eval.Rd
new file mode 100644
index 00000000..2afdd3d1
--- /dev/null
+++ b/man/ww_data_eval.Rd
@@ -0,0 +1,55 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{ww_data_eval}
+\alias{ww_data_eval}
+\title{Example evaluation wastewater dataset.}
+\format{
+\subsection{ww_data_eval}{
+
+A tibble with 126 rows and 6 columns
+\describe{
+\item{date}{Sample collection date, formatted in ISO8601 standards as
+YYYY-MM-DD}
+\item{site}{The wastewater treatment plant where the sample was collected}
+\item{lab}{The lab where the sample was processed}
+\item{log_genome_copies_per_ml_eval}{The natural log of the wastewater
+concentration measured on the date specified, collected in the site
+specified, and processed in the lab specified. The package expects
+this quantity in units of log estimated genome copies per mL.}
+\item{log_lod}{The log of the limit of detection in the site and lab on a
+particular day of the quantification device (e.g. PCR).  This should be in
+units of log estimated genome copies per mL.}
+\item{site_pop}{The population size of the wastewater catchment area
+represented by the site variable}
+\item{location}{ A string indicating the location that all of the
+data is coming from. This is not a necessary column, but instead is
+included to more realistically mirror a typical workflow}
+}
+}
+}
+\source{
+vignette_data.R
+}
+\usage{
+ww_data_eval
+}
+\description{
+A dataset containing the simulated retrospective wastewater concentrations
+(labeled here as \code{log_genome_copies_per_ml_eval}) by sample collection date
+(\code{date}), the site where the sample was collected (\code{site}) and the lab
+where the samples were processed (\code{lab}). Additional columns that are
+required attributes needed for the model are the limit of detection for
+that lab on each day (labeled here as \code{log_lod}) and the population size of
+the wastewater catchment area represented by the wastewater concentrations
+in each \code{site}.
+}
+\details{
+This data is generated via the default values in the
+\code{generate_simulated_data()} function. They represent the bare minimum
+required fields needed to pass to the model, and we recommend that users
+try to format their own data to match this format.
+
+The variables are as follows:
+}
+\keyword{datasets}
diff --git a/scratch/sim_data_script.R b/scratch/sim_data_script.R
index af84d369..337ec169 100644
--- a/scratch/sim_data_script.R
+++ b/scratch/sim_data_script.R
@@ -37,6 +37,7 @@ global_rt_sd <- 0.03
 sigma_eps <- 0.05
 sd_i0_over_n <- 0.5
 infection_feedback <- TRUE
+subpop_phi <- c(25, 50, 70, 40, 100)
 input_params_path <-
   fs::path_package("extdata",
     "example_params.toml",