From 6491e647ccb2fa8cc8d2f3f5885cb678113d098b Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 10:06:33 -0400 Subject: [PATCH 001/103] remove extra files --- CONTRIBUTING.md | 38 ------------- DISCLAIMER.md | 23 -------- code-of-conduct.md | 103 ----------------------------------- open_practices.md | 126 ------------------------------------------- rules_of_behavior.md | 72 ------------------------- thanks.md | 6 --- 6 files changed, 368 deletions(-) delete mode 100644 CONTRIBUTING.md delete mode 100644 DISCLAIMER.md delete mode 100644 code-of-conduct.md delete mode 100644 open_practices.md delete mode 100644 rules_of_behavior.md delete mode 100644 thanks.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md deleted file mode 100644 index 5638186f..00000000 --- a/CONTRIBUTING.md +++ /dev/null @@ -1,38 +0,0 @@ -# Welcome! -Thank you for contributing to CDC's Open Source projects! If you have any -questions or doubts, don't be afraid to send them our way. We appreciate all -contributions, and we are looking forward to fostering an open, transparent, and -collaborative environment. - -Before contributing, we encourage you to also read our [LICENSE](LICENSE), -[README](README.md), and -[code-of-conduct](code-of-conduct.md) -files, also found in this repository. If you have any inquiries or questions not -answered by the content of this repository, feel free to [contact us](mailto:surveillanceplatform@cdc.gov). - -## Public Domain -This project is in the public domain within the United States, and copyright and -related rights in the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/). -All contributions to this project will be released under the CC0 dedication. By -submitting a pull request you are agreeing to comply with this waiver of -copyright interest. - -## Requesting Changes -Our pull request/merging process is designed to give the CDC Surveillance Team -and other in our space an opportunity to consider and discuss any suggested -changes. This policy affects all CDC spaces, both on-line and off, and all users -are expected to abide by it. - -### Open an issue in the repository -If you don't have specific language to submit but would like to suggest a change -or have something addressed, you can open an issue in this repository. Team -members will respond to the issue as soon as possible. - -### Submit a pull request -If you would like to contribute, please submit a pull request. In order for us -to merge a pull request, it must: - * Be at least seven days old. Pull requests may be held longer if necessary - to give people the opportunity to assess it. - * Receive a +1 from a majority of team members associated with the request. - If there is significant dissent between the team, a meeting will be held to - discuss a plan of action for the pull request. diff --git a/DISCLAIMER.md b/DISCLAIMER.md deleted file mode 100644 index 63fa40c7..00000000 --- a/DISCLAIMER.md +++ /dev/null @@ -1,23 +0,0 @@ -# DISCLAIMER -Use of this service is limited only to **non-sensitive and publicly available -data**. Users must not use, share, or store any kind of sensitive data like -health status, provision or payment of healthcare, Personally Identifiable -Information (PII) and/or Protected Health Information (PHI), etc. under **ANY** -circumstance. - -Administrators for this service reserve the right to moderate all information -used, shared, or stored with this service at any time. Any user that cannot -abide by this disclaimer and Code of Conduct may be subject to action, up to -and including revoking access to services. - -The material embodied in this software is provided to you "as-is" and without -warranty of any kind, express, implied or otherwise, including without -limitation, any warranty of fitness for a particular purpose. In no event shall -the Centers for Disease Control and Prevention (CDC) or the United States (U.S.) -government be liable to you or anyone else for any direct, special, incidental, -indirect or consequential damages of any kind, or any damages whatsoever, -including without limitation, loss of profit, loss of use, savings or revenue, -or the claims of third parties, whether or not CDC or the U.S. government has -been advised of the possibility of such loss, however caused and on any theory -of liability, arising out of or in connection with the possession, use or -performance of this software. diff --git a/code-of-conduct.md b/code-of-conduct.md deleted file mode 100644 index 2633c7b9..00000000 --- a/code-of-conduct.md +++ /dev/null @@ -1,103 +0,0 @@ -# Creating a Culture of Innovation -We aspire to create a culture where people work joyfully, communicate openly -about things that matter, and provide great services globally. We would like our -team and communities (both government and private sector) to reflect on -diversity of all kinds, not just the classes protected in law. Diversity fosters -innovation. Diverse teams are creative teams. We need a diversity of perspective -to create solutions for the challenges we face. - -This is our code of conduct (adapted from [18F's Code of Conduct](https://github.com/18F/code-of-conduct)). -We follow all Equal Employment Opportunity laws and we expect everyone we work -with to adhere to the [GSA Anti-harassment Policy](http://www.gsa.gov/portal/directive/d0/content/512516), -even if they do not work for the Centers for Disease Control and Prevention or -GSA. We expect every user to follow this code of conduct and the laws and -policies mentioned above. - -## Be Empowering -Consider what you can do to encourage and support others. Make room for quieter -voices to contribute. Offer support and enthusiasm for great ideas. Leverage the -low cost of experimentation to support your colleagues' ideas, and take care to -acknowledge the original source. Look for ways to contribute and collaborate, -even in situations where you normally wouldn't. Share your knowledge and skills. -Prioritize access for and input from those who are traditionally excluded from -the civic process. - -## Rules of Behavior - * I understand that I must complete security awareness and records management - training annually in order to comply with the latest security and records - management policies. - * I understand that I must also follow the [Rules of Behavior for use of HHS Information Resources](http://www.hhs.gov/ocio/policy/hhs-rob.html) - * I understand that I must not use, share, or store any kind of sensitive data - (health status, provision or payment of healthcare, PII, etc.) under ANY - circumstance. - * I will not knowingly conceal, falsify, or remove information. - * I understand that I can only use non-sensitive and/or publicly available - data. - * I understand that all passwords I create to set up accounts need to comply - with CDC's password policy. - * I understand that the stewards reserves the right to moderate all data at any - time. - -## Boundaries -Create boundaries to your own behavior and consider how you can create a safe -space that helps prevent unacceptable behavior by others. We can't list all -instances of unacceptable behavior, but we can provide examples to help guide -our community in thinking through how to respond when we experience these types -of behavior, whether directed at ourselves or others. - -If you are unsure if something is appropriate behavior, it probably is not. Each -person we interact with can define where the line is for them. Impact matters -more than intent. Ensuring that your behavior does not have a negative impact is -your responsibility. Problems usually arise when we assume that our way of -thinking or behavior is the norm for everyone. - -### Here are some examples of unacceptable behavior - * Negative or offensive remarks based on the protected classes as listed in the - GSA Anti-harassment Policy of race, religion, color, sex, national origin, - age, disability, genetric information, sexual orientation, gender identity, - parental status, maritual status, and political affiliation as well as gender - expression, mental illness, socioeconomic status or backgrounds, - neuro(a)typicality, physical appearance, body size, or clothing. Consider - that calling attention to differences can feel alienating. - * Sustained disruption of meetings, talks, or discussions, including chatrooms. - * Patronizing language or behavior. - * Aggressive behavior, such as unconstructive criticism, providing correction - that do not improve the conversation (sometimes referred to as "well - actually's"), repeatedly interrupting or talking over someone else, feigning - surprise at someone's lack of knowledge or awareness about a topic, or subtle - prejudice. - * Referring to people in a way that misidentifies their gender and/or rejects - the validity of their gender identity; for instance by using incorrect - pronouns or forms of address (misgendering). - * Retaliating against anyone who files a formal complaint that someone has - violated these codes or laws. - -## Background -CDC Scientific Clearance is the process of obtaining approvals by appropriate -CDC officials before a CDC information product is released to the public or -CDC's external public health partners. Information products that require formal -clearance include print, electronic, or oral materials, that CDC employees -author or co-author, whether published by CDC or outside CDC. CDC contractors -developing content on behalf of CDC for the public or CDC's external public -health partners are also required to put their content through the formal -clearance process. The collaborative functions related to the projects include -blogs, wikis, forums, bug tracking sites, source control and -others deemed as necessary. - -For those individuals within the CDC, adherence to the following policies are -required: -* CDC ["Clearance of Information Products Disseminated Outside CDC for Public Use"](http://www.cdc.gov/maso/Policy/PublicUse.pdf) -* HHS ["Ensuring the Quality of Information Disseminated by HHS agencies"](http://aspe.hhs.gov/infoquality) - -All collaborative materials will be controlled by the rules contained within -this document. This will allow for the real-time collaboration opportunities -among CDC employees, CDC contractors and CDC public health partners. - -## Credit -This code of conduct was mainly adapted from [18F's Code of Conduct](https://github.com/18F/code-of-conduct) -and the [CDC's Informatics Innovation Unit R&D Lab's code of conduct.](https://www.philab.cdc.gov/index.php/code-of-conduct/) - -## Relevant Legal Considerations -* [Laws enforced by the Equal Employment Opportunity Commission](http://www.eeoc.gov/laws/statutes/index.cfm) -* [Types of discrimination prohibited by law](http://www.eeoc.gov/laws/types) -* [New and proposed regulations](http://www.eeoc.gov/laws/regulations/index.cfm) diff --git a/open_practices.md b/open_practices.md deleted file mode 100644 index 357ecbbb..00000000 --- a/open_practices.md +++ /dev/null @@ -1,126 +0,0 @@ -# CDC GitHub Practices for Open Source Projects - -**The [CDCGov organization on GitHub](https://github.com/CDCgov) is designated for use by CDC programs to publish open source code.** This is a set of practices to help programs release secure and compliant open source projects successfully. If you are interested in using GitHub for non-open source projects, please see information on our [enterprise organization](#cdc-enterprise). - -We designed these practices to be straightforward and helpful, and we [accept feedback](#support-and-feedback) from the community on updating them. For [Required Practices](#required-practices), Projects that don't adhere to the [Required Practices](#required-practices) could be subject to [archival or removal](#non-compliance-procedure). - -## Getting Started - -Before you can publish your project, you must request access to be added to the CDCgov organization. Complete these steps: - -1. Review the [Rules of Behavior](rules_of_behavior.md). -2. Confirm your [Github profile is setup](#profile-setup) properly. -3. Complete the [project request form](https://forms.office.com/Pages/ResponsePage.aspx?id=aQjnnNtg_USr6NJ2cHf8j44WSiOI6uNOvdWse4I-C2NUNk43NzMwODJTRzA4NFpCUk1RRU83RTFNVi4u). - * This will require your CDC login, so if you don't have a login, ask someone to request on your behalf, or [get in touch](#support-and-feedback). - -You should receive an email or notification when you are given access and your first repository should be setup for you. For subsequent projects, you will be able to create a repository in the organization using Github's interface. The [template repository](https://github.com/CDCgov/template) is maintained and an easy way to quick start your repository that complies with the guidelines. Once this is completed you're ready to follow the required guidelines to publish code. - -## Required Practices - -You must follow these practices before you publish real code into your repository. - -* [ ] **Get Clearance.** Always obtain clearance from your organization prior to setting up and publishing a repository. - * GitHub is a third party service used by CDC to collaborate with the public. Official CDC health messages will always be distributed through www.cdc.gov and through appropriate channels, so make sure to plan your project along with your official public health program on cdc.gov. -* [ ] **Naming.** Set a meaningful project name and short description for your project. The form to do this is in your repositories settings. - * [ ] Add [topics](https://help.github.com/en/github/administering-a-repository/classifying-your-repository-with-topics) to improve discovery and use of your project. For AI-related projects, the [Code.gov Implementation Guidance to Federal Agencies Regarding Enterprise Data and Source Code Inventories](https://code.gov/federal-agencies/compliance/inventory-code) must be followed when setting topics. -* [ ] **Create a README.** Add a `README.md` file at the root with the following: - * An overview of your project, including the purpose, goals and the team responsible. - * A description of your development process in the `README.md` file. If your project is no longer active, mark it as [archived](https://docs.github.com/en/free-pro-team@latest/github/creating-cloning-and-archiving-repositories/archiving-repositories). - * Include the following notice sections. You can modify the verbiage and adapt as necessary based on your program need. - * [ ] [Public Domain Standard Notice](https://github.com/CDCgov/template#public-domain-standard-notice) - * [ ] [License Standard Notice](https://github.com/CDCgov/template#license-standard-notice) - * [ ] [Privacy Standard Notice](https://github.com/CDCgov/template#privacy-standard-notice) - * [ ] [Contributing Standard Notice](https://github.com/CDCgov/template#contributing-standard-notice) - * [ ] [Records Management Standard Notice](https://github.com/CDCgov/template#records-management-standard-notice) - * [ ] [Additional Standard Notices](https://github.com/CDCgov/template#additional-standard-notices) -* [ ] **Choose a license.** Assign an open source license based on program need. - * If you need help choosing a license, please review [this article](https://www.philab.cdc.gov/index.php/2012/03/27/open-source-development-for-public-health-informatics/), refer to existing CDCgov projects, or ask for consultation support in choosing a license. -* [ ] **Security scanning and review.** - * **This is the final step before publishing and the most critical.** - * All source code used within CDC systems must comply with all cybersecurity processes prior to production use, including static and dynamic scanning. The same applies to code published as open source. - * If you are unsure about compliance, reach out to your organization's security officers. - * Never commit sensitive information, including usernames, passwords, tokens, PII, PHI. To automate this, you can integrate pre-commit tools like [Clouseau](https://github.com/cfpb/clouseau) to systematically review material before committing. - * Make sure that the commit history of your Github repository also doesn't have these things. In many cases it's easier to start a new repository and push up the code that has all sensitive information removed as the first commit. - * Enable [GitHub automated security alerts](https://help.github.com/en/github/managing-security-vulnerabilities/about-security-alerts-for-vulnerable-dependencies) and configure notification for the repo admin to see. -* [ ] **Setup your profile.** [Active project committers need to add profile info to help collaboration.](#profile-setup) - * [ ] **Two-factor authentication (2FA).** [Project admins must secure their account with two-factor-authentication.](https://docs.github.com/en/enterprise-server@2.21/github/authenticating-to-github/securing-your-account-with-two-factor-authentication-2fa) -* [ ] **Maintain your repository.** Once your repository is published, you must do the following to remain in compliance: - * [ ] **Respond to critical security issues and communication from administrators.** Ignoring security issues or not responding to communication from administrators can result in [archiving or removal](#non-compliance-procedure). - * [ ] **Archive old projects.** If you're no longer updating the project or have moved it's location, update your `README.md` file to let users know and [archive the repository](https://docs.github.com/en/free-pro-team@latest/github/creating-cloning-and-archiving-repositories/archiving-repositories). - -## Recommended Practices - -Optional improvements to make your open source project more successful. - -* [ ] Establish pull request templates to make it easier for contributors to send pull requests. For example [SDP-V has a checklist for each PR to match their development practices.](https://github.com/CDCgov/SDP-Vocabulary-Service/blob/master/.github/PULL_REQUEST_TEMPLATE) -* [ ] Agree on project conventions and include them in your `README.md` file. Depending on what type of project, this includes folder structure for data, linters, editor configuration (eg, [MicrobeTrace's .editorconfig](https://github.com/CDCgov/MicrobeTrace/blob/master/.editorconfig)). This will help improve the quality of your project and make it easier for others to contribute to your project. -* [ ] Add support and community procedures. CDC does not provide warranty or official support for open source projects, but describing how you would like questions and issues will assist users of your project. If you use a wiki, or project board, or package manager, describe and link to that. Official contribution steps will make it easier for people outside of CDC to contribute to your project. -* [ ] Include references to publications, presentations, and sites featuring your project. -* [ ] Add an entry to [open.cdc.gov](https://open.cdc.gov) to the [data](https://open.cdc.gov/data.html), [code](https://open.cdc.gov/code.html), [api](https://open.cdc.gov/apis.html), or [event](https://open.cdc.gov/events.html) page to help people find your project on cdc.gov -* [ ] Add versions and tags describing major releases and milestones. For example, [open.cdc.gov's releases each time a new version is published to the web site](https://github.com/CDCgov/opencdc/releases/tag/v1.0.9) or [geneflow's changelog](https://github.com/CDCgov/geneflow/blob/master/CHANGELOG.md). -* [ ] Follow [Semantic Versioning 2.0.0](https://semver.org/) when creating versions for your project. -* [ ] Describe and test reproducible practices to install and build your project. For example, [injury_autocoding's code section on running the project's scripts](https://github.com/cdcai/injury_autocoding#code)). -* [ ] Recognize contributors and existing resources that have helped the project. For example, [fdns-ms-hl7-utils' AUTHORS file](https://github.com/CDCgov/fdns-ms-hl7-utils/blob/master/AUTHORS). -* [ ] Automate build and test procedures to reduce the effort of outside contributors to send pull requests (eg, [Travis CI](https://travis-ci.org/), [Circle CI](https://circleci.com/), [GitHub Actions](https://help.github.com/en/actions)) -* [ ] [Appropriately gather metrics](https://opensource.guide/metrics/) on how your project is used and incorporate this into your feature planning process. -* [ ] [Incorporate documentation into your development cycle](https://github.com/GSA/code-gov-open-source-toolkit/blob/master/toolkit_docs/documentation.md), and where possible, automate the generation of documentation so it is more likely to be up to date and useful to people interested in your project. - -## Guidance - -### Support and Feedback - -If you need additional support with your setting up project, or have any feedback or ideas about this guidance please [open an issue](https://github.com/CDCgov/template/issues) or send an email to [data@cdc.gov](mailto:data@cdc.gov). We also accept pull requests if you want to directly edit the guidance. - -### Non-Compliance Procedure - -Projects in this organization are reviewed occasionally for compliance with the [Required Practices](#required-practices). If your project is found to not be in compliance, you will be contacted by administrators to help bring your project into compliance. Projects that do not respond or that habitually fail to meet these practices will be archived or removed from the organization, depending on severity. - -### Profile Setup - -Please make sure your profile is set up properly to help us work better together. Specifically, keep your profile up to date with: - -* **Name:** Your first and last name. -* **Company:** Your government agency or contracting company. (If you also use GitHub for personal projects, consider specifying “CDC (work) + personal projects” to make it clear that some of your GitHub projects may be personal in nature.) -* **Location:** Your primary work location (city, state). -* **Photo:** A headshot photo, or an appropriate image that is unique to you. - -If you admin any projects, make sure to [secure your account with two-factor authentication (2FA)](https://docs.github.com/en/enterprise-server@2.21/github/authenticating-to-github/securing-your-account-with-two-factor-authentication-2fa). Although you probably already did this because you are smart. - -### Open Source Checklist - -So you've decided to set up an open source project at CDC. Here are the steps to do that, in the most common order. - -* [ ] Create a new project using the [template repo](https://github.com/CDCgov/template). -* [ ] Update your readme.md following the [CDC GitHub Practices for Open Source Projects](https://github.com/CDCgov/template/blob/master/open_practices.md) -* [ ] Choose a license. Most projects are ASL2, but license should meet public health program need. See for more info on choosing a license. -* [ ] Remove all sensitive info. -* [ ] Talk with your ADI, ADS, and ISSO for review and clearance. -* [ ] After approval, create a GitHub user. -* [ ] Fill out the [Request a Repo form](https://forms.office.com/Pages/ResponsePage.aspx?id=aQjnnNtg_USr6NJ2cHf8j44WSiOI6uNOvdWse4I-C2NUNk43NzMwODJTRzA4NFpCUk1RRU83RTFNVi4u) for a new repo on [CDCGov](https://github.com/cdcgov) or [CDCai](https://github.com/cdcai). -* [ ] When you get an email or push alert that your repo is ready, push to GitHub -* [ ] Add an entry in [open.cdc.gov](https://open.cdc.gov) on their [code page](https://open.cdc.gov/code.html) to officially be linked from cdc.gov. This helps users find and use your project. -* [ ] Keep your project up to date, when you're finished flag it as [archived](https://docs.github.com/en/free-pro-team@latest/github/creating-cloning-and-archiving-repositories/archiving-repositories). - -_This checklist was adapted from the CDC IT Guard Rail and put here to help people who don't have access to the intranet._ - -### CDC Enterprise - -Our [CDCent](https://github.com/cdcent/) organization is used for private, non-public projects so only CDC staff and approved outside collaborators work on these projects, you can request access through the [GitHub Enterprise Cloud form](https://forms.office.com/Pages/ResponsePage.aspx?id=aQjnnNtg_USr6NJ2cHf8j44WSiOI6uNOvdWse4I-C2NUQjVJVDlKS1c0SlhQSUxLNVBaOEZCNUczVS4u). - -### Reference Links - -These are helpful links from across the Federal Government regarding open sourcing code. - -* [CFPB Open Tech](https://cfpb.github.io/) -* [TTS Engineering Practices Guide](https://engineering.18f.gov/) -* [18F Open Source Policy](https://github.com/18F/open-source-policy) and [Practicing our open source policy](https://github.com/18F/open-source-policy/blob/master/practice.md) -* [GitHub and Government: How agencies build software](https://government.github.com/) -* [code.gov](https://code.gov) -* [Federal Source Code and Open Source Toolkit](https://github.com/GSA/code-gov-open-source-toolkit) -* [Federal Source Code Policy (M-16-21)](https://sourcecode.cio.gov/) -* [openCDC](https://open.cdc.gov) -* [Digital Services Playbook](https://playbook.cio.gov/) -* [CDC/ATSDR Policy on Public Health Research and Nonresearch Data Management and Access](https://www.cdc.gov/maso/policy/policy385.pdf) - * [CDC/ATSDR Policy on Releasing and Sharing Data](https://www.cdc.gov/maso/Policy/ReleasingData.pdf) (old version, but still a useful reference) -* [Clearance of Information Products Disseminated Outside CDC for Public Use](https://www.cdc.gov/os/policies/docs/CDC-GA-2005-06_Clearance_of_Information_Products_Disseminated_Outside_for_Public_Use.pdf) -* [Federal Source Code Toolkit](https://github.com/GSA/code-gov-open-source-toolkit) diff --git a/rules_of_behavior.md b/rules_of_behavior.md deleted file mode 100644 index 06b5716e..00000000 --- a/rules_of_behavior.md +++ /dev/null @@ -1,72 +0,0 @@ -# Rules of Behavior and Posting Guidelines for the Use of GitHub as a Third-Party Web Application - -## Purpose - -These rules of behavior establish the privacy and information security requirements for the use of Third Party Web Applications (TPWAs) in conjunction with the CDC GitHub.com organizations established for open source projects. These rules of behavior were developed to ensure that CDC and its confidential information and technologies are not compromised, as well as protecting general CDC interests and services from risks associated with the use of TPWAs while allowing for the increased efficiencies and cost savings that come with appropriate use of third party services. - -## Scope - -These rules of behavior and its related guidance apply to federal employees, contractors, and all external collaborators who will access GitHub from CDC directly or use them with non-sensitive data obtained from CDC. All engagement with TPWAs related to the GitHub will be governed by these rules of behavior, as well as to the Rules of Behavior for the Use of HHS Information Services. - -## Ownership - -CDC assigns three stewards in charge of rules and policy compliance: a Business Steward, a Security Steward, and a Technical Steward. The business and security stewards are responsible for establishing policy and providing approval, while the technical steward fulfills requests from users. Users requesting access to GitHub that have not been approved yet need to assign a main and a backup point of contact (POC) with the business steward, as well as provide a justification to the security steward. - -The security steward is responsible for the security of the GitHub usage as a TPWA and its impact on the CDC network and compliance with CDC security policies. All users, including POCs, are responsible for adherence to this policy and associated processes. Where there is not a rule of behavior that provides explicit guidance, users must do their best to safeguard CDC and its network and services from security risks. - -## Rules of Behavior - -All new users of GitHub must read and acknowledge these rules before using any of the approved TPWAs. This acknowledgment must be completed annually, and establishes agreement from part of the user to adhere to these rules. - -* I understand that I must complete security awareness and records management training annually in order to comply with the latest security and records management policies. -* I understand that I must also follow the Rules of Behavior for use of HHS Information Resources. -* I understand that I must not use, share, or store any kind of sensitive data (health status, provision or payment of healthcare, pictures, PII, etc.) with TPWAs under ANY circumstance. -* I will not knowingly conceal, falsify or remove information.This includes editing or removing the template language provided when a Github repository is created. -* I understand that I can only use non-sensitive and/or publicly available data in GitHub. If you are unsure of what constitutes non-sensitive information, please see guidance below. -* I understand that all passwords I create to set up GitHub accounts need to comply with CDC’s password policy. -* I understand that the steward reserves the right to moderate all data at any time. -* I understand my responsibilities to protect systems and data as specified by CDC policies. - -## Guidance Regarding Non-Sensitive and Publicly Available Information - -In support of program collaboration in the use oF GitHub, portions of some GitHub projects are either currently open to the public or may become open to the public in the future. The following guidelines will inform and assist the user in determining that the information to be posted on GitHub is not sensitive. The bottom line is if the content you are posting is not appropriate to post for public access, it should not be posted on GitHub. - -Before posting information that involves other CDC programs, employees, etc. to GitHub, it is important that the poster ensures they receive approval by the relevant CDC entity to post the information. - -Questions to consider before posting information include: - -| Do I have reservations about anyone viewing this information? | Yes | Do not post. | -| Were individuals informed that this information would be posted on GitHub? | No | Do not post. | -| Does this information contain details or descriptions of CDC security systems or other sensitive infrastructures? | Yes | Do not post. | -| Does this information reflect program efforts to engage and inform external partners and the public? | No | Do not post. | - -Examples of information which has been deemed not sensitive and may be posted on GitHub include the following. - -* Source Code -* Use cases -* User stories/requirements -* Process flows -* Program pain points -* Software Service Descriptions - -Sensitive information, which should not be posted, includes (but is not limited to) the following. - -* Information directly attributed to an individual in a sensitive manner -* The names or pictures of individuals -* Protected health information -* Project management material. This includes posting or discussing security documentation, implementation plans, communications regarding project specifics, etc. -* Opinions related to programs or tools, specifically those that may have an adverse impact -* Non-public Links to CDC SharePoint or other internal references -* Non-public Details on CDC internal infrastructure - -If there’s any question on whether information may be sensitive (such as detailed interview notes or specific references provided during a program interview), further guidance should be sought from the security steward prior to posting the information on any GitHub. - -## Enforcement - -Users looking to use GitHub that are unable to follow these rules of behavior will not have authorization to do so. Any users that violate these rules of behavior or CDC security policies may be subject to action, up to and including revoking access to GitHub. Technical and security stewards have the right to enforce these rules of behavior based on violations at any time. - -## References - -* [Policy for Managing the Use of Third-Party Websites and Applications](https://www.hhs.gov/about/agencies/asa/ocio/cybersecurity/policy-social-media-technologies/index.html) -* [Rules of Behavior for Use of HHS Information Resources](http://www.hhs.gov/ocio/policy/hhs-rob.html) -* [Security and Awareness Training](http://sat.cdc.gov/) (requires login) diff --git a/thanks.md b/thanks.md deleted file mode 100644 index 0f135cb9..00000000 --- a/thanks.md +++ /dev/null @@ -1,6 +0,0 @@ -# Thanks and Acknowledgements - -Starting this file way too late, but wanted to recognize contributions made by people who helped this repo. There are many more than this, but I should have started this file years ago. - -* Chris Sandlin [@cssandlin](https://github.com/cssandlin) -* Drewry Morris [@drewry](https://github.com/drewry) From 342701ab53d380da656344bf2ea40a2cc0065405 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 10:16:19 -0400 Subject: [PATCH 002/103] add a .gitignore with R and data stuff --- .gitignore | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 91 insertions(+) create mode 100644 .gitignore diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..7cdb0e35 --- /dev/null +++ b/.gitignore @@ -0,0 +1,91 @@ + +##### + +# Exclude data and output file types + +# Data +*.csv +*.tsv +*.parquet +*.dat +*.bin +*.xls +*.xlsx + +# Documents +*.doc +*.docx +*.htm +*.html +*.ppt +*.pptx +*.pdf + +# Images +*.bmp +*.jpeg +*.jpg +*.gif +*.pdf +*.png + +# Compressed archives +*.gz +*.tar +*.tgz +*.rar +*.zip + +##### +# R +# https://github.com/github/gitignore/blob/main/R.gitignore + +# History files +.Rhistory +.Rapp.history + +# Session Data files +.RData +.RDataTmp + +# User-specific files +.Ruserdata + +# Example code in package build process +*-Ex.R + +# Output files from R CMD build +/*.tar.gz + +# Output files from R CMD check +/*.Rcheck/ + +# RStudio files +.Rproj.user/ + +# produced vignettes +vignettes/*.html +vignettes/*.pdf + +# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 +.httr-oauth + +# knitr and R markdown default cache directories +*_cache/ +/cache/ + +# Temporary files created by R markdown +*.utf8.md +*.knit.md + +# R Environment Variables +.Renviron + +# pkgdown site +docs/ + +# translation temp files +po/*~ + +# RStudio Connect folder +rsconnect/ From 4685a13234b293a6bf3693e2d4af8be7ba03983c Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 11:33:15 -0400 Subject: [PATCH 003/103] add git ignore from cfa-repo-template --- .gitignore | 298 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 294 insertions(+), 4 deletions(-) diff --git a/.gitignore b/.gitignore index 7cdb0e35..ad2d8439 100644 --- a/.gitignore +++ b/.gitignore @@ -1,7 +1,5 @@ - ##### - -# Exclude data and output file types +# Exclude many data and output file types by default # Data *.csv @@ -23,8 +21,8 @@ # Images *.bmp -*.jpeg *.jpg +*.jpeg *.gif *.pdf *.png @@ -36,6 +34,180 @@ *.rar *.zip +# Allowlist +# !your_data_file.csv +# !your_data_directory/ + + +##### +# Python +# https://github.com/github/gitignore/blob/main/Python.gitignore + +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ +cover/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +.pybuilder/ +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +# For a library or package, you might want to ignore these files since the +# code is intended to run in multiple environments; otherwise, check them in: +# .python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in +# version control. However, in case of collaboration, if having +# platform-specific dependencies or dependencies having no cross-platform +# support, pipenv may install dependencies that don't work, or not install all +# needed dependencies. +#Pipfile.lock + +# poetry +# Similar to Pipfile.lock, it is generally recommended to include poetry.lock +# in version control. This is especially recommended for binary packages to +# ensure reproducibility, and is more commonly ignored for libraries. +# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control +#poetry.lock + +# pdm +# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in +# version control. +#pdm.lock +# pdm stores project-wide configurations in .pdm.toml, but it is recommended +# to not include it in version control. +# https://pdm.fming.dev/#use-with-ide +.pdm.toml + +# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +# pytype static type analyzer +.pytype/ + +# Cython debug symbols +cython_debug/ + +# PyCharm +# JetBrains specific template is maintained in a separate JetBrains.gitignore +# that can be found at +# https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore +# and can be added to the global gitignore or merged into this file. +# For a more nuclear option (not recommended) you can uncomment the following +# to ignore the entire idea folder. +#.idea/ + ##### # R # https://github.com/github/gitignore/blob/main/R.gitignore @@ -89,3 +261,121 @@ po/*~ # RStudio Connect folder rsconnect/ + +##### +# Java +# https://github.com/github/gitignore/blob/main/Java.gitignore + +# Compiled class file +*.class + +# Log file +*.log + +# BlueJ files +*.ctxt + +# Mobile Tools for Java (J2ME) +.mtj.tmp/ + +# Package Files # +*.jar +*.war +*.nar +*.ear +*.zip +*.tar.gz +*.rar + +# virtual machine crash logs, see +# http://www.java.com/en/download/help/error_hotspot.xml +hs_err_pid* +replay_pid* + +##### +# C++ +# https://github.com/github/gitignore/blob/main/C%2B%2B.gitignore + +# Prerequisites +*.d + +# Compiled Object files +*.slo +*.lo +*.o +*.obj + +# Precompiled Headers +*.gch +*.pch + +# Compiled Dynamic libraries +*.so +*.dylib +*.dll + +# Fortran module files +*.mod +*.smod + +# Compiled Static libraries +*.lai +*.la +*.a +*.lib + +# Executables +*.exe +*.out +*.app + + +##### +# Rust +# https://github.com/github/gitignore/blob/main/Rust.gitignore + +# Generated by Cargo +# will have compiled files and executables +debug/ +target/ + +# Remove Cargo.lock from gitignore if creating an executable, leave it for +# libraries. More information here +# https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html +#Cargo.lock + +# These are backup files generated by rustfmt +**/*.rs.bk + +# MSVC Windows builds of rustc generate these, which store debugging information +*.pdb + +##### +# Julia +# https://github.com/github/gitignore/blob/main/Julia.gitignore + +# Files generated by invoking Julia with --code-coverage +*.jl.cov +*.jl.*.cov + +# Files generated by invoking Julia with --track-allocation +*.jl.mem + +# System-specific files and directories generated by the BinaryProvider and +# BinDeps packages. They contain absolute paths specific to the host computer, +# and so should not be committed +deps/deps.jl +deps/build.log +deps/downloads/ +deps/usr/ +deps/src/ + +# Build artifacts for creating documentation generated by the Documenter package +docs/build/ +docs/site/ + +# File generated by Pkg, the package manager, based on a corresponding +# Project.toml It records a fixed state of all packages used by the project. As +# such, it should not be committed for packages, but should be committed for +# applications that require a static environment. +#Manifest.toml From f66494ae6e6fb04ee673b514476bca9d1967518b Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 11:38:43 -0400 Subject: [PATCH 004/103] make .github folder mirror cfa-repo-template --- .github/ISSUE_TEMPLATE/bug_report.md | 8 +------- .../ISSUE_TEMPLATE/scientific-improvement.md | 17 +++++++++++++++++ {pre-commit => .github/actions}/action.yaml | 0 .../workflows}/pre-commit.yaml | 0 .vscode/settings.json | 8 -------- 5 files changed, 18 insertions(+), 15 deletions(-) create mode 100644 .github/ISSUE_TEMPLATE/scientific-improvement.md rename {pre-commit => .github/actions}/action.yaml (100%) rename {workflows => .github/workflows}/pre-commit.yaml (100%) delete mode 100644 .vscode/settings.json diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 747e8aa1..dd84ea78 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -8,10 +8,7 @@ assignees: '' --- **Describe the bug** -A clear and concise description of what feature is not working. - -**Impact** -Please describe the impact this bug is causing to your program or organization. +A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: @@ -26,9 +23,6 @@ A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. -**Logs** -If applicable, please attach logs to help describe your problem. - **Desktop (please complete the following information):** - OS: [e.g. iOS] - Browser [e.g. chrome, safari] diff --git a/.github/ISSUE_TEMPLATE/scientific-improvement.md b/.github/ISSUE_TEMPLATE/scientific-improvement.md new file mode 100644 index 00000000..4870c642 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/scientific-improvement.md @@ -0,0 +1,17 @@ +--- +name: Scientific improvement +about: Suggest a way to improve an existing tool or pipeline +title: '' +labels: '' +assignees: '' + +--- + +## Describe the improvement that needs to be made +(e.g. update a parameter estimate, tweak the prior, modify the model) + +## Provide links to references to methods or data sources + +## Describe the changes expected to the model's outputs + +## Suggest new tests that will need to be implemented diff --git a/pre-commit/action.yaml b/.github/actions/action.yaml similarity index 100% rename from pre-commit/action.yaml rename to .github/actions/action.yaml diff --git a/workflows/pre-commit.yaml b/.github/workflows/pre-commit.yaml similarity index 100% rename from workflows/pre-commit.yaml rename to .github/workflows/pre-commit.yaml diff --git a/.vscode/settings.json b/.vscode/settings.json deleted file mode 100644 index 6b502cd5..00000000 --- a/.vscode/settings.json +++ /dev/null @@ -1,8 +0,0 @@ -{ - "cSpell.words": [ - "ATSDR", - "CFPB", - "ISSO", - "cybersecurity" - ] -} \ No newline at end of file From d19fe05ead7f14c7301bd6cbb46ec2033f6d6a1c Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 11:43:15 -0400 Subject: [PATCH 005/103] remove attributes --- .gitattributes | 13 ------------- 1 file changed, 13 deletions(-) delete mode 100644 .gitattributes diff --git a/.gitattributes b/.gitattributes deleted file mode 100644 index dcdf6979..00000000 --- a/.gitattributes +++ /dev/null @@ -1,13 +0,0 @@ -# Normal text let sit to auto -*.htm text -*.html text -*.css text -*.js text - -## Declare files that will always have LF (aka \n aka 10 aka 0x0a) line endings on checkout. -*.sh text eol=lf -*.md text eol=lf -*.json text eol=lf -*.yml text eol=lf -*.csv text eol=lf - From eeb3fe60ad3d069fccd90c174677349e1d7e4abf Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 12:23:18 -0400 Subject: [PATCH 006/103] add bare bones componets of an R package --- .Rbuildignore | 2 ++ DESCRIPTION | 15 ++++++++++++++ NAMESPACE | 3 +++ R/utils.R | 53 ++++++++++++++++++++++++++++++++++++++++++++++++ man/add_pmfs.Rd | 42 ++++++++++++++++++++++++++++++++++++++ tests/testthat.R | 12 +++++++++++ 6 files changed, 127 insertions(+) create mode 100644 .Rbuildignore create mode 100644 DESCRIPTION create mode 100644 NAMESPACE create mode 100644 R/utils.R create mode 100644 man/add_pmfs.Rd create mode 100644 tests/testthat.R diff --git a/.Rbuildignore b/.Rbuildignore new file mode 100644 index 00000000..fc677c7b --- /dev/null +++ b/.Rbuildignore @@ -0,0 +1,2 @@ +^wwinference\.Rproj$ +^\.Rproj\.user$ diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 00000000..1d876a74 --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,15 @@ +Package: wwinference +Title: What the Package Does (One Line, Title Case) +Version: 0.0.0.9000 +Authors@R: + person("First", "Last", , "first.last@example.com", role = c("aut", "cre"), + comment = c(ORCID = "YOUR-ORCID-ID")) +Description: What the package does (one paragraph). +License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a + license +Encoding: UTF-8 +Roxygen: list(markdown = TRUE) +RoxygenNote: 7.3.1 +Suggests: + testthat (>= 3.0.0) +Config/testthat/edition: 3 diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 00000000..c42a4827 --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,3 @@ +# Generated by roxygen2: do not edit by hand + +export(add_pmfs) diff --git a/R/utils.R b/R/utils.R new file mode 100644 index 00000000..9e0ff95d --- /dev/null +++ b/R/utils.R @@ -0,0 +1,53 @@ +#' Add probability mass functions +#' +#' This function allows the addition of probability mass functions (PMFs) to +#' produce a new PMF. This is useful for example in the context of reporting +#' delays where the PMF of the sum of two Poisson distributions is the +#' convolution of the PMFs. +#' +#' This code was adapted from code written +#' (under an MIT license) as part of the `epinowcast` +#' package (https://github.com/epinowcast/epinowcast) +#' +#' @param pmfs A list of vectors describing the probability mass functions to +#' +#' @return A vector describing the probability mass function of the sum of the +#' +#' @export +#' @examples +#' # Sample and analytical PMFs for two Poisson distributions +#' x <- rpois(10000, 5) +#' xpmf <- dpois(0:20, 5) +#' y <- rpois(10000, 7) +#' ypmf <- dpois(0:20, 7) +#' # Add sampled Poisson distributions up to get combined distribution +#' z <- x + y +#' # Analytical convolution of PMFs +#' conv_pmf <- add_pmfs(list(xpmf, ypmf)) +#' conv_cdf <- cumsum(conv_pmf) +#' # Empirical convolution of PMFs +#' cdf <- ecdf(z)(0:42) +#' # Compare sampled and analytical CDFs +#' plot(conv_cdf) +#' lines(cdf, col = "black") +add_pmfs <- function(pmfs) { + d <- length(pmfs) + if (d == 1) { + return(pmfs[[1]]) + } + if (!is.list(pmfs)) { + return(pmfs) + } + # P(Z = z) = sum_over_x(P(X = x) * P(Y = z - x)) # nolint + return( + Reduce(x = pmfs, f = function(conv, pmf) { + lc <- length(conv) + wd <- seq_len(lc) - 1 + proc <- numeric(lc + length(pmf)) + for (j in seq_along(pmf)) { + proc[j + wd] <- proc[j + wd] + pmf[j] * conv + } + return(proc) + }) + ) +} diff --git a/man/add_pmfs.Rd b/man/add_pmfs.Rd new file mode 100644 index 00000000..544f2116 --- /dev/null +++ b/man/add_pmfs.Rd @@ -0,0 +1,42 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{add_pmfs} +\alias{add_pmfs} +\title{Add probability mass functions} +\usage{ +add_pmfs(pmfs) +} +\arguments{ +\item{pmfs}{A list of vectors describing the probability mass functions to} +} +\value{ +A vector describing the probability mass function of the sum of the +} +\description{ +This function allows the addition of probability mass functions (PMFs) to +produce a new PMF. This is useful for example in the context of reporting +delays where the PMF of the sum of two Poisson distributions is the +convolution of the PMFs. +} +\details{ +This code was adapted from code written +(under an MIT license) as part of the \code{epinowcast} +package (https://github.com/epinowcast/epinowcast) +} +\examples{ +# Sample and analytical PMFs for two Poisson distributions +x <- rpois(10000, 5) +xpmf <- dpois(0:20, 5) +y <- rpois(10000, 7) +ypmf <- dpois(0:20, 7) +# Add sampled Poisson distributions up to get combined distribution +z <- x + y +# Analytical convolution of PMFs +conv_pmf <- add_pmfs(list(xpmf, ypmf)) +conv_cdf <- cumsum(conv_pmf) +# Empirical convolution of PMFs +cdf <- ecdf(z)(0:42) +# Compare sampled and analytical CDFs +plot(conv_cdf) +lines(cdf, col = "black") +} diff --git a/tests/testthat.R b/tests/testthat.R new file mode 100644 index 00000000..713302a4 --- /dev/null +++ b/tests/testthat.R @@ -0,0 +1,12 @@ +# This file is part of the standard setup for testthat. +# It is recommended that you do not modify it. +# +# Where should you do additional test configuration? +# Learn more about the roles of various files in: +# * https://r-pkgs.org/testing-design.html#sec-tests-files-overview +# * https://testthat.r-lib.org/articles/special-files.html + +library(testthat) +library(wwinference) + +test_check("wwinference") From 1338587d27af310db042a97a97e9ee7e85f4eacc Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 12:27:58 -0400 Subject: [PATCH 007/103] update description --- DESCRIPTION | 49 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 44 insertions(+), 5 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 1d876a74..1d373c44 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,12 +1,51 @@ Package: wwinference Title: What the Package Does (One Line, Title Case) Version: 0.0.0.9000 -Authors@R: - person("First", "Last", , "first.last@example.com", role = c("aut", "cre"), - comment = c(ORCID = "YOUR-ORCID-ID")) +Authors@R: c( + person(given = "Kaitlyn", + family = "Johnson", + role = c("aut", "cre"), + email = "uox1@cdc.gov", + comment = c(ORCID = "0000-0001-8011-0012")), + person(given = "Sam", + family = "Abbott", + role = c("aut"), + email = "contact@samabbott.co.uk", + comment = c(ORCID = "0000-0001-8057-8037")), + person(given = "Zachary", + family = "Susswein", + role = c("aut"), + email = "utb2@cdc.gov"), + person(given = "Andrew", + family = "Magee", + role = c("aut"), + email = "rzg0@cdc.gov"), + person(given = "Dylan", + family = "Morris", + role = c("aut"), + email = "dylan@dylanhmorris.com", + comment = c(ORCID = "0000-0002-3655-406X")), + person(given = "Scott", + family = "Olesen", + role = c("aut"), + email = "ulp7@cdc.gov"), + person(given = "George", + family = "Vega Yon", + role = c("ctb"), + email = "g.vegayon@gmail.com", + comment = c(ORCID = "0000-0002-3171-0844")), + person(given = "Damon", + family = "Bayer", + role = c("aut"), + email = "xum8@cdc.gov") + ) Description: What the package does (one paragraph). -License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a - license +License: Apache License (>= 2) +URL: https://github.com/cdcgov/ww-inference-model/ +BugReports: https://github.com/cdcgov/ww-inference-model/issues/ +Depends: + R (>= 4.3.0) +SystemRequirements: CmdStan (>=2.35.0) Encoding: UTF-8 Roxygen: list(markdown = TRUE) RoxygenNote: 7.3.1 From 112ae919bbe821cf12d38f5168b4bd809dc5208e Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 12:38:20 -0400 Subject: [PATCH 008/103] add description of package --- DESCRIPTION | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 1d373c44..b2540a65 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: wwinference -Title: What the Package Does (One Line, Title Case) +Title: Jointly infers infection dynamics from wastewater data and epidemiological indicators Version: 0.0.0.9000 Authors@R: c( person(given = "Kaitlyn", @@ -39,7 +39,17 @@ Authors@R: c( role = c("aut"), email = "xum8@cdc.gov") ) -Description: What the package does (one paragraph). +Description: An implementation of a hierarchical semi-mechanistic renewal +approach jointly calibrating to multiple wastewater concentrations datasets from +subsets of a specified population and epidemioliogical indicators such as cases +or hospital admissions from the whole population. Our framework is an extension +of the widely used semi-mechanistic renewal framework {EpiNow2}, using a Bayesian +latent variable approach implemented in the probabilistic programming language +Stan. This package contains just the core components needed to fit these two +data sources and produce the following outputs-- estimated and forecasted +hospital admissions, estimated and forecasted wastewater concentrations, +global $R(t)$ estimates, local $R(t)$ estimates for the subpopulations +represented by each wastewater catchment area. License: Apache License (>= 2) URL: https://github.com/cdcgov/ww-inference-model/ BugReports: https://github.com/cdcgov/ww-inference-model/issues/ From 76d857480d8117db4cc429e9182257f01b44e29d Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Wed, 26 Jun 2024 12:39:26 -0400 Subject: [PATCH 009/103] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7a12c144..acc9a692 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# wwinference: joint inference and forecasting from wastewater and epidemiological indicators +# `wwinference`: joint inference and forecasting from wastewater and epidemiological indicators ## Overview From fcd6b152d3aae0842ca566474772fe000eb067b8 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 12:54:20 -0400 Subject: [PATCH 010/103] fix pre-commit --- .github/actions/{ => pre-commit}/action.yaml | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename .github/actions/{ => pre-commit}/action.yaml (100%) diff --git a/.github/actions/action.yaml b/.github/actions/pre-commit/action.yaml similarity index 100% rename from .github/actions/action.yaml rename to .github/actions/pre-commit/action.yaml From 620d6049568b95af25fcd8895e246d87a0658ec9 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 12:57:55 -0400 Subject: [PATCH 011/103] pre-commit on readme --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index acc9a692..d610a028 100644 --- a/README.md +++ b/README.md @@ -7,10 +7,10 @@ > `{wwinference}` estimates latent incident infections from wastewater concentration data and data on epidemiological indicators, with an initial assumed structure that the wastewater concentration data comes from subsets of the population contributing to the global epidemiological indicator data, such as hospital admissions. This model reflects the model currently used to produce forecasts of COVID-19 hospital admissions, of which the pipeline and model code is available at -the [wastewater-informed-covid-forecasting](https://github.com/CDCgov/wastewater-informed-covid-forecasting) repository. +the [wastewater-informed-covid-forecasting](https://github.com/CDCgov/wastewater-informed-covid-forecasting) repository. + +The intention is for {wwinference} to provide a user-friendly R-package interface for running forecasting models that use wastewater concentrations combined with other more traditional epidemiological signals such as cases or hospital admissions. -The intention is for {wwinference} to provide a user-friendly R-package interface for running forecasting models that use wastewater concentrations combined with other more traditional epidemiological signals such as cases or hospital admissions. - ## Public Domain Standard Notice This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in From d8167b6646a87dae1bd1935dd265e246b05bec70 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 13:09:49 -0400 Subject: [PATCH 012/103] attempt to set up pkgdown --- .Rbuildignore | 6 ++-- .github/.gitignore | 1 + .github/workflows/pkgdown.yaml | 50 ++++++++++++++++++++++++++++++++++ .gitignore | 1 + DESCRIPTION | 22 +++++++-------- _pkgdown.yml | 3 ++ 6 files changed, 70 insertions(+), 13 deletions(-) create mode 100644 .github/.gitignore create mode 100644 .github/workflows/pkgdown.yaml create mode 100644 _pkgdown.yml diff --git a/.Rbuildignore b/.Rbuildignore index fc677c7b..531e1155 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -1,2 +1,4 @@ -^wwinference\.Rproj$ -^\.Rproj\.user$ +^_pkgdown\.yml$ +^docs$ +^pkgdown$ +^\.github$ diff --git a/.github/.gitignore b/.github/.gitignore new file mode 100644 index 00000000..2d19fc76 --- /dev/null +++ b/.github/.gitignore @@ -0,0 +1 @@ +*.html diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml new file mode 100644 index 00000000..c9f0165d --- /dev/null +++ b/.github/workflows/pkgdown.yaml @@ -0,0 +1,50 @@ +# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples +# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help +on: + push: + branches: [main, master] + pull_request: + branches: [main, master] + release: + types: [published] + workflow_dispatch: + +name: pkgdown + +permissions: read-all + +jobs: + pkgdown: + runs-on: ubuntu-latest + # Only restrict concurrency for non-PR jobs + concurrency: + group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + permissions: + contents: write + steps: + - uses: actions/checkout@v4 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::pkgdown, local::. + needs: website + + - name: Build site + run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) + shell: Rscript {0} + + - name: Deploy to GitHub pages 🚀 + if: github.event_name != 'pull_request' + uses: JamesIves/github-pages-deploy-action@v4.5.0 + with: + clean: false + branch: gh-pages + folder: docs diff --git a/.gitignore b/.gitignore index ad2d8439..2998db10 100644 --- a/.gitignore +++ b/.gitignore @@ -379,3 +379,4 @@ docs/site/ # such, it should not be committed for packages, but should be committed for # applications that require a static environment. #Manifest.toml +docs diff --git a/DESCRIPTION b/DESCRIPTION index b2540a65..62bd51f5 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -40,18 +40,18 @@ Authors@R: c( email = "xum8@cdc.gov") ) Description: An implementation of a hierarchical semi-mechanistic renewal -approach jointly calibrating to multiple wastewater concentrations datasets from -subsets of a specified population and epidemioliogical indicators such as cases -or hospital admissions from the whole population. Our framework is an extension -of the widely used semi-mechanistic renewal framework {EpiNow2}, using a Bayesian -latent variable approach implemented in the probabilistic programming language -Stan. This package contains just the core components needed to fit these two -data sources and produce the following outputs-- estimated and forecasted -hospital admissions, estimated and forecasted wastewater concentrations, -global $R(t)$ estimates, local $R(t)$ estimates for the subpopulations -represented by each wastewater catchment area. + approach jointly calibrating to multiple wastewater concentrations datasets from + subsets of a specified population and epidemioliogical indicators such as cases + or hospital admissions from the whole population. Our framework is an extension + of the widely used semi-mechanistic renewal framework EpiNow2, using a Bayesian + latent variable approach implemented in the probabilistic programming language + Stan. This package contains just the core components needed to fit these two + data sources and produce the following outputs-- estimated and forecasted + hospital admissions, estimated and forecasted wastewater concentrations, + global R(t) estimates, local R(t) estimates for the subpopulations + represented by each wastewater catchment area. License: Apache License (>= 2) -URL: https://github.com/cdcgov/ww-inference-model/ +URL: https://github.com/cdcgov/ww-inference-model/, https://cdcgov.github.io/ww-inference-model/ BugReports: https://github.com/cdcgov/ww-inference-model/issues/ Depends: R (>= 4.3.0) diff --git a/_pkgdown.yml b/_pkgdown.yml new file mode 100644 index 00000000..13aa9bb2 --- /dev/null +++ b/_pkgdown.yml @@ -0,0 +1,3 @@ +url: https://cdcgov.github.io/ww-inference-model/ +template: + bootstrap: 5 From 178c1cf228c5d6ef15e777d382e7fd333cb9f4e0 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 26 Jun 2024 13:17:33 -0400 Subject: [PATCH 013/103] correct path to deps --- .github/workflows/pkgdown.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml index c9f0165d..d753eded 100644 --- a/.github/workflows/pkgdown.yaml +++ b/.github/workflows/pkgdown.yaml @@ -34,7 +34,7 @@ jobs: - uses: r-lib/actions/setup-r-dependencies@v2 with: - extra-packages: any::pkgdown, local::. + extra-packages: any::pkgdown, local::wwinference needs: website - name: Build site From 9d444e84138b96361f988ca825161a82f8364d71 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Wed, 26 Jun 2024 13:44:18 -0400 Subject: [PATCH 014/103] remove call to package --- .github/workflows/pkgdown.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml index d753eded..c9f0165d 100644 --- a/.github/workflows/pkgdown.yaml +++ b/.github/workflows/pkgdown.yaml @@ -34,7 +34,7 @@ jobs: - uses: r-lib/actions/setup-r-dependencies@v2 with: - extra-packages: any::pkgdown, local::wwinference + extra-packages: any::pkgdown, local::. needs: website - name: Build site From 4f8ff3a24ce054fab0822f5f64eeaedf6bc69a99 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Wed, 26 Jun 2024 13:47:31 -0400 Subject: [PATCH 015/103] Create CODEOWNERS --- .github/CODEOWNERS | 1 + 1 file changed, 1 insertion(+) create mode 100644 .github/CODEOWNERS diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 00000000..4d45862e --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1 @@ +*@kaitejohnson @dylanhmorris From 46f79c456077fb8d048644c492940566a9548b16 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Wed, 26 Jun 2024 13:49:18 -0400 Subject: [PATCH 016/103] Create CONTRIBUTING.md --- .github/CONTRIBUTING.md | 46 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) create mode 100644 .github/CONTRIBUTING.md diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 00000000..7233a280 --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,46 @@ +# Contributing to `wwinference` + +This outlines how to propose a change to `wwinference`. + +## Fixing typos + +You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the _source_ file. +This generally means you'll need to edit [roxygen2 comments](https://roxygen2.r-lib.org/articles/roxygen2.html) in an `.R`, not a `.Rd` file. +You can find the `.R` file that generates the `.Rd` by reading the comment in the first line. + +## Bigger changes + +If you want to make a bigger change, it's a good idea to first file an issue and make sure someone from the team agrees that it’s needed. +If you’ve found a bug, please file an issue that illustrates the bug with a minimal +[reprex](https://www.tidyverse.org/help/#reprex) (this will also help you write a unit test, if needed). +See our guide on [how to create a great issue](https://code-review.tidyverse.org/issues/) for more advice. + +### Pull request process + +* Fork the package and clone onto your computer. If you haven't done this before, we recommend using `usethis::create_from_github("CDCgov/ww-inference-model", fork = TRUE)`. + +* Install all development dependencies with `devtools::install_dev_deps()`, and then make sure the package passes R CMD check by running `devtools::check()`. + If R CMD check doesn't pass cleanly, it's a good idea to ask for help before continuing. +* Create a Git branch for your pull request (PR). We recommend using `usethis::pr_init("brief-description-of-change")`. + +* Make your changes, commit to git, and then create a PR by running `usethis::pr_push()`, and following the prompts in your browser. + The title of your PR should briefly describe the change. + The body of your PR should contain `Fixes #issue-number`. + +* For user-facing changes, add a bullet to the top of `NEWS.md` (i.e. just below the first header). Follow the style described in . + +### Code style + +* New code should follow the tidyverse [style guide](https://style.tidyverse.org). + You can use the [styler](https://CRAN.R-project.org/package=styler) package to apply these styles, but please don't restyle code that has nothing to do with your PR. + +* We use [roxygen2](https://cran.r-project.org/package=roxygen2), with [Markdown syntax](https://cran.r-project.org/web/packages/roxygen2/vignettes/rd-formatting.html), for documentation. + +* We use [testthat](https://cran.r-project.org/package=testthat) for unit tests. + Contributions with test cases included are easier to accept. + +## Code of Conduct + +Please note that the RtGam project is released with a +[Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this +project you agree to abide by its terms. From 45d949fd0e718b0cf8dacb7847eb4049a81f96ca Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Wed, 26 Jun 2024 13:51:20 -0400 Subject: [PATCH 017/103] Create NEWS.md --- NEWS.md | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 NEWS.md diff --git a/NEWS.md b/NEWS.md new file mode 100644 index 00000000..9558c013 --- /dev/null +++ b/NEWS.md @@ -0,0 +1,3 @@ +# wwinference + +This will serve as our change-log From 47364bde0f89c33cc95a6e732c7c7d8e834fa531 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Wed, 26 Jun 2024 13:52:57 -0400 Subject: [PATCH 018/103] Create SUPPORT.md --- .github/SUPPORT.md | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 .github/SUPPORT.md diff --git a/.github/SUPPORT.md b/.github/SUPPORT.md new file mode 100644 index 00000000..4ffb7945 --- /dev/null +++ b/.github/SUPPORT.md @@ -0,0 +1,20 @@ +# Getting help with wwinference + +Thanks for using wwinference! +Before filing an issue, there are a few places to explore and pieces to put together to make the process as smooth as possible. + +## Make a reprex + +Start by making a minimal **repr**oducible **ex**ample using the [reprex](https://reprex.tidyverse.org/) package. +If you haven't heard of or used reprex before, you're in for a treat! +Seriously, reprex will make all of your R-question-asking endeavors easier (which is a pretty insane ROI for the five to ten minutes it'll take you to learn what it's all about). +For additional reprex pointers, check out the [Get help!](https://www.tidyverse.org/help/) section of the tidyverse site. + +## Where to ask? + +[File an issue](https://github.com/CDCgov/ww-inference-model/issues/new)! + +Before opening a new issue, be sure to [search issues and pull requests](https://github.com/CDCgov/ww-inference-model/issues) to make sure the bug hasn't been reported and/or already fixed in the development version. +By default, the search will be pre-populated with `is:issue is:open`. +You can [edit the qualifiers](https://help.github.com/articles/searching-issues-and-pull-requests/) (e.g. `is:pr`, `is:closed`) as needed. +For example, you'd simply remove `is:open` to search _all_ issues in the repo, open or closed. From 9884e8cd32e09eb8de019b9a2763af02c6849f31 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 27 Jun 2024 10:19:27 -0400 Subject: [PATCH 019/103] add stan model and start of vignette --- inst/stan/functions/ar1.stan | 39 +++ inst/stan/functions/convolve.stan | 16 + inst/stan/functions/diff_ar1.stan | 23 ++ inst/stan/functions/expgamma_lpdf.stan | 25 ++ inst/stan/functions/hospitalization.stan | 29 ++ inst/stan/functions/infections.stan | 94 ++++++ inst/stan/functions/observation_model.stan | 37 +++ inst/stan/functions/utils.stan | 12 + inst/stan/simplewwinference.stan | 232 +++++++++++++ inst/stan/wwinference.stan | 370 +++++++++++++++++++++ vignettes/wwinference.Rmd | 74 +++++ 11 files changed, 951 insertions(+) create mode 100644 inst/stan/functions/ar1.stan create mode 100644 inst/stan/functions/convolve.stan create mode 100644 inst/stan/functions/diff_ar1.stan create mode 100644 inst/stan/functions/expgamma_lpdf.stan create mode 100644 inst/stan/functions/hospitalization.stan create mode 100644 inst/stan/functions/infections.stan create mode 100644 inst/stan/functions/observation_model.stan create mode 100644 inst/stan/functions/utils.stan create mode 100644 inst/stan/simplewwinference.stan create mode 100644 inst/stan/wwinference.stan create mode 100644 vignettes/wwinference.Rmd diff --git a/inst/stan/functions/ar1.stan b/inst/stan/functions/ar1.stan new file mode 100644 index 00000000..3f0a277d --- /dev/null +++ b/inst/stan/functions/ar1.stan @@ -0,0 +1,39 @@ +/** + * Assembles an AR(1) process out of its non-centered, non-scaled components. + * @param mu vector of the mean value that the process preserves to + * @param ac The autocorrelation coefficient. + * @param sd The sd for the white noise/IID normal terms. + * @param z vector of IID Normal(0,1) variables, the white noise/increment terms. + * @param is_stat int serving as logical, should process be initialized at + * stationary variance (1) or not (0)? Not valid for |ac| >= 1. + * @return A vector, the values of the mean-preserving AR(1) process: + * x[1] = x0, + * x[i > 1] = mu + tvd[[i], where + * tvd[i] = ac * tvd[i - 1] + sd* z[i] + * Formally x(t) = mu(t) + delta(t) where + * delta(t) = psi*delta(t-1) + eta(t) + */ +vector ar1(vector mu, real ac, real sd, vector z, int is_stat) { + int n = num_elements(z); + vector[n] eta; + vector[n] x; + vector[n] tvd; + + eta = sd * z; + if(is_stat) { + real adj; + if (ac >= 1.0) { + reject("AR(1) process is not stationary if ac >= 1."); + } + adj = 1.0 / sqrt(1.0 - ac^2); + eta[1] = adj * eta[1]; + } + + tvd[1] = eta[1]; + for (t in 2:n) { + tvd[t] = ac * tvd[t - 1] + eta[t]; + } + + x = mu + tvd; + return x; +} diff --git a/inst/stan/functions/convolve.stan b/inst/stan/functions/convolve.stan new file mode 100644 index 00000000..eb714f40 --- /dev/null +++ b/inst/stan/functions/convolve.stan @@ -0,0 +1,16 @@ +// This code was adapted from code written +// (under an MIT license) as part of the `EpiNow2` +// package (https://github.com/epiforecasts/EpiNow2) + +// convolve two vectors as a backwards dot product +// y vector shoud be reversed +// limited to the length of x and backwards looking for x indexes +// designed for use convolve a case vector and a delay pmf +vector convolve_dot_product(vector x, vector y, int len) { + int ylen = num_elements(y); + vector[len] z; + for (s in 1 : len) { + z[s] = dot_product(x[max(1, s - ylen + 1) : s], tail(y, min(ylen, s))); + } + return z; +} diff --git a/inst/stan/functions/diff_ar1.stan b/inst/stan/functions/diff_ar1.stan new file mode 100644 index 00000000..b14c39bb --- /dev/null +++ b/inst/stan/functions/diff_ar1.stan @@ -0,0 +1,23 @@ +/** + * Assembles a differenced AR(1) process out of its non-centered, non-scaled + * components. + * @param x0 The initial value, or intercept. + * @param ar The autocorrelation coefficient. + * @param sd The standard deviation for the white noise/IID normal terms. + * @param z Vector of IID Normal(0,1) variables, the white noise/increment terms. + * @param is_stat int serving as logical. Passed as the `is_stat` + * argument to function [ar1()]. Should the underlying AR(1) process on + * the first differences be initialized at stationary variance (1) + * or not (0)? Not valid for |ar| >= 1. + * + * @return A vector representing the values of the differenced + * AR(1) process. + */ +vector diff_ar1(real x0, real ar, real sd, vector z, int is_stat) { + int n = num_elements(z) + 1; + vector[n] diffs; + + diffs[1] = x0; + diffs[2:n] = ar1(rep_vector(0.0, n-1), ar, sd, z, is_stat); + return cumulative_sum(diffs); +} diff --git a/inst/stan/functions/expgamma_lpdf.stan b/inst/stan/functions/expgamma_lpdf.stan new file mode 100644 index 00000000..44ab18c7 --- /dev/null +++ b/inst/stan/functions/expgamma_lpdf.stan @@ -0,0 +1,25 @@ +real expgamma_lpdf(vector x, real shape_k, real scale_theta){ + + return(sum( + -shape_k * log(scale_theta) - + lgamma(shape_k) + + shape_k * x - (exp(x) / scale_theta))); +} + +real expgamma_lpdf(real x, real shape_k, real scale_theta){ + + return( + -shape_k * log(scale_theta) - + lgamma(shape_k) + + shape_k * x - (exp(x) / scale_theta)); +} + +real expgamma_lpdf(vector xs, + vector shapes_k, + vector scales_theta){ + + return(sum( + -shapes_k .* log(scales_theta) - + lgamma(shapes_k) + + shapes_k .* xs - (exp(xs) ./ scales_theta))); +} diff --git a/inst/stan/functions/hospitalization.stan b/inst/stan/functions/hospitalization.stan new file mode 100644 index 00000000..4cf4db5d --- /dev/null +++ b/inst/stan/functions/hospitalization.stan @@ -0,0 +1,29 @@ +/** + * Assembles daily hospitalization probability vector from a weekly random walk. + * Uses non-centered, non-scaled differences, initial value, and SD for the walk. + * Multiplies by matrix to transform to daily. + * + * @param p_hosp_m matrix to handle conversion from weekly to daily + * @param p_hosp_int intercept/first value, on unconstrained scale + * @param p_hosp_w_sd, scaling factor for/SD of random walk + * @param autoreg_p_hosp, autoreg parameter for IHR + * @param p_hosp_w weekly deviations of the random walk, on unconstrained scale + * @param tot_weeks, number of total weeks in calibration and forecast period + * @param is_stat, whether or not AR(1) process starts at stationarity + * @return A vector, daily probabilities of hospitalization + */ +vector assemble_p_hosp(matrix p_hosp_m, real p_hosp_mean, real p_hosp_w_sd, + real autoreg_p_hosp, + vector p_hosp_w, int tot_weeks, int is_stat) { + vector[dims(p_hosp_m)[1]] p_hosp; + vector[tot_weeks] p_hosp_in_weeks; + + p_hosp_in_weeks = ar1(rep_vector(p_hosp_mean, tot_weeks), + autoreg_p_hosp, + p_hosp_w_sd, + p_hosp_w, + is_stat); + p_hosp = p_hosp_m * p_hosp_in_weeks; + p_hosp = inv_logit(p_hosp); + return p_hosp; +} diff --git a/inst/stan/functions/infections.stan b/inst/stan/functions/infections.stan new file mode 100644 index 00000000..29e24d20 --- /dev/null +++ b/inst/stan/functions/infections.stan @@ -0,0 +1,94 @@ +// This code was adapted from code written +// (under an MIT license) as part of the `EpiNow2` +// package (https://github.com/epiforecasts/EpiNow2) +// calculate infectiousness (weighted sum of the generation interval and incident infections) +real update_infectiousness(vector infections, vector gt_rev_pmf, + int seeding_time, int index) { + int gt_max = num_elements(gt_rev_pmf); + // work out where to start the convolution of past infections with the + // generation time distribution: (current_time - maximal generation time) if + // that is >= 1, otherwise 1 (how far back to add infectiousness from) + int inf_start = max(1, index + seeding_time - gt_max); + // work out where to end the convolution: (current_time - 1) + int inf_end = index + seeding_time - 1; + // number of indices of the generation time to sum over (inf_end - inf_start + 1) + // Either go all the way back or just go to where we start seeing infections + int pmf_accessed = min(gt_max, index + seeding_time - 1); + // calculate the elements of the convolution + real new_inf = dot_product(infections[inf_start : inf_end], + tail(gt_rev_pmf, pmf_accessed) + // Last elements of reversed generation interval = first elemenets of generation interval! + ); + return new_inf; +} + +/** + * Computes the number of infections over time by updating the reproduction + * number based on the effective infectiousness from previous days and + * feedback from incidence data. Adapted from the EpiNow2 package. + * + * @param obs_r A vector of length `ot` representing the observed reproduction + * numbers over a certain period. + * + * @param uot An integer representing the number of unobserved time steps + * prior to the first observed value in `obs_r`. + * + * @param gt_rev_pmf A vector representing the reversed generation time + * probability mass function. + * + * @param initial_infections An array of real numbers representing the initial + * number of infections (in log scale) at the start of the unobserved period. + * + * @param initial_growth A real number representing the initial + * growth rate of infections (in log scale) over the unobserved period. + * + * @param ht An integer representing the time horizon for the historical + * tracking of infections. + * + * @param infection_feedback An optional array of real numbers providing + * feedback to adjust the reproduction number based on recent incidence data. + * + * @param infection_feedback_pmf_rev A vector representing the + * reversed probability mass function for the infection feedback. + * + * @return A tuple containing a vector of length `t` representing the number + * of infections for each time step, combining both unobserved and observed + * periods and a vector the same length as the input obs_r representing the + * effective reproduction number at each time step. + * @author Sam Abbott + */ +tuple(vector, vector) generate_infections(vector obs_r, int uot, vector gt_rev_pmf, + real initial_infections, real initial_growth, + int ht, real infection_feedback, vector infection_feedback_pmf_rev +) { + // time indices and storage + int at = num_elements(obs_r); // all time with R(t), including horizon time + int t = at + uot; // uot + ot + Ht + vector[at] rt = obs_r; + vector[t] infections = rep_vector(0, t); + real infectiousness; + real infection_feedback_weighting; + tuple(vector[t], vector[at]) output; + + // Initialise infections using daily growth + infections[1] = initial_infections; + if (uot > 1) { + vector[uot-1] growth = rep_vector(initial_growth, uot-1); + infections[2:uot] = initial_infections + cumulative_sum(growth); + } + infections[1:uot] = exp(infections[1:uot]); + // iteratively update infections + for (s in 1:at) { + infectiousness = update_infectiousness(infections, gt_rev_pmf, uot, s); + infection_feedback_weighting = update_infectiousness( + infections, infection_feedback_pmf_rev, uot, s + ); + rt[s] = exp(log(rt[s]) - infection_feedback .* infection_feedback_weighting); + infections[s + uot] = rt[s] * infectiousness; + } + + // Assign tuple output + output.1 = infections; + output.2 = rt; + return(output); +} diff --git a/inst/stan/functions/observation_model.stan b/inst/stan/functions/observation_model.stan new file mode 100644 index 00000000..fb659881 --- /dev/null +++ b/inst/stan/functions/observation_model.stan @@ -0,0 +1,37 @@ +// This code was adapted from code written +// (under an MIT license) as part of the `EpiNow2` +// package (https://github.com/epiforecasts/EpiNow2) +vector day_of_week_effect(vector reports, array[] int day_of_week, + vector effect) { + int t = num_elements(reports); + int wl = num_elements(effect); + // scale day of week effect + vector[wl] scaled_effect = wl * effect; + vector[t] scaled_reports; + for (s in 1:t) { + // add reporting effects (adjust for simplex scale) + scaled_reports[s] = reports[s] * scaled_effect[day_of_week[s]]; + } + return scaled_reports; +} + +vector get_vl_trajectory(real tpeak, real viral_peak, + real duration_shedding, int n) { + vector[n] s; + real growth = viral_peak / tpeak; + real wane = viral_peak / (duration_shedding - tpeak); + + for (t in 1 : n) { + if (t <= tpeak) { + s[t] = pow(10, growth * t); + } else { + s[t] = viral_peak + wane * tpeak - wane * t; + if (s[t] < 0) { + s[t] = 0; + } + s[t] = pow(10, s[t]); + } + } + s = s / sum(s); + return s; +} diff --git a/inst/stan/functions/utils.stan b/inst/stan/functions/utils.stan new file mode 100644 index 00000000..e8409302 --- /dev/null +++ b/inst/stan/functions/utils.stan @@ -0,0 +1,12 @@ +// Functions to convert natural scale means to corresponding lognormal means +real convert_to_logmean(real mean, real sd) { + real logmean; + logmean = log(mean ^ 2 / sqrt(sd ^ 2 + mean ^ 2)); + return logmean; +} + +real convert_to_logsd(real mean, real sd) { + real logsd; + logsd = sqrt(log(1 + (sd ^ 2 / mean ^ 2))); + return logsd; +} diff --git a/inst/stan/simplewwinference.stan b/inst/stan/simplewwinference.stan new file mode 100644 index 00000000..9ff0fb27 --- /dev/null +++ b/inst/stan/simplewwinference.stan @@ -0,0 +1,232 @@ +functions { +#include functions/ar1.stan +#include functions/diff_ar1.stan +#include functions/convolve.stan +#include functions/infections.stan +#include functions/hospitalization.stan +#include functions/observation_model.stan +#include functions/utils.stan +} +// end functions + +// The fixed input data +data { + int gt_max; + int hosp_delay_max; + vector[hosp_delay_max] inf_to_hosp; + int dur_inf; // duration people are infectious (number of days) + real mwpd; // mL of WW produced per person per day + int if_l; // length of infection feedback pmf + vector[if_l] infection_feedback_pmf; // infection feedback pmf + int ot; // maximum time index for the hospital admissions (max number of days we could have observations) + int oht; // number of days that we have hospital admissions observations + int owt; // number of days of observed WW (should be roughly ot/7) + int uot; // unobserved time before we observe hospital admissions/ WW + int ht; // horizon time (nowcast + forecast time) + int n_weeks; // number of weeks for weekly random walk on R(t) + matrix[ot + ht, n_weeks] ind_m; // matrix needed to transform R(t) from weekly to daily + int tot_weeks; // number of weeks for the weekly random walk on IHR (includes unobserved time) + matrix[uot + ot + ht, tot_weeks] p_hosp_m ; // matrix needed to convert p_hosp RW from weekly to daily + vector[gt_max] generation_interval; // generation interval distribution + real n; // population size + array[owt] int ww_sampled_times; // the days on which WW is sampled relative + // to the days with which hospital admissions observed + array[oht] int hosp_times; // the days on which hospital admissions are observed + array[oht] int hosp; // observed hospital admissions + array[ot + ht] int day_of_week; // integer vector with 1-7 corresponding to the weekday + vector[owt] log_conc; // log(genome copies/mL) + int compute_likelihood; // 1= use data to compute likelihood + int include_ww; // 1= include wastewater data in likelihood calculation + int include_hosp; // 1 = fit to hosp, 0 = only fit wastewater model + + // Priors + vector[6] viral_shedding_pars; // tpeak, viral peak, shedding duration mean and sd + real autoreg_rt_a; + real autoreg_rt_b; + real autoreg_p_hosp_a; + real autoreg_p_hosp_b; + real inv_sqrt_phi_prior_mean; + real inv_sqrt_phi_prior_sd; + real r_prior_mean; + real r_prior_sd; + real log10_g_prior_mean; + real log10_g_prior_sd; + real i0_over_n_prior_a; + real i0_over_n_prior_b; + real wday_effect_prior_mean; + real wday_effect_prior_sd; + real initial_growth_prior_mean; + real initial_growth_prior_sd; + real sigma_ww_prior_mean; + real eta_sd_sd; + real p_hosp_prior_mean; + real p_hosp_sd_logit; + real p_hosp_w_sd_sd; + real inf_feedback_prior_logmean; + real inf_feedback_prior_logsd; +} + +transformed data { + // viral shedding parameters + real t_peak_mean = viral_shedding_pars[1]; + real t_peak_sd = viral_shedding_pars[2]; + real viral_peak_mean = viral_shedding_pars[3]; + real viral_peak_sd = viral_shedding_pars[4]; + real dur_shed_mean = viral_shedding_pars[5]; + real dur_shed_sd = viral_shedding_pars[6]; + // natural scale -> lognormal parameters + // https://en.wikipedia.org/wiki/Log-normal_distribution + real r_logmean = convert_to_logmean(r_prior_mean, r_prior_sd); + real r_logsd = convert_to_logsd(r_prior_mean, r_prior_sd); + // reversed generation interval + vector[gt_max] gt_rev_pmf = reverse(generation_interval); + vector[if_l] infection_feedback_rev_pmf = reverse(infection_feedback_pmf); +} + +// The parameters accepted by the model. +parameters { + vector[n_weeks-1] w; // weekly random walk + real eta_sd; // step size of random walk + real autoreg_rt;// coefficient on AR process in R(t) + array [(include_ww==1) ? 1 : 0] real autoreg_p_hosp; + real log_r; // baseline reproduction number estimate (log) + real i0_over_n; // Per capita incident infections + // on day -uot before first observation day + real initial_growth; // initial growth from I0 to first observed time + real inv_sqrt_phi_h; + real sigma_ww; + real p_hosp_mean; // Estimated IHR + vector[(include_ww==1) ? tot_weeks : 0] p_hosp_w; + array [(include_ww==1) ? 1 : 0] real p_hosp_w_sd; + real t_peak; // time to viral load peak in shedding + real viral_peak; // log10 peak viral load shed /mL + real dur_shed; // duration of detectable viral shedding + real log10_g; // log10 of number of genomes per infected individual + simplex[7] hosp_wday_effect; // day of week reporting effect, sums to 1 + real infection_feedback; // infection feedback + +} + +transformed parameters { + vector[ot + uot + ht] new_i; // daily incident infections /n + vector [(include_ww ==1) ? ot + uot + ht: 1] p_hosp; // probability of hospitalization + vector[(include_ww == 1) ? tot_weeks-1: 0] p_hosp_in_weeks; // the weekly vector of probability of hospital admissions + vector[ot + uot + ht] model_hosp; // model estimated hospital admissions + vector[oht] exp_obs_hosp; // expected observed hospital admissions + vector[ot] exp_obs_hosp_no_wday_effect; // expected observed hospital admissions before weekday effect + vector[gt_max] s; // viral kinetics trajectory (normalized) + vector[ot + uot + ht] model_log_v; // model estimated log viral genomes shed per person + vector[ot+ht] model_log_v_ot; // model estimated log viral genomes shed per person during ot + vector[owt] exp_obs_log_v; // model estimated log viral genomes shed per person at ww sampled times only + vector[ot + uot + ht] model_net_i; // number of net infected individuals shedding on each day (sum of individuals in dift stages of infection) + real phi_h = inv_square(inv_sqrt_phi_h); // previouslt inv_square(inv_sqrt_phi_h) + vector[ot + ht] unadj_r; // R(t) + vector[ot + ht] rt; // R(t) + real i0 = i0_over_n * n; // Absolute initial incident infections + vector[n_weeks] log_rt_weeks; // log R(t) in weeks for autocorrelated RW + + + // AR + RW implementation: + log_rt_weeks = diff_ar1(log_r, autoreg_rt, eta_sd, w, 0); + unadj_r = ind_m * log_rt_weeks; + unadj_r = exp(unadj_r); + + // Expected daily number of new infections (per capita), using EpiNow2 assumptions re pre and post observation time + // Using pop = 1 so that damping is normalized to per capita + (new_i, rt) = generate_infections( + unadj_r, + uot, + gt_rev_pmf, + log(i0_over_n), + initial_growth, + ht, + infection_feedback, + infection_feedback_rev_pmf); + + // Expected hospitalizations: + // generates all hospitalizations, across unobserved time, observed time, and forecast time + if(include_ww==1){ + p_hosp = assemble_p_hosp(p_hosp_m, p_hosp_mean, p_hosp_w_sd[1], + autoreg_p_hosp[1], p_hosp_w, tot_weeks, 1); + model_hosp = convolve_dot_product(p_hosp .* new_i, reverse(inf_to_hosp), + ot + uot + ht); + }else{ + p_hosp[1] = inv_logit(p_hosp_mean); + // generates all hospitalizations, across unobserved time, observed time, and forecast time + model_hosp = convolve_dot_product(p_hosp[1] * new_i, reverse(inf_to_hosp), + ot + uot + ht); + } + + + + // just get the expected observed hospitalizations + exp_obs_hosp_no_wday_effect = model_hosp[uot + 1 : uot + ot]; + // apply the weekday effect so these are distributed with fewer admits on Sat & Sun + // multiply by n because data must be integer so need this to be in actual numbers not proportions + exp_obs_hosp = n * day_of_week_effect(exp_obs_hosp_no_wday_effect[hosp_times], + day_of_week[hosp_times], hosp_wday_effect); + + // Expected shed viral genomes: + // Shedding kinetics trajectory + s = get_vl_trajectory(t_peak, viral_peak, dur_shed, gt_max); + + // This should also be a convolution of incident infections and shedding kinetics pmf times avg total virus shed + model_net_i = convolve_dot_product(new_i, reverse(s), uot + ot + ht); // net number of infected individuals + // log number of viral genomes shed on a given day = net infected individuals * amount shed per individual + model_log_v = log(10)*log10_g + log(model_net_i + 1e-8); // adding for numerical stability + // genome copies/mL = genome copies/(person * mL of WW per person day) + model_log_v_ot = model_log_v[(uot + 1) : (uot + ot + ht)] - log(mwpd); + exp_obs_log_v = model_log_v_ot[ww_sampled_times]; +} + +// Prior and sampling distribution +model { + // priors + vector[7] effect_mean = rep_vector(wday_effect_prior_mean, 7); + w ~ std_normal(); + eta_sd ~ normal(0, eta_sd_sd); + autoreg_rt ~ beta(autoreg_rt_a, autoreg_rt_b); + autoreg_p_hosp ~ beta(autoreg_p_hosp_a, autoreg_p_hosp_b); + log_r ~ normal(r_logmean, r_logsd); + i0_over_n ~ beta(i0_over_n_prior_a, i0_over_n_prior_b); + initial_growth ~ normal(initial_growth_prior_mean, initial_growth_prior_sd); + inv_sqrt_phi_h ~ normal(inv_sqrt_phi_prior_mean, inv_sqrt_phi_prior_sd); + sigma_ww ~ normal(0, sigma_ww_prior_mean); + log10_g ~ normal(log10_g_prior_mean, log10_g_prior_sd); + hosp_wday_effect ~ normal(effect_mean, wday_effect_prior_sd); + p_hosp_mean ~ normal(logit(p_hosp_prior_mean), p_hosp_sd_logit); + p_hosp_w ~ std_normal(); + p_hosp_w_sd ~ normal(0, p_hosp_w_sd_sd); + t_peak ~ normal(t_peak_mean, t_peak_sd); + viral_peak ~ normal(viral_peak_mean, viral_peak_sd); + dur_shed ~ normal(dur_shed_mean, dur_shed_sd); + infection_feedback ~ lognormal(inf_feedback_prior_logmean, inf_feedback_prior_logsd); + + // Compute log likelihood + if (compute_likelihood == 1) { + if (include_ww == 1) { + log_conc ~ normal(exp_obs_log_v, sigma_ww); + } + + if (include_hosp == 1) { + hosp ~ neg_binomial_2(exp_obs_hosp, phi_h); + } + } // end if for computing log likelihood +} + +generated quantities { + array[ot + ht] real pred_hosp; + array[ot + ht] real pred_new_i; + array[ot + ht] real pred_conc; + vector[ot + ht] exp_state_ww_conc = exp(model_log_v_ot); // state mean wastewater concentration + real g = pow(10, log10_g); + + pred_hosp = neg_binomial_2_rng(n * day_of_week_effect(model_hosp[uot + 1 : + uot + ot + ht], + day_of_week, + hosp_wday_effect), + phi_h); + pred_new_i = neg_binomial_2_rng(n * new_i[uot + 1 : uot + ot + ht], phi_h); + + pred_conc = normal_rng(model_log_v_ot, sigma_ww); +} diff --git a/inst/stan/wwinference.stan b/inst/stan/wwinference.stan new file mode 100644 index 00000000..e07ef35a --- /dev/null +++ b/inst/stan/wwinference.stan @@ -0,0 +1,370 @@ +functions { +#include functions/ar1.stan +#include functions/diff_ar1.stan +#include functions/convolve.stan +#include functions/hospitalization.stan +#include functions/infections.stan +#include functions/observation_model.stan +#include functions/utils.stan + +} + +// The fixed input data +data { + int gt_max; + int hosp_delay_max; + vector[hosp_delay_max] inf_to_hosp; // delay distribution from infecion to hospital admission + real mwpd; // mL of ww produced per person per day + int if_l; // length of infection feedback pmf + vector[if_l] infection_feedback_pmf; // infection feedback pmf + int ot; // maximum time index for the hospital admissions (max number of days we could have observations) + int oht; // number of days that we have hospital admissions observations + int n_subpops; // number of WW sites + int n_ww_lab_sites; // number of unique ww-lab combos + int n_censored; // numer of observed WW data points that are below the LOD + int n_uncensored; //number not below LOD + int owt; // number of days of observed WW (should be roughly ot/7) + int uot; // unobserved time before we observe hospital admissions/ WW + int ht; // horizon time (nowcast + forecast time) + int n_weeks; // number of weeks for weekly random walk on R(t) + matrix [ot+ht, n_weeks] ind_m; // matrix to convert R(t) from weekly to daily + int tot_weeks; // number of weeks for the weekly random walk on IHR (includes unobserved time) + matrix [uot+ot+ht, tot_weeks] p_hosp_m; // matrix to convert p_hosp from weekly to daily + vector[gt_max] generation_interval; // generation interval distribution + real state_pop; // population size + vector[n_subpops] subpop_size; // the population sizes for each subpopulation + real norm_pop; + array[owt] int ww_sampled_times; // a list of all of the days on which WW is sampled + // will be mapped to the corresponding sites (ww_sampled_sites) + array[oht] int hosp_times; // the days on which hospital admissions are observed + array[owt] int ww_sampled_sites; // vector of unique sites in order of the sampled times + array[owt] int ww_sampled_lab_sites; // vector of unique lab-site combos i + // n order of the sampled times + array[n_censored] int ww_censored; // times that the WW data is below the LOD + array[n_uncensored] int ww_uncensored; // time that WW data is above LOD + vector[owt] ww_log_lod; // The limit of detection in that site at that time point + array[n_ww_lab_sites] int lab_site_to_site_map; // which lab sites correspond to which sites + array[oht] int hosp; // observed hospital admissions + array[ot + ht] int day_of_week; // integer vector with 1-7 corresponding to the weekday + vector[owt] log_conc; // observed concentration of viral genomes in WW + int compute_likelihood; // 1= use data to compute likelihood + int include_ww; // 1= include wastewater data in likelihood calculation + int include_hosp; // 1 = fit to hosp, 0 = only fit wastewater model + + // Priors + vector[6] viral_shedding_pars;// tpeak, viral peak, shedding duration mean and sd + real autoreg_rt_a; + real autoreg_rt_b; + real autoreg_rt_site_a; + real autoreg_rt_site_b; + real autoreg_p_hosp_a; + real autoreg_p_hosp_b; + real inv_sqrt_phi_prior_mean; + real inv_sqrt_phi_prior_sd; + real r_prior_mean; + real r_prior_sd; + real log10_g_prior_mean; + real log10_g_prior_sd; + real i0_over_n_prior_a; + real i0_over_n_prior_b; + real sigma_i0_prior_mode; + real sigma_i0_prior_sd; + real wday_effect_prior_mean; + real wday_effect_prior_sd; + real initial_growth_prior_mean; + real initial_growth_prior_sd; + real sigma_ww_site_prior_mean_mean; + real sigma_ww_site_prior_mean_sd; + real sigma_ww_site_prior_sd_mean; + real sigma_ww_site_prior_sd_sd; + real eta_sd_sd; + real p_hosp_prior_mean; + real p_hosp_sd_logit; + real p_hosp_w_sd_sd; + real ww_site_mod_sd_sd; + real sigma_rt_prior; + real log_phi_g_prior_mean; + real log_phi_g_prior_sd; + real inf_feedback_prior_logmean; + real inf_feedback_prior_logsd; +} + +// The transformed data +transformed data { + // viral shedding parameters + real t_peak_mean = viral_shedding_pars[1]; + real t_peak_sd = viral_shedding_pars[2]; + real viral_peak_mean = viral_shedding_pars[3]; + real viral_peak_sd = viral_shedding_pars[4]; + real dur_shed_mean = viral_shedding_pars[5]; + real dur_shed_sd = viral_shedding_pars[6]; + + // natural scale -> lognormal parameters + // https://en.wikipedia.org/wiki/Log-normal_distribution + real r_logmean = convert_to_logmean(r_prior_mean, r_prior_sd); + real r_logsd = convert_to_logsd(r_prior_mean, r_prior_sd); + // reversed generation interval + vector[gt_max] gt_rev_pmf = reverse(generation_interval); + vector[if_l] infection_feedback_rev_pmf = reverse(infection_feedback_pmf); +} + +// The parameters accepted by the model. +parameters { + vector[n_weeks-1] w; // weekly random walk of state-level mean baseline R(t) (log scale) + real eta_sd; + real autoreg_rt;// coefficient on AR process in R(t) + real log_r_mu_intercept; // state-level mean baseline reproduction number estimate (log) at t=0 + real sigma_rt; // magnitude of site level variation from state level + real autoreg_rt_site; + real autoreg_p_hosp; + matrix[n_subpops, n_weeks] error_site; // matrix of subpopulations + real i0_over_n; // initial per capita + // infection incidence + vector[n_subpops] eta_i0; // z-score on logit scale of state + // initial per capita infection incidence relative to state value + real sigma_i0; // stdev between logit state and site initial + // per capita infection incidence + vector[n_subpops] eta_growth; + real sigma_growth; + real initial_growth; // initial growth from I0 to first observed time + real inv_sqrt_phi_h; + real sigma_ww_site_mean; //mean of site level stdev + real sigma_ww_site_sd; // stdev of site level stdev + vector[n_ww_lab_sites]sigma_ww_site_raw; // let each lab-site combo have its own observation error + real p_hosp_mean; // Estimated mean IHR + vector[tot_weeks] p_hosp_w; // weekly random walk for IHR + real p_hosp_w_sd; // Estimated IHR sd + real t_peak; // time to viral load peak in shedding + real viral_peak; // log10 peak viral load shed /mL + real dur_shed; // duration of detectable viral shedding + real log10_g; // mean log10 of number of genomes per infected individual + vector[n_ww_lab_sites] ww_site_mod_raw; // lab-site specific WW modifier on the observation error + real ww_site_mod_sd; // site specific WW modifier stdev + // for now assumes the same across sites, can change or throw into + // observation error + simplex[7] hosp_wday_effect; // day of week reporting effect, sums to 1 + real infection_feedback; // infection feedback + +} +// +transformed parameters { + vector[ot + uot + ht] p_hosp; // probability of hospitalization + vector[ot + uot + ht] model_hosp_per_capita; // model estimated hospital admissions per capita + vector[oht] exp_obs_hosp; // expected observed hospital admissions + vector[ot] exp_obs_hosp_per_capita_no_wday_effect; // expected observed hospital admissions per capita before weekday effect + vector[gt_max] s; // viral kinetics trajectory (normalized) + vector[owt] exp_obs_log_v_true = rep_vector(0, owt); // expected observations at each site in log scale + vector[owt] exp_obs_log_v = rep_vector(0, owt); // expected observations at each site with modifier in log scale + vector[n_ww_lab_sites] ww_site_mod; // site specific WW mod + row_vector [ot + uot + ht] model_net_i; // number of net infected individuals shedding on each day (sum of individuals in dift stages of infection) + real phi_h = inv_square(inv_sqrt_phi_h); + vector[n_ww_lab_sites] sigma_ww_site; + vector[n_weeks] log_r_mu_t_in_weeks; // log of state level mean R(t) in weeks + vector[n_weeks] log_r_site_t_in_weeks; // log of site level mean R(t) in weeks, used as a placeholder in loop + vector[ot + ht] unadj_r; // state level R(t) before damping + matrix[n_subpops, ot+ht] r_site_t; // site_level R(t) + row_vector[ot + ht] unadj_r_site_t; // site_level R(t) before damping + row_vector[ot + uot + ht] new_i_site; // site level incident infections per capita + real pop_fraction; // proportion of state population that the subpopulation represents + vector[ot + uot + ht] state_inf_per_capita = rep_vector(0, uot + ot + ht); // state level incident infections per capita + matrix[n_subpops, ot + ht] model_log_v_ot; // expected observed viral genomes/mL at all observed and forecasted times + real g = pow(log10_g, 10); // Estimated genomes shed per infected individual + real i0 = i0_over_n * state_pop; // Initial absolute infection incidence + vector[n_subpops] i0_site_over_n; // site-level initial + // per capita infection incidence + vector[n_subpops] growth_site; + + + // State-leve R(t) AR + RW implementation: + log_r_mu_t_in_weeks = diff_ar1(log_r_mu_intercept, + autoreg_rt, + eta_sd, + w, + 0); + unadj_r = ind_m*log_r_mu_t_in_weeks; + unadj_r = exp(unadj_r); + + // Shedding kinetics trajectory + s = get_vl_trajectory(t_peak, viral_peak, dur_shed, gt_max); + + // Site level disease dynamic estimates! + i0_site_over_n = inv_logit(logit(i0_over_n) + eta_i0 * sigma_i0); + growth_site = initial_growth + eta_growth * sigma_growth; // site level growth rate + for (i in 1:n_subpops) { + // Let site-level R(t) vary around the hierarchical mean R(t) + // log(R(t)site) ~ log(R(t)state) + log(R(t)state-log(R(t)site)) + eta_site + log_r_site_t_in_weeks = ar1(log_r_mu_t_in_weeks, + autoreg_rt_site, sigma_rt, + to_vector(error_site[i]), + 1); + //convert from weekly to daily + unadj_r_site_t = exp(to_row_vector(ind_m*(log_r_site_t_in_weeks))); + + { + tuple(vector[num_elements(state_inf_per_capita)], vector[num_elements(unadj_r)]) output; + output = generate_infections( + to_vector(unadj_r_site_t), + uot, + gt_rev_pmf, + log(i0_site_over_n[i]), + growth_site[i], + ht, + infection_feedback, + infection_feedback_rev_pmf + ); + new_i_site = to_row_vector(output.1); + r_site_t[i] = to_row_vector(output.2); + } + + // For each site, tack on number of state infections + // site level infection dynamics sum to the total state infections: + pop_fraction = subpop_size[i] / norm_pop; + state_inf_per_capita += pop_fraction * to_vector(new_i_site); + + model_net_i = to_row_vector(convolve_dot_product(to_vector(new_i_site), + reverse(s), (uot + ot + ht))); + + + model_log_v_ot[i] = log(10) * log10_g + + log(model_net_i[(uot+1):(uot + ot + ht) ] + 1e-8) - + log(mwpd); + } + + + // Set up p_hosp as an AR(1) process that regresses back towards the initial value of p_hosp + p_hosp = assemble_p_hosp(p_hosp_m, p_hosp_mean, p_hosp_w_sd, + autoreg_p_hosp, p_hosp_w, tot_weeks, 1); + + // Expected hospital admissions per capita: + // This is a convolution of incident infections and the hospital-admission delay distribution + // generates all hospitalizations, across unobserved time, observed time, and forecast time + model_hosp_per_capita = convolve_dot_product(p_hosp .* state_inf_per_capita, reverse(inf_to_hosp), + ot + uot + ht); + + // predicted hospital admissions per capita + exp_obs_hosp_per_capita_no_wday_effect = model_hosp_per_capita[uot + 1 : uot + ot]; + // apply the weekday effect so these are distributed with fewer admits on Sat & Sun + // multiply by state population to convert from predicted per capita admissions to + // predicted absolute admissions + exp_obs_hosp = state_pop * day_of_week_effect( + exp_obs_hosp_per_capita_no_wday_effect[hosp_times], + day_of_week[hosp_times], + hosp_wday_effect); + + // Observations at the site level (genomes/person/day) are: + // get a vector of genomes/person/day on the days WW was measured + // These are the true expected genomes at the site level before observation error + // (which is at the lab-site level) + for (i in 1:owt) { + exp_obs_log_v_true[i] = model_log_v_ot[ww_sampled_sites[i], ww_sampled_times[i]]; + } + + // modify by lab-site specific variation (multiplier!) + ww_site_mod = ww_site_mod_raw * ww_site_mod_sd; + // LHS log transformed obs genomes per person-day, RHS multiplies the expected observed + // genomes by the site-specific multiplier at that sampling time + exp_obs_log_v = exp_obs_log_v_true + ww_site_mod[ww_sampled_lab_sites]; + // Option to add a population offset here at some point log(model_V) + site_level_multiplier+ pop_ww[ww_sampled_sites] + + // Get the transformed lab-site level error (NCP for sigma_site ~ n(mean_sigma_site, sigma_sigma_ww_site)) + sigma_ww_site = sigma_ww_site_mean + sigma_ww_site_sd*sigma_ww_site_raw; +} + +// Prior and sampling distribution +model { + // priors + vector[7] effect_mean = rep_vector(wday_effect_prior_mean, 7); + w ~ std_normal(); + eta_sd ~ normal(0, eta_sd_sd); + autoreg_rt_site ~ beta(autoreg_rt_site_a, autoreg_rt_site_b); + + autoreg_rt ~ beta(autoreg_rt_a, autoreg_rt_b); + autoreg_p_hosp ~ beta(autoreg_p_hosp_a, autoreg_p_hosp_b); + log_r_mu_intercept ~ normal(r_logmean, r_logsd); + to_vector(error_site) ~ std_normal(); + sigma_rt ~ normal(0, sigma_rt_prior); + i0_over_n ~ beta(i0_over_n_prior_a, + i0_over_n_prior_b); + sigma_i0 ~ normal(sigma_i0_prior_mode, + sigma_i0_prior_sd); + eta_i0 ~ std_normal(); + sigma_growth ~ normal(0, 0.05); + eta_growth ~ std_normal(); + initial_growth ~ normal(initial_growth_prior_mean, initial_growth_prior_sd); + inv_sqrt_phi_h ~ normal(inv_sqrt_phi_prior_mean, inv_sqrt_phi_prior_sd); + sigma_ww_site_mean ~ normal(sigma_ww_site_prior_mean_mean, sigma_ww_site_prior_mean_sd); + sigma_ww_site_sd ~ normal(sigma_ww_site_prior_sd_mean, sigma_ww_site_prior_sd_sd); + sigma_ww_site_raw ~ std_normal(); + log10_g ~ normal(log10_g_prior_mean, log10_g_prior_sd); + hosp_wday_effect ~ normal(effect_mean, wday_effect_prior_sd); + p_hosp_mean ~ normal(logit(p_hosp_prior_mean), p_hosp_sd_logit); // logit scale + p_hosp_w ~ std_normal(); + p_hosp_w_sd ~ normal(0, p_hosp_w_sd_sd); + t_peak ~ normal(t_peak_mean, t_peak_sd); + viral_peak ~ normal(viral_peak_mean, viral_peak_sd); + dur_shed ~ normal(dur_shed_mean, dur_shed_sd); + ww_site_mod_raw ~ std_normal(); + ww_site_mod_sd ~ normal(0, ww_site_mod_sd_sd); + infection_feedback ~ lognormal(inf_feedback_prior_logmean, inf_feedback_prior_logsd); + + //Compute log likelihood + if (compute_likelihood == 1) { + if (include_ww == 1) { + // Both genomes/person/day and observation error are now vectors + //log_conc ~ normal(exp_obs_log_v, sigma_ww_site[ww_sampled_lab_sites]); + // if non-censored: P(log_conc | expected log_conc) + log_conc[ww_uncensored] ~ normal(exp_obs_log_v[ww_uncensored], sigma_ww_site[ww_sampled_lab_sites[ww_uncensored]]); + // The stdev is at the lab-site-level + // if censored: P(expected_log_conc <= LOD) + target += normal_lcdf(ww_log_lod[ww_censored]| exp_obs_log_v[ww_censored], + sigma_ww_site[ww_sampled_lab_sites[ww_censored]]); + } + + if (include_hosp == 1) { + hosp ~ neg_binomial_2(exp_obs_hosp, phi_h); + } + } // end if for computing log likelihood +} + +generated quantities { + array[ot + ht] real pred_hosp; + array[ot + ht] real pred_new_i; + array[n_ww_lab_sites, ot + ht] real pred_ww; /// viral genome copies/person/day + vector[ot + ht] exp_state_ww_conc; + vector[ot + ht] state_log_c; + vector[uot + ot + ht] state_model_net_i; + vector [n_subpops] site_i0_over_n_start; + vector[ot + ht] rt; // state level R(t) + + for(i in 1:n_subpops) { + site_i0_over_n_start[i] = i0_site_over_n[i] * + exp(growth_site[i] * uot); + } + + pred_hosp = neg_binomial_2_rng(state_pop * day_of_week_effect(model_hosp_per_capita[uot + 1 : + uot + ot + ht], + day_of_week, + hosp_wday_effect), + phi_h); + pred_new_i = neg_binomial_2_rng(state_pop * state_inf_per_capita[uot + 1 : uot + ot + ht], phi_h); + + // Here need to iterate through each lab-site, find the corresponding site + // and apply the expected lab-site error + for(i in 1:n_ww_lab_sites) { + pred_ww[i] = normal_rng(model_log_v_ot[lab_site_to_site_map[i], 1 : ot + ht] + ww_site_mod[i], + sigma_ww_site[i]); + } + + state_model_net_i = convolve_dot_product(state_inf_per_capita, + reverse(s), (uot + ot + ht)); + state_log_c = log(10) * log10_g + + log(state_model_net_i[(uot + 1): (uot + ot + ht)] + 1e-8) - + log(mwpd); + + exp_state_ww_conc = exp(state_log_c); + + // Deterministic calculation of state level R(t) from incident infections + // and the generation interval + rt = (state_inf_per_capita ./ convolve_dot_product(state_inf_per_capita, gt_rev_pmf, uot + ot + ht))[uot+1: uot + ot + ht]; + +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd new file mode 100644 index 00000000..9d294e43 --- /dev/null +++ b/vignettes/wwinference.Rmd @@ -0,0 +1,74 @@ +--- +title: "Getting started with wwinference" +description: "A quick start example demonstrating the use of wwinference to jointly fit wastewater and hospital admissions data" +author: "Kaitlyn Johnson" +date: "2024-06-27" +output: + bookdown::html_vignette2: + fig_caption: yes + code_folding: show +pkgdown: + as_is: true +vignette: > + %\VignetteIndexEntry{Getting started with wwinference} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +# Quick start + +In this quick start, we demonstrate using `wwinference` to specify and fit a +minimal model using daily COVID-19 hospital admissions from a "global" population and +viral concentrations in wastewater from a few "local" wastewater treatment plants, +which come from subsets of the larger population. +This is intended to be used as a reference for those +interested in fitting the `wwinference` model to their own data. + +# Package + +In this quick start, we also use `dplyr` `tidybayes` and `ggplot2` packages. +These are installed as dependencies when `wwinference` is installed. + +```{r} +library(wwinference) +library(dplyr) +library(ggplot2) +library(tidybayes) +``` + +# Data + +The model expects two types of data: daily counts of hospital admissions data +from the larger "global" population, and wastewater concentration +data from wastewater treatment plants whose catchment areas are contained within +the larger "global" population. For this quick start, we will use +simulated data, modeled after a hypothetical US state with 6 wastewater treatment +plants reporting data on viral concentrations of SARS-COV-2, covering about X % +of the state's population. This simulated data is assigned to cover dates +from September 1, 2023 to December 1, 2023. These data are provided as part +of the package data. + +These data are already in a format that can be used for `wwinference`. For the +hospital admissions data, it contains: +- a date (column `date`): the date of the observation, in this case, the date +the hospital admissions occurred +- a count (column `daily_hosp_admits`): the number of hospital admissions observed on that day +- a population size (column `state_pop`): the population size covered +by the hospital admissions data, in this case, the size of the theoretical state. + +For the wastewater data, it contains: +- a date (column `date`): the date the sample was collected +- a site indicator (column `site`): the unique identifier for the wastewater treatment plant +that the sample was collected from +- a lab indicator (column `lab`): the unique identifier for the lab where the sample was processed +- a concentration (column `genome_copies_mL`): the measured genome copies per mL from +the sample collected, in natural scale +- a limit of detection (column `lod`): the limit of detection of the assay used to +process the sample, in natural scale +- a site population size (column `site_pop`): the population size covered by the +wastewater catchment area of that site + +```{r} +hosp_data <- wwinference::hosp_data +ww_data <- wwinference::ww_data +``` From 70c69a5ab86ae87f56648e2d4bef3bdb420dcec6 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 27 Jun 2024 11:32:28 -0400 Subject: [PATCH 020/103] start of generating simulated data --- NAMESPACE | 1 + R/generate_simulated_data.R | 466 +++++++++++++++++++++++++++++++++ data-raw/vignette_data.R | 7 + man/generate_simulated_data.Rd | 120 +++++++++ vignettes/wwinference.Rmd | 22 ++ 5 files changed, 616 insertions(+) create mode 100644 R/generate_simulated_data.R create mode 100644 data-raw/vignette_data.R create mode 100644 man/generate_simulated_data.Rd diff --git a/NAMESPACE b/NAMESPACE index c42a4827..582284f5 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,3 +1,4 @@ # Generated by roxygen2: do not edit by hand export(add_pmfs) +export(generate_simulated_data) diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R new file mode 100644 index 00000000..5d88ad9e --- /dev/null +++ b/R/generate_simulated_data.R @@ -0,0 +1,466 @@ +#' Generate simulated data from the underlying model's generative process +#' @description +#' Function that allows the user to generate hospital admissions and site-level +#' wastewater data directly from the generative model, specifying the conditions +#' and parameters to generate from. +#' +#' @param site_level_inf_dynamics if TRUE then the toy data has variation in the +#' site-level R(t), if FALSE, assumes same underlying R(t) for the state as in +#' each site +#' @param site_level_conc_dynamics if TRUE then the toy data has variation in +#' the site-level concentration each day, if FALSE, then the relationship from +#' infectionto concentration in each site is the same across sites +#' @param r_in_weeks The mean weekly R(t) that drives infection dynamics at the +#' state-level. This gets jittered with random noise to add week-to-week +#' variation. +#' @param n_sites Number of sites +#' @param ww_pop_sites Catchment area in each of those sites (order must match) +#' @param pop_size Population size in the state +#' @param n_lab_sites NUmber of unique combinations of labs and sites. Must be +#' greater than or equal to `n_sites` +#' @param map_site_to_lab Vector mapping the sites to the lab-sites in order +#' of the sites +#' @param ot observed time: length of hospital admissions calibration time in +#' days +#' @param nt nowcast time: length of time between last hospital admissions date +#' and forecast date in days +#' @param forecast_time duration of the forecast in days e.g. 28 days +#' @param sim_start_date the start date of the simulation, used to get a weekday +#' vector +#' @param hosp_wday_effect a simplex of length 7 describing how the hospital +#' admissions are spread out over a week, starting at Monday = 1 +#' @param i0_over_n the initial per capita infections in the state +#' @param initial_growth exponential growth rate during the unobserved time +#' @param sd_in_lab_level_multiplier standard deviation in the log of the site- +#' lab level multiplier determining how much variation there is systematically +#' in site-labs from the state mean +#' @param mean_obs_error_in_ww_lab_site mean day to day variation in observed +#' wastewater concentrations across all lab-sites +#' @param mean_reporting_freq mean frequency of wastewater measurements across +#' sites in per day (e.g. 1/7 is once per week) +#' @param sd_reporting_freq standard deviation in the frequency of wastewater +#' measurements across sites +#' @param mean_reporting_latency mean time from forecast date to last +#' wastewater sample collection date, across sites +#' @param sd_reporting_latency standard deviation in the time from the forecast +#' date to the last wastewater sample collection date, across sites +#' @param mean_log_lod mean log of the LOD in each lab-site +#' @param sd_log_lod standard deviation in the log of the LOD across sites +#' @param example_params_path path to the toml file with the parameters to use +#' to generate the simulated data +#' +#' @return a list containing two dataframes. hosp_data is a dataframe containing +#' the number of daily hospital admissions by day for a theoretical US state. +#' ww_data is a dataframe containing the measured wastewater concentrations +#' in each site alongside other metadata necessary for modeling that data. +#' @export +#' +#' @examples +#' # Generate a simulated dataset from a hypothetical state with 6 sites and 2 +#' # different labs +#' sim_data <- generate_simulated_data( +#' n_sites = 6, +#' map_site_to_lab = c(rep(1, 4), rep(2, 2)) +#' ) +#' hosp_data <- sim_data$hosp_data +#' ww_data <- sim_data$ww_data +generate_simulated_data <- function(site_level_inf_dynamics = TRUE, # nolint + site_level_conc_dynamics = FALSE, + r_in_weeks = c( + rep(1.1, 5), rep(0.9, 5), + 1 + 0.007 * 1:16 + ), + n_sites = 4, + ww_pop_sites = c(4e5, 2e5, 1e5, 5e4), + pop_size = 1e6, + n_lab_sites = 5, + map_site_to_lab = c(1, 1, 2, 3, 4), + ot = 90, + nt = 9, + forecast_time = 28, + sim_start_date = ymd("2023-10-30"), + hosp_wday_effect = c( + 0.95, 1.01, 1.02, + 1.02, 1.01, 1, + 0.99 + ) / 7, + i0_over_n = 5e-4, + initial_growth = 1e-4, + sd_in_lab_level_multiplier = 0.25, + mean_obs_error_in_ww_lab_site = 0.3, + mean_reporting_freq = 1 / 7, + sd_reporting_freq = 1 / 14, + mean_reporting_latency = 7, + sd_reporting_latency = 5, + mean_log_lod = 3.8, + sd_log_lod = 0.2, + input_params_path = + fs::path_package("extdata", + "example_params.toml", + package = "wwinference" + )) { + stopifnot( + "weekly R(t) passed in isn't long enough" = + length(r_in_weeks) >= (ot + nt + forecast_time) / 7 + ) + stopifnot( + "Sum of wastewater site populations is greater than state pop" = + pop_size > sum(ww_pop_sites) + ) + + if (length(ww_pop_sites) < n_sites) { + ww_pop_sites <- rnorm(n_sites, + mean = (0.7 * pop_size / n_sites), + sd = 0.1 * (0.7 * pop_size / n_sites) + ) + } + if (n_lab_sites < n_sites) { + n_lab_sites <- n_sites + map_site_to_lab <- 1:n_sites + } + + # Get pop fractrions of each subpop + pop_fraction <- c( + ww_pop_sites / pop_size, + (pop_size - sum(ww_pop_sites)) / pop_size + ) + + # Expose the stan functions (can use any of the models here) + model <- cmdstan_model( + stan_file = system.file( + "stan", "wwinference.stan", + package = "wwinference" + ), + compile = TRUE, + compile_standalone = TRUE, + force_recompile = TRUE + ) + + model$expose_functions(global = TRUE) + params <- get_params(input_params_path) # load in a data table with parameters + par_names <- colnames(params) # pull them into memory + for (i in seq_along(par_names)) { + assign(par_names[i], as.double(params[i])) + } + + site_lab_map <- data.frame( + lab_site = 1:n_lab_sites, + site = map_site_to_lab + ) |> + left_join(data.frame(site = 1:n_sites, ww_pop = ww_pop_sites)) + + ht <- nt + forecast_time + n_weeks <- ceiling((ot + ht) / 7) + tot_weeks <- ceiling((uot + ot + ht) / 7) + # We need dates to get a weekday vector + dates <- seq( + from = sim_start_date, to = + (sim_start_date + days(ot + nt + ht - 1)), by = "days" + ) + log_i0_over_n <- log(i0_over_n) + day_of_week_vector <- lubridate::wday(dates, week_start = 1) + date_df <- data.frame( + t = 1:(ot + nt + ht), + date = dates + ) + forecast_date <- date_df %>% + filter(t == ot + nt) %>% + pull(date) + # set the lab-site multiplier presumably from lab measurement processes + log_m_lab_sites <- rnorm(n_lab_sites, + mean = 0, sd = sd_in_lab_level_multiplier + ) + # Assign a site level observation error to each site, but have it scale + # inversely with the catchment area of the site (this may not be the right + # scaling) + sigma_ww_lab_site <- mean(site_lab_map$ww_pop) * + mean_obs_error_in_ww_lab_site / site_lab_map$ww_pop + # Set randomly the lab-site reporting avg frequency (per day) and the + # reporting latency (in days). Will use this to sample times in the observed + # data + lab_site_reporting_freq <- abs(rnorm( + n = n_lab_sites, mean = mean_reporting_freq, + sd = sd_reporting_freq + )) + lab_site_reporting_latency <- pmax(1, ceiling(rnorm( + n = n_lab_sites, + mean = mean_reporting_latency, sd = sd_reporting_latency + ))) + # Set a lab-site-specific LOD in log scale + lod_lab_site <- rnorm(n_lab_sites, mean = mean_log_lod, sd = sd_log_lod) + + ## Delay distributions---------------------------------------------------- + generation_interval <- simulate_double_censored_pmf( + max = gt_max, meanlog = mu_gi, sdlog = sigma_gi, fun_dist = rlnorm, n = 5e6 + ) %>% drop_first_and_renormalize() + + + + # Set infection feedback to generation interval + infection_feedback_pmf <- generation_interval + infection_feedback_rev_pmf <- rev(infection_feedback_pmf) + infection_feedback <- 0 + if_feedback <- 1 + # Delay from infection to hospital admission: incubation period + + # time from symptom onset to hospital admission + inc <- make_incubation_period_pmf( + backward_scale, backward_shape, r + ) + sym_to_hosp <- make_hospital_onset_delay_pmf(neg_binom_mu, neg_binom_size) + inf_to_hosp <- make_reporting_delay_pmf(inc, sym_to_hosp) + # shedding kinetics delay distribution + vl_trajectory <- model$functions$get_vl_trajectory( + t_peak_mean, viral_peak_mean, + duration_shedding_mean, gt_max + ) + + # Generate the state level weekly R(t) before infection feedback------------- + unadj_r_weeks <- (r_in_weeks * rnorm(length(r_in_weeks), 1, 0.03))[1:n_weeks] + # Convert to daily for input into renewal equation + ind_m <- get_ind_m(ot + ht, n_weeks) + unadj_r <- ind_m %*% unadj_r_weeks + + + # Generate the site-level expected observed concentrations ----------------- + # first by adding + # variation to the site-level R(t) in each site, and then by adding time + # varying deviations in true concentration in each site, and then adding a + # site level true variability, and then adding lab-site level multiplier and + # obersvation error + + + ### Generate the site level infection dynamics------------------------------- + new_i_over_n_site <- matrix(nrow = n_sites + 1, ncol = (uot + ot + ht)) + r_site <- matrix(nrow = n_sites + 1, ncol = (ot + ht)) + # Generate site-level R(t) + if (isTRUE(site_level_inf_dynamics)) { + log_r_state_week <- log(unadj_r_weeks) + log_r_site <- matrix(nrow = n_sites + 1, ncol = n_weeks) + initial_growth_site <- vector(length = n_sites + 1) + log_i0_over_n_site <- vector(length = n_sites + 1) + for (i in 1:(n_sites + 1)) { + if (i <= n_sites) { + log_r_site[i, ] <- rnorm( + n = n_weeks, + mean = log_r_state_week, + sd = 0.05 + ) # sigma_rt + initial_growth_site[i] <- rnorm( + n = 1, mean = initial_growth, + sd = initial_growth_prior_sd + ) + log_i0_over_n_site[i] <- rnorm( + n = 1, mean = log_i0_over_n, + sd = 0.5 + ) + } else { + log_r_site[i, ] <- log_r_state_week + initial_growth_site[i] <- initial_growth + log_i0_over_n_site[i] <- log_i0_over_n + } + } + + new_i_over_n <- rep(0, (uot + ot + ht)) + for (i in 1:(n_sites + 1)) { + unadj_r_site <- ind_m %*% exp(log_r_site[i, ]) # daily R site + site_output <- model$functions$generate_infections( + unadj_r_site, uot, rev(generation_interval), log_i0_over_n_site[i], + initial_growth_site[i], ht, + infection_feedback, infection_feedback_rev_pmf + ) + new_i_over_n_site[i, ] <- site_output[[1]] + new_i_over_n <- new_i_over_n + pop_fraction[i] * site_output[[1]] + r_site[i, ] <- site_output[[2]] + } + } else { # site level R(t) and infections = state level R(t) and infections + for (i in 1:n_sites) { + new_i_over_n_site[i, ] <- new_i_over_n + r_site[i, ] <- rt + } + } + + rt <- (new_i_over_n / model$functions$convolve_dot_product( + new_i_over_n, rev(generation_interval), uot + ot + ht + ))[(uot + 1):(uot + ot + ht)] + + + # Generate expected state level hospitalizations from subpop infections----- + # generate state-level incident infections using renewal equation for + # all time points + + # Generate a time varying P(hosp|infection), + p_hosp_int_logit <- qlogis(p_hosp_mean) # p_hosp_mean is in linear scale + p_hosp_m <- get_ind_m(uot + ot + ht, tot_weeks) # matrix needed to convert + # from weekly to daily + p_hosp_w_logit <- p_hosp_int_logit + rnorm( + tot_weeks - 1, 0, + p_hosp_w_sd_sd + ) + # random walk on p_hosp in logit scale + p_hosp_logit_weeks <- c( + p_hosp_int_logit, + p_hosp_w_logit + ) # combine with intercept + p_hosp_logit_days <- p_hosp_m %*% c( + p_hosp_int_logit, + p_hosp_w_logit + ) # convert to days + p_hosp_days <- plogis(p_hosp_logit_days) # convert back to linear scale + # Corresponds to a standard deviation in linear scale of 0.0003 + + # Get expected trajectory of hospital admissions from incident infections + # by convolving scaled incident infections with delay from infection to + # hospital admission + model_hosp_over_n <- model$functions$convolve_dot_product( + p_hosp_days * new_i_over_n, + rev(inf_to_hosp), + uot + ot + ht + )[(uot + 1):(uot + ot + ht)] + exp_hosp <- pop_size * model$functions$day_of_week_effect( + model_hosp_over_n, + day_of_week_vector, + hosp_wday_effect + ) + # Add observation error, get hospital admissions in the forecast period + exp_obs_hosp <- rnbinom( + n = length(exp_hosp), mu = exp_hosp, + size = 1 / ((inv_sqrt_phi_prior_mean)^2) + ) + + + + ## Generate site-level mean genomes from infections in each site------- + log_g_over_n_site <- matrix(nrow = n_sites, ncol = (ot + ht)) + + for (i in 1:n_sites) { + model_net_i <- model$functions$convolve_dot_product( + new_i_over_n_site[i, ], + rev(vl_trajectory), + (uot + ot + ht) + )[(uot + 1):(uot + ot + ht)] + log_g_over_n_site[i, ] <- log(10) * log10_g_prior_mean + + log(model_net_i + 1e-8) + } + + ## Generate site-level true genomes ------------------------------------ + # with site multiplier and time-varying deviation + + log_exp_g_over_n_site <- matrix(nrow = n_sites, ncol = (ot + ht)) + + for (i in 1:n_sites) { + if (isFALSE(site_level_conc_dynamics)) { + log_exp_g_over_n_site[i, ] <- log_g_over_n_site[i, ] + } else { + log_exp_g_over_n_site[i, ] <- log_g_over_n_site[i, ] + + rnorm( + n = (ot + ht), mean = 0, + sd = 0.01 + ) # sigma_log_conc prior + } + } + + # Add on site-lab-level observation error ----------------------------------- + log_obs_g_over_n_lab_site <- matrix(nrow = n_lab_sites, ncol = (ot + ht)) + for (i in 1:n_lab_sites) { + log_g_w_multiplier <- log_exp_g_over_n_site[map_site_to_lab[i], ] + + log_m_lab_sites[i] + log_obs_g_over_n_lab_site[i, ] <- log_g_w_multiplier + + rnorm( + n = (ot + ht), mean = 0, + sd = sigma_ww_lab_site[i] + ) + } + + # Sample from some lab-sites more frequently than others and add different + # latencies for each lab-site + log_obs_conc_lab_site <- matrix(nrow = n_lab_sites, ncol = ot + ht) + for (i in 1:n_lab_sites) { + st <- sample(1:(ot + nt), round((ot + nt) * lab_site_reporting_freq[i])) + stl <- pmin((ot + nt - lab_site_reporting_latency[i]), st) + log_obs_conc_lab_site[i, stl] <- log_obs_g_over_n_lab_site[i, stl] - + log(ml_of_ww_per_person_day) + } + + # Format the data----------------------------------------------------------- + + df_long <- as.data.frame(t(log_obs_conc_lab_site)) %>% + dplyr::mutate(t = 1:(ot + ht)) %>% + tidyr::pivot_longer(!t, + names_to = "lab_wwtp_unique_id", + names_prefix = "V", + values_to = "log_conc" + ) %>% + dplyr::mutate( + lab_wwtp_unique_id = as.integer(lab_wwtp_unique_id) + ) %>% + dplyr::left_join(date_df, by = "t") %>% + dplyr::left_join( + data.frame( + lab_site = 1:n_lab_sites, + lod_sewage = lod_lab_site + ), + by = c("lab_wwtp_unique_id" = "lab_site") + ) %>% + dplyr::mutate(below_LOD = ifelse(log_conc >= lod_sewage, 0, 1)) %>% + dplyr::mutate(lod_sewage = case_when( + is.na(log_conc) ~ NA, + !is.na(log_conc) ~ lod_sewage + )) + + # Make a hospital admissions dataframe to bind to + df_hosp <- data.frame( + t = 1:(ot + ht), + daily_hosp_admits = c(exp_obs_hosp[1:ot], rep(NA, ht)), + daily_hosp_admits_for_eval = exp_obs_hosp + ) + + # State infections per capita + df_inf <- data.frame( + t = 1:(ot + ht), + inf_per_capita = new_i_over_n[(uot + 1):(uot + ot + ht)] + ) + + example_df <- df_long %>% + dplyr::left_join(df_hosp, + by = "t" + ) %>% + dplyr::mutate( + pop = pop_size, + forecast_date = forecast_date, + hosp_calibration_time = ot + ) %>% + dplyr::left_join(site_lab_map, + by = c("lab_wwtp_unique_id" = "lab_site") + ) %>% + dplyr::left_join(df_inf, + by = "t" + ) + + + + # Get the true parameter dataframe, making sure this is formatted the same as + # as the output from get_full_param_distrib() + p_hosp_df <- data.frame( + name = "p_hosp", true_value = p_hosp_days, + index_rows = NA, + index_cols = seq_along(p_hosp_days) + ) + r_df <- data.frame( + name = "rt", true_value = rt, + index_rows = NA, + index_cols = seq_along(rt) + ) + log10_g_df <- data.frame( + name = "log10_g", true_value = log10_g_prior_mean, + index_rows = NA, + index_cols = NA + ) + + param_df <- rbind(p_hosp_df, r_df, log10_g_df) + + toy_data_and_params <- list( + param_df = param_df, + example_df = example_df + ) + return(toy_data_and_params) +} diff --git a/data-raw/vignette_data.R b/data-raw/vignette_data.R new file mode 100644 index 00000000..346b5bc3 --- /dev/null +++ b/data-raw/vignette_data.R @@ -0,0 +1,7 @@ +set.seed(1) +simulated_data <- wwinference::generate_simulated_data() +hosp_data <- simulated_data$hosp_data +ww_data <- simulated_data$ww_data + +usethis::use_data(hosp_data, overwrite = TRUE) +usethis::use_data(ww_data, overwrite = TRUE) diff --git a/man/generate_simulated_data.Rd b/man/generate_simulated_data.Rd new file mode 100644 index 00000000..4a601d1e --- /dev/null +++ b/man/generate_simulated_data.Rd @@ -0,0 +1,120 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/generate_simulated_data.R +\name{generate_simulated_data} +\alias{generate_simulated_data} +\title{Generate simulated data from the underlying model's generative process} +\usage{ +generate_simulated_data( + site_level_inf_dynamics = TRUE, + site_level_conc_dynamics = FALSE, + r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16), + n_sites = 4, + ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000), + pop_size = 1e+06, + n_lab_sites = 5, + map_site_to_lab = c(1, 1, 2, 3, 4), + ot = 90, + nt = 9, + forecast_time = 28, + sim_start_date = ymd("2023-10-30"), + hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99)/7, + i0_over_n = 5e-04, + initial_growth = 1e-04, + sd_in_lab_level_multiplier = 0.25, + mean_obs_error_in_ww_lab_site = 0.3, + mean_reporting_freq = 1/7, + sd_reporting_freq = 1/14, + mean_reporting_latency = 7, + sd_reporting_latency = 5, + mean_log_lod = 3.8, + sd_log_lod = 0.2, + input_params_path = fs::path_package("extdata", "example_params.toml", package = + "wwinference") +) +} +\arguments{ +\item{site_level_inf_dynamics}{if TRUE then the toy data has variation in the +site-level R(t), if FALSE, assumes same underlying R(t) for the state as in +each site} + +\item{site_level_conc_dynamics}{if TRUE then the toy data has variation in the +site-level concentration each day, if FALSE, then the relationship from infection +to concentration in each site is the same across sites} + +\item{r_in_weeks}{The mean weekly R(t) that drives infection dynamics at the state- +level. This gets jittered with random noise to add week-to-week variation.} + +\item{n_sites}{Number of sites} + +\item{ww_pop_sites}{Catchment area in each of those sites (order must match)} + +\item{pop_size}{Population size in the state} + +\item{n_lab_sites}{NUmber of unique combinations of labs and sites. Must be +greater than or equal to \code{n_sites}} + +\item{map_site_to_lab}{Vector mapping the sites to the lab-sites in order +of the sites} + +\item{ot}{observed time: length of hospital admissions calibration time in days} + +\item{nt}{nowcast time: length of time between last hospital admissions date +and forecast date in days} + +\item{forecast_time}{duration of the forecast in days e.g. 28 days} + +\item{sim_start_date}{the start date of the simulation, used to get a weekday +vector} + +\item{hosp_wday_effect}{a simplex of length 7 describing how the hospital +admissions are spread out over a week, starting at Monday = 1} + +\item{i0_over_n}{the initial per capita infections in the state} + +\item{initial_growth}{exponential growth rate during the unobserved time} + +\item{sd_in_lab_level_multiplier}{standard deviation in the log of the site- +lab level multiplier determining how much variation there is systematically +in site-labs from the state mean} + +\item{mean_obs_error_in_ww_lab_site}{mean day to day variation in observed +wastewater concentrations across all lab-sites} + +\item{mean_reporting_freq}{mean frequency of wastewater measurements across +sites in per day (e.g. 1/7 is once per week)} + +\item{sd_reporting_freq}{standard deviation in the frequency of wastewater +measurements across sites} + +\item{mean_reporting_latency}{mean time from forecast date to last +wastewater sample collection date, across sites} + +\item{sd_reporting_latency}{standard deviation in the time from the forecast +date to the last wastewater sample collection date, across sites} + +\item{mean_log_lod}{mean log of the LOD in each lab-site} + +\item{sd_log_lod}{standard deviation in the log of the LOD across sites} + +\item{example_params_path}{path to the toml file with the parameters to use to +generate the simulated data} +} +\value{ +a list containing two dataframes. hosp_data is a dataframe containing +the number of daily hospital admissions by day for a theoretical US state. +ww_data is a dataframe containing the measured wastewater concentrations +in each site alongside other metadata necessary for modeling that data. +} +\description{ +Function that allows the user to generate hospital admissions and site-level +wastewater data directly from the generative model, specifying the conditions +and parameters to generate from. +} +\examples{ +# Generate a simulated dataset from a hypothetical state with 6 sites and 2 +# different labs +sim_data <- generate_simulated_data(n_sites = 6, + map_site_to_lab = c(rep(1,4), rep(2,2)) +hosp_data <- sim_data$hosp_data +ww_data <- sim_data$ww_data +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 9d294e43..bdd26fae 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -72,3 +72,25 @@ wastewater catchment area of that site hosp_data <- wwinference::hosp_data ww_data <- wwinference::ww_data ``` + +We'll make some plots of the data just to make sure it looks like what we'd expect: + +```{r} +# Add plots +``` + +# Pre-processing +We will need to set some metadata to facilitate model specification. These include: +- the date on which we are making the forecast +- number of days to calibrate the model for +- number of days to forecast +- specification of the generation interval for COVID-19 +- specification of the delay from infection to hospital admissions for COVID-19 +- setting the priors and parameters, some of which are COVID-19 specific +```{r} +params <- get_params( + system.file("extdata", "example_params.toml", + package = "wwinference" + ) +) +``` From d453696d2eb970e9a9b792f32b3b91efd836e7ae Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 28 Jun 2024 14:53:24 -0400 Subject: [PATCH 021/103] add parameters, package, more documentation to gen simulated data --- R/generate_simulated_data.R | 112 ++++++++++++++++--------------- R/wwinference-package.R | 13 ++++ inst/extdata/example_params.toml | 97 ++++++++++++++++++++++++++ 3 files changed, 168 insertions(+), 54 deletions(-) create mode 100644 R/wwinference-package.R create mode 100644 inst/extdata/example_params.toml diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 5d88ad9e..1246c017 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -4,49 +4,51 @@ #' wastewater data directly from the generative model, specifying the conditions #' and parameters to generate from. #' -#' @param site_level_inf_dynamics if TRUE then the toy data has variation in the -#' site-level R(t), if FALSE, assumes same underlying R(t) for the state as in -#' each site -#' @param site_level_conc_dynamics if TRUE then the toy data has variation in -#' the site-level concentration each day, if FALSE, then the relationship from -#' infectionto concentration in each site is the same across sites -#' @param r_in_weeks The mean weekly R(t) that drives infection dynamics at the -#' state-level. This gets jittered with random noise to add week-to-week -#' variation. -#' @param n_sites Number of sites -#' @param ww_pop_sites Catchment area in each of those sites (order must match) -#' @param pop_size Population size in the state -#' @param n_lab_sites NUmber of unique combinations of labs and sites. Must be -#' greater than or equal to `n_sites` -#' @param map_site_to_lab Vector mapping the sites to the lab-sites in order +#' @param r_in_weeks vector indcating the mean weekly R(t) that drives infection +#' dynamics at the state-level. This gets jittered with random noise to add +#' week-to-week variation. +#' @param n_sites integer indicating the number of sites +#' @param ww_pop_sites vector indicating the population size in the +#' catchment area in each of those sites (order must match) +#' @param pop_size integer indicating the population size in the hypothetical +#' state +#' @param n_lab_sites integer indicating the nummber of unique combinations of +#' labs and sites. Must be greater than or equal to `n_sites` +#' @param map_site_to_lab vector mapping the sites to the lab-sites in order #' of the sites -#' @param ot observed time: length of hospital admissions calibration time in -#' days -#' @param nt nowcast time: length of time between last hospital admissions date -#' and forecast date in days -#' @param forecast_time duration of the forecast in days e.g. 28 days -#' @param sim_start_date the start date of the simulation, used to get a weekday -#' vector -#' @param hosp_wday_effect a simplex of length 7 describing how the hospital -#' admissions are spread out over a week, starting at Monday = 1 -#' @param i0_over_n the initial per capita infections in the state -#' @param initial_growth exponential growth rate during the unobserved time -#' @param sd_in_lab_level_multiplier standard deviation in the log of the site- -#' lab level multiplier determining how much variation there is systematically -#' in site-labs from the state mean -#' @param mean_obs_error_in_ww_lab_site mean day to day variation in observed -#' wastewater concentrations across all lab-sites -#' @param mean_reporting_freq mean frequency of wastewater measurements across -#' sites in per day (e.g. 1/7 is once per week) -#' @param sd_reporting_freq standard deviation in the frequency of wastewater -#' measurements across sites -#' @param mean_reporting_latency mean time from forecast date to last -#' wastewater sample collection date, across sites -#' @param sd_reporting_latency standard deviation in the time from the forecast -#' date to the last wastewater sample collection date, across sites -#' @param mean_log_lod mean log of the LOD in each lab-site -#' @param sd_log_lod standard deviation in the log of the LOD across sites -#' @param example_params_path path to the toml file with the parameters to use +#' @param ot integer indicating the observed time: length of hospital admissions +#' calibration time in days +#' @param nt integer indicating the nowcast time: length of time between last +#' hospital admissions date and forecast date in days +#' @param forecast_horizon integer indicating the duration of the forecast in +#' days e.g. 28 days +#' @param sim_start_date character string formatted as "YYYY-MM-DD" indicating +#' the start date of the simulation, used to get a weekday vector +#' @param hosp_wday_effect a vector that is a simplex of length 7 describing +#' how the hospital admissions are spread out over a week, starting at +#' Monday = 1 +#' @param i0_over_n float between 0 and 1 indicating the initial per capita +#' infections in the state +#' @param initial_growth float indicating the exponential growth rate in +#' infections (daily) during the unobserved time +#' @param sd_in_lab_level_multiplier float indicating the standard deviation in +#' the log of the site-lab level multiplier determining how much variation +#' there is systematically in site-labs from the state mean +#' @param mean_obs_error_in_ww_lab_site float indicating the mean day-to-day +#' variation in observed wastewater concentrations across all lab-sites +#' @param mean_reporting_freq float indicating the mean frequency of wastewater +#' measurements across sites in per day (e.g. 1/7 is once per week) +#' @param sd_reporting_freq float indicating the standard deviation in the +#' frequency of wastewater measurements across sites +#' @param mean_reporting_latency float indicating the mean time from forecast +#' date to last wastewater sample collection date, across sites +#' @param sd_reporting_latency float indicating the standard deviation in the +#' time from the forecast date to the last wastewater sample collection date, +#' across sites +#' @param mean_log_lod float indicating the mean log of the LOD in each lab-site +#' @param sd_log_lod float indicating the standard deviation in the log of the +#' LOD across sites +#' @param input_params_path path to the toml file with the parameters to use #' to generate the simulated data #' #' @return a list containing two dataframes. hosp_data is a dataframe containing @@ -64,21 +66,22 @@ #' ) #' hosp_data <- sim_data$hosp_data #' ww_data <- sim_data$ww_data -generate_simulated_data <- function(site_level_inf_dynamics = TRUE, # nolint - site_level_conc_dynamics = FALSE, - r_in_weeks = c( - rep(1.1, 5), rep(0.9, 5), - 1 + 0.007 * 1:16 - ), +generate_simulated_data <- function(r_in_weeks = # nolint + c( + rep(1.1, 5), rep(0.9, 5), + 1 + 0.007 * 1:16 + ), n_sites = 4, ww_pop_sites = c(4e5, 2e5, 1e5, 5e4), pop_size = 1e6, - n_lab_sites = 5, - map_site_to_lab = c(1, 1, 2, 3, 4), + n_lab_sites = 4, + map_site_to_lab = c(1, 2, 3, 4), ot = 90, nt = 9, - forecast_time = 28, - sim_start_date = ymd("2023-10-30"), + forecast_horizon = 28, + sim_start_date = lubridate::ymd( + "2023-09-01" + ), hosp_wday_effect = c( 0.95, 1.01, 1.02, 1.02, 1.01, 1, @@ -99,9 +102,10 @@ generate_simulated_data <- function(site_level_inf_dynamics = TRUE, # nolint "example_params.toml", package = "wwinference" )) { + # Some quick checks to make sure the inputs work as expected stopifnot( "weekly R(t) passed in isn't long enough" = - length(r_in_weeks) >= (ot + nt + forecast_time) / 7 + length(r_in_weeks) >= (ot + nt + forecast_horizon) / 7 ) stopifnot( "Sum of wastewater site populations is greater than state pop" = @@ -149,7 +153,7 @@ generate_simulated_data <- function(site_level_inf_dynamics = TRUE, # nolint ) |> left_join(data.frame(site = 1:n_sites, ww_pop = ww_pop_sites)) - ht <- nt + forecast_time + ht <- nt + forecast_horizon n_weeks <- ceiling((ot + ht) / 7) tot_weeks <- ceiling((uot + ot + ht) / 7) # We need dates to get a weekday vector diff --git a/R/wwinference-package.R b/R/wwinference-package.R new file mode 100644 index 00000000..87901102 --- /dev/null +++ b/R/wwinference-package.R @@ -0,0 +1,13 @@ +#' @keywords internal +"_PACKAGE" + +#' @importFrom lubridate ymd +#' @importFrom tidybayes spread_draws stat_halfeye stat_slab +#' @importFrom dplyr filter left_join select pull distinct mutate as_tibble +#' rename ungroup arrange row_number group_by +#' @importFrom tidyr pivot_wider pivot_longer +#' @importFrom ggplot2 ggplot facet_wrap geom_line geom_hline geom_point +#' geom_bar theme scale_y_continuous scale_colour_discrete scale_fill_discrete +#' geom_ribbon scale_x_date facet_grid geom_vline labs aes +#' @importFrom cmdstanr cmdstan_model +NULL diff --git a/inst/extdata/example_params.toml b/inst/extdata/example_params.toml new file mode 100644 index 00000000..ebacf844 --- /dev/null +++ b/inst/extdata/example_params.toml @@ -0,0 +1,97 @@ +[timescale] +uot = 50 + +[infection_process] +r_prior_mean = 1 +r_prior_sd = 1 +sigma_rt_prior = 0.1 +dur_inf = 7 + +sigma_i0_prior_mode = 0 +sigma_i0_prior_sd = 0.5 +i0_certainty = 5 + ## effective number of binomial trials + ## in beta prior centered on estimated i0/n + +initial_growth_prior_mean = 0 +initial_growth_prior_sd = 0.01 +autoreg_rt_a = 2 # shape1 parameter of autoreg term on Rt trend +autoreg_rt_b = 40 # shape2 parameter of autoreg on Rt trend + # mean = a/(a+b) = 0.05, stdv = sqrt(a)/b = sqrt(2)/40 = 0.035 +autoreg_rt_site_a = 1 # shape1 parameter of autoreg term on difference between + # R(t) state and R(t) site +autoreg_rt_site_b = 4 # shape2 parameter of autoreg term on difference between +# R(t) state and R(t) site +autoreg_p_hosp_a = 1 # shape1 parameter of autoreg term on IHR(t) trend +autoreg_p_hosp_b = 100 # shape2 parameter of autoreg term on IHR(t) trend +eta_sd_sd = 0.01 +infection_feedback_prior_logmean = 6.37408 # log(mode) + q^2 mode = 500, q = 0.4 +infection_feedback_prior_logsd = 0.4 + +[hospital_admission_observation_process] +# Hospitalization parameters (informative priors) +# IHR estimate from: https://www.nature.com/articles/s41467-023-39661-5 +p_hosp_mean = 0.01 +p_hosp_sd_logit = 0.3 + +# time variation in p_hosp +p_hosp_w_sd_sd = 0.01 + +inv_sqrt_phi_prior_mean = 0.1 # 1 / sqrt(100) +inv_sqrt_phi_prior_sd = 0.1414214 # 1 / sqrt(50) +wday_effect_prior_mean = 0.1428571 # 1 / 7 +wday_effect_prior_sd = 0.05 + +[wastewater_observation_process] +ml_of_ww_per_person_day = 22.7e4 +t_peak_mean = 5 +t_peak_sd = 1 +viral_peak_mean = 5.1 +viral_peak_sd = 0.5 +duration_shedding_mean = 17 +duration_shedding_sd = 3 +log10_g_prior_mean = 12 +log10_g_prior_sd = 2 +log_g_prior_mean = 27.63102 # 12 * log(10) +log_g_prior_sd = 4.60517 # 2 * log(10) + +sigma_ww_site_prior_mean_mean = 1 +sigma_ww_site_prior_mean_sd = 1 +sigma_ww_site_prior_sd_mean = 0 +sigma_ww_site_prior_sd_sd = 1 + +ww_site_mod_sd_sd = 0.25 +log_phi_g_prior_mean = -2.302585 # log(0.1) +# prior mean in individual level dispersion +# in fecal shedding +log_phi_g_prior_sd = 5 # wide std + + +[continuous_distribution_parameters] + # Generation Interval + # From: Park, Sang Woo, et al. "Inferring the differences in incubation-period + # and generation-interval distributions of the Delta and Omicron variants of + # SARS-CoV-2." Proceedings of the National Academy of Sciences 120.22 (2023): + # e2221887120. + # from the object in Fig 4F corresponding to between household transmission + # in Omicron https://github.com/parksw3/omicron-generation/blob/d36d4568bfd3b3d389b30282758b9c322cfe2b9f/figure/compare.R#L175 #nolint + + mu_gi = 0.92877 + sigma_gi = 0.526 # (using lognormal CDF and Park CIs of 2.7 and 3.2) + gt_max = 15 # number of daily bins for discretization + + # Incubation period parameters + # From: Park, Sang Woo, et al. "Inferring the differences in incubation-period + # and generation-interval distributions of the Delta and Omicron variants of + # SARS-CoV-2." Proceedings of the National Academy of Sciences 120.22 (2023): + # e2221887120. + r = 0.15 + backward_shape = 1.5 + backward_scale = 3.6 + + # Symptom onset to hospital admission delay parameters + # From fitting a negative binomial to data from + # Danache et al + # https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 + neg_binom_mu = 6.98665 + neg_binom_size = 2.490848 From 513587be2cc60484af303cf39b4c1c7ed50b574a Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 28 Jun 2024 15:53:48 -0400 Subject: [PATCH 022/103] add in functions for getting delay distributions --- NAMESPACE | 41 +++++++ R/delay_distribs.R | 155 +++++++++++++++++++++++++++ R/generate_simulated_data.R | 80 ++++++++------ R/get_params.R | 64 +++++++++++ man/drop_first_and_renormalize.Rd | 28 +++++ man/generate_simulated_data.Rd | 104 +++++++++--------- man/get_params.Rd | 22 ++++ man/make_hospital_onset_delay_pmf.Rd | 32 ++++++ man/make_incubation_period_pmf.Rd | 35 ++++++ man/simulate_double_censored_pmf.Rd | 53 +++++++++ man/validate_paramlist.Rd | 18 ++++ man/wwinference-package.Rd | 39 +++++++ 12 files changed, 590 insertions(+), 81 deletions(-) create mode 100644 R/delay_distribs.R create mode 100644 R/get_params.R create mode 100644 man/drop_first_and_renormalize.Rd create mode 100644 man/get_params.Rd create mode 100644 man/make_hospital_onset_delay_pmf.Rd create mode 100644 man/make_incubation_period_pmf.Rd create mode 100644 man/simulate_double_censored_pmf.Rd create mode 100644 man/validate_paramlist.Rd create mode 100644 man/wwinference-package.Rd diff --git a/NAMESPACE b/NAMESPACE index 582284f5..bef8f0cc 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,4 +1,45 @@ # Generated by roxygen2: do not edit by hand export(add_pmfs) +export(drop_first_and_renormalize) export(generate_simulated_data) +export(get_params) +export(make_hospital_onset_delay_pmf) +export(make_incubation_period_pmf) +export(simulate_double_censored_pmf) +export(validate_paramlist) +importFrom(cmdstanr,cmdstan_model) +importFrom(dplyr,arrange) +importFrom(dplyr,as_tibble) +importFrom(dplyr,distinct) +importFrom(dplyr,filter) +importFrom(dplyr,group_by) +importFrom(dplyr,left_join) +importFrom(dplyr,mutate) +importFrom(dplyr,pull) +importFrom(dplyr,rename) +importFrom(dplyr,row_number) +importFrom(dplyr,select) +importFrom(dplyr,ungroup) +importFrom(ggplot2,aes) +importFrom(ggplot2,facet_grid) +importFrom(ggplot2,facet_wrap) +importFrom(ggplot2,geom_bar) +importFrom(ggplot2,geom_hline) +importFrom(ggplot2,geom_line) +importFrom(ggplot2,geom_point) +importFrom(ggplot2,geom_ribbon) +importFrom(ggplot2,geom_vline) +importFrom(ggplot2,ggplot) +importFrom(ggplot2,labs) +importFrom(ggplot2,scale_colour_discrete) +importFrom(ggplot2,scale_fill_discrete) +importFrom(ggplot2,scale_x_date) +importFrom(ggplot2,scale_y_continuous) +importFrom(ggplot2,theme) +importFrom(lubridate,ymd) +importFrom(tidybayes,spread_draws) +importFrom(tidybayes,stat_halfeye) +importFrom(tidybayes,stat_slab) +importFrom(tidyr,pivot_longer) +importFrom(tidyr,pivot_wider) diff --git a/R/delay_distribs.R b/R/delay_distribs.R new file mode 100644 index 00000000..6b1b14ca --- /dev/null +++ b/R/delay_distribs.R @@ -0,0 +1,155 @@ +#' Drop the first element of a simplex +#' +#' When this vector corresponds to the generation interval distribution, we +#' want to drop this first bin. The renewal equation assumes that same-day +#' infection and onward transmission does not occur, and we assume +#' everything is 1 indexed not 0 indeced. We need to +#' manually drop the first element from the PMF vector. +#' +#' @param x A numeric vector, sums to 1. Corresponds to a discretized PDF or PMF +#' (usually the GI distribution). +#' +#' @return A numeric vector, sums to 1. +#' @export +#' @examples +#' pmf_orig <- c(0.1, 0.1, 0.1, 0.7) +#' pmf_trunc <- drop_first_and_renormalize(pmf_orig) +drop_first_and_renormalize <- function(x) { + # Check input sums to 1 + stopifnot(abs(sum(x) - 1) < 1e-8) + # Drop and renormalize + y <- x[2:length(x)] / sum(x[2:length(x)]) + vec_outside_tol <- abs(sum(y) - 1L) > 1e-10 + # Normalize until within tolerance + while (vec_outside_tol) { + y <- y / sum(y) + vec_outside_tol <- abs(sum(y) - 1L) > 1e-10 + } + return(y) +} + +#' Simulate daily double censored PMF. From {epinowcast}: +#' https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html #nolint +#' +#' This function simulates the probability mass function of a daily +#' double-censored process. The process involves two distributions: a primary +#' distribution which represents the censoring process for the primary event +#' and another distribution (which is offset by the primary). +#' +#' Based off of: +#' https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1 +#' +#' @param max Maximum value for the computed CDF. If not specified, the maximum +#' value is the maximum simulated delay. +#' @param fun_primary Primary distribution function (default is \code{runif}). +#' @param fun_dist Distribution function to be added to the primary (default is +#' \code{rlnorm}). +#' @param n Number of simulations (default is 1e6). +#' @param primary_args List of additional arguments to be passed to the primary +#' distribution function. +#' @param dist_args List of additional arguments to be passed to the +#' distribution function. +#' @param ... Additional arguments to be passed to the distribution function. +#' This is an alternative to `dist_args`. +#' +#' @return A numeric vector representing the PMF. +#' @export +#' @examples +#' simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) +simulate_double_censored_pmf <- function( + max, fun_primary = stats::runif, primary_args = list(), + fun_dist = stats::rlnorm, + dist_args = list(...), n = 1e6, ...) { + primary <- do.call(fun_primary, c(list(n), primary_args)) + secondary <- primary + do.call(fun_dist, c(list(n), dist_args)) + delay <- floor(secondary) - floor(primary) + if (missing(max)) { + max <- base::max(delay) + } + cdf <- ecdf(delay)(0:max) + pmf <- c(cdf[1], diff(cdf)) + vec_outside_tol <- abs(sum(pmf) - 1L) > 1e-10 + while (vec_outside_tol) { + pmf <- pmf / sum(pmf) + vec_outside_tol <- abs(sum(pmf) - 1L) > 1e-10 + } + return(pmf) +} + +#' @title Make incubation period pmf +#' @description This makes a pmf corresponding to +#' the incubation period for COVID after Omicron used in Park et al 2023 +#' These estimates are from early Omicron. +#' @param backward_scale numeric indicating the scale parameter for the Weibull +#' used in producing the incubation period distribution. default is `3.60` for +#' COVID +#' @param backward_shape numeric indicating the shape parameter for the Weibull +#' used in producing the incubation period distribution, default is `1.50` for +#' COVID +#' @param r numeric indicating the exponential rate used in producing the +#' correction on the incubaion period distribution, default is `0.15` for COVID +#' +#' @return pmf of incubation period +#' @export +#' +#' @examples +#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +make_incubation_period_pmf <- function(backward_scale = 3.60, + backward_shape = 1.50, + r = 0.15) { + # From: Park, Sang Woo, et al. "Inferring the differences in incubation-period + # and generation-interval distributions of the Delta and Omicron variants of + # SARS-CoV-2." Proceedings of the National Academy of Sciences 120.22 (2023): + # e2221887120. + + # "However, when we account for growth-rate differences and reestimate the + # forward incubation periods, we find that both variants have similar + # incubation-period distributions with a mean of 4.1 d (95% CI: 3.8 to 4.4 d) + # for the Delta variant and 4.2 d (95% CI: 3.6 to 4.9 d) for the Omicron + # variant Fig. 3B)." + + # Fits a Weibull to the data + + # Relies on fundamental assumption about epidemic growth rate. + + + corrected_sgtf <- tibble::tibble( + time = seq(0, 23, by = 1), # 23 seems to get most of the distribution mass + density0 = dweibull(time, + shape = backward_shape, + scale = backward_scale + ) * exp(r * time) + ) + + inc_period_pmf <- corrected_sgtf$density0 / sum(corrected_sgtf$density0) + return(inc_period_pmf) +} + +#' @title Make hospital onset delay pmf +#' @description Uses the parameter estimates from cfa-parameter-estimates, +#' which is based on Danache et al linelist data from symptom onset to hospital +#' admission. See below: +#' https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 +#' +#' @param neg_binom_mu float indicating the mean of the negative binomial shaped +#' delay from symptom onset to hospital admissions, default is `6.98665` from +#' fit to data in above paper +#' @param neg_binom_size float indicating the dispersion parameter in the +#' negative binomial delay from symptom onset to hospital admissions, default +#' is `2.490848` from fit to data in above paper +#' +#' @return pmf of distribution from symptom onset to hospital admission +#' @export +#' +#' @examples +#' delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +make_hospital_onset_delay_pmf <- function(neg_binom_mu = 6.98665, + neg_binom_size = 2.490848) { + density <- dnbinom( + x = seq(0, 30, 1), + mu = neg_binom_mu, size = neg_binom_size + ) + hosp_onset_delay_pmf <- density / sum(density) + + return(hosp_onset_delay_pmf) +} diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 1246c017..86a28dad 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -111,26 +111,29 @@ generate_simulated_data <- function(r_in_weeks = # nolint "Sum of wastewater site populations is greater than state pop" = pop_size > sum(ww_pop_sites) ) + stopifnot( + "Insufficient population sizes provided for wastewater catchment areas" = + length(ww_pop_sites) >= n_sites + ) - if (length(ww_pop_sites) < n_sites) { - ww_pop_sites <- rnorm(n_sites, - mean = (0.7 * pop_size / n_sites), - sd = 0.1 * (0.7 * pop_size / n_sites) - ) - } - if (n_lab_sites < n_sites) { - n_lab_sites <- n_sites - map_site_to_lab <- 1:n_sites - } + stopifnot( + "Insufficient number of lab-site combinations provided" = + n_lab_sites >= n_sites + ) + stopifnot( + "Mapping from sites to lab-sites not provided" = + length(map_site_to_lab) == n_lab_sites + ) - # Get pop fractrions of each subpop + + # Get pop fractions of each subpop. There will n_sites + 1 subpops pop_fraction <- c( ww_pop_sites / pop_size, (pop_size - sum(ww_pop_sites)) / pop_size ) - # Expose the stan functions (can use any of the models here) - model <- cmdstan_model( + # Expose the stan functions into the global environment + model <- cmdstanr::cmdstan_model( stan_file = system.file( "stan", "wwinference.stan", package = "wwinference" @@ -139,27 +142,32 @@ generate_simulated_data <- function(r_in_weeks = # nolint compile_standalone = TRUE, force_recompile = TRUE ) - model$expose_functions(global = TRUE) - params <- get_params(input_params_path) # load in a data table with parameters + + # Pull parameter values into memory + params <- get_params(input_params_path) # load in a single row tibble par_names <- colnames(params) # pull them into memory for (i in seq_along(par_names)) { assign(par_names[i], as.double(params[i])) } + # Create a tibble that maps sites, labs, and population sizes of the sites site_lab_map <- data.frame( lab_site = 1:n_lab_sites, site = map_site_to_lab ) |> - left_join(data.frame(site = 1:n_sites, ww_pop = ww_pop_sites)) + dplyr::left_join(data.frame(site = 1:n_sites, ww_pop = ww_pop_sites)) + # Define some time variables ht <- nt + forecast_horizon - n_weeks <- ceiling((ot + ht) / 7) - tot_weeks <- ceiling((uot + ot + ht) / 7) + n_weeks <- ceiling((ot + ht) / 7) # calibration + forecast time + tot_weeks <- ceiling((uot + ot + ht) / 7) # initialization time + + # calibration + forecast time + # We need dates to get a weekday vector dates <- seq( from = sim_start_date, to = - (sim_start_date + days(ot + nt + ht - 1)), by = "days" + (sim_start_date + lubridate::days(ot + nt + ht - 1)), by = "days" ) log_i0_over_n <- log(i0_over_n) day_of_week_vector <- lubridate::wday(dates, week_start = 1) @@ -167,18 +175,26 @@ generate_simulated_data <- function(r_in_weeks = # nolint t = 1:(ot + nt + ht), date = dates ) - forecast_date <- date_df %>% - filter(t == ot + nt) %>% - pull(date) - # set the lab-site multiplier presumably from lab measurement processes + + forecast_date <- date_df |> + dplyr::filter(t == ot + nt) |> + dplyr::pull(date) + + # Set the lab-site multiplier presumably from lab measurement processes log_m_lab_sites <- rnorm(n_lab_sites, mean = 0, sd = sd_in_lab_level_multiplier - ) + ) # This is the magnitude shift (multiplier in natural scale) on the + # observations, presumably from things like concentration method, PCR type, + # collection type, etc. + # Assign a site level observation error to each site, but have it scale - # inversely with the catchment area of the site (this may not be the right - # scaling) + # inversely with the catchment area of the site for now. Eventually, we will + # want to impose the expected variability as a function of the contributing + # infections, but since this module isn't currently in the model we will + # just do this for now. sigma_ww_lab_site <- mean(site_lab_map$ww_pop) * mean_obs_error_in_ww_lab_site / site_lab_map$ww_pop + # Set randomly the lab-site reporting avg frequency (per day) and the # reporting latency (in days). Will use this to sample times in the observed # data @@ -193,20 +209,24 @@ generate_simulated_data <- function(r_in_weeks = # nolint # Set a lab-site-specific LOD in log scale lod_lab_site <- rnorm(n_lab_sites, mean = mean_log_lod, sd = sd_log_lod) - ## Delay distributions---------------------------------------------------- + # Delay distributions: Note, these are COVID specific, and we will + # generally want to specify these outside model configuration + + # Double censored, zero truncated, generation interval generation_interval <- simulate_double_censored_pmf( max = gt_max, meanlog = mu_gi, sdlog = sigma_gi, fun_dist = rlnorm, n = 5e6 - ) %>% drop_first_and_renormalize() - - + ) |> drop_first_and_renormalize() # Set infection feedback to generation interval infection_feedback_pmf <- generation_interval infection_feedback_rev_pmf <- rev(infection_feedback_pmf) infection_feedback <- 0 if_feedback <- 1 + # Delay from infection to hospital admission: incubation period + # time from symptom onset to hospital admission + + # Get incubation period for COVID. inc <- make_incubation_period_pmf( backward_scale, backward_shape, r ) diff --git a/R/get_params.R b/R/get_params.R new file mode 100644 index 00000000..39a6b5c1 --- /dev/null +++ b/R/get_params.R @@ -0,0 +1,64 @@ +## Get model parameters (priors and set params of set distributions)------------ +#' @title Get parameters for model run +#' +#' @param param_file Path to a `.toml` file defining +#' parameter values. +#' +#' @return a dataframe with numeric values for parameter values passed to the +#' model +#' @export +#' +#' @examples +#' params <- get_params("input/params.toml") +get_params <- function(param_file) { + paramlist <- RcppTOML::parseTOML(param_file) + validate_paramlist(paramlist) + + flat_paramlist <- c( + paramlist$continuous_distribution_parameters, + paramlist$timescale, + paramlist$infection_process, + paramlist$hospital_admission_observation_process, + paramlist$wastewater_observation_process + ) + + ## this preserves the dataframe structure + params <- as.data.frame(flat_paramlist) + + return(params) +} + +#' Validate a parameter list +#' +#' @param paramlist The parameter list to validate +#' +#' @return the parameter list, on success, +#' or raise an error +#' @export +validate_paramlist <- function(paramlist) { + expected_sections <- c( + "continuous_distribution_parameters", + "hospital_admission_observation_process", + "infection_process", + "timescale", + "wastewater_observation_process" + ) + + + missing_sections <- setdiff( + names(paramlist), + expected_sections + ) + + if (length(missing_sections) > 0) { + cli::cli_abort( + paste0( + "Parameter list missing expected ", + "section(s) {missing_sections}" + ) + ) + } + + ## additional validation logic can go here + return(paramlist) +} diff --git a/man/drop_first_and_renormalize.Rd b/man/drop_first_and_renormalize.Rd new file mode 100644 index 00000000..ffd998e4 --- /dev/null +++ b/man/drop_first_and_renormalize.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{drop_first_and_renormalize} +\alias{drop_first_and_renormalize} +\title{Drop the first element of a simplex} +\usage{ +drop_first_and_renormalize(x) +} +\arguments{ +\item{x}{A numeric vector, sums to 1. Corresponds to a discretized PDF or PMF +(usually the GI distribution).} +} +\value{ +A numeric vector, sums to 1. +} +\description{ +When this vector corresponds to the generation interval distribution, we +want to drop this first bin. The renewal equation assumes that same-day +infection and onward transmission does not occur, and we assume +everything is 1 indexed not 0 indeced. We need to +manually drop the first element from the PMF vector. +} +\examples{ +pmf_orig <- c(0.1, 0.1, 0.1, 0.7) +pmf_trunc <- drop_first_and_renormalize(pmf_orig) +pmf_trunc +0.11, 0.11, 0.777 +} diff --git a/man/generate_simulated_data.Rd b/man/generate_simulated_data.Rd index 4a601d1e..9d1739bf 100644 --- a/man/generate_simulated_data.Rd +++ b/man/generate_simulated_data.Rd @@ -5,18 +5,16 @@ \title{Generate simulated data from the underlying model's generative process} \usage{ generate_simulated_data( - site_level_inf_dynamics = TRUE, - site_level_conc_dynamics = FALSE, r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16), n_sites = 4, ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000), pop_size = 1e+06, - n_lab_sites = 5, - map_site_to_lab = c(1, 1, 2, 3, 4), + n_lab_sites = 4, + map_site_to_lab = c(1, 2, 3, 4), ot = 90, nt = 9, - forecast_time = 28, - sim_start_date = ymd("2023-10-30"), + forecast_horizon = 28, + sim_start_date = lubridate::ymd("2023-09-01"), hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99)/7, i0_over_n = 5e-04, initial_growth = 1e-04, @@ -33,71 +31,73 @@ generate_simulated_data( ) } \arguments{ -\item{site_level_inf_dynamics}{if TRUE then the toy data has variation in the -site-level R(t), if FALSE, assumes same underlying R(t) for the state as in -each site} +\item{r_in_weeks}{vector indcating the mean weekly R(t) that drives infection +dynamics at the state-level. This gets jittered with random noise to add +week-to-week variation.} -\item{site_level_conc_dynamics}{if TRUE then the toy data has variation in the -site-level concentration each day, if FALSE, then the relationship from infection -to concentration in each site is the same across sites} +\item{n_sites}{integer indicating the number of sites} -\item{r_in_weeks}{The mean weekly R(t) that drives infection dynamics at the state- -level. This gets jittered with random noise to add week-to-week variation.} +\item{ww_pop_sites}{vector indicating the population size in the +catchment area in each of those sites (order must match)} -\item{n_sites}{Number of sites} +\item{pop_size}{integer indicating the population size in the hypothetical +state} -\item{ww_pop_sites}{Catchment area in each of those sites (order must match)} +\item{n_lab_sites}{integer indicating the nummber of unique combinations of +labs and sites. Must be greater than or equal to \code{n_sites}} -\item{pop_size}{Population size in the state} - -\item{n_lab_sites}{NUmber of unique combinations of labs and sites. Must be -greater than or equal to \code{n_sites}} - -\item{map_site_to_lab}{Vector mapping the sites to the lab-sites in order +\item{map_site_to_lab}{vector mapping the sites to the lab-sites in order of the sites} -\item{ot}{observed time: length of hospital admissions calibration time in days} +\item{ot}{integer indicating the observed time: length of hospital admissions +calibration time in days} -\item{nt}{nowcast time: length of time between last hospital admissions date -and forecast date in days} +\item{nt}{integer indicating the nowcast time: length of time between last +hospital admissions date and forecast date in days} -\item{forecast_time}{duration of the forecast in days e.g. 28 days} +\item{forecast_horizon}{integer indicating the duration of the forecast in +days e.g. 28 days} -\item{sim_start_date}{the start date of the simulation, used to get a weekday -vector} +\item{sim_start_date}{character string formatted as "YYYY-MM-DD" indicating +the start date of the simulation, used to get a weekday vector} -\item{hosp_wday_effect}{a simplex of length 7 describing how the hospital -admissions are spread out over a week, starting at Monday = 1} +\item{hosp_wday_effect}{a vector that is a simplex of length 7 describing +how the hospital admissions are spread out over a week, starting at +Monday = 1} -\item{i0_over_n}{the initial per capita infections in the state} +\item{i0_over_n}{float between 0 and 1 indicating the initial per capita +infections in the state} -\item{initial_growth}{exponential growth rate during the unobserved time} +\item{initial_growth}{float indicating the exponential growth rate in +infections (daily) during the unobserved time} -\item{sd_in_lab_level_multiplier}{standard deviation in the log of the site- -lab level multiplier determining how much variation there is systematically -in site-labs from the state mean} +\item{sd_in_lab_level_multiplier}{float indicating the standard deviation in +the log of the site-lab level multiplier determining how much variation +there is systematically in site-labs from the state mean} -\item{mean_obs_error_in_ww_lab_site}{mean day to day variation in observed -wastewater concentrations across all lab-sites} +\item{mean_obs_error_in_ww_lab_site}{float indicating the mean day-to-day +variation in observed wastewater concentrations across all lab-sites} -\item{mean_reporting_freq}{mean frequency of wastewater measurements across -sites in per day (e.g. 1/7 is once per week)} +\item{mean_reporting_freq}{float indicating the mean frequency of wastewater +measurements across sites in per day (e.g. 1/7 is once per week)} -\item{sd_reporting_freq}{standard deviation in the frequency of wastewater -measurements across sites} +\item{sd_reporting_freq}{float indicating the standard deviation in the +frequency of wastewater measurements across sites} -\item{mean_reporting_latency}{mean time from forecast date to last -wastewater sample collection date, across sites} +\item{mean_reporting_latency}{float indicating the mean time from forecast +date to last wastewater sample collection date, across sites} -\item{sd_reporting_latency}{standard deviation in the time from the forecast -date to the last wastewater sample collection date, across sites} +\item{sd_reporting_latency}{float indicating the standard deviation in the +time from the forecast date to the last wastewater sample collection date, +across sites} -\item{mean_log_lod}{mean log of the LOD in each lab-site} +\item{mean_log_lod}{float indicating the mean log of the LOD in each lab-site} -\item{sd_log_lod}{standard deviation in the log of the LOD across sites} +\item{sd_log_lod}{float indicating the standard deviation in the log of the +LOD across sites} -\item{example_params_path}{path to the toml file with the parameters to use to -generate the simulated data} +\item{input_params_path}{path to the toml file with the parameters to use +to generate the simulated data} } \value{ a list containing two dataframes. hosp_data is a dataframe containing @@ -113,8 +113,10 @@ and parameters to generate from. \examples{ # Generate a simulated dataset from a hypothetical state with 6 sites and 2 # different labs -sim_data <- generate_simulated_data(n_sites = 6, - map_site_to_lab = c(rep(1,4), rep(2,2)) +sim_data <- generate_simulated_data( + n_sites = 6, + map_site_to_lab = c(rep(1, 4), rep(2, 2)) +) hosp_data <- sim_data$hosp_data ww_data <- sim_data$ww_data } diff --git a/man/get_params.Rd b/man/get_params.Rd new file mode 100644 index 00000000..a8c21c7e --- /dev/null +++ b/man/get_params.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_params.R +\name{get_params} +\alias{get_params} +\title{Get parameters for model run} +\usage{ +get_params(param_file) +} +\arguments{ +\item{param_file}{Path to a \code{.toml} file defining +parameter values.} +} +\value{ +a dataframe with numeric values for parameter values passed to the +model +} +\description{ +Get parameters for model run +} +\examples{ +params <- get_params("input/params.toml") +} diff --git a/man/make_hospital_onset_delay_pmf.Rd b/man/make_hospital_onset_delay_pmf.Rd new file mode 100644 index 00000000..702cce5d --- /dev/null +++ b/man/make_hospital_onset_delay_pmf.Rd @@ -0,0 +1,32 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{make_hospital_onset_delay_pmf} +\alias{make_hospital_onset_delay_pmf} +\title{Make hospital onset delay pmf} +\usage{ +make_hospital_onset_delay_pmf( + neg_binom_mu = 6.98665, + neg_binom_size = 2.490848 +) +} +\arguments{ +\item{neg_binom_mu}{float indicating the mean of the negative binomial shaped +delay from symptom onset to hospital admissions, default is \code{6.98665} from +fit to data in above paper} + +\item{neg_binom_size}{float indicating the dispersion parameter in the +negative binomial delay from symptom onset to hospital admissions, default +is \code{2.490848} from fit to data in above paper} +} +\value{ +pmf of distribution from symptom onset to hospital admission +} +\description{ +Uses the parameter estimates from cfa-parameter-estimates, +which is based on Danache et al linelist data from symptom onset to hospital +admission. See below: +https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 +} +\examples{ +delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +} diff --git a/man/make_incubation_period_pmf.Rd b/man/make_incubation_period_pmf.Rd new file mode 100644 index 00000000..847de80d --- /dev/null +++ b/man/make_incubation_period_pmf.Rd @@ -0,0 +1,35 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{make_incubation_period_pmf} +\alias{make_incubation_period_pmf} +\title{Make incubation period pmf} +\usage{ +make_incubation_period_pmf( + backward_scale = 3.6, + backward_shape = 1.5, + r = 0.15 +) +} +\arguments{ +\item{backward_scale}{numeric indicating the scale parameter for the Weibull +used in producing the incubation period distribution. default is \code{3.60} for +COVID} + +\item{backward_shape}{numeric indicating the shape parameter for the Weibull +used in producing the incubation period distribution, default is \code{1.50} for +COVID} + +\item{r}{numeric indicating the exponential rate used in producing the +correction on the incubaion period distribution, default is \code{0.15} for COVID} +} +\value{ +pmf of incubation period +} +\description{ +This makes a pmf corresponding to +the incubation period for COVID after Omicron used in Park et al 2023 +These estimates are from early Omicron. +} +\examples{ +inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +} diff --git a/man/simulate_double_censored_pmf.Rd b/man/simulate_double_censored_pmf.Rd new file mode 100644 index 00000000..7d12f025 --- /dev/null +++ b/man/simulate_double_censored_pmf.Rd @@ -0,0 +1,53 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{simulate_double_censored_pmf} +\alias{simulate_double_censored_pmf} +\title{Simulate daily double censored PMF. From {epinowcast}: +https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html} +\usage{ +simulate_double_censored_pmf( + max, + fun_primary = stats::runif, + primary_args = list(), + fun_dist = stats::rlnorm, + dist_args = list(...), + n = 1e+06, + ... +) +} +\arguments{ +\item{max}{Maximum value for the computed CDF. If not specified, the maximum +value is the maximum simulated delay.} + +\item{fun_primary}{Primary distribution function (default is \code{runif}).} + +\item{primary_args}{List of additional arguments to be passed to the primary +distribution function.} + +\item{fun_dist}{Distribution function to be added to the primary (default is +\code{rlnorm}).} + +\item{dist_args}{List of additional arguments to be passed to the +distribution function.} + +\item{n}{Number of simulations (default is 1e6).} + +\item{...}{Additional arguments to be passed to the distribution function. +This is an alternative to \code{dist_args}.} +} +\value{ +A numeric vector representing the PMF. +} +\description{ +This function simulates the probability mass function of a daily +double-censored process. The process involves two distributions: a primary +distribution which represents the censoring process for the primary event +and another distribution (which is offset by the primary). +} +\details{ +Based off of: +https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1 +} +\examples{ +simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) +} diff --git a/man/validate_paramlist.Rd b/man/validate_paramlist.Rd new file mode 100644 index 00000000..b4307d55 --- /dev/null +++ b/man/validate_paramlist.Rd @@ -0,0 +1,18 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_params.R +\name{validate_paramlist} +\alias{validate_paramlist} +\title{Validate a parameter list} +\usage{ +validate_paramlist(paramlist) +} +\arguments{ +\item{paramlist}{The parameter list to validate} +} +\value{ +the parameter list, on success, +or raise an error +} +\description{ +Validate a parameter list +} diff --git a/man/wwinference-package.Rd b/man/wwinference-package.Rd new file mode 100644 index 00000000..e2495c64 --- /dev/null +++ b/man/wwinference-package.Rd @@ -0,0 +1,39 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wwinference-package.R +\docType{package} +\name{wwinference-package} +\alias{wwinference} +\alias{wwinference-package} +\title{wwinference: Jointly infers infection dynamics from wastewater data and epidemiological indicators} +\description{ +An implementation of a hierarchical semi-mechanistic renewal approach jointly calibrating to multiple wastewater concentrations datasets from subsets of a specified population and epidemioliogical indicators such as cases or hospital admissions from the whole population. Our framework is an extension of the widely used semi-mechanistic renewal framework EpiNow2, using a Bayesian latent variable approach implemented in the probabilistic programming language Stan. This package contains just the core components needed to fit these two data sources and produce the following outputs-- estimated and forecasted hospital admissions, estimated and forecasted wastewater concentrations, global R(t) estimates, local R(t) estimates for the subpopulations represented by each wastewater catchment area. +} +\seealso{ +Useful links: +\itemize{ + \item \url{https://github.com/cdcgov/ww-inference-model/} + \item \url{https://cdcgov.github.io/ww-inference-model/} + \item Report bugs at \url{https://github.com/cdcgov/ww-inference-model/issues/} +} + +} +\author{ +\strong{Maintainer}: Kaitlyn Johnson \email{uox1@cdc.gov} (\href{https://orcid.org/0000-0001-8011-0012}{ORCID}) + +Authors: +\itemize{ + \item Sam Abbott \email{contact@samabbott.co.uk} (\href{https://orcid.org/0000-0001-8057-8037}{ORCID}) + \item Zachary Susswein \email{utb2@cdc.gov} + \item Andrew Magee \email{rzg0@cdc.gov} + \item Dylan Morris \email{dylan@dylanhmorris.com} (\href{https://orcid.org/0000-0002-3655-406X}{ORCID}) + \item Scott Olesen \email{ulp7@cdc.gov} + \item Damon Bayer \email{xum8@cdc.gov} +} + +Other contributors: +\itemize{ + \item George Vega Yon \email{g.vegayon@gmail.com} (\href{https://orcid.org/0000-0002-3171-0844}{ORCID}) [contributor] +} + +} +\keyword{internal} From fcad57ae92e4fd98f2fd56570d86b4468a30df0d Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 28 Jun 2024 17:05:42 -0400 Subject: [PATCH 023/103] add functions needed for generating simulated data --- NAMESPACE | 2 + R/delay_distribs.R | 30 +++++ R/generate_simulated_data.R | 178 ++++++++++++++-------------- R/utils.R | 26 ++++ man/drop_first_and_renormalize.Rd | 2 - man/get_ind_m.Rd | 23 ++++ man/make_reporting_delay_pmf.Rd | 28 +++++ man/simulate_double_censored_pmf.Rd | 2 +- 8 files changed, 199 insertions(+), 92 deletions(-) create mode 100644 man/get_ind_m.Rd create mode 100644 man/make_reporting_delay_pmf.Rd diff --git a/NAMESPACE b/NAMESPACE index bef8f0cc..ea435bb8 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -3,9 +3,11 @@ export(add_pmfs) export(drop_first_and_renormalize) export(generate_simulated_data) +export(get_ind_m) export(get_params) export(make_hospital_onset_delay_pmf) export(make_incubation_period_pmf) +export(make_reporting_delay_pmf) export(simulate_double_censored_pmf) export(validate_paramlist) importFrom(cmdstanr,cmdstan_model) diff --git a/R/delay_distribs.R b/R/delay_distribs.R index 6b1b14ca..72559cb5 100644 --- a/R/delay_distribs.R +++ b/R/delay_distribs.R @@ -153,3 +153,33 @@ make_hospital_onset_delay_pmf <- function(neg_binom_mu = 6.98665, return(hosp_onset_delay_pmf) } + +#' @title Make reporting delay pmf +#' @description +#' Convolve the incubation period pmf with the symptom to hospital admission pmf +#' and normalize +#' +#' @param incubation_period_pmf a numeric vector, sums to 1, indicating +#' the probability of time from infection to symptom onset +#' @param hospital_onset_delay_pmf a numeric vector, sums to 1, indicating the +#' proabbility of time from symptom onset to hospital admissions +#' +#' @return convolution of incubation period and sympton onset to hospital +#' admission pmf +#' @export +#' +#' @examples +#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +#' hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +#' inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) +make_reporting_delay_pmf <- function(incubation_period_pmf, + hospital_onset_delay_pmf) { + pmfs <- list( + "incubation_period" = incubation_period_pmf, + "hosp_onset_delay" = hospital_onset_delay_pmf + ) + + infection_to_hosp_delay_pmf <- add_pmfs(pmfs) |> + (\(x) x / sum(x))() + return(infection_to_hosp_delay_pmf) +} diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 86a28dad..21db05a6 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -231,86 +231,90 @@ generate_simulated_data <- function(r_in_weeks = # nolint backward_scale, backward_shape, r ) sym_to_hosp <- make_hospital_onset_delay_pmf(neg_binom_mu, neg_binom_size) + + # Infection to hospital admissions delay distribution inf_to_hosp <- make_reporting_delay_pmf(inc, sym_to_hosp) - # shedding kinetics delay distribution + + # Shedding kinetics delay distribution vl_trajectory <- model$functions$get_vl_trajectory( t_peak_mean, viral_peak_mean, duration_shedding_mean, gt_max ) - # Generate the state level weekly R(t) before infection feedback------------- + # Generate the state level weekly R(t) before infection feedback + # Adds a bit of noise, can add more... unadj_r_weeks <- (r_in_weeks * rnorm(length(r_in_weeks), 1, 0.03))[1:n_weeks] + # Convert to daily for input into renewal equation ind_m <- get_ind_m(ot + ht, n_weeks) unadj_r <- ind_m %*% unadj_r_weeks # Generate the site-level expected observed concentrations ----------------- - # first by adding - # variation to the site-level R(t) in each site, and then by adding time - # varying deviations in true concentration in each site, and then adding a - # site level true variability, and then adding lab-site level multiplier and - # obersvation error + # first by adding variation to the site-level R(t) in each site, + # and then adding lab-site level multiplier and observation error ### Generate the site level infection dynamics------------------------------- new_i_over_n_site <- matrix(nrow = n_sites + 1, ncol = (uot + ot + ht)) r_site <- matrix(nrow = n_sites + 1, ncol = (ot + ht)) # Generate site-level R(t) - if (isTRUE(site_level_inf_dynamics)) { - log_r_state_week <- log(unadj_r_weeks) - log_r_site <- matrix(nrow = n_sites + 1, ncol = n_weeks) - initial_growth_site <- vector(length = n_sites + 1) - log_i0_over_n_site <- vector(length = n_sites + 1) - for (i in 1:(n_sites + 1)) { - if (i <= n_sites) { - log_r_site[i, ] <- rnorm( - n = n_weeks, - mean = log_r_state_week, - sd = 0.05 - ) # sigma_rt - initial_growth_site[i] <- rnorm( - n = 1, mean = initial_growth, - sd = initial_growth_prior_sd - ) - log_i0_over_n_site[i] <- rnorm( - n = 1, mean = log_i0_over_n, - sd = 0.5 - ) - } else { - log_r_site[i, ] <- log_r_state_week - initial_growth_site[i] <- initial_growth - log_i0_over_n_site[i] <- log_i0_over_n - } - } - - new_i_over_n <- rep(0, (uot + ot + ht)) - for (i in 1:(n_sites + 1)) { - unadj_r_site <- ind_m %*% exp(log_r_site[i, ]) # daily R site - site_output <- model$functions$generate_infections( - unadj_r_site, uot, rev(generation_interval), log_i0_over_n_site[i], - initial_growth_site[i], ht, - infection_feedback, infection_feedback_rev_pmf - ) - new_i_over_n_site[i, ] <- site_output[[1]] - new_i_over_n <- new_i_over_n + pop_fraction[i] * site_output[[1]] - r_site[i, ] <- site_output[[2]] - } - } else { # site level R(t) and infections = state level R(t) and infections - for (i in 1:n_sites) { - new_i_over_n_site[i, ] <- new_i_over_n - r_site[i, ] <- rt - } + log_r_state_week <- log(unadj_r_weeks) + log_r_site <- matrix(nrow = n_sites + 1, ncol = n_weeks) + initial_growth_site <- vector(length = n_sites + 1) + log_i0_over_n_site <- vector(length = n_sites + 1) + for (i in 1:(n_sites + 1)) { + # This creates each R(t) vector for each subpopulation, by sampling + # from a normal distribution centered on the state R(t). + # In the model, this is an AR(1) process based on the previous deviation + log_r_site[i, ] <- rnorm( + n = n_weeks, + mean = log_r_state_week, + sd = 0.05 + ) # sigma_rt + + # Generate deviations in the initial growth rate and initial incidence + initial_growth_site[i] <- rnorm( + n = 1, mean = initial_growth, + sd = initial_growth_prior_sd + ) + # This is I0/N at the first unobserved time + log_i0_over_n_site[i] <- rnorm( + n = 1, mean = log_i0_over_n, + sd = 0.5 + ) } + + new_i_over_n <- rep(0, (uot + ot + ht)) # State infections + for (i in 1:(n_sites + 1)) { + unadj_r_site <- ind_m %*% exp(log_r_site[i, ]) # daily R site + + site_output <- model$functions$generate_infections( + unadj_r_site, # Daily unadjusted R(t) in each site + uot, # the duration of initialization time for exponential growth + rev(generation_interval), # the reversed generation interval + log_i0_over_n_site[i], # log of the initial infections per capita + initial_growth_site[i], # initial exponential growth rate + ht, # time after last observed hospital admission + infection_feedback, # binary indicating whether or not inf feedback + infection_feedback_rev_pmf # inf feedback delay pmf + ) + # matrix to hold infections + new_i_over_n_site[i, ] <- site_output[[1]] + # Cumulatively sum infections to get overall state infections + new_i_over_n <- new_i_over_n + pop_fraction[i] * site_output[[1]] + # Adjusted R(t) estimate in each site + r_site[i, ] <- site_output[[2]] + } + + # State adjusted R(t) = I(t)/convolve(I(t), g(t)) rt <- (new_i_over_n / model$functions$convolve_dot_product( new_i_over_n, rev(generation_interval), uot + ot + ht ))[(uot + 1):(uot + ot + ht)] - # Generate expected state level hospitalizations from subpop infections----- - # generate state-level incident infections using renewal equation for - # all time points + # Generate expected state level hospitalizations from subpop infections ----- # Generate a time varying P(hosp|infection), p_hosp_int_logit <- qlogis(p_hosp_mean) # p_hosp_mean is in linear scale @@ -330,16 +334,18 @@ generate_simulated_data <- function(r_in_weeks = # nolint p_hosp_w_logit ) # convert to days p_hosp_days <- plogis(p_hosp_logit_days) # convert back to linear scale - # Corresponds to a standard deviation in linear scale of 0.0003 + # Get expected trajectory of hospital admissions from incident infections # by convolving scaled incident infections with delay from infection to # hospital admission model_hosp_over_n <- model$functions$convolve_dot_product( - p_hosp_days * new_i_over_n, + p_hosp_days * new_i_over_n, # individuals who will be hospitalized rev(inf_to_hosp), uot + ot + ht )[(uot + 1):(uot + ot + ht)] + # only care about hospital admission in observed time, but need uot infections + exp_hosp <- pop_size * model$functions$day_of_week_effect( model_hosp_over_n, day_of_week_vector, @@ -357,79 +363,73 @@ generate_simulated_data <- function(r_in_weeks = # nolint log_g_over_n_site <- matrix(nrow = n_sites, ncol = (ot + ht)) for (i in 1:n_sites) { + # Convolve infections with shedding kinetics model_net_i <- model$functions$convolve_dot_product( new_i_over_n_site[i, ], rev(vl_trajectory), (uot + ot + ht) )[(uot + 1):(uot + ot + ht)] + # Scale by average genomes shed per infection log_g_over_n_site[i, ] <- log(10) * log10_g_prior_mean + log(model_net_i + 1e-8) } - ## Generate site-level true genomes ------------------------------------ - # with site multiplier and time-varying deviation - - log_exp_g_over_n_site <- matrix(nrow = n_sites, ncol = (ot + ht)) - - for (i in 1:n_sites) { - if (isFALSE(site_level_conc_dynamics)) { - log_exp_g_over_n_site[i, ] <- log_g_over_n_site[i, ] - } else { - log_exp_g_over_n_site[i, ] <- log_g_over_n_site[i, ] + - rnorm( - n = (ot + ht), mean = 0, - sd = 0.01 - ) # sigma_log_conc prior - } - } # Add on site-lab-level observation error ----------------------------------- log_obs_g_over_n_lab_site <- matrix(nrow = n_lab_sites, ncol = (ot + ht)) for (i in 1:n_lab_sites) { - log_g_w_multiplier <- log_exp_g_over_n_site[map_site_to_lab[i], ] + - log_m_lab_sites[i] + log_g_w_multiplier <- log_g_over_n_site[map_site_to_lab[i], ] + + log_m_lab_sites[i] # Add site level multiplier in log scale log_obs_g_over_n_lab_site[i, ] <- log_g_w_multiplier + rnorm( n = (ot + ht), mean = 0, sd = sigma_ww_lab_site[i] - ) + ) # + add observation error in log scale } # Sample from some lab-sites more frequently than others and add different # latencies for each lab-site log_obs_conc_lab_site <- matrix(nrow = n_lab_sites, ncol = ot + ht) for (i in 1:n_lab_sites) { + # Get the indices where we observe concentrations st <- sample(1:(ot + nt), round((ot + nt) * lab_site_reporting_freq[i])) + # cut off end based on latency stl <- pmin((ot + nt - lab_site_reporting_latency[i]), st) + # Calculate log concentration for the days that we have observations log_obs_conc_lab_site[i, stl] <- log_obs_g_over_n_lab_site[i, stl] - log(ml_of_ww_per_person_day) } # Format the data----------------------------------------------------------- - df_long <- as.data.frame(t(log_obs_conc_lab_site)) %>% - dplyr::mutate(t = 1:(ot + ht)) %>% + ww_data <- as.data.frame(t(log_obs_conc_lab_site)) |> + dplyr::mutate(t = 1:(ot + ht)) |> tidyr::pivot_longer(!t, - names_to = "lab_wwtp_unique_id", + names_to = "lab_site", names_prefix = "V", values_to = "log_conc" - ) %>% + ) |> dplyr::mutate( - lab_wwtp_unique_id = as.integer(lab_wwtp_unique_id) - ) %>% - dplyr::left_join(date_df, by = "t") %>% + lab_site = as.integer(lab_site) + ) |> + dplyr::left_join(date_df, by = "t") |> + dplyr::left_join(site_lab_map, + by = c("lab_site") + ) |> dplyr::left_join( data.frame( lab_site = 1:n_lab_sites, lod_sewage = lod_lab_site ), - by = c("lab_wwtp_unique_id" = "lab_site") - ) %>% - dplyr::mutate(below_LOD = ifelse(log_conc >= lod_sewage, 0, 1)) %>% - dplyr::mutate(lod_sewage = case_when( - is.na(log_conc) ~ NA, - !is.na(log_conc) ~ lod_sewage - )) + by = c("lab_site") + ) |> # Remove below LOD values + dplyr::mutate( + lod_sewage = + dplyr::case_when( + is.na(log_conc) ~ NA, + !is.na(log_conc) ~ lod_sewage + ) + ) # Make a hospital admissions dataframe to bind to df_hosp <- data.frame( diff --git a/R/utils.R b/R/utils.R index 9e0ff95d..7f2af596 100644 --- a/R/utils.R +++ b/R/utils.R @@ -51,3 +51,29 @@ add_pmfs <- function(pmfs) { }) ) } + +#' @title Get index matrix +#' @description Get the matrix needed to convert a vetor from weekly to daily +#' @param n_days number of days we will expand to +#' @param n_weeks number of weeks those days correspond to +#' +#' @return a n_day x n_week matrix for multiplying by weekly estimated +#' value to conver it to daily +#' @export +#' +#' @examples +#' ind_m <- get_ind_m(14, 2) +get_ind_m <- function(n_days, n_weeks) { + ind_m <- matrix(nrow = n_days, ncol = n_weeks) + for (i in 1:n_days) { + for (j in 1:n_weeks) { + if (((i - 1) %/% 7) + 1 == j) { + ind_m[i, j] <- 1 + } else { + ind_m[i, j] <- 0 + } + } + } + + return(ind_m) +} diff --git a/man/drop_first_and_renormalize.Rd b/man/drop_first_and_renormalize.Rd index ffd998e4..6fc16218 100644 --- a/man/drop_first_and_renormalize.Rd +++ b/man/drop_first_and_renormalize.Rd @@ -23,6 +23,4 @@ manually drop the first element from the PMF vector. \examples{ pmf_orig <- c(0.1, 0.1, 0.1, 0.7) pmf_trunc <- drop_first_and_renormalize(pmf_orig) -pmf_trunc -0.11, 0.11, 0.777 } diff --git a/man/get_ind_m.Rd b/man/get_ind_m.Rd new file mode 100644 index 00000000..e8127388 --- /dev/null +++ b/man/get_ind_m.Rd @@ -0,0 +1,23 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{get_ind_m} +\alias{get_ind_m} +\title{Get index matrix} +\usage{ +get_ind_m(n_days, n_weeks) +} +\arguments{ +\item{n_days}{number of days we will expand to} + +\item{n_weeks}{number of weeks those days correspond to} +} +\value{ +a n_day x n_week matrix for multiplying by weekly estimated +value to conver it to daily +} +\description{ +Get the matrix needed to convert a vetor from weekly to daily +} +\examples{ +ind_m <- get_ind_m(14, 2) +} diff --git a/man/make_reporting_delay_pmf.Rd b/man/make_reporting_delay_pmf.Rd new file mode 100644 index 00000000..9d7605be --- /dev/null +++ b/man/make_reporting_delay_pmf.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{make_reporting_delay_pmf} +\alias{make_reporting_delay_pmf} +\title{Make reporting delay pmf} +\usage{ +make_reporting_delay_pmf(incubation_period_pmf, hospital_onset_delay_pmf) +} +\arguments{ +\item{incubation_period_pmf}{a numeric vector, sums to 1, indicating +the probability of time from infection to symptom onset} + +\item{hospital_onset_delay_pmf}{a numeric vector, sums to 1, indicating the +proabbility of time from symptom onset to hospital admissions} +} +\value{ +convolution of incubation period and sympton onset to hospital +admission pmf +} +\description{ +Convolve the incubation period pmf with the symptom to hospital admission pmf +and normalize +} +\examples{ +inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) +} diff --git a/man/simulate_double_censored_pmf.Rd b/man/simulate_double_censored_pmf.Rd index 7d12f025..5127a495 100644 --- a/man/simulate_double_censored_pmf.Rd +++ b/man/simulate_double_censored_pmf.Rd @@ -3,7 +3,7 @@ \name{simulate_double_censored_pmf} \alias{simulate_double_censored_pmf} \title{Simulate daily double censored PMF. From {epinowcast}: -https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html} +https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html #nolint} \usage{ simulate_double_censored_pmf( max, From e858a0301c7fa5ebf554f331873b833fbefb2e10 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Sat, 29 Jun 2024 14:14:26 -0400 Subject: [PATCH 024/103] add simulated data as package data --- DESCRIPTION | 1 + R/generate_simulated_data.R | 140 ++++++++++++++++----------------- data-raw/vignette_data.R | 2 + data/hosp_data.rda | Bin 0 -> 466 bytes data/hosp_data_eval.rda | Bin 0 -> 558 bytes data/ww_data.rda | Bin 0 -> 1258 bytes man/generate_simulated_data.Rd | 20 +++-- vignettes/wwinference.Rmd | 60 ++++++++++++-- 8 files changed, 136 insertions(+), 87 deletions(-) create mode 100644 data/hosp_data.rda create mode 100644 data/hosp_data_eval.rda create mode 100644 data/ww_data.rda diff --git a/DESCRIPTION b/DESCRIPTION index 62bd51f5..6db2c948 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -62,3 +62,4 @@ RoxygenNote: 7.3.1 Suggests: testthat (>= 3.0.0) Config/testthat/edition: 3 +LazyData: true diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 21db05a6..37ba142a 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -12,10 +12,10 @@ #' catchment area in each of those sites (order must match) #' @param pop_size integer indicating the population size in the hypothetical #' state -#' @param n_lab_sites integer indicating the nummber of unique combinations of -#' labs and sites. Must be greater than or equal to `n_sites` -#' @param map_site_to_lab vector mapping the sites to the lab-sites in order -#' of the sites +#' @param site vector of integers indicating which site (WWTP) each separate +#' lab-site observation comes frm +#' @param lab vector of integers indicating which lab the lab-site observations +#' come from #' @param ot integer indicating the observed time: length of hospital admissions #' calibration time in days #' @param nt integer indicating the nowcast time: length of time between last @@ -51,8 +51,12 @@ #' @param input_params_path path to the toml file with the parameters to use #' to generate the simulated data #' -#' @return a list containing two dataframes. hosp_data is a dataframe containing -#' the number of daily hospital admissions by day for a theoretical US state. +#' @return a list containing three dataframes. hosp_data is a dataframe +#' containing the number of daily hospital admissions by day for a theoretical +#' US state, for the duration of the specified calibration period. +#' hosp_data_eval is a dataframe containing the number of daily hospital +#' admissions by day for a theoretical US state, for the entire evaluation +#' period. #' ww_data is a dataframe containing the measured wastewater concentrations #' in each site alongside other metadata necessary for modeling that data. #' @export @@ -74,8 +78,8 @@ generate_simulated_data <- function(r_in_weeks = # nolint n_sites = 4, ww_pop_sites = c(4e5, 2e5, 1e5, 5e4), pop_size = 1e6, - n_lab_sites = 4, - map_site_to_lab = c(1, 2, 3, 4), + site = c(1, 1, 2, 3, 4), + lab = c(1, 2, 3, 3, 3), ot = 90, nt = 9, forecast_horizon = 28, @@ -112,17 +116,8 @@ generate_simulated_data <- function(r_in_weeks = # nolint pop_size > sum(ww_pop_sites) ) stopifnot( - "Insufficient population sizes provided for wastewater catchment areas" = - length(ww_pop_sites) >= n_sites - ) - - stopifnot( - "Insufficient number of lab-site combinations provided" = - n_lab_sites >= n_sites - ) - stopifnot( - "Mapping from sites to lab-sites not provided" = - length(map_site_to_lab) == n_lab_sites + "Site and lab indices don't align" = + length(site) == length(lab) ) @@ -152,11 +147,19 @@ generate_simulated_data <- function(r_in_weeks = # nolint } # Create a tibble that maps sites, labs, and population sizes of the sites + n_sites <- length(unique(site)) site_lab_map <- data.frame( - lab_site = 1:n_lab_sites, - site = map_site_to_lab + site, + lab ) |> - dplyr::left_join(data.frame(site = 1:n_sites, ww_pop = ww_pop_sites)) + dplyr::mutate( + lab_site = dplyr::row_number() + ) |> + dplyr::left_join(data.frame( + site = 1:n_sites, + ww_pop = ww_pop_sites + )) + n_lab_sites <- nrow(site_lab_map) # Define some time variables ht <- nt + forecast_horizon @@ -414,7 +417,7 @@ generate_simulated_data <- function(r_in_weeks = # nolint ) |> dplyr::left_join(date_df, by = "t") |> dplyr::left_join(site_lab_map, - by = c("lab_site") + by = "lab_site" ) |> dplyr::left_join( data.frame( @@ -429,62 +432,53 @@ generate_simulated_data <- function(r_in_weeks = # nolint is.na(log_conc) ~ NA, !is.na(log_conc) ~ lod_sewage ) + ) |> + dplyr::mutate( + genome_copies_per_ml = exp(log_conc), + lod = exp(lod_sewage) + ) |> + dplyr::filter(!is.na(genome_copies_per_ml)) |> + dplyr::rename(site_pop = ww_pop) |> + dplyr::arrange(site, lab, date) |> + dplyr::select(date, site, lab, genome_copies_per_ml, lod, site_pop) + + # Make a hospital admissions dataframe for model calibration + hosp_data <- data.frame( + t = 1:ot, + daily_hosp_admits = exp_obs_hosp[1:ot], + state_pop = pop_size + ) |> + dplyr::left_join( + date_df, + by = "t" + ) |> + dplyr::select( + date, + daily_hosp_admits, + state_pop ) - # Make a hospital admissions dataframe to bind to - df_hosp <- data.frame( - t = 1:(ot + ht), - daily_hosp_admits = c(exp_obs_hosp[1:ot], rep(NA, ht)), - daily_hosp_admits_for_eval = exp_obs_hosp - ) - - # State infections per capita - df_inf <- data.frame( + # Make another one for model evaluation + hosp_data_eval <- data.frame( t = 1:(ot + ht), - inf_per_capita = new_i_over_n[(uot + 1):(uot + ot + ht)] - ) - - example_df <- df_long %>% - dplyr::left_join(df_hosp, - by = "t" - ) %>% - dplyr::mutate( - pop = pop_size, - forecast_date = forecast_date, - hosp_calibration_time = ot - ) %>% - dplyr::left_join(site_lab_map, - by = c("lab_wwtp_unique_id" = "lab_site") - ) %>% - dplyr::left_join(df_inf, + daily_hosp_admits_for_eval = exp_obs_hosp, + state_pop = pop_size + ) |> + dplyr::left_join( + date_df, by = "t" + ) |> + dplyr::select( + date, + daily_hosp_admits_for_eval, + state_pop ) - - - # Get the true parameter dataframe, making sure this is formatted the same as - # as the output from get_full_param_distrib() - p_hosp_df <- data.frame( - name = "p_hosp", true_value = p_hosp_days, - index_rows = NA, - index_cols = seq_along(p_hosp_days) - ) - r_df <- data.frame( - name = "rt", true_value = rt, - index_rows = NA, - index_cols = seq_along(rt) - ) - log10_g_df <- data.frame( - name = "log10_g", true_value = log10_g_prior_mean, - index_rows = NA, - index_cols = NA + example_data <- list( + ww_data = ww_data, + hosp_data = hosp_data, + hosp_data_eval = hosp_data_eval ) - param_df <- rbind(p_hosp_df, r_df, log10_g_df) - - toy_data_and_params <- list( - param_df = param_df, - example_df = example_df - ) - return(toy_data_and_params) + return(example_data) } diff --git a/data-raw/vignette_data.R b/data-raw/vignette_data.R index 346b5bc3..26317682 100644 --- a/data-raw/vignette_data.R +++ b/data-raw/vignette_data.R @@ -2,6 +2,8 @@ set.seed(1) simulated_data <- wwinference::generate_simulated_data() hosp_data <- simulated_data$hosp_data ww_data <- simulated_data$ww_data +hosp_data_eval <- simulated_data$hosp_data_eval usethis::use_data(hosp_data, overwrite = TRUE) +usethis::use_data(hosp_data_eval, overwrite = TRUE) usethis::use_data(ww_data, overwrite = TRUE) diff --git a/data/hosp_data.rda b/data/hosp_data.rda new file mode 100644 index 0000000000000000000000000000000000000000..7f551268f21c639841cb811a0feb855409f27f4f GIT binary patch literal 466 zcmZ>Y%CIzaj8qGbls(aKoPqhz|7-t>Yh1$r|NroWK_Fm#`ke(X42%p83>*v&j0YGm z9^84u#A_BC2jl*$DhzBh8UA1O{~D6P!0fo^IWvFd46vE?K^i5?%f~G&Mz(}6EpQ@VWSimGtdz{!NV%b?$uschlF5J%s@V59{wb>i!9Eah|qA!f3beMBRm~ zj-4z{6Qx`upO06jZ_6;qJ6M5uMm31r;dm^CM7Nb{@Wqa**P4Bi> zx!Z2r3JaRLTnI|wJnF$TiRF}M!<1jIZZtBna0nM|a)oRGH)%;|U~^ o8Y%CIzaj8qGbY)O-!!N8jR|Jc9kn2_-Q|NsAC5O7$Zes{qW1_uTX1_#CiOu;cS zyNf z!q&o;g)vXLo@yuutvt8lRj}3Nt*g7`toHpWzj1QfJBE8HPJzMKy86PmPMTVlpp=^z z^+2|F(E^tVf4;`X77LcRx>r7ax&LsYbCr@>jM2^&>>KR^1=;H^Zqe;L(OD%R%A~^R z)v#4yk>b6Ou0od^os&PW)?|9=Fo~hGk%L)DX8x-$3JkIhhYFav9*pvhjo>P%i!k2?i3xq6D7pMT@^F*gu! z;Sf-8Xk=o6QeZJRH>cMtLNonhYxmUbPCNg-es%GZy&r2=2VKuHZC_jdTx{=}s4cs# zLuCAgSG@i1)7o{-Xn$r-_ZQQd{?|J9uH2?_ciE$}Ve{fTJXI!no2pC#Ql2W)KQJ&p ltZHCj0J3L2frve96W3a&z#GoP^}lnfCRvqp&1P{#Kg&f0$|W45r_;!Mwo$+ z+G#RjG{ni84GD~cNrszL!eKL0Mnll5jWTEe2ATk9001-^00001pa1{^LqGsB0MGyc z0002c004S`07OLwng9TsQ%0L01Z^V$Gf9NN005e4fJ^`Y0%BrdCMEy?005aZzzATC zG|7Mnr|C+brXi80BTS4VBLXmuG+<3KX|y27!Zb9%2r?Qp10klGGHL1zCMJvo#4rM5 zQznd<1R6ABQ_%uRDiM*9k&&jFG}Az6&@{+028M$}Kr{h}8Zjx`b&^1z z*;nP9A!RwvK!gvIGZrVemk>=b)Xan0ePB<*OoSldxeh>ugW);pkwk%zDJo-xpGFG; z0hFBch{^X7UVsyfLqyOzB?4Gf1QIY1v>ByKn|shmWKuw_E^Y{88Zi&%jS?z5la2z8 z949N<#fk_4+v1i2B6oJ9CXM|5U0_avnBJ1Q8DqF^E4R36TLM zsRFJ7VP%*mn3J@-&AWEhrdWnRM>#{Z8!f{DG=YvAd#Bt`Fi$e--+1(ye4KDdG3$!d z$^vFU5jRpYc5-rVHcw1x=H}m|L74!+GbEABf0Y2Sq>n4U;G6?XmRo(vmN$j?yLj&s zim94@bQUBrQYMokKTgd((N4|IRz)z4T{<%CvvblP{gf#%A?m!9+fThrisVE+M$fps zl)wNW00cl006+jh0DuI{12P$bm|-NHN7jQtQ&G7nw6`o9-nMdNADuMsF_W(plmS6^X1M%}I*?UTZOFoa>#vv(2 zNbNe*7z}N;1css!oTtOb%0@E2Fu* z=dAD&v7?m!AOHdUAUJRrIfnhjI*5kKLt2nP0L?41#7qfR`58h01`RWn=7}o-5Pbe|6gngWsmaboC@Bv)q2NG? zEy5(S6N}JYWiy&yIzSYDWf6jFJhueMm$_(L+w4IIhI~4r59<;b9i{>X5=9z;7DK-T z&>TXDmtG4Wb&?Iz02>1UJ^USU`$GZ(mqK+6rt`}!Q;37oRAv9NyBbPV7|+$~sDP^N zkQn(K`qTZAZP$9w^zK@ih=`wieaYJ9$!6Ow`3_9~ZxcQ?005Y3pM8bQO@Zi>P`s)_ zKv8C6^PQ`uOZ}0o$RTkONWj1I=+K~XQb;q?v@%cpLuty_#egYLpe;c0++HM}IJ$I~ U)N6zUr2X-CBvXY61<+9%05%R8G5`Po literal 0 HcmV?d00001 diff --git a/man/generate_simulated_data.Rd b/man/generate_simulated_data.Rd index 9d1739bf..d5d890d0 100644 --- a/man/generate_simulated_data.Rd +++ b/man/generate_simulated_data.Rd @@ -9,8 +9,8 @@ generate_simulated_data( n_sites = 4, ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000), pop_size = 1e+06, - n_lab_sites = 4, - map_site_to_lab = c(1, 2, 3, 4), + site = c(1, 1, 2, 3, 4), + lab = c(1, 2, 3, 3, 3), ot = 90, nt = 9, forecast_horizon = 28, @@ -43,11 +43,11 @@ catchment area in each of those sites (order must match)} \item{pop_size}{integer indicating the population size in the hypothetical state} -\item{n_lab_sites}{integer indicating the nummber of unique combinations of -labs and sites. Must be greater than or equal to \code{n_sites}} +\item{site}{vector of integers indicating which site (WWTP) each separate +lab-site observation comes frm} -\item{map_site_to_lab}{vector mapping the sites to the lab-sites in order -of the sites} +\item{lab}{vector of integers indicating which lab the lab-site observations +come from} \item{ot}{integer indicating the observed time: length of hospital admissions calibration time in days} @@ -100,8 +100,12 @@ LOD across sites} to generate the simulated data} } \value{ -a list containing two dataframes. hosp_data is a dataframe containing -the number of daily hospital admissions by day for a theoretical US state. +a list containing three dataframes. hosp_data is a dataframe +containing the number of daily hospital admissions by day for a theoretical +US state, for the duration of the specified calibration period. +hosp_data_eval is a dataframe containing the number of daily hospital +admissions by day for a theoretical US state, for the entire evaluation +period. ww_data is a dataframe containing the measured wastewater concentrations in each site alongside other metadata necessary for modeling that data. } diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index bdd26fae..af14a9c5 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -73,12 +73,6 @@ hosp_data <- wwinference::hosp_data ww_data <- wwinference::ww_data ``` -We'll make some plots of the data just to make sure it looks like what we'd expect: - -```{r} -# Add plots -``` - # Pre-processing We will need to set some metadata to facilitate model specification. These include: - the date on which we are making the forecast @@ -88,9 +82,63 @@ We will need to set some metadata to facilitate model specification. These inclu - specification of the delay from infection to hospital admissions for COVID-19 - setting the priors and parameters, some of which are COVID-19 specific ```{r} +# Get the parameters params <- get_params( system.file("extdata", "example_params.toml", package = "wwinference" ) ) + +# Pre-process the wastewater dataset +ww_data <- ww_data |> + dplyr::left_join( + ww_data |> + dplyr::distinct(lab, site) |> + dplyr::mutate( + lab_site = dplyr::row_number() + ), + by = c("lab", "site") + ) |> + dplyr::mutate(lab_site_name = glue::glue("Site: {site}, Lab: {lab}")) + +# Pre-process the hospital admissions dataset +``` + +We'll make some plots of the data just to make sure it looks like what we'd expect: + +```{r} +# Add plots +ggplot(ww_data) + + geom_point( + aes( + x = date, y = genome_copies_per_ml, + color = as.factor(lab_site_name) + ), + show.legend = FALSE + ) + + geom_point( + data = ww_data |> filter(genome_copies_per_ml <= lod), + aes(x = date, y = genome_copies_per_ml, color = "red"), + show.legend = FALSE + ) + + geom_hline(aes(yintercept = lod), linetype = "dashed") + + facet_wrap(~lab_site_name, scales = "free") + + xlab("") + + ylab("Genome copies/mL") + + ggtitle("Lab-site level wastewater concentration") + + theme_bw() + +ggplot(hosp_data) + + geom_point( + data = hosp_data_eval, aes( + x = date, + y = daily_hosp_admits_for_eval + ), + shape = 21, color = "black", fill = "white" + ) + + geom_point(aes(x = date, y = daily_hosp_admits)) + + xlab("") + + ylab("Daily hospital admissions") + + ggtitle("State level hospital admissions") + + theme_bw() ``` From 5a91ec87ad9a88027a7110f14d4328135c598d85 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Sat, 29 Jun 2024 15:33:36 -0400 Subject: [PATCH 025/103] add some initial pre-processing functions from other packages --- R/get_stan_data.R | 335 ++++++++++++++++++++++++++++++++++++++ R/preprocessing.R | 152 +++++++++++++++++ vignettes/wwinference.Rmd | 1 + 3 files changed, 488 insertions(+) create mode 100644 R/get_stan_data.R create mode 100644 R/preprocessing.R diff --git a/R/get_stan_data.R b/R/get_stan_data.R new file mode 100644 index 00000000..aaea9342 --- /dev/null +++ b/R/get_stan_data.R @@ -0,0 +1,335 @@ +#' Get stan data +#' +#' @param model_type string indicating which model we are getting data for +#' Options are `ww` or `hosp` +#' @param forecast_date string indicating the forecast date +#' @param forecast_time integer indicating the number of days to make a forecast +#' for +#' @param calibration_time integer indicating the max duration in days that +#' the model is calibrated to hospital admissions for +#' @param input_ww_data a dataframe with the input wastewater data +#' @param input_hosp_data a dataframe with the input hospital admissions data +#' @param generation_interval a vector with a zero-truncated normalized pmf of +#' the generation interval +#' @param inf_to_hosp a vector with a normalized pmf of the delay from infection +#' to hospital admissions +#' @param infection_feedback_pmf a vector with a normalized pmf dictating the +#' delay of infection feedback +#' @param params a dataframe of parameter names and numeric values +#' @param compute_likelihood indicator variable telling stan whether or not to +#' compute the likelihood, default = `1` +#' @param ww_outlier_col_name A string representing the name of the +#' column in the input_ww_data that provides a 0 if the data point is not an +#' outlier to be excluded from the model fit, or a 1 if it is to be excluded +#' default value is `flag_as_ww_outlier` +#' @param lod_col_name A string representing the name of the +#' column in the input_ww_data that provides a 0 if the data point is not above +#' the LOD and a 1 if the data is below the LOD, default value is `below_LOD` +#' @param ww_measurement_col_name A string representing the name of the column +#' in the input_ww_data that indicates the wastewater measurement value in +#' natural scale, default is `ww` +#' @param ww_value_lod_col_name A string representing the name of the column +#' in the input_ww_data that indicates the value of the LOD in natural scale, +#' default is `lod_sewage` +#' @param hosp_value_col_name A string representing the name of the column +#' in the input_hosp-data that indicates the number of daily hospital +#' admissions, default is `daily_hosp_admits` +#' +#' @return a list named variables to pass to stan +#' @export +get_stan_data <- function(model_type, + forecast_date, + forecast_time, + calibration_time, + input_ww_data, + input_hosp_data, + generation_interval, + inf_to_hosp, + infection_feedback_pmf, + params, + compute_likelihood = 1, + ww_outlier_col_name = "flag_as_ww_outlier", + lod_col_name = "below_LOD", + ww_measurement_col_name = "ww", + ww_value_lod_col_name = "lod_sewage", + hosp_value_col_name = "daily_hosp_admits") { + # Assign parameter names + par_names <- colnames(params) + for (i in seq_along(par_names)) { + assign(par_names[i], as.double(params[i])) + } + + # Indicator variable whether or not to include ww in likelihood + include_ww <- ifelse(model_type == "ww", 1, 0) + + last_hosp_data_date <- get_last_hosp_data_date(input_hosp_data) + + # Get state pop + pop <- input_hosp_data |> + dplyr::select(pop) |> + unique() |> + dplyr::pull(pop) + + stopifnot( + "More than one population size in training data" = + length(pop) == 1 + ) + + + if (include_ww == 1) { + # Test for presence of column names + stopifnot( + "Outlier column name isn't present in input dataset" = + ww_outlier_col_name %in% colnames(input_ww_data) + ) + + # Test to see if ww_data_present + ww_data_present <- nrow(input_ww_data) != 0 + if (ww_data_present == FALSE) { + message("No wastewater data present") + } + + # Filter out wastewater outliers and arrange data for indexing + ww_data <- input_ww_data |> + dplyr::filter({{ ww_outlier_col_name }} != 1) |> + dplyr::arrange(date, site_index) + + ww_data_sizes <- get_ww_data_sizes( + ww_data, + lod_col_name + ) + ww_indices <- get_ww_data_indices(ww_data, + input_hosp_data, + owt = ww_data_sizes$owt, + lod_col_name = lod_col_name + ) + ww_values <- get_ww_values( + ww_data, + ww_measurement_col_name, + ww_value_lod_col_name, + ) + + stopifnot( + "Wastewater sampled times not equal to length of input ww data" = + length(ww_indices$ww_sampled_times) == ww_data_sizes$owt + ) + + + message( + "Prop of population size covered by wastewater: ", + sum(ww_values$pop_ww) / pop + ) + + # Logic to determine the number of subpopulations to estimate R(t) for: + # First determine if we need to add an additional subpopulation + add_auxiliary_site <- ifelse(pop >= sum(ww_values$pop_ww), TRUE, FALSE) + # Then get the number of subpopulations, the population to normalize by + # (sum of the subpopulations), and the vector of sizes of each subpopulation + subpop_data <- get_subpop_data(add_auxiliary_site, + state_pop = pop, + pop_ww = ww_values$pop_ww, + n_ww_sites = ww_data_sizes$n_ww_sites + ) + } else { # Hospital admissions only model) + # Still need to specify wastewater input data, so set as 0s. Won't get + # used by stan to compute the likelihood. None of these will be used. + owt <- 1 + ww_sampled_times <- c(1) + log_conc <- c(1) + } + + # Get the remaining things needed for both models + hosp_data <- add_time_indexing(input_hosp_data) + hosp_data_sizes <- get_hosp_data_sizes( + input_hosp_data = hosp_data, + forecast_date = forecast_date, + forecast_time = forecast_time, + calibration_time = calibration_time, + last_hosp_data_date = last_hosp_data_date, + uot = uot, + hosp_value_col_name = hosp_value_col_name + ) + hosp_indices <- get_hosp_indices(hosp_data) + hosp_values <- get_hosp_values( + hosp_data, + ot = hosp_data_sizes$ot, + ht = hosp_data_sizes$ht, + hosp_value_col_name + ) + + if (include_ww == 1) { + message( + "Removed ", nrow(input_ww_data) - ww_data_sizes$owt, + " outliers from WW data" + ) + } + + # matrix to transform IHR from weekly to daily + ind_m <- get_ind_m( + hosp_data_sizes$ot + hosp_data_sizes$ht, + hosp_data_sizes$n_weeks + ) + # matrix to transform p_hosp RW from weekly to daily + p_hosp_m <- get_ind_m( + uot + hosp_data_sizes$ot + hosp_data_sizes$ht, + hosp_data_sizes$tot_weeks + ) + + # Estimate of number of initial infections + i0 <- mean(hosp_values$hosp_admits[1:7], na.rm = TRUE) / p_hosp_mean + + # package up parameters for stan data object + viral_shedding_pars <- c( + t_peak_mean, t_peak_sd, viral_peak_mean, viral_peak_sd, + duration_shedding_mean, duration_shedding_sd + ) + + hosp_delay_max <- length(inf_to_hosp) + + if (model_type == "ww") { + data_renewal <- list( + gt_max = gt_max, + hosp_delay_max = hosp_delay_max, + inf_to_hosp = inf_to_hosp, + dur_inf = dur_inf, + mwpd = ml_of_ww_per_person_day, + ot = hosp_data_sizes$ot, + n_subpops = subpop_data$n_subpops, + n_ww_sites = ww_data_sizes$n_ww_sites, + n_ww_lab_sites = ww_data_sizes$n_ww_lab_sites, + owt = ww_data_sizes$owt, + oht = hosp_data_sizes$oht, + n_censored = ww_data_sizes$n_censored, + n_uncensored = ww_data_sizes$n_uncensored, + uot = uot, + ht = hosp_data_sizes$ht, + n_weeks = hosp_data_sizes$n_weeks, + ind_m = ind_m, + tot_weeks = hosp_data_sizes$tot_weeks, + p_hosp_m = p_hosp_m, + generation_interval = generation_interval, + ts = 1:gt_max, + state_pop = pop, + subpop_size = subpop_data$subpop_size, + norm_pop = subpop_data$norm_pop, + ww_sampled_times = ww_indices$ww_sampled_times, + hosp_times = hosp_indices$hosp_times, + ww_sampled_lab_sites = ww_indices$ww_sampled_lab_sites, + ww_log_lod = ww_values$ww_lod, + ww_censored = ww_indices$ww_censored, + ww_uncensored = ww_indices$ww_uncensored, + hosp = hosp_values$hosp_admits, + day_of_week = hosp_values$day_of_week, + log_conc = ww_values$log_conc, + compute_likelihood = compute_likelihood, + include_ww = include_ww, + include_hosp = 1, + if_l = length(infection_feedback_pmf), + infection_feedback_pmf = infection_feedback_pmf, + # All the priors! + viral_shedding_pars = viral_shedding_pars, # tpeak, viral peak, dur_shed + autoreg_rt_a = autoreg_rt_a, + autoreg_rt_b = autoreg_rt_b, + autoreg_rt_site_a = autoreg_rt_site_a, + autoreg_rt_site_b = autoreg_rt_site_b, + autoreg_p_hosp_a = autoreg_p_hosp_a, + autoreg_p_hosp_b = autoreg_p_hosp_b, + inv_sqrt_phi_prior_mean = inv_sqrt_phi_prior_mean, + inv_sqrt_phi_prior_sd = inv_sqrt_phi_prior_sd, + r_prior_mean = r_prior_mean, + r_prior_sd = r_prior_sd, + log10_g_prior_mean = log10_g_prior_mean, + log10_g_prior_sd = log10_g_prior_sd, + i0_over_n_prior_a = 1 + i0_certainty * (i0 / pop), + i0_over_n_prior_b = 1 + i0_certainty * (1 - (i0 / pop)), + wday_effect_prior_mean = wday_effect_prior_mean, + wday_effect_prior_sd = wday_effect_prior_sd, + initial_growth_prior_mean = initial_growth_prior_mean, + initial_growth_prior_sd = initial_growth_prior_sd, + sigma_ww_site_prior_mean_mean = sigma_ww_site_prior_mean_mean, + sigma_ww_site_prior_mean_sd = sigma_ww_site_prior_mean_sd, + sigma_ww_site_prior_sd_mean = sigma_ww_site_prior_sd_mean, + sigma_ww_site_prior_sd_sd = sigma_ww_site_prior_sd_sd, + eta_sd_sd = eta_sd_sd, + sigma_i0_prior_mode = sigma_i0_prior_mode, + sigma_i0_prior_sd = sigma_i0_prior_sd, + p_hosp_prior_mean = p_hosp_mean, + p_hosp_sd_logit = p_hosp_sd_logit, + p_hosp_w_sd_sd = p_hosp_w_sd_sd, + ww_site_mod_sd_sd = ww_site_mod_sd_sd, + inf_feedback_prior_logmean = infection_feedback_prior_logmean, + inf_feedback_prior_logsd = infection_feedback_prior_logsd, + sigma_rt_prior = sigma_rt_prior, + log_phi_g_prior_mean = log_phi_g_prior_mean, + log_phi_g_prior_sd = log_phi_g_prior_sd, + ww_sampled_sites = ww_indices$ww_sampled_sites, + lab_site_to_site_map = ww_indices$lab_site_to_site_map + ) + } else if (model_type == "hosp") { + data_renewal <- list( + gt_max = gt_max, + hosp_delay_max = hosp_delay_max, + inf_to_hosp = inf_to_hosp, + dur_inf = dur_inf, # this is used bc drift + mwpd = ml_of_ww_per_person_day, + ot = hosp_data_sizes$ot, + owt = owt, + oht = hosp_data_sizes$oht, + uot = uot, + ht = hosp_data_sizes$ht, + n_weeks = hosp_data_sizes$n_weeks, + ind_m = ind_m, + tot_weeks = hosp_data_sizes$tot_weeks, + p_hosp_m = p_hosp_m, + generation_interval = generation_interval, + ts = 1:gt_max, + n = pop, + hosp_times = hosp_indices$hosp_times, + ww_sampled_times = ww_sampled_times, + hosp = hosp_values$hosp_admits, + day_of_week = hosp_values$day_of_week, + log_conc = log_conc, + compute_likelihood = compute_likelihood, + include_ww = include_ww, + include_hosp = 1, + if_l = length(infection_feedback_pmf), + infection_feedback_pmf = infection_feedback_pmf, + # Priors + viral_shedding_pars = viral_shedding_pars, # tpeak, viral peak, + # duration shedding + autoreg_rt_a = autoreg_rt_a, + autoreg_rt_b = autoreg_rt_b, + autoreg_p_hosp_a = autoreg_p_hosp_a, + autoreg_p_hosp_b = autoreg_p_hosp_b, + inv_sqrt_phi_prior_mean = inv_sqrt_phi_prior_mean, + inv_sqrt_phi_prior_sd = inv_sqrt_phi_prior_sd, + r_prior_mean = r_prior_mean, + r_prior_sd = r_prior_sd, + log10_g_prior_mean = log10_g_prior_mean, + log10_g_prior_sd = log10_g_prior_sd, + i0_over_n_prior_a = 1 + i0_certainty * (i0 / pop), + i0_over_n_prior_b = 1 + i0_certainty * (1 - (i0 / pop)), + wday_effect_prior_mean = wday_effect_prior_mean, + wday_effect_prior_sd = wday_effect_prior_sd, + initial_growth_prior_mean = initial_growth_prior_mean, + initial_growth_prior_sd = initial_growth_prior_sd, + sigma_ww_prior_mean = sigma_ww_site_prior_mean_mean, + eta_sd_sd = eta_sd_sd, + p_hosp_prior_mean = p_hosp_mean, + p_hosp_sd_logit = p_hosp_sd_logit, + p_hosp_w_sd_sd = p_hosp_w_sd_sd, + inf_feedback_prior_logmean = infection_feedback_prior_logmean, + inf_feedback_prior_logsd = infection_feedback_prior_logsd + ) + } else { + cli::cli_abort("Unknown model") + data_renewal <- list() + } + + stopifnot( + "Model type not specified properly" = + !purrr::is_empty(data_renewal) + ) + + return(data_renewal) +} diff --git a/R/preprocessing.R b/R/preprocessing.R new file mode 100644 index 00000000..d8ed1428 --- /dev/null +++ b/R/preprocessing.R @@ -0,0 +1,152 @@ +#' Get input wastewater data +#' @param ww_data +#' @param conc_col_name name of the column containing the concentration +#' measurements in the wastewater data, default is `genome_copies_per_ml` +#' @param forecast_date The forecast date for this iteration, +#' formatted as a character string in IS08601 format (YYYY-MM-DD). +#' @param calibration_time The duration of the model calibration period +#' (relative to the last hospital admissions data point) in units of +#' model timesteps (typically days). +#' @param last_hosp_data_date A date indicating the date of last reported +#' hospital admission as of the forecast date +#' +#' @return a dataframe containing the transformed and clean NWSS data +#' at the site and lab label for the forecast date and location specified +#' @export +preprocess_ww_data <- function(ww_data, + conc_col_name = "genome_copies_per_ml", + forecast_date, + calibration_time, + last_hosp_data_date, + ww_data_mapping) { + # Add some columns + ww_data <- ww_data |> + dplyr::left_join( + ww_data |> + dplyr::distinct(lab, site) |> + dplyr::mutate( + lab_site = dplyr::row_number() + ), + by = c("lab", "site") + ) |> + dplyr::mutate( + lab_site_name = glue::glue("Site: {site}, Lab: {lab}"), + below_lod = ifelse({{ conc_col_name }} < lod, 1, 0) + ) + + # Get extra columns that identify wastewater outliers + ww_w_outliers <- flag_ww_outliers(ww_data) |> + select( + date, location, ww, site, lab, lab_wwtp_unique_id, + ww_pop, below_LOD, lod_sewage, flag_as_ww_outlier + ) + # If more than one location, than this data isn't being used for fitting + # And we don't wanto generate these + if (length(location_i) == 1) { + site_map <- ww_w_outliers |> + distinct(site) |> + mutate(site_index = row_number()) + site_lab_map <- ww_w_outliers |> + distinct(lab_wwtp_unique_id) |> + mutate(lab_site_index = row_number()) + + ww <- ww_w_outliers |> + left_join(site_map, by = "site") |> + left_join(site_lab_map, by = "lab_wwtp_unique_id") + } else { + ww <- ww_w_outliers + } + + + return(ww) +} + + + +#' Flag WW outliers +#' +#' @param ww_data data at the lab-site level of WW concentrations +#' @param rho_threshold z-score threshold for "jump" +#' @param log_conc_threshold z-score threshold for log concentration +#' @param threshold_n_dps min number of data points above the LOD per lab-site +#' +#' @return ww_data + columns for outlier flagging +#' @export +#' +#' @examples +flag_ww_outliers <- function(ww_data, + rho_threshold = 2, + log_conc_threshold = 3, + threshold_n_dps = 1) { + n_dps <- ww_data |> + dplyr::filter(below_LOD == 0) %>% + group_by(lab_wwtp_unique_id) %>% + summarise(n_data_points = n()) + + # Get the ww statistics we need for outlier detection + ww_stats <- ww_data %>% + left_join(n_dps, by = "lab_wwtp_unique_id") %>% + # exclude below LOD from z scoring and remove lab-sites with too + # few data points + dplyr::filter(below_LOD == 0, n_data_points > threshold_n_dps) %>% + group_by(lab_wwtp_unique_id) %>% + arrange(date, "desc") %>% + mutate( + log_conc = log(ww), + prev_log_conc = lag(log_conc, 1), + prev_date = lag(date, 1), + diff_log_conc = log_conc - prev_log_conc, + diff_time = as.numeric(difftime(date, prev_date)), + rho = diff_log_conc / diff_time + ) %>% + select(date, lab_wwtp_unique_id, rho) %>% + distinct() + + # Combine stats with ww data + ww_rho <- ww_data %>% + left_join(ww_stats, by = c("lab_wwtp_unique_id", "date")) + + # compute z scores and flag + ww_z_scored <- ww_rho %>% + dplyr::left_join( + ww_rho %>% + dplyr::group_by(lab_wwtp_unique_id) %>% + dplyr::summarise( + mean_rho = mean(rho, na.rm = TRUE), + std_rho = sd(rho, na.rm = TRUE), + mean_conc = mean(ww, na.rm = TRUE), + std_conc = sd(ww, na.rm = TRUE) + ), + by = "lab_wwtp_unique_id" + ) %>% + dplyr::group_by(lab_wwtp_unique_id) %>% + mutate( + z_score_conc = (ww - mean_conc) / std_conc, + z_score_rho = (rho - mean_rho) / std_rho + ) %>% + dplyr::mutate( + z_score_rho_t_plus_1 = lead(z_score_rho, 1), + flagged_for_removal_conc = dplyr::case_when( + abs(z_score_conc) >= log_conc_threshold ~ 1, + is.na(z_score_conc) ~ 0, + TRUE ~ 0 + ), + flagged_for_removal_rho = dplyr::case_when( + ( + abs(z_score_rho) >= rho_threshold & + (abs(z_score_rho_t_plus_1) >= rho_threshold) & + sign(z_score_rho != sign(z_score_rho_t_plus_1)) + ) ~ 1, + is.na(z_score_rho) ~ NA, + TRUE ~ 0 + ) + ) %>% + dplyr::mutate(flag_as_ww_outlier = dplyr::case_when( + flagged_for_removal_rho == 1 ~ 1, + flagged_for_removal_conc == 1 ~ 1, + TRUE ~ 0 + )) %>% + dplyr::ungroup() + + return(ww_z_scored) +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index af14a9c5..b03ebbe6 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -70,6 +70,7 @@ wastewater catchment area of that site ```{r} hosp_data <- wwinference::hosp_data +hosp_data_eval <- wwinference::hosp_data_eval ww_data <- wwinference::ww_data ``` From 9afd74d0f6a0b0e62605cd5e732bf1a3926d8d67 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Sun, 30 Jun 2024 13:33:49 -0400 Subject: [PATCH 026/103] add functions to preprocess wastewater data --- NAMESPACE | 3 + R/preprocessing.R | 135 ++++++++++++++++++++------------------ man/flag_ww_outliers.Rd | 40 +++++++++++ man/get_stan_data.Rd | 82 +++++++++++++++++++++++ man/preprocess_ww_data.Rd | 32 +++++++++ vignettes/wwinference.Rmd | 30 +++++---- 6 files changed, 246 insertions(+), 76 deletions(-) create mode 100644 man/flag_ww_outliers.Rd create mode 100644 man/get_stan_data.Rd create mode 100644 man/preprocess_ww_data.Rd diff --git a/NAMESPACE b/NAMESPACE index ea435bb8..752eb1fc 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -2,12 +2,15 @@ export(add_pmfs) export(drop_first_and_renormalize) +export(flag_ww_outliers) export(generate_simulated_data) export(get_ind_m) export(get_params) +export(get_stan_data) export(make_hospital_onset_delay_pmf) export(make_incubation_period_pmf) export(make_reporting_delay_pmf) +export(preprocess_ww_data) export(simulate_double_censored_pmf) export(validate_paramlist) importFrom(cmdstanr,cmdstan_model) diff --git a/R/preprocessing.R b/R/preprocessing.R index d8ed1428..10047f99 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -1,24 +1,20 @@ #' Get input wastewater data -#' @param ww_data -#' @param conc_col_name name of the column containing the concentration -#' measurements in the wastewater data, default is `genome_copies_per_ml` +#' @param ww_data dataframe containing the following columns: site, lab, +#' date, a column for concentration, and lod #' @param forecast_date The forecast date for this iteration, #' formatted as a character string in IS08601 format (YYYY-MM-DD). -#' @param calibration_time The duration of the model calibration period -#' (relative to the last hospital admissions data point) in units of -#' model timesteps (typically days). -#' @param last_hosp_data_date A date indicating the date of last reported -#' hospital admission as of the forecast date +#' @param conc_col_name name of the column containing the concentration +#' measurements in the wastewater data, default is `genome_copies_per_ml` #' #' @return a dataframe containing the transformed and clean NWSS data #' at the site and lab label for the forecast date and location specified #' @export +#' +#' @examples +#' ww_data_preprocessed <- preprocess_ww_data(ww_data, "2023-12-01") preprocess_ww_data <- function(ww_data, - conc_col_name = "genome_copies_per_ml", forecast_date, - calibration_time, - last_hosp_data_date, - ww_data_mapping) { + conc_col_name = "genome_copies_per_ml") { # Add some columns ww_data <- ww_data |> dplyr::left_join( @@ -34,96 +30,99 @@ preprocess_ww_data <- function(ww_data, below_lod = ifelse({{ conc_col_name }} < lod, 1, 0) ) - # Get extra columns that identify wastewater outliers + # Get an extra column that identifies the wastewater outliers using the + # default parameters ww_w_outliers <- flag_ww_outliers(ww_data) |> - select( - date, location, ww, site, lab, lab_wwtp_unique_id, - ww_pop, below_LOD, lod_sewage, flag_as_ww_outlier + dplyr::mutate( + forecast_date = !!forecast_date + ) |> + # In case the wastewater data being passed in isn't vintaged, we want to + # make sure we don't include values that are past the forecast date + dplyr::filter( + date < forecast_date ) - # If more than one location, than this data isn't being used for fitting - # And we don't wanto generate these - if (length(location_i) == 1) { - site_map <- ww_w_outliers |> - distinct(site) |> - mutate(site_index = row_number()) - site_lab_map <- ww_w_outliers |> - distinct(lab_wwtp_unique_id) |> - mutate(lab_site_index = row_number()) - ww <- ww_w_outliers |> - left_join(site_map, by = "site") |> - left_join(site_lab_map, by = "lab_wwtp_unique_id") - } else { - ww <- ww_w_outliers - } - - return(ww) + return(ww_w_outliers) } #' Flag WW outliers #' -#' @param ww_data data at the lab-site level of WW concentrations -#' @param rho_threshold z-score threshold for "jump" -#' @param log_conc_threshold z-score threshold for log concentration +#' @param ww_data dataframe containing the following columns: site, lab, +#' lab_site, date, a column for concentration, and below_lod +#' @param conc_col_name string, name of the column containing the concentration +#' measurements in the wastewater data, default is `genome_copies_per_ml` +#' @param rho_threshold float indicating the z-score threshold for "jump" +#' @param log_conc_threshold float indicating the z-score threshold for +#' log concentration #' @param threshold_n_dps min number of data points above the LOD per lab-site #' -#' @return ww_data + columns for outlier flagging +#' @return ww_w_outliers_flaged dataframe containing all of the columns in +#' ww_data input dataframe plus an additional column `flag_as_ww_outlier` +#' which contains a 0 if the datapoint is not an outlier and a 1 if it is +#' an outlier. #' @export #' #' @examples +#' ww_data_outliers_flagged <- flag_ww_outliers(ww_data) flag_ww_outliers <- function(ww_data, + conc_col_name = "genome_copies_per_ml", rho_threshold = 2, log_conc_threshold = 3, threshold_n_dps = 1) { n_dps <- ww_data |> - dplyr::filter(below_LOD == 0) %>% - group_by(lab_wwtp_unique_id) %>% - summarise(n_data_points = n()) + dplyr::filter(below_lod == 0) |> + dplyr::group_by(lab_site) |> + dplyr::summarise(n_data_points = dplyr::n()) # Get the ww statistics we need for outlier detection - ww_stats <- ww_data %>% - left_join(n_dps, by = "lab_wwtp_unique_id") %>% + ww_stats <- ww_data |> + dplyr::left_join(n_dps, + by = "lab_site" + ) |> # exclude below LOD from z scoring and remove lab-sites with too # few data points - dplyr::filter(below_LOD == 0, n_data_points > threshold_n_dps) %>% - group_by(lab_wwtp_unique_id) %>% - arrange(date, "desc") %>% - mutate( - log_conc = log(ww), + dplyr::filter( + below_lod == 0, + n_data_points > threshold_n_dps + ) |> + dplyr::group_by(lab_site) |> + dplyr::arrange(date, "desc") |> + dplyr::mutate( + log_conc = log(!!sym(conc_col_name)), prev_log_conc = lag(log_conc, 1), prev_date = lag(date, 1), diff_log_conc = log_conc - prev_log_conc, diff_time = as.numeric(difftime(date, prev_date)), rho = diff_log_conc / diff_time - ) %>% - select(date, lab_wwtp_unique_id, rho) %>% - distinct() + ) |> + dplyr::select(date, lab_site, rho) |> + dplyr::distinct() # Combine stats with ww data - ww_rho <- ww_data %>% - left_join(ww_stats, by = c("lab_wwtp_unique_id", "date")) + ww_rho <- ww_data |> + left_join(ww_stats, by = c("lab_site", "date")) # compute z scores and flag - ww_z_scored <- ww_rho %>% + ww_z_scored <- ww_rho |> dplyr::left_join( - ww_rho %>% - dplyr::group_by(lab_wwtp_unique_id) %>% + ww_rho |> + dplyr::group_by(lab_site) |> dplyr::summarise( mean_rho = mean(rho, na.rm = TRUE), std_rho = sd(rho, na.rm = TRUE), - mean_conc = mean(ww, na.rm = TRUE), - std_conc = sd(ww, na.rm = TRUE) + mean_conc = mean(!!sym(conc_col_name), na.rm = TRUE), + std_conc = sd(!!sym(conc_col_name), na.rm = TRUE) ), - by = "lab_wwtp_unique_id" - ) %>% - dplyr::group_by(lab_wwtp_unique_id) %>% + by = "lab_site" + ) |> + dplyr::group_by(lab_site) |> mutate( - z_score_conc = (ww - mean_conc) / std_conc, + z_score_conc = (!!sym(conc_col_name) - mean_conc) / std_conc, z_score_rho = (rho - mean_rho) / std_rho - ) %>% + ) |> dplyr::mutate( z_score_rho_t_plus_1 = lead(z_score_rho, 1), flagged_for_removal_conc = dplyr::case_when( @@ -140,13 +139,19 @@ flag_ww_outliers <- function(ww_data, is.na(z_score_rho) ~ NA, TRUE ~ 0 ) - ) %>% + ) |> dplyr::mutate(flag_as_ww_outlier = dplyr::case_when( flagged_for_removal_rho == 1 ~ 1, flagged_for_removal_conc == 1 ~ 1, TRUE ~ 0 - )) %>% + )) |> dplyr::ungroup() - return(ww_z_scored) + ww_w_outliers_flagged <- ww_z_scored |> + dplyr::select( + colnames(ww_data), + flag_as_ww_outlier + ) + + return(ww_w_outliers_flagged) } diff --git a/man/flag_ww_outliers.Rd b/man/flag_ww_outliers.Rd new file mode 100644 index 00000000..af3ef6c4 --- /dev/null +++ b/man/flag_ww_outliers.Rd @@ -0,0 +1,40 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/preprocessing.R +\name{flag_ww_outliers} +\alias{flag_ww_outliers} +\title{Flag WW outliers} +\usage{ +flag_ww_outliers( + ww_data, + conc_col_name = "genome_copies_per_ml", + rho_threshold = 2, + log_conc_threshold = 3, + threshold_n_dps = 1 +) +} +\arguments{ +\item{ww_data}{dataframe containing the following columns: site, lab, +lab_site, date, a column for concentration, and below_lod} + +\item{conc_col_name}{string, name of the column containing the concentration +measurements in the wastewater data, default is \code{genome_copies_per_ml}} + +\item{rho_threshold}{float indicating the z-score threshold for "jump"} + +\item{log_conc_threshold}{float indicating the z-score threshold for +log concentration} + +\item{threshold_n_dps}{min number of data points above the LOD per lab-site} +} +\value{ +ww_w_outliers_flaged dataframe containing all of the columns in +ww_data input dataframe plus an additional column \code{flag_as_ww_outlier} +which contains a 0 if the datapoint is not an outlier and a 1 if it is +an outlier. +} +\description{ +Flag WW outliers +} +\examples{ +ww_data_outliers_flagged <- flag_ww_outliers(ww_data) +} diff --git a/man/get_stan_data.Rd b/man/get_stan_data.Rd new file mode 100644 index 00000000..45b19bb0 --- /dev/null +++ b/man/get_stan_data.Rd @@ -0,0 +1,82 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_stan_data} +\alias{get_stan_data} +\title{Get stan data} +\usage{ +get_stan_data( + model_type, + forecast_date, + forecast_time, + calibration_time, + input_ww_data, + input_hosp_data, + generation_interval, + inf_to_hosp, + infection_feedback_pmf, + params, + compute_likelihood = 1, + ww_outlier_col_name = "flag_as_ww_outlier", + lod_col_name = "below_LOD", + ww_measurement_col_name = "ww", + ww_value_lod_col_name = "lod_sewage", + hosp_value_col_name = "daily_hosp_admits" +) +} +\arguments{ +\item{model_type}{string indicating which model we are getting data for +Options are \code{ww} or \code{hosp}} + +\item{forecast_date}{string indicating the forecast date} + +\item{forecast_time}{integer indicating the number of days to make a forecast +for} + +\item{calibration_time}{integer indicating the max duration in days that +the model is calibrated to hospital admissions for} + +\item{input_ww_data}{a dataframe with the input wastewater data} + +\item{input_hosp_data}{a dataframe with the input hospital admissions data} + +\item{generation_interval}{a vector with a zero-truncated normalized pmf of +the generation interval} + +\item{inf_to_hosp}{a vector with a normalized pmf of the delay from infection +to hospital admissions} + +\item{infection_feedback_pmf}{a vector with a normalized pmf dictating the +delay of infection feedback} + +\item{params}{a dataframe of parameter names and numeric values} + +\item{compute_likelihood}{indicator variable telling stan whether or not to +compute the likelihood, default = \code{1}} + +\item{ww_outlier_col_name}{A string representing the name of the +column in the input_ww_data that provides a 0 if the data point is not an +outlier to be excluded from the model fit, or a 1 if it is to be excluded +default value is \code{flag_as_ww_outlier}} + +\item{lod_col_name}{A string representing the name of the +column in the input_ww_data that provides a 0 if the data point is not above +the LOD and a 1 if the data is below the LOD, default value is \code{below_LOD}} + +\item{ww_measurement_col_name}{A string representing the name of the column +in the input_ww_data that indicates the wastewater measurement value in +natural scale, default is \code{ww}} + +\item{ww_value_lod_col_name}{A string representing the name of the column +in the input_ww_data that indicates the value of the LOD in natural scale, +default is \code{lod_sewage}} + +\item{hosp_value_col_name}{A string representing the name of the column +in the input_hosp-data that indicates the number of daily hospital +admissions, default is \code{daily_hosp_admits}} +} +\value{ +a list named variables to pass to stan +} +\description{ +Get stan data +} diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd new file mode 100644 index 00000000..62bee0aa --- /dev/null +++ b/man/preprocess_ww_data.Rd @@ -0,0 +1,32 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/preprocessing.R +\name{preprocess_ww_data} +\alias{preprocess_ww_data} +\title{Get input wastewater data} +\usage{ +preprocess_ww_data( + ww_data, + forecast_date, + conc_col_name = "genome_copies_per_ml" +) +} +\arguments{ +\item{ww_data}{dataframe containing the following columns: site, lab, +date, a column for concentration, and lod} + +\item{forecast_date}{The forecast date for this iteration, +formatted as a character string in IS08601 format (YYYY-MM-DD).} + +\item{conc_col_name}{name of the column containing the concentration +measurements in the wastewater data, default is \code{genome_copies_per_ml}} +} +\value{ +a dataframe containing the transformed and clean NWSS data +at the site and lab label for the forecast date and location specified +} +\description{ +Get input wastewater data +} +\examples{ +ww_data_preprocessed <- preprocess_ww_data(ww_data, "2023-12-01") +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index b03ebbe6..5095c17d 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -90,17 +90,25 @@ params <- get_params( ) ) -# Pre-process the wastewater dataset -ww_data <- ww_data |> - dplyr::left_join( - ww_data |> - dplyr::distinct(lab, site) |> - dplyr::mutate( - lab_site = dplyr::row_number() - ), - by = c("lab", "site") - ) |> - dplyr::mutate(lab_site_name = glue::glue("Site: {site}, Lab: {lab}")) +# Specify the forecast date. Let's pretend we want to make a forecast on +# 2023-12-07, with our hospital admissions data that is available up until +# 2023-11-29 +forecast_date <- "2023-12-07" + +# Pre-process the wastewater dataset: this function adds a few columns to +# the original wastewater dataset. First, it assigns a unique identifier +# the unique combinations of labs and sites, since this is the unit we will +# use for estimating the observation error in the reported measurements. +# Second it adds a column `below_lod` which is an indicator of whether the +# reported concentration is above or below the lod. If the point is below the +# LOD, the model will treat this observation as censored. +# Third, it adds a column `flag_as_ww_outlier` that indicates whether the +# measurement is identified as an outlier by our algorithm and the default +# thresholds. The user can still choose to include these in the fitting process +# later on. Lastly, it filters the wastewater measurements to dates before the +# forecast date specified above, just in case the wastewater data passed in +# isn't vintaged. +ww_data_preprocessed <- preprocess_ww_data(ww_data, forecast_date) # Pre-process the hospital admissions dataset ``` From f09fee2f7e78b72119cf4efd3ae9df34141adfe5 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Sun, 30 Jun 2024 13:53:49 -0400 Subject: [PATCH 027/103] add minimal hosp data preprocessing --- NAMESPACE | 1 + R/preprocessing.R | 47 +++++++++++++++++++++++++++++++++---- man/preprocess_hosp_data.Rd | 36 ++++++++++++++++++++++++++++ man/preprocess_ww_data.Rd | 2 +- vignettes/wwinference.Rmd | 18 ++++++++++---- 5 files changed, 94 insertions(+), 10 deletions(-) create mode 100644 man/preprocess_hosp_data.Rd diff --git a/NAMESPACE b/NAMESPACE index 752eb1fc..d41f6085 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -10,6 +10,7 @@ export(get_stan_data) export(make_hospital_onset_delay_pmf) export(make_incubation_period_pmf) export(make_reporting_delay_pmf) +export(preprocess_hosp_data) export(preprocess_ww_data) export(simulate_double_censored_pmf) export(validate_paramlist) diff --git a/R/preprocessing.R b/R/preprocessing.R index 10047f99..744afc18 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -6,7 +6,7 @@ #' @param conc_col_name name of the column containing the concentration #' measurements in the wastewater data, default is `genome_copies_per_ml` #' -#' @return a dataframe containing the transformed and clean NWSS data +#' @return a dataframe containing the transformed and clean wastewater data #' at the site and lab label for the forecast date and location specified #' @export #' @@ -32,9 +32,9 @@ preprocess_ww_data <- function(ww_data, # Get an extra column that identifies the wastewater outliers using the # default parameters - ww_w_outliers <- flag_ww_outliers(ww_data) |> + ww_preprocessed <- flag_ww_outliers(ww_data) |> dplyr::mutate( - forecast_date = !!forecast_date + forecast_date = lubridate::ymd(!!forecast_date) ) |> # In case the wastewater data being passed in isn't vintaged, we want to # make sure we don't include values that are past the forecast date @@ -43,9 +43,48 @@ preprocess_ww_data <- function(ww_data, ) - return(ww_w_outliers) + return(ww_preprocessed) } +#' Get input hospital admissions data +#' @param hosp_data dataframe containing the following columns: date, +#' a count column, and a population size column +#' @param forecast_date The forecast date for this iteration, +#' formatted as a character string in IS08601 format (YYYY-MM-DD). +#' @param count_col_name name of the column containing the epidemiological +#' indicator, default is `daily_hosp_admits` +#' @param pop_size_col_name name of the column containing the population size +#' of that the counts are coming from, default is `state_pop` +#' +#' @return a dataframe containing the hospital admissions data renamed to +#' have the following columns `date`, `count`, `total_pop` and `forecast_date` +#' @export +#' +#' @examples +#' hosp_data_preprocessed <- preprocess_hospdata(hosp_data, "2023-12-01") +preprocess_hosp_data <- function(hosp_data, + forecast_date, + count_col_name = "daily_hosp_admits", + pop_size_col_name = "state_pop") { + hosp_data_preprocessed <- hosp_data |> + dplyr::rename( + count = {{ count_col_name }}, + total_pop = {{ pop_size_col_name }} + ) |> + dplyr::mutate( + forecast_date = lubridate::ymd(!!forecast_date) + ) |> + # In case the count data being passed in isn't vintaged, we want to + # make sure we don't include values that are past the forecast date + dplyr::filter( + date < forecast_date + ) + + + return(hosp_data_preprocessed) +} + + #' Flag WW outliers diff --git a/man/preprocess_hosp_data.Rd b/man/preprocess_hosp_data.Rd new file mode 100644 index 00000000..f17abc9a --- /dev/null +++ b/man/preprocess_hosp_data.Rd @@ -0,0 +1,36 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/preprocessing.R +\name{preprocess_hosp_data} +\alias{preprocess_hosp_data} +\title{Get input hospital admissions data} +\usage{ +preprocess_hosp_data( + hosp_data, + forecast_date, + count_col_name = "daily_hosp_admits", + pop_size_col_name = "state_pop" +) +} +\arguments{ +\item{hosp_data}{dataframe containing the following columns: date, +a count column, and a population size column} + +\item{forecast_date}{The forecast date for this iteration, +formatted as a character string in IS08601 format (YYYY-MM-DD).} + +\item{count_col_name}{name of the column containing the epidemiological +indicator, default is \code{daily_hosp_admits}} + +\item{pop_size_col_name}{name of the column containing the population size +of that the counts are coming from, default is \code{state_pop}} +} +\value{ +a dataframe containing the hospital admissions data renamed to +have the following columns \code{date}, \code{count}, \code{total_pop} and \code{forecast_date} +} +\description{ +Get input hospital admissions data +} +\examples{ +hosp_data_preprocessed <- preprocess_hospdata(hosp_data, "2023-12-01") +} diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd index 62bee0aa..db409d10 100644 --- a/man/preprocess_ww_data.Rd +++ b/man/preprocess_ww_data.Rd @@ -21,7 +21,7 @@ formatted as a character string in IS08601 format (YYYY-MM-DD).} measurements in the wastewater data, default is \code{genome_copies_per_ml}} } \value{ -a dataframe containing the transformed and clean NWSS data +a dataframe containing the transformed and clean wastewater data at the site and lab label for the forecast date and location specified } \description{ diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 5095c17d..e3ff5887 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -108,16 +108,23 @@ forecast_date <- "2023-12-07" # later on. Lastly, it filters the wastewater measurements to dates before the # forecast date specified above, just in case the wastewater data passed in # isn't vintaged. -ww_data_preprocessed <- preprocess_ww_data(ww_data, forecast_date) - -# Pre-process the hospital admissions dataset +ww_data_preprocessed <- wwinference::preprocess_ww_data(ww_data, forecast_date) + +# Pre-process the hospital admissions dataset: This function adds a column to +# indicate the forecast date and standardizes the column names. +# Eventually we could implement outlier detection, +# data exclusion here but we will keep as simple as we can for now. +hosp_data_preprocessed <- wwinference::preprocess_hosp_data( + hosp_data, + forecast_date +) ``` We'll make some plots of the data just to make sure it looks like what we'd expect: ```{r} # Add plots -ggplot(ww_data) + +ggplot(ww_data_preprocessed) + geom_point( aes( x = date, y = genome_copies_per_ml, @@ -126,11 +133,12 @@ ggplot(ww_data) + show.legend = FALSE ) + geom_point( - data = ww_data |> filter(genome_copies_per_ml <= lod), + data = ww_data_preprocessed |> filter(genome_copies_per_ml <= lod), aes(x = date, y = genome_copies_per_ml, color = "red"), show.legend = FALSE ) + geom_hline(aes(yintercept = lod), linetype = "dashed") + + geom_vline(aes(xintercept = forecast_date), linetype = "dashed") + facet_wrap(~lab_site_name, scales = "free") + xlab("") + ylab("Genome copies/mL") + From 9f8ad5d94e56f4975e8625dc402f5a0d911f820c Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Sun, 30 Jun 2024 14:23:30 -0400 Subject: [PATCH 028/103] add internal package data for covid GI and delay --- data-raw/covid_pmfs.R | 24 ++++++++++++++++++ data/generation_interval.rda | Bin 0 -> 244 bytes data/inf_to_hosp.rda | Bin 0 -> 625 bytes vignettes/wwinference.Rmd | 46 +++++++++++++++++++++++++---------- 4 files changed, 57 insertions(+), 13 deletions(-) create mode 100644 data-raw/covid_pmfs.R create mode 100644 data/generation_interval.rda create mode 100644 data/inf_to_hosp.rda diff --git a/data-raw/covid_pmfs.R b/data-raw/covid_pmfs.R new file mode 100644 index 00000000..ce8dd1ba --- /dev/null +++ b/data-raw/covid_pmfs.R @@ -0,0 +1,24 @@ +params <- get_params( + system.file("extdata", "example_params.toml", + package = "wwinference" + ) +) + +generation_interval <- withr::with_seed(42, { + wwinference::simulate_double_censored_pmf( + max = params$gt_max, meanlog = params$mu_gi, + sdlog = params$sigma_gi, fun_dist = rlnorm, n = 5e6 + ) |> cfaforecastrenewalww::drop_first_and_renormalize() +}) + +inc <- wwinference::make_incubation_period_pmf( + params$backward_scale, params$backward_shape, params$r +) +sym_to_hosp <- wwinference::make_hospital_onset_delay_pmf( + params$neg_binom_mu, + params$neg_binom_size +) +inf_to_hosp <- wwinference::make_reporting_delay_pmf(inc, sym_to_hosp) + +usethis::use_data(generation_interval, overwrite = TRUE) +usethis::use_data(inf_to_hosp, overwrite = TRUE) diff --git a/data/generation_interval.rda b/data/generation_interval.rda new file mode 100644 index 0000000000000000000000000000000000000000..a6f2eb39d72907979c7bc972a5cbf7680808be81 GIT binary patch literal 244 zcmZ>Y%CIzaj8qGb%vUzyXJF8-|NnR6%BH{r27*h3EdMNeJO6eBgJs5AhCK=@3tAML zWOO1IFieq@Vn{Jz00L$Q240@FW@lep-<7Wf-B&bU0*OKZF9VlxicH~)hSml~ZpEgM zdraKD0uD+cQy$f*9b<`0n!Rnl-s##C1p@2K%D9%X+-+2OwCmh#t0X12hpf7X6dF^F zPJN1(wd=8NRu-Mz@b}cB_ZE}gCRyC+R4nY}FUx4X^IX!CYqEv&r}hTMDv!)rI5K6A<3_nqzAO#U-Co%tvKv0Np%m6Pq0vBL)djW%ED literal 0 HcmV?d00001 diff --git a/data/inf_to_hosp.rda b/data/inf_to_hosp.rda new file mode 100644 index 0000000000000000000000000000000000000000..bf63b3eafda18c3aba8569ab9268bd7bba81e1ba GIT binary patch literal 625 zcmV-%0*?JcT4*^jL0KkKS;x2DY5)Lh|NsC0+u#59xw)^degEJ0+~;rqes`C5`+p<< z{qNp}yZNvITWK^wrU{VH9-~2~fChnxG|*^hGyn!b27m)YKm$ND$PEFY4H*wmX`s^~ zkYO}Ipgllj0LTCV>KQZ!n1(~t8fefN8UPx4fEs8t^%^uX4F*64jQ{}C)BpfzBB!Lt zk5fZU05kvq02&Pd000000Lh?e00000007V$03M(KZ$dev1OchFBH&1i01_m@0dW>x zb{ZgYfmp-{p=0A^7#tS6*v+dO4h?KBA{ zx3%ncJxiuN>A=>i!)+26Eh^=_$hY5@4Du(Pxl9m=8?(vYoNND5&UhPgN;6ogT{k%WWeRF6zK4bD$W7A(*jjcRKG8 zXE-&cgjaunT+qrAkW;XY zIoSUZW Date: Sun, 30 Jun 2024 17:42:07 -0400 Subject: [PATCH 029/103] add functions to create stan data, modify model to be more generally based on count data rather than only hosp --- NAMESPACE | 10 + R/compile_model.R | 91 ++++ R/get_stan_data.R | 864 +++++++++++++++++++++++++----------- R/preprocessing.R | 32 +- R/utils.R | 16 + inst/stan/wwinference.stan | 28 +- man/add_time_indexing.Rd | 27 ++ man/compile_model.Rd | 67 +++ man/create_dir.Rd | 15 + man/flag_ww_outliers.Rd | 2 +- man/get_count_data_sizes.Rd | 52 +++ man/get_count_indices.Rd | 20 + man/get_count_values.Rd | 32 ++ man/get_stan_data.Rd | 69 +-- man/get_subpop_data.Rd | 28 ++ man/get_ww_data_indices.Rd | 40 ++ man/get_ww_data_sizes.Rd | 28 ++ man/get_ww_values.Rd | 45 ++ man/preprocess_ww_data.Rd | 6 +- vignettes/wwinference.Rmd | 54 +++ 20 files changed, 1197 insertions(+), 329 deletions(-) create mode 100644 R/compile_model.R create mode 100644 man/add_time_indexing.Rd create mode 100644 man/compile_model.Rd create mode 100644 man/create_dir.Rd create mode 100644 man/get_count_data_sizes.Rd create mode 100644 man/get_count_indices.Rd create mode 100644 man/get_count_values.Rd create mode 100644 man/get_subpop_data.Rd create mode 100644 man/get_ww_data_indices.Rd create mode 100644 man/get_ww_data_sizes.Rd create mode 100644 man/get_ww_values.Rd diff --git a/NAMESPACE b/NAMESPACE index d41f6085..5f869f8b 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,12 +1,22 @@ # Generated by roxygen2: do not edit by hand export(add_pmfs) +export(add_time_indexing) +export(compile_model) +export(create_dir) export(drop_first_and_renormalize) export(flag_ww_outliers) export(generate_simulated_data) +export(get_count_data_sizes) +export(get_count_indices) +export(get_count_values) export(get_ind_m) export(get_params) export(get_stan_data) +export(get_subpop_data) +export(get_ww_data_indices) +export(get_ww_data_sizes) +export(get_ww_values) export(make_hospital_onset_delay_pmf) export(make_incubation_period_pmf) export(make_reporting_delay_pmf) diff --git a/R/compile_model.R b/R/compile_model.R new file mode 100644 index 00000000..6b883af7 --- /dev/null +++ b/R/compile_model.R @@ -0,0 +1,91 @@ +#' Compile a stan model while pointing at the package default +#' include directory (`stan`) for #include statements +#' +#' @description +#' This function reads in and optionally compiles a Stan model. +#' It is written to search the installed package `stan` directory +#' for additional source files to include. Within each stan file, +#' use #include {path to your file with the `stan` directory}.stan +#' +#' @details The code for this function has been adapted +#' from code written (under an MIT license) as part of +#' the [`epinowcast`](https://github.com/epinowcast/epinowcast) +#' R package. +#' +#' @param model_filepath path to .stan file defining the model +#' @param include_paths path(s) to directories to search for files +#' specified in #include statements. Passed to [cmdstanr::cmdstan_model()]. +#' Defaults to the `stan` subdirectory of the installed +#' `cfaforecastrenewalww` package. +#' @param threads Number of threads to use in model compilation, +#' as an integer. Passed to [cmdstanr::cmdstan_model()]. +#' Default `FALSE` (use single-threaded compilation). +#' @param target_dir Directory in which to save the compiled +#' stan model binary. Passed as the `dir` keyword argument to +#' [cmdstanr::cmdstan_model()]. Defaults to a temporary directory +#' for the R session (the output of [tempdir()]). +#' @param stanc_options Options for the stan compiler passed to +#' [cmdstanr::cmdstan_model()], as a list. See that function's +#' documentation for more details. Default `list()` (use default +#' options). +#' @param cpp_options Options for the C++ compiler passed to +#' [cmdstanr::cmdstan_model()], as a list. See that function's +#' documentation for more details. Default `list()` (use default +#' options). +#' @param verbose Write detailed output to the terminal while +#' executing the function? Boolean, default `TRUE`. +#' @param ... Additional keyword arguments passed to +#' [cmdstanr::cmdstan_model()]. +#' +#' @return The resulting `cmdstanr` model object, as the output +#' of [cmdstanr::cmdstan_model()]. +#' @export +compile_model <- function(model_filepath, + include_paths = system.file( + "stan", + package = "cfaforecastrenewalww" + ), + threads = FALSE, + target_dir = tempdir(), + stanc_options = list(), + cpp_options = list(), + verbose = TRUE, + ...) { + if (verbose) { + cli::cli_inform( + glue::glue(paste0( + "Using model source file: ", + "{model_filepath}" + )) + ) + cli::cli_inform( + sprintf( + "Using include paths: %s", + toString(include_paths) + ) + ) + } + + create_dir(target_dir) + + model <- cmdstanr::cmdstan_model( + model_filepath, + include_paths = include_paths, + compile = TRUE, + stanc_options = stanc_options, + cpp_options = cpp_options, + threads = threads, + dir = target_dir, + ... + ) + + if (verbose) { + cli::cli_inform(paste0( + "Model compiled or loaded successfully; ", + "model executable binary located at: ", + "{model$exe_file()}" + )) + } + + return(model) +} diff --git a/R/get_stan_data.R b/R/get_stan_data.R index aaea9342..cfc8a099 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -1,74 +1,63 @@ -#' Get stan data +#' Get stan data for ww + hosp model #' -#' @param model_type string indicating which model we are getting data for -#' Options are `ww` or `hosp` +#' @param input_count_data a dataframe with the input count data, must have +#' the following columns: date, count, total_pop +#' @param input_ww_data a dataframe with the input wastewater data with no gaps, +#' must have the following columns: date, site, lab, genome_copies_per_ml, +#' site_pop, below_lod, and if removing outliers, flag_as_ww_outlier #' @param forecast_date string indicating the forecast date -#' @param forecast_time integer indicating the number of days to make a forecast -#' for +#' @param forecast_horizon integer indicating the number of days to make a +#' forecast for #' @param calibration_time integer indicating the max duration in days that -#' the model is calibrated to hospital admissions for -#' @param input_ww_data a dataframe with the input wastewater data -#' @param input_hosp_data a dataframe with the input hospital admissions data +#' the model is calibrated to the count data for #' @param generation_interval a vector with a zero-truncated normalized pmf of #' the generation interval -#' @param inf_to_hosp a vector with a normalized pmf of the delay from infection -#' to hospital admissions +#' @param inf_to_count_delay a vector with a normalized pmf of the delay from +#' infection to counts #' @param infection_feedback_pmf a vector with a normalized pmf dictating the #' delay of infection feedback #' @param params a dataframe of parameter names and numeric values +#' @param exclude_ww_outliers boolean indicating whether or not to remove +#' the flagged ww_outliers, default = `TRUE` #' @param compute_likelihood indicator variable telling stan whether or not to #' compute the likelihood, default = `1` -#' @param ww_outlier_col_name A string representing the name of the -#' column in the input_ww_data that provides a 0 if the data point is not an -#' outlier to be excluded from the model fit, or a 1 if it is to be excluded -#' default value is `flag_as_ww_outlier` -#' @param lod_col_name A string representing the name of the -#' column in the input_ww_data that provides a 0 if the data point is not above -#' the LOD and a 1 if the data is below the LOD, default value is `below_LOD` -#' @param ww_measurement_col_name A string representing the name of the column -#' in the input_ww_data that indicates the wastewater measurement value in -#' natural scale, default is `ww` -#' @param ww_value_lod_col_name A string representing the name of the column -#' in the input_ww_data that indicates the value of the LOD in natural scale, -#' default is `lod_sewage` -#' @param hosp_value_col_name A string representing the name of the column -#' in the input_hosp-data that indicates the number of daily hospital -#' admissions, default is `daily_hosp_admits` #' -#' @return a list named variables to pass to stan +#' @return a list of named variables to pass to stan #' @export -get_stan_data <- function(model_type, +get_stan_data <- function(input_count_data, + input_ww_data, forecast_date, - forecast_time, + forecast_horizon, calibration_time, - input_ww_data, - input_hosp_data, generation_interval, - inf_to_hosp, + inf_to_count_delay, infection_feedback_pmf, params, - compute_likelihood = 1, - ww_outlier_col_name = "flag_as_ww_outlier", - lod_col_name = "below_LOD", - ww_measurement_col_name = "ww", - ww_value_lod_col_name = "lod_sewage", - hosp_value_col_name = "daily_hosp_admits") { + exclude_ww_outliers = TRUE, + compute_likelihood = 1) { # Assign parameter names par_names <- colnames(params) for (i in seq_along(par_names)) { assign(par_names[i], as.double(params[i])) } - # Indicator variable whether or not to include ww in likelihood - include_ww <- ifelse(model_type == "ww", 1, 0) + # Get the last date that there were observations of the epidemiological + # indicator (aka cases or hospital admissions counts) + last_count_data_date <- max(input_count_data$date, na.rm = TRUE) + + # Test to see if ww_data_present + ww_data_present <- nrow(input_ww_data) != 0 + if (ww_data_present == FALSE) { + message("No wastewater data present") + } - last_hosp_data_date <- get_last_hosp_data_date(input_hosp_data) - # Get state pop - pop <- input_hosp_data |> - dplyr::select(pop) |> + # Get the total pop, coming from the larger population generating the + # count data + pop <- input_count_data |> + dplyr::select(total_pop) |> unique() |> - dplyr::pull(pop) + dplyr::pull(total_pop) stopifnot( "More than one population size in training data" = @@ -76,107 +65,110 @@ get_stan_data <- function(model_type, ) - if (include_ww == 1) { - # Test for presence of column names + if (isTRUE(exclude_ww_outliers)) { + # Test for presence of needed column names stopifnot( "Outlier column name isn't present in input dataset" = - ww_outlier_col_name %in% colnames(input_ww_data) + "flag_as_ww_outlier" %in% colnames(input_ww_data) ) - # Test to see if ww_data_present - ww_data_present <- nrow(input_ww_data) != 0 - if (ww_data_present == FALSE) { - message("No wastewater data present") - } - # Filter out wastewater outliers and arrange data for indexing ww_data <- input_ww_data |> - dplyr::filter({{ ww_outlier_col_name }} != 1) |> - dplyr::arrange(date, site_index) + dplyr::filter(flag_as_ww_outlier != 1) |> + dplyr::arrange(date, lab_site_index) + } else { + ww_data <- input_ww_data |> + dplyr::arrange(date, lab_site_index) + } - ww_data_sizes <- get_ww_data_sizes( - ww_data, - lod_col_name - ) - ww_indices <- get_ww_data_indices(ww_data, - input_hosp_data, - owt = ww_data_sizes$owt, - lod_col_name = lod_col_name - ) - ww_values <- get_ww_values( - ww_data, - ww_measurement_col_name, - ww_value_lod_col_name, - ) + # Returns a list with the numbers of elements needed for the stan model + ww_data_sizes <- get_ww_data_sizes( + ww_data, + lod_col_name = "below_lod" + ) + # Returns the vectors of indices you need to map latent variables to + # observations + ww_indices <- get_ww_data_indices( + ww_data, + input_count_data, + owt = ww_data_sizes$owt, + lod_col_name = "below_lod" + ) + # Returns a list of the vectors of lod values, the site population sizes in + # order of the site index, a vector of observations of the log of + # the genome copies per ml + ww_values <- get_ww_values( + ww_data, + ww_measurement_col_name = "genome_copies_per_ml", + ww_lod_value_col_name = "lod", + ww_site_pop_col_name = "site_pop" + ) - stopifnot( - "Wastewater sampled times not equal to length of input ww data" = - length(ww_indices$ww_sampled_times) == ww_data_sizes$owt - ) + stopifnot( + "Wastewater sampled times not equal to length of input ww data" = + length(ww_indices$ww_sampled_times) == ww_data_sizes$owt + ) + message( + "Prop of population size covered by wastewater: ", + sum(ww_values$pop_ww) / pop + ) - message( - "Prop of population size covered by wastewater: ", - sum(ww_values$pop_ww) / pop - ) + # Logic to determine the number of subpopulations to estimate R(t) for: + # First determine if we need to add an additional subpopulation + add_auxiliary_site <- ifelse(pop >= sum(ww_values$pop_ww), TRUE, FALSE) + # Then get the number of subpopulations, the population to normalize by + # (sum of the subpopulations), and the vector of sizes of each subpopulation + subpop_data <- get_subpop_data(add_auxiliary_site, + state_pop = pop, + pop_ww = ww_values$pop_ww, + n_ww_sites = ww_data_sizes$n_ww_sites + ) - # Logic to determine the number of subpopulations to estimate R(t) for: - # First determine if we need to add an additional subpopulation - add_auxiliary_site <- ifelse(pop >= sum(ww_values$pop_ww), TRUE, FALSE) - # Then get the number of subpopulations, the population to normalize by - # (sum of the subpopulations), and the vector of sizes of each subpopulation - subpop_data <- get_subpop_data(add_auxiliary_site, - state_pop = pop, - pop_ww = ww_values$pop_ww, - n_ww_sites = ww_data_sizes$n_ww_sites + # Get the remaining things needed for both models + input_count_data_filtered <- input_count_data |> + dplyr::filter( + date >= last_count_data_date - lubridate::days(calibration_time) ) - } else { # Hospital admissions only model) - # Still need to specify wastewater input data, so set as 0s. Won't get - # used by stan to compute the likelihood. None of these will be used. - owt <- 1 - ww_sampled_times <- c(1) - log_conc <- c(1) - } + count_data <- add_time_indexing(input_count_data_filtered) - # Get the remaining things needed for both models - hosp_data <- add_time_indexing(input_hosp_data) - hosp_data_sizes <- get_hosp_data_sizes( - input_hosp_data = hosp_data, + # Get the sizes of all the elements + count_data_sizes <- get_count_data_sizes( + input_count_data = count_data, forecast_date = forecast_date, - forecast_time = forecast_time, + forecast_horizon = forecast_horizon, calibration_time = calibration_time, - last_hosp_data_date = last_hosp_data_date, - uot = uot, - hosp_value_col_name = hosp_value_col_name + last_count_data_date = last_count_data_date, + uot = uot ) - hosp_indices <- get_hosp_indices(hosp_data) - hosp_values <- get_hosp_values( - hosp_data, - ot = hosp_data_sizes$ot, - ht = hosp_data_sizes$ht, - hosp_value_col_name + count_indices <- get_count_indices(count_data) + count_values <- get_count_values( + count_data, + ot = count_data_sizes$ot, + ht = count_data_sizes$ht, + count_col_name = "count" ) - if (include_ww == 1) { + if (isTRUE(exclude_ww_outliers)) { message( "Removed ", nrow(input_ww_data) - ww_data_sizes$owt, " outliers from WW data" ) } - # matrix to transform IHR from weekly to daily + # matrix to transform P(count|I) from weekly to daily ind_m <- get_ind_m( - hosp_data_sizes$ot + hosp_data_sizes$ht, - hosp_data_sizes$n_weeks + count_data_sizes$ot + count_data_sizes$ht, + count_data_sizes$n_weeks ) # matrix to transform p_hosp RW from weekly to daily p_hosp_m <- get_ind_m( - uot + hosp_data_sizes$ot + hosp_data_sizes$ht, - hosp_data_sizes$tot_weeks + uot + count_data_sizes$ot + count_data_sizes$ht, + count_data_sizes$tot_weeks ) # Estimate of number of initial infections - i0 <- mean(hosp_values$hosp_admits[1:7], na.rm = TRUE) / p_hosp_mean + i0 <- mean(count_values$count[1:7], na.rm = TRUE) / p_hosp_mean # package up parameters for stan data object viral_shedding_pars <- c( @@ -184,152 +176,516 @@ get_stan_data <- function(model_type, duration_shedding_mean, duration_shedding_sd ) - hosp_delay_max <- length(inf_to_hosp) - - if (model_type == "ww") { - data_renewal <- list( - gt_max = gt_max, - hosp_delay_max = hosp_delay_max, - inf_to_hosp = inf_to_hosp, - dur_inf = dur_inf, - mwpd = ml_of_ww_per_person_day, - ot = hosp_data_sizes$ot, - n_subpops = subpop_data$n_subpops, - n_ww_sites = ww_data_sizes$n_ww_sites, - n_ww_lab_sites = ww_data_sizes$n_ww_lab_sites, - owt = ww_data_sizes$owt, - oht = hosp_data_sizes$oht, - n_censored = ww_data_sizes$n_censored, - n_uncensored = ww_data_sizes$n_uncensored, - uot = uot, - ht = hosp_data_sizes$ht, - n_weeks = hosp_data_sizes$n_weeks, - ind_m = ind_m, - tot_weeks = hosp_data_sizes$tot_weeks, - p_hosp_m = p_hosp_m, - generation_interval = generation_interval, - ts = 1:gt_max, - state_pop = pop, - subpop_size = subpop_data$subpop_size, - norm_pop = subpop_data$norm_pop, - ww_sampled_times = ww_indices$ww_sampled_times, - hosp_times = hosp_indices$hosp_times, - ww_sampled_lab_sites = ww_indices$ww_sampled_lab_sites, - ww_log_lod = ww_values$ww_lod, - ww_censored = ww_indices$ww_censored, - ww_uncensored = ww_indices$ww_uncensored, - hosp = hosp_values$hosp_admits, - day_of_week = hosp_values$day_of_week, - log_conc = ww_values$log_conc, - compute_likelihood = compute_likelihood, - include_ww = include_ww, - include_hosp = 1, - if_l = length(infection_feedback_pmf), - infection_feedback_pmf = infection_feedback_pmf, - # All the priors! - viral_shedding_pars = viral_shedding_pars, # tpeak, viral peak, dur_shed - autoreg_rt_a = autoreg_rt_a, - autoreg_rt_b = autoreg_rt_b, - autoreg_rt_site_a = autoreg_rt_site_a, - autoreg_rt_site_b = autoreg_rt_site_b, - autoreg_p_hosp_a = autoreg_p_hosp_a, - autoreg_p_hosp_b = autoreg_p_hosp_b, - inv_sqrt_phi_prior_mean = inv_sqrt_phi_prior_mean, - inv_sqrt_phi_prior_sd = inv_sqrt_phi_prior_sd, - r_prior_mean = r_prior_mean, - r_prior_sd = r_prior_sd, - log10_g_prior_mean = log10_g_prior_mean, - log10_g_prior_sd = log10_g_prior_sd, - i0_over_n_prior_a = 1 + i0_certainty * (i0 / pop), - i0_over_n_prior_b = 1 + i0_certainty * (1 - (i0 / pop)), - wday_effect_prior_mean = wday_effect_prior_mean, - wday_effect_prior_sd = wday_effect_prior_sd, - initial_growth_prior_mean = initial_growth_prior_mean, - initial_growth_prior_sd = initial_growth_prior_sd, - sigma_ww_site_prior_mean_mean = sigma_ww_site_prior_mean_mean, - sigma_ww_site_prior_mean_sd = sigma_ww_site_prior_mean_sd, - sigma_ww_site_prior_sd_mean = sigma_ww_site_prior_sd_mean, - sigma_ww_site_prior_sd_sd = sigma_ww_site_prior_sd_sd, - eta_sd_sd = eta_sd_sd, - sigma_i0_prior_mode = sigma_i0_prior_mode, - sigma_i0_prior_sd = sigma_i0_prior_sd, - p_hosp_prior_mean = p_hosp_mean, - p_hosp_sd_logit = p_hosp_sd_logit, - p_hosp_w_sd_sd = p_hosp_w_sd_sd, - ww_site_mod_sd_sd = ww_site_mod_sd_sd, - inf_feedback_prior_logmean = infection_feedback_prior_logmean, - inf_feedback_prior_logsd = infection_feedback_prior_logsd, - sigma_rt_prior = sigma_rt_prior, - log_phi_g_prior_mean = log_phi_g_prior_mean, - log_phi_g_prior_sd = log_phi_g_prior_sd, - ww_sampled_sites = ww_indices$ww_sampled_sites, - lab_site_to_site_map = ww_indices$lab_site_to_site_map + inf_to_count_delay_max <- length(inf_to_count_delay) + + data_renewal <- list( + gt_max = gt_max, + inf_to_count_delay_max = inf_to_count_delay_max, + inf_to_count_delay = inf_to_count_delay, + mwpd = ml_of_ww_per_person_day, + ot = count_data_sizes$ot, + n_subpops = subpop_data$n_subpops, + n_ww_sites = ww_data_sizes$n_ww_sites, + n_ww_lab_sites = ww_data_sizes$n_ww_lab_sites, + owt = ww_data_sizes$owt, + oht = count_data_sizes$oht, + n_censored = ww_data_sizes$n_censored, + n_uncensored = ww_data_sizes$n_uncensored, + uot = uot, + ht = count_data_sizes$ht, + n_weeks = count_data_sizes$n_weeks, + ind_m = ind_m, + tot_weeks = count_data_sizes$tot_weeks, + p_hosp_m = p_hosp_m, + generation_interval = generation_interval, + ts = 1:gt_max, + total_pop = pop, + subpop_size = subpop_data$subpop_size, + norm_pop = subpop_data$norm_pop, + ww_sampled_times = ww_indices$ww_sampled_times, + count_times = count_indices$count_times, + ww_sampled_lab_sites = ww_indices$ww_sampled_lab_sites, + ww_log_lod = ww_values$ww_lod, + ww_censored = ww_indices$ww_censored, + ww_uncensored = ww_indices$ww_uncensored, + counts = count_values$counts, + day_of_week = count_values$day_of_week, + log_conc = ww_values$log_conc, + compute_likelihood = compute_likelihood, + include_ww = 1, # hardcoding that we include ww + include_hosp = 1, + if_l = length(infection_feedback_pmf), + infection_feedback_pmf = infection_feedback_pmf, + # All the priors! + viral_shedding_pars = viral_shedding_pars, # tpeak, viral peak, dur_shed + autoreg_rt_a = autoreg_rt_a, + autoreg_rt_b = autoreg_rt_b, + autoreg_rt_site_a = autoreg_rt_site_a, + autoreg_rt_site_b = autoreg_rt_site_b, + autoreg_p_hosp_a = autoreg_p_hosp_a, + autoreg_p_hosp_b = autoreg_p_hosp_b, + inv_sqrt_phi_prior_mean = inv_sqrt_phi_prior_mean, + inv_sqrt_phi_prior_sd = inv_sqrt_phi_prior_sd, + r_prior_mean = r_prior_mean, + r_prior_sd = r_prior_sd, + log10_g_prior_mean = log10_g_prior_mean, + log10_g_prior_sd = log10_g_prior_sd, + i0_over_n_prior_a = 1 + i0_certainty * (i0 / pop), + i0_over_n_prior_b = 1 + i0_certainty * (1 - (i0 / pop)), + wday_effect_prior_mean = wday_effect_prior_mean, + wday_effect_prior_sd = wday_effect_prior_sd, + initial_growth_prior_mean = initial_growth_prior_mean, + initial_growth_prior_sd = initial_growth_prior_sd, + sigma_ww_site_prior_mean_mean = sigma_ww_site_prior_mean_mean, + sigma_ww_site_prior_mean_sd = sigma_ww_site_prior_mean_sd, + sigma_ww_site_prior_sd_mean = sigma_ww_site_prior_sd_mean, + sigma_ww_site_prior_sd_sd = sigma_ww_site_prior_sd_sd, + eta_sd_sd = eta_sd_sd, + sigma_i0_prior_mode = sigma_i0_prior_mode, + sigma_i0_prior_sd = sigma_i0_prior_sd, + p_hosp_prior_mean = p_hosp_mean, + p_hosp_sd_logit = p_hosp_sd_logit, + p_hosp_w_sd_sd = p_hosp_w_sd_sd, + ww_site_mod_sd_sd = ww_site_mod_sd_sd, + inf_feedback_prior_logmean = infection_feedback_prior_logmean, + inf_feedback_prior_logsd = infection_feedback_prior_logsd, + sigma_rt_prior = sigma_rt_prior, + log_phi_g_prior_mean = log_phi_g_prior_mean, + log_phi_g_prior_sd = log_phi_g_prior_sd, + ww_sampled_sites = ww_indices$ww_sampled_sites, + lab_site_to_site_map = ww_indices$lab_site_to_site_map + ) + + return(data_renewal) +} + +#' Get the integer sizes of the wastewater input data +#' +#' @param ww_data Input wastewater dataframe containing one row +#' per observation, with outliers already removed +#' @param lod_col_name A string representing the name of the +#' column in the input_ww_data that provides a 0 if the data point is not above +#' the LOD and a 1 if the data is below the LOD, default value is `below_LOD` +#' +#' @return A list containing the integer sizes of the follow variables that +#' the stan model requires: +#' owt: number of wastewater observations +#' n_censored: number of censored wastewater observations (below the LOD) +#' n_uncensored: number of uncensored wastewter observations (above the LOD) +#' n_ww_sites: number of wastewater sites +#' n_ww_lab_sites: number of unique wastewater site-lab combinations +#' +#' @export +get_ww_data_sizes <- function(ww_data, + lod_col_name = "below_lod") { + ww_data_present <- nrow(ww_data) != 0 + if (isTRUE(ww_data_present)) { + # Test for presence of column names + stopifnot( + "LOD column name isn't present in input dataset" = + lod_col_name %in% colnames(ww_data) ) - } else if (model_type == "hosp") { - data_renewal <- list( - gt_max = gt_max, - hosp_delay_max = hosp_delay_max, - inf_to_hosp = inf_to_hosp, - dur_inf = dur_inf, # this is used bc drift - mwpd = ml_of_ww_per_person_day, - ot = hosp_data_sizes$ot, + + # Number of wastewater observations + owt <- nrow(ww_data) + # Number of censored wastewater observations + n_censored <- sum(ww_data[lod_col_name] == 1) + # Number of uncensored wastewater observations + n_uncensored <- owt - n_censored + + # Number of ww sites + n_ww_sites <- dplyr::n_distinct(ww_data$site_index) + + # Number of unique combinations of wastewater sites and labs + n_ww_lab_sites <- dplyr::n_distinct(ww_data$lab_site_index) + + data_sizes <- list( owt = owt, - oht = hosp_data_sizes$oht, - uot = uot, - ht = hosp_data_sizes$ht, - n_weeks = hosp_data_sizes$n_weeks, - ind_m = ind_m, - tot_weeks = hosp_data_sizes$tot_weeks, - p_hosp_m = p_hosp_m, - generation_interval = generation_interval, - ts = 1:gt_max, - n = pop, - hosp_times = hosp_indices$hosp_times, + n_censored = n_censored, + n_uncensored = n_uncensored, + n_ww_sites = n_ww_sites, + n_ww_lab_sites = n_ww_lab_sites + ) + } else { + data_sizes <- list( + owt = 0, + n_censored = 0, + n_uncensored = 0, + n_ww_sites = 0, + n_ww_lab_sites = 0 + ) + } + + + return(data_sizes) +} + +#' Get wastewater data indices +#' +#' @param ww_data Input wastewater dataframe containing one row +#' per observation, with outliers already removed +#' @param input_hosp_data Input hospital admissions data frame with one row +#' per day and location +#' @param owt number of wastewater observations +#' @param lod_col_name A string representing the name of the +#' column in the input_ww_data that provides a 0 if the data point is not above +#' the LOD and a 1 if the data is below the LOD, default value is `below_LOD` +#' +#' @return A list containing the necessary vectors of indices that +#' the stan model requires: +#' ww_censored: the vector of time points that the wastewater observations are +#' censored (below the LOD) in order of the date and the site index +#' ww_uncensored: the vector of time points that the wastewater observations are +#' uncensored (above the LOD) in order of the date and the site index +#' ww_sampled_times: the vector of time points that the wastewater observations +#' are passed in in log_conc in order of the date and the site index +#' ww_sampled_sites: the vector of sites that correspond to the observations +#' passed in in log_conc in order of the date and the site index +#' ww_sampled_lab_sites: the vector of unique combinations of site and labs +#' that correspond to the observations passed in in log_conc in order of the +#' date and the site index +#' lab_site_to_site_map: the vector of sites that correspond to each lab-site +#' @export +get_ww_data_indices <- function(ww_data, + input_hosp_data, + owt, + lod_col_name = "below_lod") { + # Vector of indices along the list of wastewater concentrations that + # correspond to censored observations + ww_data_present <- nrow(ww_data) != 0 + + if (isTRUE(ww_data_present)) { + ww_data_with_index <- ww_data |> + dplyr::mutate(ind_rel_to_sampled_times = dplyr::row_number()) + ww_censored <- ww_data_with_index |> + dplyr::filter(.data[[lod_col_name]] == 1) |> + dplyr::pull(ind_rel_to_sampled_times) + ww_uncensored <- ww_data_with_index |> + dplyr::filter(.data[[lod_col_name]] == 0) |> + dplyr::pull(ind_rel_to_sampled_times) + stopifnot( + "Length of censored vectors incorrect" = + length(ww_censored) + length(ww_uncensored) == owt + ) + + + # Need to get the times of wastewater sampling, starting at the first + # day of hospital admissions data + ww_date_df <- data.frame( + date = seq( + from = min(input_hosp_data$date), + to = max(ww_data$date), + by = "days" + ), + t = 1:(as.integer(max(ww_data$date) - min(input_hosp_data$date)) + 1) + ) + + # Left join the data mapped to time to the wastewater data + spine_ww <- ww_data |> + dplyr::left_join(ww_date_df, by = "date") + + # Pull just the vector of times of wastewater observations + ww_sampled_times <- spine_ww |> + dplyr::pull(t) + + # Pull just the indexes of the sites that correspond to the vector of + # sampled times + ww_sampled_sites <- ww_data$site_index + + # Pull just the indexes of the lab-sites that correspond to the vector of + # sampled times + ww_sampled_lab_sites <- ww_data$lab_site_index + + # Need a vector of indices indicating the site for each lab-site + lab_site_to_site_map <- ww_data |> + dplyr::select(lab_site_index, site_index) |> + dplyr::arrange(lab_site_index, "desc") |> + dplyr::distinct() |> + dplyr::pull(site_index) + + ww_data_indices <- list( + ww_censored = ww_censored, + ww_uncensored = ww_uncensored, ww_sampled_times = ww_sampled_times, - hosp = hosp_values$hosp_admits, - day_of_week = hosp_values$day_of_week, - log_conc = log_conc, - compute_likelihood = compute_likelihood, - include_ww = include_ww, - include_hosp = 1, - if_l = length(infection_feedback_pmf), - infection_feedback_pmf = infection_feedback_pmf, - # Priors - viral_shedding_pars = viral_shedding_pars, # tpeak, viral peak, - # duration shedding - autoreg_rt_a = autoreg_rt_a, - autoreg_rt_b = autoreg_rt_b, - autoreg_p_hosp_a = autoreg_p_hosp_a, - autoreg_p_hosp_b = autoreg_p_hosp_b, - inv_sqrt_phi_prior_mean = inv_sqrt_phi_prior_mean, - inv_sqrt_phi_prior_sd = inv_sqrt_phi_prior_sd, - r_prior_mean = r_prior_mean, - r_prior_sd = r_prior_sd, - log10_g_prior_mean = log10_g_prior_mean, - log10_g_prior_sd = log10_g_prior_sd, - i0_over_n_prior_a = 1 + i0_certainty * (i0 / pop), - i0_over_n_prior_b = 1 + i0_certainty * (1 - (i0 / pop)), - wday_effect_prior_mean = wday_effect_prior_mean, - wday_effect_prior_sd = wday_effect_prior_sd, - initial_growth_prior_mean = initial_growth_prior_mean, - initial_growth_prior_sd = initial_growth_prior_sd, - sigma_ww_prior_mean = sigma_ww_site_prior_mean_mean, - eta_sd_sd = eta_sd_sd, - p_hosp_prior_mean = p_hosp_mean, - p_hosp_sd_logit = p_hosp_sd_logit, - p_hosp_w_sd_sd = p_hosp_w_sd_sd, - inf_feedback_prior_logmean = infection_feedback_prior_logmean, - inf_feedback_prior_logsd = infection_feedback_prior_logsd + ww_sampled_sites = ww_sampled_sites, + ww_sampled_lab_sites = ww_sampled_lab_sites, + lab_site_to_site_map = lab_site_to_site_map ) } else { - cli::cli_abort("Unknown model") - data_renewal <- list() + ww_data_indices <- list( + ww_censored = c(), + ww_uncensored = c(), + ww_sampled_times = c(), + ww_sampled_sites = c(), + ww_sampled_lab_sites = c(), + lab_site_to_site_map = c() + ) } - stopifnot( - "Model type not specified properly" = - !purrr::is_empty(data_renewal) + + return(ww_data_indices) +} + +#' Get wastewater data values +#' +#' @param ww_data Input wastewater dataframe containing one row +#' per observation, with outliers already removed +#' @param ww_measurement_col_name A string representing the name of the column +#' in the input_ww_data that indicates the wastewater measurement value in +#' natural scale, default is `genome_copies_per_ml` +#' @param ww_lod_value_col_name A string representing the name of the column +#' in the ww_data that indicates the value of the LOD in natural scale, +#' default is `lod` +#' @param ww_site_pop_col_name A string representing the name of the column in +#' the ww_data that indicates the number of people represented by that +#' wastewater catchment, default is `site_pop` +#' @param one_pop_per_site a boolean variable indicating if there should only +#' be on catchment area population per site, default is `TRUE` because this is +#' what the stan model expects +#' +#' @return A list containing the necessary vectors of values that +#' the stan model requires: +#' ww_lod: a vector of the LODs of the corresponding wastewater measurement +#' pop_ww: a vector of the population sizes of the wastewater catchment areas +#' in order of the sites by site_index +#' log_conc: a vector of the log of the wastewater concentration observation +#' @export +get_ww_values <- function(ww_data, + ww_measurement_col_name = "genome_copies_per_ml", + ww_lod_value_col_name = "lod", + ww_site_pop_col_name = "site_pop", + one_pop_per_site = TRUE) { + ww_data_present <- nrow(ww_data) != 0 + + if (isTRUE(ww_data_present)) { + # Get the vector of log LOD values corresponding to each observation + ww_lod <- ww_data |> + dplyr::pull({{ ww_lod_value_col_name }}) |> + log() + + # Get a vector of population sizes + if (isTRUE(one_pop_per_site)) { + # Want one population per site during the model calibration period, + # so just take the average across the populations reported for each + # observation + pop_ww <- ww_data |> + dplyr::select(site_index, {{ ww_site_pop_col_name }}) |> + dplyr::group_by(site_index) |> + dplyr::summarise(pop_avg = mean(.data[[ww_site_pop_col_name]])) |> + dplyr::arrange(site_index, "desc") |> + dplyr::pull(pop_avg) + } else { + # Want a vector of length of the number of observations, corresponding to + # the population at that time + pop_ww <- ww_data |> + dplyr::pull({{ ww_site_pop_col_name }}) + } + + + # Get the vector of log wastewater concentrations + log_conc <- ww_data |> + dplyr::mutate(log_conc = as.numeric(log(!!sym( + ww_measurement_col_name + ) + 1e-8))) |> + dplyr::pull(log_conc) + + ww_values <- list( + ww_lod = ww_lod, + pop_ww = pop_ww, + log_conc = log_conc + ) + } else { + ww_values <- list( + ww_lod = c(), + pop_ww = c(), + log_conc = c() + ) + } + + + return(ww_values) +} + +#' Add time indexing to count data +#' +#' @param input_count_data data frame with dates and counts, +#' but without time indexing. +#' +#' @return The same data frame, with an added +#' time index, including NA rows if dates internal +#' to the timeseries are missing admissions data. +#' @export +#' +#' @examples +#' hosp_data_example <- tibble::tibble( +#' date = lubridate::ymd("2024-01-01", "2024-01-02", "2024-01-06"), +#' daily_hosp_admits = c(5, 3, 8) +#' ) +#' hosp_data_w_t <- add_time_indexing(hosp_data_example) +add_time_indexing <- function(input_count_data) { + date_df <- tibble::tibble(date = seq( + from = min(input_count_data$date), + to = max(input_count_data$date), + by = "days" + )) |> + dplyr::mutate(t = dplyr::row_number()) + + count_data <- input_count_data |> + dplyr::left_join(date_df, by = "date") |> + dplyr::arrange(date) + + return(count_data) +} + +#' Get subpopulation data +#' +#' @param add_auxiliary_site Boolean indicating whether to add another +#' subpopulation in addition to the wastewater sites to estimate R(t) of +#' @param state_pop The state population size +#' @param pop_ww The population size in each of the wastewater sites +#' @param n_ww_sites The number of wastewater sites +#' +#' @return A list containing the necessary integers and vectors that stan +#' needs to estiamte infection dynamics for each subpopulation +#' @export +#' +#' @examples subpop_data <- get_subpop_data(TRUE, 100000, c(1000, 500), 2) +get_subpop_data <- function(add_auxiliary_site, + state_pop, + pop_ww, + n_ww_sites) { + if (add_auxiliary_site) { + # In most cases, wastewater catchment coverage < entire state. + # So here we add a subpopulation that represents the population not + # covered by wastewater surveillance + norm_pop <- state_pop + n_subpops <- n_ww_sites + 1 + subpop_size <- c(pop_ww, state_pop - sum(pop_ww)) + } else { + message("Sum of wastewater catchment areas is greater than state pop") + norm_pop <- sum(pop_ww) + # If sum catchment areas > state pop, + # use sum of catchment area pop to normalize + n_subpops <- n_ww_sites # Only divide the state into n_site subpops + subpop_size <- pop_ww + } + + subpop_data <- list( + norm_pop = norm_pop, + n_subpops = n_subpops, + subpop_size = subpop_size ) + return(subpop_data) +} - return(data_renewal) +#' Get count data integer sizes for stan +#' +#' @param input_cout_data a dataframe with the input count data +#' @param forecast_date string indicating the forecast date +#' @param forecast_horizon integer indicating the number of days to make a +#' forecast for +#' @param calibration_time integer indicating the max duration in days that +#' the model is calibrated to hospital admissions for +#' @param last_count_data_date string indicating the date of the last observed +#' count data point +#' @param uot integer indicating the time of model initialization when there are +#' no observations +#' @param count_col_name A string represeting the name of the column in the +#' input_count_data that indicates the number of daily counts, +#' default is `count` +#' +#' @return A list containing the integer sizes of the follow variables that +#' the stan model requires: +#' ht: integer indicating horizon time for the model(hospital admissions +#' nowcast + forecast time in days) +#' ot: integer indicating the total duration of time that model for producing +#' counts (e.g. cases or admissions) has available calibration data +#' oht: integer indicating the number of count observations +#' n_weeks: number of weeks (rounded up) that counts are generated +#' from the model +#' tot_weeks: number of week(rounded up) that infections are generated for +#' @export +get_count_data_sizes <- function(input_count_data, + forecast_date, + forecast_horizon, + calibration_time, + last_count_data_date, + uot, + count_col_name = "count") { + nowcast_time <- as.integer( + lubridate::ymd(forecast_date) - last_count_data_date + ) + ot <- calibration_time + ht <- nowcast_time + forecast_horizon + oht <- input_count_data |> + dplyr::filter(!is.na(.data[[count_col_name]])) |> + nrow() + n_weeks <- ceiling((ot + ht) / 7) + tot_weeks <- ceiling((ot + uot + ht) / 7) + count_data_sizes <- list( + ht = ht, + ot = calibration_time, + oht = oht, + n_weeks = n_weeks, + tot_weeks = tot_weeks + ) + return(count_data_sizes) +} +#' Get count data indices +#' +#' @param input_count_data a dataframe with the input count data +#' +#' @return A list containing the vectors of indices that +#' the stan model requires: +#' count_times: a vector of integer times corresponding to the times when the +#' count observations were made +#' @export +get_count_indices <- function(input_count_data) { + count_times <- input_count_data |> + dplyr::pull(t) + + count_indices <- list( + count_times = count_times + ) + return(count_indices) +} + +#' Get count values +#' +#' @param input_count_data a dataframe with the input count data +#' @param ot integer indicating the total duration of time that the +#' model has available calibration data in days +#' @param ht integer indicating the number of days to produce count estimates +#' outside the calibration period (forecast + nowcast time) in days +#' @param count_col_name A string representing the name of the column in the +#' input_count_data that indicates the number of daily counts of the +#' epidemiological indicator, e.g. cases or hospital admissions, +#' default is `count` +#' +#' @return A list containing the necessary vectors of values that +#' the stan model requires: +#' counts: a vector of number of daily count observations +#' day_of_week: a vector indicating the day of the week of each of the dates +#' in the calibration and forecast period +# +#' @export +get_count_values <- function(input_count_data, + ot, + ht, + count_col_name = "count") { + counts <- input_count_data |> + dplyr::pull({{ count_col_name }}) + + full_dates <- seq( + from = min(input_count_data$date), + to = min(input_count_data$date) + lubridate::days(ht + ot - 1), + by = "days" + ) + day_of_week <- lubridate::wday(full_dates, week_start = 1) + + count_values <- list( + counts = counts, + day_of_week = day_of_week + ) + return(count_values) } diff --git a/R/preprocessing.R b/R/preprocessing.R index 744afc18..9b08fa0f 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -6,8 +6,10 @@ #' @param conc_col_name name of the column containing the concentration #' measurements in the wastewater data, default is `genome_copies_per_ml` #' -#' @return a dataframe containing the transformed and clean wastewater data -#' at the site and lab label for the forecast date and location specified +#' @return a dataframe containing the same columns as ww_data plus the following +#' additional columns neede for the stan model: +#' lab_site_index, site_index, flag_as_ww_outlier, lab_site_name, +#' forecast_date #' @export #' #' @examples @@ -21,10 +23,16 @@ preprocess_ww_data <- function(ww_data, ww_data |> dplyr::distinct(lab, site) |> dplyr::mutate( - lab_site = dplyr::row_number() + lab_site_index = dplyr::row_number() ), by = c("lab", "site") ) |> + dplyr::left_join( + ww_data |> + dplyr::distinct(site) |> + dplyr::mutate(site_index = dplyr::row_number()), + by = "site" + ) |> dplyr::mutate( lab_site_name = glue::glue("Site: {site}, Lab: {lab}"), below_lod = ifelse({{ conc_col_name }} < lod, 1, 0) @@ -90,7 +98,7 @@ preprocess_hosp_data <- function(hosp_data, #' Flag WW outliers #' #' @param ww_data dataframe containing the following columns: site, lab, -#' lab_site, date, a column for concentration, and below_lod +#' lab_site_index, date, a column for concentration, and below_lod #' @param conc_col_name string, name of the column containing the concentration #' measurements in the wastewater data, default is `genome_copies_per_ml` #' @param rho_threshold float indicating the z-score threshold for "jump" @@ -113,13 +121,13 @@ flag_ww_outliers <- function(ww_data, threshold_n_dps = 1) { n_dps <- ww_data |> dplyr::filter(below_lod == 0) |> - dplyr::group_by(lab_site) |> + dplyr::group_by(lab_site_index) |> dplyr::summarise(n_data_points = dplyr::n()) # Get the ww statistics we need for outlier detection ww_stats <- ww_data |> dplyr::left_join(n_dps, - by = "lab_site" + by = "lab_site_index" ) |> # exclude below LOD from z scoring and remove lab-sites with too # few data points @@ -127,7 +135,7 @@ flag_ww_outliers <- function(ww_data, below_lod == 0, n_data_points > threshold_n_dps ) |> - dplyr::group_by(lab_site) |> + dplyr::group_by(lab_site_index) |> dplyr::arrange(date, "desc") |> dplyr::mutate( log_conc = log(!!sym(conc_col_name)), @@ -137,27 +145,27 @@ flag_ww_outliers <- function(ww_data, diff_time = as.numeric(difftime(date, prev_date)), rho = diff_log_conc / diff_time ) |> - dplyr::select(date, lab_site, rho) |> + dplyr::select(date, lab_site_index, rho) |> dplyr::distinct() # Combine stats with ww data ww_rho <- ww_data |> - left_join(ww_stats, by = c("lab_site", "date")) + left_join(ww_stats, by = c("lab_site_index", "date")) # compute z scores and flag ww_z_scored <- ww_rho |> dplyr::left_join( ww_rho |> - dplyr::group_by(lab_site) |> + dplyr::group_by(lab_site_index) |> dplyr::summarise( mean_rho = mean(rho, na.rm = TRUE), std_rho = sd(rho, na.rm = TRUE), mean_conc = mean(!!sym(conc_col_name), na.rm = TRUE), std_conc = sd(!!sym(conc_col_name), na.rm = TRUE) ), - by = "lab_site" + by = "lab_site_index" ) |> - dplyr::group_by(lab_site) |> + dplyr::group_by(lab_site_index) |> mutate( z_score_conc = (!!sym(conc_col_name) - mean_conc) / std_conc, z_score_rho = (rho - mean_rho) / std_rho diff --git a/R/utils.R b/R/utils.R index 7f2af596..bfffb6e4 100644 --- a/R/utils.R +++ b/R/utils.R @@ -77,3 +77,19 @@ get_ind_m <- function(n_days, n_weeks) { return(ind_m) } + +#' @title Create a new directory if one doesn't exist +#' @description +#' Function to create a directory for the specified output file path if needed. +#' dir_create won't throw a warning if its already made though! +#' +#' +#' @param output_file_path file path that may or may not need to be created +#' +#' @export +create_dir <- function(output_file_path) { + if (!file.exists(output_file_path)) { + fs::dir_create(output_file_path, recurse = TRUE, mode = "0777") + Sys.chmod(output_file_path, mode = "0777", use_umask = FALSE) + } +} diff --git a/inst/stan/wwinference.stan b/inst/stan/wwinference.stan index e07ef35a..5c104797 100644 --- a/inst/stan/wwinference.stan +++ b/inst/stan/wwinference.stan @@ -12,8 +12,8 @@ functions { // The fixed input data data { int gt_max; - int hosp_delay_max; - vector[hosp_delay_max] inf_to_hosp; // delay distribution from infecion to hospital admission + int inf_to_count_delay_max; + vector[inf_to_count_delay_max] inf_to_count_delay; // delay distribution from infecion to hospital admission real mwpd; // mL of ww produced per person per day int if_l; // length of infection feedback pmf vector[if_l] infection_feedback_pmf; // infection feedback pmf @@ -31,12 +31,12 @@ data { int tot_weeks; // number of weeks for the weekly random walk on IHR (includes unobserved time) matrix [uot+ot+ht, tot_weeks] p_hosp_m; // matrix to convert p_hosp from weekly to daily vector[gt_max] generation_interval; // generation interval distribution - real state_pop; // population size + real total_pop; // population size vector[n_subpops] subpop_size; // the population sizes for each subpopulation - real norm_pop; + real norm_pop; array[owt] int ww_sampled_times; // a list of all of the days on which WW is sampled // will be mapped to the corresponding sites (ww_sampled_sites) - array[oht] int hosp_times; // the days on which hospital admissions are observed + array[oht] int count_times; // the days on which hospital admissions are observed array[owt] int ww_sampled_sites; // vector of unique sites in order of the sampled times array[owt] int ww_sampled_lab_sites; // vector of unique lab-site combos i // n order of the sampled times @@ -44,7 +44,7 @@ data { array[n_uncensored] int ww_uncensored; // time that WW data is above LOD vector[owt] ww_log_lod; // The limit of detection in that site at that time point array[n_ww_lab_sites] int lab_site_to_site_map; // which lab sites correspond to which sites - array[oht] int hosp; // observed hospital admissions + array[oht] int counts; // observed counts array[ot + ht] int day_of_week; // integer vector with 1-7 corresponding to the weekday vector[owt] log_conc; // observed concentration of viral genomes in WW int compute_likelihood; // 1= use data to compute likelihood @@ -169,7 +169,7 @@ transformed parameters { vector[ot + uot + ht] state_inf_per_capita = rep_vector(0, uot + ot + ht); // state level incident infections per capita matrix[n_subpops, ot + ht] model_log_v_ot; // expected observed viral genomes/mL at all observed and forecasted times real g = pow(log10_g, 10); // Estimated genomes shed per infected individual - real i0 = i0_over_n * state_pop; // Initial absolute infection incidence + real i0 = i0_over_n * total_pop; // Initial absolute infection incidence vector[n_subpops] i0_site_over_n; // site-level initial // per capita infection incidence vector[n_subpops] growth_site; @@ -238,7 +238,7 @@ transformed parameters { // Expected hospital admissions per capita: // This is a convolution of incident infections and the hospital-admission delay distribution // generates all hospitalizations, across unobserved time, observed time, and forecast time - model_hosp_per_capita = convolve_dot_product(p_hosp .* state_inf_per_capita, reverse(inf_to_hosp), + model_hosp_per_capita = convolve_dot_product(p_hosp .* state_inf_per_capita, reverse(inf_to_count_delay), ot + uot + ht); // predicted hospital admissions per capita @@ -246,9 +246,9 @@ transformed parameters { // apply the weekday effect so these are distributed with fewer admits on Sat & Sun // multiply by state population to convert from predicted per capita admissions to // predicted absolute admissions - exp_obs_hosp = state_pop * day_of_week_effect( - exp_obs_hosp_per_capita_no_wday_effect[hosp_times], - day_of_week[hosp_times], + exp_obs_hosp = total_pop * day_of_week_effect( + exp_obs_hosp_per_capita_no_wday_effect[count_times], + day_of_week[count_times], hosp_wday_effect); // Observations at the site level (genomes/person/day) are: @@ -321,7 +321,7 @@ model { } if (include_hosp == 1) { - hosp ~ neg_binomial_2(exp_obs_hosp, phi_h); + counts ~ neg_binomial_2(exp_obs_hosp, phi_h); } } // end if for computing log likelihood } @@ -341,12 +341,12 @@ generated quantities { exp(growth_site[i] * uot); } - pred_hosp = neg_binomial_2_rng(state_pop * day_of_week_effect(model_hosp_per_capita[uot + 1 : + pred_hosp = neg_binomial_2_rng(total_pop * day_of_week_effect(model_hosp_per_capita[uot + 1 : uot + ot + ht], day_of_week, hosp_wday_effect), phi_h); - pred_new_i = neg_binomial_2_rng(state_pop * state_inf_per_capita[uot + 1 : uot + ot + ht], phi_h); + pred_new_i = neg_binomial_2_rng(total_pop * state_inf_per_capita[uot + 1 : uot + ot + ht], phi_h); // Here need to iterate through each lab-site, find the corresponding site // and apply the expected lab-site error diff --git a/man/add_time_indexing.Rd b/man/add_time_indexing.Rd new file mode 100644 index 00000000..547bb124 --- /dev/null +++ b/man/add_time_indexing.Rd @@ -0,0 +1,27 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{add_time_indexing} +\alias{add_time_indexing} +\title{Add time indexing to count data} +\usage{ +add_time_indexing(input_count_data) +} +\arguments{ +\item{input_count_data}{data frame with dates and counts, +but without time indexing.} +} +\value{ +The same data frame, with an added +time index, including NA rows if dates internal +to the timeseries are missing admissions data. +} +\description{ +Add time indexing to count data +} +\examples{ +hosp_data_example <- tibble::tibble( + date = lubridate::ymd("2024-01-01", "2024-01-02", "2024-01-06"), + daily_hosp_admits = c(5, 3, 8) +) +hosp_data_w_t <- add_time_indexing(hosp_data_example) +} diff --git a/man/compile_model.Rd b/man/compile_model.Rd new file mode 100644 index 00000000..3ab2bda4 --- /dev/null +++ b/man/compile_model.Rd @@ -0,0 +1,67 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/compile_model.R +\name{compile_model} +\alias{compile_model} +\title{Compile a stan model while pointing at the package default +include directory (\code{stan}) for #include statements} +\usage{ +compile_model( + model_filepath, + include_paths = system.file("stan", package = "cfaforecastrenewalww"), + threads = FALSE, + target_dir = tempdir(), + stanc_options = list(), + cpp_options = list(), + verbose = TRUE, + ... +) +} +\arguments{ +\item{model_filepath}{path to .stan file defining the model} + +\item{include_paths}{path(s) to directories to search for files +specified in #include statements. Passed to \code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}. +Defaults to the \code{stan} subdirectory of the installed +\code{cfaforecastrenewalww} package.} + +\item{threads}{Number of threads to use in model compilation, +as an integer. Passed to \code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}. +Default \code{FALSE} (use single-threaded compilation).} + +\item{target_dir}{Directory in which to save the compiled +stan model binary. Passed as the \code{dir} keyword argument to +\code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}. Defaults to a temporary directory +for the R session (the output of \code{\link[=tempdir]{tempdir()}}).} + +\item{stanc_options}{Options for the stan compiler passed to +\code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}, as a list. See that function's +documentation for more details. Default \code{list()} (use default +options).} + +\item{cpp_options}{Options for the C++ compiler passed to +\code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}, as a list. See that function's +documentation for more details. Default \code{list()} (use default +options).} + +\item{verbose}{Write detailed output to the terminal while +executing the function? Boolean, default \code{TRUE}.} + +\item{...}{Additional keyword arguments passed to +\code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}.} +} +\value{ +The resulting \code{cmdstanr} model object, as the output +of \code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}. +} +\description{ +This function reads in and optionally compiles a Stan model. +It is written to search the installed package \code{stan} directory +for additional source files to include. Within each stan file, +use #include {path to your file with the \code{stan} directory}.stan +} +\details{ +The code for this function has been adapted +from code written (under an MIT license) as part of +the \href{https://github.com/epinowcast/epinowcast}{\code{epinowcast}} +R package. +} diff --git a/man/create_dir.Rd b/man/create_dir.Rd new file mode 100644 index 00000000..0861e69c --- /dev/null +++ b/man/create_dir.Rd @@ -0,0 +1,15 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{create_dir} +\alias{create_dir} +\title{Create a new directory if one doesn't exist} +\usage{ +create_dir(output_file_path) +} +\arguments{ +\item{output_file_path}{file path that may or may not need to be created} +} +\description{ +Function to create a directory for the specified output file path if needed. +dir_create won't throw a warning if its already made though! +} diff --git a/man/flag_ww_outliers.Rd b/man/flag_ww_outliers.Rd index af3ef6c4..9deb4d2c 100644 --- a/man/flag_ww_outliers.Rd +++ b/man/flag_ww_outliers.Rd @@ -14,7 +14,7 @@ flag_ww_outliers( } \arguments{ \item{ww_data}{dataframe containing the following columns: site, lab, -lab_site, date, a column for concentration, and below_lod} +lab_site_index, date, a column for concentration, and below_lod} \item{conc_col_name}{string, name of the column containing the concentration measurements in the wastewater data, default is \code{genome_copies_per_ml}} diff --git a/man/get_count_data_sizes.Rd b/man/get_count_data_sizes.Rd new file mode 100644 index 00000000..82935002 --- /dev/null +++ b/man/get_count_data_sizes.Rd @@ -0,0 +1,52 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_count_data_sizes} +\alias{get_count_data_sizes} +\title{Get count data integer sizes for stan} +\usage{ +get_count_data_sizes( + input_count_data, + forecast_date, + forecast_horizon, + calibration_time, + last_count_data_date, + uot, + count_col_name = "count" +) +} +\arguments{ +\item{forecast_date}{string indicating the forecast date} + +\item{forecast_horizon}{integer indicating the number of days to make a +forecast for} + +\item{calibration_time}{integer indicating the max duration in days that +the model is calibrated to hospital admissions for} + +\item{last_count_data_date}{string indicating the date of the last observed +count data point} + +\item{uot}{integer indicating the time of model initialization when there are +no observations} + +\item{count_col_name}{A string represeting the name of the column in the +input_count_data that indicates the number of daily counts, +default is \code{count}} + +\item{input_cout_data}{a dataframe with the input count data} +} +\value{ +A list containing the integer sizes of the follow variables that +the stan model requires: +ht: integer indicating horizon time for the model(hospital admissions +nowcast + forecast time in days) +ot: integer indicating the total duration of time that model for producing +counts (e.g. cases or admissions) has available calibration data +oht: integer indicating the number of count observations +n_weeks: number of weeks (rounded up) that counts are generated +from the model +tot_weeks: number of week(rounded up) that infections are generated for +} +\description{ +Get count data integer sizes for stan +} diff --git a/man/get_count_indices.Rd b/man/get_count_indices.Rd new file mode 100644 index 00000000..4c6e7c80 --- /dev/null +++ b/man/get_count_indices.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_count_indices} +\alias{get_count_indices} +\title{Get count data indices} +\usage{ +get_count_indices(input_count_data) +} +\arguments{ +\item{input_count_data}{a dataframe with the input count data} +} +\value{ +A list containing the vectors of indices that +the stan model requires: +count_times: a vector of integer times corresponding to the times when the +count observations were made +} +\description{ +Get count data indices +} diff --git a/man/get_count_values.Rd b/man/get_count_values.Rd new file mode 100644 index 00000000..a4ae5261 --- /dev/null +++ b/man/get_count_values.Rd @@ -0,0 +1,32 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_count_values} +\alias{get_count_values} +\title{Get count values} +\usage{ +get_count_values(input_count_data, ot, ht, count_col_name = "count") +} +\arguments{ +\item{input_count_data}{a dataframe with the input count data} + +\item{ot}{integer indicating the total duration of time that the +model has available calibration data in days} + +\item{ht}{integer indicating the number of days to produce count estimates +outside the calibration period (forecast + nowcast time) in days} + +\item{count_col_name}{A string representing the name of the column in the +input_count_data that indicates the number of daily counts of the +epidemiological indicator, e.g. cases or hospital admissions, +default is \code{count}} +} +\value{ +A list containing the necessary vectors of values that +the stan model requires: +counts: a vector of number of daily count observations +day_of_week: a vector indicating the day of the week of each of the dates +in the calibration and forecast period +} +\description{ +Get count values +} diff --git a/man/get_stan_data.Rd b/man/get_stan_data.Rd index 45b19bb0..aeae54c6 100644 --- a/man/get_stan_data.Rd +++ b/man/get_stan_data.Rd @@ -2,81 +2,58 @@ % Please edit documentation in R/get_stan_data.R \name{get_stan_data} \alias{get_stan_data} -\title{Get stan data} +\title{Get stan data for ww + hosp model} \usage{ get_stan_data( - model_type, + input_count_data, + input_ww_data, forecast_date, - forecast_time, + forecast_horizon, calibration_time, - input_ww_data, - input_hosp_data, generation_interval, - inf_to_hosp, + inf_to_count_delay, infection_feedback_pmf, params, - compute_likelihood = 1, - ww_outlier_col_name = "flag_as_ww_outlier", - lod_col_name = "below_LOD", - ww_measurement_col_name = "ww", - ww_value_lod_col_name = "lod_sewage", - hosp_value_col_name = "daily_hosp_admits" + exclude_ww_outliers = TRUE, + compute_likelihood = 1 ) } \arguments{ -\item{model_type}{string indicating which model we are getting data for -Options are \code{ww} or \code{hosp}} +\item{input_count_data}{a dataframe with the input count data, must have +the following columns: date, count, total_pop} + +\item{input_ww_data}{a dataframe with the input wastewater data with no gaps, +must have the following columns: date, site, lab, genome_copies_per_ml, +site_pop, below_lod, and if removing outliers, flag_as_ww_outlier} \item{forecast_date}{string indicating the forecast date} -\item{forecast_time}{integer indicating the number of days to make a forecast -for} +\item{forecast_horizon}{integer indicating the number of days to make a +forecast for} \item{calibration_time}{integer indicating the max duration in days that -the model is calibrated to hospital admissions for} - -\item{input_ww_data}{a dataframe with the input wastewater data} - -\item{input_hosp_data}{a dataframe with the input hospital admissions data} +the model is calibrated to the count data for} \item{generation_interval}{a vector with a zero-truncated normalized pmf of the generation interval} -\item{inf_to_hosp}{a vector with a normalized pmf of the delay from infection -to hospital admissions} +\item{inf_to_count_delay}{a vector with a normalized pmf of the delay from +infection to counts} \item{infection_feedback_pmf}{a vector with a normalized pmf dictating the delay of infection feedback} \item{params}{a dataframe of parameter names and numeric values} +\item{exclude_ww_outliers}{boolean indicating whether or not to remove +the flagged ww_outliers, default = \code{TRUE}} + \item{compute_likelihood}{indicator variable telling stan whether or not to compute the likelihood, default = \code{1}} - -\item{ww_outlier_col_name}{A string representing the name of the -column in the input_ww_data that provides a 0 if the data point is not an -outlier to be excluded from the model fit, or a 1 if it is to be excluded -default value is \code{flag_as_ww_outlier}} - -\item{lod_col_name}{A string representing the name of the -column in the input_ww_data that provides a 0 if the data point is not above -the LOD and a 1 if the data is below the LOD, default value is \code{below_LOD}} - -\item{ww_measurement_col_name}{A string representing the name of the column -in the input_ww_data that indicates the wastewater measurement value in -natural scale, default is \code{ww}} - -\item{ww_value_lod_col_name}{A string representing the name of the column -in the input_ww_data that indicates the value of the LOD in natural scale, -default is \code{lod_sewage}} - -\item{hosp_value_col_name}{A string representing the name of the column -in the input_hosp-data that indicates the number of daily hospital -admissions, default is \code{daily_hosp_admits}} } \value{ -a list named variables to pass to stan +a list of named variables to pass to stan } \description{ -Get stan data +Get stan data for ww + hosp model } diff --git a/man/get_subpop_data.Rd b/man/get_subpop_data.Rd new file mode 100644 index 00000000..ed5600a9 --- /dev/null +++ b/man/get_subpop_data.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_subpop_data} +\alias{get_subpop_data} +\title{Get subpopulation data} +\usage{ +get_subpop_data(add_auxiliary_site, state_pop, pop_ww, n_ww_sites) +} +\arguments{ +\item{add_auxiliary_site}{Boolean indicating whether to add another +subpopulation in addition to the wastewater sites to estimate R(t) of} + +\item{state_pop}{The state population size} + +\item{pop_ww}{The population size in each of the wastewater sites} + +\item{n_ww_sites}{The number of wastewater sites} +} +\value{ +A list containing the necessary integers and vectors that stan +needs to estiamte infection dynamics for each subpopulation +} +\description{ +Get subpopulation data +} +\examples{ +subpop_data <- get_subpop_data(TRUE, 100000, c(1000, 500), 2) +} diff --git a/man/get_ww_data_indices.Rd b/man/get_ww_data_indices.Rd new file mode 100644 index 00000000..e16fce3f --- /dev/null +++ b/man/get_ww_data_indices.Rd @@ -0,0 +1,40 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_ww_data_indices} +\alias{get_ww_data_indices} +\title{Get wastewater data indices} +\usage{ +get_ww_data_indices(ww_data, input_hosp_data, owt, lod_col_name = "below_lod") +} +\arguments{ +\item{ww_data}{Input wastewater dataframe containing one row +per observation, with outliers already removed} + +\item{input_hosp_data}{Input hospital admissions data frame with one row +per day and location} + +\item{owt}{number of wastewater observations} + +\item{lod_col_name}{A string representing the name of the +column in the input_ww_data that provides a 0 if the data point is not above +the LOD and a 1 if the data is below the LOD, default value is \code{below_LOD}} +} +\value{ +A list containing the necessary vectors of indices that +the stan model requires: +ww_censored: the vector of time points that the wastewater observations are +censored (below the LOD) in order of the date and the site index +ww_uncensored: the vector of time points that the wastewater observations are +uncensored (above the LOD) in order of the date and the site index +ww_sampled_times: the vector of time points that the wastewater observations +are passed in in log_conc in order of the date and the site index +ww_sampled_sites: the vector of sites that correspond to the observations +passed in in log_conc in order of the date and the site index +ww_sampled_lab_sites: the vector of unique combinations of site and labs +that correspond to the observations passed in in log_conc in order of the +date and the site index +lab_site_to_site_map: the vector of sites that correspond to each lab-site +} +\description{ +Get wastewater data indices +} diff --git a/man/get_ww_data_sizes.Rd b/man/get_ww_data_sizes.Rd new file mode 100644 index 00000000..5c5d78eb --- /dev/null +++ b/man/get_ww_data_sizes.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_ww_data_sizes} +\alias{get_ww_data_sizes} +\title{Get the integer sizes of the wastewater input data} +\usage{ +get_ww_data_sizes(ww_data, lod_col_name = "below_lod") +} +\arguments{ +\item{ww_data}{Input wastewater dataframe containing one row +per observation, with outliers already removed} + +\item{lod_col_name}{A string representing the name of the +column in the input_ww_data that provides a 0 if the data point is not above +the LOD and a 1 if the data is below the LOD, default value is \code{below_LOD}} +} +\value{ +A list containing the integer sizes of the follow variables that +the stan model requires: +owt: number of wastewater observations +n_censored: number of censored wastewater observations (below the LOD) +n_uncensored: number of uncensored wastewter observations (above the LOD) +n_ww_sites: number of wastewater sites +n_ww_lab_sites: number of unique wastewater site-lab combinations +} +\description{ +Get the integer sizes of the wastewater input data +} diff --git a/man/get_ww_values.Rd b/man/get_ww_values.Rd new file mode 100644 index 00000000..f0e643b2 --- /dev/null +++ b/man/get_ww_values.Rd @@ -0,0 +1,45 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/get_stan_data.R +\name{get_ww_values} +\alias{get_ww_values} +\title{Get wastewater data values} +\usage{ +get_ww_values( + ww_data, + ww_measurement_col_name = "genome_copies_per_ml", + ww_lod_value_col_name = "lod", + ww_site_pop_col_name = "site_pop", + one_pop_per_site = TRUE +) +} +\arguments{ +\item{ww_data}{Input wastewater dataframe containing one row +per observation, with outliers already removed} + +\item{ww_measurement_col_name}{A string representing the name of the column +in the input_ww_data that indicates the wastewater measurement value in +natural scale, default is \code{genome_copies_per_ml}} + +\item{ww_lod_value_col_name}{A string representing the name of the column +in the ww_data that indicates the value of the LOD in natural scale, +default is \code{lod}} + +\item{ww_site_pop_col_name}{A string representing the name of the column in +the ww_data that indicates the number of people represented by that +wastewater catchment, default is \code{site_pop}} + +\item{one_pop_per_site}{a boolean variable indicating if there should only +be on catchment area population per site, default is \code{TRUE} because this is +what the stan model expects} +} +\value{ +A list containing the necessary vectors of values that +the stan model requires: +ww_lod: a vector of the LODs of the corresponding wastewater measurement +pop_ww: a vector of the population sizes of the wastewater catchment areas +in order of the sites by site_index +log_conc: a vector of the log of the wastewater concentration observation +} +\description{ +Get wastewater data values +} diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd index db409d10..abfe2377 100644 --- a/man/preprocess_ww_data.Rd +++ b/man/preprocess_ww_data.Rd @@ -21,8 +21,10 @@ formatted as a character string in IS08601 format (YYYY-MM-DD).} measurements in the wastewater data, default is \code{genome_copies_per_ml}} } \value{ -a dataframe containing the transformed and clean wastewater data -at the site and lab label for the forecast date and location specified +a dataframe containing the same columns as ww_data plus the following +additional columns neede for the stan model: +lab_site_index, site_index, flag_as_ww_outlier, lab_site_name, +forecast_date } \description{ Get input wastewater data diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 2a37dfe1..60850b97 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -178,4 +178,58 @@ to assigning the forecast date (done above), this also includes: # configuration file. calibration_time <- 90 # number of days to calibrate hospital admissions to +forecast_horizon <- 28 # number of days from the forecast date to generate +# forecasts for + +# We will pass in some pmfs that are specific to COVID, and to the delay from +# infections to hospital admissions. If using a different pathogen or a +# different count dataset, these pmfs need to be replaced. We provide them as +# package data here. These are both vectors of simplexes (they must sum to 1). +generation_interval <- wwinference::generation_interval +inf_to_hosp <- wwinference::inf_to_hosp + +# Additionally, the model requires specifying a delay distribution for the +# infection feedback term, which essentially describes the delay at which +# high incident infections results in negative feedback on future infections +# (due to susceptibility, behavior changes, policies to reduce transmission, +# etc.). We by default set this as the generation interval, but this can be +# modified as long as the values sum to 1. +infection_feedback_pmf <- generation_interval + +# Next, we will get the stan data objected from the data and the model +# specifications + +stan_data <- get_stan_data( + input_count_data = hosp_data_preprocessed, + input_ww_data = ww_data_preprocessed, + forecast_date = forecast_date, + calibration_time = calibration_time, + forecast_horizon = forecast_horizon, + generation_interval = generation_interval, + inf_to_count_delay = inf_to_hosp, + infection_feedback_pmf = infection_feedback_pmf, + params = params +) +``` +# Compile and fit the model +```{r} +model_file_path <- system.file( + "stan", "wwinference.stan", + package = "wwinference" +) + +model <- compile_model( + model_file_path +) + + +ww_fit_obj <- model$sample( + data = stan_data, + seed = 123, + iter_sampling = 500, + iter_warmup = 750, + max_treedepth = 12, + chains = 4, + parallel_chains = 4 +) ``` From 900a5c7799f303508203ca31a9a4c763768d7abf Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 1 Jul 2024 10:20:40 -0400 Subject: [PATCH 030/103] simplify pre-processing in vignette --- R/compile_model.R | 12 ++- R/generate_simulated_data.R | 4 +- R/preprocessing.R | 68 +++++++------ man/compile_model.Rd | 9 +- man/preprocess_hosp_data.Rd | 13 +-- man/preprocess_ww_data.Rd | 25 +++-- vignettes/wwinference.Rmd | 190 +++++++++++++++++++++++------------- 7 files changed, 201 insertions(+), 120 deletions(-) diff --git a/R/compile_model.R b/R/compile_model.R index 6b883af7..f1faeb99 100644 --- a/R/compile_model.R +++ b/R/compile_model.R @@ -12,11 +12,12 @@ #' the [`epinowcast`](https://github.com/epinowcast/epinowcast) #' R package. #' -#' @param model_filepath path to .stan file defining the model +#' @param model_filepath path to .stan file defining the model, default is +#' `system.file("stan", "wwinference.stan", package = "wwinference") #' @param include_paths path(s) to directories to search for files #' specified in #include statements. Passed to [cmdstanr::cmdstan_model()]. #' Defaults to the `stan` subdirectory of the installed -#' `cfaforecastrenewalww` package. +#' `wwinference` package. #' @param threads Number of threads to use in model compilation, #' as an integer. Passed to [cmdstanr::cmdstan_model()]. #' Default `FALSE` (use single-threaded compilation). @@ -40,10 +41,13 @@ #' @return The resulting `cmdstanr` model object, as the output #' of [cmdstanr::cmdstan_model()]. #' @export -compile_model <- function(model_filepath, +compile_model <- function(model_filepath = system.file("stan", + "wwinference.stan", + package = "wwinference" + ), include_paths = system.file( "stan", - package = "cfaforecastrenewalww" + package = "wwinference" ), threads = FALSE, target_dir = tempdir(), diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 37ba142a..69549099 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -443,7 +443,7 @@ generate_simulated_data <- function(r_in_weeks = # nolint dplyr::select(date, site, lab, genome_copies_per_ml, lod, site_pop) # Make a hospital admissions dataframe for model calibration - hosp_data <- data.frame( + hosp_data <- tibble::tibble( t = 1:ot, daily_hosp_admits = exp_obs_hosp[1:ot], state_pop = pop_size @@ -459,7 +459,7 @@ generate_simulated_data <- function(r_in_weeks = # nolint ) # Make another one for model evaluation - hosp_data_eval <- data.frame( + hosp_data_eval <- tibble::tibble( t = 1:(ot + ht), daily_hosp_admits_for_eval = exp_obs_hosp, state_pop = pop_size diff --git a/R/preprocessing.R b/R/preprocessing.R index 9b08fa0f..c115255a 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -1,11 +1,12 @@ #' Get input wastewater data #' @param ww_data dataframe containing the following columns: site, lab, #' date, a column for concentration, and lod -#' @param forecast_date The forecast date for this iteration, -#' formatted as a character string in IS08601 format (YYYY-MM-DD). -#' @param conc_col_name name of the column containing the concentration -#' measurements in the wastewater data, default is `genome_copies_per_ml` -#' +#' @param conc_col_name string indicating the name of the column containing +#' the concentration measurements in the wastewater data, default is +#' `genome_copies_per_ml` +#' @param lod_col_name string indicating the name of the column containing +#' the concentration measurements in the wastewater data, default is +#' `genome_copies_per_ml` #' @return a dataframe containing the same columns as ww_data plus the following #' additional columns neede for the stan model: #' lab_site_index, site_index, flag_as_ww_outlier, lab_site_name, @@ -13,10 +14,21 @@ #' @export #' #' @examples -#' ww_data_preprocessed <- preprocess_ww_data(ww_data, "2023-12-01") +#' ww_data <- tibble::tibble( +#' date = rep(c("2023-11-01", "2023-11-02"), 2), +#' site = c(rep(1), rep(2)), +#' lab = c(1, 1, 1, 1), +#' conc = c(345.2, 784.1, 401.5, 681.8), +#' lod = c(20, 20, 15, 15) +#' ) + +#' ww_data_preprocessed <- preprocess_ww_data(ww_data, +#' conc_col_name = "conc", +#' lod_col_name = "lod" +#' ) preprocess_ww_data <- function(ww_data, - forecast_date, - conc_col_name = "genome_copies_per_ml") { + conc_col_name = "genome_copies_per_ml", + lod_col_name = "lod") { # Add some columns ww_data <- ww_data |> dplyr::left_join( @@ -35,19 +47,16 @@ preprocess_ww_data <- function(ww_data, ) |> dplyr::mutate( lab_site_name = glue::glue("Site: {site}, Lab: {lab}"), - below_lod = ifelse({{ conc_col_name }} < lod, 1, 0) + below_lod = ifelse({{ conc_col_name }} < {{ lod_col_name }}, 1, 0) ) # Get an extra column that identifies the wastewater outliers using the # default parameters - ww_preprocessed <- flag_ww_outliers(ww_data) |> - dplyr::mutate( - forecast_date = lubridate::ymd(!!forecast_date) - ) |> - # In case the wastewater data being passed in isn't vintaged, we want to - # make sure we don't include values that are past the forecast date - dplyr::filter( - date < forecast_date + ww_preprocessed <- flag_ww_outliers(ww_data, + conc_col_name = !!conc_col_name + ) |> + dplyr::rename( + genome_copies_per_ml = {{ conc_col_name }} ) @@ -57,38 +66,35 @@ preprocess_ww_data <- function(ww_data, #' Get input hospital admissions data #' @param hosp_data dataframe containing the following columns: date, #' a count column, and a population size column -#' @param forecast_date The forecast date for this iteration, -#' formatted as a character string in IS08601 format (YYYY-MM-DD). #' @param count_col_name name of the column containing the epidemiological #' indicator, default is `daily_hosp_admits` #' @param pop_size_col_name name of the column containing the population size #' of that the counts are coming from, default is `state_pop` #' #' @return a dataframe containing the hospital admissions data renamed to -#' have the following columns `date`, `count`, `total_pop` and `forecast_date` +#' have the following columns `date`, `count`, and `total_pop` #' @export #' #' @examples -#' hosp_data_preprocessed <- preprocess_hospdata(hosp_data, "2023-12-01") +#' hosp_data <- tibble::tibble( +#' date = c("2023-11-01", "2023-11-02"), +#' daily_admits = c(10, 20), +#' state_pop = c(1e6, 1e6) +#' ) +#' hosp_data_preprocessed <- preprocess_hospdata( +#' hosp_data, +#' "daily_admits", +#' "state_pop" +#' ) preprocess_hosp_data <- function(hosp_data, - forecast_date, count_col_name = "daily_hosp_admits", pop_size_col_name = "state_pop") { hosp_data_preprocessed <- hosp_data |> dplyr::rename( count = {{ count_col_name }}, total_pop = {{ pop_size_col_name }} - ) |> - dplyr::mutate( - forecast_date = lubridate::ymd(!!forecast_date) - ) |> - # In case the count data being passed in isn't vintaged, we want to - # make sure we don't include values that are past the forecast date - dplyr::filter( - date < forecast_date ) - return(hosp_data_preprocessed) } diff --git a/man/compile_model.Rd b/man/compile_model.Rd index 3ab2bda4..3b15a9ee 100644 --- a/man/compile_model.Rd +++ b/man/compile_model.Rd @@ -6,8 +6,8 @@ include directory (\code{stan}) for #include statements} \usage{ compile_model( - model_filepath, - include_paths = system.file("stan", package = "cfaforecastrenewalww"), + model_filepath = system.file("stan", "wwinference.stan", package = "wwinference"), + include_paths = system.file("stan", package = "wwinference"), threads = FALSE, target_dir = tempdir(), stanc_options = list(), @@ -17,12 +17,13 @@ compile_model( ) } \arguments{ -\item{model_filepath}{path to .stan file defining the model} +\item{model_filepath}{path to .stan file defining the model, default is +`system.file("stan", "wwinference.stan", package = "wwinference")} \item{include_paths}{path(s) to directories to search for files specified in #include statements. Passed to \code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}. Defaults to the \code{stan} subdirectory of the installed -\code{cfaforecastrenewalww} package.} +\code{wwinference} package.} \item{threads}{Number of threads to use in model compilation, as an integer. Passed to \code{\link[cmdstanr:cmdstan_model]{cmdstanr::cmdstan_model()}}. diff --git a/man/preprocess_hosp_data.Rd b/man/preprocess_hosp_data.Rd index f17abc9a..2aaa5393 100644 --- a/man/preprocess_hosp_data.Rd +++ b/man/preprocess_hosp_data.Rd @@ -6,7 +6,6 @@ \usage{ preprocess_hosp_data( hosp_data, - forecast_date, count_col_name = "daily_hosp_admits", pop_size_col_name = "state_pop" ) @@ -15,9 +14,6 @@ preprocess_hosp_data( \item{hosp_data}{dataframe containing the following columns: date, a count column, and a population size column} -\item{forecast_date}{The forecast date for this iteration, -formatted as a character string in IS08601 format (YYYY-MM-DD).} - \item{count_col_name}{name of the column containing the epidemiological indicator, default is \code{daily_hosp_admits}} @@ -26,11 +22,16 @@ of that the counts are coming from, default is \code{state_pop}} } \value{ a dataframe containing the hospital admissions data renamed to -have the following columns \code{date}, \code{count}, \code{total_pop} and \code{forecast_date} +have the following columns \code{date}, \code{count}, and \code{total_pop} } \description{ Get input hospital admissions data } \examples{ -hosp_data_preprocessed <- preprocess_hospdata(hosp_data, "2023-12-01") +hosp_data <- tibble::tibble(date = c("2023-11-01", "2023-11-02"), + daily_admits = c( 10, 20), + state_pop = c(1e6, 1e6)) +hosp_data_preprocessed <- preprocess_hospdata(hosp_data, + "daily_admits", + "state_pop") } diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd index abfe2377..cc68e995 100644 --- a/man/preprocess_ww_data.Rd +++ b/man/preprocess_ww_data.Rd @@ -6,19 +6,21 @@ \usage{ preprocess_ww_data( ww_data, - forecast_date, - conc_col_name = "genome_copies_per_ml" + conc_col_name = "genome_copies_per_ml", + lod_col_name = "lod" ) } \arguments{ \item{ww_data}{dataframe containing the following columns: site, lab, date, a column for concentration, and lod} -\item{forecast_date}{The forecast date for this iteration, -formatted as a character string in IS08601 format (YYYY-MM-DD).} +\item{conc_col_name}{string indicating the name of the column containing +the concentration measurements in the wastewater data, default is +\code{genome_copies_per_ml}} -\item{conc_col_name}{name of the column containing the concentration -measurements in the wastewater data, default is \code{genome_copies_per_ml}} +\item{lod_col_name}{string indicating the name of the column containing +the concentration measurements in the wastewater data, default is +\code{genome_copies_per_ml}} } \value{ a dataframe containing the same columns as ww_data plus the following @@ -30,5 +32,14 @@ forecast_date Get input wastewater data } \examples{ -ww_data_preprocessed <- preprocess_ww_data(ww_data, "2023-12-01") +ww_data <- tibble::tibble(date = rep(c("2023-11-01", "2023-11-02"),2), + site= c(rep(1), rep(2)), + lab = c(1,1,1,1), + conc = c(345.2, 784.1, 401.5, 681.8), + lod = c(20, 20, 15, 15) + ) +ww_data_preprocessed <- preprocess_ww_data(ww_data, + conc_col_name = "conc", + lod_col_name = "lod" + ) } diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 60850b97..bbb6d186 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -42,20 +42,29 @@ The model expects two types of data: daily counts of hospital admissions data from the larger "global" population, and wastewater concentration data from wastewater treatment plants whose catchment areas are contained within the larger "global" population. For this quick start, we will use -simulated data, modeled after a hypothetical US state with 6 wastewater treatment -plants reporting data on viral concentrations of SARS-COV-2, covering about X % -of the state's population. This simulated data is assigned to cover dates -from September 1, 2023 to December 1, 2023. These data are provided as part -of the package data. +simulated data, modeled after a hypothetical US state with 4 wastewater +treatmentplants (also referred to as sites) reporting data on viral +concentrations of SARS-COV-2, processed in 3 different labs, covering about 70% +of the state's population. This simulated data contains daily counts of the +total hospital admissions in a hypothetical US state from September 1, 2023 to +November 29, 2023. It contains wastewater concentration data spanning from +September 1, 2023 to December 1, 2023, with varying sampling frequencies. We +will be using this data to produce a forecast of COVID-19 hospital admissions +as of December 6, 2023. These data are provided as part of the package data. These data are already in a format that can be used for `wwinference`. For the hospital admissions data, it contains: - a date (column `date`): the date of the observation, in this case, the date the hospital admissions occurred -- a count (column `daily_hosp_admits`): the number of hospital admissions observed on that day +- a count (column `daily_hosp_admits`): the number of hospital admissions +observed on that day - a population size (column `state_pop`): the population size covered by the hospital admissions data, in this case, the size of the theoretical state. +Additionally, we provide the `hosp_data_eval` dataset which contains the +simulated hospital admissions 28 days ahead of the forecast date, which can be +used to evaluate the model. + For the wastewater data, it contains: - a date (column `date`): the date the sample was collected - a site indicator (column `site`): the unique identifier for the wastewater treatment plant @@ -68,61 +77,80 @@ process the sample, in natural scale - a site population size (column `site_pop`): the population size covered by the wastewater catchment area of that site + + ```{r} hosp_data <- wwinference::hosp_data hosp_data_eval <- wwinference::hosp_data_eval ww_data <- wwinference::ww_data + +head(ww_data) +head(hosp_data) ``` # Pre-processing The user will need to provide data that is in a similar format to the package -data. This represents the bare minimum required data for a single location -and a single forecast date. We will need to do some pre-processing to add -some columns that the model will need to be able apply features such -as outlier exclusion and censoring of values below the limit of detection. +data, as described above. This represents the bare minimum required data for a +single location and a single forecast date. We will need to do some +pre-processing to add some additional variables that the model will need to be +able apply features such as outlier exclusion and censoring of values below the +limit of detection. +## Parameters +Get the parameters from the package. Note that some of these are COVID +specific, others are more general to the model. This is indicated in the +.toml file. ```{r} -# Get the parameters from the package. Note that some of these are COVID -# specific, others are more general to the model. This is indicated in the -# .toml file. params <- get_params( system.file("extdata", "example_params.toml", package = "wwinference" ) ) +``` + +## Wastewater data pre-processing -# Specify the forecast date. Let's pretend we want to make a forecast on -# 2023-12-07, with our hospital admissions data that is available up until -# 2023-11-29 -forecast_date <- "2023-12-07" - -# Pre-process the wastewater dataset: this function adds a few columns to -# the original wastewater dataset. First, it assigns a unique identifier -# the unique combinations of labs and sites, since this is the unit we will -# use for estimating the observation error in the reported measurements. -# Second it adds a column `below_lod` which is an indicator of whether the -# reported concentration is above or below the lod. If the point is below the -# LOD, the model will treat this observation as censored. -# Third, it adds a column `flag_as_ww_outlier` that indicates whether the -# measurement is identified as an outlier by our algorithm and the default -# thresholds. The user can still choose to include these in the fitting process -# later on. Lastly, it filters the wastewater measurements to dates before the -# forecast date specified above, just in case the wastewater data passed in -# isn't vintaged. +The `preprocess_ww_data` function adds the following variables to the original +dataset. First, it assigns a unique identifier +the unique combinations of labs and sites, since this is the unit we will +use for estimating the observation error in the reported measurements. +Second it adds a column `below_lod` which is an indicator of whether the +reported concentration is above or below the limit of detection (LOD). If the +point is below the LOD, the model will treat this observation as censored. +Third, it adds a column `flag_as_ww_outlier` that indicates whether the +measurement is identified as an outlier by our algorithm and the default +thresholds. While the default choice will be to exclude the measurements flagged +as outliers, the user can still choose to include these if they'd like later on. +The user must specify the name of the column containing the +concentration measurements (presumed to be in genome copies per mL) and the +name of the column containing the limit of detection for each measurement. The +function assumes that the original data contains the columns `date`, `site`, +and `lab`, and will return a dataframe with the column names needed to +pass to the downstream model fitting functions. +```{r} ww_data_preprocessed <- wwinference::preprocess_ww_data( ww_data, - forecast_date + conc_col_name = "genome_copies_per_ml", + lod_col_name = "lod" ) +``` -# Pre-process the hospital admissions dataset: This function adds a column to -# indicate the forecast date and standardizes the column names. -# Eventually we could implement outlier detection, -# data exclusion here but we will keep as simple as we can for now. +## Hospital admissions data pre-processing +The `preprocess_hosp_data` function standardizes the column names of the +resulting datafame. The user must specify the name of the column containing +the daily hospital admissions counts and the population size that the hospital +admissions are coming from (from in this case, a hypothetical US state). The +function assumes that the original data contains the column `date`, and will +return a dataframe with the column names needed to pass to the downstream model +fitting functions. +```{r} hosp_data_preprocessed <- wwinference::preprocess_hosp_data( hosp_data, - forecast_date + count_col_name = "daily_hosp_admits", + pop_size_col_name = "state_pop" ) ``` + We'll make some plots of the data just to make sure it looks like what we'd expect: ```{r} @@ -164,41 +192,84 @@ ggplot(hosp_data_preprocessed) + ggtitle("State level hospital admissions") + theme_bw() ``` -User specifications: +# Model specification: We will need to set some metadata to facilitate model specification. In addition to assigning the forecast date (done above), this also includes: +- forecast date (the date we are making a forecast) - number of days to calibrate the model for - number of days to forecast - specification of the generation interval, in this case for COVID-19 - specification of the delay from infection to the count data, in this case from infection to COVID-19 hospital admission + +## Calibration time and forecast time +The calibration time represents the number of days to calibrate the count data +to. This must be less than or equal to the number of rows in `hosp_data`. The +forecast horizon represents the number of days from the forecast date to +generate forecasted hospital admissions for. Typically, the hospital admissions +data will not be complete up until the forecast date, and we will refer to the +time between the last hospital admissions data point and the forecast date as +the nowcast time. The model will "forecast" this period, in addition to the +specified forecast horizon. ```{r} -# Let's assign some of the variables that we would expect to be in a -# configuration file. +calibration_time <- 90 +forecast_horizon <- 28 +``` -calibration_time <- 90 # number of days to calibrate hospital admissions to -forecast_horizon <- 28 # number of days from the forecast date to generate -# forecasts for +## Delay distributions +We will pass in some probabiltiy mass functions (pmfs) that are specific to +COVID, and to the delay from infections to hospital admissions, the count +data we are using to fit th emodel. If using a different pathogen or a +different count dataset, these pmfs need to be replaced. We provide them as +package data here. These are both vectors of simplexes (they must sum to 1). -# We will pass in some pmfs that are specific to COVID, and to the delay from -# infections to hospital admissions. If using a different pathogen or a -# different count dataset, these pmfs need to be replaced. We provide them as -# package data here. These are both vectors of simplexes (they must sum to 1). +Additionally, the model requires specifying a delay distribution for the +infection feedback term, which essentially describes the delay at which +high incident infections results in negative feedback on future infections +(due to susceptibility, behavior changes, policies to reduce transmission, +etc.). We by default set this as the generation interval, but this can be +modified as long as the values sum to 1. +```{r} generation_interval <- wwinference::generation_interval inf_to_hosp <- wwinference::inf_to_hosp -# Additionally, the model requires specifying a delay distribution for the -# infection feedback term, which essentially describes the delay at which -# high incident infections results in negative feedback on future infections -# (due to susceptibility, behavior changes, policies to reduce transmission, -# etc.). We by default set this as the generation interval, but this can be -# modified as long as the values sum to 1. +# Assign infection feedback equal to the generation interval infection_feedback_pmf <- generation_interval +``` +# Precompiling the model +As `wwinference` uses `cmdstan` to fit its models, it is necessary to first +compile the model. This can be done using the compile_model() function. +```{r} +model <- compile_model() +``` +# Fitting the model +We're now ready to fit the model using the “No-U-Turn Sampler Markov chain +Monte Carlo” method. This is a type of Hamiltonian Monte Carlo (HMC) algorithm +and is the core fitting method used by `cmdstan`. The user can adjust the MCMC +settings (see the documentation for `fit_model`), however this vignette will use +the default parameter settings which includes running 4 parallel chains with +750 warm up iterations, 500 sampling iterations for each chain, a target average +acceptance probability of 0.95 and a maximum tree depth of 12. The user may wish +to adjust these as they are iterating to reduce model run-time or to achieve +better convergence on a real-world use case. + +We also pass our preprocessed data (pobs), our pre-compiled model (model), and our model modules (expectation_module, reference_module, and report_module) to epinowcast, where they are combined and used to fit the model. # Next, we will get the stan data objected from the data and the model # specifications +```{r} +``` + +# The `wwinference` object +The `wwinference()` function returns a `wwinference` object which includes +diagnostic information, the data used for for fitting, and the underlying +`CmdStanModel` object. The `CmdStanModel` object contains the estimated, +nowcasted, and forecasted expected observed hospital admissions and wastewater +concentrations, as well as the latent variables of interest including the site- +level $R(t)$ estimates and the state-level $R(t)$ estimate. +```{r} stan_data <- get_stan_data( input_count_data = hosp_data_preprocessed, input_ww_data = ww_data_preprocessed, @@ -210,19 +281,6 @@ stan_data <- get_stan_data( infection_feedback_pmf = infection_feedback_pmf, params = params ) -``` -# Compile and fit the model -```{r} -model_file_path <- system.file( - "stan", "wwinference.stan", - package = "wwinference" -) - -model <- compile_model( - model_file_path -) - - ww_fit_obj <- model$sample( data = stan_data, seed = 123, From 6ead306d2fd40c97af8c78fc76ac3e7b0b0a4768 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 1 Jul 2024 15:07:11 -0400 Subject: [PATCH 031/103] add draft of wrapper function to fit model --- R/checkers.R | 39 +++++++++++++++++++++++++++ R/get_stan_data.R | 3 ++- R/preprocessing.R | 2 +- man/check_date.Rd | 22 +++++++++++++++ man/get_stan_data.Rd | 3 ++- man/preprocess_hosp_data.Rd | 16 ++++++----- man/preprocess_ww_data.Rd | 13 ++++----- vignettes/wwinference.Rmd | 53 ++++++++++++++++++++++++++++++++++--- 8 files changed, 133 insertions(+), 18 deletions(-) create mode 100644 R/checkers.R create mode 100644 man/check_date.Rd diff --git a/R/checkers.R b/R/checkers.R new file mode 100644 index 00000000..b0e419dd --- /dev/null +++ b/R/checkers.R @@ -0,0 +1,39 @@ +#' Check that all dates in dataframe passed in are before a specified date +#' +#' @param df dataframe with `date` column +#' @param max_date string indicating the maximum date in ISO8 convention +#' e.g. YYYY-MM-DD +#' @param call Calling environment to be passed to the type checker +#' +#' @return NULL, invisibly +check_date <- function(df, max_date, call = rlang::caller_env()) { + if (max(df$date) > max_date) { + cli::cli_abort( + c( + "The data passed in has observations beyond the specified", + "maximum date. Either this is the incorrect vintaged", + "data, or the data needs to be filtered to only contain", + "observations before the maximum date" + ), + call = call, + class = "wwinference_input_data_error" + ) + } + invisible() +} + +check_elements_non_neg <- function(x, arg = "x", call = rlang::caller_env()) { + # Greater than or equal to 0 or is NA + is_non_neg <- (x >= 0) | is.na(x) + if (!all(is_non_neg)) { + cli::cli_abort( + c("{.arg {arg}} has negative elements", + "!" = "All elements must be 0 or greater", + "i" = "Elements {.val {which(!is_non_neg)}} are negative" + ), + class = "RtGam_invalid_input", + call = call + ) + } + invisible() +} diff --git a/R/get_stan_data.R b/R/get_stan_data.R index cfc8a099..c3188899 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -5,7 +5,8 @@ #' @param input_ww_data a dataframe with the input wastewater data with no gaps, #' must have the following columns: date, site, lab, genome_copies_per_ml, #' site_pop, below_lod, and if removing outliers, flag_as_ww_outlier -#' @param forecast_date string indicating the forecast date +#' @param forecast_date string indicating the forecast date in ISO8 convention +#' e.g. YYYY-MM-DD #' @param forecast_horizon integer indicating the number of days to make a #' forecast for #' @param calibration_time integer indicating the max duration in days that diff --git a/R/preprocessing.R b/R/preprocessing.R index c115255a..812e9fff 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -53,7 +53,7 @@ preprocess_ww_data <- function(ww_data, # Get an extra column that identifies the wastewater outliers using the # default parameters ww_preprocessed <- flag_ww_outliers(ww_data, - conc_col_name = !!conc_col_name + conc_col_name = {{ conc_col_name }} ) |> dplyr::rename( genome_copies_per_ml = {{ conc_col_name }} diff --git a/man/check_date.Rd b/man/check_date.Rd new file mode 100644 index 00000000..f0c8726d --- /dev/null +++ b/man/check_date.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/checkers.R +\name{check_date} +\alias{check_date} +\title{Check that all dates in dataframe passed in are before a specified date} +\usage{ +check_date(df, max_date, call = rlang::caller_env()) +} +\arguments{ +\item{df}{dataframe with \code{date} column} + +\item{max_date}{string indicating the maximum date in ISO8 convention +e.g. YYYY-MM-DD} + +\item{call}{Calling environment to be passed to the type checker} +} +\value{ +NULL, invisibly +} +\description{ +Check that all dates in dataframe passed in are before a specified date +} diff --git a/man/get_stan_data.Rd b/man/get_stan_data.Rd index aeae54c6..4f8bcb77 100644 --- a/man/get_stan_data.Rd +++ b/man/get_stan_data.Rd @@ -26,7 +26,8 @@ the following columns: date, count, total_pop} must have the following columns: date, site, lab, genome_copies_per_ml, site_pop, below_lod, and if removing outliers, flag_as_ww_outlier} -\item{forecast_date}{string indicating the forecast date} +\item{forecast_date}{string indicating the forecast date in ISO8 convention +e.g. YYYY-MM-DD} \item{forecast_horizon}{integer indicating the number of days to make a forecast for} diff --git a/man/preprocess_hosp_data.Rd b/man/preprocess_hosp_data.Rd index 2aaa5393..ba37dce7 100644 --- a/man/preprocess_hosp_data.Rd +++ b/man/preprocess_hosp_data.Rd @@ -28,10 +28,14 @@ have the following columns \code{date}, \code{count}, and \code{total_pop} Get input hospital admissions data } \examples{ -hosp_data <- tibble::tibble(date = c("2023-11-01", "2023-11-02"), - daily_admits = c( 10, 20), - state_pop = c(1e6, 1e6)) -hosp_data_preprocessed <- preprocess_hospdata(hosp_data, - "daily_admits", - "state_pop") +hosp_data <- tibble::tibble( + date = c("2023-11-01", "2023-11-02"), + daily_admits = c(10, 20), + state_pop = c(1e6, 1e6) +) +hosp_data_preprocessed <- preprocess_hospdata( + hosp_data, + "daily_admits", + "state_pop" +) } diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd index cc68e995..028b2a8d 100644 --- a/man/preprocess_ww_data.Rd +++ b/man/preprocess_ww_data.Rd @@ -32,12 +32,13 @@ forecast_date Get input wastewater data } \examples{ -ww_data <- tibble::tibble(date = rep(c("2023-11-01", "2023-11-02"),2), - site= c(rep(1), rep(2)), - lab = c(1,1,1,1), - conc = c(345.2, 784.1, 401.5, 681.8), - lod = c(20, 20, 15, 15) - ) +ww_data <- tibble::tibble( + date = rep(c("2023-11-01", "2023-11-02"), 2), + site = c(rep(1), rep(2)), + lab = c(1, 1, 1, 1), + conc = c(345.2, 784.1, 401.5, 681.8), + lod = c(20, 20, 15, 15) +) ww_data_preprocessed <- preprocess_ww_data(ww_data, conc_col_name = "conc", lod_col_name = "lod" diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index bbb6d186..f18e8d91 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -168,7 +168,6 @@ ggplot(ww_data_preprocessed) + show.legend = FALSE ) + geom_hline(aes(yintercept = lod), linetype = "dashed") + - geom_vline(aes(xintercept = forecast_date), linetype = "dashed") + facet_wrap(~lab_site_name) + xlab("") + ylab("Genome copies/mL") + @@ -186,7 +185,6 @@ ggplot(hosp_data_preprocessed) + ) + # Plot the data we will calibrate to geom_point(aes(x = date, y = count)) + - geom_vline(aes(xintercept = forecast_date), linetype = "dashed") + xlab("") + ylab("Daily hospital admissions") + ggtitle("State level hospital admissions") + @@ -213,6 +211,7 @@ time between the last hospital admissions data point and the forecast date as the nowcast time. The model will "forecast" this period, in addition to the specified forecast horizon. ```{r} +forecast_date <- "2023-12-06" calibration_time <- 90 forecast_horizon <- 28 ``` @@ -238,6 +237,19 @@ inf_to_hosp <- wwinference::inf_to_hosp infection_feedback_pmf <- generation_interval ``` +## Combine into a single list called `model_spec` +```{r} +model_spec <- list( + forecast_date = forecast_date, + calibration_time = calibration_time, + forecast_horizon = forecast_horizon, + generation_interval = generation_interval, + inf_to_count_delay = inf_to_hosp, + infection_feedback_pmf = infection_feedback_pmf, + params = params +) +``` + # Precompiling the model As `wwinference` uses `cmdstan` to fit its models, it is necessary to first compile the model. This can be done using the compile_model() function. @@ -255,10 +267,45 @@ acceptance probability of 0.95 and a maximum tree depth of 12. The user may wish to adjust these as they are iterating to reduce model run-time or to achieve better convergence on a real-world use case. -We also pass our preprocessed data (pobs), our pre-compiled model (model), and our model modules (expectation_module, reference_module, and report_module) to epinowcast, where they are combined and used to fit the model. +We also pass our preprocessed datasets (`ww_data_preprocessed` and +`hosp_data_preprocessed`), our model specifications (`model_spec`), and our +pre-compiled model(model) to `wwinference` where they are combined and used to +fit the model. # Next, we will get the stan data objected from the data and the model # specifications ```{r} +wwinference <- function(ww_data, + count_data, + model_spec, # can provide defaults + model) { + # Check that data is compatible with specifications + check_date(ww_data, model_spec$forecast_date) + check_date(count_data, model_spec$forecast_date) + + # If checks pass, create stan data object + stan_data <- get_stan_data( + input_count_data = count_data, + input_ww_data = ww_data, + forecast_date = model_spec$forecast_date, + calibration_time = model_spec$calibration_time, + forecast_horizon = model_spec$forecast_horizon, + generation_interval = model_spec$generation_interval, + inf_to_count_delay = model_spec$inf_to_count_delay, + infection_feedback_pmf = model_spec$infection_feedback_pmf, + params = model_spec$params + ) + + + fit_obj <- model$sample( + data = stan_data, + seed = 123, + iter_sampling = 500, + iter_warmup = 750, + max_treedepth = 12, + chains = 4, + parallel_chains = 4 + ) +} ``` # The `wwinference` object From d04ace042d48d72f631bff1d045a19ce353a0961 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 1 Jul 2024 16:53:13 -0400 Subject: [PATCH 032/103] add initialization functions --- NAMESPACE | 3 ++ R/initialization.R | 86 +++++++++++++++++++++++++++++++++++++++ R/utils.R | 43 ++++++++++++++++++++ man/convert_to_logmean.Rd | 20 +++++++++ man/convert_to_logsd.Rd | 20 +++++++++ man/get_inits.Rd | 23 +++++++++++ vignettes/wwinference.Rmd | 5 +++ 7 files changed, 200 insertions(+) create mode 100644 R/initialization.R create mode 100644 man/convert_to_logmean.Rd create mode 100644 man/convert_to_logsd.Rd create mode 100644 man/get_inits.Rd diff --git a/NAMESPACE b/NAMESPACE index 5f869f8b..4e94d1c4 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -3,6 +3,8 @@ export(add_pmfs) export(add_time_indexing) export(compile_model) +export(convert_to_logmean) +export(convert_to_logsd) export(create_dir) export(drop_first_and_renormalize) export(flag_ww_outliers) @@ -23,6 +25,7 @@ export(make_reporting_delay_pmf) export(preprocess_hosp_data) export(preprocess_ww_data) export(simulate_double_censored_pmf) +export(to_simplex) export(validate_paramlist) importFrom(cmdstanr,cmdstan_model) importFrom(dplyr,arrange) diff --git a/R/initialization.R b/R/initialization.R new file mode 100644 index 00000000..74374b28 --- /dev/null +++ b/R/initialization.R @@ -0,0 +1,86 @@ +#' Given a set of prior parameters and stan data, initialize the model +#' near the center of the prior distribution +#' +#' @param stan_data a list of data elements that will be passed to stan +#' @param params a dataframe of parameter values that are passed to stan +#' to specify the priors in the model +#' +#' @return a list of initial values for each of the parameters in the +#' `wwinference` model +get_inits <- function(stan_data, params) { + # Assign parmeter names + par_names <- colnames(params) + for (i in seq_along(par_names)) { + assign(par_names[i], as.double(params[i])) + } + + pop <- stan_data$total_pop + + n_weeks <- as.numeric(stan_data$n_weeks) + tot_weeks <- as.numeric(stan_data$tot_weeks) + ot <- as.numeric(stan_data$ot) + ht <- as.numeric(stan_data$ht) + + # Estimate of number of initial infections + i0 <- mean(stan_data$counts[1:7], na.rm = TRUE) / p_hosp_mean + + n_subpops <- as.numeric(stan_data$n_subpops) + n_ww_lab_sites <- as.numeric(stan_data$n_ww_lab_sites) + + init_list <- list( + w = stats::rnorm(n_weeks - 1, 0, 0.01), + eta_sd = abs(stats::rnorm(1, 0, 0.01)), + eta_i0 = abs(stats::rnorm(n_subpops, 0, 0.01)), + sigma_i0 = abs(stats::rnorm(1, 0, 0.01)), + eta_growth = abs(stats::rnorm(n_subpops, 0, 0.01)), + sigma_growth = abs(stats::rnorm(1, 0, 0.01)), + autoreg_rt = abs(stats::rnorm( + 1, + autoreg_rt_a / (autoreg_rt_a + autoreg_rt_b), + 0.05 + )), + log_r_mu_intercept = stats::rnorm( + 1, + convert_to_logmean(1, 0.1), + convert_to_logsd(1, 0.1) + ), + error_site = matrix( + stats::rnorm(n_subpops * n_weeks, + mean = 0, + sd = 0.1 + ), + n_subpops, + n_weeks + ), + autoreg_rt_site = abs(stats::rnorm(1, 0.5, 0.05)), + autoreg_p_hosp = abs(stats::rnorm(1, 1 / 100, 0.001)), + sigma_rt = abs(stats::rnorm(1, 0, 0.01)), + i0_over_n = stats::plogis(stats::rnorm(1, stats::qlogis(i0 / pop), 0.05)), + initial_growth = stats::rnorm(1, 0, 0.001), + inv_sqrt_phi_h = 1 / sqrt(200) + stats::rnorm(1, 1 / 10000, 1 / 10000), + sigma_ww_site_mean = abs(stats::rnorm( + 1, sigma_ww_site_prior_mean_mean, + 0.1 * sigma_ww_site_prior_mean_sd + )), + sigma_ww_site_sd = abs(stats::rnorm( + 1, sigma_ww_site_prior_sd_mean, + 0.1 * sigma_ww_site_prior_sd_sd + )), + sigma_ww_site_raw = abs(stats::rnorm(n_ww_lab_sites, 0, 0.05)), + p_hosp_mean = stats::rnorm(1, stats::qlogis(p_hosp_mean), 0.01), + p_hosp_w = stats::rnorm(tot_weeks, 0, 0.01), + p_hosp_w_sd = abs(stats::rnorm(1, 0.01, 0.001)), + t_peak = stats::rnorm(1, t_peak_mean, 0.1 * t_peak_sd), + viral_peak = stats::rnorm(1, viral_peak_mean, 0.1 * viral_peak_sd), + dur_shed = stats::rnorm( + 1, duration_shedding_mean, + 0.1 * duration_shedding_sd + ), + log10_g = stats::rnorm(1, log10_g_prior_mean, 0.5), + ww_site_mod_raw = abs(stats::rnorm(n_ww_lab_sites, 0, 0.05)), + ww_site_mod_sd = abs(stats::rnorm(1, 0, 0.05)), + hosp_wday_effect = to_simplex(stats::rnorm(7, 1 / 7, 0.01)), + infection_feedback = abs(stats::rnorm(1, 500, 20)) + ) + return(init_list) +} diff --git a/R/utils.R b/R/utils.R index bfffb6e4..c1a329d4 100644 --- a/R/utils.R +++ b/R/utils.R @@ -93,3 +93,46 @@ create_dir <- function(output_file_path) { Sys.chmod(output_file_path, mode = "0777", use_umask = FALSE) } } + +#' @title Convert to logmean in lognorm distribution +#' @description +#' see arithmetic moments here +#' https://en.wikipedia.org/wiki/Log-normal_distribution +#' +#' @param mean mean of the normal distribution +#' @param sd sd of the normal distribution +#' +#' @return corresponding mean of the lognormal distribution +#' @export +convert_to_logmean <- function(mean, sd) { + logmean <- log(mean^2 / sqrt(sd^2 + mean^2)) + return(logmean) +} + + +#' @title Convert to logsd in lognormal distribution +#' @description@description see arithmetic moments here +#' https://en.wikipedia.org/wiki/Log-normal_distribution +#' +#' @param mean mean of the normal distribution +#' @param sd sd of the normal distribution +#' +#' @return corresponding stdev of the lognormal distribution +#' @export +convert_to_logsd <- function(mean, sd) { + logsd <- sqrt(log(1 + (sd^2 / mean^2))) + return(logsd) +} + +#' @title Normalize vector to a simplex +#' +#' @param vector numeric vector +#' +#' @return vector whos sum adds to 1 +#' @export +#' @examples +#' to_simplex(c(1, 1, 1)) +#' @noRd +to_simplex <- function(vector) { + return(vector / sum(vector)) +} diff --git a/man/convert_to_logmean.Rd b/man/convert_to_logmean.Rd new file mode 100644 index 00000000..f97df71c --- /dev/null +++ b/man/convert_to_logmean.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{convert_to_logmean} +\alias{convert_to_logmean} +\title{Convert to logmean in lognorm distribution} +\usage{ +convert_to_logmean(mean, sd) +} +\arguments{ +\item{mean}{mean of the normal distribution} + +\item{sd}{sd of the normal distribution} +} +\value{ +corresponding mean of the lognormal distribution +} +\description{ +see arithmetic moments here +https://en.wikipedia.org/wiki/Log-normal_distribution +} diff --git a/man/convert_to_logsd.Rd b/man/convert_to_logsd.Rd new file mode 100644 index 00000000..4f63e928 --- /dev/null +++ b/man/convert_to_logsd.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{convert_to_logsd} +\alias{convert_to_logsd} +\title{Convert to logsd in lognormal distribution} +\usage{ +convert_to_logsd(mean, sd) +} +\arguments{ +\item{mean}{mean of the normal distribution} + +\item{sd}{sd of the normal distribution} +} +\value{ +corresponding stdev of the lognormal distribution +} +\description{ +@description see arithmetic moments here +https://en.wikipedia.org/wiki/Log-normal_distribution +} diff --git a/man/get_inits.Rd b/man/get_inits.Rd new file mode 100644 index 00000000..59400968 --- /dev/null +++ b/man/get_inits.Rd @@ -0,0 +1,23 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/initialization.R +\name{get_inits} +\alias{get_inits} +\title{Given a set of prior parameters and stan data, initialize the model +near the center of the prior distribution} +\usage{ +get_inits(stan_data, params) +} +\arguments{ +\item{stan_data}{a list of data elements that will be passed to stan} + +\item{params}{a dataframe of parameter values that are passed to stan +to specify the priors in the model} +} +\value{ +a list of initial values for each of the parameters in the +\code{wwinference} model +} +\description{ +Given a set of prior parameters and stan data, initialize the model +near the center of the prior distribution +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index f18e8d91..02dddc8d 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -295,9 +295,14 @@ wwinference <- function(ww_data, params = model_spec$params ) + init_fun <- function() { + get_inits(stan_data, params) + } + fit_obj <- model$sample( data = stan_data, + init = init_fun, seed = 123, iter_sampling = 500, iter_warmup = 750, From e222e520a19831439fe397f70de0fabb4217f26f Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 2 Jul 2024 09:41:14 -0400 Subject: [PATCH 033/103] add a wrapper function around the model fitting --- R/wwinference.R | 103 ++++++++++++++++++++++++++++++++++++++ vignettes/wwinference.Rmd | 53 ++++++-------------- 2 files changed, 118 insertions(+), 38 deletions(-) create mode 100644 R/wwinference.R diff --git a/R/wwinference.R b/R/wwinference.R new file mode 100644 index 00000000..cffa88b5 --- /dev/null +++ b/R/wwinference.R @@ -0,0 +1,103 @@ +wwinference <- function(ww_data, + count_data, + model_spec = list( + forecast_date = "2023-12-06", + calibration_time = 90, + forecast_horizon = 28, + generation_interval = + wwinference::generation_interval, + inf_to_count_delay = wwinference::inf_to_hosp, + infection_feedback_pmf = + wwinference::generation_interval, + params = get_params( + system.file("extdata", "example_params.toml", + package = "wwinference" + ) + ), + # Default MCMC settings + iter_warmup = 750, + iter_sampling = 500, + n_chains = 4, + seed = 123, + adapt_delta = 0.95, + max_treedepth = 12, + # Default to excluding outliers and fitting to data + exclude_ww_outliers = TRUE, + compute_likelihood = 1 + ), + compiled_model = compile_model()) { + # Check that data is compatible with specifications + check_date(ww_data, model_spec$forecast_date) + check_date(count_data, model_spec$forecast_date) + + # If checks pass, create stan data object + stan_data <- get_stan_data( + input_count_data = count_data, + input_ww_data = ww_data, + forecast_date = model_spec$forecast_date, + calibration_time = model_spec$calibration_time, + forecast_horizon = model_spec$forecast_horizon, + generation_interval = model_spec$generation_interval, + inf_to_count_delay = model_spec$inf_to_count_delay, + infection_feedback_pmf = model_spec$infection_feedback_pmf, + params = model_spec$params, + exclude_ww_outliers = TRUE, + compute_likelihood = 1 + ) + + init_lists <- c() + for (i in 1:model_spec$n_chains) { + init_lists[[i]] <- get_inits(stan_data, params) + } + + + fit_model <- function(compiled_model, + standata, + model_spec, + init_lists) { + fit <- compiled_model$sample( + data = stan_data, + init = init_lists, + seed = model_spec$seed, + iter_sampling = model_spec$iter_sampling, + iter_warmup = model_spec$iter_warmup, + max_treedepth = model_spec$max_treedepth, + chains = model_spec$n_chains, + parallel_chains = model_spec$n_chains + ) + print(fit) + return(fit) + } + + # This returns the cmdstan object if the model runs, and result = NULL if + # the model errors + safe_fit_model <- purrr::safely(fit_model) + + fit <- safe_fit_model( + compiled_model, + standata, + model_spec, + init_lists + ) + + if (!is.null(fit$error)) { # If the model errors, return a list with the + # error and everything else NULL + out <- list( + error = fit$error[[1]] + ) + } else { + draws <- fit$result$draws() + diagnostics <- fit$result$sampler_diagnostics(format = "df") + summary_diagnostics <- fit$result$diagnostic_summary() + summary <- fit$result$summary() + + out <- list( + draws = draws, + diagnostics = diagnostics, + summary_diagnostics = summary_diagnostics, + summary = summary + ) + } + + return(out) +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 02dddc8d..2af5e607 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -246,7 +246,15 @@ model_spec <- list( generation_interval = generation_interval, inf_to_count_delay = inf_to_hosp, infection_feedback_pmf = infection_feedback_pmf, - params = params + params = params, + iter_warmup = 750, + iter_sampling = 500, + n_chains = 4, + seed = 123, + adapt_delta = 0.95, + max_treedepth = 12, + exclude_ww_outliers = TRUE, + compute_likelihood = 1 ) ``` @@ -274,43 +282,12 @@ fit the model. # Next, we will get the stan data objected from the data and the model # specifications ```{r} -wwinference <- function(ww_data, - count_data, - model_spec, # can provide defaults - model) { - # Check that data is compatible with specifications - check_date(ww_data, model_spec$forecast_date) - check_date(count_data, model_spec$forecast_date) - - # If checks pass, create stan data object - stan_data <- get_stan_data( - input_count_data = count_data, - input_ww_data = ww_data, - forecast_date = model_spec$forecast_date, - calibration_time = model_spec$calibration_time, - forecast_horizon = model_spec$forecast_horizon, - generation_interval = model_spec$generation_interval, - inf_to_count_delay = model_spec$inf_to_count_delay, - infection_feedback_pmf = model_spec$infection_feedback_pmf, - params = model_spec$params - ) - - init_fun <- function() { - get_inits(stan_data, params) - } - - - fit_obj <- model$sample( - data = stan_data, - init = init_fun, - seed = 123, - iter_sampling = 500, - iter_warmup = 750, - max_treedepth = 12, - chains = 4, - parallel_chains = 4 - ) -} +fit <- wwinference( + ww_data, + count_data, + model_spec, + model +) ``` # The `wwinference` object From 0736f02e4f538520422b6856642342631ee6dcae Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 2 Jul 2024 15:55:16 -0400 Subject: [PATCH 034/103] fix bug --- R/generate_simulated_data.R | 4 ++-- man/generate_simulated_data.Rd | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 69549099..b3e17230 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -66,7 +66,7 @@ #' # different labs #' sim_data <- generate_simulated_data( #' n_sites = 6, -#' map_site_to_lab = c(rep(1, 4), rep(2, 2)) +#' site = c(rep(1, 4), rep(2, 2)) #' ) #' hosp_data <- sim_data$hosp_data #' ww_data <- sim_data$ww_data @@ -381,7 +381,7 @@ generate_simulated_data <- function(r_in_weeks = # nolint # Add on site-lab-level observation error ----------------------------------- log_obs_g_over_n_lab_site <- matrix(nrow = n_lab_sites, ncol = (ot + ht)) for (i in 1:n_lab_sites) { - log_g_w_multiplier <- log_g_over_n_site[map_site_to_lab[i], ] + + log_g_w_multiplier <- log_g_over_n_site[site[i], ] + log_m_lab_sites[i] # Add site level multiplier in log scale log_obs_g_over_n_lab_site[i, ] <- log_g_w_multiplier + rnorm( diff --git a/man/generate_simulated_data.Rd b/man/generate_simulated_data.Rd index d5d890d0..c1bfde5f 100644 --- a/man/generate_simulated_data.Rd +++ b/man/generate_simulated_data.Rd @@ -119,7 +119,7 @@ and parameters to generate from. # different labs sim_data <- generate_simulated_data( n_sites = 6, - map_site_to_lab = c(rep(1, 4), rep(2, 2)) + site = c(rep(1, 4), rep(2, 2)) ) hosp_data <- sim_data$hosp_data ww_data <- sim_data$ww_data From 4f2a2d12722babefc348f1e1afdebe71b4bb169d Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 2 Jul 2024 15:57:13 -0400 Subject: [PATCH 035/103] update gitignore to exclude stan binary and .Rproj --- .gitignore | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.gitignore b/.gitignore index 2998db10..3cafdc54 100644 --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,10 @@ ##### # Exclude many data and output file types by default +# Exclude stan compiled model + +inst/stan/wwinference + # Data *.csv *.tsv @@ -234,6 +238,7 @@ cython_debug/ # RStudio files .Rproj.user/ +*.Rproj # produced vignettes vignettes/*.html From 3961d08be6e23d4fe69f628a5f1eea7cca885503 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 3 Jul 2024 07:34:49 -0400 Subject: [PATCH 036/103] make it s.t. ww data needs a column called exclude, add preprocessing step to create this column --- NAMESPACE | 1 + R/get_stan_data.R | 37 +++++++++--------------- R/preprocessing.R | 54 +++++++++++++++++++++++++++++++++++ R/wwinference.R | 8 ++++-- man/get_stan_data.Rd | 4 --- man/indicate_ww_exclusions.Rd | 41 ++++++++++++++++++++++++++ vignettes/wwinference.Rmd | 22 ++++++++++++-- 7 files changed, 134 insertions(+), 33 deletions(-) create mode 100644 man/indicate_ww_exclusions.Rd diff --git a/NAMESPACE b/NAMESPACE index 4e94d1c4..27ee29f4 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -19,6 +19,7 @@ export(get_subpop_data) export(get_ww_data_indices) export(get_ww_data_sizes) export(get_ww_values) +export(indicate_ww_exclusions) export(make_hospital_onset_delay_pmf) export(make_incubation_period_pmf) export(make_reporting_delay_pmf) diff --git a/R/get_stan_data.R b/R/get_stan_data.R index c3188899..8e3a0256 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -18,8 +18,6 @@ #' @param infection_feedback_pmf a vector with a normalized pmf dictating the #' delay of infection feedback #' @param params a dataframe of parameter names and numeric values -#' @param exclude_ww_outliers boolean indicating whether or not to remove -#' the flagged ww_outliers, default = `TRUE` #' @param compute_likelihood indicator variable telling stan whether or not to #' compute the likelihood, default = `1` #' @@ -34,7 +32,6 @@ get_stan_data <- function(input_count_data, inf_to_count_delay, infection_feedback_pmf, params, - exclude_ww_outliers = TRUE, compute_likelihood = 1) { # Assign parameter names par_names <- colnames(params) @@ -66,21 +63,16 @@ get_stan_data <- function(input_count_data, ) - if (isTRUE(exclude_ww_outliers)) { - # Test for presence of needed column names - stopifnot( - "Outlier column name isn't present in input dataset" = - "flag_as_ww_outlier" %in% colnames(input_ww_data) - ) + # Test for presence of needed column names + stopifnot( + "Exclude column isn't present in input ww dataset" = + "exclude" %in% colnames(input_ww_data) + ) - # Filter out wastewater outliers and arrange data for indexing - ww_data <- input_ww_data |> - dplyr::filter(flag_as_ww_outlier != 1) |> - dplyr::arrange(date, lab_site_index) - } else { - ww_data <- input_ww_data |> - dplyr::arrange(date, lab_site_index) - } + # Filter out wastewater outliers and arrange data for indexing + ww_data <- input_ww_data |> + dplyr::filter(exclude != 1) |> + dplyr::arrange(date, lab_site_index) # Returns a list with the numbers of elements needed for the stan model ww_data_sizes <- get_ww_data_sizes( @@ -150,12 +142,11 @@ get_stan_data <- function(input_count_data, count_col_name = "count" ) - if (isTRUE(exclude_ww_outliers)) { - message( - "Removed ", nrow(input_ww_data) - ww_data_sizes$owt, - " outliers from WW data" - ) - } + message( + "Removed ", nrow(input_ww_data) - ww_data_sizes$owt, + " outliers from WW data" + ) + # matrix to transform P(count|I) from weekly to daily ind_m <- get_ind_m( diff --git a/R/preprocessing.R b/R/preprocessing.R index 812e9fff..120a14c0 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -208,3 +208,57 @@ flag_ww_outliers <- function(ww_data, return(ww_w_outliers_flagged) } + +#' Indicate data that we want to exclude from model fitting +#' @description This function takes in a dataframe which contains an outlier +#' column name specified by the `outlier_col_name`. +#' +#' @param data A dataframe of preprocessed data to be used to fit the o +#' @param outlier_col_name A character string indicating the name of the column +#' containing the outlier indicator, must contain only 0 or 1 +#' @param remove_outliers A boolean indicating whether or not to exclude the +#' outliers from the fitting. If TRUE, copy outliers to exclusions, if FALSE, +#' set exclusions to none +#' +#' @return a dataframe with the same columns as in `data` plus an additional +#' `exclude` column containing 0s for the data to be passed to the model +#' and 1s where the data should be excluded +#' @export +#' +#' @examples +#' data <- tibble::tibble( +#' date = c("2023-10-01", "2023-10-02"), +#' genome_copies_per_mL = c(300, 3e6), +#' flag_as_ww_outlier = c(0, 1) +#' ) +#' data_w_exclusions <- indicate_exclusions(data, +#' outlier_col_name = "flag_as_ww_outlier", +#' remove_outliers = TRUE +#' ) +indicate_ww_exclusions <- function(data, + outlier_col_name = "flag_as_ww_outlier", + remove_outliers = TRUE) { + # Check for the presence of the outlier column name + if (!outlier_col_name %in% c(colnames(data))) { + cli::cli_abort( + "Specified name of the outlier column not present in the data" + ) + } + + + if (isTRUE(remove_outliers)) { + # Copy over the column of outlier indicators to the "exclude" column. + data_w_exclusions <- data |> + dplyr::mutate( + exclude = {{ outlier_col_name }} + ) + } else { + data_w_exclusions <- data |> + dplyr::mutate( + exclude = 0 + ) + } + + + return(data_w_exclusions) +} diff --git a/R/wwinference.R b/R/wwinference.R index cffa88b5..65e09d11 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -21,8 +21,7 @@ wwinference <- function(ww_data, seed = 123, adapt_delta = 0.95, max_treedepth = 12, - # Default to excluding outliers and fitting to data - exclude_ww_outliers = TRUE, + # Default fitting to data compute_likelihood = 1 ), compiled_model = compile_model()) { @@ -41,7 +40,6 @@ wwinference <- function(ww_data, inf_to_count_delay = model_spec$inf_to_count_delay, infection_feedback_pmf = model_spec$infection_feedback_pmf, params = model_spec$params, - exclude_ww_outliers = TRUE, compute_likelihood = 1 ) @@ -85,6 +83,7 @@ wwinference <- function(ww_data, out <- list( error = fit$error[[1]] ) + message(error) } else { draws <- fit$result$draws() diagnostics <- fit$result$sampler_diagnostics(format = "df") @@ -97,6 +96,9 @@ wwinference <- function(ww_data, summary_diagnostics = summary_diagnostics, summary = summary ) + + # Run diagnostic tests, and message if a flag doesn't pass. Still return + # the same data } return(out) diff --git a/man/get_stan_data.Rd b/man/get_stan_data.Rd index 4f8bcb77..428199a0 100644 --- a/man/get_stan_data.Rd +++ b/man/get_stan_data.Rd @@ -14,7 +14,6 @@ get_stan_data( inf_to_count_delay, infection_feedback_pmf, params, - exclude_ww_outliers = TRUE, compute_likelihood = 1 ) } @@ -46,9 +45,6 @@ delay of infection feedback} \item{params}{a dataframe of parameter names and numeric values} -\item{exclude_ww_outliers}{boolean indicating whether or not to remove -the flagged ww_outliers, default = \code{TRUE}} - \item{compute_likelihood}{indicator variable telling stan whether or not to compute the likelihood, default = \code{1}} } diff --git a/man/indicate_ww_exclusions.Rd b/man/indicate_ww_exclusions.Rd new file mode 100644 index 00000000..5d6f6eb7 --- /dev/null +++ b/man/indicate_ww_exclusions.Rd @@ -0,0 +1,41 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/preprocessing.R +\name{indicate_ww_exclusions} +\alias{indicate_ww_exclusions} +\title{Indicate data that we want to exclude from model fitting} +\usage{ +indicate_ww_exclusions( + data, + outlier_col_name = "flag_as_ww_outlier", + remove_outliers = TRUE +) +} +\arguments{ +\item{data}{A dataframe of preprocessed data to be used to fit the o} + +\item{outlier_col_name}{A character string indicating the name of the column +containing the outlier indicator, must contain only 0 or 1} + +\item{remove_outliers}{A boolean indicating whether or not to exclude the +outliers from the fitting. If TRUE, copy outliers to exclusions, if FALSE, +set exclusions to none} +} +\value{ +a dataframe with the same columns as in \code{data} plus an additional +\code{exclude} column containing 0s for the data to be passed to the model +and 1s where the data should be excluded +} +\description{ +This function takes in a dataframe which contains an outlier +column name specified by the \code{outlier_col_name}. +} +\examples{ +data = tibble::tibble(date = c("2023-10-01", "2023-10-02"), + genome_copies_per_mL = c( 300, 3e6), + flag_as_ww_outlier = c(0,1) + ) +data_w_exclusions <- indicate_exclusions(data, + outlier_col_name = "flag_as_ww_outlier", + remove_outliers = TRUE + ) +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 2af5e607..05e40b2a 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -96,7 +96,7 @@ pre-processing to add some additional variables that the model will need to be able apply features such as outlier exclusion and censoring of values below the limit of detection. ## Parameters -Get the parameters from the package. Note that some of these are COVID +Get the default parameters from the package. Note that some of these are COVID specific, others are more general to the model. This is indicated in the .toml file. ```{r} @@ -134,6 +134,7 @@ ww_data_preprocessed <- wwinference::preprocess_ww_data( ) ``` + ## Hospital admissions data pre-processing The `preprocess_hosp_data` function standardizes the column names of the resulting datafame. The user must specify the name of the column containing @@ -190,6 +191,21 @@ ggplot(hosp_data_preprocessed) + ggtitle("State level hospital admissions") + theme_bw() ``` +## Data exclusion +As an additional pre-processing step, the user can decide to exclude certain +data points from being included in the model fit procedure. For example, +we recommend excluding the flagged wastewater concentration outliers. To do so +we will add a column to each dataset to indicate whether or not it should be +excluded (1 meaning it will be removed). +```{r} +ww_data_to_fit <- indicate_ww_exclusions( + ww_data_preprocessed, + outlier_col_name = "flag_as_ww_outlier", + remove_outliers = TRUE +) +``` + + # Model specification: We will need to set some metadata to facilitate model specification. In addition to assigning the forecast date (done above), this also includes: @@ -283,8 +299,8 @@ fit the model. # specifications ```{r} fit <- wwinference( - ww_data, - count_data, + ww_data_to_fit, + hosp_data_preprocessed, model_spec, model ) From d9bd1d7229ebec32f338f1e9f63704a3f00161f6 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 3 Jul 2024 08:09:17 -0400 Subject: [PATCH 037/103] add exclude column in preprocessing, make outlier flag optional step --- R/preprocessing.R | 29 +++++++++++++++-------------- man/flag_ww_outliers.Rd | 10 +++++++--- man/indicate_ww_exclusions.Rd | 15 ++++++++------- vignettes/wwinference.Rmd | 10 +++++----- 4 files changed, 35 insertions(+), 29 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index 120a14c0..0f122ece 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -113,9 +113,13 @@ preprocess_hosp_data <- function(hosp_data, #' @param threshold_n_dps min number of data points above the LOD per lab-site #' #' @return ww_w_outliers_flaged dataframe containing all of the columns in -#' ww_data input dataframe plus an additional column `flag_as_ww_outlier` -#' which contains a 0 if the datapoint is not an outlier and a 1 if it is -#' an outlier. +#' ww_data input dataframe plus two additional columns: +#' `flag_as_ww_outlier` and `exclude` +#' `flag as_ww_outlier` contains a 0 if the datapoint is not an outlier and a 1 +#' if it is an outlier. `exclude` tells the model whether or not to exclude that +#' data point, which here is by default set to 0 for all data points (even +#' those flagged as outliers). Excluding the outliers is a second optional +#' step. #' @export #' #' @examples @@ -198,12 +202,16 @@ flag_ww_outliers <- function(ww_data, flagged_for_removal_conc == 1 ~ 1, TRUE ~ 0 )) |> - dplyr::ungroup() + dplyr::ungroup() |> + dplyr::mutate( + exclude = 0 # by default, we don't exclude anything + ) ww_w_outliers_flagged <- ww_z_scored |> dplyr::select( colnames(ww_data), - flag_as_ww_outlier + flag_as_ww_outlier, + exclude ) return(ww_w_outliers_flagged) @@ -247,18 +255,11 @@ indicate_ww_exclusions <- function(data, if (isTRUE(remove_outliers)) { - # Copy over the column of outlier indicators to the "exclude" column. + # Port over the outliers flagged to the exclude column data_w_exclusions <- data |> dplyr::mutate( - exclude = {{ outlier_col_name }} - ) - } else { - data_w_exclusions <- data |> - dplyr::mutate( - exclude = 0 + exclude = ifelse({{ outlier_col_name }} == 1, 1, exclude) ) } - - return(data_w_exclusions) } diff --git a/man/flag_ww_outliers.Rd b/man/flag_ww_outliers.Rd index 9deb4d2c..de6ffe9b 100644 --- a/man/flag_ww_outliers.Rd +++ b/man/flag_ww_outliers.Rd @@ -28,9 +28,13 @@ log concentration} } \value{ ww_w_outliers_flaged dataframe containing all of the columns in -ww_data input dataframe plus an additional column \code{flag_as_ww_outlier} -which contains a 0 if the datapoint is not an outlier and a 1 if it is -an outlier. +ww_data input dataframe plus two additional columns: +\code{flag_as_ww_outlier} and \code{exclude} +\verb{flag as_ww_outlier} contains a 0 if the datapoint is not an outlier and a 1 +if it is an outlier. \code{exclude} tells the model whether or not to exclude that +data point, which here is by default set to 0 for all data points (even +those flagged as outliers). Excluding the outliers is a second optional +step. } \description{ Flag WW outliers diff --git a/man/indicate_ww_exclusions.Rd b/man/indicate_ww_exclusions.Rd index 5d6f6eb7..82dc3e18 100644 --- a/man/indicate_ww_exclusions.Rd +++ b/man/indicate_ww_exclusions.Rd @@ -30,12 +30,13 @@ This function takes in a dataframe which contains an outlier column name specified by the \code{outlier_col_name}. } \examples{ -data = tibble::tibble(date = c("2023-10-01", "2023-10-02"), - genome_copies_per_mL = c( 300, 3e6), - flag_as_ww_outlier = c(0,1) - ) +data <- tibble::tibble( + date = c("2023-10-01", "2023-10-02"), + genome_copies_per_mL = c(300, 3e6), + flag_as_ww_outlier = c(0, 1) +) data_w_exclusions <- indicate_exclusions(data, - outlier_col_name = "flag_as_ww_outlier", - remove_outliers = TRUE - ) + outlier_col_name = "flag_as_ww_outlier", + remove_outliers = TRUE +) } diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 05e40b2a..93528acf 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -192,13 +192,13 @@ ggplot(hosp_data_preprocessed) + theme_bw() ``` ## Data exclusion -As an additional pre-processing step, the user can decide to exclude certain -data points from being included in the model fit procedure. For example, +As an optional additional pre-processing step, the user can decide to exclude +certain data points from being included in the model fit procedure. For example, we recommend excluding the flagged wastewater concentration outliers. To do so -we will add a column to each dataset to indicate whether or not it should be -excluded (1 meaning it will be removed). +we will use the `indicate_ww_exclusions()` function, which will add the +flagged outliers to the exclude column where indicated. ```{r} -ww_data_to_fit <- indicate_ww_exclusions( +ww_data_to_fit <- wwinference::indicate_ww_exclusions( ww_data_preprocessed, outlier_col_name = "flag_as_ww_outlier", remove_outliers = TRUE From 2a943324c640c0528272882bc35e5509ef3e2b56 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 3 Jul 2024 09:52:54 -0400 Subject: [PATCH 038/103] add functions that set up model specification and mcmc options --- NAMESPACE | 2 + R/wwinference.R | 136 +++++++++++++++++++++++++++++++------- man/get_mcmc_options.Rd | 52 +++++++++++++++ man/model_spec.Rd | 56 ++++++++++++++++ vignettes/wwinference.Rmd | 55 +++------------ 5 files changed, 234 insertions(+), 67 deletions(-) create mode 100644 man/get_mcmc_options.Rd create mode 100644 man/model_spec.Rd diff --git a/NAMESPACE b/NAMESPACE index 27ee29f4..a1513f88 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -13,6 +13,7 @@ export(get_count_data_sizes) export(get_count_indices) export(get_count_values) export(get_ind_m) +export(get_mcmc_options) export(get_params) export(get_stan_data) export(get_subpop_data) @@ -23,6 +24,7 @@ export(indicate_ww_exclusions) export(make_hospital_onset_delay_pmf) export(make_incubation_period_pmf) export(make_reporting_delay_pmf) +export(model_spec) export(preprocess_hosp_data) export(preprocess_ww_data) export(simulate_double_censored_pmf) diff --git a/R/wwinference.R b/R/wwinference.R index 65e09d11..68b39fea 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -1,29 +1,10 @@ wwinference <- function(ww_data, count_data, - model_spec = list( - forecast_date = "2023-12-06", - calibration_time = 90, - forecast_horizon = 28, - generation_interval = - wwinference::generation_interval, - inf_to_count_delay = wwinference::inf_to_hosp, - infection_feedback_pmf = - wwinference::generation_interval, - params = get_params( - system.file("extdata", "example_params.toml", - package = "wwinference" - ) - ), - # Default MCMC settings - iter_warmup = 750, - iter_sampling = 500, - n_chains = 4, - seed = 123, - adapt_delta = 0.95, - max_treedepth = 12, - # Default fitting to data - compute_likelihood = 1 + model_spec = get_model_spec( + forecast_date = + "2023-12-06" ), + mcmc_options = get_mcmc_options(), compiled_model = compile_model()) { # Check that data is compatible with specifications check_date(ww_data, model_spec$forecast_date) @@ -103,3 +84,112 @@ wwinference <- function(ww_data, return(out) } + +#' Get MCMC options +#' +#' @description +#' This function returns a list of MCMC settings to pass to the +#' `cmdstanr::sample()` function to fit the model. The default settings are +#' specified for production-level runs, consider adjusting to optimize +#' for speed while iterating. +#' +#' +#' @param iter_warmup integer indicating the number of warm-up iterations, +#' default is `750` +#' @param iter_sampling integer indicating the number of sampling iterations, +#' default is `500` +#' @param n_chains integer indicating the number of MCMC chains to run, default +#' is `4` +#' @param seed set of integers indicating the random seed of the stan sampler, +#' default is `123` +#' @param adapt_delta float between 0 and 1 indicating the average acceptance +#' probability, default is `0.95` +#' @param max_treedepth integer indicating the maximum tree depth of the +#' sampler, default is 12 +#' @param compute_likelihood integer indicating whether or not to compute the +#' likelihood using the data, default is `1` which will fit the model to the +#' data. If set to 0, the model will sample from the prior only +#' +#' @return a list of mcmc settings with the values given by the function +#' arguments +#' @export +#' +#' @examples +#' mcmc_settings <- get_mcmc_options() +get_mcmc_options <- function( + iter_warmup = 750, + iter_sampling = 500, + n_chains = 4, + seed = 123, + adapt_delta = 0.95, + max_treedepth = 12, + compute_likelihood = 1) { + mcmc_settings <- list( + iter_warmup = iter_warmup, + iter_sampling = iter_sampling, + n_chains = n_chains, + seed = seed, + adapt_delta = adapt_delta, + max_treedepth = max_treedepth, + compute_likelihood = compute_likelihood + ) + + return(mcmc_settings) +} + +#' Get model specificaitons +#' @description +#' This function returns a nested list containing the model specifications +#' in the function arguments. All defaults are set for the case of fitting a +#' post-omicron COVID-19 model with joint inference of hospital admissions +#' and data on wastewater viral concentrations +#' +#' +#' @param forecast_date a character string in ISO8 format (YYYY-MM-DD) +#' indicating the date that the forecast is to be made. Default is +#' @param calibration_time integer indicating the number of days to calibrate +#' the model for, default is `90` +#' @param forecast_horizon integer indicating the number of days, including the +#' forecast date, to produce forecasts for, default is `28` +#' @param generation_interval vector of a simplex (must sum to 1) describing +#' the daily probability of onwards transmission, default is package data +#' provided for the COVID-19 generation interval post-Omicron +#' @param inf_to_count_delay vector of a simplex (must sum to 1) describing the +#' daily probability of transitioning from infection to whatever the count +#' variable is, e.g. hospital admissions or cases. Default corresonds to the +#' delay distribution from COVID-19 infection to hospital admission +#' @param infection_feedback_pmf vector of a simplex (must sum to 1) describing +#' the delay from incident infection to feedback in the transmission dynamics. +#' The default is the COVID-19 generation interval +#' @param params a 1 row dataframe of parameters corresponding to model +#' priors and disease/data specific parameters. Default is for COVID-19 hospital +#' admissions and viral concentrations in wastewater +#' +#' @return a list of model specs to be passed to the `get_stan_data()` function +#' @export +#' +#' @examples +#' model_spec_list <- model_spec(forecast_date = "2023-12-06") +model_spec <- function( + forecast_date, + calibration_time = 90, + forecast_horizon = 28, + generation_interval = wwinference::generation_interval, + inf_to_count_delay = wwinference::inf_to_hosp, + infection_feedback_pmf = wwinference::generation_interval, + params = get_params( + system.file("extdata", "example_params.toml", + package = "wwinference" + ) + )) { + model_specs <- list( + forecast_date = forecast_date, + calibration_time = calibration_time, + forecast_horizon = forecast_horizon, + generation_interval = generation_interval, + inf_to_count_delay = inf_to_hosp, + infection_feedback_pmf = infection_feedback_pmf, + params = params + ) + return(model_specs) +} diff --git a/man/get_mcmc_options.Rd b/man/get_mcmc_options.Rd new file mode 100644 index 00000000..13d6cfe5 --- /dev/null +++ b/man/get_mcmc_options.Rd @@ -0,0 +1,52 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wwinference.R +\name{get_mcmc_options} +\alias{get_mcmc_options} +\title{Get MCMC options} +\usage{ +get_mcmc_options( + iter_warmup = 750, + iter_sampling = 500, + n_chains = 4, + seed = 123, + adapt_delta = 0.95, + max_treedepth = 12, + compute_likelihood = 1 +) +} +\arguments{ +\item{iter_warmup}{integer indicating the number of warm-up iterations, +default is \code{750}} + +\item{iter_sampling}{integer indicating the number of sampling iterations, +default is \code{500}} + +\item{n_chains}{integer indicating the number of MCMC chains to run, default +is \code{4}} + +\item{seed}{set of integers indicating the random seed of the stan sampler, +default is \code{123}} + +\item{adapt_delta}{float between 0 and 1 indicating the average acceptance +probability, default is \code{0.95}} + +\item{max_treedepth}{integer indicating the maximum tree depth of the +sampler, default is 12} + +\item{compute_likelihood}{integer indicating whether or not to compute the +likelihood using the data, default is \code{1} which will fit the model to the +data. If set to 0, the model will sample from the prior only} +} +\value{ +a list of mcmc settings with the values given by the function +arguments +} +\description{ +This function returns a list of MCMC settings to pass to the +\code{cmdstanr::sample()} function to fit the model. The default settings are +specified for production-level runs, consider adjusting to optimize +for speed while iterating. +} +\examples{ +mcmc_settings <- get_mcmc_options() +} diff --git a/man/model_spec.Rd b/man/model_spec.Rd new file mode 100644 index 00000000..afe67b4d --- /dev/null +++ b/man/model_spec.Rd @@ -0,0 +1,56 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wwinference.R +\name{model_spec} +\alias{model_spec} +\title{Get model specificaitons} +\usage{ +model_spec( + forecast_date, + calibration_time = 90, + forecast_horizon = 28, + generation_interval = wwinference::generation_interval, + inf_to_count_delay = wwinference::inf_to_hosp, + infection_feedback_pmf = wwinference::generation_interval, + params = get_params(system.file("extdata", "example_params.toml", package = + "wwinference")) +) +} +\arguments{ +\item{forecast_date}{a character string in ISO8 format (YYYY-MM-DD) +indicating the date that the forecast is to be made. Default is} + +\item{calibration_time}{integer indicating the number of days to calibrate +the model for, default is \code{90}} + +\item{forecast_horizon}{integer indicating the number of days, including the +forecast date, to produce forecasts for, default is \code{28}} + +\item{generation_interval}{vector of a simplex (must sum to 1) describing +the daily probability of onwards transmission, default is package data +provided for the COVID-19 generation interval post-Omicron} + +\item{inf_to_count_delay}{vector of a simplex (must sum to 1) describing the +daily probability of transitioning from infection to whatever the count +variable is, e.g. hospital admissions or cases. Default corresonds to the +delay distribution from COVID-19 infection to hospital admission} + +\item{infection_feedback_pmf}{vector of a simplex (must sum to 1) describing +the delay from incident infection to feedback in the transmission dynamics. +The default is the COVID-19 generation interval} + +\item{params}{a 1 row dataframe of parameters corresponding to model +priors and disease/data specific parameters. Default is for COVID-19 hospital +admissions and viral concentrations in wastewater} +} +\value{ +a list of model specs to be passed to the \code{get_stan_data()} function +} +\description{ +This function returns a nested list containing the model specifications +in the function arguments. All defaults are set for the case of fitting a +post-omicron COVID-19 model with joint inference of hospital admissions +and data on wastewater viral concentrations +} +\examples{ +model_spec_list <- model_spec(forecast_date = "2023-12-06") +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 93528acf..d72dd9fa 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -252,27 +252,9 @@ inf_to_hosp <- wwinference::inf_to_hosp # Assign infection feedback equal to the generation interval infection_feedback_pmf <- generation_interval ``` +We will pass these to the `model_spec()` function of the `wwinference()` model, +along with the other specified parameters above. -## Combine into a single list called `model_spec` -```{r} -model_spec <- list( - forecast_date = forecast_date, - calibration_time = calibration_time, - forecast_horizon = forecast_horizon, - generation_interval = generation_interval, - inf_to_count_delay = inf_to_hosp, - infection_feedback_pmf = infection_feedback_pmf, - params = params, - iter_warmup = 750, - iter_sampling = 500, - n_chains = 4, - seed = 123, - adapt_delta = 0.95, - max_treedepth = 12, - exclude_ww_outliers = TRUE, - compute_likelihood = 1 -) -``` # Precompiling the model As `wwinference` uses `cmdstan` to fit its models, it is necessary to first @@ -301,7 +283,15 @@ fit the model. fit <- wwinference( ww_data_to_fit, hosp_data_preprocessed, - model_spec, + model_spec = get_model_spec( + forecast_date = forecast_date, + calibration_time = calibration_time, + forecast_horizon = forecast_horizon, + generation_interval = generation_interval, + inf_to_count_delay = inf_to_hosp, + infection_feedback_pmf = infection_feedback_pmf + ), + mcmc_options = get_mcmc_options(), model ) ``` @@ -313,26 +303,3 @@ diagnostic information, the data used for for fitting, and the underlying nowcasted, and forecasted expected observed hospital admissions and wastewater concentrations, as well as the latent variables of interest including the site- level $R(t)$ estimates and the state-level $R(t)$ estimate. - -```{r} -stan_data <- get_stan_data( - input_count_data = hosp_data_preprocessed, - input_ww_data = ww_data_preprocessed, - forecast_date = forecast_date, - calibration_time = calibration_time, - forecast_horizon = forecast_horizon, - generation_interval = generation_interval, - inf_to_count_delay = inf_to_hosp, - infection_feedback_pmf = infection_feedback_pmf, - params = params -) -ww_fit_obj <- model$sample( - data = stan_data, - seed = 123, - iter_sampling = 500, - iter_warmup = 750, - max_treedepth = 12, - chains = 4, - parallel_chains = 4 -) -``` From 2894ae1b5eb804bb8e39a28f84382e1913e338ea Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 3 Jul 2024 10:08:36 -0400 Subject: [PATCH 039/103] add wwinference function --- NAMESPACE | 1 + R/wwinference.R | 10 ++++++++++ man/wwinference-package.Rd | 1 - man/wwinference.Rd | 28 ++++++++++++++++++++++++++++ vignettes/wwinference.Rmd | 2 +- 5 files changed, 40 insertions(+), 2 deletions(-) create mode 100644 man/wwinference.Rd diff --git a/NAMESPACE b/NAMESPACE index a1513f88..6386f3fb 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -30,6 +30,7 @@ export(preprocess_ww_data) export(simulate_double_censored_pmf) export(to_simplex) export(validate_paramlist) +export(wwinference) importFrom(cmdstanr,cmdstan_model) importFrom(dplyr,arrange) importFrom(dplyr,as_tibble) diff --git a/R/wwinference.R b/R/wwinference.R index 68b39fea..274f7b9b 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -1,3 +1,13 @@ +#' Fit the joint inference of wastewater and count data +#' +#' @param ww_data +#' @param count_data +#' @param model_spec +#' @param mcmc_options +#' @param compiled_model +#' +#' @return +#' @export wwinference <- function(ww_data, count_data, model_spec = get_model_spec( diff --git a/man/wwinference-package.Rd b/man/wwinference-package.Rd index e2495c64..7520008f 100644 --- a/man/wwinference-package.Rd +++ b/man/wwinference-package.Rd @@ -2,7 +2,6 @@ % Please edit documentation in R/wwinference-package.R \docType{package} \name{wwinference-package} -\alias{wwinference} \alias{wwinference-package} \title{wwinference: Jointly infers infection dynamics from wastewater data and epidemiological indicators} \description{ diff --git a/man/wwinference.Rd b/man/wwinference.Rd new file mode 100644 index 00000000..52e2718d --- /dev/null +++ b/man/wwinference.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wwinference.R +\name{wwinference} +\alias{wwinference} +\title{Fit the joint inference of wastewater and count data} +\usage{ +wwinference( + ww_data, + count_data, + model_spec = get_model_spec(forecast_date = "2023-12-06"), + mcmc_options = get_mcmc_options(), + compiled_model = compile_model() +) +} +\arguments{ +\item{ww_data}{} + +\item{count_data}{} + +\item{model_spec}{} + +\item{mcmc_options}{} + +\item{compiled_model}{} +} +\description{ +Fit the joint inference of wastewater and count data +} diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index d72dd9fa..3661b6a6 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -260,7 +260,7 @@ along with the other specified parameters above. As `wwinference` uses `cmdstan` to fit its models, it is necessary to first compile the model. This can be done using the compile_model() function. ```{r} -model <- compile_model() +model <- wwinference::compile_model() ``` # Fitting the model We're now ready to fit the model using the “No-U-Turn Sampler Markov chain From e6ef7140e52a5cc213b82d15adb08b30dabd8065 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 3 Jul 2024 17:13:32 -0400 Subject: [PATCH 040/103] add post processing functions, realizing that I have an indexing error somewhere --- NAMESPACE | 3 +- R/get_stan_data.R | 10 +- R/initialization.R | 4 +- R/postprocess.R | 147 +++++++++++++++++++++++ R/wwinference.R | 135 +++++++++++++++++---- inst/extdata/example_params.toml | 6 +- inst/stan/wwinference.stan | 28 ++--- man/{model_spec.Rd => get_model_spec.Rd} | 6 +- man/postprocess.Rd | 50 ++++++++ man/wwinference.Rd | 71 +++++++++-- vignettes/wwinference.Rmd | 124 +++++++++++++++++-- 11 files changed, 507 insertions(+), 77 deletions(-) create mode 100644 R/postprocess.R rename man/{model_spec.Rd => get_model_spec.Rd} (97%) create mode 100644 man/postprocess.Rd diff --git a/NAMESPACE b/NAMESPACE index 6386f3fb..c971281a 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -14,6 +14,7 @@ export(get_count_indices) export(get_count_values) export(get_ind_m) export(get_mcmc_options) +export(get_model_spec) export(get_params) export(get_stan_data) export(get_subpop_data) @@ -24,7 +25,7 @@ export(indicate_ww_exclusions) export(make_hospital_onset_delay_pmf) export(make_incubation_period_pmf) export(make_reporting_delay_pmf) -export(model_spec) +export(postprocess) export(preprocess_hosp_data) export(preprocess_ww_data) export(simulate_double_censored_pmf) diff --git a/R/get_stan_data.R b/R/get_stan_data.R index 8e3a0256..22a49eaa 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -172,8 +172,8 @@ get_stan_data <- function(input_count_data, data_renewal <- list( gt_max = gt_max, - inf_to_count_delay_max = inf_to_count_delay_max, - inf_to_count_delay = inf_to_count_delay, + hosp_delay_max = inf_to_count_delay_max, + inf_to_hosp = inf_to_count_delay, mwpd = ml_of_ww_per_person_day, ot = count_data_sizes$ot, n_subpops = subpop_data$n_subpops, @@ -191,16 +191,16 @@ get_stan_data <- function(input_count_data, p_hosp_m = p_hosp_m, generation_interval = generation_interval, ts = 1:gt_max, - total_pop = pop, + state_pop = pop, subpop_size = subpop_data$subpop_size, norm_pop = subpop_data$norm_pop, ww_sampled_times = ww_indices$ww_sampled_times, - count_times = count_indices$count_times, + hosp_times = count_indices$count_times, ww_sampled_lab_sites = ww_indices$ww_sampled_lab_sites, ww_log_lod = ww_values$ww_lod, ww_censored = ww_indices$ww_censored, ww_uncensored = ww_indices$ww_uncensored, - counts = count_values$counts, + hosp = count_values$counts, day_of_week = count_values$day_of_week, log_conc = ww_values$log_conc, compute_likelihood = compute_likelihood, diff --git a/R/initialization.R b/R/initialization.R index 74374b28..58e35099 100644 --- a/R/initialization.R +++ b/R/initialization.R @@ -14,7 +14,7 @@ get_inits <- function(stan_data, params) { assign(par_names[i], as.double(params[i])) } - pop <- stan_data$total_pop + pop <- stan_data$state_pop n_weeks <- as.numeric(stan_data$n_weeks) tot_weeks <- as.numeric(stan_data$tot_weeks) @@ -22,7 +22,7 @@ get_inits <- function(stan_data, params) { ht <- as.numeric(stan_data$ht) # Estimate of number of initial infections - i0 <- mean(stan_data$counts[1:7], na.rm = TRUE) / p_hosp_mean + i0 <- mean(stan_data$hosp[1:7], na.rm = TRUE) / p_hosp_mean n_subpops <- as.numeric(stan_data$n_subpops) n_ww_lab_sites <- as.numeric(stan_data$n_ww_lab_sites) diff --git a/R/postprocess.R b/R/postprocess.R new file mode 100644 index 00000000..d7fafca0 --- /dev/null +++ b/R/postprocess.R @@ -0,0 +1,147 @@ +#' @title Postprocess to generate a draws dataframe +#' +#' @description +#' This function takes in the two input data sources, the CmdStan fit object, +#' and the 3 relevant mappings from stan indices to the real data, in order +#' to generate a dataframe containing the posterior draws of the counts (e.g. +#' hospital admissions), the wastewater concentration values, the "global" R(t), +#' and the "local" R(t) estimates + the critical metadata in the data +#' +#' +#' @param ww_data A dataframe of the preprocessed wastewater concentration data +#' used to fit the model +#' @param count_data A dataframe of the preprocessed daily count data (e.g. +#' hospital admissions) from the "global" population +#' @param fit_obj a CmdStan object that is the output of fitting the model to +#' the `ww_data` and `count_data` +#' @param date_time_spine A tibble mapping the time index in stan (observed + +#' nowcast + forecast) to real dates +#' @param lab_site_spine A tibble mapping the site-lab index in stan to the +#' corresponding site, lab, and site population +#' @param subpop_spine A tibble mapping the site index in stan to the +#' corresponding subpopulation (either a site or the auxiliary site we add to +#' represent the rest of the population) +#' +#' @return A tibble containing the full set of posterior draws of the +#' estimated, nowcasted, and forecasted: counts, site-level wastewater +#' concentrations, "global"(e.g. state) R(t) estimate, and the "local" (site + +#' the one auxiliary subpopulation) R(t) estimates. In the instance where there +#' are observations, the data will be joined to each draw of the predicted +#' observation to facilitate plotting. +#' @export +postprocess <- function(ww_data, + count_data, + fit_obj, + date_time_spine, + lab_site_spine, + subpop_spine) { + draws <- fit_obj$result$draws() + + count_draws <- draws |> + tidybayes::spread_draws(pred_hosp[t]) |> + dplyr::rename(pred_value = pred_hosp) |> + dplyr::mutate( + draw = `.draw`, + name = "pred_counts" + ) |> + dplyr::select(name, t, pred_value, draw) |> + dplyr::left_join(date_time_spine, by = "t") |> + dplyr::left_join(count_data, by = "date") |> + dplyr::ungroup() |> + dplyr::rename(observed_value = count) |> + dplyr::mutate( + observation_type = "count", + type_of_quantity = "global", + lab_site_index = NA, + subpop = NA, + lab = NA, + site_pop = NA, + below_lod = NA, + lod = NA, + flag_as_ww_outlier = NA, + exlcude = NA + ) |> + dplyr::select(-t) + + ww_draws <- draws |> + tidybayes::spread_draws(pred_ww[lab_site_index, t]) |> + dplyr::rename(pred_value = pred_ww) |> + dplyr::mutate( + draw = `.draw`, + name = "pred_ww", + pred_value = exp(pred_value) + ) |> + dplyr::select(name, lab_site_index, t, pred_value, draw) |> + dplyr::left_join(date_time_spine, by = "t") |> + dplyr::left_join(lab_site_spine, by = "lab_site_index") |> + dplyr::left_join(ww_data, by = c( + "lab_site_index", "date", + "lab", "site", "site_pop" + )) |> + dplyr::ungroup() |> + dplyr::mutate(observed_value = genome_copies_per_ml) |> + dplyr::mutate( + observation_type = "genome copies per mL", + type_of_quantity = "local", + total_pop = NA, + subpop = glue::glue("Site: {site}") + ) |> + dplyr::select(colnames(count_draws), -t) + + global_rt_draws <- draws |> + tidybayes::spread_draws(rt[t]) |> + dplyr::rename(pred_value = rt) |> + dplyr::mutate( + draw = `.draw`, + name = "global R(t)" + ) |> + dplyr::select(name, t, pred_value, draw) |> + dplyr::left_join(date_time_spine, by = "t") |> + dplyr::left_join(count_data, by = "date") |> + dplyr::ungroup() |> + dplyr::rename(observed_value = count) |> + dplyr::mutate( + observed_value = NA, + observation_type = "latent variable", + type_of_quantity = "global", + lab_site_index = NA, + subpop = NA, + lab = NA, + site_pop = NA + ) |> + dplyr::select(-t) + + site_level_rt_draws <- draws |> + tidybayes::spread_draws(r_site_t[site_index, t]) |> + dplyr::rename(pred_value = r_site_t) |> + dplyr::mutate( + draw = `.draw`, + name = "subpop R(t)", + pred_value = pred_value + ) |> + dplyr::select(name, site_index, t, pred_value, draw) |> + dplyr::left_join(date_time_spine, by = "t") |> + dplyr::left_join(subpop_spine, by = "site_index") |> + dplyr::ungroup() |> + dplyr::mutate( + observed_value = NA, + lab_site_index = NA, + lab = NA, + observation_type = "latent variable", + type_of_quantity = "local", + total_pop = NA, + subpop = ifelse(site != "remainder of pop", + glue::glue("Site: {site}"), "remainder of pop" + ) + ) |> + dplyr::select(colnames(count_draws), -t) + + draws_df <- dplyr::bind_rows( + count_draws, + ww_draws, + global_rt_draws, + site_level_rt_draws + ) + + return(draws_df) +} diff --git a/R/wwinference.R b/R/wwinference.R index 274f7b9b..3f86bd75 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -1,21 +1,71 @@ -#' Fit the joint inference of wastewater and count data +#' @title Joint inference of count data (e.g. cases/admissions) and wastewater +#' data #' -#' @param ww_data -#' @param count_data -#' @param model_spec -#' @param mcmc_options -#' @param compiled_model +#' @description +#' Provides a user friendly interface around package functionality +#' to produce estimates, nowcasts, and forecasts pertaining to user-specified +#' delay distributions, set parameters, and priors that can be modified to +#' handledifferent types of "global" count data and "local" wastewater +#' concentrationdata using a Bayesian hierarchical framework applied to the two +#' distinctdata sources. By default the model assumes a fixed generation +#' interval and delay from infection to the event that is counted. See the +#' getting started vignette for an example model specifications fitting +#' COVID-19 hospital admissions from a hypothetical state and wasteawter +#' concentration data from multiple sites within that state. +#' +#' @param ww_data A dataframe containing the pre-processed, site-level +#' wastewater concentration data for a model run. The dataframe must contain +#' the following columns: `date`, `site`, `lab`, `genome_copies_per_ml`, +#' `lab_site_index`, `lod`, `below_lod`, `site_pop` `exclude` +#' @param count_data A dataframe containing the pre-procssed, "global" (e.g. +#' state) daily count data, pertaining to the number of events that are being +#' counted on that day, e.g. number of daily cases or daily hospital admissions. +#' Must contain the following columns: `date`, `count` , `total_pop` +#' @param model_spec The model specification parameters as defined using +#' `get_model_spec()`. The default here pertains to the `forecast_date` in the +#' example data provided by the package, but this should be specified by the +#' user based on the date they are producing a forecast +#' @param mcmc_options The MCMC parameters as defined using +#' `get_mcmc_options()`. +#' @param compiled_model The pre-compiled model as defined using +#' `compile_model()` #' -#' @return +#' @return A nested list of the following items, intended to allow the user to +#' quickly and easily plot results from their inference, while also being able +#' to have the full user functionality of running the model themselves in stan +#' by providing the raw model object and diagnostics. If the model runs, this +#' function will return: +#' `draws_df`: A tibble containing the full set of posterior draws of the +#' estimated, nowcasted, and forecasted: counts, site-level wastewater +#' concentrations, "global"(e.g. state) R(t) estimate, and the "local" (site + +#' the one auxiliary subpopulation) R(t) estimates. In the instance where there +#' are observations, the data will be joined to each draw of the predicted +#' observation to facilitate plotting. +#' `raw_fit_obj`: The CmdStan object that is returned from the call to +#' `cmdstanr::sample()`. Can be used to access draws, summary, diagnostics, etc. +#' `date_time_spine`: Mapping from time in stan to dates +#' `lab_site_spine`: Mapping from lab_site_index in stan to lab and site +#' `subpop_spine`: Mapping from site index in stan to site +#' +#' If the model fails to run, a list containing the follow will be returned: +#' `error`: the error message provided from stan, indicating why the model +#' failed to run. Note, the model might still run and produce draws even if it +#' has major model issues. We recommend the user always run the +#' `check_diagnostics()` function on the `diagnostic_summary` as part of any +#' pipeline to ensure model convergence. #' @export +#' +#' @examplesIf interactive() +#' # provide all the code and commenting in the getting started vignette +#' wwinference <- function(ww_data, count_data, - model_spec = get_model_spec( + model_spec = wwinference::get_model_spec( forecast_date = "2023-12-06" ), - mcmc_options = get_mcmc_options(), - compiled_model = compile_model()) { + mcmc_options = wwinference::get_mcmc_options(), + compiled_model = wwinference::compile_model()) { # Check that data is compatible with specifications check_date(ww_data, model_spec$forecast_date) check_date(count_data, model_spec$forecast_date) @@ -35,7 +85,7 @@ wwinference <- function(ww_data, ) init_lists <- c() - for (i in 1:model_spec$n_chains) { + for (i in 1:mcmc_options$n_chains) { init_lists[[i]] <- get_inits(stan_data, params) } @@ -47,12 +97,12 @@ wwinference <- function(ww_data, fit <- compiled_model$sample( data = stan_data, init = init_lists, - seed = model_spec$seed, - iter_sampling = model_spec$iter_sampling, - iter_warmup = model_spec$iter_warmup, - max_treedepth = model_spec$max_treedepth, - chains = model_spec$n_chains, - parallel_chains = model_spec$n_chains + seed = mcmc_options$seed, + iter_sampling = mcmc_options$iter_sampling, + iter_warmup = mcmc_options$iter_warmup, + max_treedepth = mcmc_options$max_treedepth, + chains = mcmc_options$n_chains, + parallel_chains = mcmc_options$n_chains ) print(fit) return(fit) @@ -74,18 +124,51 @@ wwinference <- function(ww_data, out <- list( error = fit$error[[1]] ) - message(error) + message(fit$error[[1]]) } else { - draws <- fit$result$draws() - diagnostics <- fit$result$sampler_diagnostics(format = "df") + # This is a bit messy, but get the spines needed to map stan data to + # the real data + # Time index to date + date_time_spine <- tibble::tibble( + date = seq( + from = min(count_data$date), + to = min(count_data$date) + stan_data$ot + stan_data$ht, + by = "days" + ) + ) |> + dplyr::mutate(t = row_number()) + # Lab-site index to corresponding lab, site, and site population size + lab_site_spine <- ww_data |> + dplyr::distinct(site, lab, lab_site_index, site_pop) + # Site index to corresponding site and subpopulation size + subpop_spine <- ww_data |> + dplyr::distinct(site, site_index, site_pop) |> + dplyr::mutate(site = as.factor(site)) |> + dplyr::bind_rows(tibble::tibble( + site = "remainder of pop", + site_index = max(ww_data$site_index) + 1, + site_pop = stan_data$subpop_size[ + length(unique(stan_data$subpop_size)) + ] + )) + + draws <- postprocess( + ww_data = ww_data, + count_data = count_data, + fit_obj = fit, + date_time_spine = date_time_spine, + lab_site_spine = lab_site_spine, + subpop_spine = subpop_spine + ) + diagnostics <- fit$result summary_diagnostics <- fit$result$diagnostic_summary() - summary <- fit$result$summary() out <- list( - draws = draws, - diagnostics = diagnostics, - summary_diagnostics = summary_diagnostics, - summary = summary + draws_df = draws, + raw_fit_obj = fit, + date_time_spine = date_time_spine, + lab_site_spine = lab_site_spine, + subpop_spine = subpop_spine ) # Run diagnostic tests, and message if a flag doesn't pass. Still return @@ -180,7 +263,7 @@ get_mcmc_options <- function( #' #' @examples #' model_spec_list <- model_spec(forecast_date = "2023-12-06") -model_spec <- function( +get_model_spec <- function( forecast_date, calibration_time = 90, forecast_horizon = 28, diff --git a/inst/extdata/example_params.toml b/inst/extdata/example_params.toml index ebacf844..0a0eb8c8 100644 --- a/inst/extdata/example_params.toml +++ b/inst/extdata/example_params.toml @@ -18,9 +18,9 @@ initial_growth_prior_sd = 0.01 autoreg_rt_a = 2 # shape1 parameter of autoreg term on Rt trend autoreg_rt_b = 40 # shape2 parameter of autoreg on Rt trend # mean = a/(a+b) = 0.05, stdv = sqrt(a)/b = sqrt(2)/40 = 0.035 -autoreg_rt_site_a = 1 # shape1 parameter of autoreg term on difference between +autoreg_rt_site_a = 2 # shape1 parameter of autoreg term on difference between # R(t) state and R(t) site -autoreg_rt_site_b = 4 # shape2 parameter of autoreg term on difference between +autoreg_rt_site_b = 40 # shape2 parameter of autoreg term on difference between # R(t) state and R(t) site autoreg_p_hosp_a = 1 # shape1 parameter of autoreg term on IHR(t) trend autoreg_p_hosp_b = 100 # shape2 parameter of autoreg term on IHR(t) trend @@ -55,7 +55,7 @@ log10_g_prior_sd = 2 log_g_prior_mean = 27.63102 # 12 * log(10) log_g_prior_sd = 4.60517 # 2 * log(10) -sigma_ww_site_prior_mean_mean = 1 +sigma_ww_site_prior_mean_mean = 0.5 sigma_ww_site_prior_mean_sd = 1 sigma_ww_site_prior_sd_mean = 0 sigma_ww_site_prior_sd_sd = 1 diff --git a/inst/stan/wwinference.stan b/inst/stan/wwinference.stan index 5c104797..e07ef35a 100644 --- a/inst/stan/wwinference.stan +++ b/inst/stan/wwinference.stan @@ -12,8 +12,8 @@ functions { // The fixed input data data { int gt_max; - int inf_to_count_delay_max; - vector[inf_to_count_delay_max] inf_to_count_delay; // delay distribution from infecion to hospital admission + int hosp_delay_max; + vector[hosp_delay_max] inf_to_hosp; // delay distribution from infecion to hospital admission real mwpd; // mL of ww produced per person per day int if_l; // length of infection feedback pmf vector[if_l] infection_feedback_pmf; // infection feedback pmf @@ -31,12 +31,12 @@ data { int tot_weeks; // number of weeks for the weekly random walk on IHR (includes unobserved time) matrix [uot+ot+ht, tot_weeks] p_hosp_m; // matrix to convert p_hosp from weekly to daily vector[gt_max] generation_interval; // generation interval distribution - real total_pop; // population size + real state_pop; // population size vector[n_subpops] subpop_size; // the population sizes for each subpopulation - real norm_pop; + real norm_pop; array[owt] int ww_sampled_times; // a list of all of the days on which WW is sampled // will be mapped to the corresponding sites (ww_sampled_sites) - array[oht] int count_times; // the days on which hospital admissions are observed + array[oht] int hosp_times; // the days on which hospital admissions are observed array[owt] int ww_sampled_sites; // vector of unique sites in order of the sampled times array[owt] int ww_sampled_lab_sites; // vector of unique lab-site combos i // n order of the sampled times @@ -44,7 +44,7 @@ data { array[n_uncensored] int ww_uncensored; // time that WW data is above LOD vector[owt] ww_log_lod; // The limit of detection in that site at that time point array[n_ww_lab_sites] int lab_site_to_site_map; // which lab sites correspond to which sites - array[oht] int counts; // observed counts + array[oht] int hosp; // observed hospital admissions array[ot + ht] int day_of_week; // integer vector with 1-7 corresponding to the weekday vector[owt] log_conc; // observed concentration of viral genomes in WW int compute_likelihood; // 1= use data to compute likelihood @@ -169,7 +169,7 @@ transformed parameters { vector[ot + uot + ht] state_inf_per_capita = rep_vector(0, uot + ot + ht); // state level incident infections per capita matrix[n_subpops, ot + ht] model_log_v_ot; // expected observed viral genomes/mL at all observed and forecasted times real g = pow(log10_g, 10); // Estimated genomes shed per infected individual - real i0 = i0_over_n * total_pop; // Initial absolute infection incidence + real i0 = i0_over_n * state_pop; // Initial absolute infection incidence vector[n_subpops] i0_site_over_n; // site-level initial // per capita infection incidence vector[n_subpops] growth_site; @@ -238,7 +238,7 @@ transformed parameters { // Expected hospital admissions per capita: // This is a convolution of incident infections and the hospital-admission delay distribution // generates all hospitalizations, across unobserved time, observed time, and forecast time - model_hosp_per_capita = convolve_dot_product(p_hosp .* state_inf_per_capita, reverse(inf_to_count_delay), + model_hosp_per_capita = convolve_dot_product(p_hosp .* state_inf_per_capita, reverse(inf_to_hosp), ot + uot + ht); // predicted hospital admissions per capita @@ -246,9 +246,9 @@ transformed parameters { // apply the weekday effect so these are distributed with fewer admits on Sat & Sun // multiply by state population to convert from predicted per capita admissions to // predicted absolute admissions - exp_obs_hosp = total_pop * day_of_week_effect( - exp_obs_hosp_per_capita_no_wday_effect[count_times], - day_of_week[count_times], + exp_obs_hosp = state_pop * day_of_week_effect( + exp_obs_hosp_per_capita_no_wday_effect[hosp_times], + day_of_week[hosp_times], hosp_wday_effect); // Observations at the site level (genomes/person/day) are: @@ -321,7 +321,7 @@ model { } if (include_hosp == 1) { - counts ~ neg_binomial_2(exp_obs_hosp, phi_h); + hosp ~ neg_binomial_2(exp_obs_hosp, phi_h); } } // end if for computing log likelihood } @@ -341,12 +341,12 @@ generated quantities { exp(growth_site[i] * uot); } - pred_hosp = neg_binomial_2_rng(total_pop * day_of_week_effect(model_hosp_per_capita[uot + 1 : + pred_hosp = neg_binomial_2_rng(state_pop * day_of_week_effect(model_hosp_per_capita[uot + 1 : uot + ot + ht], day_of_week, hosp_wday_effect), phi_h); - pred_new_i = neg_binomial_2_rng(total_pop * state_inf_per_capita[uot + 1 : uot + ot + ht], phi_h); + pred_new_i = neg_binomial_2_rng(state_pop * state_inf_per_capita[uot + 1 : uot + ot + ht], phi_h); // Here need to iterate through each lab-site, find the corresponding site // and apply the expected lab-site error diff --git a/man/model_spec.Rd b/man/get_model_spec.Rd similarity index 97% rename from man/model_spec.Rd rename to man/get_model_spec.Rd index afe67b4d..9c3a916f 100644 --- a/man/model_spec.Rd +++ b/man/get_model_spec.Rd @@ -1,10 +1,10 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/wwinference.R -\name{model_spec} -\alias{model_spec} +\name{get_model_spec} +\alias{get_model_spec} \title{Get model specificaitons} \usage{ -model_spec( +get_model_spec( forecast_date, calibration_time = 90, forecast_horizon = 28, diff --git a/man/postprocess.Rd b/man/postprocess.Rd new file mode 100644 index 00000000..e9555301 --- /dev/null +++ b/man/postprocess.Rd @@ -0,0 +1,50 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/postprocess.R +\name{postprocess} +\alias{postprocess} +\title{Postprocess to generate a draws dataframe} +\usage{ +postprocess( + ww_data, + count_data, + fit_obj, + date_time_spine, + lab_site_spine, + subpop_spine +) +} +\arguments{ +\item{ww_data}{A dataframe of the preprocessed wastewater concentration data +used to fit the model} + +\item{count_data}{A dataframe of the preprocessed daily count data (e.g. +hospital admissions) from the "global" population} + +\item{fit_obj}{a CmdStan object that is the output of fitting the model to +the \code{ww_data} and \code{count_data}} + +\item{date_time_spine}{A tibble mapping the time index in stan (observed + +nowcast + forecast) to real dates} + +\item{lab_site_spine}{A tibble mapping the site-lab index in stan to the +corresponding site, lab, and site population} + +\item{subpop_spine}{A tibble mapping the site index in stan to the +corresponding subpopulation (either a site or the auxiliary site we add to +represent the rest of the population)} +} +\value{ +A tibble containing the full set of posterior draws of the +estimated, nowcasted, and forecasted: counts, site-level wastewater +concentrations, "global"(e.g. state) R(t) estimate, and the "local" (site + +the one auxiliary subpopulation) R(t) estimates. In the instance where there +are observations, the data will be joined to each draw of the predicted +observation to facilitate plotting. +} +\description{ +This function takes in the two input data sources, the CmdStan fit object, +and the 3 relevant mappings from stan indices to the real data, in order +to generate a dataframe containing the posterior draws of the counts (e.g. +hospital admissions), the wastewater concentration values, the "global" R(t), +and the "local" R(t) estimates + the critical metadata in the data +} diff --git a/man/wwinference.Rd b/man/wwinference.Rd index 52e2718d..f6ea5fd1 100644 --- a/man/wwinference.Rd +++ b/man/wwinference.Rd @@ -2,27 +2,78 @@ % Please edit documentation in R/wwinference.R \name{wwinference} \alias{wwinference} -\title{Fit the joint inference of wastewater and count data} +\title{Joint inference of count data (e.g. cases/admissions) and wastewater +data} \usage{ wwinference( ww_data, count_data, - model_spec = get_model_spec(forecast_date = "2023-12-06"), - mcmc_options = get_mcmc_options(), - compiled_model = compile_model() + model_spec = wwinference::get_model_spec(forecast_date = "2023-12-06"), + mcmc_options = wwinference::get_mcmc_options(), + compiled_model = wwinference::compile_model() ) } \arguments{ -\item{ww_data}{} +\item{ww_data}{A dataframe containing the pre-processed, site-level +wastewater concentration data for a model run. The dataframe must contain +the following columns: \code{date}, \code{site}, \code{lab}, \code{genome_copies_per_ml}, +\code{lab_site_index}, \code{lod}, \code{below_lod}, \code{site_pop} \code{exclude}} -\item{count_data}{} +\item{count_data}{A dataframe containing the pre-procssed, "global" (e.g. +state) daily count data, pertaining to the number of events that are being +counted on that day, e.g. number of daily cases or daily hospital admissions. +Must contain the following columns: \code{date}, \code{count} , \code{total_pop}} -\item{model_spec}{} +\item{model_spec}{The model specification parameters as defined using +\code{get_model_spec()}. The default here pertains to the \code{forecast_date} in the +example data provided by the package, but this should be specified by the +user based on the date they are producing a forecast} -\item{mcmc_options}{} +\item{mcmc_options}{The MCMC parameters as defined using +\code{get_mcmc_options()}.} -\item{compiled_model}{} +\item{compiled_model}{The pre-compiled model as defined using +\code{compile_model()}} +} +\value{ +A nested list of the following items, intended to allow the user to +quickly and easily plot results from their inference, while also being able +to have the full user functionality of running the model themselves in stan +by providing the raw model object and diagnostics. If the model runs, this +function will return: +\code{draws_df}: A tibble containing the full set of posterior draws of the +estimated, nowcasted, and forecasted: counts, site-level wastewater +concentrations, "global"(e.g. state) R(t) estimate, and the "local" (site + +the one auxiliary subpopulation) R(t) estimates. In the instance where there +are observations, the data will be joined to each draw of the predicted +observation to facilitate plotting. +\code{raw_fit_obj}: The CmdStan object that is returned from the call to +\code{cmdstanr::sample()}. Can be used to access draws, summary, diagnostics, etc. +\code{date_time_spine}: Mapping from time in stan to dates +\code{lab_site_spine}: Mapping from lab_site_index in stan to lab and site +\code{subpop_spine}: Mapping from site index in stan to site + +If the model fails to run, a list containing the follow will be returned: +\code{error}: the error message provided from stan, indicating why the model +failed to run. Note, the model might still run and produce draws even if it +has major model issues. We recommend the user always run the +\code{check_diagnostics()} function on the \code{diagnostic_summary} as part of any +pipeline to ensure model convergence. } \description{ -Fit the joint inference of wastewater and count data +Provides a user friendly interface around package functionality +to produce estimates, nowcasts, and forecasts pertaining to user-specified +delay distributions, set parameters, and priors that can be modified to handle +different types of "global" count data and "local" wastewater concentration +data using a Bayesian hierarchical framework applied to the two distinct +data sources. By default the model assumes a fixed generation interval and +delay from infection to the event that is counted. See the getting started +vignette for an example model specifications fitting COVID-19 hospital +admissions from a hypothetical state and wasteawter concentration data from +multiple sites within that state. +} +\examples{ +\dontshow{if (interactive()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +# provide all the code and commenting in the getting started vignette +\dontshow{\}) # examplesIf} } diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 3661b6a6..d00b0cdd 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -273,14 +273,14 @@ acceptance probability of 0.95 and a maximum tree depth of 12. The user may wish to adjust these as they are iterating to reduce model run-time or to achieve better convergence on a real-world use case. -We also pass our preprocessed datasets (`ww_data_preprocessed` and -`hosp_data_preprocessed`), our model specifications (`model_spec`), and our -pre-compiled model(model) to `wwinference` where they are combined and used to +We also pass our preprocessed datasets (`ww_data_to_fit` and +`hosp_data_preprocessed`), specify our model using `get_model_spec()`, +set the MCMC settings using `get_mcmc_optins()`, and pass in our +pre-compiled model(`model`) to `wwinference` where they are combined and used to fit the model. -# Next, we will get the stan data objected from the data and the model -# specifications -```{r} -fit <- wwinference( + +```{r, message = FALSE} +fit <- wwinference::wwinference( ww_data_to_fit, hosp_data_preprocessed, model_spec = get_model_spec( @@ -292,14 +292,112 @@ fit <- wwinference( infection_feedback_pmf = infection_feedback_pmf ), mcmc_options = get_mcmc_options(), - model + compiled_model = model ) ``` # The `wwinference` object The `wwinference()` function returns a `wwinference` object which includes -diagnostic information, the data used for for fitting, and the underlying -`CmdStanModel` object. The `CmdStanModel` object contains the estimated, -nowcasted, and forecasted expected observed hospital admissions and wastewater -concentrations, as well as the latent variables of interest including the site- -level $R(t)$ estimates and the state-level $R(t)$ estimate. +a `draws_df`, the underlying `CmdStan` object (`raw_fit_obj`), and three +"spines" that map the stan indices to the data which include:a +`date_time_spine`, `lab_site_spine`, and `subpop_spine`. The `draws_df` is +intended to provide an easy to work with tibble of posterior draws of +the estimated, nowcasted, and forecasted expected observed hospital admissions +and wastewater concentrations, as well as the latent variables of interest +including the site-level $R(t)$ estimates and the state-level $R(t)$ estimate. +```{r} +head(fit) +``` +# Summarizing and plotting the model fit +The `draws_df` object is intended to be used to easily plot relevant model +outputs against data. This can be useful to get a sense of if you're model is +fitting the data well, and if the nowcasted/forecast quantities look reasonable. + +```{r} +sampled_draws <- sample(1:max(draws_df$draw), 100) + +# Hospital admissions: fits, nowcasts, forecasts +ggplot(draws_df |> dplyr::filter( + name == "pred_counts", + draw %in% sampled_draws +)) + + geom_line(aes(x = date, y = pred_value, group = draw), + color = "red4", alpha = 0.1, size = 0.2 + ) + + geom_point( + data = hosp_data_eval, + aes(x = date, y = daily_hosp_admits_for_eval), + shape = 21, color = "black", fill = "white" + ) + + geom_point(aes(x = date, y = observed_value)) + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + + xlab("") + + ylab("Daily hospital admissions") + + ggtitle("Fit and forecasted hospital admissions ") + + theme_bw() + +# R(t) of the hypothetical state +ggplot(draws_df |> dplyr::filter( + name == "global R(t)", + draw %in% sampled_draws +)) + + geom_line(aes(x = date, y = pred_value, group = draw), + color = "blue4", alpha = 0.1, size = 0.2 + ) + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + + geom_hline(aes(yintercept = 1), linetype = "dashed") + + xlab("") + + ylab("R(t) of hypothetical state") + + ggtitle("R(t) estimate") + + theme_bw() + + + +ggplot(draws_df |> dplyr::filter( + name == "pred_ww", + draw %in% sampled_draws +)) + + geom_line( + aes( + x = date, y = log(pred_value), + color = subpop, + group = draw + ), + alpha = 0.1, size = 0.2, + show.legend = FALSE + ) + + geom_point(aes(x = date, y = log(observed_value)), + color = "black", show.legend = FALSE + ) + + facet_wrap(~subpop, scales = "free") + + geom_vline(aes(xintercept = forecast_date), linetype = "dashed") + + xlab("") + + ylab("Log(Genome copies/mL)") + + ggtitle("Lab-site level wastewater concentration") + + theme_bw() + +ggplot(draws_df |> dplyr::filter( + name == "subpop R(t)", + draw %in% sampled_draws +)) + + geom_line( + aes( + x = date, y = pred_value, group = draw, + color = subpop + ), + alpha = 0.1, size = 0.2 + ) + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + + facet_wrap(~subpop, scales = "free") + + geom_hline(aes(yintercept = 1), linetype = "dashed") + + xlab("") + + ylab("R(t) of hypothetical state") + + ggtitle("R(t) estimate") + + theme_bw() +``` From e8a3dfa22dd453fa7471fb82866f4711639f88ef Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 4 Jul 2024 09:47:16 -0400 Subject: [PATCH 041/103] fix the handling of passing in col names, error was breaking LOD calling --- R/preprocessing.R | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index 0f122ece..4625a989 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -7,8 +7,10 @@ #' @param lod_col_name string indicating the name of the column containing #' the concentration measurements in the wastewater data, default is #' `genome_copies_per_ml` -#' @return a dataframe containing the same columns as ww_data plus the following -#' additional columns neede for the stan model: +#' @return a dataframe containing the same columns as ww_data except +#' the `conc_col_name` will be replaced with `genome_copies_per_ml` and +#' the `lod_col_name` will be replaced with `lod` plus the following +#' additional columns needed for the stan model: #' lab_site_index, site_index, flag_as_ww_outlier, lab_site_name, #' forecast_date #' @export @@ -30,7 +32,7 @@ preprocess_ww_data <- function(ww_data, conc_col_name = "genome_copies_per_ml", lod_col_name = "lod") { # Add some columns - ww_data <- ww_data |> + ww_data_add_cols <- ww_data |> dplyr::left_join( ww_data |> dplyr::distinct(lab, site) |> @@ -45,14 +47,18 @@ preprocess_ww_data <- function(ww_data, dplyr::mutate(site_index = dplyr::row_number()), by = "site" ) |> + dplyr::rename( + lod = {{ lod_col_name }}, + genome_copies_per_ml = {{ conc_col_name }} + ) |> dplyr::mutate( lab_site_name = glue::glue("Site: {site}, Lab: {lab}"), - below_lod = ifelse({{ conc_col_name }} < {{ lod_col_name }}, 1, 0) + below_lod = ifelse(genome_copies_per_ml < lod, 1, 0) ) # Get an extra column that identifies the wastewater outliers using the # default parameters - ww_preprocessed <- flag_ww_outliers(ww_data, + ww_preprocessed <- flag_ww_outliers(ww_data_add_cols, conc_col_name = {{ conc_col_name }} ) |> dplyr::rename( From f5b314f57de8314259744cbf7b24ff939039f388 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 4 Jul 2024 09:53:19 -0400 Subject: [PATCH 042/103] update documentation --- man/preprocess_ww_data.Rd | 6 ++++-- man/wwinference.Rd | 16 ++++++++-------- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd index 028b2a8d..dc443c07 100644 --- a/man/preprocess_ww_data.Rd +++ b/man/preprocess_ww_data.Rd @@ -23,8 +23,10 @@ the concentration measurements in the wastewater data, default is \code{genome_copies_per_ml}} } \value{ -a dataframe containing the same columns as ww_data plus the following -additional columns neede for the stan model: +a dataframe containing the same columns as ww_data except +the \code{conc_col_name} will be replaced with \code{genome_copies_per_ml} and +the \code{lod_col_name} will be replaced with \code{lod} plus the following +additional columns needed for the stan model: lab_site_index, site_index, flag_as_ww_outlier, lab_site_name, forecast_date } diff --git a/man/wwinference.Rd b/man/wwinference.Rd index f6ea5fd1..39a46fb0 100644 --- a/man/wwinference.Rd +++ b/man/wwinference.Rd @@ -63,14 +63,14 @@ pipeline to ensure model convergence. \description{ Provides a user friendly interface around package functionality to produce estimates, nowcasts, and forecasts pertaining to user-specified -delay distributions, set parameters, and priors that can be modified to handle -different types of "global" count data and "local" wastewater concentration -data using a Bayesian hierarchical framework applied to the two distinct -data sources. By default the model assumes a fixed generation interval and -delay from infection to the event that is counted. See the getting started -vignette for an example model specifications fitting COVID-19 hospital -admissions from a hypothetical state and wasteawter concentration data from -multiple sites within that state. +delay distributions, set parameters, and priors that can be modified to +handledifferent types of "global" count data and "local" wastewater +concentrationdata using a Bayesian hierarchical framework applied to the two +distinctdata sources. By default the model assumes a fixed generation +interval and delay from infection to the event that is counted. See the +getting started vignette for an example model specifications fitting +COVID-19 hospital admissions from a hypothetical state and wasteawter +concentration data from multiple sites within that state. } \examples{ \dontshow{if (interactive()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} From f558bd92edbb691d50f9bda84b912bc30c89b2c6 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 4 Jul 2024 09:57:48 -0400 Subject: [PATCH 043/103] fix typo --- R/postprocess.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/postprocess.R b/R/postprocess.R index d7fafca0..30baf9e6 100644 --- a/R/postprocess.R +++ b/R/postprocess.R @@ -59,7 +59,7 @@ postprocess <- function(ww_data, below_lod = NA, lod = NA, flag_as_ww_outlier = NA, - exlcude = NA + exclude = NA ) |> dplyr::select(-t) From 00c05437a7cb4d1e4e5576639ee50c2679e4a158 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 10:34:18 -0400 Subject: [PATCH 044/103] adjust postprocess function, add in diagnostics --- NAMESPACE | 1 + R/model_diagnostics.R | 57 +++++++++++++++++++++++++++++++ R/postprocess.R | 10 +++++- R/wwinference.R | 15 +++++--- man/get_model_diagnostic_flags.Rd | 40 ++++++++++++++++++++++ 5 files changed, 118 insertions(+), 5 deletions(-) create mode 100644 R/model_diagnostics.R create mode 100644 man/get_model_diagnostic_flags.Rd diff --git a/NAMESPACE b/NAMESPACE index c971281a..dec3d968 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -14,6 +14,7 @@ export(get_count_indices) export(get_count_values) export(get_ind_m) export(get_mcmc_options) +export(get_model_diagnostic_flags) export(get_model_spec) export(get_params) export(get_stan_data) diff --git a/R/model_diagnostics.R b/R/model_diagnostics.R new file mode 100644 index 00000000..d2dcba08 --- /dev/null +++ b/R/model_diagnostics.R @@ -0,0 +1,57 @@ +#' Get diagnostic flags +#' +#' @description +#' This function takes in the output from a cmdstanr$sample() function (the +#' fit object) and a series of diagnostic tolerances and returns +#' a dataframe containing flags for whether any of the diagnostic thresholds +#' were exceeded, which would indicate that the model did not properly +#' converge +#' +#' +#' @param stan_fit_object The R6 Cmdstan Object fit object +#' @param ebmfi_tolerance float indicating the tolerance for EBMFI +#' (estimated bayesian fraction of missing information), default is `0.2` +#' @param divergences_tolerance float indicating the tolerance for the +#' proportion of sampling iterations that are divergent, default is `0.01` +#' @param p_high_rhat_tolerance float indicating the tolerance for the +#' proportion of parameters rhats>1.05, default is `0.05` +#' @param max_tree_depth_tol float indicating the tolerance for the proportion +#' of iterations that exceed the maximum tree depth, default is `0.01` +#' +#' @return flag_df: dataframe containing columns for each of the flags, +#' if any flags are TRUE that indicates some model issue +#' @export +#' +get_model_diagnostic_flags <- function(stan_fit_object, + ebmfi_tolerance = 0.2, + divergences_tolerance = 0.01, + p_high_rhat_tolerance = 0.05, + max_tree_depth_tol = 0.01) { + n_chains <- stan_fit_object$num_chains() + iter_sampling <- stan_fit_object$metadata()$iter_sampling + + # Summary is a large dataframe with diagnostics for each parameters + summary <- stan_fit_object$summary() + diagnostic_summary <- stan_fit_object$diagnostic_summary(quiet = TRUE) + + flag_low_embfi <- mean(diagnostic_summary$ebfmi) <= ebmfi_tolerance + max_n_divergences <- n_chains * iter_sampling * divergences_tolerance + flag_too_many_divergences <- any( + diagnostic_summary$num_divergent >= max_n_divergences + ) + p_high_rhat <- as.numeric(mean(summary[, "rhat"]$rhat > 1.05, na.rm = TRUE)) + flag_high_rhat <- p_high_rhat >= + p_high_rhat_tolerance + max_n_max_treedepth <- n_chains * iter_sampling * max_tree_depth_tol + flag_high_max_treedepth <- any( + diagnostic_summary$num_max_tree_depth >= max_n_max_treedepth + ) + + flag_df <- data.frame( + flag_high_max_treedepth, + flag_too_many_divergences, + flag_high_rhat, + flag_low_embfi + ) + return(flag_df) +} diff --git a/R/postprocess.R b/R/postprocess.R index 30baf9e6..23c2fcc2 100644 --- a/R/postprocess.R +++ b/R/postprocess.R @@ -107,7 +107,11 @@ postprocess <- function(ww_data, lab_site_index = NA, subpop = NA, lab = NA, - site_pop = NA + site_pop = NA, + below_lod = NA, + lod = NA, + flag_as_ww_outlier = NA, + exclude = NA ) |> dplyr::select(-t) @@ -127,6 +131,10 @@ postprocess <- function(ww_data, observed_value = NA, lab_site_index = NA, lab = NA, + below_lod = NA, + lod = NA, + flag_as_ww_outlier = NA, + exclude = NA, observation_type = "latent variable", type_of_quantity = "local", total_pop = NA, diff --git a/R/wwinference.R b/R/wwinference.R index 3f86bd75..bc1b7171 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -160,19 +160,26 @@ wwinference <- function(ww_data, lab_site_spine = lab_site_spine, subpop_spine = subpop_spine ) - diagnostics <- fit$result summary_diagnostics <- fit$result$diagnostic_summary() + convergence_flag_df <- get_model_diagnostic_flags( + stan_fit_object = + fit$result + ) out <- list( draws_df = draws, - raw_fit_obj = fit, + raw_fit_obj = fit$result, date_time_spine = date_time_spine, lab_site_spine = lab_site_spine, subpop_spine = subpop_spine ) - # Run diagnostic tests, and message if a flag doesn't pass. Still return - # the same data + # Message if a flag doesn't pass. Still return + # the same data, but we want user to know the issue + if (any(convergence_flag_df[1, ])) { + message("Model flagged for convergence issues, run model diagnostics + on the output stanfit object for more information") + } } return(out) diff --git a/man/get_model_diagnostic_flags.Rd b/man/get_model_diagnostic_flags.Rd new file mode 100644 index 00000000..d3be1e0e --- /dev/null +++ b/man/get_model_diagnostic_flags.Rd @@ -0,0 +1,40 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/model_diagnostics.R +\name{get_model_diagnostic_flags} +\alias{get_model_diagnostic_flags} +\title{Get diagnostic flags} +\usage{ +get_model_diagnostic_flags( + stan_fit_object, + ebmfi_tolerance = 0.2, + divergences_tolerance = 0.01, + p_high_rhat_tolerance = 0.05, + max_tree_depth_tol = 0.01 +) +} +\arguments{ +\item{stan_fit_object}{The R6 Cmdstan Object fit object} + +\item{ebmfi_tolerance}{float indicating the tolerance for EBMFI +(estimated bayesian fraction of missing information), default is \code{0.2}} + +\item{divergences_tolerance}{float indicating the tolerance for the +proportion of sampling iterations that are divergent, default is \code{0.01}} + +\item{p_high_rhat_tolerance}{float indicating the tolerance for the +proportion of parameters rhats>1.05, default is \code{0.05}} + +\item{max_tree_depth_tol}{float indicating the tolerance for the proportion +of iterations that exceed the maximum tree depth, default is \code{0.01}} +} +\value{ +flag_df: dataframe containing columns for each of the flags, +if any flags are TRUE that indicates some model issue +} +\description{ +This function takes in the output from a cmdstanr$sample() function (the +fit object) and a series of diagnostic tolerances and returns +a dataframe containing flags for whether any of the diagnostic thresholds +were exceeded, which would indicate that the model did not properly +converge +} From f675dc438fab3cd2d63e2710dac0269c215112c1 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 11:07:33 -0400 Subject: [PATCH 045/103] add functions for quickly plotting outputs --- NAMESPACE | 4 + R/figures.R | 220 ++++++++++++++++++++++++++++++ man/get_plot_forecasted_counts.Rd | 35 +++++ man/get_plot_global_rt.Rd | 22 +++ man/get_plot_subpop_rt.Rd | 22 +++ man/get_plot_ww_conc.Rd | 22 +++ 6 files changed, 325 insertions(+) create mode 100644 R/figures.R create mode 100644 man/get_plot_forecasted_counts.Rd create mode 100644 man/get_plot_global_rt.Rd create mode 100644 man/get_plot_subpop_rt.Rd create mode 100644 man/get_plot_ww_conc.Rd diff --git a/NAMESPACE b/NAMESPACE index dec3d968..f12c65ff 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -17,6 +17,10 @@ export(get_mcmc_options) export(get_model_diagnostic_flags) export(get_model_spec) export(get_params) +export(get_plot_forecasted_counts) +export(get_plot_global_rt) +export(get_plot_subpop_rt) +export(get_plot_ww_conc) export(get_stan_data) export(get_subpop_data) export(get_ww_data_indices) diff --git a/R/figures.R b/R/figures.R new file mode 100644 index 00000000..6309c7ab --- /dev/null +++ b/R/figures.R @@ -0,0 +1,220 @@ +#' Get plot of fit and forecasted counts +#' +#' @param draws A dataframe containing the posterior draws with the data joined +#' to it. This is `draws_df` output of the call to `wwinference()` +#' @param count_data_eval A dataframe containing the count data we will +#' evaluate the forecasts against. +#' @param forecast_date A string indicating the date we made the forecast, for +#' plotting, in ISO8 format YYYY-MM-DD +#' @param count_type A string indicating what data the counts refer to, +#' default is `hospital admissions` +#' +#' @return A ggplot object containing the posterior draw of the estimated, +#' nowcasted, and forecasted hospital admissions alongside the data used to +#' calibrate the hospital admissions model and the later observed hospital +#' admissions used to evaluate the forecast performance +#' @export +#' +get_plot_forecasted_counts <- function(draws, + count_data_eval, + forecast_date, + count_type = "hospital admissions") { + p <- ggplot(draws |> dplyr::filter( + name == "pred_counts", + draw %in% sampled_draws + )) + + geom_line(aes(x = date, y = pred_value, group = draw), + color = "red4", alpha = 0.1, size = 0.2 + ) + + geom_point( + data = count_data_eval, + aes(x = date, y = daily_hosp_admits_for_eval), + shape = 21, color = "black", fill = "white" + ) + + geom_point(aes(x = date, y = observed_value)) + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + + xlab("") + + ylab(glue::glue("Daily {count_type}")) + + ggtitle(glue::glue("Fit and forecasted {count_type}")) + + scale_x_date( + date_breaks = "2 weeks", + labels = scales::date_format("%Y-%m-%d") + ) + + theme_bw() + + theme( + axis.text.x = element_text( + size = 8, vjust = 1, + hjust = 1, angle = 45 + ), + axis.title.x = element_text(size = 12), + axis.title.y = element_text(size = 12), + plot.title = element_text( + size = 10, + vjust = 0.5, hjust = 0.5 + ) + ) + return(p) +} + +#' Get plot of fit and forecasted wastewater concentrations +#' +#' @param draws A dataframe containing the posterior draws with the data joined +#' to it. This is `draws_df` output of the call to `wwinference()` +#' @param forecast_date A string indicating the date we made the forecast, for +#' plotting, in ISO8 format YYYY-MM-DD +#' +#' @return a ggplot object containing faceted plots of the wastewaster +#' concentrations in each site and lab combination +#' @export +#' +get_plot_ww_conc <- function(draws, + forecast_date) { + p <- ggplot(draws |> dplyr::filter( + name == "pred_ww", + draw %in% sampled_draws + ) |> + dplyr::mutate( + site_lab_name = glue::glue("{subpop}, Lab: {lab}") + )) + + geom_line( + aes( + x = date, y = log(pred_value), + color = subpop, + group = draw + ), + alpha = 0.1, size = 0.2, + show.legend = FALSE + ) + + geom_point(aes(x = date, y = log(observed_value)), + color = "black", show.legend = FALSE + ) + + facet_wrap(~site_lab_name, scales = "free") + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + + xlab("") + + ylab("Log(Genome copies/mL)") + + ggtitle("Lab-site level wastewater concentration") + + scale_x_date( + date_breaks = "2 weeks", + labels = scales::date_format("%Y-%m-%d") + ) + + theme_bw() + + theme( + axis.text.x = element_text( + size = 8, vjust = 1, + hjust = 1, angle = 45 + ), + axis.title.x = element_text(size = 12), + axis.title.y = element_text(size = 12), + plot.title = element_text( + size = 10, + vjust = 0.5, hjust = 0.5 + ) + ) + return(p) +} + +#' Get plot of fit, nowcasted, and forecasted "global" R(t) +#' +#' @param draws A dataframe containing the posterior draws with the data joined +#' to it. This is `draws_df` output of the call to `wwinference()` +#' @param forecast_date A string indicating the date we made the forecast, for +#' plotting, in ISO8 format YYYY-MM-DD +#' +#' @return A ggplot object containing the posterior draws of the global R(t) +#' estimate +#' @export +#' +get_plot_global_rt <- function(draws, + forecast_date) { + # R(t) of the hypothetical state + p <- ggplot(draws |> dplyr::filter( + name == "global R(t)", + draw %in% sampled_draws + )) + + geom_line(aes(x = date, y = pred_value, group = draw), + color = "blue4", alpha = 0.1, size = 0.2 + ) + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + + geom_hline(aes(yintercept = 1), linetype = "dashed") + + xlab("") + + ylab("R(t) of hypothetical state") + + ggtitle("R(t) estimate") + + scale_x_date( + date_breaks = "2 weeks", + labels = scales::date_format("%Y-%m-%d") + ) + + theme_bw() + + theme( + axis.text.x = element_text( + size = 8, vjust = 1, + hjust = 1, angle = 45 + ), + axis.title.x = element_text(size = 12), + axis.title.y = element_text(size = 12), + plot.title = element_text( + size = 10, + vjust = 0.5, hjust = 0.5 + ) + ) + return(p) +} + +#' Get plot of fit, nowcasted, and forecasted R(t) in each subpopulation +#' +#' @param draws A dataframe containing the posterior draws with the data joined +#' to it. This is `draws_df` output of the call to `wwinference()` +#' @param forecast_date A string indicating the date we made the forecast, for +#' plotting, in ISO8 format YYYY-MM-DD +#' +#' @return A ggplot object containing faceted plots of the R(t) estimate in each +#' subpopulation (so wastewater sites + those not on wastewater) +#' @export +#' +get_plot_subpop_rt <- function(draws, + forecast_date) { + p <- ggplot(draws |> dplyr::filter( + name == "subpop R(t)", + draw %in% sampled_draws + )) + + geom_line( + aes( + x = date, y = pred_value, group = draw, + color = subpop + ), + alpha = 0.1, size = 0.2, + show.legend = FALSE + ) + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + + facet_wrap(~subpop, scales = "free") + + geom_hline(aes(yintercept = 1), linetype = "dashed") + + xlab("") + + ylab("Subpopulation R(t)") + + ggtitle("R(t) estimate of each subpopulation") + + scale_x_date( + date_breaks = "2 weeks", + labels = scales::date_format("%Y-%m-%d") + ) + + theme_bw() + + theme( + axis.text.x = element_text( + size = 8, vjust = 1, + hjust = 1, angle = 45 + ), + axis.title.x = element_text(size = 12), + axis.title.y = element_text(size = 12), + plot.title = element_text( + size = 10, + vjust = 0.5, hjust = 0.5 + ) + ) + + return(p) +} diff --git a/man/get_plot_forecasted_counts.Rd b/man/get_plot_forecasted_counts.Rd new file mode 100644 index 00000000..fec154c6 --- /dev/null +++ b/man/get_plot_forecasted_counts.Rd @@ -0,0 +1,35 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/figures.R +\name{get_plot_forecasted_counts} +\alias{get_plot_forecasted_counts} +\title{Get plot of fit and forecasted counts} +\usage{ +get_plot_forecasted_counts( + draws, + count_data_eval, + forecast_date, + count_type = "hospital admissions" +) +} +\arguments{ +\item{draws}{A dataframe containing the posterior draws with the data joined +to it. This is \code{draws_df} output of the call to \code{wwinference()}} + +\item{count_data_eval}{A dataframe containing the count data we will +evaluate the forecasts against.} + +\item{forecast_date}{A string indicating the date we made the forecast, for +plotting, in ISO8 format YYYY-MM-DD} + +\item{count_type}{A string indicating what data the counts refer to, +default is \verb{hospital admissions}} +} +\value{ +A ggplot object containing the posterior draw of the estimated, +nowcasted, and forecasted hospital admissions alongside the data used to +calibrate the hospital admissions model and the later observed hospital +admissions used to evaluate the forecast performance +} +\description{ +Get plot of fit and forecasted counts +} diff --git a/man/get_plot_global_rt.Rd b/man/get_plot_global_rt.Rd new file mode 100644 index 00000000..ed0347f2 --- /dev/null +++ b/man/get_plot_global_rt.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/figures.R +\name{get_plot_global_rt} +\alias{get_plot_global_rt} +\title{Get plot of fit, nowcasted, and forecasted "global" R(t)} +\usage{ +get_plot_global_rt(draws, forecast_date) +} +\arguments{ +\item{draws}{A dataframe containing the posterior draws with the data joined +to it. This is \code{draws_df} output of the call to \code{wwinference()}} + +\item{forecast_date}{A string indicating the date we made the forecast, for +plotting, in ISO8 format YYYY-MM-DD} +} +\value{ +A ggplot object containing the posterior draws of the global R(t) +estimate +} +\description{ +Get plot of fit, nowcasted, and forecasted "global" R(t) +} diff --git a/man/get_plot_subpop_rt.Rd b/man/get_plot_subpop_rt.Rd new file mode 100644 index 00000000..9fa9ef28 --- /dev/null +++ b/man/get_plot_subpop_rt.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/figures.R +\name{get_plot_subpop_rt} +\alias{get_plot_subpop_rt} +\title{Get plot of fit, nowcasted, and forecasted R(t) in each subpopulation} +\usage{ +get_plot_subpop_rt(draws, forecast_date) +} +\arguments{ +\item{draws}{A dataframe containing the posterior draws with the data joined +to it. This is \code{draws_df} output of the call to \code{wwinference()}} + +\item{forecast_date}{A string indicating the date we made the forecast, for +plotting, in ISO8 format YYYY-MM-DD} +} +\value{ +A ggplot object containing faceted plots of the R(t) estimate in each +subpopulation (so wastewater sites + those not on wastewater) +} +\description{ +Get plot of fit, nowcasted, and forecasted R(t) in each subpopulation +} diff --git a/man/get_plot_ww_conc.Rd b/man/get_plot_ww_conc.Rd new file mode 100644 index 00000000..5c782df0 --- /dev/null +++ b/man/get_plot_ww_conc.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/figures.R +\name{get_plot_ww_conc} +\alias{get_plot_ww_conc} +\title{Get plot of fit and forecasted wastewater concentrations} +\usage{ +get_plot_ww_conc(draws, forecast_date) +} +\arguments{ +\item{draws}{A dataframe containing the posterior draws with the data joined +to it. This is \code{draws_df} output of the call to \code{wwinference()}} + +\item{forecast_date}{A string indicating the date we made the forecast, for +plotting, in ISO8 format YYYY-MM-DD} +} +\value{ +a ggplot object containing faceted plots of the wastewaster +concentrations in each site and lab combination +} +\description{ +Get plot of fit and forecasted wastewater concentrations +} From d272090a0ec30786125bfd69b2a38d142982b9a5 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 11:07:55 -0400 Subject: [PATCH 046/103] revert params back to current modle version --- inst/extdata/example_params.toml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/inst/extdata/example_params.toml b/inst/extdata/example_params.toml index 0a0eb8c8..ebacf844 100644 --- a/inst/extdata/example_params.toml +++ b/inst/extdata/example_params.toml @@ -18,9 +18,9 @@ initial_growth_prior_sd = 0.01 autoreg_rt_a = 2 # shape1 parameter of autoreg term on Rt trend autoreg_rt_b = 40 # shape2 parameter of autoreg on Rt trend # mean = a/(a+b) = 0.05, stdv = sqrt(a)/b = sqrt(2)/40 = 0.035 -autoreg_rt_site_a = 2 # shape1 parameter of autoreg term on difference between +autoreg_rt_site_a = 1 # shape1 parameter of autoreg term on difference between # R(t) state and R(t) site -autoreg_rt_site_b = 40 # shape2 parameter of autoreg term on difference between +autoreg_rt_site_b = 4 # shape2 parameter of autoreg term on difference between # R(t) state and R(t) site autoreg_p_hosp_a = 1 # shape1 parameter of autoreg term on IHR(t) trend autoreg_p_hosp_b = 100 # shape2 parameter of autoreg term on IHR(t) trend @@ -55,7 +55,7 @@ log10_g_prior_sd = 2 log_g_prior_mean = 27.63102 # 12 * log(10) log_g_prior_sd = 4.60517 # 2 * log(10) -sigma_ww_site_prior_mean_mean = 0.5 +sigma_ww_site_prior_mean_mean = 1 sigma_ww_site_prior_mean_sd = 1 sigma_ww_site_prior_sd_mean = 0 sigma_ww_site_prior_sd_sd = 1 From 728d83d72f541f74bbee3b984b2d6c2d07210daf Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 11:09:00 -0400 Subject: [PATCH 047/103] add plotting and diagnostics to vignette --- vignettes/wwinference.Rmd | 50 +++++++++++++++++++++++++++++++++++---- 1 file changed, 45 insertions(+), 5 deletions(-) diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index d00b0cdd..6e72fd9d 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -356,11 +356,13 @@ ggplot(draws_df |> dplyr::filter( theme_bw() - ggplot(draws_df |> dplyr::filter( name == "pred_ww", draw %in% sampled_draws -)) + +) |> + dplyr::mutate( + site_lab_name = glue::glue("{subpop}, Lab: {lab}") + )) + geom_line( aes( x = date, y = log(pred_value), @@ -373,8 +375,10 @@ ggplot(draws_df |> dplyr::filter( geom_point(aes(x = date, y = log(observed_value)), color = "black", show.legend = FALSE ) + - facet_wrap(~subpop, scales = "free") + - geom_vline(aes(xintercept = forecast_date), linetype = "dashed") + + facet_wrap(~site_lab_name, scales = "free") + + geom_vline(aes(xintercept = lubridate::ymd(forecast_date)), + linetype = "dashed" + ) + xlab("") + ylab("Log(Genome copies/mL)") + ggtitle("Lab-site level wastewater concentration") + @@ -397,7 +401,43 @@ ggplot(draws_df |> dplyr::filter( facet_wrap(~subpop, scales = "free") + geom_hline(aes(yintercept = 1), linetype = "dashed") + xlab("") + - ylab("R(t) of hypothetical state") + + ylab("Subpopulation R(t)") + ggtitle("R(t) estimate") + theme_bw() ``` + +Alternatively, we can create each of these plots with the fitting wrapper +functions. +```{r} +plot_hosp <- get_plot_forecasted_counts( + draws = draws_df, + count_data_eval = hosp_data_eval, + forecast_date = forecast_date +) +plot_hosp +plot_ww <- get_plot_ww_conc(draws_df, forecast_date) +plot_ww +plot_state_rt <- get_plot_global_rt(draws_df, forecast_date) +plot_state_rt +plot_subpop_rt <- get_plot_subpop_rt(draws_df, forecast_date) +plot_subpop_rt +``` + +## Diagnostics +While the `wwinference()` function will print out messaging if any of the +diagnostics flags fail, we recommend running diagnostics as a separate +post-processing step on the `CmdStan` fit object. Start by running function +`get_model_diagnostic_flags()` on the stan fit object. Then, we recommend +looking at the `raw_fit_obj$summary()` which will show the diagnostic +metrics for each parameter in the model and can help identify which parameters +are likely to be driving any convergence issues. We have set default thresholds +on the model diagnostics for production-level runs, we recommend adjusting +as needed. For further information on troubleshooting the model diagnostics, +we recommend the (bayesplot tutorial)[https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html]. +```{r} +convergence_flag_df <- wwinference::get_model_diagnostic_flags( + stan_fit_obj = + fit +) +parameter_diangostics <- fit$summary() +``` From f663ab62d2295470096fa07de6c3a605f805f1f6 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 11:33:01 -0400 Subject: [PATCH 048/103] fix merge conflicts with main from updating package infreastructure" --- .github/workflows/pkgdown.yaml | 5 +++++ .pre-commit-config.yaml | 1 - 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml index 0fe6140b..5add54ff 100644 --- a/.github/workflows/pkgdown.yaml +++ b/.github/workflows/pkgdown.yaml @@ -42,6 +42,11 @@ jobs: extra-packages: any::pkgdown local::. needs: website + - name: "Install cmdstan via cmdstanr" + uses: epinowcast/actions/install-cmdstan@v1 + with: + cmdstan-version: "latest" + - name: Build site run: "pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)" shell: Rscript {0} diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index ffc5480b..a8a77069 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -37,7 +37,6 @@ repos: - id: style-files args: [--style_pkg=styler, --style_fun=tidyverse_style, --cache-root=styler-perm] - #- id: roxygenize - id: use-tidy-description - id: lintr - id: readme-rmd-rendered From 06c8147b553dd757ceb09dfcea6f8fdc339d0f11 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 11:39:25 -0400 Subject: [PATCH 049/103] try running pre-commit locally --- DESCRIPTION | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index f6e5a70b..473c2fc7 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -64,6 +64,13 @@ Suggests: Config/testthat/edition: 3 LazyData: true Imports: - fs + fs, + dplyr, + lubridate, + glue, + RcppTOML, + cli, + tibble, + usethis Remotes: stan-dev/cmdstanr From 64ab128933242859255fa52e051aca885267d38e Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 11:46:31 -0400 Subject: [PATCH 050/103] add dependences to DESCRIPTION --- DESCRIPTION | 7 ++++++- data-raw/covid_pmfs.R | 2 +- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 473c2fc7..4c2fd1ce 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -71,6 +71,11 @@ Imports: RcppTOML, cli, tibble, - usethis + usethis, + tidybayes, + tidyr, + purrr, + withr, + cmdstanr (>= 0.8.0) Remotes: stan-dev/cmdstanr diff --git a/data-raw/covid_pmfs.R b/data-raw/covid_pmfs.R index ce8dd1ba..5f999866 100644 --- a/data-raw/covid_pmfs.R +++ b/data-raw/covid_pmfs.R @@ -8,7 +8,7 @@ generation_interval <- withr::with_seed(42, { wwinference::simulate_double_censored_pmf( max = params$gt_max, meanlog = params$mu_gi, sdlog = params$sigma_gi, fun_dist = rlnorm, n = 5e6 - ) |> cfaforecastrenewalww::drop_first_and_renormalize() + ) |> wwinference::drop_first_and_renormalize() }) inc <- wwinference::make_incubation_period_pmf( From fe88a398606758652b6e2855b6d30aae73331c0e Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 11:54:01 -0400 Subject: [PATCH 051/103] add more deps --- DESCRIPTION | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 4c2fd1ce..feba3b55 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -76,6 +76,8 @@ Imports: tidyr, purrr, withr, - cmdstanr (>= 0.8.0) + cmdstanr (>= 0.8.0), + rlang, + scales Remotes: stan-dev/cmdstanr From b9c4eb8fa6dc0a6d650f24ee396e3975c0b54c74 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 12:05:38 -0400 Subject: [PATCH 052/103] try removing print statement --- R/wwinference.R | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/R/wwinference.R b/R/wwinference.R index bc1b7171..e56e0ced 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -104,7 +104,7 @@ wwinference <- function(ww_data, chains = mcmc_options$n_chains, parallel_chains = mcmc_options$n_chains ) - print(fit) + # print(fit)#nolint return(fit) } @@ -177,7 +177,7 @@ wwinference <- function(ww_data, # Message if a flag doesn't pass. Still return # the same data, but we want user to know the issue if (any(convergence_flag_df[1, ])) { - message("Model flagged for convergence issues, run model diagnostics + warning("Model flagged for convergence issues, run model diagnostics on the output stanfit object for more information") } } From 56037c4499fde7958d405a99eefada5fefef030a Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 12:06:55 -0400 Subject: [PATCH 053/103] revert --- R/wwinference.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/wwinference.R b/R/wwinference.R index e56e0ced..8d961ac1 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -104,7 +104,7 @@ wwinference <- function(ww_data, chains = mcmc_options$n_chains, parallel_chains = mcmc_options$n_chains ) - # print(fit)#nolint + # print(fit) # nolint return(fit) } From 102da37e17a9c51993384d4887ac5f042a360f6b Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 12:12:00 -0400 Subject: [PATCH 054/103] update vignette data --- R/generate_simulated_data.R | 8 ++++---- data/hosp_data.rda | Bin 466 -> 476 bytes data/hosp_data_eval.rda | Bin 558 -> 558 bytes data/ww_data.rda | Bin 1258 -> 1719 bytes vignettes/wwinference.Rmd | 5 +++-- 5 files changed, 7 insertions(+), 6 deletions(-) diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index b3e17230..61ad2219 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -94,11 +94,11 @@ generate_simulated_data <- function(r_in_weeks = # nolint i0_over_n = 5e-4, initial_growth = 1e-4, sd_in_lab_level_multiplier = 0.25, - mean_obs_error_in_ww_lab_site = 0.3, - mean_reporting_freq = 1 / 7, - sd_reporting_freq = 1 / 14, + mean_obs_error_in_ww_lab_site = 0.2, + mean_reporting_freq = 1 / 5, + sd_reporting_freq = 1 / 20, mean_reporting_latency = 7, - sd_reporting_latency = 5, + sd_reporting_latency = 3, mean_log_lod = 3.8, sd_log_lod = 0.2, input_params_path = diff --git a/data/hosp_data.rda b/data/hosp_data.rda index 7f551268f21c639841cb811a0feb855409f27f4f..d44f24cca2cb7f4da429ecd3059b5e9296070dbb 100644 GIT binary patch literal 476 zcmZ>Y%CIzaj8qGb49Be5=chpWQ&4GwFJ9vuI2 zyT?#a@Tj1ms0$EC-JSU=eZu4o{EyX`{EjhsHcZ)-b)u1pg+oBW0g1A2Ii_+>^!?5U zSL0L9PtTS6A9llV9Z&n*)jt=_{Y%CIzaj8qGbls(aKoPqhz|7-t>Yh1$r|NroWK_Fm#`ke(X42%p83>*v&j0YGm z9^84u#A_BC2jl*$DhzBh8UA1O{~D6P!0fo^IWvFd46vE?K^i5?%f~G&Mz(}6EpQ@VWSimGtdz{!NV%b?$uschlF5J%s@V59{wb>i!9Eah|qA!f3beMBRm~ zj-4z{6Qx`upO06jZ_6;qJ6M5uMm31r;dm^CM7Nb{@Wqa**P4Bi> zx!Z2r3JaRLTnI|wJnF$TiRF}M!<1jIZZtBna0nM|a)oRGH)%;|U~^ o8Y%CIzaj8qGbytanrAp>ji|6~8FV?x6J|NsAoLBL^u`rQRj7#tWl7#tW6Fa^iR z>`LerRbO%0t$}^z0{gFl3=9m+e}n&qGB8{Sx&LaB2#^mHaR3qm|8K46I51O!!9;Ba z0|O(kfwP+$$E3-ED|tamE?=se9gxA+(<^22SLM$4Um!6k;L=#sB~mGr>OJGi&C7o^ zsxB_|TN-usP2|?Rr?)TVKR9$b%qk?DO;R#(n&d7gG3A3=$tsy%vvT})9NKXxU|msb z(GTg8eb=VD{a~*hkU)3Maz3dW%D(H)}3MTmCDs&yf{x(NTu!H zYK=!HA_W7#S-0Mm>1dnLF{k6LKREGJ&~+!$DHu_dmB43=D5rUpGjEiN*vAoO@q%cji`OV{2n$i>0Y%CIzaj8qGbY)O-!!N8jR|Jc9kn2_-Q|NsAC5O7$Zes{qW1_uTX1_#CiOu;cS zyNf z!q&o;g)vXLo@yuutvt8lRj}3Nt*g7`toHpWzj1QfJBE8HPJzMKy86PmPMTVlpp=^z z^+2|F(E^tVf4;`X77LcRx>r7ax&LsYbCr@>jM2^&>>KR^1=;H^Zqe;L(OD%R%A~^R z)v#4yk>b6Ou0od^os&PW)?|9=Fo~hGk%L)DX8x-$3JkIhhYFav9*pvhjo>P%i!k2?i3xq6D7pMT@^F*gu! z;Sf-8Xk=o6QeZJRH>cMtLNonhYxmUbPCNg-es%GZy&r2=2VKuHZC_jdTx{=}s4cs# zLuCAgSG@i1)7o{-Xn$r-_ZQQd{?|J9uH2?_ciE$}Ve{fTJXI!no2pC#Ql2W)KQJ&p ltZHCj0J3L2frve96W3a&z#GoP^}lzw7`1{r~^>|NsB*|Nr~E z|G)qL|Nqbgyzm;)ySujcyz|_Y;v$s&Q1p*cqhyDq3{5mWH5z(MOrDXarlup*V^D1- z)B);bG|)8A(?Ifr(1u1vL8AgaK=mFd%|`T*uuw1{^*uqhsCpv=4?zh{QzL0MnwXgk znWQk888)Us^)Q%2M%2lnlSUB8%@LuY>KURLsf`mu(2vys^qDdmVKXE$X_HMdZ6hWT zq57i;8hR1xXhG@#0MGyc0B8n)00000001-tA)wFz00000000000000D01YyvC;&78 zpaGBpp`+CG8UPJ6008v>00xGd0ib9!27mwnGyo4!001-^0004?NhDBAQ_~dl)lX9z zLE$|}WNjnV^bzR>jXguuKUDP^Z8U+1Hm9a3Tj&smKB%F^MSzi7AOlu>`V3ib*1w6jCIFf&`>eNt59E7MSHV zQ`JJEDp1vGygg(+4_{x{Fb#l&mL()2gh4d?;2?SV)FckW1CWwjqX_~h-*Av}s}jk> z5T9}@<52xiM`;N(Bp9)F^56aF4iws~Sf= zCb;GWE+oP%Ss+H*$=;g@09o5wK?+e4A`ppIMZy7r1-mWeT|ATGQ-cK>5Yd?O*oMdE zAy~Nf8|7?p2_A-R0QCzAN;Ta4q`UmK1ES&Yo~LGwk=Ev1d@X1&A{?1=CEcp zd0Ad#BWHKGox&hDUQ{|D1Q}ZBe8cR~*KmIoo!5w4$CYaom@fA&oROdKojk?07QRXQhQVSdK|vgs-9eG& zAh2O@LO!sd%W!Q;seI)0hz}5)m&d+G>JhM;Yr4QL>J zuJ(G!d;O0ux03&KxyxEw+s4eD71meA$Ev=Dfj6-T^Vc5vG}E34M!#YUC{lrX)Gjyd zN;>O9RFz*a-)#hHPO7|Wn07j-rJVHFSbYtPiZ$Ekb+gC!_%dG%8b5a%mdc$zvaR@l zgh)XG0RWaIi6j6^5&;ANOA-Mr31ABXz!pi6fjf7k!$s|KXhuPVEAG=s@ zzt{T6c<9_DOQ;(c`=at#bHPH1@wz+_FjHSYAJajQ>pXQuswnA>9CdsGfx^twaS1Zd99Rer z&iWlT8Y-V;iQy0nL~STHE?0A~o#QIK9Jn0>y%%Qb_yht%pOO4Am&Npn27_P*m=A`i zH*X}(v&#(N6TBY~k&KgEc%p;?4x<#&tc$fdx@tj>Vl*TH4NHFfYfO=7AP^xxLPjWw zJW|0C2y28pGB1|WFZ$pEU5|?Wzjr?(>|^C+J+7sseY2P8SjTijjFeJlwT?}{jfivj zUdW-vBjJTiJ8^tBJ;j7nhXKi?Hwz}|q>E@6*6(#F2aMYka~x{#5<=Cn8(L}C`RSN@ z7vy@!QrcHMOiw!bAmS8JvJVi734TZxh(>*qrO}>%H1K_6G0nVf%~~!J`P8O?LY2ah z_2Ry`>z+m*2~(@t&D|M~)1P;N5Cd$LvF+$m3bN%xRVrX;b7r&QwMb1WK$8OT{MFm5 z&2WW>%cK+z@AzC3S>wZdw&HaZSI3j!bm%)dfJSCZBsVJc~e$hKk!5V#WFCrd@v9eM+RO( zU)h(I&nr<|Z5s9w|0AW$pq4u=kZ|~*gUtSUscwIRSs#6IlcwKptp9FDX>O2ibG&3E zpt06Lp}9yhnA@9M*zsl15i;CcZJZu5K?p+BR9nZ=-|W8*9zW#$aJkZ*vO7px6oH?> N7ji{7P>>T1%?0yn5*z>k literal 1258 zcmVnfCRvqp&1P{#Kg&f0$|W45r_;!Mwo$+ z+G#RjG{ni84GD~cNrszL!eKL0Mnll5jWTEe2ATk9001-^00001pa1{^LqGsB0MGyc z0002c004S`07OLwng9TsQ%0L01Z^V$Gf9NN005e4fJ^`Y0%BrdCMEy?005aZzzATC zG|7Mnr|C+brXi80BTS4VBLXmuG+<3KX|y27!Zb9%2r?Qp10klGGHL1zCMJvo#4rM5 zQznd<1R6ABQ_%uRDiM*9k&&jFG}Az6&@{+028M$}Kr{h}8Zjx`b&^1z z*;nP9A!RwvK!gvIGZrVemk>=b)Xan0ePB<*OoSldxeh>ugW);pkwk%zDJo-xpGFG; z0hFBch{^X7UVsyfLqyOzB?4Gf1QIY1v>ByKn|shmWKuw_E^Y{88Zi&%jS?z5la2z8 z949N<#fk_4+v1i2B6oJ9CXM|5U0_avnBJ1Q8DqF^E4R36TLM zsRFJ7VP%*mn3J@-&AWEhrdWnRM>#{Z8!f{DG=YvAd#Bt`Fi$e--+1(ye4KDdG3$!d z$^vFU5jRpYc5-rVHcw1x=H}m|L74!+GbEABf0Y2Sq>n4U;G6?XmRo(vmN$j?yLj&s zim94@bQUBrQYMokKTgd((N4|IRz)z4T{<%CvvblP{gf#%A?m!9+fThrisVE+M$fps zl)wNW00cl006+jh0DuI{12P$bm|-NHN7jQtQ&G7nw6`o9-nMdNADuMsF_W(plmS6^X1M%}I*?UTZOFoa>#vv(2 zNbNe*7z}N;1css!oTtOb%0@E2Fu* z=dAD&v7?m!AOHdUAUJRrIfnhjI*5kKLt2nP0L?41#7qfR`58h01`RWn=7}o-5Pbe|6gngWsmaboC@Bv)q2NG? zEy5(S6N}JYWiy&yIzSYDWf6jFJhueMm$_(L+w4IIhI~4r59<;b9i{>X5=9z;7DK-T z&>TXDmtG4Wb&?Iz02>1UJ^USU`$GZ(mqK+6rt`}!Q;37oRAv9NyBbPV7|+$~sDP^N zkQn(K`qTZAZP$9w^zK@ih=`wieaYJ9$!6Ow`3_9~ZxcQ?005Y3pM8bQO@Zi>P`s)_ zKv8C6^PQ`uOZ}0o$RTkONWj1I=+K~XQb;q?v@%cpLuty_#egYLpe;c0++HM}IJ$I~ U)N6zUr2X-CBvXY61<+9%05%R8G5`Po diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index 6e72fd9d..b0b0e194 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -314,6 +314,7 @@ outputs against data. This can be useful to get a sense of if you're model is fitting the data well, and if the nowcasted/forecast quantities look reasonable. ```{r} +draws_df <- fit$draws_df sampled_draws <- sample(1:max(draws_df$draw), 100) # Hospital admissions: fits, nowcasts, forecasts @@ -437,7 +438,7 @@ we recommend the (bayesplot tutorial)[https://mc-stan.org/bayesplot/articles/vis ```{r} convergence_flag_df <- wwinference::get_model_diagnostic_flags( stan_fit_obj = - fit + fit$raw_fit_obj ) -parameter_diangostics <- fit$summary() +parameter_diangostics <- fit$raw_fit_object$summary() ``` From b2d064e5aa1d6ef5c91a8c5ace15e6dcb55f5791 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 13:56:02 -0400 Subject: [PATCH 055/103] fix typos --- R/wwinference.R | 2 +- vignettes/wwinference.Rmd | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/R/wwinference.R b/R/wwinference.R index 8d961ac1..55629062 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -55,7 +55,7 @@ #' pipeline to ensure model convergence. #' @export #' -#' @examplesIf interactive() +#' @examples #' # provide all the code and commenting in the getting started vignette #' wwinference <- function(ww_data, diff --git a/vignettes/wwinference.Rmd b/vignettes/wwinference.Rmd index b0b0e194..e7cf28e0 100644 --- a/vignettes/wwinference.Rmd +++ b/vignettes/wwinference.Rmd @@ -169,7 +169,7 @@ ggplot(ww_data_preprocessed) + show.legend = FALSE ) + geom_hline(aes(yintercept = lod), linetype = "dashed") + - facet_wrap(~lab_site_name) + + facet_wrap(~lab_site_name, scales = "free") + xlab("") + ylab("Genome copies/mL") + ggtitle("Lab-site level wastewater concentration") + @@ -440,5 +440,5 @@ convergence_flag_df <- wwinference::get_model_diagnostic_flags( stan_fit_obj = fit$raw_fit_obj ) -parameter_diangostics <- fit$raw_fit_object$summary() +parameter_diangostics <- fit$raw_fit_obj$summary() ``` From 9b3777f694cd61046ed9d385a275c9ed768e8d86 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 14:17:30 -0400 Subject: [PATCH 056/103] remove example --- R/wwinference.R | 3 --- 1 file changed, 3 deletions(-) diff --git a/R/wwinference.R b/R/wwinference.R index 55629062..f403ee32 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -55,9 +55,6 @@ #' pipeline to ensure model convergence. #' @export #' -#' @examples -#' # provide all the code and commenting in the getting started vignette -#' wwinference <- function(ww_data, count_data, model_spec = wwinference::get_model_spec( From e7702352010e3103145b115c25a35c6fb820d779 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 5 Jul 2024 14:30:17 -0400 Subject: [PATCH 057/103] try adding bookdown to imports --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index feba3b55..a685cc58 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -78,6 +78,7 @@ Imports: withr, cmdstanr (>= 0.8.0), rlang, - scales + scales, + bookdown Remotes: stan-dev/cmdstanr From 81ef6d828b24e2fc531c19529e165a0135097f42 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 08:36:13 -0400 Subject: [PATCH 058/103] fix to specify ISO8601 format for dates --- R/checkers.R | 2 +- R/figures.R | 8 ++++---- R/generate_simulated_data.R | 4 ++-- R/get_stan_data.R | 4 ++-- R/wwinference.R | 2 +- man/check_date.Rd | 2 +- man/generate_simulated_data.Rd | 12 ++++++------ man/get_model_spec.Rd | 2 +- man/get_plot_forecasted_counts.Rd | 2 +- man/get_plot_global_rt.Rd | 2 +- man/get_plot_subpop_rt.Rd | 2 +- man/get_plot_ww_conc.Rd | 2 +- man/get_stan_data.Rd | 4 ++-- man/wwinference.Rd | 5 ----- 14 files changed, 24 insertions(+), 29 deletions(-) diff --git a/R/checkers.R b/R/checkers.R index b0e419dd..29e0bf2b 100644 --- a/R/checkers.R +++ b/R/checkers.R @@ -1,7 +1,7 @@ #' Check that all dates in dataframe passed in are before a specified date #' #' @param df dataframe with `date` column -#' @param max_date string indicating the maximum date in ISO8 convention +#' @param max_date string indicating the maximum date in ISO8601 convention #' e.g. YYYY-MM-DD #' @param call Calling environment to be passed to the type checker #' diff --git a/R/figures.R b/R/figures.R index 6309c7ab..176ceff0 100644 --- a/R/figures.R +++ b/R/figures.R @@ -5,7 +5,7 @@ #' @param count_data_eval A dataframe containing the count data we will #' evaluate the forecasts against. #' @param forecast_date A string indicating the date we made the forecast, for -#' plotting, in ISO8 format YYYY-MM-DD +#' plotting, in ISO8601 format YYYY-MM-DD #' @param count_type A string indicating what data the counts refer to, #' default is `hospital admissions` #' @@ -63,7 +63,7 @@ get_plot_forecasted_counts <- function(draws, #' @param draws A dataframe containing the posterior draws with the data joined #' to it. This is `draws_df` output of the call to `wwinference()` #' @param forecast_date A string indicating the date we made the forecast, for -#' plotting, in ISO8 format YYYY-MM-DD +#' plotting, in ISO8601 format YYYY-MM-DD #' #' @return a ggplot object containing faceted plots of the wastewaster #' concentrations in each site and lab combination @@ -122,7 +122,7 @@ get_plot_ww_conc <- function(draws, #' @param draws A dataframe containing the posterior draws with the data joined #' to it. This is `draws_df` output of the call to `wwinference()` #' @param forecast_date A string indicating the date we made the forecast, for -#' plotting, in ISO8 format YYYY-MM-DD +#' plotting, in ISO8601 format YYYY-MM-DD #' #' @return A ggplot object containing the posterior draws of the global R(t) #' estimate @@ -170,7 +170,7 @@ get_plot_global_rt <- function(draws, #' @param draws A dataframe containing the posterior draws with the data joined #' to it. This is `draws_df` output of the call to `wwinference()` #' @param forecast_date A string indicating the date we made the forecast, for -#' plotting, in ISO8 format YYYY-MM-DD +#' plotting, in ISO8601 format YYYY-MM-DD #' #' @return A ggplot object containing faceted plots of the R(t) estimate in each #' subpopulation (so wastewater sites + those not on wastewater) diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 61ad2219..15e84175 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -22,8 +22,8 @@ #' hospital admissions date and forecast date in days #' @param forecast_horizon integer indicating the duration of the forecast in #' days e.g. 28 days -#' @param sim_start_date character string formatted as "YYYY-MM-DD" indicating -#' the start date of the simulation, used to get a weekday vector +#' @param sim_start_date character string in ISO8601 format YYYY-MM-DD +#' indicating the start date of the simulation, used to get a weekday vector #' @param hosp_wday_effect a vector that is a simplex of length 7 describing #' how the hospital admissions are spread out over a week, starting at #' Monday = 1 diff --git a/R/get_stan_data.R b/R/get_stan_data.R index 22a49eaa..4a6ee965 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -5,8 +5,8 @@ #' @param input_ww_data a dataframe with the input wastewater data with no gaps, #' must have the following columns: date, site, lab, genome_copies_per_ml, #' site_pop, below_lod, and if removing outliers, flag_as_ww_outlier -#' @param forecast_date string indicating the forecast date in ISO8 convention -#' e.g. YYYY-MM-DD +#' @param forecast_date string indicating the forecast date in ISO8601 +#' convention e.g. YYYY-MM-DD #' @param forecast_horizon integer indicating the number of days to make a #' forecast for #' @param calibration_time integer indicating the max duration in days that diff --git a/R/wwinference.R b/R/wwinference.R index f403ee32..98e0daba 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -242,7 +242,7 @@ get_mcmc_options <- function( #' and data on wastewater viral concentrations #' #' -#' @param forecast_date a character string in ISO8 format (YYYY-MM-DD) +#' @param forecast_date a character string in ISO8601 format (YYYY-MM-DD) #' indicating the date that the forecast is to be made. Default is #' @param calibration_time integer indicating the number of days to calibrate #' the model for, default is `90` diff --git a/man/check_date.Rd b/man/check_date.Rd index f0c8726d..ddea07ac 100644 --- a/man/check_date.Rd +++ b/man/check_date.Rd @@ -9,7 +9,7 @@ check_date(df, max_date, call = rlang::caller_env()) \arguments{ \item{df}{dataframe with \code{date} column} -\item{max_date}{string indicating the maximum date in ISO8 convention +\item{max_date}{string indicating the maximum date in ISO8601 convention e.g. YYYY-MM-DD} \item{call}{Calling environment to be passed to the type checker} diff --git a/man/generate_simulated_data.Rd b/man/generate_simulated_data.Rd index c1bfde5f..b5638514 100644 --- a/man/generate_simulated_data.Rd +++ b/man/generate_simulated_data.Rd @@ -19,11 +19,11 @@ generate_simulated_data( i0_over_n = 5e-04, initial_growth = 1e-04, sd_in_lab_level_multiplier = 0.25, - mean_obs_error_in_ww_lab_site = 0.3, - mean_reporting_freq = 1/7, - sd_reporting_freq = 1/14, + mean_obs_error_in_ww_lab_site = 0.2, + mean_reporting_freq = 1/5, + sd_reporting_freq = 1/20, mean_reporting_latency = 7, - sd_reporting_latency = 5, + sd_reporting_latency = 3, mean_log_lod = 3.8, sd_log_lod = 0.2, input_params_path = fs::path_package("extdata", "example_params.toml", package = @@ -58,8 +58,8 @@ hospital admissions date and forecast date in days} \item{forecast_horizon}{integer indicating the duration of the forecast in days e.g. 28 days} -\item{sim_start_date}{character string formatted as "YYYY-MM-DD" indicating -the start date of the simulation, used to get a weekday vector} +\item{sim_start_date}{character string in ISO8601 format YYYY-MM-DD +indicating the start date of the simulation, used to get a weekday vector} \item{hosp_wday_effect}{a vector that is a simplex of length 7 describing how the hospital admissions are spread out over a week, starting at diff --git a/man/get_model_spec.Rd b/man/get_model_spec.Rd index 9c3a916f..aa53f4e3 100644 --- a/man/get_model_spec.Rd +++ b/man/get_model_spec.Rd @@ -16,7 +16,7 @@ get_model_spec( ) } \arguments{ -\item{forecast_date}{a character string in ISO8 format (YYYY-MM-DD) +\item{forecast_date}{a character string in ISO8601 format (YYYY-MM-DD) indicating the date that the forecast is to be made. Default is} \item{calibration_time}{integer indicating the number of days to calibrate diff --git a/man/get_plot_forecasted_counts.Rd b/man/get_plot_forecasted_counts.Rd index fec154c6..b201e9ba 100644 --- a/man/get_plot_forecasted_counts.Rd +++ b/man/get_plot_forecasted_counts.Rd @@ -19,7 +19,7 @@ to it. This is \code{draws_df} output of the call to \code{wwinference()}} evaluate the forecasts against.} \item{forecast_date}{A string indicating the date we made the forecast, for -plotting, in ISO8 format YYYY-MM-DD} +plotting, in ISO8601 format YYYY-MM-DD} \item{count_type}{A string indicating what data the counts refer to, default is \verb{hospital admissions}} diff --git a/man/get_plot_global_rt.Rd b/man/get_plot_global_rt.Rd index ed0347f2..1b01cc92 100644 --- a/man/get_plot_global_rt.Rd +++ b/man/get_plot_global_rt.Rd @@ -11,7 +11,7 @@ get_plot_global_rt(draws, forecast_date) to it. This is \code{draws_df} output of the call to \code{wwinference()}} \item{forecast_date}{A string indicating the date we made the forecast, for -plotting, in ISO8 format YYYY-MM-DD} +plotting, in ISO8601 format YYYY-MM-DD} } \value{ A ggplot object containing the posterior draws of the global R(t) diff --git a/man/get_plot_subpop_rt.Rd b/man/get_plot_subpop_rt.Rd index 9fa9ef28..31f091bc 100644 --- a/man/get_plot_subpop_rt.Rd +++ b/man/get_plot_subpop_rt.Rd @@ -11,7 +11,7 @@ get_plot_subpop_rt(draws, forecast_date) to it. This is \code{draws_df} output of the call to \code{wwinference()}} \item{forecast_date}{A string indicating the date we made the forecast, for -plotting, in ISO8 format YYYY-MM-DD} +plotting, in ISO8601 format YYYY-MM-DD} } \value{ A ggplot object containing faceted plots of the R(t) estimate in each diff --git a/man/get_plot_ww_conc.Rd b/man/get_plot_ww_conc.Rd index 5c782df0..a8cd1567 100644 --- a/man/get_plot_ww_conc.Rd +++ b/man/get_plot_ww_conc.Rd @@ -11,7 +11,7 @@ get_plot_ww_conc(draws, forecast_date) to it. This is \code{draws_df} output of the call to \code{wwinference()}} \item{forecast_date}{A string indicating the date we made the forecast, for -plotting, in ISO8 format YYYY-MM-DD} +plotting, in ISO8601 format YYYY-MM-DD} } \value{ a ggplot object containing faceted plots of the wastewaster diff --git a/man/get_stan_data.Rd b/man/get_stan_data.Rd index 428199a0..16736a64 100644 --- a/man/get_stan_data.Rd +++ b/man/get_stan_data.Rd @@ -25,8 +25,8 @@ the following columns: date, count, total_pop} must have the following columns: date, site, lab, genome_copies_per_ml, site_pop, below_lod, and if removing outliers, flag_as_ww_outlier} -\item{forecast_date}{string indicating the forecast date in ISO8 convention -e.g. YYYY-MM-DD} +\item{forecast_date}{string indicating the forecast date in ISO8601 +convention e.g. YYYY-MM-DD} \item{forecast_horizon}{integer indicating the number of days to make a forecast for} diff --git a/man/wwinference.Rd b/man/wwinference.Rd index 39a46fb0..6f7bbdca 100644 --- a/man/wwinference.Rd +++ b/man/wwinference.Rd @@ -72,8 +72,3 @@ getting started vignette for an example model specifications fitting COVID-19 hospital admissions from a hypothetical state and wasteawter concentration data from multiple sites within that state. } -\examples{ -\dontshow{if (interactive()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} -# provide all the code and commenting in the getting started vignette -\dontshow{\}) # examplesIf} -} From 4cccf18ac01d1db98d972ed0784c982ea90b26c5 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 08:48:57 -0400 Subject: [PATCH 059/103] add ar1 testing functions --- tests/testthat/helper.R | 109 ++++++++++++++++++++++++++++++++++++++ tests/testthat/setup.R | 22 ++++++++ tests/testthat/test_ar1.R | 40 ++++++++++++++ 3 files changed, 171 insertions(+) create mode 100644 tests/testthat/helper.R create mode 100644 tests/testthat/setup.R create mode 100644 tests/testthat/test_ar1.R diff --git a/tests/testthat/helper.R b/tests/testthat/helper.R new file mode 100644 index 00000000..28a31b09 --- /dev/null +++ b/tests/testthat/helper.R @@ -0,0 +1,109 @@ +logit_fn <- function(p) { + stats::qlogis(p) +} + +inv_logit_fn <- function(x) { + stats::plogis(x) +} + + +#' Reference R implementation of a zero-mean AR(1) +#' assembly from a set of z scores, +#' a standard deviation, and +#' an autocorrelation coefficient +#' +#' @param z a vector of z scores +#' @param sd the standard deviation of the AR(1) process +#' error distribution, as a scalar +#' @param ac the autocorrelation coefficient of the AR(1) +#' process, as a scalar +#' @param stationary Whether to initialize the process +#' at stationarity. Boolean, default `FALSE`. +#' @return Vector of values of the zero-mean AR(1) process +ar1_from_z_scores <- function(z, sd, ar, stationary = FALSE) { + x <- rep(NA, length(z)) + + x[1] <- z[1] * sd + if (stationary) { + stat_sd <- sd / sqrt(1 - ar * ar) + x[1] <- z[1] * stat_sd + } + + for (i in 2:length(z)) { + x[i] <- ar * x[i - 1] + z[i] * sd + } + + return(x) +} + +#' Reference R implementation of a first +#' differenced zero-mean AR(1) assembly +#' from an initial +#' value, a set of z scores, +#' a standard deviation, and +#' an autocorrelation coefficient +#' +#' @param x0 the initial value of the first difference +#' ar1 process +#' @param ar the autocorrelation coefficient of the +#' underlying zero-mean AR(1) process, as a scalar +#' @param sd the standard deviation, of the underlying +#' AR(1) process, as a scalar +#' @param z a vector of z scores +#' @param stationary Whether to initialize the underlying +#' AR(1) on the first differences at stationarity. +#' Boolean, default `FALSE`. +#' @return Vector of values of the first differenced +#' zero-mean AR(1) process +diff_ar1_from_z_scores <- function(x0, ar, sd, z, stationary = FALSE) { + n <- length(z) + 1 + diffs <- rep(NA, n) + diffs[1] <- x0 + diffs[2:n] <- ar1_from_z_scores( + z, sd, ar, + stationary = stationary + ) + + return(cumsum(diffs)) +} + + + +#' Alternative R implementation of a first +#' differenced zero-mean AR(1) assembly +#' from an initial +#' value, a set of z scores, +#' a standard deviation, and +#' an autocorrelation coefficient +#' +#' @param x0 the initial value of the first difference +#' ar1 process +#' @param ar the autocorrelation coefficient of the +#' underlying zero-mean AR(1) process, as a scalar +#' @param sd the standard deviation, of the underlying +#' AR(1) process, as a scalar +#' @param z a vector of z scores +#' @param stationary Whether to initialize the underlying +#' AR(1) on the first differences at stationarity. +#' Boolean, default `FALSE`. +#' @return Vector of values of the first differenced +#' zero-mean AR(1) process +diff_ar1_from_z_scores_alt <- function(x0, ar, sd, z, stationary = FALSE) { + n <- length(z) + 1 + x <- rep(NA, n) + x[1] <- x0 + + first_sd <- sd + + if (stationary) { + first_sd <- first_sd / sqrt(1 - ar^2) + } + + x[2] <- x[1] + first_sd * z[1] + + for (i in 3:n) { + x[i] <- x[i - 1] + ar * (x[i - 1] - x[i - 2]) + sd * z[i - 1] + } + + return(x) +} diff --git a/tests/testthat/setup.R b/tests/testthat/setup.R new file mode 100644 index 00000000..dc810cf9 --- /dev/null +++ b/tests/testthat/setup.R @@ -0,0 +1,22 @@ +testthat_stan_include <- function() { + system.file( + "stan", + package = "wwinference" + ) +} + +model_file_path_id <- system.file( + "stan", "wwinference.stan", + package = "wwinference" +) + +cat("\nsetup.R is compiling the stan model in preparation for testing.\n") + +# precompiled site-level infection dynamics model +compiled_site_inf_model <- cmdstanr::cmdstan_model( + model_file_path_id, + force_recompile = TRUE, + compile_standalone = TRUE, + include = testthat_stan_include(), + dir = tempdir() +) diff --git a/tests/testthat/test_ar1.R b/tests/testthat/test_ar1.R new file mode 100644 index 00000000..49c7ddce --- /dev/null +++ b/tests/testthat/test_ar1.R @@ -0,0 +1,40 @@ +test_that("Test AR(1) function in stan.", { + model <- compiled_site_inf_model + + withr::with_seed(42, { + z <- rnorm(10) + + stan_ar <- model$functions$ar1( + mu = rep(0, length(z)), + ac = 0.73, + sd = 1.26, + z = z, + is_stat = TRUE + ) + r_ar <- ar1_from_z_scores(z, 1.26, 0.73, TRUE) + + testthat::expect_equal( + stan_ar, + r_ar + ) + + stan_ar_nonstat <- model$functions$ar1( + mu = rep(0, length(z)), + ac = 0.73, + sd = 1.26, + z = z, + is_stat = FALSE + ) + + testthat::expect_true( + all(abs(stan_ar - stan_ar_nonstat) > testthat::testthat_tolerance()) + ) + + r_ar_nonstat <- ar1_from_z_scores(z, 1.26, 0.73, FALSE) + + testthat::expect_equal( + stan_ar_nonstat, + r_ar_nonstat + ) + }) +}) From 74f081867d2acdb3497b7ad05cb261636d71db36 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Mon, 8 Jul 2024 08:52:21 -0400 Subject: [PATCH 060/103] Create CI for package tests --- .github/workflows/r-cmd-check | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 .github/workflows/r-cmd-check diff --git a/.github/workflows/r-cmd-check b/.github/workflows/r-cmd-check new file mode 100644 index 00000000..2bc50679 --- /dev/null +++ b/.github/workflows/r-cmd-check @@ -0,0 +1,28 @@ +name: R CMD check project packages + +on: + pull_request: + push: + branches: [main, prod] + +jobs: + check-package: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: actions/checkout@v4 + - uses: r-lib/actions/setup-r@v2 + with: + r-version: "release" + use-public-rspm: true + extra-repositories: "https://mc-stan.org/r-packages/" + - name: "Set up dependencies for wweval" + uses: r-lib/actions/setup-r-dependencies@v2 + with: + needs: check + - name: "Install cmdstan via cmdstanr" + uses: epinowcast/actions/install-cmdstan@v1 + with: + cmdstan-version: "latest" + - name: "Check wwinference package" + uses: r-lib/actions/check-r-package@v2 From 90dcdd27e5abacd03ed1ed2abe8c98c692b38227 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Mon, 8 Jul 2024 08:54:59 -0400 Subject: [PATCH 061/103] Rename r-cmd-check to r-cmd-check.yaml --- .github/workflows/{r-cmd-check => r-cmd-check.yaml} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename .github/workflows/{r-cmd-check => r-cmd-check.yaml} (100%) diff --git a/.github/workflows/r-cmd-check b/.github/workflows/r-cmd-check.yaml similarity index 100% rename from .github/workflows/r-cmd-check rename to .github/workflows/r-cmd-check.yaml From 098cb22686ec52490e13aec6652a395c37fd808b Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 09:05:30 -0400 Subject: [PATCH 062/103] add rcmdcheck to package imports --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index a685cc58..b6d97be5 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -79,6 +79,7 @@ Imports: cmdstanr (>= 0.8.0), rlang, scales, - bookdown + bookdown, + rcmdcheck Remotes: stan-dev/cmdstanr From cc05cd6363d8c0b5018a85542669356fd0a94d2e Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 09:12:47 -0400 Subject: [PATCH 063/103] add ggplot2 dependency --- DESCRIPTION | 1 + 1 file changed, 1 insertion(+) diff --git a/DESCRIPTION b/DESCRIPTION index b6d97be5..d25633cd 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -80,6 +80,7 @@ Imports: rlang, scales, bookdown, + ggplot2, rcmdcheck Remotes: stan-dev/cmdstanr From f29071565ef7fbdd139dccb0ff702ccfa2a5b5c5 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 09:16:20 -0400 Subject: [PATCH 064/103] add way to generate testing data from model run and asve testing data --- DESCRIPTION | 3 +- R/sysdata.rda | Bin 0 -> 37948 bytes R/wwinference-package.R | 2 + data-raw/test_data.R | 89 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 93 insertions(+), 1 deletion(-) create mode 100644 R/sysdata.rda create mode 100644 data-raw/test_data.R diff --git a/DESCRIPTION b/DESCRIPTION index d25633cd..21611fdd 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -81,6 +81,7 @@ Imports: scales, bookdown, ggplot2, - rcmdcheck + rcmdcheck, + posterior Remotes: stan-dev/cmdstanr diff --git a/R/sysdata.rda b/R/sysdata.rda new file mode 100644 index 0000000000000000000000000000000000000000..ef0eb9fbe85e8fb3ede8051a85ece914c25377a8 GIT binary patch literal 37948 zcmcG#1#}(Dk|io;28)@=VrHSa@h<1=RuztLapMmZ+ z8X+4{{T!n=@ERK-vjXD81;23ClCgDxXKRKFLV-Xft5cJ~B$g@=QGw+>K>9)E`UwF; zfe|7B0gvncp89VQecqr16A(&`W{2diZ4`uGtw@(rF}4HxoIW`(0-o83vWMB^_n~X( zJl0KtRFGIw-U&~tT&N(ZK%Ar)T0;FpNIp@yG}K=L7)erq7zvFC9VRCftRj~Xn6^TM z_=h}898jdF9}qc^fO!stA4_o>E5ihE&V-Q}5IhiO8jyi`SWp)u3q-KA-vl;S*B4%{ot`d$^GD^rBQ)c(D^4o zB|U|#1k5I+&EVnEON^F^te=$&RYap$%*=z#CV=y0KxY<9%|K^9W&;8h00PeWsDX?8 zusD;i%^y6S0 za>9`MQ-cl28!M1XI${g)3^a@7LL&-g;QOk3Zo5@x(PJ=em!weGw)>V1j2<;@Wy@W0 zT|Qjw-!}AL$v(gMl3pxQNtG>UP)OOGT{O(6e0~Cm(_eG~8RUgQg-eNA2?0z}OEnVo zYN=IM&9Rmxt}Fo~eV; z$`v}2RH4E~FSMkQiXVoi(WongIb-kbf!NS&j!pT3t&N`agNo{_qJ)IT>P+=8t)aS+ zwZE0(?za|sDojPi6?3{$U2T}vIvQo={-c?~srR0GG=j-=+qBGUw;~qcI?^c<;(W+|P!v|Zd|6K0V|AvO+&QnN< z4)$f}^x?JWUgH~WJxE2Q3PIvIuWX$j)J!1~sunRdo#QtkVCEBWNRPEm0S!hpM_X4B za6Xeq7~*f3wt$^!AE(c6XI-6`=AGZW@HRIBcX@EK_-sIsL!r9~IKD4luP!jhj5+7K zyQu|JOt8U+gwSMg;DM3!@QB;XMu><-deGMUZ)fqOv-|o)4Df^va!aqeG20YVKwbF{ zH4971D(_|&;5LmoGGxq9utQx#;;@&G>Y(amb7t&vt{0Cx^1KU9+nkE@_ywtM3xwIR74~01R^BS1E@s`2}LNRmSq+SV6^(keV<7I;~Jy1b98iFZok* zbPDRYMh(O-SZ89)1Mqaqav5_21PiFbMTY=3^GNO4WB?lMk6v|WdAh6QLpsGSwB}IF ziq~}|b2Pdh+5jq7TfFFsaDU>M!SIq`a69Uhbc0Z>04jeVWhr|Noyjja2b2KCw{SmI zkZ&t6f;eR^rwhc%ich(kOGMIzAStdX$xubq5ZJ3|SYU1k!H{BtK$^ttw0VA0K*WZ` z#QHzY7A0ZD!Tj=VDB%ey6+ymH%70-jG1_p8RrYKwZRs%*riF@x@}mX@1NI6nEiv5+ zFL{au7PuMF%HL89FQJ$Sr8fKo6LTFgECvE)qi~}~cqfnX1ynQCHOP6ifz&69|54NF zzDW0A&^v6_`rbq${JKY}q+3CChu$C+Hwpt&1VPg$=Rl7`Z{%GvwJiV*c)Ka$&9!Ld z23PpvjRDsY#}9Sov{#FkQ_*oglu;E!xZ^1Rf1|(;4VNU;9|Vpg8>nj`JmsmQ505JG zu6^i+H$VTFUanYS18^c1+>@v%(h;m->y@Rb?jGNeRwu2UsGwcaB(50IZRkWaW5B4T zBl+4Cw}v=iFx5usqBHJMIJ-=Vlsee=vhuHqa z-;gXTk8DW@lD?fGBAodx?g-AvRwe2FV71I6v_76`IX+SbV(FtjJJcvia;pia`HM;# z0e<4r6y)GqbWS=WGLHNJu5&B0Owm@_q2-%>lj%(g25uaoRgcFa!+U>j!Hvlk@bS)p0`CXb=) zDuK*t^IR8*|R2oze%L1fF5<8_6& zkE~Rmy2Vn`W@t+Y6X(8V$XKrT#Y_{de!f02Q1Va#E}_%*AhCkeCTA#BuIhw2=Fj0^ z0@l7HWrC7PEj@hQJ?4`YDQD_u=dMvd4s@$y{#9XxunYz)@^fc2OTqA=HA4-y+DS0Y zEvh#=6^$18Jv5Ns+n4L$Gb|&?<3uDd5hy?0u*HcT-ZiSJ%p;G$T;Peqlw#K@8L;$`HuBB2I5G{p6&RL^k+Rxc; z;U%fE!(_*Ng_r1u;o1eaL%%GPpy+F$Z$w-$PbV)EClb$9Ij?vREY9ogd7l}~`K($> z8FT4gKDH1r4q^~Z-rWeo#EMz$`G}D{J8H6Z;8mpuj4xdH@4F1Yo)0}SE+W!(>tPzY z(fpD(1cmPk_3y{EFm6L>GB@Rz@T=tZi5MNZ2McpY#gGvgrtaf#h>s;~U~wpB%II=5 ziw&3zg}r36BZNYNhmeLs1~d=T364JgDSKo;6+_=-RsCt{T|{WsW+i*m=bUPQQI-Eg`(f28sQVU|FpqD2=4I*Op!g=6&Z_#m3jpM(*jt9~l(7 zf6B`QP}4LXIu>gYs9`PU3fn=rmssNCU5%2}Y^pIDJc$S;arD!6w52{59tErD0YN7s ztxs>dXHLjJ?XxH;;p`!*UWH-hX0|sbt_hKCyCVko_V&IZe=-}ll=d%)Sg`{JZSu(s)X(w2bz@Yo;5nKXKyhrYO`^V&5=pBaep&gkhp89?6Kf zv?GK3RKu6pk_na^^0I8gAsKgfJVb(5JLuj>F~8@z;=`0)!rN2BTJqeR776OZ?fo5p znRAH}VU!+|1j+_=ct3s9psveEM>D!71rtEv5f;@DrvMi=nmz0*X>x}XV;g-FS@-artCT1j2vHU1(zmAw&JcS}N;vI5JpIfWz;Fc=@bD-{_0V9#8s0~{ww&F-8J~cTB7Bn8 zF0jCYJ4b~mhV{LK@Ux+?04)8g+V|pb%56ud2M(Hdu^S$J6*v@5ek<~N^368Wm0z4V zL*O8AkZ@`u3zWjq&pq`bI?hyc8i9vhMsA7;#xHqaboD)Ds}X-WDDLvoiemKESz^@4 zapQTx9r0WG<*@DB0*fQB80nE4Y9cw|IPp`<&y1y+aZA7~d2?xqOV)ksx3RK(5zY z!>8u*ip!8is>Slp0IgRdwl-|&?FJMyf)hG0Lx^zKyYKLI++<0M=Z1YJ0FJk}wroB? z$%A#@3qp2jVqTIQMD7uyXZHhgmKK0YkM1OIW`<>m9Z9wg5^vIn{>7PkmeAq>Wjp8^ zwp`QO;g~GBcH7`GBW?K5CQ?&g>oj{D{^(i@>oC7;=Bpmn^vNLR49sJ4$!G-fbB5oB zi7{Aw%uZ1T_RR2PLlIwKTq+_Va;hK055`V^BzafqX+b-)&j*O?R9M|(aL7~SEp5|J z^%ek}De>Gwd4F5)B`-^cx>ZNzB@ggI#%Cz>${Lx2EICDOaQ6q@t;4W47HK1uH4YMn zA{c2g47-JCZMfkZs&Dz0BRLGi@jQ$FU2xeEJwxj&Hj-U7s0&GPWjCK;gs;& z`-pGS$$`k)IH)hOPzZNO-qnc*rnlBC-f>6Nt}yHK;U#z?0y&Sveb@U`X>x9* z$auo$X*eRNO>iNJ<0mwyNv<~I=1O30^W>P!afJZ~rkE@BYT?2aSt}Y}W?T%@as$;l z`u-YKOLl&uu1dkh7-B1q89_UM)*F2ASX(t`%%8L=xW zq-AdGghDm$35RP&o{vJ0ewEnM^G%s z$8p9KtZ%%67f*=A&Mqe4xr=f}W!s(OlaQ8&cY{YR=JjwIQ#oX_V9CjOQ-g-6wrg@5 zsf(A=Z56D9&T`*RWf3^ISu$|`bt+^wtf`ZAvBTpgu3vaFM1@|JtC7c*17AgPo=BDn z`VEB}&f1PomltF-MIo~+hNMai5ApRHB-@luEG5*ZI#Gu{vIv{L099Ih@)A~d3cd5qyyw{ge77lq~ zp;O#YhW(Ko;F`Owz-?ApAk$&CT#6qFJHStP?Q2Z!#%L`(GNQ1Mmoeq$F&2N_dd?02 zU21HN)SR5(aAaUt4^I2)2)GpX47}VTFu<$z72)a(_gJ>Z4@AI%3Et~u@Z!-|8VDi6 za0!9ma;aO}-)bwa)&pnHzy(c$107}Obs5)ud!mC%*%)d5>5jnK3N9!^ia@RGhYRbA zdfjsyu>>JRcbWMOHiDDXh_%#I=v;vEJd z8qz`FpyIyH$vZ-T1OL}LLq}!orT2GjpOo=7B`)Mujwhm5pG@%8&9HcgRR1a)7LEt9 z67CfO4WimG3iv&G-e4ao zU&HQ+JYUU19>d0h*yH=I?uV|&uweis+i-dDOzngBr9#{jLrDE)(@}&qZo%izH<1+ z?;S+9k8qRejPnBVwnQcB$m) zSf~fs%J?I59&lwPXbRb15D`uD<||^*Jn`pJY&DN|J_6}?C^>cm_XJ-a9+Uh!0pc;a z{pE3RyPK$g_LKjz?U_hfay4B?Xi8Q%1TVkAldXcqrSBoLN zQ_|0154S#5XF)Qv?Y^ts5nZ?aLs~4%(VH#U&76&!HU`e=8jMEn1;e-K?C#m@T%Mu` zK@Eo*Hyl12Jep-5Sg4oR?;AWPz`>oXi5g1DatVd^usA`;llM=$-w>HMljWZPxgo^n%=D zuZ4f53kQ%H@@NGy`k;3Na5BTy(evfK1j#LDBCmS!8jf-pN7DBwtXP3246dXc-7~Ke zreWE+zB(bW#h#V`Ewp7HJFsx1nCwF&8+%p`MUtkWZC=HZok$feCtaW`Adpzuvtdy4n&vnI3fCe?lxaOk2dR7|{B%TyQ+Ltqr9rv>Nc!=AcieMVgp}*@+ME?z1&k>BrU>ZVUfp2#hyo2dVmaw-8l=xFSF9-W=xQU^W89vVoz0Uq{q!5u4a2UKITFgDjDz5mPGHsH0TH3bvC7;45@y2cYsOk=QS{5kJ#LkOqQbY~ zrAsqpI)BDpqpu?JtRtilOTOxllP0IQ&{EM_>R532H{ zj)j9Fwp_V{E){sFs_D?do)3GLvCiiri+x{6t9;#Vb&6CVH>F&~D{ymrBz+M$9 zxpSCcKagmFLoL6=ByT#!#yw4TLqnDo+b?FmF9ZWN*GvMt+4M%F`6 zG>8Th>`_aLY@NX$1Pls^@Z%Z&|E-jk3jCKLgup7w|E1}FI>gUwYWex!lpr%#R~P;FFu=qdsq??U#+v-x~yRRWz`!c=e5oTg%$InT;0$hN_xoK+)Vg%46&M6iitZRvw- z%7M@9;>w0s$4QVqHi@07YF-W6h?J8 zz;niGcNU%-%2i9sm9_Vr9xjF@q77QGj;RPLt$@NS<>#oMV<%lXNpnIZ>G9=hx3a+v z1!4Cx%V#JA^TnrR?uMh^-17M>q}~Ox&Ugl z!)h3co(#D)a24319zNq)oq;5JG+i1rq(7XT7q6kC2!QX~letor->IvWWG1-;EZ=T? z-7Q`}2(l0R-sPz?zvI$AjxoP_eE(U(VRty?vsRPCZs#$e`4|vxsdWWX`ky6)#}9k1 zuEJ7du5O%=`d>Xc@aG(s5T+x<~K^w~G|+u89u2&S}$f%l{4=r3>_%~G* z`Tx_QtE=6=x-|U;EyzZ-8_r72E_Cr_sTeKgxIYvHYY|I^q^>2G* z3H;NmV&b?S&H3BFzio`OcKeUT=6OC~9*#j*VU9)Dx>jN40OC0B$Gcz8wEUy{kFiSK zfb{u>yVrc>?qw@SOK!^?hiXxHt{~|7V>3QYibt6nF2V+f+k6aTtmPlf{*CY7J^=jx$p=jF9=0=n z`^d@1g7teT{9}f3?zQ{s?OqjiKkBBvQWHBM`}Jt<^A4{0-~M&~j{_-R0R8%VMAHtq ziZb)^@-j0sSL1f(Z|px)m;U^-2(H?+yK~N2s|wZ%YUM2Md+<7yB&ckK1SF zwgYk;{sYT@N;t~g`+YcW%dgYYTE{-}5%>M9zk%j;o)`K9wDUZk^Sm}zs~PBW2d?Y7 zPu+(*HF(edGr!e-CQv`m--AA~m3)r58fNA>EyuBj-wRe_X=x|d?>wjcPcc601=;uN zd9<$bP7pJ5eccvEYFS;_pR)m=xbdqa52sy;r>M#E^)5l0q~8}^TLJzB1P6v*)k6uL zzU$>UGwgF4+-Zfhp!hd4p#~HZioP_vNc8@D=+4kAIE=H&jJLP>tNVfM1L6OjVs@Om z+aXtC&^Hv8Aad8C&AAN5JKxu&A&pb#%5lKW^^H+G8=rNGScECgf`OOkI|1T3?|$nO zM(cgn{sh+82g!fO8}@l^=Xv!sKGW^!*S4m=0yhx#Kk=sJvhTMynA!j#e8Bs${j00y zpJmy9Uh8_C=l9nCG1b2T?+59_ao%C$`DfarecJn=2#ZN>6)CZ#kWd)5#Q(r+1a^tJRd&r<4N_u}Fvw+b zG;xtI6lfG=Vv#@#@ITcJi;hkc*XhSYw>;Kv+vUDbAQL>+3R}C zV@J!QZp%Gy-|uWC^~3u|Z$Ge|Gm#6p?Dtt)`R(NYWgY#1pP9MgBl-P<9}xOKyj1^K zLVt&kzghho`Rxx>a_zTD%Rd+y{ee~F7|=(MX+6*Sv(Wx4*8zfa{DbYqhZA9atSP1y z5Bt9v{D;ise?X;F_^X<{6uz@9Y1~cZ#=zzJPR4+3o9{VW+?@!Pygbb}l{Ulb+WHkA z!@l!Z_u}TJ0QwS@t(-2O!cq_xD0I!r*~Xm}HbG{m2ce!pCYLss>__qRg^Pd6 z>EFBmtq_kRj3YuJ73aF+=RJ5xmB#0~E*mu(jHpih*>L46@jy`QcE66ly>icf=9#tM z`=S}S{{az`Newfz_lGAtrzLEBB#{jbynoQt1OEpxHJ?8@$;fFZ@PAP2?><%6?P$l3 zNtl0f9sg$V$L99Iu>&2BkK_@d`8Tln48R|uchX%qvxVMLqcYyba z7p+A57Co1jcyB&f%_~_cc46Pn89w(pNc~o{a%`BD+Z|lE!Ryv=x$6c=nc~*u$6#Yq{ z{Y*dnQ`{-gQ=5_Io`)|C>P>_P;ajwBQJ^LL8z2T%7gi}JBfygP{d*1gh^PDn5N%>4$=k9mF%x=;FRm6 zC3UVAejn%K+QE9{Z%S>IOdtRj%4{1d4PB%YxnfeLqtZ(XJK*cR=*2XoD$FPO?HIHlh+_!N?} zserpfAP9^ft!wOuvY#8JrxCKOe>`KO56@yhda`eUF+VBe=q4KhUw@9FpB&+D z*$GAP7F^I3AL`|e?>9ED%kdL()RBPRd(O!VYbHE>M+!DXxcqd}%(x1TMf&p$1B@eO z2gj*HfWii4*C^OeIX6QjU+Kv5^8;B-%$k>ZF?!MGm###aEze)xs|kbnpc3%~cu z4%mTeX{IahfXSk^7N%L6Ag&#lPscp-ja(0|Vs{{jpu(vCM&OV?kfQz8LV<=j1Hr7^4fIrF7TRF+2}wglT*Wqgx_=JG_+5tjSE2}iBI-Mk zjY__;XCV(Hn8uufmUq)c2+c;&2Z~17?q7d;9piDUqi=hSFjdoL785&?97oz@eX>mV@)j zTdta>N%|$M>9iq#VC>)%p3|yR9}i5~Juk}lb)qU}caRXzaOvsRnbY>rdWQ_FQoiM$ z`wl$v;*{*<8x568_t?3&%J%c5Gz`P&Da+V94W8S~;j2)rReVS9m^g_t2cQqg{_BX6-$d-# zS2wxk!D7dS-$RFeT6wTy$ULr(dy^qwtYFiP;30|^Fyot`)+_!fea!1PBw5Hjp%uR< zx$=6aHuVa)S52^4F5CY?zwK+c&dZy>>P?Uhdpc+Iy;Z`9826e6ZA^>Ir^YBT{DrqQ z2&TQJeqCd1gmE1aKt2A}>09si>3V91n|{f9w#{T+eM*X9#rZ?wt6)&+m#dNlnfcG_b|DCT4kwkIisieuu*p)QZU z0_!n?8#-yZ@pYf~X4rZ2!9hOUm&S#!j?U}jqk?C$EFvV>9WKq-AeLsgPJQzLj%%^wc4 zQTCIKeY;J({lzNea~WP~_Y~TMVBWh6@n~r^-n%qq64Uj^q(p8}G-oF5nXppGapyax z*y}jiOK>(U9NkAcZd{JlzxQr8(a3gmkEowl(AlC=bgWYBu4Xuo>A;k8D?IbZ5L;#q zc5HFhRkn<*rM7<0bRCm_*V57o5pHvP<+rPiq}Lv+>o4;!J8sfq={D_X-?sKSnfX*R z8r@_AI@;gYZyu93esbW5V|i5Xv+Qk)WxDsRZ1m?cD{8l(a(q{*#CpdrPG@t7}ZQZQj_Smp#54!g7V&0S0`%TGBw|cJG5X_kFf+MApyel5~me z%KZ`NQqz|QcFtGqanPzdJI9-i+YZd3tmLXVUaHmeYwp-e;m)h6YX|sO*Kay!qxBFn zd6N7|9aoYk`xiL6&PWW|>ik|7b#!O!x54S#XH}T4n(BzxuM3}_pA@srzrL&Psy|Eh zOc3qZX=XcT4C3(vWXba{ek!L^j88dues& z^Vq3(ZiG5T(_{D49jSue2bLAO z?}=-Gk+R(C%_ecRv0*b#H96c62~s|7AMzq?2Vb%TVcrIL^GPE{PDHM@N&b)q!{?3O zdI2MU`u^?L2GjC6;&&|6vv+x4)3m1>@2sKFMa+J#emX~HOu9(1Y^32haG@%Zo?Dv9 z(!zayIekjJ$yYliGhpVuX%B zlH_o28Dxu0IJSt5b#B%d0<2d1q!f* zYonrLf1OMsU{UDfXbG=6x5kOch4gjFu>)5=UHWqPn)ehH9b(s zlRV_U8Lls#Fq`b`6JvFMHFQzj79)#BxpLH&bgo+ z2fHz{ z#&c%j{^$B4-mO`&g;qp3Ki$FV8?q5i+&PvB&D7foy>g~gq|OOFU$7$EFWBqK{X3s^ zoA<@^`68^OT;7Z0y!m!+Y_1{R?H-iTHF5>+Psu5ZKZotj>1vep!I{#H0_t4UVqak@ zu9J@q`6%ns-ae1oHs?<`_cw^zo4CEmp?fB+wA?Pq(4O0Nc~w1(v95#vVA)55MQH6k zQd@mXU*3y=%q`6tEV&qetqF{HYnps;%j^h}J^Xf_oE^P1YLTqjyy82Lh_?GUa;%XV zr4O?m>;o9Z-`YLiz<0KP*_YL2cb~tg6|~h;y;<_$Xmw#^o_HS|TDf`eQ)yaS+Uvb9 z^iGy6bhJF`hf3NRi65KhE}mUp6=Sb3`L1qYc5(@an~V+aQ_*IC026^TSA$bFB53EX zIak|bbF`ubPviJ*ccp5~TjZJCRlQ@(!P8{TjN?dL*KQs{Hfqz)C+H1u(V|T$Rxt}W zU;p*->i6Dze1%*Ar3zx!H#8}}b4c>-RO6PfX@m{`+<#eW|0QA$df!&EtK&FoF;B05h8YO$f%+VTTWD=!(6f{@G_~Da0Gm?nsxa%>lJRv*N1D2Go8vAh z?6c*EwfnYkTWu{B-0V$1lkesa_v0o^s-rG=XkA=UUSopWerV{)DwRK|ciX(FDrpUV zGXhMy*Xh6yMX$|Qra4Ln0(N|F5wDxD)GhLsUNO4tFV;wt?UHY1jx_^kNxaSw!;LuvpRC|TujL=BDi?1n_!51z z&Cf*7`gDKrGdgxw?u>S~I< z)pC^gDEWTVT<=)$6da#eK4N86#B7R^+%62ZEc&~{K1yxJ+r$u{t&V}bd? z-&tFBut=$Yd9-WxKk z+gFH}=C{cAkUBBN;K$e4&lSg1wl&mPd(|4pGpzTXCz(a31=p=d^|4oRuV>M|yB8z3 z-;`opUP{SSJ1D*cJC!R*)&i=+I|+6z&+pF~>9}7}IIY!`>Q;r@Vh=@zue>lt6kh>K za<&tr|G)XJC2l;GH3 zN}kBp_Rci8^(C=AQ>MYbT$ecb_URp;Nk&u&3mlT|nn1BC<@ta^+Vwn*6a{I-Xx z*KXKClSpRIXgjTK=^l&_w5x*)-!%RCV1taQ!%tmWM>#eMXt~bo-LnQ3>nePL9%-i4 z@2=%*Z|5EB!pznRikZ&sWn}GzA%NC!G3&i0We%g9%cNm*wLrs*OggRw_PseHqFXWB?|T@ z#ySWY{$|Qcg%kvW3=M}_*f@s}P3Rs8W7G)-K?==R`?>@7;J-vIHmJ&LPEad8N#{4<5NiV#|-()@B9Om*?Jw?r*OUJ#2*wW_$_e5*UR!^_4X4cs zz#;^t?~Gj}DCh|`tLy8jK$^g8{b$fAlhr6|M3`PF6t+niIl_A*PP&+IMa#dDyYBp$z6S(N6J*T?(lFt(47 zt8n)&$eM9q5ROg&v#dxn1VubPM&cnke;7*WZ`vQnJBDZR&^8l=PCi7lU!ya3)IN)8 z=|T<&0i95B{webVve&n2LJlL`*6K$(s?_yHS9?fbjLUq62pM{Y^Z4Q%pnBYazq?G0&cX5;r@Q6F_ebZ-;^_l>w9kGi`NtH zsMb1Uwj!`n^<&4N8jhKKOId1;=;j#>II_FK{=w@C=*K~iiBRJt{h)1WacL!N$OkL$ zuk>H`{Sc|mbD+2F=9FsHALOQoK6MV_K|iH)npOc8C;(ghta{s(zv0d+W)Se+)UOlS zW0;+apA;J`;No+E3PCoapPNiZw6V0XWOKiXXgRdF&{eF)8I*@h;-2zMS*~OiWu0o zwN|Nt7l#p1t>oZce_REz(s#DcI#m!j)|j2gI~GsdHFxkH6{1%yT`suR$_uy0w@}&pI_p1<<&^=+>eWIBtphw&1WYe7=r2I;{~7U%%w*9t{sQ^#gYKLkI-K(_gQ=33VX zuvg=Tt2Dsp>D~lmjg>fInBD|)u<~Pu87K-)=Qj`j6&c;n@XUije$kaZW$LYAHf z(eO0`gw*!|Z2W zRX2#LX}Gc5c`(PePoxg73xEeeAs`MlCBxVdM;DGihOX;^$1fw#}k;w10I zFTFdHW@6h_8lwKZu81Dw?C^Tv7)2*bnr3)F?>;@*5m&_0wWl#UxlQ(c%oVNK3)>At z;qx!kJwFfPHY}44SaESmZH%aOD^B1<1@K^=exK!oY817X8H^}AUblG#;Jjq(I*nz6 zvN5xMJyGiz95q;M^e1^KN7BHQm_D_Q1_L7!>z+?{J>GB-L+EonhbATK!AqVGLi?4$ z4QI9+D(;DOplR5|@{F&#(Hj;BH_>Zfv?>$xnYgpovBCy5%q9~wf1!QSZ99g#z3LTQ zcEx?~=HFn-ce+w-3^wEK$Fwd%0D~0NsEG>$PB>VwJ|HkeANXQQ3uN`yXT-Gz1#w-0 znTRMK0On40$>Ux|?e_$}Me!v%oK{Wr_Bv!82_`pX_c(8mI!+c7@>U&f;yhbxIZh{M z*paf~R-Ms()U!|D7+Njy8WXkoiWUS|y?gP(3_^brVXA!sEUpeiMlGzw3(gp+c%I2gR=47WFVVRe~Z&S9YW;-al^uDl!ln{ksU0YZSNyXIYc zIwA9ypR$Ew^<6M%-fDGlS5anKtD;jx%+Wnh07ORRIn(Os2QZTh%Sd6ozMt4*;hV;) z2n)uHXv+UUv=a6sM_(kbPvH~&+x_t$6Hrb#=5kt zgM62|{cOKFf!N<*_9oy&bW@8ygY$#2HPdJAf2PC#_{M#}K|`fU1TR9DV6+WXyeznQ zP;{6V61fG^O*C}4&jH;PR0u)#3xS5dkD6G_UaNz^!>p;98}=hPx~u>Cr`U?IrDAs) z5L2M>S?GKHps?206SIM8qqjzp*Y_-V++`>Fdva}43GJt z&>;f%3B-Bf$_)ub8*$B8C~^!+aqE6_pPkB)LM4`y`!spZ{Gg=+JQXh4fo(eW(!gQ& zl@}tW3Lm08sA2$ttLf#dd*Dp*^Qi)gT%p3=()&potf6Vpx>Ps*_~aBw;U9wB0(L0n zGn|~;$M>Pf@nV?RV3d|4vgcrPa8{qu#*n2L5g*H;$rG`H%JFp?VaoXfpbZ4@x~BZT zzovj^P%2(hcHXSGnlgqC?+T`k(0X0<$x0x;Tq}pNGZU@U8#Sn}5&4_q$~Z&}9Gk?k zo%q%(`-wr9M0BK*E>tQg4;F5FBqD;)3RM%%ac+?AGMbsp?>XGINXm=h*=q4xIwp+s zTJL)$E!O&+^=c?7F&*Pl)(bGN%wtrf)ZB~F)^8VB4d(t!bB@q12ss0;YuG8(&UOOI}WPX5I8G~)yyus+I*r_Bl z1+&c-wR5y$Yyzh3a7asP!%(`O{Ps?t+7)3rDK-6G5~?$sbpq!=@!(Kc%Wo`dN`)PjX?&Se6+J{uvmY8*JTi9YXDDlQ#!@$@ zDg0tTM?+*IGO*;n7){m(bs==g&g_br?5k#jak+3b2;yrh*=R+HLmrq>72DWz1+Jk z&Fo#{wio{b{FVz|)%m%VTcc!E?ARNch<}!nR`PE*UvRw<7#{b{+JNbBO@Qk&*`sF4yzfbM>&o%t#77LsBUkmzw zc5&2lR6QyfM$ARdFis#6*C)VNxV^keSUV?~(VCVw(zo@c)}q{Qh_LU%hW+3XiX{$B zBK)HcNf)}?td%hGr=q8ZpT2HbS|mjALDc7jr=-@x9_u&0T>eYZD$!t_6o>hU_(@l9 zpgRy23Hx3!SgE{HS`Z5DmnNKSN&2oFCpFf{z%Ek)fdPXlA{|gT!|LvgoPM2JP~sK^ z>X}N1r5@esUSP+w+blMWoZQwj7*sDA2Vmba?mbK+JXki~tY60hjS;>{k@H>*`(YpCLsW&Jer+koN%Dg2v|Pgr0P-GdR@U^p z!T)|xM{Z}XzX0O+V|1&Be~zzZ#<}QsufMrJl>h+XzK`?h{oFa{_oh(*z^_4-zn?(& zYf!BKz+bg+eg)XQ*ZMpC3Ndc|7Xp6^0T#Qzl?wD8oqzIQy7{Z-yIR}d(YR_ocD}v6 z1^r750KCgBEiKK@GT!04I{I~wGuJ=$Dc|vaB;J2jrM&#_rvG=YsO81{DBGBk&btXs1AYR9_EgOgsZI(?GVQ&RYS_?R>z#31*R!TID5;M(!9RXr zfBNtYE$=!6-(NWi4G%xx&0z{1m|#Ay2icTeYapOF`#IvSy*b?rEAYU+!DL)CL1PNW1_5KTR^5A3M4=s?krgo^Lc$}UXd}740sK)vwU0%~~e%0Pg zpcBv1pEj@GOc%>z-~C3v0st`gf)H;mt+%f`ty`I0YBsN1`u{9<5XXEaj>}Y)NAmeB zsG3z5)!t70>!ZToi8{22dcQgQH1v-T{(n4>{}U3tvSL?Oj$N41^!Pfn)9fS?Ql|k0 zdkCBb2EO`%rgVL|R*%x#44O)B=o{N+uDd-M`tP*uO0^061H7MUKBnp%$U&d#NCKo~blcu(A@Ak}i1!B7S@a z0B?A5=k#-L;s=f9=JcPZN6qEt03dt-ar|+w!oHOG%0{#tmI=$INN&i#GSy58B&7N> zuN)!y6bt}1^L0p-4`S4l)EYd#E3cKMr-lUDMF=XXll#D_&G|=`f2rTAj|cLllDSt* z8p#_`?ZC&zk*DuM={9%qaa(dtSZbr8O}RngFN>#8ax*TQl+7D+pRXDdpQ81Ju|0ZJ zDLA8fcW~4b$v|i)FWxrEN=v^J?F3GH3RnagyZb%6xWVDdcEjhf=yvQpt;|(%BNws5 zq7hL925N>K9D+-tkOx<6lf@9@hwvkSsMK6wvsz=5g|v4!ZQg9zG+Tt`?ACd=6L`h~ zNjGoTrad0eN{|GR`Cm*r@8)FkP;aX8eFpy^;su(x4Et3#>}JE?>>>19vzG==4`--F zKnBR^hwYA;@r7fePjplr7J8Pm&eVOQ!Zj^ZWh#?lWjy}*h2-Sw5yhzsQ;qv8>5=@4 zc9{KVX+cfaAE~UhysStBZdvCMd)^mEoecH~J&`aTw?`2%nzSQk7oc(SN4n|uE66WT z+4{9;sN`FL`8fxCaa>=pQddBI%=0twM$b&0n!#3#ZK)?YD1ySLlD<1$<1w6NoIJpv zy!b!x9J^}HS++emR^9RRxGVi>QN;glSby_CzuYC}D^J9bAJrm|bd3Ps+GWyud<6M? z(c-m7TftYlpCHaKEU|GDu4B{X8@;7-GA@$B96a6U9A&<=IE>#0CLC7_`{Xz^8;oJs1mW#%jxHr@aN!{k#HfDM*E-3?a zMM4Nyd-YipMd*PE3H5yGdLZL}@bgRaNsU>OgS5?1w&53Lsw~_tPebAh3aNiX3#KPv zLkc7TsX%MZr4v{v=TCYDF?!?DP#J*(#fqZ(9vN@i?0|HZrJbv$LYppGk$eNlyy^EE3XK@Ic$7@L- zjXkAS08Czr=ZfyfGf;;ZEmXvP1O+_XHG3{0BBUQb0YZ@A=e5nBB4a0J^2mvj&Wtjd z*?8&aq65S}BE&vl?bOi)Hp|*Nny5pw_uom@{pc&xT+A=_kL!YtyDf`$yT2~~kbU=T zeQzc-VW$oQiP7O*I&cfdT}BZ*wu9&{I%dm=Kz}=}t*0IPFl0xnb|h5#Cxqn$-r4mGyCyO>_5Agky6jX?nOawf?LkV?-vOP30q4 zXh1&>@6Oo?cCZN|x@!2-J$=)x#M$VXtL>?^6I5}=LRxx$1tLOWk37V^9YLxg!dn4t zM-XnJS927HJ(Vr=0bOH1Dk zbZaY|r#4lX*^iez@=w=IH#r_R&5m zSO^@W*&GtTES48{BxXi{4SZl7jx_jK63Y&I)mDqi8Wh4gC=>HOr-7!&nPcvQpKaI8^HK>OS zlk2Mx%1vDaSfhICKuZ}Z?gas?9N9PP&0s7GFUn{z^q47FRKaD-RfQ`{l+_zyQ?t}j z{1#Ad2S=W`ZFBPACs!Qt(}SLC79rz$t$Z>F;l-rqzH{K5a6T!JV6P&Q>g=>x23GIA zen){r{bpuz_}<6v$|;g7PomEH6T^5y$DNIoUTQ>IDens=TqfcyUAc$@?Fc9_r97c2|KU*joF(dC#Fa6==g-Q{v5 ze)jAq*|Yu zWmQ|j%0LQZqx~x*tl4E1=Zv#t#>Nb)^H)#p(X2cH8v%3vc>q!5?fDMN@M_N9fkXjQ z?5UC2DgQ-wo5Q?*m@7Dq4}X;%3!9!Y#B1hU90}6?TTI5s%8|aAn}aW(>u!hKZ5Bwz zQq0eB3@7L?Ow~7LOwk&x#kCIzQM}XX^RaHv8iuHHy25W~-aYb{TdJK|{pp^+E?$J@ zmKBA5wu}Ak!Uttl`;7{}Ys@LzsTKdr-QWG&w4{kzKH-;l|KZ|Hv+C&NLx1@FFOohs z3jpw>cK|?W0f1oufB^u2`Rjs~hxjk{A94W0hd&+y-^7UScy`l!VeS-fY1!KK9Nnkv z2C?p}UPmOu6ISf?PVlFEJD0i<5HXxa1&~#t-cKP78Ib)zd#kTFK$F~o;4-AgB|k^v z2Q$x-+={+VDw==acFI$e1bSkFiS$)QaP!H6WFKLX`UMEph0}b$Xu9snmFhBtdj8Wk z5+CtH~n5>Q8w%vw_j%LFM~5x^ovMp~^y|zIk2hd>#t`#z32p6Q3$; z=ar*!K>6~zUWZpcYl~YmWYl!G>kQPdV(B3pO)EkZJNNTl^cTqg&(@TD^xjziFH=Jow@(oI zS@o}H&%4G{fVz#SP20|RI$t#px!{QOS02ZLxybU$3O+fsu^bhc_L8=B8(wm{iQ&op zWPIIf?|>EPyjo_jj9l7z{(~WJ#OHTW-P#PDWMh9ye(bCxdjnl9`&-rch6_k%d%sCUkP!loShU!5WGeUcK8 zE=vd$)ruNT4ZN=1emy!rKoP2Fj*9&s#yV@(Hyx=jHW6YVTM-+`xj_706<|#}gXZP*; zY-84F7vW}~#i%FZiGL-K1iyc!fMTj?0N{c{5{r8JH008z$6TQ>* zyy#P=fZEuBx&H!IN_&yTolaqb4D5#hv2HZ+R zm*h(sM!nClVYs%FcZh=;H>4MdMs{m^XVeF%i3iVoq;1tm9TJnahE44NS*KMMAx7+k z)}&X!G5n>_`|ciM44O)ag{~(IiZ&zB?4xl2va>Epv-6aePrumXGH`kt@DS!JLH||Z!?3pf7p3<$iROselj2M zW9>Owd9?fX#s_ROu1H(Uw5Ai@%FgB=`%+&Fmgb)Y?3%x9@hVR4@lIjbVz;51z;3IF zRoShnVW(N1fKugq%KnhT$$>ZX31^6^Ic@QeI>}uEZ>BkLAjIblqK@`P0E>7XyfYl0 zap3O+_Vhg1#&i*_n8Dj_wAeZ_{oD`aWJVUxo^@fn{Z{Nkn9mxn!(=F2?z( zK6Q}f*lwumy>v>rV4Qq@-UX6~FmE!X8}8rPIefMxm`^c?CEc?St2^4aMG%ebg_-)c zAPRLUo#9CQ?;oO>!P_nBzTJLH^_RM*soZ0>3yAjb7%F_kg$2o^9-z~08*`x*_-|W2hIIZ1X(UhVSeW=ZIiZM#Z0W<*-_G}8w(xvgbKbFg`DoA zX?S+AML9w9r>2S$yYDyJ7TUz8V^m- z%P$UxUk|Cl?ewbK=4c^cV-~c-wnMJP%}9q6p(_$m=YVOTmzreW5AW||))S{g<**3N zTM*z$qqc&V&c2VYb*6Fe-d{j;Ui`!YH#<_1L-&N=+e&!9Gr%jLd$dG(Dm*}AO|+9^ z%wJbq>9;4_DO3Sw*%FjeE|C-8izfj*sB~55y?tumvT8%?O9VbHdqaiYALZG@q_{C+ z0P$F8%T=e=1*E_=6*TB>kID+!S0!!kq427n)B3Q?uLbNNH+oFl2=X&fqL`%lmIopF z6j~#qdgdpV*F`O68V*U$M!SvW{QwvVq@uB>i+^Y(!6W2u*N)*rS9z6d-%jZFTmM-3 zH$XQ=b0KT|o!?}~GZ#`YOEVuvV~rUrZNaXr6otIQZKL81fCC(~fZ$6yn9`#{{eBtm zLr{IQh+ab`v4GCn4s(zN#f)=qgQ>e8Dz?wG87f{MSv#myWpNs!v(60yh%lQS5Mj$J zs&$k;p~fp;!?b1e**q_h90Lt7VAqj}^ewhJtjzIAW-S@KoiO~mXCwX8S)>%Dhe6Gh zxy!y$w`!zNaq1$LRUYsIhk-Af#n42E+H1W$V4gG@(rtQaQgo%|pew!~v4i?eRQvs1 z2#}CGDnvR_|2YTDZmF;YG&~S=g8;4C&~VNs4xyNAEzH(7I4mNlnBM=ooE0Z);DQIF z9{Za}821d`l;oW)e()i2EHz%HOy5)e18D?4Se7KAc!dfe^IXdVpqEX^9(2}{3Wxob zyUy-a#*P*LB_0@_jT6L{;XWOnJw*Q%irH>(d{P{0tlSyM63-*3|E2ap6E%;`iiA^r zYUa`E*po;uT9*5ku3`UMG2q5h2AC%L)Io~7!|ph|Iq#>((7 z=W((SbNBUZvh2XweUSMg+Yb{04zN0ex%hH|32XiHktjcUr@q`g5$8jjEGiJ;)>5U} zh?Zj3wo}hV8R$JOZpH)9;09wFgZK<78AiM3Sx+y~-xO zG^)2vJ_DiWZSGT3Q?*U`6p>=1XQ*V~fY~Se5Kkm51x?SZQ`dvdNkxSisiq1g1l*$l z+p3<65iT0W&4wqK1>e=EnCw}}6r+n9)pF{f5mqhaNYC>NqD?)Q@LfrQwohoVnd9RZgVc+|WucB-79ij|z zd@h5a(^(Xp-&@L`_EX`#kWg^2QzpLkd!z5OO{-&eKDR2xUevN(OK-+)lvd=~EPrv1 z-+7~RD4|PsL>cW#dGf=9DvuKE>P5>SL7x<_K0Y*L;sDrcSt3Iq2y}^*5EZ6j$EM|KhZ;{ z1iSin$`H=IrGz4h-vzH=`JM{-Ve0o~O{f7!-WUG4U-7O~Qy|)!(X`Z`kw)l*jkXxiNOjOK2W18gu0+mOxqsrY5^l zA``NkCw}PA{0Lv#1cFgQFIj7_z=g8z{aGcWs;+mg0+QnJIKuop&fYlE|NsI07{GjJehFqRg#QMrU_envXrl zF)c~8Rg;d>1@s0J>bbGPHr+0r-byy!lJ#?)Ka4>dV{1`uzeKzonQz?|o?(0x*AZem zH9R`qs41nCL-z!}R=M&10*q0Qdg1Q2Hpri)U9OPCd0Q6TJK2T%^XQHqACg^vBgkv) zF^|Fmk6YKat@;}^s5H`RGauicvQnRGbs`WW{*fT?XurbI8tLlT4b?Q^I%q3rcfFX3 zg_b*rD8Jx@$9OQTt_!o@ZgnITNe1hH1RE6cKCkm*IF$6R{OrT-t-bzM$5*zm2o@47?7*tBy-=seJ(bK~$DGYrKyyevJ8glRP6+C0)%vgsfH;vN-uoNtH zI|pmp{J`I97QR8p5qq$l_+(XDO)a~ZDZV;hI+D1{yl2vOZ9mvjHdaJP(b9D!!5Nvf9s)Sh9&*EVmPGWeHYyNs|;_()T^* zZoEV>n=PQKLD%7I&iOP`rd}k-%vlY*(@l{v&S73!wC)OEc^@3%i#gX*efAKd%U;cd z0(2!#8GuBceoB*VHNkWgRQt;_S^Km2VxGdrc6iQ79_8UQaC-%T*QO2OL71U1wJ?)g z(1oF^SjiJj`5dlcu8-XIP}|$sR%O=tIwn=j@eV%<0g9L-A%)X~{5XK9-rP`k@zjAH zzfU6BUOgQk)5O#rLmNVkLNgC9U)bZ2U+9meK6=={YH4;6k^eBW)ZVtAkgIIZFRnzI zi^n3~lXE$#^R|i#i_TB>o_F$LFZCpyYm`QdBr{!Ccp)n9=E@N~_z3@^+22BWj^t9^ zt=K6hUvZ#a((>J6kcCQ9YP}dY#Ddqwy(*Luok*0nZs)!|!Vh@yao-Apypf9|Q9X(N zAtvCK_)VTHLl;_#$AEW;)>ccOmM!jOMrk7B!=Nz1U}uI8!`%@}>Q2^ADwrcS{BE-! zBCb-SuD zt(Nwh(qj&?Ts*7gDcPWUVJt0ZEV3=R<`H3rg@qD?M>@8EOUN~{MlcoKRoIAsFF4VV z`KN1Q%3ynVKXL;#8(AbWaEpm4EpRT_+h@UWIL_M(4Vp#mj{qxPbT$>ljl}ao zd2Tl+D&jisn*e7^Zk~ckB%r_qQ(5~+x1Zo%05zZLC*Jl;5bXgk>#n5qkGz_j`dKUD zm({xGhMw{LGH7%F8t-Q`EwFs?g!tyAjO5Ja^Xml9Zm6fvHN!`#Iwo|bA6~qUCA32D z$cQ@(GvRj)D^a_NSAEz`lL0*}X2WDW9S!E_8p2@?E%E7Rppk|0o>m$6ewUkd*+1_C zmzg~XE7&ANjjBH@(4w#neV9!V2D6#(|D;nT(AgW5(lUC@K2`Ju=XBI>hwNft6T@VK z3N$tCTH@h-pDRH!UX}8ZzMiwEv^S>=o1$fGBdJFkK@PwpMqf^14P_jExfzVKgMzo67Fb=|hy`J}{9C;`A2^gh(bC zf?cJcRs|QG?hJG4qMb5N(8OWL#8B(H!{M#+0`O*2evP5QI~vC_db}=+PNsQ#_2w;Rh4)*OuTKs zN}h2u5}JU13p1uinccY- z7~ddgsB4W3cTifNbPbQhjgZ%DV@xvp(m6;N;wZ2he`b#!p|Ac%@67&I0z8ZCEg|7? zm^P7zNu=erRjhtV6$2+t#x7u?w@fKa#29nygt9fKP-@m{ z0N1#phm~=i(jlvXIf6jq5N?Qg8l|rD|uGzMHf(7eDryj*04#)B3&r2;Uj82_nKu; z<^*WyJ@6tW?v6jJ2ndl^;#9qG8xS2$os1;2$OIY17kz;~xNXIho>tUYv?(QnS9Dm> zffpaaOoh~6qg%^jnI93D2eLOojz&krBN z8Y`&5BCH%C^Lz}jq^0v@g;LpNHOo@58cUGBT|sUkC(j^9Sh5_&7;M)vQ&lT`gO;Ga zGu57)ucYqk4qZFA)bbG4>tKb&^9xy0Lf*a6t)?bpb_GJdwVxfwe@Ygb36@I5sy1xC2985_Bny~&$u8h?;rE?QpI zoenlzo#E@b+gVZmQ4JMXtJf0SnwUS^AOx=ANf;2Sr3zy@HBG$RP`GT5gbB3U!RF2#J~;TpDLDoog@m1tM|wXD3x8L6L6;C6xhu$zw51snju_;3?($ z2YVS#I^0w*1F&Cf96V%Z)4}^vVLjNU=x^3RbRCYjDAVYM!@e~jiN2uVhB1+Am4h&@ zuc~eJNzGp&r>FHF=Ru9wIMs-raBQOYaST0EYkqdXwXP1l?`n*8&SD2AFsPqK)lhm} zZ#@ldOy4#wtD6^v0{i@3e#NN(Bpu0$F0MPcXAG{Qd8t`BLLR((R$3ZW@rj~54UK!D zi4uZUEL@Tb@u%zU3GixOdsvd}%GdQO_AUJkDbbB*x%*qKTa&O=xruA7k1}$(Upbjr z5AKHn967qlDQHvfk&RqB&W=|Q8o>y2o+WR&!~}8rKO@z6p-MR=P3XU%rJM^*(;+w| z&r?(qzX_I($nbV28&sMKONt}}3D&pf8%NgK$!tbdw+z4wQ+a8A)Wv-{3~|n3O!2sF z-gQu3^G5f8JURs)rOWy7qdSq}r6|Zt4dW|cOJ+5dke4#TEgjZS>+Pcv{t8T;b)_7%JRj<#Afqz38j zip}z1?xhgsO-Fv(?sI`Hu8Tvtd$?@Ym#57)`C-bh+U-P}LYVZdX2^<98$)wgn)hgS zjweH3)o}FWm&7O31JVWJRpq?H-SAfI9GJ~4#qiM~OWh8(JB)YrdnvnF_$R&wH95W= zf2g?+5`eb<`bH;SVc|rRV!w=O93QYGMI-;cxxn9L)A1$tLrznZ0f;P3Gd}MO4{gSp zutb^m%sEFGDkxilZQ@=83C(`eg=Qx0fMn2)wW-N8FUzV7Qg&uWZZc@bn)Ji=CFr+u zio-79k=C1~UQveDF$`;p@t3jQ-fv|Td4{5r2HpYC>TG+|sxE4R;1+gur(#EaNJHl! zSkj+8K&@6u9G@pKqpO_kwI8mVO?1l{x(jd^KwZ#R67F1fE2$}F; z7wZsdiwq}o`E&aUxr{MSq@A7DY9|D(?kNx^Xx++yB&wZMb%pMGo~;?mnaXWOuiZ+! z)Z?=^%=W-!pUfC{qTrT%&>tI&dHG*ibRWEjMrKqr$Xp)6z@<#*w4)0W(_IP&sQmL* zYFsR{3`7jEA8)|*c15>YZJ@b$S4S*mT@+i`nM>gr8S>pGPsY8Mu3YuPk0W@HD^!d3 z22*Xz+DaqzDR2f<)B8Z_DiXt+cnhc^Q78lY8ampmtPgn2-}k}a znBvf!=Pr(Bb}y`MBmX?BZ_W?t+ImJf;-h^3%w!KI3)Llyq)5pQ$bG3Bl{ISChG()?~kXqC88xjn2 zvK)(3`rah(!YhoM&kGw@k>o31*O=?;QX88wyL{dj&jifon#+%@{d3EyJ-qv(y)c?` zI;g%USl5>#m>n} zTvm-@lRYO`EPHSG85>t+O{h9?)8Lvq?YHU5UY?V0iE-OQVYy2}W6>f$7~A(AVl_kc z!YUUf{a8H5zn#kT}rdu+rT>umtovglvMKd(blv!S8-w%u`y06{hUvHW$E^uy3R1f zZ;40k?C0ik#YZglawS6!z)r#}-4lqBQ1nsA{_+e)C2c>Osr&o40hzk>=F$6|yNa@P zlRfmXkmu$XW@p57Wk^FlL7koK)r=61XKDyFA4ZP{a<$y!QpGSe1tf9Dyl7D$$71g= ziM_3+M4m#9o?{SEOt?sgXdl8%ausBZ8`IzBr8$FLv}nP#7vqhE6~xR1$T+U%HNSHV zwkUscrSFKCGJy1-Ww>S4;rMu?N78%P)pQ^;@&)y_k$}lGe{|H*+!n1vQ?L!J%dAi#v&jjq>E*SK2f4yBtl}d zTu2s}>ZX1VR;`{TkB!D>sK@r)Rm5R#BU-KEx>twT%gGx(z))a1ZWk+cz+*%FoP1$O zQbseZYPv%81;pO#g#O;b#hoo(fkVXIK0uS@+)tylW@O}?E7S@o-SD`;-;TnZttO828m%pitA6GM0C5+Cv#Z}sbz)wMKuA`WfH zH$4k_H5xIp@=M*X+cP7s!`=waZRc6{HYKRaE~FM0zG>N*PP7b^sX4|u%YnLL)ZJ61 zz;&8)UN6W=WdSHsJO}A(ZUXQBmL$24v*Q>)4Qt3im&w~U%Tz;=wOpqsqC4$of5#2~ zAVyzBF>{SU@P?Vk@-yn}?Gv5Baq5>*ng_HkJn)u2OP%~k*t-jUs!-5TDrn0}B2VTR zsYQeWp=IR&{cnEmA0tP~DDJ#c*fXCk2>BrF*HI?XKj@b~HopMJAsA-hvKG$;PYb(P z$HE5yXAnVhQ0AAOd{g&ozAkL=-ybNktWF_sphv=9p-wF8FFV@aK4c{H{PIB9Qdwpa zc7yu^aiAQG&XMY-Vm^m@BZE+5*Ih&4<`#KQG$|Q+*G#KVzu#}iYhzjjytUyhC*yE8 z!&p6cV#mx2(h4)(@<;jvRRf&jm)2V}g%V=r%`G7_yB2gJNje!pTm6mbuMsT+vkKpP zx8WWev5BLxNiqy0^=X@E5IQm}U{SOvL>PR%(?(C{;-qf&g;N6&u}L5C-&f5K=Cw#) z=Oygu5^jg?TNNm(23NQ^xr49+d{aEU6sBDp#BM|uCnR`7^m@Q~-j~0+J*whBtmZP& zYcM=+X{wSWMg6F@+TQ4WkvjdQq3Ew~uXd|0)d9>4Y9jOwNY;W*9L8vjG zvr+;c>iwTBqdGc zak$C$wh_F_6j75`8Vtk2=!D{1s0@9R5O8HGztL`c$rPSOM2I z_4~9K93hj%J>M)Q{92J#Wr}i^w=4E%Rs5?nj&88?>0@t90}!ifHmzu3MX*ug)}h&8 zof;f3FLJD|GU1|q(tu}!XlRYGr#(jIx}NzPd`XT~r1*YJ5`D=Tr%T@yMWAa*U_{l) zU`f%unt=U~C|~npkQYm~Swa zk*))UCS)~1ezB_XJ*`I5%6*`+EPnP$-4%~7K(p0XDpG2ymw}opW!?>b^4|XKEK0LQ z>HVCALUf%NdWh%9o|>LaNvHzziusLVI{wn0M_$W2lm;0sxYcp1ZObheX)K?u*D7oG z=%`!v%}i9J&n~d%ivzHXSo?IQu4nRfH8M z#7|~N51oz4<{~7(i_uah;JwqtpjX`~3d_%~Oznf8qy0s9fT~J|RcOz>)N=XzNu5Ah z8nFtf?<=CD_lMairJidOJhw}vmX-7cm(#_+O$Z4vR;^)Q!F+~=*^s7rlnaKz>?Z(> zacShgTy9iY|2$e0$5|1iY`FF(@>p%RScW4oe4>2`xLXjB={&J{jL{gf!YIhtv|GTq z|DiR?uHHd6eYnRr>B#9!;$p^;68lR00SBf9@ z?X}^p^^S8I-Bf1QsPeM7$$CWBr#j^$*%Pz`9y`XE{YJo-E9J~}36kOGf}&LB>>VDN z*g_H1F1J)c(COUfT2@sS$fyFP*a4az2Sv=#!I7nVf|}D>{lc3$l)hB<_NdpOf!4#g zLzi@rVy>RBJfE&s0pi9dYn!dd1N+lw@Ln3-dULKpJ?{~fjJZhgwCJ1Sh#e*5OC6ug5w8}w@|%H%VB$mS;c8ew@lje2}UA z)vehqdb8xA;5Zxm3EWwViYG@Z3BBvS@|czk7>4p+K_v>gqj4I50P$+>(b=S&BY=yQ z7y9d3?R!~#|FT8mW-+QvUiG=OvgWw=hEQ7W8}Osxr6X-tkTGHO``5#_t=41lK?!!~9xOL>k*6=B)%G0y(O zQT6kQw&TH3NxU;X3_clDpIPV&@~Gx8B=t8$b(D+Lb~k)633a>-OdwED7%1cyC3si| zVCwdXy{ubx=GwjCc=OUu<={+g(CK#UIuE$Rv9vfQyJhnk^nO6dww0q1UVzJwhA+>l z8TyZznKHRJQD|~PS6M#Ig(!+mZH10{gTZj-{8xw$s2{+O=KH$^;a1S0v|#}|m?n=TYabKl!{qxWf6#ucr3!k3 z0&k(XW0G5sU!I{?zcLBjtneImixN<{4bzcT2KQKFRG@aXqV{5YKyL&~yn}trqF@W_ zqNtouN1gLy*?ObAs}fWve`uoRFekl+MrEtnxC>AOJU!RKg78pwIT<(ks){ik>NfAX zVY8OSeH3el(Jk)SupxytWW-|Js>A1QmbAa2)rB?$kT`21;~L0`#7AMAwF~Mx7KdwV zT5CAEO^ry~ws-XD>EflPKN2Nn)r)n)ukx~$FQLbekS7k~y5#QnIN?dJZ&y>FwJH~# z_MSw22NAHfd9^zp-Y+`E*x&(((kG z(d~|&iid%M4(NZ=dG3H1yt^bXlUb?O*bc}vUDN2DvxDlN(KgVo0CE);h*v1RL+v~Y zEVOP}Z6(W6-|J2w`Di}j1ovtOli&_mUjsE83&SzIwMJx@$0-9HH945-;vwl8SlNJZ z0}2wQ{S33A5WA5fBo=g{d+Fwm(>7C$}M=K+rdMn0SA=R@OH@S!|@ z3=QKsnKMM!vMs$xpJq>Yw#prLhxZEP9HGt@6gZOqQFw@%Ft=Z-Z=EhGFg~^@5O*c>&cA8LiXb%(4Tv~?> zXd4>pjTFxO%feOwOjC}qpx_H!y!Eev8rha5f~zMEKR)a&YgBwE!bLz}P&XcG7c#xc=oMe;0YFqJ)vHgL2o zWpk{i0aE1oF+{b$h#hWz1)uJHRC=2jnQj?#>%{AY?+a03%r({Ja@ksVzjC6zyFwc5 z_n2M2(QCNb+QU*;uFN!X52YrlK0!+xV}eSM(=EW{BM493_~M})X*HN`T72ygeEzYn zW8xY!M0H6nVivu3`F-dZTA}q?jN_K$d?o_WF}|rdHMmQFxFJbwcO;VG_DCyaL>?nw zTy*u98m6}E_y1Dphk7V494CLSo3ct$Gf@N~OI`cdMOvC?Nu!bow*C zbOY!Di@7-i0|w#Ts6w3Q8R-K(X_P4<4olOgi}*Y{enQojO}AHP9)J^PYCjt$4^zE9 zUo^ER4XMngJgAPOjQdtLoD>XCpZ*mj8hjKMi@7}9&R4bz1VVc$b;53(aat3XbCnF> z!ofx@IJ7uJj(Z6l?yPA*m+$+3n&QecJ$OeJgfkwyL3*}>gHd&tkR%+OyHr*TUp8sj zN{@BTmrh4=b46~~%lsHAjapxXSLF4;~O zzv+!_2hZkP@>G=)csS$~grVBPd^C;eN~3=!&Q$7hU%F^5nZR3a&N+a-o1GdLEvcFx zPP=WnhdJX`%oZw~U){ZP5mR<|pHN(AUA=s{kJo+q6T%rWiMGEi4Od5-v9Sz>!HNa{ zLH+7sizu5z#lV&(QS|+Mn&30j{Y-rMfu=PkQ~hq*T1+h#q8z3Nz*>rKXe~r@(lIH> zsVCd4-P>2+&jhSqd%9T?@z`iG5s`OOLzh1;;@Kl0N9C4CPX@hLkDvVugyFsWWCtR$ z0^P*I*k=NYxW^@298}f0sTF0rZk;q264e~)Ei4m7kF_VIsM@owXD)I=lbsudD?U_r z@itS*aw!7djSCTy zx0aVy6jUG#dUqe-fFU%n?ssK?K91X@}ov9!$2zD zFPuf9H0FubMpguiLrqGap zI+39iVgc&8+L$8<30!3|A4D7doaY!H#{GH%+OKK&GWnGeXpaPIriEwv5j_iL^do8w zbuQz$_!NR1j}+qyeZ0^?nKB%%RQ#in4u|2#yNN+$?FSlOqMjGV2-UtE9PQF!rGUUwMIn)u~(bezu`Lw)yx?A3o7C zmMcwiJyw6*(O;Io%oBXC?}=~rnr}Q15pb)ZrObn|-Nj`Ho<8WdDDj);Xw*G{0?tF( zwDxu?^RfgjxI^es<}X*vWziuEpK!A!j)UpuC8rTGI=}Nqd%W+85*wOYrA}#}5hq~H z{9d?%3KqpHisZlv-F)8pF;Izz)8IX_(VRr@1HD*hUQ=e{qUNcblNcg?!ZgK!Cuwa~ z7#cW(F_Zyww)>gi-jkNoJ+i4}8$i0ntSSp1OjnJuUzdp4f_S`c$_XM2SmPwnJb zRMDJFj>}V0^J4x)V)8-}Z^)e_j1y}W^C>)u^rMBoa>*x!om#=sW`0Ws+TQwl)?_XbIzkm9x<$d^Q_}}DTpYXE zRCzl~Y-SR6m|=#x(V6@#l`r3<7~oC2+<>{ahj{Nb@*s`}AF$SLgOr&^>e8uneLlKQ#hH2H7BrD;zUDuY0_L{D+mhOYdv$Cp`mc_gw*N70u(@ z1b{Lo>>6Gt?F0wWktS8IO0#gEW=L;cm%oCw92Dz6YRwb|8B^Eu!Pah;UlG3`%EE%v zCFtzpeuP_Qh4UAh9m$peJ9r(MVBWfQPAZB8+A%Hg37JVbj$mYZ&=ROjvto@acP}(4 z!j5Z?hYRJPjJ^=_Y5CEs9@U!gWhU~>V8VgM?p7@D+oUWrab+g<(M-ysTOi%;hD-Hj4uXh eT`EJw+IJ<4n&ZR=JK*>K;_gVN3K9yo(H)>C7((6v literal 0 HcmV?d00001 diff --git a/R/wwinference-package.R b/R/wwinference-package.R index 87901102..a463aaa5 100644 --- a/R/wwinference-package.R +++ b/R/wwinference-package.R @@ -10,4 +10,6 @@ #' geom_bar theme scale_y_continuous scale_colour_discrete scale_fill_discrete #' geom_ribbon scale_x_date facet_grid geom_vline labs aes #' @importFrom cmdstanr cmdstan_model +#' @importFrom posterior susbet_draws +#' @importFrom fs path_package NULL diff --git a/data-raw/test_data.R b/data-raw/test_data.R new file mode 100644 index 00000000..dbfe9ec2 --- /dev/null +++ b/data-raw/test_data.R @@ -0,0 +1,89 @@ +############ +# Make entirely fake stan input data via prior-predictive generated quantities +############ + +hosp_data <- wwinference::hosp_data +ww_data <- wwinference::ww_data +params <- wwinference::get_params( + fs::path_package("extdata", "example_params.toml", + package = "wwinference" + ) +) + + +# Data pre-processing -------------------------------------------------------- +ww_data_preprocessed <- wwinference::preprocess_ww_data( + ww_data, + conc_col_name = "genome_copies_per_ml", + lod_col_name = "lod" +) + +hosp_data_preprocessed <- wwinference::preprocess_hosp_data( + hosp_data, + count_col_name = "daily_hosp_admits", + pop_size_col_name = "state_pop" +) + +ww_data_to_fit <- wwinference::indicate_ww_exclusions( + ww_data_preprocessed, + outlier_col_name = "flag_as_ww_outlier", + remove_outliers = TRUE +) + +forecast_date <- "2023-12-06" +calibration_time <- 90 +forecast_horizon <- 28 +generation_interval <- wwinference::generation_interval +inf_to_hosp <- wwinference::inf_to_hosp + +# Assign infection feedback equal to the generation interval +infection_feedback_pmf <- generation_interval +model <- wwinference::compile_model() + +model_spec <- wwinference::get_model_spec( + forecast_date = forecast_date, + calibration_time = calibration_time, + forecast_horizon = forecast_horizon, + generation_interval = generation_interval, + inf_to_count_delay = inf_to_hosp, + infection_feedback_pmf = infection_feedback_pmf +) + +fit <- wwinference::wwinference( + ww_data_to_fit, + hosp_data_preprocessed, + model_spec = model_spec, + mcmc_options = get_mcmc_options( + n_chains = 1, + iter_sampling = 25, + iter_warmup = 25 + ), + compiled_model = model +) + + +# Create the toy stan data object for testing +toy_stan_data <- get_stan_data( + input_count_data = hosp_data_preprocessed, + input_ww_data = ww_data_to_fit, + forecast_date = model_spec$forecast_date, + calibration_time = model_spec$calibration_time, + forecast_horizon = model_spec$forecast_horizon, + generation_interval = model_spec$generation_interval, + inf_to_count_delay = model_spec$inf_to_count_delay, + infection_feedback_pmf = model_spec$infection_feedback_pmf, + params = model_spec$params, + compute_likelihood = 1 +) +# Generate the last draw of a very short run for testing +toy_stan_fit_last_draw <- posterior::subset_draws(fit$raw_fit_obj$draws(), + draw = 25 +) +# Save the data as internal data. Every time the model changes, will need +# to regenerate this testing data. +usethis::use_data( + toy_stan_data, + toy_stan_fit_last_draw, + internal = TRUE, + overwrite = TRUE +) From c971abde4c1104a85cf8cc1e763235c09c596db3 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 09:42:15 -0400 Subject: [PATCH 065/103] add vignette builder to DESCRIPTION --- DESCRIPTION | 2 ++ 1 file changed, 2 insertions(+) diff --git a/DESCRIPTION b/DESCRIPTION index 21611fdd..796fe10b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -85,3 +85,5 @@ Imports: posterior Remotes: stan-dev/cmdstanr +VignetteBuilder: + knitr From 12fb9daa7e802bf032f26aee97b2d16b4e3721f5 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 10:00:21 -0400 Subject: [PATCH 066/103] add knitr to dependencies --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 796fe10b..c4f5c88b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -82,7 +82,8 @@ Imports: bookdown, ggplot2, rcmdcheck, - posterior + posterior, + knitr Remotes: stan-dev/cmdstanr VignetteBuilder: From 8d362f4f7c407c83eff8f3e139e93a1f9b815a8b Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 10:29:16 -0400 Subject: [PATCH 067/103] fix check of examples --- NAMESPACE | 2 ++ R/preprocessing.R | 4 +++- R/wwinference-package.R | 2 +- man/flag_ww_outliers.Rd | 4 +++- 4 files changed, 9 insertions(+), 3 deletions(-) diff --git a/NAMESPACE b/NAMESPACE index f12c65ff..aeb38dcc 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -50,6 +50,7 @@ importFrom(dplyr,rename) importFrom(dplyr,row_number) importFrom(dplyr,select) importFrom(dplyr,ungroup) +importFrom(fs,path_package) importFrom(ggplot2,aes) importFrom(ggplot2,facet_grid) importFrom(ggplot2,facet_wrap) @@ -67,6 +68,7 @@ importFrom(ggplot2,scale_x_date) importFrom(ggplot2,scale_y_continuous) importFrom(ggplot2,theme) importFrom(lubridate,ymd) +importFrom(posterior,subset_draws) importFrom(tidybayes,spread_draws) importFrom(tidybayes,stat_halfeye) importFrom(tidybayes,stat_slab) diff --git a/R/preprocessing.R b/R/preprocessing.R index 4625a989..fd86e11a 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -129,7 +129,9 @@ preprocess_hosp_data <- function(hosp_data, #' @export #' #' @examples -#' ww_data_outliers_flagged <- flag_ww_outliers(ww_data) +#' ww_data <- wwinference::ww_data +#' ww_data_preprocessed <- wwinference::preprocess_ww_data(ww_data) +#' ww_data_outliers_flagged <- flag_ww_outliers(ww_data_preprocessed) flag_ww_outliers <- function(ww_data, conc_col_name = "genome_copies_per_ml", rho_threshold = 2, diff --git a/R/wwinference-package.R b/R/wwinference-package.R index a463aaa5..fcdd92d9 100644 --- a/R/wwinference-package.R +++ b/R/wwinference-package.R @@ -10,6 +10,6 @@ #' geom_bar theme scale_y_continuous scale_colour_discrete scale_fill_discrete #' geom_ribbon scale_x_date facet_grid geom_vline labs aes #' @importFrom cmdstanr cmdstan_model -#' @importFrom posterior susbet_draws +#' @importFrom posterior subset_draws #' @importFrom fs path_package NULL diff --git a/man/flag_ww_outliers.Rd b/man/flag_ww_outliers.Rd index de6ffe9b..3fc27d1b 100644 --- a/man/flag_ww_outliers.Rd +++ b/man/flag_ww_outliers.Rd @@ -40,5 +40,7 @@ step. Flag WW outliers } \examples{ -ww_data_outliers_flagged <- flag_ww_outliers(ww_data) +ww_data <- wwinference::ww_data +ww_data_preprocessed <- wwinference::preprocess_ww_data(ww_data) +ww_data_outliers_flagged <- flag_ww_outliers(ww_data_preprocessed) } From 194e63355813b83acb5000269a98e960715db9a0 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 11:09:57 -0400 Subject: [PATCH 068/103] try fixing flag ww example, specify package for sym --- R/preprocessing.R | 8 ++++---- R/wwinference-package.R | 1 + 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index fd86e11a..88ed414a 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -156,7 +156,7 @@ flag_ww_outliers <- function(ww_data, dplyr::group_by(lab_site_index) |> dplyr::arrange(date, "desc") |> dplyr::mutate( - log_conc = log(!!sym(conc_col_name)), + log_conc = log(!!rlang::sym(conc_col_name)), prev_log_conc = lag(log_conc, 1), prev_date = lag(date, 1), diff_log_conc = log_conc - prev_log_conc, @@ -178,14 +178,14 @@ flag_ww_outliers <- function(ww_data, dplyr::summarise( mean_rho = mean(rho, na.rm = TRUE), std_rho = sd(rho, na.rm = TRUE), - mean_conc = mean(!!sym(conc_col_name), na.rm = TRUE), - std_conc = sd(!!sym(conc_col_name), na.rm = TRUE) + mean_conc = mean(!!rlang::sym(conc_col_name), na.rm = TRUE), + std_conc = sd(!!rlang::sym(conc_col_name), na.rm = TRUE) ), by = "lab_site_index" ) |> dplyr::group_by(lab_site_index) |> mutate( - z_score_conc = (!!sym(conc_col_name) - mean_conc) / std_conc, + z_score_conc = (!!rlang::sym(conc_col_name) - mean_conc) / std_conc, z_score_rho = (rho - mean_rho) / std_rho ) |> dplyr::mutate( diff --git a/R/wwinference-package.R b/R/wwinference-package.R index fcdd92d9..e0e93041 100644 --- a/R/wwinference-package.R +++ b/R/wwinference-package.R @@ -12,4 +12,5 @@ #' @importFrom cmdstanr cmdstan_model #' @importFrom posterior subset_draws #' @importFrom fs path_package +#' @importFrom rlang sym NULL From d7f335d409574ed268e85a310ac8db3f72011e5b Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 12:03:31 -0400 Subject: [PATCH 069/103] add dpyr to lag and lead terms --- R/preprocessing.R | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index 88ed414a..93f186cb 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -157,8 +157,8 @@ flag_ww_outliers <- function(ww_data, dplyr::arrange(date, "desc") |> dplyr::mutate( log_conc = log(!!rlang::sym(conc_col_name)), - prev_log_conc = lag(log_conc, 1), - prev_date = lag(date, 1), + prev_log_conc = dplyr::lag(log_conc, 1), + prev_date = dplyr::lag(date, 1), diff_log_conc = log_conc - prev_log_conc, diff_time = as.numeric(difftime(date, prev_date)), rho = diff_log_conc / diff_time @@ -189,7 +189,7 @@ flag_ww_outliers <- function(ww_data, z_score_rho = (rho - mean_rho) / std_rho ) |> dplyr::mutate( - z_score_rho_t_plus_1 = lead(z_score_rho, 1), + z_score_rho_t_plus_1 = dplyr::lead(z_score_rho, 1), flagged_for_removal_conc = dplyr::case_when( abs(z_score_conc) >= log_conc_threshold ~ 1, is.na(z_score_conc) ~ 0, From 3a5d7ecc55639c9c1e5befd302a1300b79668bcc Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 15:26:02 -0400 Subject: [PATCH 070/103] add generation of test data to package --- NAMESPACE | 4 ++++ R/get_stan_data.R | 2 +- R/sysdata.rda | Bin 37948 -> 26963 bytes R/wwinference-package.R | 4 ++-- R/wwinference.R | 44 +++++++++++++++++++++++++++++----------- data-raw/test_data.R | 7 +++++-- 6 files changed, 44 insertions(+), 17 deletions(-) diff --git a/NAMESPACE b/NAMESPACE index aeb38dcc..877fcbd2 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -43,6 +43,8 @@ importFrom(dplyr,as_tibble) importFrom(dplyr,distinct) importFrom(dplyr,filter) importFrom(dplyr,group_by) +importFrom(dplyr,lag) +importFrom(dplyr,lead) importFrom(dplyr,left_join) importFrom(dplyr,mutate) importFrom(dplyr,pull) @@ -68,7 +70,9 @@ importFrom(ggplot2,scale_x_date) importFrom(ggplot2,scale_y_continuous) importFrom(ggplot2,theme) importFrom(lubridate,ymd) +importFrom(posterior,as_draws_list) importFrom(posterior,subset_draws) +importFrom(rlang,sym) importFrom(tidybayes,spread_draws) importFrom(tidybayes,stat_halfeye) importFrom(tidybayes,stat_slab) diff --git a/R/get_stan_data.R b/R/get_stan_data.R index 4a6ee965..5317b60d 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -476,7 +476,7 @@ get_ww_values <- function(ww_data, # Get the vector of log wastewater concentrations log_conc <- ww_data |> - dplyr::mutate(log_conc = as.numeric(log(!!sym( + dplyr::mutate(log_conc = as.numeric(log(!!rlang::sym( ww_measurement_col_name ) + 1e-8))) |> dplyr::pull(log_conc) diff --git a/R/sysdata.rda b/R/sysdata.rda index ef0eb9fbe85e8fb3ede8051a85ece914c25377a8..1dd714e66bf6f7be64aaef3d2ba90909ce9c1e10 100644 GIT binary patch literal 26963 zcmd3O1yr2Nwq`d@a0?coL$E+_cTaG44-hng;0}!hcM0wU2<~nHf&_=)9^Bn+n&>(A zoq6xxnKf(8UcGjg?Am|TS6}U_{`>D1(lFs<6jY~BRQE;Q`v{1V_qh9?WWh02wHzR~ zVMGN`l7CH=u0bHi!a|j4&Y}?g761`K=oYs9%clSbu+9P@SbN}wBkT+znA>X?xCdQ> zG@}Vt(<5%HK>{n01;W!RMLsB-1OU^hc%T;mBzkxPj{!bV008ISy{Dftd9(l^g@Uf@ zuHlxc5g_HhV#t!OoMa;z`}yUNWcL4 zN+6E%u@YfK8XzT2ps&}JkSKB^fS8a-0mag4BK?71z=YuGM#3vz$~^bzQ}%Edc5CcVPiQdUKyL z`H_ccv%J(xmlL7L=+AoQX7xNq2fU92fvc@fZw{ zhQLDr42Y8Q%oa>DVapciPGd1?kpckFbHlw%q^Q2=jhNzPr%3Zs=C&JtCRDo4j>AIC z2~71jNr}fgAxIQtU}YA}GtNkSx^8I7%P<_K$SjwV^Km)Mlocg+q&>l8Xwz95xDm+0 zf)8=w7CWS!zDXU_)N8;|aPt~}VDoa`dvt3NHT^4Z@}^ba6@Bs28GVmuC-k;`r`lO? zJxFJVz;Xl`^c3nNMP<^V7Wlv&003$V%CkNy@#R8c$S~+w#IjMb+sMZs&CeRyJ1xN{ zi!%RMsiPW0skLX)?|zyS^CbT3Vb+{(f&0xGPqIVf&bwU2{alb2@bT@ZY&U@dOqBlp znr8|;4>iqXK05ACF2C1_FEW&nJ$F}XoUrh%yE4D4k>@elJt}CXDLeq*sLhC3I{At} zR`1U;h=BL?l4%Xjz$!j!u-SLi@@x21ZZ>ft_2vyA>?rqfAruhoOkB7seX_=n#KE%) znLLjN{qM55=&j0idQ@0Dx`s<8%Ri(C8{qXLzuCwuWa6s7C~(w%*)j(tr)D&tNp?w$ z>!+Pr=+-ch&zBlpH^8tHXAAT%XyFSvS40@JfrU%$4R}<8+MRf1$SP zF9dYa6M<`rFxZbw7De8+25NigtP@b9ZCJ6LyUfSmElIbmaU_TqfWU}Qjg@9M6vRqs z5)@^`6Fe)Lifu^Dj0zc3A4}sf5dm8(*K`c{FruqG^WYoWc@(@6wFh-fxtA|`9FpDu zUQjTAlNL!c);bal0ayoW*u=8g!BC44FTzh81r$dvIa;Y+hE8rha-uv)l{^1>&0fuo z72Re}zv!ob8729VTl*3VM47-VZ!g4?3063k!OXg*=n)QN;P1d3f|cvD)!Q5s3FyB5 z=+%;tp_)aFqERwsbV`9#vx}TisKKG-F6{j<7Iy+qvA<~mx(LwAm-%F3Bke@aG)W(- z(=DHKG~~IrDZM0FK_~1g5*AXF2KyRJlC$LdT)KdWXMKg*b`lFuaDzK55t`Gq#STE)x6i}hJ4|P9tU$NW^-9lS**7hw0%?PLf`_E$z+*%k#`hq0mee?8-au{Ft6p% z?S&bFl?$08?L|#8joy!#SM`}^v)0=!SPld<^M))7SV4X4-r;-RR6J%5fUns(=M;s} zZiRKJRmJo!*2U%QfWc-@Qxc@T;|$KD0-i1_raj3gYX)rEM7{6aP8MHb8S@a<9Yqvy z73u0SJXb4?DIwI{g6nCS7U6bG8H6LdCb+w0Nd^qCpB zXVhX0NHXE(ZE0p$aixnG`UrNd*Q7(K5Bw^ad@W;fWscbMEXb$B(4Z_NZofXhgbA4G zWQYEGXSH?*oph$r-w&8S?bAz{z%X<0z!b&bGOx%3lSS}(Aw2O%7kK&BV*=&Q-*akJ zoJs?C@+&r9F8l*9;yYt8m~deU1fFLwNWAf9q`u0L1965fqsr+MsAE&Q_~qwBGCeBC zJvI%WVc&QSn7vgKtb_t1ItqxTUGU70pPO8&kD2qVVO8C|=fSxwKVrMntsF_7O~;nC z17+#R8dV%O^_{=+q#o8(q774she1aqOnIK0h&<1%8sD`JwbB<`fb_(Vy}C6Qg$!(n z{`{dRsm1Mur*reO&msjc1p&iPRD%(<^qvWd$(d7i%$Gp#j~3|pu&LMSJ+q+gy}=NW ztih~Cr9=rWPMTN;3p`tgd54Wg?Rvy1mP#}oNBr5v(w`?sGM{)%rZ|%Yy3*fg5sb=0 zspO3Tq%PgWzj>(vjRCy6)hq_-wa6b)5GLA<;7)7uviQN4tIXr(TfDy0hch_%r~u${ z*%XN|hp8Qnf6?DjV_&K(<2{Y_p&xT~E(g=I?YSMm>xx{NzO#mUqB0GLA@1#v01ya8 zzuhHl?z?46NKvBUl;Z$CZak?z3K!`1=hi)me?7BhHl6z^G`UitW!zdUTx{}MF=%3; z1*+n_0*Cb}Ji9_qpWLHK@$KY6=%{Flh%6G8@mCBC&z>q7DFherZn-sjdSZBJx)+5I$K&cAY7U|aK zluW9Y=cvxAWw4>eyrr{m)9~W1R7ppB*#up8@wsN7(7A^+d+fd~U@^JIZz45lO&U$` z>#qQd!+Pa$8;}^0ZymfiZrt(i8#Y_(ta2U-*eE_%*d?7yJBv7I5$(<`)CkO~id4a*s>K z>G$e#2%sf|smmuT2dU>_Ak0nj-&+m{d`I(QE;g3@Outg4`yXLxahSWHBj zczEh~{$!EQuRjuV&n9gZvYbly(+%4GxC?r(U97)hdH@)HgdQDBYdHhJZZFRRD3gdV%VMC`VMJUp!$-HX?0RXpk zKqMM)neVVhrsu0drYlz44U)7H&%=~&c%a(AQ^Lv?%wfO~NNs7C%=01Ea%Hgat}+OY zA!w}k?MzaZV>!bq&IxMT=~^SKh(rrW;A{{8B?l5y%zu0tc%CS%y%e}IhpIk6h`-`X zQekkI)LNIi%Gb^=b@slSP_)2xy3A5iacMq&U=n~SY^aaYjTq?#M$z~>Tck@QcXQ=Q zNnLek#$6In&~UfQjSEG@09E8Up13aX^;q8qDG7J$f=pB70ci^@P#!EWgck__sERiW z%%%Dx!wN^rI*U$<7w#Q^l`7=$3Gm z7tj!ZdBeDOtpkAT_kYR3k(<&b`p_W)alg72!Mn~!)_6kZ}uDi&GK9wZ$ zN4mmN4-Alb@?m;FZ{zk%FbPTh{qc(0*V|sGof!I2H2^?3qXnj+SrH-axXfw@;jJll z^Js`kWskmM9dga+Er>`g@EAw(^a~n>Y0f$Zb^%duz4|yIdueuKT$lZy~ zu#63Zf^r`FTCLi56HzU75m8q^+66t!B%FL`yoUtUW8#hHmOiJv zxhHvC9vCUwdT$nVHLjE&b^S0y35n=(hvxNLbCs9sm?+9ws9y1~(kD#)eKm@#!*w4UFeA_r3`p@zpWH+#^oq0@Jq*YtX!q}?g zMetJ9m12?7*BKxn9R?E`_9ddfkQF_PCX{Lvp#n4K>BC0?h|J9&dT)HZG6W8MJVTbT zgRkTe#pTvJ0>B<~uS^djaXjlRuWc&U{WPg4KhoxL*u!NYLVb*)l7PzhV)F2bhd!%3 z5xJb#gJOlbZdQ#{%HK<;mYh}b6pGjvSg#`J6P18!a3+znO1LSS3=lKJVm|?jcu91{ zkhMkL5a@NO?vMEy7|RJ=>g9ICNYKiOkJ8n%6oyo@y#}7s65$XlDJk|-_kh3}Avo{! zESVD^RNK!xV$?2qRD=;T<|t zHwA;FXiFl^(C54l(6kM*wPxf`+flMwtXA#54wa9kVrD9u8yQQ{QYi&x*Sw#f^7XNc z@o9859Py)gK;4kHc!3bp+UFKW1X{&3(^hDFK@Rs2CM4Z~dWv@DrbF5+5>e(n0iA`K z+tn`CwLSudA`fjDx=VfuL2-GQqBWg%iqE{}fk?~=xllYPLc|{$9W3NFfPg%)r88<$ z&~7;YAXK{Yg-EJA@#WO^IDznirdoboW5M&D_Uz1DW)Z(vX!2|y%ov|+tTw7Yq$b)V z|L8frkt3yKRHVuBHI!7C)bJQ}K9ZO;ZjbFI;`$EP++lm+b~;H}y0~ay+oN6f!^ZgN z+vBLOUA(u;(rsle&dTrsA8jcP2qL}(bLvm=1f5GU${GS%@m$mOc5w+kkK7qdh?Ivf zQ_|)}?nHY9yEs+(mPfRYJ>b(u!pu+?n;ITZcy50*mPy~G;FAvUnF>dp;99J zj9V*+0>l~A#r%enA+5vb7&r8e?zIzdOIH2(yt2jEYQ!M9V1fVsMdG)7OrOg!n=m(c zR3BIhLODI>tvFbrP+KK>ho~=|>B9RrCm1+nRv=ddiVpq;8i?L3cS}zaxZiECT_gQmuo&L1`$y*HYe0fCZfJn zDnfhD!g{*x|IlWE8hcfZ6(P;2x!9#!*JRGXp@6Z4TbCO8pZP;vI~ zIq9;scOurN!R3rOQbJuFOp#91gC*9rDYP^LWIcDwjGYUw)VX4vSq&%-5SYrZuwkEs za^Qj7nTD=JXxmfe>r$02?dfthGEeyk>6PI(+g4?x>Fd01_N=d+tO*3)VOJ(A94ZG} z4~)tvh`DR(!aREIo6Vw&E`S@}eSAKB5%i9UB_T}1en&z34*HABLx#TM_zq+EBIQn8 zTm7k7R5ByvL&Qh7Hb)3J{WLt}uRhNx$TE>yP`XvIMdr1TG4=Yx(pA>Y`#n7OdPdhJ zVM40A$xajGQV|nmHCE9BTJB&M86O;!3o~U&=vy~dJju~bifmRy&65xVglJv8@-ckY z!nmOS`9X~P-f9m`u)!mtx@A-Yp(}qS6mqRE-St;_RrLy@ZZ8-)y0R)QmTcRUcBQ3u#5A1VxQm~H-7xk=Zh06 z9pFdl1XGuU#4qf*H1a@m$6xm??ZeSDde?`$G>#GE`8)antK6Tk=y52W6nB~&<{xap zY7D}T@Wg(Av9~Sm?O{VnkYG|mRw{qcpjUl3+WLsh?Ky?&=V5glk z;ifcYP*Z6X9Cbi96??~lKXSc?J5Ev$jAa@nv{1Kz00@LB01X!o17CmCECCvM{13QKi+vP9g3d*JcxVsGh==#*xMk_QYp8sIq^cwq485GwDl;>QEV z>tKZ31L|NL08#&`Nkd-a_*ouokYbaxM&6m17o%*H6?B4B9l(3tmYOJ{gOS+I4E7 z%!!%eB2Eo1VKy1?qjK5m!LP{@RA>?rWSC;& zn1Wd}@PK$PTTY0z)3dd&q zM#Sr;v!Y{^FzWu2O`e_klONk4r*v?Dv54EUtG2Nnwy1*S8;Umn?jXwiiEXE5VSPe^ z^w=!fvKPCQUEOpo+3cGkPlze|&jbFKU9K&+PBYv#_z7#kti*!)G86fnV0wiu)c zC@Tx=++0CyjoBaBC~J&Tm8;!A#(0_|cWn3v?ah`V2C@nz4*C~gHnP+-4`$3TQ4{LpGm2eSK_C;NeQ zH^r7kr1%eL zCMj<}#QKSkkj`?G?W;T5CNt5FlszvyXu#k*uq&-y*~EN6L4HXAkuw*05tv8_^p77r zzMlbF&LhY&80AA7C)_Hk>0pN-Gw}!9W-arO@5sGg3y1!>Dmj;7e}S)3*?(ize|C@N zDKuyQFBI|aS7f)? z=X)Wz-{u$JJ2)ZqwAOOY%h&k7f?P~q6kA>t^`7m6_iCv9iv;|<*tiYz)MkOt=SJ9y;NcQ^PZ~`2?eQ;0?u_XW{S9Cl+F{$SYME}1 z>VL%lPxW&+kYO;gUIaAeb2zx+mkV{3>S+0b6LR5lCEulxfGPt@?H;4Q!O5>>SyZ=# zHwWRP-XDtISQW8;hpRkA0Qr7Xfk3tvj`NGc_Yq<3M|^>>?I*<_55UO{E=1X_HF3_3EIiP2c5h5%6<1#nmb*_bvKf})%mHz=IT!;L5t`DP8dC% z3=!o1naIBx`Uj!^0l>dk=eGWs8PYAcFe%Q3Te}{9(8)As=F1MRg^xEkH`janxXS3r zKKJn;0}>bzIMm{!q*(@iXqD3BJIC?n-CQ7XVW)Aqywz-7LOz3~jP3Z`9t zaXhbkfCHGrQ~I6V_lJN?Ed$7!C_1qk;JAlUqb4tJkl1N0#c8$S{uH5c=QqGdxyWY~ z?%0<#o;)vyHu&iyvTk=1srNvNSmMau(6zpAyy$|33jLrh@nnkkF!bWz!{=X?@&B;O z|Eap&eQkFsOuS4-3i^2n0gK7dP@X^2J)%PXkWdTS$cB9D-io{tH7n0H*h+ zES{qlSY;)1JjfLsQnfB1?F-Gm<>d=x1l)Hi0ZW=?L>j^M(w+#+bgZh4qs?C+a(8sp zD91W~rM;4<&Uf^-#C-45pc>!e8|&4tR*__-*e@wDa8H&nQvO>UA6Y~d)>%}W&eqA0 zMPiCYia#a`fFy*W(HWu98K}9Pp6xuzs&FH1nIYh$r&xsjA&3YW_iP-ixCcc2zW06R zivMpwGY7*#*0p8wJ!*Bo=$8XP`GK+{ATKZEJM)gVQXw$17HTOgo1lFMROfyc3SBk7 z@nJ1Go_ly*47A)GU^&iMI|>o_*mdQMBM!KCjKkD#x4+QjW`U-c_Ba+NJ3bgF^Mpe7 z^!$ayJtT)Pc)yYP=fp+olkz_gLy4S{r7`{yp|xCe{%ko zR|pyGeY#BpU^z}Wc5xqD|E$utfRP_u1VawlV~mgs3?v;4K#CH^^70CS-(8tQS`V-m zce%4~HaM&houBiUy*yP+l4SPf?_16A*!zU1NuEx@Dqv92gyc8Adnmk!fbbX)NfHHz zN#-SvNGe7a9{JY<+d`{IID|&xFp-#$1Q3=?PMmVp(h}IRayNNbUv(|Kcy17AUymA$ zXX*kYC7!=^|6$D>1N}RtgwWqXwl{1c2jvN9xO_uKsBK$6vW zvs&`sPQJ#FrX%CnV_uQ37b0tx)Jd6k6#_D+Ji z#-n>D<@Xn``_WLPh480!^8Ktpj;T@xfc~0avCRbf-qx1ler;l?_2h{1D-I=)-Z?%d zk+%bh03Pit?b8nO*wvq-{vV9-m^xS$kI0aLKnvmUM2eh#njYJJ;ck3Dkn0$%3iV>v&>?D_hx zez7WFgxTyA>a&*ELtEk(NT~#}WCBOuPNMO7UdUp=Cz&2v99Bs_^!26L?ED7B>c4}H zX>3k1?>@)0x?grCtZ`Vof5i8m4yyV&Xei%TA@i~qtJ9NZ{7U7iWqA-k)|96xt&T6w zY0`%k+Gi%VvTqc^%Tefzsx#d&OaM@Teb@-KUOuZO9BAii3-lAyyari=*U&g|p<(lv zT!!Z8yudV!pmS5!J+9IB(a1}By-~(GvL5>AG=@9na~P|Bt6#sZ=Epg~UNW`|)&<*E zd}7%SA5Z&)R3>wsuQ;MTzIBZBaS2m)JlBIm2S_s&Sur6$g~ zH<|?bp)pGw8V{46-N4=I0oeB~6W6Ch*e?$8?C@qL?GpWgqaPe^6LU3AwT2XUzBS4W z71Y}llqL|mGH;n%=sK*O42l#4n$|6E=&#V*X(d_9+uMC!*_4I0oVzWN)~xrCtbfq4 zjpu1P*0jL{X95Iu-oCSOFY(q${bvyk^ z4u<1jdbj2?c?XB{Id$E2%%t8{9MRP$CWpER`?p+8?HOh3U-zdCq-F3?PdB(#a$J@ZCe@!UO+ zCWwO{&dj{HISjo}y^NV~=QM%OMbM}IizOpzv)4sOm`l9h;V11R`x10ybms?d(m6yV zuP?sH^{jlmAOQ~THrI5}9jY#*9lV!Oaq(Vg6Ekciyl|?j-m20r4?HJH(@cCXQ1{hQ z?I!NCHWjXngIU>NsXf}v(6=JZ9Q@kN8sysZ9L+02`gl4z!o^ZXwA7QQtyqD}LgZV_>5-;D~3B^FE z1l7U`CBbS#r-f}u02=lB+Jg2#*)ycz;qgS2E36s^m85Sz#*6|=LrnOz$rXhQ`}-QU z0GjNKz}}RU7&_WXN+niVfdK%NNJN}OD-e`KfZ7PPfUrO&@c3z+dK~L1Hf?N!j$P~- zEiP*|av^&&`9?nsEF-K*pooyHjzO)xred__kYBx^Vu zH|9>dc4$4J0_%Q8{i@~dzdh5=mI<8cJ79N;6=W?S@1)~j|`D$fwj>VFnG_!G#6;FM=x{BC2af9Ja*`~UA7b5 z44^N)0OG(jd@XFv2DUA!-iPw~Ts>e?1qmECavtna4@=_$?~FbUGbb1U4%s8*S>cA( zx#HV7yVaeQf-6e)t9KYBlTy1#8ziu^7O(sH!%^YNrIt!s-MMsUD8#F}7tc&>8es8{ zb!srJc~I}1UA|@WHM^)kMNTxw6>I2}z@$%Q#&C4q-zGgf9q)R}NGw@it*sNbuB&X* zaKu}~SDjOI9SEmfNJ9X(#W3%c+rFgtVJtA8vcp?4rNsI?=mnwotA1^Y104+=!c+qa za}7DAnzb5|Ae9w)Zkjy)v(k;eZnnfnFNZ_w+&FBTbQm1BXiw~L;WG`l+@B$=R6lc$ z)|~ug&K>s#Iyd(7OE>IoBVDbl8hPERPkWC$7o%rnI;TEql%I~=ILR!0segGiRP(Z_ zVZnq5|EM7@sg?X2FZL3{#%@WT#||1LrTVqkD%Eb6N|R#m0OORJ1&L=ibyUh(nP+&g z>9JlxYC00;@LhnvYj*h#XTO4c%FZ~B^61e+5Jy#mLq?@};&EHIdq(T6ugZe{Y3>+v zubH5O*`1l3oIO*o?br<41Uvu4?92I)ovln7XZD``yokn)OA?UgjA9h;`N-~~$KG6* zoMDqyZRX4EaXt-1QmhekNd5yKRCkvoNS;kUM0>IJ$_kan@`k4MLw&F`dqBXo zb%n+IPCk6%?vM^d6&`(pBa`kYLvoTvNNhbgEsyfMdTyR*S}x5mO;#djX}a*{irHN& z>h0E=D)CGKml&XF8X^nrN%!dUCSB<6Os7xf))go_F)WRZ-W z^Ssj`p-&ncvhWgT)-;cdZr}W%v(K#bjCgEkkYHQv4Q!&u@S-#;#@8>-swL1qTH#Om z#a!7~F=9*PC}2JeV)#T+iU|<8-Ze!M`Ha1KX9)k8AZ41XL;q1r6>$C}OEWO6(0Q4u zjFrp%9i_Jb;bcp?a1XU(1Q?x^2a^(zQ#OLItHACjLtgkQ#t#&3e2e9 z@b5u_dRDdO^#L6YM02++aBzTN5SN&nM_gOGn$9V|GcRu?NwxK@2Rvm;Fnwb9^2yx5 zxs!M6mSAT~ZT-egqnO+Bl@2kQy^x2@`m141GU78sA!2ebJ^D39C0;(CK#sZ9tHHE| z{1vL6YvsjXK}vQ&4a*7yiKkuvmR;fd)(5c1240TETveO|FVzP5&@ z^c63g7JtfeH?H68#xQHtgUNBtPKGR16x&JrTGe*9N`u*=%J&5!C;RsP+A6Ge^jB_1 z^HP6%ybmWnS73>r*H?j}UZxv7`{Wo!S3Og!FB3e{HDuXsW0h32q?%3VmySgx9xuUa zR;KCWwLCgoCBSqtF}4$G_{it=8t+&!lQK1SjP|KE0h_+?;h|FsJ>{tCAK^-o#y9%1q}Ssc)4Z5FX-xSgtiVj^xC_@{C!UZdrU* zzY*Y+a^=SKtzzF@At!QRda6#`%C_&z4FVR{(G|OVa?5cht?v6aryHd3oDM9wa(bM= zOzbbGw!?|x^KaHouDzCdmyyQby?Q+z`R+?(53B~cSDR>5d4-UoUCs-laN15@ESM&- z3b_|+5*@Q9lbZ=&(c+(ZQ*kF3%lLaD#SZR;NSO(R@+9auP1hg^g!ku9K^J^-Jduw} z{mO{-uCi3pVFyiLP0}<)R?`luKt_u@f@_S?FpF{ds^MzR@UH&-iNagOm=6QI4<7Xz zM|{C(Z`oj;MV+XXxoBVWW8-@^(07n=^}06lnC}H#s^*I>y>-v{lgg!2{&)HlQYeV1 zEmDgMZ06iJI@Lw?G5+vhVbld-M>#4`cU3W}il<*ael3{6G>;FHDV-rnvsm`Jyur6A z=n)~H*;?T|H@L)H{KZ%Ww#r_QW4+v6rK+Uvi=+IAv$YMXgkK#q zOsWS;J6Y>aT5EmIe*Mi04u-8yOF;@Fr1Sx!b3GxE8AO`{%{B%?v}XkZt+goj*13Ai zM~?_8J1w^`2U<&)?1A(|ee-D%1l>#mDOS@}lR;szQ%~z+8R2zr%IFImonQrosY zhE*yh*!PS2E(Zp~dwTdKbGYyobVyz&(C%J~`(s5`L!vjOIxiSSNr~srZkP7n^#-0w zTFlKgAmG-)ugNz!4LRD<6RW&ta+Vk%zQos}+2x)ilA9|pI7Y4v%gV4ea%2&%$;-eP z3&6B4cDAa*;M&nP#HCbva+M-E7I&l}JpLg5ji-+Eh3#GJ@;Xlce#WR)(zMZlOX&pi zx9aWvv9=@>0>5uIW|)o^V{r|735ETZLg?uJ5D9B#B8A8kh3|5|mOirYA@y}jG;hQ9gHO5Z_GclE#1?7#8YMDJ2mKRN3-rR&bwleiQcWtnW4 z-~Xz&$u6>(GnVOMOrl{`t7lD}`Y7Ip9#U0Q*VD5VPpj}jSt3T6T3P916L z|HaKO=ev8K?c%wM2^XDoID7v0msPB9#u+ob$E)*S+;NU8=Ive{XurDe)3xJT%K1R~ zeEhf5#*>+)CMJX40~h0Y?HSwB^X$6L?S^@dqZ{wep@SR#^fjKo)$>a}KB+mX?-Y3& zfdLVGh{liqE)ntf1sMOP#^vPf?w7H)Nw#)8oMm&+k^XRo#l4r`T{$;=6_9*3x;(XK z>ek+WwR!l)p`z)VvD=9gTm6)-6|>9F5~Bkf8td3nX2;r?TFHs{2CXkMAGW3D=w4YU zobots^&TwSEg!iq4IDZ`IxQ>Q@08egaO~}eEvaj=6kaLp!~n1DgV;_ z3r+4~3WckH#LvBjUv%30zOAaP(Chzq{iCqNWvYJag2f&buNmF=vtef#ky8p!XhZ+C zO^Qbp1oT5^Vw>JM%dpI)<-TwCnEN080rv;y@%Q-un1z2wV?`w;Ew6DcAFLc!Ip|11 zApctr{`0Pf5px~!I1xD|pRe(xyt;5$KIU1AZ^&> z{M%Ya#->P?SeM4wZutvj`u}OV{}IlA9?S&^Wn@m`@=?I7Q==2l0DCdBL*G$^+W>vv z8l7cq4}^XSr~iSwztiq#L!$F7$m)B&+wl6s*-}z%6DNeZ{};UYIS_+pB8P8veaYh? z(M{JjvFx4n6yr|+O8o!4>kdmr*Q)B}bq5vqURNwbb)@&WvU{5>ahYxMUPDOC&XwKq zA7%DBzV~x@)HGM2)C@g4uQ9|~9X}nFPQ!&^=Yg?y=5mrCKWoUqs$RU)Y`M)T{O_lK@}YeR84Z6%#i?p1J`4cyLWnRlP+qrUC! ze|6bs+6_cetv1F#7~veS2XuTo-tdeUA9iXo9N*CiwMpgdcRNU0<2B~I;NaeCy4qI$ z`;$t*ac#Slx~-yK@Sg{k9vR=?9sXSaT~xP~zK!;$9`YNvP)HNMhBeNu@Ar{8zx?>p zp{N!f41k_*cAvEy*Zzw=^qfu7Nz(FPO7il4yZ=k+|1N2#JgR^J)XKwaKu?mx9Af^^{&0QmLx9UW*$f16qW|MK?z?)zU{Am885kNGiff#2oGGowns zGrU)4_hbL#>sQVF*R~L##%vvV_7HHjx`=1?WsruoTmY#6@+Yt>isqCEN!To z1@NB}T0d2`ulUy$=z=C$SEl`$$J`2ayt~drFin$v&C# z$XhuFIUEEzyM0NwV(;pFv0)U=8YUF#005y-ZUSl1exNx3IJG;7gR0oKF}u;cH$msDHtS$7Z3=NdG4`(aL|GA;f8FJ))NlPbhB?XqmDCz3J{FsvqHyyH z(WQ_%0N`Rw=+dCtOnCf-dIl)37y}6uxq%Rsual&;jZNgDMWzg|5xO7!;i-fIL+pk#3?j$c7aQ(hL$gsy$5Fw2~htuv4=i0Ru z)ny((qEs#&b4~V3vGZRE&m1iqbl>zB|$QwPZP>Og-O33!Gu zPzxQ^)fKr!b-|zL8WvW5Cb9(g1U~hvnzleli*6;m+nVQ^ivL$A9$P2U1)>&7$@6eb z$lTD*zq@1;?P^y`d*jPHg_Noyf0g@(l!~n`)PeHdM7myUUW41L4~h#FVW~s_PvW=` zH={G3AmrR-d*eJd&n`{FjkkaD$N=eF{0W|?1yY8oCJp zpmIZY7}7tZ0$yB=I6mS40H5NxksMlgiqXsX^U80nDDN{-w}W%7+PL5aavfuUU;W;X zf5Ixd*1ZEg(m8ttl2q zz?Ms|6<#8Z3EuZ?9=Q@Kfj_C!LPp##FGweC?odSQu}T%+4flYK)Z2N@Xt>=3A^*1t zoVv;Vl0y8t=j^zjE=lTLupBy*EC)K*x0H3!Z&p#!#eAQO$`P%;jg3BL0ENd0s0qxY zC!Y+wm$4i};7$tJu&q=vXQrsm;|py+6LmhLw+pxt^*zwSM8BATBM*FAjW#bO?ZjAM zXd{Fe8jTmOI|YXi0|Y!kZ$|(FKl*#JkQ()(pHy@olBn%LD-#%Hrjw+v@4PiSGB;Q^ z;8IQ$A!6iCZhBV72?{PZ5sSu%&U^;!|2X5r0+Jzi+_!*+{Z?-Meyf?4{xZXg9=M%*z zYaCuLIN6md3ru6{di(u(80f1?&(O^MFFgc{yR!y_L3PIL_H)m0w;0V~Fr^$ab=qWx z(cz5*aCwIKNQ}zPM5FeZ_2cdq+jaohKGr)uR^qQuD~uWE*N&^-!}q_sR6>gfF~;pO z7{PdAEugsd8U?*Sy7;{IE_;{_ArOS_$Cg zxKj8YaY1-4tk1tF$^7jXsWE-(BRJ8c8HjK3>ga+p)*&jpt|wzf-Bue{bm(4#aE-Bv6AXG=(Pchz11fKM9o9}Ic`U>pEo1OVXw%ILW=0|3a7 z(;fo=0I~W@=KkcR)?DB0K_5FLoPI@VvwP{m+~@4hxUfuTQ1^vh^H3asKyVyN#=^zI zijE~1&}S*29WEk{hEJ#P8F;dzf3obZa|c$AjM}@IkDWkpbStV}L&)Fbls=OuE%F`d z-J{Al_hf(hb0y=S4t`&@_euk-v~vQ3Ns0qb(;u{-S4TYv=$^rZ!iFE<{Z<|0*6Z#w zR<$vXry)BYH&+p@X61lSe5v@JPx_T>d62{BD|KH&biv2h006AFl@1v?5Xb`xY?fUG zPCG`wQ`PO*#1bHG5#>p>2FwTN*wB3Hv;BKK z$LPPuqt3O%&tJ0b)5}B@x(NA?kyq#_Go(e>D9tg)Z%H%n;*J%2!MVr%TnKZUA&Y#M zn7+^qzuRhIFe3^B|0Rz7u2z+{ixvG+c2ALipC@v^oSOj`oJm$@_V_6)PPNxw2q5GE zVJ5nq-}v}Uosu0bNpX*lQSK{b&W15X;Ir2FO^-^}sDuzi1cys}t03k<&&Nm&qifAH*3j%1j zsV8GFVz8c+0hC(HcaD*?eCoy~O45z(-nz2mz`yb}Ya@2AMFM6N!|b!%K-NgT`jqta z1R<84aNdIm%eDCqRU28`OT~n}sr4H3Gfoc#_^*5p+YYJx?dMRChuFb49QVI|2&7tZ zVE_=jIidTFtuVa3;N8palw44(znHT-+F)M-6wp*Txl{ zcq{f|16yVh?fGuxqqnGo3GxpkD1C@}J$Cyd2FdnLH(6MjZufA9;xYF6OdyX>{@J+y zDMg$Q+2x0~A%C-@2l#z>pKSOi)4!DQu>1Zto_>?q@|UWANJ6(c{^euA|DkiTe{lT0 zXn&6IkM}o!l2y~@f6ZwAB{OyZ$)%ZhH_c0NvDdCP&9d1E^ms=i&T+QfIH5}A(BxxKZ zf+rCXU{Rx^LAO@NXMNN_f|#dj_*y_%{zE&GubVt17~A#5-P5XjV*EtHsa$)`*0As& zv3gqNTAXu#pJwQ4WXd4?DEhr{&;ti!x~I~*@QamM{7Lqc@nNB>HG=D^yA{elXx2~L zUp};cLww;Q>_GfT9ONYo6d;4j;(4y^PmhMTc^&(W9`(ub?RthWu|aJbyxv#s{j47rAfO+gMzu|cZ3b6pSoqTi8y zDq;5d)E8m<7+0hlck5}wEA7Y+85$|PCFcy^za56g!(;2%i(m+0W;wWgWE5c4o~?Va zjT?}ZB{CsZFmLBHy20MtGw@;WC2iG#i^e>$_@x{CjNNH|QK2pqJ)N8dLwdZW>SmMV zat@of980lhipSz6vPE3cHsz7f6TF;q*DP@TBCCBIVYR{a6a=kGR`YWWx!%QQkKu`a7(>^JxcD?uc9q<1 zLH^l`+L&eBv1}h_vgw9z`r%S@;ae>@J+4wY_o7C&9sWyNqkjL-7sD_GnT;h%JzoZ4 zeM&uwE$jojk|j{;v|dmVa7W>iXsi#e%6xNq#xPQ#C4cC<#v(p#iJbYSvXP%r@$_yB z-LU5k2s8<@KSH0gI&oTZz+>4}R!rtrJ!Rr1mhw82j+N(;5leD!j0IY{iBQAv5E{!0 ze!VfJ?c1r;zKTpljYSveLVMJio|U2lcYr^*GM{v-$>KuldQ=C1<|HM*avI%&!`+D4 zj^Jq5MNuZaZp_Euq^Z;?31n%DO;TZXNRXMKgjveYOw>7v7jmzim3?l$+Jzk}f-xru zopYZ2kwpDU`NLJ}45jL40o-ALM;m63vE+xKgW9OG)?;W(H6AOAVZt|Ufz#1VbKMrh z(|k_TEo+yx=5IW@hXKjje$mGw3Y(3rPY^9B@dHa*Czwj(r{M0YGeH|`GoK8b`)8B- zS6jia1-#voa*MCuT85R_O;j1izX1RONP1{er*unO8d}ES&Br%$gN5?h)P<)%{CSlCLg#cK&X!MNP? zJwEun<<&&vz3?pmEBYA)=6jk^uK+2oW0V1lLZ&WW`sEU838&4i;!JcteZl)d1FFAu zrS3xV+Eh7=x167zIBlhxM8yO^ktK1c&V8&z2a6MDm)fTjP=V~e7Iz|8@UjQaFkx`A zVaZv)*RXhxfKd2Fzg($}gEw~UDZP9@P?e0lAv=1*c)!gg{bLQRj*AqMIz4EQr z(f1z`A>tYKW05=Em4NAv*^#hqfD*Sw_`IBe^wJORiskR+{P|=2*MKo8v&DftI0gqUvtxL#$P->} zBe)TfNgzTHL7D}HmBa>@$re0|-um_fHb&3Qyfm?A!t1GHkx((fCp}&9r*@U;-a>Hf zi-p&(pi{5O{i8=6i9ad?GS&58RxV)ZI55S>a!KNHiJu%nx#~0O(5*@mJ8q?O zTpb2xY}aufU5<*qlbd?q66xc5CpVk^;i8pQz6`%-Ig#n)>|O=tS%3H>AF*(QAO5PS z=9V4vlsSGc8P;gP5gp`+aGDDV*C-v|HxmXwxo?fDj^;+q8C0=g{e?sb-#4E~VH}p1 z9YsAenvhTt{Z@IO6_3N{C{4ha&DKLc=AfKyQ&V@t%cCX1i-n;5wYW%I6T0%JD?2Q_ ztwPMBU(5Ke3gYgu%NUktA0!|GwIS^Qob=Dg0~cPPgyz+7#GuI1zd%w8el?qA@DNlU zCe+TZlE4}=b1DCD`A#DD_ZV-fWs($BTc;K|1T zFt3=6QlFz7IusznlKXYaGKh+f9cw`OOJMW#9$5(1YH;lLxT~V=AoOQuP(shE_jG20 z3CO8$7?%L&is!L$;%+(@|Hg}@f97YG^Fu!I5o70kZKA!C&gl~&ro7>7S^K*mydkij ze0upH1c5D|~lXlT|sK`O3nmhB08)weOS; zL3TH8m(RU4($w0~$+&8H{Xwjlby}d3DEoCo>#AV-OiB+GzK(mmzqsqnq#k_SsB-^3 z^*PvAOpAWnA;ebS$^EenA1jUyb7l0k&_;Lmys?%{SNqWS8XScWtB+7UHb&PPV{U<5 z#=@_`uIS0b`9-RX53oM=sQMVl9$LB1Jz`nuR0;r<*zM_no^O`PVgQcjL)kkcNle0E zv>7ma#GrWjgtGq9Nkk^dCcni?|5y@*bembFgTwmigqP59nCyd?9A}Xi%;ys8=&fte;@>0owiBh`2qa|BWbHaDS>9I(c4GA>yTGi=KwkO`L{Jd+R$-pH`_<+j_NOY z@YP3kWOTX=QPPUVl5(6?2#UMmAV$9@^TbJ(e^vsH#UC6Xz~r~ZUyb>g_?7&Vir+Q0TAwdbPEpP|0epTy&@ZZ(YW-#?Fat z1Xq4SVIa4P_!%qb`-PeqPUbSPFrL2Se|xYSiMEUNOUWwu8*lSi{1cI+RzGn~q^Er5 z0oB64s3l95bax|QeXz0W~>f} zp0!~)>f0Emz(;7TH&z*b6_>Yx_Y(UT#lDin=*8|vYZ@uT@RERh)@j{$6Nh!GCT|~U z6U-!s=aPK-w$C)i%dFBWzl`H!69!7KULZDp2`PFeNane9HP6b&) zqYGdLw8o%F5Wn!_x(*BA(z8^1?e}7SJfH^^=@_OBP{BBPqUzT*@}32BH&d?5a51(h z`v;;RpY-1ge#B36Uo`okKB3|E5CYG8l{#Zbnk6j4t$Q+0S8wQU06QWC1OpM(-e9QmYKsHNC;RppOntI5kWMpk z{r?il2LLL^%A>jQz}d-o10cFE} z&35Q;sWMxjM|Q78S<~o^b`YneC6@&SP@C!*F_~{f#)BjToVab=yzMAFd7& zUEyca`%h!=&{WYkI2umDSYHqQfOh}e?JnuMoxmv@re*Y7SIZi1mT+MuVtbnmEepi~ zrwV-n?$(jCeV8u{(9<()OG*85VpMc$IelIC*Ms<)`(&1VY#|ZT%^H?{I%IA>!Tx*Q z4DExZR;uX{O@CL=;mcIr+WDjSvb~`=-{#<8*I(9TA!Di~^sc*H5_OnPO#g?-rb6gi zrcwDP_=qXG`*{D&m(8W{Lico?-TptXJIjQ+jU7#;4M*ghC%4fR1nus;L~j2)V-u{$ z$8mYp(86YB{@3X%l{75IxT{!iT0GJ`6!IV5<`vyZBG0@i+j z-act>R#!*L7R#>y_+iu6+R4Q8wFq-ZaFAI~9oKi+XNV@1>~L zbZh4_Zb|7?Z2gcMl_BXN$Kkd}Rn=;#e&{@o5rigwO{F|ke{w4buhpQIpw};T?`@-H zVxYqFBvtlVEoO_Tj>m#}iiK6Q%>I`?>1X29a4fZr?9OkUr`|<8i?*Ae=eKC@++3zN zpPkKjv-|5$&zkt(IMu~-Xv(t=KhrVmdc`;oa)kj$Iy^LdN zT;!x-Cen(^nzqG;{$?pTNNz@tzd7dM-r9dA+&k@yIIZ&=x{MxE3D{SbK)%S_ni%tH zJCod>!t730rE}-`9{rJZyd|THL9Ujcj_GcjRkO$6KR4}EVkPZcooZY0Pi8h#WHxH@nHJuzil^NFZL`;4) zc=$Np`%E37fvUd0Fusa=JDd;sv zLI$0jc9Sz27ZFGiaWZU;P7xf<=ZZD(AdpzZ5OEyf|{b2Ea zu-!wB(?@BF?PSMBn=dnAl)%^9^^I-K_}vVb<6(YbX}FPor@l?ji|mQ_4Mm!Ivi3a# z%PdJC0J}6Ie%mcZDvjR|Jq*KPz{7I`eCG#z$l2rF*vYW56ZIG-tPOqsVu5YO=~?wd zYV7XOUnXJgx$A#4&ANLO%E{c_ePE%+W4yn)m|G91pT%UHwrTmT@6>=oXc1oDZnD_( zn9f{=zoW=x=>CTy_tYlyYz@Dvsd+NXT^l45-o~oUF?gi5t(=uGt~(Bx(fUc_>-mdu zf6fn-z)|krE$nFpcDyg;4SIO#X)(ovb>@K=^_E&3yn&re~ z?ur=05+^>0AW+IzdJ|)IGtr<5U44HIx8@Jv(Ua`eGeZ)Cy0<^@_CE&gaHAkYVjAW= z_)L(-Sfg@pyUiD8sDPHRkzA*C#ZA3qwy0)73+M@(wSGdGbY=-$_hYE0!LxHXo`zPy zNZj@a}W$xELg@qmS+=U&cRoO&Gmb$@jROHHhsA~f0p<1eLTu^MDog1wV;q8 zgSLh4~P$F%#d)~UmDSjt+$hjYXB+m zFmHJnIgAe&W8<`|HvbWW~r8j&01ozAKA5xr>HXYLzt;JW-%*w5rwQ zKvCGPYuqjWHu8I2E4+j?YCTgH(3`?ZtkITMs$Q-t0DyYOcvj{)&+lt%TO|F!zA7lZ zh3Cuu6ufI{BC8Zkv2hen94^SaWH-E(azo@JtdY>HI*bHBkE4vb)gBbhP&td<#SL^vZdhJrU8R&c`a_#LN4DSSr z_9ee!5Ia_yq2^ET=p>K;-R0ITZ0SD}!-m{(>l!>KUxK=nDoKjEv5iF3KX+4vvxQ19 z47leH0M1Qs5_CZ&(?Hc0P3AymuC1$bHpH%t)Uge_-pro=eG<`pm#bim9KB4Ahx&-M;7_B7d)Jbq~*`4q2eeWG!&w4x8X|-XqaKk~6YA30D@#qO{ z_veJ{Q{#KZh+v-e!vmKDhqd&S-z_!kBVgmXDEw=9tX&4r3TnXG+0*8uW=wf2xB6Ad zKnXGpo%^eyy80)de}C=PbM*&T;nu`AUwTp1*_?0iNj^LVUI;Cqmg_iVul@~HW+$Tl zEcI4GkK?luu;8~T5Dg=*9cWxvRO}2GyW*}c`n8<337zJ*Mux>l?~&%%calky;-Z%0L=6W#V|3TOYKI*I=k~hJWuPRMv|h} zfvBQfo!gPptD2z->)ERTIVhN*ZCD1u&!AE*cvadcomI-GnW*E~`>eJegSTZA(7#qr zxJBXL4YvO-b^CPwdOrZFsTquSGJl=WseAc1B|`t5Ptrd&2Ky!7IR@Q;9{bQ#O)~Cw z%#;Bd(SY12y*zFPDW2zK=vUj>Jn5|quj)iHnp;6McQjn-0>SFiAQ;#FBl!N-O=ESE zx>(`<+CDXeaA0`gXBbnpx?c&yT)o0~ecob5yl~yMT#RDr+Y9w|mU;=U2ZvkdB@>vV z2lcB@)7tb9AmErQ>4*ZY6E~m{^MQFEYwg6488{-{yk*x)?12l3zHtNYpCMfr3V)Xl zt%tbZa-N&5@_mCDeM>7inKEj;BGdV+rCI>S|yRApx0NFj=Ea+oS;X37`(->h!iM%u#Ut?%|)_LN`=;vuRvlF>{Xc#0~oTpr~?@#4gv1&>ZGe@5zl-7Ouh zLpdl1k%q-4R1>}NJN{P|_5+>sa1Cz`o z$HJZ1zm+OSNI0qh9}l)5N*deMu`noeZ@%viyiQ1_6rJ?OV5{-(?py>=*{);3Ke%@4 zVY5*wyX1%CErR)QPb{aW zk9BmCdwyd2wgZ$WBv^!!Fyx?|UTOFyPS#@W=E)#9kDu^2>NK;=TL)yjamz=fuJ*IS zeG;h-)IGJ-yy3_6K}TimIvQ`2$CqLva7^=q7dw#k2apG&5==zefF4{l4&AVw_B)0I za*wxbEo8`MgnNN2I5BjAJx`bbvYX~l_@C=IZmbDjYoc(g#Eb2tF$aiM!S5xhCJBhL z!df-N#%8tb6yvLPo`02nAP&`k<_Y; z!BY(LDM}Vzb(!9<75SqQR|o+KG%0O4&>#7<2ULF0K@9W<^FW)R6ux zXsc+AHv(}NRrF$|*$E&LfO2zccsT^Nb?<-Q`&T?I+MDlDx*j7u191@`(!9Chziy~*D0aFa_VN?CGT9;R)uKT#7IiZUNlw_|M1oUYMJ%1(F+C!0C>M_Th8O{>{E3M!cWb2%YhNUh>ce*hvROIsFBNi>h^_&>l zLk2)};_H3B_7WDSYmZ@&lM$U6gEAz-v9|L$1uc{A5bwU?K?HGk5K9rndM5sBp!4Yb zG0%so$Pc;VT;b8XY1u~=$I3*G&fs+bn4`y9ev=`l~Ql%o413#Z-jJmJc?xfj%AGSe>%(^q(HrsqVqg$&^k)fS672I{339BnGj*F9u z#O6W*>yPoo?tBfF-CU?QRD0FT_$|ri@iq*dSa@h<1=RuztLapMmZ+ z8X+4{{T!n=@ERK-vjXD81;23ClCgDxXKRKFLV-Xft5cJ~B$g@=QGw+>K>9)E`UwF; zfe|7B0gvncp89VQecqr16A(&`W{2diZ4`uGtw@(rF}4HxoIW`(0-o83vWMB^_n~X( zJl0KtRFGIw-U&~tT&N(ZK%Ar)T0;FpNIp@yG}K=L7)erq7zvFC9VRCftRj~Xn6^TM z_=h}898jdF9}qc^fO!stA4_o>E5ihE&V-Q}5IhiO8jyi`SWp)u3q-KA-vl;S*B4%{ot`d$^GD^rBQ)c(D^4o zB|U|#1k5I+&EVnEON^F^te=$&RYap$%*=z#CV=y0KxY<9%|K^9W&;8h00PeWsDX?8 zusD;i%^y6S0 za>9`MQ-cl28!M1XI${g)3^a@7LL&-g;QOk3Zo5@x(PJ=em!weGw)>V1j2<;@Wy@W0 zT|Qjw-!}AL$v(gMl3pxQNtG>UP)OOGT{O(6e0~Cm(_eG~8RUgQg-eNA2?0z}OEnVo zYN=IM&9Rmxt}Fo~eV; z$`v}2RH4E~FSMkQiXVoi(WongIb-kbf!NS&j!pT3t&N`agNo{_qJ)IT>P+=8t)aS+ zwZE0(?za|sDojPi6?3{$U2T}vIvQo={-c?~srR0GG=j-=+qBGUw;~qcI?^c<;(W+|P!v|Zd|6K0V|AvO+&QnN< z4)$f}^x?JWUgH~WJxE2Q3PIvIuWX$j)J!1~sunRdo#QtkVCEBWNRPEm0S!hpM_X4B za6Xeq7~*f3wt$^!AE(c6XI-6`=AGZW@HRIBcX@EK_-sIsL!r9~IKD4luP!jhj5+7K zyQu|JOt8U+gwSMg;DM3!@QB;XMu><-deGMUZ)fqOv-|o)4Df^va!aqeG20YVKwbF{ zH4971D(_|&;5LmoGGxq9utQx#;;@&G>Y(amb7t&vt{0Cx^1KU9+nkE@_ywtM3xwIR74~01R^BS1E@s`2}LNRmSq+SV6^(keV<7I;~Jy1b98iFZok* zbPDRYMh(O-SZ89)1Mqaqav5_21PiFbMTY=3^GNO4WB?lMk6v|WdAh6QLpsGSwB}IF ziq~}|b2Pdh+5jq7TfFFsaDU>M!SIq`a69Uhbc0Z>04jeVWhr|Noyjja2b2KCw{SmI zkZ&t6f;eR^rwhc%ich(kOGMIzAStdX$xubq5ZJ3|SYU1k!H{BtK$^ttw0VA0K*WZ` z#QHzY7A0ZD!Tj=VDB%ey6+ymH%70-jG1_p8RrYKwZRs%*riF@x@}mX@1NI6nEiv5+ zFL{au7PuMF%HL89FQJ$Sr8fKo6LTFgECvE)qi~}~cqfnX1ynQCHOP6ifz&69|54NF zzDW0A&^v6_`rbq${JKY}q+3CChu$C+Hwpt&1VPg$=Rl7`Z{%GvwJiV*c)Ka$&9!Ld z23PpvjRDsY#}9Sov{#FkQ_*oglu;E!xZ^1Rf1|(;4VNU;9|Vpg8>nj`JmsmQ505JG zu6^i+H$VTFUanYS18^c1+>@v%(h;m->y@Rb?jGNeRwu2UsGwcaB(50IZRkWaW5B4T zBl+4Cw}v=iFx5usqBHJMIJ-=Vlsee=vhuHqa z-;gXTk8DW@lD?fGBAodx?g-AvRwe2FV71I6v_76`IX+SbV(FtjJJcvia;pia`HM;# z0e<4r6y)GqbWS=WGLHNJu5&B0Owm@_q2-%>lj%(g25uaoRgcFa!+U>j!Hvlk@bS)p0`CXb=) zDuK*t^IR8*|R2oze%L1fF5<8_6& zkE~Rmy2Vn`W@t+Y6X(8V$XKrT#Y_{de!f02Q1Va#E}_%*AhCkeCTA#BuIhw2=Fj0^ z0@l7HWrC7PEj@hQJ?4`YDQD_u=dMvd4s@$y{#9XxunYz)@^fc2OTqA=HA4-y+DS0Y zEvh#=6^$18Jv5Ns+n4L$Gb|&?<3uDd5hy?0u*HcT-ZiSJ%p;G$T;Peqlw#K@8L;$`HuBB2I5G{p6&RL^k+Rxc; z;U%fE!(_*Ng_r1u;o1eaL%%GPpy+F$Z$w-$PbV)EClb$9Ij?vREY9ogd7l}~`K($> z8FT4gKDH1r4q^~Z-rWeo#EMz$`G}D{J8H6Z;8mpuj4xdH@4F1Yo)0}SE+W!(>tPzY z(fpD(1cmPk_3y{EFm6L>GB@Rz@T=tZi5MNZ2McpY#gGvgrtaf#h>s;~U~wpB%II=5 ziw&3zg}r36BZNYNhmeLs1~d=T364JgDSKo;6+_=-RsCt{T|{WsW+i*m=bUPQQI-Eg`(f28sQVU|FpqD2=4I*Op!g=6&Z_#m3jpM(*jt9~l(7 zf6B`QP}4LXIu>gYs9`PU3fn=rmssNCU5%2}Y^pIDJc$S;arD!6w52{59tErD0YN7s ztxs>dXHLjJ?XxH;;p`!*UWH-hX0|sbt_hKCyCVko_V&IZe=-}ll=d%)Sg`{JZSu(s)X(w2bz@Yo;5nKXKyhrYO`^V&5=pBaep&gkhp89?6Kf zv?GK3RKu6pk_na^^0I8gAsKgfJVb(5JLuj>F~8@z;=`0)!rN2BTJqeR776OZ?fo5p znRAH}VU!+|1j+_=ct3s9psveEM>D!71rtEv5f;@DrvMi=nmz0*X>x}XV;g-FS@-artCT1j2vHU1(zmAw&JcS}N;vI5JpIfWz;Fc=@bD-{_0V9#8s0~{ww&F-8J~cTB7Bn8 zF0jCYJ4b~mhV{LK@Ux+?04)8g+V|pb%56ud2M(Hdu^S$J6*v@5ek<~N^368Wm0z4V zL*O8AkZ@`u3zWjq&pq`bI?hyc8i9vhMsA7;#xHqaboD)Ds}X-WDDLvoiemKESz^@4 zapQTx9r0WG<*@DB0*fQB80nE4Y9cw|IPp`<&y1y+aZA7~d2?xqOV)ksx3RK(5zY z!>8u*ip!8is>Slp0IgRdwl-|&?FJMyf)hG0Lx^zKyYKLI++<0M=Z1YJ0FJk}wroB? z$%A#@3qp2jVqTIQMD7uyXZHhgmKK0YkM1OIW`<>m9Z9wg5^vIn{>7PkmeAq>Wjp8^ zwp`QO;g~GBcH7`GBW?K5CQ?&g>oj{D{^(i@>oC7;=Bpmn^vNLR49sJ4$!G-fbB5oB zi7{Aw%uZ1T_RR2PLlIwKTq+_Va;hK055`V^BzafqX+b-)&j*O?R9M|(aL7~SEp5|J z^%ek}De>Gwd4F5)B`-^cx>ZNzB@ggI#%Cz>${Lx2EICDOaQ6q@t;4W47HK1uH4YMn zA{c2g47-JCZMfkZs&Dz0BRLGi@jQ$FU2xeEJwxj&Hj-U7s0&GPWjCK;gs;& z`-pGS$$`k)IH)hOPzZNO-qnc*rnlBC-f>6Nt}yHK;U#z?0y&Sveb@U`X>x9* z$auo$X*eRNO>iNJ<0mwyNv<~I=1O30^W>P!afJZ~rkE@BYT?2aSt}Y}W?T%@as$;l z`u-YKOLl&uu1dkh7-B1q89_UM)*F2ASX(t`%%8L=xW zq-AdGghDm$35RP&o{vJ0ewEnM^G%s z$8p9KtZ%%67f*=A&Mqe4xr=f}W!s(OlaQ8&cY{YR=JjwIQ#oX_V9CjOQ-g-6wrg@5 zsf(A=Z56D9&T`*RWf3^ISu$|`bt+^wtf`ZAvBTpgu3vaFM1@|JtC7c*17AgPo=BDn z`VEB}&f1PomltF-MIo~+hNMai5ApRHB-@luEG5*ZI#Gu{vIv{L099Ih@)A~d3cd5qyyw{ge77lq~ zp;O#YhW(Ko;F`Owz-?ApAk$&CT#6qFJHStP?Q2Z!#%L`(GNQ1Mmoeq$F&2N_dd?02 zU21HN)SR5(aAaUt4^I2)2)GpX47}VTFu<$z72)a(_gJ>Z4@AI%3Et~u@Z!-|8VDi6 za0!9ma;aO}-)bwa)&pnHzy(c$107}Obs5)ud!mC%*%)d5>5jnK3N9!^ia@RGhYRbA zdfjsyu>>JRcbWMOHiDDXh_%#I=v;vEJd z8qz`FpyIyH$vZ-T1OL}LLq}!orT2GjpOo=7B`)Mujwhm5pG@%8&9HcgRR1a)7LEt9 z67CfO4WimG3iv&G-e4ao zU&HQ+JYUU19>d0h*yH=I?uV|&uweis+i-dDOzngBr9#{jLrDE)(@}&qZo%izH<1+ z?;S+9k8qRejPnBVwnQcB$m) zSf~fs%J?I59&lwPXbRb15D`uD<||^*Jn`pJY&DN|J_6}?C^>cm_XJ-a9+Uh!0pc;a z{pE3RyPK$g_LKjz?U_hfay4B?Xi8Q%1TVkAldXcqrSBoLN zQ_|0154S#5XF)Qv?Y^ts5nZ?aLs~4%(VH#U&76&!HU`e=8jMEn1;e-K?C#m@T%Mu` zK@Eo*Hyl12Jep-5Sg4oR?;AWPz`>oXi5g1DatVd^usA`;llM=$-w>HMljWZPxgo^n%=D zuZ4f53kQ%H@@NGy`k;3Na5BTy(evfK1j#LDBCmS!8jf-pN7DBwtXP3246dXc-7~Ke zreWE+zB(bW#h#V`Ewp7HJFsx1nCwF&8+%p`MUtkWZC=HZok$feCtaW`Adpzuvtdy4n&vnI3fCe?lxaOk2dR7|{B%TyQ+Ltqr9rv>Nc!=AcieMVgp}*@+ME?z1&k>BrU>ZVUfp2#hyo2dVmaw-8l=xFSF9-W=xQU^W89vVoz0Uq{q!5u4a2UKITFgDjDz5mPGHsH0TH3bvC7;45@y2cYsOk=QS{5kJ#LkOqQbY~ zrAsqpI)BDpqpu?JtRtilOTOxllP0IQ&{EM_>R532H{ zj)j9Fwp_V{E){sFs_D?do)3GLvCiiri+x{6t9;#Vb&6CVH>F&~D{ymrBz+M$9 zxpSCcKagmFLoL6=ByT#!#yw4TLqnDo+b?FmF9ZWN*GvMt+4M%F`6 zG>8Th>`_aLY@NX$1Pls^@Z%Z&|E-jk3jCKLgup7w|E1}FI>gUwYWex!lpr%#R~P;FFu=qdsq??U#+v-x~yRRWz`!c=e5oTg%$InT;0$hN_xoK+)Vg%46&M6iitZRvw- z%7M@9;>w0s$4QVqHi@07YF-W6h?J8 zz;niGcNU%-%2i9sm9_Vr9xjF@q77QGj;RPLt$@NS<>#oMV<%lXNpnIZ>G9=hx3a+v z1!4Cx%V#JA^TnrR?uMh^-17M>q}~Ox&Ugl z!)h3co(#D)a24319zNq)oq;5JG+i1rq(7XT7q6kC2!QX~letor->IvWWG1-;EZ=T? z-7Q`}2(l0R-sPz?zvI$AjxoP_eE(U(VRty?vsRPCZs#$e`4|vxsdWWX`ky6)#}9k1 zuEJ7du5O%=`d>Xc@aG(s5T+x<~K^w~G|+u89u2&S}$f%l{4=r3>_%~G* z`Tx_QtE=6=x-|U;EyzZ-8_r72E_Cr_sTeKgxIYvHYY|I^q^>2G* z3H;NmV&b?S&H3BFzio`OcKeUT=6OC~9*#j*VU9)Dx>jN40OC0B$Gcz8wEUy{kFiSK zfb{u>yVrc>?qw@SOK!^?hiXxHt{~|7V>3QYibt6nF2V+f+k6aTtmPlf{*CY7J^=jx$p=jF9=0=n z`^d@1g7teT{9}f3?zQ{s?OqjiKkBBvQWHBM`}Jt<^A4{0-~M&~j{_-R0R8%VMAHtq ziZb)^@-j0sSL1f(Z|px)m;U^-2(H?+yK~N2s|wZ%YUM2Md+<7yB&ckK1SF zwgYk;{sYT@N;t~g`+YcW%dgYYTE{-}5%>M9zk%j;o)`K9wDUZk^Sm}zs~PBW2d?Y7 zPu+(*HF(edGr!e-CQv`m--AA~m3)r58fNA>EyuBj-wRe_X=x|d?>wjcPcc601=;uN zd9<$bP7pJ5eccvEYFS;_pR)m=xbdqa52sy;r>M#E^)5l0q~8}^TLJzB1P6v*)k6uL zzU$>UGwgF4+-Zfhp!hd4p#~HZioP_vNc8@D=+4kAIE=H&jJLP>tNVfM1L6OjVs@Om z+aXtC&^Hv8Aad8C&AAN5JKxu&A&pb#%5lKW^^H+G8=rNGScECgf`OOkI|1T3?|$nO zM(cgn{sh+82g!fO8}@l^=Xv!sKGW^!*S4m=0yhx#Kk=sJvhTMynA!j#e8Bs${j00y zpJmy9Uh8_C=l9nCG1b2T?+59_ao%C$`DfarecJn=2#ZN>6)CZ#kWd)5#Q(r+1a^tJRd&r<4N_u}Fvw+b zG;xtI6lfG=Vv#@#@ITcJi;hkc*XhSYw>;Kv+vUDbAQL>+3R}C zV@J!QZp%Gy-|uWC^~3u|Z$Ge|Gm#6p?Dtt)`R(NYWgY#1pP9MgBl-P<9}xOKyj1^K zLVt&kzghho`Rxx>a_zTD%Rd+y{ee~F7|=(MX+6*Sv(Wx4*8zfa{DbYqhZA9atSP1y z5Bt9v{D;ise?X;F_^X<{6uz@9Y1~cZ#=zzJPR4+3o9{VW+?@!Pygbb}l{Ulb+WHkA z!@l!Z_u}TJ0QwS@t(-2O!cq_xD0I!r*~Xm}HbG{m2ce!pCYLss>__qRg^Pd6 z>EFBmtq_kRj3YuJ73aF+=RJ5xmB#0~E*mu(jHpih*>L46@jy`QcE66ly>icf=9#tM z`=S}S{{az`Newfz_lGAtrzLEBB#{jbynoQt1OEpxHJ?8@$;fFZ@PAP2?><%6?P$l3 zNtl0f9sg$V$L99Iu>&2BkK_@d`8Tln48R|uchX%qvxVMLqcYyba z7p+A57Co1jcyB&f%_~_cc46Pn89w(pNc~o{a%`BD+Z|lE!Ryv=x$6c=nc~*u$6#Yq{ z{Y*dnQ`{-gQ=5_Io`)|C>P>_P;ajwBQJ^LL8z2T%7gi}JBfygP{d*1gh^PDn5N%>4$=k9mF%x=;FRm6 zC3UVAejn%K+QE9{Z%S>IOdtRj%4{1d4PB%YxnfeLqtZ(XJK*cR=*2XoD$FPO?HIHlh+_!N?} zserpfAP9^ft!wOuvY#8JrxCKOe>`KO56@yhda`eUF+VBe=q4KhUw@9FpB&+D z*$GAP7F^I3AL`|e?>9ED%kdL()RBPRd(O!VYbHE>M+!DXxcqd}%(x1TMf&p$1B@eO z2gj*HfWii4*C^OeIX6QjU+Kv5^8;B-%$k>ZF?!MGm###aEze)xs|kbnpc3%~cu z4%mTeX{IahfXSk^7N%L6Ag&#lPscp-ja(0|Vs{{jpu(vCM&OV?kfQz8LV<=j1Hr7^4fIrF7TRF+2}wglT*Wqgx_=JG_+5tjSE2}iBI-Mk zjY__;XCV(Hn8uufmUq)c2+c;&2Z~17?q7d;9piDUqi=hSFjdoL785&?97oz@eX>mV@)j zTdta>N%|$M>9iq#VC>)%p3|yR9}i5~Juk}lb)qU}caRXzaOvsRnbY>rdWQ_FQoiM$ z`wl$v;*{*<8x568_t?3&%J%c5Gz`P&Da+V94W8S~;j2)rReVS9m^g_t2cQqg{_BX6-$d-# zS2wxk!D7dS-$RFeT6wTy$ULr(dy^qwtYFiP;30|^Fyot`)+_!fea!1PBw5Hjp%uR< zx$=6aHuVa)S52^4F5CY?zwK+c&dZy>>P?Uhdpc+Iy;Z`9826e6ZA^>Ir^YBT{DrqQ z2&TQJeqCd1gmE1aKt2A}>09si>3V91n|{f9w#{T+eM*X9#rZ?wt6)&+m#dNlnfcG_b|DCT4kwkIisieuu*p)QZU z0_!n?8#-yZ@pYf~X4rZ2!9hOUm&S#!j?U}jqk?C$EFvV>9WKq-AeLsgPJQzLj%%^wc4 zQTCIKeY;J({lzNea~WP~_Y~TMVBWh6@n~r^-n%qq64Uj^q(p8}G-oF5nXppGapyax z*y}jiOK>(U9NkAcZd{JlzxQr8(a3gmkEowl(AlC=bgWYBu4Xuo>A;k8D?IbZ5L;#q zc5HFhRkn<*rM7<0bRCm_*V57o5pHvP<+rPiq}Lv+>o4;!J8sfq={D_X-?sKSnfX*R z8r@_AI@;gYZyu93esbW5V|i5Xv+Qk)WxDsRZ1m?cD{8l(a(q{*#CpdrPG@t7}ZQZQj_Smp#54!g7V&0S0`%TGBw|cJG5X_kFf+MApyel5~me z%KZ`NQqz|QcFtGqanPzdJI9-i+YZd3tmLXVUaHmeYwp-e;m)h6YX|sO*Kay!qxBFn zd6N7|9aoYk`xiL6&PWW|>ik|7b#!O!x54S#XH}T4n(BzxuM3}_pA@srzrL&Psy|Eh zOc3qZX=XcT4C3(vWXba{ek!L^j88dues& z^Vq3(ZiG5T(_{D49jSue2bLAO z?}=-Gk+R(C%_ecRv0*b#H96c62~s|7AMzq?2Vb%TVcrIL^GPE{PDHM@N&b)q!{?3O zdI2MU`u^?L2GjC6;&&|6vv+x4)3m1>@2sKFMa+J#emX~HOu9(1Y^32haG@%Zo?Dv9 z(!zayIekjJ$yYliGhpVuX%B zlH_o28Dxu0IJSt5b#B%d0<2d1q!f* zYonrLf1OMsU{UDfXbG=6x5kOch4gjFu>)5=UHWqPn)ehH9b(s zlRV_U8Lls#Fq`b`6JvFMHFQzj79)#BxpLH&bgo+ z2fHz{ z#&c%j{^$B4-mO`&g;qp3Ki$FV8?q5i+&PvB&D7foy>g~gq|OOFU$7$EFWBqK{X3s^ zoA<@^`68^OT;7Z0y!m!+Y_1{R?H-iTHF5>+Psu5ZKZotj>1vep!I{#H0_t4UVqak@ zu9J@q`6%ns-ae1oHs?<`_cw^zo4CEmp?fB+wA?Pq(4O0Nc~w1(v95#vVA)55MQH6k zQd@mXU*3y=%q`6tEV&qetqF{HYnps;%j^h}J^Xf_oE^P1YLTqjyy82Lh_?GUa;%XV zr4O?m>;o9Z-`YLiz<0KP*_YL2cb~tg6|~h;y;<_$Xmw#^o_HS|TDf`eQ)yaS+Uvb9 z^iGy6bhJF`hf3NRi65KhE}mUp6=Sb3`L1qYc5(@an~V+aQ_*IC026^TSA$bFB53EX zIak|bbF`ubPviJ*ccp5~TjZJCRlQ@(!P8{TjN?dL*KQs{Hfqz)C+H1u(V|T$Rxt}W zU;p*->i6Dze1%*Ar3zx!H#8}}b4c>-RO6PfX@m{`+<#eW|0QA$df!&EtK&FoF;B05h8YO$f%+VTTWD=!(6f{@G_~Da0Gm?nsxa%>lJRv*N1D2Go8vAh z?6c*EwfnYkTWu{B-0V$1lkesa_v0o^s-rG=XkA=UUSopWerV{)DwRK|ciX(FDrpUV zGXhMy*Xh6yMX$|Qra4Ln0(N|F5wDxD)GhLsUNO4tFV;wt?UHY1jx_^kNxaSw!;LuvpRC|TujL=BDi?1n_!51z z&Cf*7`gDKrGdgxw?u>S~I< z)pC^gDEWTVT<=)$6da#eK4N86#B7R^+%62ZEc&~{K1yxJ+r$u{t&V}bd? z-&tFBut=$Yd9-WxKk z+gFH}=C{cAkUBBN;K$e4&lSg1wl&mPd(|4pGpzTXCz(a31=p=d^|4oRuV>M|yB8z3 z-;`opUP{SSJ1D*cJC!R*)&i=+I|+6z&+pF~>9}7}IIY!`>Q;r@Vh=@zue>lt6kh>K za<&tr|G)XJC2l;GH3 zN}kBp_Rci8^(C=AQ>MYbT$ecb_URp;Nk&u&3mlT|nn1BC<@ta^+Vwn*6a{I-Xx z*KXKClSpRIXgjTK=^l&_w5x*)-!%RCV1taQ!%tmWM>#eMXt~bo-LnQ3>nePL9%-i4 z@2=%*Z|5EB!pznRikZ&sWn}GzA%NC!G3&i0We%g9%cNm*wLrs*OggRw_PseHqFXWB?|T@ z#ySWY{$|Qcg%kvW3=M}_*f@s}P3Rs8W7G)-K?==R`?>@7;J-vIHmJ&LPEad8N#{4<5NiV#|-()@B9Om*?Jw?r*OUJ#2*wW_$_e5*UR!^_4X4cs zz#;^t?~Gj}DCh|`tLy8jK$^g8{b$fAlhr6|M3`PF6t+niIl_A*PP&+IMa#dDyYBp$z6S(N6J*T?(lFt(47 zt8n)&$eM9q5ROg&v#dxn1VubPM&cnke;7*WZ`vQnJBDZR&^8l=PCi7lU!ya3)IN)8 z=|T<&0i95B{webVve&n2LJlL`*6K$(s?_yHS9?fbjLUq62pM{Y^Z4Q%pnBYazq?G0&cX5;r@Q6F_ebZ-;^_l>w9kGi`NtH zsMb1Uwj!`n^<&4N8jhKKOId1;=;j#>II_FK{=w@C=*K~iiBRJt{h)1WacL!N$OkL$ zuk>H`{Sc|mbD+2F=9FsHALOQoK6MV_K|iH)npOc8C;(ghta{s(zv0d+W)Se+)UOlS zW0;+apA;J`;No+E3PCoapPNiZw6V0XWOKiXXgRdF&{eF)8I*@h;-2zMS*~OiWu0o zwN|Nt7l#p1t>oZce_REz(s#DcI#m!j)|j2gI~GsdHFxkH6{1%yT`suR$_uy0w@}&pI_p1<<&^=+>eWIBtphw&1WYe7=r2I;{~7U%%w*9t{sQ^#gYKLkI-K(_gQ=33VX zuvg=Tt2Dsp>D~lmjg>fInBD|)u<~Pu87K-)=Qj`j6&c;n@XUije$kaZW$LYAHf z(eO0`gw*!|Z2W zRX2#LX}Gc5c`(PePoxg73xEeeAs`MlCBxVdM;DGihOX;^$1fw#}k;w10I zFTFdHW@6h_8lwKZu81Dw?C^Tv7)2*bnr3)F?>;@*5m&_0wWl#UxlQ(c%oVNK3)>At z;qx!kJwFfPHY}44SaESmZH%aOD^B1<1@K^=exK!oY817X8H^}AUblG#;Jjq(I*nz6 zvN5xMJyGiz95q;M^e1^KN7BHQm_D_Q1_L7!>z+?{J>GB-L+EonhbATK!AqVGLi?4$ z4QI9+D(;DOplR5|@{F&#(Hj;BH_>Zfv?>$xnYgpovBCy5%q9~wf1!QSZ99g#z3LTQ zcEx?~=HFn-ce+w-3^wEK$Fwd%0D~0NsEG>$PB>VwJ|HkeANXQQ3uN`yXT-Gz1#w-0 znTRMK0On40$>Ux|?e_$}Me!v%oK{Wr_Bv!82_`pX_c(8mI!+c7@>U&f;yhbxIZh{M z*paf~R-Ms()U!|D7+Njy8WXkoiWUS|y?gP(3_^brVXA!sEUpeiMlGzw3(gp+c%I2gR=47WFVVRe~Z&S9YW;-al^uDl!ln{ksU0YZSNyXIYc zIwA9ypR$Ew^<6M%-fDGlS5anKtD;jx%+Wnh07ORRIn(Os2QZTh%Sd6ozMt4*;hV;) z2n)uHXv+UUv=a6sM_(kbPvH~&+x_t$6Hrb#=5kt zgM62|{cOKFf!N<*_9oy&bW@8ygY$#2HPdJAf2PC#_{M#}K|`fU1TR9DV6+WXyeznQ zP;{6V61fG^O*C}4&jH;PR0u)#3xS5dkD6G_UaNz^!>p;98}=hPx~u>Cr`U?IrDAs) z5L2M>S?GKHps?206SIM8qqjzp*Y_-V++`>Fdva}43GJt z&>;f%3B-Bf$_)ub8*$B8C~^!+aqE6_pPkB)LM4`y`!spZ{Gg=+JQXh4fo(eW(!gQ& zl@}tW3Lm08sA2$ttLf#dd*Dp*^Qi)gT%p3=()&potf6Vpx>Ps*_~aBw;U9wB0(L0n zGn|~;$M>Pf@nV?RV3d|4vgcrPa8{qu#*n2L5g*H;$rG`H%JFp?VaoXfpbZ4@x~BZT zzovj^P%2(hcHXSGnlgqC?+T`k(0X0<$x0x;Tq}pNGZU@U8#Sn}5&4_q$~Z&}9Gk?k zo%q%(`-wr9M0BK*E>tQg4;F5FBqD;)3RM%%ac+?AGMbsp?>XGINXm=h*=q4xIwp+s zTJL)$E!O&+^=c?7F&*Pl)(bGN%wtrf)ZB~F)^8VB4d(t!bB@q12ss0;YuG8(&UOOI}WPX5I8G~)yyus+I*r_Bl z1+&c-wR5y$Yyzh3a7asP!%(`O{Ps?t+7)3rDK-6G5~?$sbpq!=@!(Kc%Wo`dN`)PjX?&Se6+J{uvmY8*JTi9YXDDlQ#!@$@ zDg0tTM?+*IGO*;n7){m(bs==g&g_br?5k#jak+3b2;yrh*=R+HLmrq>72DWz1+Jk z&Fo#{wio{b{FVz|)%m%VTcc!E?ARNch<}!nR`PE*UvRw<7#{b{+JNbBO@Qk&*`sF4yzfbM>&o%t#77LsBUkmzw zc5&2lR6QyfM$ARdFis#6*C)VNxV^keSUV?~(VCVw(zo@c)}q{Qh_LU%hW+3XiX{$B zBK)HcNf)}?td%hGr=q8ZpT2HbS|mjALDc7jr=-@x9_u&0T>eYZD$!t_6o>hU_(@l9 zpgRy23Hx3!SgE{HS`Z5DmnNKSN&2oFCpFf{z%Ek)fdPXlA{|gT!|LvgoPM2JP~sK^ z>X}N1r5@esUSP+w+blMWoZQwj7*sDA2Vmba?mbK+JXki~tY60hjS;>{k@H>*`(YpCLsW&Jer+koN%Dg2v|Pgr0P-GdR@U^p z!T)|xM{Z}XzX0O+V|1&Be~zzZ#<}QsufMrJl>h+XzK`?h{oFa{_oh(*z^_4-zn?(& zYf!BKz+bg+eg)XQ*ZMpC3Ndc|7Xp6^0T#Qzl?wD8oqzIQy7{Z-yIR}d(YR_ocD}v6 z1^r750KCgBEiKK@GT!04I{I~wGuJ=$Dc|vaB;J2jrM&#_rvG=YsO81{DBGBk&btXs1AYR9_EgOgsZI(?GVQ&RYS_?R>z#31*R!TID5;M(!9RXr zfBNtYE$=!6-(NWi4G%xx&0z{1m|#Ay2icTeYapOF`#IvSy*b?rEAYU+!DL)CL1PNW1_5KTR^5A3M4=s?krgo^Lc$}UXd}740sK)vwU0%~~e%0Pg zpcBv1pEj@GOc%>z-~C3v0st`gf)H;mt+%f`ty`I0YBsN1`u{9<5XXEaj>}Y)NAmeB zsG3z5)!t70>!ZToi8{22dcQgQH1v-T{(n4>{}U3tvSL?Oj$N41^!Pfn)9fS?Ql|k0 zdkCBb2EO`%rgVL|R*%x#44O)B=o{N+uDd-M`tP*uO0^061H7MUKBnp%$U&d#NCKo~blcu(A@Ak}i1!B7S@a z0B?A5=k#-L;s=f9=JcPZN6qEt03dt-ar|+w!oHOG%0{#tmI=$INN&i#GSy58B&7N> zuN)!y6bt}1^L0p-4`S4l)EYd#E3cKMr-lUDMF=XXll#D_&G|=`f2rTAj|cLllDSt* z8p#_`?ZC&zk*DuM={9%qaa(dtSZbr8O}RngFN>#8ax*TQl+7D+pRXDdpQ81Ju|0ZJ zDLA8fcW~4b$v|i)FWxrEN=v^J?F3GH3RnagyZb%6xWVDdcEjhf=yvQpt;|(%BNws5 zq7hL925N>K9D+-tkOx<6lf@9@hwvkSsMK6wvsz=5g|v4!ZQg9zG+Tt`?ACd=6L`h~ zNjGoTrad0eN{|GR`Cm*r@8)FkP;aX8eFpy^;su(x4Et3#>}JE?>>>19vzG==4`--F zKnBR^hwYA;@r7fePjplr7J8Pm&eVOQ!Zj^ZWh#?lWjy}*h2-Sw5yhzsQ;qv8>5=@4 zc9{KVX+cfaAE~UhysStBZdvCMd)^mEoecH~J&`aTw?`2%nzSQk7oc(SN4n|uE66WT z+4{9;sN`FL`8fxCaa>=pQddBI%=0twM$b&0n!#3#ZK)?YD1ySLlD<1$<1w6NoIJpv zy!b!x9J^}HS++emR^9RRxGVi>QN;glSby_CzuYC}D^J9bAJrm|bd3Ps+GWyud<6M? z(c-m7TftYlpCHaKEU|GDu4B{X8@;7-GA@$B96a6U9A&<=IE>#0CLC7_`{Xz^8;oJs1mW#%jxHr@aN!{k#HfDM*E-3?a zMM4Nyd-YipMd*PE3H5yGdLZL}@bgRaNsU>OgS5?1w&53Lsw~_tPebAh3aNiX3#KPv zLkc7TsX%MZr4v{v=TCYDF?!?DP#J*(#fqZ(9vN@i?0|HZrJbv$LYppGk$eNlyy^EE3XK@Ic$7@L- zjXkAS08Czr=ZfyfGf;;ZEmXvP1O+_XHG3{0BBUQb0YZ@A=e5nBB4a0J^2mvj&Wtjd z*?8&aq65S}BE&vl?bOi)Hp|*Nny5pw_uom@{pc&xT+A=_kL!YtyDf`$yT2~~kbU=T zeQzc-VW$oQiP7O*I&cfdT}BZ*wu9&{I%dm=Kz}=}t*0IPFl0xnb|h5#Cxqn$-r4mGyCyO>_5Agky6jX?nOawf?LkV?-vOP30q4 zXh1&>@6Oo?cCZN|x@!2-J$=)x#M$VXtL>?^6I5}=LRxx$1tLOWk37V^9YLxg!dn4t zM-XnJS927HJ(Vr=0bOH1Dk zbZaY|r#4lX*^iez@=w=IH#r_R&5m zSO^@W*&GtTES48{BxXi{4SZl7jx_jK63Y&I)mDqi8Wh4gC=>HOr-7!&nPcvQpKaI8^HK>OS zlk2Mx%1vDaSfhICKuZ}Z?gas?9N9PP&0s7GFUn{z^q47FRKaD-RfQ`{l+_zyQ?t}j z{1#Ad2S=W`ZFBPACs!Qt(}SLC79rz$t$Z>F;l-rqzH{K5a6T!JV6P&Q>g=>x23GIA zen){r{bpuz_}<6v$|;g7PomEH6T^5y$DNIoUTQ>IDens=TqfcyUAc$@?Fc9_r97c2|KU*joF(dC#Fa6==g-Q{v5 ze)jAq*|Yu zWmQ|j%0LQZqx~x*tl4E1=Zv#t#>Nb)^H)#p(X2cH8v%3vc>q!5?fDMN@M_N9fkXjQ z?5UC2DgQ-wo5Q?*m@7Dq4}X;%3!9!Y#B1hU90}6?TTI5s%8|aAn}aW(>u!hKZ5Bwz zQq0eB3@7L?Ow~7LOwk&x#kCIzQM}XX^RaHv8iuHHy25W~-aYb{TdJK|{pp^+E?$J@ zmKBA5wu}Ak!Uttl`;7{}Ys@LzsTKdr-QWG&w4{kzKH-;l|KZ|Hv+C&NLx1@FFOohs z3jpw>cK|?W0f1oufB^u2`Rjs~hxjk{A94W0hd&+y-^7UScy`l!VeS-fY1!KK9Nnkv z2C?p}UPmOu6ISf?PVlFEJD0i<5HXxa1&~#t-cKP78Ib)zd#kTFK$F~o;4-AgB|k^v z2Q$x-+={+VDw==acFI$e1bSkFiS$)QaP!H6WFKLX`UMEph0}b$Xu9snmFhBtdj8Wk z5+CtH~n5>Q8w%vw_j%LFM~5x^ovMp~^y|zIk2hd>#t`#z32p6Q3$; z=ar*!K>6~zUWZpcYl~YmWYl!G>kQPdV(B3pO)EkZJNNTl^cTqg&(@TD^xjziFH=Jow@(oI zS@o}H&%4G{fVz#SP20|RI$t#px!{QOS02ZLxybU$3O+fsu^bhc_L8=B8(wm{iQ&op zWPIIf?|>EPyjo_jj9l7z{(~WJ#OHTW-P#PDWMh9ye(bCxdjnl9`&-rch6_k%d%sCUkP!loShU!5WGeUcK8 zE=vd$)ruNT4ZN=1emy!rKoP2Fj*9&s#yV@(Hyx=jHW6YVTM-+`xj_706<|#}gXZP*; zY-84F7vW}~#i%FZiGL-K1iyc!fMTj?0N{c{5{r8JH008z$6TQ>* zyy#P=fZEuBx&H!IN_&yTolaqb4D5#hv2HZ+R zm*h(sM!nClVYs%FcZh=;H>4MdMs{m^XVeF%i3iVoq;1tm9TJnahE44NS*KMMAx7+k z)}&X!G5n>_`|ciM44O)ag{~(IiZ&zB?4xl2va>Epv-6aePrumXGH`kt@DS!JLH||Z!?3pf7p3<$iROselj2M zW9>Owd9?fX#s_ROu1H(Uw5Ai@%FgB=`%+&Fmgb)Y?3%x9@hVR4@lIjbVz;51z;3IF zRoShnVW(N1fKugq%KnhT$$>ZX31^6^Ic@QeI>}uEZ>BkLAjIblqK@`P0E>7XyfYl0 zap3O+_Vhg1#&i*_n8Dj_wAeZ_{oD`aWJVUxo^@fn{Z{Nkn9mxn!(=F2?z( zK6Q}f*lwumy>v>rV4Qq@-UX6~FmE!X8}8rPIefMxm`^c?CEc?St2^4aMG%ebg_-)c zAPRLUo#9CQ?;oO>!P_nBzTJLH^_RM*soZ0>3yAjb7%F_kg$2o^9-z~08*`x*_-|W2hIIZ1X(UhVSeW=ZIiZM#Z0W<*-_G}8w(xvgbKbFg`DoA zX?S+AML9w9r>2S$yYDyJ7TUz8V^m- z%P$UxUk|Cl?ewbK=4c^cV-~c-wnMJP%}9q6p(_$m=YVOTmzreW5AW||))S{g<**3N zTM*z$qqc&V&c2VYb*6Fe-d{j;Ui`!YH#<_1L-&N=+e&!9Gr%jLd$dG(Dm*}AO|+9^ z%wJbq>9;4_DO3Sw*%FjeE|C-8izfj*sB~55y?tumvT8%?O9VbHdqaiYALZG@q_{C+ z0P$F8%T=e=1*E_=6*TB>kID+!S0!!kq427n)B3Q?uLbNNH+oFl2=X&fqL`%lmIopF z6j~#qdgdpV*F`O68V*U$M!SvW{QwvVq@uB>i+^Y(!6W2u*N)*rS9z6d-%jZFTmM-3 zH$XQ=b0KT|o!?}~GZ#`YOEVuvV~rUrZNaXr6otIQZKL81fCC(~fZ$6yn9`#{{eBtm zLr{IQh+ab`v4GCn4s(zN#f)=qgQ>e8Dz?wG87f{MSv#myWpNs!v(60yh%lQS5Mj$J zs&$k;p~fp;!?b1e**q_h90Lt7VAqj}^ewhJtjzIAW-S@KoiO~mXCwX8S)>%Dhe6Gh zxy!y$w`!zNaq1$LRUYsIhk-Af#n42E+H1W$V4gG@(rtQaQgo%|pew!~v4i?eRQvs1 z2#}CGDnvR_|2YTDZmF;YG&~S=g8;4C&~VNs4xyNAEzH(7I4mNlnBM=ooE0Z);DQIF z9{Za}821d`l;oW)e()i2EHz%HOy5)e18D?4Se7KAc!dfe^IXdVpqEX^9(2}{3Wxob zyUy-a#*P*LB_0@_jT6L{;XWOnJw*Q%irH>(d{P{0tlSyM63-*3|E2ap6E%;`iiA^r zYUa`E*po;uT9*5ku3`UMG2q5h2AC%L)Io~7!|ph|Iq#>((7 z=W((SbNBUZvh2XweUSMg+Yb{04zN0ex%hH|32XiHktjcUr@q`g5$8jjEGiJ;)>5U} zh?Zj3wo}hV8R$JOZpH)9;09wFgZK<78AiM3Sx+y~-xO zG^)2vJ_DiWZSGT3Q?*U`6p>=1XQ*V~fY~Se5Kkm51x?SZQ`dvdNkxSisiq1g1l*$l z+p3<65iT0W&4wqK1>e=EnCw}}6r+n9)pF{f5mqhaNYC>NqD?)Q@LfrQwohoVnd9RZgVc+|WucB-79ij|z zd@h5a(^(Xp-&@L`_EX`#kWg^2QzpLkd!z5OO{-&eKDR2xUevN(OK-+)lvd=~EPrv1 z-+7~RD4|PsL>cW#dGf=9DvuKE>P5>SL7x<_K0Y*L;sDrcSt3Iq2y}^*5EZ6j$EM|KhZ;{ z1iSin$`H=IrGz4h-vzH=`JM{-Ve0o~O{f7!-WUG4U-7O~Qy|)!(X`Z`kw)l*jkXxiNOjOK2W18gu0+mOxqsrY5^l zA``NkCw}PA{0Lv#1cFgQFIj7_z=g8z{aGcWs;+mg0+QnJIKuop&fYlE|NsI07{GjJehFqRg#QMrU_envXrl zF)c~8Rg;d>1@s0J>bbGPHr+0r-byy!lJ#?)Ka4>dV{1`uzeKzonQz?|o?(0x*AZem zH9R`qs41nCL-z!}R=M&10*q0Qdg1Q2Hpri)U9OPCd0Q6TJK2T%^XQHqACg^vBgkv) zF^|Fmk6YKat@;}^s5H`RGauicvQnRGbs`WW{*fT?XurbI8tLlT4b?Q^I%q3rcfFX3 zg_b*rD8Jx@$9OQTt_!o@ZgnITNe1hH1RE6cKCkm*IF$6R{OrT-t-bzM$5*zm2o@47?7*tBy-=seJ(bK~$DGYrKyyevJ8glRP6+C0)%vgsfH;vN-uoNtH zI|pmp{J`I97QR8p5qq$l_+(XDO)a~ZDZV;hI+D1{yl2vOZ9mvjHdaJP(b9D!!5Nvf9s)Sh9&*EVmPGWeHYyNs|;_()T^* zZoEV>n=PQKLD%7I&iOP`rd}k-%vlY*(@l{v&S73!wC)OEc^@3%i#gX*efAKd%U;cd z0(2!#8GuBceoB*VHNkWgRQt;_S^Km2VxGdrc6iQ79_8UQaC-%T*QO2OL71U1wJ?)g z(1oF^SjiJj`5dlcu8-XIP}|$sR%O=tIwn=j@eV%<0g9L-A%)X~{5XK9-rP`k@zjAH zzfU6BUOgQk)5O#rLmNVkLNgC9U)bZ2U+9meK6=={YH4;6k^eBW)ZVtAkgIIZFRnzI zi^n3~lXE$#^R|i#i_TB>o_F$LFZCpyYm`QdBr{!Ccp)n9=E@N~_z3@^+22BWj^t9^ zt=K6hUvZ#a((>J6kcCQ9YP}dY#Ddqwy(*Luok*0nZs)!|!Vh@yao-Apypf9|Q9X(N zAtvCK_)VTHLl;_#$AEW;)>ccOmM!jOMrk7B!=Nz1U}uI8!`%@}>Q2^ADwrcS{BE-! zBCb-SuD zt(Nwh(qj&?Ts*7gDcPWUVJt0ZEV3=R<`H3rg@qD?M>@8EOUN~{MlcoKRoIAsFF4VV z`KN1Q%3ynVKXL;#8(AbWaEpm4EpRT_+h@UWIL_M(4Vp#mj{qxPbT$>ljl}ao zd2Tl+D&jisn*e7^Zk~ckB%r_qQ(5~+x1Zo%05zZLC*Jl;5bXgk>#n5qkGz_j`dKUD zm({xGhMw{LGH7%F8t-Q`EwFs?g!tyAjO5Ja^Xml9Zm6fvHN!`#Iwo|bA6~qUCA32D z$cQ@(GvRj)D^a_NSAEz`lL0*}X2WDW9S!E_8p2@?E%E7Rppk|0o>m$6ewUkd*+1_C zmzg~XE7&ANjjBH@(4w#neV9!V2D6#(|D;nT(AgW5(lUC@K2`Ju=XBI>hwNft6T@VK z3N$tCTH@h-pDRH!UX}8ZzMiwEv^S>=o1$fGBdJFkK@PwpMqf^14P_jExfzVKgMzo67Fb=|hy`J}{9C;`A2^gh(bC zf?cJcRs|QG?hJG4qMb5N(8OWL#8B(H!{M#+0`O*2evP5QI~vC_db}=+PNsQ#_2w;Rh4)*OuTKs zN}h2u5}JU13p1uinccY- z7~ddgsB4W3cTifNbPbQhjgZ%DV@xvp(m6;N;wZ2he`b#!p|Ac%@67&I0z8ZCEg|7? zm^P7zNu=erRjhtV6$2+t#x7u?w@fKa#29nygt9fKP-@m{ z0N1#phm~=i(jlvXIf6jq5N?Qg8l|rD|uGzMHf(7eDryj*04#)B3&r2;Uj82_nKu; z<^*WyJ@6tW?v6jJ2ndl^;#9qG8xS2$os1;2$OIY17kz;~xNXIho>tUYv?(QnS9Dm> zffpaaOoh~6qg%^jnI93D2eLOojz&krBN z8Y`&5BCH%C^Lz}jq^0v@g;LpNHOo@58cUGBT|sUkC(j^9Sh5_&7;M)vQ&lT`gO;Ga zGu57)ucYqk4qZFA)bbG4>tKb&^9xy0Lf*a6t)?bpb_GJdwVxfwe@Ygb36@I5sy1xC2985_Bny~&$u8h?;rE?QpI zoenlzo#E@b+gVZmQ4JMXtJf0SnwUS^AOx=ANf;2Sr3zy@HBG$RP`GT5gbB3U!RF2#J~;TpDLDoog@m1tM|wXD3x8L6L6;C6xhu$zw51snju_;3?($ z2YVS#I^0w*1F&Cf96V%Z)4}^vVLjNU=x^3RbRCYjDAVYM!@e~jiN2uVhB1+Am4h&@ zuc~eJNzGp&r>FHF=Ru9wIMs-raBQOYaST0EYkqdXwXP1l?`n*8&SD2AFsPqK)lhm} zZ#@ldOy4#wtD6^v0{i@3e#NN(Bpu0$F0MPcXAG{Qd8t`BLLR((R$3ZW@rj~54UK!D zi4uZUEL@Tb@u%zU3GixOdsvd}%GdQO_AUJkDbbB*x%*qKTa&O=xruA7k1}$(Upbjr z5AKHn967qlDQHvfk&RqB&W=|Q8o>y2o+WR&!~}8rKO@z6p-MR=P3XU%rJM^*(;+w| z&r?(qzX_I($nbV28&sMKONt}}3D&pf8%NgK$!tbdw+z4wQ+a8A)Wv-{3~|n3O!2sF z-gQu3^G5f8JURs)rOWy7qdSq}r6|Zt4dW|cOJ+5dke4#TEgjZS>+Pcv{t8T;b)_7%JRj<#Afqz38j zip}z1?xhgsO-Fv(?sI`Hu8Tvtd$?@Ym#57)`C-bh+U-P}LYVZdX2^<98$)wgn)hgS zjweH3)o}FWm&7O31JVWJRpq?H-SAfI9GJ~4#qiM~OWh8(JB)YrdnvnF_$R&wH95W= zf2g?+5`eb<`bH;SVc|rRV!w=O93QYGMI-;cxxn9L)A1$tLrznZ0f;P3Gd}MO4{gSp zutb^m%sEFGDkxilZQ@=83C(`eg=Qx0fMn2)wW-N8FUzV7Qg&uWZZc@bn)Ji=CFr+u zio-79k=C1~UQveDF$`;p@t3jQ-fv|Td4{5r2HpYC>TG+|sxE4R;1+gur(#EaNJHl! zSkj+8K&@6u9G@pKqpO_kwI8mVO?1l{x(jd^KwZ#R67F1fE2$}F; z7wZsdiwq}o`E&aUxr{MSq@A7DY9|D(?kNx^Xx++yB&wZMb%pMGo~;?mnaXWOuiZ+! z)Z?=^%=W-!pUfC{qTrT%&>tI&dHG*ibRWEjMrKqr$Xp)6z@<#*w4)0W(_IP&sQmL* zYFsR{3`7jEA8)|*c15>YZJ@b$S4S*mT@+i`nM>gr8S>pGPsY8Mu3YuPk0W@HD^!d3 z22*Xz+DaqzDR2f<)B8Z_DiXt+cnhc^Q78lY8ampmtPgn2-}k}a znBvf!=Pr(Bb}y`MBmX?BZ_W?t+ImJf;-h^3%w!KI3)Llyq)5pQ$bG3Bl{ISChG()?~kXqC88xjn2 zvK)(3`rah(!YhoM&kGw@k>o31*O=?;QX88wyL{dj&jifon#+%@{d3EyJ-qv(y)c?` zI;g%USl5>#m>n} zTvm-@lRYO`EPHSG85>t+O{h9?)8Lvq?YHU5UY?V0iE-OQVYy2}W6>f$7~A(AVl_kc z!YUUf{a8H5zn#kT}rdu+rT>umtovglvMKd(blv!S8-w%u`y06{hUvHW$E^uy3R1f zZ;40k?C0ik#YZglawS6!z)r#}-4lqBQ1nsA{_+e)C2c>Osr&o40hzk>=F$6|yNa@P zlRfmXkmu$XW@p57Wk^FlL7koK)r=61XKDyFA4ZP{a<$y!QpGSe1tf9Dyl7D$$71g= ziM_3+M4m#9o?{SEOt?sgXdl8%ausBZ8`IzBr8$FLv}nP#7vqhE6~xR1$T+U%HNSHV zwkUscrSFKCGJy1-Ww>S4;rMu?N78%P)pQ^;@&)y_k$}lGe{|H*+!n1vQ?L!J%dAi#v&jjq>E*SK2f4yBtl}d zTu2s}>ZX1VR;`{TkB!D>sK@r)Rm5R#BU-KEx>twT%gGx(z))a1ZWk+cz+*%FoP1$O zQbseZYPv%81;pO#g#O;b#hoo(fkVXIK0uS@+)tylW@O}?E7S@o-SD`;-;TnZttO828m%pitA6GM0C5+Cv#Z}sbz)wMKuA`WfH zH$4k_H5xIp@=M*X+cP7s!`=waZRc6{HYKRaE~FM0zG>N*PP7b^sX4|u%YnLL)ZJ61 zz;&8)UN6W=WdSHsJO}A(ZUXQBmL$24v*Q>)4Qt3im&w~U%Tz;=wOpqsqC4$of5#2~ zAVyzBF>{SU@P?Vk@-yn}?Gv5Baq5>*ng_HkJn)u2OP%~k*t-jUs!-5TDrn0}B2VTR zsYQeWp=IR&{cnEmA0tP~DDJ#c*fXCk2>BrF*HI?XKj@b~HopMJAsA-hvKG$;PYb(P z$HE5yXAnVhQ0AAOd{g&ozAkL=-ybNktWF_sphv=9p-wF8FFV@aK4c{H{PIB9Qdwpa zc7yu^aiAQG&XMY-Vm^m@BZE+5*Ih&4<`#KQG$|Q+*G#KVzu#}iYhzjjytUyhC*yE8 z!&p6cV#mx2(h4)(@<;jvRRf&jm)2V}g%V=r%`G7_yB2gJNje!pTm6mbuMsT+vkKpP zx8WWev5BLxNiqy0^=X@E5IQm}U{SOvL>PR%(?(C{;-qf&g;N6&u}L5C-&f5K=Cw#) z=Oygu5^jg?TNNm(23NQ^xr49+d{aEU6sBDp#BM|uCnR`7^m@Q~-j~0+J*whBtmZP& zYcM=+X{wSWMg6F@+TQ4WkvjdQq3Ew~uXd|0)d9>4Y9jOwNY;W*9L8vjG zvr+;c>iwTBqdGc zak$C$wh_F_6j75`8Vtk2=!D{1s0@9R5O8HGztL`c$rPSOM2I z_4~9K93hj%J>M)Q{92J#Wr}i^w=4E%Rs5?nj&88?>0@t90}!ifHmzu3MX*ug)}h&8 zof;f3FLJD|GU1|q(tu}!XlRYGr#(jIx}NzPd`XT~r1*YJ5`D=Tr%T@yMWAa*U_{l) zU`f%unt=U~C|~npkQYm~Swa zk*))UCS)~1ezB_XJ*`I5%6*`+EPnP$-4%~7K(p0XDpG2ymw}opW!?>b^4|XKEK0LQ z>HVCALUf%NdWh%9o|>LaNvHzziusLVI{wn0M_$W2lm;0sxYcp1ZObheX)K?u*D7oG z=%`!v%}i9J&n~d%ivzHXSo?IQu4nRfH8M z#7|~N51oz4<{~7(i_uah;JwqtpjX`~3d_%~Oznf8qy0s9fT~J|RcOz>)N=XzNu5Ah z8nFtf?<=CD_lMairJidOJhw}vmX-7cm(#_+O$Z4vR;^)Q!F+~=*^s7rlnaKz>?Z(> zacShgTy9iY|2$e0$5|1iY`FF(@>p%RScW4oe4>2`xLXjB={&J{jL{gf!YIhtv|GTq z|DiR?uHHd6eYnRr>B#9!;$p^;68lR00SBf9@ z?X}^p^^S8I-Bf1QsPeM7$$CWBr#j^$*%Pz`9y`XE{YJo-E9J~}36kOGf}&LB>>VDN z*g_H1F1J)c(COUfT2@sS$fyFP*a4az2Sv=#!I7nVf|}D>{lc3$l)hB<_NdpOf!4#g zLzi@rVy>RBJfE&s0pi9dYn!dd1N+lw@Ln3-dULKpJ?{~fjJZhgwCJ1Sh#e*5OC6ug5w8}w@|%H%VB$mS;c8ew@lje2}UA z)vehqdb8xA;5Zxm3EWwViYG@Z3BBvS@|czk7>4p+K_v>gqj4I50P$+>(b=S&BY=yQ z7y9d3?R!~#|FT8mW-+QvUiG=OvgWw=hEQ7W8}Osxr6X-tkTGHO``5#_t=41lK?!!~9xOL>k*6=B)%G0y(O zQT6kQw&TH3NxU;X3_clDpIPV&@~Gx8B=t8$b(D+Lb~k)633a>-OdwED7%1cyC3si| zVCwdXy{ubx=GwjCc=OUu<={+g(CK#UIuE$Rv9vfQyJhnk^nO6dww0q1UVzJwhA+>l z8TyZznKHRJQD|~PS6M#Ig(!+mZH10{gTZj-{8xw$s2{+O=KH$^;a1S0v|#}|m?n=TYabKl!{qxWf6#ucr3!k3 z0&k(XW0G5sU!I{?zcLBjtneImixN<{4bzcT2KQKFRG@aXqV{5YKyL&~yn}trqF@W_ zqNtouN1gLy*?ObAs}fWve`uoRFekl+MrEtnxC>AOJU!RKg78pwIT<(ks){ik>NfAX zVY8OSeH3el(Jk)SupxytWW-|Js>A1QmbAa2)rB?$kT`21;~L0`#7AMAwF~Mx7KdwV zT5CAEO^ry~ws-XD>EflPKN2Nn)r)n)ukx~$FQLbekS7k~y5#QnIN?dJZ&y>FwJH~# z_MSw22NAHfd9^zp-Y+`E*x&(((kG z(d~|&iid%M4(NZ=dG3H1yt^bXlUb?O*bc}vUDN2DvxDlN(KgVo0CE);h*v1RL+v~Y zEVOP}Z6(W6-|J2w`Di}j1ovtOli&_mUjsE83&SzIwMJx@$0-9HH945-;vwl8SlNJZ z0}2wQ{S33A5WA5fBo=g{d+Fwm(>7C$}M=K+rdMn0SA=R@OH@S!|@ z3=QKsnKMM!vMs$xpJq>Yw#prLhxZEP9HGt@6gZOqQFw@%Ft=Z-Z=EhGFg~^@5O*c>&cA8LiXb%(4Tv~?> zXd4>pjTFxO%feOwOjC}qpx_H!y!Eev8rha5f~zMEKR)a&YgBwE!bLz}P&XcG7c#xc=oMe;0YFqJ)vHgL2o zWpk{i0aE1oF+{b$h#hWz1)uJHRC=2jnQj?#>%{AY?+a03%r({Ja@ksVzjC6zyFwc5 z_n2M2(QCNb+QU*;uFN!X52YrlK0!+xV}eSM(=EW{BM493_~M})X*HN`T72ygeEzYn zW8xY!M0H6nVivu3`F-dZTA}q?jN_K$d?o_WF}|rdHMmQFxFJbwcO;VG_DCyaL>?nw zTy*u98m6}E_y1Dphk7V494CLSo3ct$Gf@N~OI`cdMOvC?Nu!bow*C zbOY!Di@7-i0|w#Ts6w3Q8R-K(X_P4<4olOgi}*Y{enQojO}AHP9)J^PYCjt$4^zE9 zUo^ER4XMngJgAPOjQdtLoD>XCpZ*mj8hjKMi@7}9&R4bz1VVc$b;53(aat3XbCnF> z!ofx@IJ7uJj(Z6l?yPA*m+$+3n&QecJ$OeJgfkwyL3*}>gHd&tkR%+OyHr*TUp8sj zN{@BTmrh4=b46~~%lsHAjapxXSLF4;~O zzv+!_2hZkP@>G=)csS$~grVBPd^C;eN~3=!&Q$7hU%F^5nZR3a&N+a-o1GdLEvcFx zPP=WnhdJX`%oZw~U){ZP5mR<|pHN(AUA=s{kJo+q6T%rWiMGEi4Od5-v9Sz>!HNa{ zLH+7sizu5z#lV&(QS|+Mn&30j{Y-rMfu=PkQ~hq*T1+h#q8z3Nz*>rKXe~r@(lIH> zsVCd4-P>2+&jhSqd%9T?@z`iG5s`OOLzh1;;@Kl0N9C4CPX@hLkDvVugyFsWWCtR$ z0^P*I*k=NYxW^@298}f0sTF0rZk;q264e~)Ei4m7kF_VIsM@owXD)I=lbsudD?U_r z@itS*aw!7djSCTy zx0aVy6jUG#dUqe-fFU%n?ssK?K91X@}ov9!$2zD zFPuf9H0FubMpguiLrqGap zI+39iVgc&8+L$8<30!3|A4D7doaY!H#{GH%+OKK&GWnGeXpaPIriEwv5j_iL^do8w zbuQz$_!NR1j}+qyeZ0^?nKB%%RQ#in4u|2#yNN+$?FSlOqMjGV2-UtE9PQF!rGUUwMIn)u~(bezu`Lw)yx?A3o7C zmMcwiJyw6*(O;Io%oBXC?}=~rnr}Q15pb)ZrObn|-Nj`Ho<8WdDDj);Xw*G{0?tF( zwDxu?^RfgjxI^es<}X*vWziuEpK!A!j)UpuC8rTGI=}Nqd%W+85*wOYrA}#}5hq~H z{9d?%3KqpHisZlv-F)8pF;Izz)8IX_(VRr@1HD*hUQ=e{qUNcblNcg?!ZgK!Cuwa~ z7#cW(F_Zyww)>gi-jkNoJ+i4}8$i0ntSSp1OjnJuUzdp4f_S`c$_XM2SmPwnJb zRMDJFj>}V0^J4x)V)8-}Z^)e_j1y}W^C>)u^rMBoa>*x!om#=sW`0Ws+TQwl)?_XbIzkm9x<$d^Q_}}DTpYXE zRCzl~Y-SR6m|=#x(V6@#l`r3<7~oC2+<>{ahj{Nb@*s`}AF$SLgOr&^>e8uneLlKQ#hH2H7BrD;zUDuY0_L{D+mhOYdv$Cp`mc_gw*N70u(@ z1b{Lo>>6Gt?F0wWktS8IO0#gEW=L;cm%oCw92Dz6YRwb|8B^Eu!Pah;UlG3`%EE%v zCFtzpeuP_Qh4UAh9m$peJ9r(MVBWfQPAZB8+A%Hg37JVbj$mYZ&=ROjvto@acP}(4 z!j5Z?hYRJPjJ^=_Y5CEs9@U!gWhU~>V8VgM?p7@D+oUWrab+g<(M-ysTOi%;hD-Hj4uXh eT`EJw+IJ<4n&ZR=JK*>K;_gVN3K9yo(H)>C7((6v diff --git a/R/wwinference-package.R b/R/wwinference-package.R index e0e93041..e6be48c4 100644 --- a/R/wwinference-package.R +++ b/R/wwinference-package.R @@ -4,13 +4,13 @@ #' @importFrom lubridate ymd #' @importFrom tidybayes spread_draws stat_halfeye stat_slab #' @importFrom dplyr filter left_join select pull distinct mutate as_tibble -#' rename ungroup arrange row_number group_by +#' rename ungroup arrange row_number group_by lead lag #' @importFrom tidyr pivot_wider pivot_longer #' @importFrom ggplot2 ggplot facet_wrap geom_line geom_hline geom_point #' geom_bar theme scale_y_continuous scale_colour_discrete scale_fill_discrete #' geom_ribbon scale_x_date facet_grid geom_vline labs aes #' @importFrom cmdstanr cmdstan_model -#' @importFrom posterior subset_draws +#' @importFrom posterior subset_draws as_draws_list #' @importFrom fs path_package #' @importFrom rlang sym NULL diff --git a/R/wwinference.R b/R/wwinference.R index 98e0daba..3528684f 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -27,6 +27,10 @@ #' user based on the date they are producing a forecast #' @param mcmc_options The MCMC parameters as defined using #' `get_mcmc_options()`. +#' @param spec_inits Boolean indicating whether or not to specify the +#' initialization of the sampler, default is `TRUE`, meaning that the +#' initialization lists will be passed to the `cmdstanr::sample()` +#' function #' @param compiled_model The pre-compiled model as defined using #' `compile_model()` #' @@ -62,6 +66,7 @@ wwinference <- function(ww_data, "2023-12-06" ), mcmc_options = wwinference::get_mcmc_options(), + spec_inits = TRUE, compiled_model = wwinference::compile_model()) { # Check that data is compatible with specifications check_date(ww_data, model_spec$forecast_date) @@ -90,17 +95,31 @@ wwinference <- function(ww_data, fit_model <- function(compiled_model, standata, model_spec, - init_lists) { - fit <- compiled_model$sample( - data = stan_data, - init = init_lists, - seed = mcmc_options$seed, - iter_sampling = mcmc_options$iter_sampling, - iter_warmup = mcmc_options$iter_warmup, - max_treedepth = mcmc_options$max_treedepth, - chains = mcmc_options$n_chains, - parallel_chains = mcmc_options$n_chains - ) + init_lists, + spec_inits) { + if (isTRUE(spec_inits)) { + fit <- compiled_model$sample( + data = stan_data, + init = init_lists, + seed = mcmc_options$seed, + iter_sampling = mcmc_options$iter_sampling, + iter_warmup = mcmc_options$iter_warmup, + max_treedepth = mcmc_options$max_treedepth, + chains = mcmc_options$n_chains, + parallel_chains = mcmc_options$n_chains + ) + } else { + fit <- compiled_model$sample( + data = stan_data, + seed = mcmc_options$seed, + iter_sampling = mcmc_options$iter_sampling, + iter_warmup = mcmc_options$iter_warmup, + max_treedepth = mcmc_options$max_treedepth, + chains = mcmc_options$n_chains, + parallel_chains = mcmc_options$n_chains + ) + } + # print(fit) # nolint return(fit) } @@ -113,7 +132,8 @@ wwinference <- function(ww_data, compiled_model, standata, model_spec, - init_lists + init_lists, + spec_inits ) if (!is.null(fit$error)) { # If the model errors, return a list with the diff --git a/data-raw/test_data.R b/data-raw/test_data.R index dbfe9ec2..0a8a4331 100644 --- a/data-raw/test_data.R +++ b/data-raw/test_data.R @@ -53,17 +53,18 @@ fit <- wwinference::wwinference( ww_data_to_fit, hosp_data_preprocessed, model_spec = model_spec, - mcmc_options = get_mcmc_options( + mcmc_options = wwinference::get_mcmc_options( n_chains = 1, iter_sampling = 25, iter_warmup = 25 ), + spec_inits = FALSE, compiled_model = model ) # Create the toy stan data object for testing -toy_stan_data <- get_stan_data( +toy_stan_data <- wwinference::get_stan_data( input_count_data = hosp_data_preprocessed, input_ww_data = ww_data_to_fit, forecast_date = model_spec$forecast_date, @@ -75,6 +76,8 @@ toy_stan_data <- get_stan_data( params = model_spec$params, compute_likelihood = 1 ) + + # Generate the last draw of a very short run for testing toy_stan_fit_last_draw <- posterior::subset_draws(fit$raw_fit_obj$draws(), draw = 25 From 6c49a089773939e320457c093e68a6cd8062099a Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 15:30:12 -0400 Subject: [PATCH 071/103] add all the tests that are relevant from cdcgov repo --- man/wwinference.Rd | 6 ++ tests/testthat/helper.R | 24 ++++++++ tests/testthat/setup.R | 6 ++ tests/testthat/test_ar1_marginal_variance.R | 40 +++++++++++++ tests/testthat/test_diff_ar1.R | 48 ++++++++++++++++ tests/testthat/test_helper.R | 23 ++++++++ tests/testthat/test_ihr_transform.R | 36 ++++++++++++ tests/testthat/test_pmfs_normalized.R | 23 ++++++++ tests/testthat/test_rt_assembly.R | 59 +++++++++++++++++++ tests/testthat/test_ww_model.R | 64 +++++++++++++++++++++ 10 files changed, 329 insertions(+) create mode 100644 tests/testthat/test_ar1_marginal_variance.R create mode 100644 tests/testthat/test_diff_ar1.R create mode 100644 tests/testthat/test_helper.R create mode 100644 tests/testthat/test_ihr_transform.R create mode 100644 tests/testthat/test_pmfs_normalized.R create mode 100644 tests/testthat/test_rt_assembly.R create mode 100644 tests/testthat/test_ww_model.R diff --git a/man/wwinference.Rd b/man/wwinference.Rd index 6f7bbdca..91474480 100644 --- a/man/wwinference.Rd +++ b/man/wwinference.Rd @@ -10,6 +10,7 @@ wwinference( count_data, model_spec = wwinference::get_model_spec(forecast_date = "2023-12-06"), mcmc_options = wwinference::get_mcmc_options(), + spec_inits = TRUE, compiled_model = wwinference::compile_model() ) } @@ -32,6 +33,11 @@ user based on the date they are producing a forecast} \item{mcmc_options}{The MCMC parameters as defined using \code{get_mcmc_options()}.} +\item{spec_inits}{Boolean indicating whether or not to specify the +initialization of the sampler, default is \code{TRUE}, meaning that the +initialization lists will be passed to the \code{cmdstanr::sample()} +function} + \item{compiled_model}{The pre-compiled model as defined using \code{compile_model()}} } diff --git a/tests/testthat/helper.R b/tests/testthat/helper.R index 28a31b09..59e37f77 100644 --- a/tests/testthat/helper.R +++ b/tests/testthat/helper.R @@ -1,3 +1,27 @@ +get_nonmatrix_names_from_draws <- function(draws) { + posterior::as_draws_list(draws)[[1]] |> + names() |> + strsplit(split = "[", fixed = TRUE) |> + sapply(function(x) { + x[1] + }) |> + unique() +} + +get_par_dims_flat <- function(draws) { + par_names_no_dim <- get_nonmatrix_names_from_draws(draws) + par_names_with_dim <- posterior::as_draws_list(draws)[[1]] %>% + names() + counts <- sapply(par_names_no_dim, function(par) { + full <- paste0("^", par, "$") + pre_dim <- paste0("^", par, "\\[") + sum( + grepl(full, par_names_with_dim) | grepl(pre_dim, par_names_with_dim) + ) + }) + return(counts) +} + logit_fn <- function(p) { stats::qlogis(p) } diff --git a/tests/testthat/setup.R b/tests/testthat/setup.R index dc810cf9..36796c46 100644 --- a/tests/testthat/setup.R +++ b/tests/testthat/setup.R @@ -20,3 +20,9 @@ compiled_site_inf_model <- cmdstanr::cmdstan_model( include = testthat_stan_include(), dir = tempdir() ) + +params <- wwinference::get_params( + system.file("extdata", "example_params.toml", + package = "wwinference" + ) +) diff --git a/tests/testthat/test_ar1_marginal_variance.R b/tests/testthat/test_ar1_marginal_variance.R new file mode 100644 index 00000000..433b1493 --- /dev/null +++ b/tests/testthat/test_ar1_marginal_variance.R @@ -0,0 +1,40 @@ +test_explanation <- paste0( + "When running a stationary AR(1) process initialized from the stationary ", + "variance, the marginal variances should be larger than when compared to an ", + "AR(1) process initialized from the (smaller) variance of the residuals. ", + "(At least, until stationarity is reached.)" +) + +test_that(test_explanation, { + model <- compiled_site_inf_model + + withr::with_seed(42, { + stationary <- sapply(1:1e3, function(i) { + z <- rnorm(10) + + model$functions$ar1( + mu = rep(0, length(z)), + ac = 0.95, + sd = 0.15, + z = z, + is_stat = TRUE + ) + }) + + nonstationary <- sapply(1:1e3, function(i) { + z <- rnorm(10) + + model$functions$ar1( + mu = rep(0, length(z)), + ac = 0.95, + sd = 0.15, + z = z, + is_stat = FALSE + ) + }) + + testthat::expect_true( + all(apply(stationary, 1, var) > apply(nonstationary, 1, var)) + ) + }) +}) diff --git a/tests/testthat/test_diff_ar1.R b/tests/testthat/test_diff_ar1.R new file mode 100644 index 00000000..34792b5a --- /dev/null +++ b/tests/testthat/test_diff_ar1.R @@ -0,0 +1,48 @@ +test_that( + "Test differenced AR(1) Stan function agrees with R", + "implementation.", + { + model <- compiled_site_inf_model + + withr::with_seed(42, { + z <- rnorm(100) + ar <- runif(1) + sd <- exp(rnorm(1, 0, 0.5)) + x0 <- rnorm(1, 0, 5) + }) + + for (stat in c(TRUE, FALSE)) { + stan_ar_diff <- model$functions$diff_ar1( + x0 = x0, + ar = ar, + sd = sd, + z = z, + is_stat = stat + ) + + r_ar_diff <- diff_ar1_from_z_scores( + x0 = x0, + ar = ar, + sd = sd, + z = z, + stationary = stat + ) + r_ar_diff_alt <- diff_ar1_from_z_scores_alt( + x0 = x0, + ar = ar, + sd = sd, + z = z, + stationary = stat + ) + + expect_equal( + stan_ar_diff, + r_ar_diff + ) + expect_equal( + stan_ar_diff, + r_ar_diff_alt + ) + } + } +) diff --git a/tests/testthat/test_helper.R b/tests/testthat/test_helper.R new file mode 100644 index 00000000..e4156eaf --- /dev/null +++ b/tests/testthat/test_helper.R @@ -0,0 +1,23 @@ +test_that("Make sure we can find and load files we need for other tests.", { + testthat::expect_true( + exists("toy_stan_data") + ) + + testthat::expect_true( + exists("toy_stan_fit_last_draw") + ) + + + # Compiled model object should exist in the workspace, with functions exposed + testthat::expect_true( + exists("compiled_site_inf_model") + ) + + testthat::expect_true( + "CmdStanModel" %in% class(compiled_site_inf_model) + ) + + testthat::expect_no_error( + compiled_site_inf_model$functions$convert_to_logmean(1.0, 1.0) + ) +}) diff --git a/tests/testthat/test_ihr_transform.R b/tests/testthat/test_ihr_transform.R new file mode 100644 index 00000000..b9787a13 --- /dev/null +++ b/tests/testthat/test_ihr_transform.R @@ -0,0 +1,36 @@ +test_that("Test logit-scale random walk on IHR in stan works", { + model <- compiled_site_inf_model + + days_weeks <- dim(toy_stan_data$p_hosp_m) + ndays <- days_weeks[1] + nweeks <- days_weeks[2] + + # Make sure we cover a wide range + sigma <- 0.5 + ac <- 0.1 + std_normal <- rnorm((nweeks)) + mu <- rep(logit_fn(0.01), (nweeks)) + + # Build the vector ourselves using the AR1 function from stan + p_hosp_r <- model$functions$ar1(mu, ac, sigma, std_normal, 1) + p_hosp_r <- rep(p_hosp_r, each = 7) # In R, expand weekly to daily + p_hosp_r <- inv_logit_fn(p_hosp_r) # convert to natural scale + p_hosp_r <- p_hosp_r[1:ndays] # Trim to size + + + # Get vector from stan and compare + p_hosp_stan <- model$functions$assemble_p_hosp( + toy_stan_data$p_hosp_m, # matrix to expand from weekly to daily + mu[1], # intercept to regress back to + sigma, # SD + ac, # autocorrelation factor + std_normal, + nweeks, + 1 + ) + + testthat::expect_equal( + p_hosp_stan, + p_hosp_r + ) +}) diff --git a/tests/testthat/test_pmfs_normalized.R b/tests/testthat/test_pmfs_normalized.R new file mode 100644 index 00000000..4475ff89 --- /dev/null +++ b/tests/testthat/test_pmfs_normalized.R @@ -0,0 +1,23 @@ +test_that("PMFs sum to 1", { + model <- compiled_site_inf_model + + shedding_pdf <- model$functions$get_vl_trajectory( + tpeak = 5, + viral_peak = 5, + duration_shedding = 17, + n = 100 + ) + + testthat::expect_equal(sum(shedding_pdf), 1.0) + + + generation_interval <- toy_stan_data$generation_interval + testthat::expect_equal(sum(generation_interval), 1.0) + + + inf_to_count_delay <- toy_stan_data$inf_to_hosp + testthat::expect_equal(sum(inf_to_count_delay), 1.0) + + inf_feedback <- toy_stan_data$infection_feedback_pmf + testthat::expect_equal(sum(inf_feedback), 1.0) +}) diff --git a/tests/testthat/test_rt_assembly.R b/tests/testthat/test_rt_assembly.R new file mode 100644 index 00000000..a7b07638 --- /dev/null +++ b/tests/testthat/test_rt_assembly.R @@ -0,0 +1,59 @@ +test_that(paste0( + "Test that assembling an unadjusted R(t) ", + "vector using R (language) code gives the ", + "same result as doing it via our custom Stan", + "functions when those are loaded into R" +), { + model <- compiled_site_inf_model + + days_weeks <- dim(toy_stan_data$ind_m) + ndays <- days_weeks[1] + nweeks <- days_weeks[2] + + ## Make sure we cover a wide range + sigma <- 5 + ac <- 0.25 + + withr::with_seed(5325, { + std_normal <- rnorm((nweeks - 1)) + init_val <- rnorm(1, 0.5, 0.25) + }) + + ## Build the vector ourselves + unadj_log_r_weeks_r <- diff_ar1_from_z_scores( + init_val, ac, sigma, std_normal, + stationary = FALSE + ) + unadj_log_r_days_r <- rep(unadj_log_r_weeks_r, + each = 7 + ) # In R, expand weekly to daily + + ## convert to linear scale and trim to size + unadj_r_days_r <- exp(unadj_log_r_days_r)[1:ndays] + + + ## Compute in the same way it is done in stan + unadj_log_r_weeks_stan <- model$functions$diff_ar1( + init_val, + ac, + sigma, + std_normal, + is_stat = FALSE + ) |> + as.numeric() + + expect_equal( + unadj_log_r_weeks_r, + unadj_log_r_weeks_stan + ) + + unadj_r_days_stan <- exp( + toy_stan_data$ind_m %*% unadj_log_r_weeks_stan + ) |> + as.numeric() + + expect_equal( + unadj_r_days_r, + unadj_r_days_stan + ) +}) diff --git a/tests/testthat/test_ww_model.R b/tests/testthat/test_ww_model.R new file mode 100644 index 00000000..d2afebbf --- /dev/null +++ b/tests/testthat/test_ww_model.R @@ -0,0 +1,64 @@ +test_that("Test the wastewater inference model on simulated data.", { + ####### + # run model briefly on the simulated data + ####### + model <- compiled_site_inf_model + fit <- model$sample( + data = toy_stan_data, + seed = 123, + iter_sampling = 25, + iter_warmup = 25, + chains = 1 + ) + + obs_last_draw <- posterior::subset_draws(fit$draws(), draw = 25) + + # Check all parameters (ignoring their dimensions) are in both fits + # But in a way that makes error messages easy to understand + obs_par_names <- get_nonmatrix_names_from_draws(obs_last_draw) + exp_par_names <- get_nonmatrix_names_from_draws(toy_stan_fit_last_draw) + + expect_true( + all(!!obs_par_names %in% !!exp_par_names) + ) + + expect_true( + all(!!exp_par_names %in% !!obs_par_names) + ) + + # Check dims + obs_par_lens <- get_par_dims_flat(obs_last_draw) + exp_par_lens <- get_par_dims_flat(toy_stan_fit_last_draw) + + agg_names <- c(names(obs_par_lens), names(exp_par_lens)) |> unique() + for (param in agg_names) { + expect_equal( + obs_par_lens[!!param], + exp_par_lens[!!param] + ) + } + expect_mapequal( + obs_par_lens, + exp_par_lens + ) + + # Check the parameters we care most about + model_params <- c( + "eta_sd", "autoreg_rt", "log_r_mu_intercept", "sigma_rt", + "autoreg_rt_site", "i0_over_n", "sigma_i0", "sigma_growth", + "initial_growth", "inv_sqrt_phi_h", "sigma_ww_site_mean", + "sigma_ww_site_sd", + "p_hosp_w_sd", "t_peak", "dur_shed", "ww_site_mod_sd", "rt", "rt_site_t", + "p_hosp", "w", "hosp_wday_effect", "eta_i0", "eta_growth", + "infection_feedback", "p_hosp_mean" + ) + + for (param in model_params) { + # Compare everything, with numerical tolerance + testthat::expect_equal( + obs_last_draw, + toy_stan_fit_last_draw, + tolerance = 0.0001 + ) + } +}) From b1d279a35a715e32a77d5e6fd4cf845023746793 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Mon, 8 Jul 2024 18:35:00 -0400 Subject: [PATCH 072/103] add documentation for the package data --- R/data.R | 94 +++++++++++++++++++++++++++++++++++++++++++ man/hosp_data.Rd | 73 +++++++++++++++++++++++++++++++++ man/hosp_data_eval.Rd | 59 +++++++++++++++++++++++++++ man/ww_data.Rd | 51 +++++++++++++++++++++++ 4 files changed, 277 insertions(+) create mode 100644 R/data.R create mode 100644 man/hosp_data.Rd create mode 100644 man/hosp_data_eval.Rd create mode 100644 man/ww_data.Rd diff --git a/R/data.R b/R/data.R new file mode 100644 index 00000000..deac836b --- /dev/null +++ b/R/data.R @@ -0,0 +1,94 @@ +#' Example wastewater dataset. +#' +#' A dataset containing the simulated wastewater concentrations +#' (labeled here as `genome_copies_per_ml`) by sample collection date (`date`), +#' the site where the sample was collected (`site`) and the lab where the +#' samples were processed (`lab`). Additional columns that are required +#' attributes needed for the model are the limit of detection for that lab on +#' each day (labeled here as `lod`) and the population size of the wastewater +#' catchment area represented by the wastewater concentrations in each `site`. +#' +#' This data is generated via the default values in the +#' `generate_simulated_data()` function. They represent the bare minumum +#' required fields needed to pass to the model, and we recommend that users +#' try to format their own data to match this format. +#' +#' The variables are as follows: +#' +#' @format ## ww_data +#' A tibble with 102 rows and 6 columns +#' \describe{ +#' \item{date}{Sample collection date, formatted in ISO8601 standards as +#' YYYY-MM-DD} +#' \item{site}{The wastewater treatment plant where the sample was collected} +#' \item{lab}{The lab where the sample was processed} +#' \item{genome_copies_per_ml}{The wastewater concentration measured on the +#' date specified, collected in the site specified, and processed in the lab +#' specified. The default parameters assume that this quantity is reported +#' as the genome copies per mL, on a natural scale.} +#' \item{lod}{The limit of detection in the site and lab on a particular day +#' of the quantification device (e.g. PCR). This is also by default reported +#' in terms of the genome copies per mL.} +#' \item{site_pop}{The population size of the wastewater catchment area +#' represented by the site variable} +#' } +#' @source generate_simulated_data.R +"ww_data" + + + + +#' Example hospital admissions dataset +#' +#' A dataset containing the simulated daily hospital admissions +#' (labeled here as `daily_hosp_admits`) by date of admission (`date`). +#' Additional columns that are required are the population size of the +#' population contributing to the hospital admissions. It is assumed that +#' the wastewater sites are subsets of this larger population, which +#' is in the package data assumed to be from a hypothetical US state. +#' The data generated are daily hospital admissions but they could be any other +#' epidemiological count dataset e.g. cases. This data should only contain +#' hospital admissions that would have been available as of the date that +#' the forecast was made. We recommend that users try to format their data +#' to match this format. +#' +#' This data is generated via the default values in the +#' `generate_simulated_data()` function. They represent the bare minumum +#' required fields needed to pass to the model, and we recommend that users +#' try to format their own data to match this formate. +#' +#' The variables are as follows: +#' \describe{ +#' \item{date}{Date the hospital admissions occurred, formatte din ISO8601 +#' standatds as YYYY-MM-DD} +#' \item{daily_hosp_admits}{The number of individuals admitted to the +#' hospital on that date, available as of the forecast date} +#' \item{state_pop}{The number of people contributing to the daily hospital +#' admissions} +#' } +#' @source generate_simulated_data.R +"hosp_data" + +#' Example hospital admissions dataset for evaluation +#' +#' A dataset containing the simulated daily hospital admissions that the model +#' will be evaluated against (labeled here as `daily_hosp_admits_for_eval`) +#' by date of admission (`date`). This data is not needed to fit the model, +#' but is used in the Getting Started vignette to demonstrate the forecasted +#' hospital admissions compared to those later observed. +#' +#' This data is generated via the default values in the +#' `generate_simulated_data()` function. +#' +#' The variables are as follows: +#' \describe{ +#' \item{date}{Date the hospital admissions occurred, formatte din ISO8601 +#' standatds as YYYY-MM-DD} +#' \item{daily_hosp_admits_for_eval}{The number of individuals admitted to the +#' hospital on that date, available beyond the forecast date for evaluating +#' the forecasted hospital admissions} +#' \item{state_pop}{The number of people contributing to the daily hospital +#' admissions} +#' } +#' @source generate_simulated_data.R +"hosp_data_eval" diff --git a/man/hosp_data.Rd b/man/hosp_data.Rd new file mode 100644 index 00000000..9d8d335c --- /dev/null +++ b/man/hosp_data.Rd @@ -0,0 +1,73 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{hosp_data} +\alias{hosp_data} +\title{Example hospital admissions dataset + +A dataset containing the simulated daily hospital admissions +(labeled here as \code{daily_hosp_admits}) by date of admission (\code{date}). +Additional columns that are required are the population size of the +population contributing to the hospital admissions. It is assumed that +the wastewater sites are subsets of this larger population, which +is in the package data assumed to be from a hypothetical US state. +The data generated are daily hospital admissions but they could be any other +epidemiological count dataset e.g. cases. This data should only contain +hospital admissions that would have been available as of the date that +the forecast was made. We recommend that users try to format their data +to match this format. + +This data is generated via the default values in the +\code{generate_simulated_data()} function. They represent the bare minumum +required fields needed to pass to the model, and we recommend that users +try to format their own data to match this formate. + +The variables are as follows: +\describe{ +\item{date}{Date the hospital admissions occurred, formatte din ISO8601 +standatds as YYYY-MM-DD} +\item{daily_hosp_admits}{The number of individuals admitted to the +hospital on that date, available as of the forecast date} +\item{state_pop}{The number of people contributing to the daily hospital +admissions} +}} +\format{ +An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 90 rows and 3 columns. +} +\source{ +generate_simulated_data.R +} +\usage{ +hosp_data +} +\description{ +Example hospital admissions dataset + +A dataset containing the simulated daily hospital admissions +(labeled here as \code{daily_hosp_admits}) by date of admission (\code{date}). +Additional columns that are required are the population size of the +population contributing to the hospital admissions. It is assumed that +the wastewater sites are subsets of this larger population, which +is in the package data assumed to be from a hypothetical US state. +The data generated are daily hospital admissions but they could be any other +epidemiological count dataset e.g. cases. This data should only contain +hospital admissions that would have been available as of the date that +the forecast was made. We recommend that users try to format their data +to match this format. + +This data is generated via the default values in the +\code{generate_simulated_data()} function. They represent the bare minumum +required fields needed to pass to the model, and we recommend that users +try to format their own data to match this formate. + +The variables are as follows: +\describe{ +\item{date}{Date the hospital admissions occurred, formatte din ISO8601 +standatds as YYYY-MM-DD} +\item{daily_hosp_admits}{The number of individuals admitted to the +hospital on that date, available as of the forecast date} +\item{state_pop}{The number of people contributing to the daily hospital +admissions} +} +} +\keyword{datasets} diff --git a/man/hosp_data_eval.Rd b/man/hosp_data_eval.Rd new file mode 100644 index 00000000..9a9c40cd --- /dev/null +++ b/man/hosp_data_eval.Rd @@ -0,0 +1,59 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{hosp_data_eval} +\alias{hosp_data_eval} +\title{Example hospital admissions dataset for evaluation + +A dataset containing the simulated daily hospital admissions that the model +will be evaluated against (labeled here as \code{daily_hosp_admits_for_eval}) +by date of admission (\code{date}). This data is not needed to fit the model, +but is used in the Getting Started vignette to demonstrate the forecasted +hospital admissions compared to those later observed. + +This data is generated via the default values in the +\code{generate_simulated_data()} function. + +The variables are as follows: +\describe{ +\item{date}{Date the hospital admissions occurred, formatte din ISO8601 +standatds as YYYY-MM-DD} +\item{daily_hosp_admits_for_eval}{The number of individuals admitted to the +hospital on that date, available beyond the forecast date for evaluating +the forecasted hospital admissions} +\item{state_pop}{The number of people contributing to the daily hospital +admissions} +}} +\format{ +An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 127 rows and 3 columns. +} +\source{ +generate_simulated_data.R +} +\usage{ +hosp_data_eval +} +\description{ +Example hospital admissions dataset for evaluation + +A dataset containing the simulated daily hospital admissions that the model +will be evaluated against (labeled here as \code{daily_hosp_admits_for_eval}) +by date of admission (\code{date}). This data is not needed to fit the model, +but is used in the Getting Started vignette to demonstrate the forecasted +hospital admissions compared to those later observed. + +This data is generated via the default values in the +\code{generate_simulated_data()} function. + +The variables are as follows: +\describe{ +\item{date}{Date the hospital admissions occurred, formatte din ISO8601 +standatds as YYYY-MM-DD} +\item{daily_hosp_admits_for_eval}{The number of individuals admitted to the +hospital on that date, available beyond the forecast date for evaluating +the forecasted hospital admissions} +\item{state_pop}{The number of people contributing to the daily hospital +admissions} +} +} +\keyword{datasets} diff --git a/man/ww_data.Rd b/man/ww_data.Rd new file mode 100644 index 00000000..ec825e59 --- /dev/null +++ b/man/ww_data.Rd @@ -0,0 +1,51 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{ww_data} +\alias{ww_data} +\title{Example wastewater dataset.} +\format{ +\subsection{ww_data}{ + +A tibble with 102 rows and 6 columns +\describe{ +\item{date}{Sample collection date, formatted in ISO8601 standards as +YYYY-MM-DD} +\item{site}{The wastewater treatment plant where the sample was collected} +\item{lab}{The lab where the sample was processed} +\item{genome_copies_per_ml}{The wastewater concentration measured on the +date specified, collected in the site specified, and processed in the lab +specified. The default parameters assume that this quantity is reported +as the genome copies per mL, on a natural scale.} +\item{lod}{The limit of detection in the site and lab on a particular day +of the quantification device (e.g. PCR). This is also by default reported +in terms of the genome copies per mL.} +\item{site_pop}{The population size of the wastewater catchment area +represented by the site variable} +} +} +} +\source{ +generate_simulated_data.R +} +\usage{ +ww_data +} +\description{ +A dataset containing the simulated wastewater concentrations +(labeled here as \code{genome_copies_per_ml}) by sample collection date (\code{date}), +the site where the sample was collected (\code{site}) and the lab where the +samples were processed (\code{lab}). Additional columns that are required +attributes needed for the model are the limit of detection for that lab on +each day (labeled here as \code{lod}) and the population size of the wastewater +catchment area represented by the wastewater concentrations in each \code{site}. +} +\details{ +This data is generated via the default values in the +\code{generate_simulated_data()} function. They represent the bare minumum +required fields needed to pass to the model, and we recommend that users +try to format their own data to match this format. + +The variables are as follows: +} +\keyword{datasets} From 12a8b069b005109f045ce664ccec9e4481272e76 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 09:21:23 -0400 Subject: [PATCH 073/103] fix the bugs identified by CI --- NAMESPACE | 11 ++++++ R/data.R | 30 ++++++++++++++-- R/generate_simulated_data.R | 5 ++- R/get_stan_data.R | 2 +- R/utils.R | 1 - R/wwinference-package.R | 2 ++ man/generate_simulated_data.Rd | 5 ++- man/generation_interval.Rd | 25 +++++++++++++ man/get_count_data_sizes.Rd | 4 +-- man/hosp_data.Rd | 36 +++---------------- man/hosp_data_eval.Rd | 29 +++------------ man/inf_to_hosp.Rd | 25 +++++++++++++ man/to_simplex.Rd | 20 +++++++++++ man/ww_data.Rd | 2 +- tests/testthat/test_diff_ar1.R | 65 +++++++++++++++++----------------- 15 files changed, 162 insertions(+), 100 deletions(-) create mode 100644 man/generation_interval.Rd create mode 100644 man/inf_to_hosp.Rd create mode 100644 man/to_simplex.Rd diff --git a/NAMESPACE b/NAMESPACE index 877fcbd2..de76e0bc 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -73,6 +73,17 @@ importFrom(lubridate,ymd) importFrom(posterior,as_draws_list) importFrom(posterior,subset_draws) importFrom(rlang,sym) +importFrom(stats,dnbinom) +importFrom(stats,dweibull) +importFrom(stats,ecdf) +importFrom(stats,plogis) +importFrom(stats,qlogis) +importFrom(stats,rlnorm) +importFrom(stats,rnbinom) +importFrom(stats,rnorm) +importFrom(stats,rt) +importFrom(stats,sd) +importFrom(stats,time) importFrom(tidybayes,spread_draws) importFrom(tidybayes,stat_halfeye) importFrom(tidybayes,stat_slab) diff --git a/R/data.R b/R/data.R index deac836b..fac93c1e 100644 --- a/R/data.R +++ b/R/data.R @@ -32,7 +32,7 @@ #' \item{site_pop}{The population size of the wastewater catchment area #' represented by the site variable} #' } -#' @source generate_simulated_data.R +#' @source vignette_data.R "ww_data" @@ -66,7 +66,7 @@ #' \item{state_pop}{The number of people contributing to the daily hospital #' admissions} #' } -#' @source generate_simulated_data.R +#' @source vignette_data.R "hosp_data" #' Example hospital admissions dataset for evaluation @@ -90,5 +90,29 @@ #' \item{state_pop}{The number of people contributing to the daily hospital #' admissions} #' } -#' @source generate_simulated_data.R +#' @source vignette_data.R "hosp_data_eval" + +#' COVID-19 post-Omicron generation interval probability mass function +#' +#' \describe{ +#' A vector that sums to 1, with each element representing the daily +#' probability of secondary onward transmission occurring on that day. The +#' first element of this vector represents the day after primary transmission +#' occurred, it is assumed to be impossible for primary and secondary +#' transmission to occur on the same day. +#' } +#' @source covid_pmfs.R +"generation_interval" + +#' COVID-19 time delay distribution from infection to hospital admission +#' +#' \describe{ +#' A vector that sums to 1, with each element representing the daily +#' probabilty of transitioning from infected to hospitalized, conditioned on +#' being infected and eventually ending up hospitalized. The first element +#' represents the probability of being infected and admitted to the hospital +#' on the same day +#' } +#' @source covid_pmfs.R +"inf_to_hosp" diff --git a/R/generate_simulated_data.R b/R/generate_simulated_data.R index 15e84175..43c3fd45 100644 --- a/R/generate_simulated_data.R +++ b/R/generate_simulated_data.R @@ -66,7 +66,10 @@ #' # different labs #' sim_data <- generate_simulated_data( #' n_sites = 6, -#' site = c(rep(1, 4), rep(2, 2)) +#' site = c(1, 2, 3, 4, 5, 6, 6), +#' lab = c(1, 1, 1, 1, 2, 2, 3), +#' ww_pop_sites = c(1e5, 4e5, 2e5, 1.5e5, 5e4, 3e5), +#' pop_size = 2e6 #' ) #' hosp_data <- sim_data$hosp_data #' ww_data <- sim_data$ww_data diff --git a/R/get_stan_data.R b/R/get_stan_data.R index 5317b60d..1d34f67a 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -572,7 +572,7 @@ get_subpop_data <- function(add_auxiliary_site, #' Get count data integer sizes for stan #' -#' @param input_cout_data a dataframe with the input count data +#' @param input_count_data a dataframe with the input count data #' @param forecast_date string indicating the forecast date #' @param forecast_horizon integer indicating the number of days to make a #' forecast for diff --git a/R/utils.R b/R/utils.R index c1a329d4..98b2524e 100644 --- a/R/utils.R +++ b/R/utils.R @@ -132,7 +132,6 @@ convert_to_logsd <- function(mean, sd) { #' @export #' @examples #' to_simplex(c(1, 1, 1)) -#' @noRd to_simplex <- function(vector) { return(vector / sum(vector)) } diff --git a/R/wwinference-package.R b/R/wwinference-package.R index e6be48c4..d999b342 100644 --- a/R/wwinference-package.R +++ b/R/wwinference-package.R @@ -13,4 +13,6 @@ #' @importFrom posterior subset_draws as_draws_list #' @importFrom fs path_package #' @importFrom rlang sym +#' @importFrom stats dnbinom dweibull ecdf plogis qlogis rlnorm rnbinom rnorm +#' rt sd time NULL diff --git a/man/generate_simulated_data.Rd b/man/generate_simulated_data.Rd index b5638514..2cd6158b 100644 --- a/man/generate_simulated_data.Rd +++ b/man/generate_simulated_data.Rd @@ -119,7 +119,10 @@ and parameters to generate from. # different labs sim_data <- generate_simulated_data( n_sites = 6, - site = c(rep(1, 4), rep(2, 2)) + site = c(1, 2, 3, 4, 5, 6, 6), + lab = c(1, 1, 1, 1, 2, 2, 3), + ww_pop_sites = c(1e5, 4e5, 2e5, 1.5e5, 5e4, 3e5), + pop_size = 2e6 ) hosp_data <- sim_data$hosp_data ww_data <- sim_data$ww_data diff --git a/man/generation_interval.Rd b/man/generation_interval.Rd new file mode 100644 index 00000000..51654b66 --- /dev/null +++ b/man/generation_interval.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{generation_interval} +\alias{generation_interval} +\title{COVID-19 post-Omicron generation interval probability mass function} +\format{ +An object of class \code{numeric} of length 15. +} +\source{ +covid_pmfs.R +} +\usage{ +generation_interval +} +\description{ +\describe{ +A vector that sums to 1, with each element representing the daily +probability of secondary onward transmission occurring on that day. The +first element of this vector represents the day after primary transmission +occurred, it is assumed to be impossible for primary and secondary +transmission to occur on the same day. +} +} +\keyword{datasets} diff --git a/man/get_count_data_sizes.Rd b/man/get_count_data_sizes.Rd index 82935002..46d808e9 100644 --- a/man/get_count_data_sizes.Rd +++ b/man/get_count_data_sizes.Rd @@ -15,6 +15,8 @@ get_count_data_sizes( ) } \arguments{ +\item{input_count_data}{a dataframe with the input count data} + \item{forecast_date}{string indicating the forecast date} \item{forecast_horizon}{integer indicating the number of days to make a @@ -32,8 +34,6 @@ no observations} \item{count_col_name}{A string represeting the name of the column in the input_count_data that indicates the number of daily counts, default is \code{count}} - -\item{input_cout_data}{a dataframe with the input count data} } \value{ A list containing the integer sizes of the follow variables that diff --git a/man/hosp_data.Rd b/man/hosp_data.Rd index 9d8d335c..b75c70b3 100644 --- a/man/hosp_data.Rd +++ b/man/hosp_data.Rd @@ -3,46 +3,17 @@ \docType{data} \name{hosp_data} \alias{hosp_data} -\title{Example hospital admissions dataset - -A dataset containing the simulated daily hospital admissions -(labeled here as \code{daily_hosp_admits}) by date of admission (\code{date}). -Additional columns that are required are the population size of the -population contributing to the hospital admissions. It is assumed that -the wastewater sites are subsets of this larger population, which -is in the package data assumed to be from a hypothetical US state. -The data generated are daily hospital admissions but they could be any other -epidemiological count dataset e.g. cases. This data should only contain -hospital admissions that would have been available as of the date that -the forecast was made. We recommend that users try to format their data -to match this format. - -This data is generated via the default values in the -\code{generate_simulated_data()} function. They represent the bare minumum -required fields needed to pass to the model, and we recommend that users -try to format their own data to match this formate. - -The variables are as follows: -\describe{ -\item{date}{Date the hospital admissions occurred, formatte din ISO8601 -standatds as YYYY-MM-DD} -\item{daily_hosp_admits}{The number of individuals admitted to the -hospital on that date, available as of the forecast date} -\item{state_pop}{The number of people contributing to the daily hospital -admissions} -}} +\title{Example hospital admissions dataset} \format{ An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 90 rows and 3 columns. } \source{ -generate_simulated_data.R +vignette_data.R } \usage{ hosp_data } \description{ -Example hospital admissions dataset - A dataset containing the simulated daily hospital admissions (labeled here as \code{daily_hosp_admits}) by date of admission (\code{date}). Additional columns that are required are the population size of the @@ -54,7 +25,8 @@ epidemiological count dataset e.g. cases. This data should only contain hospital admissions that would have been available as of the date that the forecast was made. We recommend that users try to format their data to match this format. - +} +\details{ This data is generated via the default values in the \code{generate_simulated_data()} function. They represent the bare minumum required fields needed to pass to the model, and we recommend that users diff --git a/man/hosp_data_eval.Rd b/man/hosp_data_eval.Rd index 9a9c40cd..3713a275 100644 --- a/man/hosp_data_eval.Rd +++ b/man/hosp_data_eval.Rd @@ -3,45 +3,24 @@ \docType{data} \name{hosp_data_eval} \alias{hosp_data_eval} -\title{Example hospital admissions dataset for evaluation - -A dataset containing the simulated daily hospital admissions that the model -will be evaluated against (labeled here as \code{daily_hosp_admits_for_eval}) -by date of admission (\code{date}). This data is not needed to fit the model, -but is used in the Getting Started vignette to demonstrate the forecasted -hospital admissions compared to those later observed. - -This data is generated via the default values in the -\code{generate_simulated_data()} function. - -The variables are as follows: -\describe{ -\item{date}{Date the hospital admissions occurred, formatte din ISO8601 -standatds as YYYY-MM-DD} -\item{daily_hosp_admits_for_eval}{The number of individuals admitted to the -hospital on that date, available beyond the forecast date for evaluating -the forecasted hospital admissions} -\item{state_pop}{The number of people contributing to the daily hospital -admissions} -}} +\title{Example hospital admissions dataset for evaluation} \format{ An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 127 rows and 3 columns. } \source{ -generate_simulated_data.R +vignette_data.R } \usage{ hosp_data_eval } \description{ -Example hospital admissions dataset for evaluation - A dataset containing the simulated daily hospital admissions that the model will be evaluated against (labeled here as \code{daily_hosp_admits_for_eval}) by date of admission (\code{date}). This data is not needed to fit the model, but is used in the Getting Started vignette to demonstrate the forecasted hospital admissions compared to those later observed. - +} +\details{ This data is generated via the default values in the \code{generate_simulated_data()} function. diff --git a/man/inf_to_hosp.Rd b/man/inf_to_hosp.Rd new file mode 100644 index 00000000..ffa01998 --- /dev/null +++ b/man/inf_to_hosp.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data.R +\docType{data} +\name{inf_to_hosp} +\alias{inf_to_hosp} +\title{COVID-19 time delay distribution from infection to hospital admission} +\format{ +An object of class \code{numeric} of length 55. +} +\source{ +covid_pmfs.R +} +\usage{ +inf_to_hosp +} +\description{ +\describe{ +A vector that sums to 1, with each element representing the daily +probabilty of transitioning from infected to hospitalized, conditioned on +being infected and eventually ending up hospitalized. The first element +represents the probability of being infected and admitted to the hospital +on the same day +} +} +\keyword{datasets} diff --git a/man/to_simplex.Rd b/man/to_simplex.Rd new file mode 100644 index 00000000..1a7b4d6d --- /dev/null +++ b/man/to_simplex.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{to_simplex} +\alias{to_simplex} +\title{Normalize vector to a simplex} +\usage{ +to_simplex(vector) +} +\arguments{ +\item{vector}{numeric vector} +} +\value{ +vector whos sum adds to 1 +} +\description{ +Normalize vector to a simplex +} +\examples{ +to_simplex(c(1, 1, 1)) +} diff --git a/man/ww_data.Rd b/man/ww_data.Rd index ec825e59..ee2bebde 100644 --- a/man/ww_data.Rd +++ b/man/ww_data.Rd @@ -26,7 +26,7 @@ represented by the site variable} } } \source{ -generate_simulated_data.R +vignette_data.R } \usage{ ww_data diff --git a/tests/testthat/test_diff_ar1.R b/tests/testthat/test_diff_ar1.R index 34792b5a..e69c3e3a 100644 --- a/tests/testthat/test_diff_ar1.R +++ b/tests/testthat/test_diff_ar1.R @@ -1,6 +1,5 @@ test_that( "Test differenced AR(1) Stan function agrees with R", - "implementation.", { model <- compiled_site_inf_model @@ -9,40 +8,40 @@ test_that( ar <- runif(1) sd <- exp(rnorm(1, 0, 0.5)) x0 <- rnorm(1, 0, 5) - }) - for (stat in c(TRUE, FALSE)) { - stan_ar_diff <- model$functions$diff_ar1( - x0 = x0, - ar = ar, - sd = sd, - z = z, - is_stat = stat - ) + for (stat in c(TRUE, FALSE)) { + stan_ar_diff <- model$functions$diff_ar1( + x0 = x0, + ar = ar, + sd = sd, + z = z, + is_stat = stat + ) - r_ar_diff <- diff_ar1_from_z_scores( - x0 = x0, - ar = ar, - sd = sd, - z = z, - stationary = stat - ) - r_ar_diff_alt <- diff_ar1_from_z_scores_alt( - x0 = x0, - ar = ar, - sd = sd, - z = z, - stationary = stat - ) + r_ar_diff <- diff_ar1_from_z_scores( + x0 = x0, + ar = ar, + sd = sd, + z = z, + stationary = stat + ) + r_ar_diff_alt <- diff_ar1_from_z_scores_alt( + x0 = x0, + ar = ar, + sd = sd, + z = z, + stationary = stat + ) - expect_equal( - stan_ar_diff, - r_ar_diff - ) - expect_equal( - stan_ar_diff, - r_ar_diff_alt - ) - } + expect_equal( + stan_ar_diff, + r_ar_diff + ) + expect_equal( + stan_ar_diff, + r_ar_diff_alt + ) + } + }) } ) From a3f9c56bfa62909be70739735ece12b8186ae00e Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 09:47:30 -0400 Subject: [PATCH 074/103] fix example --- R/wwinference.R | 2 +- man/get_model_spec.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/R/wwinference.R b/R/wwinference.R index 3528684f..46e764a1 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -286,7 +286,7 @@ get_mcmc_options <- function( #' @export #' #' @examples -#' model_spec_list <- model_spec(forecast_date = "2023-12-06") +#' model_spec_list <- get_model_spec(forecast_date = "2023-12-06") get_model_spec <- function( forecast_date, calibration_time = 90, diff --git a/man/get_model_spec.Rd b/man/get_model_spec.Rd index aa53f4e3..1245f33f 100644 --- a/man/get_model_spec.Rd +++ b/man/get_model_spec.Rd @@ -52,5 +52,5 @@ post-omicron COVID-19 model with joint inference of hospital admissions and data on wastewater viral concentrations } \examples{ -model_spec_list <- model_spec(forecast_date = "2023-12-06") +model_spec_list <- get_model_spec(forecast_date = "2023-12-06") } From 66a8b054f8e8bdc140caf4756e55dbe6125178bc Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 10:26:26 -0400 Subject: [PATCH 075/103] fix get params example --- R/get_params.R | 5 ++++- man/get_params.Rd | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/R/get_params.R b/R/get_params.R index 39a6b5c1..4654ed05 100644 --- a/R/get_params.R +++ b/R/get_params.R @@ -9,7 +9,10 @@ #' @export #' #' @examples -#' params <- get_params("input/params.toml") +#' params <- get_params(param_file = fs::path_package("extdata", +#' "example_params.toml", +#' package = "wwinference" +#' )) get_params <- function(param_file) { paramlist <- RcppTOML::parseTOML(param_file) validate_paramlist(paramlist) diff --git a/man/get_params.Rd b/man/get_params.Rd index a8c21c7e..3aebeaf0 100644 --- a/man/get_params.Rd +++ b/man/get_params.Rd @@ -18,5 +18,8 @@ model Get parameters for model run } \examples{ -params <- get_params("input/params.toml") +params <- get_params(param_file = fs::path_package("extdata", + "example_params.toml", + package = "wwinference" + )) } From 8aa7f65427a81d5f35b33a143a0c642b07eaa83e Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 11:12:31 -0400 Subject: [PATCH 076/103] fix examples in indicate ww exclusions --- .Rbuildignore | 2 ++ R/preprocessing.R | 2 +- man/get_params.Rd | 6 +++--- man/indicate_ww_exclusions.Rd | 2 +- 4 files changed, 7 insertions(+), 5 deletions(-) diff --git a/.Rbuildignore b/.Rbuildignore index 531e1155..dbbb7267 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -2,3 +2,5 @@ ^docs$ ^pkgdown$ ^\.github$ +^.*\.Rproj$ +^\.Rproj\.user$ diff --git a/R/preprocessing.R b/R/preprocessing.R index 93f186cb..e855352c 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -247,7 +247,7 @@ flag_ww_outliers <- function(ww_data, #' genome_copies_per_mL = c(300, 3e6), #' flag_as_ww_outlier = c(0, 1) #' ) -#' data_w_exclusions <- indicate_exclusions(data, +#' data_w_exclusions <- indicate_ww_exclusions(data, #' outlier_col_name = "flag_as_ww_outlier", #' remove_outliers = TRUE #' ) diff --git a/man/get_params.Rd b/man/get_params.Rd index 3aebeaf0..90c32a01 100644 --- a/man/get_params.Rd +++ b/man/get_params.Rd @@ -19,7 +19,7 @@ Get parameters for model run } \examples{ params <- get_params(param_file = fs::path_package("extdata", - "example_params.toml", - package = "wwinference" - )) + "example_params.toml", + package = "wwinference" +)) } diff --git a/man/indicate_ww_exclusions.Rd b/man/indicate_ww_exclusions.Rd index 82dc3e18..5054a7d9 100644 --- a/man/indicate_ww_exclusions.Rd +++ b/man/indicate_ww_exclusions.Rd @@ -35,7 +35,7 @@ data <- tibble::tibble( genome_copies_per_mL = c(300, 3e6), flag_as_ww_outlier = c(0, 1) ) -data_w_exclusions <- indicate_exclusions(data, +data_w_exclusions <- indicate_ww_exclusions(data, outlier_col_name = "flag_as_ww_outlier", remove_outliers = TRUE ) From 40146b9f651655f520326770107269231e222928 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 11:39:41 -0400 Subject: [PATCH 077/103] fix example again, need exclude column going in --- R/preprocessing.R | 3 ++- man/indicate_ww_exclusions.Rd | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index e855352c..68d8284f 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -245,7 +245,8 @@ flag_ww_outliers <- function(ww_data, #' data <- tibble::tibble( #' date = c("2023-10-01", "2023-10-02"), #' genome_copies_per_mL = c(300, 3e6), -#' flag_as_ww_outlier = c(0, 1) +#' flag_as_ww_outlier = c(0, 1), +#' exclude = c(0, 0) #' ) #' data_w_exclusions <- indicate_ww_exclusions(data, #' outlier_col_name = "flag_as_ww_outlier", diff --git a/man/indicate_ww_exclusions.Rd b/man/indicate_ww_exclusions.Rd index 5054a7d9..608d8c59 100644 --- a/man/indicate_ww_exclusions.Rd +++ b/man/indicate_ww_exclusions.Rd @@ -33,7 +33,8 @@ column name specified by the \code{outlier_col_name}. data <- tibble::tibble( date = c("2023-10-01", "2023-10-02"), genome_copies_per_mL = c(300, 3e6), - flag_as_ww_outlier = c(0, 1) + flag_as_ww_outlier = c(0, 1), + exclude = c(0,0) ) data_w_exclusions <- indicate_ww_exclusions(data, outlier_col_name = "flag_as_ww_outlier", From 0c0a7f4861e8413eb1b8676339669b3c32963398 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 12:20:03 -0400 Subject: [PATCH 078/103] fix hosp preprocessing examples --- R/preprocessing.R | 2 +- man/indicate_ww_exclusions.Rd | 2 +- man/preprocess_hosp_data.Rd | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index 68d8284f..359bf359 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -87,7 +87,7 @@ preprocess_ww_data <- function(ww_data, #' daily_admits = c(10, 20), #' state_pop = c(1e6, 1e6) #' ) -#' hosp_data_preprocessed <- preprocess_hospdata( +#' hosp_data_preprocessed <- preprocess_hosp_data( #' hosp_data, #' "daily_admits", #' "state_pop" diff --git a/man/indicate_ww_exclusions.Rd b/man/indicate_ww_exclusions.Rd index 608d8c59..fa51e0e2 100644 --- a/man/indicate_ww_exclusions.Rd +++ b/man/indicate_ww_exclusions.Rd @@ -34,7 +34,7 @@ data <- tibble::tibble( date = c("2023-10-01", "2023-10-02"), genome_copies_per_mL = c(300, 3e6), flag_as_ww_outlier = c(0, 1), - exclude = c(0,0) + exclude = c(0, 0) ) data_w_exclusions <- indicate_ww_exclusions(data, outlier_col_name = "flag_as_ww_outlier", diff --git a/man/preprocess_hosp_data.Rd b/man/preprocess_hosp_data.Rd index ba37dce7..516523e8 100644 --- a/man/preprocess_hosp_data.Rd +++ b/man/preprocess_hosp_data.Rd @@ -33,7 +33,7 @@ hosp_data <- tibble::tibble( daily_admits = c(10, 20), state_pop = c(1e6, 1e6) ) -hosp_data_preprocessed <- preprocess_hospdata( +hosp_data_preprocessed <- preprocess_hosp_data( hosp_data, "daily_admits", "state_pop" From c9091af3215813a18d149db22ab7dc76ff8497e2 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 13:13:03 -0400 Subject: [PATCH 079/103] fix preprocess ww data example --- R/preprocessing.R | 2 +- man/preprocess_ww_data.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index 359bf359..f665c4c6 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -18,7 +18,7 @@ #' @examples #' ww_data <- tibble::tibble( #' date = rep(c("2023-11-01", "2023-11-02"), 2), -#' site = c(rep(1), rep(2)), +#' site = c(rep(1, 2), rep(2, 2)), #' lab = c(1, 1, 1, 1), #' conc = c(345.2, 784.1, 401.5, 681.8), #' lod = c(20, 20, 15, 15) diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd index dc443c07..b7988280 100644 --- a/man/preprocess_ww_data.Rd +++ b/man/preprocess_ww_data.Rd @@ -36,7 +36,7 @@ Get input wastewater data \examples{ ww_data <- tibble::tibble( date = rep(c("2023-11-01", "2023-11-02"), 2), - site = c(rep(1), rep(2)), + site = c(rep(1,2), rep(2,2)), lab = c(1, 1, 1, 1), conc = c(345.2, 784.1, 401.5, 681.8), lod = c(20, 20, 15, 15) From 38e7352e515cc6a23f0112c1804f6b01e44d87f9 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Tue, 9 Jul 2024 13:47:31 -0400 Subject: [PATCH 080/103] fix redundancy bug in naming conc col name --- R/preprocessing.R | 8 ++------ man/preprocess_ww_data.Rd | 2 +- 2 files changed, 3 insertions(+), 7 deletions(-) diff --git a/R/preprocessing.R b/R/preprocessing.R index f665c4c6..348d7f73 100644 --- a/R/preprocessing.R +++ b/R/preprocessing.R @@ -59,12 +59,8 @@ preprocess_ww_data <- function(ww_data, # Get an extra column that identifies the wastewater outliers using the # default parameters ww_preprocessed <- flag_ww_outliers(ww_data_add_cols, - conc_col_name = {{ conc_col_name }} - ) |> - dplyr::rename( - genome_copies_per_ml = {{ conc_col_name }} - ) - + conc_col_name = "genome_copies_per_ml" + ) return(ww_preprocessed) } diff --git a/man/preprocess_ww_data.Rd b/man/preprocess_ww_data.Rd index b7988280..a12d1787 100644 --- a/man/preprocess_ww_data.Rd +++ b/man/preprocess_ww_data.Rd @@ -36,7 +36,7 @@ Get input wastewater data \examples{ ww_data <- tibble::tibble( date = rep(c("2023-11-01", "2023-11-02"), 2), - site = c(rep(1,2), rep(2,2)), + site = c(rep(1, 2), rep(2, 2)), lab = c(1, 1, 1, 1), conc = c(345.2, 784.1, 401.5, 681.8), lod = c(20, 20, 15, 15) From 478e0a40accf54c17a60fd717a071c5fc88be202 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 10 Jul 2024 16:16:56 -0400 Subject: [PATCH 081/103] add posterior to imports --- DESCRIPTION | 1 + 1 file changed, 1 insertion(+) diff --git a/DESCRIPTION b/DESCRIPTION index 24435818..a345e1e8 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -76,6 +76,7 @@ Imports: tidyr, purrr, withr, + posterior, cmdstanr (>= 0.8.0), rlang, scales From 0f33512ef8c494afe3358e7160bd61e1e5018397 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 10 Jul 2024 16:22:37 -0400 Subject: [PATCH 082/103] modify DESCRIPTION --- DESCRIPTION | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index a345e1e8..a0ec2f31 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -72,13 +72,18 @@ Imports: RcppTOML, cli, tibble, + usethis, tidybayes, tidyr, purrr, withr, - posterior, cmdstanr (>= 0.8.0), rlang, - scales + scales, + ggplot2, + rcmdcheck, + posterior Remotes: stan-dev/cmdstanr +VignetteBuilder: + knitr From b9a6994fcf6cedb2ccc500e7be529ddd02b5b183 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 10 Jul 2024 16:26:44 -0400 Subject: [PATCH 083/103] remove documentation for delay distrib fxns --- NAMESPACE | 5 --- man/drop_first_and_renormalize.Rd | 26 -------------- man/make_hospital_onset_delay_pmf.Rd | 32 ----------------- man/make_incubation_period_pmf.Rd | 35 ------------------ man/make_reporting_delay_pmf.Rd | 28 --------------- man/simulate_double_censored_pmf.Rd | 53 ---------------------------- 6 files changed, 179 deletions(-) delete mode 100644 man/drop_first_and_renormalize.Rd delete mode 100644 man/make_hospital_onset_delay_pmf.Rd delete mode 100644 man/make_incubation_period_pmf.Rd delete mode 100644 man/make_reporting_delay_pmf.Rd delete mode 100644 man/simulate_double_censored_pmf.Rd diff --git a/NAMESPACE b/NAMESPACE index 43da22ea..5a49ab15 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -6,7 +6,6 @@ export(compile_model) export(convert_to_logmean) export(convert_to_logsd) export(create_dir) -export(drop_first_and_renormalize) export(flag_ww_outliers) export(generate_simulated_data) export(get_count_data_sizes) @@ -28,13 +27,9 @@ export(get_ww_data_indices) export(get_ww_data_sizes) export(get_ww_values) export(indicate_ww_exclusions) -export(make_hospital_onset_delay_pmf) -export(make_incubation_period_pmf) -export(make_reporting_delay_pmf) export(postprocess) export(preprocess_hosp_data) export(preprocess_ww_data) -export(simulate_double_censored_pmf) export(to_simplex) export(validate_paramlist) export(wwinference) diff --git a/man/drop_first_and_renormalize.Rd b/man/drop_first_and_renormalize.Rd deleted file mode 100644 index 6fc16218..00000000 --- a/man/drop_first_and_renormalize.Rd +++ /dev/null @@ -1,26 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/delay_distribs.R -\name{drop_first_and_renormalize} -\alias{drop_first_and_renormalize} -\title{Drop the first element of a simplex} -\usage{ -drop_first_and_renormalize(x) -} -\arguments{ -\item{x}{A numeric vector, sums to 1. Corresponds to a discretized PDF or PMF -(usually the GI distribution).} -} -\value{ -A numeric vector, sums to 1. -} -\description{ -When this vector corresponds to the generation interval distribution, we -want to drop this first bin. The renewal equation assumes that same-day -infection and onward transmission does not occur, and we assume -everything is 1 indexed not 0 indeced. We need to -manually drop the first element from the PMF vector. -} -\examples{ -pmf_orig <- c(0.1, 0.1, 0.1, 0.7) -pmf_trunc <- drop_first_and_renormalize(pmf_orig) -} diff --git a/man/make_hospital_onset_delay_pmf.Rd b/man/make_hospital_onset_delay_pmf.Rd deleted file mode 100644 index 702cce5d..00000000 --- a/man/make_hospital_onset_delay_pmf.Rd +++ /dev/null @@ -1,32 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/delay_distribs.R -\name{make_hospital_onset_delay_pmf} -\alias{make_hospital_onset_delay_pmf} -\title{Make hospital onset delay pmf} -\usage{ -make_hospital_onset_delay_pmf( - neg_binom_mu = 6.98665, - neg_binom_size = 2.490848 -) -} -\arguments{ -\item{neg_binom_mu}{float indicating the mean of the negative binomial shaped -delay from symptom onset to hospital admissions, default is \code{6.98665} from -fit to data in above paper} - -\item{neg_binom_size}{float indicating the dispersion parameter in the -negative binomial delay from symptom onset to hospital admissions, default -is \code{2.490848} from fit to data in above paper} -} -\value{ -pmf of distribution from symptom onset to hospital admission -} -\description{ -Uses the parameter estimates from cfa-parameter-estimates, -which is based on Danache et al linelist data from symptom onset to hospital -admission. See below: -https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 -} -\examples{ -delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) -} diff --git a/man/make_incubation_period_pmf.Rd b/man/make_incubation_period_pmf.Rd deleted file mode 100644 index 847de80d..00000000 --- a/man/make_incubation_period_pmf.Rd +++ /dev/null @@ -1,35 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/delay_distribs.R -\name{make_incubation_period_pmf} -\alias{make_incubation_period_pmf} -\title{Make incubation period pmf} -\usage{ -make_incubation_period_pmf( - backward_scale = 3.6, - backward_shape = 1.5, - r = 0.15 -) -} -\arguments{ -\item{backward_scale}{numeric indicating the scale parameter for the Weibull -used in producing the incubation period distribution. default is \code{3.60} for -COVID} - -\item{backward_shape}{numeric indicating the shape parameter for the Weibull -used in producing the incubation period distribution, default is \code{1.50} for -COVID} - -\item{r}{numeric indicating the exponential rate used in producing the -correction on the incubaion period distribution, default is \code{0.15} for COVID} -} -\value{ -pmf of incubation period -} -\description{ -This makes a pmf corresponding to -the incubation period for COVID after Omicron used in Park et al 2023 -These estimates are from early Omicron. -} -\examples{ -inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) -} diff --git a/man/make_reporting_delay_pmf.Rd b/man/make_reporting_delay_pmf.Rd deleted file mode 100644 index 9d7605be..00000000 --- a/man/make_reporting_delay_pmf.Rd +++ /dev/null @@ -1,28 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/delay_distribs.R -\name{make_reporting_delay_pmf} -\alias{make_reporting_delay_pmf} -\title{Make reporting delay pmf} -\usage{ -make_reporting_delay_pmf(incubation_period_pmf, hospital_onset_delay_pmf) -} -\arguments{ -\item{incubation_period_pmf}{a numeric vector, sums to 1, indicating -the probability of time from infection to symptom onset} - -\item{hospital_onset_delay_pmf}{a numeric vector, sums to 1, indicating the -proabbility of time from symptom onset to hospital admissions} -} -\value{ -convolution of incubation period and sympton onset to hospital -admission pmf -} -\description{ -Convolve the incubation period pmf with the symptom to hospital admission pmf -and normalize -} -\examples{ -inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) -hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) -inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) -} diff --git a/man/simulate_double_censored_pmf.Rd b/man/simulate_double_censored_pmf.Rd deleted file mode 100644 index 5127a495..00000000 --- a/man/simulate_double_censored_pmf.Rd +++ /dev/null @@ -1,53 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/delay_distribs.R -\name{simulate_double_censored_pmf} -\alias{simulate_double_censored_pmf} -\title{Simulate daily double censored PMF. From {epinowcast}: -https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html #nolint} -\usage{ -simulate_double_censored_pmf( - max, - fun_primary = stats::runif, - primary_args = list(), - fun_dist = stats::rlnorm, - dist_args = list(...), - n = 1e+06, - ... -) -} -\arguments{ -\item{max}{Maximum value for the computed CDF. If not specified, the maximum -value is the maximum simulated delay.} - -\item{fun_primary}{Primary distribution function (default is \code{runif}).} - -\item{primary_args}{List of additional arguments to be passed to the primary -distribution function.} - -\item{fun_dist}{Distribution function to be added to the primary (default is -\code{rlnorm}).} - -\item{dist_args}{List of additional arguments to be passed to the -distribution function.} - -\item{n}{Number of simulations (default is 1e6).} - -\item{...}{Additional arguments to be passed to the distribution function. -This is an alternative to \code{dist_args}.} -} -\value{ -A numeric vector representing the PMF. -} -\description{ -This function simulates the probability mass function of a daily -double-censored process. The process involves two distributions: a primary -distribution which represents the censoring process for the primary event -and another distribution (which is offset by the primary). -} -\details{ -Based off of: -https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1 -} -\examples{ -simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) -} From 015c6b2b5ac332d7ecf13c721be32e6d8f6c23bb Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 10 Jul 2024 16:44:12 -0400 Subject: [PATCH 084/103] try adding knitr to imports --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index a0ec2f31..10cfcea4 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -82,7 +82,8 @@ Imports: scales, ggplot2, rcmdcheck, - posterior + posterior, + knitr Remotes: stan-dev/cmdstanr VignetteBuilder: From 098758194b96a0d1ed50427729e5712b47ee282f Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Wed, 10 Jul 2024 17:16:44 -0400 Subject: [PATCH 085/103] Update pkgdown.yaml to generate on push to main --- .github/workflows/pkgdown.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml index 5add54ff..eda5fdb0 100644 --- a/.github/workflows/pkgdown.yaml +++ b/.github/workflows/pkgdown.yaml @@ -2,7 +2,7 @@ # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help on: push: - branches: [prod] + branches: [main, prod] pull_request: branches: [main, prod] release: From 2546249492d423f7c2ae768c2c6d4a7703f175c6 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 10 Jul 2024 17:18:35 -0400 Subject: [PATCH 086/103] add ggtitle to imports --- NAMESPACE | 1 + R/wwinference-package.R | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/NAMESPACE b/NAMESPACE index 5a49ab15..c244a229 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -59,6 +59,7 @@ importFrom(ggplot2,geom_point) importFrom(ggplot2,geom_ribbon) importFrom(ggplot2,geom_vline) importFrom(ggplot2,ggplot) +importFrom(ggplot2,ggtitle) importFrom(ggplot2,labs) importFrom(ggplot2,scale_colour_discrete) importFrom(ggplot2,scale_fill_discrete) diff --git a/R/wwinference-package.R b/R/wwinference-package.R index d999b342..7283a145 100644 --- a/R/wwinference-package.R +++ b/R/wwinference-package.R @@ -8,7 +8,7 @@ #' @importFrom tidyr pivot_wider pivot_longer #' @importFrom ggplot2 ggplot facet_wrap geom_line geom_hline geom_point #' geom_bar theme scale_y_continuous scale_colour_discrete scale_fill_discrete -#' geom_ribbon scale_x_date facet_grid geom_vline labs aes +#' geom_ribbon scale_x_date facet_grid geom_vline labs aes ggtitle #' @importFrom cmdstanr cmdstan_model #' @importFrom posterior subset_draws as_draws_list #' @importFrom fs path_package From 954aef0408dde16795cb50f14e6a697a95e87925 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 10 Jul 2024 19:34:41 -0400 Subject: [PATCH 087/103] move covid delay distribs into R/ but dont export --- R/delay_distribs.R | 183 +++++++++++++++++++++++++ data-raw/covid_pmfs.R | 191 --------------------------- man/drop_first_and_renormalize.Rd | 26 ++++ man/make_hospital_onset_delay_pmf.Rd | 32 +++++ man/make_incubation_period_pmf.Rd | 35 +++++ man/make_reporting_delay_pmf.Rd | 28 ++++ man/simulate_double_censored_pmf.Rd | 53 ++++++++ 7 files changed, 357 insertions(+), 191 deletions(-) create mode 100644 R/delay_distribs.R create mode 100644 man/drop_first_and_renormalize.Rd create mode 100644 man/make_hospital_onset_delay_pmf.Rd create mode 100644 man/make_incubation_period_pmf.Rd create mode 100644 man/make_reporting_delay_pmf.Rd create mode 100644 man/simulate_double_censored_pmf.Rd diff --git a/R/delay_distribs.R b/R/delay_distribs.R new file mode 100644 index 00000000..23eb8c21 --- /dev/null +++ b/R/delay_distribs.R @@ -0,0 +1,183 @@ +#' Simulate daily double censored PMF. From {epinowcast}: +#' https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html #nolint +#' +#' This function simulates the probability mass function of a daily +#' double-censored process. The process involves two distributions: a primary +#' distribution which represents the censoring process for the primary event +#' and another distribution (which is offset by the primary). +#' +#' Based off of: +#' https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1 +#' +#' @param max Maximum value for the computed CDF. If not specified, the maximum +#' value is the maximum simulated delay. +#' @param fun_primary Primary distribution function (default is \code{runif}). +#' @param fun_dist Distribution function to be added to the primary (default is +#' \code{rlnorm}). +#' @param n Number of simulations (default is 1e6). +#' @param primary_args List of additional arguments to be passed to the primary +#' distribution function. +#' @param dist_args List of additional arguments to be passed to the +#' distribution function. +#' @param ... Additional arguments to be passed to the distribution function. +#' This is an alternative to `dist_args`. +#' +#' @return A numeric vector representing the PMF. +#' @examples +#' simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) +simulate_double_censored_pmf <- function( + max, fun_primary = stats::runif, primary_args = list(), + fun_dist = stats::rlnorm, + dist_args = list(...), n = 1e6, ...) { + primary <- do.call(fun_primary, c(list(n), primary_args)) + secondary <- primary + do.call(fun_dist, c(list(n), dist_args)) + delay <- floor(secondary) - floor(primary) + if (missing(max)) { + max <- base::max(delay) + } + cdf <- ecdf(delay)(0:max) + pmf <- c(cdf[1], diff(cdf)) + vec_outside_tol <- abs(sum(pmf) - 1L) > 1e-10 + while (vec_outside_tol) { + pmf <- pmf / sum(pmf) + vec_outside_tol <- abs(sum(pmf) - 1L) > 1e-10 + } + return(pmf) +} + +#' Drop the first element of a simplex and re-normalize the result to sum to 1. +#' +#' When this vector corresponds to the generation interval distribution, we +#' want to drop this first bin. The renewal equation assumes that same-day +#' infection and onward transmission does not occur, and we assume +#' everything is 1 indexed not 0 indeced. We need to +#' manually drop the first element from the PMF vector. +#' +#' @param x A numeric vector, sums to 1. Corresponds to a discretized PDF or PMF +#' (usually the GI distribution). +#' +#' @return A numeric vector, sums to 1. +#' @examples +#' pmf_orig <- c(0.1, 0.1, 0.1, 0.7) +#' pmf_trunc <- drop_first_and_renormalize(pmf_orig) +drop_first_and_renormalize <- function(x) { + # Check input sums to 1 + stopifnot(abs(sum(x) - 1) < 1e-8) + # Drop and renormalize + y <- x[2:length(x)] / sum(x[2:length(x)]) + vec_outside_tol <- abs(sum(y) - 1L) > 1e-10 + # Normalize until within tolerance + while (vec_outside_tol) { + y <- y / sum(y) + vec_outside_tol <- abs(sum(y) - 1L) > 1e-10 + } + return(y) +} + +#' @title Make incubation period pmf +#' @description When the default arguments are used, this returns a pmf +#' corresponding to the incubation period for COVID after Omicron used in +#' Park et al 2023. These estimates are from early Omicron. +#' @param backward_scale numeric indicating the scale parameter for the Weibull +#' used in producing the incubation period distribution. default is `3.60` for +#' COVID +#' @param backward_shape numeric indicating the shape parameter for the Weibull +#' used in producing the incubation period distribution, default is `1.50` for +#' COVID +#' @param r numeric indicating the exponential rate used in producing the +#' correction on the incubaion period distribution, default is `0.15` for COVID +#' +#' @return pmf of incubation period +#' +#' @examples +#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +make_incubation_period_pmf <- function(backward_scale = 3.60, + backward_shape = 1.50, + r = 0.15) { + # From: Park, Sang Woo, et al. "Inferring the differences in incubation-period + # and generation-interval distributions of the Delta and Omicron variants of + # SARS-CoV-2." Proceedings of the National Academy of Sciences 120.22 (2023): + # e2221887120. + + # "However, when we account for growth-rate differences and reestimate the + # forward incubation periods, we find that both variants have similar + # incubation-period distributions with a mean of 4.1 d (95% CI: 3.8 to 4.4 d) + # for the Delta variant and 4.2 d (95% CI: 3.6 to 4.9 d) for the Omicron + # variant Fig. 3B)." + + # Fits a Weibull to the data + + # Relies on fundamental assumption about epidemic growth rate. + + + discr_gr_adj_weibull <- tibble::tibble( + time = seq(0, 23, by = 1), # 23 seems to get most of the distribution mass + density0 = dweibull(time, + shape = backward_shape, + scale = backward_scale + ) * exp(r * time) + ) + + inc_period_pmf <- wwinference::to_simplex(discr_gr_adj_weibull$density0) + return(inc_period_pmf) +} + + +#' @title Make hospital onset delay pmf +#' @description Uses the parameter estimates from cfa-parameter-estimates, +#' which is based on Danache et al linelist data from symptom onset to hospital +#' admission. See below: +#' https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 +#' +#' @param neg_binom_mu float indicating the mean of the negative binomial shaped +#' delay from symptom onset to hospital admissions, default is `6.98665` from +#' fit to data in above paper +#' @param neg_binom_size float indicating the dispersion parameter in the +#' negative binomial delay from symptom onset to hospital admissions, default +#' is `2.490848` from fit to data in above paper +#' +#' @return pmf of distribution from symptom onset to hospital admission +#' +#' @examples +#' delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +make_hospital_onset_delay_pmf <- function(neg_binom_mu = 6.98665, + neg_binom_size = 2.490848) { + density <- dnbinom( + x = seq(0, 30, 1), + mu = neg_binom_mu, size = neg_binom_size + ) + hosp_onset_delay_pmf <- density / sum(density) + + return(hosp_onset_delay_pmf) +} + + + +#' @title Make reporting delay pmf +#' @description +#' Convolve the incubation period pmf with the symptom to hospital admission pmf +#' and normalize +#' +#' @param incubation_period_pmf a numeric vector, sums to 1, indicating +#' the probability of time from infection to symptom onset +#' @param hospital_onset_delay_pmf a numeric vector, sums to 1, indicating the +#' proabbility of time from symptom onset to hospital admissions +#' +#' @return convolution of incubation period and sympton onset to hospital +#' admission pmf +#' +#' @examples +#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +#' hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +#' inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) +make_reporting_delay_pmf <- function(incubation_period_pmf, + hospital_onset_delay_pmf) { + pmfs <- list( + "incubation_period" = incubation_period_pmf, + "hosp_onset_delay" = hospital_onset_delay_pmf + ) + + infection_to_hosp_delay_pmf <- add_pmfs(pmfs) |> + wwinference::to_simplex() + return(infection_to_hosp_delay_pmf) +} diff --git a/data-raw/covid_pmfs.R b/data-raw/covid_pmfs.R index 19b52edc..ccc353d8 100644 --- a/data-raw/covid_pmfs.R +++ b/data-raw/covid_pmfs.R @@ -5,197 +5,6 @@ params <- get_params( ) ) - - - -#' Simulate daily double censored PMF. From {epinowcast}: -#' https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html #nolint -#' -#' This function simulates the probability mass function of a daily -#' double-censored process. The process involves two distributions: a primary -#' distribution which represents the censoring process for the primary event -#' and another distribution (which is offset by the primary). -#' -#' Based off of: -#' https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1 -#' -#' @param max Maximum value for the computed CDF. If not specified, the maximum -#' value is the maximum simulated delay. -#' @param fun_primary Primary distribution function (default is \code{runif}). -#' @param fun_dist Distribution function to be added to the primary (default is -#' \code{rlnorm}). -#' @param n Number of simulations (default is 1e6). -#' @param primary_args List of additional arguments to be passed to the primary -#' distribution function. -#' @param dist_args List of additional arguments to be passed to the -#' distribution function. -#' @param ... Additional arguments to be passed to the distribution function. -#' This is an alternative to `dist_args`. -#' -#' @return A numeric vector representing the PMF. -#' @examples -#' simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) -simulate_double_censored_pmf <- function( - max, fun_primary = stats::runif, primary_args = list(), - fun_dist = stats::rlnorm, - dist_args = list(...), n = 1e6, ...) { - primary <- do.call(fun_primary, c(list(n), primary_args)) - secondary <- primary + do.call(fun_dist, c(list(n), dist_args)) - delay <- floor(secondary) - floor(primary) - if (missing(max)) { - max <- base::max(delay) - } - cdf <- ecdf(delay)(0:max) - pmf <- c(cdf[1], diff(cdf)) - vec_outside_tol <- abs(sum(pmf) - 1L) > 1e-10 - while (vec_outside_tol) { - pmf <- pmf / sum(pmf) - vec_outside_tol <- abs(sum(pmf) - 1L) > 1e-10 - } - return(pmf) -} - -#' Drop the first element of a simplex and re-normalize the result to sum to 1. -#' -#' When this vector corresponds to the generation interval distribution, we -#' want to drop this first bin. The renewal equation assumes that same-day -#' infection and onward transmission does not occur, and we assume -#' everything is 1 indexed not 0 indeced. We need to -#' manually drop the first element from the PMF vector. -#' -#' @param x A numeric vector, sums to 1. Corresponds to a discretized PDF or PMF -#' (usually the GI distribution). -#' -#' @return A numeric vector, sums to 1. -#' @examples -#' pmf_orig <- c(0.1, 0.1, 0.1, 0.7) -#' pmf_trunc <- drop_first_and_renormalize(pmf_orig) -drop_first_and_renormalize <- function(x) { - # Check input sums to 1 - stopifnot(abs(sum(x) - 1) < 1e-8) - # Drop and renormalize - y <- x[2:length(x)] / sum(x[2:length(x)]) - vec_outside_tol <- abs(sum(y) - 1L) > 1e-10 - # Normalize until within tolerance - while (vec_outside_tol) { - y <- y / sum(y) - vec_outside_tol <- abs(sum(y) - 1L) > 1e-10 - } - return(y) -} - -#' @title Make incubation period pmf -#' @description When the default arguments are used, this returns a pmf -#' corresponding to the incubation period for COVID after Omicron used in -#' Park et al 2023. These estimates are from early Omicron. -#' @param backward_scale numeric indicating the scale parameter for the Weibull -#' used in producing the incubation period distribution. default is `3.60` for -#' COVID -#' @param backward_shape numeric indicating the shape parameter for the Weibull -#' used in producing the incubation period distribution, default is `1.50` for -#' COVID -#' @param r numeric indicating the exponential rate used in producing the -#' correction on the incubaion period distribution, default is `0.15` for COVID -#' -#' @return pmf of incubation period -#' -#' @examples -#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) -make_incubation_period_pmf <- function(backward_scale = 3.60, - backward_shape = 1.50, - r = 0.15) { - # From: Park, Sang Woo, et al. "Inferring the differences in incubation-period - # and generation-interval distributions of the Delta and Omicron variants of - # SARS-CoV-2." Proceedings of the National Academy of Sciences 120.22 (2023): - # e2221887120. - - # "However, when we account for growth-rate differences and reestimate the - # forward incubation periods, we find that both variants have similar - # incubation-period distributions with a mean of 4.1 d (95% CI: 3.8 to 4.4 d) - # for the Delta variant and 4.2 d (95% CI: 3.6 to 4.9 d) for the Omicron - # variant Fig. 3B)." - - # Fits a Weibull to the data - - # Relies on fundamental assumption about epidemic growth rate. - - - discr_gr_adj_weibull <- tibble::tibble( - time = seq(0, 23, by = 1), # 23 seems to get most of the distribution mass - density0 = dweibull(time, - shape = backward_shape, - scale = backward_scale - ) * exp(r * time) - ) - - inc_period_pmf <- wwinference::to_simplex(discr_gr_adj_weibull$density0) - return(inc_period_pmf) -} - - -#' @title Make hospital onset delay pmf -#' @description Uses the parameter estimates from cfa-parameter-estimates, -#' which is based on Danache et al linelist data from symptom onset to hospital -#' admission. See below: -#' https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 -#' -#' @param neg_binom_mu float indicating the mean of the negative binomial shaped -#' delay from symptom onset to hospital admissions, default is `6.98665` from -#' fit to data in above paper -#' @param neg_binom_size float indicating the dispersion parameter in the -#' negative binomial delay from symptom onset to hospital admissions, default -#' is `2.490848` from fit to data in above paper -#' -#' @return pmf of distribution from symptom onset to hospital admission -#' -#' @examples -#' delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) -make_hospital_onset_delay_pmf <- function(neg_binom_mu = 6.98665, - neg_binom_size = 2.490848) { - density <- dnbinom( - x = seq(0, 30, 1), - mu = neg_binom_mu, size = neg_binom_size - ) - hosp_onset_delay_pmf <- density / sum(density) - - return(hosp_onset_delay_pmf) -} - - - -#' @title Make reporting delay pmf -#' @description -#' Convolve the incubation period pmf with the symptom to hospital admission pmf -#' and normalize -#' -#' @param incubation_period_pmf a numeric vector, sums to 1, indicating -#' the probability of time from infection to symptom onset -#' @param hospital_onset_delay_pmf a numeric vector, sums to 1, indicating the -#' proabbility of time from symptom onset to hospital admissions -#' -#' @return convolution of incubation period and sympton onset to hospital -#' admission pmf -#' -#' @examples -#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) -#' hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) -#' inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) -make_reporting_delay_pmf <- function(incubation_period_pmf, - hospital_onset_delay_pmf) { - pmfs <- list( - "incubation_period" = incubation_period_pmf, - "hosp_onset_delay" = hospital_onset_delay_pmf - ) - - infection_to_hosp_delay_pmf <- add_pmfs(pmfs) |> - wwinference::to_simplex() - return(infection_to_hosp_delay_pmf) -} - - - - - # Put it all together generation_interval <- withr::with_seed(42, { simulate_double_censored_pmf( diff --git a/man/drop_first_and_renormalize.Rd b/man/drop_first_and_renormalize.Rd new file mode 100644 index 00000000..fd3b09d6 --- /dev/null +++ b/man/drop_first_and_renormalize.Rd @@ -0,0 +1,26 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{drop_first_and_renormalize} +\alias{drop_first_and_renormalize} +\title{Drop the first element of a simplex and re-normalize the result to sum to 1.} +\usage{ +drop_first_and_renormalize(x) +} +\arguments{ +\item{x}{A numeric vector, sums to 1. Corresponds to a discretized PDF or PMF +(usually the GI distribution).} +} +\value{ +A numeric vector, sums to 1. +} +\description{ +When this vector corresponds to the generation interval distribution, we +want to drop this first bin. The renewal equation assumes that same-day +infection and onward transmission does not occur, and we assume +everything is 1 indexed not 0 indeced. We need to +manually drop the first element from the PMF vector. +} +\examples{ +pmf_orig <- c(0.1, 0.1, 0.1, 0.7) +pmf_trunc <- drop_first_and_renormalize(pmf_orig) +} diff --git a/man/make_hospital_onset_delay_pmf.Rd b/man/make_hospital_onset_delay_pmf.Rd new file mode 100644 index 00000000..702cce5d --- /dev/null +++ b/man/make_hospital_onset_delay_pmf.Rd @@ -0,0 +1,32 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{make_hospital_onset_delay_pmf} +\alias{make_hospital_onset_delay_pmf} +\title{Make hospital onset delay pmf} +\usage{ +make_hospital_onset_delay_pmf( + neg_binom_mu = 6.98665, + neg_binom_size = 2.490848 +) +} +\arguments{ +\item{neg_binom_mu}{float indicating the mean of the negative binomial shaped +delay from symptom onset to hospital admissions, default is \code{6.98665} from +fit to data in above paper} + +\item{neg_binom_size}{float indicating the dispersion parameter in the +negative binomial delay from symptom onset to hospital admissions, default +is \code{2.490848} from fit to data in above paper} +} +\value{ +pmf of distribution from symptom onset to hospital admission +} +\description{ +Uses the parameter estimates from cfa-parameter-estimates, +which is based on Danache et al linelist data from symptom onset to hospital +admission. See below: +https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 +} +\examples{ +delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +} diff --git a/man/make_incubation_period_pmf.Rd b/man/make_incubation_period_pmf.Rd new file mode 100644 index 00000000..2f73fb0a --- /dev/null +++ b/man/make_incubation_period_pmf.Rd @@ -0,0 +1,35 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{make_incubation_period_pmf} +\alias{make_incubation_period_pmf} +\title{Make incubation period pmf} +\usage{ +make_incubation_period_pmf( + backward_scale = 3.6, + backward_shape = 1.5, + r = 0.15 +) +} +\arguments{ +\item{backward_scale}{numeric indicating the scale parameter for the Weibull +used in producing the incubation period distribution. default is \code{3.60} for +COVID} + +\item{backward_shape}{numeric indicating the shape parameter for the Weibull +used in producing the incubation period distribution, default is \code{1.50} for +COVID} + +\item{r}{numeric indicating the exponential rate used in producing the +correction on the incubaion period distribution, default is \code{0.15} for COVID} +} +\value{ +pmf of incubation period +} +\description{ +When the default arguments are used, this returns a pmf +corresponding to the incubation period for COVID after Omicron used in +Park et al 2023. These estimates are from early Omicron. +} +\examples{ +inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +} diff --git a/man/make_reporting_delay_pmf.Rd b/man/make_reporting_delay_pmf.Rd new file mode 100644 index 00000000..9d7605be --- /dev/null +++ b/man/make_reporting_delay_pmf.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{make_reporting_delay_pmf} +\alias{make_reporting_delay_pmf} +\title{Make reporting delay pmf} +\usage{ +make_reporting_delay_pmf(incubation_period_pmf, hospital_onset_delay_pmf) +} +\arguments{ +\item{incubation_period_pmf}{a numeric vector, sums to 1, indicating +the probability of time from infection to symptom onset} + +\item{hospital_onset_delay_pmf}{a numeric vector, sums to 1, indicating the +proabbility of time from symptom onset to hospital admissions} +} +\value{ +convolution of incubation period and sympton onset to hospital +admission pmf +} +\description{ +Convolve the incubation period pmf with the symptom to hospital admission pmf +and normalize +} +\examples{ +inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) +hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) +inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) +} diff --git a/man/simulate_double_censored_pmf.Rd b/man/simulate_double_censored_pmf.Rd new file mode 100644 index 00000000..5127a495 --- /dev/null +++ b/man/simulate_double_censored_pmf.Rd @@ -0,0 +1,53 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/delay_distribs.R +\name{simulate_double_censored_pmf} +\alias{simulate_double_censored_pmf} +\title{Simulate daily double censored PMF. From {epinowcast}: +https://package.epinowcast.org/dev/reference/simulate_double_censored_pmf.html #nolint} +\usage{ +simulate_double_censored_pmf( + max, + fun_primary = stats::runif, + primary_args = list(), + fun_dist = stats::rlnorm, + dist_args = list(...), + n = 1e+06, + ... +) +} +\arguments{ +\item{max}{Maximum value for the computed CDF. If not specified, the maximum +value is the maximum simulated delay.} + +\item{fun_primary}{Primary distribution function (default is \code{runif}).} + +\item{primary_args}{List of additional arguments to be passed to the primary +distribution function.} + +\item{fun_dist}{Distribution function to be added to the primary (default is +\code{rlnorm}).} + +\item{dist_args}{List of additional arguments to be passed to the +distribution function.} + +\item{n}{Number of simulations (default is 1e6).} + +\item{...}{Additional arguments to be passed to the distribution function. +This is an alternative to \code{dist_args}.} +} +\value{ +A numeric vector representing the PMF. +} +\description{ +This function simulates the probability mass function of a daily +double-censored process. The process involves two distributions: a primary +distribution which represents the censoring process for the primary event +and another distribution (which is offset by the primary). +} +\details{ +Based off of: +https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1 +} +\examples{ +simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) +} From e990f34c20bc392ebea05e127ded6ff1dd8216ac Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Wed, 10 Jul 2024 21:37:45 -0400 Subject: [PATCH 088/103] see if space allows fxn to be recognized in CI --- R/delay_distribs.R | 1 + 1 file changed, 1 insertion(+) diff --git a/R/delay_distribs.R b/R/delay_distribs.R index 23eb8c21..6ca813cc 100644 --- a/R/delay_distribs.R +++ b/R/delay_distribs.R @@ -57,6 +57,7 @@ simulate_double_censored_pmf <- function( #' (usually the GI distribution). #' #' @return A numeric vector, sums to 1. +#' #' @examples #' pmf_orig <- c(0.1, 0.1, 0.1, 0.7) #' pmf_trunc <- drop_first_and_renormalize(pmf_orig) From 8a480776d96ca9f4022eecc92fae00852312e45c Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 11 Jul 2024 08:45:45 -0400 Subject: [PATCH 089/103] remove example --- R/delay_distribs.R | 4 ---- man/drop_first_and_renormalize.Rd | 4 ---- 2 files changed, 8 deletions(-) diff --git a/R/delay_distribs.R b/R/delay_distribs.R index 6ca813cc..dcc6582a 100644 --- a/R/delay_distribs.R +++ b/R/delay_distribs.R @@ -57,10 +57,6 @@ simulate_double_censored_pmf <- function( #' (usually the GI distribution). #' #' @return A numeric vector, sums to 1. -#' -#' @examples -#' pmf_orig <- c(0.1, 0.1, 0.1, 0.7) -#' pmf_trunc <- drop_first_and_renormalize(pmf_orig) drop_first_and_renormalize <- function(x) { # Check input sums to 1 stopifnot(abs(sum(x) - 1) < 1e-8) diff --git a/man/drop_first_and_renormalize.Rd b/man/drop_first_and_renormalize.Rd index fd3b09d6..ed85b412 100644 --- a/man/drop_first_and_renormalize.Rd +++ b/man/drop_first_and_renormalize.Rd @@ -20,7 +20,3 @@ infection and onward transmission does not occur, and we assume everything is 1 indexed not 0 indeced. We need to manually drop the first element from the PMF vector. } -\examples{ -pmf_orig <- c(0.1, 0.1, 0.1, 0.7) -pmf_trunc <- drop_first_and_renormalize(pmf_orig) -} From 92240b06cd77e15ccda5b39d7329a97b8fc9b8ba Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 11 Jul 2024 09:27:30 -0400 Subject: [PATCH 090/103] remove all examples from non exported functions --- R/checkers.R | 1 + R/delay_distribs.R | 13 ------------- man/check_elements_non_neg.Rd | 2 ++ man/make_hospital_onset_delay_pmf.Rd | 3 --- man/make_incubation_period_pmf.Rd | 3 --- man/make_reporting_delay_pmf.Rd | 5 ----- man/simulate_double_censored_pmf.Rd | 3 --- 7 files changed, 3 insertions(+), 27 deletions(-) diff --git a/R/checkers.R b/R/checkers.R index 1061bee5..8329d4eb 100644 --- a/R/checkers.R +++ b/R/checkers.R @@ -26,6 +26,7 @@ check_date <- function(df, max_date, call = rlang::caller_env()) { #' Check that all elements of a vector are non-negative #' #' @param x vector of arguments to check for negativity +#' @param arg string to print the name of the element your checking #' @param call Calling environment to be passed to the type checker #' #' @return NULL, invisibly diff --git a/R/delay_distribs.R b/R/delay_distribs.R index dcc6582a..14f5d53e 100644 --- a/R/delay_distribs.R +++ b/R/delay_distribs.R @@ -23,8 +23,6 @@ #' This is an alternative to `dist_args`. #' #' @return A numeric vector representing the PMF. -#' @examples -#' simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) simulate_double_censored_pmf <- function( max, fun_primary = stats::runif, primary_args = list(), fun_dist = stats::rlnorm, @@ -85,9 +83,6 @@ drop_first_and_renormalize <- function(x) { #' correction on the incubaion period distribution, default is `0.15` for COVID #' #' @return pmf of incubation period -#' -#' @examples -#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) make_incubation_period_pmf <- function(backward_scale = 3.60, backward_shape = 1.50, r = 0.15) { @@ -134,9 +129,6 @@ make_incubation_period_pmf <- function(backward_scale = 3.60, #' is `2.490848` from fit to data in above paper #' #' @return pmf of distribution from symptom onset to hospital admission -#' -#' @examples -#' delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) make_hospital_onset_delay_pmf <- function(neg_binom_mu = 6.98665, neg_binom_size = 2.490848) { density <- dnbinom( @@ -162,11 +154,6 @@ make_hospital_onset_delay_pmf <- function(neg_binom_mu = 6.98665, #' #' @return convolution of incubation period and sympton onset to hospital #' admission pmf -#' -#' @examples -#' inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) -#' hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) -#' inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) make_reporting_delay_pmf <- function(incubation_period_pmf, hospital_onset_delay_pmf) { pmfs <- list( diff --git a/man/check_elements_non_neg.Rd b/man/check_elements_non_neg.Rd index 2691883a..c7570617 100644 --- a/man/check_elements_non_neg.Rd +++ b/man/check_elements_non_neg.Rd @@ -9,6 +9,8 @@ check_elements_non_neg(x, arg = "x", call = rlang::caller_env()) \arguments{ \item{x}{vector of arguments to check for negativity} +\item{arg}{string to print the name of the element your checking} + \item{call}{Calling environment to be passed to the type checker} } \value{ diff --git a/man/make_hospital_onset_delay_pmf.Rd b/man/make_hospital_onset_delay_pmf.Rd index 702cce5d..8df84e27 100644 --- a/man/make_hospital_onset_delay_pmf.Rd +++ b/man/make_hospital_onset_delay_pmf.Rd @@ -27,6 +27,3 @@ which is based on Danache et al linelist data from symptom onset to hospital admission. See below: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261428 } -\examples{ -delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) -} diff --git a/man/make_incubation_period_pmf.Rd b/man/make_incubation_period_pmf.Rd index 2f73fb0a..758ad647 100644 --- a/man/make_incubation_period_pmf.Rd +++ b/man/make_incubation_period_pmf.Rd @@ -30,6 +30,3 @@ When the default arguments are used, this returns a pmf corresponding to the incubation period for COVID after Omicron used in Park et al 2023. These estimates are from early Omicron. } -\examples{ -inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) -} diff --git a/man/make_reporting_delay_pmf.Rd b/man/make_reporting_delay_pmf.Rd index 9d7605be..e147f4fd 100644 --- a/man/make_reporting_delay_pmf.Rd +++ b/man/make_reporting_delay_pmf.Rd @@ -21,8 +21,3 @@ admission pmf Convolve the incubation period pmf with the symptom to hospital admission pmf and normalize } -\examples{ -inc_pmf <- make_incubation_period_pmf(3.6, 1.5, 0.15) -hosp_delay_pmf <- make_hospital_onset_delay_pmf(7, 2.5) -inf_to_hosp_pmf <- make_reporting_delay_pmf(inc_pmf, hosp_delay_pmf) -} diff --git a/man/simulate_double_censored_pmf.Rd b/man/simulate_double_censored_pmf.Rd index 5127a495..81845c3e 100644 --- a/man/simulate_double_censored_pmf.Rd +++ b/man/simulate_double_censored_pmf.Rd @@ -48,6 +48,3 @@ and another distribution (which is offset by the primary). Based off of: https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1 } -\examples{ -simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1) -} From 890a4db9b345dfa2a1007f8e5924f9c9f1a483ad Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 11 Jul 2024 11:38:37 -0400 Subject: [PATCH 091/103] remove postprocess --- R/postprocess.R | 155 ------------------------------------------------ 1 file changed, 155 deletions(-) delete mode 100644 R/postprocess.R diff --git a/R/postprocess.R b/R/postprocess.R deleted file mode 100644 index 23c2fcc2..00000000 --- a/R/postprocess.R +++ /dev/null @@ -1,155 +0,0 @@ -#' @title Postprocess to generate a draws dataframe -#' -#' @description -#' This function takes in the two input data sources, the CmdStan fit object, -#' and the 3 relevant mappings from stan indices to the real data, in order -#' to generate a dataframe containing the posterior draws of the counts (e.g. -#' hospital admissions), the wastewater concentration values, the "global" R(t), -#' and the "local" R(t) estimates + the critical metadata in the data -#' -#' -#' @param ww_data A dataframe of the preprocessed wastewater concentration data -#' used to fit the model -#' @param count_data A dataframe of the preprocessed daily count data (e.g. -#' hospital admissions) from the "global" population -#' @param fit_obj a CmdStan object that is the output of fitting the model to -#' the `ww_data` and `count_data` -#' @param date_time_spine A tibble mapping the time index in stan (observed + -#' nowcast + forecast) to real dates -#' @param lab_site_spine A tibble mapping the site-lab index in stan to the -#' corresponding site, lab, and site population -#' @param subpop_spine A tibble mapping the site index in stan to the -#' corresponding subpopulation (either a site or the auxiliary site we add to -#' represent the rest of the population) -#' -#' @return A tibble containing the full set of posterior draws of the -#' estimated, nowcasted, and forecasted: counts, site-level wastewater -#' concentrations, "global"(e.g. state) R(t) estimate, and the "local" (site + -#' the one auxiliary subpopulation) R(t) estimates. In the instance where there -#' are observations, the data will be joined to each draw of the predicted -#' observation to facilitate plotting. -#' @export -postprocess <- function(ww_data, - count_data, - fit_obj, - date_time_spine, - lab_site_spine, - subpop_spine) { - draws <- fit_obj$result$draws() - - count_draws <- draws |> - tidybayes::spread_draws(pred_hosp[t]) |> - dplyr::rename(pred_value = pred_hosp) |> - dplyr::mutate( - draw = `.draw`, - name = "pred_counts" - ) |> - dplyr::select(name, t, pred_value, draw) |> - dplyr::left_join(date_time_spine, by = "t") |> - dplyr::left_join(count_data, by = "date") |> - dplyr::ungroup() |> - dplyr::rename(observed_value = count) |> - dplyr::mutate( - observation_type = "count", - type_of_quantity = "global", - lab_site_index = NA, - subpop = NA, - lab = NA, - site_pop = NA, - below_lod = NA, - lod = NA, - flag_as_ww_outlier = NA, - exclude = NA - ) |> - dplyr::select(-t) - - ww_draws <- draws |> - tidybayes::spread_draws(pred_ww[lab_site_index, t]) |> - dplyr::rename(pred_value = pred_ww) |> - dplyr::mutate( - draw = `.draw`, - name = "pred_ww", - pred_value = exp(pred_value) - ) |> - dplyr::select(name, lab_site_index, t, pred_value, draw) |> - dplyr::left_join(date_time_spine, by = "t") |> - dplyr::left_join(lab_site_spine, by = "lab_site_index") |> - dplyr::left_join(ww_data, by = c( - "lab_site_index", "date", - "lab", "site", "site_pop" - )) |> - dplyr::ungroup() |> - dplyr::mutate(observed_value = genome_copies_per_ml) |> - dplyr::mutate( - observation_type = "genome copies per mL", - type_of_quantity = "local", - total_pop = NA, - subpop = glue::glue("Site: {site}") - ) |> - dplyr::select(colnames(count_draws), -t) - - global_rt_draws <- draws |> - tidybayes::spread_draws(rt[t]) |> - dplyr::rename(pred_value = rt) |> - dplyr::mutate( - draw = `.draw`, - name = "global R(t)" - ) |> - dplyr::select(name, t, pred_value, draw) |> - dplyr::left_join(date_time_spine, by = "t") |> - dplyr::left_join(count_data, by = "date") |> - dplyr::ungroup() |> - dplyr::rename(observed_value = count) |> - dplyr::mutate( - observed_value = NA, - observation_type = "latent variable", - type_of_quantity = "global", - lab_site_index = NA, - subpop = NA, - lab = NA, - site_pop = NA, - below_lod = NA, - lod = NA, - flag_as_ww_outlier = NA, - exclude = NA - ) |> - dplyr::select(-t) - - site_level_rt_draws <- draws |> - tidybayes::spread_draws(r_site_t[site_index, t]) |> - dplyr::rename(pred_value = r_site_t) |> - dplyr::mutate( - draw = `.draw`, - name = "subpop R(t)", - pred_value = pred_value - ) |> - dplyr::select(name, site_index, t, pred_value, draw) |> - dplyr::left_join(date_time_spine, by = "t") |> - dplyr::left_join(subpop_spine, by = "site_index") |> - dplyr::ungroup() |> - dplyr::mutate( - observed_value = NA, - lab_site_index = NA, - lab = NA, - below_lod = NA, - lod = NA, - flag_as_ww_outlier = NA, - exclude = NA, - observation_type = "latent variable", - type_of_quantity = "local", - total_pop = NA, - subpop = ifelse(site != "remainder of pop", - glue::glue("Site: {site}"), "remainder of pop" - ) - ) |> - dplyr::select(colnames(count_draws), -t) - - draws_df <- dplyr::bind_rows( - count_draws, - ww_draws, - global_rt_draws, - site_level_rt_draws - ) - - return(draws_df) -} From 17fe2a8b1d8b6de9f13844d6d93d0886c8d57088 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Thu, 11 Jul 2024 11:39:11 -0400 Subject: [PATCH 092/103] update documentation to remove postprocess --- NAMESPACE | 1 - man/postprocess.Rd | 50 ---------------------------------------------- 2 files changed, 51 deletions(-) delete mode 100644 man/postprocess.Rd diff --git a/NAMESPACE b/NAMESPACE index c244a229..310bc702 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -27,7 +27,6 @@ export(get_ww_data_indices) export(get_ww_data_sizes) export(get_ww_values) export(indicate_ww_exclusions) -export(postprocess) export(preprocess_hosp_data) export(preprocess_ww_data) export(to_simplex) diff --git a/man/postprocess.Rd b/man/postprocess.Rd deleted file mode 100644 index e9555301..00000000 --- a/man/postprocess.Rd +++ /dev/null @@ -1,50 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/postprocess.R -\name{postprocess} -\alias{postprocess} -\title{Postprocess to generate a draws dataframe} -\usage{ -postprocess( - ww_data, - count_data, - fit_obj, - date_time_spine, - lab_site_spine, - subpop_spine -) -} -\arguments{ -\item{ww_data}{A dataframe of the preprocessed wastewater concentration data -used to fit the model} - -\item{count_data}{A dataframe of the preprocessed daily count data (e.g. -hospital admissions) from the "global" population} - -\item{fit_obj}{a CmdStan object that is the output of fitting the model to -the \code{ww_data} and \code{count_data}} - -\item{date_time_spine}{A tibble mapping the time index in stan (observed + -nowcast + forecast) to real dates} - -\item{lab_site_spine}{A tibble mapping the site-lab index in stan to the -corresponding site, lab, and site population} - -\item{subpop_spine}{A tibble mapping the site index in stan to the -corresponding subpopulation (either a site or the auxiliary site we add to -represent the rest of the population)} -} -\value{ -A tibble containing the full set of posterior draws of the -estimated, nowcasted, and forecasted: counts, site-level wastewater -concentrations, "global"(e.g. state) R(t) estimate, and the "local" (site + -the one auxiliary subpopulation) R(t) estimates. In the instance where there -are observations, the data will be joined to each draw of the predicted -observation to facilitate plotting. -} -\description{ -This function takes in the two input data sources, the CmdStan fit object, -and the 3 relevant mappings from stan indices to the real data, in order -to generate a dataframe containing the posterior draws of the counts (e.g. -hospital admissions), the wastewater concentration values, the "global" R(t), -and the "local" R(t) estimates + the critical metadata in the data -} From 3117d1b685fe6e368b636dbebc221e96646b5e6a Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Fri, 12 Jul 2024 11:24:03 -0400 Subject: [PATCH 093/103] Update .github/workflows/r-cmd-check.yaml Co-authored-by: Dylan H. Morris --- .github/workflows/r-cmd-check.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/r-cmd-check.yaml b/.github/workflows/r-cmd-check.yaml index 2bc50679..fab662c1 100644 --- a/.github/workflows/r-cmd-check.yaml +++ b/.github/workflows/r-cmd-check.yaml @@ -16,7 +16,7 @@ jobs: r-version: "release" use-public-rspm: true extra-repositories: "https://mc-stan.org/r-packages/" - - name: "Set up dependencies for wweval" + - name: "Set up dependencies for wwinference" uses: r-lib/actions/setup-r-dependencies@v2 with: needs: check From 5fca6ddd2550c968fe279402a06908d2ba57c385 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Fri, 12 Jul 2024 11:28:24 -0400 Subject: [PATCH 094/103] Update R/get_stan_data.R Co-authored-by: Dylan H. Morris --- R/get_stan_data.R | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/R/get_stan_data.R b/R/get_stan_data.R index 1d34f67a..0dfb475c 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -476,9 +476,7 @@ get_ww_values <- function(ww_data, # Get the vector of log wastewater concentrations log_conc <- ww_data |> - dplyr::mutate(log_conc = as.numeric(log(!!rlang::sym( - ww_measurement_col_name - ) + 1e-8))) |> + dplyr::mutate(log_conc = as.numeric(log(.data[[ww_measurement_col_name]] + 1e-8))) |> dplyr::pull(log_conc) ww_values <- list( From 00834a3fbfc5161e4c925c45a75638aaf063c88c Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 11:32:49 -0400 Subject: [PATCH 095/103] fix little bugs --- DESCRIPTION | 2 -- R/get_stan_data.R | 5 ++++- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 10cfcea4..f384034a 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -72,7 +72,6 @@ Imports: RcppTOML, cli, tibble, - usethis, tidybayes, tidyr, purrr, @@ -81,7 +80,6 @@ Imports: rlang, scales, ggplot2, - rcmdcheck, posterior, knitr Remotes: diff --git a/R/get_stan_data.R b/R/get_stan_data.R index 0dfb475c..2825933e 100644 --- a/R/get_stan_data.R +++ b/R/get_stan_data.R @@ -476,7 +476,10 @@ get_ww_values <- function(ww_data, # Get the vector of log wastewater concentrations log_conc <- ww_data |> - dplyr::mutate(log_conc = as.numeric(log(.data[[ww_measurement_col_name]] + 1e-8))) |> + dplyr::mutate( + log_conc = + (log(.data[[ww_measurement_col_name]] + 1e-8)) + ) |> dplyr::pull(log_conc) ww_values <- list( From c72e13daaaa53e189ae44aea2307821a3961d1f2 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 11:33:16 -0400 Subject: [PATCH 096/103] remove extra stan file --- inst/stan/simplewwinference.stan | 232 ------------------------------- 1 file changed, 232 deletions(-) delete mode 100644 inst/stan/simplewwinference.stan diff --git a/inst/stan/simplewwinference.stan b/inst/stan/simplewwinference.stan deleted file mode 100644 index 9ff0fb27..00000000 --- a/inst/stan/simplewwinference.stan +++ /dev/null @@ -1,232 +0,0 @@ -functions { -#include functions/ar1.stan -#include functions/diff_ar1.stan -#include functions/convolve.stan -#include functions/infections.stan -#include functions/hospitalization.stan -#include functions/observation_model.stan -#include functions/utils.stan -} -// end functions - -// The fixed input data -data { - int gt_max; - int hosp_delay_max; - vector[hosp_delay_max] inf_to_hosp; - int dur_inf; // duration people are infectious (number of days) - real mwpd; // mL of WW produced per person per day - int if_l; // length of infection feedback pmf - vector[if_l] infection_feedback_pmf; // infection feedback pmf - int ot; // maximum time index for the hospital admissions (max number of days we could have observations) - int oht; // number of days that we have hospital admissions observations - int owt; // number of days of observed WW (should be roughly ot/7) - int uot; // unobserved time before we observe hospital admissions/ WW - int ht; // horizon time (nowcast + forecast time) - int n_weeks; // number of weeks for weekly random walk on R(t) - matrix[ot + ht, n_weeks] ind_m; // matrix needed to transform R(t) from weekly to daily - int tot_weeks; // number of weeks for the weekly random walk on IHR (includes unobserved time) - matrix[uot + ot + ht, tot_weeks] p_hosp_m ; // matrix needed to convert p_hosp RW from weekly to daily - vector[gt_max] generation_interval; // generation interval distribution - real n; // population size - array[owt] int ww_sampled_times; // the days on which WW is sampled relative - // to the days with which hospital admissions observed - array[oht] int hosp_times; // the days on which hospital admissions are observed - array[oht] int hosp; // observed hospital admissions - array[ot + ht] int day_of_week; // integer vector with 1-7 corresponding to the weekday - vector[owt] log_conc; // log(genome copies/mL) - int compute_likelihood; // 1= use data to compute likelihood - int include_ww; // 1= include wastewater data in likelihood calculation - int include_hosp; // 1 = fit to hosp, 0 = only fit wastewater model - - // Priors - vector[6] viral_shedding_pars; // tpeak, viral peak, shedding duration mean and sd - real autoreg_rt_a; - real autoreg_rt_b; - real autoreg_p_hosp_a; - real autoreg_p_hosp_b; - real inv_sqrt_phi_prior_mean; - real inv_sqrt_phi_prior_sd; - real r_prior_mean; - real r_prior_sd; - real log10_g_prior_mean; - real log10_g_prior_sd; - real i0_over_n_prior_a; - real i0_over_n_prior_b; - real wday_effect_prior_mean; - real wday_effect_prior_sd; - real initial_growth_prior_mean; - real initial_growth_prior_sd; - real sigma_ww_prior_mean; - real eta_sd_sd; - real p_hosp_prior_mean; - real p_hosp_sd_logit; - real p_hosp_w_sd_sd; - real inf_feedback_prior_logmean; - real inf_feedback_prior_logsd; -} - -transformed data { - // viral shedding parameters - real t_peak_mean = viral_shedding_pars[1]; - real t_peak_sd = viral_shedding_pars[2]; - real viral_peak_mean = viral_shedding_pars[3]; - real viral_peak_sd = viral_shedding_pars[4]; - real dur_shed_mean = viral_shedding_pars[5]; - real dur_shed_sd = viral_shedding_pars[6]; - // natural scale -> lognormal parameters - // https://en.wikipedia.org/wiki/Log-normal_distribution - real r_logmean = convert_to_logmean(r_prior_mean, r_prior_sd); - real r_logsd = convert_to_logsd(r_prior_mean, r_prior_sd); - // reversed generation interval - vector[gt_max] gt_rev_pmf = reverse(generation_interval); - vector[if_l] infection_feedback_rev_pmf = reverse(infection_feedback_pmf); -} - -// The parameters accepted by the model. -parameters { - vector[n_weeks-1] w; // weekly random walk - real eta_sd; // step size of random walk - real autoreg_rt;// coefficient on AR process in R(t) - array [(include_ww==1) ? 1 : 0] real autoreg_p_hosp; - real log_r; // baseline reproduction number estimate (log) - real i0_over_n; // Per capita incident infections - // on day -uot before first observation day - real initial_growth; // initial growth from I0 to first observed time - real inv_sqrt_phi_h; - real sigma_ww; - real p_hosp_mean; // Estimated IHR - vector[(include_ww==1) ? tot_weeks : 0] p_hosp_w; - array [(include_ww==1) ? 1 : 0] real p_hosp_w_sd; - real t_peak; // time to viral load peak in shedding - real viral_peak; // log10 peak viral load shed /mL - real dur_shed; // duration of detectable viral shedding - real log10_g; // log10 of number of genomes per infected individual - simplex[7] hosp_wday_effect; // day of week reporting effect, sums to 1 - real infection_feedback; // infection feedback - -} - -transformed parameters { - vector[ot + uot + ht] new_i; // daily incident infections /n - vector [(include_ww ==1) ? ot + uot + ht: 1] p_hosp; // probability of hospitalization - vector[(include_ww == 1) ? tot_weeks-1: 0] p_hosp_in_weeks; // the weekly vector of probability of hospital admissions - vector[ot + uot + ht] model_hosp; // model estimated hospital admissions - vector[oht] exp_obs_hosp; // expected observed hospital admissions - vector[ot] exp_obs_hosp_no_wday_effect; // expected observed hospital admissions before weekday effect - vector[gt_max] s; // viral kinetics trajectory (normalized) - vector[ot + uot + ht] model_log_v; // model estimated log viral genomes shed per person - vector[ot+ht] model_log_v_ot; // model estimated log viral genomes shed per person during ot - vector[owt] exp_obs_log_v; // model estimated log viral genomes shed per person at ww sampled times only - vector[ot + uot + ht] model_net_i; // number of net infected individuals shedding on each day (sum of individuals in dift stages of infection) - real phi_h = inv_square(inv_sqrt_phi_h); // previouslt inv_square(inv_sqrt_phi_h) - vector[ot + ht] unadj_r; // R(t) - vector[ot + ht] rt; // R(t) - real i0 = i0_over_n * n; // Absolute initial incident infections - vector[n_weeks] log_rt_weeks; // log R(t) in weeks for autocorrelated RW - - - // AR + RW implementation: - log_rt_weeks = diff_ar1(log_r, autoreg_rt, eta_sd, w, 0); - unadj_r = ind_m * log_rt_weeks; - unadj_r = exp(unadj_r); - - // Expected daily number of new infections (per capita), using EpiNow2 assumptions re pre and post observation time - // Using pop = 1 so that damping is normalized to per capita - (new_i, rt) = generate_infections( - unadj_r, - uot, - gt_rev_pmf, - log(i0_over_n), - initial_growth, - ht, - infection_feedback, - infection_feedback_rev_pmf); - - // Expected hospitalizations: - // generates all hospitalizations, across unobserved time, observed time, and forecast time - if(include_ww==1){ - p_hosp = assemble_p_hosp(p_hosp_m, p_hosp_mean, p_hosp_w_sd[1], - autoreg_p_hosp[1], p_hosp_w, tot_weeks, 1); - model_hosp = convolve_dot_product(p_hosp .* new_i, reverse(inf_to_hosp), - ot + uot + ht); - }else{ - p_hosp[1] = inv_logit(p_hosp_mean); - // generates all hospitalizations, across unobserved time, observed time, and forecast time - model_hosp = convolve_dot_product(p_hosp[1] * new_i, reverse(inf_to_hosp), - ot + uot + ht); - } - - - - // just get the expected observed hospitalizations - exp_obs_hosp_no_wday_effect = model_hosp[uot + 1 : uot + ot]; - // apply the weekday effect so these are distributed with fewer admits on Sat & Sun - // multiply by n because data must be integer so need this to be in actual numbers not proportions - exp_obs_hosp = n * day_of_week_effect(exp_obs_hosp_no_wday_effect[hosp_times], - day_of_week[hosp_times], hosp_wday_effect); - - // Expected shed viral genomes: - // Shedding kinetics trajectory - s = get_vl_trajectory(t_peak, viral_peak, dur_shed, gt_max); - - // This should also be a convolution of incident infections and shedding kinetics pmf times avg total virus shed - model_net_i = convolve_dot_product(new_i, reverse(s), uot + ot + ht); // net number of infected individuals - // log number of viral genomes shed on a given day = net infected individuals * amount shed per individual - model_log_v = log(10)*log10_g + log(model_net_i + 1e-8); // adding for numerical stability - // genome copies/mL = genome copies/(person * mL of WW per person day) - model_log_v_ot = model_log_v[(uot + 1) : (uot + ot + ht)] - log(mwpd); - exp_obs_log_v = model_log_v_ot[ww_sampled_times]; -} - -// Prior and sampling distribution -model { - // priors - vector[7] effect_mean = rep_vector(wday_effect_prior_mean, 7); - w ~ std_normal(); - eta_sd ~ normal(0, eta_sd_sd); - autoreg_rt ~ beta(autoreg_rt_a, autoreg_rt_b); - autoreg_p_hosp ~ beta(autoreg_p_hosp_a, autoreg_p_hosp_b); - log_r ~ normal(r_logmean, r_logsd); - i0_over_n ~ beta(i0_over_n_prior_a, i0_over_n_prior_b); - initial_growth ~ normal(initial_growth_prior_mean, initial_growth_prior_sd); - inv_sqrt_phi_h ~ normal(inv_sqrt_phi_prior_mean, inv_sqrt_phi_prior_sd); - sigma_ww ~ normal(0, sigma_ww_prior_mean); - log10_g ~ normal(log10_g_prior_mean, log10_g_prior_sd); - hosp_wday_effect ~ normal(effect_mean, wday_effect_prior_sd); - p_hosp_mean ~ normal(logit(p_hosp_prior_mean), p_hosp_sd_logit); - p_hosp_w ~ std_normal(); - p_hosp_w_sd ~ normal(0, p_hosp_w_sd_sd); - t_peak ~ normal(t_peak_mean, t_peak_sd); - viral_peak ~ normal(viral_peak_mean, viral_peak_sd); - dur_shed ~ normal(dur_shed_mean, dur_shed_sd); - infection_feedback ~ lognormal(inf_feedback_prior_logmean, inf_feedback_prior_logsd); - - // Compute log likelihood - if (compute_likelihood == 1) { - if (include_ww == 1) { - log_conc ~ normal(exp_obs_log_v, sigma_ww); - } - - if (include_hosp == 1) { - hosp ~ neg_binomial_2(exp_obs_hosp, phi_h); - } - } // end if for computing log likelihood -} - -generated quantities { - array[ot + ht] real pred_hosp; - array[ot + ht] real pred_new_i; - array[ot + ht] real pred_conc; - vector[ot + ht] exp_state_ww_conc = exp(model_log_v_ot); // state mean wastewater concentration - real g = pow(10, log10_g); - - pred_hosp = neg_binomial_2_rng(n * day_of_week_effect(model_hosp[uot + 1 : - uot + ot + ht], - day_of_week, - hosp_wday_effect), - phi_h); - pred_new_i = neg_binomial_2_rng(n * new_i[uot + 1 : uot + ot + ht], phi_h); - - pred_conc = normal_rng(model_log_v_ot, sigma_ww); -} From 43d6e69788289977c217455d0f55ba52e99e8792 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 14:04:02 -0400 Subject: [PATCH 097/103] add package name in front of package fxns in data-raw --- data-raw/covid_pmfs.R | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/data-raw/covid_pmfs.R b/data-raw/covid_pmfs.R index ccc353d8..f5a0ea02 100644 --- a/data-raw/covid_pmfs.R +++ b/data-raw/covid_pmfs.R @@ -1,5 +1,5 @@ # Load in the parameters -params <- get_params( +params <- wwinference::get_params( system.file("extdata", "example_params.toml", package = "wwinference" ) @@ -7,20 +7,20 @@ params <- get_params( # Put it all together generation_interval <- withr::with_seed(42, { - simulate_double_censored_pmf( + wwinference::simulate_double_censored_pmf( max = params$gt_max, meanlog = params$mu_gi, sdlog = params$sigma_gi, fun_dist = rlnorm, n = 5e6 - ) |> wdrop_first_and_renormalize() + ) |> wwinference::drop_first_and_renormalize() }) -inc <- make_incubation_period_pmf( +inc <- wwinference::make_incubation_period_pmf( params$backward_scale, params$backward_shape, params$r ) -sym_to_hosp <- make_hospital_onset_delay_pmf( +sym_to_hosp <- wwinference::make_hospital_onset_delay_pmf( params$neg_binom_mu, params$neg_binom_size ) -inf_to_hosp <- make_reporting_delay_pmf(inc, sym_to_hosp) +inf_to_hosp <- wwinference::make_reporting_delay_pmf(inc, sym_to_hosp) usethis::use_data(generation_interval, overwrite = TRUE) From 1a72e0921fb294b833542166b2d8bb2f81beaae7 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 14:10:58 -0400 Subject: [PATCH 098/103] fix switch for inits and rename --- R/wwinference.R | 53 ++++++++++++++++++-------------------------- data-raw/test_data.R | 2 +- man/wwinference.Rd | 6 ++--- 3 files changed, 25 insertions(+), 36 deletions(-) diff --git a/R/wwinference.R b/R/wwinference.R index 066a186a..cb7f3609 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -27,8 +27,8 @@ #' user based on the date they are producing a forecast #' @param mcmc_options The MCMC parameters as defined using #' `get_mcmc_options()`. -#' @param spec_inits Boolean indicating whether or not to specify the -#' initialization of the sampler, default is `TRUE`, meaning that the +#' @param generate_initial_values Boolean indicating whether or not to specify +#' the initialization of the sampler, default is `TRUE`, meaning that the #' initialization lists will be passed to the `cmdstanr::sample()` #' function #' @param compiled_model The pre-compiled model as defined using @@ -66,7 +66,7 @@ wwinference <- function(ww_data, "2023-12-06" ), mcmc_options = wwinference::get_mcmc_options(), - spec_inits = TRUE, + generate_initial_values = TRUE, compiled_model = wwinference::compile_model()) { # Check that data is compatible with specifications check_date(ww_data, model_spec$forecast_date) @@ -86,39 +86,29 @@ wwinference <- function(ww_data, compute_likelihood = 1 ) - init_lists <- c() - for (i in 1:mcmc_options$n_chains) { - init_lists[[i]] <- get_inits(stan_data, params) + init_lists <- NULL + if (generate_initial_values) { + init_lists <- c() + for (i in 1:mcmc_options$n_chains) { + init_lists[[i]] <- get_inits(stan_data, params) + } } fit_model <- function(compiled_model, standata, model_spec, - init_lists, - spec_inits) { - if (isTRUE(spec_inits)) { - fit <- compiled_model$sample( - data = stan_data, - init = init_lists, - seed = mcmc_options$seed, - iter_sampling = mcmc_options$iter_sampling, - iter_warmup = mcmc_options$iter_warmup, - max_treedepth = mcmc_options$max_treedepth, - chains = mcmc_options$n_chains, - parallel_chains = mcmc_options$n_chains - ) - } else { - fit <- compiled_model$sample( - data = stan_data, - seed = mcmc_options$seed, - iter_sampling = mcmc_options$iter_sampling, - iter_warmup = mcmc_options$iter_warmup, - max_treedepth = mcmc_options$max_treedepth, - chains = mcmc_options$n_chains, - parallel_chains = mcmc_options$n_chains - ) - } + init_lists) { + fit <- compiled_model$sample( + data = stan_data, + init = init_lists, + seed = mcmc_options$seed, + iter_sampling = mcmc_options$iter_sampling, + iter_warmup = mcmc_options$iter_warmup, + max_treedepth = mcmc_options$max_treedepth, + chains = mcmc_options$n_chains, + parallel_chains = mcmc_options$n_chains + ) return(fit) } @@ -131,8 +121,7 @@ wwinference <- function(ww_data, compiled_model, standata, model_spec, - init_lists, - spec_inits + init_lists ) if (!is.null(fit$error)) { # If the model errors, return a list with the diff --git a/data-raw/test_data.R b/data-raw/test_data.R index 0a8a4331..18ab2b3e 100644 --- a/data-raw/test_data.R +++ b/data-raw/test_data.R @@ -58,7 +58,7 @@ fit <- wwinference::wwinference( iter_sampling = 25, iter_warmup = 25 ), - spec_inits = FALSE, + generate_initial_values = FALSE, compiled_model = model ) diff --git a/man/wwinference.Rd b/man/wwinference.Rd index 91474480..29db7b85 100644 --- a/man/wwinference.Rd +++ b/man/wwinference.Rd @@ -10,7 +10,7 @@ wwinference( count_data, model_spec = wwinference::get_model_spec(forecast_date = "2023-12-06"), mcmc_options = wwinference::get_mcmc_options(), - spec_inits = TRUE, + generate_initial_values = TRUE, compiled_model = wwinference::compile_model() ) } @@ -33,8 +33,8 @@ user based on the date they are producing a forecast} \item{mcmc_options}{The MCMC parameters as defined using \code{get_mcmc_options()}.} -\item{spec_inits}{Boolean indicating whether or not to specify the -initialization of the sampler, default is \code{TRUE}, meaning that the +\item{generate_initial_values}{Boolean indicating whether or not to specify +the initialization of the sampler, default is \code{TRUE}, meaning that the initialization lists will be passed to the \code{cmdstanr::sample()} function} From 5ba795c5a3e730e1cba56654b12484fde95d28ca Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 14:12:12 -0400 Subject: [PATCH 099/103] remove knit from imports --- DESCRIPTION | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index f384034a..8c944e1d 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -80,8 +80,7 @@ Imports: rlang, scales, ggplot2, - posterior, - knitr + posterior Remotes: stan-dev/cmdstanr VignetteBuilder: From ed7a5a72af60ea5314209e5ed9abc63b93305fe7 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 14:28:20 -0400 Subject: [PATCH 100/103] try adding knitr to suggests --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 8c944e1d..dd93d3a0 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -61,7 +61,8 @@ Roxygen: list(markdown = TRUE) RoxygenNote: 7.3.1 Suggests: testthat (>= 3.0.0), - bookdown + bookdown, + knitr Config/testthat/edition: 3 LazyData: true Imports: From a0aa4a4707db8a793c53e9f7bec871b6bffdec59 Mon Sep 17 00:00:00 2001 From: Kaitlyn Johnson <94390107+kaitejohnson@users.noreply.github.com> Date: Fri, 12 Jul 2024 14:47:17 -0400 Subject: [PATCH 101/103] Update R/wwinference.R Co-authored-by: Dylan H. Morris --- R/wwinference.R | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/R/wwinference.R b/R/wwinference.R index cb7f3609..2aac3ed2 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -28,8 +28,9 @@ #' @param mcmc_options The MCMC parameters as defined using #' `get_mcmc_options()`. #' @param generate_initial_values Boolean indicating whether or not to specify -#' the initialization of the sampler, default is `TRUE`, meaning that the -#' initialization lists will be passed to the `cmdstanr::sample()` +#' the initialization of the sampler, default is `TRUE`, meaning that +#' initialization lists will be generated and passed as the `init` argument +#' to the model object [`$sample()`][cmdstanr::model-method-sample] call. #' function #' @param compiled_model The pre-compiled model as defined using #' `compile_model()` From 1ed01bb55c05284fa34f3e01dde2a0929dec5956 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 14:47:50 -0400 Subject: [PATCH 102/103] update documentation --- man/wwinference.Rd | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/man/wwinference.Rd b/man/wwinference.Rd index 29db7b85..a8d4c47f 100644 --- a/man/wwinference.Rd +++ b/man/wwinference.Rd @@ -34,8 +34,9 @@ user based on the date they are producing a forecast} \code{get_mcmc_options()}.} \item{generate_initial_values}{Boolean indicating whether or not to specify -the initialization of the sampler, default is \code{TRUE}, meaning that the -initialization lists will be passed to the \code{cmdstanr::sample()} +the initialization of the sampler, default is \code{TRUE}, meaning that +initialization lists will be generated and passed as the \code{init} argument +to the model object \code{\link[cmdstanr:model-method-sample]{$sample()}} call. function} \item{compiled_model}{The pre-compiled model as defined using From 3310d1d18c1350b03401b813590edcf334ca15e2 Mon Sep 17 00:00:00 2001 From: kaitejohnson Date: Fri, 12 Jul 2024 14:52:58 -0400 Subject: [PATCH 103/103] fix precommit --- R/wwinference.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/wwinference.R b/R/wwinference.R index 2aac3ed2..8ef0116e 100644 --- a/R/wwinference.R +++ b/R/wwinference.R @@ -29,7 +29,7 @@ #' `get_mcmc_options()`. #' @param generate_initial_values Boolean indicating whether or not to specify #' the initialization of the sampler, default is `TRUE`, meaning that -#' initialization lists will be generated and passed as the `init` argument +#' initialization lists will be generated and passed as the `init` argument #' to the model object [`$sample()`][cmdstanr::model-method-sample] call. #' function #' @param compiled_model The pre-compiled model as defined using