Adding YML processing #16

Merged 2 commits on Oct 30, 2024
87 changes: 43 additions & 44 deletions README.md

Large diffs are not rendered by default.

32 changes: 18 additions & 14 deletions README.qmd
@@ -10,25 +10,29 @@ This repository is used for coordinating all the software development activities
 ```{r}
 #| label: load-data
 #| echo: false
-library(readxl)
+library(yaml)
 library(data.table)
 
-existing_software <- read_excel(
-  path = "data/box/ForeSITE_Tools surveyV2_ Tool D&I only.xlsx",
-  skip = 2
-) |> as.data.table()
+existing_software <- list.files(
+  "data", pattern = "ya?ml$", full.names = TRUE
+  ) |>
+  lapply(FUN = yaml.load_file, as.named.list = TRUE) |>
+  lapply(as.data.table) |>
+  rbindlist()
+
+writexl::write_xlsx(x = existing_software, path="data/software.xlsx")
 
 # Renaming relevant columns
 cnames <- c(
-  Tool = "Tool Name",
-  Description = "Brief Description",
-  Languages = "Languages",
-  Repo = "GitHub Repo (new or old if existing one)",
-  Contact = "Name of developer, maintainer, or key contact",
-  Email = "Email of developer, maintainer or key contact",
-  Links = "Link to web page/documentation (optional)",
-  Type="Type of tool",
-  Diseases="Relevant disease(s)"
+  Tool = "tool_name",
+  Description = "brief_description",
+  Languages = "languages",
+  Repo = "github_repo_new_or_old_if_existing_one",
+  Contact = "name_of_developer_maintainer_or_key_contact",
+  Email = "email_of_developer_maintainer_or_key_contact",
+  Links = "link_to_web_page_documentation_optional",
+  Type="type_of_tool",
+  Diseases="relevant_disease_s"
 )
 
 setnames(existing_software, cnames, names(cnames))
Binary file modified README_files/figure-commonmark/wordcloud-1.png
62 changes: 62 additions & 0 deletions data/README.md
@@ -0,0 +1,62 @@
# List of existing software

This folder holds information about existing software developed by the ForeSITE group. To add new software, create a new YAML file (template below) in this folder. Rendering the [../README.qmd](../README.qmd) Quarto file automatically updates the list of software in the README.md at the root of the repository and regenerates the `software.xlsx` file in this folder.

## YAML Template
Fill in the template fields with your project's information.
```yaml
tool_name:
brief_description:
name_of_developer_maintainer_or_key_contact:
email_of_developer_maintainer_or_key_contact:
is_it_actively_maintained_yes_no:
relevant_disease_s:
maturity:
license:
languages:
audience_type:
required_expertise_to_use_tool:
type_of_tool:
type_of_data_input_needed:
link_to_web_page_documentation_optional:
link_to_source_code_optional:
reviewer:
github_repo_new_or_old_if_existing_one:
complete_yes_no:
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes:
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project:
```
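
Before committing a new entry, it can help to confirm that it defines every field in the template. The sketch below is one way to do that in R with the `yaml` package; the helper `missing_fields()` and the temporary file are hypothetical and not part of this repository.

```r
library(yaml)

# Field names copied from the template above
template_fields <- c(
  "tool_name",
  "brief_description",
  "name_of_developer_maintainer_or_key_contact",
  "email_of_developer_maintainer_or_key_contact",
  "is_it_actively_maintained_yes_no",
  "relevant_disease_s",
  "maturity",
  "license",
  "languages",
  "audience_type",
  "required_expertise_to_use_tool",
  "type_of_tool",
  "type_of_data_input_needed",
  "link_to_web_page_documentation_optional",
  "link_to_source_code_optional",
  "reviewer",
  "github_repo_new_or_old_if_existing_one",
  "complete_yes_no",
  "pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes",
  "overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project"
)

# Hypothetical helper (not in the repository): returns the template
# fields that a YAML entry does not define
missing_fields <- function(path) {
  entry <- yaml.load_file(path)
  setdiff(template_fields, names(entry))
}

# Deliberately incomplete entry, written to a temporary file for illustration
tmp <- tempfile(fileext = ".yml")
writeLines(c("tool_name: My tool", "languages: R"), tmp)

absent <- missing_fields(tmp)
print(absent)
```

An empty result means the entry defines every template field; it does not check that the values themselves are sensible.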

## Example Project YAML File
```yaml
tool_name: 'epiworld: Fast Agent-Based Epi Models'
brief_description: A flexible framework for Agent-Based Models (ABM), the 'epiworldR'
package provides methods for prototyping disease outbreaks and transmission models
using a 'C++' backend, making it very fast. It supports multiple epidemiological
models, including the Susceptible-Infected-Susceptible (SIS), Susceptible-Infected-Removed
(SIR), Susceptible-Exposed-Infected-Removed (SEIR), and others, involving arbitrary
mitigation policies and multiple-disease models. Users can specify infectiousness/susceptibility
rates as a function of agents' features, providing great complexity for the model
dynamics. Furthermore, 'epiworldR' is ideal for simulation studies featuring large
populations.
name_of_developer_maintainer_or_key_contact: George G. Vega Yon
email_of_developer_maintainer_or_key_contact: [email protected]
is_it_actively_maintained_yes_no: 'Yes'
relevant_disease_s: .na.character
maturity: Published
license: MIT
languages: R, C++, Python, WebAssembly
audience_type: Modelers
required_expertise_to_use_tool: TBD
type_of_tool: Epidemic Model - Scenario Modeling
type_of_data_input_needed: Parameter inputs for simulating the model
link_to_web_page_documentation_optional: https://github.com/UofUEpiBio/epiworld, https://github.com/UofUEpiBio/epiworldR/,
https://github.com/UofUEpiBio/epiworldpy, https://github.com/UofUEpiBio/epiworldRShiny
link_to_source_code_optional: .na.character
reviewer: .na.character
github_repo_new_or_old_if_existing_one: https://github.com/UofUEpiBio/epiworld
complete_yes_no: 'yes'
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: .na.character
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: .na.character
```
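
As a rough illustration of how entries like the one above end up in a single table, the following sketch mirrors the `load-data` chunk of README.qmd on two made-up entries. The file names and values are placeholders, and `fill = TRUE` is an assumption added here so that entries with different fields still stack; the chunk in this PR calls `rbindlist()` without it.

```r
library(yaml)
library(data.table)

# Two stand-in entries in a temporary folder (illustrative values only,
# not real repository data)
tmp_dir <- tempfile("entries")
dir.create(tmp_dir)
writeLines(
  c("tool_name: Tool A", "languages: R", "license: MIT"),
  file.path(tmp_dir, "tool_a.yml")
)
writeLines(
  c("tool_name: Tool B", "languages: Python", "license: TBD"),
  file.path(tmp_dir, "tool_b.yaml")
)

# Same steps as the load-data chunk: list the files, parse each one,
# turn each parsed list into a one-row table, then stack the rows
entries <- list.files(tmp_dir, pattern = "ya?ml$", full.names = TRUE) |>
  lapply(FUN = yaml.load_file, as.named.list = TRUE) |>
  lapply(as.data.table) |>
  rbindlist(fill = TRUE)  # assumption: fill = TRUE is not in the PR

print(entries)
```

Each YAML file contributes one row, and the pattern `ya?ml$` picks up both the `.yml` and `.yaml` extensions.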

31 changes: 31 additions & 0 deletions data/airborne_release_of_infectious_pathogens_simulator.yml
@@ -0,0 +1,31 @@
tool_name: Airborne release of infectious pathogens simulator
brief_description: 'Estimate airborne dispersal, human exposure, and infection probabilities
and timelines after a release of a quantity of infectious organisms. Scenario(s)
Modeled: Airborne release and human inhalational exposure and infection, similar
to the Sverdlovsk anthrax leak of 1979.'
name_of_developer_maintainer_or_key_contact: Damon Toth
email_of_developer_maintainer_or_key_contact: [email protected]
is_it_actively_maintained_yes_no: 'No'
relevant_disease_s: Any pathogen with supportable assumptions; we have applied our
tools to Anthrax
maturity: R code is organized and documented but not publicly available. Could be
made publicly available or packaged for use with moderate effort.
license: TBD
languages: R
audience_type: TBD
required_expertise_to_use_tool: TBD
type_of_tool: Epidemic Model - Scenario Modeling
type_of_data_input_needed: Exposure localized to release area.
link_to_web_page_documentation_optional: 1-Toth D, Gundlapalli A, Schell W, Bulmahn
K, Walton T, Woods C, Coghill C, Gallegos F, Samore M, Adler F (2013). Quantitative
models of the dose-response and time course of inhalational anthrax in humans. PLoS
Pathog, 9(8), e1003555. https://doi.org/10.1371/journal.ppat.1003555. 2-Bulmahn
K, Canella M, Coghill C, Gallegos F, Gundlapalli A, Schell W, Toth D, Walton T,
Woods C (2012). Final Supplementary Risk Assessment for the Boston University National
Emerging Infectious Diseases Laboratories, National Institutes of Health. https://www.bu.edu/neidl/files/2013/01/SFEIR-Volume-III.pdf.
link_to_source_code_optional: .na.character
reviewer: George
github_repo_new_or_old_if_existing_one: .na.character
complete_yes_no: .na.character
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: Asked for the code
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: .na.character
23 changes: 23 additions & 0 deletions data/arima_generalize_arima_vector_autoregression.yml
@@ -0,0 +1,23 @@
tool_name: ARIMA; Generalize ARIMA; Vector Autoregression
brief_description: To forecast the weekly positive test number
name_of_developer_maintainer_or_key_contact: Yue Zhang
email_of_developer_maintainer_or_key_contact: [email protected]
is_it_actively_maintained_yes_no: N/A
relevant_disease_s: RSV, influenza, COVID-19
maturity: Research or Development phase
license: TBD
languages: Python
audience_type: TBD
required_expertise_to_use_tool: TBD
type_of_tool: Epidemic Model - Forecasting
type_of_data_input_needed: 'The objective of this study is to use multiple time series
data to predict weekly infection counts for multiple viruses. Timeframe: 12/2002-01/2024'
link_to_web_page_documentation_optional: No publication plan
link_to_source_code_optional: .na.character
reviewer: Andrew
github_repo_new_or_old_if_existing_one: .na.character
complete_yes_no: .na.character
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: Nothing to do but publish
analysis files. Yue will publish those
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: Unsure
- Needs group thinking
@@ -0,0 +1,29 @@
tool_name: Attention-Based Models for Snow-Water Equivalent Prediction
brief_description: 'Transformer architectures for spatio-temporal prediction (or synthetic
data generation/imputation). Scenario(s) Modeled: Predicting the SWE value for
multiple SNOTEL locations in the Western US using the Attention Models'
name_of_developer_maintainer_or_key_contact: Ananth Kalyanaraman
email_of_developer_maintainer_or_key_contact: [email protected]
is_it_actively_maintained_yes_no: 'Yes'
relevant_disease_s: Synthetic data generation/imputation
maturity: 'Software page: https://github.com/Krishuthapa/SWE-Attention (currently tested
for SWE prediction application)'
license: TBD
languages: Python
audience_type: TBD
required_expertise_to_use_tool: TBD
type_of_tool: Epidemic Model - Scenario Modeling
type_of_data_input_needed: SWE values vary spatiotemporally—affected by weather, topography,
and other environmental factors.
link_to_web_page_documentation_optional: https://ojs.aaai.org/index.php/AAAI/article/view/30337
link_to_source_code_optional: .na.character
reviewer: George
github_repo_new_or_old_if_existing_one: https://github.com/Krishuthapa/SWE-Attention
complete_yes_no: 'yes'
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: "Potentially a python
module. There are functions and classes that are well-defined. Maybe exporting the
classes should be enough.\r\n\r\nBut it also has hardcoded stuff in the class def
files."
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: Needs
some work - Functions/classes are mixed with the code. Need to separate them and
add them to namespace/__init__.py file.
33 changes: 33 additions & 0 deletions data/autoencoder_and_clustering_based_anomaly_detection.yml
@@ -0,0 +1,33 @@
tool_name: Autoencoder and clustering based anomaly detection
brief_description: Initially developed as a project with DHS Science and Technology,
this project took place from 2018 to 2019 (pre-COVID). The approach here is to
find potentially anomalous cases which are also related. To create anomaly “scores”,
a neural network autoencoder is used to process over patient data from the emergency
department visit and quantify how “rare” this kind of visit might be compared to
all other previous visits. Then a density-based clustering is used to identify
any potential clusters of anomalous cases since one single case may not warrant
concern, but a group of patients with similar labs, signs/symptoms, etc might suggest
common exposures and conditions.
name_of_developer_maintainer_or_key_contact: Kelly Peterson
email_of_developer_maintainer_or_key_contact: [email protected] ;[email protected]
is_it_actively_maintained_yes_no: N/A
relevant_disease_s: NA
maturity: Still early in its maturity. Early investigations show that this currently
emits too many potential “anomaly clusters” to be useful, so sensitivity needs to
be reduced before being used in an operational capacity
license: TBD
languages: Python
audience_type: TBD
required_expertise_to_use_tool: TBD
type_of_tool: Decision Support tool
type_of_data_input_needed: Uses VA CDW data including Emergency Department visits,
associated ICD codes, health factors, labs, orders, medications, and procedures.
Python technology stacks. Autoencoder models trained with PyTorch, and clustering
is performed with HDBScan.
link_to_web_page_documentation_optional: Nothing published.
link_to_source_code_optional: .na.character
reviewer: Andrew
github_repo_new_or_old_if_existing_one: .na.character
complete_yes_no: .na.character
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: .na.character
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: .na.character
52 changes: 52 additions & 0 deletions data/bayesian_transmission_model.yml
@@ -0,0 +1,52 @@
tool_name: Bayesian Transmission Model
brief_description: 'Provides estimates for critical epidemiological parameters that
characterize the spread of bacterial pathogens in healthcare settings. Parameter
estimated: Transmission rate (frequency-dependent or density-dependent mass action),
importation probability, clearance rate (loss of colonization per colonized person
per unit time), surveillance test sensitivity, surveillance test specificity, effect
of covariate on transmission (multiplier in relation to overall transmission rate).'
name_of_developer_maintainer_or_key_contact: Karim Khader
email_of_developer_maintainer_or_key_contact: [email protected]
is_it_actively_maintained_yes_no: 'No'
relevant_disease_s: Bacterial pathogens and other pathogens that result in both symptomatic
and asymptomatic disease states.
maturity: C++ code can be compiled and run; customization may be required for specific
uses and to specify the underlying model/parameters of interest.
license: TBD
languages: C++
audience_type: TBD
required_expertise_to_use_tool: TBD
type_of_tool: Parameter estimation 
type_of_data_input_needed: Developed for use in a healthcare setting, accounts for
‘flow’ of patients, can allow for multiple sub-units (multiple hospitals, multiple
wards within each hospital). Uses disease (random or uniform) surveillance data;
that is, testing data that is not targeted to specific populations more or less likely
to be infected.
link_to_web_page_documentation_optional: "1.Khader K, Thomas A, Stevens V, Visnovsky
L, Nevers M, Toth D, Keegan LT, Jones M, Rubin M, Samore MH (2021). Association
Between Contact Precautions And Transmission of Methicillin-Resistant Staphylococcus
Aureus in Veterans Affairs Hospitals. JAMA Netw Open.\r\n2.Khader K, Munoz-Price
LS, Hanson R, Stevens V, Keegan LT, Thomas A, Pezzin LE, Nattinger A, Singh S, Samore
MH (2021). Transmission Dynamics of Clostridioides difficile in 2 High-Acuity Hospital
Units. Clin Infect Dis.\r\n3.Khader K, Thomas A, Huskins WC, Stevens V, Keegan LT,
Visnovsky L, Samore MH (2021). Effectiveness of Contact Precautions to Prevent Transmission
of Methicillin-Resistant Staphylococcus aureus and Vancomycin-Resistant Enterococci
in Intensive Care Units. Clin Infect Dis.\r\n4.Khader K, Thomas A, Jones M, Toth
D, Stevens V, Samore MH (2019). Variation and trends in transmission dynamics of
Methicillin-resistant Staphylococcus aureus in veterans affairs hospitals and nursing
homes. Epidemics.\r\n5.Thomas A, Khader K, Redd A, Leecaster M, Zhang Y, Jones M,
Greene T, Samore M (2018). Extended models for nosocomial infection: parameter estimation
and model selection. Math Med Biol, 35(suppl_1), 29-49.\r\n6.Khader K, Thomas A,
Huskins WC, Leecaster M, Zhang Y, Greene T, Redd A, Samore MH (2017). A dynamic
transmission model to evaluate the effectiveness of infection control strategies.
Open Forum Infect Dis.\r\n7.Thomas A, Redd A, Khader K, Leecaster M, Greene T, Samore
M (2015). Efficient parameter estimation for models of healthcare-associated pathogen
transmission in discrete and continuous time. Math Med Biol, 32(1), 79-98."
link_to_source_code_optional: .na.character
reviewer: George
github_repo_new_or_old_if_existing_one: https://github.com/EpiForeSITE/bayesian-transmission
complete_yes_no: 'yes'
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: A large C++ program.
Probably easy to write a cmd line wrapper within R. We could also use Rcpp11.
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: Leave
for later - Too complex to address | No functional programming whatsoever
38 changes: 38 additions & 0 deletions data/branching_process_outbreak_simulator.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
tool_name: Branching process outbreak simulator
brief_description: 'Quantifies risk posed by individual importers of a novel transmissible
pathogen to a generic population, with intervention effects. Scenario(s) Modeled:
Novel introduction of transmissible pathogen by infected traveler, by accidentally
infected laboratory worker, or similar scenario; intervention scenarios for improved
detection of initial case and for delayed mitigation after ongoing outbreak is detected.'
name_of_developer_maintainer_or_key_contact: Damon Toth
email_of_developer_maintainer_or_key_contact: [email protected]
is_it_actively_maintained_yes_no: 'No'
relevant_disease_s: Any emerging transmissible pathogen; we have applied the tool
to Ebola and Middle East respiratory syndrome (MERS).
maturity: R code used for publication results is reasonably organized and documented
but not publicly available. Key functions have been shared and used by others; could
be made publicly available or packaged for use with reasonable effort.
license: TBD
languages: R
audience_type: TBD
required_expertise_to_use_tool: TBD
type_of_tool: Epidemic Model - Scenario Modeling
type_of_data_input_needed: Generic population model applicable to local community
experiencing the initial importation / infection.
link_to_web_page_documentation_optional: "1-Toth D, Gundlapalli A, Khader K, Pettey
W, Rubin M, Adler F, Samore M (2015). Estimates of outbreak risk from new introductions
of Ebola with immediate and delayed transmission control. Emerg Infect Dis, 21(8),
1402-1408. https://doi.org/10.3201/eid2108.150170.\r\n2-Toth D, Tanner W, Khader
K, Gundlapalli A (2016). Estimates of the risk of large or long-lasting outbreaks
of Middle East respiratory syndrome after importations outside the Arabian Peninsula.
Epidemics, 16, 27-32. https://doi.org/10.1016/j.epidem.2016.04.002"
link_to_source_code_optional: .na.character
reviewer: George
github_repo_new_or_old_if_existing_one: https://github.com/EpiForeSITE/branching_process/
complete_yes_no: 'yes'
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: All code is functions
only. It could be bundled into an R pkg very easily. We only need to have an example
so it is more complete.
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: Easy
win - Functions are already in a separate file, just need to be added to a namespace/__init__.py
file.
31 changes: 31 additions & 0 deletions data/carriage_duration_estimation_from_serial_testing_data.yml
@@ -0,0 +1,31 @@
tool_name: Carriage duration estimation from serial testing data
brief_description: 'Estimate the duration and heterogeneity of individuals’ colonization
episodes for organisms of interest. Parameter estimated: Average and distribution
of clearance rate(s) across multiple candidate model forms, average (re)acquisition
rate, sensitivity/specificity of testing. Estimates derived via maximum likelihood
techniques.'
name_of_developer_maintainer_or_key_contact: Damon Toth
email_of_developer_maintainer_or_key_contact: [email protected]
is_it_actively_maintained_yes_no: TBD
relevant_disease_s: Any pathogen with appropriate serialized test data; we have applied
our tools to S. aureus.
maturity: R code used for publication results is publicly available on GitHub [https://github.com/alexbeams/StaphCarrierTypes]
with documentation. Could be packaged for wider use.
license: TBD
languages: R
audience_type: TBD
required_expertise_to_use_tool: TBD
type_of_tool: Parameter estimation 
type_of_data_input_needed: Appropriate for application to data sets from repeated
testing of the same individuals over long time periods relative to typical carriage
duration. Useful for understanding dynamics of background carriage in a wide population,
important to understand for evaluating intervention effectiveness.
link_to_web_page_documentation_optional: Beams A, Keegan L, Adler F, Samore M, Khader
K, Toth D (2023), Are Staphylococcus aureus Carrier Types Evidence of Population
Heterogeneity? American Journal of Epidemiology 192(3), 455–466. https://doi.org/10.1093/aje/kwac201.
link_to_source_code_optional: .na.character
reviewer: Andrew
github_repo_new_or_old_if_existing_one: .na.character
complete_yes_no: .na.character
pkg_dev_assessment_how_hard_is_to_make_into_a_package_notes: .na.character
overall_assessment_easy_win_needs_some_work_needs_lots_of_work_long_term_project: .na.character