diff --git a/content/about/publications/images/ranking.png b/content/about/publications/images/ranking.png new file mode 100644 index 000000000..f4f07da9c Binary files /dev/null and b/content/about/publications/images/ranking.png differ diff --git a/content/about/publications/index.md b/content/about/publications/index.md index 367223e6b..83be6b872 100644 --- a/content/about/publications/index.md +++ b/content/about/publications/index.md @@ -1,6 +1,12 @@ --- title: Research papers: + - title: "Outlier Ranking for Large-Scale Public Health Data" + image: ranking.png + authors: Joshi, Townes T., Gormley, Neureiter, Wilder, Rosenfeld + link: https://ojs.aaai.org/index.php/AAAI/article/view/30222 + year: 2024 + journal: Association for the Advancement of Artificial Intelligence - title: "Smooth Multi-Period Forecasting with Application to Prediction of COVID-19 Cases" image: smoothing-paper-teaser.jpg authors: Tuzhilina, Hastie, McDonald, Tay, Tibshirani diff --git a/content/blog/2024-01-01-flash-intro.Rmd b/content/blog/2024-01-01-flash-intro.Rmd index a101d3957..c18b4dc9f 100644 --- a/content/blog/2024-01-01-flash-intro.Rmd +++ b/content/blog/2024-01-01-flash-intro.Rmd @@ -9,6 +9,7 @@ authors: - nolan - richa - tina + - catalina heroImage: blog-lg-flash.jpeg heroImageThumb: blog-thumb-flash.jpeg summary: | @@ -29,13 +30,34 @@ These issues, if undetected, can have critical downstream ramifications for data ![Fig 1. Data quality changes in case counts, shown by the large spikes in March and July 2022, when cases were trending down, resulted in similar spikes for predicted counts (red) from multiple forecasts that were then sent to the US CDC. 
A weekly forecast per state, for cases, hospitalizations, and deaths, up to 4 weeks in the future means that modeling teams would have to review 600 forecasts per week and may not have been able to catch the upstream data issue.](/blog/2024-01-01-flash-intro/forecast.jpg) +We care about finding data issues like these so that we can alert downstream data users accordingly. That is why our goal in the FlaSH team (Flagging Anomalies in Streams related to public Health) is to quickly identify data points that warrant human inspection and create tools to support data review. Towards this goal, our team of researchers, engineers, and data reviewers iterate on our deployed interdisciplinary approach. We will cover the different methods and perspectives of the FlaSH project, starting with the visualization and user experience perspectives. -We care about finding data issues like these so that we can alert downstream data users accordingly. That is why our goal in the FlaSH team (Flagging Anomalies in Streams related to public Health) is to quickly identify data points that warrant human inspection and create tools to support data review. Towards this goal, our team of researchers, engineers, and data reviewers iterate on our deployed interdisciplinary approach. In this blog series, we will cover the different methods and perspectives of the FlaSH project. +## Visualization and User Experience +*Perspectives from our expert data reviewer, who has been working with this system for over a year -- Tina Townes.* -Members: Ananya Joshi, Nolan Gormley, Richa Gadgil, Tina Townes \ +
[![**Fig 2a.** Revised FlaSH Dashboard](/blog/2024-01-01-flash-intro/new_dash.png)](/blog/2024-01-01-flash-intro/new_dash.png)
+ +

In its initial stages, the FlaSH dashboard (Fig 2b) only let me assess potential anomalies by viewing graphs line by line, one for each location of the many signals with anomalies flagged by the FlaSH program. This was a particularly daunting task because daily FlaSH outputs produced, and continue to produce, a large number of reports in the form of collapsed lines that I had to click to expand and reveal more details. Without the new dashboard's features, I spent a significant amount of time scrolling through the daily list of anomaly reports and manually sorting what I wanted to review: I would expand only certain report lines and leave them expanded until I had finished selecting and was ready to review them. I also often made notes and documented interesting patterns in anomalies in a separate notepad, which slowed my review process. My attention was divided between parsing through the daily anomaly list for reports in certain geographies (which I knew I wanted to examine because of prior report patterns) and assessing new anomalies.

+ +
[![**Fig 2b.** Prior FlaSH Dashboard](/blog/2024-01-01-flash-intro/old_dash.png)](/blog/2024-01-01-flash-intro/old_dash.png)
+ + +

With the old dashboard, it was not easy for me to review the daily anomaly reports because I couldn't efficiently filter incoming anomalies when I needed to examine specific geographic areas or signals. For example, one week I saw a lot of anomaly reports for a county in Puerto Rico from Monday through Wednesday. By Thursday, I wanted to filter the daily anomaly reports to that county as soon as I logged into the platform, but the old dashboard had no way of filtering by geography. The updated dashboard now has a menu that lets me filter report lines not only by geographic region but also by indicator. This speeds up my daily review because I can quickly focus on specific geographies, finish reviewing them, and then move on to anomaly reports in other geographies. +

+ + +

Now, in its current iteration (Fig 2a), the FlaSH dashboard lets me easily filter daily anomaly results by various variables including geos and signal types, and also view a national map offering a quick glimpse of locations of high FlaSH scores. Furthermore, the updated FlaSH dashboard now enables me to take detailed notes on particularly interesting anomalies, trends and other issues of importance, and maintain these notes in an organized, searchable fashion within the platform.

+

Finally, now with the dashboard’s repositioned filtering menu, the page layout becomes an even more familiar environment. The menu echoes the user-friendly layouts of popular retail and informational sites, making navigation much more intuitive and smoother, thus allowing me to work through various options more quickly.

+

These new dashboard features allow me to devote more of my time and efforts to assessing anomalies of interest and focus on geographies with high concentrations of problematic data or noteworthy trends.

+ +## Additional Information +For more information, please check out our [demo video](https://www.youtube.com/watch?v=fWe6M-rTQQ0), open-source methods [(1)](https://github.com/cmu-delphi/covidcast-indicators/blob/main/_delphi_utils_python/delphi_utils/flash_eval/eval_day.py) [(2)](https://github.com/Ananya-Joshi/outshines_sparky), and publications [(1)](https://arxiv.org/abs/2306.16914) [(2)](https://ojs.aaai.org/index.php/AAAI/article/view/30222). + + +Members: Ananya Joshi, Nolan Gormley, Richa Gadgil, Tina Townes, Catalina Vajiac (part time) \ Former Members: Luke Neurieter, Katie Mazaitis \ Advisors: Peter Jhon, Roni Rosenfeld, Bryan Wilder - +**Revised July 12th 2024** diff --git a/content/blog/2024-01-01-flash-intro.html b/content/blog/2024-01-01-flash-intro.html index ffc7ac290..a2a7d478a 100644 --- a/content/blog/2024-01-01-flash-intro.html +++ b/content/blog/2024-01-01-flash-intro.html @@ -9,6 +9,7 @@ - nolan - richa - tina + - catalina heroImage: blog-lg-flash.jpeg heroImageThumb: blog-thumb-flash.jpeg summary: | @@ -19,19 +20,59 @@ --- +
+ +
-

Delphi publishes millions of public-health-related data points per day, including the total number of daily influenza cases, hospitalizations, and deaths per county and state in the United States (US). This data helps public health practitioners, data professionals, and members of the public make important, informed decisions relating to health and well-being.

+

Delphi publishes millions of public-health-related data points per day, such as the total number of daily COVID-19 cases, hospitalizations, and deaths per county and state in the United States (US). This data helps public health practitioners, data professionals, and members of the public make important, informed decisions relating to health and well-being.

Yet, as data volumes continue to grow quickly (Delphi’s data volume expanded 1000x in just 3 years), it is infeasible for data reviewers to inspect every one of these data points for subtle changes in

These issues, if undetected, can have critical downstream ramifications for data users (as shown by the example in Fig 1).

-
- -

Fig 1. Data quality changes in case counts, shown by the large spikes in March and July 2022, when cases were trending down, resulted in similar spikes for predicted counts (red) from multiple forecasts that were then sent to the US CDC. A weekly forecast per state, for cases, hospitalizations, and deaths, up to 4 weeks in the future means that modeling teams would have to review 600 forecasts per week and may not have been able to catch the upstream data issue.

+
+Fig 1. Data quality changes in case counts, shown by the large spikes in March and July 2022, when cases were trending down, resulted in similar spikes for predicted counts (red) from multiple forecasts that were then sent to the US CDC. A weekly forecast per state, for cases, hospitalizations, and deaths, up to 4 weeks in the future means that modeling teams would have to review 600 forecasts per week and may not have been able to catch the upstream data issue. +
Fig 1. Data quality changes in case counts, shown by the large spikes in March and July 2022, when cases were trending down, resulted in similar spikes for predicted counts (red) from multiple forecasts that were then sent to the US CDC. A weekly forecast per state, for cases, hospitalizations, and deaths, up to 4 weeks in the future means that modeling teams would have to review 600 forecasts per week and may not have been able to catch the upstream data issue.
+
+

We care about finding data issues like these so that we can alert downstream data users accordingly. That is why our goal in the FlaSH team (Flagging Anomalies in Streams related to public Health) is to quickly identify data points that warrant human inspection and create tools to support data review. Towards this goal, our team of researchers, engineers, and data reviewers iterate on our deployed interdisciplinary approach. We will cover the different methods and perspectives of the FlaSH project, starting with the visualization and user experience perspectives.

+
+

Visualization and User Experience

+

Perspectives from our expert data reviewer, who has been working with this system for over a year – Tina Townes.

+
+
+Fig 2a. Revised FlaSH Dashboard
-

We care about finding data issues like these so that we can alert downstream data users accordingly. That is why our goal in the FlaSH team (Flagging Anomalies in Streams related to public Health) is to quickly identify data points that warrant human inspection and create tools to support data review. Towards this goal, our team of researchers, engineers, and data reviewers iterate on our deployed interdisciplinary approach. In this blog series, we will cover the different methods and perspectives of the FlaSH project.

-

Members: Ananya Joshi, Nolan Gormley, Richa Gadgil, Tina Townes  

+
+

+In its initial stages, the FlaSH dashboard (Fig 2b) only let me assess potential anomalies by viewing graphs line by line, one for each location of the many signals with anomalies flagged by the FlaSH program. This was a particularly daunting task because daily FlaSH outputs produced, and continue to produce, a large number of reports in the form of collapsed lines that I had to click to expand and reveal more details. Without the new dashboard’s features, I spent a significant amount of time scrolling through the daily list of anomaly reports and manually sorting what I wanted to review: I would expand only certain report lines and leave them expanded until I had finished selecting and was ready to review them. I also often made notes and documented interesting patterns in anomalies in a separate notepad, which slowed my review process. My attention was divided between parsing through the daily anomaly list for reports in certain geographies (which I knew I wanted to examine because of prior report patterns) and assessing new anomalies. +

+
+
+Fig 2b. Prior FlaSH Dashboard +
+
+

+With the old dashboard, it was not easy for me to review the daily anomaly reports because I couldn’t efficiently filter incoming anomalies when I needed to examine specific geographic areas or signals. For example, one week I saw a lot of anomaly reports for a county in Puerto Rico from Monday through Wednesday. By Thursday, I wanted to filter the daily anomaly reports to that county as soon as I logged into the platform, but the old dashboard had no way of filtering by geography. The updated dashboard now has a menu that lets me filter report lines not only by geographic region but also by indicator. This speeds up my daily review because I can quickly focus on specific geographies, finish reviewing them, and then move on to anomaly reports in other geographies. +

+

+Now, in its current iteration (Fig 2a), the FlaSH dashboard lets me easily filter daily anomaly results by various variables including geos and signal types, and also view a national map offering a quick glimpse of locations of high FlaSH scores. Furthermore, the updated FlaSH dashboard now enables me to take detailed notes on particularly interesting anomalies, trends and other issues of importance, and maintain these notes in an organized, searchable fashion within the platform. +

+

+Finally, now with the dashboard’s repositioned filtering menu, the page layout becomes an even more familiar environment. The menu echoes the user-friendly layouts of popular retail and informational sites, making navigation much more intuitive and smoother, thus allowing me to work through various options more quickly. +

+

+These new dashboard features allow me to devote more of my time and efforts to assessing anomalies of interest and focus on geographies with high concentrations of problematic data or noteworthy trends. +

+
+
+

Additional Information

+

For more information, please check out our demo video, open-source methods (1) (2), and publications (1) (2).

+

Members: Ananya Joshi, Nolan Gormley, Richa Gadgil, Tina Townes, Catalina Vajiac (part time)  

Former Members: Luke Neurieter, Katie Mazaitis  

Advisors: Peter Jhon, Roni Rosenfeld, Bryan Wilder

+

Revised July 12th 2024

+
diff --git a/content/blog/2024-01-30-flash-framework.Rmd b/content/blog/2024-01-30-flash-framework.Rmd index b2664051e..6a10e21a8 100644 --- a/content/blog/2024-01-30-flash-framework.Rmd +++ b/content/blog/2024-01-30-flash-framework.Rmd @@ -13,7 +13,7 @@ output: toc: true acknowledgements: Thank you to George Haff, Carlyn Van Dyke, and Ron Lunde for editing this blog post. --- -Insights from public health data can keep communities safe. However, identifying these insights in large volumes of modern public health data can be laborious^[Rosen, George. A history of public health. JHU Press, 2015.]. As a result, over the past few decades, public health agencies have built monitoring systems, like [ESSENCE](https://www.cdc.gov/nssp/new-users.html) (CDC), [EIOS](https://www.who.int/initiatives/eios) (WHO), and [DHIS2](https://dhis2.org/) (WHO), where users can set custom statistical alerts and then investigate these alerts using data visualizations^[Chen, Hsinchun, Daniel Zeng, and Ping Yan. Infectious disease informatics: syndromic surveillance for public health and biodefense. Vol. 21. New York: Springer, 2010.]. These alerting systems largely follow the following formula^[Murphy, Sean Patrick, and Howard Burkom. "Recombinant temporal aberration detection algorithms for enhanced biosurveillance." Journal of the American Medical Informatics Association 15.1 (2008): 77-86.] as shown in Fig 1.: +Insights from public health data can keep communities safe. However, identifying these insights in large volumes of modern public health data can be laborious^[Rosen, George. A history of public health. JHU Press, 2015.]. As a result, over the past few decades, public health agencies have built monitoring systems, like [ESSENCE](https://www.cdc.gov/nssp/new-users.html) (CDC) and [DHIS2](https://dhis2.org/) (WHO), where users can set custom statistical alerts and then investigate these alerts using data visualizations^[Chen, Hsinchun, Daniel Zeng, and Ping Yan. 
Infectious disease informatics: syndromic surveillance for public health and biodefense. Vol. 21. New York: Springer, 2010.]. These alerting systems largely follow the following formula^[Murphy, Sean Patrick, and Howard Burkom. "Recombinant temporal aberration detection algorithms for enhanced biosurveillance." Journal of the American Medical Informatics Association 15.1 (2008): 77-86.] as shown in Fig 1.:
diff --git a/content/blog/2024-01-30-flash-framework.html b/content/blog/2024-01-30-flash-framework.html index 0b9254a00..0c6b93943 100644 --- a/content/blog/2024-01-30-flash-framework.html +++ b/content/blog/2024-01-30-flash-framework.html @@ -21,7 +21,7 @@
-

Insights from public health data can keep communities safe. However, identifying these insights in large volumes of modern public health data can be laborious1. As a result, over the past few decades, public health agencies have built monitoring systems, like ESSENCE (CDC), EIOS (WHO), and DHIS2 (WHO), where users can set custom statistical alerts and then investigate these alerts using data visualizations2. These alerting systems largely follow the following formula3 as shown in Fig 1.:

+

Insights from public health data can keep communities safe. However, identifying these insights in large volumes of modern public health data can be laborious1. As a result, over the past few decades, public health agencies have built monitoring systems, like ESSENCE (CDC) and DHIS2 (WHO), where users can set custom statistical alerts and then investigate these alerts using data visualizations2. These alerting systems largely follow the following formula3 as shown in Fig 1.:

Fig 1 Standard Approach for Alerting Systems diff --git a/content/people/headshots/catalina.jpeg b/content/people/headshots/catalina.jpeg new file mode 100644 index 000000000..e737c79ee Binary files /dev/null and b/content/people/headshots/catalina.jpeg differ diff --git a/content/people/index.md b/content/people/index.md index 761cbad33..4ff30dd1a 100644 --- a/content/people/index.md +++ b/content/people/index.md @@ -910,6 +910,13 @@ people: affiliation: CMU/MLD team: - core +- key: catalina + firstName: Catalina + lastName: Vajiac + image: catalina.jpeg + affiliation: CMU/CSD + team: + - contributors - firstName: Ana Karina lastName: Van Nortwick image: ana-karina-van-nortwick.jpeg diff --git a/static/blog/2024-01-01-flash-intro/new_dash.png b/static/blog/2024-01-01-flash-intro/new_dash.png new file mode 100644 index 000000000..29d13baba Binary files /dev/null and b/static/blog/2024-01-01-flash-intro/new_dash.png differ diff --git a/static/blog/2024-01-01-flash-intro/old_dash.png b/static/blog/2024-01-01-flash-intro/old_dash.png new file mode 100644 index 000000000..7452403b4 Binary files /dev/null and b/static/blog/2024-01-01-flash-intro/old_dash.png differ