Doctor Visits show an unexpected weekly trend #2044

Open
nolangormley opened this issue Aug 30, 2024 · 2 comments
Labels: data quality (Missing data, weird data, broken data)

@nolangormley (Contributor)

Actual Behavior:

The Doctor Visits signal shows a weekly pattern, similar to signals that have weekly reporting (where there is a spike at the beginning of the week).

[Image: docvisit]

Expected behavior

Roni and I looked through this yesterday and couldn't work out why this happens. Since this is a daily-reported signal, we expected it to be much smoother.

Context

Here's some code to replicate the plot above

import pandas as pd
import wget

# Download both signals as CSV; wget.download returns the name of the saved file.
docvisit = wget.download("https://api.covidcast.cmu.edu/epidata/covidcast/csv?signal=doctor-visits:smoothed_cli&start_day=2024-05-29&end_day=2024-08-29&geo_type=nation")
docvisitadj = wget.download("https://api.covidcast.cmu.edu/epidata/covidcast/csv?signal=doctor-visits:smoothed_adj_cli&start_day=2024-05-29&end_day=2024-08-29&geo_type=nation")

df = pd.read_csv(docvisit)
dfadj = pd.read_csv(docvisitadj)

# Parse dates and give the adjusted values their own column name.
df.time_value = pd.to_datetime(df.time_value, utc=True)
dfadj.time_value = pd.to_datetime(dfadj.time_value, utc=True)
dfadj = dfadj[['time_value', 'value']].rename(columns={'value': 'valueadj'})

# Join on date and plot the unadjusted and adjusted series together.
foo = df[['time_value', 'value']].merge(dfadj, on='time_value', how='left')
foo.plot(x='time_value', y=['value', 'valueadj'])
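
To put a number on the weekly pattern rather than judging it from the plot, one option is to divide each day by a centered 7-day mean and average that ratio by weekday. The sketch below is only an illustration on synthetic data: `weekday_profile` is a hypothetical helper, and the Monday spike in `demo` is made up, not taken from the API.

```python
import numpy as np
import pandas as pd


def weekday_profile(series: pd.Series) -> pd.Series:
    """Average ratio of each day to its centered 7-day mean, per weekday (0 = Monday)."""
    baseline = series.rolling(7, center=True).mean()  # removes the slow trend
    ratio = (series / baseline).dropna()              # first and last 3 days have no baseline
    return ratio.groupby(ratio.index.dayofweek).mean()


# Synthetic stand-in for the signal (assumption): a slow trend plus a Monday spike.
days = pd.date_range("2024-05-29", "2024-08-29", freq="D")
trend = np.linspace(1.0, 2.0, len(days))
spike = np.where(days.dayofweek == 0, 0.3, 0.0)
demo = pd.Series(trend + spike, index=days)

profile = weekday_profile(demo)
print(profile.round(3))  # values well above 1 mark weekdays that run high
```

With the real data, something like `weekday_profile(foo.set_index('time_value')['value'])` should work the same way. A signal with no weekly pattern would give a profile close to 1 on every weekday.
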
nolangormley added the "data quality" label (Missing data, weird data, broken data) on Aug 30, 2024
nolangormley self-assigned this on Aug 30, 2024
@RoniRos (Member) commented Sep 1, 2024

Back in July, Peter and I noticed this pattern in the "hospital admissions" signals in Texas. Dmitry investigated it and concluded that (1) it is already present in the raw signal we receive, and (2) in that signal it is present only in data from Texas, not from other states. The current signal (Doctor Visits) is from the same source. :-(

@dshemetov Did I remember your conclusions correctly? And did we ever file an issue about it? If so, we should link/consolidate them.
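
One way to check point (2) across every state at once, rather than a few by eye, is to score each location by the lag-7 autocorrelation of its day-over-day changes and sort the result. This is a rough sketch on synthetic data: `weekly_strength` and `rank_by_weekly_strength` are hypothetical helpers, the `geo_value` column name is an assumption about the CSV layout, and the "tx" spike is invented for the demo.

```python
import numpy as np
import pandas as pd


def weekly_strength(values: pd.Series) -> float:
    """Lag-7 autocorrelation of day-over-day differences for one location."""
    return values.diff().dropna().autocorr(lag=7)


def rank_by_weekly_strength(df: pd.DataFrame) -> pd.Series:
    """Score every geo_value and sort from strongest to weakest weekly pattern."""
    ordered = df.sort_values("time_value")
    scores = ordered.groupby("geo_value")["value"].apply(weekly_strength)
    return scores.sort_values(ascending=False)


# Synthetic example (assumption): "tx" gets a Monday spike, "pa" and "ca" only noise.
rng = np.random.default_rng(0)
days = pd.date_range("2024-05-29", "2024-08-29", freq="D")
frames = []
for geo in ["tx", "pa", "ca"]:
    value = 2.0 + rng.normal(0, 0.05, len(days))
    if geo == "tx":
        value = value + np.where(days.dayofweek == 0, 0.5, 0.0)
    frames.append(pd.DataFrame({"geo_value": geo, "time_value": days, "value": value}))
demo = pd.concat(frames, ignore_index=True)

scores = rank_by_weekly_strength(demo)
print(scores.round(2))
```

Scores near 1 point to a strong 7-day cycle and scores near 0 to none. Running this on a `geo_type=state` download would show whether Texas really stands alone.
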

@dshemetov (Contributor) commented Oct 2, 2024

Hi @RoniRos, sorry this slipped under my radar.

As to (1): I only guessed that it wasn't in the raw data in hospital-admissions. I looked at the data without the weekday-effect adjustment (unadjusted) and found that it still had this anomaly, but the unadjusted signal is still downstream of a "left Gaussian linear" smoother, which could have a bug. My personal hunch was that a smoother was unlikely to lead to this pattern, but you had the opposite intuition. I didn't try to look for pre-smoother raw data, as I have had very few spare cycles for this investigation.

As to (2): I didn't do a comprehensive comparison across all the states, only a handful or so, and Texas was the only one with this anomaly.
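
The smoother question in (1) can be explored on toy data. The sketch below reads "left Gaussian linear" as a one-sided, Gaussian-weighted local linear fit evaluated at the most recent day. That reading is an assumption, and `left_gaussian_linear` with its `bandwidth` parameter is a hypothetical stand-in, not the production smoother. It says nothing about a bug in the real implementation; it only shows how a correct smoother of this kind treats a weekly spike that is or is not in its input.

```python
import numpy as np


def left_gaussian_linear(values, bandwidth: float = 7.0) -> np.ndarray:
    """Fit a Gaussian-weighted line to the days up to t and evaluate it at t."""
    values = np.asarray(values, dtype=float)
    smoothed = np.empty_like(values)
    smoothed[0] = values[0]  # a line cannot be fitted to a single point
    for t in range(1, len(values)):
        offsets = np.arange(-t, 1, dtype=float)  # days relative to t, all <= 0
        weights = np.exp(-0.5 * (offsets / bandwidth) ** 2)
        design = np.column_stack([np.ones(t + 1), offsets])
        # Weighted least squares: scale rows by sqrt(weight), then ordinary lstsq.
        root_w = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(
            design * root_w[:, None], values[: t + 1] * root_w, rcond=None
        )
        smoothed[t] = coef[0]  # intercept = fitted value at offset 0
    return smoothed


# Toy inputs (assumption): a linear trend, with and without a spike every 7th day.
days = np.arange(93)
trend = np.linspace(1.0, 2.0, len(days))
spike = np.where(days % 7 == 0, 0.3, 0.0)

flat_residual = left_gaussian_linear(trend) - trend
spike_residual = left_gaussian_linear(trend + spike) - trend
spike_range = spike_residual[28:].max() - spike_residual[28:].min()

print("max |residual| without spike:", np.abs(flat_residual).max())
print("residual range with spike (after warm-up):", round(spike_range, 3))
```

In this toy setup the smoother reproduces the spike-free trend and passes an existing weekly spike through in attenuated form. That fits the hunch that a correctly working smoother does not create the pattern by itself, though only pre-smoother raw data can settle the question.
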
