Regularize data fetching methods #366

Open
sgsmob opened this issue Oct 26, 2020 · 3 comments
Labels
Engineering: Used to filter issues when synching with Asana
needs-coordination: This work should be assigned to a coordinator and split up into several subtasks

Comments

@sgsmob
Contributor

sgsmob commented Oct 26, 2020

The indicators fetch remote data in many different ways (email, AWS S3, cURL, SFTP, etc.), and each one does so at a different point in its code. It would be helpful to have a regularized interface for data fetching (say, an AbstractDataFetcher class from which EmailDataFetcher, etc. inherit) shared by all indicators, so that data fetching and any associated failures are handled consistently and clearly.
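To make the proposal concrete, here is a minimal sketch of what such an interface might look like. Only the name AbstractDataFetcher comes from the suggestion above; DataFetchError, the fetch/_fetch_bytes split, and LocalFileDataFetcher are hypothetical names chosen for illustration, and the local-file transport is just a stand-in so the sketch runs offline.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class DataFetchError(Exception):
    """Single failure type raised by every fetcher (hypothetical name)."""


class AbstractDataFetcher(ABC):
    """Common interface that each indicator's fetcher would implement."""

    def fetch(self, destination: Path) -> Path:
        """Write the source data to `destination` and return that path.

        Transport-specific exceptions are wrapped in DataFetchError so that
        callers handle failures the same way for every indicator.
        """
        try:
            payload = self._fetch_bytes()
        except Exception as exc:
            raise DataFetchError(f"{type(self).__name__} failed: {exc}") from exc
        destination.parent.mkdir(parents=True, exist_ok=True)
        destination.write_bytes(payload)
        return destination

    @abstractmethod
    def _fetch_bytes(self) -> bytes:
        """Return the raw bytes of the remote data (transport-specific)."""


class LocalFileDataFetcher(AbstractDataFetcher):
    """Stand-in transport that reads a local file; not one of the real sources."""

    def __init__(self, source: Path):
        self.source = source

    def _fetch_bytes(self) -> bytes:
        return self.source.read_bytes()
```

An EmailDataFetcher or S3DataFetcher would override only _fetch_bytes, so the error handling in fetch lives in one place.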

@krivard added the needs-coordination label Oct 26, 2020
@chinandrew
Contributor

chinandrew commented Nov 11, 2020

Was curious and did an inventory of data fetching methods for reference.

  • cdc_covidnet: makes API call with requests
  • changehc: FTP download with paramiko
  • claims_hosp: not 100% sure, but there are pandas calls to a filepath variable (EDIT: FTP download with paramiko)
  • combo_cases_and_deaths: uses covidcast.signal
  • google_symptoms: passes download URL to pandas
  • jhu: passes download URL to pandas
  • nchs_mortality: uses sodapy to access Socrata API
  • quidel and quidel_covidtest: accesses an email address with imap_tools
  • safegraph: downloads from S3 by running an aws s3 sync command through subprocess
  • usafacts: passes download URL to pandas
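
The inventory above can be grouped by fetch method to see roughly how many distinct fetcher implementations a shared interface would need. This is only a sketch: the method strings are my own shorthand for the entries in the list, and group_by_method is a hypothetical helper.

```python
from collections import defaultdict

# Indicator -> fetch method, transcribed from the inventory above.
INVENTORY = {
    "cdc_covidnet": "requests API call",
    "changehc": "FTP via paramiko",
    "claims_hosp": "FTP via paramiko",
    "combo_cases_and_deaths": "covidcast.signal",
    "google_symptoms": "URL passed to pandas",
    "jhu": "URL passed to pandas",
    "nchs_mortality": "Socrata API via sodapy",
    "quidel": "email via imap_tools",
    "quidel_covidtest": "email via imap_tools",
    "safegraph": "aws s3 sync via subprocess",
    "usafacts": "URL passed to pandas",
}


def group_by_method(inventory):
    """Return a dict mapping each fetch method to its sorted indicator names."""
    groups = defaultdict(list)
    for indicator, method in inventory.items():
        groups[method].append(indicator)
    return {method: sorted(names) for method, names in groups.items()}


if __name__ == "__main__":
    for method, names in sorted(group_by_method(INVENTORY).items()):
        print(f"{method}: {', '.join(names)}")
```

Grouped this way, the eleven indicators listed reduce to seven fetch methods, with the URL-to-pandas group being the largest.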

@krivard
Contributor

krivard commented Nov 11, 2020

@mariajahja do you have details on how claims_hosp fetches source data? Is it a separate process that pulls from a magic email address? It might be good to normalize that with how quidel does it.

@mariajahja
Member

@krivard It’s a separate Python script (using paramiko) that pulls from a private Delphi ftp server. HSP deposits the files there directly, and a bigchunk machine downloads them locally, runs the sensor update, then deletes the copy.
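
The download, update, delete sequence described above could be sketched roughly as follows. This is not the actual script: run_with_temporary_copy and its parameters are hypothetical, and the transport is passed in as a callable so the sketch runs without a server. In the real setup the download callable would presumably wrap a paramiko SFTP transfer.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_temporary_copy(
    download: Callable[[Path], None],
    update: Callable[[Path], T],
    filename: str = "source_data",
) -> T:
    """Download to a scratch directory, run the update, then delete the copy.

    `download` writes the remote file to the given local path (for example by
    calling paramiko's SFTPClient.get(remotepath, localpath)); `update` is the
    sensor update that consumes the local file.
    """
    with tempfile.TemporaryDirectory() as workdir:
        local_copy = Path(workdir) / filename
        download(local_copy)
        # The directory and the local copy are removed when this block exits,
        # even if the update raises.
        return update(local_copy)
```

Using a temporary directory means the cleanup step cannot be skipped when the update fails partway through.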

If it’s helpful, back when HSP used email, @korlaxxalrok created a script to pull that data.

@SumitDELPHI added the Engineering label Dec 6, 2020