Based on hand-scraped data from government reports:
- Argentina: https://docs.google.com/spreadsheets/d/e/2PACX-1vTfinng5SDBH9RSJMHJk28dUlW3VVSuvqaBSGzU-fYRTVLCzOkw1MnY17L2tWsSOppHB96fr21Ykbyv/pub#
- Santa Fe: https://docs.google.com/spreadsheets/d/19aa5sqdsj3nYmBqllPXvgj72cvx63SzB2Hx8B02vMwU
Santa Fe reports are cumulative. National reports show new daily cases.
We decided to work with cumulative time series. There is a design reason behind that: with cumulative confirmed cases we don't have to read every entry, only the entries at the frequency we are interested in (for a weekly analysis, one entry per week).
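To illustrate the point about cumulative series, here is a minimal sketch in plain Python. The function names and the daily counts are made up for illustration and are not part of this repository. It converts daily new cases into a cumulative series, then recovers weekly increments by reading only every seventh entry.

```python
from itertools import accumulate

def to_cumulative(daily_new):
    """Turn a list of daily new cases into a cumulative series."""
    return list(accumulate(daily_new))

def increments(cumulative, step):
    """New cases per period, reading only every `step`-th cumulative entry."""
    sampled = cumulative[step - 1::step]
    previous = [0] + sampled[:-1]
    return [cur - prev for cur, prev in zip(sampled, previous)]

# Hypothetical daily counts for two weeks (illustrative values only).
daily = [1, 0, 2, 3, 1, 0, 4, 2, 2, 5, 1, 0, 3, 6]

cumulative = to_cumulative(daily)
weekly = increments(cumulative, step=7)

print(cumulative[-1])  # total cases so far
print(weekly)          # new cases in week 1 and week 2
```

With a daily series, the same weekly figures would require summing all seven entries of each week.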
'Sospechosos' (suspected cases) can decrease over time, because a suspected case can move to 'Confirmados' (confirmed) or 'Descartados' (discarded).
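A quick sanity check follows from this: cumulative confirmed cases should never go down, while 'Sospechosos' may. The sketch below assumes nothing about the API; `is_non_decreasing` is a hypothetical helper, the confirmed totals are the first ten Santa Fe values from the AllData table further down in this document, and the 'Sospechosos' values are invented for illustration.

```python
def is_non_decreasing(series):
    """Return True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(series, series[1:]))

# Santa Fe cumulative confirmed totals, 14/3/2020 to 23/3/2020 (from the AllData table).
confirmados = [1, 1, 1, 1, 1, 2, 2, 4, 4, 15]

# Hypothetical 'Sospechosos' counts: they can drop when cases are reclassified.
sospechosos = [3, 5, 8, 6, 9, 7, 10, 12, 9, 11]

print(is_non_decreasing(confirmados))
print(is_non_decreasing(sospechosos))
```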
For non-Python users, CSV files are generated periodically so they can be parsed and used directly (see the ./csv/ folder). All of them contain cumulative time series.
- For Santa Fe:
csv/SantaFe_AllData.csv
- For Argentina:
csv/Argentina_Provinces.csv
Check the last update time in csv/last_update.txt
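The CSV layout is not documented here, so the following is only a sketch under a stated assumption: that csv/Argentina_Provinces.csv mirrors the df_provinces table shown later (a TYPE column, a PROVINCIA column, then one column per date). An in-memory sample stands in for the real file, and `read_series` is a hypothetical helper.

```python
import csv
import io

# Stand-in for csv/Argentina_Provinces.csv. The column layout is an ASSUMPTION
# modelled on the df_provinces table; check the real file before relying on it.
sample = io.StringIO(
    "TYPE,PROVINCIA,03/03,04/03,05/03,06/03\n"
    "ACTIVOS,BUENOS AIRES,0,0,1,2\n"
    "ACTIVOS,CABA,1,1,1,5\n"
)

def read_series(handle):
    """Map (type, province) to a {date: value} dictionary."""
    series = {}
    for row in csv.DictReader(handle):
        key = (row.pop("TYPE"), row.pop("PROVINCIA"))
        series[key] = {date: int(value) for date, value in row.items()}
    return series

data = read_series(sample)
print(data[("ACTIVOS", "CABA")]["06/03"])
```

To read the real file, replace the sample with `open("csv/Argentina_Provinces.csv", newline="")` and adjust the column names if they differ.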
Python API for working with reported COVID data for Argentina.
DataTypes exported:
- COVIDStats namedtuple
- ArgentinaAPI class
API methods:
- api.get_stats(date)
API public properties:
- api.df_provinces : pandas.DataFrame
- api.provinces : List[str]
from argentina_api import *
print('COVIDStats namedtuple:', COVIDStats._fields)
COVIDStats namedtuple: ('date', 'place_name', 'confirmados', 'muertos', 'recuperados', 'activos')
When loading the data, the API reports any city that has no entry in the 'Info' sheet.
api = ArgentinaAPI('./')
Downloading Argentinian provinces table from google drive (https://docs.google.com/spreadsheets/d/e/2PACX-1vTfinng5SDBH9RSJMHJk28dUlW3VVSuvqaBSGzU-fYRTVLCzOkw1MnY17L2tWsSOppHB96fr21Ykbyv/pub#)
Dates must be given in DD/MM format (for example '26/03').
api.get_stats('26/03')[:3]
[COVIDStats(date='26/03', place_name='BUENOS AIRES', confirmados=158, muertos=4, recuperados=15, activos=139),
COVIDStats(date='26/03', place_name='CABA', confirmados=197, muertos=4, recuperados=53, activos=140),
COVIDStats(date='26/03', place_name='CATAMARCA', confirmados=0, muertos=0, recuperados=0, activos=0)]
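Because each result is a namedtuple, fields can be read by name. The sketch below uses a local stand-in (`Stats`, a hypothetical name) with the same fields and the three rows printed above, so it runs without the API. The relation `activos = confirmados - muertos - recuperados` is an assumption inferred from this sample output, not something the source states.

```python
from collections import namedtuple

# Local stand-in with the same fields as COVIDStats (hypothetical name).
Stats = namedtuple(
    "Stats",
    ["date", "place_name", "confirmados", "muertos", "recuperados", "activos"],
)

# The three rows from the sample output above.
stats = [
    Stats("26/03", "BUENOS AIRES", 158, 4, 15, 139),
    Stats("26/03", "CABA", 197, 4, 53, 140),
    Stats("26/03", "CATAMARCA", 0, 0, 0, 0),
]

def expected_active(s):
    """Assumed relation, inferred from the sample output only."""
    return s.confirmados - s.muertos - s.recuperados

for s in stats:
    print(s.place_name, s.activos == expected_active(s))

# Fields can also drive sorting and filtering.
most_confirmed = max(stats, key=lambda s: s.confirmados)
print(most_confirmed.place_name)
```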
The API also exports a DataFrame, df_provinces, with the content of the Google Drive data by province (see the link above). Province names are normalized using the normalize_str function.
api.df_provinces.head(3)
TYPE | PROVINCIA | 03/03 | 04/03 | 05/03 | 06/03 | 07/03 | 08/03 | 09/03 | 10/03 | 11/03 | 12/03 | ... | 02/04 | 03/04 | 04/04 | 05/04 | 06/04 | 07/04 | 08/04 | 09/04 | 10/04 | 11/04 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ACTIVOS | BUENOS AIRES | 0 | 0 | 1 | 2 | 2 | 3 | 3 | 4 | 5 | 9 | ... | 289 | 308 | 333 | 365 | 375 | 405 | 421 | 442 | 460 | 493 |
ACTIVOS | CABA | 1 | 1 | 1 | 5 | 5 | 7 | 8 | 9 | 10 | 13 | ... | 283 | 311 | 345 | 376 | 389 | 411 | 427 | 445 | 455 | 498 |
ACTIVOS | CATAMARCA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 rows × 40 columns
The API also exports provinces, a List[str] with all the province names:
api.provinces
['BUENOS AIRES',
'CABA',
'CATAMARCA',
'CHACO',
'CHUBUT',
'CORDOBA',
'CORRIENTES',
'ENTRE RIOS',
'FORMOSA',
'JUJUY',
'LA PAMPA',
'LA RIOJA',
'MENDOZA',
'MISIONES',
'NEUQUEN',
'RIO NEGRO',
'SALTA',
'SAN JUAN',
'SAN LUIS',
'SANTA CRUZ',
'SANTA FE',
'SANTIAGO DEL ESTERO',
'TIERRA DEL FUEGO',
'TUCUMAN']
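The names in this list are upper-case and carry no accents (CORDOBA, TUCUMAN, NEUQUEN), which suggests what the normalization does. The real normalize_str is not shown in this document, so the function below is a hypothetical stand-in (`normalize_name`) that reproduces that visible behaviour with the standard library only.

```python
import unicodedata

def normalize_name(name):
    """Hypothetical normalizer: strip accents, collapse whitespace, upper-case."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.split()).upper()

print(normalize_name("Córdoba"))
print(normalize_name("  Tucumán "))
print(normalize_name("Entre Ríos"))
```

Normalizing both sides before comparing avoids mismatches when a report writes the same province with different casing or accents.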
provinces = api.df_provinces.loc['CONFIRMADOS']['26/03']
provinces = provinces[provinces>0]
provinces.plot.bar()
Python API for working with reported COVID data for Santa Fe (Argentina).
DataTypes exported:
- CityInfo, COVIDStats namedtuples
- SantaFeAPI class
API methods:
- api.get_stats(date)
- api.get_cities_stats(date)
- api.get_departments_stats(date)
API public properties:
- api.df : pandas.DataFrame
- api.all_names : List[str]
- api.cities : List[str]
- api.departments : List[str]
Exported functions:
- is_city(str)
- is_deparment(str)
- normalize_str(str)
from santa_fe_api import *
print('COVIDStats namedtuple:', COVIDStats._fields)
COVIDStats namedtuple: ('date', 'place_name', 'confirmados', 'descartados', 'sospechosos')
When loading the data, the API reports any city that has no entry in the 'Info' sheet.
api = SantaFeAPI('./')
Download from google drive...
Dates must be given in D/M/YYYY format, without zero padding (for example '26/3/2020' or '1/4/2020').
api.get_stats('26/3/2020')[:3]
[COVIDStats(date='26/3/2020', place_name='#D_IRIONDO', confirmados=0.0, descartados=1.0, sospechosos=0.0),
COVIDStats(date='26/3/2020', place_name='ARMSTRONG', confirmados=0.0, descartados=1.0, sospechosos=0.0),
COVIDStats(date='26/3/2020', place_name='RAFAELA', confirmados=5.0, descartados=1.0, sospechosos=9.0)]
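Judging by the example call and by the column headers of the AllData table (such as '1/4/2020'), the Santa Fe keys carry no zero padding, while the national API uses zero-padded DD/MM. The two helpers below are hypothetical (they are not part of either module) and only show one way to build both key styles from a `datetime.date`.

```python
from datetime import date

def santa_fe_key(d):
    """Format a date as D/M/YYYY without zero padding, e.g. '1/4/2020'."""
    return f"{d.day}/{d.month}/{d.year}"

def argentina_key(d):
    """Format a date as zero-padded DD/MM, e.g. '26/03'."""
    return f"{d.day:02d}/{d.month:02d}"

day = date(2020, 3, 26)
print(santa_fe_key(day))
print(argentina_key(day))
```

Building keys from date objects keeps the two conventions from being mixed up when the same day is looked up in both APIs.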
api.get_cities_stats('26/3/2020')[:3]
[COVIDStats(date='26/3/2020', place_name='ARMSTRONG', confirmados=0.0, descartados=1.0, sospechosos=0.0),
COVIDStats(date='26/3/2020', place_name='RAFAELA', confirmados=5.0, descartados=1.0, sospechosos=9.0),
COVIDStats(date='26/3/2020', place_name='SAN GENARO', confirmados=0.0, descartados=0.0, sospechosos=1.0)]
api.get_departments_stats('26/3/2020')[:10]
[COVIDStats(date='26/3/2020', place_name='#D_IRIONDO', confirmados=0.0, descartados=1.0, sospechosos=0.0),
COVIDStats(date='26/3/2020', place_name='#D_GENERAL LOPEZ', confirmados=1.0, descartados=5.0, sospechosos=4.0),
COVIDStats(date='26/3/2020', place_name='#D_SAN CRISTOBAL', confirmados=0.0, descartados=1.0, sospechosos=1.0),
COVIDStats(date='26/3/2020', place_name='#D_GARAY', confirmados=2.0, descartados=2.0, sospechosos=5.0),
COVIDStats(date='26/3/2020', place_name='#D_GENERAL OBLIGADO', confirmados=0.0, descartados=2.0, sospechosos=5.0),
COVIDStats(date='26/3/2020', place_name='#D_SAN JUSTO', confirmados=0.0, descartados=0.0, sospechosos=2.0),
COVIDStats(date='26/3/2020', place_name='#D_ROSARIO', confirmados=15.0, descartados=88.0, sospechosos=22.0),
COVIDStats(date='26/3/2020', place_name='#D_CASTELLANOS', confirmados=5.0, descartados=4.0, sospechosos=11.0),
COVIDStats(date='26/3/2020', place_name='#D_CASEROS', confirmados=2.0, descartados=0.0, sospechosos=1.0),
COVIDStats(date='26/3/2020', place_name='#D_CONSTITUCION', confirmados=1.0, descartados=3.0, sospechosos=1.0)]
The API also exports a pandas.DataFrame, df, with the content of the Google Drive 'AllData' sheet, indexed by ['TYPE','DEPARTMENT','PLACE'].
Values are cumulative. City names are normalized using the normalize_str function.
api.df.head(3)
TYPE | DEPARTMENT | PLACE | 14/3/2020 | 15/3/2020 | 16/3/2020 | 17/3/2020 | 18/3/2020 | 19/3/2020 | 20/3/2020 | 21/3/2020 | 22/3/2020 | 23/3/2020 | ... | 31/3/2020 | 1/4/2020 | 2/4/2020 | 3/4/2020 | 4/4/2020 | 5/4/2020 | 6/4/2020 | 7/4/2020 | 8/4/2020 | 9/4/2020 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CONFIRMADOS | ##TOTAL | ##TOTAL | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 4.0 | 4.0 | 15.0 | ... | 133.0 | 144.0 | 152.0 | 160.0 | 165.0 | 176.0 | 184.0 | 187.0 | 189.0 | 195.0 |
CONFIRMADOS | #D_9 DE JULIO | #D_9 DE JULIO | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
CONFIRMADOS | #D_9 DE JULIO | TOSTADO | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 rows × 27 columns
- api.to_department : maps each city name to its department name (indexed like a dictionary, e.g. api.to_department['RAFAELA']).
- api.all_names : Set[CityName or DepartmentName]
- api.cities : Set[CityName]
- api.departments : Set[DepartmentName]
print('Some cities: {}'.format(list(api.cities)[:3]))
print('Some departments: {}'.format(list(api.departments)[:3]))
print('Cities with the respective departments: {}'.format([(c,api.to_department[c]) for c in list(api.cities)[:2]]))
Some cities: ['ARMSTRONG', 'RAFAELA', 'SAN GENARO']
Some departments: ['#D_IRIONDO', '#D_CONSTITUCION', '#D_GENERAL OBLIGADO']
Cities with the respective departments: [('ARMSTRONG', '#D_BELGRANO'), ('RAFAELA', '#D_CASTELLANOS')]
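A city-to-department mapping like this makes it easy to roll city figures up to department level. The sketch below does not call the API: `to_department` and `city_confirmed` are local stand-ins filled with values copied from the outputs above, and `confirmed_by_department` is a hypothetical helper.

```python
from collections import defaultdict

# Stand-in for api.to_department, using the two pairs printed above.
to_department = {"ARMSTRONG": "#D_BELGRANO", "RAFAELA": "#D_CASTELLANOS"}

# (city, confirmados) pairs for 26/3/2020, copied from the get_cities_stats output.
city_confirmed = [("ARMSTRONG", 0.0), ("RAFAELA", 5.0)]

def confirmed_by_department(pairs, mapping):
    """Sum confirmed cases of each city into its department."""
    totals = defaultdict(float)
    for city, confirmed in pairs:
        totals[mapping[city]] += confirmed
    return dict(totals)

totals = confirmed_by_department(city_confirmed, to_department)
print(totals)
```

For this date the result for '#D_CASTELLANOS' matches the 5.0 confirmed cases that get_departments_stats reports for that department.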
Use the is_city(str) and is_deparment(str) functions to check whether a place name is a city or a department.
ciudades = api.df.loc['CONFIRMADOS'][ api.df.loc['CONFIRMADOS'].index.map(lambda x : is_city(x[1])) ]['26/3/2020']
ciudades = ciudades[ciudades>0].reset_index(['DEPARTMENT'],drop=True)
ciudades.plot.bar()
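The implementations of is_city and is_deparment are not shown in this document. In the outputs above, every department name starts with '#D_' and the totals row is named '##TOTAL', so one plausible reading is a prefix check. The functions below are hypothetical stand-ins built on that assumption, not the module's actual code.

```python
# Assumed marker, inferred from names such as '#D_ROSARIO' in the outputs above.
DEPARTMENT_PREFIX = "#D_"

def looks_like_department(name):
    """True for names carrying the assumed department prefix."""
    return name.startswith(DEPARTMENT_PREFIX)

def looks_like_city(name):
    """True for names without any '#' marker ('##TOTAL' is excluded too)."""
    return not name.startswith("#")

names = ["#D_IRIONDO", "ARMSTRONG", "RAFAELA", "##TOTAL"]
cities = [n for n in names if looks_like_city(n)]
departments = [n for n in names if looks_like_department(n)]

print(cities)
print(departments)
```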