Ocn 027 rw0 #292
base: master
Conversation
…plumes_processing.py
A few minor change requests and a couple of questions, but overall this looks good.
The only thing I didn't check was whether the column names in Carto match those in the files sent to S3. Did you do that?
You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/raster/ocn_027_rw0_nitrogen_plumes.zip), or [from the source website](https://knb.ecoinformatics.org/view/urn%3Auuid%3Ac7bdc77e-6c7d-46b6-8bfc-a66491119d07).
###### Note: This dataset processing was done by Claire Hemmerly, and QC'd by [Chris Rowe](https://www.wri.org/profile/chris-rowe).
You can add any link here if you want.
You can view the processed Wastewater Plumes in Coastal Areas dataset [on Resource Watch](https://resourcewatch.org/data/explore/11804f04-d9c7-47b9-8d27-27ce6ed6c042).
You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/raster/ocn_027_rw0_nitrogen_plumes.zip), or [from the source website](https://knb.ecoinformatics.org/view/urn%3Auuid%3Ac7bdc77e-6c7d-46b6-8bfc-a66491119d07).
Since this is a raster file, we don't link to our S3 version. You can just link to the source.
Side note: do you know why the zip file on our S3 is so large? My browser says it is 11 GB, but the source file looks to be 250 MB.
I compressed the zip file and uploaded it to the correct folder. I deleted the one that was 11 GB.
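One possible cause of that kind of size blow-up (an assumption, not confirmed in this thread) is writing the archive without compression: Python's `zipfile` defaults to `ZIP_STORED`. A minimal sketch of re-zipping with DEFLATE, using hypothetical file names:

```python
import os
import tempfile
import zipfile

# Sketch: write a zip with DEFLATE compression. A zip written with the
# default STORED (no compression) setting can be far larger than needed
# for large, compressible rasters. File names here are hypothetical.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, 'plumes.tif')
    with open(src, 'wb') as f:
        f.write(b'\x00' * 100_000)  # highly compressible dummy raster data
    out = os.path.join(d, 'ocn_027_rw0_nitrogen_plumes.zip')
    with zipfile.ZipFile(out, 'w', compression=zipfile.ZIP_DEFLATED) as z:
        z.write(src, arcname='plumes.tif')
    compressed_size = os.path.getsize(out)
```

With `ZIP_DEFLATED`, the archive ends up much smaller than the 100 KB source here; with the default `ZIP_STORED` it would be slightly larger than the source.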
```python
# convert the data type of columns to integer
for col in gdf.columns[1:9]:
    gdf[col] = gdf[col].fillna(0).astype('int')
```
Are you sure we want NA values to be zero? Are there any existing zero values?
The only NA values are in the columns that show % nitrogen input, which we're not using, and the same rows have 0 for the nitrogen input in g/yr, so I think updating the NAs to 0 should be fine. I had trouble creating the processed shapefile when the values were floats instead of integers; I got an error saying the numbers were too big.
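That claim can be verified in the script rather than by inspection. A minimal sketch, using a hypothetical sample frame and column names (the real dataset's columns differ), that asserts NAs occur only where the g/yr input is already zero before filling:

```python
import pandas as pd

# Hypothetical sample mimicking the situation described above:
# percent-nitrogen is NA only in rows whose g/yr input is zero.
gdf = pd.DataFrame({
    'plume_id': [1, 2, 3],
    'n_input_gyr': [500, 0, 1200],      # nitrogen input in g/yr
    'pct_n_input': [40.0, None, 60.0],  # % nitrogen input (unused downstream)
})

# Check the assumption before casting: every NA row has zero g/yr input,
# so filling NAs with 0 cannot be confused with a real measurement.
na_rows = gdf['pct_n_input'].isna()
assert (gdf.loc[na_rows, 'n_input_gyr'] == 0).all()

# The fillna(0) + integer cast is then lossless for these columns.
gdf['pct_n_input'] = gdf['pct_n_input'].fillna(0).astype('int')
```

An assertion like this fails loudly if a future data update introduces NAs that do not coincide with zero inputs.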
```python
# convert the data type of columns to integer
for col in gdf.columns[1:9]:
    gdf[col] = gdf[col].fillna(0).astype('int')
```
Again, just want to make sure we are accurately translating the data here.
```python
'''
# load in the polygon shapefile
shapes = os.path.join(raw_data_file_unzipped, 'effluent_N_pourpoints_all.shp')
```
Can you use `processed_data_file` here instead of `shapes` for consistency with our other scripts?
Actually, I think the main thing is that the final file you upload is called `processed_data_file`.
I've updated this section; let me know if it looks OK.
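The naming convention asked for above can be sketched in one line; the folder name here is an assumption, only the shapefile name comes from the snippet:

```python
import os

# Minimal sketch of the naming convention discussed above: the final
# artifact lives in a variable named processed_data_file, matching the
# other processing scripts. The folder name is hypothetical.
raw_data_file_unzipped = 'data/ocn_027_rw0'
processed_data_file = os.path.join(raw_data_file_unzipped, 'effluent_N_pourpoints_all.shp')
```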
```python
set_default_credentials(username=CARTO_USER, base_url="https://{user}.carto.com/".format(user=CARTO_USER), api_key=CARTO_KEY)

# upload data frame to Carto
to_carto(gdf, dataset_name + '_edit', if_exists='replace')
```
For consistency, can you use `processed_data_file` instead of `gdf`?
```python
Upload original data and processed data to Amazon S3 storage
'''
# initialize AWS variables
aws_bucket = 'wri-public-data'
```
This should actually be `aws_bucket = 'wri-projects'` for raster datasets.
fixed
I added the processed data file steps and checked the column names; they should match now.
Checklist for Reviewing a Pre-Processing Script