-
-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducibility for NZ coast #525
Comments
hi nick, which version of CoastSat are you using? I would need to dig out the parameters that I used for NZ (the different settings + classifier used can make a big difference). Tidal correction can also contribute to it, although 40 m bias is huge. |
I've tried with https://pypi.org/project/coastsat-package/0.1.73/ and https://github.com/kvos/CoastSat/releases/tag/v2.6 and had the same problem. I'm using these settings: # settings for the shoreline extraction
settings = {
# general parameters:
'cloud_thresh': 0.1, # threshold on maximum cloud cover
'dist_clouds': 100, # ditance around clouds where shoreline can't be mapped
'output_epsg': 2193, # epsg code of spatial reference system desired for the output
# quality control:
'check_detection': False, # if True, shows each shoreline detection to the user for validation
'adjust_detection': False, # if True, allows user to adjust the postion of each shoreline by changing the threhold
'save_figure': True, # if True, saves a figure showing the mapped shoreline for each image
# [ONLY FOR ADVANCED USERS] shoreline detection parameters:
'min_beach_area': 1000, # minimum area (in metres^2) for an object to be labelled as a beach
'min_length_sl': 500, # minimum length (in metres) of shoreline perimeter to be valid
'cloud_mask_issue': False, # switch this parameter to True if sand pixels are masked (in black) on many images
'sand_color': 'default', # 'default', 'latest', 'dark' (for grey/black sand beaches) or 'bright' (for white sand beaches)
'pan_off': False, # True to switch pansharpening off for Landsat 7/8/9 imagery
's2cloudless_prob': 40, # threshold to identify cloud pixels in the s2cloudless probability mask
# add the inputs defined previously
'inputs': inputs
} I did also try with |
Hi Nick, Its possible that your reference shoreline buffer is too small if the reference shoreline buffer isn't covering enough of your beach causing you to have to resample the reference shoreline. You could try setting Coastsat_package is meant to replicate the original coastsat with only some minor alterations that are documented in the readme. I'll take a look at the code to see if there is anything wrong that could be causing this issue, but my hunch is that this difference is due to different settings. You could try experimenting by changing the 'min_beach_area' and 'min_length_sl' to see if that changes the results. |
I don't think it's a problem with Seems a bit noisier than 100: The offset issue still occurs with I tried halving |
Could this be a CRS issue? What CRS did you use @kvos? |
as long as you have used a local projection system CRS should be fine (only small differences). Maybe try a different site to see if the differences are also so large. CoastSat v1 and v2 give very similar results in most places but occasionally due to a pixel rounding issue with GEE there is a pixel shift between L5 and L7/L8, so that was the main fix from v1 to v2. |
Yes - I'm using L5, L7 and L8. We seem to have a good overlap in terms of satellite imagery timestamps, judging by this venn diagram: I tried with those settings, same problem I've been downloading imagery in parallel for all of the NZ sites over the past few days (with https://github.com/UoA-eResearch/CoastSat/blob/main/batch_process.py), so am in a position to perform large scale comparison. For the 457 sites I've processed so far, comparing with your results for those exact sites, here's the output of count 457.000000
mean -40.086508
std 19.911214
min -101.793516
25% -52.151887
50% -38.624932
75% -27.483396
max 86.271906
Name: diff, dtype: float64 or, looking at the composite transects: count 24573.000000
mean -42.372753
std 20.819492
min -332.331356
25% -56.321811
50% -41.023628
75% -28.938976
max 181.798978
dtype: float64 |
I've also run into a separate issue with id count
6 nzd0418-0000 2
5 nzd0418-0001 2
4 nzd0418-0002 2
3 nzd0418-0003 2
2 nzd0418-0004 2
11 nzd0418-0005 2
21 nzd0418-0006 2
7 nzd0419-0000 2
8 nzd0419-0001 2
9 nzd0419-0002 2
10 nzd0419-0003 2
12 nzd0419-0004 2
1 nzd0419-0005 2
13 nzd0419-0006 2
0 nzd0419-0007 2
14 nzd0419-0008 2
15 nzd0419-0009 2
16 nzd0419-0010 2
17 nzd0419-0011 2
18 nzd0419-0012 2
19 nzd0419-0013 2
20 nzd0419-0014 2 |
I cannot tell what is happening by looking at the distribution of differences, but if you plot a few transects with the time-series from both coastsat-website and your run I may be able to pick some differences. If you can color/symbol each sensor that would also help. |
Could this be a collection 1 vs 2 issue? |
ok that's a big difference. time_series_raw.csv are not tidally corrected. Collection 1 vs Collection 2 would not create such differences as it's only some improvement of georeferencing of the images. |
I will also run nzd0188 with my coastsat installation to see what it outputs and compare with yours. give me some time as I'm doing this after work so I may take a week or two |
I tried with coastsat latest master (commit hash |
this is all pointing towards the CoastSat v1 vs v2 versions and that one to half-a-pixel shift. I'll send you another raw time-series file to test. Have you tried at another site to see if this bias is also at other locations or just nzd0188? |
Sounds plausible - with 30m resolution, a 1-1.5 pixel shift would be ~40m. Yes, as you can see above (#525 (comment)) I checked the bias against all 560 NZ sites, it's pretty consistently ~40m. I can try run CoastSat v1 and compare |
I don't think you will be able to run CoastSat v1 as the API have changed since then (and Collection 1 is not available any more). I have found my old hard drive with several versions of the dataset, can you pls try the following for ndz0188 see if any differences and if any matches with yours. I have a document that mentions which settings were used for each run that I can share afterwards. |
v2 diff 13.3115900009912
v4 diff 13.470779674808872
v6 diff 0.0 Looks like v6 is identical to v1.4 on Zenodo, v2 and v4 are ~13m offset |
Ok that doesn't explain the 40 m bias. I checked again and when I did that run, I used project EPSG:3857 for New Zealand, which is a terrible projection. Maybe try to rerun your case with 3857 to see if it gets closer? In any case it would be better to rerun with a local projection. |
the length of the transects does not matter as in the code it's doing a point to line intersection. But you can set how far from the reference shorelines you are happy to detection shorelines with the |
I'm re-running also that nzd site and will let you know if I also have that bias. |
Looks like that's it - if I run with 3857 instead of 2193, I get much closer intersects to you: nzd0188-0000 -4.434607
nzd0188-0001 0.411641
nzd0188-0002 0.920183
nzd0188-0003 2.035048
nzd0188-0004 2.440351
nzd0188-0005 3.095899
nzd0188-0006 3.454840
nzd0188-0007 3.273954
nzd0188-0008 4.340197
nzd0188-0009 4.085155
nzd0188-0010 3.926616
nzd0188-0011 4.420428
nzd0188-0012 5.972960
nzd0188-0013 5.352206
nzd0188-0014 8.106336
dtype: float64
count 15.000000
mean 3.160080
std 2.850384
min -4.434607
25% 2.237699
50% 3.454840
75% 4.380313
max 8.106336
dtype: float64 Even just doing something as "simple" as measuring the length of the transects in NZ gives different results between 3857 and 2193: transects = gpd.read_file("transects.geojson")
transects = transects[transects.site_id.str.startswith("nzd")]
transects["2193_3857_diff"] = transects.to_crs(2193).length - transects.to_crs(3857).length
display(transects["2193_3857_diff"].describe())
transects["2193_3857_diff"].plot.hist(bins=100)
plt.show()
transects.plot("2193_3857_diff", legend=True)
plt.show() outputs: count 32357.000000
mean -96.913352
std 16.042381
min -126.141293
25% -109.831806
50% -97.275156
75% -83.160770
max -69.910778
dtype: float64 The distortion gets worse the further south you go |
good one, finally we found the issue! That 3857 projection is really terrible, I don't know why I used it for that run that is on the zenodo repo. I did use 2193 when I ran CoastSat locally for validation at Tairua, but then when I batch processed I used 3857 for both New Zealand and Japan (and local projections for the other countries). Back then, I never thought it would make more than 1m difference, 40 m is huge. Ok, so I will rerun the whole data in local UTM projections for Japan and NZ. For the moment, you can trust your outputs using 2193 more than the zenodo dataset. |
btw, since you did this comparison, I would be interested to know whether you found any differences by running coastsat or coastseg with same parameters? thx |
It looks like EPSG:6669 would be suitable for Japan. Comparing the length of the Japan transects between 6669 and 3857: transects = gpd.read_file("transects.geojson")
transects = transects[transects.site_id.str.startswith("jap")]
transects["6669_3857_diff"] = transects.to_crs(6669).length - transects.to_crs(3857).length
display(transects["6669_3857_diff"].describe())
transects["6669_3857_diff"].plot.hist(bins=100)
plt.show()
transects.plot("6669_3857_diff", legend=True)
plt.show() outputs: count 11257.000000
mean -84.074368
std 17.780187
min -114.021744
25% -101.615119
50% -80.386440
75% -69.361963
max -54.925937
Name: 6669_3857_diff, dtype: float64 Given Japan is on the other side of the equator, the distortion gets worse the further north you go Only differences between coastsat_package and this repo I noticed is that the downloaded files are named slightly differently:
And that issue I mentioned before about having to flip coordinate order (which seems to also be a problem in earlier versions of CoastSat, so might've been fixed here but not there?). Could be because 2193 is in northing, easting order (y,x), whereas 3857 is easting, northing order (x,y). The resulting intersects were roughly the same, <3m mean diff: nzd0188-0000 -0.091116
nzd0188-0001 -0.043623
nzd0188-0002 -0.049835
nzd0188-0003 -0.027537
nzd0188-0004 -0.005873
nzd0188-0005 -0.048840
nzd0188-0006 -0.022342
nzd0188-0007 -0.083212
nzd0188-0008 0.056931
nzd0188-0009 0.547146
nzd0188-0010 0.031298
nzd0188-0011 0.504352
nzd0188-0012 0.997233
nzd0188-0013 2.213869
nzd0188-0014 1.667043
dtype: float64 I also note the README (both here, and in coastseg) recommends using Anaconda, might be worth mentioning that Anaconda is only free for organisations with <200 employees - https://www.theregister.com/2024/08/08/anaconda_puts_the_squeeze_on/ You might also be interested in taking a look at the Leaflet map I made - https://uoa-eresearch.github.io/CoastSat/?debug Would appreciate any feedback you have about that |
hi Nick, love the Leaflet map! great work, 250 lines of HTML/JS and looks like a full app! I'm really impressed.
|
Yes
Yes
About a week, downloading imagery in parallel with https://github.com/UoA-eResearch/CoastSat/blob/main/batch_process.py
Yes
Yes
Yes, with https://github.com/UoA-eResearch/CoastSat/blob/main/tidal_correction.ipynb
Yes, but you should be aware that I've extended the transects to fix the problem I mentioned in #525 (comment), so those intersect measurements won't match against your original transects.geojson, only my https://github.com/UoA-eResearch/CoastSat/blob/main/transects_extended.geojson |
ok thanks for the info, all in one week is a good effort, I remember when I did it a few years ago the bottleneck was bandwidth and it would take much longer to downlaod all the Landsat imagery for 500 beaches. How did you parallelise it? within the same machine? Your batch_process.py looks very similar to my own except that it's not estimating the slopes. A couple of suggestions:
Extenting the seawards end of the transects makes no difference at all as it's using a point-to-line equation. You can use the original transects and increase |
With https://tqdm.github.io/docs/contrib.concurrent/#process_map, see https://github.com/UoA-eResearch/CoastSat/blob/3dac01970962ce784fde80ea8adce16a9d4071dd/batch_process.py#L138.
Yes, on my wave.storm-surge.cloud.edu.au NeCTAR VM. Are you familiar with NeCTAR? It would've been available to you at UNSW, via Intersect - https://ardc.edu.au/services/ardc-nectar-research-cloud/. It might even still be available to you at the DPE via Intersect, https://support.ehelp.edu.au/support/solutions/articles/6000264341 says "Select NSW Government Departments".
No. Downloading each site is painfully slow, upwards of an hour. The trick is that many slow things together, in parallel, have a higher overall throughput. Downloading 32 sites in parallel seems good. I also looked into GEE's high volume endpoint:
I didn't need to estimate slopes, as you'd already done it for me ;) |
As for the utilty of extending the transects, I understand now it's not an issue, but my first instinct when looking at them was that something was wrong. So by leaving them extended, I'm trying to pre-emptively prevent others (like my colleagues for example) from thinking something is wrong. Because if that was my first instinct, it would likely be theirs too. Also it might come in handy when intersecting other datasets, like https://data.coastalchange.nz/layer/119441-nzccd-coastlines/ |
hey @neon-ninja, I'm serving the coastsat data via http://www.coastsat.space , an upgraded version of the original coastsat web portal. The data here for NZ is the one that is on Zenodo. While I can reprocess Japan with a better projection, I think there is no point on doing it for NZ and maintain two versions. Do you think I can link it to the data in your cloud buckets directly? |
Yes. Do you want the version prior to extending the transects? Or will you use my extended transects? |
dig coastsat.space
; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> coastsat.space
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60487
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;coastsat.space. IN A
;; ANSWER SECTION:
coastsat.space. 0 IN A 192.168.1.112
;; Query time: 0 msec
;; SERVER: 172.23.48.1#53(172.23.48.1) (UDP)
;; WHEN: Tue Sep 10 09:57:31 NZST 2024
;; MSG SIZE rcvd: 62 I don't think this will work outside your local network right now |
sorry I had restricted it to local network while testing, should work now. |
192.168.1.112 is an IP within a reserved special use subnet (https://en.wikipedia.org/wiki/Reserved_IP_addresses), it's a "Private network", "Used for local communications within a private network". It's not going to work outside of your local network. See https://downforeveryoneorjustme.com/coastsat.space. If you intend to host this at your home (not recommended, better to host it on NeCTAR), you'd need to configure the domain coastsat.space to point at your WAN IP (the ipv4 address shown on https://whatismyipaddress.com/), and configure your router NAT settings to forward ports 80 and 443 to the machine hosting the web server.
Yes, I do, everything in git is tracked. I extended the transects in UoA-eResearch/CoastSat@79e7edb, so the commit before that (UoA-eResearch/CoastSat@b69b671) would be using your transects as is. Intersects are here -> https://github.com/UoA-eResearch/CoastSat/tree/79e7edbc8a5e7dc31a21e2a71cddd6bd306ad4a7/data. If you want to access them directly from a browser, you can fix the mimetype + add a CDN via a URL like https://rawcdn.githack.com/UoA-eResearch/CoastSat/79e7edbc8a5e7dc31a21e2a71cddd6bd306ad4a7/data/nzd0001/transect_time_series_tidally_corrected.csv I also kept all of the imagery, so it's relatively easy to rerun it all. |
dig coastsat.space
; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> coastsat.space
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39242
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;coastsat.space. IN A
;; ANSWER SECTION:
coastsat.space. 0 IN A 61.68.1.31
;; Query time: 3630 msec
;; SERVER: 172.23.48.1#53(172.23.48.1) (UDP)
;; WHEN: Fri Sep 20 11:12:39 NZST 2024
;; MSG SIZE rcvd: 62 Looks like http://coastsat.space/ is working now (port 80 / plain HTTP only, no 443 / HTTPS / letsencrypt) |
cheers mate, mostly thanks to your tips! learning heaps from this conversation. I'll try to update the database soon, running a new projection for Japan (suffers same issue as NZ), and will write a script that retrieves your CSV files for NZ and writes into my postgres db. Will certainly add a note to acknowledge the source and link to your leaflet map. For the extended transects did you only change the length, not the orientation? |
Yes - here's where I did that - https://github.com/UoA-eResearch/CoastSat/blob/79e7edbc8a5e7dc31a21e2a71cddd6bd306ad4a7/extend_transects.ipynb |
Just letting you know, Giovanni Coco said that we should publish our recalculated intersects on Zenodo since this dataset is significantly different to https://zenodo.org/records/7758183, and we want to avoid confusion. Key differences: I've used 2193 I've enabled the Zenodo GitHub integration, and updated https://github.com/UoA-eResearch/CoastSat/blob/main/update.sh to automatically push a tag and release when it runs (this script also fetches new imagery and finds shorelines / intersects on the 1st of every month). So, there should be a new automated release every month, with the latest data. This is my first time using Zenodo, hopefully I did it all correctly. |
yes good idea, can you make a link to the Pacific Rim repo and mention that it's an enhancement for NZ? Once I re-process the rest of the Pacific (exc NZ) with better projection, Landsat 9 and FES2022, I'll take a copy of your Zenodo as well so that they are all sitting in one spot. |
Sure, done. Here's the Zenodo link - https://zenodo.org/records/13835883 |
Hi @kvos,
Thanks for making this open source. I'm trying to partially update the dataset published at https://zenodo.org/records/7758183, to add in shorelines from satellite imagery from 2022-2024 in New Zealand. One of the problems I noticed while trying to do this, is that I don't get the same results as you (pre-tidal correction), even for the exact same satellite imagery.
I'm using @2320sharon's coastsat-package (https://github.com/SatelliteShorelines/coastsat_package, 0.1.73)
Here's my notebook: https://github.com/UoA-eResearch/CoastSat/blob/b3a9739af2e26fae26ed4b7bf68dea8b040ed9e0/update.ipynb
I'm looking at site
nzd0188
, which is Tairua beachI'm using the
polygons.geojson
,shorelines.geojson
, andtransects.geojson
from that Zenodo linkI noticed a problem with the reference shoreline though, it didn't have enough points to accurately build a mask, as that relies on buffering from the points that construe the line. So I had to split the shoreline by the transects to add in enough points.
I've also noticed I've had to flip coordinate order (lat lng -> lng lat) at times in order for things to work
As you can see from the table at the bottom of my notebook, my calculated intersects are typically off from yours by roughly 40m
Any ideas what I'm doing wrong?
Cheers,
Nick
The text was updated successfully, but these errors were encountered: