Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AggregateRaster (Zonal statistics) discrepancy with QGIS Zonal statistics #43

Open
tastatham opened this issue Apr 24, 2020 · 4 comments

Comments

@tastatham
Copy link

Hi Dask-geomodeling,

I came across this package a few months back and @caspervdw very kindly provided an example for applying zonal statistics. See here for the implementation we are using.

I have spot checked the zonal statistic outputs from your package with QGIS zonal statistics and the outputs are very different. For example; comparing the zonal statistics outputed from QGIS (GHSLsum) and dask-geomodeling (ghsl) we can see there is a clear difference.
ghsl
The highlighted cell in yellow is a 1km2 vector grid cell and the two highlighted red cells are the ghsl raster layers (250m). Just these two raster cells alone have a VALUE of 7.29 and 32.12, which suggests the QGIS approach is more realistic.

I believe that these differences is due to the fact that the QGIS zonal statistics includes a weight (fraction of each cell covered by the polygon), that takes into account partial overlays, whereas the dask-geomodeling approach does not.

Is this assumption is correct, are you planning to add this functionality in later releases of this package?

@caspervdw
Copy link
Collaborator

Hi @tastatham , thanks for your report.

I wasn't aware on the differences in approach between QGis and dask-geomodeling. It is an interesting idea to weigh the partial pixel overlap.

Which approach is the best approach would however depend on your problem. I think this is only useful for large cell sizes, for smaller cell sizes the effect would be negligible (but the effect on performance would not be).

I don't for see that we are going to work on this functionality in the near future though. If you're up to it, we would welcome a contribution in this area.

@tastatham
Copy link
Author

@caspervdw yes I agree, it's entirely dependent on your problem.

I was considering applying a function for partial overlaps for this package myself but I came across this command line tool: https://github.com/isciences/exactextract, which is written entirely in C++. As such, this implementation is very fast and accounts for on the fly computations, much like QGIS. The documentation here explains other common utilities and how they differ. I thought I would share this with you, as this could be useful your end/other users.

@caspervdw
Copy link
Collaborator

That looks very interesting, I didn't know this library. Tagging @arjanverkerk as he will probably be interested as well.

Just wondering have you tried https://rasterio.readthedocs.io/en/latest/?

@tastatham
Copy link
Author

I have used rasterio but to my knowledge, rasterio does not support zonal statistics directly? https://pythonhosted.org/rasterstats/ supports this operation. Like your package, this can be ran multithreaded e.g. https://github.com/perrygeo/python-rasterstats/blob/master/examples/multiproc.py but the cool thing about your package is that it offers the ability for lazy evaluation via Dask & a geometry tiler to define how much memory to allocate.

There are several other packages I have tested. Most packages out there do not account for partial overlaps because polygon clipping is expensive. I might create a notebook comparing common approaches that are available at present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants