Readily available algorithms / overlap of interests #2
thanks for reaching out! I had seen your numba cell_tree code -- and intended to ask about merging those projects. I'm still wary of relying on numba for "production" work (e.g. as part of a web service, or ...) -- it's great for interactive data analysis and, as you point out, algorithm development / debugging. And maybe I'm overly cautious -- I committed to Cython long before numba existed :-) And with conda-forge, at least I no longer have to expect my users to compile stuff :-) All that to say that I'm not ready to commit to pure-numba, but I think they can co-exist. My thought for the cell_tree2D project: we put them in one repo, and then they can share all the tests, etc. I think that will help both projects, and keep the API consistent, which I think is a good thing. And two options for packaging -- two packages. By default, on import it would look for the cython version; if it's there, it would use it -- if not, then it would use the numba version. Alternatively, you could select which version you want to use:
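Something like this, as a rough sketch (the `_cell_tree_cython` / `_cell_tree_numba` module names are just placeholders, not existing modules):

```python
# Hypothetical package __init__.py -- module names are placeholders.
# Prefer the compiled Cython extension if it's installed; fall back to numba.
try:
    from ._cell_tree_cython import CellTree2d  # compiled extension
    BACKEND = "cython"
except ImportError:
    from ._cell_tree_numba import CellTree2d   # numba fallback
    BACKEND = "numba"


def get_cell_tree(backend=None):
    """Explicitly pick an implementation: 'cython', 'numba', or None for the default."""
    if backend is None:
        return CellTree2d
    if backend == "cython":
        from ._cell_tree_cython import CellTree2d as impl
    elif backend == "numba":
        from ._cell_tree_numba import CellTree2d as impl
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    return impl
```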
And, now that you mention it -- we could merge cell_tree with geometry_utils. I had thought they were kind of separate, but there really is a lot of intertwining -- cell_tree needs point_in_polygon (at least a simple version), and geometry could use a cell_tree for, e.g., fast location of multiple points in a complex polygon:
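For example (using the numba_celltree flavor of the API for concreteness -- treat the exact constructor arguments as an assumption):

```python
import numpy as np
from numba_celltree import CellTree2d  # the Cython version could expose the same API

# A unit square triangulated into two triangles (a complex polygon would be
# triangulated first, e.g. with mapbox_earcut).
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])

tree = CellTree2d(vertices, faces, fill_value=-1)

# Many query points at once: locate_points returns the containing face per
# point, with -1 (the fill value) for points outside every face.
points = np.random.rand(100_000, 2) * 1.5
face_idx = tree.locate_points(points)
inside = face_idx != -1
```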
Speaking of which -- do you have a triangulation algorithm? We've been using "mapbox_earcut": https://github.com/skogler/mapbox_earcut_python which is a python wrapper around C++ code, ported from JS, I think. A numba implementation might make sense. Honestly -- I have no idea if it's the best way to go -- one of our team found it, and started using it -- it's worked so far :-) As for Julia -- I had high hopes for it as a way to write compiled extensions for Python way back when -- it seemed there was a lot of cross-pollination between the communities -- but somehow that was never made as easy as I thought it could be -- oh well. LPython looks promising -- I hadn't seen that before -- doesn't look ready for production work, but good to keep an eye on it. All this to say: I propose we join forces and create a single project for geometry utilities -- a combo of python, numpy and cython (optionally calling C/C++) (for now). We can use this project if you like -- I don't think it's had any real uptake at this point. NOTE: I recently was working on another project that needed a fast point in polygon routine. I thought that would be a good chance to finalize, and make use of, this project. But in that case -- many points, a small number of polygon vertices -- a pure-numpy implementation was just as fast, so why introduce the dependency? https://github.com/asascience-open/xarray-subset-grid/blob/main/profiling/p_in_p_perf.py (and the Shapely version was slower in all cases) Thanks for the interest!
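For reference, the wrapper's call looks roughly like this, going from memory -- double-check the exact function names (`triangulate_float64`, ring end indices) against the mapbox_earcut README:

```python
import numpy as np
import mapbox_earcut as earcut

# Polygon exterior ring as an (n, 2) float array; `rings` holds the end index
# of each ring (exterior first, then any holes).
verts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
rings = np.array([4])

# Returns a flat index array; every 3 consecutive indices form one triangle.
tri_idx = earcut.triangulate_float64(verts, rings)
triangles = tri_idx.reshape(-1, 3)
```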
The webservices argument is a good one; I honestly hadn't considered that! numba_celltree takes some time to compile (more than 20 s, I think), but then everything is cached. I force everything to 64-bit floats anyway, so there's no JIT'ing required afterwards. But at least one downstream user was getting a timeout on one of their CI tests due to compilation.
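For what it's worth, the on-disk cache is opt-in per function, e.g.:

```python
import numpy as np
import numba

# cache=True writes the compiled machine code to __pycache__, so later
# processes -- e.g. a CI job or a web worker -- skip the ~20 s compile.
@numba.njit(cache=True)
def bbox(coords):
    xmin = ymin = np.inf
    xmax = ymax = -np.inf
    for i in range(coords.shape[0]):
        x, y = coords[i, 0], coords[i, 1]
        xmin = min(xmin, x)
        xmax = max(xmax, x)
        ymin = min(ymin, y)
        ymax = max(ymax, y)
    return xmin, ymin, xmax, ymax
```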
I've added a few extra methods; they could be added to the C version as well -- it's a little bit of work. It would also be a nice opportunity to make the C version run in parallel; it should also be about twice as fast as the numba version, because numba lacks a concurrent data structure to append to in parallel, so I search first to count, then allocate, then search again and actually store the results... I'm a bit more experienced nowadays, so I think I'm probably more open to non-numba approaches now. I also had the (naive) hope that people might get involved more easily if it's "just Python", but "laypeople" aren't interested in computational geometry by a long shot...
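In sketch form, the count/allocate/fill workaround looks something like this (not the actual numba_celltree code -- a brute-force bounding-box hit test stands in for the tree traversal):

```python
import numpy as np
from numba import njit, prange

# Stand-in for the tree traversal: which candidate boxes does each query box
# overlap? Real code would walk the cell tree instead of scanning `boxes`.
@njit(inline="always")
def _overlaps(q, b):
    return q[0] <= b[2] and b[0] <= q[2] and q[1] <= b[3] and b[1] <= q[3]

@njit(parallel=True)
def query_boxes(queries, boxes):
    n = queries.shape[0]
    counts = np.zeros(n, dtype=np.int64)
    # Pass 1: count hits per query so each thread knows its output size.
    for i in prange(n):
        c = 0
        for j in range(boxes.shape[0]):
            if _overlaps(queries[i], boxes[j]):
                c += 1
        counts[i] = c
    offsets = np.zeros(n + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(counts)
    out = np.empty(offsets[n], dtype=np.int64)
    # Pass 2: search again, writing into each query's preallocated slice
    # (disjoint slices, so no concurrent append is needed).
    for i in prange(n):
        k = offsets[i]
        for j in range(boxes.shape[0]):
            if _overlaps(queries[i], boxes[j]):
                out[k] = j
                k += 1
    return offsets, out
```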
I've been using mapbox_earcut too. It seems nice and robust and extremely fast; I found that it actually combines pretty nicely with the cell tree. In our case, we have a workflow where we need to know the wetted area (of rivers/lakes/etc.) for each grid cell of a model. This requires millions of intersections, and shapely/GEOS are just too slow. But converting the polygons to triangles, then storing them in a celltree, followed by the polygon intersection, seems to work pretty well! The first alternative that comes to mind is Shewchuk's Triangle. EDIT: reminds me, there's also this one: https://github.com/mourner/delaunator-rs
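A stripped-down sketch of that pipeline on toy data (the earcut/celltree calls are from memory, and the actual per-cell intersection query is omitted):

```python
import numpy as np
import mapbox_earcut as earcut
from numba_celltree import CellTree2d

# Triangulate a toy water polygon (exterior ring only, end index 4).
water = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 4.0], [0.0, 4.0]])
triangles = earcut.triangulate_float64(water, np.array([4])).reshape(-1, 3).astype(np.int64)

# Index the triangles once; afterwards each grid cell only has to be tested
# against the handful of triangles whose bounding boxes it can touch, instead
# of against the whole polygon (which is where shapely/GEOS became too slow).
tree = CellTree2d(water, triangles, fill_value=-1)

# Cheap first use: classify grid-cell centroids against the water body.
centroids = np.array([[2.5, 2.0], [7.5, 2.0], [12.0, 2.0]])
in_water = tree.locate_points(centroids) != -1
# The wetted-area computation itself then runs the tree's face-intersection
# query per grid cell and sums the clipped triangle areas (omitted here).
```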
Yes; incidentally, I tried this, mostly motivated by (perverse) curiosity: it was not a good experience. Mapbox earcut uses a doubly linked list because it has to insert a lot of values. This is easy in Python (just store references), but not with numba. I could simulate this with a numpy structured array / vector, using integers instead of pointers, but it needed a lot more workarounds. (Funnily enough, I found that people have done the same in Rust, because Rust's borrow checker doesn't like the ownership ambiguity of linked lists.) I saw your comment on the mapbox_earcut repo PR about more maintainers; if you're compiling C(++) and generating Python bindings for celltree already anyway, you could consider moving the mapbox earcut code in as well. Including the Triangle code + python bindings would be an idea too, since neither the mapbox_earcut nor the Triangle maintainer is very active anymore. I'm a maintainer on the conda triangle bindings recipe. Triangle has a weird custom license though, so it might make sense to isolate it for that reason. A few marginal improvements for earcut came to mind:
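To illustrate the linked-list workaround mentioned above: the idea is to keep `next`/`prev` as integer arrays and unlink nodes by index, which numba handles fine. A sketch of the pattern, not the actual attempt:

```python
import numpy as np
from numba import njit

@njit
def circular_list(n):
    # next/prev arrays encode a circular doubly linked list over n nodes;
    # integer indices stand in for pointers.
    nxt = np.empty(n, dtype=np.int64)
    prv = np.empty(n, dtype=np.int64)
    for i in range(n):
        nxt[i] = (i + 1) % n
        prv[i] = (i - 1) % n
    return prv, nxt

@njit
def remove_node(prv, nxt, i):
    # Unlink node i; neighbours are reached through the index arrays.
    p, n = prv[i], nxt[i]
    nxt[p] = n
    prv[n] = p

prv, nxt = circular_list(5)
remove_node(prv, nxt, 2)   # "ear" 2 clipped: traversal now skips index 2
```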
I see it's processing all points per polygon side at once; makes sense. You can't vectorize it reasonably for multiple polygons though!
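Roughly this shape, paraphrasing rather than quoting the profiling script: the crossing test is vectorized over all points, while the edge loop stays in Python, so cost grows with the vertex count:

```python
import numpy as np

def points_in_polygon(points, poly):
    """Crossing-number test: `points` is (n, 2), `poly` is (m, 2)."""
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    x1, y1 = poly[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        for x2, y2 in poly:  # Python-level loop over the polygon's edges
            # Vectorized over all query points for this one edge.
            crosses = ((y1 > y) != (y2 > y)) & (
                x < (x2 - x1) * (y - y1) / (y2 - y1) + x1
            )
            inside ^= crosses
            x1, y1 = x2, y2
    return inside

poly = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
pts = np.array([[1.0, 1.0], [3.0, 1.0]])
points_in_polygon(pts, poly)  # -> array([ True, False])
```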
Worst case, you get a NotImplementedError if an implementation doesn't have a given method. Have you added any sanity checking? The C code can fail badly if the input data aren't consistent :-) -- even if it doesn't hard-crash, it doesn't give the user any idea why it didn't work :-( So I'd like to have an optional check-the-input-for-correctness method of some sort.
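Something along these lines -- a hypothetical `check_input`, not an existing method in either implementation:

```python
import numpy as np

def check_input(vertices, faces, fill_value=-1):
    """Hypothetical pre-flight check: raise with a clear message instead of
    letting the compiled core crash or silently misbehave."""
    vertices = np.asarray(vertices)
    faces = np.asarray(faces)
    if vertices.ndim != 2 or vertices.shape[1] != 2:
        raise ValueError(f"vertices must have shape (n, 2), got {vertices.shape}")
    if not np.isfinite(vertices).all():
        raise ValueError("vertices contain NaN or inf")
    if faces.ndim != 2:
        raise ValueError(f"faces must be 2D, got shape {faces.shape}")
    valid = faces != fill_value
    if faces[valid].min() < 0 or faces[valid].max() >= len(vertices):
        raise ValueError("faces reference vertex indices that are out of bounds")
    if valid.sum(axis=1).min() < 3:
        raise ValueError("every face needs at least 3 non-fill vertices")
```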
Exactly -- and it's putting the "points" loop in numpy, but the "vertices" loop is in Python -- so a really large polygon would be a lot slower. As far as I have found, Shewchuk's code is still by far the best (only) option out there for constrained Delaunay -- which we very much need. We actually have our own wrappers around it -- I thought we'd published that, but maybe we didn't because of the licensing issue -- that is painful. I tried reaching out to him about it some years ago, but never got a reply -- maybe one of these days when I'm in Berkeley I could find him in person ... One of the key things we found with that code was that it would call sys.exit() on some error conditions, which would crash Python :-( -- so we called it in a sub-process -- other than that, I don't know if our code has any advantages. I haven't tested whether the py-triangle code has that particular problem. (I think it mostly showed up if you did a "quality" triangulation.) BTW: the https://github.com/asascience-open/xarray-subset-grid project may be of interest to Deltares -- it is yet ANOTHER attempt at a problem that's been addressed a number of times (including by a Deltares project, I think). We provided some funding to kick that off -- we are looking for a robust and universal solution that can hopefully continue to be extended. None of the existing codes did everything we need well. I'm not entirely sure why the developer decided to start from scratch, but he did -- and now it's in pretty useful shape. So where to go from here? It sounds like you're interested in more collaboration -- how about we put all this in one project? We can decide later to what extent it should be one package or multiple more focused packages. We could either start a new one or use this one -- your choice. And we could put it in NOAA's org, or Deltares' org, or even start a new org for this.
Fair enough.
Numba isn't memory safe and doesn't bounds-check by default either, so it can crash Python pretty easily. I haven't seen that happen with the celltree though. The whole thing feels pretty robust to me, since the tree is built from the bounding boxes of the input data. I guess it'll go wrong if you have overlapping faces, but does it really spin out of control? I only ended up adding this: https://deltares.github.io/numba_celltree/api.html#numba_celltree.CellTree2d.validate_node_bounds
The conda-recipe just uses this one: https://github.com/drufat/triangle By the way, the MODFLOW folks are just writing the triangle input files and calling the executable: https://flopy.readthedocs.io/en/latest/source/flopy.utils.triangle.html
I still think the best solution in the long term is via xarray with custom index objects. It requires the explicit indexes work (which has been in progress for quite some time): https://github.com/pydata/xarray/projects/1 I should mention, some of my colleagues are also developing this C++ library:
Yes, I would prefer collaboration. I'm pretty happy with the mesh-related packages I've set up so far (numba_celltree, pandamesh, xugrid), and I've been fairly productive (if I say so myself), but I'm the primary maintainer for all of them. Currently, from the perspective of "geometry operations on plain numpy arrays", numba_celltree provides what I need and I can easily extend it (I basically don't have to explain myself to anybody either...). However, we currently have two celltree implementations that are largely identical; it would make sense to move those together. I think consolidating stuff like mapbox_earcut would also be reasonable (as long as it takes plain numpy arrays), if we see a need. I think in the end we would have to make a decision on numba versus C/C++/Cython. I don't think it's tenable to have duplicate classes for everything; somebody compromising seems more reasonable (in terms of accepting numba / accepting ahead-of-time compilation). That might take some time. I think the question is: how do we make it worthwhile to get started in the meantime? For example, I need a celltree not for polygons, but for line segments (river networks): Deltares/numba_celltree#11 I still need more robust barycentric interpolation weights as well. Which currently missing functionality is most pressing for you? We'd start benefiting immediately if there's some shared stake there. (geometry_utils might not cut it as a name, by the way, since it seems taken on PyPI: https://pypi.org/project/geometry-utils/) Having said that, I also don't have all that much time on my hands at the moment to get things off the ground...
Hi, maybe relevant to your interests: I ended up porting celltree to numba because I rarely use C or C++ and I needed extra functionality (search for polygon and line intersects, not just points).
To do so, I needed a bunch of algorithms. The small ones can be found here: https://github.com/Deltares/numba_celltree/blob/main/numba_celltree/geometry_utils.py
E.g. the same point in polygon algorithm: https://github.com/Deltares/numba_celltree/blob/9da5d941f27cae2b42498206b8625ff3ff42b9e3/numba_celltree/geometry_utils.py#L93
I also included a number of other algorithms:
https://github.com/Deltares/numba_celltree/tree/main/numba_celltree/algorithms
I use the barycentric stuff for interpolation; cyrus_beck, liang_barsky, and sutherland_hodgman are for clipping; separating axes is used to identify whether two (convex) polygons intersect.
It's all pretty lightweight; I defined some Point, Vector named tuples. Everything has a higher-level function which vectorizes it.
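Schematically, that layout looks something like this (a simplified illustration of the pattern, not the actual geometry_utils code):

```python
from collections import namedtuple

import numpy as np
from numba import njit, prange

Point = namedtuple("Point", ["x", "y"])
Vector = namedtuple("Vector", ["x", "y"])

@njit(inline="always")
def cross_product(u, v):
    # Scalar kernel working on the named tuples.
    return u.x * v.y - u.y * v.x

@njit(parallel=True)
def cross_products(u_xy, v_xy):
    # Higher-level wrapper: vectorizes the scalar kernel over (n, 2) arrays.
    n = u_xy.shape[0]
    out = np.empty(n)
    for i in prange(n):
        u = Vector(u_xy[i, 0], u_xy[i, 1])
        v = Vector(v_xy[i, 0], v_xy[i, 1])
        out[i] = cross_product(u, v)
    return out
```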
Test coverage is okay -- but I'm sure it's still missing a lot of (literal) edge cases.
It uses numba instead of Cython, because I find it a lot easier to distribute and to debug (just switch JIT compiling off temporarily).
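(The debug switch referred to is numba's global JIT-disable flag, e.g.:)

```python
# Run the tests with compilation disabled so pdb and tracebacks point at the
# plain Python source; equivalently, set the NUMBA_DISABLE_JIT=1 env var.
# Must be set before the jitted functions are first called.
from numba import config

config.DISABLE_JIT = True
```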
If you were okay with numba, you could just copy it. But porting to Cython should also be straightforward -- it depends on where you'd like to go with this project. (There's quite some active development on the Python compilation front, see e.g. https://lpython.org/ or Mojo, etc. -- I might end up switching compilers.) Requirements are minimal: just numba and numpy.
Shared interests
It's worth mentioning: I think my interests are very similar -- I have numpy arrays that represent spatial data and I'd like to operate on them efficiently. Shapely is generally too expensive. Currently, whenever I need something new I just add it to numba_celltree, because most of it goes through there. But ideally, we'd have a nice Python package that's generally available. I'm also open to renaming numba_celltree and accepting PRs, of course!
Ideally, I'd also like to support 3D geometry, to add Ugrid3D unstructured grid support to numba_celltree. My gut feeling is that that's a lot more work than 2D, and it would be a shame to have it under-generalized or under-utilized. It's also possible that using vtk is just the much easier approach for 3D, but I haven't researched that very carefully.
Julia
Finally: whenever you're looking for readable implementations of computational geometry, it's worth checking out the Julia ecosystem:
https://github.com/JuliaGeometry
https://github.com/JuliaGeo/GeometryOps.jl
The Julia community, though small, seems to be moving forward a lot faster, because it's a straightforward choice to implement things in Julia and get good performance. In Python you have to choose between Cython, C, C++, Fortran, Rust, or some JIT compiler that supports a subset of Python -- all of which seems to generate so much friction and scatter efforts so that things don't get off the ground. Calling Julia from Python isn't very attractive at the moment, but the Julia people seem to be making exciting progress on static compilation. That would probably be my preferred solution -- at that point I'd port the celltree stuff again, but then to Julia. Until then, porting from Julia to Python is often very straightforward.