-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RELEASE] cuspatial v24.12 #1497
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Forward-merge branch-24.10 into branch-24.12
Merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Contributes to rapidsai/build-planning#106 This project doesn't need any of the changes that are being made as part of that issue across most other repos... it's already doing all of its `pip` and `conda` installs in CI in single installs (which is strict and safe) 🎉 This small PR just makes its handling of environment variables in `ci/build_docs.sh` consistent with other RAPIDS repos, in the interest of keeping those scripts looking similar so it's easy to apply changes across all RAPIDS projects. ref: rapidsai/cudf#17013 (comment) Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Mark Harris (https://github.com/harrism) - Mike Sarahan (https://github.com/msarahan) URL: #1469
This merge request fixes the CMake linking logic to cudftestutil for branch-24.12, allowing you to specify which version of GTest/GBench to use as long as the API is source compatible. Follows up on: rapidsai/cudf#16839 Authors: - Basit Ayantunde (https://github.com/lamarrr) Approvers: - Mark Harris (https://github.com/harrism) - Bradley Dice (https://github.com/bdice) URL: #1475
Fixes #1471. Fixes the invalid polygon geometry and improves geometry slicing tests to not depend on hard-coded vertex positions that are also hard-coded in the fixture. Authors: - Mark Harris (https://github.com/harrism) - Bradley Dice (https://github.com/bdice) Approvers: - Michael Wang (https://github.com/isVoid) - Bradley Dice (https://github.com/bdice) URL: #1472
A few minor documentation updates to align with the rest of RAPIDS. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Mark Harris (https://github.com/harrism) URL: #1476
Contributes to rapidsai/build-planning#108 Contributes to rapidsai/build-planning#111 Proposes building `libcuspatial` wheels with `--no-build-isolation`, to improve the rate of `sccache` cache hits and therefore reduce CI times. Also proposes printing `sccache` stats to CI logs after wheel and conda builds. ## Notes for Reviewers # Authors: - James Lamb (https://github.com/jameslamb) - Bradley Dice (https://github.com/bdice) Approvers: - Mike Sarahan (https://github.com/msarahan) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #1473
Contributes to rapidsai/build-planning#94 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1479
This PR is replacing the `VAULT_HOST` variable with `AWS_ROLE_ARN`. This is required to use the new token service to get AWS credentials. Authors: - Jordan Jacobelli (https://github.com/jjacobelli) Approvers: - Paul Taylor (https://github.com/trxcllnt) - Bradley Dice (https://github.com/bdice) - Jake Awe (https://github.com/AyodeAwe) URL: #1478
Contributes to rapidsai/build-planning#110 Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI. * checks on wheel size (compressed), - *to be sure they're under PyPI limits* - *and to prompt discussion on PRs that significantly increase wheel sizes* * checks on README formatting - *to ensure they'll render properly as the PyPI project homepages* - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/* Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Mark Harris (https://github.com/harrism) - Bradley Dice (https://github.com/bdice) URL: #1482
…AL (#1483) Contributes to rapidsai/build-planning#118 Modifies `libcuspatial.load_library()` in the following ways: * prefer wheel-provided `libcuspatial.so` to system installation * expose environment variable `RAPIDS_LIBCUSPATIAL_PREFER_SYSTEM_LIBRARY` for switching that preference * load `libcuspatial.so` with `RTLD_LOCAL`, to prevent adding symbols to the global namespace ([dlopen docs](https://linux.die.net/man/3/dlopen)) ## Notes for Reviewers ### How I tested this See "How I tested this" in rapidsai/cudf#17316 Also opened this PR originally pulling in built packages from rapidsai/cudf#17316 # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #1483
Contributes to rapidsai/build-planning#118 The pattern introduced in #1483 breaks editable installs in devcontainers. In that type of build, `libcuspatial.so` is built outside of the wheel but **not installed**, so it can't be found by `ld`. Extension modules in `cuspatial` are able to find it via RPATHs instead. This proposes: * try-catching the entire library-loading attempt, to silently do nothing in cases like that * ~adding an import of the `cuspatial` Python library in the `devcontainers` CI job, as a smoke test to catch issues like this in the future~ *(edit: removed those, [`devcontainer` builds run on CPU nodes](https://github.com/rapidsai/shared-workflows/blob/4e84062f333ce5649bc65029d3979569e2d0a045/.github/workflows/build-in-devcontainer.yaml#L19))* ## Notes for Reviewers ### How I tested this Tested this approach on rapidsai/kvikio#553 # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1484
Improves on a performance regression identified by @trxcllnt when using `quadtree_point_in_polygon` Seems like the core slowdown is that the passed `points` is a `GeoSeries`, and the x/y points are generated by interleaving them in `GeoColumnAccessor.xy` and then slicing x/y back out. Since these points appear to be held in a `cudf.ListDtype` (i.e. `[x, y]`), I tried avoiding interleaving and slicing the the x and y points individually in the lists using the `cudf.Series.list.get` API but this was failing for x/y points for `.polygons.x`/`.polygons.y` since some cuspatial APIs expect these points to be `float` (not sure if it's a cudf bug in `list.get` or how points as structured for polygons) Instead, I was able to get a 5x speedup by * Caching the interleaving step to avoid doing this multiple times * Slicing a cupy array of points instead of a `cudf.Series` (which uselessly slices an `.index` which is not used) <details> <summary>Benchmarking code</summary> ```python # 1.3G points.arrow # 722K polys.arrow import cudf import cupy import cuspatial import pyarrow as pa from time import perf_counter_ns polys = pa.ipc.open_stream("polys.arrow").read_all() polys = cudf.DataFrame.from_arrow(polys).rename(columns={"tract": "poly"}) point = polys["poly"].list.leaves.struct.explode() polys = cuspatial.GeoSeries.from_polygons_xy( point.interleave_columns(), polys["poly"]._column.elements.offsets, polys["poly"]._column.offsets, cupy.arange(0, len(polys) + 1) ) point = pa.ipc.open_stream("points.arrow").read_all() point = cudf.DataFrame.from_arrow(point) min_x = point["x"].min() max_x = point["x"].max() min_y = point["y"].min() max_y = point["y"].max() max_size = 0 max_depth = 16 threshold = 10 while max_size <= threshold: max_depth -= 1 max_size = len(point) / pow(4, max_depth) / 4 scale = max(max_x - min_x, max_y - min_y) / (1 << max_depth) point_xy = cuspatial.GeoSeries.from_points_xy(point.interleave_columns()) point_map, quadtree = ( cuspatial.quadtree_on_points(point_xy, min_x, max_x, min_y, max_y, scale, max_depth, max_size)) t0 = perf_counter_ns() cuspatial.quadtree_point_in_polygon( cuspatial.join_quadtree_and_bounding_boxes( quadtree, cuspatial.polygon_bounding_boxes(polys[0:1]), min_x, max_x, min_y, max_y, scale, max_depth ), quadtree, point_map, point_xy, polys[0:1] ) t1 = perf_counter_ns() print(f"{(t1 - t0) / (10 ** 6)}ms") t0 = perf_counter_ns() poly_offsets = cudf.Series(polys[0:1].polygons.part_offset)._column ring_offsets = cudf.Series(polys[0:1].polygons.ring_offset)._column poly_points_x = cudf.Series(polys[0:1].polygons.x)._column poly_points_y = cudf.Series(polys[0:1].polygons.y)._column from cuspatial._lib import spatial_join cudf.DataFrame._from_data( *spatial_join.quadtree_point_in_polygon( cuspatial.join_quadtree_and_bounding_boxes( quadtree, cuspatial.polygon_bounding_boxes(polys[0:1]), min_x, max_x, min_y, max_y, scale, max_depth ), quadtree, point_map._column, point["x"]._column, point["y"]._column, poly_offsets, ring_offsets, poly_points_x, poly_points_y, ) ) t1 = perf_counter_ns() print(f"{(t1 - t0) / (10 ** 6)}ms") # 127.406344ms # this PR # 644.963021ms # branch 24.10 ``` </details> Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Paul Taylor (https://github.com/trxcllnt) - Mark Harris (https://github.com/harrism) URL: #1446
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
github-actions
bot
added
conda
Related to conda and conda configuration
cmake
Related to CMake code or build configuration
Python
Related to Python code
labels
Dec 6, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
ci
cmake
Related to CMake code or build configuration
conda
Related to conda and conda configuration
libcuspatial
Relates to the cuSpatial C++ library
Python
Related to Python code
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
❄️ Code freeze for
branch-24.12
and v24.12 releaseWhat does this mean?
Only critical/hotfix level issues should be merged into
branch-24.12
until release (merging of this PR).What is the purpose of this PR?
branch-24.12
intomain
for the release