[RELEASE] dask-cuda v24.12 #1411

Merged
merged 15 commits into from
Dec 11, 2024

raydouglass
Member

❄️ Code freeze for branch-24.12 and v24.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.12 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-24.12 into main for the release

raydouglass and others added 14 commits September 19, 2024 11:46
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
The durations output was previously expanded to show all tests to help us debug timeouts. That level of detail is no longer as important, so this limits the report to the 50 longest-running tests to shorten the logs. We may soon remove it entirely if it remains unneeded.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1393
Contributes to rapidsai/build-planning#106

Proposes specifying the RAPIDS version in `conda install` calls that install CI artifacts, to reduce the risk of CI jobs picking up artifacts from other releases.
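
As a rough illustration of the pinning idea, the sketch below builds a `conda install` argument list in which every package is constrained to one RAPIDS release. The helper names, the `major.minor.*` spec form, and the channel path in the usage note are assumptions made for illustration; the actual CI scripts are shell scripts and may express the pin differently.

```python
def pinned_spec(package: str, rapids_version: str) -> str:
    """Return a conda match spec pinned to the major.minor of a RAPIDS version."""
    major, minor = rapids_version.split(".")[:2]
    return f"{package}={major}.{minor}.*"


def install_command(packages, rapids_version, channels=()):
    """Build a `conda install` argument list with every package pinned.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    cmd = ["conda", "install", "--yes"]
    for channel in channels:
        cmd += ["--channel", channel]
    cmd += [pinned_spec(package, rapids_version) for package in packages]
    return cmd
```

For example, `install_command(["dask-cuda"], "24.12.00", channels=["/tmp/ci-artifacts"])` yields a command that can only resolve 24.12 builds of `dask-cuda`, even if the channel also holds artifacts from another release.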

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #1395
This PR closes: #1281

Usage example:
```
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster(rmm_allocator_external_lib_list=["torch", "cupy"])
client = Client(cluster)
```

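Verify that it is working:
```
def get_torch_allocator():
    # Imported inside the function so it resolves on each worker.
    import torch

    # Name of the CUDA allocator backend PyTorch is currently using.
    return torch.cuda.get_allocator_backend()

client.run(get_torch_allocator)
```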

```
{'tcp://127.0.0.1:37167': 'pluggable',
 'tcp://127.0.0.1:38749': 'pluggable',
 'tcp://127.0.0.1:43109': 'pluggable',
 'tcp://127.0.0.1:44259': 'pluggable',
 'tcp://127.0.0.1:44953': 'pluggable',
 'tcp://127.0.0.1:45087': 'pluggable',
 'tcp://127.0.0.1:45623': 'pluggable',
 'tcp://127.0.0.1:45847': 'pluggable'}
```

Without it, the backend is `native`.
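
For readers unfamiliar with what "setting the RMM allocator" for an external library involves, here is a minimal sketch of how a list such as `["torch", "cupy"]` could be mapped to per-library setup functions. This is not dask-cuda's implementation: `resolve_allocator_setters` and the `_SETTERS` table are hypothetical names, and the sketch assumes the allocator hooks that RMM ships in `rmm.allocators.torch` and `rmm.allocators.cupy`.

```python
def _use_rmm_for_torch():
    # Imports are deferred so that only the requested libraries are loaded.
    import torch
    from rmm.allocators.torch import rmm_torch_allocator

    torch.cuda.memory.change_current_allocator(rmm_torch_allocator)


def _use_rmm_for_cupy():
    import cupy
    from rmm.allocators.cupy import rmm_cupy_allocator

    cupy.cuda.set_allocator(rmm_cupy_allocator)


# Hypothetical lookup table from library name to its setup function.
_SETTERS = {"torch": _use_rmm_for_torch, "cupy": _use_rmm_for_cupy}


def resolve_allocator_setters(libs):
    """Validate library names and return their setup functions (not yet called)."""
    unknown = [name for name in libs if name not in _SETTERS]
    if unknown:
        raise ValueError(f"Unsupported external libraries: {unknown}")
    return [_SETTERS[name] for name in libs]
```

Each returned function would then need to run on a worker before that library makes its first GPU allocation.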


Context: This gives NeMo-Curator a more stable way to use PyTorch with dask-cuda.

CC: @pentschev .

Authors:
  - Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1392
UCXX CI tests were previously disabled due to instabilities (see #1270 (comment)). UCXX should now be much more resilient, so we should re-enable them in preparation for the permanent migration to UCXX.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1396
Ignore the warnings, introduced in dask/dask#11437, that the legacy Dask DataFrame implementation will soon be removed.

The warning is only raised for `DASK_DATAFRAME__QUERY_PLANNING=False` cases.
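
A minimal sketch of the filtering technique, using only the standard library. The warning text and the `FutureWarning` category below are placeholders chosen for illustration, not the actual message emitted by Dask, and the real change may configure the filter in the test configuration rather than in code.

```python
import warnings

# Hypothetical stand-in for the real deprecation message.
LEGACY_MESSAGE = "The legacy Dask DataFrame implementation is deprecated"


def collect_unfiltered_warnings():
    """Emit two warnings and return the messages that survive the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Added after "always", so this more specific rule is checked first.
        warnings.filterwarnings(
            "ignore",
            message=".*legacy Dask DataFrame.*",
            category=FutureWarning,
        )
        warnings.warn(LEGACY_MESSAGE, FutureWarning)
        warnings.warn("an unrelated warning", FutureWarning)
    return [str(w.message) for w in caught]
```

Matching on the message keeps the filter narrow, so other warnings of the same category are still reported.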

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - James Lamb (https://github.com/jameslamb)

URL: #1397
Contributes to rapidsai/build-planning#108

This is a pure Python project, so it doesn't need configuration about CMake or `sccache`.

This proposes removing them to simplify build scripts a bit.

It also proposes updating the `rapids-dependency-file-generator` pre-commit hook to its latest version, something I'm trying to roll out across RAPIDS as part of rapidsai/build-planning#108.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1400
In cudf and cuml we have observed speedups of ~10% and ~20%, respectively, in pytest suite execution by switching the pytest traceback style to `--tb=native`:

```
currently:

102474 passed, 2117 skipped, 902 xfailed in 892.16s (0:14:52)

--tb=short:

102474 passed, 2117 skipped, 902 xfailed in 898.99s (0:14:58)

--tb=no:

102474 passed, 2117 skipped, 902 xfailed in 815.98s (0:13:35)

--tb=native:

102474 passed, 2117 skipped, 902 xfailed in 820.92s (0:13:40)
```

This PR makes a similar change to the `dask-cuda` repo.
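
As a quick check on the timings quoted in the log, the relative savings against the current default can be computed directly. For this particular run, `--tb=native` works out to roughly 8% faster and `--tb=short` to slightly slower. The variable names are illustrative only.

```python
# Wall-clock times in seconds, copied from the log above.
BASELINE_SECONDS = 892.16
TIMINGS = {
    "--tb=short": 898.99,
    "--tb=no": 815.98,
    "--tb=native": 820.92,
}


def percent_saved(baseline: float, seconds: float) -> float:
    """Percentage of the baseline runtime saved (negative means slower)."""
    return 100.0 * (baseline - seconds) / baseline


savings = {
    flag: round(percent_saved(BASELINE_SECONDS, seconds), 1)
    for flag, seconds in TIMINGS.items()
}
```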

xref: rapidsai/cudf#16851

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1389
Add support for initial warmup runs in benchmarks and allow profiling either all iterations or just the last one.

This is technically a breaking change since `--profile` now profiles all iterations, and the new `--profile-last` option profiles only the last one as `--profile` used to behave.
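
The behavior change can be sketched with a small `argparse` model. Only `--profile` and `--profile-last` are named in this PR; the `--warmup-runs` and `--runs` option names, the `PATH` arguments, and the `plan_runs` helper are assumptions made for illustration and may not match the benchmark scripts.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Benchmark option sketch")
    parser.add_argument("--runs", type=int, default=3)
    # Hypothetical option name for the new warmup support.
    parser.add_argument("--warmup-runs", type=int, default=1)
    parser.add_argument("--profile", metavar="PATH", default=None)
    parser.add_argument("--profile-last", metavar="PATH", default=None)
    return parser


def plan_runs(args):
    """Return (kind, profiled) for every iteration, warmups first."""
    plan = [("warmup", False)] * args.warmup_runs
    for i in range(args.runs):
        is_last = i == args.runs - 1
        profiled = args.profile is not None or (
            args.profile_last is not None and is_last
        )
        plan.append(("timed", profiled))
    return plan
```

In this model, warmup iterations are never profiled, `--profile` marks every timed iteration, and `--profile-last` marks only the final one.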

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1402
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*
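
The size check in the first bullet can be sketched with the standard library alone. The function name, the limits, and the fake wheel used in the demo are hypothetical; the CI change itself may rely on dedicated tooling rather than a script like this.

```python
import tempfile
from pathlib import Path


def check_wheel_size(path, max_bytes):
    """Return (within_limit, size_in_bytes) for a built wheel file."""
    size = Path(path).stat().st_size
    return size <= max_bytes, size


def demo():
    """Check a 2048-byte stand-in file against two hypothetical limits."""
    with tempfile.TemporaryDirectory() as tmp:
        fake_wheel = Path(tmp) / "example-0.0.0-py3-none-any.whl"
        fake_wheel.write_bytes(b"\0" * 2048)
        return check_wheel_size(fake_wheel, 1024), check_wheel_size(fake_wheel, 4096)
```

Because a wheel is already a zip archive, its size on disk is the compressed size that the first bullet refers to.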

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1404
Temporarily disable UCXX tests in CI due to some non-deterministic failures during the code freeze phase. They will be re-enabled after the 24.12 release.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1406
Handling the str vs. bytes discrepancy should have been covered by the changes in #1118.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - https://github.com/jakirkham

URL: #1130
raydouglass requested review from a team as code owners November 21, 2024 20:51
raydouglass requested review from msarahan and removed request for a team November 21, 2024 20:51
github-actions bot added the python, conda, and ci labels Nov 21, 2024
raydouglass merged commit e5059a5 into main Dec 11, 2024
5 of 6 checks passed
Labels
ci, conda, python
7 participants