rocm sdk 6.1.2 release #65

Open
lamikr opened this issue Jun 8, 2024 · 8 comments · Fixed by #161
Comments

@lamikr
Owner

lamikr commented Jun 8, 2024

I started porting to version 6.1.2, and the work is now on the wip/rocm_sdk_builder_612 branch.

Changelog so far:

  • all rocm base libraries updated to rocm-6.1.2
  • cmake 3.26.6 is now built in an early phase, after python
  • added separate binfo file for pytorch_python dependencies
  • dropped patches that have been merged upstream
  • other packages updated
    • openmpi 5.0.3
    • pytorch v2.3.1
    • pytorch vision v0.18.1
    • onnxruntime v1.18.0
    • deepspeed 31a57fa392 from June 7, 2024

Still planned: checking at least the aotriton, ucc and ucx package updates.

@jeroen-mostert
Contributor

On Manjaro (and presumably Arch) building this branch fails due to the msgpack dependency breaking. The fix is mentioned here: #27 (comment):

By changing line 105 of rocm_sdk_builder/src_projects/hipBLASLt/tensilelite/Tensile/Source/lib/CMakeLists.txt to find_package(msgpack REQUIRED NAMES msgpack msgpack-cxx msgpack-c) CMake is able to find the dependency and the build can continue.

But this doesn't seem to have landed as part of the general batch of fixes for Ubuntu.

I can't easily produce a pull at the moment as I'm spread across machines but FWIW I've attached the (unsigned) patch, main credit still goes to Dani.
hipBLASLt-000x-Tensilelite-fix-msgpack.patch.txt
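
The one-line change described above can be scripted as well as applied from the patch. The sketch below runs against a throwaway stand-in file, not the real CMakeLists.txt: the original content of line 105 is not quoted in this thread, so the "before" line, the line number 2, and the `replace_line` helper are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: swap one numbered line of a file for the find_package call
# quoted above, demonstrated on a temporary stand-in file.
set -eu

# replace_line FILE LINENO TEXT (hypothetical helper, not part of the repo)
replace_line() {
    file=$1; lineno=$2; text=$3
    tmp=$(mktemp)
    # Print TEXT in place of line LINENO, copy every other line unchanged.
    awk -v n="$lineno" -v t="$text" 'NR == n { print t; next } { print }' "$file" > "$tmp"
    # Write back through cat so the target keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Stand-in file; the find_package line here is an assumed "before" state.
demo=$(mktemp)
printf '%s\n' \
    '# stand-in for Tensile/Source/lib/CMakeLists.txt' \
    'find_package(msgpack REQUIRED)' \
    'add_library(example STATIC example.cpp)' > "$demo"

replace_line "$demo" 2 'find_package(msgpack REQUIRED NAMES msgpack msgpack-cxx msgpack-c)'
cat "$demo"
```

On a real checkout the same helper would be pointed at the path and line number given in the comment, but line numbers shift between versions, so the attached patch is probably the safer route.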

@lamikr
Owner Author

lamikr commented Jun 19, 2024

Thanks, I somehow originally thought that the fix was only needed in install_deps.sh. I should have read more carefully. Based on Dani's comment, I have now created a PR for this, with @daniandtheweb as the author of the patch and you also signing it off. Is that OK?

#76

@lamikr
Owner Author

lamikr commented Jun 21, 2024

The Manjaro and Arch Linux patch is now merged. I also updated the hipBLASLt patches to remove a patch that was not needed. I also got rid of the hardcoded rocm install dir and rocm llvm dir in the aotriton build by making patches that pass the install dir as a build parameter. A similar type of patch is coming to aotriton; I am just still testing it in different environments.

After those patches are in the 6.1.2 branch, there should, to my knowledge, no longer be any projects with a hardcoded install directory.

@lamikr
Owner Author

lamikr commented Jul 13, 2024

The first 6.1.2 release actually seems to contain much more new stuff and changes than I initially thought, but I think we should at least try to

@jeroen-mostert
Contributor

jeroen-mostert commented Aug 29, 2024

ROCm 6.2.0 was released almost a month ago now. Of course there's no rush in this particular project, but maybe we can consider 6.1.2 "good enough" to slap a release tag on it? Some of the binfos on master have started to target 6.2.0 already, it's probably a good idea to pick a commit at some point before that and stick a pin in it. Or switch to a rolling release, but I think there's worth in combining with a known ROCm version. There's no end to extra binfos you can add, so if that's the limiting factor you know there will never be a release. :)

@lamikr
Owner Author

lamikr commented Sep 2, 2024

I agree, we are getting close to ready. I had mostly used Fedora 40 and Mageia 9 for testing, and when I did builds on Ubuntu 22.04 and 24.04 I found some pytorch issues that needed to be fixed.

The Ubuntu 24.04 one especially was weird: unlike on other distros, the linking failed for missing symbols in hsa-runtime until I added the search and link commands for it to the cmakefiles. The set command for hsa-runtime64 that was already in the cmakefiles seemed not to be enough for Ubuntu, and I needed to add the link command separately. I don't know why it's behaving differently there; cmake on Ubuntu was 3.26.4... Just double testing on Fedora and Mageia that it did not break anything.

After that, only README.md needs to be updated with the new examples and babs.sh commands.

@jeroen-mostert
Contributor

Now that you mention that, I have also had build/link issues with the 64-bit lib split on Manjaro, even with all the appropriate env variables set, but these mostly concerned third-party applications not included with the repo itself. I didn't keep track of them as I went along; eventually I got sick enough of tweaking Makefiles that I just unified lib and lib64 and turned lib64 into a symlink (the way business is normally done on Arch/Manjaro anyway).
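
The lib/lib64 unification mentioned above can be sketched roughly as follows. This runs against a throwaway directory; the `prefix` layout and the library file names are made up for illustration and are not the actual SDK install tree.

```shell
#!/bin/sh
# Sketch only: merge lib64 into lib and leave lib64 behind as a symlink.
set -eu

# Throwaway prefix with one made-up library in each directory.
prefix=$(mktemp -d)
mkdir -p "$prefix/lib" "$prefix/lib64"
touch "$prefix/lib/liba.so" "$prefix/lib64/libb.so"

# Move the contents of lib64 into lib. A file with the same name in lib
# would be overwritten, so check for collisions first on a real tree.
mv "$prefix/lib64"/* "$prefix/lib/"

# rmdir fails if anything (e.g. a dotfile the glob skipped) is left behind,
# which stops the script before the symlink is created.
rmdir "$prefix/lib64"

# Relative symlink, so the prefix stays relocatable.
ln -s lib "$prefix/lib64"
ls -l "$prefix"
```

After this, anything that looks under lib64 resolves to the same files as lib, which is the layout Arch and Manjaro use for /usr/lib64.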

lamikr added a commit that referenced this issue Oct 8, 2024
- fixes: #65

Signed-off-by: Mika Laitio <[email protected]>
lamikr added a commit that referenced this issue Oct 9, 2024
- fixes: #65

Signed-off-by: Mika Laitio <[email protected]>
@lamikr lamikr closed this as completed in a9f715d Oct 9, 2024
@lamikr lamikr reopened this Oct 9, 2024
@lamikr
Owner Author

lamikr commented Oct 10, 2024

Did some release-related work:

  1. Tested the build again with Ubuntu 22.04, 24.04 and Fedora 40 and fixed 2 or 3 Ubuntu 22.04 related build errors that had appeared recently. There was also a dependency problem in the new hipTensor that I had added maybe one week ago; I needed to change it to build after composable kernel.
  2. ./babs.sh has been updated to include the ./babs.sh --update command. I think this was an important addition,
    as it makes updates much easier to handle: it cleans up all projects which have been changed so that
    they get rebuilt.
  3. README.md has seen a big update to show all changes in babs.sh functionality, added applications, changed examples, etc.
  4. The changelog still needs to be done before the release.
