XPU and MPS support #127

Open
jatkinson1000 opened this issue May 13, 2024 · 4 comments · May be fixed by #125
Labels: enhancement (New feature or request), hackathon

Comments

@jatkinson1000
Member

@ElliottKasoar has done some work to use the MPS backend on Apple silicon in #125.

He has also started similar efforts to add XPU support for Intel GPUs, but this may take more work if the backend isn't currently part of Torch.
@christopheredsall and @ma595 have offered to look at this on Dawn.
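For anyone testing the backends from the Python side, a minimal sketch of probing which one is available might look like the following. The helper name `pick_device` and the preference order (XPU, then MPS, then CUDA, then CPU) are assumptions made for illustration, not something FTorch defines. The `getattr` guards are there because `torch.xpu` may be missing from older PyTorch builds.

```python
from types import SimpleNamespace


def pick_device(torch_module):
    """Return the name of the first available backend, falling back to "cpu".

    `torch_module` is normally the imported `torch` package. The order
    below is an illustrative choice, not a recommendation.
    """
    # torch.xpu may not exist in older builds, so probe defensively.
    xpu = getattr(torch_module, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    # MPS availability is exposed under torch.backends.mps.
    backends = getattr(torch_module, "backends", None)
    mps = getattr(backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    cuda = getattr(torch_module, "cuda", None)
    if cuda is not None and cuda.is_available():
        return "cuda"

    return "cpu"


# Stand-in object so the sketch runs without PyTorch installed:
# it mimics a build where XPU is absent and MPS is available.
fake_torch = SimpleNamespace(
    xpu=SimpleNamespace(is_available=lambda: False),
    backends=SimpleNamespace(mps=SimpleNamespace(is_available=lambda: True)),
)
print(pick_device(fake_torch))  # prints "mps"
```

With a real installation this would be called as `pick_device(torch)` after `import torch`.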

@jatkinson1000 jatkinson1000 added the enhancement New feature or request label May 13, 2024
@jatkinson1000 jatkinson1000 linked a pull request May 13, 2024 that will close this issue
@jatkinson1000
Member Author

Looks like there is some XPU info in the PyTorch C++ docs: https://pytorch.org/cppdocs/api/dir_c10.html#dir-c10

@ElliottKasoar
Contributor

ElliottKasoar commented May 19, 2024

Sorry for not sharing more about what I've looked into on the XPU side. I would have liked to at least get something working with the basic Intel extension, which in theory I could run on CPU and is probably a sensible first step, but I've had no time.

What I'd found so far:

From PyTorch:

From Intel:

The find_package and variables set differ between the PyTorch and Intel CMakeLists.txt examples, so I suspect the PyTorch example is slightly outdated.

My initial thoughts for changes we would need were something along the lines of an ENABLE_INTEL flag, which we could use both to change the find_package and what we link to:

if(ENABLE_INTEL)
  find_package(IPEX REQUIRED)
else()
  find_package(Torch REQUIRED)
endif()
...
if(ENABLE_INTEL)
  target_link_libraries(${LIB_NAME} PRIVATE ${TORCH_IPEX_LIBRARIES})
else()
  target_link_libraries(${LIB_NAME} PRIVATE ${TORCH_LIBRARIES})
endif()

I can't check right now, but I think finding the package worked OK. The errors came when linking and were, I think, related to glibc versions.

@ElliottKasoar
Contributor

It's not immediately clear how this fits in with ipex, but it's worth noting that Intel GPUs (Data Center, for now) look like they should be supported natively as of PyTorch 2.4.

My initial reading is that ipex should still offer additional optimisation for CPU and/or GPU, and I think I saw a comment that there's "No change of the upstreaming goal". This potentially simplifies immediate setup, though, with ipex becoming more of a nice-to-have optimisation option than a requirement.

I may have misunderstood, though.

@ma595
Member

ma595 commented Oct 28, 2024

It seems that PyTorch now provides binaries built with XPU support from 2.5.0 onwards (as of 28th October, 2.5.1 has been released).
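A quick way to check whether an installed version is new enough is to compare the leading `major.minor` of its version string against 2.5. This is only a sketch: the helper name `has_xpu_binaries` is hypothetical, and the 2.5 threshold is taken from the observation above rather than from any official compatibility table.

```python
import re


def has_xpu_binaries(version, minimum=(2, 5)):
    """Return True if `version` (e.g. "2.5.1+xpu") is at least `minimum`.

    Only the leading "major.minor" is compared; suffixes such as
    "+xpu" or ".dev2024..." are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


print(has_xpu_binaries("2.5.1+xpu"))  # prints "True"
print(has_xpu_binaries("2.4.0"))      # prints "False"
```

In practice the string would come from `torch.__version__`. Comparing integer tuples rather than raw strings keeps a version such as 2.10 ordered after 2.5.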

Installation on Dawn. Allocate a node as usual:

srun -A <account> -p pvc --gres=gpu:1 -N 1 -t 01:00:0 --pty /bin/bash

module purge
module load default-dawn
module load python/3.11.9/gcc/7xr7o47s
python3 -m venv venv3
source venv3/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu

Importing torch then fails as follows:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/venv3-new-pytorch-binary/lib/python3.11/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/venv3-new-pytorch-binary/lib/python3.11/site-packages/torch/lib/libpti_view.so.0.9: undefined symbol: zelTracerRTASParallelOperationDestroyExpRegisterCallback

This suggests that the pti package needs to be built (https://github.com/intel/pti-gpu), specifically the ze_tracer.

Building pti is one option that still needs to be checked. Perhaps there is an easier approach?
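To narrow down where the missing symbol should come from, one option is to ask a candidate shared library directly whether it exports the name. The sketch below uses Python's standard `ctypes` for this. The helper name `exports_symbol` is hypothetical, and which library is supposed to provide the symbol is not established in this thread, so the path passed in is whatever candidate you want to test.

```python
import ctypes


def exports_symbol(library, symbol):
    """Return True if the loaded shared library exports `symbol`.

    `library` is a path or soname accepted by ctypes.CDLL. Passing None
    (POSIX only) inspects the symbols already loaded into the current
    process. An OSError is raised if the library itself cannot be loaded.
    """
    handle = ctypes.CDLL(library)
    try:
        # Attribute lookup on a CDLL resolves the symbol via the dynamic loader.
        getattr(handle, symbol)
    except AttributeError:
        return False
    return True


# The symbol reported in the traceback above.
MISSING = "zelTracerRTASParallelOperationDestroyExpRegisterCallback"

# Demonstration against the current process, which works without any
# Intel libraries installed.
print(exports_symbol(None, "printf"))
print(exports_symbol(None, MISSING))
```

On Dawn this would be called with the full path of a candidate library and `MISSING`. Loading `libpti_view.so.0.9` itself is expected to raise `OSError` for the same reason the import fails, so the check is only useful on the libraries it depends on.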
