XPU and MPS support #127

jatkinson1000 · 2024-05-13T11:57:01Z

@ElliottKasoar has done some work to use the MPS backend on Apple silicon in #125.

He has also started similar work to add XPU support for Intel GPUs, but this may take more effort if the backend isn't currently part of Torch.
@christopheredsall and @ma595 have offered to look at this on Dawn.

jatkinson1000 · 2024-05-19T08:01:22Z

Looks like there is some XPU info in the PyTorch C++ docs: https://pytorch.org/cppdocs/api/dir_c10.html#dir-c10

ElliottKasoar · 2024-05-19T22:11:53Z

Sorry for not sharing more about what I've looked into on the XPU side. I would have liked to see if I could at least do something with the basic Intel extension that in theory I could run on CPU, which is probably a sensible first step, but I've had no time.

What I'd found so far:

From PyTorch:

- PyTorch Tutorials for CPU and GPU via Python and CPU-only C++, including example CMakeLists.txt

From Intel:

- General Intel docs for CPU and GPU
  - The search doesn't seem to work for me, so it can be tricky to find things
  - XPU C++ API
- Intel C++ inference example with installation instructions (note the cppsdk package for installing)
  - Includes a useful link to a (CPU) example for C++, including an example app and CMakeLists.txt
  - Equivalent XPU example

The find_package and variables set differ between the PyTorch and Intel CMakeLists.txt examples, so I suspect the PyTorch example is slightly outdated.

My initial thoughts for changes we would need were something along the lines of an ENABLE_INTEL flag, which we could use both to change the find_package and what we link to:

if(ENABLE_INTEL)
  find_package(IPEX REQUIRED)
else()
  find_package(Torch REQUIRED)
endif()
...
if(ENABLE_INTEL)
  target_link_libraries(${LIB_NAME} PRIVATE ${TORCH_IPEX_LIBRARIES})
else()
  target_link_libraries(${LIB_NAME} PRIVATE ${TORCH_LIBRARIES})
endif()

I can't check right now, but I think finding the package worked OK and the errors came at link time. I think they were related to glibc versions.

ElliottKasoar · 2024-08-03T21:43:20Z

It's not yet clear to me exactly how this fits in with ipex, but it's worth noting that Intel GPUs (Data Center only, for now) look to be supported natively as of PyTorch 2.4.

My initial reading is that ipex still should offer additional optimisation for CPU and/or GPU, and I think I saw a comment that there's "No change of the upstreaming goal", but this potentially simplifies immediate setup, with ipex potentially becoming more of a nice-to-have optimisation option instead of a requirement.

I may have misunderstood, though.
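
If the native support works as described, a runtime check for the available backend could look roughly like the sketch below. This is only an illustration: `pick_device` is a hypothetical helper (not part of FTorch or PyTorch), the priority order XPU, then MPS, then CPU is an assumption, and the `fake_*` objects are stand-ins so the logic can be run without a PyTorch install. The only PyTorch calls it relies on are `torch.xpu.is_available()` and `torch.backends.mps.is_available()`.

```python
from types import SimpleNamespace


def pick_device(torch_module):
    """Return "xpu", "mps" or "cpu" for a torch-like module.

    Hypothetical helper: the XPU > MPS > CPU ordering is an assumption.
    Older builds may lack torch.xpu entirely, so look it up defensively.
    """
    xpu = getattr(torch_module, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    backends = getattr(torch_module, "backends", None)
    mps = getattr(backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


# Stand-ins for illustration only; these are not real PyTorch objects.
fake_xpu_build = SimpleNamespace(
    xpu=SimpleNamespace(is_available=lambda: True),
    backends=SimpleNamespace(mps=SimpleNamespace(is_available=lambda: False)),
)
fake_cpu_build = SimpleNamespace()  # no xpu or backends attributes at all

print(pick_device(fake_xpu_build))  # xpu
print(pick_device(fake_cpu_build))  # cpu
```

With a real install, the same function would be called as `pick_device(torch)` after `import torch`.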

ma595 · 2024-10-28T10:04:40Z

It seems that PyTorch provides binaries built with XPU support from 2.5.0 onwards (as of 28th October, 2.5.1 has been released).

Installation on Dawn. Allocate a node as usual:

srun -A <account> -p pvc --gres=gpu:1 -N 1 -t 01:00:0 --pty /bin/bash

module purge
module load default-dawn
module load python/3.11.9/gcc/7xr7o47s
python3 -m venv venv3
source venv3/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu

This fails as follows:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/venv3-new-pytorch-binary/lib/python3.11/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/venv3-new-pytorch-binary/lib/python3.11/site-packages/torch/lib/libpti_view.so.0.9: undefined symbol: zelTracerRTASParallelOperationDestroyExpRegisterCallback

This suggests that the pti package needs to be built (https://github.com/intel/pti-gpu), specifically ze_tracer.

This is one option that still needs to be checked. Perhaps there is an easier approach?

jatkinson1000 assigned christopheredsall, ma595 and ElliottKasoar May 13, 2024

jatkinson1000 added the enhancement New feature or request label May 13, 2024

jatkinson1000 linked a pull request May 13, 2024 that will close this issue

Add MPS and XPU devices #125

Open

jatkinson1000 added the hackathon label Sep 9, 2024

jatkinson1000 unassigned christopheredsall Sep 9, 2024
