
GPU inference not working #949

Open
charrezde opened this issue Dec 4, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@charrezde

Summary

Compilation works on CUDA, but FHE inference is not possible on CUDA: the forward pass converts its input to numpy by default, which only works on the CPU.
Will there be support for running inference on GPUs?

Description

  • versions affected: 1.7.0
  • python version: 3.10
  • config (optional: HW, OS): Ubuntu 24.04.1 LTS
  • workaround (optional): move the input to the CPU
  • proposed fix (optional):

Step by step procedure someone should follow to trigger the bug:

Minimal POC to trigger the bug:

# `model` and `images` (a torch model and an input batch) are defined elsewhere.
from concrete.fhe import Configuration
from concrete.ml.torch.compile import compile_torch_model

# Compile for GPU: the model and calibration inputs stay on the CPU,
# while the FHE circuit is compiled for the CUDA device.
images = images.cpu()
model = model.cpu()
compilation_device = "cuda"
n_bits = 6
rounding_threshold_bits = 6
compile_config = {
    "n_bits": n_bits,
    "rounding_threshold_bits": (
        {"n_bits": rounding_threshold_bits, "method": "APPROXIMATE"}
        if rounding_threshold_bits is not None
        else None
    ),
}
config = Configuration(enable_tlu_fusing=True, print_tlu_fusing=False)
compile_config.update(
    {
        "p_error": 0.05,
        "configuration": config,
    }
)
q_module = compile_torch_model(
    model, torch_inputset=images, **compile_config, device=compilation_device
)

# Inference with inputs on the GPU
images = images.to("cuda")
q_module.forward(images, fhe="disable")  # Error here

# Fix: inference with inputs on the CPU
images = images.cpu()
q_module.forward(images, fhe="disable")

Traceback from the failing call:
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/myuser/homomorphic-encryption/demo/resnet_fhe.py", line 373, in <module>
    main()
  File "/data/myuser/homomorphic-encryption/demo/resnet_fhe.py", line 352, in main
    evaluate_model_cml(q_module, mini_test_dataloader, fhe="disable", device=compilation_device)
  File "/data/myuser/homomorphic-encryption/demo/resnet_fhe.py", line 58, in evaluate_model_cml
    batch_outputs = q_module.forward(data, fhe=fhe)
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantized_module.py", line 465, in forward
    q_x = to_tuple(self.quantize_input(*x))
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantized_module.py", line 730, in quantize_input
    q_x = tuple(
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantized_module.py", line 732, in <genexpr>
    self.input_quantizers[idx].quant(x[idx])  # type: ignore[arg-type]
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py", line 758, in quant
    qvalues = numpy.rint(values / self.scale + self.zero_point)
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 1087, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

@charrezde charrezde added the bug Something isn't working label Dec 4, 2024
@jfrery
Collaborator

jfrery commented Dec 4, 2024

Hi @charrezde,

The input you pass to q_module.forward should be a numpy array, not a torch tensor. You can fix this with:

q_module.forward(images.detach().cpu().numpy(), fhe="disable")

However, GPU acceleration applies to the FHE execution itself, so you need to pass fhe="execute" to actually see GPU execution.
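Putting the two together, the corrected call would look like this (a sketch; `images` and `q_module` are the tensor and compiled module from the snippet above):

# Move the tensor to host memory as a numpy array, then request real FHE
# execution, which is the part the GPU backend accelerates.
q_module.forward(images.detach().cpu().numpy(), fhe="execute")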

@charrezde
Author

Hi @jfrery, this does not seem to be working for me:

q_module.forward(images.to("cuda"), fhe="execute")

This is failing due to:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

It converts the inputs to numpy by default.

If I run:

q_module.forward(images.detach().numpy(), fhe="execute")

I get my process killed.

@jfrery
Collaborator

jfrery commented Dec 4, 2024

this does not seem to be working for me

q_module.forward(images.to("cuda"), fhe="execute")
This is failing due to:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
It converts the inputs to numpy by default.

Yes, that is expected. We don't handle torch inputs for inference, only for compilation, so you need to convert them to numpy.
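For example (a sketch using the names from the original POC, with the compile settings abbreviated):

# Compilation accepts torch tensors (on the CPU)...
q_module = compile_torch_model(model.cpu(), torch_inputset=images.cpu(), n_bits=6, device="cuda")
# ...but inference expects numpy arrays in host memory.
q_module.forward(images.detach().cpu().numpy(), fhe="execute")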

If I run:

q_module.forward(images.detach().numpy(), fhe="execute")
I get my process killed.

Alright, I would need more information here:

  • What kind of message do you get?
  • What's your GPU?
  • What model are you trying to compile?
  • And what's the input image size?

It's likely that the GPU memory isn't enough to run the circuit. Encrypted data is much bigger than its plaintext equivalent.
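One way to watch GPU memory while the circuit runs (a minimal sketch; assumes the nvidia-ml-py package, imported as pynvml, is installed — watching nvidia-smi works just as well):

import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
for _ in range(60):  # sample once per second for a minute
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory: {info.used / 2**20:.0f} / {info.total / 2**20:.0f} MiB")
    time.sleep(1)
pynvml.nvmlShutdown()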

@charrezde
Author

  • message I get: Killed
  • GPU: NVIDIA Tesla T4, 16384 MiB
  • model: currently testing with ResNet-18, but I will use bigger models and YOLO detectors
  • image size: (224, 224, 3)

@jfrery
Collaborator

jfrery commented Dec 6, 2024

How many images are you trying to pass to the compilation? You should probably monitor the GPU memory, as I think this is the bottleneck. Maybe also try with a single image for the compilation and a single image for the test (see the sketch below); hopefully that fits within the 16 GB.

We run ResNet-like models on an H100. I am not sure you can do that on a T4.
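A minimal sketch of that suggestion (assuming `model` and `images` from the original POC; the compile settings mirror it):

# Compile and test with a single image to reduce GPU memory pressure.
single_image = images[:1].cpu()
q_module = compile_torch_model(
    model.cpu(),
    torch_inputset=single_image,
    n_bits=6,
    rounding_threshold_bits={"n_bits": 6, "method": "APPROXIMATE"},
    p_error=0.05,
    device="cuda",
)
q_module.forward(single_image.detach().numpy(), fhe="execute")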
