
How to install on macOS? #146

Closed
0xdevalias opened this issue Apr 9, 2023 · 13 comments

Comments


0xdevalias commented Apr 9, 2023

Originally posted as part of the following issue:

As part of that, I got: ModuleNotFoundError: No module named 'llama_inference_offload'

Which led me to this repo, where I tried to install the requirements as follows:

cd ..
# ..snip..

⇒ git clone git@github.com:qwopqwop200/GPTQ-for-LLaMa.git
# ..snip..

⇒ cd GPTQ-for-LLaMa
# ..snip..

⇒ pyenv local miniconda3-latest/envs/textgen
# ..snip..

⇒ pip install -r requirements.txt
Collecting git+https://github.com/huggingface/transformers (from -r requirements.txt (line 4))
  Cloning https://github.com/huggingface/transformers to /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-_6j4_tu0
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-_6j4_tu0
  Resolved https://github.com/huggingface/transformers to commit 656e869a4523f6a0ce90b3aacbb05cc8fb5794bb
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: safetensors==0.3.0 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages (from -r requirements.txt (line 1)) (0.3.0)
Collecting datasets==2.10.1
  Downloading datasets-2.10.1-py3-none-any.whl (469 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 469.0/469.0 kB 6.8 MB/s eta 0:00:00
Requirement already satisfied: sentencepiece in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages (from -r requirements.txt (line 3)) (0.1.97)
Collecting accelerate==0.17.1
  Using cached accelerate-0.17.1-py3-none-any.whl (212 kB)
ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0

But that resulted in the errors:

ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0

Looking at PyPI, there appears to be a 2.0.0 version of triton, so I'm not sure why it wouldn't be able to install it:

Looking at the built files for version 2.0.0:

I'm guessing it might be because there may not be a Python 3.10.x wheel built for macOS?

⇒ python --version
Python 3.10.9
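For what it's worth, pip reports "(from versions: none)" when none of a package's published wheels match the interpreter's platform tags, and triton's published wheels appear to be Linux-only, so no macOS tag would ever match regardless of Python version. A stdlib-only diagnostic sketch (nothing here is from this repo; it just prints what pip matches against):

```python
# Diagnostic sketch (stdlib only): print the interpreter/platform info
# that pip compares wheel tags against. If triton's published wheels are
# Linux-only, no macOS platform tag will match, hence "(from versions: none)".
import platform
import sys
import sysconfig

print("python version:", platform.python_version())
print("implementation:", sys.implementation.name)
print("platform tag:  ", sysconfig.get_platform())  # e.g. macosx-10.9-x86_64 or linux-x86_64
```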

0xdevalias commented Apr 9, 2023

It seems I still got the same error on Python 3.9.x:

⇒ python --version
Python 3.9.16

⇒ pip install -r requirements.txt
Collecting git+https://github.com/huggingface/transformers (from -r requirements.txt (line 4))
  Cloning https://github.com/huggingface/transformers to /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-wwj2wmga
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-wwj2wmga
  Resolved https://github.com/huggingface/transformers to commit 656e869a4523f6a0ce90b3aacbb05cc8fb5794bb
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: safetensors==0.3.0 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from -r requirements.txt (line 1)) (0.3.0)
Collecting datasets==2.10.1
  Using cached datasets-2.10.1-py3-none-any.whl (469 kB)
Requirement already satisfied: sentencepiece in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from -r requirements.txt (line 3)) (0.1.97)
Collecting accelerate==0.17.1
  Using cached accelerate-0.17.1-py3-none-any.whl (212 kB)
ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0

Potentially related:


0xdevalias commented Apr 9, 2023

It seems that triton will install from source OK though:

On my 2019 MacBook Pro (Intel), I followed the install instructions from here (along with a little extra to set up a conda environment to do it in):

As follows:

⇒ conda create -n textgen_py3_9_16 python=3.9.16
# ..snip..

⇒ conda activate textgen_py3_9_16
# ..snip..

⇒ git clone git@github.com:openai/triton.git
# ..snip..

⇒ cd triton/python
# ..snip..

⇒ pip install cmake
# ..snip..

⇒ pip install -e .
Obtaining file:///Users/devalias/dev/AI/text-generation-webui/repositories/triton/python
  Preparing metadata (setup.py) ... done
Requirement already satisfied: filelock in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.1.0) (3.11.0)
Installing collected packages: triton
  Running setup.py develop for triton
Successfully installed triton-2.1.0

Originally posted by @0xdevalias in triton-lang/triton#1465 (comment)

That installed version 2.1.0, but we can get 2.0.0 by doing the following:

⇒ git checkout v2.0.0
# ..snip..

⇒ pip install -e .
Obtaining file:///Users/devalias/dev/AI/text-generation-webui/repositories/triton/python
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cmake in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.0.0) (3.26.1)
Requirement already satisfied: filelock in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.0.0) (3.11.0)
Requirement already satisfied: torch in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.0.0) (2.0.0)
Collecting lit
  Downloading lit-16.0.0.tar.gz (144 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 145.0/145.0 kB 3.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: jinja2 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (3.1.2)
Requirement already satisfied: sympy in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (1.11.1)
Requirement already satisfied: typing-extensions in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (4.5.0)
Requirement already satisfied: networkx in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (3.1)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from jinja2->torch->triton==2.0.0) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from sympy->torch->triton==2.0.0) (1.3.0)
Building wheels for collected packages: lit
  Building wheel for lit (setup.py) ... done
  Created wheel for lit: filename=lit-16.0.0-py3-none-any.whl size=93582 sha256=90c6c50decf1b60e45356b3a993c62d719b6506090f7899d82f6e2f9ef0ff031
  Stored in directory: /Users/devalias/Library/Caches/pip/wheels/c7/ee/80/1520ca86c3557f70e5504b802072f7fc3b0e2147f376b133ed
Successfully built lit
Installing collected packages: lit, triton
  Attempting uninstall: triton
    Found existing installation: triton 2.1.0
    Uninstalling triton-2.1.0:
      Successfully uninstalled triton-2.1.0
  Running setup.py develop for triton
Successfully installed lit-16.0.0 triton-2.0.0

Once I did that, I could go back to this project and pip install -r requirements.txt completed successfully!
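After a from-source install like the above, a quick way to confirm what actually ended up in the environment is a sketch like this (illustrative only; it degrades gracefully when triton is absent):

```python
# Sanity check (illustrative): report whether triton is importable and,
# if so, which version, without crashing when it is missing.
import importlib.util

spec = importlib.util.find_spec("triton")
print("triton installed:", spec is not None)
if spec is not None:
    import triton
    print("triton version:", getattr(triton, "__version__", "unknown"))
```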


0xdevalias commented Apr 9, 2023

After a few little hacks (see linked issue comment below) I managed to get the main webUI to start and load the model:

But then, when it tries to generate from any prompt, it raises AssertionError: Torch not compiled with CUDA enabled, even though I passed --cpu through to the webui (though I suspect this project still tries to load it on the GPU despite that?):

Traceback (most recent call last):
  File "/Users/devalias/dev/AI/text-generation-webui/modules/callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/Users/devalias/dev/AI/text-generation-webui/modules/text_generation.py", line 220, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/dev/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 450, in forward
    out = QuantLinearFunction.apply(x.reshape(-1,x.shape[-1]), self.qweight, self.scales,
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/Users/devalias/dev/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 364, in forward
    output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq)
  File "/Users/devalias/dev/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 336, in matmul248
    output = torch.empty((input.shape[0], qweight.shape[1]), device='cuda', dtype=torch.float16)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Output generated in 0.31 seconds (0.00 tokens/s, 0 tokens, context 67)

Searching for hardcoded references to cuda:

These are the files that seem to be hardcoding the device:

  • DEV = torch.device('cuda:0')
  • DEV = torch.device('cuda:0')
  • DEV = torch.device('cuda:0')
  • GPTQ-for-LLaMa/quant.py

    Lines 335 to 358 in 9463299

    def matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq):
        output = torch.empty((input.shape[0], qweight.shape[1]), device='cuda', dtype=torch.float16)
        grid = lambda META: (triton.cdiv(input.shape[0], META['BLOCK_SIZE_M']) * triton.cdiv(qweight.shape[1], META['BLOCK_SIZE_N']),)
        matmul_248_kernel[grid](input, qweight, output,
                                scales, qzeros, g_idx,
                                input.shape[0], qweight.shape[1], input.shape[1], bits, maxq,
                                input.stride(0), input.stride(1),
                                qweight.stride(0), qweight.stride(1),
                                output.stride(0), output.stride(1),
                                scales.stride(0), qzeros.stride(0))
        return output

    def transpose_matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq):
        output_dim = (qweight.shape[0] * 32) // bits
        output = torch.empty((input.shape[0], output_dim), device='cuda', dtype=torch.float16)
        grid = lambda META: (triton.cdiv(input.shape[0], META['BLOCK_SIZE_M']) * triton.cdiv(output_dim, META['BLOCK_SIZE_K']),)
        transpose_matmul_248_kernel[grid](input, qweight, output,
                                          scales, qzeros, g_idx,
                                          input.shape[0], qweight.shape[1], output_dim, bits, maxq,
                                          input.stride(0), input.stride(1),
                                          qweight.stride(0), qweight.stride(1),
                                          output.stride(0), output.stride(1),
                                          scales.stride(0), qzeros.stride(0))
        return output
  • GPTQ-for-LLaMa/quant.py

    Lines 455 to 484 in 9463299

    def autotune_warmup(model, transpose=False):
        """
        Pre-tunes the quantized kernel
        """
        from tqdm import tqdm
        n_values = {}
        for _, m in model.named_modules():
            if not isinstance(m, QuantLinear):
                continue
            k = m.infeatures
            n = m.outfeatures
            if n not in n_values:
                n_values[n] = (k, m.qweight.cuda(), m.scales.cuda(), m.qzeros.cuda(), m.g_idx.cuda(), m.bits, m.maxq)
        print(f'Found {len(n_values)} unique N values.')
        print('Warming up autotune cache ...')
        for m in tqdm(range(0, 12)):
            m = 2 ** m  # [1, 2048]
            for n, (k, qweight, scales, qzeros, g_idx, bits, maxq) in n_values.items():
                a = torch.randn(m, k, dtype=torch.float16, device='cuda')
                matmul248(a, qweight, scales, qzeros, g_idx, bits, maxq)
                if transpose:
                    a = torch.randn(m, n, dtype=torch.float16, device='cuda')
                    transpose_matmul248(a, qweight, scales, qzeros, g_idx, bits, maxq)
        del n_values

Whereas in the text-generation-webui there appears to be code setting device=cpu:
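If those hardcoded device='cuda' references were ever made configurable, the usual pattern would be a small fallback helper along these lines. This is a hypothetical sketch, not code from either repo: pick_device is an invented name, and the availability flags are parameters so the logic can be shown without importing torch (in real code they would come from torch.cuda.is_available() and torch.backends.mps.is_available()). Note this only fixes tensor placement; the Triton kernels themselves would still need a non-CUDA backend.

```python
# Hypothetical device-fallback helper (invented for illustration; not part
# of GPTQ-for-LLaMa or text-generation-webui).
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"

print(pick_device(False, True))   # -> mps
print(pick_device(False, False))  # -> cpu
```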

@0xdevalias changed the title from "How to install on macOS? (ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none))" to "How to install on macOS?" on Apr 9, 2023
@qwopqwop200 (Owner)

macOS is not supported.


0xdevalias commented Apr 10, 2023

@qwopqwop200 Is it not supported because there are technical limitations that mean it can't be, or just not supported because you don't want to have to put in the extra effort/capacity/etc. to do so?

If the latter then I might look into it more, but if there are technical limitations preventing it, it would be good to know those up front.

@qwopqwop200 (Owner)

Currently, if you are not on Apple silicon, I think it will probably work. Apple silicon is not supported due to technical limitations.

@0xdevalias (Author)

Thanks for that :)

I have both an Intel 2019 MacBook Pro (which I was using for the above) and an Apple silicon M2 MacBook Pro (which I haven't tried to run anything on yet).

If you're able to, what are the technical limitations that currently prevent it from running on Apple silicon?

@qwopqwop200 (Owner)

To be precise, the biggest limitation is the lack of CUDA support; there are no other limitations.

@0xdevalias (Author)

So could the references to cuda as the device just be changed to mps or cpu or similar (which is what I was suggesting above in #146 (comment)), or are there CUDA-specific customisations happening in this repo's code?

@qwopqwop200 (Owner)

Currently, this code supports only CUDA users. I think a CPU implementation is possible, but I don't have the capacity to implement it.

@qwopqwop200 (Owner)

Currently an alternative to this is to use llama.cpp.

@0xdevalias (Author)

Ok, thanks for the info :)

@Erraoudy

Same issue here. Can anyone help with how to install on macOS?
