
Using 'flash_attention_2' for the 'attn_implementation' option results in a 'not installed' error for the 'flash_attn' python package #605

Open
mmartin9684-sil opened this issue Dec 5, 2024 · 4 comments

@mmartin9684-sil
Collaborator

When this option is used in a config.yml file to request Flash Attention for a model, the experiment reports an ImportError for the flash_attn package, as shown in the stack trace below.

params:
     attn_implementation: flash_attention_2

This occurs when launching an experiment onto the ClearML/AQuA server from the local command line.

[INFO|modeling_utils.py:3937] 2024-12-04 18:54:24,805 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--facebook--nllb-200-distilled-1.3B/snapshots/7be3e24664b38ce1cac29b8aeed6911aa0cf0576/pytorch_model.bin
[INFO|safetensors_conversion.py:61] 2024-12-04 18:54:24,942 >> Attempting to create safetensors variant
2024-12-04 18:54:25,205 - clearml.model - INFO - Selected model id: b7a7fa8e150a4282bdb5a592d8d5a342
[INFO|safetensors_conversion.py:24] 2024-12-04 18:54:25,215 >> Attempting to convert .bin model on the fly to safetensors.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 217, in <module>
    main()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 213, in main
    exp.run()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 46, in run
    self.train()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 72, in train
    model.train()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/hugging_face_config.py", line 783, in train
    AutoModelForSeq2SeqLM.from_pretrained(self._config.model, config=model_config),
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4091, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1617, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1747, in _check_and_enable_flash_attn_2
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
2024-12-04 13:54:40
Process failed, exit code 1
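
For anyone reproducing this, a quick way to confirm whether flash_attn is importable before enabling this option (a minimal sketch, not part of silnlp; assumes it is run in the same ClearML venv shown in the trace):

    # Check whether the flash_attn package that transformers requires for
    # attn_implementation="flash_attention_2" is importable in this environment.
    import importlib.util

    if importlib.util.find_spec("flash_attn") is None:
        print("flash_attn is not installed; 'flash_attention_2' will raise ImportError.")
    else:
        print("flash_attn is available.")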
mmartin9684-sil added the bug label on Dec 5, 2024
@isaac091
Collaborator

isaac091 commented Dec 5, 2024

Hi @mmartin9684-sil, that library isn't installed because FlashAttention2 does not support using padding tokens, which is something we need. The other options, "eager" and "sdpa", are supported.

@ddaspit
Collaborator

ddaspit commented Dec 5, 2024

@mmartin9684-sil You want to use sdpa and not flash_attention_2. As Isaac said, FlashAttention2 isn't fully supported for NLLB. SDPA provides similar optimizations and is compatible with NLLB.
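
For reference, here is a minimal sketch (using the standard transformers from_pretrained API rather than silnlp's wrapper) of what the config.yml setting corresponds to when the model is loaded:

    # Equivalent of `attn_implementation: sdpa` in config.yml; "eager" is the
    # other supported value, while "flash_attention_2" requires the flash_attn package.
    from transformers import AutoModelForSeq2SeqLM

    model = AutoModelForSeq2SeqLM.from_pretrained(
        "facebook/nllb-200-distilled-1.3B",
        attn_implementation="sdpa",
    )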

@mmartin9684-sil
Collaborator Author

Understood, the tests are being done with SDPA (thanks, Isaac!).

This issue is to track the problem with our FlashAttention support.

ddaspit moved this from 🆕 New to 📋 Backlog in SIL-NLP Research on Dec 5, 2024
ddaspit removed the bug label on Dec 5, 2024
@TaperChipmunk32
Collaborator

I added a note to the wiki that FlashAttention2 is not currently compatible with NLLB.
