
Using 'flash_attention_2' for the 'attn_implementation' option results in a 'not installed' error for the 'flash_attn' python package #605

Open
mmartin9684-sil opened this issue Dec 5, 2024 · 4 comments

@mmartin9684-sil
Collaborator

When this option is used in a config.yml file to request Flash Attention for a model, the experiment reports an ImportError for the flash_attn package, as shown in the stack trace below.

params:
     attn_implementation: flash_attention_2

This occurs when launching an experiment onto the ClearML/AQuA server from the local command line.

[INFO|modeling_utils.py:3937] 2024-12-04 18:54:24,805 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--facebook--nllb-200-distilled-1.3B/snapshots/7be3e24664b38ce1cac29b8aeed6911aa0cf0576/pytorch_model.bin
[INFO|safetensors_conversion.py:61] 2024-12-04 18:54:24,942 >> Attempting to create safetensors variant
2024-12-04 18:54:25,205 - clearml.model - INFO - Selected model id: b7a7fa8e150a4282bdb5a592d8d5a342
[INFO|safetensors_conversion.py:24] 2024-12-04 18:54:25,215 >> Attempting to convert .bin model on the fly to safetensors.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 217, in <module>
    main()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 213, in main
    exp.run()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 46, in run
    self.train()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 72, in train
    model.train()
  File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/hugging_face_config.py", line 783, in train
    AutoModelForSeq2SeqLM.from_pretrained(self._config.model, config=model_config),
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4091, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1617, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1747, in _check_and_enable_flash_attn_2
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
2024-12-04 13:54:40
Process failed, exit code 1
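
For anyone reproducing this, a quick way to confirm whether flash_attn is importable before enabling this option (a minimal sketch, not part of silnlp; assumes it is run in the same ClearML venv shown in the trace):

    # Check whether the flash_attn package that transformers requires for
    # attn_implementation="flash_attention_2" is importable in this environment.
    import importlib.util

    if importlib.util.find_spec("flash_attn") is None:
        print("flash_attn is not installed; 'flash_attention_2' will raise ImportError.")
    else:
        print("flash_attn is available.")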
mmartin9684-sil added the bug label on Dec 5, 2024
@isaac091
Collaborator

isaac091 commented Dec 5, 2024

Hi @mmartin9684-sil, that library isn't installed because FlashAttention2 does not support using padding tokens, which is something we need. The other options, "eager" and "sdpa", are supported.

@ddaspit
Collaborator

ddaspit commented Dec 5, 2024

@mmartin9684-sil You want to use sdpa and not flash_attention_2. As Isaac said, FlashAttention2 isn't fully supported for NLLB. SDPA provides similar optimizations and is compatible with NLLB.
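
For reference, here is a minimal sketch (using the standard transformers from_pretrained API rather than silnlp's wrapper) of what the config.yml setting corresponds to when the model is loaded:

    # Equivalent of `attn_implementation: sdpa` in config.yml; "eager" is the
    # other supported value, while "flash_attention_2" requires the flash_attn package.
    from transformers import AutoModelForSeq2SeqLM

    model = AutoModelForSeq2SeqLM.from_pretrained(
        "facebook/nllb-200-distilled-1.3B",
        attn_implementation="sdpa",
    )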

@mmartin9684-sil
Collaborator Author

Understood, the tests are being done with SDPA (thanks, Isaac!).

This issue is to track the problem with our FlashAttention support.

ddaspit moved this from 🆕 New to 📋 Backlog in SIL-NLP Research on Dec 5, 2024
ddaspit removed the bug label on Dec 5, 2024
@TaperChipmunk32
Collaborator

I added a note to the wiki that FlashAttention2 is not currently compatible with NLLB.
