When this option is set in a config.yml file to request Flash Attention 2 for a model, the experiment fails with an ImportError for the flash_attn package, as shown in the stack trace below.
params:
  attn_implementation: flash_attention_2
This occurs when launching an experiment on the ClearML/AQuA server from the local command line.
[INFO|modeling_utils.py:3937] 2024-12-04 18:54:24,805 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--facebook--nllb-200-distilled-1.3B/snapshots/7be3e24664b38ce1cac29b8aeed6911aa0cf0576/pytorch_model.bin
[INFO|safetensors_conversion.py:61] 2024-12-04 18:54:24,942 >> Attempting to create safetensors variant
2024-12-04 18:54:25,205 - clearml.model - INFO - Selected model id: b7a7fa8e150a4282bdb5a592d8d5a342
[INFO|safetensors_conversion.py:24] 2024-12-04 18:54:25,215 >> Attempting to convert .bin model on the fly to safetensors.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 217, in <module>
main()
File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 213, in main
exp.run()
File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 46, in run
self.train()
File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/experiment.py", line 72, in train
model.train()
File "/root/.clearml/venvs-builds/3.10/task_repository/silnlp.git/silnlp/nmt/hugging_face_config.py", line 783, in train
AutoModelForSeq2SeqLM.from_pretrained(self._config.model, config=model_config),
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4091, in from_pretrained
config = cls._autoset_attn_implementation(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1617, in _autoset_attn_implementation
cls._check_and_enable_flash_attn_2(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1747, in _check_and_enable_flash_attn_2
raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
2024-12-04 13:54:40
Process failed, exit code 1
Hi @mmartin9684-sil, that library isn't installed because FlashAttention2 does not support padding tokens, which we need. The other options, "eager" and "sdpa", are supported.
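As a quick sanity check (a sketch, not part of silnlp), you can confirm whether the optional flash_attn package is even present in the training environment before requesting it in config.yml:

# Check for the optional flash_attn package; if it is missing,
# attn_implementation must be "sdpa" or "eager" instead of "flash_attention_2".
import importlib.util

if importlib.util.find_spec("flash_attn") is None:
    print("flash_attn is not installed; use attn_implementation: sdpa or eager")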
@mmartin9684-sil You want to use sdpa and not flash_attention_2. As Isaac said, FlashAttention2 isn't fully supported for NLLB. SDPA provides similar optimizations and is compatible with NLLB.
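For reference, a minimal sketch (assuming a recent transformers version; this is not the silnlp code path) of loading the NLLB checkpoint from the trace with the SDPA backend, which needs no extra packages:

# Load NLLB with the built-in SDPA attention implementation.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-1.3B",
    attn_implementation="sdpa",  # "flash_attention_2" would raise ImportError without flash_attn
)

In the experiment's config.yml, the equivalent change is setting attn_implementation to sdpa under params.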