Add logit softcapping to GQA #876

Merged 4 commits on Oct 31, 2024

Conversation

kunal-vaishnavi
Contributor

Description

This PR adds the softcap attribute to the GroupQueryAttention op.
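For readers unfamiliar with the term, logit softcapping is commonly defined as `softcap * tanh(logits / softcap)`, applied to the attention scores before the softmax. The snippet below is a minimal pure-Python sketch of that formula, not the kernel added to the op. The function names, the value 50.0, and the convention that a softcap of 0 disables capping are assumptions made for illustration.

```python
import math


def softcap_logits(logits, softcap):
    """Squash each logit into (-softcap, softcap) using tanh.

    Assumption: softcap <= 0 means "disabled", so logits pass through unchanged.
    """
    if softcap <= 0.0:
        return list(logits)
    return [softcap * math.tanh(x / softcap) for x in logits]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention scores with extreme magnitudes.
raw_scores = [1.0e4, -1.0e4, 0.0, 12.5]
capped_scores = softcap_logits(raw_scores, softcap=50.0)
attention_weights = softmax(capped_scores)
```

Because tanh is bounded, the capped scores never exceed the softcap in magnitude, which keeps the inputs to the softmax in a range that lower-precision arithmetic can represent.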

Motivation and Context

This PR helps resolve the NaN output issue with Gemma-2 raised in issue #692.

@kunal-vaishnavi kunal-vaishnavi merged commit e222963 into main Oct 31, 2024
13 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the kvaishnavi/logit-softcapping branch October 31, 2024 00:35
@fxmarty-amd

fxmarty-amd commented Oct 31, 2024

ORT 1.19.2 does not support softcap. I later had:

onnxruntime_genai.onnxruntime_genai.OrtException: Load model from /home/fxmarty/repos/onnxruntime-genai/src/python/py/models/llama_oga/model.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float),"","","past_key_values.0.key": tensor(float),"past_key_values.0.value": tensor(float),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float),"sin_cache": tensor(float),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float),"present.0.key": tensor(float),"present.0.value": tensor(float),) , Error Unrecognized attribute: softcap for operator GroupQueryAttention

@kunal-vaishnavi
Contributor Author

The softcap attribute was added here. This PR was merged ahead of the upcoming stable releases of ONNX Runtime v1.20 and ONNX Runtime GenAI v0.5.0. Until those releases are out, you can build ONNX Runtime GenAI from source against a nightly ONNX Runtime build by copying the necessary files into ort.
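One way to catch the "Unrecognized attribute: softcap" failure before loading a model is to compare the installed ONNX Runtime version string against 1.20. The sketch below is a rough heuristic built only on the versions mentioned in this thread. The helper names are hypothetical, and a version string alone cannot prove that a given nightly build contains the attribute.

```python
import re

# Assumption taken from this thread: softcap is expected in ONNX Runtime v1.20.
MIN_SOFTCAP_VERSION = (1, 20)


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '1.19.2' or '1.20.0.dev20241030'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_supports_softcap(version):
    """Heuristic only: True when the version is at least 1.20."""
    return parse_major_minor(version) >= MIN_SOFTCAP_VERSION
```

In practice the string could come from `onnxruntime.__version__`. For example, `likely_supports_softcap("1.19.2")` is false, which matches the error reported above.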

@fxmarty-amd

@kunal-vaishnavi Thank you, makes sense

aciddelgado pushed a commit that referenced this pull request Nov 5, 2024
### Description

This PR adds the `softcap` attribute to the `GroupQueryAttention` op.

### Motivation and Context

This PR helps resolve the `NaN` output issue with Gemma-2 raised in
[this issue](#692).