Add logit softcapping to GQA #876

kunal-vaishnavi · 2024-09-05T17:19:12Z

Description

This PR adds the softcap attribute to the GroupQueryAttention op.
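The PR diff is not reproduced here. To show roughly what logit softcapping does, here is a pure-Python sketch. It applies softcap * tanh(score / softcap) to scaled dot-product scores before the softmax, which keeps every score strictly inside (-softcap, softcap). It also shows how grouped-query attention maps several query heads onto one shared KV head. The sketch makes three assumptions: a 1/sqrt(head_size) scale, softcap <= 0 meaning "disabled", and hypothetical helper names. It is not the ONNX Runtime kernel.

```python
import math
from typing import List


def softcap_logits(logits: List[float], softcap: float) -> List[float]:
    """Squash each logit into (-softcap, softcap) with softcap * tanh(x / softcap).

    In this sketch, softcap <= 0 means "no capping" (an assumption).
    """
    if softcap <= 0.0:
        return list(logits)
    return [softcap * math.tanh(x / softcap) for x in logits]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: List[float], keys: List[List[float]], softcap: float) -> List[float]:
    """Attention weights for one query: scale the dot products, softcap them, then softmax."""
    scale = 1.0 / math.sqrt(len(q))  # assumed 1/sqrt(head_size) scaling
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(softcap_logits(scores, softcap))


def kv_head_for(q_head: int, num_heads: int, kv_num_heads: int) -> int:
    """GQA: each consecutive group of num_heads // kv_num_heads query heads shares one KV head."""
    return q_head // (num_heads // kv_num_heads)
```

For example, softcap_logits([100.0, -100.0, 1.0], 50.0) pulls the two large scores just inside ±50 and leaves the small score almost unchanged. Because tanh is smooth, the ordering of the scores is preserved, so the softmax still favors the same keys.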

Motivation and Context

This PR helps resolve the NaN output issue with Gemma-2 raised in issue #692.

src/python/py/models/builder.py

fxmarty-amd · 2024-10-31T09:53:24Z

ONNX Runtime (ORT) 1.19.2 does not support softcap. Loading the model later failed with:

onnxruntime_genai.onnxruntime_genai.OrtException: Load model from /home/fxmarty/repos/onnxruntime-genai/src/python/py/models/llama_oga/model.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float),"","","past_key_values.0.key": tensor(float),"past_key_values.0.value": tensor(float),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float),"sin_cache": tensor(float),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float),"present.0.key": tensor(float),"present.0.value": tensor(float),) , Error Unrecognized attribute: softcap for operator GroupQueryAttention

kunal-vaishnavi · 2024-10-31T18:05:22Z

The softcap attribute was added here. This PR was merged to prepare for the stable releases of ONNX Runtime v1.20 and ONNX Runtime GenAI v0.5.0, which will happen soon. Until those releases are out, you can build ONNX Runtime GenAI from source against a nightly ONNX Runtime build by copying the necessary files into ort.
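One way a model builder could avoid producing a model that an older runtime rejects with "Unrecognized attribute: softcap" is to check the target ORT version before emitting the attribute. The sketch below does that. It assumes 1.20 is the first stable ONNX Runtime release whose GroupQueryAttention accepts softcap, based on the discussion here. The helper names and the dictionary-of-attributes shape are hypothetical and are not part of builder.py.

```python
from typing import Dict, Tuple, Union

# Assumption for this sketch: ONNX Runtime 1.20 is the first stable release whose
# GroupQueryAttention accepts `softcap` (1.19.2 rejects it as an unrecognized attribute).
SOFTCAP_MIN_ORT: Tuple[int, int] = (1, 20)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as '1.19.2' or '1.20.0.dev20241030'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def gqa_attributes(num_heads: int, kv_num_heads: int, softcap: float,
                   ort_version: str) -> Dict[str, Union[int, float]]:
    """Hypothetical helper: build GroupQueryAttention attributes for a target ORT version."""
    attrs: Dict[str, Union[int, float]] = {"num_heads": num_heads, "kv_num_heads": kv_num_heads}
    if softcap > 0.0:
        if parse_major_minor(ort_version) < SOFTCAP_MIN_ORT:
            # Fail at export time instead of writing a model the runtime refuses to load.
            raise ValueError(
                f"softcap={softcap} needs ONNX Runtime >= 1.20, got {ort_version}"
            )
        attrs["softcap"] = softcap
    return attrs
```

The helper raises an error instead of silently dropping softcap. Omitting the attribute would still load on ORT 1.19.2, but it would change the attention computation for models such as Gemma-2.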

fxmarty-amd · 2024-11-04T09:22:19Z

@kunal-vaishnavi Thank you, makes sense

kunal-vaishnavi added 2 commits August 9, 2024 21:08

Add logit softcapping for Gemma-2

b8889e7

Merge branch 'main' into kvaishnavi/logit-softcapping

48c0a07

kunal-vaishnavi mentioned this pull request Sep 5, 2024

try to build model gemma2 ,but failed. #692

Open

yufenglee reviewed Sep 6, 2024

src/python/py/models/builder.py

tianleiwu reviewed Sep 7, 2024

src/python/py/models/builder.py (outdated)

kunal-vaishnavi added the 0.5.0 label Oct 28, 2024

kunal-vaishnavi and others added 2 commits October 30, 2024 13:41

Change default epsilon value

1104235

Merge branch 'main' into kvaishnavi/logit-softcapping

2b507f9

tianleiwu approved these changes Oct 30, 2024

kunal-vaishnavi merged commit e222963 into main Oct 31, 2024
13 checks passed

kunal-vaishnavi deleted the kvaishnavi/logit-softcapping branch October 31, 2024 00:35

theshaneobrien mentioned this pull request Nov 1, 2024

model-qa.py is broken #1024

Closed
