Hi YaRN team,

Thank you for the awesome work. I'm currently evaluating several RoPE scaling methods, and fortunately they are all available in this repo. I have a question about the configuration of RoPE scaling.

I see that requirements.txt already pins transformers >= 4.34.0, which means I could use the "linear" and "dynamic" (NTK) scaling out of the box with transformers, simply by setting rope_scaling on the config returned by AutoConfig.from_pretrained():

```python
config.rope_scaling = {"type": "linear", "factor": args.linear}
```

or

```python
config.rope_scaling = {"type": "dynamic", "factor": args.dynamic_ntk}
```

I tried this, removed the patches for linear and dynamic-NTK, and the results look identical to those from your patched implementation. Moreover, transformers also supports the Falcon architecture (https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon/modeling_falcon.py#L162).

So my question is: are there any differences between these two implementations, or are your linear and dynamic-NTK patches there to keep the reproduction evals consistent?
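For reference, here is a minimal sketch of what the two scaling types do, based on my reading of the Llama rotary embedding code in transformers 4.34 (the tensor machinery is stripped down to plain Python, and the helper names are mine, not from either codebase):

```python
import math

def linear_scaled_positions(seq_len, factor):
    # "linear" scaling: position indices are divided by the factor;
    # the per-dimension rotary frequencies stay unchanged.
    return [t / factor for t in range(seq_len)]

def dynamic_ntk_inv_freq(seq_len, dim=128, base=10000.0,
                         max_position_embeddings=4096, factor=2.0):
    # "dynamic" (NTK) scaling: once the sequence exceeds the trained
    # context, the rotary base is enlarged instead of compressing the
    # positions (formula as in LlamaDynamicNTKScalingRotaryEmbedding).
    if seq_len > max_position_embeddings:
        base = base * ((factor * seq_len / max_position_embeddings)
                       - (factor - 1)) ** (dim / (dim - 2))
    # inv_freq[i] = 1 / base^(2i/dim), one entry per frequency pair
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
```

If both the repo's patch and transformers implement exactly these rules, identical results would be expected, which may be why only the reproduction setup differs.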