In the YaRN paper, rope_base=10000 (static YaRN) was used, yielding excellent extrapolation results. Could the authors clarify whether setting rope_base to 500000 while using YaRN would produce a synergistic effect, i.e., achieving results that surpass both YaRN (rope_base=10000) and NTK-aware (rope_base=500000)? @jquesnelle
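For context, the per-dimension frequency adjustment that YaRN applies on top of RoPE ("NTK-by-parts" interpolation plus an attention temperature) can be sketched with rope_base as a free parameter. This is a minimal sketch under stated assumptions, not the authors' implementation. The ramp bounds α=1 and β=32 are the values the YaRN paper suggests for LLaMA-family models. The head dimension 128, original context 4096, scale factor 16, and all function names are hypothetical choices made for illustration. The sketch only shows how the base shifts which dimensions get interpolated. It does not predict whether perplexity or passkey results would improve.

```python
import math


def rope_inv_freq(dim: int, base: float) -> list[float]:
    """Standard RoPE frequencies theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def yarn_ramp(theta: float, orig_ctx: int, alpha: float, beta: float) -> float:
    """YaRN ramp gamma(r), where r = orig_ctx / wavelength is the number of
    full rotations a dimension completes inside the original context."""
    r = orig_ctx / (2 * math.pi / theta)
    if r < alpha:
        return 0.0  # long wavelength: interpolate fully (divide by scale)
    if r > beta:
        return 1.0  # short wavelength: leave the frequency unchanged
    return (r - alpha) / (beta - alpha)  # linear blend in between


def yarn_inv_freq(dim, base, scale, orig_ctx, alpha=1.0, beta=32.0):
    """NTK-by-parts: h(theta) = (1 - gamma) * theta / scale + gamma * theta."""
    out = []
    for theta in rope_inv_freq(dim, base):
        g = yarn_ramp(theta, orig_ctx, alpha, beta)
        out.append((1 - g) * theta / scale + g * theta)
    return out


def yarn_mscale(scale: float) -> float:
    """YaRN attention temperature factor sqrt(1/t) = 0.1 * ln(scale) + 1."""
    return 0.1 * math.log(scale) + 1.0 if scale > 1 else 1.0


def count_by_regime(dim, base, orig_ctx, alpha=1.0, beta=32.0):
    """Count frequency pairs that are fully interpolated, blended, or unchanged."""
    counts = {"interpolated": 0, "blended": 0, "unchanged": 0}
    for theta in rope_inv_freq(dim, base):
        g = yarn_ramp(theta, orig_ctx, alpha, beta)
        key = "interpolated" if g == 0.0 else "unchanged" if g == 1.0 else "blended"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical setup: head dim 128, original context 4096, 16x extension.
    for base in (10_000.0, 500_000.0):
        print(base, count_by_regime(128, base, 4096))
        print("  mscale:", yarn_mscale(16.0))
```

Under these assumptions, moving rope_base from 10000 to 500000 pushes more frequency pairs past the original-context wavelength, so YaRN fully interpolates 32 of the 64 pairs instead of 18 and leaves fewer untouched (15 instead of 21). In other words, a larger base and YaRN's interpolation act on overlapping sets of dimensions, which is why the combined effect is not obviously additive and would need to be measured.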