I'd like to ask: in the Predictor class in icl/util_classes/predictor_classes.py, how exactly is prefix_idxs determined? Different datasets seem to set it in different ways:

    if task_name == 'sst2':
        self.prefix_idxs = [tokenizer.encode('Sentiment', add_special_tokens=False)[-1],
                            tokenizer.encode(':', add_special_tokens=False)[0]]
    elif task_name == 'agnews':
        self.prefix_idxs = [tokenizer.encode('Answer', add_special_tokens=False)[-1],
                            tokenizer.encode(':', add_special_tokens=False)[0]]
    elif task_name == 'trec':
        self.prefix_idxs = [tokenizer.encode(' Type', add_special_tokens=False)[-1],
                            tokenizer.encode(':', add_special_tokens=False)[0]]
    elif task_name == 'emo':
        self.prefix_idxs = [tokenizer.encode('Emotion', add_special_tokens=False)[-1],
                            tokenizer.encode(':', add_special_tokens=False)[0]]

How should it be determined for other datasets, such as gsm8k? Thanks.
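Ah, this simply follows each task's prompt template listed in the appendix (the templates themselves were copied from how others prompt on these datasets), and then takes the two tokens immediately before the label.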
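The rule described above (the last token of the keyword plus the first token of the separator, i.e. the two tokens right before the label) can be written as a small generic helper. This is only a sketch: make_prefix_idxs and toy_encode are hypothetical names, and toy_encode is a character-level stand-in used so the example runs without downloading a model. With a real model you would pass something like lambda s: tokenizer.encode(s, add_special_tokens=False), as in the code from the question.

```python
from typing import Callable, List

# Any function that maps text to token ids without special tokens.
# With Hugging Face this would typically wrap
# tokenizer.encode(text, add_special_tokens=False) (assumption for illustration).
Encoder = Callable[[str], List[int]]


def make_prefix_idxs(encode: Encoder, keyword: str, sep: str = ":") -> List[int]:
    """Hypothetical helper: return the two token ids directly preceding the
    label in a template of the form '<keyword><sep> <label>'.

    Mirrors the pattern in the question's code: the keyword and the separator
    are encoded separately, then the keyword's last token and the separator's
    first token are taken.
    """
    keyword_ids = encode(keyword)
    sep_ids = encode(sep)
    if not keyword_ids or not sep_ids:
        raise ValueError("keyword and separator must each encode to at least one token")
    return [keyword_ids[-1], sep_ids[0]]


def toy_encode(text: str) -> List[int]:
    """Toy character-level tokenizer (demo only): one id per character."""
    return [ord(c) for c in text]


if __name__ == "__main__":
    # 'Sentiment' ends with 't' (id 116) and ':' has id 58 in this toy scheme.
    print(make_prefix_idxs(toy_encode, "Sentiment"))
```

Encoding the keyword and the separator separately, as the original code does, likely avoids a BPE tokenizer merging them into a single token. Note also that trec uses ' Type' with a leading space, so the keyword string has to match how it actually appears in the prompt.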
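The problem with gsm8k is this: if your prompt asks the model to explicitly output "The answer is xxx", then prefix_idxs would be the last two tokens of 'The answer is' (the two tokens right before xxx). But if the answer has to be extracted from free-form output, the code above can't be used. (Answer extraction on gsm8k seems to be done in all sorts of ways, and I'm not sure what the best approach is either: https://github.com/facebookresearch/llama/issues/325)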
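For the explicit-answer case described above, a minimal sketch could take the last two tokens of "The answer is" and then locate them in a token sequence, so that the next position is where the answer starts. How the Predictor actually consumes prefix_idxs is not shown in this thread, so find_label_position is a hypothetical helper illustrating one plausible use, and the word-level toy_encode with its tiny vocabulary is a stand-in for a real tokenizer. This does not cover the free-form extraction case.

```python
from typing import Callable, List, Optional

Encoder = Callable[[str], List[int]]

# Toy word-level vocabulary (demo only); unknown words map to UNK_ID.
TOY_VOCAB = {"The": 0, "answer": 1, "is": 2, "Step": 3, "one": 4, ".": 5, "42": 6}
UNK_ID = -1


def toy_encode(text: str) -> List[int]:
    """Toy tokenizer: split on whitespace and look each word up in TOY_VOCAB."""
    return [TOY_VOCAB.get(word, UNK_ID) for word in text.split()]


def answer_prefix_idxs(encode: Encoder, phrase: str = "The answer is") -> List[int]:
    """Hypothetical helper: the last two tokens of the answer phrase."""
    ids = encode(phrase)
    if len(ids) < 2:
        raise ValueError("answer phrase must encode to at least two tokens")
    return ids[-2:]


def find_label_position(token_ids: List[int], prefix_idxs: List[int]) -> Optional[int]:
    """Return the index right after the LAST occurrence of prefix_idxs,
    i.e. where the answer should begin, or None if the prefix is absent."""
    n = len(prefix_idxs)
    # Scan backwards so the final "The answer is" wins if it appears more than once.
    for start in range(len(token_ids) - n, -1, -1):
        if token_ids[start:start + n] == prefix_idxs:
            return start + n
    return None


if __name__ == "__main__":
    prefix = answer_prefix_idxs(toy_encode)
    seq = toy_encode("Step one . The answer is 42")
    pos = find_label_position(seq, prefix)
    print(prefix, pos, seq[pos] if pos is not None else None)
```

With a real BPE tokenizer, "is" in the middle of a sentence is usually encoded with a leading space, so encoding the phrase in isolation may not produce the same ids it has in context; it is worth checking against an actually tokenized prompt.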