
Create dataset loader for NusaTranslation MT #356

Closed
SamuelCahyawijaya opened this issue Jul 9, 2023 · 4 comments
Comments

@SamuelCahyawijaya
Member

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?nusa_translation_mt

Dataset: nusa_translation_mt
Description: NusaTranslation is a sentence-level dataset covering 11 local languages of Indonesia. It was human-translated from part of the IndoLEM Sentiment and EmoT datasets: given an Indonesian sentence, a native-speaker annotator was asked to translate it into the target language. The data covers ~72k sentence pairs.
License: CC-BY-NC-SA 4.0
@catlaughing
Contributor

#self-assign

@SamuelCahyawijaya
Member Author

Closed in #364

@fhudi
Contributor

fhudi commented Sep 13, 2023

@SamuelCahyawijaya @catlaughing
Issues/bugs found in this dataset:

  1. Batak (btk) cannot be loaded
  2. Unable to load complete language pairs
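A quick way to detect the second problem is to compare the language pairs a loader actually exposes against the pairs you expect. The sketch below is hypothetical and not part of nusacrowd: it assumes Indonesian is coded as `ind`, that every local language is paired with Indonesian, and (by default) that both translation directions should be present. The function names and the language subset in the usage note are illustrative only, not the full list of 11 languages.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def expected_pairs(local_langs: Iterable[str], pivot: str = "ind",
                   both_directions: bool = True) -> Set[Pair]:
    """Build the set of (source, target) pairs a complete loader should expose.

    Assumption: each local language is paired with the pivot language
    (Indonesian, "ind"); optionally in both directions.
    """
    pairs: Set[Pair] = set()
    for lang in local_langs:
        pairs.add((pivot, lang))
        if both_directions:
            pairs.add((lang, pivot))
    return pairs


def missing_pairs(local_langs: Iterable[str], loaded: Iterable[Pair],
                  pivot: str = "ind", both_directions: bool = True) -> List[Pair]:
    """Return the expected pairs that were not loaded, sorted for stable output."""
    expected = expected_pairs(local_langs, pivot, both_directions)
    return sorted(expected - set(loaded))
```

For example, with an illustrative subset where Batak failed to load, `missing_pairs(["btk", "jav"], [("ind", "jav"), ("jav", "ind")])` returns `[('btk', 'ind'), ('ind', 'btk')]`.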

@SamuelCahyawijaya
Member Author

@fhudi: The issues have been fixed. Please try updating to nusacrowd==0.1.2.
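To confirm which nusacrowd release is installed before re-testing, a small check like the one below can help. It is a minimal sketch using only the standard library (`importlib.metadata.version`). It assumes that releases newer than 0.1.2 also include the fix. The version parser is a simplified, hypothetical helper: it reads only the leading numeric parts and ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Simplified parser: keep leading numeric components, e.g. "0.1.2" -> (0, 1, 2).

    Stops at the first non-numeric component, so pre-release suffixes are ignored.
    """
    parts = []
    for part in v.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed: str, required: str = "0.1.2") -> bool:
    """Assumption: any release at or above `required` contains the loader fix."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = version("nusacrowd")
    except PackageNotFoundError:
        print("nusacrowd is not installed")
    else:
        status = "OK" if has_fix(installed) else "please upgrade to 0.1.2 or later"
        print(f"nusacrowd {installed}: {status}")
```

The comparison relies on Python's element-wise tuple ordering, so `0.1.10` correctly sorts after `0.1.2`.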
