Where is the source file on the nlpl page exactly? #3

Open
tgmerritt opened this issue Jul 16, 2021 · 1 comment

Comments

@tgmerritt

The notebook references http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.it.gz as the source. When I visit the linked opus.nlpl.eu page, I see a grid with a bunch of LANG.xml.gz files, but I cannot seem to locate a file for any language other than Italian. Can you link me to the exact page where I can find alternatives to the Italian file, so that I can train the model with a different data source?

@frankplus
Owner

https://opus.nlpl.eu/OpenSubtitles-v2018.php is the page with all the conversational datasets provided by OpenSubtitles.
Look for the first row of the second table, which corresponds to the monolingual plain text files (tokenized).
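
For fetching another language from a script, a minimal sketch follows. It assumes the other monolingual files use the same URL pattern as the Italian link in the notebook, with only the language code changed; that pattern is inferred from the single URL above, so check it against the table on the page. The names `OPUS_MONO_URL`, `mono_url`, and `download_mono` are hypothetical helpers, not part of the notebook.

```python
import gzip
import shutil
import urllib.request

# Pattern copied from the Italian link in the notebook. ASSUMPTION: other
# languages differ only in the language code (e.g. "es", "fr").
OPUS_MONO_URL = (
    "http://opus.nlpl.eu/download.php"
    "?f=OpenSubtitles/v2018/mono/OpenSubtitles.{lang}.gz"
)


def mono_url(lang: str) -> str:
    """Build the assumed download URL for a language code."""
    return OPUS_MONO_URL.format(lang=lang)


def download_mono(lang: str, dest: str) -> str:
    """Download the gzipped monolingual file and write it decompressed to dest."""
    with urllib.request.urlopen(mono_url(lang)) as response:
        # Decompress on the fly while streaming to disk.
        with gzip.GzipFile(fileobj=response) as unzipped, open(dest, "wb") as out:
            shutil.copyfileobj(unzipped, out)
    return dest
```

Calling `download_mono("es", "OpenSubtitles.es.txt")` would then attempt to fetch the Spanish file, provided the assumed pattern holds for that language. These files can be large, so the download may take a while.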
