The translate pipeline does not perform punctuation normalization #611

Open
benjaminking opened this issue Dec 19, 2024 · 1 comment
Labels: bug (Something isn't working), pipeline 6: infer (Issue related to using a trained model to translate.)

Comments

@benjaminking
Collaborator

When performing translation, whether through translate.py or experiment.py with --translate, the Moses punctuation normalizer is not used. This does not match the other pipelines or what is done in Serval. Currently, a sentence could be translated differently with test.py and translate.py.
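To illustrate the mismatch described above, here is a minimal sketch of applying the same punctuation normalization step before inference. It is not the project's code: `simple_normalize`, `translate_sentences`, and the `fake_model` stand-in are hypothetical names, and the character map is a tiny illustrative subset, not the Moses rule set. In a real pipeline one could presumably pass `MosesPunctNormalizer(lang="en").normalize` from the `sacremoses` package as `normalize_fn` instead; whether this project wires it in that way is an assumption.

```python
from typing import Callable, Iterable, List

# Illustrative subset only: the Moses punctuation normalizer applies many
# more rules than these. This map exists just to keep the sketch stdlib-only.
_PUNCT_MAP = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2026": "...", # horizontal ellipsis
})


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in for a punctuation normalizer."""
    return text.translate(_PUNCT_MAP)


def translate_sentences(
    sentences: Iterable[str],
    translate_fn: Callable[[str], str],
    normalize_fn: Callable[[str], str] = simple_normalize,
) -> List[str]:
    """Normalize each sentence before handing it to the model.

    Running the same normalize_fn in every pipeline (test and translate)
    means the model sees identical input for identical source text.
    """
    return [translate_fn(normalize_fn(s)) for s in sentences]


def fake_model(text: str) -> str:
    """Hypothetical model stand-in whose output depends on the exact input."""
    return f"<{text}>"


if __name__ == "__main__":
    curly = ["\u201cHello\u201d, she said\u2026"]
    straight = ['"Hello", she said...']
    # With normalization in place, both spellings reach the model identically.
    print(translate_sentences(curly, fake_model))
    print(translate_sentences(straight, fake_model))
```

Passing the normalizer in as a parameter keeps the translate path and the test path on one shared function, which is the property the issue says is currently missing.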

@benjaminking benjaminking added bug Something isn't working pipeline 6: infer Issue related to using a trained model to translate. labels Dec 19, 2024
@benjaminking benjaminking self-assigned this Dec 19, 2024
@benjaminking
Collaborator Author

I have a fix written and tested for NLLB. Is it worth committing this change since NLLB is the dominant use case? Or is it worth testing for other models like Madlad first?
