Implemented support for other languages (tested with Portuguese) | Add Google Translate #1596

Open
wants to merge 1 commit into base: main


joaorura
Contributor

Some small models, such as Llama 3.1 or 3.2 (3B or 2B), especially quantized ones, have difficulty with prompts in Portuguese. They occasionally generate text in English, and this gets worse when part of the prompt is itself in English.
For this reason, it is worthwhile to translate every part of the prompt.
However, small models often translate poorly.
Support for Google Translate is therefore a quick and practical way to translate the prompts without needing a more robust model.
In addition, there was no support for languages other than English, so I added support for some more languages using the NLTK segmenter implementation.
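As a rough illustration of language-aware segmentation with NLTK, here is a minimal sketch. It is not the code from this PR: `split_sentences` is a hypothetical helper, and the regex fallback is an assumption added so the example still runs when NLTK or its Punkt data is not installed.

```python
import re
from typing import List


def split_sentences(text: str, language: str = "portuguese") -> List[str]:
    """Split text into sentences, preferring NLTK's Punkt models when available.

    Hypothetical helper for illustration only; not part of the PR.
    """
    try:
        # NLTK is treated as an optional dependency here (an assumption).
        from nltk.tokenize import sent_tokenize

        return sent_tokenize(text, language=language)
    except (ImportError, LookupError):
        # NLTK or its Punkt data is missing: fall back to a naive split
        # on whitespace that follows sentence-ending punctuation.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


if __name__ == "__main__":
    for sentence in split_sentences("Olá, mundo. Tudo bem?"):
        print(sentence)
```

The `language` argument selects the Punkt model, which is what lets the segmenter handle languages other than English.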

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Oct 29, 2024
@joaorura
Contributor Author

Another problem I had with the translation was that loading by hash never worked in any of my attempts. I noticed that the hash always changed between runs.
I realized that this is because the hash Python computes for strings does not stay constant across different executions.
I fixed this and implemented the fix in a generic way.
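For context, Python's built-in `hash()` for strings is randomized per process by default (it is controlled by `PYTHONHASHSEED`), so its value cannot be used as a key that is saved in one run and looked up in another. A minimal sketch of a run-independent alternative follows. `stable_hash` is a hypothetical name, and the choice of SHA-256 is an assumption, not necessarily what the PR implements.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Return an integer hash of `text` that is identical across runs.

    Hypothetical helper: unlike built-in hash(), the result depends only
    on the UTF-8 bytes of the input, not on the per-process hash seed.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return int(digest, 16)


if __name__ == "__main__":
    # Built-in hash: may differ every time the interpreter starts.
    print("built-in:", hash("prompt text"))
    # Digest-based hash: the same value on every run and every machine.
    print("stable:  ", stable_hash("prompt text"))
```

Running the script twice should show the first value changing while the second stays the same, which is the behavior needed for matching saved prompts by hash.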

@jjmachan
Member

hey @joaorura thanks a lot for putting together this PR 🙂

I'm not sure about this strategy, though. Even if we do go with it, we would have to refactor it to be optional so that it doesn't add any new dependencies, and it seems like overkill.

@shahules786 what do you think?
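One way the "optional" refactor mentioned above could look is sketched below. This is only an assumption about the shape of such a change: `load_optional_translator`, `translate_prompt`, and the module name `hypothetical_translator` are made up for illustration and do not refer to any real package or to code in this PR.

```python
import importlib
from typing import Callable, Optional

# A translator takes (text, target_language) and returns translated text.
Translator = Callable[[str, str], str]


def load_optional_translator(
    module_name: str = "hypothetical_translator",
) -> Optional[Translator]:
    """Return the module's `translate` function, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The extra dependency is absent: the caller keeps working without it.
        return None
    return getattr(module, "translate", None)


def translate_prompt(
    text: str, target_language: str, translator: Optional[Translator] = None
) -> str:
    """Translate `text` if a translator is supplied; otherwise return it unchanged."""
    if translator is None:
        return text
    return translator(text, target_language)
```

With this shape the core package imports nothing new, and users who want machine translation install the extra dependency themselves and pass the translator in.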

@jjmachan jjmachan added the waiting 🤖 waiting for response. In none will close this automatically label Nov 8, 2024
@joaorura
Contributor Author

joaorura commented Nov 8, 2024

Keep in mind that, in addition to the Google Translate support and the translation of all the strings, there was also a language limitation that I was able to resolve. I don't know whether that is the strategy you mentioned, but I believe it is helpful and, for me, essential, since I would not be able to run the project without it.
There is also the hash problem: loading did not work because the hashes never matched.

@github-actions github-actions bot removed the waiting 🤖 waiting for response. In none will close this automatically label Nov 8, 2024
@shahules786
Member

@joaorura Adding Google Translate is overkill. One can use larger LLMs to do the translation if necessary. I would recommend closing this PR. @jjmachan

@joaorura
Contributor Author

joaorura commented Nov 15, 2024

> @joaorura Adding Google Translate is overkill. One can use larger LLMs to do the translation if necessary. I would recommend closing this PR. @jjmachan

@shahules786

Although using larger LLMs is always an option in principle, they are not necessarily available to us. To use them, you need either powerful hardware with the model weights available locally, or an API key with credit on some online service, which has to be configured and generally costs money. So when running smaller models, it is useful to have this resource. I honestly do not see why this is considered overkill. Its usefulness in the context of smaller LLMs seems quite clear.

3 participants