Let me describe how I got here, to show why improvements are needed and what they should be.
Initially, I wanted Telegram Desktop to recognize words like `expectedly` (it already accepts `unexpectedly`) or `autocorrect`.
The Telegram GUI gives no information at all about where the dictionary comes from; honestly, it should at least show an attribution nearby.
Telegram ships the `.dic` and `.aff` files, and `.aff` files support comments, so the information could at least be added there.
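To see what attribution or version information a shipped affix file already carries, one can read its leading comment block. This is a minimal Python sketch, assuming Hunspell's `#` line comments and assuming any metadata sits at the very top of the file; `leading_comments` and `read_aff_header` are hypothetical helper names.

```python
from __future__ import annotations

from pathlib import Path


def leading_comments(aff_text: str) -> list[str]:
    """Return the '#' comment lines at the top of a Hunspell .aff file.

    Collection stops at the first non-blank line that is not a comment
    (normally the first directive, e.g. SET or TRY); blank lines are skipped.
    """
    comments = []
    for line in aff_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines inside the header
        if not stripped.startswith("#"):
            break  # first directive reached, so the header is over
        # Drop the leading '#' marker(s) and surrounding whitespace.
        comments.append(stripped.lstrip("#").strip())
    return comments


def read_aff_header(path: str | Path) -> list[str]:
    """Read an .aff file from disk and return its leading comments."""
    # The SET directive may declare a non-UTF-8 encoding, so decode leniently.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return leading_comments(text)
```

Calling `read_aff_header` on the `.aff` file Telegram ships would show whatever header comments it currently carries.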
Then, through a combination of this lovingly stale-bot-locked issue, telegramdesktop/tdesktop#7960,
and someone linking me to the relevant part of the docs here, I learned that I was looking for Hunspell.
Hunspell itself does not seem well maintained: some PRs have been open for years, such as hunspell/hunspell#612 from 2018, which looks fairly simple.
Later I found out that LibreOffice (LO) is probably just using the upstream source from this repo somehow, but I wasted time figuring out whether the site Arch pulls from is actually up to date.
I also tried looking up the words I had problems with, plus the words from the 2021 PR here, as suggested, which led me to:
A) believe that they should indeed be there, as they have more "Should Include" stars than words that are already included.
B) be very confused, as the footnote on "larger (size 80) SCOWL size [1]" reads: "[1] The word was not in any of the speller dictionaries but was found in an larger SCOWL size. The smaller dictionaries included words up to size 60, and the larger dictionary include words up to size 70."
This does not help at all with figuring out what size my dictionary is, or where to look for the bigger one.
TL;DR
- Make sure it is EASY for people to contribute.
- Add comments with this information to the .aff files.
- Better explain SCOWL and its sizes on the aspell lookup page.
- What SHOULD projects attribute when using the Hunspell(?) dictionaries? I am sure Telegram upstream would be fine with adding a standardized link, but what should we tell them?
- Information that should be added at minimum, in my opinion:
  - Versioning info, so one knows which dictionary version one is looking at
  - Dictionary size info
  - A link to (or hardcoded text of) a quick FAQ on how to add a missing word; for example, info on the word checker used to look up words should be there.
  - A link to the repository to contribute to (this one(?))
  - Whatever else is necessary so that people do NOT have to go through the 9 circles of Hell that I went through, as described above
- Add info about the process, from getting a word included upstream here to getting it included downstream.
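The minimum information listed above could be emitted as a comment header by whatever script builds the dictionary. This is a minimal Python sketch; `DictionaryInfo`, `aff_header`, the field names, and the wording of each line are all hypothetical, not an existing upstream format, and the URLs are left as parameters for the maintainers to fill in.

```python
from dataclasses import dataclass


@dataclass
class DictionaryInfo:
    """Hypothetical metadata a dictionary build could record."""

    version: str     # e.g. the release/build date
    scowl_size: int  # SCOWL size used to generate the word list
    repo_url: str    # where to report missing words and contribute
    faq_url: str     # short "how do I get a word added?" guide


def aff_header(info: DictionaryInfo) -> str:
    """Render the metadata as '#' comment lines for the top of a .aff file."""
    lines = [
        f"Dictionary version: {info.version}",
        f"SCOWL size: {info.scowl_size} (larger sizes include rarer words)",
        f"Report missing words / contribute: {info.repo_url}",
        f"How to get a word added: {info.faq_url}",
    ]
    return "".join(f"# {line}\n" for line in lines)
```

Because every emitted line starts with `#`, Hunspell treats the header as comments, so it could be prepended to the affix file without changing spell-checking behavior.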
If I am barking up the wrong tree in some cases, please point me in the right direction; I am dazed and confused.
C0rn3j changed the title from "Add information about the project(s) to the final dictionary files" to "Make contributing easier - add information about the project(s) to the final dictionary files" on Dec 7, 2024.
A lot of these seem like downstream issues; in particular, Telegram could better document where the dictionary comes from. There is also a lot of outdated information out there that needs to be updated.
The way to suggest words is to just open an issue like you did. Version information is included in the README with the official dictionary. I am also open to adding it to the actual affix file. At the very end of the README there is information on how the dictionary is created, for example:
Build Date: Mon Dec 7 20:19:27 EST 2020
Wordlist Command: mk-list --accents=strip en_US 60
However, I agree that this information is easy to miss and rather cryptic to the average user. I am therefore open to adding more human-readable information to both the README and the affix file about how the dictionary is created, and in particular which SCOWL size is used.
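Until that information is spelled out, the SCOWL size can be read off the build lines at the end of the README. This is a minimal Python sketch, assuming the README contains lines in exactly the `Build Date:` / `Wordlist Command:` form quoted above, and assuming the last integer argument to `mk-list` is the SCOWL size (60 in the quoted example, which matches the "up to size 60" wording for the smaller dictionaries). `build_info` is a hypothetical helper name.

```python
import re


def build_info(readme_text: str) -> dict:
    """Extract build date, word-list command and SCOWL size from a README.

    Assumes lines of the form:
        Build Date: Mon Dec 7 20:19:27 EST 2020
        Wordlist Command: mk-list --accents=strip en_US 60
    and treats the last plain-integer argument of the command as the SCOWL size.
    """
    info = {}
    date = re.search(r"^[ \t]*Build Date:[ \t]*(.+)$", readme_text, re.MULTILINE)
    if date:
        info["build_date"] = date.group(1).strip()
    command = re.search(r"^[ \t]*Wordlist Command:[ \t]*(.+)$", readme_text, re.MULTILINE)
    if command:
        info["command"] = command.group(1).strip()
        # Keep only whitespace-separated tokens made entirely of digits.
        sizes = [tok for tok in info["command"].split() if tok.isdigit()]
        if sizes:
            info["scowl_size"] = int(sizes[-1])
    return info
```

For the two build lines quoted above, `build_info(...)["scowl_size"]` is `60`.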
The dictionaries section of the Hunspell README (https://github.com/hunspell/hunspell?tab=readme-ov-file#dictionaries) made me believe that LibreOffice is somehow the upstream source for the dictionaries.
I checked and found that the Arch Linux repos have https://archlinux.org/packages/extra/any/hunspell-en_us/ at version `2020.12.07`. In the LO repo, however, the dictionary has a 2021 commit: https://cgit.freedesktop.org/libreoffice/dictionaries/commit/en/en_US.dic?id=4fa94195b8136364dd40bf2b0366a0fe32058899
There are a lot of mentions of SCOWL and its sizes across the various docs, for example:
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/en/README_en_US.txt