Validate typos in the dictionary against a dictionary of valid words #1140
Originally posted by @Gelma in #1014 (comment) |
If you have aspell installed, you can dump the aspell word list. I checked it (the probably outdated version from Debian stable) with codespell. It results in
|
hunspell and en_GB results in
|
Thanks for this @sebweb3r. As you may have seen, we got some core stuff in via #1142. I'm not quite sure what my "other examples need checking" comment meant with regard to not closing this issue then. I'm a bit unclear which way your checks were done. Is this codespell run against aspell's and hunspell's dictionaries? Also
Originally posted by @peternewman in #1619 |
Sorry for not being precise. I dumped the aspell and hunspell dictionaries, then checked the dumps with codespell. So all of these lines are words that are "wrong" according to codespell but exist in aspell or hunspell. I'm not sure, though, whether one wants to delete all of these corrections. |
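For illustration, the check described above could be sketched roughly as follows. This assumes the dumped word list has one word per line and that codespell dictionary lines look like `typo->fix1, fix2,`; the function names are hypothetical and not part of codespell.

```python
def parse_entries(lines):
    """Parse 'typo->fix1, fix2,' lines into a {typo: [fixes]} mapping.

    The line format is an assumption about codespell's dictionary files.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or "->" not in line:
            continue  # skip blank or malformed lines
        typo, _, fixes = line.partition("->")
        entries[typo.strip()] = [f.strip() for f in fixes.split(",") if f.strip()]
    return entries


def typos_that_are_valid_words(entries, valid_words):
    """Return the 'typos' that also appear in the reference word list."""
    valid = {w.strip().lower() for w in valid_words if w.strip()}
    return sorted(t for t in entries if t.lower() in valid)
```

Anything this returns is a candidate for review rather than automatic deletion, since a word can be valid in one variant (en_GB) and still be a common typo elsewhere.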
I haven't seen #1142 yet, but I will have a closer look. |
Going the other way around, and running aspell against codespell's correct-words (generated with
(And possibly many more? I got bored of checking... 😉 (as there are many entries in codespell that aspell doesn't recognise, but a Google search suggests are still spelled correctly) ) If codespell is going to suggest corrections, those corrections ought to be spelled correctly 😀 |
@lurch that's why I never let spellcheckers automatically fix the errors. acknowledgment depends on enUS or enGB #1623 (One of the physics journals insists on using the variant without e. But they have both spellings on their introductions webpage :-) ) |
Ooops, I didn't realise that it had multiple spellings (like color / colour), sorry!
Cool 👍 |
So I added some checking, but we need #1485 to have a larger dictionary and fewer false positives, or we need to split the main dictionary and rare into corrections that are in the dictionary and those that aren't, so we can prioritise more carefully checking the non-dictionary words. Currently it doesn't check the corrections as lots of valid technical terms aren't in the aspell word list. |
@peternewman Words not in aspell dictionary can be added after #2933. Such words need to be whitelisted because some specialised words will be missing from the aspell or other dictionaries, no matter how large the dictionary is. Can we close this issue? |
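A rough sketch of the split and allowlist idea discussed above, assuming entries are already parsed into a `{typo: [corrections]}` mapping. The function name and the `allowlist` parameter are hypothetical, not codespell's actual implementation.

```python
def split_by_known_corrections(entries, valid_words, allowlist=()):
    """Split entries into (in_dictionary, needs_review).

    An entry lands in in_dictionary only if every correction is found in
    the reference word list or in the manually maintained allowlist.
    """
    known = {w.lower() for w in valid_words} | {w.lower() for w in allowlist}
    in_dictionary, needs_review = {}, {}
    for typo, fixes in entries.items():
        if all(f.lower() in known for f in fixes):
            in_dictionary[typo] = fixes
        else:
            needs_review[typo] = fixes
    return in_dictionary, needs_review
```

The `needs_review` bucket is where manual checking effort would go first; technical terms confirmed as correct can then be moved into the allowlist.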
We'll need to find a list of valid words from somewhere, but this keeps happening to varying degrees of detectability, e.g. #1014 (comment)