Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codespell to CI #2075

Closed
wants to merge 1 commit into from
Closed

Add codespell to CI #2075

wants to merge 1 commit into from

Conversation

DimitriPapadopoulos
Copy link
Contributor

The intention is to help catch typos early.

A few notes:

  • One of the caveats is the necessity to maintain a list of false positives. There are not too many of them, but contributors might have to add a new false positive to the list from time to time.

  • The location of the list of false positives is:
    .github/codespell_ignore_words.txt

  • Codespell checks the current directory for a file named setup.cfg or .codespellrc. In the absence of setup.cfg, I have added a new hidden file .codespellrc. The benefit is that the same configuration is automatically used when launching codespell manually from the root directory or from the CI.

  • The default dictionaries used by codespell are clear and rare. While rare might catch a few additional typos, it raises a few additional false positives, most notably complies and theses.

@JelleZijlstra
Copy link
Member

Thanks, but I'm not sure this is word the overhead of an extra CI check to maintain.

@DimitriPapadopoulos
Copy link
Contributor Author

No problem, it's as you wish. It was one of the motivations behind PR #2073.

A few notes:

* One of the caveats is the necessity to maintain a list of false positives.
  There are not too many of them, but contributors might have to add a new
  false positive to the list from time to time.

* The location of the list of false positives is:
  	.github/codespell_ignore_words.txt

* Codespell checks the current directory for a file named setup.cfg or
  .codespellrc. In the absence of setup.cfg, I have added a new hidden
  file .codespellrc. The benefit is that the same configuration is
  automatically used when launching codespell manually from the root
  directory or from the CI.

* The default dictionaries used by codespell are "clear" and "rare".
  While "rare" might catch a few additional typos, it raises a few
  additional false positives, most notably "complies" and "theses".
@CAM-Gerlach
Copy link
Member

A little late, but instead of adding a whole new CI workflow, we could simply add the codespell pre-commit check to our existing pre-commit config. This keeps the version locked for reproducible results, and requires no additional maintenance aside from the rare case of adding a new ignore. Furthermore, this would not only check and auto-correct spelling on the CIs, but allow contributors to do so locally with no additional setup, either on-demand (pre-commit run codespell) or, if they choose, automatically with each commit. I regularly use it on all my pre-commit workflows across a number of orgs/projects, to great success.

@DimitriPapadopoulos
Copy link
Contributor Author

Indeed codespell has been part of Supported hooks since pre-commit/pre-commit.com#237 in 2019.

@DimitriPapadopoulos
Copy link
Contributor Author

On the other hand, contributors should not let codespell auto-correct typos. There are false positives: contributors need to review suggestions and silence codespell when needed.

@hugovk
Copy link
Member

hugovk commented Nov 15, 2021

I'd be generally against including a spellchecker like codespell in the CI (and pre-commit is run on the CI).

It's likely to fail for valid words and the contributor might not know they need to add something to an ignore list, and may change their text to something worse just to get past the tool, or give up.

But they can be useful with human review, so perhaps a make command (with a pinned version) for on-demand runs. Of the codespell's 101 suggestions for this repo, about 10% are valid findings / unintentional misspellings, and the rest would need ignoring.

@DimitriPapadopoulos
Copy link
Contributor Author

But they can be useful with human review, so perhaps a make command (with a pinned version) for on-demand runs. Of the codespell's 101 suggestions for this repo, about 10% are valid findings / unintentional misspellings, and the rest would need ignoring.

You're right, that's a lot of false positives, but I just want to point out that a lot of typos have already been fixed by #2073.

@DimitriPapadopoulos
Copy link
Contributor Author

I could add codespell to requirements.txt and then a codespell target to the Makefile.

Do you have a suggestion for the location/name of the file containing the false positives, or at least the most common ones? I tend to name it after the --ignore-words option, typically codespell/ignore-words.txt.

Would I really need to pin the version? I see that other tools are not really pinned down:

peps/requirements.txt

Lines 1 to 6 in 4109619

# Requirements for building PEPs with Sphinx
sphinx >= 3.5
docutils >= 0.16
# For RSS
feedgen >= 0.9.0 # For RSS feed

@hugovk
Copy link
Member

hugovk commented Nov 15, 2021

Do you have a suggestion for the location/name of the file containing the false positives, or at least the most common ones? I tend to name it after the --ignore-words option, typically codespell/ignore-words.txt.

Or prefix with a dot so it's hidden by default? .codespell/ignore-words.txt

Would I really need to pin the version? I see that other tools are not really pinned down:

Hm, maybe not. The pinning idea was so that new warnings don't suddenly appear in new versions, but as this isn't run by CI perhaps pinning is not needed.

@CAM-Gerlach
Copy link
Member

I definitely can understand not wanting to deter contributors if it runs in CI, given the chance of false positives. And if autofix is enabled, while pre-commit will abort the commit with a message to review the changes, which the contributor would then need to re-add and re-commit, I could certainly see the possibility of a contributor not carefully checking, especially if they weren't familiar with the workflow.

Instead of adding it as a hard requirement which gets installed on all platforms, CIs and locally, (regardless if the user actually wants to perform spell checking), and in the same environment as the build (potential for dep conflicts), why not just add the pre-commit hook with hook-stage: manual, and run it via pre-commit run codespell (perhaps with an alias in the Makefile, for completeness)? Pre-commit will then take care of installing it if/when it is needed in its own isolated environment with a consistent pinned version for all devs (with easy updates via pre-commit autoupdate), and it works cross-platform unlike relying solely on the Makefile.

@DimitriPapadopoulos
Copy link
Contributor Author

Instead of adding it as a hard requirement which gets installed on all platforms, CIs and locally, (regardless if the user actually wants to perform spell checking), and in the same environment as the build (potential for dep conflicts) [..]

I understand you are referring to the addition to requirements.txt, aren't you? The codespell package does not have any runtime dependencies - except the standard Python library of course. It is a pretty light addition.

More important to me is the question of consistent versions across developers. Is it really indispensable to have a single version? Can't some developers run a more recent version or use a more recent dictionary (see Updating the dictionaries), if they want to? On the other hand, we wouldn't want them to use a very outdated version.

@DimitriPapadopoulos
Copy link
Contributor Author

Perhaps this discussion should be moved to the new PR #2151. I have added codespell configuration files, and I'm open to suggestions like this one on how to manually run codespell.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants