Let user decide their preferred encoding method #56

Open
basvdheuvel opened this issue Aug 21, 2019 · 1 comment

Comments

@basvdheuvel

  • pypugjs version: 5.8.1
  • Flask version: 1.1.1
  • Python version: 3.6.9
  • Operating System: Linux 5.2.9-arch1-1-ARCH

Description

If a template file contains only a single non-ASCII character (e.g. "ë"), the converted output may show that character mis-decoded (e.g. "Ã«"). After some research I found that this issue is closely related to this PyPugJS issue.
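To illustrate the symptom: "ë" is stored as two bytes in UTF-8, so if the detector guesses a single-byte encoding such as Latin-1, each byte is decoded as its own character. Which encoding chardet actually reported is not stated here, so Latin-1 is an assumption in this minimal sketch:

```python
# "ë" (U+00EB) is encoded as two bytes in UTF-8.
raw = "ë".encode("utf-8")
print(raw)  # b'\xc3\xab'

# Decoding with the correct encoding round-trips cleanly.
print(raw.decode("utf-8"))  # ë

# Decoding with a wrongly guessed single-byte encoding (assumed Latin-1 here)
# turns one character into two: 0xC3 -> "Ã", 0xAB -> "«".
print(raw.decode("latin-1"))  # Ã«
```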

The problem lies with the chardet package, which scans a file and guesses its encoding. However, if the file contains only a single non-ASCII character, chardet may detect the wrong encoding (see this and this issue). A fix for this was proposed, but it was never merged and is now out of date. The discussion in that pull request contains a quick-and-dirty patch, which resolved the issue for me.

I am, however, not satisfied with this kind of workaround. Users should not have to go through the research I did to resolve such a strange bug, and even mentioning the hotfix in the PyPugJS documentation seems like the wrong approach. The underlying problem is that this package currently forces users to rely on an unreliable package (chardet).

My proposal is to change the open method in pypugjs/runtime.py (introduced in PR #27) to read a global setting that lets users force their preferred encoding. The default value would be "auto", which keeps the current chardet-based detection. Any other value would be a string, as long as it is a valid encoding name.

I would love to do this work myself, but I am on a tight deadline for a job, and once that job is done I may have neither the time nor the urgency to resolve the issue. Hopefully somebody else can pick up the slack. Many thanks!

@loleg

loleg commented Sep 18, 2019

This issue is affecting me on Linux as well; thanks for raising it. Having looked through the code, I think your proposal is valid.
