Text extraction issue with Inter v4.1 in XeLaTeX-generated PDFs #774
Comments
Your PDFs appear to be corrupt, or infected, or both.
@kenmcd I highly doubt they are either corrupt or infected. It's more likely that some of your protection tools are giving false positives. Anyway, a zip archive is attached.
Appears the encoding is wrong in the v4.1 PDF.
@kenmcd I discovered that, for example, I can confirm that the same input produces correct mappings in a LibreOffice-generated PDF while using both glyphs. Unfortunately, I lack detailed knowledge about how mappings in PDFs work. However, it's clear that all the necessary information to map alternate glyphs correctly exists in the font, as LibreOffice handles it successfully. This is likely an issue with XeTeX itself or the LaTeX packages it relies on. I will report this to their team and am closing this issue, as it is no longer relevant. Thank you for such a great font!
I just realized what is going on.
@kenmcd Sorry, did I rush to close the issue? Feel free to reopen it if you believe it is related to the font.
@kenmcd This completely blew my mind, as the following produces correct mappings by enabling the
No, I do not think this is an issue (error) with the font. According to OpenType specs... Just to be sure, I checked Inter v4.1 Regular OTF - and So the
The font’s Some PDF producers like XeTeX here will use the Try using
Seeing #541, it seems unlikely that PUA mappings are going away.
@khaledhosny I can confirm that
The solution with
Just a wild guess: could it be worth mapping these glyphs to both the Private Use Area (PUA) and the actual text? I mean having the
The proper code points are already mapped to the default glyphs, and it is not possible to map the same code point to different glyphs in
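To illustrate the point above, here is a minimal Python sketch. It assumes the table in question is a code-point-to-glyph mapping such as OpenType's `cmap`, and it models that table as a plain dictionary. The glyph names (`a`, `a.alt`) and the PUA code point U+E000 are hypothetical and are not taken from Inter. The sketch shows why an alternate glyph that is reachable only through a PUA code point comes back as that PUA character when a tool recovers text by inverting the mapping.

```python
# Hypothetical code-point -> glyph-name table (illustrative values only,
# not read from Inter). Each code point can be a key only once.
FORWARD = {
    0x0061: "a",      # U+0061 maps to the default glyph
    0xE000: "a.alt",  # a PUA code point maps to the alternate glyph
}


def invert(table):
    """Build a glyph-name -> code-point map by reversing the table."""
    inverse = {}
    for code_point, glyph in table.items():
        # Keep the first code point seen for each glyph.
        inverse.setdefault(glyph, code_point)
    return inverse


def extract_text(glyphs, table):
    """Recover text from a glyph sequence using only the reversed table."""
    inverse = invert(table)
    return "".join(chr(inverse[glyph]) for glyph in glyphs)


def remap(table, code_point, glyph):
    """Return a copy of the table with code_point pointing at glyph.

    Any earlier mapping for that code point is overwritten, which is why
    the same code point cannot refer to two glyphs at once.
    """
    updated = dict(table)
    updated[code_point] = glyph
    return updated
```

With this model, `extract_text(["a.alt"], FORWARD)` yields the PUA character rather than `a`, and `remap(FORWARD, 0x0061, "a.alt")` makes the default glyph unreachable from U+0061.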
Text copied from a XeLaTeX-produced PDF using Inter v4.1 contains unexpected characters, while version 3.19 works flawlessly.
To Reproduce
Install XeTeX, Poppler, curl, unzip
Run the script below in a dedicated directory. Both produced PDFs are attached for reference:
It outputs the following, even though both PDFs appear fine visually:
Expected behavior
I expect it to output:
Environment
Additional notes
You can reproduce the issue by copying text from the provided PDFs. The problem is evident at least in macOS Preview.
inter-3.19.pdf
inter-4.1.pdf
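For anyone checking their own PDFs, a small Python sketch like the one below can flag Private Use Area characters in text that has already been extracted (for example, text pasted from a viewer or saved to a file by an extraction tool). The function name `find_pua` is made up for this example, and it only checks the Basic Multilingual Plane PUA range, U+E000 to U+F8FF.

```python
def find_pua(text):
    """Return (index, code point label) for each BMP Private Use Area character."""
    hits = []
    for index, char in enumerate(text):
        # The BMP Private Use Area spans U+E000 through U+F8FF.
        if 0xE000 <= ord(char) <= 0xF8FF:
            hits.append((index, f"U+{ord(char):04X}"))
    return hits
```

An empty result means no BMP PUA characters were found in the extracted text; a non-empty result lists where they occur.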