v14.1.1 - Plaintext export does not support unicode characters (symbol, emojis...) #639

Open
john-p-knapp opened this issue Oct 5, 2024 · 10 comments

@john-p-knapp

It appears that some characters are being dropped in the plaintext export. I have specifically noticed that –“”’ (U+2013, U+201C, U+201D, U+2019) are dropped from the plaintext export but preserved in the HTML and PDF exports.

I've tested with both the 14.1.1 & 14.1.2 beta versions. Happy to gather any additional details that would be helpful.
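
One way to gather such details is to compare the original message text with the exported plaintext and list the non-ASCII code points that went missing. The following is only a rough sketch: `droppedCodePoints` is a hypothetical helper, not part of the add-on or of Thunderbird, and it only reports characters that vanish entirely from the export (it does not count occurrences).

```javascript
// Hypothetical helper: report non-ASCII code points that occur in the
// original text but do not occur anywhere in the exported plaintext.
function droppedCodePoints(original, exported) {
  // Array.from iterates by code point, so emoji are not split into surrogates.
  const kept = new Set(Array.from(exported));
  const dropped = new Set();
  for (const ch of original) {
    const cp = ch.codePointAt(0);
    if (cp > 0x7f && !kept.has(ch)) {
      dropped.add("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return [...dropped];
}

// Example: the four characters named above have been stripped from the export.
console.log(droppedCodePoints("a – “b” c’s", "a  b cs"));
// → [ 'U+2013', 'U+201C', 'U+201D', 'U+2019' ]
```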

Thanks for all the work you do!

@cleidigh
Collaborator

cleidigh commented Oct 6, 2024

@john-p-knapp
Thanks for the props!
I did a few tests today and verified your findings.

It's actually a larger issue. The plaintext "converter" API in Thunderbird does not handle Unicode; even the smile emoji is not converted. I am surprised this has not come up before. There is a technical, or perhaps philosophical, view that "plaintext" == ASCII, which would mean excluding Unicode characters.

Aside from philosophy, I am not sure I can address this with the current APIs. I will see if there are any new methods.
@cleidigh

@jobisoft
Collaborator

jobisoft commented Oct 11, 2024

Hi @cleidigh !

In the latest Thunderbird 128, we have added a new API:
browser.messengerUtilities.convertToPlainText

That seems to correctly keep these chars:

await browser.messengerUtilities.convertToPlainText("<p>Some special chars: –“”’ (U+2013, U+201C, U+201D, U+2019)</p><p>Are not dropped in plaintext</p>")

Result:

Some special chars: –“”’ (U+2013, U+201C, U+201D, U+2019)

Are not dropped in plaintext

The API is using this under the hood, but you should of course use the API directly :-)
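
A minimal sketch of how an add-on might call this, assuming it also has to run where the API is not available. The wrapper name `htmlToPlainText`, the `api` parameter (standing in for the WebExtension `browser` object), and the caller-supplied `fallback` converter are all hypothetical and only here for illustration:

```javascript
// Sketch only: `api` stands in for the WebExtension `browser` object and is
// passed in so the function can be exercised outside Thunderbird.
// `fallback` is a hypothetical legacy converter supplied by the caller.
async function htmlToPlainText(api, html, fallback) {
  const convert = api?.messengerUtilities?.convertToPlainText;
  if (typeof convert === "function") {
    // Prefer the new API when it is present.
    return convert.call(api.messengerUtilities, html);
  }
  // Otherwise fall back to whatever converter the caller already had.
  return fallback(html);
}
```

Inside an extension, the call would then look something like `await htmlToPlainText(browser, html, legacyConvert)`, where `legacyConvert` is whatever converter the add-on used before.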

We miss you in the developer meetings

@cleidigh
Collaborator

@john-p-knapp
Using the new HTML converter mentioned by @jobisoft, I have changed the plaintext export so that it supports conversion of all Unicode characters.
You can grab from here to test:

@cleidigh

@cleidigh changed the title from "v14.1.1 - Characters missing in plaintext export" to "v14.1.1 - Plaintext export does not support unicode characters (symbol, emojis...)" on Nov 13, 2024
@cleidigh self-assigned this on Nov 13, 2024

@Dricc123

Is this only for export? I ask because I have the issue with emails I imported. Accented letters in French don't display properly in some plaintext emails.

@cleidigh
Collaborator

@Dricc123
Import has a totally different path and does not use a plain text converter.
I am not sure about what you are seeing.
Do you have a sample email you could send me?
[email protected]

@cleidigh

@cleidigh
Collaborator

@Dricc123
Got your email with your explanation.
Can you send me an .eml file that shows the problem? I want to try importing it myself.
@cleidigh

@cleidigh
Collaborator

@Dricc123
FYI, I did a round-trip export/import of a message with Unicode characters without any problem.
I think I need to see one of your problem .eml files.
@cleidigh

@Dricc123

Dricc123 commented Nov 23, 2024

@cleidigh thanks for checking. I finally found the root cause: the mbox file created by Google Takeout is what has the problem: https://webapps.stackexchange.com/questions/71153/takeout-breaks-my-non-ascii
It is not related to Thunderbird. Thanks again.

@cleidigh
Collaborator

@Dricc123
Thanks for the heads up.
It did sound like a source issue.
@cleidigh

@cleidigh
Collaborator

cleidigh commented Dec 6, 2024

@john-p-knapp
Would you have a chance to check this in beta 9?
Get it here:

@cleidigh
