Unicode non-breaking space and various characters not getting translated properly #55
Comments
Hello @dwertheimer, thanks for the feedback! |
Thank you so much for the quick response.
Here's a quick test case showing the ones that I've found so far. There may
be more. Solving these would be a great start. Thanks again for the tool and getting
back to me!
GitHub won't let me attach an .enex file, so I'm trying it as a zip. Let me know if this works, @wormi4ok.
[TestCase2.enex.zip](https://github.com/wormi4ok/evernote2md/files/7072974/TestCase2.enex.zip)
|
Now, it's a bit clearer what we are talking about.
If we don't escape these symbols, input HTML like `<a href="https://github.com">Test ] me</a>` becomes `[Test ] me](https://github.com)`, which breaks the formatting. |
I encountered this issue recently, and since my notes didn't contain any pathological cases like that one, I worked around it by enabling the DoNotEscape option in godown (commenting out lines 283-287 here would have the same effect). I think the better solution would be to improve the escaping within godown to selectively escape characters within their contexts. For example, Some discretion needed as well: IMHO |
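To make the idea of context-sensitive escaping concrete, here is a minimal Go sketch. It is not godown's implementation, and the two rules it applies are assumptions chosen for illustration: `-` is escaped only when it is the first non-blank character of a line (where it could start a list item), and `!` only when it directly precedes `[` (where it could start an image). The function name `escapeInContext` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeInContext is a hypothetical sketch, not godown's API.
// It adds a backslash before '-' only at the start of a line
// (ignoring leading spaces and tabs) and before '!' only when
// the next byte is '['. All other characters pass through.
func escapeInContext(text string) string {
	lines := strings.Split(text, "\n")
	for i, line := range lines {
		var b strings.Builder
		atStart := true
		// Byte-wise iteration is safe here: only ASCII bytes are
		// compared, and every byte is written back unchanged.
		for j := 0; j < len(line); j++ {
			c := line[j]
			switch {
			case c == '-' && atStart:
				b.WriteByte('\\')
			case c == '!' && j+1 < len(line) && line[j+1] == '[':
				b.WriteByte('\\')
			}
			b.WriteByte(c)
			if c != ' ' && c != '\t' {
				atStart = false
			}
		}
		lines[i] = b.String()
	}
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(escapeInContext("- looks like a list item"))
	fmt.Println(escapeInContext("a well-known fact!"))
	fmt.Println(escapeInContext("![not an image](x.png)"))
}
```

A real converter would need more rules than these two (ordered lists, emphasis markers, headings), so this only shows the shape of the approach.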
Hope you saw my comment on the other (closed) issue, which is still an
issue for me. Please let me know. Thanks so much. Let me know what I can do
to help track this down.
David
…On Sun, Aug 29, 2021 at 1:02 PM, Stanislav Petrashov < ***@***.***> wrote:
Now, it's a bit clearer what we are talking about.
1. I don't see the <0xa0> in the resulting md file. It may depend on
the editor, but at least in the TestCase2.enex that you've shared, the
NBSP symbol looks correct to me.
2. Talking about \ chars: even though it looks ugly indeed, I don't
think that it's possible to avoid escaping without breaking the resulting
markdown. Here is a list of all escaped characters explained:
https://github.com/mattn/godown/blob/master/godown.go#L16-L28
If we don't escape these symbols, input html like this:
<a href="https://github.com">Test ] me</a>
becomes
[Test ] me](https://github.com)
which breaks the formatting.
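The link example above can be reproduced with a short Go sketch. The helper names `escapeLinkText` and `link` are hypothetical and are not part of godown; the sketch only illustrates why a `]` inside a link label needs a backslash.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeLinkText is a hypothetical helper, not godown's API.
// It backslash-escapes the characters that could end a Markdown
// link label early, plus the backslash itself.
func escapeLinkText(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `[`, `\[`, `]`, `\]`)
	return r.Replace(s)
}

// link renders a Markdown inline link with an escaped label.
func link(text, href string) string {
	return "[" + escapeLinkText(text) + "](" + href + ")"
}

func main() {
	// Without escaping, the label would end at the first ']'.
	fmt.Println(link("Test ] me", "https://github.com"))
}
```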
|
That would suggest that your markdown reader is not processing the document as UTF-8. If it can't handle that character, it would fail to handle any non-ASCII characters. It may make sense for evernote2md to include a UTF-8 BOM in the file if it contains UTF-8 to help downstream consumers. |
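A small Go sketch of what is likely happening: U+00A0 (the non-breaking space) is two bytes in UTF-8, 0xC2 0xA0, and a reader that treats each byte as one Latin-1 character shows the first byte as "Â". The helpers `decodeLatin1` and `withBOM` are hypothetical names for illustration; whether adding a BOM actually helps depends on the downstream reader, which is an assumption here.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// withBOM returns data prefixed with the UTF-8 BOM,
// unless the data already starts with one.
func withBOM(data []byte) []byte {
	if bytes.HasPrefix(data, utf8BOM) {
		return data
	}
	return append(append([]byte{}, utf8BOM...), data...)
}

// decodeLatin1 mimics a reader that wrongly interprets
// every byte as a single Latin-1 (ISO-8859-1) character.
func decodeLatin1(data []byte) string {
	runes := make([]rune, len(data))
	for i, b := range data {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	nbsp := []byte("\u00a0") // UTF-8 encoding of the non-breaking space
	fmt.Printf("UTF-8 bytes:       % x\n", nbsp)
	fmt.Printf("read as Latin-1:   %q\n", decodeLatin1(nbsp))
	fmt.Printf("with BOM prefixed: % x\n", withBOM(nbsp))
}
```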
I'm having an issue with erroneous characters in my converted markup files as well. An example of my case is a webclip note that contains I tried the testcase provided above by @dwertheimer but there the decoding seems ok to me. |
Hey @jussihuotari ! That's an interesting observation. I've searched for the errors like |
Good question @wormi4ok. I checked the Regarding a reproducible example: I amended the earlier test case in this issue with the text causing the error, attached here. I'm not sure if this is actually well-formed / realistic note format, but maybe it's useful as a test case? When I run In my test, these errors are not present in the |
Thanks for this tool! I don't know if Evernote changed the .enex export formatting or what, but running the command on a normal note results in some weird Unicode characters that show up as strange characters in markdown.
For instance:
- <0xa0> characters in the resulting markdown (this is apparently a non-breaking space). Markdown readers show that as a Â or something else.
- \( or \)
- \-
- \!
@wormi4ok, is there any chance I could get your help with this? I guess I could create my own post-processor, but I would really prefer not to do that. Seems like this may be an easy change for you. I don't know Go so I can't do it myself.
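For anyone who does want the post-processor route mentioned above, a minimal Go sketch could look like the following. It is an assumption-laden workaround, not part of evernote2md: the name `postProcess` is hypothetical, it covers only the characters listed in this report, and removing escapes can break Markdown wherever the backslash was actually needed.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanup is a hypothetical replacement table. The first pair
// keeps escaped backslashes intact so that `\\(` is not mistaken
// for an escaped parenthesis. The last pair turns the non-breaking
// space (U+00A0) into a regular space.
var cleanup = strings.NewReplacer(
	`\\`, `\\`,
	`\(`, `(`,
	`\)`, `)`,
	`\-`, `-`,
	`\!`, `!`,
	"\u00a0", " ",
)

// postProcess applies the replacement table to converted Markdown.
func postProcess(md string) string {
	return cleanup.Replace(md)
}

func main() {
	in := `Hello \(world\)\!` + "\u00a0" + `\- done`
	fmt.Println(postProcess(in))
}
```

Replacing every non-breaking space is a deliberate simplification; notes that rely on it for layout would need that pair removed.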