barcode generation seems to use UTF-8 instead of ISO-8859-1, causing improper expansion of escape sequences. #509

Open
petep0p opened this issue Nov 14, 2024 · 8 comments

Comments

@petep0p

petep0p commented Nov 14, 2024

I generated the following code in Binary Eye:
[Image: qr_code__b_3a492]

i used escape sequences to input the data, as i wanted a code that contained a specific string of hex values (attempting to program a barcode scanner). However, inputting the escape sequence:
\xC6

generates: "C3 86" instead of the expected "C6". This difference corresponds exactly to the differences in the value for the character "Æ" between UTF-8 encoding and ISO-8859-1 encoding. I read somewhere that the QR Code definition specifies that codes should use the ISO-8859-1 encoding, but i don't have the reference off the top of my head. Aside from that, it seems more logical to have an escape sequence be encoded in the most straightforward possible manner.

is this a bug, or am I not correctly understanding how everything is supposed to work?

thanks!! I love this app and use it all the time.

@petep0p
Author

petep0p commented Nov 14, 2024

as added info, the same above code also has the following unintended expansions on other escape sequences:
"F2" appears as "C3 B2" (entered as \xF2)
"FF" appears as "C3 BF" (entered as \xFF)
"FD" appears as "C3 BD" (entered as \xFD)
these all follow a parallel type of relationship between UTF-8 and ISO-8859-1 as compared to "C6"
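
The pattern in these four values can be reproduced outside the app. The following is a minimal Python sketch, not Binary Eye's code; `encodings` is a hypothetical helper that encodes the same code point once as ISO-8859-1 and once as UTF-8 and returns both results as hex.

```python
def encodings(value: int) -> tuple[str, str]:
    """Return (ISO-8859-1 hex, UTF-8 hex) for the code point `value` (0-255)."""
    ch = chr(value)
    latin1 = ch.encode("iso-8859-1").hex(" ").upper()  # one byte per code point
    utf8 = ch.encode("utf-8").hex(" ").upper()         # two bytes for 0x80-0xFF
    return latin1, utf8


for value in (0xC6, 0xF2, 0xFF, 0xFD):
    latin1, utf8 = encodings(value)
    print(f"\\x{value:02X}: ISO-8859-1 = {latin1}, UTF-8 = {utf8}")
```

For each of the reported values, the UTF-8 column matches the bytes found in the generated code and the ISO-8859-1 column matches the byte that was intended.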

@markusfisch
Owner

Hi, and thanks for filing an issue about that 👍 Yes, this is a bug, indeed! I'm already working on fixing it…

The error already occurs during unescaping here.

@petep0p
Author

petep0p commented Nov 15, 2024

awesome, thank you! you're the best

@Sophira

Sophira commented Dec 5, 2024

> I read somewhere that the QR Code definition specifies that codes should use the ISO-8859-1 encoding, but i don't have the reference off the top of my head. Aside from that, it seems more logical to have an escape sequence be encoded in the most straightforward possible manner.

One thing worth pointing out is that the QR code generated by BinaryEye includes ECI data that identifies it as containing UTF-8 data. That being the case, I'm surprised that Markus is saying this is a bug. Am I misunderstanding how things work?

@markusfisch
Owner

No, you're right, BinaryEye does indeed only use UTF-8 encoding when generating barcodes with text content. It's just that supporting escape sequences, which can insert arbitrary byte sequences into UTF-8 strings, can mess things up quite a bit.

Also, QR Codes with binary content are encoded in binary mode. So there's no UTF-8 conversion there, which would distort the data, of course.

When I said this is a bug I meant that (I think) it should be possible to encode binary data that is expressed with escape sequences in binary mode. For example, if you want to encode "\x00\xC6", this should be encoded in binary mode, and not as UTF-8 encoded text.
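
One way to picture this is an unescape step that produces raw bytes instead of a string. The sketch below is an assumption for illustration only, not the app's implementation: `unescape_to_bytes` is a hypothetical name, it only handles `\xHH` escapes, and it ignores escaped backslashes for brevity. Each `\xHH` becomes exactly one byte, while all other text is encoded as UTF-8.

```python
import re

# Matches a backslash, an "x", and exactly two hex digits, e.g. \xC6
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def unescape_to_bytes(text: str) -> bytes:
    """Turn \\xHH escapes into single raw bytes; encode the rest as UTF-8."""
    out = bytearray()
    pos = 0
    for match in _HEX_ESCAPE.finditer(text):
        # Literal text between escapes keeps its UTF-8 encoding.
        out += text[pos:match.start()].encode("utf-8")
        # The escape itself contributes one byte, with no charset conversion.
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


print(unescape_to_bytes(r"\x00\xC6").hex(" ").upper())
```

With this approach the input `\x00\xC6` yields the two bytes `00 C6`, whereas a literal "Æ" typed into the field would still become `C3 86`.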

This raises the question of what the app interprets as "binary". I have a very simple (and probably very questionable) metric for that (source): if there's a byte value below 32 and it's not a tab, new line or carriage return, I treat the content as binary.
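
The metric as described can be sketched in a few lines of Python. This follows only the wording above, not the linked source; `looks_binary` is a hypothetical name.

```python
# Control characters that still count as text: tab, new line, carriage return.
TEXT_CONTROL_BYTES = {0x09, 0x0A, 0x0D}


def looks_binary(data: bytes) -> bool:
    """True if any byte is below 32 and is not tab, new line or carriage return."""
    return any(b < 32 and b not in TEXT_CONTROL_BYTES for b in data)


print(looks_binary(b"\x00\xc6"))  # contains a NUL byte
print(looks_binary(b"\xc6"))      # high byte only, no control characters
```

Under this rule, a lone `\xC6` would not count as binary because no byte is below 32, which is consistent with it being treated as text and converted to UTF-8.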

Maybe another checkbox to explicitly choose binary mode would make more sense.

@Sophira

Sophira commented Dec 7, 2024

I think a "treat input data as binary" checkbox and parsing explicit hex codes makes sense; your audience for this app are likely going to be techy types who know what they need. There's one caveat to think about, though: the possibility of non-ASCII characters that aren't input as hex codes.

There's an intuitive solution to that that I'd like to suggest: When the checkbox is checked, swap the regular text field out for either an editable hex dump, or a text field that only allows entering characters in the range [0-9A-F] (and possibly forces spaces between every two characters). This not only makes it clear what's expected, but also means users don't have to keep typing \x between each byte.
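
A rough Python sketch of what such a hex field could do with its input is below. The function names `parse_hex_input` and `format_hex` are made up for illustration, and the choice to ignore all whitespace (rather than enforce a space after every two characters) is an assumption. Note that `bytes.fromhex` also accepts lowercase digits, so a strict [0-9A-F] field would need an extra check.

```python
def parse_hex_input(text: str) -> bytes:
    """Convert the contents of a hex-only field, e.g. "C6 F2 FF FD", to bytes."""
    digits = "".join(text.split())  # drop spaces and line breaks between bytes
    if len(digits) % 2 != 0:
        raise ValueError("expected an even number of hex digits")
    # bytes.fromhex raises ValueError for anything that is not a hex digit.
    return bytes.fromhex(digits)


def format_hex(data: bytes) -> str:
    """Render bytes as uppercase pairs separated by spaces."""
    return data.hex(" ").upper()


payload = parse_hex_input("C6 F2 FF FD")
print(len(payload), format_hex(payload))
```

Here the four bytes from the original report come out as exactly four bytes, with no text encoding involved at any point.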

@petep0p
Author

petep0p commented Dec 7, 2024

i second that suggestion! an editable hex dump would be ideal. it's surprisingly difficult to find tools to generate codes like i was trying to make, and that feature is exactly what I'd been searching for and failing to find.

@markusfisch
Owner

> There's an intuitive solution to that that I'd like to suggest: When the checkbox is checked, swap the regular text field out for either an editable hex dump, or a text field that only allows entering characters in the range [0-9A-F] (and possibly forces spaces between every two characters). This not only makes it clear what's expected, but also means users don't have to keep typing \x between each byte.

I like that idea! It bugs me, too, that the app doesn't really support encoding binary data at the moment.

So I will try and close this gap as soon as I find the time!
