Most of the hashes won't work regarding file encoding #139

Open
GoogleCodeExporter opened this issue Jan 22, 2016 · 4 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Create a text file containing aÿþa
2. Convert it to ANSI/UTF-8/UTF-8 without BOM/UTF-16/...
3. Check the hash returned and compare it with a checksum application.

What is the expected output? What do you see instead?
Expected: the same hash that the checksum application reports.
Instead: the CryptoJS library forces a UTF-8 conversion of the input, so it
returns the wrong hash.


What version of the product are you using? On what operating system?
Firefox 31 / 3.1.2

Please provide any additional information below.
How to fix it? Simply don't UTF-8 encode the input, because you really don't
need to. You would need the UTF-8 encoding if you wanted to display the
content, but not to hash a file. Encoding the file's bytes again changes them,
and you get a very wrong hash.
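
To make the mismatch concrete, here is a minimal sketch of the bytes involved. It assumes the file was saved in a single-byte encoding as the four bytes 61 FF FE 61 and was read into a string with one character per byte. The variable names are illustrative and are not taken from CryptoJS.

```javascript
// The string a binary-string read would give for the bytes 61 FF FE 61:
// one UTF-16 code unit per file byte.
const binaryString = "a\u00ff\u00fea";

// Bytes that reach the hash if the string is UTF-8 encoded first.
const utf8Bytes = Array.from(new TextEncoder().encode(binaryString));

// Bytes that are actually in the file: each code unit taken as one byte.
const rawBytes = Array.from(binaryString, (ch) => ch.charCodeAt(0) & 0xff);

// Small helper to print a byte array as hex.
const hex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(hex(utf8Bytes)); // 61 c3 bf c3 be 61
console.log(hex(rawBytes));  // 61 ff fe 61
```

The UTF-8 path hashes six bytes instead of the four that are on disk, which is why the digest differs from the one a checksum tool computes.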


I attached a screenshot showing how to fix it in the SHA3.js file. However, you
will have the same issue in almost all, if not all, of the other hash
implementations. I got the same issue with SHA-256.

For SHA3   : q to e
For SHA256 : l to k

Easy to realize if you look at my screenshot.

Note that this is a temporary fix; you may need the UTF-8 conversion somewhere
else in the file for whatever reason.

Original issue reported on code.google.com by [email protected] on 3 Aug 2014 at 2:52

Attachments:

@GoogleCodeExporter

By the way, I used JS Beautifier to get the code in that form, in case you
wonder why my file is not minified.

http://jsbeautifier.org/

Original comment by [email protected] on 3 Aug 2014 at 2:54

@GoogleCodeExporter

Also, I use the JavaScript FileReader with reader.readAsBinaryString to open my file.

Original comment by [email protected] on 3 Aug 2014 at 2:58
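
For readers unfamiliar with readAsBinaryString: it yields a string in which each character's code is one byte of the file. A rough equivalent, written as a hypothetical helper (toBinaryString is not a browser or CryptoJS function), could look like this:

```javascript
// Sketch of the string shape readAsBinaryString produces: every byte of the
// buffer becomes one character whose code is in the range 0..255.
function toBinaryString(buffer) {
  const bytes = new Uint8Array(buffer);
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// The four-byte test file from the report, 61 FF FE 61.
const fileBytes = new Uint8Array([0x61, 0xff, 0xfe, 0x61]);
const asString = toBinaryString(fileBytes.buffer);
console.log(asString.length); // 4
```

The result is a normal JavaScript string, so a library receiving it cannot tell it apart from ordinary text that happens to contain ÿ and þ.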

@GoogleCodeExporter

So, the issue is that JavaScript strings are UTF-16, always. When you 
readAsBinaryString, of course, only a small subset of JavaScript's possible 
characters are used, but CryptoJS has no way to know that. In hindsight, I 
probably should have required the library user to always specify the character 
encoding of the input. Instead, the current behavior is that if you don't 
specify the character encoding (by first converting to bytes), then UTF-8 is 
picked as the default.

Original comment by Jeff.Mott.OR on 3 Aug 2014 at 3:51
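
Converting to bytes first, as the comment above describes, might look like the following sketch. It assumes the input is a binary string (one character per byte) and that CryptoJS 3.x is already loaded. The function name and the `cryptoJs` parameter are illustrative; only `enc.Latin1.parse`, `enc.Hex`, and `SHA256` are CryptoJS names.

```javascript
// `cryptoJs` is the CryptoJS namespace object, passed in so the sketch does
// not assume how the library is loaded (script tag, bundler, or require).
function sha256OfBinaryString(cryptoJs, binaryString) {
  // Latin1.parse maps each character to exactly one byte, so the input is
  // not expanded by the default UTF-8 conversion.
  const bytes = cryptoJs.enc.Latin1.parse(binaryString);
  return cryptoJs.SHA256(bytes).toString(cryptoJs.enc.Hex);
}

// Hypothetical usage in a FileReader callback:
// reader.onload = () => console.log(sha256OfBinaryString(CryptoJS, reader.result));
```

Passing the string directly, as in `CryptoJS.SHA256(binaryString)`, takes the default UTF-8 path instead.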

@GoogleCodeExporter

We may not know the character encoding; it's hard to deal with character
encoding when it comes to a file.

For example, a UTF-8 file without a BOM gives you no hint about its encoding.
You need to parse the first characters and determine what encoding it is.
While this is done automatically when running a web server like apache2, the
task adds a lot of code on the side of the developer who would use the API.
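
As an illustration of the detection work mentioned here, a minimal byte-order-mark check could look like the sketch below. The sniffBom name is hypothetical, and the check covers only the UTF-8 and UTF-16 marks; it ignores UTF-32, whose little-endian mark starts with the same two bytes as UTF-16LE.

```javascript
// Returns the encoding named by a leading byte order mark, or null when the
// data does not start with one of the marks checked here.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  return null;
}

console.log(sniffBom(new Uint8Array([0xef, 0xbb, 0xbf, 0x61]))); // UTF-8
console.log(sniffBom(new Uint8Array([0x61, 0xc3, 0xbf])));       // null
```

A file without a BOM returns null here, which is the case described above: the bytes alone do not say which encoding produced them.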

I don't know about UTF-16 and JavaScript strings, but I can guarantee that
with any non-ANSI encoding, the hash from my checksum tool does not match the
output the website gave me.

I don't know how much this affects users who use this library only for
normal strings (instead of output from a file); it probably doesn't affect
them at all...? I don't see a case where you could have a hash mismatch,
since the string will be encoded INSIDE the file anyway.

Original comment by [email protected] on 3 Aug 2014 at 5:54
