To replicate
Attempt to hit the `.json` endpoint for a record that contains non-ASCII UTF-8 characters, like here.
Expected result (user story)
As a website user
When I hit the JSON endpoint at `/catalog/{id}.json`
And when the PBCore XML for the record contains UTF-8 encoded, non-ASCII characters
The PBCore JSON displays with the UTF-8 characters
In order to have a reliable JSON endpoint for valid records.
Actual result
500 Internal Server Error
If you are the administrator of this website, then please read this web application's log file and/or the web server's log file to find out what went wrong.
Background
At the time of this writing, if you look for the string "Children’s" in the PBCore of this record, you can see the special apostrophe.
Under the hood, this is represented within the byte string as "\xE2", "\x80", "\x99", but the string being processed within the Ruby code is (for reasons I don't know yet) being read as binary, without the UTF-8 encoding.
Then, when the code attempts to convert the XML to JSON (via an XSLT transformation performed by Nokogiri), we get a 500 error, apparently because whatever does the processing lacks the encoding information needed to interpret those 3 bytes, yet needs to interpret them for some reason (again, I don't know why yet). But this is what throws the
We've had encoding issues like this before, and we've always concluded that the more we can stick with UTF-8 characters, without having to replace them, the better. Thus, if it's UTF-8 coming in, and UTF-8 coming out, then making sure strings are using the right encoding throughout should help fix/avoid this problem.
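To make the binary-versus-UTF-8 distinction concrete, here is a minimal Ruby sketch using only the standard library. The constant names are illustrative and not taken from the application code, and the transcoding failure at the end is just one way a binary-tagged string can blow up; it is not confirmed to be the exact error behind the 500.

```ruby
# "Children’s" with the right single quotation mark (U+2019).
# The constant names here are hypothetical, chosen for illustration.
UTF8_TITLE = "Children\u2019s"

# The same bytes, but tagged as binary (ASCII-8BIT) instead of UTF-8.
BINARY_TITLE = UTF8_TITLE.b

# The apostrophe occupies three bytes: 0xE2, 0x80, 0x99.
p UTF8_TITLE.bytes[8, 3].map { |byte| format('\\x%02X', byte) }

# Both strings hold identical bytes; only the encoding tag differs.
puts UTF8_TITLE.bytes == BINARY_TITLE.bytes
puts UTF8_TITLE.length    # counts characters
puts BINARY_TITLE.length  # counts bytes

# Asking Ruby to transcode the binary string to UTF-8 fails, because
# a binary string carries no mapping for the byte \xE2.
begin
  BINARY_TITLE.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts "#{e.class}: #{e.message}"
end
```

The bytes are never wrong in this scenario; the string simply carries the wrong label, which is why relabeling (rather than replacing characters) is a plausible fix.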
Possible solutions
Use `xml_str.force_encoding('UTF-8')` on the string prior to transforming it with the XSLT stylesheet.
Find where the string is being read as binary (i.e. during the XSLT transformation or before), and see if there's a param to pass to set the encoding.
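Both options can be sketched with the standard library alone. This is a rough illustration under assumptions, not the application's actual code: `ensure_utf8`, `read_both_ways`, and `SAMPLE_XML` are hypothetical names, and the file read stands in for wherever the XML string really originates, which is still unknown.

```ruby
require 'tempfile'

# Hypothetical helper (not from the application code): retag a string as
# UTF-8 without changing its bytes, and fail loudly if the bytes are not
# actually valid UTF-8.
def ensure_utf8(xml_str)
  utf8 = xml_str.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'input is not valid UTF-8' unless utf8.valid_encoding?
  utf8
end

# Illustrative stand-in for a PBCore XML document.
SAMPLE_XML = "<title>Children\u2019s</title>"

# Option 1: fix the label on a string that was already read as binary.
binary_xml = SAMPLE_XML.b
fixed = ensure_utf8(binary_xml)
puts fixed.encoding
puts fixed == SAMPLE_XML

# Option 2: set the encoding at the point where the string is read.
# Returns the same file contents read as binary and read as UTF-8.
def read_both_ways(xml)
  Tempfile.create('pbcore') do |file|
    file.binmode
    file.write(xml)
    file.flush
    [File.binread(file.path), File.read(file.path, encoding: 'UTF-8')]
  end
end

as_binary, as_utf8 = read_both_ways(SAMPLE_XML)
puts as_binary.encoding
puts as_utf8.encoding
```

Checking `valid_encoding?` after `force_encoding` is a design choice worth keeping: `force_encoding` never inspects the bytes, so without the check a genuinely non-UTF-8 record would be mislabeled silently and fail later in a less obvious place.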
Done when
User story above is satisfied.