
Fix 500 errors for accessing the JSON of valid records #2894

Open
afred opened this issue Dec 18, 2024 · 0 comments · May be fixed by #2895
afred commented Dec 18, 2024

To replicate

Attempt to hit the `.json` endpoint for a record whose PBCore XML contains non-ASCII, UTF-8 encoded characters.

Expected result (user story)

As a website user,
When I hit the JSON endpoint at `/catalog/{id}.json`,
And the PBCore XML for the record contains UTF-8 encoded, non-ASCII characters,
Then the PBCore JSON displays with the UTF-8 characters intact,
In order to have a reliable JSON endpoint for valid records.

Actual result

500 Internal Server Error
If you are the administrator of this website, then please read this web application's log file and/or the web server's log file to find out what went wrong.

Background

  1. At the time of this writing, if you look for the string "Children’s" in the PBCore of this record, you can see the special apostrophe (the right single quotation mark, U+2019).
  2. Under the hood, that character is represented within the byte string as "\xE2", "\x80", "\x99", but the string being processed within the Ruby code is (for reasons not yet determined) being read as binary (ASCII-8BIT), without the UTF-8 encoding label.
  3. Then, when the XML is converted to JSON (via an XSLT transformation performed by the Nokogiri XML tool), we get a 500 error, because whatever is doing the processing apparently lacks the encoding information required to interpret those 3 bytes as a single character, but needs it for some reason (again, not yet determined). This is what throws the 500 error.
  4. We've had encoding issues like this before, and we've always concluded that the more we can stick with UTF-8 characters, without having to replace them, the better. So if it's UTF-8 coming in and UTF-8 going out, then making sure strings carry the right encoding throughout should fix/avoid this problem.

Possible solutions

  1. Use `xml_str.force_encoding('UTF-8')` on the string prior to transforming it with the XSLT stylesheet.
  2. Find where the string is being read as binary (i.e. during the XSLT transformation or before), and see if there's a parameter we can pass to set the encoding.

Done when

User story above is satisfied.
