
Fix 500 errors for accessing the JSON of valid records #2894

Open
afred opened this issue Dec 18, 2024 · 0 comments · May be fixed by #2895
afred commented Dec 18, 2024

To replicate

Attempt to hit the `.json` endpoint for a record whose PBCore XML contains non-ASCII, UTF-8 encoded characters.

Expected result (user story)

As a website user,
When I hit the JSON endpoint at `/catalog/{id}.json`,
And the PBCore XML for the record contains UTF-8 encoded, non-ASCII characters,
Then the PBCore JSON displays with the UTF-8 characters intact,
In order to have a reliable JSON endpoint for valid records.

Actual result

500 Internal Server Error
If you are the administrator of this website, then please read this web application's log file and/or the web server's log file to find out what went wrong.

Background

  1. At the time of this writing, if you look for the string "Children’s" in the PBCore of this record, you can see the special apostrophe (the right single quotation mark, U+2019).
  2. Under the hood, that character is represented within the byte string as "\xE2", "\x80", "\x99", but the string being processed within the Ruby code is (for reasons not yet determined) being read as binary (ASCII-8BIT), without the UTF-8 encoding label.
  3. Then, when the XML is converted to JSON (via an XSLT transformation performed by the Nokogiri XML tool), we get a 500 error, because whatever is doing the processing apparently lacks the encoding information required to interpret those 3 bytes as a single character, but needs it for some reason (again, not yet determined). This is what throws the 500 error.
  4. We've had encoding issues like this before, and we've always concluded that the more we can stick with UTF-8 characters, without having to replace them, the better. So if it's UTF-8 coming in and UTF-8 going out, then making sure strings carry the right encoding throughout should fix/avoid this problem.

Possible solutions

  1. Use `xml_str.force_encoding('UTF-8')` on the string prior to transforming it with the XSLT stylesheet.
  2. Find where the string is being read as binary (i.e. during the XSLT transformation or before), and see if there's a parameter we can pass to set the encoding.

Done when

User story above is satisfied.
