Character encoding problem in Aboutware #1560

Closed
conorom opened this issue Feb 13, 2018 · 2 comments

conorom commented Feb 13, 2018

Any non-standard (possibly any non-ASCII) characters in the Aboutware cause an error 500 on the page. This seems to include smart quotes etc. as well as non-English characters. The log looks like this:

I, [2018-02-13T14:16:21.068214 #13356]  INFO -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c]   Rendered monograph_catalog/_index_monograph.html.erb (1346.2ms)
I, [2018-02-13T14:16:21.068396 #13356]  INFO -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c]   Rendered monograph_catalog/index.html.erb within layouts/curation_concerns/catalog (1352.4ms)
I, [2018-02-13T14:16:21.068913 #13356]  INFO -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c] Completed 500 Internal Server Error in 1661ms (ActiveRecord: 20.3ms)
F, [2018-02-13T14:16:21.070673 #13356] FATAL -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c]   
F, [2018-02-13T14:16:21.070764 #13356] FATAL -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c] ActionView::Template::Error (incompatible character encodings: UTF-8 and ASCII-8BIT):
F, [2018-02-13T14:16:21.071159 #13356] FATAL -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c]     137:         <div role="tabpanel" class="tab-pane aboutware row" id="aboutware">
[04d0c734-26cd-4c22-a53e-da491b5ba63c]     138:           <div class="col-sm-12">
[04d0c734-26cd-4c22-a53e-da491b5ba63c]     139:             <!-- this assumes that the "aboutware" FileSet is an html doc -->
[04d0c734-26cd-4c22-a53e-da491b5ba63c]     140:             <%= @monograph_presenter.aboutware.file.content.html_safe %>
[04d0c734-26cd-4c22-a53e-da491b5ba63c]     141:           </div>
[04d0c734-26cd-4c22-a53e-da491b5ba63c]     142:         </div>
[04d0c734-26cd-4c22-a53e-da491b5ba63c]     143:       <% end %>
F, [2018-02-13T14:16:21.071240 #13356] FATAL -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c]   
F, [2018-02-13T14:16:21.071322 #13356] FATAL -- : [04d0c734-26cd-4c22-a53e-da491b5ba63c] app/views/monograph_catalog/_index_monograph.html.erb:140:in `_app_views_monograph_catalog__index_monograph_html_erb___2875450765774053515_47447254155840'

Googling the error above gives hits for folks having similar problems (often when the characters are in a DB and they print them straight onto the page). Some report that forcing the encoding like this works:
<%= @monograph_presenter.aboutware.file.content.force_encoding("UTF-8").html_safe %>

I guess this is the first thing to try, but it would be nice to know why this is needed.
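To make the failure and the proposed workaround concrete, here is a minimal plain-Ruby sketch, outside Rails. It assumes the file content arrives as valid UTF-8 bytes that are merely tagged ASCII-8BIT. The helper names (`raw_bytes_of`, `concat_error`, `relabel_as_utf8`) and the sample strings are hypothetical and only for illustration; they are not part of the application.

```ruby
# Template text: UTF-8 containing non-ASCII characters (written as escapes).
PAGE_FRAGMENT = "<div>r\u00E9sum\u00E9</div>"

# Stand-in for file content read as raw bytes: the bytes are valid UTF-8,
# but String#b returns a copy tagged ASCII-8BIT (binary).
def raw_bytes_of(text)
  text.b
end

# Returns the exception raised by joining the two strings, or nil if none.
def concat_error(page, content)
  page + content
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Relabels a copy of the bytes as UTF-8; the bytes themselves are unchanged.
def relabel_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

content = raw_bytes_of("\u201Csmart quotes\u201D and caf\u00E9")

# Joining UTF-8 text with binary-tagged non-ASCII bytes raises.
puts concat_error(PAGE_FRAGMENT, content).class

# After relabelling, the same bytes join cleanly.
puts concat_error(PAGE_FRAGMENT, relabel_as_utf8(content)).inspect
```

Note that the error only appears when both strings contain non-ASCII bytes. Ruby treats a pure-ASCII string as compatible with either encoding, which would explain why plain-English Aboutware renders without trouble.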

@conorom

conorom commented Mar 1, 2018

There is a thread in Slack about this where this simple solution was proposed (never tested):
<%= @monograph_presenter.aboutware.file.content.force_encoding("UTF-8").html_safe %>

This is what you find when you search for the error above. Most people seem to encounter it when the offending characters are stored in, e.g., the Rails MySQL DB (as UTF-8) and then dumped unceremoniously onto the UTF-8-encoded page (as we do with content from Fedora). This might well just make it work. It would be nice to know why it is needed, though.

@moseshll

moseshll commented Mar 6, 2018

Hydra::PCDM::File seems unable to get the encoding right when its remote_content method is called, even if Fedora seems to have the right MIME type/encoding set when queried directly on http://127.0.0.1:8984. The incorrect default seems to be ASCII-8BIT. There is some discussion here: samvera/hyrax#1089 but I am dubious this is actually fixed.

Will go ahead with Conor's force_encoding() suggestion.
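One caveat worth sketching: `force_encoding` only relabels the bytes and does not transcode them, so it is only correct if the stored file really is UTF-8. The helper below is a hypothetical guard, not something from Hydra::PCDM or this codebase. The name `utf8_content` and the Windows-1252 fallback are assumptions chosen for illustration.

```ruby
# Hypothetical helper: returns a UTF-8 string for bytes of unknown labelling.
# Assumption: content that is not valid UTF-8 is treated as Windows-1252.
def utf8_content(bytes, fallback: Encoding::Windows_1252)
  # Relabel a copy as UTF-8 and keep it if the byte sequence is valid.
  labelled = bytes.dup.force_encoding(Encoding::UTF_8)
  return labelled if labelled.valid_encoding?

  # Otherwise transcode from the fallback encoding, replacing anything
  # that cannot be mapped instead of raising.
  bytes.dup.force_encoding(fallback)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

utf8_bytes   = "caf\u00E9".b   # valid UTF-8 bytes, tagged binary
latin1_bytes = "caf\xE9".b     # a single 0xE9 byte, not valid UTF-8

puts utf8_content(utf8_bytes).valid_encoding?
puts utf8_content(latin1_bytes).valid_encoding?
```

If the Aboutware files are guaranteed to be UTF-8, the plain `force_encoding("UTF-8")` call is enough and this guard is unnecessary.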

@moseshll moseshll added in progress and removed ready labels Mar 6, 2018
conorom added a commit that referenced this issue Mar 7, 2018
Addresses #1560 character encoding problem in Aboutware.
@conorom conorom closed this as completed Mar 7, 2018