Forces UTF-8 encoding on ALTO XML. (#2298)
* Forces UTF-8 encoding on ALTO XML.

* Adds comment to kickstart Circle CI.

* Switch encoding methods.

* Try to stop errors.

* Changes logic for delivering UV search.

* Fixes rspec.

* Overrides controller action to allow CORS.

* Corrects Override message.

* Adds parent modules to methods.

* Corrected the use of newer code.

* Try exact website.

* Override header setter instead.

* Revert override.

* enforce status code.

* Tries to forgo auth.

* Try something else.

* Another go.
bwatson78 authored Jan 2, 2025
1 parent 881e304 commit 3d779de
Showing 4 changed files with 7 additions and 6 deletions.
1 change: 1 addition & 0 deletions app/controllers/catalog_controller.rb
@@ -19,6 +19,7 @@ def self.modified_field

   # CatalogController-scope behavior and configuration for BlacklightIiifSearch
   include BlacklightIiifSearch::Controller
+  skip_before_action :authenticate_user!, only: :iiif_search

   configure_blacklight do |config|
     # configuration for Blacklight IIIF Content Search
5 changes: 3 additions & 2 deletions app/models/file_set.rb
@@ -87,13 +87,14 @@ def preferred_file
     end
   end

+  # The two methods below err when storing text in Solr, so forcing UTF-8 encoding removes errant text (most likely ASCII).
   def alto_xml
-    return extracted&.content if extracted&.file_name&.first&.include?('.xml')
+    return extracted&.content&.force_encoding('UTF-8')&.encode("UTF-8", invalid: :replace, replace: "") if extracted&.file_name&.first&.include?('.xml')
     nil
   end

   def transcript_text
-    transcript_file&.content&.force_encoding('UTF-8') if transcript_file&.file_name&.first&.include?('.txt')
+    transcript_file&.content&.force_encoding('UTF-8')&.encode("UTF-8", invalid: :replace, replace: "") if transcript_file&.file_name&.first&.include?('.txt')
   end

   private
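
The encoding chain in the file_set.rb change can be tried outside the application. The sketch below is only an illustration: `sanitize_utf8` is a hypothetical helper that is not part of this commit, and the sample string is invented. It assumes a modern Ruby in which `String#encode` with an explicit `invalid: :replace` removes invalid bytes even when the source and target encodings are both UTF-8.

```ruby
# Hypothetical helper (not in the commit) that mirrors the chain used in
# FileSet#alto_xml and FileSet#transcript_text.
def sanitize_utf8(raw)
  return nil if raw.nil?

  # force_encoding relabels the string in place, so work on a copy here.
  # encode then drops every byte sequence that is not valid UTF-8.
  raw.dup.force_encoding('UTF-8').encode('UTF-8', invalid: :replace, replace: '')
end

# "\xE9" is "é" in Latin-1; on its own it is not a valid UTF-8 sequence.
# .b returns a binary (ASCII-8BIT) copy, similar to raw file content.
dirty = "caf\xE9 <String>ok</String>".b
clean = sanitize_utf8(dirty)

puts clean.encoding
puts clean.valid_encoding?
puts clean
```

Unlike this helper, the commit calls `force_encoding` directly on the returned content, which relabels that string object rather than a copy. `String#scrub('')` would be another way to drop the invalid bytes after `force_encoding`.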
2 changes: 1 addition & 1 deletion app/views/manifest/manifest.json.jbuilder
@@ -15,7 +15,7 @@ end
 # within the Work to activate, but each text-optimized FileSet's alto_xml_tesi,
 # transcript_text_tesi, and is_page_of_ssi fields must also be indexed for normal
 # searching functions.
-if @solr_doc['all_text_tsimv'].present?
+if @image_concerns.any? { |id| SolrDocument&.find(id)&.[]('alto_xml_tesi')&.present? }
   json.service do
     json.child! do
       json.set! :@context, 'http://iiif.io/api/search/0/context.json'
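
The new condition in manifest.json.jbuilder can be modelled in plain Ruby. This is a sketch under stated assumptions, not the application's code: `FAKE_INDEX`, `find_doc`, and `searchable?` are hypothetical stand-ins for the Solr index, `SolrDocument.find`, and the inline block, and the `strip.empty?` check approximates ActiveSupport's `present?`.

```ruby
# Hypothetical in-memory stand-in for the Solr index, keyed by FileSet id.
FAKE_INDEX = {
  'fs-1' => { 'alto_xml_tesi' => '<alto>...</alto>' },
  'fs-2' => { 'alto_xml_tesi' => '   ' },
  'fs-3' => {}
}.freeze

# Stand-in for SolrDocument.find; returns nil for unknown ids.
def find_doc(id)
  FAKE_INDEX[id]
end

# True as soon as one id resolves to a document with non-blank ALTO text.
def searchable?(image_concerns)
  image_concerns.any? do |id|
    text = find_doc(id)&.[]('alto_xml_tesi')
    # Approximates present?: nil, empty, and whitespace-only all count as blank.
    !text.to_s.strip.empty?
  end
end

puts searchable?(%w[fs-2 fs-3 fs-1])
puts searchable?(%w[fs-2 fs-3])
```

Because `any?` stops at the first match, a Work whose first FileSet has ALTO text needs only one lookup, while a Work with no ALTO text costs one lookup per entry in `@image_concerns`. If each lookup is a separate Solr request, that cost may matter for large Works.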
5 changes: 2 additions & 3 deletions spec/views/manifest/manifest.json.jbuilder_spec.rb
@@ -96,10 +96,9 @@
       expect(work.file_sets.count).to eq 5
     end

-    context 'when all_text_tsimv is present' do
-      let(:solr_document) { SolrDocument.new(attributes.merge('all_text_tsimv' => 'So much text!')) }
-
+    context 'when @image_concerns contains values in alto_xml_tesi' do
       it 'renders a IIIF Search service' do
+        allow(image_concerns).to receive(:any?).and_return(true)
         render
         parsed_rendered_manifest = JSON.parse(rendered)

