Bulkrax error for missing source_identifier #896

Open
KatharineV opened this issue Nov 22, 2024 · 8 comments

Comments

@KatharineV
Collaborator

Team, I tried to run new importers through Bulkrax for the first time since the cutover, and I got a StandardError message that I've never seen before.

Error: StandardError - Missing at least one required element, missing element(s) are: source_identifier

Importer with Published Works and PDFs that failed: https://adl.b2.adventistdigitallibrary.org/importers/511?locale=en

Importer with Images and JPEGs that failed: https://adl.b2.adventistdigitallibrary.org/importers/512?locale=en

Image

I could use your help figuring out what's happening here. I've never used source_identifier as a column header before, and I don't know what metadata would go in it. Why is it now required, and why are my spreadsheets rejected when it's missing? Any insight will be helpful!
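
For context, the message reads like the output of a required-header check that runs before any rows are imported. Below is a minimal sketch of such a check in plain Ruby. `missing_required_headers` and `validate_headers!` are hypothetical names for illustration, not Bulkrax's actual methods; the `title` requirement and the header list are assumptions, and the required list is passed in rather than read from real configuration.

```ruby
# Hypothetical sketch: not Bulkrax source code.
# Returns the required column names that are absent from the CSV headers.
def missing_required_headers(headers, required)
  normalized = headers.map { |h| h.to_s.strip.downcase }
  required.reject { |name| normalized.include?(name.to_s.downcase) }
end

# Raises an error shaped like the one reported above when a column is missing.
def validate_headers!(headers, required)
  missing = missing_required_headers(headers, required)
  return true if missing.empty?

  raise StandardError,
        "Missing at least one required element, missing element(s) are: #{missing.join(', ')}"
end

# Illustrative header row; "title" is an assumed column.
headers = %w[title identifier identifier.ark source work_type related_url]

# If a tenant-specific column such as "identifier" is what the check expects, the sheet passes.
validate_headers!(headers, %w[title identifier])

# If the check falls back to a generic "source_identifier" column, the same sheet is rejected.
begin
  validate_headers!(headers, %w[title source_identifier])
rescue StandardError => e
  puts e.message
end
```

Under these assumptions the spreadsheet itself has not changed; only the list of columns the check expects has.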

@KatharineV KatharineV converted this from a draft issue Nov 22, 2024
@KatharineV
Collaborator Author

This is affecting staging too. I am now totally stumped. https://adl.s2.adventistdigitallibrary.org/importers/395?locale=en

@KatharineV
Collaborator Author

Further testing on staging:
I added source_identifier as a column and filled it with the same content as the identifier and identifier.ark fields. With the source_identifier column present the importer succeeded, but other errors appeared.

Working importer with source_identifier column added:
https://adl.s2.adventistdigitallibrary.org/importers/395?locale=en

Sample work: https://adl.s2.adventistdigitallibrary.org/concern/conference_items/fca810d4-33e1-4ce6-b208-eb8b5d61cd6a

Problems/errors:

  1. The work came in as the wrong work type: the CSV reads "image", but the work rendered as "conferenceItem". I didn't notice this problem before, so perhaps adding source_identifier jostled something in the backend.
  2. The identifier.ark data is missing from the metadata.
  3. Without identifier.ark, we don't have a human-readable slug, which is one of our customizations that worked prior to this Bulkrax change.
  4. The source_identifier data is showing in the source field, which we already populate from the source column. Two columns now feed a single field, and that breaks the way we use Source: it is a faceted term that we rely on very heavily to convey one and only one thing, the institution of origin.
  5. Lastly, and I'm unsure whether this is related, the related_url links are not attaching JPEGs to these works. The URLs display in the related_url field, but the files don't attach. Could this be tied to the same Bulkrax change? Until now, related_url has imported files for us almost always without issue.

The Source facet is now cluttered with source_identifier data:
Image
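
Item 4 in particular would be consistent with the importer falling back to generic defaults when no tenant mapping is found. Here is a minimal sketch of that kind of lookup, assuming the fallback is a `source_identifier` column written to the `source` property. That assumption is inferred from the symptoms above and is not confirmed in this thread; `identifier_mapping`, the `DEFAULT_*` constants, and the sample mapping are hypothetical.

```ruby
# Hypothetical sketch: not Bulkrax source code.
DEFAULT_PROPERTY = 'source'            # assumed fallback property
DEFAULT_COLUMN   = 'source_identifier' # assumed fallback CSV column

# Finds the mapping entry flagged source_identifier: true and returns
# the work property it writes to and the CSV column it reads from.
def identifier_mapping(mappings)
  property, options = mappings.find { |_prop, opts| opts[:source_identifier] }
  return { property: DEFAULT_PROPERTY, column: DEFAULT_COLUMN } if property.nil?

  { property: property, column: Array(options[:from]).first || property }
end

# A mapping that flags its own identifier column (illustrative values).
custom = {
  'source'     => { from: ['source'] },
  'identifier' => { from: ['identifier'], source_identifier: true }
}
p identifier_mapping(custom)

# With no flagged entry, the identifier lands in the same property as the source column.
p identifier_mapping({})
```

If something like this is happening, the duplicate values in Source are a side effect of the missing mapping rather than a problem with the spreadsheet.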

@KatharineV
Collaborator Author

Another problem with the source_identifier field behavior in our instance:
If I put source_identifier in the CSV with the same value repeated for multiple works, Bulkrax imports only one work and assigns the source_identifier as the canonical identifier, which is not acceptable behavior in our instance. I'm not sure whether our instance has prioritized identifier.ark or identifier up to this point. If asked, we'd choose identifier.ark, the most stable ID we use across all our works.

https://testing.s2.adventistdigitallibrary.org/importers/42?locale=en

Image
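
A single imported work is what one would expect if records are keyed on the source identifier, so that rows sharing a value are treated as updates to the same work. The sketch below shows that effect, plus a pre-flight duplicate check, using plain hashes in place of CSV rows. `import_keyed_by` and `duplicate_identifiers` are hypothetical helpers and the sample values are invented.

```ruby
# Hypothetical sketch: not Bulkrax source code.
# Keys each row on its identifier column; a later row with the same
# value replaces the earlier one instead of creating a second record.
def import_keyed_by(rows, column)
  rows.each_with_object({}) { |row, works| works[row[column]] = row }
end

# Lists identifier values that appear on more than one row.
def duplicate_identifiers(rows, column)
  rows.group_by { |row| row[column] }
      .select { |_value, group| group.size > 1 }
      .keys
end

rows = [
  { 'title' => 'Work A', 'source_identifier' => 'batch-1' },
  { 'title' => 'Work B', 'source_identifier' => 'batch-1' },
  { 'title' => 'Work C', 'source_identifier' => 'batch-1' }
]

puts import_keyed_by(rows, 'source_identifier').size          # prints 1
puts duplicate_identifiers(rows, 'source_identifier').inspect # prints ["batch-1"]
```

Running a check like `duplicate_identifiers` on a spreadsheet before import would flag repeated values before they collapse into one work.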

@KatharineV
Collaborator Author

Confirmed that Bulkrax only brings works in as Conference Items. I created a new importer from a CSV whose work_type column contains every work type EXCEPT conferenceItem. All six works imported as Conference Items.

https://testing.s2.adventistdigitallibrary.org/importers/43?locale=en
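
Every row resolving to the same type would be consistent with the work_type column not being consulted and a default being used instead. Here is a minimal sketch under that assumption. `resolve_work_type`, its `type_column` parameter, and the `DEFAULT_WORK_TYPE` value are illustrative, not the confirmed cause or Bulkrax's actual logic.

```ruby
# Hypothetical sketch: not Bulkrax source code.
DEFAULT_WORK_TYPE = 'ConferenceItem' # assumed default, matching the observed result

# Returns the class name for a row. type_column is the CSV column that
# carries the work type, or nil when no mapping identifies one.
def resolve_work_type(row, type_column)
  raw = type_column && row[type_column]
  return DEFAULT_WORK_TYPE if raw.nil? || raw.strip.empty?

  # "image" -> "Image", "conferenceItem" -> "ConferenceItem"
  raw.strip.sub(/\A./) { |first| first.upcase }
end

row = { 'title' => 'Sample', 'work_type' => 'image' }

puts resolve_work_type(row, 'work_type') # prints Image
puts resolve_work_type(row, nil)         # prints ConferenceItem
```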

@laritakr
Contributor

This error originates from a Hyku update that changes how Bulkrax gets its config mappings. Another Hyku update will be required to handle the mappings more appropriately.

@KatharineV
Collaborator Author

@laritakr based on our customizations, can you tell me which field is set up as the "required" field in place of source_identifier?

Image

@laritakr
Contributor

I'm not sure... I would tend to think it would be "original_identifier" but I am not at all confident about that.

@bkiahstroud

I can answer this one. @laritakr is correct; in that example documentation, the required field would be original_identifier. In Adventist's field mappings, it's identifier:

'identifier' => { from: ['identifier'], source_identifier: true },
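
To illustrate how a mapping of this shape keeps the identifier and source values apart, here is a minimal sketch that applies `from:` mappings to one row. `apply_mappings` is a hypothetical helper, the `source` entry and the sample values are assumptions for illustration, and real Bulkrax mappings accept more options than shown.

```ruby
# Hypothetical sketch: not Bulkrax source code.
MAPPINGS = {
  'identifier' => { from: ['identifier'], source_identifier: true }, # quoted above
  'source'     => { from: ['source'] }                               # assumed entry
}.freeze

# Builds a metadata hash by copying each mapped column into its property.
# Columns that no mapping names are ignored.
def apply_mappings(row, mappings)
  mappings.each_with_object({}) do |(property, options), metadata|
    values = options[:from].map { |column| row[column] }.compact
    metadata[property] = values unless values.empty?
  end
end

row = {
  'identifier' => 'example-id-001',      # invented sample value
  'source'     => 'Example Institution', # invented sample value
  'unmapped'   => 'ignored'
}

metadata = apply_mappings(row, MAPPINGS)
puts metadata['identifier'].inspect
puts metadata['source'].inspect
```

With the identifier flagged on its own entry, the source property receives only what the source column supplies.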
