
Refine pub_date_isim date extractions for date facet #785

Open
7 tasks
eporter23 opened this issue Jul 26, 2021 · 0 comments
Labels
Application Development · Index (Epics and issues related to the index for Blacklight Discovery Layer) · Phase 2 (Issues deferred to Phase 2 of the Blacklight Project) · Requires Reindexing

Comments


eporter23 commented Jul 26, 2021

Story: As a cataloging specialist, I want our index to exclude certain types of 008 dates, so that users do not encounter date facet anomalies caused by incorrectly coded data, which may come from external sources.

This ticket follows work done in #371. While testing the date facet, we found that some records (especially those with multiple or complex dates) do not behave as expected because of how they were originally coded and how we extract the data from the MARC records. After reviewing some potential cleanup reports with the cataloging team, we feel it may be more feasible to adjust our indexing process than to try to correct large numbers of records with invalid date entries.

  • Review the 008 cases in the “Date Facet - 008 Extraction Scenarios” document:
  • Position 06 = c
  • Position 06 = d
  • Position 06 = n
  • Position 06 = r
  • Position 06 = s
  • Adjust our 008 date extraction logic as needed to implement the changes noted in that document
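The per-position review above amounts to choosing, for each Date type code in 008/06, which of Date 1 (07-10) and Date 2 (11-14) to keep. A minimal Python sketch of that dispatch, assuming the raw 008 string is available; `extract_pub_dates` and `_year` are hypothetical names, and the per-code choices are placeholders for whatever the “Date Facet - 008 Extraction Scenarios” document decides, not the project's actual rules.

```python
from typing import List, Optional


def _year(value: str) -> Optional[str]:
    """Return value if it is a four-digit ASCII year other than '0000', else None."""
    if len(value) == 4 and value.isascii() and value.isdigit() and value != "0000":
        return value
    return None


def extract_pub_dates(field_008: str) -> List[str]:
    """Return the 008 years to index, based on the Date type code at 008/06.

    ASSUMPTION: the per-code rules below are illustrative placeholders.
    """
    if len(field_008) < 15:
        return []  # too short to hold 06 (type), 07-10 (Date 1), 11-14 (Date 2)
    date_type = field_008[6]
    date1 = _year(field_008[7:11])
    date2 = _year(field_008[11:15])

    if date_type == "s":
        # Single known/probable date: ignore anything miscoded in Date 2.
        candidates = [date1]
    elif date_type == "c":
        # Currently published: Date 2 is normally 9999, so keep Date 1 only.
        candidates = [date1]
    elif date_type == "d":
        # Ceased publication: keep beginning and ending dates.
        candidates = [date1, date2]
    elif date_type == "r":
        # Reprint/reissue: Date 1 is the reissue date, Date 2 the original date.
        candidates = [date1, date2]
    elif date_type == "n":
        # Dates unknown: index nothing.
        candidates = []
    else:
        # Codes outside the scenarios in this issue: keep Date 1 (assumption).
        candidates = [date1]
    return [d for d in candidates if d is not None]


print(extract_pub_dates("210726s20201999nyu"))  # ['2020']
```

Keeping the rule table in one function means a later change to the scenarios document touches a single branch rather than scattered string slicing.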

Examples:

  • In some cases, records erroneously have values entered in both the date 1 and date 2 positions, but are coded in position 06 as containing a single date only (06 = s)
  • In other cases, a record contains “0000” in the date 1 (positions 07-10) or date 2 (positions 11-14) position of the 008 field. “0000” is not a valid value under the MARC specification for the 008, so we should not store it in our index.
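The second example comes down to validating each four-character date before it reaches the index. A minimal sketch, assuming the value has already been sliced out of the 008; `parse_008_year` is a hypothetical name, and dropping partially unknown dates such as “19uu” is an assumption this issue does not settle.

```python
from typing import Optional


def parse_008_year(raw: str) -> Optional[int]:
    """Parse a four-character 008 date (Date 1 or Date 2) into a facet year.

    Returns None for values that should not reach the index:
    - "0000", which this issue treats as invalid;
    - blanks ("    ") and fully unknown dates ("uuuu");
    - partially unknown dates such as "19uu" (ASSUMPTION: dropped rather
      than mapped to a decade).
    "9999" (the Date 2 placeholder for ongoing serials) is left to the caller.
    """
    if len(raw) != 4 or not (raw.isascii() and raw.isdigit()):
        return None
    if raw == "0000":
        return None
    return int(raw)


for value in ["1999", "0000", "uuuu", "19uu", "    "]:
    print(repr(value), parse_008_year(value))
```

Checking `isascii()` alongside `isdigit()` keeps non-ASCII digit characters from passing the check and then failing in `int()`.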

References and mockups:

Solr and Indexing

  • pub_date_isim: adjust the indexing logic based on the notes in the extraction scenarios document
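In Blacklight's example Solr schema, the `_isim` suffix conventionally marks a multivalued, stored, indexed integer field. Assuming that, the sketch below shows one way cleaned years might be written to `pub_date_isim`; `pub_date_field` and `_year` are hypothetical names and the document `id` is a placeholder.

```python
import json
from typing import Iterable, Optional


def _year(raw: str) -> Optional[int]:
    # Rule from this issue: four ASCII digits, never "0000".
    if len(raw) == 4 and raw.isascii() and raw.isdigit() and raw != "0000":
        return int(raw)
    return None


def pub_date_field(raw_years: Iterable[str]) -> dict:
    """Build the pub_date_isim portion of a Solr document.

    Values are de-duplicated and sorted; the field is omitted entirely when
    no valid year survives, so the record simply drops out of the date facet.
    """
    years = sorted({y for y in (_year(r) for r in raw_years) if y is not None})
    return {"pub_date_isim": years} if years else {}


doc = {"id": "example-1", **pub_date_field(["1950", "0000", "1950", "1999"])}
print(json.dumps(doc))
# {"id": "example-1", "pub_date_isim": [1950, 1999]}
```

Omitting the field instead of storing an empty or zero value keeps invalid dates like “0000” from appearing as a facet bucket.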
eporter23 added the Application Development, Index, and Requires Reindexing labels Jul 26, 2021
lovinscari added the Phase 2 label Jan 13, 2022