Merge pull request #30 from mlibrary/remediated-sh-headings
Remediated sh headings
Showing 28 changed files with 699 additions and 37 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
name: Update subject headings config file | ||
|
||
on: | ||
workflow_dispatch: | ||
schedule: | ||
- cron: '0 8 1 * *' # 08:00 UTC on the first of each month | ||
|
||
|
||
jobs: | ||
update_subject_headings: | ||
runs-on: ubuntu-latest | ||
outputs: | ||
sha: ${{ steps.cpr.outputs.pull-request-head-sha }} | ||
steps: | ||
- uses: actions/checkout@v4 | ||
- name: Create .env file | ||
run: cat env.* > .env | ||
- name: Load .env file | ||
uses: xom9ikk/dotenv@v2 | ||
- name: Set up Ruby 3.3 | ||
uses: ruby/setup-ruby@v1 | ||
with: | ||
ruby-version: '3.3' | ||
bundler-cache: true | ||
- name: set path | ||
run: | | ||
echo "$GITHUB_WORKSPACE/exe" >> $GITHUB_PATH | ||
- name: get update | ||
env: | ||
ALMA_API_KEY: ${{ secrets.ALMA_API_KEY }} | ||
SUBJECT_HEADING_REMEDIATION_SET_ID: ${{ vars.SUBJECT_HEADING_REMEDIATION_SET_ID }} | ||
run: browse subjects generate_remediated_authorities_file | ||
- name: Create Pull Request | ||
id: cpr | ||
uses: peter-evans/create-pull-request@v6 | ||
with: | ||
commit-message: "update remediated subject headings config file" | ||
title: Update remediated subject headings config file | ||
reviewers: niquerio |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,7 @@ authority_browse.zip | |
reports/ | ||
*.db | ||
.solargraph.yml | ||
*.sql | ||
|
||
tmp/* | ||
!tmp/.keep |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```mermaid | ||
flowchart TD | ||
A[Set up Subject Authorities DB] --> B[Iterate over remediated authority records\nto add remediated headings to subjects table] | ||
B --> C[Iterate over remediated authority records again.\n Add see_instead xrefs and broader/narrower xrefs] | ||
``` |
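The two passes in the flowchart can be sketched in plain Ruby. This is a minimal illustration, not the code from this commit: `Record`, `load_remediated`, and the in-memory `subjects` and `xrefs` collections are hypothetical stand-ins for the authority records and the database tables, and the headings are made-up placeholders. The sketch assumes the reason for the second pass is that an xref may point at a remediated heading that appears later in the file, so all headings are inserted before any xref is resolved.

```ruby
# Hypothetical stand-in for a remediated authority record.
# `broader` holds the labels of broader terms (the real code reads MARC 550 fields).
Record = Struct.new(:id, :label, :broader, keyword_init: true)

def load_remediated(records)
  subjects = {} # stand-in for the subjects table: id => label
  xrefs = []    # stand-in for the subjects_xrefs table

  # Pass 1: every remediated heading gets a subjects row.
  records.each { |rec| subjects[rec.id] = rec.label }

  # Pass 2: resolve xrefs now that all remediated headings are present.
  records.each do |rec|
    rec.broader.each do |broader_label|
      broader_id = subjects.key(broader_label)
      if broader_id.nil?
        # Simplification: the label doubles as the id for a term that is
        # not in the table yet (the diff uses the normalized match_text).
        broader_id = broader_label
        subjects[broader_id] = broader_label
      end
      xrefs << {subject_id: rec.id, xref_id: broader_id, xref_kind: "broader"}
      xrefs << {subject_id: broader_id, xref_id: rec.id, xref_kind: "narrower"}
    end
  end

  [subjects, xrefs]
end

records = [
  Record.new(id: "r1", label: "Heading A", broader: ["Heading B"]),
  Record.new(id: "r2", label: "Heading B", broader: [])
]
subjects, xrefs = load_remediated(records)
```

Because "Heading B" is inserted in the first pass, the xref from `r1` resolves to the existing id `r2` instead of creating a second row for the same heading.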
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,3 @@ | ||
BIBLIO_SOLR="http://YOUR_SOLR_URL/solr/biblio" | ||
ALMA_API_KEY="YOUR_API_KEY" | ||
SUBJECT_HEADING_REMEDIATION_SET_ID="YOUR_SET_ID" |
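The two new settings are required by the update job. A small sketch of how a script might check them before calling out to Alma follows; `REQUIRED_SETTINGS` and `remediation_settings` are hypothetical names and are not part of this commit.

```ruby
# Hypothetical helper, not part of the commit: fail fast when a required
# setting is missing or empty instead of passing nil further down.
REQUIRED_SETTINGS = %w[ALMA_API_KEY SUBJECT_HEADING_REMEDIATION_SET_ID].freeze

# Accepts ENV or any Hash-like object so it can be exercised without
# touching the real environment.
def remediation_settings(env = ENV)
  REQUIRED_SETTINGS.to_h do |name|
    value = env[name].to_s
    raise KeyError, "#{name} is not set" if value.empty?
    [name, value]
  end
end
```

Passing a plain hash in place of `ENV` keeps the check easy to test.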
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
module AuthorityBrowse | ||
class RemediatedSubjects | ||
include Enumerable | ||
|
||
# List of RemediatedSubjects::Entry objects | ||
# @param file_path [String] Path to config file with remediated subjects | ||
# info | ||
def initialize(file_path = S.remediated_subjects_file) | ||
xml_lines = File.readlines(file_path) | ||
@entries = xml_lines.map do |line| | ||
Entry.new(line) | ||
end | ||
end | ||
|
||
def each(&block) | ||
@entries.each(&block) | ||
end | ||
|
||
class Entry | ||
# An Authority Record Entry | ||
# @param xml [String] Authority Record MARCXML String | ||
def initialize(xml) | ||
@record = MARC::XMLReader.new(StringIO.new(xml)).first | ||
end | ||
|
||
def id | ||
@record["001"].value | ||
end | ||
|
||
def preferred_term | ||
@preferred_term ||= Term::Preferred.new(@record["150"]) | ||
end | ||
|
||
# Returns the cross references found in the 450 and 550 fields | ||
# @return [Array<Term>] An Array of xref terms | ||
def xrefs | ||
@record.fields(["450", "550"]).map do |field| | ||
[Term::SeeInstead, Term::Broader, Term::Narrower].find do |kind| | ||
kind.match?(field) | ||
end&.new(field) | ||
end.compact | ||
end | ||
|
||
# Adds the preferred term and xrefs to the subjects and subjects_xrefs | ||
# db tables | ||
# @return [Nil] | ||
def add_to_db | ||
preferred_term.add_to_db(id) | ||
xrefs.each do |xref| | ||
xref.add_to_db(id) | ||
end | ||
end | ||
end | ||
|
||
class Term | ||
# @param field [MARC::DataField] A subject term field | ||
def initialize(field) | ||
@field = field | ||
end | ||
|
||
# What kind of field it is. It's used for setting the xref_kind in the subjects_xrefs table. | ||
def kind | ||
raise NotImplementedError | ||
end | ||
|
||
# First step in adding an xref term to the database: insert the term into | ||
# the subjects table if it is not already there. Overridden in | ||
# Term::Preferred. #id falls back to match_text when no row with this | ||
# match_text exists, so id == match_text means the term isn't in the db. | ||
# | ||
# @param preferred_term_id [String] id of the preferred term; unused here, but subclasses use it for the subjects_xrefs rows | ||
def add_to_db(preferred_term_id) | ||
if id == match_text | ||
AuthorityBrowse.db[:subjects].insert(id: id, label: label, match_text: match_text, deprecated: false) | ||
end | ||
end | ||
|
||
# @return [String] | ||
def label | ||
@field.subfields | ||
.filter_map do |x| | ||
x.value if ["a", "v", "x", "y", "z"].include?(x.code) | ||
end | ||
.join("--") | ||
end | ||
|
||
# @return [String] | ||
def match_text | ||
AuthorityBrowse::Normalize.match_text(label) | ||
end | ||
|
||
# @return [String] | ||
def id | ||
AuthorityBrowse.db[:subjects]&.first(match_text: match_text)&.dig(:id) || match_text | ||
end | ||
|
||
class Preferred < Term | ||
# Adds the preferred term to the db | ||
# | ||
# @return nil | ||
def add_to_db(id) | ||
AuthorityBrowse.db[:subjects].insert(id: id, label: label, match_text: match_text, deprecated: false) | ||
end | ||
end | ||
|
||
class SeeInstead < Term | ||
def self.match?(field) | ||
field.tag == "450" | ||
end | ||
|
||
def kind | ||
"see_instead" | ||
end | ||
|
||
# @param preferred_term_id [String] | ||
# @return [Nil] | ||
def add_to_db(preferred_term_id) | ||
super | ||
xrefs = AuthorityBrowse.db[:subjects_xrefs] | ||
xrefs.insert(subject_id: id, xref_id: preferred_term_id, xref_kind: kind) | ||
end | ||
end | ||
|
||
class Broader < Term | ||
def self.match?(field) | ||
field.tag == "550" && field["w"] == "g" | ||
end | ||
|
||
def kind | ||
"broader" | ||
end | ||
|
||
# @param preferred_term_id [String] | ||
# @return [Nil] | ||
def add_to_db(preferred_term_id) | ||
super | ||
xrefs = AuthorityBrowse.db[:subjects_xrefs] | ||
xrefs.insert(subject_id: preferred_term_id, xref_id: id, xref_kind: kind) | ||
xrefs.insert(subject_id: id, xref_id: preferred_term_id, xref_kind: "narrower") | ||
end | ||
end | ||
|
||
class Narrower < Term | ||
def self.match?(field) | ||
field.tag == "550" && field["w"] == "h" | ||
end | ||
|
||
def kind | ||
"narrower" | ||
end | ||
|
||
# @param preferred_term_id [String] | ||
# @return [Nil] | ||
def add_to_db(preferred_term_id) | ||
super | ||
xrefs = AuthorityBrowse.db[:subjects_xrefs] | ||
xrefs.insert(subject_id: preferred_term_id, xref_id: id, xref_kind: kind) | ||
xrefs.insert(subject_id: id, xref_id: preferred_term_id, xref_kind: "broader") | ||
end | ||
end | ||
end | ||
end | ||
end |
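The label, dispatch, and reciprocal-row rules in this file can be tried without a database or the marc gem. The sketch below is an illustration under stated assumptions: `Subfield`, `Field`, `label`, `xref_kind`, and `xref_rows` are hypothetical stand-ins that mirror `Term#label`, the `match?` class methods, and the `add_to_db` methods in the diff, and the field content is made up.

```ruby
# Stand-ins for MARC::Subfield and MARC::DataField; only what is used here is modeled.
Subfield = Struct.new(:code, :value)

class Field
  attr_reader :tag, :subfields

  def initialize(tag, subfields)
    @tag = tag
    @subfields = subfields
  end

  # Mirrors field["w"]: value of the first subfield with that code, or nil.
  def [](code)
    subfields.find { |sf| sf.code == code }&.value
  end
end

LABEL_CODES = %w[a v x y z].freeze

# Same joining rule as Term#label: selected subfields joined with "--".
def label(field)
  field.subfields.filter_map { |sf| sf.value if LABEL_CODES.include?(sf.code) }.join("--")
end

# Same dispatch rule as the match? methods: 450 is see_instead,
# 550 with $w "g" is broader, 550 with $w "h" is narrower.
def xref_kind(field)
  return "see_instead" if field.tag == "450"
  return nil unless field.tag == "550"

  {"g" => "broader", "h" => "narrower"}[field["w"]]
end

# Rows the add_to_db methods write to subjects_xrefs for one xref.
def xref_rows(kind, preferred_id, term_id)
  case kind
  when "see_instead"
    [{subject_id: term_id, xref_id: preferred_id, xref_kind: "see_instead"}]
  when "broader", "narrower"
    inverse = (kind == "broader") ? "narrower" : "broader"
    [
      {subject_id: preferred_id, xref_id: term_id, xref_kind: kind},
      {subject_id: term_id, xref_id: preferred_id, xref_kind: inverse}
    ]
  else
    []
  end
end

field = Field.new("550", [
  Subfield.new("w", "g"),
  Subfield.new("a", "Heading B"),
  Subfield.new("x", "History")
])
puts label(field)     # prints "Heading B--History"
puts xref_kind(field) # prints "broader"
```

A broader or narrower xref always produces two rows, one in each direction, while a see_instead xref produces a single row that points from the variant term to the preferred term.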