Review Genre extraction of bibliographic entities #36

alliyya · 2022-04-30T18:29:26Z

alliyya · 2022-04-30T18:37:46Z

This relates to https://gitlab.com/calincs/infrastructure/vocabularies/-/issues/9

alliyya · 2022-05-08T13:31:11Z

The bibliographic record IDs (e.g. 5440d4e1-6a18-413d-a866-9364bf0c0e51) changed with the newer files, but the IDs used in the genre mapping (e.g. 94449) were still the old ones.

The bibliographic record ID (5440d4e1-6a18-413d-a866-9364bf0c0e51) doesn't align with the items in genre_map, which are keyed by the old IDs (e.g. '94449': ['NOVEL', 'DETECTIVE']).

Also check whether the genre_map lists are being properly appended to.
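A quick way to spot this kind of mismatch is to compare the genre_map keys against the current record IDs. This is a minimal sketch, assuming genre_map is a plain dict from record ID strings to lists of genre strings and that the current bibliographic record IDs are available as an iterable; `find_stale_keys` and `find_unmapped_ids` are hypothetical helpers, not functions from the conversion code.

```python
from typing import Dict, Iterable, List, Set

# Assumed shape of genre_map: {record_id: [genre, ...]}
GenreMap = Dict[str, List[str]]


def find_stale_keys(genre_map: GenreMap, current_ids: Iterable[str]) -> Set[str]:
    """Keys in genre_map that match no current bibliographic record ID."""
    return set(genre_map) - set(current_ids)


def find_unmapped_ids(genre_map: GenreMap, current_ids: Iterable[str]) -> Set[str]:
    """Current record IDs that have no entry in genre_map."""
    return set(current_ids) - set(genre_map)


if __name__ == "__main__":
    genre_map = {"94449": ["NOVEL", "DETECTIVE"]}
    current_ids = ["5440d4e1-6a18-413d-a866-9364bf0c0e51"]
    print(find_stale_keys(genre_map, current_ids))    # {'94449'}
    print(find_unmapped_ids(genre_map, current_ids))  # {'5440d4e1-6a18-413d-a866-9364bf0c0e51'}
```

If both sets come back non-empty for the same records, that suggests the map was built from an older set of files than the one being converted.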

alliyya · 2022-05-08T16:30:27Z

Can confirm that the genre_map entries were not being appended to but were being overwritten with every file parsed, which led to some additional missed genres.

Potential future TODO: establish whether any weighting should be applied to help narrow down genres. For example, only use a genre if it has been associated with a textscope 3+ times, to handle cases where a genre is incorrect and is just a one-off mention referring to a different text.

alliyya · 2022-05-08T16:31:28Z

Next steps:

Update the genre_map lists to be appended to instead of overwritten.
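The difference between overwriting and appending can be sketched as follows. This is a minimal illustration, assuming each parsed file yields a dict mapping record IDs to lists of genre strings; `build_genre_map_overwriting` and `build_genre_map_appending` are hypothetical names, and the split of NOVEL and DETECTIVE across two files is made-up example data.

```python
from typing import Dict, Iterable, List

# Assumed shape: each parsed file yields {record_id: [genre, ...]}.
GenreMap = Dict[str, List[str]]


def build_genre_map_overwriting(per_file: Iterable[GenreMap]) -> GenreMap:
    """Buggy pattern: each file's genres replace those collected from earlier files."""
    genre_map: GenreMap = {}
    for file_genres in per_file:
        for record_id, genres in file_genres.items():
            genre_map[record_id] = list(genres)  # discards earlier files' genres
    return genre_map


def build_genre_map_appending(per_file: Iterable[GenreMap]) -> GenreMap:
    """Fixed pattern: genres from every file are appended to the record's list."""
    genre_map: GenreMap = {}
    for file_genres in per_file:
        for record_id, genres in file_genres.items():
            # setdefault creates the list the first time a record is seen;
            # extend then appends. Duplicates are kept on purpose so that
            # per-genre mention counts remain available later.
            genre_map.setdefault(record_id, []).extend(genres)
    return genre_map


if __name__ == "__main__":
    files = [{"94449": ["NOVEL"]}, {"94449": ["DETECTIVE"]}]
    print(build_genre_map_overwriting(files))  # {'94449': ['DETECTIVE']}
    print(build_genre_map_appending(files))    # {'94449': ['NOVEL', 'DETECTIVE']}
```

Keeping duplicates rather than deduplicating is a design choice in this sketch: it preserves how often each genre was mentioned, which any later weighting would need.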

SusanBrown · 2022-05-08T20:13:30Z

Sounds like a good catch. I don’t follow the logic of the “todo” —is this to omit rarely mentioned genres from the list of those associated with an author?

alliyya · 2022-05-09T10:27:50Z

> I don’t follow the logic of the “todo” —is this to omit rarely mentioned genres from the list of those associated with an author?

Essentially, yes. At a later point, we'd likely want to review the genres extracted and see how accurate they are.

Example: a textscope is mentioned in 5 different entries. 4 of those entries use similar genres to describe it (e.g. cwrc:letter and cwrc:romance), but 1 entry uses a genre that doesn't fit or make sense for that particular work (e.g. cwrc:dictionary).

We could define rules such as keeping only the 3 most common genres for a text, or only using a genre if it is associated with a textscope more than 2 times.

It was more of an idea requiring further investigation than a concrete TODO.
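The two candidate rules mentioned in this comment can be sketched with `collections.Counter`. This is a minimal sketch, assuming the genres for one textscope are available as one list of genre strings per entry; `count_genres`, `genres_above_threshold`, and `top_genres` are hypothetical helpers, and the sample data only mirrors the shape of the example above, not real CWRC records.

```python
from collections import Counter
from typing import List, Sequence

# Assumed input: one list of genre strings per entry mentioning a textscope.
Entries = Sequence[Sequence[str]]


def count_genres(entries: Entries) -> Counter:
    """Count how many entries use each genre; repeats within one entry count once."""
    counts: Counter = Counter()
    for genres in entries:
        # dict.fromkeys drops within-entry duplicates while keeping first-seen order
        counts.update(list(dict.fromkeys(genres)))
    return counts


def genres_above_threshold(entries: Entries, min_count: int = 3) -> List[str]:
    """Rule A: keep a genre only if at least min_count entries use it (i.e. more than 2)."""
    return [g for g, c in count_genres(entries).most_common() if c >= min_count]


def top_genres(entries: Entries, n: int = 3) -> List[str]:
    """Rule B: keep the n most commonly used genres (ties keep first-seen order)."""
    return [g for g, _ in count_genres(entries).most_common(n)]


if __name__ == "__main__":
    entries = [
        ["cwrc:letter", "cwrc:romance"],
        ["cwrc:letter", "cwrc:romance"],
        ["cwrc:letter", "cwrc:romance"],
        ["cwrc:letter"],
        ["cwrc:dictionary"],
    ]
    print(genres_above_threshold(entries))  # ['cwrc:letter', 'cwrc:romance']
    print(top_genres(entries, n=3))         # ['cwrc:letter', 'cwrc:romance', 'cwrc:dictionary']
```

In this toy data only three distinct genres occur, so the top-3 rule keeps the one-off cwrc:dictionary while the threshold rule drops it. Which rule (or combination) works better would need checking against the extracted genres themselves.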


alliyya added the type:bug label Apr 30, 2022

alliyya self-assigned this Apr 30, 2022

alliyya closed this as completed Apr 30, 2022

alliyya reopened this Apr 30, 2022

alliyya added the Conversion: CWRC (related to the conversion process using the CWRC ontologies; Classic Branch) and Conversion: LINCS (related to the conversion process using CIDOC-CRM and the CWRC vocabularies; Main Branch) labels May 8, 2022

alliyya added a commit that referenced this issue May 8, 2022

Using REF attribute in genre mapping when possible #36

c7e6f05

alliyya added a commit that referenced this issue May 9, 2022

appending properly to genre_map #36

0fd024e

alliyya removed the Conversion: CWRC (related to the conversion process using the CWRC ontologies; Classic Branch) label May 11, 2022

alliyya added a commit that referenced this issue May 17, 2022

Updating genre mapping #36

6c4c32f

alliyya added a commit that referenced this issue Jul 19, 2022

Revert "appending properly to genre_map #36"

2fd3134

This reverts commit 0fd024e.
