Review Genre extraction of bibliographic entities #36
Comments
The bibliographic record IDs (e.g. 5440d4e1-6a18-413d-a866-9364bf0c0e51) changed with the newer files, but the IDs (e.g. 94449) used in the mapping to genre were the old ones. The bibliographic record ID (5440d4e1-6a18-413d-a866-9364bf0c0e51) doesn't align with the items in … Also look into whether the list for genre_map is being properly appended to.
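One way to check for the ID mismatch described above is to compare the keys of the genre mapping against the IDs in the current files. This is a minimal sketch, not the project's code: `find_stale_ids` is a hypothetical helper, `genre_map` is assumed to be a dict keyed by record ID, and the genre values are made up for illustration.

```python
def find_stale_ids(genre_map, record_ids):
    """Return genre_map keys that match no current bibliographic record ID."""
    return sorted(set(genre_map) - set(record_ids))


# Illustrative data only: the two IDs come from this issue, the genres are invented.
genre_map = {
    "94449": ["novel"],
    "5440d4e1-6a18-413d-a866-9364bf0c0e51": ["poetry"],
}
record_ids = ["5440d4e1-6a18-413d-a866-9364bf0c0e51"]

stale = find_stale_ids(genre_map, record_ids)
print(stale)
```

Any ID reported as stale would point at a mapping entry built from the older files.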
Can confirm that genre_map keys were not being appended to but were being overwritten with every file parsed, leading to some extra missed genres. Future todo, potentially: establish whether any weighting should be attached to help narrow down genres. E.g. only use a genre if it has been associated with a textscope 3+ times, for cases where the genres are not correct and are just a one-off mention referring to a different text.
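For reference, the difference between overwriting and appending can be sketched as follows. This is an assumption-laden illustration rather than the actual extraction code: `build_genre_map` is a hypothetical function, and each parsed file is assumed to yield a dict from text ID to a list of genres.

```python
from collections import defaultdict


def build_genre_map(parsed_files):
    """Accumulate genres per text ID across all files instead of overwriting."""
    genre_map = defaultdict(list)
    for entries in parsed_files:
        for text_id, genres in entries.items():
            for genre in genres:
                # Append to the existing list; skip genres already recorded.
                if genre not in genre_map[text_id]:
                    genre_map[text_id].append(genre)
    return dict(genre_map)


# Made-up input: the same text ID "a" appears in two files.
files = [
    {"a": ["novel"]},
    {"a": ["poetry"], "b": ["drama"]},
]
merged = build_genre_map(files)
print(merged)
```

With a plain `genre_map[text_id] = genres` assignment, the second file would replace the genres recorded for "a" by the first, which is the kind of loss described above.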
Next steps:
Sounds like a good catch.
I don’t follow the logic of the “todo”. Is this to omit rarely mentioned genres from the list of those associated with an author?
On May 8, 2022, at 12:30 PM, Alliyya Mo wrote:
can confirm that genre_map keys were not being appended to but were being overwritten with every file parsed, leading to some extra missed genres.
Future todo: establish if there's any weighting to be attached to help narrow down genres. Ex. Only use a genre if it's been associated with textscope 3+ times or something in cases when the genres are not correct and are just a one off mention referring to a different text.
Essentially, yes. At a later point, we'd likely want to review the extracted genres and see how accurate they are. Example: we have a textscope that's mentioned in 5 different entries, and 4 entries use similar genres (ex. …). We could make up some rules, such as grabbing the 3 most common genres of a text, or only using a genre if it's associated with a textscope more than 2 times. It was more of an idea requiring further investigation than a concrete TODO.
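The two candidate rules mentioned above (a minimum mention count and a top-N cutoff) could be prototyped roughly like this. It is only a sketch for discussion: `filter_genres`, `min_count`, and `top_n` are hypothetical names, the thresholds are placeholders, and the genre labels are made up.

```python
from collections import Counter


def filter_genres(mentions, min_count=3, top_n=None):
    """Keep genres mentioned at least min_count times, optionally only the top_n."""
    counts = Counter(mentions)
    # most_common() sorts by count, highest first.
    kept = [genre for genre, n in counts.most_common() if n >= min_count]
    if top_n is not None:
        kept = kept[:top_n]
    return kept


# Made-up example: one textscope mentioned in 5 entries, 4 agreeing on a genre.
mentions = ["novel", "novel", "novel", "novel", "satire"]
kept = filter_genres(mentions, min_count=3)
print(kept)
```

Here the one-off "satire" mention is dropped while "novel" is kept. Whether a count threshold, a top-N cutoff, or some combination is appropriate would depend on reviewing the extracted genres first.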
see results from query
Is genre being extracted correctly? Potentially re-extract.
Tasks: