Remove few xpath redundancies in y-003, add drama exclusion #658
I modified the xpath (see the first one below) to catch instances with a drama or letter ancestor that didn't match any of the other exclusions, and ran it on the corpus. I got a dozen or so hits in various dramas, mostly scenes at the very end of a dramatis personae (I didn't know that was a thing), plus a handful of others.
I did not get any hits on letters, however, and looking further at it I'm not sure how we would. We do have sections that are letters in our epistolary works, but the parts of a letter that would end in lowercase (dateline, recipient, salutation, valediction, signature, etc.) are all going to be contained in something that is already excluded, e.g. salutation, header, footer. If I understand correctly, we would need something in the body of the letter that validly ended in a lowercase letter, and I can't think of anything, unless it was in a container that is already excluded, e.g. a table. But even if such a thing could exist, we don't have any in the corpus (if the xpath below is correct), so it's theoretical at the moment.
I therefore removed the z3998:letter exclusion from the code. It's easy enough to add back if you prefer it to stay.
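To illustrate the reasoning about letters, here is a minimal Python sketch. It is not the xpath that was run on the corpus and not the project's code: the sample markup, the EXCLUDED_TAGS and EXCLUDED_TYPES sets, and the function name are all assumptions made up for this example. It looks for paragraphs inside a z3998:letter section that end in a lowercase letter and have no already-excluded ancestor or self.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

# Hypothetical stand-ins for the containers the check already excludes.
EXCLUDED_TAGS = {"header", "footer", "table"}
EXCLUDED_TYPES = {"z3998:salutation"}

# Made-up letter: every lowercase-ending paragraph sits in an excluded container.
SAMPLE = """<section xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epub="http://www.idpf.org/2007/ops" epub:type="z3998:letter">
  <header><p>London, 3rd of May</p></header>
  <p epub:type="z3998:salutation">Dear sir</p>
  <p>I have received your note.</p>
  <footer><p>Yours truly</p></footer>
</section>"""


def unexcluded_lowercase_paragraphs(xml_text):
    """Return text of <p>s in a letter that end lowercase and are not excluded."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for p in root.iter(XHTML + "p"):
        text = "".join(p.itertext()).strip()
        if not text or not text[-1].islower():
            continue
        in_letter = False
        excluded = False
        node = p
        # Walk from the paragraph up to the root, checking each element.
        while node is not None:
            tag = node.tag.replace(XHTML, "")
            types = set(node.get(EPUB_TYPE, "").split())
            if tag in EXCLUDED_TAGS or types & EXCLUDED_TYPES:
                excluded = True
            if "z3998:letter" in types:
                in_letter = True
            node = parents.get(node)
        if in_letter and not excluded:
            hits.append(text)
    return hits


if __name__ == "__main__":
    print(unexcluded_lowercase_paragraphs(SAMPLE))
```

Under these assumptions the sample letter produces no hits, which matches the observation above; only a body paragraph that itself ended in lowercase, outside any excluded container, would be reported.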
xpath I used to test for drama/letter:
I ran a second xpath (see below) to see what shook out from p elements having a class that didn't match any of the other exclusions. A total of ten books had matches, and most of those had multiple matches. Is 10 out of 960 worth an exclusion? I'll leave that to you; I did not make any change for this in this PR.
xpath I used for the class test: