Make the reading ease score cope better with free-form poetry
When calculating the reading ease score, we can first preprocess the text to allow for verse that doesn’t end with punctuation. If we add a full stop at the end of verses, solely for the purposes of the reading ease algorithm, we get more reasonable scores, and it shouldn’t meaningfully affect other productions.

1. Mina Loy’s Poetry goes from -128.58 to 42.45
2. William Carlos Williams’s Poetry goes from 79.5 to 79.6
3. Laurence Sterne’s Tristram Shandy goes from 51.35 to 51.55
4. Every other repository I’ve tried shows no change.
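For context, the standard Flesch reading ease formula is 206.835 - 1.015 × (words ÷ sentences) - 84.6 × (syllables ÷ words). When a poem has no terminal punctuation, very few sentences may be detected, so the words-per-sentence term dominates and drags the score far below zero. The sketch below plugs hypothetical counts (not taken from any of the repositories above) into that formula. It is only an illustration of the arithmetic, not the `get_flesch_reading_ease()` implementation, which also has to extract text from the XHTML and count syllables itself.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch reading ease formula, computed from raw counts.

    Illustrative only: this is not how se/formatting.py derives its counts.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical poem: 300 words, 420 syllables.
# With no full stops, the whole text may be counted as a single sentence...
unpunctuated = flesch_reading_ease(words=300, sentences=1, syllables=420)
# ...whereas ending each of 30 stanzas with a full stop gives 30 sentences.
punctuated = flesch_reading_ease(words=300, sentences=30, syllables=420)

print(f"{unpunctuated:.3f}")  # -216.105
print(f"{punctuated:.3f}")    # 78.245
```

Only the sentence count differs between the two calls, which is consistent with the large jump in the Mina Loy figure above.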
robinwhittleton authored and acabal committed Mar 1, 2024
1 parent 5fdcc0e commit 9c5b73d
Showing 1 changed file with 5 additions and 0 deletions.
se/formatting.py (5 additions, 0 deletions)
@@ -181,6 +181,11 @@ def get_flesch_reading_ease(xhtml: str) -> float:
 	A float representing the Flesch reading ease of the text.
 	"""
 
+	# Add a full stop to sentences that don’t end in punctuation
+	# This is primarily for free-form poetry like Mina Loy’s, where the
+	# reading score can end up being extremely low without this.
+	xhtml = regex.sub(r"([A-Za-z])(<\/span>\n)*\s*</p>", r"\1.\2</p>", xhtml)
+
 	# Remove HTML tags
 	text = regex.sub(r"<title>.+?</title>", " ", xhtml)
 	text = regex.sub(r"<.+?>", " ", text, flags=regex.DOTALL)
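To see what the new substitution does, here is a small standalone sketch that applies the same pattern to hypothetical poetry markup. It uses Python’s standard-library `re` module rather than the third-party `regex` module imported in `se/formatting.py` (the pattern uses no `regex`-specific syntax). The helper names and the naive sentence counter are hypothetical, for illustration only, and are not part of `get_flesch_reading_ease()`.

```python
import re

# The pattern added in this commit, compiled with the standard-library `re`
# module. It matches an ASCII letter followed by any number of "</span>\n"
# sequences, optional whitespace, and then a closing </p>.
STANZA_END = re.compile(r"([A-Za-z])(<\/span>\n)*\s*</p>")


def add_stanza_full_stops(xhtml: str) -> str:
    """Hypothetical helper: insert a full stop after a letter that ends a <p>."""
    # \2 holds only the last "</span>\n" the repeated group matched,
    # and the whitespace matched by \s* is dropped.
    return STANZA_END.sub(r"\1.\2</p>", xhtml)


def count_sentences(xhtml: str) -> int:
    """Hypothetical naive counter: strip tags, then count runs of . ! or ?"""
    text = re.sub(r"<.+?>", " ", xhtml, flags=re.DOTALL)
    return len(re.findall(r"[.!?]+", text))


# Hypothetical two-stanza poem with no punctuation at all.
poem = (
    "<p>\n"
    "\t<span>I saw the sea</span>\n"
    "\t<br/>\n"
    "\t<span>break on the stones</span>\n"
    "</p>\n"
    "<p>\n"
    "\t<span>and turn away</span>\n"
    "</p>\n"
)

fixed = add_stanza_full_stops(poem)
print(count_sentences(poem))   # 0
print(count_sentences(fixed))  # 2
```

Only the final line of each `<p>` gains a full stop, so in this sketch each stanza, rather than each line, becomes one sentence. A stanza that already ends in punctuation, or in a non-ASCII letter such as “é”, is left unchanged, because the pattern requires `[A-Za-z]` directly before the closing tags.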