Commit

Add lint tests y-026 thru y-030
vr8hub authored and acabal committed May 23, 2024
1 parent a513974 commit efb01d3
Showing 11 changed files with 179 additions and 1 deletion.
2 changes: 1 addition & 1 deletion se/se_epub_lint.py
@@ -3042,7 +3042,7 @@ def _lint_xhtml_typo_checks(filename: Path, dom: se.easy_xml.EasyXmlTree, file_c
if typos:
messages.append(LintMessage("y-027", "Possible typo: Extra [text]’[/] at end of paragraph.", se.MESSAGE_TYPE_WARNING, filename, typos))

- # Check for `<abbr>` preceded or followed by text. Ignore compass directions followed by `ly`, like S.S.W.ly
+ # Check for `<abbr>` preceded or followed by text. Ignore plurals (e.g. TVs) and compass directions followed by `ly`, like S.S.W.ly
typos = [node.to_string() for node in dom.xpath("/html/body//abbr[(preceding-sibling::node()[1])[re:test(., '[A-Za-z]$')] or (following-sibling::node()[1])[re:test(., '^[A-Za-z](?<!s\\b)') and not((./preceding-sibling::abbr[1])[contains(@epub:type, 'se:compass')] and re:test(., '^ly\\b'))]]")]
if typos:
messages.append(LintMessage("y-028", "Possible typo: [xhtml]<abbr>[/] directly preceded or followed by letter.", se.MESSAGE_TYPE_WARNING, filename, typos))
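
The XPath above combines three regular expressions. As a rough illustration only, the sketch below applies the same three patterns with Python's `re` module to plain strings. The function name `abbr_is_suspect` and its `before`/`after`/`is_compass` parameters are hypothetical and not part of the toolset; the real check runs over sibling nodes in the DOM.

```python
import re

# Patterns copied from the XPath expression in the y-028 check.
PRECEDED_BY_LETTER = re.compile(r"[A-Za-z]$")
# A letter at the start, unless that letter is a lone "s" (a plural such as TVs).
FOLLOWED_BY_LETTER = re.compile(r"^[A-Za-z](?<!s\b)")
COMPASS_LY = re.compile(r"^ly\b")


def abbr_is_suspect(before: str, after: str, is_compass: bool = False) -> bool:
    """Hypothetical helper: `before` and `after` stand in for the text of the
    nodes immediately around an <abbr>; `is_compass` stands in for the
    `se:compass` test on the element's epub:type."""
    if PRECEDED_BY_LETTER.search(before):
        return True
    if FOLLOWED_BY_LETTER.search(after):
        # Compass directions followed by "ly" (S.S.W.ly) are tolerated.
        return not (is_compass and COMPASS_LY.search(after))
    return False
```

Fed the surrounding text from the y-028 fixture, this sketch flags only the `I.B.M.` and `P.D.Q.` cases, which agrees with the golden output.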
12 changes: 12 additions & 0 deletions tests/lint/typos/y-026/golden/y-026-out.txt
@@ -0,0 +1,12 @@
y-026 [Manual Review] chapter-1.xhtml Possible typo: no punctuation before
conjunction `But/And/For/Nor/Yet/Or`.
<p>Fabled daies But show us how belts can be heats.</p>
<p>Though we assume the latter, a heart can hardly be considered an
unsquared community And also being a sneeze.</p>
<p>An increase For a Monday is the right perspective.</p>
<p>The tailing scissor reveals itself Nor as an ungrown sister to those
who look.</p>
<p>This is not to discredit the idea Yet before arieses, springs were
only fleshes.</p>
<p>The tawie tramp comes from a jouncing sunshine Or, to be more
specific, the islands could be said to resemble cussed poultries.</p>
45 changes: 45 additions & 0 deletions tests/lint/typos/y-026/in/src/epub/text/chapter-1.xhtml
@@ -0,0 +1,45 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I: Maybe and Yet Maybe Not</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<hgroup>
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- EXCLUSION 1, p in hgroup -->
<p epub:type="title">Maybe and Yet Maybe Not</p>
</hgroup>
<!-- VALID 1, punctuation before conjunctions -->
<p>A church is a heathy donald. But sugars are flossy parties. And the enquiry of a note becomes a scurry board. For an acknowledgment is an unkinged geranium. Nor would the literature have us believe that a rattly work is not but a fan! Yet do we know that the dreamlike agreement comes from a rightist gondola? Or one cannot separate rakes from plotless raviolis.</p>
<!-- EXCLUSION 2, italic starting with conjunction -->
<p>However, their meeting was, in this moment, about <i epub:type="se:name.publication.book">But What Do We Do</i>, an immane sideboard. What we don't know for sure is whether <i epub:type="se:name.publication.play">And Not Grasshoppers</i> are raring manicures. If this was somewhat unclear <i epub:type="se:name.publication.magazine">For the Seashores</i> could be said to resemble fenny sidecars. As far as we can estimate, a crumbly cobweb's <i epub:type="se:name.vessel.ship">Nor a Bagpipe</i> comes with it the thought that the riftless rabbit is an army. The literature would have us believe <i epub:type="se:name.publication.newspaper">Yet a Creamlaid Date</i> is not but a beggar. A voice is a cleanly element <i epub:type="se:name.music.opera">Or the Enarched Teacher</i> comes from a composed berry.</p>
<!-- EXCLUSION 3, non-alpha before space before conjunction -->
<p>Far from the truth, But they were lost without the birchen apple that composed their work. In modern times an unstarched soldier without shrines: And is truly a australia of unraised casts. Authors often misinterpret; For the deposit as a glyptic snake, Nor when in actuality it feels more like an undrunk august. Some suspect mother-in-laws⁠—Yetare thought of simply as newsstands.</p>
<!-- EXCLUSION 4, rsquo after conjunction -->
<p>Upon the Captain’s coarse blue vest the cold raindrops started like steel beads; and he could hardly maintain himself aslant against the stiff Nor’-Wester that came pressing against him, importunate to topple him over the parapet, and throw him on the pavement below.</p>
<!-- EXCLUSION 5, period after conjunction -->
<p>That is what will be done, no Ifs, Ands, or But.</p>
<!-- EXCLUSION 6, question mark after conjunction -->
<p>What could we do, caught in the Now and not Yet?</p>
<!-- EXCLUSION 7, dash after conjunction -->
<p>“He is called Or-tis, Jemadar of Jemadars.”</p>
<!-- EXCLUSION 8, word joiner after conjunction -->
<p>The zippy shade reveals itself as a big Or⁠—the longer yew to those who look.</p>
<!-- FAIL 1, no punctuation before But -->
<p>Fabled daies But show us how belts can be heats.</p>
<!-- FAIL 2, no punctuation before And -->
<p>Though we assume the latter, a heart can hardly be considered an unsquared community And also being a sneeze.</p>
<!-- FAIL 3, no punctuation before For -->
<p>An increase For a Monday is the right perspective.</p>
<!-- FAIL 4, no punctuation before Nor -->
<p>The tailing scissor reveals itself Nor as an ungrown sister to those who look.</p>
<!-- FAIL 5, no punctuation before Yet -->
<p>This is not to discredit the idea Yet before arieses, springs were only fleshes.</p>
<!-- FAIL 6, no punctuation before Or -->
<p>The tawie tramp comes from a jouncing sunshine Or, to be more specific, the islands could be said to resemble cussed poultries.</p>
</section>
</body>
</html>
14 changes: 14 additions & 0 deletions tests/lint/typos/y-027/golden/y-027-out.txt
@@ -0,0 +1,14 @@
y-027 [Manual Review] chapter-1.xhtml Possible typo: Extra `’` at end of
paragraph.
<p>“We can assume that any instance of an hour can be construed as a
plaguy stretch. This is not to discredit the idea that an inept family's coin
comes with it the thought that the goodish son is a show. Authors often
misinterpret the helmet as a useless lipstick, when in actuality it feels more
like a vying noodle. Unfortunately, that is wrong; on the contrary, authors
often misinterpret the waiter as an algoid seat, when in actuality it feels more
like an oaken land.”’</p>
<p>“The first maroon jury is, in its own way, a dolphin. We can assume
that any instance of a point can be construed as a cirrate protocol. The
serviced domain comes from a vinous vise. What we don't know for sure is whether
or not a sex is a sloughy boat. Some assert that a shock sees a michelle as a
hobnailed insect.” ’</p>
23 changes: 23 additions & 0 deletions tests/lint/typos/y-027/in/src/epub/text/chapter-1.xhtml
@@ -0,0 +1,23 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- VALID 1, paragraph beginning/ending with double-quotes -->
<p>“Some hitchy crows are thought of simply as statistics. Those readings are nothing more than cocoas. A dress of the great-grandmother is assumed to be a stingless half-sister.”</p>
<!-- EXCLUSION 1, dialog with inner quote -->
<p>“A waitress is a chartered shape. ‘Their cook was, in this moment, a plosive sagittarius. The first midmost system is, in its own way, a behavior. “Some implied waiters are thought of simply as virgos. The fruitless jam comes from an unkinged cyclone.” ’</p>
<!-- EXCLUSION 2, something other space between rdquo and rsquo -->
<p>“A witness sees a sphynx as a mordant frown. Recent controversy aside, a pillow is a lunge from the right perspective. The literature would have us believe that a watchful needle is not but a jail.”⁠—’</p>
<!-- FAIL 1, rsquo after rdquo -->
<p>“We can assume that any instance of an hour can be construed as a plaguy stretch. This is not to discredit the idea that an inept family's coin comes with it the thought that the goodish son is a show. Authors often misinterpret the helmet as a useless lipstick, when in actuality it feels more like a vying noodle. Unfortunately, that is wrong; on the contrary, authors often misinterpret the waiter as an algoid seat, when in actuality it feels more like an oaken land.”’</p>
<!-- FAIL 2, whitespace/rsquo after rdquo -->
<p>“The first maroon jury is, in its own way, a dolphin. We can assume that any instance of a point can be construed as a cirrate protocol. The serviced domain comes from a vinous vise. What we don't know for sure is whether or not a sex is a sloughy boat. Some assert that a shock sees a michelle as a hobnailed insect.” ’</p>
</section>
</body>
</html>
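
The y-027 implementation is not part of this diff, so the following is only a guess at the behaviour the fixture exercises: a paragraph is flagged when it ends with a right double quote, optional whitespace, and a right single quote, and contains no left single quote that the final mark could be closing. The helper name `has_extra_rsquo` is hypothetical.

```python
import re

LSQUO = "\u2018"  # left single quotation mark
# Right double quote, optional whitespace, right single quote, end of text.
EXTRA_RSQUO = re.compile("\u201d\\s*\u2019$")


def has_extra_rsquo(paragraph_text: str) -> bool:
    """Hypothetical helper: True when the paragraph ends with a stray
    right single quote after a closing double quote."""
    if LSQUO in paragraph_text:
        # An inner quotation was opened, so the closing mark is plausible.
        return False
    return bool(EXTRA_RSQUO.search(paragraph_text))
```

Under these assumptions the two FAIL paragraphs match, while the inner-quote case and the case with a word joiner and dash between the two marks do not.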
4 changes: 4 additions & 0 deletions tests/lint/typos/y-028/golden/y-028-out.txt
@@ -0,0 +1,4 @@
y-028 [Manual Review] chapter-1.xhtml Possible typo: `<abbr>` directly preceded
or followed by letter.
<abbr epub:type="z3998:initialism">I.B.M.</abbr>
<abbr epub:type="z3998:initialism">P.D.Q.</abbr>
23 changes: 23 additions & 0 deletions tests/lint/typos/y-028/in/src/epub/text/chapter-1.xhtml
@@ -0,0 +1,23 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- VALID 1, abbreviation without letters on either side. -->
<p>Recent controversy aside, the <abbr epub:type="z3998:initialism">A.T.V.</abbr> of a barbara becomes an unmourned pharmacist.</p>
<!-- EXCLUSION 1, plural abbreviation -->
<p>A step-aunt is a softdrink from the <abbr epub:type="z3998:initialism">TV</abbr>s perspective.</p>
<!-- EXCLUSION 2, compass abbreviation followed by "ly" -->
<p>In the neighbourhood of their South Pole Camp the drifts were <abbr epub:type="se:compass">S. W.</abbr>ly.</p>
<!-- FAIL 1, abbreviation immediately preceded by text -->
<p>A muscid giraffe without<abbr epub:type="z3998:initialism">I.B.M.</abbr> is truly a instruction of baldish cables. A camp is a musician's pizza. Some rhotic bikes are thought of simply as engineers.</p>
<!-- FAIL 2, abbreviation immediately followed by text -->
<p>Steer the feet, get the card board, and twist the pupils to the <abbr epub:type="z3998:initialism">P.D.Q.</abbr>est show ever.</p>
</section>
</body>
</html>
4 changes: 4 additions & 0 deletions tests/lint/typos/y-029/golden/y-029-out.txt
@@ -0,0 +1,4 @@
y-029 [Manual Review] chapter-1.xhtml Possible typo: Italics followed by a
letter.
<i>caw-caw-caw-caw</i>
<em>Is</em>
27 changes: 27 additions & 0 deletions tests/lint/typos/y-029/in/src/epub/text/chapter-1.xhtml
@@ -0,0 +1,27 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- VALID 1, italics/emphasis with no immediately following letters -->
<p>Some posit the <em>wormy</em> carnation to be less than younger. As far as we can estimate, a scalelike taurus is a parcel of the mind. Few can name a <i>zinky</i> india that isn't a backhand punch.</p>
<!-- EXCLUSION 1, italics plural -->
<p>This could be, or perhaps an ease of the shoe is assumed to be a pesky creator. In recent years, before donkeies, <i>servant</i>s were only surprises. They were lost without the classy propane that composed their cd. An ophthalmologist is a shelf from the right perspective.</p>
<!-- EXCLUSION 2, emphasis plural -->
<p>An almanac sees a leg as a spermic postbox. Some posit the unglazed disgust to be less than dun. The gas is a stage. <em>Menu</em>s are a lucid liver.</p>
<!-- EXCLUSION 3, italics immediately followed by multiple letters -->
<p>They were lost without the <i>surbas</i>ed condition that composed their blue. As far as we can estimate, sweated drives show us how indias can be strangers. The faucet is a poland.</p>
<!-- EXCLUSION 4, emphasis immediately followed by multiple letters -->
<p>A point can <em>hard</em>ly be considered a nutmegged oxygen without also being a Thursday.</p>
<!-- FAIL 1, italic immediately followed by letter -->
<p>The crow <i>caw-caw-caw-caw</i>d at him.</p>
<!-- FAIL 2, emphasis immediately followed by letter -->
<p><em>Is</em>n’t he a ninny?” sighed Fanny.</p>
</section>
</body>
</html>
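
The y-029 check itself is not shown in this commit. Judging from the fixture alone, it appears to flag a closing `</i>` or `</em>` followed by exactly one letter other than a plural `s`. The sketch below reproduces that behaviour on serialized markup; the pattern and the name `italics_followed_by_letter` are assumptions, not the toolset's code.

```python
import re

# Closing italics or emphasis tag, then a single letter that is not a lone "s"
# and is not followed by further word characters.
ITALICS_THEN_LETTER = re.compile(r"</(?:i|em)>(?!s\b)[A-Za-z]\b")


def italics_followed_by_letter(markup: str) -> bool:
    """Hypothetical helper: True when italics are followed by one stray letter."""
    return bool(ITALICS_THEN_LETTER.search(markup))
```

Plurals (`<i>servant</i>s`) and multi-letter continuations (`<em>hard</em>ly`) pass, while `<i>caw-caw-caw-caw</i>d` and `<em>Is</em>n’t` are flagged, as in the golden file.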
7 changes: 7 additions & 0 deletions tests/lint/typos/y-030/golden/y-030-out.txt
@@ -0,0 +1,7 @@
y-030 [Manual Review] chapter-1.xhtml Possible typo: Lowercase quotation
following a period. Check either that the period should be a comma, or that the
quotation should start with a capital.
To be more specific, a manx is a swordfish's edger. “the trout” of a
part becomes a starchy sprout.
“Before birthdaies, bladders were only bongos. ‘recent controversy’
aside, a rotate is a footnote from the right perspective.
19 changes: 19 additions & 0 deletions tests/lint/typos/y-030/in/src/epub/text/chapter-1.xhtml
@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- VALID 1, period followed by capitalized quotation -->
<p>In modern times a gorilla of the population is assumed to be a queasy cousin. “A pot is a cuticle's quill.”</p>
<!-- FAIL 1, period followed by ldquo and lowercase letter -->
<p>To be more specific, a manx is a swordfish's edger. “the trout” of a part becomes a starchy sprout.</p>
<!-- FAIL 2, period followed by lsquo and lowercase letter -->
<p>“Before birthdaies, bladders were only bongos. ‘recent controversy’ aside, a rotate is a footnote from the right perspective.</p>
</section>
</body>
</html>
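
The rule behind y-030 is likewise outside this diff. A minimal sketch of the behaviour the fixture describes (a period, whitespace, an opening double or single quote, then a lowercase letter) might look like the following; the pattern and the name `has_lowercase_quote_after_period` are assumptions for illustration.

```python
import re

# Period, one whitespace character, left double or single quote, lowercase letter.
LOWERCASE_QUOTE = re.compile("\\.\\s[\u201c\u2018][a-z]")


def has_lowercase_quote_after_period(text: str) -> bool:
    """Hypothetical helper: True when a quotation that follows a period
    begins with a lowercase letter."""
    return bool(LOWERCASE_QUOTE.search(text))
```

The VALID paragraph, whose quotation begins with a capital, is not matched; both FAIL paragraphs are.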
