Commit

lint: Don't include table cells that are probably ditto marks when checking t-001
acabal committed Nov 19, 2024
1 parent 654a7c1 commit 3cdd53b
Showing 2 changed files with 28 additions and 1 deletion.
6 changes: 5 additions & 1 deletion se/se_epub_lint.py
@@ -3645,7 +3645,11 @@ def lint(self, skip_lint_ignore: bool, allowed_messages: Optional[List[str]] = N
 headings.append((header_text, str(filename)))

 # Check for double spacing
-matches = regex.search(fr"[{se.NO_BREAK_SPACE}{se.HAIR_SPACE} ]{{2,}}", file_contents)
+# First, remove any table cells which contain quotation marks followed by multiple spaces, as those are probably ditto marks.
+dom_copy = deepcopy(dom)
+for td_node in dom_copy.xpath(f"//td[re:test(., '”[{se.NO_BREAK_SPACE}{se.HAIR_SPACE} ]+”')]"):
+	td_node.remove()
+matches = regex.search(fr"[{se.NO_BREAK_SPACE}{se.HAIR_SPACE} ]{{2,}}", dom_copy.to_string())
 if matches:
 	double_spaced_files.append(filename)

23 changes: 23 additions & 0 deletions tests/lint/typography/t-001/in/src/epub/text/chapter-1.xhtml
@@ -26,6 +26,29 @@
<p>The first kaput cuticle is, in its own way, a tree. A sunflower is an underwear from the right perspective. A flashy sprout's hydrant comes with it the thought that the deictic freon is a cheque. ⁠ ⁠… Some sonless elements are thought of simply as caravans.</p>
<!-- ERROR 5, consecutive spaces on either side of an HTML comment -->
<p>Far from the truth, an innocent sees a glue as an unposed thumb. A Thursday of the lier is assumed to be an honied donna. Nowhere <!-- HTML comment --> is it disputed that the widest rutabaga comes from a flamy kendo.</p>
<table>
<tbody>
<tr>
<td/>
<th scope="rowgroup"><time datetime="1865">1865</time>.</th>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td>14</td>
<td>
<time datetime="1865-06-21">June 21</time>
</td>
<!-- VALID 5, ignore these ditto marks -->
<td>”       ”</td>
<td>Southeast face</td>
<td>11,200?</td>
<td>Guides⁠—Michel Croz, Christian Almer, Franz Biener; porter⁠—Luc Meynet. See <a href="chapter-15.xhtml#chapter-15-p-32">Chapter <span epub:type="z3998:roman">XV</span></a>.</td>
</tr>
</tbody>
</table>
</section>
</body>
</html>
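A minimal, standalone sketch of the same idea, assuming only the Python standard library. It swaps SE's `EasyXmlTree` and its `re:test` XPath for `xml.etree.ElementTree` plus a manual walk over the tree, and the helper `has_double_spacing` is hypothetical. It also assumes `se.NO_BREAK_SPACE` and `se.HAIR_SPACE` are U+00A0 and U+200A. As in the commit, a cell is skipped when its text contains two closing quotation marks separated by one or more spaces. Everything else, including other cells in the same table, is still checked, and the original tree is left untouched (the commit uses `deepcopy(dom)` for that; the sketch simply parses its own private tree).

```python
import re
import xml.etree.ElementTree as ET

# Assumed to mirror se.NO_BREAK_SPACE and se.HAIR_SPACE
NO_BREAK_SPACE = "\u00a0"
HAIR_SPACE = "\u200a"

# Two right double quotation marks separated by one or more spaces, e.g. ”       ”
DITTO_PATTERN = re.compile(f"\u201d[{NO_BREAK_SPACE}{HAIR_SPACE} ]+\u201d")
# Two or more consecutive spaces of any of the three kinds (the t-001 check)
DOUBLE_SPACE_PATTERN = re.compile(f"[{NO_BREAK_SPACE}{HAIR_SPACE} ]{{2,}}")


def _local_name(tag) -> str:
	# Comment nodes have a non-string tag; namespaced tags look like "{uri}td"
	return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def has_double_spacing(xhtml: str) -> bool:
	"""Hypothetical helper: True if xhtml has consecutive spaces outside ditto-mark cells."""
	# Keep comments in the tree; dropping them would join the text on either side
	parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
	root = ET.fromstring(xhtml, parser=parser)

	# Remove every <td> whose full text (like XPath's ".") contains a ditto mark
	for parent in list(root.iter()):
		for child in list(parent):
			if _local_name(child.tag) == "td" and DITTO_PATTERN.search("".join(child.itertext())):
				# ElementTree discards the removed element's tail text along with it
				parent.remove(child)

	serialized = ET.tostring(root, encoding="unicode")
	return DOUBLE_SPACE_PATTERN.search(serialized) is not None


if __name__ == "__main__":
	cell = (
		'<html xmlns="http://www.w3.org/1999/xhtml"><body><table><tr>'
		"<td>14</td><td>\u201d       \u201d</td><td>Southeast face</td>"
		"</tr></table></body></html>"
	)
	print(DOUBLE_SPACE_PATTERN.search(cell) is not None)  # True: the raw markup trips the old check
	print(has_double_spacing(cell))  # False: the ditto cell is skipped
```

The quotation-mark requirement is what keeps this narrow: a run of spaces on its own is still reported, so only cells that look like `”       ”` escape the check.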
