
Commit

cleaner: keep HTML table structure more intact
ACA committed Mar 3, 2024
1 parent e747a7f commit 4e2036d
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions lncrawl/core/cleaner.py
@@ -138,6 +138,13 @@ def __init__(self) -> None:
 # the attributes to keep while cleaning a tag
 "src",
 "style",
+# table and table children attributes
+"colspan",
+"rowspan",
+"headers",
+"scope",
+"axis",
+"id", # id required for headers ref
 ]
 )
 self.whitelist_css_property: Set[str] = set(
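The hunk only extends the attribute whitelist. The code that applies it lies outside this diff, so the following is a minimal standard-library sketch of what attribute whitelisting does to a table. It is not lncrawl's implementation, which is not shown here. `KEEP_ATTRS`, `AttributeFilter` and `clean_attributes` are hypothetical names. `KEEP_ATTRS` holds only the attributes visible in this hunk, because the rest of the real list sits above line 138.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple

# HYPOTHETICAL whitelist: only the attributes visible in this hunk. The real
# list in lncrawl/core/cleaner.py has further entries above line 138.
KEEP_ATTRS: Set[str] = {
    "src",
    "style",
    # table and table children attributes added by this commit
    "colspan",
    "rowspan",
    "headers",
    "scope",
    "axis",
    "id",  # kept so that headers="..." references still resolve
}


class AttributeFilter(HTMLParser):
    """Re-serialises HTML, dropping every attribute not in `keep`.

    Hypothetical helper for illustration; not part of lncrawl. Comments,
    doctypes and processing instructions are silently dropped.
    """

    def __init__(self, keep: Set[str]) -> None:
        super().__init__(convert_charrefs=True)
        self.keep = keep
        self.out: List[str] = []

    def _open_tag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> str:
        parts = [tag]
        for name, value in attrs:
            if name in self.keep:  # HTMLParser lower-cases attribute names
                quoted = escape(value or "", quote=True)
                parts.append(f'{name}="{quoted}"')
        return "<" + " ".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs) + ">")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs) + " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # convert_charrefs=True unescapes text, so escape it again on output.
        self.out.append(escape(data, quote=False))


def clean_attributes(html: str, keep: Set[str] = KEEP_ATTRS) -> str:
    """Return `html` with only whitelisted attributes left on each tag."""
    parser = AttributeFilter(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `clean_attributes('<th id="h1" scope="col" onclick="f()">Name</th>')` keeps `id` and `scope` and drops `onclick`. This illustrates the `# id required for headers ref` comment: a data cell such as `<td headers="h1">` names its header cell by `id`. If `id` were stripped while `headers` survived, the reference would point at nothing.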
