Good point, we might even already have the information in the WARC. I don't remember exactly when or where we discussed this. Simply using this information would probably cover at least 80% of the need here, and in an automated way, which is far superior. To be investigated.
Currently, whether a ZIM item is marked `is_front` is based purely on the item mimetype: warc2zim/src/warc2zim/items.py, lines 58 to 62 in 5de5d0e.
This has the drawback that we sometimes end up with unwanted front pages. A typical case is iframes, which are meant only to be embedded within a page.
I think this could easily be solved with an additional CLI parameter containing an `is_front_exclude` regex: any ZIM path matching it must not be marked `is_front`. I don't think an `is_front_include` is necessary.
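To make the proposal concrete, here is a minimal sketch of how such an exclusion could be layered on top of a mimetype check. It is not the actual warc2zim code: the `--is-front-exclude` flag spelling, the `compute_is_front` and `parse_exclude` helpers, and the assumption that only `text/html` items are front-page candidates are all hypothetical and chosen for illustration.

```python
import argparse
import re
from typing import List, Optional, Pattern


def parse_exclude(argv: List[str]) -> Optional[Pattern[str]]:
    """Parse a hypothetical --is-front-exclude option into a compiled regex."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--is-front-exclude",
        default=None,
        help="Regex on ZIM path; matching items are never marked is_front",
    )
    args = parser.parse_args(argv)
    return re.compile(args.is_front_exclude) if args.is_front_exclude else None


def compute_is_front(
    path: str, mimetype: str, exclude: Optional[Pattern[str]] = None
) -> bool:
    """Decide whether an item should be flagged as a front page."""
    # Assumption: the mimetype rule treats HTML documents as front candidates.
    base_type = mimetype.split(";")[0].strip().lower()
    if base_type != "text/html":
        return False
    # Proposed addition: paths matching the exclude regex are never front pages.
    if exclude is not None and exclude.search(path):
        return False
    return True
```

With a pattern such as `^example\.com/embed/` (an invented path), an embedded player page would keep its HTML mimetype but no longer be offered as a front page, while all other HTML items behave as before. Leaving the option unset preserves the current mimetype-only behavior.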