Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CONSERVATIVE_WHITESPACE_ON_MINIFYING to retain inline text spaces #148

Merged
merged 1 commit into from
Jan 11, 2021

Conversation

tisdall
Copy link
Contributor

@tisdall tisdall commented Jan 8, 2021

I was looking into the issue outlined in #121 and #21 as it's affecting me. Essentially the current system of completely removing whitespace in a text flow has some bugs and it's not easy to fix. So, in an effort to at least get things working I've added a CONSERVATIVE_WHITESPACE_ON_MINIFYING option to turn off eagerly removing whitespace and always leave at least 1 space (I'm also hoping that feature may be useful to someone else for other purposes). The default is to continue working as before for backwards compatibility.

The slightly longer explanation

If you try minifying a <i>b</i> it will properly reduce to a <i>b</i> and the display output in a browser will be "a b" (note the space). However, if you try minifying a<i> b</i> it will reduce to a<i>b</i> and then get displayed as "ab" (note, no space). Ideally, the 2nd case should end up being a <i>b<i> to retain that inline space but at the right spot.

Now, I tried going through the code to see if I could fix that 2nd case, but I'm not sure I see a way to get there without some major restructuring. One issue is that when examining the NavigableString of ' b' in is_inflow, the previous_sibling is None (in the above example). BS4 doesn't treat a tag next to a text block as siblings.

I did find another project that seemed to do a good job of this here: https://github.com/kangax/html-minifier/blob/51ce10f4daedb1de483ffbcccecc41be1c873da2/src/htmlminifier.js#L65-L83 However, that's in JS using completely different libraries and would require quite a bit of time to rework with BS4. That project also had a "conservative whitespace" option and I thought it might be a good feature to add and also provide a workaround for this issue until a better solution is developed.

One last point... Any solution essentially assumes that inline/inline-block HTML elements will remain such, but CSS could easily break things if someone were to do something like <div style="display:inline">. Obviously that's a silly thing to do, but this change allows for things like that to work as expected. Sometimes HTML on the page comes from third party libraries and it's not always possible to fix these things to have proper HTML.

@tisdall
Copy link
Contributor Author

tisdall commented Jan 8, 2021

Seems Travis may not actually come back with a result as they're in the process of shutting down "travis-ci.org" and they scheduled a complete shutdown as Dec 31, 2020. I ran the tests locally with Python 3.8.3 and they all passed. It seems people are supposed to create accounts and transition over to "travis-ci.com". Ref: https://travis-ci.community/t/extremely-poor-official-communication-of-the-org-shutdown/10568

@tisdall
Copy link
Contributor Author

tisdall commented Jan 8, 2021

Ah.. managed to run! I'll try to fix that issue with py2 unittest.

@tisdall tisdall force-pushed the keep-needed-spaces branch from 8c72bd6 to 3db9295 Compare January 8, 2021 20:35
@andrewsmedina andrewsmedina merged commit 234bd92 into cobrateam:master Jan 11, 2021
@andrewsmedina
Copy link
Member

@tisdall thanks for contribute

@tisdall
Copy link
Contributor Author

tisdall commented Jan 11, 2021

I was not expecting this to be accepted so fast! Thank you for taking the time on the weekend!

What is the possibility of 0.12.0 release into pypi? I see the last release was March 2019 but this change is the only addition since then.

@tisdall tisdall deleted the keep-needed-spaces branch January 11, 2021 13:49
@tisdall
Copy link
Contributor Author

tisdall commented Jan 13, 2021

Just FYI, I'm trying to see if I can come up with a slightly better solution for #121 and #21 . It appears we have a lot of code relying on the more eager space removal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants