-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: get rid of dropcss' html parser #1
Comments
Yes you are right about the redundancy. The AST html of node-html-parser is obviously very different from that of dropcss. I faced this issue and came to the conclusion that the potential speed gain is not worth the work involved. At least for my use-case. There is a lot of steps in the overall processing. I measured the duration of each step to assert that. In this way we should have a good overview to determine which step can/should be improved in order to have a significant gain in terms of speed. An other approach would be the exact opposite of what you proposed. I mean, to get rid of node-html-parser. PS: Did you use the same beastcss instance for every html processing or did you create a new instance each time you need to process html ? |
Well, it's less about speed and more about stability. As you know, dropcss is very sensitive (to the extent of being fragile) to html structure. These are 2 issues I immediately came across just testing with a single page from our project: leeoniya/dropcss#52, leeoniya/dropcss#53 . I imagine there can be more like that. Actually most of my tests I did with my own custom version of critters. It's quite similar to beastcss - but now I see that you implemented caching of parsed css - that's smart! It seems to give 2x boost, comparing to my own naive implementation. Overall around 10x comparing to the original critters |
yep, this is certainly something to consider. i wrote dropcss to process the output of a HTML-ified vdom AST and CSS that was generated from JS objects. both of these were for an SSR use-case, and the output was therefore expected to be clean. any additions to the lib to handle manually authored or poorly processed HTML and CSS are just niceties. if the input needs to be made nice, then the expectation has always been that it will be pre-processed by html and css parsers or minifiers which are spec-compliant and more tolerant; i would never recommend dropcss as a general html and css processing tool for arbitrary inputs. however, html and css are rarely hand authored without tooling or linters/prettier these days, so in fact dropcss may work in quite a few cases, but far from all. fun fact: i did not originally intend to write my own parsers. v0 of dropcss relied on the fastest external libs for everything that i could find at the time [1].
while this achieved the thoroughness goals (mostly), the perf was still several factors slower than PurgeCSS, so i decided to try my hand at writing my own parsers :) frankly, i was amazed by the end result:
i should also note that dropcss is effectively in a "done" state; i do not intend to add and maintain all the future css selector expansions, such as https://www.w3.org/TR/css-nesting-1/, |
Looks like
beastcss
parses html anyway. And it does it in a more spec-compliant way.Having dropcss to parse it again is both redundant and error-prone (dropcss cuts quite a number of corners in an attempt to make html parsing fast).
We could probably use a patched dropcss version that receives html AST as an input instead of a raw string.
Any thoughts?
The text was updated successfully, but these errors were encountered: