Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: get rid of dropcss' html parser #1

Open
amakhrov opened this issue Sep 16, 2021 · 3 comments
Open

Proposal: get rid of dropcss' html parser #1

amakhrov opened this issue Sep 16, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@amakhrov
Copy link

Looks like beastcss parses html anyway. And it does it in a more spec-compliant way.
Having dropcss to parse it again is both redundant and error-prone (dropcss cuts quite a number of corners in an attempt to make html parsing fast).

We could probably use a patched dropcss version that receives html AST as an input instead of a raw string.

Any thoughts?

@freddy38510
Copy link
Owner

Yes you are right about the redundancy. The AST html of node-html-parser is obviously very different from that of dropcss. I faced this issue and came to the conclusion that the potential speed gain is not worth the work involved. At least for my use-case.

There is a lot of steps in the overall processing. I measured the duration of each step to assert that.
I should do it again and come back with the results for multiple cases (few/many DOM nodes, small/big CSS).

In this way we should have a good overview to determine which step can/should be improved in order to have a significant gain in terms of speed.

An other approach would be the exact opposite of what you proposed. I mean, to get rid of node-html-parser.

PS: Did you use the same beastcss instance for every html processing or did you create a new instance each time you need to process html ?

@amakhrov
Copy link
Author

amakhrov commented Sep 17, 2021

Well, it's less about speed and more about stability. As you know, dropcss is very sensitive (to the extent of being fragile) to html structure. These are 2 issues I immediately came across just testing with a single page from our project: leeoniya/dropcss#52, leeoniya/dropcss#53 . I imagine there can be more like that.

Actually most of my tests I did with my own custom version of critters. It's quite similar to beastcss - but now I see that you implemented caching of parsed css - that's smart! It seems to give 2x boost, comparing to my own naive implementation. Overall around 10x comparing to the original critters

@leeoniya
Copy link

leeoniya commented Sep 18, 2021

As you know, dropcss is very sensitive (to the extent of being fragile) to html structure. These are 2 issues I immediately came across just testing with a single page from our project: leeoniya/dropcss#52, leeoniya/dropcss#53 . I imagine there can be more like that.

yep, this is certainly something to consider. i wrote dropcss to process the output of a HTML-ified vdom AST and CSS that was generated from JS objects. both of these were for an SSR use-case, and the output was therefore expected to be clean. any additions to the lib to handle manually authored or poorly processed HTML and CSS are just niceties. if the input needs to be made nice, then the expectation has always been that it will be pre-processed by html and css parsers or minifiers which are spec-compliant and more tolerant; i would never recommend dropcss as a general html and css processing tool for arbitrary inputs. however, html and css are rarely hand authored without tooling or linters/prettier these days, so in fact dropcss may work in quite a few cases, but far from all.

fun fact: i did not originally intend to write my own parsers. v0 of dropcss relied on the fastest external libs for everything that i could find at the time [1].

dropcss is an unused CSS cleaner based on Fast HTML Parser, CSSTree and css-select.

while this achieved the thoroughness goals (mostly), the perf was still several factors slower than PurgeCSS, so i decided to try my hand at writing my own parsers :) frankly, i was amazed by the end result:

  1. 0 dependencies
  2. < 10kb min
  3. fastest and most thorough on clean inputs

i should also note that dropcss is effectively in a "done" state; i do not intend to add and maintain all the future css selector expansions, such as https://www.w3.org/TR/css-nesting-1/, :has, :matches. though i'd review/accept PRs to add these.

[1] leeoniya/dropcss@9347a29

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants