Proposal: get rid of dropcss' html parser #1

amakhrov · 2021-09-16T19:37:50Z

Looks like beastcss parses html anyway. And it does it in a more spec-compliant way.
Having dropcss to parse it again is both redundant and error-prone (dropcss cuts quite a number of corners in an attempt to make html parsing fast).

We could probably use a patched dropcss version that receives html AST as an input instead of a raw string.

Any thoughts?

The text was updated successfully, but these errors were encountered:

freddy38510 · 2021-09-17T13:18:51Z

Yes you are right about the redundancy. The AST html of node-html-parser is obviously very different from that of dropcss. I faced this issue and came to the conclusion that the potential speed gain is not worth the work involved. At least for my use-case.

There is a lot of steps in the overall processing. I measured the duration of each step to assert that.
I should do it again and come back with the results for multiple cases (few/many DOM nodes, small/big CSS).

In this way we should have a good overview to determine which step can/should be improved in order to have a significant gain in terms of speed.

An other approach would be the exact opposite of what you proposed. I mean, to get rid of node-html-parser.

PS: Did you use the same beastcss instance for every html processing or did you create a new instance each time you need to process html ?

amakhrov · 2021-09-17T16:31:39Z

Well, it's less about speed and more about stability. As you know, dropcss is very sensitive (to the extent of being fragile) to html structure. These are 2 issues I immediately came across just testing with a single page from our project: leeoniya/dropcss#52, leeoniya/dropcss#53 . I imagine there can be more like that.

Actually most of my tests I did with my own custom version of critters. It's quite similar to beastcss - but now I see that you implemented caching of parsed css - that's smart! It seems to give 2x boost, comparing to my own naive implementation. Overall around 10x comparing to the original critters

leeoniya · 2021-09-18T14:07:37Z

As you know, dropcss is very sensitive (to the extent of being fragile) to html structure. These are 2 issues I immediately came across just testing with a single page from our project: leeoniya/dropcss#52, leeoniya/dropcss#53 . I imagine there can be more like that.

yep, this is certainly something to consider. i wrote dropcss to process the output of a HTML-ified vdom AST and CSS that was generated from JS objects. both of these were for an SSR use-case, and the output was therefore expected to be clean. any additions to the lib to handle manually authored or poorly processed HTML and CSS are just niceties. if the input needs to be made nice, then the expectation has always been that it will be pre-processed by html and css parsers or minifiers which are spec-compliant and more tolerant; i would never recommend dropcss as a general html and css processing tool for arbitrary inputs. however, html and css are rarely hand authored without tooling or linters/prettier these days, so in fact dropcss may work in quite a few cases, but far from all.

fun fact: i did not originally intend to write my own parsers. v0 of dropcss relied on the fastest external libs for everything that i could find at the time [1].

dropcss is an unused CSS cleaner based on Fast HTML Parser, CSSTree and css-select.

while this achieved the thoroughness goals (mostly), the perf was still several factors slower than PurgeCSS, so i decided to try my hand at writing my own parsers :) frankly, i was amazed by the end result:

0 dependencies
< 10kb min
fastest and most thorough on clean inputs

i should also note that dropcss is effectively in a "done" state; i do not intend to add and maintain all the future css selector expansions, such as https://www.w3.org/TR/css-nesting-1/, :has, :matches. though i'd review/accept PRs to add these.

[1] leeoniya/dropcss@9347a29

freddy38510 added the enhancement New feature or request label Jan 20, 2022

freddy38510 mentioned this issue Feb 23, 2023

Infinite loop on ":where" selector leeoniya/dropcss#62

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: get rid of dropcss' html parser #1

Proposal: get rid of dropcss' html parser #1

amakhrov commented Sep 16, 2021

freddy38510 commented Sep 17, 2021

amakhrov commented Sep 17, 2021 •

edited

Loading

leeoniya commented Sep 18, 2021 •

edited

Loading

Proposal: get rid of dropcss' html parser #1

Proposal: get rid of dropcss' html parser #1

Comments

amakhrov commented Sep 16, 2021

freddy38510 commented Sep 17, 2021

amakhrov commented Sep 17, 2021 • edited Loading

leeoniya commented Sep 18, 2021 • edited Loading

amakhrov commented Sep 17, 2021 •

edited

Loading

leeoniya commented Sep 18, 2021 •

edited

Loading