feat: Make the consumer crate no-std #471

mwcampbell · 2024-10-22T14:46:17Z

I think the only potentially controversial part of this is the decision to switch from hash tables to B-trees for the temporary data structures that are built while processing an update. Here's my reasoning.

HashMap and HashSet aren't in alloc because the default hasher requires entropy. We could add a std feature to accesskit_consumer and then bring in hashbrown as a dependency only when std is disabled. But for simplicity, I'd rather not add an extra feature and different code paths if we can avoid it.

I don't like depending on a random number generator anyway, even when using std. Predictability is a valuable attribute in software, and I think it's worth maximizing it at as many levels of the stack as possible. Also, at least on Windows, using the random number generator adds an extra library dependency that non-Rust users have to use when statically linking AccessKit.

As for performance, according to the Rust collections documentation, the performance of B-tree operations is O(log(n)), where as the expected performance for hash tables is O(1). Note, though, that the latter is expected performance, meaning that it's not guaranteed because hashing is probabilistic. In my opinion, log(n) never gets too large, and having that predictable upper bound is better.

I'm aware that the page I linked above also says to use hash tables unless one has a specific reason to do otherwise. I think I now disagree with that advice. I think it overvalues best-case performance compared to predictability.

Switching to B-trees does increase total binary size. I mitigated this somewhat by going to the trouble to avoid removing entries from the temporary pending_nodes and pending_children maps. This should be good for speed as well, since as I mentioned, the maps are temporary; they go away as soon as update processing is finished. With this mitigation, the added binary size, for a size-optimized build with panic = "abort" on x86-64, is just under 10 KB. When I switch the platform adapters to also use B-trees for their own structures, that should drop a few KB.

mwcampbell · 2024-11-02T15:12:34Z

I'm no longer sure that using B-trees for the temporary collections is the right choice. Aside from the fact that Rust's hash tables use a relatively complex hasher with random inputs by default, I feel that the B-tree collections are overly complicated for what we're doing with them. In particular, B-trees require a larger number of allocations, at least for small collections, than a hash table without collisions.

So, I now think it would be better to directly use hashbrown for no-std, and possibly use the relatively simple rustc-hash hasher to minimize binary size and avoid randomness. Adding rustc-hash would be an extra dependency, but a small one, and compromising actual code efficiency for a lower dependency count doesn't make me happy. As for hashbrown, unconditionally depending on it could lead to some code duplication in typical builds that use std, though I don't know how much, because of generic monomorphization. So maybe, in the consumer crate, we want to conditionally enable no-std and hashbrown using a feature.

mwcampbell · 2024-11-03T01:50:56Z

I decided to go back to using hash tables for the reasons I discussed in the previous comment. I decided to just use hashbrown unconditionally, for simplicity, and to use hashbrown's default hasher, which is now foldhash. foldhash's README presents reasons why foldhash provides a better balance of quality and performance than rustc-hash (also called fxhash). As for DoS resistance, when used in no-std mode, foldhash provides some basic protection, using the stack pointer at the time of hasher construction as a source of non-determinism. I think that's a good balanced approach, providing some basic protection while not bringing in a dependency on a cryptographic RNG.

With accesskit_consumer now using hashbrown and foldhash while accesskit_windows still uses the std HashMap with its default hasher, a fully size-optimized build of the Windows hello_world example is about 1 KB larger than it was before this PR, using the std hash map across the board. If I modify accesskit_windows to also use hashbrown with foldhash, then the size of that build goes down by about 2 KB from where it started. So I can modify the platform adapters to use hashbrown with foldhash in a separate PR, with the fix tag so as not to cause a semver-breaking change in those crates.

mwcampbell · 2024-11-03T02:04:15Z

@DataTriny Can you please investigate why cargo deny failed in the latest CI run? I don't think it could have been caused by anything I changed.

DataTriny · 2024-11-03T12:42:48Z

@mwcampbell Failure is due to foldhash's license: it uses Zlib which we currently don't allow.

I think this license is fine though so you can add it in the list of allowed licenses in deny.toml.

TTWNO · 2024-11-23T04:17:26Z

Very happy to see this change! Congrats, y'all!

feat: Make the consumer crate no-std

2d4aabc

mwcampbell force-pushed the consumer-no-std branch from ff679ae to 2d4aabc Compare November 3, 2024 01:46

Allow the zlib license, introduced by foldhash

f89c89d

DataTriny approved these changes Nov 21, 2024

View reviewed changes

DataTriny merged commit f25d03a into main Nov 21, 2024
9 checks passed

DataTriny deleted the consumer-no-std branch November 21, 2024 19:17

github-actions bot mentioned this pull request Nov 21, 2024

chore: release main #481

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Make the consumer crate no-std #471

feat: Make the consumer crate no-std #471

mwcampbell commented Oct 22, 2024

mwcampbell commented Nov 2, 2024

mwcampbell commented Nov 3, 2024

mwcampbell commented Nov 3, 2024

DataTriny commented Nov 3, 2024

TTWNO commented Nov 23, 2024

feat: Make the consumer crate no-std #471

feat: Make the consumer crate no-std #471

Conversation

mwcampbell commented Oct 22, 2024

mwcampbell commented Nov 2, 2024

mwcampbell commented Nov 3, 2024

mwcampbell commented Nov 3, 2024

DataTriny commented Nov 3, 2024

TTWNO commented Nov 23, 2024