Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Make the consumer crate no-std #471

Merged
merged 2 commits into from
Nov 21, 2024
Merged

feat: Make the consumer crate no-std #471

merged 2 commits into from
Nov 21, 2024

Conversation

mwcampbell
Copy link
Contributor

I think the only potentially controversial part of this is the decision to switch from hash tables to B-trees for the temporary data structures that are built while processing an update. Here's my reasoning.

HashMap and HashSet aren't in alloc because the default hasher requires entropy. We could add a std feature to accesskit_consumer and then bring in hashbrown as a dependency only when std is disabled. But for simplicity, I'd rather not add an extra feature and different code paths if we can avoid it.

I don't like depending on a random number generator anyway, even when using std. Predictability is a valuable attribute in software, and I think it's worth maximizing it at as many levels of the stack as possible. Also, at least on Windows, using the random number generator adds an extra library dependency that non-Rust users have to use when statically linking AccessKit.

As for performance, according to the Rust collections documentation, the performance of B-tree operations is O(log(n)), where as the expected performance for hash tables is O(1). Note, though, that the latter is expected performance, meaning that it's not guaranteed because hashing is probabilistic. In my opinion, log(n) never gets too large, and having that predictable upper bound is better.

I'm aware that the page I linked above also says to use hash tables unless one has a specific reason to do otherwise. I think I now disagree with that advice. I think it overvalues best-case performance compared to predictability.

Switching to B-trees does increase total binary size. I mitigated this somewhat by going to the trouble to avoid removing entries from the temporary pending_nodes and pending_children maps. This should be good for speed as well, since as I mentioned, the maps are temporary; they go away as soon as update processing is finished. With this mitigation, the added binary size, for a size-optimized build with panic = "abort" on x86-64, is just under 10 KB. When I switch the platform adapters to also use B-trees for their own structures, that should drop a few KB.

@mwcampbell
Copy link
Contributor Author

I'm no longer sure that using B-trees for the temporary collections is the right choice. Aside from the fact that Rust's hash tables use a relatively complex hasher with random inputs by default, I feel that the B-tree collections are overly complicated for what we're doing with them. In particular, B-trees require a larger number of allocations, at least for small collections, than a hash table without collisions.

So, I now think it would be better to directly use hashbrown for no-std, and possibly use the relatively simple rustc-hash hasher to minimize binary size and avoid randomness. Adding rustc-hash would be an extra dependency, but a small one, and compromising actual code efficiency for a lower dependency count doesn't make me happy. As for hashbrown, unconditionally depending on it could lead to some code duplication in typical builds that use std, though I don't know how much, because of generic monomorphization. So maybe, in the consumer crate, we want to conditionally enable no-std and hashbrown using a feature.

@mwcampbell
Copy link
Contributor Author

I decided to go back to using hash tables for the reasons I discussed in the previous comment. I decided to just use hashbrown unconditionally, for simplicity, and to use hashbrown's default hasher, which is now foldhash. foldhash's README presents reasons why foldhash provides a better balance of quality and performance than rustc-hash (also called fxhash). As for DoS resistance, when used in no-std mode, foldhash provides some basic protection, using the stack pointer at the time of hasher construction as a source of non-determinism. I think that's a good balanced approach, providing some basic protection while not bringing in a dependency on a cryptographic RNG.

With accesskit_consumer now using hashbrown and foldhash while accesskit_windows still uses the std HashMap with its default hasher, a fully size-optimized build of the Windows hello_world example is about 1 KB larger than it was before this PR, using the std hash map across the board. If I modify accesskit_windows to also use hashbrown with foldhash, then the size of that build goes down by about 2 KB from where it started. So I can modify the platform adapters to use hashbrown with foldhash in a separate PR, with the fix tag so as not to cause a semver-breaking change in those crates.

@mwcampbell
Copy link
Contributor Author

@DataTriny Can you please investigate why cargo deny failed in the latest CI run? I don't think it could have been caused by anything I changed.

@DataTriny
Copy link
Member

@mwcampbell Failure is due to foldhash's license: it uses Zlib which we currently don't allow.

I think this license is fine though so you can add it in the list of allowed licenses in deny.toml.

@DataTriny DataTriny merged commit f25d03a into main Nov 21, 2024
9 checks passed
@DataTriny DataTriny deleted the consumer-no-std branch November 21, 2024 19:17
@github-actions github-actions bot mentioned this pull request Nov 21, 2024
@TTWNO
Copy link

TTWNO commented Nov 23, 2024

Very happy to see this change! Congrats, y'all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants