[Feature Request] Efficient and predictable maps in Move Framework #15330

Open
igor-aptos opened this issue Nov 20, 2024 · 0 comments
Assignees: igor-aptos
Labels: enhancement (New feature or request), move-framework (Issues related to the Framework modules/libraries)


🚀 Feature Request

Implement efficient maps in the Move Framework, with predictable performance on every operation: both a single-struct map and an unbounded, multi-storage-slot big map.

Motivation and Pitch

  • single-struct map:
    • We currently have SimpleMap, which is implemented as a bag of items, with O(n) performance on most operations, making it useful only for the smallest instances (a few elements).
    • The goal is to implement an efficient OrderedMap whose performance scales well as the map grows in size.
  • unbounded, multi-storage-slot big map:
    • We currently have SmartTable, whose implementation has several limitations:
      • any two modification operations conflict with one another, making any workload that uses it fully sequential
      • collisions degrade performance arbitrarily, whether they come from unlucky usage or a malicious party
    • The goal is to implement an efficient BigOrderedMap, such that the performance of every operation is predictable and scalable (i.e. has an efficient upper bound, irrespective of the data inserted), and operations can happen in parallel once the map grows big enough.

Neither of the current data structures, SimpleMap and SmartTable, can be improved in place, because their layout is fixed (they were implemented before enums were available). So we need new implementations, and those should fully supersede (i.e. deprecate) the current ones. In addition, they should be written in a way that allows them to be evolved in the future.
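As a loose illustration of the evolvability point (a Rust sketch with hypothetical names, not the actual Move code): declaring the map as an enum means new internal representations can be added as additional variants in later framework releases, while data already stored under the original variant keeps deserializing unchanged.

```rust
/// Illustrative only: an enum-typed map layout can grow new variants without
/// breaking data stored under the original variant.
enum OrderedMapLayout<K, V> {
    /// The initial representation: entries kept sorted by key in one vector.
    SortedVectorMap { entries: Vec<(K, V)> },
    // Future variants (different internal representations, tuning knobs, etc.)
    // could be appended here later; existing instances continue to decode as
    // SortedVectorMap.
}

impl<K, V> OrderedMapLayout<K, V> {
    fn new() -> Self {
        OrderedMapLayout::SortedVectorMap { entries: Vec::new() }
    }
}
```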

Implementation

Evaluation

OrderedMap::SortedVectorMap

The implementation is a SortedVectorMap but, given the efficiency improvements of vector operations backed by memcpy, its performance characteristics are extremely good.
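A rough sketch of the idea (in Rust for illustration, not the actual Move module): each operation is a binary search followed by a single contiguous shift of the vector's tail, which can be performed as one memcpy-like move.

```rust
/// Illustrative sorted-vector map. Entries are kept sorted by key; lookups are
/// a binary search, and insert/remove shift the tail of the vector in one
/// contiguous move.
struct SortedVectorMap<K: Ord, V> {
    entries: Vec<(K, V)>,
}

impl<K: Ord, V> SortedVectorMap<K, V> {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Insert or overwrite; O(log n) search plus one contiguous shift.
    fn upsert(&mut self, key: K, value: V) -> Option<V> {
        match self.entries.binary_search_by(|(k, _)| k.cmp(&key)) {
            Ok(i) => Some(std::mem::replace(&mut self.entries[i].1, value)),
            Err(i) => {
                self.entries.insert(i, (key, value));
                None
            }
        }
    }

    /// Remove a key if present; O(log n) search plus one contiguous shift.
    fn remove(&mut self, key: &K) -> Option<V> {
        match self.entries.binary_search_by(|(k, _)| k.cmp(key)) {
            Ok(i) => Some(self.entries.remove(i).1),
            Err(_) => None,
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.entries
            .binary_search_by(|(k, _)| k.cmp(key))
            .ok()
            .map(|i| &self.entries[i].1)
    }
}
```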

Running tests at different sizes, here we measure the nanoseconds taken for a single pair of insert + remove operations, on a map of varied size:

num elements | SimpleMap | OrderedMap SortedVectorMap
10           | 61        | 65
100          | 219       | 85
1000         | 1508      | 105
10000        | 14835     | 142

This basically achieves log(n) performance, with a very simple implementation.

BigOrderedMap::BPlusTreeMap

The implementation uses a BPlusTreeMap, chosen because it is best suited for the on-chain storage layout, where the majority of the cost comes from loading and writing storage items, and there are no partial reads/writes of them. It also rebalances in a way that reduces the number of writes needed.
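A rough Rust sketch of the layout being described (names and fields are illustrative, not the actual Move structs): inner nodes only route lookups via child slot ids, leaves hold the key/value pairs, and the root lives inline in the resource so a small map touches only one storage item.

```rust
/// Illustrative B+ tree layout over storage slots. Each non-root node lives in
/// its own storage slot, addressed by a slot id handed out by a slot allocator.
type SlotId = u64;

enum Node<K, V> {
    /// Inner nodes keep separator keys and the slot ids of their children, so
    /// a write to one leaf does not touch sibling subtrees (which is what lets
    /// writes to distant keys proceed in parallel).
    Inner { keys: Vec<K>, children: Vec<SlotId> },
    /// Leaves hold the actual entries, sorted by key.
    Leaf { entries: Vec<(K, V)> },
}

struct BigOrderedMap<K, V> {
    /// The root is kept inline in the resource itself, saving one storage slot
    /// and making small maps behave like a single-slot OrderedMap.
    root: Node<K, V>,
    /// Remaining nodes live in separate storage slots (sketched here as an
    /// in-memory map from slot id to node; on chain this would be table-like
    /// storage managed by an allocator that can pre-allocate and reuse slots).
    slots: std::collections::BTreeMap<SlotId, Node<K, V>>,
    next_slot: SlotId,
    /// Degree limits that keep every node within a target serialized size.
    inner_max_degree: usize,
    leaf_max_degree: usize,
}

impl<K: Ord, V> BigOrderedMap<K, V> {
    /// A new map starts with an empty inline leaf as the root, so small maps
    /// never allocate a separate storage slot.
    fn new(inner_max_degree: usize, leaf_max_degree: usize) -> Self {
        Self {
            root: Node::Leaf { entries: Vec::new() },
            slots: std::collections::BTreeMap::new(),
            next_slot: 0,
            inner_max_degree,
            leaf_max_degree,
        }
    }
}
```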

  • writes to keys that are not close to one another, in a BPlusTreeMap with multiple layers, are generally parallel. The more elements in the BPlusTreeMap, the more operations can execute in parallel.
  • it is an enum, so it can be evolved in the future as needed
  • it has an option to pre-pay and pre-allocate storage slots, and to reuse them, achieving predictable gas charges
    • we defined a potentially generally useful StorageSlotsAllocator, to manage indexed storage slots for data structures that need it. It is easier to use and more flexible/configurable than using Table directly (and it is also an enum, so it can be evolved / new strategies can be added).
  • it keeps the root node directly inside the resource, to reduce the number of resources needed by 1, and optimizes operations when the root is the leaf node
  • whenever a key or value is inserted/modified, we check its size to make sure it is allowed (i.e. that it is smaller than max_node_size / max_degree). This requires a bcs::serialized_size() call on every insert/upsert.
    • in case the types have constant sizes, the check is performed once, on construction, and is then skipped on insert/upsert (see the sketch after this list)
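A minimal sketch of that size guard (Rust with the serde and bcs crates used for illustration; the real code would use Move's bcs::serialized_size and framework-internal helpers, and the names below are hypothetical): if the key and value types have a constant serialized size that already fits, the bound is validated once at construction and the per-insert check is skipped.

```rust
use serde::Serialize;

/// Illustrative size guard. `max_entry_bytes` stands in for the limit derived
/// from max_node_size / max_degree.
struct SizeGuard {
    max_entry_bytes: u64,
    /// Set at construction when the key/value types have a constant serialized
    /// size that already fits; lets us skip the per-insert check entirely.
    constant_size_ok: bool,
}

impl SizeGuard {
    /// Called on every insert/upsert unless the constant-size fast path applies.
    fn check<K: Serialize, V: Serialize>(&self, key: &K, value: &V) -> Result<(), String> {
        if self.constant_size_ok {
            return Ok(());
        }
        // bcs::serialized_size computes the encoded length without producing
        // the full byte vector.
        let size = bcs::serialized_size(key).map_err(|e| e.to_string())? as u64
            + bcs::serialized_size(value).map_err(|e| e.to_string())? as u64;
        if size > self.max_entry_bytes {
            return Err(format!("entry too large: {} > {}", size, self.max_entry_bytes));
        }
        Ok(())
    }
}
```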
For performance, we measured two things.

At large scale

Most relevantly, we measured performance at scale, in comparison to SmartTable: adding 1M keys into each, with nodes of 4KB and 1KB, we get:

metric                         | SmartTable, 4KB nodes | SmartTable, 1KB nodes | BigOrderedMap BPlusTreeMap, 4KB nodes | BigOrderedMap BPlusTreeMap, 1KB nodes
tps (u64 -> None)              | 1300                  | 1968                  | 1899                                  | 2516
tps (u256 -> None)             | 2166                  | 3152                  | 2313                                  | 2219
gas/txn (u64 -> None)          | 15                    | 11                    | 9                                     | 9
gas/txn (u256 -> None)         | 10                    | 8                     | 9                                     | 10
storage fee/txn (u64 -> None)  | 1147                  | 1342                  | 652                                   | 977
storage fee/txn (u256 -> None) | 2814                  | 3926                  | 2177                                  | 3701

This shows BigOrderedMap being more storage-efficient, especially when keys/values are small, because SmartTable stores a hash in addition to the key and value. It is also more performant, even at 1M keys, when we optimize for storage costs (i.e. more elements in a single bucket). As we reduce the size of the bucket, SmartTable becomes more competitive, but its storage costs increase significantly.

Note: The test is run on a single thread to compare apples to apples, as SmartTable has no parallelism. BigOrderedMap is parallel, and running it on more threads gives an order of magnitude higher throughput.

At small scale

We also measured performance at small scale, to see how much overhead there is in using BigOrderedMap instead of just OrderedMap, when it is unknown whether the data will grow too large to be stored in a single resource.
Here we measure the nanoseconds taken for a single pair of insert + remove operations, on a map of varied size.

num elements | SimpleMap | OrderedMap SortedVectorMap | BigOrderedMap BPlusTreeMap, all inlined | BigOrderedMap BPlusTreeMap, max_degree=16 | SmartTable, 1 bucket | SmartTable, preallocated, 1 per bucket
10           | 61        | 65                         | 123                                     | 123                                       | 80                   | 62
100          | 219       | 85                         | 146                                     | 455                                       | 229                  | 62
1000         | 1508      | 105                        | 168                                     | 567                                       | 1458                 | 75
10000        | 14835     | 142                        | 210                                     | 656                                       | 15726                | 80

Here we can see that inlining the root node keeps the overhead below 2x, and even splitting small maps to achieve higher parallelism keeps the overhead reasonable (i.e. another 2-3x). In all cases it scales extremely well as the dataset grows.

Alternatives

Unordered maps can in general offer better performance than ordered maps, as they have more freedom. But:

  • it is quite complex to provide a hash-table based approach that has predictable performance under collisions, when operations can be defined after the fact.
  • as the evaluation shows, thanks to the efficiency of comparisons (compared to hashing) and memcpy operations, we can achieve quite comparable performance, reducing the need for separate unordered versions.

Another alternative would be to implement these data structures entirely in Rust (i.e. have them backed by Rust structs), and provide native calls for the operations. This is considerably more complicated and requires support for native memory layouts within the VM, so it has been punted on. We might revisit it at some point in the future if it can further improve performance significantly.
