Switch Bitset from an R6 class to a named list. #186
R6 objects have significant overhead, especially when they are instantiated. While this is acceptable for long-lived objects such as events or variables, bitsets are created and destroyed very regularly during simulations.

We can replace our use of an R6 class for `Bitset` with named lists that are intended to look and feel just like the original API, but with a significant performance improvement. The reference semantics provided by R6 don't matter in our case, since all mutability happens behind the external pointer.

On malariasimulation, I get a 30-35% performance improvement when using this new implementation of Bitset on population sizes under 10k, and about a 10% speedup at 100k.

The object-oriented, named-list-based interface still adds a bit of overhead compared to using the externalptr and Rcpp functions directly, but doing so would require intrusive changes at the use sites.
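To make the idea concrete, here is a minimal base-R sketch of a named list of closures that behaves like an object with reference semantics. It is not the implementation in this PR: `bitset_sketch` and `new_handle` are hypothetical names, and an environment stands in for the externalptr to the C++ bitset so that the example runs without Rcpp.

```r
# Hypothetical stand-in for the C++ bitset behind an externalptr:
# an environment gives the same "mutable handle" behaviour in base R.
new_handle <- function(size) {
  handle <- new.env(parent = emptyenv())
  handle$bits <- logical(size)
  handle
}

# Constructor returning a plain named list of closures.
# Every closure captures `handle`, so a mutation is visible through any
# copy of the list. This is why R6 reference semantics are not needed.
# Note: this sketch does no bounds checking on the indices.
bitset_sketch <- function(size) {
  handle <- new_handle(size)
  self <- list(
    max_size = size,
    insert = function(v) {
      handle$bits[v] <- TRUE
      invisible(self)  # return the list so calls can be chained
    },
    remove = function(v) {
      handle$bits[v] <- FALSE
      invisible(self)
    },
    size = function() sum(handle$bits),
    to_vector = function() which(handle$bits)
  )
  class(self) <- "BitsetSketch"
  self
}

# Usage: the call syntax matches what an R6 object would offer.
b <- bitset_sketch(10)
b$insert(c(1, 3, 5))$remove(3)
alias <- b        # copies the list, but both lists share one handle
alias$insert(7)
b$to_vector()     # reflects the insert made through `alias`
```

Constructing such a list only allocates a handful of closures, whereas an R6 constructor does more work per object (for example, setting up the object's environments and bindings), which is plausibly where the instantiation overhead described above comes from.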
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## dev #186 +/- ##
==========================================
- Coverage 96.28% 96.20% -0.09%
==========================================
Files 36 36
Lines 1722 1790 +68
==========================================
+ Hits 1658 1722 +64
- Misses 64 68 +4
Here are some benchmarks of the old version ("R6"), the new one ("list"), and the equivalent code without any wrappers, calling the C++ functions from R directly ("ptr"): https://gist.github.com/plietar/9d07ffb01b9ea7a27a12696b09242435 These compare the performance of doing The
Unfortunately, the generated documentation for this is pretty terrible. I need to figure out a good way of keeping the R6 look but with the alternative implementation.
a926860 to 425f103
Found a workaround for the documentation problem: instead of relying on roxygen to do the formatting, we include Rd markup directly in our code. It's not the prettiest implementation, but at least the output looks okay.
Very hacky, I respect it 😆
Since this is localised to Bitsets, this seems like a fine practical decision. But if we ever reuse this approach for other R6 objects, we should consider extending roxygen to make the documentation cleaner.