Remove UB on overflow in Allocate::next()
#250
Remove all unsafe in `Allocate::next()`.
Probably you guys should just revert #238. The problem with the possible UB it introduced is that one can write

```rust
use legion::world::Allocate;

fn main() {
    // Assuming `sizeof(usize) == sizeof(u64)`.
    let zero_nonzero = Allocate::new().skip(usize::MAX - 16).next();
}
```

and get UB in this totally safe code. Yeah, currently overflow in this crate's code is totally impossible, as |
I can see that we should use `NonZeroU64::new()` instead of `NonZeroU64::new_unchecked`, so that on overflow the iterator returns `None` rather than constructing a `NonZeroU64` with 0.
I am not sure how, after making that change, the `Allocate` iterator being public could cause UB?
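A minimal sketch of the checked construction described above, assuming a plain wrapping counter. `Counter` is a hypothetical stand-in for illustration, not legion's actual `Allocate` type:

```rust
use std::num::NonZeroU64;

/// Hypothetical stand-in for the allocator's counter (not legion's type).
struct Counter {
    next: u64,
}

impl Iterator for Counter {
    type Item = NonZeroU64;

    fn next(&mut self) -> Option<NonZeroU64> {
        // `NonZeroU64::new` returns `None` for 0, so once the counter has
        // wrapped around the iterator ends instead of producing an invalid value.
        let id = NonZeroU64::new(self.next)?;
        self.next = self.next.wrapping_add(1);
        Some(id)
    }
}
```

With the checked constructor there is no `unsafe` left, so a caller can at worst observe `None` (or a panic if it unwraps), never an invalid `NonZeroU64`.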
```rust
// This is either the first block, or we overflowed to the next block.
self.next = NEXT_ENTITY.fetch_add(BLOCK_SIZE, Ordering::Relaxed);
debug_assert_eq!(self.next % BLOCK_SIZE, 0);
static NEXT_ENTITY_BLOCK_START: AtomicU64 = AtomicU64::new(1);
```
Initializing this to BLOCK_SIZE has the same effect of preventing the first entity from being allocated as 0, but without causing 1 entity in every block to be skipped. It skips the first block instead (which isn't a problem).
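To illustrate the suggestion, here is a small sketch of block reservation with the counter started at `BLOCK_SIZE`. The helper `reserve_block` is hypothetical, and the value 16 for `BLOCK_SIZE` is taken from the benchmark note in the PR description:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed value, taken from the "BLOCK_SIZE (16)" benchmark note.
const BLOCK_SIZE: u64 = 16;

/// Hypothetical helper: reserves one block of IDs and returns its first ID.
fn reserve_block(counter: &AtomicU64) -> u64 {
    counter.fetch_add(BLOCK_SIZE, Ordering::Relaxed)
}
```

Starting the counter at `BLOCK_SIZE` makes the first reserved block begin at 16, so ID 0 is never handed out and every block start stays a multiple of `BLOCK_SIZE`. Starting it at 1 would instead give block starts of 1, 17, 33 and so on, which are not block-aligned.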
```rust
// Safety: self.next can't be 0 as long as the first block is skipped,
// and no overflow occurs in NEXT_ENTITY
let entity = unsafe {
    debug_assert_ne!(self.next, 0);
    Entity(NonZeroU64::new_unchecked(self.next))
};
```
@TomGillen
Here's the problem: you have no UB as long as `NEXT_ENTITY` does not overflow. But how do you know that it won't overflow? Welp, you don't know. If the game runs long enough, it will overflow at some point in time.
Alternatives:
- Skip the first block, use `NonZero::new`, return `None` after overflow. There is no UB, but this breaks the `Allocate` promise to never return `None`, and the game will crash with a panic on overflow, because we expect `Allocate` to never return `None` and simply get the next entity by `Allocate.next().unwrap()`.
- Skip the first ID in every block. No UB, never returns `None`. The game is probably doomed to have bugs after overflow anyway, due to rewriting old entities, but it won't crash with a panic.
- Do not use `NonZero`. No UB, never returns `None`, more performant than the previous alternative, has a downside of `Option<Entity>` not being the same size as `Entity`.
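The size trade-off in the last alternative can be checked directly. This is a sketch with two hypothetical wrapper types standing in for the two possible `Entity` layouts; neither is legion's actual definition:

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

// Hypothetical stand-ins for the two `Entity` layouts under discussion.
#[allow(dead_code)]
struct NicheEntity(NonZeroU64); // 0 is invalid, so `None` can reuse that bit pattern
#[allow(dead_code)]
struct PlainEntity(u64); // every bit pattern is valid, so `None` needs extra space

/// Returns the sizes in bytes of `Option<NicheEntity>` and `Option<PlainEntity>`.
fn option_sizes() -> (usize, usize) {
    (
        size_of::<Option<NicheEntity>>(),
        size_of::<Option<PlainEntity>>(),
    )
}
```

With the `NonZeroU64` field the `Option` costs nothing extra, while the plain `u64` version needs room for a separate discriminant, which typically doubles the size on 64-bit targets.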
Option 1 is what it is designed to do.
If the internal counter has overflowed, the system will panic (or it should, with checked `NonZeroU64` construction). You have exhausted the available 64-bit address space. Every ECS has this issue (generational IDs or not) unless they don't provide unique IDs at all.
If you allocated 1000 entities every frame at 60fps, it would take 10 million years before your program panicked. Even single entity allocations that waste most of a block aren't much worse in reality.
The solution for someone experiencing this issue would be to switch the internal ID from a u64 to a u128. That would give you ~2x10^26 years until panic.
Options 2 and 3 will cause the application to behave incorrectly in bizarre and difficult to diagnose ways instead of panicking, which I am not convinced is better.
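The time estimates above can be reproduced with a few lines of arithmetic. This sketch assumes a 365.25-day year and the stated rate of 1000 entities per frame at 60 fps; the function name is made up for illustration:

```rust
/// Years until `id_space` IDs are used up when allocating `ids_per_second`.
fn years_until_exhausted(id_space: f64, ids_per_second: f64) -> f64 {
    // Assumption: a year of 365.25 days.
    const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0;
    id_space / ids_per_second / SECONDS_PER_YEAR
}

/// Estimates for a 64-bit and a 128-bit ID space at 1000 IDs per frame, 60 fps.
fn estimates() -> (f64, f64) {
    let rate = 1000.0 * 60.0;
    (
        years_until_exhausted(2f64.powi(64), rate),
        years_until_exhausted(2f64.powi(128), rate),
    )
}
```

The first value comes out just under ten million years and the second a little under 2x10^26 years, which matches the figures quoted in the comment.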
> bizarre and difficult to diagnose ways instead of panicking

Yeah, you are right, after ten million years panic is better.
Though, there are ways to avoid those bizarre bugs with something like `fn leak(e: Entity) -> EntityLocation;` which would remove the entity ID but won't remove entity data from the archetype. But the thing is that users should call it on forever-living entities, and if the user forgets to do so, we are back to bizarre bugs.
Okay, ten million years is a thing.
Apologies for bumping this issue, but I have to ask: If the intention is to have it panic when NonZeroU64 overflows, then perhaps that should be included in a comment in this function, if not in the documentation for Allocate as a whole? Maybe a line like this:
> `Allocate` will eventually overflow if enough IDs are generated. However, this is very unlikely to be a concern in reality, unless you plan on running your program for a few hundred millennia, and are generating hundreds of thousands of entity IDs every second during that timespan.

It just seems odd that you'd intend for the program to crash when it overflows, yet not document that fact.
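One conventional place for such a note is a `# Panics` section in the doc comment. The following is only a sketch on a hypothetical `IdSource` type with an invented panic message, not a proposal for legion's actual wording or API:

```rust
use std::num::NonZeroU64;

/// Hypothetical ID source, used only to show where the note could live.
struct IdSource {
    next: u64,
}

impl IdSource {
    /// Returns the next unique ID.
    ///
    /// # Panics
    ///
    /// Panics once the 64-bit counter has wrapped around to 0, that is,
    /// after the whole ID space has been used up.
    fn allocate(&mut self) -> NonZeroU64 {
        let id = NonZeroU64::new(self.next).expect("entity ID space exhausted");
        self.next = self.next.wrapping_add(1);
        id
    }
}
```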
Description
Removes UB on overflow and also removes all unsafe in `Allocate::next()`, since I haven't found any real speed differences between unsafe and safe functions.
Also moves single-use static item as close as possible to the place where it's used.
Pros: no unsafe, no UB on overflow, even if that overflow is unlikely to ever happen.
Cons: slight performance regression, gets overflowed faster (one index per block is ignored and never used), doesn't fix any overflow bugs like overwriting very old entities after overflow.
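For readers who want a concrete picture of the approach, here is a rough sketch of a safe block allocator that skips the first ID of every block. It is an illustration under assumptions, not the PR's actual diff: `BlockAllocator` and `NEXT_BLOCK_START` are hypothetical names, and `BLOCK_SIZE` is assumed to be 16 as in the benchmark note.

```rust
use std::num::NonZeroU64;
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed value, taken from the "BLOCK_SIZE (16)" benchmark note.
const BLOCK_SIZE: u64 = 16;

/// Hypothetical allocator; names follow the discussion, not legion's source.
struct BlockAllocator {
    next: u64,
}

impl BlockAllocator {
    fn new() -> Self {
        // 0 is block-aligned, so the first call to `next()` reserves a block.
        BlockAllocator { next: 0 }
    }
}

impl Iterator for BlockAllocator {
    type Item = NonZeroU64;

    fn next(&mut self) -> Option<NonZeroU64> {
        // Single-use static kept next to the only place that uses it.
        static NEXT_BLOCK_START: AtomicU64 = AtomicU64::new(0);

        if self.next % BLOCK_SIZE == 0 {
            // First call, or the current block is used up: reserve a new block
            // and skip its first ID. `fetch_add` wraps on overflow, but block
            // starts stay multiples of BLOCK_SIZE, so `start + 1` is never 0.
            let start = NEXT_BLOCK_START.fetch_add(BLOCK_SIZE, Ordering::Relaxed);
            self.next = start + 1;
        }

        // Checked construction: no `unsafe`, and never `None` here because
        // `self.next` is never a multiple of BLOCK_SIZE at this point.
        let id = NonZeroU64::new(self.next);
        self.next = self.next.wrapping_add(1);
        id
    }
}
```

Each block then yields `BLOCK_SIZE - 1` usable IDs, which is the trade-off listed under Cons.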
Benchmark results:
- Expected and somewhat big difference on the `BLOCK_SIZE` (16) testcase, since now it skips one item per block and the actual block size regressed to `BLOCK_SIZE - 1`.
- ~2% regression in the 100k testcase.
Motivation and Context
Because, well, I think that even highly unlikely UB is a bad thing.
How Has This Been Tested?
I used this benchmark to get the time of old and new iterators. It uses #249 to avoid vector reallocation noise.
Edit: accidentally loaded PR with incomplete message.