Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove UB on overflow in Allocate::next() #250

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ImmemorConsultrixContrarie
Copy link
Contributor

@ImmemorConsultrixContrarie ImmemorConsultrixContrarie commented Mar 25, 2021

Description

Removes UB on overflow and also removes all unsafe in Allocate::next(), since I haven't found any real speed differences between unsafe and safe functions.
Also moves single-use static item as close as possible to the place where it's used.

Pros: no unsafe, no UB on overflow, even if that overflow is unlikely to ever happen.

Cons: slight performance regression, gets overflowed faster (one index per block is ignored and never used), doesn't fix any overflow bugs like overwriting very old entities after overflow.

Benchmark results:

Entity allocator//1     time:   [23.369 ns 23.390 ns 23.414 ns]                                 
                        change: [-0.5274% -0.2831% -0.0357%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Entity allocator//16    time:   [45.481 ns 45.522 ns 45.567 ns]                                  
                        change: [+4.9684% +5.2765% +5.5631%] (p = 0.00 < 0.05)
                        Performance has regressed.
Entity allocator//100   time:   [182.26 ns 182.48 ns 182.73 ns]                                  
                        change: [-0.7384% -0.5018% -0.2754%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Entity allocator//1000  time:   [1.7108 us 1.7141 us 1.7175 us]                                    
                        change: [+2.6156% +2.9058% +3.1810%] (p = 0.00 < 0.05)
                        Performance has regressed.
Entity allocator//999999                                                                             
                        time:   [1.9298 ms 1.9320 ms 1.9345 ms]
                        change: [-1.7528% +1.7163% +4.8989%] (p = 0.35 > 0.05)
                        No change in performance detected.

Expected and somewhat big difference on BLOCK_SIZE (16) testcase, since now it skips one item per block and actual block size regressed to BLOCK_SIZE - 1.

~2% regression in 100k testcase.

Motivation and Context

Because, well, I think that even highly unlikely UB is a bad thing.

How Has This Been Tested?

I used this benchmark to get the time of old and new iterators. It uses #249 to avoid vector reallocation noise.

use criterion::*;
use legion::*;

fn bench_entity_allocate(c: &mut Criterion) {
    let mut group = c.benchmark_group("Entity allocator");
    for i in [1, 16, 100, 1000, 999_999].iter() {
        group.bench_with_input(BenchmarkId::new("", i), i, 
            |b, i| b.iter(|| world::Allocate::new().take(*i).collect::<Vec<_>>()));
    }
    group.finish();
}

criterion_group!(
    basic,
    bench_entity_allocate
);
criterion_main!(basic);

Checklist:

  • Acknowledged that by making this pull request I release this code under an MIT/Apache 2.0 dual licensing scheme.
  • My code follows the code style of this project.
  • If my change required a change to the documentation I have updated the documentation accordingly.
  • I have updated the content of the book if this PR would make the book outdated.
  • I have added tests to cover my changes.
  • My code is used in an example.

Edit: accidentally loaded PR with incomplete message.

Remove all unsafe in `Allocate::next()`.
@ImmemorConsultrixContrarie ImmemorConsultrixContrarie changed the title Remove UB on overflow in Allocate::next(). Remove UB on overflow in Allocate::next() Mar 25, 2021
@ImmemorConsultrixContrarie
Copy link
Contributor Author

ImmemorConsultrixContrarie commented Mar 25, 2021

Probably you guys should just revert #238. The problem with a possible UB it introduced is that Allocate is a public item, thus users of the lib could write 100% safe code:

use legion::world::Allocate;

fn main() {
    // Assuming `sizeof(usize) == sizeof(u64)`.
    let zero_nonzero = Allocate::new().skip(usize::MAX - 16).next();
}

and get UB in this totally safe code.

Yeah, currently overflow in this crate's code is totally impossible, as legion::World::remove leaks memory, thus you could also fix the problem by making Allocate a private struct; though if the memory leak would be fixed, overflow would become possible, even though it would take many (many-many-many) hours.

Copy link
Collaborator

@TomGillen TomGillen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that we should use NonZeroU64::new() instead of NonZeroU64::new_unchecked, so that on overflow the iterator returns None rather than constructing a NonZeroU64 with 0.

I am not sure how, after making that change, the Allocator iterator being public could cause UB?

// This is either the first block, or we overflowed to the next block.
self.next = NEXT_ENTITY.fetch_add(BLOCK_SIZE, Ordering::Relaxed);
debug_assert_eq!(self.next % BLOCK_SIZE, 0);
static NEXT_ENTITY_BLOCK_START: AtomicU64 = AtomicU64::new(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializing this to BLOCK_SIZE has the same effect of preventing the first entity from being allocated as 0, but without causing 1 entity in every block to be skipped. It skips the first block instead (which isn't a problem).

Comment on lines -71 to -76
// Safety: self.next can't be 0 as long as the first block is skipped,
// and no overflow occurs in NEXT_ENTITY
let entity = unsafe {
debug_assert_ne!(self.next, 0);
Entity(NonZeroU64::new_unchecked(self.next))
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomGillen
Here's the problem: you have no UB as long as NEXT_ENTITY does not overflow. But how do you know that it won't overflow? Welp, you don't know. If the game runs long enough, it will overflow at some point in time.
Alternatives:

  1. Skip the first block, use NonZero::new, return None after overflow. There is no UB, but this breaks Allocate promise to never return None, and the game will crash with panic on overflow, because we expect Allocate to never return None and simply get next entity by Allocate.next().unwrap().
  2. Skip the first ID in every block. No UB, never returns None. The game is probably doomed to have bugs after overflow anyway, due to rewriting old entities, but it won't crash with a panic.
  3. Do not use NonZero. No UB, never returns None, more performant than previous alternative, has a downside of Option<Entity> not being the same size as Entity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 1 is what it is designed to do.

If the internal counter has overflowed, the system will panic (or it should, with checked NonZeroU64 construction). You have exhausted the available 64 bit address space. Every ECS has this issue (generational IDs or not) unless they don't provide unique IDs at all.

If you allocated 1000 entities every frame at 60fps, it would take 10 million years before your program panicked. Even single entity allocations that waste most of a block aren't much worse in reality.

The solution for someone experiencing this issue would be to switch the internal ID from a u64 to a u128. That would give you ~2x10^26 years until panic.

Options 2 and 3 will cause the application to behave incorrectly in bizarre and difficult to diagnose ways instead of panicking, which I am not convinced is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bizarre and difficult to diagnose ways instead of panicking

Yeah, you are right, after ten million years panic is better.

Though, there are ways to avoid those bizarre bugs with something like

fn leak(e: Entity) -> EntityLocation;

which would remove entity ID but won't remove entity data from archetype. But the thing is that users should call it on forever-living entities, and if the user forgets to do so, we are back to bizarre bugs.
Okay, ten million years is a thing.

Copy link

@LikeLakers2 LikeLakers2 Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for bumping this issue, but I have to ask: If the intention is to have it panic when NonZeroU64 overflows, then perhaps that should be included in a comment in this function, if not in the documentation for Allocate as a whole? Maybe a line like this:

Allocate will eventually overflow if enough IDs are generated. However, this is very unlikely to be a concern in reality, unless you plan on running your program for a few hundred millennia, and are generating hundreds of thousands of entity IDs every second during that timespan.

It just seems odd that you'd intend for the program to crash when it overflows, yet not document that fact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants