Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared and Unique slices and ZST #168

Closed
Lokathor opened this issue Jul 17, 2019 · 13 comments
Closed

Shared and Unique slices and ZST #168

Lokathor opened this issue Jul 17, 2019 · 13 comments

Comments

@Lokathor
Copy link
Contributor

Lokathor commented Jul 17, 2019

At what is currently the bottom of #93 there were some interesting questions raised, but it was offtopic to that issue so here's a new one.

Big Question: How exactly do ZST interact with slices?

There's obviously a lot of sub-questions involved here.

  • The particular bit of discussion in the other issue is that a user wanted to concat two shared slices.
  • The normal way to do this is to see if one slice starts at one byte past the end of the other, and then combine the lengths if so. (EDIT: important note for readers, you can't combine two slices that are part of separate allocations even if they're adjacent in memory, that's UB)
  • However, the length of a shared slice of ZST is almost arbitrary (exception: uninhabited types cannot have a shared slice length >0), so you don't need to even check for the pointer being exactly one byte past the end, you can skip straight to adding up the lengths and use either input's address as the output's address (if the inputs are of an uninhabited type both lengths will be 0 and 0+0==0, or someone did UB before you and it doesn't matter what you do).

So the immediate questions to ask ourselves are:

  1. Do we all agree that this is valid reasoning?
  2. Does this logic hold with unique slices?
  3. Any other questions people want to raise about how slices and ZST interact?
@RalfJung
Copy link
Member

The way I understood it, the key question we debated was whether this is a safe function:

fn arbitrary_blowup<'a, T>(slice: &'a [T], length: usize) -> &'a [T] {
    assert!(core::mem::size_of::<T>() == 0);
    assert!(!slice.is_empty());
    unsafe {
        core::slice::from_raw_parts(slice.as_ptr(), length);
    }
}

By "safe" I mean "can be called by arbitrary safe code without causing UB".

I originally thought it was not, but now I am convinced it is.

@gnzlbg asks

I’m not sure I follow, arbitrary blowup gets a reference to a slice, and
increases the size of the slice, returning a reference to it. AFAICT this
creates a reference to values that do not exist.

The thing is, &slice[0] and &slice[1] return values that are identical in any aspect when this is a ZST slice. The index you are passing just does not matter at all, except for the bounds check. So how could there possibly be unsafety from growing a non-empty slice to something bigger?

@Lokathor
Copy link
Contributor Author

Does the index also not matter for &mut [ZST]?

If it does matter, how does that work with split_at_mut, since wouldn't both sides of the split end up with the same base address anyway?

@scottmcm
Copy link
Member

Hmm, I guess that because this is just getting you new references to the ZSTs, it's safe, since you could always have gotten new references by copying the references.

@RalfJung
Copy link
Member

RalfJung commented Jul 17, 2019

First, are we in agreement for shared slices? That the function is safe?

For mutable slices, no, the function is not sound. I wrote this down more than a year ago in some discussion but it seems the same questions will come up again and again, and of course I cannot find my write-up, so here you go.

Basically @scottmcm gave the answer away: you could have copied the shared refs. You cannot do that with mutable refs. So imagine a type of which you can only get one instance:

pub mod one {
    //! A value of which you can only have one.
    use std::sync::atomic::{AtomicBool, Ordering};
    
    static USED: AtomicBool = AtomicBool::new(false);
    
    pub struct OnlyOne(()); // private field to make constructor private
    
    impl OnlyOne {
        pub fn new() -> Self {
            let used_before = USED.compare_and_swap(false, true, Ordering::Relaxed);
            if used_before {
                panic!("Cannot get more than one of these!");
            }
            OnlyOne(())
        }
        
        pub fn check(_x1: &mut Self, _x2: &mut Self) {
            // We could cause UB here if we wanted, and still have a safe API.
            unreachable!()
        }
    }
}

With arbitrary_blowup_mut, I could do something like:

let one = &mut [OnlyOne::new()];
let two = arbitrary_blowup(one, 2);
let (left, right) = two.split_at_mut(1);
OnlyOne::check(left, right); // oops!

@Lokathor
Copy link
Contributor Author

Lokathor commented Jul 17, 2019

Okay, sure, but also I wasn't talking about arbitrary_blowup, where you expand the count that supposedly exist as much as you want (absolutely not safe for mut), I was talking about a concat case where the "count" that exists doesn't go up as a cause of the function (concat just adds the lengths). That should be safe for mut.

except how does split_at_mut interact with &mut aliasing? When you do split_at_mut it does an add internally, which adds 0 bytes because size is zero, so you end up with two &mut that point to the same address. Does rust/llvm have specific handling internally for this?

@HeroicKatora
Copy link

HeroicKatora commented Jul 17, 2019

First of all, sorry for taking the other thread too far. It started off as what I perceived to be closely related to the thread title and then diverged to something entirely different. And I realize now that I should have forked probably very promptly after the first answer when it was at least clear in hindsight that it would widen the scope considerably. I'll try not to repeat this.

Now, it is not possbily to conclusively proof dynamically that two non-mutable ZST slices do not 'alias' each other. They could have been created from either overlapping ranges or from disjunct ranges and would still look alike. But at the same time, two &mut references would guarantee the non-aliasing through &mut properties. A concat function would thereforee need at least as many pre-conditions as a concat_mut function. If the intuition over the &T equivalents is correct then they would require the same preconditions. If not however, concat_mut would be possible while concat would not. That seem rather odd.

@RalfJung
Copy link
Member

RalfJung commented Jul 18, 2019

First of all, please stop talking about aliasing in this thread. One point I forgot to raise yesterday: There is no aliasing for ZST. Aliasing is a property involving overlapping ranges of memory and hence cannot happen with empty ranges. Also see rust-lang/rust-memory-model#44 (thanks @Lokathor for finding this).

The only concern we have in this thread is about "safety invariants". When you have two &mut T, you know they point to two logically distinct values of T; if T is non-Copy then those values could express ownership even if they are ZST, and that's why arbitrary_blowup_mut is unsound. The tricky and confusing part here is that there is some kind of "identity" of these values but that identity has nothing to do with a memory location -- there can be arbitrarily many "distinct" instances of a ZST stored "at the same location".

Concatenation is fine, as is split_at_mut, as the total number of mutable references is preserved. For concatenation though notice the issue at oberien/str-concat#8.

@Lokathor
Copy link
Contributor Author

Lokathor commented Jul 18, 2019

@RalfJung I made this thread to ask further questions, as you requested in the other thread, and then I repeated my aliasing question one time because you ignored it the first time.

After you said, just there, that there's no aliasing for ZST as if it was somehow obvious (hint: it's not) I was able to find a year old issue about it in the previous repo: rust-lang/rust-memory-model#44, but this is not covered at all in the UCG and the Rustonomicon gives a very hand-wavey definition as well. You can't expect people to know what the book will say before you've written the book about it.

@comex
Copy link

comex commented Jul 18, 2019

For arbitrary_blowup, I can see why it's safe if a general "concatenate adjacent slices" function is safe, since you could use the latter to implement the former by concatenating a slice with itself. But are we sure the latter is safe? For non-zero-sized types, it seems impossible to avoid UB anyway...

As for not checking the pointer, what if a ZST is designed such that pointers to it encode information in the pointer value itself?

@Lokathor
Copy link
Contributor Author

Lokathor commented Jul 18, 2019

As explained in the issue Ralf mentioned, general concat is not safe because even if two slices fall next to each other in memory they must also be part of the same allocation to be combined, which you in general have no way of knowing (if all slices are known to be out of the same vec or something it can work out, but you need that extra info).

I updated the first post of the thread to help note that difficulty.

I don't know about the pointer encode part.

@RalfJung
Copy link
Member

RalfJung commented Jul 18, 2019

I made this thread to ask further questions, as you requested in the other thread, and then I repeated my aliasing question one time because you ignored it the first time.

Sorry. I didn't mean to ignore it. I also meant my post more as an answer to your OP above than any specific comment below, when I realized that this is a point I forgot to make last night (as I was falling asleep...).

After you said, just there, that there's no aliasing for ZST as if it was somehow obvious (hint: it's not)

Seems I have a hard time figuring out what is obvious and what is not. :/

I was able to find a year old issue about it in the previous repo: rust-lang/rust-memory-model#44

Good find!

but this is not covered at all in the UCG and the Rustonomicon gives a very hand-wavey definition as well. You can't expect people to know what the book will say before you've written the book about it.

The issue is I have no idea where to put this. I (tried to) expressed my frustration about that fact above. I was not frustrated by you and I am sorry if I came across as such!

For arbitrary_blowup, I can see why it's safe if a general "concatenate adjacent slices" function is safe, since you could use the latter to implement the former by concatenating a slice with itself. But are we sure the latter is safe? For non-zero-sized types, it seems impossible to avoid UB anyway...

Not sure about concat, but I am sure the function presented here is safe. No need to make the question more complicated. :)

@RalfJung
Copy link
Member

@Lokathor Now that aliasing is defined in the glossary, what is left to be discussed here? Are there open questions, things that should be put into the UCG document?

@Lokathor
Copy link
Contributor Author

naw we can close this up and call it good for now.

reminder to others who find this later: people can always feel free to start new issues if there are further questions not answered here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants