-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom vtable API #567
base: master
Are you sure you want to change the base?
Custom vtable API #567
Conversation
95a7229
to
169cd44
Compare
Fixed minor CI issues.
|
There is another related request: #558 |
The implementation seems ok at a cursory glance. To me the main question is whether we want this or not, and if so, how should the public API of it look. |
To clarify, I need this feature to run hyper with zero copy networking. The hyper uses types from the http crate which uses With custom vtables for I'm still not sure about those methods of the |
@HyeonuPark - This looks great. One concern I have is to ensure that 3rd party container implementors have everything they need to convert their containers to and from Bytes. Other than the Regarding the Doc comments, I am implementing a branch against your branch that implements documentation that is more closely aligned with Tokio's conventions, and is clippy compliant. I'll make a PR to your repo and you're welcome to merge them in if you want. @seanmonstar - Is there a way that we can put this implementation in a feature gate, or maybe even a major version RC so that we can incorporate it into projects for better exposure and vetting? |
unsafe fn from_bytes_parts(data: &mut AtomicPtr<()>, ptr: *const u8, len: usize) -> Self; | ||
|
||
/// Returns new `Bytes` based on the current parts. | ||
unsafe fn clone(data: &AtomicPtr<()>, ptr: *const u8, len: usize) -> Bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyeonuPark - Is there a case in which one would need to clone bytes using the components?
If so, is there another way to go about this?
A third party container implementor (say, Mmap https://docs.rs/memmap/latest/memmap/struct.Mmap.html) isn't going to be able to construct a Bytes instance because they won't have access to the internals of Bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current design cloning bytes should use the component, since cloning mechanisms like reference counting are implemented as component. For example they should do things like Arc::increment_strong_count()
within it. Currently the Bytes
is shipped with number of builtin components:
- Static component backed by
&'static [u8]
and does nothing on cloning - Shared component which increase refcount on cloning.
- Promotable component which initially store bytes without
Arc
and be promoted to shared component on first cloning.
And the BytesMut
has number of components similar but slightly different stretagies. And the promotable one is the only reason we have &AtomicPtr<()>
not *const ()
. Is it necessary? I'm not sure, but it was one of the goals of this PR to not change existing behaviors.
For the mmap part, I built my own buffer type from aligned slices of giant single mmap region. First few bytes of each slices are considered header and one of its fields are used as refcount since I don't want additional allocation for it. This refcount is modified on clone and drop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the fact that clone
returns Bytes
. A 3rd party implementation isn't going to be able to construct Bytes using the parts, because they don't have access to the internals of Bytes
.
We would have to provide some sort of public Bytes::from_parts
but I think that isn't the best approach.
Better would be that the trait doesn't specify that the implementation returns Bytes
. Instead, the consumer of the clone function (a Bytes instance) would receive the new parts, and know how to construct them.
/// All implementations must fulfill the following requirements: | ||
/// - They are cheaply cloneable and thereby shareable between an unlimited amount | ||
/// of components, for example by modifying a reference count. | ||
/// - Instances can be sliced to refer to a subset of the the original buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trait and its methods all require a # Safety
section in order to comply with the Repo's clippy config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to say this PR itself wasn't intended to be merged as-is, mostly because I was afraid it's too big to be merged at once. I planned to split it into multiple PRs once it got enough support. Also I want comments from the original author since the original logics are untouched and well, it's the # Safety
section of the public trait document.
I checked your PR to this branch. It looks good for me except for the default clone impl, for the reason I've described in above comment. The default "do nothing on clone" strategy would be only valid for &'static [u8]
buffers which is covered by builtin impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. That default impl was sort of a placeholder.
I think we need to come up with a layered strategy, though.
We should offer a default refcounted implementation for those that don't care about:
- Don't care about copy-on-write optimizations
- Special rules for cloning,
} | ||
unsafe impl BytesImpl for StaticImpl { | ||
fn into_bytes_parts(this: Self) -> (AtomicPtr<()>, *const u8, usize) { | ||
let mut bytes = mem::ManuallyDrop::new(Bytes::from_static(this.0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
into_bytes_parts
is used to construct a bytes instance using with_impl
, correct?
So this constructs a Bytes
instance in order to decompose the Bytes instance so that we can construct a Bytes
instance?
That is a tad confusing.
Do we even need to have a StaticImpl
?
Since from_static
"manually" constructs a VTable and the Bytes struct, it could just use free method versions of into_bytes_parts, clone, drop, into_vec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree things are confusing with StaticImpl
but it's because Bytes::from_static()
is const fn
so we can't use traits within it on stable. And I tried to DRY as much as possible in other places. Maybe it would be better to have right dependencies between functions even if it duplicates some logic?
StaticImpl
would be needed to allow downcasting Bytes
to &'static [u8]
without reinventing the wheel. Actual API for it is not included. I'm not sure which one is better, adding Bytes::try_into_static()
or impl TryFrom<Bytes> for &'static [u8]
or exposing SharedImpl
to public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I didn't notice the const. I think further decomposing the logic into utility functions, and then duplicating the minimum amount is the best approach.
... or exposing SharedImpl to public.
I think the solution here is "layers". We offer some default implementations, similar to SharedImpl, as a lower-level public API so that people can compose their own higher-level implementations of BytesImpl
/// | ||
/// Useful if you want to construct `Bytes` from your own buffer implementation. | ||
#[inline] | ||
pub fn with_impl<T: BytesImpl>(bytes_impl: T) -> Bytes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the namefrom_impl
might be more clear, since it consumes T and converts to Bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I agree on it.
Thinking more about Or maybe we could implement *edit* |
One more question: Should we specify that |
For the AtomicPtr bits, we might need to re-export the relevant types from I was able to make a working impl of BytesImpl for mmap::Mmap rather easily. All that said, I would definitely consider this the low-level and error-prone and certainly not DRY approach to incorporating a buffer type into a Bytes container. For basic buffer types, (both mutable and immutable) we should be able to make a much simpler interface. I also think that pub unsafe fn from_parts<T: BytesImpl>(data: &AtomicPtr<()>, ptr: *const u8, len: usize) -> Self {
let dptr = data.load(Ordering::Acquire);
Bytes {
ptr,
len,
data: AtomicPtr::new(dptr.cast()),
vtable: &Vtable {
type_id: TypeId::of::<T>,
clone: T::clone,
will_truncate: T::will_truncate,
into_vec: T::into_vec,
drop: T::drop,
},
}
} |
It would be nice to make
I haven't concerned exposing loom AtomicPtr. Maybe we need to newtype it like
I really should have document if more, that multiple The reasons why Edit: |
Right. When you convert one of the instances of Bytes back to T, but other instances of Bytes remain, you now have more than one instance that owns the inner buffer. Let's say you have: struct MyBuf {
inner: Box<[u8]>,
} When we convert MyBuf to a Bytes instance, the new instance takes control of the Now let's convert one of those Bytes instances back into However, there is still an instance of Bytes, and its copy of SharedImpl also owns the We need to find a way to design the API such that it is impossible to convert from Bytes to T if the refcount is > 1, similar to the design of Rc and RefCell. Also, we would need to disallow conversion from Bytes to Bytes Mut if refcount > 1. We can use the same underlying mechanism to enforce both invariants. If we just built and published the "Inner" Container States API: The 3rd party implementors would be able to easily adapt their implementation without having to worry about the internals of memory management, alignment, thread-safety, etc. Perhaps all of those above types implement the same type, e.g.
|
At a high level, we'd need a 4d matrix of functionality. These wouldn't necessarily all be separate buffer types, but we need to ensure that we can handle the use-cases. (de)Allocation Style:
Mutability:
Immutable -> Mutable Conversion Method
Clone Operation:
I think that we'd also need to add a method to the vtable to accomodate the potential of converting to mutable. Converting from Mutable to immutable shouldn't need to access the VTable.. I don't think.. |
I'm going to attempt a version that formalizes this lower-level API, and also uses more a Typestate scheme and enum based dispatch, which, among other things, is going to be faster than any vtable. (not that the current impl is an actual ptr based vtable) It should be quite a bit safer, too. Certain state transitions will be inexpressible. |
Maybe you want to convert
At this point I think those promotable impls are special enough that 3rd party promotables are not something we need. I think its purpose, besides historical reasons, is to make round trip between unsafe trait BytesImpl: Clone + 'static where Vec<u8>: From<Self> {
fn into_bytes_parts(this: Self) -> (*mut (), *const u8, usize);
unsafe fn from_bytes_parts(data: *mut (), ptr: *const u8, len: usize) -> Self;
} I don't think we need abstraction on mutable conversion. |
It wasn't intended to make generic SharedImpl a first class citizen. If we decide to make it, it would be a helper type on top of the Let me describe my own use case. First user creates a context. The context mmap a large memory, do some mlock stuff, split them into aligned chunks and push them all into its own queue. Each chunk stores inline refcount and pointer to the context on its first few bytes. User acquire new mutable buffer from context by pop the queue, write into it, and freeze it to construct clonable, sliceable and immutable buffer type. On drop with refcount 0 it pushes itself into the context queue. And I want to make this immutable buffer
I'm not sure about it. Single thread runtime or runtime per thread users may want to have hybrid refcount and optimistically use unsynchronized field on original thread. Will it actually boost performance? I don't know but I don't want to prevent it. Also my own use case is an example to manage refcount internally to not have separate allocation for refcount. If user don't want to handle atomic refcounting baggage they may use helper libraries, like generic SharedImpl.
Semantically it's perfectly ok, but is it useful? One of the major use case of |
I was hoping that we could offer some intermediate types to make it easier for implementors to integrate with Bytes, but I think you're right, it's not possible in most cases. Later, we could offer a public version that is similar to SharedImpl to make it easy for people with trivial buffer types (that can be freed with the global allocator). It should not be in scope right now. After thinking about it more, I think the fundamental dilemma of 3rd party integration comes down to the You also make a great case regarding refcounting baggage, etc. That should also be up to the implementor. I think I'm finally convinced that your design is the best approach. That said, there are a few things we still need to nail down..
The use-case for try_unfreeze is almost always to reduce allocations in the
Understanding that it is a draft, I'd say that a final implementation should include the following: (in order of priority)
** Edit ** |
You mention the downcasting approach for |
I made a PR with my proposed module and naming changes: HyeonuPark#2 I will make an additional PR following that with some takes at implementing |
One other side effect of this module refactoring is that it makes it very simple to put any allocations behind a |
This is my suggestion about constructing
Bytes
with custom vtable. Since it's not a small modification, I want to share my direction to maintainers before writing more docs or bikeshedding etc. You can consider this PR as an executable RFC.Related issues: #359 #437 #526
The base idea is to expose a trait purely consists of associated functions. It's not a breaking change to add associated function with default impl to existing trait. Default impl being mandatory can be a problem, but it's an inherent problem of exposing custom vtable.
At this point this PR is consists of 5 commits. They're quite independent that I think each deserves its own PR after the consensus is made. It's possible that some of them will be advanced while others are rejected. First 2 commits are about improving current vtable signatures. it's hard to do so after stabilizing them. Next 2 commits implements custom vtable API. The last one implements downcasting.
The first commit is about changing the name of the vtable entry
to_vec
tointo_vec
to match its semantics. Changes in its signature also reduces atomic operations, though performance wasn't a goal here. I hardly expect it actually improve performance.The second commit is to add
will_truncate
function to vtable. Currently promotable buffers can't discard bytes from the last without promotion, soBytes::trucate
specialize them by equality checking vtable address with 2 promotable vtable static variables. It doesn't seems like a good idea with open set of vtables.The third commit implements the
BytesImpl
trait and theBytes::with_impl
constructor. It would be the main commit of this PR.The fourth commit occupies majority of diffs this PR have, but all it does it to move functions into impl blocks. It should not add or change any behavior.
The fifth commit is about downcasting. I think round tripping between custom buffers and the
Bytes
is the next step people would expect. Potentiallytry_unsplit
can also be built on top of it.Thanks for reading. Any feedback is welcome.