Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🤘 metal: Migrate to objc2 architecture with objc2-metal bindings #225

Merged
merged 3 commits into from
Oct 3, 2024

Conversation

MarijnS95
Copy link
Member

The current objc crate stack is completely unmaintained and has severely fallen out of date with no updates for over 4 years. The metal-rs crate, built on top of this architecture, is completely written by hand which is tedious to keep up-to-date, not to mention has inconsistencies in its implementation.

All of this is superseded by the new objc2 crate stack built by @madsmtm. Beyond providing what seems like a better, safer abstraction over Objective-C, all bindings are completely autogenerated meaning we'll no longer lag behind upstream bindings (requiring painstaking manual patching) or have inconsistencies in the implementations, as long as the generator is properly able to represent the bindings.


This PR is a draft to allow me to get a better understanding of Objective-C, as well as for @madsmtm to chime in on the current use of Metal bindings. Who also helpfully provided me a branch with planned future improvements, including changing the generated bindings to be closer to metal-rs to make other migrations easier to manage.

One important thing that is missing from this PR is interop with older metal-rs types. objc2(-metal) has the typical Id::from_raw() interop that could allow us to upgrade gpu-allocator here while still using metal-rs for a little while longer in our own codebase. Such interop could be provided like the public-winapi optional feature, or hand-rolled in our codebase if so desired.

Cargo.toml Outdated Show resolved Hide resolved
src/metal/mod.rs Outdated
location: MemoryLocation,
) -> AllocationCreateDesc<'a> {
let size_and_align = device.heap_acceleration_structure_size_and_align_with_size(size);
// TODO: Why is this one unsafe?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely just because no-one has marked it as safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's interesting. How much manual work is involved in improving the representation of the Metal API (or anything objc in general)?

I never want my implementations to be one way, and would love to aid in contributing back upstream bindings improvements. Is there some documentation or guidance in what goes into deducing that a function is safe (or any other thing that I might want to change), and how/where to ultimately mark it as such?

I'll look into your crate documentation meanwhile!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/madsmtm/objc2/blob/master/crates/header-translator/src/data/Metal.rs looks like this is where I'll want to start making changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the place, yeah. I've been considering designing an "auto-safe" system where you could mark an entire class as being safe, and then all the methods on that class that take references and not e.g. pointers would be marked safe.

But it's a tough issue, and not a burden I wanted to tackle right now, especially in the political sense (I cannot really afford for objc2 to be seen as something that plays quick and dirty with unsafe).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw some of these methods were likely commented out and marked as unsafe because of possibly going out of bounds? E.g. setWidth: and friends, how do you deduce what happens here? And heapAccelerationStructureSizeAndAlignWithSize: too?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you deduce what happens here

In reality, I don't really - the issue here is that I don't know enough about Metal to know what's safe and what isn't, the only reason that a lot of things are marked safe is mostly to retain approximate feature-compatibility with the metal crate, I haven't actually done a thorough review of everything there.

In the case of indexes I went with being overly cautious, they looked like something that might very easily run into UB if you passed Metal some buffer and then set the size of the buffer to something else, but it's very probable that Metal protects against this internally, and that it isn't actually an issue.

Copy link
Member Author

@MarijnS95 MarijnS95 May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds very sensible. Sounds like the combination of common-sense and maybe runtime-testing (if I pass an invalid value: does Metal prevent me from (ab)using it by returning an error or otherwise)?

More importantly this might be something to document in the header-translator files directly (if not already done so), maybe serving a double purpose as # Safety docs when deducing that a function is explicitly unsafe (or maybe also why it would be "safe" if deduced as such).


the only reason that a lot of things are marked safe is mostly to retain approximate feature-compatibility with the metal crate

Ouch; I hope any failures there to not mark things unsafe when they really are unsafe don't trickle down into your crates. I think that's the kind of compatibility you shouldn't retain, to demonstrate that there's an obvious safety benefit in using objc2.

Copy link

@madsmtm madsmtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see that you're progressing here! I can't really help with the correctness of using the Metal API itself, am not experienced enough with it to do so, but I'll be happy to continue answering questions if you're using objc2-specific APIs that are confusing (this also goes for future PRs in this and other repos).

Cargo.toml Outdated Show resolved Hide resolved
@madsmtm madsmtm mentioned this pull request May 23, 2024
4 tasks
@MarijnS95 MarijnS95 force-pushed the objc2 branch 7 times, most recently from b089113 to 5173453 Compare July 7, 2024 20:59
@MarijnS95 MarijnS95 marked this pull request as ready for review July 7, 2024 20:59

fn main() {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("trace")).init();

let device = Arc::new(metal::Device::system_default().unwrap());
let device = unsafe { metal::MTLCreateSystemDefaultDevice() };
// TODO: Not SendSync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this todo mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, the new Rust clippy lint that complains about using a not-Send-Sync datastructure inside an Arc (since it can't be shared across threads, it's useless and an Rc should be used instead). This is not something to affect gpu-allocator, but will likely affect our framework.

@Jasper-Bekkers Jasper-Bekkers changed the title metal: Migrate to objc2 architecture with objc2-metal bindings 🤘 metal: Migrate to objc2 architecture with objc2-metal bindings Jul 8, 2024
Comment on lines 9 to 11
let device = Arc::new(metal::Device::system_default().unwrap());
let device = unsafe { metal::MTLCreateSystemDefaultDevice() };
Copy link
Member Author

@MarijnS95 MarijnS95 Jul 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madsmtm does it appear to you that anything weird is happening with this change? This seems to be the same implementation as metal-rs yet with objc2 it always returns NULL on an M2 Air. It returns a valid device via metal-rs.

The only difference is that there's no framework linking on the extern "C" block like there is on metal-rs, is that arranged globally?

Copy link
Member Author

@MarijnS95 MarijnS95 Jul 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind. Conveniently I just went through a rename in the documentation for madsmtm/objc2#638 and spotted https://docs.rs/objc2-metal/latest/objc2_metal/, we need to set up the linking via the framework exactly.

I expected that to result in a linker/compiler error, not a "returns NULL at runtime" 😅

Should we use that or another PR to hide such functions behind a feature that turns on linking automatically, as an extra requirement to use MTLCreateSystemDefaultDevice() if it's known to always return NULL (and be effectively useless) otherwise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also quite funky that MTLCopyAllDevices() doesn't suffer from this CoreAnimation dependency and can be called directly; probably because it needs CA to figure out which of the devices is the "default" one (though it's easy to just pick the one and only GPU from MTLCopyAllDevices() as multi-GPU seems not very common).

Copy link

@madsmtm madsmtm Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use that or another PR to hide such functions behind a feature that turns on linking automatically, as an extra requirement to use MTLCreateSystemDefaultDevice() if it's known to always return NULL (and be effectively useless) otherwise?

Hmm, I'm not entirely sure. This is mostly just an already documented caveat (see Apple's official docs), and not something handled by Swift either.

The hardest part about this is that #[link(...)] attributes are not always passed on to the linker (at least not before rust-lang/rust#121293 is resolved), so users could still run into this problem, even if we provided such a feature flag (e.g. if they compile a static Rust library w. Cargo, and link that with a C project compiled in Xcode that only links to the Metal framework).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, specifically to note that while MTLCreateSystemDefaultDevice() comes from a link against the Metal framework, and something internal expects us to link to CoreGraphics (but MTLCreateSystemDefaultDevice isn't linked from CoreGraphics itself). I may have implicitly skipped over that.

@Jasper-Bekkers
Copy link
Member

I've now succesfuly ran with these changes, I think we should go ahead and merge them.

@MarijnS95
Copy link
Member Author

6a2f521

Add 'all' dependency because I can't get it to work without when basing our code on the latest 'objc2-metal' branch

This sounds wrong so I've dropped the commit. If gpu-allocator builds on/for Metal correctly, the issue lies elsewhere. Maybe another crate forgot to list the right features? Or there is interop that requires a different set of features?

Let me investigate for a bit before we merge this 🙏

@Jasper-Bekkers
Copy link
Member

This sounds wrong so I've dropped the commit.

Yeah no worries, it's a hackfix for me to be able to use this in my local testing, I can apply it locally. I didn't want to dive into why it was breaking (after having spent some time on it).

If gpu-allocator builds on/for Metal correctly, the issue lies elsewhere. Maybe another crate forgot to list the right features? Or there is interop that requires a different set of features?

I'm building on a later commit then what's released, I think the feature setup just changed.

@MarijnS95
Copy link
Member Author

MarijnS95 commented Oct 3, 2024

Let me investigate for a bit before we merge this 🙏

Found it: the git version that you're patching to hides quite a few of the types used here behind a new MTLAllocation feature since the XCode 16 beta 1 update:

madsmtm/objc2-generated@aadc4ed

We can't explicitly add this feature yet because it's not available in the objc2 dependency here, and "all" would kind of be papering over it since it includes MTLAllocation on the patched version.


Let's add MTLAllocation in our own codebase (or all) and patch it up here when we upgrade to the new version (which I hope counts as a semver break).

@Jasper-Bekkers
Copy link
Member

Sounds good, lets merge?

The current `objc` crate stack is completely unmaintained and has
severely fallen out of date with no updates for over 4 years.  The
`metal-rs` crate, built on top of this architecture, is completely
written by hand which is tedious to keep up-to-date, not to mention
has inconsistencies in its implementation.

All of this is superseded by the new `objc2` crate stack built by
@madsmtm.  Beyond providing what seems like a better, safer abstraction
over Objective-C, _all_ bindings are completely autogenerated meaning
we'll no longer lag behind upstream bindings (requiring painstaking
manual patching) or have inconsistencies in the implementations, as long
as the generator is properly able to represent the bindings.
`size_of(_val)()` was added to the prelude in Rust 1.80, causing
`unused_qualifications` warnings whenever we qualify a call to it with
via `std::mem::size_of_val()`.

The easiest workaround is to remove the prefix and explicitly import the
function in scope.  We annotate the import with a `TODO` to remove it
once bumping our MSRV on or past 1.80.
@MarijnS95
Copy link
Member Author

Ready for merge once CI is green 🎉

@MarijnS95 MarijnS95 merged commit 031b39d into main Oct 3, 2024
26 checks passed
@MarijnS95 MarijnS95 deleted the objc2 branch October 3, 2024 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants