-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HSA HAL for AIE #775
HSA HAL for AIE #775
Conversation
d6fb421
to
b74dd30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for putting this together, Maksim. This is awesome!
|
||
// TODO(max): this doesn't go here, should go in pending_actions maybe? | ||
// like shouldn't be done right at dispatch time | ||
// Configure the queue's hardware context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need the context before submitting anything into the queue? I'm not familiar with pending actions.
One thing I noticed is that, when using the python bindings, if you configured the dispatch early on, and then ran the compilation, there was a bug that would kill the context in the device upon execution of the subprocess (From python). This is because somehow upon the subcommand finishing, it would call an operation that would force a flush in the file descriptor of the driver, and that would kill the context.
So early creation of the context had some implications in that case. But this is really a corner case.
iree_string_view_t identifier{}; | ||
iree_hal_hsa_dynamic_symbols_t hsa_symbols{}; | ||
int num_aie_agents{}; | ||
hsa::hsa_agent_t agents[IREE_HAL_HSA_MAX_DEVICES]{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you currently do not care to much about the GPU side, but any thoughts if what I originally had made sense?
c44d59b
to
f768298
Compare
void PrepareMatmulExecutable() { | ||
IREE_ASSERT_OK(iree_hal_executable_cache_create( | ||
device_, iree_make_cstring_view("default"), | ||
iree_loop_inline(&loop_status_), &executable_cache_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this loop inline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be honest I have no clue how this works? I copied/ported this from IREE https://github.com/iree-org/iree/blob/894dfbe962f5225975a5355ec260fa243f0e75fb/runtime/src/iree/hal/cts/executable_cache_test.h#L26
fec213a
to
c730870
Compare
- namespace hsa - implement direct linking against hsa-runtime - fix driver test Adding soft queue dispatch logic to dispatch commands to AIE agents (#2)
c730870
to
327369c
Compare
a24f88c
to
1ac9e93
Compare
This is the long-awaited HSA HAL for us (AIE). It should be considered beta - it all works (all the current tests pass, even on Strix) but there are probably still many "sharp edges". In addition, I haven't timed anything so I don't know if it's actually slower or faster than the XRT HAL.
There's a lot here if you haven't worked with IREE's HAL before or HSA before. So ask all the questions - investigating/answering them will help me clarify my own understanding as well.
Only thing to explicitly call out is that currently it's pinned to a branch of ROCr which will be merged as soon as I fix up the tests over there.
Note: only Linux (a current limitation of HSA itself).
Shout out to @eddierichter-amd for fighting in the trenches with me on this.
cc @stellaraccident @powderluv @kumardeepakamd @MaheshRavishankar