-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tuner] Refactor tuner flow to use unified functions for models and dispatches #577
Conversation
…ispatches Signed-off-by: Max Dawkins <[email protected]>
This PR is not really ready to land yet, since it requires an updated iree-compiler version. I'm mainly sending this out right now for visibility. EDIT: I also realized I forgot to add the actual TD spec application function in the TD spec generation. I will add that in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great, thanks for the detailed description and breakdown of the plan!
We should split it into smaller PRs and start landing 😸
class TestTuner(libtuner.TuningClient): | ||
def __init__(self): | ||
self.compile_flags = [ | ||
"--iree-hip-target=gfx942", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For dispatches, we don't need to provide the target as dispatches are already configured.
But we will need this for the full model,
self.benchmark_flags = [ | ||
"--benchmark_repetitions=3", | ||
"--benchmark_format=json", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually we should use the bindings for benchmarking too. Just FYI, this is definitely outside of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's actually what I did at first, but the compiler bindings did not seem to recognize many of the compiler codegen flags. I didn't dig too deep into it, but I think we may need to expose more flags to the compiler api in IREE before we can do this for compilation. And then since I was already using subprocess for compilation, I just did the same for benchmarking for consistency.
@@ -1,5 +1,5 @@ | |||
[project] | |||
name = "SHARK Tuner" | |||
name = "SHARK-Tuner" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The space was causing my pip install of the tuner to fail for some reason, so I just made it a dash. I forgot to remove this change before pushing. I'm not sure why it fails to pip install on my machine (and evidently not on the CI), but if there is no preference for having a space then I think it's probably safer to use a dash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I've never tried installing it, I don't know if it actually works. I mostly copy-pasted from other project file in this repo
tile_sizes = [0, 0] | ||
tile_sizes.append(configuration.tile_sizes[2]) | ||
return tile_sizes | ||
|
||
def get_workgroup_tile_sizes(self, configuration: Configuration): | ||
tile_sizes = configuration.tile_sizes[:2] | ||
tile_sizes.append(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably also have to refactor how we store tile sizes to support other pipelines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there are a number of additional things that could probably use refactoring, but I left that out of the scope of this PR, since it is already very big. I think it makes sense to refactor this when we move the TD generation to use purely python bindings.
This PR adds direct transform dialect spec generation for candidate configurations. This is the first part of the large refactoring described in #577. The way TD specs are generated is by matching against certain types of operations, and then creating a named sequence with `transform.iree.match.cast_compatible_dag_from_root` based on the matched operation. This is done for each configuration found, and the specs are saved to the temporary tuning directory to be used later in tuning. One main difference in the flow of candidate generation is that state is no longer tracked by saving files to a temporary directory. Instead, ir modules are passed to each function, and only at the very end of candidate generation are the transform dialect specs written to files. This makes things cleaner, since there no longer needs to be a coordination of file paths. Signed-off-by: Max Dawkins <[email protected]>
This PR adds direct transform dialect spec generation for candidate configurations. This is the first part of the large refactoring described in nod-ai#577. The way TD specs are generated is by matching against certain types of operations, and then creating a named sequence with `transform.iree.match.cast_compatible_dag_from_root` based on the matched operation. This is done for each configuration found, and the specs are saved to the temporary tuning directory to be used later in tuning. One main difference in the flow of candidate generation is that state is no longer tracked by saving files to a temporary directory. Instead, ir modules are passed to each function, and only at the very end of candidate generation are the transform dialect specs written to files. This makes things cleaner, since there no longer needs to be a coordination of file paths. Signed-off-by: Max Dawkins <[email protected]>
This PR refactors the tuner flow into a unified compilation + benchmark path, and adds an example test showing how to use the new flow. The PR does not remove any of the code from the old paths, because the new path requires a newer version of iree-compiler, which is not compatible with most of the tuner code yet. This PR is quite lengthy so the new flow is described below:
Goals and Motivations
The purpose of this PR is first to unify the model and dispatch compilation and benchmark flow, and second to remove much of the spooky action at a distance between tuner clients and the internal tuner implementation. The first is accomplished by having a single compile and benchmark function, and the second is greatly improved by attempting to hide as much of the CandidateTracker information from the client as possible.
Candidate Generation
Candidate generation uses the same constraint logic to determine a set of potential configurations, and then directly generates transform dialect specs based on the input mlir file, and the generated constraints. The TD specs are created by matching specific operation types in the input mlir file, and then creating a
transform.iree.match.cast_compatible_dag_from_root
in the TD named sequence matcher. This uses a mix of python bindings for matching root ops, and string formatting for building the TD spec, but can eventually be shifted to using exclusively python bindings.The main difference in the new candidate generation is that only TD specs are created, instead of generating the full configured source for compilation later. The source will be stripped of compilation info and compiled with the TD spec during the compile step.
Note: Candidate generation with the new path was only implemented for MMT and Conv in this initial PR. The other operations can be supported fairly easily after this PR is merged.
Candidate Compilation
Candidates are now compiled with a single function for both models and dispatches. There are a few primary differences here:
iree-codegen-strip-compilation-info
pass before compilation. This allows dispatches to be compiled in the same way as models.get_iree_compile_flags
, which is used in place ofget_dispatch_compile_command
andget_model_compile_command
. This new function hides the candidate trackers from the client, and is only meant to return some additional iree-compile flags (not including the source file path oriree-hal-target-backends
). This simplifies the TuningClient implementations significantly, and makes it much easier to understand how to create one.Candidate Benchmarking
Similar to the compilation, the
get_dispatch_benchmark_command
andget_model_benchmark_command
are replaced by a singleget_iree_benchmark_module_flags
function. This function hides the candidate trackers from the client, and simply expects a list of extra flags for iree-benchmark-module (not including--device
,--module
, or--benchmark_format
). The benchmark parsing also needed to be rewritten to match the new benchmarking command.