Add in-tree special_models test suite using reworked iree-tooling. #17883
Conversation
def pytest_addoption(parser):
    parser.addoption(
        "--goldentime-rocm-e2e-ms",
        action="store",
        type=float,
        help="Golden time to test benchmark",
    )
(This can be deferred to a follow-up refactoring, to keep this PR more incremental)
We might not need flags for these options now that the files are living in the same repo.
I'd prefer for the workflow files to be relatively frozen, with expected results stored in source files in the tests (or very close to the tests). We could pull from a source file like this: https://github.com/iree-org/iree/blob/main/build_tools/benchmarks/common/benchmark_thresholds.py
This puts default values in a source file, but then none of the defaults actually matter, since the CI jobs always override them with CLI flags. When developers run the tests locally, they shouldn't also need to exactly match the flags used in the workflow file.
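If we keep the flag, one rough sketch of combining it with an in-tree default (the dict and fixture names here are hypothetical and the number is a placeholder, not a real threshold):

```python
# conftest.py (sketch): defaults live in a source file next to the tests; the
# CLI flag only overrides them when explicitly passed.
import pytest

DEFAULT_GOLDEN_TIMES_MS = {
    "rocm-e2e": 1000.0,  # placeholder value
}


def pytest_addoption(parser):
    parser.addoption(
        "--goldentime-rocm-e2e-ms",
        action="store",
        type=float,
        default=DEFAULT_GOLDEN_TIMES_MS["rocm-e2e"],
        help="Golden time to test benchmark (overrides the in-tree default)",
    )


@pytest.fixture
def goldentime_rocm_e2e_ms(request):
    # Tests read the effective threshold through this fixture.
    return request.config.getoption("--goldentime-rocm-e2e-ms")
```

With that shape, CI could still pass explicit flags, but local runs would work with no flags at all.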
class VmfbManager:
    sdxl_clip_cpu_vmfb = None
    sdxl_vae_cpu_vmfb = None
    sdxl_unet_cpu_vmfb = None
Why does this file exist? Can't these just be inlined into the files that use them?
We need this file to share data between tests. There is no cleaner way to inline it and still use the result from one test in another test.
Which tests share data? I couldn't tell from a high-level glance through. A few cases that I looked at seemed to only use the vmfb from within the same file, so wouldn't a global in those files serve the same purpose as this separate file?
I want to avoid having a file like this needing to know about every configuration of every test.
Yeah, I tried the global variable approach. The way globals work here, each independent test can only consume them; we can't set the vmfb in one test and then have the run_module test consume the vmfb generated there. This was the cleanest way I could find.
(By tests sharing data I mean that every iree_run_module test uses the vmfb generated by its corresponding iree_compile test.)
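A stripped-down illustration of that pattern, with placeholder model/function names, an assumed import path for the helpers, and the shared class kept in one place:

```python
# Sketch only. VmfbManager would normally live in its own module so several
# test files can import it; class attributes outlive a single test function,
# which is what lets the compile test publish its output for the run test.
import pytest

from ireers_tools import iree_compile, iree_run_module  # import path assumed


class VmfbManager:
    sd3_clip_cpu_vmfb = None


def test_compile_clip_cpu(sd3_clip_mlir):
    # Publish the compiled artifact for dependent tests.
    VmfbManager.sd3_clip_cpu_vmfb = iree_compile(sd3_clip_mlir, "cpu", CPU_COMPILE_FLAGS)


@pytest.mark.depends(on=["test_compile_clip_cpu"])
def test_run_clip_cpu(SD3_CLIP_COMMON_RUN_FLAGS):
    # Skipped automatically if the compile test above failed.
    iree_run_module(
        VmfbManager.sd3_clip_cpu_vmfb,
        device="local-task",  # placeholder device
        function="forward",   # placeholder entry point
        args=SD3_CLIP_COMMON_RUN_FLAGS,
    )
```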
sd3_clip_inference_input_0 = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.0.bin",
    group="sd3_clip",
)

sd3_clip_inference_input_1 = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.1.bin",
    group="sd3_clip",
)
Fine for now, but we should zip these (nod-ai/SHARK-TestSuite#285) or use an array somehow here. This is a lot of boilerplate to download and use individual loose files.
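One possible direction, assuming the download helper can be called in a loop rather than generating one fixture per file (everything here is illustrative, including the `fetch_source` name):

```python
import pytest

SD3_CLIP_INPUT_URLS = [
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.0.bin",
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.1.bin",
]


@pytest.fixture
def sd3_clip_inference_inputs():
    # Hypothetical fetch_source() that downloads into the cache and returns a
    # local path, mirroring what fetch_source_fixture does for a single file.
    return [fetch_source(url, group="sd3_clip") for url in SD3_CLIP_INPUT_URLS]
```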
@pytest.mark.depends(on=["test_compile_clip_rocm"])
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
Should add pytest marks for cpu, rocm, etc.
Currently we use the -k command line arg (similar to SHARK-TestSuite/iree_tests) to filter by backend pattern, but yeah, we should add marks here too.
Yeah, fine to do what is most similar to the current setup in this PR, but marks will be more flexible. We couldn't as easily use marks before since the test cases were generated.
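For reference, backend marks on these tests could look roughly like this (the mark names are illustrative and would need to be registered in the pytest config to avoid warnings):

```python
import pytest


@pytest.mark.rocm
@pytest.mark.depends(on=["test_compile_clip_rocm"])
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
    ...
```

Selection would then be `pytest -m rocm` instead of matching substrings of test names with `-k`.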
Nice, this test style is a big improvement over the cross-repo style with json files.
def test_compile_clip_cpu(sd3_clip_mlir):
    VmfbManager.sd3_clip_cpu_vmfb = iree_compile(
        sd3_clip_mlir, "cpu", CPU_COMPILE_FLAGS
    )
How does this differ from the fixture approach taken here?
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py
Lines 97 to 107 in 6df0372
@pytest.fixture
def llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb(llama2_7b_f16qi4_stripped_source):
    return iree_compile(
        llama2_7b_f16qi4_stripped_source,
        "rdna3_rocm",
        flags=COMMON_FLAGS
        + [
            "--iree-hal-target-backends=rocm",
            "--iree-rocm-target-chip=gfx1100",
        ],
    )
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py
Lines 220 to 240 in 6df0372
@pytest.mark.presubmit
@pytest.mark.unstable_linalg
@pytest.mark.plat_rdna3_rocm
def test_step_rdna3_rocm_stripped(llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb):
    iree_benchmark_module(
        llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb,
        device="rocm",
        function="first_vicuna_forward",
        args=[
            "--input=1x1xi64",
        ],
    )
    iree_benchmark_module(
        llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb,
        device="rocm",
        function="second_vicuna_forward",
        args=[
            "--input=1x1xi64",
        ]
        + (["--input=1x32x1x128xf16"] * 64),
    )
In the previous approach, only the iree_run_module invocations were tests. Now the iree_compile steps are tests too. They wouldn't show up as tests in the log if we made compilation a fixture, and I think having them visible is pretty valuable for anyone running these tests.
I think we should aim to land this as an incremental move and then continue to iterate on the design after getting past the cross-repository work hurdles. Minimally, we should:
Both of those seem like small but nontrivial tasks, so I think we should still revert the test suite PR and get the other updates (including the torch-mlir bump) running in parallel. Beyond that, I think focusing on the development experience with these tests will guide answers to the other review comments. The tests should be runnable locally using filters appropriate for whatever system they are run on. If a test fails, it should be obvious how to mark it as xfail in that configuration. If a benchmark falls outside of expected values, it should be obvious how to update the values for that configuration (I'm not sure yet whether CLI flags or source files are best for that).
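On the xfail point, the per-configuration expectation could live right next to the test, e.g. something like this (the condition helper, reason text, and test names are placeholders):

```python
import pytest


@pytest.mark.xfail(
    condition=running_on_rocm_ci(),  # hypothetical helper for detecting the configuration
    reason="known numerics mismatch on this backend; tracked in a follow-up issue",
    strict=False,
)
@pytest.mark.depends(on=["test_compile_vae_rocm"])
def test_run_vae_rocm(SD3_VAE_COMMON_RUN_FLAGS, sd3_vae_real_weights):
    ...
```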
Okay, good enough to land with a few nits. A bit rough around the edges, so starting in experimental/ is reasonable. Let's aim to iterate on the code and get it out of experimental/ in O(1-2 weeks).
if return_code == 0:
    return 0, proc.stdout
logging.getLogger().info(f"Command failed with error: {proc.stderr}")
return 1, proc.stdout
This has the same bug from nod-ai/SHARK-TestSuite#286. Once landed we can iterate on a fix in this repo.
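Without restating the linked issue, a more defensive shape for this helper would propagate the real exit code and keep stderr available to the caller. A sketch, where the function name and exact signature are assumptions rather than the code in this PR:

```python
import logging
import subprocess


def run_command(args: list[str]):
    # Capture output so failures can be logged and surfaced to the caller.
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode != 0:
        logging.getLogger().error(
            "Command %s failed (exit code %d): %s", args, proc.returncode, proc.stderr
        )
    # Return the actual exit code instead of collapsing every failure to 1.
    return proc.returncode, proc.stdout, proc.stderr
```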
With this, we move away from all the specialized JSON config files and complex workflows.
Instead, we use Python scripts, which let us use custom flags, tolerances, and configurations based on the backend/model.
Related PR in TestSuite: nod-ai/SHARK-TestSuite#271
This PR also removes all dependencies on SHARK-TestSuite tooling. The tools here were reworked so that downloading, caching, testing, and benchmarking for iree_special_models work with tools solely from this repo. Whenever we add test files here, the goal is for an IREE user to be able to clone the repo and run the tests knowing nothing about the SHARK-TestSuite.
Also, I hadn't realized that ireers already has a stamping process to check whether a file has already been produced. I think we have to remove it, because it will skip work even when a newer version of the file is available, and it adds little value when downloading to a cache: once a file is there it is never removed, so the stamp isn't a useful signal.
(Third time's the charm. I had to close the last two versions of this PR because I couldn't get past a pre-commit check, which led me to rebase and pick up a bunch of commits that weren't mine 🤦)
ci-exactly: build_all, test_amd_mi300, build_packages, regression_test